Sequence Alignment I
Lecture #2
This class has been edited from Nir Friedman’s lecture which is available at www.cs.huji.ac.il/~nir. Changes made by Dan Geiger, then Shlomo Moran.
Background Readings: The second chapter (pages 12-45) in the textbook, Biological Sequence Analysis, Durbin et al., 2001.
F[i,j] + B[i,j] = score of the best alignment through (i,j)
We compute F[i,j] as we did before.
We compute B[i,j] in exactly the same manner, going “backward” from B[n,m].
This requires only linear space.
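As a minimal sketch of the mid-point step, assuming a simple linear gap score (match +1, mismatch -1, space -2; these values and the names `last_row` and `midpoint` are illustrative and not taken from the lecture), the following Python computes F and B for the middle row of s while keeping only two rows in memory:

```python
def last_row(s, t, match=1, mismatch=-1, gap=-2):
    """Last row of the global-alignment table for s vs t.

    Only two rows are kept at any time, so the space used is O(len(t)).
    The scoring values are assumptions made for this sketch.
    """
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap] + [0] * len(t)
        for j in range(1, len(t) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,   # align s[i] with t[j]
                         prev[j] + gap,       # s[i] against a space
                         cur[j - 1] + gap)    # t[j] against a space
        prev = cur
    return prev


def midpoint(s, t):
    """Return (mid, j, score): a column j where a best alignment crosses row mid."""
    mid = len(s) // 2
    # F[j]: best score of s[1..mid] against t[1..j]
    F = last_row(s[:mid], t)
    # B[j]: best score of s[mid+1..n] against t[j+1..m], computed "backward"
    # by aligning the reversed strings and reversing the resulting row.
    B = last_row(s[mid:][::-1], t[::-1])[::-1]
    best_j = max(range(len(t) + 1), key=lambda j: F[j] + B[j])
    return mid, best_j, F[best_j] + B[best_j]
```

The score returned by `midpoint` should equal the score of the best global alignment of s and t, since every alignment has to cross the middle row at some column.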
Time Complexity Analysis
Time to find a mid-point: cnm (c is a constant).
The sizes of the recursive sub-problems are (n/2, j) and (n/2, m-j-1), hence
T(n,m) = cnm + T(n/2,j) + T(n/2,m-j-1)
Lemma: T(n,m) ≤ 2cnm
Proof (by induction):
T(n,m) ≤ cnm + 2c(n/2)j + 2c(n/2)(m-j-1) ≤ 2cnm.
Thus, the time complexity is linear in the size of the problem, nm.
At worst, it is twice the cost of the regular solution.
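The inductive step can be expanded as follows. This is only a worked version of the inequality in the proof, applying the induction hypothesis T(n', m') ≤ 2cn'm' to the two sub-problems:

```latex
\begin{aligned}
T(n,m) &\le cnm + 2c\,\tfrac{n}{2}\,j + 2c\,\tfrac{n}{2}\,(m-j-1) \\
       &= cnm + cn\,j + cn\,(m-j-1) \\
       &= cnm + cn\,(m-1) \\
       &= 2cnm - cn \;\le\; 2cnm .
\end{aligned}
```

The two sub-problems together cover at most half of the table, which is why the total work stays within a factor of two of a single pass.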
Local Alignment
Consider now a different question: can we find similar substrings of s and t?
Formally, given s[1..n] and t[1..m], find i, j, k, and l such that d(s[i..j], t[k..l]) is maximal.
Local Alignment
As before, we use dynamic programming.
We now want to set V[i,j] to record the score of the best alignment of a suffix of s[1..i] and a suffix of t[1..j].
How should we change the recurrence rule? Same as before, but with an option to start afresh.
The result is called the Smith-Waterman algorithm.
Local Alignment
New option: We can start a new match instead of extending a
previous alignment
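$$
V[i+1,j+1] = \max \begin{cases}
0 \\
V[i,j] + \sigma(s[i+1], t[j+1]) \\
V[i,j+1] + \sigma(s[i+1], -) \\
V[i+1,j] + \sigma(-, t[j+1])
\end{cases}
$$

Alignment of empty suffixes:

$$
\begin{aligned}
V[0,0] &= 0 \\
V[i+1,0] &= \max(0,\; V[i,0] + \sigma(s[i+1], -)) \\
V[0,j+1] &= \max(0,\; V[0,j] + \sigma(-, t[j+1]))
\end{aligned}
$$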
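The recurrence and its boundary conditions can be sketched in Python as below. The scoring values (match +1, mismatch -1, space -2) and the function name `local_alignment_table` are assumptions made for illustration; any scoring function could be substituted.

```python
def local_alignment_table(s, t, match=1, mismatch=-1, gap=-2):
    """Fill the local-alignment table V, with V[i][j] for s[1..i] and t[1..j].

    The scores are illustrative assumptions; `gap` is the score of aligning
    a letter with a space.
    """
    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]

    # Alignment of empty suffixes (first column and first row).
    for i in range(n):
        V[i + 1][0] = max(0, V[i][0] + gap)
    for j in range(m):
        V[0][j + 1] = max(0, V[0][j] + gap)

    for i in range(n):
        for j in range(m):
            sub = match if s[i] == t[j] else mismatch
            V[i + 1][j + 1] = max(
                0,                    # start a new alignment here
                V[i][j] + sub,        # extend by aligning s[i+1] with t[j+1]
                V[i][j + 1] + gap,    # s[i+1] against a space
                V[i + 1][j] + gap,    # t[j+1] against a space
            )
    return V


V = local_alignment_table("TAATA", "TACTAA")
best = max(max(row) for row in V)
```

The score of the best local alignment is the largest entry anywhere in the table, not necessarily the bottom-right one. For s = TAATA and t = TACTAA with the assumed scores, `best` is 3.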
Local Alignment Example
s = TAATA, t = TACTAA (rows follow s, columns follow t)

            T  A  C  T  A  A
         0  1  2  3  4  5  6
     0   0  0  0  0  0  0  0
  T  1   0
  A  2   0
  A  3   0
  T  4   0
  A  5   0
Local Alignment Example
s = TAATA, t = TACTAA (rows follow s, columns follow t)

            T  A  C  T  A  A
         0  1  2  3  4  5  6
     0   0  0  0  0  0  0  0
  T  1   0  1  0  0  1  0  0
  A  2   0  0  2  0  0  2  1
  A  3   0
  T  4   0
  A  5   0
Local Alignment Example
s = TAATA, t = TACTAA (rows follow s, columns follow t)

            T  A  C  T  A  A
         0  1  2  3  4  5  6
     0   0  0  0  0  0  0  0
  T  1   0  1  0  0  1  0  0
  A  2   0  0  2  0  0  2  1
  A  3   0  0  1  1  0  1  3
  T  4   0  1  0  0  2  0  1
  A  5   0  0  2  0  0  3  1
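Once the table is filled, one best local alignment can be read off by starting at a cell holding the maximal value and walking back until a cell with value 0 is reached. The sketch below assumes match +1, mismatch -1, and space -2 (so the first row and column stay 0); the function name `best_local_alignment` and the tie-breaking order are illustrative choices, not part of the lecture.

```python
def best_local_alignment(s, t, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_s, aligned_t) for one best local alignment.

    Assumes gap < 0, so the first row and column of the table are all 0.
    """
    def sigma(a, b):
        return match if a == b else mismatch

    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(0,
                          V[i - 1][j - 1] + sigma(s[i - 1], t[j - 1]),
                          V[i - 1][j] + gap,
                          V[i][j - 1] + gap)
            if V[i][j] > best:          # keep the first maximal cell found
                best, bi, bj = V[i][j], i, j

    # Trace back from the maximal cell until the score drops to 0.
    a, b = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and V[i][j] > 0:
        if V[i][j] == V[i - 1][j - 1] + sigma(s[i - 1], t[j - 1]):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif V[i][j] == V[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return best, "".join(reversed(a)), "".join(reversed(b))


print(best_local_alignment("TAATA", "TACTAA"))  # (3, 'TAA', 'TAA')
```

For s = TAATA and t = TACTAA, the maximal value 3 occurs in more than one cell; this sketch reports the first one it meets, which aligns s[1..3] = TAA with t[4..6] = TAA.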
Variants of Sequence Alignment
We have seen two variants of sequence alignment: global alignment and local alignment.
Other variants, in the book and in tutorial time:
1. Finding the best overlap.
2. Using an affine cost d(g) = -d - (g-1)e for gaps of length g. The -d is for introducing a gap and the -e for continuing the gap. We used d=e=2. We could use a smaller e.
These variants are based on the same basic idea of dynamic programming.
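To illustrate the affine variant, here is a minimal sketch of a global alignment score with gap cost -d - (g-1)e, using a common three-table formulation (one table per way the last column of the alignment can end). The function name `affine_global_score`, the match/mismatch values, and the table names M, X, Y are assumptions made for this example.

```python
NEG_INF = float("-inf")


def affine_global_score(s, t, match=1, mismatch=-1, d=2, e=2):
    """Best global alignment score when a gap of length g scores -d - (g-1)*e."""
    n, m = len(s), len(t)
    # M: last column aligns s[i] with t[j]
    # X: last column aligns s[i] with a space
    # Y: last column aligns a space with t[j]
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -d - (i - 1) * e      # one gap of length i
    for j in range(1, m + 1):
        Y[0][j] = -d - (j - 1) * e      # one gap of length j

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub
            # Opening a gap costs d; continuing the same gap costs e.
            X[i][j] = max(M[i - 1][j] - d, X[i - 1][j] - e, Y[i - 1][j] - d)
            Y[i][j] = max(M[i][j - 1] - d, Y[i][j - 1] - e, X[i][j - 1] - d)
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_global_score("AAA", "A"))        # d = e = 2: 1 - 4 = -3
print(affine_global_score("AAA", "A", e=1))   # smaller e:  1 - 3 = -2
```

With d = e the affine cost reduces to a fixed cost per space, so the first call gives the same score as the linear-gap scheme. Lowering e makes one long gap cheaper than several short ones.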
Remark: Edit Distance
Instead of speaking about the score of an alignment, one often talks about an edit distance between two sequences, defined to be the “cost” of the “cheapest” set of edit operations needed to transform one sequence into the other.
The cheapest operation is “no change”.
The next cheapest operation is “replace”.
The most expensive operation is “add space”.
Our goal is now to minimize the total cost of the operations, which is essentially what we did when maximizing the alignment score.
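A minimal sketch of such a distance, assuming costs of 0 for “no change”, 1 for “replace”, and 2 for “add space” (the numbers only respect the ordering given above and are not taken from the lecture), looks like this:

```python
def edit_distance(s, t, replace_cost=1, space_cost=2):
    """Cheapest total cost of edit operations turning s into t.

    "No change" costs 0; the other costs are illustrative assumptions.
    Only two rows of the table are kept at a time.
    """
    prev = [j * space_cost for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [i * space_cost]
        for j, b in enumerate(t, 1):
            cur.append(min(
                prev[j - 1] + (0 if a == b else replace_cost),  # keep or replace
                prev[j] + space_cost,                           # delete a
                cur[j - 1] + space_cost,                        # insert b
            ))
        prev = cur
    return prev[-1]


print(edit_distance("A", "C"))   # one replacement: 1
print(edit_distance("A", ""))    # one space: 2
```

The structure is the same as the alignment recurrence, with `min` in place of `max` and costs in place of scores.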
Where do scoring rules come from?
We have defined an additive scoring function by specifying a function σ( , ) such that
σ(x,y) is the score of replacing x by y
σ(x,-) is the score of deleting x
σ(-,x) is the score of inserting x
But how do we come up with the “correct” score?
Answer: by encoding experience of which sequences are similar for the task at hand.
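To make “additive” concrete, the sketch below scores a given alignment column by column with a toy σ. The particular values (match +1, mismatch -1, insertion or deletion -2) and the names `sigma` and `alignment_score` are placeholders chosen for illustration; in practice σ would encode the kind of experience described above.

```python
def sigma(x, y, match=1, mismatch=-1, gap=-2):
    """Toy scoring function; '-' denotes a space. Values are assumptions."""
    if x == "-" and y == "-":
        raise ValueError("a column cannot align two spaces")
    if x == "-" or y == "-":
        return gap                      # insertion or deletion
    return match if x == y else mismatch


def alignment_score(a, b):
    """Additive score of an alignment given as two equal-length rows."""
    if len(a) != len(b):
        raise ValueError("alignment rows must have the same length")
    return sum(sigma(x, y) for x, y in zip(a, b))


print(alignment_score("TAA", "TAA"))     # 1 + 1 + 1 = 3
print(alignment_score("AC-T", "A-GT"))   # 1 - 2 - 2 + 1 = -2
```

Because the score is a sum over columns, changing σ changes which alignments are considered best without changing the dynamic programming itself.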