1
Overview of Pairwise Sequence Alignment
• Dynamic Programming
  – Applied to optimization problems
  – Useful when
    • Problem can be recursively divided into sub-problems
    • Sub-problems are not independent
• Needleman-Wunsch is a global alignment technique that uses an iterative algorithm and no gap penalty (could extend to fixed gap penalty).
• Smith-Waterman is a local alignment technique that uses a recursive algorithm. Smith-Waterman’s algorithm is an extension of Longest Common Substring (LCS) problem and can be generalized to solve both local and global alignment.
Presenter: 林哲鋒
2
The Longest Common Subsequence (LCS) problem
• First, what is a subsequence? A subsequence is the sequence obtained by removing some (possibly zero) characters from a sequence. For example, pred, sdn, and predent are all subsequences of "president".
procedure Output-LCS(A, prev, i, j)
  if i = 0 or j = 0 then return
  if prev(i, j) = "↖" then
      Output-LCS(A, prev, i-1, j-1)
      print a_i
  else if prev(i, j) = "↑" then Output-LCS(A, prev, i-1, j)
  else Output-LCS(A, prev, i, j-1)
9
LCS length table for A = president (rows, index i) and B = providence (columns, index j):

 i \ j     0   1   2   3   4   5   6   7   8   9  10
               p   r   o   v   i   d   e   n   c   e
 0         0   0   0   0   0   0   0   0   0   0   0
 1   p     0   1   1   1   1   1   1   1   1   1   1
 2   r     0   1   2   2   2   2   2   2   2   2   2
 3   e     0   1   2   2   2   2   2   3   3   3   3
 4   s     0   1   2   2   2   2   2   3   3   3   3
 5   i     0   1   2   2   2   3   3   3   3   3   3
 6   d     0   1   2   2   2   3   4   4   4   4   4
 7   e     0   1   2   2   2   3   4   5   5   5   5
 8   n     0   1   2   2   2   3   4   5   6   6   6
 9   t     0   1   2   2   2   3   4   5   6   6   6
Figure: The traceback path of Output-LCS; the dark-shaded cells (priden) mark the LCS.
Output: priden
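The table and traceback above can be reproduced with a short Python sketch. It assumes the usual LCS recurrence (the slides show only the output procedure, not the table fill), and the function name `lcs` and the direction labels "diag", "up", "left" are illustrative choices rather than names from the slides.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    n, m = len(a), len(b)
    # length[i][j] = LCS length of a[:i] and b[:j]; row 0 and column 0 stay 0.
    length = [[0] * (m + 1) for _ in range(n + 1)]
    # prev[i][j] records which neighbour produced length[i][j].
    prev = [[None] * (m + 1) for _ in range(n + 1)]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
                prev[i][j] = "diag"
            elif length[i - 1][j] >= length[i][j - 1]:
                length[i][j] = length[i - 1][j]
                prev[i][j] = "up"
            else:
                length[i][j] = length[i][j - 1]
                prev[i][j] = "left"

    # Iterative version of Output-LCS: walk back from (n, m) to the border.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if prev[i][j] == "diag":
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif prev[i][j] == "up":
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("president", "providence"))
```

The traceback is written as a loop instead of recursion so that long sequences do not hit Python's recursion limit; the characters are collected in reverse and flipped at the end.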
10
Identification of Common Molecular Subsequences
T. F. SMITH AND M. S. WATERMAN
J. Mol. Biol. (1981), 147, 195-197
11
ABSTRACT
• The identification of maximally homologous subsequences among sets of long sequences is an important problem.
• The goal is to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity.
12
Algorithm
• The two molecular sequences are $A = a_1 a_2 \ldots a_n$ and $B = b_1 b_2 \ldots b_m$.
• A similarity $s(a, b)$ is given between sequence elements $a$ and $b$.
• Deletions of length $k$ are given weight $W_k$.
• Set up a matrix $H$. First set
$$H_{k0} = H_{0l} = 0 \quad \text{for } 0 \le k \le n \text{ and } 0 \le l \le m.$$
13
Algorithm cont.
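• $H_{ij}$ is the maximum similarity of two segments ending in $a_i$ and $b_j$.
• These values are obtained from the relationship
$$H_{ij} = \max\Big\{ H_{i-1,j-1} + s(a_i, b_j),\ \max_{k \ge 1}\{H_{i-k,j} - W_k\},\ \max_{l \ge 1}\{H_{i,j-l} - W_l\},\ 0 \Big\}, \quad 1 \le i \le n,\ 1 \le j \le m.$$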
14
• The formula for $H_{ij}$ follows by considering the possibilities for ending the segments at any $a_i$ and $b_j$.
• (1) If $a_i$ and $b_j$ are associated, the similarity is $H_{i-1,j-1} + s(a_i, b_j)$.
• (2) If $a_i$ is at the end of a deletion of length $k$, the similarity is $H_{i-k,j} - W_k$.
• (3) If $b_j$ is at the end of a deletion of length $l$, the similarity is $H_{i,j-l} - W_l$.
• (4) Finally, a zero is included to prevent calculated negative similarity, indicating no similarity up to $a_i$ and $b_j$.
15
• The pair of segments with maximum similarity is found by first locating the maximum element of H.
• The other matrix elements leading to this maximum value are then sequentially determined with a traceback procedure ending with an element of H equal to zero.
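The matrix fill and traceback described on these slides can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the similarity function `s` and the deletion weight `W` are passed in as parameters, and the example values `s_example` and `W_example` (in particular the weight 1 + k/3) are assumptions made only for this demonstration.

```python
import math


def smith_waterman(a, b, s, W):
    """Return (best score, segment of a, segment of b) with '-' for deletions."""
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]  # H[k][0] = H[0][l] = 0
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + s(a[i - 1], b[j - 1])
            dele = max(H[i - k][j] - W(k) for k in range(1, i + 1))
            ins = max(H[i][j - l] - W(l) for l in range(1, j + 1))
            H[i][j] = max(diag, dele, ins, 0.0)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    def close(x, y):
        return math.isclose(x, y, abs_tol=1e-9)

    # Traceback from the maximum element until an element equal to zero.
    i, j = best_pos
    seg_a, seg_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if close(H[i][j], H[i - 1][j - 1] + s(a[i - 1], b[j - 1])):
            seg_a.append(a[i - 1])
            seg_b.append(b[j - 1])
            i, j = i - 1, j - 1
            continue
        k = next((k for k in range(1, i + 1)
                  if close(H[i][j], H[i - k][j] - W(k))), None)
        if k is not None:  # a[i-k:i] is aligned against a deletion
            seg_a.extend(reversed(a[i - k:i]))
            seg_b.extend("-" * k)
            i -= k
            continue
        l = next((l for l in range(1, j + 1)
                  if close(H[i][j], H[i][j - l] - W(l))), None)
        if l is None:
            break  # defensive: should not happen for a positive cell
        seg_a.extend("-" * l)
        seg_b.extend(reversed(b[j - l:j]))
        j -= l
    return best, "".join(reversed(seg_a)), "".join(reversed(seg_b))


# Illustrative parameters (assumptions, not taken from these slides).
def s_example(x, y):
    return 1.0 if x == y else -1.0 / 3.0


def W_example(k):
    return 1.0 + k / 3.0


print(smith_waterman("TTACGTT", "GGACGGG", s_example, W_example))
```

Because the inner maxima scan every deletion length, this sketch runs in roughly O(nm(n+m)) time; that is acceptable for short sequences and keeps the code close to the recurrence as stated.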
16
• In the example of Figure 1:
• A match, $a_i = b_j$, gives $s(a_i, b_j) = 1$; a mismatch gives $s(a_i, b_j) = -1/3$.
17
Local vs. global alignment
18
Global Alignment vs. Local Alignment
• global alignment:
• local alignment:
19
Global Alignment vs. Local Alignment
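Local:
$$s_{i,j} = \max\big\{\, 0,\ s_{i-1,j} + w(a_i, -),\ s_{i,j-1} + w(-, b_j),\ s_{i-1,j-1} + w(a_i, b_j) \,\big\}$$
Global:
$$s_{i,j} = \max\big\{\, s_{i-1,j} + w(a_i, -),\ s_{i,j-1} + w(-, b_j),\ s_{i-1,j-1} + w(a_i, b_j) \,\big\}$$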
20
Local alignment example
• Match: 8
• Mismatch: -5
• Gap symbol: -3

            C   G   G   A   T   C   A   T
        0   0   0   0   0   0   0   0   0
    C   0   8   5   2   0   0   8   5   2
    T   0   5   3   0   0   8   5   3  13
    T   0   2   0   0   0   8   5   2  11
    A   0   0   0   0   8   5   3  13  10
    A   0   0   0   0   8   5   2  11   8
    C   0   8   5   2   5   3  13  10   7
    T   0   5   3   0   2  13  10   8  18

Best local alignment, ending at the maximum entry 18:
    A - C - T
    A T C A T
Score: 8 - 3 + 8 - 3 + 8 = 18
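The matrix on this slide can be recomputed with a few lines of Python. This is a sketch under the slide's scoring (match 8, mismatch -5, gap symbol -3); the function name `local_matrix` and its parameter names are illustrative, and the assignment of CTTAACT to the rows and CGGATCAT to the columns is read off the slide layout.

```python
def local_matrix(row_seq, col_seq, match=8, mismatch=-5, gap=-3):
    """Fill the local-alignment score matrix with a per-symbol gap score."""
    n, m = len(row_seq), len(col_seq)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column are 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if row_seq[i - 1] == col_seq[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment here
                H[i - 1][j] + gap,    # row symbol aligned with a gap
                H[i][j - 1] + gap,    # column symbol aligned with a gap
                H[i - 1][j - 1] + w,  # aligned pair
            )
    return H


H = local_matrix("CTTAACT", "CGGATCAT")
for row in H:
    print(" ".join(f"{v:3d}" for v in row))
print("best local score:", max(max(row) for row in H))
```

Only the fill step is shown; the alignment itself would be recovered by tracing back from the maximum entry until a zero is reached.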
21
global alignment
• Needleman-Wunsch (1970)
• Three steps in dynamic programming
  – Initialization
  – Matrix fill (scoring)
  – Traceback (alignment)
• Match: +8 (w(x, y) = 8, if x = y)
• Mismatch: -5 (w(x, y) = -5, if x ≠ y)
• Each gap symbol: -3 (w(-, x) = w(x, -) = -3)

Global alignment example:
    C A A T - T G A
    G A A T C T G C
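The three steps (initialization, matrix fill, traceback) can be sketched in Python with the scoring above. This is an illustrative sketch, not a reference implementation; the function name `needleman_wunsch` and the tie-breaking order in the traceback (aligned pair first, then a gap in the second sequence, then a gap in the first) are assumptions.

```python
def needleman_wunsch(a, b, match=8, mismatch=-5, gap=-3):
    """Return (score, aligned a, aligned b) for a global alignment."""
    n, m = len(a), len(b)

    def w(x, y):
        return match if x == y else mismatch

    # Initialization: a prefix aligned against nothing costs one gap per symbol.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap

    # Matrix fill.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + w(a[i - 1], b[j - 1]),
                S[i - 1][j] + gap,
                S[i][j - 1] + gap,
            )

    # Traceback from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + w(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return S[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("CAATTGA", "GAATCTGC")
print(score)
print(top)
print(bottom)
```

For the slide's pair of sequences the alignment shown has five matches, two mismatches, and one gap symbol, which scores 5·8 - 2·5 - 3 = 27.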
24
Affine gap penalties
• A gap of length k is penalized x + k·y, where x is the gap-open penalty and y is the gap-symbol penalty.
Three cases for alignment endings:
1. ...x over ...x: an aligned pair
2. ...x over ...-: a deletion
3. ...- over ...x: an insertion
25
Affine gap penalties
• Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion.
• Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion.
• Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.
26
Affine gap penalties
$$D(i, j) = \max\{\, D(i-1, j) - y,\ S(i-1, j) - x - y \,\}$$
$$I(i, j) = \max\{\, I(i, j-1) - y,\ S(i, j-1) - x - y \,\}$$
$$S(i, j) = \max\{\, S(i-1, j-1) + w(a_i, b_j),\ D(i, j),\ I(i, j) \,\}$$
(A gap of length k is penalized x + k·y.)
27
Affine gap penalties
• Match: +8 (w(x, y) = 8, if x = y)
• Mismatch: -5 (w(x, y) = -5, if x ≠ y)
• Each gap symbol: -3 (w(-, x) = w(x, -) = -3)
• Each gap is charged an extra gap-open penalty: -4.
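With these values (x = 4 for gap opening, y = 3 per gap symbol), the three-matrix scheme defined on the preceding slides can be sketched in Python. The boundary conditions are not given on the slides, so the ones used here (a leading gap of length k costs x + k·y, and impossible states are minus infinity) are an assumption, as are the function and parameter names.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, match=8, mismatch=-5, gap_open=4, gap_symbol=3):
    """Global alignment score where a gap of length k costs gap_open + k*gap_symbol."""
    n, m = len(a), len(b)
    x, y = gap_open, gap_symbol
    S = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # best score overall
    D = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with a deletion
    I = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with an insertion

    # Assumed boundary conditions: a prefix aligned against nothing is one gap.
    S[0][0] = 0
    for i in range(1, n + 1):
        D[i][0] = -x - i * y
        S[i][0] = D[i][0]
    for j in range(1, m + 1):
        I[0][j] = -x - j * y
        S[0][j] = I[0][j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend an existing gap (cost y) or open a new one (cost x + y).
            D[i][j] = max(D[i - 1][j] - y, S[i - 1][j] - x - y)
            I[i][j] = max(I[i][j - 1] - y, S[i][j - 1] - x - y)
            w = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + w, D[i][j], I[i][j])
    return S[n][m]


print(affine_global_score("ACGT", "AT"))
print(affine_global_score("CAATTGA", "GAATCTGC"))
```

Aligning ACGT with AT illustrates the effect of the gap-open charge: the two matches score 16 and the single gap of length 2 costs 4 + 2·3 = 10, for a total of 6, whereas two gap symbols without the opening charge would cost only 6.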