Simple and fast linear space computation of Longest common subsequences Claus Rick, 1999
Dec 21, 2015
What is the LCS problem?
A A B A C
A B C
Finding a sequence of greatest possible length that can be obtained from both A and B by deleting zero or more (not necessarily adjacent) symbols.
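As a baseline for the definition above, here is a minimal sketch of the textbook dynamic-programming computation of the LCS length. The function name `lcs_length` is an assumption chosen for illustration; this is the classic O(mn) method, not the contour algorithm presented in this talk.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b.

    Classic dynamic programming: O(len(a) * len(b)) time, and only two
    rows of the table are kept, so O(len(b)) space.
    """
    prev = [0] * (len(b) + 1)          # row for the empty prefix of a
    for x in a:
        cur = [0]                      # column 0: empty prefix of b
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


print(lcs_length("AABAC", "ABC"))  # 3
```

This only returns the length; recovering an actual subsequence in linear space is what the divide and conquer scheme is for.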
Some boring history…
Year  Author              Time                           Constants      Paradigm
1975  Hirschberg          O(mn)                          2              Dyn. Prog.
1985  Apostolico, Guerra  O(m lg m + pm)                 [2, lg m]      Contours
1986  Myers               O(n(n-p))                      2              Shortest path
1987  Kumar, Rangan       O(n(m-p))                      3              Contours
1990  Wu et al.           O(n(m-p))                      2              Shortest path
1992  Apostolico et al.   O(n(m-p))                      3              Contours
1992  Apostolico et al.   O(pm)                          3              Contours
1999  Goeman, Clausen     O(min(pm, m lg m + p(n-p)))    [5, 25, lg m]  Contours
1999  This article        O(min(pm, p(n-p)))             2              Contours
Pre-Info
Divide and conquer Midpoint
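The divide and conquer idea can be sketched with the classic scheme credited to Hirschberg (1975): split A in the middle, find a column of B where an LCS can cross (the midpoint), and recurse on both halves. The names `lcs_row` and `lcs_linear_space` are assumptions for illustration, and the midpoint here is found by plain dynamic programming rather than by the contours of this talk. String slicing is used for readability; an implementation that cares about strict linear space would pass index ranges instead.

```python
def lcs_row(a: str, b: str) -> list[int]:
    """Last row of the LCS table: entry j is LCS length of a and b[:j]."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev


def lcs_linear_space(a: str, b: str) -> str:
    """Return one LCS of a and b by divide and conquer on a."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    n = len(b)
    left = lcs_row(a[:mid], b)                    # first half vs prefixes of b
    right = lcs_row(a[mid:][::-1], b[::-1])       # second half vs suffixes of b
    # Choose the split of b that maximises the combined length.
    split = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    return (lcs_linear_space(a[:mid], b[:split])
            + lcs_linear_space(a[mid:], b[split:]))


print(lcs_linear_space("AABAC", "ABC"))  # ABC
```

Each level of the recursion only needs two table rows at a time, which is where the space saving comes from.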
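Some basic terms
Ordered pair (i,j): position i of A paired with position j of B.
Example with A = AABAC and B = ABC: (2,3) = (A,C).
Match: an ordered pair (i,j) whose two symbols are equal. Example: (3,2) = (B,B).
Chain: a sequence of matches that is strictly increasing in both components. Example: (1,1), (3,2), (5,3).
Rank k: a match has rank k if the longest chain ending at it has length k. Example: (3,2) has rank 2.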
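The terms match and rank can be made concrete with a short sketch. It uses the 1-based positions of the slides; the function name `forward_ranks` is an assumption for illustration, and the full O(mn) table is used for clarity only.

```python
def forward_ranks(a: str, b: str) -> dict[tuple[int, int], int]:
    """Map every match (i, j) to its rank.

    The rank of a match is the length of the longest chain ending at it,
    which equals the LCS length of the prefixes a[:i] and b[:j].
    Positions are 1-based.
    """
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    rank = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                rank[(i, j)] = table[i][j]
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return rank


print(forward_ranks("AABAC", "ABC"))
# {(1, 1): 1, (2, 1): 1, (3, 2): 2, (4, 1): 1, (5, 3): 3}
```

The keys of the returned dictionary are exactly the matches, and the largest rank is the length of an LCS.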
Some basic terms
Matching Matrix
[Figure: matching matrix of the strings cbabbacac and abacbcba]
Some basic terms
Dominant matches
All upper-left matches in each rank.
[Figure: matching matrix of cbabbacac and abacbcba with the dominant matches of ranks 1 to 5 marked]
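A minimal sketch of picking out the dominant matches, assuming "upper-left" means that no other match of the same rank has both a row and a column that are no larger. The names `forward_ranks` and `dominant_matches` are hypothetical, and the quadratic filtering is for clarity, not speed.

```python
from collections import defaultdict


def forward_ranks(a: str, b: str) -> dict[tuple[int, int], int]:
    """Map every match (i, j), 1-based, to the length of the longest chain ending at it."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    rank = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                rank[(i, j)] = table[i][j]
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return rank


def dominant_matches(a: str, b: str) -> dict[int, list[tuple[int, int]]]:
    """Map each rank k to the dominant (upper-left) matches of that rank."""
    by_rank = defaultdict(list)
    for match, k in forward_ranks(a, b).items():
        by_rank[k].append(match)
    dominant = {}
    for k in sorted(by_rank):
        group = by_rank[k]
        # Keep (i, j) only if no other match of the same rank lies weakly up-left of it.
        dominant[k] = sorted(
            (i, j) for (i, j) in group
            if not any((x, y) != (i, j) and x <= i and y <= j for (x, y) in group)
        )
    return dominant


print(dominant_matches("AABAC", "ABC"))
# {1: [(1, 1)], 2: [(3, 2)], 3: [(5, 3)]}
```

In the small example the three rank-1 matches (1,1), (2,1) and (4,1) collapse to the single dominant match (1,1).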
Some last basic terms
FC_k: the kth forward contour
BC_k: the kth backward contour
[Figure: forward contours (FC) 1 to 5 on the matching matrix of cbabbacac and abacbcba]
[Figure: backward contours (BC) 1 to 5 on the same matrix]
Lemma 1
Let p be the length of an LCS between strings A and B. Then for every match (i,j) the following holds:
• There is an LCS containing (i,j) if and only if, for some k, (i,j) is on the kth forward contour and on the (p-k+1)st backward contour.
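Lemma 1 - proof
[Figure: a chain of length k ending at (i,j), |FC| = k, joined with a chain of length p-k+1 starting at (i,j), |BC| = p-k+1]
The two chains share the match (i,j), so together they form a chain of length k + (p-k+1) - 1 = p. If the backward chain is shorter than p-k+1, the total is less than p.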
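Lemma 1 can be checked on small inputs with a sketch like the following. It assumes that being on the kth forward contour means the longest chain ending at the match has length k, and that being on the kth backward contour means the longest chain starting at it has length k. All function names are hypothetical.

```python
def forward_ranks(a: str, b: str) -> dict[tuple[int, int], int]:
    """Map every match (i, j), 1-based, to the length of the longest chain ending at it."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    rank = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                rank[(i, j)] = table[i][j]
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return rank


def backward_ranks(a: str, b: str) -> dict[tuple[int, int], int]:
    """Map every match to the length of the longest chain starting at it."""
    m, n = len(a), len(b)
    # Reverse both strings, rank forwards, then map positions back.
    reversed_ranks = forward_ranks(a[::-1], b[::-1])
    return {(m + 1 - i, n + 1 - j): k for (i, j), k in reversed_ranks.items()}


def matches_on_some_lcs(a: str, b: str) -> list[tuple[int, int]]:
    """Matches with forward rank k and backward rank p - k + 1 for some k."""
    fwd = forward_ranks(a, b)
    bwd = backward_ranks(a, b)
    p = max(fwd.values(), default=0)
    return sorted(mt for mt in fwd if fwd[mt] + bwd[mt] - 1 == p)


print(matches_on_some_lcs("AABAC", "ABC"))
# [(1, 1), (2, 1), (3, 2), (5, 3)]
```

The match (4,1) is excluded: it has forward rank 1 but backward rank 2, so the longest chain through it has length 2, not 3.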
Start calculating
FC_1, BC_1, FC_2, BC_2, …
Sooner or later…
Really really last terms
Define sets M_i as:
M_0 = M
M_1 = M_0 \ FC_1
M_2 = M_1 \ BC_1
…
M_{2i-1} = M_{2(i-1)} \ FC_i
M_{2i} = M_{2i-1} \ BC_i
[Figure: the set M of all matches on the matching matrix of cbabbacac and abacbcba, followed by the successively smaller sets M_1, M_2, M_3, M_4, M_5]
Let us call the first empty set M_i: M_p'.
Lemma 2
The length of an LCS is p', and each match in M_{p'-1} is a possible midpoint.
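The alternating removal of forward and backward contours can be tried out with a brute-force sketch. It assumes FC_i is the set of matches whose longest ending chain has length i and BC_i the set whose longest starting chain has length i; the names `peel`, `forward_ranks` and `backward_ranks` are hypothetical. It stores every set explicitly, which is fine for a demonstration only.

```python
def forward_ranks(a: str, b: str) -> dict[tuple[int, int], int]:
    """Map every match (i, j), 1-based, to the length of the longest chain ending at it."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    rank = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                rank[(i, j)] = table[i][j]
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return rank


def backward_ranks(a: str, b: str) -> dict[tuple[int, int], int]:
    """Map every match to the length of the longest chain starting at it."""
    m, n = len(a), len(b)
    reversed_ranks = forward_ranks(a[::-1], b[::-1])
    return {(m + 1 - i, n + 1 - j): k for (i, j), k in reversed_ranks.items()}


def peel(a: str, b: str) -> list[set[tuple[int, int]]]:
    """Return [M_0, M_1, ...] up to and including the first empty set."""
    fwd = forward_ranks(a, b)
    bwd = backward_ranks(a, b)
    sets = [set(fwd)]                      # M_0 = M, the set of all matches
    while sets[-1]:
        t = len(sets)                      # index of the set built next
        i = (t + 1) // 2                   # contour number used at this step
        rank = fwd if t % 2 == 1 else bwd  # odd t removes FC_i, even t removes BC_i
        sets.append({mt for mt in sets[-1] if rank[mt] != i})
    return sets


sets = peel("AABAC", "ABC")
p_prime = len(sets) - 1
print(p_prime)              # 3
print(sets[p_prime - 1])    # {(3, 2)}
```

For A = AABAC and B = ABC the first empty set is M_3, matching the LCS length 3, and the single match left in M_2 is (3,2), which lies on the LCS.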
Lemma 2 - proof
[Figure: the sets M_0, M_1, M_2, …, M_{k-1}, M_k, whose longest chains have lengths k, k-1, k-2, …, 1, 0; hence k = p]
Little problem…
We can't keep track of every set: that is very expensive.
[Figure: the sets of matches on the matching matrix of cbabbacac and abacbcba]
What do we do?
Keep only dominant matches…
When we see a dominant match below: done.
[Figure: dominant matches of the forward and backward contours on the matching matrix of cbabbacac and abacbcba]
Let's define:
FC_f' and BC_b': the contours with the minimal indices as stated above.
Lemma 3
The length of an LCS is b' + f' - 1.
Complexity
Finding the dominant matches of each contour: O(min(m, n-p))
Number of contours: p
Total: O(min(pm, p(n-p)))
The End
Simple and fast linear space computation of longest common subsequence
Written by: Claus Rick, 1999
Based on an algorithm by: D. Hirschberg, 1975
Cast:
Matrices, Lines
Arrows, Squares
Blue, Red
Brown, Grey, Black
String A, String B
Presentation: Uri Scheiner
No Dominant Matches were harmed during the making of this presentation
Appendix
What is the LCS
Divide and Conquer
Match
Chain
Dominant Matches
FC
BC
Lemma 1
Define M…
Lemma 2
Keep just Dominant…
Lemma 3
Complexity