Simple and fast linear space computation of Longest common subsequences Claus Rick, 1999

Dec 21, 2015

Page 1: Simple and fast linear space computation of Longest common subsequences Claus Rick, 1999.

Simple and fast linear space computation of Longest common subsequences

Claus Rick, 1999

Page 2

What is the LCS problem?

A A B A C

A B C

Finding a sequence of greatest possible length that can be obtained from both A and B by deleting zero or more (not necessarily adjacent) symbols.
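To make the definition concrete, here is a minimal sketch of the textbook quadratic-space dynamic program for the LCS. It is not the linear-space method the talk is about; the function name `lcs` is an illustrative choice, not something from the paper.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook DP)."""
    m, n = len(a), len(b)
    # length[i][j] = length of an LCS of a[:i] and b[:j]
    length = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])

    # Walk back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AABAC", "ABC"))  # ABC
```

For the two strings on this slide the result is "ABC": delete the second and fourth symbols of AABAC, and nothing from ABC.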

Page 3

Some boring history…

Year  Author               Time                             Constants    Paradigm
1975  Hirschberg           O(mn)                            2            Dyn. Prog.
1985  Apostolico, Guerra   O(m lg m + pm)                   [2, log m]   contours
1986  Myers                O(n(n-p))                        2            Shortest path
1987  Kumar, Rangan        O(n(m-p))                        3            contours
1990  Wu et al.            O(n(m-p))                        2            Shortest path
1992  Apostolico et al.    O(n(m-p))                        3            contours
1992  Apostolico et al.    O(pm)                            3            contours
1999  Goeman, Clausen      O(min(pm, m lg m + p(n-p)))      [5,25,lgM]   contours
1999  This article         O(min(pm, p(n-p)))               2            contours

Page 4

Pre-Info

Divide and conquer Midpoint

Page 5

Some basic terms

Ordered Pair (i,j)

A A B A C

A B C

(2,3) = (A,C)

Page 6

Some basic terms

Match

A A B A C

A B C

Page 7

Some basic terms

Chain

A A B A C

A B C

Page 8

Rank k

A A B A C

A B C

Page 9

Some basic terms

Matching Matrix

[Figure: matching matrix of the strings cbabbacac and abacbcba]

Page 10

Some basic terms

Dominant matches

All upper-left matches in each rank
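A minimal sketch of how the dominant matches could be listed by brute force. It assumes the usual reading of the terms: the rank of a match (i,j) is the length of the longest chain ending at it, and a match of rank k is "upper-left" (dominant) if no other match of rank k lies weakly above and to the left of it. The function name `dominant_matches` is illustrative, and this quadratic table is only for checking small examples, not the method of the paper.

```python
from collections import defaultdict


def dominant_matches(a: str, b: str) -> dict[int, list[tuple[int, int]]]:
    """Map each rank k to its dominant matches (1-indexed positions)."""
    m, n = len(a), len(b)
    # length[i][j] = LCS length of a[:i] and b[:j]; at a match this is its rank.
    length = [[0] * (n + 1) for _ in range(m + 1)]
    by_rank = defaultdict(list)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
                by_rank[length[i][j]].append((i, j))
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])

    dominant = {}
    for k, pts in by_rank.items():
        # Keep (i, j) only if no other rank-k match is weakly above and left of it.
        dominant[k] = [
            (i, j)
            for (i, j) in pts
            if not any((x, y) != (i, j) and x <= i and y <= j for (x, y) in pts)
        ]
    return dominant


print(dominant_matches("AABAC", "ABC"))
```

For AABAC and ABC the three rank-1 matches (1,1), (2,1), (4,1) collapse to the single dominant match (1,1); ranks 2 and 3 each contain one match, (3,2) and (5,3).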

Page 11

Dominant matches

[Figure: matching matrix of the strings cbabbacac and abacbcba, with the dominant matches labeled by rank 1 to 5]

Page 12

A A B A C

A B C

Page 13

[Figure: matching matrix of the strings cbabbacac and abacbcba]

Page 14

Backward contours (BC)

[Figure: matching matrix of the strings cbabbacac and abacbcba, with backward contours 1 to 5]

Page 15

Some last basic terms

FCk

BCk

Page 16

Forward contours (FC)

[Figure: matching matrix of the strings cbabbacac and abacbcba, with forward contours 1 to 5]

Page 17

Backward contours (BC)

[Figure: matching matrix of the strings cbabbacac and abacbcba, with backward contours 1 to 5]

Page 18

Lemma 1

Let p be the length of an LCS of strings A and B. Then for every match (i,j) the following holds:

• There is an LCS containing (i,j) if and only if (i,j) is on the kth forward contour and on the (p-k+1)st backward contour.
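Lemma 1 can be checked on small inputs with a brute-force sketch. It assumes that "on the kth forward contour" means the longest chain ending at the match has length k, and "on the jth backward contour" means the longest chain starting at the match has length j. The names `chain_ranks` and `matches_on_some_lcs` are illustrative, not from the paper.

```python
def chain_ranks(a: str, b: str) -> tuple[dict[tuple[int, int], int], int]:
    """Return ({match (i, j): length of longest chain ending there}, LCS length)."""
    m, n = len(a), len(b)
    length = [[0] * (n + 1) for _ in range(m + 1)]
    ranks = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
                ranks[(i, j)] = length[i][j]
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])
    return ranks, length[m][n]


def matches_on_some_lcs(a: str, b: str) -> set[tuple[int, int]]:
    """Matches with forward rank k and backward rank p - k + 1 for some k."""
    m, n = len(a), len(b)
    forward, p = chain_ranks(a, b)
    # Backward ranks are forward ranks of the reversed strings, with indices mirrored.
    reversed_ranks, _ = chain_ranks(a[::-1], b[::-1])
    result = set()
    for (i, j), k in forward.items():
        backward = reversed_ranks[(m + 1 - i, n + 1 - j)]
        if backward == p - k + 1:
            result.add((i, j))
    return result


print(sorted(matches_on_some_lcs("AABAC", "ABC")))
```

For AABAC and ABC the match (4,1) is excluded: its forward rank is 1 and its backward rank is 2, so the longest chain through it has length 1 + 2 - 1 = 2, which is less than p = 3.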

Page 19

Lemma 1- proof

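[Figure: a match (i,j) with |FC| = k and |BC| = p-k+1. A longest chain ending at (i,j) joined with a longest chain starting at (i,j) has length k + (p-k+1) - 1 = p; if either rank is smaller, every chain through (i,j) is shorter than p.]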

Page 20

Start calculating

FC1 BC1 FC2 BC2

Sooner or later…

Page 21

Really really last terms

Define sets Mi as:

M0 = M

M1 = M0 \ FC1

M2 = M1 \ BC1

M(2i-1) = M(2(i-1)) \ FCi

M(2i) = M(2i-1) \ BCi

Page 22

[Figure: matching matrices of the strings cbabbacac and abacbcba showing the set M of all matches]

Page 23

[Figure: matching matrices of the strings cbabbacac and abacbcba showing the sets M1, M2, M3, M4, M5]

Page 24

Let us call the first empty Mi:

Mp’

Page 25

Lemma 2

The length of an LCS is p’, and each match in M(p’-1) is a possible midpoint.
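The alternating removal of contours behind Lemma 2 can be sketched by brute force. This assumes FCi is the set of matches whose longest chain ending there has length i, and BCi the set of matches whose longest chain starting there has length i. It keeps every set Mi explicitly, which is exactly the expensive bookkeeping the actual algorithm avoids; `chain_ranks` and `peel_contours` are illustrative names.

```python
def chain_ranks(a: str, b: str) -> dict[tuple[int, int], int]:
    """Map each match (i, j), 1-indexed, to the length of the longest chain ending there."""
    m, n = len(a), len(b)
    length = [[0] * (n + 1) for _ in range(m + 1)]
    ranks = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
                ranks[(i, j)] = length[i][j]
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])
    return ranks


def peel_contours(a: str, b: str) -> list[set[tuple[int, int]]]:
    """Return [M0, M1, ..., Mp'], stopping at the first empty set."""
    m, n = len(a), len(b)
    forward = chain_ranks(a, b)
    mirrored = chain_ranks(a[::-1], b[::-1])
    backward = {(i, j): mirrored[(m + 1 - i, n + 1 - j)] for (i, j) in forward}

    sets = [set(forward)]  # M0 = M, the set of all matches
    k = 1
    while sets[-1]:
        # M(2k-1) = M(2(k-1)) \ FCk
        sets.append({match for match in sets[-1] if forward[match] != k})
        if not sets[-1]:
            break
        # M(2k) = M(2k-1) \ BCk
        sets.append({match for match in sets[-1] if backward[match] != k})
        k += 1
    return sets


sets = peel_contours("AABAC", "ABC")
p_prime = len(sets) - 1
print(p_prime, sets[p_prime - 1])  # 3 {(3, 2)}
```

For AABAC and ABC the sets shrink from five matches to {(3,2), (5,3)}, then {(3,2)}, then the empty set, so p’ = 3 and the only midpoint candidate is the match on B.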

Page 26

Lemma 2- proof

[Figure: the sequence of sets M0, M1, M2, …, Mk-1, Mk, with labels K, K-1, K-2, …, 1, 0 and K = p]

Page 27

Little problem…

We can't keep track of every set: that is very expensive.

Page 28

[Figure: matching matrices of the strings cbabbacac and abacbcba]

Page 29

What do we do?

Keep only dominant matches…

When we see a dominant match below, we are done.

Page 30

[Figure: matching matrices of the strings cbabbacac and abacbcba]

Page 31

Let's define:

FCf’ and BCb’, where f’ and b’ are the minimal indices as stated above

Page 32

Lemma 3

The length of an LCS is b’ + f’ - 1.

Page 33

Complexity

Finding the dominant matches of each contour:

O(min(m, n-p))

Number of contours:

p

Total:

O(min(pm, p(n-p)))

Page 34

The End

Page 35

Simple and fast linear space computation of longest common subsequence

Written by: Claus Rick, 1999

Based on an algorithm by: D. Hirschberg, 1975

Cast:

Matrices, Lines

Arrows, Squares

Blue, Red

Brown, Grey, Black

String A, String B

Presentation: Uri Scheiner

No Dominant Matches were harmed during the making of this presentation

Page 36

Appendix

What is the LCS

Divide and Conquer

Match

Chain

Dominant Matches

FC

BC

Lemma 1

Define M…

Lemma 2

Keep just Dominant…

Lemma 3

Complexity