Parameterized Pattern Matching by Boyer-Moore-type Algorithms, Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, 1995, pp. 541-550.

1

Parameterized Pattern Matching by Boyer-Moore-type Algorithms

Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, 1995, pp. 541-550

Brenda S. Baker

Advisor: Prof. R. C. T. Lee

Speaker: Kuei-hao Chen

2

Let us consider two strings:

A=a1a2a3a4a5=xaxby

B=b1b2b3b4b5=bacbc

If the edit distance concept is used, A can be transformed into B by substituting a1 with b1, a3 with b3, and a5 with b5.
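
The three substitutions above are exactly the positions at which A and B differ. A minimal Python sketch, assuming equal-length strings and substitutions only (the function name `substitution_positions` is illustrative, not from the paper):

```python
def substitution_positions(a: str, b: str) -> list[int]:
    """Return the 1-based positions where equal-length strings a and b differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


print(substitution_positions("xaxby", "bacbc"))  # [1, 3, 5]
```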

3

In this paper, we define a new transformation in which a character may be substituted by another character. But the substitution is global. That is, if x in A is substituted by a, then every x in A is substituted by a.

4

A=a1a2a3a4a5=xaxby

B=b1b2b3b4b5=bacbc

Consider the above example again. To transform A to B, the first x must be substituted by b. But this is global. Thus,

A’=babby

It can easily be seen that with this kind of substitution, A=xaxby cannot be transformed to B, because the third character of A’ is b while b3=c.

5

For A=xaxby and B=babbc, A can be transformed to B by substituting x by b and y by c.
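
A global substitution can be sketched in Python as follows. The helper name `apply_global_substitution` and the use of a dictionary to hold the substitution are assumptions made for illustration; characters not mentioned in the dictionary are left unchanged.

```python
def apply_global_substitution(s: str, mapping: dict[str, str]) -> str:
    """Replace every occurrence of each key character by its image.

    Characters that are not keys of `mapping` are left unchanged.
    All replacements are applied simultaneously, not one after another.
    """
    return "".join(mapping.get(ch, ch) for ch in s)


print(apply_global_substitution("xaxby", {"x": "b"}))            # babby
print(apply_global_substitution("xaxby", {"x": "b", "y": "c"}))  # babbc
```

The first call reproduces A’=babby, which differs from B=bacbc. The second reproduces the transformation of A=xaxby to B=babbc.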

6

We define a bijection to be a global substitution that maps a set of distinct characters one-to-one onto another set of characters.

A string P p-matches a string Q if P can be transformed to Q by a bijection.

7

Let

A=ababc

B=bcbcd

Then A p-matches B because there is a bijection, namely (a→b, b→c, c→d), which transforms A to B.

8

On the other hand, for A=ababc and B=bcbdc, A does not p-match B.

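It is actually easy to determine whether A p-matches B. Let A=a1a2…aN and B=b1b2…bN. A p-matches B if and only if, for every i and j, ai=aj exactly when bi=bj. In other words, if ai=x and bi=y, then aj=x implies bj=y and, because the substitution is a bijection, bj=y implies aj=x.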

9

For A=ababc and B=bcbcd, it can be seen that every a in A is matched with b, every b with c, and every c with d. This is not true for A=ababc and B=bcbdc, where the first b is matched with c but the second b is matched with d.

Thus, given a string A and a string B which are of the same length, it is trivial to determine whether A p-matches B.
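
The check can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's procedure: the name `p_match` is hypothetical, and the second dictionary enforces that the substitution is one-to-one, on the assumption that a bijection is required.

```python
def p_match(a: str, b: str) -> bool:
    """Return True if some one-to-one global substitution transforms a into b."""
    if len(a) != len(b):
        return False
    forward: dict[str, str] = {}   # character of a -> character of b
    backward: dict[str, str] = {}  # character of b -> character of a
    for x, y in zip(a, b):
        # setdefault records the pair on first sight and returns the
        # previously recorded partner otherwise.
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


print(p_match("ababc", "bcbcd"))  # True
print(p_match("ababc", "bcbdc"))  # False
```

The check makes a single pass over the two strings, so it runs in time proportional to their length.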

10

There is another important property: p-matching is transitive. If A p-matches B and B p-matches C, then A p-matches C, because the composition of two bijections is again a bijection.

11

This paper considers the following problem:

Given a text T and a pattern P, find all occurrences where P p-matches a substring of T.

For example, let

T=abcadbcbdabccacbd

and

P=abaec

We can see that P p-matches two substrings of T: S1=T6,10=bcbda and S2=T13,17=cacbd.
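
For reference, the problem can be solved by brute force, testing every window of T with a direct p-match check. This sketch is only a baseline under the assumption that the substitution must be one-to-one; it is not the Boyer-Moore-type algorithm of the paper, and the names `p_match` and `p_match_occurrences` are hypothetical.

```python
def p_match(a: str, b: str) -> bool:
    """Return True if some one-to-one global substitution transforms a into b."""
    if len(a) != len(b):
        return False
    forward: dict[str, str] = {}
    backward: dict[str, str] = {}
    for x, y in zip(a, b):
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


def p_match_occurrences(text: str, pattern: str) -> list[int]:
    """Return the 0-based start of every window of text that pattern p-matches."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if p_match(pattern, text[i:i + m])]


print(p_match_occurrences("abcadbcbdabccacbd", "abaec"))  # [5, 12]
```

The 0-based starts 5 and 12 correspond to the substrings bcbda (S1) and cacbd (S2). The running time is proportional to the product of the two lengths, which is the cost a Boyer-Moore-type method tries to avoid.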

12

For P=abaec and S2=cacbd, the substitution (a→c, b→a, c→d, e→b) transforms P to S2.

For S2=cacbd and S1=bcbda, the substitution (a→c, b→d, c→b, d→a) transforms S2 to S1.

It can be seen that P=abaec is transformed to S1=bcbda by (a→b, b→c, c→a, e→d).
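
The three substitutions are related by composition: applying the substitution from P to S2 and then the one from S2 to S1 gives the substitution from P to S1. A small Python sketch that recovers each substitution from a pair of strings and composes them (the helper names `substitution_between` and `compose` are hypothetical):

```python
def substitution_between(a: str, b: str) -> dict[str, str]:
    """Return the one-to-one substitution sending a to b, position by position.

    Raises ValueError if a does not p-match b.
    """
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    forward: dict[str, str] = {}
    backward: dict[str, str] = {}
    for x, y in zip(a, b):
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            raise ValueError(f"{a!r} does not p-match {b!r}")
    return forward


def compose(first: dict[str, str], second: dict[str, str]) -> dict[str, str]:
    """Return the substitution that applies `first` and then `second`."""
    return {x: second[y] for x, y in first.items()}


p_to_s2 = substitution_between("abaec", "cacbd")
s2_to_s1 = substitution_between("cacbd", "bcbda")
print(compose(p_to_s2, s2_to_s1) == substitution_between("abaec", "bcbda"))  # True
```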

13

The substitution can be visualized as follows:

S1 S2T

P

14

The algorithm in this paper is based on Good suffix rule 1 and Good suffix rule 2 of the Boyer-Moore algorithm.

15

Good Suffix Rule 1 for p-match

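Let T1 be the longest suffix of the window that p-matches a suffix P1 of P, and let x and y be the characters immediately preceding T1 in T and P1 in P. If zP2 is the rightmost substring of P that p-matches yP1, with z≠y, we can move P as follows:

[Figure: before the shift, the window of T ends with xT1 and P contains zP2 to the left of its suffix yP1. After the shift, P has moved to the right so that P2 is aligned with T1.]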

16

Example

Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
T: v v x x v v v v x x w v v w w
P: u u u v v v w w v v (labelled 1 to 10)
P’: u u u x x x v v x x (P after Transform)

A p-mismatch occurs, so P is shifted:

P: u u u v v v w w v v (labelled 1 to 10)
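
The right-to-left comparison that detects the p-mismatch can be sketched as below. The sketch assumes that the window is T1,10 (the first alignment of P) and that substitutions must be one-to-one; `longest_p_matching_suffix` is a hypothetical name.

```python
def longest_p_matching_suffix(window: str, pattern: str) -> int:
    """Return the length of the longest suffix of pattern that p-matches
    the equally long suffix of window, scanning from right to left."""
    if len(window) != len(pattern):
        raise ValueError("window and pattern must have the same length")
    forward: dict[str, str] = {}   # pattern character -> text character
    backward: dict[str, str] = {}  # text character -> pattern character
    length = 0
    for x, y in zip(reversed(pattern), reversed(window)):
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            break  # p-mismatch: the pairing is no longer one-to-one
        length += 1
    return length


print(longest_p_matching_suffix("vvxxvvvvxx", "uuuvvvwwvv"))  # 4
```

With these inputs the last four characters of P (w w v v) p-match the last four characters of the window (v v x x), and the scan stops at the fifth character from the right, where v would have to map to both x and v.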

17

Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
T: v v x x v v v v x x w v v w w
P: u u u v v v w w v v (labelled 1 to 10)
P’: v v v x x w v v w w (P after Transform)

After moving, we compare T and P from right to left and find that T6,15≡P1,10.

18

Good Suffix Rule 2 for p-match

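[Figure: the window of T ends with xT1 and P ends with yP1. T1’ and P1’ are the corresponding suffixes of T1 and P1, and P2’ is a prefix of P.]

Let T1 be the longest suffix of the window that p-matches a suffix P1 of P.

Let P1’ be a suffix of P1 that p-matches a prefix P2’ of P. If P1’ exists, we move P as follows:

[Figure: after the shift, the prefix P2’ of P is aligned with T1’, the suffix of T1.]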

19

Example

Position: 1 2 3 4 5 6 7 8 9 10 11 12 13
T: x x v v v v x x w v v w w
P: u v v v w w v v (labelled 1 to 8)
P’: u x x x v v x x (P after Transform)

A p-mismatch occurs, so P is shifted:

P: u v v v w w v v (labelled 3 to 10)

20

Position: 1 2 3 4 5 6 7 8 9 10 11 12 13
T: x x v v v v x x w v v w w
P: u v v v w w v v (labelled 3 to 10)
P’: u x x x v v x x (P after Transform)

21

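The shift function ∆ is defined by taking a min and a max over the shifts allowed by Good suffix rule 1 and Good suffix rule 2.

[Equation: definition of ∆ for pattern positions j and j’ in 1…m, in terms of Good suffix rule 1 and Good suffix rule 2.]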

22

Example

Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
T: G A T C G A T C A A T C A T A T C A T C A T
P: A T C A C A T C A T C A (labelled 1 to 12)
P’: C A T C T C A T C A T C (P after Transform: T→A, C→T, A→C)

p-mismatch (j’=7, j=9)

Shift

P: A T C A C A T C A T C A (labelled 1 to 12)
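
The Transform step can be sketched as follows, assuming the window is T1,12 and that the last three characters of P have already been p-matched against the window. The substitution is read off the matched suffix, applied to all of P to obtain P’, and P’ is then compared with the window from right to left. Both function names are hypothetical.

```python
def transform_from_suffix(window: str, pattern: str, k: int) -> str:
    """Apply to the whole pattern the substitution defined by its last k characters.

    Assumes the last k characters of `pattern` p-match the last k characters
    of `window`. Pattern characters that do not occur in that suffix are
    left unchanged.
    """
    m = len(pattern)
    mapping = dict(zip(pattern[m - k:], window[len(window) - k:]))
    return "".join(mapping.get(ch, ch) for ch in pattern)


def rightmost_mismatch(window: str, transformed: str) -> int:
    """Return the 1-based position of the first mismatch found when scanning
    from right to left, or 0 if the two strings are equal."""
    for j in range(len(transformed), 0, -1):
        if window[j - 1] != transformed[j - 1]:
            return j
    return 0


window = "GATCGATCAATC"   # assumed window: T1..T12
pattern = "ATCACATCATCA"

p_prime = transform_from_suffix(window, pattern, 3)
print(p_prime)                              # CATCTCATCATC
print(rightmost_mismatch(window, p_prime))  # 9
```

The mismatch position 9 agrees with the label j=9 in the example.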

23

Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
T: G A T C G A T C A A T C A T A T C A T C A T
P: A T C A C A T C A T C A (labelled 1 to 12)
P’: C A T C T C A T C A T C (P after Transform: T→A, C→T, A→C)

p-mismatch (j’=7, j=9)

Shift

P: A T C A C A T C A T C A (labelled 1 to 12)

24

Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
T: G A T C G A T C A A T C A T A T C A T C A T
P: A T C A C A T C A T C A (labelled 1 to 12)
P’: T C A T A T C A T C A T (P after Transform: T→C, C→A, A→T)

25

Time Complexity

• In the average case, the preprocessing phase takes O(m log min(m, Π)) time with O(n) space complexity, and the searching phase takes O(n log min(m, Π)) time, where m is the length of P and n is the length of T.

29

THANK YOU
