Accelerating Boyer Moore Searches on Binary Texts
Shmuel Tomi Klein, Miri Kopel Ben-Nissan
Bar Ilan University, ISRAEL

Jan 05, 2016


Transcript
Page 1: Boyer Moore Searches  on Binary Texts

Accelerating Boyer Moore Searches on Binary Texts

Shmuel Tomi Klein, Miri Kopel Ben-Nissan

Bar Ilan University, ISRAEL

Page 2: Boyer Moore Searches  on Binary Texts

Outline

Background and motivation

Boyer Moore algorithm

New binary variant

Analysis

Experiments

Summary

Page 3: Boyer Moore Searches  on Binary Texts

Important application of Automata:

PATTERN MATCHING

KMP BDM BM

Boyer & Moore

this-is-a-sample-text---

pattern

Match backwards!

Page 4: Boyer Moore Searches  on Binary Texts

Boyer – Moore Algorithm

Mismatch – case 1: delta1

b does not occur in x

[Figure: text y holds b followed by the matched suffix u; pattern x holds a followed by u, with a ≠ b. x contains no b, so x is shifted entirely past b.]

Page 5: Boyer Moore Searches  on Binary Texts

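Boyer – Moore Algorithm

Mismatch – case 2: delta1

b occurs in x

[Figure: text y holds b followed by the matched suffix u; pattern x holds a followed by u. The part of x to the right of its last b contains no b, so x is shifted to align that b with the b in y.]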
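Cases 1 and 2 can be folded into one lookup table. The following is a minimal sketch, assuming delta1 is expressed as in the original Boyer-Moore formulation (the distance from the last occurrence of a character to the end of the pattern, or the pattern length m if the character is absent). The function names `delta1_table` and `delta1` are illustrative and do not come from the slides.

```python
def delta1_table(pattern):
    """Distance from the last occurrence of each pattern character to the pattern's end."""
    m = len(pattern)
    table = {}
    for pos, ch in enumerate(pattern):
        table[ch] = m - 1 - pos  # later occurrences overwrite earlier ones
    return table


def delta1(table, ch, m):
    """Case 2 if ch occurs in the pattern, case 1 (value m) otherwise."""
    return table.get(ch, m)


table = delta1_table("example")
print(table)                  # per-character values
print(delta1(table, "s", 7))  # a character that does not occur in the pattern
```

A character that never occurs in the pattern gets the largest value m, which is what makes delta1 effective on large alphabets.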

Page 6: Boyer Moore Searches  on Binary Texts

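Boyer – Moore Algorithm

Mismatch – case 3: delta2

u reoccurs in x preceded by c ≠ a

[Figure: text y holds b followed by the matched suffix u; pattern x ends in au and holds an earlier occurrence cu. x is shifted to align cu with the u matched in y.]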

Page 7: Boyer Moore Searches  on Binary Texts

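Boyer – Moore Algorithm

Mismatch – case 4: delta2

Only a suffix v of u reoccurs in x

[Figure: text y holds b followed by the matched suffix u; pattern x ends in au and begins with v. x is shifted to align its leading v with the end of u in y.]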
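Cases 3 and 4 can be computed together by brute force: for each mismatch position, try shifts in increasing order and keep the first one that is consistent with the matched suffix. The sketch below is a quadratic-per-position illustration, not the linear-time preprocessing used in practice. It returns shift sizes; the increase of the text pointer is the shift plus the number of matched characters. The name `good_suffix_shifts` is hypothetical.

```python
def good_suffix_shifts(pattern):
    """shifts[j] = smallest valid shift after a mismatch at 0-based position j."""
    m = len(pattern)
    shifts = []
    for j in range(m):
        for s in range(1, m + 1):
            # The matched suffix pattern[j+1:] must agree with the shifted
            # pattern wherever the two overlap (cases 3 and 4).
            suffix_ok = all(
                k - s < 0 or pattern[k - s] == pattern[k] for k in range(j + 1, m)
            )
            # The character moved under the mismatch must differ (c != a).
            differs = j - s < 0 or pattern[j - s] != pattern[j]
            if suffix_ok and differs:
                shifts.append(s)
                break  # s = m always qualifies, so the loop terminates
    return shifts


shifts = good_suffix_shifts("example")
m = len("example")
# Text-pointer increase: shift plus the number of characters already matched.
print([s + (m - 1 - j) for j, s in enumerate(shifts)])
```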

Page 8: Boyer Moore Searches  on Binary Texts

Boyer – Moore Example

Pattern: example

delta1

char      a   e   l   m   p   x   rest
delta1    4   0   1   3   2   5   7

delta2

Pat[j]    e   x   a   m   p   l   e
delta2    12  11  10  9   8   7   1

here is a simple example
example                       mismatch at s
       example                mismatch at p
         example              mismatch at i
               example        mismatch at p
                 example      match
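The trace on this slide can be replayed with a short search loop. This is a sketch that hardcodes the delta1 and delta2 tables shown for the pattern "example" rather than computing them; the function name `bm_search` and its return shape are assumptions made for illustration.

```python
# Tables copied from the slide for the pattern "example".
DELTA1 = {"a": 4, "e": 0, "l": 1, "m": 3, "p": 2, "x": 5}  # any other character: 7
DELTA2 = [12, 11, 10, 9, 8, 7, 1]                          # pattern positions 1..7


def bm_search(text, pattern, delta1, delta2):
    """Return (start of first match or -1, start positions of all alignments tried)."""
    m = len(pattern)
    tried = []
    i = m - 1  # text index facing the last pattern character
    while i < len(text):
        tried.append(i - (m - 1))
        j = m - 1
        while j >= 0 and text[i] == pattern[j]:  # match backwards
            i -= 1
            j -= 1
        if j < 0:
            return i + 1, tried
        # Advance the text pointer by the larger of the two table values.
        i += max(delta1.get(text[i], m), delta2[j])
    return -1, tried


print(bm_search("here is a simple example", "example", DELTA1, DELTA2))
```

The pattern is tried at five alignments, and only the third one compares more than a single character before the mismatch.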

Page 9: Boyer Moore Searches  on Binary Texts

Problems of Binary Boyer & Moore

delta1 useless on a binary alphabet

0100101101011101000100110101001

1101100

most work by delta1 on character text

this-is-a-sample-text---

pattern

Bit-level processing
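To see why delta1 loses its value on bits, compare the tables for the two patterns on this slide. The sketch assumes the classical definition of delta1 (distance from a character's last occurrence to the end of the pattern); the helper name `last_occurrence_distances` is hypothetical.

```python
def last_occurrence_distances(pattern):
    """delta1 values for the characters that occur in the pattern."""
    return {ch: len(pattern) - 1 - pattern.rindex(ch) for ch in set(pattern)}


ascii_table = last_occurrence_distances("pattern")
binary_table = last_occurrence_distances("1101100")

print(sorted(ascii_table.items()))   # characters outside the pattern get the full length 7
print(sorted(binary_table.items()))  # both bits occur, so no bit ever gets the full length
```

In the character text most positions hold a character that is absent from the pattern, so the full shift of 7 is common. In the binary text every character is a 0 or a 1, and both occur near the end of the pattern, so delta1 can never propose more than a couple of bits.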

Page 10: Boyer Moore Searches  on Binary Texts

Need for Binary Boyer & Moore

Compressed Matching

Given E(T) and P look for E(P) in E(T)

rather than P in D(E(T))

Suggested Solution:

BBBMM (Blocked Binary Boyer Moore Matching)
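The idea of searching for E(P) in E(T) can be illustrated with a toy prefix code. In this sketch the code table `CODE` is hypothetical, and Python's `str.find` on a string of '0' and '1' characters stands in for the binary search algorithm. It also records whether each hit starts on a codeword boundary, since a bit-level match inside the encoded text need not correspond to a real occurrence of P.

```python
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}  # hypothetical prefix code


def encode(s, code=CODE):
    return "".join(code[ch] for ch in s)


def codeword_starts(s, code=CODE):
    """Bit offsets at which the codewords of the encoded string begin."""
    starts, pos = set(), 0
    for ch in s:
        starts.add(pos)
        pos += len(code[ch])
    return starts


def compressed_find(text, pattern, code=CODE):
    """All bit offsets of E(P) in E(T), each flagged as aligned or not."""
    et, ep = encode(text, code), encode(pattern, code)
    starts = codeword_starts(text, code)
    hits, pos = [], et.find(ep)
    while pos != -1:
        hits.append((pos, pos in starts))
        pos = et.find(ep, pos + 1)
    return hits


print(compressed_find("cb", "b"))
```

In this toy case the codeword for b also appears inside the codeword for c, so one of the two hits is not aligned with a codeword boundary.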

Page 11: Boyer Moore Searches  on Binary Texts

BBBMM

[Figure: text blocks Text[i] of k bits each aligned against pattern blocks Pat[sh, j]; labels: k, sh, sl]
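One way to read Pat[sh, j] is as the j-th k-bit block of the pattern after it has been shifted sh bits into a block, so that whole text blocks can be compared at once. The sketch below follows that reading, which is an assumption. The masks for partially covered blocks, and the names `shifted_pattern_blocks` and `blocks_match`, are additions for illustration and are not shown on the slide. The comparison runs over the blocks of a single alignment only.

```python
def shifted_pattern_blocks(pattern_bits, k=8):
    """For each shift sh in 0..k-1, return (values, masks) of the shifted pattern."""
    tables = []
    for sh in range(k):
        padded = "0" * sh + pattern_bits
        mask = "0" * sh + "1" * len(pattern_bits)
        fill = (-len(padded)) % k  # pad the last block up to k bits
        padded += "0" * fill
        mask += "0" * fill
        values = [int(padded[b:b + k], 2) for b in range(0, len(padded), k)]
        masks = [int(mask[b:b + k], 2) for b in range(0, len(mask), k)]
        tables.append((values, masks))
    return tables


def blocks_match(text_blocks, i, values, masks):
    """Compare the pattern blocks with text blocks i, i+1, ... one block at a time."""
    if i + len(values) > len(text_blocks):
        return False
    return all((text_blocks[i + t] & masks[t]) == values[t] for t in range(len(values)))


k = 4
text_bits = "0110110011110000"
text_blocks = [int(text_bits[b:b + k], 2) for b in range(0, len(text_bits), k)]
tables = shifted_pattern_blocks("1101100", k)
print(blocks_match(text_blocks, 0, *tables[1]))  # pattern starts at bit offset 1
```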

Page 12: Boyer Moore Searches  on Binary Texts

BBBMM

More information in binary case

ASCII: ffghabdgttiocbsbgghj

BINARY: 0110001001101010

Page 13: Boyer Moore Searches  on Binary Texts

BBBMM

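extended delta1

[Figure: text T with block positions i – 1, i, i + 1 above pattern P; bit groups shown: 101, 101, 101, 100, 01]

$1 \le sl \le k, \quad 0 \le B < 2^{sl}$

[Formula relating delta1[sl, B] and m; the operator is not recoverable from the transcript]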

Page 14: Boyer Moore Searches  on Binary Texts

BBBMM

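Total size of delta1 tables:

$\sum_{sl=1}^{k} 2^{sl} = 2^{k+1} - 2$

If too large, use limit value K < k

[Figure: text T above pattern P; labels: sl, k, K]

Size of delta1 tables reduced to [expression in K; surviving symbols: 1, 2, K]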
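As a quick check of the table-size arithmetic, the sketch below assumes there is one delta1 table of 2^sl entries for each sl from 1 to k and sums them. The function name `delta1_total_entries` is hypothetical.

```python
def delta1_total_entries(k):
    """Total number of entries when there is one table of 2**sl entries per sl = 1..k."""
    return sum(2 ** sl for sl in range(1, k + 1))


for k in (4, 8):
    # The sum of the geometric series equals 2**(k + 1) - 2.
    print(k, delta1_total_entries(k), 2 ** (k + 1) - 2)
```

For byte-sized blocks (k = 8) this gives a few hundred entries in total.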

Page 15: Boyer Moore Searches  on Binary Texts

BBBMM

Original delta1: increase of text pointer

BBBMM delta1: shift size

[Figure: text T aligned against pattern P]

Mismatch not in last block

Correct[sh,j]

Page 16: Boyer Moore Searches  on Binary Texts

BBBMM

[Figure: text T aligned against pattern P]

delta2

delta2[m – matchlen]

j          1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
Pat[j]     1   0   1   0   1   0   1   0   1   1   1   0   1   1   0   1
delta2[j]  13  13  13  13  13  13  13  13  13  13  13  3   7   15  2   1

Page 17: Boyer Moore Searches  on Binary Texts

Analysis

Assumption: random input

Reasonable for compressed text

Expected # comparisons till mismatch:

Bit-wise: $\sum_{j=1}^{m} j \, 2^{-j} \approx 2$

Blocked: [formula not recoverable from the transcript; surviving symbols: k, sl, t, m/k]

Page 18: Boyer Moore Searches  on Binary Texts

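Analysis

Expected # bits shifted after mismatch:

Bit-wise: M

Blocked: M’

$E(M) = \sum_{j=1}^{m} 2^{-j} \min(2^{j}, m) \approx \log m$

[Relation between M’ and M; the operator is not recoverable from the transcript]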
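The two bit-wise expectations from the analysis can be evaluated numerically. This sketch assumes the sums read as the sum over j = 1..m of j / 2^j (comparisons until a mismatch) and the sum over j = 1..m of min(2^j, m) / 2^j (bits shifted after a mismatch), with random input bits. The function names are hypothetical.

```python
import math


def expected_comparisons_bitwise(m):
    """Sum over j = 1..m of j / 2**j."""
    return sum(j / 2 ** j for j in range(1, m + 1))


def expected_shift_bitwise(m):
    """Sum over j = 1..m of min(2**j, m) / 2**j."""
    return sum(min(2 ** j, m) / 2 ** j for j in range(1, m + 1))


for m in (16, 64, 256, 500):
    print(m, expected_comparisons_bitwise(m), expected_shift_bitwise(m), math.log2(m))
```

Under these assumptions the first quantity approaches 2 as m grows, and the second stays within about one bit of log2 m, so the bit-wise shift grows only logarithmically with the pattern length.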

Page 19: Boyer Moore Searches  on Binary Texts

Experiments

English Bible (2.5MB)

World Factbook (1.5MB)

Text: Huffman encoded

Patterns: Random substrings of lengths 10 to 500

k = 8

Page 20: Boyer Moore Searches  on Binary Texts

Experiments: Average # comparisons between shifts

[Plot: average # comparisons between shifts (y-axis, ticks 1.1 to 1.5) against length of pattern (x-axis, 100 to 500); curves: Bit-wise, Blocked]

Page 21: Boyer Moore Searches  on Binary Texts

Experiments: Average size of shifts

[Plot: average size of shifts (y-axis, ticks 20 to 100) against length of pattern (x-axis, 100 to 500); curves: Bit-wise, Blocked]

Page 22: Boyer Moore Searches  on Binary Texts

Experiments: Average # comparisons for 1000 bits

[Plot: average # comparisons for 1000 bits (y-axis, ticks 100 to 500) against length of pattern (x-axis, 100 to 500); curves: Bit-wise, Blocked, BDM]

Page 23: Boyer Moore Searches  on Binary Texts

Experiments: Time to locate first occurrence (ms)

[Plot: time to locate first occurrence in ms (y-axis, ticks 50 to 300) against length of pattern (x-axis, 100 to 500); curves: Bit-wise, Blocked, BDM, Turbo-BDM]

Page 24: Boyer Moore Searches  on Binary Texts

Summary

Blocked variant of BM

Faster than alternatives, overhead 1-10 K

Extensions: ASCII, words instead of characters

Page 25: Boyer Moore Searches  on Binary Texts

Thank you!