KMP Pattern Search QUICK OVERVIEW Created by, Arjun SK arjunsk.com

Kmp pattern search explained simple

Apr 07, 2017



Arjun SK
Transcript
Page 1: Kmp pattern search explained simple

KMP Pattern Search QUICK OVERVIEW

Created by, Arjun SK arjunsk.com

Page 2

What is Pattern Searching?

o Suppose you are reading a text document.
o You want to search for a word.
o You press CTRL + F and search for that word.
o The word processor scans the document and shows the position of each occurrence.

What actually happens is that the word, i.e. the pattern, is searched for inside the text document.

Page 3

Implementation

Page 4

Naïve Approach

The naïve approach is to check whether the pattern matches the text at every possible position in the text.

P = Pattern (word) of length m
T = Text (document) of length n

The naïve string matching algorithm takes O((n - m + 1)m) time in the worst case, since each of the n - m + 1 starting positions may need up to m character comparisons.
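A minimal Python sketch of the naïve approach, assuming 0-based indices; the function name `naive_search` is illustrative and does not come from the slides. Each of the n - m + 1 starting positions is checked with up to m character comparisons, which is where the O((n - m + 1)m) bound comes from.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every index where pattern occurs in text (naïve approach)."""
    n, m = len(text), len(pattern)
    matches = []
    # Try each of the n - m + 1 possible starting positions.
    for start in range(n - m + 1):
        # Compare up to m characters at this alignment.
        j = 0
        while j < m and text[start + j] == pattern[j]:
            j += 1
        if j == m:
            matches.append(start)
    return matches


print(naive_search("bacbababaabcbab", "abababca"))  # []
print(naive_search("abababcaab", "abababca"))  # [0]
```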

Page 5

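Basic Idea of KMP

Text:    a b c d a b c a a a b c b a b
Pattern: a b c d a b c d

The first seven characters match, but the eighth does not (“a” in the text, “d” in the pattern). We can find the next position for comparison by looking at the pattern itself: the matched part “abcdabc” ends with “abc”, which is also how the pattern begins, so the pattern can jump ahead to that “abc” instead of moving one position at a time.

Text:    a b c d a b c a a a b c b a b
Pattern:         a b c d a b c d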
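The shift in this example can be checked with a brute-force sketch. `longest_border` is a hypothetical helper that returns the length of the longest proper prefix of a string that is also its suffix; the prefix table introduced later computes the same values more efficiently.

```python
def longest_border(s: str) -> int:
    """Length of the longest proper prefix of s that is also a suffix of s.

    Brute force, for illustration only.
    """
    for length in range(len(s) - 1, 0, -1):
        if s[:length] == s[-length:]:
            return length
    return 0


text = "abcdabcaaabcbab"
pattern = "abcdabcd"

# Count the characters that match before the first mismatch at alignment 0.
matched = 0
while matched < len(pattern) and text[matched] == pattern[matched]:
    matched += 1

border = longest_border(pattern[:matched])  # "abc" -> 3
shift = matched - border  # how far the pattern can safely move
print(matched, border, shift)  # 7 3 4
```

Seven characters match, the border of “abcdabc” is “abc”, and so the pattern moves 7 - 3 = 4 positions.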

Page 6

KMP (Knuth-Morris-Pratt String Matching Algorithm)

Why KMP?

It is best known for finding an exact pattern in linear time.

How is it implemented?

o We first find patterns (repeated prefixes) within the search pattern itself.

o When a comparison fails after a partial match, we can skip ahead to the next occurrence of a prefix of the pattern.

o In this way, we avoid redundant comparisons.

Page 7

Pre-processing

Let’s say we’re matching the pattern “abababca” against the text “bacbababaabcbab”.

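Here’s our prefix match table, i.e. prefix-table[i], the length of the longest proper prefix of pattern[0..i] that is also a suffix of it:

index  0 1 2 3 4 5 6 7
char   a b a b a b c a
value  0 0 1 2 3 4 0 1

o index 2: matching prefix “a”
o index 3: matching prefix “ab”
o index 4: matching prefix “aba”
o index 5: matching prefix “abab”
o index 6: no matching prefix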
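One common way to compute this table in O(m) time is sketched below in Python. The name `build_prefix_table` is an assumption; the slides only call the result prefix-table. When the next character breaks the current matching prefix, the loop falls back to the next shorter matching prefix, read from the part of the table already filled in.

```python
def build_prefix_table(pattern: str) -> list[int]:
    """table[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it. Runs in O(m) time."""
    table = [0] * len(pattern)
    k = 0  # length of the current matching prefix
    for i in range(1, len(pattern)):
        # On a mismatch, fall back to the next shorter matching prefix.
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


print(build_prefix_table("abababca"))  # [0, 0, 1, 2, 3, 4, 0, 1]
```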

Page 8

Pre-processing - cont.

• partial_match_length = length of the matched part of the pattern at the current step.

• prefix-table = the pre-processed prefix table.

• If a partial match of length partial_match_length > 0 is found, we may skip ahead partial_match_length - prefix-table[ partial_match_length - 1 ] characters. If nothing matched, we simply move ahead by one character.

// Used to skip the prefix match in the pattern that has already been compared.
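The skip rule can be turned into a complete search. This is a sketch under stated assumptions: `skip_search` and `build_prefix_table` are illustrative names, and, like the walkthrough on the following pages, it compares the characters of the prefix match again after each skip. After a full match it applies the same rule, so overlapping occurrences are still found.

```python
def build_prefix_table(pattern: str) -> list[int]:
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def skip_search(text: str, pattern: str) -> list[int]:
    """Find all occurrences, moving the pattern by the skip rule above."""
    table = build_prefix_table(pattern)
    n, m = len(text), len(pattern)
    matches = []
    start = 0
    while start <= n - m:
        # partial_match_length: characters matched at this alignment.
        partial = 0
        while partial < m and text[start + partial] == pattern[partial]:
            partial += 1
        if partial == m:
            matches.append(start)
        if partial == 0:
            start += 1  # nothing matched: plain one-character shift
        else:
            start += partial - table[partial - 1]
    return matches


print(skip_search("bacbababaabcbab", "abababca"))  # []
print(skip_search("xabababcaabababca", "abababca"))  # [1, 9]
```

The first call uses the example text from these slides, which contains no occurrence of the pattern.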

Page 9

Searching

Text:    b a c b a b a b a a b c b a b
Pattern:   a b a b a b c a

This is a partial match of length 1: the “a” matches, but the text’s “c” does not match the pattern’s “b”. The value at prefix-table[partial_match_length - 1] (i.e. prefix-table[0]) is 0, so we don’t get to skip ahead any: the pattern moves ahead by 1 - 0 = 1 character, just as in the naïve approach.

Page 10

[Figure: the pattern “abababca” compared against the text “bacbababaabcbab” at two alignments.]

Page 11

Text:    b a c b a b a b a a b c b a b
Pattern:         a b a b a b c a

In the naïve approach, after this mismatch we shift right by one and compare again:

Step 1

Text:    b a c b a b a b a a b c b a b
Pattern:           a b a b a b c a

Step 2

Text:    b a c b a b a b a a b c b a b
Pattern:             a b a b a b c a
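To see why Step 1 is wasted work, this sketch counts how many pattern characters match at each of the three alignments on this page. It assumes 0-based text indices, with the partial match of length 5 starting at index 4; `partial_match_length` is an illustrative helper, not something named in the slides.

```python
TEXT = "bacbababaabcbab"
PATTERN = "abababca"


def partial_match_length(text: str, pattern: str, start: int) -> int:
    """Number of leading pattern characters that match text at `start`."""
    length = 0
    while (length < len(pattern) and start + length < len(text)
           and text[start + length] == pattern[length]):
        length += 1
    return length


# The partial match of length 5, then the naïve approach's next two shifts.
for label, start in [("partial match", 4), ("Step 1", 5), ("Step 2", 6)]:
    print(label, start, partial_match_length(TEXT, PATTERN, start))
# partial match 4 5
# Step 1 5 0
# Step 2 6 3
```

Step 1 fails on its very first character, because it starts at a “b”.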

Page 12

But in the KMP approach, we can skip Step 1 directly.

Text:    b a c b a b a b a a b c b a b
Pattern:         a b a b a b c a

This is a partial match of length 5. The value at prefix-table[partial_match_length - 1] (i.e. prefix-table[4]) is 3.

That means we get to skip ahead partial_match_length - prefix-table[partial_match_length - 1] characters (5 - prefix-table[4] = 5 - 3 = 2):

Text:    b a c b a b a b a a b c b a b
Pattern:         X X a b a b a b c a

We skip the alignment that starts at “b” (Step 1). The next comparison starts at the next “ab”, i.e. the prefix match; since its first prefix-table[4] = 3 characters (“aba”) are already known to match, comparison can resume right after them.
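The arithmetic of this skip, and the claim that the prefix match need not be compared again, can be checked directly. A sketch assuming 0-based indices (the partial match of length 5 starts at text index 4) and the prefix-table values listed earlier; all variable names are illustrative.

```python
text = "bacbababaabcbab"
pattern = "abababca"
prefix_table = [0, 0, 1, 2, 3, 4, 0, 1]  # values from the prefix match table

start, partial = 4, 5  # partial match of length 5 at text index 4
skip = partial - prefix_table[partial - 1]  # 5 - 3 = 2
new_start = start + skip
already_matched = prefix_table[partial - 1]  # length of the prefix match

# The prefix match lines up with text that was already compared...
assert text[new_start:new_start + already_matched] == pattern[:already_matched]
# ...so comparison resumes at the text character that caused the mismatch.
resume_at = new_start + already_matched
print(skip, new_start, pattern[:already_matched], resume_at)  # 2 6 aba 9
```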

Page 13

In KMP we can directly skip comparing “ab”.

Text:    b a c b a b a b a a b c b a b
Pattern:             a b a b a b c a

This is a partial match of length 3. The value at prefix-table[partial_match_length - 1] (i.e. prefix-table[2]) is 1.

That means we get to skip ahead partial_match_length - prefix-table[partial_match_length - 1] characters (3 - prefix-table[2] = 3 - 1 = 2):

Text:    b a c b a b a b a a b c b a b
Pattern:             X X a b a b a b c a

We skip the alignment that starts at “b”. The next comparison starts at the next “a”, i.e. the prefix match. From here only 7 text characters remain, fewer than the pattern’s 8, so the pattern does not occur in this text.
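The whole walkthrough can be reproduced by a short trace. This sketch (the `trace` function and its tuple format are illustrative) records, for each alignment the skip rule visits, the starting text index (0-based), the partial match length, and the resulting skip. The prefix-table values are copied from the table for “abababca”.

```python
TEXT = "bacbababaabcbab"
PATTERN = "abababca"
PREFIX_TABLE = [0, 0, 1, 2, 3, 4, 0, 1]  # prefix-table for "abababca"


def trace(text: str, pattern: str, table: list[int]) -> list[tuple[int, int, int]]:
    """Return (start, partial_match_length, skip) for every alignment tried."""
    steps = []
    start = 0
    while start <= len(text) - len(pattern):
        partial = 0
        while partial < len(pattern) and text[start + partial] == pattern[partial]:
            partial += 1
        # Skip rule from the slides; with no match at all, move one character.
        skip = partial - table[partial - 1] if partial > 0 else 1
        steps.append((start, partial, skip))
        start += skip
    return steps


print(trace(TEXT, PATTERN, PREFIX_TABLE))
# [(0, 0, 1), (1, 1, 1), (2, 0, 1), (3, 0, 1), (4, 5, 2), (6, 3, 2)]
```

The partial matches of length 1, 5 and 3 and their skips of 1, 2 and 2 agree with the slides; the loop stops at index 8 because fewer than eight text characters remain.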

Page 14

Complexity

O(m) to compute the prefix-function values.
O(n) to compare the pattern against the text.
Total: O(n + m) running time.
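The O(n) bound for the search phase holds when, after a skip, the already-matched prefix is not compared again. A common way to write that is sketched below (function names are illustrative): the text index only moves forward, and the matched length k falls back through the prefix table instead of restarting. Because k grows by at most one per text character, the fallback loop does O(n) work in total, which together with the O(m) table gives O(n + m).

```python
def build_prefix_table(pattern: str) -> list[int]:
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return all start indices of pattern in text in O(n + m) time."""
    if not pattern:
        return list(range(len(text) + 1))
    table = build_prefix_table(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, keep the longest prefix that is still matched.
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = table[k - 1]  # continue, allowing overlapping matches
    return matches


print(kmp_search("xabababcaabababca", "abababca"))  # [1, 9]
```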