SCALED Pattern Matching

Amihood Amir, Ayelet Butman, Moshe Lewenstein

Bar-Ilan University and Johns Hopkins University


Motivation

Searching for Templates in Aerial Photographs

Input: aerial photo, template

Task: search for all locations where the template appears in the image

Model

• Low level (pixel level): avoid costly processing

• Asymptotically efficient solutions

• Serial exact algorithms

Types of Approximations

Local errors (level of detail, occlusion, noise). Results:

• $O(n^2 \log m)$: mismatches
• $O(n^2 k^2)$: edit distance, $k$ errors, rectangular patterns
• $O(n^2 k \sqrt{m \log m} \sqrt{k \log k})$: edit distance, $k$ errors, half-rectangular patterns

(AL-88, AF-95)

Types of Approximation

Orientation. Results:

• $O(n^2 m^5)$ (FU-98)
• $O(n^2 m^3)$ (ACL-98)

Scaling, natural scales. Results:

• $O(n)$: 1-d (EV-88)
• $O(n^2 \log |\Sigma|)$: 2-d (ALV-92)
• $O(n^2)$: dictionary (AC-96)

Scaling, real scales. This result:

• $O(n)$: 1-d, truncation

It seems daunting, but…

CPM 2003, Morelia, Mexico

Problem: inherently inexact

What if the occurrence is 1½ times bigger?

What is the meaning of “½ a pixel”?

Solutions until now: natural scales. Consider only discrete scales 1, 2, 3, 4, 5.

Definition

Input: an $n \times n$ text and an $m \times m$ pattern.

Find all occurrences of the pattern in the text in all discrete sizes.

Discrete exact Scaled Matching

Text T (13 × 13):

A A A A A A A A A A A A A
A A A A A A A A A A C C A
A A A C C A A A A A C C A
A A A C C A A A A A A A A
A A A A A A A A A A A A A
A A A A A A A A A C C A A
A A A A A A A A A C C A A
A A A C C C A A A A A A A
A A A C C C A A A A A A A
A A A C C C A A A A C A A
A A A A A A A A A A A A A
A A A A A A A A A C C A C
A A A A A A A A A A A A A

Pattern P (3 × 3):

A A A
A C A
A A A

Discrete exact Scaled Matching

Pattern P:

Z U Y
K V S
X E T

P scaled to 3 ($P^3$):

Z Z Z U U U Y Y Y
Z Z Z U U U Y Y Y
Z Z Z U U U Y Y Y
K K K V V V S S S
K K K V V V S S S
K K K V V V S S S
X X X E E E T T T
X X X E E E T T T
X X X E E E T T T

Idea: Fix a scale $s$

Constant amount of work for each square ($s$-block).

Figure: the $n \times n$ text divided into $s \times s$ blocks, $n/s$ blocks along each side.

Algorithm time

Time for scale $s$: $O(n^2 / s^2)$

Total time:

$$\sum_{s=1}^{n/m} \frac{n^2}{s^2} = n^2 \sum_{s=1}^{n/m} \frac{1}{s^2}$$

The sum $\sum_s 1/s^2$ converges to a constant, making the total time $O(n^2)$.
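The convergence claim can be made explicit with a short worked bound. This is a sketch that assumes each scale $s$ costs exactly $n^2/s^2$ steps (constant factors dropped) and that scales run from 1 to $n/m$; the closed form of the infinite series is the standard Basel sum.

```latex
\sum_{s=1}^{n/m} \frac{n^2}{s^2}
  \;\le\; n^2 \sum_{s=1}^{\infty} \frac{1}{s^2}
  \;=\; \frac{\pi^2}{6}\, n^2
  \;<\; 2 n^2
  \;=\; O(n^2)
```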

Problem: Real scales

Was open even for strings…

How do we define a real scale? Take aabcccbb:

Scaled to 2: aaaabbccccccbbbb

Scaled to 1½: aaab cccc bbb (the ½ b and the ½ c are truncated)

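Formally

Denote $a^r = aa \cdots a$ ($r$ times).

Text $T$, $|T| = n$. Pattern $P = a_1^{r_1} a_2^{r_2} \cdots a_j^{r_j}$, $|P| = m$.

$P$ scaled to $\alpha$ appears in the text as

$$a_1^{c_1}\, a_2^{\lfloor r_2 \alpha \rfloor} \cdots a_{j-1}^{\lfloor r_{j-1} \alpha \rfloor}\, a_j^{c_j}, \qquad c_1 \ge \lfloor r_1 \alpha \rfloor,\quad c_j \ge \lfloor r_j \alpha \rfloor$$

Problem Definition 1

Input: pattern $P$, text $T$

Output: all text locations where $P$ scaled to $\alpha$ appears, for some $\alpha \ge 1$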

Remark

$\alpha \ge 1$ means we only scale “up”. Reasons:

• Avoid the conceptual problem of loss of resolution: from “far enough” away everything looks the same.

• By our definition, for a scale $k < 1/m$ every run truncates to length 0, so there is a match at every text location.

Simplify definition

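Example: $a^2 b^1 c^3 b^4$ scaled to $\tfrac{3}{2}$ is

$$a^{\lfloor 2 \cdot \frac{3}{2} \rfloor}\, b^{\lfloor 1 \cdot \frac{3}{2} \rfloor}\, c^{\lfloor 3 \cdot \frac{3}{2} \rfloor}\, b^{\lfloor 4 \cdot \frac{3}{2} \rfloor} = a^3 b^1 c^4 b^6$$

Definition 2: Look for $a_1^{\lfloor r_1 \alpha \rfloor} a_2^{\lfloor r_2 \alpha \rfloor} \cdots a_{j-1}^{\lfloor r_{j-1} \alpha \rfloor} a_j^{\lfloor r_j \alpha \rfloor}$ in the text.

Example: P = aabcccbbbb

Match by definition 2: daaabccccbbbbbbe

Match by definition 1 but not by definition 2: daaaabccccbbbbbbbe (the first and last runs of the text are longer than the scaled runs)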
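The truncating scale operation used in these examples can be sketched in a few lines of Python. This is an illustrative sketch, not code from the talk: the function name `scale` is hypothetical, and exact rational arithmetic (`fractions.Fraction`) is an assumption made here to avoid floating-point rounding at run boundaries.

```python
from fractions import Fraction
from itertools import groupby
from math import floor


def scale(pattern, alpha):
    """Scale every run of `pattern` by `alpha`, truncating each run length."""
    alpha = Fraction(alpha)
    pieces = []
    for symbol, run in groupby(pattern):
        run_length = sum(1 for _ in run)
        # A run of length r becomes a run of length floor(alpha * r).
        pieces.append(symbol * floor(alpha * run_length))
    return "".join(pieces)


print(scale("aabcccbbbb", Fraction(3, 2)))
print(scale("aabcccbb", 2))
```

With the pattern P = aabcccbbbb and scale 3/2, the result is the substring that sits between d and e in the definition-2 example text.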

Why are definitions equivalent

Split text and pattern into a symbol part ($T_s$, $P_s$) and a length part ($T_L$, $P_L$).

Example:

P = aabcccbbbb, $P_s$ = abcb, $P_L$ = 2 1 3 4

T = daaabccccbbbbbbe, $T_s$ = dabcbe, $T_L$ = 1 3 1 4 6 1
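The split is a run-length encoding with the symbols and the lengths kept in separate sequences. A minimal Python sketch follows; the function name `split_runs` is an assumption made for illustration.

```python
from itertools import groupby


def split_runs(text):
    """Return (symbol part, length part) of the run-length encoding of `text`."""
    symbols, lengths = [], []
    for symbol, run in groupby(text):
        symbols.append(symbol)
        lengths.append(sum(1 for _ in run))
    return "".join(symbols), lengths


p_s, p_l = split_runs("aabcccbbbb")
t_s, t_l = split_runs("daaabccccbbbbbbe")
print(p_s, p_l)
print(t_s, t_l)
```

A single left-to-right pass suffices, which is in line with the linear time stated for the split.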

Time

Time for split: $O(n+m)$

Finding $P_s$ in $T_s$: $O(n+m)$ (e.g., KMP)

HARD PART: finding $P_L$ in $T_L$

Definitions are Equivalent

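Claim: Solving def. 2 in time $O(f(n))$ implies solving def. 1 in time $O(f(n))$.

Why:

- Find $a_2^{\lfloor r_2 \alpha \rfloor} \cdots a_{j-1}^{\lfloor r_{j-1} \alpha \rfloor}$ in time $O(f(n))$.
- For each match, verify the 1st and last symbol in constant time in $T_s$ and $T_L$.

Total time: $O(f(n) + n) = O(f(n))$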

Naïve algorithm for matching $P_L$ in $T_L$

For each text location, position the pattern starting at that location and calculate the interval $[t/p, (t+1)/p)$ for each resulting ⟨text, pattern⟩ pair $(t, p)$.

This is the interval of possible scales, since $\lfloor \alpha p \rfloor = t$ exactly when $t \le \alpha p < t+1$:

• at $\alpha = t/p$ we have $\alpha p = t$, and for every $\alpha < t/p$, $\lfloor \alpha p \rfloor < t$;
• at $\alpha = (t+1)/p$ we have $\alpha p = t+1$, and for every $\alpha \ge (t+1)/p$, $\lfloor \alpha p \rfloor > t$.

Check intersection

If the intersection of all intervals is not empty, then there is a match.

Time: $O(nm)$

Example: $P_L$ = 2 1 2 3 2, $T_L$ = 2 4 2 4 7 4 5 3

Location 1: the first two intervals are $[1, 3/2)$ and $[4, 5)$. The intersection is empty, thus there is no scaled match at location 1. But…

Location 2: the intervals are $[2, 5/2)$, $[2, 3)$, $[2, 5/2)$, $[7/3, 8/3)$, $[2, 5/2)$. The intersection is $[7/3, 5/2)$, thus there is a scaled match at location 2.
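The naive procedure can be sketched directly in Python. The names `scale_interval` and `naive_scaled_matches` are hypothetical, locations are reported 1-indexed to agree with the example, exact fractions are used for the interval endpoints, and the lower endpoint is clamped at 1 on the assumption that only scales of at least 1 are allowed.

```python
from fractions import Fraction


def scale_interval(pattern_lengths, window):
    """Intersect the intervals [t/p, (t+1)/p) over aligned pairs; None if empty."""
    if not pattern_lengths:
        raise ValueError("pattern must be non-empty")
    low = Fraction(1)  # assumption: only scales alpha >= 1 are considered
    high = None
    for p, t in zip(pattern_lengths, window):
        low = max(low, Fraction(t, p))
        upper = Fraction(t + 1, p)
        high = upper if high is None else min(high, upper)
    return (low, high) if low < high else None


def naive_scaled_matches(pattern_lengths, text_lengths):
    """Map each matching 1-indexed location to its half-open scale interval."""
    m = len(pattern_lengths)
    matches = {}
    for start in range(len(text_lengths) - m + 1):
        interval = scale_interval(pattern_lengths, text_lengths[start:start + m])
        if interval is not None:
            matches[start + 1] = interval
    return matches


print(naive_scaled_matches([2, 1, 2, 3, 2], [2, 4, 2, 4, 7, 4, 5, 3]))
```

Each location costs time proportional to the pattern length, which gives the $O(nm)$ bound quoted above.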

Improvement – Parameterized Matching

Introduced: Baker 1994

Motivation: “copying” code

Parameterized Matching

Input: two strings $s$ and $t$, $|s| = |t|$, over alphabets $\Sigma_s$ and $\Sigma_t$.

$s$ parameterize-matches $t$ if there exists a bijection $\Pi: \Sigma_s \to \Sigma_t$ such that $\Pi(s) = t$.

Example:

s = a a b b b

t = x x y y y

$\Pi(a) = x$, $\Pi(b) = y$

Parameterized Matching

Claim (AFM-94)

For $\Sigma$ that can be sorted in linear time (e.g., $\Sigma = \{1, \ldots, n\}$), parameterized matching can be done in time $O(n)$.
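To make the bijection condition concrete, here is a small Python check of whether two equal-length sequences parameterize-match. It is only a direct test of the definition for one aligned pair of strings, not the linear-time matching algorithm of AFM-94; the name `p_match` is an assumption for illustration.

```python
def p_match(s, t):
    """Return True if some bijection between symbols maps s onto t position by position."""
    if len(s) != len(t):
        return False
    forward, backward = {}, {}
    for a, b in zip(s, t):
        # setdefault records the first pairing and returns the existing one afterwards,
        # so any later disagreement in either direction breaks the bijection.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


print(p_match("aabbb", "xxyyy"))
print(p_match("ab", "xx"))
```

The same function works on lists of integers, which is the form needed for the length sequences.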

The reduction

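Lemma: There exists $\alpha \ge 1$ for which $P_L$ matches $T_L$ at location $i$ scaled to $\alpha$ only if $P_L$ p-matches $T_L$ at location $i$.

Proof: Assume $P_L$ does not p-match $T_L$ at location $i$.

The possible situations are:

Possibility 1

The same pattern number $b$ is aligned with two different text numbers, $a$ and $c \ne a$.

$T_L$: … $a$ … $c$ …

$P_L$: … $b$ … $b$ …

W.l.o.g. $c \ge a+1$. For $c = a+1$ (smallest possible):

$$\left[\frac{a}{b}, \frac{a+1}{b}\right) \cap \left[\frac{a+1}{b}, \frac{a+2}{b}\right) = \emptyset$$

Possibility 2

The same text number $a$ is aligned with two different pattern numbers, $b$ and $c \ne b$.

$T_L$: … $a$ … $a$ …

$P_L$: … $b$ … $c$ …

W.l.o.g. $c \ge b+1$. For $c = b+1$ the two intervals are

$$\left[\frac{a}{b}, \frac{a+1}{b}\right) \quad\text{and}\quad \left[\frac{a}{b+1}, \frac{a+1}{b+1}\right)$$

Intersection not empty only if

$$\frac{a+1}{b+1} > \frac{a}{b}, \quad\text{i.e.}\quad ab + b > ab + a, \quad\text{i.e.}\quad b > a$$

But this can never happen if $\alpha \ge 1$, because then $a = \lfloor \alpha b \rfloor \ge b$.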

Algorithm for Real Scaled String Matching

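Let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in $P_L$.

1. P-match $P_L$ in $T_L$.
2. For each match, check the intersection of the intervals between $P_{i_1}, \ldots, P_{i_j}$ and the corresponding symbols in $T_L$.

End Algorithm

Example

$P_L$ = 2 3 2 3 2, $P_{i_1} = 2$, $P_{i_2} = 3$

$T_L$ = 5 6 5 6 5 6 10 6 10 6 10 7

p-match at 5 6 5 6 5: $[5/2, 3) \cap [2, 7/3) = \emptyset$, so no scaled match.

p-match at 6 10 6 10 6: $[3, 7/2) \cap [10/3, 11/3) = [10/3, 7/2)$, so this is a scaled match.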
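The two steps can be put together in a short Python sketch. Several things here are assumptions for illustration: the names `p_match` and `real_scaled_matches` are hypothetical, the p-match step is done by a simple sliding window rather than a linear-time parameterized matcher, locations are 1-indexed, and the pattern is assumed non-empty. The point shown is step 2: after a p-match, only one interval per distinct pattern number has to be intersected.

```python
from fractions import Fraction


def p_match(s, t):
    """Direct bijection test between two equal-length sequences."""
    forward, backward = {}, {}
    for a, b in zip(s, t):
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return len(s) == len(t)


def real_scaled_matches(pattern_lengths, text_lengths):
    """Return the 1-indexed locations where the length pattern has a scaled match."""
    m = len(pattern_lengths)
    # Index of the first appearance of each distinct pattern number.
    first_index = {}
    for k, p in enumerate(pattern_lengths):
        first_index.setdefault(p, k)
    locations = []
    for start in range(len(text_lengths) - m + 1):
        window = text_lengths[start:start + m]
        if not p_match(pattern_lengths, window):
            continue  # step 1 failed: no scale can work here
        # Step 2: one interval [t/p, (t+1)/p) per distinct pattern number.
        low = max(Fraction(window[k], p) for p, k in first_index.items())
        low = max(low, Fraction(1))  # only scales alpha >= 1
        high = min(Fraction(window[k] + 1, p) for p, k in first_index.items())
        if low < high:
            locations.append(start + 1)
    return locations


print(real_scaled_matches([2, 3, 2, 3, 2], [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]))
```

On the example above, the windows starting at locations 1, 2, 6 and 7 pass the p-match step, and only location 6 has a non-empty intersection.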

Important Fact

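$$\sum_{k=1}^{j} P_{i_k} \le m$$

The $P_{i_k}$ are $j$ different positive integers, so their sum is at least $j(j+1)/2$. So there are at most $O(\sqrt{m})$ different $P_{i_k}$'s.

Time: $O(n)$ for parameterized matching ($\Sigma = \{1, 2, \ldots, n\}$), plus $O(\sqrt{m})$ verification for each location. Total: $O(n\sqrt{m})$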

Tighter analysis

Upper-bound the number of possible p-matches.

Lemma: Let $|P| = m$, $|T| = n$, and let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in $P_L$.

Then there are at most $2n/j$ p-matches of $P_L$ in $T_L$.

Meaning: Since verification time is $O(j)$ per p-match, the lemma implies that the total verification time is

$$O\!\left(\frac{2n}{j} \cdot j\right) = O(n)$$

Proof of Lemma

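Consider the 1st appearance of each of $P_{i_1}, \ldots, P_{i_j}$ in $P_L$ and the text numbers aligned with them in a p-match:

$P_L$: $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$

$T_L$: $a_1, a_2, \ldots, a_j$

In a p-match the $a_k$ are $j$ different positive integers, so

$$\sum_{k=1}^{j} a_k \ge \frac{j^2}{2}$$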

Lemma’s proof (cont.)

Let $x$ be the total number of p-matches in the text.

The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern is at least $x j^2 / 2$.

But: there are overlaps. How many?

For each text location, at most $j$ matches will count it. Therefore…

Total count without overlaps:

$$\ge \frac{x j^2}{2} \cdot \frac{1}{j} = \frac{x j}{2}$$

Clearly $x \cdot j / 2 \le n$, thus $x \le 2n/j$.

Open Problem

Give a 1-d algorithm whose running time is linear in the size of the run-length compressed text and pattern.


    Motivation

    Searching for Templates in Aerial Photographs

    Input Aerial photo Template

    Task Search for all locations where the template appears in the image

    Model

    bull Low level (pixel level) avoid costly processing

    bull Asymptotically efficient solutions

    bull Serial exact algorithms

    Types of Approximations

    Local errors Level of detail Occlusion Noise results O(nsup2 log m) mismatches

    O(nsup2ksup2( edit distance k errors

    rectangular patterns

    O(nsup2kradic(m log m) radic(k log k)

    edit distance k errors

    half rectangular patterns

    AL-88

    AF-95

    Types of ApproximationOrientation results O(nsup2m ) FU-98

    O(nsup2msup3) ACL-98

    Scaling Natural scales results O(n) 1-d EV-88

    O(nsup2 log |Σ|) 2-d ALV-92

    O(nsup2) dictionary AC-96

    Real scales this result O(n) 1-d truncation

    5

    It seems daunting buthellip

    CPM 2003 Morelia Mexico

    Problem inherently inexact

    What if occurrence is 1frac12 times bigger

    What is the meaning of ldquofrac12 a pixelrdquo

    Solutions until now Natural Scales - Consider only discrete scales 1 2 3 4 5

    DefinitionText Pattern

    Find all occurrences of the pattern in the text in all discrete sizes

    m

    m

    n

    n

    Discrete exact Scaled MatchingT PA A A A A A A A A A A A A A A A

    A A A A A A A A A A C C A A C A

    A A A C C A A A A A C C A A A A

    A A A C C A A A A A A A A

    A A A A A A A A A A A A A

    A A A A A A A A A C C A A

    A A A A A A A A A C C A A

    A A A C C C A A A A A A A

    A A A C C C A A A A A A A

    A A A C C C A A A A C A A

    A A A A A A A A A A A A A

    A A A A A A A A A C C A C

    A A A A A A A A A A A A A

    Discrete exact Scaled Matching

    P Z U Y K V S X E T

    Psup3 Z Z Z U U U Y Y Y Z Z Z U U U Y Y Y Z Z Z U U U Y Y Y K K K V V V S S S K K K V V V S S S K K K V V V S S S X X X E E E T T T X X X E E E T T T X X X E E E T T T

    Idea Fix a scale s

    Constant amount of work for each square (s-block)

    s

    s

    nns

    Algorithm time

    Time for scale s

    Total time

    converges to a constant

    Making the total time O(nsup2)

    sn2

    2

    mn

    mn

    ss ssn n

    1

    2

    122

    2

    1

    Problem Real scales

    Was open even for stringshellipHow do we define aabcccbbScaled to 2 aaaabbccccccbbbbScaled to 1frac12 aaab cccc bbb truncate truncate frac12b frac12c

    Formally

    nTT ||

    mPrrrP aaa j

    j ||21

    21

    aaaa crrc jjjj

    121

    121

    1

    rcrc jj

    11

    Denote a aaa aProblem Definition 1Input Pattern TextOutput All text locations where

    appears for some

    r timesr

    Remark

    α ge 1 means we only scale ldquouprdquoReasons Avoid conceptual problem of

    loss of resolution

    From ldquofar enoughrdquo away everything looks the same

    By our definition for klt1m there is a match at every text location

    Simplify definition

    bcba4312 2

    323

    23

    23

    aaaa rrrrjj

    jj

    121121

    Definition 2 Look for in the textExample P=aabcccbbbb

    Match by definition 2 daaabccccbbbbbbe Match by definition 1

    but not by def 2 daaaabccccbbbbbbbe

    Why are definitions equivalent

    Split text and pattern to symbol part Ts Ps and length part TL PLExample P= aabcccbbbb Ps=abcb PL=2134 T=daaabccccbbbbbbe Ts=dabcbe TL=131461

    Time

    Time for split O(n+m)

    Finding Ps in Ts O(n+m) (eg KMP)

    HARD PART Finding PL in TL

    Definitions are Equivalent

    aa rrj

    j

    1212

    Claim Solving def 2 in time O(f(n))

    Solving def 1 in time O(f(n))Why - Find in time O(f(n)) - For each match verify 1st and last symbol in constant time in Ts and

    TLTotal time O(f(n)+n)=O(f(n))

    Naiumlve algorithm for matching PL in TL

    For each text location position pattern starting at that location and calculate interval [tp (t+1)p) for each resulting lttext patterngt pair

    This is the interval of possible scales since

    tpp = t for every α lt tp |αp| lt t(t+1)p p = t+1 for every α ge tp |αp| gt t

    Check intersectionIf intersection of all intervals is not

    empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [132) [45)

    The intersection is empty thus no scaled match in location 1 Buthellip

    Check intersectionIf intersection of all intervals is not

    empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [252) [23) [252)[7383)[252)

    The intersection is [7352) thus there is a scaled match in location 2

    Improvement ndash Parameterized Matching

    Introduced Baker 1994

    Motivation ldquocopyingrdquo code

    Parameterized Matching

    Input two strings s and t |s|=|t| over alphabets sums and sumt

    s parameterize matches t if bijection sums sumt such that (s) = t

    exist

    (a)=x

    (b)=y

    Π Π

    ΠΠ

    a ab b b

    x xy y y

    Example

    Parameterized Matching

    Claim (AFM-94)

    For Σ that can be sorted in linear time (eg Σ=1 n)

    Parameterized matching can be done in time O(n)

    The reduction

    1 Lemma for which PL matches TL at location i scaled to α only if PL p-matches TL at i

    Proof Assume PL does not p-match TL at

    location i

    The possible situations are

    Possibility 1wlog c ge a+1

    For c = a+1 (smallest possible)

    TL

    PL

    a

    b b

    cnea

    b

    a

    b

    a

    b

    a

    b

    a 211

    Possibility 2

    wlog c ge b+1

    Intersection not empty only if

    (a+1)(b+1) gt ab ie

    ab+b gt ab+a

    bgta

    But this can never happen if α ge 1

    TL

    PL

    a

    b cneb

    a

    1

    11

    1

    b

    a

    b

    a

    b

    a

    b

    a

    Algorithm for Real Scaled String Matching

    Let Pi1 Pi2 Pij be the different numbers in PL

    1 P-match PL in TL2 For each match chack intersection

    of intervals between Pi1 Pij and corresponding symbols in TL

    End Algorithm

    PL = 2 3 2 3 2 Pi1=2 Pi2=3p-matches

    TL = 5 6 5 6 5 6 10 6 10 6 10 7

    scaled match

    Example

    2133 32

    21

    3121 2232

    3121 2255

    3231

    21 3333

    Important Fact

    So there are at most O(radicm) different Pikrsquos

    Time O(n) for parameterized matching (Σ=12

    hellipn) O(radicm) verification for each location Total O(nradicm)

    mi

    j

    kP

    k

    1

    Tighter analysis

    Upper bound number of possible p-matches

    Lemma Let |P|=m |T|=n Pi1 Pi2 Pij be the different numbers in PL

    Then there are at most n2j p-matches of PL in TL

    Meaning Since verification time is O(j) per p-match the lemma implies that total verification time is

    O((n2j) middot j) = O(n)

    Proof of Lemma

    1st appearance of Pi1 Pij

    PL Pi1 Pi2 Pij

    TL a1 a2 aj

    m-match

    2

    2

    1

    ja

    j

    ki

    Lemmarsquos proof (cont)

    Let x be the total number of p-matches in the text

    The sum of all text elements that match 1st occurrences of Piklsquos in the pattern

    ge (xjsup2)2

    But There are overlaps How many

    Lemmarsquos proof (cont)

    For each text location at most j matches will count it Thereforehellip

    Total count without overlaps ge

    Clearly xmiddotj2 le n thus x le (2n)j

    2

    1

    2

    2

    xjxjj

    Open Problem

    Give 1-d algorithm linear in run-length compressed text and pattern

    • SCALED Pattern Matching
    • Motivation
    • Slide 3
    • Model
    • Types of Approximations
    • Types of Approximation
    • It seems daunting buthellip
    • CPM 2003 Morelia Mexico
    • Problem inherently inexact
    • Definition
    • Discrete exact Scaled Matching
    • Slide 12
    • Idea Fix a scale s
    • Algorithm time
    • Problem Real scales
    • Formally
    • Remark
    • Simplify definition
    • Why are definitions equivalent
    • Time
    • Definitions are Equivalent
    • Naiumlve algorithm for matching PL in TL
    • Check intersection
    • Slide 24
    • Improvement ndash Parameterized Matching
    • Parameterized Matching
    • Slide 27
    • The reduction
    • Possibility 1
    • Possibility 2
    • Algorithm for Real Scaled String Matching
    • Example
    • Important Fact
    • Tighter analysis
    • Proof of Lemma
    • Lemmarsquos proof (cont)
    • Slide 37
    • Open Problem

      Model

      bull Low level (pixel level) avoid costly processing

      bull Asymptotically efficient solutions

      bull Serial exact algorithms

      Types of Approximations

      Local errors Level of detail Occlusion Noise results O(nsup2 log m) mismatches

      O(nsup2ksup2( edit distance k errors

      rectangular patterns

      O(nsup2kradic(m log m) radic(k log k)

      edit distance k errors

      half rectangular patterns

      AL-88

      AF-95

      Types of ApproximationOrientation results O(nsup2m ) FU-98

      O(nsup2msup3) ACL-98

      Scaling Natural scales results O(n) 1-d EV-88

      O(nsup2 log |Σ|) 2-d ALV-92

      O(nsup2) dictionary AC-96

      Real scales this result O(n) 1-d truncation

      5

      It seems daunting buthellip

      CPM 2003 Morelia Mexico

      Problem inherently inexact

      What if occurrence is 1frac12 times bigger

      What is the meaning of ldquofrac12 a pixelrdquo

      Solutions until now Natural Scales - Consider only discrete scales 1 2 3 4 5

      DefinitionText Pattern

      Find all occurrences of the pattern in the text in all discrete sizes

      m

      m

      n

      n

      Discrete exact Scaled MatchingT PA A A A A A A A A A A A A A A A

      A A A A A A A A A A C C A A C A

      A A A C C A A A A A C C A A A A

      A A A C C A A A A A A A A

      A A A A A A A A A A A A A

      A A A A A A A A A C C A A

      A A A A A A A A A C C A A

      A A A C C C A A A A A A A

      A A A C C C A A A A A A A

      A A A C C C A A A A C A A

      A A A A A A A A A A A A A

      A A A A A A A A A C C A C

      A A A A A A A A A A A A A

      Discrete exact Scaled Matching

      P Z U Y K V S X E T

      Psup3 Z Z Z U U U Y Y Y Z Z Z U U U Y Y Y Z Z Z U U U Y Y Y K K K V V V S S S K K K V V V S S S K K K V V V S S S X X X E E E T T T X X X E E E T T T X X X E E E T T T

      Idea Fix a scale s

      Constant amount of work for each square (s-block)

      s

      s

      nns

      Algorithm time

      Time for scale s

      Total time

      converges to a constant

      Making the total time O(nsup2)

      sn2

      2

      mn

      mn

      ss ssn n

      1

      2

      122

      2

      1

      Problem Real scales

      Was open even for stringshellipHow do we define aabcccbbScaled to 2 aaaabbccccccbbbbScaled to 1frac12 aaab cccc bbb truncate truncate frac12b frac12c

      Formally

      nTT ||

      mPrrrP aaa j

      j ||21

      21

      aaaa crrc jjjj

      121

      121

      1

      rcrc jj

      11

      Denote a aaa aProblem Definition 1Input Pattern TextOutput All text locations where

      appears for some

      r timesr

      Remark

      α ge 1 means we only scale ldquouprdquoReasons Avoid conceptual problem of

      loss of resolution

      From ldquofar enoughrdquo away everything looks the same

      By our definition for klt1m there is a match at every text location

      Simplify definition

      bcba4312 2

      323

      23

      23

      aaaa rrrrjj

      jj

      121121

      Definition 2 Look for in the textExample P=aabcccbbbb

      Match by definition 2 daaabccccbbbbbbe Match by definition 1

      but not by def 2 daaaabccccbbbbbbbe

      Why are definitions equivalent

      Split text and pattern to symbol part Ts Ps and length part TL PLExample P= aabcccbbbb Ps=abcb PL=2134 T=daaabccccbbbbbbe Ts=dabcbe TL=131461

      Time

      Time for split O(n+m)

      Finding Ps in Ts O(n+m) (eg KMP)

      HARD PART Finding PL in TL

      Definitions are Equivalent

      aa rrj

      j

      1212

      Claim Solving def 2 in time O(f(n))

      Solving def 1 in time O(f(n))Why - Find in time O(f(n)) - For each match verify 1st and last symbol in constant time in Ts and

      TLTotal time O(f(n)+n)=O(f(n))

      Naiumlve algorithm for matching PL in TL

      For each text location position pattern starting at that location and calculate interval [tp (t+1)p) for each resulting lttext patterngt pair

      This is the interval of possible scales since

      tpp = t for every α lt tp |αp| lt t(t+1)p p = t+1 for every α ge tp |αp| gt t

      Check intersectionIf intersection of all intervals is not

      empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [132) [45)

      The intersection is empty thus no scaled match in location 1 Buthellip

      Check intersectionIf intersection of all intervals is not

      empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [252) [23) [252)[7383)[252)

      The intersection is [7352) thus there is a scaled match in location 2

      Improvement ndash Parameterized Matching

      Introduced Baker 1994

      Motivation ldquocopyingrdquo code

      Parameterized Matching

      Input two strings s and t |s|=|t| over alphabets sums and sumt

      s parameterize matches t if bijection sums sumt such that (s) = t

      exist

      (a)=x

      (b)=y

      Π Π

      ΠΠ

      a ab b b

      x xy y y

      Example

      Parameterized Matching

      Claim (AFM-94)

      For Σ that can be sorted in linear time (eg Σ=1 n)

      Parameterized matching can be done in time O(n)

      The reduction

      1 Lemma for which PL matches TL at location i scaled to α only if PL p-matches TL at i

      Proof Assume PL does not p-match TL at

      location i

      The possible situations are

      Possibility 1wlog c ge a+1

      For c = a+1 (smallest possible)

      TL

      PL

      a

      b b

      cnea

      b

      a

      b

      a

      b

      a

      b

      a 211

      Possibility 2

      wlog c ge b+1

      Intersection not empty only if

      (a+1)(b+1) gt ab ie

      ab+b gt ab+a

      bgta

      But this can never happen if α ge 1

      TL

      PL

      a

      b cneb

      a

      1

      11

      1

      b

      a

      b

      a

      b

      a

      b

      a

      Algorithm for Real Scaled String Matching

      Let Pi1 Pi2 Pij be the different numbers in PL

      1 P-match PL in TL2 For each match chack intersection

      of intervals between Pi1 Pij and corresponding symbols in TL

      End Algorithm

      PL = 2 3 2 3 2 Pi1=2 Pi2=3p-matches

      TL = 5 6 5 6 5 6 10 6 10 6 10 7

      scaled match

      Example

      2133 32

      21

      3121 2232

      3121 2255

      3231

      21 3333

      Important Fact

      So there are at most O(radicm) different Pikrsquos

      Time O(n) for parameterized matching (Σ=12

      hellipn) O(radicm) verification for each location Total O(nradicm)

      mi

      j

      kP

      k

      1

      Tighter analysis

      Upper bound number of possible p-matches

      Lemma Let |P|=m |T|=n Pi1 Pi2 Pij be the different numbers in PL

      Then there are at most n2j p-matches of PL in TL

      Meaning Since verification time is O(j) per p-match the lemma implies that total verification time is

      O((n2j) middot j) = O(n)

      Proof of Lemma

      1st appearance of Pi1 Pij

      PL Pi1 Pi2 Pij

      TL a1 a2 aj

      m-match

      2

      2

      1

      ja

      j

      ki

      Lemmarsquos proof (cont)

      Let x be the total number of p-matches in the text

      The sum of all text elements that match 1st occurrences of Piklsquos in the pattern

      ge (xjsup2)2

      But There are overlaps How many

      Lemmarsquos proof (cont)

      For each text location at most j matches will count it Thereforehellip

      Total count without overlaps ge

      Clearly xmiddotj2 le n thus x le (2n)j

      2

      1

      2

      2

      xjxjj

      Open Problem

      Give 1-d algorithm linear in run-length compressed text and pattern

      • SCALED Pattern Matching
      • Motivation
      • Slide 3
      • Model
      • Types of Approximations
      • Types of Approximation
      • It seems daunting buthellip
      • CPM 2003 Morelia Mexico
      • Problem inherently inexact
      • Definition
      • Discrete exact Scaled Matching
      • Slide 12
      • Idea Fix a scale s
      • Algorithm time
      • Problem Real scales
      • Formally
      • Remark
      • Simplify definition
      • Why are definitions equivalent
      • Time
      • Definitions are Equivalent
      • Naiumlve algorithm for matching PL in TL
      • Check intersection
      • Slide 24
      • Improvement ndash Parameterized Matching
      • Parameterized Matching
      • Slide 27
      • The reduction
      • Possibility 1
      • Possibility 2
      • Algorithm for Real Scaled String Matching
      • Example
      • Important Fact
      • Tighter analysis
      • Proof of Lemma
      • Lemmarsquos proof (cont)
      • Slide 37
      • Open Problem

        Types of Approximations

        Local errors Level of detail Occlusion Noise results O(nsup2 log m) mismatches

        O(nsup2ksup2( edit distance k errors

        rectangular patterns

        O(nsup2kradic(m log m) radic(k log k)

        edit distance k errors

        half rectangular patterns

        AL-88

        AF-95

        Types of ApproximationOrientation results O(nsup2m ) FU-98

        O(nsup2msup3) ACL-98

        Scaling Natural scales results O(n) 1-d EV-88

        O(nsup2 log |Σ|) 2-d ALV-92

        O(nsup2) dictionary AC-96

        Real scales this result O(n) 1-d truncation

        5

        It seems daunting buthellip

        CPM 2003 Morelia Mexico

        Problem inherently inexact

        What if occurrence is 1frac12 times bigger

        What is the meaning of ldquofrac12 a pixelrdquo

        Solutions until now Natural Scales - Consider only discrete scales 1 2 3 4 5

        DefinitionText Pattern

        Find all occurrences of the pattern in the text in all discrete sizes

        m

        m

        n

        n

        Discrete exact Scaled MatchingT PA A A A A A A A A A A A A A A A

        A A A A A A A A A A C C A A C A

        A A A C C A A A A A C C A A A A

        A A A C C A A A A A A A A

        A A A A A A A A A A A A A

        A A A A A A A A A C C A A

        A A A A A A A A A C C A A

        A A A C C C A A A A A A A

        A A A C C C A A A A A A A

        A A A C C C A A A A C A A

        A A A A A A A A A A A A A

        A A A A A A A A A C C A C

        A A A A A A A A A A A A A

        Discrete exact Scaled Matching

        P Z U Y K V S X E T

        Psup3 Z Z Z U U U Y Y Y Z Z Z U U U Y Y Y Z Z Z U U U Y Y Y K K K V V V S S S K K K V V V S S S K K K V V V S S S X X X E E E T T T X X X E E E T T T X X X E E E T T T

        Idea Fix a scale s

        Constant amount of work for each square (s-block)

        s

        s

        nns

        Algorithm time

        Time for scale s

        Total time

        converges to a constant

        Making the total time O(nsup2)

        sn2

        2

        mn

        mn

        ss ssn n

        1

        2

        122

        2

        1

        Problem Real scales

        Was open even for stringshellipHow do we define aabcccbbScaled to 2 aaaabbccccccbbbbScaled to 1frac12 aaab cccc bbb truncate truncate frac12b frac12c

        Formally

        nTT ||

        mPrrrP aaa j

        j ||21

        21

        aaaa crrc jjjj

        121

        121

        1

        rcrc jj

        11

        Denote a aaa aProblem Definition 1Input Pattern TextOutput All text locations where

        appears for some

        r timesr

        Remark

        α ge 1 means we only scale ldquouprdquoReasons Avoid conceptual problem of

        loss of resolution

        From ldquofar enoughrdquo away everything looks the same

        By our definition for klt1m there is a match at every text location

        Simplify definition

        bcba4312 2

        323

        23

        23

        aaaa rrrrjj

        jj

        121121

        Definition 2 Look for in the textExample P=aabcccbbbb

        Match by definition 2 daaabccccbbbbbbe Match by definition 1

        but not by def 2 daaaabccccbbbbbbbe

        Why are definitions equivalent

        Split text and pattern to symbol part Ts Ps and length part TL PLExample P= aabcccbbbb Ps=abcb PL=2134 T=daaabccccbbbbbbe Ts=dabcbe TL=131461

        Time

        Time for split O(n+m)

        Finding Ps in Ts O(n+m) (eg KMP)

        HARD PART Finding PL in TL

        Definitions are Equivalent

        aa rrj

        j

        1212

        Claim Solving def 2 in time O(f(n))

        Solving def 1 in time O(f(n))Why - Find in time O(f(n)) - For each match verify 1st and last symbol in constant time in Ts and

        TLTotal time O(f(n)+n)=O(f(n))

        Naiumlve algorithm for matching PL in TL

        For each text location position pattern starting at that location and calculate interval [tp (t+1)p) for each resulting lttext patterngt pair

        This is the interval of possible scales since

        tpp = t for every α lt tp |αp| lt t(t+1)p p = t+1 for every α ge tp |αp| gt t

        Check intersectionIf intersection of all intervals is not

        empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [132) [45)

        The intersection is empty thus no scaled match in location 1 Buthellip

        Check intersectionIf intersection of all intervals is not

        empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [252) [23) [252)[7383)[252)

        The intersection is [7352) thus there is a scaled match in location 2

        Improvement ndash Parameterized Matching

        Introduced Baker 1994

        Motivation ldquocopyingrdquo code

        Parameterized Matching

        Input two strings s and t |s|=|t| over alphabets sums and sumt

        s parameterize matches t if bijection sums sumt such that (s) = t

        exist

        (a)=x

        (b)=y

        Π Π

        ΠΠ

        a ab b b

        x xy y y

        Example

        Parameterized Matching

        Claim (AFM-94)

        For Σ that can be sorted in linear time (eg Σ=1 n)

        Parameterized matching can be done in time O(n)

        The reduction

        1 Lemma for which PL matches TL at location i scaled to α only if PL p-matches TL at i

        Proof Assume PL does not p-match TL at

        location i

        The possible situations are

        Possibility 1wlog c ge a+1

        For c = a+1 (smallest possible)

        TL

        PL

        a

        b b

        cnea

        b

        a

        b

        a

        b

        a

        b

        a 211

        Possibility 2

        wlog c ge b+1

        Intersection not empty only if

        (a+1)(b+1) gt ab ie

        ab+b gt ab+a

        bgta

        But this can never happen if α ge 1

        TL

        PL

        a

        b cneb

        a

        1

        11

        1

        b

        a

        b

        a

        b

        a

        b

        a

        Algorithm for Real Scaled String Matching

        Let Pi1 Pi2 Pij be the different numbers in PL

        1 P-match PL in TL2 For each match chack intersection

        of intervals between Pi1 Pij and corresponding symbols in TL

        End Algorithm

        PL = 2 3 2 3 2 Pi1=2 Pi2=3p-matches

        TL = 5 6 5 6 5 6 10 6 10 6 10 7

        scaled match

        Example

        2133 32

        21

        3121 2232

        3121 2255

        3231

        21 3333

        Important Fact

        So there are at most O(radicm) different Pikrsquos

        Time O(n) for parameterized matching (Σ=12

        hellipn) O(radicm) verification for each location Total O(nradicm)

        mi

        j

        kP

        k

        1

Tighter analysis

Upper bound on the number of possible p-matches.

Lemma: Let |P| = m, |T| = n, and let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL. Then there are at most $\frac{2n}{j}$ p-matches of PL in TL.

Meaning: Since verification time is O(j) per p-match, the lemma implies that total verification time is

$O(\frac{2n}{j} \cdot j) = O(n)$

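Proof of Lemma

Consider the 1st appearance of each of $P_{i_1}, \ldots, P_{i_j}$ in PL and the text numbers $a_1, a_2, \ldots, a_j$ aligned with them in a p-match:

PL: $P_{i_1}$ … $P_{i_2}$ … $P_{i_j}$

TL: $a_1$ … $a_2$ … $a_j$

Because a p-match is a bijection, $a_1, \ldots, a_j$ are j different positive integers, so

$\sum_{k=1}^{j} a_k \ge \frac{j^2}{2}$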

Lemma's proof (cont.)

Let x be the total number of p-matches in the text.

The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern, counted once per p-match, is

$\ge \frac{x j^2}{2}$

But: there are overlaps. How many?

Lemma's proof (cont.)

For each text location, at most j matches will count it. Therefore…

Total count without overlaps $\ge \frac{1}{j} \cdot \frac{x j^2}{2} = \frac{x j}{2}$

Clearly $\frac{x \cdot j}{2} \le n$, thus $x \le \frac{2n}{j}$.

Open Problem

Give a 1-d algorithm that is linear in the size of the run-length compressed text and pattern.



Discrete exact Scaled Matching

P =

Z U Y
K V S
X E T

P³ =

Z Z Z U U U Y Y Y
Z Z Z U U U Y Y Y
Z Z Z U U U Y Y Y
K K K V V V S S S
K K K V V V S S S
K K K V V V S S S
X X X E E E T T T
X X X E E E T T T
X X X E E E T T T
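Discrete scaling of a 2-d pattern replaces every symbol by an s × s block of that symbol. A minimal sketch, with the hypothetical helper name `scale_2d`, that rebuilds the example from the 3 × 3 pattern:

```python
def scale_2d(pattern, s):
    """Blow each symbol of a 2-d pattern up into an s x s block."""
    scaled = []
    for row in pattern:
        # Repeat every symbol s times horizontally ...
        wide = [sym for sym in row for _ in range(s)]
        # ... and repeat the widened row s times vertically.
        scaled.extend(list(wide) for _ in range(s))
    return scaled


P = [list("ZUY"), list("KVS"), list("XET")]
for row in scale_2d(P, 3):
    print(" ".join(row))
```

An m × m pattern scaled to s has sm × sm symbols, which is why scales are only tried while sm ≤ n.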

Idea: Fix a scale s

Constant amount of work for each square (s-block).

(Figure: the n × n text divided into s × s blocks, $\frac{n}{s}$ blocks along each side.)

Algorithm time

Time for scale s: $O(\frac{n^2}{s^2})$

Total time: $\sum_{s=1}^{n/m} \frac{n^2}{s^2} = n^2 \sum_{s=1}^{n/m} \frac{1}{s^2}$

The sum $\sum_{s} \frac{1}{s^2}$ converges to a constant,

making the total time O(n²).
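The constant can be made explicit. Extending the sum to all s ≥ 1 only increases it, and the infinite series has the well-known value π²/6:

```latex
\sum_{s=1}^{n/m} \frac{n^2}{s^2}
\;\le\; n^2 \sum_{s=1}^{\infty} \frac{1}{s^2}
\;=\; \frac{\pi^2}{6}\, n^2
\;<\; 1.65\, n^2 .
```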

Problem: Real scales

Was open even for strings…

How do we define?

aabcccbb

Scaled to 2: aaaabbccccccbbbb

Scaled to 1½: aaa b cccc bbb (truncate ½b, truncate ½c)
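The truncation rule in the example can be sketched as: split the string into runs, multiply each run length by the scale, and round down. The function name `scale_string` is hypothetical, and exact fractions are an implementation choice made here to avoid floating-point rounding at run boundaries.

```python
from fractions import Fraction
from itertools import groupby
from math import floor


def scale_string(text, alpha):
    """Scale every run a^r of text to a^floor(alpha * r)."""
    alpha = Fraction(alpha)
    pieces = []
    for sym, run in groupby(text):
        r = sum(1 for _ in run)               # length of this run
        pieces.append(sym * floor(alpha * r))  # truncate the fractional part
    return "".join(pieces)


print(scale_string("aabcccbb", 2))                # aaaabbccccccbbbb
print(scale_string("aabcccbb", Fraction(3, 2)))   # aaabccccbbb
```

With scale 1½ the run b¹ becomes ⌊1.5⌋ = 1 symbol and c³ becomes ⌊4.5⌋ = 4 symbols, which is exactly the "truncate ½b, truncate ½c" of the example.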

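Formally

Denote $a^r = \underbrace{a a \cdots a}_{r \text{ times}}$

Problem Definition 1

Input: Pattern $P = a_1^{r_1} a_2^{r_2} \cdots a_j^{r_j}$, |P| = m; Text T, |T| = n.

Output: All text locations where

$a_1^{c_1} a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor} a_j^{c_j}$

appears for some α ≥ 1, where $\lfloor \alpha r_1 \rfloor \le c_1$ and $\lfloor \alpha r_j \rfloor \le c_j$.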

Remark

α ≥ 1 means we only scale "up".

Reasons:

Avoid the conceptual problem of loss of resolution.

From "far enough" away everything looks the same.

By our definition, for k < 1/m there is a match at every text location.

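Simplify definition

Definition 2: Look for $a_1^{\lfloor \alpha r_1 \rfloor} a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor} a_j^{\lfloor \alpha r_j \rfloor}$ in the text.

Example: P = aabcccbbbb, scaled to $\frac{3}{2}$: $a^{\lfloor \frac{3}{2} \cdot 2 \rfloor} b^{\lfloor \frac{3}{2} \cdot 1 \rfloor} c^{\lfloor \frac{3}{2} \cdot 3 \rfloor} b^{\lfloor \frac{3}{2} \cdot 4 \rfloor}$

Match by definition 2: daaabccccbbbbbbe

Match by definition 1 but not by def 2: daaaabccccbbbbbbbe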

Why are definitions equivalent?

Split text and pattern into a symbol part (Ts, Ps) and a length part (TL, PL).

Example:

P = aabcccbbbb, Ps = abcb, PL = 2 1 3 4

T = daaabccccbbbbbbe, Ts = dabcbe, TL = 1 3 1 4 6 1
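The split is a run-length encoding with the symbols and the lengths kept in two separate sequences. A minimal sketch using the standard library's `itertools.groupby`; the helper name `split_runs` is hypothetical.

```python
from itertools import groupby


def split_runs(text):
    """Return (symbol part, length part) of the run-length encoding of text."""
    symbols, lengths = [], []
    for sym, run in groupby(text):
        symbols.append(sym)
        lengths.append(sum(1 for _ in run))
    return "".join(symbols), lengths


print(split_runs("aabcccbbbb"))        # ('abcb', [2, 1, 3, 4])
print(split_runs("daaabccccbbbbbbe"))  # ('dabcbe', [1, 3, 1, 4, 6, 1])
```

The split takes a single pass over each string, in line with the O(n+m) bound quoted for this step.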

Time

Time for split: O(n+m)

Finding Ps in Ts: O(n+m) (e.g. KMP)

HARD PART: Finding PL in TL

Definitions are Equivalent

Claim: Solving def 2 in time O(f(n)) ⇒ solving def 1 in time O(f(n)).

Why?

- Find $a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor}$ in time O(f(n)).
- For each match, verify the 1st and last symbol in constant time in Ts and TL.

Total time: O(f(n) + n) = O(f(n))

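Naïve algorithm for matching PL in TL

For each text location, position the pattern starting at that location and calculate the interval $[\frac{t}{p}, \frac{t+1}{p})$ for each resulting <text, pattern> pair.

This is the interval of possible scales since

$\lfloor \frac{t}{p} \cdot p \rfloor = t$, and for every $\alpha < \frac{t}{p}$, $\lfloor \alpha p \rfloor < t$;

$\lfloor \frac{t+1}{p} \cdot p \rfloor = t+1$, and for every $\alpha \ge \frac{t+1}{p}$, $\lfloor \alpha p \rfloor > t$.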

Check intersection

If the intersection of all intervals is not empty, then there is a match.

Time: O(nm)

Example

PL = 2 1 2 3 2

TL = 2 4 2 4 7 4 5 3

Location 1: the first two intervals are $[1, \frac{3}{2})$ and $[4, 5)$. The intersection is empty, thus no scaled match in location 1. But…

Location 2: the intervals are $[2, \frac{5}{2})$, $[2, 3)$, $[2, \frac{5}{2})$, $[\frac{7}{3}, \frac{8}{3})$, $[2, \frac{5}{2})$. The intersection is $[\frac{7}{3}, \frac{5}{2})$, thus there is a scaled match in location 2.
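The naïve check can be sketched as a running intersection of half-open intervals, one per <text, pattern> pair. This is an illustrative sketch, not the authors' code: the names `scale_interval` and `naive_scaled_matches` are hypothetical, PL is assumed non-empty, every run is required to match exactly, and the lower end is clamped to 1 because only scales α ≥ 1 are allowed.

```python
from fractions import Fraction


def scale_interval(pl, tl, i):
    """Intersect [t/p, (t+1)/p) over all pairs at 0-based index i; None if empty."""
    lo, hi = Fraction(1), None  # alpha >= 1 is required
    for k, p in enumerate(pl):
        t = tl[i + k]
        lo = max(lo, Fraction(t, p))
        upper = Fraction(t + 1, p)
        hi = upper if hi is None else min(hi, upper)
    return (lo, hi) if lo < hi else None


def naive_scaled_matches(pl, tl):
    """Map each 1-based text location with a scaled match to its scale interval."""
    found = {}
    for i in range(len(tl) - len(pl) + 1):
        interval = scale_interval(pl, tl, i)
        if interval is not None:
            found[i + 1] = interval
    return found


PL = [2, 1, 2, 3, 2]
TL = [2, 4, 2, 4, 7, 4, 5, 3]
print(naive_scaled_matches(PL, TL))
```

For the example this reports location 2 only, with interval [7/3, 5/2). Each location costs O(m) interval updates, which gives the O(nm) total stated above.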

Improvement – Parameterized Matching

Introduced by Baker, 1994

Motivation: "copying" code

Parameterized Matching

Input: two strings s and t, |s| = |t|, over alphabets $\Sigma_s$ and $\Sigma_t$.

s parameterize-matches t if there exists a bijection $\Pi : \Sigma_s \to \Sigma_t$ such that $\Pi(s) = t$.

Example

s = a a b b b

t = x x y y y

$\Pi(a) = x$, $\Pi(b) = y$




                Problem inherently inexact

                What if occurrence is 1frac12 times bigger

                What is the meaning of ldquofrac12 a pixelrdquo

                Solutions until now Natural Scales - Consider only discrete scales 1 2 3 4 5

                DefinitionText Pattern

                Find all occurrences of the pattern in the text in all discrete sizes

                m

                m

                n

                n

                Discrete exact Scaled Matching

                T:
                A A A A A A A A A A A A A
                A A A A A A A A A A C C A
                A A A C C A A A A A C C A
                A A A C C A A A A A A A A
                A A A A A A A A A A A A A
                A A A A A A A A A C C A A
                A A A A A A A A A C C A A
                A A A C C C A A A A A A A
                A A A C C C A A A A A A A
                A A A C C C A A A A C A A
                A A A A A A A A A A A A A
                A A A A A A A A A C C A C
                A A A A A A A A A A A A A

                P:
                A A A
                A C A
                A A A

                Discrete exact Scaled Matching

                P:
                Z U Y
                K V S
                X E T

                P³ (P scaled to 3):
                Z Z Z U U U Y Y Y
                Z Z Z U U U Y Y Y
                Z Z Z U U U Y Y Y
                K K K V V V S S S
                K K K V V V S S S
                K K K V V V S S S
                X X X E E E T T T
                X X X E E E T T T
                X X X E E E T T T
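The blow-up from P to P³ can be sketched in a few lines. This is only an illustration of discrete scaling, not code from the talk; the function name `scale_pattern` and the list-of-lists representation are assumptions made for the example.

```python
def scale_pattern(pattern, s):
    """Scale a 2-d pattern by the natural number s.

    Every symbol of the pattern becomes an s x s block of that symbol.
    `pattern` is a list of rows, each row a list of symbols (an assumed
    representation, chosen for this sketch).
    """
    if s < 1:
        raise ValueError("discrete scales are the natural numbers 1, 2, 3, ...")
    scaled = []
    for row in pattern:
        # Repeat every symbol s times horizontally ...
        wide_row = [symbol for symbol in row for _ in range(s)]
        # ... and repeat the widened row s times vertically.
        scaled.extend(list(wide_row) for _ in range(s))
    return scaled


P = [list("ZUY"), list("KVS"), list("XET")]
P3 = scale_pattern(P, 3)
for row in P3:
    print(" ".join(row))
```

Running it prints the 9 × 9 grid shown on the slide.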

                Idea: Fix a scale s

                Constant amount of work for each square (s-block)

                [Figure: the n × n text divided into s × s blocks, n/s blocks along each side]

                Algorithm time

                Time for scale s: $O(n^2/s^2)$

                Total time: $\sum_{s=1}^{n/m} \frac{n^2}{s^2} = n^2 \sum_{s=1}^{n/m} \frac{1}{s^2}$

                The sum $\sum_{s} \frac{1}{s^2}$ converges to a constant,

                making the total time $O(n^2)$
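The convergence claim can be checked with a standard telescoping bound. The derivation below is a supporting sketch, not part of the original slides; it only assumes the per-scale cost of $n^2/s^2$ stated above.

```latex
\sum_{s=1}^{n/m} \frac{1}{s^2}
  \;\le\; 1 + \sum_{s \ge 2} \frac{1}{s(s-1)}
  \;=\; 1 + \sum_{s \ge 2} \left( \frac{1}{s-1} - \frac{1}{s} \right)
  \;=\; 2,
\qquad\text{hence}\qquad
\sum_{s=1}^{n/m} \frac{n^2}{s^2} \;\le\; 2n^2 = O(n^2).
```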

                Problem: Real scales

                Was open even for strings…

                How do we define it?

                aabcccbb

                Scaled to 2: aaaabbccccccbbbb

                Scaled to 1½: aaa b cccc bbb (truncate ½ b, truncate ½ c)
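The truncation rule in this example can be sketched as follows: each run of r equal symbols is replaced by a run of ⌊αr⌋ symbols. The function name `scale_string` is an assumption for illustration, and exact rational arithmetic is used so that a scale like 3/2 is not affected by floating-point rounding.

```python
from fractions import Fraction
from itertools import groupby
from math import floor


def scale_string(s, alpha):
    """Scale string s by alpha: every run of length r becomes a run of floor(alpha * r)."""
    alpha = Fraction(alpha)
    pieces = []
    for symbol, run in groupby(s):
        run_length = sum(1 for _ in run)
        # Truncate the scaled run length to an integer number of symbols.
        pieces.append(symbol * floor(alpha * run_length))
    return "".join(pieces)


print(scale_string("aabcccbb", 2))               # every run doubled
print(scale_string("aabcccbb", Fraction(3, 2)))  # half symbols truncated
```

With scale 3/2 the runs 2, 1, 3, 2 become 3, 1, 4, 3, which is the string aaabccccbbb from the slide.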

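                Formally

                Denote $a^r = \underbrace{aa \cdots a}_{r \text{ times}}$

                Problem Definition 1

                Input: Pattern $P = a_1^{r_1} a_2^{r_2} \cdots a_j^{r_j}$, $|P| = m$. Text $T$, $|T| = n$.

                Output: All text locations where $a_1^{c_1} a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor} a_j^{c_j}$ appears for some $\alpha \ge 1$, where $c_1 \ge \lfloor \alpha r_1 \rfloor$ and $c_j \ge \lfloor \alpha r_j \rfloor$.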

                Remark

                α ≥ 1 means we only scale "up". Reasons:

                Avoid the conceptual problem of loss of resolution.

                From "far enough" away everything looks the same.

                By our definition, for a scale k < 1/m every run truncates to length 0, so there is a match at every text location.

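                Simplify definition

                Definition 2: Look for $a_1^{\lfloor \alpha r_1 \rfloor} a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor} a_j^{\lfloor \alpha r_j \rfloor}$ in the text.

                Example: P = aabcccbbbb, scaled to 3/2: $a^{\lfloor \frac{3}{2} \cdot 2 \rfloor} b^{\lfloor \frac{3}{2} \cdot 1 \rfloor} c^{\lfloor \frac{3}{2} \cdot 3 \rfloor} b^{\lfloor \frac{3}{2} \cdot 4 \rfloor}$

                Match by definition 2: daaabccccbbbbbbe

                Match by definition 1 but not by def 2: daaaabccccbbbbbbbe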

                Why are definitions equivalent

                Split text and pattern into a symbol part (Ts, Ps) and a length part (TL, PL).

                Example:
                P = aabcccbbbb, Ps = abcb, PL = 2 1 3 4
                T = daaabccccbbbbbbe, Ts = dabcbe, TL = 1 3 1 4 6 1
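The split into a symbol part and a length part is a run-length encoding. A minimal sketch, with the function name `split_runs` assumed for illustration:

```python
from itertools import groupby


def split_runs(s):
    """Split s into its symbol part (one symbol per run) and its length part (run lengths)."""
    symbols = []
    lengths = []
    for symbol, run in groupby(s):
        symbols.append(symbol)
        lengths.append(sum(1 for _ in run))
    return "".join(symbols), lengths


Ps, PL = split_runs("aabcccbbbb")
Ts, TL = split_runs("daaabccccbbbbbbe")
print(Ps, PL)
print(Ts, TL)
```

One left-to-right pass over each string suffices, which is consistent with the O(n+m) split time quoted on the next slide.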

                Time

                Time for split: O(n+m)

                Finding Ps in Ts: O(n+m) (e.g. KMP)

                HARD PART: Finding PL in TL

                Definitions are Equivalent

                Claim: Solving def 2 in time O(f(n)) ⇒ solving def 1 in time O(f(n))

                Why:
                - Find $a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor}$ in time O(f(n))
                - For each match, verify the 1st and last symbol in constant time in Ts and TL

                Total time: O(f(n) + n) = O(f(n))

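                Naïve algorithm for matching PL in TL

                For each text location, position the pattern starting at that location and calculate the interval $[\frac{t}{p}, \frac{t+1}{p})$ for each resulting <text, pattern> pair (t, p).

                This is the interval of possible scales, since

                $\lfloor \frac{t}{p} \cdot p \rfloor = t$, and for every $\alpha < \frac{t}{p}$: $\lfloor \alpha p \rfloor < t$

                $\lfloor \frac{t+1}{p} \cdot p \rfloor = t+1$, and for every $\alpha \ge \frac{t+1}{p}$: $\lfloor \alpha p \rfloor > t$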

                Check intersection

                If the intersection of all intervals is not empty, then there is a match.

                Time: O(nm)

                Example:
                PL = 2 1 2 3 2
                TL = 2 4 2 4 7 4 5 3

                Location 1: [1, 3/2), [4, 5)
                The intersection is empty, thus no scaled match in location 1. But…

                Location 2: [2, 5/2), [2, 3), [2, 5/2), [7/3, 8/3), [2, 5/2)
                The intersection is [7/3, 5/2), thus there is a scaled match in location 2.
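The naïve O(nm) procedure can be sketched directly from the interval rule. The function names are assumptions for illustration, locations are 0-based here (the slides count from 1), and the lower bound α ≥ 1 from the earlier remark is applied explicitly. Every run, including the first and last, is matched exactly, as in the length-part problem.

```python
from fractions import Fraction


def scale_interval(text_runs, pattern_runs, start):
    """Intersect the intervals [t/p, (t+1)/p) over all aligned <text, pattern> pairs.

    Returns (lo, hi), meaning every alpha in [lo, hi) works, or None if empty.
    """
    lo = Fraction(1)  # only scaling "up" is allowed: alpha >= 1
    hi = None         # no upper bound seen yet
    for offset, p in enumerate(pattern_runs):
        t = text_runs[start + offset]
        lo = max(lo, Fraction(t, p))
        upper = Fraction(t + 1, p)
        hi = upper if hi is None else min(hi, upper)
    if hi is None or lo >= hi:
        return None
    return lo, hi


def naive_scaled_matches(text_runs, pattern_runs):
    """Try every text location; O(n * m) pairs are examined in total."""
    found = {}
    for start in range(len(text_runs) - len(pattern_runs) + 1):
        interval = scale_interval(text_runs, pattern_runs, start)
        if interval is not None:
            found[start] = interval
    return found


TL = [2, 4, 2, 4, 7, 4, 5, 3]
PL = [2, 1, 2, 3, 2]
matches = naive_scaled_matches(TL, PL)
print(matches)
```

On the slide's example the only match is at index 1 (location 2 on the slide), with the scale interval [7/3, 5/2).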

                Improvement: Parameterized Matching

                Introduced: Baker 1994

                Motivation: "copying" code

                Parameterized Matching

                Input: two strings s and t, |s| = |t|, over alphabets $\Sigma_s$ and $\Sigma_t$.

                s parameterize-matches t if there exists a bijection $\Pi : \Sigma_s \to \Sigma_t$ such that $\Pi(s) = t$.

                Example:
                s = a a b b b
                t = x x y y y
                $\Pi(a) = x$, $\Pi(b) = y$
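Checking whether two equal-length strings parameterize-match amounts to building the bijection symbol by symbol and rejecting on the first conflict. A minimal sketch, with the name `p_match` assumed for illustration; it tests a single aligned pair of strings and is not the linear-time matching algorithm cited on the next slide.

```python
def p_match(s, t):
    """Return True if a bijection Pi between the symbols of s and t gives Pi(s) == t."""
    if len(s) != len(t):
        return False
    forward = {}   # symbol of s -> symbol of t
    backward = {}  # symbol of t -> symbol of s (keeps the map one-to-one)
    for a, b in zip(s, t):
        if forward.setdefault(a, b) != b:
            return False  # a was already mapped to a different symbol
        if backward.setdefault(b, a) != a:
            return False  # b is already the image of a different symbol
    return True


print(p_match("aabbb", "xxyyy"))
print(p_match("aabbb", "xxyyx"))
```

The second dictionary is what distinguishes a bijection from an arbitrary mapping: without it, "ab" would wrongly p-match "xx".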

                Parameterized Matching

                Claim (AFM-94):

                For Σ that can be sorted in linear time (e.g. Σ = {1, …, n}),

                parameterized matching can be done in time O(n).

                The reduction

                Lemma: There exists $\alpha \ge 1$ for which PL matches TL at location i scaled to $\alpha$ only if PL p-matches TL at i.

                Proof: Assume PL does not p-match TL at location i.

                The possible situations are:

                Possibility 1

                TL: a … c≠a
                PL: b … b

                wlog c ≥ a+1

                For c = a+1 (smallest possible) the intervals are

                $[\frac{a}{b}, \frac{a+1}{b})$ and $[\frac{a+1}{b}, \frac{a+2}{b})$,

                which do not intersect.

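                Possibility 2

                TL: a … a
                PL: b … c≠b

                wlog c ≥ b+1

                For c = b+1 the intervals are $[\frac{a}{b}, \frac{a+1}{b})$ and $[\frac{a}{b+1}, \frac{a+1}{b+1})$

                Intersection not empty only if

                $\frac{a+1}{b+1} > \frac{a}{b}$, i.e.

                $ab + b > ab + a$

                $b > a$

                But this can never happen if α ≥ 1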

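                Algorithm for Real Scaled String Matching

                Let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL.

                1. P-match PL in TL
                2. For each match, check the intersection of intervals between $P_{i_1}, \ldots, P_{i_j}$ and the corresponding symbols in TL

                End Algorithm

                Example

                PL = 2 3 2 3 2, $P_{i_1} = 2$, $P_{i_2} = 3$

                TL = 5 6 5 6 5 6 10 6 10 6 10 7

                p-matches:
                5 6 5 6 5: [5/2, 3) ∩ [2, 7/3) = ∅
                6 5 6 5 6: [3, 7/2) ∩ [5/3, 2) = ∅
                6 10 6 10 6: [3, 7/2) ∩ [10/3, 11/3) = [10/3, 7/2), scaled match
                10 6 10 6 10: [5, 11/2) ∩ [2, 7/3) = ∅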
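The two steps of the algorithm can be sketched on this example. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, locations are 0-based, and the p-matching step is done by a direct check at every location rather than by the linear-time parameterized matching algorithm the slides cite. What the sketch does show is that verification only needs one interval per distinct pattern number.

```python
from fractions import Fraction


def p_match_at(text_runs, pattern_runs, start):
    """Return the map {pattern number: text number} if PL p-matches TL at start, else None."""
    forward, backward = {}, {}
    for offset, p in enumerate(pattern_runs):
        t = text_runs[start + offset]
        if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
            return None
    return forward


def real_scaled_matches(text_runs, pattern_runs):
    """Step 1: collect p-matches. Step 2: intersect one interval per distinct pattern number.

    Assumes a non-empty pattern.
    """
    p_matches = []
    scaled = {}
    for start in range(len(text_runs) - len(pattern_runs) + 1):
        mapping = p_match_at(text_runs, pattern_runs, start)
        if mapping is None:
            continue
        p_matches.append(start)
        # Only the j distinct pattern numbers need checking, not all m positions.
        lo = max([Fraction(1)] + [Fraction(t, p) for p, t in mapping.items()])
        hi = min(Fraction(t + 1, p) for p, t in mapping.items())
        if lo < hi:
            scaled[start] = (lo, hi)
    return p_matches, scaled


PL = [2, 3, 2, 3, 2]
TL = [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]
p_matches, scaled = real_scaled_matches(TL, PL)
print(p_matches)
print(scaled)
```

Of the four p-matches, only the one aligned with 6 10 6 10 6 survives verification, with scale interval [10/3, 7/2).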

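                Important Fact

                $\sum_{k=1}^{j} P_{i_k} \le m$

                So there are at most $O(\sqrt{m})$ different $P_{i_k}$'s (j distinct positive integers sum to at least $j(j+1)/2$, hence $j < \sqrt{2m}$).

                Time: O(n) for parameterized matching ($\Sigma = \{1, 2, \ldots, n\}$); $O(\sqrt{m})$ verification for each location.

                Total: $O(n\sqrt{m})$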

                Tighter analysis

                Upper bound the number of possible p-matches.

                Lemma: Let |P| = m, |T| = n, and let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL.

                Then there are at most $2n/j$ p-matches of PL in TL.

                Meaning: Since verification time is O(j) per p-match, the lemma implies that total verification time is

                $O((2n/j) \cdot j) = O(n)$

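                Proof of Lemma

                Consider the 1st appearance of each of $P_{i_1}, \ldots, P_{i_j}$ in PL, and the text elements aligned with them in a p-match:

                PL: $P_{i_1}$ $P_{i_2}$ … $P_{i_j}$
                TL: $a_1$ $a_2$ … $a_j$

                $\sum_{k=1}^{j} a_k \ge \frac{j^2}{2}$ (the $a_k$ are j distinct positive integers)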

                Lemma's proof (cont.)

                Let x be the total number of p-matches in the text.

                The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern $\ge \frac{x j^2}{2}$

                But: there are overlaps. How many?

                Lemma's proof (cont.)

                For each text location, at most j matches will count it. Therefore…

                Total count without overlaps $\ge \frac{x j^2}{2} \cdot \frac{1}{j} = \frac{x j}{2}$

                Clearly $\frac{x j}{2} \le n$, thus $x \le \frac{2n}{j}$

                Open Problem

                Give a 1-d algorithm whose running time is linear in the size of the run-length compressed text and pattern.






                      Discrete exact Scaled Matching

                      P Z U Y K V S X E T

                      Psup3 Z Z Z U U U Y Y Y Z Z Z U U U Y Y Y Z Z Z U U U Y Y Y K K K V V V S S S K K K V V V S S S K K K V V V S S S X X X E E E T T T X X X E E E T T T X X X E E E T T T

Idea: Fix a scale s

Constant amount of work for each square (s-block).

(Figure: the n × n text partitioned into s × s blocks, n/s blocks per side.)

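Algorithm time

Time for scale $s$: $O\left(\frac{n^2}{s^2}\right)$

Total time: $\sum_{s=1}^{n/m} \frac{n^2}{s^2} = n^2 \sum_{s=1}^{n/m} \frac{1}{s^2}$

The sum $\sum_{s \ge 1} \frac{1}{s^2}$ converges to a constant, making the total time $O(n^2)$.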

Problem: Real scales

Was open even for strings…

How do we define?

aabcccbb
Scaled to 2: aaaabbccccccbbbb
Scaled to 1½: aaa b cccc bbb (the ½ b and the ½ c are truncated)
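A minimal Python sketch of scaling with truncation, applied to the string above. The name `scale_string` is hypothetical, and exact rationals (`fractions.Fraction`) are used instead of floats so that the truncation is not affected by rounding; that choice is an assumption of this sketch, not something the slides specify.

```python
from fractions import Fraction
from itertools import groupby
from math import floor


def scale_string(pattern, alpha):
    """Scale every run of equal symbols by alpha, truncating to an integer."""
    alpha = Fraction(alpha)
    pieces = []
    for symbol, run in groupby(pattern):
        run_length = sum(1 for _ in run)
        # A run of length r becomes a run of length floor(alpha * r).
        pieces.append(symbol * floor(alpha * run_length))
    return "".join(pieces)


print(scale_string("aabcccbb", 2))
print(scale_string("aabcccbb", Fraction(3, 2)))
```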

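Formally

Text $T$, $|T| = n$

Pattern $P = a_1^{r_1} a_2^{r_2} \cdots a_j^{r_j}$, $|P| = m$

Denote $a^r = a\,a \cdots a$ ($r$ times)

Problem Definition 1
Input: Pattern $P$, Text $T$
Output: All text locations where $a_1^{c_1} a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor} a_j^{c_j}$ appears, for some $\alpha \ge 1$, with $c_1 \ge \lfloor \alpha r_1 \rfloor$ and $c_j \ge \lfloor \alpha r_j \rfloor$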

Remark

$\alpha \ge 1$ means we only scale "up".

Reasons:
• Avoid the conceptual problem of loss of resolution: from "far enough" away everything looks the same.
• By our definition, for $k < 1/m$ there is a match at every text location.

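Simplify definition

Definition 2: Look for $a_1^{\lfloor \alpha r_1 \rfloor} a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor} a_j^{\lfloor \alpha r_j \rfloor}$ in the text, where every scaled run, including the first and the last, must coincide with a whole run of the text.

Example: P = aabcccbbbb, scaled to $\frac{3}{2}$:
$a^{\lfloor \frac{3}{2} \cdot 2 \rfloor} b^{\lfloor \frac{3}{2} \cdot 1 \rfloor} c^{\lfloor \frac{3}{2} \cdot 3 \rfloor} b^{\lfloor \frac{3}{2} \cdot 4 \rfloor} = a^3 b^1 c^4 b^6$

Match by definition 2: daaabccccbbbbbbe

Match by definition 1 but not by def 2: daaaabccccbbbbbbbe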

Why are definitions equivalent?

Split text and pattern into a symbol part (Ts, Ps) and a length part (TL, PL).

Example:
P = aabcccbbbb, Ps = abcb, PL = 2 1 3 4
T = daaabccccbbbbbbe, Ts = dabcbe, TL = 1 3 1 4 6 1
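The split into a symbol part and a length part is a run-length encoding. A minimal Python sketch follows; the function name `split_runs` is an assumption made for illustration.

```python
from itertools import groupby


def split_runs(s):
    """Split a string into its symbol part and its length part.

    Returns (symbols, lengths): one symbol and one run length per maximal
    run of equal characters, in order of appearance.
    """
    symbols = []
    lengths = []
    for symbol, run in groupby(s):
        symbols.append(symbol)
        lengths.append(sum(1 for _ in run))
    return "".join(symbols), lengths


print(split_runs("aabcccbbbb"))
print(split_runs("daaabccccbbbbbbe"))
```

The single left-to-right pass takes time linear in the input, in line with the O(n+m) bound for the split.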

Time

Time for split: $O(n+m)$

Finding Ps in Ts: $O(n+m)$ (e.g. KMP)

HARD PART: Finding PL in TL

Definitions are Equivalent

Claim: Solving def 2 in time $O(f(n))$ $\Rightarrow$ solving def 1 in time $O(f(n))$.

Why?
• Find $a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor}$ in time $O(f(n))$.
• For each match, verify the 1st and last symbol in constant time in Ts and TL.

Total time: $O(f(n) + n) = O(f(n))$

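Naïve algorithm for matching PL in TL

For each text location, position the pattern starting at that location and calculate the interval $\left[\frac{t}{p}, \frac{t+1}{p}\right)$ for each resulting ⟨text, pattern⟩ pair $(t, p)$.

This is the interval of possible scales, since:
• $\left\lfloor \frac{t}{p} \cdot p \right\rfloor = t$, and for every $\alpha < \frac{t}{p}$, $\lfloor \alpha p \rfloor < t$
• $\left\lfloor \frac{t+1}{p} \cdot p \right\rfloor = t+1$, and for every $\alpha \ge \frac{t+1}{p}$, $\lfloor \alpha p \rfloor > t$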

Check intersection

If the intersection of all intervals is not empty then there is a match.

Time: $O(nm)$

Example:
PL = 2 1 2 3 2
TL = 2 4 2 4 7 4 5 3

Location 1: the first two intervals are $\left[1, \frac{3}{2}\right)$ and $[4, 5)$. The intersection is empty, thus no scaled match in location 1. But…

Location 2: $\left[2, \frac{5}{2}\right)$, $[2, 3)$, $\left[2, \frac{5}{2}\right)$, $\left[\frac{7}{3}, \frac{8}{3}\right)$, $\left[2, \frac{5}{2}\right)$. The intersection is $\left[\frac{7}{3}, \frac{5}{2}\right)$, thus there is a scaled match in location 2.
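The naïve interval check can be sketched in Python as follows. The function names are hypothetical, locations are 0-based here (the slides count from 1), and exact fractions are an implementation choice of this sketch.

```python
from fractions import Fraction


def scale_interval(text_lengths, pattern_lengths, start):
    """Intersect the intervals [t/p, (t+1)/p) for the window beginning at `start`.

    Returns (lo, hi) if some scale alpha >= 1 with lo <= alpha < hi works,
    otherwise None. Assumes the window fits inside text_lengths.
    """
    lo = Fraction(1)  # only scales alpha >= 1 are allowed
    hi = None
    window = text_lengths[start:start + len(pattern_lengths)]
    for p, t in zip(pattern_lengths, window):
        lo = max(lo, Fraction(t, p))
        upper = Fraction(t + 1, p)
        hi = upper if hi is None else min(hi, upper)
        if lo >= hi:
            return None  # intersection already empty
    return lo, hi


def naive_scaled_matches(text_lengths, pattern_lengths):
    """Return all 0-based locations where the interval intersection is non-empty."""
    last_start = len(text_lengths) - len(pattern_lengths)
    return [i for i in range(last_start + 1)
            if scale_interval(text_lengths, pattern_lengths, i) is not None]


PL = [2, 1, 2, 3, 2]
TL = [2, 4, 2, 4, 7, 4, 5, 3]
print(naive_scaled_matches(TL, PL))
print(scale_interval(TL, PL, 1))
```

Each window costs O(m) interval updates, which gives the O(nm) total stated on the slide.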

Improvement – Parameterized Matching

Introduced: Baker 1994

Motivation: "copying" code

Parameterized Matching

Input: two strings $s$ and $t$, $|s| = |t|$, over alphabets $\Sigma_s$ and $\Sigma_t$.

$s$ parameterize-matches $t$ if there exists a bijection $\Pi : \Sigma_s \to \Sigma_t$ such that $\Pi(s) = t$.

Example:
s = a a b b b
t = x x y y y
$\Pi(a) = x$, $\Pi(b) = y$
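Checking whether two equal-length strings parameterize-match amounts to building the bijection symbol by symbol. The sketch below is a direct check of the definition for a single pair of strings, with a hypothetical function name; it is not the linear-time text-searching algorithm cited on the next slide.

```python
def p_matches(s, t):
    """Return True if some bijection between the alphabets maps s onto t."""
    if len(s) != len(t):
        return False
    forward = {}   # symbol of s -> symbol of t
    backward = {}  # symbol of t -> symbol of s
    for a, b in zip(s, t):
        # setdefault records the pairing on first sight and returns the
        # previously recorded partner otherwise.
        if forward.setdefault(a, b) != b:
            return False  # a would need two different images
        if backward.setdefault(b, a) != a:
            return False  # two different symbols would share the image b
    return True


print(p_matches("aabbb", "xxyyy"))
print(p_matches("aabbb", "xxyyx"))
```

The same function works on lists of integers, which is how it would be applied to PL and a window of TL.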

Parameterized Matching

Claim (AFM-94): For $\Sigma$ that can be sorted in linear time (e.g. $\Sigma = \{1, \ldots, n\}$), parameterized matching can be done in time $O(n)$.

The reduction

Lemma: There exists $\alpha \ge 1$ for which PL matches TL at location $i$ scaled to $\alpha$ only if PL p-matches TL at $i$.

Proof: Assume PL does not p-match TL at location $i$.

The possible situations are:

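Possibility 1

(Figure: two positions where PL holds the same value $b$, while TL holds $a$ and $c \ne a$.)

wlog $c \ge a+1$.

For $c = a+1$ (smallest possible) the two intervals are $\left[\frac{a}{b}, \frac{a+1}{b}\right)$ and $\left[\frac{a+1}{b}, \frac{a+2}{b}\right)$, which do not intersect.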

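Possibility 2

(Figure: two positions where TL holds the same value $a$, while PL holds $b$ and $c \ne b$.)

wlog $c \ge b+1$.

For $c = b+1$ the two intervals are $\left[\frac{a}{b}, \frac{a+1}{b}\right)$ and $\left[\frac{a}{b+1}, \frac{a+1}{b+1}\right)$.

Intersection not empty only if $\frac{a+1}{b+1} > \frac{a}{b}$, i.e. $ab + b > ab + a$, i.e. $b > a$.

But this can never happen if $\alpha \ge 1$.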

Algorithm for Real Scaled String Matching

Let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL.

1. P-match PL in TL.
2. For each match, check the intersection of intervals between $P_{i_1}, \ldots, P_{i_j}$ and the corresponding symbols in TL.

End Algorithm

Example:
PL = 2 3 2 3 2, $P_{i_1} = 2$, $P_{i_2} = 3$
TL = 5 6 5 6 5 6 10 6 10 6 10 7

(Figure: the p-matches of PL in TL are marked; one of them is a scaled match.)
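The two steps can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: step 1 uses a simple per-window p-match check costing O(m) per location as a stand-in for the linear-time parameterized matcher, the function names are hypothetical, and locations are 0-based.

```python
from fractions import Fraction


def p_match_mapping(pattern_lengths, window):
    """Return the bijection {pattern value: text value} if the window p-matches, else None."""
    forward, backward = {}, {}
    for p, t in zip(pattern_lengths, window):
        if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
            return None
    return forward


def real_scaled_matches(text_lengths, pattern_lengths):
    """Return (location, lo, hi) for every window admitting a scale lo <= alpha < hi."""
    m = len(pattern_lengths)
    matches = []
    for i in range(len(text_lengths) - m + 1):
        # Step 1: keep only windows that p-match the pattern lengths.
        mapping = p_match_mapping(pattern_lengths, text_lengths[i:i + m])
        if mapping is None:
            continue
        # Step 2: intersect one interval per *distinct* pattern value.
        lo = max([Fraction(1)] + [Fraction(t, p) for p, t in mapping.items()])
        hi = min(Fraction(t + 1, p) for p, t in mapping.items())
        if lo < hi:
            matches.append((i, lo, hi))
    return matches


PL = [2, 3, 2, 3, 2]
TL = [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]
print(real_scaled_matches(TL, PL))
```

Because a p-match guarantees that equal pattern values face equal text values, step 2 needs only one interval per distinct pattern value rather than one per position.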

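Example

Interval checks for the four p-matches of PL = 2 3 2 3 2 in TL = 5 6 5 6 5 6 10 6 10 6 10 7:

5 6 5 6 5: $\left[2\frac{1}{2}, 3\right) \cap \left[2, 2\frac{1}{3}\right) = \emptyset$
6 5 6 5 6: $\left[3, 3\frac{1}{2}\right) \cap \left[1\frac{2}{3}, 2\right) = \emptyset$
10 6 10 6 10: $\left[5, 5\frac{1}{2}\right) \cap \left[2, 2\frac{1}{3}\right) = \emptyset$
6 10 6 10 6: $\left[3, 3\frac{1}{2}\right) \cap \left[3\frac{1}{3}, 3\frac{2}{3}\right) = \left[3\frac{1}{3}, 3\frac{1}{2}\right)$, a scaled match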

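Important Fact

$\sum_{k=1}^{j} P_{i_k} \le m$

The $P_{i_k}$ are distinct positive integers, so their sum is at least $\frac{j(j+1)}{2}$. So there are at most $O(\sqrt{m})$ different $P_{i_k}$'s.

Time:
• $O(n)$ for parameterized matching ($\Sigma = \{1, 2, \ldots, n\}$)
• $O(\sqrt{m})$ verification for each location

Total: $O(n\sqrt{m})$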

Tighter analysis

Upper bound on the number of possible p-matches.

Lemma: Let $|P| = m$, $|T| = n$, and let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL. Then there are at most $\frac{2n}{j}$ p-matches of PL in TL.

Meaning: Since verification time is $O(j)$ per p-match, the lemma implies that total verification time is $O\left(\frac{2n}{j} \cdot j\right) = O(n)$.

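Proof of Lemma

(Figure: in a p-match, the 1st appearances of $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ in PL are aligned with text elements $a_1, a_2, \ldots, a_j$ of TL.)

$\sum_{k=1}^{j} a_k \ge \frac{j^2}{2}$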

Lemma's proof (cont.)

Let $x$ be the total number of p-matches in the text.

The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern is $\ge \frac{x j^2}{2}$.

But: there are overlaps! How many?

Lemma's proof (cont.)

For each text location, at most $j$ matches will count it. Therefore…

Total count without overlaps $\ge \frac{1}{j} \cdot \frac{x j^2}{2} = \frac{x j}{2}$

Clearly $\frac{x j}{2} \le n$, thus $x \le \frac{2n}{j}$.

                      Open Problem

                      Give 1-d algorithm linear in run-length compressed text and pattern






Problem: Real scales

Was open even for strings… How do we define it?

aabcccbb
Scaled to 2: aaaabbccccccbbbb
Scaled to 1½: aaab cccc bbb (the fractional ½b and ½c are truncated)
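
To make the truncation rule concrete, here is a minimal Python sketch of scaling a string by a real factor. The function name `scale` and the use of exact fractions are assumptions made for illustration; they are not part of the slides.

```python
from fractions import Fraction
from itertools import groupby
from math import floor


def scale(s, alpha):
    """Scale every run of equal symbols in s by alpha, truncating fractions."""
    alpha = Fraction(alpha)  # exact arithmetic avoids floating-point surprises
    out = []
    for symbol, run in groupby(s):
        run_length = sum(1 for _ in run)
        # A run of length r becomes a run of length floor(alpha * r).
        out.append(symbol * floor(alpha * run_length))
    return "".join(out)


print(scale("aabcccbb", 2))               # aaaabbccccccbbbb
print(scale("aabcccbb", Fraction(3, 2)))  # aaabccccbbb
```

Both printed strings agree with the two scalings shown on the slide.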

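Formally

Text $T$, $|T| = n$.
Pattern $P = a_1^{r_1} a_2^{r_2} \cdots a_j^{r_j}$, $|P| = m$.
Denote $a^r = aa \cdots a$ ($r$ times).

Problem Definition 1
Input: Pattern $P$, Text $T$.
Output: All text locations where $a_1^{c_1} a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor} a_j^{c_j}$ appears, with $c_1 \ge \lfloor \alpha r_1 \rfloor$ and $c_j \ge \lfloor \alpha r_j \rfloor$, for some $\alpha \ge 1$.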

Remark

α ≥ 1 means we only scale "up".

Reasons:
• Avoid the conceptual problem of loss of resolution.
• From "far enough" away everything looks the same.
• By our definition, for a scale k < 1/m there is a match at every text location (every run of the pattern is truncated to length 0).

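Simplify definition

Definition 2: Look for $a_1^{\lfloor \alpha r_1 \rfloor} a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor} a_j^{\lfloor \alpha r_j \rfloor}$ in the text (the first and last runs must have exactly these lengths).

Example: P = aabcccbbbb $= a^2 b^1 c^3 b^4$. Scaled to $\alpha = 3/2$ it becomes $a^{\lfloor 3 \rfloor} b^{\lfloor 3/2 \rfloor} c^{\lfloor 9/2 \rfloor} b^{\lfloor 6 \rfloor} = a^3 b^1 c^4 b^6$.

Match by definition 2: daaabccccbbbbbbe
Match by definition 1 but not by def. 2: daaaabccccbbbbbbbe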

Why are definitions equivalent?

Split text and pattern into a symbol part (Ts, Ps) and a length part (TL, PL).

Example:
P = aabcccbbbb, Ps = abcb, PL = 2 1 3 4
T = daaabccccbbbbbbe, Ts = dabcbe, TL = 1 3 1 4 6 1
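
The split is a run-length encoding. A minimal Python sketch follows; the helper name `split_runs` is hypothetical and chosen only for this illustration.

```python
from itertools import groupby


def split_runs(s):
    """Split s into its symbol part (one symbol per run) and length part."""
    symbols, lengths = [], []
    for symbol, run in groupby(s):
        symbols.append(symbol)
        lengths.append(sum(1 for _ in run))  # length of this run
    return "".join(symbols), lengths


print(split_runs("aabcccbbbb"))        # ('abcb', [2, 1, 3, 4])
print(split_runs("daaabccccbbbbbbe"))  # ('dabcbe', [1, 3, 1, 4, 6, 1])
```

A single left-to-right pass suffices, which is consistent with a linear-time split.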

Time

Time for split: O(n+m)
Finding Ps in Ts: O(n+m) (e.g. KMP)
HARD PART: Finding PL in TL

Definitions are Equivalent

Claim: Solving def. 2 in time O(f(n)) ⇒ solving def. 1 in time O(f(n)).

Why?
- Find $a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor}$ in time O(f(n)).
- For each match, verify the 1st and last symbol in constant time in Ts and TL.

Total time: O(f(n)+n) = O(f(n))

Naïve algorithm for matching PL in TL

For each text location, position the pattern starting at that location and calculate the interval $[t/p, (t+1)/p)$ for each resulting <text, pattern> pair (t, p).

This is the interval of possible scales, since:
$\lfloor (t/p) \cdot p \rfloor = t$, and for every $\alpha < t/p$, $\lfloor \alpha p \rfloor < t$;
$\lfloor ((t+1)/p) \cdot p \rfloor = t+1$, and for every $\alpha \ge (t+1)/p$, $\lfloor \alpha p \rfloor > t$.

Check intersection

If the intersection of all intervals is not empty, then there is a match.
Time: O(nm)

Example
PL: 2 1 2 3 2
TL: 2 4 2 4 7 4 5 3

Location 1: [1, 3/2) [4, 5)
The intersection is empty, thus no scaled match in location 1. But…

Location 2: [2, 5/2) [2, 3) [2, 5/2) [7/3, 8/3) [2, 5/2)
The intersection is [7/3, 5/2), thus there is a scaled match in location 2.
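
The naïve interval check can be sketched in Python as below. The function names are hypothetical. Restricting the intersection to α ≥ 1 follows the Remark slide and is an assumption about how the check is meant to be applied. A non-empty pattern is assumed.

```python
from fractions import Fraction


def scale_interval(pl, tl, start):
    """Intersect the intervals [t/p, (t+1)/p) for PL aligned at TL[start].

    Returns (lo, hi), meaning the half-open interval [lo, hi) of admissible
    scales alpha >= 1, or None if the intersection is empty.
    Assumes pl is non-empty.
    """
    lo, hi = Fraction(1), None  # alpha >= 1: we only scale "up"
    for p, t in zip(pl, tl[start:start + len(pl)]):
        lo = max(lo, Fraction(t, p))
        upper = Fraction(t + 1, p)
        hi = upper if hi is None else min(hi, upper)
        if lo >= hi:
            return None  # intervals no longer overlap
    return lo, hi


def naive_scaled_matches(pl, tl):
    """Return all 1-based text locations with a non-empty intersection: O(nm)."""
    last_start = len(tl) - len(pl)
    return [i + 1 for i in range(last_start + 1)
            if scale_interval(pl, tl, i) is not None]


PL = [2, 1, 2, 3, 2]
TL = [2, 4, 2, 4, 7, 4, 5, 3]
print(scale_interval(PL, TL, 0))     # None: no scaled match in location 1
print(scale_interval(PL, TL, 1))     # (Fraction(7, 3), Fraction(5, 2))
print(naive_scaled_matches(PL, TL))  # [2]
```

Exact fractions are used so that the half-open interval boundaries compare correctly.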

Improvement – Parameterized Matching

Introduced: Baker 1994
Motivation: "copying" code

Parameterized Matching

Input: two strings s and t, |s| = |t|, over alphabets $\Sigma_s$ and $\Sigma_t$.

s parameterize matches t if there exists a bijection $\Pi: \Sigma_s \to \Sigma_t$ such that $\Pi(s) = t$.

Example:
s = a a b b b
t = x x y y y
$\Pi(a) = x$, $\Pi(b) = y$

Parameterized Matching

Claim (AFM-94): For Σ that can be sorted in linear time (e.g. Σ = {1, …, n}), parameterized matching can be done in time O(n).
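
For a single pair of equal-length strings, the definition can be checked directly, as in the Python sketch below. This is only an illustration of the definition with a hypothetical function name. It is not the linear-time AFM-94 algorithm for finding all p-matches of a pattern in a text, and it checks the bijection only on the symbols that actually occur.

```python
def p_match(s, t):
    """Return True if a one-to-one symbol renaming turns s into t."""
    if len(s) != len(t):
        return False
    forward, backward = {}, {}
    for a, b in zip(s, t):
        # forward enforces that Pi is a function, backward that it is injective.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


print(p_match("aabbb", "xxyyy"))  # True, with Pi(a) = x and Pi(b) = y
print(p_match("aabbb", "xxyyx"))  # False, b would map to both y and x
print(p_match("ab", "xx"))        # False, a and b would both map to x
```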

The reduction

Lemma: There exists α ≥ 1 for which PL matches TL at location i scaled to α only if PL p-matches TL at i.

Proof: Assume PL does not p-match TL at location i. The possible situations are:

Possibility 1

TL: a … c, with c ≠ a
PL: b … b

wlog c ≥ a+1. For c = a+1 (smallest possible):

$[a/b, (a+1)/b) \cap [(a+1)/b, (a+2)/b) = \emptyset$

Possibility 2

TL: a … a
PL: b … c, with c ≠ b

wlog c ≥ b+1. For c = b+1 the intervals are $[a/b, (a+1)/b)$ and $[a/(b+1), (a+1)/(b+1))$.

Intersection not empty only if
(a+1)/(b+1) > a/b, i.e.
ab+b > ab+a
b > a
But this can never happen if α ≥ 1.

Algorithm for Real Scaled String Matching

Let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL.

1. P-match PL in TL.
2. For each match, check the intersection of intervals between $P_{i_1}, \ldots, P_{i_j}$ and the corresponding symbols in TL.

End Algorithm

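Example

PL = 2 3 2 3 2, $P_{i_1} = 2$, $P_{i_2} = 3$
TL = 5 6 5 6 5 6 10 6 10 6 10 7

p-matches: locations 1, 2, 6 and 7.
Location 1 (2→5, 3→6): $[5/2, 3) \cap [2, 7/3) = \emptyset$
Location 2 (2→6, 3→5): $[3, 7/2) \cap [5/3, 2) = \emptyset$
Location 6 (2→6, 3→10): $[3, 7/2) \cap [10/3, 11/3) = [10/3, 7/2)$, a scaled match
Location 7 (2→10, 3→6): $[5, 11/2) \cap [2, 7/3) = \emptyset$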
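
The two-step algorithm can be sketched in Python as follows. Names are hypothetical, and the p-match filter here is a plain O(nm) scan used only for illustration; the slides rely on a linear-time parameterized matcher instead. Step 2 intersects one interval per distinct number in PL, which suffices because a p-match already guarantees that equal pattern numbers face equal text numbers.

```python
from fractions import Fraction


def p_match_mapping(pl, tl, i):
    """Return {pattern number: text number} if PL p-matches TL at index i."""
    forward, backward = {}, {}
    for p, t in zip(pl, tl[i:i + len(pl)]):
        if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
            return None
    return forward


def real_scaled_matches(pl, tl):
    """Return (p-match locations, scaled-match locations), both 1-based."""
    p_hits, scaled_hits = [], []
    for i in range(len(tl) - len(pl) + 1):
        mapping = p_match_mapping(pl, tl, i)  # step 1 (naive filter)
        if mapping is None:
            continue
        p_hits.append(i + 1)
        # Step 2: one interval [t/p, (t+1)/p) per distinct pattern number,
        # intersected with alpha >= 1.
        lo = max([Fraction(1)] + [Fraction(t, p) for p, t in mapping.items()])
        hi = min(Fraction(t + 1, p) for p, t in mapping.items())
        if lo < hi:
            scaled_hits.append(i + 1)
    return p_hits, scaled_hits


PL = [2, 3, 2, 3, 2]
TL = [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]
print(real_scaled_matches(PL, TL))  # ([1, 2, 6, 7], [6])
```

On the example data, four locations survive the p-match filter and only location 6 passes verification.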

Important Fact

$\sum_{k=1}^{j} P_{i_k} \le m$

So there are at most O(√m) different $P_{i_k}$'s.

Time:
O(n) for parameterized matching (Σ = {1, 2, …, n})
O(√m) verification for each location
Total: O(n√m)
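
The step from the sum bound to the O(√m) bound can be spelled out as a short derivation. It assumes only that the $P_{i_k}$ are distinct positive integers, each occurring at least once in PL, whose entries sum to m.

```latex
\[
  m \;\ge\; \sum_{k=1}^{j} P_{i_k}
    \;\ge\; 1 + 2 + \cdots + j
    \;=\; \frac{j(j+1)}{2}
    \;>\; \frac{j^2}{2}
  \quad\Longrightarrow\quad
  j \;<\; \sqrt{2m} \;=\; O(\sqrt{m}).
\]
```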

Tighter analysis

Upper bound on the number of possible p-matches.

Lemma: Let |P| = m, |T| = n, and let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL. Then there are at most 2n/j p-matches of PL in TL.

Meaning: Since verification time is O(j) per p-match, the lemma implies that total verification time is O((2n/j) · j) = O(n).

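Proof of Lemma

1st appearance of $P_{i_1}, \ldots, P_{i_j}$:

PL: $P_{i_1}$ $P_{i_2}$ … $P_{i_j}$
TL: $a_1$ $a_2$ … $a_j$ (m-match)

Since the $a_k$ are j distinct positive integers,

$\sum_{k=1}^{j} a_k \ge j^2/2$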

Lemma's proof (cont.)

Let x be the total number of p-matches in the text.

The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern ≥ $x j^2 / 2$.

But: there are overlaps. How many?

For each text location, at most j matches will count it. Therefore…

Total count without overlaps ≥ $\frac{1}{j} \cdot \frac{x j^2}{2} = \frac{x j}{2}$

Clearly $x \cdot j/2 \le n$, thus $x \le 2n/j$.

Open Problem

Give a 1-d algorithm that is linear in the size of the run-length compressed text and pattern.



                                • Algorithm for Real Scaled String Matching
                                • Example
                                • Important Fact
                                • Tighter analysis
                                • Proof of Lemma
                                • Lemmarsquos proof (cont)
                                • Slide 37
                                • Open Problem

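Simplify definition

Write the pattern in run-length form: $P = a_1^{r_1} a_2^{r_2} \cdots a_{j-1}^{r_{j-1}} a_j^{r_j}$

Example: P = aabcccbbbb = $a^2 b^1 c^3 b^4$

Definition 2: Look for the scaled runs of P in the text as complete runs, so that every run, including the first and the last, covers a whole run of the text.

Match by definition 2: daaabccccbbbbbbe (runs $a^3 b^1 c^4 b^6$, scale 3/2)

Match by definition 1 but not by def 2: daaaabccccbbbbbbbe (runs $a^4 b^1 c^4 b^7$; the scaled pattern occurs as a substring, but the outer runs of the text are longer)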

Why are definitions equivalent

Split text and pattern into a symbol part (Ts, Ps) and a length part (TL, PL).

Example:
P = aabcccbbbb, Ps = abcb, PL = 2 1 3 4
T = daaabccccbbbbbbe, Ts = dabcbe, TL = 1 3 1 4 6 1
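The split into a symbol part and a length part is ordinary run-length encoding. A minimal sketch in Python follows; the function name run_length_split is hypothetical and not taken from the slides.

```python
from itertools import groupby


def run_length_split(s):
    """Split a string into its symbol part and its length part.

    Returns (symbols, lengths): symbols holds one character per run,
    lengths holds the length of each run, in order.
    """
    symbols, lengths = [], []
    for ch, run in groupby(s):
        symbols.append(ch)
        lengths.append(sum(1 for _ in run))  # count the characters in this run
    return "".join(symbols), lengths


if __name__ == "__main__":
    print(run_length_split("aabcccbbbb"))        # pattern from the slide
    print(run_length_split("daaabccccbbbbbbe"))  # text from the slide
```

The split makes one pass over the input, which agrees with the O(n+m) bound stated for this step.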

Time

Time for split: O(n+m)

Finding Ps in Ts: O(n+m) (e.g., KMP)

HARD PART: Finding PL in TL

Definitions are Equivalent

Claim: Solving def 2 in time O(f(n)) implies solving def 1 in time O(f(n)).

Why:
- Find $a_2^{r_2} \cdots a_{j-1}^{r_{j-1}}$ in time O(f(n)).
- For each match, verify the 1st and last symbol in constant time in Ts and TL.

Total time: O(f(n)+n) = O(f(n))

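Naïve algorithm for matching PL in TL

For each text location, position the pattern starting at that location and calculate the interval $[\frac{t}{p}, \frac{t+1}{p})$ for each resulting <text, pattern> pair (t, p).

This is the interval of possible scales since:
- $\lfloor \frac{t}{p} \cdot p \rfloor = t$, and for every $\alpha < \frac{t}{p}$, $\lfloor \alpha p \rfloor < t$
- $\lfloor \frac{t+1}{p} \cdot p \rfloor = t+1$, and for every $\alpha \ge \frac{t+1}{p}$, $\lfloor \alpha p \rfloor > t$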

Check intersection

If the intersection of all intervals is not empty, then there is a match.

Time: O(nm)

Example:
PL = 2 1 2 3 2
TL = 2 4 2 4 7 4 5 3

Location 1: the first two intervals are [1, 3/2) and [4, 5). The intersection is empty, thus there is no scaled match at location 1. But...

Location 2: the intervals are [2, 5/2), [2, 3), [2, 5/2), [7/3, 8/3), [2, 5/2). The intersection is [7/3, 5/2), thus there is a scaled match at location 2.
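The naïve interval check can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the name naive_scaled_match is hypothetical, exact rationals are used to avoid rounding, every run (including the first and last) is compared as a whole run, and scales are restricted to α ≥ 1.

```python
from fractions import Fraction


def naive_scaled_match(PL, TL):
    """Return {1-based location: (lo, hi)} for each location of TL where the
    scale intervals [t/p, (t+1)/p) of all aligned pairs share a point."""
    m = len(PL)
    matches = {}
    for i in range(len(TL) - m + 1):
        lo = Fraction(1)  # assumption: only scales alpha >= 1 are allowed
        hi = None
        for p, t in zip(PL, TL[i:i + m]):
            lo = max(lo, Fraction(t, p))            # largest lower end so far
            upper = Fraction(t + 1, p)
            hi = upper if hi is None else min(hi, upper)  # smallest upper end
        if hi is not None and lo < hi:              # half-open intersection non-empty
            matches[i + 1] = (lo, hi)
    return matches


if __name__ == "__main__":
    # Example from the slide: only location 2 should survive.
    print(naive_scaled_match([2, 1, 2, 3, 2], [2, 4, 2, 4, 7, 4, 5, 3]))
```

Each location costs O(m) interval updates, which gives the O(nm) total stated on the slide.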

Improvement – Parameterized Matching

Introduced: Baker 1994

Motivation: "copying" code

Parameterized Matching

Input: two strings s and t, |s| = |t|, over alphabets $\Sigma_s$ and $\Sigma_t$.

s parameterize-matches t if there exists a bijection $\Pi: \Sigma_s \to \Sigma_t$ such that $\Pi(s) = t$.

Example:
s = a a b b b
t = x x y y y
$\Pi(a) = x$, $\Pi(b) = y$
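The definition can be checked directly for two equal-length sequences by building the mapping and its inverse in one pass. The sketch below is only an illustration of the definition; the name p_match is hypothetical, and this is a single-alignment check, not a linear-time matching algorithm over a whole text.

```python
def p_match(s, t):
    """Return True if some bijection between the symbols of s and the
    symbols of t maps s onto t position by position."""
    if len(s) != len(t):
        return False
    forward, backward = {}, {}
    for a, b in zip(s, t):
        # setdefault records the first pairing and returns the stored one,
        # so any later conflict in either direction is detected here.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


if __name__ == "__main__":
    print(p_match("aabbb", "xxyyy"))  # the slide's example
    print(p_match("ab", "xx"))        # not injective, so no bijection
```

Keeping both dictionaries is what enforces a bijection: the forward map alone would accept two different symbols of s mapped to the same symbol of t.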

Parameterized Matching

Claim (AFM-94): For Σ that can be sorted in linear time (e.g., Σ = {1, …, n}), parameterized matching can be done in time O(n).

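The reduction

Lemma: There exists an α ≥ 1 for which PL matches TL at location i scaled to α only if PL p-matches TL at i.

Proof: Assume PL does not p-match TL at location i. The possible situations are:

Possibility 1

The same pattern value b is aligned with two different text values a and c ≠ a.

W.l.o.g. c ≥ a+1. For c = a+1 (smallest possible), the two scale intervals are

$[\frac{a}{b}, \frac{a+1}{b})$ and $[\frac{a+1}{b}, \frac{a+2}{b})$,

which do not intersect.

Possibility 2

The same text value a is aligned with two different pattern values b and c ≠ b.

W.l.o.g. c ≥ b+1. For c = b+1, the two scale intervals are

$[\frac{a}{b}, \frac{a+1}{b})$ and $[\frac{a}{b+1}, \frac{a+1}{b+1})$.

Intersection not empty only if $\frac{a+1}{b+1} > \frac{a}{b}$, i.e. ab + b > ab + a, i.e. b > a.

But this can never happen if α ≥ 1, since then $a = \lfloor \alpha b \rfloor \ge b$.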

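Algorithm for Real Scaled String Matching

Let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL.

1. P-match PL in TL.
2. For each match, check the intersection of intervals between $P_{i_1}, \ldots, P_{i_j}$ and the corresponding symbols in TL.

End Algorithm

Example:
PL = 2 3 2 3 2, $P_{i_1}$ = 2, $P_{i_2}$ = 3
TL = 5 6 5 6 5 6 10 6 10 6 10 7
p-matches: locations 1, 2, 6 and 7
scaled match: location 6 (6 10 6 10 6), with scales in [10/3, 7/2)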
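The two steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the names p_mapping and real_scaled_match are hypothetical, and step 1 is done here by a simple per-window check costing O(m) per location rather than by the linear-time parameterized matching of AFM-94. Step 2 follows the slide: only the j distinct pattern values are intersected.

```python
from fractions import Fraction


def p_mapping(PL, window):
    """Return the bijection {pattern value: text value} if PL p-matches
    the window, otherwise None."""
    forward, backward = {}, {}
    for p, t in zip(PL, window):
        if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
            return None
    return forward


def real_scaled_match(PL, TL):
    """Return {1-based location: (lo, hi)} for every real scaled match."""
    m = len(PL)
    found = {}
    for i in range(len(TL) - m + 1):
        mapping = p_mapping(PL, TL[i:i + m])   # step 1 (naive stand-in)
        if not mapping:
            continue
        # Step 2: one interval [t/p, (t+1)/p) per distinct pattern value.
        lo = max(Fraction(t, p) for p, t in mapping.items())
        hi = min(Fraction(t + 1, p) for p, t in mapping.items())
        lo = max(lo, Fraction(1))              # assumption: alpha >= 1
        if lo < hi:
            found[i + 1] = (lo, hi)
    return found


if __name__ == "__main__":
    PL = [2, 3, 2, 3, 2]
    TL = [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]
    print(real_scaled_match(PL, TL))
```

On the slide's example the windows at locations 1, 2, 6 and 7 pass the p-match filter, and only location 6 has a non-empty interval.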

Example

[Figure: interval computations for the p-matches of the previous slide; the values are not recoverable from the transcript.]

Important Fact

$\sum_{k=1}^{j} P_{i_k} \le m$

So there are at most O(√m) different $P_{i_k}$'s.

Time: O(n) for parameterized matching (Σ = {1, 2, …, n}); O(√m) verification for each location.

Total: O(n√m)
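The step from the sum bound to O(√m) can be spelled out as follows. This is a short worked derivation, assuming (as in the split into Ps and PL) that the entries of PL are positive integers summing to m, so that j distinct values among them sum to at least 1 + 2 + … + j.

```latex
% Assumption: P_{i_1}, ..., P_{i_j} are distinct positive integers
% occurring in PL, and the entries of PL sum to m.
\[
  \frac{j(j+1)}{2}
  \;=\; \sum_{k=1}^{j} k
  \;\le\; \sum_{k=1}^{j} P_{i_k}
  \;\le\; m
  \quad\Longrightarrow\quad
  j^2 < 2m
  \quad\Longrightarrow\quad
  j < \sqrt{2m} = O(\sqrt{m}).
\]
```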

Tighter analysis

Upper bound on the number of possible p-matches

Lemma: Let |P| = m, |T| = n, and let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL. Then there are at most $\frac{2n}{j}$ p-matches of PL in TL.

Meaning: Since verification time is O(j) per p-match, the lemma implies that total verification time is

$O(\frac{2n}{j} \cdot j) = O(n)$

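Proof of Lemma

Consider the 1st appearances of $P_{i_1}, \ldots, P_{i_j}$ in PL and the text elements aligned with them in a p-match:

PL: $P_{i_1}$ $P_{i_2}$ … $P_{i_j}$
TL: $a_1$ $a_2$ … $a_j$

The $a_k$ are distinct positive integers, so

$\sum_{k=1}^{j} a_k \ge \frac{j^2}{2}$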

Lemma's proof (cont.)

Let x be the total number of p-matches in the text.

The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern, taken over all p-matches, is

$\ge \frac{x j^2}{2}$

But there are overlaps. How many?

Lemma's proof (cont.)

For each text location, at most j matches will count it. Therefore:

Total count without overlaps $\ge \frac{x j^2}{2} \cdot \frac{1}{j} = \frac{x j}{2}$

Clearly $\frac{x j}{2} \le n$, thus $x \le \frac{2n}{j}$.
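The bound can be sanity-checked on small inputs by counting p-matches by brute force and comparing with 2n/j. The sketch below is only an illustration; the names count_p_matches and lemma_bound are hypothetical, and n is taken to be the sum of the entries of TL, that is, the length of the uncompressed text.

```python
def count_p_matches(PL, TL):
    """Count the locations of TL at which PL p-matches (brute force)."""
    m = len(PL)
    count = 0
    for i in range(len(TL) - m + 1):
        forward, backward = {}, {}
        ok = True
        for p, t in zip(PL, TL[i:i + m]):
            if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
                ok = False
                break
        if ok:
            count += 1
    return count


def lemma_bound(PL, TL):
    """Return 2n/j, with n the sum of TL and j the number of distinct
    values in PL."""
    n = sum(TL)
    j = len(set(PL))
    return 2 * n / j


if __name__ == "__main__":
    PL = [1, 2, 1]
    TL = [1, 2] * 5          # alternating text: every window p-matches
    print(count_p_matches(PL, TL), "<=", lemma_bound(PL, TL))
```

A check like this does not prove the lemma; it only confirms that the reconstructed bound is consistent with small cases.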

Open Problem

Give a 1-d algorithm that is linear in the size of the run-length compressed text and pattern.

                                      Time

                                      Time for split O(n+m)

                                      Finding Ps in Ts O(n+m) (eg KMP)

                                      HARD PART Finding PL in TL

                                      Definitions are Equivalent

                                      aa rrj

                                      j

                                      1212

                                      Claim Solving def 2 in time O(f(n))

                                      Solving def 1 in time O(f(n))Why - Find in time O(f(n)) - For each match verify 1st and last symbol in constant time in Ts and

                                      TLTotal time O(f(n)+n)=O(f(n))

                                      Naiumlve algorithm for matching PL in TL

                                      For each text location position pattern starting at that location and calculate interval [tp (t+1)p) for each resulting lttext patterngt pair

                                      This is the interval of possible scales since

                                      tpp = t for every α lt tp |αp| lt t(t+1)p p = t+1 for every α ge tp |αp| gt t

                                      Check intersectionIf intersection of all intervals is not

                                      empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [132) [45)

                                      The intersection is empty thus no scaled match in location 1 Buthellip

                                      Check intersectionIf intersection of all intervals is not

                                      empty then there is a matchTime O(nm)ExamplePL 2 1 2 3 2TL 2 4 2 4 7 4 5 3 [252) [23) [252)[7383)[252)

                                      The intersection is [7352) thus there is a scaled match in location 2

                                      Improvement ndash Parameterized Matching

                                      Introduced Baker 1994

                                      Motivation ldquocopyingrdquo code

                                      Parameterized Matching

                                      Input two strings s and t |s|=|t| over alphabets sums and sumt

                                      s parameterize matches t if bijection sums sumt such that (s) = t

                                      exist

                                      (a)=x

                                      (b)=y

                                      Π Π

                                      ΠΠ

                                      a ab b b

                                      x xy y y

                                      Example

                                      Parameterized Matching

                                      Claim (AFM-94)

                                      For Σ that can be sorted in linear time (eg Σ=1 n)

                                      Parameterized matching can be done in time O(n)

                                      The reduction

                                      1 Lemma for which PL matches TL at location i scaled to α only if PL p-matches TL at i

                                      Proof Assume PL does not p-match TL at

                                      location i

                                      The possible situations are

                                      Possibility 1wlog c ge a+1

                                      For c = a+1 (smallest possible)

                                      TL

                                      PL

                                      a

                                      b b

                                      cnea

                                      b

                                      a

                                      b

                                      a

                                      b

                                      a

                                      b

                                      a 211

                                      Possibility 2

                                      wlog c ge b+1

                                      Intersection not empty only if

                                      (a+1)(b+1) gt ab ie

                                      ab+b gt ab+a

                                      bgta

                                      But this can never happen if α ge 1

                                      TL

                                      PL

                                      a

                                      b cneb

                                      a

                                      1

                                      11

                                      1

                                      b

                                      a

                                      b

                                      a

                                      b

                                      a

                                      b

                                      a

                                      Algorithm for Real Scaled String Matching

                                      Let Pi1 Pi2 Pij be the different numbers in PL

                                      1 P-match PL in TL2 For each match chack intersection

                                      of intervals between Pi1 Pij and corresponding symbols in TL

                                      End Algorithm


Definitions are Equivalent

$a_1^{r_1} a_2^{r_2} \cdots a_j^{r_j}$

Claim: Solving def. 2 in time O(f(n)) ⇒ solving def. 1 in time O(f(n)).

Why?
- Find the def. 2 matches in time O(f(n)).
- For each match, verify the 1st and last symbol in constant time in Ts and TL.

Total time: O(f(n) + n) = O(f(n)).

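Naïve algorithm for matching PL in TL

For each text location, position the pattern starting at that location and calculate the interval $[\frac{t}{p}, \frac{t+1}{p})$ for each resulting ⟨text, pattern⟩ pair (t, p).

This is the interval of possible scales, since

- $\lfloor \frac{t}{p} \cdot p \rfloor = t$, and for every $\alpha < \frac{t}{p}$, $\lfloor \alpha p \rfloor < t$;
- $\lfloor \frac{t+1}{p} \cdot p \rfloor = t+1$, and for every $\alpha \ge \frac{t+1}{p}$, $\lfloor \alpha p \rfloor > t$.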

Check intersection

If the intersection of all intervals is not empty, then there is a match.

Time: O(nm)

Example

PL: 2 1 2 3 2
TL: 2 4 2 4 7 4 5 3

Location 1: $[1, \frac{3}{2})$, $[4, 5)$
The intersection is empty, thus there is no scaled match at location 1. But…

Location 2: $[2, \frac{5}{2})$, $[2, 3)$, $[2, \frac{5}{2})$, $[\frac{7}{3}, \frac{8}{3})$, $[2, \frac{5}{2})$
The intersection is $[\frac{7}{3}, \frac{5}{2})$, thus there is a scaled match at location 2.
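The naïve check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `scale_interval` and `naive_scaled_matches` are hypothetical, locations are 0-based (so the slide's location 2 is index 1), every pattern number is treated alike (no special handling of the first and last symbols), and exact rational arithmetic is used to avoid rounding at the interval endpoints.

```python
from fractions import Fraction

def scale_interval(p, t):
    """Scales alpha with floor(alpha * p) == t form the half-open interval [t/p, (t+1)/p)."""
    return Fraction(t, p), Fraction(t + 1, p)

def naive_scaled_matches(PL, TL):
    """Return {location: (lo, hi)} for each 0-based location whose intervals intersect."""
    m, n = len(PL), len(TL)
    matches = {}
    for i in range(n - m + 1):
        # One interval per aligned (pattern number, text number) pair: O(m) per location.
        intervals = [scale_interval(p, t) for p, t in zip(PL, TL[i:i + m])]
        lo = max(a for a, _ in intervals)   # largest left endpoint
        hi = min(b for _, b in intervals)   # smallest right endpoint
        if lo < hi:                         # half-open intersection [lo, hi) is non-empty
            matches[i] = (lo, hi)
    return matches

result = naive_scaled_matches([2, 1, 2, 3, 2], [2, 4, 2, 4, 7, 4, 5, 3])
```

On the example above, `result` holds a single entry at index 1 with the interval [7/3, 5/2), in agreement with the slide.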

Improvement – Parameterized Matching

Introduced: Baker, 1994

Motivation: "copying" code

Parameterized Matching

Input: two strings s and t, |s| = |t|, over alphabets $\Sigma_s$ and $\Sigma_t$.

s parameterize-matches t if $\exists$ a bijection $\Pi: \Sigma_s \to \Sigma_t$ such that $\Pi(s) = t$.

Example

s = a a b b b
t = x x y y y
$\Pi(a) = x$, $\Pi(b) = y$
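As a small illustration of the definition, the following Python sketch tests whether two equal-length sequences parameterize-match by building the mapping Π and its inverse as it scans. The name `p_match` is hypothetical, and the bijection is taken over the symbols that actually occur in s and t; it is a direct check of the definition, not the linear-time matching algorithm cited in these slides.

```python
def p_match(s, t):
    """True if some bijection between the symbols of s and of t maps s onto t."""
    if len(s) != len(t):
        return False
    forward, backward = {}, {}   # Pi and its inverse
    for a, x in zip(s, t):
        # setdefault stores the pairing the first time a symbol is seen
        # and returns the earlier pairing on every later occurrence.
        if forward.setdefault(a, x) != x or backward.setdefault(x, a) != a:
            return False
    return True
```

With the example above, `p_match("aabbb", "xxyyy")` holds, while `p_match("ab", "xx")` does not, because a and b would both have to map to x.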

Parameterized Matching

Claim (AFM-94): For Σ that can be sorted in linear time (e.g. Σ = {1, …, n}), parameterized matching can be done in time O(n).

The reduction

Lemma: There is an α ≥ 1 for which PL matches TL at location i scaled to α only if PL p-matches TL at i.

Proof: Assume PL does not p-match TL at location i. The possible situations are:

Possibility 1

TL: a … c ≠ a
PL: b … b

W.l.o.g. c ≥ a+1. For c = a+1 (smallest possible):

$[\frac{a}{b}, \frac{a+1}{b}) \cap [\frac{a+1}{b}, \frac{a+2}{b}) = \emptyset$

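Possibility 2

TL: a … a
PL: b … c ≠ b

W.l.o.g. c ≥ b+1. For c = b+1 the intervals are $[\frac{a}{b}, \frac{a+1}{b})$ and $[\frac{a}{b+1}, \frac{a+1}{b+1})$.

Intersection not empty only if $\frac{a+1}{b+1} > \frac{a}{b}$, i.e. ab + b > ab + a, i.e. b > a.

But this can never happen if α ≥ 1.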

Algorithm for Real Scaled String Matching

Let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL.

1. P-match PL in TL.
2. For each match, check the intersection of the intervals between $P_{i_1}, \ldots, P_{i_j}$ and the corresponding symbols in TL.

                                        End Algorithm
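The two steps can be sketched as follows. This is a rough illustration, not the authors' implementation: the names `p_match_at` and `real_scaled_matches` are hypothetical, step 1 uses a simple per-location check as a stand-in for a linear-time parameterized matcher, locations are 0-based, and the restriction α ≥ 1 is not enforced. Because a p-match sends each distinct pattern number to a single text number, step 2 needs only one interval per distinct number.

```python
from fractions import Fraction

def p_match_at(PL, TL, i):
    """Return the pattern-number -> text-number map if PL p-matches TL at i, else None."""
    forward, backward = {}, {}
    for p, t in zip(PL, TL[i:i + len(PL)]):
        if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
            return None
    return forward

def real_scaled_matches(PL, TL):
    """Return {location: (lo, hi)}: the scale interval at each scaled match."""
    result = {}
    for i in range(len(TL) - len(PL) + 1):
        mapping = p_match_at(PL, TL, i)      # step 1 (naive stand-in)
        if mapping is None:
            continue
        # Step 2: intersect [t/p, (t+1)/p) over the j distinct pattern numbers only.
        lo = max(Fraction(t, p) for p, t in mapping.items())
        hi = min(Fraction(t + 1, p) for p, t in mapping.items())
        if lo < hi:
            result[i] = (lo, hi)
    return result

PL = [2, 3, 2, 3, 2]
TL = [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]
p_matches = [i for i in range(len(TL) - len(PL) + 1) if p_match_at(PL, TL, i)]
scaled = real_scaled_matches(PL, TL)
```

For this PL and TL, `p_matches` is `[0, 1, 5, 6]` and `scaled` contains only index 5 (location 6 in 1-based counting) with the interval [10/3, 7/2).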

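Example

PL = 2 3 2 3 2, with $P_{i_1} = 2$, $P_{i_2} = 3$

TL = 5 6 5 6 5 6 10 6 10 6 10 7

PL p-matches TL at locations 1, 2, 6 and 7:

Location 1: $[\frac{5}{2}, 3) \cap [2, \frac{7}{3}) = \emptyset$
Location 2: $[3, \frac{7}{2}) \cap [\frac{5}{3}, 2) = \emptyset$
Location 6: $[3, \frac{7}{2}) \cap [\frac{10}{3}, \frac{11}{3}) = [\frac{10}{3}, \frac{7}{2})$, a scaled match
Location 7: $[5, \frac{11}{2}) \cap [2, \frac{7}{3}) = \emptyset$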

Important Fact

$\sum_{k=1}^{j} P_{i_k} \le m$

So there are at most O(√m) different $P_{i_k}$'s.

Time: O(n) for parameterized matching (Σ = {1, 2, …, n}); O(√m) verification for each location. Total: O(n√m).
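The step from the sum bound to the O(√m) count is not spelled out on the slide. One way to see it, assuming (as the slide's wording suggests) that the $P_{i_k}$ are j distinct positive integers whose sum is at most m:

```latex
m \;\ge\; \sum_{k=1}^{j} P_{i_k} \;\ge\; \sum_{k=1}^{j} k \;=\; \frac{j(j+1)}{2}
\quad\Longrightarrow\quad
j^2 < 2m
\quad\Longrightarrow\quad
j < \sqrt{2m} = O(\sqrt{m}).
```

The middle inequality holds because the smallest possible sum of j distinct positive integers is 1 + 2 + … + j.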

Tighter analysis

Upper bound on the number of possible p-matches.

Lemma: Let |P| = m, |T| = n, and let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL. Then there are at most $\frac{2n}{j}$ p-matches of PL in TL.

Meaning: Since verification time is O(j) per p-match, the lemma implies that the total verification time is

$O(\frac{2n}{j} \cdot j) = O(n)$

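Proof of Lemma

Consider the 1st appearance of each of $P_{i_1}, \ldots, P_{i_j}$ in PL and the text numbers aligned with them in a p-match:

PL: $P_{i_1}$ $P_{i_2}$ … $P_{i_j}$
TL: $a_1$ $a_2$ … $a_j$

The $a_k$ are distinct positive integers, so

$\sum_{k=1}^{j} a_k \ge \frac{j^2}{2}$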

Lemma's proof (cont.)

Let x be the total number of p-matches in the text.

The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern is $\ge \frac{x j^2}{2}$.

But: there are overlaps. How many?

For each text location, at most j matches will count it. Therefore…

Total count without overlaps $\ge \frac{1}{j} \cdot \frac{x j^2}{2} = \frac{x j}{2}$

Clearly $\frac{x j}{2} \le n$, thus $x \le \frac{2n}{j}$.

                                        Open Problem

Give a 1-d algorithm linear in the run-length compressed text and pattern.

                                        • SCALED Pattern Matching
                                        • Motivation
                                        • Slide 3
                                        • Model
                                        • Types of Approximations
                                        • Types of Approximation
                                        • It seems daunting, but…
                                        • CPM 2003, Morelia, Mexico
                                        • Problem inherently inexact
                                        • Definition
                                        • Discrete exact Scaled Matching
                                        • Slide 12
                                        • Idea Fix a scale s
                                        • Algorithm time
                                        • Problem Real scales
                                        • Formally
                                        • Remark
                                        • Simplify definition
                                        • Why are definitions equivalent
                                        • Time
                                        • Definitions are Equivalent
                                        • Naïve algorithm for matching PL in TL
                                        • Check intersection
                                        • Slide 24
                                        • Improvement – Parameterized Matching
                                        • Parameterized Matching
                                        • Slide 27
                                        • The reduction
                                        • Possibility 1
                                        • Possibility 2
                                        • Algorithm for Real Scaled String Matching
                                        • Example
                                        • Important Fact
                                        • Tighter analysis
                                        • Proof of Lemma
                                        • Lemma's proof (cont.)
                                        • Slide 37
                                        • Open Problem




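Check intersection

If the intersection of all intervals is not empty, then there is a match.

Time: $O(nm)$

Example

PL: 2 1 2 3 2

TL: 2 4 2 4 7 4 5 3

$[2, \tfrac{5}{2})$, $[2, 3)$, $[2, \tfrac{5}{2})$, $[\tfrac{7}{3}, \tfrac{8}{3})$, $[2, \tfrac{5}{2})$

The intersection is $[\tfrac{7}{3}, \tfrac{5}{2})$, thus there is a scaled match in location 2.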
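The interval check above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the truncation model, in which a pattern run of length p scaled by α has length ⌊αp⌋, so a text run t fits exactly when α lies in [t/p, (t+1)/p). The name `naive_scaled_matches`, the restriction to α ≥ 1, and treating the first and last runs like interior runs are assumptions made for the example.

```python
from fractions import Fraction


def naive_scaled_matches(pl, tl):
    """Map each 0-based location to its non-empty interval of scales [lo, hi).

    Assumes truncation: a pattern run p scaled by alpha has length
    floor(alpha * p), so text run t fits iff t/p <= alpha < (t+1)/p.
    Runs in O(len(tl) * len(pl)) time.
    """
    found = {}
    for i in range(len(tl) - len(pl) + 1):
        lo = Fraction(1)   # assumption: only scales alpha >= 1 are allowed
        hi = None          # None stands for "no upper bound seen yet"
        for p, t in zip(pl, tl[i:i + len(pl)]):
            lo = max(lo, Fraction(t, p))
            upper = Fraction(t + 1, p)
            hi = upper if hi is None else min(hi, upper)
        if hi is not None and lo < hi:
            found[i] = (lo, hi)
    return found


PL = [2, 1, 2, 3, 2]
TL = [2, 4, 2, 4, 7, 4, 5, 3]
print(naive_scaled_matches(PL, TL))
# {1: (Fraction(7, 3), Fraction(5, 2))}
```

The 0-based location 1 reported here is location 2 in the slide's 1-based numbering. Exact fractions are used so that interval endpoints such as 7/3 are compared without rounding error.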

Improvement – Parameterized Matching

Introduced: Baker 1994

Motivation: "copying" code

Parameterized Matching

Input: two strings $s$ and $t$, $|s| = |t|$, over alphabets $\Sigma_s$ and $\Sigma_t$.

$s$ parameterize-matches $t$ if there exists a bijection $\Pi: \Sigma_s \to \Sigma_t$ such that $\Pi(s) = t$.

Example

s = a a b b b

t = x x y y y

$\Pi(a) = x$, $\Pi(b) = y$
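The definition can be checked directly for two equal-length strings. The sketch below is only an illustration of the definition; the function name `parameterized_match` is hypothetical, and this single-pair check is not the linear-time text-searching algorithm cited in the talk.

```python
def parameterized_match(s, t):
    """Return the bijection as a dict if s parameterize-matches t, else None."""
    if len(s) != len(t):
        return None
    forward, backward = {}, {}
    for a, x in zip(s, t):
        # setdefault stores a pairing the first time a symbol is seen and
        # returns the stored pairing afterwards, so both directions must agree.
        if forward.setdefault(a, x) != x or backward.setdefault(x, a) != a:
            return None
    return forward


print(parameterized_match("aabbb", "xxyyy"))  # {'a': 'x', 'b': 'y'}
print(parameterized_match("ab", "xx"))        # None
```

Keeping both a forward and a backward map is what enforces a bijection: the forward map alone would accept "ab" against "xx".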

Parameterized Matching

Claim (AFM-94):

For $\Sigma$ that can be sorted in linear time (e.g. $\Sigma = \{1, \dots, n\}$), parameterized matching can be done in time $O(n)$.

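The reduction

Lemma: There exists an $\alpha$ for which PL matches TL at location $i$ scaled to $\alpha$ only if PL p-matches TL at $i$.

Proof: Assume PL does not p-match TL at location $i$. The possible situations are:

Possibility 1: the same pattern symbol $b$ is aligned with two different text symbols, $a$ and $c \ne a$.

TL: $a \;\dots\; c \ne a$

PL: $b \;\dots\; b$

W.l.o.g. $c \ge a + 1$. For $c = a + 1$ (smallest possible):

$[\frac{a}{b}, \frac{a+1}{b}) \cap [\frac{a+1}{b}, \frac{a+2}{b}) = \emptyset$

Possibility 2: two different pattern symbols, $b$ and $c \ne b$, are aligned with the same text symbol $a$.

TL: $a \;\dots\; a$

PL: $b \;\dots\; c \ne b$

W.l.o.g. $c \ge b + 1$. For $c = b + 1$ the intervals are

$[\frac{a}{b}, \frac{a+1}{b}) \cap [\frac{a}{b+1}, \frac{a+1}{b+1})$

Intersection not empty only if $\frac{a+1}{b+1} > \frac{a}{b}$, i.e. $ab + b > ab + a$, i.e. $b > a$.

But this can never happen if $\alpha \ge 1$.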

Algorithm for Real Scaled String Matching

Let $P_{i_1}, P_{i_2}, \dots, P_{i_j}$ be the different numbers in PL.

1. P-match PL in TL.

2. For each match, check the intersection of the intervals between $P_{i_1}, \dots, P_{i_j}$ and the corresponding symbols in TL.

End Algorithm

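Example

PL = 2 3 2 3 2, with $P_{i_1} = 2$, $P_{i_2} = 3$

TL = 5 6 5 6 5 6 10 6 10 6 10 7

[Figure: the slide marks the p-matches of PL in TL and, among them, the scaled match; the interval fractions drawn beside each p-match did not survive extraction.]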
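The two steps of the algorithm can be tried on this example with the sketch below. It is written under stated assumptions rather than taken from the talk: scaling truncates (a run p becomes ⌊αp⌋), only α ≥ 1 is allowed, every run is treated alike, and step 1 uses a naive scan instead of the linear-time parameterized matching the slides cite. All function names are hypothetical.

```python
from fractions import Fraction


def first_appearances(pl):
    """Indices in pl where each distinct value appears for the first time."""
    seen = {}
    for idx, p in enumerate(pl):
        seen.setdefault(p, idx)
    return list(seen.values())


def p_match_positions(pl, tl):
    """Step 1 (naive stand-in): 0-based positions where pl p-matches tl."""
    positions = []
    for i in range(len(tl) - len(pl) + 1):
        fwd, bwd = {}, {}
        if all(fwd.setdefault(p, t) == t and bwd.setdefault(t, p) == p
               for p, t in zip(pl, tl[i:i + len(pl)])):
            positions.append(i)
    return positions


def scale_interval(pl, tl, i, firsts):
    """Step 2: intersect [t/p, (t+1)/p) over the j distinct pattern values."""
    lo, hi = Fraction(1), None           # assumption: alpha >= 1
    for k in firsts:                     # O(j) work per p-match
        p, t = pl[k], tl[i + k]
        lo = max(lo, Fraction(t, p))
        upper = Fraction(t + 1, p)
        hi = upper if hi is None else min(hi, upper)
    return (lo, hi) if hi is not None and lo < hi else None


def real_scaled_matches(pl, tl):
    firsts = first_appearances(pl)
    result = {}
    for i in p_match_positions(pl, tl):
        interval = scale_interval(pl, tl, i, firsts)
        if interval is not None:
            result[i] = interval
    return result


PL = [2, 3, 2, 3, 2]
TL = [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]
print(p_match_positions(PL, TL))    # [0, 1, 5, 6]
print(real_scaled_matches(PL, TL))  # {5: (Fraction(10, 3), Fraction(7, 2))}
```

Under these assumptions there are four p-matches but only one survives verification: at 0-based position 5 any α in [10/3, 7/2) maps the runs 2 and 3 to 6 and 10. Verification only looks at the first appearance of each distinct pattern value, because a p-match already guarantees that repeated values align with repeated text values.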

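Important Fact

$\sum_{k=1}^{j} P_{i_k} \le m$

The $P_{i_k}$ are distinct positive integers, so their sum is at least $j(j+1)/2$. So there are at most $O(\sqrt{m})$ different $P_{i_k}$'s.

Time: $O(n)$ for parameterized matching ($\Sigma = \{1, 2, \dots, n\}$), $O(\sqrt{m})$ verification for each location. Total: $O(n\sqrt{m})$.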
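The square-root bound can be checked numerically. The sketch below is an illustration only; `max_distinct_values` is a hypothetical helper that finds the largest number of distinct positive integers whose sum is at most m, using the cheapest choice 1, 2, ..., j.

```python
import math


def max_distinct_values(m):
    """Largest j such that j distinct positive integers can sum to at most m."""
    j, total = 0, 0
    while total + j + 1 <= m:   # the cheapest choice is 1, 2, ..., j
        j += 1
        total += j
    return j


for m in (1, 10, 100, 10_000):
    j = max_distinct_values(m)
    # j(j+1)/2 <= m forces j <= floor(sqrt(2m))
    print(m, j, math.isqrt(2 * m))
```

For every m the computed j stays at or below ⌊√(2m)⌋, which is the O(√m) bound used for the per-location verification cost.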

Tighter analysis

Upper bound on the number of possible p-matches.

Lemma: Let $|P| = m$, $|T| = n$, and let $P_{i_1}, P_{i_2}, \dots, P_{i_j}$ be the different numbers in PL.

Then there are at most $\frac{2n}{j}$ p-matches of PL in TL.

Meaning: since verification time is $O(j)$ per p-match, the lemma implies that total verification time is

$O(\frac{2n}{j} \cdot j) = O(n)$

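Proof of Lemma

Consider the first appearance in PL of each of $P_{i_1}, \dots, P_{i_j}$, and the text symbols aligned with them in a p-match:

PL: $P_{i_1} \; P_{i_2} \; \dots \; P_{i_j}$

TL: $a_1 \; a_2 \; \dots \; a_j$

The $a_k$ are distinct positive integers, so

$\sum_{k=1}^{j} a_k \ge \frac{j^2}{2}$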

Lemma's proof (cont.)

Let $x$ be the total number of p-matches in the text.

The sum, over all p-matches, of the text elements that match first occurrences of the $P_{i_k}$'s in the pattern is

$\ge \frac{x j^2}{2}$

But there are overlaps. How many?

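Lemma's proof (cont.)

For each text location, at most $j$ matches will count it. Therefore:

Total count without overlaps $\ge \frac{1}{j} \cdot \frac{x j^2}{2} = \frac{xj}{2}$

This count is at most the sum of all symbols of TL, which is $n$. Clearly $\frac{xj}{2} \le n$, thus $x \le \frac{2n}{j}$.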
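The bound can be sanity-checked by brute force. The sketch below is an empirical check under assumptions, not part of the proof: p-matches are counted with a naive scan, n is taken to be the sum of the run lengths in TL, and the names `count_p_matches`, `lemma_holds` and `random_check` are hypothetical.

```python
import random


def count_p_matches(pl, tl):
    """Number of positions where pl parameterize-matches a window of tl."""
    count = 0
    for i in range(len(tl) - len(pl) + 1):
        fwd, bwd = {}, {}
        if all(fwd.setdefault(p, t) == t and bwd.setdefault(t, p) == p
               for p, t in zip(pl, tl[i:i + len(pl)])):
            count += 1
    return count


def lemma_holds(pl, tl):
    """Check x <= 2n/j, written as x * j <= 2 * n to stay in integers."""
    x = count_p_matches(pl, tl)
    n = sum(tl)          # length of the uncompressed text
    j = len(set(pl))     # number of different run lengths in the pattern
    return x * j <= 2 * n


def random_check(trials=300, seed=0):
    """Try the bound on small random run-length sequences."""
    rng = random.Random(seed)
    for _ in range(trials):
        pl = [rng.randint(1, 4) for _ in range(rng.randint(1, 6))]
        tl = [rng.randint(1, 5) for _ in range(rng.randint(1, 30))]
        if not lemma_holds(pl, tl):
            return False
    return True


print(lemma_holds([2, 3, 2, 3, 2], [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]))
print(random_check())
```

On the talk's example there are 4 p-matches, n = 82 and j = 2, so the bound 2n/j = 82 is far from tight; the lemma only needs it to be linear in n after multiplying by the O(j) verification cost.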

Open Problem

Give a 1-d algorithm that is linear in the size of the run-length compressed text and pattern.


                                                  • It seems daunting buthellip
                                                  • CPM 2003 Morelia Mexico
                                                  • Problem inherently inexact
                                                  • Definition
                                                  • Discrete exact Scaled Matching
                                                  • Slide 12
                                                  • Idea Fix a scale s
                                                  • Algorithm time
                                                  • Problem Real scales
                                                  • Formally
                                                  • Remark
                                                  • Simplify definition
                                                  • Why are definitions equivalent
                                                  • Time
                                                  • Definitions are Equivalent
                                                  • Naiumlve algorithm for matching PL in TL
                                                  • Check intersection
                                                  • Slide 24
                                                  • Improvement ndash Parameterized Matching
                                                  • Parameterized Matching
                                                  • Slide 27
                                                  • The reduction
                                                  • Possibility 1
                                                  • Possibility 2
                                                  • Algorithm for Real Scaled String Matching
                                                  • Example
                                                  • Important Fact
                                                  • Tighter analysis
                                                  • Proof of Lemma
                                                  • Lemmarsquos proof (cont)
                                                  • Slide 37
                                                  • Open Problem

Parameterized Matching

Claim (AFM-94): For Σ that can be sorted in linear time (e.g. Σ = {1, …, n}), parameterized matching can be done in time O(n).
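As an illustration of what a p-match is (not the linear-time AFM-94 algorithm, which is not reproduced in these slides), here is a naive O(nm) Python sketch. The function name `p_matches` and the 0-based positions are assumptions made for this example.

```python
def p_matches(pattern, text):
    """Return 0-based start positions where pattern p-matches text.

    A p-match requires a one-to-one mapping between pattern values and
    text values that is consistent at every aligned position.
    """
    m, n = len(pattern), len(text)
    hits = []
    for start in range(n - m + 1):
        forward = {}   # pattern value -> text value
        backward = {}  # text value -> pattern value
        ok = True
        for p, t in zip(pattern, text[start:start + m]):
            # setdefault records the pairing on first sight and returns
            # the earlier pairing afterwards, so any conflict is detected.
            if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
                ok = False
                break
        if ok:
            hits.append(start)
    return hits


if __name__ == "__main__":
    print(p_matches([2, 3, 2, 3, 2], [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]))
```

On the lists PL = 2 3 2 3 2 and TL = 5 6 5 6 5 6 10 6 10 6 10 7 from the talk's example, this reports the 0-based positions 0, 1, 5 and 6.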

The reduction

Lemma: There exists α ≥ 1 for which PL matches TL at location i scaled to α only if PL p-matches TL at i.

Proof: Assume PL does not p-match TL at location i. The possible situations are:

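Possibility 1

Two equal numbers in PL are aligned with different numbers in TL:

TL:   a    c≠a
PL:   b    b

w.l.o.g. c ≥ a+1. For c = a+1 (smallest possible), the two intervals of admissible scales are

$\left[\frac{a}{b}, \frac{a+1}{b}\right)$ and $\left[\frac{a+1}{b}, \frac{a+2}{b}\right)$,

which do not intersect.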

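Possibility 2

Two different numbers in PL are aligned with equal numbers in TL:

TL:   a    a
PL:   b    c≠b

w.l.o.g. c ≥ b+1. For c = b+1, the two intervals of admissible scales are

$\left[\frac{a}{b}, \frac{a+1}{b}\right)$ and $\left[\frac{a}{b+1}, \frac{a+1}{b+1}\right)$.

Intersection not empty only if $\frac{a+1}{b+1} > \frac{a}{b}$, i.e. $ab + b > ab + a$, i.e. $b > a$.

But this can never happen if α ≥ 1.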

Algorithm for Real Scaled String Matching

Let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL.

1. P-match PL in TL.
2. For each match, check the intersection of the intervals between $P_{i_1}, \ldots, P_{i_j}$ and the corresponding symbols in TL.

End Algorithm

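Example

PL = 2 3 2 3 2, so $P_{i_1} = 2$, $P_{i_2} = 3$

TL = 5 6 5 6 5 6 10 6 10 6 10 7

PL p-matches TL at locations 1, 2, 6 and 7 (1-based). Checking the intersection of intervals at each p-match:

location 1 (2→5, 3→6): $[\frac{5}{2}, 3) \cap [2, \frac{7}{3}) = \emptyset$
location 2 (2→6, 3→5): $[3, \frac{7}{2}) \cap [\frac{5}{3}, 2) = \emptyset$
location 6 (2→6, 3→10): $[3, \frac{7}{2}) \cap [\frac{10}{3}, \frac{11}{3}) = [\frac{10}{3}, \frac{7}{2})$, a scaled match
location 7 (2→10, 3→6): $[5, \frac{11}{2}) \cap [2, \frac{7}{3}) = \emptyset$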
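The two steps of the algorithm can be sketched in Python as follows. This is a minimal quadratic-time sketch, not the O(n) method of the talk. It assumes that a run of length p scaled by α has length ⌊αp⌋, so that a text number t aligned with a pattern number p admits exactly the scales in [t/p, (t+1)/p). It also ignores any special treatment of the first and last runs. The name `scaled_matches` is hypothetical.

```python
from fractions import Fraction


def scaled_matches(pl, tl):
    """Return (start, lo, hi) for every 0-based start at which some
    scale alpha in [lo, hi), alpha >= 1, maps pl onto tl[start:start+m]."""
    m = len(pl)
    out = []
    for start in range(len(tl) - m + 1):
        window = tl[start:start + m]
        fwd, bwd = {}, {}
        # Step 1: p-match test (one-to-one mapping of values).
        if any(fwd.setdefault(p, t) != t or bwd.setdefault(t, p) != p
               for p, t in zip(pl, window)):
            continue  # not a p-match, so by the lemma no scale works
        # Step 2: intersect one interval [t/p, (t+1)/p) per distinct
        # pattern value; exact rationals avoid rounding problems.
        lo = max(Fraction(t, p) for p, t in fwd.items())
        hi = min(Fraction(t + 1, p) for p, t in fwd.items())
        lo = max(lo, Fraction(1))  # only scales alpha >= 1 are allowed
        if lo < hi:
            out.append((start, lo, hi))
    return out


if __name__ == "__main__":
    PL = [2, 3, 2, 3, 2]
    TL = [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]
    for start, lo, hi in scaled_matches(PL, TL):
        print(f"start {start}: alpha in [{lo}, {hi})")
```

Only the distinct pattern values are examined in step 2, which is why the verification cost per p-match is proportional to j rather than to the pattern length.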

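Important Fact

$\sum_{k=1}^{j} P_{i_k} \le m$

The $P_{i_k}$ are distinct positive integers, so their sum is at least $j(j+1)/2$. So there are at most O(√m) different $P_{i_k}$'s.

Time: O(n) for parameterized matching (Σ = {1, 2, …, n}), O(√m) verification for each location. Total: O(n√m).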
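A quick numeric check of this counting argument, under the assumption that the numbers in PL are positive integers whose distinct values sum to at most m: the largest possible j then satisfies j(j+1)/2 ≤ m. The helper name `max_distinct` is made up for this sketch.

```python
import math


def max_distinct(m):
    """Largest j such that 1 + 2 + ... + j = j(j+1)/2 <= m."""
    return (math.isqrt(8 * m + 1) - 1) // 2


for m in (1, 5, 6, 10, 1000):
    j = max_distinct(m)
    # j distinct positive integers sum to at least j(j+1)/2 ...
    assert j * (j + 1) // 2 <= m < (j + 1) * (j + 2) // 2
    # ... hence j <= sqrt(2m), i.e. j = O(sqrt(m)).
    assert j * j <= 2 * m
```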

Tighter analysis

Upper bound on the number of possible p-matches.

Lemma: Let |P| = m, |T| = n, and let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL. Then there are at most $2n/j$ p-matches of PL in TL.

Meaning: Since verification time is O(j) per p-match, the lemma implies that total verification time is $O((2n/j) \cdot j) = O(n)$.

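Proof of Lemma

Consider the 1st appearance of each of $P_{i_1}, \ldots, P_{i_j}$ in PL and the text numbers aligned with them in a p-match:

PL:   $P_{i_1}$   $P_{i_2}$   …   $P_{i_j}$
TL:   $a_1$   $a_2$   …   $a_j$

$\sum_{k=1}^{j} a_k \ge \frac{j^2}{2}$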

Lemma's proof (cont.)

Let x be the total number of p-matches in the text.

The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern is $\ge \frac{x j^2}{2}$.

But: there are overlaps. How many?

Lemma's proof (cont.)

For each text location, at most j matches will count it. Therefore:

Total count without overlaps $\ge \frac{x j^2}{2} \cdot \frac{1}{j} = \frac{x j}{2}$

Clearly $\frac{x j}{2} \le n$, thus $x \le \frac{2n}{j}$.

Open Problem

Give a 1-d algorithm that is linear in the size of the run-length compressed text and pattern.

• SCALED Pattern Matching
• Motivation
• Slide 3
• Model
• Types of Approximations
• Types of Approximation
• It seems daunting but…
• CPM 2003 Morelia Mexico
• Problem inherently inexact
• Definition
• Discrete exact Scaled Matching
• Slide 12
• Idea: Fix a scale s
• Algorithm time
• Problem: Real scales
• Formally
• Remark
• Simplify definition
• Why are definitions equivalent
• Time
• Definitions are Equivalent
• Naïve algorithm for matching PL in TL
• Check intersection
• Slide 24
• Improvement – Parameterized Matching
• Parameterized Matching
• Slide 27
• The reduction
• Possibility 1
• Possibility 2
• Algorithm for Real Scaled String Matching
• Example
• Important Fact
• Tighter analysis
• Proof of Lemma
• Lemma's proof (cont.)
• Slide 37
• Open Problem


                                                        Possibility 1wlog c ge a+1

                                                        For c = a+1 (smallest possible)

                                                        TL

                                                        PL

                                                        a

                                                        b b

                                                        cnea

                                                        b

                                                        a

                                                        b

                                                        a

                                                        b

                                                        a

                                                        b

                                                        a 211

                                                        Possibility 2

                                                        wlog c ge b+1

                                                        Intersection not empty only if

                                                        (a+1)(b+1) gt ab ie

                                                        ab+b gt ab+a

                                                        bgta

                                                        But this can never happen if α ge 1

                                                        TL

                                                        PL

                                                        a

                                                        b cneb

                                                        a

                                                        1

                                                        11

                                                        1

                                                        b

                                                        a

                                                        b

                                                        a

                                                        b

                                                        a

                                                        b

                                                        a

                                                        Algorithm for Real Scaled String Matching

                                                        Let Pi1 Pi2 Pij be the different numbers in PL

                                                        1 P-match PL in TL2 For each match chack intersection

                                                        of intervals between Pi1 Pij and corresponding symbols in TL

                                                        End Algorithm

                                                        PL = 2 3 2 3 2 Pi1=2 Pi2=3p-matches

                                                        TL = 5 6 5 6 5 6 10 6 10 6 10 7

                                                        scaled match

                                                        Example

                                                        2133 32

                                                        21

                                                        3121 2232

                                                        3121 2255

                                                        3231

                                                        21 3333

                                                        Important Fact

                                                        So there are at most O(radicm) different Pikrsquos

                                                        Time O(n) for parameterized matching (Σ=12

                                                        hellipn) O(radicm) verification for each location Total O(nradicm)

                                                        mi

                                                        j

                                                        kP

                                                        k

                                                        1

                                                        Tighter analysis

                                                        Upper bound number of possible p-matches

                                                        Lemma Let |P|=m |T|=n Pi1 Pi2 Pij be the different numbers in PL

                                                        Then there are at most n2j p-matches of PL in TL

                                                        Meaning Since verification time is O(j) per p-match the lemma implies that total verification time is

                                                        O((n2j) middot j) = O(n)

                                                        Proof of Lemma

                                                        1st appearance of Pi1 Pij

                                                        PL Pi1 Pi2 Pij

                                                        TL a1 a2 aj

                                                        m-match

                                                        2

                                                        2

                                                        1

                                                        ja

                                                        j

                                                        ki

                                                        Lemmarsquos proof (cont)

                                                        Let x be the total number of p-matches in the text

                                                        The sum of all text elements that match 1st occurrences of Piklsquos in the pattern

                                                        ge (xjsup2)2

                                                        But There are overlaps How many

                                                        Lemmarsquos proof (cont)

                                                        For each text location at most j matches will count it Thereforehellip

                                                        Total count without overlaps ge

                                                        Clearly xmiddotj2 le n thus x le (2n)j

                                                        2

                                                        1

                                                        2

                                                        2

                                                        xjxjj

                                                        Open Problem

                                                        Give 1-d algorithm linear in run-length compressed text and pattern

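Possibility 2

[Figure: TL contains the number a at two positions; the corresponding positions of PL hold b and c ≠ b.]

W.l.o.g. c ≥ b+1.

Intersection not empty only if

$\frac{a+1}{b+1} > \frac{a}{b}$, i.e.

$ab + b > ab + a$, i.e.

$b > a$.

But this can never happen if α ≥ 1.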
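The argument on this slide can be checked by brute force. The sketch below assumes that a pattern run of length p matches a text run of length t at scale α exactly when ⌊αp⌋ = t, which gives the half-open interval [t/p, (t+1)/p); that interval form, and the helper names `scale_interval`, `intersects` and `possibility2_counterexamples`, are assumptions made for illustration rather than definitions taken from the slides. The condition α ≥ 1 is encoded as a ≥ b.

```python
from fractions import Fraction


def scale_interval(p, t):
    """Scales alpha with floor(alpha * p) == t, as the half-open interval [t/p, (t+1)/p).

    Assumption of this sketch: a pattern run of length p matches a text run
    of length t under scale alpha exactly when floor(alpha * p) == t.
    """
    return Fraction(t, p), Fraction(t + 1, p)


def intersects(first, second):
    """True if two half-open intervals (lo, hi) share at least one point."""
    return max(first[0], second[0]) < min(first[1], second[1])


def possibility2_counterexamples(limit):
    """Triples (a, b, c) with a >= b and c >= b + 1 whose intervals still intersect.

    a is the text number; b and c are the two pattern numbers mapped to it.
    """
    return [
        (a, b, c)
        for b in range(1, limit + 1)
        for c in range(b + 1, limit + 1)
        for a in range(b, limit + 1)  # alpha >= 1 is taken to mean a >= b
        if intersects(scale_interval(b, a), scale_interval(c, a))
    ]
```

For small values the search finds no counterexample, whereas dropping the restriction a ≥ b does allow intersections (for instance a = 1, b = 2, c = 3).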

Algorithm for Real Scaled String Matching

Let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL.

1. P-match PL in TL.
2. For each match, check the intersection of the intervals between $P_{i_1}, \ldots, P_{i_j}$ and the corresponding symbols in TL.

End Algorithm
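The two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: step 1 uses a naive window-by-window bijection test instead of a linear-time parameterized matching algorithm, the scale interval for a pattern number p mapped to a text number t is assumed to be [t/p, (t+1)/p), only scales α ≥ 1 are considered, and any special treatment of the first and last runs is ignored. All function names are hypothetical.

```python
from fractions import Fraction


def is_p_match(pattern, window):
    """True if some bijection maps each pattern number to the aligned text number."""
    forward, backward = {}, {}
    for p, t in zip(pattern, window):
        if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
            return False
    return True


def p_matches(pl, tl):
    """Step 1 (naive stand-in): 0-based start positions where PL p-matches TL."""
    m = len(pl)
    return [s for s in range(len(tl) - m + 1) if is_p_match(pl, tl[s:s + m])]


def common_scale_interval(pl, tl, start):
    """Step 2: intersect the assumed intervals [t/p, (t+1)/p) over distinct p.

    Returns (lo, hi) if the intersection is non-empty, otherwise None.
    """
    lo, hi = Fraction(1), None  # assumption: only scales alpha >= 1 are allowed
    seen = set()
    for p, t in zip(pl, tl[start:]):
        if p in seen:
            continue  # each distinct pattern number is verified once
        seen.add(p)
        lo = max(lo, Fraction(t, p))
        upper = Fraction(t + 1, p)
        hi = upper if hi is None else min(hi, upper)
    return (lo, hi) if hi is not None and lo < hi else None


def scaled_matches(pl, tl):
    """P-match positions whose scale intervals have a common point."""
    result = []
    for start in p_matches(pl, tl):
        interval = common_scale_interval(pl, tl, start)
        if interval is not None:
            result.append((start, interval))
    return result
```

With PL = 2 3 2 3 2 and TL = 5 6 5 6 5 6 10 6 10 6 10 7, this sketch reports four p-matches and keeps only the one starting at 0-based index 5.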

Example

PL = 2 3 2 3 2, with $P_{i_1} = 2$, $P_{i_2} = 3$

TL = 5 6 5 6 5 6 10 6 10 6 10 7

[Figure: the p-matches of PL in TL, with the scale intervals checked for each; one p-match is a scaled match.]
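The interval check for this example can be worked out by hand. The computation below is a reconstruction under an assumption, not a copy of the slide's figure: a pattern number p mapped to a text number t is taken to allow exactly the scales α ∈ [t/p, (t+1)/p), and the first and last runs get no special treatment.

```latex
\begin{align*}
\text{window } 5\,6\,5\,6\,5 \ (2 \mapsto 5,\ 3 \mapsto 6):\quad
  & [\tfrac{5}{2}, 3) \cap [2, \tfrac{7}{3}) = \emptyset \\
\text{window } 6\,5\,6\,5\,6 \ (2 \mapsto 6,\ 3 \mapsto 5):\quad
  & [3, \tfrac{7}{2}) \cap [\tfrac{5}{3}, 2) = \emptyset \\
\text{window } 6\,10\,6\,10\,6 \ (2 \mapsto 6,\ 3 \mapsto 10):\quad
  & [3, \tfrac{7}{2}) \cap [\tfrac{10}{3}, \tfrac{11}{3}) = [\tfrac{10}{3}, \tfrac{7}{2}) \neq \emptyset \\
\text{window } 10\,6\,10\,6\,10 \ (2 \mapsto 10,\ 3 \mapsto 6):\quad
  & [5, \tfrac{11}{2}) \cap [2, \tfrac{7}{3}) = \emptyset
\end{align*}
```

Under that assumption only the window 6 10 6 10 6 survives verification, which agrees with the single scaled match marked on the slide.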

Important Fact

$\sum_{k=1}^{j} P_{i_k} \le m$

So there are at most $O(\sqrt{m})$ different $P_{i_k}$'s.

Time: $O(n)$ for parameterized matching (Σ = {1, 2, …, n}); $O(\sqrt{m})$ verification for each location. Total: $O(n\sqrt{m})$.
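One way to see the square-root bound, assuming only that the $P_{i_k}$ are j distinct positive integers whose sum is at most m:

```latex
\[
\frac{j^2}{2} \;<\; \frac{j(j+1)}{2} \;=\; 1 + 2 + \cdots + j
\;\le\; \sum_{k=1}^{j} P_{i_k} \;\le\; m
\quad\Longrightarrow\quad j < \sqrt{2m} = O(\sqrt{m}).
\]
```

The middle inequality holds because j distinct positive integers sum to at least 1 + 2 + … + j.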

Tighter analysis

Upper bound on the number of possible p-matches.

Lemma: Let |P| = m, |T| = n, and let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in PL. Then there are at most $\frac{2n}{j}$ p-matches of PL in TL.

Meaning: Since verification time is $O(j)$ per p-match, the lemma implies that the total verification time is

$O\left(\frac{2n}{j} \cdot j\right) = O(n)$
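The lemma can be sanity-checked empirically. The sketch below counts p-matches naively and compares the count x against 2n/j, taking n to be the sum of the numbers in TL (the length of the uncompressed text) and j the number of distinct values in PL. The function names and the random-instance generator are hypothetical, and a passing check is only a consistency test, not a proof.

```python
import random


def is_p_match(pattern, window):
    """True if some bijection maps each pattern number to the aligned text number."""
    forward, backward = {}, {}
    for p, t in zip(pattern, window):
        if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
            return False
    return True


def count_p_matches(pl, tl):
    """Number of start positions at which PL p-matches TL (naive scan)."""
    m = len(pl)
    return sum(is_p_match(pl, tl[s:s + m]) for s in range(len(tl) - m + 1))


def lemma_holds(pl, tl):
    """Check x <= 2n/j, with n = sum(TL) and j = number of distinct values in PL."""
    j = len(set(pl))
    n = sum(tl)
    x = count_p_matches(pl, tl)
    return x * j <= 2 * n  # same inequality, kept in integers


def check_random_instances(trials, seed):
    """Test the bound on small random run-length lists (positive integers only)."""
    rng = random.Random(seed)
    for _ in range(trials):
        pl = [rng.randint(1, 3) for _ in range(rng.randint(1, 5))]
        tl = [rng.randint(1, 4) for _ in range(rng.randint(len(pl), 20))]
        if not lemma_holds(pl, tl):
            return False
    return True
```

Small run lengths are used on purpose: they keep n low relative to the number of windows, which is where the bound is closest to being tight.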

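Proof of Lemma

[Figure: an alignment labelled "m-match" of PL against TL; the 1st appearances of $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ in PL are aligned with the text elements $a_1, a_2, \ldots, a_j$.]

The $a_k$ are distinct positive integers, because a p-match maps different pattern numbers to different text numbers. Hence

$\sum_{k=1}^{j} a_k \ge \frac{j^2}{2}$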

Lemma's proof (cont.)

Let x be the total number of p-matches in the text.

The sum of all text elements that match 1st occurrences of $P_{i_k}$'s in the pattern is

$\ge \frac{x j^2}{2}$

But there are overlaps. How many?

Lemma's proof (cont.)

For each text location, at most j matches will count it. Therefore…

Total count without overlaps $\ge \frac{1}{j} \cdot \frac{x j^2}{2} = \frac{x j}{2}$

Clearly $\frac{x j}{2} \le n$, thus $x \le \frac{2n}{j}$.

Open Problem

Give a 1-d algorithm linear in the run-length compressed text and pattern.

                                                          • SCALED Pattern Matching
                                                          • Motivation
                                                          • Slide 3
                                                          • Model
                                                          • Types of Approximations
                                                          • Types of Approximation
                                                          • It seems daunting buthellip
                                                          • CPM 2003 Morelia Mexico
                                                          • Problem inherently inexact
                                                          • Definition
                                                          • Discrete exact Scaled Matching
                                                          • Slide 12
                                                          • Idea Fix a scale s
                                                          • Algorithm time
                                                          • Problem Real scales
                                                          • Formally
                                                          • Remark
                                                          • Simplify definition
                                                          • Why are definitions equivalent
                                                          • Time
                                                          • Definitions are Equivalent
                                                          • Naiumlve algorithm for matching PL in TL
                                                          • Check intersection
                                                          • Slide 24
                                                          • Improvement ndash Parameterized Matching
                                                          • Parameterized Matching
                                                          • Slide 27
                                                          • The reduction
                                                          • Possibility 1
                                                          • Possibility 2
                                                          • Algorithm for Real Scaled String Matching
                                                          • Example
                                                          • Important Fact
                                                          • Tighter analysis
                                                          • Proof of Lemma
                                                          • Lemmarsquos proof (cont)
                                                          • Slide 37
                                                          • Open Problem

                                                            Algorithm for Real Scaled String Matching

                                                            Let Pi1 Pi2 Pij be the different numbers in PL

                                                            1 P-match PL in TL2 For each match chack intersection

                                                            of intervals between Pi1 Pij and corresponding symbols in TL

                                                            End Algorithm

                                                            PL = 2 3 2 3 2 Pi1=2 Pi2=3p-matches

                                                            TL = 5 6 5 6 5 6 10 6 10 6 10 7

                                                            scaled match

                                                            Example

                                                            2133 32

                                                            21

                                                            3121 2232

                                                            3121 2255

                                                            3231

                                                            21 3333

                                                            Important Fact

                                                            So there are at most O(radicm) different Pikrsquos

                                                            Time O(n) for parameterized matching (Σ=12

                                                            hellipn) O(radicm) verification for each location Total O(nradicm)

                                                            mi

                                                            j

                                                            kP

                                                            k

                                                            1

                                                            Tighter analysis

                                                            Upper bound number of possible p-matches

                                                            Lemma Let |P|=m |T|=n Pi1 Pi2 Pij be the different numbers in PL

                                                            Then there are at most n2j p-matches of PL in TL

                                                            Meaning Since verification time is O(j) per p-match the lemma implies that total verification time is

                                                            O((n2j) middot j) = O(n)

                                                            Proof of Lemma

                                                            1st appearance of Pi1 Pij

                                                            PL Pi1 Pi2 Pij

                                                            TL a1 a2 aj

                                                            m-match

                                                            2

                                                            2

                                                            1

                                                            ja

                                                            j

                                                            ki

                                                            Lemmarsquos proof (cont)

                                                            Let x be the total number of p-matches in the text

                                                            The sum of all text elements that match 1st occurrences of Piklsquos in the pattern

                                                            ge (xjsup2)2

                                                            But There are overlaps How many

                                                            Lemmarsquos proof (cont)

                                                            For each text location at most j matches will count it Thereforehellip

                                                            Total count without overlaps ge

                                                            Clearly xmiddotj2 le n thus x le (2n)j

                                                            2

                                                            1

                                                            2

                                                            2

                                                            xjxjj

                                                            Open Problem

                                                            Give 1-d algorithm linear in run-length compressed text and pattern

                                                            • SCALED Pattern Matching
                                                            • Motivation
                                                            • Slide 3
                                                            • Model
                                                            • Types of Approximations
                                                            • Types of Approximation
                                                            • It seems daunting buthellip
                                                            • CPM 2003 Morelia Mexico
                                                            • Problem inherently inexact
                                                            • Definition
                                                            • Discrete exact Scaled Matching
                                                            • Slide 12
                                                            • Idea Fix a scale s
                                                            • Algorithm time
                                                            • Problem Real scales
                                                            • Formally
                                                            • Remark
                                                            • Simplify definition
                                                            • Why are definitions equivalent
                                                            • Time
                                                            • Definitions are Equivalent
                                                            • Naiumlve algorithm for matching PL in TL
                                                            • Check intersection
                                                            • Slide 24
                                                            • Improvement ndash Parameterized Matching
                                                            • Parameterized Matching
                                                            • Slide 27
                                                            • The reduction
                                                            • Possibility 1
                                                            • Possibility 2
                                                            • Algorithm for Real Scaled String Matching
                                                            • Example
                                                            • Important Fact
                                                            • Tighter analysis
                                                            • Proof of Lemma
                                                            • Lemmarsquos proof (cont)
                                                            • Slide 37
                                                            • Open Problem

                                                              PL = 2 3 2 3 2 Pi1=2 Pi2=3p-matches

                                                              TL = 5 6 5 6 5 6 10 6 10 6 10 7

                                                              scaled match

                                                              Example

                                                              2133 32

                                                              21

                                                              3121 2232

                                                              3121 2255

                                                              3231

                                                              21 3333

                                                              Important Fact

                                                              So there are at most O(radicm) different Pikrsquos

                                                              Time O(n) for parameterized matching (Σ=12

                                                              hellipn) O(radicm) verification for each location Total O(nradicm)

                                                              mi

                                                              j

                                                              kP

                                                              k

                                                              1

                                                              Tighter analysis

                                                              Upper bound number of possible p-matches

                                                              Lemma Let |P|=m |T|=n Pi1 Pi2 Pij be the different numbers in PL

                                                              Then there are at most n2j p-matches of PL in TL

                                                              Meaning Since verification time is O(j) per p-match the lemma implies that total verification time is

                                                              O((n2j) middot j) = O(n)

                                                              Proof of Lemma

                                                              1st appearance of Pi1 Pij

                                                              PL Pi1 Pi2 Pij

                                                              TL a1 a2 aj

                                                              m-match

                                                              2

                                                              2

                                                              1

                                                              ja

                                                              j

                                                              ki

                                                              Lemmarsquos proof (cont)

                                                              Let x be the total number of p-matches in the text

                                                              The sum of all text elements that match 1st occurrences of Piklsquos in the pattern

                                                              ge (xjsup2)2

                                                              But There are overlaps How many

                                                              Lemmarsquos proof (cont)

                                                              For each text location at most j matches will count it Thereforehellip

                                                              Total count without overlaps ge

                                                              Clearly xmiddotj2 le n thus x le (2n)j

                                                              2

                                                              1

                                                              2

                                                              2

                                                              xjxjj

                                                              Open Problem

                                                              Give 1-d algorithm linear in run-length compressed text and pattern

                                                              • SCALED Pattern Matching
                                                              • Motivation
                                                              • Slide 3
                                                              • Model
                                                              • Types of Approximations
                                                              • Types of Approximation
                                                              • It seems daunting buthellip
                                                              • CPM 2003 Morelia Mexico
                                                              • Problem inherently inexact
                                                              • Definition
                                                              • Discrete exact Scaled Matching
                                                              • Slide 12
                                                              • Idea Fix a scale s
                                                              • Algorithm time
                                                              • Problem Real scales
                                                              • Formally
                                                              • Remark
                                                              • Simplify definition
                                                              • Why are definitions equivalent
                                                              • Time
                                                              • Definitions are Equivalent
                                                              • Naiumlve algorithm for matching PL in TL
                                                              • Check intersection
                                                              • Slide 24
                                                              • Improvement ndash Parameterized Matching
                                                              • Parameterized Matching
                                                              • Slide 27
                                                              • The reduction
                                                              • Possibility 1
                                                              • Possibility 2
                                                              • Algorithm for Real Scaled String Matching
                                                              • Example
                                                              • Important Fact
                                                              • Tighter analysis
                                                              • Proof of Lemma
                                                              • Lemmarsquos proof (cont)
                                                              • Slide 37
                                                              • Open Problem

                                                                Important Fact

                                                                So there are at most O(radicm) different Pikrsquos

                                                                Time O(n) for parameterized matching (Σ=12

                                                                hellipn) O(radicm) verification for each location Total O(nradicm)

                                                                mi

                                                                j

                                                                kP

                                                                k

                                                                1

                                                                Tighter analysis

                                                                Upper bound number of possible p-matches

                                                                Lemma Let |P|=m |T|=n Pi1 Pi2 Pij be the different numbers in PL

                                                                Then there are at most n2j p-matches of PL in TL

                                                                Meaning Since verification time is O(j) per p-match the lemma implies that total verification time is

                                                                O((n2j) middot j) = O(n)

                                                                Proof of Lemma

                                                                1st appearance of Pi1 Pij

                                                                PL Pi1 Pi2 Pij

                                                                TL a1 a2 aj

                                                                m-match

                                                                2

                                                                2

                                                                1

                                                                ja

                                                                j

                                                                ki

                                                                Lemmarsquos proof (cont)

                                                                Let x be the total number of p-matches in the text

                                                                The sum of all text elements that match 1st occurrences of Piklsquos in the pattern

                                                                ge (xjsup2)2

                                                                But There are overlaps How many

                                                                Lemmarsquos proof (cont)

                                                                For each text location at most j matches will count it Thereforehellip

                                                                Total count without overlaps ge

                                                                Clearly xmiddotj2 le n thus x le (2n)j

                                                                2

                                                                1

                                                                2

                                                                2

                                                                xjxjj

                                                                Open Problem

                                                                Give 1-d algorithm linear in run-length compressed text and pattern

Tighter analysis

Upper bound on the number of possible p-matches.

Lemma: Let $|P| = m$, $|T| = n$, and let $P_{i_1}, P_{i_2}, \ldots, P_{i_j}$ be the different numbers in $P_L$.

Then there are at most $\frac{2n}{j}$ p-matches of $P_L$ in $T_L$.

Meaning: Since verification time is $O(j)$ per p-match, the lemma implies that the total verification time is

$O\left(\frac{2n}{j} \cdot j\right) = O(n)$
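The counting claim in the lemma can be checked on small inputs. The following is a minimal sketch, not the authors' algorithm: it assumes that $P_L$ and $T_L$ are the sequences of run lengths of the pattern and the text, that a p-match is a window of $T_L$ related to $P_L$ by a one-to-one renaming of values, and that $n$ is the sum of the text run lengths. The function names `is_p_match`, `count_p_matches` and `lemma_bound` are hypothetical, and the special handling of the first and last runs is ignored.

```python
def is_p_match(pattern, window):
    """Return True if a one-to-one renaming of values maps pattern onto window."""
    forward, backward = {}, {}
    for p, t in zip(pattern, window):
        # setdefault records the pairing the first time a value is seen and
        # returns the recorded partner afterwards, so a conflict shows up here.
        if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
            return False
    return True


def count_p_matches(pattern_lengths, text_lengths):
    """Naively count the windows of text_lengths that p-match pattern_lengths."""
    m = len(pattern_lengths)
    return sum(
        is_p_match(pattern_lengths, text_lengths[s:s + m])
        for s in range(len(text_lengths) - m + 1)
    )


def lemma_bound(pattern_lengths, text_lengths):
    """Evaluate 2n/j, with n the total text length and j the number of distinct pattern values."""
    n = sum(text_lengths)            # assumption: n = |T| is the sum of the run lengths
    j = len(set(pattern_lengths))    # number of different numbers in the pattern
    return 2 * n / j


text_lengths = [1, 2, 1, 2, 1, 2, 1, 2]
pattern_lengths = [1, 2, 1]
x = count_p_matches(pattern_lengths, text_lengths)
bound = lemma_bound(pattern_lengths, text_lengths)
print(x, bound)  # 6 12.0
```

In this example every window alternates between two values, so all 6 windows are p-matches, which stays below the bound $2n/j = 2 \cdot 12 / 2 = 12$.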

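Proof of Lemma

Consider the 1st appearance of each of $P_{i_1}, \ldots, P_{i_j}$ in $P_L$ and the text elements $a_1, a_2, \ldots, a_j$ of $T_L$ aligned with them in a match:

$P_L$: $P_{i_1} \quad P_{i_2} \quad \ldots \quad P_{i_j}$

$T_L$: $a_1 \quad a_2 \quad \ldots \quad a_j$

m-match

Since the $a_k$ are $j$ different positive integers,

$\sum_{k=1}^{j} a_k \ge 1 + 2 + \cdots + j \ge \frac{j^2}{2}$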

Lemma's proof (cont.)

Let $x$ be the total number of p-matches in the text.

The sum of all text elements that match 1st occurrences of the $P_{i_k}$'s in the pattern is

$\ge \frac{x j^2}{2}$

But there are overlaps. How many?

Lemma's proof (cont.)

Each text location is counted by at most $j$ matches. Therefore:

Total count without overlaps $\ge \frac{x j^2}{2} \cdot \frac{1}{j} = \frac{x j}{2}$

Clearly $\frac{x j}{2} \le n$, thus $x \le \frac{2n}{j}$.

Open Problem

Give a 1-d algorithm that is linear in the size of the run-length compressed text and pattern.
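To make the input size in the open problem concrete, here is a minimal sketch of run-length compression. It assumes that the compressed form of a string is a symbol part plus a length part, and that the length part is what the slides call $P_L$ and $T_L$. The function name `run_length_encode` is hypothetical.

```python
from itertools import groupby


def run_length_encode(s):
    """Split s into maximal runs and return (symbols, lengths)."""
    symbols, lengths = [], []
    for symbol, run in groupby(s):
        symbols.append(symbol)           # the repeated symbol of this run
        lengths.append(len(list(run)))   # how many times it repeats
    return symbols, lengths


symbols, lengths = run_length_encode("aaabbc")
print(symbols, lengths)  # ['a', 'b', 'c'] [3, 2, 1]
```

The lengths sum to the length of the original string, while the number of runs can be much smaller. An algorithm linear in the compressed size would depend on the number of runs rather than on $n$.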
