Suffix Arrays in Linear Time

Aug 21, 2015

Transcript
Page 1: Suffix arrays

Suffix Arrays in Linear Time

Page 2: Suffix arrays

Index text, so substring queries can be answered fast

Page 3: Suffix arrays

The Text: C G A C G C T

[Figure: Suffix Tree of the text]

Page 4: Suffix arrays

The Text: C G A C G C T

Substring Query: C G C

[Figure: Suffix Tree of the text]

Page 5: Suffix arrays

Trees take too much space. Are there smaller indices?

Page 6: Suffix arrays

The Text: C G A C G C T

[Figure: Suffix Tree of the text]

Suffix Array (Sorted List of Suffixes): 3 1 4 6 2 5 7
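To make the suffix array on this slide concrete, here is a minimal Python sketch. It assumes the text is C G A C G C T (which is consistent with the array 3 1 4 6 2 5 7), uses 1-based positions like the slide, and builds the array by plain sorting rather than in linear time. The function names are illustrative and do not come from the slides.

```python
def suffix_array(text):
    """Naive construction: sort the 1-based suffix start positions by suffix."""
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])


def find_occurrences(text, sa, pattern):
    """Return the sorted 1-based start positions of pattern, via binary search on sa."""
    m = len(pattern)

    def prefix(k):
        # First m characters of the suffix that is k-th in sorted order.
        start = sa[k] - 1
        return text[start:start + m]

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


text = "CGACGCT"                            # assumed text of the slides
sa = suffix_array(text)
print(sa)                                   # [3, 1, 4, 6, 2, 5, 7]
print(find_occurrences(text, sa, "CGC"))    # [4]
```

All suffixes that start with the query sit in one contiguous block of the array, so two binary searches find them with O(log n) string comparisons and no tree.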

Page 7: Suffix arrays

The Text: C G A C G C T

Suffix Array: 3 1 4 6 2 5 7

Burrows-Wheeler Index (an array)

Page 8: Suffix arrays

How can one compute the Suffix Array in Linear Time?

Page 9: Suffix arrays

Task

Given a string of length n with characters in the range 1..n, sort its suffixes lexicographically

Comparison sorting takes O(n log n) comparisons, each taking up to n time

Obtain two arrays: f[i], the position of the ith suffix in sorted order, and g[i], the suffix that comes ith in sorted order
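As a baseline for the task, the two arrays can be computed by direct comparison sorting. This is a sketch of the slow method the slide describes, not the linear-time algorithm; it uses 0-based positions, and the name `naive_suffix_order` is made up for illustration.

```python
def naive_suffix_order(s):
    """Return (f, g) for a list of integer characters s, using plain sorting.

    g[k] is the start position of the suffix that is k-th in sorted order.
    f[i] is the rank of the suffix starting at i, so f is the inverse of g.
    Each comparison may scan up to n characters.
    """
    n = len(s)
    g = sorted(range(n), key=lambda i: s[i:])
    f = [0] * n
    for rank, start in enumerate(g):
        f[start] = rank
    return f, g


# Assumed example text "CGACGCT" encoded as A=1, C=2, G=3, T=4.
f, g = naive_suffix_order([2, 3, 1, 2, 3, 2, 4])
print(g)  # [2, 0, 3, 5, 1, 4, 6]
print(f)  # [1, 4, 0, 2, 5, 3, 6]
```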

Page 10: Suffix arrays

Divide and Conquer

Separate odd and even suffixes; sort each recursively, then combine

Page 11: Suffix arrays

Sorting Even Suffixes

A1A2  A3A4

Sort these n/2 pairs and map them to single chars in the range 1..n/2

New text of half the length; sort suffixes recursively
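The renaming step can be sketched as follows, assuming 0-based positions (so the pairs start at positions 0, 2, 4, ...) and a padding value 0 that is smaller than every real character. It uses Python's `sorted` for brevity, where a linear-time version would use radix sort; `rename_pairs` is an illustrative name.

```python
def rename_pairs(s):
    """Replace each pair (s[i], s[i+1]), i even, by its rank among the distinct pairs."""
    padded = s + [0] if len(s) % 2 else s        # pad odd lengths with 0
    pairs = [(padded[i], padded[i + 1]) for i in range(0, len(padded), 2)]
    rank = {p: r for r, p in enumerate(sorted(set(pairs)), start=1)}
    return [rank[p] for p in pairs]


s = [2, 3, 1, 2, 3, 2, 4]        # assumed example "CGACGCT" with A=1, C=2, G=3, T=4
half = rename_pairs(s)
print(half)                      # [2, 1, 3, 4]

# Suffix k of the new text corresponds to the suffix of s starting at 2k.
order_half = sorted(range(len(half)), key=lambda k: half[k:])
print([2 * k for k in order_half])   # [2, 0, 4, 6]
```

Because equal pairs get equal names and smaller pairs get smaller names, sorting the suffixes of the half-length text sorts the even suffixes of the original.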

Page 12: Suffix arrays

Sorting Odd Suffixes

A1,E1 A2,E2 A3,E3 A4,E4

Sort these n/2 pairs; the E’s are the even suffixes, whose order we know

O1 O2 O3 O4
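A minimal sketch of this step, assuming 0-based positions and that the ranks of the even suffixes are already available in a dictionary (here they are computed naively just to drive the example). The name `sort_odd_suffixes` is illustrative.

```python
def sort_odd_suffixes(s, even_rank):
    """Return the odd start positions of s in sorted suffix order.

    even_rank maps each even position to the rank of its suffix among the
    even suffixes. The suffix at odd i is s[i] followed by the even suffix
    at i + 1, so the pair (s[i], even_rank[i + 1]) is a complete sort key.
    An empty following suffix gets rank -1, below every real rank.
    """
    return sorted(range(1, len(s), 2),
                  key=lambda i: (s[i], even_rank.get(i + 1, -1)))


s = [2, 3, 1, 2, 3, 2, 4]        # assumed example "CGACGCT" with A=1, C=2, G=3, T=4
even_order = sorted(range(0, len(s), 2), key=lambda i: s[i:])   # stand-in for the recursion
even_rank = {pos: r for r, pos in enumerate(even_order)}
print(sort_odd_suffixes(s, even_rank))   # [3, 5, 1]
```

Each key has constant size, so with a two-pass radix sort in place of `sorted` this step takes O(n) time.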

Page 13: Suffix arrays

Time Complexity

T(n) = O(n) + T(n/2) + Time for merging even and odd suffixes

O(n)

Page 14: Suffix arrays

Merging

Do we have any info to determine the relative order of an odd suffix and an even one?

A,E B,O

O E

Page 15: Suffix arrays

The Trick (Kärkkäinen, Sanders)

Split suffixes into 3 groups instead of 2: 0 mod 3, 1 mod 3 and 2 mod 3

0 1 2

Page 16: Suffix arrays

Sorting 0 and 1 Together

A B C D E F G H I J K L

Sort these 2n/3 triplets and map them to single chars

New text of length 2n/3; sort suffixes recursively
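The triplet renaming can be sketched as below, under some assumptions that are not on the slide: positions are 0-based, the text is padded with zeros (smaller than every character), and when n is a multiple of 3 a dummy position n is added so that the group-0 half of the new text ends in a triplet containing padding. `rename_triples` is an illustrative name, and `sorted` stands in for radix sort.

```python
def rename_triples(s):
    """Build the 2n/3-length text for the suffixes at positions i with i % 3 in (0, 1).

    Returns (positions, names): names[k] is the rank of the triplet starting at
    positions[k]. Group-0 triplets come first, then group-1 triplets.
    """
    n = len(s)
    padded = s + [0, 0, 0]
    # The dummy position n (an empty suffix) keeps the two halves separated.
    group0 = list(range(0, n + 1 if n % 3 == 0 else n, 3))
    group1 = list(range(1, n, 3))
    positions = group0 + group1
    triples = [tuple(padded[i:i + 3]) for i in positions]
    rank = {t: r for r, t in enumerate(sorted(set(triples)), start=1)}
    return positions, [rank[t] for t in triples]


s = [2, 3, 1, 2, 3, 2, 4]        # assumed example "CGACGCT" with A=1, C=2, G=3, T=4
positions, names = rename_triples(s)
print(positions)                 # [0, 3, 6, 1, 4]
print(names)                     # [1, 2, 5, 3, 4]

# Sorting the suffixes of the new text (naively here, recursively in the real
# algorithm) gives the order of all group-0 and group-1 suffixes of s.
order = sorted(range(len(names)), key=lambda k: names[k:])
print([positions[k] for k in order if positions[k] < len(s)])   # [0, 3, 1, 4, 6]
```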

Page 17: Suffix arrays

Sorting Suffixes in 2

A1,01  A2,02  A3,03  A4,04

Sort these n/3 pairs; the 0’s are the mod 0 suffixes, whose order we know

21  22  23  24
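One way to do this step in linear time is a two-pass radix sort, sketched here with 0-based positions. It assumes `sample_order` lists the positions i with i % 3 in (0, 1) in sorted suffix order and contains only positions below n; both the function name and that parameter are illustrative.

```python
def sort_group2(s, sample_order):
    """Return the positions i with i % 3 == 2 in sorted suffix order, in O(n) time."""
    n = len(s)
    if n == 0:
        return []
    # Pass 1: order by the second key, the rank of the group-0 suffix at i + 1.
    # Walking the sample suffixes in sorted order yields exactly that order.
    # A group-2 suffix of length 1 is followed by the empty suffix, so it goes first.
    by_next = [n - 1] if n % 3 == 0 else []
    by_next += [p - 1 for p in sample_order if p % 3 == 0 and p > 0]
    # Pass 2: stable bucket sort by the first character s[i].
    buckets = [[] for _ in range(max(s) + 1)]
    for i in by_next:
        buckets[s[i]].append(i)
    return [i for bucket in buckets for i in bucket]


s = [2, 3, 1, 2, 3, 2, 4]        # assumed example "CGACGCT" with A=1, C=2, G=3, T=4
sample_order = [0, 3, 1, 4, 6]   # order of the group-0 and group-1 suffixes
print(sort_group2(s, sample_order))   # [2, 5]
```

Since characters lie in the range 1..n, the number of buckets is at most n + 1 and both passes are linear.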

Page 18: Suffix arrays

Merging

We know the order of all 0,1 suffixes!

AB,0 CD,1

1 2
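The merge can be sketched as a standard two-list merge whose comparison looks at one or two characters and then at known ranks. The sketch assumes 0-based positions, padding character 0, and rank -1 for an empty suffix; the function and parameter names are illustrative.

```python
def merge_sample_and_group2(s, sample_order, group2_order):
    """Merge the sorted group-0/1 suffixes with the sorted group-2 suffixes."""
    n = len(s)
    rank = {p: r for r, p in enumerate(sample_order)}

    def ch(k):
        return s[k] if k < n else 0          # 0 pads past the end of the text

    def rk(k):
        return rank.get(k, -1)               # an empty suffix ranks lowest

    def sample_first(i, j):
        """True if sample suffix i sorts before group-2 suffix j."""
        if i % 3 == 0:
            # One character, then suffixes i + 1 (group 1) and j + 1 (group 0).
            return (s[i], rk(i + 1)) < (s[j], rk(j + 1))
        # Two characters, then suffixes i + 2 (group 0) and j + 2 (group 1).
        return (s[i], ch(i + 1), rk(i + 2)) < (s[j], ch(j + 1), rk(j + 2))

    merged, a, b = [], 0, 0
    while a < len(sample_order) and b < len(group2_order):
        if sample_first(sample_order[a], group2_order[b]):
            merged.append(sample_order[a])
            a += 1
        else:
            merged.append(group2_order[b])
            b += 1
    return merged + sample_order[a:] + group2_order[b:]


s = [2, 3, 1, 2, 3, 2, 4]        # assumed example "CGACGCT" with A=1, C=2, G=3, T=4
print(merge_sample_and_group2(s, [0, 3, 1, 4, 6], [2, 5]))   # [2, 0, 3, 5, 1, 4, 6]
```

Every comparison takes constant time because, after skipping at most two characters, both suffixes start in groups 0 or 1, whose relative order is already known.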

Page 19: Suffix arrays

Time Complexity

T(n) = O(n) + T(2n/3) + O(n) = O(n)
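The linear bound follows by unrolling the recurrence into a geometric series. In the sketch below, c is an assumed constant that bounds the combined non-recursive work per character.

```latex
\begin{aligned}
T(n) &\le T\!\left(\tfrac{2n}{3}\right) + cn \\
     &\le cn + c\left(\tfrac{2}{3}\right)n + c\left(\tfrac{2}{3}\right)^{2} n + \cdots \\
     &\le cn \sum_{k \ge 0} \left(\tfrac{2}{3}\right)^{k} = 3cn = O(n).
\end{aligned}
```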

Page 20: Suffix arrays

Generalization

v  2v  3v

Set D of indices mod v

This string has size |D|n/v

Time taken to create this string is O(n |D|)

Sorting suffixes of this string gives the sorted order of all suffixes which begin at indices j such that j mod v is in D

Page 21: Suffix arrays

Key Property of D

For any two indices i and j, (i - j) mod v is the distance between some two beads in D, so some shift x < v puts both i + x and j + x (mod v) in D

D is a Difference Cover if the distances between beads in D generate 0, 1, …, v-1
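The definition and the shift property can be checked directly. This is a small sketch with illustrative function names; D = {0, 1} with v = 3 corresponds to sorting the 0 and 1 groups together.

```python
def is_difference_cover(D, v):
    """True if the differences (a - b) mod v over a, b in D produce every value 0..v-1."""
    return {(a - b) % v for a in D for b in D} == set(range(v))


def shift_into_cover(i, j, D, v):
    """Smallest x in 0..v-1 with (i + x) % v and (j + x) % v both in D, else None."""
    members = set(D)
    for x in range(v):
        if (i + x) % v in members and (j + x) % v in members:
            return x
    return None


print(is_difference_cover({0, 1}, 3))       # True
print(is_difference_cover({0, 1, 3}, 7))    # True
print(is_difference_cover({0, 1, 2}, 7))    # False
print(shift_into_cover(2, 5, {0, 1}, 3))    # 1
```

When D is a difference cover, the shift always exists: write (i - j) mod v as a - b with a and b in D and take x = (a - i) mod v. After skipping x < v characters, two arbitrary suffixes can then be compared using the known order of the suffixes whose positions fall in D.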

Page 22: Suffix arrays

Size of D

There exists a Difference Cover of size 1.5*sqrt(v)!

sqrt(v)

sqrt(v)
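A simple construction that matches the two sqrt(v) blocks in the picture is sketched below. It is a swapped-in, slightly weaker construction: it yields at most 2*ceil(sqrt(v)) elements rather than the 1.5*sqrt(v) bound stated on the slide, and the name `simple_difference_cover` is illustrative.

```python
import math


def simple_difference_cover(v):
    """Return a difference cover mod v with at most 2 * ceil(sqrt(v)) elements.

    With r = ceil(sqrt(v)), take r consecutive residues 0..r-1 plus multiples
    of r. Any d in 0..v-1 equals a - b, where a is the smallest multiple of r
    that is >= d and b = a - d lies in 0..r-1.
    """
    r = math.isqrt(v - 1) + 1                 # ceil(sqrt(v)) for v >= 1
    small_steps = set(range(r))
    big_steps = {(k * r) % v for k in range(1, -(-(v - 1) // r) + 1)}
    return small_steps | big_steps


print(sorted(simple_difference_cover(7)))     # [0, 1, 2, 3, 6]
```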

Page 23: Suffix arrays

Time Complexity

T(n) = O(n|D|) + T(|D|n/v) + O(nv)

For |D|=2.5 sqrt(v)

T(n) = O(n sqrt(v)) + T(n/sqrt(v)) + O(nv)
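For a fixed v the general recurrence also unrolls into a geometric series. The sketch below assumes the shrink factor rho = |D|/v is below 1 and that c is a constant bounding the non-recursive work, which is dominated by the O(nv) merging term.

```latex
\begin{aligned}
T(n) &\le T(\rho n) + c\,n v, \qquad \rho = \frac{|D|}{v} < 1, \\
T(n) &\le c\,n v \sum_{k \ge 0} \rho^{k} = \frac{c\,n v}{1 - \rho} = O(n) \quad \text{for constant } v.
\end{aligned}
```

With D = {0, 1} and v = 3 this gives rho = 2/3, the case analysed on the earlier Time Complexity slide.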