Three New Algorithms for Regular Language Enumeration. Erkki Makinen, University of Tampere, Tampere, Finland; Margareta Ackerman, University of Waterloo, Waterloo, ON

Dec 15, 2015


Arlene Johnson
Transcript
Page 1

Three New Algorithms for Regular Language Enumeration

Erkki Makinen, University of Tampere, Tampere, Finland

Margareta Ackerman, University of Waterloo, Waterloo, ON

Page 2

What kind of words does this NFA accept?

[Figure: NFA with states A, B, C, D, E and transitions labeled 0 and 1]

Page 3

ε 0 00 11 000 110 0000 1100 1110 00000 ...

Cross-section problem: enumerate all words of length n accepted by the NFA in lexicographic order.

[Figure: NFA with states A, B, C, D, E and transitions labeled 0 and 1]

Page 4

ε 0 00 11 000 110 0000 1100 1110 00000 ...

Enumeration problem: enumerate the first m words accepted by the NFA in length-lexicographic order.

[Figure: NFA with states A, B, C, D, E and transitions labeled 0 and 1]

Page 5

ε 0 00 11 000 110 0000 1100 1110 00000 ...

Min-word problem: find the lexicographically first word of length n accepted by the NFA.

[Figure: NFA with states A, B, C, D, E and transitions labeled 0 and 1]
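The three problems can be pinned down with a brute-force reference, which is useful as a specification even though it is exponential in n. This is only a sketch: the NFA below is hypothetical (the slide figure did not survive extraction), and the names `NFA`, `accepts`, `cross_section`, `min_word` and `enumerate_words` are chosen here for illustration.

```python
from itertools import count, islice, product

# Hypothetical NFA (an assumption, not the automaton from the slides):
# state -> list of (symbol, next_state).
NFA = {"A": [("3", "C")], "B": [("0", "A"), ("2", "B")], "C": [("1", "B")]}
START, FINALS, ALPHABET = "A", {"A", "B"}, "0123"


def accepts(word):
    """Simulate the NFA by tracking the set of states reachable on `word`."""
    current = {START}
    for symbol in word:
        current = {t for s in current for (a, t) in NFA[s] if a == symbol}
    return bool(current & FINALS)


def cross_section(n):
    """Cross-section problem: all accepted words of length n, lexicographically."""
    # product() over a sorted alphabet yields candidates in lexicographic order.
    candidates = map("".join, product(sorted(ALPHABET), repeat=n))
    return [w for w in candidates if accepts(w)]


def min_word(n):
    """Min-word problem: first accepted word of length n, or None."""
    section = cross_section(n)
    return section[0] if section else None


def enumerate_words(m):
    """Enumeration problem: first m accepted words, length-lexicographically.

    Assumes the language has at least m words; otherwise this never returns.
    """
    words = (w for n in count() for w in cross_section(n))
    return list(islice(words, m))
```

For this example automaton, `cross_section(3)` gives `["310", "312"]` and `enumerate_words(4)` gives `["", "31", "310", "312"]`.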

Page 6

Applications

• Correctness testing: provides evidence that an NFA generates the expected language.

• An enumeration algorithm can be used to verify whether two NFAs accept the same language (Conway, 1971).

• A cross-section algorithm can be used to determine whether every word accepted by a given NFA is a power, that is, a string of the form w^n for n > 1, |w| > 0 (Anderson, Rampersad, Santean, and Shallit, 2007).

• A cross-section algorithm can be used to solve the “k-subset of an n-set” problem: enumerate all k-subsets of a set in alphabetical order (Ackerman & Shallit, 2007).

Page 7

Objectives

Find algorithms for the three problems that are

• Asymptotically efficient in
– Size of the NFA (s states and d transitions)
– Output size (t)
– The length of the words in the cross-section (n)

• Efficient in practice

Page 8

Previous Work

• A cross-section algorithm where finding each consecutive word is super-exponential in the size of the cross-section (Domosi, 1998).

• A cross-section algorithm that is exponential in n (the length of words in the cross-section) is found in the Grail computation package.
– “Breadth-First-Search” approach
– Trace all paths of length n in the NFA, storing the paths that end at a final state.
– O(dσ^{n+1}), where d is the number of transitions in the NFA and σ is the alphabet size.
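The path-tracing idea can be sketched as follows. This is not Grail's code, only a minimal illustration of the approach described above; the function name and the example NFA are assumptions. The frontier holds one entry per path, which is why the cost grows exponentially with n.

```python
def cross_section_naive(nfa, start, finals, n):
    """Trace every path of length n and keep the words whose path ends in a final state."""
    frontier = [("", start)]  # (word read so far, current state), one entry per path
    for _ in range(n):
        frontier = [(w + a, t) for (w, s) in frontier for (a, t) in nfa[s]]
    # Several paths may spell the same word, so deduplicate before sorting.
    return sorted({w for (w, s) in frontier if s in finals})


# Hypothetical example NFA: state -> list of (symbol, next_state).
NFA = {"A": [("3", "C")], "B": [("0", "A"), ("2", "B")], "C": [("1", "B")]}
```

With start state `"A"` and final states `{"A", "B"}`, `cross_section_naive(NFA, "A", {"A", "B"}, 3)` returns `["310", "312"]`.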

 

Page 9

Previous Polynomial Algorithms: Makinen, 1997

• Dynamic programming solution
– Min-word: O(dn^2)
– Cross-section: O(dn^2 + dt)
– Enumeration: O(d(e + t))

e: the number of empty cross-sections encountered
d: the number of transitions in the NFA
n: the length of words in the cross-section
t: the number of characters in the output

Quadratic in n

Page 10

Previous Polynomial Algorithms: Ackerman and Shallit, 2007

• Linear in the length of words in the cross-section
– Min-word: O(s^{2.376} n)
– Cross-section: O(s^{2.376} n + dt)
– Enumeration: O(s^{2.376} c + dt)

c: the number of cross-sections encountered
d: the number of transitions in the NFA
n: the length of words in the cross-section
t: the number of characters in the output

Linear in n

Page 11

Previous Polynomial Algorithms: Ackerman and Shallit, 2007

• The algorithm uses “smart breadth first search,” following only those paths that lead to a final state. 

 • Main idea: compute a look-ahead matrix, used to determine whether there is a path of length i starting at state s and ending at a final state.

• In practice, Makinen’s algorithm (slightly modified) is usually more efficient, except on some boundary cases. 

Page 12

Contributions

Present 3 algorithms for each of the enumeration problems, including:
• O(dn) algorithm for min-word
• O(dn + dt) algorithm for cross-section
• Algorithms with improved practical performance for each of the enumeration problems

Page 13

Contributions: Detailed

• We present three sets of algorithms

1. AMSorted:
- An efficient min-word algorithm, based on Makinen’s original algorithm.
- Cross-section and enumeration algorithms based on this min-word algorithm.

2. AMBoolean:
- A more efficient min-word algorithm, based on minWordAMSorted.
- Cross-section and enumeration algorithms based on this min-word algorithm.

3. Intersection-based:
- An elegant min-word algorithm.
- A cross-section algorithm based on this min-word algorithm.

Page 14

Key ideas behind our first two algorithms

- Makinen’s algorithm uses simple dynamic programming, which is efficient in practice on most NFAs.

- The algorithm by Ackerman & Shallit uses “smart breadth first search,” following only those paths that lead to a final state.

- We build on these ideas to yield algorithms that are more efficient both asymptotically and in practice. 

Page 15

Makinen’s original min-word algorithm

[Figure: NFA with states A, B, C and transitions labeled 0, 1, 2, 3]

      1    2      3
A     -    (3,C)  (3,C)
B     0    (2,B)  (0,A)
C     1    (1,B)  (1,B)

S[i] stores a representation of the minimal word w of length i that appears on a path from S to a final state.

Page 16

Makinen’s original min-word algorithm

[Figure: NFA with states A, B, C and transitions labeled 0, 1, 2, 3]

      1    2      3
A     -    (3,C)  (3,C)
B     0    (2,B)  (0,A)
C     1    (1,B)  (1,B)

The minimal word of length n can be found by tracing back from the last column of the start state.

Page 17

Makinen’s original min-word algorithm

• Initialize the first column
• For columns i = 2...n
– For each state S: find S[i] by comparing all words of length i appearing on paths from S to a final state.

      1    2      3
A     -    (3,C)  (3,C)
B     0    (2,B)  (0,A)
C     1    (1,B)  (1,B)

[Figure: NFA with states A, B, C and transitions labeled 0, 1, 2, 3]

Page 18

Makinen’s original min-word algorithm

• Initialize the first column
• For columns i = 2...n
– For each state S: find S[i] by comparing all words of length i appearing on paths from S to a final state (each comparison takes i operations).

      1    2      3
A     -    (3,C)  (3,C)
B     0    (2,B)  (0,A)
C     1    (1,B)  (1,B)

[Figure: NFA with states A, B, C and transitions labeled 0, 1, 2, 3]

Page 19

Makinen’s original min-word algorithm

• Initialize the first column
• For columns i = 2...n
– For each state S: find S[i] by comparing all words of length i appearing on paths from S to a final state (each comparison takes i operations).

Theorem: Makinen’s original min-word algorithm is O(dn^2).
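A minimal sketch of this dynamic program is given below. It is a reconstruction from the slide, not the author's code. The example NFA is hypothetical: it was chosen to be consistent with the table shown on the earlier slides, but the original figure is not recoverable, so the transitions and final states are assumptions. Each candidate is compared by spelling out its word, which is the i-operation step behind the quadratic bound.

```python
def min_word_makinen(nfa, start, finals, n):
    """Return the minimal word of length n accepted from `start`, or None."""
    if n == 0:
        return "" if start in finals else None
    # table[i][S] = (symbol, next_state); next_state is None in column 1.
    table = {1: {}}
    for s, edges in nfa.items():
        symbols = [a for (a, t) in edges if t in finals]
        if symbols:
            table[1][s] = (min(symbols), None)

    def word(i, s):
        """Spell out the word represented by table[i][s]; takes i steps."""
        out = []
        while i >= 1:
            a, t = table[i][s]
            out.append(a)
            i, s = i - 1, t
        return "".join(out)

    for i in range(2, n + 1):
        table[i] = {}
        for s, edges in nfa.items():
            best = None  # (candidate word, table entry)
            for a, t in edges:
                if t in table[i - 1]:
                    candidate = a + word(i - 1, t)  # explicit word comparison
                    if best is None or candidate < best[0]:
                        best = (candidate, (a, t))
            if best is not None:
                table[i][s] = best[1]
    return word(n, start) if start in table[n] else None


# Hypothetical NFA consistent with the slide's table (an assumption).
NFA = {"A": [("3", "C")], "B": [("0", "A"), ("2", "B")], "C": [("1", "B")]}
FINALS = {"A", "B"}
```

With these assumptions, `min_word_makinen(NFA, "A", FINALS, 3)` returns `"310"`, matching the trace A[3] = (3,C), C[2] = (1,B), B[1] = 0 in the table.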

Page 20

New min-word algorithm: minWordAMSorted

Idea: Sort every column by the words that the entries represent.

[Figure: NFA with states A, B, C and transitions labeled 0, 1, 2, 3]

      1    2      3
A     -    (3,C)  (3,C)
B     0    (2,B)  (0,A)
C     1    (1,B)  (1,B)

321

031

120

Page 21

New min-word algorithm: minWordAMSorted

• We define an order on {S[i] : S a state in N}.
• If A[1] = a and B[1] = b, where a < b, then A[1] < B[1].
• For i > 1, let A[i] = (a, A’) and B[i] = (b, B’).
– If a < b, then A[i] < B[i].
– If a = b and A’[i-1] < B’[i-1], then A[i] < B[i].
• If A[i] is defined, and B[i] is undefined, then A[i] > B[i].

Page 22

New min-word algorithm: minWordAMSorted

• Initialize the first column
• For columns i = 2...n
– For each state S
• Find S[i] using only column i-1 and the edges leaving S.
– Sort column i

      1    2      3
A     -    (3,C)  (3,C)
B     0    (2,B)  (0,A)
C     1    (1,B)  (1,B)

[Figure: NFA with states A, B, C and transitions labeled 0, 1, 2, 3]

Page 23

New min-word algorithm: minWordAMSorted

• Initialize the first column
• For columns i = 2...n
– For each state S (d operations in total)
• Find S[i] using only column i-1 and the edges leaving S.
– Sort column i (s log s operations)

Theorem: The algorithm minWordAMSorted is O((s log s + d) n).
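The sorted-column idea can be sketched by storing, for each column, the rank of every entry in sorted order. Two candidates (a, A') and (b, B') are then compared in constant time via (a, rank of A' in column i-1). This is an illustrative sketch, not the authors' implementation: the helper `dense_ranks`, the example NFA, and the choice to leave undefined entries out of the ranking (so they are never selected) are assumptions made here.

```python
def dense_ranks(keys):
    """Sort a column: map each state to the rank of its key (equal keys share a rank)."""
    order = {k: r for r, k in enumerate(sorted(set(keys.values())))}
    return {s: order[k] for s, k in keys.items()}


def min_word_am_sorted(nfa, start, finals, n):
    """Return the minimal word of length n accepted from `start`, or None."""
    if n == 0:
        return "" if start in finals else None
    # Column 1: smallest symbol on an edge that enters a final state.
    column, keys = {}, {}
    for s, edges in nfa.items():
        symbols = [a for (a, t) in edges if t in finals]
        if symbols:
            column[s] = (min(symbols), None)
            keys[s] = (min(symbols), 0)
    table = [None, column]  # table[i][S] = (symbol, next_state)
    rank = dense_ranks(keys)
    for i in range(2, n + 1):
        column, keys = {}, {}
        for s, edges in nfa.items():
            # Uses only column i-1 (through `rank`) and the edges leaving s.
            candidates = [((a, rank[t]), (a, t)) for (a, t) in edges if t in rank]
            if candidates:
                keys[s], column[s] = min(candidates)
        table.append(column)
        rank = dense_ranks(keys)  # the "sort column i" step
    if start not in table[n]:
        return None
    word, s = [], start
    for i in range(n, 0, -1):  # trace back from the last column
        a, s = table[i][s]
        word.append(a)
    return "".join(word)


# Hypothetical NFA consistent with the slide's table (an assumption).
NFA = {"A": [("3", "C")], "B": [("0", "A"), ("2", "B")], "C": [("1", "B")]}
FINALS = {"A", "B"}
```

The per-column work is one pass over the edges plus one sort of at most s keys, which mirrors the (s log s + d) factor in the theorem.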

Page 24

New cross-section algorithm: crossSectionAMSorted

• A state S is i-complete if there exists a path of length i from state S to a final state.

• To enumerate all words of length n:
1. Call minWordAMSorted (create a table): O((s log s + d) n).
2. Perform a “smart BFS”: O(dt).
- Begin at the start state.
- Follow only those paths of length n that end at a final state, by using the table to identify i-complete states.

Theorem: The algorithm crossSectionAMSorted is O(n (s log s + d) + dt).

Page 25

New enumeration algorithm: enumAMSorted

Run the cross-section algorithm until the required number of words are listed, while reusing the table.

Theorem: The algorithm enumAMSorted is O(c (s log s + d) + dt).

c: the number of cross-sections encountered
d: the number of transitions in the NFA
t: the number of characters in the output

Page 26

What have we got so far?

               New Algorithms            Previous Algorithms
               Sorted                    Makinen        Ackerman & Shallit
min-word       O((s log s + d)n)         O(dn^2)        O(s^{2.376} n)
cross-section  O(n(s log s + d) + dt)    O(dn^2 + dt)   O(s^{2.376} n + dt)
enumeration    O(c(s log s + d) + dt)    O(de + dt)     O(s^{2.376} c + dt)

c: the number of cross-sections encountered
e: the number of empty cross-sections encountered
d: the number of transitions in the NFA
n: the length of words in the cross-section
t: the number of characters in the output

Page 27

New min-word algorithm: minWordAMBoolean

Idea: instead of using a table to find the minimal word, construct a table whose only purpose is to determine i-complete states. 

Can be done using a similar algorithm to minWordAMSorted, but more efficiently, since there is no need to sort. 

Page 28

New min-word algorithm: minWordAMBoolean

[Figure: NFA with states A, B, C and transitions labeled 0, 1, 3]

      1  2  3
A     F  T  T
B     T  T  T
C     T  T  F

Page 29

New min-word algorithm: minWordAMBoolean

• Fill in the first column
• For i = 2 ... n
– For every state S
• Determine whether S is i-complete using only the transitions leaving S and column i-1

• Starting at the start state, follow minimal transitions to paths that can complete a word of length n (using the table).

      1  2  3
A     F  T  T
B     T  T  T
C     T  T  F

[Figure: NFA with states A, B, C and transitions labeled 0, 1, 3]

Page 30

New min-word algorithm: minWordAMBoolean

• Fill in the first column
• For i = 2 ... n
– For every state S (d operations in total)
• Determine whether S is i-complete using only the transitions leaving S and column i-1

• Starting at the start state, follow minimal transitions to paths that can complete a word of length n (using the table).

      1  2  3
A     F  T  T
B     T  T  T
C     T  T  F

[Figure: NFA with states A, B, C and transitions labeled 0, 1, 3]

Page 31

New min-word algorithm: minWordAMBoolean

• Fill in the first column
• For i = 2 ... n
– For every state S (d operations in total)
• Determine whether S is i-complete using only the transitions leaving S and column i-1

• Starting at the start state, follow minimal transitions to paths that can complete a word of length n (using the table).

Theorem: The algorithm minWordAMBoolean is O(dn).
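The Boolean table and the walk that follows it can be sketched as below. This is an illustration under assumptions, not the authors' code: the table is stored as a list of sets (`complete[i]` is the set of i-complete states), a column 0 holding the final states is added for convenience, and the example NFA is hypothetical. Because the automaton is nondeterministic, the walk tracks the set of states reachable on the word chosen so far.

```python
def complete_table(nfa, finals, n):
    """complete[i] = set of i-complete states, for i = 0..n (column 0 = final states)."""
    complete = [set(finals)]
    for i in range(1, n + 1):
        prev = complete[i - 1]
        # S is i-complete iff some edge leaving S enters an (i-1)-complete state.
        complete.append({s for s, edges in nfa.items()
                         if any(t in prev for (_, t) in edges)})
    return complete


def min_word_am_boolean(nfa, start, finals, n):
    """Return the minimal word of length n accepted from `start`, or None."""
    complete = complete_table(nfa, finals, n)
    if start not in complete[n]:
        return None
    word, current = [], {start}
    for i in range(n, 0, -1):
        ok = complete[i - 1]
        # Smallest symbol that still allows the word to be completed.
        a = min(a for s in current for (a, t) in nfa[s] if t in ok)
        current = {t for s in current for (b, t) in nfa[s] if b == a and t in ok}
        word.append(a)
    return "".join(word)


# Hypothetical example NFA: state -> list of (symbol, next_state).
NFA = {"A": [("3", "C")], "B": [("0", "A"), ("2", "B")], "C": [("1", "B")]}
FINALS = {"A", "B"}
```

Each column and each step of the walk touches every transition at most once, which is where the O(dn) bound comes from; no sorting is needed.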

Page 32

New cross-section algorithm: crossSectionAMBoolean

• Extend to a cross-section algorithm using the same approach as the Sorted algorithm.

• To enumerate all words of length n:
– Call minWordAMBoolean (create a table): O(dn).
– Perform a “smart BFS”: O(dt).
- Begin at the start state.
- Follow only those paths of length n that end at a final state, by using the table to identify i-complete states.

Theorem: The algorithm crossSectionAMBoolean is O(dn + dt).
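A sketch of the table-guided search follows. It is written as a recursive walk that visits symbols in sorted order, which yields the words lexicographically; the slides call the search a “smart BFS”, so the traversal order here is an implementation choice of this sketch, and the function name and example NFA are assumptions. Grouping target states by symbol ensures each word is emitted once even when several paths spell it.

```python
def cross_section_am_boolean(nfa, start, finals, n):
    """All accepted words of length n in lexicographic order."""
    # complete[i] = set of i-complete states (column 0 = final states).
    complete = [set(finals)]
    for i in range(1, n + 1):
        prev = complete[i - 1]
        complete.append({s for s, edges in nfa.items()
                         if any(t in prev for (_, t) in edges)})
    if start not in complete[n]:
        return []
    words = []

    def extend(prefix, current, remaining):
        if remaining == 0:
            words.append(prefix)
            return
        ok = complete[remaining - 1]
        by_symbol = {}  # symbol -> set of target states that can still finish
        for s in current:
            for a, t in nfa[s]:
                if t in ok:
                    by_symbol.setdefault(a, set()).add(t)
        for a in sorted(by_symbol):
            extend(prefix + a, by_symbol[a], remaining - 1)

    extend("", {start}, n)
    return words


# Hypothetical example NFA: state -> list of (symbol, next_state).
NFA = {"A": [("3", "C")], "B": [("0", "A"), ("2", "B")], "C": [("1", "B")]}
FINALS = {"A", "B"}
```

Since every branch followed is guaranteed to end in a word, the work per output character is bounded by one pass over the transitions, in line with the dt term.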

Page 33

New enumeration algorithm: enumAMBoolean

Run the cross-section algorithm until the required number of words are listed, while reusing the table.

Theorem: The algorithm enumAMBoolean is O(de + dt).

e: the number of empty cross-sections encountered
d: the number of transitions in the NFA
n: the length of words in the cross-section
t: the number of characters in the output
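The enumeration loop can be sketched as follows: cross-sections are produced for n = 0, 1, 2, ... and the table gains one column per iteration instead of being rebuilt. The stopping rule is an assumption of this sketch, not taken from the slides: once s consecutive cross-sections are empty (s = number of states), no longer word can be accepted, because any accepting path of length at least s contains a cycle of length at most s that could be cut out. The names and the example NFA are also hypothetical, and every state is assumed to appear as a key of `nfa`.

```python
from itertools import islice


def section_words(nfa, current, complete, remaining, prefix=""):
    """Yield, in lexicographic order, the words of length `remaining` readable from `current`."""
    if remaining == 0:
        yield prefix
        return
    ok = complete[remaining - 1]
    by_symbol = {}
    for s in current:
        for a, t in nfa[s]:
            if t in ok:
                by_symbol.setdefault(a, set()).add(t)
    for a in sorted(by_symbol):
        yield from section_words(nfa, by_symbol[a], complete, remaining - 1, prefix + a)


def enum_am_boolean(nfa, start, finals, m):
    """First m accepted words in length-lexicographic order (fewer if the language is smaller)."""
    complete = [set(finals)]  # complete[i] = i-complete states, grown lazily
    words, empty_run, n = [], 0, 0
    while len(words) < m and empty_run < len(nfa):
        if start in complete[n]:
            section = section_words(nfa, {start}, complete, n)
            words.extend(islice(section, m - len(words)))
            empty_run = 0
        else:
            empty_run += 1  # empty cross-section
        n += 1
        prev = complete[-1]  # add one column, reusing all earlier ones
        complete.append({s for s, edges in nfa.items()
                         if any(t in prev for (_, t) in edges)})
    return words


# Hypothetical example NFA: state -> list of (symbol, next_state).
NFA = {"A": [("3", "C")], "B": [("0", "A"), ("2", "B")], "C": [("1", "B")]}
FINALS = {"A", "B"}
```

For the example automaton, `enum_am_boolean(NFA, "A", FINALS, 4)` returns `["", "31", "310", "312"]`; the cross-section of length 1 is empty and costs one table column.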

Page 34

What have we got so far?

               New Algorithms                          Previous Algorithms
               Sorted                    Boolean       Makinen        Ackerman & Shallit
min-word       O((s log s + d)n)         O(dn)         O(dn^2)        O(s^{2.376} n)
cross-section  O(n(s log s + d) + dt)    O(dn + dt)    O(dn^2 + dt)   O(s^{2.376} n + dt)
enumeration    O(c(s log s + d) + dt)    O(de + dt)    O(de + dt)     O(s^{2.376} c + dt)

c: the number of cross-sections encountered
e: the number of empty cross-sections encountered
d: the number of transitions in the NFA
n: the length of words in the cross-section
t: the number of characters in the output

Page 35

Intersection-Based Algorithms

• We present surprisingly elegant min-word and cross-section algorithms that have the asymptotic efficiency of the Boolean-based algorithms.

• However, these algorithms are not as efficient in practice as the Boolean-based and Sorted-based algorithms.

Page 36

New min-word algorithm: minWordIntersection

Let N be the input NFA, and A be the NFA that accepts the language of all words of length n.

1. Let C = N x A.
2. Remove all states of C that cannot be reached from the final states of C using reversed transitions.
3. Starting at the start state, follow the minimal n consecutive transitions to a final state.

Page 37

New min-word algorithm: minWordIntersection

Let N be the input NFA, and A be the NFA that accepts the language of all words of length n.

1. Let C = N x A.
2. Remove all states of C that cannot be reached from the final states of C using reversed transitions.
3. Starting at the start state, follow the minimal n consecutive transitions to a final state.

[Figure: automaton N with states A, B, C and transitions labeled 0 and 1]

Let n = 2

[Figure: automaton A, a chain of states with transitions labeled 0 and 1]

Page 38

New min-word algorithm: minWordIntersection

Let N be the input NFA, and A be the NFA that accepts the language of all words of length n.

1. Let C = N x A.
2. Remove all states of C that cannot be reached from the final states of C using reversed transitions.
3. Starting at the start state, follow the minimal n consecutive transitions to a final state.

[Figure: automaton N with states A, B, C and transitions labeled 0 and 1]

[Figure: automaton C, with transitions labeled 0 and 1]

Page 39

New min-word algorithm: minWordIntersection

Let N be the input NFA, and A be the NFA that accepts the language of all words of length n.

1. Let C = N x A.
2. Remove all states of C that cannot be reached from the final states of C using reversed transitions.
3. Starting at the start state, follow the minimal n consecutive transitions to a final state.

[Figure: automaton C after step 2; the remaining transitions are labeled 1, 1]

Page 40

New min-word algorithm: minWordIntersection

Let N be the input NFA, and A be the NFA that accepts the language of all words of length n.

1. Let C = N x A.
2. Remove all states of C that cannot be reached from the final states of C using reversed transitions.
3. Starting at the start state, follow the minimal n consecutive transitions to a final state.

[Figure: automaton C after step 2; the remaining transitions are labeled 1, 1]

Thus the minimal word of length 2 accepted by N is “11”.

Page 41

Asymptotic running time of minWordIntersection

1. Let C = N x A. (Concatenate n copies of N.)
2. Remove all states of C that cannot be reached from the final states of C using reversed transitions.
3. Starting at the start state, follow the minimal n consecutive transitions to a final state.

Each step is proportional to the size of C, which is O(nd).

Theorem: The algorithm minWordIntersection is O(dn).
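The three steps can be sketched as below. This is an illustration under assumptions rather than the authors' implementation: A is taken to be a chain of n+1 states, so the states of C are pairs (state of N, level), and the reversed transitions of C are built directly. The example NFA is hypothetical. As in the other sketches, step 3 tracks a set of states because N is nondeterministic.

```python
def min_word_intersection(nfa, start, finals, n):
    """Return the minimal word of length n accepted from `start`, or None."""
    # Step 1: C = N x A. Store C's transitions reversed: target -> list of sources.
    reverse = {}
    for s, edges in nfa.items():
        for a, t in edges:
            for j in range(n):
                reverse.setdefault((t, j + 1), []).append((s, j))
    # Step 2: keep only the states of C from which a final state (f, n) is reachable,
    # found by a backward search along the reversed transitions.
    alive = {(f, n) for f in finals}
    stack = list(alive)
    while stack:
        node = stack.pop()
        for source in reverse.get(node, []):
            if source not in alive:
                alive.add(source)
                stack.append(source)
    if (start, 0) not in alive:
        return None
    # Step 3: follow the minimal surviving transition at each of the n levels.
    word, current = [], {start}
    for j in range(n):
        a = min(a for s in current for (a, t) in nfa[s] if (t, j + 1) in alive)
        current = {t for s in current for (b, t) in nfa[s]
                   if b == a and (t, j + 1) in alive}
        word.append(a)
    return "".join(word)


# Hypothetical example NFA: state -> list of (symbol, next_state).
NFA = {"A": [("3", "C")], "B": [("0", "A"), ("2", "B")], "C": [("1", "B")]}
FINALS = {"A", "B"}
```

C has at most n copies of each transition of N, so building it, pruning it and walking it each take time proportional to nd.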

Page 42

New cross-section algorithm: crossSectionIntersection

• To enumerate all words of length n, perform BFS on C = N x A, with all states not reachable from a final state (using reverse transitions) removed.

• Since all paths of length n starting at the start state lead to a final state, there is no need to check for i-completeness.

Theorem: The algorithm crossSectionIntersection is O(dn + dt).
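A sketch of this variant: the pruned product C is materialized level by level, after which the traversal needs no completeness checks at all. The layered representation (`alive[j]` for level j of C), the traversal order, and the example NFA are assumptions of this sketch, not details from the slides.

```python
def cross_section_intersection(nfa, start, finals, n):
    """All accepted words of length n in lexicographic order, via the pruned product C."""
    # alive[j] = states s of N such that (s, j) survives pruning in C = N x A.
    alive = [set() for _ in range(n)] + [set(finals)]
    for j in range(n - 1, -1, -1):
        alive[j] = {s for s, edges in nfa.items()
                    if any(t in alive[j + 1] for (_, t) in edges)}
    if start not in alive[0]:
        return []
    # Transitions of C that survive pruning; every path in this graph ends in a final state.
    pruned = {(s, j): [(a, t) for (a, t) in nfa[s] if t in alive[j + 1]]
              for j in range(n) for s in alive[j]}
    words = []

    def walk(prefix, current, j):
        if j == n:
            words.append(prefix)
            return
        by_symbol = {}  # symbol -> set of successor states at level j + 1
        for s in current:
            for a, t in pruned[(s, j)]:
                by_symbol.setdefault(a, set()).add(t)
        for a in sorted(by_symbol):
            walk(prefix + a, by_symbol[a], j + 1)

    walk("", {start}, 0)
    return words


# Hypothetical example NFA: state -> list of (symbol, next_state).
NFA = {"A": [("3", "C")], "B": [("0", "A"), ("2", "B")], "C": [("1", "B")]}
FINALS = {"A", "B"}
```

Compared with the table-based search, the filtering is paid once while building `pruned`, at the price of storing up to n copies of the transitions.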

Page 43

Practical Performance

• We compared Makinen’s, Ackerman-Shallit, AMSorted, AMBoolean, and Intersection-based algorithms.

• Tested the algorithms on a variety of NFAs: dense, sparse, few and many final states, different alphabet sizes, worst case for Makinen’s algorithm, etc.

• Here are the best performing algorithms:
– Min-word: AMSorted
– Cross-section: AMBoolean
– Enumeration: AMBoolean

Page 44

Summary

               New Algorithms                                            Previous Algorithms
               Sorted                     Boolean         Intersection   Makinen        Ackerman & Shallit
min-word       O((s log s + d)n) *        O(dn)           O(dn)          O(dn^2)        O(s^{2.376} n)
cross-section  O(n(s log s + d) + dt)     O(dn + dt) *    O(dn + dt)     O(dn^2 + dt)   O(s^{2.376} n + dt)
enumeration    O(c(s log s + d) + dt)     O(de + dt) *    -              O(de + dt)     O(s^{2.376} c + dt)

c: the number of cross-sections encountered
e: the number of empty cross-sections encountered
d: the number of transitions in the NFA
n: the length of words in the cross-section
t: the number of characters in the output

*: most efficient in practice

Page 45

Open problems

• Extending the intersection-based cross-section algorithm to an enumeration algorithm.

• Lower bounds.

• Can better results be obtained using a different order?

• Restricting attention to a smaller family of NFAs.