DReX: A Declarative Language for Efficiently Evaluating Regular String Transformations. Rajeev Alur, Loris D’Antoni, Mukund Raghothaman. POPL 2015.

POPL 2015 Slides

Jan 02, 2017

Page 1: POPL 2015 Slides

DReX: A Declarative Language for Efficiently Evaluating Regular String Transformations

Rajeev Alur Loris D’Antoni Mukund Raghothaman

POPL 2015

1

Page 2: POPL 2015 Slides

DReX is a DSL for String Transformations

align-bibtex

Input:

...

@book{Book1,
  title = {Title0},
  author = {Author1},
  year = {Year1},
}

@book{Book2,
  title = {Title1},
  author = {Author2},
  year = {Year2},
}

...

Output:

...

@book{Book1,
  title = {Title1},
  author = {Author1},
  year = {Year1},
}

...

2

Page 3: POPL 2015 Slides

Describing align-bibtex Using DReX

The simpler issue of make-entry

Given two entries, Entry1 and Entry2, make-entry outputs the title of Entry2 and the remaining body of Entry1

Entry1: all but title
Entry2: title only

3

Page 4: POPL 2015 Slides

Describing align-bibtex Using DReX

align-bibtex = chain(make-entry, REntry)

Entry1 Entry2 Entry3 ... Entryk−1 Entryk

make-entry(Entry1 Entry2)
make-entry(Entry2 Entry3)
make-entry(Entry3 Entry4)
...
make-entry(Entryk−1 Entryk)

Function combinators, such as chain, combine smaller functions into bigger ones

4

Page 5: POPL 2015 Slides

Why DReX?

- DReX is declarative
    Languages, Σ∗ → bool ≡ Regular expressions
    Transformations, Σ∗ → Γ∗ ≡ DReX
- DReX is fast: streaming evaluation algorithm for well-typed expressions
- Based on robust theoretical foundations
  - Expressively equivalent to regular string transformations
  - Multiple characterizations: two-way finite state transducers, MSO-definable graph transformations, streaming string transducers
  - Closed under various operations: function composition, regular look-ahead, etc.
- DReX supports algorithmic analysis
  - Is the transformation well-defined for all inputs?
  - Does the output always have some “nice” property? ∀σ, is it the case that f(σ) ∈ L?
  - Are two transformations equivalent?

5

Page 6: POPL 2015 Slides

DReX is publicly available! Go to drexonline.com

6

Page 7: POPL 2015 Slides

Function Combinators

7

Page 8: POPL 2015 Slides

Base functions: σ ↦ γ

Map input string σ to γ, and undefined everywhere else

“.c” ↦ “.cpp”

σ ∈ Σ∗ and γ ∈ Γ∗ are constant strings
Analogue of basic regular expressions: {σ}, for σ ∈ Σ∗
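A minimal Python sketch of a base function, assuming a partial function is modelled as a callable that returns None where it is undefined. The names StrFn and base are illustrative and are not part of DReX.

```python
from typing import Callable, Optional

# A partial string function: None stands for "undefined".
StrFn = Callable[[str], Optional[str]]

def base(sigma: str, gamma: str) -> StrFn:
    """Model of sigma ↦ gamma: defined only on the exact input sigma."""
    return lambda s: gamma if s == sigma else None

c_to_cpp = base(".c", ".cpp")
print(c_to_cpp(".c"))  # .cpp
print(c_to_cpp(".h"))  # None
```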

8

Page 9: POPL 2015 Slides

Conditionals: try f else g

If f(σ) is defined, then output f(σ), and otherwise output g(σ)

try [0-9]∗ ↦ “Number”
else [a-z]∗ ↦ “Name”

Analogue of unambiguous regex union
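The conditional can be sketched in Python as follows, again modelling "undefined" as None. The helper const_on, which maps every string matching a pattern to a constant, is a hypothetical stand-in for the slide's shorthand [0-9]∗ ↦ “Number”; it is not DReX syntax.

```python
import re
from typing import Callable, Optional

StrFn = Callable[[str], Optional[str]]

def const_on(pattern: str, gamma: str) -> StrFn:
    """Hypothetical helper: map every string fully matching `pattern` to gamma."""
    rx = re.compile(pattern)
    return lambda s: gamma if rx.fullmatch(s) is not None else None

def try_else(f: StrFn, g: StrFn) -> StrFn:
    """try f else g: use f where it is defined, otherwise fall back to g."""
    def h(s: str) -> Optional[str]:
        out = f(s)
        return out if out is not None else g(s)
    return h

classify = try_else(const_on(r"[0-9]*", "Number"), const_on(r"[a-z]*", "Name"))
print(classify("42"))   # Number
print(classify("abc"))  # Name
print(classify("a1"))   # None
```

In this sketch the empty string matches both patterns, so the first branch wins on it.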

9

Page 10: POPL 2015 Slides

Split sum: split(f, g)

Split σ into σ = σ1σ2 with both f(σ1) and g(σ2) defined. If the split is unambiguous then split(f, g)(σ) = f(σ1)g(σ2)

[Figure: σ divided into σ1 and σ2; f maps σ1 to f(σ1) and g maps σ2 to g(σ2)]

- Analogue of regex concatenation
- If title maps a BibTeX entry to its title, and body maps a BibTeX entry to the rest of its body, then make-entry = split(body, title)
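The definition of split can be illustrated with a brute-force Python sketch that tries every cut point and returns a result only when exactly one works. This is a quadratic reference model under the assumption that undefined is None, not the streaming algorithm from the talk; on_match is a hypothetical helper.

```python
import re
from typing import Callable, Optional

StrFn = Callable[[str], Optional[str]]

def on_match(pattern: str, transform: Callable[[str], str]) -> StrFn:
    """Hypothetical helper: apply `transform` to strings fully matching `pattern`."""
    rx = re.compile(pattern)
    return lambda s: transform(s) if rx.fullmatch(s) is not None else None

def split(f: StrFn, g: StrFn) -> StrFn:
    """split(f, g): defined only when exactly one cut point makes f and g defined."""
    def h(s: str) -> Optional[str]:
        outputs = []
        for i in range(len(s) + 1):
            left, right = f(s[:i]), g(s[i:])
            if left is not None and right is not None:
                outputs.append(left + right)
        return outputs[0] if len(outputs) == 1 else None
    return h

upper_letters = on_match(r"[a-z]+", str.upper)
keep_digits = on_match(r"[0-9]+", lambda s: s)
print(split(upper_letters, keep_digits)("abc123"))  # ABC123

any_a = on_match(r"a*", lambda s: s)
print(split(any_a, any_a)("aa"))  # None, because three cut points work
```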

10

Page 11: POPL 2015 Slides

Iterated sum: iterate(f)

Split σ = σ1σ2 . . . σk, with all f(σi) defined. If the split is unambiguous, then output f(σ1)f(σ2) . . . f(σk)

[Figure: σ divided into σ1, σ2, ..., σk; f maps each σi to f(σi)]

- Kleene-*
- If echo echoes a single character, then id = iterate(echo) is the identity function
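A brute-force Python sketch of iterate, assuming undefined is modelled as None and that pieces are non-empty. It counts the ways to cut the input by dynamic programming and answers only when there is exactly one; it is a reference model, not the streaming evaluator.

```python
from typing import Callable, List, Optional

StrFn = Callable[[str], Optional[str]]

def iterate(f: StrFn) -> StrFn:
    """iterate(f): concatenate f over the unique split of the input, left to right."""
    def h(s: str) -> Optional[str]:
        n = len(s)
        count: List[int] = [0] * (n + 1)      # number of splits of s[:j]
        out: List[Optional[str]] = [None] * (n + 1)
        count[0], out[0] = 1, ""
        for j in range(1, n + 1):
            for i in range(j):                # last piece is s[i:j], never empty
                piece = f(s[i:j])
                if count[i] > 0 and piece is not None:
                    count[j] += count[i]
                    out[j] = out[i] + piece
        return out[n] if count[n] == 1 else None
    return h

def echo(s: str) -> Optional[str]:
    """Defined on single characters only, which it returns unchanged."""
    return s if len(s) == 1 else None

identity = iterate(echo)
print(identity("hello"))  # hello
```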

11

Page 12: POPL 2015 Slides

Left-iterated sum: left-iterate(f)

Split σ = σ1σ2 . . . σk, with all f(σi) defined. If the split is unambiguous, then output f(σk)f(σk−1) . . . f(σ1)

[Figure: σ divided into σ1, ..., σk−1, σk; the outputs appear in the order f(σk), f(σk−1), ..., f(σ1)]

Think of string reversal: left-iterate(echo)
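The reversal example can be checked with a small Python model of left-iterate, assuming undefined is None and pieces are non-empty. The only difference from a left-to-right iteration is that each new piece's output is prepended rather than appended.

```python
from typing import Callable, List, Optional

StrFn = Callable[[str], Optional[str]]

def left_iterate(f: StrFn) -> StrFn:
    """left-iterate(f): like iterate, but outputs are emitted in reverse order."""
    def h(s: str) -> Optional[str]:
        n = len(s)
        count: List[int] = [0] * (n + 1)      # number of splits of s[:j]
        out: List[Optional[str]] = [None] * (n + 1)
        count[0], out[0] = 1, ""
        for j in range(1, n + 1):
            for i in range(j):
                piece = f(s[i:j])
                if count[i] > 0 and piece is not None:
                    count[j] += count[i]
                    out[j] = piece + out[i]   # later pieces go in front
        return out[n] if count[n] == 1 else None
    return h

def echo(s: str) -> Optional[str]:
    """Defined on single characters only, which it returns unchanged."""
    return s if len(s) == 1 else None

reverse = left_iterate(echo)
print(reverse("abc"))  # cba
```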

12

Page 13: POPL 2015 Slides

“Repeated” sum: combine(f, g)

combine(f, g)(σ) = f(σ)g(σ)

[Figure: the same input σ is fed to both f and g, producing f(σ) followed by g(σ)]

- No regex equivalent
- σ ↦ σσ: combine(id, id)
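In a Python model where undefined is None, combine is a few lines: both functions read the whole input, and the result is defined only when both are. The identity function here is a plain lambda used for illustration.

```python
from typing import Callable, Optional

StrFn = Callable[[str], Optional[str]]

def combine(f: StrFn, g: StrFn) -> StrFn:
    """combine(f, g)(s) = f(s) + g(s), defined only when both f and g are."""
    def h(s: str) -> Optional[str]:
        left, right = f(s), g(s)
        if left is None or right is None:
            return None
        return left + right
    return h

identity: StrFn = lambda s: s
duplicate = combine(identity, identity)
print(duplicate("ab"))  # abab
```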

13

Page 14: POPL 2015 Slides

Chained sum: chain(f, R)

Split σ = σ1σ2 . . . σk with every σi ∈ L(R), and apply f to each pair of adjacent pieces:

σ1 ∈ L(R), σ2 ∈ L(R), σ3 ∈ L(R), ..., σk ∈ L(R)

f(σ1σ2) f(σ2σ3) f(σ3σ4) ... f(σk−1σk)

And similarly for left-chain(f, R)
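A brute-force Python sketch of chain on a toy record format, in the spirit of align-bibtex. The "key=title;" format, the helper names, and the choice to leave the result undefined for fewer than two pieces are all assumptions made for this illustration; the enumeration of splits is exponential in the worst case and is not the streaming algorithm.

```python
import re
from typing import Callable, Iterator, List, Optional

StrFn = Callable[[str], Optional[str]]

def splits(s: str, rx: "re.Pattern[str]") -> Iterator[List[str]]:
    """Yield every way to cut s into non-empty pieces that each fully match rx."""
    if s == "":
        yield []
        return
    for i in range(1, len(s) + 1):
        if rx.fullmatch(s[:i]) is not None:
            for rest in splits(s[i:], rx):
                yield [s[:i]] + rest

def chain(f: StrFn, pattern: str) -> StrFn:
    """chain(f, R): apply f to each adjacent pair of pieces of the unique split."""
    rx = re.compile(pattern)
    def h(s: str) -> Optional[str]:
        parses = list(splits(s, rx))
        if len(parses) != 1 or len(parses[0]) < 2:   # assumption: need k >= 2
            return None
        pieces = parses[0]
        outs = [f(a + b) for a, b in zip(pieces, pieces[1:])]
        if any(o is None for o in outs):
            return None
        return "".join(o for o in outs if o is not None)
    return h

ENTRY = r"[a-z0-9]+=[a-z0-9]+;"            # toy record: key=title;
PAIR = re.compile(r"([a-z0-9]+)=[a-z0-9]+;[a-z0-9]+=([a-z0-9]+);")

def make_entry(two_entries: str) -> Optional[str]:
    """Toy make-entry: key of the first record with the title of the second."""
    m = PAIR.fullmatch(two_entries)
    return f"{m.group(1)}={m.group(2)};" if m else None

align = chain(make_entry, ENTRY)
print(align("k1=t0;k2=t1;k3=t2;"))  # k1=t1;k2=t2;
```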

14

Page 15: POPL 2015 Slides

Summary of Function Combinators

Purpose        Regular Transformations          Regular Expressions
Base           ⊥, σ ↦ γ                         ∅, {σ}
Concatenation  split(f, g), left-split(f, g)    R1 · R2
Union          try f else g                     R1 ∪ R2
Kleene-*       iterate(f), left-iterate(f)      R∗
Repetition     combine(f, g)                    New!
Chained sum    chain(f, R), left-chain(f, R)    New!

15

Page 16: POPL 2015 Slides

Regular String Transformations

Or, why our choice of combinators was not arbitrary

Languages, Σ∗ → bool ≡ DFA
Transformations, Σ∗ → Γ∗ ≡ ?

16

Page 17: POPL 2015 Slides

Historical Context

Regular languages

Beautiful theory

Regular expressions ≡ DFA

Analysis questions (mostly) efficiently decidable

Lots of practical implementations

17

Page 18: POPL 2015 Slides

String Transducers

One-way transducers: Mealy machines

a/babc

Folk knowledge [Aho et al 1969]
Two-way transducers strictly more powerful than one-way transducers

Gap includes many interesting transformations
Examples: string reversal, copy, substring swap, etc.

18

Page 19: POPL 2015 Slides

String Transducers

Two-way finite state transducers

- Known results
  - Closed under composition [Chytil, Jákl 1977]
  - Decidable equivalence checking [Gurari 1980]
  - Equivalent to MSO-definable string transformations [Engelfriet, Hoogeboom 2001]
- Streaming string transducers: equivalent one-way deterministic model with applications to the analysis of list-processing programs [Alur, Černý 2011]
- Two-way finite state transducers are our notion of regularity

19

Page 20: POPL 2015 Slides

Function Combinators are Expressively Complete

Theorem (Completeness, Alur et al 2014)

All regular string transformations can be expressed using the following combinators:

- Basic functions: ⊥, σ ↦ γ,
- split(f, g), left-split(f, g),
- try f else g,
- iterate(f), left-iterate(f),
- combine(f, g),
- chained sums: chain(f, R), and left-chain(f, R).

20

Page 21: POPL 2015 Slides

Evaluating DReX Expressions

21

Page 22: POPL 2015 Slides

The Anatomy of a Streaming Evaluator

[Figure: the evaluator for f consumes a stream of indexed input symbols (σi, i), here (a, 1) (b, 2) (b, 3) (a, 4) (b, 5) ... (σn, n). It emits (Result, γ) after (b, 3) and (Result, γ′) after (b, 5).]

22

Page 23: POPL 2015 Slides

The Case of split(f, g)

[Figure: a timeline of input positions 1, i, j, n, marking the segments on which f and g are defined. The evaluator for split(f, g) contains sub-evaluators Tf and Tg, each exchanging the signals (Start, i), (σi, i), (Result, j, γ) and (Kill, j) with it.]

Thread starting at index   Index at which Tf responded   Result reported by Tf
2                          9                             aaab
3                          7                             abbab
...                        ...                           ...
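To make the Start / symbol / Result / Kill vocabulary concrete, here is a hedged Python sketch of a streaming evaluator for a base function σ ↦ γ that runs one thread per start index. The class name, method names, and the exact shape of the Result tuple are assumptions made for this illustration (with σ assumed non-empty); the slides do not specify the protocol at this level of detail.

```python
from typing import Dict, List, Tuple

class BaseEvaluator:
    """Streaming evaluator sketch for sigma ↦ gamma with one thread per start index."""

    def __init__(self, sigma: str, gamma: str) -> None:
        self.sigma = sigma
        self.gamma = gamma
        # Start index of each live thread -> number of characters of sigma matched.
        self.progress: Dict[int, int] = {}

    def start(self, i: int) -> None:
        """(Start, i): open a thread that reads the input from this point on."""
        self.progress[i] = 0

    def kill(self, i: int) -> None:
        """(Kill, i): discard the thread that was started at index i."""
        self.progress.pop(i, None)

    def symbol(self, ch: str) -> List[Tuple[str, int, str]]:
        """Feed one character to every live thread and collect their results."""
        results: List[Tuple[str, int, str]] = []
        for begin in list(self.progress):
            k = self.progress[begin]
            if k < len(self.sigma) and self.sigma[k] == ch:
                self.progress[begin] = k + 1
                if k + 1 == len(self.sigma):
                    results.append(("Result", begin, self.gamma))
            else:
                del self.progress[begin]   # mismatch, or input ran past sigma
        return results

ev = BaseEvaluator("ab", "X")
ev.start(0)
print(ev.symbol("a"))  # []
ev.start(1)
print(ev.symbol("b"))  # [('Result', 0, 'X')]
```

An evaluator for split(f, g) could keep a table like the one above by recording, for each Result from Tf, the thread's start index and the reported output, and then starting a Tg thread at that point.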

23


Page 29: POPL 2015 Slides

The Case of split(f, g)

- What if two threads of Tg report results simultaneously?

  [Figure: two different splits of the same input, each marked “f defined | g defined”]

- Statically disallow!
- split(f, g) is well-typed iff
  - both f and g are well-typed, and
  - their domains are unambiguously concatenable
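The notion of unambiguous concatenability can be illustrated on finite sets of strings: two sets are unambiguously concatenable when no concatenated string arises from two different cut points. The Python sketch below is a brute-force check over finite sets only, written for illustration; it is not the type-checking algorithm used for regular domains.

```python
from itertools import product
from typing import Dict, Iterable

def unambiguously_concatenable(a: Iterable[str], b: Iterable[str]) -> bool:
    """True if every string u + v (u in a, v in b) has exactly one such split."""
    first_part: Dict[str, str] = {}
    for u, v in product(set(a), set(b)):
        w = u + v
        if w in first_part and first_part[w] != u:
            return False          # w = u + v = u' + v' with u != u'
        first_part[w] = u
    return True

print(unambiguously_concatenable({"a", "aa"}, {"b"}))        # True
print(unambiguously_concatenable({"a", "ab"}, {"b", "bb"}))  # False: "abb"
```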

24

Page 30: POPL 2015 Slides

Main Result

Theorem

1. All regular string transformations can be expressed as well-typed DReX expressions.
2. DReX expressions can be type-checked in O(poly(|f|, |Σ|)).
3. Given a well-typed DReX expression f, and an input string σ, f(σ) can be computed in time O(|σ|, poly(|f|)).

25

Page 31: POPL 2015 Slides

Summary of Typing Rules

- ⊥, σ ↦ γ are always well-typed
- split(f, g) and left-split(f, g) are well-typed iff
  - f and g are well-typed, and
  - Dom(f) and Dom(g) are unambiguously concatenable
- try f else g is well-typed iff
  - f and g are well-typed, and
  - Dom(f) and Dom(g) are disjoint
- iterate(f) and left-iterate(f) are well-typed iff
  - f is well-typed, and
  - Dom(f) is unambiguously iterable
- chain(f, R) and left-chain(f, R) are well-typed iff
  - f is well-typed, R is an unambiguous regular expression,
  - Dom(f) is unambiguously iterable, and
  - Dom(f) = ⟦R · R⟧

26

Page 32: POPL 2015 Slides

Experimental Results

27

Page 33: POPL 2015 Slides

Experimental Results

Streaming evaluation algorithm for well-typed expressions

[Plot: running time in seconds (0 to 8) against input length in characters (0 to 100000) for delete-comm, insert-quotes, get-tags, reverse, swap-bibtex, and align-bibtex]

- align-bibtex has 3500 nodes in syntax tree, typechecks in ≈ half a second
- Type system did not get in the way

28

Page 34: POPL 2015 Slides

Conclusion

- Introduced a DSL for regular string transformations
- Described a fast streaming algorithm to evaluate well-typed expressions

29

Page 35: POPL 2015 Slides

Conclusion

Summary of operators

Purpose        Regular Transformations          Regular Expressions
Base           ⊥, σ ↦ γ                         ∅, {σ}
Concatenation  split(f, g), left-split(f, g)    R1 · R2
Union          try f else g                     R1 ∪ R2
Kleene-*       iterate(f), left-iterate(f)      R∗
Repetition     combine(f, g)                    New!
Chained sum    chain(f, R), left-chain(f, R)    New!

30

Page 36: POPL 2015 Slides

Future Work

- Implement practical programmer assistance tools
  - Static: precondition computation, equivalence checking
  - Runtime: debugging aids
- Theory of regular functions
  - Automatically learn transformations from teachers (L*), from input / output examples, etc.
  - Trees to trees / strings (processing hierarchical data, XML documents, etc.)
  - ω-strings to strings
- Non-regular extensions
  - “Count number of a-s in a string”

31

Page 37: POPL 2015 Slides

Thank you! Questions?

drexonline.com

32

Page 38: POPL 2015 Slides

What About Unrestricted DReX Expressions?

33

Page 39: POPL 2015 Slides

Evaluating Unrestricted DReX Expressions is Hard

Or, why the typing rules are essential

- With function composition, it is PSPACE-complete
- combine(f, g) is defined iff both f and g are defined
    Flavour of regular expression intersection
    The best algorithms for this are either
  - Non-elementary in regex size, or
  - Cubic in length of input string

34