Fusing Effectful Comprehensions ∗
Olli Saarikivi
Aalto University
[email protected]
Margus Veanes, Todd Mytkowicz, Madan Musuvathi
Microsoft Research
{margus,toddm,madanm}@microsoft.com
Abstract
List comprehensions provide a powerful abstraction mechanism for expressing computations over ordered collections of data declaratively without having to use explicit iteration constructs. This paper puts forth effectful comprehensions as an elegant way to describe list comprehensions that incorporate loop-carried state. This is motivated by operations such as compression/decompression and serialization/deserialization that are common in log/data processing pipelines and require loop-carried state when processing an input stream of data.
We build on the underlying theory of symbolic transducers to fuse pipelines of effectful comprehensions into a single representation, from which efficient code can be generated. Using background theory reasoning with an SMT solver, our fusion and subsequent reachability based branch elimination algorithms can significantly reduce the complexity of the fused pipelines. Our implementation shows significant speedups over reasonable hand-written code (3×, on average) and a LINQ implementation of the pipeline (5×, on average) for a variety of examples, including scenarios for extracting fields with regular expressions, processing XML with XPath, and running queries over encoded data.
Finally, we formalize the semantics of symbolic transducers and their compositions as a transduction monad, which provides a link between the automata-theoretic view and a monadic view of symbolic transducers.
Categories and Subject Descriptors F.1.1 [Computation by Abstract Devices]: Models of Computation - Automata; D.3.4 [Programming Languages]: Processors - Code generation; F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about Programs - Mechanical verification; D.3.2 [Programming Languages]: Language Classifications - Specialized application languages
General Terms Algorithms, Languages, Performance, Verification
Keywords transducers, comprehensions, fusion, deforestation, reachability analysis, applicative functors, monads
1. Introduction
List comprehensions provide a powerful mechanism for declaratively specifying a pipeline of computations on collections of data. Programmers specify the various stages of the pipeline concisely and modularly without using explicit iteration constructs, while the runtime ameliorates the cost of the abstraction by performing various optimizations such as fusion/deforestation [27, 35].
This paper extends this idea to effectful comprehensions, an elegant way to describe list comprehensions that incorporate loop-carried state. As a motivation, consider the problem of analyzing logs as shown in Figure 1. The log on the disk (or coming across the network from a file server) is compressed, and thus the user has to first decompress the input stream of bits into bytes, which are then deserialized into objects in a higher-level language, such as Java. In this example, the application selects stock prices from each object and looks for price dips (decreases followed by increases). The output is then serialized and compressed before being written back to disk. Such processing from an input stream of bits to an output stream of bits is not uncommon today. For instance, the processing in a single node, such as a mapper or a reducer, of data-processing systems [2, 7, 13, 38] is similar to the one shown in Figure 1.
∗ Microsoft Research Technical Report no. MSR-TR-2016-55
[Figure 1: pipeline diagram. Stages, in order: Data from disk/network; Decompress (bits -> bytes); UTF8Decode (bytes -> chars); Deserialize (chars -> objects); SelectPrice (objects -> floats); FindPriceDips (floats -> ints); Serialize (ints -> bytes); Compress (bytes -> bits); Data to disk/network.]
output = input.Decompress(...).Decode(...).Deserialize(...)
    .SelectPrice(...).FindPriceDips(...)
    .Serialize(...).Compress(...);
Figure 1. Motivating example of a log processing pipeline where an input stream of bits goes through various stages to an output stream of bits. This paper allows programmers to declaratively specify this pipeline as a composition of symbolic transducers, as shown at the bottom.
Note that the stages in the pipeline include both “functional” computations that operate on each input independently, such as SelectPrice, and “effectful” computations that iterate over the input list while maintaining loop-carried state, such as Decompress, Deserialize, and FindPriceDips. The goal of this paper is to allow such pipelines to be declaratively and modularly specified as shown at the bottom of the figure, and then to fuse them into a single representation for which efficient code can be generated. We use a variation of symbolic transducers [33] as our program representation.
In order to provide some intuition, we consider a concrete but simplified example scenario of such a pipeline, consisting of two symbolic transducers. The situation that we consider is a fairly typical one when the raw input data is unstructured text, for example when parsing CSV files. Raw text is most commonly assumed to
1 2016/10/3
be UTF8 encoded. Suppose that the task is to parse and extract a nonnegative integer from the text, assuming a decimal encoding with ASCII digits, i.e., matching the regex ^[0-9]+$. Suppose our sample pipeline is as follows: it first UTF8 decodes (Utf8Decode) and then parses an integer (ToInt). Utf8Decode takes as input a sequence of bytes and produces a sequence of integers that are the decoded Unicode character codes. For simplicity assume that only up to 2 byte encodings are allowed.1 Utf8Decode can be illustrated graphically as follows:2
c∈[0xC2-0xDF]/[];r:=((c&0x3F)
hensions. Figure 4 presents a function implementing a pipeline of the Utf8Decode and ToInt comprehensions using C#’s LINQ [22] library4. Utf8Decode is represented as a SelectMany, which allows producing variable amounts of output. Since SelectMany does not encapsulate state usage, Utf8Decode uses ad-hoc state in the form of local variables, which complicates analyses by potentially allowing different stages in the pipeline to communicate through shared state. Because ToInt’s Update does not produce output, it can be represented with Aggregate, which does encapsulate state. However, writing effectful comprehensions that do partial state updates with Aggregate is cumbersome, since returning the new state disallows specifying only the parts that change.
To address these concerns we present a C# interface (Section 5.1) for specifying effectful comprehensions that encapsulates state usage. The interface is similar to ones found in existing streaming libraries (Section 8). We translate programs that implement this interface into symbolic transducers. Additionally, we provide specialized frontends for parsing scenarios based on regex and XPath matching.
We evaluate the efficacy of our approach on a variety of data processing pipelines that decode, parse, compute, and then serialize back to disk. These pipelines exhibit common real-world scenarios of extracting data with regexes, querying XML files with XPath, and working with (Base64) encoded data. On average, our fused code is 3× faster than reasonable hand-written code and 5× faster than a LINQ implementation. We further demonstrate that our conservative reachability analysis and subsequent pruning based on background theory reasoning can significantly reduce the complexity of these fused pipelines.
Finally, we formalize the semantics of transducers and their composition using applicative functors and monads in a purely functional style. A transduction monad extends state monads with composition mechanisms allowing us to compose transducers. We found this connection to the functional programming world important because it explains the problem from a very different angle and lets us formalize composition unambiguously and succinctly. The functional view also provides a way to explain composition in a more declarative style, as opposed to automata based formulations that are mostly operational.
The contributions of this paper are:
• A variation of symbolic transducers with branching rules, which simplify analysis and code generation.
• An algorithm for fusing symbolic transducers.
• A branch elimination algorithm based on reachability analysis which complements the satisfiability based branch elimination built into the fusion algorithm.
• A frontend for specifying effectful comprehensions and a strategy for translating these into symbolic transducers. Additionally, we provide frontends for regex and XPath based parsing scenarios.
• A monadic formalization of transducers and their compositions.
• A comprehensive evaluation demonstrating the efficacy of our approach.
2. Symbolic Transducers
This section formally introduces symbolic transducers or STs, as a generalization of symbolic finite transducers or SFTs. The definition used here differs in the following key aspect from the original introduction of STs [33]: it is specialized for deterministic STs. This specialization is reflected in the way individual transitions are
4 The code for other list comprehension libraries, such as Java 8’s Streams API, is largely similar.
q0 c∈[0-0x7F]
c∈[0xC2-0xDF][c];r:=0
false case
q1
true case
[];r:=((c&0x3F)
standard Cartesian product and function types, respectively.6 The type for Booleans is bool with truth values true and false. Let T(τ) denote a given predefined set of terms t that denote values [[t]] of type τ. In our implementation we use Z3 [12] expressions for T(τ), but the general definition is not restricted to any fixed representation. Further, our implementation constructs no terms of the form T((τ → σ) → ρ), for which there is no direct representation or decision procedure in Z3. In general our theory and algorithms work with any decidable background theory. A term in T(τ → bool) is a τ-predicate.
Let [τ] denote the type of finite-length lists of elements of type τ. A list of type [τ] is denoted by [t_1, ..., t_n] or [t_i]_{i=1}^n where n ≥ 0 and each t_i is a term or value of type τ. We assume that if τ is a Cartesian product type τ_1 × τ_2 then there are projection functions first : τ → τ_1 and second : τ → τ_2 and a pairing function ⟨,⟩ : τ_1 → τ_2 → τ_1 × τ_2 with the intended semantics that [[⟨t_1, t_2⟩]] = ([[t_1]], [[t_2]]), [[first⟨t_1, t_2⟩]] = [[t_1]], and [[second⟨t_1, t_2⟩]] = [[t_2]]. Each type τ denotes a nonempty set and has a default element default_τ.
The formal definition of a rule is as follows. Given types τ, o, ρ and a finite set (or type) Q, let R(τ, o, Q, ρ) denote the smallest set X of rules satisfying the following conditions:
• Undef ∈ X;
• for n ≥ 0, if {f_i}_{i=1}^n ⊆ T(τ → o), g ∈ T(τ → ρ), and q ∈ Q, then Base([f_i]_{i=1}^n, q, g) ∈ X;
• if ϕ ∈ T(τ → bool) and t, f ∈ X, then Ite(ϕ, t, f) ∈ X.
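A rule r ∈ R(τ, o, Q, ρ) denotes a partial function [[r]] of type τ → [o] × Q × ρ (footnote 7):
\[
\begin{aligned}
[\![\mathrm{Undef}]\!]\; v &= \bot \quad \text{for all values } v;\\
[\![\mathrm{Base}([f_i]_{i=1}^{n}, q, g)]\!]\; v &= \big([\,[\![f_i]\!]\; v\,]_{i=1}^{n},\; q,\; [\![g]\!]\; v\big);\\
[\![\mathrm{Ite}(\varphi, t, f)]\!]\; v &=
\begin{cases}
[\![t]\!]\; v, & \text{if } [\![\varphi]\!]\; v = \mathit{true};\\
[\![f]\!]\; v, & \text{otherwise.}
\end{cases}
\end{aligned}
\]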
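The three rule forms and their semantics can be sketched in Python as follows. This is only an illustrative sketch, not the paper's implementation: terms are modeled as plain Python callables rather than solver expressions, ⊥ is modeled as None, and the example rule Q0_RULE is an assumed reading of the q0 state of Utf8Decode with v = (x, r).

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Undef:
    """The rule that is undefined on every value."""


@dataclass(frozen=True)
class Base:
    outputs: Tuple[Callable[[Any], Any], ...]  # output terms f_1..f_n
    target: str                                # next control state q
    update: Callable[[Any], Any]               # register update term g


@dataclass(frozen=True)
class Ite:
    guard: Callable[[Any], bool]               # predicate term
    then: Any                                  # rule used when the guard holds
    orelse: Any                                # rule used otherwise


def evaluate(rule, v):
    """Return (outputs, control state, register), or None for an undefined result."""
    if isinstance(rule, Undef):
        return None
    if isinstance(rule, Base):
        return [f(v) for f in rule.outputs], rule.target, rule.update(v)
    branch = rule.then if rule.guard(v) else rule.orelse
    return evaluate(branch, v)


# Assumed rule for control state q0 of Utf8Decode; v is the pair (x, r).
Q0_RULE = Ite(
    lambda v: 0x00 <= v[0] <= 0x7F,
    Base((lambda v: v[0],), "q0", lambda v: 0),
    Ite(
        lambda v: 0xC2 <= v[0] <= 0xDF,
        Base((), "q1", lambda v: (v[0] & 0x3F) << 6),
        Undef(),
    ),
)
```

For instance, evaluating Q0_RULE on an ASCII byte takes the first Base branch, while a byte such as 0x80 falls through both guards and yields the undefined result.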
A symbolic transducer (ST) is a tuple (ι, o, ρ, Q, q^0, r^0, δ, $) where the components are:
• input type ι;
• output type o;
• register type ρ;
• finite control state set Q;
• initial state (q^0, r^0) of state type σ ≝ Q × ρ, and s^0 ≝ (q^0, r^0);
• transition function δ : Q → R(ι × ρ, o, Q, ρ);
• finalizer $ : Q → R(ρ, o, Q, ρ).
We indicate the component of an ST by using the ST as a subscript, unless the ST is clear from the context.
The finalizer is used to produce a final output list upon reaching the end of the input list. It is a generalization of a final state. Intuitively one may think of the finalizer as being a special case of the transition function that is triggered by a unique end-of-input symbol. However, unlike in the classical setting, formally such a symbol cannot in general be treated as an element of type ι. Instead of lifting every input type ι to a sum type of ι and an end-of-input symbol, end-of-input is handled separately by the finalizer.
6 As usual, → is right-associative. We assume that × is also right-associative and has higher precedence than →.
7 We can lift the rule type τ → [o] × Q × ρ to be the type of a total function τ → ([o] × Q × ρ)? by using option types, but here we work directly with partial functions f of type τ_1 → τ_2 as relations of type τ_1 × τ_2 with the understanding that if (a, b), (a, b′) ∈ [[f]] then b = b′. Moreover, [[f]](a) ≝ b if (a, b) ∈ [[f]] and [[f]](a) ≝ ⊥ if {b | (a, b) ∈ [[f]]} = ∅.
We adopt the following variable naming conventions for terms occurring in rules. In a term t occurring in a rule, variable x is of type ι and refers to the input element and variable r is of type ρ and refers to the register. To distinguish variables and functions that appear in formulas from those used in our definitions, proofs and algorithms, we use a mono-space font for the former. For example, in (x = ϕ) the x is a literal part of the formula, while ϕ refers to another formula. Substitution of a variable y by a term u in t is denoted t{y ↦ u}.
In Utf8Decode, in Figure 5, the finalizer is depicted as q0 being accepting and q1 being non-accepting in the classical sense, meaning that the finalizer is the function:
    $_Utf8Decode = {q0 ↦ Base([], q0, 0), q1 ↦ Undef}
The finalizer of ToInt, in Figure 6, which is shown as the dashed arrows, is the function:
    $_ToInt = {p0 ↦ Undef, p1 ↦ Base([r], p1, 0)},
where the final value of the register r is output upon reaching the end of the input list in the control state p1, whereas the initial control state p0 is not valid as a final state and the input would be rejected if the input list terminates in this state.
An ST A denotes a transduction [[A]] that is a partial function of type [ι] → [o]. First, we define the following partial semantic functions δ̂ : ι → σ → [o] × σ and $̂ : σ → [o] × σ that enable us to provide a declarative definition of [[A]]:
\[
\hat{\delta}\; a\; (q, b) \overset{\text{def}}{=} [\![\delta\; q]\!](a, b);
\qquad
\hat{\$}\; (q, b) \overset{\text{def}}{=} [\![\$\; q]\!]\; b.
\]
Let ā = [a_i]_{i=1}^k be a given input list. Let π_1(a, b) ≝ a. Then
\[
[\![A]\!]\; \bar{a} \overset{\text{def}}{=} \pi_1\big(((\hat{\delta}\, a_1) \oplus (\hat{\delta}\, a_2) \oplus \cdots \oplus (\hat{\delta}\, a_k) \oplus \hat{\$})\; s^0\big)
\tag{1}
\]
where ⊕ is a left-associative operator of type
    (σ → [o] × σ) × (σ → [o] × σ) → (σ → [o] × σ)
that composes single-input transduction steps into multi-input transduction steps, with the formal definition:
\[
F_1 \oplus F_2 \overset{\text{def}}{=} \lambda s.\ \mathbf{let}\ (u_1, s_1) = (F_1\; s)\ \mathbf{in}\ \mathbf{let}\ (u_2, s_2) = (F_2\; s_1)\ \mathbf{in}\ (u_1 + u_2,\; s_2)
\]
where ‘+’ denotes list concatenation. For example, given ā = [0xC5, 0x93] and A = Utf8Decode, we have that s^0 = (q0, 0) and
    [[A]] ā = π_1(((δ̂ 0xC5) ⊕ (δ̂ 0x93) ⊕ $̂) s^0)
            = π_1(((λs.([] + [‘œ’], s^0)) ⊕ $̂) s^0)
            = π_1((λs.([] + [‘œ’] + [], s^0)) s^0)
            = [‘œ’]
We refer to ⊕ as step composition and revisit it in Section 7. The main intuition about ⊕ is that it combines function composition with list comprehension in the following sense. If the arguments F_1 and F_2 do not depend on the state s then ⊕ corresponds to concatenation, as in a typical SelectMany list comprehension in LINQ. If, on the other hand, F_1 and F_2 produce no outputs and only transform the state, then ⊕ corresponds to function composition.
3. Fusion of STs
Consider two STs A and B such that o_A = ι_B. We want to fuse A and B into a single ST A ⊗ B such that [[A ⊗ B]] is equivalent to [[A]] ◦ [[B]], i.e., λx.[[B]]([[A]](x)). We first explain the main idea behind the construction. We then explain the incremental algorithm that makes the composition scale in practice. The control-state complexity of the algorithm is |Q|^2. Typically |Q| is in the range of 100-1000. The worst-case complexity with respect to the size of the rules is also quadratic, even when the number of control states is small. It is therefore instrumental to prune unreachable states early and to develop incremental algorithms.
3.1 Main idea
At a high level, the fusion algorithm of A ⊗ B can be described as follows. A ⊗ B has the following components: ι = ι_A, o = o_B, ρ = ρ_A × ρ_B, Q ⊆ Q_A × Q_B, r^0 = (r^0_A, r^0_B), q^0 = (q^0_A, q^0_B). The goal of the fusion algorithm is to construct δ_{A⊗B} and $_{A⊗B}.
For each pair (p, q) of control states in Q_A × Q_B build a fused rule that, given the rule δ_A p, symbolically runs δ_B q treating all of the output lists [v_i]_{i=1}^n that occur in the Base-subrules of δ_A p as symbolic values. The symbolic values are substituted into the register update and output functions of (δ̂_B v_1) ⊕ ⋯ ⊕ (δ̂_B v_n), which is partially evaluated with respect to the control state q, and finally normalized into a rule in R(ι × ρ, o, Q, ρ). The finalizer is constructed similarly.
While such a brute-force approach will terminate in theory, because the output lists have a fixed length that is independent of the input element, it is highly impractical for several reasons. One problem is control state space size, because |Q| = |Q_A||Q_B|. Another problem is output-branch explosion. Just consider self-composition of an encoder (say, with a single control state) that may output n elements for some input element. Then the composition may potentially output n^2 elements for some input element, but most of those cases may be infeasible due to symbolic constraints imposed by the output functions and their guards in A when considered as inputs of B. For example, an HTML encoder H may output a character with code hex(x ÷ 32) in one of its branches, where
    hex(y) = if (0 ≤ y ≤ 9) then (y + 48) else (y + 55)
if the guard γ(x) = 0x100 ≤ x ≤ 0xFFF holds for the input character x. However, in a double-HTML encoder H ⊗ H, the corresponding composed guard γ(hex(x ÷ 32)) ∧ γ(x) for that element is unsatisfiable, which requires nontrivial integer linear constraint reasoning in order to eliminate that branch. Such pruning requires incremental symbolic techniques outside the scope of the brute-force approach.
3.2 Incremental fusion
There are several key optimizations used in the construction of composed rules, powered by the use of the solver for deciding satisfiability and for model generation of predicates. One technique is to incrementally check for unsatisfiability and validity of guards of newly formed Ite-rules and to remove branches that are inaccessible and consequently also eliminate control states that become inaccessible. The distinction between control states and registers is instrumental because finiteness of control states guarantees termination and enables techniques not directly available over infinite state spaces.
We provide a top-down view of the fusion algorithm in Figure 7 with further helper procedures in Figure 8. Fusion is implemented using depth first search starting from (p, q) = (q^0_A, q^0_B). Only satisfiable parts of composite rules are ever explored. The procedure FUSE(γ, R, q) in Figure 7 uses an accumulating context condition γ for a branch of an Ite-rule of A with R as the unexplored subrule in that context, and q is a control state of B. If the condition SAT(γ ∧ R′_1 ≠ R′_2) is false then for all (x, r) ∈ [[γ]], [[R′_1]](x, r) = [[R′_2]](x, r), so the branching condition is redundant. The condition R′_1 ≠ R′_2 is itself, w.l.o.g., expressible as an ι×ρ-predicate. The newly discovered states in the depth first search are added to the Frontier in line 8 of the definition of PRODUCT in Figure 7. Elements of Q_A × Q_B that are never added to Frontier are unreachable and thus irrelevant.

A ⊗ B
 1 let global Frontier = {(q^0_A, q^0_B)}
 2 let global Q = {(q^0_A, q^0_B)}
 3 let δ = $ = ∅
 4 while Frontier ≠ ∅
 5   remove (p, q) from Frontier
 6   δ(p, q) ↦ FUSE(true, (δ_A p), q)
 7   $(p, q) ↦ FUSE$(true, ($_A p), q)
 8 return (ι_A, o_B, ρ_A × ρ_B, Q, (q^0_A, q^0_B), ⟨r^0_A, r^0_B⟩, δ, $)

FUSE(γ, R, q) : (T(ι×ρ → bool) × R(ι×ρ_A, o_A, Q_A, ρ_A) × Q_B) → R(ι×ρ, o, Q, ρ)
 1 let θ = {r:ρ_A ↦ first(r:ρ)}
 2 match R
 3 case Undef: return Undef
 4 case Ite(ϕ, R_1, R_2):
 5   let R′_1 = FUSE(γ ∧ (ϕθ), R_1, q)
 6   let R′_2 = FUSE(γ ∧ ¬(ϕθ), R_2, q)
 7   if SAT(γ ∧ R′_1 ≠ R′_2)
 8     return Ite(ϕθ, R′_1, R′_2)
 9   else return R′_1
10 case Base(v̄, p, g):
11   return PRODUCT(p, gθ, RUN(γ, v̄θ, q, second(r:ρ)))

PRODUCT(p, g, R)
 1 match R
 2 case Undef: return Undef
 3 case Ite(ϕ, R_1, R_2):
 4   return Ite(ϕ, PRODUCT(p, g, R_1), PRODUCT(p, g, R_2))
 5 case Base(v̄, q, h):
 6   if (p, q) ∉ Q
 7     add (p, q) to Q
 8     add (p, q) to Frontier
 9   return Base(v̄, (p, q), (g, h))

Figure 7. Fusion of STs A and B with o_A = ι_B. Definition of RUN is given in Figure 8.
To construct a rule, the mutually recursive RUN(γ, v̄, q, s) and STEP(γ, v, rest, R, s) procedures shown in Figure 8 symbolically execute the step composition operator ⊕ for B over the symbolic value list v̄ starting from the state (q, s) of B. The satisfiability checks in STEP on lines 6 and 10 ensure that the constructed rules only have branches that are feasible and non-redundant. A trivial case of redundancy is when both R′_1 and R′_2 are Undef, but more complicated conditional cases may arise when R′_1 and R′_2 are syntactically different but semantically equivalent in the given context γ.
RUN(γ, v̄, q, s) : (T(ι×ρ → bool) × [T(ι×ρ → ι_B)] × Q_B × T(ι×ρ → ρ_B)) → R(ι×ρ, o, Q_B, ρ_B)
 1 match v̄
 2 case []: return Base([], q, s)
 3 case [v|rest]:
 4   return STEP(γ, v, rest, (δ_B q), s)

STEP(γ, v, rest, R, s)
 1 let θ = {r:ρ_B ↦ s, x:ι_B ↦ v}
 2 match R
 3 case Undef: return Undef
 4 case Ite(ϕ, R_1, R_2):
 5   let R′_1 =
 6     if SAT(γ ∧ (ϕθ))
 7       STEP(γ ∧ (ϕθ), v, rest, R_1, s)
 8     else Undef
 9   let R′_2 =
10     if SAT(γ ∧ ¬(ϕθ))
11       STEP(γ ∧ ¬(ϕθ), v, rest, R_2, s)
12     else Undef
13   if SAT(γ ∧ R′_1 ≠ R′_2)
14     return Ite(ϕθ, R′_1, R′_2)
15   else return R′_1
16 case Base(ū, q, g):
17   return CONCAT(ūθ, RUN(γ, rest, q, gθ))

CONCAT(ū, R)
 1 match R
 2 case Undef: return Undef
 3 case Ite(ϕ, R_1, R_2):
 4   return Ite(ϕ, CONCAT(ū, R_1), CONCAT(ū, R_2))
 5 case Base(v̄, q, h):
 6   return Base(ū + v̄, q, h)

Figure 8. Step composition of B over a list of symbolic inputs v̄ in the context γ.

Observe how the procedure FUSE uses γ on lines 5–7: γ is included as a conjunct in every solver call to SAT and every recursive call to FUSE. This pattern of use allows incremental SMT solving, where the solver is used in such a way that subsequent solver calls can reuse clauses learned during previous calls. For example, on line 5 in FUSE this would be implemented by pushing (ϕθ) into the solver context before the recursive call and popping the context afterwards. In fact, both procedures FUSE and STEP use the parameter γ in a way such that γ is included as a conjunct in (i) each call to SAT, and (ii) each γ argument formula in recursive calls. Furthermore, when FUSE calls STEP (through RUN) on line 11 it passes its γ as an argument. Therefore, each call to FUSE can use a single solver context incrementally for all satisfiability checks. The structure of pushing and popping the contexts follows the structure of the Ite-rules. In our experience, using the solver incrementally may decrease the fusion time by an order of magnitude.
The fusion procedure for $_{A⊗B} is omitted from the presentation, but is similar to the construction of δ_{A⊗B}. Elements of Q that only lead to non-final control states (control states that only have the Undef finalizer) are also removed as “dead-ends” using the standard dead-end elimination algorithm for finite state automata [16].
Theorem 3.1. [[A ⊗ B]] = [[A]] ◦ [[B]]
The proof is omitted for brevity. The main intuition for the proof is that STEP implements a symbolic version of a single step of ⊕ and RUN is a symbolic version of a run (of multiple steps) of ⊕. Once this connection is proved formally it can be used as a lemma for proving that the transduction semantics given by Equation (1) (Section 2) is preserved by A ⊗ B.
3.3 Implementation remarks
The incremental satisfiability checks that are performed during ST fusion are critical for the overall feasibility of the algorithm. In almost all of our case studies, the algorithm would not terminate otherwise. Several further optimizations are possible to locally improve the succinctness of the generated ST. One such optimization is what we call symbolic constant propagation: applying the substitution θ to a sub-term t of ūθ in STEP(γ, v, rest, R, s) may result in t becoming “constant valued”. This can be decided by checking unsatisfiability of the formula γ ∧ γ′ ∧ t ≠ t′ where the variables in γ′ and t′ are fresh variants of the variables in γ and t. If the formula is unsatisfiable, then t has a unique value in the context γ, independent of the variables it contains, and can thus be replaced by that value. Such a value can be queried from the solver by considering a model for the formula γ ∧ t = y where y is a fresh variable (a model exists since γ is satisfiable), and extracting the value of y from that model. In code generation, we have observed that symbolic constant propagation may yield significant performance improvements by avoiding unnecessary expression evaluation.
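The constant-valuedness test can be illustrated over a finite domain in Python. In this sketch, which is an illustration under assumptions rather than the solver-based check, the formula γ ∧ γ′ ∧ t ≠ t′ is decided by enumerating pairs of models, and constant_value is a hypothetical helper name.

```python
from itertools import product


def constant_value(gamma, term, domain):
    """Return the unique value of term under gamma, or None if there is none.

    Searches for a pair (x, x2) satisfying gamma(x), gamma(x2) and
    term(x) != term(x2); x2 plays the role of the fresh variable copy.
    """
    models = [x for x in domain if gamma(x)]
    if not models:
        return None  # gamma is unsatisfiable; the text assumes it is not
    for x, x2 in product(models, repeat=2):
        if term(x) != term(x2):
            return None  # the formula is satisfiable, so term is not constant
    # The formula is unsatisfiable: read the value off any model of gamma.
    return term(models[0])


BYTES = range(256)
in_digit_block = lambda x: 0x30 <= x <= 0x3F
high_nibble = constant_value(in_digit_block, lambda x: x >> 4, BYTES)
low_nibble = constant_value(in_digit_block, lambda x: x & 0xF, BYTES)
```

Under the guard 0x30 ≤ x ≤ 0x3F the term x >> 4 always evaluates to 3 and could be replaced by that constant, whereas x & 0xF still depends on x.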
4. Reachability Based Branch Elimination
Fusing already removes many unsatisfiable branches. Still, the resulting STs may have a large number of control states and/or rules with redundant conditions. In particular, some branches may be unreachable due to state carried constraints, i.e., even though the branch itself is satisfiable, the conjunction of reachable register values in the source states together with the branch is unsatisfiable. In this section we present a reachability based branch elimination (RBBE) algorithm that proves the unreachability of and removes such branches in the target ST. The algorithm is a combination of symbolic forward reachability and backward reachability algorithms adapted to STs.
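The reachability algorithm reasons about transition rules as a flattened set of Base-rules with their associated combined branch constraints. Given a rule r ∈ R(τ, o, Q, ρ) let Paths(r) be defined as follows:
\[
\begin{aligned}
\mathrm{Paths} &: \mathcal{R}(\tau, o, Q, \rho) \to \{(\mathcal{T}(\tau \to \mathit{bool}) \times \mathcal{T}(\tau \to \rho) \times Q)\}\\
\mathrm{Paths}(\mathrm{Undef}) &\overset{\text{def}}{=} \emptyset\\
\mathrm{Paths}(\mathrm{Base}([f_i]_{i=1}^{n}, q, g)) &\overset{\text{def}}{=} \{(\mathit{true}, g, q)\}\\
\mathrm{Paths}(\mathrm{Ite}(\varphi, u, v)) &\overset{\text{def}}{=} \bigcup_{(\psi, g, q) \in \mathrm{Paths}(u)} \{(\varphi \wedge \psi, g, q)\} \;\cup \bigcup_{(\psi, g, q) \in \mathrm{Paths}(v)} \{(\neg\varphi \wedge \psi, g, q)\}
\end{aligned}
\]
Since outputs do not affect reachability they are dropped from the flattened representation. Given an ST A let there be the following:
\[
\begin{aligned}
\mathrm{Moves}_{\delta}(A) &\overset{\text{def}}{=} \bigcup_{p \in Q_A} \; \bigcup_{(\varphi, g, q) \in \mathrm{Paths}(\delta_A(p))} \{(p, \varphi, g, q)\}\\
\mathrm{Moves}_{\$}(A) &\overset{\text{def}}{=} \bigcup_{p \in Q_A} \; \bigcup_{(\varphi, g, q) \in \mathrm{Paths}(\$_A(p))} \{(p, \varphi)\}
\end{aligned}
\]
These give a flat representation of all transitions and finalizers (respectively) by source and target control state. We call elements of these sets moves and final moves respectively.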
The ELIMINATE procedure in Figure 9 implements the top-level reachability algorithm. The variable w ∈ [ι_A] is used to represent a list of inputs. To check the reachability of a (final) move it calls ISREACHABLE with a ([ι_A] × ρ_A)-predicate such that the (final) move is reachable if and only if the source control state can be reached such that the predicate holds (lines 5 and 9). If ISREACHABLE returns false then the branch is eliminated by simplifying the corresponding Ite(ϕ, u, v), where u (or v) is the unreachable base rule, into v (or u). Note that if ISREACHABLE hits the bound k then it returns ⊥ and the branch cannot be safely removed.
ELIMINATE(A)
 1 let U = COMPUTEUNDERAPPROXIMATION(A)
 2 let M = (Moves_δ(A) ∪ Moves_$(A)) \ U
 3 let k = |Q_A|
 4 foreach move (p, ϕ, g, q) in M
 5   let ϕ′ = (w ≠ []) ∧ ϕ{x ↦ Head(w)}
 6   if ISREACHABLE(A, p, ϕ′, k) = false
 7     eliminate the corresponding branch in δ_A
 8 foreach final move (p, ϕ) in M
 9   let ϕ′ = (w = []) ∧ ϕ
10   if ISREACHABLE(A, p, ϕ′, k) = false
11     eliminate the corresponding branch in $_A
12 remove control states with no path from q^0_A

Figure 9. Reachability based branch elimination (RBBE).

ISREACHABLE(A, q_tgt, ϕ_tgt, k) : (ST × Q_A × T([ι_A] × ρ_A → bool) × int) → bool
 1 let layer = {q_tgt}
 2 let layer′ = ∅
 3 let Ψ′ = empty = {q ↦ false | q ∈ Q_A}
 4 let Σ = Ψ = empty ⊎ {q_tgt ↦ ϕ_tgt}
 5 while layer ≠ ∅
 6   while layer ≠ ∅
 7     pop q from layer
 8     let ψ = Ψ[q]
 9     if q = q^0_A ∧ SAT(ψ{r ↦ r^0_A})
10       return true
11     foreach (p, ϕ, g, q) in Moves_δ(A)
12       if (ϕ depends on r) or (g depends on x)   (see footnote 8)
13         let update = g{x ↦ Head(w)}
14         let γ = (w ≠ []) ∧ ϕ{x ↦ Head(w)} ∧ ψ{w ↦ Tail(w), r ↦ update}
15       else
16         let γ = ψ{r ↦ g{x ↦ default_{ι_A}}}
17       if SAT(γ ∧ ¬Σ[p])
18         let Σ[p] = Σ[p] ∨ γ
19         let Ψ′[p] = Ψ′[p] ∨ γ
20         add p to layer′
21   if k = 0 ∧ layer′ ≠ ∅
22     return ⊥
23   let k = k − 1
24   let layer = layer′
25   let layer′ = ∅
26   let Ψ = Ψ′
27   let Ψ′ = empty
28 return false

Figure 10. Checking the reachability of a state predicate.

To minimize calls to ISREACHABLE, ELIMINATE uses a more efficient COMPUTEUNDERAPPROXIMATION procedure. It performs a breadth-first forward-reachability analysis from the initial state and tags moves whose path conditions from the initial state are satisfiable as reachable. Breadth-first search increases coverage and ensures that there are potentially several states in a breadth-first frontier for the same control state, hopefully capturing different ways of entering the control state. While more sophisticated under-approximations are possible, this basic version was adequate for our experiments.
The ISREACHABLE procedure in Figure 10 performs a backward breadth-first traversal on A, exploring the states one layer at a time. Each layer is associated with the map Ψ from control states to reachability conditions yet to be explored. Initially the control state q_tgt is mapped to the predicate ϕ_tgt. Σ maps control states to the predicates that summarize the arguments for which exploration has already been performed or is about to be performed.
8 These checks can be performed with an SMT solver call. For example, ∃i,r,r′ (ϕ ≠ ϕ{r ↦ r′}) is satisfiable iff ϕ depends on the register.
Let ∆_A denote the following partial function that extends the transition function δ̂_A to input lists and omits the output part:
\[
\begin{aligned}
\Delta_A &: [\iota_A] \times \sigma_A \to \sigma_A\\
\Delta_A([], s) &\overset{\text{def}}{=} s\\
\Delta_A([i|w], s) &\overset{\text{def}}{=} \Delta_A(w, \pi_2(\hat{\delta}_A\; i\; s))
\end{aligned}
\]
A state s is k-reachable (in A) if there exists w ∈ ⋃_{n∈[0,k]} (ι_A)^n such that ∆_A(w, s^0_A) = s. For example, s^0_A is 0-reachable. A state s is reachable if it is k-reachable for some k ≥ 0. Given q ∈ Q_A and a ρ_A-predicate ϕ, we say that (q, ϕ) is (k-)reachable if there exists a (k-)reachable state (q, r) such that r ∈ [[ϕ]].
Theorem 4.1. If ISREACHABLE(A, q_tgt, ϕ_tgt, k) equals (a) true then (q_tgt, ϕ_tgt) is reachable; (b) false then (q_tgt, ϕ_tgt) is not reachable; (c) ⊥ then (q_tgt, ϕ_tgt) is not k-reachable.
Proof. First, we prove the theorem with one optimization turned off: the branch condition in line 12 always returns true. Let ψ_tgt be the σ_A-predicate (q = q_tgt) ∧ ϕ_tgt. The algorithm maintains the following invariant for all entries (q ↦ ϕ) ∈ Σ such that ϕ ≠ false:
(i) SAT(ϕ) and
(ii) for all (w, r) ∈ [[ϕ]]: ∆_A(w, (q, r)) ∈ [[ψ_tgt]]
Property (i) follows from the observation that, other than the initial value false, only satisfiable predicates are added to Σ[q] and that satisfiability remains true under disjunctions. Property (ii) follows by induction over |w| using the definition of ∆_A and that the construction of γ in line 14 is the weakest precondition with respect to ψ and the given move from p.
Now (a) follows from the fact that if the procedure terminated in line 10 then ∃w ((w, r^0_A) ∈ [[Σ[q^0_A]]]). So, by (ii), ∃w (∆_A(w, s^0_A) ∈ [[ψ_tgt]]). The proof of (c) is by induction over k, showing that all possible behaviors for input lists of up to length k that from some state lead to ψ_tgt are captured in Σ. This implies that the initial register must be captured in some layer_k predicate Ψ_k[q^0] for there to be a path from the initial state to the target state. The satisfiability test in line 17 ensures that [[ϕ]] ⊈ [[Σ[p]]]. In other words, if the test fails then [[ϕ]] ⊆ [[Σ[p]]], so no behavior is lost by excluding ϕ in that case. Statement (b) follows from (c), because if false is returned for k, then false is returned for any bound greater than k.
The condition in line 12 filters out input noise: the else case
in line 16 is taken if the input element does not affect the
register update, which is when the guard does not depend on the
register and the register update does not depend on the input
element. In this case, the summaries in Σ may accept shorter words
than what is required by ∆A, but the register part of the predicate
is not affected by omitting the input element because the element
does not influence it. Here we need to assume that there is no
(p, ϕ, g, q) ∈ Movesδ(A) for which ϕ is unsatisfiable; otherwise
the definition of γ in line 16 would be unsound. If ϕ is satisfiable
then (∃i ϕ{r ↦ defaultρA}) ∧ ψ{r ↦ g{x ↦ defaultιA}} is equivalent
to ψ{r ↦ g{x ↦ defaultιA}}.
Statement (ii) can no longer be used directly, but must be
modified to account for the omitted input elements, which become
much like input-epsilon moves. Intuitively, the ST is implicitly
converted into an εST (an ST with input-epsilon moves), although
the input-epsilon moves still count against the bound k.
In this algorithm, Σ enables a crucial subsumption check for
predicates (line 17): if a reachability condition ϕ for a control
state p is subsumed by Σ[p], then any search from ϕ is already
covered, so adding ϕ to the next layer would be redundant. A
subtlety is to avoid the possible quantifier alternation that would
arise if we treated Σ[p] as the predicate ∃w (Σ[p]) (i.e.,
characterized the reachable set of registers independently of the
inputs used to reach them). This could potentially introduce
undecidability. However, the test in line 17 works because it is
sufficient in the else case (when we omit ϕ). When the else case is
taken, it means that
    ∀w, r (ϕ ⇒ Σ[p])
holds, which implies that
    ∀r (∃w ϕ ⇒ ∃w Σ[p])    (2)
holds. Condition (2) is the condition needed to preserve all
register values.

abstract class Transducer<I, O> {
  public abstract IEnumerable<O> Update(I datum);
  public virtual IEnumerable<O> Finish() { yield break; }
}
Figure 11. The C# abstract class users extend.
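The layered search with subsumption described in the proof can be illustrated on a toy, explicit-state analogue. The sketch below is not the ISREACHABLE procedure itself: it assumes finite input and register domains and represents each summary Σ[p] as a plain Python set of register values instead of a predicate checked by an SMT solver. The names is_reachable, Move and summary are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

# A move (p, guard, update, q): from control state p, on input x and register r
# with guard(x, r) true, go to control state q with register update(x, r).
Move = Tuple[str, Callable[[int, int], bool], Callable[[int, int], int], str]


def is_reachable(moves: List[Move], inputs: Iterable[int], registers: Iterable[int],
                 init: Tuple[str, int], target_state: str,
                 target_pred: Callable[[int], bool], k: int) -> Optional[bool]:
    """Return True, False, or None (None plays the role of the bottom result)."""
    inputs, registers = list(inputs), list(registers)
    q0, r0 = init
    # layer[q]: registers newly found at q in the current backward layer.
    layer: Dict[str, Set[int]] = {
        target_state: {r for r in registers if target_pred(r)}}
    # summary[q]: all registers at q known to lead to the target (toy Sigma).
    summary: Dict[str, Set[int]] = {q: set(rs) for q, rs in layer.items()}
    for _ in range(k + 1):
        if r0 in summary.get(q0, set()):
            return True
        next_layer: Dict[str, Set[int]] = {}
        for p, guard, update, q in moves:
            targets = layer.get(q, set())
            # Explicit-state stand-in for the weakest precondition of the move.
            pre = {r for r in registers
                   if any(guard(x, r) and update(x, r) in targets for x in inputs)}
            # Subsumption: keep only what the summary does not already cover.
            new = pre - summary.get(p, set())
            if new:
                summary.setdefault(p, set()).update(new)
                next_layer.setdefault(p, set()).update(new)
        if not next_layer:
            return False  # search space exhausted without reaching the start
        layer = next_layer
    return None  # bound k exhausted: target is not k-reachable
```

For example, with a single self-loop that adds the input to the register while the sum stays at most 7, the target register 7 is found from 0 within 4 steps when inputs 1 and 2 are allowed, and is proven unreachable when only input 2 is allowed.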
5. Specifying Effectful Comprehensions
We have explored several frontends for specifying effectful
comprehensions. In Section 5.1 we present a frontend that translates
imperative C# code to STs. This pattern matches interfaces present
in existing streaming frameworks, which we discuss in Section 8.
Some comprehensions can be more efficiently specified with a
specialized frontend. In Section 5.2 we translate regexes with named
captures into STs, while Section 5.3 presents a similar approach for
XPath queries.
5.1 Effectful Comprehensions as C#
We have implemented a translation from a subset of C# to STs. Users
extend the abstract class in Figure 11, where the Update and Finish
methods respectively define δ and $. Users may opt to not override
Finish, in which case a trivial no-op finalizer is used.
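As a rough illustration of this interface shape (not of the C# frontend itself), the following Python sketch mirrors the Update/Finish split. The class ToIntSketch is only loosely modeled on the ToInt transducer; its details, and the helper run, are assumptions made for this example.

```python
from abc import ABC, abstractmethod
from typing import Generic, Iterable, Iterator, TypeVar

I = TypeVar("I")
O = TypeVar("O")


class Transducer(ABC, Generic[I, O]):
    """Python analogue of the abstract class: update per element, finish at the end."""

    @abstractmethod
    def update(self, datum: I) -> Iterable[O]:
        ...

    def finish(self) -> Iterable[O]:
        return ()  # default no-op finalizer


class ToIntSketch(Transducer[str, int]):
    """Hypothetical decimal parser: accumulates digits, emits the value at the end."""

    def __init__(self) -> None:
        self.acc = 0  # loop-carried state

    def update(self, datum: str) -> Iterable[int]:
        if not ("0" <= datum <= "9"):
            raise ValueError(f"rejected input {datum!r}")
        self.acc = self.acc * 10 + (ord(datum) - ord("0"))
        return ()  # no output while digits are being read

    def finish(self) -> Iterable[int]:
        yield self.acc


def run(t: Transducer[I, O], data: Iterable[I]) -> Iterator[O]:
    """Feed every element to update, then append the finalizer's output."""
    for d in data:
        yield from t.update(d)
    yield from t.finish()
```

Under these assumptions, list(run(ToIntSketch(), "893")) evaluates to [893].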
Example 5.1. The following code implements the ToInt transducer
from Figure 3:
partial class ToInt : Transducer {bool IsDigit(char c) {
return 0x30
-
integer in decimal notation and the substring in the fourth column
is parsed as a Boolean:
    (([^,]*,){2}(?<int>\d+),(?<bool>\w+),[^\n]*\n)*
Here S1 is "([^,]*,){2}" (skip to the third column), S2 is ","
(skip to the next column), and S3 is ",[^\n]*\n" (skip remaining
columns until EOL). The capture int is mapped to the transducer
ToInt from Figure 3 and the capture bool is mapped to a transducer
ToBool, which maps the strings “true” and “false” respectively to
true and false.
5.3 Effectful XPath Comprehensions
For extracting information from XML formatted data we use
transducers constructed from XPath⁹ query expressions. Consider an
expression X of the form
    st:trans(/tag1/tag2/tag3 · · · /tagn)
The tag names tagn specify a path to match in an XML file. trans is
a name that maps to a transducer A that maps the contents of any
matching elements to output of type o. Given X and the transducer
A, a fused transducer that parses matches of X into values of o is
constructed. The matcher for the query uses counting with an integer
register to ignore arbitrarily deep nestings of non-matching
elements. Otherwise the algorithm is similar to the one for regular
expressions in Section 5.2 (i.e., for steps 2 and 3).
Example 5.3. Consider the following XML:
PST893
88410
A transducer based on the following XPath expression will extract
the populations in the dataset:
    st:int(/cities/city/population)
int again maps to the ToInt transducer from Figure 3.
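The counting idea can be sketched with Python's standard streaming XML parser. This is an illustration only, not the generated transducer: extract_ints is a hypothetical helper, the sample document used with it is made up, and int() stands in for the ToInt transducer.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator


def extract_ints(xml_text: str, path: str) -> Iterator[int]:
    """Stream over the XML and yield int(text) of elements matching an absolute path."""
    tags = path.strip("/").split("/")
    matched = 0  # number of leading path tags that are currently open
    skip = 0     # integer counter: nesting depth inside non-matching elements
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if skip == 0 and matched < len(tags) and elem.tag == tags[matched]:
                matched += 1
            else:
                skip += 1  # ignore this element and everything nested in it
        elif skip > 0:
            skip -= 1
        else:
            if matched == len(tags):
                yield int(elem.text)
            matched -= 1
```

With a made-up document such as "<cities><city><name>A</name><population>893</population></city></cities>" and the path "/cities/city/population", the generator yields 893; a population element nested under some other child of city is skipped by the counter.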
6. Evaluation
We have implemented the techniques described above in a tool that
translates C# (and our other frontends) into STs, fuses them and
finally generates efficient C# code. For each control state a
labeled code block that implements the transition rule is generated.
Given a rule, a tree of if-else statements is generated, where each
leaf consists of an appropriate sequence of outputs, state updates
and finally a goto to the code block of the target control state.
We evaluate the viability of our approach with a set of benchmark
pipelines. The experiments were run on an Intel Core i5-3570K CPU
@ 3.4 GHz with 8 GB of RAM. All reported throughputs are means of a
sufficient number of samples to obtain a confidence interval smaller
than ±0.5 MB/s at a 95% confidence level. All pipelines were run
through C#’s NGen tool, which produces native code for C# assemblies
ahead-of-time.
9 See https://en.wikipedia.org/wiki/XPath.

[Figure 12: bar chart of throughput (MB/s) for the UTF8-lines,
Base64-avg, CSV-max and Base64-delta pipelines; series: LINQ,
Hand-written, Fused.]
Figure 12. Throughputs for different pipeline versions

Figure 12 presents throughputs for three variations of each
pipeline. For LINQ the pipelines communicate with IEnumerable and
yield. The Hand-written pipelines are straightforward
implementations using arrays as buffers between phases. The fused
and optimized pipelines are labeled Fused. The individual pipeline
stages in the LINQ and Fused pipelines use code generated from STs
by our implementation, while the Hand-written pipelines use
hand-written C# and .NET system libraries where available. For the
Hand-written pipelines we did not perform any manual fusion, since
the aim of this paper is to allow pipeline stages to be specified
modularly with the fusion being handled by the compiler. Four
pipelines were benchmarked:
Base64-avg calculates a running average (window of 10) for
Base64¹⁰ encoded ints and re-encodes the results in Base64.
CSV-max decodes a UTF-8 encoded CSV file to UTF-16, extracts the
third column with a regular expression and finds the maximum length
of these strings. The output is a single UTF-8 encoded decimal
formatted integer.
Base64-delta reads Base64 encoded ints and outputs deltas of
successive inputs as UTF-8 encoded decimal integers on separate
lines.
UTF8-lines decodes a UTF-8 encoded file to UTF-16 and counts the
number of newline characters. The output is a single UTF-8 encoded
decimal formatted integer.
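To make the shape of such a pipeline concrete, here is an unfused, generator-based sketch of the Base64-delta stages in Python, analogous in spirit to the LINQ variant. Several details are assumptions not stated in the text: ints are taken to be 32-bit little-endian, the first delta is taken relative to 0, and the whole input is decoded at once rather than streamed. The function names are hypothetical.

```python
import base64
import struct
from typing import Iterable, Iterator


def decode_ints(b64: bytes) -> Iterator[int]:
    """Stage 1: Base64 text to ints (assumed 32-bit little-endian)."""
    raw = base64.b64decode(b64)
    for (value,) in struct.iter_unpack("<i", raw):
        yield value


def deltas(values: Iterable[int]) -> Iterator[int]:
    """Stage 2: difference between successive inputs (loop-carried state: prev)."""
    prev = 0  # assumption: the first delta is taken relative to 0
    for v in values:
        yield v - prev
        prev = v


def to_utf8_lines(values: Iterable[int]) -> Iterator[bytes]:
    """Stage 3: one UTF-8 encoded decimal integer per line."""
    for v in values:
        yield f"{v}\n".encode("utf-8")


def base64_delta(b64: bytes) -> bytes:
    """Compose the three stages, passing values between them lazily."""
    return b"".join(to_utf8_lines(deltas(decode_ints(b64))))
```

Each stage here is a separate generator that hands values to the next one element by element; the fusion described in this paper aims to remove exactly that kind of intermediate hand-off.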
For Figure 12 we sampled the pipelines with 100 MB of data. For
the UTF8-lines pipeline we used Herman Melville’s “Moby Dick”
repeated a sufficient number of times, while for the others we used
randomly generated data. For all pipelines except CSV-max the LINQ
version has the lowest throughput. We believe this is due to the
overhead associated with passing values through IEnumerable.
Figure 13 presents a more detailed comparison of CSV parsing
scenarios. Pipelines for three different datasets are compared:
CHSI is a dataset on health indicators from the U.S. Department of
Health & Human Services. The three pipelines produce the average
lung cancer deaths, minimum births and maximum total deaths for
counties in the dataset.
SBO is a dataset on business owners from the U.S. Census Bureau.
The three pipelines find the maximum employees, minimum gross
receipts and average payroll for businesses in the dataset.
CC is a dataset of consumer complaints received by the U.S.
Consumer Financial Protection Bureau. The pipeline produces the
maximum value for the ID column.
Each of the Fused pipelines in Figure 13 applies four effectful
comprehensions: (i) decode UTF-8 to UTF-16, (ii) parse a column as
an int using a regular expression based parser, (iii) run a query
(maximum, minimum or average), and (iv) output the result as a
sequence of bytes. The pipelines differ only in the regular
expression and query used.
10 See https://en.wikipedia.org/wiki/Base64.
[Figure 13: bar chart of throughput (MB/s) for the CC-id,
SBO-payroll, SBO-receipts, SBO-employees, CHSI-deaths, CHSI-births
and CHSI-cancer pipelines; series: LINQ, Hand-written, Fused.]
Figure 13. Throughputs for CSV parsing pipelines
[Figure 14: bar chart of throughput (MB/s) for the TPC-DI-SQL,
PIR-proteins, DBLP-oldest and MONDIAL-pop pipelines; series:
XmlDocument, XPathReader, LINQ, Fused.]
Figure 14. Throughputs for XPath matching pipelines
Each version of the pipelines uses the same regular expression for
parsing the CSV file. For example, the expression
(([^,]*,){5}(?<int>\d+),[^\n]*\n)* is used in the maximum employees
pipeline for matching the sixth column on each line. In the
Hand-written tests the .NET framework’s RegexOptions.Compiled option
was used, which generates a .NET assembly for doing the matching.
This extra work is not counted against the reported throughputs.
Another optimization we implemented for the Hand-written pipelines
is that the regular expression is matched for the whole dataset and
the values captured are then iterated. This proved to be
significantly faster than splitting the dataset into lines and
running the regular expression on each line separately.
The original SBO dataset is 744 MB, which caused the .NET regular
expression library to run out of memory. To work around this we cut
the dataset down to an 83 MB prefix. Our fused pipelines are free of
such limitations due to their incremental nature.
The fused pipelines are significantly faster for all benchmarks,
with the average speedup being over 2.9× over the Hand-written
pipelines.
Figure 14 presents throughputs for XML processing scenarios. Four
pipelines are compared:
TPC-DI-SQL The dataset was generated by a tool from the TPC-DI
benchmark [26]. The pipeline extracts ids of accounts from customer
records and for each outputs an SQL insert statement.
PIR-proteins The dataset is a protein dataset from the U.S.-based
National Biomedical Research Foundation. The pipeline extracts the
lengths of all proteins in the dataset and outputs the average
length.
DBLP-oldest The dataset is bibliographic information from the
Digital Bibliography Library Project. The pipeline extracts the
publication year of each article and outputs the earliest year.
MONDIAL-pop Mondial is a dataset extracted from various
geographical Web data sources. The pipeline extracts the population
of each city in the dataset and outputs the highest population.

Pipeline        Eliminated   Left
Base64-delta             0     77
CSV-max                  6     65
Base64-avg               0    163
UTF8-lines               1     10
CC-id                 1301   5274
CHSI-cancer            113   1134
CHSI-births            143   1434
CHSI-deaths            144   1444
SBO-employees            7     78
SBO-receipts            11    117
SBO-payroll             10    107
TPC-DI-SQL             238    936
PIR-proteins           198    758
DBLP-oldest            104    456
MONDIAL-pop            162    662
Figure 15. Branches eliminated by RBBE and branches left.
All of the Fused pipelines in Figure 14 use an XPath based
transducer for extracting the relevant data. The XmlDocument
pipelines use the XPath matching implemented in C#’s standard
libraries. The throughput for the XmlDocument version of the
PIR-proteins pipeline is not reported due to the library running out
of memory with the 700 MB dataset. The XPathReader pipelines use
Microsoft’s XPathReader library, which allows evaluating a subset of
XPath in a streaming manner. Due to its streaming nature it is able
to process the PIR-proteins dataset.
The Fused versions have the highest throughput on all of the XPath
benchmarks, with an average speedup of 11× over the streaming
XPathReader library. The fact that in the Fused pipelines the XPath
matching code is specialized to the query is likely to give it a
significant advantage over the XmlDocument and XPathReader versions,
which do not perform any code generation. This also holds for the
LINQ pipelines, which were second on all XML benchmarks. For queries
over large XML datasets, using our approach over a general purpose
XPath library makes sense, as the speedup will make up for the
compilation time.
Figure 15 presents the number of branches in rules removed by
RBBE (Section 4) for each pipeline. The numbers are sums of removals
after all fusions that contribute to the complete pipeline.
We can see that for most pipelines applying RBBE resulted in
branches being removed. Thus RBBE is helpful for allowing bigger
pipelines to be practically fused.
7. Symbolic Transducers and Monads
This section provides redefinitions of the ⊕ and ⊗ operators in
terms of applicative functors and monads. In addition to being
concise, these definitions provide a link between an
automata-theoretic view and a functional view of symbolic
transducers. This is to our knowledge the first time transductions
have been successfully related to monads, which has been
unsuccessfully attempted before. For more discussion and the exact
connection to LINQ’s list monad see Section 8.
Given types σ and τ we define TMστ as the type σ → (τ × σ)?¹¹,
which, as we will show later, is a transduction monad. We use the
higher-order applicative functor [21] operators pure and ⊛, defined
as follows:

    pure : τ → TMστ
    pure f  def=  λs. Some(f, s)

    ⊛ : TMσ(τ → τ′) → TMστ → TMστ′
    F ⊛ X  def=  λs. let Some(f, s1) = (F s) in
                     let Some(x, s2) = (X s1) in Some((f x), s2)

11 ? is the option type. We write a wrapped value x as Some(x) and
no value as ⊥.

The intuition is that ⊛ captures the side effects of F in s1 and
propagates them to X, which produces side effects s2, while the
output is (f x).
Now the step composition operator ⊕ can also be defined as

    ⊕ : TMστ → TMστ → TMστ
    f ⊕ g  def=  (pure +) ⊛ f ⊛ g

where + denotes list concatenation, although other operators, such
as addition, maximum, and minimum, could be used. One reason why
such operators are interesting is that they allow us to define
aggregation operations without explicitly using state for
accumulating the intermediate result. Regardless of the operator
used as +, the purpose of ⊕ is to compose together output results
while propagating the effects of the computations “from left to
right” as loop-carried state.
The type of ⊕ in this definition (assuming + is concatenation) is

    (σ → ([o] × σ)?) → (σ → ([o] × σ)?) → (σ → ([o] × σ)?)

Note that the original definition in Section 2 did not wrap the
transition functions inside the option type and instead defined them
as partial functions. For the functional view we make the functions
total by representing rejection with ⊥.
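The definitions of pure, the applicative operator and ⊕ can be transcribed almost literally into Python. In this sketch, which is an illustration under stated assumptions and not part of the paper's implementation, a value of type TMστ is a function from a state to either a (value, state) pair or None, where None plays the role of ⊥. The names ap, step, emit and reject are hypothetical.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

S = TypeVar("S")
T = TypeVar("T")
U = TypeVar("U")

# A TM computation: state -> (value, new state), or None for rejection.
TM = Callable[[S], Optional[Tuple[T, S]]]


def pure(value: T) -> "TM[S, T]":
    """Wrap a value without touching the state."""
    return lambda s: (value, s)


def ap(F: "TM[S, Callable[[T], U]]", X: "TM[S, T]") -> "TM[S, U]":
    """Applicative application: run F, thread its state into X, apply f to x."""
    def run(s: S) -> Optional[Tuple[U, S]]:
        r1 = F(s)
        if r1 is None:
            return None
        f, s1 = r1
        r2 = X(s1)
        if r2 is None:
            return None
        x, s2 = r2
        return (f(x), s2)
    return run


def step(f: "TM[S, List[T]]", g: "TM[S, List[T]]") -> "TM[S, List[T]]":
    """Step composition: concatenate outputs, threading state left to right."""
    concat = lambda xs: lambda ys: xs + ys  # curried list concatenation
    return ap(ap(pure(concat), f), g)


def emit(c: str) -> "TM[int, List[str]]":
    """Example computation: tag c with a counter kept in the state."""
    return lambda s: ([f"{s}:{c}"], s + 1)


reject: "TM[int, List[str]]" = lambda s: None
```

Here step(emit("a"), emit("b"))(0) returns (["0:a", "1:b"], 2), showing the counter being carried from the first computation into the second, while composing with reject yields None.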
The bind operator for TMστ is

    >>= : TMστ1 → (τ1 → TMστ2) → TMστ2
    F >>= G  def=  λs. let Some(a, s′) = (F s) in (G a s′)

We may now view TMστ as a transduction monad with the given bind
operator and whose unit operator is pure. It follows from the
definitions that the monad laws hold. One can view this monad as a
combination of the state monad and the option monad.
The fusion composition operator ⊗ (Section 3) can be defined using
the bind operator. First let there be:

    fuse : ([ι] → TMσA[τ]) → ([τ] → TMσB[o]) → ([ι] → TMσA×σB[o])
    fuse A B  def=  λx̄. (A′ x̄) >>= B′  where
      A′ ā  def=  λ(s1, s2). let Some(b̄, s′1) = (A ā s1) in
                             Some(b̄, (s′1, s2))
      B′ b̄  def=  λ(s′1, s2). let Some(c̄, s′2) = (B b̄ s2) in
                              Some(c̄, (s′1, s′2))

Note how in fuse the ST A uses its own state that is disjoint from
the state of B, and the function builds the disjoint sum of the
states. Further, notice that the output b̄ of A may depend on the
state s1, so the state s′2 may, through b̄, depend on s1, whereas
s′1 does not depend on s2. The latter property is integral to the
fusion algorithm in Section 3. Now ⊗ can be defined as:

    (A, s0A) ⊗ (B, s0B)  def=  (fuse A B, (s0A, s0B))

Note that here we represent an ST A as a pair of a function of type
([ιA] → TMσA[oA]) and the initial state of A. To run transducers
represented like this the following can be used:

    runST : ([ι] → TMσ[o]) × σ → [ι] → [o]
    runST (A, s)  def=  λx̄. let Some(ȳ, s′) = (A x̄ s) in ȳ

Effectively, given an ST A, (runST (A, s0A)) is its denotation [[A]].
In functional languages the state monad is typically implemented
using lazy evaluation and fuse could in principle be implemented
similarly. In contrast to these languages, wherein infeasible paths
are never explored by virtue of lazy evaluation, the fusion
algorithm in Section 3 implements a statically optimized binding
operator for the transduction monad which statically prunes
infeasible paths. We believe similar static fusion techniques could
also be applied to code written using the state monad.
8. Related Work
Symbolic transducers: These were originally defined in flat form
in [33]. The main focus of the work in [33] is on symbolic finite
transducers, or SFTs, for analysis of string sanitizers. It is noted
in [33] that STs are closed under composition, but, to the best of
our knowledge, no algorithm for fusing STs has been studied prior to
our work. Prior work on STs has focused on register exploration and
input grouping, which are orthogonal problems [11, 34]. Register
exploration attempts to project the register type ρ into a Cartesian
product type ρ1 × ρ2 where ρ1 is a finite type; the primary goal is
to reduce register dependency by migrating ρ1 into the set of
control states. Input grouping tries to take advantage of grouping
characters into larger tokens in order to avoid intermediate
register usage, which has applications in decoder analysis [11] and
parallelization [34]. Efficient fusion of STs has, to the best of
our knowledge, not been studied prior to our work.
Streaming: There is a large body of work on stream processing
[14, 20, 23, 24, 30]. There is also recent work on a domain specific
language DReX [8] for expressing regular string transformations.
Stream computations with internal state have been studied before.
The work in [10] defines a Stream data-type with internal state that
yields elements and allows operations such as map, fold, and zip.
These operations are functional and operate on one element at a time
with no operation-state carried across elements. The state in the
Stream allows one to represent the current position, and bundling in
the case of generalized stream fusion [19], in the stream. In
contrast, our focus is on applying transformations that have
operation-state carried across elements (as opposed to streams
having state). This allows us to represent effectful functions such
as UTF decoding/encoding.
Some libraries for streams provide APIs for expressing stateful
operations. The Apache Flink [7] and Spark Streaming [5] distributed
streaming engines both provide support for using state in stream
operations and an associated framework for implementing fault
tolerance in the presence of state. Highland.js [3] and Conduit [1]
are traditional stream libraries, which both provide a way to
express stateful operations. However, in these libraries the
stateful operations are treated as black boxes, as opposed to our
approach that fuses operations in compositions of STs. Implementing
frontends similar to the C# one (Section 5.1) for these libraries
would allow code written for them to use our backend.
StreamIt [31] is a programming language and compiler for signal
processing applications. StreamIt composes pipelines of stateless
filters with the aim of reducing communication overhead. In [6]
composition is extended to filters with a linear state space
representation, i.e., ones where the outputs and state updates are
linear operations. The composition retains the linear state space
representation with a linear increase in size.
In contrast to StreamIt, we can compose any stateful filters where
the state update is over a decidable theory, and instead of linear
algebra we use SMT solvers for our analysis. We view the work done
by the StreamIt group as complementary to ours: the composition and
optimization techniques for symbolic transducers could be used as an
additional backend module in the StreamIt compiler for stateful
filters which are not amenable to a linear state space
representation.
Monads: Monads have had a huge impact on programming paradigms and
techniques in general after they were introduced into the functional
programming world by Wadler [36]. One of the core contributions of
monads is that they provide a type discipline by which one can
enforce a separation of computational concerns in a clean functional
style. A prime example is the state monad [37]. Another very useful
monad is the maybe monad [36]. Our transduction monad type TMστ is
more-or-less the type for the maybe state monad parameterized with
the state type σ and the output type τ, and extended with extra
composition operations for step and fusion composition. The fusion
composition operator ⊗ is based on the monad binding operator >>=
but is itself not a binding operator because it uses different monad
state types. The “maybe” part in the transduction monad reflects the
fact that (deterministic) transducers are typically partial
functions and their composition (that corresponds exactly to fusion
composition here) is often treated as a special case of relational
composition.
LINQ [22] uses the list monad (or list comprehension [36]) as its
primary construct for query processing and (unlike SQL) also
supports nested lists. The list comprehension construct is in LINQ
expressed with the Select or, more generally, SelectMany extension
method on the IEnumerable<T> interface. The exact relation to the
transduction monad is that the list comprehension in LINQ
corresponds to iterating the step composition operator ⊕ (Section 2)
over the input list. Step composition handles loop-carried state.
The LINQ query

    "Man".SelectMany(A.Update)

corresponds to the following transduction or effectful
comprehension, provided that we apply it to the initial state of A:

    (δ̂A ‘M’) ⊕ (δ̂A ‘a’) ⊕ (δ̂A ‘n’)

The state of the computation (δ̂A ‘M’) is threaded through into the
computation (δ̂A ‘a’), etc. For example, if we take A to be the
Base64 encoder, and we start from the initial state (at the point
when no characters have been read so far) then the output would be
the string "TWFu". This is consistent with the existing semantics of
LINQ.
In Figure 4 in Section 1 the finalizer for ToInt can be implemented
as a separate piece of code after the state has been aggregated.
However, for transducers whose Update function produces output the
following pattern would be natural:
SelectMany(i => Update()).Concat(Finalize()), where Finalize returns
an IEnumerable. This pattern is semantically correct, but relies on
the fact that Concat evaluates its parameter lazily. With eager
evaluation Finalize would access state before Update had been called
for all inputs. We feel this reliance on subtle semantics makes LINQ
a poor match for writing effectful comprehensions. This is another
concern we address with our C# frontend.
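The same pitfall can be demonstrated in Python, where itertools.chain plays the role of Concat and a generator function plays the role of a lazily evaluated Finalize. The class EchoAndCount is a hypothetical transducer written for this illustration: it echoes every input and reports the running total when finalized.

```python
from itertools import chain
from typing import Iterator, List


class EchoAndCount:
    """Echoes each input and keeps a running total as loop-carried state."""

    def __init__(self) -> None:
        self.total = 0

    def update(self, x: int) -> List[int]:
        self.total += x
        return [x]

    def finish_lazy(self) -> Iterator[int]:
        # Generator: the body runs only when the result is iterated,
        # i.e. after all updates have been consumed.
        yield self.total

    def finish_eager(self) -> List[int]:
        # Plain list: the state is read at call time, before any update runs.
        return [self.total]


def run_lazy(data: List[int]) -> List[int]:
    t = EchoAndCount()
    return list(chain(chain.from_iterable(map(t.update, data)), t.finish_lazy()))


def run_eager(data: List[int]) -> List[int]:
    t = EchoAndCount()
    return list(chain(chain.from_iterable(map(t.update, data)), t.finish_eager()))
```

On the input [1, 2, 3], run_lazy returns [1, 2, 3, 6], whereas run_eager returns [1, 2, 3, 0] because the finalizer read the state while the pipeline was still being assembled.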
Fusion: For fusion of symbolic transducers there is related work
on filter fusion [27] and deforestation [35]. Fusion of symbolic
transducers can be viewed as an extended form of filter fusion that
incorporates loop-carried state and advanced constraint satisfaction
techniques into the classical framework.
The Steno library in [25] implements deforestation for LINQ
queries and achieves speedups from removing the IEnumerable
abstraction similar to what we report in Section 6. In contrast with
our work, Steno treats filters as black boxes, although the
deforestation can expose some optimization opportunities to the
compiler. Additionally, some of Steno’s optimizations assume that
filters are stateless.
Filter fusion has also been extended to network fusion [15], which
uses the product of labeled transition systems to merge a network of
interconnecting components. Synchronous product of automata and
fusion of symbolic transducers have different semantics and
computational complexities.
The work in [29] is related to our work regarding motivation. The
difference is in the execution: we use an automata based definition
of transducers with an explicit control flow graph and use an SMT
solver as an oracle in our algorithms. This leads to a different set
of algorithms and opens up a different set of optimization
techniques. We build on some of the work in [33] by extending it
with an incremental fusion algorithm and reachability analysis. The
authors of [29] were not able to relate their work to monads but use
the SML type system in general. In our case the definition of the
step composition operator ⊕ uses applicative functors or idioms
[18, 21]; it does not require full monad functionality.
Regex: Our construction of symbolic transducers from regexes is
related to the work in [28]. On one hand our algorithm only handles
a special class of regexes, but on the other hand it supports full
Unicode by using the .NET regex parser and represents guards by
predicates over 16-bit bit-vectors (i.e., the char type). Regexes
are very handy for capturing custom patterns, for example for some
specific CSV file or some specific alphabet (such as the emoticon
alphabet¹²). This is reminiscent of handling hierarchical data, such
as XML, but with more relaxed rules, e.g., a line in a custom CSV
file may (or may not) end with a comma.
To handle XML data we use transducers generated from a subset of
the XPath query language. For a full automata-theoretic treatment of
XPath see [9], where an approach for evaluating and reasoning about
XPath expressions (extended with regular expressions) based on
two-way weak alternating tree automata is presented.
List comprehensions have also been extended with ORDER BY and
GROUP BY constructs [17] that are also supported in LINQ. It is an
ongoing research topic for us to investigate whether symbolic
transducers can be extended similarly and, if so, to understand what
the potential payoffs are.
9. Conclusion
Good abstractions let a programmer easily express their intent as a
program and at the same time let a runtime system compile that
program for efficient execution. This paper puts forth effectful
comprehensions as an abstraction for expressing possibly-stateful
data-processing pipelines. We present fusion and branch elimination
algorithms for these effectful comprehensions, which allow us to
compile large pipelines into efficient code.
We use symbolic transducers to represent individual and fused
stages in a data-processing pipeline, which we additionally
formalize with transduction monads. The monadic view provides very
concise semantics for transductions and their compositions. On the
other hand, our fusion and branch elimination algorithms use an
automata-theoretic view, which allows them to exploit the separation
of control-state from other state.
We have built a compiler that ingests pipelines written in C# and
produces fused code that runs, on average, 3× faster than a
hand-written baseline and 5× faster than LINQ on a variety of data
processing programs. In the future we will explore more extensive
optimizations that rely on background theory reasoning to prove
program properties. One such optimization we excluded from this
paper due to space constraints exploits minimization of symbolic
finite automata to simplify control flow.
In the future we intend to explore hierarchical compositions,
i.e., parts of an effectful comprehension being specified in terms
of another. In Sections 5.2 and 5.3 we use a specific pattern of
hierarchical composition for which fusion is straightforward. We aim
to expand this work to allow hierarchical compositions in our
general C# frontend (Section 5.1).
12 See http://unicode.org/charts/PDF/U1F600.pdf
References[1] Conduit (Haskell library).
https://github.com/snoyberg/conduit.
[2] Apache Hadoop.http://hadoop.apache.org/.
[3] Highland.js.http://highlandjs.org/.
[4] The .NET compiler platform
“Roslyn”.https://github.com/dotnet/roslyn.
[5] Spark Streaming.http://spark.apache.org/streaming/.
[6] S. Agrawal, W. Thies, and S. Amarasinghe. Optimizing
streamprograms using linear state space analysis. In Proceedings of
the 2005International Conference on Compilers, Architectures and
Synthesisfor Embedded Systems (CASES’05), pages 126–136. ACM,
2005.doi:10.1145/1086297.1086315.
[7] A. Alexandrov, R. Bergmann, S. Ewen, J.-C. Freytag, F.
Hueske,A. Heise, O. Kao, M. Leich, U. Leser, V. Markl, F. Naumann,
M. Peters,A. Rheinländer, M. J. Sax, S. Schelter, M. Höger, K.
Tzoumas, andD. Warneke. The Stratosphere platform for big data
analytics. TheVLDB Journal, 23(6):939–964, Dec. 2014.
doi:10.1007/s00778-014-0357-y.
[8] R. Alur, L. D’Antoni, and M. Raghothaman. DReX: A
declarativelanguage for efficiently evaluating regular string
transformations. InProceedings of the 42nd Annual ACM
SIGPLAN-SIGACT Symposiumon Principles of Programming Languages
(POPL’15), pages 125–137.ACM, 2015.
doi:10.1145/2676726.2676981.
[9] D. Calvanese, G. Giacomo, M. Lenzerini, and M. Y. Vardi.
Anautomata-theoretic approach to regular XPath. In Proceedings of
the12th International Symposium on Database Programming
Languages(DBPL’09), volume 5708 of LNCS, pages 18–35. Springer,
2009.doi:10.1007/978-3-642-03793-1 2.
[10] D. Coutts, R. Leshchinskiy, and D. Stewart. Stream fusion:
From liststo streams to nothing at all. In Proceedings of the 12th
ACM SIGPLANInternational Conference on Functional Programming
(ICFP’07),pages 315–326. ACM, 2007.
doi:10.1145/1291151.1291199.
[11] L. D’antoni and M. Veanes. Extended symbolic finite
automata andtransducers. Formal Methods in System Design,
47(1):93–119, Aug.2015. doi:10.1007/s10703-015-0233-4.
[12] L. De Moura and N. Bjørner. Z3: An efficient SMT solver. In
Proceed-ings of the 14th International Conference on Tools and
Algorithms forthe Construction and Analysis of Systems (TACAS’08),
volume 4963 ofLNCS, pages 337–340. Springer, 2008.
doi:10.1007/978-3-540-78800-3 24.
[13] J. Dean and S. Ghemawat. MapReduce: Simplified data
process-ing on large clusters. Commun. ACM, 51(1):107–113, Jan.
2008.doi:10.1145/1327452.1327492.
[14] D. Debarbieux, O. Gauwin, J. Niehren, T. Sebastian, and M.
Zergaoui.Early nested word automata for XPath query answering on
XMLstreams. Theoretical Computer Science, 578:100–125, May
2015.doi:10.1016/j.tcs.2015.01.017.
[15] P. Fradet and S. H. T. Ha. Network fusion. In Proceedings
ofProgramming Languages and Systems: Second Asian
Symposium(APLAS’04), volume 3302 of LNCS, pages 21–40. Springer,
2004.doi:10.1007/978-3-540-30477-7 3.
[16] J. E. Hopcroft and J. D. Ullman. Introduction to Automata
Theory, Lan-guages, and Computation. Addison-Wesley, 1979. ISBN
0321455363.
[17] S. P. Jones and P. Wadler. Comprehensive comprehensions.
InProceedings of the ACM SIGPLAN Workshop on Haskell
(Haskell’07),pages 61–72. ACM, 2007.
doi:10.1145/1291201.1291209.
[18] S. Lindley, P. Wadler, and J. Yallop. Idioms are oblivious,
arrows aremeticulous, monads are promiscuous. In Proceedings of the
SecondWorkshop on Mathematically Structured Functional
Programming(MSFP’08), volume 229 of ENTCS, pages 97–117. Elsevier,
2011.doi:10.1016/j.entcs.2011.02.018.
[19] G. Mainland, R. Leshchinskiy, and S. Peyton Jones.
Exploiting vectorinstructions with generalized stream fusion. In
Proceedings of the 18thACM SIGPLAN International Conference on
Functional Programming(ICFP’13), pages 37–48. ACM, 2013.
doi:10.1145/2500365.2500601.
[20] A. Maletti, J. Graehl, M. Hopkins, and K. Knight. The power
ofextended top-down tree transducers. SIAM J. Comput.,
39(2):410–430,June 2009. doi:10.1137/070699160.
[21] C. Mcbride and R. Paterson. Applicative programming with
ef-fects. Journal of Functional Programming, 18(1):1–13, Jan.
2008.doi:10.1017/S0956796807006326.
[22] E. Meijer, B. Beckman, and G. Bierman. LINQ: Reconciling
object,relations and XML in the .NET framework. In Proceedings of
the 2006ACM SIGMOD International Conference on Management of Data
(SIG-MOD’06), pages 706–706. ACM, 2006.
doi:10.1145/1142473.1142552.
[23] T. Milo, D. Suciu, and V. Vianu. Typechecking for XML
transformers.In Proceedings of the Nineteenth ACM
SIGMOD-SIGACT-SIGARTSymposium on Principles of Database Systems
(PODS’00), pages 11–22. ACM, 2000. doi:10.1145/335168.335171.
[24] B. Mozafari, K. Zeng, L. D’antoni, and C. Zaniolo.
High-performancecomplex event processing over hierarchical data.
ACM Trans. DatabaseSyst., 38(4):21:1–21:39, Dec. 2013.
doi:10.1145/2536779.
[25] D. G. Murray, M. Isard, and Y. Yu. Steno: Automatic
optimization of declarative queries. In Proceedings of the 32nd ACM
SIGPLAN Conference on Programming Language Design and Implementation
(PLDI’11), pages 121–131. ACM, 2011.
doi:10.1145/1993498.1993513.
[26] M. Poess, T. Rabl, H.-A. Jacobsen, and B. Caufield. TPC-DI:
The first industry benchmark for data integration. Proceedings of
the VLDB Endowment, 7(13):1367–1378, Aug. 2014.
doi:10.14778/2733004.2733009.
[27] T. A. Proebsting and S. A. Watterson. Filter fusion. In
Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles
of Programming Languages (POPL’96), pages 119–130. ACM, 1996.
doi:10.1145/237721.237760.
[28] Y. Sakuma, Y. Minamide, and A. Voronkov. Translating
regular expression matching into transducers. Journal of Applied
Logic, 10(1):32–51, Mar. 2012. doi:10.1016/j.jal.2011.11.003.
[29] O. Shivers and M. Might. Continuations and transducer
composition. In Proceedings of the 27th ACM SIGPLAN Conference on
Programming Language Design and Implementation (PLDI’06), pages
295–307. ACM, 2006. doi:10.1145/1133981.1134016.
[30] J. H. Spring, J. Privat, R. Guerraoui, and J. Vitek.
StreamFlex: High-throughput stream programming in Java. In
Proceedings of the 22nd Annual ACM SIGPLAN Conference on
Object-Oriented Programming Systems and Applications (OOPSLA’07),
pages 211–228. ACM, 2007. doi:10.1145/1297027.1297043.
[31] W. Thies, M. Karczmarek, and S. P. Amarasinghe. StreamIt: A
language for streaming applications. In Proceedings of the 11th
International Conference on Compiler Construction (CC’02), volume
2304 of LNCS, pages 179–196. Springer, 2002.
doi:10.1007/3-540-45937-5_14.
[32] M. Veanes, P. d. Halleux, and N. Tillmann. Rex: Symbolic
regular expression explorer. In Proceedings of the 2010 Third
International Conference on Software Testing, Verification and
Validation (ICST’10), pages 498–507. IEEE Computer Society, 2010.
doi:10.1109/ICST.2010.15.
[33] M. Veanes, P. Hooimeijer, B. Livshits, D. Molnar, and N.
Bjorner. Symbolic finite state transducers: Algorithms and
applications. In Proceedings of the 39th Annual ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages (POPL’12), pages
137–150. ACM, 2012. doi:10.1145/2103656.2103674.
[34] M. Veanes, T. Mytkowicz, D. Molnar, and B. Livshits.
Data-parallel string-manipulating programs. In Proceedings of the
42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of
Programming Languages (POPL’15), pages 139–152. ACM, 2015.
doi:10.1145/2676726.2677014.
[35] P. Wadler. Deforestation: Transforming programs to
eliminate trees. Theoretical Computer Science, 73(2):231–248, Jan.
1988. doi:10.1016/0304-3975(90)90147-A.
[36] P. Wadler. Comprehending monads. In Proceedings of the 1990
ACM Conference on LISP and Functional Programming (LFP’90),
pages 61–78. ACM, 1990. doi:10.1145/91556.91592.
[37] P. Wadler. Monads for functional programming. In Advanced
Functional Programming: First International Spring School on
Advanced Functional Programming Techniques, Tutorial Text, volume
925 of LNCS, pages 24–52. Springer, 1995.
doi:10.1007/3-540-59451-5_2.
[38] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and
I. Stoica. Spark: Cluster computing with working sets. In
Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud
Computing (HotCloud’10), pages 10–10. USENIX Association, 2010.