A Polynomial-Time Algorithm for Global Value Numbering
SAS 2004
Sumit Gulwani, George C. Necula
Mar 26, 2015
Global Value Numbering
Goal: Discover equivalent expressions in procedures
Applications:
• Compiler optimizations
– Copy propagation, constant propagation, common sub-expression elimination, induction variable elimination, etc.
• Program verification
– Discover loop invariants, verify program assertions
• Discover equivalent computations across programs
– Plagiarism detection tools, translation validation
Global Value Numbering
x := b × a;
y := a × 3;
c := a × b;
if (b == 3)
z := a × b;
The equivalence problem is undecidable in general.
Simplifying assumptions:
• Operators are uninterpreted (will not discover x = c, which needs commutativity of the operator)
• Conditionals are non-deterministic (will not discover y = c, which relies on the branch condition b == 3)
• Will discover z = c
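The effect of the first assumption can be illustrated with a minimal sketch of classic hash-based value numbering over straight-line code. This is not the algorithm of this talk, only a small stand-in; the function name `value_number` and the tuple encoding of assignments are assumptions made for the illustration, and the conditional is simply skipped.

```python
def value_number(assignments):
    """Hash-based value numbering over straight-line code with uninterpreted operators."""
    table = {}     # ("leaf", name) or (op, vn_left, vn_right) -> value number
    var_vn = {}    # variable -> value number of its current value

    def vn_of(operand):
        # A variable assigned earlier reuses its value number; anything else is a leaf.
        if operand in var_vn:
            return var_vn[operand]
        return table.setdefault(("leaf", operand), len(table))

    for target, (op, left, right) in assignments:
        # Operand order is part of the key, so b*a and a*b get different numbers.
        key = (op, vn_of(left), vn_of(right))
        var_vn[target] = table.setdefault(key, len(table))
    return var_vn


# The four assignments from the slide; the conditional is skipped (non-deterministic).
vn = value_number([
    ("x", ("*", "b", "a")),
    ("y", ("*", "a", 3)),
    ("c", ("*", "a", "b")),
    ("z", ("*", "a", "b")),
])
print(vn["z"] == vn["c"], vn["x"] == vn["c"], vn["y"] == vn["c"])
# prints: True False False
```

Because the operator is treated as an opaque symbol, only syntactically identical computations share a value number, which is why z = c is found and x = c is not.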
Non-trivial Example
if (*)
  x := a; y := a; z := F(a);
else
  x := b; y := b; z := F(b);
assert(x = y); assert(z = F(y));
Existing Algorithms
• Algorithms that work on SSA form of the program
  – Alpern, Wegman, Zadeck’s (AWZ) algorithm: POPL 1988
    • Polynomial, incomplete
  – Rüthing, Knoop, Steffen’s (RKS) algorithm: SAS 1999
    • Polynomial, incomplete, improvement on AWZ
• Dataflow analysis or abstract interpretation based
  – Kildall’s algorithm: POPL 1973
    • Exponential, complete
  – Our algorithm: POPL 2004
    • Polynomial, complete, randomized
  – Our algorithm: this paper
    • Polynomial, complete
Why are SSA-based algorithms incomplete?
if (*)
  x := a; y := a; z := F(a);
else
  x := b; y := b; z := F(b);
assert(x = y); assert(z = F(y));
In SSA form, after the join:
x = φ(a,b)
y = φ(a,b)
z = φ(F(a),F(b))
F(y) = F(φ(a,b))
• AWZ algorithm: φ functions are uninterpreted
– Fails to discover the second assertion
• RKS algorithm: uses rewrite rules for normalization
– Does not discover all assertions in slightly more involved examples
– Rewrite rules are not applied exhaustively (otherwise exponentially many applications)
– Rules are pessimistic in handling loops
Abstract Interpretation based algorithm
• Assignment node (x := e): input G', output G = SP(G', x := e)
• Conditional node (*): input G', outputs G1 = G' and G2 = G'
• Join node: inputs G1' and G2', output G = Join(G1', G2')
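The three flow functions can be sketched as a single dispatch routine that is parameterized by the abstract domain. The function `transfer` and the toy domain below (a map from variables to known constants, with `toy_sp` and `toy_join`) are hypothetical stand-ins chosen only to make the sketch runnable; the talk's domain is the SED, not this map.

```python
def transfer(kind, inputs, sp, join, stmt=None):
    """Flow function for one flowchart node; sp and join come from the abstract domain."""
    if kind == "assign":       # one input, one output
        (g_in,) = inputs
        return [sp(g_in, stmt)]
    if kind == "cond":         # non-deterministic: both successors receive the input
        (g_in,) = inputs
        return [g_in, g_in]
    if kind == "join":         # two inputs, one output
        g1, g2 = inputs
        return [join(g1, g2)]
    raise ValueError(kind)


# Toy stand-in domain (an assumption for this sketch): variable -> known constant.
def toy_sp(state, stmt):
    var, const = stmt
    return {**state, var: const}

def toy_join(s1, s2):
    # Keep only the facts that hold on both incoming paths.
    return {v: c for v, c in s1.items() if s2.get(v) == c}


(g0,) = transfer("assign", [{}], toy_sp, toy_join, ("x", 1))
left, right = transfer("cond", [g0], toy_sp, toy_join)
(left,) = transfer("assign", [left], toy_sp, toy_join, ("y", 2))
(right,) = transfer("assign", [right], toy_sp, toy_join, ("y", 3))
(merged,) = transfer("join", [left, right], toy_sp, toy_join)
print(merged)
```

Only the fact that holds on both branches (x = 1) survives the join, which mirrors the role Join plays for SEDs.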
Outline
• Strong equivalence DAG (SED)
• The join operation: Idea #1
• Pruning an SED: Idea #2
• The strongest postcondition operation
• Fixed point computation
Representing Equivalences
a := 1; b := 2; x := F(1,2);
{ a, 1 } { b, 2 } { x, F(1,2) }
Listing every equivalent expression explicitly:
{ a, 1 } { b, 2 } { x, F(1,2), F(a,2), F(1,b), F(a,b) }
Such an explicit representation can be exponential in size.
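The growth can be made concrete with a short sketch that builds the explicit classes as sets of strings. The helper `explicit_class` and the extra assignments x1 := F(x,x), x2 := F(x1,x1) are hypothetical, added only to show how quickly the classes grow when terms are nested.

```python
from itertools import product

def explicit_class(var, left_class, right_class):
    """Explicit equivalence class of var after var := F(l, r), given the operand classes."""
    return {var} | {f"F({l},{r})" for l, r in product(left_class, right_class)}

a_class = {"a", "1"}
b_class = {"b", "2"}
x_class = explicit_class("x", a_class, b_class)

# Hypothetical continuation: x1 := F(x, x); x2 := F(x1, x1).
sizes = []
current = x_class
for i in range(3):
    sizes.append(len(current))
    current = explicit_class(f"x{i + 1}", current, current)
print(sorted(x_class))
print(sizes)   # [5, 26, 677]
```

Each level squares the class size (plus one for the new variable), so an explicit listing becomes impractical after only a few nested assignments.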
Strong Equivalence DAG (SED)
A data structure for representing equivalences.
• Nodes n: <Set of variables, Type>
• Type: c, ?, F(n1,n2)
• Terms(n): set of equivalent expressions
– Terms(<V, ?>) = V
– Terms(<V, c>) = V ∪ { c }
– Terms(<V, F(n1,n2)>) = V ∪ { F(e1,e2) | e1 ∈ Terms(n1), e2 ∈ Terms(n2) }
• ∀ variables x, ∃ at most one node <V,t> such that x ∈ V, called Node(x)
SED: Example
Nodes:
n1 = <{ a }, 2>
n2 = <{ b }, ?>
n3 = <{ c, d }, F(n1,n2)>
n4 = <{ e }, F(n3,n2)>
This SED represents the following partition:
Terms(n1) = { a, 2 }
Terms(n2) = { b }
Terms(n3) = { c, d, F(a,b), F(2,b) }
Terms(n4) = { e, F(c,b), F(d,b), F(F(a,b),b), F(F(2,b),b) }
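The example can be encoded directly, as a minimal sketch: an SED is assumed here to be a dictionary from node names to (variables, type) pairs, with types written as tuples. This encoding and the function name `terms` are choices made for the illustration; the children of n3 and n4 are read off from the Terms sets listed in the example.

```python
from itertools import product

# node name -> (variables, type); type is ("const", c), ("?",) or ("F", left, right)
sed = {
    "n1": ({"a"}, ("const", "2")),
    "n2": ({"b"}, ("?",)),
    "n3": ({"c", "d"}, ("F", "n1", "n2")),
    "n4": ({"e"}, ("F", "n3", "n2")),
}

def terms(sed, name):
    """Terms(n) as a set of strings; terminates because the graph is acyclic."""
    variables, typ = sed[name]
    result = set(variables)
    if typ[0] == "const":
        result.add(typ[1])
    elif typ[0] == "F":
        lefts, rights = terms(sed, typ[1]), terms(sed, typ[2])
        result |= {f"F({l},{r})" for l, r in product(lefts, rights)}
    return result

for name in sed:
    print(name, sorted(terms(sed, name)))
```

The four nodes stay constant in size while Terms(n4) already lists five expressions, which is the point of sharing subterms through node references.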
The Join Operation
G = Join(G1, G2)
G is obtained by product construction of G1 and G2.
If n = <V1,t1> ∈ G1 and m = <V2,t2> ∈ G2, then
[n,m] = <V1 ∩ V2, t1 ⊔ t2> ∈ G
Definition of t1 ⊔ t2:
c ⊔ c = c
F(l1,r1) ⊔ F(l2,r2) = F([l1,l2],[r1,r2])
t1 ⊔ t2 = ?, otherwise
Proof of correctness:
Terms([n,m]) = Terms(n) ∩ Terms(m)
(Thus product construction = partition intersection)
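The product construction can be sketched as follows, assuming an SED is a dictionary from node names to (variable set, type) pairs with types ("const", c), ("?",) or ("F", left, right). One implementation choice that is not on the slide: only product nodes reachable from some variable are materialized, rather than all pairs. The two input graphs are hypothetical.

```python
def node_of(g, x):
    """Name of the node whose variable set contains x, or None."""
    return next((n for n, (vs, _) in g.items() if x in vs), None)

def join(g1, g2):
    """Product construction: node [n, m] is stored under the key (n, m)."""
    out = {}

    def pair(n, m):
        if (n, m) not in out:
            (v1, t1), (v2, t2) = g1[n], g2[m]
            if t1[0] == "const" and t1 == t2:          # c join c = c
                typ = t1
            elif t1[0] == "F" and t2[0] == "F":        # F(l1,r1) join F(l2,r2)
                typ = ("F", pair(t1[1], t2[1]), pair(t1[2], t2[2]))
            else:                                      # everything else is unknown
                typ = ("?",)
            out[(n, m)] = (v1 & v2, typ)
        return (n, m)

    # Start from Node(x) in both graphs for every variable x they share.
    for n, (vs, _) in g1.items():
        for x in vs:
            m = node_of(g2, x)
            if m is not None:
                pair(n, m)
    return out


# Hypothetical inputs: in g1, a = b and c = F(a, a); in g2, c = F(a, b).
g1 = {"p": (frozenset({"a", "b"}), ("?",)),
      "q": (frozenset({"c"}), ("F", "p", "p"))}
g2 = {"p": (frozenset({"a"}), ("?",)),
      "r": (frozenset({"b"}), ("?",)),
      "q": (frozenset({"c"}), ("F", "p", "r"))}
g = join(g1, g2)
print(g[("q", "q")])
```

In the result, a and b fall into separate nodes (they are equal only in g1), while c keeps an F type whose children are Node(a) and Node(b), so c = F(a,b) survives as the one equivalence common to both inputs.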
Example: The Join Operation
[Figure: SEDs G1, G2, and G = Join(G1,G2) over variables y1 to y7. In all three, y1 and y2 label F-typed nodes; the remaining variables label ?-typed nodes. Edges and the exact grouping of y3 to y7 are not recoverable from the source.]
Motivation: The Prune Operation
• If G = Join(G1,G2), then Size(G) can be Size(G1) × Size(G2).
• There are programs where the size of SEDs after n joins is exponential in n.
Discovering equivalences among all expressions vs. discovering equivalences among program expressions
For the latter, it is sufficient to discover equivalences among all terms of size at most t at each program point (where t = #variables × size of program).
Thus, SEDs can be pruned to have a small size.
The Prune Operation
Prune(G,k)
• For each node <V,t>, check whether x ∈ V is equal to some F-term of size less than k.
• If not, delete all the nodes that are reachable only from <V,t>.
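The check in the first bullet can be sketched as below; the deletion step is not implemented. Several things here are assumptions rather than definitions from the talk: the size of a term is taken to be its number of F applications, variables and constants count as size 0, and the SED is a dictionary from node names to (variable set, type) pairs. The nodes n5 and n6 are hypothetical additions to the earlier example SED.

```python
import math

def smallest_term(sed, name):
    """Size of the smallest term in Terms(name); variables and constants have size 0."""
    variables, typ = sed[name]
    if variables or typ[0] == "const":
        return 0
    return smallest_f_term(sed, name)

def smallest_f_term(sed, name):
    """Size of the smallest F-term in Terms(name), or infinity if the node is not F-typed."""
    typ = sed[name][1]
    if typ[0] != "F":
        return math.inf
    return 1 + smallest_term(sed, typ[1]) + smallest_term(sed, typ[2])

def fails_size_check(sed, k):
    """F-typed nodes with variables whose smallest F-term is not smaller than k."""
    return {n for n, (vs, t) in sed.items()
            if vs and t[0] == "F" and not smallest_f_term(sed, n) < k}


sed = {
    "n1": ({"a"}, ("const", "2")),
    "n2": ({"b"}, ("?",)),
    "n3": ({"c", "d"}, ("F", "n1", "n2")),
    "n4": ({"e"}, ("F", "n3", "n2")),
    "n5": (set(), ("F", "n1", "n2")),       # hypothetical: unnamed F(a, b)
    "n6": ({"g"}, ("F", "n5", "n5")),       # hypothetical: g = F(F(a,b), F(a,b))
}
print(fails_size_check(sed, 2), fails_size_check(sed, 4))
```

With k = 2 only n6 fails the check, since its smallest F-term has three applications; the recursion is not memoized, which is acceptable for a sketch but not for large graphs.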
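Example: The Prune Operation
[Figure: an SED G and the result of Prune(G,2). Node labels in G: <{y1}, G>, <{y2}, F>, <{y4,y5}, ?>, <{y3}, ?>, <{y6}, ?>, <{y7}, ?>, plus unlabeled F nodes. Node labels after pruning: <{y1}, G>, <{y2}, ?>, <{y4,y5}, ?>. Edges are not recoverable from the source.]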
The Strongest Postcondition Operation
G = SP(G', x := e)
To obtain G from G', do:
• Delete label x from Node(x) in G'
• Let n = <V,t> be the node in G' such that e ∈ Terms(n)
(add such a node to G' if it does not already exist)
• Add x to V
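The three steps can be sketched as below, assuming an SED is a dictionary from node names to (variable set, type) pairs, variables are strings, constants are non-string values, and F-terms are tuples ("F", e1, e2). The sketch also assumes every variable in e already has a node in the input SED. The names `sp` and `node_for`, and the children chosen for the F node in the sample input, are hypothetical.

```python
def sp(g_in, x, e):
    """Return the SED after x := e; the input SED is left unmodified."""
    # Step 1: delete label x from Node(x) (done on a copy).
    g = {n: (set(vs) - {x}, t) for n, (vs, t) in g_in.items()}

    def node_for(expr):
        # Step 2: find the node n with expr in Terms(n), adding nodes when needed.
        if isinstance(expr, str):
            # Variables are looked up in the input, so e may mention x itself.
            return next(n for n, (vs, _) in g_in.items() if expr in vs)
        if isinstance(expr, tuple):
            typ = ("F", node_for(expr[1]), node_for(expr[2]))
        else:
            typ = ("const", expr)
        for n, (_, t) in g.items():
            if t == typ:
                return n
        name = f"new{len(g)}"
        g[name] = (set(), typ)
        return name

    # Step 3: add x to V.
    g[node_for(e)][0].add(x)
    return g


# Input in the spirit of the example slide; the children of the F node are an assumption.
g_in = {"n1": ({"z", "u"}, ("F", "n2", "n2")),
        "n2": ({"x"}, ("?",))}
g_out = sp(g_in, "u", ("F", "z", "x"))
print(g_out)
```

After the assignment, u leaves the node it shared with z and labels a fresh node of type F(Node(z), Node(x)), while the input dictionary still records the old state.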
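Example: The Strongest Postcondition Operation
G' has nodes <{ z, u }, F> and <{ x }, ?>.
G = SP(G', u := F(z,x)) has nodes <{ z }, F>, <{ x }, ?>, and <{ u }, F>, where the node for u is F applied to Node(z) and Node(x).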
Fixed Point Computation and Complexity
• The lattice of sets of equivalences (among uninterpreted function terms) has height at most k.
• Complexity
– Dominated by the cost of join operations
– # of join operations: O(j × k)
– Each join operation: O(k² × N)
• This requires doing pruning while computing the join
– Total cost: O(k³ × N × j)
k: # of variables
N: size of program
j: # of join points in program
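The total follows by multiplying the two bounds above. Since the lattice has height at most k, the SED at each of the j join points can change at most k times, which gives the O(j × k) bound on the number of joins. The numeric values in the second line are hypothetical and only illustrate the order of magnitude.

```latex
\underbrace{O(j \times k)}_{\text{number of joins}}
\;\times\;
\underbrace{O(k^{2} \times N)}_{\text{cost per join}}
\;=\; O(k^{3} \times N \times j)

% Hypothetical instance: k = 10, N = 100, j = 5
k^{3} \times N \times j = 1000 \times 100 \times 5 = 500{,}000
```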
Example
if (*)
  x := 1; y := 1; z := F(1,1);    [L1]
else
  x := 2; y := 2; z := F(2,2);    [L2]
[L3]
u := F(x,y);
[L4]
Assert(u = z);
G1 (at L1): <{ z }, F>, <{ x, y }, 1>
G2 (at L2): <{ z }, F>, <{ x, y }, 2>
G3 (at L3) = Join(G1,G2): <{ z }, F>, <{ x, y }, ?>
G4 (at L4) = Assignment(G3, u := F(x,y)): <{ u, z }, F>, <{ x, y }, ?>
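The example can be replayed end to end with a compact sketch of the join and the assignment step. The encoding is an assumption: an SED is a dictionary from node names to (variable set, type) pairs, with types ("const", c), ("?",) or ("F", left, right); the function names are illustrative, and the assignment step assumes every variable in e already has a node.

```python
def node_of(g, x):
    """Name of the node whose variable set contains x, or None."""
    return next((n for n, (vs, _) in g.items() if x in vs), None)

def join(g1, g2):
    """Product construction restricted to nodes reachable from shared variables."""
    out = {}
    def pair(n, m):
        if (n, m) not in out:
            (v1, t1), (v2, t2) = g1[n], g2[m]
            if t1[0] == "const" and t1 == t2:
                typ = t1
            elif t1[0] == "F" and t2[0] == "F":
                typ = ("F", pair(t1[1], t2[1]), pair(t1[2], t2[2]))
            else:
                typ = ("?",)
            out[(n, m)] = (v1 & v2, typ)
        return (n, m)
    for n, (vs, _) in g1.items():
        for x in vs:
            m = node_of(g2, x)
            if m is not None:
                pair(n, m)
    return out

def assign(g_in, x, e):
    """SED after x := e, where e is a variable, a constant, or ("F", e1, e2)."""
    g = {n: (set(vs) - {x}, t) for n, (vs, t) in g_in.items()}
    def node_for(expr):
        if isinstance(expr, str):                  # variable: its node in the input
            return node_of(g_in, expr)
        if isinstance(expr, tuple):
            typ = ("F", node_for(expr[1]), node_for(expr[2]))
        else:
            typ = ("const", expr)
        for n, (_, t) in g.items():
            if t == typ:
                return n
        name = f"new{len(g)}"
        g[name] = (set(), typ)
        return name
    g[node_for(e)][0].add(x)
    return g


G1 = {"c": ({"x", "y"}, ("const", 1)), "f": ({"z"}, ("F", "c", "c"))}
G2 = {"c": ({"x", "y"}, ("const", 2)), "f": ({"z"}, ("F", "c", "c"))}
G3 = join(G1, G2)
G4 = assign(G3, "u", ("F", "x", "y"))
print(node_of(G4, "u") == node_of(G4, "z"))   # True: the assertion u = z holds
```

The join forgets the constants 1 and 2 but keeps x and y in one node and z as F of that node, so the later assignment to u lands in the same node as z.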
Conclusion
• Idea #1: Join of 2 SEDs = product construction
• Idea #2: Prune SEDs (discovering equivalences among program expressions does not require computing equivalences involving large terms)
Future Work
• Inter-procedural value numbering
• Abstract interpretation for the combined theory of linear arithmetic and uninterpreted functions