A Polynomial-Time Algorithm for Global Value Numbering SAS 2004 Sumit Gulwani George C. Necula.

Global Value Numbering

Goal: Discover equivalent expressions in procedures

Applications:

• Compiler optimizations
  – Copy propagation, constant propagation, common subexpression elimination, induction variable elimination, etc.

• Program verification
  – Discover loop invariants, verify program assertions

• Discover equivalent computations across programs
  – Plagiarism detection tools, translation validation

Global Value Numbering

x := b × a;
y := a × 3;
c := a × b;
if (b == 3)    [branches: True / False]
z := a × b;

The equivalence problem is undecidable.

Simplifying assumptions:

• Operators are uninterpreted (will not discover x = c)

• Conditionals are non-deterministic (will not discover y = c)

• Will discover z = c

Non-trivial Example

*    (non-deterministic branch)
  branch 1: x := a; y := a; z := F(a);
  branch 2: x := b; y := b; z := F(b);
(join)
assert(x = y); assert(z = F(y));
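The example above can be checked one path at a time. The following is a minimal Python sketch, not part of the slides: it assumes the uninterpreted function F can be modeled as a constructor that only records its argument, and the helper name `run` is hypothetical.

```python
def F(arg):
    # Uninterpreted function: nothing is known about F except that
    # equal arguments give equal results, so we only record the argument.
    return ("F", arg)


def run(take_first_branch, a="a", b="b"):
    """Execute one path of the example and evaluate both assertions."""
    if take_first_branch:
        x, y, z = a, a, F(a)
    else:
        x, y, z = b, b, F(b)
    return x == y, z == F(y)


# Both assertions hold on both paths, so a complete analysis must prove
# them at the join point as well.
results = [run(True), run(False)]
print(results)
```

The second assertion is the hard one: it relates z to an F-term over y, which neither branch assigns syntactically.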

Existing Algorithms

• Algorithms that work on SSA form of the program
  – Alpern, Wegman, Zadeck's (AWZ) algorithm: POPL 1988
    • Polynomial, Incomplete
  – Rüthing, Knoop, Steffen's (RKS) algorithm: SAS 1999
    • Polynomial, Incomplete, Improvement on AWZ

• Dataflow analysis or abstract interpretation based
  – Kildall's algorithm: POPL 1973
    • Exponential, Complete
  – Our algorithm: POPL 2004
    • Polynomial, Complete, Randomized
  – Our algorithm: this paper
    • Polynomial, Complete

Why are SSA-based algorithms incomplete?

*    (non-deterministic branch)
  branch 1: x := a; y := a; z := F(a);
  branch 2: x := b; y := b; z := F(b);
(join)
assert(x = y); assert(z = F(y));

In SSA form, at the join point:
x = φ(a,b)
y = φ(a,b)
z = φ(F(a),F(b))
F(y) = F(φ(a,b))

• AWZ algorithm: φ functions are uninterpreted
  – Fails to discover the second assertion

• RKS algorithm: uses rewrite rules for normalization
  – Does not discover all assertions in slightly more involved examples
  – Rewrite rules are not applied exhaustively (that would take exponentially many applications)
  – Rules are pessimistic in handling loops

Abstract Interpretation based algorithm

Assignment Node: input G′, statement x := e
  G = SP(G′, x := e)

Conditional Node: input G′, non-deterministic test *
  G1 = G′
  G2 = G′

Join Node: inputs G1′ and G2′
  G = Join(G1′, G2′)

Outline

• Strong equivalence DAG (SED)

• The join operation: Idea #1

• Pruning an SED: Idea #2

• The strongest postcondition operation

• Fixed point computation

Representing Equivalences

a := 1; b := 2; x := F(1,2);

Recording only the assigned terms:
{ a, 1 } { b, 2 } { x, F(1,2) }

Recording every equivalent term:
{ a, 1 } { b, 2 } { x, F(1,2), F(a,2), F(1,b), F(a,b) }

Such an explicit representation can be exponential.
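The growth of the explicit representation can be made concrete with a small Python sketch. The helper `explicit_class` and the nested-term experiment are illustrative assumptions, not taken from the slides; only the first class (the four F-terms equal to x) comes from the example above.

```python
from itertools import product


def explicit_class(arg_classes, fn="F"):
    """All terms fn(e1, ..., en) where each ei ranges over its argument's class."""
    return {f"{fn}({','.join(args)})" for args in product(*arg_classes)}


# Classes from the slide: a is equal to 1, b is equal to 2.
class_a = ["1", "a"]
class_b = ["2", "b"]
terms_x = explicit_class([class_a, class_b])
print(sorted(terms_x))  # the four F-terms equivalent to x

# Hypothetical continuation: feed a class into F on both argument
# positions repeatedly. The class size squares at every nesting level.
level = ["1", "a"]
sizes = []
for depth in range(1, 4):
    level = sorted(explicit_class([level, level]))
    sizes.append(len(level))
print(sizes)
```

Each nesting level squares the class size, which is why a shared, implicit representation is needed.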

Strong Equivalence DAG (SED)

A data structure for representing equivalences.

• Nodes n: <Set of variables, Type>

• Type: c, ⊥, F(n1,n2)

• Terms(n): set of equivalent expressions
  – Terms(<V, ⊥>) = V
  – Terms(<V, c>) = V ∪ { c }
  – Terms(<V, F(n1,n2)>) = V ∪ { F(e1,e2) | e1 ∈ Terms(n1), e2 ∈ Terms(n2) }

• ∀ variables x, ∃ at most one node <V,t> s.t. x ∈ V
  – called Node(x)

SED: Example

n1 = <{a}, 2>
n2 = <{b}, ⊥>
n3 = <{c,d}, F(n1,n2)>
n4 = <{e}, F(n3,n2)>

This SED represents the following partition:

Terms(n1) = { a, 2 }

Terms(n2) = { b }

Terms(n3) = { c, d, F(a,b), F(2,b) }

Terms(n4) = { e, F(c,b), F(d,b), F(F(a,b),b), F(F(2,b),b) }
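The SED of this example and its Terms sets can be reproduced with a short Python sketch. The `Node` class and its field names are assumptions made for illustration (the slides define only the abstract structure), and the sketch supports a single binary function symbol F.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    variables: frozenset                           # V: variables labelling the node
    const: Optional[str] = None                    # set when the type is a constant c
    args: Optional[Tuple["Node", "Node"]] = None   # set when the type is F(n1, n2)
    # const and args both None: the type is bottom (unknown value)


def terms(n):
    """Terms(n) as defined for SEDs, returned as a set of strings."""
    result = set(n.variables)
    if n.const is not None:
        result.add(n.const)
    if n.args is not None:
        left, right = n.args
        result |= {f"F({e1},{e2})" for e1, e2 in product(terms(left), terms(right))}
    return result


# The four nodes of the example SED.
n1 = Node(frozenset({"a"}), const="2")
n2 = Node(frozenset({"b"}))
n3 = Node(frozenset({"c", "d"}), args=(n1, n2))
n4 = Node(frozenset({"e"}), args=(n3, n2))

print(sorted(terms(n3)))
print(sorted(terms(n4)))
```

Four small nodes stand for eleven terms here. Sharing n2 and n3 between parents is what keeps the DAG compact.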


The Join Operation

G = Join(G1, G2)

G is obtained by product construction of G1 and G2

If n = <V1,t1> ∈ G1 and m = <V2,t2> ∈ G2, then

[n,m] = <V1 ∩ V2, t1 ⊔ t2> ∈ G

Definition of t1 ⊔ t2

c ⊔ c = c
F(l1,r1) ⊔ F(l2,r2) = F([l1,l2],[r1,r2])
t1 ⊔ t2 = ⊥, otherwise

Proof of Correctness
Terms([n,m]) = Terms(n) ∩ Terms(m)
(Thus product construction = partition intersection)
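The product construction can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' implementation: an SED is encoded as a dictionary from node id to (variables, type), a type is `None` for ⊥, `("const", c)`, or `("F", left_id, right_id)`, and only one binary function symbol F is supported. Product nodes are built on demand, starting from the pairs [Node1(x), Node2(x)] for each variable x. The input SEDs are the ones for the two branches of the worked example near the end of the deck (x = y = 1, z = F(1,1) versus x = y = 2, z = F(2,2)).

```python
def join(g1, g2):
    """Join(G1, G2) by product construction; result ids are pairs (n, m)."""
    out = {}

    def pair(n, m):
        if (n, m) not in out:
            (v1, t1), (v2, t2) = g1[n], g2[m]
            if t1 and t2 and t1[0] == t2[0] == "F":
                # F(l1,r1) join F(l2,r2) = F([l1,l2],[r1,r2])
                t = ("F", pair(t1[1], t2[1]), pair(t1[2], t2[2]))
            elif t1 is not None and t1 == t2:
                t = t1            # c join c = c
            else:
                t = None          # bottom otherwise
            out[(n, m)] = (v1 & v2, t)
        return (n, m)

    node1 = {x: n for n, (vs, _) in g1.items() for x in vs}
    node2 = {x: m for m, (vs, _) in g2.items() for x in vs}
    for x in node1.keys() & node2.keys():
        pair(node1[x], node2[x])
    return out


g1 = {"p": ({"x", "y"}, ("const", "1")), "z": ({"z"}, ("F", "p", "p"))}
g2 = {"q": ({"x", "y"}, ("const", "2")), "z": ({"z"}, ("F", "q", "q"))}
g3 = join(g1, g2)
for node_id, node in g3.items():
    print(node_id, node)
```

The constants 1 and 2 disagree, so the node for x and y drops to ⊥, while z keeps its F type over that node. That is why u := F(x,y) can later be proved equal to z.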

Example: The Join Operation

[Figure: SEDs G1 and G2 over variables y1 to y7, built from F-typed and ⊥-typed nodes, and their join G = Join(G1,G2).]


Motivation: The Prune Operation

• If G = Join(G1,G2), then Size(G) can be Size(G1) × Size(G2).

• There are programs where the size of the SEDs after n joins is exponential in n.

Discovering equivalences among all expressions
vs.
Discovering equivalences among program expressions

For the latter, it is sufficient to discover equivalences among all terms of size at most t at each program point (where t = #variables × size of program).

Thus, SEDs can be pruned to have a small size.

The Prune Operation

Prune(G,k)

• For each node <V,t>, check whether a variable x ∈ V is equal to some F-term of size less than k.

• If not, then delete all the nodes that are reachable only from <V,t>.

Example: The Prune Operation

[Figure: an SED G over variables y1 to y7 and the smaller SED Prune(G,2), in which the node labeled y2 has type ⊥ instead of F.]


The Strongest Postcondition Operation

G = SP(G′, x := e)

To obtain G from G′, do:

• Delete label x from Node(x) in G′

• Let n = <V,t> be the node in G′ s.t. e ∈ Terms(n)
  (Add such a node to G′ if it does not already exist)

• Add x to V.
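The three steps above can be sketched in Python. The encoding is an assumption for illustration: an SED is a dictionary from node id to (variables, type), a type is `None` for ⊥, `("const", c)`, or `("F", left_id, right_id)`, and an expression is a variable name, `("const", c)`, or `("F", e1, e2)`. The sketch also assumes that x does not occur in e, and the ids given to newly added nodes are hypothetical. The input is the SED after the join in the worked example near the end of the deck, with the assignment u := F(x,y).

```python
def sp(g, x, e):
    """Sketch of SP for the assignment x := e; assumes x does not occur in e."""
    g = {n: (set(vs), t) for n, (vs, t) in g.items()}   # work on a copy

    def fresh(vs, t):
        nid = ("new", len(g))      # hypothetical id scheme for added nodes
        g[nid] = (vs, t)
        return nid

    def find(expr):
        """Return the node n with expr in Terms(n), adding it if missing."""
        if isinstance(expr, str):                        # a variable
            for n, (vs, _) in g.items():
                if expr in vs:
                    return n
            return fresh({expr}, None)
        if expr[0] == "const":
            t = expr
        else:                                            # ("F", e1, e2)
            t = ("F", find(expr[1]), find(expr[2]))
        for n, (_, nt) in g.items():
            if nt == t:
                return n
        return fresh(set(), t)

    for vs, _ in g.values():
        vs.discard(x)              # step 1: delete label x from Node(x)
    target = find(e)               # step 2: node n with e in Terms(n)
    g[target][0].add(x)            # step 3: add x to V
    return g


g3 = {"P": ({"x", "y"}, None), "Z": ({"z"}, ("F", "P", "P"))}
g4 = sp(g3, "u", ("F", "x", "y"))
print(g4)
```

Both x and y resolve to node P, so F(x,y) matches the existing node of z and u is simply added to its label. No new node is created in this case.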

Example: The Strongest Postcondition Operation

G′: <{z,u}, F>, <{x}, ⊥>

G = SP(G′, u := F(z,x)): <{z}, F>, <{x}, ⊥>, <{u}, F(Node(z), Node(x))>


Fixed Point Computation and Complexity

• The lattice of sets of equivalences (among uninterpreted function terms) has height at most k.

• Complexity
  – Dominated by the cost of join operations
  – # of join operations: O(j × k)
  – Each join operation: O(k² × N)
    • This requires doing pruning while computing join
  – Total cost: O(k³ × N × j)

k: # of variables
N: size of program
j: # of join points in program

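Example

*    (non-deterministic branch)
  branch 1: x := 1; y := 1; z := F(1,1);    [L1]
  branch 2: x := 2; y := 2; z := F(2,2);    [L2]
(join)                                       [L3]
u := F(x,y);                                 [L4]
Assert(u = z);

G1 (at L1): n = <{x,y}, 1>, <{z}, F(n,n)>

G2 (at L2): n = <{x,y}, 2>, <{z}, F(n,n)>

G3 = Join(G1,G2) (at L3): n = <{x,y}, ⊥>, <{z}, F(n,n)>

G4 = Assignment(G3, u := F(x,y)) (at L4): n = <{x,y}, ⊥>, <{u,z}, F(n,n)>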

Conclusion

• Idea #1: Join of 2 SEDs = product construction

• Idea #2: Prune SEDs (discovering equivalences among program expressions does not require computing equivalences involving large terms)

Future Work

• Inter-procedural value numbering

• Abstract interpretation for combined theory of linear arithmetic and uninterpreted functions
