Data Flow Analysis 3 15-411 Compiler Design Nov. 8, 2005.


Key Reference on Global Optimization

Gary A. Kildall, A Unified Approach to Global Program Optimization, ACM Symposium on Principles of Programming Languages, 1973, pages 194-206.

From the abstract:

“A technique is presented for global analysis of object code generated for expressions. The global expression optimization presented includes constant propagation, common sub-expression elimination, elimination of redundant register load operations and live expression analysis. A general purpose program flow analysis algorithm is developed which depends on an optimizing function. The algorithm is defined formally using a directed graph model of program flow structure and is shown to be correct. …”

Kildall’s Contribution

• A number of techniques had been developed for compile-time optimization to
  - locate redundant computations,
  - perform constant computations,
  - reduce the number of store-load sequences, etc.

• Some analyzed only straight-line sequences of instructions; others tried to take program branching into account.

• Kildall gave a single unified flow analysis algorithm that extended all the straight-line techniques to handle branching.

• He stated the algorithm formally and proved it correct in his POPL paper.

Constant Propagation – Example program

begin
  integer i, a, b, c, d, e;
  a := 1; c := 0; …
  for i := 1 step 1 until 10 do
    begin b := 2; …
      d := a + b; …
      e := b + c; …
      c := 4; …
    end
end

Directed Graph Representation

Nodes represent sequences of instructions with no branches. Edges represent control flow between nodes.
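The example's program graph can be written down as data. This Python sketch assumes one node per assignment, A through F, with F → C as the loop back-edge, which is consistent with the paths used later in this section; the loop-control variable i and the loop exit are omitted. The encoding and the names STATEMENTS, NODES, ENTRY_NODES and EDGES are hypothetical.

```python
# Hypothetical encoding of the example program graph.
# Assumption: one node per assignment; the loop control on i and the loop exit are omitted.
# Each node maps to (target variable, operands); the right-hand side is the sum of
# the operands, where an operand is an int literal or a variable name.
STATEMENTS = {
    "A": ("a", [1]),         # a := 1
    "B": ("c", [0]),         # c := 0
    "C": ("b", [2]),         # b := 2
    "D": ("d", ["a", "b"]),  # d := a + b
    "E": ("e", ["b", "c"]),  # e := b + c
    "F": ("c", [4]),         # c := 4
}
NODES = set(STATEMENTS)
ENTRY_NODES = {"A"}
# Control-flow edges; F -> C is the back-edge of the for loop.
EDGES = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "C")}
```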

Constant Propagation

It is convenient to associate a pool of propagated constants with each node in the graph.

A pool is a set of ordered pairs (v, c), each indicating that variable v has the constant value c whenever the node is encountered.

The pool at node B, denoted $P_B$, consists of the single element (a, 1), since the assignment a := 1 must occur before B.

Constant Propagation (cont.)

The fundamental problem of constant propagation is to determine the pool of constants for each node in an arbitrary program graph.

By inspection of the program graph for the example, the pool of constants at each node is

$P_A = \emptyset$, $P_B = \{(a, 1)\}$, $P_C = \{(a, 1)\}$, $P_D = \{(a, 1), (b, 2)\}$,

$P_E = \{(a, 1), (b, 2), (d, 3)\}$, $P_F = \{(a, 1), (b, 2), (d, 3)\}$.


$P_N$ may be determined for each node N in the graph as follows:

Consider each path $(A, p_1, p_2, \ldots, p_n, N)$. Apply constant propagation along the path to obtain the set of constants at node N.

The intersection of these sets over all paths to N is the set of constants that can be assumed for optimization.

(It is unknown which path will be taken at execution time, so the intersection is the conservative choice.)

Global Analysis Algorithm--Informal

• Start with an entry node in the program graph, along with a given entry pool corresponding to this entry node.

• Process the entry node and produce optimization information for all immediate successors of the entry node.

• Intersect the incoming optimizing pools with the pools already established at the successor nodes.

(The first time a node is encountered, take the incoming pool as its first approximation and continue processing.)

• For each successor, if this intersection reduces the amount of optimizing information, process the successor like the initial entry node.

Global Analysis Algorithm (cont.)

It is useful to define an optimizing function f that maps an input pool together with a particular node to a new output pool.

Given a set of propagated constants, it is possible to examine the operation of a particular node and determine the set of constants that can be assumed after the node is executed.

In the case of constant propagation, let V be a set of variables, C a set of constants, and $\mathcal{N}$ the set of nodes in the graph.

The set $U = V \times C$ represents the ordered pairs that may appear in any constant pool.

In fact, all constant pools are elements of the power set of U, denoted $\mathcal{P}(U)$.

Thus $f : \mathcal{N} \times \mathcal{P}(U) \to \mathcal{P}(U)$, where $(v, c) \in f(N, P)$ if and only if


1. $(v, c) \in P$ and the operation at node N does not assign a new value to the variable v, or

2. the operation at N assigns an expression to the variable v, and the expression evaluates to the constant c.
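A minimal Python sketch of this f, assuming a hypothetical encoding in which each node of the example holds one assignment whose right-hand side is a sum of integer literals and variables (enough for a := 1, …, c := 4). Rule 1 keeps the pairs for variables the node does not assign; rule 2 adds a pair for the assigned variable when its expression evaluates to a constant. The names Pool, STATEMENTS and f are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

Pool = FrozenSet[Tuple[str, int]]  # a constant pool: set of (variable, constant) pairs

# Hypothetical node contents for the example: target := sum of operands.
STATEMENTS: Dict[str, Tuple[str, List[Union[int, str]]]] = {
    "A": ("a", [1]), "B": ("c", [0]), "C": ("b", [2]),
    "D": ("d", ["a", "b"]), "E": ("e", ["b", "c"]), "F": ("c", [4]),
}

def f(node: str, pool: Pool) -> Pool:
    """Constant-propagation optimizing function: pool on entry to node -> pool on exit."""
    target, operands = STATEMENTS[node]
    # Rule 1: keep (v, c) when the node does not assign to v.
    out = {(v, c) for (v, c) in pool if v != target}
    # Rule 2: if every operand is a literal or a known constant, target gets the sum.
    known = dict(pool)
    values = [op if isinstance(op, int) else known.get(op) for op in operands]
    if all(val is not None for val in values):
        out.add((target, sum(values)))
    return frozenset(out)

print(sorted(f("A", frozenset())))            # [('a', 1)]
print(sorted(f("B", frozenset({("a", 1)}))))  # [('a', 1), ('c', 0)]
```

The two printed pools are the values of f(A, ∅) and f(B, {(a, 1)}) for the example.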


Optimization Function for Example

The optimizing function can be applied to node A with an empty constant pool, resulting in

$f(A, \emptyset) = \{(a, 1)\}$.

The function can be applied to B with $\{(a, 1)\}$ as the constant pool, yielding

$f(B, \{(a, 1)\}) = \{(a, 1), (c, 0)\}$.

Extending f to Paths in the Graph

Given a path from the entry node A to an arbitrary node N, the optimizing pool for the path is determined by composing the function f.

For example, $f(C, f(B, f(A, \emptyset))) = \{(a, 1), (c, 0), (b, 2)\}$ is the constant pool for D along the path (A, B, C, D).

The pool of propagated constants at node D can be determined as follows:

A path from the entry node A to node D is (A, B, C, D). For this path the first approximation to the pool for D is

$P_{D,1} = \{(a, 1), (b, 2), (c, 0)\}$.

A longer path from A to D is (A, B, C, D, E, F, C, D), which results in the pool

$P_{D,2} = \{(a, 1), (b, 2), (c, 4), (d, 3), (e, 2)\}$.

Successively longer paths from A to D can be evaluated, resulting in $P_{D,3}, P_{D,4}, \ldots, P_{D,n}$ for arbitrarily large n.

The pool of constants that can be assumed no matter what flow of control occurs is the set of constants common to all $P_{D,i}$, i.e.

$\bigcap_i P_{D,i}$

This procedure is not effective, since the number of such paths may have no finite bound, and the procedure would not halt.
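Composing f along a path can be sketched in Python with functools.reduce, folding f over every node of the path except the last, so that the result is the pool on entry to the final node. The graph encoding (one assignment per node, right-hand side a sum) and the names STATEMENTS, f and path_pool are hypothetical.

```python
from functools import reduce

# Hypothetical node contents for the example: target := sum of operands.
STATEMENTS = {
    "A": ("a", [1]), "B": ("c", [0]), "C": ("b", [2]),
    "D": ("d", ["a", "b"]), "E": ("e", ["b", "c"]), "F": ("c", [4]),
}

def f(node, pool):
    """Constant-propagation optimizing function (rules 1 and 2)."""
    target, operands = STATEMENTS[node]
    out = {(v, c) for (v, c) in pool if v != target}           # rule 1
    known = dict(pool)
    values = [op if isinstance(op, int) else known.get(op) for op in operands]
    if all(val is not None for val in values):                 # rule 2
        out.add((target, sum(values)))
    return frozenset(out)

def path_pool(path, entry_pool=frozenset()):
    """f(p_{n-1}, ... f(p_1, P) ...): the pool on entry to the last node of path."""
    return reduce(lambda pool, node: f(node, pool), path[:-1], entry_pool)

P_D1 = path_pool(["A", "B", "C", "D"])
P_D2 = path_pool(["A", "B", "C", "D", "E", "F", "C", "D"])
print(sorted(P_D1))         # [('a', 1), ('b', 2), ('c', 0)]
print(sorted(P_D2))         # [('a', 1), ('b', 2), ('c', 4), ('d', 3), ('e', 2)]
print(sorted(P_D1 & P_D2))  # [('a', 1), ('b', 2)]
```

The intersection of the two path pools already equals the pool $P_D$ found by inspection.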

Computing the Pool of Optimizing Information

The pool of optimizing information that can be assumed at node N in the graph, independent of the path taken at execution time, is

$P_N = \bigcap \{ x \mid x \in F_N \}$,

where $F_N = \{ f(p_n, f(p_{n-1}, \ldots, f(p_1, P) \ldots)) \mid (p_1, p_2, \ldots, p_n, N)$ is a path from an entry node $p_1$ with corresponding entry pool $P$ to node $N \}$.
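$F_N$ is generally infinite, so it cannot be enumerated directly. As an illustration only, this Python sketch enumerates the paths with at most max_len edges from the entry node A and intersects their pools. Because only a finite subset of $F_N$ is examined, the result may contain pairs that the true $P_N$ lacks, never fewer; for this graph a bound of 7, the length of (A, B, C, D, E, F, C, D), already gives the pool at D found by inspection. The graph encoding and all names (paths_to, bounded_pool, max_len, …) are hypothetical.

```python
from functools import reduce

# Hypothetical encoding of the example graph (loop control on i omitted).
STATEMENTS = {
    "A": ("a", [1]), "B": ("c", [0]), "C": ("b", [2]),
    "D": ("d", ["a", "b"]), "E": ("e", ["b", "c"]), "F": ("c", [4]),
}
EDGES = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "C")}
ENTRY_NODES = {"A"}  # each entry node starts with the empty pool

def f(node, pool):
    """Constant-propagation optimizing function (rules 1 and 2)."""
    target, operands = STATEMENTS[node]
    out = {(v, c) for (v, c) in pool if v != target}
    known = dict(pool)
    values = [op if isinstance(op, int) else known.get(op) for op in operands]
    if all(val is not None for val in values):
        out.add((target, sum(values)))
    return frozenset(out)

def path_pool(path, entry_pool=frozenset()):
    """Compose f along path[:-1]: the pool on entry to the last node."""
    return reduce(lambda pool, node: f(node, pool), path[:-1], entry_pool)

def paths_to(target, max_len):
    """Yield every path from an entry node to target with at most max_len edges."""
    stack = [[e] for e in ENTRY_NODES]
    while stack:
        path = stack.pop()
        if path[-1] == target:
            yield path
        if len(path) - 1 < max_len:
            stack.extend(path + [dst] for (src, dst) in EDGES if src == path[-1])

def bounded_pool(target, max_len):
    """Intersection of the pools of all enumerated paths; None if there are none."""
    pools = [path_pool(p) for p in paths_to(target, max_len)]
    return frozenset.intersection(*pools) if pools else None

print(sorted(bounded_pool("D", 3)))  # [('a', 1), ('b', 2), ('c', 0)]
print(sorted(bounded_pool("D", 7)))  # [('a', 1), ('b', 2)]
```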

Directed Graphs and Paths

A finite directed graph $G = \langle \mathcal{N}, E \rangle$ consists of an arbitrary finite set of nodes $\mathcal{N}$ and a set of edges $E \subseteq \mathcal{N} \times \mathcal{N}$.

A path from node A to node B in G is a sequence $(p_1, p_2, \ldots, p_k)$ such that $p_1 = A$, $p_k = B$, and $(p_i, p_{i+1}) \in E$ for $1 \le i < k$.

The length of the path is $k - 1$.
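The path definition translates directly into a check on consecutive pairs. A minimal Python sketch, with the edge set of the example graph encoded as hypothetical data (edges as pairs of node names); is_path and path_length are illustrative names.

```python
def is_path(seq, start, end, edges):
    """True if seq = (p1, ..., pk) is a path from start to end: p1 = start,
    pk = end, and (p_i, p_{i+1}) is an edge for 1 <= i < k."""
    return (
        len(seq) >= 1
        and seq[0] == start
        and seq[-1] == end
        and all((seq[i], seq[i + 1]) in edges for i in range(len(seq) - 1))
    )

def path_length(seq):
    """A path with k nodes has length k - 1, its number of edges."""
    return len(seq) - 1

# Hypothetical edge set of the example graph; F -> C is the loop back-edge.
EDGES = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "C")}
loop_path = ("A", "B", "C", "D", "E", "F", "C", "D")
print(is_path(loop_path, "A", "D", EDGES), path_length(loop_path))  # True 7
```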

Program Graphs

A program graph is a finite directed graph G with a non-empty set of entry nodes $I \subseteq \mathcal{N}$.

Given $N \in \mathcal{N}$, we assume there exists a path $(p_1, p_2, \ldots, p_n)$ such that $p_1 \in I$ and $p_n = N$.

(That is, every node in the graph can be reached by a path from some entry node.)
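The reachability assumption can be checked with a breadth-first search from the entry nodes. A minimal Python sketch; the function name is_program_graph and the encoding of the example graph are hypothetical.

```python
from collections import deque

def is_program_graph(nodes, edges, entries):
    """True if entries is a non-empty subset of nodes and every node is
    reachable from some entry node (breadth-first search along the edges)."""
    if not entries or not entries <= nodes:
        return False
    seen = set(entries)
    queue = deque(entries)
    while queue:
        n = queue.popleft()
        for (src, dst) in edges:
            if src == n and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen == nodes

# Hypothetical encoding of the example graph.
NODES = {"A", "B", "C", "D", "E", "F"}
EDGES = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "C")}
print(is_program_graph(NODES, EDGES, {"A"}))  # True
print(is_program_graph(NODES, EDGES, {"C"}))  # False: A and B are unreachable from C
```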

Successors and Predecessors of a Node

The set of immediate successors of a node N is given by

$I(N) = \{ N' \in \mathcal{N} \mid (N, N') \in E \}$.

The set of immediate predecessors of N is given by

$I^{-1}(N) = \{ N' \in \mathcal{N} \mid (N', N) \in E \}$.
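Both sets are simple filters over the edge set. A minimal Python sketch using a hypothetical encoding of the example graph; successors and predecessors are illustrative names for $I(N)$ and $I^{-1}(N)$.

```python
def successors(node, edges):
    """I(N): the nodes N' with (N, N') in E."""
    return {dst for (src, dst) in edges if src == node}

def predecessors(node, edges):
    """I^-1(N): the nodes N' with (N', N) in E."""
    return {src for (src, dst) in edges if dst == node}

# Hypothetical edge set of the example graph.
EDGES = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "C")}
print(sorted(successors("F", EDGES)), sorted(predecessors("C", EDGES)))  # ['C'] ['B', 'F']
```

Node C, the loop header, is the only node with two predecessors, which is why its pool is the intersection of two incoming pools.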

Meet-Semilattices

Let the finite set L be the set of all possible optimizing pools for a given application.

Let $\wedge$ be a meet operation with the properties:

$\wedge : L \times L \to L$

$x \wedge y = y \wedge x$

$x \wedge (y \wedge z) = (x \wedge y) \wedge z$

where $x, y, z \in L$. The set L and the $\wedge$ operation define a finite meet-semilattice.
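For constant propagation, L can be taken as the power set of a finite set of (variable, constant) pairs with intersection as the meet. This Python sketch checks the listed laws by brute force on a small such L; it also checks idempotence ($x \wedge x = x$), the usual third semilattice law, which the list above does not state. The universe of pairs and all function names are hypothetical.

```python
from itertools import chain, combinations

def power_set(universe):
    """All subsets of universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(s) for s in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def is_finite_meet_semilattice(L, meet):
    """Brute-force check that meet is closed, commutative and associative on L,
    plus idempotent (the usual third semilattice law)."""
    members = set(L)
    closed = all(meet(x, y) in members for x in L for y in L)
    commutative = all(meet(x, y) == meet(y, x) for x in L for y in L)
    associative = all(meet(x, meet(y, z)) == meet(meet(x, y), z)
                      for x in L for y in L for z in L)
    idempotent = all(meet(x, x) == x for x in L)
    return closed and commutative and associative and idempotent

L = power_set({("a", 1), ("b", 2), ("c", 0)})
print(len(L), is_finite_meet_semilattice(L, frozenset.intersection))  # 8 True
```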

Ordering on Meet-Semilattices

The $\wedge$ operation defines a partial ordering on L by

$x \le y$ if and only if $x \wedge y = x$.

Similarly,

$x < y$ if and only if $x \le y$ and $x \ne y$.
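The ordering can be computed from the meet alone. In this Python sketch the meet is set intersection, for which $x \le y$ coincides with $x \subseteq y$: a smaller pool carries less optimizing information. The pools used are hypothetical examples; leq and lt are illustrative names.

```python
def leq(x, y, meet):
    """x <= y iff x ∧ y = x."""
    return meet(x, y) == x

def lt(x, y, meet):
    """x < y iff x <= y and x != y."""
    return leq(x, y, meet) and x != y

meet = frozenset.intersection
small = frozenset({("a", 1)})
big = frozenset({("a", 1), ("b", 2)})
print(leq(small, big, meet), lt(small, big, meet), lt(big, big, meet))  # True True False
```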

Generalized Meet Operation

If $X \subseteq L$, the generalized meet operation $\bigwedge X$ is defined as the pairwise application of $\wedge$ to the elements of X.

L is assumed to have a “zero element” 0 such that $0 \le x$ for all $x \in L$.

An augmented set L' is constructed from L by adding a “unit element” 1 such that $1 \notin L$ and $1 \wedge x = x$ for all $x \in L$.

The set $L' = L \cup \{1\}$. It follows that $x < 1$ for all $x \in L$.
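A minimal Python sketch of the generalized meet over $L'$, assuming pools are frozensets with intersection as $\wedge$ and using None as a hypothetical stand-in for the unit element 1. Starting the pairwise application from 1 makes the meet of an empty X equal to 1; that convention is an assumption of this sketch. With intersection, the zero element is the empty pool.

```python
from functools import reduce

TOP = None  # hypothetical stand-in for the unit element 1, which is not a member of L

def meet(x, y):
    """Meet on L' = L ∪ {1}: 1 ∧ x = x; on pools it is set intersection."""
    if x is TOP:
        return y
    if y is TOP:
        return x
    return x & y

def generalized_meet(X):
    """∧X as pairwise application of meet, starting from the unit element 1."""
    return reduce(meet, X, TOP)

pools = [frozenset({("a", 1), ("c", 0)}), frozenset({("a", 1), ("c", 4)}), TOP]
print(sorted(generalized_meet(pools)))  # [('a', 1)]
```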

Optimizing Function

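An “optimizing function” f is defined as

$f : \mathcal{N} \times L \to L$.

It must have the homomorphism property:

$f(N, x \wedge y) = f(N, x) \wedge f(N, y)$ for all $N \in \mathcal{N}$ and $x, y \in L$.

Note that $f(N, x) < 1$ for all $N \in \mathcal{N}$ and $x \in L$, since f maps into L and therefore never yields the unit element 1.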
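Over a small finite L, the homomorphism property can be checked by brute force. This Python sketch swaps in a hypothetical gen/kill function $f(N, x) = (x - KILL[N]) \cup GEN[N]$ with made-up GEN and KILL sets (not taken from the example program) and intersection as the meet; it is an illustration of the property, not the constant-propagation f above.

```python
from itertools import chain, combinations

def power_set(universe):
    """All subsets of universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(s) for s in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def has_homomorphism_property(f, nodes, L, meet):
    """Brute-force check of f(N, x ∧ y) == f(N, x) ∧ f(N, y) for all N, x, y."""
    return all(f(n, meet(x, y)) == meet(f(n, x), f(n, y))
               for n in nodes for x in L for y in L)

# Hypothetical gen/kill sets for two hypothetical nodes N1 and N2.
GEN = {"N1": frozenset({"p"}), "N2": frozenset()}
KILL = {"N1": frozenset({"q"}), "N2": frozenset({"p", "r"})}

def gen_kill(node, x):
    """f(N, x) = (x - KILL[N]) ∪ GEN[N]."""
    return (x - KILL[node]) | GEN[node]

L = power_set({"p", "q", "r"})
print(has_homomorphism_property(gen_kill, GEN, L, frozenset.intersection))  # True
```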

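Global Analysis Algorithm

Global analysis starts with an entry pool set $EP \subseteq I \times L$, where $(e, x) \in EP$ if $e \in I$ is an entry node with optimizing pool $x \in L$. The algorithm keeps a list $\mathcal{L}$ of pending (node, pool) pairs.

A1 [Initialize] $\mathcal{L} := EP$.

A2 [Terminate?] If $\mathcal{L} = \emptyset$, then halt.

A3 [Select node] Choose an element $(N, P_i) \in \mathcal{L}$, for some $N \in \mathcal{N}$ and $P_i \in L$, and remove it: $\mathcal{L} := \mathcal{L} - \{(N, P_i)\}$.

A4 [Traverse] Let $P_N$ be the current approximate pool for node N (initially $P_N = 1$). If $P_N \le P_i$, then go to step A2.

A5 [Set pool] $P_N := P_N \wedge P_i$ and $\mathcal{L} := \mathcal{L} \cup \{(N', f(N, P_N)) \mid N' \in I(N)\}$.

A6 [Loop] Go to step A2.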
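A minimal Python sketch of steps A1 through A6 for constant propagation, assuming a Python list as the work list (popping from its end; the steps allow any selection order), None as a stand-in for the unit element 1, set intersection as $\wedge$, and a hypothetical encoding of the example graph with one assignment per node. Under these assumptions it yields $P_A = \emptyset$ for the entry node and, for B through F, the pools listed by inspection for the example. All names are illustrative.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

Pool = FrozenSet[Tuple[str, int]]
TOP = None  # stands for the unit element 1 (no information yet)

# Hypothetical encoding of the example graph (loop control on i omitted).
STATEMENTS = {
    "A": ("a", [1]), "B": ("c", [0]), "C": ("b", [2]),
    "D": ("d", ["a", "b"]), "E": ("e", ["b", "c"]), "F": ("c", [4]),
}
EDGES = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "C")}

def f(node: str, pool: Pool) -> Pool:
    """Constant-propagation optimizing function (rules 1 and 2)."""
    target, operands = STATEMENTS[node]
    out = {(v, c) for (v, c) in pool if v != target}
    known = dict(pool)
    values = [op if isinstance(op, int) else known.get(op) for op in operands]
    if all(val is not None for val in values):
        out.add((target, sum(values)))
    return frozenset(out)

def meet(x: Optional[Pool], y: Pool) -> Pool:
    return y if x is TOP else x & y

def leq(x: Optional[Pool], y: Pool) -> bool:
    return meet(x, y) == x  # x <= y iff x ∧ y = x; never true for x = 1

def kildall(entry_pools: Iterable[Tuple[str, Pool]]) -> Dict[str, Optional[Pool]]:
    pools: Dict[str, Optional[Pool]] = {n: TOP for n in STATEMENTS}  # initially P_N = 1
    work: List[Tuple[str, Pool]] = list(entry_pools)                 # A1: list := EP
    while work:                                                       # A2: halt when empty
        node, p_i = work.pop()                                        # A3: select and remove
        if leq(pools[node], p_i):                                     # A4: nothing new
            continue
        pools[node] = meet(pools[node], p_i)                          # A5: P_N := P_N ∧ P_i
        out = f(node, pools[node])
        work.extend((succ, out) for (src, succ) in EDGES if src == node)
    return pools                                                      # A6 is the while loop

result = kildall([("A", frozenset())])
for n in sorted(result):
    print(n, sorted(result[n]))
# A []
# B [('a', 1)]
# C [('a', 1)]
# D [('a', 1), ('b', 2)]
# E [('a', 1), ('b', 2), ('d', 3)]
# F [('a', 1), ('b', 2), ('d', 3)]
```

Because pools only shrink in a finite lattice, each node's pool can change only finitely often, so the loop terminates even though the graph contains a cycle.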
