Top Banner
Course Outline • Traditional Static Program Analysis – Theory • Compiler Optimizations; Control Flow Graphs • Data-flow Analysis: Data-flow frameworks – Classic analyses and applications • Software Testing • Dynamic Program Analysis
24

Course Outline

Jan 06, 2016

Download

Documents

ziv

Course Outline. Traditional Static Program Analysis Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis: Data-flow frameworks Classic analyses and applications Software Testing Dynamic Program Analysis. Announcements. No class on Monday Tuesday follows Tuesday schedule - PowerPoint PPT Presentation
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Course Outline

Course Outline• Traditional Static Program Analysis

– Theory• Compiler Optimizations; Control Flow Graphs

• Data-flow Analysis: Data-flow frameworks

– Classic analyses and applications

• Software Testing

• Dynamic Program Analysis

Page 2: Course Outline

Announcements

• No class on Monday – Tuesday follows Tuesday schedule

• Homework 1 due today

• Homework 2posted– Due Monday, February 28th

• Requests for CS accounts sent to labstaff

Page 3: Course Outline

Outline

• Data-flow frameworks– Monotone frameworks– Distributive frameworks– A Non-distributive example: Points-to Analysis– The “Maximal Fixed Point” (MFP) solution– The “Meet Over all Paths” (MOP) solution

• Reading: Compilers: Principles, Techniques and Tools, by Aho, Lam, Sethi and Ullman, Chapter 9.3

Page 4: Course Outline

Monotone Dataflow Frameworks

• Generic data-flow equations:in(i) = V out(m) out(i) = fi (in(i))

Parameters: – Property space: in(i), out(i) are elements of a property space

• Combination operator V: U for may problems and ∩ for must problems

• Initial values set to the 0 (smallest element) of the property space

– Transfer functions: fi is associated with node i

– If we instantiate these parameters in a certain way, then our analysis is an instance of the monotone dataflow framework

m in pred(i)

Page 5: Course Outline

Monotone Frameworks: Requirements

• The property space Is a complete lattice L under partial order ≤ where L satisfies the Ascending Chain Condition (i.e., all ascending chains are finite)

• The combination operator V Is the join (V, pronounced “vee”) of L

• Initial values set to the 0 of L

– Reaching Definitions: Property space? Combination operator?– Available Expressions: Property space? Combination operator?

Page 6: Course Outline

Monotone Frameworks: Requirements

• The transfer functions: fi: L L

Formally, there is space F such that – F contains all fi

– F contains the identity function id(x) = x– F is closed under composition

– Each fi is monotone

Page 7: Course Outline

Monotonicity

• It is defined as

(1) a ≤ b f(a) ≤ f(b)

• An equivalent definitions is (2) f(x) V f(y) ≤ f(x V y)

• Lemma: The two definitions are equivalent.

First, we show that (1) implies (2).

Second, we show that (2) implies (1).

Page 8: Course Outline

Distributive Frameworks

• A distributive framework: A monotone framework with distributive transfer functions: f(x) V f(y) = f(x V y).

Page 9: Course Outline

The four classical dataflow problems

Let AExp denote all expressions in the program.Let 2|AExp| denote the powerset of AExp

Let Def denote all definitions in the programLet 2|Def| denote the powerset of Def

L, elements

2|Def| 2|AExp|

L, ≤ 2|Def|, 2|AExp|,

L,∨ 2|Def|, U 2|AExp|, ∩

fiout(i)=(in(i)-kill(i))Ugen(i)kill(i) = ?gen(i) = ?

out(i)=(in(i)-kill(i))Ugen(i)kill(i) = ?gen(i) = ?

Reaching Definitions Available Expressions

Page 10: Course Outline

Distributive Frameworks

• Each of the four problems is an instance of a distributive framework.– First, prove monotonicity– Second, prove distributivity of the functions

Page 11: Course Outline

Distributivity

• Each of the four problems is an instance of a distributive framework.– First, prove monotonicityif in’(i) ≤ in”(i) then out’(i) ≤ out”(i)For Reaching Definitions we have to show:if in’(i) in”(i) then (in’(i)∩pres(i)) U gen(i) (in”(i)∩pres(i)) U gen(i)– Second, prove distributivity((in’(i) U in”(i))∩pres(i)) U gen(i) =

((in’(i)∩pres(i)) U gen(i)) U ((in”(i)∩pres(i)) U gen(i))

Page 12: Course Outline

• Points-to Analysis: a fundamental analysis. It computes the memory locations that a pointer variable may point to

Example 1:

int a, b;

int *p1, *p2;

p1 = &a;

p2 = p1;

*p2 = 1;

Points-to Analysis

Example 2:int a, b = 15;int *p1, *p2;int **p3;p3 = &p1;p1 = &a;p2 = *p3;*p2 = b;

Page 13: Course Outline

Points-to Analysis: Monotone, Non-distributive Analysis

• Lattice: The set of all points-to graphs Pt• ≤ is inclusion, Pt1 ≤ Pt2 if Pt1 is a subgraph of Pt2• V is union, P1 V P2 = P1 U P2• Transfer functions are defined on four kinds of statements:

– (1) f(p=&q) is “kill” all points-to edges from p, and “generate a new points-to edge from p to q

– (2) f(p=q) is “kill” all points-to edges from p, and “generate” new points-to edges from p to every x such that q points-to x

– (3) f(p=*q) is “kill” all points-to edges from p, and “generate” new points to edges from p to every x, such that there exists y and q points to y and y points to x

– (4) f(*p=q) Do not perform kill. Can you think of a reason why? “Generate” new points-to edges from every y to every x, such that p points to y and q points to x.

Page 14: Course Outline

Monotone non-distributive Analysis

• First, we show that the framework is monotone, – I.e., for each of the four transfer functions we have

to show that if Pt1 ≤ Pt2, then f(Pt1) ≤ f(Pt2)

• Second, we show that the framework is not distributive– It is easy to show f(Pt1 V Pt2) ≠ f(Pt1) V f(Pt2)

• Another example is constant propagation

Page 15: Course Outline

Non-distributivity of Points-to Analysis

p=&x;q=&y;

p=&z;q=&w;

*p=q

p q

x yPt1:

p q

z wPt2:

p q

x yf(Pt1):

p q

z wf(Pt2):

f(Pt1) V f(Pt2) :

p q

x yz w

Pt1 V Pt2 :

p q

x yz w

p q

x yz w

f(Pt1 V Pt2):

What f does: Adds edges from each variable that p points to (i.e., x and z), to each variable where q points to (i.e., y and w). 4 new edges: from x to y and w, and fromz to y and w.

Page 16: Course Outline

The Maximal Fixed Point (MFP)1

/* Initialize to initial values */in(1)=InitialValue; in(1) = UNDEF

for m := 2 to n do in(m) := 0; in(m) := ØW := {1,2,…,n} /* put every node on the worklist */

while W ≠ Ø do {remove i from W;out(i) = fi(in(i)); outRD(i) = inRD(i)∩pres(i)Ugen(i)

for j in successors(i)

if out(i) ≤ in(j) then { if outRD(i) not subset of inRD (j)

in(j) = out(i) V in(j); inRD(j) = out(i) U inRD(j) if j not in W do add j to W

}}

1. The Least Fixed Point (LFP) actually…

Page 17: Course Outline

Properties of the algorithm• Lemma1: The algorithm terminates.Sketch of the proof: We have ink(j) ≤ ink+1(j) and since L has ACC, in(j)

changes at most O(h) times. Thus, each j is put on W at most O(h) times (h is the height of the lattice L).

Complexity: At each iteration, the analysis examines e(j)out edges. Thus, number of basic operations is bounded by h*(e(1)out+…+e(N)out)=O(h*E). We can do better on reducible graphs.

Page 18: Course Outline

Properties of the Algorithm

• Lemma2: The algorithm computes the least solution of the dataflow equations.– For every node i MFP computes solution MFP(i)

= {in(i),out(i)}, such that every other solution {in’(i),out’(i)} of the dataflow equations is “larger” than the MFP

• Lemma3: The algorithm computes a correct (safe) solution.

Page 19: Course Outline

Example

1. z:=x+y

2. if (z > 500)

3. skip

inAE(2) = outAE(1) V outAE(3)

inout(3) = outAE(2)

inAE(1) = Ø

outAE(2) = inAE(2)

outAE(3) = inAE(3)

outAE(1) = (inAE(1)-Ez) {x+y}

Equivalent to: inAE(2) = {x+y} V inAE(2)and recall that V is ∩ (i.e., set intersection).

Solution1 Solution2Ø

{x+y}

{x+y}

{x+y}Ø

{x+y}

Ø

Ø

That is why we needed to initialize inAE(2) and the other initial values to the universal set of expressions (0 of the Available Expressions lattice), rather than to the more intuitive empty set.

Page 20: Course Outline

Meet Over All Paths (MOP) Solution1

• Desired dataflow information at n is obtained by traversing ALL PATHS from ρ to n. For every path p=(ρ, n1, n2 ..., nk) we compute fnk

(…fn2(fn1

(init(ρ))))

• The MOP at entry of n is V fnk(…fn2

(fn1(init(ρ))))

• The MOP is the best summary of dataflow facts possible to compute with this static analysis

ρ

n1n2

nk

p in paths from ρ to n

n

Page 21: Course Outline

MOP vs. MFP

• For distributive functions the dataflow analysis can merge paths (p1, p2), without loss of precision!– E.g., fp1(0) need not be calculated explicitly

– MFP=MOP

• Due to Kam and Ullman, 1976,1977: This is not true for monotone functions.

• Lemma 3: The MFP approximates the MOP for general monotone functions: MFP ≥ MOP

Page 22: Course Outline

Many Applications!

• White-box testing: compute coverage– Control-flow-based testing– Data-flow-based testing

• Intuitively, test each def-use chain

• Regression testing– Analyze changes and select regression tests that

actually test changed code

Page 23: Course Outline

Many Applications!

• Reverse engineering– UML class diagrams– UML sequence diagrams– Many tools do these; Eclipse plug-ins

• Automated refactoring– Analyze program, prove “safety” of the

refactoring– Eclipse plug-ins

Page 24: Course Outline

Many Applications!

• Static debugging– Memory errors in C/C++ programs

• Memory leaks

• Null pointer dereferences

• Array-out-of-bound accesses

– Concurrency errors in shared-memory apps• Data-races

• Atomicity violations

• Deadlocks