General Data Flow Frameworks
Uday Khedker
(www.cse.iitb.ac.in/~uday)
Department of Computer Science and Engineering,
Indian Institute of Technology, Bombay
September 2017
Part 1
About These Slides
CS 618 General Frameworks: About These Slides 1/178
Copyright
These slides constitute the lecture notes for the CS618 Program Analysis course at IIT Bombay and have been made available as teaching material accompanying the book:
• Uday Khedker, Amitabha Sanyal, and Bageshri Karkare. Data Flow Analysis: Theory and Practice. CRC Press (Taylor and Francis Group). 2009.
(Indian edition published by Ane Books in 2013)
Apart from the above book, some slides are based on the material from the following book:
• M. S. Hecht. Flow Analysis of Computer Programs. Elsevier North-Holland Inc. 1977.
These slides are being made available under GNU FDL v1.2 or later purely for academic or research use.
Sep 2017 IIT Bombay
Outline
• Modelling General Flows
• Constant Propagation
• Strongly Live Variables Analysis (after mid-sem)
• Pointer Analyses (after mid-sem)
• Heap Reference Analysis (after mid-sem)
Part 2
Precise Modelling of General Flows
Complexity of Constant Propagation?
[Figure: a control flow graph with blocks 1 to 5, redrawn after each iteration. The table lists the statement of each block as constants are discovered.]
Block   Original     Iteration #1   Iteration #2   Iteration #3   Iteration #4
1       a = b + 1    a = b + 1      a = b + 1      a = b + 1      a = 5
2       a = b + 1    a = b + 1      a = b + 1      a = b + 1      a = 5
3       b = c + 1    b = c + 1      b = c + 1      b = 4          b = 4
4       c = d + 1    c = d + 1      c = 3          c = 3          c = 3
5       d = 2        d = 2          d = 2          d = 2          d = 2
Loop Closures of Flow Functions
[Figure: a loop through the program points p1, p2, and p3. The value x reaches p2 along the path p1, p2, and f is the flow function of the loop p2, p3, p2.]
Paths Terminating at p2                 Data Flow Value
p1, p2                                  x
p1, p2, p3, p2                          f(x)
p1, p2, p3, p2, p3, p2                  f(f(x)) = f^2(x)
p1, p2, p3, p2, p3, p2, p3, p2          f(f(f(x))) = f^3(x)
. . .                                   . . .
• For static analysis we need to summarize the value at p2 by a value which is safe after any iteration.
f*(x) = x ⊓ f(x) ⊓ f^2(x) ⊓ f^3(x) ⊓ f^4(x) ⊓ . . .
• f* is called the loop closure of f.
Loop Closure Boundedness
• Boundedness of f requires the existence of some k such that
f*(x) = x ⊓ f(x) ⊓ f^2(x) ⊓ . . . ⊓ f^(k−1)(x)
• This follows from the descending chain condition
• For efficiency, we need a constant k that is independent of the size of the lattice
Loop Closures in Bit Vector Frameworks
• Flow functions in bit vector frameworks have constant Gen and Kill
f*(x) = x ⊓ f(x) ⊓ f^2(x) ⊓ f^3(x) ⊓ . . .
f^2(x) = f(Gen ∪ (x − Kill))
       = Gen ∪ ((Gen ∪ (x − Kill)) − Kill)
       = Gen ∪ ((Gen − Kill) ∪ (x − Kill))
       = Gen ∪ (Gen − Kill) ∪ (x − Kill)
       = Gen ∪ (x − Kill) = f(x)
f*(x) = x ⊓ f(x)
• Loop closures of bit vector frameworks are 2-bounded.
• Intuition: Since Gen and Kill are constant, the same things are generated or killed in every application of f.
Multiple applications of f are not required unless the input value changes.
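The 2-boundedness argument can be checked mechanically on a small instance. The following Python sketch is only an illustration: the helper names `make_flow` and `closure_prefix`, the sample Gen and Kill sets, and the use of frozensets in place of bit vectors are assumptions made here, not part of the slides.

```python
from functools import reduce

def make_flow(gen, kill):
    """Return f(x) = Gen ∪ (x − Kill) with constant Gen and Kill."""
    gen, kill = frozenset(gen), frozenset(kill)
    return lambda x: gen | (frozenset(x) - kill)

def closure_prefix(f, x, meet, k):
    """Meet of the first k terms x, f(x), ..., f^(k-1)(x) of the loop closure."""
    terms, current = [], frozenset(x)
    for _ in range(k):
        terms.append(current)
        current = f(current)
    return reduce(meet, terms)

def intersection(a, b):   # meet of an "all paths" bit vector analysis
    return a & b

def union(a, b):          # meet of an "any path" bit vector analysis
    return a | b

# Hypothetical sample data: four entities e1..e4.
f = make_flow(gen={"e1"}, kill={"e2", "e3"})
x = frozenset({"e2", "e4"})

# Applying f twice adds nothing beyond applying it once.
print(f(f(x)) == f(x))
# Two terms of the closure already give the same result as seven terms.
print(closure_prefix(f, x, intersection, 2) == closure_prefix(f, x, intersection, 7))
print(closure_prefix(f, x, union, 2) == closure_prefix(f, x, union, 7))
```

The check holds for either choice of meet because f^2 = f makes every term after f(x) a repeat of f(x).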
Larger Values of Loop Closure Bounds
• Fast Frameworks ≡ 2-bounded frameworks (e.g. bit vector frameworks)
Both these conditions must be satisfied
  ◮ Separability: Data flow values of different entities are independent
  ◮ Constant or Identity Flow Functions: Flow functions for an entity are either constant or identity
• Non-fast frameworks
At least one of the above conditions is violated
Separability
f : L → L is ⟨h1, h2, . . . , hm⟩ where hi computes the value of xi
Separable
[Figure: f maps ⟨x1, x2, . . . , xm⟩ to ⟨y1, y2, . . . , ym⟩; the component function h2 computes y2 from x2 alone.]
ĥ : L̂ → L̂
Example: All bit vector frameworks
Non-Separable
[Figure: f maps ⟨x1, x2, . . . , xm⟩ to ⟨y1, y2, . . . , ym⟩; the component function h2 computes y2 from the whole tuple.]
ĥ : L → L̂
Example: Constant Propagation
Separability of Bit Vector Frameworks
• L̂ is {0, 1}, L is {0, 1}^m
• ⊓ is either boolean AND or boolean OR
• ⊤ and ⊥ are 0 or 1 depending on ⊓.
• h is a bit function and could be one of the following:
Raise, Lower, Propagate, Negate
[Figure: each of the four functions drawn as a mapping from {⊤, ⊥} to {⊤, ⊥}. Negate is marked as non-monotonic.]
Overall Lattice for Integer Constant Propagation
• In_n/Out_n values are mappings Var → L̂ : In_n, Out_n ∈ Var → L̂
• Overall lattice L is a set of mappings Var → L̂ : L = Var → L̂
• ⊓ and ⊓̂ get defined by ⊑ and ⊑̂
  ◮ Partial order is restricted to data flow values of the same variable
    Data flow values of different variables are incomparable
    (x, v1) ⊑ (y, v2) ⇔ x = y ∧ v1 ⊑̂ v2
    OR x ↦ v1 ⊑ y ↦ v2 ⇔ x = y ∧ v1 ⊑̂ v2
  ◮ For meet operation, we assume that X is a total function
    Partial functions are made total by using the ⊤̂ value
    X ⊓ Y = {(x, v1 ⊓̂ v2) | (x, v1) ∈ X, (x, v2) ∈ Y}
    OR X ⊓ Y = {x ↦ v1 ⊓̂ v2 | x ↦ v1 ∈ X, x ↦ v2 ∈ Y}
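The pointwise meet on mappings can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the strings "ud" and "nc" are taken to encode the top (undefined) and bottom (not constant) elements of the component lattice, mappings are Python dicts, and the names `meet_value` and `meet` are hypothetical.

```python
UD = "ud"   # assumed encoding of the top element (undefined)
NC = "nc"   # assumed encoding of the bottom element (not constant)

def meet_value(v1, v2):
    """Meet in the component lattice of a single variable."""
    if v1 == UD:
        return v2
    if v2 == UD:
        return v1
    return v1 if v1 == v2 else NC

def meet(X, Y):
    """Pointwise meet of two mappings; a missing variable is treated as top."""
    variables = set(X) | set(Y)
    return {v: meet_value(X.get(v, UD), Y.get(v, UD)) for v in variables}

X = {"a": 1, "b": 2}
Y = {"a": 1, "b": 3, "c": 4}
print(meet(X, Y))
```

Treating a missing variable as top is what makes the partial mappings total, so `c` keeps the value it has in `Y`.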
Notations for Mappings as Data Flow Values
Accessing and manipulating a mapping X ⊆ A → B
• X(a) denotes the image of a ∈ A
X(a) ∈ B
• X[a ↦ v] changes the image of a in X to v
X[a ↦ v] = (X − {(a, u) | u ∈ B}) ∪ {(a, v)}
Defining Data Flow Equations for Constant Propagation
In_n = BI = {⟨y, ud⟩ | y ∈ Var}        if n = Start
       ⊓_{p ∈ pred(n)} Out_p           otherwise
Out_n = f_n(In_n)
f_n(X) = X[y ↦ c]             if n is y = c, y ∈ Var, c ∈ Const
         X[y ↦ nc]            if n is input(y), y ∈ Var
         X[y ↦ X(z)]          if n is y = z, y ∈ Var, z ∈ Var
         X[y ↦ eval(e, X)]    if n is y = e, y ∈ Var, e ∈ Expr
         X                    otherwise
eval(e, X) = nc               if a ∈ Opd(e) ∩ Var, X(a) = nc
             ud               if a ∈ Opd(e) ∩ Var, X(a) = ud
             −X(a)            if e is −a
             X(a) ⊕ X(b)      if e is a ⊕ b
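The flow function and eval can be transcribed almost case by case. The Python sketch below is an illustration, not the course's implementation: the tuple encoding of statements and expressions, the names `flow`, `eval_expr`, and `operand_value`, and the handling of integer literals as operands are all assumptions introduced here.

```python
import operator

UD, NC = "ud", "nc"   # undefined and not-constant, as in the equations
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def operand_value(opd, X):
    """Integer literals stand for themselves; variable names are looked up in X."""
    return opd if isinstance(opd, int) else X[opd]

def eval_expr(e, X):
    """e is ("neg", a) or (op, a, b); cases are tried in the order used by eval(e, X)."""
    values = [operand_value(opd, X) for opd in e[1:]]
    if NC in values:
        return NC
    if UD in values:
        return UD
    if e[0] == "neg":
        return -values[0]
    return OPS[e[0]](values[0], values[1])

def flow(stmt, X):
    """Return f_n(X) for a single statement; X itself is left unchanged."""
    out = dict(X)
    kind = stmt[0]
    if kind == "const":        # y = c
        out[stmt[1]] = stmt[2]
    elif kind == "input":      # input(y)
        out[stmt[1]] = NC
    elif kind == "copy":       # y = z
        out[stmt[1]] = X[stmt[2]]
    elif kind == "expr":       # y = e
        out[stmt[1]] = eval_expr(stmt[2], X)
    return out

# Boundary information: every variable is undefined.
X = {v: UD for v in "abcdef"}
X = flow(("input", "e"), X)
X = flow(("const", "a", 7), X)
X = flow(("copy", "f", "e"), X)
X = flow(("expr", "b", ("+", "c", 1)), X)
print(X)
```

Checking for nc before ud mirrors the order of the cases in eval, so an expression with one nc operand and one ud operand evaluates to nc.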
Example Program for Constant Propagation
[Figure: control flow graph of the example program. The statements of its blocks are listed below; the edges, labelled true and false at the conditional branches, are not reproduced.]
n1:  input(e);
n2:  a = 7; b = 2; f = e; if (f > 0)
n3:  a = 2; if (f ≥ e + 2)
n4:  b = c + 1; if (b ≥ 7)
n5:  f = f + 1;
n6:  if (f ≥ e + 1)
n7:  c = d ∗ a;
n8:  d = a + b;
n9:  d = a + 1; f = f + 1
n10: e = a + b;
For readability, we have combined many statements in a single block. However, constant propagation requires every basic block to contain a single statement because of the presence of dependent parts in flow functions.
Result of Constant Propagation
[Table: data flow values after iteration #1 and the changes in iterations #2, #3, and #4; only the column headings survive.]
[Figure: the example control flow graph annotated with the constants discovered. The annotations, in the order they appear: c = 6 (after n4); a = 2, d = 3 (after n7); a = 2 (after n8); a = 2 (after n9).]
Monotonicity of Constant Propagation
Proof obligation: X1 ⊑ X2 ⇒ f_n(X1) ⊑ f_n(X2)
where,
f_n(X) = X[y ↦ c]             if n is y = c, y ∈ Var, c ∈ Const      (C1)
         X[y ↦ nc]            if n is input(y), y ∈ Var              (C2)
         X[y ↦ X(z)]          if n is y = z, y ∈ Var, z ∈ Var        (C3)
         X[y ↦ eval(e, X)]    if n is y = e, y ∈ Var, e ∈ Expr       (C4)
         X                    otherwise                              (C5)
• The proof obligation trivially follows for cases C1, C2, C3, and C5
• For case C4, it requires showing
X1 ⊑ X2 ⇒ eval(e, X1) ⊑ eval(e, X2)
which follows from the definition of eval(e, X)
Non-Distributivity of Constant Propagation
[Figure: control flow graph in which n2 receives information from n1 and from n3. The edge from n1 carries a = 1, b = 2 and the edge from n3 carries a = 2, b = 1.]
n1: a = 1; b = 2; c = a + b
n2: c = a + b; d = a ∗ b
n3: d = c − 1; a = 2; b = 1; c = a + b
Data flow values are written as tuples ⟨a, b, c, d⟩ and f is the flow function of n2.
• x = ⟨1, 2, 3, ?⟩ (Along Out_n1 → In_n2)
• y = ⟨2, 1, 3, 2⟩ (Along Out_n3 → In_n2)
• Function application before merging
f(x) ⊓ f(y) = f(⟨1, 2, 3, ?⟩) ⊓ f(⟨2, 1, 3, 2⟩)
            = ⟨1, 2, 3, 2⟩ ⊓ ⟨2, 1, 3, 2⟩
            = ⟨⊥, ⊥, 3, 2⟩
• Function application after merging
f(x ⊓ y) = f(⟨1, 2, 3, ?⟩ ⊓ ⟨2, 1, 3, 2⟩)
         = f(⟨⊥, ⊥, 3, 2⟩)
         = ⟨⊥, ⊥, ⊥, ⊥⟩
• f(x ⊓ y) ⊏ f(x) ⊓ f(y)
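The two computations can be replayed directly. In this Python sketch the tuple order (a, b, c, d), the strings "ud" and "nc" for the undefined value "?" and for ⊥, and the helper names are assumptions made for illustration; `f` models only the block n2 (c = a + b; d = a ∗ b).

```python
import operator

UD, NC = "ud", "nc"   # "?" (undefined, acting as top) and ⊥ (not constant)

def meet_value(u, v):
    if u == UD:
        return v
    if v == UD:
        return u
    return u if u == v else NC

def meet(x, y):
    """Componentwise meet of two tuples (a, b, c, d)."""
    return tuple(meet_value(u, v) for u, v in zip(x, y))

def binop(op, u, v):
    if NC in (u, v):
        return NC
    if UD in (u, v):
        return UD
    return op(u, v)

def f(t):
    """Flow function of block n2: c = a + b; d = a * b."""
    a, b, _, _ = t
    return (a, b, binop(operator.add, a, b), binop(operator.mul, a, b))

x = (1, 2, 3, UD)   # along Out_n1 -> In_n2
y = (2, 1, 3, 2)    # along Out_n3 -> In_n2

before = meet(f(x), f(y))   # apply f on each path, then merge
after = f(meet(x, y))       # merge first, then apply f
print(before)
print(after)
```

Merging first loses the correlation between a and b, which is why `after` is strictly weaker than `before` in the components c and d.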
Why is Constant Propagation Non-Distributive?
[Figure: two predecessor blocks, one containing a = 1; b = 2 and the other containing a = 2; b = 1, merge at a block containing c = a + b.]
Possible combinations due to merging
• a = 1, b = 2: c = a + b = 3. Correct combination.
• a = 2, b = 1: c = a + b = 3. Correct combination.
• a = 1, b = 1: c = a + b = 2. Wrong combination. Mutually exclusive information. No execution path along which this information holds.
• a = 2, b = 2: c = a + b = 4. Wrong combination. Mutually exclusive information. No execution path along which this information holds.
Tutorial Problem on Constant Propagation
How many iterations do we need?
[Figure: control flow graph with blocks n1 to n10 and several back edges; the edges are not reproduced. Block statements as transcribed:]
n1: a = b    n2: a = b    n3: d = 2    n4: a = b    n5: c = d
n6: a = b    n7: b = c    n8: a = b    n9: a = b    n10: a = b
• Every back edge occurs only once in the information flow path (ifp) from n3 to n1 that goes via n5, n7, and n9.
• 5 + 1 iterations for computing data flow values
(+1 iteration to detect convergence)
Tutorial Problem on Constant Propagation
And now how many iterations do we need?
[Figure: control flow graph with blocks n1 to n10 and several back edges; the edges are not reproduced. Block statements as transcribed:]
n1: a = b    n2: a = b    n3: a = b    n4: a = b    n5: b = c
n6: a = b    n7: c = d    n8: a = b    n9: d = 2    n10: a = b
Back edge n10 → n1 needs to be traversed once each for back edges n9 → n8, n7 → n6, n5 → n4, and n3 → n2 (in that order).
⇒ 8 + 1 iterations.
• Is p→data live at the exit of line 5? Can we delete line 5?
• We cannot delete line 5 if p and q can be possibly aliased
(while loop or do-while loop with a circular list)
• We can delete line 5 if p and q are definitely not aliased
(do-while loop without a circular list)
Code Optimization In Presence of Pointers
Original Program     Constant Propagation     Constant Propagation
                     without aliasing         with aliasing
a = 5                a = 5                    a = 5
x = &a               x = &a                   x = &a
b = ∗x               b = ∗x                   b = 5
The World of Pointer Analysis
[Figure: a diagram relating Alias Analysis and Pointer Analysis, with the following regions.]
• Alias analysis of reference parameters, fields of unions, array indices
• Alias analysis of data pointers
• Points-to analysis of data and function pointers
Pointer Analysis Musings
• Pointer analysis collects information about indirect accesses in programs
  ◮ Enables precise data analysis
  ◮ Enables precise interprocedural control flow analysis
• Needs to scale to large programs
• Pointer Analysis Musings
  ◦ Which Pointer Analysis should I Use?
    Michael Hind and Anthony Pioli. ISSTA 2000
  ◦ Pointer Analysis: Haven’t we solved this problem yet?
    Michael Hind. PASTE 2001
  ◦ 2017 . . .
The Mathematics of Pointer Analysis
In the most general situation
• Alias analysis is undecidable.
Landi-Ryder [POPL 1991], Landi [LOPLAS 1992],Ramalingam [TOPLAS 1994]
• Flow insensitive alias analysis is NP-hard
Horwitz [TOPLAS 1997]
• Points-to analysis is undecidable
Chakravarty [POPL 2003]
Adjust your expectations suitably to avoid disappointments!
The Engineering of Pointer Analysis
So what should we expect? To quote Hind [PASTE 2001]
• “Fortunately many approximations exist”
• “Unfortunately too many approximations exist!”
Engineering of pointer analysis is much more dominant than its science
Pointer Analysis: Engineering or Science?
• Engineering view.
  ◮ Build quick approximations
  ◮ The tyranny of (exclusive) OR!
    Precision OR Efficiency?
• Science view.
  ◮ Build clean abstractions
  ◮ Can we harness the Genius of AND?
    Precision AND Efficiency?
• A distinction between approximation and abstraction is subjective
Our working definition
  ◮ Abstractions focus on precision and conciseness of modelling
  ◮ Approximations focus on efficiency and scalability
An Outline of Pointer Analysis Coverage
• The larger perspective
• Comparing Points-to and Alias information Next Topic
• Flow Insensitive Points-to Analysis
• Flow Sensitive Points-to Analysis
• Pointer Analyses: An Engineer’s Landscape
• Liveness Based Points-to Analysis
• Generalizations to Heap, Arrays, Pointer Arithmetic, and Unions
Alias Information Vs. Points-to Information
[Figure: control flow graph with blocks 1: x = &a and 2: b = x, with memory graphs over x, b, and a.]
• “x Points-to a”, denoted x → a: Neither Symmetric Nor Reflexive
• “x and b are Aliases”, denoted x ⊜ b: Symmetric and Reflexive
• What about transitivity?
  ◮ Points-to: No.
  ◮ Alias: Depends.
Comparing Points-to and Alias Relations (1)
Statement: x = &y
  Memory before (assumed): x, y
  Memory after: x → y
  Points-to: Existing: none. New: x → y
  Aliases: Existing: none. New Direct: x ⊜ &y
Statement: x = y
  Memory before (assumed): x, y → z
  Memory after: x → z, y → z
  Points-to: Existing: y → z. New: x → z
  Aliases: Existing: y ⊜ &z. New Direct: x ⊜ y. New Indirect: x ⊜ &z
• Indirect aliases. Substitute a name by its aliases for transitivity
• Derived aliases. Apply indirection operator to aliases (ignored here)
x ⊜ y ⇒ ∗x ⊜ ∗y
Comparing Points-to and Alias Relations (2)
Statement: ∗x = y
  Memory before (assumed): x → u, y → z
  Memory after: x → u, y → z, u → z
  Points-to: Existing: x → u, y → z. New: u → z
  Aliases: Existing: x ⊜ &u, y ⊜ &z. New Direct: ∗x ⊜ y. New Indirect: u ⊜ &z, y ⊜ u, ∗x ⊜ &z
Statement: x = ∗y
  Memory before (assumed): y → z, z → u
  Memory after: x → u, y → z, z → u
  Points-to: Existing: y → z, z → u. New: x → u
  Aliases: Existing: y ⊜ &z, z ⊜ &u, ∗y ⊜ &u. New Direct: x ⊜ ∗y. New Indirect: x ⊜ &u, x ⊜ z
The resulting memories look similar but are different. In the first case we have u → z whereas in the second case the arrow direction is opposite (i.e. z → u).
Comparing Points-to and Alias Relations (3)
• Points-to information records edges in the memory graph
  ◮ aliases of the kind x ⊜ &y
    x holds the address of y
  ◮ other aliases can be discovered by composing edges
  ◮ since addresses are explicated, it can represent only those memory locations that can be named at compile time
More compact but less general
• Alias information records paths in the memory graph
  ◮ paths incident on the same node
    x and y hold the same address (and the address is left implicit)
  ◮ since addresses are implicit, it can represent unnamed memory locations too
  ◮ if we have x ⊜ y then ∗x ⊜ ∗y is redundant and is not recorded
More general and more complex
An Outline of Pointer Analysis Coverage
• The larger perspective
• Comparing Points-to and Alias information
• Flow Insensitive Points-to Analysis Next Topic
• Flow Sensitive Points-to Analysis
• Pointer Analyses: An Engineer’s Landscape
• Liveness Based Points-to Analysis
• Generalizations to Heap, Arrays, Pointer Arithmetic, and Unions
Flow Sensitive Vs. Flow Insensitive Pointer Analysis
Inclusion Based (aka Andersen’s) Points-to Analysis: Example 1
Node   Program    Constraint
1      a = &b     Pa ⊇ {b}
2      c = a      Pc ⊇ Pa
3      a = &d     Pa ⊇ {d}
4      a = &e     Pa ⊇ {e}
5      b = a      Pb ⊇ Pa
[Figure: the points-to graph over the nodes a, b, c, d, e, built up as the constraints are processed.]
• Since Pa has changed, Pc needs to be processed again
• Observe that Pc is processed for the third time
• Order of processing the sets influences efficiency significantly
• A plethora of heuristics have been proposed
Actually:
• c does not point to any location in block 1
• a does not point to b in block 5
(the method ignores the kill due to 3 and 4)
• b does not point to itself at any time
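The constraints of this example can be solved by naive iteration to a fixed point. The Python sketch below is a simplified illustration, not Andersen's full algorithm: it handles only the two constraint forms that occur here (P ⊇ {loc} and Plhs ⊇ Prhs), and the function name `solve_inclusion` and its argument encoding are assumptions.

```python
def solve_inclusion(address_of, copies):
    """Solve inclusion constraints by repeated passes.

    address_of: pairs (p, loc) for statements p = &loc, i.e. Pp ⊇ {loc}
    copies:     pairs (lhs, rhs) for statements lhs = rhs, i.e. Plhs ⊇ Prhs
    """
    pts = {}
    for p, loc in address_of:
        pts.setdefault(p, set()).add(loc)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in copies:
            src = pts.get(rhs, set())
            dst = pts.setdefault(lhs, set())
            if not src <= dst:
                dst |= src      # Plhs grows to include Prhs
                changed = True
    return pts

# Constraints of Example 1, in flow insensitive form.
pts = solve_inclusion(
    address_of=[("a", "b"), ("a", "d"), ("a", "e")],
    copies=[("c", "a"), ("b", "a")],
)
print({p: sorted(s) for p, s in pts.items()})
```

Because statement order and kills are ignored, the solution also contains the imprecise facts listed under "Actually", such as b pointing to itself.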
Equality Based (aka Steensgaard’s) Points-to Analysis: Example 1
Node   Program    Constraint
1      a = &b     Pa ⊇ {b}; Unify(x, b), x ∈ Pa
2      c = a      UnifyPTS(c, a)
3      a = &d     Pa ⊇ {d}; Unify(x, d), x ∈ Pa
4      a = &e     Pa ⊇ {e}; Unify(x, e), x ∈ Pa
5      b = a      UnifyPTS(b, a)
[Figure: Steensgaard’s points-to graph with the nodes a, c, and a single merged node for b, d, e; next to it, the fully expanded points-to graph over a, b, c, d, e.]
• The fully expanded points-to graph has far more edges than the graph created by Andersen’s method
• Far more efficient but far less precise
Comparing Equality and Inclusion Based Analyses (2)
• Andersen’s algorithm is cubic in number of pointers
• Steensgaard’s algorithm is nearly linear in number of pointers
  ◮ How can it be more efficient by orders of magnitude?
Efficiency of Equality Based Approach
Program: a = &b; a = &c; b = &d; b = &c
[Figure: Andersen’s approach gives a points-to graph over the separate nodes a, b, c, d; Steensgaard’s approach gives a smaller graph in which pointees are merged.]
• Andersen’s inclusion based wisdom:
  ◮ Add edges and let the number of successors increase
• Steensgaard’s equality based wisdom:
  ◮ Merge multiple successors and maintain a single successor of any node
  ◮ Since a larger number of pointers are treated alike and fewer distinctions are maintained, we get much smaller points-to graphs
  ◮ Efficient Union-Find algorithms to merge intersecting subsets
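The single-successor discipline can be sketched with a union-find structure. The Python class below is a simplified illustration restricted to statements of the forms p = &loc and lhs = rhs; the class and method names, the placeholder nodes used for pointers whose pointee is not yet known, and the absence of union by rank are assumptions made here, not details taken from the slides.

```python
import itertools

class Steensgaard:
    def __init__(self):
        self.parent = {}     # union-find forest over locations
        self.pointee = {}    # class representative -> a member of its single pointee class
        self._fresh = itertools.count()

    def find(self, v):
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]   # path halving
            v = self.parent[v]
        return v

    def unify(self, u, v):
        """Merge the classes of u and v; merged classes must share one pointee class."""
        work = [(u, v)]
        while work:
            a, b = work.pop()
            ra, rb = self.find(a), self.find(b)
            if ra == rb:
                continue
            self.parent[rb] = ra
            pb = self.pointee.pop(rb, None)
            if pb is None:
                continue
            if ra in self.pointee:
                work.append((self.pointee[ra], pb))   # two successors: merge them too
            else:
                self.pointee[ra] = pb

    def pointee_of(self, p):
        """Member of p's pointee class; a placeholder class is created if none exists."""
        r = self.find(p)
        if r not in self.pointee:
            placeholder = ("$", next(self._fresh))
            self.find(placeholder)
            self.pointee[r] = placeholder
        return self.pointee[r]

    def address_of(self, p, loc):     # p = &loc
        self.unify(self.pointee_of(p), loc)

    def copy(self, lhs, rhs):         # lhs = rhs
        self.unify(self.pointee_of(lhs), self.pointee_of(rhs))

    def points_to(self, p):
        target = self.find(self.pointee_of(p))
        return {v for v in list(self.parent)
                if isinstance(v, str) and self.find(v) == target}

# Program of this slide: a = &b; a = &c; b = &d; b = &c
s = Steensgaard()
s.address_of("a", "b")
s.address_of("a", "c")
s.address_of("b", "d")
s.address_of("b", "c")
print(sorted(s.points_to("a")), sorted(s.points_to("b")))
```

On this program the sketch merges b, c, and d into one class, so a and b both point to all three, whereas an inclusion based solution would keep Pa = {b, c} and Pb = {c, d} apart.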
Inclusion Based (aka Andersen’s) Points-to Analysis: Example 2
n1: x = &y; y = &z; z = &u
n2: ∗z = y
n3: z = y
n4: y = &x; use u; use x
[Figure: control flow graph over n1 to n4, and the resulting points-to graph over x, y, z, u.]
Constraints on Points-to Sets
Px ⊇ {y}
Py ⊇ {z}
Pz ⊇ {u}
∀w ∈ Pz, Pw ⊇ Py
Pz ⊇ Py
Py ⊇ {x}
Equality Based (aka Steensgaard’s) Points-to Analysis: Example 2
n1: x = &y; y = &z; z = &u
n2: ∗z = y
n3: z = y
n4: y = &x; use u; use x
[Figure: control flow graph over n1 to n4, and Steensgaard’s points-to graph over x, y, z, u.]
• Treat all pointees of a pointer as “equivalent” locations
• Transitive closure
Pointees of all equivalent locations become equivalent
Effective additional constraints
Unify(x, y) /* pointees of x */
Unify(x, z) /* pointees of y */
Unify(x, u) /* pointees of z */
⇒ x, y, z, u are equivalent
⇒ Complete graph
Tutorial Problem for Flow Insensitive Pointer Analysis (1)
Program:
p = &q
r = &s
t = &p
u = p
∗t = r
[Figure: for each of the inclusion based and equality based methods, a points-to graph over the nodes t, u, r, p, q, s.]
Tutorial Problems for Flow Insensitive Pointer Analysis (2)
Compute flow insensitive points-to information using inclusion based method as well as equality based method
if (. . . )
    p = &x;
else
    p = &y;
x = &a;
y = &b;
∗p = &c;
∗y = &a;
Tutorial Problem for Flow Insensitive Pointer Analysis (3)
Compute flow insensitive points-to information using inclusion based method as well as equality based method
[Figure: control flow graph with the following blocks; the edges are not reproduced.]
n1: b = &a;
n2: c = b;
n3: a = &b;
n4: a = &c;
n5: a = ∗a;
n6: ∗b = c;
An Outline of Pointer Analysis Coverage
• The larger perspective
• Comparing Points-to and Alias information
• Flow Insensitive Points-to Analysis
• Flow Sensitive Points-to Analysis Next Topic
• Pointer Analyses: An Engineer’s Landscape
• Liveness Based Points-to Analysis
• Generalizations to Heap, Arrays, Pointer Arithmetic, and Unions
Must Points-to Information
[Figure: control flow graph with blocks 1: x = &a; 2: x = &b; 3: x = &b; 4: x = &b, annotated with memory graphs over x, a, and b showing the must points-to information.]
May Points-to Information
[Figure: control flow graph with blocks 1: x = &a; 2: x = &b; 3: x = &b; 4: x = &b, annotated with memory graphs over x, a, and b showing the may points-to information.]
Must Alias Information
[Figure: control flow graph with blocks 1: x = &a; 2: b = x; 3: x = &b; 4: x = &b; 5: y = b, annotated with memory graphs over x, b, y, and a.]
x ⊜ b and b ⊜ y ⇒ x ⊜ y
May Alias Information
[Figure: control flow graph with blocks 1: x = &a; 2: b = &z; 3: b = x; 4: y = b; 5: y = b, annotated with memory graphs over x, b, y, z, and a.]
x ⊜ b and b ⊜ y ⇏ x ⊜ y
Strong and Weak Updates
[Figure: control flow graph with the following blocks; the edges are not reproduced.]
1: x = &a
2: y = &b; w = &c
3: z = &x
4: z = &y
5: ∗z = &e; ∗w = &e
Weak update: Modification of x or y due to ∗z in block 5
Strong update: Modification of c due to ∗w in block 5
How is this concept related to May/Must nature of information?
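The difference between the two kinds of update can be made concrete. The Python sketch below is an illustration under stated assumptions: points-to information is a dict from pointers to sets of pointees, blocks 3 and 4 are assumed to be alternative branches that merge before block 5 (so z may point to x or to y), and a pointer with exactly one pointee is treated as a must relation. The function name `store_address` is hypothetical.

```python
def store_address(pts, p, loc):
    """Apply the statement *p = &loc to may points-to sets and return the result."""
    targets = pts.get(p, set())
    out = {v: set(s) for v, s in pts.items()}
    if len(targets) == 1:
        (t,) = targets
        out[t] = {loc}                        # strong update: old pointees of t are killed
    else:
        for t in targets:
            out.setdefault(t, set()).add(loc)  # weak update: old pointees of t survive
    return out

# Assumed points-to information at the entry of block 5.
entry = {"x": {"a"}, "y": {"b"}, "w": {"c"}, "z": {"x", "y"}}

after_z = store_address(entry, "z", "e")     # *z = &e
after_w = store_address(after_z, "w", "e")   # *w = &e
print({v: sorted(s) for v, s in after_w.items()})
```

Here x and y keep their old pointees next to e, because z may point to either, while c ends up pointing only to e, because w points to c alone.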
What About Heap Data?
• Compile time entities, abstract entities, or summarized entities
• Three options:
  ◮ Represent all heap locations by a single abstract heap location
  ◮ Represent all heap locations of a particular type by a single abstract heap location
  ◮ Represent all heap locations allocated at a given memory allocation site by a single abstract heap location
• Summarization: Usually based on the length of pointer expression
• Initially, we will restrict ourselves to stack and static data
We will later introduce heap using the allocation site based abstraction
Lattice for May Points-to Analysis
Let P ⊆ Var be the set of pointers. Assume Var = {p, q} and P = {p}
Product View (points-to graph as a list of directed edges)
  Lattice elements, listed from top to bottom: ∅; {(p, p)} and {(p, q)}; {(p, p), (p, q)}
  Data flow values ⊆ P × Var
  Lattice = (2^(P×Var), ⊇)
Mapping View (points-to graph as a list of adjacency lists)
  Lattice elements, listed from top to bottom: {(p, ∅)}; {(p, {p})} and {(p, {q})}; {(p, {p, q})}
  Data flow values ∈ P → 2^Var
  Lattice = (P → 2^Var, ⊑map)
Lattice for Must Points-to Analysis
Let P ⊆ Var be the set of pointers. Assume Var = {p, q, r} and P = {p}
Mapping View
  Lattice elements, listed from top to bottom: {(p, ⊤)}; {(p, p)}, {(p, q)}, {(p, r)}; {(p, ⊥)}
  Component lattice, listed from top to bottom: ⊤; p, q, r; ⊥
  Data flow values = P → Var ∪ {⊤, ⊥}
  Lattice = (2^(P→Var∪{⊤,⊥}), ⊑map)
Set View
  Lattice elements, listed from top to bottom: {(p, p), (p, q), (p, r)}; {(p, p)}, {(p, q)}, {(p, r)}; ∅
  Restricted subset of P × Var
  ∩ can be used for ⊓
A pointer can point to at most one location
Lattice for Combined May-Must Points-to Analysis (1)
• Consider the following abbreviation of the May-Must lattice L
     Unknown                          un
   No      Must    abbreviated as   no  mt
       May                            my
• For Var = {p, q}, P = {p}, the May-Must points-to lattice is the product
P × Var × L
  ◮ Some elements are prohibited because of the semantics of Must
  ◮ If we have (p,p,mt) in a data flow value X ∈ P × Var × L, then
    ◮ we cannot have (p,q,un), (p,q,mt), or (p,q,my) in X
    ◮ we can only have (p,q,no) in X
Lattice for Combined May-Must Points-to Analysis (2)
For Var = {p, q}, P = {p}, the May-Must points-to lattice is
[Figure: the product lattice drawn with its ordering; the elements are listed below.]
Allowed
{(p,p,un),(p,q,un)}
{(p,p,un),(p,q,no)}
{(p,p,no),(p,q,un)}
{(p,p,no),(p,q,no)}
{(p,p,mt),(p,q,no)}
{(p,p,my),(p,q,un)}
{(p,p,un),(p,q,my)}
{(p,p,no),(p,q,mt)}
{(p,p,my),(p,q,no)}
{(p,p,no),(p,q,my)}
{(p,p,my),(p,q,my)}
Prohibited
{(p,p,mt),(p,q,un)}
{(p,p,un),(p,q,mt)}
{(p,p,mt),(p,q,mt)}
{(p,p,mt),(p,q,my)}
{(p,p,my),(p,q,mt)}
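The split between allowed and prohibited elements follows from a single rule, which can be checked by enumeration. In this Python sketch the rule is taken from the semantics of Must stated on the previous slide (a must pointee forces every other pointee of the same pointer to be no); the function name `allowed` and the encoding of an element as a pair of tags for (p,p) and (p,q) are assumptions.

```python
from itertools import product

TAGS = ("un", "no", "mt", "my")

def allowed(tags):
    """tags[i] is the May-Must value attached to the i-th candidate pointee of p."""
    for i, tag in enumerate(tags):
        if tag == "mt" and any(other != "no" for j, other in enumerate(tags) if j != i):
            return False
    return True

elements = list(product(TAGS, repeat=2))          # tags for (p,p) and (p,q)
allowed_elements = [e for e in elements if allowed(e)]
prohibited_elements = [e for e in elements if not allowed(e)]

print(len(allowed_elements), len(prohibited_elements))
for pp, pq in prohibited_elements:
    print("{(p,p,%s),(p,q,%s)}" % (pp, pq))
```

The enumeration reproduces the five prohibited combinations, each of which pairs mt with something other than no.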
May and Must Analysis for Killing Points-to Information (1)
[Figure: control flow graph with blocks, as transcribed, 1: a = &b; 2: c = &a; 3: c = d; 4: ∗c = &e; 5: ∗c = e. Block 4 is reached along the paths 1-2-4 and 1-3-4.]
May Points-to Analysis
• (a, b) should be in MayIn5
Holds along path 1-3-4
• Block 4 should not kill (a, b)
• Possible if pointee set of c is ∅ (Use MustIn4)
• However, MayIn4 contains (c, a)
Must Points-to Analysis
• (a, b) should not be in MustIn5
Does not hold along path 1-2-4
• Block 4 should kill (a, b)
• Possible if pointee set of c is {a} (Use MayIn4)
• However, MustIn4 contains (a, b)
For killing points-to information through indirection,
• Must points-to analysis should identify pointees of c using MayIn4
• May points-to analysis should identify pointees of c using MustIn4
May and Must Analysis for Killing Points-to Information (2)
• May Points-to analysis should remove a May points-to pair
◮ only if it must be removed along all paths
Kill should remove only strong updates
⇒ should use Must Points-to information
• Must Points-to analysis should remove a Must points-to pair
◮ if it can be removed along any path
Kill should remove all weak updates
⇒ should use May Points-to information
Discovering Must Points-to Information from May Points-to Information
[Figure: control flow graph with blocks 1: a = &b; b = &e, 2: c = &a, 3: c = &d, 4: ∗c = &d; ∗a = &e, annotated with points-to graphs over a, b, c, e and the unknown location “?”.]
• BI. every pointer points to “?”
• Perform usual may points-to analysis
• Since c has multiple pointees, it is a MAY relation
• Since a has a single pointee, it is a MUST relation
Relevant Algebraic Operations on Relations (1)
• Let P ⊆ Var be the set of pointer variables
• May-points-to information: A = ⟨2^(P×Var), ⊇⟩
• Standard algebraic operations on points-to relations
Given relation R ⊆ P × Var and X ⊆ P,
  ◮ Relation application: R X = {v | u ∈ X ∧ (u, v) ∈ R}
    (Find out the pointees of the pointers contained in X)
  ◮ Relation restriction: R|X = {(u, v) ∈ R | u ∈ X}
    (Restrict the relation only to the pointers contained in X by removing points-to information of other pointers)
Relevant Algebraic Operations on Relations (2)
Let
Var = {a, b, c, d, e, f, g, ?}
P = {a, b, c, d, e}
R = {(a, b), (a, c), (b, d), (c, e), (c, g), (d, a), (e, ?)}
X = {a, c}
Then,
R X = {v | u ∈ X ∧ (u, v) ∈ R}
    = {b, c, e, g}
R|X = {(u, v) ∈ R | u ∈ X}
    = {(a, b), (a, c), (c, e), (c, g)}
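Both operations are one-line set comprehensions. The Python sketch below replays the example with relations as sets of pairs; the function names `apply_relation` and `restrict` are assumptions.

```python
def apply_relation(R, X):
    """Relation application R X: pointees of the pointers contained in X."""
    return {v for (u, v) in R if u in X}

def restrict(R, X):
    """Relation restriction R|X: keep only the pairs whose pointer is in X."""
    return {(u, v) for (u, v) in R if u in X}

R = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "e"), ("c", "g"), ("d", "a"), ("e", "?")}
X = {"a", "c"}

print(sorted(apply_relation(R, X)))
print(sorted(restrict(R, X)))
```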
Points-to Analysis Data Flow Equations
Ain_n = Var × {?}                    if n is Start_p
        ⋃_{p ∈ pred(n)} Aout_p       otherwise
Aout_n = (Ain_n − (Kill_n × Var)) ∪ (Def_n × Pointee_n)
• Ain/Aout: sets of mAy points-to pairs
• Kill_n, Def_n, and Pointee_n are defined in terms of Ain_n
  ◮ Def_n: Pointers that are defined (i.e. pointers in which addresses are stored)
  ◮ Pointee_n: Pointees (i.e. locations whose addresses are stored)
  ◮ Kill_n: Pointers whose points-to relations should be removed
CS 618 General Frameworks: Pointer Analyses 89/178
Extractor Functions for Points-to Analysis
Statement    Def_n       Kill_n            Pointee_n
use x        ∅           ∅                 ∅
x = &a       {x}         {x}               {a}
x = y        {x}         {x}               A{y}
x = ∗y       {x}         {x}               A(A{y} ∩ P)
∗x = y       A{x} ∩ P    Must(A){x} ∩ P    A{y}
other        ∅           ∅                 ∅
Values are defined in terms of Ain_n (denoted A)
◮ x = y (Pointee_n): pointees of y in Ain_n are the targets of defined pointers
◮ x = ∗y (Pointee_n): pointees of those pointees of y in Ain_n which are pointers
◮ ∗x = y (Def_n): pointees of x in Ain_n receive new addresses
◮ ∗x = y (Kill_n): strong update using must-points-to information computed from Ain_n
◮ ∗x = y (Pointee_n): pointees of y in Ain_n are the targets of defined pointers
Must(R) finds the must-pointees of all pointers:
\[
Must(R) = \bigcup_{z \in P} \{z\} \times \begin{cases} \{w\} & R\{z\} = \{w\} \wedge w \neq {?} \\ \emptyset & \text{otherwise} \end{cases}
\]
◮ First case: z has a single pointee w in the must-points-to relation
◮ Second case: z has no pointee in the must-points-to relation
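The extractor functions and the Aout equation can be put together in a few lines. The following is a minimal Python sketch, assuming points-to relations are sets of (pointer, pointee) tuples and statements are encoded as hypothetical tuples such as `("addr", "x", "a")` for x = &a; the encoding and all function names are assumptions made for illustration. As a simplification, the boundary information is taken as P × {?} rather than Var × {?}.

```python
UNKNOWN = "?"


def app(A, X):
    """Relation application A X: pointees of the pointers in X."""
    return {v for (u, v) in A if u in X}


def must(A, P):
    """Must(A): pairs (z, w) where w is the only pointee of z and w is not '?'."""
    result = set()
    for z in P:
        pointees = app(A, {z})
        if len(pointees) == 1 and UNKNOWN not in pointees:
            result |= {(z, w) for w in pointees}
    return result


def extract(stmt, A, P):
    """Return (Def, Kill, Pointee) for one statement, following the table."""
    kind = stmt[0]
    if kind == "addr":    # x = &a
        _, x, a = stmt
        return {x}, {x}, {a}
    if kind == "copy":    # x = y
        _, x, y = stmt
        return {x}, {x}, app(A, {y})
    if kind == "load":    # x = *y
        _, x, y = stmt
        return {x}, {x}, app(A, app(A, {y}) & P)
    if kind == "store":   # *x = y
        _, x, y = stmt
        return app(A, {x}) & P, app(must(A, P), {x}) & P, app(A, {y})
    return set(), set(), set()   # use x, other


def transfer(stmt, A_in, P):
    """Aout = (Ain - (Kill x Var)) U (Def x Pointee)."""
    defined, kill, pointee = extract(stmt, A_in, P)
    kept = {(u, v) for (u, v) in A_in if u not in kill}
    return kept | {(u, v) for u in defined for v in pointee}


P = {"x", "y", "z", "u"}
A = {(p, UNKNOWN) for p in P}          # simplified BI: every pointer points to "?"
program = [("addr", "x", "y"), ("addr", "y", "z"),
           ("addr", "z", "u"), ("store", "z", "y")]
for stmt in program:
    A = transfer(stmt, A, P)
print(sorted(A))
```

In this straight-line run, z has the single pointee u when ∗z = y is reached, so Must(A){z} = {u} and the old pair (u, ?) is removed: a strong update. Had z had two pointees, Kill would be empty and both pointees would only gain pairs, which is a weak update.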
An Example of Flow Sensitive May Points-to Analysis
[Figure: CFG with nodes n1: x = &y; y = &z; z = &u, n2: ∗z = y, n3: z = y, n4: ∗u = &x, and n5: ∗y = &y, annotated with may points-to graphs over x, y, z, u, and "?" at each program point. One indirect assignment is marked as a weak update and another as a strong update.]
Assume that the program is type correct
Tutorial Problems for Flow Sensitive Pointer Analysis (2)
Compute May and Must points-to information
if (. . . )
    p = &x;
else
    p = &y;
x = &a;
y = &b;
∗p = &c;
∗y = &a;
Non-Distributivity of Points-to Analysis
[Figure: two CFGs, each with nodes n1 to n4. May points-to: n2: x = &z, n3: y = &w, n4: ∗x = y. Must points-to: n2: b = &c; c = &d, n3: b = &e; e = &d, n4: a = ∗b.]
• May points-to: z → w is spurious
• Must points-to: a → d is missing
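The spurious pair in the may points-to case can be reproduced by comparing "merge, then apply the flow function" with "apply the flow function, then merge". The following is a minimal Python sketch under two assumptions: the "?" boundary information is ignored, and only the flow function for ∗x = y is modelled (the name `store` is illustrative).

```python
def app(A, X):
    """Relation application A X: pointees of the pointers in X."""
    return {v for (u, v) in A if u in X}


def store(A, x, y, P):
    """Flow function for *x = y on a may points-to relation A."""
    pts_x = app(A, {x})
    defined = pts_x & P
    pointees = app(A, {y})
    # Strong update only when x has exactly one pointee that is not "?".
    kill = defined if len(pts_x) == 1 and "?" not in pts_x else set()
    kept = {(u, v) for (u, v) in A if u not in kill}
    return kept | {(u, v) for u in defined for v in pointees}


P = {"x", "y", "z", "w"}
out_n2 = {("x", "z")}     # after x = &z
out_n3 = {("y", "w")}     # after y = &w

# Apply the flow function of n4 along each path, then merge.
merge_of_results = store(out_n2, "x", "y", P) | store(out_n3, "x", "y", P)
# Merge at the entry of n4, then apply the flow function (what the analysis does).
result_of_merge = store(out_n2 | out_n3, "x", "y", P)

spurious = result_of_merge - merge_of_results
print(sorted(spurious))
```

The difference is the single pair (z, w): no execution path establishes both x → z and y → w, yet the merged information makes ∗x = y generate z → w.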
An Outline of Pointer Analysis Coverage
• The larger perspective
• Comparing Points-to and Alias information
• Flow Insensitive Points-to Analysis
• Flow Sensitive Points-to Analysis
• Pointer Analyses: An Engineer’s Landscape (Next Topic)
• Liveness Based Points-to Analysis
• Generalizations to Heap, Arrays, Pointer Arithmetic, and Unions
An Example of Flow Insensitive May Points-to Analysis
[Figure: CFG with nodes n1: x = &y; y = &z; z = &u, n2: ∗z = y, n3: z = y, and n4: y = &x; use u; use x, shown with Andersen's points-to graph and Steensgaard's points-to graph over x, y, z, and u.]
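A flow insensitive analysis ignores the control flow and treats the program as an unordered set of statements. The following is a minimal Python sketch of an inclusion-based (Andersen-style) fixed point over the pointer assignments of this example; the tuple encoding of statements and the function name `andersen` are assumptions made for illustration, and the printed result is what this sketch computes rather than a transcription of the figure.

```python
from collections import defaultdict


def andersen(statements):
    """Iterate subset constraints until no points-to set grows."""
    pts = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in statements:
            if kind == "addr":       # lhs = &rhs
                targets, new = [lhs], {rhs}
            elif kind == "copy":     # lhs = rhs
                targets, new = [lhs], set(pts[rhs])
            elif kind == "load":     # lhs = *rhs
                targets = [lhs]
                new = set().union(*(pts[t] for t in list(pts[rhs])))
            elif kind == "store":    # *lhs = rhs
                targets, new = list(pts[lhs]), set(pts[rhs])
            else:
                continue
            for t in targets:
                if not new <= pts[t]:
                    pts[t] |= new
                    changed = True
    return dict(pts)


# Pointer assignments of the example; the "use" statements add no constraints.
program = [("addr", "x", "y"), ("addr", "y", "z"), ("addr", "z", "u"),
           ("store", "z", "y"), ("copy", "z", "y"), ("addr", "y", "x")]
pts = andersen(program)
for var in sorted(pts):
    print(var, "->", sorted(pts[var]))
```

Because every statement is assumed to be able to execute after every other statement, no points-to pair is ever killed, which is why the sets are larger than those of a flow sensitive analysis.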
An Example of Flow Sensitive May Points-to Analysis
[Figure: CFG with nodes n1: x = &y; y = &z; z = &u, n2: ∗z = y, n3: z = y, and n4: y = &x; use u; use x, annotated with flow sensitive may points-to graphs over x, y, z, and u at each program point, including ∅.]
For simplicity, we ignore the BI with "?"
Context Sensitivity in Interprocedural Analysis
[Figure: procedures r (Start_r, End_r), s (Start_s, a = &b, End_s), and t (Start_t, c = &d, End_t), with call and return nodes C_i/R_i and C_j/R_j, call sites c_i and c_j, and a flow function f_r. The points-to edges a → b and c → d are shown at several program points; in successive builds of the slide some of them are crossed out (×).]
We will revisit this concept and study it in detail in the fourth module (interprocedural data flow analysis) of the course
Context Sensitivity in the Presence of Recursion
[Figure: recursive procedure s with nodes Start_s, C_i, R_i, and End_s; edges are labelled call (c), return (r), and stop calling (s).]
• Paths from Start_s to End_s should constitute a context free language c^n s r^n
• Many interprocedural analyses treat a cycle of recursion as an SCC and approximate paths by a regular language c∗ s r∗
• We do not know any practical points-to analysis that is fully context sensitive
Most context sensitive approaches
◮ either do not consider recursion, or
◮ do not consider recursive pointer manipulation (e.g. "p = p → n"), or
◮ are context insensitive in recursion
We will revisit this concept and study it in detail in the fourth module (interprocedural data flow analysis) of the course
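The gap between the two path languages can be seen on short strings over the edge labels c, s, and r. The following is a small Python sketch; the two predicate names are illustrative.

```python
import re


def is_valid_path(path):
    """Context free language c^n s r^n: every call is matched by a return."""
    n = len(path) // 2
    return len(path) % 2 == 1 and path == "c" * n + "s" + "r" * n


def is_approx_path(path):
    """Regular approximation c* s r*: call and return counts are unrelated."""
    return re.fullmatch(r"c*sr*", path) is not None


for path in ["s", "ccsrr", "ccsr", "csrrr"]:
    print(path, is_valid_path(path), is_approx_path(path))
```

Paths such as "ccsr" and "csrrr" are accepted by the regular approximation but are not interprocedurally valid, so an analysis that uses the approximation may propagate data flow values along them.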
Pointer Analysis: An Engineer’s Landscape
[Figure: two-axis landscape of pointer analyses. Flow sensitivity increases along one axis through FI=, FI⊆, FISSA, FSNoKill, and FS; context sensitivity increases along the other through CI, CIObjSens, CSRecIns, and CS. One region is marked "Overcrowded Area" and another "Still Vacant".]
Data Structures: BDDs, probabilistic
Methods: parallel, on demand, randomized
Refinement: Levelwise, bootstrapping
That’s the corner we are trying to occupy :-)
An Outline of Pointer Analysis Coverage
• The larger perspective
• Comparing Points-to and Alias information
• Flow Insensitive Points-to Analysis
• Flow Sensitive Points-to Analysis
• Pointer Analyses: An Engineer’s Landscape
• Liveness Based Points-to Analysis (Next Topic)
• Generalizations to Heap, Arrays, Pointer Arithmetic, and Unions
Our Motivating Example for FCPA
[Figure: CFG with nodes n1: x = &y; y = &z; z = &u, n2: ∗z = y, n3: z = y, and n4: y = &x; use u; use x, annotated with may points-to graphs over x, y, z, and u at each program point.]
For simplicity, we ignore the BI with "?"
Is All This Information Useful?
[Figure: CFG with nodes n1: x = &y; y = &z; z = &u, n2: ∗z = y, n3: z = y, and n4: y = &x; use u; use x, annotated with points-to graphs over x, y, z, and u at each program point.]
For simplicity, we ignore the BI with "?"
The L and P of LFCPA
Mutual dependence of liveness and points-to information
• Define points-to information only for live pointers
• For pointer indirections, define liveness information using points-to information
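The mutual dependence can be illustrated in a few lines. The following is a minimal Python sketch, not the LFCPA equations themselves: `restrict_to_live` is relation restriction applied with a set of live pointers, and `live_for_load` encodes an assumed rule that x = ∗y makes y and the pointees of y live when x is live. Both function names and the sample data are hypothetical.

```python
def restrict_to_live(A, live):
    """Points-to information only for live pointers: the restriction A|live."""
    return {(u, v) for (u, v) in A if u in live}


def live_for_load(y, A, lhs_is_live):
    """Pointers made live by x = *y (assumed rule): y and the pointees of y."""
    if not lhs_is_live:
        return set()
    return {y} | {v for (u, v) in A if u == y}


# Hypothetical data: a points-to relation and a set of live pointers.
A = {("x", "y"), ("y", "z"), ("z", "u")}
live = {"z"}

usable = restrict_to_live(A, live)
made_live = live_for_load("y", A, lhs_is_live=True)
print(sorted(usable), sorted(made_live))
```

Liveness needs points-to information to resolve the indirection, and points-to information is kept only where liveness says it can be used, so the two analyses have to be computed together.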
The F and C of LFCPA
• Use call strings method for full flow and context sensitivity
• Use value contexts for efficient interprocedural analysis
Points-to Information is Small and Sparse
[Figure: bar chart over the benchmarks lbm, mcf, libquantum, bzip2, sjeng, hmmer, parser, and h264ref, comparing LFCPA and FCPA. Axis labels as extracted: "% of" and "Points-to Pairs per Basic Block"; scale 0 to 100; buckets 0, 1-4, 5-8, and 9+.]
LFCPA Observations
• Usable pointer information is very small and sparse
• Data flow propagation in real programs seems to involve only a small subset of all possible data flow values
• Earlier approaches reported inefficiency and non-scalability because they computed far more information than the actual usable information
LFCPA Conclusions
• Building quick approximations and compromising on precision may not be necessary for efficiency
• Building clean abstractions to separate the necessary information from redundant information is much more significant
Our experience of points-to analysis shows that
◮ Use of liveness reduced the pointer information . . .
◮ which reduced the number of contexts required . . .
◮ which reduced the liveness and pointer information . . .
• Approximations should come after building abstractions rather than before
LFCPA Lessons: The Larger Perspective
[Figure: diagram relating four approaches (exhaustive computation, computation restricted to usable information, incremental computation, and demand driven computation) to two questions: "What should be computed?" (from Maximum Computation to Minimum Computation) and "When should it be computed?" (from Early Computation to Late Computation).]
Do not compute what you don’t need!
Who defines what is needed? The answers shown in successive builds of this slide are: Client; Algorithm, Data Structure; Definition of Analysis; No One!
Avoid computing some values because
• they have been computed before, or
• they can just be “adjusted”, or
• they are equivalent to some other values
E.g. value based termination of call strings, work list based methods, BDDs
These seem orthogonal and may be used together
Tutorial Problems for FCPA and LFCPA
• Perform may points-to analysis by deriving must info using “?” in BI
• Perform liveness based points-to analysis
[Figure: two CFGs; statements are listed by node number.]
Program 1:
1: b = &a
2: c = b
3: a = &b
4: a = &c
5: a = ∗a
6: ∗b = c
7: use c
Program 2:
1: y = &z
2: z = &w
3: x = &u
4: x = &v
5: t = ∗y
6: ∗x = t
7: use u
An Outline of Pointer Analysis Coverage
• The larger perspective
• Comparing Points-to and Alias information
• Flow Insensitive Points-to Analysis
• Flow Sensitive Points-to Analysis
• Pointer Analyses: An Engineer’s Landscape
• Liveness Based Points-to Analysis
• Generalizations to Heap, Arrays, Pointer Arithmetic, and Unions (Next Topic)
Variables Var, Pointers P, Allocation Sites H, Fields F, pF, npF, Offsets C
Generalization for Heap and Structures
• Grammar.
α := malloc | &β | β
β := x | β.f | β → f | ∗β
where α is a pointer expression, x is a variable, and f is a field
• Memory model: Named memory locations. No numeric addresses
S = P ∪ H ∪ Sp (source locations)
T = Var ∪ H ∪ Sm ∪ {?} (target locations)
Sp = R × npF∗ × pF (pointers in structures)
Sm = R × npF∗ × (pF ∪ npF) (other locations in structures)
Named Locations for Pointer Expressions
typedef struct B
{ ...
struct B *f;
} sB;
typedef struct A
{ ...
struct B g;
} sA;
sA *a;
sB *x, *y, b;
1. a = (sA*) malloc
(sizeof(sA));
2. y = &a->g;
3. b.f = y;
4. x = &b;
5. y.f = &x;
6. return x->f->f;
[Figure: memory graph over x, a, y, b (with field f), and the heap object o1 (with field g containing field f).]
Pointer Expression    l-value    r-value
x                     x          b
x → f                 b.f        o1.g.f
x → f → f             o1.g.f     b
CS 618 General Frameworks: Heap Reference Analysis 157/178
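Data Flow Equations for Explicit Liveness Analysis: Access Graphs Version
\[
In_n = \left( Out_n \ominus Kill_n(Out_n) \right) \uplus Gen_n(Out_n)
\]
\[
Out_n = \begin{cases} BI & n \text{ is } End \\ \biguplus_{s \in succ(n)} In_s & \text{otherwise} \end{cases}
\]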
• Inn, Outn, and Genn are access graphs
• Killn is a set of access paths
Flow Functions for Explicit Liveness Analysis: Access Paths Version
Let A denote May Aliases at the exit of node n
Statement n    Gen_n(X)                          Kill_n(X)
x = y          {y σ | x σ ∈ X}                   x ∗
x = y.f        {y f σ | x σ ∈ X}                 x ∗
x.f = y        {y σ | z f σ ∈ X, z ∈ A(x)}       ⋃_{z ∈ Must(A)(x)} z f ∗
x = new        ∅                                 x ∗
x = null       ∅                                 x ∗
other          ∅                                 ∅
◮ Gen for x.f = y: may link aliasing for soundness
◮ Kill for x.f = y: must link aliasing for precision
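The first two rows of the table can be exercised on small sets of access paths. The following is a minimal Python sketch, assuming an access path is a tuple whose first element is the root variable and whose remaining elements are field names, and assuming the access paths version combines the sets as (Out − Kill) ∪ Gen; the function names are illustrative, and aliasing (needed for x.f = y) is not modelled.

```python
def gen_copy(x, y, X):
    """Gen for x = y: {y sigma | x sigma in X}."""
    return {(y,) + path[1:] for path in X if path[0] == x}


def gen_load(x, y, f, X):
    """Gen for x = y.f: {y f sigma | x sigma in X}."""
    return {(y, f) + path[1:] for path in X if path[0] == x}


def kill_root(x, X):
    """Kill 'x *': every access path in X rooted at x."""
    return {path for path in X if path[0] == x}


def live_in(X, gen, kill):
    """Assumed combination for access path sets: (Out - Kill) U Gen."""
    return (X - kill) | gen


# Liveness at the exit of the statement x = x.rptr (hypothetical input).
out = {("x", "lptr"), ("z",)}
result = live_in(out,
                 gen_load("x", "x", "rptr", out),
                 kill_root("x", out))
print(sorted(result))
```

The path x.lptr that is live after the assignment becomes x.rptr.lptr before it, while z is unaffected.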
Flow Functions for Explicit Liveness Analysis: Access Graphs Version
• A denotes May Aliases at the exit of node n
• mkGraph(ρ) creates an access graph for access path ρ
Statement n    Gen_n(X)                                      Kill_n(X)
x = y          mkGraph(y) # (X/x)                            {x}
x = y.f        mkGraph(y f) # (X/x)                          {x}
x.f = y        mkGraph(y) # (⋃_{z ∈ A(x)} (X/(z f)))         {z f | z ∈ Must(A)(x)}
x = new        ∅                                             {x}
x = null       ∅                                             {x}
other          ∅                                             ∅
Liveness Analysis of Example Program: 1st Iteration
1 w = x 1
x l4 l6
x l4 l6
2 while (x.data < max) 2
x l4 l6
3 x = x.rptr 3
EG
EG
4 y = x.lptr 4
x l4 l6
5 z = New class of z 5
x y l6
6 y = y.lptr 6
x y l6 z
7 z.sum = x.data + y.data 7
x y z
Liveness Analysis of Example Program: 2nd Iteration
1 w = x 1
x r3 l4 l6
x r3 l4 l6
2 while (x.data < max) 2
x r3 l4 l6
3 x = x.rptr 3
x r3 l4 l6
x l4 l6
4 y = x.lptr 4
x l4 l6
5 z = New class of z 5
x y l6
6 y = y.lptr 6
x y l6 z
7 z.sum = x.data + y.data 7
x y z
Liveness Analysis of Example Program: 3rd Iteration
1 w = x 1
x r3 l4 l6
x r3 l4 l6
2 while (x.data < max) 2
x r3 l4 l6
3 x = x.rptr 3
x r3 l4 l6
x r3 l4 l6
4 y = x.lptr 4
x l4 l6
5 z = New class of z 5
x y l6
6 y = y.lptr 6
x y l6 z
7 z.sum = x.data + y.data 7
x y z
Liveness Analysis of Example Program: 4th Iteration
1 w = x 1
x r3 l4 l6
x r3 l4 l6
2 while (x.data < max) 2
x r3 l4 l6
3 x = x.rptr 3
x r3 l4 l6
x r3 l4 l6
4 y = x.lptr 4
x l4 l6
5 z = New class of z 5
x y l6
6 y = y.lptr 6
x y l6 z
7 z.sum = x.data + y.data 7
x y z
Tutorial Problem for Explicit Liveness (1)
Construct access graphs at the entry of block 1 for the following programs
A B C
1 x = x .n 1
2 x = x .n 1
3 Use x .r .d 1
1 x = x .n 1
2 x = x .n 1
3 Use x .r .d 1
1 x = x .n 1
2 x = x .n 1
3 Use x .r .d 1
D E F
1 x = x .n 1
2 x = x .n 1
3 Use x .r .d 1
4 y = x .r 4
x = x .n1
x = x .n2 x = x .l 3
Use x .r .d4
x = x .n1
x = x .n2 x = x .l 3
y = x .r4
5 Use x .r .d 5
Why are the access graphs for programs B and D identical?
The final magic!!
Rotate each picture anti-clockwise by 90° and compare it with its access graph
The structure of the access graph of variable x is identical to the control flow structure between pointer assignments of x
Tutorial Problem for Explicit Liveness (2)
• Unfortunately the student who constructed these access graphs forgot to attach statement numbers as subscripts to node labels and has misplaced the programs which gave rise to these graphs
• Please help her by constructing CFGs for which these access graphs represent explicit liveness at some program point in the CFGs
[Figure: two access graphs, one rooted at x and one rooted at y, whose nodes carry the field labels l and r without statement-number subscripts.]
Tutorial Problem for Explicit Liveness (3)
• Compute explicit liveness for the program.
• Are the following access paths live at node 1? Show the corresponding execution sequence of statements
P1 : y m l
P2 : y l n m
P3 : y l n l
P4 : y n l n
[Figure: CFG with nodes 1: x = z, 2: x = y.l, 3: x.n = y.m, 4: y = x.n, and 5: use x.d.]
Which Access Paths Can be Nullified?
• Consider extensions of accessible paths for nullification.
Let ρ be accessible at p (i.e. available or anticipable)
for each reference field f of the object pointed to by ρ
if ρ f is not live at p then
Insert ρ f = null at p subject to profitability
• For simple access paths, ρ is empty and f is the root variable name.
◮ "accessible": can be safely dereferenced
◮ "not live": consider link aliases at p
◮ "subject to profitability": cannot be hoisted and is not redefined at p
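The nullification rule above can be sketched as a loop over accessible paths and their reference fields. The following is a minimal Python sketch, assuming access paths are tuples of names, that liveness is available as a set of such tuples, and that `fields_of` and `profitable` are hypothetical callbacks supplied by the caller; link aliasing and the profitability conditions themselves are not modelled.

```python
def null_assignments(accessible, live, fields_of, profitable=lambda path: True):
    """For each accessible rho and reference field f, emit 'rho.f = null' if rho f is not live."""
    inserts = []
    for rho in sorted(accessible):
        for f in fields_of(rho):
            candidate = rho + (f,)
            if candidate not in live and profitable(candidate):
                inserts.append(".".join(candidate) + " = null")
    return inserts


# Hypothetical program point: x is accessible, and only x.lptr is live.
accessible = {("x",)}
live = {("x",), ("x", "lptr")}
inserts = null_assignments(accessible, live, lambda rho: ["lptr", "rptr"])
print(inserts)
```

Only x.rptr is nullified here: x.lptr is live, and paths longer than one field beyond x are not considered because their prefixes are not known to be accessible.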
Availability and Anticipability Analyses
• ρ is available at program point p if the target of each prefix of ρ is guaranteed to be created along every control flow path reaching p.
• ρ is anticipable at program point p if the target of each prefix of ρ is guaranteed to be dereferenced along every control flow path starting at p.
• Finiteness.
◮ An anticipable (available) access path must be anticipable (available) along every path. Thus unbounded paths arising out of loops cannot be anticipable (available).
◮ Due to the "every control flow path" nature, computation of anticipable and available access paths uses ∩ as the confluence. Thus the sets are bounded.
⇒ No need of access graphs.
Availability Analysis of Example Program
1 w = x 1
∅
∅
2 while (x.data < max) 2
{x}
3 x = x.rptr 3
{x}
∅4 y = x.lptr 4
{x}
5 z = New class of z 5
{x}
6 y = y.lptr 6
{x , z}
7 z.sum = x.data + y.data 7
{x , z}
{x , y , z}
Anticipability Analysis of Example Program
1 w = x 1
{x}
{x}
2 while (x.data < max) 2
{x}
3 x = x.rptr 3
{x , x rptr }
{x}4 y = x.lptr 4
{x , x lptr, x lptr lptr }
5 z = New class of z 5
{x , y , y lptr }
6 y = y.lptr 6
{x , y , y lptr, z}
7 z.sum = x.data + y.data 7
{x , y , z}
∅
Live and Accessible Paths
1 w = x 1
x r3 l4 l6
x r3 l4 l6 {x}
{x}
2 while (x.data < max) 2
x r3 l4 l6
{x}
3 x = x.rptr 3
x r3 l4 l6
x r3 l4 l6
{x , x rptr }
{x}
4 y = x.lptr 4
x l4 l6
{x , x lptr, x lptr lptr }
5 z = New class of z 5
x y l6 {x , y , y lptr }
6 y = y.lptr 6
x y l6 z{x , y , y lptr, z}
7 z.sum = x.data + y.data 7
x y z {x , y , z}
{x , y , z}
Creating null Assignments from Live and Accessible Paths