Feb 22, 2016
Precise Program Analysis with Data Structures
Collaborators: George Necula, Xavier Rival (INRIA)
Bor-Yuh Evan Chang, University of California, Berkeley
February-April 2008
Precise Program Analysis with Data Structures
by Designing with the User in Mind
3Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures
Software errors cost a lot
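~$60 billion annually (~0.5% of US GDP), according to a 2002 National Institute of Standards and Technology report
[Figure: the cost compared with the total annual revenue of one organization and with >10x the annual budget of another; the logos naming them are not recoverable]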
But there’s hope in program analysis
Microsoft uses and distributes the Static Driver Verifier
Airbus applies the Astrée Static Analyzer
Companies, such as Coverity and Fortify, market static source code analysis tools
Because program analysis can eliminate entire classes of bugs
For example,
– Reading from a closed file: read( );
– Reacquiring a locked lock: acquire( );
How?
– Systematically examine the program
– Simulate running program on “all inputs”
– “Automated code review”
Program analysis by example: Checking for double acquires
Simulate running program on “all inputs”
… code …
// x now points to an unlocked lock
acquire(x);
… code …
acquire(x);
… code …
[Figure: the analysis state for x]
Program analysis by example: Checking for double acquires
Simulate running program on “all inputs”
… code …
// x now points to an unlocked lock in a linked list
acquire(x);
… code …
[Figure: the ideal analysis state, an unbounded disjunction of concrete lists containing x (“or … or … or …”)]
Undecidability
Must abstract
… code …
// x now points to an unlocked lock in a linked list
acquire(x);
… code …
[Figure: the ideal analysis state (an unbounded disjunction of lists containing x) next to an abstracted analysis state]
For decidability, must abstract: “model all inputs” (e.g., merge objects)
Abstraction too coarse or not precise enough (e.g., lost that x is always unlocked)
mislabels good code as buggy
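The effect of a coarse abstraction can be illustrated with a small sketch. It is not taken from the talk: the three-valued `Lock` domain, the `join` operator, and the `check_acquire` helper are hypothetical names chosen for this illustration. The sketch compares an abstraction that tracks x on its own with one that merges x with the other list nodes.

```python
from enum import Enum

class Lock(Enum):
    UNLOCKED = "unlocked"
    LOCKED = "locked"
    UNKNOWN = "unknown"  # the abstraction no longer knows the state

def join(a, b):
    """Merge two abstract lock states (hypothetical merge operator)."""
    return a if a == b else Lock.UNKNOWN

def check_acquire(state):
    """Abstractly run acquire(x); return (new_state, warning or None)."""
    if state == Lock.UNLOCKED:
        return Lock.LOCKED, None
    if state == Lock.LOCKED:
        return Lock.LOCKED, "definite double acquire"
    return Lock.LOCKED, "possible double acquire"

# Precise abstraction: x is tracked separately and is known to be unlocked.
precise_x = Lock.UNLOCKED
_, precise_warning = check_acquire(precise_x)

# Coarse abstraction: x is merged with the other nodes of the list,
# some of which are locked, so the fact "x is unlocked" is lost.
other_nodes = [Lock.LOCKED, Lock.UNLOCKED]
merged = precise_x
for s in other_nodes:
    merged = join(merged, s)
_, coarse_warning = check_acquire(merged)
```

Under these assumptions the precise abstraction accepts the code, while the merged one reports a possible double acquire on code that is in fact correct.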
To address the precision challenge
Traditional program analysis mentality:
“Why can’t developers write more specifications for our analysis? Then, we could verify so much more.”
“Since developers won’t write specifications, we will use default abstractions (perhaps coarse) that hopefully work most of the time.”
My approach:
“Can we design program analyses around the user? Developers write testing code. Can we adapt the analysis to use that code as specifications?”
Summary of overview
Challenge in analysis: Finding a good abstraction
– precise enough but not more than necessary
Powerful, generic abstractions
– expensive, hard to use and understand
Built-in, default abstractions
– often not precise enough (e.g., data structures)
My approach: Must involve the user in abstraction
– without expecting the user to be a program analysis expert
Overview of contributions
Extensible Inductive Shape Analysis [POPL’08, SAS’07]
Precise inference of data structure properties
– Able to check, for instance, the locking example
Targeted to software developers
– Uses data structure checking code for guidance
– Turns testing code into a specification for static analysis
Efficient
– ~10-100x speed-up over generic approaches
– Builds abstraction out of developer-supplied checking code
Extensible Inductive Shape Analysis
Precise inference of data structure properties
Developer-oriented approach
[POPL’08, SAS’07]
…
Part 1
Shape analysis is a fundamental analysis
Data structures are at the core of
– Traditional languages (C, C++, Java)
– Emerging web scripting languages
Improves verifiers that try to
– Eliminate resource usage bugs (locks, file handles)
– Eliminate memory errors (leaks, dangling pointers)
– Eliminate concurrency errors (data races)
– Validate developer assertions
Enables program transformations
– Compile-time garbage collection
– Data structure refactorings
– …
Shape analysis by example: Removing duplicates
// l is a sorted doubly-linked list
for each node cur in list l {
  remove cur if duplicate;
}
assert l is sorted, doubly-linked with no duplicates;
[Figure, “Example/Testing” column: the list l = 2, 2, 4, 4 becomes 2, 4, 4 (with cur in the list) and finally 2, 4]
[Figure, “Code Review/Static Analysis” column: l starts as a “sorted dl list”; in the intermediate state a “segment with no duplicates” from l up to cur is followed by a “sorted dl list”; at the end l has “no duplicates”. The predicates are program-specific and the intermediate state is more complicated.]
Shape analysis is not yet practical
Choosing the heap abstraction difficult for precision
Traditional approaches:
TVLA [Sagiv et al.]: Parametric in low-level, analyzer-oriented predicates
+ Very general and expressive
- Hard for non-expert
Space Invader [Distefano et al.]: Built-in high-level predicates
- Hard to extend
+ No additional user effort (if precise enough)
My approach:
Xisa: Parametric in high-level, developer-oriented predicates
+ Extensible
+ Targeted to developers
Key insight for being developer-friendly and efficient
Utilize “run-time checking code” as specification for static analysis.
assert(sorted_dll(l,…));
for each node cur in list l {
  remove cur if duplicate;
}
assert(sorted_dll_nodup(l,…));
[Figure: abstract states of l and cur before, during, and after the loop]
Contribution: Build the abstraction for analysis out of developer-specified checking code
Contribution: Automatically generalize checkers for complicated intermediate states
checker:
dll(h, p) =
  if (h = null) then
    true
  else
    h->prev = p and dll(h->next, h)
• p specifies where prev should point
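As run-time checking code, the dll(h, p) checker is an ordinary recursive function. The sketch below transcribes it into Python; the `Node` class and the `from_values` helper are assumptions made for this illustration and do not appear in the talk.

```python
class Node:
    """Hypothetical list node with the two link fields the checker reads."""
    def __init__(self, val):
        self.val = val
        self.prev = None
        self.next = None

def dll(h, p):
    """Checker from the slide: h heads a doubly-linked list and the prev
    field of the first node points to p."""
    if h is None:
        return True
    return h.prev is p and dll(h.next, h)

def from_values(values):
    """Hypothetical helper: build a well-formed list and return its head."""
    head = prev = None
    for v in values:
        node = Node(v)
        node.prev = prev
        if prev is None:
            head = node
        else:
            prev.next = node
        prev = node
    return head

l = from_values([2, 2, 4, 4])
ok_before = dll(l, None)   # the list is well formed
l.next.prev = None         # break one back pointer
ok_after = dll(l, None)    # the checker now rejects the list
```

The static analysis described in the talk uses the same definition as a specification rather than executing it on concrete lists.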
Our framework is …
An automated shape analysis with a precise memory abstraction based around invariant checkers.
• Extensible and targeted for developers
  – Parametric in developer-supplied checkers
• Precise yet compact abstraction for efficiency
  – Data structure-specific based on properties of interest to the developer
[Figure: checkers feed into the shape analyzer]
checkers:
dll(h, p) =
  if (h = null) then
    true
  else
    h->prev = p and dll(h->next, h)
Shape analysis is an abstract interpretation on abstract memory descriptions with …
– Splitting of summaries (materialization)
– To reflect updates precisely (strong updates)
– And summarizing for termination (widening)
[Figure: a sequence of abstract states of l and cur illustrating splitting, updating, and summarizing]
Outline
[Figure: checkers feed into the shape analyzer, whose parts are numbered 1 to 3: a type “pre-analysis” on checker definitions, and an abstract interpretation made of splitting and interpreting updates, and summarizing]
checkers:
dll(h, p) =
  if (h = null) then
    true
  else
    h->prev = p and dll(h->next, h)
Learn information about the checker to use it as an abstraction
Compare and contrast manual code review and our automated shape analysis
Overview: Split summaries to interpret updates precisely
Want abstract update to be “exact”, that is, to update one “concrete memory cell”.
The example at a high level: iterate using cur, changing the doubly-linked list from purple to red.
[Figure: the list l with cur; “split at cur”; “update cur purple to red”]
Challenge: How does the analysis “split” summaries and know where to “split”?
“Split forward” by unfolding inductive definition
dll(h, p) =
  if (h = null) then
    true
  else
    h->prev = p and dll(h->next, h)
get: cur->next
[Figure: the summary dll(cur, p) at cur is unfolded into two cases joined by ∨: either cur is null, or cur is a cell whose prev field is p and whose next field is n, followed by the summary dll(n, cur)]
Analysis doesn’t forget the empty case
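One forward unfolding step of dll can be sketched as a function from a summary edge to a list of disjuncts. This is a simplified illustration under assumed data types, not Xisa's implementation: `Summary`, `PointsTo`, and `unfold_dll` are hypothetical names, and pure facts are kept as plain strings.

```python
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class Summary:
    """Checker edge dll(head, prev_arg): an unexpanded region of the heap."""
    head: str
    prev_arg: str

@dataclass(frozen=True)
class PointsTo:
    """One materialized field of one cell: src.field points to dst."""
    src: str
    field: str
    dst: str

_fresh = itertools.count()

def unfold_dll(s):
    """Unfold dll(h, p) once, following the two branches of the checker.
    Returns a list of disjuncts, each a pair (pure_facts, edges)."""
    n = f"n{next(_fresh)}"  # fresh symbolic name for the value of h.next
    empty_case = ([f"{s.head} = null"], [])
    cell_case = (
        [f"{s.head} != null"],
        [PointsTo(s.head, "prev", s.prev_arg),
         PointsTo(s.head, "next", n),
         Summary(n, s.head)],
    )
    return [empty_case, cell_case]

cases = unfold_dll(Summary("cur", "p"))
```

Both disjuncts are returned, so the empty case stays part of the analysis state; reading cur->next is only possible in the second one, where the next field has been materialized.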
“Split backward” also possible and necessary
dll(h, p) =
  if (h = null) then
    true
  else
    h->prev = p and dll(h->next, h)
for each node cur in list l {
  remove cur if duplicate;
}
assert l is sorted, doubly-linked with no duplicates;
cur->prev->next = cur->next;
get: cur->prev->next
[Figure: the “dll segment” from l up to cur is unfolded backward to expose the cell before cur, in two cases joined by ∨; the summary dll(n, cur) follows cur in each case]
Technical Details: [POPL’08]
– How does the analysis do this unfolding?
– Why is this unfolding allowed? (Key: Segments are also inductively defined)
How does the analysis know to do this unfolding?
Outline
[Figure: the outline diagram of the shape analyzer, with its parts numbered 1 to 3: a type “pre-analysis” on checker definitions, and an abstract interpretation made of splitting and interpreting updates, and summarizing]
How do we decide where to unfold?
Derives additional information to guide unfolding
Contribution: Turns testing code into specification for static analysis
checkers:
dll(h, p) =
  if (h = null) then
    true
  else
    h->prev = p and dll(h->next, h)
Abstract memory as graphs
dll(h, p) =
  if (h = null) then
    true
  else
    h->prev = p and dll(h->next, h)
Make endpoints and segments explicit, yet high-level
[Figure: a graph with nodes α (pointed to by l), β, γ (pointed to by cur), and δ. A thick “dll segment” edge runs from α, labeled dll(null), to γ, labeled dll(β). Thin edges give γ->prev = β and γ->next = δ. A thick checker edge dll(γ) leaves δ.]
Legend:
– memory address (value): a node
– memory cell (points-to: γ->next = δ): a thin edge
– checker summary (inductive pred): a thick edge
– segment summary: a thick edge between two nodes, standing for some number of memory cells (thin edges)
Contribution: Generalization of checker (Intuitively, dll(α, null) up to dll(γ, β).)
Which summary (thick edge), in what direction, and how far do we unfold to get the edge β->next (cur->prev->next)?
Types for deciding where to unfold
Checker Definition
dll(h, p) =
  if (h = null) then
    true
  else
    h->prev = p and dll(h->next, h)
h : {next⟨0⟩, prev⟨0⟩}
p : {next⟨-1⟩, prev⟨-1⟩}
Says:
– For h->next/h->prev, unfold from h
– For p->next/p->prev, unfold before h
Checker “Run” (call tree/derivation)
dll(α, null)
dll(β, α)
dll(γ, β)
dll(δ, γ)
dll(null, δ)
[Figure: the calls of the run are annotated with levels 1, 0, -1, -2]
Instance: null, α, β, γ, δ, null
Summary: [Figure: α labeled dll(null), a segment up to dll(β), and a checker edge dll(β) at γ]
If it exists, where is:
– γ->next ? (level 0)
– β->next ? (level -1)
Types make the analysis robust with respect to how checkers are written
Different types for different unfolding
Doubly-linked list checker (as before)
dll(h, p) =
  if (h = null) then
    true
  else
    h->prev = p and dll(h->next, h)
h : {next⟨0⟩, prev⟨0⟩}
p : {next⟨-1⟩, prev⟨-1⟩}
[Figure: an instance α, β, γ, null and its summary with dll edges at β and γ]
Alternative doubly-linked list checker
dll′(h) =
  if (h->next = null) then
    true
  else
    h->next->prev = h and dll′(h->next)
h : {next⟨0⟩, prev⟨-1⟩}
[Figure: an instance β, γ, null and its summary with dll′ edges at β and γ]
γ->prev ? (level -1)
Summary of checker parameter types
– Tell where to unfold for which fields
– Make analysis robust with respect to how checkers are written
– Learn where in summaries unfolding won’t help
– Can be inferred automatically with a fixed-point computation on the checker definitions
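The fixed-point computation can be sketched on a deliberately simplified encoding of a checker. This is an illustration only, not the inference algorithm of the POPL’08 paper. It assumes a checker with a single recursive call, described by the fields it dereferences, written as (base path, field) pairs, and by the argument path passed for each parameter. The `MIN_LEVEL` cutoff is a hypothetical device to keep the iteration finite.

```python
MIN_LEVEL = -3  # assumed cutoff; deeper levels are dropped in this sketch

def infer_types(params, accesses, rec_call_args):
    """Infer, for each parameter, a set of (field, level) pairs.

    Level 0 means the field is materialized by unfolding the current call;
    level -1 means it was materialized by the unfolding one call earlier.
    """
    types = {x: set() for x in params}
    # Fields dereferenced directly off a parameter are at level 0.
    for base, field in accesses:
        if len(base) == 1 and base[0] in types:
            types[base[0]].add((field, 0))
    changed = True
    while changed:
        changed = False
        for x, arg in zip(params, rec_call_args):
            if len(arg) == 1:
                # The argument is a parameter of the caller: inherit its type.
                known = set(types.get(arg[0], set()))
            else:
                # The argument is a path such as h.next: collect the fields
                # the caller dereferences off that path.
                known = {(f, 0) for base, f in accesses if base == arg}
            for field, level in known:
                shifted = (field, level - 1)  # one call earlier for the callee
                if level - 1 >= MIN_LEVEL and shifted not in types[x]:
                    types[x].add(shifted)
                    changed = True
    return types

# dll(h, p): reads h.prev and h.next, then calls dll(h.next, h).
dll_types = infer_types(
    ["h", "p"],
    {(("h",), "prev"), (("h",), "next")},
    [("h", "next"), ("h",)],
)

# Alternative checker: reads h.next and h.next.prev, then calls itself on h.next.
alt_types = infer_types(
    ["h"],
    {(("h",), "next"), (("h", "next"), "prev")},
    [("h", "next")],
)
```

On these two encodings the sketch reproduces the types shown on the slides: next and prev at level 0 for h and at level -1 for p in dll, and next at level 0 with prev at level -1 for h in the alternative checker.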
Summary of interpreting updates
Splitting of summaries needed for precision
Unfolding checkers is a natural way to do splitting
– When checker traversal matches code traversal
Checker parameter types
– Enable, for example, “back pointer” traversal without blindly guessing where to unfold
Outline
[Figure: the outline diagram of the shape analyzer, with its parts numbered 1 to 3: a type “pre-analysis” on checker definitions, and an abstract interpretation made of splitting and interpreting updates, and summarizing]
checkers:
dll(h, p) =
  if (h = null) then
    true
  else
    h->prev = p and dll(h->next, h)
Summarize by folding into inductive predicates
last = l;
cur = l->next;
while (cur != null) {
  // … cur, last …
  if (…) last = cur;
  cur = cur->next;
}
[Figure: abstract states from successive loop iterations. First l and last share a node with one next edge to cur, followed by a list; then one and two more next edges separate l, last, and cur. These are summarized into a list segment from l to last, a next edge, a list segment to cur, and a list from cur.]
Challenge: Precision (e.g., last, cur separated by at least one step)
Previous approaches guess where to fold for each graph.
Contribution: Determine where by comparing graphs across history
Summary: Given checkers, everything is automatic
[Figure: checkers feed into the shape analyzer, which consists of a type “pre-analysis” on checker definitions and an abstract interpretation (splitting and interpreting updates; summarizing)]
checkers:
dll(h, p) =
  if (h = null) then
    true
  else
    h->prev = p and dll(h->next, h)
Results: Performance
Benchmark                                   Max. Num. Graphs at a Program Pt   Analysis Time (ms)
singly-linked list reverse                  1                                  0.6
doubly-linked list reverse                  1                                  1.4
doubly-linked list copy                     2                                  5.3
doubly-linked list remove                   5                                  6.5
doubly-linked list remove and back          5                                  6.8
search tree with parent insert              5                                  8.3
search tree with parent insert and back     5                                  47.0
two-level skip list rebalance               6                                  87.0
Linux scull driver (894 loc)                4                                  9710.0
  (char arrays ignored, functions inlined)
Times negligible for data structure operations (often in sec or 1/10 sec)
Expressiveness: Different data structures
Verified shape invariant as given by the checker is preserved across the operation.
[Callouts on the table, rows not recoverable: TVLA: 850 ms; TVLA: 290 ms]
Space Invader only analyzes lists (built-in)
Demo: Doubly-linked list reversal
http://xisa.cs.berkeley.edu
Body of loop over the elements: Swaps the next and prev fields of curr.
[Figure labels: already reversed segment; node whose next and prev fields were swapped; not yet reversed list]
Experience with the tool
Checkers are easy to write and try out
– Enlightening (e.g., red-black tree checker in 6 lines)
– Harder to “reverse engineer” for someone else’s code
– Default checkers based on types useful
Future expressiveness and usability improvements
– Pointer arithmetic and arrays
– More generic checkers:
  polymorphic: “element kind unspecified”
  higher-order: parameterized by other predicates
Future evaluation: user study
Short-term future work: Exploiting common specification framework
Scenario: Code instrumented with lots of checker calls (perhaps automatically with object invariants)
assert( mychecker(x) );
// … operation on x …
assert( mychecker(x) );
• Very slow to execute
• Hard to prove statically (in general)
Can we prove parts statically?
– Static Analysis View: Hybrid checking
– Testing View: Incrementalize invariant checking
Example: Insert in a sorted list
[Figure: the list l with the new element v inserted between u and w]
– Preservation of sortedness shown statically
– Emit run-time check for new element: u ≤ v ≤ w
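The sorted-insert example can be sketched as follows, using a Python list in place of the linked list on the slide (an assumption made to keep the example short). `is_sorted` stands for the full checker that would otherwise run before and after every operation; `insert_sorted` keeps only the residual local check on the new element. Both function names are hypothetical.

```python
def is_sorted(xs):
    """Full checker: traverses the whole list, so it is costly as an assert."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

def insert_sorted(xs, v):
    """Insert v into the sorted list xs, checking only u <= v <= w."""
    i = 0
    while i < len(xs) and xs[i] <= v:
        i += 1
    u = xs[i - 1] if i > 0 else None      # left neighbour, if any
    w = xs[i] if i < len(xs) else None    # right neighbour, if any
    # Residual run-time check for the new element only.
    assert (u is None or u <= v) and (w is None or v <= w)
    xs.insert(i, v)
    return xs

l = insert_sorted([1, 3, 5], 4)
```

If the analysis shows statically that the rest of the list stays sorted, the constant-time check on u, v, and w is all that remains of the full traversal.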
Summary of Extensible Inductive Shape Analysis
Key Insight: Checkers as specifications
– Developer View: Global, Expressed in a familiar style
– Analysis View: Capture developer intent, Not arbitrary inductive definitions
Constructing the program analysis
– Intermediate states: Generalized segment predicates [Figure: a segment from α, labeled c(γ), to β, labeled c′(γ′)]
– Splitting: Checker parameter types with levels, e.g., h : {next⟨0⟩, prev⟨0⟩}, p : {next⟨-1⟩, prev⟨-1⟩}
– Summarizing: History-guided approach [Figure: next and list edges folded into list segments]
Are there other kinds of program analysis users?
Two kinds of users of program analysis
1. Software developer
Wants: Precise program analysis for development tools
Extensible Inductive Shape Analysis [POPL’08, SAS’07]
2. Software end-user
Wants: Program analysis to certify software is ok
Analysis of Low-Level Code Using Decompilers [SAS’06, TLDI’05]
Part 2
Analysis of Low-Level Code Using Cooperating Decompilers
[SAS’06, TLDI’05]
End-users want low-level code analysis
• Want analyses to check code to be executed is ok
  – E.g., won’t crash, good wrt Static Driver Verifier
• Do not know any details about the program
  – Analysis must be fully automatic
• But can demand additional information from the developer
  – To make analysis automatic
But most program analyses operate at the source-level!
[Figure: source code is compiled to executable code; the end-user needs an analyzer for low-level code]
Analyzers for low-level code are more difficult and tedious to build
Porting source-level analyses is error prone
– one statement becomes many instructions
– dependencies between instructions must be carefully tracked
Key Insight: Low-level complexity
– deals with compilation idioms
– mostly independent of the analysis
– can be captured with intermediate languages
[Figure: source code, executable code, the end-user, and analyzers for low-level code]
Decompile code rather than port analysis
Framework of small, cooperating decompilers that gradually lift the level of the program
Decompilation for program analysis
– Need not get to original, nor be human understandable
– Only concern is safety, not performance
  • Unlike, e.g., Java VM platform (JIT compiler)
– Can use additional meta-data (e.g., source-level types)
[Figure: source code, executable code, the end-user, and an analyzer for low-level code]
Summary of results
Flexibility and usability
– 3 compilers (gcc/C, gcj/Java, coolc/Cool)
– 2 architectures (x86, MIPS)
– With 6 decompiler modules
– Basic Java type-checking for gcj output implemented in 3-4 hours, 500 lines of code
Benefits of modularity
– decompiler-based re-implementation of a low-level analysis uncovered 8 bugs in the original implementation (heavily used, deployed in the classroom)
Applicability of existing source-level tools
– applied C code tools, BLAST and Cqual, on decompiled benchmarks (size: ~10,000 lines of C)
Future Research
Long-term and outreach
Theme: Overcome decidability issues in program analysis by tailoring it to the user
• “Programs” are no longer only written to be executed on computers
  – E.g., computational models of biological pathways in systems biology
• Need new “program” analysis tools
  – Validate models (e.g., pathway model produces only expected products)
  – Reason about models
• How do these users work?
Conclusion
Extensible Inductive Shape Analysis
– precision-demanding program analysis improved by novel user interaction
– Developer: Gets results corresponding to intuition
– Analysis: Focused on what’s important to the developer
Cooperating Decompilers
– adapt program analyses to code end-users run
Practical precise tools for better software!
What can inductive shape analysis do for you?
http://xisa.cs.berkeley.edu
Bonus Slides: Extensible Inductive Shape Analysis
Intuition: Checkers and types
Checkers: x . sorteddll(…)
– global specification (i.e., per data structure)
– more precise (typically)
– holds only in “steady-state”
– need generalization
l.sorteddll(prev, min) =
  if (l = null) then
    true
  else
    l->prev = prev and min ≤ l->val and l->next.sorteddll(l, l->val)
Types: x : Dll
– global specification (i.e., per data structure)
– less precise (typically)
– holds always
– doesn’t need generalization
struct Dll {
  int val;
  Dll* prev;
  Dll* next;
};
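The contrast between a checker that holds only in steady state and a type that always holds can be seen by running the sorteddll checker around the duplicate-removal loop from earlier in the talk. The sketch below is a hedged Python transcription: the `Dll` class mirrors the struct, while `build`, `to_values`, and `remove_duplicates` are hypothetical helpers written for this example.

```python
class Dll:
    """Mirrors struct Dll from the slide."""
    def __init__(self, val):
        self.val = val
        self.prev = None
        self.next = None

def sorteddll(l, prev, lo):
    """Checker from the slide; lo plays the role of the min parameter."""
    if l is None:
        return True
    return l.prev is prev and lo <= l.val and sorteddll(l.next, l, l.val)

def build(values):
    """Hypothetical helper: build a well-formed doubly-linked list."""
    head = tail = None
    for v in values:
        node = Dll(v)
        node.prev = tail
        if tail is None:
            head = node
        else:
            tail.next = node
        tail = node
    return head

def to_values(l):
    """Hypothetical helper: collect the values by following next."""
    out = []
    while l is not None:
        out.append(l.val)
        l = l.next
    return out

def remove_duplicates(l):
    """Remove adjacent duplicates from a sorted doubly-linked list."""
    cur = l
    while cur is not None and cur.next is not None:
        nxt = cur.next
        if nxt.val == cur.val:
            cur.next = nxt.next          # unlink the duplicate
            if nxt.next is not None:
                nxt.next.prev = cur      # repair the back pointer
        else:
            cur = nxt
    return l

l = build([2, 2, 4, 4])
holds_before = sorteddll(l, None, float("-inf"))
l = remove_duplicates(l)
holds_after = sorteddll(l, None, float("-inf"))
result = to_values(l)
```

Between the two pointer updates inside the loop the checker does not hold, although every node still has type Dll; this is the intermediate state for which the analysis needs a generalized form of the checker.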
Segments as partial checkers
Checker “Run”
α.dll(null)
β.dll(α)
γ.dll(β)
δ.dll(γ)
null.dll(δ)
Instance: null, α, β, γ, δ, null, linked by next and prev fields
[Figure: a summary in which a segment from α, labeled c(γ), to β, labeled c′(γ′), indexed by a length i, covers the part of the checker run between the call c(α, γ) and the call c′(β, γ′). When i = 0 the segment is empty, so c = c′, α = β, and γ = γ′. Further side conditions as extracted: α = γ, β = null.]
To unfold backward, split the segment and then unfold forward
cur = l->next;
while (cur != null) {
  if (cur->prev->val == cur->val) {
    cur = cur->prev;
    remove_after(cur);
  }
  cur = cur->next;
}
materialize: cur->prev->next
[Figure: the dll segment between l (at α) and cur (at γ) is split and then unfolded so that the cell δ with its prev and next fields is exposed; the resulting cases are joined by ∨, including one where l and cur coincide. The segment definition shown distinguishes the empty segment (emp) from a cell with next and prev fields followed by a shorter segment.]
Backward unfolding by forward unfolding
[Figure, in three steps on a segment of length i+1 from dll(null) to dll(γ), where the prev field of γ is β:
1. split (lemma): the segment becomes a segment of length i from dll(null) to dll(ε), followed by a segment of length 1 starting at δ
2. unfold forward at δ: the length-1 segment becomes a cell δ with prev ε and next η, followed by an empty segment
3. reduce: η = β, δ = γ]
53Chang, Rival, Necula - Shape Analysis with Structural Invariant Checkers
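History-guided folding
last = l;
cur = l->next;
while (cur != null) {
  if (…) last = cur;
  cur = cur->next;
}
• Match edges to identify where to fold
• Apply local folding rules
[Figure: the graphs from successive iterations (l and last at the head with a next edge to cur; then l, last, and cur separated by next edges) are compared. Matching edges shows which next edges fold into list segments, and the check that the newer graph is covered by the folded one succeeds (“Yes”).]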
Bonus Slides: Analysis of Low-Level Code
Porting source-level analyses is error prone
Analyzers for low-level code are more difficult and tedious to build
Example: Java Type Analysis
class C extends P {
  void m() { … }
}
P p = new P();
P c = … ? new C() : p;
…
c.m();
Compiled code for the call:
rc := m[rsp]
if (rc = 0) Lexc
r1 := m[rc]
r1 := m[r1+28]
rsp := rsp - 4
m[rsp] := m[rsp+4]
m[rsp] := rc
icall [r1]
Type analysis states along the code:
⟨rc : P, …⟩
⟨rc : nonnull P, …⟩
⟨r1 : disp(P), …⟩
⟨r1 : meth(P,28), …⟩
⟨m[rsp] : nonnull P, …⟩
Type analysis intermixed with low-level reasoning (e.g., args on stack)
Porting source-level analyses is error prone
Analyzers for low-level code are more difficult and tedious to build
Example: Java Type Analysis
class C extends P {
  void m() { … }
}
P p = new P();
P c = … ? new C() : p;
…
c.m();
Compiled code for the call:
rc := m[rsp]
if (rc = 0) Lexc
r1 := m[rc]
r1 := m[r1+28]
rsp := rsp - 4
m[rsp] := rp
icall [r1]
Type analysis states along the code:
⟨rc : P, …⟩
⟨rc : nonnull P, …⟩
⟨r1 : disp(P), …⟩
⟨r1 : meth(P,28), …⟩
⟨m[rsp] : nonnull P, …⟩
unsound
Dependencies must be carefully tracked
Framework of small, reusable cooperating decompiler modules
Source:
static void f(C c) { c.m(); }
Low-level code:
f: …
  rc := m[rsp+12]
  if (rc = 0) Lexc
  r1 := m[rc]
  r1 := m[r1+28]
  rsp := rsp - 4
  m[rsp] := m[rsp+16]
  icall [r1]
  …
After Locals (Local Variables):
f(tc):
  rc := tc
  if (rc = 0) Lexc
  r1 := m[rc]
  r1 := m[r1+28]
  t1 := tc
  icall [r1](t1)
After SymEval (Symbolic Evaluation):
f(c):
  if (c = 0) Lexc
  icall [m[m[c]+28]] (c)
After OO (Dynamic Dispatch):
f(obj c):
  if (c = 0) Lexc
  invokevirtual [c, 28] ()
After JavaTypes:
f(C c):
  if (c = 0) Lexc
  c.m()
The output of the last module is the input to your analyzer
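The symbolic evaluation step, which turns the register-level call sequence into the single expression icall [m[m[c]+28]] (c), can be sketched as forward substitution of register definitions. This is a minimal illustration and not the module from the talk: the tuple encoding of instructions and the names `subst`, `show`, and `sym_eval` are assumptions. The sketch handles only register assignments and indirect calls, and it ignores memory stores, which a real module would have to account for.

```python
def subst(expr, env):
    """Replace registers in expr by their current symbolic values."""
    kind = expr[0]
    if kind == "reg":
        return env.get(expr[1], expr)
    if kind == "mem":
        return ("mem", subst(expr[1], env))
    if kind == "add":
        return ("add", subst(expr[1], env), expr[2])
    return expr  # local variables are left untouched

def show(expr):
    """Print an expression in the notation used on the slide."""
    kind = expr[0]
    if kind in ("reg", "var"):
        return expr[1]
    if kind == "mem":
        return "m[" + show(expr[1]) + "]"
    if kind == "add":
        return show(expr[1]) + "+" + str(expr[2])
    raise ValueError(kind)

def sym_eval(block):
    """Fold register assignments into the instructions that use them."""
    env = {}
    out = []
    for instr in block:
        if instr[0] == "assign":
            _, reg, rhs = instr
            env[reg] = subst(rhs, env)
        elif instr[0] == "icall":
            _, target, args = instr
            shown_args = ", ".join(show(subst(a, env)) for a in args)
            out.append("icall [" + show(subst(target, env)) + "] (" + shown_args + ")")
    return out

# The body of f(tc) after the Locals module, without the null check.
block = [
    ("assign", "rc", ("var", "tc")),
    ("assign", "r1", ("mem", ("reg", "rc"))),
    ("assign", "r1", ("mem", ("add", ("reg", "r1"), 28))),
    ("assign", "t1", ("var", "tc")),
    ("icall", ("reg", "r1"), [("reg", "t1")]),
]
lifted = sym_eval(block)
```

Keeping each module this small is what lets a later module, such as one recognizing dynamic dispatch, match on the lifted expression instead of tracking the individual register moves.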