Shape Analysis with Structural Invariant Checkers
Bor-Yuh Evan Chang, Xavier Rival, George Necula
Electrical Engineering and Computer Sciences
University of California at Berkeley
Technical Report No. UCB/EECS-2007-80
http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-80.html
June 4, 2007
Copyright © 2007, by the author(s). All rights reserved.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.
Acknowledgement
This research was supported in part by the National Science Foundation under grants CCR-0326577, CCF-0524784, and CNS-0509544; and an NSF Graduate Research Fellowship. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Shape Analysis with Structural Invariant Checkers
Bor-Yuh Evan Chang¹, Xavier Rival¹,², and George C. Necula¹
¹ University of California, Berkeley, California, USA
² École Normale Supérieure, Paris, France
{bec,rival,necula}@cs.berkeley.edu
Abstract. Developer-supplied data structure specifications are important to shape analyses, as they tell the analysis what information should be tracked in order to obtain the desired shape invariants. We observe that data structure checking code (e.g., used in testing or dynamic analysis) provides shape information that can also be used in static analysis. In this paper, we propose a lightweight, automatic shape analysis based on these developer-supplied structural invariant checkers. In particular, we set up a parametric abstract domain, which is instantiated with such checker specifications to summarize memory regions using both notions of complete and partial checker evaluations. The analysis then automatically derives a strategy for canonicalizing or weakening shape invariants.
1 Introduction
Pointer manipulation is fundamental in almost all software developed in imperative programming languages today. For this reason, verifying properties of interest to the developer or checking the pre-conditions for certain complex program transformations (e.g., refactorings) often requires detailed aliasing and structural information. Shape analyses are unique in that they can provide this detailed must-alias and shape information, which is useful for many higher-level analyses (e.g., typestate or resource usage analyses, race detection for concurrent programs). Unfortunately, because of their precision requirements, shape analyses have generally been prohibitively expensive to use in practice.
The design of our shape analysis is guided by the desire to keep the abstraction close to informal developer reasoning and to maintain a reasonable level of interaction with the user in order to avoid excessive case analysis. In this paper, we propose a shape analysis guided by the developer through programmer-supplied data structure invariants. The novel aspect of our proposal is that these specifications are given as checking code, that is, code that could be used to verify instances dynamically. In this paper, we make the following contributions:
– We observe that invariant checking code can help guide a shape analysis and provides a familiar mechanism for the developer to supply information to the analysis tool. Intuitively, checkers can be viewed as programmer-supplied summaries of heap regions bundled with a usage pattern for such regions.
– We develop a shape analysis based on programmer-supplied invariant checkers (utilizing the framework of separation logic [Rey02]).
– We introduce a notion of partial checker runs (using −∗) as part of the abstraction in order to generalize programmer-supplied summaries when the data structure invariant holds only partially (Sect. 3).
– We observe that the iteration history of the analysis can be used to guide the weakening of shape invariants, an idea that may also apply to other shape analyses. We develop an automatic widening strategy for our abstraction based on this observation (Sect. 4.2).
In this paper, we consider structural invariants, that is, invariants concerning the pointer structure (e.g., acyclic list, cyclic list, tree) but not data properties (e.g., orderedness). In the next section, we motivate the design of our shape analysis and highlight the challenges through an example.
2 Overview
In Fig. 1, we present an example analysis that checks a skip list [Pug90] rebalancing operation to verify that it preserves the skip list structure. At the top, we show the structure of a two-level skip list. In such a skip list, each node is either level 1 or level 0. All nodes are linked together with the next field (n), while the level 1 nodes are additionally linked with the skip field (s). A level 0 node has its s field set to null. In the middle left, we give the C type declaration of a SkipNode, and in the middle right, we give a checking routine skip1 that, when viewed as C code (assumed type safe), either diverges if there is a cycle in the reachable nodes, returns false, or returns true when the nodes reachable from the argument l are arranged in a skip list structure. The skip0 function is a helper function for checking a segment of level 0 nodes. Intuitively, skip1 and skip0 simply give the inductive structure of skip lists.
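To make the checkers concrete, here is a compilable transcription of skip1 and skip0 from Fig. 1. It is a sketch, not the authors' code: it substitutes the standard NULL for the figure's null and adds the headers C needs for bool. As in the figure, it does not guard against cycles, so it may recurse forever on a cyclic structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Node layout from Fig. 1. */
typedef struct SkipNode {
  int d;
  struct SkipNode *s; /* skip field: next level 1 node, or NULL */
  struct SkipNode *n; /* next field: links every node */
} SkipNode;

bool skip0(const SkipNode *l, const SkipNode *e);

/* Succeeds when the nodes reachable from l form a two-level skip list. */
bool skip1(const SkipNode *l) {
  if (l == NULL) return true;
  /* the n-chain from l must reach l->s through level 0 nodes only */
  return skip1(l->s) && skip0(l->n, l->s);
}

/* Succeeds when the n-chain from l reaches e, visiting only level 0
   nodes (s == NULL) on the way. */
bool skip0(const SkipNode *l, const SkipNode *e) {
  if (l == e) return true;
  return l != NULL && l->s == NULL && skip0(l->n, e);
}
```

For example, a list a, b, c, d chained by n, whose level 1 nodes a and c are linked by s, is accepted; giving the level 0 node b a non-null s field makes skip0 reject it.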
In the bottom section of Fig. 1, we present an analysis of the rebalancing routine (rebalance). The assert at the top ensures that skip1(l) holds (i.e., l is a skip list), and the assert at the bottom checks that l is again a skip list on return. We have made these pre- and post-conditions explicit here, but we can imagine a system that connects the checker to the type and verifies that the structure invariants are preserved at function or module boundaries. In the figure, we show the abstract memory state of the analysis at a number of program points using a graphical notation, which, for now, we can consider as informal sketches a developer might draw to check the code by hand. For the program points inside the loop, two memory states are shown: one for the first iteration (left) and one for the fixed point (right).
[Diagram: structure of a two-level skip list; every node has an s field and an n field, all nodes are chained by n, and level 1 nodes are additionally chained by s]

    typedef struct SkipNode {
      int d;
      struct SkipNode* s;
      struct SkipNode* n;
    } SkipNode;

    bool skip1(SkipNode* l) {
      if (l == null) return true;
      else return skip1(l->s) && skip0(l->n, l->s);
    }

    bool skip0(SkipNode* l, SkipNode* e) {
      if (l == e) return true;
      else return l != null && l->s == null && skip0(l->n, e);
    }

    void rebalance(SkipNode* l) {
      SkipNode *p, *c;
      assert(l != null && skip1(l));
  1   [shape graph: node l with a complete skip1 edge]
      p = l;                  // previous level 1 node
  2   c = l->n;               // cursor
  3   l->s = null;
  4   while (c != null) {
  5     [shape graphs: First Iteration | At Fixed Point]
        if (c should be a level 1 node) {
  6       p->s = c;           // set the skip pointer of the previous level 1 node
  7       p = p->s;
  8       c->s = null; c = c->n;
  9       [shape graphs: First Iteration | At Fixed Point]
        } else {
 10       c->s = null; c = c->n;
 11       [shape graphs: First Iteration | At Fixed Point]
        }
      }
 12   assert(l != null && skip1(l));
    }

Fig. 1. Analysis of a skip list rebalancing

A programmer-defined checker can be used in static analysis by viewing the memory addresses it would dereference during a successful execution as describing a class of memory regions arranged according to particular constraints. We build an abstraction around this summarization
mechanism. To name heap objects, the analysis introduces symbolic values (i.e., fresh existential variables). To distinguish them from program variables, we use lowercase Greek letters (α, β, γ, δ, ε, π, ρ, ...). A graph node denotes a value (e.g., a memory address) and, when necessary, is labeled by a symbolic value; the 0 nodes represent null. We write a program variable (e.g., l) below a node to indicate that the value of that variable is that node. Each edge corresponds to a memory region. A thin edge denotes a points-to relationship, that is, a memory cell whose address is the source node and whose value is the destination node (e.g., on line 5 in the left graph, the edge labeled by n says that l->n points to c). A thick edge summarizes a memory region, i.e., some number of points-to edges. Thick edges, or checker edges, are labeled by a checker instantiation that describes the structure of the summarized region. There are two kinds of checker edges: complete checker edges, which have only a source node, and partial checker edges, which have both a source and a target node. Complete checker edges indicate a memory region that satisfies a particular checker (e.g., on line 1, the complete checker edge labeled skip1 says there is a memory region from l that satisfies checker skip1). Partial checker edges are a generalization that we introduce in our abstraction to describe memory states at intermediate program points, which we discuss further in Sect. 3. An important point is that two distinct edges in the graph denote disjoint memory regions.
To reflect memory updates in the graph, we simply modify the appropriate points-to edges (performing strong updates). For example, consider the transition from program point 5 to point 9: the updates on lines 6 and 7 touch only memory cells that already appear as points-to edges in the graph. For the updates on line 8, observe that we do not have nodes for c->s or c->n in the graph at program point 5. However, we know that an instance of skip0 holds from c, and it can be unfolded to materialize points-to edges for c->s and c->n (that is, conceptually unfolding one step of its computation). The update can then be reflected after unfolding.
As exemplified here, we want the work performed by our shape analysis to be close to the informal, on-paper verification that might be done by the developer. The abstractions used to summarize memory regions are developer-guided through the checker specifications. While it may be reasonable to build in generic summarization strategies for common structures, like lists and trees (cf. [DOY06,MNCL06]), it seems unlikely that such strategies will suffice for other structures, like the skip lists in this example. Traversal code for checking seems like a useful and intuitive specification mechanism, as such code could also be used in testing or dynamic analysis (cf. [SRW02]).
From this example, we make some observations that guide the design of our analysis and highlight the challenges. First, in our diagrams, we have implicitly assumed a disjointness property between the regions described by edges (to perform strong updates on points-to edges). This assumption is made explicit by utilizing separation logic to formalize these diagrams (see Sect. 3). This choice also imposes restrictions on the checkers: all conjunctions are separating conjunctions; in terms of dynamic checking, a compilation of skip1 must check that each address is dereferenced at most once during the traversal. Second, as with many data structure operations, the rebalance routine requires a traversal using a cursor (e.g., c). To check properties of such operations, we are often required to track information in detail locally around the cursor, but we may be able to summarize the rest rather coarsely. This summarization must cover not only the suffix (yet to be visited by the cursor) but also the prefix (already visited by the cursor) (see Sect. 3). Third, similar to other shape analyses, a central challenge is to fold the graphs sufficiently to find a fixed point (and to be efficient) while retaining enough precision. This is particularly difficult with arbitrary data structure specifications. The key observation we make is that previous iterates are generally more abstract and can be used to guide the folding process (see Sect. 4.2).
3 Memory Abstraction
We describe our analysis within the framework of abstract interpretation [CC77]. Our analysis state is composed of an abstract memory state (in the form of a shape graph) and a pure state to track disequalities (the non-points-to constraints). We describe the memory state in a manner based largely on separation logic, so we use a notation borrowed from there.
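$$
\begin{array}{llcl}
\text{memories} & M & ::= & \beta@f \mapsto r \;\mid\; M_1 \ast M_2 \;\mid\; \mathsf{emp} \;\mid\; \alpha.c(\beta) \;\mid\; \alpha_1.c(\beta) \mathbin{\ast\!\!-} \alpha_2.c(\beta)\\
\text{r-values} & r & ::= & \alpha \;\mid\; \mathsf{null} \;\mid\; \cdots\\
\text{symbolic values} & \alpha, \beta, \gamma, \delta, \varepsilon, \pi, \rho, \ldots & & \\
\text{field names} & f & & \\
\text{checker names} & c & &
\end{array}
$$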
A memory state M includes the points-to relation (β@f ↦ r), the separating conjunction (M1 ∗ M2), and the empty memory state (emp) from separation logic, which together can describe a set of possible memories that have a finite number of points-to relationships. The separating conjunction M1 ∗ M2 describes a memory that can be divided into two disjoint regions (i.e., with disjoint domains) described by M1 and M2. A field offset expression β@f corresponds to the base address β plus the offset of field f (i.e., &(b.f) in C). For simplicity, we assume that all pointers occur as fields in a struct. R-values r are symbolic expressions representing the contents of memory cells (whose precise form is unimportant but does include null). Memory regions are summarized with applications of user-supplied checkers. We write α.c(β) to mean that checker c applied to α and β holds (i.e., c succeeds when applied to α and β). For example, α.skip1() says that the skip1 checker is successful when applied to α. We use this object-oriented style notation to distinguish the main traversal argument α from any additional parameters β. These additional parameters may be used to specify additional constraints (as in the skip0 checker in Fig. 1), but we do not traverse from them. We also introduce a notion of a partial checker run α1.c(β) ∗− α2.c(β) that describes a memory region summarized by a segment from α1 to α2, which will be described further in the subsections below. Visually, we regard a memory state as a directed graph. The edges correspond to formulas as shown in Fig. 2.³ Each edge in a graph is considered separately conjoined (i.e., each edge corresponds to a disjoint region of memory).

    Formula                    Edge in the shape graph
    α1@f ↦ α2                  thin edge labeled f from α1 to α2
    α@f ↦ null                 thin edge labeled f from α to a 0 node
    α.c(β)                     thick edge labeled c(β) from α
    α1.c(β) ∗− α2.c(β)         thick edge labeled c(β) from α1 to α2

Fig. 2. Correspondence between formulas and edges
Inductive Structure Checkers. The abstract domain provides generic support for inductive structures through user-specified checkers. Observe that a dynamic run of a checker, such as skip1 (in Fig. 1), visits a region of memory starting from some root pointer, and furthermore, a successful, terminating run of a checker indicates how the user intends to access that region of memory. In the context of our analysis, a checker gives a corresponding inductively-defined predicate in separation logic, and a successful, terminating run of the checker bears witness to a derivation of that predicate.
The definition of a checker c, with formals π and ρ, consists of a finite disjunction of rules. A rule is the conjunction of a separating conjunction M of a series of points-to relations and checker applications and a pure, first-order predicate P, written ⟨M ; P⟩.

$$
\text{checker definitions}\qquad \pi.c(\rho) \;:=\; \langle M_1 ; P_1\rangle \vee \cdots \vee \langle M_n ; P_n\rangle
$$
Free variables in the rules are considered as existential variables bound at the definition. Because we view checkers as executable code, the kinds of inductive predicates are restricted. More precisely, we have the following restrictions on the Mi's: (1) they do not contain partial checker applications (i.e., ∗−), and (2) the points-to edges correspond to finite access paths from π. In other words, each Mi can only correspond to a memory region reachable from π. A checker cannot, for example, posit the existence of some pointer that points to π.
Each rule specifies one way to prove that a structure satisfies the checker definition: by checking that the corresponding first-order predicate holds and that the store can be separated into a series of stores, which respectively allow proving each of the separating conjuncts. Base cases are rules with no checker applications.
³ For presentation, we show the most common kinds of edges. In the implementation, we support field offsets in most places to handle, for example, pointers to fields.
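Example 1 (A binary tree checker). A binary tree with fields lt and rt can be described by a checker with two rules:

$$
\pi.\mathit{tree}() := \langle \mathsf{emp} ; \pi = \mathsf{null}\rangle \vee \langle (\pi@lt \mapsto \gamma) \ast (\pi@rt \mapsto \delta) \ast \gamma.\mathit{tree}() \ast \delta.\mathit{tree}() ; \pi \neq \mathsf{null}\rangle
$$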
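Section 2 noted that all conjunctions in a checker are separating, so a dynamic compilation must check that each address is dereferenced at most once. The sketch below applies that idea to the tree checker of Example 1. The names Tree, Visited, claim, and tree_lin and the fixed capacity MAX_VISITED are hypothetical, chosen only to keep the example allocation-free; a practical version would use a hash set. A side effect of the check is that cycles are rejected instead of causing divergence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Tree {
  struct Tree *lt;
  struct Tree *rt;
} Tree;

#define MAX_VISITED 64 /* hypothetical bound on the region size */

typedef struct {
  const Tree *seen[MAX_VISITED];
  size_t count;
} Visited;

/* Records that p is about to be dereferenced. Fails if p was already
   dereferenced (the regions would overlap) or the table is full. */
static bool claim(Visited *v, const Tree *p) {
  for (size_t i = 0; i < v->count; i++)
    if (v->seen[i] == p) return false;
  if (v->count == MAX_VISITED) return false;
  v->seen[v->count++] = p;
  return true;
}

/* One branch per rule of pi.tree(). */
static bool tree_lin(const Tree *p, Visited *v) {
  if (p == NULL) return true;     /* rule 1: <emp ; pi = null> */
  if (!claim(v, p)) return false; /* p must be owned exactly once */
  /* rule 2: pi@lt and pi@rt, then the two disjoint subtrees */
  return tree_lin(p->lt, v) && tree_lin(p->rt, v);
}

bool tree(const Tree *root) {
  Visited v = { .count = 0 };
  return tree_lin(root, &v);
}
```

A node shared by both subtrees would pass a naive recursive check but fails here, because the second visit finds its address already claimed; this mirrors the requirement that γ.tree() ∗ δ.tree() describe disjoint regions.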
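Example 2 (A skip list checker). The “C-like” checkers for the two-level skip list in Fig. 1 would be translated to the following:

$$
\begin{aligned}
\pi.\mathit{skip1}() &:= \langle \mathsf{emp} ; \pi = \mathsf{null}\rangle \vee \langle (\pi@s \mapsto \gamma) \ast (\pi@n \mapsto \delta) \ast \gamma.\mathit{skip1}() \ast \delta.\mathit{skip0}(\gamma) ; \pi \neq \mathsf{null}\rangle\\
\pi.\mathit{skip0}(\rho) &:= \langle \mathsf{emp} ; \pi = \rho\rangle \vee \langle (\pi@s \mapsto \mathsf{null}) \ast (\pi@n \mapsto \gamma) \ast \gamma.\mathit{skip0}(\rho) ; \pi \neq \rho \wedge \pi \neq \mathsf{null}\rangle
\end{aligned}
$$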
Segments and Partial Checker Runs. In the above, we have built some intuition on how user-specified checkers can be utilized to give precise summaries of memory regions. Unfortunately, the inductive predicates obtained from typical checkers, such as tree or skip1, are usually not general enough to capture the invariants of interest at all program points. To see this, consider the invariant at the fixed point on line 5 (i.e., the loop invariant) in the skip list example (Fig. 1). Here, we must track some information in detail around a cursor (e.g., p and c), while we need to summarize both the already explored prefix before the cursor and the yet to be explored suffix after the cursor. Such a situation is typical when analyzing a traversal algorithm. The suffix can be summarized by a checker application δ.skip0(ε) (i.e., the skip0 edge from c), but unfortunately, the prefix segment (i.e., the region between l and p) cannot.
Rather than require more general checker specifications sufficient to capture these intermediate invariants, we introduce a generic mechanism for summarizing prefix segments. We make the observation that they are captured by partial checker runs. In terms of inductively-defined predicates, we want to consider partial derivations, that is, derivations with a hole in a subtree. This concept is internalized in the logic with the separating implication. For example, the segment from l to p on line 5 corresponds to the partial checker application α.skip1() ∗− β.skip1(). Informally, a memory region satisfies α.skip1() ∗− β.skip1() if and only if, for any disjoint region that satisfies β.skip1() (i.e., is a skip list from β), the conjoined region satisfies α.skip1() (i.e., makes a complete skip list from α). This statement entails that β is reachable from α. Our notation for separating implication is reversed compared to the traditional notation −∗ to mirror more closely the graphical diagrams. Our use of separating implication is restricted to the form where the premise and conclusion are checker applications that differ only in the unfolding argument, because these are the only partial checker edges our analysis generates.
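On a concrete heap, the partial checker application α.skip1() ∗− β.skip1() can be illustrated by running skip1 with a hole: when the traversal reaches β in a skip1 position, it assumes β.skip1() instead of checking it. The sketch below does this; skip1_hole and skip1_seg are hypothetical names, and the requirement that the hole actually be reached reflects the remark that β must be reachable from α. It inspects one particular heap and is not a decision procedure for separating implication.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct SkipNode {
  int d;
  struct SkipNode *s;
  struct SkipNode *n;
} SkipNode;

/* skip0 as in Fig. 1 (with NULL for null). */
bool skip0(const SkipNode *l, const SkipNode *e) {
  if (l == e) return true;
  return l != NULL && l->s == NULL && skip0(l->n, e);
}

/* Runs skip1 from l, but treats the call skip1(h) as an assumption
   (the hole of the partial derivation) instead of executing it.
   *used records whether the hole was reached. */
static bool skip1_hole(const SkipNode *l, const SkipNode *h, bool *used) {
  if (l == h && !*used) { *used = true; return true; }
  if (l == NULL) return true;
  return skip1_hole(l->s, h, used) && skip0(l->n, l->s);
}

/* Checks the segment from l to h: the region a partial derivation of
   l.skip1() with hole h.skip1() would traverse. */
bool skip1_seg(const SkipNode *l, const SkipNode *h) {
  bool used = false;
  return skip1_hole(l, h, &used) && used;
}
```

The segment check never looks past h: corrupting a level 0 node after h makes the full skip1 check fail on the whole list, yet skip1_seg from the head to h still succeeds. This is why such a segment can summarize the already visited prefix independently of the suffix.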
Semantics of Shape Graphs. For completeness in presentation, we give the semantics of abstract memory states with checkers (i.e., graphs) in terms of sets of concrete stores, which follows mostly from separation logic. In Sect. 4, we describe the shape analysis algorithm that utilizes this memory abstraction.
We write u, v ∈ Val for concrete values and make no distinction between addresses and values, and we write v@f to mean the address v + offset(f) (i.e., the base address v plus the field offset f). A concrete store σ : Val ⇀ Val maps addresses to values. We write σ1 ∗ ··· ∗ σn for the store with disjoint sub-stores σ1, ..., σn (i.e., they have disjoint domains). For the empty store, we write [ ], and for the store with one cell with address v and containing value u, we write [v ↦ u]. A valuation ν is a substitution with concrete values for symbolic values (written [v⃗/α⃗]). Finally, we write νM for applying the valuation ν to M.
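The store notation can be mirrored directly in code. In the sketch below, a finite store is an array of (address, value) cells, and store_star builds σ1 ∗ σ2 only when the two domains are disjoint, which is the side condition behind the separating conjunction. The types Val, Cell, and Store, the helper names, and the capacity MAX_CELLS are hypothetical; field offsets are left to the caller (v@f would be v + offset(f)).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CELLS 32 /* hypothetical capacity of a store */

typedef unsigned long Val; /* addresses and values are not distinguished */
typedef struct { Val addr; Val val; } Cell;
typedef struct { Cell cells[MAX_CELLS]; size_t size; } Store;

/* The empty store [ ]. */
Store store_empty(void) {
  Store s = { .size = 0 };
  return s;
}

/* The one-cell store [v |-> u]. */
Store store_cell(Val v, Val u) {
  Store s = store_empty();
  s.cells[0].addr = v;
  s.cells[0].val = u;
  s.size = 1;
  return s;
}

static bool in_domain(const Store *s, Val a) {
  for (size_t i = 0; i < s->size; i++)
    if (s->cells[i].addr == a) return true;
  return false;
}

/* sigma1 * sigma2: writes the union to *out and succeeds only when
   the two stores have disjoint domains (and the result fits). */
bool store_star(const Store *s1, const Store *s2, Store *out) {
  if (s1->size + s2->size > MAX_CELLS) return false;
  for (size_t i = 0; i < s2->size; i++)
    if (in_domain(s1, s2->cells[i].addr)) return false;
  *out = *s1;
  for (size_t i = 0; i < s2->size; i++)
    out->cells[out->size++] = s2->cells[i];
  return true;
}
```

Composing [100 ↦ 0] with [108 ↦ 200] succeeds, while composing the result with [100 ↦ 0] again fails, since address 100 would then belong to both sub-stores.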
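We say a concrete store σ satisfies an abstract memory M if there exists a valuation ν such that σ ⊨ νM, where the relation ⊨ is defined as the least relation satisfying the following rules:

$$
\begin{array}{rcl@{\quad}l}
[\,] &\models& \mathsf{emp} & \text{(always)}\\
[v@f \mapsto u] &\models& v@f \mapsto u & \text{(always)}\\
\sigma_1 \ast \sigma_2 &\models& M_1 \ast M_2 & \text{if } \sigma_1 \models M_1 \text{ and } \sigma_2 \models M_2\\
\sigma &\models& v.c() & \text{if there exist a rule } \langle M ; P\rangle \text{ in the definition of } \pi.c() \text{ and values } \vec{u} \text{ such that}\\
 & & & \sigma \text{ satisfies the pure formula } [v/\pi][\vec{u}/\vec{\alpha}]P \text{ and } \sigma \models [v/\pi][\vec{u}/\vec{\alpha}]M,\\
 & & & \text{where } \vec{\alpha} \text{ are the free variables of the rule}\\
\sigma &\models& v.c() \mathbin{\ast\!\!-} v'.c() & \text{if for all } \sigma' \text{ (disjoint from } \sigma\text{), if } \sigma' \models v'.c() \text{ then } \sigma \ast \sigma' \models v.c()
\end{array}
$$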
For presentation, we write the semantics for checkers with no additional parameters. They can be extended to checkers with parameters without any difficulty.
4 Analysis Algorithm
In this section, we describe our shape analysis algorithm. Like many other shape analyses, we have a notion of materialization, which reifies memory regions in order to track updates, as well as blurring or weakening, which (re-)summarizes certain memory regions in order to obtain a terminating analysis. We materialize by unfolding checker edges (Sect. 4.1) and weaken by folding memory regions back into checker edges (Sect. 4.2). Like others, we materialize as needed to reflect updates and dereferences, but instead of weakening eagerly, we delay weakening in order to use history information to guide the process.
Our shape analysis is a standard forward analysis that computes an abstract state at each program point. In addition to the memory state (as described in Sect. 3), the analysis also keeps track of a number of pure constraints P (pointer equalities and disequalities). Furthermore, we keep a limited amount of disjunction (for unfoldings and acyclic paths where needed), so our analysis state has essentially the form ⟨M1 ; P1⟩ ∨ ⟨M2 ; P2⟩ ∨ ··· ∨ ⟨Mn ; Pn⟩. Additionally, we keep the values of the program variables (i.e., the stack frame) in an abstract environment E that maps program variables to symbolic values that denote their contents.⁴
4.1 Abstract Transition and Checker Unfolding
Because each edge in the graph denotes a separate memory region, the atomic operations (i.e., mutation, allocation, and deallocation) are straightforward and only affect graphs locally. As alluded to in Sect. 2, mutation reduces to the flipping of an edge when each memory cell accessed in the statement exists in the graph as a points-to edge. This strong update is sound because of separation (that is, because each edge is a disjoint region).
When there is no points-to edge corresponding to a dereferenced location because it is summarized as part of a checker edge, we first materialize points-to edges by unfolding the checker definition (i.e., conceptually unfolding one step of the checker run). We unfold only as needed to expose the points-to edge that corresponds to the dereferenced location. Unfolding generates one graph per checker rule, obtained by replacing the checker edge with the points-to edges and the recursive checker applications specified by the rule; the pure constraints in the rule are also added to the pure state. If we derive a contradiction in the pure constraints, then the corresponding unfolded graph is dropped. Unfolding may thus generate a disjunction of several graphs. A fundamental property of unfolding is that the union of the concretizations of the resulting graphs is equal to the concretization of the initial graph.
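Example 3 (Unfolding a skip list). We exhibit an unfolding of the skip1 checker from Example 2. The added pure constraints are shown explicitly.

$$
\langle \alpha.\mathit{skip1}() \,;\, P\rangle
\;\xrightarrow{\ \text{unfold}\ }\;
\langle \mathsf{emp} \,;\, P \wedge \alpha = \mathsf{null}\rangle
\;\vee\;
\langle (\alpha@n \mapsto \beta) \ast (\alpha@s \mapsto \gamma) \ast \beta.\mathit{skip0}(\gamma) \ast \gamma.\mathit{skip1}() \,;\, P \wedge \alpha \neq \mathsf{null}\rangle
$$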
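Unfolding can be prototyped on a small symbolic graph. The sketch below hard-codes the two rules of skip1 from Example 2 rather than interpreting checker definitions generically, and it records only pure facts of the form α = null or α ≠ null, so its contradiction check is far weaker than reasoning over arbitrary equalities and disequalities. Graph, Edge, Pure, unfold_skip1, and the capacities are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EDGES 16 /* hypothetical capacities */
#define MAX_PURE 16
#define NO_ARG (-1)  /* checker edge without an extra parameter */

typedef enum { PTS, CHK } EdgeKind;

typedef struct {
  EdgeKind kind;
  int src;           /* symbolic value of the source node */
  const char *label; /* field name (PTS) or checker name (CHK) */
  int dst;           /* PTS: target node; CHK: extra argument or NO_ARG */
} Edge;

/* A pure fact "node = null" (is_null) or "node != null" (!is_null). */
typedef struct { int node; bool is_null; } Pure;

typedef struct {
  Edge edges[MAX_EDGES];
  size_t nedges;
  Pure pure[MAX_PURE];
  size_t npure;
  int next_fresh; /* next unused symbolic value */
} Graph;

void add_edge(Graph *g, EdgeKind k, int src, const char *label, int dst) {
  assert(g->nedges < MAX_EDGES);
  g->edges[g->nedges++] = (Edge){k, src, label, dst};
}

/* Edges are separately conjoined, so their order is irrelevant. */
static void remove_edge(Graph *g, size_t i) {
  g->edges[i] = g->edges[--g->nedges];
}

/* Records a pure fact; returns false if it contradicts a recorded one. */
bool add_pure(Graph *g, int node, bool is_null) {
  for (size_t i = 0; i < g->npure; i++)
    if (g->pure[i].node == node && g->pure[i].is_null != is_null)
      return false;
  assert(g->npure < MAX_PURE);
  g->pure[g->npure++] = (Pure){node, is_null};
  return true;
}

/* Unfolds the skip1 edge g->edges[i] with the two rules of Example 2.
   Surviving disjuncts are written to out[0..k-1]; returns k. */
size_t unfold_skip1(const Graph *g, size_t i, Graph out[2]) {
  int alpha = g->edges[i].src;
  size_t k = 0;

  /* Rule 1: <emp ; alpha = null> */
  out[k] = *g;
  remove_edge(&out[k], i);
  if (add_pure(&out[k], alpha, true)) k++;

  /* Rule 2: <alpha@n |-> beta * alpha@s |-> gamma
              * beta.skip0(gamma) * gamma.skip1() ; alpha != null> */
  out[k] = *g;
  Graph *h = &out[k];
  remove_edge(h, i);
  int beta = h->next_fresh++;
  int gamma = h->next_fresh++;
  add_edge(h, PTS, alpha, "n", beta);
  add_edge(h, PTS, alpha, "s", gamma);
  add_edge(h, CHK, beta, "skip0", gamma);
  add_edge(h, CHK, gamma, "skip1", NO_ARG);
  if (add_pure(h, alpha, false)) k++;
  return k;
}
```

Unfolding α.skip1() with no prior facts yields the two disjuncts of Example 3. If the pure state already records α ≠ null, the first rule contradicts it and only the non-empty disjunct survives.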
4.2 History-Guided Folding
We need a strategy to identify sub-graphs that should be folded into complete or partial checker edges. Which kinds of sub-graphs can be summarized without losing too much precision is highly dependent on the structures in question and the code being analyzed. To see this, consider the fixed-point graph at program point 5 in the skip list example (Fig. 1). One could imagine folding the points-to edges corresponding to p->n and p->s into one summary region from p to c (i.e., eliminating the node labeled γ), but it is necessary to retain the information that p and c are “separated” by at least one n field. Keeping node γ expresses this fact. Rather than using a canonicalization operation that looks only at one graph to identify the sub-graphs that should be summarized, our weakening strategy is based on the observation that previous iterates at loop join points can be utilized to guide the folding process. In this subsection, we define the approximation test and widening operations (standard operations in abstract interpretation-based static analysis) over graphs as a simultaneous traversal over the input graphs.

⁴ In the implementation, we instead include the stack frame in M to enable handling address-of-local-variable expressions (as in C) in a smooth manner.
Approximation Test. The approximation test on memory states M1 ⊑ M2 takes two graphs as input and tries to establish that the concretization of M1 is contained in the concretization of M2 (i.e., M1 ⇒ M2). Static analyses rely on the approximation test in order to ensure the termination of fixed point computation. We also utilize it to collapse extraneous disjuncts in the analysis state and, most importantly, as a sub-routine in the widening operation.
Roughly speaking, our approximation test checks that graph M1 is equivalent to graph M2 up to unfolding of M2. That is, the basic idea is to determine whether M1 ⊑ M2 by reducing to stronger statements, either by matching edges on both sides or by unfolding M2. To check this relation, we need a correspondence between nodes of M1 and nodes of M2. This correspondence is given by a mapping Φ from nodes of M2 to those of M1. The condition that Φ is a function ensures that any aliasing expressed in M2 is also reflected in M1. If at any point this condition on Φ is violated, then the test fails.
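The requirement that Φ remain a function can be stated operationally. In the sketch below, Phi is a hypothetical array indexed by nodes of M2, and phi_extend fails exactly when a node of M2 would be mapped to two different nodes of M1. The integer node numbering and the bound MAX_NODES are assumptions; the example extensions replay the mappings that appear in Fig. 3.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 32 /* hypothetical bound on nodes per graph */
#define UNMAPPED (-1)

/* Phi maps each node of M2 to at most one node of M1. */
typedef struct { int to_m1[MAX_NODES]; } Phi;

void phi_init(Phi *phi) {
  for (int i = 0; i < MAX_NODES; i++) phi->to_m1[i] = UNMAPPED;
}

/* Adds n2 |-> n1. Returns false (so the approximation test fails) if
   n2 is already mapped to a different node of M1. Several nodes of M2
   may still map to the same node of M1. */
bool phi_extend(Phi *phi, int n2, int n1) {
  assert(n2 >= 0 && n2 < MAX_NODES);
  if (phi->to_m1[n2] == UNMAPPED) {
    phi->to_m1[n2] = n1;
    return true;
  }
  return phi->to_m1[n2] == n1;
}
```

Mapping both β and γ to ε is accepted, because M2 may distinguish nodes that M1 identifies. Mapping α to ε after α ↦ δ is rejected, because M2 would then assert an alias between δ and ε that M1 does not establish.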
Initialization. The mapping Φ plays an essential role in the algorithm itself, since it gives the points from which we should compare the graphs. It is initialized using the environment and then extended as the input graphs are traversed. The natural starting points are the nodes that correspond to the program variables (i.e., the initial mapping Φ0 = {E2(x) ↦ E1(x) | x ∈ Var}).
Traversal. After initialization, we decide the approximation relation by traversing the input graphs and attempting to match all edges. To check region disjointness (i.e., linearity), edges are “consumed” when they are matched. If the algorithm gets stuck while some edges remain unconsumed, then the test fails. To describe this traversal, we define the judgment M1 ⊑ M2 [Φ], which says, “M1 is approximated by M2 under Φ.”
In the following, we describe the rules that define M1 ⊑ M2 [Φ] by following the example derivation shown in Fig. 3 (from goal to axiom). A complete listing of the rules is given in Appendix A. In Fig. 3, the top line shows the initial goal with a particular initialization for Φ. Each subsequent line shows a step in the derivation (i.e., a rewriting step) that is obtained by applying the rule named on the right. The highlighting of nodes and edges indicates where the rewriting applies. We are able to prove that the left graph is approximated by the right graph because we reach emp ⊑ emp [Φ].
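First, consider the application of the pointsto rule (line 3 to 4). When both M1 and M2 have the same kind of edge from matched nodes, the approximation relation obviously holds for those edges, so those edges can be consumed. Any target nodes are then added to the mapping Φ so that the traversal can continue from those nodes. In this case, the s and n points-to edges match from the pair α ↦ δ. With this matching, the mappings β ↦ ε and γ ↦ ε are added. We highlight in Φ with underlines the mappings that must match for each rule to apply. The checker rule is the analogous matching rule for complete checker edges. We apply this edge matching only to points-to edges and complete checker edges. Partial checker edges are treated separately, as described below.

$$
\begin{array}{r@{\quad}l@{\quad}l@{\quad}l}
 & M_1 \sqsubseteq M_2 & \Phi & \text{rule applied}\\
1 & (\delta@n \mapsto \varepsilon) \ast (\delta@s \mapsto \varepsilon) \sqsubseteq \alpha.\mathit{skip1}() \mathbin{\ast\!\!-} \zeta.\mathit{skip1}() & [\underline{\alpha \mapsto \delta}, \underline{\zeta \mapsto \varepsilon}] & \text{assume}\\
2 & (\delta@n \mapsto \varepsilon) \ast (\delta@s \mapsto \varepsilon) \ast \varepsilon.\mathit{skip1}() \sqsubseteq \alpha.\mathit{skip1}() & [\underline{\alpha \mapsto \delta}, \zeta \mapsto \varepsilon] & \text{unfold}\\
3 & (\delta@n \mapsto \varepsilon) \ast (\delta@s \mapsto \varepsilon) \ast \varepsilon.\mathit{skip1}() \sqsubseteq (\alpha@n \mapsto \beta) \ast (\alpha@s \mapsto \gamma) \ast \beta.\mathit{skip0}(\gamma) \ast \gamma.\mathit{skip1}() & [\underline{\alpha \mapsto \delta}, \zeta \mapsto \varepsilon] & \text{pointsto (2x)}\\
4 & \varepsilon.\mathit{skip1}() \sqsubseteq \beta.\mathit{skip0}(\gamma) \ast \gamma.\mathit{skip1}() & [\alpha \mapsto \delta, \zeta \mapsto \varepsilon, \beta \mapsto \varepsilon, \underline{\gamma \mapsto \varepsilon}] & \text{checker}\\
5 & \mathsf{emp} \sqsubseteq \beta.\mathit{skip0}(\gamma) & [\alpha \mapsto \delta, \zeta \mapsto \varepsilon, \underline{\beta \mapsto \varepsilon}, \underline{\gamma \mapsto \varepsilon}] & \text{unfold}\\
6 & \mathsf{emp} \sqsubseteq \mathsf{emp} & [\alpha \mapsto \delta, \zeta \mapsto \varepsilon, \beta \mapsto \varepsilon, \gamma \mapsto \varepsilon] &
\end{array}
$$

Fig. 3. Testing approximation by reducing to stronger statements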
Partial checker edges are handled by taking the separating implication interpretation, which becomes critical here. We use the assume rule (as in the first step in Fig. 3) to reduce the handling of partial checker edges in M2 to the handling of complete checker edges (i.e., a “−∗ right” in sequent calculus or “−∗ introduction” in natural deduction). It extends the partial checker edge in M2 to a complete checker edge by adding the corresponding completion to M1. A key aspect of our algorithm is that this rule only applies when we have matched both the source and target nodes of the partial checker edge, that is, when we have delineated in M1 the region that corresponds to the partial checker edge in M2.
Now, consider the first application of unfold in Fig. 3 (line 2 to 3), where we have a complete checker edge from α on the right, but we do not have an edge from δ on the left that can be immediately matched with it. In this case, we unfold the complete checker edge. In general, the unfolding results in a disjunction of graphs (one for each rule, Sect. 4.1), so the overall approximation check succeeds if the approximation check succeeds for any one of the unfolded graphs. Note that on an unfolding, we must also remember the pure constraint P from the rule, which must be conjoined to the pure state on the right when we check the approximation relation on the pure constraints. In the second application of unfold in Fig. 3 (line 5 to 6), the unfolding of β.skip0(γ) is to emp because we have that β = γ. This equality arises because both are unified with ε (specifically, the pointsto steps added β ↦ ε and γ ↦ ε to Φ).
Finally, we also have a rule for partial checkers in M1 (i.e., a corresponding “left” or “elimination” rule). Since it is not used in the above example, we present it below schematically:

$$
\begin{array}{ll}
(\alpha_1.c() \mathbin{\ast\!\!-} \alpha_1'.c()) \ast M \;\sqsubseteq\; \alpha_2.c() & [\Phi, \alpha_2 \mapsto \alpha_1]\\[2pt]
\qquad\Big\downarrow\ \text{apply} & \\[2pt]
M \;\sqsubseteq\; \alpha_2'.c() & [\Phi, \alpha_2 \mapsto \alpha_1, \alpha_2' \mapsto \alpha_1'] \quad (\alpha_2' \text{ fresh})
\end{array}
$$

Here M stands for the remainder of M1 from α1′ (the edges f1, f2, and c in the original diagram), which the rule leaves unchanged. The rule is presented in the same way as in the example (i.e., with the goal on top). Conceptually, this rule can be viewed as a kind of unfolding rule where the complete checker edge in M2 is unfolded the necessary number of steps to match the partial checker edge in M1.
Informally, the soundness of the approximation test can be argued from separation logic principles and from the fact that unfoldings have equivalent concretizations. The approximation test is, however, incomplete (i.e., it may fail to establish an approximation relation between two graphs even when their concretizations are ordered by subset containment). Rather, these rules have been primarily designed to be effective in the way the approximation test is used by the widening operation described in the next subsection, where we need to determine whether M1 is an unfolded version of M2.
Widening. In this subsection, we present an upper bound operation M1 ▽ M2 that we use as our widening operator at loop join points. The case of disjunctions of graphs is addressed below. At a high level, the upper bound operation works much like the approximation test. We maintain a node pairing Ψ that relates the nodes of M1 and M2. Because we are computing an upper bound here, the pairing Ψ need not satisfy the same restriction as in the approximation test; it may be any relation between nodes of M1 and M2. Starting from this pairing, we simultaneously traverse the input graphs M1 and M2, consuming edges. However, for the upper bound operation, we also construct the upper bound as we consume edges from the input graphs. Intuitively, the basic edge matching rules lay down the basic structure of the upper bound and guide us to the regions of memory that need to be folded.
Initialization. The initialization of Ψ is analogous to that of the approximation test: we pair the nodes that correspond to the values of each variable in the two environments (i.e., the initial pairing is Ψ0 = {⟨E1(x), E2(x)⟩ | x ∈ Var}).
Traversal. To describe the upper bound computation, we define a set of rewriting rules of the form
\[
\Psi \vdash (M_1 \triangledown M_2) \rightsquigarrow M \;\longrightarrow\; \Psi' \vdash (M'_1 \triangledown M'_2) \rightsquigarrow M'.
\]
Initially, M is emp, and we rewrite until M′1 and M′2 are both emp, in which case M′ is the upper bound. A node in M corresponds to a pair of nodes (one from M1 and one from M2). Conceptually, we build M with nodes labeled by such pairs and then relabel each distinct pair with a distinct symbolic value at the end.
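A minimal Python sketch of this traversal, restricted to the two matching rules (m-pointsto and m-checker), is shown below. It assumes a toy edge-tuple encoding, with checker edges written as ("chk", src, checker) and no checker parameters. It omits the weakening rules and simply reports failure (None) where the full operation would weaken or fall back to the top summary region. The function names and the encoding are hypothetical.

```python
# Edges: ("pt", src, field, dst) for src@field |-> dst, ("chk", src, c) for src.c.

def find_match(m1, m2, psi):
    """Return indices (i, j) of a matchable edge pair, complete checker edges first."""
    for kind in ("chk", "pt"):
        for i, e1 in enumerate(m1):
            for j, e2 in enumerate(m2):
                # same kind, same field (or checker name), and paired source nodes
                if e1[0] == e2[0] == kind and e1[2] == e2[2] and (e1[1], e2[1]) in psi:
                    return i, j
    return None

def upper_bound(env1, m1, env2, m2):
    """Return (env, M) for M1 ▽ M2 using only matching rules, or None if stuck."""
    psi = {(env1[x], env2[x]) for x in env1}  # initial pairing from the environments
    m1, m2, out = list(m1), list(m2), []
    while m1 or m2:
        match = find_match(m1, m2, psi)
        if match is None:
            return None  # the full operation would try weakening rules here
        i, j = match
        e1, e2 = m1.pop(i), m2.pop(j)
        src = (e1[1], e2[1])
        if e1[0] == "pt":  # m-pointsto: pair the target nodes as well
            dst = (e1[3], e2[3])
            psi.add(dst)
            out.append(("pt", src, e1[2], dst))
        else:              # m-checker
            out.append(("chk", src, e1[2]))
    # Relabel each distinct pair with a distinct symbolic value.
    names = {}
    def name(pair):
        if pair not in names:
            names[pair] = f"v{len(names)}"
        return names[pair]
    env = {x: name((env1[x], env2[x])) for x in env1}
    m = [(e[0], name(e[1]), e[2], name(e[3])) if e[0] == "pt"
         else (e[0], name(e[1]), e[2]) for e in out]
    return env, m
```

Trying checker matches before points-to matches is a design choice that loosely follows the rule ordering discussed under “Strategy for applying rules”.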
Figure 4 shows an example sequence of rewritings to compute an upper bound. A complete listing of the rewrite rules is given in Appendix B. We elide the pairing Ψ, as it can be read off from the nodes in the upper bound graph M.
[Figure 4: rewriting sequence in five lines; columns: previous, current, upper bound; rules applied between successive lines: m-checker, m-checker, w-aliases, m-pointsto.]
Fig. 4. An example of computing an upper bound. The inputs are the graphs on the first iteration at program points 5 and 9 in the skip list example (Fig. 1). The fixed-point graph at 5 is obtained by computing the upper bound of this result and the upper bound of the first-iteration graphs at 5 and 11 (the rightmost graph). The highlighting of nodes in the upper bound graph indicates the node pairings that are required to apply the rule, and the highlighting of edges in the input graphs shows which edges are consumed in the rewriting step.

Roughly speaking, the upper bound operation has two kinds of rules: matching rules for when we have the same kind of edge on both sides (as in the approximation test) and weakening rules for when we have identified a memory region to fold. We use the prefix m- for the matching rules and w- for the weakening rules.
Line 1 shows the state after initialization: the upper bound graph has nodes for the program variables. The first two steps (applying rule m-checker) match complete checker edges (first from ⟨β, ζ⟩ and then from ⟨γ, η⟩). Note that the second application is enabled by the first, which adds the pair ⟨γ, η⟩. Extra parameters are essentially implicit target nodes.
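[Inset: two approximation questions. First line: l and p aliased (a single node) ⊑ a partial skip1 checker edge from l to p? Yes, always. Second line: a region from l to p through n and s fields ⊑ a partial skip1 checker edge from l to p? Yes, see Fig. 3.]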
The core of the upper bound operation consists of three weakening rules, which fold memory regions. The next rule application, w-aliases, is such a weakening step (line 3 to 4). In this case, a node on one side is paired with two nodes on the other (⟨α, δ⟩ and ⟨α, ε⟩). This situation arises when one side has must-alias information while the other side does not (l and p are aliased on the left but not on the right). In this case, we want to weaken both sides to a partial checker edge. To see that this is indeed an upper bound for these regions, consider the diagram in the inset. As shown on the first line, aliases can always be weakened to a partial checker edge (intuitively, from a zero-step segment to a zero-or-more-step segment). On the
second line, we need to check that a skip1 checker edge is indeed weaker than the region between δ and ε. This check is done using the approximation test described in the previous subsection; the check we need to perform here is the example shown in Fig. 3. Observe that we use the edge matching rules that populate Ψ to delineate the region to be folded (e.g., the region between δ and ε in the right graph). For the w-aliases rule, we do not specify here how the checker c is determined. In practice, however, we can limit the checkers that need to be tried by, for example, tracking the type of the node (or looking at the fields used in outgoing points-to edges).
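The situation that triggers w-aliases (one node paired with two distinct nodes on the other side) can be detected directly from the pairing Ψ. The Python sketch below does this and adds a type-based filter for candidate checkers. Both functions are hypothetical helpers. The type map stands in for whatever node-type tracking an implementation might use, and the subsequent ⊑ check on the delineated region is not shown. The side-2 (mirrored) case assumes a symmetric counterpart of the rule, which the text describes only informally.

```python
from itertools import combinations

def alias_candidates(psi):
    """List (side, shared, (n, n2)) for each node paired with two distinct nodes.

    side == 1: the shared node is in M1 (aliases on the left, as in w-aliases);
    side == 2: the mirrored situation, with the shared node in M2.
    """
    by_left, by_right = {}, {}
    for a1, a2 in sorted(psi):
        by_left.setdefault(a1, []).append(a2)
        by_right.setdefault(a2, []).append(a1)
    out = []
    for side, groups in ((1, by_left), (2, by_right)):
        for shared, others in groups.items():
            for pair in combinations(others, 2):
                out.append((side, shared, pair))
    return out

def checkers_for(node_type, checker_types):
    """Only try checkers whose node type matches (a hypothetical filtering heuristic)."""
    return [c for c, ty in checker_types.items() if ty == node_type]
```

With the pairing from Fig. 4, where α is paired with both δ and ε, the left node α is reported together with the pair (δ, ε).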
There are two other weakening rules, w-partial and w-checker, that are not used in the above example. Rule w-partial applies when we identify that an (unfolded) memory region on one side corresponds to a partial checker edge on the other. In this case, we weaken to the partial checker edge if we can show that the partial checker edge is weaker than the memory region. Rule w-partial is shown below schematically:
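\[
\begin{array}{c}
M_1 * (\alpha.c \mathbin{*\!\!-} \beta.c) \;\triangledown\; M_2 * M'_2 \;\rightsquigarrow\; M \\
\hline
M_1 \triangledown M_2 \;\rightsquigarrow\; M * (\langle\alpha,\gamma\rangle.c \mathbin{*\!\!-} \langle\beta,\delta\rangle.c)
\end{array}
\;\textsc{w-partial}
\qquad \text{if } M'_2 \sqsubseteq \gamma.c \mathbin{*\!\!-} \delta.c \;[\gamma \rightsquigarrow \gamma, \delta \rightsquigarrow \delta]
\]
where M already contains the paired nodes ⟨α, γ⟩ and ⟨β, δ⟩, and M′2 is the unfolded region of the right graph between γ and δ (rooted at γ, with fields f1 and f2 and c-summarized subregions).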
Observe that we determine that the region in the right graph must be folded because the corresponding region in the left graph is folded (which also indicates which checker to use). Rule w-checker is the analogous rule for a complete checker edge.
In Fig. 4, the last step simply matches points-to edges. When we reach emp for M1 and M2, M is the upper bound. In general, if in the end some regions in the input graphs can be neither matched nor weakened, we can obtain an upper bound by weakening those regions to ⊤ in the resulting graph (i.e., a summary region that cannot be unfolded). This causes an enormous loss in precision that we would like to avoid, but it can be done if necessary.
Soundness. The basic idea is that we compute an upper bound by rewriting based on the following derived rule of inference in separation logic:
\[
\frac{M'_1 \Rightarrow M \qquad M'_2 \Rightarrow M}{(M_1 * M'_1) \vee (M_2 * M'_2) \Rightarrow (M_1 \vee M_2) * M}
\]
For each memory region in the input graphs, there are two cases. Either the region has the same structure in both input graphs and we preserve that structure, or we weaken it to a checker edge, which we do only when we can decide the weakening with ⊑. That is, during the traversal, we simply alternate between two steps: weakening memory regions in each input graph to make them match, and applying the distributivity of separating conjunction over disjunction to factor out matching regions.
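Spelled out, the derived rule used for soundness follows in two steps. The first is monotonicity of separating conjunction, applying each premise to its own disjunct. The second is distributivity of separating conjunction over disjunction, which holds in separation logic.

```latex
\begin{align*}
(M_1 * M'_1) \vee (M_2 * M'_2)
  &\Rightarrow (M_1 * M) \vee (M_2 * M)
  && \text{since } M'_1 \Rightarrow M \text{ and } M'_2 \Rightarrow M \text{ (monotonicity of } *)\\
  &\Leftrightarrow (M_1 \vee M_2) * M
  && \text{(distributivity of } * \text{ over } \vee)
\end{align*}
```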
Termination. We use this upper bound operation as our widening operator, so we check that it has the stabilizing property (i.e., successive iterates eventually stabilize) to ensure termination of the analysis. Consider an infinite ascending chain
\[ M_0 \sqsubseteq M_1 \sqsubseteq M_2 \sqsubseteq \cdots \]
and the corresponding widening chain
\[ M_0 \sqsubseteq (M_0 \triangledown M_1) \sqsubseteq ((M_0 \triangledown M_1) \triangledown M_2) \sqsubseteq \cdots \]
(i.e., the sequence of iterates). The widening chain stabilizes because the successive iterates are bounded by the size of M0. Over the sequence of iterates, the only rule that may produce additional edges not present in M0 is w-aliases, but its applicability is limited by the number of nodes. Nodes, in turn, are created in the result in only two cases: the target node when matching points-to edges (m-pointsto), and any additional parameter nodes when matching complete checker edges (m-checker). Points-to and complete checker edges are created in the resulting graph only by matching, so the number of nodes is limited by the number of points-to and complete checker edges in M0.
Strategy for applying rules. Unlike the approximation test, the upper bound rules as described above have a fair amount of non-determinism. Unfortunately, applying the rules in different orders may yield results of different precision. To avoid an exponential explosion in computational complexity, we fix a particular strategy for applying the rules, which has been determined in part experimentally. We note, however, that neither soundness nor termination is affected by the strategy that we choose. Intuitively, we obtain a good result when we are able to consume all the edges in the input graphs by applying the upper bound rules. A potential bad interaction between the rules occurs if we prematurely match (and consume) points-to edges that should instead be weakened together with other edges. For example, in Fig. 4, suppose that before w-aliases we instead match the points-to edges α@n ↦ β on the left and δ@n ↦ ε on the right (i.e., apply m-pointsto), creating the pair ⟨β, ε⟩. Then we will not be able to consume all edges. Our strategy is to first exhaustively match complete checker edges (m-checker), as doing so does not prohibit any other rules and corresponds to identifying the “yet to be explored tail of the structure”. The weakening rules (w-aliases and w-partial) apply only once we have identified corresponding regions, and those regions can only be consumed by performing this weakening. We therefore apply these rules exhaustively whenever they are applicable. To identify such regions, we then apply m-pointsto, but incrementally (i.e., we match one points-to edge and restart). Finally, when nothing else applies, we try weakenings to complete checker edges (w-checker).
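The fixed rule ordering can be written as a small driver loop. In this Python sketch, each rule is a hypothetical function that returns a new state, or None when it does not apply. Only the ordering is taken from the text: exhaustive m-checker, exhaustive w-aliases/w-partial, a single m-pointsto followed by a restart, and w-checker as a last resort.

```python
def apply_strategy(state, rules):
    """Apply upper-bound rules in a fixed order until none applies.

    rules maps a rule name to a function state -> new state, or None if the
    rule does not apply (hypothetical interface).
    """
    def exhaust(name, st):
        while (nxt := rules[name](st)) is not None:
            st = nxt
        return st

    while True:
        # 1. Match complete checker edges exhaustively.
        state = exhaust("m-checker", state)
        # 2. Apply weakening rules on identified regions until neither applies.
        progress = True
        while progress:
            progress = False
            for name in ("w-aliases", "w-partial"):
                nxt = rules[name](state)
                if nxt is not None:
                    state, progress = nxt, True
        # 3. Match a single points-to edge, then restart from step 1.
        nxt = rules["m-pointsto"](state)
        if nxt is None:
            # 4. Only when nothing else applies, fold into a complete checker edge.
            nxt = rules["w-checker"](state)
            if nxt is None:
                return state
        state = nxt
```

Restarting after each points-to match means that a region exposed by one m-pointsto step gets the chance to be folded by a weakening rule before the next points-to edge is consumed.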
Disjunctions of graphs. In general, we need to widen disjunctions of graphs. The widening operator for disjunctions is based on the operator for graphs and attempts to find pairs that can be widened precisely, in the sense that no region needs to be weakened to ⊤ (i.e., because an input region could not be matched). In
addition to this selective widening process, the widening may leave additional disjuncts, up to some fixed limit (perhaps based on trace partitioning [MR05]).
More precisely, let us consider two disjunctions of graphs
\[ M_1 \vee \cdots \vee M_n \quad\text{and}\quad M'_1 \vee \cdots \vee M'_{n'} \]
(where we omit the pure formulas for the sake of clarity). Then, the widening of the two disjunctive states relies on the following algorithm:
– for each disjunct M′j, if there exists an element Mi such that the rewriting rules of the graph widening algorithm for Mi ▽ M′j do not get stuck, then add Mi ▽ M′j to the result; if no such element Mi exists, then add M′j itself to the result;
– for each disjunct Mi such that no Mi ▽ M′j has been added to the result, add Mi to the result, unless this would generate more disjuncts than a fixed constant; in that case, some M′j should be widened against Mi (with unmatched regions weakened to ⊤ if necessary).
Termination follows from the termination property of the widening operator for pairs of graphs and from the bound on the number of disjuncts.
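The two-step algorithm for disjunctions can be sketched in Python as follows. The graph-level operations are passed in as hypothetical callbacks. widen returns None when the rewriting gets stuck, and widen_top always succeeds by weakening unmatched regions to ⊤. The text leaves open which M′j to widen against when the bound is reached, so this sketch simply picks the most recently added disjunct.

```python
def widen_disjunctions(old, new, widen, widen_top, limit):
    """Widen old = [M1..Mn] with new = [M'1..M'n'] following the two-step algorithm.

    widen(mi, mj) returns the graph upper bound, or None if the rewriting gets stuck;
    widen_top(mi, mj) always succeeds, weakening unmatched regions to top.
    """
    assert limit >= 1
    result, used = [], set()
    # Step 1: widen each new disjunct against the first old one that widens precisely.
    for mj in new:
        for i, mi in enumerate(old):
            w = widen(mi, mj)
            if w is not None:
                result.append(w)
                used.add(i)
                break
        else:
            result.append(mj)  # no precise partner: keep M'_j as is
    # Step 2: keep the unused old disjuncts while the bound allows it.
    for i, mi in enumerate(old):
        if i in used:
            continue
        if len(result) < limit:
            result.append(mi)
        else:
            # Over the bound: fold Mi into an existing disjunct, possibly losing precision.
            result[-1] = widen_top(mi, result[-1])
    return result
```

In the usage below, toy integers stand in for graphs. Precise widening succeeds only for values of equal parity, and widen_top is max.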
4.3 Extensions and Limitations
The kinds of structures that can be described with our checkers are essentially trees with regular sharing patterns, which include skip lists, circular lists, doubly-linked lists, and trees with parent pointers. Intuitively, these are structures for which one can write a recursive traversal that dereferences each field once (plus pointer equality and disequality constraints). However, the effectiveness of our shape analysis is not the same for all code using these structures. First, we materialize only when needed by unfolding inductive definitions. As a result, code that traverses structures in a different direction than the checker does is more difficult to analyze. This issue may be addressed by considering additional materialization strategies. Second, in our presentation, we consider partial checker edges with one hole (i.e., a separating implication with one premise). This formulation handles code that uses cursors along a path through the structure, but not code that uses multiple cursors along different branches of a structure.
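The following Python sketch illustrates the kind of checker meant here by checking a doubly-linked list. The Node class and its next/prev fields are assumptions made for the example. The parameter prev plays the role of a checker parameter. Each field is dereferenced once per node, and the back pointer is checked by pointer equality.

```python
class Node:
    """A list cell with hypothetical fields next and prev."""
    def __init__(self):
        self.next = None
        self.prev = None

def dll(node, prev):
    """Checker sketch: a doubly-linked list starts at node, and its back pointer is prev."""
    if node is None:
        return True                 # empty list
    if node.prev is not prev:
        return False                # back pointer must point to the predecessor
    return dll(node.next, node)     # the current node becomes the parameter
```

Because the only extra fact checked is a pointer equality, a recursive traversal of this shape is the sort of definition the analysis can unfold one step at a time.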
5 Experimental Evaluation
We evaluate our shape analysis using a prototype implementation for analyzing C code. Our analysis is written in OCaml and uses the CIL infrastructure [NMRW02]. We have applied our analysis to a number of small data structure manipulation benchmarks and a larger Linux device driver benchmark (scull). In Table 1, we show the size in pre-processed lines of code, the analysis times on a 2.16GHz Intel Core Duo with 2GB RAM, the maximum number of graphs (i.e., number of disjuncts) at any program point, and the maximum number of iterations at any program point.

Table 1. Analysis statistics

Benchmark                 Code size  Analysis time  Max. graphs at   Max. iterations at
                          (loc)      (sec)          any point (num)  any point (num)
list reverse              19         0.007          1                3
list remove element       27         0.016          4                6
list insertion sort       56         0.021          4                7
binary search tree find   23         0.010          2                4
skip list rebalance       33         0.087          6                7
scull driver              894        9.710          4                16

In each case, we verified that the data structure manipulations preserved the structural invariants given by the checkers. Because we fold into checkers based only on history information, we typically cannot generate the appropriate checker edge when a structure is being constructed. This issue could be resolved by using constructor functions with appropriate post-conditions, or perhaps by a single-graph operation that can identify potential foldings. For these experiments, we use a few annotations that add a checker edge, for example, to treat a given null as the empty list (one each in list insertion sort and skip list rebalance).
The scull driver is from the Linux 2.4 kernel and was used by McPeak and Necula [MN05]. The main data structure used by the driver is an array of doubly-linked lists. Because we also do not yet support arrays, we rewrote the array operations as linked-list operations (and ignored other char arrays). We analyzed each function individually by providing appropriate pre-conditions and inlining all calls, as our implementation does not yet support proper interprocedural analysis. One function (cleanup module) was not completely analyzed because of incomplete handling of the array issues; it is not included in the line count. We also needed 6 annotations for adding checker edges in this example. In all the test cases (including the driver example), the number of graphs we need to maintain at any program point (i.e., the number of disjuncts) seems to stay reasonably low.
6 Related Work
Shape analysis. Shape analysis has long been an active area of research, with numerous algorithms proposed and systems developed. Our analysis is closest to some recent work on separation logic-based shape analyses by Distefano et al. [DOY06] and Magill et al. [MNCL06]. Their shape analyses infer invariants for programs that manipulate linked lists. They summarize linked-list regions using a notion of list segments (ls), an inductively defined predicate that is unfolded and folded during the course of their analyses. Also like their analyses, we use separation explicitly in our memory abstraction, which allows
the update operation to affect the memory state in a local manner. The primary difference is that the list segment abstraction is built into their analyses, while our analysis is parameterized by inductive checker definitions. To ensure termination of the analysis, they use a canonicalization operation on list segments (an operation from a memory state to a memory state), while we use a history-guided approach to identify where to fold (an operation from two memory states to one). Note that these approaches are not incompatible with each other, and they have different trade-offs. The additional history information allowed us to develop a generic weakening strategy, but because we are history-dependent, we cannot weaken at arbitrary points (e.g., we cannot weaken aggressively after each update). It might be possible to automatically derive canonicalization rules in certain situations based on an analysis of checker definitions. If combined with history-guided weakening, canonicalization would not need to ensure finiteness and could be less aggressive in its folding. Recently, Berdine et al. [BCC+07] have developed a shape analysis over generalized doubly-linked lists. They use a higher-order list segment predicate that is parameterized by the shape of the “node”. This essentially adds a level of polymorphism to express, for example, a linked list of cyclic doubly-linked lists. We can instead describe custom structures monomorphically with the appropriate checkers, but an extension for polymorphism could be very useful.
Lee et al. [LYY05] propose a shape analysis where memory regions are summarized using grammar-based descriptions that correspond to inductively defined predicates in separation logic (like our checkers). A nice aspect of their analysis is that these descriptions are derived from the construction of the data structure (for a certain class of tree-like structures). For weakening, they use a canonicalization operation to fold memory regions into grammar-based descriptions (non-terminals), but to ensure termination of the analysis, they must fix in advance a bound on the number of nodes that can be in a canonicalized graph.
TVLA [SRW02] is a very powerful and generic system based on three-valued logic and is probably the most widely applied tool for verifying deep properties of complex heap manipulations (e.g., [LRS06,LARSW00]). The framework is parametric in that users can provide specifications (instrumentation predicates) that affect the kinds of structures tracked by the tool. Our analysis is instead parameterized by inductive checker definitions, but since we focus on structural properties, we do not handle any data invariants. Much recent work has been targeted at improving the scalability of TVLA. Yahav and Ramalingam [YR04] partition the memory state into regions that are tracked either more precisely or less precisely, depending on their relevance to the property in question. Manevich et al. [MSRF04] describe a strategy to merge memory states whose canonicalizations are “similar” (i.e., have isomorphic sets of individuals). Our folding strategy can be seen as being particularly effective when the memory states are “similar”; like them, we would like to use disjunction when the strategy is ineffective. Arnold [Arn06] identifies an instance where a more aggressive summarization loses little precision (by allowing summary nodes to represent
zero-or-more concrete nodes instead of one-or-more). Our abstraction is related in that our checker edges denote zero-or-more steps.
Hackett and Rugina [HR05] present a novel shape analysis that first partitions the heap using region inference and then tracks updates on representative heap cells independently. Their abstraction cannot track certain global properties as the aforementioned shape analyses can, but they make this trade-off to obtain a very scalable shape analysis that can handle singly-linked lists. Recently, Cherem and Rugina [CR07] have extended this analysis to handle doubly-linked lists by also tracking neighboring cells.
McPeak and Necula [MN05] identify a class of axioms that can describe many common data structure invariants and give a complete decision procedure for this class. Their technique is based on verification-condition generation and thus requires loop invariant annotations. PALE [MS01] is a similar system, also based on verification-condition generation, but it instead uses monadic second-order logic. Wies et al. [WKL+06] have extended PALE with non-deterministic field constraints (and some loop invariant inference), which enables some reasoning about skip list structures.
Inductive checkers. It comes as no surprise that inductive data structures are naturally described using inductive definitions (in some form). For inductive data structures in imperative languages, separation logic enables the specification of such structures in a particularly concise manner because disjointness is built into the logic [Rey02]. By restricting our attention to definitions that can be viewed as code, we impose strictures that are useful for shape analysis. All shape analyses based on separation logic (e.g., [DOY06,MNCL06]) use inductively defined predicates for abstraction that fall into this class. Perry et al. [PJW06] have also observed that inductive definitions in a substructural logic could be an effective specification mechanism. They describe shape invariants for dynamic analysis with linear logic (in the form of logic programs).
7 Conclusion
We have described a lightweight shape analysis based on user-supplied structural invariant checkers. These checkers, in essence, provide the analysis with user-specified memory abstractions. Checkers are only unfolded when the regions they summarize are manipulated. As a result, these specifications allow the user to focus the efforts of the analysis by enabling it to expose disjunctive memory states only when needed. The key mechanisms we use to develop such a shape analysis are a generalization of checker-based summaries with partial checker runs and a folding strategy based on guidance from previous iterates. In this paper, we have focused on using structural checkers to analyze algorithms that traverse the structures unidirectionally. We believe such ideas could be applicable more broadly (both in terms of usable checkers and algorithms analyzed).
Acknowledgments. We would like to thank Hongseok Yang, Bill McCloskey, Gilad Arnold, Matt Harren, and the anonymous referees for providing helpful comments on drafts of this paper.
References
[Arn06] Gilad Arnold. Specialized 3-valued logic shape analysis using structure-based refinement and loose embedding. In Static Analysis Symposium (SAS), pages 204–220, 2006.
[BCC+07] Josh Berdine, Cristiano Calcagno, Byron Cook, Dino Distefano, Peter W. O’Hearn, Thomas Wies, and Hongseok Yang. Shape analysis for composite data structures. In Conference on Computer-Aided Verification (CAV), 2007.
[CC77] Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Symposium on Principles of Programming Languages (POPL), pages 238–252, 1977.
[CR07] Sigmund Cherem and Radu Rugina. Maintaining doubly-linked list invariants in shape analysis with local reasoning. In Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI), 2007.
[DOY06] Dino Distefano, Peter W. O’Hearn, and Hongseok Yang. A local shape analysis based on separation logic. In Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 287–302, 2006.
[HR05] Brian Hackett and Radu Rugina. Region-based shape analysis with tracked locations. In Symposium on Principles of Programming Languages (POPL), pages 310–323, 2005.
[LARSW00] Tal Lev-Ami, Thomas W. Reps, Shmuel Sagiv, and Reinhard Wilhelm. Putting static analysis to work for verification: A case study. In International Symposium on Software Testing and Analysis (ISSTA), pages 26–38, 2000.
[LRS06] Alexey Loginov, Thomas W. Reps, and Mooly Sagiv. Automated verification of the Deutsch-Schorr-Waite tree-traversal algorithm. In Static Analysis Symposium (SAS), pages 261–279, 2006.
[LYY05] Oukseh Lee, Hongseok Yang, and Kwangkeun Yi. Automatic verification of pointer programs using grammar-based shape analysis. In European Symposium on Programming (ESOP), pages 124–140, 2005.
[MN05] Scott McPeak and George C. Necula. Data structure specifications via local equality axioms. In Conference on Computer-Aided Verification (CAV), pages 476–490, 2005.
[MNCL06] Stephen Magill, Aleksandar Nanevski, Edmund Clarke, and Peter Lee. Inferring invariants in separation logic for imperative list-processing programs. In Workshop on Semantics, Program Analysis, and Computing Environments for Memory Management (SPACE), 2006.
[MR05] Laurent Mauborgne and Xavier Rival. Trace partitioning in abstract interpretation based static analyzers. In European Symposium on Programming (ESOP), pages 5–20, 2005.
[MS01] Anders Møller and Michael I. Schwartzbach. The pointer assertion logic engine. In Conference on Programming Language Design and Implementation (PLDI), pages 221–231, 2001.
[MSRF04] Roman Manevich, Shmuel Sagiv, Ganesan Ramalingam, and John Field. Partially disjunctive heap abstraction. In Static Analysis Symposium (SAS), pages 265–279, 2004.
[NMRW02] George C. Necula, Scott McPeak, Shree Prakash Rahul, and Westley Weimer. CIL: Intermediate language and tools for analysis and transformation of C programs. In Conference on Compiler Construction (CC), pages 213–228, 2002.
[PJW06] Frances Perry, Limin Jia, and David Walker. Expressing heap-shape contracts in linear logic. In Conference on Generative Programming and Component Engineering (GPCE), pages 101–110, 2006.
[Pug90] William Pugh. Skip lists: A probabilistic alternative to balanced trees. Commun. ACM, 33(6):668–676, 1990.
[Rey02] John C. Reynolds. Separation logic: A logic for shared mutable data structures. In Symposium on Logic in Computer Science (LICS), pages 55–74, 2002.
[SRW02] Shmuel Sagiv, Thomas W. Reps, and Reinhard Wilhelm. Parametric shape analysis via 3-valued logic. ACM Trans. Program. Lang. Syst., 24(3):217–298, 2002.
[WKL+06] Thomas Wies, Viktor Kuncak, Patrick Lam, Andreas Podelski, and Martin C. Rinard. Field constraint analysis. In Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI), pages 157–173, 2006.
[YR04] Eran Yahav and G. Ramalingam. Verifying safety properties using separation and heterogeneous abstractions. In Conference on Programming Language Design and Implementation (PLDI), pages 25–34, 2004.
A Approximation Test
In this section, we give the rules that define the approximation test as described in Sect. 4.2. Viewed from goal to premise, each rule either matches and consumes edges (pointsto and checker) or simplifies edges so that matching can apply (assume, apply, and unfold), until all edges have been consumed (emp). Here, to make the updating of the global mapping Φ explicit, we extend the approximation test judgment slightly as follows: M1 ⊑ M2 [ΦI][ΦO], where ΦI is the input mapping (as in Sect. 4.2) and ΦO is the output mapping (i.e., the mapping that results after matching nodes).
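\[ M_1 \sqsubseteq M_2\,[\Phi_I][\Phi_O] \]
Edge Matching.
\[
\frac{M_1 \sqsubseteq M_2\,[\Phi, r_2 \rightsquigarrow r_1][\Phi'] \qquad \alpha_2 \rightsquigarrow \alpha_1 \in \Phi}
     {M_1 * (\alpha_1@f \mapsto r_1) \sqsubseteq M_2 * (\alpha_2@f \mapsto r_2)\,[\Phi][\Phi']}\;\textsc{pointsto}
\qquad
\frac{M_1 \sqsubseteq M_2\,[\Phi, \beta_2 \rightsquigarrow \beta_1][\Phi'] \qquad \alpha_2 \rightsquigarrow \alpha_1 \in \Phi}
     {M_1 * \alpha_1.c(\beta_1) \sqsubseteq M_2 * \alpha_2.c(\beta_2)\,[\Phi][\Phi']}\;\textsc{checker}
\]
Partial Checkers.
\[
\frac{M'_1 * \alpha'_1.c(\beta_1) \sqsubseteq \alpha_2.c(\beta_2)\,[\Phi, \beta_2 \rightsquigarrow \beta_1][\Phi'] \qquad \alpha_2 \rightsquigarrow \alpha_1,\ \alpha'_2 \rightsquigarrow \alpha'_1 \in \Phi \qquad M_1 \sqsubseteq M_2\,[\Phi'][\Phi''] \qquad (\beta_1 \text{ fresh})}
     {M_1 * M'_1(\alpha_1 \dashrightarrow \alpha'_1) \sqsubseteq M_2 * \alpha_2.c(\beta_2) \mathbin{*\!\!-} \alpha'_2.c(\beta_2)\,[\Phi][\Phi'']}\;\textsc{assume}
\]
\[
\frac{M'_1 \sqsubseteq \alpha'_2.c(\beta_2)\,[\Phi, \beta_2 \rightsquigarrow \beta_1, \alpha'_2 \rightsquigarrow \alpha'_1][\Phi'] \qquad \alpha_2 \rightsquigarrow \alpha_1 \in \Phi \qquad M_1 \sqsubseteq M_2\,[\Phi'][\Phi''] \qquad (\alpha'_2 \text{ fresh})}
     {M_1 * M'_1(\alpha'_1 \dashrightarrow) * \alpha_1.c(\beta_1) \mathbin{*\!\!-} \alpha'_1.c(\beta_1) \sqsubseteq M_2 * \alpha_2.c(\beta_2)\,[\Phi][\Phi'']}\;\textsc{apply}
\]
Unfolding.
\[
\frac{M_1 \sqsubseteq M_2 * [\alpha_2, \beta_2/\pi, \rho][\vec{\delta}/\mathrm{fv}(M_i)]M_i\,[\Phi][\Phi'] \qquad \alpha_2 \rightsquigarrow \alpha_1 \in \Phi \qquad (\vec{\delta} \text{ fresh}) \qquad (\pi.c(\rho) := \cdots \vee \langle M_i; P_i\rangle \vee \cdots)}
     {M_1 \sqsubseteq M_2 * \alpha_2.c(\beta_2)\,[\Phi][\Phi']}\;\textsc{unfold}
\]
Finish.
\[
\frac{}{\mathsf{emp} \sqsubseteq \mathsf{emp}\,[\Phi][\Phi]}\;\textsc{emp}
\]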
We write M ∗ M′(α ⇢ α′) for splitting a graph into two subgraphs: M′, which is the slice from α to α′ (all nodes and edges reachable from α but not from α′), and M, which is the remainder. Similarly, M ∗ M′(α ⇢) indicates that M′ is the slice from α (all nodes and edges reachable from α).
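The slice used in this splitting (everything reachable from α but not from α′) can be computed with two reachability passes. This is a minimal Python sketch that assumes a hypothetical encoding of edges as (src, label, dst) triples. It treats an edge as reachable when its source node is, and it ignores checker parameters.

```python
from collections import deque

def reach(edges, root):
    """Nodes reachable from root along (src, label, dst) edges, root included."""
    seen, todo = {root}, deque([root])
    while todo:
        n = todo.popleft()
        for src, _label, dst in edges:
            if src == n and dst not in seen:
                seen.add(dst)
                todo.append(dst)
    return seen

def split(edges, alpha, alpha_prime=None):
    """Return (M, M') where M' is the slice from alpha (up to alpha_prime, if given)."""
    nodes = reach(edges, alpha)
    if alpha_prime is not None:
        nodes -= reach(edges, alpha_prime)  # drop what is reachable from alpha'
    sliced = [e for e in edges if e[0] in nodes]
    rest = [e for e in edges if e[0] not in nodes]
    return rest, sliced
```

Edges that point into α′ stay in the slice, since their source is reachable from α but not from α′.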
The unfold rule enables matching when M1 is an unfolded instance of M2. Algorithmically, we want to apply checker to match checker edges before considering unfolding. In the unfold rule, we write fv(Mi) for the free variables of Mi. For the overall approximation test on analysis states, we need to remember the Pi from unfolding to check the approximation relation on the pure constraints, which is left implicit here.
B Widening
In this section, we give the rewriting rules for the upper bound operation as described in Sect. 4.2.
Ψ # (M1OM2) � M _ Ψ ′ # (M ′1OM ′2) � M ′
Edge Matching.
〈α1, α2〉 ∈ Ψ
Ψ # (M1 ∗ α1@f 7→ r1OM2 ∗ α2@f 7→ r2) � M_ Ψ, 〈r1, r2〉 # (M1OM2)
� M ∗ 〈α1, α2〉@f 7→ 〈r1, r2〉
m-pointsto
〈α1, α2〉 ∈ Ψ
Ψ # (M1 ∗ α1.c(β1)OM2 ∗ α2.c(β2)) � M_ Ψ, 〈β1, β2〉 # (M1OM2) � M
∗ 〈α1, α2〉.c(〈β1, β2〉)
m-checker
Folding.
〈α1, α2〉 ∈ Ψ M ′2 v α2.c(β2)[α2 � α2] (β2 fresh)Ψ # (M1 ∗
α1.c(β1)OM2 ∗M ′2(α2 )) � M
_ Ψ # (M1OM2) � M ∗ 〈α1, α2〉.c(〈β1, β2〉)w-checker
〈α1, α2〉, 〈α′1, α′2〉 ∈ ΨM ′2 v α2.c(β2)∗−α′2.c(β2)[α2 � α2, α′2
� α′2] (β2 fresh)
Ψ # (M1 ∗ α1.c(β1)∗−α′1.c(β1)OM2 ∗M ′2(α2 α′2)) � M_ Ψ # (M1OM2)
� M ∗ 〈α1, α2〉.c(〈β1, β2〉)∗−〈α′1, α′2〉.c(〈β1, β2〉)
w-partial
〈α1, α2〉, 〈α1, α′2〉 ∈ ΨM ′2 v α2.c(β2)∗−α′2.c(β2)[α2 � α2, α′2 �
α′2] (β1, β2 fresh)
Ψ # (M1OM2 ∗M ′2(α2 α′2)) � M_ Ψ # (M1OM2) � M ∗ 〈α1, α2〉.c(〈β1,
β2〉)∗−〈α1, α′2〉.c′(〈β1, β2〉)
w-aliases