Heap Decomposition for Concurrent Shape Analysis
R. Manevich, T. Lev-Ami, M. Sagiv (Tel Aviv University)
G. Ramalingam (MSR India)
J. Berdine (MSR Cambridge)
Dagstuhl 08061, February 7, 2008
Jan 13, 2016
2
Thread-modular analysis for coarse-grained concurrency
E.g., [Qadeer & Flanagan, SPIN’03] [Gotsman et al., PLDI’07] …
With each lock lk, associate
  a subheap h(lk), partitioning the heap: H = h(lk1) * … * h(lkn)
  a local invariant I(lk), inferred or specified
When thread t
  acquires lk, it assumes I(lk)
  releases lk, it ensures I(lk)
Can analyze each thread “separately”
Avoid explicitly enumerating all thread interleavings
3
Thread-modular analysis for fine-grained concurrency?
CAS (Compare And Swap)
No locks means more interference between threads
No nice heap partitioning
Still, the idea of reasoning about threads separately is appealing
4
Overview
State space is too large for two reasons
  Unbounded number of objects: infinite
    Apply finitary abstractions to data structures (e.g., abstract away the length of a list)
  Exponential in the number of threads
Observation: threads operate on part of the state
  Correlations between different substates are often irrelevant for proving safety properties
Our approach: develop an abstraction for substates
  Abstract away correlations between substates of different threads
  Reduce the exponential state space
5
Non-blocking stack [Treiber 1986]
[1]  void push(Stack *S, data_type v) {
[2]    Node *x = alloc(sizeof(Node));
[3]    x->d = v;
[4]    do {
[5]      Node *t = S->Top;
[6]      x->n = t;
[7]    } while (!CAS(&S->Top,t,x));
[8]  }

[9]  data_type pop(Stack *S){
[10]   do {
[11]     Node *t = S->Top;
[12]     if (t == NULL)
[13]       return EMPTY;
[14]     Node *s = t->n;
[15]     data_type r = s->d;
[16]   } while (!CAS(&S->Top,t,s));
[17]   return r;
[18] }

#define EMPTY -1
typedef int data_type;
typedef struct node_t { data_type d; struct node_t *n; } Node;
typedef struct stack_t { struct node_t *Top; } Stack;
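The slide uses CAS without defining it. The following is a minimal sketch of how the stack could be written with C11 atomics, assuming `<stdatomic.h>` is available. The `_Atomic` qualifier on `Top` and the helper `stack_init` are additions made for this sketch and do not appear on the slide. Unlike line [15] of the slide, the sketch returns the value stored in the removed node t, and it never frees nodes, which sidesteps memory reclamation and ABA concerns.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define EMPTY -1
typedef int data_type;

typedef struct node_t {
    data_type d;
    struct node_t *n;
} Node;

typedef struct stack_t {
    _Atomic(Node *) Top;   /* assumption: Top is made atomic for C11 */
} Stack;

/* Hypothetical helper (not on the slide): start with an empty stack. */
void stack_init(Stack *S) {
    assert(S != NULL);
    atomic_init(&S->Top, (Node *)NULL);
}

/* CAS(addr, expected, desired): atomically store desired into *addr if
   *addr still equals expected. Returns nonzero on success. */
static int CAS(_Atomic(Node *) *addr, Node *expected, Node *desired) {
    return atomic_compare_exchange_strong(addr, &expected, desired);
}

void push(Stack *S, data_type v) {
    Node *x = malloc(sizeof(Node));
    Node *t;
    if (x == NULL)
        abort();
    x->d = v;
    do {
        t = atomic_load(&S->Top);   /* snapshot of the current top */
        x->n = t;
    } while (!CAS(&S->Top, t, x));  /* retry if another thread changed Top */
}

data_type pop(Stack *S) {
    Node *t, *s;
    do {
        t = atomic_load(&S->Top);
        if (t == NULL)
            return EMPTY;
        s = t->n;
    } while (!CAS(&S->Top, t, s));
    /* Sketch-only choice: read the value of the removed node t.
       The node is intentionally not freed. */
    return t->d;
}
```

Declaring t and s before the loop keeps them in scope for the loop condition, which standard C requires.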
6
Example: successful push
[1]  void push(Stack *S, data_type v) {
[2]    Node *x = alloc(sizeof(Node));
[3]    x->d = v;
[4]    do {
[5]      Node *t = S->Top;
[6]      x->n = t;
[7]    } while (!CAS(&S->Top,t,x));
[8]  }
[Figure: heap diagram showing Top, the list's n links, and the local variables t and x]
7
Example: successful push
[1]  void push(Stack *S, data_type v) {
[2]    Node *x = alloc(sizeof(Node));
[3]    x->d = v;
[4]    do {
[5]      Node *t = S->Top;
[6]      x->n = t;
[7]    } while (!CAS(&S->Top,t,x));
[8]  }
CAS succeeds
[Figure: heap diagram showing Top, the list's n links, and the local variables t and x]
8
Example: unsuccessful push
[1]  void push(Stack *S, data_type v) {
[2]    Node *x = alloc(sizeof(Node));
[3]    x->d = v;
[4]    do {
[5]      Node *t = S->Top;
[6]      x->n = t;
[7]    } while (!CAS(&S->Top,t,x));
[8]  }
CAS fails
[Figure: heap diagram showing Top, the list's n links, and the local variables t and x]
9
Concrete states with storable threads
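[Figure: concrete state with four thread objects, prod1 (pc=7), prod2 (pc=6), cons1 (pc=14), cons2 (pc=16); Top and the list's n links; local variables x, t, s]
Legend:
  thread object: name + program location
  local variable
  next field of list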
10
Full state S1
[Figure: full state S1 with thread objects prod1 (pc=7), prod2 (pc=6), cons1 (pc=14), cons2 (pc=16); Top and the list's n links; local variables x, t, s]
11
Decomposition(S1)
[Figure: four substates of S1, one per thread: M1 (prod1, pc=7), M2 (prod2, pc=6), M3 (cons1, pc=14), M4 (cons2, pc=16); each shows Top, the n links, and that thread's local variables]
Decomposition(S1) consists of the substates M1, M2, M3, M4
Note that S1 is represented by Decomposition(S1)
  A substate represents all full states that contain it
Decomposition is state-sensitive (depends on values of pointers and heap connectivity)
12
Full states S1 and S2
[Figure: two full states over the same heap. S1: prod1 (pc=7), prod2 (pc=6), cons1 (pc=14), cons2 (pc=16). S2: prod1 (pc=6), prod2 (pc=7), cons1 (pc=16), cons2 (pc=14)]
13
Decomposition({S1, S2})
[Figure: eight substates. From S1: M1 (prod1, pc=7), M2 (prod2, pc=6), M3 (cons1, pc=14), M4 (cons2, pc=16). From S2: K1 (prod1, pc=6), K2 (prod2, pc=7), K3 (cons1, pc=16), K4 (cons2, pc=14)]
Decomposition({S1, S2}) has one component per thread: {M1, K1}, {M2, K2}, {M3, K3}, {M4, K4}
S1 and S2 are both represented by Decomposition({S1, S2}), which over-approximates them
Cartesian abstraction ignores correlations between substates
State space exponentially more compact
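To make the compactness claim concrete, here is a minimal sketch of a Cartesian abstraction over a deliberately simplified state. It assumes that a full state is only a tuple of program counters, one per thread, and ignores the heap entirely. The names `FullState`, `Decomposed`, `decompose`, `stored_substates`, `represented_states`, and `represents` are hypothetical and are not taken from HeDec.

```c
#include <assert.h>
#include <stdint.h>

#define NTHREADS 4
#define MAXPC 32   /* assumption: program counters lie in 0..31 */

/* Simplified full state: one program counter per thread (no heap). */
typedef struct { int pc[NTHREADS]; } FullState;

/* One component per thread: bit p of pcs[i] is set iff some input
   state had thread i at program counter p. */
typedef struct { uint32_t pcs[NTHREADS]; } Decomposed;

/* Project every full state onto each thread and collect the results. */
Decomposed decompose(const FullState *states, int n) {
    Decomposed d = {{0}};
    for (int s = 0; s < n; s++) {
        for (int i = 0; i < NTHREADS; i++) {
            int pc = states[s].pc[i];
            assert(pc >= 0 && pc < MAXPC);
            d.pcs[i] |= UINT32_C(1) << pc;
        }
    }
    return d;
}

static int count_bits(uint32_t x) {
    int c = 0;
    while (x) { c += (int)(x & 1u); x >>= 1; }
    return c;
}

/* Number of substates actually stored: the sum over components. */
int stored_substates(const Decomposed *d) {
    int total = 0;
    for (int i = 0; i < NTHREADS; i++) total += count_bits(d->pcs[i]);
    return total;
}

/* Number of full states represented: the product over components. */
long represented_states(const Decomposed *d) {
    long total = 1;
    for (int i = 0; i < NTHREADS; i++) total *= count_bits(d->pcs[i]);
    return total;
}

/* A full state is represented iff each thread's pc is in its component. */
int represents(const Decomposed *d, const FullState *s) {
    for (int i = 0; i < NTHREADS; i++) {
        int pc = s->pc[i];
        if (pc < 0 || pc >= MAXPC) return 0;
        if (!(d->pcs[i] & (UINT32_C(1) << pc))) return 0;
    }
    return 1;
}
```

With the program counters of S1 (7, 6, 14, 16) and S2 (6, 7, 16, 14) as input, the sketch stores 8 substates but represents 16 tuples, including combinations such as (7, 7, 14, 14) that were never in the input. That loss of correlation is the price paid for storing a sum rather than a product.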
14
Abstraction properties
Substates in each subdomain correspond to a single thread
  Abstract away correlations between threads
  Exponential reduction of state space
Substates preserve information on part of heap (relevant to one thread)
Substates may overlap
  Useful for reasoning about programs with fine-grained concurrency
  Better approximate interference between threads
15
Main results
New parametric abstraction for heaps
  Heap decomposition + Cartesian abstraction
  Parametric in underlying abstraction + decomposition
Parametric sound transformers
  Allows balancing efficiency and precision
Implementation in HeDec
  Heap Decomposition + Canonical Abstraction
Used to prove interesting properties of heap-manipulating programs with fine-grained concurrency
  Linearizability
Analysis scales linearly in number of threads
16
Sound transformers
[Figure: the abstract transformer # maps the four input components j1, j2, j3, j4 to the four output components j1’, j2’, j3’, j4’]
17
Pointwise transformers
[Figure: # is applied to each input component j1, j2, j3, j4 separately, yielding j1’, j2’, j3’, j4’]
efficient
often too imprecise
18
Imprecision example
[1]  void push(Stack *S, data_type v) {
[2]    Node *x = alloc(sizeof(Node));
[3]    x->d = v;
[4]    do {
[5]      Node *t = S->Top;
[6]      x->n = t;
[7]    } while (!CAS(&S->Top,t,x));
[8]  }
[Figure: substate M2 (prod2, pc=6) with Top, the n links, and the local variables x and t]
Applying # to M2: schedule prod1 and execute x->n = t
But where do x and t of prod1 point to?
19
Imprecision example
[1]  void push(Stack *S, data_type v) {
[2]    Node *x = alloc(sizeof(Node));
[3]    x->d = v;
[4]    do {
[5]      Node *t = S->Top;
[6]      x->n = t;
[7]    } while (!CAS(&S->Top,t,x));
[8]  }
[Figure: a full state with threads prod1, prod2, cons1, cons2, and the substate of prod2 (pc=7) obtained after applying #]
false alarm: possible cyclic list
20
Full composition transformers
[Figure: the input components j1, j2, j3, j4 are all composed and # is applied to the full composition, yielding j1’, j2’, j3’, j4’]
precise
exponential space blow-up
21
Partial composition
[Figure: component j1 is composed with each of j2, j3, j4, giving the pairs (j1, j2), (j1, j3), (j1, j4)]
22
Partial composition
[Figure: # is applied to each of the pairs (j1, j2), (j1, j3), (j1, j4), yielding j1’, j2’, j3’, j4’]
efficient and precise
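The trade-off between the three transformer schemes can be made tangible with a rough count of how many inputs # has to process in each. The sketch below assumes n components that each hold exactly k substates and assumes that every combination composes consistently, so the figures are illustrative upper bounds and not measurements from HeDec. The function names are hypothetical.

```c
#include <assert.h>

/* k^n by repeated multiplication (assumes the result fits in 64 bits). */
unsigned long long ipow(unsigned long long k, unsigned n) {
    unsigned long long r = 1;
    for (unsigned i = 0; i < n; i++) r *= k;
    return r;
}

/* Pointwise: # is applied to every substate of every component. */
unsigned long long pointwise_inputs(unsigned n, unsigned long long k) {
    return (unsigned long long)n * k;
}

/* Full composition: # is applied to every tuple of substates,
   one drawn from each of the n components. */
unsigned long long full_composition_inputs(unsigned n, unsigned long long k) {
    return ipow(k, n);
}

/* Partial composition: one component is paired with each of the other
   n - 1 components, and # is applied to every pair of substates. */
unsigned long long partial_composition_inputs(unsigned n, unsigned long long k) {
    assert(n >= 1);
    return (unsigned long long)(n - 1) * k * k;
}
```

For four components of two substates each, as in the running example, this gives 8 pointwise inputs, 12 pair inputs, and 16 full tuples. At 20 components the pair count grows to 76 while the tuple count exceeds one million, which is consistent with the "efficient and precise" label on the slide.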
23
Partial composition example
[Figure: substates M1 (prod1, pc=7), M2 (prod2, pc=6), K1 (prod1, pc=6), K2 (prod2, pc=7); components j1 and j2 are composed]
24
Partial composition example
[Figure: composing components j1 and j2 gives the composition of K2 with K1 (prod2 at pc=7, prod1 at pc=6) and the composition of K2 with M1 (prod2 and prod1 both at pc=7)]
false alarm avoided
26
Experimental results
List-based fine-grained algorithms
  Non-blocking stack [Treiber 1986]
  Non-blocking queue [Doherty and Groves FORTE’04]
  Two-lock queue [Michael and Scott PODC’96]
Benign data races
Verified absence of null dereferences and memory leaks
Verified linearizability
Analysis built on top of existing full heap analysis of [Amit et al. CAV’07]
Scaled analysis from 2/3 threads to 20 threads
Extended to unbounded threads (different work)
27
Experimental results
Exponential time/space reduction
Non-blocking stack + linearizability
[Figure: number of states (0 to 250000) vs. number of threads (0 to 20), Decomp vs. Full]
[Figure: time in sec. (0 to 4000) vs. number of threads (0 to 20)]
28
Related work
Disjoint regions decomposition [TACAS’07]
  Fixed decomposition scheme
  Most precise transformer is FNP-complete
Partial join [Manevich et al. SAS’04]
  Orthogonal to decomposition
  In HeDec we combine decomposition + partial join
[Yang et al.]
Handling concurrency for an unbounded number of threads
  Thread-modular analysis [Gotsman et al. PLDI’07]
  Rely-guarantee [Vafeiadis et al. CAV’07]
  Thread quantification (submitted)
29
More related work
Local transformers
  Works by Reynolds, O’Hearn, Berdine, Yang, Gotsman, Calcagno
Heap analysis by separation [Yahav & Ramalingam PLDI’04] [Hackett & Rugina POPL’05]
  Decompose verification problem itself and conservatively approximate contexts
Heap decomposition for interprocedural analysis [Rinetzky et al. POPL’05] [Rinetzky et al. SAS’05] [Gotsman et al. SAS’06] [Gotsman et al. PLDI’07]
  Decompose/compose at procedure boundaries
Predicate/variable clustering [Clarke et al. CAV’00]
  Statically-determined decomposition
30
Conclusion
Parametric framework for shape analysis
  Scaling analyses of programs with fine-grained concurrency
  Generalizes thread-modular analysis
  Key idea: state decomposition
  Also useful for sequential programs
Used to prove intricate properties like linearizability
HeDec tool http://www.cs.tau.ac.il/~tvla#HEDEC
31
Future/ongoing work
Extended analysis for an unbounded number of threads via thread quantification
  Orthogonal technique
  Both techniques compose very well
Can we automatically infer good decompositions?
Can we automatically tune transformers?
Can we reuse ideas in non-shape analyses?
32
Invited questions
How do you choose a decomposition?
How do you choose transformers?
How does it compare to separation logic?
What is a general principle and what is specific to shape analysis?
Caveats / limitations?
33
How do you choose a decomposition?
In general this is an open problem
  Perhaps counterexample-guided refinement can help
Depends on the property you want to prove
Aim at causes of combinatorial explosion
  Threads
  Iterators
For linearizability we used
  For each thread t: thread node, objects referenced by local variables, objects referenced by global variables
  Objects referenced by global variables and objects correlated with sequential execution
  Locks component: for each lock, the thread that acquires it
34
How do you choose transformers?
In general a challenging problem
  Have to balance efficiency and precision
Have some heuristics
  Core subdomains
35
How does it compare to separation logic?
Relevant separating conjunction *r
  Like * but without the disjointness requirement
Do you have an analog of the frame rule?
  For disjoint regions decomposition [TACAS’07]
  In general no, but instead we can use transformers of different levels of precision
    #(I1, I2) combines #precise(I1) with #less-precise(I2), where #less-precise is cheap to compute
  Perhaps can find conditions under which #(I1, I2) combines #precise(I1) with I2 left unchanged
Relativized formulae
36
What is a general principle and what is specific to shape analysis?
Decomposing abstract domains is general
  Substate abstraction + Cartesian product
Parametric transformers for Cartesian abstractions are general
Chopping down heaps by heterogeneous abstractions is shape-analysis specific
37
Caveats / limitations?
Decomposition + transformers defined by user
  Not specialized for program/property
Too much overlap between substates can lead to more expensive analyses
Too fine a decomposition requires lots of composition
  Partial composition is a bottleneck
  We have the theory for finer-grained compositions + incremental transformers but no implementation
Instantiated framework for just one abstraction (Canonical Abstraction)
  Can this be useful for separation logic-based analyzers?