Summarizing Procedures in Concurrent Programs Shaz Qadeer Sriram K. Rajamani Jakob Rehof Microsoft Research.

Summarizing Procedures in Concurrent Programs

Shaz Qadeer

Sriram K. Rajamani

Jakob Rehof

Microsoft Research

Motivation

• How do you scale program analyses for sequential programs?
  – Summarize at procedure boundaries
    • Sharir-Pnueli ‘81, Reps-Horwitz-Sagiv ‘95
  – Used in compiler dataflow analyses
  – Used in error detection tools
    • SLAM (Ball-Rajamani ‘00)
    • ESP (Das-Lerner-Seigle ‘02)

Summarization is efficient!

• Boolean program with:
  – g globals
  – n procedures, each with at most m locals
  – |E| = size of the CFG of the program

• Complexity: O(|E| · 2^{O(g+m)})

• Complexity linear in the number of procedures!

Summarization gives termination!

• Possibly recursive boolean programs

• Infinite state systems

• Checking terminates with summarization!

Question

Can summarization help analysis of concurrent programs?

Difficulty

Assertion checking for multithreaded programs is undecidable
  – Even if all variables are boolean
  – Further, even if only two threads!
  – Reduce emptiness of intersection of two CFLs to this problem (Ramalingam ‘00)

Our work

• New model checking algorithm using summarization – useful for concurrent programs

• Summaries provide re-use and efficiency for analyzing concurrent programs

• Enable termination of analysis in a large class of concurrent programs
  – includes programs with recursion, shared variables and concurrency

Difficulties in summarizing concurrent programs

• What is a summary?
  – For sequential programs
    • Summary of procedure P = Set of all pre-post state pairs (s,s’) obtained by invoking P
  – This doesn’t work for concurrent programs
    • Does not model concurrent updates by other threads

Insight

• In a well synchronized concurrent program
  – A thread’s computation can be viewed as a sequence of transactions
  – While analyzing a transaction, interleavings with other threads need not be considered
  – Key idea: Summarize transactions!

How do you identify transactions?

Lipton’s theory of reduction

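Four atomicities

• R: right movers – lock acquire
• L: left movers – lock release
• B: both right + left movers – variable access holding lock
• N: non-movers – access unprotected variable

[Figure: commuting diagrams over states S0 to S7. A right mover such as acq(this) can be swapped with an action x of another thread that follows it; a left mover such as rel(this) can be swapped with an action z of another thread that precedes it; a both-mover such as r=bal can be swapped with adjacent actions x and y in either direction.]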

Transaction

Any sequence of actions whose atomicities are in R*(N+ε)L* is a transaction: right movers, then at most one non-mover, then left movers

[Figure: a run through states S0 to S7 whose actions are labeled R, N and L; the transaction is divided into a precommit phase and a postcommit phase.]
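The transaction pattern can be checked mechanically. The following is a minimal Python sketch, assuming each action of a thread has already been labeled with one of the four atomicities, that a transaction contains at most one non-mover, and that a both-mover (B) may stand in for either a right or a left mover. The name `is_transaction` is hypothetical and does not come from the talk.

```python
import re

# Right movers (or both-movers), then at most one non-mover,
# then left movers (or both-movers).
TRANSACTION = re.compile(r"[RB]*N?[LB]*")

def is_transaction(atomicities):
    """Return True if the sequence of atomicity labels forms one transaction."""
    return TRANSACTION.fullmatch("".join(atomicities)) is not None

# acquire(m); access a variable while holding m; release(m)
print(is_transaction(["R", "B", "L"]))  # True
# a lock acquire after a lock release cannot belong to the same transaction
print(is_transaction(["R", "L", "R"]))  # False
```

A sequence that fails the check has to be split into several transactions, and other threads may be scheduled at the split points.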

Transactions and summaries

Corollary of Lipton’s theorem:

No need to schedule other threads in the middle of a transaction

If a procedure body occurs in a transaction, we can summarize it!

Resource allocator (1)

bool available[N];
mutex m;

int getResource() {
    int i = 0;
L0: acquire(m);
L1: while (i < N) {
L2:     if (available[i]) {
L3:         available[i] = false;
L4:         release(m);
L5:         return i;
        }
L6:     i++;
    }
L7: release(m);
L8: return i;
}

Choose N = 2

Summaries:
<pc, i, m, (a[0],a[1])>  →  <pc’, i’, m’, (a[0]’,a[1]’)>
<L0, 0, 0, (0, 0)>  →  <L8, 2, 0, (0,0)>
<L0, 0, 0, (0, 1)>  →  <L5, 1, 0, (0,0)>
<L0, 0, 0, (1, 0)>  →  <L5, 0, 0, (0,0)>
<L0, 0, 0, (1, 1)>  →  <L5, 0, 0, (0,1)>
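Because the whole body of getResource() runs while holding m, it is a single transaction and can be summarized like a sequential procedure. The following Python sketch enumerates the four summaries for N = 2 by running the body once per initial value of the available array. It is only an illustration: the helper names `get_resource` and `summaries` and the tuple encoding of states are assumptions, not part of the talk.

```python
from itertools import product

N = 2

def get_resource(available):
    """Run getResource() as one atomic block; return (pc', i', m', a')."""
    a = list(available)
    i = 0
    # L0: acquire(m) -- the lock is held until one of the releases below
    while i < N:                       # L1
        if a[i]:                       # L2
            a[i] = 0                   # L3
            # L4: release(m), so m is 0 again in the post-state
            return ("L5", i, 0, tuple(a))
        i += 1                         # L6
    # L7: release(m)
    return ("L8", i, 0, tuple(a))

def summaries():
    """Map each pre-state <L0, 0, 0, a> to its post-state."""
    return {("L0", 0, 0, a): get_resource(a)
            for a in product((0, 1), repeat=N)}

for pre, post in summaries().items():
    print(pre, "->", post)
```

Each pre-state yields exactly one post-state here because the body is deterministic; the table has 2^N entries.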

What if transaction boundaries and procedure boundaries do not coincide?

Two level model checking algorithm

Two level algorithm

• First level maintains stack

• Second level maintains stack-less summaries

• Summaries can start and end anywhere in a procedure

Resource allocator (2)

bool available[N];
mutex m[N];

int getResource() {
    int i = 0;
L0: while (i < N) {
L1:     acquire(m[i]);
L2:     if (available[i]) {
L3:         available[i] = false;
L4:         release(m[i]);
L5:         return i;
        } else {
L6:         release(m[i]);
        }
L7:     i++;
    }
L8: return i;
}

Choose N = 2

Summaries:
<pc,i,(m[0],m[1]),(a[0],a[1])>  →  <pc’,i’,(m[0]’,m[1]’),(a[0]’,a[1]’)>
<L0, 0, (0,0), (0,0)>  →  <L1, 1, (0,0), (0,0)>
<L0, 0, (0,0), (0,1)>  →  <L1, 1, (0,0), (0,1)>
<L0, 0, (0,0), (1,0)>  →  <L5, 0, (0,0), (0,0)>
<L0, 0, (0,0), (1,1)>  →  <L5, 0, (0,0), (0,1)>
<L1, 1, (0,0), (0,0)>  →  <L8, 2, (0,0), (0,0)>
<L1, 1, (0,0), (0,1)>  →  <L5, 1, (0,0), (0,0)>
<L1, 1, (0,0), (1,0)>  →  <L8, 2, (0,0), (1,0)>
<L1, 1, (0,0), (1,1)>  →  <L5, 1, (0,0), (1,0)>
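With one lock per slot, a transaction ends just before the second acquire, since a right mover cannot follow a left mover inside one transaction. The Python sketch below reproduces the eight summaries for N = 2 under that assumption. The function `run_transaction` and the state encoding are hypothetical, and the lock vector is left out because every lock is free at both ends of each summary.

```python
from itertools import product

N = 2

def run_transaction(pc, i, available):
    """Run one transaction of getResource() starting at pc (L0 or L1).

    Stops at a return (L5 or L8) or just before the next acquire.
    Returns (pc', i', a').
    """
    a = list(available)
    acquired = False                   # has this transaction taken a lock yet?
    while True:
        if pc == "L0":                 # while (i < N)
            pc = "L1" if i < N else "L8"
        elif pc == "L1":               # acquire(m[i])
            if acquired:               # second acquire: the transaction ends
                return ("L1", i, tuple(a))
            acquired = True
            pc = "L2"
        elif pc == "L2":               # if (available[i])
            pc = "L3" if a[i] else "L6"
        elif pc == "L3":               # available[i] = false; L4: release(m[i])
            a[i] = 0
            pc = "L5"
        elif pc == "L6":               # release(m[i]); L7: i++
            i += 1
            pc = "L0"
        else:                          # L5 or L8: return i
            return (pc, i, tuple(a))

def summaries():
    """Summaries starting at <L0, i=0> and at <L1, i=1>."""
    return {(pc, i, a): run_transaction(pc, i, a)
            for pc, i in (("L0", 0), ("L1", 1))
            for a in product((0, 1), repeat=N)}

for pre, post in summaries().items():
    print(pre, "->", post)
```

Note that four of the summaries start at L1, in the middle of the procedure, which is why summaries cannot be tied to procedure boundaries.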

Two level model checking algorithm: in pictures

Let’s first review the sequential CFL algorithm…

[Figure: control-flow graphs of main( ) and bar( ), in which main( ) calls bar( ).]

Two level model checking algorithm: in pictures

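[Figure: the same control-flow graphs of main( ) and bar( ), now annotated with transactions T1 and T2 of main, the body of bar, and a marker for the end of a transaction.]

Three kinds of summaries:

1. MAX

2. MAXCALL

3. MAXRETURN

[Figure: the paths through main and bar are labeled with MAX, MAXCALL and MAXRETURN summaries.]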

Concurrency + recursion

int g = 0;
mutex m;

void foo(int r) {
L0: if (r == 0) {
L1:     foo(r);
    } else {
L2:     acquire(m);
L3:     g++;
L4:     release(m);
    }
L5: return;
}

void main() {
    int q = choose({0,1});
M0: foo(q);
M1: acquire(m);
M2: assert(g >= 1);
M3: release(m);
M4: return;
}

P = main() || main()

Summaries for foo:
<pc,r,m,g>  →  <pc’,r’,m’,g’>
<L0,1,0,0>  →  <L5,1,0,1>
<L0,1,0,1>  →  <L5,1,0,2>

Summaries for main:
<pc,q,m,g>  →  <pc’,q’,m’,g’>
<M0,1,0,0>  →  <M1,1,0,1>
<M0,1,0,1>  →  <M1,1,0,2>
<M1,1,0,1>  →  <M4,1,0,1>
<M1,1,0,2>  →  <M4,1,0,2>
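The recursive branch of foo shows why summarization terminates where explicit stack exploration would not: foo(0) calls itself with the same input forever, so no summary for r = 0 is ever produced and the summary set stops growing. The Python sketch below computes the summaries of foo as a least fixpoint. It is a simplified illustration that treats the locked increment as one atomic step and considers g in {0, 1} on entry; `foo_summaries` is a hypothetical name, not the algorithm from the talk.

```python
def foo_summaries(g_values=(0, 1)):
    """Least-fixpoint summaries of foo as a set of ((r, g), g_after) pairs."""
    summaries = set()
    changed = True
    while changed:
        changed = False
        for r in (0, 1):
            for g in g_values:
                if r == 0:
                    # L1: foo(r) -- the call can only return through an
                    # existing summary for the same input; none ever appears.
                    results = {out for (inp, out) in summaries if inp == (r, g)}
                else:
                    # L2-L4: acquire(m); g++; release(m) -- one transaction
                    results = {g + 1}
                for out in results:
                    if ((r, g), out) not in summaries:
                        summaries.add(((r, g), out))
                        changed = True
    return summaries

print(sorted(foo_summaries()))  # [((1, 0), 1), ((1, 1), 2)]
```

The two resulting pairs correspond to the two summaries listed for foo on the slide, with the lock state omitted because m is free on entry and on exit.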

What if the same procedure is called from different phases of a transaction?

Instrument the transaction phase into the state of the program

Transactional context

int gm = 0, gn = 0;
mutex m, n;

void foo1() {
L0: acquire(n);
L1: gn++;
L2: bar();
L3: release(n);
L4: return;
}

void foo2() {
M0: acquire(n);
M1: gn++;
M2: release(n);
M3: bar();
M4: return;
}

void bar() {
N0: acquire(m);
N1: gm++;
N2: release(m);
}

P = foo1() || foo2()
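In foo1, bar() is entered while the caller is still in its precommit phase; in foo2 it is entered after release(n), in the postcommit phase, so the acquire inside bar() starts a new transaction. The Python sketch below makes this visible by splitting each thread's sequence of atomicity labels into transactions while tracking the phase. The labels assigned to each statement and the function `split_transactions` are assumptions made for illustration.

```python
def split_transactions(atomicities):
    """Greedily split a thread's atomicity labels into transactions."""
    transactions, current, phase = [], [], "pre"
    for label in atomicities:
        # In the postcommit phase only left movers (L) and both-movers (B)
        # may follow; a right mover or non-mover starts a new transaction.
        if phase == "post" and label in ("R", "N"):
            transactions.append(current)
            current, phase = [], "pre"
        current.append(label)
        # A non-mover or a left mover commits the transaction.
        if label in ("N", "L"):
            phase = "post"
    if current:
        transactions.append(current)
    return transactions

# foo1: acquire(n); gn++; [bar: acquire(m); gm++; release(m)]; release(n)
foo1 = ["R", "B", "R", "B", "L", "L"]
# foo2: acquire(n); gn++; release(n); [bar: acquire(m); gm++; release(m)]
foo2 = ["R", "B", "L", "R", "B", "L"]

print(len(split_transactions(foo1)))  # 1
print(len(split_transactions(foo2)))  # 2
```

The same body of bar() therefore needs a separate summary for each phase it is called from, which is what recording the phase in the program state provides.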

Recap of technical problems

• How do you identify transactions?
  – Using the theory of reduction (Lipton ’75)

• What if transaction boundaries do not coincide with procedure boundaries?
  – Two level model checking algorithm
  – First level maintains stack
  – Second level maintains stack-less summaries

• Procedure can be called from different phases of a transaction
  – Instrument the transaction phase into the state of the program

Termination

• A function is transactional if no transaction ends in the “middle” of its execution (includes all transitive callees)

• Theorem: For concurrent boolean programs, if all recursive functions are transactional, then the algorithm terminates.

Sequential case

• If we feed a sequential program to our algorithm, it functions exactly like the Reps-Horwitz-Sagiv (POPL ‘95) algorithm

• Our algorithm generalizes the RHS algorithm to concurrent programs!

Related work

• Summarizing sequential programs
  – Sharir-Pnueli ‘81, Reps-Horwitz-Sagiv ‘95, Ball-Rajamani ‘00

• Concurrency+Procedures
  – Bouajjani-Esparza-Touili ‘02
  – Esparza-Podelski ‘00

• Reduction
  – Lipton ‘75
  – Qadeer-Flanagan ‘03

(joint work with Tony Andrews)

[Figure: three model checking pipelines, each going from source code through an abstraction to a model checker.
  – Finite state machines: source code is abstracted to an FSM.
  – Push down model (SLAM): a sequential C program (C data structures, pointers, procedure calls, parameter passing, scoping, control flow) is automatically abstracted to a Boolean program; data flow analysis is implemented using BDDs.
  – Zing: device driver (taking concurrency into account), web services code; rich control constructs: thread creation, function call, exception, objects, dynamic allocation. Model checking is undecidable!]

What is Zing?

• Zing is a framework for software model-checking
  – Language, compiler, runtime, tools

• Supports key software concepts
  – Enables easier extraction of models from code

• Supports research in exploring large state spaces

• Operates seamlessly with the VS.Net design environment

Current status

• Summarization:
  – Theory: to appear in POPL 04
  – Implementation: in progress

• Zing:
  – Compiler, model checker and conformance checker operational
  – State-delta and transaction-based reduction implemented
  – Plans:
    • Symbolic reasoning
    • Automatic abstraction

Bluetooth demo

BPEL4WS checking

[Figure: BPEL processes (Buyer, Seller, Auction House, Reg Service), the Zing model, and the Zing State Explorer.]
