Modular Data Structure Verification Viktor Kuncak Supervisor: Martin Rinard Committee members: Arvind, Daniel Jackson
Program analysis and verification
Discover/verify properties of software systems
Practical relevance: programmer productivity
– performance: compiler optimizations
– reliability: discovering and preventing errors
– maintainability: understanding code
Ultimate impact:
– make it easier to produce working software
– create more sophisticated systems
Spectrum of analysis techniques
Broad research area, many dimensions
– bug finding versus bug prevention
– control-intensive versus data-intensive systems
– generic versus application-specific properties
Original ideal was automated full verification
Reality: verify partial correctness properties
– success story: type systems
– active area: temporal properties (typestate)
Trend: towards more complex properties
My research
verifying properties of data structures
Data structure consistency properties
[Figure: a doubly-linked list with next and prev fields starting at root; a binary tree with left and right fields]
– acyclicity of next
– x.next.prev == x
– the graph is a tree
– elements are sorted
The shape is not given by types (e.g. class Node { Node f1, f2; }), but by structural properties, and it may change over time; there is an unbounded number of dynamically allocated objects.
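As a concrete illustration, the two tree properties above can be checked at run time on one particular heap. This is a hedged sketch, not part of Jahob: the nested `Node` class with `left`, `right`, and an `int key` is a hypothetical stand-in for `class Node { Node f1, f2; }`, and testing a single heap is far weaker than the static proof for all executions that the talk aims for.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Run-time checks for two shape properties (illustration only).
final class ShapeCheck {
    // Hypothetical node type: left/right play the roles of f1/f2; key is added for ordering.
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // "the graph is a tree": no node is reached along two different paths
    // (this rules out both sharing and cycles).
    static boolean isTree(Node root) {
        return visit(root, new IdentityHashMap<>());
    }

    private static boolean visit(Node n, Map<Node, Boolean> seen) {
        if (n == null) return true;
        if (seen.put(n, Boolean.TRUE) != null) return false; // node reached a second time
        return visit(n.left, seen) && visit(n.right, seen);
    }

    // "elements are sorted": binary search tree ordering, strictly increasing in-order keys.
    static boolean isSorted(Node root) {
        return within(root, Long.MIN_VALUE, Long.MAX_VALUE);
    }

    private static boolean within(Node n, long lo, long hi) {
        if (n == null) return true;
        if (n.key <= lo || n.key >= hi) return false;
        return within(n.left, lo, n.key) && within(n.right, n.key, hi);
    }

    public static void main(String[] args) {
        Node root = new Node(5);
        root.left = new Node(3);
        root.right = new Node(8);
        System.out.println(isTree(root) + " " + isSorted(root)); // true true
        root.right.left = root.left; // node 3 is now shared: no longer a tree
        System.out.println(isTree(root)); // false
    }
}
```

The identity-based map matters: sharing is a property of object identity, not of `equals`.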
Data structure consistency properties
Examples of internal data structure consistency properties:
[Figure: a list with first and next fields and a size field holding 3; a hash table whose table array holds key/value nodes]
– size: the value of the size field is the number of stored objects (numerical quantities)
– hashCode: each node is stored in the bucket given by the hash of the node's key (dynamically allocated arrays)
– instances do not share the table array
External data structure consistency
[Figure: Simple Library System object model, with a loanedTo relation from Book to Person]
If a book is loaned to a person, then
– the book is in the catalog
– the person is registered with the library
Multiplicities:
– [0..4]: a person can borrow at most 4 books at a time
– [0..1]: a book can be loaned to at most one person at a time
External consistency properties:
– correlate different data structures
– are global
– are meaningful to users of the system
– capture design constraints (object models)
– inconsistency can lead to policy violations
– rely on internal consistency to be meaningful at all
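The external property and the two multiplicities of the simple library system can be phrased as an executable check over an abstract state. A minimal sketch, assuming books and persons are identified by strings and `loanedTo` is a `java.util.Map` from book to person, which makes the [0..1] bound hold by construction; the class name `LibraryCheck` and the constant `MAX_LOANS` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical run-time check of the library's external consistency.
final class LibraryCheck {
    static final int MAX_LOANS = 4; // the [0..4] multiplicity from the object model

    static boolean consistent(Set<String> books, Set<String> persons, Map<String, String> loanedTo) {
        Map<String, Integer> loansPerPerson = new HashMap<>();
        for (Map.Entry<String, String> e : loanedTo.entrySet()) {
            // (b, p) ∈ loanedTo → b ∈ books ∧ p ∈ persons
            if (!books.contains(e.getKey()) || !persons.contains(e.getValue())) return false;
            // count loans per person; more than MAX_LOANS violates [0..4]
            int n = loansPerPerson.merge(e.getValue(), 1, Integer::sum);
            if (n > MAX_LOANS) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> books = Set.of("b1", "b2");
        Set<String> persons = Set.of("alice");
        System.out.println(consistent(books, persons, Map.of("b1", "alice"))); // true
        System.out.println(consistent(books, persons, Map.of("b3", "alice"))); // false: b3 not in catalog
    }
}
```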
Both static and dynamic properties
Invariant properties: talk about a single state
– data structure invariants hold
State change properties: correlate multiple states
– operations have the expected effect
  • the add operation inserts an element into a set
  • removal removes all elements with a given key
  • operations have no unintended side effects
– expected sequencing of operations
  • can remove only after adding elements to the data structure
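State change properties relate the state before and after an operation, so checking one at run time needs a snapshot of the old state. The sketch below, with hypothetical names `PostconditionCheck`, `Pair`, and `removeAll`, checks the "removal removes all elements with a given key, and nothing else changes" property on a set of key/value pairs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical check of a state change property using an explicit "old" snapshot.
final class PostconditionCheck {
    record Pair(String key, String value) {}

    // Operation under test: remove all pairs with the given key.
    static void removeAll(Set<Pair> content, String key) {
        content.removeIf(p -> p.key().equals(key));
    }

    // Checks content = old content - {(k,v). k = key}:
    // exactly the pairs with that key disappear, and nothing else changes.
    static boolean postHolds(Set<Pair> old, Set<Pair> now, String key) {
        Set<Pair> expected = new HashSet<>(old);
        expected.removeIf(p -> p.key().equals(key));
        return expected.equals(now);
    }

    public static void main(String[] args) {
        Set<Pair> content = new HashSet<>(List.of(new Pair("a", "1"), new Pair("a", "2"), new Pair("b", "3")));
        Set<Pair> old = Set.copyOf(content); // snapshot of the pre-state
        removeAll(content, "a");
        System.out.println(postHolds(old, content, "a")); // true
    }
}
```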
Goal
Prove data structure properties
– for all program executions (sound)
– both internal and external consistency
– both invariant and state change properties
– both implementation and use of data structures
– also absence of run-time errors
with a high level of automation
Proving data structure properties
[Diagram: Java source code of a program and data structure properties (Isabelle/HOL) go into an automated verifier, which reports either "program satisfies the properties" or "error in program (or property)!"]
Example property (relation r goes from set A to set B): (x,y) ∈ r → x ∈ A ∧ y ∈ B
Example code:
  ...
  void remove(Node x) {
    Node p = x.prev, n = x.next;
    if (p != null) p.next = n; else root = n;
    if (n != null) n.prev = p;
  }
  ...
Challenges in verifying consistency
– complex heterogeneous data structures, in the context of the application; developer-defined properties
– precision
– no single approach will work
– communication with developers
– scalability
Contributions: Jahob verification system
– front end, verification condition generator (parsing, type checking, intermediate forms, variable dependencies)
– modular verification methodology
– decision procedure dispatcher: a method to deploy multiple reasoning techniques (splitting proof obligations, dispatching each result)
– three complementary reasoning techniques: translation to first-order logic; field constraint analysis; Boolean Algebra with Presburger Arithmetic (BAPA)
Library system example
If a person has borrowed a book, then
– the book is in the catalog
– the person is registered with the library
Written directly over the implementation, the program property is complex:
∀b ∀p. (∃i < M. ∃n. (loanedTable[i], n) ∈ next* ∧ n.key = b ∧ n.value = p) → (bookTreeRoot, b) ∈ (left ∪ right)* ∧ (personListRoot, p) ∈ next*
Isolate data structure complexity into separate Java classes (TreeCollection, Map, and ListCollection implementations). Then verify:
1. properties hold for the simplified system with sets and relations, where the property becomes ∀b ∀p. (b, p) ∈ loanedTo → b ∈ Book ∧ p ∈ Person
2. classes correctly implement sets and relations
1. Verifying high-level properties
class LibrarySystem {
  ListCollection persons;
  TreeCollection books;
  Map loanedTo;
  public void decomissionBook(Book b) {
    books.remove(b);
    loanedTo.remove(b);
  }
}
class TreeCollection {
  specvar tcontent :: obj set;
  public void remove(Object x)
    ensures tcontent = old tcontent - {x}
}
class Map {
  specvar mcontent :: (obj*obj) set;
  public void remove(Object key)
    ensures mcontent = old mcontent - {(k,v). k = key}
}
Property to verify: ∀b ∀p. (b, p) ∈ loanedTo → b ∈ Book ∧ p ∈ Person
2. Verifying Map implementation
class Map { // Implemented as a hash table
  private AssocList[] table;
  public specvar mcontent :: "(obj*obj) set";
  invariant contentDef: mcontent = {(k,v). ∃i < M. (k,v) ∈ table[i].acontent}
  invariant correctBucket: ∀k v. ∀i < M. (k,v) ∈ table[i].acontent → hash k M = i
  public void remove(Object key)
    requires key ≠ null
    ensures mcontent = old mcontent - {(k,v). k = key}
  {
    int hash = compute_hash(key);
    table[hash] = AssocList.removeAll(key, table[hash]);
    mcontent := old mcontent - {(k,v). k = key};
  }
  ...
2. Verifying association list implementation
class AssocList { // Functional linked list
  private Object key, data;
  private AssocList next;
  specvar acontent :: (obj*obj) set;
  invariant contentDef2: this ≠ null → acontent = {(key, data)} ∪ next.acontent
  static AssocList removeAll(Object key, AssocList list)
    requires key ≠ null
    modifies content
    ensures result.acontent = list.acontent - {(k,v). k = key}
  {
    if (list == null) return null;
    if (key == list.key) return removeAll(key, list.next);
    else return cons(list.key, list.data, removeAll(key, list.next));
  }
}
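For readers who want to run the association list, here is a plain Java version of `removeAll` together with an executable stand-in for the `acontent` specification variable. It is a sketch under stated assumptions: `null` is the empty list, keys are compared with `equals` rather than `==`, and `acontent` is computed by traversal (mirroring invariant `contentDef2`) instead of being a ghost variable.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Functional (immutable) association list; null represents the empty list.
final class AssocList {
    final Object key, data;
    final AssocList next;

    private AssocList(Object key, Object data, AssocList next) {
        this.key = key;
        this.data = data;
        this.next = next;
    }

    static AssocList cons(Object key, Object data, AssocList next) {
        return new AssocList(key, data, next);
    }

    // Executable version of acontent: {(key, data)} ∪ next.acontent, with the empty list giving {}.
    static Set<List<Object>> acontent(AssocList list) {
        Set<List<Object>> s = new HashSet<>();
        for (AssocList p = list; p != null; p = p.next) s.add(Arrays.asList(p.key, p.data));
        return s;
    }

    // Returns a new list without pairs whose key equals `key`; the input list is not mutated.
    static AssocList removeAll(Object key, AssocList list) {
        if (list == null) return null;
        if (key.equals(list.key)) return removeAll(key, list.next);
        return cons(list.key, list.data, removeAll(key, list.next));
    }

    public static void main(String[] args) {
        AssocList l = cons("a", 1, cons("b", 2, cons("a", 3, null)));
        AssocList r = removeAll("a", l);
        System.out.println(acontent(r));        // [[b, 2]]
        System.out.println(acontent(l).size()); // 3: the input list is unchanged
    }
}
```

Because the list is functional, `removeAll` shares no mutation with its input, which is what lets the postcondition talk about `list.acontent` after the call.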
Modular verification summary
[Figure: the library example uses the TreeCollection, Map, and ListCollection implementations; the Map implementation uses the association list implementation]
Key benefits of modular verification
– each individual verification task is simpler
– verification results for collections and maps are reusable (repositories of verified data structures)
Jahob verification system: front end and verification condition generator (parsing, type checking, intermediate forms, variable dependencies)
Reducing verification to validity of formulas
annotated code → verification condition → formula validity checker
– valid: program satisfies properties
– invalid: error in program (or property)!
Verification condition (VC): a logical formula saying "If the precondition holds at entry, then the postcondition holds in the final state, invariants are preserved, and there are no run-time errors"
Formula validity checking in Jahob
The decision procedure dispatcher acts as the formula validity checker: it splits proof obligations, approximates HOL formulas, and dispatches each result to one of the reasoning techniques (translation to first-order logic, field constraint analysis, BAPA).
What do verification conditions look like?
Annotated code:
invariant ∀b. b ∈ books → b ≠ null
invariant ∀b p. (b,p) ∈ loanedTo → b ∈ books ∧ p ∈ persons
public void decomissionBook(Book b1)
  requires b1 ∈ books
{
  books.remove(b1);
  loanedTo.remove(b1);
}
Verification condition (an Isabelle formula):
(∀b. b ∈ books → b ≠ null) ∧
(∀b p. (b,p) ∈ loanedTo → b ∈ books ∧ p ∈ persons) ∧
b1 ∈ books →
  (books1 = books - {b1} →
    b1 ≠ null ∧
    (loanedTo1 = loanedTo - {(b,p). b = b1} →
      (∀b. b ∈ books1 → b ≠ null) ∧
      (∀b p. (b,p) ∈ loanedTo1 → b ∈ books1 ∧ p ∈ persons)))
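Before any prover is involved, one can gain some confidence in a VC like the one above by exhaustively evaluating it over a tiny universe, in the style of bounded model finders. This is a hypothetical sketch, not a Jahob component: it assumes a 3-object universe in which object 0 plays the role of `null` and encodes `books`, `persons`, and `loanedTo` as bitmasks. Finding no counterexample here is evidence, not a proof of validity.

```java
// Bounded, exhaustive evaluation of the decomissionBook VC (illustration only).
// Assumptions: objects are 0..N-1, object 0 stands for null, sets are bitmasks,
// and pair (b, p) of loanedTo is bit b * N + p.
final class BoundedVcCheck {
    static final int N = 3;

    static boolean in(int mask, int x) { return ((mask >> x) & 1) != 0; }

    // Every pair in rel has its book in bookSet and its person in persons.
    static boolean relOk(int rel, int bookSet, int persons) {
        for (int b = 0; b < N; b++)
            for (int p = 0; p < N; p++)
                if (in(rel, b * N + p) && !(in(bookSet, b) && in(persons, p))) return false;
        return true;
    }

    static boolean vcHolds(int books, int persons, int loanedTo, int b1) {
        boolean pre = !in(books, 0) && relOk(loanedTo, books, persons) && in(books, b1);
        if (!pre) return true; // implication with a false antecedent
        int books1 = books & ~(1 << b1);                                    // books - {b1}
        int loanedTo1 = loanedTo;
        for (int p = 0; p < N; p++) loanedTo1 &= ~(1 << (b1 * N + p));     // drop pairs with b = b1
        return b1 != 0 && !in(books1, 0) && relOk(loanedTo1, books1, persons);
    }

    // Number of states in which the VC is false (0 means no counterexample in this bound).
    static int countCounterexamples() {
        int bad = 0;
        for (int books = 0; books < (1 << N); books++)
            for (int persons = 0; persons < (1 << N); persons++)
                for (int rel = 0; rel < (1 << (N * N)); rel++)
                    for (int b1 = 0; b1 < N; b1++)
                        if (!vcHolds(books, persons, rel, b1)) bad++;
        return bad;
    }

    public static void main(String[] args) {
        System.out.println("counterexamples: " + countCounterexamples()); // counterexamples: 0
    }
}
```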
Interactively proving VCs in Isabelle
lemma verificationCondition: "(∀b. b ∈ books → b ≠ null) ∧ (∀b p. (b,p) ∈ loanedTo → b ∈ books ∧ p ∈ persons) ∧ b1 ∈ books → (books1 = books - {b1} → b1 ≠ null ∧ (loanedTo1 = loanedTo - {(b,p). b = b1} → (∀b. b ∈ books1 → b ≠ null) ∧ (∀b p. (b,p) ∈ loanedTo1 → b ∈ books1 ∧ p ∈ persons)))"
apply (rule_tac impI)
apply (rule_tac impI)
apply (rule_tac conjI)
...
done
Interactive = the user supplies a proof script
Isabelle checks the manually supplied proof
Automation is limited for larger formulas
Can we check VCs with more automation?
Splitting the verification condition (an Isabelle formula) into conjuncts yields sequents 1 to 4, each of the form A1 ∧ ... ∧ An → G. For example, sequent 3 is:
(∀b. b ∈ books → b ≠ null) ∧ (∀b p. (b,p) ∈ loanedTo → b ∈ books ∧ p ∈ persons) ∧ b1 ∈ books ∧ books1 = books - {b1} ∧ loanedTo1 = loanedTo - {(b,p). b = b1} ∧ (b0,p0) ∈ loanedTo1 → b0 ∈ books1
[Figure: multiple reasoning techniques RT1 to RT4; each sequent S1 to S4 is dispatched to a reasoning technique, which reports it valid]
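The splitting step can be sketched on a toy formula language with atoms, conjunction, and implication. Assuming the two rules shown in the comments (they match the shape of the sequents above but are not claimed to be Jahob's exact rule set), a goal is split until it is neither a conjunction nor an implication, and hypotheses are flattened into the assumption list. All type names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy splitter turning one formula into sequents "assumptions -> goal".
final class Splitter {
    sealed interface F permits Atom, And, Imp {}
    record Atom(String name) implements F {
        @Override public String toString() { return name; }
    }
    record And(F left, F right) implements F {}
    record Imp(F hyp, F concl) implements F {}
    record Sequent(List<F> assumptions, F goal) {
        @Override public String toString() { return assumptions + " -> " + goal; }
    }

    // Adds the conjuncts of f to the assumption list, flattening nested conjunctions.
    private static void addConjuncts(F f, List<F> out) {
        if (f instanceof And a) { addConjuncts(a.left(), out); addConjuncts(a.right(), out); }
        else out.add(f);
    }

    //   A -> (B and C)  becomes  A -> B  and  A -> C
    //   A -> (B -> C)   becomes  (A and B) -> C
    static void split(List<F> assumptions, F goal, List<Sequent> out) {
        if (goal instanceof And a) {
            split(assumptions, a.left(), out);
            split(assumptions, a.right(), out);
        } else if (goal instanceof Imp i) {
            List<F> extended = new ArrayList<>(assumptions);
            addConjuncts(i.hyp(), extended);
            split(extended, i.concl(), out);
        } else {
            out.add(new Sequent(List.copyOf(assumptions), goal));
        }
    }

    static List<Sequent> split(F vc) {
        List<Sequent> out = new ArrayList<>();
        split(List.of(), vc, out);
        return out;
    }

    public static void main(String[] args) {
        // Shape of a VC: (I1 and I2 and Pre) -> (B1 -> (N and (L1 -> (J1 and J2))))
        F vc = new Imp(new And(new Atom("I1"), new And(new Atom("I2"), new Atom("Pre"))),
                new Imp(new Atom("B1"),
                        new And(new Atom("N"),
                                new Imp(new Atom("L1"), new And(new Atom("J1"), new Atom("J2"))))));
        for (Sequent s : split(vc)) System.out.println(s);
    }
}
```

Each resulting sequent is small and can be sent to whichever reasoning technique handles its atoms.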
Constructing a reasoning technique
How can a specialized technique accept Isabelle formulas?
– a sequent A1 ∧ ... ∧ An → G is an Isabelle formula and belongs to an undecidable class
– a specialized algorithm expects as input, e.g., a formula in a decidable (or otherwise "easier") class
Jahob reasoning technique = formula approximation + specialized algorithm: soundly approximate the formula with a simpler formula, e.g. A1' ∧ A3 → G'; if the approximation is valid, then the original sequent is valid
Range of sound approximations
– Worst: α(F) = False (useless)
– Best: α(F) = if "F is valid" then True else False (impossible)
General idea of our approximations:
α(F) = α_1(simplify(F))
α_p(F1 ∧ F2) = α_p(F1) ∧ α_p(F2)
α_p(F1 ∨ F2) = α_p(F1) ∨ α_p(F2)
α_p(¬F) = ¬α_{¬p}(F)
α_p(goodF) = translation of goodF
α_1(badF) = False
α_0(badF) = True
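The polarity-based approximation above can be written directly as a recursive function. In this sketch, `Good` atoms are ones the target prover accepts (their "translation" is the identity), `Bad` atoms are replaced by `False` in positive and `True` in negative positions, and a small constant-folding `simplify` is applied afterwards to make the result readable; the slide's simplify step before approximation is not modeled, and all names are hypothetical. A sequent A1 ∧ ... ∧ An → G is encoded as ¬(A1 ∧ ... ∧ An) ∨ G, so an unsupported assumption ends up in negative position and is dropped.

```java
// Sketch of polarity-based sound approximation (illustration only).
final class Approx {
    sealed interface F permits Good, Bad, Const, And, Or, Not {}
    record Good(String text) implements F {}   // atom the target prover understands
    record Bad(String text) implements F {}    // atom outside the target class
    record Const(boolean value) implements F {}
    record And(F l, F r) implements F {}
    record Or(F l, F r) implements F {}
    record Not(F f) implements F {}

    // positive == true is polarity 1 (replace by a stronger formula),
    // positive == false is polarity 0 (replace by a weaker formula).
    static F approx(F f, boolean positive) {
        if (f instanceof Good g) return g;                 // translation is the identity here
        if (f instanceof Bad) return new Const(!positive); // α_1(bad) = False, α_0(bad) = True
        if (f instanceof Const c) return c;
        if (f instanceof And a) return new And(approx(a.l(), positive), approx(a.r(), positive));
        if (f instanceof Or o) return new Or(approx(o.l(), positive), approx(o.r(), positive));
        if (f instanceof Not n) return new Not(approx(n.f(), !positive)); // negation flips polarity
        throw new IllegalArgumentException("unknown formula: " + f);
    }

    // Constant folding for readability.
    static F simplify(F f) {
        if (f instanceof And a) {
            F l = simplify(a.l()), r = simplify(a.r());
            if (l instanceof Const c1) return c1.value() ? r : l;
            if (r instanceof Const c2) return c2.value() ? l : r;
            return new And(l, r);
        }
        if (f instanceof Or o) {
            F l = simplify(o.l()), r = simplify(o.r());
            if (l instanceof Const c1) return c1.value() ? l : r;
            if (r instanceof Const c2) return c2.value() ? r : l;
            return new Or(l, r);
        }
        if (f instanceof Not n) {
            F g = simplify(n.f());
            if (g instanceof Const c) return new Const(!c.value());
            return new Not(g);
        }
        return f;
    }

    public static void main(String[] args) {
        // (x in books and card(S) = 2) -> x != null, with the cardinality atom unsupported
        F sequent = new Or(new Not(new And(new Good("x in books"), new Bad("card(S) = 2"))),
                new Good("x != null"));
        System.out.println(simplify(approx(sequent, true))); // the unsupported assumption is dropped
    }
}
```

If the approximated sequent is valid, so is the original, because each replacement only strengthens the formula in the direction that matters for validity.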
Jahob verification system: each reasoning technique is backed by a prover
– translation to first-order logic → first-order theorem prover
– field constraint analysis → MONA decision procedure
– BAPA → Presburger arithmetic decision procedure
– remaining obligations → Isabelle
(joint work with Charles Bouillaguet and Thomas Wies)
Translation to first-order logic
Motivation: FOL provers are effective and fully automated
– decades of research in resolution, paramodulation
– solved open problems (e.g. axiomatization of BAs)
Approach: approximate HOL by FOL
– substitute, beta-reduce definitions
– sets and relations become predicates
– flattening, function updates
– eliminate tuples
– linear arithmetic axioms
– approximate otherwise: avoid full encoding (using combinators S, K, or encoding set theory)
Encoding types
Translated formulas have two types: obj, int
Input to resolution-based provers is untyped!
Standard solution: types as unary predicates
– makes formulas larger, provers much slower
Faster solution: omit them!
– not sound in general
Theorem: omitting types is sound if
– sorts are disjoint, and
– sorts have equal cardinality
Orders of magnitude speedup
Results obtained using first-order provers
Instantiable set and relation implementations:
– Hash table (120 sec)
– Association list (12 sec)
– Functional sorted binary search tree (178 sec)
– Imperative list (18 sec)
Library example (20 sec)
Hash table insertion
public void add(Object key, Object value) ... {
  int hash = compute_hash(key);
  table[hash] = AssocList.cons(key, value, table[hash]);
  mcontent := (old mcontent) ∪ {(key, value)};
  if (size > (4 * table.length) / 5) rehash(table.length + table.length);
}
public void rehash(int m) ... ensures "mcontent = old mcontent"
{
  AssocList[] t = table;
  init(m);
  rehash_aux(0, t);
}
private void rehash_aux(int i, AssocList[] t) ... {
  addAll(t[i]);
  int j = i + 1;
  if (j < t.length) rehash_aux(j, t);
}
public void addAll(AssocList pairs) ... {
  AssocList lst = pairs;
  while inv "..." (!AssocList.is_nil(lst)) {
    Pair p = AssocList.getOne(lst);
    lst = AssocList.remove(p.key, p.value, lst);
    add(p.key, p.value);
  }
}
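The hash table above can be exercised outside the verifier. The following sketch assumes immutable buckets (a `Bucket` record standing in for `AssocList`), `Math.floorMod(k.hashCode(), m)` in place of `compute_hash`, and the same growth rule as `add`. The executable `mcontent()` and `correctBucket()` methods mirror the invariants `contentDef` and `correctBucket`, and `rehash` loops over the old table in place of the recursive `rehash_aux`/`addAll` pair. All names besides those on the slides are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Runnable sketch of a hash table with functional buckets (illustration only).
final class HashTableSketch {
    record Bucket(Object key, Object value, Bucket next) {} // immutable list cell; null = empty

    private Bucket[] table = new Bucket[4];
    private int size = 0;

    static int hash(Object k, int m) { return Math.floorMod(k.hashCode(), m); }

    void add(Object key, Object value) {
        int h = hash(key, table.length);
        table[h] = new Bucket(key, value, table[h]);
        size++;
        if (size > (4 * table.length) / 5) rehash(table.length + table.length);
    }

    // ensures mcontent() equals its value before the call
    void rehash(int m) {
        Bucket[] old = table;
        table = new Bucket[m];
        size = 0;
        for (Bucket b : old)
            for (Bucket p = b; p != null; p = p.next()) add(p.key(), p.value());
    }

    void remove(Object key) {
        int h = hash(key, table.length);
        Bucket updated = removeAll(key, table[h]);
        size -= length(table[h]) - length(updated);
        table[h] = updated;
    }

    static Bucket removeAll(Object key, Bucket l) {
        if (l == null) return null;
        Bucket rest = removeAll(key, l.next());
        return key.equals(l.key()) ? rest : new Bucket(l.key(), l.value(), rest);
    }

    static int length(Bucket l) {
        int n = 0;
        for (; l != null; l = l.next()) n++;
        return n;
    }

    // contentDef: mcontent = {(k,v). ∃i < M. (k,v) ∈ table[i].acontent}
    Set<List<Object>> mcontent() {
        Set<List<Object>> s = new HashSet<>();
        for (Bucket b : table)
            for (Bucket p = b; p != null; p = p.next()) s.add(Arrays.asList(p.key(), p.value()));
        return s;
    }

    // correctBucket: every pair stored in table[i] has hash(k, M) = i
    boolean correctBucket() {
        for (int i = 0; i < table.length; i++)
            for (Bucket p = table[i]; p != null; p = p.next())
                if (hash(p.key(), table.length) != i) return false;
        return true;
    }

    public static void main(String[] args) {
        HashTableSketch t = new HashTableSketch();
        for (int i = 0; i < 10; i++) t.add("k" + i, i); // triggers automatic rehashing
        Set<List<Object>> before = t.mcontent();
        t.rehash(64);
        System.out.println(t.mcontent().equals(before) && t.correctBucket()); // true
    }
}
```

Note that `correctBucket` is exactly what makes `remove` correct: it only inspects `table[hash(key)]`, so a pair stored in any other bucket would silently survive removal.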
Verifying imperative lists
private Node first;
private ghost specvar con :: obj set;
public specvar lcontent :: obj set;
vardefs lcontent = first.con;
invariant this ≠ null → con = {data} ∪ next.con ∧ data ∉ next.con;
public void remove(Object x)
  modifies lcontent
  ensures lcontent = old lcontent - {x}
[Figure: list first → 1 → 2 → 3 → 4 along next; the con fields hold {1,2,3,4}, {2,3,4}, {3,4}, {4}; removing x = 3]
– during the search, the invariant defining con is temporarily violated
– the loop searching for 3 must also remove 3 from the con fields of preceding nodes
– what we really want is something that can express reachability
Jahob verification system: next, reachability with field constraint analysis and the MONA decision procedure
Imperative list using reachability
private static Node first;
public static specvar content :: obj set;
vardefs content == {x. x ≠ null ∧ (first, x) ∈ {(a,b). b = next a}*}
invariant tree [next]
invariant ∀x y. prev x = y → next y = x   (almost)
public void remove(Node n)
  requires n ∈ content
  modifies content
  ensures content = old content - {n}
{
  if (n == first) first = first.next; else n.prev.next = n.next;
  if (n.next != null) n.next.prev = n.prev;
  n.next = null;
  n.prev = null;
}
– content is a dependent variable: no need to update it in remove
– reachability is expressed directly, not using induction
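The reachability-based list can also be run directly. In this sketch (a hypothetical instance-based variant of the static list above, with an added `add` method for building test lists), `content()` recomputes the set of nodes reachable from `first`, so, as on the slide, `remove` never updates it explicitly; `prevConsistent()` checks the prev/next constraint along the list, with `first.prev == null` as the "almost" exception.

```java
import java.util.HashSet;
import java.util.Set;

// Doubly-linked list whose content is defined by reachability (illustration only).
final class DList {
    static final class Node {
        final Object data;
        Node next, prev;
        Node(Object data) { this.data = data; }
    }

    Node first;

    // content = {x. x ≠ null ∧ (first, x) ∈ next*}: the nodes reachable from first
    Set<Node> content() {
        Set<Node> s = new HashSet<>(); // Node does not override equals, so this is identity-based
        for (Node n = first; n != null; n = n.next) {
            if (!s.add(n)) break; // stop on a cycle (tree[next] rules this out)
        }
        return s;
    }

    void add(Node n) {
        n.next = first;
        n.prev = null;
        if (first != null) first.prev = n;
        first = n;
    }

    // requires n ∈ content(); afterwards content() = old content() - {n}
    void remove(Node n) {
        if (n == first) first = first.next; else n.prev.next = n.next;
        if (n.next != null) n.next.prev = n.prev;
        n.next = null;
        n.prev = null;
    }

    // y.prev = x implies x.next = y for every list node y; first.prev must be null.
    // Assumes next is acyclic.
    boolean prevConsistent() {
        for (Node y = first; y != null; y = y.next) {
            Node x = y.prev;
            if (y == first ? x != null : (x == null || x.next != y)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        DList l = new DList();
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        l.add(c); l.add(b); l.add(a); // list: a, b, c
        Set<Node> expected = new HashSet<>(l.content());
        l.remove(b);
        expected.remove(b);
        System.out.println(l.content().equals(expected) && l.prevConsistent()); // true
    }
}
```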
Proving formulas with reachability
Reachability properties in trees are decidable
– Monadic Second-Order Logic over Trees
– existing MONA decision procedure
  • constructs a tree automaton for each formula
  • checks emptiness of the language of the automaton
Using a simple MONA approximation:
– can analyze list and tree (left/right) implementations
– but not doubly-linked lists or trees with parent pointers
Field constraint analysis
Enables reasoning about non-tree fields
Can handle a broader class of data structures
– doubly-linked lists, trees with parent pointers
– skip lists
[Figure: a tree backbone of next fields, plus constrained prev fields]
Constrained fields satisfy the constraint invariant: ∀x y. prev y = x → next x = y
Elimination of constrained fields
[Figure: field constraint analysis (VMCAI'06) turns VC1(next, prev), over the tree backbone and constrained fields, into VC2(next), which is checked by MONA]
– soundness: if VC2 is valid, then VC1 is valid
– completeness: if VC2 is invalid, then VC1 is invalid (for a useful class of formulas, including preservation of field constraints)
Key step: substitute (prev a = b) with (next b = a)
Field constraints: a comparison
[Figure: a list with tree backbone next and a constrained field nextSub that skips over nodes]
Constrained fields satisfy the constraint invariant: ∀x y. nextSub x = y → next+ x y
– previous approaches: the constraining formula must be deterministic
– we allow arbitrary constraint formulas: fields need not be uniquely given by the backbone
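To make the difference concrete, here is a run-time check of the skip list constraint nextSub x = y → next+ x y. The node class and method names are hypothetical, and only non-null `nextSub` values are checked. The example illustrates that the constraint admits several values of `nextSub` for the same backbone, which is exactly the non-deterministic case that earlier approaches excluded.

```java
// Run-time check of a non-deterministic field constraint (illustration only).
final class SkipConstraint {
    static final class Node {
        final int key;
        Node next;    // tree backbone field
        Node nextSub; // constrained field: may skip over several nodes
        Node(int key) { this.key = key; }
    }

    // (x, y) ∈ next+: y is reachable from x in one or more next steps (assumes next is acyclic)
    static boolean reachablePlus(Node x, Node y) {
        for (Node n = x.next; n != null; n = n.next) {
            if (n == y) return true;
        }
        return false;
    }

    // ∀x y. nextSub x = y → next+ x y, checked for every backbone node with a non-null nextSub
    static boolean constraintHolds(Node first) {
        for (Node x = first; x != null; x = x.next) {
            if (x.nextSub != null && !reachablePlus(x, x.nextSub)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Node n1 = new Node(1), n2 = new Node(2), n3 = new Node(3), n4 = new Node(4);
        n1.next = n2; n2.next = n3; n3.next = n4;
        n1.nextSub = n3; // skips n2; n2 or n4 would satisfy the constraint equally well
        n3.nextSub = n4;
        System.out.println(constraintHolds(n1)); // true
        n3.nextSub = n1; // points backwards along the backbone
        System.out.println(constraintHolds(n1)); // false
    }
}
```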
Field constraint analysis results
Results within Jahob
– lists
– trees with parent pointer (insertion)
– two-level skip list
Proved sound and complete*
High automation level
– no need for specification variable updates
Symbolic shape analysis (Thomas Wies)
– infers loop invariants
Jahob verification system: next, Boolean Algebra with Presburger Arithmetic (BAPA) and the Presburger arithmetic decision procedure
BAPA: Sets with cardinality bounds
Imposing constraints on abstract content:
– card(content) = size: the size field is consistent with the number of stored objects
– 2·card(circulatedBooks) ≤ card(books)
[Figure: a list with first and next fields and a size field holding 3]
Boolean Algebra with Presburger Arithmetic
Not widely known, but a natural extension of BAs
I gave the first complexity bound (CADE'05, JAR)
– quantifier elimination algorithm (as in LICS'03)
S ::= V | S1 ∪ S2 | S1 ∩ S2 | S1 \ S2
T ::= k | C | T1 + T2 | T1 - T2 | C·T | card(S)
A ::= S1 = S2 | S1 ⊆ S2 | T1 = T2 | T1 < T2
F ::= A | F1 ∧ F2 | F1 ∨ F2 | ¬F | ∃S.F | ∃k.F
From BAPA to PA
If A, B are disjoint, then |A ∪ B| = |A| + |B|
Make them disjoint: Venn diagram
– reduce set variables to integer variables
– for quantifiers, use quantifier elimination
– preserves alternations: complexity same as for PA
[Figure: Venn diagram of sets x, y, z with eight regions numbered 1 to 8; e.g. region 4 has size |xᶜ ∩ y ∩ zᶜ|]
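The Venn-region step can be illustrated numerically: once every element is classified by its membership pattern, the cardinality of any Boolean combination of sets is a sum of region sizes, which is what lets set variables be replaced by integer variables. This sketch computes the regions from concrete sets, whereas the actual reduction treats region sizes as unknown integers; the class and method names are hypothetical, and bit i of a pattern stands for membership in the i-th set.

```java
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

// Venn regions of n sets and cardinalities as sums of regions (illustration only).
final class VennRegions {
    // r[m] = number of elements whose membership pattern is m
    // (bit i of m is set iff the element belongs to sets.get(i)).
    static int[] regions(List<Set<Integer>> sets, Set<Integer> universe) {
        int[] r = new int[1 << sets.size()];
        for (int e : universe) {
            int m = 0;
            for (int i = 0; i < sets.size(); i++) {
                if (sets.get(i).contains(e)) m |= 1 << i;
            }
            r[m]++;
        }
        return r;
    }

    // card(S) for a Boolean combination S, given as a predicate on membership patterns.
    static int card(int[] r, IntPredicate inS) {
        int sum = 0;
        for (int m = 0; m < r.length; m++) {
            if (inS.test(m)) sum += r[m];
        }
        return sum;
    }

    public static void main(String[] args) {
        Set<Integer> universe = Set.of(1, 2, 3, 4, 5, 6, 7, 8);
        Set<Integer> x = Set.of(1, 2, 3), y = Set.of(3, 4, 5), z = Set.of(5, 6);
        int[] r = regions(List.of(x, y, z), universe);
        // |x ∪ y|: regions where the x bit (1) or the y bit (2) is set
        System.out.println(card(r, m -> (m & 0b011) != 0)); // 5
        // |xᶜ ∩ y ∩ zᶜ|: the single region with pattern x=0, y=1, z=0
        System.out.println(card(r, m -> m == 0b010)); // 1
    }
}
```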
Quantifier-free BAPA
Previous technique gives NEXPTIME
We show it can be done in PSPACE:
– analyze the resulting integer linear equations
  • exponentially many variables
  • polynomially many equations
– small model property: solutions are singly exponential
– guess sizes of sets
– use an alternating PTIME algorithm to check them
Real-valued relaxation is NP-complete
– |x| + |y| = |x\y| + |x\y| + 1: satisfiable in relaxation
Summary of BAPA results
Application within Jahob
– verified updates to size field
– library example: at most ½ of the books in circulation
Observations
– clarified that the problem is not undecidable (!)
– first formalization of the algorithm
– showed complexity identical to PA
– improved the QFBAPA bound from NEXPTIME to PSPACE
– QFBAPA fragments in P (with Bruno Marnette)
– real-valued version of QFBAPA is NP-complete
There is more to the Jahob verification system
– front end, verification condition generator
– reasoning techniques: translation to first-order logic, field constraint analysis, BAPA
– provers: first-order theorem prover, MONA decision procedure, Presburger arithmetic decision procedure, Isabelle, Coq, CVC Lite, Omega
– additional analyses: symbolic shape analysis, syntactic loop invariant inference
– contributors: Karen Zee, Thomas Wies, Charles Bouillaguet, Huu Hai Nguyen
Synergy of reasoning techniques
[Figure: the Map, ListCollection, and association list implementations and the library example, verified using a combination of translation to first-order logic, BAPA, and field constraint analysis]
How Jahob addresses challenges
Challenges:
– complex heterogeneous data structures, in the context of the application; developer-defined properties
– precision
– no single approach will work
– communication with developers
– scalability
Jahob's approach:
– modular verification
– multiple reasoning techniques
– Isabelle as specification language
– reduce verification to formulas in logic
Verified data structures
– lists implementing sets and relations
– trees implementing sets and relations
– list with a cursor (simplified iterator)
– hash table
– two-level skip list
– insertion sort
– library benchmark
– in progress: small game; part of a file system
Future work
– case studies
– methodology for encapsulation
– inference of specifications, specialized analyses
– new specification annotations and their power
– design an appropriate specification language
– finer-grained combination techniques
– executing and under-approximating formulas
  • counterexamples for formulas (FSE'05)
  • testing, run-time checking of specifications
  • efficient execution of declarative specifications
Related work: program verification systems
– King'70, Deutsch'73, Suzuki'73, Nelson'81, Guttag, Horning'93
– Good, Akers, Smith'86: Gypsy
– Jones'86: VDM
– Abrial, Lee, Neilson, Scharbach, Soerensen'91: B method
– Owre, Shankar, Rushby, Stringer-Calvert: PVS
– Ahrendt, Baar, Beckert, Giese, Habermalz, Haehnle, Menzel, Schmitt'00: KeY
– Foulger, King'01: SPARK Ada
– Flanagan, Leino, Lillibridge, Nelson, Saxe, Stata'02: ESC/Java
– Marche, Paulin-Mohring, Urbain'03: Krakatoa
– Breunesse, Poll'05: model fields in JML
– Barnett, DeLine, Jacobs, Fähndrich, Leino, Schulte, Venter'05: Spec#
– Leino, Mueller'06: model fields in Spec#
Related work: shape analysis
– Larus, Hilfinger'88: detecting conflicts in memory accesses
– Hendren, Nicolau'90: parallelization, connection analysis
– Chase, Wegman, Zadeck'90: allocation-site model
– Klarlund, Schwartzbach'93: graph types
– Deutsch'94: symbolic bounds on paths
– Fradet, Metayer'97: graph grammars
– Sagiv, Reps, Wilhelm'99: 3-valued framework
– Lev-Ami, Sagiv'00: TVLA implementation
– Moeller, Schwartzbach'01: PALE based on MONA
– Yorsh, Reps, Sagiv'04: assume/guarantee reasoning for 3VL
– McPeak, Necula'05: local pointer properties
– Rugina, Hackett'05: region-based
– Lee, Yang, Yi'05: combining three-valued and grammar-based
– separation logic
Related work: type systems
– Freeman, Pfenning'91: refinement types
– Xi, Pfenning'99: dependent ML; Xi: ATS
– Nguyen, David, Qin, Chin'06: size, shape, bag properties
Bug finding
– Jackson, Vaziri'00; Dennis, Chang, Jackson'06: finding errors using constraint solving
– Xie, Aiken'05: Saturn, low-level errors
– Evans'94: LCLint
– Boyapati, Khurshid, Marinov'02: imperative specifications
– Sen, Marinov, Agha: symbolic execution and random testing
Related work: decision procedures and theorem provers
– Barrett, Berezin'04: CVC Lite
– Detlefs, Nelson, Saxe'03: Simplify
– Ball, Lahiri, Musuvathi'05: Zap
– Thatcher, Wright'68: MSOL over finite trees
– Klarlund, Moeller, Schwartzbach'00: MONA
– Yorsh, Rabinovich, Sagiv, Meyer, Bouajjani'06: reachability logic
– BAPA: Feferman, Vaught'59; Zarba'04, '05
– Voronkov'95: Vampire; Weidenbach'01: Spass; Schulz'02: E
– Gordon'85: HOL; Pfenning'91: LF; Coquand, Huet'85: Coq
– Constable, Allen, Bromley, Cleaveland, Cremer, Harper, Howe, Knoblock, Mendler, Panangaden, Sasaki, Smith'86: NuPRL
– Kaufmann, Manolios, Moore'00: ACL2
– Nipkow, Paulson, Wenzel'02: Isabelle
– Translations: Meng, Paulson'06
What I did
Designed and built (with colleagues) Jahob, a new data structure verification system
– modular verification with specification variables
– addressed a key technical problem: proving validity of expressive formulas
– combination technique: split, approximate, decide
– three reasoning techniques
  • translation to first-order logic
  • field constraint analysis
  • Boolean Algebra with Presburger Arithmetic
– verified: lists, hash tables, trees, client examples
Bottom line
Can have verified data structures
– individual data structures
– correlated uses of multiple data structures