Synthesizing Data-Structure Manipulations with Natural Proofs
Xiaokang Qiu (Joint work with Armando Solar-Lezama)
Building Reliable Software
[Diagram: Program Synthesis, Program Verification, Constraint Solving. Labels: (precondition, postcondition); (loop invariants, lemmas, etc.); (Verification Condition); Counter-example? (yes / no)]
Synthesizing Provably-Correct Data-Structure Manipulations?
[Figure: a linked list of three nodes (PID: 12, PID: 30, PID: 19), each with fields Start Addr, End Addr, Next, and Prev]
Key Challenges:
• The synthesizer is usually verification-agnostic (Storyboard, Sketch, etc.)
• Synthesizing loop invariants, ranking functions, etc., is hard (Verifast, Bedrock, HIP/SLEEK, Dafny, VCC, Jahob …)
• Inherent conflicts among expressivity, automaticity and efficiency
Key Challenges:
• Inductive synthesis systems are not expressive enough (Storyboard, Sketch, etc.)
• Deductive synthesis systems are usually interactive for sophisticated data structures (Leon, Fiat, etc.)
• Verification systems are usually not synthesis-enabled (Verifast, Bedrock, HIP/SLEEK, Dafny, VCC, Jahob, etc.)
• Conflicts among expressivity, automaticity and efficiency
Our approach
• DRYAD specifications (expressive pre-/post-conditions)
• IMPSKETCH programs (program sketches with high-level holes)
• Natural Proof-Based Synthesizer (unbounded heap)
  • SKETCH (bounded heap, bounded inlining, bounded unrolling)
  • Natural Proofs (soundly reducing infinite heap reasoning to finite symbolic heap reasoning)
• Provably-Correct Programs (loop invariants, ranking function)
node rev_sorted_list(node h) {
  requires sorted_l*(h);
  ensures rev_sorted_l*(ret) /\ len*(ret) = old(len*(h))
       /\ max*(ret) = old(max*(h)) /\ min*(ret) = old(min*(h));
  stmt(2);
  while (cond(2)) {
    invariant sorted_l*(h) /\ r_sorted_l*(p) /\ conj(5);
    decreases exp(1);
    stmt(4);
  }
  return ??;
}
Input: a sketched program
Unknown holes: Kind(Size)
E.g.: Reverse a sorted list
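\[
\mathit{min}^*(h) \equiv
\begin{cases}
\mathrm{INT\_MAX} & \text{if } h = \mathit{nil} \\
\min\bigl(h.\mathit{key},\ \mathit{min}^*(h.\mathit{next})\bigr) & \text{otherwise}
\end{cases}
\]
\[
\mathit{sorted\_l}^*(h) \equiv
\begin{cases}
\mathit{true} & \text{if } h = \mathit{nil} \\
h.\mathit{key} < \mathit{min}^*(h.\mathit{next}) \wedge \mathit{sorted\_l}^*(h.\mathit{next}) & \text{otherwise}
\end{cases}
\]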
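The two recursive definitions can be read as ordinary recursive functions over a singly linked list. The following is a minimal Python sketch of that reading, not DRYAD itself: the `Node` class, the `from_keys` helper, and the concrete value chosen for `INT_MAX` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

INT_MAX = 2**31 - 1  # assumption: a 32-bit sentinel; the slides only name INT_MAX


@dataclass
class Node:
    """Hypothetical list node with the two fields used by the definitions."""
    key: int
    next: Optional["Node"] = None


def from_keys(keys):
    """Build a singly linked list holding `keys` in order (illustrative helper)."""
    head = None
    for k in reversed(keys):
        head = Node(k, head)
    return head


def min_star(h):
    """min*(h): INT_MAX for nil, otherwise min(h.key, min*(h.next))."""
    if h is None:
        return INT_MAX
    return min(h.key, min_star(h.next))


def sorted_l_star(h):
    """sorted_l*(h): true for nil, otherwise h.key < min*(h.next) and the tail is sorted."""
    if h is None:
        return True
    return h.key < min_star(h.next) and sorted_l_star(h.next)
```

Because the comparison is strict, a list with duplicate keys such as `from_keys([2, 2])` does not satisfy `sorted_l_star` under this reading.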
node rev_sorted_list(node h) {
  requires sorted_l*(h);
  ensures rev_sorted_l*(ret) /\ len*(ret) = old(len*(h))
       /\ max*(ret) = old(max*(h)) /\ min*(ret) = old(min*(h));
  p := nil;
  while (h != nil) {
    invariant sorted_l*(h) /\ r_sorted_l*(p) /\
              disjoint(p,h) /\ max*(p) <= min*(h) /\
              old(len*(h)) = len*(h) + len*(p) /\
              old(max*(h)) = max(max*(h), max*(p)) /\
              old(min*(h)) = min(min*(h), min*(p));
    decreases len*(h);
    t := h.next;
    h.next := p;
    p := h;
    h := t;
  }
  return p;
}
Output: a synthesized and verified program
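To make the role of the annotations concrete, the synthesized loop can be transcribed into Python with part of the specification checked at run time. This is only a dynamic check on individual inputs, not the static natural-proof verification described in the talk. The `Node`, `keys`, and `from_keys` names are hypothetical helpers, `r_sorted_l*` is assumed to mean strictly decreasing keys, and the `disjoint`, `max*`, and `min*` preservation conjuncts are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical list node."""
    key: int
    next: Optional["Node"] = None


def from_keys(ks):
    """Build a list holding `ks` in order (illustrative helper)."""
    head = None
    for k in reversed(ks):
        head = Node(k, head)
    return head


def keys(h):
    """Collect the keys reachable from h through next."""
    out = []
    while h is not None:
        out.append(h.key)
        h = h.next
    return out


def is_sorted(ks):
    return all(a < b for a, b in zip(ks, ks[1:]))


def is_rev_sorted(ks):
    # Assumption: r_sorted_l* means strictly decreasing keys.
    return all(a > b for a, b in zip(ks, ks[1:]))


def rev_sorted_list(h):
    old = keys(h)
    assert is_sorted(old)                       # requires sorted_l*(h)
    p = None
    while h is not None:
        hk, pk = keys(h), keys(p)
        # Loop invariant (subset of the conjuncts on the slide).
        assert is_sorted(hk) and is_rev_sorted(pk)
        assert not pk or max(pk) <= min(hk)     # max*(p) <= min*(h)
        assert len(old) == len(hk) + len(pk)    # length is preserved
        t = h.next
        h.next = p
        p = h
        h = t
    ret = keys(p)
    assert is_rev_sorted(ret) and len(ret) == len(old)   # ensures (partial)
    return p
```

For example, `keys(rev_sorted_list(from_keys([1, 3, 5, 7])))` evaluates to `[7, 5, 3, 1]`, and an unsorted input fails the precondition assertion.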
Natural proofs: in a nutshell [POPL’12, PLDI’13, PLDI’14]
• Handle a logic that is very expressive (searching for a proof is inevitably undecidable)
• Retain automaticity at the same level as decidable logics
• Identify a class of natural proofs N such that
  • N includes natural proof tactics used in human proofs
  • Many correct programs can be proved using a proof in class N
  • The class N is effectively searchable (checking if there is a proof in N is efficiently decidable)
[Figure: the class N shown inside All Possible Proofs]
Example of Natural Proofs
Invariant: sorted_l*(h) /\ r_sorted_l*(p) /\ max*(p) <= min*(h)
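[Figure: heap before and after the loop body. Concrete view: p points to the list 4, 3, 2, … and h to the list 5, 6, 7, …; primed names (h', p', next') mark the post-state. Symbolic view: only the node at h (key 5) is concrete; the list at p is summarized as <=4 and the list at t as >=6.]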
(only concretize the footprint)
t := h.next;
h.next := p;
p := h;
h := t;
Natural Proofs:
For any concrete heap CH |= φ_err, there is a corresponding symbolic heap SH |= φ_err.
Hence, if no symbolic heap satisfies φ_err, no concrete heap does either.
(sound but incomplete)
Program synthesis with Natural Proofs
• Goal: fill the holes in the sketched program such that a natural proof goes through (i.e., synthesize a program that is correct w.r.t. the symbolic execution/interpretation).
[Diagram: IMPSKETCH program + DRYAD spec → symbolic executor and symbolic interpreter → SKETCH.
 yes: complete program (provably correct for arbitrary heap);
 no: no solution can be proved with a natural proof]
Symbolic executor in SKETCH
• Symbolic heaps as arrays
    int[H][F] heap      (heap[l2][next] == l3), (heap[l2][key] == 5)
    int[V] var          (var[p] == l1)
    bool[H] sym         (sym[t] == true)
    bool[H] active      (active[l4] == false)
[Figure: symbolic heap with locations l1, l2, l3 pointed to by p, h, t; l2 holds key 5; shown before and after h.next := p]
• Encoding statements
    h.next := p;
  is encoded as
    int source := var[h];
    int dest := var[p];
    assert !sym[source];
    heap[source][next] := dest;
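The array encoding above can be mimicked in a few lines of Python. This is an illustrative sketch of the idea, not the SKETCH encoding itself: the `SymbolicHeap` class, the `store_field` method, the use of location 0 for nil, and the choice to mark only the node at t as symbolic are all assumptions.

```python
NIL = 0  # assumption: location 0 stands for nil


class SymbolicHeap:
    """Hypothetical container for the four arrays named on the slide."""

    def __init__(self, num_locs, fields, variables):
        # heap[f][l] holds the value of field f at location l.
        self.heap = {f: [NIL] * num_locs for f in fields}
        # var[v] holds the location that program variable v points to.
        self.var = {v: NIL for v in variables}
        # sym[l] is True when location l is a symbolic (summarized) node.
        self.sym = [False] * num_locs
        # active[l] is True when location l is in use.
        self.active = [False] * num_locs

    def store_field(self, x, field, y):
        """Encode the statement `x.field := y`."""
        source = self.var[x]
        dest = self.var[y]
        # Writes are only allowed on concrete nodes, as in the slide's assert.
        assert not self.sym[source], "cannot write through a symbolic node"
        self.heap[field][source] = dest


def example_heap():
    """Heap from the slide: p -> l1, h -> l2 (key 5, next l3), t -> l3."""
    l1, l2, l3 = 1, 2, 3
    sh = SymbolicHeap(num_locs=5, fields=["next", "key"], variables=["p", "h", "t"])
    sh.var.update(p=l1, h=l2, t=l3)
    sh.heap["next"][l2] = l3
    sh.heap["key"][l2] = 5
    sh.sym[l3] = True            # assumption: only the node at t is symbolic
    for loc in (l1, l2, l3):
        sh.active[loc] = True    # l4 stays inactive
    return sh
```

Calling `example_heap().store_field("h", "next", "p")` redirects `heap["next"][2]` from l3 to l1, while `store_field("t", "next", "p")` trips the assertion because the node at t is symbolic.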
Symbolic interpreter for DRYAD formulas
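[Figure: a concrete list h → 5 → 6 → 7 → nil with t = h.next, next to a symbolic heap in which h (key 5) is concrete and t is a symbolic node]
• Interpreted on concrete nodes: len*(t) = 2, so len*(h) = len*(t) + 1 = 3
• Uninterpreted on symbolic nodes: len*(t) = unint_len*(t, ts) = 2, so len*(h) = len*(t) + 1 = 3 (len*(t) can be arbitrary)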
int len_rec(int l) {
  if (l == nil)
    return 0;
  else
    return len_rec(heap[next][l]) + 1;
}

int len_rec(int l) {
  if (l == nil) return 0;
  else {
    if (sym[l])
      return unint_len(l, ts);
    else
      return len_rec(heap[next][l]) + 1;
  }
}

unint_len: uninterpreted function parameterized by a timestamp
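The difference between the two versions of len_rec can be tried out in Python. In this sketch the uninterpreted function is modeled as a lookup table keyed by (location, timestamp), whose entries stand in for values a solver would be free to choose; that modeling choice, the `make_len_rec` factory, and the use of location 0 for nil are assumptions made for illustration.

```python
NIL = 0  # assumption: location 0 stands for nil


def make_len_rec(next_field, sym, unint_len, ts):
    """Return len_rec for a heap given by its next array and sym flags.

    unint_len is a dict {(location, timestamp): value} standing in for the
    uninterpreted function; ts is the current timestamp.
    """
    def len_rec(l):
        if l == NIL:
            return 0
        if sym[l]:
            # Symbolic node: do not unfold, defer to the uninterpreted function.
            return unint_len[(l, ts)]
        # Concrete node: unfold the recursive definition one step.
        return len_rec(next_field[l]) + 1
    return len_rec


# Fully concrete heap: 1 -> 2 -> 3 -> nil (h at location 1, t at location 2).
concrete_len = make_len_rec([NIL, 2, 3, NIL], [False] * 4, {}, ts=0)

# Symbolic heap: h at location 1 is concrete, t at location 2 is symbolic.
symbolic_next = [NIL, 2, NIL]
symbolic_sym = [False, False, True]
symbolic_len = make_len_rec(symbolic_next, symbolic_sym, {(2, 0): 2}, ts=0)
other_len = make_len_rec(symbolic_next, symbolic_sym, {(2, 0): 10}, ts=0)
```

With the table entry set to 2, `symbolic_len(1)` agrees with `concrete_len(1)`. Changing the entry to 10 changes the result for h accordingly, which mirrors the slide's remark that len*(t) can be arbitrary on a symbolic node.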
Experimental results

Example            Cond synthesized?   Loop synthesized?   Control size (bits)   # Iterations   Time
srtl_prepend       Y                   N                   290                   5              12s
srtl_insert_body   Y                   N                   512                   7              55s
sll_max            Y                   Y                   817                   45             8m55s
sll_min            Y                   Y                   817                   47             8m12s
srtl_reverse       N                   Y                   112                   75             5m48s
srtl_ins_rec       Y                   N                   185                   40             18m53s
bst_left_rotate    N                   N                   70                    3              2m6s
mreg_cut_right     N                   N                   338                   13             6m19s
mreg_find          Y                   Y                   735                   14             10m53s
• Sketched programs written in IMPSKETCH and DRYAD
• Mechanically encoded to SKETCH
• Several optimization techniques applied (malloc budget, break redundancy caused by symmetry, etc.)
• Several program-independent axioms provided (e.g., min*(ret) < max*(ret))
2^817 ≈ 10^246