Formal Verification of Shared Memory Systems During their Design
Ganesh Gopalakrishnan
Department of Computer Science
University of Utah
http://www.cs.utah.edu/~ganesh
Dec 22, 2015
06/21/99 Ganesh, Utah Verifier group -- SAS talk ('99 visit)
FM and shared-memory system design
• Processor speeds are increasing at 55% per year; memory speeds at only 7%
• Mismatch exacerbated by shared memory multiprocessors
• Complex protocols employed to hide memory latencies
• Need for formal verification techniques that can be employed during design
A Shared Memory Multiprocessor (a “shared memory system”)
[Figure: CPUs and Memory modules connected through an Interconnect]
Classification: 1. Symmetric Multi-Processors (SMP)
[Figure: several CPUs with caches (CPU $) and a Memory attached to a coherent snooping bus]
Potential bugs in complex bus designs:
• Deadlocks, lack of forward progress
• Lack of coherency
• Incorrect shared memory consistency model
2. Distributed Shared Memory (DSM) systems
[Figure: SMP nodes, each with several CPUs, a Memory, and a DC, connected by a high-speed network]
Problems due to complex DSM protocols:
• Deadlocks, lack of forward progress, …
• Incorrect shared memory consistency models
Formal Methods for Shared Memory System Design
Verification / Provably-correct Synthesis
Theorem-proving
Model-checking / Protocol
Low-level concerns (e.g. deadlocks, progress, ...)
Higher-level concerns (e.g. shared memory consistency models)
Finite-state Reachability
Results of the UV group
• New Partial Order reduction algorithm
– Realized in a verifier called PV
– Outperforms SPIN “10 to 1” on most examples
– Selective state-caching is available “for free”
• A DSM Protocol synthesis algorithm
– Safety of synthesis proved correct using PVS
– Derives realistic (hand-quality) DSM protocols
– Incorporates a scalable buffer-reservation scheme
• Verifying Formal Memory Models
Motivations
• Distributed directory-based coherence protocols are difficult to understand and debug
• low-level requests / acks / nacks don’t reveal *what* is being implemented
• transient states are introduced and handled in an ad-hoc way
• buffer allocation is not tied to desired high-level properties (e.g. progress)
• verification is tedious
Example of problems due to “unexpected msgs”
[Figure: messages between Cache Ctrlr and Directory Ctrlr: a Req, an Ack, and then another, unexpected Req (? ? ?)]
Usually don’t know what to say ... saying nothing causes deadlock!
Our approach
• Based on synthesis
• Transient states introduced automatically
• Buffer allocation is tied to desired high-level properties (e.g. progress)
• Verification becomes much easier
• Synthesized protocols seem efficient
Overview of Synthesis Method
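[Figure: state machines for Cache Ctrlr (states I, E) and Dir Ctrlr (states F, E), shown twice; the second pair exchanges Req and (N)ack messages]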
Model-checking Efficiency
Protocol | N | states / time (low level) | states / time (high level)
Mig | 2 | 23,164 / 2.8 | 54 / 0.1
Mig | 4 | | 235 / 0.4
Mig | 8 | | 965 / 0.5
Inv | 2 | 193,389 / 19.23 | 546 / 0.6
Inv | 4 | | 18,686 / 18.4
An Illustration: Migratory Protocol (i)
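[Figure: state diagrams of the migratory protocol]
Process ‘h’ transitions: r(i)?req, r(i)!gr(data), r(j)?req, r(o)!inv, r(o)?LR(data), r(j)!gr(data), r(o)?ID(data), r(o)?LR(data)
Process ‘r(i)’ transitions: h!req, h?gr(data), rw, evict, h!LR(data), h!ID(data), h?inv
State labels in the diagrams: I, V, V1, V2, F, E, I1, I2, I3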
An Illustration: Migratory Protocol (ii)
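[Figure: state diagrams of the migratory protocol]
Process ‘h’ transitions: r(i)?req, r(i)!gr(data), r(j)?req, r(o)!inv, r(o)?LR(data), r(j)!gr(data), r(o)?ID(data), r(o)?LR(data)
Process ‘r(i)’ transitions: h!req, h?gr(data), rw, evict, h!LR(data), h!ID(data), h?inv
State labels in the diagrams: I, V, V1, V2, F, E, I1, I2, I3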
A Generic Example
[Figure: processes P, Q, and R with communication actions Q!a, R!b, P?x, Q!c, R?y]
Async Implementation of Example (i)
[Figure: processes P, Q, and R with communication actions Q!a, R!b, P?x, R?y, Q!c; asynchronous sends R!!b and Q!!a]
1 msg buffer location for Ack/Nack
Async Implementation of Example (ii)
[Figure: processes P, Q, and R with communication actions Q!a, R!b, P?x, R?y, Q!c; asynchronous sends R!!b, Q!!a, Q!!c, P!!ack; a Progress Buffer]
Organization of Protocol - per Cache Line
[Figure: Remote Nodes connected to a Home Node]
- Remote nodes (cache ctrlrs) communicate w. home directory controller only
- If Remote and Home requests cross in medium,
  . Remote request treated as Nack by Home
  . Home request is dropped by Remote
- Pt-to-pt order-preserving error-free communication
General Nature of Communication States
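(Remote) [Figure: an active state with h!msg leading to a transient state T; a passive state with inputs h?m1, h?m2]
(Home) [Figure: communication state with input r(i)?m1 and output r(j)!m2 leading to a transient state T]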
Summary: Remote node rules
Statement | Buffer | Action
H!m | empty | Req; goto trans.
H!m | req | Del req; req; goto trans.
H?m | req | Ack / Nack
trans. | ack | success
trans. | nack | retry
trans. | req | Ignore req
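The remote-node rules can be read as a lookup from (statement, buffer contents) to an action. The following is a minimal sketch of that reading, not the synthesized controller itself; the names `Buf`, `REMOTE_RULES`, and `remote_action`, and the wording of the action strings, are assumptions made for illustration.

```python
from enum import Enum


class Buf(Enum):
    """What the remote node currently finds in its message buffer (assumed encoding)."""
    EMPTY = "empty"
    REQ = "req"
    ACK = "ack"
    NACK = "nack"


# One entry per row of the remote-node rule table.
REMOTE_RULES = {
    ("h!m", Buf.EMPTY): "send req; goto transient",
    ("h!m", Buf.REQ): "delete req; send req; goto transient",
    ("h?m", Buf.REQ): "reply ack or nack",
    ("transient", Buf.ACK): "success",
    ("transient", Buf.NACK): "retry",
    ("transient", Buf.REQ): "ignore req",
}


def remote_action(statement: str, buffer: Buf) -> str:
    """Return the table's action, or 'no rule' for combinations the table omits."""
    return REMOTE_RULES.get((statement, buffer), "no rule")
```

For example, `remote_action("transient", Buf.REQ)` corresponds to the case where the home request crossed the remote's own request and is dropped.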
Summary: Home node (i)
stmt | buffer has | action
r(i)?msg | msg from r(i) | ack the msg
r(i)!msg | | reserve progress, response buffers. Req and go trans.
trans | ack from r(i) | done
trans | nack | go back
Summary: Home node (ii)
condition | action
trans. req from r(i) | “implicit” nack
trans. req from r(j), buff has space | add to buffer
trans. req x from r(j), progress buff is free, and r(j)?x in comm. state | add to progress buffer
trans. req from r(j), progress buff is full or r(j)?x not in comm. state | nack the request
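A small sketch of how the home node might dispatch a request that arrives while it is in a transient state, following the four rows above. The class `HomeNode`, its fields, and the assumption that the progress buffer is only consulted once the ordinary buffer is full are illustrative choices, not details taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HomeNode:
    waiting_on: int                 # r(i): the remote the home sent its own request to
    comm_state_accepts: set         # (remote, msg) pairs with r(j)?x in the comm. state
    buffer_capacity: int = 1        # assumed size of the ordinary request buffer
    buffer: list = field(default_factory=list)
    progress_buffer: Optional[tuple] = None

    def on_request(self, sender: int, msg: str) -> str:
        """Handle a request received while in a transient state."""
        if sender == self.waiting_on:
            # The requests crossed in the medium: treat it as a nack of our request.
            return "implicit nack"
        if len(self.buffer) < self.buffer_capacity:
            self.buffer.append((sender, msg))
            return "buffered"
        if self.progress_buffer is None and (sender, msg) in self.comm_state_accepts:
            # Reserved slot for a request the communication state can consume.
            self.progress_buffer = (sender, msg)
            return "progress-buffered"
        return "nack"
```

The progress buffer is what ties buffer allocation to forward progress in this sketch: a request that the communication state is able to consume always has a slot reserved for it, even when the ordinary buffer is full.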
Status of Work
• Correctness of Protocol Synthesis Proved in PVS
• Write-invalidate protocol also synthesized
• Offers a general synthesis method for protocols (not necessarily for DSM)
– Related work: Buckley and Silberschatz, Chandra et al., Park and Dill, Gribomont, ...
Verifying Conformance to Formal Memory Models
FM and shared-memory system design
• Shared-memory systems are complex!
• Designers need a “safety net” when exploring optimizations: formal verification
• We focus on verifying that a (finite-state model of a) shared memory system provides the required memory model (mainly Sequential Consistency)
– E.g. Verify a Cache Coherence Protocol for SC
• Our approach: finite-state reachability analysis
Importance of Memory Models -- An Example
Peterson’s algorithm for mutex under a memory model called “TSO”:
Init A=B=0
P1:
A = 1 ;
turn = 2 ;
while (B /\ turn==2 ) ;
..CS..
P2:
B = 1 ;
turn = 1 ;
while (A /\ turn==1 ) ;
..CS..
Execution shown:
P1: w(A,1); r(B,0);
P2: w(B,1); r(A,0);
Must Specify Synchronization Routines and the Shared Memory Consistency Model(s) under which they work!
Impact on CPU design -- Do Read-Speculation Right!
CPU1:
wr(a,2) - Miss
rd(b, 0) - Speculate
Snoop wr(a) - Spec OK
CPU2:
wr(b,3) - Miss
rd(a, 0) - Speculate
Snoop wr(a) - Spec not OK; reissue rd(a, 2)
[Figure: CPU1 and CPU2 on a bus with MEM; bus order ..wr(a,2);.. wr(b,3)..]
Without reissue, results are inconsistent with SC
Basis for our work: ARCHTEST (Collier)
• Multi-threaded C programs
• Used to debug actual multiprocessor machines
– unavailable at design-time
• Based on the theory of graph-sets
– used in our work also
• Our CAV’98 work: adapt Collier’s tests for model-checking
– incomplete
• This work: a complete verification method (sound too!)
What is a shared memory model?
Captured by the set of all executions of a concurrent program!
Init A=B=0
One execution: w(A,1); r(B,0); || w(B,1); r(A,0);
Another execution: w(A,1); r(B,0); || w(B,1); r(A,1);
[Figure: the two executions drawn on CPU/Memory diagrams, labeled SC and TSO, and captioned Execution #1 and Execution #2]
TSO allows more executions than SC (hence “weaker”)
An Operational Definition of SC and TSO
[Figure: SC: cpu1 and cpu2 access a single Memory through a MUX. TSO: cpu1 and cpu2 each have a fifo in front of the Memory.]
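The two operational models can be explored by brute force on small programs. The sketch below assumes SC means every write reaches the single memory immediately, and TSO means each CPU has a FIFO store buffer that drains to memory at arbitrary times, with reads forwarded from the CPU's own buffer. The function `outcomes`, the program encoding, and the name `LITMUS` are assumptions made for illustration.

```python
def outcomes(programs, use_store_buffers):
    """Enumerate all read results of `programs` over a single shared memory.

    programs: one list per CPU; an op is ("w", addr, val) or ("r", addr).
    use_store_buffers=False: SC-style, writes hit memory at once.
    use_store_buffers=True: TSO-style, per-CPU FIFO store buffers.
    Each result is a sorted tuple of ((cpu, op_index), value_read) pairs.
    """
    results = set()
    n = len(programs)

    def explore(pcs, mem, bufs, reads):
        if all(pcs[i] == len(programs[i]) for i in range(n)) and not any(bufs):
            results.add(tuple(sorted(reads.items())))
            return
        for i in range(n):
            if bufs[i]:  # drain the oldest buffered write of CPU i to memory
                addr, val = bufs[i][0]
                new_bufs = bufs[:i] + (bufs[i][1:],) + bufs[i + 1:]
                explore(pcs, {**mem, addr: val}, new_bufs, reads)
            if pcs[i] < len(programs[i]):  # execute the next op of CPU i
                op = programs[i][pcs[i]]
                new_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
                if op[0] == "w":
                    _, addr, val = op
                    if use_store_buffers:
                        new_bufs = bufs[:i] + (bufs[i] + ((addr, val),),) + bufs[i + 1:]
                        explore(new_pcs, mem, new_bufs, reads)
                    else:
                        explore(new_pcs, {**mem, addr: val}, bufs, reads)
                else:
                    _, addr = op
                    # The newest write in the CPU's own buffer wins; else read memory.
                    pending = [v for (a, v) in bufs[i] if a == addr]
                    val = pending[-1] if pending else mem.get(addr, 0)
                    explore(new_pcs, mem, bufs, {**reads, (i, pcs[i]): val})

    explore((0,) * n, {}, ((),) * n, {})
    return results


# w(A,1); r(B)  ||  w(B,1); r(A), with A = B = 0 initially.
LITMUS = [[("w", "A", 1), ("r", "B")], [("w", "B", 1), ("r", "A")]]
BOTH_ZERO = (((0, 1), 0), ((1, 1), 0))
```

With these definitions, `BOTH_ZERO in outcomes(LITMUS, True)` holds while `BOTH_ZERO in outcomes(LITMUS, False)` does not, which is the sense in which TSO allows more executions than SC.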
How are allowed executions specified?
As constraints on events generated by the execution!
Constraints are expressed in terms of ordering rules:
RO - Read Ordering
ROA - RO over the same address
WOS - Write Ordering by Storage
POS - Program Ordering by Storage
CMP - Computational Ordering
WA - Write Atomicity
Ordering rules specify constraints on EVENTS
Memory Model = “Collier Cocktail!” - e.g. (CMP, RO, WOS)
Definition of POS (and also RO and WOS)
CPU_i program: R1(a,0) ; W2(b,1) ; R5(d,2) ;
CPU_j program: R3(c,0) ; W4(d,2) ; W6(d,3) ;
[Figure: the events of both programs placed at CPU_i, STORE_i, CPU_j, and STORE_j, with arcs labeled RO(i), WOS(i), WOS(j), and part of POS(j)]
PO includes RR, RW, WR, and WW orders
View these events first as an unordered set which is subsequently ordered by the arcs
Definition of CMP (defined per CPU per address)
[Figure: events W2(b,1), R5(d,2), W4(d,2), W6(d,3) at CPU_i, STORE_i, CPU_j, and STORE_j; one CMP order is cmp1(i,d), another is cmp2(i,d); likewise cmp1(j,d) and cmp2(j,d)]
Assumptions in defining CMP ... and in the rest of this talk
• We are interested in more than SC
– We would like to set up a general framework for defining and verifying memory models
– Assume that RO is obeyed by every memory model of interest to us
• We assume
– Projectability
– Data Independence
– Unambiguous executions
Assume Projectability, Data Independence, and consider only Unambiguous executions
CPU_i: R1(a,T) ; W2(b,1) ; R5(d,2) ;
CPU_j: R3(c,T) ; W4(d,2) ; W6(d,3) ;
[Figure: events at CPU_i, STORE_i, CPU_j, and STORE_j]
Projectible: Executions projected onto subsets of addresses result in executions
Data independent: Replacing all data values d in an execution with f(d) for some function f results in an execution
Unambiguous: Same datum never written twice (so we can uniquely trace source of data!)
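The three assumptions can be phrased as operations on a single execution. In the sketch below an execution is a flat list of `(cpu, kind, addr, value)` events; that encoding, the function names, and the reading of "same datum" as "same value to the same address" are assumptions. The assumptions themselves are about the memory system: its set of executions is taken to be closed under `project` and `rename_data`.

```python
def project(execution, addresses):
    """Keep only the events that touch one of the given addresses."""
    return [e for e in execution if e[2] in addresses]


def rename_data(execution, f):
    """Replace every data value d in the execution by f(d)."""
    return [(cpu, kind, addr, f(val)) for (cpu, kind, addr, val) in execution]


def is_unambiguous(execution):
    """True if no (address, value) pair is written more than once."""
    seen = set()
    for cpu, kind, addr, val in execution:
        if kind == "W":
            if (addr, val) in seen:
                return False
            seen.add((addr, val))
    return True


# The execution from the slide; "T" stands for the initial value.
EXECUTION = [
    ("i", "R", "a", "T"), ("i", "W", "b", 1), ("i", "R", "d", 2),
    ("j", "R", "c", "T"), ("j", "W", "d", 2), ("j", "W", "d", 3),
]
```

Projecting `EXECUTION` onto address `d` leaves the read R5(d,2) and the writes W4(d,2) and W6(d,3); because the execution is unambiguous, the read can only have obtained its value from W4.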
Definition of CMP for CPU i for address d
[Figure: events R1(d,T), R2(d,2), R3(d,2), R4(d,5), W4(d,2), W2(d,4), and W3(d,5) at CPU_i, STORE_i, CPU_j, and STORE_j, with ROA arcs between the reads]
CMP includes ROA; the figure also marks an implied edge
Let’s study (CMP, RO, WOS) - a useful drosophila!
Initially a = 0
CPU_i: R1(a,1) ; W2(a,1) ;
CPU_j: ..no writes to a..
[Figure: events at CPU_i, STORE_i, CPU_j, and STORE_j]
Even this execution is possible under (CMP,RO,WOS)
An execution satisfying (CMP, RO, WOS)
CPU_i: R1(a,T) ; W2(b,1) ; R5(d,2) ;
CPU_j: R3(c,T) ; W4(d,2) ; W6(d,3) ;
[Figure: events at CPU_i, STORE_i, CPU_j, and STORE_j, with arcs labeled RO, WOS(i), WOS(j), CMP(i,d), and CMP(j,d)]
Execution satisfies (CMP, RO, WOS) as there are no cycles created by adding their arcs!
An execution that violates (CMP,RO,WOS)
CPU_i: wr(A,2) ; wr(A,3) ;
CPU_j: rd(A,3) ; rd(A,2) ;
[Figure: events wr(A,2), wr(A,3), rd(A,2), and rd(A,3) at CPU_i, STORE_i, CPU_j, and STORE_j, with ROA and WOS arcs]
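Checking an execution against a set of ordering rules amounts to adding the arcs and looking for a cycle. The sketch below uses a standard depth-first search; the arc list is one plausible reading of the figure and is an assumption, in particular the CMP arc from rd(A,2) to wr(A,3), which says that a read of the old value precedes the write that overwrites it.

```python
def has_cycle(arcs):
    """Return True if the directed graph given by (src, dst) arcs has a cycle."""
    graph = {}
    for src, dst in arcs:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:        # back edge: a cycle
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


WOS = [("wr(A,2)", "wr(A,3)")]            # writes ordered as stored
ROA = [("rd(A,3)", "rd(A,2)")]            # reads of A in program order
CMP = [("wr(A,2)", "rd(A,2)"),            # each read follows its source write
       ("wr(A,3)", "rd(A,3)"),
       ("rd(A,2)", "wr(A,3)")]            # assumed implied edge (see lead-in)
```

Under these assumed arcs, `has_cycle(WOS + ROA + CMP)` is true through wr(A,3), rd(A,3), rd(A,2), and back to wr(A,3); dropping the ROA arc leaves the graph acyclic.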
Verification Techniques for Memory Models
• Consider all possible executions
– involving all possible addresses A
– and all possible data D
– for all possible concurrent programs P
• Introduce the arcs due to ordering rules
• Look for cycles
Impractical!
• So, look for ways to limit A, D, and P
Our approach
• Assume address projectability (or “projectability”) and data independence
• Prove limited address theorems (helps limit A)
• Characterize all violating executions { E_i } over A
• Come up with finite-state abstractions for each E_i
– using data independence to limit D, and
– using non-determinism
to arrive at a finite number of test automata aut_i
• Explore state-space of each aut_i || memory-system
• Look for entry into error-states
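Exploring aut_i || memory-system and looking for error states is a plain reachability search. The sketch below is a generic breadth-first search plus a deliberately tiny toy system; the toy memory (one address, one cached copy that the reader may read instead of memory, an `invalidate` switch) and all names are assumptions for illustration and do not model any real protocol.

```python
from collections import deque


def find_error(initial, successors, is_error):
    """Breadth-first search; return a shortest path to an error state, or None."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_error(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


def toy_successors(state, invalidate):
    """State = (mem, cache, seen_one, error). The writer writes 1 once; the
    test automaton flags an error if the reader sees 0 after having seen 1."""
    mem, cache, seen_one, error = state
    if mem == 0:  # writer: wr(1), optionally invalidating the cached copy
        yield (1, None if invalidate else cache, seen_one, error)
    sources = [mem] + ([cache] if cache is not None else [])
    for val in sources:  # reader: non-deterministically reads memory or cache
        yield (mem, cache, seen_one or val == 1, error or (seen_one and val == 0))


INITIAL = (0, 0, False, False)
```

With `invalidate=False` the search returns a path on which the reader sees 1 from memory and then 0 from the stale copy; with `invalidate=True` no error state is reachable.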
Use of data abstraction & non-determinism
Suppose E_i are:
P1: A := 1; A := 2; A := 3; .... A := k
P2: X1 := A; X2 := A; X3 := A; .... Xk := A
Look for some i, j s.t. j < i /\ X(j) > X(i)
Then a_i are:
[Figure: test automata for P1 (edges wr(0), wr(1), wr(1)) and P2 (edges rd(0), rd(1), rd(1), rd(0)), with an Error state]
- Achieves the effect of k = infinity
- Considers all interleavings
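Why two data values are enough can be checked on concrete read sequences. In the sketch below, P1 is assumed to write strictly increasing values, so a violation is an earlier read returning a newer value than a later read. Mapping every value below a threshold to 0 and the rest to 1 turns such a violation into "read 1, then read 0", which a two-state monitor catches; the non-deterministic switch of the abstract writer from wr(0) to wr(1) plays the role of the threshold. All function names are illustrative assumptions.

```python
def has_order_violation(reads):
    """True if some earlier read returned a larger (newer) value than a later one."""
    return any(reads[j] > reads[i] for i in range(len(reads)) for j in range(i))


def abstract(reads, threshold):
    """Data abstraction: values below the threshold become 0, the rest 1."""
    return [1 if v >= threshold else 0 for v in reads]


def monitor_reaches_error(abstract_reads):
    """Two-state test automaton for the reader: error on rd(0) after rd(1)."""
    seen_one = False
    for bit in abstract_reads:
        if bit == 1:
            seen_one = True
        elif seen_one:
            return True
    return False


def caught_by_some_abstraction(reads):
    """Try every threshold, as the non-deterministic writer would."""
    return any(monitor_reaches_error(abstract(reads, t)) for t in set(reads))
```

For any read sequence the two checks agree: `has_order_violation(reads) == caught_by_some_abstraction(reads)`, since a violating pair X(j) > X(i) is exposed by the threshold X(j), and an abstract error can only come from such a pair.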
Limited Address Theorem for (CMP,RO,WOS)
Two addresses suffice!
PowerPoint proof of the limited address theorem for (CMP,RO,WOS)
[Figure: proof by picture, built up in several steps. Events R1(P1), R2(P1), W1(P2), W2(P2), W3(P3), W4(P3); arcs labeled RO, WOS, and CMP are added step by step; the last step introduces a read R with two RO arcs]
Involves two addrs!
Exhaustive characterization of violations of (CMP, RO, WOS) over one address, “a”
(1)
P_i: ... rd(a, v) ...
P_j: ...
v is not the initial value T of a, and a is not written anywhere
(2)
P_i: ... rd(a,v1) ... rd(a,v2) ...
P_j: ... wr(a,v2) ... wr(a,v1) ...
P_i and P_j could be the same process
(3)
P_i: ... rd(a,v) ... rd(a,T) ...
P_j: ... wr(a,v) ...
P_i and P_j could be the same process
Test automata for 1-address (CMP,RO,WOS) violations
Error states: E1, E2
Exhaustive characterization of two-address violations of (CMP, RO, WOS)
(1)
All one-address violations involving only address A or only address B
(2)
P_i: ... rd(B,v2) ... rd(A,v1) ...
P_j: ... wr(A,v1) ... wr(B,v2) ...
P_i and P_j could be the same process
[Figure: events R1(P1), R3(P1), W3(P3), and W4(P3) with RO, WOS, and CMP arcs]
Test automata for 2-address (CMP,RO,WOS) violations
Error states: E1, E2
Limited Address Theorem for (CMP,POS)
2 addresses suffice
1-address (CMP,POS) verification
Error states: E1, E2
2-address (CMP,POS) verification
Error states: E1, E2
SC = (CMP, POS, WA)
• Write Atomicity
• POS
• CMP
CPU1: w(A,1); r(B,0);
CPU2: w(B,1); r(A,1);
[Figure: the execution drawn on an SC machine (CPU1 and CPU2 sharing one Memory) and as events w(A,1), r(B,0), w(B,1), r(A,1) at CPU_1, STORE_1, CPU_2, and STORE_2]
Definition of WA - by showing what is not WA!
Init A=0
w(A,1); r(A,1); w(A,2); r(A,2);
r(A,2); r(A,1);
[Figure: CPUs sharing a Memory]
The limited-address theorem for SC = (CMP, POS, WA)
• In an N-processor system, N addresses are
– sufficient
• IF concurrent program P using M > N addresses shows a violation
• THEN there exists a subset A of N addresses
• such that P projected onto A yields concurrent program P’ that also shows a violation.
PowerPoint proof to follow
– and necessary:
Wr(A,1) ; Rd(B,0)
Wr(B,1) ; Rd(C,0)
Wr(C,2) ; Rd(A,0)
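The necessity example can be checked by brute force: the three-processor execution is not SC, yet every projection onto two of the three addresses is. The sketch below takes SC in its usual operational sense (some interleaving over a single memory, initially 0, explains every read); the encoding and the names `sc_allows`, `project`, and `EXAMPLE` are assumptions, and this is not the graph-based test described in the talk.

```python
from itertools import combinations


def sc_allows(programs):
    """programs: one list per CPU of ("w", addr, val) or ("r", addr, observed).
    True if some interleaving over one memory makes every read see `observed`."""
    n = len(programs)

    def explore(pcs, mem):
        if all(pcs[i] == len(programs[i]) for i in range(n)):
            return True
        for i in range(n):
            if pcs[i] == len(programs[i]):
                continue
            kind, addr, val = programs[i][pcs[i]]
            if kind == "r" and mem.get(addr, 0) != val:
                continue  # this read cannot happen now with the observed value
            new_mem = {**mem, addr: val} if kind == "w" else mem
            if explore(pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:], new_mem):
                return True
        return False

    return explore((0,) * n, {})


def project(programs, addresses):
    """Drop every operation on an address outside `addresses`."""
    return [[op for op in prog if op[1] in addresses] for prog in programs]


EXAMPLE = [
    [("w", "A", 1), ("r", "B", 0)],
    [("w", "B", 1), ("r", "C", 0)],
    [("w", "C", 2), ("r", "A", 0)],
]

# Which two-address projections still look sequentially consistent?
PAIR_RESULTS = {pair: sc_allows(project(EXAMPLE, set(pair)))
                for pair in combinations("ABC", 2)}
```

Here `sc_allows(EXAMPLE)` is false while every entry of `PAIR_RESULTS` is true, so with three processors no test restricted to two addresses can expose this violation.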
PowerPoint proof of the limited address theorem for SC = (CMP, POS, WA)
- Suppose C is the cycle containing the smallest number of events that involves more than N <pos edges.
- Then two <pos edges connect events generated by the same processor, say `g’, and observed by `a’ and `b’.
- If a=b, we can eliminate one of these POS edges
- If a <> b, consider g <> a, and possibly equal to b.
- a0 and a1 are writes. Find corresp. events in `b’.
[Figure: events a0, a1, b2, and b3 joined by Pos(g) arcs; a second copy adds event b0 (one linearization) and a wa arc]
All N-address (CMP, POS, WA) violations:
(1) (CMP, POS) violations
(2) Two processors “see” two writes w1 and w2 in different orders
Complete test for SC for 1-address programs
Error states:
- < P14, Q41 >
- { P41a, P41b } x { Q14a, Q14b }
Complete test for SC for 2-address programs
Error states:
- < P14, Q41 >
- { P41a, P41b } x { Q14a, Q14b }
Case Studies
Runway/PA system model
– Bus based design
– An aggressive split transaction protocol
– Out-of-order (speculative) completion of transactions on Runway for high-performance
• not modeled in current experiments
– In-order completion of instructions in PA for sequential consistency
SC verification of the HP/Runway model
Spin PV
PO-1 56K 2794
PO-2 > 5M/DNF 11M
SC-1 499K 7880
SC-2a > 5M/DNF 5.9M
SC-2b > 4M/DNF 574K
Conclusions
• Promising
– Violations caught very quickly
– Need to try larger examples
• Currently studying weaker memory models
• Future work:
– Combatting state-explosion
• Symmetries
• Better automata
– Integrate into design cycle of CPUs
• Support performance optimizations and verification regressions
Related Work
• Graf (CAV’94)
– for more than SC (hence unsound for SC)
– properties depend on design
• Alur, McMillan, Peled (LICS’96)
– undecidable if data can be compared
• Nalumasu, Ghughal, Mokkedem, Gopalakrishnan (CAV’98)
– incomplete
• Henzinger, Qadeer, Rajamani (CAV’99)
– needs invariants
– invariants depend on design
– assumes address-symmetry
• Collier (‘80s)
– not available at design-time