Verifying MP Executions against Itanium Orderings using SAT* Ganesh Gopalakrishnan Yue Yang Hemanthkumar Sivaraj School of Computing, University of Utah Salt Lake City, UT, 84112 * Work supported in part by SRC Contract 1031.001 and NSF Award 0219805
Verifying MP Executions against Itanium Orderings using SAT*
Ganesh Gopalakrishnan
Yue Yang
Hemanthkumar Sivaraj
School of Computing, University of Utah
Salt Lake City, UT, 84112
* Work supported in part by SRC Contract 1031.001 and NSF Award 0219805
2
Efficient Multiprocessors must have Efficient Shared Memory Systems
* Hide the cost of memory operations by postponing updates
* Increasingly important because CPU speeds are growing faster than memory system speeds
3
How to build Efficient Shared-memory Multiprocessor Systems?
• Employ weak memory models
– They permit global state updates to be postponed
• All runs were on a 1.733 GHz Athlon with 1 GB of memory running Redhat Linux V9
• ~2 minutes to generate Sat instance
• 14,053,390 clauses
• 117,823 variables
• ~1 minute to solve Sat problem - found Unsat
• Unsat Core generation runs fast – gave 23 clauses!
– 23 of the 14M clauses were causing the problem to be Unsat
– Sat time for these 23 clauses … under a second
Unsat Core’s annotations were traced back to the offending instructions and the memory ordering rules that situated them in a “cycle”
12
The rest of the talk
• Itanium memory model in Higher Order Logic (well, not so high actually… )
• Our HOL specs → translation → “sat-generating checker programs”
• Execution to be checked → translation by above program → Sat
• Each assembly instruction → clauses it generates + annotations
• When Sat, what interleaving explains?
• When Unsat, how to get “core” (root-cause) + annotations on core
• Translating annotations on core to cycle on original program
13
• Itanium memory model in Higher Order Logic (well, not so high actually… )
The initial focus of our presentation :
- How to model an execution ?
- Why use “split stores” in modeling ?
14
• Itanium memory model in Higher Order Logic (well, not so high actually… )
Basic problem-modeling idea:
Find a “shuffle” of the instructions that explains the observations…
[Figure: processors P0 and P1 issue st [y] = 1, ld reg1 = [y] <1> and ld reg2 = [y] <1>; the “Explanation” column shows one shuffle of these three instructions that accounts for both observed read values.]
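The "find a shuffle" idea can be sketched as a brute-force search. This is only an illustration in Python, not the authors' tool: the op layout `(proc, pc, kind, addr, value)`, the function names, and the assignment of instructions to P0 and P1 are assumptions, and plain loads/stores are used (no st.rel or ld.acq semantics).

```python
from itertools import permutations

# Hypothetical op record: (proc, pc, kind, addr, value).
# For a load, value is the observed value shown in <...> on the slide.

def respects_program_order(shuffle, programs):
    """True if each processor's ops appear in the shuffle in their original order."""
    for proc_ops in programs:
        positions = [shuffle.index(op) for op in proc_ops]
        if positions != sorted(positions):
            return False
    return True

def explains_reads(shuffle, initial=0):
    """True if every load returns the latest preceding store (or the initial value)."""
    memory = {}
    for (_proc, _pc, kind, addr, value) in shuffle:
        if kind == "st":
            memory[addr] = value
        elif memory.get(addr, initial) != value:
            return False
    return True

def find_shuffle(programs):
    """Return one explaining shuffle, or None if no shuffle exists."""
    all_ops = [op for proc_ops in programs for op in proc_ops]
    for shuffle in permutations(all_ops):
        if respects_program_order(shuffle, programs) and explains_reads(shuffle):
            return list(shuffle)
    return None

# Assumed layout of the first example: the store and one load on P0, the other load on P1.
basic = [[(0, 0, "st", "y", 1), (0, 1, "ld", "y", 1)],
         [(1, 0, "ld", "y", 1)]]

# Assumed layout of a case where the basic idea fails: each processor stores,
# then reads the other location and still sees the initial value 0.
store_buffering = [[(0, 0, "st", "y", 1), (0, 1, "ld", "x", 0)],
                   [(1, 0, "st", "x", 2), (1, 1, "ld", "y", 0)]]
```

Under these assumptions `find_shuffle(basic)` finds an explanation, while `find_shuffle(store_buffering)` finds none, which is the situation that motivates splitting stores.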
The basic idea won’t always work …
st.rel [y] = 1
ld reg1 = [x] <0> ld reg2 = [y] <0>
st.rel [x] = 2
ld.acq r3 = [y] <1> ld.acq r4 = [x] <2>
Ordering arrows in the figure: Dat. Dep. and Ld.Acq Order
No shuffle of these sequences respecting the ordering arrows satisfies the read values
15
• Problem Modeling…
Idea: Find a shuffle after each store is split into (p+1) copies, where p is the number of processors (by the way, this idea has sort of become “standard”)
st [y] = 1
P0 P1
st [x] = 2
Local copy for P0
“remote” copy for P0
“remote” copy for P1
Now, arrange the split copies…
A similar split
16
• Problem Modeling…
st [y] = 1
ld reg1 = [x] <0> ld reg2 = [y] <0>
P0 P1
st [x] = 2
Now, arrange the split copies…
st [y] = 1 “l”
st [y] = 1 “rp0”
st [y] = 1 “rp1”
st [x] = 2 “l”
st [x] = 2 “rp0”
st [x] = 2 “rp1”
st [y] = 1 “l”
st [y] = 1 “rp0”
st [y] = 1 “rp1”
st [x] = 2 “l”
st [x] = 2 “rp0”
st [x] = 2 “rp1”
ld reg1 = [x] <0>
ld reg2 = [y] <0>
Explanation…
ld.acq r3 = [y] <1> ld.acq r4 = [x] <2>
ld.acq r3 = [y] <1>
ld.acq r4 = [x] <2>
Dependencies
Anti-dependencies
17
• Back to Itanium memory model in Higher Order Logic thru an example
Informal statement:
Store-Releases to write-back memory become visible to all processors in the same order
Implementation:
All copies of a “split st.rel” are visible atomically
[Figure: the split copies of st.rel [x] = 1, grouped as an “Atomic set”]
18
One standard way of specifying atomicity:
All other events “e” are strictly before or strictly after the atomic set
Another standard way of specifying atomicity:
If some event “e” is between two events in the atomic set, then “e” also belongs to the atomic set
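The two formulations can be compared on a concrete total order. The following Python sketch is illustrative only; the function names and the representation of an order as a list of distinct event names are assumptions, with "rp0"/"rp1" standing for the remote copies of one split store.

```python
def atomic_v1(order, atomic_set):
    """Every event outside the set is strictly before or strictly after all of it."""
    pos = {e: i for i, e in enumerate(order)}
    first = min(pos[a] for a in atomic_set)
    last = max(pos[a] for a in atomic_set)
    return all(pos[e] < first or pos[e] > last
               for e in order if e not in atomic_set)

def atomic_v2(order, atomic_set):
    """Any event lying between two members of the set is itself a member."""
    pos = {e: i for i, e in enumerate(order)}
    return all(e in atomic_set
               for a in atomic_set for b in atomic_set
               for e in order if pos[a] < pos[e] < pos[b])

remote_copies = {"rp0", "rp1"}
good_order = ["l", "rp0", "rp1", "e"]   # nothing separates the remote copies
bad_order = ["l", "rp0", "e", "rp1"]    # event e falls inside the atomic set
```

On a total order the two checks agree: both accept `good_order` and both reject `bad_order`.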
SC(ops) = Exists order. ( requireStrictTotalOrder ops order
/\ requireProgramOrder ops order
/\ requireReadValue ops order )
Execution 1 Execution 2
e.g., which execution is legal under which memory model ?
22
• Itanium memory model in Higher Order Logic (well, not so high actually… )
• Our HOL specs → translation → “sat-generating checker programs”
23
Transformation of HOL specs to generate constraints
Initial Spec:
atomicWBRelease(ops,order) =
  forall (i in ops).(j in ops).(k in ops).
    (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB) /\ (i.wrID = k.wrID) /\ order(i,j) /\ order(j,k)
    ==> (j.wrID = i.wrID)
Applying Contrapositive:
atomicWBRelease(ops,order) =
  forall (i in ops).(j in ops).(k in ops).
    (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB) /\ (i.wrID = k.wrID) /\ ~(j.wrID = i.wrID)
    ==> ~(order(i,j) /\ order(j,k))
After Reducing Quantifier Scopes:
atomicWBRelease(ops,order) =
  forall (i in ops). (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB)
    ==> forall (k in ops). (i.wrID = k.wrID)
    ==> forall (j in ops). ~(j.wrID = i.wrID)
    ==> ~(order(i,j) /\ order(j,k))
24
Functional (Ocaml) Program Derivation from HOL Specs:
Transformed Spec:
atomicWBRelease(ops,order) =
  forall (i in ops). (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB)
    ==> forall (k in ops). (i.wrID = k.wrID)
    ==> forall (j in ops). ~(j.wrID = i.wrID)
    ==> ~(order(i,j) /\ order(j,k))
Functional Program that generates the constraints (will be automated):
atomicWBRelease(ops) = forall(i,ops,wb(i))
wb(i) = if ~((attr_of i.var=WB) & (i.op=StRel) & (i.wrType=Remote)) then true else forall(k,ops,wb1(i,k))
wb1(i,k) = if ~(i.wrID=k.wrID) then true else forall(j,ops,wb2(i,k,j))
wb2(i,k,j) = if (j.wrID=i.wrID) then true else ~(order(i,j) & order(j,k))
forall(i,S, e(i)) = for all i in S : e(i) (* foldr( map (fn i -> e(i)) (S) (&), true) *)
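The nested wb / wb1 / wb2 structure maps directly onto three nested loops that emit one two-literal clause per surviving (i, k, j) triple. The sketch below is in Python rather than OCaml and is not the authors' generator: the `Op` record, its `attr` field (standing in for `attr_of i.var`), and the literal representation `("not", a, b)` for `~order(a,b)` are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical tuple record; field names loosely follow the spec on the slide.
Op = namedtuple("Op", "id op wrType attr wrID")

def atomic_wb_release(ops):
    """Return clauses [~order(i,j), ~order(j,k)] mirroring wb, wb1 and wb2."""
    clauses = []
    for i in ops:                                    # wb(i)
        if not (i.attr == "WB" and i.op == "StRel" and i.wrType == "Remote"):
            continue                                 # guard false: constraint is trivially true
        for k in ops:                                # wb1(i, k)
            if i.wrID != k.wrID:
                continue
            for j in ops:                            # wb2(i, k, j)
                if j.wrID == i.wrID:
                    continue
                # ~(order(i,j) /\ order(j,k))  ==  ~order(i,j) \/ ~order(j,k)
                clauses.append([("not", i.id, j.id), ("not", j.id, k.id)])
    return clauses

# Two remote copies of one st.rel (wrID 1) plus an unrelated load.
example_ops = [
    Op(id=0, op="StRel", wrType="Remote", attr="WB", wrID=1),
    Op(id=1, op="StRel", wrType="Remote", attr="WB", wrID=1),
    Op(id=2, op="Ld", wrType=None, attr="WB", wrID=None),
]
```

For `example_ops` the generator emits four clauses, each forbidding the load from sitting between two copies of the same store release.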
25
• Itanium memory model in Higher Order Logic (well, not so high actually… )
• Our HOL specs → translation → “sat-generating checker programs”
• Execution to be checked → translation by above program → Sat
26
P1: St a,1; Ld r1,a <1>; St b,r1 <1>;
P2: Ld.acq r2,b <1>; Ld r3,a <0>;
Have built a tool for tuple generation that addresses many details: (1) Expansion into tuples with variable address allocation
legalItanium(ops) = Exists order. ( requireStrictTotalOrder ops order
/\ requireOtherOrderItanium ops order
/\ requireReadValue ops order )
SC(ops) = Exists order. ( requireStrictTotalOrder ops order
/\ requireOtherOrderSC ops order
/\ requireReadValue ops order )
Example Execution
P1: st c,1 ; st d,2
P2: ld d,2 ; ld c,0
Break it down into “tuples”
• Store c viewed at P1 for modeling bypassing
• Store c viewed at P1 for modeling global visibility
• Store c viewed at P2 for modeling global visibility
• Store d viewed at P1 for modeling bypassing
• Store d viewed at P1 for modeling global visibility
• Store d viewed at P2 for modeling global visibility
• Ld d viewed at P2 for modeling read value
• Ld c viewed at P2 for modeling read value
8 tuples obtained
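The expansion into tuples can be sketched as follows. This is a minimal Python illustration, not the authors' tuple-generation tool; the tuple layout `(proc, pc, kind, addr, value, wrType, viewer)` and the labels "Local", "Remote" and "Read" are assumptions.

```python
def expand_to_tuples(programs):
    """programs: dict mapping processor name -> list of (kind, addr, value)."""
    procs = sorted(programs)
    tuples = []
    for proc in procs:
        for pc, (kind, addr, value) in enumerate(programs[proc]):
            if kind == "st":
                # One local copy (models bypassing) ...
                tuples.append((proc, pc, "st", addr, value, "Local", proc))
                # ... plus one globally visible copy per processor.
                for viewer in procs:
                    tuples.append((proc, pc, "st", addr, value, "Remote", viewer))
            else:
                # A load is viewed only at its own processor (models the read value).
                tuples.append((proc, pc, "ld", addr, value, "Read", proc))
    return tuples

execution = {
    "P1": [("st", "c", 1), ("st", "d", 2)],
    "P2": [("ld", "d", 2), ("ld", "c", 0)],
}
```

With two processors each store yields p+1 = 3 tuples and each load yields one, so `expand_to_tuples(execution)` produces the 8 tuples listed on the slide.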
28
Constraint Encoding Approach #1
n log n approach (“small domain” encoding)
• Attach a word w_t of 2 bits to each tuple t
• Tuple i before Tuple j --> Assert w_i < w_j
• StrictTotalOrder --> Assert that the w_t words are distinct
• Smaller # of Boolean Vars
• Much Harder SAT instances (abandoned for now)
Illustration on 4 tuples
Constraints to encode: requireStrictTotalOrder ops order, requireOtherOrder ops order, requireReadValue ops order
Boolean variables (two bits per tuple): x00 x01, x10 x11, x20 x21, x30 x31
For all i != j: (xi1, xi0) != (xj1, xj0)
A system of constraints with primitive constraint (xi1, xi0) < (xj1, xj0)
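A small Python sketch of the variable counts and of the primitive "less than" constraint on 2-bit words. The function names are hypothetical, and the general bit width (the smallest number of bits that gives n distinct words) is an assumption consistent with the 4-tuple illustration, not something stated on the slide.

```python
def small_domain_vars(n):
    """Boolean variables when each of n tuples gets a word wide enough for n values."""
    bits = max(1, (n - 1).bit_length())
    return n * bits

def eij_vars(n):
    """Boolean variables when every ordered pair of tuples gets its own variable."""
    return n * n

def word_less_than(hi_i, lo_i, hi_j, lo_j):
    """(x_i1, x_i0) < (x_j1, x_j0) written directly as a Boolean formula."""
    return (not hi_i and hi_j) or (hi_i == hi_j and not lo_i and lo_j)
```

For 4 tuples this gives 8 variables (x00 … x31) against 16 for a pairwise matrix, which is the "smaller number of Boolean variables" trade-off named above.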
29
Constraint Encoding Approach #2
n × n approach (“e_ij” encoding)
• Assign a matrix position m_ij for each pair of tuples t_i and t_j
• Tuple i before Tuple j --> Assert m_ij true
• StrictTotalOrder --> Assert Irreflexivity, Transitivity, Totality
• Larger # of Boolean Vars
• Easier SAT instances (being pursued now)
Illustration on 4 tuples
Constraints to encode: requireStrictTotalOrder ops order, requireOtherOrder ops order, requireReadValue ops order
A system of constraints with primitive constraint m_ij
Forall i : ~m_ii
Forall i != j : m_ij \/ m_ji
Forall i,j,k : m_ij /\ m_jk => m_ik
[Figure: 4 × 4 matrix with entry m_ij at row i, column j]
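The three strict-total-order axioms can be written out as CNF clauses and sanity-checked by brute force on a tiny instance. This Python sketch assumes a row-major variable numbering (`var(i, j, n) = i*n + j + 1`) and DIMACS-style signed-integer literals; both are illustrative choices, not details taken from the slides.

```python
from itertools import product

def var(i, j, n):
    """Assumed numbering: variable for 'tuple i before tuple j'."""
    return i * n + j + 1

def total_order_clauses(n):
    clauses = [[-var(i, i, n)] for i in range(n)]                        # irreflexivity
    clauses += [[var(i, j, n), var(j, i, n)]
                for i in range(n) for j in range(i + 1, n)]              # totality
    clauses += [[-var(i, j, n), -var(j, k, n), var(i, k, n)]
                for i in range(n) for j in range(n) for k in range(n)]   # transitivity
    return clauses

def count_models(clauses, num_vars):
    """Count satisfying assignments by enumeration (only viable for tiny n)."""
    count = 0
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count
```

For n = 3 the encoding has exactly 3! = 6 models, one per strict total order. The transitivity family contributes n³ clauses, which is why it dominates instance size as n grows.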
30
Table of Results (somewhat dated…)
SAT-instance generation time for n log n method
• Source-level optimizations
– Record known orderings (e.g., “i before j”) – these manifest as unit clauses
– Infer others (e.g., “not j before i”) – generate unit clauses for these too
– Prevent generating transitivity axioms that depend on “j before i”
• The use of incremental SAT can perhaps be directed by “functional scripts” that are automatically generated
• Use of Unsat cores to pinpoint errors
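The source-level optimizations can be illustrated on the pairwise encoding. This Python sketch is an assumption-laden reading of the bullets, not the authors' generator: `known_before` is a hypothetical set of pairs already known to be ordered, variables are numbered row-major, and a transitivity clause is skipped whenever one of its antecedents is already known to be false.

```python
from itertools import product

def var(i, j, n):
    """Assumed numbering: variable for 'tuple i before tuple j'."""
    return i * n + j + 1

def optimized_clauses(n, known_before):
    """Return (clauses, number of transitivity clauses skipped)."""
    known_false = {(j, i) for (i, j) in known_before}
    clauses = [[var(i, j, n)] for (i, j) in sorted(known_before)]     # recorded orderings
    clauses += [[-var(i, j, n)] for (i, j) in sorted(known_false)]    # inferred "not j before i"
    clauses += [[-var(i, i, n)] for i in range(n)]                    # irreflexivity
    clauses += [[var(i, j, n), var(j, i, n)]
                for i in range(n) for j in range(i + 1, n)]           # totality
    skipped = 0
    for i, j, k in product(range(n), repeat=3):
        if (i, j) in known_false or (j, k) in known_false:
            skipped += 1          # antecedent already false: clause would be redundant
            continue
        clauses.append([-var(i, j, n), -var(j, k, n), var(i, k, n)])
    return clauses, skipped

def count_models(clauses, num_vars):
    """Count satisfying assignments by enumeration (only viable for tiny n)."""
    return sum(
        all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses)
        for bits in product([False, True], repeat=num_vars)
    )
```

With n = 3 and tuple 0 known to precede tuple 1, six of the 27 transitivity clauses are skipped and the formula still has exactly the three total orders that place 0 before 1.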
44
Concluding Remarks
• Main source of complexity: the transitivity axiom
• “Lazy” methods for handling transitivity must be investigated
• Hybrid Sat encoding (partly n × n and partly n log n) can also help, as was the experience of Lahiri, Seshia, and Bryant
• Analyzing larger programs: – Somehow view program in terms of “basic blocks”
– Treat each basic block as super instruction
– If super-instruction unordered, no need to descend into basic block
• Exploit incremental Sat when same litmus tests are rerun
• Try modeling another weak memory model
45
Extra Slides
46
Unsat Core generation
• The CNF file generated by the sat-generating program is solved using zchaff.
• If SAT, then we get a satisfying assignment.
• First n*n variables in the assignment correspond to the n*n variables in our ordering. Can be used to output a valid ordering of the ops.
• If UNSAT, then need a way to find a “root-cause” for the illegality of the execution.
• We use unsatisfiable core generation to get to the root cause.
• An unsatisfiable core of an unsatisfiable Sat instance is a subset of the clauses of the formula whose conjunction is still UNSAT.
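In the SAT case, the ordering can be read off the first n*n variables. The Python sketch below assumes a row-major layout (variable i*n + j + 1 true means op i is before op j) and a model given as a list of signed integers; neither detail is confirmed by the slides.

```python
def decode_ordering(model, n):
    """Recover the op order from a satisfying assignment of the n*n ordering variables."""
    true_vars = {lit for lit in model if 0 < lit <= n * n}

    def successors(i):
        # Number of ops that op i is ordered before.
        return sum(1 for j in range(n) if i * n + j + 1 in true_vars)

    # In a strict total order the first op precedes n-1 others, the last precedes none.
    return sorted(range(n), key=successors, reverse=True)

# Assignment encoding "op 2, then op 0, then op 1" for n = 3.
sample_model = [-1, 2, -3, -4, -5, -6, 7, 8, -9]
```

Here `decode_ordering(sample_model, 3)` yields `[2, 0, 1]`.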
47
Generating Unsatisfiable Core
• Zchaff can be told to generate a resolution trace while checking for Sat.
• Zcore – tool that takes as input a CNF file and the resolution trace produced by zchaff and produces an unsatisfiable core.
• Zcore is available as part of zchaff.
• The unsatisfiable core is another CNF file with the reduced set of clauses.
• It can be fed back into zchaff/zcore to generate a potentially smaller unsatisfiable core.
• The process is repeated until a fixed point is reached.
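The "repeat until fixed point" loop can be sketched independently of the solver. In this Python illustration `extract_core` is a hypothetical callable standing in for one zchaff + zcore run, and `toy_core` is a brute-force stand-in used only so the example runs; it does not reflect how zcore works internally.

```python
from itertools import product

def is_sat(clauses):
    """Brute-force satisfiability check (only viable for tiny formulas)."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for bits in product([False, True], repeat=len(variables)):
        value = dict(zip(variables, bits))
        if all(any(value[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return True
    return False

def toy_core(clauses):
    """Stand-in for one core extraction: drop one clause if the rest stays UNSAT."""
    for idx in range(len(clauses)):
        rest = clauses[:idx] + clauses[idx + 1:]
        if not is_sat(rest):
            return rest
    return clauses

def shrink_to_fixed_point(clauses, extract_core):
    """Feed the core back into the extractor until it stops shrinking."""
    core = clauses
    while True:
        smaller = extract_core(core)
        if len(smaller) == len(core):
            return core
        core = smaller

formula = [[1], [-1], [2, 3], [-2]]   # UNSAT because of the first two clauses alone
```

`shrink_to_fixed_point(formula, toy_core)` converges to `[[1], [-1]]`, the two clauses that actually cause unsatisfiability.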
48
Mapping back to root-cause
• Clauses in the unsatisfiable core contain the ordering violation information in them
• Tool to home in on the root cause of the violation
• If the root cause is not something trivial, then the cause is usually a cycle of instructions. Each link in the cycle corresponds to an ordering requirement between the instructions involved.
• If a cycle exists, then Transitivity can be applied to show that Irreflexivity is not satisfied.
• Input to the tool to generate root cause:
– The original set of annotated machine instructions for all processors
– The default values stored in memory locations at the beginning of the execution
– Clause annotations for the clauses that form the unsatisfiable core
49
Root-cause cycle analysis algorithm
Each ReadValue rule generates a set of clauses.
From the annotations, find the tuples that come from the same ReadValue rule (two different ops will be involved in a rule)
– Extract the ops out of the annotations and get the corresponding instructions (using the proc and pc values)
From the data being used in the ld instruction and the default data value for the corresponding memory address, it can be seen if the effect of a store is being reflected in a load.
This way the dependency between a load and a store is established.
The above is done for all the ReadValue rules in the annotations
Ops (and the corresponding instructions) on both sides of an mf (memory fence) that form a link in the cycle are inferred based on ProgramOrder rule annotations and the pc values involved.
The other missing links in the violating cycle are also inferred based on the remaining ProgramOrder rule annotations.
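Once the links have been recovered from the annotations, finding the violating cycle is a graph search. The Python sketch below is an illustration under assumptions: links are given as hypothetical `(src, dst, rule)` triples, the instruction labels are made up, and a depth-first search is used, which is one way to find a cycle and not necessarily what the authors' tool does.

```python
def find_cycle(edges):
    """edges: list of (src, dst, rule). Return the edges of one cycle, or None."""
    graph = {}
    for src, dst, rule in edges:
        graph.setdefault(src, []).append((dst, rule))
    done = set()   # nodes fully explored without finding a cycle

    def dfs(node, path, on_path):
        on_path.add(node)
        for dst, rule in graph.get(node, []):
            edge = (node, dst, rule)
            if dst in on_path:
                # Close the cycle: keep the path edges from dst onward, plus this edge.
                sources = [e[0] for e in path]
                start = sources.index(dst) if dst != node else len(path)
                return path[start:] + [edge]
            if dst not in done:
                found = dfs(dst, path + [edge], on_path)
                if found:
                    return found
        on_path.discard(node)
        done.add(node)
        return None

    for root in list(graph):
        if root not in done:
            found = dfs(root, [], set())
            if found:
                return found
    return None

# Hypothetical links recovered from ProgramOrder and ReadValue annotations.
links = [
    ("P0: st y", "P0: ld x", "ProgramOrder"),
    ("P0: ld x", "P1: st x", "ReadValue"),
    ("P1: st x", "P1: ld y", "ProgramOrder"),
    ("P1: ld y", "P0: st y", "ReadValue"),
]
```

For `links` the search returns all four edges. Chaining them by transitivity would order "P0: st y" before itself, which is the irreflexivity violation reported as the root cause.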