Deductive Verification of Advanced Out-of-Order Microprocessors
Shuvendu K. Lahiri, Randal E. Bryant
Carnegie Mellon University
– 2 –
OOO Processor Model
[Figure: block diagram of the OOO processor. PC Unit (PC, epc) and Instruction Mem feed DECODE (src1, src2, dest, imm, type) and the Register Rename Unit; the Reorder Buffer (head, tail; fields valid, value, src1valid, src1val, src1tag, src2valid, src2val, src2tag, dest, type, pc, target, predict) connects to the Branch Predictor, Arithmetic Unit, Branch Unit, and Memory Unit (lsq, stq, Mem) through the Result Bus]
– 3 –
Complexity of Out-of-Order Processor Verification
Unbounded Data
  Integer data paths
Parameterized Computation
  Uninterpreted functions and predicates
  ALU, ExceptionRaise?, Decoding Logic
Unbounded Data Structures
  Memory
  Ordered data structures
Highly Concurrent
  Retire, execute, dispatch happen concurrently
Proving Sequential Semantics
  With respect to an Instruction Set Architecture (ISA)
– 5 –
Related Work
Deductive Methods
  Theorem-prover based
  Hosabettu et al. and Sawada et al.
  Large proof scripts
  Manual intervention to discharge the proofs
  Uses “flushing” technique
Compositional Model Checking Based
  McMillan et al.
  Does not apply to deep or superscalar processors
  Exploits symmetry in the design
  User decomposes the proof
  Does not need auxiliary invariants
– 6 –
Earlier Work
Lahiri, Seshia and Bryant, FMCAD’02
Modeling and Verification of Out-of-Order Processors
  Simple out-of-order execution unit
  Only arithmetic instructions
  All proof obligations handled by decision procedure for UCLID
– 7 –
This Work
Apply earlier work to more complex designs
  Handle speculation and exceptions
  Memory instructions, store forwarding, etc.
  Superscalar out-of-order processors
Can we model the new components in UCLID?
  Load-store queues, exceptions
Is refinement-based deductive verification feasible?
  Earlier deductive methods use the Burch-Dill technique
    Recursive “flushing” function
  Arons & Pnueli use “refinement” for simpler models
Can we retain the automation of proofs?
  Relieve the user from interactively proving theorems
– 8 –
Access Modes for Reorder Buffer
FIFO
  Insert when dispatch
  Remove when retire
Content Addressable
  Broadcast result to all entries with matching source tag
Directly Addressable
  Select particular entry for execution
  Retrieve result value from executed instruction
Global
  Flush all queue entries when instruction at head causes exception
[Figure: reorder buffer with Retire at the head and Dispatch at the tail, connected to the ALU (execute) and the result bus]
– 9 –
CLU: Logic of UCLID
Terms (T): Integer Expressions
  ITE(F, T1, T2)          If-then-else
  Fun(T1, …, Tk)          Function application
  succ(T)                 Increment
  pred(T)                 Decrement
Formulas (F): Boolean Expressions
  ¬F, F1 ∧ F2, F1 ∨ F2    Boolean connectives
  T1 = T2                 Equation
  T1 < T2                 Inequality
  P(T1, …, Tk)            Predicate application
Functions (Fun): Integers → Integer
  f                       Uninterpreted function symbol
  λ x1, …, xk . T         Function definition
Predicates (P): Integers → Boolean
  p                       Uninterpreted predicate symbol
  λ x1, …, xk . F         Predicate definition
– 10 –
Modeling Memories with λ’s
Memory M Modeled as Function
  M(a): value at location a
Initially
  Arbitrary state
  Modeled by uninterpreted function m0
Writing Transforms Memory
  M′ = Write(M, wa, wd)
  M′ = λ a . ITE(a = wa, wd, M(a))
  Future reads of address wa will get wd
[Figure: M′ drawn as a multiplexer on address a that selects wd when a = wa and M(a) otherwise; the initial memory is m0]
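The write rule can be tried out with ordinary closures. The sketch below is only an illustration in Python, not UCLID syntax: `initial_memory` is a hypothetical stand-in for the uninterpreted function m0 (it just returns some fixed, arbitrary value per address), and `write` mirrors λ a . ITE(a = wa, wd, M(a)).

```python
def initial_memory(seed=0):
    """Hypothetical stand-in for the uninterpreted m0: fixed but arbitrary contents."""
    return lambda a: hash((seed, a))


def write(M, wa, wd):
    """Return the memory lambda a. ITE(a = wa, wd, M(a))."""
    return lambda a: wd if a == wa else M(a)


m0 = initial_memory()
m1 = write(m0, 10, 111)   # store 111 at address 10
m2 = write(m1, 20, 222)   # store 222 at address 20

# Reads of written addresses see the written data; other reads fall through to m0.
read_10 = m2(10)
read_20 = m2(20)
read_30_unchanged = (m2(30) == m0(30))
```

Each write wraps the previous function rather than mutating a table, so the memory never needs a bound on its address range.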
– 11 –
Modeling Parallel Updates
Simultaneous-Update Memories
  Update arbitrary subset of entries at the same step
  Useful for modeling the Reorder Buffer
    Forwarding data to all dependent instructions
next[M] := λ i. ITE(P(i), D(i), M(i))
  If entry i satisfies a predicate P(i), it is updated with D(i)
[Figure: entries M(i), M(i+1), M(i+2), …, M(j), M(j+1), M(j+2), M(j+3) before the update; P is true at i+1, i+2, j+1, and j+3]
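A minimal Python sketch of the simultaneous update, assuming M, P, and D are plain functions over integer indices. The concrete `M`, `P`, and `D` below are made up for illustration only.

```python
def parallel_update(M, P, D):
    """next[M] := lambda i. ITE(P(i), D(i), M(i))"""
    return lambda i: D(i) if P(i) else M(i)


# Illustrative (hypothetical) contents, predicate, and new data.
M = lambda i: 100 + i            # current contents
P = lambda i: i in {1, 2, 6, 8}  # entries selected for update
D = lambda i: -i                 # data written to selected entries

next_M = parallel_update(M, P, D)
snapshot = [next_M(i) for i in range(4)]
```

All selected entries change in one step, which is what a result-bus broadcast to every matching reorder buffer entry needs.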
– 12 –
Modeling Parallel Updates (cont.)
[Figure: the same entries after the update: M(i), D(i+1), D(i+2), …, M(j), D(j+1), M(j+2), D(j+3); only the entries where P is true are replaced]
– 13 –
Modeling Unbounded FIFO Buffer
Queue is Subrange of Infinite Sequence
  Q.head = h: index of oldest element
  Q.tail = t: index of insertion location
  Q.val = q: function mapping indices to values
  q(i) valid only when h ≤ i < t
Initial State: Arbitrary Queue
  Q.head = h0, Q.tail = t0
    Impose constraint that h0 ≤ t0
  Q.val = q0
    Uninterpreted function
[Figure: infinite sequence …, q(h–2), q(h–1), q(h), q(h+1), …, q(t–2), q(t–1), q(t), q(t+1), … with increasing indices; entries before head are already popped, entries from tail onward are not yet inserted]
– 14 –
Modeling FIFO Buffer (cont.)
next[h] := ITE(operation = POP, succ(h), h)
next[q] := λ(i). ITE((operation = PUSH & i = t), x, q(i))
next[t] := ITE(operation = PUSH, succ(t), t)
[Figure: the queue before and after a step with op = PUSH and input = x; q(t) is replaced by x, next[t] moves one position past t, and next[h] stays at h]
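The three next-state assignments can be mirrored directly in Python. This is a sketch under assumptions: the `Queue` record, the `step` function, and the string operation names are illustrative, and integers stand in for the unbounded index domain.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Queue:
    head: int                  # index of oldest element
    tail: int                  # index of insertion location
    val: Callable[[int], int]  # maps indices to values


def step(Q, operation, x=None):
    """One transition, following the next[h], next[q], next[t] assignments."""
    h, t, q = Q.head, Q.tail, Q.val
    next_h = h + 1 if operation == "POP" else h
    next_t = t + 1 if operation == "PUSH" else t
    next_q = lambda i: x if (operation == "PUSH" and i == t) else q(i)
    return Queue(next_h, next_t, next_q)


def contents(Q):
    """Valid entries are exactly those with head <= i < tail."""
    return [Q.val(i) for i in range(Q.head, Q.tail)]


Q0 = Queue(5, 5, lambda i: 0)   # arbitrary initial queue with h0 <= t0
Q1 = step(Q0, "PUSH", 7)
Q2 = step(Q1, "PUSH", 9)
Q3 = step(Q2, "POP")
```

As in the assignments above, nothing here guards POP on an empty queue; a full model would constrain the operation so that head never passes tail.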
– 15 –
Modeling Components of Processors
Reorder Buffer
  FIFO
    Instructions in program order
  Parallel-update memory
    Update from an executed instruction
  Content addressable
Load-Store Queue
  FIFO
Store Queue
  FIFO
  Associative lookup by content
    Find the latest entry containing an address
  Flush part of the queue
    Do not flush retired instructions
– 16 –
Verification Approach
Extending the approach in FMCAD’02
  Worked with a simple OOO execution unit
  No speculation or memory
Deductive verification
– 17 –
Deductive Verification
δ is the state transition relation,
Q0 describes the initial states,
p is the property to be proved,
Φ is an inductive invariant, which implies p
  Prove Q0 ⇒ Φ
  Prove Φ ∧ δ ⇒ Φ′
  Prove Φ ⇒ p
⇒ p is proved
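The three proof obligations can be illustrated on a toy finite system by brute-force enumeration. This is only a sketch: UCLID discharges the obligations symbolically with a decision procedure, whereas the hypothetical `check_inductive` below simply enumerates states, and the counter example is invented for illustration.

```python
def check_inductive(states, init, delta, inv, prop):
    """Return the truth of the three obligations: base case, induction step, implication."""
    base = all(inv(s) for s in states if init(s))
    step = all(inv(s2)
               for s in states if inv(s)
               for s2 in states if delta(s, s2))
    implies = all(prop(s) for s in states if inv(s))
    return base, step, implies


# Toy system: a counter modulo 8 that starts at 0 and advances by 2.
states = range(8)
init = lambda s: s == 0
delta = lambda s, s2: s2 == (s + 2) % 8
prop = lambda s: s != 7           # property p to be proved

# p by itself is not inductive: state 5 satisfies p but steps to 7.
p_alone = check_inductive(states, init, delta, prop, prop)

# Strengthening to "s is even" gives an inductive invariant that implies p.
strengthened = check_inductive(states, init, delta, lambda s: s % 2 == 0, prop)
```

The failing induction step for `p_alone` comes from a state the system never reaches, which is the usual reason an invariant has to be strengthened.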
– 18 –
Restricted Invariants and Proofs
Invariants of the form ∀x1 ∀x2 … ∀xk Φ(x1…xk)
  Φ(x1…xk) is a CLU formula without quantifiers
  x1…xk are integer variables free in Φ(x1…xk)
Proving these invariants requires quantifiers
  |= (∀x1 ∀x2 … ∀xk Φ(x1…xk)) ⇒ ∀y1 ∀y2 … ∀ym Ψ(y1…ym)
Automatic instantiation of x1…xk with concrete terms
  Sound but incomplete method
  Reduce the quantified formula to a CLU formula
  Can use the decision procedure for CLU
– 19 –
Proving Correctness
Refinement Maps
  Establish relation between OOO and sequential ISA model
  A refinement map for each ISA-visible state element
    Register File
    Program Counter
    Data Memory
Example
  “If a register is not being modified in OOO, then it should have the same value as in the ISA”
– 20 –
Description of Verification
– 21 –
Auxiliary Data Structures
Shadow Fields
  “Predict” correct value for OOO state elements
  Updated during DISPATCH by ISA machine
Auxiliary Fields
  Needed to define a consistent internal state of OOO
  Do not depend on ISA machine
  Usually additional maps
– 22 –
Adding Shadow State
McMillan ’98, Arons & Pnueli ’99
Provides Link Between ISA & OOO Models
  Additional entries in ROB
    Do not affect OOO behavior
  Generated when instruction dispatched
  Predict values of operands and result
    From ISA model
[Figure: ISA model (Reg. File, PC) alongside the OOO model (Reg. File, PC, Reorder Buffer)]
– 23 –
Shadow States
Operands and Result of an Instruction
  Correct values
Shadow Register Rename Unit
  Latest non-speculative instruction to modify a register
Shadow Memory Address Map
  Latest non-speculative instruction to modify a memory address
– 24 –
Auxiliary Structures
Restricted Invariant Structure
  ∀x1 ∀x2 … ∀xk Φ(x1…xk)
Adding Complicated Invariants
  For every non-executed memory instruction I in ROB, there exists an entry in the Load-Store Queue (LSQ)
  Requires existential (∃) properties
Add auxiliary structure as witness for ∃
  Add a map rob_lsq_ptr : ROB → LSQ
  For every non-executed memory instruction I in ROB, rob_lsq_ptr(I) is present in LSQ
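A small Python sketch of why the witness map helps. The data layout is an assumption made for illustration: `rob` maps a tag to flags, `lsq` maps an LSQ index to the ROB tag stored there, and `rob_lsq_ptr` is the auxiliary map. The first check needs an inner search (the ∃); the second only follows the pointer, so it stays purely universal.

```python
# Hypothetical snapshot of the two structures.
rob = {
    3: {"is_mem": True,  "executed": False},
    4: {"is_mem": False, "executed": False},
    5: {"is_mem": True,  "executed": True},
}
lsq = {0: 3}            # LSQ index -> ROB tag of the instruction held there
rob_lsq_ptr = {3: 0}    # auxiliary witness map: ROB tag -> LSQ index


def pending_mem(rob):
    """Tags of non-executed memory instructions in the ROB."""
    return [t for t, e in rob.items() if e["is_mem"] and not e["executed"]]


def invariant_exists(rob, lsq):
    """For all I there EXISTS an LSQ entry for I (needs a search)."""
    return all(any(owner == t for owner in lsq.values()) for t in pending_mem(rob))


def invariant_witness(rob, lsq, ptr):
    """For all I, the entry named by ptr(I) is in the LSQ and belongs to I."""
    return all(ptr.get(t) in lsq and lsq[ptr[t]] == t for t in pending_mem(rob))


both_hold = invariant_exists(rob, lsq) and invariant_witness(rob, lsq, rob_lsq_ptr)
```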
– 26 –
Auxiliary Structures
rob_lsq_ptr : ROB → LSQ
  lsq_rob_ptr : LSQ → ROB already part of the model
rob_stq_ptr : ROB → STQ, stq_rob_ptr : STQ → ROB
  Need reverse maps
ld_stq_ptr : ROB → STQ
  For each Load instruction, the STQ entry that would forward data
– 29 –
Incremental Models
1. Basic Out-of-order execution unit (base)
     Reorder Buffer, Register Rename Unit
2. Exception Handling (exc)
     Arithmetic exceptions
3. Branch Prediction (exc/br)
4. Memory Instruction, Simple (exc/br/mem-simp)
     Stores commit during RETIRE
     Illegal Address exceptions
5. Memory Instruction (exc/br/mem)
     Stores commit sometime after RETIRE
– 30 –
Counterexamples
Strengthen Invariants
  Use counterexamples to (manually) strengthen the invariants
Example
  Invariant: ∀t ∈ ROB. ¬reg.valid(rob.dest(t))
Is the invariant inductive?
  Is it preserved by the transition function?
Counterexample
  rob.hd = 1, rob.tl = 10
  rob.valid[1] = true
  t = 5
  rob.dest[5] = r10
  reg.tag[r10] = 1
  reg.valid[r10] = false
  operation = retire
Strengthened invariant
  ∀t ∈ ROB. t ≤ reg.tag(rob.dest(t))
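The counterexample can be replayed in a few lines of Python. Several things here are assumptions rather than the model's actual definitions: the `retire` rule (the head instruction marks its destination register valid only if the register's tag still names it), the choice that entry 1 also writes r10, the filler destinations for the other entries, and the reading of the invariants as "no register awaited by a ROB entry is valid" and "each entry's tag is at most its destination register's tag".

```python
def rob_tags(s):
    """Tags currently in the ROB: hd <= t < tl."""
    return range(s["hd"], s["tl"])


def inv_weak(s):
    """Assumed reading: for all t in ROB, not reg.valid(rob.dest(t))."""
    return all(not s["reg_valid"][s["rob_dest"][t]] for t in rob_tags(s))


def inv_strong(s):
    """Assumed reading: for all t in ROB, t <= reg.tag(rob.dest(t))."""
    return all(t <= s["reg_tag"][s["rob_dest"][t]] for t in rob_tags(s))


def retire(s):
    """Hypothetical retire rule: write back only if the register still waits on the head."""
    hd = s["hd"]
    r = s["rob_dest"][hd]
    reg_valid = dict(s["reg_valid"])
    if s["reg_tag"][r] == hd:
        reg_valid[r] = True
    return {**s, "hd": hd + 1, "reg_valid": reg_valid}


# Counterexample state: entries 1 and 5 both write r10, but r10 is tagged with 1.
rob_dest = {t: ("r10" if t in (1, 5) else f"r{t}") for t in range(1, 10)}
reg_tag = {f"r{t}": t for t in range(2, 10) if t != 5}
reg_tag["r10"] = 1
state = {
    "hd": 1, "tl": 10,
    "rob_dest": rob_dest,
    "reg_tag": reg_tag,
    "reg_valid": {r: False for r in reg_tag},
}

holds_before = inv_weak(state)           # weak invariant holds in this state
holds_after = inv_weak(retire(state))    # but not after retiring entry 1
excluded = not inv_strong(state)         # the stronger invariant rules the state out
```

Under these assumptions the weak invariant holds before the step and fails after it, so it is not inductive, while the tag-ordering invariant already excludes the starting state.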
– 31 –
Misspeculation Invariants
Predict the instruction that would cause misspeculation
  Result of branch misprediction or exception
Shadow entry to keep track of this instruction
  shdw_exn_mpred_tag : tag in the ROB
  Gets updated from ISA machine during DISPATCH
  Reset during a “flush” of the OOO state
Invariants
  Earliest misspeculated instruction
  Instruction at shdw_exn_mpred_tag should raise an exception or be mispredicted
  Others
– 32 –
Ordering Invariants
Maintain Program Order in different data structures
  Reorder Buffer
  Load-Store Queue
  Store Queue
Often the source of complicated invariants
  For memory instructions I1, I2:
    Instruction I1 precedes I2 in Reorder Buffer iff I1 precedes I2 in Load-Store Queue
  If instruction I1 depends on instruction I2, then I2 precedes I1 in program order
– 33 –
Load-Store Invariants
Correct Value of a Load (r, A)
  If A present in STQ
    Value from STQ
  If shdw.mem_tag(A) in ROB and A not in STQ
    Value of the store
  Else
    Value from the memory
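The three-way case split reads naturally as a function. The sketch below makes assumptions about representation: `stq` is a list of (address, data) pairs from oldest to newest, the latest matching entry wins, `shdw_mem_tag` maps an address to the tag of the latest store to it, and `rob` maps tags to entries with a hypothetical `store_data` field.

```python
def load_value(A, stq, shdw_mem_tag, rob, mem):
    """Value a load from address A should observe, following the three cases."""
    matches = [d for (a, d) in stq if a == A]
    if matches:
        return matches[-1]              # A present in STQ: latest matching entry
    tag = shdw_mem_tag.get(A)
    if tag is not None and tag in rob:
        return rob[tag]["store_data"]   # pending store still in the ROB
    return mem[A]                       # otherwise: value from memory


# Illustrative (hypothetical) state.
stq = [(0x10, 1), (0x20, 2), (0x10, 3)]
shdw_mem_tag = {0x30: 7, 0x40: 2}
rob = {7: {"store_data": 70}}
mem = {0x10: -1, 0x20: -2, 0x30: -3, 0x40: -4}

from_stq = load_value(0x10, stq, shdw_mem_tag, rob, mem)
from_rob = load_value(0x30, stq, shdw_mem_tag, rob, mem)
from_mem = load_value(0x40, stq, shdw_mem_tag, rob, mem)
```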
– 34 –
Shadow Invariants
Relate Shadow Variables to State Variables
  ∀t ∈ ROB. [rob.valid(t) ⇒ rob.value(t) = shdw.value(t)]
  ∀t ∈ ROB. [rob.src1valid(t) ⇒ rob.src1val(t) = shdw.src1val(t)]
  ∀t ∈ ROB. [rob.src2valid(t) ⇒ rob.src2val(t) = shdw.src2val(t)]
– 35 –
Comparative Verification Effort
Proof script size substantially smaller
  67 KB as opposed to 1909 KB (Hosabettu et al.)
  Very little user intervention in discharging proofs
Instantiation of quantifiers
  Mostly automatic, a few manual for larger examples

                       base     exc      exc/br   exc/br/mem-simp   exc/br/mem
Total invariants       13       34       39       67                71
Manually instantiated  0        0        0        4                 8
UCLID time             54 s     236 s    403 s    1594 s            2200 s
Person time            2 days   5 days   2 days   15 days           10 days
– 36 –
Going Superscalar
Superscalar
  Dispatch 0…d instructions at each step
  Retire 0…r instructions at each step
Complex Control Logic
  Additional forwarding in DISPATCH window
  Additional forwarding in RETIRE window
Extended the base model
– 37 –
Statistics for Superscalar Models
Does not require any change to proof script
  Complicates control logic but the invariants still hold
Scales well with increasing width
  Almost linear with the (Dispatch*Retire) width
  Instantiation considers terms in (Dispatch + Retire) window

Dispatch width   Retire width   #-instant   Time (sec)
2                1              12          86.63
2                2              28          137.43
2                4              88          308.55
2                8              304         1040.60
– 38 –
Conclusion
Case study of complex processors in UCLID
  CLU expressive enough to model advanced features
Reasonable automation in discharging proofs
  Use of automatic decision procedures
  Quantification strategy robust
Need to generate invariants
  Using Predicate Abstraction
  Automatically constructed invariant for OOO-base model given the predicates
Improve desirability for deductive methods
– 39 –
Modeling Circular Queues
[Figure: circular buffer occupying indices H0 through T0, with head and tail pointers]
next[head] := case (operation = POP) : succ’(head) ; default : head ; esac
next[tail] := case (operation = PUSH) : succ’(tail) ; default : tail ; esac
succ’ := Lambda x. case x = T0 : H0 ; default : succ(x) ; esac ;
next[content] := Lambda i. case (operation = PUSH) & (i = tail) : D ; default : content(i) ; esac
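The wrap-around successor is the only difference from the unbounded queue. A Python sketch, with concrete bounds `H0 = 0` and `T0 = 3` chosen purely for illustration and a hypothetical `step` function mirroring the assignments above:

```python
H0, T0 = 0, 3   # illustrative bounds of the circular index range


def succ_wrap(x):
    """succ' : wrap from T0 back to H0, otherwise increment."""
    return H0 if x == T0 else x + 1


def step(state, operation, D=None):
    """One transition of (head, tail, content) following the case expressions."""
    head, tail, content = state
    next_head = succ_wrap(head) if operation == "POP" else head
    next_tail = succ_wrap(tail) if operation == "PUSH" else tail
    next_content = lambda i: D if (operation == "PUSH" and i == tail) else content(i)
    return next_head, next_tail, next_content


s0 = (0, 3, lambda i: None)
s1 = step(s0, "PUSH", 42)   # writes index 3, tail wraps to H0
s2 = step(s1, "POP")        # head advances without wrapping
```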
– 40 –
Store Queue
Content Addressable
  Look for an address
  Same address at multiple indices
Latest Match
  Latest index that matches address
Partial Flush
  Remove entries after an index
[Figure: store queue with an Address column A(·) and a Data column d(·) over indices h–2 … t+1, with head h, tail t, and an index r between them separating retired entries from speculative ones]
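A Python sketch of the two store-queue operations. It rests on assumptions: `A` is the address function over indices, valid entries lie in h ≤ i < t, and `r` is taken to be the index of the last retired entry, so a flush never removes indices up to r. Both function names are hypothetical.

```python
def latest_match(addr, A, h, t):
    """Latest index i with h <= i < t and A(i) == addr, or None if there is none."""
    for i in range(t - 1, h - 1, -1):
        if A(i) == addr:
            return i
    return None


def partial_flush(h, t, r, k):
    """Remove entries after index k, but keep retired entries (assumed: indices <= r)."""
    keep = max(k, r)
    return h, min(t, keep + 1)


# Illustrative queue: the same address 100 appears at indices 4 and 6.
addresses = {4: 100, 5: 200, 6: 100, 7: 300}
A = addresses.get
h, t, r = 4, 8, 5

hit = latest_match(100, A, h, t)
miss = latest_match(999, A, h, t)
h2, t2 = partial_flush(h, t, r, 6)       # drops index 7
after_flush = latest_match(300, A, h2, t2)
```

Searching from the tail downward returns the youngest store to the address, which is the entry a later load should forward from.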
– 41 –
Store Queue (cont.)
Content Addressable
  Look for an address
  Same address at multiple indices
[Figure: the same store queue, with entries labeled A1, A2, and A3 marked in the Address column]
– 42 –
Quantifier Instantiation
Prove
  |= (∀x1 ∀x2 … ∀xk Φ(x1…xk)) ⇒ ∀y1 ∀y2 … ∀ym Ψ(y1…ym)
1. Introduce Skolem constants (y*1, …, y*m)
     |= (∀x1 ∀x2 … ∀xk Φ(x1, …, xk)) ⇒ Ψ(y*1, …, y*m)
2. Instantiate x1, …, xk with concrete terms
     Assume single-arity functions and predicates
     Let Fx = {f | f(x) is a sub-expression of Φ(x1…xk)}
     Let Tf = {t | f(t) is a sub-expression of Ψ(y*1…y*m)}
     For each bound variable x, Ax = {t | f ∈ Fx and t ∈ Tf}
     Instantiate over Ax1 × Ax2 × … × Axk
Formula size grows exponentially with the number of bound variables
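The instantiation sets can be computed with a short traversal. The sketch below is an illustration in Python, not UCLID's implementation: formulas are nested tuples, a single-arity application f(t) is written `("app", f, t)`, and the example formulas `phi` and `psi` are invented.

```python
from itertools import product


def subexprs(e):
    """Yield e and all of its sub-expressions."""
    yield e
    if isinstance(e, tuple):
        for child in e[1:]:
            yield from subexprs(child)


def funcs_applied_to(var, phi):
    """Fx = {f | f(x) is a sub-expression of phi}."""
    return {e[1] for e in subexprs(phi)
            if isinstance(e, tuple) and e[0] == "app" and e[2] == var}


def args_of(f, psi):
    """Tf = {t | f(t) is a sub-expression of psi}."""
    return {e[2] for e in subexprs(psi)
            if isinstance(e, tuple) and e[0] == "app" and e[1] == f}


def instantiation_sets(bound_vars, phi, psi):
    """Ax = {t | f in Fx and t in Tf} for each bound variable x."""
    return {x: {t for f in funcs_applied_to(x, phi) for t in args_of(f, psi)}
            for x in bound_vars}


def instantiations(bound_vars, phi, psi):
    """All assignments in the product Ax1 x ... x Axk."""
    A = instantiation_sets(bound_vars, phi, psi)
    pools = [sorted(A[x], key=repr) for x in bound_vars]
    return [dict(zip(bound_vars, combo)) for combo in product(*pools)]


# Hypothetical antecedent over x1, x2 and Skolemized consequent over y*.
phi = ("=>", ("app", "valid", "x1"),
             ("=", ("app", "value", "x1"), ("app", "shdw", "x2")))
psi = ("and", ("app", "valid", "y*"),
              ("=", ("app", "value", ("succ", "y*")), ("app", "shdw", "y*")))

A = instantiation_sets(["x1", "x2"], phi, psi)
all_insts = instantiations(["x1", "x2"], phi, psi)
```

The number of instantiations is the product of the set sizes, which is where the exponential growth in the number of bound variables comes from.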