Implementing for Correct Concurrency
Nirav Dave
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
March 9, 2011 (L11)
http://csg.csail.mit.edu/6.375
Feb 23, 2016
Dealing with Conflicts
When do conflicts arise?
How do we analyze them?
How do we fix them?
How do we make sure we’re okay?
SFIFO
    interface SFIFO#(type t, type tr, type v);
        method Action    enq(t x);     // enqueue an item
        method Action    deq();        // remove oldest entry
        method t         first();      // inspect oldest item
        method Action    clear();      // make FIFO empty
        method Maybe#(v) find(tr r);   // search FIFO
    endinterface
n = # of bits needed to represent the values of type “t”
m = # of bits needed to represent the values of type “tr”
v = # of bits needed to represent the values of type “v”
[Figure: SFIFO module ports. enq: n-bit data in, enab, rdy (not full). deq: enab, rdy (not empty). first: n-bit data out, rdy (not empty). clear: enab. find: m-bit key in, Bool and v-bit value out.]
Processor Example
[Figure: CPU with stages fetch, decode, execute, memory, write-back, plus pc, rf, iMem, and dMem]
5-stage processor, with 1-element FIFOs between stages
Let’s add bypassing
Decode Rule
    rule decode (!newStallFunc(instr, d2eQ, e2mQ, m2wQ));
        let fetInst = f2dQ.first();
        f2dQ.deq();
        match {.ra, .rb} = getRARB(fetInst);

        // Bypassing: start from the register file, then prefer m2wQ,
        // then e2mQ (the newest value). fromMaybe takes the default first.
        let va0 = rf.sub(ra);
        let va1 = fromMaybe(va0, m2wQ.find(ra));
        let va2 = fromMaybe(va1, e2mQ.find(ra));

        let vb0 = rf.sub(rb);
        let vb1 = fromMaybe(vb0, m2wQ.find(rb));
        let vb2 = fromMaybe(vb1, e2mQ.find(rb));

        let newInst = case (fetInst) matches
                          Add: return (DAdd .va2 .vb2);
                          …
                      endcase;
        d2eQ.enq(newInst);
    endrule
When do we want it to execute?
Decode is also correct any time it’s allowed to execute.
Search through each place in the design.
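The bypass priority in the decode rule can be sketched outside BSV. The following is a minimal Python model, not the lecture's code: it assumes each one-element FIFO is represented as either None (empty) or a hypothetical (dest_reg, value) pair, and that `bypass_read` is an illustrative name.

```python
from typing import Optional, Tuple, Dict

Entry = Optional[Tuple[int, int]]  # (destination register, value), or None if empty

def bypass_read(reg: int, rf: Dict[int, int], m2w: Entry, e2m: Entry) -> int:
    """Return the newest available value for `reg`."""
    value = rf[reg]                          # oldest: the committed register file
    if m2w is not None and m2w[0] == reg:    # newer: instruction in the memory stage output
        value = m2w[1]
    if e2m is not None and e2m[0] == reg:    # newest: instruction in the execute stage output
        value = e2m[1]
    return value

rf = {1: 10, 2: 20}
print(bypass_read(1, rf, m2w=(1, 11), e2m=(1, 12)))  # 12
print(bypass_read(1, rf, m2w=(1, 11), e2m=None))     # 11
print(bypass_read(2, rf, m2w=(1, 11), e2m=(1, 12)))  # 20
```

The later check overrides the earlier one, which mirrors the nested fromMaybe calls: e2mQ wins over m2wQ, which wins over the register file.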
Some insight into concurrent rule firing
There are more intermediate states in the rule semantics (a state after each rule step).
In the HW, states change only at clock edges.
[Figure: timeline comparing rule steps (Ri, Rj, Rk) in the rule semantics with HW clocks (Ri; Rj and Rk together)]
Parallel execution reorders reads and writes
In the rule semantics, each rule sees (reads) the effects (writes) of previous rules.
In the HW, rules only see the effects from previous clocks, and only affect subsequent clocks.
[Figure: timeline comparing reads and writes per rule step (Rules) with reads and writes per clock (HW)]
Correctness
Rules are allowed to fire in parallel only if the net state change is equivalent to sequential rule execution.
Consequence: the HW can never reach a state unexpected in the rule semantics.
[Figure: timeline comparing rule steps (Ri, Rj, Rk) in the rule semantics with HW clocks (Ri; Rj and Rk together)]
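The correctness condition can be checked by brute force on a toy example. This is a Python sketch under stated assumptions: rules are modeled as hypothetical functions from a state dictionary to register updates, and the check runs on one starting state only, whereas a real compiler reasons statically about all states.

```python
from itertools import permutations

def fire_sequential(state, rules):
    """Rule semantics: each rule reads the writes of the rules before it."""
    s = dict(state)
    for rule in rules:
        s.update(rule(s))
    return s

def fire_parallel(state, rules):
    """One HW clock: every rule reads the state from the previous clock edge."""
    updates = {}
    for rule in rules:
        updates.update(rule(state))
    return {**state, **updates}

def parallel_matches_some_order(state, rules):
    """True if the parallel result equals sequential execution in some order."""
    parallel = fire_parallel(state, rules)
    return any(fire_sequential(state, order) == parallel
               for order in permutations(rules))

# Hypothetical rules for illustration.
def ri(s): return {"x": s["y"] + 1}   # x <= y + 1
def rj(s): return {"z": s["x"] + 1}   # z <= x + 1
def ra(s): return {"x": s["y"]}       # x <= y
def rb(s): return {"y": s["x"]}       # y <= x

start = {"x": 1, "y": 2, "z": 0}
print(parallel_matches_some_order(start, [ri, rj]))  # True: same as Rj then Ri
print(parallel_matches_some_order(start, [ra, rb]))  # False: a swap matches no order
```

In the first pair, firing both rules in one clock looks like Rj followed by Ri, so it is allowed. The swap has no equivalent sequential order, so the two rules must not fire together.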
Upshot
Given the concurrency of methods/rules in a system, we can determine viable schedules.
Some variation due to applicability.
BUT we know what schedule we want (mostly).
We should be able to back-propagate results to submodules.
Determining Concurrency Properties
Processor: Concurrencies
In-order: F < D < E < M < W
Pipelined: W < M < E < D < F
[Figure: CPU with stages fetch, decode, execute, memory, write-back, plus pc, rf, iMem, and dMem]
Concurrency requirements for Full Pipelining – Reg File
In-Order RF: (D calls sub) < (W calls upd)
Pipelined RF: (W calls upd) < (D calls sub)
[Figure: CPU with stages fetch, decode, execute, memory, write-back, plus pc, rf, iMem, and dMem]
Concurrency requirements for Full Pipelining – FIFOs
In-Order FIFOs:
  1. m2wQ, e2mQ: find < enq < first < deq
  2. d2eQ: find < enq < first < deq, clear
Pipeline FIFOs:
  3. m2wQ, e2mQ: first < deq < enq < find
  4. d2eQ: first < deq < find < enq
[Figure: CPU with stages fetch, decode, execute, memory, write-back, plus pc, rf, iMem, and dMem]
Constructing Appropriately concurrent submodules
From Analysis to Design
We need to create modules which behave as needed.
Construct modules using “unsafe” primitives to have “safe” behaviors.
Three major concepts:
  Use primitives which remove “false” concurrency orderings (e.g. ConfigRegs vs. Regs)
  Add RWires for forwarding values intra-cycle
  Reason carefully to assure that execution appears “atomic”
ConfigReg and RWire
mkReg requires that read < write.
mkConfigReg is a Reg without this restriction.
  Allows us to read stale values (dangerous)
RWire is a “wire”
  wset :: a -> Action    writes a value
  wget :: Maybe#(a)      returns the written value if a write happened
  wset happens before wget each cycle
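The RWire behavior described above can be modeled in a few lines. This is a Python sketch, not BSV: the `clock_edge` method and the (is_valid, value) tuple standing in for Maybe#(a) are assumptions made for illustration.

```python
class RWire:
    """Carries a value only within the cycle in which wset was called."""
    _EMPTY = object()  # sentinel meaning "nothing written this cycle"

    def __init__(self):
        self._value = RWire._EMPTY

    def wset(self, value):
        self._value = value

    def wget(self):
        """Return (True, value) if wset ran this cycle, else (False, None)."""
        if self._value is RWire._EMPTY:
            return (False, None)
        return (True, self._value)

    def clock_edge(self):
        """A wire holds no state, so the value is gone in the next cycle."""
        self._value = RWire._EMPTY

w = RWire()
print(w.wget())   # (False, None)
w.wset(7)
print(w.wget())   # (True, 7)
w.clock_edge()
print(w.wget())   # (False, None)
```

Because wset is always scheduled before wget within a cycle, a method that calls wget observes what another method did earlier in the same cycle. That is the forwarding mechanism the later modules rely on.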
Let’s implement some modules
Processor Redux
In-order: F < D < E < M < W
Pipelined: W < M < E < D < F
[Figure: CPU with stages fetch, decode, execute, memory, write-back, plus pc, rf, iMem, and dMem]
Concurrency: RegFile
The standard library RegFile is implemented with the concurrency (sub < upd).
This handles the in-order case.
We need to build a RegFile for the pipelined case.
BypassRegFile
    import RegFile::*;

    module mkBypassRegFile#(a l, a h)(RegFile#(a,d))
        provisos (Bits#(a,asz), Bits#(d,dsz), Eq#(a));
        RegFile#(a,d)        rfInt    <- mkRegFileWCF(l, h);
        RWire#(Tuple2#(a,d)) curWrite <- mkRWire();

        method Action upd(a x, d v);
            rfInt.upd(x, v);
            curWrite.wset(tuple2(x, v));   // forward this cycle's write
        endmethod

        method d sub(a x);
            case (curWrite.wget()) matches
                tagged Valid {.wa, .wd} &&& (wa == x): return wd;
                default: return rfInt.sub(x);
            endcase
        endmethod
    endmodule
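To see what the bypass buys, here is a small Python model of the same idea. It is a sketch under assumptions: `clock_edge` is a hypothetical method that commits the pending write, and `cur_write` plays the role of the curWrite RWire.

```python
class BypassRegFile:
    """Register file where a read sees a write made earlier in the same cycle."""

    def __init__(self, size):
        self.regs = [0] * size
        self.cur_write = None          # (addr, value) written this cycle, if any

    def upd(self, addr, value):
        self.cur_write = (addr, value)

    def sub(self, addr):
        if self.cur_write is not None and self.cur_write[0] == addr:
            return self.cur_write[1]   # bypass: forward the in-flight write
        return self.regs[addr]

    def clock_edge(self):
        if self.cur_write is not None:
            addr, value = self.cur_write
            self.regs[addr] = value    # the write lands at the clock edge
            self.cur_write = None

rf = BypassRegFile(4)
rf.upd(2, 99)      # write-back stage
print(rf.sub(2))   # 99: decode sees the new value in the same cycle (upd < sub)
print(rf.sub(1))   # 0
rf.clock_edge()
print(rf.sub(2))   # 99: now read from the array itself
```

Without the forwarding path, the read in the first cycle would return the stale 0, which corresponds to the sub < upd ordering of the in-order register file.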
One Element SFIFO (Naïve)
    module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v))
        provisos (Bits#(t, tsz));
        Reg#(t)    data <- mkRegU();
        Reg#(Bool) full <- mkReg(False);

        method Action enq(t x) if (!full);
            full <= True; data <= x;
        endmethod
        method Action deq() if (full);
            full <= False;
        endmethod
        method t first() if (full);
            return data;
        endmethod
        method Maybe#(v) find(tr r);
            return full ? findf(r, data) : tagged Invalid;
        endmethod
        // clear() is not shown on the slide
    endmodule
Concurrency: find < first < (enq C deq), where C marks a conflict
One Element SFIFO (In-Order d2eQ #1)
    import ConfigReg::*;

    module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v))
        provisos (Bits#(t, tsz));
        Reg#(t)    data <- mkConfigRegU();
        Reg#(Bool) full <- mkConfigReg(False);
        RWire#(t)  enqv <- mkRWire();

        method Action enq(t x) if (!full);
            full <= True; data <= x;
            enqv.wset(x);
        endmethod
        method Action deq() if (full || isValid(enqv.wget()));
            full <= False;
        endmethod
        method t first() if (full);
            return data;
        endmethod
        method Maybe#(v) find(tr r);
            return full ? findf(r, data) : tagged Invalid;
        endmethod
    endmodule
Concurrency: find < first < enq < deq
One Element SFIFO (In-Order e2mQ, m2wQ #2)
    import ConfigReg::*;

    module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v))
        provisos (Bits#(t, tsz));
        Reg#(t)    data <- mkRegU();
        Reg#(Bool) full <- mkConfigReg(False);
        RWire#(t)  enqv <- mkRWire();

        method Action enq(t x) if (!full);
            full <= True; data <= x;
            enqv.wset(x);
        endmethod
        method Action deq() if (full || isValid(enqv.wget()));
            full <= False;
        endmethod
        method t first() if (full || isValid(enqv.wget()));
            // prefer the value enqueued this cycle; otherwise the stored one
            return fromMaybe(data, enqv.wget());
        endmethod
        method Maybe#(v) find(tr r);
            return full ? findf(r, data) : tagged Invalid;
        endmethod
    endmodule
Concurrency: find < enq < first < deq
One Element Searchable SFIFO (Pipelined #3)
    import ConfigReg::*;

    module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v))
        provisos (Bits#(t, tsz));
        Reg#(t)      data <- mkConfigRegU();
        Reg#(Bool)   full <- mkConfigReg(False);
        RWire#(void) deqw <- mkRWire();
        RWire#(t)    enqw <- mkRWire();

        method Action enq(t x) if (!full || isValid(deqw.wget()));
            full <= True; data <= x;
            enqw.wset(x);
        endmethod
        method Action deq() if (full);
            full <= False;
            deqw.wset(?);
        endmethod
        method t first() if (full);
            return data;
        endmethod
        method Maybe#(v) find(tr r);
            // search the stored entry unless it was dequeued this cycle;
            // otherwise search the entry enqueued this cycle, if any
            return (full && !isValid(deqw.wget())) ? findf(r, data)
                 : isValid(enqw.wget())            ? findf(r, fromMaybe(?, enqw.wget()))
                 :                                   tagged Invalid;
        endmethod
    endmodule
Concurrency: first < deq < enq < find
One Element Searchable SFIFO (Pipelined #4)
    import ConfigReg::*;

    module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v))
        provisos (Bits#(t, tsz));
        Reg#(t)      data <- mkConfigRegU();
        Reg#(Bool)   full <- mkConfigReg(False);
        RWire#(void) deqw <- mkRWire();

        method Action enq(t x) if (!full || isValid(deqw.wget()));
            full <= True; data <= x;
        endmethod
        method Action deq() if (full);
            full <= False;
            deqw.wset(?);
        endmethod
        method t first() if (full);
            return data;
        endmethod
        method Maybe#(v) find(tr r);
            return (full && !isValid(deqw.wget())) ? findf(r, data) : tagged Invalid;
        endmethod
    endmodule
Concurrency: first < deq < find < enq
One Element Searchable SFIFO (Pipelined #4)
    import ConfigReg::*;

    module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v))
        provisos (Bits#(t, tsz));
        Reg#(t)      data  <- mkRegU();
        Reg#(Bool)   full  <- mkConfigReg(False);
        RWire#(void) deqEN <- mkRWire();
        Bool deqp = isValid(deqEN.wget());   // True if deq fired this cycle

        method Action enq(t x) if (!full || deqp);
            full <= True; data <= x;
        endmethod
        method Action deq() if (full);
            full <= False;
            deqEN.wset(?);
        endmethod
        method t first() if (full);
            return data;
        endmethod
        method Maybe#(v) find(tr r);
            return (full && !deqp) ? findf(r, data) : tagged Invalid;
        endmethod
    endmodule
Concurrency: first < deq < find < enq
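The ordering first < deq < find < enq can be exercised with a cycle-level Python model. This is only a sketch: `clock_edge`, `_deq_called`, and `_enq_value` are hypothetical names, with `_deq_called` standing in for the deq RWire, and `findf` returns None in place of an invalid Maybe.

```python
class PipelineFifo1:
    """One-element FIFO whose methods are called in the order first, deq, find, enq."""

    def __init__(self, findf):
        self.findf = findf          # findf(key, item) -> result or None
        self.full = False
        self.data = None
        self._deq_called = False    # set when deq fires in the current cycle
        self._enq_value = None      # pending enqueue, stored as a 1-tuple

    def first(self):
        assert self.full, "first is not ready"
        return self.data

    def deq(self):
        assert self.full, "deq is not ready"
        self._deq_called = True

    def find(self, key):
        # An entry dequeued earlier in this cycle is no longer visible.
        if self.full and not self._deq_called:
            return self.findf(key, self.data)
        return None

    def enq(self, item):
        # Ready when empty, or when the slot is being freed this cycle.
        assert (not self.full) or self._deq_called, "enq is not ready"
        self._enq_value = (item,)

    def clock_edge(self):
        if self._enq_value is not None:
            self.full, self.data = True, self._enq_value[0]
        elif self._deq_called:
            self.full = False
        self._deq_called = False
        self._enq_value = None

fifo = PipelineFifo1(lambda key, item: item[1] if item[0] == key else None)
fifo.enq(("r1", 10))
fifo.clock_edge()

oldest = fifo.first()      # cycle 2: consumer side
fifo.deq()
hit = fifo.find("r1")      # the entry was dequeued earlier in this cycle
fifo.enq(("r2", 20))       # producer refills the slot in the same cycle
fifo.clock_edge()
print(oldest, hit, fifo.first())  # ('r1', 10) None ('r2', 20)
```

The second cycle shows the point of the design: a one-element FIFO sustains one deq and one enq per cycle, so the pipeline has no dead cycle.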
Up-Down Counter
Counter Module Interface
    interface Counter;
        method Action up();
        method Action down();
        method int    _read();
    endinterface
Concurrency: up and down should be independent
Naïve Counter Example
    module mkCounter(Counter);
        Reg#(int) r <- mkReg(0);

        method int _read(); return r; endmethod
        // up and down both read and write r, so they cannot fire in the same cycle
        method Action up();   r <= r + 1; endmethod
        method Action down(); r <= r - 1; endmethod
    endmodule
Counter Example
    import ConfigReg::*;

    module mkCounter(Counter);
        Reg#(int)    r     <- mkConfigReg(0);
        RWire#(void) upW   <- mkRWire();
        RWire#(void) downW <- mkRWire();

        method int _read(); return r; endmethod
        method Action up();   upW.wset(?);   endmethod
        method Action down(); downW.wset(?); endmethod

        // A single rule performs the only write to r, combining both requests
        rule updateR (True);
            r <= r + (isValid(upW.wget())   ? 1 : 0)
                   - (isValid(downW.wget()) ? 1 : 0);
        endrule
    endmodule
What if we want to call up and then _read?
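A short Python model makes the closing question concrete. It is a sketch with hypothetical names (`read`, `clock_edge`, and the two flags that stand in for upW and downW), assuming the update rule fires once per cycle.

```python
class Counter:
    """Up/down counter where up and down only record requests during the cycle."""

    def __init__(self):
        self.r = 0
        self._up = False      # models upW
        self._down = False    # models downW

    def read(self):
        return self.r         # reads the register, not the pending requests

    def up(self):
        self._up = True

    def down(self):
        self._down = True

    def clock_edge(self):
        # Models rule updateR: the single write to r combines both requests.
        self.r += int(self._up) - int(self._down)
        self._up = self._down = False

c = Counter()
c.up()
c.down()
c.clock_edge()
print(c.read())          # 0: up and down in the same cycle cancel out

c.up()
stale = c.read()         # called after up, but in the same cycle
c.clock_edge()
print(stale, c.read())   # 0 1
```

In this model a read issued after up in the same cycle still returns the old value. Making the read observe the increment would require the read to consult the up request as well, which is a different concurrency ordering than the one built here.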
Completion Buffer
Completion buffer: Interface
    interface CBuffer#(type t);
        method ActionValue#(Token) getToken();
        method Action              put(Token tok, t d);
        method ActionValue#(t)     getResult();
    endinterface

    typedef Bit#(TLog#(n)) TokenN#(numeric type n);
    typedef TokenN#(16) Token;
[Figure: cbuf block with methods getToken, put (result & token), and getResult]
IP-Lookup module with the completion buffer
    module mkIPLookup(IPLookup);
        rule recirculate … ;
        rule exit … ;
        method Action enter(IP ip);
            Token tok <- cbuf.getToken();
            ram.req(ip[31:16]);
            fifo.enq(tuple2(tok, ip[15:0]));
        endmethod
        method ActionValue#(Msg) getResult();
            let result <- cbuf.getResult();
            return result;
        endmethod
    endmodule
[Figure: IP-lookup datapath with enter, getToken, RAM, fifo, a done? test (yes: to cbuf; no: recirculate), and getResult]
For enter and getResult to execute simultaneously, cbuf.getToken and cbuf.getResult must execute simultaneously.
IP Lookup rules with completion buffer
    rule recirculate (!isLeaf(ram.peek()));
        match {.tok, .rip} = fifo.first();
        fifo.enq(tuple2(tok, (rip << 8)));
        ram.req(ram.peek() + rip[15:8]);
        fifo.deq();
        ram.deq();
    endrule

    rule exit (isLeaf(ram.peek()));
        // put takes the token as well as the result
        cbuf.put(tpl_1(fifo.first()), ram.peek());
        fifo.deq();
        ram.deq();
    endrule
For rule exit and method enter to execute simultaneously, cbuf.put and cbuf.getToken must execute simultaneously.
For no dead cycles, cbuf.getToken, cbuf.put, and cbuf.getResult must be able to execute simultaneously.
Naïve Completion Buffer
    import Vector::*;
    import RegFile::*;

    // Max and nextPointer are not defined on the slide
    module mkCBuffer(CBuffer#(t)) provisos (Bits#(t, tsz));
        Vector#(16, Reg#(Bool)) valids <- replicateM(mkReg(False));
        RegFile#(Token, t)      data   <- mkRegFileFull();
        Reg#(Token) rdP <- mkReg(0);
        Reg#(Token) wrP <- mkReg(0);
        Reg#(Token) cnt <- mkReg(0);

        method ActionValue#(Token) getToken() if (cnt < Max);
            cnt <= cnt + 1;
            rdP <= nextPointer(rdP);
            valids[rdP] <= False;
            return rdP;
        endmethod
        method Action put(Token tok, t d);
            valids[tok] <= True;
            data.upd(tok, d);
        endmethod
        method ActionValue#(t) getResult() if (valids[wrP]);
            cnt <= cnt - 1;
            wrP <= nextPointer(wrP);
            return data.sub(wrP);
        endmethod
    endmodule
Completion buffer: Interface Requirements
[Figure: cbuf block with methods getToken, put (result & token), and getResult]
Rules and methods concurrency requirement to avoid dead cycles: exit < getResult < enter
cbuf methods’ concurrency: cbuf.getResult < cbuf.put < cbuf.getToken
Completion Buffer
    import Vector::*;
    import RegFile::*;
    import ConfigReg::*;

    // Max is not defined on the slide
    module mkCBuffer(CBuffer#(t)) provisos (Bits#(t, tsz));
        Vector#(16, Reg#(Bool)) valids <- replicateM(mkReg(False));
        RegFile#(Token, t)      data   <- mkRegFileFull();
        Reg#(Token) rdP <- mkConfigReg(0);
        Reg#(Token) wrP <- mkConfigReg(0);
        Counter     cnt <- mkCounter();

        method ActionValue#(Token) getToken() if (cnt < Max);
            cnt.up();
            rdP <= rdP + 1;
            valids[rdP] <= False;
            return rdP;
        endmethod
        method Action put(Token tok, t d);
            valids[tok] <= True;
            data.upd(tok, d);
        endmethod
        method ActionValue#(t) getResult() if (valids[wrP]);
            cnt.down();
            wrP <= wrP + 1;
            return data.sub(wrP);
        endmethod
    endmodule
Concurrency: getResult < put < getToken
Is the ordering correct?
Is valids okay?
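The functional behavior of a completion buffer (tokens handed out in order, results stored out of order, results returned in token order) can be sketched in Python. This is a transaction-level model, not a cycle-accurate one, and it makes two assumptions that differ from the slide code: the names `in_ptr` and `out_ptr` are hypothetical, and `get_result` clears the valid bit so that a stale bit cannot be mistaken for a fresh result after the pointers wrap around.

```python
class CompletionBuffer:
    """Hands out tokens in order and returns results in the same order."""

    def __init__(self, size=16):
        self.size = size
        self.valids = [False] * size
        self.data = [None] * size
        self.in_ptr = 0     # next token to hand out
        self.out_ptr = 0    # next result to return
        self.cnt = 0        # tokens currently outstanding

    def can_get_token(self):
        return self.cnt < self.size

    def get_token(self):
        assert self.can_get_token(), "no free slot"
        tok = self.in_ptr
        self.valids[tok] = False
        self.in_ptr = (self.in_ptr + 1) % self.size
        self.cnt += 1
        return tok

    def put(self, tok, result):
        self.valids[tok] = True
        self.data[tok] = result

    def can_get_result(self):
        return self.valids[self.out_ptr]

    def get_result(self):
        assert self.can_get_result(), "oldest result not ready"
        result = self.data[self.out_ptr]
        self.valids[self.out_ptr] = False   # assumption: clear to avoid stale bits
        self.out_ptr = (self.out_ptr + 1) % self.size
        self.cnt -= 1
        return result

cbuf = CompletionBuffer(size=4)
t0, t1 = cbuf.get_token(), cbuf.get_token()
cbuf.put(t1, "second")            # the later request finishes first
print(cbuf.can_get_result())      # False: the oldest result is still missing
cbuf.put(t0, "first")
print(cbuf.get_result(), cbuf.get_result())  # first second
```

The valid bits are touched by all three methods, which is why the slide asks whether they are okay once the three methods must fire in the same cycle.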