“MOUSETRAP”: uses a “capture protocol”
Latches …
  are normally transparent: before new data arrives
  become opaque: after data arrives (“capture” data)
Control Signaling: transition-signaling = 2-phase
  simple protocol: req/ack = only 2 events per handshake (not 4)
  no “return-to-zero”
  each transition (up/down) signals a distinct operation
Our Goal: very fast cycle time
  simple inter-stage communication
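To make the "2 events per handshake (not 4)" point concrete, here is a minimal sketch that counts handshake events for a return-to-zero (4-phase) protocol and for transition signaling (2-phase). The function names and event labels are illustrative only and do not come from the slides.

```python
def four_phase_events(n_items):
    """Return-to-zero handshake: req+, ack+, req-, ack- for every item."""
    events = []
    for _ in range(n_items):
        events += ["req+", "ack+", "req-", "ack-"]
    return events


def two_phase_events(n_items):
    """Transition signaling: one toggle of req and one toggle of ack per item."""
    req = ack = 0
    events = []
    for _ in range(n_items):
        req ^= 1                      # any transition (up or down) is a request
        events.append(f"req->{req}")
        ack ^= 1                      # any transition (up or down) is an acknowledge
        events.append(f"ack->{ack}")
    return events


if __name__ == "__main__":
    print(len(four_phase_events(3)), len(two_phase_events(3)))
    print(two_phase_events(2))
```

In the 2-phase trace the wires do not return to zero between items: the second item is signaled by the falling transitions of req and ack.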
MOUSETRAP: A Basic FIFO
Stages communicate using transition-signaling:
  1 transition per data item!
[Figure: Stage N between Stage N-1 and Stage N+1. Each stage has a Data Latch (enable En) and a Latch Controller; signals reqN, ackN-1, reqN+1, ackN, doneN; Data in / Data out. Animation: 1st, then 2nd data item flowing through the pipeline.]
MOUSETRAP: A Basic FIFO (contd.)
Latch controller (XNOR) acts as “phase converter”:
  2 distinct transitions (up or down) → pulsed latch enable
  2 transitions per latch cycle
Latch is disabled when current stage is “done”
Latch is re-enabled when next stage is “done”
[Figure: Stage N between Stage N-1 and Stage N+1. Data Latch (enable En) and Latch Controller; signals reqN, ackN-1, reqN+1, ackN, doneN; Data in / Data out.]
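The phase-converter behavior can be sketched at a purely logical level (no delays). The sketch below assumes the latch enable is the XNOR of the stage's own "done" and the "done" of the next stage (its ack), and that the latch is transparent when the enable is 1; variable names are hypothetical.

```python
def xnor(a, b):
    """Latch controller: output is 1 when both phases agree."""
    return int(a == b)


def trace_two_items():
    """Enable of stage N while two data items pass through stages N and N+1."""
    done_n = 0   # toggles each time stage N has passed an item
    ack_n = 0    # "done" of stage N+1, fed back to stage N as its ack
    trace = [("idle", xnor(done_n, ack_n))]
    for item in (1, 2):
        done_n ^= 1   # stage N is "done": phases differ, latch is disabled
        trace.append((f"N done, item {item}", xnor(done_n, ack_n)))
        ack_n ^= 1    # stage N+1 is "done": phases agree, latch is re-enabled
        trace.append((f"N+1 done, item {item}", xnor(done_n, ack_n)))
    return trace


if __name__ == "__main__":
    for event, en in trace_two_items():
        print(f"{event:18s} En={en}")
```

The first item is signaled by rising transitions and the second by falling ones, yet the enable sees the same down/up pulse for both, which is the sense in which the XNOR converts phases into a pulse.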
MOUSETRAP: FIFO Cycle Time
Cycle Time = $2 \cdot T_{LATCH} + T_{XNOR}$
Fast self-loop: N disables itself
One pulse per data item flowing through:
  down transition: caused by “done” of N
  up transition: caused by “done” of N+1
No minimum pulse width constraint!
  simply, down transition should start “early enough”
  can be “negative width” (no pulse!)
[Figure: Stage N between Stage N-1 and Stage N+1, with Data Latch (enable En), Latch Controller, and signals reqN, ackN-1, reqN+1, ackN, doneN. Inset: Stage N’s Latch Controller, with inputs “done from N” and “ack from N+1” and output “to Latch”.]
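One way to read the cycle-time expression is to follow a single cycle of stage N from the arrival of a request to the re-enabling of its latch. The sketch below does this with hypothetical delay values (100 ps and 60 ps are placeholders, not measurements) and assumes both stages have the same latch delay.

```python
T_LATCH = 100   # ps, hypothetical latch delay
T_XNOR = 60     # ps, hypothetical XNOR delay


def stage_cycle(t_latch, t_xnor):
    """Timestamps (ps) of one cycle of stage N, starting when reqN arrives."""
    t = 0
    events = [(t, "reqN arrives; latch N is transparent")]
    t += t_latch
    events.append((t, "data passes latch N: doneN (= reqN+1) toggles"))
    t += t_latch
    events.append((t, "data passes latch N+1: doneN+1 (= ackN) toggles"))
    t += t_xnor
    events.append((t, "XNOR of N goes up: latch N re-enabled"))
    return events


if __name__ == "__main__":
    for time_ps, what in stage_cycle(T_LATCH, T_XNOR):
        print(f"{time_ps:4d} ps  {what}")
```

The last timestamp equals 2 * t_latch + t_xnor, which is the cycle time stated on the slide.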
MOUSETRAP: Pipeline With Logic
Simple Extension to FIFO:
  insert logic block + matching delay in each stage
Logic Blocks: can use standard single-rail (non-hazard-free)
“Bundled Data” Requirement:
  each “req” must arrive after data inputs valid and stable
[Figure: Stages N-1, N, N+1. Each stage has a Data Latch and Latch Controller followed by a logic block on the data path and a matching delay on the req path; signals reqN, ackN-1, reqN+1, ackN, doneN.]
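The bundled-data requirement amounts to a simple inequality between the matching delay and the slowest path through the logic block. The following is a minimal sketch of that check; the function names, the optional margin, and all numbers are assumptions made for illustration.

```python
def required_matched_delay(path_delays_ps, margin_ps=0.0):
    """Smallest matching delay that lets req arrive after the slowest data path."""
    return max(path_delays_ps) + margin_ps


def bundling_ok(path_delays_ps, matched_delay_ps, margin_ps=0.0):
    """True if req (through the matching delay) arrives no earlier than the data."""
    return matched_delay_ps >= required_matched_delay(path_delays_ps, margin_ps)


if __name__ == "__main__":
    paths = [120.0, 180.0, 150.0]          # hypothetical logic path delays, ps
    print(required_matched_delay(paths, margin_ps=20.0))
    print(bundling_ok(paths, 200.0, margin_ps=20.0))
    print(bundling_ok(paths, 190.0, margin_ps=20.0))
```

Because only the arrival order of req and data matters, the logic block itself can glitch freely before req arrives, which is why ordinary non-hazard-free single-rail logic is acceptable.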
Special Case: Using “Clocked Logic”
Clocked-CMOS = C2MOS: eliminate explicit latches
  latch folded into logic itself
[Figure: A General C2MOS gate: pull-up network and pull-down network driven by the logic inputs, gated by En and En’, with a “keeper” on the logic output.]
[Figure: C2MOS AND-gate: inputs A and B, gated by En and En’, with a “keeper” on the logic output.]
Gate-Level MOUSETRAP: with C2MOS
Use C2MOS: eliminate explicit latches
New Control Optimization = “Dual-Rail XNOR”
  eliminate 2 inverters from critical path
[Figure: Stage N between Stage N-1 and Stage N+1. C2MOS logic, a pair of bit latches, and a Latch Controller; dual-rail control signals (En,En’), (done,done’), (ack,ack’); signals reqN, ackN-1, reqN+1, ackN, doneN.]
Problems with Linear Pipelining: handles limited applications; real systems are more ...

Related Protocols
Day/Woods (’97), and Charlie Boxes (’00)
Similarities: all use…
  transition signaling for handshakes
  phase conversion for latch signals
Differences: MOUSETRAP has…
  higher throughput
  ability to handle fork/join datapaths
  more aggressive timing, less insensitivity to delays
Performance, Timing and Optzn.
MOUSETRAP with Logic:
  Stage Latency = $T_{LATCH} + T_{LOGIC}$
  Cycle Time = $2 \cdot T_{LATCH} + T_{LOGIC} + T_{XNOR}$
MOUSETRAP Using C2MOS Gates:
  Stage Latency = $T_{C2MOS}$
  Cycle Time = $2 \cdot T_{C2MOS} + T_{XNOR}$
Data must be safely “captured” by Stage N before new inputs arrive from Stage N-1
  simple 1-sided timing constraint: fast latch disable
  Stage N’s “self-loop” faster than entire path through previous stage
[Figure: Stage N-1 and Stage N, each with Data Latch, Latch Controller, logic block and matching delay; signals reqN, ackN-1, reqN+1, ackN, doneN.]
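The one-sided constraint can be written as a comparison of two path delays. The sketch below is one plausible decomposition, not taken from the slides: the self-loop is modeled as the XNOR down-transition plus a latch hold time, and the path through the previous stage as its XNOR up-transition, latch delay, and logic delay. All parameter names and numbers are hypothetical.

```python
def capture_is_safe(t_xnor_down, t_hold, t_xnor_up_prev, t_latch_prev, t_logic_prev):
    """Return (safe, margin_ps) for the one-sided capture constraint.

    Assumed model: stage N must close its latch (self-loop) before new data
    can travel through the whole of stage N-1 and reach stage N's latch.
    """
    self_loop = t_xnor_down + t_hold
    prev_path = t_xnor_up_prev + t_latch_prev + t_logic_prev
    return self_loop < prev_path, prev_path - self_loop


if __name__ == "__main__":
    # Hypothetical delays in ps; a FIFO has no logic, so t_logic_prev = 0.
    print(capture_is_safe(70, 20, 60, 110, 0))
    # A very slow self-loop violates the constraint.
    print(capture_is_safe(200, 20, 60, 110, 0))
```

The constraint is local because it involves only stage N and its immediate predecessor, and adding logic to stage N-1 only increases the margin.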
Timing Optzn: Reducing Cycle Time
Analytical Cycle Time = $2 \cdot T_{LATCH} + T_{LOGIC} + T_{XNOR}$
Goal: shorten $T_{XNOR\uparrow}$ (in steady-state operation)
  Steady-state = no undue pipeline congestion
Observation: XNOR switches twice per data item: $T_{XNOR\downarrow}$ and $T_{XNOR\uparrow}$
  only 2nd (up) transition critical for performance: $T_{XNOR\uparrow}$
Solution: reduce XNOR output swing
  degrade “slew” for start of pulse
  allows quick pulse completion: faster rise time
Still safe when congested:
  pulse starts on time
  pulse maintained until congestion clears
Timing Optzn (contd.)
[Figure: waveforms of the “unoptimized” and “optimized” XNOR output. At N “done”, N’s latch is disabled; at N+1 “done”, N’s latch is re-enabled. With the optimized output the latch is only partly disabled and recovers quicker (no pulse width requirement).]
Comparison with Wave Pipelining
Two Scenarios:
  Steady State:
    both MOUSETRAP and wave pipelines act like transparent “flow through” combinational pipelines
  Congestion:
    right environment stalls: each MOUSETRAP stage safely captures data
    internal stage slow: MOUSETRAP stages to its left safely capture data
    congestion properly handled in MOUSETRAP
Conclusion: MOUSETRAP has potential of…
  speed of wave pipelining
  greater robustness and flexibility
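The "right environment stalls" case can be illustrated with a delay-free behavioral model of a FIFO in which a stage is transparent when its own "done" phase equals that of the next stage (the XNOR condition). This is a sketch under those assumptions, with hypothetical names, and it says nothing about real timing.

```python
def simulate(n_stages, items, right_env_accepts):
    """Push items into an n-stage FIFO; return (stage data, received, not offered)."""
    done = [0] * n_stages        # phase of each stage's "done" signal
    data = [None] * n_stages     # value currently held by each latch
    req_in, in_data = 0, None    # left environment: request phase and data
    ack_out = 0                  # right environment: acknowledge phase
    pending, received = list(items), []
    changed = True
    while changed:
        changed = False
        # Left environment offers a new item once stage 0 captured the last one.
        if pending and req_in == done[0]:
            in_data = pending.pop(0)
            req_in ^= 1
            changed = True
        for i in range(n_stages):
            req = req_in if i == 0 else done[i - 1]
            src = in_data if i == 0 else data[i - 1]
            ack = ack_out if i == n_stages - 1 else done[i + 1]
            transparent = done[i] == ack          # XNOR latch controller
            if transparent and req != done[i]:    # new item waiting at the input
                data[i], done[i] = src, req       # capture; stage turns opaque
                changed = True
        # Right environment consumes the output only if it is not stalled.
        if right_env_accepts and ack_out != done[-1]:
            received.append(data[-1])
            ack_out = done[-1]
            changed = True
    return data, received, pending


if __name__ == "__main__":
    print(simulate(3, [1, 2, 3, 4, 5], right_env_accepts=False))
    print(simulate(3, [1, 2, 3, 4, 5], right_env_accepts=True))
```

With a stalled right environment, the three stages hold the first three items in order (oldest item in the last stage) and nothing is overwritten; the fourth item stays on the input wires and the fifth is never offered. With an accepting environment, all items come out in order.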
Preliminary Results
Pre-Layout Simulations of FIFO’s:
  do not account for wire delays, parasitics, etc.
  careful transistor sizing/verification of timing constraints

C2MOS FIFO in 0.6 µm HP CMOS (3.3V, 300K, normal corner)

Design                  Latch Delay (ps)   XNOR Delay (ps)     Throughput (GHz)
                                           tXNOR     tXNOR
MOUSETRAP               220                130       160       1.67
MOUSETRAP (optimized)   200                180       120       1.92

FIFO in 0.25 µm TSMC (2.5V, 300K, normal corner)

Design                  Latch Delay (ps)   XNOR Delay (ps)     Throughput (GHz)
                                           tXNOR     tXNOR
MOUSETRAP               110                65        63        3.51
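As a consistency check, the reported throughputs can be compared against the analytical FIFO cycle time, 2 x latch delay + XNOR delay. The extracted tables no longer show which XNOR column is the up transition and which is the down transition, so the sketch below simply evaluates the formula with each column; the row labels and function name are illustrative.

```python
# (design, latch delay ps, first XNOR column ps, second XNOR column ps, reported GHz)
ROWS = [
    ("0.6 C2MOS MOUSETRAP", 220, 130, 160, 1.67),
    ("0.6 C2MOS MOUSETRAP (optimized)", 200, 180, 120, 1.92),
    ("0.25 MOUSETRAP", 110, 65, 63, 3.51),
]


def throughput_ghz(latch_ps, xnor_ps):
    """Throughput implied by cycle time = 2 * latch delay + XNOR delay."""
    cycle_ps = 2 * latch_ps + xnor_ps
    return 1000.0 / cycle_ps      # 1000 ps per ns, so the result is in GHz


if __name__ == "__main__":
    for name, latch, xnor_a, xnor_b, reported in ROWS:
        est_a = round(throughput_ghz(latch, xnor_a), 2)
        est_b = round(throughput_ghz(latch, xnor_b), 2)
        print(f"{name:34s} first={est_a} second={est_b} reported={reported}")
```

Under this formula the two 0.6 rows match the reported figures when the second XNOR column is used (cycle times of 600 ps and 520 ps), while the 0.25 row matches with the first column (285 ps). This is arithmetic on the table values only and does not settle how the columns were originally labeled.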
Conclusions and Future Work
Introduced a new asynchronous pipeline style:
  Static logic blocks
  Simple latches and control:
    transparent latches, or C2MOS gates
    single gate control = 1 XNOR gate/stage
  3.5 GHz in 0.25 µm, 1.9 GHz in 0.6 µm
    comparable to wave pipelines; yet more robust/less design effort
  Correctly handle forks and joins in datapaths
  Timing constraints: local, 1-sided, easily met
Ongoing Work:
  more realistic performance measurement (incl. parasitics)
  layout and fabrication