1 2004 MAPLD: 153 Brej Early output logic and Anti-Tokens Charlie Brej APT Group Manchester University
Jan 23, 2016
Overview
Synchronous Problems
Asynchronous Logic
  Why?
  How?
Solutions
  Early Output
  Anti-Tokens
Problems: Communication
Communication horizon
  “For a 60 nanometer process a signal can reach only 5% of the die’s length in a clock cycle” [D. Matzke, 1997]
Clock distributed using wave pipelining
Problems: Performance
Cycle time
Unbalanced Stages
Clock Skew/Jitter
Transistor Variability
Signal Integrity
Worst – Average case performance
Real Computation
Clock overheads
Timing Assumption overheads
Clock! What is it good for?
No arguing with the clock
9am - 5pm. No excuses!
Bundled-Data
When you finish, do the next task
Flexitime
Request + Delay
Acknowledge
How do you know when you are finished?
Synchronous:
  Estimate
  Global timing reference
Asynchronous (bundled-data):
  Estimate
  Local delay elements
Asynchronous (delay-insensitive):
  When the data arrives
  Intrinsic
Becoming Delay Insensitive
Dual-Rail
  Two wires
  00 – NULL
  01 – Zero
  10 – One
  (11 – Not used)
Four Phase handshake
  Return to zero
[Figure: handshake signals R1, Ack, R0]
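The encoding table and the return-to-zero handshake can be sketched in a few lines of Python. This is only an illustration: the `(r1, r0)` wire ordering, the function names and the exact ordering of the four handshake phases are assumptions made for the example, not taken from the slides.

```python
from typing import Iterator, Optional, Tuple

# Assumption: a code word is written (r1, r0), so "10" means One and "01" means Zero.
Rail = Tuple[int, int]
NULL: Rail = (0, 0)


def encode(bit: Optional[int]) -> Rail:
    """Map None (no data), 0 or 1 to a dual-rail code word."""
    if bit is None:
        return NULL
    return (1, 0) if bit else (0, 1)


def decode(rails: Rail) -> Optional[int]:
    """Map a dual-rail code word back to None, 0 or 1."""
    if rails == (1, 1):
        raise ValueError("11 is not a legal dual-rail code word")
    if rails == NULL:
        return None
    return 1 if rails == (1, 0) else 0


def has_data(rails: Rail) -> bool:
    """Completion detection: data is present when exactly one rail is high."""
    return (rails[0] ^ rails[1]) == 1


def four_phase(bit: int) -> Iterator[Tuple[Rail, int]]:
    """Yield (rails, ack) for one return-to-zero transfer of a single bit."""
    yield encode(bit), 0  # sender raises R1 or R0
    yield encode(bit), 1  # receiver raises Ack
    yield NULL, 1         # sender returns both rails to zero
    yield NULL, 0         # receiver lowers Ack; channel is idle again
```

For example, `list(four_phase(1))` walks through the four phases for a One, and `has_data` is the check a receiver could use instead of a clock or a delay element.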
Early Output Logic
Dual-Rail interfaces
Output generated as early as possible
Two Early output cases
  If either input is ‘0’ then the output is ‘0’
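A minimal sketch of the rule above for a two-input AND, assuming `None` stands for an input that is still NULL (has not arrived). The function name is hypothetical.

```python
from typing import Optional


def early_and(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Early-output AND: inputs are None (not arrived), 0 or 1.

    Returns the output as soon as it is determined, else None.
    """
    if a == 0 or b == 0:
        return 0  # early case: a single 0 decides the result
    if a == 1 and b == 1:
        return 1  # both inputs needed to produce a 1
    return None   # result not yet determined
```

`early_and(0, None)` and `early_and(None, 0)` both return `0` without waiting for the other input, while `early_and(1, None)` has to wait.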
Bit level pipelining
Forward completed parts of the result
Pace work
Don’t stall parts unless you have to
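One way to picture forwarding completed parts of a result is a ripple-carry adder, chosen here purely as an illustration (the slide does not name a circuit). When the two operand bits at a position are equal, the carry out of that position is known without waiting for the carry in. The function names are hypothetical and `None` marks a carry that has not arrived.

```python
from typing import List, Optional, Sequence


def early_carry_out(a: int, b: int, carry_in: Optional[int]) -> Optional[int]:
    """Carry out of one full-adder bit; carry_in is None if not yet arrived."""
    if a == b:
        return a      # 0+0 kills the carry, 1+1 generates it: no need to wait
    return carry_in   # a != b: the carry propagates, so this bit must wait


def ripple_carries(a_bits: Sequence[int], b_bits: Sequence[int],
                   carry_in: Optional[int] = None) -> List[Optional[int]]:
    """Carry out of each bit position (least significant bit first)."""
    carries: List[Optional[int]] = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        carry = early_carry_out(a, b, carry)
        carries.append(carry)
    return carries
```

With the carry into bit 0 still missing, `ripple_carries([1, 0, 1, 1], [0, 0, 1, 0])` already determines the carries out of bits 1 to 3; only bit 0 is stalled.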
Early Output cases
[Figure: Probability of Output (0% to 100%) against Inputs Present (0% to 100%) for: Branch unit inc. Coproc., Branch unit, Compare LT/GE 8 bit, Compare Equal 8 bit, Adder bit 8, Mux 8:1, Memory shift unit, ALU Slice, 16 input AND]
Validity
Unnecessary late inputs
  Must be acknowledged
  Must wait until they arrive
Validity signal
  Latch generated
  Ready to be acknowledged
Result before all inputs present
Acknowledge after all inputs present
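The split between producing a result and being ready to acknowledge can be sketched as two separate outputs of a stage. This is a simplified behavioural model under stated assumptions: a two-input AND stage, `None` for an input that has not arrived, and a hypothetical function name. It does not model the latch circuit itself.

```python
from typing import Optional, Tuple


def and_stage(a: Optional[int], b: Optional[int]) -> Tuple[Optional[int], bool]:
    """Return (output, validity) for an early-output AND stage.

    output   -- available as soon as the inputs seen so far determine it
    validity -- True only once every input has arrived, i.e. the stage
                may acknowledge its inputs
    """
    if a == 0 or b == 0:
        output: Optional[int] = 0
    elif a == 1 and b == 1:
        output = 1
    else:
        output = None
    validity = a is not None and b is not None
    return output, validity
```

`and_stage(0, None)` returns `(0, False)`: the result is out, but the stage cannot acknowledge until the late input turns up, at which point `and_stage(0, 1)` gives `(0, True)`.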
Synchronisation Hurts
No need to wait before generating result
Need to wait for input in order to acknowledge it
Unnecessary stall
Anti-Tokens
Unnecessary late inputs
  Stall the entire stage
Proactive approach
  Send a ‘cancel’ signal backward to the source
  Acknowledge before data arrives
Anti-Token latches
  Assert validity early
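A behavioural sketch of the proactive approach, assuming a two-input AND stage in which `None` marks an input that has not arrived. The function name and the returned list of input names are hypothetical; the real mechanism is a latch circuit, which this does not attempt to model.

```python
from typing import List, Optional, Tuple


def and_stage_with_anti_tokens(
    a: Optional[int], b: Optional[int]
) -> Tuple[Optional[int], List[str]]:
    """Return (output, cancelled) for an early-output AND stage.

    cancelled names the late inputs that would be sent an anti-token
    (a 'cancel' travelling backward) instead of being waited for.
    """
    inputs = {"a": a, "b": b}
    if a == 0 or b == 0:
        # Output is decided; any input still missing is unnecessary.
        late = [name for name, value in inputs.items() if value is None]
        return 0, late
    if a == 1 and b == 1:
        return 1, []
    return None, []  # nothing decided yet, so nothing to cancel
```

`and_stage_with_anti_tokens(0, None)` returns `(0, ["b"])`: the stage emits its result and cancels input `b` rather than stalling until it arrives.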
Anti-token generation
[Figure, two frames: circuit with values 0 and 1 and an element labelled C; an anti-token (A) appears in the second frame]
Anti-token Propagation
[Figure, two frames: labels 1, C and A in the first frame; 1, C and AA in the second]
Anti-token Token collisions
[Figure, two frames: tokens (1) and anti-tokens (A) meeting; labels include A?, A1 and 11]
Remove Unnecessary computation
Cycle time
Unbalanced Stages
Clock Skew/Jitter
Transistor Variability
Signal Integrity
Worst – Average case performance
Real Computation
Clock overheads
Timing Assumption overheads
Unnecessary Computation/Delays
Summary
Asynchronous
  Delay Insensitive
Safe
  No timing assumptions
Average case performance
  Remove unnecessary computation
  Anti-tokens without mutual exclusion units