Structural and Temporal Control for Simultaneous Speed and Power Improvement Applied on a 32x32 Dynamic Wallace Tree Multiplier EE241 Prof. Jan Rabaey Zhujie Lin and Michael Liao
Motivation
Faster evaluation
Lower power
Performance and power determined by the typical case, not the worst case
The Leakage Issue
There is only one large “resistor” in the leakage path
Leakage current increases as technology scales
Solution? Introduce more large “resistors”
[Figure: Leakage Paths without Sleep Mode. Panels: Precharge and Eval; labeled resistances: Rp, Rn.]
Sleep Mode
When a dynamic circuit is in sleep mode, there is an additional large sleep “resistor”
[Figure: Leakage Paths with Sleep Mode. Panels: Precharge and Eval-Sleep; labeled resistances: Rp, Rn, R_sleep.]
The Utilization Issue
Unused parts of the multiplier still see the clock
Cost: CV^2 in power
The clock tree dissipates power
Solution: Turn on only active parts of the multiplier
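The CV^2 cost of clocking unused blocks can be put into numbers with a small sketch. The block count, per-block clock capacitance, and supply voltage below are hypothetical values chosen only for illustration; they are not taken from the multiplier in this presentation.

```python
def clock_energy_per_cycle(c_clock_per_block, vdd, clocked_blocks):
    """Clock switching energy per cycle, C * V^2 summed over clocked blocks."""
    return clocked_blocks * c_clock_per_block * vdd ** 2


# Hypothetical numbers, for illustration only.
TOTAL_BLOCKS = 8          # blocks that see the clock when nothing is gated
ACTIVE_BLOCKS = 3         # blocks actually needed for the current operands
C_CLOCK_PER_BLOCK = 50e-15  # farads of clock load per block (assumed)
VDD = 1.2                 # volts (assumed)

always_on = clock_energy_per_cycle(C_CLOCK_PER_BLOCK, VDD, TOTAL_BLOCKS)
gated = clock_energy_per_cycle(C_CLOCK_PER_BLOCK, VDD, ACTIVE_BLOCKS)
saving = 1.0 - gated / always_on  # fraction of clock energy avoided

print(f"always on: {always_on:.3e} J, gated: {gated:.3e} J, saving: {saving:.1%}")
```

Under this simple model the saving depends only on the fraction of blocks left unclocked, since C and V are the same for every block.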
Power Dissipation w/o Sleep Mode
$P = (\alpha_{precharge} P_{precharge} + \alpha_{eval} P_{eval}) \, F_{block}$
$P_{precharge} = I_{precharge} V_{DD}$
$P_{eval} = C V^2 + I_{leak} V_{DD}$
$\alpha_{precharge} + \alpha_{eval} = 1$
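Power Dissipation w/Sleep Mode
$P = (\alpha_{precharge} P_{precharge} + \alpha_{eval} P_{eval} + \alpha_{sleep} P_{sleep}) \, F'$
$P_{precharge} = I_{precharge} V_{DD}$
$P_{eval} = C V^2 + I_{leak} V_{DD}$
$P_{sleep} = I_{sleep} V_{DD}$
$\alpha_{precharge} + \alpha_{eval} + \alpha_{sleep} = 1$
[The factor multiplying the parenthesis carries the subscripts "block" and "width"; F' appears in the ratio $F' / F_{clock}$.]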
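A minimal numeric sketch of the two power models: per-phase powers (precharge = I·V_DD, eval = switching + leakage, sleep = I_sleep·V_DD) are combined with phase fractions that sum to 1. Several things here are assumptions rather than content from the slides: the name "fraction" for the phase weights, the choice to write the switching term as C·V²·f so that every term is in watts (which replaces the frequency factor applied outside the parentheses), and all numeric values.

```python
def phase_powers(i_precharge, c_switched, f_switch, i_leak, i_sleep, vdd):
    """Return (P_precharge, P_eval, P_sleep) in watts."""
    p_precharge = i_precharge * vdd
    # Assumption: switching energy C*V^2 is multiplied by a switching
    # frequency so it can be added to the leakage power I_leak*V_DD.
    p_eval = c_switched * vdd ** 2 * f_switch + i_leak * vdd
    p_sleep = i_sleep * vdd
    return p_precharge, p_eval, p_sleep


def weighted_power(fractions, powers):
    """Average power as the fraction-weighted sum of the phase powers."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("phase fractions must sum to 1")
    return sum(a * p for a, p in zip(fractions, powers))


# Hypothetical values, for illustration only.
powers = phase_powers(
    i_precharge=1e-3, c_switched=1e-12, f_switch=1e9,
    i_leak=1e-4, i_sleep=1e-5, vdd=1.0,
)

# Without sleep mode: time is split between precharge and eval only.
no_sleep = weighted_power((0.5, 0.5, 0.0), powers)
# With sleep mode: half of the time is spent in the low-leakage sleep state.
with_sleep = weighted_power((0.25, 0.25, 0.5), powers)

print(f"no sleep: {no_sleep:.3e} W, with sleep: {with_sleep:.3e} W")
```

With these made-up numbers the sleep phase lowers the average because its power is far smaller than that of the phases it displaces; the actual benefit depends on how much time a block can spend asleep.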
Sleep Mode and Pulsed Clock
[Figure: Sleep Mode and the Use of the Pulsed Clock. Waveforms: PMOS CLK, NMOS CLK; labeled phases: Precharge, Evaluation, Sleep.]
Visualizing a Wallace Tree as Equal-delay Layers
[Figure: Wallace tree multiplier drawn as equal-delay layers. Labels: AND Gates, Vector Add, Multiplier, Equal-Delay Layers.]
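The layered view of a Wallace tree can be illustrated with a word-level sketch: the AND-gate stage produces one partial-product row per multiplier bit, each layer applies 3:2 carry-save compression to groups of three rows, and a final vector add combines the last two rows. This is a behavioral model written for illustration, not the dynamic-logic circuit from the presentation; the function name `wallace_multiply` and its `width` parameter are hypothetical.

```python
def wallace_multiply(a, b, width=32):
    """Multiply two unsigned integers with a Wallace-style reduction.

    Returns (product, layers), where layers is the number of 3:2
    compression layers needed to reduce the rows to two.
    """
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in the given width")

    # AND-gate stage: row i is a shifted copy of a, gated by bit i of b.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]

    layers = 0
    while len(rows) > 2:
        next_rows = []
        full = len(rows) - len(rows) % 3
        for i in range(0, full, 3):
            x, y, z = rows[i:i + 3]
            next_rows.append(x ^ y ^ z)                           # sum bits
            next_rows.append(((x & y) | (x & z) | (y & z)) << 1)  # carry bits
        next_rows.extend(rows[full:])  # leftover rows pass through unchanged
        rows = next_rows
        layers += 1

    # Final vector add of the remaining rows.
    return sum(rows), layers


product, layers = wallace_multiply(0xDEADBEEF, 0x12345678)
print(hex(product), layers)
```

For 32 rows the row count shrinks as 32, 22, 15, 10, 7, 5, 4, 3, 2, so the sketch reports 8 compression layers. Every compressor in one layer has the same depth, which is what makes the equal-delay layering natural.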
Pulsed-Leap Clock
[Figure: Worst Case Operation for Pulsed Clock. Equal delay layers of logic; waveforms: NMOS CLK, PMOS CLK, External CLK.]
Pulsed-Leap Clock
[Figure: Typical Case of Pulsed-Leap Clock. Equal delay layers of logic, with several layers marked Sleep; waveforms: External CLK, PMOS CLK, NMOS CLK.]
Results - Power
[Figure: Energy Consumption. x-axis: Input Bits (0, 8, 16, 24, 32); y-axis: Energy/cycle (J), 0.E+00 to 8.E-11; series: Benchmark, Pulsed-Leap Clock.]
Results - Delay
[Figure: Delay. x-axis: Input Bits (0, 8, 16, 24, 32); y-axis: Delay (ps), 0 to 2500; series: Benchmark, Pulsed-Leap Clock.]
Results - Improvements
[Figure: % Improvement over Benchmark. x-axis: Input Bits (0, 8, 16, 24, 32); y-axis: % Improvement, 0.0 to 80.0; series: Energy, Performance.]