Low-Power Electronics and Systems
VDAT'06 Tutorial II, August 9, 2006
Vishwani D. Agrawal
James J. Danaher Professor
Department of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849, USA
http://www.eng.auburn.edu/~vagrawal
– Short circuit power
– Reduced supply voltage operation
– Glitch elimination
• Static (leakage) power reduction
• Low power systems
– State encoding
– Processor and multi-core design
• Books on low-power design
Introduction
Why is it a concern?
Power Consumption of VLSI Chips
ISSCC, Feb. 2001, Keynote
“Ten years from now, microprocessors will run at 10GHz to 30GHz and be capable of processing 1 trillion operations per second -- about the same number of calculations that the world's fastest supercomputer can perform now.
“Unfortunately, if nothing changes these chips will produce as much heat, for their proportional size, as a nuclear reactor. . . .”
Patrick P. Gelsinger, Senior Vice President, General Manager, Digital Enterprise Group, INTEL CORP.
VLSI Chip Power Density
[Figure: power density (W/cm²), log scale from 1 to 10000, versus year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6 processors, with reference levels for a hot plate, a nuclear reactor, a rocket nozzle and the Sun's surface. Source: Intel]
Meaning of Low-Power Design
• Design practices that reduce power consumption by at least one order of magnitude; in practice a 50% reduction is often acceptable.
• General considerations in low-power design
– Algorithms and architectures
– High-level and software techniques
– Gate and circuit-level methods
– Power estimation techniques
– Test power
Topics in Low-Power
• Power dissipation in CMOS circuits
• Device technology
– Low-power CMOS technologies
– Energy recovery methods
• Circuit and gate level methods
– Logic synthesis
– Dynamic power reduction techniques
– Leakage power reduction
• System level methods
– Microprocessors
– Arithmetic circuits
– Low power memory technology
• Test power
• Power estimation methods and tools
Power in a CMOS Gate
[Figure: CMOS gate connected between VDD and Ground, drawing supply current iDD(t).]
Power Dissipation in CMOS Logic (0.25µ)
$P_{total}(0 \to 1) = C_L V_{DD}^2 + t_{sc} V_{DD} I_{peak} + V_{DD} I_{leakage}$
Percentages annotated on the slide: 75%, 5%, 20%
[Figure: CMOS gate between VDD supplies driving load CL.]
Power and Energy
• Instantaneous power (Watts)
$P(t) = i_{DD}(t)\, V_{DD}$
• Peak power (Watts)
$P_{peak} = \max\{P(t)\}$
• Average power (Watts)
$P_{av} = \frac{1}{T} \int_0^T P(t)\, dt$
• Energy (Joules)
$E = \int_0^T P(t)\, dt$
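The four definitions above can be evaluated numerically from a sampled supply-current waveform. The following is a minimal sketch, assuming uniform sampling and trapezoidal integration; the function name `power_metrics` and the sample values are hypothetical and not part of the tutorial.

```python
def power_metrics(i_dd, v_dd, dt):
    """Return (peak power, average power, energy) from uniformly sampled supply current."""
    p = [i * v_dd for i in i_dd]                      # P(t) = iDD(t) * VDD
    # Trapezoidal approximation of E = integral of P(t) dt over [0, T]
    energy = sum((a + b) / 2.0 * dt for a, b in zip(p, p[1:]))
    total_time = dt * (len(p) - 1)                    # T
    return max(p), energy / total_time, energy


# Hypothetical triangular current pulse (amperes) sampled every 1 ns, VDD = 2.5 V
samples = [0.0, 2e-3, 4e-3, 2e-3, 0.0]
p_peak, p_av, energy = power_metrics(samples, v_dd=2.5, dt=1e-9)
print(p_peak, p_av, energy)
```

Average power is simply the energy divided by the observation window, so a short window around a current spike reports a much higher average than a full clock period would.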
Low-Power Design Techniques
• Circuit and gate level methods
– Reduced supply voltage
– Adiabatic switching and charge recovery
– Logic design for reduced activity
– Reduced glitches
– Transistor sizing
– Pass-transistor logic
– Pseudo-nMOS logic
– Multi-threshold gates
Low-Power Design Techniques
• Functional and architectural methods
– Clock suppression
– Clock frequency reduction
– Supply voltage reduction
– Power down
– Algorithmic and software methods
Test Power
• Power grid on a VLSI chip is designed for a certain current capacity during functional operation:
– Average current → heat dissipation
– Peak current → noise, ground bounce
• Problem – Tests like scan or BIST are nonfunctional and may cause circuit activity higher than in functional operation, so a functionally good chip can fail the test.
$\int_0^\infty v(t)\, i(t)\, dt = \int_0^\infty V \left[1 - \exp\left(\frac{-t}{RC}\right)\right] \frac{V}{R} \exp\left(\frac{-t}{RC}\right) dt = \frac{1}{2} C V^2$
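The integral above can be checked numerically. This sketch assumes a capacitor C charged from a supply V through a fixed resistance R (a simplification of a real pMOS pull-up) and uses a midpoint-rule sum; the function name and component values are hypothetical.

```python
import math


def charging_energies(V, R, C, steps=100000, t_end_in_tau=20.0):
    """Integrate over a 0 -> V charging event; return (energy stored in C, energy lost in R)."""
    tau = R * C
    dt = t_end_in_tau * tau / steps
    e_cap = e_res = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt                     # midpoint of the k-th interval
        i = (V / R) * math.exp(-t / tau)       # charging current i(t)
        v = V * (1.0 - math.exp(-t / tau))     # capacitor voltage v(t)
        e_cap += v * i * dt                    # energy delivered to the capacitor
        e_res += (V - v) * i * dt              # energy dissipated in the resistance
    return e_cap, e_res


e_cap, e_res = charging_energies(V=2.5, R=1e3, C=1e-12)
print(e_cap, e_res, 0.5 * 1e-12 * 2.5 ** 2)
```

Both sums approach CV²/2 regardless of the value of R, which is why the dissipated energy per transition does not depend on the transistor's on-resistance in this model.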
Transition Power
• Gate output rising transition
– Energy dissipated in pMOS transistor = $CV^2/2$
– Energy stored in capacitor = $CV^2/2$
• Gate output falling transition
– Energy dissipated in nMOS transistor = $CV^2/2$
• Energy dissipated per transition = $CV^2/2$
• Power dissipation:
$P_{trans} = E_{trans}\, \alpha f_{ck} = \alpha f_{ck} C V^2 / 2$
α = activity factor
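As a quick illustration of the transition-power formula, the sketch below evaluates it for one node and shows the quadratic dependence on supply voltage. The node values (activity 0.1, 100 MHz clock, 1 pF, 2.5 V) are hypothetical.

```python
def transition_power(alpha, f_ck, C, V):
    """Ptrans = alpha * fck * C * V^2 / 2, with alpha the activity factor."""
    return alpha * f_ck * C * V ** 2 / 2.0


p_full = transition_power(alpha=0.1, f_ck=100e6, C=1e-12, V=2.5)
p_half = transition_power(alpha=0.1, f_ck=100e6, C=1e-12, V=1.25)
print(p_full, p_half, p_half / p_full)   # halving V quarters the power
```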
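Short Circuit Current, isc(t)
[Figure: inverter between VDD and GND with input Vi(t) and output Vo(t). Plot of Vi(t) and Vo(t) (volts, 0 to VDD) and isc(t) (amps) versus time (0 to 1 ns); the levels VTn and VDD - VTp are marked on the input waveform, and isc(t) flows between tB and tE with peak Iscmaxf.]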
Short-Circuit Energy per Transition
• $E_{scf} = \int_{t_B}^{t_E} V_{DD}\, i_{sc}(t)\, dt = (t_E - t_B)\, I_{scmaxf} V_{DD} / 2$
• $E_{scf} = t_f (V_{DD} - |V_{Tp}| - V_{Tn})\, I_{scmaxf} / 2$
• $E_{scr} = t_r (V_{DD} - |V_{Tp}| - V_{Tn})\, I_{scmaxr} / 2$
• $E_{scf} = 0$, when $V_{DD} = |V_{Tp}| + V_{Tn}$
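The triangular-pulse expressions above are easy to evaluate directly. The sketch below is only an illustration: the function name and numbers are hypothetical, and clamping the result to zero when VDD falls below |VTp| + VTn is an assumption that extends the slide's equality case (both transistors are then never on together).

```python
def short_circuit_energy(t_transition, v_dd, v_tn, v_tp, i_scmax):
    """Esc = t * (VDD - |VTp| - VTn) * Iscmax / 2 for one input transition."""
    overlap = v_dd - abs(v_tp) - v_tn      # voltage range where both devices conduct
    if overlap <= 0.0:                     # assumption: no conduction overlap, no energy
        return 0.0
    return t_transition * overlap * i_scmax / 2.0


e_nominal = short_circuit_energy(1e-9, v_dd=2.5, v_tn=0.5, v_tp=-0.5, i_scmax=1e-4)
e_scaled = short_circuit_energy(1e-9, v_dd=1.0, v_tn=0.5, v_tp=-0.5, i_scmax=1e-4)
print(e_nominal, e_scaled)
```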
Short-Circuit Power and Voltage Scaling
• Short-circuit power decreases and eventually becomes zero when VDD is scaled down but the threshold voltages are not scaled down.
• References:
– M. A. Ortega and J. Figueras, “Short Circuit Power Modeling in Submicron CMOS,” PATMOS’96, Aug. 1996, pp. 147-166.
– T. Sakurai and A. Newton, “Alpha-power Law MOSFET model and Its Application to a CMOS Inverter,” IEEE J. Solid State Circuits, vol. 25, April 1990, pp. 584-594.
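Psc and Output Capacitance
[Figure: CMOS inverter between VDD and Ground with load CL, input vi(t) (transition times tr, tf) and output vo(t); one transistor is shown as Ron and the other as R = large, the current ic(t) + isc(t) is indicated, and vo(t)/R↑ is marked.]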
isc and Output Capacitance
$I_{sc}(t) = \frac{v_o(t)}{R{\uparrow}_{t_f}(t)} = \frac{V_{DD} \left[1 - \exp\left(\frac{-t}{R{\downarrow}_{t_f}(t)\, C}\right)\right]}{R{\uparrow}_{t_f}(t)}$
iscmax and Output Capacitance
[Figure: two panels, small C and large C, each showing vo(t), the curve 1/R↑tf(t) and the resulting current i versus time t over the interval tf; the peak iscmax is marked.]
Psc, Output Rise Times, Capacitance
• For given input rise and fall times short circuit power decreases as output capacitance increases.
• Short circuit power increases with increase of input rise and fall times.
• Short circuit power is reduced if output rise and fall times are longer than the input rise and fall times.
Effects of Scaling Down
• 1-16% short-circuit power at 0.7 micron
• 4-37% at 0.35 micron
• 12-60% at 0.17 micron
• Reference: S. R. Vemuru and N. Steinberg, “Short Circuit Power Dissipation Estimation for CMOS Logic Gates,” IEEE Trans. on Circuits and Systems I, vol. 41, Nov. 1994, pp. 762-765.
Summary: Short-Circuit Power
• Short-circuit power is consumed by each transition (increases with input transition time).
• Reduction requires that gate output transition should not be faster than the input transition (faster gates can consume more short-circuit power).
• Increasing the output load capacitance reduces short-circuit power.
• Scaling down of supply voltage with respect to threshold voltages reduces short-circuit power.
Dynamic Power
[Figure: CMOS inverter between VDD and Ground with pull-up and pull-down resistances R, input Vi, output Vo, load CL and short-circuit current isc.]
Dynamic Power $= C_L V_{DD}^2 / 2 + P_{sc}$
Dynamic Power Reduction
• Reduce power per transition
– Reduced voltage operation – voltage scaling
– Capacitance minimization – device sizing
• Reduce number of transitions
– Glitch elimination
For very short channel devices, α = 1, VDD = 1.5Vt
Transistor Sizing for Performance
• Problem: If we increase W/L to make the charging or discharging of the load capacitance faster, then the increased W increases the load for the driving gate.
[Figure: gate with input capacitance Cin driving load capacitance CL.]
Fixed-Taper Buffer
[Figure: chain of n inverters from Vin to Vout with relative sizes 1, α, α², ..., α^(i-1), ..., α^(n-1); the input capacitance is Cin and the load is CL.]
$C_i = \alpha^{i-1} C_{in}$
$C_L = \alpha^n C_{in}$
Delay $= t_0$
Ref.: J. Segura and C. F. Hawkins, CMOS Electronics, How It Works, How It Fails, Piscataway, New Jersey: IEEE Press, 2004.
Buffer (Cont.)
$\alpha^n = C_L / C_{in}$
$n = \frac{\ln(C_L / C_{in})}{\ln \alpha}$
ith stage delay, $t_i = \alpha t_0$, i = 1, . . . n, because each stage drives a stage α times bigger than itself.
Buffer (Cont.)
Total delay $= \sum_{i=1}^{n} t_i = n \alpha t_0 = \frac{\ln(C_L / C_{in})\, \alpha t_0}{\ln \alpha}$
Buffer (Cont.)
Differentiating total delay with respect to α and equating to 0, we get
$\alpha_{opt} = e \approx 2.7$
The optimum number of stages is
$n_{opt} = \ln(C_L / C_{in})$
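The optimum above is for a real-valued number of stages. In practice n must be an integer, so one way to use the result is to compare the integers on either side of ln(CL/Cin). The sketch below does that under the same delay model; the function names and the load ratio of 1000 are hypothetical.

```python
import math


def total_delay(alpha, c_ratio, t0=1.0):
    """Total delay = ln(CL/Cin) * alpha * t0 / ln(alpha) for taper factor alpha."""
    return math.log(c_ratio) * alpha * t0 / math.log(alpha)


def best_integer_design(c_ratio, t0=1.0):
    """Try the integer stage counts around ln(CL/Cin); return (n, alpha, delay)."""
    n_real = math.log(c_ratio)
    best = None
    for n in {max(1, math.floor(n_real)), max(1, math.ceil(n_real))}:
        alpha = c_ratio ** (1.0 / n)       # from alpha^n = CL/Cin
        delay = n * alpha * t0             # n stages, each with delay alpha * t0
        if best is None or delay < best[2]:
            best = (n, alpha, delay)
    return best


n, alpha, delay = best_integer_design(1000.0)
print(n, alpha, delay, total_delay(math.e, 1000.0))
```

The delay curve is shallow near α = e, so taper factors somewhat above or below e cost little extra delay.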
Further Reading
B. S. Cherkauer and E. G. Friedman, “A Unified Design Methodology for CMOS Tapered Buffers,” IEEE Trans. VLSI Systems, vol. 3, no. 1, pp. 99-111, March 1995.
Logic Activity and Glitches
[Figure: example gate network with signals numbered 1 to 7 and gate delays d = 1, d = 1, d = 1 and d = 2.]
Glitch Power Reduction
• Design a digital circuit for minimum transient energy consumption by eliminating hazards
Theorem 1
• For correct operation with minimum energy consumption, a Boolean gate must produce no more than one event per transition.
– Output logic state changes: one transition is necessary
– Output logic state unchanged: no transition is necessary
Inertial Delay of a Gate (Inverter)
[Figure: inverter input Vin and output Vout versus time, with the high-to-low delay dHL and the low-to-high delay dLH marked.]
$d = \frac{d_{HL} + d_{LH}}{2}$
Theorem 2
• Given that events occur at the input of a gate with inertial delay d at times $t_1 \le \dots \le t_n$, the number of events at the gate output cannot exceed
$\min\left(n,\ 1 + \frac{t_n - t_1}{d}\right)$
[Figure: time axis with input events at t1, t2, t3, ..., tn and the interval tn - t1 marked.]
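A small sketch of the Theorem 2 bound follows. Because an event count is an integer, the code takes the floor of (tn - t1)/d; that rounding is an assumption made here, not something stated on the slide, and the function name is hypothetical.

```python
import math


def max_output_events(arrival_times, d):
    """Upper bound min(n, 1 + (tn - t1)/d) on output events of a gate with inertial delay d."""
    n = len(arrival_times)
    if n == 0:
        return 0
    span = max(arrival_times) - min(arrival_times)    # tn - t1
    return min(n, 1 + math.floor(span / d))           # floor: assumption for integer counts


print(max_output_events([0, 1, 2, 3], d=2))   # input events spread over 3 time units
print(max_output_events([0, 0.5], d=2))       # both events fall within one inertial delay
```

When all input events fall within one inertial delay, the bound drops to a single output event, which is the situation that the minimum transient design condition aims for.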
Minimum Transient Design
• Minimum transient energy condition for a Boolean gate:
$|t_i - t_j| < d$
where ti and tj are arrival times of input events and d is the inertial delay of the gate
Balanced Delay Method
• All input events arrive simultaneously
• Overall circuit delay not increased
• Delay buffers may have to be inserted
[Figure: example circuit annotated with gate and buffer delays (values 1 and 3; one delay marked "4?").]
Hazard Filter Method
• Gate delay is made greater than maximum input path delay difference
• No delay buffers needed (least transient energy)
• Overall circuit delay may increase
[Figure: example circuit annotated with gate delays (values 1 and 3).]
Glitch-Free Design by Linear Programming
• Variables: gate and buffer delays
• Objective: minimize number of buffers
• Subject to: overall circuit delay
• Subject to: minimum transient condition for multi-input gate
Variables for Full-Adder
• Gate delay variables d4 . . . d12
• Buffer delay variables d15 . . . d29
Delay variables are located at the checkpoints of the circuit.
[Figure: full-adder circuit with the delay variables marked.]
Objective Function
• Ideal: minimize the number of non-zero delay buffers
• Actual: minimize sum of buffer delays
Specify Critical Path Delay
[Figure: original full-adder design, with delay values of 1 and 0 annotated on the circuit.]
Sum of delays on critical path ≤ maxdel
Multi-Input Gate Condition
[Figure: two-input gate with delay d whose inputs arrive over paths with delays d1 and d2.]
$|d_1 - d_2| \le d \;\equiv\; d_1 - d_2 \le d$ and $d_2 - d_1 \le d$
Results: 1-Bit Adder
R. Fourer, D. M. Gay and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, South San Francisco: The Scientific Press, 1993.
AMPL Solution: maxdel = 6
[Figure: full-adder circuit annotated with the optimized delays (values 1 and 2).]
AMPL Solution: maxdel = 7
[Figure: full-adder circuit annotated with the optimized delays (values 1, 2 and 3).]
AMPL Solution: maxdel ≥ 11
[Figure: full-adder circuit annotated with the optimized delays (values 1 to 5).]
Removing a Limitation
• Constraints are written by path enumeration.
• Since the number of paths in a circuit can be exponential in circuit size, the formulation is infeasible for large circuits.
• Example: c880 has 6.96M constraints.
• Solution: A linear complexity method. See,
– T. Raja, Master’s Thesis, Rutgers University, 2002.
– T. Raja, V. D. Agrawal and M. L. Bushnell, “Minimum Dynamic Power CMOS Circuit Design by a Reduced Constraint Set Linear Program,” Proc. 16th International Conf. VLSI Design, 2003, pp. 527-532.
Comparison of Constraints
[Figure: number of constraints versus number of gates in circuit.]
Benchmark Circuits

                                           Normalized Power
Circuit   Maxdel. (gates)   No. of Buffers   Average   Peak
c432      17                95               0.72      0.67
c432      34                66               0.62      0.60
c880      24                62               0.68      0.54
c880      48                34               0.68      0.52
c6288     47                294              0.40      0.36
c6288     94                120              0.36      0.34
c7552     43                366              0.38      0.34
c7552     86                111              0.36      0.32
c7552: 3,500-gate CMOS Circuit
[Figure: instantaneous energy (Joules) versus clock cycles.]
References
• R. Fourer, D. M. Gay and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, South San Francisco: The Scientific Press, 1993.
• M. Berkelaar and E. Jacobs, “Using Gate Sizing to Reduce Glitch Power,” Proc. ProRISC Workshop, Mierlo, The Netherlands, Nov. 1996, pp. 183-188.
• V. D. Agrawal, “Low Power Design by Hazard Filtering,” Proc. 10th Int’l Conf. VLSI Design, Jan. 1997, pp. 193-197.
• V. D. Agrawal, M. L. Bushnell, G. Parthasarathy and R. Ramadoss, “Digital Circuit Design for Minimum Transient Energy and Linear Programming Method,” Proc. 12th Int’l Conf. VLSI Design, Jan. 1999, pp. 434-439.
• M. Hsiao, E. M. Rudnick and J. H. Patel, “Effects of Delay Model in Peak Power Estimation of VLSI Circuits,” Proc. ICCAD, Nov. 1997, pp. 45-51.
• T. Raja, A Reduced Constraint Set Linear Program for Low Power Design of Digital Circuits, Master’s Thesis, Rutgers Univ., New Jersey, 2002.
• T. Raja, V. D. Agrawal and M. L. Bushnell, “Transistor Sizing of Logic Gates to Maximize Input Delay Variability,” J. of Low Power Electronics (JOLPE), vol. 2, pp. 121-128, 2006.
Static (Leakage) Power
• Dynamic
– Signal transitions: logic activity, glitches
– Short-circuit
• Static
– Leakage
Leakage Power
[Figure: nMOS transistor cross-section with n+ source and drain regions, Ground and VDD connections and a load R, showing the leakage components IG, ID, Isub, IPT and IGIDL.]
Leakage Current Components
• Subthreshold conduction, Isub
• Reverse bias pn junction conduction, ID
• Gate induced drain leakage, IGIDL due to tunneling at the gate-drain overlap
• Drain source punchthrough, IPT due to short channel and high drain-source voltage
• Gate tunneling, IG through thin oxide
Subthreshold Current
$I_{sub} = \mu_0 C_{ox} (W/L)\, V_t^2 \exp\{(V_{GS} - V_{TH}) / (n V_t)\}$
μ0: carrier surface mobility
Cox: gate oxide capacitance per unit area
L: channel length
W: gate width
Vt = kT/q: thermal voltage
n: a technology parameter
IDS for Short Channel Device
$I_{sub} = \mu_0 C_{ox} (W/L)\, V_t^2 \exp\{(V_{GS} - V_{TH} + \eta V_{DS}) / (n V_t)\}$
VDS = drain to source voltage
η: a proportionality factor
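The two subthreshold expressions differ only in the ηVDS term, so a single function can cover both (η = 0 gives the long-channel form). The sketch below is illustrative only: the parameter values for μ0·Cox, W/L, n and η are hypothetical, not taken from any process.

```python
import math

K_OVER_Q = 8.617333e-5   # Boltzmann constant divided by electron charge, in V/K


def thermal_voltage(temp_kelvin=300.0):
    """Vt = kT/q."""
    return K_OVER_Q * temp_kelvin


def subthreshold_current(v_gs, v_th, w_over_l, mu0_cox, n=1.5,
                         eta=0.0, v_ds=0.0, temp_kelvin=300.0):
    """Isub = mu0*Cox*(W/L)*Vt^2*exp((VGS - VTH + eta*VDS)/(n*Vt)); mu0_cox in A/V^2."""
    vt = thermal_voltage(temp_kelvin)
    return mu0_cox * w_over_l * vt ** 2 * math.exp((v_gs - v_th + eta * v_ds) / (n * vt))


# Off-state leakage (VGS = 0) for two thresholds, hypothetical device parameters
i_high_vth = subthreshold_current(0.0, v_th=0.4, w_over_l=2.0, mu0_cox=1e-4)
i_low_vth = subthreshold_current(0.0, v_th=0.3, w_over_l=2.0, mu0_cox=1e-4)
i_short = subthreshold_current(0.0, v_th=0.3, w_over_l=2.0, mu0_cox=1e-4, eta=0.05, v_ds=1.0)
print(i_low_vth / i_high_vth, i_short / i_low_vth)
```

The exponential dependence on VTH is what makes threshold scaling so costly in leakage: with these assumed parameters a 0.1 V reduction raises the off current by more than an order of magnitude.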
Increased Subthreshold Leakage
[Figure: log Isub versus gate voltage, with 0, VTH' and VTH marked on the voltage axis, a curve labeled "Scaled device" and a current level Ic marked.]
Reducing Leakage Power
• Leakage power as a fraction of the total power increases as clock frequency drops. Turning supply off in unused parts can save power.
• For a gate it is a small fraction of the total power; it can be significant for very large circuits.
• Scaling down features requires lowering the threshold voltage, which increases leakage power; leakage roughly doubles with each shrink.
• Multiple-threshold devices are used to reduce leakage power.
Problem Statement
• Problem: To design a CMOS circuit,
– using dual-threshold devices to globally minimize subthreshold leakage
– using delay elements to eliminate all glitches
– maintaining specified performance
– allowing performance-power tradeoff
• Reference: Y. Lu and V. D. Agrawal, “Leakage and Dynamic Glitch Power Minimization Using Integer Linear Programming for Vth Assignment and Path Balancing,” Proc. PATMOS, 2005, pp. 217-226.
MILP: Mixed Integer Linear Program
Minimize $\left\{ \sum_{\text{all gates } i} \left[ X_i I_{Li} + (1 - X_i) I_{Hi} \right] + \sum_{\text{all gates } i} \sum_{j} \Delta d_{ij} \right\}$
where
Xi = 1: gate i has low Vth, and its leakage is ILi
Xi = 0: gate i has high Vth, and its leakage is IHi
Δdij = delay inserted between gates i and j for glitch suppression
Xi ∈ {0, 1} is an integer, Δdij is a real variable
ILi and IHi are constants for gate i obtained by SPICE simulation
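For a handful of gates, the threshold-assignment part of this objective can be explored by exhaustive search instead of an MILP solver. The sketch below is a stand-in under strong assumptions: it ignores the Δdij term, writes delay constraints by explicit path enumeration, and uses made-up gate data; `assign_vth` and all numbers are hypothetical.

```python
from itertools import product


def assign_vth(gates, paths, t_max):
    """gates: name -> (D_L, D_H, I_L, I_H); paths: lists of gate names.
    Returns (minimum total leakage, assignment X) meeting the delay limit, or None."""
    names = list(gates)
    best = None
    for bits in product((0, 1), repeat=len(names)):      # X_i = 1 means low Vth
        x = dict(zip(names, bits))
        delay = {g: gates[g][0] if x[g] else gates[g][1] for g in names}
        if max(sum(delay[g] for g in p) for p in paths) > t_max:
            continue                                       # violates circuit delay limit
        leak = sum(gates[g][2] if x[g] else gates[g][3] for g in names)
        if best is None or leak < best[0]:
            best = (leak, x)
    return best


# Hypothetical data: low Vth is faster (delay 1.0) but leaks 10 units; high Vth is 1.4 and 1 unit
gates = {g: (1.0, 1.4, 10.0, 1.0) for g in ("g1", "g2", "g3")}
paths = [["g1", "g2"], ["g3"]]
tight = assign_vth(gates, paths, t_max=2.0)
relaxed = assign_vth(gates, paths, t_max=2.5)
print(tight, relaxed)
```

Relaxing the delay limit lets one more gate move to high Vth, the same power-delay tradeoff that the full MILP explores on larger circuits.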
MILP - Constraints
Circuit delay constraint for each PO i:
$T_i \le T_{max}$
Tmax can be the delay of critical path or clock period specified by the circuit designer.
Glitch suppression constraint for each gate i:
$T_i \ge T_j + \Delta d_{i,j} + X_i D_{Li} + (1 - X_i) D_{Hi}$ (1)
$t_i \le t_j + \Delta d_{i,j} + X_i D_{Li} + (1 - X_i) D_{Hi}$ (2)
$X_i D_{Li} + (1 - X_i) D_{Hi} \ge T_i - t_i$ (3)
Constraints (1), (2) and (3) make sure that Ti - ti < di for each gate, so glitches are eliminated.
Ti is the latest signal arrival time at the output of gate i.
ti is the earliest signal arrival time at the output of gate i.
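The arrival-time bookkeeping behind the glitch suppression constraints can be sketched for fixed gate delays as follows. The code propagates latest (T) and earliest (t) arrival times in topological order and flags gates whose window exceeds the gate delay. The circuit, the function names, the zero arrival time at primary inputs and the use of a non-strict comparison are all assumptions made for illustration.

```python
def arrival_windows(gates, order):
    """gates: name -> (delay, {fanin: inserted_delay}); fanins not in `gates` are
    primary inputs arriving at time 0. Returns (latest T, earliest t) per gate."""
    T, t = {}, {}
    for g in order:                                    # `order` must be topological
        delay, fanins = gates[g]
        T[g] = max(T.get(j, 0.0) + dd for j, dd in fanins.items()) + delay
        t[g] = min(t.get(j, 0.0) + dd for j, dd in fanins.items()) + delay
    return T, t


def glitch_prone(gates, order):
    """Gates whose arrival window T - t is wider than their own delay."""
    T, t = arrival_windows(gates, order)
    return [g for g in order if T[g] - t[g] > gates[g][0]]


order = ["g1", "g2", "g3"]
# Hypothetical chain of unit-delay gates; g3 sees a late input (via g2) and an early one (d)
unbalanced = {"g1": (1.0, {"a": 0.0, "b": 0.0}),
              "g2": (1.0, {"g1": 0.0, "c": 0.0}),
              "g3": (1.0, {"g2": 0.0, "d": 0.0})}
# Same circuit with a delay of 1 inserted on input d of g3
balanced = dict(unbalanced, g3=(1.0, {"g2": 0.0, "d": 1.0}))
print(glitch_prone(unbalanced, order), glitch_prone(balanced, order))
```

Each gate contributes a constant number of such relations, which is the property that a formulation linear in circuit size relies on instead of enumerating paths.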
Power-Delay Tradeoff Example
14-Gate Full Adder (Unoptimized, Tmax = Tc)
[Figure: full adder with inputs A, B, C and outputs S, C0; legend: low Vth gates, critical path.]
Ileak = 161 pA
Power-Delay Tradeoff Example
14-Gate Full Adder (Optimized, Tmax = Tc)
[Figure: full adder with inputs A, B, C and outputs S, C0; legend: low Vth, high Vth, delay buffer (high Vth), critical path.]
Ileak = 73 pA
Power-Delay Tradeoff Example
14-Gate Full Adder (Optimized, Tmax = 1.25Tc)
[Figure: full adder with inputs A, B, C and outputs S, C0; legend: low Vth, high Vth, delay buffer (high Vth), critical path.]
Ileak = 16 pA
Leakage Reduction and Performance Tradeoff @ 27℃, 70nm
State encoding can be selected using a power-based cost function.
FSM: Clock-Gating
• Moore machine: Outputs depend only on the state variables.
– If a state has a self-loop in the state transition graph (STG), then clock can be stopped whenever a self-loop is to be executed.
[Figure: STG with states Si, Sj and Sk; edges labeled Xi/Zk, Xj/Zk and Xk/Zk.]
Clock can be stopped when (Xk, Sk) combination occurs.
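The saving from stopping the clock on self-loops can be estimated by walking an input sequence through the state transition graph and counting the cycles in which the state actually changes. The two-state machine and input sequence below are hypothetical, and the sketch counts clock pulses only; it does not model the power of the activation logic itself.

```python
def count_clock_pulses(transitions, start, inputs):
    """transitions: (state, input) -> next state.
    Returns (pulses without gating, pulses with self-loop clock gating)."""
    state, gated = start, 0
    for x in inputs:
        nxt = transitions[(state, x)]
        if nxt != state:          # on a self-loop the activation logic stops the clock
            gated += 1
        state = nxt
    return len(inputs), gated


# Hypothetical Moore machine: input 0 holds the state (self-loop), input 1 toggles it
transitions = {("A", 0): "A", ("A", 1): "B", ("B", 0): "B", ("B", 1): "A"}
ungated, gated = count_clock_pulses(transitions, "A", [0, 0, 1, 0, 0, 0, 1, 0])
print(ungated, gated)
```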
Clock-Gating in Moore FSM
[Figure: combinational logic with primary inputs PI and primary outputs PO, state flip-flops, and clock activation logic with a latch that gates the clock CK to the flip-flops.]
L. Benini and G. De Micheli, Dynamic Power Management, Boston: Springer, 1998.
Clock-Gating in Low-Power Flip-Flop
[Figure: flip-flop with data input D, output Q and gated clock CK.]
C. Piguet, “Circuit and Logic Level Design,” pages 103-133 in W. Nebel and J. Mermet (ed.), Low Power Design in Deep Submicron Electronics, Boston: Kluwer Academic Publishers, 1997.
Reduced-Power Shift Register
[Figure: input D feeds two parallel chains of D flip-flops clocked by CK(f/2); a multiplexer combines the two chains into the output.]
Flip-flops are operated at full voltage and half the clock frequency.
Power Reduction in Processors
• Just about everything is used.
• Hardware methods:
– Voltage reduction for dynamic power
– Dual-threshold devices for leakage reduction
– Clock gating, frequency reduction
– Sleep mode
• Alpha 21064: 200MHz @ 3.45V, power dissipation = 26W
– Reduce voltage to 1.5V, power (5.3x) = 4.9W
– Eliminate FP, power (3x) = 1.6W
– Scale 0.75→0.35μ, power (2x) = 0.8W
– Reduce clock load, power (1.3x) = 0.6W
– Reduce frequency 200→160MHz, power (1.25x) = 0.5W
• J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
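The chain of reductions listed for the Alpha 21064 can be reproduced with a few lines of arithmetic. In this sketch the voltage factor is computed as the square of the voltage ratio and the frequency factor as the frequency ratio; the other factors (3x, 2x, 1.3x) are taken directly from the slide. The function name is hypothetical.

```python
def power_chain(p_start, steps):
    """Divide the starting power by each reduction factor in turn."""
    results, p = [], p_start
    for name, factor in steps:
        p /= factor
        results.append((name, p))
    return results


steps = [
    ("Reduce voltage 3.45V -> 1.5V", (3.45 / 1.5) ** 2),   # dynamic power scales as V^2
    ("Eliminate FP", 3.0),
    ("Scale 0.75 -> 0.35 micron", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160 MHz", 200.0 / 160.0),    # dynamic power scales as f
]
chain = power_chain(26.0, steps)
for name, watts in chain:
    print(f"{name}: {watts:.1f} W")
```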
Low-Power Datapath Architecture
• Lower supply voltage
– This slows down circuit speed
– Use parallel computing to gain the speed back
• Works well when threshold voltage is also lowered.
• About 60% reduction in power obtainable.
• Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (Now Springer), 1995.
A Reference Datapath
[Figure: input register, combinational logic and output register clocked by CK; the switched capacitance is Cref.]
Supply voltage = Vref
Total capacitance switched per cycle = Cref
Clock frequency = f
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$
A Parallel Architecture
[Figure: N copies of the combinational logic (Copy 1 to Copy N), each with its own register clocked at f/N, feeding an N to 1 multiplexer and an output register clocked at f; a multiphase clock generator driven by CK provides the register clocks and the mux control.]
A copy processes every Nth input, operates at reduced voltage
Supply voltage: $V_N \le V_1 = V_{ref}$
N = degree of parallelism
Control Signals, N = 4
[Figure: timing diagram of CK and the four clock phases, Phase 1 to Phase 4.]
Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 f / N + C_{outreg} V_N^2 f$
$= (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f$
$= C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$
$P_N = [1 + \delta (N - 1)]\, C_{ref} V_N^2 f$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{V_N^2}{V_{ref}^2}$
Voltage vs. Speed
Delay of a gate, $T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2}$
where I is saturation current
k is a technology parameter
W/L is width to length ratio of transistor
Vt is threshold voltage
[Figure: normalized gate delay T (0.0 to 4.0) versus supply voltage for 1.2μ CMOS, with Vt, V3 (N=3), V2 = 2.9V (N=2) and Vref = 5V (N=1) marked.]
Voltage reduction slows down as we get closer to Vt
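Combining the delay expression with the parallel-architecture power ratio gives a simple way to explore the tradeoff. The sketch below assumes the first-order model T ∝ V/(V - Vt)² with all constants dropped and finds, by bisection, the supply voltage at which a gate is N times slower than at Vref. It is not fitted to the plotted 1.2μ CMOS data, so it should not be expected to reproduce the V2 = 2.9V point; the function names are hypothetical.

```python
def gate_delay(v, v_t):
    """First-order delay model, constants dropped: T proportional to V / (V - Vt)^2."""
    return v / (v - v_t) ** 2


def scaled_voltage(n, v_ref, v_t, iters=200):
    """Supply voltage at which the gate delay is n times the delay at v_ref (bisection)."""
    target = n * gate_delay(v_ref, v_t)
    lo, hi = v_t + 1e-9, v_ref            # delay falls monotonically as v rises above v_t
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if gate_delay(mid, v_t) > target:
            lo = mid                      # too slow: voltage must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0


def power_ratio(n, v_ref, v_t, delta=0.0):
    """PN/P1 = [1 + delta*(N - 1)] * VN^2 / Vref^2."""
    v_n = scaled_voltage(n, v_ref, v_t)
    return (1.0 + delta * (n - 1)) * (v_n / v_ref) ** 2


print(scaled_voltage(4, 5.0, 0.0), power_ratio(4, 5.0, 0.0))
print(scaled_voltage(2, 5.0, 0.8), power_ratio(2, 5.0, 0.8, delta=0.1))
```

With Vt = 0 the model reduces to VN = Vref/N; with a non-zero threshold the voltage cannot be lowered as far, so the power saving for a given N is smaller.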
Increasing Multiprocessing
[Figure: PN/P1 (0.0 to 1.0) versus N (1 to 12) for Vt = 0V (extreme case), Vt = 0.4V and Vt = 0.8V; 1.2μ CMOS, Vref = 5V.]
Extreme Cases: Vt = 0
Delay, $T \propto 1 / V_{ref}$
For N processing elements, delay = NT → $V_N = V_{ref} / N$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{1}{N^2} \to 1/N$
For negligible overhead, δ→0
$\frac{P_N}{P_1} \approx \frac{1}{N^2}$
For Vt > 0, power reduction is less and there will be an optimum value of N.
Approximate Trend   n-parallel proc.   n-stage pipeline proc.
Capacitance         nC                 C
Voltage             V/n                V/n
Frequency           f/n                f
Power               CV²f/n²            CV²f/n²
Chip area           n times            10-20% increase

G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
Multicore Processors
[Figure: performance based on SPECint2000 and SPECfp2000 benchmarks versus year (2000, 2004, 2008) for multicore and single core processors. Source: Computer, May 2005, p. 12]
Multicore Processors
• D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp. 11-13, May 2005.
• A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.
• S. K. Moore, “Winner Multimedia Monster – Cell’s Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, pp. 20-23, January 2006.
Cell - Cell Broadband Engine Architecture
[Photo, left to right: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzoki, Sony]
Books on Low-Power Design (1)
• L. Benini and G. De Micheli, Dynamic Power Management: Design Techniques and CAD Tools, Boston: Springer, 1998.
• T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Boston: Springer, 2002.
• A. Chandrakasan and R. Brodersen, Low-Power Digital CMOS Design, Boston: Springer, 1995.
• A. Chandrakasan and R. Brodersen, Low-Power CMOS Design, New York: IEEE Press, 1998.
• J.-M. Chang and M. Pedram, Power Optimization and Synthesis at Behavioral and System Levels using Formal Methods, Boston: Springer, 1999.
• M. S. Elrabaa, I. S. Abu-Khater and M. I. Elmasry, Advanced Low-Power Digital Circuit Techniques, Boston: Springer, 1997.
• R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.
• S. Iman and M. Pedram, Logic Synthesis for Low Power VLSI Designs, Boston: Springer, 1998.
• J. B. Kuo and J.-H. Lou, Low-Voltage CMOS VLSI Circuits, New York: Wiley-Interscience, 1999.
• J. Monteiro and S. Devadas, Computer-Aided Design Techniques for Low Power Sequential Logic Circuits, Boston: Springer, 1997.
• S. G. Narendra and A. Chandrakasan, Leakage in Nanometer CMOS Technologies, Boston: Springer, 2005.
• W. Nebel and J. Mermet, Low Power Design in Deep Submicron Electronics, Boston: Springer, 1997.
Books on Low-Power Design (2)
• N. Nicolici and B. M. Al-Hashimi, Power-Constrained Testing of VLSI Circuits, Boston: Springer, 2003.
• V. G. Oklobdzija, V. M. Stojanovic, D. M. Markovic and N. Nedovic, Digital System Clocking: High Performance and Low-Power Aspects, Wiley-IEEE, 2005.
• M. Pedram and J. M. Rabaey, Power Aware Design Methodologies, Boston: Springer, 2002.
• C. Piguet, Low-Power Electronics Design, Boca Raton, Florida: CRC Press, 2005.
• J. M. Rabaey and M. Pedram, Low Power Design Methodologies, Boston: Springer, 1996.
• S. Roundy, P. K. Wright and J. M. Rabaey, Energy Scavenging for Wireless Sensor Networks, Boston: Springer, 2003.
• K. Roy and S. C. Prasad, Low-Power CMOS VLSI Circuit Design, New York: Wiley-Interscience, 2000.
• E. Sánchez-Sinencio and A. G. Andreou, Low-Voltage/Low-Power Integrated Circuits and Systems – Low-Voltage Mixed-Signal Circuits, New York: IEEE Press, 1999.
• W. A. Serdijn, Low-Voltage Low-Power Analog Integrated Circuits, Boston: Springer, 1995.
• S. Sheng and R. W. Brodersen, Low-Power Wireless Communications: A Wideband CDMA System Design, Boston: Springer, 1998.
• G. Verghese and J. M. Rabaey, Low-Energy FPGAs, Boston: Springer, 2001.
• G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Springer, 1998.
• K.-S. Yeo and K. Roy, Low-Voltage Low-Power Subsystems, McGraw Hill, 2004.
Other Books Useful in Low-Power Design
• A. Chandrakasan, W. J. Bowhill and F. Fox, Design of High-Performance Microprocessor Circuits, New York: IEEE Press, 2001.
• N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Reading, Massachusetts, Addison-Wesley, 2005.
• S. M. Kang and Y. Leblebici, CMOS Digital Integrated Circuits, New York: McGraw-Hill, 1996.
• E. Larsson, Introduction to Advanced System-on-Chip Test Design and Optimization, Springer, 2005.
• J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Second Edition, Upper Saddle River, New Jersey: Prentice-Hall, 2003.
• J. Segura and C. F. Hawkins, CMOS Electronics, How It Works, How It Fails, New York: IEEE Press, 2004.