EE241 - Spring 2004
Advanced Digital Integrated Circuits
Borivoje Nikolic
Lecture 17
Low-Power Design:
Dynamic Body Bias
Energy Recovery in CMOS
SOI
2
Announcements
Midterm project reports due this Friday
  Please post links on your project web page
Homework #3 due after the Spring break
3
Class Material
Last lecture
  Dynamic voltage scaling
  Leakage control
Today’s lecture
  Dynamic body bias
  Energy recovery techniques
  Silicon-on-insulator
4
Dynamic Body Bias
Similar concept to dynamic voltage scaling
  Control loop adjusts the substrate bias to meet the timing
Can also be used simply as a two-state runtime/sleep control
Limited range of threshold adjustment (<100 mV)
  Limited leakage reduction (<10x)
No delay penalty
  Can increase speed by forward bias
Energy cost of charging/discharging the substrate capacitance
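The pairing of a <100 mV threshold range with a <10x leakage reduction is what subthreshold conduction predicts: leakage falls by one decade for every subthreshold swing S of threshold increase. A minimal sketch, assuming S = 100 mV/decade (an illustrative value, not taken from the slides):

```python
def leakage_reduction(delta_vt_mv, swing_mv_per_decade=100.0):
    """Factor by which subthreshold leakage drops when VT rises by delta_vt_mv.

    swing_mv_per_decade is an assumed subthreshold swing, not a slide value.
    """
    return 10.0 ** (delta_vt_mv / swing_mv_per_decade)


for dvt in (25, 50, 100):
    print(f"dVT = {dvt:3d} mV -> leakage reduced by {leakage_reduction(dvt):.1f}x")
```

With a steeper device (smaller S) the same 100 mV shift would buy more than one decade, so the 10x bound depends on the assumed swing.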
5
Dynamic Body Bias
6
VTCMOS Variants
Courtesy of IEEE Press, New York. 2000
7
Substrate Bias in VT
Kuroda, JSSC 11/96
8
Dynamic Body Bias
[Figure: body-bias circuits around a dual-VT core, with PMOS and NMOS body and bias nodes between VCC and VSS.]
Forward body bias (FBB), active mode
  450 mV FBB on the PMOS and NMOS bodies
  Local VCC tracking
Reverse body bias (RBB), idle mode
  500 mV RBB on the PMOS and NMOS bodies (VHIGH and VLOW supplies)
  Triple well needed
Tschanz, ISSCC’03
9
Substrate Bias Effect
$$V_{TH} = V_{T0} + \gamma\left(\sqrt{2\phi_F - V_{BS}} - \sqrt{2\phi_F}\right) \approx V_{T0} - \gamma_1 V_{BS}$$
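As a numeric illustration of the substrate bias effect, the sketch below evaluates the standard first-order body-effect expression, V_TH = V_T0 + γ(√(2φ_F − V_BS) − √(2φ_F)), together with its linearization around V_BS = 0. The parameter values (V_T0 = 0.3 V, γ = 0.3 V^0.5, 2φ_F = 0.8 V) are illustrative assumptions, not values from the lecture.

```python
import math


def vth(v_bs, v_t0=0.3, gamma=0.3, two_phi_f=0.8):
    """NMOS threshold for body-to-source voltage v_bs.

    v_bs < 0 is reverse body bias, v_bs > 0 is forward body bias.
    All default parameters are assumed, illustrative values.
    """
    return v_t0 + gamma * (math.sqrt(two_phi_f - v_bs) - math.sqrt(two_phi_f))


def vth_linear(v_bs, v_t0=0.3, gamma=0.3, two_phi_f=0.8):
    """First-order expansion around v_bs = 0: V_T0 - gamma1 * v_bs."""
    gamma1 = gamma / (2.0 * math.sqrt(two_phi_f))
    return v_t0 - gamma1 * v_bs


for v_bs in (-0.5, 0.0, 0.45):  # 500 mV RBB, zero bias, 450 mV FBB
    print(f"V_BS = {v_bs:+.2f} V: exact {vth(v_bs):.3f} V, linear {vth_linear(v_bs):.3f} V")
```

Reverse bias raises the threshold and forward bias lowers it; the linear form is only accurate for small |V_BS|.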
10
Body Bias Layout
[Figure: ALU layout showing the ALU core LBGs and the sleep transistor LBGs.]
Number of ALU core LBGs: 30
Number of sleep transistor LBGs: 10
PMOS device width: 13 mm
Area overhead: 8%
11
Leakage Power Savings vs. Decap
[Figure: normalized leakage power in idle mode vs. idle time (10 ns to 10 ms) for the dual-VT core on virtual VCC at 1.32 V, 75°C. Two curves: no decap on virtual VCC, and a low-leakage 133 nF decap on virtual VCC. Annotations at 90% and 40%.]
Minimize capacitance on virtual VCC
Overhead: charging and discharging of the virtual VCC capacitance
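The cost of capacitance on the virtual VCC rail can be estimated with a crude break-even model: if the rail droops by some voltage during idle, restoring it draws C × droop × VCC from the supply, and idle mode only pays off once the leakage energy saved exceeds that amount. The sketch below assumes the full leakage saving is available immediately, so it gives a lower bound on the break-even time. Only VCC = 1.32 V and the 133 nF decap come from the slide; the droop, the saved leakage power, and the 10 nF "no decap" capacitance are hypothetical.

```python
def recharge_energy(c_virtual, vcc, droop):
    """Energy drawn from the supply to restore a rail that drooped by `droop` volts."""
    return c_virtual * droop * vcc


def break_even_idle_time(c_virtual, vcc, droop, leakage_power_saved):
    """Idle time at which the saved leakage energy equals the recharge energy."""
    return recharge_energy(c_virtual, vcc, droop) / leakage_power_saved


VCC = 1.32            # V, from the slide
DROOP = 0.3           # V, hypothetical collapse of virtual VCC in idle mode
P_SAVED = 0.1         # W, hypothetical leakage power saved in idle mode

t_no_decap = break_even_idle_time(10e-9, VCC, DROOP, P_SAVED)   # assumed 10 nF intrinsic
t_decap = break_even_idle_time(133e-9, VCC, DROOP, P_SAVED)     # 133 nF decap from the slide
print(f"break-even without decap: {t_no_decap * 1e9:.0f} ns")
print(f"break-even with 133 nF decap: {t_decap * 1e9:.0f} ns")
```

In this model the break-even time scales linearly with the rail capacitance, which is the motivation for keeping capacitance on virtual VCC small.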
12
Decoupling Capacitor Placement
[Figure: dual-VT core with the decap on the full supply (decap oxide leakage remains) versus on the virtual supply (reduced leakage, longer time constant). Comparison criteria: performance, convergence time, oxide leakage savings.]
13
Total Active Power Savings (Fixed activity: α = 0.05)
[Figure: two plots of total power savings (0% to 20%) vs. number of consecutive idle cycles (TOFF, 10 to 1,000,000) and number of consecutive active cycles (TON, 0.5 to 50,000). Annotated maxima: 18% and 8%.]
Body bias (1.28V): active: FBB, idle: ZBB
PMOS sleep transistor (1.32V)
Reference: 450mV FBB to core with clock gating, 1.28V, 4.05GHz, 75°C
Power savings for TOFF > ~100 idle cycles
14
Body Biasing
Body biasing with a local control loop can be used to lower the impact of process variations
  Used to limit die-to-die and within-die variations
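A toy model of such a loop, under stated assumptions: gate delay follows the alpha-power law VDD/(VDD − VTH)^α, the threshold shifts linearly with body bias (VTH = VTH0 − γ1·V_BS), and the loop picks the most reverse-biased setting that still meets the target delay. The function names, γ1 = 0.2, α = 1.3, and the die thresholds are hypothetical; the bias range of 500 mV RBB to 450 mV FBB follows the values quoted earlier in the lecture.

```python
def delay(vdd, vth, alpha=1.3):
    """Alpha-power-law gate delay, up to a constant factor (assumed model)."""
    return vdd / (vdd - vth) ** alpha


def tune_body_bias(vth_zero_bias, target_delay, vdd=1.0, gamma1=0.2,
                   step=0.01, vbs_min=-0.5, vbs_max=0.45):
    """Return the lowest V_BS (most reverse bias) that meets target_delay.

    Sweeps from full reverse bias toward forward bias, as a stand-in for a
    hardware loop that steps the bias until a delay monitor passes.
    Returns None if timing cannot be met within the bias range.
    """
    n_steps = int(round((vbs_max - vbs_min) / step))
    for k in range(n_steps + 1):
        vbs = vbs_min + k * step
        vth = vth_zero_bias - gamma1 * vbs
        if delay(vdd, vth) <= target_delay:
            return vbs
    return None


target = delay(1.0, 0.30)                  # timing of a nominal die (assumed VTH = 0.30 V)
vbs_slow = tune_body_bias(0.33, target)    # slow die: high VTH
vbs_fast = tune_body_bias(0.27, target)    # fast, leaky die: low VTH
print(f"slow die bias: {vbs_slow:+.2f} V, fast die bias: {vbs_fast:+.2f} V")
```

The slow die ends up forward biased and the fast die reverse biased, so both land near the same delay while the fast die's leakage is reduced.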
15
Normalized Delay vs VDD & VTH
Sakurai, Kuroda
[Figure: normalized delay (0.6 to 1.8) vs. VTH (0 to 1 V) for VDD = 1.0 V, 1.5 V, 3.0 V, and 5.0 V, with threshold variations ΔVTH = ±0.05 V and ±0.15 V marked.]
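The sensitivity of delay to threshold variation at low supply voltage can be reproduced with the alpha-power-law delay model, delay ∝ VDD/(VDD − VTH)^α. This is a sketch under assumptions: α = 1.3 and a nominal VTH of 0.3 V are illustrative choices, and the model is not necessarily the one used to generate the plot.

```python
def normalized_delay(vdd, vth, alpha=1.3):
    """Alpha-power-law gate delay, up to a constant factor."""
    if vdd <= vth:
        raise ValueError("model is only valid for VDD > VTH")
    return vdd / (vdd - vth) ** alpha


def delay_spread(vdd, vth_nominal, dvth, alpha=1.3):
    """Ratio of the slowest (VTH + dvth) to the fastest (VTH - dvth) delay."""
    slow = normalized_delay(vdd, vth_nominal + dvth, alpha)
    fast = normalized_delay(vdd, vth_nominal - dvth, alpha)
    return slow / fast


for vdd in (1.0, 1.5, 3.0, 5.0):           # supply voltages from the plot
    for dvth in (0.05, 0.15):              # variation bands from the plot
        spread = delay_spread(vdd, 0.3, dvth)
        print(f"VDD = {vdd:.1f} V, dVTH = +/-{dvth:.2f} V -> slow/fast = {spread:.2f}")
```

The spread grows as VDD drops toward VTH, which is why threshold control matters more at low supply voltages.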
16
Self-Adjusting Threshold-Voltage Scheme (SATS)
17
SATS Experimental Results
18
Substrate Biasing
Tschanz, JSSC 11/02
19
Effectiveness of Substrate Bias
Die-to-die variations
20
Effectiveness of Substrate Bias
Within-die variations
21
Dynamic Voltage Scaled Microprocessor
External VDD 3.3V ±10%
Internal VDDL 0.8V~2.9V ±5%
Large variation in optimal circuit parameters Vdd_opt, Vth_opt, w_opt
[Figure: ranges Vdd_min to Vdd_max and Vth_min to Vth_max.]
Sizing, Supply, Threshold Optimization
26
Adiabatic Circuits
[Figure: adiabatic charging of a capacitor C through a resistance R by a ramp with rise time t_r.]
E = (RC/t_r) CV² (for t_r >> RC)
Applying slow input slopes reduces E below CV²
Useful for driving large capacitors (buffers)
  Power reduction > 4x for pad drivers (1 MHz), ISI
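The adiabatic charging estimate E = (RC/t_r)·CV² can be checked numerically. The sketch below integrates the energy dissipated in R while a linear ramp charges C, using forward Euler, and compares it with the estimate and with the ½CV² lost when the input is switched abruptly. The component values (1 kΩ, 1 pF, 1 V) and the function names are arbitrary choices for illustration.

```python
def adiabatic_estimate(R, C, V, t_ramp):
    """Slide formula E = (RC/t_r) * C * V^2, valid only for t_ramp >> RC."""
    return (R * C / t_ramp) * C * V * V


def ramp_charging_loss(R, C, V, t_ramp, steps=100_000):
    """Energy dissipated in R while a 0 -> V linear ramp charges C.

    Forward-Euler integration; 10 time constants of settling are added
    after the ramp so the residual charging loss is included.
    """
    tau = R * C
    t_end = t_ramp + 10.0 * tau
    dt = t_end / steps
    vc = 0.0        # capacitor voltage
    energy = 0.0    # accumulated resistive loss
    for k in range(steps):
        t = (k + 0.5) * dt
        vin = V * min(t / t_ramp, 1.0)
        i = (vin - vc) / R
        energy += i * i * R * dt
        vc += i * dt / C
    return energy


R, C, V = 1e3, 1e-12, 1.0
half_cv2 = 0.5 * C * V * V
for factor in (1, 10, 100):                 # ramp time in units of RC
    t_ramp = factor * R * C
    sim = ramp_charging_loss(R, C, V, t_ramp)
    est = adiabatic_estimate(R, C, V, t_ramp)
    print(f"t_r = {factor:3d} RC: simulated {sim / half_cv2:.3f}, "
          f"estimate {est / half_cv2:.3f} (in units of CV^2/2)")
```

For t_r comparable to RC the estimate overshoots, as expected from its validity condition; it converges to the simulated loss as the ramp slows.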
27
Adiabatic Computing
Basic Concepts
  When charging a capacitor through an RC network with a slowly changing ramp, power dissipation is reduced by reducing the slope of the ramp.
  No switch should ever be enabled while there is a voltage across it.
  Make sure every node is reset to its original state before performing the next operation!
  “Reversible computing”
    -> take energy back to the source
    -> ensure that state is known
28
Adiabatic Computing
Principles of storing and erasing information:
  Energy dissipation of the combinational logic can be made arbitrarily small by operating the circuit slowly enough.
  Information can be loaded into memory circuits while dissipating only an arbitrarily small amount of energy.
  Information can be copied with arbitrarily small energy.
  Erasing the last copy of a piece of information dissipates an irreducible finite amount of energy.
Koller, Athas, PhysComp’92; Landauer, IBM J. Res. Dev.’61
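For scale, Landauer's bound on the energy dissipated by erasing one bit is kT·ln 2. The sketch below compares that bound at an assumed 300 K with the ½CV² stored on a hypothetical 1 fF node at 1 V; both the temperature and the node values are illustrative assumptions.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K, exact SI value


def landauer_limit(temperature_k=300.0):
    """Minimum energy dissipated by erasing one bit: k * T * ln(2)."""
    return BOLTZMANN * temperature_k * math.log(2.0)


def switching_energy(c_farads, v_volts):
    """Energy stored on a capacitive node: 1/2 * C * V^2."""
    return 0.5 * c_farads * v_volts ** 2


e_min = landauer_limit()
e_node = switching_energy(1e-15, 1.0)   # assumed 1 fF node at 1 V
print(f"Landauer limit at 300 K: {e_min:.3e} J")
print(f"1 fF node at 1 V:        {e_node:.3e} J ({e_node / e_min:.1e} x the limit)")
```

The gap of several orders of magnitude shows that the irreducible erasure cost is far below practical CMOS switching energies.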
29
Six-Phase Charge Transfer
Watkins, JSSC 12/67
One-bit delay
30
Split-Level Charge Recovery
Younis, Knight, IWLPD’94
31
Adiabatic Circuits
[Figure: cascaded stages A, B, C clocked by phases φ0, φ1, φ2 (from Athas).]
Hold the inputs of each stage until the output energy has been returned
32
Reversible Pipelines
[Figure: pipeline stage with In, Logic, Out, a return-logic path, clock clk, and load capacitance C.]
Make the return path different
Problem: always results in a CVth²/2 loss!
33
Reversible Pipelines
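[Figure: reversible pipeline with clock phases φ0 to φ3 and their hold and reset intervals. Stage i is clocked by φi and handles data d(i-1), d(i), d(i+1), with input din and a return path.]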
34
Partially Erasable Latches
[Figure: latch with transistors M1 to M4 and outputs F1, driven by power clock Pck swinging between 0 and V.]
Output stays at Vth
Stored energy is ½CVth², vs. ½CV²
How to use this?
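The benefit of leaving the output at Vth rather than at the full swing is the ratio (Vth/V)² of the two stored energies. A short sketch with hypothetical values (Vth = 0.4 V, V = 1.2 V, C = 1 fF), none of which come from the slides:

```python
def stored_energy(c_farads, v_volts):
    """Energy stored on a node: 1/2 * C * V^2."""
    return 0.5 * c_farads * v_volts ** 2


def residual_fraction(vth, v_swing):
    """Energy left on a node held at Vth, relative to a full-swing 1/2*C*V^2."""
    return (vth / v_swing) ** 2


C_NODE, VTH, V_SWING = 1e-15, 0.4, 1.2   # hypothetical values
print(f"full swing: {stored_energy(C_NODE, V_SWING):.2e} J")
print(f"at Vth:     {stored_energy(C_NODE, VTH):.2e} J "
      f"({100 * residual_fraction(VTH, V_SWING):.0f}% of full swing)")
```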
35
Partially Erasable Latches
[Figure: latch with transistors M1 to M6 and outputs F0 and F1, using power clocks Pck and Pck1.]
Denker, ISLPED’94
Requires 4 clocks for interfacing
36
Single Pck + Auxiliary Clock
[Figure: latch with transistors M1 to M8 and outputs F0 and F1, driven by power clock Pck and auxiliary clock CX, with a timing diagram of Pck, CX, F0, and F1 for inputs A and B.]
Maksimovic et al, ISLPED’97
37
Clock Generation
[Figure: clock generation, shown as principle and implementation. Labels: VDD, inductor L, bias VB, gate drive VG, power clock Pck at frequency fc, clock Clk with a ÷2 divider, auxiliary clock CX, Enable, and logic stages 1 to n with outputs F0 to Fn.]
38
Single Pck + Reference Voltages
Kim, Papaefthymiou, ISLPED’98
Cascading Gates:
39
Adiabatic µP
Athas, et al, JSSC 12/97
Athas, et al, JSSC 11/00
40
Adiabatic µP
41
E-R Latch
W/ dynamic logic
W/ PTL
42
Other Ideas
Charge recycling bus
  H. Yamauchi, et al, JSSC, 4/95
Adiabatic display driver
  J. Ammer, ISSCC’99
Various examples of charge-recycling logic
43
Silicon on Insulator (SOI)
References:
  Chapter 5 by Shahidi, Assaderaghi, Antoniadis
  K. Bernstein, N.J. Rohrer, “SOI Circuit Design Concepts,” Kluwer, 2000.
  K. Bernstein, ISSCC’00 SOI Tutorial
  2001 Microprocessor Design Workshop, lectures by C.T. Chuang and R. Preston
  Articles from Chandrakasan/Brodersen, IEEE Press 1998.
44
SOI Transistor
Bernstein, ISSCC’00
45
SOI Devices
Partially depleted (PD)
  Pros:
    Easier to manufacture (Si thickness)
    Scalable, tolerance to variations
    Decoupling VT from Si thickness
  Cons:
    Floating body effects: I-V kink, parasitic bipolar effect
Fully depleted (FD)
  Pros:
    Significantly reduced floating body effects
    Sharper subthreshold slope S
  Cons:
    VT is a function of the charge in the body, so it varies
    Manufacturability, compatibility with bulk CMOS
46
SOI Microprocessors
Comp. | Processor | Freq. | Technology | Comment | Source
DEC | StrongArm-110 (Core Only) | 230 MHz (Tester Limit) | PD/SOI, 0.35 um | 20% Perf. Over Bulk | IEDM 97
Samsung | 64b DEC Alpha | 600 MHz | FD/SOI, 0.25 um Lgate, 4LM Al | Bulk 433 MHz, 0.35 um Lgate | ISSCC 99
IBM | 32b PowerPC (PowerPC750) | 580 MHz | PD/SOI, 0.12 um Leff, 6LM Cu | Bulk 480 MHz, 0.12 um Leff | ISSCC 99
IBM | 64b PowerPC | 550 MHz | PD/SOI, 0.12 um Leff, 6LM Cu | Bulk 450 MHz, 0.12 um Leff | ISSCC 99
IBM | 64b PowerPC | 660 MHz | PD/SOI, 0.08 um Leff, 7LM Cu | Migration from 0.12 um Leff | ISSCC 00
IBM | 64b Power4 | 1.00 GHz | PD/SOI, 0.08 um Leff, 7LM Cu | - | Hot Chip 99, EE Times 99
IBM | 64b Power4 (Test Chip) | 1.00 GHz | PD/SOI, 0.08 um Leff, 7LM Cu | - | IEDM 99
IBM | 64b Power4 | 1.10 GHz | PD/SOI, 0.08 um Leff, 7LM Cu | - | ISSCC 01
From C.T. Chuang
47
SOI Design
Advantages:
  Less capacitance (~5-40%)
    Lower power
  Reduced effective VT, short channel effects, body effect
  Layout simplicity (no wells, plugs, …)