Circuits and Architectures to Deliver Low Power and High Speed Systems
By: Jabulani Nyathi
Washington State University, School of EECS
April 30, 2009
Outline
CMOS Scaling
  Its benefits and the challenges it brings about
Various Techniques for Limiting Leakage Currents
  Their shortfalls
Bridging the Speed-Power Gap
  The Tunable Body Biasing Scheme
Emerging Devices and Technologies
Concluding Remarks
CMOS Scaling and its Benefits
Aggressive CMOS scaling has been a very positive development, allowing:
  Fast switching devices, thus high speed computing.
  Massive integration due to miniaturization.
No longer do we need multiple chips to implement a microprocessor and its peripherals.
In fact, we can now have multiple computing elements on a single die, resulting in a system on a chip.
CMOS Scaling and its Challenges
CMOS scaling results in:
  Increased leakage currents (5X/node) and increased dynamic power dissipation.
The interconnect does not scale as fast as the transistor, thus:
  Highly integrated designs require elaborate clock distribution schemes.
  IPs within a System on a Chip would be difficult to synchronize with a single clock source.
Scaling Implications
[Figure: Module 1 and Module 2 joined by local and global interconnects, shown again after scaling]
Dynamic Vs Leakage Power
Research Motivation
Desire to bridge the speed-power gap by exploring the feasibility of optimizing devices to operate effectively at both sub-threshold and above-threshold voltages.
Emerging technologies that are ultra-low power can benefit from increased speed.
  Wearable computers, sensor networks, implantable medical technology
Emphasis on design for energy efficiency
Existing Low Power Design Approaches
Solve energy dissipation problem from a region of operation standpoint
Sub-threshold design
  DTMOS: shows a 5.5 times increase in current
  Dynamic threshold provides energy efficiency
  SBB: 4.4 times frequency increase
Above threshold (super-threshold) design
  MTCMOS: high and low threshold devices
  VT Scheme: reduce power by 50% using ABB and “sleep”/“active” modes
Architectural
  Gating Techniques: 45% of total power
DTMOS/SBB Output Voltage Clamping
[Figure: output voltage of Traditional compared with SBB, DTMOS and TBB; levels marked at 600 mV and 1.8 V]
Proposed Approach
Change approach to include all possible operating regions: Tunable Body Biasing (TBB)
  Sub-threshold and super-threshold operation bridged
  Ultra-low energy and low speed, or high energy and high speed
Utilize body biasing to improve performance of sub-threshold operation
  Target increased performance at sub-threshold and slightly above threshold.
  Save energy by eliminating idle time and processing continuously with variable power supplies (just-in-time task completion)
Target applications
  Mobile, battery-operated (power-constrained), variable-processing devices
  Cell phones, PDAs, notebooks, wireless sensors, embedded systems, ASICs, medical technology, etc.
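The just-in-time idea on this slide can be read as: run at the lowest supply voltage whose maximum frequency still meets the rate the task needs. The following is a minimal Python sketch of that selection rule. The operating points are copied from the Tunable Body Biasing rows of the synchronizer table later in this deck; the function name and the selection rule itself are illustrative assumptions, not the author's controller.

```python
# Hypothetical operating points: (Vdd in V, max frequency in GHz, average power in uW).
# Values copied from the Tunable Body Biasing rows of the synchronizer table in this deck.
TBB_POINTS = [
    (0.2, 0.015, 0.806),
    (0.35, 0.167, 13.93),
    (0.7, 2.0, 583.1),
    (1.0, 4.0, 2460.0),
]


def pick_operating_point(required_ghz, points=TBB_POINTS):
    """Return the lowest-Vdd point whose max frequency meets the requirement, or None."""
    for vdd, fmax, power in sorted(points):  # sorted by Vdd, lowest first
        if fmax >= required_ghz:
            return vdd, fmax, power
    return None  # no listed point is fast enough


for need in (0.01, 0.1, 1.0, 5.0):
    print(f"{need} GHz required -> {pick_operating_point(need)}")
```

A 10 MHz task lands on the 0.2 V point, while a request above 4 GHz returns None because no listed point can meet it.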
TBB Implementation Goals
Attain ON state current gain while minimizing OFF state leakage current increase
Highlight advantages of sub-threshold operation while allowing super-threshold operation if needed
Control bulk terminal to tunable potentials depending on VDD and desired region of operation
MOS Bulk Control Circuits
Multiplexer-based approach
  Two transistors per bulk control circuit
  Utilizes Vthn0
TBB Bulk Control Circuits
Relies on the pass-transistor property of passing a good or poor logic “1” and logic “0”
Requires external control signals SubVt and SubVt_b
TBB MOS Bulk Control Signal
VDD                  pMOS Bulk      nMOS Bulk
VSS < VDD ≤ Vthn0    VSS            VDD
VDD > Vthn0          VDD - Vthn0    Vthn0
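The bulk control table can be expressed as a small selection function. This is a sketch in Python that only restates the table; the function name is hypothetical, VSS is assumed to be 0 V, and the Vthn0 value of 0.3762 V used in the example is an assumption borrowed from the 376.2 mV supply point that appears elsewhere in this deck.

```python
def tbb_bulk_bias(vdd, vthn0, vss=0.0):
    """Return (pmos_bulk, nmos_bulk) in volts, following the TBB bulk control table."""
    if not vss < vdd:
        raise ValueError("expected VSS < VDD")
    if vdd <= vthn0:
        # Sub-threshold row: pMOS bulk tied to VSS, nMOS bulk tied to VDD.
        return vss, vdd
    # Super-threshold row: pMOS bulk at VDD - Vthn0, nMOS bulk at Vthn0.
    return vdd - vthn0, vthn0


VTHN0 = 0.3762  # assumed value for illustration only
for vdd in (0.2, 0.3762, 1.0, 1.8):
    p_bulk, n_bulk = tbb_bulk_bias(vdd, VTHN0)
    print(f"VDD = {vdd} V: pBulk = {p_bulk:.4f} V, nBulk = {n_bulk:.4f} V")
```

In the actual circuit this selection is made by pass transistors driven by SubVt and SubVt_b; the function only captures the resulting bulk voltages.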
TBB Bulk Control Circuit Simulation
Sub-threshold: pBulk = 0 V
Super-threshold: pBulk = VDD – Vthn0
Device Optimization
TBB encourages varying supply voltages
How will devices be sized for optimal operation at any supply voltage?
  Maintain symmetric switching
  Examine inverter at varying supply voltages
Device Optimization (Switching Point)
VDD         Ideal Inverter Threshold    Simulated Inverter Threshold    Percent Variation
1.8 V       900 mV                      900 mV                          0.0%
1.0 V       500 mV                      498 mV                          0.4%
376.2 mV    188.1 mV                    198.7 mV                        5.6%
188.1 mV    94.05 mV                    108.6 mV                        13.4%
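The ideal switching point in this table is VDD/2, and the variation column compares the simulated threshold against it. A minimal Python sketch of that arithmetic follows; the definition used here (absolute deviation divided by the ideal value) is an assumption, since the slide does not state how the percentage was computed.

```python
def percent_variation(ideal_mv, simulated_mv):
    """Absolute deviation of the simulated threshold, as a percentage of the ideal one."""
    return abs(simulated_mv - ideal_mv) / ideal_mv * 100.0


rows = [  # (VDD in mV, simulated inverter threshold in mV), from the table above
    (1800.0, 900.0),
    (1000.0, 498.0),
    (376.2, 198.7),
    (188.1, 108.6),
]
for vdd, simulated in rows:
    ideal = vdd / 2.0  # symmetric switching point
    print(f"VDD = {vdd} mV: ideal = {ideal} mV, "
          f"variation = {percent_variation(ideal, simulated):.1f}%")
```

With this definition the first three rows reproduce the listed percentages. The last row comes out near 15.5%, whereas the listed 13.4% corresponds to dividing by the simulated value instead of the ideal one.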
Sub-threshold Noise Margins
Noise margins are significant for proper logic levels
TBB and traditional static CMOS inverters have comparable noise margins
  TBB VIH is 12.5% worse
  TBB VIL is 14.3% better
[Figure: Static CMOS at Vdd = Vthn0 with varying body biasing. Average switching delay (ns, 0 to 300) for Traditional, SBB, TBB and DTMOS; series: transmission gate, inverter, two-input NAND, two-input NOR, two-input XOR]
Propagation Delay
Gate    Traditional Delay    TBB Delay    % Decrease
TG      98 ns                14 ns        86
Inv     125 ns               20 ns        84
NAND    133 ns               18 ns        86
NOR     163 ns               25 ns        85
XOR     289 ns               40 ns        89
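The speedup and percent decrease follow directly from the two delay columns. A short Python sketch of that arithmetic, using the delays listed in the table; the helper names are illustrative.

```python
DELAYS_NS = {  # gate: (traditional delay, TBB delay) in ns, from the table above
    "TG": (98, 14),
    "Inv": (125, 20),
    "NAND": (133, 18),
    "NOR": (163, 25),
    "XOR": (289, 40),
}


def speedup(traditional, tbb):
    """How many times faster the TBB gate switches."""
    return traditional / tbb


def percent_decrease(traditional, tbb):
    """Delay reduction relative to the traditional gate, in percent."""
    return (traditional - tbb) / traditional * 100.0


for gate, (trad, tbb) in DELAYS_NS.items():
    print(f"{gate}: {speedup(trad, tbb):.2f}x faster, "
          f"{percent_decrease(trad, tbb):.1f}% less delay")
```

The speedups range from 6.25x for the inverter to about 7.4x for the NAND. From the listed delays the XOR decrease works out to about 86%, slightly below the 89 shown in the table.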
Review of SubVth Circuits Benefits
So far, the presentation has shown:
  TBB requires control of MOS bulks to span the operating regions of interest.
    Implementation is successful.
  Study of simple logic gates showed:
    TBB gives a dramatic speed increase (up to 7x)
    Static CMOS design style is suitable for sub-threshold and super-threshold operation
  Sizing of efficient devices for the TBB approach is possible
However, how will a complex system perform?
  Design with previous knowledge (logic style, sizing)
  Analyze post-layout simulations
Complex System-on-Chip Design Using TBB
Work addresses the challenges of
  Global interconnect delays
  Clock distribution
  Synchronization of unrelated clocks and
  Power dissipation
Conclusion
TBB scheme has been devised to span all regions of operation from ultra-low power to high-speed.
  New kind of body biasing
  Forward-biasing causes exponential sub-threshold current gain
    Leads to 7 times frequency increase in simple logic gates
  Focus on sub-threshold and slightly above threshold to utilize leakage
Bulk control circuits are effective
  4% area and 8.9% power dissipation increase
Static CMOS is ideal overall design style
Device sizing at either sub-threshold or super-threshold allows efficient operation with variable supply voltages
Concluding Remarks
Tunable operation lets the designer choose the operating point (kHz, MHz, GHz); energy dissipation is affected accordingly.
  Other schemes do not offer this flexibility
  TBB can lead to significant energy savings
LFSR results show TBB gives:
  Maximal 5.7 times speed increase (sub-threshold)
  Comparable energy at super-threshold and favorable at sub-threshold
  Favorable EDP at all operating regions
  Operate at the same speed with less energy dissipation
Idle state leakage current can be minimized by collapsing the supply voltage
ROUTER CHIP
Integrating Research Into Instruction
Data Path Circuits Memory Design Sub-System
Incorporating Research into Instruction
A long-term objective is to place some of the integrated chips on development boards such as those Digilent Inc. produces.
The integrated chips become part of a system and can be used in some of our low-level courses.
Most important is the use of these programmable boards to showcase the research outcomes, particularly to visiting prospective students.
A sample development board:
Questions and Comments Welcome!
Multiple Clock Domain Synchronization
[Figure: computational modules as synchronous islands connected by an isochronous communication micro-network]
Clock relationships are classified through $f_{fast} = n \cdot f_{slow}$: $n = 1$ gives equal clocks, and the conditions $n \in \mathbb{Z}$ and $n \in \mathbb{Q}$ distinguish the rational and arbitrary clock cases.
Reducing Interconnect Delays
  Improved latency and bandwidth
  Global interconnects are pipelined at or near the rate of computation
Sources of Power Consumption
$P_{\text{short-circuit}} = I_{\text{avg, short-circuit}} \cdot V_{\text{swing}}$
$P_{\text{dynamic}} = C_{\text{load}} \cdot f_{\text{clk}} \cdot V_{\text{swing}} \cdot V_{dd}$
$P_{\text{static}} = P_{\text{leakage}} + P_{DC}$
$P_{\text{total}} = P_{\text{static}} + P_{\text{dynamic}} + P_{\text{short-circuit}}$
The most straightforward method to reduce power consumption from any source is to reduce VDD.
Controlling frequency directly manipulates dynamic power.
Controlling device threshold manipulates leakage current, affecting leakage and short-circuit power.
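To make the dependence on VDD concrete, the power relations on this slide can be evaluated numerically. This Python sketch assumes they read P_short-circuit = I_avg,short-circuit · V_swing, P_dynamic = C_load · f_clk · V_swing · V_dd, P_static = P_leakage + P_DC and P_total = P_static + P_dynamic + P_short-circuit. The load, clock and voltage values are hypothetical and chosen only for illustration.

```python
def short_circuit_power(i_avg_sc, v_swing):
    """P_short-circuit = I_avg,short-circuit * V_swing (watts)."""
    return i_avg_sc * v_swing


def dynamic_power(c_load, f_clk, v_swing, v_dd):
    """P_dynamic = C_load * f_clk * V_swing * V_dd (watts)."""
    return c_load * f_clk * v_swing * v_dd


def static_power(p_leakage, p_dc):
    """P_static = P_leakage + P_DC (watts)."""
    return p_leakage + p_dc


def total_power(p_static, p_dynamic, p_short_circuit):
    """P_total = P_static + P_dynamic + P_short-circuit (watts)."""
    return p_static + p_dynamic + p_short_circuit


# Hypothetical node: 10 fF load, 1 GHz clock, full-swing signalling (V_swing = V_dd).
C_LOAD = 10e-15
F_CLK = 1e9
for v_dd in (1.8, 0.9):
    p_dyn = dynamic_power(C_LOAD, F_CLK, v_dd, v_dd)
    print(f"V_dd = {v_dd} V: dynamic power = {p_dyn * 1e6:.2f} uW")
```

With full-swing signalling the dynamic term scales with the square of the supply, so halving V_dd in this example cuts dynamic power to a quarter.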
Distributed FIFO Control Circuitry
Delay and frequency are for LocalClock2.
           Traditional Body Biasing                   Tunable Body Biasing                       Tunable BB % diff
Vdd (V)    delay (ps)   freq (GHz)   current (uA)     delay (ps)   freq (GHz)   current (uA)     freq    current
1          111.2        9            3100             103.1        9.7          2988             7.8     -3.6
0.7        172.55       5.8          1240             177.7        5.6          1042             -3.4    -16
0.35       1354.5       0.7383       71               1438         0.6954       72.9             -5.8    -2.7
0.2        96700        0.0103       2.81             16640        0.0601       5.051            483     79.8
Traditional vs. Tunable Body Biasing
The synchronizer/buffer shows an increase in performance at sub-threshold voltages when using tunable body biasing
                                                      Current (uA)                 Power (uW)
Body Biasing    Vdd (V)    Max Freq (GHz)    Peak     Avg      Idle       Peak      Avg       Idle
Traditional     1          4                 5597     2382     8.696      5597      2382      8.696
Traditional     0.7        2                 2222     803.4    4.873      1555.4    562.38    3.411
Traditional     0.35       0.125             131.1    35.58    1.468      45.885    12.453    0.514
Traditional     0.2        0.01              7.452    2.895    1.349      1.49      0.579     0.27
Tunable         1          4                 5140     2460     9.54       5140      2460      9.54
Tunable         0.7        2                 2050     833      4.423      1435      583.1     3.096
Tunable         0.35       0.167             132      39.8     1.589      46.2      13.93     0.556
Tunable         0.2        0.015             9.468    4.03     1.239      1.894     0.806     0.248
Pursuit of Low Power Operation
It is likely that not all IP blocks in a SoC need to operate at high speed
Power dissipation for those IP blocks could be reduced by operating at a lower voltage
TBB offers the possibility to dynamically operate at either sub-threshold or super-threshold voltages
Variable Voltage SoC
Consider a SoC with 50 IP blocks, each requiring communication at a rate of 10 MHz
Each IP could operate at sub-threshold levels
The channel could operate at super-threshold voltages while the IP blocks are in sub-threshold
[Figure: computational modules as synchronous islands connected by an isochronous communication micro-network, with separate supplies Vdd1 to Vdd5]
Idle vs Operating Power
Vdd (V)    Idle Current (uA)    Idle Power (uW)    Operating Current (uA)    Operating Power (uW)
1          16.9                 16.9               2988                      2988
0.7        5.3                  3.71               1042                      729.4
0.35       1.5                  0.525              72.9                      25.52
0.2        0.925                0.185              5.051                     1.01
During idle periods, it is advantageous to reduce leakage current by
  Reducing the power supply voltage or
  Increasing the threshold voltage (e.g. bulk voltage manipulation)
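The power columns in this table are the product of supply voltage and current, which makes the benefit of lowering the supply during idle periods easy to quantify. A minimal Python sketch using the table's values; the helper name is illustrative.

```python
ROWS = [  # (Vdd in V, idle current in uA, operating current in uA), from the table above
    (1.0, 16.9, 2988.0),
    (0.7, 5.3, 1042.0),
    (0.35, 1.5, 72.9),
    (0.2, 0.925, 5.051),
]


def power_uw(vdd, current_ua):
    """Power in microwatts from supply voltage (V) and current (uA)."""
    return vdd * current_ua


for vdd, idle_ua, op_ua in ROWS:
    idle = power_uw(vdd, idle_ua)
    operating = power_uw(vdd, op_ua)
    print(f"Vdd = {vdd} V: idle = {idle:.3f} uW, operating = {operating:.2f} uW, "
          f"idle share = {idle / operating * 100:.2f}%")

# Idle power saved by dropping the supply from 1.0 V to 0.2 V.
reduction = power_uw(1.0, 16.9) / power_uw(0.2, 0.925)
print(f"Idle power at 1.0 V is {reduction:.0f}x the idle power at 0.2 V")
```

Dropping the idle supply from 1.0 V to 0.2 V reduces idle power by roughly a factor of 90 in this data.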
Speed at Varying VDD
[Figure: minimum clock period (ns, log scale up to 100000) versus supply voltage (0 to 2 V); series: TBB Delay, Traditional Delay. Annotations: TBB 5.7x faster at 376.2 mV; TBB 20% faster at 1.8 V]
Energy-delay Product
EDP of TBB outperforms Traditional at ALL operating regions, significantly in super-threshold
Regions of Operation
[Figure annotations: 3.9 MHz with 0.6 fJ/cycle; 222.2 MHz with 103 fJ/cycle; 1.1 GHz with 3.85 nJ/cycle]
Contributions of this work
Proposed scheme alleviates the communication bottleneck and offers a way to synchronize multiple SoC clocks
  Perform data transfers up to 10 GHz
Proposed scheme maintains high performance under the influence of any clock skew
  6.5 GHz for any process corner and any skew
Low power FIFO scheme with a small impact on area when used in SoCs with many modules
Contributions of this work
Process corners have a minor impact on performance, resulting in a 10% reduction of speed
The optimal voltage for minimum energy consumption per transaction is at 2Vth
Introduction of TBB to address leakage and dynamic power dissipation
  500% increase in performance at sub-threshold voltages with a modest 80% increase in power
  5-10% less power dissipation than traditional body biasing
Summary of Proposed FIFO Scheme
Linear FIFO scheme that addresses
  Signal propagation across communication channel
  Sustained throughput over long distances
Successful synchronization
  Synchronizes equal, rational and arbitrary clocks
  6.5 GHz sustained performance after process corner analysis using 3 stages
Compared to CN scheme
  Fewer devices per stage, fewer stages needed
  25% higher performance, 12% lower power
Operates at both super- and sub-threshold voltages
  Lower instantaneous power demands from local clocks (less di/dt)
  Optimal energy per transaction at 0.7 V in a 65 nm process
  Sub-threshold reduces power by 3 orders of magnitude
  Tunable Body Biasing provides 50% increased performance in sub-threshold while maintaining super-threshold operation
TBB Scalability
                                     180 nm                                     90 nm
Body Biasing and Operating Region    Total Avg Power    Static Contribution     Total Avg Power    Static Contribution
Traditional in Sub-threshold         193 pW             0.1%                    13.1 nW            1.8%
Traditional in Super-threshold       39.6 μW            Negligible              22.1 μW            Negligible
TBB in Sub-threshold                 1430 pW            25.2%                   20.4 nW            6.1%
TBB in Super-threshold               39.4 μW            0.000034%               22.1 μW            0.0025%
At 180 nm, the TBB sub-threshold static power percentage is large. At 90 nm, the percentage difference is much less.
Callouts: total TBB sub-threshold power is large (180 nm); total TBB sub-threshold power isn't so large (90 nm).
LFSR Energy vs. Frequency
[Figure: TBB and Traditional LFSR Energy Dissipation vs Frequency. Energy dissipation (fJ, 0 to 225) versus frequency (MHz, 0 to 1100); series: Traditional Energy, TBB Energy]
TBB Implementation Cont.
TBB Implementation Cont.
Logic Gate Analysis (Power)
[Figure: Power Dissipation vs Supply Voltage. Power dissipation (nW, log scale from 0.0001 to 1000) at supply voltages of 0.25, 0.3762, 0.75 and 1.8 V; series: Traditional CMOS Power, TBB CMOS Power]
Inverter Power Dissipation
VDD [V]    Power Dissipation [fW]    Average Power [nW]    Maximum Frequency [MHz]    Period [ns]
0.3262     8.27                      3.5                   0.416                      2400.0
0.4262     11.41                     30.0                  2.6                        380.0
0.5643     15.64                     651.6                 41.7                       24.0
1.8        82.30                     68.60                 833.3                      1.2

VDD [V]    Power Dissipation [fW]    Average Power [nW]    Maximum Frequency [MHz]    Period [ns]
0.3262     8.52                      22.4                  2.6                        380.0
0.4262     13.00                     259.8                 20.0                       50.0
0.5643     15.13                     2102.0                138.9                      7.2
1.8        81.47                     81.5                  1000.0                     1.0
Logic Gate Analysis (Energy)
[Figure: Energy Dissipation vs Supply Voltage. Energy dissipation (fJ, 0 to 180) at supply voltages of 0.25, 0.3762, 0.75 and 1.8 V; series: Traditional CMOS Energy, TBB CMOS Energy]
Logic Gate Analysis (EDP)
[Figure: EDP vs Power Supply. EDP (fJ*ns, -5000 to 30000) at supply voltages of 0.25, 0.3762, 0.75 and 1.8 V; series: Traditional CMOS EDP, TBB CMOS EDP]
Logic Gate Analysis (Fan-in)
[Figure: propagation delay (ns, 0 to 1400) versus number of inputs (one to four); series: Traditional NAND, TBB NAND, Traditional NOR, TBB NOR]
Logic Gate Analysis (Logic Styles)
[Figure: energy dissipated (fJ, 0 to 70) at supply voltages of 0.5*Vthn, 0.75*Vthn, Vthn - 50 mV, Vthn, Vthn + 50 mV and 1.5*Vthn; series: Traditional Pseudo-nMOS Energy, TBB Pseudo-nMOS Energy]
LFSR Power Dissipation
[Figure: average power dissipation (uW, -100 to 800) versus supply voltage (0 to 2 V); series: TBB Power, Traditional Power]
Device Optimization (Optimal Region)
[Figure: clock period (ns, 0 to 4000) and energy dissipation (fJ, 0 to 4500000) at supply voltages of 0.3262, 0.3762, 0.5643, 0.7524, 1.1286, 1.5048 and 1.8 V; series: TBB Delay, TBB Energy Dissipation]
Regions of Operation
                    Super-threshold (1.8 V)       Sub-threshold (250 mV)        Optimal (750 mV)
Design              Delay (ns)    Energy (fJ)     Delay (ns)    Energy (fJ)     Delay (ns)    Energy (fJ)
Traditional LFSR    0.7           437.6           20000         105             7             74.1
TBB LFSR            0.6           437             4500          22.8            4.5           73.6
Frequency range     GHz                           kHz                           MHz
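The energy-delay product (EDP) claim made earlier in the deck can be checked against this table by multiplying each delay by its energy. A small Python sketch using the table's LFSR values; the dictionary layout and helper name are illustrative.

```python
LFSR = {  # design -> region -> (delay in ns, energy in fJ), from the table above
    "Traditional": {
        "super-threshold": (0.7, 437.6),
        "sub-threshold": (20000.0, 105.0),
        "optimal": (7.0, 74.1),
    },
    "TBB": {
        "super-threshold": (0.6, 437.0),
        "sub-threshold": (4500.0, 22.8),
        "optimal": (4.5, 73.6),
    },
}


def edp(delay_ns, energy_fj):
    """Energy-delay product in fJ*ns."""
    return delay_ns * energy_fj


for region in ("super-threshold", "sub-threshold", "optimal"):
    trad = edp(*LFSR["Traditional"][region])
    tbb = edp(*LFSR["TBB"][region])
    print(f"{region}: traditional EDP = {trad:.1f} fJ*ns, TBB EDP = {tbb:.1f} fJ*ns")
```

For these LFSR numbers the TBB design has the lower EDP in all three regions, and the gap is widest at the sub-threshold point.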
Logic Gate Results
Results Highlights
  TBB, SBB, and DTMOS increase speed up to 7 times in sub-threshold
  Static CMOS has best overall logic style performance
  Pseudo-nMOS, Domino, and pass-transistor are still valuable in niche situations
  TBB and Traditional noise margins are comparable