Micro transductors ’08: Low Power VLSI Design 2
Dr.-Ing. Frank Sill, Department of Electrical Engineering, Federal University of Minas Gerais, Av. Antônio Carlos 6627, CEP: 31270-010, Belo Horizonte (MG), Brazil
http://www.cpdee.ufmg.br/~frank/
Micro transductors ’08, Low Power 2. Copyright Sill, 2008

Agenda
Recap
Power reduction on:
  Gate level
  Architecture level
  Algorithm level
  System level

Recap: Problems of Power Dissipation
Continuously increasing performance demands
Increasing power dissipation of technical devices
Today: power dissipation is a main problem
High power dissipation leads to:
  High efforts for cooling
  Increasing operational costs
  Reduced reliability
  Reduced time of operation
  Higher weight (batteries)
  Reduced mobility

Recap: Consumption in CMOS
Water analogy:
  Voltage (Volt, V) corresponds to water pressure (bar)
  Current (Ampere, A) corresponds to water quantity per second (liter/s)
  Energy corresponds to amount of water
Energy consumption is proportional to capacitive load!
[Figure: capacitive load CL switched between 0 and 1]

Recap: Energy and Power
[Figure: two plots of power (Watts) over time, each comparing Approach 1 and Approach 2. Power is the height of the curve; energy is the area under the curve.]
Energy = Power * time for calculation = Power * Delay

Recap: Power Equations in CMOS
$P = \alpha f C_L V_{DD}^2 + V_{DD} I_{peak} (P_{0 \to 1} + P_{1 \to 0}) + V_{DD} I_{leak}$
First term: dynamic power (≈ 40-70 % today and decreasing relatively)
Second term: short-circuit power (≈ 10 % today and decreasing absolutely)
Third term: leakage power (≈ 20-50 % today and increasing)

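The three terms of the CMOS power equation can be evaluated separately to see how each contributes. The sketch below is only an illustration: the function name `cmos_power` and all numeric values are assumptions chosen for the example, not figures from the slides.

```python
def cmos_power(alpha, f, c_load, v_dd, i_peak, p_01, p_10, i_leak):
    """Return (dynamic, short_circuit, leakage) power in watts."""
    dynamic = alpha * f * c_load * v_dd ** 2          # alpha * f * C_L * V_DD^2
    short_circuit = v_dd * i_peak * (p_01 + p_10)     # V_DD * I_peak * (P01 + P10)
    leakage = v_dd * i_leak                           # V_DD * I_leak
    return dynamic, short_circuit, leakage

# Hypothetical operating point, chosen only for illustration.
dyn, sc, leak = cmos_power(alpha=0.1, f=1e9, c_load=1e-9, v_dd=1.2,
                           i_peak=0.05, p_01=0.1, p_10=0.1, i_leak=0.04)
total = dyn + sc + leak
for name, value in (("dynamic", dyn), ("short-circuit", sc), ("leakage", leak)):
    print(f"{name:>13}: {value * 1e3:7.1f} mW ({100 * value / total:4.1f} %)")
```

Because the dynamic term is quadratic in V_DD while the other two terms are linear, lowering the supply voltage shifts the relative shares as well as the total.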
Recap: Levels of Optimization
Level          Savings    Speed      Error
System         > 70 %     Seconds    > 50 %
Algorithm      40-70 %    Minute     25-50 %
Architecture   25-40 %    Minutes    15-30 %
Gate           15-25 %    Hour       10-20 %
Transistor     10-15 %    Hours      5-10 %
Source: after Massoud Pedram

Voltage can be dropped while maintaining the original throughput:
$P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = (1.1\, C_{ref}) (V_{ref}/1.7)^2 f_{ref} \approx 0.38\, P_{ref}$

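The pipelining estimate can be checked numerically. This is a minimal sketch assuming only the dynamic-power relation P = C V² f; the helper name `relative_dynamic_power` is hypothetical.

```python
def relative_dynamic_power(cap_factor, voltage_divisor, freq_factor=1.0):
    """P / P_ref for C = cap_factor * C_ref, V = V_ref / voltage_divisor,
    and f = freq_factor * f_ref, using P = C * V^2 * f."""
    return cap_factor * freq_factor / voltage_divisor ** 2

# Pipelined datapath: 10 % more capacitance, supply reduced by a factor of 1.7,
# same clock frequency as the reference design.
p_pipe = relative_dynamic_power(cap_factor=1.1, voltage_divisor=1.7)
print(f"P_pipe / P_ref = {p_pipe:.2f}")
```

The ratio comes out at roughly 0.38, so almost all of the saving stems from the quadratic voltage term, while the extra pipeline registers cost only about 10 %.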
Approximate Trend
                N-parallel proc.                   N-stage pipeline proc.
Capacitance     $N \cdot C_{ref}$                  $C_{ref}$
Voltage         $V_{ref}/N$                        $V_{ref}/N$
Frequency       $f_{ref}/N$                        $f_{ref}$
Dynamic Power   $C_{ref} V_{ref}^2 f_{ref}/N^2$    $C_{ref} V_{ref}^2 f_{ref}/N^2$
Chip area       N times                            10-20% increase
Source: G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.

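The trend table can be reproduced by multiplying the scaling factors for capacitance, voltage, and frequency. The sketch below assumes the idealized scaling from the table (voltage reducible to V_ref/N), which real circuits only approximate; the function name `scaled_dynamic_power` is hypothetical.

```python
def scaled_dynamic_power(n, style):
    """Relative dynamic power P / P_ref for n parallel units or n pipeline stages."""
    if style == "parallel":
        cap, volt, freq = n, 1 / n, 1 / n    # N*C_ref, V_ref/N, f_ref/N
    elif style == "pipeline":
        cap, volt, freq = 1, 1 / n, 1        # C_ref, V_ref/N, f_ref
    else:
        raise ValueError(f"unknown style: {style}")
    return cap * volt ** 2 * freq            # P = C * V^2 * f

for n in (1, 2, 4):
    par = scaled_dynamic_power(n, "parallel")
    pipe = scaled_dynamic_power(n, "pipeline")
    print(f"N={n}: parallel {par:.4f}, pipeline {pipe:.4f}")
```

Both styles end up at 1/N² of the reference power under these assumptions; they differ mainly in area cost, as the last table row states.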
Guarded Evaluation
Reduction of switching activity by adding latches at inputs
Latch preserves previous value of inputs to suppress activity
Could also use AND gates to mask inputs to zero (= forced zero)
[Figure: multiplier with inputs A, B, C and a "condition" signal, shown without and with a latch at the multiplier inputs that is controlled by "condition"]

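The effect of guarding can be illustrated by counting bit toggles at the inputs of a guarded block. This is a behavioral sketch, not a circuit model: the 8-bit operands, the 25 % probability that the result is needed, and the names `hamming` and `input_toggles` are all assumptions made for the example.

```python
import random

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def input_toggles(samples, guarded):
    """Count bit toggles seen at the block inputs.

    samples: sequence of (a, b, condition); condition is True when the
    result of the block is actually needed in that cycle.
    """
    toggles = 0
    seen_a = seen_b = 0  # values currently applied to the block inputs
    for a, b, condition in samples:
        if guarded and not condition:
            continue  # latch holds the previous inputs, so nothing toggles
        toggles += hamming(seen_a, a) + hamming(seen_b, b)
        seen_a, seen_b = a, b
    return toggles

rng = random.Random(0)
samples = [(rng.getrandbits(8), rng.getrandbits(8), rng.random() < 0.25)
           for _ in range(10_000)]
print("unguarded toggles:", input_toggles(samples, guarded=False))
print("guarded toggles:  ", input_toggles(samples, guarded=True))
```

With uncorrelated random operands, the guarded toggle count scales roughly with the fraction of cycles in which the condition holds.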
Precomputation
Identify logical conditions on some inputs under which the output does not depend on the remaining inputs
Since those remaining inputs don’t affect the output, disable their input transitions
Trade area for energy
[Figure: precomputation architecture. The precomputed inputs feed register R1 and the precomputation logic g(X); the gated inputs feed register R2, whose load-disable signal is driven by g(X); R1 and R2 feed the combinational logic f(X), which produces the outputs.]
Source: Irwin, 2000

Design steps
1. Selection of precomputation architecture
2. Determination of precomputed and gated inputs (Register R1 should be much smaller than R2)
3. Search for a good implementation of g(X)
4. Evaluation of potential energy savings based on input statistics (if savings are not sufficient, go to step 2 or 3 and try again)
Also works for multiple-output functions, where $g(X) = \prod_j g_j(X)$

Precomputation: Example
Binary Comparator
[Figure: n-bit binary value comparator computing A > B. The most significant bits An and Bn are held in R1; the remaining bits An-1 ... A1 and Bn-1 ... B1 are held in R2. The load-disable signal of R2 is derived from comparing An with Bn.]
Can achieve up to 75% power reduction with 3% area overhead and 1 to 5 additional gate delays in worst case path
Source: Irwin, 2000

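The comparator example can be mimicked in software: when the most significant bits differ, they alone decide A > B, so the register holding the lower bits does not need to load. The sketch below assumes unsigned operands and uniformly random inputs; `compare_with_precomputation` is a hypothetical name, and the code models register loading only, not actual power.

```python
import random

def compare_with_precomputation(a, b, n, r2_state):
    """Return (a > b, new R2 contents, True if R2 was loaded).

    a, b: unsigned n-bit values; r2_state: lower bits currently held in R2.
    """
    msb_a = (a >> (n - 1)) & 1
    msb_b = (b >> (n - 1)) & 1
    if msb_a != msb_b:
        # The MSBs decide the result; R2 keeps its old value (load disabled).
        return msb_a > msb_b, r2_state, False
    low_mask = (1 << (n - 1)) - 1
    r2_state = (a & low_mask, b & low_mask)
    return r2_state[0] > r2_state[1], r2_state, True

rng = random.Random(0)
n, trials, loads, r2 = 8, 10_000, 0, (0, 0)
for _ in range(trials):
    a, b = rng.getrandbits(n), rng.getrandbits(n)
    result, r2, loaded = compare_with_precomputation(a, b, n, r2)
    assert result == (a > b)   # precomputation must not change the function
    loads += loaded
print(f"R2 loaded in {100 * loads / trials:.1f} % of cycles")
```

For uniformly distributed inputs the MSBs differ in about half of all cycles, so R2 and the logic behind it stay quiet about half the time. Real input statistics decide the actual saving, which is why step 4 of the design steps evaluates them.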
Various algorithms exist to implement an integer adder:
  Ripple, select, skip (x2), look-ahead, conditional-sum
  Each with its own characteristics of timing and power consumption
Adders differ in energy and delay
Different adders for different applications
Also true for other units (multiplier, counter, …)

Bus Power
Buses are a significant source of power dissipation
  50% of dynamic power for interconnect switching (Magen, SLIP 04)
  MIT Raw processor’s on-chip network consumes 36% of total chip power (Wang et al. 2003)
Caused by:
  High switching activities
  Large capacitive loading
[Figure: bus with bus drivers (inputs Ain, Bin, Cin, Din) and bus receivers (outputs Wout, Xout, Yout, Zout)]
Source: Irwin, 2000

Bus Power Reduction
For an n-bit bus: $P_{bus} = n \cdot \alpha f_{Clk} C_{load} V_{DD}^2$
Alternative bus structures:
  Segmented buses (lower $C_{load}$)
  Charge recovery buses
  Bus multiplexing (lower $f_{Clk}$ possible)
Minimizing bus traffic (n):
  Code compression
  Instruction loop buffers
Minimization of bit switching activity ($\alpha$) by data encoding
Minimize voltage swing ($V_{DD}^2$) using differential signaling
Source: Irwin, 2000

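The bus power formula can be applied to a concrete word sequence by estimating the switching activity from the data. In this sketch α is taken as the average fraction of bus lines that toggle per transfer, which is one possible convention (others count only 0-to-1 transitions). The function names and all electrical values are assumptions for illustration.

```python
def switching_activity(words, n_bits):
    """Average fraction of bus lines that toggle per transfer."""
    toggles = sum(bin(prev ^ cur).count("1")
                  for prev, cur in zip(words, words[1:]))
    return toggles / (n_bits * (len(words) - 1))

def bus_power(n_bits, alpha, f_clk, c_load, v_dd):
    """P_bus = n * alpha * f_clk * C_load * V_DD^2, in watts."""
    return n_bits * alpha * f_clk * c_load * v_dd ** 2

# Hypothetical 8-bit bus: 100 MHz clock, 1 pF per line, 1.2 V supply.
streams = {
    "alternating 0x00/0xFF": [0x00, 0xFF] * 8,
    "counting 0..15": list(range(16)),
}
for name, words in streams.items():
    alpha = switching_activity(words, n_bits=8)
    power = bus_power(8, alpha, f_clk=100e6, c_load=1e-12, v_dd=1.2)
    print(f"{name}: alpha = {alpha:.3f}, P_bus = {power * 1e3:.3f} mW")
```

The same bus dissipates very different power depending on the data it carries, which is the motivation for data encoding.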
Adaptive Dynamic Voltage Scaling (DVS)
Slow down processor to fill idle time
More delay allows a lower operational voltage
Runtime scheduler determines processor speed and selects appropriate voltage
Transition delay for frequencies ~150 µs
Potential to realize 10x energy savings
[Figure: processor activity over time. At 3.3 V: Active, Idle, Active, Idle. At 2.4 V: continuously Active.]

Adaptive DVS: Example
[Figure: speed over time for tasks T1 and T2, once at full speed with idle time and once at reduced speed without idle time. Same work, lower energy.]
Task with 100 ms deadline, requires 50 ms CPU time at full speed
Normal system gives 50 ms computation, 50 ms idle/stopped time
Half speed/voltage system gives 100 ms computation, 0 ms idle
Same number of CPU cycles but: $E = C (V_{DD}/2)^2 = E_{ref}/4$
Dynamic Voltage Scaling adapts voltage to workload

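The DVS example can be worked through numerically. The sketch below rests on idealized assumptions that go beyond the slide: the supply voltage scales linearly with clock frequency, idle time costs no energy, and the values `F_MAX`, `V_MAX`, and `C_EFF` are hypothetical.

```python
def task_energy(cycles, c_eff, v_dd):
    """Dynamic energy of a task: one C * V^2 charge per cycle (idealized)."""
    return cycles * c_eff * v_dd ** 2

F_MAX = 100e6   # full-speed clock in Hz (assumed)
V_MAX = 3.3     # supply voltage at full speed in V (assumed)
C_EFF = 1e-9    # effective switched capacitance per cycle in F (assumed)

cpu_time, deadline = 0.050, 0.100        # 50 ms of work, 100 ms deadline
cycles = F_MAX * cpu_time                # work is fixed in cycles
speed = cpu_time / deadline              # slowest speed that meets the deadline

e_ref = task_energy(cycles, C_EFF, V_MAX)
e_dvs = task_energy(cycles, C_EFF, V_MAX * speed)   # voltage scaled with speed
run_time = cycles / (F_MAX * speed)

print(f"speed factor: {speed:.2f}, run time: {run_time * 1e3:.0f} ms")
print(f"E_dvs / E_ref = {e_dvs / e_ref:.2f}")
```

The number of cycles is unchanged, so the saving comes entirely from the lower voltage per cycle, not from the lower frequency.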
Design Layer: System Level
Basic elements: complex modules
  Processors
  Calculation and control units
  Sensors
[Figure: system-level block diagram with ALU, MEM, and MP3 blocks]

Dynamic Power Management
Systems are:
  Designed to deliver peak performance, but …
  Not needing peak performance most of the time
Components are idle sometimes
Dynamic power management (DPM):
  Puts idle components into low-power, non-operational states
Power manager:
  Observes and controls the system
  Power consumption of power manager is negligible

Processor Sleep Modes
Software power control - power management
  DOZE: Most units stopped except on-chip cache memory (cache coherency)
  NAP: Cache also turned off, PLL still on, time out or external interrupt to resume
  SLEEP: PLL off, external interrupt to resume
Deeper sleep mode consumes less power
Deeper sleep mode requires more latency to resume

Processor Sleep Modes: Example
PowerPC sleep modes
Mode                   66 MHz    80 MHz
No power mgmt          2.18 W    2.54 W
Dynamic power mgmt     1.89 W    2.20 W
DOZE                   307 mW    366 mW
NAP                    113 mW    135 mW
SLEEP                  89 mW     105 mW
SLEEP without PLL      18 mW     19 mW
SLEEP without clock    2 mW      2 mW
10 cycles to wake up from SLEEP
100 µs to wake up from SLEEP+
Source: Irwin, 2000

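To get a feeling for the table, the average power of a processor that alternates between active operation and one sleep mode can be estimated from the 66 MHz column. This is a rough sketch: it ignores wake-up latency and transition energy, and the 20 % duty cycle and the function name `average_power` are assumptions.

```python
def average_power(duty_cycle, p_active, p_sleep):
    """Time-weighted average power; duty_cycle is the active fraction of time."""
    return duty_cycle * p_active + (1.0 - duty_cycle) * p_sleep

P_ACTIVE = 2.18  # W, "No power mgmt" at 66 MHz
SLEEP_MODES = {  # W, values at 66 MHz
    "DOZE": 0.307,
    "NAP": 0.113,
    "SLEEP": 0.089,
    "SLEEP without PLL": 0.018,
    "SLEEP without clock": 0.002,
}

duty = 0.2  # assumed: processor is busy 20 % of the time
for mode, p_sleep in SLEEP_MODES.items():
    print(f"{mode:<20} {average_power(duty, P_ACTIVE, p_sleep):.3f} W")
```

At low duty cycles the sleep-mode power dominates the average, so the deeper modes pay off as long as their longer wake-up time is acceptable.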
Transmeta LongRun
Applies adaptive DVS
LongRun policies:
  Detection of different workload scenarios
  Based on runtime performance information
After detection, corresponding adaptation of:
  Processor supply voltage
  Processor frequency
  Clock frequency always within limits required by supply voltage to avoid clock skew problems
Use of hard-coded core frequency/voltage operating points
Best trade-off between performance and power possible

Transmeta LongRun cont’d
[Figure: % of max power consumption (0 to 100) versus frequency (300 to 1000 MHz), divided into a typical operating region and a peak performance region]
Operating points:
  300 MHz    0.80 V
  433 MHz    0.87 V
  533 MHz    0.95 V
  667 MHz    1.05 V
  800 MHz    1.15 V
  900 MHz    1.25 V
  1000 MHz   1.30 V
Source: Transmeta

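The listed operating points allow a rough estimate of relative power per point. The sketch below assumes that only dynamic power matters and that it scales as f · V², so its numbers need not match the plotted Transmeta curve exactly.

```python
# (frequency in MHz, supply voltage in V), taken from the operating points above
OPERATING_POINTS = [
    (300, 0.80), (433, 0.87), (533, 0.95), (667, 1.05),
    (800, 1.15), (900, 1.25), (1000, 1.30),
]

def relative_power(points):
    """Percent of maximum power per point, assuming P is proportional to f * V^2."""
    raw = [f * v ** 2 for f, v in points]
    return [100.0 * p / max(raw) for p in raw]

for (f, v), pct in zip(OPERATING_POINTS, relative_power(OPERATING_POINTS)):
    print(f"{f:5d} MHz @ {v:.2f} V: {pct:5.1f} % of max power")
```

Under this assumption the lowest point delivers 30 % of the peak frequency at only about 11 % of the peak power, which illustrates why scaling voltage together with frequency is far more effective than scaling frequency alone.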
Transmeta LongRun: Example
Source: Transmeta

Battery aware design
Non-linear effects influence the lifetime of batteries
“Rate Capacity”:
  If discharge currents are higher than allowed, the real capacity drops below the nominal capacity
“Battery Recovery”:
  Pulsed discharge increases nominal capacity
  Based on recovery times (as long as there is no rate capacity effect)
[Figure: capacity (mAh, 200 to 1000) versus discharge current (mA); standard capacity 1000 mAh at the rated current of 125 mA]
[Figure: discharge current (mA) and available charge over time, with idle periods between discharge pulses]
Source: Timmermann, 2007

Analytically very sound but computationally intensive
Cannot be used for online scheduling decisions

Battery aware design: Example 1
[Figure: current and battery voltage over time]
Performance of a bipolar lead-acid battery subjected to six current impulses. Pulse length = 3 ms, rest period = 22 ms.
Source: LaFollette, “Design and performance of high specific power, pulsed discharge, bipolar lead acid batteries”, 10th Annual Battery Conference on Applications and Advances, Long Beach, pp. 43–47, January 1995.

Battery aware design: Example 2
[Figure: current [mA] over time for discharge profile A and discharge profile B]
Profile    Aver. current [mA]    Battery lifetime [ms]    Specif. energy [Wh/kg]
A          123.8                 357053                   15.12
B          124.2                 536484                   18.58
Minimum average current ≠ maximum battery lifetime
Source: Timmermann, 2007

Backup

FSM: Clock-Gating
Moore machine: Outputs depend only on the state variables.
If a state has a self-loop in the state transition graph (STG), then the clock can be stopped whenever a self-loop is to be executed.
[Figure: state transition graph with states Si, Sj, Sk and edge labels Xi/Zk, Xj/Zk, Xk/Zk]
Clock can be stopped when the (Xk, Sk) combination occurs.

Trend: Interconnects
Propagation delays of global wires will be a multiple of the clock cycle.
Example (very optimistic): 6–10 clock cycles in 50 nm technology [Benini, 2002]
Source: Tenhunen, 2005

Bus Multiplexing
Number of bus transitions per cycle = 2 (1 + 1/2 + 1/4 + ...) = 4
Source: Irwin, 2000

Resource Sharing and Activity II

Bus Multiplexing
Sharing of long data buses with time multiplexing
Example:
  S1 uses even cycles
  S2 uses odd cycles
[Figure: sources S1, S2 and destinations D1, D2, connected once by dedicated buses and once by a shared, time-multiplexed bus]
Source: Irwin, 2000

Correlated Data Streams
[Figure: bit switching probabilities (0 to 1) versus bit position (14 = MSB down to 0 = LSB) for a muxed and a dedicated bus]
For a shared (multiplexed) bus, advantages of data correlation are lost (bus carries samples from two uncorrelated data streams)
Bus sharing should not be used for positively correlated data streams
Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits) - more random switching
Source: Irwin, 2000

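The loss of correlation on a shared bus can be reproduced with a small simulation. The sketch below is built on assumptions not taken from the slides: two independent, positively correlated signals are modeled as first-order autoregressive processes with hypothetical parameters `rho` and `sigma`, and samples are sent as 15-bit two's-complement words.

```python
import random

N_BITS = 15  # bit positions 14 (MSB) down to 0 (LSB)

def correlated_stream(rng, length, rho=0.95, sigma=300.0):
    """Positively correlated signed samples as two's-complement bus words."""
    words, x = [], 0.0
    limit = 1 << (N_BITS - 1)
    for _ in range(length):
        x = rho * x + rng.gauss(0.0, sigma)          # AR(1) process
        value = max(-limit, min(limit - 1, round(x)))
        words.append(value & ((1 << N_BITS) - 1))    # two's-complement encoding
    return words

def switching_probabilities(words):
    """Per-bit toggle probability between successive bus words, MSB first."""
    counts = [0] * N_BITS
    for prev, cur in zip(words, words[1:]):
        diff = prev ^ cur
        for bit in range(N_BITS):
            counts[bit] += (diff >> bit) & 1
    return [c / (len(words) - 1) for c in reversed(counts)]

rng = random.Random(1)
s1 = correlated_stream(rng, 20_000)
s2 = correlated_stream(rng, 20_000)
dedicated = switching_probabilities(s1)
# Time multiplexing: S1 and S2 alternate on the shared bus.
muxed = switching_probabilities([w for pair in zip(s1, s2) for w in pair])
print(f"MSB switching: dedicated {dedicated[0]:.2f}, muxed {muxed[0]:.2f}")
print(f"LSB switching: dedicated {dedicated[-1]:.2f}, muxed {muxed[-1]:.2f}")
```

On the dedicated bus the sign bit rarely toggles because successive samples are similar. On the muxed bus, neighboring words come from unrelated streams, so the upper bits switch almost as randomly as the lower ones.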
Disadvantages of Bus Multiplexing
If the data bus is shared, advantages of data correlation are lost (bus carries samples from two uncorrelated data streams)
Bus sharing should not be used for positively correlated data streams
Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits) - more random switching