Low power architecture and design techniques for mobile handset LSI Medity™ M2
NEC Electronics Shuichi [email protected]
January 24th, 2008
ASP-DAC 2008
Outline
Medity™ M2 overview
Use case analysis and low power issues
  heavy workload
  light workload
  standby mode
Medity™ M2 low power techniques
  Hierarchical clock gating
  Automatic frequency control
  Multiple Vt transistors and on-chip power switch
  Back-bias control (UltimateLowPower™)
Power measurement results
Summary and conclusions
Medity™ M2 Overview
M2 is a second-generation LSI for mobile handsets that integrates the DBB and the application processor on one chip
[Figure: roadmap of logic size across the 130nm, 90nm, and 65nm process generations. Labels: 2G DBB LSI, DBB LSI, HSDPA Acc., Application Solution, MP211, M1, M2.]
*DBB: Digital BaseBand
M1 and M2 comparisons
M2 aims for the same power consumption as M1, despite the large differences between the two chips

Category                 Function                          M1                M2                             M2/M1
Frequency                CPU/DSP                           250MHz (ARM9)     500MHz (ARM11)                 2 times
Frequency                System Bus (3D graphics, image)   125MHz (AHB)      166MHz (AXI)                   1.3 times
Functions (Application)  LCD                               QVGA+ 345×240     WVGA 854×480                   5 times
Functions (Application)  Motion codec H.264                QVGA+ Dec (DSP)   D1 Enc, Dec (Dedicated macro)  new
Functions (Application)  3D sound                          -                 128 chords                     new
Functions (DBB)          Communication                     W-CDMA            W-CDMA                         -
Functions (DBB)          Communication                     -                 HSDPA (3.6Mbps)                new
Functions (DBB)          Communication                     -                 GSM/GPRS                       new
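Power consumptions in three use cases
  Heavy workload
  Light workload
  Standby mode
These power values already include the effect of the M2 low power techniques
[Figure: bar chart of relative power consumption, split into Dynamic and Leak components, for Heavy workload (TV phone), Light workload (music player), and Standby. Vertical axis: relative power consumption, 0 to 100. Data labels printed in the chart: 156, 0.3, 85, 34. Conditions: Typ, 1.2V@26C.]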
Medity M2 Low power techniques
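The M2 low power techniques together cover every element of power consumption

Technique     Description                                              Target
LP-Tech1      Hierarchical clock gating                                Dynamic power
LP-Tech2      Automatic frequency control                              Dynamic power
LP-Tech3a     (a) Multiple Vt transistors and on-chip power switch     Leakage power
LP-Tech3b     (b) PMU                                                  Leakage power
LP-Tech4      Back-bias control (UltimateLowPower™)                    Leakage power

[Figure: power breakdown into Logic, Clock, and Leakage, grouped as dynamic power and leakage power, with each LP-Tech label attached to the element it reduces.]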
Hierarchical clock gating (LP-Tech1)
Reduce the dynamic power of unused macros and of the clock tree by gating at the clock starting point
An active monitor in each macro generates the clock control condition
Three levels of clock gating:
  1. Clock gating at the clock tree start point
  2. Sub-module level clock gating
  3. Register-level dynamic clock gating
[Figure: a multi-frequency generator with clock control drives Macro1 (INTC/Timer), Macro2, and Macro3 through gating cells (G) and CTS down to the flip-flops (with en inputs). The active monitor in each macro monitors the macro state for automatic clock control.]
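
The three gating levels on this slide can be pictured with a small behavioral model. This is only a sketch under assumptions: the data layout, the function names, and the example enable values are hypothetical and are not taken from the M2 design. It shows that a register is clocked only when the gate at the clock tree start point, the sub-module gate, and the register-level enable are all open.

```python
def clock_reaches_register(root_en, submodule_en, register_en):
    """A register sees a clock edge only if every gate on its path is open."""
    return root_en and submodule_en and register_en


def count_clocked_registers(macros):
    """Count registers that still receive a clock.

    macros: list of dicts with
      'active'     - bool, the state reported by the macro's active monitor
      'submodules' - list of (submodule_en, [register_en, ...]) tuples
    (hypothetical layout, chosen for illustration only)
    """
    clocked = 0
    for macro in macros:
        root_en = macro["active"]  # level 1: gate at the clock tree start point
        for submodule_en, register_ens in macro["submodules"]:  # level 2
            for register_en in register_ens:  # level 3
                if clock_reaches_register(root_en, submodule_en, register_en):
                    clocked += 1
    return clocked


# Hypothetical snapshot: one active macro, one idle macro.
macros = [
    {"active": True,
     "submodules": [(True, [True, False, True]), (False, [True, True])]},
    {"active": False,
     "submodules": [(True, [True, True, True, True])]},
]
print(count_clocked_registers(macros))  # 2 of the 9 registers are clocked
```

In this snapshot the idle macro is cut off at the first level, so none of its clock tree toggles, which is the case the slide targets.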
Automatic frequency control (LP-Tech2)
Reduce the dynamic power of macros whose clock cannot be stopped and of the system bus
The active monitors generate the clock control condition for the multi-frequency generator
The clock is slowed down when Macro2 and Macro3 are idle
[Figure: the active monitors of Macro1, Macro2, and Macro3 set reg1, reg2, and reg3 (printed values: 0, 1, 1) and drive Macro1_clkreq, Macro2_clkreq, and Macro3_clkreq into the multi-frequency generator, which outputs Macro1_clk, Macro2_clk, and Macro3_clk. Timeline: Normal, Clock down, Normal, with a "Frequency change!" marker and two "Clock stop" intervals.]
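
The rule on this slide (clock down when Macro2 and Macro3 are idle) can be sketched as a small selection function. The divide ratio of 4 and the function name are assumptions made for illustration; the slides do not state the actual ratios used by the multi-frequency generator.

```python
def select_divider(clkreq, idle_divider=4):
    """Pick the divide ratio for the clock that cannot be stopped.

    clkreq maps a macro name to True while that macro requests its clock.
    idle_divider is a hypothetical value, not an M2 parameter.
    """
    others_busy = clkreq["Macro2"] or clkreq["Macro3"]
    return 1 if others_busy else idle_divider


# Hypothetical timeline: Normal, Clock down, Normal.
timeline = [
    {"Macro1": True, "Macro2": True, "Macro3": True},
    {"Macro1": True, "Macro2": False, "Macro3": False},
    {"Macro1": True, "Macro2": False, "Macro3": True},
]
dividers = [select_divider(state) for state in timeline]
print(dividers)  # [1, 4, 1]
```

Because the decision depends only on the request signals, it can be made in hardware without software intervention, which matches the "by HW" wording in the summary slide.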
Multiple Vt Transistors & on-chip power switch (LP-Tech3a, LP-Tech3b)
  Achieve target speed (Low-Vt)
  Reduce the area (Low-Vt)
  Reduce the leakage power (High-Vt & on-chip power switch)
  Reduce the external components (on-chip power switch)

Vt types
  High-Vt: slow speed, low leak
  Mid-Vt: middle speed and leak
  Low-Vt: high speed, high leak

Application region (Low-Vt, on-chip power switch)
  H.264 macro: 51% Low-Vt (11.9% area reduction, Low-Vt 4.5%→51%)
  3D graphics: 61% Low-Vt (12.4% area reduction, Low-Vt 14%→61%)
  ARM/DSP: 100% Low-Vt
  Misc Apl. macro

Digital Baseband region (High & Mid-Vt)
  DBB: >99% High-Vt
  ARM: 37% Mid-Vt, 63% High-Vt
  DSP: 100% Mid-Vt
Power-on switching sequence
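(LP-Tech3a)
The power-on switching sequence is divided into 4 steps
This limits the rush current and the resulting IR drop
  VDD IR drop: max 8mV
  Rush current: max 100mA
[Figure: voltage (1.2V) and current versus time as the power switches turn on in four steps (1st SW, 2nd SW, 3rd SW, 4th SW), with typ and max curves. Time-axis labels as printed: 3ns, 150, 225ns, 350ns. Number of switches on, as printed: 30, 3500, 1000, 10000. Current step labels: ⊿95mA, ⊿80mA, ⊿70mA, 108mA.]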
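
Why a staged turn-on limits rush current can be seen with a first-order RC model. This is a rough sketch, not the M2 analysis: the per-switch resistance, the domain capacitance, and the stage times and switch counts below are hypothetical placeholders, and the model ignores package inductance and the load's own current.

```python
import math


def peak_rush_current(stages, vdd=1.2, r_switch=1000.0, cap=1e-9,
                      dt=1e-10, settle=2e-7):
    """Peak charging current (A) of a power domain behind parallel switches.

    stages: list of (start_time_s, total_switches_on), in time order.
    r_switch, cap, dt, and settle are hypothetical model parameters.
    """
    steps = int(round((stages[-1][0] + settle) / dt))
    v = 0.0      # voltage of the switched domain
    peak = 0.0
    for k in range(steps):
        t = k * dt
        n_on = max((n for start, n in stages if t >= start), default=0)
        if n_on == 0:
            continue
        r_eff = r_switch / n_on               # switches act in parallel
        peak = max(peak, (vdd - v) / r_eff)   # current at the start of the step
        # Exact RC update over one time step.
        v = vdd - (vdd - v) * math.exp(-dt / (r_eff * cap))
    return peak


all_at_once = peak_rush_current([(0.0, 10000)])
staged = peak_rush_current([(0.0, 30), (150e-9, 300),
                            (225e-9, 1000), (350e-9, 10000)])
print(all_at_once, staged)
```

With these placeholder values the staged sequence charges most of the domain through a few switches first, so the later, larger groups see only a small remaining voltage difference and the peak current stays far below the all-at-once case.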
Power Management Unit: PMU (LP-Tech3b)
The PMU SW allows flexible and smooth transitions between various power mode sequences
[Figure: PMU block diagram. The PMU (with PC and RAM) performs power, PLL, clock, and reset control as well as isolation cell control, and the INTC transfers interrupts to wake up. Other blocks: Power, CLK Div., PLL, Timer, ARM1176 (~500MHz), DSP (~500MHz), 3D graphic, AXI+AHB, APB, H.264, DMA, CAM, USB, UART..., with isolation cells at the block boundaries.]
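
The controls named on this slide (power, clock, reset, isolation) have to be applied in a safe order. The sketch below shows one commonly used ordering for a switched power domain together with a checker for it. The step names, the class, and the exact ordering are assumptions for illustration; the slides do not give the actual PMU sequences.

```python
# Hypothetical sequences; the real PMU program is not shown in the slides.
POWER_DOWN = ["stop_clock", "isolate", "assert_reset", "power_off"]
WAKE_UP = ["power_on", "start_clock", "release_reset", "release_isolation"]


class PowerDomain:
    """Minimal state model of one switched power domain."""

    def __init__(self):
        self.powered = True
        self.clock_on = True
        self.isolated = False
        self.in_reset = False

    def apply(self, step):
        if step == "stop_clock":
            self.clock_on = False
        elif step == "isolate":
            self.isolated = True
        elif step == "assert_reset":
            self.in_reset = True
        elif step == "power_off":
            # Outputs must be clamped and the clock stopped before power is cut.
            if self.clock_on or not self.isolated:
                raise RuntimeError("unsafe power_off")
            self.powered = False
        elif step == "power_on":
            self.powered = True
        elif step == "start_clock":
            if not self.powered:
                raise RuntimeError("clock started without power")
            self.clock_on = True
        elif step == "release_reset":
            if not self.clock_on:
                raise RuntimeError("reset released without clock")
            self.in_reset = False
        elif step == "release_isolation":
            # Outputs are exposed only once the domain is powered and out of reset.
            if not self.powered or self.in_reset:
                raise RuntimeError("unsafe release_isolation")
            self.isolated = False
        else:
            raise ValueError(f"unknown step: {step}")


def run(domain, sequence):
    for step in sequence:
        domain.apply(step)


domain = PowerDomain()
run(domain, POWER_DOWN)
run(domain, WAKE_UP)
print(domain.powered, domain.clock_on, domain.isolated, domain.in_reset)
```

Holding the sequences as data rather than fixed logic is one way to get the flexibility the slide attributes to the PMU, since a new power mode only needs a new step list.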
UltimateLowPower™ (1) (LP-Tech4)
Reduces the leakage power using back-bias
Improves the leakage yield by bringing FAST side samples close to TYP side samples, with no effect on SLOW side samples
[Figure: distribution of device characteristics (speed) from SLOW through TYP to FAST caused by transistor on-chip variation, shown without and with back-bias control, together with the leak current in each case. Back-bias control shifts the FAST side characteristics. Other label: "Speed is not changed".]
UltimateLowPower™ (2) (LP-Tech4)
Applied to the application ARM/DSP macros
These macros use 100% optimized Low-Vt cells, whose leakage current is more sensitive to the back-bias voltage
Mixing multiple Vt types in the same region is not suited to UltimateLowPower
[Figure: back-bias control in the application region. A monitor in the application ARM/DSP area (ARM1176 and DSP, both 100% Low-Vt) sends control signals to the back-bias regulator, which consists of a positive variable regulator and a negative variable regulator. Back-bias for Pmos Tr.: Vdd to Vdd+1V. Back-bias for Nmos Tr.: 0 to -1V. Other labels: Clock, 3G-DBB, Application, Digital Baseband, bias.]
Leakage current with back-bias (LP-Tech4)
The leakage current of the Fast sample (F/F) is reduced by 70%
The characteristics of the Slow (S/S) samples are not changed
[Figure: leakage current (ARM+DSP) [mA], 0 to 40, versus back-bias level, 0 to 5, for the F/F, F/S, S/F, S/S, and T/T samples. Annotations: "no back-bias", "Target delay with back-bias", "70% reduced". Conditions: Typ, 1.2V@26C.]
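
One way to read the "target delay with back-bias" annotation is that each chip gets the deepest back-bias level at which its monitored delay still meets the target. The sketch below illustrates that reading only. The function name, the monotonic-delay assumption, and all delay numbers are hypothetical and do not come from the measurement.

```python
def select_back_bias_level(delay_by_level, target_delay):
    """Return the deepest level whose monitored delay still meets the target.

    delay_by_level[k] is the monitored delay at back-bias level k, where
    level 0 means no back-bias. Delay is assumed to grow with the level.
    """
    chosen = 0
    for level, delay in enumerate(delay_by_level):
        if delay <= target_delay:
            chosen = level
        else:
            break
    return chosen


# Hypothetical normalized delays for levels 0 to 5.
fast_sample = [0.80, 0.85, 0.90, 0.95, 1.00, 1.06]
slow_sample = [1.00, 1.05, 1.11, 1.17, 1.24, 1.31]

print(select_back_bias_level(fast_sample, target_delay=1.00))  # 4
print(select_back_bias_level(slow_sample, target_delay=1.00))  # 0
```

Under this reading a fast sample has timing slack to trade for lower leakage, while a slow sample stays at level 0, which is consistent with the statement that the Slow (S/S) characteristics are not changed.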
Power consumptions in light workload
80% power reduction is achieved
Use case: music player (AAC) with WVGA display

Case   ON/OFF settings (as printed)   Relative power consumption
(A)    OFF OFF OFF                    100
(B)    ON  OFF OFF                    70
(C)    ON  ON  OFF                    30
(D)    ON  ON  ON                     20

Techniques switched on across the cases:
  (1) Hierarchical clock gating (LP-Tech1)
  (2) Automatic frequency control (LP-Tech2)
  (3) Power switch control (LP-Tech3a, b)
  (4) Back-bias control (LP-Tech4)
80% reduction from (A) to (D). Conditions: Typ, 1.2V@26C.
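
The 80% figure follows directly from the relative values 100, 70, 30, and 20 for cases (A) to (D). The short calculation below reproduces it and also gives the reduction contributed by each step; the helper name is ours and the per-step figures are derived here, not quoted from the slide.

```python
relative_power = {"A": 100, "B": 70, "C": 30, "D": 20}


def reduction_percent(before, after):
    """Percentage reduction going from 'before' to 'after'."""
    return 100.0 * (before - after) / before


total = reduction_percent(relative_power["A"], relative_power["D"])

cases = list(relative_power)
steps = {
    f"{a}->{b}": round(reduction_percent(relative_power[a], relative_power[b]), 1)
    for a, b in zip(cases, cases[1:])
}

print(total)  # 80.0
print(steps)  # {'A->B': 30.0, 'B->C': 57.1, 'C->D': 33.3}
```

Each step is measured against the power remaining after the previous step, so the step percentages do not add up to 80.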
Power Consumption results
Low power consumption is achieved in heavy and light workload

Application                              Current
Video dec. (H.264, D1, 30fps)            66.5mA
Video dec. (H.264, QVGA, 30fps)          22.3mA
VGA still image displaying               5.2mA
Audio dec. (Enhanced AAC+, 48Kbps)       23.8mA

Conditions: typ, 1.2V@26C (w/o I/O)
Techniques applied: LP-Tech1, LP-Tech2, LP-Tech3a, 3b, LP-Tech4
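
The currents above can be converted to power with P = V × I. The sketch assumes that each current is drawn entirely from the 1.2V supply named in the measurement conditions; the slide gives only currents, so the milliwatt figures are derived here, not quoted.

```python
SUPPLY_V = 1.2  # from the stated condition "typ, 1.2V@26C (w/o I/O)"

current_ma = {
    "Video dec. (H.264, D1, 30fps)": 66.5,
    "Video dec. (H.264, QVGA, 30fps)": 22.3,
    "VGA still image displaying": 5.2,
    "Audio dec. (Enhanced AAC+, 48Kbps)": 23.8,
}

# mA times V gives mW.
power_mw = {name: round(i * SUPPLY_V, 2) for name, i in current_ma.items()}

for name, p in power_mw.items():
    print(f"{name}: {p} mW")
```

Under that assumption, D1 video decoding comes to roughly 80 mW and still image display to roughly 6 mW.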
M2 die Photo
[Figure: die photo with the Application and Digital Baseband regions marked.]
Logic 15Mgate, SRAM 12Mbit
8.52x8.52mm@65nm
FPBGA PKG: 529pin, 14x14mm
M2 Products Photo
NTT DoCoMo N905i
NTT DoCoMo N905iμ
Summary and Conclusions
Achieved 80% power reduction in light workload
  Hierarchical clock gating and automatic frequency control by HW
  Multi-Vt transistors and on-chip power switch controlled by the PMU
  Back-bias by UltimateLowPower
Expectation for EDA tools
  Maturity and stability
  Reducing performance variation
  Enhancing automatic optimization
To realize truly ultimate low power
  Study the elements of power consumption in use cases
  Apply the knowledge to the design from the front-end to the back-end phase