Challenges in VLSI Design Toward the New Millennium
Takayasu Sakurai
Prof. at Center for Collaborative Research, and Institute of Industrial Science, University of Tokyo
IDEC ’99/10

1 Scaling and three crises
2 Power crisis
3 Interconnection crisis
4 Complexity crisis
Digital Bipolar
Analog
Discrete
T.Sakurai
Moore’s Law
[Figure: device count per chip (100 to 10G) vs. year (1970 to 2010) for DRAM (1M, 4M, 16M, 64M, 256M, 1G, 2G) and µP]
System LSI for Next Generation Games
Clock freq. 300 MHz
10M transistors
Graphics synthesizer integrates 40M transistors with embedded DRAM
Memory bandwidth 3.2 GB/s
Floating-point operations 6.2 GFLOPS
3D CG 6.6M polygons/s
MPEG2 decode
Applications of System LSI’s
PC, printer, game, PDA
hard disk, CD-ROM, display
communication, LAN/WAN
mobile phone, wireless network
fax, modem
digital TV, digital camera, digital movie
car navigation, DVD, CD, MD
Categories: digital consumer; PC & peripherals; communication / network
Limit of Miniaturization
Conventional I-V curve at 0.04µm (even down to 0.014µm)
[Figure: drain current [mA/µm] vs. drain voltage [V] (0 to 2.0 V) of a 0.04µm MOSFET (gate length = 40 nm) for Vg = 0.8, 1.2, 1.6, and 2.0 V]
M. Ono, M. Saito, T. Yoshitomi, C. Fiegna, T. Ohguro, and H. Iwai, "Sub-50nm gate Length N-MOSFETs with 10 nm Phosphorus Source and Drain Junctions", IEDM Technical Digest, pp. 119-122, 1993.
H. Kawaura, T. Sakamoto, Y. Ochiai, J. Fujita, and T. Baba, "Fabrication and Characterization of 14-nm-Gate-Length EJ-MOSFETs", Extended Abstracts of SSDM, pp. 572-573, 1997.
Scaling law
T.Sakurai&A.Newton,"Alpha-power law MOSFET model and its application to CMOS inverter delay and other formulas",IEEE JSSC,vol25, no,2, pp.584-594, Apr. 1990.
Numbers are exponents n of the scaling factor k (k^n). "Tr." = transistor.
Voltage [V]: -1
Tr. size [x]: -1
Oxide thickness [t]: -1
Current [I ~ V^1.3 / t]: -0.3
Tr. capacitance [Cg ~ x^2 / t]: -1
Tr. delay [Tg ~ Cg V / I]: -1.7
Tr. power [Pg ~ Cg V^2 / Tg]: -1.3
Tr. power density [p ~ Pg / x^2]: 0.7
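The derived exponents in the scaling table follow from the three base exponents (voltage, size, oxide thickness all scale as k^-1) by adding and subtracting exponents. A minimal sketch of that arithmetic is below; the variable names are illustrative and not from the slides.

```python
from fractions import Fraction as F

# Base exponents of k from the table: V, x and t all scale as k^-1.
V = x = t = F(-1)
ALPHA = F(13, 10)            # exponent in I ~ V^1.3 / t

I = ALPHA * V - t            # current:          I  ~ V^1.3 / t
Cg = 2 * x - t               # gate capacitance: Cg ~ x^2 / t
Tg = Cg + V - I              # gate delay:       Tg ~ Cg V / I
Pg = Cg + 2 * V - Tg         # power per gate:   Pg ~ Cg V^2 / Tg
p = Pg - 2 * x               # power density:    p  ~ Pg / x^2

exponents = {
    "Current": I,
    "Tr. capacitance": Cg,
    "Tr. delay": Tg,
    "Tr. power": Pg,
    "Tr. power density": p,
}

for name, n in exponents.items():
    print(f"{name}: k^{float(n):+.1f}")
```

The positive exponent on power density (0.7) is the quantitative root of the power crisis discussed in the rest of the deck.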
A charge of CL • VS loses a potential of VDD -> energy consumption of CL • VDD • VS per cycle
α : Switching probability
CL : Load capacitance
VS : Signal swing
VDD : Supply voltage
ISC : Mean crowbar current
∆tSC : Crowbar current duration
fCLK : Clock frequency
IDC : DC current
ILEAK : Subthreshold leak current
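The symbols defined above can be combined into a simple power estimate. The sketch below assumes the usual four-term decomposition (switching, crowbar, DC, leakage); the slide's own expression is not present in the text, so this grouping and the function name `cmos_power` are assumptions for illustration only.

```python
def cmos_power(alpha, c_load, v_swing, v_dd, f_clk,
               i_sc=0.0, dt_sc=0.0, i_dc=0.0, i_leak=0.0):
    """Return the assumed power components in watts (SI units throughout)."""
    return {
        # charge CL*VS drops through VDD, alpha*fCLK times per second
        "switching": alpha * c_load * v_swing * v_dd * f_clk,
        # mean crowbar current flowing for dt_sc once per clock cycle
        "crowbar": i_sc * dt_sc * v_dd * f_clk,
        "dc": i_dc * v_dd,
        "leak": i_leak * v_dd,
    }

# Hypothetical numbers: 1 pF load, full 1 V swing, 100 MHz, alpha = 0.5.
parts = cmos_power(alpha=0.5, c_load=1e-12, v_swing=1.0, v_dd=1.0, f_clk=100e6)
total = sum(parts.values())
```

With full swing (VS = VDD) the switching term reduces to the familiar α·CL·VDD²·fCLK.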
T.Sakurai
Voltage waveform of CMOS inverter
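[Figure: target inverter (ID0P, ID0N) driven by an input inverter (ID0PIN, ID0NIN), with CIN = 10 [pF] and COUT = FO • CIN; waveforms of input voltage, output voltage, and short-circuit current vs. time, with t0, t1, tT and the thresholds vTHP, vTHN marked]
$$C_{OUT}\frac{dV_{OUT}}{dt} = -I_{DN} \cong -I_{D0N}\left(\frac{V_{GSN}-V_{THN}}{V_{DD}-V_{THN}}\right)^{\alpha_N}$$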
Short-circuit power dissipation formula
[Equation: closed-form short-circuit power $P_S$, written in terms of $f$, $C_{IN}$, $V_{DD}$, $FO$, $fo_P$, $\beta_r$ and the functions $k(v_{D0P})$, $g(v_T, \alpha)$, and $h(v_{TP}, \alpha_P)$]
$$k(v_{D0P}) = \frac{0.9}{0.8} + \frac{v_{D0P}}{0.8}\ln\frac{10\,v_{D0P}}{e}$$
$$FO = \frac{C_{OUT}}{C_{IN}} \ \text{(fanout)}, \qquad fo_P = \frac{I_{D0P}}{I_{D0PIN}}, \qquad \beta_r = \frac{I_{D0P}}{I_{D0N}}$$
K. Nose and T. Sakurai, "Closed-Form Expressions for Short-Circuit Power of Short-Channel CMOS Gates and Its Scaling Characteristics," ITC-CSCC (Korea), July 1998.
Comparison between proposed formula and other formula
Vemuru et al.'s formula deviates from SPICE simulation: fanout > 3; fanout is small (diverges to infinity)
[Figure: short-circuit power [pW] (0 to 15) vs. fanout FO (1 to 5) for SPICE simulation, the Vemuru formula, and this work; ƒ = 1 [Hz], Tech. A, CIN = 10 [pF]]
The change of the short-circuit power dissipation with scaling
[Figure: η = PS / (PD + PS) (0 to 0.2) vs. VDD [V] (0 to 5) for VTH/VDD = 0, 0.1, 0.2, 0.3; fanout = 1]
Voltage dependent gate cap. effect
[Figure: gate capacitance [fF] (0 to 200) vs. gate voltage VG [V] (-2 to 1) for VDS = 0V (linear), VDS = 1V (saturation), and I(COX); W/L = 100µm / 0.4µm, VTH = 0.3V]
Voltage dependent gate cap. effect
[Figure: average gate current (average Cgate) (0 to 4) vs. VTH / VDD (-0.2 to 0.6) for VDD = 1V, VDD = 0.5V, and I(COX); FO = 5; labels VTH = 0.2 and VTH = VTHOUT]
[Figure: inverter delay [nsec] (0 to 2.5) vs. VTHout / VDD (-0.4 to 1.2) at VDD = 0.5V, compared with COX; annotation: large C, delay]
Power & Delay Dependence on VDD & VTH
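Power: $P = p_t \cdot f_{CLK} \cdot C_L \cdot V_{DD}^{2} + I_0 \cdot 10^{-V_{th}/S} \cdot V_{DD}$
Delay $= \dfrac{k \cdot Q}{I} = \dfrac{k \cdot C_L \cdot V_{DD}}{(V_{DD} - V_{th})^{\alpha}}$ ($\alpha = 1.3$)
[Figure: 3-D plots of power (W) (0 to 1 x 10^-4) and delay (s) (0 to 5 x 10^-10) vs. VDD (V) (1 to 4) and Vth (V) (-0.4 to 0.8); points A and B are marked on both surfaces]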
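The power and delay expressions on this slide can be evaluated directly to see the trade-off between VDD and Vth. The sketch below is illustrative only: every default parameter value (p_t, f_clk, c_load, i0, s, k) is a hypothetical placeholder, not a number from the slides.

```python
ALPHA = 1.3  # velocity-saturation index quoted on the slide

def power(v_dd, v_th, p_t=0.5, f_clk=100e6, c_load=1e-12, i0=1e-6, s=0.1):
    """P = pt*fCLK*CL*VDD^2 + I0*10^(-Vth/S)*VDD (all defaults are assumed)."""
    dynamic = p_t * f_clk * c_load * v_dd ** 2
    leakage = i0 * 10 ** (-v_th / s) * v_dd
    return dynamic + leakage

def delay(v_dd, v_th, k=1e3, c_load=1e-12, alpha=ALPHA):
    """Delay = k*CL*VDD / (VDD - Vth)^alpha; k is an assumed fitting constant."""
    if v_dd <= v_th:
        raise ValueError("model only applies for VDD > Vth")
    return k * c_load * v_dd / (v_dd - v_th) ** alpha

for v_th in (0.0, 0.2, 0.4):
    print(f"Vth={v_th:.1f} V  P={power(1.0, v_th):.3e} W  "
          f"delay={delay(1.0, v_th):.3e} s")
```

Lowering Vth at fixed VDD shortens the delay but raises the leakage term exponentially, which is why the later slides focus on controlling Vth separately in active and standby modes.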
T.Sakurai
Lowering Only Internal VDD (Example)
[Figure: an external supply VDDEXT of 3V feeds a switching DC-DC converter (efficiency > 95%) that generates the internal supply VDDINT of 1.5V. Level converters 1 and 2 and swing converters 1 and 2 translate between 0~3V and 0~1.5V signals. Other labels: VDDINT <= 50%, leak, input 3V, output 3V, DC-DC conv. efficiency.]
Standby Power Reduction (SPR) Circuit
ISSCC'95 pp.318-31
[Figure: SPR circuit with level shifter and voltage switch (labels M1 to M5, V1 to V4, CW) controlled by St'by; supplies VDD (2V), VSS (0V), VPBB (-2V), VNBB (4V); well outputs VNWELL (2 or 4V) and VPWELL (0 or -2V); callout: "are added to ensure reliability"]
• In standby mode and in IDDQ test, substrate bias is applied to raise VTH, which reduces leakage.
• In active mode, substrate bias is not applied, which keeps VTH low and ensures high speed.
Self-Adjusting Threshold-voltage Scheme(SATS)
CICC'94, pp.271-274
[Figure: a leakage sensor (VGN1) switches the Self-Sub-Bias circuit (SSB) ON/OFF, which sets the P-well bias VBBN]
low Vth → large leakage → SSB ON → deep VBB → high Vth
high Vth → little leakage → SSB OFF → shallow VBB → low Vth
• control Vth to adjust leakage current
• compensate Vth fluctuation
In active mode, low-VTH MOSFET’s achieve high speed. In standby mode, when the St'by signal is high, high-VTH MOSFET’s in series with the normal logic circuits cut off the leakage current.
VTCMOS / MTCMOS
[Figure: VTCMOS (low-Vth MOSFET's between VDD and GND with p-well and n-well biases set by a VT control block) and MTCMOS (low-Vth logic on VDDL, with a Hi-Vth MOSFET to GND switched by St'by)]
Item | VTCMOS | MTCMOS
Principle | Threshold control with sub-bias | On-off control of internal VDD/VSS
Merit/Demerit | o Low leakage in standby | o Low leakage in standby
 | - Needs circuit development | + Conceptually easier
 | + Compensate Vth fluctuation | - Compensate Vth fluctuation
 | + IDDQ test | - IDDQ test
 | + No serial MOSFET | - Large serial MOSFET (slower, larger, lower yield...)
 | o Conventional design tools | o Conventional design tools
 | + Reuse of existing design | - Special F/F's
 | - Triple well is desirable | - Two VTH's
Concept of Super Cut-off CMOS(SCCMOS)
[Figure: pMOS insertion case. A low-VTH cut-off MOSFET sits between VDD (0.5 - 0.8V) and the virtual VDD of the low-VTH logic circuit; its gate is at VDD+0.4V in standby and at VSS in active mode.]
H.Kawaguchi, K.Nose, and T.Sakurai, "A CMOS Scheme for 0.5V Supply Voltage with pico-Ampere Standby Current," 1998 ISSCC, Digest of Tech. Papers, pp.192-193, Feb. 1998.
SCCMOS: 0.2V VTH circuit with 0.2V VTH cut-off MOSFET
MTCMOS: 0.2V VTH circuit with 0.6V VTH cut-off MOSFET
Conventional: all 0.6V circuit, no cut-off MOSFET
Dynamic Leakage Cut-off
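[Figure: SRAM array with address decoder, VNWELL driver and VPWELL driver, word lines VWL and VWL+1, and bit lines VBL0 to VBLm-1; waveforms of VNWELL, VWL, and VPWELL over time t between the levels 2VDD, VDD, VSS, and -VDD for the select and deselect states; annotation: # of selected bits at a time]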
Leakage Reduction of DLC SRAM
[Figure: ILEAK [A] (10^-8 to 1) vs. VTH [V] (0 to 0.4) for a memory capacity of 1 Mbit at VDD = 1V, with DLC (VTH = 0.25V) and w/o DLC]
Total subthreshold leak of 1Mbit SRAM. At 1V VDD, VTH of the dormant cell is 0.25V while that of the active cell is 0V, keeping the total leakage power at 0.9mW.
T.Sakurai
Dynamic Leakage Cut-off (DLC) SRAM
[Figure labels: address decoder, MCs, well-bias driver]
H.Kawaguchi and T.Sakurai, "A Reduced Clock-Swing Flip-Flop (RCSFF) for 63% Power Reduction," IEEE J. of Solid-State Circuits, pp.807-811, May 1998.
T.Sakurai
Area Overhead of DLC SRAM
[Figure: area overhead (0 to 0.5) vs. # of selected bits at a time (16, 32, 64, 128); memory capacity: 1 Mbit]
Clustered Voltage Scaling for Multiple VDD’s
[Figure: conventional design vs. CVS structure, each drawn as logic between F/F's with the critical path marked; the CVS structure adds level-shifting F/F's. The lower-VDD portion is shown as shaded.]
M.Takahashi et al., “A 60mW MPEG4 Video Codec Using Clustered Voltage Scaling with Variable Supply-Voltage Scheme,” ISSCC, pp.36-37, Feb.1998.
Once VL is applied to a logic gate, VL is also applied to all subsequent logic gates up to the next F/F, which eliminates DC current paths. The F/F’s restore the signal level to VH.
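The clustering rule can be checked mechanically on a netlist: a VL gate must never drive a VH logic gate directly, only other VL gates or a level-converting F/F. The sketch below assumes a toy netlist representation (dictionaries of supply assignment and fanout); the names `cvs_violations`, `supply`, and `fanout` are hypothetical and not taken from the cited paper.

```python
def cvs_violations(supply, fanout, level_converting_ffs):
    """Return (driver, load) pairs where a VL gate drives a VH gate directly.

    supply: dict node -> "VL" or "VH"
    fanout: dict node -> list of nodes it drives
    level_converting_ffs: set of F/F nodes allowed to receive VL signals
    """
    violations = []
    for driver, loads in fanout.items():
        if supply[driver] != "VL":
            continue  # VH outputs can drive either kind of gate
        for load in loads:
            if supply[load] == "VH" and load not in level_converting_ffs:
                violations.append((driver, load))
    return violations

# Toy example: g1 (VH) -> g2 (VL) -> g3 (VL) -> ff1 (level-converting F/F)
supply = {"g1": "VH", "g2": "VL", "g3": "VL", "g4": "VH", "ff1": "VH"}
fanout = {"g1": ["g2"], "g2": ["g3"], "g3": ["ff1"]}
ok = cvs_violations(supply, fanout, {"ff1"})

# Adding an edge from the VL gate g2 to the VH gate g4 breaks the rule.
bad = cvs_violations(supply, {**fanout, "g2": ["g3", "g4"]}, {"ff1"})
```

A VL output driving a VH gate leaves the pMOS of that gate partly on, which is the DC current path the rule is meant to avoid.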
T.Sakurai
Slave-Latch Level-Conversion F/F
[Figure: slave-latch level-conversion F/F schematic; labels: D, Q, CLK, CK, VL, VH, M1, M2]
Dual-VS Scheme
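[Figure: Dual-VS scheme. Between input pads and output pads, combinational logic is built from VL-cells and VH-cells, separated by level-conversion flip-flops on the clock tree. Two DC-DC converters generate VL and VH from VDD; one is controlled through a critical path replica made of VL-cells (VSL), the other through a critical path replica made of VH-cells (VSH), each clocked by CLK.]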
Power Reduction vs. VL/VH
The optimum VL/VH is between 0.6 and 0.7 for any kind of path-delay distribution function.
[Figure: power reduction ratio (0 to 1.0) vs. VL/VH (0 to 1.0) for several path-delay distribution functions p(t), each shown as an inset of p(t) vs. t]
Path-delay Distribution in Dual-VS
[Figure: path-delay distributions p(t) vs. t, before and after applying Dual-VS, for each of the following modules]
MEF (1527 cells)
MCB (1366 cells)
VLD (3812 cells)
DMA (1493 cells)
DCT (5466 cells)
RISC (5645 cells)
MEC (2912 cells)
IDCT (6227 cells)
VLC (3462 cells)
Clustered Voltage Scaling Technique
M.Takahashi et al., “A 60mW MPEG4 Video Codec Using Clustered Voltage Scaling with Variable Supply-Voltage Scheme,” ISSCC, pp.36-37, Feb.1998.
Zero Temperature Coefficient (ZTC) point around VGS=1.0V
VZTC ≈ 1V
[Figure: measured IDS [mA] (0 to 0.4) vs. VGS [V] (0 to 2) for NMOS and PMOS as temperature increases from 0ºC to 120ºC; the curves cross at VZTC]
• Temp. coeff < 0 when VDD > VZTC
• Temp. coeff > 0 when VDD < VZTC
K.Kanda, K.Nose, H.Kawaguchi, and T.Sakurai,"Design Impact of Positive Temperature Dependence of Drain Current in Sub 1V CMOS VLSI's",CICC99, pp.563-566, May 1999.
T.Sakurai
Cause of positive temp. dependence of IDS
• α-power law model
$I_{DS} \propto \mu(T)\,\bigl(V_{DD} - V_{TH}(T)\bigr)^{\alpha}$
$\mu(T) = \mu(T_0)\,(T / T_0)^{-m}$
$V_{TH}(T) = V_{TH}(T_0) - \kappa\,(T - T_0)$
(T = temperature, µ = mobility)
Typical values: α = 1.5, m = 1.5, κ = 2.5 [mV/K]
[Figure: effects of VTH and µ on IDS when the temperature goes up by 100 [K]]
A better package is needed to avoid thermal runaway at low voltage.
K.Kanda, K.Nose, H.Kawaguchi, and T.Sakurai,"Design Impact of Positive Temperature Dependence of Drain Current in Sub 1V CMOS VLSI's",CICC99, pp.563-566, May 1999.
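The sign change of the temperature coefficient can be reproduced from the α-power law model with the typical values quoted above. This is a minimal sketch: the room-temperature threshold `v_th0 = 0.3` V and the reference temperature of 300 K are assumed for illustration and are not given on the slide.

```python
ALPHA = 1.5      # alpha-power law exponent (typical value from the slide)
M = 1.5          # mobility temperature exponent (typical value from the slide)
KAPPA = 2.5e-3   # VTH temperature coefficient in V/K (2.5 mV/K)

def ids_ratio(v_dd, t, t0=300.0, v_th0=0.3):
    """IDS(T) / IDS(T0) under the alpha-power law; v_th0 and t0 are assumed."""
    mobility = (t / t0) ** (-M)                    # mobility falls with T
    v_th = v_th0 - KAPPA * (t - t0)                # threshold falls with T
    overdrive = ((v_dd - v_th) / (v_dd - v_th0)) ** ALPHA
    return mobility * overdrive

low_vdd = ids_ratio(0.5, 400.0)    # sub-1V supply, +100 K
high_vdd = ids_ratio(3.3, 400.0)   # conventional supply, +100 K
print(f"VDD=0.5 V: IDS changes by x{low_vdd:.2f}")
print(f"VDD=3.3 V: IDS changes by x{high_vdd:.2f}")
```

At low VDD the falling threshold dominates and the current rises with temperature; at high VDD the falling mobility dominates and the current drops.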
T.Sakurai
Careful temperature design for low-voltage
IDS and gate speed show positive temperature dependence in the VDD < 1V region. This will change the design validation process for worst-case conditions.
In low-VDD, low-VTH designs, the temperature rises much more than in high-VDD, high-VTH designs, even if the power consumption at room temperature and the package are the same.
D-type CMOS
K~1 (K=0.91 in this case)
D-type leakage cannot be neglected in the range VTH < -0.2V.
$$t_{LH} = \frac{K\,C_L V_{DD}}{I_{PON} - I_{NOFF}}, \qquad t_{HL} = \frac{K\,C_L V_{DD}}{I_{NON} - I_{POFF}}, \qquad t_d = \frac{t_{LH} + t_{HL}}{2}$$
Here the OFF currents $I_{NOFF}$ and $I_{POFF}$ are the leakage currents $I_{LEAK(N)}$ and $I_{LEAK(P)}$.
[Figure: delay [ns] (0 to 0.4) and leakage power dissipation [W] (10^-10 to 10^-4) vs. threshold voltage VTH [V] (-0.5 to 0.3) at VDD = 0.5V; curves: SPICE, K CL VDD / ION, and K CL VDD / (ION - IOFF)]
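The delay expressions K·CL·VDD/(ION − IOFF) on this slide are easy to evaluate numerically. The sketch below uses K = 0.91 from the slide; the load and current values in the example are hypothetical.

```python
def d_type_delay(c_load, v_dd, i_pon, i_noff, i_non, i_poff, k=0.91):
    """Return (t_LH, t_HL, t_d) in seconds.

    The ON current of the driving transistor is reduced by the OFF (leakage)
    current of the opposing transistor, which fights the transition.
    """
    t_lh = k * c_load * v_dd / (i_pon - i_noff)
    t_hl = k * c_load * v_dd / (i_non - i_poff)
    return t_lh, t_hl, (t_lh + t_hl) / 2

# Hypothetical values: 10 fF load, VDD = 0.5 V, 10 uA ON and 1 uA OFF current.
t_lh, t_hl, t_d = d_type_delay(10e-15, 0.5, 10e-6, 1e-6, 10e-6, 1e-6)
# Neglecting the OFF current underestimates the delay:
t_no_leak = d_type_delay(10e-15, 0.5, 10e-6, 0.0, 10e-6, 0.0)[2]
```

As VTH goes negative the OFF current approaches the ON current and the denominator shrinks, so the simpler K·CL·VDD/ION estimate becomes increasingly optimistic.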
T.Sakurai
Power Distribution in CMOS LSI's
[Figure: pie charts of the power distribution among clock, logic, memory, and I/O for four chips: ASSP1, ASSP2, MPU1, MPU2]
Power Distribution in Processor
[Figure: pie chart of the power distribution in a high-end µP: clock, datapath, memory, I/O & synthesized logic]
Courtesy: Dr. Vivek Tiwari, Intel
o Synthesis for low power is not so effective.
o The clock system is the key. In this respect, gated clock is one of the most efficient ways to reduce power in current processors.
o Gated clock is useful in reducing average power but not that effective in reducing peak power.
o The circuit / device level is important.
Reduced Clock Swing Flip-Flop
(a) RCSFF: the voltage swing of CLK is reduced to Vclk, down to 1V.
(b) Conventional F/F
H.Kawaguchi and T.Sakurai, "A Reduced Clock-Swing Flip-Flop (RCSFF) for 63% Clock Power Reduction," in Symp. on VLSI Circuits '97, June 1997.
[Figure: transistor-level schematics of (a) the RCSFF, with transistors P1, P2, N1, clock transistor width Wclk, and well bias VWELL (3.3V~6V), and (b) the conventional F/F with clock phases φ; inputs CLK and D, outputs Q; numeric annotations on the transistors]
Layout Example
[Figure: layouts of (a) RCSFF and (b) conventional F/F with CK and D terminals marked; dimension labels: 24µm, 15µm, 20µm, 15µm]
Delay and power comparison
[Figure: clock-to-Q delay [ns] (0 to 1) vs. Vclk [V] (1 to 3) for the conventional F/F and for the RCSFF with Wclk = 6.5µm, 10µm, 20µm]
[Figure: power per F/F [µW] (0 to 150) vs. Vclk [V] (1 to 3) for the conventional F/F and for the RCSFF with Type A and Type B clock drivers, VWELL = 3.3V and 6V, Wclk = 10µm; insets show the Type A (A1 to An) and Type B drivers between VDD and Vclk]
Modified Sense Amplifier-Based F/F
B.Nikolic et al., “Sense Amplifier-Based Flip-Flop,” ISSCC, pp.282-283, Feb.1999.
This can be used with RCSFF scheme.
T.Sakurai
Ultra Low-Voltage Operation
[Figure: transfer curves (Vout (V) vs. Vin (V)) for an inverter at T=300K, a 2-input NAND at T=300K, and an inverter at T=77K; voltage labels: 25mV, 50mV, 100mV, 140mV, 360mV]
J.Burr&J.Shott,"A 200mV Self-Testing Encoder/Decoder using Stanford Ultra-Low-Power CMOS",ISSCC94, pp.84-85.
(Stanford Univ.)
T.Sakurai
Ultra Low-Voltage Operation
T.Sakurai
Vth, Leff, tox Optimized Low-Power MOS
M.Kakumu et al.,"Low-Voltage and Power CMOS Technology", SSDM, 1995, pp.213-
T.Sakurai
SOI Processors in ISSCC’99
(Values are listed in source order; rows with fewer than four values are not aligned to specific papers.)
Paper#: WP25.1 | WP25.3 | WP25.7 | WP25.4
Company: IBM (East Fishkill) | IBM (Essex & Austin) | IBM (Rochester) | Samsung
Target: PowerPC 604e | PowerPC 750 | PowerPC | Alpha
Target (cont.): 32b for Apple | 64b | 64b
PD/FD: PD | PD (SIMOX) | PD (SIMOX) | FD (SIMOX/Unibond no dep.)
Rule: 0.25um | 0.2um (Leff=0.12um) | 0.25um
Interconnect: 5 Al + W local | Cu | 6 Cu | 4 Al
Area: 49mm2 | 139mm2 | 209mm2
# of Tr's: 6.5M | 34M | 9.7M
Freq.: 500MHz | 580MHz@85C, fast proc. | 550MHz | 600MHz
VDD: 1.7V | 2V | 1.8V | 1.5V (2V I/O)
Power: 5.1W @2V, 400MHz | 24W | 40W
Speed gain ov: 25-30% | 20% | 20% | 30%@1.2V, 20%@1.5V SRAM
Breakdown: 22% Ctotal reduction | 12% by Cj | 15-20% simple gates
Breakdown (cont.): 10-15% more Ids | 15-25% by less body-bias | 25-40% complex gates
Hi-Speed is Low-Power
From URL: www.erniefernandez.com/html/soi.html
T.Sakurai
Advantage of SOI over Bulk CMOS
o Lower CJ and CGROUND achieve 20% lower CTOTAL. Good for hi-speed & low-power. (Less effective in interconnection-limited cases)
o 10-15% higher IDS due to lower VTH in turning-on and parasitic bipolar current (effects reduced at VDD=0.6V)
o Lower negative body-bias effect in pass-gates and series-connected MOS’s, as in NAND’s, achieves higher IDS and hence hi-speed.
o Subthreshold swing s of 60mV/dec is achievable in FD and DTMOS. Lower VTH is possible with the same off-leak. (Less effective at lower VTH like 0.1V)
o Lower SER (normal dynamic gates)
o 25-30% higher speed in total for 0.25um generation
Design Issues of PD-SOI
o History dependent delay (3-8% fluctuation)
o Pass-gate leakage by parasitic bipolar current (pull-down internal nodes)
o Lowered noise immunity in dynamic circuits (several techniques)
o Self-heating (only for circuits with DC current path. 4
o ESD protection (process/device & circuits remedies)
o Redesign efforts (higher for PD, lower for FD)
o Higher wafer cost
T.Sakurai
Dynamic Threshold MOSFET (DTMOS)
F.Assaderaghi et al., "A Dynamic Threshold Voltage MOSFET (DTMOS) for Very Low Voltage Operation", ED Letters, vol.15, no.12, Dec. 1994.
T.Fuse, et al. "A 0.5V 200MHz 1-Stage 32b ALU Using a Body Bias Controlled SOI Pass-Gate Logic," in ISSCC Dig. Tech. Papers, pp. 286-287, Feb., 1997.
T.Sakurai
Pass Transistor Logic with SOI
[Figure: pass-transistor NMOS network built with DTMOS; complementary inputs A, B, C and complementary outputs Out]
T.Fuse, et al. "A 0.5V 200MHz 1-Stage 32b ALU Using a Body Bias Controlled SOI Pass-Gate Logic," in ISSCC Dig. Tech. Papers, pp. 286-287, Feb., 1997.
For NMOS with VDD=0.5V:
Gate is 0.5V → Body bias=0.5V → Vth= -0.05V
Gate is 0V → Body bias=0V → Vth= 0.15V
T.Sakurai
Pass Transistor Logic with SOI
[Figure: active power (mW) (0.1 to 1000, log scale) and frequency (0 to 1000) vs. supply voltage (V) (0 to 4) for a bulk pass-gate and a DTMOS pass-gate]
T.Fuse, et al. "A 0.5V 200MHz 1-Stage 32b ALU Using a Body Bias Controlled SOI Pass-Gate Logic," in ISSCC Dig. Tech. Papers, pp. 286-287, Feb., 1997.
T.Sakurai
DTMOS vs. Normal SOI
• Suppose DTMOS ≈ front gate + back gate
• IDS/CG of back gate device < IDS/CG of front gate device.
• DTMOS needs body contact area. FD SOI can use larger W.
• Both can achieve s=60mV/dec.
• With the same leakage and area, which is really faster?
• DTMOS is good in driving large CLOAD.
• Pass transistor will show better performance with DTMOS.
T.Sakurai
CMOS Static vs. Pass-Transistor Logic
Reduced number of transistors leads to low-power, high-speed and reduced area.
Pass-Tr. Logic Synthesis with BDD
BDD: Binary Decision Diagram
T.Sakurai and A.R.Newton, "Multiple-Output Shared Transistor Logic (MOSTL) Family Synthesized Using Binary Decision Diagram," Dept. EECS, Univ. of Calif., Berkeley, ERL Memo M90/21, Mar. 1990.
[Figure: BDD for function f and BDD for its complement f̄ (Sum), with decision variables a, b, c]
Truth table for f & f̄
c b a | f f̄
0 0 0 | 0 1
0 0 1 | 1 0
0 1 0 | 1 0
0 1 1 | 0 1
1 0 0 | 1 0
1 0 1 | 0 1
1 1 0 | 0 1
1 1 1 | 1 0
BDD Reduction Rules
BDD Reduction Rules
Rule 1
Collapse two nodes A1 and A2 whose right and left branch each point to the same node.
Rule 2
Eliminate a node A whose right and left branch point to the same node.
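[Figure: illustration of Rule 1 (nodes A1 and A2 labeled x, both branching to the same nodes B and C, merged into one node) and Rule 2 (node A labeled x, whose two branches reach the same node B, removed)]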
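The two reduction rules can be applied while a BDD is being built, by sharing identical nodes (Rule 1) and skipping nodes whose two branches agree (Rule 2). The sketch below is a generic reduced-BDD construction, not the synthesis procedure of the cited MOSTL memo; all function and variable names are illustrative.

```python
def build_bdd(func, variables):
    """Build a reduced BDD for func over the given variable order.

    Terminals are the integers 0 and 1. Internal nodes are integers >= 2,
    and nodes[id] = (variable, low_child, high_child).
    """
    unique = {}   # (var, low, high) -> node id, used for Rule 1
    nodes = {}

    def make(var, low, high):
        if low == high:              # Rule 2: both branches reach the same node
            return low
        key = (var, low, high)
        if key not in unique:        # Rule 1: reuse an identical existing node
            node_id = len(nodes) + 2
            unique[key] = node_id
            nodes[node_id] = key
        return unique[key]

    def expand(i, assignment):
        if i == len(variables):
            return int(bool(func(**assignment)))
        var = variables[i]
        low = expand(i + 1, {**assignment, var: 0})
        high = expand(i + 1, {**assignment, var: 1})
        return make(var, low, high)

    return expand(0, {}), nodes

def evaluate(root, nodes, assignment):
    """Follow the branches selected by assignment down to a terminal."""
    node = root
    while node > 1:
        var, low, high = nodes[node]
        node = high if assignment[var] else low
    return node

# Sum function from the truth table: f = a xor b xor c.
root, nodes = build_bdd(lambda a, b, c: a ^ b ^ c, ["c", "b", "a"])
print(len(nodes), "internal nodes after reduction")
```

For the sum function the full decision tree has 7 internal nodes, and sharing brings it down to 5; a function that ignores one variable loses that level entirely through Rule 2.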
T.Sakurai
BDD Reduction Example
[Figure: step-by-step reduction of the BDDs for f and f̄ over the variables a, b, c with terminals 1 and 0; duplicated nodes are merged by Rule 1 at each step until the two BDDs share their nodes]
Mapping BDD to MOS Circuit
[Figure: mapping the reduced BDD for f and f̄ to a MOS circuit. Terminal 1 maps to VDD and terminal 0 maps to VSS; each BDD node for a, b, c becomes a pair of pass transistors controlled by the variable and its complement. A second version introduces pass variables: where the branches of a node x go directly to VDD and VSS, the node is replaced by passing the variable x (or its complement) itself.]
Approach to low-power LSI
Example of MPEG2 decoding
Processor (software): ~25W
DSP: ~4W
Dedicated system LSI (SW/HW): ~0.7W
[Figure axes: low power at one end, high flexibility at the other]
Power * Area vs. Performance
[Figure: 16-bit performance (GOPS) (0 to 4) vs. Power * Area (W mm2) (10 to 10000, log scale) for µP + multimedia extension, mediaprocessor for PC, and mediaprocessor for AV]
Homogeneous vs. Heterogeneous
Homogeneous Architecture (high flexibility): several MPU's with memory and I/F, analog
Heterogeneous Architecture (System LSI) (low-power, more efficient): MPU, DSP, special engine, memory, and I/F, analog
DRAM Embedding
DRAM Processor System LSI
Two orders of magnitude improvement in bandwidth and power
K.Sawada, T.Sakurai, et al, "A 72K CMOS Channelless Gate Array with Embedded 1Mbit Dynamic RAM," in Proc. CICC'88, pp.20.3.1-20.3.4, May 1988.
T.Sakurai
Neural chip
3 orders of magnitude smaller power consumption for recognition compared to software implementation
B.M.Gordon, E.Tsern, T.Meng,"Design of a Low Power Video Decompression Chip Set for Portable Applications," J. of VLSI Signal Processing Systems 13, pp.125-142, 1996
T.Sakurai
Software-Hardware cooperation
A.Chandrakasan, R.Amirtharajah, S.H.Cho, J.Coodman, G.Konduri, J.Kurik, W.Rabiner and A.Wong, ”Design Considerations for Distributed Microsensor systems," CICC99, pp.279-286, May 1999.
StrongArm-1100
(Clock frequency control instruction equipped, an encryption algorithm)
o Code optimization for power -> factor of 5 power reduction
o Adaptive VDD control together with frequency control -> factor of 3 further power reduction
T.Sakurai
Important technologies for low-power
Low-voltage
• VTH control, multi-VTH, SOI, leakage control
• VDD control, multi-VDD, DC-DC conv.
• Ultra low voltage circuit (PLL, analog)
• Software control
Low-swing
• Bus, clock
Low-C
• Less # of Tr’s, fused digital-analog, pass-transistor
• Low-k (air isolation)
• System on a chip, memory embedding
Low-α ƒ
• Locally synch.-globally asynch., gated clock
• Low transition coding
P = α ƒ C Vs VDD + leak power
T.Sakurai
Lorentz Force MOS (LMOS)
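Electrons are deflected by By, which produces a voltage difference between Vo1 and Vo2.
[Figure: LMOS placed next to a power supply line of width WP carrying current IP; the magnetic field B (By) exerts a force F on electrons e- moving with velocity vx from source to drain (N+) under the gate; N+ output terminals Vo1 and Vo2]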
K.Nose and T.Sakurai,”Micro IDDQ Test using Lorentz Force MOSFET's,” Symp. On VLSI Circuits, June 1999.
T.Sakurai
Microphotograph of LMOS
10 parallel connection
Wp: 10µm, 8µm, 5µm, 2µm
Measured ∆VD dependence on IP
[Figure: ∆VD [µV] (1 to 4) vs. power supply current [mA] (0 to 10); WP = 8µm, VDDT = VGT = 2V]
∆VD is proportional to IP.
Circuit for micro IDDQ test
It is possible to measure the currents of thousands of LMOS's.
Shift registers are used to control the gates of the LMOS's.
[Figure: a shift register (D-Q stages clocked by CLK, started by VSTART) selects LMOS sensors on the VDD lines of Macro1, Macro2, and Macro3; pads: VDDT, VDD, VD]
Low-Power CMOS LSI Circuit Techniques
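[Table: low-power techniques organized by circuit category (General, Bus, Data Path, Random Logic, Memory, I/O) and by the power factor they reduce (CL, pt, VS, VDD, fCLK, ISC, IDC, ILEAK); surviving entries: low VDD, gated clock, glitch suppression, careful design, design verification by CAD]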
T.Sakurai, H.Kawaguchi and T.Kuroda, "Low-Power CMOS Design through VTH COntrol and Low-Swing Circuits," invited, 1997 International Symp. on Low-Power Electronics and Design, pp.1-6, Aug.1997.
T.Sakurai
SA-F/F (Sense-Amplifying Flip-Flop) circuits
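[Figure: SA-F/F clocked by CLK with differential inputs D and differential outputs Q, fed by NMOS dynamic differential logic; example: XOR gate with differential inputs A, B and outputs S; labels: fP, DVin]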
Skin Effects for Signal Lines
[Figure: skin depth of a Cu wire and interconnect width [m] (10^-8 to 10^-5) vs. frequency (10^8 to 10^10 Hz), with low-end and hi-end clock frequencies marked]
Skin Depth and R Increase
[Figure: R / R0, the increase of R by the skin effect (1 to 100), vs. a/D (1 to 100); D: skin depth, a: conductor dimension shown in the inset]
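The skin depth plotted on these slides can be estimated with the standard formula δ = sqrt(ρ / (π f µ0)) for a non-magnetic conductor. The sketch below is an illustration only: the copper resistivity is an assumed room-temperature bulk value, and the formula is the textbook one rather than anything stated on the slides.

```python
import math

MU0 = 4e-7 * math.pi   # permeability of free space [H/m]
RHO_CU = 1.68e-8       # assumed bulk resistivity of copper [ohm*m]

def skin_depth(freq_hz, resistivity=RHO_CU):
    """Skin depth in meters for a non-magnetic conductor."""
    return math.sqrt(resistivity / (math.pi * freq_hz * MU0))

for f in (1e8, 1e9, 1e10):
    print(f"{f:.0e} Hz: skin depth = {skin_depth(f) * 1e6:.2f} um")
```

At 1 GHz the result is about 2 µm, so the skin effect matters mainly for wide, thick clock and power lines rather than for minimum-width signal wires.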
T.Sakurai
Inductance?
・ Now RC effects dominate LC effects because R > |jωL|.
・ In the future, both R and ωL will increase (R more rapidly?).
・ Exception: low-R lines
・ Inductive effects in wide clock lines in a fast processor are claimed to be observed in simulation.
・ Clock lines are placed on a power plane to reduce inductive effects.
[1] D.A.Priore, "Inductance on Silicon for Sub-micron CMOS VLSI," Symp. on VLSI Circuits, 1993.
$$L = 2 \ln\frac{6H}{0.8W + T} \quad \text{[nH/cm]}$$
[Figure: self-inductance L (nH/cm) vs. W / H, log-log, for several values of T/H]
T.Sakurai
Inductive Effects
[Figure: ωL / R (10^-2 to 10^2) vs. year (1996 to 2012) for minimum-width (scaled) lines and for W = 1µm, 10µm, and 100µm]
Inductive Effects in Clock Lines
Board design practice is imported in LSI.
P.J.Restle & A Deutsch, “Designing the Best Clock Distribution Network,” VLSI circuits symp., pp.2-3, May 1998.
T.Sakurai
Interconnect Cross-Section and Noise
Unscaled / anti-scaled
・ Clock
・ Long bus
・ Power supply
Scaled interconnect
・ Signal
1V, 15W -> 15A current
5% noise -> 0.05V noise -> 3mΩ sheet R -> 10µm thick Al
Area pad + package, or thick layer on board is needed.
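The power-supply budget on this slide is a short chain of arithmetic that can be replayed for other supply voltages. The sketch below assumes a bulk aluminum resistivity of about 2.7e-8 Ω·m and treats the allowed resistance as a sheet resistance, as the slide does; the function name is illustrative.

```python
RHO_AL = 2.7e-8  # assumed bulk resistivity of aluminum [ohm*m]

def supply_budget(v_dd, power_w, noise_fraction, resistivity=RHO_AL):
    """Return (current A, noise V, allowed resistance ohm, Al thickness m)."""
    current = power_w / v_dd              # 15 W at 1 V -> 15 A
    v_noise = noise_fraction * v_dd       # 5% of 1 V -> 0.05 V
    r_max = v_noise / current             # allowed resistance
    thickness = resistivity / r_max       # thickness giving that sheet R
    return current, v_noise, r_max, thickness

current, v_noise, r_max, thickness = supply_budget(1.0, 15.0, 0.05)
print(f"I = {current:.0f} A, noise = {v_noise:.2f} V, "
      f"R = {r_max * 1e3:.1f} mohm, Al thickness = {thickness * 1e6:.1f} um")
```

The result lands close to the slide's rounded figures (about 3 mΩ and roughly 10 µm of Al), far thicker than an on-chip metal layer, which motivates area pads and thick package or board layers.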
T.Sakurai
Possible solutions for interconnect issues
Architecture
• Hierarchical architecture, local memories (10~)
Circuit
• Repeater (5)
• Line width sizing (10)
• Sense amplifier (5)
• Interconnection pipelining (10)
• Differential circuit (10)
Device / Process
• Low-r (Cu 1.3 (10 for EM)), Low-ε (F 1.1, polymer 2, air 4)
• Multi-layer interconnection (un/anti-scaled layers 100)
• Area pads + thick package / board layers (10)
CAD
• R, C extraction, fast simulation (1000)
• Optimization (repeater insertion...)
Three crises in VLSI designs
Power crisis
Interconnection crisis
Complexity crisis
T.Sakurai
VLSI Design in 2010
Designing a map of 10m wide roads for a world atlas
T.Sakurai
Complexity vs. Productivity
System LSI design complexity increases faster than productivity. (http://notes.sematech.org/97melec.htm)
[Figure: design complexity and productivity (1 to 1000, log scale) vs. year (2000 to 2012); curves: design complexity, productivity improved with lots of development, productivity improved at the current rate]
Coping with complexity crisis
[Figure: system LSI assembled from an MPU core, cache, ROM, RAM, MPEG core, USB core, and proprietary logic, using IP from several providers: IP (A inc.), IP (B univ.), IP (C inst.), IP (D semi.)]
IP: CPU, DSP, memories, analog, I/O, logic... (HW/FW/SW)
• Re-use and sharing of IP’s
• Design at high abstraction
Hot design topics initiate CAD tools
S/W, H/W Co-design
Behavioral
RTL
Logic
Circuit
Physical (deep submicron)
New dimensions
• LSI/package/board
• Power
• RC delay
• Signal integrity
• Interconnect reliability
• Noise
• IR drop
• Distribution of parameters
• Memory embedding
• Analog-digital mix...
Total system design
T.Sakurai
LSI in 2014
Item | Unit | 1999 | 2014 | Factor
Design rule | µm | 0.18 | 0.035 | 0.2
Tr. density | /cm2 | 6.2M | 390M | 30
Chip size | mm2 | 340 | 900 | 2.6
Tr. count per chip (µP) | | 21M | 3.6G | 170
DRAM capacity | | 1G | 1T | 256
Local clock on a chip | Hz | 1.2G | 17G | 14
Global clock on a chip | Hz | 1.2G | 3.7G | 3.1
Power | W | 90 | 183 | 2.0
Supply voltage | V | 1.5 | 0.37 | 0.2
Current | A | 60 | 494.6 | 8
Interconnection levels | | 6 | 10 | 1.7
Mask count | | 22 | 28 | 1.3
Cost / tr. (packaged) | µcents | 1735 | 22 | 0.01
Chip to board clock | Hz | 500M | 1.5G | 3.0
# of package pins | | 810 | 2700 | 3.3
Package cost | cents/pin | 1.61 | 0.75 | 0.5
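Some rows of the table are derived from others: the supply current is power divided by supply voltage, and the transistor count is roughly density times chip size. The sketch below replays that arithmetic on the table's own numbers; the dictionary layout and function names are illustrative.

```python
roadmap = {
    1999: {"power_w": 90.0, "vdd_v": 1.5,
           "density_per_cm2": 6.2e6, "chip_mm2": 340.0},
    2014: {"power_w": 183.0, "vdd_v": 0.37,
           "density_per_cm2": 390e6, "chip_mm2": 900.0},
}

def supply_current(entry):
    """Supply current in amperes: I = P / VDD."""
    return entry["power_w"] / entry["vdd_v"]

def transistor_count(entry):
    """Density [/cm2] times chip area converted from mm2 to cm2."""
    return entry["density_per_cm2"] * entry["chip_mm2"] / 100.0

for year, entry in roadmap.items():
    print(f"{year}: I = {supply_current(entry):.1f} A, "
          f"Tr. count = {transistor_count(entry):.3g}")
```

The current grows by about 8x while power only doubles, because the supply voltage falls to a quarter; this ties the table back to the power-supply interconnect budget discussed earlier.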