Microprocessor and DSP Technologies for the Nanoscale Era
Seminar 1
Ram Kumar Krishnamurthy
Microprocessor Research Labs
Intel Corporation, Hillsboro, OR
[email protected]
Intel Labs, July 5, 2005

About Circuits Research Lab
• >50 patents, >25 papers per year

Motivation: Higher performance at lower power and cost
[Figure: MIPS (log scale, 0.01 to 1,000,000) vs. year (1970 to 2010) for the 8086, 286, 386, 486, Pentium® Architecture, Pentium® Pro Architecture, and Pentium® 4 Architecture]
Strong demand for > 1 TIPS performance beyond this decade
How do you get there?

Our Research Agenda Outlook
Delay scaling: 0.7 | ~0.7 | >0.7 | Delay scaling will slow down
Energy/Logic Op scaling: >0.35 | >0.5 | >0.5 | Energy scaling will slow down
Bulk Planar CMOS: High Probability | Low Probability
Alternate, 3G etc: Low Probability | High Probability
Variability: Medium | High | Very High
ILD (K): ~3 | <3 | Reduce slowly towards 2-2.5
RC Delay: 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
Metal Layers: 6-7 | 7-8 | 8-9 | 0.5 to 1 layer per generation
Internal | University | FCRP (MARCO)

Intel’s Research Focus
Technology Leadership
[Figure: Gate Length (log scale) vs. year (1990 to 2010), Industry vs. Intel]
Complete solution stack: Technology | Arch & Design | Platforms | Software

Architectures & Designs
Five platform columns, starting with Back End
Power: ~130 W | ~100 W | < 100 W | ~25 W | < 1 W
Power Metric: Watts/sq ft | Watts/cu ft | Watts | Watt-hours | Battery Life
Cost: High | High | Med | Med | Low
Our research agenda addresses all these platforms

Is Transistor a Good Switch?
[Figure: ideal switch vs. transistor. Ideal switch: On, I = ∞ and I = 0; Off, I = 0 and I = 0. Transistor: On, I = 1 mA/u and I ≠ 0; Off, I ≠ 0 and I ≠ 0 (sub-threshold leakage)]

Sub-threshold Leakage
[Figure: MOS Transistor Characteristics, Ids (log) vs. Vgs, with Vt marked]
[Figure: Ioff (nA/u, log scale 1 to 10,000) vs. Temp (30 to 130 C), curves labeled 0.25, 90, and 45: exponential increase in Ioff]
[Figure: SD Leakage (Watts, log scale 0.1 to 1000) vs. Technology (0.25u, 0.18u, 0.13u, 90nm, 65nm, 45nm), for 2X Tr Growth and 1.5X Tr Growth]
Transistors will not be switches, but dimmers
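
The exponential growth of Ioff shown on this slide can be illustrated with the textbook subthreshold model, in which off-current rises by one decade for every subthreshold swing S of threshold-voltage reduction, with S = n (kT/q) ln 10. This is a minimal sketch, not the presenter's model: the ideality factor n = 1.5, the Vt values, and the function names are illustrative assumptions, and the sketch ignores the additional shift of Vt with temperature.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23      # Boltzmann constant
ELECTRON_CHARGE_C = 1.602176634e-19   # elementary charge


def subthreshold_swing(temp_c, ideality=1.5):
    """Swing S in volts per decade: S = n * (kT/q) * ln(10).

    The ideality factor n = 1.5 is an assumed, illustrative value.
    """
    temp_k = temp_c + 273.15
    thermal_voltage = BOLTZMANN_J_PER_K * temp_k / ELECTRON_CHARGE_C
    return ideality * thermal_voltage * math.log(10)


def ioff_ratio(vt_old, vt_new, temp_c=110.0, ideality=1.5):
    """Factor by which Ioff grows when Vt drops from vt_old to vt_new (volts)."""
    swing = subthreshold_swing(temp_c, ideality)
    return 10.0 ** ((vt_old - vt_new) / swing)


if __name__ == "__main__":
    for temp in (30.0, 70.0, 110.0):
        s_mv = 1000.0 * subthreshold_swing(temp)
        growth = ioff_ratio(0.40, 0.30, temp_c=temp)
        print(f"T = {temp:5.1f} C  S = {s_mv:6.1f} mV/dec  "
              f"Ioff growth for a 100 mV Vt drop = {growth:6.1f}x")
```

Lowering Vt by one full swing multiplies Ioff by ten, which is why each generation's Vt reduction shows up as a large step in leakage.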

Leakage Power
[Figure: Leakage Power (% of Total, 0% to 50%) vs. Technology (µm: 1.5, 1, 0.7, 0.5, 0.35, 0.25, 0.18, 0.13, 0.09, 0.07, 0.05); must stop at 50%]
Leakage power limits Vt scaling
A. Grove, IEDM 2002

High Leakage Impacts Functionality
[Figure: dynamic circuit schematic with labels Clock, Pk0, M11 ... M1j ... M1K, M21 ... M2j ... M2K, I Leak, Dyn_out, INV_out]
Sub-65nm Dynamic Circuit Active Leakage Tolerance:
• Cache, RF, Arrays, Bitlines most affected
• Keeper sizes > 50% of pulldown strength
• High contention → degraded performance
• Slow keeper shutoff → high short-circuit power
[Figure: Keeper / pulldown ratio (0 to 1.6) vs. Subthreshold + gate leakage (1X, 3X, 5X, 10X, 20X), Sub-70nm]
M. Anders, R. Krishnamurthy et al, 2001 Symp. VLSI Circuits

Power Will be the Limiter
Applications will demand TIPS performance
But the Power…
Challenge: Highest performance in the power envelope

Power Trend
[Figure: Power (W, log scale 1 to 100) vs. year (1985 to 2000) for the 386, 486, Pentium® processor, Pentium® II processor, and Pentium® 4 processor, against the Cooling Capacity Of Conventional System]
C scales by 30% per generation…
…but Vcc scales by 10-15% only!
Must maintain or reduce power in future
“Business As Usual” is Not an Option
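
The arithmetic behind this slide can be sketched with the standard dynamic-power relation P ∝ N · C · Vcc² · f. The 30% capacitance reduction and the 10-15% Vcc reduction come from the slide. The 2X transistor growth is borrowed from the leakage plot earlier in the deck, and the frequency gain of 1/0.7 per generation is an assumption tied to the 0.7 delay-scaling figure. Function names are illustrative.

```python
def energy_scale(c_scale=0.7, vcc_scale=0.9):
    """Per-generation scaling of switching energy per device (C * Vcc^2)."""
    return c_scale * vcc_scale ** 2


def power_scale(c_scale=0.7, vcc_scale=0.9,
                transistor_growth=2.0, freq_scale=1.0 / 0.7):
    """Per-generation scaling of total dynamic power, P ~ N * C * Vcc^2 * f.

    transistor_growth and freq_scale are assumptions, not slide data.
    """
    return transistor_growth * energy_scale(c_scale, vcc_scale) * freq_scale


if __name__ == "__main__":
    for vcc in (0.90, 0.85):
        print(f"Vcc scale {vcc:.2f}: "
              f"energy/op x{energy_scale(vcc_scale=vcc):.3f}, "
              f"chip power x{power_scale(vcc_scale=vcc):.3f}")
```

Under these assumptions the energy per switching event falls to roughly 0.5 to 0.57 of its previous value each generation, yet total chip power still grows by roughly 1.4X to 1.6X, because doubling the transistor count and raising the frequency outpace the slow Vcc scaling.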

Gate Oxide is Near Limit
[Figure: 130nm Transistor cross-section with labels Poly Si Gate Electrode, Si Substrate, 1.5 nm Gate Oxide, 70 nm, Si3N4, CoSi2]
Intel’s High K leadership is crucial for the industry
Need low leakage and leakage tolerant techniques
2-200X reduction

Dual Vt Design for Active Leakage Reduction
[Figure: histograms of number of paths vs. delay for logic paths between latch boundaries: High Vt, Low Vt, Dual Vt]
Technology provides two Vt:
• High Vt with nominal Ioff (lower performance)
• Low Vt with ~10X higher Ioff (higher performance)
Employing high Vt everywhere yields lower performance and lower leakage (1X)
Employing low Vt everywhere yields higher performance, but higher leakage (10X)
Selective usage of low and high Vt yields higher performance, yet low leakage: between 1X and <<10X
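
The selective-usage idea can be illustrated with a toy assignment: keep high Vt on every path that already meets the cycle time and switch only the failing paths to low Vt. This is a minimal sketch at path granularity (real flows assign Vt per gate). The 1X and 10X leakage figures come from the slide, while the path delays, the 0.8 low-Vt delay factor, and the function names are illustrative assumptions.

```python
def assign_dual_vt(high_vt_delays, target_delay, low_vt_delay_factor=0.8):
    """Choose 'high' or 'low' Vt per path so every path meets target_delay.

    high_vt_delays: delay of each path when built with high-Vt devices.
    low_vt_delay_factor: assumed delay multiplier when a path uses low Vt.
    """
    assignment = []
    for delay in high_vt_delays:
        if delay <= target_delay:
            assignment.append("high")          # already fast enough
        elif delay * low_vt_delay_factor <= target_delay:
            assignment.append("low")           # critical path: spend leakage
        else:
            raise ValueError(f"path with delay {delay} cannot meet the target")
    return assignment


def relative_leakage(assignment, low_vt_leakage=10.0):
    """Average leakage relative to an all-high-Vt design (1X)."""
    total = sum(low_vt_leakage if vt == "low" else 1.0 for vt in assignment)
    return total / len(assignment)


if __name__ == "__main__":
    delays = [0.70, 0.80, 0.95, 1.10, 1.20]    # hypothetical path delays
    choice = assign_dual_vt(delays, target_delay=1.0)
    print(choice, f"leakage = {relative_leakage(choice):.1f}X")
```

With these hypothetical delays only two of the five paths need low Vt, so the design meets the low-Vt cycle time at 4.6X leakage instead of 10X.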

Chip Multi-Processing
[Figure: Relative Performance (1 to 3.5) vs. Die Area, Power (1 to 4) for CMP and ST]
[Figure: four cores (C1, C2, C3, C4) with Cache]
• Multi-core, each core Multi-threaded
• Shared cache and front side bus
• Each core has different Vdd & Freq
• Core hopping to spread hot spots
• Lower junction temperature

Memory Latency
[Figure: CPU, Cache (small, ~few clocks), Memory (large, 50-100 ns)]
[Figure: Memory Latency (Clocks, log scale 1 to 1000) vs. Freq (100 to 10,000 MHz). Assume: 50ns Memory latency]
Cache miss hurts performance
Worse at higher frequency
Need power efficient high-speed I/O techniques
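
The frequency dependence on this slide is plain unit conversion: a fixed memory latency in nanoseconds costs more clock cycles as the clock period shrinks. A short sketch, using the slide's 50 ns assumption and the frequency range of the plot; the function name is illustrative.

```python
def miss_latency_clocks(freq_mhz, memory_latency_ns=50.0):
    """Clock cycles spent waiting on one memory access.

    cycles = latency / clock period = latency_ns * freq_mhz / 1000
    """
    return memory_latency_ns * freq_mhz / 1000.0


if __name__ == "__main__":
    for freq in (100, 1000, 10000):
        print(f"{freq:>6} MHz: {miss_latency_clocks(freq):.0f} clocks per miss")
```

At 100 MHz a miss costs 5 clocks, at 1 GHz it costs 50, and at 10 GHz it would cost 500, so the same memory looks a hundred times slower to the faster core.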

Increase on-die Memory
[Figure: Cache % of full chip area (0% to 100%) by technology generation (µm) for Pentium, Pentium Pro, Pentium II, Pentium III, Pentium III & 4, and Pentium® 4]
Large on die memory provides:
1. Increased Data Bandwidth & Reduced Latency
2. Hence, higher performance for much lower power
[Figure: Power Density (Watts/cm², log scale 1 to 100) by technology generation (µm), Logic vs. Memory]

Special Purpose Hardware Acceleration
[Figure: chip floorplan with blocks TCB, Exec Core, PLL, OOO, ROB, ROM, CAM, CLB, Input seq, Send buffer]
2.23 mm X 3.54 mm, 260K transistors
Opportunities for acceleration:
• Network processing engines
• MPEG Encode/Decode engines
• Speech engines
• Wireless communication/baseband

Non-critical Sum Generator
• Non-critical path: ripple carry chain
• Reduced area, energy consumption, leakage
• Generate conditional sums for each bit
• Sparse-tree carry selects appropriate sum
[Figure: sum generator schematic with inputs Pi, Pi+1, Gi+1, Pi+2, Gi+2, Pi+3, Gi+3; CM gates; XOR gates producing conditional sums Sumi,1 and Sumi,0; 2:1 muxes driven by Carry selecting Sumi, Sumi+1, Sumi+2, Sumi+3]
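
The sum-generator scheme on this slide can be modeled behaviorally: each group of bits ripples its carry twice, once for an incoming carry of 0 and once for 1, and a 2:1 mux picks the right set of sums once the real group carry arrives. The sketch below is a software model under assumptions, not the circuit: the 4-bit group size and 32-bit width are illustrative, and the group carries are computed with a simple loop standing in for the sparse carry tree.

```python
def conditional_sums(a_bits, b_bits):
    """Ripple through one group twice; return (sums_if_cin0, sums_if_cin1)."""
    results = []
    for carry_in in (0, 1):
        carry = carry_in
        sums = []
        for a, b in zip(a_bits, b_bits):       # LSB first
            propagate = a ^ b
            generate = a & b
            sums.append(propagate ^ carry)
            carry = generate | (propagate & carry)
        results.append(sums)
    return results[0], results[1]


def group_carry_out(a_bits, b_bits, carry_in):
    """Carry out of one group; a stand-in for the sparse carry tree."""
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        carry = (a & b) | ((a ^ b) & carry)
    return carry


def conditional_sum_add(a, b, width=32, group=4):
    """Add a and b modulo 2**width using per-group conditional sums."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result = 0
    carry = 0
    for start in range(0, width, group):
        ga = a_bits[start:start + group]
        gb = b_bits[start:start + group]
        sums0, sums1 = conditional_sums(ga, gb)
        chosen = sums1 if carry else sums0     # the 2:1 mux
        for offset, bit in enumerate(chosen):
            result |= bit << (start + offset)
        carry = group_carry_out(ga, gb, carry)
    return result


if __name__ == "__main__":
    print(conditional_sum_add(5, 7))
```

Because both candidate sums are ready before the carry arrives, the ripple chain inside each group sits off the critical path, which is what allows it to be built from smaller, lower-leakage logic.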

Adder Core Critical Path
• Critical path: 7 gate stages, same as KS
• Sparse-tree: single-rail dynamic
• Exploit non-criticality of sum generator
• Convert to static logic
• Semi-dynamic design

Significant Challenges Ahead
Can only be solved with joint industry-university collaboration
[Figure: architecture eras. Era of Pipelined Architecture: 8086, 286, 386, 486. Era of Instruction Level Parallelism: Super Scalar; Speculative, OOO. Era of Thread & Processor Level Parallelism: Multi Threaded; Multi-Threaded, Multi-Core; Special Purpose HW]

Thank You for Your Attention
Q&A
Our publications can be found in:
• IEEE Intl. Solid-State Circuits Conference, 2001-
• IEEE Journal of Solid-State Circuits, 2001-
• Symposium on VLSI Circuits, 1999-
• Intl. Symposium on Low-power Design, 1999-
• Custom Integrated Circuits Conference, SOCC, etc., 1999-

Backup
Conditional Carry for Cin=0