The Intel® Pentium® M Processor Power-Awareness Story: From Theory to Practice
Ronny Ronen, Senior Principal Engineer, Director of Architecture Research, Intel Labs - Haifa, Intel Corporation
Technion EE, Haifa, June 2, 2003
All dates, plans, and features are preliminary and subject to change without notice.
IDC: Israel Development Center
Located on Israel's Mediterranean coast, Haifa is the home of Intel's Israel Development Center (IDC).
IDC was established in 1974, and is Intel's first development center outside the US. The center is a multi-disciplinary team, with more than 1000 employees.
Many of Intel's leading products were developed and originated at IDC.
IDC's employees are currently working on Intel's future microprocessors, CAD tools, advanced networking components and software technologies.
The Baha'i Shrine: Haifa's best-known attraction
The Intel® Centrino™ Mobile Technology
Announcing Intel® Centrino™ mobile technology. Intel has expanded its history of innovation with new notebook capabilities designed specifically for the mobile world. Now you can work, play and connect without wires. And choose from a whole new generation of thin, light notebooks designed to enable extended battery life.
This innovative new technology enables:
- Integrated wireless LAN capability
- Breakthrough mobile performance
- Extended battery life
- Thinner, lighter designs
Agenda
The Theory
- Power, Energy
- Power Awareness
The Practice
- The Intel Pentium M processor micro-architecture
The Performance
Power and the digital world...
Power is consumed:
- When capacitance is charged and discharged.
- A charged cap is a logical '1'; a discharged cap is a logical '0'.
The capacitance can be the gates of other transistors or wires (buses and long interconnects).
[Figure: a gate with IN and OUT signals switching between 0 and 1.]
$E = \tfrac{1}{2} C V^2$
Power and the digital world (2)...
Secondary effects such as leakage and short-circuit current are increasing with advanced process technologies.
- Depends heavily on operating voltage and temperature
- Leakage is growing dramatically: reaching 20% in current process technology, and growing
[Figure: two IN/OUT gate diagrams, one labeled "Leakage (sub-threshold)" and one labeled "Short-circuit".]
Power & Energy
Total energy
- Total of all switching energy and leakage waste
- Measured in joules or watt-hours
"Energy per task"
Lower energy per task means:
- Longer battery life
- Lower electric bills
Power = energy / time = $\alpha C V^2 f$ (+ leakage power)
- $\alpha$: activity, $C$: capacitance, $V$: voltage, $f$: frequency
- Measured in watts
Power & Energy
Average power
- Total energy / total time
- Including low-activity and idle time
Peak power
- Higher power means higher current.
- Higher power means higher temperature.
- Cannot exceed the thermal constraints.
Power Density
Think of watts/cm².
Denser power is harder to cool.
Complex algorithms lead to denser power:
- Dense random logic.
- Timing pressure leads to faster, bigger, power-hungrier gates.
Increases with every process technology generation (higher power at a smaller die size).
Power Density and Thermal
Pentium M processor power density example
[Figure: Power Density (Simulated) (1). Color codes: (lowest) black, red, orange, yellow, white (highest).]
[Figure: Thermal Map (EDO System) (2, 3). Color codes: (lowest) blue, green, yellow, orange, purple, white (highest).]
1 Source: "Intel® Pentium® M Processor Power Estimation, Budgeting, Optimization, and Validation", Dani Genossar, Nachum Shamir, ITJ Q2/2003
2 Source: Dani Genossar, Nachum Shamir, Intel 2003
3 The left portion of the L2 thermal map is blank due to measurement limitations.
Voltage, Power, Frequency
Transistors switch faster at higher voltage, so higher voltage enables higher frequency.
Maximum frequency grows about linearly with voltage, within a given voltage range Vmin to Vmax:
- V < Vmin: transistors won't switch.
- V > Vmax: the device may burn.
"The cube law": $P \approx kV^3$ (a change of ~1% in V gives a change of ~3% in P)
Implications
- Can save energy/power when performance is not a factor
[Figure: XScale processor frequency (MHz) and power (mW) vs. voltage*; voltage axis from 0.5 to 1.9, vertical axis from 0 to 1000.]
* Source: Intel Corp. (http://developer.intel.com)
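The "1% V, 3% P" shorthand is the first-order approximation of the cube law. A minimal Python sketch that evaluates $P \approx kV^3$ exactly, so the shorthand can be checked for small and large voltage changes (the function name is illustrative, not from the talk):

```python
def power_change(voltage_change: float) -> float:
    """Fractional power change under P ~ k * V**3 for a fractional
    voltage change (0.01 means +1%)."""
    return (1.0 + voltage_change) ** 3 - 1.0


for dv in (0.01, -0.01, -0.20):
    print(f"V {dv:+.0%} -> P {power_change(dv):+.1%}")
```

For small changes the 3x rule holds closely; for a 20% voltage reduction the exact saving is 48.8% rather than the 60% a linear extrapolation would suggest.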
Bean Counting*
Representative numbers

                                              High Voltage   Low Voltage
 1. Number of transistors                     80 M           80 M
 2. Die area                                  80 mm²         80 mm²
 3. Operation voltage                         1.48 V         0.96 V
 4. Operation frequency                       1.6 GHz        0.6 GHz
 5. Energy per switch per transistor          0.85 fJ        0.36 fJ
 6. Power per transistor (#4 x #5)            1.4 µW         0.21 µW
 7. Activity factor                           20%            10%
 8. Energy per cycle per chip (#1 x #5 x #7)  14 nJ          2.9 nJ
 9. Power (#4 x #8)                           22 W           1.7 W
10. Power density (#9 / #2)                   27 W/cm²       2.1 W/cm²

* These numbers are representative only and are not intended to reflect any existing device.
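The derived rows of the table (#8, #9, #10) follow mechanically from the input rows. A minimal Python sketch that recomputes them from the representative numbers above; the class and field names are illustrative and not from the talk:

```python
from dataclasses import dataclass


@dataclass
class Chip:
    transistors: float       # row 1, count
    die_area_cm2: float      # row 2, in cm^2 (80 mm^2 = 0.8 cm^2)
    freq_hz: float           # row 4
    switch_energy_j: float   # row 5, energy per switch per transistor
    activity: float          # row 7, activity factor

    def energy_per_cycle_j(self) -> float:
        # Row 8 = #1 x #5 x #7
        return self.transistors * self.switch_energy_j * self.activity

    def power_w(self) -> float:
        # Row 9 = #4 x #8
        return self.freq_hz * self.energy_per_cycle_j()

    def power_density_w_per_cm2(self) -> float:
        # Row 10 = #9 / #2
        return self.power_w() / self.die_area_cm2


high = Chip(80e6, 0.8, 1.6e9, 0.85e-15, 0.20)
low = Chip(80e6, 0.8, 0.6e9, 0.36e-15, 0.10)

for name, chip in (("high voltage", high), ("low voltage", low)):
    print(f"{name}: {chip.power_w():.2f} W, "
          f"{chip.power_density_w_per_cm2():.2f} W/cm^2")
```

The computed values agree with the table to within the rounding used on the slide.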
Mobile Platform Goals & Challenges
- How much power can one afford to spend in order to implement a performance feature?
- How to balance the design for maximum performance and extended battery life?
Higher Performance vs. Longer Battery Life
Processor average power is <10% of the platform's
- LCD and other components consume much more
- Even an ideal processor can extend battery life by 11% at most!
Decision:
- Optimize for performance when active
- Optimize for battery life when idle
Caveat
- This observation is Pentium M specific. It may not hold as such in the future!

[Pie chart: platform power breakdown]
  Display (panel + inverter)  33%
  CPU                         10%
  Power Supply                10%
  Intel® MCH                   9%
  Misc.                        8%
  GFX                          8%
  HDD                          8%
  CLK                          5%
  Intel® ICH                   3%
  DVD                          2%
  LAN                          2%
  Fan                          2%
Source: "2004 Extended Battery Life Technologies", Don J Nguyen, Intel Developer Forum, Spring 2003
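The 11% ceiling follows from the pie chart: if the CPU draws 10% of platform power, even a CPU that draws nothing leaves 90% of the load in place. A minimal Python sketch, assuming a fixed battery capacity and battery life inversely proportional to average platform power (the function name is illustrative):

```python
def max_battery_life_gain(component_share: float) -> float:
    """Upper bound on battery-life extension from eliminating one component.

    component_share: fraction of average platform power the component draws.
    Battery life ~ capacity / power, so removing the component scales
    battery life by 1 / (1 - component_share).
    """
    if not 0.0 <= component_share < 1.0:
        raise ValueError("component_share must be in [0, 1)")
    return 1.0 / (1.0 - component_share) - 1.0


print(f"CPU (10% share): {max_battery_life_gain(0.10):.1%}")
print(f"Display (33% share): {max_battery_life_gain(0.33):.1%}")
```

The same bound applied to the display share shows why the slide points at the LCD and other components as the larger opportunity.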
Optimize for Performance
"Maximize performance at given thermal constraints"
Approximated by: maximizing performance at a given power budget
The test: "A micro-architectural feature that gains performance or saves power should be better than simply using voltage/frequency scaling"
That is:
$f \approx K V \;\Rightarrow\; \text{Power} = \alpha C V^2 f \approx \alpha C f^3$ (with the constant $K$ absorbed)
The right performance/power tradeoff: 1% more performance for less than 3% more power is a gain!
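The 3% rule is the small-change form of the cube law: voltage/frequency scaling alone reaches a performance ratio r at a power ratio of roughly r³, so a feature is worthwhile only if it does better than that. A minimal Python sketch, assuming performance scales linearly with frequency and ignoring leakage (the function name is illustrative):

```python
def beats_vf_scaling(perf_gain: float, power_gain: float) -> bool:
    """True if a feature beats plain voltage/frequency scaling.

    perf_gain, power_gain: fractional changes (0.01 means +1%).
    Under f ~ V and P ~ V^2 * f, scaling alone reaches the same
    performance at a power ratio of (1 + perf_gain) ** 3.
    """
    scaling_power_ratio = (1.0 + perf_gain) ** 3
    return (1.0 + power_gain) < scaling_power_ratio


print(beats_vf_scaling(0.01, 0.02))    # +1% performance for +2% power
print(beats_vf_scaling(0.01, 0.04))    # +1% performance for +4% power
print(beats_vf_scaling(-0.01, -0.05))  # -1% performance for -5% power
```

The same comparison covers power-saving features: giving up 1% performance is justified only if it saves more than about 3% power.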
Optimize for Battery Life
"Minimize energy per task"
Should address both active and idle energy
The active energy tradeoff:
$\text{Energy}_{\text{active}} = \text{Power}_{\text{active}} \times \text{Time}_{\text{active}}$
The right performance/power tradeoff: 1% more performance for less than 1% more power is a gain!
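The 1% threshold can be derived from the active energy formula. A short worked derivation, assuming a fixed task whose active time is inversely proportional to performance; $p$ and $q$ are symbols introduced here for the fractional performance and power changes:

```latex
\begin{align*}
\text{Time}'_{\text{active}} &= \frac{\text{Time}_{\text{active}}}{1+p}, \qquad
\text{Power}'_{\text{active}} = (1+q)\,\text{Power}_{\text{active}} \\
\frac{\text{Energy}'_{\text{active}}}{\text{Energy}_{\text{active}}}
  &= \frac{1+q}{1+p} < 1 \iff q < p
\end{align*}
```

Under these assumptions a feature lowers active energy per task exactly when its fractional power increase is smaller than its fractional performance increase.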
Putting it all together: The Pentium M processor approach
[Figure: power change (vertical axis, power gain / power loss, -60% to 100%) plotted against performance change (horizontal axis, performance loss / performance gain, -30% to 30%). A constrained-performance breakeven line and an energy breakeven line divide the plane into zones labeled "Energy Loss, Constrained Perf Loss", "Energy Loss, Constrained Perf Gain", "Energy Gain, Constrained Perf Loss" and "Energy Gain, Constrained Perf Gain"; one region is marked "Wrong trade-off zone".]
Back to Basics: "Less is More"
- Less instructions per task
- Less micro-ops per instruction
- Less transistor switches per micro-op
- Less energy per transistor switch
"Less is More" in the Pentium M processor
Less instructions per task
- Advanced branch prediction, SSE instructions
Less micro-ops per instruction
- Micro-ops fusion
- Dedicated stack engine
Less transistor switches per micro-op
- 1MB "power managed" L2 cache
- The Pentium M processor bus
- Various lower-level optimizations
- Advanced clock gating
Less energy per transistor switch
- Enhanced Intel® SpeedStep® technology
Power-awareness top to bottom
Advanced Branch Prediction
Typical processors today spend about 1/4 to 1/3 of their time in branch misprediction recovery
- Losing both performance and energy!
The Pentium M processor employs best-in-class branch prediction
Captures all standard program behaviors and new programming paradigms:
- Loops of all iteration counts
- JITed and object-oriented code
20% fewer mispredictions, 7% performance gain*
* Relative to the Pentium M processor with a traditional branch predictor (simulated)
Dedicated Stack Manager
The IA32 instruction set provides explicit management of the S/W stack
- E.g. PUSH, POP, RET, CALL
Stack management operations are an overhead
- E.g. stack pointer increment
- Normally done via the machine's main execution path
The Pentium M processor employs sophisticated H/W control for stack management
- Maintains/updates the stack pointer value in the decoder
- Includes a synchronization mechanism
- Replaces power-hungry micro-op control
H/W management instead of power-hungry micro-ops
Dedicated Stack Manager
[Figure: block diagram. Labels: Decoder (PUSH/POP/CALL/RET, +4 delta ESP); Scheduler; ALU with 32-bit adder (ESP+4); wide uOp.]
Achieving >5% micro-op reduction
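A toy Python model of the idea on these two slides: stack-pointer updates are accumulated as a delta at decode time, and a synchronization step folds the delta back into ESP only when needed. The `USE_ESP` marker (standing for an instruction that reads ESP explicitly), the class name, and the one-micro-op synchronization cost are illustrative assumptions, not details from the talk.

```python
from dataclasses import dataclass

# ESP change per stack instruction, in bytes (32-bit operands assumed).
ESP_DELTA = {"PUSH": -4, "CALL": -4, "POP": +4, "RET": +4}


@dataclass
class StackEngineDecoder:
    delta: int = 0       # ESP offset accumulated at decode time
    sync_uops: int = 0   # micro-ops spent folding delta back into ESP

    def decode(self, op: str) -> None:
        if op in ESP_DELTA:
            # Tracked in the decoder: no ESP-update micro-op is issued.
            self.delta += ESP_DELTA[op]
        elif op == "USE_ESP" and self.delta != 0:
            # Synchronization: one micro-op computes ESP <- ESP + delta.
            self.sync_uops += 1
            self.delta = 0


program = ["PUSH", "PUSH", "CALL", "USE_ESP", "POP", "RET"]

# Baseline: every stack instruction sends its own ESP update down the
# main execution path.
baseline_esp_uops = sum(op in ESP_DELTA for op in program)

engine = StackEngineDecoder()
for op in program:
    engine.decode(op)

print("ESP-update micro-ops without stack engine:", baseline_esp_uops)
print("Sync micro-ops with stack engine:", engine.sync_uops)
```

The model only counts ESP-update micro-ops; it says nothing about how the real synchronization mechanism is implemented.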
Micro-op Fusion: Best of All Worlds
IA instructions are typically broken into micro-ops
Pentium M processor employs micro-op fusion
- Instructions with memory operands are fused
- Single micro-op during most of the instruction lifetime
- Enhanced performance and power characteristics
Micro-ops are split just in time for execution
- Allows out-of-order, superscalar execution
Micro-op Fusion: Best of All Worlds
Example: add eax, dword ptr data
[Figure: without fusion. The decoder emits separate LD and OP micro-ops into the scheduler; LD goes to the cache and OP to the ALU.]
Micro-op Fusion: Best of All Worlds
Example: add eax, dword ptr data
[Figure: with fusion. The decoder emits a single LD + OP micro-op, which stays fused in the scheduler and is split into independent LD (cache) and OP (ALU) micro-ops for out-of-order, superscalar execution.]
Micro-op fusion enables effective machine utilization
Achieving >10% micro-op reduction
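A toy Python sketch of the bookkeeping effect described above: with fusion, a load-plus-operate instruction occupies one tracked entry until it is split for execution, while the number of executed micro-ops stays the same. The instruction mix, the `Entry` type, and the function names are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Entry = Tuple[str, ...]  # one tracked entry holding one or more micro-ops


def decode(has_mem_operand: bool, fusion: bool) -> List[Entry]:
    """Entries produced for one ALU instruction."""
    if not has_mem_operand:
        return [("OP",)]
    # e.g. add eax, dword ptr data -> load + operate
    return [("LD", "OP")] if fusion else [("LD",), ("OP",)]


def dispatch(entries: List[Entry]) -> List[str]:
    """Split entries into individual micro-ops just before execution."""
    return [uop for entry in entries for uop in entry]


# Hypothetical instruction mix: True = instruction has a memory operand.
program = [True, False, True, False, False]

for fusion in (False, True):
    entries = [e for instr in program for e in decode(instr, fusion)]
    print(f"fusion={fusion}: {len(entries)} tracked entries, "
          f"{len(dispatch(entries))} executed micro-ops")
```

The reduction in tracked entries in this sketch depends entirely on the made-up instruction mix and is not comparable to the >10% figure on the slide.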
The "Enhanced" version provides
- Multiple voltage/frequency operating points. The Pentium M processor 1.6 GHz operation ranges:
  - From 600 MHz @ 0.956 V
  - To 1.6 GHz @ 1.484 V
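To see what these two operating points imply for dynamic power, the $\alpha C V^2 f$ relation from the theory slides can be applied directly. A minimal Python sketch, assuming activity and capacitance are unchanged between the two points and ignoring leakage (the function name is illustrative):

```python
def dynamic_power_ratio(v_from: float, f_from: float,
                        v_to: float, f_to: float) -> float:
    """Ratio of dynamic power (alpha * C * V^2 * f) between two operating
    points, assuming alpha and C are unchanged and ignoring leakage."""
    return (v_to / v_from) ** 2 * (f_to / f_from)


# Operating points quoted for the 1.6 GHz Pentium M processor.
ratio = dynamic_power_ratio(1.484, 1.6e9, 0.956, 0.6e9)
print(f"Dynamic power at 600 MHz / 0.956 V is {ratio:.1%} "
      f"of the 1.6 GHz / 1.484 V value")
```

Under these assumptions the low operating point draws roughly one sixth of the dynamic power for a frequency reduction of about 2.7x, which is the benefit of scaling voltage together with frequency.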
Performance Results
3 systems
- Intel® Pentium® M Processor (1.6 GHz/600 MHz)
- Mobile Intel® Pentium® 4 Processor-M (2.4/1.2 GHz)
- Mobile Intel® Pentium® III Processor-M (1.2 GHz/800 MHz)
3 operation modes
- Always On (Max Frequency)
- Portable/Laptop (Adaptive Frequency)
- Maximum Battery (Min Frequency)
3 benchmarks
- Mobile Representative Office Productivity workload
- Internet Experience workload
- SPEC CPU 2000 V1.2
Measured relative performance and efficiency
- Efficiency = performance / average power, i.e. work done per unit of energy (the inverse of "energy per task")
Performance Results
[Figure: results charts, with panels titled "Always On (Max Frequency)" and "Portable/Laptop (Adaptive Frequency)".]