Spring 07, Feb 22    ELEC 7770: Advanced VLSI Design (Agrawal)
ELEC 7770 Advanced VLSI Design, Spring 2007
Power Aware Microprocessors
Vishwani D. Agrawal, James J. Danaher Professor
ECE Department, Auburn University, Auburn, AL 36849
[email protected]
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07
SIA Roadmap for Processors (1999)
High-performance power (W): 90, 130, 160, 170, 175, 183
Source: http://www.semichips.org
Power Reduction in Processors
Just about everything is used.
Hardware methods:
  Voltage reduction for dynamic power
  Dual-threshold devices for leakage reduction
  Clock gating, frequency reduction
  Sleep mode
Architecture:
  Instruction set
  Hardware organization
Software methods
SPEC CPU2000 Benchmarks
Twelve integer and 14 floating point programs, CINT2000 and CFP2000.
Each program run time is normalized to obtain a SPEC ratio with respect to the run time of Sun Ultra 5_10 with a 300MHz processor.
CINT2000 and CFP2000 summary measurements are the geometric means of SPEC ratios.
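The normalization and the geometric-mean summary can be sketched in a few lines. This is a minimal illustration, not SPEC's own tooling: the program names and run times are hypothetical, and the ratio is taken simply as reference time over measured time, without any constant scale factor a published score may apply.

```python
import math

def spec_ratio(reference_time_s, measured_time_s):
    """Run time normalized to the reference machine (larger means faster)."""
    return reference_time_s / measured_time_s

def geometric_mean(values):
    """n-th root of the product, computed through logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical (reference, measured) run times in seconds, for illustration only.
runs = {"prog_a": (1400.0, 700.0), "prog_b": (1900.0, 475.0)}

ratios = {name: spec_ratio(ref, run) for name, (ref, run) in runs.items()}
summary = geometric_mean(ratios.values())
print(ratios, round(summary, 3))
```

With ratios of 2 and 4, the geometric mean is the square root of 8, which is lower than the arithmetic mean of 3; a single very fast program pulls the summary up less than it would under an arithmetic average.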
Reference CPU s: Sun Ultra 5_10 300MHz Processor
[Bar chart: reference CPU time in seconds (axis 0 to 3500) for each benchmark program. CINT2000: gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf. CFP2000: wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi.]
Two Benchmark Results
Baseline: A uniform configuration not optimized for specific program:
  Same compiler with same settings and flags used for all benchmarks
  Other restrictions
Peak: Run is optimized for obtaining the peak performance for each benchmark program.
Energy SPEC Benchmarks
Energy efficiency mode: Besides the execution time, energy efficiency of SPEC benchmark programs is also measured. Energy efficiency of a benchmark program is given by:
\[ \text{Energy efficiency} = \frac{1/(\text{Execution time})}{\text{joules consumed}} \]
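As a small worked instance of the definition above, the sketch below evaluates it for one run. The function name and the sample numbers (10 s, 50 J) are assumptions made for illustration, not measured data.

```python
def energy_efficiency(execution_time_s, energy_j):
    """(1 / execution time) divided by the joules consumed."""
    if execution_time_s <= 0 or energy_j <= 0:
        raise ValueError("time and energy must be positive")
    return (1.0 / execution_time_s) / energy_j

# Hypothetical run: 10 seconds, 50 joules.
example = energy_efficiency(10.0, 50.0)
print(example)
```

Algebraically this is 1/(time x energy), so halving either the run time or the energy doubles the efficiency figure.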
Energy Efficiency
Efficiency averaged on n benchmark programs:
Voltage Scaling
Dynamic: Reduce voltage and frequency during idle or low activity periods.
Static: Clustered voltage scaling
  Logic on non-critical paths given lower voltage.
  47% power reduction with 10% area increase reported.
  M. Igarashi et al., “Clustered Voltage Scaling Techniques for Low-Power Design,” Proc. IEEE Symp. Low Power Design, 1997.
Incorrect branch prediction results in pipeline stalls and wasted energy.
Idea: Stop fetching instructions if a branch hazard is expected:
  If the count (M) of incorrect predictions exceeds a pre-specified number (N), then suspend fetching instructions for some k cycles.
Ref.: S. Manne, A. Klauser and D. Grunwald, “Pipeline Gating: Speculation Control for Energy Reduction,” Proc. 25th Annual International Symp. Computer Architecture, June 1998.
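The gating rule stated above can be sketched as a cycle-by-cycle model. This is only an illustration of the M, N, k rule as worded on the slide, not the mechanism of the cited paper: the class name `FetchGate` is hypothetical, and resetting M to zero when the gate fires and ignoring events during the stall are assumptions the slide does not specify.

```python
class FetchGate:
    """Suspend fetch for k cycles once the misprediction count M exceeds N."""

    def __init__(self, n_threshold, k_stall_cycles):
        self.n = n_threshold
        self.k = k_stall_cycles
        self.m = 0            # running count of incorrect predictions (M)
        self.stall_left = 0   # remaining cycles with fetch suspended

    def cycle(self, mispredicted):
        """Advance one cycle; return True if fetch is allowed in this cycle."""
        if self.stall_left > 0:
            self.stall_left -= 1
            return False
        if mispredicted:
            self.m += 1
        if self.m > self.n:
            # Assumption: the counter restarts after gating is triggered.
            self.m = 0
            self.stall_left = self.k
        return True

gate = FetchGate(n_threshold=2, k_stall_cycles=3)
events = [True, True, True, False, False, False, False, True]
fetch_trace = [gate.cycle(e) for e in events]
print(fetch_trace)
```

In this trace the third misprediction pushes M past N = 2, so fetch is suspended for the next three cycles and then resumes.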
An instruction is executed as soon as data and resources it needs become available.
A commit unit reorders the results.
Delay the execution of instructions whose result is not immediately needed.
Example of RISC instructions:
  add r0, r1, r2;    (A)
  sub r3, r4, r5;    (B)
  and r9, x1, r9;    (C)
  or  r5, r9, r10;   (D)
  xor r2, r10, r11;  (E)
J. Casmira and D. Grunwald, “Dynamic Instruction Scheduling Slack,” Proc. ACM Kool Chips Workshop, Dec. 2000.
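One way to see which of the five instructions can be delayed is to compute, for each, the earliest and latest cycle in which it may execute. The sketch below does this under several assumptions that are not stated on the slide: the last operand of each instruction is its destination, every instruction takes one cycle, execution units are unlimited, and only true (read-after-write) dependences matter.

```python
# (label, source registers, destination register)
# Assumption: the LAST operand is the destination.
program = [
    ("A", ("r0", "r1"), "r2"),
    ("B", ("r3", "r4"), "r5"),
    ("C", ("r9", "x1"), "r9"),
    ("D", ("r5", "r9"), "r10"),
    ("E", ("r2", "r10"), "r11"),
]

def dependences(prog):
    """Map each instruction to the earlier instructions whose results it reads."""
    last_writer, deps = {}, {}
    for label, sources, dest in prog:
        deps[label] = {last_writer[r] for r in sources if r in last_writer}
        last_writer[dest] = label
    return deps

def slack(prog):
    """Slack = latest start cycle minus earliest start cycle (unit latency)."""
    deps = dependences(prog)
    order = [label for label, _, _ in prog]
    earliest = {}
    for label in order:
        earliest[label] = max((earliest[d] + 1 for d in deps[label]), default=0)
    last_cycle = max(earliest.values())
    latest = {label: last_cycle for label in order}
    for label in reversed(order):          # consumers before producers
        for d in deps[label]:
            latest[d] = min(latest[d], latest[label] - 1)
    return {label: latest[label] - earliest[label] for label in order}

result = slack(program)
print(result)
```

Under these assumptions only A has slack: its result is first read by E, which cannot start before D finishes, so A could run one cycle later without lengthening the schedule.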
Slack Scheduling Example
[Figure: execution order of instructions A through E under slack scheduling and under standard scheduling]
Slack Scheduling
[Block diagram; labels: slack bit, low-power execution units, re-order buffer, scheduling logic]
Clock Distribution
[Figure: clock distribution network; label: clock]
Clock Power
\[ P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n} \]
where C_L = total load capacitance
      λ = constant fanout at each stage in distribution network
Clock consumes about 40% of total processor power.
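The series for clock power is easy to evaluate numerically. The sketch below is a direct transcription of the sum; the function name and the sample values (1 nF load, 2 V, 100 MHz, fanout 2, three stages) are assumptions chosen only to make the arithmetic easy to follow.

```python
def clock_power(c_load_f, v_dd, freq_hz, fanout, stages):
    """P_clk = C_L * V_DD^2 * f * sum_{n=0}^{stages-1} 1 / fanout^n."""
    base = c_load_f * v_dd ** 2 * freq_hz       # power of the final (load) stage
    return base * sum(1.0 / fanout ** n for n in range(stages))

# Hypothetical values: 1 nF, 2 V, 100 MHz, fanout 2, 3 stages.
p_clk = clock_power(1e-9, 2.0, 100e6, fanout=2, stages=3)
print(round(p_clk, 3))  # 0.4 W * (1 + 1/2 + 1/4)
```

Because the terms form a geometric series, the buffer stages add a bounded overhead: for a fanout above 1 the sum never exceeds λ/(λ - 1), so the load term dominates when the fanout is large.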
D. W. Bailey and B. J. Benschneider, “Clocking Design and Analysis for a 600-MHz Alpha Microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627-1633, Nov. 1998.
Power Reduction Example
Alpha 21064: 200MHz @ 3.45V, power dissipation = 26W
Reduce voltage to 1.5V, power (5.3x) = 4.9W
Eliminate FP, power (3x) = 1.6W
Scale 0.75→0.35μ, power (2x) = 0.8W
Reduce clock load, power (1.3x) = 0.6W
Reduce frequency 200→160MHz, power (1.25x) = 0.5W
J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
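The chain of reductions can be checked by dividing the starting power by each quoted factor in turn. The sketch below uses only the numbers listed in the example; the one assumption is that dynamic power scales with the square of the supply voltage, which is what makes the 3.45 V to 1.5 V step worth about 5.3x.

```python
start_power_w = 26.0
steps = [
    ("Reduce voltage to 1.5V", 5.3),
    ("Eliminate FP", 3.0),
    ("Scale 0.75 to 0.35 micron", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 to 160MHz", 1.25),
]

power_w = start_power_w
trace = []
for name, factor in steps:
    power_w /= factor                      # each step divides the running power
    trace.append((name, round(power_w, 1)))

# Quadratic voltage dependence and linear frequency dependence.
voltage_factor = (3.45 / 1.5) ** 2
frequency_factor = 200 / 160

for name, watts in trace:
    print(f"{name}: {watts} W")
print(round(voltage_factor, 1), frequency_factor)
```

The rounded running totals reproduce the listed 4.9, 1.6, 0.8, 0.6 and 0.5 W, and the voltage and frequency factors come out at about 5.3 and exactly 1.25.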
Approximate Trend
               n-parallel proc.    n-stage pipeline proc.
Capacitance    nC                  C
Voltage        V/n                 V/n
Frequency      f/n                 f
Power          CV²f/n²             CV²f/n²
Chip area      n times             10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
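The power row of the table follows from the other three rows if dynamic power is taken as P = CV²f. The sketch below checks this with exact fractions; the normalization C = V = f = 1 and the choice n = 4 are arbitrary assumptions for illustration.

```python
from fractions import Fraction

def dynamic_power(c, v, f):
    """Dynamic power model P = C * V^2 * f."""
    return c * v ** 2 * f

n = 4
C = V = F = Fraction(1)                          # normalized single-processor values

base = dynamic_power(C, V, F)
parallel = dynamic_power(n * C, V / n, F / n)    # n copies, each at V/n and f/n
pipelined = dynamic_power(C, V / n, F)           # same C and f, supply at V/n

print(base, parallel, pipelined)
```

Both organizations land on CV²f/n², so in this approximation the difference between them is the area cost: n times for the parallel design against a 10-20% increase for the pipeline.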
For More on Microprocessors
T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer, 2002.
R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.