A Multi-Core Approach to Addressing the Energy-Complexity Problem in Microprocessors Rakesh Kumar Keith Farkas (HP Labs) Norman Jouppi (HP Labs) Partha Ranganathan (HP Labs) Dean Tullsen (UCSD)

Dec 15, 2015



A Multi-Core Approach to Addressing the Energy-Complexity Problem in Microprocessors

Rakesh Kumar

Keith Farkas (HP Labs), Norman Jouppi (HP Labs), Partha Ranganathan (HP Labs), Dean Tullsen (UCSD)


Motivation

Power is an important issue for processors

Rising with every successive generation (along with complexity)

-Up to 150W for Alpha 21464!


Past Techniques for Power Reduction

Voltage/frequency scaling

-Limitation: limited by technology; also not possible below a certain feature size

Architectural adaptation

-Shut off portions of the core when not needed
-Dynamic speculation control
-Reconfigurable caches

Limitations:

-Very few choices to make
-Only dynamic power being saved
-Has associated overhead


Single-ISA Heterogeneous Multi-Core Architectures


Our Proposal

Have multiple heterogeneous cores on the same die

Match workload (or workload phase) to core that achieves best efficiency according to some objective function

(Ensure that the new core has acceptable performance)

Power down the unused cores
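
The proposal leaves the objective function open, so the "best" core depends on which objective is chosen. A minimal sketch of that idea follows. The core names, the IPS and power numbers, and the function names are all hypothetical, and measuring "acceptable performance" as a fraction of the fastest core's IPS is an assumption of this sketch, not something the slides specify.

```python
from typing import Callable, Dict, NamedTuple


class CoreSample(NamedTuple):
    ips: float    # instructions per second achieved on this core
    watts: float  # power drawn by this core


# Two candidate objective functions (higher is better).
def performance(s: CoreSample) -> float:
    return s.ips


def energy_efficiency(s: CoreSample) -> float:
    return s.ips / s.watts


def choose_core(samples: Dict[str, CoreSample],
                objective: Callable[[CoreSample], float],
                max_perf_loss: float = 1.0) -> str:
    """Pick the core that maximises `objective` among the cores whose IPS is
    at least (1 - max_perf_loss) of the fastest core's IPS (assumed rule).
    Every core other than the returned one would then be powered down."""
    fastest = max(s.ips for s in samples.values())
    acceptable = {name: s for name, s in samples.items()
                  if s.ips >= (1.0 - max_perf_loss) * fastest}
    return max(acceptable, key=lambda name: objective(acceptable[name]))


# Hypothetical measurements for one workload phase (not from the talk).
phase = {
    "small": CoreSample(ips=0.4e9, watts=0.5),
    "medium": CoreSample(ips=1.0e9, watts=10.0),
    "large": CoreSample(ips=1.5e9, watts=90.0),
}

print(choose_core(phase, performance))                           # large
print(choose_core(phase, energy_efficiency))                     # small
print(choose_core(phase, energy_efficiency, max_perf_loss=0.5))  # medium
```

With these made-up numbers, the same workload phase maps to three different cores depending on the objective and on how much performance loss is tolerated.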


Motivation Hypotheses

Performance difference between cores varies based on workload or workload phases

Different cores have varying relative energy efficiencies for the same workload

Implication: possibility of dynamically changing “best” core


Goals of the Paper

Validate the hypotheses

Get an idea of the design space

Get an idea of the potential benefits


Outline of Talk

Motivation

Past Work

Our Work

-Assumptions
-Decisions
-Methodology

Results and Conclusions

Summary and Future Work


Choice of Cores on the Die

Five cores on the die:

-In-order: QED R4700, EV4 (Alpha 21064), EV5 (Alpha 21164)
-Out-of-order: EV6 (Alpha 21264), "EV8-"

All cores assumed to be without L2-cache.

"EV8-": Issue width is the same as EV8 (Alpha 21464)

-Resources reduced to account for a single thread
-Core-power dissipation: 100W


Properties of the Cores

Processor      R4700        EV4         EV5         EV6              EV8-
Issue-width    1            2           4           6 (OOO)          8 (OOO)
I-Cache        2-way 16KB   DM, 8KB     DM, 8KB     2-way 64KB       4-way 64KB
D-Cache        2-way 16KB   DM, 8KB     DM, 8KB     2-way 64KB       4-way 64KB
Branch Pred.   No           2KB/1-bit   2K-gshare   Hybrid 2-level   Hybrid 2-level
MSHR           1            2           4           8                16

Notice the gradation!


Properties of Cores (contd.)

Assume all cores implemented in 0.1um

-Scaled area and power accordingly

Clock Speed?

-All Alpha cores assumed to run at 2.1GHz (EV6 frequency at 0.10 micron)

-R4700 assumed to run at 1GHz


Core Power and Area

Peak power of core estimated from data sheets

-minus that used by L2 caches and pins
-then scaled for 0.1um process

Area of core estimated from die photos

-minus that of I/O pads, wires, L2 cache & control
-then scaled for 0.1um process

L2 cache area and power

-estimated using CACTI


Core Power and Area (contd.)

Processor   Core-power (W)   Core-area (mm^2)
R4700       0.45             3
EV4         4.97             3
EV5         9.83             5
EV6         17.80            24
EV8-        92.88            260

EV8- consumes more than 200 times the power of R4700! It is more than 85 times bigger too!
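
The ratios quoted on this slide can be checked directly against the table. The short sketch below uses only the table's values; the dictionary and function names are ours.

```python
# Core power (W) and core area (mm^2), copied from the table on this slide.
CORES = {
    "R4700": (0.45, 3),
    "EV4": (4.97, 3),
    "EV5": (9.83, 5),
    "EV6": (17.80, 24),
    "EV8-": (92.88, 260),
}


def ratio(big: str = "EV8-", small: str = "R4700"):
    """Return (power ratio, area ratio) of core `big` relative to core `small`."""
    big_power, big_area = CORES[big]
    small_power, small_area = CORES[small]
    return big_power / small_power, big_area / small_area


power_ratio, area_ratio = ratio()
print(f"power: {power_ratio:.1f}x, area: {area_ratio:.1f}x")
```

This gives roughly 206x the power and roughly 87x the area, in line with the slide's rounded figures.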


Core Power and Area (contd.)


Methodology

Simulator used: SMTSIM

ROB, active-list and load-store queue sizes always kept big enough to ensure no conflicts.

Benchmarks used: 14 chosen randomly out of SPEC2000 suite

Fast-forwarded for 2 billion instructions, simulated for 1 billion instructions.

Data collected after every 1 million instructions.


Validating Hypotheses

Performance difference between cores varies based on workload or workload phases (IPS)

Different cores have varying relative energy efficiencies for the same workload (IPS/W)


Performance Variation with Time

[Figure: IPS vs. committed instructions (in millions); one curve each for EV8-, EV6, EV5, EV4, R4700]

Ah! Those clear, distinct phases!


Variation of Energy Efficiency with Time

[Figure: IPS/W vs. committed instructions (in millions); one curve each for R4700, EV4, EV5, EV6, EV8-]

Power dominates IPS/W numbers!


How does a composite objective function fare?


Energy-delay Product Profile

[Figure: IPS^2/W vs. committed instructions (in millions); one curve each for R4700, EV4, EV5, EV6, EV8-]
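
The vertical axis of this profile is IPS^2/W, not the energy-delay product itself. A short derivation shows how the two relate, assuming energy and delay are both measured per instruction. The symbols E, D and P (power in watts) are our notation, not the slides'.

```latex
\[
E = \frac{P}{\mathrm{IPS}}, \qquad
D = \frac{1}{\mathrm{IPS}}, \qquad
E \cdot D = \frac{P}{\mathrm{IPS}^{2}}
\quad\Longrightarrow\quad
\frac{\mathrm{IPS}^{2}}{P} = \frac{1}{E \cdot D}
\]
```

Under that assumption, a higher IPS^2/W value means a lower (better) energy-delay product.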


So why not run on the “best” core at all points in time?


Choosing Dynamically the Core with Best Energy-Delay Product (perf. loss<50%)

[Figure: IPS^2/W vs. committed instructions (in millions); one curve each for R4700, EV4, EV5, EV6, EV8-, plus the Best-path curve]

Notice the regions where best-path is not along the best energy-delay product!
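
The selection rule in this slide's title can be sketched as a per-interval oracle. This is a rough illustration under stated assumptions, not the authors' implementation. It assumes performance loss is measured in each interval relative to a reference core (EV8- here), it ignores switching cost, and it uses made-up IPS traces (in billions of instructions per second). Only the core-power values come from the talk's core-power table.

```python
from typing import Dict, List, Tuple


def best_path(ips: Dict[str, List[float]],
              watts: Dict[str, float],
              reference: str,
              max_perf_loss: float = 0.5) -> Tuple[List[str], int]:
    """For each interval, pick the core with the highest IPS^2/W among cores
    whose IPS exceeds (1 - max_perf_loss) of the reference core's IPS in that
    interval. The reference core itself is always eligible.
    Returns the chosen core per interval and the number of core switches."""
    path: List[str] = []
    for t in range(len(ips[reference])):
        floor = (1.0 - max_perf_loss) * ips[reference][t]
        eligible = [c for c in ips if c == reference or ips[c][t] > floor]
        path.append(max(eligible, key=lambda c: ips[c][t] ** 2 / watts[c]))
    switches = sum(1 for a, b in zip(path, path[1:]) if a != b)
    return path, switches


# Core power in W, taken from the core-power table in the talk.
WATTS = {"EV5": 9.83, "EV6": 17.80, "EV8-": 92.88}

# Hypothetical IPS traces over four sampling intervals (not measured data).
IPS = {
    "EV8-": [2.0, 1.6, 0.5, 1.8],
    "EV6": [1.2, 1.2, 0.45, 1.5],
    "EV5": [0.95, 1.0, 0.4, 0.6],
}

path, switches = best_path(IPS, WATTS, reference="EV8-")
print(path, switches)  # ['EV6', 'EV5', 'EV5', 'EV6'] 2
```

In the first interval of this toy trace, EV5 has the highest IPS^2/W but falls below the performance floor, so the path runs on EV6 instead. That is the kind of region the slide points out.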


Choosing Dynamically the Core with Best Energy-Delay Product (perf. loss<50%) [Summary of Results]

          Energy-Delay Savings (%)   Performance Degradation (%)
Maximum   97.9                       8.5
Minimum   0.1                        0.1
Mean      65.4                       18.2

Number of switchings: Maximum = 387 (art), Minimum = 0, Median = 1


Dissecting the Results

More improvements possible:

-locally-best decisions not necessarily globally-best
-there was a performance constraint
-choice of cores not the best for this objective function
-cache configurations not necessarily the best

Even with the present improvements, beats voltage scaling handsomely (44.2% ED^2 improvement)
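
One common way to read the ED^2 comparison is sketched below. It assumes dynamic power dominates, that it scales as P ∝ C V^2 f, and that frequency scales linearly with supply voltage (f ∝ V). These are textbook idealisations, and the slides do not spell out this reasoning. E and D are energy and delay per instruction.

```latex
\[
E \propto V^{2}, \qquad
D \propto \frac{1}{f} \propto \frac{1}{V}
\quad\Longrightarrow\quad
E \cdot D \propto V, \qquad
E \cdot D^{2} \propto V^{2} \cdot \frac{1}{V^{2}} = \text{const.}
\]
```

Under this idealised model, voltage scaling alone trades energy for delay without changing ED^2, so any ED^2 improvement is a gain that voltage scaling by itself would not provide.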


Conclusion

Enormous potential for power savings

No leakage-power solution

Allows considerable IP reuse

Complexity-appropriate: every application is matched to a core of the “appropriate” complexity


Tip of the iceberg? Current/Future Work

Cores can be non-ordered

Some cores can be multithreaded

Throughput impact of the architecture


Questions?