Multicores, Manycores and Amdahl’s Law

Feb 24, 2016

Beck

Transcript
Page 1: Multicores,  Manycores  and Amdahl’s Law

Multicores, Manycores and Amdahl’s Law

2012

Page 2: Multicores,  Manycores  and Amdahl’s Law

Amdahl’s Law – Reminder

• Original Amdahl’s Law for n identical cores
  – f – fraction of parallelizable execution time
  – (1-f) – fraction of totally sequential execution time

• Sequential runs on a single core
• Parallel runs on all n cores
• Q: What are the hidden assumptions?

$$\text{speedup} = \frac{1}{(1-f) + \frac{f}{n}}$$
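Amdahl’s Law as stated on this slide, speedup = 1 / ((1 - f) + f/n), can be checked numerically. The following is a minimal Python sketch; the function name `amdahl_speedup` is illustrative and does not come from the slides.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup on n identical cores when a fraction f of the
    execution time is parallelizable and (1 - f) is sequential."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Sequential part runs on one core, parallel part on all n cores.
    return 1.0 / ((1.0 - f) + f / n)


# Even with many cores, the speedup stays below 1 / (1 - f).
for cores in (1, 16, 256):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```

For f = 0.9 the printed values approach, but never reach, the bound 1 / (1 - f) = 10.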

Page 3: Multicores,  Manycores  and Amdahl’s Law

Multicore CPU

Intel’s Sandy Bridge

• Manycore – Tens or hundreds of cores
• Why don’t we have Sandy Bridge with 100 cores?

Page 4: Multicores,  Manycores  and Amdahl’s Law

Core Performance Constraints

• Manufacturing technology

• Area (for more logic)
  – Area = Money; Manufacturing constraints

• Power (for more logic, higher frequencies)
  – Sub-threshold leakage current
  – More power requires better cooling solutions

Page 5: Multicores,  Manycores  and Amdahl’s Law

So Why Not One Single Core?

[Figure: a single core]

Page 6: Multicores,  Manycores  and Amdahl’s Law

Large Core Performance

• We have a baseline core (BCE) with area=1, performance=1

• We can add microarchitectural features
  – New core area is then r (r>1)
  – Large core is faster, with performance of perf(r)

• Q: For which perf(r) function is a large core better than multiple small ones?

• So what is perf(r)?

[Figure: a BCE (e.g., a simple in-order core) next to a large core with big data caches, OOOE, accurate branch prediction and a uOp cache]

Page 7: Multicores,  Manycores  and Amdahl’s Law

Area: Pollack’s Rule

• An empirical rule: $\mathrm{perf}(r) \sim \sqrt{r}$

• Multicore implications. For example: double the CPU logic and get
  – 40% more performance with a larger single core
  – For purely parallel code – 100% more performance with dual-core
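The doubling example on this slide follows directly from Pollack’s Rule, perf(r) ~ sqrt(r). A small Python sketch of the arithmetic, where `perf` is an illustrative helper and the proportionality constant is assumed to be 1 (a BCE has performance 1):

```python
import math


def perf(r: float) -> float:
    """Pollack's Rule with the constant taken as 1: a core built
    from r BCEs of area has performance sqrt(r)."""
    return math.sqrt(r)


# Double the logic and spend it on one larger core ...
single_big_core = perf(2)
# ... or on two baseline cores running purely parallel code.
dual_core_parallel = 2 * perf(1)

print(f"larger single core: {single_big_core - 1:.0%} more performance")
print(f"dual-core, parallel code: {dual_core_parallel - 1:.0%} more performance")
```

The single-core gain is sqrt(2) - 1, which the slide rounds to 40%.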

Page 8: Multicores,  Manycores  and Amdahl’s Law

Power

• Power is usually considered proportional to area
• In this presentation we consider area as the main constraint
• Not completely true [Esmaeilzadeh’11]

• For simplicity we stay with $\mathrm{perf}(r) \sim \sqrt{r}$

Page 9: Multicores,  Manycores  and Amdahl’s Law

Why Multicore/Manycore?

• More performance per mm² & watt for parallel code

• Less power (& heat)
  – Save power by turning each CPU on and off
  – Run each core at an optimized frequency/power
  – Load balance to distribute heat
  – Lower die temperatures

• New performance constraint: parallel fraction

Page 10: Multicores,  Manycores  and Amdahl’s Law

Cost Model

• To find the best performing CPU configuration we need a cost model

• Basic core - Baseline Core Equivalent (BCE)
• Chip is limited to have no more than n BCEs
• Performance
  – Performance of each BCE is 1
  – Architects can expand the resources of r BCEs to create a powerful core with performance of perf(r)
• f – fraction of the parallelizable execution time

Page 11: Multicores,  Manycores  and Amdahl’s Law

Symmetric Multicore Chips

[Figure: two symmetric chips with n=16. One with r=1: 16 1-BCE cores. One with r=4: 4 4-BCE cores]

• Run the sequential part on one core
• Run the parallel part on all cores

Page 12: Multicores,  Manycores  and Amdahl’s Law

Symmetric Multicore Chips

• n/r identical cores
• Each core performance perf(r)
• Execution
  – Sequential part – 1 core; performance - perf(r)
  – Parallel part – all cores; performance - perf(r) * n/r
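Plugging the two execution rates of the symmetric chip into Amdahl’s Law gives speedup = 1 / ((1-f)/perf(r) + f·r/(perf(r)·n)). A minimal Python sketch, assuming perf(r) = sqrt(r) (the Pollack’s Rule simplification used in this presentation); the function names are illustrative:

```python
import math


def perf(r: float) -> float:
    """Assumed core performance model: perf(r) = sqrt(r)."""
    return math.sqrt(r)


def symmetric_speedup(f: float, n: int, r: float) -> float:
    """Speedup of a chip with n BCEs split into n/r identical r-BCE cores."""
    sequential_time = (1.0 - f) / perf(r)        # one core at perf(r)
    parallel_time = f / (perf(r) * n / r)        # all n/r cores together
    return 1.0 / (sequential_time + parallel_time)


# n=16 BCEs as eight 2-BCE cores, with 90% parallelizable work.
print(round(symmetric_speedup(0.9, 16, 2), 1))
```

With r=1 the expression reduces to the original Amdahl’s Law for n identical baseline cores.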

Page 13: Multicores,  Manycores  and Amdahl’s Law

Symmetric, n=16

[Figure: symmetric speedup (y-axis, 2 to 16) vs. R BCEs per core (x-axis) for n=16, with curves for F=0.999, 0.99, 0.975, 0.9 and 0.5. Marked point: F=0.9, R=2, Cores=8, Speedup=6.7]

As Moore’s Law enables N to go from 16 to 256 BCEs: More core enhancements? More cores? Or both?

Page 14: Multicores,  Manycores  and Amdahl’s Law

Symmetric, n=256

[Figure: symmetric speedup (y-axis, 50 to 250) vs. R BCEs per core (x-axis) for n=256, with curves for F=0.999, 0.99, 0.975, 0.9 and 0.5]

• F=0.9: R=28 (vs. 2), Cores=9 (vs. 8), Speedup=26.7 (vs. 6.7). CORE ENHANCEMENTS!
• F→1: R=1 (vs. 1), Cores=256 (vs. 16), Speedup=204 (vs. 16). MORE CORES!
• F=0.99: R=3 (vs. 1), Cores=85 (vs. 16), Speedup=80 (vs. 13.9). CORE ENHANCEMENTS & MORE CORES!
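The marked points for the symmetric chip can be reproduced by sweeping r and keeping the best speedup. This is a sketch under stated assumptions: perf(r) = sqrt(r), only integer r from 1 to n is tried, and the core count n/r is allowed to be fractional (the slides appear to round it). The helper `best_symmetric` is illustrative.

```python
import math


def symmetric_speedup(f: float, n: int, r: int) -> float:
    """Symmetric multicore speedup with perf(r) = sqrt(r) (assumed model)."""
    p = math.sqrt(r)
    return 1.0 / ((1.0 - f) / p + f * r / (p * n))


def best_symmetric(f: float, n: int) -> tuple[int, float]:
    """Return (r, speedup) with the highest speedup over integer r in 1..n."""
    candidates = ((r, symmetric_speedup(f, n, r)) for r in range(1, n + 1))
    return max(candidates, key=lambda pair: pair[1])


for n in (16, 256):
    for f in (0.5, 0.9, 0.975, 0.99, 0.999):
        r, s = best_symmetric(f, n)
        print(f"n={n:3d} F={f}: R={r}, cores={n / r:.1f}, speedup={s:.1f}")
```

The sweep shows the trend the slide points out: as n grows from 16 to 256, moderately parallel code (F=0.9) favors much larger cores, while highly parallel code favors more cores.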

Page 15: Multicores,  Manycores  and Amdahl’s Law

Symmetric Multicores

• In symmetric multicores with fixed n, perf(r)=sqrt(r), maximum performance is achieved when:

$$r_{opt} = \frac{(1-f)\,n}{f}$$

• Q1: When will a single core perform better than any symmetric multicore?

• Q2: In the optimal configuration, what are the proportions of the execution time between the sequential and parallel parts?
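One way to see where the optimum r_opt = (1-f)n/f comes from is to minimize the normalized execution time of the symmetric chip. The derivation below is a sketch that assumes perf(r) = sqrt(r) and treats r as a continuous variable (real chips need whole cores); T(r) is notation introduced here, not taken from the slides.

```latex
\begin{align*}
T(r) &= \frac{1-f}{\sqrt{r}} + \frac{f\,\sqrt{r}}{n},
\qquad \text{speedup}(r) = \frac{1}{T(r)} \\
\frac{dT}{dr} &= -\frac{1-f}{2\,r^{3/2}} + \frac{f}{2\,n\,\sqrt{r}} = 0
\;\Longrightarrow\;
r_{opt} = \frac{(1-f)\,n}{f}
\end{align*}
```

Substituting r_opt back into T(r) makes the sequential and the parallel term both equal to sqrt(f(1-f)/n), which is a useful hint for Q2.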

Page 16: Multicores,  Manycores  and Amdahl’s Law

Asymmetric Multicore Chips

[Figure: asymmetric chip with one 4-BCE core and twelve 1-BCE cores]

• Run the sequential part on the big core
• Run the parallel part on all cores

Page 17: Multicores,  Manycores  and Amdahl’s Law

Asymmetric Multicore Chips

• One large r-BCE core with performance of perf(r)
• n-r small 1-BCE cores with performance of 1
• Execution:
  – Sequential part – 1 core; performance - perf(r)
  – Parallel part – all cores; performance - perf(r) + n - r
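For the asymmetric chip, the same substitution into Amdahl’s Law gives speedup = 1 / ((1-f)/perf(r) + f/(perf(r) + n - r)). A minimal Python sketch, again assuming perf(r) = sqrt(r); the function name is illustrative:

```python
import math


def asymmetric_speedup(f: float, n: int, r: float) -> float:
    """Speedup of one r-BCE core plus (n - r) 1-BCE cores,
    assuming perf(r) = sqrt(r)."""
    p = math.sqrt(r)
    sequential_time = (1.0 - f) / p          # big core only
    parallel_time = f / (p + (n - r))        # big core plus all small cores
    return 1.0 / (sequential_time + parallel_time)


# n=256 BCEs, one 41-BCE core and 215 baseline cores, F=0.99.
print(round(asymmetric_speedup(0.99, 256, 41)))
```

With r=1 the chip is just n baseline cores, so the result matches the original Amdahl’s Law.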

Page 18: Multicores,  Manycores  and Amdahl’s Law

Asymmetric, n=256

• Is the potential of the asymmetric architecture greater than that of the symmetric one?

[Figure: asymmetric speedup (y-axis, 50 to 250) vs. R BCEs in the large core (x-axis) for n=256, with curves for F=0.999, 0.99, 0.975, 0.9 and 0.5]

• Recall F=0.99: R=41, Cores=216, Speedup=166

Page 19: Multicores,  Manycores  and Amdahl’s Law

Dynamic (Composed) Multicore Chips

• Combine up to r cores to boost sequential performance
  – Helper threads
  – Thread Level Speculation
  – Hardware support may be required

• Q: Why “up to r cores”?

Page 20: Multicores,  Manycores  and Amdahl’s Law

Dynamic (Composed) Multicore Chips

• Execution:
  – Sequential part – 1 big core; performance - perf(r)
  – Parallel part – all cores; performance – n
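For the dynamic chip, the sequential part runs at perf(r) and the parallel part at n, so speedup = 1 / ((1-f)/perf(r) + f/n). A minimal Python sketch, assuming perf(r) = sqrt(r); the function name is illustrative:

```python
import math


def dynamic_speedup(f: float, n: int, r: float) -> float:
    """Speedup of a chip that fuses r BCEs into one big core for
    sequential code and uses all n BCEs as separate cores for parallel
    code, assuming perf(r) = sqrt(r)."""
    sequential_time = (1.0 - f) / math.sqrt(r)
    parallel_time = f / n
    return 1.0 / (sequential_time + parallel_time)


# n=256 BCEs, all of them combined for the sequential part, F=0.99.
print(round(dynamic_speedup(0.99, 256, 256)))
```

Because the parallel term does not depend on r, a larger r can only help in this model.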

Page 21: Multicores,  Manycores  and Amdahl’s Law

Dynamic, n=256

• Q: How does the dynamic multicore scale relative to symmetric and asymmetric?

[Figure: dynamic speedup (y-axis, 50 to 250) vs. R BCEs (x-axis) for n=256, with curves for F=0.999, 0.99, 0.975, 0.9 and 0.5]

• F=0.99: R=256 (vs. 41), Cores=256 (vs. 216), Speedup=223 (vs. 166)
• Note: #Cores always N=256
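To compare the three organizations side by side, each model can be swept over r at the same F and n. The sketch below assumes perf(r) = sqrt(r) and integer r from 1 to n; the helper names (`best`, `MODELS`) are illustrative and not from the slides.

```python
import math


def perf(r: float) -> float:
    return math.sqrt(r)  # assumed Pollack's Rule model


def symmetric(f: float, n: int, r: int) -> float:
    return 1.0 / ((1.0 - f) / perf(r) + f * r / (perf(r) * n))


def asymmetric(f: float, n: int, r: int) -> float:
    return 1.0 / ((1.0 - f) / perf(r) + f / (perf(r) + n - r))


def dynamic(f: float, n: int, r: int) -> float:
    return 1.0 / ((1.0 - f) / perf(r) + f / n)


def best(model, f: float, n: int) -> tuple[int, float]:
    """Return (r, speedup) with the highest speedup over integer r in 1..n."""
    return max(((r, model(f, n, r)) for r in range(1, n + 1)),
               key=lambda pair: pair[1])


MODELS = {"symmetric": symmetric, "asymmetric": asymmetric, "dynamic": dynamic}
results = {name: best(model, 0.99, 256) for name, model in MODELS.items()}

for name, (r, speedup) in results.items():
    print(f"{name:10s} best R={r:3d} speedup={speedup:.0f}")
```

For F=0.99 and n=256 the ordering is symmetric < asymmetric < dynamic, in line with the numbers quoted on the slides.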

Page 22: Multicores,  Manycores  and Amdahl’s Law

Manufacturing Technology

• New manufacturing technology will not save us

Page 23: Multicores,  Manycores  and Amdahl’s Law

The Future…

Page 24: Multicores,  Manycores  and Amdahl’s Law

Summary

• Multicores and manycores are required due to the diminishing returns of large cores

• Amdahl’s Law allows us to predict the performance of various architectures

• Dynamic (composed) architecture is promising

• To take advantage of future CPUs, the parallel fraction of the code must be very high

• …and still we are going to have a problem

Page 25: Multicores,  Manycores  and Amdahl’s Law

References

• Amdahl’s Law in the Multicore Era [Hill’08]
• Thousand Core Chips: A Technology Perspective [Borkar’07]
• Dark Silicon and the End of Multicore Scaling [Esmaeilzadeh’11]
• Performance, Power Efficiency and Scalability of Asymmetric Cluster Chip Multiprocessors [Morad’05]