4/3/2016 BPOE 7 @ ASPLOS 2016 When to use 3D Die-Stacked Memory for Bandwidth-Constrained Big Data Workloads Jason Lowe-Power || Mark D. Hill || David A. Wood

Oct 07, 2020


When to use 3D Die-Stacked Memory for

Bandwidth-Constrained Big Data Workloads

Jason Lowe-Power || Mark D. Hill || David A. Wood


Low latency → Real-time

Big Data == Big Memory


Can we execute complex queries in 10 ms?

What’s the best performance for 100 kW?

What is the performance for a 16 TB system?


Best performance!

Lowest power!

Highest capacity!

Which is best? It depends


Dell PowerEdge R930

Big Memory Machines

Memory capacity 3 TB (3,072 GB)

Memory bandwidth 408 GB/s

Processors 64 cores
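The figures on this slide already imply how little of a big-memory machine can be touched within a tight latency budget. The following is a minimal sketch of that arithmetic, assuming the 408 GB/s figure is a sustained peak and that data is streamed sequentially; the function and variable names are illustrative and not from the talk.

```python
# Figures taken from the slide (Dell PowerEdge R930).
capacity_gb = 3072      # 3 TB of memory
bandwidth_gb_s = 408    # memory bandwidth in GB/s (assumed sustained)


def accessible_gb(bandwidth_gb_s: float, seconds: float) -> float:
    """Upper bound on the data that can be streamed from memory in `seconds`."""
    return bandwidth_gb_s * seconds


in_10_ms = accessible_gb(bandwidth_gb_s, 0.010)
full_scan_s = capacity_gb / bandwidth_gb_s

print(f"Accessible in 10 ms: {in_10_ms:.2f} GB "
      f"({100 * in_10_ms / capacity_gb:.2f}% of capacity)")
print(f"Time to read all of memory once: {full_scan_s:.1f} s")
```

Under these assumptions only about 4 GB of the 3 TB can be read in 10 ms, and a full scan takes several seconds.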


DRAM (per socket)

1 GB

Amount accessible per second

Amount accessible in 10 ms


Amount accessible per second

Amount accessible in 10 ms

CPU processing in 10 ms

GPU processing in 10 ms

Processing 2x–10x faster than data supply


3D Die-Stacking

DRAM (per socket) Amount accessible per second

Amount accessible in 10 ms

Data supply to data processing ≈1


Big-Memory Server

↑ Higher bandwidth

↑↑ Higher capacity

(compared to traditional)


Traditional Server

Die-Stacked Server


Model and Workload

Model results

Discussion


Evaluation


Option 1: Build the hardware

Option 2: Simulation

Option 3: Analytical Model!


Model Example

Provisioning: 10 ms response time

Data to read: 16,384 GB × 0.20 = 3,276.8 GB

Bandwidth: 3,276.8 GB ÷ 0.010 s = 327.680 TB/s

Chips needed: 327.680 TB/s ÷ 102 GB/s/chip = 3213 chips = 800 blades

For traditional server

Power: 458 kW

Capacity: 800 TB
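The provisioning steps on this slide can be reproduced in a few lines. This is a sketch under stated assumptions, not the authors' model: the 256 GB per-chip capacity is taken from the Systems backup slide, 1 TB is treated as 1024 GB, and all names are illustrative. Blade count and power are left out because the slide does not state the chips-per-blade or per-blade power it uses.

```python
import math

# Inputs from the slides (traditional server).
corpus_gb = 16_384          # 16 TB data corpus
fraction_accessed = 0.20    # each request touches 20% of the corpus
sla_s = 0.010               # 10 ms response time
chip_bw_gb_s = 102          # bandwidth per chip
chip_capacity_gb = 256      # capacity per chip (from the Systems slide)

data_gb = corpus_gb * fraction_accessed          # data to read per request
required_bw_gb_s = data_gb / sla_s               # bandwidth needed to meet the SLA
chips = math.ceil(required_bw_gb_s / chip_bw_gb_s)

# Memory that comes along with those chips, whether needed or not.
installed_tb = chips * chip_capacity_gb / 1024
overprovision = installed_tb / (corpus_gb / 1024)

print(f"Data to read:   {data_gb:,.1f} GB")
print(f"Bandwidth:      {required_bw_gb_s / 1000:,.2f} TB/s")
print(f"Chips needed:   {chips}")
print(f"Installed DRAM: {installed_tb:,.2f} TB ({overprovision:.1f}x the corpus)")
```

The installed capacity comes out slightly above 800 TB, roughly 50 times the 16 TB corpus, which is consistent with the rounded figure on the slide.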


Model details

From the paper

Online: research.cs.wisc.edu/multifacet/bpoe16_3d_bandwidth_model/


Workload Assumptions

▪ 16 TB data corpus

▪ Each request accesses 20% of data corpus (3.2 TB)

▪ One core can process 6 GB/s

▪ No communication between cores
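Because the workload assumes no communication between cores, the compute side can be sized by simple division. The sketch below illustrates that reading of the assumptions; the helper name `cores_needed` and the choice of SLA values are hypothetical, and 1 TB is treated as 1024 GB.

```python
import math

# Workload assumptions from the slide.
corpus_tb = 16
fraction_accessed = 0.20
core_rate_gb_s = 6          # one core can process 6 GB/s


def cores_needed(sla_s: float) -> int:
    """Cores required to process one request within `sla_s` seconds,
    assuming perfectly parallel work (no communication between cores)."""
    data_gb = corpus_tb * 1024 * fraction_accessed
    return math.ceil(data_gb / sla_s / core_rate_gb_s)


for sla_ms in (10, 100, 1000):
    print(f"{sla_ms:>5} ms SLA: {cores_needed(sla_ms / 1000):,} cores")
```

The core count scales inversely with the response-time target, so a tenfold tighter SLA needs about ten times as many cores.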


https://xkcd.com/1339/


Model and Workload

Model results

Discussion


Metrics

Performance: Response time (SLA)

Power: Major component of datacenter cost

Data capacity: Workload size


Goal: Design cluster to meet a service level agreement (SLA)

Performance Provisioning

500 ms

50 ms

50 ms

10 ms Get matches

50 ms Sort

100 ms Ads

. . .


Performance Provisioning: 10 ms SLA

Capacity

Power


Current systems require memory over-provisioning

50✕

213✕

1✕


Memory Over Provisioning


50% Wasted


Performance Provisioning: 10 ms SLA

Capacity

Power


Die-stacking: 2–5✕ less power


Performance Provisioning: Power for relaxed SLAs


Traditional needs less over-provisioned memory


Power Provisioning


10–20 kW

100 kW–1 MW

10–100 MW

Goal: Design cluster to not exceed some power constraint


Power Provisioning: 1 MW Power

Capacity

Die-stacking: Less capacity for power budget

Response time

Die-stacking: 3–5✕ faster


Data Capacity Provisioning


Search: Inverted Index

Graph: Friends lists

Database: Purchases

Goal: Design cluster capacity for workload


Data Capacity Provisioning: 16 TB Database

Power

Die-stacking: 25–50✕ more power

Response time

Die-stacking: 60–256✕ faster


Traditional Big Memory Die-Stacked

Performance

Power

Data capacity

2–5x less power for 10 ms SLA

Over provisioned memory

Best for SLA 60+ ms

2x faster with 50 kW

3–4x faster with 1 MW

3x memory capacity

2–50x less power

60–250x faster

Somewhere between


Model and Workload

Model results

Discussion


Model deficiencies

You chose the wrong number! See research.cs.wisc.edu/multifacet/bpoe16_3d_bandwidth_model/

Communication between cores

This makes 2048 die-stacked systems worse

How to move data between stacks?

Compute energy or data energy?

Cost?


In Memory Big Data Workloads

Which is best?

Today: It depends…

Tomorrow: Die-stacked?


Systems            Traditional  Big memory  Die-stacked

Bandwidth          102 GB/s     196 GB/s    256 GB/s

Capacity           256 GB       2 TB        8 GB

Blades (16 TB)     16           8           228

Cluster bandwidth  6.4 TB/s     1.5 TB/s    512 TB/s
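The cluster bandwidth figures for a 16 TB cluster follow from the bandwidth and capacity values on this slide. The sketch below assumes those two values are per chip (or per stack), that the cluster holds exactly enough chips to store 16 TB, and that 1 TB is 1024 GB; the dictionary layout and function name are illustrative only.

```python
import math

corpus_gb = 16 * 1024  # 16 TB data corpus

# (bandwidth in GB/s, capacity in GB), assumed to be per chip or stack.
systems = {
    "Traditional": (102, 256),
    "Big memory": (196, 2048),
    "Die-stacked": (256, 8),
}


def cluster_bandwidth(bw_gb_s: float, cap_gb: float) -> tuple[int, float]:
    """Return (chips needed to hold the corpus, aggregate bandwidth in TB/s)."""
    chips = math.ceil(corpus_gb / cap_gb)
    return chips, chips * bw_gb_s / 1024


for name, (bw, cap) in systems.items():
    chips, tb_s = cluster_bandwidth(bw, cap)
    print(f"{name:<12} {chips:>5} chips  {tb_s:8.2f} TB/s")
```

Under these assumptions the die-stacked cluster needs 2048 stacks, matching the "2048 die-stacked systems" mentioned under model deficiencies, and its aggregate bandwidth is far higher than either alternative because capacity per stack is so small.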


Power Breakdown

Compute power dominates die-stacked


Decreased Compute Power

10 ms SLA

100 kW Power

16 TB Capacity


Increased Memory Density

100 ms SLA

100 kW Power

16 TB Capacity
