Adaptive History-Based Memory Schedulers

Ibrahim Hur and Calvin Lin
IBM Austin
The University of Texas at Austin


Jan 08, 2016



Memory Bottleneck

Memory system performance is not increasing as fast as CPU performance

Latency: Use caches, prefetching, …

Bandwidth: Use parallelism inside memory system

[Figure: CPU and Memory performance over Time, 1980 to 2000, on a log-scale Performance axis from 1 to 10000]


How to Increase Memory Command Parallelism?

[Diagram: DRAM with Bank 0 to Bank 3, commands shown over time. The order Read Bank 0, Read Bank 0, Read Bank 1 causes a bank conflict; the better order is Read Bank 0, Read Bank 1, Read Bank 0]

As in instruction scheduling, we can reorder commands to obtain higher bandwidth
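To make the reordering argument concrete, here is a minimal Python sketch under a deliberately simplified timing model: reads issue in order, at most one per cycle, and a bank stays busy for a hypothetical `BANK_BUSY` cycles after each read. None of these numbers model real Power5 or DDR2 timing, and `finish_time` is an illustrative helper.

```python
# Toy timing model (hypothetical numbers, not Power5/DDR2 values):
# reads issue in order, at most one per cycle, and a bank stays
# busy for BANK_BUSY cycles after it starts a read.
BANK_BUSY = 4


def finish_time(order, bank_busy=BANK_BUSY):
    """Return the cycle at which the last read in `order` completes.

    `order` lists the target bank of each read, in issue order.
    """
    bank_free = {}  # bank -> first cycle at which it can start a new read
    issue = -1      # cycle at which the previous read was issued
    done = 0
    for bank in order:
        # Wait for the next issue slot and for the target bank to be free.
        issue = max(issue + 1, bank_free.get(bank, 0))
        bank_free[bank] = issue + bank_busy
        done = max(done, issue + bank_busy)
    return done


conflicting = [0, 0, 1]  # Read Bank 0, Read Bank 0, Read Bank 1
reordered = [0, 1, 0]    # Read Bank 0, Read Bank 1, Read Bank 0
print(finish_time(conflicting), finish_time(reordered))  # prints: 9 8
```

In this toy model the Bank 1 read no longer waits behind the conflicting Bank 0 read. The gap grows with more conflicts: `[0, 0, 1, 1]` finishes at cycle 13, versus cycle 9 for `[0, 1, 0, 1]`.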


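Inside the Memory System

[Diagram: inside the Memory Controller, commands from the caches enter the Read Queue (not FIFO) and the Write Queue (not FIFO); the arbiter moves them into the Memory Queue (FIFO), which feeds DRAM]

The arbiter schedules memory operations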
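The queue structure on this slide can be sketched as a small Python class. This is an assumption-laden toy, not the Power5 design: the class name `MemoryController`, the `arbiter` callback signature, the example `reads_first` policy, and the memory-queue capacity are all hypothetical.

```python
from collections import deque


class MemoryController:
    """Toy model of the slide: two reorder queues (not FIFO) feeding a
    FIFO memory queue through an arbiter."""

    def __init__(self, arbiter, memory_queue_size=4):
        self.read_queue = []         # not FIFO: the arbiter may take any entry
        self.write_queue = []        # not FIFO
        self.memory_queue = deque()  # FIFO toward DRAM
        # Hypothetical callback: (reads, writes) -> (queue, index) or None.
        self.arbiter = arbiter
        self.memory_queue_size = memory_queue_size  # hypothetical capacity

    def arbitrate(self):
        """Move at most one command from a reorder queue to the memory queue."""
        if len(self.memory_queue) >= self.memory_queue_size:
            return None  # memory queue full: nothing can be forwarded
        choice = self.arbiter(self.read_queue, self.write_queue)
        if choice is None:
            return None
        queue, index = choice
        command = queue.pop(index)
        self.memory_queue.append(command)
        return command

    def issue_to_dram(self):
        """DRAM accepts commands strictly in memory-queue (FIFO) order."""
        return self.memory_queue.popleft() if self.memory_queue else None


def reads_first(reads, writes):
    """Example arbiter (hypothetical): first pending read, else first write."""
    if reads:
        return reads, 0
    if writes:
        return writes, 0
    return None
```

The point of the split is that all scheduling freedom lives in `arbitrate`; once a command enters the memory queue, its order toward DRAM is fixed.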


Our Work

Study memory command scheduling in the context of the IBM Power5

Present new memory arbiters: 20% increased bandwidth at very little cost (0.04% increase in chip area)


Outline

The problem: characteristics of DRAM, previous scheduling methods

Our approach: history-based schedulers, adaptive history-based schedulers

Results

Conclusions


Understanding the Problem: Characteristics of DRAM

Multi-dimensional structure: banks, rows, and columns (the IBM Power5 also has ranks and ports)

Access time is not uniform: bank-to-bank conflicts, read-after-write to the same rank, write-after-read to a different port, …
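The non-uniform access times listed above can be captured by a pairwise cost function. The sketch below is a toy model: the `Command` fields mirror the dimensions named on the slide, but the penalty values are made-up placeholders, and the rules cover only the three conflict types listed.

```python
from dataclasses import dataclass

# Hypothetical penalties in cycles; real DRAM timing parameters differ.
BANK_CONFLICT = 6
READ_AFTER_WRITE_SAME_RANK = 4
WRITE_AFTER_READ_DIFF_PORT = 2


@dataclass(frozen=True)
class Command:
    kind: str  # "R" (read) or "W" (write)
    rank: int
    bank: int
    port: int


def conflict_delay(prev: Command, nxt: Command) -> int:
    """Extra cycles `nxt` waits if issued right after `prev` (toy model).

    Overlapping constraints are not added up; the largest penalty wins
    (an assumption made for simplicity).
    """
    delay = 0
    if prev.rank == nxt.rank and prev.bank == nxt.bank:
        delay = max(delay, BANK_CONFLICT)  # bank-to-bank conflict
    if prev.kind == "W" and nxt.kind == "R" and prev.rank == nxt.rank:
        delay = max(delay, READ_AFTER_WRITE_SAME_RANK)
    if prev.kind == "R" and nxt.kind == "W" and prev.port != nxt.port:
        delay = max(delay, WRITE_AFTER_READ_DIFF_PORT)
    return delay
```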


Previous Scheduling Approaches: FIFO Scheduling

[Diagram: commands flow from the caches through the Read Queue and Write Queue, the arbiter, and the Memory Queue (FIFO) to DRAM]
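A FIFO arbiter ignores DRAM state and forwards commands in arrival order. A minimal sketch, assuming each queue entry is an `(arrival_cycle, name)` tuple and each queue is a `collections.deque` kept in arrival order; `fifo_arbiter` is a hypothetical name.

```python
from collections import deque


def fifo_arbiter(read_q, write_q):
    """Pop and return the oldest command across both queues, or None.

    Entries are (arrival_cycle, name) tuples; each deque is in arrival
    order. The arbiter never looks at DRAM state, so conflicts are not
    avoided.
    """
    if not read_q and not write_q:
        return None
    if not write_q or (read_q and read_q[0][0] <= write_q[0][0]):
        return read_q.popleft()
    return write_q.popleft()


reads = deque([(0, "R0"), (3, "R1")])
writes = deque([(1, "W0")])
order = []
while (cmd := fifo_arbiter(reads, writes)) is not None:
    order.append(cmd[1])
print(order)  # ['R0', 'W0', 'R1']
```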


Memoryless Scheduling

[Diagram: caches, Read Queue, Write Queue, arbiter, Memory Queue (FIFO), DRAM; annotation: long delay]

Adapted from Rixner et al., ISCA 2000
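Here is a simplified reading of the memoryless policy: the arbiter forwards the oldest command that does not conflict with work still in progress, and otherwise holds everything in the reorder queues. This is an assumption-based sketch, not the exact algorithm of Rixner et al.; a "conflict" here means only "targets a busy bank", and `memoryless_arbiter` is a hypothetical name.

```python
def memoryless_arbiter(candidates, busy_banks):
    """Return the oldest candidate whose bank is idle, or None to wait.

    `candidates` holds (arrival_cycle, bank) tuples, oldest first.
    `busy_banks` is the set of banks still serving earlier commands.
    Returning None leaves every command in the reorder queues, which is
    one way the long delay on the slide can arise.
    """
    for command in candidates:
        if command[1] not in busy_banks:
            return command
    return None


pending = [(0, 0), (1, 1), (2, 0)]
print(memoryless_arbiter(pending, busy_banks={0}))     # (1, 1)
print(memoryless_arbiter(pending, busy_banks={0, 1}))  # None
```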


What we really want

Keep the pipeline full; don’t hold commands in the reorder queues until conflicts are totally resolved

Forward them to the memory queue in an order that minimizes future conflicts

[Diagram: the arbiter forwards commands from the Read/Write Queues to the memory queue; among the commands labeled A to D, D is the better choice]

To do this, we need to know the history of the commands
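The behavior described on this slide never stalls while a candidate exists: the arbiter forwards whichever command is expected to conflict least with the commands already sent. A minimal sketch, assuming commands are identified only by bank number and using a hypothetical same-bank cost; `history_aware_pick` and `same_bank_cost` are illustrative names, not the authors' code.

```python
def same_bank_cost(prev_bank, next_bank):
    """Hypothetical cost: 1 if the candidate hits the bank of an earlier command."""
    return 1 if prev_bank == next_bank else 0


def history_aware_pick(candidates, history, cost=same_bank_cost):
    """Return the candidate with the lowest total cost against `history`.

    `candidates` is ordered oldest first, so ties go to the oldest command.
    Whenever `candidates` is non-empty, something is forwarded, which keeps
    the memory pipeline full.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda c: sum(cost(h, c) for h in history))


print(history_aware_pick([0, 1, 2], history=[0, 1]))  # 2
```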


Another Goal: Match Application’s Memory Command Behavior

Arbiter should select commands from queues roughly in the ratio in which the application generates them

Otherwise, the read or write queue may become congested

Command history is useful here too
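One simple way to use command history for this goal is to compare how many reads and writes have been issued against the target ratio and serve whichever type is behind. The function below is a hypothetical sketch (the name `pick_queue` and its parameters are illustrative); it cross-multiplies instead of dividing so that zero counts are safe.

```python
def pick_queue(reads_issued, writes_issued, target_reads, target_writes,
               read_ready=True, write_ready=True):
    """Return "R" or "W", whichever keeps the issued mix closest to
    target_reads : target_writes; None if neither queue has a command."""
    if not (read_ready or write_ready):
        return None
    if not write_ready:
        return "R"
    if not read_ready:
        return "W"
    # reads/writes <= target_reads/target_writes, cross-multiplied so that
    # zero counts never cause a division by zero.
    if reads_issued * target_writes <= writes_issued * target_reads:
        return "R"  # reads are at or behind their share
    return "W"


# Simulate six picks for an application that issues 2 reads per write.
reads = writes = 0
picks = []
for _ in range(6):
    kind = pick_queue(reads, writes, target_reads=2, target_writes=1)
    picks.append(kind)
    if kind == "R":
        reads += 1
    else:
        writes += 1
print("".join(picks))  # RWRRWR
```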


Our Approach: History-Based Memory Schedulers

Benefits:

Minimize contention costs: consider multiple constraints

Match the application’s memory access behavior: 2 reads per write? 1 read per write? …

The result: a less congested memory system, i.e., more bandwidth


How does it work?

Use a Finite State Machine (FSM)

Each state in the FSM represents one possible history

Transitions out of a state are prioritized

At any state, scheduler selects the available command with the highest priority

The FSM is generated at design time
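The mechanism above (one state per history, prioritized transitions fixed at design time, and a run-time choice of the highest-priority available command) can be sketched as follows. The command types are reduced to just read and write, and `toy_cost` simply penalizes switching direction; both are hypothetical placeholders, not the paper's cost model. A real Power5 arbiter would distinguish ranks, banks, and ports.

```python
from itertools import product

TYPES = ("R", "W")  # hypothetical: real command types also encode rank/bank/port
HISTORY = 2         # the slides report that a history of 2 works well


def toy_cost(history, nxt):
    """Hypothetical cost: 1 if `nxt` switches direction from the last command."""
    return 1 if history[-1] != nxt else 0


def build_fsm(cost):
    """Design time: one state per possible history, each with a priority list."""
    return {
        state: sorted(TYPES, key=lambda t: cost(state, t))
        for state in product(TYPES, repeat=HISTORY)
    }


def schedule(fsm, state, available):
    """Run time: pick the highest-priority available type, then advance state."""
    for kind in fsm[state]:
        if kind in available:
            return kind, state[1:] + (kind,)
    return None, state  # nothing available: stay in the current state


fsm = build_fsm(toy_cost)
print(schedule(fsm, ("R", "W"), available={"R", "W"}))  # ('W', ('W', 'W'))
print(schedule(fsm, ("R", "W"), available={"R"}))       # ('R', ('W', 'R'))
```

Because the priority lists are precomputed, the run-time step is only a short lookup, which is consistent with the small hardware cost reported on these slides.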


An Example

[Diagram: in the current state, the available commands in the reorder queues are ranked First, Second, Third, and Fourth Preference; the most appropriate command is sent to memory, and the FSM moves to the next state]


How to determine priorities?

Two criteria:

A: Minimize contention costs

B: Satisfy the program’s Read/Write command mix

First method: use A, break ties with B

Second method: use B, break ties with A

Which method to use? Combine the two methods probabilistically

(details in the paper)
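The two orderings can be expressed as lexicographic sorts on (A, B) and (B, A). The slide leaves the combination rule to the paper, so the probabilistic mixing below (use the first method with probability `p`) is only a placeholder; `p`, the cost tables, and the command-type names are all hypothetical.

```python
import random


def order_a_then_b(types, cost_a, cost_b):
    """First method: minimize contention cost A, break ties with mix cost B."""
    return sorted(types, key=lambda t: (cost_a[t], cost_b[t]))


def order_b_then_a(types, cost_a, cost_b):
    """Second method: match the read/write mix (B) first, break ties with A."""
    return sorted(types, key=lambda t: (cost_b[t], cost_a[t]))


def combined_order(types, cost_a, cost_b, p, rng):
    """Hypothetical combination: use the first method with probability p."""
    if rng.random() < p:
        return order_a_then_b(types, cost_a, cost_b)
    return order_b_then_a(types, cost_a, cost_b)


types = ["R0", "R1", "W0"]            # hypothetical command types
cost_a = {"R0": 2, "R1": 0, "W0": 0}  # contention cost (lower is better)
cost_b = {"R0": 0, "R1": 1, "W0": 0}  # distance from the target mix
print(order_a_then_b(types, cost_a, cost_b))  # ['W0', 'R1', 'R0']
print(order_b_then_a(types, cost_a, cost_b))  # ['W0', 'R0', 'R1']
```

The two methods agree on the top choice here but disagree on the second preference, which is exactly where the blending matters.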


Limitation of the History-Based Approach

Designed for one particular mix of reads and writes

Solution: adaptive history-based schedulers

Create multiple state machines, one for each Read/Write mix

Periodically select the most appropriate state machine


Adaptive History-Based Schedulers

[Diagram: three arbiters, Arbiter1 (2R:1W), Arbiter2 (1R:1W), and Arbiter3 (1R:2W); the Arbiter Selection Logic uses a Read Counter, a Write Counter, and a Cycle Counter to select one of them]
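The selection logic can be sketched with the three counters from the diagram. This assumes the counters record reads and writes arriving from the application, and that at the end of each period the logic switches to the arbiter whose design mix is closest to the observed read fraction; the period length, the distance measure, and the class name `ArbiterSelector` are hypothetical.

```python
# Read fraction each design-time arbiter was built for (from the slide's labels).
ARBITER_READ_FRACTION = {"2R:1W": 2 / 3, "1R:1W": 1 / 2, "1R:2W": 1 / 3}


class ArbiterSelector:
    """Sketch of the selection logic: count reads, writes, and cycles, and
    at the end of each period switch to the arbiter whose mix is closest."""

    def __init__(self, period=10_000, initial="1R:1W"):
        self.period = period  # hypothetical re-selection interval in cycles
        self.current = initial
        self.reads = self.writes = self.cycles = 0

    def tick(self, new_reads=0, new_writes=0):
        """Advance one cycle, recording the reads/writes seen this cycle."""
        self.reads += new_reads
        self.writes += new_writes
        self.cycles += 1
        if self.cycles == self.period:
            total = self.reads + self.writes
            if total:  # keep the current arbiter if nothing arrived
                frac = self.reads / total
                self.current = min(
                    ARBITER_READ_FRACTION,
                    key=lambda name: abs(ARBITER_READ_FRACTION[name] - frac),
                )
            self.reads = self.writes = self.cycles = 0  # start a new period
        return self.current
```

For example, `ArbiterSelector(period=6)` fed two reads for every write switches from `"1R:1W"` to `"2R:1W"` on the sixth cycle.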


Evaluation

Used a cycle-accurate simulator for the IBM Power5: 1.6 GHz, 266-DDR2, 4 ranks, 4 banks, 2 ports

Evaluated and compared our approach with previous approaches on data-intensive applications: Stream, NAS, and microbenchmarks


The IBM Power5

• 2 cores on a chip
• SMT capability
• Large on-chip L2 cache
• Hardware prefetching
• 276 million transistors

[Die photo: the Memory Controller occupies 1.6% of the chip area]


Results 1: Stream Benchmarks

[Figure: Normalized Execution Time (%) for copy, scale, sum, and triad; series: FIFO, Memoryless, Adaptive History-Based]


Results 2: NAS Benchmarks

[Figure: Normalized Execution Time (%) for bt, cg, ep, ft, is, lu, mg, sp, and mean (1 core active); series: FIFO, Memoryless, Adaptive History-Based]


Results 3: Microbenchmarks

[Figure: Normalized Execution Time (%) for microbenchmarks 4r0w, 2r0w, 1r0w, 8r1w, 4r1w, 3r1w, 2r1w, 3r2w, 1r1w, 1r2w, 1r4w, 0r1w, 0r2w, and 0r4w; series: FIFO, Memoryless, Adaptive History-Based]


[Diagram: caches, Read Queue, Write Queue, arbiter, Memory Queue (FIFO), DRAM; annotation: 12 concurrent commands]


DRAM Utilization

[Figure: two histograms, Our Approach and Memoryless Approach; x-axis: Number of Active Commands in DRAM (1 to 12); y-axis: Number of Occurrences (0 to 5000)]


Why does it work?

[Diagram: caches, Read Queue, Write Queue, arbiter, Memory Queue, and DRAM inside the Memory Controller; annotations: Low Occupancy in Reorder Queues, Full Reorder Queues, Full Memory Queue, Busy Memory System]

Detailed analysis in the paper


Other Results

We obtain over 95% of the performance of a perfect DRAM configuration (no conflicts)

Results with a higher frequency and without data prefetching are in the paper

A history size of 2 works well


Conclusions

Introduced adaptive history-based schedulers

Evaluated on a highly tuned system, IBM Power5

Performance improvement:

Over FIFO: Stream 63%, NAS 11%

Over Memoryless: Stream 19%, NAS 5%

Little cost: 0.04% chip area increase


Conclusions (cont.)

Similar arbiters can be used in other places as well, e.g., cache controllers

Can optimize for other criteria, e.g., power or power+performance


Thank you