Adaptive History-Based Memory Schedulers


Transcript

1

Adaptive History-Based Memory Schedulers

Ibrahim Hur and Calvin Lin
IBM Austin
The University of Texas at Austin


2

Memory Bottleneck

Memory system performance is not increasing as fast as CPU performance

Latency: Use caches, prefetching, …

Bandwidth: Use parallelism inside memory system

[Chart: CPU and memory performance from 1980 to 2000 on a log scale (1 to 10,000), showing the CPU curve pulling steadily away from the memory curve.]

3

How to Increase Memory Command Parallelism?

[Diagram: DRAM with Banks 0 through 3. Issuing Read Bank 0, Read Bank 0, Read Bank 1 in order causes a bank conflict; the better order Read Bank 0, Read Bank 1, Read Bank 0 avoids it.]

As with instruction scheduling, memory commands can be reordered for higher bandwidth
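The reordering on this slide can be sketched as a greedy pass over the pending commands. The command format and the one-command lookahead are illustrative assumptions, not the Power5's actual policy:

```python
# Hypothetical sketch of bank-conflict-aware reordering: greedily pick the
# oldest pending command whose bank differs from the last bank accessed.

def reorder_for_banks(commands):
    """commands: list of (op, bank) tuples, oldest first."""
    pending = list(commands)
    scheduled = []
    last_bank = None
    while pending:
        # Prefer the oldest command that avoids a back-to-back bank conflict.
        pick = next((c for c in pending if c[1] != last_bank), pending[0])
        pending.remove(pick)
        scheduled.append(pick)
        last_bank = pick[1]
    return scheduled

stream = [("read", 0), ("read", 0), ("read", 1)]
print(reorder_for_banks(stream))  # [('read', 0), ('read', 1), ('read', 0)]
```

On the slide's example stream, this produces exactly the "better order" shown above.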

4

Inside the Memory System

[Diagram: the memory controller sits between the caches and DRAM. Commands enter a Read Queue and a Write Queue (neither is FIFO); the arbiter moves them into the Memory Queue (FIFO), which feeds DRAM.]

The arbiter schedules memory operations

5

Our Work

Study memory command scheduling in the context of the IBM Power5

Present new memory arbiters: 20% increased bandwidth at very little cost (0.04% increase in chip area)

6

Outline

The Problem
Characteristics of DRAM
Previous Scheduling Methods

Our Approach
History-based schedulers
Adaptive history-based schedulers

Results

Conclusions

7

Understanding the Problem: Characteristics of DRAM

Multi-dimensional structure: banks, rows, and columns; the IBM Power5 adds ranks and ports as well

Access time is not uniform:
Bank-to-bank conflicts
Read-after-Write to the same rank conflicts
Write-after-Read to a different port conflicts
…

8

Previous Scheduling Approaches: FIFO Scheduling

[Diagram: the caches feed the Read and Write Queues; the arbiter passes commands into the Memory Queue (FIFO) and on to DRAM in arrival order.]

9

Memoryless Scheduling

[Diagram: same pipeline, but the memoryless arbiter holds a command in the reorder queues until its conflicts are resolved, so a command can wait a long delay before entering the Memory Queue.]

Adapted from Rixner et al., ISCA 2000

10

What we really want

Keep the pipeline full; don’t hold commands in the reorder queues until conflicts are totally resolved

Forward them to memory queue in an order to minimize future conflicts

[Diagram: commands B, D, C, and A wait in the Read/Write Queues; the arbiter must pick which to forward to the memory queue next. Candidates C and D carry different conflict costs, and D is the better choice.]

To do this we need to know the history of the commands

11

Another Goal: Match Application’s Memory Command Behavior

Arbiter should select commands from queues roughly in the ratio in which the application generates them

Otherwise, read or write queue may be congested

Command history is useful here too

12

Our Approach: History-Based Memory Schedulers

Benefits:
Minimize contention costs while considering multiple constraints
Match the application's memory access behavior (2 Reads per Write? 1 Read per Write? …)

The result: a less congested memory system, i.e. more bandwidth

13

How does it work?

Use a Finite State Machine (FSM)

Each state in the FSM represents one possible history

Transitions out of a state are prioritized

In any state, the scheduler selects the available command with the highest priority

FSM is generated at design time
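The mechanism can be sketched as a lookup from history states to design-time priority lists. The two-entry read/write history and the particular priorities below are illustrative assumptions, not the tables the paper derives (a real arbiter's states would likely encode more than the command type):

```python
# Illustrative sketch of a history-based arbiter: each FSM state is the
# recent command history; each state carries a design-time priority list,
# and the arbiter issues the highest-priority type available in the queues.

PRIORITIES = {
    ("read", "read"):   ["write", "read"],   # after two reads, prefer a write
    ("read", "write"):  ["read", "write"],
    ("write", "read"):  ["write", "read"],
    ("write", "write"): ["read", "write"],   # after two writes, prefer a read
}

def select(history, available):
    """Pick the highest-priority available command type for this state."""
    for cmd in PRIORITIES[history]:
        if cmd in available:
            return cmd
    return None  # nothing available this cycle

state = ("read", "read")
cmd = select(state, {"read", "write"})  # "write" wins in this state
state = (state[1], cmd)                 # FSM transition: shift the history
```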

14

An Example

[Diagram: from the current state, the FSM lists first through fourth preferences; the arbiter sends the most appropriate available command from the reorder queues to memory and moves to the next state.]

15

How to determine priorities?

Two criteria:
A: Minimize contention costs
B: Satisfy the program's Read/Write command mix

First Method: use A, break ties with B
Second Method: use B, break ties with A

Which method to use? Combine the two methods probabilistically

(details in the paper)
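The probabilistic combination might be sketched like this; the probability parameter and method names are assumptions, since the actual combination details are in the paper:

```python
import random

# Per scheduling decision, choose cost-first prioritization (ties broken
# by the Read/Write mix) with probability p, otherwise mix-first (ties
# broken by cost).

def pick_method(p_cost_first, rng=random.random):
    return "cost_then_mix" if rng() < p_cost_first else "mix_then_cost"

# With p = 1.0 the arbiter always ranks by cost first; with p = 0.0,
# always by mix first; values in between blend the two methods.
```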

16

Limitation of the History-Based Approach

Designed for one particular mix of Read/Writes

Solution: Adaptive History-Based Schedulers
Create multiple state machines, one for each Read/Write mix
Periodically select the most appropriate state machine

17

Adaptive History-Based Schedulers

[Diagram: three arbiters designed for 2R:1W, 1R:1W, and 1R:2W mixes; Read, Write, and Cycle Counters feed the Arbiter Selection Logic, which selects the active arbiter.]
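The selection logic on this slide can be sketched as follows: count reads and writes over an interval, then activate the state machine whose design ratio is closest to the observed mix. The 2R:1W, 1R:1W, and 1R:2W ratios come from the slide; the nearest-ratio metric is an illustrative assumption:

```python
# Design read/write ratios of the three arbiters from the slide.
ARBITERS = {"2R:1W": 2.0, "1R:1W": 1.0, "1R:2W": 0.5}

def select_arbiter(reads, writes):
    """Pick the arbiter whose design ratio is nearest the observed mix."""
    observed = reads / max(writes, 1)  # avoid division by zero
    return min(ARBITERS, key=lambda name: abs(ARBITERS[name] - observed))

print(select_arbiter(200, 110))  # → 2R:1W
```

The cycle counter in the diagram would reset the read/write counts at the end of each interval before the next selection.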

18

Evaluation

Used a cycle-accurate simulator for the IBM Power5: 1.6 GHz, DDR2-266, 4 ranks, 4 banks, 2 ports

Evaluated our approach against previous approaches on data-intensive applications: Stream, NAS, and microbenchmarks

19

The IBM Power5

Memory Controller (1.6% of chip area)

2 cores on a chip
SMT capability
Large on-chip L2 cache
Hardware prefetching
276 million transistors

20

Results 1: Stream Benchmarks

[Chart: normalized execution time (%) of the Stream benchmarks copy, scale, sum, and triad under FIFO, Memoryless, and Adaptive History-Based scheduling.]

21

Results 2: NAS Benchmarks

[Chart: normalized execution time (%) of the NAS benchmarks bt, cg, ep, ft, is, lu, mg, sp, and their mean under FIFO, Memoryless, and Adaptive History-Based scheduling (1 core active).]

22

Results 3: Microbenchmarks

[Chart: normalized execution time (%) of microbenchmarks with Read/Write mixes 4r0w, 2r0w, 1r0w, 8r1w, 4r1w, 3r1w, 2r1w, 3r2w, 1r1w, 1r2w, 1r4w, 0r1w, 0r2w, and 0r4w under FIFO, Memoryless, and Adaptive History-Based scheduling.]

23

[Diagram: the path from the caches through the Read and Write Queues, arbiter, and Memory Queue (FIFO) to DRAM holds up to 12 concurrent commands.]

24

DRAM Utilization

[Two histograms: Number of Occurrences vs. Number of Active Commands in DRAM (1 to 12), one for our approach and one for the Memoryless approach.]

25

Why does it work?

[Diagram: the memory controller between the caches and DRAM, with Read Queue, Write Queue, arbiter, and Memory Queue; annotated: Low Occupancy in Reorder Queues, Full Reorder Queues, Full Memory Queue, Busy Memory System.]

detailed analysis in the paper

26

Other Results

We obtain >95% performance of the perfect DRAM configuration (no conflicts)

Results with higher frequency and with no data prefetching are in the paper

A history size of 2 works well

27

Conclusions

Introduced adaptive history-based schedulers

Evaluated on a highly tuned system, the IBM Power5

Performance improvement:
Over FIFO: Stream 63%, NAS 11%
Over Memoryless: Stream 19%, NAS 5%

Little cost: 0.04% chip area increase

28

Conclusions (cont.)

Similar arbiters can be used in other places as well, e.g. cache controllers

Can optimize for other criteria, e.g. power or power+performance.

29

Thank you
