Page 1

Lecture 13: Memory Hierarchy and Cache Design

Prof. Mike Schulte

Advanced Computer Architecture

ECE 401

Page 2

Why is memory important?

• Processor performance has increased at a much faster rate than memory performance, making main memory the bottleneck.

CPU-DRAM Gap

• 1980: no cache in µproc; 1995: 2-level cache, 60% of transistors on Alpha 21164 µproc

[Figure: CPU vs. DRAM performance, 1980-2000, plotted on a log scale from 1 to 1000; the CPU curve climbs far faster than the DRAM curve.]

Page 3

General Principles of Memory

• Locality
– Temporal locality: referenced memory is likely to be referenced again soon (e.g., code within a loop)
– Spatial locality: memory close to referenced memory is likely to be referenced soon (e.g., data in a sequentially accessed array); both kinds are illustrated in the C sketch below

• Locality + smaller hardware is faster => memory hierarchy
– Levels: each smaller, faster, and more expensive per byte than the level below
– Inclusive: data found in the top level is also found in the bottom

• Definitions
– Upper: closer to the processor
– Block: the minimum unit that is either present or not present in the upper level
– Block address: the location of the block in memory
– Hit time: time to access the upper level, including hit determination
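Both kinds of locality show up in even trivial code. A minimal C sketch (our example, not from the slides):

    #include <stdio.h>

    #define N 1024

    int main(void) {
        int a[N];
        int sum = 0;

        /* Spatial locality: a[0], a[1], ... are adjacent in memory,
           so one cache block fetched on a miss also supplies the
           next several array elements. */
        for (int i = 0; i < N; i++)
            a[i] = i;

        /* Temporal locality: the loop's code, plus i and sum, are
           re-referenced on every iteration. */
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %d\n", sum);
        return 0;
    }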

Page 4

Memory Hierarchy

[Figure: Memory hierarchy, from the processor outward: Registers → L1 Cache → L2 Cache → DRAM → Disks (secondary storage).]

Page 5

Differences in Memory Levels

Level                 Memory Technology   Size          Typical Access Time   Cost per MByte
Registers             D flip-flops        64 x 32-bit   2-3 ns                N/A
L1 Cache (on chip)    SRAM                16 KBytes     5-25 ns               $100-$250
L2 Cache (off chip)   SRAM                256 KBytes    5-25 ns               $100-$250
Main Memory           DRAM                64 MBytes     60-120 ns             $5-$10
Secondary Storage     Magnetic disk       2 GBytes      10-20 ms              $0.10-$0.20

Page 6

Four Questions for Memory Hierarchy Designers

• Q1: Where can a block be placed in the upper level? (Block placement)

• Q2: How is a block found if it is in the upper level? (Block identification)

• Q3: Which block should be replaced on a miss? (Block replacement)

• Q4: What happens on a write? (Write strategy)

Page 7

Q1: Where can a block be placed in the upper level?

• Direct Mapped: Each block has only one place that it can appear in the cache.

• Fully associative: Each block can be placed anywhere in the cache.

• Set associative: Each block can be placed in a restricted set of places in the cache.

– If there are n blocks in a set, the cache placement is called n-way set associative

• What is the associativity of a direct mapped cache?

Page 8

Associativity Examples (Figure 5.2, pg. 376)

Fully associative: Block 12 can go anywhere.

Direct mapped: Block no. = (Block address) mod (No. of blocks in cache). Block 12 can go only into block 4 (12 mod 8).

Set associative: Set no. = (Block address) mod (No. of sets in cache). Block 12 can go anywhere in set 0 (12 mod 4).
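These mappings are plain modular arithmetic. A small C sketch of the figure's example (the 8-block, 4-set sizes come from the figure; the rest is illustrative):

    #include <stdio.h>

    int main(void) {
        unsigned block_addr = 12;  /* block 12, as in Figure 5.2 */
        unsigned num_blocks = 8;   /* 8-block cache */
        unsigned num_sets   = 4;   /* 8 blocks, 2-way => 4 sets */

        /* Direct mapped: exactly one legal frame per block address. */
        printf("direct mapped: block %u\n", block_addr % num_blocks); /* 4 */

        /* Set associative: one legal set; any way within it. */
        printf("set associative: set %u\n", block_addr % num_sets);   /* 0 */

        return 0;
    }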

Page 9

Q2: How Is a Block Found If It Is in the Upper Level?

• The address can be divided into two main parts
– Block offset: selects the data from the block

  offset size = log2(block size)

– Block address: tag + index

» index: selects the set in the cache

  index size = log2(#blocks / associativity)

» tag: compared to the tag in the cache to determine a hit

  tag size = address size - index size - offset size

[Address layout: Tag | Index | Block offset]
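A minimal C sketch of this decomposition (the function name and the sample field widths are ours, not from the slides):

    #include <stdio.h>

    /* Split an address into tag, index, and offset, given the field
       widths in bits; the widths follow the log2 formulas above. */
    static void split_address(unsigned long addr,
                              unsigned offset_bits, unsigned index_bits) {
        unsigned long offset = addr & ((1UL << offset_bits) - 1);
        unsigned long index  = (addr >> offset_bits) & ((1UL << index_bits) - 1);
        unsigned long tag    = addr >> (offset_bits + index_bits);
        printf("tag=%#lx index=%#lx offset=%#lx\n", tag, index, offset);
    }

    int main(void) {
        /* e.g., 32-byte blocks -> 5 offset bits; 256 sets -> 8 index bits */
        split_address(0x12345678UL, 5, 8);
        return 0;
    }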

Page 10

Q3: Which Block Should be Replaced on a Miss?

• Easy for Direct Mapped

• Set associative or fully associative:
– Random: easier to implement
– Least Recently Used (LRU): harder to implement; may be approximated

• Miss rates for caches with different sizes, associativities, and replacement algorithms:

Associativity:   2-way             4-way             8-way
Size         LRU      Random   LRU      Random   LRU      Random
16 KB        5.18%    5.69%    4.67%    5.29%    4.39%    4.96%
64 KB        1.88%    2.01%    1.54%    1.66%    1.39%    1.53%
256 KB       1.15%    1.17%    1.13%    1.13%    1.12%    1.12%

For caches with low miss rates, random is almost as good as LRU.
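For a 2-way set, true LRU costs just one bit per set, which hints at why LRU only gets "harder to implement" as associativity grows. A hedged C sketch (the structure and names are ours):

    #include <stdio.h>
    #include <stdlib.h>

    /* One LRU bit per 2-way set: it names the way to evict next. */
    struct set2 { unsigned lru_way; };

    /* On a hit to (or fill of) `way`, the other way becomes LRU. */
    static void touch(struct set2 *s, unsigned way) {
        s->lru_way = 1u - way;
    }

    static unsigned victim_lru(const struct set2 *s) {
        return s->lru_way;            /* evict the least recently used way */
    }

    static unsigned victim_random(void) {
        return (unsigned)rand() % 2;  /* random keeps no per-set state */
    }

    int main(void) {
        struct set2 s = { 0 };
        touch(&s, 0);                 /* access way 0 */
        printf("LRU victim: way %u\n", victim_lru(&s));     /* 1 */
        printf("random victim: way %u\n", victim_random());
        return 0;
    }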

Page 11

Q4: What Happens on a Write?

• Write through: The information is written both to the block in the cache and to the block in the lower-level memory.

• Write back: The information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.

– Is the block clean or dirty? (add a dirty bit to each block)

• Pros and cons of each:
– Write through

» Read misses cannot result in writes to memory

» Easier to implement

» Always combine with write buffers to avoid memory latency

– Write back

» Less memory traffic

» Performs writes at the speed of the cache
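A hedged C sketch of the two write-hit policies (the line structure and function names are invented for illustration; memory_write is a hypothetical stand-in for the lower level):

    #include <stdbool.h>

    struct line {
        bool valid, dirty;
        unsigned long tag;
        unsigned char data[32];
    };

    static void memory_write(unsigned long addr, unsigned char v) {
        (void)addr; (void)v;  /* stub for the lower-level memory (hypothetical) */
    }

    /* Write through: update the cache line and the lower level on
       every write; a write buffer would hide the memory latency. */
    void write_through(struct line *l, unsigned long addr,
                       unsigned off, unsigned char v) {
        l->data[off] = v;
        memory_write(addr, v);
    }

    /* Write back: update only the cache line and mark it dirty, so it
       is flushed to the lower level only when it is replaced. */
    void write_back(struct line *l, unsigned off, unsigned char v) {
        l->data[off] = v;
        l->dirty = true;
    }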

Page 12

Q4: What Happens on a Write?

• Since data does not have to be brought into the cache on a write miss, there are two options:

– Write allocate

» The block is brought into the cache on a write miss

» Used with write-back caches

» The hope is that subsequent writes to the block hit in the cache

– No-write allocate

» The block is modified in memory, but not brought into the cache

» Used with write-through caches

» Writes have to go to memory anyway, so why bring the block into the cache?

Page 13

Cache Measures

• Hit rate: fraction of accesses found in the cache
– So high that we usually talk about the miss rate: Miss rate = 1 - Hit rate

• Hit time: time to access the cache

• Miss penalty: time to replace a block from the lower level, including the time to deliver the block to the CPU
– Access time: time to access the lower level
– Transfer time: time to transfer the block

• Average memory-access time

= Hit time + Miss rate x Miss penalty (ns or clocks)
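The formula drops straight into code. A minimal C helper (our names; the sample numbers are made up):

    #include <stdio.h>

    /* Average memory-access time, in the same units as its inputs. */
    static double amat(double hit_time, double miss_rate, double miss_penalty) {
        return hit_time + miss_rate * miss_penalty;
    }

    int main(void) {
        /* e.g., a 1-cycle hit, 5% miss rate, 50-cycle miss penalty */
        printf("AMAT = %.2f cycles\n", amat(1.0, 0.05, 50.0)); /* 3.50 */
        return 0;
    }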

Page 14

Block Size vs. Cache Measures

• Increasing Block Size generally increases Miss Penalty and decreases Miss Rate

[Figure: Miss Rate vs. Block Size (decreasing) x Miss Penalty vs. Block Size (increasing) = Avg. Memory Access Time vs. Block Size (U-shaped).]

Page 15

Example: Alpha 21064 Data Cache

• The data cache of the Alpha 21064 has the following features

– 8 KB of data

– 32 byte blocks

– Direct mapped placement

– Write through (no-write allocate, 4-block write buffer)

– 34-bit physical address composed of

» 5-bit block offset

» 8-bit index

» 21-bit tag
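These field widths follow from the page 9 formulas. A quick C check (the helper is ours), which with assoc = 2 also reproduces the 2-way figures on a later slide:

    #include <stdio.h>

    static unsigned log2u(unsigned x) {  /* x must be a power of two */
        unsigned n = 0;
        while (x >>= 1) n++;
        return n;
    }

    int main(void) {
        unsigned addr_bits = 34, cache_bytes = 8 * 1024, block_bytes = 32;
        unsigned assoc = 1;  /* direct mapped */

        unsigned offset = log2u(block_bytes);                        /* 5 */
        unsigned index  = log2u(cache_bytes / block_bytes / assoc);  /* 8 */
        unsigned tag    = addr_bits - index - offset;                /* 21 */

        printf("offset=%u index=%u tag=%u\n", offset, index, tag);
        /* assoc = 2 gives offset=5 index=7 tag=22, matching the
           2-way set associative cache shown later. */
        return 0;
    }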

Page 16

Example: Alpha 21064 Data Cache

A cache read has 4 steps:

(1) The address from the CPU is divided into the tag, index, and block offset.

(2) The index selects the block.

(3) The address tag is compared with the tag in the cache, the valid bit is checked, and the data to be loaded is selected.

(4) If the valid bit is set, the data is loaded into the processor.

If there is a write, the data is also sent to the write buffer.

Page 17

2-Way Set Associative Cache

Features of an 8 KB 2-way set associative cache

5-bit block offset, 7-bit index, 22-bit tag

The set associative cache has extra hardware for 2 tag comparisons and a mux to select the data.

Compared to the direct mapped cache, the set associative cache will tend to have a smaller miss rate but a larger hit time.

Page 18

Split vs. Unified Cache

• Unified cache (mixed cache): Data and instructions are stored together (von Neumann architecture)

• Split cache: Data and instructions are stored separately (Harvard architecture)

• Why do instruction caches have a lower miss ratio?

Size      Instruction Cache   Data Cache   Unified Cache
1 KB      3.06%               24.61%       13.34%
2 KB      2.26%               20.57%        9.78%
4 KB      1.78%               15.94%        7.24%
8 KB      1.10%               10.19%        4.57%
16 KB     0.64%                6.47%        2.87%
32 KB     0.39%                4.82%        1.99%
64 KB     0.15%                3.77%        1.35%
128 KB    0.02%                2.88%        0.95%

Page 19

Example: Split vs. Unified Cache

• Which has the lower average memory-access time?

– Split cache: 16 KB instructions + 16 KB data

– Unified cache: 32 KB instructions + data

• Assumptions
– Use miss rates from the previous chart

– Miss penalty is 50 cycles

– Hit time is 1 cycle

– Load or store hit takes an extra cycle, since there is only one port for instructions and data

Page 20

Example: Split vs. Unified Cache

Average memory-access time = Hit time + Miss rate x Miss penalty

AMAT = %instr x (instr hit time + instr miss rate x instr miss penalty) +

%data x (data hit time + data miss rate x data miss penalty)

For the split cache:

AMAT = 75% x (1 + 0.64% x 50) + 25% x (1 + 6.47% x 50) = 2.05

For the unified cache:

AMAT = 75% x (1 + 1.99% x 50) + 25% x (2 + 1.99% x 50) = 2.24

The unified cache has a longer AMAT, even though its miss rate is lower, because of structural conflicts between instruction and data accesses: with a single port, each load or store hit costs an extra cycle.
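A quick C check of both calculations (the miss rates come from the chart above; the variable names are ours):

    #include <stdio.h>

    int main(void) {
        double penalty = 50.0, f_instr = 0.75, f_data = 0.25;

        /* Split: separate 16 KB caches, so every hit takes 1 cycle. */
        double split = f_instr * (1.0 + 0.0064 * penalty)
                     + f_data  * (1.0 + 0.0647 * penalty);

        /* Unified: 32 KB at a 1.99% miss rate; a load/store hit pays
           an extra cycle because the single port is shared. */
        double unified = f_instr * (1.0 + 0.0199 * penalty)
                       + f_data  * (2.0 + 0.0199 * penalty);

        printf("split   AMAT = %.2f cycles\n", split);   /* ~2.05 */
        printf("unified AMAT = %.2f cycles\n", unified); /* ~2.24 */
        return 0;
    }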

What are the advantages of a split cache? Of a unified cache?

Page 21

Improving Cache Performance

• Average memory-access time

= Hit time + Miss rate x Miss penalty

• Improve performance by:

1. Reduce the miss rate,

2. Reduce the miss penalty, or

3. Reduce the time to hit in the cache.

Page 22

Summary

• The CPU-memory gap is a major obstacle to achieving high performance

• Memory hierarchies
– Take advantage of program locality
– Closer to processor => smaller, faster, more expensive
– Further from processor => bigger, slower, less expensive

• 4 questions for memory hierarchy
– Block placement, block identification, block replacement, and write strategy

• Cache parameters
– Cache size, block size, associativity