The Memory Hierarchy CPSC 321 Andreas Klappenecker
Dec 19, 2015
Some Results from the Survey
• Issues with the CS curriculum
  • CPSC 111 Computer Science Concepts & Prg
  • CPSC 310 Databases
  • CPSC 431 Software Engineering
• Something from the wish list:
  • More C++
  • More software engineering
  • More focus on industry needs
  • Less focus on industry needs
Some Results from the Survey
• Why (MIPS) assembly language?
• More detailed explanations of programming language xyz
• Implement slightly reduced versions of the Pentium 4 or Athlon processors
• Have another computer architecture class
• Lack of information on the CS website about specialization...
Follow Up
• CPSC 462 Microcomputer Systems
• CPSC 410 Operating Systems
• Go to seminars/lectures by Bjarne Stroustrup, Jaakko Jarvi, or Gabriel Dos Reis
Memory
Current memory is largely implemented in CMOS technology. Two alternatives:
• SRAM
  • fast, but not area efficient
  • value stored in a pair of inverting gates
• DRAM
  • slower, but more area efficient
  • value stored as charge on a capacitor (must be refreshed)
Memory
• Users want large and fast memories
  • SRAM is too expensive for main memory
  • DRAM is too slow for many purposes
• Compromise: build a memory hierarchy
[Figure: pyramid of memory-hierarchy levels. Level 1 sits closest to the CPU and Level n farthest; access time increases and the size of the memory at each level grows with distance from the CPU.]
Locality
• If an item is referenced, then
  • it will be referenced again soon (temporal locality)
  • nearby data will be referenced soon (spatial locality)
• Why does code have locality?
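A minimal sketch of why typical code exhibits both kinds of locality (the array and loop here are illustrative, not from the slides):

```python
# Summing an array shows both kinds of locality.
data = list(range(1000))

total = 0
for i in range(len(data)):   # "i" and "total" are reused every iteration -> temporal locality
    total += data[i]         # consecutive elements are accessed in order -> spatial locality

print(total)  # 499500
```

The loop body touches the same few variables over and over (temporal locality), and it walks the array in order, so each access lands next to the previous one (spatial locality).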
Memory Hierarchy
• Memory is organized as a hierarchy
  • each level closer to the processor holds a subset of any level farther away
  • the memory can consist of multiple levels, but data is typically copied between two adjacent levels at a time
• Initially, we focus on two levels
Two Level Hierarchy
• Upper level (smaller and faster)
• Lower level (slower)
• A unit of information that is either present or absent in a level is called a block
• If data requested by the processor is in the upper level, this is called a hit; otherwise it is a miss
• On a miss, the data is retrieved from the lower level; typically an entire block is transferred
Cache
A cache represents some level of memory between the CPU and main memory
[More general definitions are often used]
A Toy Example
• Assumptions
  • each processor request is one word
  • each block consists of one word
• Example
  • Before the request: C = [X1, X2, ..., Xn-1]
  • The processor requests Xn, not contained in C
  • Item Xn is brought from memory into the cache
  • After the request: C = [X1, X2, ..., Xn-1, Xn]
• Issues
  • What happens if the cache is full?
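The toy example above can be sketched as follows. The slide leaves the "cache is full" question open; this sketch assumes one possible answer (evict the oldest block, i.e. FIFO) purely for illustration:

```python
CAPACITY = 4  # assumed tiny cache size, for illustration only

def request(cache, x):
    """Handle a one-word processor request against a one-word-block cache."""
    if x in cache:               # hit: nothing needs to be fetched
        return cache
    if len(cache) >= CAPACITY:   # cache full: evict the oldest block (FIFO, one possible policy)
        cache.pop(0)
    cache.append(x)              # miss: bring x from memory into the cache
    return cache

c = ["X1", "X2", "X3"]
request(c, "X4")
print(c)  # ['X1', 'X2', 'X3', 'X4']
request(c, "X5")  # cache is full, so the oldest block X1 is evicted
print(c)  # ['X2', 'X3', 'X4', 'X5']
```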
Issues
• How do we know whether a data item is in the cache?
• If it is, how do we find it?
• Simple strategy: direct-mapped cache
  • exactly one location where the data might be in the cache
  • Mapping: address modulo the number of blocks in the cache, x -> x mod B
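The mapping x -> x mod B is one line of code; the point is that many memory blocks collide on the same cache location:

```python
def cache_index(block_number, num_blocks):
    # Direct-mapped placement: block number modulo the number of cache blocks.
    return block_number % num_blocks

# With B = 8 blocks, memory blocks 1, 9, 17, ... all compete for cache index 1.
print(cache_index(1, 8), cache_index(9, 8), cache_index(17, 8))  # 1 1 1
```

When B is a power of two, the modulo is just the low-order bits of the block number, which is why hardware can compute it for free.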
Direct Mapped Cache
[Figure: direct-mapped cache with 8 entries, indices 000-111, backed by a larger memory. Memory addresses 00001, 01001, 10001, 11001 all map to cache index 001, and addresses 00101, 01101, 10101, 11101 all map to index 101 (the low three bits of the address).]
• Cache with 1024 = 2^10 words
• The tag stored in the cache is compared against the upper portion of the address
• If the tag equals the upper 20 bits and the valid bit is set, we have a cache hit; otherwise it is a cache miss
• What kind of locality are we taking advantage of?
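A sketch of the tag/index/offset split for this 1024-word cache (the tuple layout of a cache entry is an assumption of this sketch, not hardware):

```python
def split_address(addr):
    """Split a 32-bit byte address for a 1024-word direct-mapped cache."""
    byte_offset = addr & 0x3      # bits 1..0: byte within the word
    index = (addr >> 2) & 0x3FF   # bits 11..2: selects one of 1024 entries
    tag = addr >> 12              # bits 31..12: the upper 20 bits
    return tag, index, byte_offset

def is_hit(cache, addr):
    tag, index, _ = split_address(addr)
    valid, stored_tag, _data = cache[index]
    return valid and stored_tag == tag

# Each entry is (valid bit, tag, data word); all entries start invalid.
cache = [(False, 0, 0)] * 1024
addr = 0x0000ABCD
tag, index, _ = split_address(addr)
cache[index] = (True, tag, 42)        # fill one entry
print(is_hit(cache, addr))            # True
print(is_hit(cache, addr + 4096))     # False: same index, different tag
```

Note that `addr + 4096` differs only in the tag bits, so it lands on the same cache entry but fails the tag comparison.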
Direct Mapped Cache
[Figure: direct-mapped cache with 1024 entries, one word per block. The 32-bit address (bits 31-0) splits into a 20-bit tag (bits 31-12), a 10-bit index (bits 11-2), and a 2-bit byte offset. The index selects one of the entries 0-1023; each entry holds a valid bit, a tag, and a data word. A 20-bit comparison of the stored tag with the address tag, gated by the valid bit, produces the hit signal, and the 32-bit data word is delivered on a hit.]
• Taking advantage of spatial locality:
Direct Mapped Cache
[Figure: direct-mapped cache with multiword blocks: 4K entries, each holding a 128-bit block of four 32-bit words. The 32-bit address splits into a 16-bit tag (bits 31-16), a 12-bit index (bits 15-4), a 2-bit block offset (bits 3-2), and a 2-bit byte offset. A 16-bit tag comparison gated by the valid bit produces the hit signal, and a multiplexor uses the block offset to select one of the four 32-bit words.]
Hits vs. Misses
• Read hits
  • this is what we want!
• Read misses
  • stall the CPU, fetch the block from memory, deliver it to the cache, restart
• Write hits
  • replace the data in both cache and memory (write-through)
  • write the data only into the cache, updating memory later (write-back)
• Write misses
  • read the entire block into the cache, then write the word
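The two write-hit policies above can be contrasted in a small sketch (the dictionary-based cache and memory here are assumptions for illustration):

```python
# One cached address; the cache entry tracks a dirty bit for write-back.
memory = {0x100: 0}
cache = {0x100: {"data": 0, "dirty": False}}

def write_through(addr, value):
    # Write-through: update the cache and memory together.
    cache[addr]["data"] = value
    memory[addr] = value

def write_back(addr, value):
    # Write-back: update only the cache and mark the block dirty;
    # memory is brought up to date later, when the block is evicted.
    cache[addr]["data"] = value
    cache[addr]["dirty"] = True

write_through(0x100, 7)
print(cache[0x100]["data"], memory[0x100])  # 7 7
write_back(0x100, 9)
print(cache[0x100]["data"], memory[0x100])  # 9 7  (memory is stale until eviction)
```

Write-through keeps memory consistent at the cost of a memory access per write; write-back makes writes fast but requires the dirty block to be written out on eviction.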
What Block Size?
• A large block size reduces cache misses• Cache miss penalty increases • We need to balance these two
constraints• How can we measure cache
performance?• How can we improve cache
performance?
The performance of a cache depends on many parameters:
• Memory stall clock cycles
• Read stall clock cycles
• Write stall clock cycles
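A back-of-the-envelope sketch of the stall-cycle calculation; all the numbers below are made up for illustration, not taken from the slides:

```python
# Memory stall cycles = accesses x miss rate x miss penalty (illustrative numbers).
instructions = 1_000_000
memory_accesses_per_instr = 1.2   # loads/stores plus instruction fetches (assumed)
miss_rate = 0.05                  # 5% of accesses miss (assumed)
miss_penalty = 40                 # cycles to fetch a block from memory (assumed)

memory_stall_cycles = (instructions * memory_accesses_per_instr
                       * miss_rate * miss_penalty)
print(memory_stall_cycles)  # 2400000.0
```

Those 2.4 million stall cycles are added to the CPU execution cycles, which is why even a few percent miss rate can dominate overall performance when the miss penalty is large.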
Cache Block Mapping
• Direct-mapped cache
  • a block goes in exactly one place in the cache
• Fully associative
  • a block can go anywhere in the cache
  • it is difficult to find a block
  • parallel comparison speeds up the search
Cache Block Mapping
• Set associative
  • each block maps to a unique set, and can be placed into any element of that set
  • position is given by (block number) modulo (# of sets in cache)
  • if each set contains n elements, the cache is called n-way set associative
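Set-associative placement can be sketched as follows; the set count, associativity, and LRU replacement policy are assumptions of this sketch (the slides do not fix a replacement policy):

```python
NUM_SETS = 4
WAYS = 2  # 2-way set associative

# Each set holds up to WAYS block numbers; real hardware also stores tags and data.
cache = [[] for _ in range(NUM_SETS)]

def access(block_number):
    """Return True on a hit; on a miss, insert the block (evicting LRU if the set is full)."""
    s = cache[block_number % NUM_SETS]  # block maps to exactly one set
    if block_number in s:
        s.remove(block_number)
        s.append(block_number)          # move to most-recently-used position
        return True
    if len(s) >= WAYS:
        s.pop(0)                        # evict the least recently used block
    s.append(block_number)
    return False

print(access(0))  # False (cold miss)
print(access(4))  # False; blocks 0 and 4 share set 0, but both now fit
print(access(0))  # True: a hit, unlike a direct-mapped cache where 4 would have evicted 0
```

The last access is the payoff of associativity: blocks 0 and 4 conflict in a direct-mapped cache of the same size, but coexist in a 2-way set.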