The Memory Hierarchy CPSC 321 Andreas Klappenecker
Dec 19, 2015
Some Results from the Survey
• Issues with the CS curriculum
  • CPSC 111 Computer Science Concepts & Prg
  • CPSC 310 Databases
  • CPSC 431 Software Engineering
• Something from the wish list:
  • More C++
  • More software engineering
  • More focus on industry needs
  • Less focus on industry needs
Some Results from the Survey
• Why (MIPS) assembly language?
• More detailed explanations of programming language xyz
• Implement slightly reduced versions of the Pentium 4 or Athlon processors
• Have another computer architecture class
• Lack of information on the CS website about specialization...
Follow Up
• CPSC 462 Microcomputer Systems
• CPSC 410 Operating Systems
• Go to seminars/lectures by Bjarne Stroustrup, Jaakko Jarvi, or Gabriel Dos Reis
Memory
Current memory is largely implemented in CMOS technology. Two alternatives:
• SRAM
  • fast, but not area efficient
  • value stored in a pair of inverting gates
• DRAM
  • slower, but more area efficient
  • value stored as charge on a capacitor (must be refreshed)
Memory
• Users want large and fast memories
  • SRAM is too expensive for main memory
  • DRAM is too slow for many purposes
• Compromise: build a memory hierarchy
[Figure: pyramid of memory-hierarchy levels. Level 1 sits closest to the CPU and Level n farthest; access time increases and the size of the memory at each level grows with distance from the CPU.]
Locality
• If an item is referenced, then
  • it will be referenced again soon (temporal locality)
  • nearby data will be referenced soon (spatial locality)
• Why does code have locality?
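A minimal sketch of why typical code exhibits both kinds of locality (the array and loop here are illustrative, not from the slides):

```python
# Summing an array shows both kinds of locality.
data = list(range(1000))

total = 0
for i in range(len(data)):   # "i" and "total" are reused every iteration -> temporal locality
    total += data[i]         # consecutive elements are accessed in order -> spatial locality

print(total)  # 499500
```

The loop body touches the same few variables over and over (temporal locality), and it walks the array in order, so each access lands next to the previous one (spatial locality).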
Memory Hierarchy
• Memory is organized as a hierarchy
  • each level closer to the processor holds a subset of any level farther away
  • the memory can consist of multiple levels, but data is typically copied between two adjacent levels at a time
• Initially, we focus on two levels
Two Level Hierarchy
• Upper level (smaller and faster)
• Lower level (slower)
• A unit of information that is either present or absent in a level is called a block
• If data requested by the processor is in the upper level, this is called a hit; otherwise it is a miss
• On a miss, the data is retrieved from the lower level; typically an entire block is transferred
Cache
A cache represents some level of memory between the CPU and main memory
[More general definitions are often used]
A Toy Example
• Assumptions
  • each processor request is one word
  • each block consists of one word
• Example
  • Before the request: C = [X1, X2, ..., Xn-1]
  • The processor requests Xn, not contained in C
  • Item Xn is brought from memory into the cache
  • After the request: C = [X1, X2, ..., Xn-1, Xn]
• Issues
  • What happens if the cache is full?
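The toy example above can be sketched as follows. The slide leaves the "cache is full" question open; this sketch assumes one possible answer (evict the oldest block, i.e. FIFO) purely for illustration:

```python
CAPACITY = 4  # assumed tiny cache size, for illustration only

def request(cache, x):
    """Handle a one-word processor request against a one-word-block cache."""
    if x in cache:               # hit: nothing needs to be fetched
        return cache
    if len(cache) >= CAPACITY:   # cache full: evict the oldest block (FIFO, one possible policy)
        cache.pop(0)
    cache.append(x)              # miss: bring x from memory into the cache
    return cache

c = ["X1", "X2", "X3"]
request(c, "X4")
print(c)  # ['X1', 'X2', 'X3', 'X4']
request(c, "X5")  # cache is full, so the oldest block X1 is evicted
print(c)  # ['X2', 'X3', 'X4', 'X5']
```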
Issues
• How do we know whether a data item is in the cache?
• If it is, how do we find it?
• Simple strategy: direct-mapped cache
  • exactly one location where the data might be in the cache
  • Mapping: address modulo the number of blocks in the cache, x -> x mod B
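The mapping x -> x mod B is one line of code; the point is that many memory blocks collide on the same cache location:

```python
def cache_index(block_number, num_blocks):
    # Direct-mapped placement: block number modulo the number of cache blocks.
    return block_number % num_blocks

# With B = 8 blocks, memory blocks 1, 9, 17, ... all compete for cache index 1.
print(cache_index(1, 8), cache_index(9, 8), cache_index(17, 8))  # 1 1 1
```

When B is a power of two, the modulo is just the low-order bits of the block number, which is why hardware can compute it for free.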
Direct Mapped Cache
[Figure: direct-mapped cache with 8 entries, indices 000-111, backed by a larger memory. Memory addresses 00001, 01001, 10001, 11001 all map to cache index 001, and addresses 00101, 01101, 10101, 11101 all map to index 101 (the low three bits of the address).]
• Cache with 1024 = 2^10 words
• The tag stored in the cache is compared against the upper portion of the address
• If the tag equals the upper 20 bits and the valid bit is set, we have a cache hit; otherwise it is a cache miss
• What kind of locality are we taking advantage of?
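A sketch of the tag/index/offset split for this 1024-word cache (the tuple layout of a cache entry is an assumption of this sketch, not hardware):

```python
def split_address(addr):
    """Split a 32-bit byte address for a 1024-word direct-mapped cache."""
    byte_offset = addr & 0x3      # bits 1..0: byte within the word
    index = (addr >> 2) & 0x3FF   # bits 11..2: selects one of 1024 entries
    tag = addr >> 12              # bits 31..12: the upper 20 bits
    return tag, index, byte_offset

def is_hit(cache, addr):
    tag, index, _ = split_address(addr)
    valid, stored_tag, _data = cache[index]
    return valid and stored_tag == tag

# Each entry is (valid bit, tag, data word); all entries start invalid.
cache = [(False, 0, 0)] * 1024
addr = 0x0000ABCD
tag, index, _ = split_address(addr)
cache[index] = (True, tag, 42)        # fill one entry
print(is_hit(cache, addr))            # True
print(is_hit(cache, addr + 4096))     # False: same index, different tag
```

Note that `addr + 4096` differs only in the tag bits, so it lands on the same cache entry but fails the tag comparison.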
Direct Mapped Cache
[Figure: direct-mapped cache with 1024 entries, one word per block. The 32-bit address (bits 31-0) splits into a 20-bit tag (bits 31-12), a 10-bit index (bits 11-2), and a 2-bit byte offset. The index selects one of the entries 0-1023; each entry holds a valid bit, a tag, and a data word. A 20-bit comparison of the stored tag with the address tag, gated by the valid bit, produces the hit signal, and the 32-bit data word is delivered on a hit.]
• Taking advantage of spatial locality:
Direct Mapped Cache
[Figure: direct-mapped cache with multiword blocks: 4K entries, each holding a 128-bit block of four 32-bit words. The 32-bit address splits into a 16-bit tag (bits 31-16), a 12-bit index (bits 15-4), a 2-bit block offset (bits 3-2), and a 2-bit byte offset. A 16-bit tag comparison gated by the valid bit produces the hit signal, and a multiplexor uses the block offset to select one of the four 32-bit words.]
Hits vs. Misses
• Read hits
  • this is what we want!
• Read misses
  • stall the CPU, fetch the block from memory, deliver it to the cache, restart
• Write hits
  • replace the data in both cache and memory (write-through)
  • write the data only into the cache, updating memory later (write-back)
• Write misses
  • read the entire block into the cache, then write the word
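The two write-hit policies above can be contrasted in a small sketch (the dictionary-based cache and memory here are assumptions for illustration):

```python
# One cached address; the cache entry tracks a dirty bit for write-back.
memory = {0x100: 0}
cache = {0x100: {"data": 0, "dirty": False}}

def write_through(addr, value):
    # Write-through: update the cache and memory together.
    cache[addr]["data"] = value
    memory[addr] = value

def write_back(addr, value):
    # Write-back: update only the cache and mark the block dirty;
    # memory is brought up to date later, when the block is evicted.
    cache[addr]["data"] = value
    cache[addr]["dirty"] = True

write_through(0x100, 7)
print(cache[0x100]["data"], memory[0x100])  # 7 7
write_back(0x100, 9)
print(cache[0x100]["data"], memory[0x100])  # 9 7  (memory is stale until eviction)
```

Write-through keeps memory consistent at the cost of a memory access per write; write-back makes writes fast but requires the dirty block to be written out on eviction.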
What Block Size?
• A large block size reduces cache misses• Cache miss penalty increases • We need to balance these two
constraints• How can we measure cache
performance?• How can we improve cache
performance?
The performance of a cache depends on many parameters:
• Memory stall clock cycles
• Read stall clock cycles
• Write stall clock cycles
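A back-of-the-envelope sketch of the stall-cycle calculation; all the numbers below are made up for illustration, not taken from the slides:

```python
# Memory stall cycles = accesses x miss rate x miss penalty (illustrative numbers).
instructions = 1_000_000
memory_accesses_per_instr = 1.2   # loads/stores plus instruction fetches (assumed)
miss_rate = 0.05                  # 5% of accesses miss (assumed)
miss_penalty = 40                 # cycles to fetch a block from memory (assumed)

memory_stall_cycles = (instructions * memory_accesses_per_instr
                       * miss_rate * miss_penalty)
print(memory_stall_cycles)  # 2400000.0
```

Those 2.4 million stall cycles are added to the CPU execution cycles, which is why even a few percent miss rate can dominate overall performance when the miss penalty is large.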
Cache Block Mapping
• Direct-mapped cache
  • a block goes in exactly one place in the cache
• Fully associative
  • a block can go anywhere in the cache
  • it is difficult to find a block
  • parallel comparison speeds up the search
Cache Block Mapping
• Set associative
  • each block maps to a unique set, and can be placed into any element of that set
  • position is given by (block number) modulo (# of sets in cache)
  • if each set contains n elements, the cache is called n-way set associative
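Set-associative placement can be sketched as follows; the set count, associativity, and LRU replacement policy are assumptions of this sketch (the slides do not fix a replacement policy):

```python
NUM_SETS = 4
WAYS = 2  # 2-way set associative

# Each set holds up to WAYS block numbers; real hardware also stores tags and data.
cache = [[] for _ in range(NUM_SETS)]

def access(block_number):
    """Return True on a hit; on a miss, insert the block (evicting LRU if the set is full)."""
    s = cache[block_number % NUM_SETS]  # block maps to exactly one set
    if block_number in s:
        s.remove(block_number)
        s.append(block_number)          # move to most-recently-used position
        return True
    if len(s) >= WAYS:
        s.pop(0)                        # evict the least recently used block
    s.append(block_number)
    return False

print(access(0))  # False (cold miss)
print(access(4))  # False; blocks 0 and 4 share set 0, but both now fit
print(access(0))  # True: a hit, unlike a direct-mapped cache where 4 would have evicted 0
```

The last access is the payoff of associativity: blocks 0 and 4 conflict in a direct-mapped cache of the same size, but coexist in a 2-way set.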