EECS 322 Computer Architecture Improving Memory Access 3/3 The Cache and Virtual Memory
Jan 03, 2016
Cache associativity Figure 7.15
Figure 7.15: where a block is placed, and which tags are searched, in three caches of eight blocks each:
Direct-mapped cache: blocks 0 to 7; one tag is searched
2-way set associative cache: sets 0 to 3; the tags of one set are searched
Fully associative cache: all tags are searched
Cache associativity
Figure 7.16: an eight-block cache, each block holding a tag and data, configured as:
One-way set associative (direct mapped): blocks 0 to 7
Two-way set associative: sets 0 to 3
Four-way set associative: sets 0 to 1
Eight-way set associative (fully associative)
Direct Mapping Cache addresses
Block address = floor(Byte address / Data bytes per cache block)
Cache index = (Block address) % (Cache size)
            = floor(Byte address / Data bytes per cache block) % (Cache size)
Cache size in blocks = (Cache size in bytes) / (Data bytes per cache block)
N-Way Set-Associative Mapping Cache addresses
An N-way set associative cache consists of a number of sets, where each set consists of N blocks
Block address = floor(Byte address / Data bytes per cache block)
Cache index (set #) = (Block address) % (Cache size)
                    = floor(Byte address / Data bytes per cache block) % (Cache size)
Cache size in sets = (Cache size in bytes) / (N × Data bytes per cache block)
Direct Mapped Cache
Suppose we have the following byte addresses: 0, 32, 0, 24, 32
Given a direct mapped cache consisting of 16 data bytes, where
  Data bytes per cache block = 4
  Cache size = 16 bytes / (4 bytes per block) = 4 blocks
The cache block addresses are: 0, 8, 0, 6, 8
The cache index addresses are: 0, 0, 0, 2, 0
For example, for byte address 24:
  Block address = floor(byte address / data bytes per block) = floor(24/4) = 6
  Cache index = 6 % cache size = 6 % 4 = 2
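The mapping above can be checked with a few lines of Python. This is a minimal sketch, assuming the 16-byte direct mapped cache with 4-byte blocks from the example; the function name `split_address` and its parameters are illustrative and not part of the slides.

```python
def split_address(byte_addr, block_bytes=4, num_blocks=4):
    """Return (block_address, cache_index, tag, byte_offset) for a direct mapped cache."""
    block_addr = byte_addr // block_bytes   # floor(byte address / data bytes per block)
    index = block_addr % num_blocks         # block address % cache size (in blocks)
    tag = block_addr // num_blocks          # remaining upper bits of the block address
    offset = byte_addr % block_bytes        # position of the byte inside the block
    return block_addr, index, tag, offset

addresses = [0, 32, 0, 24, 32]
block_addrs = [split_address(a)[0] for a in addresses]
indexes = [split_address(a)[1] for a in addresses]
print("block addresses:", block_addrs)
print("cache indexes:  ", indexes)
```

For byte address 24 the function returns block address 6, index 2, and tag 1, which is the `001 10 00` split used in the example trace.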
Direct Mapped Cache: Example
Byte address fields: Tag (3 bits) | Index (2 bits) | Byte offset (2 bits)

Instruction     Byte address   Result
lw $1,0($0)     000 00 00      Miss: valid
lw $2,32($0)    010 00 00      Miss: tag
lw $3,0($0)     000 00 00      Miss: tag
lw $4,24($0)    001 10 00      Miss: valid
lw $5,32($0)    010 00 00      Miss: tag

Cache contents after the five accesses (all entries start out invalid):
Index  Valid  Tag  Data
00     Y      010  Memory[010 00 00]
01     N
10     Y      001  Memory[001 10 00]
11     N

5/5 Misses
Page 571
2-Way Set associative cache
Again, suppose we have the following byte addresses: 0, 32, 0, 24, 32
Given a 2-way set associative cache of 16 data bytes, where
  Data bytes per cache block = 4
  Cache size = 16 bytes / (2 × 4 bytes per block) = 2 sets
The 2-way cache block addresses are: 0, 8, 0, 6, 8
The 2-way cache index addresses are: 0, 0, 0, 0, 0
Note: the direct mapped cache index addresses were: 0, 0, 0, 2, 0
For example, for byte address 24:
  Block address = floor(byte address / data bytes per block) = floor(24/4) = 6
  Cache index = 6 % cache size = 6 % 2 = 0
2-Way Set Associative Cache: Example
Byte address fields: Tag (4 bits) | Index (1 bit) | Byte offset (2 bits)

Instruction     Byte address   Result
lw $1,0($0)     0000 0 00      Miss: valid
lw $2,32($0)    0100 0 00      Miss: tag
lw $3,0($0)     0000 0 00      Hit!
lw $4,24($0)    0011 0 00      Miss: tag
lw $5,32($0)    0100 0 00      Miss: tag

Cache contents after the five accesses (all entries start out invalid):
Index  Valid  Tag   Data
0      Y      0100  Memory[0100 0 00]
       Y      0011  Memory[0011 0 00]
1      N
       N

4/5 Misses
Page 571
Fully associative cache
Again, suppose we have the following byte addresses: 0, 32, 0, 24, 32
Given a fully associative cache of 16 data bytes, where
  Data bytes per cache block = 4
  Cache size = 16 bytes / (4 bytes per block) = 4 blocks
The fully associative cache index addresses are: none! A block can be placed in any of the 4 entries, so the whole block address (0, 8, 0, 6, 8) is stored as the tag.
Fully Associative Cache: Example
Byte address fields: Tag (5 bits) | Byte offset (2 bits); there is no index field

Instruction     Byte address   Result
lw $1,0($0)     00000 00       Miss: valid
lw $2,32($0)    01000 00       Miss
lw $3,0($0)     00000 00       Hit!
lw $4,24($0)    00110 00       Miss
lw $5,32($0)    01000 00       Hit!

Cache contents after the five accesses (all entries start out invalid):
Valid  Tag    Data
Y      00000  Memory[00000 00]
Y      01000  Memory[01000 00]
Y      00110  Memory[00110 00]
N

3/5 Misses
Page 571
Cache comparisons: Examples
Direct Mapped cache: 5 out of 5 misses
2-Way Set associative cache: 4 out of 5 misses
Fully associative cache: 3 out of 5 misses
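The three miss counts can be reproduced with a small simulator. This is a sketch only: the slides do not name a replacement policy, so least-recently-used (LRU) replacement is assumed here, and `count_misses` is an illustrative name.

```python
from collections import OrderedDict

def count_misses(addresses, cache_bytes=16, block_bytes=4, ways=1):
    """Count misses for an N-way set associative cache (assumption: LRU replacement)."""
    num_sets = cache_bytes // (block_bytes * ways)
    # One OrderedDict per set; key order tracks recency (oldest entry first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        tag = block // num_sets
        entries = sets[block % num_sets]
        if tag in entries:
            entries.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(entries) == ways:         # set is full: evict the LRU block
                entries.popitem(last=False)
            entries[tag] = True
    return misses

trace = [0, 32, 0, 24, 32]
# ways=1 is direct mapped; ways=4 makes the 4-block cache fully associative.
results = {ways: count_misses(trace, ways=ways) for ways in (1, 2, 4)}
print(results)
```

With `ways=1` the cache has 4 sets of one block, with `ways=2` it has 2 sets, and with `ways=4` a single set, matching the three configurations compared above.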
Chip Area Speed
Bits in a Temporal Direct Mapped Cache
How many total bits are required for a cache with 64 KB (= 2^16 bytes) of data and one-word (= 4 bytes = 32 bits) blocks, assuming a 32-bit byte memory address?
Cache index width = log2(number of blocks) = log2(2^16 / 4) = log2(2^14) = 14 bits
Block address width = <byte address width> - log2(bytes per word) = 32 - 2 = 30 bits
Tag size = <block address width> - <cache index width> = 30 - 14 = 16 bits
Cache block size = <valid size> + <tag size> + <block data size> = 1 bit + 16 bits + 32 bits = 49 bits
Total size = <number of cache blocks> × <cache block size> = 2^14 × 49 bits = 784 × 2^10 bits = 784 Kbits = 98 KB
Overhead = 98 KB / 64 KB = 1.5 times
Page 550
Bits in a Spatial Direct Mapped Cache
How many total bits are required for a cache with 64 KB (= 2^16 bytes) of data and four-word (= 4 × 4 = 16 bytes = 4 × 32 = 128 bits) blocks, assuming a 32-bit byte memory address?
Cache index width = log2(number of blocks) = log2(2^16 / 16) = log2(2^12) = 12 bits
Block address width = <byte address width> - log2(block size) = 32 - 4 = 28 bits
Tag size = <block address width> - <cache index width> = 28 - 12 = 16 bits
Cache block size = <valid size> + <tag size> + <block data size> = 1 bit + 16 bits + 128 bits = 145 bits
Total size = <number of cache blocks> × <cache block size> = 2^12 × 145 bits = 593920 bits / (1024 × 8) = 72.5 KB
Overhead = 72.5 KB / 64 KB = 1.13 times
Page 570
Bits in a 2-Way Set Associative Cache
How many total bits are required for a cache with 64 KB (= 2^16 bytes) of data and four-word (= 4 × 4 = 16 bytes = 4 × 32 = 128 bits) blocks, assuming a 32-bit byte memory address?
Cache index width = log2(number of sets) = log2(2^16 / (2 × 16)) = log2(2^11) = 11 bits
Block address width = <byte address width> - log2(block size) = 32 - 4 = 28 bits (same!)
Tag size = <block address width> - <cache index width> = 28 - 11 = 17 bits
Cache set size = 2 × (<valid size> + <tag size> + <block data size>) = 2 × (1 bit + 17 bits + 128 bits) = 292 bits
Total size = <number of cache sets> × <cache set size> = 2^11 × 292 bits = 598016 bits / (1024 × 8) = 73 KB
Overhead = 73 KB / 64 KB = 1.14 times
Page 570
Bits in a 4-Way Set Associative Cache
How many total bits are required for a cache with 64 KB (= 2^16 bytes) of data and four-word (= 4 × 4 = 16 bytes = 4 × 32 = 128 bits) blocks, assuming a 32-bit byte memory address?
Cache index width = log2(number of sets) = log2(2^16 / (4 × 16)) = log2(2^10) = 10 bits
Block address width = <byte address width> - log2(block size) = 32 - 4 = 28 bits (same!)
Tag size = <block address width> - <cache index width> = 28 - 10 = 18 bits
Cache set size = 4 × (<valid size> + <tag size> + <block data size>) = 4 × (1 bit + 18 bits + 128 bits) = 588 bits
Total size = <number of cache sets> × <cache set size> = 2^10 × 588 bits = 602112 bits / (1024 × 8) = 73.5 KB
Overhead = 73.5 KB / 64 KB = 1.15 times
Page 570
Bits in a Fully Associative Cache
How many total bits are required for a cache with 64 KB (= 2^16 bytes) of data and four-word (= 4 × 4 = 16 bytes = 4 × 32 = 128 bits) blocks, assuming a 32-bit byte memory address?
Cache index width = 0 bits (there is no index)
Block address width = <byte address width> - log2(block size) = 32 - 4 = 28 bits (same!)
Tag size = <block address width> - <cache index width> = 28 - 0 = 28 bits
Cache block size = <valid size> + <tag size> + <block data size> = 1 bit + 28 bits + 128 bits = 157 bits
Total size = <number of cache blocks> × <cache block size> = 2^12 × 157 bits = 643072 bits / (1024 × 8) = 78.5 KB
Overhead = 78.5 KB / 64 KB = 1.23 times
Page 570
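The bit counts for these cache organizations follow one pattern, so they can be recomputed in one place. The sketch below assumes power-of-two sizes and one valid bit for every block; `cache_bits` and the configuration names are illustrative, not from the slides.

```python
def cache_bits(data_bytes, block_bytes, ways=1, addr_bits=32):
    """Return (index_bits, tag_bits, total_bits); assumes power-of-two sizes."""
    num_blocks = data_bytes // block_bytes
    num_sets = num_blocks // ways
    index_bits = num_sets.bit_length() - 1       # log2(number of sets)
    offset_bits = block_bytes.bit_length() - 1   # log2(bytes per block)
    tag_bits = addr_bits - offset_bits - index_bits
    # Assumption: each block stores 1 valid bit + tag + data.
    bits_per_block = 1 + tag_bits + 8 * block_bytes
    return index_bits, tag_bits, num_blocks * bits_per_block

DATA_BYTES = 64 * 1024
configs = {
    "temporal direct mapped": (4, 1),
    "spatial direct mapped": (16, 1),
    "2-way set associative": (16, 2),
    "4-way set associative": (16, 4),
    "fully associative": (16, DATA_BYTES // 16),  # one set holding every block
}
for name, (block_bytes, ways) in configs.items():
    index_bits, tag_bits, total = cache_bits(DATA_BYTES, block_bytes, ways)
    kbytes = total / (8 * 1024)
    print(f"{name}: index={index_bits} bits, tag={tag_bits} bits, "
          f"total={kbytes:.3f} KB, overhead={kbytes / 64:.2f}")
```

Raising the associativity at a fixed data size shrinks the index and widens the tag by the same number of bits, which is why the overhead grows with the number of ways.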
Bits in 64 KB Data Cache Summary
Cache Type               Index Size  Tag Size  Set Size  Overhead (total size / data size)
Temporal Direct Mapped   14 bits     16 bits   49 bits   1.5
Spatial Direct Mapped    12 bits     16 bits   145 bits  1.13
2-Way Set Associative    11 bits     17 bits   292 bits  1.14
4-Way Set Associative    10 bits     18 bits   588 bits  1.15
Fully Associative        0 bits      28 bits   157 bits  1.23
Miss Rate (values as listed on the slide): 5.4%, 1.9%, 1.5%, 1.5%
Page 570
Chip Area
Designing the Memory System
• Make reading multiple words easier by using banks of memory
• It can get a lot more complicated...
Figure 7.13: three memory organizations
a. One-word-wide memory organization: CPU, cache, bus, memory
b. Wide memory organization: CPU, multiplexor, cache, bus, memory
c. Interleaved memory organization: CPU, cache, bus, memory banks 0 to 3
Memory organizations Figure 7.13
One-word-wide memory organization
  Advantage: easy to implement, low hardware overhead
  Disadvantage: slow, 0.25 bytes/clock transfer rate
Wide memory organization
  Advantage: fastest, 0.94 bytes/clock transfer rate
  Disadvantage: wider bus and increase in cache access time
Interleaved memory organization
  Advantage: better, 0.80 bytes/clock transfer rate; banks are valuable on writes, since each bank can be written independently
  Disadvantage: more complex bus hardware
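The three transfer rates can be reproduced under a simple timing model. The slides do not state the timing parameters, so the values below are assumptions: 1 clock to send the address, 15 clocks per DRAM access, 1 clock per bus transfer, and a 4-word (16-byte) block, with the wide memory and the four interleaved banks each covering a whole block. The function name `bytes_per_clock` is illustrative.

```python
def bytes_per_clock(organization, words_per_block=4, bytes_per_word=4,
                    addr_cycles=1, dram_cycles=15, transfer_cycles=1):
    """Miss-penalty model for fetching one block; all timing values are assumptions."""
    if organization == "one-word-wide":
        # Every word pays a full DRAM access plus a bus transfer.
        cycles = addr_cycles + words_per_block * (dram_cycles + transfer_cycles)
    elif organization == "wide":
        # Memory and bus are as wide as the block: one access, one transfer.
        cycles = addr_cycles + dram_cycles + transfer_cycles
    elif organization == "interleaved":
        # Banks are accessed in parallel; words cross the bus one at a time.
        cycles = addr_cycles + dram_cycles + words_per_block * transfer_cycles
    else:
        raise ValueError(f"unknown organization: {organization}")
    return words_per_block * bytes_per_word / cycles

for org in ("one-word-wide", "wide", "interleaved"):
    print(f"{org}: {bytes_per_clock(org):.2f} bytes/clock")
```

Under these assumptions the miss penalties are 65, 17, and 20 clock cycles for a 16-byte block, which gives the 0.25, 0.94, and 0.80 bytes/clock figures quoted above.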
Chip Area Speed
Decreasing miss penalty with multilevel caches
Suppose we have a processor with
  CPI = 1.0
  Clock rate = 500 MHz (clock cycle = 2 ns)
  L1 cache miss rate = 5%
  DRAM access time = 200 ns
How much faster will the machine be if we add an L2 cache with
  access time = 20 ns (hit time = miss penalty)
  miss rate to main memory reduced to 2%
L1 to M miss penalty = 200 ns / (2 ns per clock cycle) = 100 clock cycles
L1 to L2 miss penalty = 20 ns / (2 ns per clock cycle) = 10 clock cycles
Page 576
Decreasing miss penalty with multilevel caches
The effective CPI with only the L1 cache and main memory (M):
  Total CPI = Base CPI + (memory stall cycles per instruction)
            = 1.0 + (5% × 100 cycles) = 6.0
The effective CPI with L1 and L2 caches:
  Total CPI = Base CPI + (L1 to L2 penalty) + (L2 to M penalty)
            = 1.0 + (5% × 10 cycles) + (2% × 100 cycles) = 3.5
CPU speedup = 6.0 / 3.5 = 1.7
Page 576
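The same arithmetic, written out as a short Python sketch. It assumes the 2% figure is the fraction of instructions that still miss all the way to main memory once the L2 cache is added; the variable names are illustrative.

```python
CLOCK_NS = 2.0          # 500 MHz clock
BASE_CPI = 1.0
L1_MISS_RATE = 0.05     # misses per instruction in the L1 cache
MEM_MISS_RATE = 0.02    # assumption: misses per instruction that reach main memory with L2
DRAM_NS = 200.0
L2_NS = 20.0

mem_penalty = DRAM_NS / CLOCK_NS   # cycles for a miss served by main memory
l2_penalty = L2_NS / CLOCK_NS      # cycles for a miss served by the L2 cache

# Only L1: every L1 miss goes to main memory.
cpi_l1_only = BASE_CPI + L1_MISS_RATE * mem_penalty
# L1 + L2: every L1 miss pays the L2 access; the remaining misses also pay main memory.
cpi_l1_l2 = BASE_CPI + L1_MISS_RATE * l2_penalty + MEM_MISS_RATE * mem_penalty

speedup = cpi_l1_only / cpi_l1_l2
print(f"CPI with L1 only: {cpi_l1_only:.1f}")
print(f"CPI with L1 + L2: {cpi_l1_l2:.1f}")
print(f"speedup: {speedup:.1f}")
```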
Virtual Memory
• Main memory can act as a cache for the secondary storage (disk)
Advantages:
– illusion of having more physical memory
– program relocation
– protection
Figure 7.20: address translation maps virtual addresses to physical addresses or to disk addresses
Pages: virtual memory blocks
Figure 7.21: translation of a virtual address into a physical address
Virtual address (bits 31 to 0): virtual page number (bits 31 to 12), page offset (bits 11 to 0)
Physical address (bits 29 to 0): physical page number (bits 29 to 12), page offset (bits 11 to 0)
Analogy: the automatic transmission of a car, where hardware/software does the shifting
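Page Tables
Figure 7.23: the page table is indexed by the virtual page number. Each entry holds a valid bit and a physical page or disk address: entries with valid = 1 point to pages in physical memory, entries with valid = 0 point to pages in disk storage.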
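Virtual Example: 32 Byte pages
Virtual address fields: Virtual Page Number (3 bits) | Page offset (5 bits)
Cache address fields: Tag (5 bits) | Index (1 bit) | Byte offset (2 bits)

Instruction     Virtual address   Cache address   Page table   Cache    Data comes from
lw $1,32($0)    001 00000         00100 0 00      P-Fault      C-Miss   Disk
lw $2,40($0)    001 01000         00101 0 00      P-Hit!       C-Miss   Memory

Page table (all entries start out invalid):
Page Index  Valid  Physical Page number
0           N
1           Y      111
...         N

Cache entries loaded (Valid, Tag, Data):
lw $1,32($0): Y 00100 Memory[00100 0 00]
lw $2,40($0): Y 00101 Memory[00101 0 00]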
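The page-table part of this example can be sketched in Python. The page table is modeled as a dictionary, and `load_page` is a hypothetical stand-in for the page-fault handler that brings a page in from disk and picks a physical page (here always 111, as on the slide).

```python
PAGE_BYTES = 32
OFFSET_BITS = 5                     # log2(32): low 5 bits are the page offset

page_table = {}                     # virtual page number -> physical page number

def load_page(vpn):
    """Hypothetical page-fault handler: always places the page in physical page 0b111."""
    return 0b111

def translate(vaddr):
    """Return (physical_address, page_fault) for a virtual byte address."""
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & (PAGE_BYTES - 1)
    page_fault = vpn not in page_table
    if page_fault:
        page_table[vpn] = load_page(vpn)    # entry becomes valid after the fault
    return (page_table[vpn] << OFFSET_BITS) | offset, page_fault

first = translate(32)    # lw $1,32($0): virtual page 001, offset 00000
second = translate(40)   # lw $2,40($0): virtual page 001, offset 01000
print(first, second)
```

The first access faults and installs the mapping; the second access falls in the same 32-byte page, so it finds a valid page table entry and only the page offset differs.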
Page Table Size
Suppose we have
  32-bit virtual address (= 2^32 bytes)
  4096 bytes per page (= 2^12 bytes)
  4 bytes per page table entry (= 2^2 bytes)
What is the total page table size?
Number of page table entries = 2^32 / 2^12 = 2^20 entries
Size of page table = 2^20 entries × 2^2 bytes = 2^22 bytes = 4 MB
4 megabytes just for page tables!! Too big
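The size calculation generalizes to other address widths and page sizes. A minimal sketch, assuming a single flat page table with one entry per virtual page; `page_table_bytes` is an illustrative name.

```python
def page_table_bytes(vaddr_bits=32, page_bytes=4096, entry_bytes=4):
    """Return (number_of_entries, table_size_in_bytes) for a flat page table."""
    entries = 2**vaddr_bits // page_bytes    # one entry per virtual page
    return entries, entries * entry_bytes

entries, size = page_table_bytes()
print(f"{entries} entries, {size / 2**20:.0f} MB")
```

Doubling the page size halves the number of entries, which is one way to trade page table size against finer-grained allocation.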
Page Tables
Figure 7.22: the page table register points to the page table, which is indexed by the 20-bit virtual page number (virtual address bits 31 to 12). Each entry holds a valid bit and an 18-bit physical page number; if the valid bit is 0, the page is not present in memory. The physical page number joined with the 12-bit page offset (bits 11 to 0) forms the 30-bit physical address (bits 29 to 0).
Making Address Translation Fast
• A cache for address translations: translation lookaside buffer (TLB)
Figure: each TLB entry holds a valid bit, a tag (the virtual page number), and a physical page address. The TLB caches entries of the page table, whose entries hold a valid bit and a physical page or disk address pointing into physical memory or disk storage.
TLBs and caches
Virtual address: TLB access
TLB hit?
  No: TLB miss exception
  Yes: physical address, then Write?
    No: try to read data from cache. Cache hit?
      No: cache miss stall
      Yes: deliver data to the CPU
    Yes: write access bit on?
      No: write protection exception
      Yes: write data into cache, update the tag, and put the data and the address into the write buffer
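The decision flow above can be sketched as a single Python function. This is a simplified model, not a hardware description: the TLB is a dictionary from virtual page number to (physical page number, write access bit), the cache is a set of physical addresses it currently holds, and 4096-byte pages are assumed. All names are illustrative.

```python
OFFSET_BITS = 12    # assumption: 4096-byte pages

def access(vaddr, is_write, tlb, cache):
    """Follow the TLB/cache flow for one memory access and return the outcome."""
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    if vpn not in tlb:
        return "TLB miss exception"
    ppn, write_access = tlb[vpn]
    paddr = (ppn << OFFSET_BITS) | offset       # physical address
    if is_write:
        if not write_access:
            return "write protection exception"
        cache.add(paddr)                        # write data, update the tag
        return "write into cache and write buffer"
    if paddr in cache:
        return "deliver data to the CPU"
    return "cache miss stall"

tlb = {0: (5, True), 1: (7, False)}   # vpn -> (ppn, write access bit)
cache = set()
outcomes = [
    access(0x0010, False, tlb, cache),   # readable page, nothing cached yet
    access(0x0010, True, tlb, cache),    # writable page
    access(0x0010, False, tlb, cache),   # now present in the cache
    access(0x1004, True, tlb, cache),    # page without write access
    access(0x2000, False, tlb, cache),   # no TLB entry for this page
]
print(outcomes)
```

The model omits what happens after each exception or stall (refilling the TLB, fetching the block from memory) and only classifies the access.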