EECS 322 Computer Architecture Improving Memory Access 3/3 The Cache and Virtual Memory
Jan 03, 2016

uma-sanford

Transcript
Page 1: EECS 322 Computer Architecture

EECS 322 Computer Architecture

Improving Memory Access 3/3

The Cache and Virtual Memory

Page 2: EECS 322 Computer Architecture

Cache associativity (Figure 7.15)

[Figure 7.15: where a block can be placed, and which entries are searched, in three caches with tag and data arrays: a direct-mapped cache (blocks 0 to 7), a 2-way set associative cache (sets 0 to 3), and a fully associative cache.]

Page 3: EECS 322 Computer Architecture

Cache associativity (Figure 7.16)

[Figure 7.16: an eight-block cache, each entry holding a tag and data, organized four ways: one-way set associative (direct mapped, blocks 0 to 7), two-way set associative (sets 0 to 3), four-way set associative (sets 0 and 1), and eight-way set associative (fully associative).]

Page 4: EECS 322 Computer Architecture

Direct Mapping Cache addresses

Block address = floor(Byte address / Data bytes per cache block)

Cache index = (Block address) % (Cache size)

Cache index = floor(Byte address / Data bytes per cache block) % (Cache size)

Cache size in blocks = (Cache size in bytes) / (Data bytes per cache block)

Page 5: EECS 322 Computer Architecture

N-Way Set-Associative Mapping Cache addresses

An N-way set associative cache consists of a number of sets, where each set consists of N blocks.

Block address = floor(Byte address / Data bytes per cache block)

Cache index (set #) = (Block address) % (Cache size)

Cache index (set #) = floor(Byte address / Data bytes per cache block) % (Cache size)

Cache size in N-way sets = (Cache size in bytes) / (N × Data bytes per cache block)

Page 6: EECS 322 Computer Architecture

Direct Mapped Cache

Suppose we have the following byte addresses: 0, 32, 0, 24, 32

Given a direct mapped cache consisting of 16 data bytes
  where Data bytes per cache block = 4
  where Cache size = 16 bytes / (4 bytes per block) = 4 blocks

The cache block addresses are: 0, 8, 0, 6, 8
The cache index addresses are: 0, 0, 0, 2, 0

For example, 24:
  Block address = floor(byte address / data bytes per block) = floor(24/4) = 6
  Cache index = 6 % Cache size = 6 % 4 = 2
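The mapping formulas can be checked with a few lines of Python. This is only a sketch: the function names and the `ways` parameter are illustrative choices, not part of the slides. With `ways=1` it reproduces the direct-mapped indices for the byte addresses 0, 32, 0, 24, 32, and with `ways=2` it gives the set numbers of a 2-way set associative cache of the same size.

```python
def block_address(byte_address, bytes_per_block):
    """Block address = floor(byte address / data bytes per block)."""
    return byte_address // bytes_per_block


def cache_index(byte_address, bytes_per_block, cache_bytes, ways=1):
    """Index (set number) = block address % number of sets.

    ways=1 is a direct-mapped cache; ways=N is an N-way set associative cache.
    """
    num_sets = cache_bytes // (ways * bytes_per_block)
    return block_address(byte_address, bytes_per_block) % num_sets


addresses = [0, 32, 0, 24, 32]
blocks = [block_address(a, 4) for a in addresses]
direct = [cache_index(a, 4, 16) for a in addresses]
two_way = [cache_index(a, 4, 16, ways=2) for a in addresses]

print(blocks)   # [0, 8, 0, 6, 8]
print(direct)   # [0, 0, 0, 2, 0]
print(two_way)  # [0, 0, 0, 0, 0]
```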

Page 7: EECS 322 Computer Architecture

Direct Mapped Cache: Example (Page 571)

Byte address fields: Tag (3 bits) | Index (2 bits) | Byte offset (2 bits)

Instruction      Byte address (tag index offset)   Result
lw $1,0($0)      000 00 00                         Miss: valid
lw $2,32($0)     010 00 00                         Miss: tag
lw $3,0($0)      000 00 00                         Miss: tag
lw $4,24($0)     001 10 00                         Miss: valid
lw $5,32($0)     010 00 00                         Miss: tag

5/5 Misses

Cache contents (all entries start with Valid = N), after the last access:

Index  Valid  Tag  Data
00     Y      010  Memory[010 00 00]
01     N
10     Y      001  Memory[001 10 00]
11     N

Page 8: EECS 322 Computer Architecture

2-Way Set associative cache

Again, suppose we have the following byte addresses: 0, 32, 0, 24, 32

Given a 2-way set associative cache of 16 data bytes
  where Data bytes per cache block = 4
  where Cache size = 16 bytes / (2 × 4 bytes per block) = 2 sets

The 2-way cache block addresses are: 0, 8, 0, 6, 8
The 2-way cache index addresses are: 0, 0, 0, 0, 0

Note: Direct cache index addresses were: 0, 0, 0, 2, 0

For example, 24:
  Block address = floor(byte address / data bytes per block) = floor(24/4) = 6
  Cache index = 6 % Cache size = 6 % 2 = 0

Page 9: EECS 322 Computer Architecture

2-Way Set Associative Cache: Example (Page 571)

Byte address fields: Tag (4 bits) | Index (1 bit) | Byte offset (2 bits)

Instruction      Byte address (tag index offset)   Result
lw $1,0($0)      0000 0 00                         Miss: valid
lw $2,32($0)     0100 0 00                         Miss: tag
lw $3,0($0)      0000 0 00                         Hit!
lw $4,24($0)     0011 0 00                         Miss: tag
lw $5,32($0)     0100 0 00                         Miss: tag

4/5 Misses

Cache contents: two sets (index 0 and 1) of two entries each, all starting with Valid = N. Every access maps to set 0, so set 1 stays invalid. Entries loaded into set 0, in order:

Valid  Tag   Data
Y      0000  Memory[0000 0 00]
Y      0100  Memory[0100 0 00]
Y      0011  Memory[0011 0 00]
Y      0100  Memory[0100 0 00]

Page 10: EECS 322 Computer Architecture

Fully associative cache

Again, suppose we have the following byte addresses: 0, 32, 0, 24, 32

Given a fully associative cache of 16 data bytes
  where Data bytes per cache block = 4
  where Cache size = 16 bytes / (4 bytes per block) = 4 blocks

The fully associative cache index addresses are: NONE!
A block may be placed in any of the 4 entries, so there is no index field and the whole block address (0, 8, 0, 6, 8) is stored as the tag.

Page 11: EECS 322 Computer Architecture

Fully Associative Cache: Example (Page 571)

Byte address fields: Tag (5 bits) | Byte offset (2 bits), with no index field

Instruction      Byte address (tag offset)   Result
lw $1,0($0)      00000 00                    Miss: valid
lw $2,32($0)     01000 00                    Miss
lw $3,0($0)      00000 00                    Hit!
lw $4,24($0)     00110 00                    Miss
lw $5,32($0)     01000 00                    Hit!

3/5 Misses

Cache contents (all entries start with Valid = N), after the last access:

Valid  Tag    Data
Y      00000  Memory[00000 00]
Y      01000  Memory[01000 00]
Y      00110  Memory[00110 00]
N

Page 12: EECS 322 Computer Architecture

Cache comparisons: Examples

Direct Mapped cache: 5 out of 5 misses

2-Way Set associative cache: 4 out of 5 misses

Fully associative cache: 3 out of 5 misses
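The three miss counts can be reproduced with a small simulator. The sketch below assumes least-recently-used (LRU) replacement within a set; the slides do not name a replacement policy, so that choice, along with the function name `count_misses`, is an assumption. It does yield the same 5, 4, and 3 misses for the byte addresses 0, 32, 0, 24, 32.

```python
from collections import OrderedDict


def count_misses(addresses, cache_bytes=16, bytes_per_block=4, ways=1):
    """Count misses for a cache with `ways` blocks per set.

    Assumes LRU replacement (not stated in the slides).
    ways=1 is direct mapped; ways = number of blocks is fully associative.
    """
    num_sets = cache_bytes // (ways * bytes_per_block)
    # One OrderedDict per set: keys are tags, oldest (least recent) first.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for address in addresses:
        block = address // bytes_per_block
        index = block % num_sets
        tag = block // num_sets
        entries = sets[index]
        if tag in entries:
            entries.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(entries) == ways:
                entries.popitem(last=False)  # evict the least recently used tag
            entries[tag] = True
    return misses


trace = [0, 32, 0, 24, 32]
direct_misses = count_misses(trace, ways=1)
two_way_misses = count_misses(trace, ways=2)
fully_misses = count_misses(trace, ways=4)
print(direct_misses, two_way_misses, fully_misses)  # 5 4 3
```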

Chip Area Speed

Page 13: EECS 322 Computer Architecture

Bits in a Temporal Direct Mapped Cache (Page 550)

How many total bits are required for a cache with 64 KB (= 2^16 bytes) of data and one-word (= 4 bytes = 32 bits) blocks, assuming a 32-bit byte memory address?

Cache index width = log2(number of blocks) = log2(2^16 / 4) = log2(2^14) = 14 bits

Block address width = <byte address width> - log2(block size in bytes) = 32 - 2 = 30 bits

Tag size = <block address width> - <cache index width> = 30 - 14 = 16 bits

Cache block size = <valid size> + <tag size> + <block data size> = 1 bit + 16 bits + 32 bits = 49 bits

Total size = <number of blocks> × <cache block size> = 2^14 × 49 bits = 784 × 2^10 bits = 784 Kbits = 98 KB

Overhead = 98 KB / 64 KB = 1.5 times the data size

Page 14: EECS 322 Computer Architecture

Bits in a Spatial Direct Mapped Cache (Page 570)

How many total bits are required for a cache with 64 KB (= 2^16 bytes) of data and four-word (= 4 × 4 = 16 bytes = 4 × 32 = 128 bits) blocks, assuming a 32-bit byte memory address?

Cache index width = log2(number of blocks) = log2(2^16 / 16) = log2(2^12) = 12 bits

Block address width = <byte address width> - log2(block size in bytes) = 32 - 4 = 28 bits

Tag size = <block address width> - <cache index width> = 28 - 12 = 16 bits

Cache block size = <valid size> + <tag size> + <block data size> = 1 bit + 16 bits + 128 bits = 145 bits

Total size = <number of blocks> × <cache block size> = 2^12 × 145 bits = 593920 bits / (1024 × 8) = 72.5 KB

Overhead = 72.5 KB / 64 KB = 1.13 times the data size

Page 15: EECS 322 Computer Architecture

Bits in a 2-Way Set Associative Cache (Page 570)

How many total bits are required for a cache with 64 KB (= 2^16 bytes) of data and four-word (= 4 × 4 = 16 bytes = 4 × 32 = 128 bits) blocks, assuming a 32-bit byte memory address?

Cache index width = log2(number of sets) = log2(2^16 / (2 × 16)) = log2(2^11) = 11 bits

Block address width = <byte address width> - log2(block size in bytes) = 32 - 4 = 28 bits (same!)

Tag size = <block address width> - <cache index width> = 28 - 11 = 17 bits

Cache set size = <valid size> + 2 × (<tag size> + <block data size>) = 1 bit + 2 × (17 bits + 128 bits) = 291 bits

Total size = <number of sets> × <cache set size> = 2^11 × 291 bits = 595968 bits / (1024 × 8) = 72.75 KB

Overhead = 72.75 KB / 64 KB = 1.14 times the data size

Page 16: EECS 322 Computer Architecture

Bits in a 4-Way Set Associative Cache (Page 570)

How many total bits are required for a cache with 64 KB (= 2^16 bytes) of data and four-word (= 4 × 4 = 16 bytes = 4 × 32 = 128 bits) blocks, assuming a 32-bit byte memory address?

Cache index width = log2(number of sets) = log2(2^16 / (4 × 16)) = log2(2^10) = 10 bits

Block address width = <byte address width> - log2(block size in bytes) = 32 - 4 = 28 bits (same!)

Tag size = <block address width> - <cache index width> = 28 - 10 = 18 bits

Cache set size = <valid size> + 4 × (<tag size> + <block data size>) = 1 bit + 4 × (18 bits + 128 bits) = 585 bits

Total size = <number of sets> × <cache set size> = 2^10 × 585 bits = 599040 bits / (1024 × 8) = 73.125 KB

Overhead = 73.125 KB / 64 KB = 1.14 times the data size
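The bit-count pattern for direct-mapped and N-way caches can be sketched as one function. It follows the convention written out in the set-size formula on these slides, one valid bit per set plus N × (tag + data); real designs usually keep a valid bit per block, so treat that as an assumption. The function name and return layout are illustrative. Evaluating that formula for 2 ways gives 1 + 2 × (17 + 128) = 291 bits per set.

```python
def cache_bits(data_bytes, block_bytes, ways=1, address_bits=32):
    """Return (index_bits, tag_bits, set_bits, total_bits, overhead).

    Sizes must be powers of two. Uses the slides' convention of a single
    valid bit per set (an assumption; hardware often has one per block).
    """
    num_sets = data_bytes // (ways * block_bytes)
    index_bits = num_sets.bit_length() - 1             # log2(number of sets)
    offset_bits = block_bytes.bit_length() - 1         # log2(block size in bytes)
    tag_bits = (address_bits - offset_bits) - index_bits
    set_bits = 1 + ways * (tag_bits + 8 * block_bytes)
    total_bits = num_sets * set_bits
    overhead = total_bits / (8 * data_bytes)           # total bits / data bits
    return index_bits, tag_bits, set_bits, total_bits, overhead


KB64 = 64 * 1024
one_word_direct = cache_bits(KB64, 4)
four_word_direct = cache_bits(KB64, 16)
two_way = cache_bits(KB64, 16, ways=2)
four_way = cache_bits(KB64, 16, ways=4)

for name, result in [("1-word direct", one_word_direct),
                     ("4-word direct", four_word_direct),
                     ("2-way", two_way),
                     ("4-way", four_way)]:
    print(name, result)
```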

Page 17: EECS 322 Computer Architecture

Bits in a Fully Associative Cache (Page 570)

How many total bits are required for a cache with 64 KB (= 2^16 bytes) of data and four-word (= 4 × 4 = 16 bytes = 4 × 32 = 128 bits) blocks, assuming a 32-bit byte memory address?

Cache index width = 0 bits

Block address width = <byte address width> - log2(block size in bytes) = 32 - 4 = 28 bits (same!)

Tag size = <block address width> - <cache index width> = 28 - 0 = 28 bits

Cache block size = <valid size> + <tag size> + <block data size> = 1 bit + 28 bits + 128 bits = 157 bits

Total size = <number of blocks> × <cache block size> = 2^12 × 157 bits = 643072 bits / (1024 × 8) = 78.5 KB

Overhead = 78.5 KB / 64 KB = 1.23 times the data size

Page 18: EECS 322 Computer Architecture

Bits in 64 KB Data Cache Summary (Page 570)

Cache Type                Index Size   Tag Size   Set Size    Overhead
Temporal Direct Mapped    14 bits      16 bits    49 bits     1.5
Spatial Direct Mapped     12 bits      16 bits    145 bits    1.13
2-Way Set Associative     11 bits      17 bits    291 bits    1.14
4-Way Set Associative     10 bits      18 bits    585 bits    1.14
Fully Associative         0 bits       28 bits    157 bits    1.23

Overhead is the ratio of total cache bits to data bits (a multiplier, not a percentage).

Miss rates listed on the slide: 5.4%, 1.9%, 1.5%, 1.5%

Chip Area

Page 19: EECS 322 Computer Architecture

Designing the Memory System

• Make reading multiple words easier by using banks of memory

• It can get a lot more complicated...

[Figure 7.13: three memory organizations.
a. One-word-wide memory organization: CPU, cache, bus, memory.
b. Wide memory organization: CPU, multiplexor, cache, bus, memory.
c. Interleaved memory organization: CPU, cache, bus, memory banks 0 to 3.]

Page 20: EECS 322 Computer Architecture

Memory organizations (Figure 7.13)

Wide memory organization
  Advantage: fastest, 0.94 bytes/clock transfer rate
  Disadvantage: wider bus and increase in cache access time

Interleaved memory organization
  Advantage: better, 0.80 bytes/clock transfer rate; banks are valuable on writes, since each bank can be written independently
  Disadvantage: more complex bus hardware

One-word-wide memory organization
  Advantage: easy to implement, low hardware overhead
  Disadvantage: slow, 0.25 bytes/clock transfer rate

Chip Area Speed
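The slide quotes transfer rates without the timing behind them. The sketch below shows one set of timing parameters that reproduces the three figures: 1 clock to send the address, 15 clocks per DRAM access, 1 clock to transfer a word, and a 4-word (16-byte) block. These values are assumptions that do not appear in the slides, and the function names are illustrative.

```python
# Assumed timing parameters (not stated in the slides).
ADDRESS_CYCLES = 1        # send the address to memory
DRAM_ACCESS_CYCLES = 15   # one DRAM access
TRANSFER_CYCLES = 1       # move one bus-width of data back
WORDS_PER_BLOCK = 4
BYTES_PER_WORD = 4


def miss_penalty(organization):
    """Clock cycles needed to fetch one 4-word block on a miss."""
    if organization == "one-word-wide":
        # Every word pays for its own DRAM access and transfer.
        return ADDRESS_CYCLES + WORDS_PER_BLOCK * (DRAM_ACCESS_CYCLES + TRANSFER_CYCLES)
    if organization == "wide":
        # Memory and bus are 4 words wide: one access, one transfer.
        return ADDRESS_CYCLES + DRAM_ACCESS_CYCLES + TRANSFER_CYCLES
    if organization == "interleaved":
        # Four banks access in parallel, then send one word at a time.
        return ADDRESS_CYCLES + DRAM_ACCESS_CYCLES + WORDS_PER_BLOCK * TRANSFER_CYCLES
    raise ValueError(f"unknown organization: {organization}")


def bytes_per_clock(organization):
    return WORDS_PER_BLOCK * BYTES_PER_WORD / miss_penalty(organization)


for org in ("one-word-wide", "interleaved", "wide"):
    print(f"{org}: {miss_penalty(org)} cycles, {bytes_per_clock(org):.2f} bytes/clock")
```

Under these assumptions the miss penalties are 65, 20, and 17 cycles, which round to 0.25, 0.80, and 0.94 bytes per clock.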

Page 21: EECS 322 Computer Architecture

Decreasing miss penalty with multilevel caches (Page 576)

Suppose we have a processor with
  CPI = 1.0
  Clock rate = 500 MHz (clock cycle = 2 ns)
  L1 cache miss rate = 5%
  DRAM access time = 200 ns

How much faster will the machine be if we add an L2 cache with
  L2 access time = 20 ns (hit time = miss penalty)
  Miss rate to main memory = 2%

L1 to M miss penalty = 200 ns / (2 ns per clock cycle) = 100 clock cycles

L1 to L2 miss penalty = 20 ns / (2 ns per clock cycle) = 10 clock cycles

Page 22: EECS 322 Computer Architecture

Decreasing miss penalty with multilevel caches

The effective CPI with only the L1 cache in front of main memory (M)

Total CPI = Base CPI + (memory stall cycles per instruction)

= 1.0 + (5% × 100 cycles) = 6.0

The effective CPI with L1 and L2 caches

Total CPI = Base CPI + (L1 to L2 penalty) + (L2 to M penalty)

= 1.0 + (5% × 10 cycles) + (2% × 100 cycles) = 3.5

CPU speedup = 6.0 / 3.5 = 1.7

Page 576
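The CPI arithmetic can be replayed in a few lines. This sketch uses the numbers from the example (2 ns clock, 200 ns DRAM, 20 ns L2, 5% L1 miss rate, 2% of accesses going all the way to main memory); the variable names are illustrative.

```python
CLOCK_NS = 2.0            # 500 MHz clock
BASE_CPI = 1.0
L1_MISS_RATE = 0.05       # fraction of instructions missing in L1
MEMORY_MISS_RATE = 0.02   # fraction still missing after L2 is added
DRAM_NS = 200.0
L2_NS = 20.0

memory_penalty = DRAM_NS / CLOCK_NS   # 100 clock cycles
l2_penalty = L2_NS / CLOCK_NS         # 10 clock cycles

# L1 only: every L1 miss pays the full main-memory penalty.
cpi_l1_only = BASE_CPI + L1_MISS_RATE * memory_penalty

# L1 + L2: every L1 miss pays the L2 penalty, and the misses that
# also miss in L2 pay the main-memory penalty on top of it.
cpi_with_l2 = BASE_CPI + L1_MISS_RATE * l2_penalty + MEMORY_MISS_RATE * memory_penalty

speedup = cpi_l1_only / cpi_with_l2
print(f"CPI with L1 only:   {cpi_l1_only:.1f}")
print(f"CPI with L1 and L2: {cpi_with_l2:.1f}")
print(f"Speedup:            {speedup:.1f}")
```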

Page 23: EECS 322 Computer Architecture

Virtual Memory

• Main memory can act as a cache for the secondary storage (disk)

Advantages:
- illusion of having more physical memory
- program relocation
- protection

[Figure 7.20: address translation maps virtual addresses either to physical addresses or to disk addresses.]

Page 24: EECS 322 Computer Architecture

Pages: virtual memory blocks

[Figure 7.21: translation of a virtual address to a physical address.
Virtual address: bits 31 to 12 = virtual page number, bits 11 to 0 = page offset.
Physical address: bits 29 to 12 = physical page number, bits 11 to 0 = page offset.
Translation replaces the virtual page number with the physical page number; the page offset is unchanged.]

The automatic transmission of a car: Hardware/Software does the shifting

Page 25: EECS 322 Computer Architecture

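Page Tables

[Figure 7.23: the page table is indexed by virtual page number. Each entry holds a valid bit and a physical page or disk address. Entries with Valid = 1 point into physical memory; entries with Valid = 0 point to disk storage. Valid bits shown: 1 1 1 1 0 1 1 0 1 1 0 1.]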

Page 26: EECS 322 Computer Architecture

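Virtual Example: 32 Byte pages

Virtual address fields: Virtual page number (3 bits) | Page offset (5 bits)
Cache address fields: Tag (5 bits) | Index (1 bit) | Byte offset (2 bits)

Instruction      Cache fields   Virtual page number, offset   Page table   Cache
lw $1,32($0)     00100 0 00     001 00000                     P-Fault      C-Miss
lw $2,40($0)     00101 0 00     001 01000                     P-Hit!       C-Miss

Page table (Page Index, Valid, Physical Page number): entries 0, 1, ... start with Valid = N. After the page fault, the entry for virtual page 001 becomes Y with physical page number 111, and the page is brought from disk into memory.

Cache (Index, Valid, Tag, Data): entries 0 and 1 start with Valid = N. The two misses load
Y 00100 Memory[000100 0 00]
Y 00101 Memory[000101 0 00]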

Page 27: EECS 322 Computer Architecture

Page Table Size

Suppose we have
  32-bit virtual address (= 2^32 bytes)
  4096 bytes per page (= 2^12 bytes)
  4 bytes per page table entry (= 2^2 bytes)

What is the total page table size?

Number of page table entries = 2^32 / 2^12 = 2^20 entries

Size of page table = 2^20 entries × 2^2 bytes = 2^22 bytes = 4 MB

4 Megabytes just for page tables!! Too big
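The same calculation as a short Python sketch, so other page sizes can be tried. The function name and its parameters are illustrative, and it assumes a single flat page table with one entry per virtual page, as in the example.

```python
def page_table_size(virtual_address_bits, page_bytes, entry_bytes):
    """Return (number of entries, table size in bytes) for a flat page table."""
    entries = 2 ** virtual_address_bits // page_bytes
    return entries, entries * entry_bytes


entries, size_bytes = page_table_size(32, 4096, 4)
print(entries)                  # 1048576  (2^20)
print(size_bytes // 2 ** 20)    # 4  (megabytes)
```

Doubling the page size halves the number of entries, which is one reason larger pages are attractive.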

Page 28: EECS 322 Computer Architecture

Page Tables

[Figure 7.22: the page table register points to the start of the page table. The 20-bit virtual page number (virtual address bits 31 to 12) selects a page table entry, which holds a valid bit and an 18-bit physical page number. If the valid bit is 0, the page is not present in memory. Otherwise the physical page number is joined with the 12-bit page offset (bits 11 to 0) to form the physical address (bits 29 to 0).]
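The lookup in the figure can be sketched in Python, assuming 4 KB pages (12-bit offset) and modelling the page table as a dictionary from virtual page number to a (valid, physical page number) pair. The names `translate` and `PageFault` and the sample table contents are hypothetical, chosen only for illustration.

```python
PAGE_OFFSET_BITS = 12                      # 4096-byte pages
OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1


class PageFault(Exception):
    """Raised when the valid bit is 0: the page is not present in memory."""


def translate(virtual_address, page_table):
    """Map a virtual address to a physical address through the page table."""
    vpn = virtual_address >> PAGE_OFFSET_BITS     # virtual page number
    offset = virtual_address & OFFSET_MASK        # page offset, unchanged
    valid, ppn = page_table.get(vpn, (False, None))
    if not valid:
        raise PageFault(f"virtual page {vpn:#x} is not in memory")
    return (ppn << PAGE_OFFSET_BITS) | offset


# Hypothetical table: virtual page 0 is in physical page 5, page 1 is on disk.
page_table = {0x0: (True, 0x5), 0x1: (False, None)}

physical = translate(0x00000ABC, page_table)
print(hex(physical))  # 0x5abc

try:
    translate(0x00001000, page_table)
    faulted = False
except PageFault:
    faulted = True
print(faulted)  # True
```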

Page 29: EECS 322 Computer Architecture

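Making Address Translation Fast

• A cache for address translations: translation lookaside buffer (TLB)

[Figure: the TLB holds a subset of the page table entries. Each TLB entry has a valid bit, a tag (the virtual page number), and a physical page address. The page table, indexed by virtual page number, holds a valid bit and a physical page or disk address for every page, pointing into physical memory or disk storage.]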

Page 30: EECS 322 Computer Architecture

TLBs and caches

[Flowchart: handling a memory access with a TLB and a cache.
1. Virtual address: TLB access.
2. TLB hit? No: TLB miss exception. Yes: the TLB supplies the physical address.
3. Write?
   No (read): try to read data from cache. Cache hit? No: cache miss stall. Yes: deliver data to the CPU.
   Yes (write): write access bit on? No: write protection exception. Yes: write data into cache, update the tag, and put the data and the address into the write buffer.]
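The decision sequence can be written out as a small Python sketch. It is heavily simplified: the TLB is a dictionary from virtual page number to (physical page number, write-access bit), the cache is a dictionary keyed by physical address, and a miss is reported rather than serviced. All names here (`access`, `TLBMiss`, `WriteProtection`) and the sample contents are hypothetical.

```python
PAGE_OFFSET_BITS = 12
OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1


class TLBMiss(Exception):
    """TLB miss exception: no translation cached for this virtual page."""


class WriteProtection(Exception):
    """Write protection exception: the write access bit is off."""


def access(virtual_address, tlb, cache, write_buffer, write=False, data=None):
    """Follow the TLB-then-cache path and return (outcome, value)."""
    vpn = virtual_address >> PAGE_OFFSET_BITS
    if vpn not in tlb:
        raise TLBMiss(hex(virtual_address))
    ppn, write_access = tlb[vpn]
    physical = (ppn << PAGE_OFFSET_BITS) | (virtual_address & OFFSET_MASK)

    if write:
        if not write_access:
            raise WriteProtection(hex(virtual_address))
        cache[physical] = data                  # write data, update the tag
        write_buffer.append((physical, data))   # queue the write to memory
        return "write", physical

    if physical in cache:
        return "hit", cache[physical]           # deliver data to the CPU
    return "miss stall", None                   # stall while the block is fetched


tlb = {0x2: (0x7, True), 0x3: (0x9, False)}
cache, write_buffer = {}, []

print(access(0x2010, tlb, cache, write_buffer, write=True, data=42))  # ('write', 28688)
print(access(0x2010, tlb, cache, write_buffer))                       # ('hit', 42)
print(access(0x2014, tlb, cache, write_buffer))                       # ('miss stall', None)
```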