Transcript
Page 1: Lecture 20, Cache Memories

15-447 Computer Architecture Fall 2007 ©

November 12th, 2007

Majd F. Sakr

msakr@qatar.cmu.edu

www.qatar.cmu.edu/~msakr/15447-f07/

CS-447– Computer Architecture

M,W 10-11:20am

Lecture 20: Cache Memories

Page 2: Processor-DRAM Memory Gap (latency)

[Figure: performance versus time on a log scale (1 to 1000), 1980 to 2000. CPU performance ("Moore's Law") grows at roughly 60%/yr. (2X/1.5 yr), while DRAM improves at roughly 9%/yr. (2X/10 yrs); the resulting Processor-Memory Performance Gap grows about 50% per year.]

Page 3: Locality

°A principle that makes having a memory hierarchy a good idea

° If an item is referenced,

temporal locality: it will tend to be referenced again soon

spatial locality: nearby items will tend to be referenced soon.
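
To make the two kinds of locality concrete, here is a minimal C sketch (an editor's illustration, not from the slides): the array walk touches consecutive addresses (spatial locality), while the loop index and the accumulator are reused on every iteration (temporal locality).

```c
#include <stdio.h>

int main(void) {
    int a[1024];
    for (int i = 0; i < 1024; i++)
        a[i] = i;

    long sum = 0;
    for (int i = 0; i < 1024; i++)  /* consecutive addresses: spatial locality */
        sum += a[i];                /* sum and i reused every pass: temporal locality */

    printf("sum = %ld\n", sum);
    return 0;
}
```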

Page 4: A View of the Memory Hierarchy

[Figure: the memory hierarchy from the upper level (faster, smaller) to the lower level (larger, slower): Regs, Cache, L2 Cache, Memory, Disk, Tape. The unit of transfer grows going down: instruction operands between registers and cache, blocks between the caches and memory, pages between memory and disk, files between disk and tape.]

Page 5:

Our initial focus: two levels (upper, lower)

block: minimum unit of data

hit: data requested is in the upper level

miss: data requested is not in the upper level

Why Does Code Have Locality?

Page 6: Cache Design

° How do we organize cache?

° Where does each memory address map to? (Remember that cache is a subset of memory, so multiple memory addresses map to the same cache location.)

° How do we know which elements are in cache?

° How do we quickly locate them?

Page 7: Block Size Tradeoff (1/3)

° Benefits of Larger Block Size

• Spatial Locality: if we access a given word, we’re likely to access other nearby words soon

• Very applicable with Stored-Program Concept: if we execute a given instruction, it’s likely that we’ll execute the next few as well

• Works nicely in sequential array accesses too

Page 8: Block Size Tradeoff (2/3)

° Drawbacks of Larger Block Size

• Larger block size means larger miss penalty

- on a miss, takes longer time to load a new block from next level

• If block size is too big relative to cache size, then there are too few blocks

- Result: miss rate goes up

° In general, minimize Average Access Time = Hit Time x Hit Rate + Miss Penalty x Miss Rate
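
As a quick illustration of the formula above, here is a small C sketch; the hit time, miss penalty, and miss rate are made-up values, not numbers from the lecture.

```c
#include <stdio.h>

/* Average Access Time = Hit Time x Hit Rate + Miss Penalty x Miss Rate */
static double avg_access_time(double hit_time, double hit_rate,
                              double miss_penalty, double miss_rate) {
    return hit_time * hit_rate + miss_penalty * miss_rate;
}

int main(void) {
    double hit_time     = 1.0;   /* cycles (assumed) */
    double miss_penalty = 50.0;  /* cycles (assumed) */
    double miss_rate    = 0.05;  /* assumed          */
    printf("average access time = %.2f cycles\n",
           avg_access_time(hit_time, 1.0 - miss_rate, miss_penalty, miss_rate));
    return 0;
}
```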

Page 9: Block Size Tradeoff (3/3)

° Hit Time = time to find and retrieve data from current level cache

°Miss Penalty = average time to retrieve data on a current level miss (includes the possibility of misses on successive levels of memory hierarchy)

°Hit Rate = % of requests that are found in current level cache

°Miss Rate = 1 - Hit Rate

Page 10: Block Size Tradeoff Conclusions

[Figure: three sketches plotted against block size. Miss penalty rises steadily as block size grows. Miss rate first falls (exploits spatial locality), then rises once blocks are so large that there are too few of them (fewer blocks compromises temporal locality). Average access time therefore has a minimum; past it, the increased miss penalty and miss rate dominate.]

Page 11: Cache Design

° How do we organize cache?

° Where does each memory address map to? (Remember that cache is a subset of memory, so multiple memory addresses map to the same cache location.)

° How do we know which elements are in cache?

° How do we quickly locate them?

Page 12: Direct Mapped Cache

[Figure: an 8-block direct-mapped cache with indices 000 through 111, next to memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, and 11101, all of which map to cache index 001 because they share the same low-order three bits.]

° Mapping: address modulo the number of blocks in the cache
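
A tiny C sketch of the mapping rule above, assuming an 8-block cache with one-word blocks as in the figure; all eight listed addresses land on cache index 001.

```c
#include <stdio.h>

int main(void) {
    const unsigned num_blocks = 8;   /* cache indices 000..111 */
    /* the memory addresses from the figure: 00001, 00101, ..., 11101 */
    unsigned addrs[] = {0x01, 0x05, 0x09, 0x0D, 0x11, 0x15, 0x19, 0x1D};
    for (int i = 0; i < 8; i++)
        printf("address %2u -> cache index %u\n",
               addrs[i], addrs[i] % num_blocks);   /* all print index 1 */
    return 0;
}
```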

Page 13: Direct-Mapped Cache (1/2)

° In a direct-mapped cache, each memory address is associated with one possible block within the cache

• Therefore, we only need to look in a single location in the cache for the data if it exists in the cache

• Block is the unit of transfer between cache and memory

Page 14: Direct-Mapped Cache (2/2)

° Cache Location 0 can be occupied by data from:

• Memory location 0, 4, 8, ...

• 4 blocks => any memory location that is a multiple of 4

[Figure: a 4-byte direct-mapped cache with cache indices 0 through 3 beside memory addresses 0 through F; memory locations 0, 4, 8, and C map to cache index 0.]

Page 15: Issues with Direct-Mapped

° Since multiple memory addresses map to the same cache index, how do we tell which one is in there?

° What if we have a block size > 1 byte?

° Answer: divide memory address into three fields

ttttttttttttttttt iiiiiiiiii oooo
tag (to check if we have the correct block) | index (to select the block) | byte offset (within the block)
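
A short C sketch of splitting an address into tag, index, and byte offset with shifts and masks. The 4-bit offset and 10-bit index widths are borrowed from the worked example a few slides later; any widths could be plugged in.

```c
#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 4
#define INDEX_BITS  10

int main(void) {
    uint32_t addr   = 0x12345678;                         /* arbitrary example address */
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);   /* byte within the block     */
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);  /* selects the block */
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS); /* checked against the stored tag */
    printf("tag = 0x%X, index = %u, offset = %u\n", tag, index, offset);
    return 0;
}
```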

Page 16: Direct-Mapped Cache Terminology

° All fields are read as unsigned integers.

° Index: specifies the cache index (which “row” of the cache we should look in)

° Offset: once we’ve found correct block, specifies which byte within the block we want

° Tag: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same location

Page 17: Caching Terminology

° When we try to read memory, 3 things can happen:

1. cache hit: cache block is valid and contains proper address, so read desired word

2. cache miss: nothing in cache in appropriate block, so fetch from memory

3. cache miss, block replacement: wrong data is in cache at the appropriate block, so discard it and fetch desired data from memory (the cache always holds a copy)
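
A minimal C sketch (an editor's illustration, not the lecture's code) of the three outcomes for one direct-mapped cache line with a valid bit and a tag; data movement is omitted.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS 1024

struct line { bool valid; uint32_t tag; };
static struct line cache[NUM_BLOCKS];

enum outcome { HIT, MISS_COLD, MISS_REPLACE };

static enum outcome lookup(uint32_t index, uint32_t tag) {
    struct line *l = &cache[index];
    if (l->valid && l->tag == tag)
        return HIT;                           /* 1. valid block with the proper tag      */
    enum outcome o = l->valid ? MISS_REPLACE  /* 3. wrong block there: discard, refill   */
                              : MISS_COLD;    /* 2. nothing there yet: fetch from memory */
    l->valid = true;                          /* block is (re)filled from memory */
    l->tag = tag;
    return o;
}

int main(void) {
    printf("%d ", lookup(3, 7));   /* prints 1: cold miss        */
    printf("%d ", lookup(3, 7));   /* prints 0: hit              */
    printf("%d\n", lookup(3, 9));  /* prints 2: replacement miss */
    return 0;
}
```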

Page 18: Direct-Mapped Cache Example (1/3)

° Suppose we have 16 KB of data in a direct-mapped cache with 4-word blocks

° Determine the size of the tag, index and offset fields if we’re using a 32-bit architecture

° Offset

• need to specify correct byte within a block

• block contains 4 words = 16 bytes = 2^4 bytes

• need 4 bits to specify correct byte

Page 19: Direct-Mapped Cache Example (2/3)

° Index: (~index into an "array of blocks")

• need to specify correct row in cache

• cache contains 16 KB = 2^14 bytes

• block contains 2^4 bytes (4 words)

• # blocks/cache = (bytes/cache) / (bytes/block)
                 = 2^14 bytes/cache / 2^4 bytes/block
                 = 2^10 blocks/cache

• need 10 bits to specify this many rows

Page 20: Direct-Mapped Cache Example (3/3)

° Tag: use remaining bits as tag

• tag length = addr length - offset - index = 32 - 4 - 10 = 18 bits

• so tag is leftmost 18 bits of memory address

° Why not full 32 bit address as tag?

• All bytes within block need same address (4b)

• Index must be same for every address within a block, so it’s redundant in tag check, thus can leave off to save memory (here 10 bits)
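
A small C sketch that redoes the arithmetic of this example (16 KB of data, 16-byte blocks, 32-bit addresses) and prints the 4-bit offset, 10-bit index, and 18-bit tag.

```c
#include <stdio.h>

/* log base 2 of x, assuming x is a power of two */
static unsigned log2u(unsigned x) {
    unsigned n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

int main(void) {
    unsigned addr_bits   = 32;
    unsigned cache_bytes = 16 * 1024;   /* 2^14 bytes of data  */
    unsigned block_bytes = 16;          /* 2^4 bytes (4 words) */

    unsigned offset_bits = log2u(block_bytes);                    /* 4  */
    unsigned index_bits  = log2u(cache_bytes / block_bytes);      /* 10 */
    unsigned tag_bits    = addr_bits - offset_bits - index_bits;  /* 18 */

    printf("offset = %u, index = %u, tag = %u bits\n",
           offset_bits, index_bits, tag_bits);
    return 0;
}
```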

Page 21: Direct Mapped Cache (for MIPS)

[Figure: address (showing bit positions) for a 1024-entry direct-mapped cache with one-word blocks. Bits 1:0 are the byte offset, bits 11:2 form the 10-bit index selecting one of entries 0 through 1023, and bits 31:12 form the 20-bit tag. Each entry holds a valid bit, a 20-bit tag, and 32 bits of data; Hit is asserted when the selected entry is valid and its stored tag equals (=) the address tag.]

Page 22: Hits vs. Misses

° Read hits
• this is what we want!

° Read misses
• stall the CPU, fetch block from memory, deliver to cache, restart

Page 23: Hits vs. Misses

° Write hits:
• can replace data in cache and memory (write-through)
• write the data only into the cache and update memory later (write-back)

° Write misses:
• read the entire block into the cache, then write the word
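
A hedged C sketch contrasting the two write-hit policies above for a single cache line. The dirty bit is the usual bookkeeping for write-back and is the editor's addition; it is not mentioned on the slide.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct line { bool valid, dirty; uint32_t tag, data; };

static uint32_t memory[1 << 20];     /* word-addressed backing memory (assumed) */

/* Write-through: update the cache line and memory on every write hit. */
static void write_through_hit(struct line *l, uint32_t word_addr, uint32_t value) {
    l->data = value;
    memory[word_addr] = value;       /* memory is always kept up to date */
}

/* Write-back: update only the cache line now; memory is updated later,
 * when the dirty line is evicted. */
static void write_back_hit(struct line *l, uint32_t value) {
    l->data = value;
    l->dirty = true;                 /* remember that memory is stale */
}

static void evict(struct line *l, uint32_t word_addr) {
    if (l->valid && l->dirty)
        memory[word_addr] = l->data; /* the deferred write finally happens */
    l->valid = l->dirty = false;
}

int main(void) {
    struct line l = { .valid = true };
    write_back_hit(&l, 42);
    printf("before evict: memory[5] = %u\n", memory[5]);   /* still 0 */
    evict(&l, 5);
    printf("after evict:  memory[5] = %u\n", memory[5]);   /* now 42  */
    l.valid = true;                  /* pretend the line was refilled */
    write_through_hit(&l, 6, 7);
    printf("write-through: memory[6] = %u\n", memory[6]);  /* 7 immediately */
    return 0;
}
```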

Page 24: Hardware Issues

° Make reading multiple words easier by using banks of memory

[Figure: three ways of connecting the CPU and cache to memory over a bus: a. one-word-wide memory organization, b. wide memory organization with a multiplexor between the cache and a wide memory, c. interleaved memory organization with four banks (Memory bank 0 through Memory bank 3).]

Page 25: Performance

° Increasing the block size tends to decrease miss rate:

[Figure: miss rate (0% to 40%) versus block size (4 to 256 bytes) for total cache sizes of 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB. Larger blocks lower the miss rate up to a point; for the smallest caches the miss rate climbs again at the largest block sizes.]

Page 26: Performance

Use split caches because there is more spatial locality in code:

Program | Block size in words | Instruction miss rate | Data miss rate | Effective combined miss rate
gcc     | 1 | 6.1% | 2.1% | 5.4%
gcc     | 4 | 2.0% | 1.7% | 1.9%
spice   | 1 | 1.2% | 1.3% | 1.2%
spice   | 4 | 0.3% | 0.6% | 0.4%
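
One way to read the "effective combined miss rate" column is as a weighted average of the instruction and data miss rates. The sketch below uses an assumed 75%/25% split between instruction and data references purely for illustration; the lecture does not state the actual mix.

```c
#include <stdio.h>

int main(void) {
    double instr_miss = 0.020;   /* gcc, 4-word blocks, from the table */
    double data_miss  = 0.017;
    double instr_frac = 0.75;    /* assumed fraction of references that are instruction fetches */

    double combined = instr_frac * instr_miss + (1.0 - instr_frac) * data_miss;
    printf("combined miss rate = %.1f%%\n", combined * 100.0);
    return 0;
}
```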

Page 27: Performance

° Simplified model:

execution time = (execution cycles + stall cycles) x cycle time

stall cycles = # of instructions x miss ratio x miss penalty

° Two ways of improving performance:
• decreasing the miss ratio
• decreasing the miss penalty

What happens if we increase block size?
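
A small C sketch of the simplified model above; every parameter is an assumed, made-up value rather than a measurement from the lecture.

```c
#include <stdio.h>

int main(void) {
    double instructions = 1e9;     /* assumed instruction count        */
    double base_cpi     = 1.0;     /* assumed execution cycles / instr */
    double miss_ratio   = 0.02;    /* assumed misses per instruction   */
    double miss_penalty = 100.0;   /* assumed cycles per miss          */
    double cycle_time   = 1e-9;    /* 1 GHz clock (assumed)            */

    double execution_cycles = instructions * base_cpi;
    double stall_cycles     = instructions * miss_ratio * miss_penalty;
    double execution_time   = (execution_cycles + stall_cycles) * cycle_time;

    printf("execution time = %.2f s\n", execution_time);  /* (1e9 + 2e9) x 1e-9 = 3.00 s */
    return 0;
}
```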

Page 28: Cache Complexities

° Not always easy to understand implications of caches:

[Figures: theoretical behavior of Radix sort vs. Quicksort (vertical scale 0 to 1200) and observed behavior of Radix sort vs. Quicksort (vertical scale 0 to 2000), each plotted against size (K items to sort, from 4 to 4096).]

Page 29: Cache Complexities

° Here is why:

[Figure: Radix sort vs. Quicksort plotted against size (K items to sort, from 4 to 4096), vertical scale 0 to 5.]

Page 30: Cache Complexities

° Memory system performance is often the critical factor

• multilevel caches and pipelined processors make it harder to predict outcomes

• Compiler optimizations to increase locality sometimes hurt ILP

°Difficult to predict best algorithm: need experimental data

Page 31: Summary

°The :

Page 32:

Our initial focus: two levels (upper, lower)

block: minimum unit of data

hit: data requested is in the upper level

miss: data requested is not in the upper level

Why Does Code Have Locality?

Page 33: Cache

° Two issues:
• How do we know if a data item is in the cache?
• If it is, how do we find it?

° Our first example:
• block size is one word of data
• "direct mapped"

For each item of data at the lower level, there is exactly one location in the cache where it might be.

e.g., lots of items at the lower level share locations in the upper level

Page 34: Cache Design

° How do we organize cache?

° Where does each memory address map to? (Remember that cache is a subset of memory, so multiple memory addresses map to the same cache location.)

° How do we know which elements are in cache?

° How do we quickly locate them?

Page 35: Block Size Tradeoff (1/3)

° Benefits of Larger Block Size

• Spatial Locality: if we access a given word, we’re likely to access other nearby words soon

• Very applicable with Stored-Program Concept: if we execute a given instruction, it’s likely that we’ll execute the next few as well

• Works nicely in sequential array accesses too

Page 36: Block Size Tradeoff (2/3)

° Drawbacks of Larger Block Size

• Larger block size means larger miss penalty

- on a miss, takes longer time to load a new block from next level

• If block size is too big relative to cache size, then there are too few blocks

- Result: miss rate goes up

° In general, minimize Average Access Time = Hit Time x Hit Rate + Miss Penalty x Miss Rate

Page 37: Block Size Tradeoff (3/3)

° Hit Time = time to find and retrieve data from current level cache

°Miss Penalty = average time to retrieve data on a current level miss (includes the possibility of misses on successive levels of memory hierarchy)

°Hit Rate = % of requests that are found in current level cache

°Miss Rate = 1 - Hit Rate

Page 38: Block Size Tradeoff Conclusions

[Figure: three sketches plotted against block size. Miss penalty rises steadily as block size grows. Miss rate first falls (exploits spatial locality), then rises once blocks are so large that there are too few of them (fewer blocks compromises temporal locality). Average access time therefore has a minimum; past it, the increased miss penalty and miss rate dominate.]

Page 39: Caching Terminology

° When we try to read memory, 3 things can happen:

1. cache hit: cache block is valid and contains proper address, so read desired word

2. cache miss: nothing in cache in appropriate block, so fetch from memory

3. cache miss, block replacement: wrong data is in cache at the appropriate block, so discard it and fetch desired data from memory (the cache always holds a copy)

Page 40: Cache Design

° How do we organize cache?

° Where does each memory address map to? (Remember that cache is a subset of memory, so multiple memory addresses map to the same cache location.)

° How do we know which elements are in cache?

° How do we quickly locate them?

Page 41: Direct Mapped Cache

[Figure: an 8-block direct-mapped cache with indices 000 through 111, next to memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, and 11101, all of which map to cache index 001 because they share the same low-order three bits.]

° Mapping: address modulo the number of blocks in the cache

Page 42: Direct-Mapped Cache (1/2)

° In a direct-mapped cache, each memory address is associated with one possible block within the cache

• Therefore, we only need to look in a single location in the cache for the data if it exists in the cache

• Block is the unit of transfer between cache and memory

Page 43: Direct-Mapped Cache (2/2)

° Cache Location 0 can be occupied by data from:

• Memory location 0, 4, 8, ...

• 4 blocks => any memory location that is a multiple of 4

[Figure: a 4-byte direct-mapped cache with cache indices 0 through 3 beside memory addresses 0 through F; memory locations 0, 4, 8, and C map to cache index 0.]

Page 44: Issues with Direct-Mapped

° Since multiple memory addresses map to the same cache index, how do we tell which one is in there?

° What if we have a block size > 1 byte?

° Answer: divide memory address into three fields

ttttttttttttttttt iiiiiiiiii oooo
tag (to check if we have the correct block) | index (to select the block) | byte offset (within the block)

Page 45: Direct-Mapped Cache Terminology

° All fields are read as unsigned integers.

° Index: specifies the cache index (which “row” of the cache we should look in)

° Offset: once we’ve found correct block, specifies which byte within the block we want

° Tag: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same location

Page 46: Direct-Mapped Cache Example (1/3)

° Suppose we have 16 KB of data in a direct-mapped cache with 4-word blocks

° Determine the size of the tag, index and offset fields if we’re using a 32-bit architecture

° Offset

• need to specify correct byte within a block

• block contains 4 words = 16 bytes = 2^4 bytes

• need 4 bits to specify correct byte

Page 47: Direct-Mapped Cache Example (2/3)

° Index: (~index into an "array of blocks")

• need to specify correct row in cache

• cache contains 16 KB = 2^14 bytes

• block contains 2^4 bytes (4 words)

• # blocks/cache = (bytes/cache) / (bytes/block)
                 = 2^14 bytes/cache / 2^4 bytes/block
                 = 2^10 blocks/cache

• need 10 bits to specify this many rows

Page 48: Direct-Mapped Cache Example (3/3)

° Tag: use remaining bits as tag

• tag length = addr length - offset - index = 32 - 4 - 10 = 18 bits

• so tag is leftmost 18 bits of memory address

° Why not full 32 bit address as tag?

• All bytes within block need same address (4b)

• Index must be same for every address within a block, so it’s redundant in tag check, thus can leave off to save memory (here 10 bits)

Page 49: Direct Mapped Cache (for MIPS)

[Figure: address (showing bit positions) for a 1024-entry direct-mapped cache with one-word blocks. Bits 1:0 are the byte offset, bits 11:2 form the 10-bit index selecting one of entries 0 through 1023, and bits 31:12 form the 20-bit tag. Each entry holds a valid bit, a 20-bit tag, and 32 bits of data; Hit is asserted when the selected entry is valid and its stored tag equals (=) the address tag.]

Page 50: Hits vs. Misses

° Read hits
• this is what we want!

° Read misses
• stall the CPU, fetch block from memory, deliver to cache, restart

Page 51: Hits vs. Misses

° Write hits:
• can replace data in cache and memory (write-through)
• write the data only into the cache and update memory later (write-back)

° Write misses:
• read the entire block into the cache, then write the word

Page 52: Hardware Issues

° Make reading multiple words easier by using banks of memory

[Figure: three ways of connecting the CPU and cache to memory over a bus: a. one-word-wide memory organization, b. wide memory organization with a multiplexor between the cache and a wide memory, c. interleaved memory organization with four banks (Memory bank 0 through Memory bank 3).]

Page 53: Performance

° Increasing the block size tends to decrease miss rate:

[Figure: miss rate (0% to 40%) versus block size (4 to 256 bytes) for total cache sizes of 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB. Larger blocks lower the miss rate up to a point; for the smallest caches the miss rate climbs again at the largest block sizes.]

Page 54: Performance

Use split caches because there is more spatial locality in code:

Program | Block size in words | Instruction miss rate | Data miss rate | Effective combined miss rate
gcc     | 1 | 6.1% | 2.1% | 5.4%
gcc     | 4 | 2.0% | 1.7% | 1.9%
spice   | 1 | 1.2% | 1.3% | 1.2%
spice   | 4 | 0.3% | 0.6% | 0.4%

Page 55: Performance

° Simplified model:

execution time = (execution cycles + stall cycles) x cycle time

stall cycles = # of instructions x miss ratio x miss penalty

° Two ways of improving performance:
• decreasing the miss ratio
• decreasing the miss penalty

What happens if we increase block size?

Page 56: Cache Complexities

° Not always easy to understand implications of caches:

[Figures: theoretical behavior of Radix sort vs. Quicksort (vertical scale 0 to 1200) and observed behavior of Radix sort vs. Quicksort (vertical scale 0 to 2000), each plotted against size (K items to sort, from 4 to 4096).]

Page 57: Cache Complexities

° Here is why:

[Figure: Radix sort vs. Quicksort plotted against size (K items to sort, from 4 to 4096), vertical scale 0 to 5.]

Page 58: Cache Complexities

° Memory system performance is often the critical factor

• multilevel caches and pipelined processors make it harder to predict outcomes

• Compiler optimizations to increase locality sometimes hurt ILP

°Difficult to predict best algorithm: need experimental data