Computer Organization: A Programmer's Perspective The Memory Hierarchy Gal A. Kaminka [email protected]

Computer Organization: A Programmer's Perspective · Computer Organization: A Programmer's Perspective Based on class notes by Bryant and O'Hallaron 8 Memory Hierarchies Some fundamental

Mar 10, 2021


Computer Organization: A Programmer's Perspective

The Memory Hierarchy

Gal A. Kaminka [email protected]


Computer Organization: A Programmer's Perspective Based on class notes by Bryant and O'Hallaron 2

The CPU-Memory Gap

The gap widens between DRAM, disk, and CPU speeds.

[Figure: time in ns (log scale) versus year, 1985 to 2015, with curves for disk seek time, SSD access time, DRAM access time, SRAM access time, CPU cycle time, and effective CPU cycle time.]


Key Question: How to have fast, cheap memory?


Principle of Locality

Programs tend to reuse data and instructions that are near those they have used recently, or that are the same as those they have used recently.

Temporal locality: Recently referenced items are likely to be referenced in the near future.

Spatial locality: Items with nearby addresses tend to be referenced close together in time.


Locality Example

    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;

Data
  - Reference array elements in succession (stride-1 reference pattern): spatial locality
  - Reference sum each iteration: temporal locality

Instructions
  - Reference instructions in sequence: spatial locality
  - Cycle through loop repeatedly: temporal locality


Locality Qualitative Estimation

Question: Does this function have good locality?

    int sumarray(int a[M][M])
    {
        int i, j, sum = 0;

        for (i = 0; i < M; i++)
            for (j = 0; j < M; j++)
                sum += a[i][j];
        return sum;
    }


Locality Example

Question: Does this function have good locality?

    int sumarray(int a[M][M])
    {
        int i, j, sum = 0;

        for (j = 0; j < M; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }

Important skill for a professional programmer: be able to look at code and get a qualitative sense of its locality.


Memory Hierarchies

Some fundamental and enduring properties of hardware and software:
  - Fast storage technologies cost more per byte and have less capacity.
  - The gap between CPU and main memory speed is widening.
  - Well-written programs tend to exhibit good locality.

These properties complement each other beautifully.

They suggest an approach for organizing memory and storage systems known as a memory hierarchy.


Example Memory Hierarchy

Smaller, faster, and costlier (per byte) storage devices sit at the top; larger, slower, and cheaper (per byte) storage devices sit at the bottom.

L0: Regs. CPU registers hold words retrieved from the L1 cache.
L1: L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache.
L2: L2 cache (SRAM). L2 cache holds cache lines retrieved from the L3 cache.
L3: L3 cache (SRAM). L3 cache holds cache lines retrieved from main memory.
L4: Main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L5: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote servers.
L6: Remote secondary storage (e.g., Web servers).


Caches

Cache: A smaller, faster storage that acts as a staging area for a subset of the data in a larger, slower device.

Fundamental idea of a memory hierarchy:
  - For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.

Why do memory hierarchies work?
  - Programs tend to access the data at level k more often than they access the data at level k+1.
  - Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
  - Net effect: A large pool of memory that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.


Caching in a Memory Hierarchy

Level k+1: The larger, slower, cheaper storage device at level k+1 is partitioned into blocks (numbered 0 to 15 in the figure).

Data is copied between levels in block-sized transfer units (blocks 4 and 10 in the figure).

Level k: The smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1 (blocks 8, 9, 14, and 3 in the figure).


General Caching Concepts

Program needs object d, which is stored in some block b.

Cache hit
  - Program finds b in the cache at level k. E.g., block 14.

Cache miss
  - b is not at level k, so level k cache must fetch it from level k+1. E.g., block 12.
  - If level k cache is full, then some current block must be replaced (evicted). Which one is the “victim”?
    - Placement policy: where can the new block go? E.g., b mod 4
    - Replacement policy: which block should be evicted? E.g., LRU

[Figure: level k holds blocks 4*, 9, 14, and 3; level k+1 holds blocks 0 to 15. Request 14 hits at level k. Request 12 misses, so block 12 is fetched from level k+1 and replaces block 4*.]


General Caching Concepts

Types of cache misses:

Cold (compulsory) miss
  - Cold misses occur because the cache is empty.

Conflict miss
  - Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k.
    - E.g., block i at level k+1 must be placed in block (i mod 4) at level k.
  - Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block.
    - E.g., referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.

Capacity miss
  - Occurs when the set of active cache blocks (working set) is larger than the cache.


Cache Memories

Cache memories are small, fast SRAM-based memories managed automatically in hardware.
  - Hold frequently accessed blocks of main memory.

CPU looks first for data in L1, then in L2, then in main memory.

Typical bus structure:

[Figure: the CPU chip contains the register file, ALU, L1 cache, and bus interface. The cache bus leads to the L2 cache, the system bus leads to the I/O bridge, and the memory bus leads from the I/O bridge to main memory.]


General Organization of a Cache Memory

Cache is an array of sets. Each set contains one or more lines. Each line holds a block of data.

  - S = 2^s sets
  - E lines per set
  - B = 2^b bytes per cache block
  - t tag bits per line
  - 1 valid bit per line

Cache size: C = B x E x S data bytes

[Figure: sets 0 through S-1; each set has E lines, and each line has a valid bit, a tag, and a block of bytes 0 through B-1.]


Addressing Caches

Address A (m bits, numbered m-1 down to 0):

    <tag> (t bits) | <set index> (s bits) | <block offset> (b bits)

[Figure: the cache drawn as sets 0 through S-1; each line has a valid bit (v), a tag, and a block of bytes 0 through B-1.]

The word at address A is in the cache if the tag bits in one of the <valid> lines in set <set index> match <tag>.

The word contents begin at offset <block offset> bytes from the beginning of the block.


Direct-Mapped Cache

Simplest kind of cache.

Characterized by exactly one line per set (E=1).

[Figure: sets 0 through S-1, each holding a single line with a valid bit, a tag, and a cache block.]


Accessing Direct-Mapped Caches

Set selection
  - Use the set index bits to determine the set of interest.

[Figure: the address is divided into tag (t bits), set index (s bits), and block offset (b bits), from bit m-1 down to bit 0. The set index bits 00001 select set 1 among sets 0 through S-1.]


Accessing Direct-Mapped Caches

Line matching and word selection
  - Line matching: Find a valid line in the selected set with a matching tag.
  - Word selection: Then extract the word (here 32 bits, made from the 4 bytes B0..B3).

(1) The valid bit must be set.
(2) The tag bits in the cache line must match the tag bits in the address.
(3) If (1) and (2), then cache hit, and block offset selects starting byte.

[Figure: the address has tag 0110, set index i, and block offset 100. The line in selected set (i) has valid bit 1 and tag 0110; its block holds bytes 0 to 7, with B0 to B3 in bytes 4 to 7.]


Direct-Mapped Cache Simulation

M=16 byte addresses, B=2 bytes/block, S=4 sets, E=1 entry/set

Address bits: t=1 (tag), s=2 (set index), b=1 (block offset)

Address trace (reads, binary in brackets): 0 [0000], 1 [0001], 13 [1101], 8 [1000], 0 [0000]

(1) 0 [0000] (cold miss): set 0 loads v=1, tag=0, data M[0] M[1]
(2) 1 [0001] (hit!): set 0 already holds v=1, tag=0, data M[0] M[1]
(3) 13 [1101] (cold miss): set 2 loads v=1, tag=1, data M[12] M[13]
(4) 8 [1000] (conflict miss): set 0 is replaced with v=1, tag=1, data M[8] M[9]
(5) 0 [0000] (conflict miss): set 0 is replaced with v=1, tag=0, data M[0] M[1]


Why Use Middle Bits as Index?

High-Order Bit Indexing
  - Adjacent memory lines would map to same cache entry
  - Poor use of spatial locality

Middle-Order Bit Indexing
  - Consecutive memory lines map to different cache lines
  - Can hold C-byte region of address space in cache at one time

[Figure: a 4-line cache (lines 00, 01, 10, 11) next to two copies of the 16 memory lines 0000 to 1111, one mapped with high-order bit indexing and one with middle-order bit indexing.]


Set Associative Caches

Characterized by more than one line per set.

[Figure: sets 0 through S-1 with E=2 lines per set; each line has a valid bit, a tag, and a cache block.]


Accessing Set Associative Caches

Set selection
  - Identical to direct-mapped cache

[Figure: the address is divided into tag (t bits), set index (s bits), and block offset (b bits), from bit m-1 down to bit 0. The set index bits 00001 select set 1, which holds two lines.]


Accessing Set Associative Caches

Line matching and word selection
  - Must compare the tag in each valid line in the selected set.

(1) The valid bit must be set.
(2) The tag bits in one of the cache lines must match the tag bits in the address.
(3) If (1) and (2), then cache hit, and block offset selects starting byte. The four bytes here make up a full 32-bit word.

[Figure: the address has tag 0110, set index i, and block offset 100. Selected set (i) holds two valid lines with tags 1001 and 0110; the line with tag 0110 matches, and its block holds bytes 0 to 7, with b0 to b3 in bytes 4 to 7.]


What about writes?

Multiple copies of data exist: L1, L2, L3, main memory, disk

What to do on a write-hit?
  - Write-through (write immediately to one level down)
  - Write-back (defer write to one level down until replacement of line)
    - Need a dirty bit (line different from memory or not)

What to do on a write-miss?
  - Write-allocate (load into cache, update line in cache)
    - Good if more writes to the location follow
  - No-write-allocate (writes straight to one level down, does not load into cache)

Typical
  - Write-through + No-write-allocate
  - Write-back + Write-allocate


Intel Pentium Cache Hierarchy

Levels shown (the figure also marks the processor chip boundary):
  - Regs.
  - L1 Data: 16 KB, 4-way assoc, write-through, 32B lines, 1 cycle latency
  - L1 Instruction: 16 KB, 4-way, 32B lines
  - L2 Unified: 128 KB to 2 MB, 4-way assoc, write-back, write allocate, 32B lines
  - Main Memory: up to 4 GB


Intel Core i7 Cache Hierarchy

Processor package:
  - Core 0 through Core 3, each with Regs, L1 d-cache, L1 i-cache, and L2 unified cache
  - L3 unified cache (shared by all cores)

Main memory

L1 i-cache and d-cache: 32 KB, 8-way, Access: 4 cycles
L2 unified cache: 256 KB, 8-way, Access: 10 cycles
L3 unified cache: 8 MB, 16-way, Access: 40-75 cycles
Block size: 64 bytes for all caches.
