EECE476: Computer Architecture
Lecture 25: Chapter 7, Memory and Caches
The University of British Columbia
© 2005 Guy Lemieux

Posted: Dec 18, 2015
Page 1:

EECE476: Computer Architecture

Lecture 25: Chapter 7, Memory and Caches

The University of British Columbia © 2005 Guy Lemieux

Page 2:

Motivation for Caches: CPU vs. Memory Performance Gap

Memory is getting slower relative to CPU speeds (log scale!)

Goal: Make memory faster!

Page 3:

Importance of Cache Memory: Fast CPUs are Mostly Cache!

[Die photo labels: 64 kB Data Cache, 64 kB Instr. Cache, Load/Store, Execution Unit, Fetch/Scan/Align, Micro-code, Bus Unit, HyperTransport, DDR Memory Interface, 1 MB Unified Instruction/Data Level 2 Cache, Floating-Point Unit, Memory Controller]

Total Area: 193 mm²
– 42% 1 MB L2 Cache
– 4% Instr. Cache
– 4% Data Cache
  (50% is cache)
– 13% HyperTransport
– 10% DDR Memory
  (23% is I/O)
– 6% Fetch/Scan/etc.
– 4% Mem Controller
– 4% FPU
– 3% Exec Units
– 2% Bus Unit
  (only 20% is actually CPU!)

Page 4:

Main Memory

• What to use for Main Memory?
  – SRAM
  – DRAM
  – SDRAM
  – RAMBUS
  – FLASH
  – Disk

Page 5:

Memory Technology

• SRAM: Static RAM
  – 6 transistors per bit
    • Expensive
  – Transistors configured as 2 inverters in a loop
    • Stable: positive feedback holds the value strongly (static)
    • Actively drive bit value along bitlines to sense amps
  – Fast: can tune transistors and sense amps
    • Used to make cache memory!

• DRAM: Dynamic RAM
  – 1 transistor per bit
    • Inexpensive
  – Transistor holds charge (C)
    • Loses charge/value when driving the bitline (dynamic)
    • Transistor leaks charge over time (dynamic)
    • Must recharge transistor periodically (including after a data-read)
  – Slow
    • Transistors tiny, hold small charge
    • Sense amps must detect tiny change in voltage

[Circuit sketches: SRAM cell with word (row select) line and complementary bit/bit lines; DRAM cell with word (row select) line, a single bit line, and storage capacitor C]

Page 6:

Memory Technology

• SDRAM: Synchronous DRAM (not Static DRAM!)
  – New, around 1995-1996
  – Like DRAM, but pipelined (needs a clock!)
    • Pipeline register on Address inputs
    • Pipeline register on Data outputs
    • Sometimes additional registers in-between!
  – Multiple clock cycles to get data
    • Latency: CL = 2, 2.5, or 3 cycles
  – SDR vs DDR
    • Single data rate: one data word transferred per clock cycle (SDR)
    • Double data rate: two data words per clock cycle (DDR, both edges)
  – Clock rate
    • DDR: PC266, PC333, PC400 is 133 MHz, 167 MHz, 200 MHz
    • SDR: PC100, PC133 is 100 MHz, 133 MHz
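The module names above encode the transfer rate rather than the clock rate. A minimal sketch of that arithmetic (the function name is ours, not a standard API):

```python
# Transfers per second implied by a clock rate: DDR moves a data word on
# both clock edges, so its transfer rate is double the clock frequency.
def transfers_per_sec(clock_mhz, ddr=False):
    return clock_mhz * (2 if ddr else 1)  # in mega-transfers per second

# SDR modules are named for the clock rate itself:
assert transfers_per_sec(100) == 100            # PC100
assert transfers_per_sec(133) == 133            # PC133
# DDR modules are named for the doubled transfer rate:
assert transfers_per_sec(133, ddr=True) == 266  # PC266
assert transfers_per_sec(200, ddr=True) == 400  # PC400
```

PC333 works the same way: a 167 MHz clock gives 334 MT/s, rounded to 333 in the marketing name.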

Page 7:

Memory Technology

• RAMBUS
  – New, around SDRAM time
  – More complex than SDR, DDR SDRAM
  – Faster clock rates (800 MHz!)
    • Fancy signaling on circuit board
    • Narrow data width (16 bits)
    • Difficult to get working
    • Must license technology from Rambus Inc.
    • Rambus lawyers are costly, $$
  – Longer latency (eg, ten cycles)
  – Overall memory speed higher (not by a lot!)
  – Only used on high-end server PCs (too costly)

Page 8:

Memory Technology

• FLASH Memory
  – Different beast: non-volatile
    • Keeps its data even when the power is turned off!
  – 1 transistor per bit (sometimes 0.5)
    • Very cheap
  – Operation
    • Trap charge in floating (disconnected) gate of transistor (tunneling)
    • Floating gate keeps transistor turned on or off
    • Not leaky like DRAM
  – Not suitable for main memory
    • Physically wears out with use (~100,000 writes)
    • Writes are very slow, reads are slow (70 ns)

Page 9:

Memory Technology Trends

• Semiconductor manufacturing processes
  – SRAM & logic compatible
  – DRAM & logic incompatible
  – FLASH memory = logic process + extra masks + some tweaking

• Impact on CPU
  – On-chip SRAM feasible
    • Can get FAST memory! (but at high cost)
  – On-chip DRAM possible, but unlikely
    • Cannot get BIG memory
  – On-chip FLASH may be feasible
    • Can store some non-volatile information

Page 10:

Memory Technology Trends

Memory is getting slower relative to CPU speeds (log scale!)

Page 11:

Recent Impact of Memory Speed

• 1996
  – 100 MHz CPU clock rate (10 ns)
  – 80 ns memory access time
  – Memory read: 8 CPU clock cycles
  – Add 8 pipeline stages just to access data memory?
    • DF+DS+DT+DF+DF+DS+DS+DE ?

• 2003
  – 3 GHz CPU clock rate (0.33 ns = 330 ps)
  – PC400 DDR (200 MHz or 5 ns)
  – Memory read: 5 ns × 2 cycles = 10 ns = 30 CPU clock cycles
  – Add 30 pipeline stages? Impossible to keep up!
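The cycle counts on this slide are just the memory latency divided by the CPU cycle time. A quick sketch of that arithmetic:

```python
# Memory latency expressed in CPU clock cycles: latency / cycle_time.
def mem_cycles(mem_latency_ns, cpu_clock_ghz):
    cycle_time_ns = 1.0 / cpu_clock_ghz   # e.g. 3 GHz -> 0.333 ns
    return mem_latency_ns / cycle_time_ns

# 1996: 100 MHz CPU (10 ns cycle), 80 ns memory access -> 8 cycles
assert round(mem_cycles(80, 0.1)) == 8
# 2003: 3 GHz CPU (0.33 ns cycle), 10 ns memory read -> 30 cycles
assert round(mem_cycles(10, 3.0)) == 30
```

The ratio grew from 8 to 30 in seven years, which is the performance gap the rest of the lecture addresses.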

Page 12:

Memory Technology (1997)

Memory Technology   Access Time         Cost/MB
SRAM                5-25 ns             $100-$250
SDRAM               50-60 ns            $10-$20  (today: cheaper than DRAM)
DRAM                60-120 ns           $5-$10
Disk                10-20 million ns    $0.10-$0.20

Page 13:

Cache Memory

• Problem:
  – SRAM fast, but costly
  – DRAM cheap, but slow

• Solution: Cache
  – Small SRAM memory
  – Holds frequently-used data
  – Logically, insert between CPU and main memory

• Memory Hierarchy is born
  – Generally, use cheaper/bigger/slower memory as you move farther away from the CPU

• Question: How to access the cache SRAM?

Page 14:

Memory Hierarchy

[Figure: multiple levels of memory. The CPU sits above Level 1, Level 2, ..., Level n; access time increases and memory size grows with distance from the CPU]

Page 15:

Memory Hierarchy

[Figure: speed/size/cost pyramid. CPU registers: fastest, smallest, highest cost ($/bit); then SRAM; then SDRAM; then disk and/or tape: slowest, biggest, lowest cost]

Page 16:

Accessing a Cache

• Cache: "hide" in French; a safe place to hide things

• Important concept: transparent to user/software!
  – Wish to speed up ALL programs
    • Do not want to rewrite old programs
    • Do not want to write programs to specifically use the cache

• How to hide? Need a general cache management policy
  – CPU manages the cache itself (NOT managed by software)
  – Load data
    • If data is in the cache, retrieve it from the cache
    • Else, retrieve it from main memory and put a copy in the cache
  – Store data (write-through, no-alloc-on-write policy)
    • If data is in the cache, write to that cache location and to memory
    • Else, write the data to memory only

Page 17:

Using a Cache

• Problems
  – How to find the existing location of data in the cache?
  – How to find a new location for new cache data?
  – What if the cache is full?
    • Must find a location that is no longer needed
    • Must evict data presently in the cache

• Various solutions
  – Different styles of caches!

Page 18:

Associative Cache

• Choosing a location
  – Associative cache is very flexible
  – New data: any location is eligible
  – Find existing data: must search all locations
  – Difficult, but not impossible

• CAM: content-addressable memory
  – Searches all locations (addresses) in "1 cycle"
  – Reports the "match" location
  – Match location holds the data

• Cache is full?
  – Must throw out old data
  – Need a replacement or eviction policy
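The CAM behaviour can be mimicked in software with a sequential scan (a sketch only: a hardware CAM compares every entry in parallel, which is what makes the "1 cycle" search possible; the entries here are made-up examples):

```python
# Sequential stand-in for a CAM: find which entry holds a given tag.
entries = [(0x1A, "foo"), (0x2B, "bar"), (0x3C, "baz")]  # (tag, data) pairs

def cam_lookup(tag):
    for location, (stored_tag, data) in enumerate(entries):
        if stored_tag == tag:
            return location, data   # the "match" location holds the data
    return None                     # miss: tag is nowhere in the cache

assert cam_lookup(0x2B) == (1, "bar")   # hit at location 1
assert cam_lookup(0x4D) is None         # miss
```

In hardware, every stored tag has its own comparator, so the loop above collapses into one parallel comparison.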

Page 19:

Associative Cache: Replacement Policies

• Associative cache is full? Possible replacement policies:
  – Ideal
    • Non-causal: cannot predict what the CPU will do in the future!
    • CPU architects use simulation to find the performance of an ideal cache
  – Least Frequently Accessed
    • Count # of accesses, choose the one accessed the least
    • Problem: you will always choose to evict NEW DATA
  – Least Recently Used (LRU)
    • Timestamp every time you use data in the cache
    • Location with the oldest timestamp is evicted
  – Pseudo-LRU
    • Periodically "age" the contents of the cache
    • Flag data every time it is used
    • Location with "aged" status is evicted

• RANDOM works too! (LRU or pseudo-LRU is slightly better, so is commonly used)

Page 20:

Direct-mapped Cache

• Choosing a location
  – Much more restrictive than an associative cache
  – New data: only one eligible location
  – Find existing data: search one location only
  – Location: use the lower bits of the data address
  – Easy to use SRAM, fast access!

• Cache is full? Replacement is easy…
  – Only one eligible location
  – Must evict the old data there

Page 21:

Direct-mapped Cache

[Figure: memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101 mapping into an 8-entry cache (locations 000-111). Each address in memory maps to only one location in a direct-mapped cache; the lowest 3 bits of the address determine the location]

Page 22:

Direct-mapped Cache

[Figure: 32-bit memory address split into Tag (bits 31-12, 20 bits), Index (bits 11-2, 10 bits), and Byte offset (bits 1-0). The 10-bit index selects one of 1024 cache entries (0, 1, 2, ..., 1021, 1022, 1023), each holding a Valid bit, a 20-bit Tag, and 32 bits of Data; a tag match on a valid entry signals a Hit]

Cache size: 1024 locations × 4 data bytes each = 4 kB cache

Overhead: 1024 locations × 21 bits (Tag + Valid) = 2.625 kB of tag bits (more than 50% overhead!)
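The size and overhead figures on this slide follow directly from how the address is split. A sketch of that arithmetic (variable names are ours):

```python
# 4 kB direct-mapped cache with 4-byte blocks and 32-bit addresses:
# 2 offset bits, 10 index bits (1024 locations), 20 tag bits.
address_bits = 32
locations = 1024
block_bytes = 4

offset_bits = (block_bytes - 1).bit_length()        # log2(4)  = 2
index_bits = (locations - 1).bit_length()           # log2(1024) = 10
tag_bits = address_bits - index_bits - offset_bits  # 32 - 10 - 2 = 20

data_kB = locations * block_bytes / 1024            # 4.0 kB of data
overhead_bits = locations * (tag_bits + 1)          # +1 for the valid bit
overhead_kB = overhead_bits / 8 / 1024              # 2.625 kB of tags

assert (offset_bits, index_bits, tag_bits) == (2, 10, 20)
assert data_kB == 4.0 and overhead_kB == 2.625
```

The overhead ratio is 2.625 / 4 ≈ 66%, which is why small blocks make the tag storage cost more than half as much again as the data it protects.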