Computer Architecture: Cache Memory
Jan 04, 2016
2
Today is brought to you by cache
• What do we want?
– Fast access to data from memory
– Large size of memory
– Acceptable memory system cost
• Where do we get it?
– Interpose a smaller but faster memory, which holds recently accessed data, between the datapath and main memory
3
Cache
• Cache = To conceal or store, as in the earth; hide in a secret place; n. A place for hiding or storing provisions, equipment etc, also the things stored or hidden [F. cacher to hide]
• Cache sounds like cash
• Programs usually exhibit locality:
– temporal locality: if an item is referenced, there is a high probability that it will be referenced again soon
– spatial locality: if an item is referenced, items near it have a high probability of being referenced soon
4
Learning Objectives
1. Know the principle of cache implementation
2. Know the difference between direct mapped, partial set associative and fully associative caches and how they work
3. Know the terms: cache hit, cache miss, word size, block size, row or set size, cache rows, cache tag, cache index, direct mapped cache, partial set associative and fully associative cache.
5
Consider this. It is like caching information
• You are in the library gathering books for an assignment
1) The well-selected books you have gathered probably contain material that you had not expected but will likely use
2) You do not collect ALL the books from the library to your desk
3) It is quicker to access information from a book on your desk than to go to the stacks again
• This is like the use of cache principles in computing.
6
Cache Principle
• On a simply configured CPU-memory system with no cache, memory fetches and stores have an access time that depends on the memory access speed. In general, for a given technology, access time increases with memory size.
• Cache is a mechanism that can speed up memory transfers by making use of a proximity principle: machine instructions and memory accesses are often "near" to the previous and following accesses.
• By caching recent transactions in fast-access memory, with a separate transfer process between main memory and the cache, the effective memory access time is reduced, with consequent performance gains.
7
Why is caching effective in computing
• Spatial locality arises from
– loops
– data structures, arrays
– sequential access to program instructions
• Temporal locality arises from
– loops (the same instructions and data are reused on each iteration)
• Memory cost and speed
– SRAM 5-25ns $100-$250/MByte
– DRAM 60-120ns $5-$10/MByte
– Magnetic disk 10-20 ms $0.10 - $0.20/MByte
8
Memory access time and cost
[Figure: AccessTime and CostPerMB plotted for six memory types (x-axis 1 to 6) on a logarithmic scale from 0.1 to 1,000,000,000]
9
Practical usage of memory types
• Advantageous to build a hierarchy of memories:
– fastest and most expensive, small and close to processor
– slowest and least expensive, large and further from processor
[Figure: CPU connected to a chain of memory levels. Moving away from the CPU, size goes from smallest to biggest, cost ($/bit) from highest to lowest, and speed from fastest to slowest.]
10
Memory Hierarchy of a Modern Computer System
• By taking advantage of the principle of locality:
– Present the user with as much memory as is available in the cheapest technology.
– Provide access at the speed offered by the fastest technology.
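[Figure: memory hierarchy. The processor (control, datapath, registers) connects in turn to the on-chip cache, second level cache (SRAM), main memory (DRAM), secondary storage (disk), and tertiary storage (disk). Speed ranges from about 1 ns at the registers, through 10 ns and 100 ns, to 10,000,000 ns (tens of ms) for secondary storage and 10,000,000,000 ns (tens of seconds) for tertiary storage. Size ranges from hundreds of bytes through Ks, Ms, and Gs to Ts.]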
11
The Art of Memory System Design
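[Figure: Processor connected to a cache ($), which is connected to main memory (MEM). Workload or benchmark programs running on the processor generate the memory reference stream.]
• Memory reference stream: <op,addr>, <op,addr>, <op,addr>, <op,addr>, . . .
• op: i-fetch, read, write
• Optimize the memory system organization to minimize the average memory access time for typical workloads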
12
Notation for accessing data and instructions in memory
• Define a BLOCK as the minimum size unit of information transferred between two adjacent levels of the memory hierarchy
• When a word of data is required, the whole block that the word is in is transferred.
• There is a high probability that the next word required is also in the block, hence the next word is obtained from FAST memory rather than SLOW memory
13
Hits and misses
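• Define a hit as the event in which data requested by the processor is available in some block of the highest level of the memory hierarchy (the cache).
• A miss is the other case: the data is not in the cache and must be fetched from a lower level.
• Hit rate is a measure of success in accessing a cache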
14
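More notation
• Hit rate: fraction of memory accesses that are found in the cache
• Miss rate: 1 - hit rate
• Hit time: time to access the cache on a hit
• Miss penalty: time to fetch from slow memory
• Memory systems are critical to good performance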
15
Basics of caches
• How do we determine if the data is in the cache?
• If data is in the cache, how is it found?
• We only have information on:
– address of data
– how the cache is organized
• Direct mapped cache:
– the data can only be at a specific place
16
Data Address is used to organize cache storage strategy
• Byte field: the lowest address bits select the byte within a word
• Block field: the next bits select the word within a block
• Index field: selects the row of the cache
• Tag field: identifies which block is stored in a cache row
Word address bit fields (most to least significant): Tag | Index | Block | Byte
17
Example 24 bit address with 8 byte block and 2048 blocks in cache of 16384 bytes
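The field widths for this example can be worked out with a short sketch. It assumes a direct mapped cache with one block per row and power-of-two sizes; the function name `field_widths` is made up for illustration and is not from the slides.

```python
def field_widths(address_bits, block_bytes, num_blocks):
    """Return (tag, index, offset) widths in bits, assuming a direct mapped cache."""
    offset_bits = block_bytes.bit_length() - 1  # log2: selects a byte within the block
    index_bits = num_blocks.bit_length() - 1    # log2: selects the cache row
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

tag_bits, index_bits, offset_bits = field_widths(24, 8, 2048)
cache_bytes = 8 * 2048  # block size times number of blocks

print(tag_bits, index_bits, offset_bits)  # 10 11 3
print(cache_bytes)                        # 16384
```

Under these assumptions the 24-bit address splits into a 10-bit tag, an 11-bit index, and a 3-bit byte offset within the block.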
18
Bit fields for a 4-byte word in a 32-bit address with 2^b words per block
Field        Address bits   Usage
Word field   0 : 1          address bits within the word being accessed
Block field  2 : 2+b-1      identifies word within the block; field could be empty
Set field    no bits
Tag field    2+b : 31       identifies tag field (unique identifier for block on its row)
19
Example of direct mapped cache
• Example shows address entries that map to the same location in cache for one byte per word, one word per block, one block per row
• The cache has 8 entries, indexed 000 to 111; data is mapped by address modulo 8
• Memory addresses 00001, 01001, 10001 and 11001 map to cache index 001
• Memory addresses 00101, 01101, 10101 and 11101 map to cache index 101
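The modulo-8 mapping can be checked with a small sketch. The names `NUM_ENTRIES` and `cache_index` are illustrative only; the addresses are the eight shown in the example.

```python
NUM_ENTRIES = 8  # cache rows, indexed 000 to 111

def cache_index(address):
    """Row of a direct mapped cache: the address modulo the number of rows."""
    return address % NUM_ENTRIES

addresses = [0b00001, 0b00101, 0b01001, 0b01101,
             0b10001, 0b10101, 0b11001, 0b11101]

for a in addresses:
    # The low 3 bits are the index; the remaining high bits form the tag.
    print(f"{a:05b} -> index {cache_index(a):03b}, tag {a // NUM_ENTRIES:02b}")
```

Four of the addresses land on index 001 and four on index 101, so only the 2-bit tag tells them apart.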
20
Contents of a direct mapped cache
• Data == Cached block
• TAG == Most significant bits of cached block address that identify the block in that cache row from other blocks that map to that same row
• VALID == Flag bit to indicate the cache content is valid
21
Direct cache
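Separate address into fields:
• Byte offset in word
• Index for row of cache
• Tag identifier of block
[Figure: direct mapped cache with 1024 rows (index 0 to 1023), each holding a valid bit, a 20-bit tag, and 32 bits of data. The 32-bit address is split into a 20-bit tag (bits 31 to 12), a 10-bit index (bits 11 to 2), and a 2-bit byte offset (bits 1 to 0). The stored tag is compared with the address tag to produce Hit.]
Cache of 2^n words, a block being a 4-byte word, has 2^n * (63 - n) bits for a 32-bit address
#rows = 2^n
#bits/row = 32 (data) + (32 - 2 - n) (tag) + 1 (valid) = 63 - n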
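The bit count can be reproduced with a short sketch. The function `cache_bits` and its default parameters are illustrative assumptions that mirror the 32-bit address and 4-byte word used on this slide.

```python
def cache_bits(n, address_bits=32, word_bytes=4):
    """Total storage bits of a direct mapped cache with 2**n one-word blocks."""
    rows = 2 ** n
    byte_offset_bits = word_bytes.bit_length() - 1  # 2 bits for a 4-byte word
    data_bits = 8 * word_bytes                      # 32 data bits per row
    tag_bits = address_bits - byte_offset_bits - n  # 32 - 2 - n
    bits_per_row = data_bits + tag_bits + 1         # +1 for the valid bit
    return rows * bits_per_row

print(cache_bits(10))  # 54272, i.e. 1024 rows of 53 bits
```

For the 1024-row cache (n = 10) this gives 53 bits per row, of which only 32 hold data; the rest is tag and valid overhead.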
22
Reading: Hits and Misses
• Hit requires no special handling. The data is available
• Instruction fetch cache miss:
– Stall the pipeline, apply the PC to memory and fetch the block. Re-fetch the instruction when the miss has been serviced
– Same for data fetch
23
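Multi-word Blocks
[Figure: direct mapped cache with 4K entries, each holding a valid bit (V), a 16-bit tag, and 128 bits of data (four 32-bit words). The 32-bit address is split into a 16-bit tag (bits 31 to 16), a 12-bit index (bits 15 to 4), a 2-bit block offset (bits 3 to 2), and a 2-bit byte offset (bits 1 to 0). The block offset drives a mux that selects one of the four words; the tag comparison produces Hit.]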
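As a sketch of how an address is split for this organization (16-bit tag, 12-bit index, 2-bit block offset, 2-bit byte offset), the function below uses shifts and masks. The function name and the sample address 0x1234ABCD are made up for illustration.

```python
def split_address(address):
    """Split a 32-bit address into (tag, index, block_offset, byte_offset)."""
    byte_offset = address & 0x3           # bits 1..0: byte within the word
    block_offset = (address >> 2) & 0x3   # bits 3..2: word within the block
    index = (address >> 4) & 0xFFF        # bits 15..4: one of 4K rows
    tag = (address >> 16) & 0xFFFF        # bits 31..16
    return tag, index, block_offset, byte_offset

tag, index, block_offset, byte_offset = split_address(0x1234ABCD)
print(hex(tag), hex(index), block_offset, byte_offset)  # 0x1234 0xabc 3 1
```

The block offset plays the role of the mux select in the figure: it picks one of the four 32-bit words in the 128-bit row.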
24
Miss Rates Vs Block Size
[Figure: miss rate (0% to 40%) versus block size (4, 16, 64 and 256 bytes), with one curve per cache size: 1 KB, 8 KB, 16 KB, 64 KB and 256 KB]
25
Block Size Tradeoff
• In general, a larger block size takes advantage of spatial locality BUT:
– Larger block size means larger miss penalty:
• Takes longer time to fill up the block
– If block size is too big relative to cache size, miss rate will go up
• Too few cache blocks
• In general, Average Access Time:
– = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate
[Figure: three plots against block size. Miss penalty increases with block size. Miss rate first falls (exploits spatial locality), then rises (fewer blocks: compromises temporal locality). Average access time first falls, then rises (increased miss penalty and miss rate).]
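The average access time formula from this slide can be tried out with a small sketch. All numbers below (hit time, miss penalties, miss rates per block size) are hypothetical values chosen only to show the shape of the tradeoff; they are not measurements.

```python
def average_access_time(hit_time, miss_penalty, miss_rate):
    """Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate, as on the slide."""
    return hit_time * (1 - miss_rate) + miss_penalty * miss_rate

HIT_TIME = 1  # cycles (assumed)

# Hypothetical block size (bytes) -> (miss penalty in cycles, miss rate)
configs = {16: (40, 0.10), 64: (60, 0.05), 256: (160, 0.06)}

times = {size: average_access_time(HIT_TIME, penalty, rate)
         for size, (penalty, rate) in configs.items()}
best = min(times, key=times.get)

for size, t in times.items():
    print(f"{size:>3}-byte blocks: {t:.2f} cycles")
print("best block size:", best)
```

With these made-up numbers the middle block size wins: the smallest block pays for a high miss rate and the largest block pays for a high miss penalty.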
26
Example: 1 KB Direct Mapped Cache with 32 Byte Blocks
• For a 2 ** N byte cache:
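– The uppermost (32 - N) bits are always the Cache Tag
– The lowest M bits are the Byte Select (Block Size = 2 ** M)
[Figure: 32-bit address split into Cache Tag (bits 31 to 10, example 0x50), Cache Index (bits 9 to 5, example 0x01), and Byte Select (bits 4 to 0, example 0x00). The cache has 32 rows (index 0 to 31), each with a valid bit, a cache tag stored as part of the cache “state”, and 32 bytes of cache data: row 0 holds bytes 0 to 31, row 1 holds bytes 32 to 63, and row 31 holds bytes 992 to 1023. Row 1 holds tag 0x50.]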
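The example values (tag 0x50, index 0x01, byte select 0x00) can be combined into an address and split again with a parametric sketch for a 2 ** N byte cache with 2 ** M byte blocks. The helper names `encode` and `decode` are illustrative, not from the slides.

```python
def encode(tag, index, byte_select, n, m):
    """Build an address from its fields for a 2**n byte cache with 2**m byte blocks."""
    return (tag << n) | (index << m) | byte_select

def decode(address, n, m):
    """Split an address into (tag, index, byte_select)."""
    byte_select = address & ((1 << m) - 1)         # lowest m bits
    index = (address >> m) & ((1 << (n - m)) - 1)  # next n - m bits
    tag = address >> n                             # uppermost 32 - n bits
    return tag, index, byte_select

N, M = 10, 5  # 1 KB cache, 32-byte blocks
address = encode(0x50, 0x01, 0x00, N, M)
print(hex(address))           # 0x14020
print(decode(address, N, M))  # (80, 1, 0)
```

With N = 10 and M = 5 the index is N - M = 5 bits wide, which matches the 32 rows of the example.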
27
Extreme Example: single big line
• Cache Size = 4 bytes, Block Size = 4 bytes
– Only ONE entry in the cache
• If an item is accessed, it is likely that it will be accessed again soon
– But it is unlikely that it will be accessed again immediately!!!
– The next access will likely be a miss again
• Continually loading data into the cache but discarding (forcing out) it before it is used again
• Worst nightmare of a cache designer: Ping Pong Effect
• Conflict Misses are misses caused by:
– Different memory locations mapped to the same cache index
• Solution 1: make the cache size bigger
• Solution 2: Multiple entries for the same Cache Index
[Figure: a cache with a single row (index 0) holding a valid bit, a cache tag, and cache data bytes 0 to 3]
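The ping pong effect can be reproduced with a few lines. This sketch assumes block-granular addresses and a direct mapped cache; the function `count_misses` and the alternating trace are hypothetical.

```python
def count_misses(block_addresses, num_rows):
    """Count misses in a direct mapped cache with num_rows one-block rows."""
    tags = [None] * num_rows  # None marks an invalid (empty) row
    misses = 0
    for block in block_addresses:
        index = block % num_rows
        tag = block // num_rows
        if tags[index] != tag:
            misses += 1
            tags[index] = tag  # the new block forces out the old one
    return misses

trace = [0, 1, 0, 1, 0, 1, 0, 1]  # two blocks accessed alternately

print(count_misses(trace, 1))  # 8: every access misses (ping pong)
print(count_misses(trace, 2))  # 2: only the two first-reference misses remain
```

With a single row the two blocks keep evicting each other; doubling the number of rows (Solution 1) removes the conflict for this trace.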
28
Another Extreme Example: Fully Associative
• Fully Associative Cache, N blocks of 32 bytes each
– Forget about the Cache Index
– Compare the Cache Tags of all cache entries in parallel
– Example: with 32-byte blocks, we need N 27-bit comparators
• By definition: Conflict Miss = 0 for a fully associative cache
[Figure: 32-bit address split into Cache Tag (27 bits long, bits 31 to 5) and Byte Select (bits 4 to 0, example 0x01). Every cache entry has a valid bit, a cache tag, and 32 bytes of cache data; the address tag is compared (X) against all stored tags in parallel.]
29
A Two-way Set Associative Cache
• N-way set associative: N entries for each Cache Index
– N direct mapped caches operate in parallel
• Example: Two-way set associative cache
– Cache Index selects a “set” from the cache
– The two tags in the set are compared in parallel
– Data is selected based on the tag result
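[Figure: two banks, each with a valid bit, cache tag and cache data, are read at the same Cache Index. Each stored tag is compared with the address tag (Adr Tag); the two compare results are ORed to produce Hit and drive the select inputs (Sel1, Sel0) of a mux that outputs the selected Cache Block.]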
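The following sketch models an N-way set associative cache in software. Two points are assumptions not stated on the slide: replacement within a set is least recently used (LRU), and the tag compare is done sequentially by a dictionary lookup rather than in parallel as in hardware. The class name and the trace are made up for illustration.

```python
from collections import OrderedDict

class SetAssociativeCache:
    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One ordered dict of tags per set; order tracks recency (assumed LRU).
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, block_address):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        index = block_address % self.num_sets
        tag = block_address // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)      # mark as most recently used
            return True
        if len(entries) == self.ways:
            entries.popitem(last=False)   # evict the least recently used tag
        entries[tag] = True
        return False

trace = [0, 4, 0, 4, 0, 4]  # two blocks that share a cache index

direct = SetAssociativeCache(num_sets=4, ways=1)   # 4 blocks, direct mapped
two_way = SetAssociativeCache(num_sets=2, ways=2)  # 4 blocks, two-way

print(sum(not direct.access(b) for b in trace))   # 6 misses
print(sum(not two_way.access(b) for b in trace))  # 2 misses
```

Both caches hold four blocks, but the two-way cache keeps blocks 0 and 4 in the same set at once, so the conflict misses of the direct mapped cache disappear.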
30
Disadvantage of Set Associative Cache
• N-way Set Associative Cache versus Direct Mapped Cache:
– N comparators vs. 1
– Extra MUX delay for the data
– Data comes AFTER Hit/Miss decision and set selection
• In a direct mapped cache, Cache Block is available BEFORE Hit/Miss:
– Possible to assume a hit and continue. Recover later if miss.
[Figure: two-way set associative cache with two tag comparators, an OR gate producing Hit, and a mux (Sel1, Sel0) selecting the Cache Block]
31
Three Cs of Caches:
1. Compulsory misses: These are cache misses caused by the first access to a block that has never been in the cache (also known as cold-start misses)
2. Capacity misses: These are cache misses caused when the cache cannot contain all the blocks needed during execution of a program. Capacity misses occur because of blocks being replaced and later retrieved when accessed.
3. Conflict misses: These are cache misses that occur in set-associative or direct-mapped caches when multiple blocks compete for the same set. Conflict misses are those misses in a direct-mapped or set-associative cache that are eliminated in a fully associative cache of the same size. These are also called collision misses.
32
A Summary on Sources of Cache Misses
• Compulsory (cold start or process migration, first reference): first access to a block
– “Cold” fact of life: not a whole lot you can do about it
– Note: If you are going to run “billions” of instructions, Compulsory Misses are insignificant
• Conflict (collision):
– Multiple memory locations mapped to the same cache location
– Solution 1: increase cache size
– Solution 2: increase associativity
• Capacity:
– Cache cannot contain all blocks accessed by the program
– Solution: increase cache size
• Invalidation: other process (e.g., I/O) updates memory
33
Summary:
• The Principle of Locality:
– Programs are likely to access a relatively small portion of the address space at any instant of time.
• Temporal Locality: Locality in Time
• Spatial Locality: Locality in Space
• Three Major Categories of Cache Misses:
– Compulsory Misses: sad facts of life. Example: cold start misses.
– Conflict Misses: increase cache size and/or associativity. Nightmare Scenario: ping pong effect!
– Capacity Misses: increase cache size
• Cache Design Space
– total size, block size, associativity
– replacement policy
– write-hit policy (write-through, write-back)
– write-miss policy
34
Cache design parameters
Design change            Effect on miss rate                              Possible negative performance effect
Increase block size      decreases miss rate due to compulsory misses     may increase miss penalty
Increase size            decreases capacity misses                        may increase access time
Increase associativity   decreases miss rate due to conflict misses       may increase access time