ECE7995 Caching and Prefetching Techniques in Computer Systems Lecture 8: Buffer Cache in Main Memory (IV)
Feb 22, 2016
Quantifying Locality with LRU Stack
• Blocks are ordered by their recencies;
• Blocks enter from the stack top, and leave from its bottom;
[Figure: an LRU stack for a reference stream, with blocks labeled Recency = 1 and Recency = 2.]
LRU Stack
• Blocks are ordered by recency in the LRU stack;
• Blocks enter from the stack top, and leave from its bottom;
[Figure: an LRU stack for a reference stream; labels: Recency = 0, Recency = 2, IRR = 2.]
Inter-Reference Recency (IRR): the number of other distinct blocks accessed between two consecutive references to the block.
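The two metrics can be computed directly from a reference stream. The sketch below is only an illustration of the definitions above: the function name `recency_and_irr` and the sample stream are hypothetical and do not come from the slides. Recency counts the other distinct blocks referenced since a block's last reference; IRR counts the other distinct blocks between its last two references.

```python
def recency_and_irr(stream):
    """Return {block: (recency, irr)} as seen right after the last reference in `stream`.

    recency: number of other distinct blocks referenced since the block's last reference.
    irr:     number of other distinct blocks between the block's last two references
             (infinite if the block has been referenced only once).
    """
    positions = {}
    for t, block in enumerate(stream):
        positions.setdefault(block, []).append(t)

    result = {}
    for block, pos in positions.items():
        # Distinct blocks seen after the most recent reference to `block`.
        recency = len(set(stream[pos[-1] + 1:]))
        if len(pos) >= 2:
            # Distinct blocks seen between the last two references to `block`.
            irr = len(set(stream[pos[-2] + 1:pos[-1]]))
        else:
            irr = float("inf")
        result[block] = (recency, irr)
    return result


# Hypothetical reference stream, oldest reference first.
stats = recency_and_irr(["a", "b", "c", "b", "a", "d"])
for block, (recency, irr) in sorted(stats.items()):
    print(block, recency, irr)
```

In this sample, block `a` is re-referenced after `b` and `c`, so its IRR is 2, while `c` is referenced once and has an infinite IRR.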
Locality Strength
[Figure: IRR map of the MULTI2 trace. x-axis: Virtual Time (Reference Stream); y-axis: IRR (Re-use Distance in Blocks); markers: Cache Size, Locality Strength.]
LRU:
Good for “absolutely” strong locality
Bad for relatively weak locality
LRU’s Inability with Weak Locality
• Memory scanning (one-time access)
   - Infinite IRR, weak locality;
   - Should not be cached at all;
   - Not replaced in a timely manner by LRU (blocks stay cached until their recency exceeds the cache size).
LRU’s Inability with Weak Locality
• Loop-like accesses (repeated accesses with a fixed interval)
   - IRR is the same as the interval;
   - If the interval is larger than the cache size, there are no hits;
   - The blocks to be accessed soonest are, unfortunately, the ones replaced.
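The loop-like case is easy to reproduce. The following is a minimal LRU simulation, assuming a hypothetical loop over 4 blocks and a helper named `lru_hits` (neither appears in the slides). With a cache of 3 blocks the interval exceeds the cache size and every reference misses; with 4 blocks only the first pass misses.

```python
from collections import OrderedDict


def lru_hits(stream, cache_size):
    """Count hits of an LRU cache with `cache_size` blocks on `stream`."""
    cache = OrderedDict()  # last entry = most recently used
    hits = 0
    for block in stream:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # refresh recency
        else:
            if len(cache) >= cache_size:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = None
    return hits


loop = list(range(4)) * 10        # 4 blocks, looped 10 times
hits_small = lru_hits(loop, 3)    # loop interval larger than the cache
hits_large = lru_hits(loop, 4)    # loop fits in the cache
print(hits_small, hits_large)     # 0 36
```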
LRU’s Inability with Weak Locality
• Accesses with distinct frequencies
   - The recencies of frequently accessed blocks become large because of references to infrequently accessed blocks;
   - Frequently accessed blocks could, unfortunately, be replaced.
Looking for Blocks with Strong Locality
[Figure: IRR map of the MULTI2 trace. x-axis: Virtual Time (Reference Stream); y-axis: IRR (Re-use Distance in Blocks); markers: Cache Size, Locality Strength; annotation: “Cover 1000 Blocks with Strongest Locality”.]
Challenges
Address the limitations of LRU fundamentally.
Retain the low overhead and adaptability merits of LRU.
• Simplicity: affordable implementation
• Adaptability: responsive to access pattern changes
Principle of the LIRS Replacement
If a block’s IRR is high, its next IRR is likely to be high again.
We select the blocks with high IRRs for replacement.
LIRS: Low IRR Set replacement algorithm. We keep the set of blocks with low IRRs in cache.
Requirements on Low IRR Block Set (LIRS)
The set size should be the cache size.
The set consists of the blocks with the strongest locality (the lowest IRRs).
The set is dynamically kept up to date.
Low IRR Block Set
Low IRR (LIR) block and High IRR (HIR) block
[Figure: Block Sets mapped to the Physical Cache. The LIR block set has size Llirs; the HIR block set occupies Lhirs of the cache; Cache size L = Llirs + Lhirs.]
An Example for LIRS
Llirs = 2, Lhirs = 1
V time / Blocks
1 2 3 4 5 6 7 8 9 10 R IRR
A X X X 1 1
B X X 3 1
C X 4 inf
D X X 2 3
E X 0 inf
LIR block set = {A, B}, HIR block set = {C, D, E}
[Figure: Mapping to Cache. Block Sets: LIR block set {A, B} and HIR block set {C, D, E}. Physical Cache (Llirs = 2, Lhirs = 1): resident blocks A, B, E.]
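The split in this example can be checked mechanically. The sketch below takes the R and IRR columns from the table as given and picks the Llirs blocks with the lowest IRRs as the LIR set. The function name `split_lir_hir` is hypothetical, and breaking IRR ties by recency is an assumption made here for determinism, not something the slides state.

```python
INF = float("inf")

# (recency R, IRR) per block at virtual time 10, copied from the example table.
table = {"A": (1, 1), "B": (3, 1), "C": (4, INF), "D": (2, 3), "E": (0, INF)}


def split_lir_hir(stats, llirs):
    """Put the `llirs` blocks with the lowest IRRs in the LIR set; the rest are HIR.

    Ties on IRR are broken by smaller recency (an assumption of this sketch).
    """
    ranked = sorted(stats, key=lambda b: (stats[b][1], stats[b][0]))
    return set(ranked[:llirs]), set(ranked[llirs:])


lir, hir = split_lir_hir(table, llirs=2)
print(sorted(lir), sorted(hir))  # ['A', 'B'] ['C', 'D', 'E']
```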
D is referenced at time 10
V time /Blocks
1 2 3 4 5 6 7 8 9 10 R IRR
A X X X 1 1
B X X 3 1
C X 4 inf
D X X X 0 3
E X 1 inf
The resident HIR block (E) is replaced!
Which Block is Replaced? Replace HIR Blocks
V time /Blocks
1 2 3 4 5 6 7 8 9 10 R IRR
A X X X 2 1
B X X 3 1
C X 4 inf
D X X X 0 2
E X 1 inf
How is the LIR Set Updated? Recency of LIR Block Used
V time / Blocks
1 2 3 4 5 6 7 8 9 10 R IRR
A X X X 2 1
B X X 3 1
C X 4 inf
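D X X X 0 2
E X 1 inf
After D is Referenced at Time 10 … …
E is replaced, and D enters the LIR set: D’s new IRR (2) is smaller than the recency of LIR block B (3), so D becomes an LIR block and B becomes an HIR block.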
V time /Blocks
1 2 3 4 5 6 7 8 9 10 R IRR
A X X X 2 1
B X X 4 1
C X X 0 4
D X X 3 3
E X 1 inf
If Reference is to C at Time 10 … …
E is replaced, but C cannot enter the LIR set: C’s new IRR (4) is not smaller than the recency of either LIR block.
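The two outcomes above follow from one comparison: the referenced HIR block's new IRR (its recency just before the reference) against the largest recency among the LIR blocks. The sketch below illustrates that rule with the values from the example; the function name `reference_hir_block` and its return convention are hypothetical.

```python
def reference_hir_block(new_irr, lir_recency):
    """Decide the status of an HIR block when it is referenced.

    new_irr:     the block's recency just before this reference (its new IRR).
    lir_recency: {LIR block: recency just before this reference}.
    Returns (becomes_lir, demoted_lir_block_or_None).
    """
    # The LIR block with the largest recency is the candidate for demotion.
    candidate = max(lir_recency, key=lir_recency.get)
    r_max = lir_recency[candidate]
    if new_irr < r_max:
        return True, candidate   # switch statuses: HIR -> LIR, candidate -> HIR
    return False, None           # the block stays HIR


# Recencies just before time 10, from the example table: A = 1, B = 3.
lir_before = {"A": 1, "B": 3}
outcome_d = reference_hir_block(2, lir_before)  # D had recency 2
outcome_c = reference_hir_block(4, lir_before)  # C had recency 4
print(outcome_d, outcome_c)  # (True, 'B') (False, None)
```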
The LIRS References with Weak Locality
• Memory scanning (one-time access)
   - Infinite IRR;
   - Not included in the LIR block set;
   - Replaced in a timely manner.
The LIRS References with Weak Locality
• Loop-like accesses
   - The IRRs of all blocks are the same;
   - Once a block becomes an LIR block, it can keep its status;
   - Any cached block can contribute a hit in one loop of accesses.
The LIRS References with Weak Locality
• Accesses with distinct frequencies
   - Frequently accessed blocks have smaller IRRs than infrequently accessed blocks;
   - Frequently accessed blocks are LIR blocks;
   - They are always cached and get hits.
Making LIRS O(1) Efficient
• Rmax: the maximum recency of the LIR blocks
• IRR_HIR: the new IRR of the HIR block
This efficiency is achieved by our LIRS stack.
LRU stack + LIR block with Rmax recency at its bottom ==> LIRS stack.
Differences between LRU and LIRS Stacks
[Figure: an LRU stack and a LIRS stack side by side. Cache size L = 5, Llir = 3, Lhir = 2. Legend: resident block, LIR block, HIR block.]
• The LRU stack size is decided by the cache size and is fixed; the LIRS stack size is decided by Rmax and varies.
• The LRU stack holds only resident blocks; the LIRS stack holds any block whose recency is no more than Rmax.
• The LRU stack does not distinguish “hot” and “cold” blocks; the LIRS stack distinguishes LIR and HIR blocks and dynamically maintains their statuses.
How does LIRS Stack Help?
• Rmax: the maximum recency of the LIR blocks
• IRR_HIR: the new IRR of the HIR block
• Blocks in the LIRS stack ==> IRR < Rmax
• Other blocks ==> IRR > Rmax
LIRS Operations
[Figure: LIRS stack S (blocks 5 3 2 1 6 9 4 8) and resident HIR stack Q (blocks 5 3). Legend: resident in cache, LIR block, HIR block. Cache size L = 5, Llir = 3, Lhir = 2.]
• Initialization: all referenced blocks are given an LIR status until the LIR block set is full.
• Resident HIR blocks are placed in stack Q.
Access an LIR Block (a Hit)
[Figure sequence: LIRS stack S and resident HIR stack Q as references 4 and 8 of the stream . . . 4 8 3 5 7 9 5 are served. Legend: resident in cache, LIR block, HIR block. Cache size L = 5, Llir = 3, Lhir = 2.]
Access a Resident HIR Block (a Hit)
[Figure sequence: LIRS stack S and resident HIR stack Q as references 3 and 5 of the remaining stream . . . 3 5 7 9 5 are served. Legend: resident in cache, LIR block, HIR block. Cache size L = 5, Llir = 3, Lhir = 2.]
Access a Non-Resident HIR Block (a Miss)
[Figure sequence: LIRS stack S and resident HIR stack Q as references 7, 9, and 5 of the remaining stream . . . 7 9 5 are served. Legend: resident in cache, LIR block, HIR block. Cache size L = 5, Llir = 3, Lhir = 2.]
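The three cases walked through above (LIR hit, resident HIR hit, non-resident miss) can be put together in a compact sketch. This is an illustrative implementation under stated assumptions, not the authors' code: the class name `LIRSCache` and its methods are hypothetical, `S` and `Q` follow the slide names, both `llirs` and `lhirs` are assumed to be at least 1, and the size of `S` is left unbounded.

```python
from collections import OrderedDict


class LIRSCache:
    """Sketch of LIRS with stack S and resident-HIR queue Q (assumes llirs, lhirs >= 1)."""

    def __init__(self, llirs, lhirs):
        self.llirs, self.lhirs = llirs, lhirs
        self.S = OrderedDict()  # block -> True if LIR, False if HIR; last entry = stack top
        self.Q = OrderedDict()  # resident HIR blocks; first entry = next victim
        self.lir_count = 0

    def _prune(self):
        # Drop HIR entries from the bottom of S so that an LIR block sits at the bottom.
        while self.S and not next(iter(self.S.values())):
            self.S.popitem(last=False)

    def _to_top(self, block, is_lir):
        self.S[block] = is_lir
        self.S.move_to_end(block)

    def access(self, block):
        """Reference `block`; return True on a hit and False on a miss."""
        if self.S.get(block):                 # LIR block: always resident
            self._to_top(block, True)
            self._prune()
            return True
        hit = block in self.Q                 # resident HIR block?
        if not hit:
            if self.lir_count < self.llirs:   # initialization: fill the LIR set first
                self._to_top(block, True)
                self.lir_count += 1
                return False
            if len(self.Q) >= self.lhirs:     # cache full: replace the front of Q
                self.Q.popitem(last=False)    # its entry in S, if any, becomes non-resident
        if block in self.S:                   # HIR block still in S: new IRR < Rmax
            self.Q.pop(block, None)
            self._to_top(block, True)         # the block becomes LIR
            bottom, _ = self.S.popitem(last=False)  # LIR block with recency Rmax
            self.Q[bottom] = None             # demoted to resident HIR, end of Q
            self._prune()
        else:                                 # not in S: the block stays HIR
            self._to_top(block, False)
            self.Q[block] = None
            self.Q.move_to_end(block)
        return hit


# Hypothetical loop of 4 blocks over a cache of 3 blocks (Llirs = 2, Lhirs = 1).
cache = LIRSCache(llirs=2, lhirs=1)
loop = list(range(4)) * 10
lirs_hits = sum(cache.access(b) for b in loop)
print(lirs_hits)  # 18
```

On this loop the two LIR blocks keep their status and hit on every pass after the first, while the single HIR slot absorbs the other two blocks. An LRU cache of the same size gets no hits on the same stream.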
Workload Traces
• postgres is a trace of join queries among four relations in a relational database system;
• sprite is from the Sprite network file system;
• multi2 is obtained by executing three workloads, cs, cpp, and postgres, together.
Cache Partition
• 1% of the cache size is for HIR blocks
• 99% of the cache size is for LIR blocks
• Performance is not sensitive to the partition.
Looping Pattern: postgres (Access Map)
[Figure: access map of the postgres trace. x-axis: Virtual Time (Reference Stream); y-axis: Logical Block Number.]
Looping Pattern: postgres (IRR Map)
[Figure: IRR map of the postgres trace. x-axis: Virtual Time (Reference Stream); y-axis: IRR (Re-use Distance in Blocks); regions labeled LRU and LIRS.]
Looping Pattern: postgres (Hit Rates)
[Figure: hit ratio on the postgres trace. x-axis: Cache Size (# of Blocks), 0 to 3000; y-axis: Hit Ratio (%), 0 to 80; curves: OPT, LIRS, LRU-2, 2Q, LRFU, EELRU, ARC, LRU.]
Temporally-Clustered Pattern: sprite (Access Map)
[Figure: access map of the sprite trace. x-axis: Virtual Time (Reference Stream); y-axis: Logical Block Number.]
Temporally-Clustered Pattern: sprite (IRR Map)
[Figure: IRR map of the sprite trace. x-axis: Virtual Time (Reference Stream); y-axis: IRR (Re-use Distance in Blocks); regions labeled LRU and LIRS.]
Temporally-Clustered Pattern: sprite (Hit Ratio)
[Figure: hit ratio on the sprite trace. x-axis: Cache Size (# of Blocks), 0 to 1200; y-axis: Hit Ratio (%), 0 to 100; curves: OPT, LIRS, LRU-2, 2Q, LRFU, EELRU, ARC, LRU.]
Mixed Pattern: multi2 (Access Map)
[Figure: access map of the multi2 trace. x-axis: Virtual Time (Reference Stream); y-axis: Logical Block Number.]
Mixed Pattern: multi2 (IRR Map)
[Figure: IRR map of the multi2 trace. x-axis: Virtual Time (Reference Stream); y-axis: IRR (Re-use Distance in Blocks); regions labeled LIRS and LRU.]
Mixed Pattern: multi2 (Hit Ratio)
[Figure: hit ratio on the multi2 trace. x-axis: Cache Size (# of Blocks), 0 to 4000; y-axis: Hit Ratio (%), 0 to 90; curves: OPT, LIRS, LRU-2, 2Q, LRFU, EELRU, ARC, LRU.]
Summary
• LIRS uses both IRR (or reuse distance) and recency for its replacement decision. 2Q uses only reuse distance.
• LIRS adapts to the locality changes when deciding which blocks have small IRRs. 2Q uses a fixed threshold in looking for blocks of small reuse distances.
• Both LIRS and 2Q have low time overhead (as low as LRU). Their space overheads are larger, but acceptably so.