Self Tuning Power Aware Replacement in Caching

05/01/23

PB-LRU: A Self-Tuning Power Aware Storage Cache Replacement Algorithm for Conserving Disk Energy

Qingbo Zhu, Asim Shankar and Yuanyuan Zhou

Presented by: Hang Zhao, Chiu Tan

PB-LRU: Partition-Based LRU

Storage is a major energy consumer, accounting for 27% of the power budget in a data center.

PB-LRU is a power aware, on-line cache management algorithm.

PB-LRU dynamically partitions the cache at run time to give each disk an energy-optimal cache size.

It is a practical algorithm that adapts dynamically to workload changes with little tuning.

Outline

Motivation
Background
Why need PB-LRU?
Main Idea
Energy estimation at Run Time
Solving MCKP
Evaluation & Simulation
Conclusion

Motivation

Why is power conservation important?
Data centers are an important component of the Internet infrastructure.
Power needs for a data center are increasing at 25% a year, with storage taking up 27%.

How to reduce power in storage?
Simple: spin down a disk when it is not in use.

Motivation (II)

But …
There is a performance and energy penalty when a disk moves from a low-power mode to a high-power mode.
Data center request volume is high, so idle periods are short. This makes spinning disks up and down impractical.

Solution: a multi-speed disk architecture. PB-LRU targets multi-speed disks.

Background

Break-even time: Minimum length of idle time needed to justify spinning down and back up.

Oracle DPM: Knows length of next idle period. Uses this to regulate power modes.

Practical DPM: Use thresholds to regulate powering up or down.
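The three definitions above can be made concrete with a small two-mode energy model. This is a minimal sketch, not the model used in the paper: the function names, the single combined round-trip cost (`e_transition`, `t_transition`), and the disk numbers are all assumptions chosen for illustration.

```python
def break_even_time(p_high, p_low, e_transition, t_transition):
    """Idle length (s) at which staying in the high mode costs the same
    energy as a round trip to the low mode and back."""
    # Solve p_high * t = e_transition + p_low * (t - t_transition) for t.
    t = (e_transition - p_low * t_transition) / (p_high - p_low)
    # An idle period shorter than the transition itself can never pay off.
    return max(t, t_transition)


def oracle_idle_energy(idle, p_high, p_low, e_transition, t_transition):
    """Oracle DPM: knows the idle length, so it picks the cheaper option."""
    stay = p_high * idle
    go_low = e_transition + p_low * max(0.0, idle - t_transition)
    return min(stay, go_low)


def practical_idle_energy(idle, threshold, p_high, p_low, e_transition, t_transition):
    """Practical DPM: wait `threshold` seconds in the high mode, then go low."""
    if idle <= threshold:
        return p_high * idle
    low_time = max(0.0, idle - threshold - t_transition)
    return p_high * threshold + e_transition + p_low * low_time


# Hypothetical disk: 10 W high mode, 2.5 W low mode, 150 J and 12 s per round trip.
DISK = dict(p_high=10.0, p_low=2.5, e_transition=150.0, t_transition=12.0)
threshold = break_even_time(**DISK)
```

With these assumed numbers the break-even time is 16 s. Using the break-even time as the threshold is the classic choice because, in this simple model, the practical scheme never spends more than twice the oracle's energy on any idle period.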

Why need PB-LRU?

Earlier work: PA-LRU.
Idea: Keep blocks from less active disks in the cache, which extends their idle periods.
Cost: More misses to active disks.
Justification: Since active disks are already spinning, the extra misses are cheaper in terms of power consumption.

However …

PA-LRU requires complicated parameter tuning: 4 parameters are needed.

There is no intuitive relationship between the parameters and disk power consumption or I/O times.

This makes it difficult to adopt simple extensions or heuristics for real-world implementation.

PB-LRU is a practical implementation!

PB-LRU: Main Idea

Divide cache into partitions, one for each disk.

Each partition is managed individually.
Partitions are resized periodically.
Workloads are not equally distributed across different disks.
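The partitioning idea can be sketched as one LRU list per disk plus a resize hook. This is an illustrative sketch only: the class name `PartitionedLRUCache`, the block-count capacities, and the choice to evict eagerly on resize are assumptions, not details taken from the paper.

```python
from collections import OrderedDict


class PartitionedLRUCache:
    """One independent LRU partition per disk (hypothetical sketch)."""

    def __init__(self, sizes):
        # sizes: dict mapping disk id -> partition capacity in blocks
        self.sizes = dict(sizes)
        self.parts = {disk: OrderedDict() for disk in sizes}

    def access(self, disk, block):
        """Return True on a hit, False on a miss (the block is then cached)."""
        part = self.parts[disk]
        if block in part:
            part.move_to_end(block)  # mark as most recently used
            return True
        part[block] = None
        self._shrink(disk)
        return False

    def _shrink(self, disk):
        # Evict least recently used blocks until the partition fits.
        part = self.parts[disk]
        while len(part) > self.sizes[disk]:
            part.popitem(last=False)

    def resize(self, new_sizes):
        """Apply new partition sizes, e.g. at the end of an epoch."""
        for disk, size in new_sizes.items():
            self.sizes[disk] = size
            self._shrink(disk)  # assumption: evict eagerly when shrinking
```

A miss on one disk can only evict blocks of that same disk, so a busy disk cannot push out the blocks that keep a quiet disk idle.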

Main Idea (II)

So what do we need?
Estimate, for each disk, the energy consumed for a particular cache size (estimation problem).

Use these estimates to find the partitioning that minimizes total energy consumption across all disks (MCKP, the multiple-choice knapsack problem).

Estimation Problem

Q: How to estimate energy consumption per disk for different cache sizes at run time?

A straightforward approach: use simulators, one (multi-disk) simulator for every cache size.

This requires (NumCacheSizes × NumDisks) simulators. Impractical!

Estimation Problem (II)

Mattson’s Stack: Take advantage of the inclusion property. The contents of a cache of k blocks are a subset of the contents of a cache of k+1 blocks.

Accessing the stack at position i means a miss for every cache smaller than i blocks and a hit for every cache of i blocks or more.

PB-LRU uses Mattson’s Stack to predict a hit or a miss for different partition sizes.
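A short sketch of the stack technique shows how one pass over a trace yields hit counts for every LRU cache size at once. The function names are illustrative, and the list-based stack is chosen for clarity rather than speed.

```python
def stack_distances(trace):
    """Return, for each access, its 1-based depth in the LRU stack
    (None for a first-time access, which misses in every cache size)."""
    stack = []  # index 0 is the most recently used block
    depths = []
    for block in trace:
        if block in stack:
            depth = stack.index(block) + 1
            stack.remove(block)
        else:
            depth = None
        stack.insert(0, block)  # the accessed block moves to the top
        depths.append(depth)
    return depths


def hits_per_size(trace, max_size):
    """hits[s] = number of hits an LRU cache of s blocks would see."""
    hits = [0] * (max_size + 1)
    for depth in stack_distances(trace):
        if depth is None:
            continue
        # By the inclusion property, a hit at depth d is a hit for all sizes >= d.
        for size in range(depth, max_size + 1):
            hits[size] += 1
    return hits


example = hits_per_size([5, 4, 3, 2, 1, 4], max_size=5)
```

For the trace 5, 4, 3, 2, 1, 4 the last access finds block 4 at depth 4, so only caches of 4 or 5 blocks record a hit.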

Estimation Problem (III)

In addition, for each candidate cache size PB-LRU keeps track of the time of the previous miss (Pre_miss) and the energy consumed so far.

With these pieces of information, the energy consumption for the various cache sizes is estimated.

Before (state after the accesses at T1 to T5)

Cache accesses:
Time    T1  T2  T3  T4  T5
Access  5   4   3   2   1

Mattson stack (top to bottom): 1, 2, 3, 4, 5
Existing cache (real, 3 blocks): 1, 2, 3

Estimates for the 5 possible cache sizes:
Cache Size  Pre_miss  Energy
1           T5        E5
2           T5        E5
3           T5        E5
4           T5        E5
5           T5        E5

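T6: Access Block 4

Cache accesses:
Time    T1  T2  T3  T4  T5  T6
Access  5   4   3   2   1   4

Mattson stack (LRU, position in parentheses): 4 (1), 1 (2), 2 (3), 3 (4), 5 (5)
Existing cache (real, LRU): 4, 2, 3

Block 4 was the 4th element of the stack, so the access is a miss for every cache size < 4.

Cache Size  Pre_miss  Energy
1           T6        E6
2           T6        E6
3           T6        E6
4           T5        E5
5           T5        E5

For the sizes that miss: E6 = E5 + E(T6 - T5) + 10 ms × ActivePower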
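The T6 example above can be replayed with a small estimator that keeps one Pre_miss time and one energy total per candidate cache size. This is a sketch under stated assumptions: the class name, the `idle_energy` callback (standing in for E(idle time)), and the default power and service-time values are hypothetical, not values from the paper.

```python
class EnergyEstimator:
    """Per-disk energy estimates for every cache size 1..max_size (sketch)."""

    def __init__(self, max_size, idle_energy, service_time=0.010, active_power=10.0):
        self.max_size = max_size
        self.idle_energy = idle_energy    # function: idle seconds -> joules (assumed)
        self.service_time = service_time  # seconds the disk is busy per miss (assumed)
        self.active_power = active_power  # watts while servicing a miss (assumed)
        self.stack = []                   # Mattson stack, index 0 = most recent
        self.pre_miss = [0.0] * (max_size + 1)  # indexed by cache size
        self.energy = [0.0] * (max_size + 1)

    def access(self, now, block):
        if block in self.stack:
            depth = self.stack.index(block) + 1
            self.stack.remove(block)
        else:
            depth = self.max_size + 1  # first access: a miss for every size
        self.stack.insert(0, block)
        del self.stack[self.max_size:]  # deeper entries miss in every size anyway
        # Every cache smaller than `depth` misses, so its disk is accessed now.
        for size in range(1, min(depth, self.max_size + 1)):
            idle = now - self.pre_miss[size]
            self.energy[size] += self.idle_energy(idle) + self.service_time * self.active_power
            self.pre_miss[size] = now


# Replay the slide example with a hypothetical constant 5 W idle power.
est = EnergyEstimator(5, idle_energy=lambda idle: 5.0 * idle)
for t, blk in zip([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [5, 4, 3, 2, 1, 4]):
    est.access(t, blk)
```

After the replay, sizes 1 to 3 carry a Pre_miss of 6.0 and an updated energy, while sizes 4 and 5 keep their T5 values, mirroring the table in the example.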

Solving MCKP

MCKP is NP-hard, but the modified problem is solvable using dynamic programming.

General result: Increase the cache size for less active disks and decrease the cache size for active disks.

Why? The penalty for reducing the cache size of an active disk is small, while the energy saved by increasing the cache size of an inactive disk is large.
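One way to picture the dynamic program is the textbook formulation for a multiple-choice knapsack with integer sizes: each disk picks exactly one partition size, and the sizes must fit in the total cache. The sketch below is that generic formulation, not necessarily the modified problem solved in the paper; `solve_mckp` and the energy table are hypothetical.

```python
def solve_mckp(energy, total_units):
    """energy[d][s] = estimated energy of disk d with s cache units.
    Returns (minimum total energy, list of chosen sizes per disk)."""
    inf = float("inf")
    # best[c]: minimum energy of the disks processed so far using at most c units
    best = [0.0] * (total_units + 1)
    choices = []
    for row in energy:
        new_best = [inf] * (total_units + 1)
        pick = [0] * (total_units + 1)
        for c in range(total_units + 1):
            for s in range(min(c, len(row) - 1) + 1):
                candidate = best[c - s] + row[s]
                if candidate < new_best[c]:
                    new_best[c] = candidate
                    pick[c] = s
        best = new_best
        choices.append(pick)
    # Walk backwards through the recorded picks to recover each disk's size.
    sizes = []
    c = total_units
    for pick in reversed(choices):
        sizes.append(pick[c])
        c -= pick[c]
    sizes.reverse()
    return best[total_units], sizes


# Hypothetical estimates: disk 0 is active (extra cache barely helps),
# disk 1 is mostly idle (extra cache saves a lot of energy).
table = [[100.0, 98.0, 97.0, 96.0],
         [80.0, 50.0, 30.0, 25.0]]
total, sizes = solve_mckp(table, total_units=4)
```

With this made-up table the solver gives 1 unit to the active disk and 3 units to the inactive one, which matches the general result stated above.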

Evaluation Methodology

The integrated simulator: disk power model, CacheSim, DiskSim.

Multi-speed disk model:
Similar to the IBM Ultrastar 36Z15.
Adds 4 lower-speed modes: 12k, 9k, 6k and 3k RPM.
Power model: 2-competitive thresholds.

Evaluation Methodology cont.

The traces

Real system traces:
OLTP: database storage system (21 disks, 128MB cache)
Cello96: Cello file server from HP (19 disks, 32MB cache)

Synthetic traces generated based on storage system workloads:
Zipf distribution to distribute requests among 24 disks and among blocks in each disk
“Hill” shape to reflect temporal locality
Inter-request arrival distribution: exponential, Pareto
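A synthetic trace of this kind could be generated roughly as follows. This is only a sketch: the function names, the Zipf exponent, the Pareto shape, and the mean inter-arrival gap are assumptions, and the “hill” shaped temporal locality is not modeled here.

```python
import random


def zipf_weights(n, alpha=1.0):
    """Unnormalized Zipf weights: item of rank r gets weight 1 / r**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, n + 1)]


def generate_trace(num_requests, num_disks=24, blocks_per_disk=1000,
                   mean_gap=0.1, arrival="exponential", seed=0):
    """Return a list of (time, disk, block) requests."""
    rng = random.Random(seed)
    disk_weights = zipf_weights(num_disks)
    block_weights = zipf_weights(blocks_per_disk)
    pareto_shape = 1.5  # assumed shape; must be > 1 for a finite mean
    now = 0.0
    trace = []
    for _ in range(num_requests):
        if arrival == "exponential":
            gap = rng.expovariate(1.0 / mean_gap)
        elif arrival == "pareto":
            # paretovariate has mean shape / (shape - 1); rescale to mean_gap.
            scale = mean_gap * (pareto_shape - 1.0) / pareto_shape
            gap = scale * rng.paretovariate(pareto_shape)
        else:
            raise ValueError("arrival must be 'exponential' or 'pareto'")
        now += gap
        disk = rng.choices(range(num_disks), weights=disk_weights)[0]
        block = rng.choices(range(blocks_per_disk), weights=block_weights)[0]
        trace.append((now, disk, block))
    return trace
```

Under the Zipf weights a few disks receive most of the requests, which is the skew that gives per-disk partitioning something to exploit.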

Simulation results

Algorithms compared: Infinite cache, LRU, PA-LRU, PB-LRU

[Figure: simulation results chart. Callouts: limited savings due to a high cold miss rate (64%); PB-LRU saves 9%; outperforms LRU by 22%.]

Simulation results cont.

Oracle DPM does not slow down the average response time because it always spins the disk up in time for a request.

All PB-LRU results are insensitive to the epoch length.

[Figure: response time results. Callouts: PB-LRU has 5% better response time; saves 40% response time.]

Accuracy of Energy Estimation

OLTP, 21 disks, with Practical DPM

The largest deviation of estimated energy from real energy is 1.8%.

Cache partition sizes

MCKP partitioning tendency: it gives small sizes to disks that remain active and increases the sizes assigned to relatively inactive disks.

[Figure: cache partition sizes. Callouts: 1MB; 11-12MB.]

Effects of spin-up cost

Disks stay longer at low-power mode

Break-even time increases

Sensitivity Analysis on Epoch Length

The epoch length just needs to be large enough to accommodate the “warm-up” period after re-partitioning.

Conclusion

PB-LRU: an online storage cache replacement algorithm that partitions the total system cache among individual disks.

It focuses on multiple disks with data center workloads.

It achieves similar or better energy savings and response time improvements with significantly less parameter tuning.

Future work

Taking pre-fetching into consideration to investigate the role of cache management in energy conservation

Optimally divide the total cache between the cache and pre-fetching buffers

Implement the disk power modeling component into the real storage system

Impact of PB-LRU

5 citations found at Google Scholar:
Energy conservation techniques for disk array-based servers (ICS’04)
Performance Directed Energy Management for Main Memory and Disks (ASPLOS’04)
Power Aware Storage Cache Management
Power and Energy Management for Server Systems
Management Issues
