Counter Stacks: Storage Workload Analysis via Streaming Algorithms Nick Harvey University of British Columbia and Coho Data Joint work with Zachary Drudi,
Post on 05-Jan-2016
Counter Stacks: Storage Workload Analysis via Streaming Algorithms
Nick Harvey University of British Columbia and Coho Data
Joint work with Zachary Drudi, Stephen Ingram, Jake Wires, Andy Warfield
Caching: What data to keep in fast memory?
Fast, Low-Capacity Memory
Slow, High-Capacity Memory
Caching: Historically
Registers
RAM
Disk
Belady, 1966: FIFO, RAND, MIN
Denning, 1968: LRU
Caching: Modern
Registers, L1, L2, L3
RAM
Disk
SSD
Proxy
CDN
Associative map
LRU etc.
LRU
Consistent Hashing ...
From 1968:
• CPUs are >1000x faster
• Disk latency is <10x better
• Cache misses are increasingly costly
Challenge: Provisioning
How much cache should you buy to support your workload?
Challenge: Virtualization
• Modern servers are heavily virtualized
• How should we allocate the physical cache among virtual servers to improve overall performance?
• What is “marginal benefit” to giving server more cache?
Understanding Workloads
• Understanding workloads better can help
  – Administrators make provisioning decisions
  – Software make allocation decisions
• Storing a trace is costly: GBs per day
• Analyzing and distilling traces is a challenge
Hit Rate Curve
• Fix a particular workload and caching policy
• If cache size is x, what would hit rate be?
• HRCs are useful for choosing an appropriate cache size
[Plot: MSR Cambridge “TS” Trace, LRU Policy. x-axis: Cache Size (GB), 0 to 0.5; y-axis: Hit rate, 0 to 0.8. Annotations: “Elbow”, “Knee”, “Working Set”; past that point, not much marginal benefit of a bigger cache.]
Hit Rate Curve
• Real-world HRCs need not be concave or smooth
• “Marginal benefit” is meaningless
• “Working set” is a fallacy
[Plot: MSR Cambridge “Web” Trace, LRU Policy. x-axis: Cache Size (GB), 0 to 80; y-axis: Hit rate, 0 to 0.8. Annotations: “Elbow”? “Knee”? “Working Set”?]
LRU Caching
• Policy: An LRU cache of size x always contains the x most recently requested distinct symbols.
Requests: A B C A D A B …
• Between the two requests for B there are 3 distinct symbols (C, A, D): this is the “reuse distance”.
• If cache size >3 then B will still be in the cache during the second request for B.
  – Second request for B is a hit for cache size x if x>3.
• Inclusive: Larger caches always include contents of smaller caches.
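The reuse-distance definition above can be sketched in a few lines of Python. This is a minimal illustration, not the talk's code: the names `reuse_distances` and `is_hit` are hypothetical, and the distance is recomputed naively from the trace for clarity rather than speed.

```python
def reuse_distances(trace):
    """For each request, count the distinct symbols requested since the
    previous request for the same symbol (None on a first request)."""
    last_seen = {}      # symbol -> index of its most recent request
    distances = []
    for i, sym in enumerate(trace):
        if sym in last_seen:
            between = set(trace[last_seen[sym] + 1:i])
            distances.append(len(between))
        else:
            distances.append(None)   # never seen: a miss at every cache size
        last_seen[sym] = i
    return distances


def is_hit(distance, cache_size):
    """An LRU cache of size x hits exactly when the reuse distance is < x."""
    return distance is not None and distance < cache_size


trace = list("ABCADAB")
print(reuse_distances(trace))   # [None, None, None, 2, None, 1, 3]
```

The last entry is the second request for B: distance 3, so it misses with a cache of size 3 and hits with size 4.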
Mattson’s Algorithm
• Maintain LRU cache of size n; simulate cache of all sizes x ≤ n.
• Keep list of all blocks, sorted by most recent request time.
• Reuse distance of a request is its position in that list.
• If distance is d, this request is a hit for all cache sizes ≥ d.
• Hit rate curve is CDF of reuse distances.
Requests: A B C A D A B …
List after each request (most recent first):
  A
  B A
  C B A
  A C B
  D A C B
  A D C B
  B A D C
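A minimal Python sketch of the list-based procedure described above, assuming the slide's convention that the distance is the 1-based position in the list. The function name `mattson_hit_rate_curve` is hypothetical, and the plain Python list makes each request cost O(n).

```python
from collections import Counter


def mattson_hit_rate_curve(trace):
    """Return [hit rate at cache size 1, hit rate at size 2, ...]."""
    stack = []            # all blocks seen so far, most recently requested first
    hits_at = Counter()   # list position (1-based) -> number of requests found there
    for block in trace:
        if block in stack:
            position = stack.index(block) + 1   # smallest cache size that hits
            hits_at[position] += 1
            stack.remove(block)
        stack.insert(0, block)                  # block is now the most recent
    curve, running = [], 0
    for size in range(1, len(stack) + 1):
        running += hits_at[size]                # CDF of the recorded positions
        curve.append(running / len(trace))
    return curve


print(mattson_hit_rate_curve("ABCADAB"))
```

For the example trace the three repeated requests are found at positions 3, 2 and 4, so the curve rises from 0 at size 1 to 3/7 at size 4.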
Faster Mattson
[Bennett-Kruskal 1975, Olken 1981, Almasi et al. 2001, ...]
• Maintain table mapping block to time of last request
• # of blocks whose last request time is ≥ t = # of distinct blocks seen since time t
• Can compute this in O(log n) time with a balanced tree
• Can compute HRC in O(m log n) time
• Space is Ω(n)
n = # blocks
m = length of trace
[Figure: requests A B C A D A B … and a table with one row per block (A, B, C, D)]
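The counting step above can be sketched as follows. Note the swap: instead of a balanced tree over blocks, this sketch uses a Fenwick (binary indexed) tree over request times, which is simpler to write but costs O(log m) per request and O(m) space. The names `Fenwick` and `stack_distances` are hypothetical.

```python
class Fenwick:
    """Binary indexed tree over positions 1..size, supporting point
    updates and prefix sums in O(log size)."""

    def __init__(self, size):
        self.tree = [0] * (size + 1)

    def add(self, i, delta):
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, i):
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total


def stack_distances(trace):
    """Position of each request in Mattson's list (None on first request)."""
    marks = Fenwick(len(trace))   # 1 at time t iff t is some block's last request
    last_time = {}                # block -> time of last request
    out = []
    for t, block in enumerate(trace, start=1):
        prev = last_time.get(block)
        if prev is None:
            out.append(None)
        else:
            # blocks whose last request time is >= prev (the block itself included)
            out.append(marks.prefix_sum(t - 1) - marks.prefix_sum(prev - 1))
            marks.add(prev, -1)   # prev is no longer this block's last request
        marks.add(t, 1)
        last_time[block] = t
    return out


print(stack_distances("ABCADAB"))   # [None, None, None, 3, None, 2, 4]
```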
Is linear space OK?
• A modern disk is 8TB, divided in 4kB blocks ⇒ n = 2B
• The problem is worse in multi-disk arrays (60TB JBOD) ⇒ n = 15B
• If the algorithm for improving memory usage consumes 15GB of RAM, that’s counterproductive!
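The block counts above follow from simple division. A quick check in Python, assuming decimal units (1 TB = 10^12 bytes, 1 kB = 10^3 bytes); the helper name `block_count` is hypothetical.

```python
TB = 10**12   # decimal terabyte (assumed)
KB = 10**3    # decimal kilobyte (assumed)


def block_count(capacity_bytes, block_bytes=4 * KB):
    """Number of fixed-size blocks on a device of the given capacity."""
    return capacity_bytes // block_bytes


print(block_count(8 * TB))    # 2000000000, i.e. n = 2B
print(block_count(60 * TB))   # 15000000000, i.e. n = 15B
```

At n = 15B, even a single byte of state per block already amounts to 15 GB.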
Is linear space OK?
• We ran an optimized C implementation of Mattson on the MSR-Cambridge traces of 13 live servers over 1 week
• Trace file is 20GB in size, 2.3B requests, 750M blocks (3TB)
• Processing time: 1 hour
• RAM usage: 92GB
• Lesson: Cannot afford linear space to process storage workloads
• Question: Can we estimate HRCs in sublinear space?
Quadratic Space
Requests: A B C A D A B
Set of all subsequent items (set j = items seen since the jth request; each row shows the sets that a request is added to):
  Request   Set 1  Set 2  Set 3  Set 4  Set 5  Set 6  Set 7
  1: A        A
  2: B        B      B
  3: C        C      C      C
  4: A               A      A      A
  5: D        D      D      D      D      D
  6: A                                    A      A
  7: B                      B      B      B      B      B
• Reuse distance is size of oldest set that grows.
• Hit rate curve is CDF of reuse distances.
Reuse distance = 2 for the second A, 1 for the third A, 3 for the second B.
Quadratic Space
Requests: A B C A D A B
For t = 1,…,m
    Receive request b_t
    Find minimum j such that b_t is not in jth set
    Let v_j be cardinality of jth set
    Record a hit at reuse distance v_j
    Insert b_t into all previous sets
Example: for the final request B, j = 3 and v_j = 3.
[Figure: the sets of subsequent items just before the final request, with set j = 3 marked]
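The loop above can be sketched in Python with one explicit set per request. Two details are assumptions of this sketch rather than something the slide states: a block that has never been requested records nothing (it misses at every cache size), and a block that repeats the immediately preceding request records distance 0. The function name is hypothetical.

```python
from collections import Counter


def reuse_histogram_quadratic(trace):
    """Histogram of reuse distances, keeping one explicit set per request."""
    suffix_sets = []      # suffix_sets[j] = distinct items seen since request j
    hist = Counter()      # reuse distance -> number of requests
    for b in trace:
        if suffix_sets and b in suffix_sets[0]:
            # Oldest set that would grow; None if b repeats the previous request.
            j = next((k for k, s in enumerate(suffix_sets) if b not in s), None)
            distance = len(suffix_sets[j]) if j is not None else 0
            hist[distance] += 1
        # Otherwise b has never been requested: a miss at every cache size.
        for s in suffix_sets:
            s.add(b)
        suffix_sets.append({b})
    return dict(hist)


print(reuse_histogram_quadratic("ABCADAB"))   # {2: 1, 1: 1, 3: 1}
```

With m requests this keeps m sets of up to n items each, which is where the quadratic space comes from.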
More Abstract Version
Requests: A B C A D A B
For t = 1,…,m
    Let v_j be cardinality of jth set
    Receive request b_t; insert b_t into all previous sets
    Let δ_j be change in jth set’s cardinality when adding b_t
    For j = 2,…,t
        Record (δ_j − δ_{j−1}) hits at reuse distance v_j
Example, for the final request B:
    δ_j:           0 0 1 1 1 1
    δ_j − δ_{j−1}: 0 0 1 0 0 0
    v_j = 3
How should we represent these sets? Hash table?
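As a baseline, the abstract version can be sketched with Python's built-in hash-based sets, using only the two operations it needs: insert and cardinality. One assumption of this sketch: the set for the current request is created empty before the request is inserted, so an immediate repeat shows up as a hit at distance 0. The function name is hypothetical.

```python
from collections import Counter


def reuse_histogram_deltas(trace):
    """Reuse-distance histogram using only insert and cardinality on each set."""
    sets = []
    hist = Counter()
    for b in trace:
        sets.append(set())                    # set t starts empty (assumption)
        before = [len(s) for s in sets]       # v_j, taken before the insert
        for s in sets:
            s.add(b)
        delta = [len(s) - v for s, v in zip(sets, before)]   # each is 0 or 1
        for j in range(1, len(sets)):
            if delta[j] != delta[j - 1]:      # delta is nondecreasing in j
                hist[before[j]] += delta[j] - delta[j - 1]
    return dict(hist)


print(reuse_histogram_deltas("ABCADAB"))   # {2: 1, 1: 1, 3: 1}
```

Because membership queries are never used, any structure that supports insert and an (approximate) cardinality query could stand in for `set` here.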
Random Set Data Structures
Operations        Hash Table    Bloom Filter   F0 Estimator
Insert            Yes           Yes            Yes
Delete            Yes           No             No
Member?           Yes           Yes*           No
Cardinality?      Yes           No             Yes*
Space (in bits)   Ω(n log n)    Ω(n)           O(log n)
* allowing some error
F0 Estimator: aka “HyperLogLog”, “Probabilistic Counter”, “Distinct Element Estimator”
Subquadratic Space
Requests: A B C A D A B
Set of all subsequent items: one F0 estimator per request (items seen since first request, items seen since second request, …); each new request is inserted into every estimator.
• Reuse distance is size of oldest set that grows (cardinality query)
• Hit rate curve is CDF of reuse distances.
For t = 1,…,m
    Let v_j be value of jth F0-estimator
    Receive request b_t
    Let δ_j be change in jth F0-estimator when adding b_t
    For j = 2,…,t
        Record (δ_j − δ_{j−1}) hits at reuse distance v_j
Towards Sublinear Space
Requests: A B C A
Set of all subsequent items: F0 Estimator ⊇ F0 Estimator ⊇ … ⊇ F0 Estimator
• Note that an earlier F0-estimator is a superset of later one
• Can this be leveraged to achieve sublinear space?
F0 Estimation
[Flajolet-Martin ‘83, Alon-Matias-Szegedy ‘99, …, Kane-Nelson-Woodruff ‘10]
Operations:
• Insert(x)
• Cardinality(), with (1+ε) multiplicative error
Space: log(n)/ε² bits (Θ(ε⁻² + log n) is optimal)
[Figure: bit matrix with log n rows and ε⁻² columns]
F0 Estimation
Operations: Insert(x), Cardinality()
Requests: A B C A D A B …
Hash function h (uniform)
Hash function g (geometric)
[Animation: as each request is inserted, entries of a bit matrix with log n rows and ε⁻² columns are set to 1]
F0 Estimation
Operations: Insert(x), Cardinality()
Suppose we insert n distinct elements
• # of 1s in a column is max of ≈ nε² geometric RVs, so ≈ log(nε²)
• Averaging over all columns gives a concentrated estimate for log(nε²)
• Exponentiating and scaling gives concentrated estimate for n
[Figure: bit matrix with log n rows and ε⁻² columns]
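The averaging-then-exponentiating recipe above is essentially the LogLog estimator, and a rough Python sketch is given here under several assumptions: each column stores only its maximum row instead of a full bit column, the hash pair (h, g) is derived from one 64-bit BLAKE2b digest, and the scaling constant 0.39701 is the asymptotic LogLog correction. The class name `F0Estimator` is hypothetical; this is not the implementation used in the talk.

```python
import hashlib


class F0Estimator:
    """LogLog-style distinct-element estimator (illustrative only)."""

    ALPHA = 0.39701   # asymptotic LogLog bias correction (assumed constant)
    MAX_ROW = 48      # cap on the geometric row index

    def __init__(self, columns=1024):
        self.columns = columns
        self.height = [0] * columns   # per column: highest row set so far

    def _column_and_row(self, x):
        digest = hashlib.blake2b(repr(x).encode(), digest_size=8).digest()
        v = int.from_bytes(digest, "big")
        col = v % self.columns        # h: roughly uniform column
        rest = v // self.columns
        row = 1                       # g: geometric row, P(row = r) is about 2^-r
        while rest & 1 == 0 and row < self.MAX_ROW:
            rest >>= 1
            row += 1
        return col, row

    def insert(self, x):
        col, row = self._column_and_row(x)
        if row > self.height[col]:
            self.height[col] = row

    def cardinality(self):
        if not any(self.height):
            return 0.0
        mean_height = sum(self.height) / self.columns    # average over columns
        return self.ALPHA * self.columns * 2 ** mean_height   # exponentiate, scale


est = F0Estimator()
for i in range(100_000):
    est.insert(i)
print(round(est.cardinality()))
```

With 1024 columns the relative error is typically a few percent; re-inserting an element never changes the state, which is what makes the structure a distinct-element counter.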
F0 Estimation for a chain
Operations:
• Insert(x)
• Cardinality(t), estimate # distinct elements since tth insert
Space: log(n)/ε² words
[Figure: matrix with log n rows and ε⁻² columns; each entry is a word]
F0 Estimation for a chain
Operations: Insert(x), Cardinality(t)
Space: log(n)/ε² words
Requests: A B C A D A B …
Hash function h (uniform)
Hash function g (geometric)
[Animation: as each request is inserted, entries of the matrix (log n rows, ε⁻² columns) are overwritten with the insertion time; the entries shown go from 1, 1 to 2, 4, 5, 3, 4, 5, 5]
F0 Estimation for a chain
Operations: Insert(x), Cardinality(t)
• The {0,1}-matrix consisting of all entries ≥ t is the same as the matrix for an F0 estimator that started at time t.
• So, for any t, we can estimate # distinct elements since time t.
[Figure: matrix of insertion times with log n rows and ε⁻² columns; entries 2, 4, 5, 3, 4, 5, 5]
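A sketch of the timestamp idea in Python, under stated assumptions: each cell holds the time of the latest insert that landed in it, a query keeps only cells with timestamp ≥ t, and the estimate reuses a LogLog-style formula with the asymptotic constant 0.39701. The names `column_and_row` and `F0Chain` and the BLAKE2b-based hashing are hypothetical choices for illustration.

```python
import hashlib

ALPHA = 0.39701   # asymptotic LogLog bias correction (assumed constant)
MAX_ROW = 48      # cap on the geometric row index


def column_and_row(x, columns):
    """Hypothetical hash pair: h picks a uniform column, g a geometric row."""
    digest = hashlib.blake2b(repr(x).encode(), digest_size=8).digest()
    v = int.from_bytes(digest, "big")
    col, rest = v % columns, v // columns
    row = 1
    while rest & 1 == 0 and row < MAX_ROW:
        rest >>= 1
        row += 1
    return col, row


class F0Chain:
    def __init__(self, columns=64):
        self.columns = columns
        self.now = 0
        # stamp[row][col] = time of the latest insert landing in that cell (0 = never)
        self.stamp = [[0] * columns for _ in range(MAX_ROW + 1)]

    def insert(self, x):
        self.now += 1
        col, row = column_and_row(x, self.columns)
        self.stamp[row][col] = self.now

    def heights_since(self, t):
        """Column heights of an estimator that had started at the t-th insert (t >= 1)."""
        return [
            max((r for r in range(1, MAX_ROW + 1) if self.stamp[r][c] >= t), default=0)
            for c in range(self.columns)
        ]

    def cardinality(self, t):
        heights = self.heights_since(t)
        if not any(heights):
            return 0.0
        return ALPHA * self.columns * 2 ** (sum(heights) / self.columns)


chain = F0Chain()
trace = [i % 300 for i in range(1000)]
for x in trace:
    chain.insert(x)
print(chain.cardinality(1), chain.cardinality(900))
```

One matrix of words therefore answers Cardinality(t) for every start time t, instead of keeping a separate estimator per request.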
Theorem: Let n = B·W.
Let C : [n] → [0,1] be true HRC.
Let Ĉ : [n] → [0,1] be estimated HRC.
Using O(B²·log(n)·log(m)/ε²) words of space, can get
    C((j−1)·W) − ε ≤ Ĉ(j·W) ≤ C(j·W) + ε    ∀ j = 1,…,B
n = # distinct blocks    m = # requests
B = # “bins”    W = width of each “bin”
[Plot: Hit Rate (0 to 1) against cache size (0 to n, split into B bins of width W), showing C and Ĉ. Ĉ(j·W) lies between C((j−1)·W) − ε and C(j·W) + ε: ε is the vertical error, the shift by one bin is the horizontal error.]
Experiments: MSR-Cambridge traces of 13 live servers over 1 week
• Trace file is 20GB in size, 2.3B requests, 750M blocks
• Optimized C implementation of Mattson’s algorithm
  – Processing time: ~1 hour
  – RAM usage: ~92GB
• Java implementation of our algorithm
  – Processing time: 17 minutes (2M requests per second)
  – RAM usage: 80MB (mostly the garbage collector)
Experiments: MSR-Cambridge traces of 13 live servers over 1 week
• Trace file has m=2.3B requests, n=750M blocks
[Plot; legend: heuristic, counter stacks]
Experiments: MSR-Cambridge traces of 13 live servers over 1 week
• Trace file has m=585M requests, n=62M blocks
[Plot; legend: heuristic, counter stacks]
Experiments: MSR-Cambridge traces of 13 live servers over 1 week
• Trace file has m=75M requests, n=20M blocks
[Plot; legend: heuristic, counter stacks]
Conclusions
• Workload analysis by measuring uniqueness over time.
• Notion of “working set” can be replaced by “hit rate curve”.
• Can estimate HRCs in sublinear space, quickly and accurately.
• On some real-world data sets, its accuracy is noticeably better than heuristics that have been proposed in the literature.
Open Questions
• Does algorithm use optimal amount of space? Can it be improved to O(B·log(n)·log(m)/ε²) words of space?
• We did not discuss runtime. Can we get runtime independent of B and ε?
• We are taking difference of F0-estimators by subtraction. This seems crude. Is there a better approach?
• Streaming has been used in networks, databases, etc. To date, not used much in storage. Potentially more uses here.