TRIAD: Creating Synergies Between Memory, Disk and Log in Log Structured Key-Value Stores
O. Balmau, D. Didona, R. Guerraoui, W. Zwaenepoel (EPFL)
A. Arora, K. Gupta, P. Konka, H. Yuan (Nutanix)
USENIX ATC ’17, Santa Clara, CA
KV Stores
Very simple data stores.
KV pairs.
Simple operations: update, read.
KV Stores
Distributed or single machine.
In-memory or persistent.
Log-Structured Merge (LSM)
TRIAD in a Nutshell
TRIAD LSM KV: achieves 2x throughput on production workloads.
Methods: reducing background I/O in LSMs.
No need to know the workload a priori.
LSM KV semantics preserved.
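LSM Overview
LSM Components
Memory: sorted memory component (Cm).
Disk: SSTables, organized in levels L0 … Ln.
• sorted files
• many SSTables per level
LSM Updates
[Diagram: an update is appended to the commit log and applied to Cm.]
LSM Reads
[Diagram: a read searches Cm first, then the disk levels L0 … Ln.]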
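The update and read paths above can be sketched in a few lines of Python. This is a toy model for illustration only, not RocksDB or TRIAD code: the class name `TinyLSM`, the `mem_limit` parameter, and the single-level layout (only L0, no compaction) are assumptions made for this sketch.

```python
import bisect


class TinyLSM:
    """Toy LSM: commit log + memory component (Cm) + one disk level (L0)."""

    def __init__(self, mem_limit=4):
        self.mem_limit = mem_limit  # hypothetical capacity of Cm, in keys
        self.commit_log = []        # append-only list of (key, value)
        self.mem = {}               # Cm: latest value per key
        self.l0 = []                # L0: list of SSTables, newest first

    def update(self, key, value):
        self.commit_log.append((key, value))  # log first, for durability
        self.mem[key] = value                 # then apply to Cm
        if len(self.mem) >= self.mem_limit:
            self.flush()

    def flush(self):
        # An SSTable is modeled as a list of (key, value) sorted by key.
        self.l0.insert(0, sorted(self.mem.items()))
        self.mem.clear()
        self.commit_log.clear()  # logged data is now in an SSTable

    def read(self, key):
        if key in self.mem:          # newest data lives in Cm
            return self.mem[key]
        for sstable in self.l0:      # then search SSTables, newest first
            i = bisect.bisect_left(sstable, (key,))
            if i < len(sstable) and sstable[i][0] == key:
                return sstable[i][1]
        return None


db = TinyLSM(mem_limit=2)
db.update("k1", "v1")
db.update("k2", "v2")    # Cm is full, so it is flushed to L0
db.update("k1", "v1'")   # newer version of k1 stays in Cm
print(db.read("k1"), db.read("k2"), db.read("k9"))
```

The read path returns the newest version because Cm is checked before L0, and L0 SSTables are kept newest first.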
LSM Background Ops: Flushing
Flushing: from memory to L0.
[Diagram, trigger "Mem component full": the commit log (on disk) holds K1 V1, K2 V2, K1 V1’, K3 V3, …; Cm (in RAM) holds K1 V1’, K2 V2, K3 V3’, …, Kn Vn and is written to L0 as flush1.]
[Diagram, trigger "Commit log full": the commit log is filled by repeated updates to a few keys; Cm holds only K1 V1’, K2 V2, K3 V3’ and is written to L0 as flush1.]
LSM Background Ops: Compaction
Flushing: from memory to L0.
Compaction: between levels on disk.
[Diagram: key K is written to disk in L0, then rewritten to disk each time compaction merges its file into the next level, L1 … Ln. At Ln, K is rewritten to disk for the (n+1)th time.]
Write amplification (WA) ≈ amount of rewrites of data to disk.
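To make the rewrite count concrete, here is a minimal sketch of compaction as a merge of sorted key-value lists. The function names `compact` and `push_down`, and the one-SSTable-per-level layout, are assumptions for illustration; real LSMs pick individual overlapping files rather than merging whole levels.

```python
def compact(upper, lower):
    """Merge two sorted SSTables; entries from `upper` are newer and win."""
    merged = dict(lower)
    merged.update(upper)
    return sorted(merged.items())


def push_down(levels):
    """Compact each level into the next one; return entries written to disk."""
    written = 0
    for i in range(len(levels) - 1):
        levels[i + 1] = compact(levels[i], levels[i + 1])
        levels[i] = []
        written += len(levels[i + 1])  # the whole merged file is rewritten
    return written


# One new key "k" in L0, older data in L1 and L2 (hypothetical contents).
levels = [
    [("k", "v")],
    [("a", "1"), ("m", "2")],
    [("b", "3"), ("n", "4"), ("z", "5")],
]
print(push_down(levels))  # entries written by compaction to move "k" to L2
print(levels[2])
```

In this run, key `k` is written once by the flush and once per compaction step, and every entry that shares a file with it is rewritten as well.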
Insight
Severe competition for compute/storage resources between LSM background ops and user ops.
Background I/O Overhead
§ Long and slow background ops slow down user ops.
[Chart: throughput in K operations/s (axis 0 to 300) of RocksDB and of RocksDB with no background I/O, on uniform 50r-50w and skewed 50r-50w workloads.]
Up to 3x throughput gap.
Goal
Decrease background ops overhead to increase user throughput.
TRIAD
Technique    Workload            Improves WA in
TRIAD-MEM    Skewed workloads    Flushing and compaction
TRIAD-DISK   In-between          Compaction
TRIAD-LOG    Uniform workloads   Flushing
Three techniques work together and are complementary.
TRIAD-MEM
Workload: skewed workloads. Improves WA in: flushing and compaction.
Problem: Flushing with Skewed Workloads
[Diagram: hot key K1 is updated repeatedly (V11, V12, V13, V14, …, V1n) and K2 once. Every update is appended to the commit log (on disk), but Cm (in RAM) keeps only the latest versions, K1 V1n and K2 V2. When the commit log fills up, Cm is flushed to L0 as flush1; the pattern repeats for flush2 … flushn.]
High data skew:
• Flush because commit log is full
• Flush mostly empty mem component
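The effect described above can be reproduced with a small simulation. This is a sketch under assumed parameters: `log_capacity` and `mem_capacity` are hypothetical limits counted in entries and keys, not actual RocksDB settings.

```python
def simulate_flushes(keys, log_capacity, mem_capacity):
    """Replay updates and flush when Cm or the commit log is full.

    Returns one (trigger, keys_flushed) pair per flush.
    """
    mem, log_len, flushes = set(), 0, []
    for key in keys:
        log_len += 1   # every update is appended to the commit log
        mem.add(key)   # Cm keeps one entry per distinct key
        if len(mem) >= mem_capacity:
            trigger = "mem full"
        elif log_len >= log_capacity:
            trigger = "log full"
        else:
            continue
        flushes.append((trigger, len(mem)))
        mem, log_len = set(), 0
    return flushes


skewed = ["k1"] * 7 + ["k2"]             # hot key k1, one update to k2
uniform = [f"k{i}" for i in range(8)]    # eight distinct keys
print(simulate_flushes(skewed, log_capacity=8, mem_capacity=8))
print(simulate_flushes(uniform, log_capacity=8, mem_capacity=8))
```

With the skewed input, the flush is triggered by the commit log and writes a memory component that holds only two of its eight possible keys.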
Problem: Compaction with Skewed Workloads
[Diagram: files in L0 and L1. Popular key K1 appears in L0 files, and each compaction of such a file into L1 rewrites the overlapping L1 file to disk.]
File on L1 rewritten to disk twice because of one key.
TRIAD-MEM: Hot-cold key separation
Idea:
• Keep hot keys in memory
• Flush only cold keys
• Keep hot keys in CL
[Diagram: before the flush, Cm (in RAM) holds K1 V1n, K2 V2, K3 V3, …, Kn Vn and the commit log (on disk) holds K1 V11, K2 V2, K1 V12, K1 V13, K1 V14, …, K1 V1n. After the flush, cold keys K2 V2, K3 V3, …, Kn Vn are in L0, hot key K1 V1n stays in Cm, and the commit log keeps only K1 V1n.]
TRIAD-MEM Summary
✓ Good for skewed workloads.
✓ Reduce flushing WA: less data written from memory to disk.
✓ Reduce compaction WA: avoid repeatedly compacting hot keys.
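The hot-cold separation idea can be sketched as follows. The slides do not say how hot keys are detected, so this sketch assumes a per-key update counter and a fixed `hot_k` cutoff; both, along with the function name `flush_cold_only`, are illustrative assumptions rather than the TRIAD implementation.

```python
from collections import Counter


def flush_cold_only(mem, update_counts, hot_k):
    """Split Cm at flush time into cold keys (flushed) and hot keys (kept)."""
    # Assumption: the hot_k most frequently updated keys count as hot.
    hot = {key for key, _ in update_counts.most_common(hot_k)}
    cold_sstable = sorted((k, v) for k, v in mem.items() if k not in hot)
    new_mem = {k: v for k, v in mem.items() if k in hot}
    # The new commit log keeps only the latest value of each hot key.
    new_commit_log = sorted(new_mem.items())
    return cold_sstable, new_mem, new_commit_log


updates = [("k1", "v11"), ("k2", "v2"), ("k1", "v12"),
           ("k1", "v13"), ("k3", "v3"), ("k1", "v1n")]
mem, counts = {}, Counter()
for key, value in updates:
    mem[key] = value
    counts[key] += 1

cold, kept, log = flush_cold_only(mem, counts, hot_k=1)
print(cold)  # cold keys written to L0
print(kept)  # hot keys that stay in Cm
print(log)   # what remains in the commit log
```

Only the cold keys reach L0, so the hot key is neither flushed repeatedly nor dragged through compaction each time it changes.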
TRIAD-LOG
Workload: uniform workloads. Improves WA in: flushing.
Problem: Flushing with Uniform Workloads
[Diagram: the commit log (on disk) holds K1 V1, K2 V2, K1 V1’, K3 V3, K3 V3’, …, Kn Vn. Cm (in RAM) holds the latest versions, K1 V1’, K2 V2, …, Kn Vn, and the flush writes them to L0.]
Insight: Flushed data already written to commit log.
Idea: Use commit logs as SSTables. Avoid background I/O due to flushing.
TRIAD-LOG
[Diagram: each Cm entry carries a CL index next to its key and value (K1 V1’ 3, K2 V2 2, …, Kn Vn n) for the commit log K1 V1, K2 V2, K1 V1’, K3 V3, K3 V3’, …, Kn Vn.]
Point to most recent entry in CL.
CL-SSTable: Only flush CL Index from memory and couple it with the current Commit Log.
Keep index in memory for further reads.
TRIAD-LOG Summary
✓ Good for uniform workloads.
✓ Reuse Commit Log as L0 SST.
✓ No more flushing of mem component to disk.
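A minimal sketch of the commit log index follows. The class name `IndexedCommitLog`, the dictionary layout of the CL-SSTable, and the 1-based positions (chosen to match the slide's K1: 3, K2: 2 example) are assumptions for illustration, not the on-disk format used by TRIAD.

```python
class IndexedCommitLog:
    """Commit log plus an index from key to its most recent log entry."""

    def __init__(self):
        self.log = []       # append-only list of (key, value)
        self.cl_index = {}  # key -> 1-based position of the latest entry

    def update(self, key, value):
        self.log.append((key, value))
        self.cl_index[key] = len(self.log)  # point to most recent entry

    def to_cl_sstable(self):
        # Only the small sorted index is produced at flush time; the values
        # stay where they already are, in the commit log.
        return {"index": sorted(self.cl_index.items()), "log": self.log}


def read_cl_sstable(cl_sstable, key):
    """Look up a key through the index, then fetch its value from the log."""
    position = dict(cl_sstable["index"]).get(key)
    if position is None:
        return None
    return cl_sstable["log"][position - 1][1]


cl = IndexedCommitLog()
for key, value in [("K1", "V1"), ("K2", "V2"), ("K1", "V1'"),
                   ("K3", "V3"), ("K3", "V3'")]:
    cl.update(key, value)

table = cl.to_cl_sstable()
print(table["index"])
print(read_cl_sstable(table, "K1"))
```

Stale entries such as the first K1 V1 remain in the log but are never reached, because the index only points to the most recent entry per key.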
TRIAD Summary
TRIAD-MEM, TRIAD-DISK, TRIAD-LOG:
o Complementary, targeting different workloads.
o Working simultaneously.
o Transparent to the workloads; no a priori knowledge needed.
Evaluation
§ Compare TRIAD with RocksDB
§ Workloads: Production, Synthetic
§ Metrics: Throughput, Write Amplification (WA)
§ Code: https://github.com/epfl-labos/TRIAD
Write Amplification (WA)
WA = total data written to storage / data written by app
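As a worked example of the ratio above, the short sketch below computes WA from two byte counters. The function name and the sample byte counts are made up for illustration; they are not measurements from the talk.

```python
def write_amplification(total_written_to_storage, written_by_app):
    """WA = total data written to storage / data written by app."""
    if written_by_app <= 0:
        raise ValueError("written_by_app must be positive")
    return total_written_to_storage / written_by_app


# Hypothetical counters: the app wrote 10 GiB, the store wrote 40 GiB in
# total (commit log, flushes, and compaction rewrites).
app_bytes = 10 * 2**30
storage_bytes = 40 * 2**30
print(write_amplification(storage_bytes, app_bytes))  # 4.0
```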
Production Workloads: Throughput
[Chart: throughput in KOPS (axis 0 to 350, higher is better) of RocksDB and TRIAD on Prod Wkld 1 and Prod Wkld 2, labeled ~uniform and skewed. Chart annotation: 2x.]
TRIAD: stable throughput across workloads.
Production Workloads: Write Amplification
[Chart: write amplification (axis 0 to 10, lower is better) of RocksDB and TRIAD on Prod Wkld 1 and Prod Wkld 2, labeled ~uniform and skewed. Chart annotation: 4x.]
TRIAD: low and uniform WA.
TRIAD: Throughput Breakdown Synthetic Workloads
[Chart: throughput in KOPS (higher is better) in two panels, No Skew (axis 0 to 300) and Skew 1%-99% (axis 250 to 330). Bars: TRIAD-MEM (skew awareness only), TRIAD-DISK (deferred compaction only), TRIAD-LOG (commit log indexing only), TRIAD, and RocksDB. Chart annotations: 97%, 96%.]
Complementary techniques
§ TRIAD-MEM efficient for skewed workloads.
§ TRIAD-LOG efficient for uniform workloads.
More in Our Paper
o More production workloads
o More synthetic workloads
o Detailed breakdown of TRIAD techniques
o TRIAD-DISK
Related work
o LevelDB: first LSM-based KV store. No attempts to reduce WA.
o A number of systems attempt to reduce WA.
§ LSMs: RocksDB, bLSM (SIGMOD/PODS ’12), VT-tree (FAST ’13),