Durability for Memory-Based Key-Value Stores, Kiarash Rezahanjani, July 4, 2012
Nov 15, 2014
1
Durability for Memory-Based Key-Value Stores
Kiarash Rezahanjani
July 4, 2012
2
Durability
set(university, UPC)
Ack
get(university)
UPC
Data Store
(university, KTH)
3
Durability
Ack
Data Store
Non Volatile
set(university, UPC)
Commodity
4
Durability
Ack
Data Store
set(myKey, U)
Commodity
5
Durability
Disk
Write Read
SLOW: seek time + rotational time + transfer time
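The three components above can be added up for a rough estimate of one random disk access. The sketch below is only an illustration: the drive parameters (8 ms average seek, 7200 RPM, 100 MB/s transfer) are assumed values for a hypothetical commodity disk, not figures from the slides.

```python
def disk_access_ms(seek_ms: float, rpm: int, transfer_mb_s: float, size_bytes: int) -> float:
    """Estimate one random disk access as seek + rotational + transfer time."""
    # On average the platter has to turn half a revolution before the sector arrives.
    rotational_ms = 0.5 * 60_000 / rpm
    # Time to move the payload itself once the head is in position.
    transfer_ms = size_bytes / (transfer_mb_s * 1_000_000) * 1000
    return seek_ms + rotational_ms + transfer_ms


# Hypothetical drive parameters (assumptions, not measurements from the talk).
latency = disk_access_ms(seek_ms=8.0, rpm=7200, transfer_mb_s=100.0, size_bytes=200)
print(f"{latency:.2f} ms per random 200-byte write")
```

For small entries the transfer term is negligible, so the cost is dominated by head movement and rotation.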
6
Cache in memory
Primary copy of objects
Cached Objects
Reads / Writes
Consistency?
Fast / Slow
7
Cache in memory
MySQL Servers
Memcache servers
Application Servers
Update Obj A
Delete Obj A
Read Obj A -> Cache miss
Read Obj A
Set Obj A
Stale data
Spending resources
Writes are still slow
Complicates development
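One interleaving that can leave stale data in the cache is sketched below. The two dictionaries are stand-ins for the MySQL primary copy and the memcache tier, and the exact ordering of the steps is an assumption chosen to show the problem, not a trace from the slides.

```python
db = {"A": "v1"}   # stands in for the MySQL primary copy
cache = {}         # stands in for the memcache tier

# Reader: cache miss on A, so it reads the current value from the database.
value_seen_by_reader = db["A"]

# Writer: updates the primary copy, then deletes the cached entry.
db["A"] = "v2"
cache.pop("A", None)

# Reader: completes its miss handling by setting the value it read earlier.
cache["A"] = value_seen_by_reader

# The cache now serves the old value until the entry is updated or evicted.
print("cache:", cache["A"], "database:", db["A"])
```

Avoiding this requires extra coordination in the application, which is part of what complicates development.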
8
Memory-Based Databases
Primary Copy of Objects
Back up
Writes / Reads
No stale data
Reads are fast
Write latency?
Durability?
No inconsistency
9
Approaches towards durability
Data loss
Slow
Data loss
Snapshot Snapshot
State A State B Periodic Snapshots
Log Log Log
Synchronous logging
Logs Logs
Asynchronous logging
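The trade-off between the two logging modes can be sketched as follows. `SyncLog` and `AsyncLog` are hypothetical classes written for this illustration; the simulated crash is simply a check of what has reached the file before the asynchronous flush runs.

```python
import os
import tempfile


class SyncLog:
    """Acknowledge a write only after it has been forced to disk."""

    def __init__(self, path):
        self.file = open(path, "ab")

    def append(self, entry: bytes) -> str:
        self.file.write(entry + b"\n")
        self.file.flush()
        os.fsync(self.file.fileno())  # slow, but an acked entry survives a crash
        return "ack"


class AsyncLog:
    """Acknowledge at once; entries reach the disk only on a later flush."""

    def __init__(self, path):
        self.file = open(path, "ab")
        self.pending = []

    def append(self, entry: bytes) -> str:
        self.pending.append(entry)  # fast, but the entry lives only in memory
        return "ack"

    def flush(self) -> None:
        for entry in self.pending:
            self.file.write(entry + b"\n")
        self.pending.clear()
        self.file.flush()
        os.fsync(self.file.fileno())


with tempfile.TemporaryDirectory() as directory:
    sync_path = os.path.join(directory, "sync.log")
    async_path = os.path.join(directory, "async.log")
    sync_log, async_log = SyncLog(sync_path), AsyncLog(async_path)
    sync_log.append(b"set university UPC")
    async_log.append(b"set university UPC")
    # Simulated crash before the background flush: inspect what is on disk.
    sync_bytes = os.path.getsize(sync_path)
    async_bytes = os.path.getsize(async_path)
    sync_log.file.close()
    async_log.file.close()

print("bytes on disk:", sync_bytes, "(sync) vs", async_bytes, "(async)")
```

Both calls returned "ack", but only the synchronous log holds the entry at the moment of the crash.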
10
Approaches towards durability
Data
Replica Replica
Replica
Expensive
Catastrophic failure: all gone
11
Project Goals
Durable write
Low latency
Cheap, commodity hardware
Availability, able to recover quickly
12
Target systems
• Data is big = many machines
• Read dominant workload
• Simple key-value store
• Small writes
– Example: Facebook
  • Terabytes of data = 2000 memcache servers
  • Write/read ratio < 6%
  • Memcache is a key-value store
  • Status update, tag photo, profile update, etc.
13
Solution
14
Design decisions
Periodic snapshot vs.
Message logging
15
Design decisions
Local disk vs.
Remote location
16
Design decisions
Remote file server vs.
Local disks of database cluster
17
Design Decision
Database
client
write
Log Ack
Remote storage
Two Problems
1) Synchronous logging is a must (asynchronous logging is not enough)
2) Data availability
18
Replication
Problems: Data loss
Replication
Log Ack
Log Log Log
Replication
Log Ack
19
20
master
slave slave
head tail
Broadcast / Chain replication
Log Log Ack Ack
Replication
21
Replication
master
slave slave
Broadcast
Log Ack
slave
22
Replication
head tail
Chain replication
Log Ack
23
Chain Replication
Database
client
write
Log Ack
24
Log Log Log
Chain Replication
Database
client
write
Log Ack
Log Log Log
Stable Storage Unit
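A minimal in-process sketch of chain replication inside one stable storage unit follows. The `ChainNode` class and the three-node chain are assumptions made for illustration; in this sketch the tail's acknowledgement travels back through the call stack, whereas a networked implementation would typically send it straight from the tail to the database.

```python
class ChainNode:
    """One log server in a chain; entries flow from head to tail."""

    def __init__(self, name, successor=None):
        self.name = name
        self.successor = successor
        self.log = []

    def append(self, entry):
        self.log.append(entry)  # store the entry locally first
        if self.successor is None:
            # Only the tail acknowledges, so an ack implies that every
            # node in the chain already holds the entry.
            return f"ack from {self.name}"
        return self.successor.append(entry)


# Build a chain of three log servers: head -> middle -> tail.
tail = ChainNode("tail")
middle = ChainNode("middle", successor=tail)
head = ChainNode("head", successor=middle)

# The database sends each write to the head and waits for the tail's ack.
ack = head.append("set(myKey, U)")
print(ack, [len(node.log) for node in (head, middle, tail)])
```

Because the acknowledgement comes only from the tail, the database can treat an acked write as replicated on all members of the unit.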
25
Available Logs
Synchronous logging abstraction
Low latency
26
Log Server
Log
27
Log Server
Receiver
Persister
Reader
Sequential Write
Seek time
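The receiver, persister and reader roles can be sketched with a queue and a background thread. This is an assumed structure for illustration only, not the implementation from the talk: the point is that the persister appends to a single file, so the disk sees sequential writes rather than one seek per entry.

```python
import os
import queue
import tempfile
import threading


class LogServer:
    def __init__(self, path):
        self.path = path
        self.buffer = queue.Queue()  # in-memory buffer between the two roles
        self.persister = threading.Thread(target=self._persist, daemon=True)
        self.persister.start()

    def receive(self, entry: bytes) -> None:
        """Receiver: accept an entry and hand it to the persister."""
        self.buffer.put(entry)

    def _persist(self) -> None:
        """Persister: append entries to one file, in arrival order."""
        with open(self.path, "ab") as log_file:
            while True:
                entry = self.buffer.get()
                if entry is None:  # shutdown marker
                    self.buffer.task_done()
                    return
                log_file.write(entry + b"\n")
                log_file.flush()
                self.buffer.task_done()

    def read(self) -> list:
        """Reader: return the persisted entries, for example during recovery."""
        with open(self.path, "rb") as log_file:
            return log_file.read().splitlines()

    def close(self) -> None:
        self.buffer.put(None)
        self.persister.join()


with tempfile.TemporaryDirectory() as directory:
    server = LogServer(os.path.join(directory, "entries.log"))
    for i in range(3):
        server.receive(f"entry-{i}".encode())
    server.buffer.join()  # wait until the persister has drained the buffer
    recovered = server.read()
    server.close()

print(recovered)
```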
28
Zookeeper
ID3 ID2 ID1
Forming storage units
1. Query zookeeper
2. Get list of servers
3. Leader sends request
4. Leader sends list of members
5. Upload storage unit data
6. Start the service
ID2 ID3 ID1
Storage System
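The six steps for forming a storage unit can be walked through with an in-memory stand-in for the coordination service. The `registry` dictionary, the `form_storage_unit` function and the "idle"/"in-unit" states are all hypothetical; nothing below is the ZooKeeper client API, and every join request is assumed to be accepted.

```python
# In-memory stand-in for the coordination service (NOT the ZooKeeper API).
registry = {"ID1": "idle", "ID2": "idle", "ID3": "idle", "ID4": "idle"}


def form_storage_unit(registry, leader, size=3):
    # Steps 1-2: query the registry and get the list of available servers.
    available = sorted(
        server for server, state in registry.items()
        if state == "idle" and server != leader
    )
    if len(available) < size - 1:
        raise RuntimeError("not enough idle log servers")
    # Step 3: the leader sends a join request to each chosen server
    # (all requests are assumed to succeed in this sketch).
    members = [leader] + available[: size - 1]
    # Step 4: the leader sends the final member list to every member.
    member_views = {member: list(members) for member in members}
    # Step 5: upload the storage unit data so that clients can find the unit.
    for member in members:
        registry[member] = "in-unit"
    # Step 6: start the service.
    return {"members": members, "views": member_views, "serving": True}


unit = form_storage_unit(registry, leader="ID1")
print(unit["members"], registry)
```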
29
Stable storage unit Stable storage unit
Stable storage unit Stable storage unit
Zookeeper
Client
Client
Client
30
Failover
ID 4: 40%
ID 5: 45%
ID 1: 50%
ID 6: 20%
ID 2: 20%
ID 3: 30%
Stable Storage Unit / Stable Storage Unit
Client
33
Evaluation
• Throughput and latency of stable storage unit
– Log entry sizes
– Replication factors
• Comparison with WAL into local disk
34
Single synchronous client
Entry size (bytes)    Latency (ms)    Throughput (entries/sec)
200                   0.45            2200
1024                  0.62            1600
4096                  0.99            1000
Replication factor of 3
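The figures above are consistent with a single client that issues one write at a time: with one outstanding request, throughput is roughly the inverse of latency. The short check below recomputes that from the reported numbers; the 2% tolerance is an arbitrary choice for this illustration.

```python
# Reported values: entry size (bytes) -> (latency in ms, throughput in entries/sec)
measurements = {200: (0.45, 2200), 1024: (0.62, 1600), 4096: (0.99, 1000)}

predicted = {}
for size, (latency_ms, reported) in measurements.items():
    # One request in flight at a time: entries per second = 1000 ms / latency.
    predicted[size] = 1000 / latency_ms
    gap = abs(predicted[size] - reported) / reported
    print(f"{size:>5} B: predicted {predicted[size]:.0f}/s, reported {reported}/s, gap {gap:.1%}")

all_close = all(
    abs(predicted[size] - reported) / reported < 0.02
    for size, (_, reported) in measurements.items()
)
```

Higher throughput therefore needs several requests in flight, not a faster single request.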
35
Throughput vs. Latency
[Figure: latency (ms) versus throughput (entries/sec) at a replication factor of 3, one curve per entry size: 5 B, 200 B, 1 KB, 4 KB, 10 KB. Throughput values annotated on the plot: 34000, 28000, 14000, 5000.]
36
Additional replica
[Figure: latency (microseconds) versus throughput (entries/sec) for an entry size of 200 bytes, comparing RF 3 and RF 2.]
37
Sustained load
38
WAL to local disk vs Stable storage unit
Latency (ms) by entry size (bytes)
                                     200      1024     4096
Disk (cache enabled)                 1.76     1.81     2.03
Disk (cache disabled)                49.46    49.81    49.87
Stable Storage Unit                  0.45     0.62     1
Stable Storage Unit (buffer full)    3.01     4.2      15
39
Resource utilization
• Throughput of 6,000 entries/sec
• Log entries of 200 bytes
– CPU utilization = 9%
– Bandwidth = 29 Mb/s
– Dedicated disk
– Small memory requirement
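A back-of-the-envelope check of the bandwidth figure is sketched below. It assumes that "Mb/s" means megabits per second and that the replication factor is 3, as in the evaluation slides; the slides do not say how the bandwidth was measured, so this is only one plausible reading.

```python
entries_per_sec = 6_000
entry_bytes = 200
replication_factor = 3  # assumption: same factor as in the evaluation slides

# Raw payload rate for one copy of the log stream, in megabits per second.
payload_mbps = entries_per_sec * entry_bytes * 8 / 1_000_000
# Total if every entry is transferred once per replica.
replicated_mbps = payload_mbps * replication_factor

print(f"one copy: {payload_mbps:.1f} Mb/s, three copies: {replicated_mbps:.1f} Mb/s")
```

Under these assumptions the total comes to about 28.8 Mb/s, close to the reported 29 Mb/s.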
40
Summary
Durable write
Low latency
High availability
No additional resources
Avoid dependencies
Scalable