A new approach to providing durability for in-memory databases with high performance and low cost.
Transcript
Page 1: Presentation

1

Durability for Memory-Based Key-Value Stores

Kiarash Rezahanjani

July 4, 2012

Page 2: Presentation

2

Durability

set(university, UPC)

Ack

get(university)

UPC

Data Store

(university, KTH)

Page 3: Presentation

3

Durability

Ack

Data Store

Non Volatile

set(university, UPC)

Commodity

Page 4: Presentation

4

Durability

Ack

Data Store

set(myKey, U)

Commodity

Page 5: Presentation

5

Durability

Disk

Write Read

SLOW: Seek time + Rotational time + Transfer time
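For a rough sense of scale (assumed commodity 7200 RPM disk figures, not numbers from the thesis): an average seek of about 4 ms, plus half a rotation of about 4.2 ms, plus the transfer of a small entry (well under 0.1 ms at 100 MB/s) puts a random disk write at roughly 8 ms, i.e. on the order of only a hundred random writes per second.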

Page 6: Presentation

6

Cache in memory

Primary copy of objects

Cached Objects

Reads (fast) / Writes (slow)

Consistency?

Page 7: Presentation

7

Cache in memory

MySQL Servers

Memcache servers

Application Servers

Update Obj A

Delete Obj A

Read Obj A -> Cache Miss

Read Obj A

Set Obj A

Stale data

Spends extra resources

Writes are still slow

Complicates development
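The look-aside pattern sketched on this slide can be summarized in a few lines of code. The sketch below is illustrative only (plain in-memory maps stand in for the memcache and MySQL tiers, not real clients); it shows why writes stay slow and why a reader can race with an invalidation and briefly observe stale data.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the look-aside cache pattern from the slide:
// reads fill the cache on a miss, writes go to the database and
// invalidate (delete) the cached copy. The two maps stand in for
// memcache and MySQL; a real deployment talks to both over the network.
public class LookAsideCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();    // "memcache"
    private final Map<String, String> database = new ConcurrentHashMap<>(); // "MySQL"

    public Optional<String> read(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return Optional.of(cached);          // cache hit: fast path
        }
        String fromDb = database.get(key);       // cache miss: slow path to the database
        if (fromDb != null) {
            cache.put(key, fromDb);              // set Obj A back into the cache
        }
        return Optional.ofNullable(fromDb);
    }

    public void update(String key, String value) {
        database.put(key, value);                // update Obj A in the database (still slow)
        cache.remove(key);                       // delete Obj A from the cache
        // Until the next read repopulates the entry, a concurrent reader can
        // race with this invalidation and serve stale data -- the problem noted above.
    }
}
```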

Page 8: Presentation

8

Memory-Based Databases

Primary Copy of Objects

Back up

Writes / Reads

No stale data

Reads are fast

Writes latency?

Durability?

No inconsistency

Page 9: Presentation

9

Approaches towards durability

Periodic snapshots (State A → State B) → data loss since the last snapshot

Synchronous logging (Log Log Log) → slow

Asynchronous logging → data loss of entries not yet flushed
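The difference between the last two approaches comes down to when the log is forced to disk. Below is a minimal write-ahead-log sketch with an illustrative syncEveryWrite flag (my illustration, not code from the thesis): forcing every record makes the write durable but pays the disk latency; deferring the flush is fast but loses the unflushed tail on a crash.

```java
import java.io.FileOutputStream;
import java.io.IOException;

// Sketch of the two logging variants compared on the slide. Synchronous
// logging forces every record to disk before acknowledging (durable but
// slow); asynchronous logging acknowledges immediately and flushes later
// (fast, but the unflushed tail is lost on a crash).
public class WriteAheadLog implements AutoCloseable {
    private final FileOutputStream out;
    private final boolean syncEveryWrite;

    public WriteAheadLog(String path, boolean syncEveryWrite) throws IOException {
        this.out = new FileOutputStream(path, true); // append mode
        this.syncEveryWrite = syncEveryWrite;
    }

    public void append(byte[] record) throws IOException {
        out.write(record);
        if (syncEveryWrite) {
            out.getFD().sync();   // synchronous logging: pay the disk latency on every write
        }
        // asynchronous logging: return right away; the data sits in OS buffers
        // and is lost if the machine crashes before the next flush.
    }

    public void flush() throws IOException {
        out.getFD().sync();       // periodic flush used by the asynchronous variant
    }

    @Override
    public void close() throws IOException {
        flush();
        out.close();
    }
}
```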

Page 10: Presentation

10

Approaches towards durability

Data kept on multiple replicas

Expensive

Catastrophic failure: all replicas gone

Page 11: Presentation

11

Project Goals

Durable write

Low latency

Cheap, commodity hardware

Availability, able to recover quickly

Page 12: Presentation

12

Target systems

• Data is big = many machines
• Read-dominant workload
• Simple key-value store
• Small writes
– Example: Facebook
  • Terabytes of data = 2000 memcache servers
  • Write/read ratio < 6%
  • Memcache is a key-value store
  • Status update, tag photo, profile update, etc.

Page 13: Presentation

13

Solution

Page 14: Presentation

14

Design decisions

Periodic snapshot vs.

Message logging

Page 15: Presentation

15

Design decisions

Local disk vs.

Remote location

Page 16: Presentation

16

Design decisions

Remote file server vs.

Local disks of database cluster

Page 17: Presentation

17

Design Decision

Database

client

write

Log / Ack

Remote storage

Page 18: Presentation

Design Decision

Database

client

write

Log / Ack

18

Two problems:

1) Synchronous logging is a must (asynchronous logging risks data loss)

2) Data availability

Replication

Page 19: Presentation

Replication

Log / Ack

Log Log Log

Replication

Log / Ack

19

Page 20: Presentation

20

Replication

Broadcast: master → slave, slave (Log / Ack)

Chain replication: head → tail (Log / Ack)
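As a rough illustration of the chain variant, the in-process sketch below (names and interfaces are mine, not the thesis implementation) has every node persist the entry and then forward it to its successor; in the classic formulation only the tail acknowledges, so a returned Ack implies the entry is stored on every replica. Broadcast, by contrast, has the master send to all slaves and collect each Ack itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal in-process sketch of chain replication as drawn on the slide:
// the head receives the log entry, every node persists it and forwards it
// to its successor, and the tail acknowledges. An Ack therefore means the
// entry is stored on all replicas.
class ChainNode {
    private final List<byte[]> localLog = new ArrayList<>(); // stands in for the on-disk log
    private ChainNode successor;                              // null for the tail

    void setSuccessor(ChainNode next) { this.successor = next; }

    // Called by the predecessor (or, on the head, by the client).
    void replicate(byte[] entry, Consumer<byte[]> ackToClient) {
        localLog.add(entry);                              // persist locally before forwarding
        if (successor != null) {
            successor.replicate(entry, ackToClient);      // push the entry down the chain
        } else {
            ackToClient.accept(entry);                    // tail: entry is on every node, send Ack
        }
    }
}

public class ChainDemo {
    public static void main(String[] args) {
        ChainNode head = new ChainNode(), mid = new ChainNode(), tail = new ChainNode();
        head.setSuccessor(mid);
        mid.setSuccessor(tail);
        head.replicate("set(university, UPC)".getBytes(),
                e -> System.out.println("ack: " + new String(e)));
    }
}
```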

Page 21: Presentation

21

Replication

master

slave slave

Broadcast

Log / Ack

slave

Page 22: Presentation

22

Replication

head → tail

Chain replication

Log / Ack

Page 23: Presentation

23

Replication

head → tail

Chain replication

Log / Ack

Page 24: Presentation

Chain Replication

Database

client

write

Log / Ack

24

Log Log Log

Page 25: Presentation

Chain Replication

Database

client

write

Log / Ack

Log Log Log

Stable Storage Unit

25

Available Logs

Synchronous logging abstraction

Low latency
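One way the database side might use this abstraction (a sketch under assumed interfaces, with StableStorageUnit standing in for the replication chain): a set() is acknowledged only after the replicated log has accepted the entry, while get() is served purely from memory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the write path behind this slide: the in-memory key-value store
// treats the stable storage unit as a synchronous logging abstraction. A set()
// is applied to memory and acknowledged only after the replicated log has
// accepted it; reads never touch the log.
interface StableStorageUnit {
    void appendAndWait(byte[] logEntry); // blocks until the chain acks the entry
}

public class DurableKeyValueStore {
    private final Map<String, String> table = new ConcurrentHashMap<>();
    private final StableStorageUnit log;

    public DurableKeyValueStore(StableStorageUnit log) { this.log = log; }

    public void set(String key, String value) {
        byte[] entry = (key + "=" + value).getBytes();
        log.appendAndWait(entry);   // durable once the remote replicas have the entry
        table.put(key, value);      // now safe to expose the new value
        // only after both steps does the client receive its Ack
    }

    public String get(String key) {
        return table.get(key);      // reads are served purely from memory
    }
}
```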

Page 26: Presentation

26

Log Server

Log

Page 27: Presentation

27

Log Server

Receiver, Persister, Reader

Sequential write (no seek time)
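A minimal sketch of how the Receiver/Persister split could look (illustrative structure, not the thesis code): the receiver enqueues entries as they arrive, and a single persister thread drains the queue and appends batches to one append-only file, paying one disk sync per batch and keeping every write sequential.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the Receiver -> Persister pipeline: the receiver thread enqueues
// entries read off the network, and one persister thread drains the queue and
// appends them to a single append-only file, so all disk writes are sequential
// and no seek time is paid. Details are illustrative.
public class LogServer {
    private final BlockingQueue<byte[]> incoming = new LinkedBlockingQueue<>();
    private final FileOutputStream logFile;

    public LogServer(String path) throws IOException {
        this.logFile = new FileOutputStream(path, true); // append-only log file
    }

    // Receiver side: called for every entry received from the network.
    public void receive(byte[] entry) {
        incoming.add(entry);
    }

    // Persister side: runs in its own thread, batching entries per disk sync.
    public void persistLoop() throws IOException, InterruptedException {
        List<byte[]> batch = new ArrayList<>();
        while (true) {
            batch.add(incoming.take());          // block for at least one entry
            incoming.drainTo(batch);             // grab whatever else is queued
            for (byte[] entry : batch) {
                logFile.write(entry);            // sequential append
            }
            logFile.getFD().sync();              // one disk sync for the whole batch
            batch.clear();
            // after the sync, the acks for the whole batch can be sent back
        }
    }
}
```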

Page 28: Presentation

28

Zookeeper

ID1, ID2, ID3 (log servers registered in Zookeeper)

Forming storage units:

1. Query Zookeeper
2. Get the list of servers
3. Leader sends a request
4. Leader sends the list of members
5. Upload storage unit data
6. Start the service
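Steps 1 and 2 map directly onto the standard ZooKeeper Java client. The sketch below assumes a /servers registry of ephemeral znodes, which is my illustration of the idea rather than the thesis' actual znode layout.

```java
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch of steps 1-2 using the standard ZooKeeper Java client: every log
// server registers an ephemeral znode under /servers, and the unit leader
// queries that path to learn which servers are alive before forming a
// stable storage unit. Assumes the /servers parent znode already exists.
public class StorageUnitFormation {
    private final ZooKeeper zk;

    public StorageUnitFormation(ZooKeeper zk) { this.zk = zk; }

    // Run by each log server on start-up: the znode disappears automatically
    // when the server's session dies, so the registry only lists live servers.
    public void register(String serverId, String hostPort)
            throws KeeperException, InterruptedException {
        zk.create("/servers/" + serverId, hostPort.getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    }

    // Run by the leader: query Zookeeper and get the list of live servers,
    // from which the members of a new storage unit are chosen.
    public List<String> liveServers() throws KeeperException, InterruptedException {
        return zk.getChildren("/servers", false);
    }
}
```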

Page 29: Presentation

Storage System

29

Stable storage unit Stable storage unit

Stable storage unit Stable storage unit

Zookeeper

Client

Client

Client

Page 30: Presentation

30

Failover

ID 4 (40%), ID 5 (45%), ID 1 (50%), ID 6 (20%), ID 2 (20%), ID 3 (30%)

Stable Storage Unit, Stable Storage Unit

Client

Page 31: Presentation

31

Failover

ID 4 (40%), ID 5 (45%), ID 1 (50%), ID 6 (20%), ID 2 (20%), ID 3 (30%)

Stable Storage Unit, Stable Storage Unit

Client

Page 32: Presentation

32

Failover

ID 4 (40%), ID 5 (45%), ID 1 (50%), ID 6 (20%), ID 2 (20%), ID 3 (30%)

Stable Storage Unit, Stable Storage Unit

Client
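Assuming the percentages on these slides are per-server load reported through the Zookeeper registry, the replacement for a failed member could simply be the least-loaded live server that is not already part of the unit. A small illustrative sketch:

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Sketch of the selection step implied by the failover slides: given the live
// servers and their advertised load (e.g. ID 6 at 20%, ID 4 at 40%), pick the
// least-loaded server that is not already a member of the affected unit. The
// load map is assumed to come from the per-server metadata kept in Zookeeper.
public class FailoverSelector {
    public static Optional<String> pickReplacement(Map<String, Integer> loadById,
                                                   Set<String> currentMembers) {
        return loadById.entrySet().stream()
                .filter(e -> !currentMembers.contains(e.getKey()))  // skip existing members
                .min(Map.Entry.comparingByValue())                  // lowest load wins
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        Map<String, Integer> load = Map.of("ID1", 50, "ID2", 20, "ID3", 30,
                                           "ID4", 40, "ID5", 45, "ID6", 20);
        // the affected unit held ID1, ID4, ID5; ID2 or ID6 (both at 20%) is chosen
        System.out.println(pickReplacement(load, Set.of("ID1", "ID4", "ID5")));
    }
}
```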

Page 33: Presentation

33

Evaluation

• Throughput and latency of stable storage unit
  – Log entry sizes
  – Replication factors
• Comparison with WAL to local disk

Page 34: Presentation

34

Single synchronous client

Entry size (bytes)   Latency (ms)   Throughput (entries/sec)
200                  0.45           2200
1024                 0.62           1600
4096                 0.99           1000

Replication factor of 3

Page 35: Presentation

35

Throughput vs. Latency

Chart: latency (ms) vs. throughput (entries/sec) for entry sizes of 5 B, 200 B, 1 KB, 4 KB and 10 KB; replication factor of 3.

Page 36: Presentation

36

Additional replica

Chart: latency (microseconds) vs. throughput (entries/sec) for replication factors 2 and 3; entry size of 200 bytes.

Page 37: Presentation

37

Sustained load

Page 38: Presentation

38

WAL to local disk vs Stable storage unit

Latency (ms) by entry size (bytes):

Entry size (bytes)                  200     1024    4096
Disk (cache enabled)                1.76    1.81    2.03
Disk (cache disabled)               49.46   49.81   49.87
Stable Storage Unit                 0.45    0.62    1
Stable Storage Unit (buffer full)   3.01    4.2     15

Page 39: Presentation

39

Resource utilization

• Throughput of 6,000 entries/sec
• Log entries of 200 bytes
  – CPU utilization = 9%
  – Bandwidth = 29 Mb/s
  – Dedicated disk
  – Small memory requirement
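For context: 6,000 entries/sec at 200 bytes each is roughly 1.2 MB/s, i.e. about 9.6 Mb/s of raw log payload; the reported 29 Mb/s presumably also covers forwarding between chain members and protocol overhead.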

Page 40: Presentation

40

Summary

Durable write

Low latency

High availability

No additional resources

Avoid dependencies

Scalable