Jan 10, 2020

Dynamo: Amazon’s Highly Available Key-value Store

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin,

Swaminathan Sivasubramanian, Peter Vosshall, Werner Vogels

SOSP(2007)

Presenter: Shichao Jin

Outline

Background

Design Principles

Techniques

Conclusion

Background

Amazon Shopping Carts

low-latency key-value storage

Put() & Get()

SLA: response within 300ms for 99.9% of requests

hundreds of nodes

a collection of distributed techniques

spawned many imitators

Voldemort (LinkedIn)

Cassandra (Facebook)

Design Principles

Always-writable

Incrementally scalable

Symmetrical

Decentralized

Heterogeneous

Techniques

Problem                              Technique
Partitioning                         Consistent hashing
High availability for writes         Eventual consistency; vector clocks with reconciliation during reads
Handling temporary failures          Sloppy quorum protocol and hinted handoff
Recovering from permanent failures   Anti-entropy using Merkle trees
Membership and failure detection     Gossip-based membership protocol and failure detection

Partition——Consistent Hashing

m nodes

items identified by keys

How to partition items across m nodes?

Partition——Consistent Hashing

node0 node1 node2 node3

key 11: 11 % 4 = 3, stored on node3

key 102: 102 % 4 = 2, stored on node2

Partition——Consistent Hashing

Disadvantages of hash:

static: most keys must be rehashed when node(s) are added or deleted

Solution:

Consistent Hashing
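
The rehash cost of static `key % m` placement can be made concrete with a small count. The sketch below is illustrative only: the key range 0..9999 and the helper name `moved_fraction` are assumptions, not part of the slides.

```python
def node_for(key: int, m: int) -> int:
    """Static placement: a key goes to node (key mod m)."""
    return key % m


def moved_fraction(keys, old_m: int, new_m: int) -> float:
    """Fraction of keys whose node changes when the cluster size changes."""
    moved = sum(1 for k in keys if node_for(k, old_m) != node_for(k, new_m))
    return moved / len(keys)


keys = range(10_000)  # hypothetical integer keys
print(node_for(11, 4), node_for(102, 4))  # the two slide examples: 3 2
print(moved_fraction(keys, 4, 5))         # 0.8
```

Growing from four to five nodes relocates 80% of these keys, which is the problem consistent hashing is meant to avoid.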

Partition——Consistent Hashing

Consistent Hashing:

hash space: ring

each node manages a region

a full rehash is unnecessary: only keys in the affected region move

Partition——Consistent Hashing

add node3

delete node1
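
Adding and deleting a node on the ring can be sketched as follows. This is a minimal illustration, not Dynamo's implementation: the `Ring` class, the node names, and the 1000 sample keys are assumptions. MD5 is used to place nodes and keys on the ring.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Map a string to a position on the ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes=()):
        self._points = []  # sorted list of (position, node)
        for n in nodes:
            self.add(n)

    def add(self, node: str) -> None:
        bisect.insort(self._points, (h(node), node))

    def remove(self, node: str) -> None:
        self._points.remove((h(node), node))

    def lookup(self, key: str) -> str:
        """First node clockwise from the key's position, wrapping around."""
        i = bisect.bisect_right(self._points, (h(key), ""))
        return self._points[i % len(self._points)][1]


ring = Ring(["node0", "node1", "node2"])
keys = [f"key{i}" for i in range(1000)]

before = {k: ring.lookup(k) for k in keys}
ring.add("node3")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Every key that moved went to the new node; all other keys stayed put.
print(all(after[k] == "node3" for k in moved))

ring.remove("node1")
final = {k: ring.lookup(k) for k in keys}
# Only keys that lived on node1 were reassigned.
print(all(final[k] == after[k] for k in keys if after[k] != "node1"))
```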

Partition——Consistent Hashing

Problems of Consistent Hashing:

non-uniform load distribution

heterogeneity

Solution:

Virtual Nodes

Partition——Consistent Hashing

Virtual Nodes:

disperse load to other nodes when a node fails
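
The load-dispersal effect can be sketched by comparing one token per node with many tokens per node. Everything here is an illustrative assumption: the node names, the `name#i` token scheme, the 100 virtual nodes per machine, and the 5000 sample keys.

```python
import bisect
import hashlib
from collections import Counter


def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def build_ring(nodes, vnodes):
    """Each physical node gets `vnodes` positions (tokens) on the ring."""
    return sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))


def lookup(ring, key):
    i = bisect.bisect_right(ring, (h(key), ""))
    return ring[i % len(ring)][1]


def takeover(nodes, failed, vnodes, keys):
    """Count how many of the failed node's keys each survivor inherits."""
    full = build_ring(nodes, vnodes)
    reduced = build_ring([n for n in nodes if n != failed], vnodes)
    return Counter(lookup(reduced, k) for k in keys if lookup(full, k) == failed)


nodes = ["A", "B", "C", "D"]
keys = [f"key{i}" for i in range(5000)]
single = takeover(nodes, "B", vnodes=1, keys=keys)    # one token per node
multi = takeover(nodes, "B", vnodes=100, keys=keys)   # many tokens per node
print(dict(single))  # all of B's keys land on a single successor
print(dict(multi))   # B's keys are spread over several survivors
```

Virtual nodes also address heterogeneity: a more capable machine could simply be given more tokens than a weaker one.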

Replication

An Example for Replication

N = 3

B, C, D form K’s preference list

for fault-tolerance

for availability
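
A preference list like the one in this example can be sketched as a clockwise walk that collects N distinct nodes. The ring positions and the key position below are made-up values chosen so that K falls between A and B.

```python
import bisect


def preference_list(ring, key_pos, n):
    """Walk clockwise from the key's position, collecting n distinct nodes."""
    positions = [p for p, _ in ring]
    start = bisect.bisect_left(positions, key_pos)
    result = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in result:  # skip extra tokens of a node already chosen
            result.append(node)
        if len(result) == n:
            break
    return result


# Hypothetical ring: (position, node), sorted by position.
ring = [(10, "A"), (20, "B"), (30, "C"), (40, "D"), (50, "E"), (60, "F"), (70, "G")]
print(preference_list(ring, key_pos=15, n=3))  # ['B', 'C', 'D']
```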

High Availability for Writes

Concurrent Writes:

Application: Shopping Cart

Two-Phase Commit in distributed RDBMS

High Availability for Writes

Concurrent Writes:

Problem: two (or more) versions of a data item

Possible Solution: timestamp (How?)

Dynamo: Vector Clocks

[Figure: replicas N1, N2, N3 of key K14 holding versions V14, V14’, and V14’’]

High Availability for Writes

Vector Clocks:

logical clock

causal order (partial)

High Availability for Writes

How to determine ordering of versions?

(A:1, B:1, C:1) < (A:3, B:1, C:1)

(A:1, B:1, C:1) ? (A:2, C:1)
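
The two comparisons above can be checked with a small function. The sketch assumes the usual convention that a node missing from a clock counts as 0; the function name and return labels are illustrative.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two vector clocks; a missing entry counts as 0."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a is an ancestor of b
    if b_le_a:
        return "after"       # b is an ancestor of a
    return "concurrent"      # neither dominates: reconciliation is needed


print(compare({"A": 1, "B": 1, "C": 1}, {"A": 3, "B": 1, "C": 1}))  # before
print(compare({"A": 1, "B": 1, "C": 1}, {"A": 2, "C": 1}))          # concurrent
```

Under that convention the second pair is concurrent: the first clock is ahead on B and the second is ahead on A, so neither version supersedes the other.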

Consistency——Strict Quorum

Eventual Consistency:

given enough time, all updates will propagate through the system

Read after Write

[Figure: replicas N1, N2, N3 of key K14, holding V14 or the updated V14’]

Consistency——Strict Quorum

Strict Quorum:

see the latest data

define a replica set of size N

put() waits for acks from at least W replicas

get() waits for responses from at least R replicas

W+R > N
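
The condition W+R > N can be checked by brute force: it holds exactly when every possible write set shares at least one replica with every possible read set. The function name below is an assumption made for illustration.

```python
from itertools import combinations


def always_overlap(n: int, w: int, r: int) -> bool:
    """True if every write set of size w intersects every read set of size r."""
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))


print(always_overlap(3, 2, 2))  # True:  W + R = 4 > 3
print(always_overlap(3, 1, 2))  # False: W + R = 3, a read can miss the write
```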

Consistency——Strict Quorum

Strict Quorum Example:

N=3, W=2, R=2

replica set for K14: {N1, N2, N3}

assume put() on N3 fails

[Figure: put(K14, V14) stored on N1 and N2 but not on N3]

Consistency——Strict Quorum

Strict Quorum Example:

Now, issuing get() to any two nodes out of three will return the answer

[Figure: get(K14) against N1, N2, N3; N1 and N2 hold (K14, V14), N3 returns nil]
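
The example can be replayed in a few lines. This is a toy model under stated assumptions: replicas are in-memory dicts, a failed node is passed in explicitly through a hypothetical `down` argument, and versioning is omitted because only one value is ever written.

```python
from itertools import combinations

N, W, R = 3, 2, 2
replicas = {"N1": {}, "N2": {}, "N3": {}}


def put(key, value, down=()):
    """Write to every reachable replica; succeed if at least W acknowledge."""
    acks = 0
    for name, store in replicas.items():
        if name not in down:
            store[key] = value
            acks += 1
    return acks >= W


def get(key, nodes):
    """Read from the given replicas and return the set of values found."""
    assert len(nodes) >= R
    return {replicas[n][key] for n in nodes if key in replicas[n]}


ok = put("K14", "V14", down={"N3"})   # N3 misses the write, as in the example
results = {pair: get("K14", pair) for pair in combinations(replicas, R)}
for pair, values in results.items():
    print(pair, values)               # every pair includes N1 or N2
```

Each of the three possible read pairs contains at least one replica that acknowledged the write, so the value is always found even though N3 has nothing to return.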

Consistency——Strict Quorum

Why does Strict Quorum work?

Tune W, R, N:

to optimize for writes, set W small

to optimize for reads, set R small

[Figure: sets W and R]

Temporary Failure ——Hinted Handoff

Hinted Handoff (Sloppy Quorum)

a node accepts writes on behalf of other nodes that are down

data accepted by the stand-in node is handed off when the down node recovers

set W = 3, N = 3

do not wait for B to recover
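
The handoff flow can be sketched as below. The five-node ring A..E, the preference list [B, C, D], and the `Node`, `sloppy_put`, and `handoff` names are assumptions made for illustration; the sketch also assumes the preference list is a contiguous run of ring nodes.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # replicas this node is responsible for
        self.hints = []   # (intended_owner, key, value) held for down nodes


def sloppy_put(ring, preference, key, value, w):
    """Write to the first len(preference) healthy nodes, walking the ring."""
    start = ring.index(preference[0])
    walk = ring[start:] + ring[:start]
    missed = [n for n in preference if not n.up]
    acks = 0
    for node in walk:
        if acks == len(preference):
            break
        if not node.up:
            continue
        if node in preference:
            node.data[key] = value
        else:
            # Stand in for a down node and remember who the write was for.
            node.hints.append((missed.pop(0), key, value))
        acks += 1
    return acks >= w


def handoff(node):
    """Deliver stored hints whose intended owner is back up."""
    remaining = []
    for owner, key, value in node.hints:
        if owner.up:
            owner.data[key] = value
        else:
            remaining.append((owner, key, value))
    node.hints = remaining


A, B, C, D, E = ring = [Node(n) for n in "ABCDE"]
B.up = False
ok = sloppy_put(ring, [B, C, D], "K", "V", w=3)  # succeeds without waiting for B
hinted = len(E.hints)                            # E holds the write meant for B
B.up = True
handoff(E)
print(ok, hinted, B.data, E.hints)
```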

Temporary Failure ——Hinted Handoff

Sloppy Quorum

Permanent Failure ——Replica Synchronize

Replica Synchronization (Merkle tree)

hierarchical checksums

executed periodically or when membership changes
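
Hierarchical checksums let two replicas find stale keys by comparing hashes from the root down and descending only where they differ. The sketch below is a simplification under stated assumptions: both replicas cover the same key range in the same order, the number of leaves is a power of two, and SHA-256 stands in for whatever hash a real system would use.

```python
import hashlib


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build(leaves):
    """Return tree levels, leaves first; each parent hashes its two children."""
    level = [sha(f"{k}={v}".encode()) for k, v in leaves]
    levels = [level]
    while len(level) > 1:
        level = [sha(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff(a, b, depth=None, index=0):
    """Indices of differing leaves, visiting only mismatching subtrees."""
    if depth is None:
        depth = len(a) - 1            # start at the root
    if a[depth][index] == b[depth][index]:
        return []                     # identical subtree: nothing to transfer
    if depth == 0:
        return [index]
    return (diff(a, b, depth - 1, 2 * index) +
            diff(a, b, depth - 1, 2 * index + 1))


r1 = [("k0", "a"), ("k1", "b"), ("k2", "c"), ("k3", "d")]
r2 = [("k0", "a"), ("k1", "b"), ("k2", "STALE"), ("k3", "d")]
print(diff(build(r1), build(r2)))  # [2]
```

When the roots match, the replicas are in sync and no further comparison is needed, which keeps the periodic check cheap.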

Conclusion

Consistent Hashing

Vector Clocks

Eventual consistency

Strict & Sloppy Quorum

Merkle Tree

Q & A