Consistency, Availability, Partition: Make Your Choice

Jul 15, 2015

Andrea Giuliano
Transcript
Page 1: Consistency, Availability, Partition: Make Your Choice

Make Your Choice: Consistency, Availability, Partition

Andrea Giuliano (@bit_shark)

Page 2: Distributed Systems

Page 3: What a Distributed System Is

“A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages.”

Page 4: Distributed Systems: Examples

Page 5: Distributed Systems: Replication

Page 6: Replicated Service Properties

CONSISTENCY

AVAILABILITY

Page 7: Consistency

The result of operations will be predictable

Page 9: Consistency

Strong consistency: all replicas return the same value for the same object.

Weak consistency: different replicas can return different values for the same object.

Page 11: Strong vs Weak Consistency

Strong consistency: ACID (atomic, consistent, isolated, durable) database.

Weak consistency: BASE (Basically Available, Soft state, Eventual consistency) database.

Page 12: Example (Consistency)

put(price, 10)

Page 13: Example (Consistency)

get(price)

price = 10
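The put/get example can be tried out against a toy replicated store. The sketch below is only an illustration: the `ReplicatedStore` class, its `synchronous` flag and the `propagate()` step are hypothetical names, not part of any real system. With synchronous propagation every replica answers 10; with deferred propagation some replicas answer with the old (missing) value until the update reaches them.

```python
class ReplicatedStore:
    """Toy store: replica 0 takes the writes, the others receive copies (hypothetical)."""

    def __init__(self, n_replicas=3, synchronous=True):
        self.replicas = [{} for _ in range(n_replicas)]
        self.synchronous = synchronous
        self.pending = []  # updates not yet copied to the other replicas

    def put(self, key, value):
        self.replicas[0][key] = value
        if self.synchronous:
            # Strong consistency: copy to every replica before returning.
            for replica in self.replicas[1:]:
                replica[key] = value
        else:
            # Weak consistency: return now, copy later.
            self.pending.append((key, value))

    def get(self, key, replica=0):
        return self.replicas[replica].get(key)

    def propagate(self):
        """Deliver the deferred updates to the other replicas."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()


strong = ReplicatedStore(synchronous=True)
strong.put("price", 10)
strong_reads = [strong.get("price", i) for i in range(3)]

weak = ReplicatedStore(synchronous=False)
weak.put("price", 10)
weak_reads_before = [weak.get("price", i) for i in range(3)]
weak.propagate()
weak_reads_after = [weak.get("price", i) for i in range(3)]
```

Here `strong_reads` agree on every replica, while `weak_reads_before` differ between replicas until `propagate()` has run.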

Page 14: Availability

Page 15: Example (Availability)

Page 16: Communication

Page 17: Partition Tolerance

The system continues to operate even in the presence of partitions.

Page 18: Partition Tolerance

Network failure: the nodes are split into groups on each side of a faulty network element (switch, backbone).

Process failure: the system is split into two groups, the correct nodes and the crashed node.

Page 19: CAP Theorem

“Of three properties of shared-data systems (data consistency, system availability and tolerance to network partitions) only two can be achieved at any given moment in time.”

Page 20: The Proof (CAP Theorem)

[Diagram: the replicas are split into partition 1 and partition 2, each holding price = 0. At t1 a client sends put(price, 10) to partition 1. At t2 a client sends get(price) to partition 2, which either answers price = 0 (not consistent) or gives no response (not available).]
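The proof scenario can be replayed in a few lines. This is a minimal sketch under stated assumptions: the `Node` class, the `partitioned` flag and the `"AP"`/`"CP"` mode strings are hypothetical names chosen for the illustration. During a partition the update cannot reach the second node, so that node must either answer with its stale value or refuse to answer.

```python
class Node:
    def __init__(self):
        self.data = {"price": 0}  # both sides start with price = 0


def put(target, peer, key, value, partitioned):
    """Write on one side; replicate only if the network lets the message through."""
    target.data[key] = value
    if not partitioned:
        peer.data[key] = value


def get(node, key, partitioned, mode):
    """mode 'AP': always answer locally. mode 'CP': refuse while partitioned."""
    if partitioned and mode == "CP":
        return None  # no response: consistent, but not available
    return node.data[key]  # a response: available, but possibly stale


side1, side2 = Node(), Node()
put(side1, side2, "price", 10, partitioned=True)              # t1
ap_answer = get(side2, "price", partitioned=True, mode="AP")  # t2, stale value
cp_answer = get(side2, "price", partitioned=True, mode="CP")  # t2, no response
```

`ap_answer` is the old price (not consistent) and `cp_answer` is `None` (not available); no third outcome is possible while the partition lasts.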

Page 21: CAP Theorem in Practice

Consistency and Partition Tolerance (CP). Example: BigTable.

➡ distributed databases
➡ distributed locking
➡ majority protocol
➡ active/passive replication
➡ quorum-based systems

Page 22: CAP Theorem

Availability and Partition Tolerance (AP). Example: DynamoDB.

➡ web caches
➡ stateless systems
➡ DNS

Page 23: CAP Theorem

Consistency and Availability (CA).

➡ single-site databases
➡ cluster databases
➡ LDAP

Page 24: Dynamo

Page 26: Requirements (Dynamo)

“customers should be able to view and add items to their shopping cart even if disks are failing, network routes are flapping, or data centers are being destroyed by tornados.”

➡ reliable
➡ highly scalable
➡ always available

Page 27: Simple Interface (Dynamo)

get(key): locates the object replicas associated with the key and returns a single object or a list of objects with conflicting versions, along with a context.

put(key, context, object): determines where the replicas of the object should be placed based on the associated key. The context includes information such as the version of the object.

Page 28: Replication: The Choice (Dynamo)

Synchronous replica coordination

‣ strong consistency
‣ availability tradeoff

Optimistic replication technique

‣ high availability
‣ probability of conflicts

Page 29: Conflicts: When (Dynamo)

At write time

‣ writes may be rejected

At read time

‣ “always writable” data store

Page 30: Conflicts: Who (Dynamo)

The data store

‣ e.g. “last write wins” policy

The application

‣ resolution as an implementation detail
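The two options can be contrasted on a shopping cart with two conflicting versions. This is an illustrative sketch only: the `(timestamp, cart)` representation and both function names are assumptions made for the example. A data-store policy such as "last write wins" silently drops one version, while an application that understands carts can merge them.

```python
def last_write_wins(versions):
    """Data-store policy: keep only the version with the highest timestamp."""
    return max(versions, key=lambda version: version[0])[1]


def merge_carts(versions):
    """Application policy: union the items of all conflicting carts."""
    merged = set()
    for _, cart in versions:
        merged |= cart
    return merged


# Two conflicting versions of the same cart, as (timestamp, items) pairs.
conflicting = [(1, {"book"}), (2, {"pen"})]

lww_cart = last_write_wins(conflicting)
merged_cart = merge_carts(conflicting)
```

With last write wins the "book" added in the older version is lost; the merged cart keeps both items, at the cost of pushing the resolution logic into the application.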

Page 31: A Ring to Rule Them All (Dynamo)

Page 32: Partitioning: The Ring (Dynamo)

[Diagram: nodes A, B, C, D, E, F and G placed on a ring; the hash of the data determines its position on the ring.]

Page 33: Replication (Dynamo)

[Diagram: the ring of nodes A, B, C, D, E, F and G, with the hash of the data mapped onto it.]

N = 3: D will store keys in the ranges (A, B], (B, C] and (C, D].
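The ring and its N = 3 replication can be sketched as follows. This is a simplified illustration: the `Ring` class and `ring_position` helper are hypothetical, virtual nodes are left out, and node positions come from hashing the node names, so the order on this toy ring will generally not be alphabetical as in the diagram. A key is stored on the first node found clockwise from its hash and on the next N-1 nodes after it.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a node name or a key to a position on the ring (MD5, 128 bits)."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, n_replicas=3):
        self.n = n_replicas
        self.points = sorted((ring_position(node), node) for node in nodes)
        self.positions = [position for position, _ in self.points]

    def preference_list(self, key):
        """Return the N nodes responsible for the key, walking clockwise."""
        # First node whose position is >= the key's position, wrapping around.
        start = bisect.bisect_left(self.positions, ring_position(key)) % len(self.points)
        return [self.points[(start + i) % len(self.points)][1] for i in range(self.n)]


ring = Ring(list("ABCDEFG"), n_replicas=3)
replicas = ring.preference_list("price")
```

`replicas` holds three distinct, consecutive nodes of the ring; each node therefore ends up storing the key range it owns plus the ranges of its two predecessors, which is what the "D will store keys in the range (A, B], (B, C], (C, D]" line expresses.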

Page 34: Data Versioning (Dynamo)

put(): may return before the update has been propagated to all replicas.

get(): a subsequent get() may return an object that does not have the latest update.

Page 36: Reconciliation (Dynamo)

Syntactic reconciliation

‣ new version subsumes the previous

Semantic reconciliation

‣ conflicting versions of the same object

Page 43: Vector Clock (Dynamo)

Definition

‣ list of (node, counter) pairs

Version history

‣ D1 ([Sx,1]): write handled by Sx
‣ D2 ([Sx,2]): write handled by Sx
‣ D3 ([Sx,2], [Sy,1]): write handled by Sy
‣ D4 ([Sx,2], [Sz,1]): write handled by Sz
‣ D5 ([Sx,3], [Sy,1], [Sz,1]): D3 and D4 reconciled and written by Sx
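The D1 to D5 history can be reproduced with a small vector clock sketch. The helper names (`increment`, `descends`, `concurrent`, `merge`) and the use of a dict for the (node, counter) pairs are choices made for this illustration, not Dynamo's actual code. One clock descends from another when it is greater or equal on every node; two versions conflict when neither descends from the other.

```python
def increment(clock, node):
    """Return a copy of the clock with the node's counter advanced by one."""
    new_clock = dict(clock)
    new_clock[node] = new_clock.get(node, 0) + 1
    return new_clock


def descends(a, b):
    """True if clock a has seen everything clock b has seen."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def concurrent(a, b):
    """True if neither clock descends from the other (conflicting versions)."""
    return not descends(a, b) and not descends(b, a)


def merge(a, b):
    """Element-wise maximum of two clocks, used when reconciling versions."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


d1 = increment({}, "Sx")             # write handled by Sx
d2 = increment(d1, "Sx")             # write handled by Sx
d3 = increment(d2, "Sy")             # write handled by Sy
d4 = increment(d2, "Sz")             # write handled by Sz
d5 = increment(merge(d3, d4), "Sx")  # reconciled and written by Sx
```

D2 descends from D1, so it can replace it without any application logic (syntactic reconciliation). D3 and D4 are concurrent, so both must be returned to the client, and D5 descends from both once the client has reconciled them (semantic reconciliation).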

Page 44: put() and get() (Dynamo)

R

‣ minimum number of nodes that must participate in a successful read operation

W

‣ minimum number of nodes that must participate in a successful write operation

Page 45: put() and get() (Dynamo)

put()

‣ the coordinator generates the vector clock for the new version and writes the new version locally
‣ the new version is sent to N nodes
‣ the write is successful if W-1 nodes respond

get()

‣ the coordinator requests all existing versions of the data
‣ the coordinator waits for R responses before returning the result
‣ the coordinator returns all the causally unrelated versions
‣ the divergent versions are reconciled and written back
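How R and W interact is easiest to see with a brute-force check. The Dynamo paper notes that choosing R + W > N yields a quorum-like system; the sketch below (the `quorums_overlap` function is a hypothetical helper written for this example) enumerates every possible read set and write set and tests whether they always share at least one node, which is what lets a read observe the latest successful write.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """True if every set of r readers intersects every set of w writers."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


overlap_2_2 = quorums_overlap(n=3, r=2, w=2)  # R + W > N
overlap_1_1 = quorums_overlap(n=3, r=1, w=1)  # R + W <= N
overlap_1_3 = quorums_overlap(n=3, r=1, w=3)  # R + W > N, slow writes, fast reads
```

Lower R and W give lower latency and higher availability, but when R + W <= N a read may contact only replicas that missed the latest write.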

Page 46: Sloppy Quorum (Dynamo)

[Diagram: ring of nodes A, B, C, D, E, F and G, with N = 3.]
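A sloppy quorum can be sketched as "the first N healthy nodes met while walking the ring", instead of strictly the first N nodes. The function below is a hypothetical illustration, not Dynamo's implementation: the ring is a plain list in clockwise order, `healthy` is an assumed set of reachable nodes, and each stand-in replica carries a hint naming the node it is covering for.

```python
def sloppy_preference_list(ring, start, n, healthy):
    """Pick n healthy nodes clockwise from index start.

    Returns (node, hint) pairs; hint is the unreachable node that a
    stand-in replica is holding data for, or None for a regular replica.
    """
    intended = [ring[(start + i) % len(ring)] for i in range(n)]
    down = [node for node in intended if node not in healthy]
    chosen = [(node, None) for node in intended if node in healthy]
    i = n
    # Keep walking the ring until the missing replicas have stand-ins.
    while len(chosen) < n and i < len(ring):
        candidate = ring[(start + i) % len(ring)]
        if candidate in healthy:
            chosen.append((candidate, down.pop(0)))
        i += 1
    return chosen


ring = list("ABCDEFG")
all_up = sloppy_preference_list(ring, start=0, n=3, healthy=set("ABCDEFG"))
a_down = sloppy_preference_list(ring, start=0, n=3, healthy=set("BCDEFG"))
```

With every node up the replicas are A, B and C. With A unreachable the write still reaches three nodes: B, C and D, where D keeps a hint that the replica belongs to A and can hand it back once A recovers.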

Page 47: Why Is It AP? (Dynamo)

‣ requests are served even if some replicas are not available
‣ if a node is down, the write is stored on another node
‣ consistency conflicts are resolved at read time or in the background
‣ eventually, all the replicas will converge
‣ concurrent read/write operations can make distinct clients see distinct versions of the same key

Page 48: BigTable

Page 49: Requirements (Google BigTable)

‣ scale to petabytes of data
‣ thousands of machines
‣ high availability
‣ high performance

Page 50: Data Model (Google BigTable)

‣ sparse, distributed, persistent multi-dimensional sorted map

(row: string, column: string, time: int64) → string
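The map signature can be mimicked with an ordinary dictionary keyed by (row, column, timestamp). This is a toy, in-memory sketch with hypothetical names (`SparseTable`, `put`, `get`, `scan`); it only illustrates the shape of the data model, not how BigTable stores or distributes it.

```python
class SparseTable:
    """Toy model of the map (row, column, time) -> string."""

    def __init__(self):
        self.cells = {}  # only cells that were written exist: the map is sparse

    def put(self, row, column, timestamp, value):
        self.cells[(row, column, timestamp)] = value

    def get(self, row, column):
        """Return the most recent version of a cell, or None if it is empty."""
        versions = [
            (ts, value)
            for (r, c, ts), value in self.cells.items()
            if r == row and c == column
        ]
        return max(versions, key=lambda version: version[0])[1] if versions else None

    def scan(self):
        """Cells sorted by row, then column, newest timestamp first."""
        return sorted(self.cells.items(), key=lambda kv: (kv[0][0], kv[0][1], -kv[0][2]))


table = SparseTable()
table.put("com.example", "contents:", 1, "<html>v1")
table.put("com.example", "contents:", 2, "<html>v2")
table.put("com.cnn.www", "language:", 1, "en")

latest = table.get("com.example", "contents:")
first_key = table.scan()[0][0]
```

`latest` is the version written at the higher timestamp, and the scan starts at the lexicographically smallest row key.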

Page 51: Rows (Google BigTable)

‣ row keys are arbitrary strings
‣ read/write operations under a single row key are atomic
‣ data is maintained in lexicographic order by row key
‣ each row range is called a tablet

Example row key: maps.google.com is stored as com.google.maps
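The reversed-hostname row key takes one line to compute. The `row_key` function below is a hypothetical helper written for this example; it reverses the dot-separated host components and leaves any path untouched, so that pages of the same domain sort next to each other.

```python
def row_key(url: str) -> str:
    """Turn 'maps.google.com/index.html' into 'com.google.maps/index.html'."""
    host, slash, path = url.partition("/")
    return ".".join(reversed(host.split("."))) + slash + path


key = row_key("maps.google.com")
keys_in_row_order = sorted(
    row_key(host) for host in ["maps.google.com", "www.cnn.com", "mail.google.com"]
)
```

Because rows are kept in lexicographic order, the two google.com hosts end up adjacent, and likely in the same tablet, which makes scans over one domain cheap.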

Page 52: Columns (Google BigTable)

‣ column keys are grouped into sets called column families
‣ a column family must be created before data can be stored under any column key in that family
‣ a column key is named as family:qualifier
‣ access control and both disk and memory accounting are performed at the column-family level

Page 53: Timestamps (Google BigTable)

[Diagram: the contents: column of row com.example holds two versions of the page (<html>…), stored at timestamps t1 and t2.]

Page 54: Data Model: Example (Google BigTable)

[Table: the first column holds the row keys, the header row lists the column families, and the rows are kept sorted.]

Row key          language:  contents:                anchor:cnnsi.com  anchor:mylook.ca
com.example      en         <!DOCTYPE html PUBLIC
com.cnn.www      en         <!DOCTYPE html PUBLIC …  “cnn”             “cnn.com”
com.cnn.www/foo  en         <!DOCTYPE html PUBLIC

Page 55: Differences with RDBMS (Google BigTable)

RDBMS             BigTable
query language    specific API
joins             no referential integrity
explicit sorting  sorting defined a priori in the column family

Page 56: Architecture (Google BigTable)

Google File System (GFS)

‣ stores data files and logs

Google SSTable

‣ stores BigTable data

Chubby

‣ highly available distributed lock service

Page 57: Components (Google BigTable)

library

‣ linked into every client

one master server

‣ assigns tablets to tablet servers
‣ detects the addition and expiration of tablet servers
‣ balances tablet-server load
‣ garbage-collects files in GFS
‣ handles schema changes

many tablet servers

‣ each manages between ten and a thousand tablets
‣ handles read and write requests to its tablets
‣ splits tablets that have grown too large

Page 58: Components (Google BigTable)

[Diagram: a client, one master server and three tablet servers, with arrows labeled “Metadata” and “read/write”.]

Page 59: Startup and Growth (Google BigTable)

[Diagram: tablet location hierarchy. A Chubby file points to the root tablet (the 1st Metadata tablet), which points to the other metadata tablets, which in turn point to the user tablets (UserTable1 … UserTableN).]

Page 60: Tablet Assignment (Google BigTable)

tablet server

‣ when started, creates and acquires a lock in Chubby

master

‣ grabs a unique master lock in Chubby
‣ scans Chubby to find live tablet servers
‣ asks each tablet server which tablets it is already serving
‣ scans the Metadata table to learn the full set of tablets
‣ builds a set of unassigned tablets, for future tablet assignment
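The last two master steps amount to a set difference. The sketch below is a deliberately simplified illustration: the `unassigned_tablets` function, the tablet names and the dictionary of server reports are all hypothetical, and Chubby and the Metadata table are replaced by plain Python values.

```python
def unassigned_tablets(metadata_tablets, live_servers):
    """Tablets listed in the Metadata table that no live tablet server reports.

    metadata_tablets: iterable of tablet names found by scanning the Metadata table.
    live_servers: dict mapping each live tablet server to the tablets it serves.
    """
    assigned = set()
    for tablets in live_servers.values():
        assigned |= set(tablets)
    return set(metadata_tablets) - assigned


# Hypothetical state after the master has scanned Chubby and the Metadata table.
metadata = ["users[a-m]", "users[n-z]", "orders[0-9]"]
reports = {
    "tabletserver-1": {"users[a-m]"},
    "tabletserver-2": {"orders[0-9]"},
}
to_assign = unassigned_tablets(metadata, reports)
```

Only `users[n-z]` is left without a server, so it is the tablet the master would hand to a tablet server with spare capacity.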

Page 61: Why Is It CP? (Google BigTable)

‣ if the master dies, its services stop functioning
‣ if a tablet server dies, its tablets become unavailable
‣ if Chubby dies, BigTable can no longer execute synchronization operations or serve client requests
‣ Google File System is a CP system

Page 62: $ whoami

Andrea Giuliano
@bit_shark
www.andreagiuliano.it

Page 63: Please Rate the Talk!

joind.in/13224

Page 64: References

G. DeCandia et al., “Dynamo: Amazon’s Highly Available Key-value Store”
F. Chang et al., “Bigtable: A Distributed Storage System for Structured Data”

Assets:
https://farm1.staticflickr.com/41/86744006_0026864df8_b_d.jpg
https://farm9.staticflickr.com/8305/7883634326_4e51a1a320_b_d.jpg
https://farm5.staticflickr.com/4145/4958650244_65b2eddffc_b_d.jpg
https://farm4.staticflickr.com/3677/10023456065_e54212c52e_b_d.jpg
https://farm4.staticflickr.com/3076/2871264822_261dafa44c_o_d.jpg
https://farm1.staticflickr.com/7/6111406_30005bdae5_b_d.jpg
https://farm4.staticflickr.com/3928/15416585502_92d5e608c7_b_d.jpg
https://farm8.staticflickr.com/7046/6873109431_d3b5199f7d_b_d.jpg
https://farm4.staticflickr.com/3007/2835755867_c530b0e0c6_o_d.jpg
https://farm3.staticflickr.com/2788/4202444169_2079db9580_o_d.jpg
https://farm1.staticflickr.com/55/129619657_907b480c7c_b_d.jpg
https://farm5.staticflickr.com/4046/4368269562_b3e05e3f06_b_d.jpg
https://farm8.staticflickr.com/7344/12137775834_d0cecc5004_k_d.jpg
https://farm5.staticflickr.com/4073/4895191036_1cb9b58d75_b_d.jpg
https://farm4.staticflickr.com/3144/3025249284_b77dec2d29_o_d.jpg
https://www.flickr.com/photos/avardwoolaver/7137096221