Consistency in Distributed Systems, Part 2

Consistency in Distributed Systems: II

Mike Miller

Co-Founder, Chief Scientist

@mlmilleratmit

2014-07-31

AMP on Consistency

https://amplab.cs.berkeley.edu/tag/consistency/


Topics For Today

‣ Brief review of Part I (2014-06-12)

‣ Why is Consistency hard?

‣ What should you really care about?
  • Single object/row/document operations
  • Multi-part transactions
  • Primary/secondary indexes

‣ Dirty little (ACID) secrets: Results from industry survey

‣ Failure modes, strategies, and gotchas


Motivation


Mobile, Big Data

=> Stress models for consistency, transactional reasoning


This is your problem when…

… data doesn’t fit on one server.
… data replicated between servers (e.g. read slaves).
… data spread between data centers.
… state spread across more than one device (mobile!)
… mixed workloads with concurrency.
… state spread across more than one process.


This is now everyone’s problem


Good news — market response: NewSQL, NoSQL, Cloud, …


ships with a mobile strategy


{Write: ‘Local’, Sync: ‘Later’}

Embedded, Edge, Satellites

Desktop, Browser

Cloud


NoSQL Taxonomy


…

…http://www.bailis.org/papers/ramp-sigmod2014.pdf

Fundamental reason: CAP Theorem


You do need to understand your datastore.


Why is Consistency Hard?


Fallacies of Distributed Computing (P. Deutsch):

1. The network is reliable.

2. Latency is zero.


MySQL, MongoDB, CouchDB, SOLR, …

Dynamo, Cloudant, Cassandra, Riak, …

[Diagram: Perfect Network. Timeline with Client, Primary, Secondary, and Replication between Primary and Secondary. Client: w(x=1) → success. Client: r(x) → x=1.]

[Diagram: Network Partition: Primary Only. Timeline with Client, Primary, Secondary, and Replication. Client: w(x=1) → success. Client: r(x) → x=Null. Result: available, temporarily inconsistent.]

[Diagram: Network Partition: Primary+Secondary. Timeline with Client, Primary, Secondary, and Replication. Client: w(x=1) → success. Result: consistent.]

[Diagram: Network Partition: Primary+Secondary. Timeline with Client, Primary, Secondary, and Replication. Client: w(x=1) → failure. Result: not available.]
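The timelines above can be read as one choice: during a partition, does a write need only the Primary, or both Primary and Secondary? A minimal Python sketch of that choice follows. The class and parameter names (`ReplicatedStore`, `require_ack`, `partitioned`) are hypothetical, replication is modeled as instantaneous when the link is up, and the sketch does not model a write that waits for the partition to heal.

```python
class ReplicatedStore:
    """Toy primary/secondary pair. `partitioned` cuts the replication link."""

    def __init__(self, require_ack):
        self.primary = {}
        self.secondary = {}
        self.partitioned = False
        # require_ack=True: a write must reach Primary AND Secondary.
        # require_ack=False: the Primary alone may acknowledge a write.
        self.require_ack = require_ack

    def write(self, key, value):
        if self.partitioned:
            if self.require_ack:
                return "failure"        # stays consistent, but not available
            self.primary[key] = value   # available; Secondary is now stale
            return "success"
        self.primary[key] = value
        self.secondary[key] = value     # replication goes through
        return "success"

    def read_secondary(self, key):
        # None plays the role of Null in the diagrams.
        return self.secondary.get(key)


if __name__ == "__main__":
    primary_only = ReplicatedStore(require_ack=False)
    primary_only.partitioned = True
    print(primary_only.write("x", 1), primary_only.read_secondary("x"))

    both = ReplicatedStore(require_ack=True)
    both.partitioned = True
    print(both.write("x", 1))
```

With the link down, the primary-only store reports success while a read from the Secondary still sees nothing, and the primary+secondary store refuses the write.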


Partition Failures Dominate

‣ 2011 (AWS):
  • Misconfiguration => 12 hour outage

‣ 2011 Survey (Microsoft):
  • 13,300 customer-impacting network failures
  • Median 60,000 packets lost per failure
  • Mean 41 link failures per day (95% of 136)
  • Median time to repair of 5 minutes (up to a week)
  • Redundant networks only reduce failure impact by 40%

‣ HP Managed Enterprise Networks:
  • 28% of customer tickets due to network problems
  • 39% of all support tickets due to network problems
  • Median incident duration: 114-188 minutes


http://queue.acm.org/detail.cfm?id=2655736


Latency

Network health really depends on your latency tolerance.

A slow network can be just as bad as a broken network.

The tails matter.


Median Latencies

[Figure: median latency plots for Same AZ, Different AZs, and Different Regions]

http://www.bailis.org/blog/communication-costs-in-real-world-networks/



99.99% Latencies

[Figure: 99.99% latency plots for Same AZ, Different AZs, and Different Regions]




Latency Summary

‣ Distributed, coordinated operations:
  ‣ rate ~ 1/latency

‣ Real world latencies are substantial, with long tails

‣ At scale, 0.01% events happen constantly

‣ Picture actually much worse due to systematic fluctuations

‣ 99.99% Latencies:
  ‣ Same AZ: ~50 ms
  ‣ Same Region: ~80 ms
  ‣ Inter-Region: 200-400 ms!
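To make "rate ~ 1/latency" and "0.01% events happen constantly" concrete, here is a small Python sketch that plugs in the 99.99% latencies quoted above. It assumes strictly serialized operations that each wait one full round trip; the function names and the 10,000 requests/s workload are hypothetical, chosen only for illustration.

```python
# 99.99th-percentile latencies from the summary above, in milliseconds.
P9999_LATENCY_MS = {
    "same AZ": 50,
    "same region": 80,
    "inter-region (low)": 200,
    "inter-region (high)": 400,
}


def max_serial_rate(latency_ms):
    """Upper bound on back-to-back coordinated operations per second."""
    return 1000.0 / latency_ms


def tail_events_per_second(requests_per_second, tail_fraction=1e-4):
    """Expected requests per second that land in the 0.01% tail."""
    return requests_per_second * tail_fraction


if __name__ == "__main__":
    for name, ms in P9999_LATENCY_MS.items():
        print(f"{name}: at most {max_serial_rate(ms):.1f} ops/s")
    # Hypothetical workload of 10,000 requests/s.
    print(tail_events_per_second(10_000), "tail events per second")
```

At a 50 ms tail latency the bound is 20 serialized operations per second, and at 400 ms it drops to 2.5. Under the assumed 10,000 requests/s, roughly one request every second falls in the 0.01% tail.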


Thank god for ACID (New)SQL, right?

… not so fast


ACID in the Wild


http://arxiv-web3.library.cornell.edu/abs/1302.0309v1



Beware the Marketing


Wow!


So… What do we use? What should we worry about?


Distinguishing Characteristics

1. Locks / Concurrency
2. Relationships / Foreign Keys
3. Inter-index consistency


Subjective Classification

                       | Cassandra                        | Cloudant                         | MongoDB                         | Riak
Locking                | Minimal                          | None                             | Writes and Reads                | Minimal
Consistency            | Quorum, Optional Paxos           | Quorum                           | Single document Locks           | Quorum, Optional Paxos
Relationships, “JOINs” | De-normalize, Materialized Views | Normalize, Materialized Views    | De-normalize, Application Joins | De-normalize or Link Walking
Leading Strategies     | Immutability                     | Immutability                     | Fat Documents                   | Immutability
“Intention”            | HA, Shared Nothing, Many Servers | HA, Shared Nothing, Many Servers | Master/Slave, Single Server     | HA, Shared Nothing, Many Servers
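"Quorum" in the Consistency row can be illustrated with a toy Python model. The slides do not define it, so everything here is an assumption: the names `n`, `w`, and `r` follow the usual Dynamo-style convention (replicas, write acknowledgements, read responses), and a single global version counter stands in for the timestamps or vector clocks a real datastore would use. None of this is the API of Cassandra, Cloudant, or Riak.

```python
class QuorumStore:
    """Toy quorum replication over n in-memory replicas."""

    def __init__(self, n=3, w=2, r=2):
        self.replicas = [{} for _ in range(n)]
        self.w = w          # acknowledgements needed for a write
        self.r = r          # responses needed for a read
        self.version = 0    # simplification: one global version counter

    def write(self, key, value, reachable):
        """Store on every reachable replica; fail without w of them."""
        if len(reachable) < self.w:
            return False
        self.version += 1
        for i in reachable:
            self.replicas[i][key] = (self.version, value)
        return True

    def read(self, key, reachable):
        """Ask r reachable replicas and return the newest value seen."""
        if len(reachable) < self.r:
            raise RuntimeError("read quorum not reached")
        answers = [self.replicas[i].get(key, (0, None))
                   for i in reachable[:self.r]]
        return max(answers, key=lambda a: a[0])[1]


if __name__ == "__main__":
    store = QuorumStore(n=3, w=2, r=2)
    store.write("x", 1, reachable=[0, 1])
    # Replica 2 never saw the write, but replica 1 did.
    print(store.read("x", reachable=[1, 2]))
```

When `r + w > n`, every read set overlaps every successful write set in at least one replica, so the read sees the latest write. With `w=1, r=1` the same sequence can return a stale value.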


De-normalization

It happens in all NoSQL systems. Is it the application's responsibility or the DB's?


Relationships as Single Documents

Natural fit for some applications

http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/


Relationships as Single Documents

Duplication sucks, pathologically

http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
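A short Python sketch of why duplication hurts in de-normalized documents: the same user is embedded in several documents, so one logical change means rewriting every copy. The document shapes and the helper names (`rename_user`, `names_for`) are hypothetical and not taken from the linked post.

```python
# Two "fat" documents that each embed a copy of the same user.
posts = [
    {"_id": "post1", "title": "Hello", "author": {"id": "u1", "name": "Ann"}},
    {"_id": "post2", "title": "Again", "author": {"id": "u1", "name": "Ann"}},
]


def rename_user(docs, user_id, new_name):
    """Rewrite every embedded copy; return how many documents changed."""
    touched = 0
    for doc in docs:
        if doc["author"]["id"] == user_id:
            doc["author"]["name"] = new_name
            touched += 1
    return touched


def names_for(docs, user_id):
    """Distinct names stored for one user; more than one means drift."""
    return {d["author"]["name"] for d in docs if d["author"]["id"] == user_id}


if __name__ == "__main__":
    # An update that misses a copy leaves the data inconsistent.
    posts[0]["author"]["name"] = "Anne"
    print(names_for(posts, "u1"))
    # Repairing it means touching every document that embeds the user.
    print(rename_user(posts, "u1", "Anne"))
```

The cost of a rename grows with the number of documents that embed the user, and any copy that is missed silently disagrees with the rest.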


Materialized Views Rule


Cassandra, Cloudant: “JOINs” via materialized views


Review: Cassandra

‣ Highly Available

‣ CQL eases pain of de-normalization

‣ 1-many, many-many relationships via inserts into multiple column families at update

‣ Eventual consistency as those updates propagate

‣ Can appeal to Paxos API with latency, availability hit


Review: Cloudant

‣ Highly Available

‣ Normalize document structure, include foreign keys to other documents.

‣ Manage foreign key integrity yourself

‣ 1-many, many-many relationships via materialized views

‣ Eventual consistency between primary-index and (batch updated) materialized view
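The last two bullets can be sketched in Python as a primary index plus a view that is only brought up to date in batches. This is an illustrative in-memory model, not Cloudant's API: the class `DocStore`, its methods, and the `type` / `author_id` fields are all assumptions, and the sketch ignores deletes and documents whose foreign key changes.

```python
from collections import defaultdict


class DocStore:
    """Primary index plus a batch-updated materialized view."""

    def __init__(self):
        self.docs = {}                  # primary index: _id -> document
        self.view = defaultdict(list)   # view: author_id -> [post _id, ...]
        self.pending = []               # writes not yet folded into the view

    def put(self, doc):
        self.docs[doc["_id"]] = doc     # visible in the primary index at once
        self.pending.append(doc["_id"])

    def refresh_view(self):
        """Batch step: fold pending writes into the view."""
        for doc_id in self.pending:
            doc = self.docs[doc_id]
            if doc.get("type") == "post":
                ids = self.view[doc["author_id"]]
                if doc_id not in ids:
                    ids.append(doc_id)
        self.pending.clear()

    def posts_by(self, author_id):
        """One-to-many lookup served from the view, which may be stale."""
        return list(self.view.get(author_id, []))


if __name__ == "__main__":
    db = DocStore()
    db.put({"_id": "u1", "type": "user", "name": "Ann"})
    db.put({"_id": "p1", "type": "post", "author_id": "u1"})
    print(db.posts_by("u1"))   # view has not been refreshed yet
    db.refresh_view()
    print(db.posts_by("u1"))
```

Between `put` and `refresh_view` the post exists in the primary index but is missing from the view, which is the eventual-consistency window the slide describes. Nothing in the sketch checks that `author_id` points at a real user; that integrity check is left to the application.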


Review: MongoDB

‣ Understand when MongoDB locks

‣ Go as far as you can with “fat”, de-normalized documents

‣ Beware the consistency subtleties of replica sets, de-normalization


Review: Riak

‣ Highly Available

‣ Include foreign keys to other documents.

‣ Manage foreign key integrity yourself

‣ one-way (“graphy”) relationships via link-walking API

‣ Can appeal to Paxos API with latency, availability hit


My Final $0.02

‣ Time to market should be your #1 concern.

‣ You will probably run both SQL and NoSQL.

‣ We’ve focused on the database, but all new apps need a mobile strategy.

‣ You’ll never engineer a perfect network
  • Focus on Availability and Partition Tolerance

‣ You will need to become advanced/expert in data modeling for your choice of DB


Thanks!

cloudant.com

[email protected]

@mlmilleratmit

IRC: #Cloudant
