Databases Sargun Dhillon @Sargun
Intro to Databases

Apr 16, 2017

Sargun Dhillon
Transcript
Page 1: Intro to Databases

Databases

Sargun Dhillon

@Sargun

Page 2: Intro to Databases

What is a database? A database is an organized collection of data

Page 3: Intro to Databases

What are databases for?

Applications

Page 4: Intro to Databases

Internet Applications

Experiencing exploding growth

Page 5: Intro to Databases

Internet Traffic vs. Penetration

[Chart: IP Traffic (PB/mo, axis 0 to 40,000) and Global Penetration (%, axis 0 to 100), by year, 2000 to 2012]

Page 6: Intro to Databases

Number of Internet Users in 2012

Page 7: Intro to Databases

Average Distance to Every Human

Page 8: Intro to Databases

Extrapolating

We have not yet reached Peak “Web” and we won’t see it for some time

Page 9: Intro to Databases

Applications

How are they built?

Page 10: Intro to Databases

Basic Application

Page 11: Intro to Databases

Useful Application

Add Persistence

Page 12: Intro to Databases

Scale Out

Page 13: Intro to Databases

Scale Out with Correctness

Page 14: Intro to Databases

What is a Transaction?

A Unit of Work

Page 15: Intro to Databases

Transaction Scheduling

Concurrent Operations

Page 16: Intro to Databases

Non-Conflicting Concurrency

Parallel Execution

Page 17: Intro to Databases

ACID

Page 18: Intro to Databases

ACID = Atomicity

A transaction executes or it does not
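A minimal sketch of atomicity, using Python's built-in sqlite3 module rather than any database named in the talk. The `accounts` table, the `transfer` helper, and the account names are assumptions made up for illustration. The point is that a failed unit of work leaves no partial effects behind.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as a single unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort: the debit above must not become visible.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current {name: balance} mapping."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical starting state for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 500)` returns False and leaves both balances untouched, because the debit is rolled back together with the rest of the transaction.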

Page 19: Intro to Databases

ACID = Consistency

Correctness: require the database to follow a set of invariants

Page 20: Intro to Databases

ACID = Isolation

Prevent inter-actor visibility during concurrent operations

Page 21: Intro to Databases

ACID = Durability

Once you write, it will survive

Page 22: Intro to Databases

Lifecycle of a Transaction

Page 23: Intro to Databases

Vertical Scalability

Moore’s Law can take us places

Page 24: Intro to Databases

Biggest AWS Database

• vCPUs: 32

• Memory: 244 GiB

• Storage: 3 TB

• IOPS: 30,000

• Networking: 10 Gigabit

• Resiliency: Multi-AZ

• SLA: 99.95%

• Backend: PostgreSQL

Page 25: Intro to Databases

$141,052.66/yr

Page 26: Intro to Databases

Scaling Beyond

Page 27: Intro to Databases

Sharding?

Page 28: Intro to Databases
Page 29: Intro to Databases

Do we have a natural sharding key?
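A rough sketch of what choosing a sharding key means in practice, assuming the key is a string such as a user name. `shard_for` and `route` are hypothetical helpers, and hashing with SHA-256 is just one possible placement function.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a sharding key to a shard index, deterministically."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def route(rows, key_field, num_shards):
    """Group rows by the shard that owns their sharding key."""
    shards = {i: [] for i in range(num_shards)}
    for row in rows:
        shards[shard_for(row[key_field], num_shards)].append(row)
    return shards
```

All rows that share a key land on one shard, so single-key operations stay local. Anything that spans keys (a friend list touching two users, for example) crosses shards, which is where the coordination questions begin.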

Page 30: Intro to Databases

Add a Coordinator?

Page 31: Intro to Databases

Two-phase commit?

Three-phase commit?

Paxos?

Enhanced Three-phase commit?

Wat?

Egalitarian Paxos?
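For orientation, here is a toy, in-process sketch of the simplest protocol on this list, two-phase commit. The `Participant` class and its `will_vote_yes` flag are assumptions for illustration. There is no network, timeout, or crash handling here, and those are the parts that make the real protocols hard.

```python
class Participant:
    """A shard taking part in a distributed transaction (toy model)."""

    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "init"

    def prepare(self):
        # Phase 1: promise to commit, or refuse.
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes yes."""
    # Phase 1: collect votes from all participants.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2: broadcast the single global decision.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

If the coordinator fails between the two phases, prepared participants are stuck waiting. That blocking behaviour is the motivation for three-phase commit and the Paxos family.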

Page 32: Intro to Databases

Do we really want to run NxM databases?

Page 33: Intro to Databases

Partial Availability

Page 34: Intro to Databases

Failure detectors are hard

Page 35: Intro to Databases

Database Failure

Page 36: Intro to Databases

Cascading App Failure

Page 37: Intro to Databases

Recovery

Page 38: Intro to Databases

Hotspots? (The “Bieber” problem)

Page 39: Intro to Databases

Scaling SSI databases is a hard problem

Page 40: Intro to Databases

What if we want multi-datacenter?

Page 41: Intro to Databases
Page 42: Intro to Databases
Page 43: Intro to Databases
Page 44: Intro to Databases

No latency win for mutable data

Page 45: Intro to Databases

Must sacrifice recency for latency win

Page 46: Intro to Databases

Complex Routing Semantics

Page 47: Intro to Databases

Multi-master requires at least 1 RTT

Page 48: Intro to Databases

“Because the data is synchronously replicated across multiple datacenters, and because we’ve chosen widely distributed datacenters, the commit latencies are relatively high (50-150 ms).”

- F1: A Distributed SQL Database That Scales, Google

Page 49: Intro to Databases

“Every 100 ms increase in load time of Amazon.com decreased sales by 1%.”

(~$120M of losses per 100 ms)

- Kohavi and Longbotham 2007

Page 50: Intro to Databases

WANs Fail

“Average partition duration ranged from 6 minutes for software-related failures to more than 8.2 hours for hardware-related failures (median 2.7 and 32 minutes; 95th percentile of 19.9 minutes and 3.7 days, respectively).”

- The Network is Reliable

Page 51: Intro to Databases

Is there another way?

Page 52: Intro to Databases

Eventually Consistent Systems

Page 53: Intro to Databases

“We also have a lot of experience with eventual consistency systems at Google. In all such systems, we find developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date. We think this is an unacceptable burden to place on developers and that consistency problems should be solved at the database level.”

- F1: A Distributed SQL Database That Scales, Google

Page 54: Intro to Databases

CAP Theorem

Page 55: Intro to Databases

“A shared-data system can have at most two of the three following properties: Consistency, Availability, and tolerance to network Partitions.”

- Dr. Eric Brewer

Page 56: Intro to Databases

On Consistency

• ACID Consistency: Any transaction or operation will bring the database from one valid state to another

• CAP Consistency: All nodes see the same data at the same time (synchrony)

Page 57: Intro to Databases

On Partition Tolerance

• The network will be allowed to lose arbitrarily many messages sent from one node to another.

• Database systems, in order to be useful, must communicate over the network

• Clients count

Page 58: Intro to Databases

There is no such thing as a 100% reliable network:

Can’t choose CA

http://codahale.com/you-cant-sacrifice-partition-tolerance

Page 59: Intro to Databases

We Can Have Both*

(*Just not at the same time)

Page 60: Intro to Databases

PNUTS

• Paper released by Yahoo! Research in 2008

• Operations:

• Read-Any

• Read-Critical(Required-Version)*

• Read-Latest

• Write

• Test-and-set-write(Required-Version)

* Will fall back to CP operation
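To make the difference between these operations concrete, here is a toy single-record model. It is not the PNUTS API. The `Record` class, the explicit `replicate` step, and the `(version, value)` tuples are assumptions chosen to show how each call trades recency against locality.

```python
class Record:
    """One record with a master copy and one lagging local replica."""

    def __init__(self):
        self.master = (0, None)   # (version, value) at the record's master
        self.replica = (0, None)  # possibly stale copy near the reader

    def write(self, value):
        version = self.master[0] + 1
        self.master = (version, value)
        return version

    def test_and_set_write(self, required_version, value):
        # Write only if nobody else has written since `required_version`.
        if self.master[0] != required_version:
            return None
        return self.write(value)

    def replicate(self):
        # Asynchronous replication, modelled as an explicit step.
        self.replica = self.master

    def read_any(self):
        # Fast and local, may be stale.
        return self.replica

    def read_critical(self, required_version):
        # Local if the replica is new enough, otherwise go to the master.
        if self.replica[0] >= required_version:
            return self.replica
        return self.master

    def read_latest(self):
        # Always pays the trip to the master.
        return self.master
```

A client that has just written version 1 can call `read_critical(1)` to be sure it sees its own write, while `read_any()` may still return version 0 until replication catches up.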

Page 61: Intro to Databases

Weak Consistency

Page 62: Intro to Databases

Weak Consistency

Page 63: Intro to Databases

“This is a specific form of weak consistency; the storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value.”

Definition of “Eventual Consistency” from “Eventually Consistent - Revisited” - Werner Vogels

Page 64: Intro to Databases
Page 65: Intro to Databases

Eventual Consistency in the LAN

Page 66: Intro to Databases

Less Relevant Today

Page 67: Intro to Databases

Good at Building LANs at Scale

Page 68: Intro to Databases

Facebook Fabric

Page 69: Intro to Databases

Microsoft VL2

Page 70: Intro to Databases

Google Jupiter

Page 71: Intro to Databases

Less Interesting

Page 72: Intro to Databases

Eventual Consistency in the WAN

Page 73: Intro to Databases

Low-latency everywhere

Page 74: Intro to Databases

Write Anywhere

Beat the speed of light

Page 75: Intro to Databases

Build for WAN locality

Page 76: Intro to Databases

Typical Pattern with COTS EC Store

Page 77: Intro to Databases

System Model

Page 78: Intro to Databases

Use Case: Social Network

Page 79: Intro to Databases

Models: Users, Posts, Friends

Page 80: Intro to Databases

Schema

    CREATE TABLE test.users (
        user_name text PRIMARY KEY,
        friends set<text>,
        posts set<text>
    );

Page 81: Intro to Databases

State

    *****:test> SELECT * FROM users;

     user_name | friends  | posts
    -----------+----------+-------
        sargun | {'BOSS'} |  null

Page 82: Intro to Databases

Let’s Post!

(But First)

Page 83: Intro to Databases

Remove Boss

*****:test> UPDATE users SET friends = friends - {'BOSS'} WHERE user_name = 'sargun' ;

Page 84: Intro to Databases

Hidden Failure

Page 85: Intro to Databases

Dropped Unfriending

Page 86: Intro to Databases

State at DC2 & DC3

    *****:test> SELECT * FROM users;

     user_name | friends  | posts
    -----------+----------+-------
        sargun | {'BOSS'} |  null

Page 87: Intro to Databases

Post Message

*****:test> UPDATE users SET posts = posts + {'PARTY'} WHERE user_name = 'sargun' ;

Page 88: Intro to Databases

State at DC2 & DC3

    *****:test> SELECT * FROM users;

     user_name | friends  | posts
    -----------+----------+-----------
        sargun | {'BOSS'} | {'PARTY'}
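The sequence on the last few slides can be replayed in a few lines. This is a simulation with plain Python sets, not Cassandra. The three dictionaries standing in for datacenters and the `visible_to` helper are assumptions for illustration.

```python
def new_row():
    """Initial state of the 'sargun' row on every replica."""
    return {"friends": {"BOSS"}, "posts": set()}

def unfriend(row, who):
    row["friends"].discard(who)

def post(row, message):
    row["posts"].add(message)

def visible_to(row, message):
    """Friends who can see `message` according to this replica."""
    return set(row["friends"]) if message in row["posts"] else set()

dc1, dc2, dc3 = new_row(), new_row(), new_row()

# The unfriend is applied at DC1, but its replication to DC2 and DC3 is lost.
unfriend(dc1, "BOSS")

# The post is a separate, independent update and replicates everywhere.
for replica in (dc1, dc2, dc3):
    post(replica, "PARTY")
```

At DC1 nobody sees the post. At DC2 and DC3, `visible_to` returns `{"BOSS"}`: the later update arrived without the earlier one it depended on.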

Page 89: Intro to Databases

Worse Than Banking

Unbounded Financial Loss

Page 90: Intro to Databases

No Happens-Before (h.b.) Relationship

Page 91: Intro to Databases

Solution: Wait For Acks

Page 92: Intro to Databases

Quorum Systems

Very Little Benefit Over a CP System

Page 93: Intro to Databases

RYOW (Read Your Own Writes) at an Incredible Cost

Page 94: Intro to Databases

Why not just do Paxos*?

* Single-decree Paxos, or a variant such as EPaxos, Cheap Paxos, or Multi-Paxos

Page 95: Intro to Databases
Page 96: Intro to Databases
Page 97: Intro to Databases

Quorum

Page 98: Intro to Databases

Quorum

Page 99: Intro to Databases

Participating Quorums Must Overlap
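The overlap requirement is commonly stated as R + W > N for N replicas, read quorum size R, and write quorum size W. The sketch below checks that rule against a brute-force enumeration. The function names are made up for illustration.

```python
from itertools import combinations

def quorums_overlap(n, r, w):
    """The usual rule of thumb: read and write quorums must share a node."""
    return r + w > n

def every_pair_intersects(n, r, w):
    """Brute force: does every r-node read set meet every w-node write set?"""
    nodes = range(n)
    return all(
        set(read_q) & set(write_q)
        for read_q in combinations(nodes, r)
        for write_q in combinations(nodes, w)
    )
```

With N = 3, choosing R = W = 2 overlaps, while R = 1 and W = 2 does not, so a read could miss the latest acknowledged write.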

Page 100: Intro to Databases

Just Perform Paxos Reconfiguration to Recover from Failure

Page 101: Intro to Databases
Page 102: Intro to Databases
Page 103: Intro to Databases
Page 104: Intro to Databases

Is there an alternative?

Page 105: Intro to Databases

Strong Eventual Consistency

Page 106: Intro to Databases

Strong Eventual Consistency

“Any set of nodes that have received the same (unordered) set of updates will be in the same state.”
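One way to read this definition: the final state must not depend on delivery order. The sketch below contrasts two hypothetical ways of applying the same updates, an arrival-order register and a set union. Only the second satisfies the property.

```python
from itertools import permutations

def apply_overwrite(updates):
    """Arrival-order register: whichever update is delivered last wins."""
    state = None
    for update in updates:
        state = update
    return state

def apply_union(updates):
    """Grow-only set: each update adds one element."""
    state = frozenset()
    for update in updates:
        state = state | {update}
    return state

updates = ["a", "b", "c"]

# Final states reached over every possible delivery order.
overwrite_states = {apply_overwrite(order) for order in permutations(updates)}
union_states = {apply_union(order) for order in permutations(updates)}
```

`union_states` contains a single state, while `overwrite_states` contains three, one per possible last arrival.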

Page 107: Intro to Databases

How do you even use this?

Page 108: Intro to Databases

Vector Clocks

Page 109: Intro to Databases

Vector Clocks

• Extension of Lamport Clocks

• Used to detect cause and effect in distributed systems

• Can determine concurrency of events, and causality violations

• Preserves h.b. relationships
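A minimal sketch of these properties, representing a vector clock as a dictionary from node name to counter. The function names and the `dc1`/`dc2` node names are assumptions for illustration. Missing entries are treated as zero.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def merge(a, b):
    """Pointwise maximum: the clock after learning about both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

def happened_before(a, b):
    """True if every counter in `a` is <= `b` and at least one is smaller."""
    nodes = a.keys() | b.keys()
    return all(a.get(n, 0) <= b.get(n, 0) for n in nodes) and any(
        a.get(n, 0) < b.get(n, 0) for n in nodes
    )

def concurrent(a, b):
    """Neither clock precedes the other, and they are not equal."""
    nodes = a.keys() | b.keys()
    equal = all(a.get(n, 0) == b.get(n, 0) for n in nodes)
    return not equal and not happened_before(a, b) and not happened_before(b, a)
```

Two writes made independently at `dc1` and `dc2` compare as concurrent. A write made after merging both compares as happening after each of them.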

Page 110: Intro to Databases

CRDTs

Page 111: Intro to Databases

CRDTs

• CRDTs:

• Convergent Replicated Data Types

• Commutative Replicated Data Types

• Enable data structures to be always writeable on both sides of a partition, and to replay updates after the partition heals

• Enable distributed computation across monotonic functions

• Two Types:

• CvRDTs

• CmRDTs

Page 112: Intro to Databases

CvRDTs

• State / value based CRDTs

• Minimal state

• Don’t require active garbage collection

Page 113: Intro to Databases

Set CvRDT
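The slide's diagram is not in the transcript, so the following is a sketch of one well-known state-based set, the two-phase set, and not necessarily the one shown. Replicas exchange whole states and combine them with `merge`, which is commutative, associative, and idempotent.

```python
class TwoPhaseSet:
    """State-based set: elements can be added, then removed, never re-added."""

    def __init__(self, added=frozenset(), removed=frozenset()):
        self.added = frozenset(added)
        self.removed = frozenset(removed)

    def add(self, element):
        return TwoPhaseSet(self.added | {element}, self.removed)

    def remove(self, element):
        # Only elements this replica has seen added can be removed.
        if element not in self.added:
            return self
        return TwoPhaseSet(self.added, self.removed | {element})

    def merge(self, other):
        # Union of both grow-only components.
        return TwoPhaseSet(self.added | other.added, self.removed | other.removed)

    def value(self):
        return self.added - self.removed

    def __eq__(self, other):
        return (self.added, self.removed) == (other.added, other.removed)
```

If one replica removes "BOSS" while another adds a new friend, merging in either order yields the same set, with "BOSS" gone.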

Page 114: Intro to Databases

CmRDTs

• Op / method based CRDTs

• Size grows monotonically

• Uses version vectors to determine order of operations

Page 115: Intro to Databases

Counter CmRDT
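A sketch of an operation-based counter under simplifying assumptions: every operation carries a unique id, and each replica remembers the ids it has applied so that duplicates are ignored. Real systems typically use version vectors for this, as the previous slide notes. The `OpCounter` class is hypothetical.

```python
class OpCounter:
    """Operation-based counter: replicas exchange operations, not state."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.value = 0
        self.seen = set()   # ids of operations already applied (grows forever)
        self.next_seq = 0

    def increment(self, amount=1):
        """Apply locally and return the operation to broadcast."""
        op = ((self.replica_id, self.next_seq), amount)
        self.next_seq += 1
        self.apply(op)
        return op

    def apply(self, op):
        """Apply a local or remote operation at most once."""
        op_id, amount = op
        if op_id in self.seen:
            return
        self.seen.add(op_id)
        self.value += amount
```

Because addition commutes, replicas can receive the operations in any order, and even more than once, and still agree on the total.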

Page 116: Intro to Databases

CRDTs in the Wild

• Sets

• Observe-remove set

• Grow-only sets

• Counters

• Grow-only counters

• PN-Counters

• Flags

• Maps
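As an example of one item on this list, here is a sketch of a state-based PN-Counter: two grow-only maps of per-replica totals, one for increments and one for decrements, merged by pointwise maximum. The class layout is an assumption for illustration, not any particular database's implementation.

```python
class PNCounter:
    """Positive-negative counter built from two grow-only counters."""

    def __init__(self, increments=None, decrements=None):
        self.increments = dict(increments or {})
        self.decrements = dict(decrements or {})

    def increment(self, replica, amount=1):
        self.increments[replica] = self.increments.get(replica, 0) + amount

    def decrement(self, replica, amount=1):
        self.decrements[replica] = self.decrements.get(replica, 0) + amount

    def value(self):
        return sum(self.increments.values()) - sum(self.decrements.values())

    def merge(self, other):
        def pointwise_max(a, b):
            return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

        return PNCounter(
            pointwise_max(self.increments, other.increments),
            pointwise_max(self.decrements, other.decrements),
        )
```

Each replica only ever raises its own entries, so taking the maximum per entry never loses an update, and merging the same state twice changes nothing.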

Page 117: Intro to Databases

Data structures that are CRDTs

• Probabilistic, convergent data structures

• HyperLogLog

• Bloom filter

• Co-recursive folding functions

• Maximum-counter

• Running Average

• Operational Transform

Page 118: Intro to Databases

CRDTs

• Incredibly powerful primitive

• Not only useful for in-database manipulation but client-database interaction

• You can compose them, and build your own

• Garbage collection is tricky

Page 119: Intro to Databases

Riak In Action

Page 120: Intro to Databases

Model

    curl -s http://localhost:8098/types/test/buckets/test/datatypes/sargun | python -mjson.tool
    {
        "context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQpq",
        "type": "map",
        "value": {
            "friends_set": [
                "Boss"
            ],
            "posts_set": []
        }
    }

Page 121: Intro to Databases

“Primary Key”

    curl -s http://localhost:8098/types/test/buckets/test/datatypes/sargun | python -mjson.tool
    {
        "context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQpq",
        "type": "map",
        "value": {
            "friends_set": [
                "Boss"
            ],
            "posts_set": []
        }
    }

Page 122: Intro to Databases

Causal Context

    curl -s http://localhost:8098/types/test/buckets/test/datatypes/sargun | python -mjson.tool
    {
        "context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQpq",
        "type": "map",
        "value": {
            "friends_set": [
                "Boss"
            ],
            "posts_set": []
        }
    }

Page 123: Intro to Databases

Update

    curl -XPOST http://localhost:8098/types/test/buckets/test/datatypes/sargun \
      -H "Content-Type: application/json" \
      -H "X-Riak-Vclock: g2wAAAABaAJtAAAACBjtDYuvG6A4YQpq" \
      -d '{ "update": { "friends_set": { "remove": "Boss" } } }'

Page 124: Intro to Databases

Updated Entries (during partition)

    {
        "context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQpq",
        "type": "map",
        "value": {
            "friends_set": [
                "Boss"
            ],
            "posts_set": []
        }
    }

    {
        "context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQtq",
        "type": "map",
        "value": {
            "friends_set": [],
            "posts_set": []
        }
    }

Page 125: Intro to Databases

Update

    curl -XPOST http://localhost:8098/types/test/buckets/test/datatypes/sargun \
      -H "Content-Type: application/json" \
      -H "X-Riak-Vclock: g2wAAAABaAJtAAAACBjtDYuvG6A4YQtq" \
      -d '{ "update": { "posts_set": { "add": "Party" } } }'

Page 126: Intro to Databases

Updated Entries (After Healing)

    {
        "context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQ5q",
        "type": "map",
        "value": {
            "friends_set": [],
            "posts_set": [
                "Party"
            ]
        }
    }

    {
        "context": "g2wAAAABaAJtAAAACBjtDYuvG6A4YQ5q",
        "type": "map",
        "value": {
            "friends_set": [],
            "posts_set": [
                "Party"
            ]
        }
    }

Page 127: Intro to Databases

Currently: Replicates entire value

Page 128: Intro to Databases

Future Work: δ-CRDT

Page 129: Intro to Databases

Ship only Deltas
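To illustrate the idea only, here is a sketch using a grow-only counter state (a map from replica name to count). `delta_since` is a hypothetical helper, and this is not how Riak or the δ-CRDT papers define their delta format. Merging just the changed entries into a remote replica gives the same result as merging the full state.

```python
def merge(a, b):
    """Pointwise maximum of two grow-only counter states."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def delta_since(old, new):
    """Entries of `new` that advanced past `old`: the part worth shipping."""
    return {k: v for k, v in new.items() if v > old.get(k, 0)}
```

For a state with thousands of entries and one local increment, the delta is a single entry, which is the saving the slide points at.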

Page 130: Intro to Databases

Eventual Consistency In Summary

Page 131: Intro to Databases

SEC Enables

Page 132: Intro to Databases

Distributed

Page 133: Intro to Databases

Scalable

[Chart: Scalability vs. Processors]

Page 134: Intro to Databases

Fault-Tolerant

Page 135: Intro to Databases

Applications

Page 136: Intro to Databases

Eventual Consistency (CAP) Without Consistency (ACID) Gives EC a Bad Name

Page 137: Intro to Databases

Operations Requiring Weak Consistency vs. Strong Consistency

    Invariant          | Operation | AP / CP
    -------------------+-----------+--------
    Specify unique ID  | Any       | CP
    Generate unique ID | Any       | AP
    >                  | INCREMENT | AP
    >                  | DECREMENT | CP
    <                  | INCREMENT | CP
    <                  | DECREMENT | AP
    Secondary Index    | Any       | AP
    Materialized View  | Any       | AP
    AUTO_INCREMENT     | INSERT    | CP
    Linearizability    | CAS       | CP

Page 138: Intro to Databases

BASE not ACID

• Basically Available: There will be a response per request (failure, or success)

• Soft State: Any two reads against the system may yield different data (when measured against time)

• Eventually Consistent: The system will eventually become consistent when all failures have healed, and time goes to infinity

Page 139: Intro to Databases

Brand New Technology

Still being invented

Page 140: Intro to Databases

Technology Timeline

• 1996 - Log structured merge tree

• 2000 - CAP Theorem

• 2007 - Amazon Dynamo Paper

• 2011 - INRIA CRDT Technical Report

• 2014 - Riak DT map: a composable, convergent replicated dictionary

Page 141: Intro to Databases

Further Reading

• Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS

• PNUTS: Yahoo!’s Hosted Data Serving Platform

• F1: A Distributed SQL Database That Scales

• Spanner: Google's Globally-Distributed Database

• The Network is Reliable: An informal survey of real-world communications failures

• A comprehensive study of Convergent and Commutative Replicated Data Types

• Riak DT Map: A Composable, Convergent Replicated Dictionary

Page 142: Intro to Databases

Get in Touch

• If you’re interested in cheating the speed of light

• Come use our software

• If you’re interested in solving today’s computer science problems

• Come work for us

• If you’d like to learn more about distributed systems at scale

• Maybe you have a better idea

Page 143: Intro to Databases

Sargun Dhillon @Sargun

[email protected]

The Case for Eventual Consistency