
Basics of the Highly Available Distributed Databases - teowaki - javier ramirez - cloud - big data - distributed systems

Jan 20, 2017

Page 1: Basics of the Highly Available Distributed Databases - teowaki - javier ramirez - cloud - big data - distributed systems

Everything you always wanted to know about Highly Available Distributed Databases

javier ramirez - @supercoco9

AMSTERDAM 11-12 MAY 2016

https://teowaki.com

Page 2:

IBM data center in Japan during and after an earthquake

Page 3:
Page 4:
Page 5:

"A squirrel did take out half of our Santa Clara data centre two years back"

Mike Christian, Yahoo Director of Engineering

Page 6:
Page 7:

Hayastan Shakarian

a.k.a. The Spade Hacker

Page 8:

Cut off Armenia from the Internet for almost one day*

* By accident, while scavenging copper

Page 9:

I have no idea what the internet is

Page 10:

Some data center outages reported in 2015:

* Amazon Web Services
* Apple iCloud
* Microsoft Azure
* IBM SoftLayer
* Google Cloud Platform

* And of course every hosting provider with scheduled maintenance operations (Rackspace, DigitalOcean, OVH...)

Page 11:

Complex systems can and will fail

Page 12:

You'd better distribute your data, or else...

Also, distributed databases can perform better and run on cheaper hardware than centralised ones

Page 13:

Most basic level: Backup

Page 14:

And keep the copy on a separate data centre*

* Vodafone once lost one year of data in a fire because it had not done this

Page 15:

Next Level: Replicas (master-slave)

Page 16:

A main server sends a binary log* of changes to one or more replicas

* Also known as Write Ahead Log or WAL
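
A minimal sketch of the log-shipping idea on this slide, in Python. The names (`Primary`, `Follower`, `catch_up`) are invented for illustration and do not correspond to any particular database; real systems ship the log over the network rather than reading it from memory.

```python
class Primary:
    """The main server: every change is appended to a log before it is applied."""

    def __init__(self):
        self.data = {}
        self.log = []

    def write(self, key, value):
        self.log.append((key, value))
        self.data[key] = value


class Follower:
    """A replica: replays the log entries it has not seen yet."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already replayed

    def catch_up(self, primary):
        for key, value in primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.log)


primary, follower = Primary(), Follower()
primary.write("a", 1)
follower.catch_up(primary)
primary.write("a", 2)
primary.write("b", 3)
stale = dict(follower.data)   # the replica lags until it catches up again
follower.catch_up(primary)
```

Between the two `catch_up` calls the follower serves stale data, which is the usual price of asynchronous replication.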

Page 17:

Master-slave is good, but:

* All the operations are replicated on all slaves

* Good scalability on reads, but not on writes

* Cannot function during a network partition

* Single point of failure (SPOF)

Page 18:

Next Level: Multi-Master Cluster (master-master)

Page 19:

Every server can accept reads or writes, and sends its binary log to all the other servers

* Also referred to as update-anywhere

Page 20:

Multi-master is great, but:

* All the operations are replicated on all masters

* When synchronous, high latency (consistency achieved via locks, coordination and serializable transactions)

* When asynchronous, typically poor conflict resolution

* Hard to scale up or down automatically

Page 21:

The system I want:

* Always ON, even with network partitions

* Scales out both reads and writes. Doesn't need to keep all the data in all the servers

* Runs on cheap, diverse commodity hardware

* Runs locally to my users (low latency)

* Grows/shrinks elastically and survives server failures

Page 22:

Then you need to let go of many convenient things you take for granted in databases

Page 23:

CAP Theorem

[Diagram: Consistency, Availability and Partition Tolerance, with the pairwise combinations CA, AP and CP]

Everything is a trade-off

Page 24:

Next Level: Distributed Data Stores

Page 25:
Page 26:

Distributed DB design decisions

* data (keys) distribution
* data replication/durability
* conflict resolution
* membership
* status of the other peers
* operation under partitions and during unavailability of peers
* incremental scalability

Page 27:

Data distribution

Consistent hashing based on the key

Usually implies operations work on single keys. Some solutions, like Redis, allow the clients to group related keys consistently. Some solutions, like BigTable, allow related data to be co-located by group or family.

Queries are frequently limited to lookups by key or by secondary index (say bye to the power of SQL)

Page 28:

Data distribution. The Ring
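
As a rough illustration of the ring, here is a small consistent-hashing sketch in Python. It assumes MD5 as the hash function, a 32-bit ring and 64 virtual nodes per server; those choices, and the names `HashRing` and `node_for`, are illustrative and not taken from any specific product.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a position on the ring (here a 32-bit integer)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each physical node claims several points ("virtual nodes") on the ring.
        self._points = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._points]

    def node_for(self, key: str) -> str:
        # Walk clockwise from the key's position to the first node point,
        # wrapping around at the end of the ring.
        idx = bisect.bisect_right(self._positions, ring_position(key))
        return self._points[idx % len(self._points)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
bigger = HashRing(["node-a", "node-b", "node-c", "node-d"])

keys = [f"user:{i}" for i in range(1000)]
moved = [k for k in keys if ring.node_for(k) != bigger.node_for(k)]
```

When `node-d` joins, the only keys that change owner are the ones it takes over; everything else stays where it was, which is what makes adding and removing servers cheap.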

Page 29:

Data Replication

How many replicas of each key? Typically at least 3, so in case of conflicts there can be a quorum

Often, the distribution of keys is done taking into account the physical location of nodes, so replicas live in different racks or different datacentres
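
One possible way to pick replicas that live in different racks is sketched below in Python. The ring is simplified to a list of node names in ring order, and the function name `preference_list`, the rack labels and the two-pass strategy are assumptions made for the example.

```python
def preference_list(ring, racks, start, n=3):
    """Walk the ring clockwise from index `start` and pick n replicas,
    preferring nodes in racks that have not been used yet."""
    ordered = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    chosen, used_racks = [], set()
    # First pass: at most one node per rack.
    for node in ordered:
        if len(chosen) == n:
            break
        if racks[node] not in used_racks:
            chosen.append(node)
            used_racks.add(racks[node])
    # Second pass: if there are fewer racks than n, fill up with the next nodes.
    for node in ordered:
        if len(chosen) == n:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen


ring = ["a1", "a2", "b1", "a3", "c1", "b2"]  # node names in ring order
racks = {
    "a1": "rack-a", "a2": "rack-a", "a3": "rack-a",
    "b1": "rack-b", "b2": "rack-b",
    "c1": "rack-c",
}
replicas = preference_list(ring, racks, start=0)
two_racks = preference_list(["a1", "a2", "b1"], racks, start=0)
```

Here `replicas` skips `a2` and `a3` because `rack-a` is already covered by `a1`, so losing a whole rack still leaves two copies.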

Page 30:

Replication: durability

If we want a durable system, we need to make sure the data is replicated to at least 2 nodes before confirming the transaction to the client.

This number is called the write quorum (W), and in many cases it can be configured individually.

Not all data are equally important, and not all systems have the same R/W ratio.

Systems can be configured to be “always writable” or “always readable”.
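
The write quorum can be pictured with a toy example in Python. `Replica`, `quorum_write` and the in-memory "acknowledgement" are simplifications invented for this sketch; a real client would send the writes over the network in parallel and apply timeouts.

```python
class Replica:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

    def store(self, key, value):
        if not self.up:
            return False  # simulated failure: no acknowledgement
        self.data[key] = value
        return True


def quorum_write(replicas, key, value, w):
    """Send the write to every replica; report success if at least w acknowledge."""
    acks = sum(replica.store(key, value) for replica in replicas)
    return acks >= w


replicas = [Replica("n1"), Replica("n2"), Replica("n3", up=False)]
durable = quorum_write(replicas, "cart:42", ["book"], w=2)    # two acks are enough
strict = quorum_write(replicas, "cart:42", ["book"], w=3)     # needs all three
writable = quorum_write(replicas, "cart:42", ["book"], w=1)   # "always writable"
```

A low W keeps the system writable when nodes are down, at the cost of weaker durability; a high W does the opposite.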

Page 31:

Conflicts

I see a record that I thought was deleted

I created a record but cannot see it

I have different values in two nodes

Something should be unique, but it's not

Page 32:

No-Conflict strategies

Quorum-based systems: Paxos, Raft. Require coordination of processes, with continuous elections of leaders and consensus. Worse latency

Last Write Wins (LWW): Doesn't require coordination. Good latency
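
A minimal Last Write Wins sketch in Python, assuming each version carries the writer's wall-clock timestamp and a node id used only to break ties. The `Versioned` type and `lww_merge` function are hypothetical names for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # wall-clock time reported by the writer
    node_id: str      # tie-breaker when timestamps are equal


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep whichever version has the larger (timestamp, node_id) pair."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


first = Versioned("blue", 100.0, "n1")
second = Versioned("red", 100.5, "n2")
winner = lww_merge(first, second)
tie = lww_merge(Versioned("x", 1.0, "n1"), Versioned("y", 1.0, "n2"))
```

No coordination is needed because every node applies the same rule and reaches the same answer, but the losing write is silently discarded, and a writer whose clock runs fast wins even if its write really happened earlier.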

Page 33:

But, what does “Last” mean?

* Google Spanner uses atomic clocks and servers with GPS clocks to synchronize time

* Cassandra tries to sync clocks and divides updates into small parts to minimize conflicts

* Dynamo-like systems use vector clocks

Page 34:

Conflict resolution

Can be done at Write time or at Read time.

Page 35:

Conflict resolution

Can be done at Write time or at Read time.

As long as R + W > N it's possible to reach a quorum
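
The reason behind R + W > N is that, with N replicas, any set of R nodes that answers a read must then share at least one node with any set of W nodes that acknowledged a write. The Python sketch below checks this by brute force for small values; `always_overlap` is a name made up for the example.

```python
from itertools import combinations


def always_overlap(n, r, w):
    """True if every possible read set shares a node with every possible write set."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


typical = always_overlap(n=3, r=2, w=2)      # 2 + 2 > 3
fast = always_overlap(n=3, r=1, w=1)         # 1 + 1 <= 3
read_heavy = always_overlap(n=3, r=1, w=3)   # 1 + 3 > 3
borderline = always_overlap(n=5, r=2, w=3)   # 2 + 3 == 5, not enough
```

With N = 3, the settings R = 2, W = 2 and R = 1, W = 3 both guarantee that a read sees at least one up-to-date replica; R = 1, W = 1 does not.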

Page 36:

Vector clocks

* Don't need to sync time

* There are several versions of the same item

* Need consolidation to prune size

* Usually the client needs to fix the conflict and update
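
To make the bullets above concrete, here is a small vector clock sketch in Python. Clocks are plain dictionaries from node name to counter; the function names `increment`, `compare` and `merge` are chosen for the example and are not any library's API.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a, b):
    """Return 'before', 'after', 'equal' or 'concurrent' for clocks a and b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def merge(a, b):
    """Clock for a value that reconciles both versions (element-wise maximum)."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


v1 = increment({}, "n1")   # first write, handled by n1
v2 = increment(v1, "n2")   # an update of v1 handled by n2
v3 = increment(v1, "n3")   # another update of v1, handled by n3
resolved = merge(v2, v3)   # clock the client attaches after fixing the conflict
```

`v2` and `v3` are "concurrent": neither descends from the other, so the store keeps both versions until a client reconciles them and writes back a value carrying the merged clock.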

Page 37:

Alternatives to conflict resolution

* Conflict-free Replicated Data Types (CRDTs): counters, hashes, maps

* Allowing for strong consistency on keys from the same family

* The Uber solution with serialized tokens

* Some solutions are implementing immutability, so no conflicts

* Peter David Bailis's paper on Coordination Avoidance using Read Atomic Multi-Partition transactions (Nov/15)
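
As an example of the first bullet, the grow-only counter (G-Counter) is one of the simplest CRDTs. The Python sketch below is illustrative only; the class name and methods are not taken from any particular database.

```python
class GCounter:
    """Grow-only counter: each node increments only its own slot."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other):
        # Element-wise maximum: commutative, associative and idempotent,
        # so replicas converge regardless of the order of merges.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(2)   # two increments seen by node a
b.increment(3)   # three increments seen by node b, concurrently
a.merge(b)
b.merge(a)
b.merge(a)       # merging the same state again changes nothing
```

Both replicas end up with the same total without any coordination, because concurrent increments never overwrite each other.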

Page 38:

membership

gossip

infection-like protocols

Page 39:

Gossip

A centralised server is a SPOF

Communicating state to every node directly is very time-consuming and doesn't cope with partitions

In gossip protocols, random pairs of nodes communicate at regular, frequent intervals and exchange information.

Based on that information exchange, a new status is agreed upon

Page 40:

Gossip example
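
A toy simulation of gossip-based membership in Python. It assumes a push-pull exchange in which both peers keep the entry with the highest version number for each member; the state layout and the function names are invented for the example.

```python
import random


def gossip_round(states, rng):
    """Each node contacts one random peer; both keep the newest entry per member."""
    nodes = list(states)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = dict(states[node])
        for member, (version, status) in states[peer].items():
            if member not in merged or merged[member][0] < version:
                merged[member] = (version, status)
        states[node] = dict(merged)
        states[peer] = dict(merged)


def rounds_until_converged(num_nodes, seed=0, max_rounds=100):
    """Count gossip rounds until every node knows about every other node."""
    rng = random.Random(seed)
    # Every node starts knowing only about itself: {member: (version, status)}.
    states = {f"n{i}": {f"n{i}": (1, "up")} for i in range(num_nodes)}
    for round_number in range(1, max_rounds + 1):
        gossip_round(states, rng)
        if all(len(view) == num_nodes for view in states.values()):
            return round_number
    return None


rounds_for_16 = rounds_until_converged(16)
rounds_for_2 = rounds_until_converged(2)
```

Information spreads like an infection: the number of informed nodes roughly doubles each round, so the whole cluster learns about a change in a handful of rounds without any central server.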

Page 41:

Incremental scalability

When a new node enters the system, the rest of the nodes notice via gossip.

The node claims a partition of the ring and asks the replicas of the same partition to send data to it.

When the rest of the nodes decide (after gossiping) that a node has left the system and it's not a temporary failure, the data assigned to the partitions of that node is copied to more replicas to reach the N copies.

The whole process is automatic and transparent.

Page 42:

Operation under partition:

Hinted Handoff

On a network partition, it can happen that fewer than W of the nodes responsible for a segment are reachable in the current partition.

In this case, the data is still replicated to W nodes, even if some of those nodes aren't responsible for the segment. The data is kept with a “hint”, and stored in a special area.

Periodically, the server will try to contact the original destination and will “hand off” the data to it.
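
The mechanism can be sketched in a few lines of Python. Everything here (`Node`, `write`, `hand_off`, the choice of fallback node) is a simplification made up for illustration.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # keys this node is responsible for
        self.hints = {}   # intended owner name -> {key: value} held on its behalf


def write(owners, fallbacks, key, value):
    """Write to every reachable owner; park a hinted copy for each unreachable one."""
    spare = iter(n for n in fallbacks if n.up)
    for owner in owners:
        if owner.up:
            owner.data[key] = value
            continue
        stand_in = next(spare, None)
        if stand_in is None:
            raise RuntimeError("no reachable node left to hold the hint")
        stand_in.hints.setdefault(owner.name, {})[key] = value


def hand_off(holder, cluster):
    """Periodic task: deliver hints whose intended owner is reachable again."""
    for owner_name in list(holder.hints):
        owner = cluster[owner_name]
        if owner.up:
            owner.data.update(holder.hints.pop(owner_name))


n1, n2, n3, n4 = (Node(f"n{i}") for i in range(1, 5))
cluster = {n.name: n for n in (n1, n2, n3, n4)}

n3.up = False                                   # n3 is on the other side of the partition
write([n1, n2, n3], [n4], "cart:42", ["book"])  # n4 keeps the copy meant for n3
parked = dict(n4.hints)

n3.up = True                                    # partition heals
hand_off(n4, cluster)
```

The hinted copy lives in `hints`, separate from the node's own `data`, so `n4` never serves the key as if it were its own.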

Page 43:

Operation under partition:

Hinted Handoff

Page 44:

Anti Entropy

A system with handoffs can be chaotic and not very effective

Anti Entropy is implemented to make sure hints are handed off or synchronized to other nodes

Anti entropy is usually achieved by using Merkle Trees, a hash-of-hashes structure that makes it very efficient to find the differences between nodes
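
A small Merkle tree comparison in Python, as a sketch. It assumes each node splits its keys into the same power-of-two number of buckets and uses SHA-256; `build_tree` and `differing_buckets` are names invented for the example.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(buckets):
    """Return the tree as a list of levels, leaves first and root last.
    `buckets` is a list of dicts whose length is a power of two."""
    level = [h(repr(sorted(bucket.items())).encode()) for bucket in buckets]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_buckets(tree_a, tree_b):
    """Descend from the root, following only branches whose hashes differ."""
    suspects = [0]  # indices at the root level
    for depth in range(len(tree_a) - 1, 0, -1):
        suspects = [
            child
            for idx in suspects
            if tree_a[depth][idx] != tree_b[depth][idx]
            for child in (2 * idx, 2 * idx + 1)
        ]
    return [i for i in suspects if tree_a[0][i] != tree_b[0][i]]


node_a = [{"k1": "v1"}, {"k2": "v2"}, {"k3": "v3"}, {"k4": "v4"}]
node_b = [{"k1": "v1"}, {"k2": "v2"}, {"k3": "stale"}, {"k4": "v4"}]
out_of_sync = differing_buckets(build_tree(node_a), build_tree(node_b))
in_sync = differing_buckets(build_tree(node_a), build_tree(node_a))
```

Two nodes only need to exchange the root hash to learn that they agree; when they do not, each level of the tree halves the range that has to be inspected, so only the buckets that actually differ are transferred.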

Page 45:

All these features mean your clients need to be aware of some internals of the system

Page 46:

Clients must

* Know which close nodes are responsible for each segment of the ring, and hash locally**

* Be aware of when nodes become available or unavailable**

* Decide on durability

* Handle conflict resolution, unless under LWW

** some solutions offer a load balancer proxy to abstract the client from that complexity, but trading off latency

Page 47:

Now you know how it works

* A system that can always work, even with network partitions

* That scales out both reads and writes

* On cheap commodity diverse hardware

* Running locally to your users (low latency)

* Can grow/shrink elastically and survive server failures

Page 48:

Extra level: Build your own distributed database

Netflix Dynomite, built in C

Uber Ringpop, built in JavaScript

Page 49:

Not Scared Of You Anymore

Page 50:

Dank je

Javier Ramírez
@supercoco9

All pictures belong to their respective authors

AMSTERDAM 9-12 MAY 2016

Find related links at

http://bit.ly/teowaki-distributed-systems (https://teams.teowaki.com/teams/javier-community/link-categories/distributed-systems)

need help with cloud, distributed systems or big data?

https://teowaki.com