Consensus in distributed computing

Feb 22, 2017

Ruben Tan
Transcript
Page 1: Consensus in distributed computing

CONSENSUS IN DISTRIBUTED COMPUTING

LET’S TALK ABOUT…

Page 2: Consensus in distributed computing

RUBEN TAN LONG ZHENG

▸ CTO of Neuroware, Inc

▸ We Do Blockchain Stuff™

▸ Co-founder of Javascript Developers Malaysia

▸ Proud owner of 2 useless cats

▸ @roguejs

Page 3: Consensus in distributed computing

SUPER HIGH-LEVEL OVERVIEW

▸ Consensus in Distributed Computing

▸ Consensus

▸ Agreeing that something is the truth

▸ Distributed Computing

▸ A network of nodes operating together

Page 4: Consensus in distributed computing
Page 5: Consensus in distributed computing

FAILURE MODES

▸ Fail-stop = a node dies

▸ Fail-recover = a node dies and comes back later (Jesus/Zombie)

▸ Byzantine = a node misbehaves (e.g. sends wrong or conflicting messages)

Page 6: Consensus in distributed computing

BYZANTINE GENERALS’ PROBLEM

▸ One of the first impossibility proofs in computer communications

▸ Impossible to solve in a perfect manner

▸ Originated from the Two Generals’ Problem (1975)

▸ Explored in detail in the paper by Leslie Lamport, Robert Shostak and Marshall Pease: The Byzantine Generals Problem (1982)

Page 7: Consensus in distributed computing

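[Diagram: generals A to F surround the ENEMY, and one of them is a TRAITOR. The loyal generals’ orders are split between ATTACK! and RETREAT!, while the traitor sends ATTACK! to some generals and RETREAT! to others.]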

Page 8: Consensus in distributed computing

[Diagram: the TRAITOR gloats: MUAHAHA, NO CONSENSUS! The generals who attack have insufficient force and are destroyed, and the enemy routs the fleeing army.]

Page 9: Consensus in distributed computing

BYZANTINE FAULT TOLERANCE

▸ Byzantine Fault

▸ Any fault that presents different symptoms to different observers (some generals attack, some retreat)

▸ Byzantine Failure

▸ The loss of a system service reliant on consensus due to a Byzantine fault

▸ Byzantine Fault Tolerance

▸ The property of a system that is resilient to (tolerant of) Byzantine faults
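The generals’ story can be replayed in a few lines to see a Byzantine fault in action. The sketch below is a minimal Python illustration, assuming four loyal generals whose own votes split 2-2 plus one extra general who either behaves or turns traitor. The majority-vote `decide` rule, its RETREAT tie-break and the `simulate` helper are hypothetical choices for illustration, not part of the original problem.

```python
from collections import Counter

ATTACK, RETREAT = "ATTACK", "RETREAT"


def decide(received):
    """Majority vote over received orders; a tie falls back to RETREAT (assumed rule)."""
    counts = Counter(received)
    return ATTACK if counts[ATTACK] > counts[RETREAT] else RETREAT


def simulate(loyal_votes, fifth_general_plan):
    """loyal_votes: general -> the order it honestly broadcasts to everyone.
    fifth_general_plan: general -> the order the fifth general sends to that general."""
    decisions = {}
    for general in loyal_votes:
        # Every loyal general hears all loyal votes (including its own)...
        received = list(loyal_votes.values())
        # ...plus whatever the fifth general chose to tell this particular general.
        received.append(fifth_general_plan[general])
        decisions[general] = decide(received)
    return decisions


loyal = {"A": ATTACK, "B": ATTACK, "C": RETREAT, "D": RETREAT}

# Honest fifth general: same order to everyone, so all loyal generals agree.
print(simulate(loyal, {g: ATTACK for g in loyal}))
# {'A': 'ATTACK', 'B': 'ATTACK', 'C': 'ATTACK', 'D': 'ATTACK'}

# Traitor: tells A and B to attack but C and D to retreat, so the loyal generals split.
print(simulate(loyal, {"A": ATTACK, "B": ATTACK, "C": RETREAT, "D": RETREAT}))
# {'A': 'ATTACK', 'B': 'ATTACK', 'C': 'RETREAT', 'D': 'RETREAT'}
```

The only thing that changes between the two runs is what the fifth general tells whom, which is exactly a fault that presents different symptoms to different observers.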

Page 10: Consensus in distributed computing

ON A SIDE NOTE…

▸ Distributed computing is inherently unreliable

▸ Peter Deutsch, Bill Joy, Tom Lyon and James Gosling

▸ The Eight Fallacies of Distributed Computing (1994-1997)

▸ Today, we still have engineers who believe in some, if not all, of the fallacies

Page 11: Consensus in distributed computing

EIGHT FALLACIES OF DISTRIBUTED COMPUTING

▸ The network is reliable

▸ Latency is zero

▸ Bandwidth is infinite

▸ The network is secure

▸ Topology does not change

▸ There is only one administrator

▸ Transport cost is zero

▸ The network is homogeneous (same platform)

Page 12: Consensus in distributed computing

When you believe in any of the eight fallacies…

Page 13: Consensus in distributed computing

CONSENSUS

The Real Talk Begins™

Page 14: Consensus in distributed computing

CONSENSUS OVERVIEW

▸ Achieving consensus = a distributed system acting as one entity

▸ Consensus problem = getting nodes in a distributed system to agree on something (a value, an operation, etc.)

▸ Basically… consensus = THE HIVE MIND

▸ Common Examples

▸ Commit transactions to a database

▸ Synchronising clocks

Page 15: Consensus in distributed computing

FLP IMPOSSIBILITY PROOF

▸ Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson

▸ Impossibility of Distributed Consensus with One Faulty Process (1985) - Dijkstra (dike-stra) Prize (2001)

▸ In synchronous settings, it is possible to reach consensus at the cost of time

▸ In an asynchronous setting, no deterministic algorithm can guarantee consensus if even 1 node may crash

Page 16: Consensus in distributed computing
Page 17: Consensus in distributed computing

SOLVING THE CONSENSUS PROBLEM

▸ Strong consensus satisfies these properties:

▸ Termination - all non-faulty nodes eventually decide on a value

▸ Agreement - all nodes decide on the same value

▸ Validity - the value decided must have been proposed by a node (i.e. no default value to fall back on)

▸ Termination + Agreement + Validity = Consensus
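A finished run of any protocol can be checked against these three properties mechanically. The sketch below is a minimal Python checker, assuming a run is summarised as two dictionaries (what each node proposed, what each node decided) plus a set of crashed nodes that are excused from Termination. The function name and data layout are hypothetical.

```python
def check_consensus(proposals, decisions, crashed=frozenset()):
    """Check one finished run against Termination, Agreement and Validity.

    proposals: node -> value it proposed
    decisions: node -> value it decided (missing or None = never decided)
    crashed:   nodes that failed, so they are excused from Termination
    """
    correct = [n for n in proposals if n not in crashed]
    # Termination: every non-faulty node eventually decided something.
    termination = all(decisions.get(n) is not None for n in correct)
    decided_values = {v for v in decisions.values() if v is not None}
    # Agreement: nobody decided differently from anybody else.
    agreement = len(decided_values) <= 1
    # Validity: every decided value was actually proposed by some node.
    validity = decided_values <= set(proposals.values())
    return {"termination": termination, "agreement": agreement, "validity": validity}


proposals = {"n1": 7, "n2": 5, "n3": 7}
print(check_consensus(proposals, {"n1": 7, "n2": 7, "n3": 7}))
# {'termination': True, 'agreement': True, 'validity': True}
print(check_consensus(proposals, {"n1": 7, "n2": 5, "n3": 7}))
# {'termination': True, 'agreement': False, 'validity': True}
print(check_consensus(proposals, {"n1": 0, "n2": 0}))
# {'termination': False, 'agreement': True, 'validity': False}
```

The last run fails twice: n3 never decides, and 0 is a "default value" that nobody proposed.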

Page 18: Consensus in distributed computing

CONSENSUS PROTOCOLS

▸ 2 Phase Commit

▸ 3 Phase Commit

▸ Basic Paxos

▸ The Future…

Page 19: Consensus in distributed computing

2 PHASE COMMIT

▸ The simplest consensus protocol

▸ Phase 1 - Proposal

▸ A node (called the coordinator) proposes a value to all other nodes, then gathers votes

▸ Phase 2 - Commit-or-abort

▸ The coordinator sends:

▸ Commit if all nodes voted yes. All nodes commit the new value

▸ Abort if 1 or more nodes voted no. All nodes abort the value
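The two phases map onto very little code. This is a minimal, single-process Python sketch, assuming no crashes, no timeouts and no real networking. The `Node` class, its `will_vote_yes` flag (a stand-in for whatever local check a real node runs) and the function names are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class Node:
    """A participant; will_vote_yes stands in for a real node's local checks."""

    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.pending = None  # value voted on, waiting for phase 2
        self.value = None    # last committed value

    def vote(self, value):
        # Phase 1: remember the proposal and vote on it.
        self.pending = value
        return self.will_vote_yes

    def finish(self, decision):
        # Phase 2: apply the coordinator's decision, then clear the pending value.
        if decision is Decision.COMMIT:
            self.value = self.pending
        self.pending = None


def two_phase_commit(value, nodes):
    """Coordinator logic: commit only if every node voted yes."""
    votes = [node.vote(value) for node in nodes]      # Phase 1 - Proposal
    decision = Decision.COMMIT if all(votes) else Decision.ABORT
    for node in nodes:                                 # Phase 2 - Commit-or-abort
        node.finish(decision)
    return decision


nodes = [Node(n) for n in ("n1", "n2", "n3", "n4")]
print(two_phase_commit(7, nodes), [n.value for n in nodes])
# Decision.COMMIT [7, 7, 7, 7]

nodes[2].will_vote_yes = False
print(two_phase_commit(5, nodes), [n.value for n in nodes])
# Decision.ABORT [7, 7, 7, 7]
```

A single "no" vote is enough to abort, and the aborted value 5 never replaces the committed 7 on any node.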

Page 20: Consensus in distributed computing

[Diagram: the coordinator (COOR.) and four nodes]

Coordinator proposes a value

Page 21: Consensus in distributed computing

[Diagram: the coordinator (COOR.) and four nodes]

All nodes vote yes or no

Page 22: Consensus in distributed computing

[Diagram: the coordinator (COOR.) and four nodes]

Coordinator sends commit if all nodes voted yes, and abort otherwise. All nodes now update themselves to contain the proposed value, or all nodes abort.

Page 23: Consensus in distributed computing

2 PHASE COMMIT

▸ Agreement - every node accepts the value from the coordinator at phase 2 = YES

▸ Validity - commit/abort originated from the coordinator = YES

▸ Termination - no loops in the steps, so it doesn’t run forever (as long as nothing fails) = YES

▸ Therefore, 2 phase commit fulfils the requirements of a consensus protocol

Page 24: Consensus in distributed computing

2 PHASE COMMIT

▸ Blocking failure when the coordinator fails before sending the proposal to all nodes

[Diagram: the coordinator (COOR.) and three nodes]

Coordinator proposes a value

Page 25: Consensus in distributed computing

[Diagram: the coordinator (COOR.) and three nodes]

A node receives the proposed value, votes yes, and is now waiting for commit

Page 26: Consensus in distributed computing

[Diagram: the coordinator (COOR.), three nodes and a new coordinator (NEW COOR.)]

Coordinator crashes… and a different coordinator comes in to propose a different value

Page 27: Consensus in distributed computing

[Diagram: the coordinator (COOR.), three nodes and the new coordinator (NEW COOR.)]

Node cannot accept the new proposal because it is waiting on commit. It cannot abort because the first coordinator might recover.
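The stuck node can be modelled as a tiny state machine. This is a minimal Python sketch, assuming a participant that simply refuses new proposals while it holds an undecided YES vote; the class, its method names and the "YES"/"BLOCKED" return strings are hypothetical.

```python
class Participant:
    """Hypothetical 2PC participant that must honour a YES vote it already cast."""

    def __init__(self):
        self.pending = None  # (coordinator, value) we voted YES on, awaiting commit/abort

    def on_proposal(self, coordinator, value):
        if self.pending is not None:
            # We promised the earlier coordinator to commit if told to. Voting on a new
            # value could contradict that commit, and aborting on our own could
            # contradict a commit the first coordinator may already have sent elsewhere.
            return "BLOCKED"
        self.pending = (coordinator, value)
        return "YES"

    def on_decision(self, coordinator, decision):
        # Only the coordinator we voted for can release us.
        if self.pending is not None and self.pending[0] == coordinator:
            self.pending = None
            return decision
        return None


node = Participant()
print(node.on_proposal("COOR", 7))         # YES
# ... COOR crashes before sending commit/abort ...
print(node.on_proposal("NEW COOR", 5))     # BLOCKED
print(node.on_decision("COOR", "commit"))  # commit (only once COOR recovers)
print(node.on_proposal("NEW COOR", 5))     # YES
```

Nothing NEW COOR can send unblocks the node; only the original coordinator's recovery does, which is why 2PC loses liveness here.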

Page 28: Consensus in distributed computing

2 PHASE COMMIT

▸ Guarantees safety, but not liveness

▸ Safety = all nodes agree on a value proposed by a node

▸ Liveness = the system can still make progress when some nodes crash

Page 29: Consensus in distributed computing

3 PHASE COMMIT

▸ Similar to 2 Phase Commit, with an extra phase (duh)

▸ Phase 1 - Proposal - same as 2PC

▸ Phase 2 - Pre-approve - similar to 2PC commit-or-abort, but nodes only reply with an ACK instead of committing

▸ Phase 3 - Do Commit - now the nodes commit

▸ Tolerant of node crashes, but not network partitions

▸ Won’t cover in detail

Page 30: Consensus in distributed computing

PAXOS

▸ Presented by Leslie Lamport in The Part-Time Parliament (published 1998)

▸ Named after the fictional part-time parliament of the Greek island of Paxos

▸ Remains:

▸ The hardest to understand in theory

▸ The hardest to implement

▸ The closest we get to reaching ideal consensus

Page 31: Consensus in distributed computing

PAXOS

▸ Used in:

▸ Apache ZooKeeper (via Zab, a Paxos-like protocol)

▸ Google Chubby (BigTable)

▸ Google Spanner

▸ Apache Mesos

▸ Apache Cassandra

▸ etc

Page 32: Consensus in distributed computing

PAXOS

▸ Components:

▸ Proposers

▸ Propose values to other nodes

▸ Acceptors

▸ Respond to proposers with votes

▸ Commit the chosen value & decision state

▸ A single server can act as both a Proposer and an Acceptor

Page 33: Consensus in distributed computing

PAXOS

▸ Uses a two-phase approach:

▸ Broadcast Prepare

▸ Find out if there’s already a chosen value

▸ Block older proposals that have yet to be completed

▸ Broadcast Accept

▸ Ask acceptors to accept a value

Page 34: Consensus in distributed computing

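PAXOS

▸ Prepare(n)

▸ n = proposal number [max++]~[server id], i.e. one higher than the largest number seen so far, combined with the server id so that no two servers pick the same number

▸ Return(p, v)

▸ p = proposal number of the value accepted so far (acc-p, if any)

▸ v = current accepted value (acc-v, if any)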

▸ Accept(p, v)

▸ p = proposal number

▸ v = value to be accepted
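One way to represent these messages, as a minimal Python sketch: it assumes the [max++]~[server id] scheme is encoded as a (round, server id) tuple, which keeps proposal numbers unique and totally ordered. The class and function names are hypothetical; `Return` is the slide's name for the reply that the Paxos literature usually calls a promise.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

# A proposal number is (round, server_id). Tuples compare element by element, so a
# higher round always wins and the server id only breaks ties between servers.
ProposalNumber = Tuple[int, int]


def next_proposal_number(max_round_seen: int, server_id: int) -> ProposalNumber:
    """[max++]~[server id]: one round past anything seen, tagged with our id."""
    return (max_round_seen + 1, server_id)


@dataclass(frozen=True)
class Prepare:
    n: ProposalNumber


@dataclass(frozen=True)
class Return:
    """An acceptor's reply to Prepare: its acc-p and acc-v, or None if nothing accepted."""
    p: Optional[ProposalNumber]
    v: Optional[Any]


@dataclass(frozen=True)
class Accept:
    p: ProposalNumber
    v: Any


a = next_proposal_number(0, server_id=1)  # (1, 1)
b = next_proposal_number(0, server_id=2)  # (1, 2)
print(a < b, next_proposal_number(1, server_id=1) > b)
# True True
```

Two servers starting from the same max never collide, and any server that has seen a higher round will outbid both.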

Page 35: Consensus in distributed computing

PAXOS

▸ Proposal Phase

▸ Proposer generates a proposal number p

▸ Proposer broadcasts p and a value v

▸ Acceptor checks whether p is higher than its min-p, and updates min-p if so

▸ Acceptor replies with its acc-p and acc-v (if any)

▸ Proposer waits for replies from a majority

▸ If any reply carries an acc-p, the proposer replaces v with the acc-v of the highest acc-p
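The proposer's side of this phase boils down to one decision: which value to carry into the accept phase. A minimal Python sketch, assuming replies arrive as (acc-p, acc-v) pairs with None meaning "nothing accepted", and plain integers as proposal numbers; `choose_value` is a hypothetical name.

```python
from typing import Any, List, Optional, Tuple

# One acceptor's reply from the proposal phase: (acc-p, acc-v), None meaning "nothing accepted".
Reply = Tuple[Optional[int], Optional[Any]]


def choose_value(own_value: Any, replies: List[Reply], n_acceptors: int) -> Optional[Any]:
    """Pick the value to carry into the accept phase, or None while there is no majority."""
    if len(replies) < n_acceptors // 2 + 1:
        return None  # keep waiting, or restart with a higher p
    accepted = [(p, v) for p, v in replies if p is not None]
    if not accepted:
        return own_value  # no acceptor in this majority has accepted anything yet
    # Replace our value with the acc-v that has the highest acc-p.
    return max(accepted, key=lambda pv: pv[0])[1]


print(choose_value(7, [(None, None), (None, None)], n_acceptors=3))  # 7 (single-proposer walkthrough)
print(choose_value(5, [(1, 7), (None, None)], n_acceptors=3))        # 7 (multi-proposer walkthrough)
print(choose_value(5, [(1, 7)], n_acceptors=3))                      # None: 1 of 3 is not a majority
```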

Page 36: Consensus in distributed computing

PAXOS

▸ Accept Phase

▸ Proposer sends p and v to all acceptors

▸ Each acceptor checks whether p is lower than its min-p, and ignores the request if so. Otherwise, it sets acc-p = min-p = p and acc-v = v

▸ Acceptors reply accepted or rejected

▸ If a majority accepted, terminate with v. Otherwise, restart the Proposal Phase with a new p
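Putting the two phases together gives a minimal single-decree Paxos sketch in Python. It assumes all acceptors live in one process and every message is delivered immediately, and it uses plain integers for p rather than (round, server id) numbers. `Acceptor`, `propose` and the field names are hypothetical, chosen to mirror min-p, acc-p and acc-v.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Acceptor:
    min_p: int = 0                # highest proposal number seen in the proposal phase
    acc_p: Optional[int] = None   # proposal number of the accepted value
    acc_v: Optional[Any] = None   # accepted value

    def prepare(self, p: int):
        """Proposal phase: raise min-p if p is higher, then report acc-p/acc-v."""
        if p > self.min_p:
            self.min_p = p
            return (self.acc_p, self.acc_v)
        return None  # stale p: no reply

    def accept(self, p: int, v: Any) -> bool:
        """Accept phase: ignore p lower than min-p, otherwise accept it."""
        if p < self.min_p:
            return False
        self.acc_p = self.min_p = p
        self.acc_v = v
        return True


def propose(p: int, v: Any, acceptors: List[Acceptor]) -> Optional[Any]:
    """Run one proposal; return the decided value, or None if p must be retried higher."""
    majority = len(acceptors) // 2 + 1
    # Proposal phase: collect replies from acceptors that raised their min-p.
    replies = [r for r in (a.prepare(p) for a in acceptors) if r is not None]
    if len(replies) < majority:
        return None
    accepted = [(acc_p, acc_v) for acc_p, acc_v in replies if acc_p is not None]
    if accepted:
        v = max(accepted, key=lambda pv: pv[0])[1]  # adopt the highest acc-p's value
    # Accept phase: count acceptors that accepted (p, v).
    oks = sum(a.accept(p, v) for a in acceptors)
    return v if oks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
print(propose(1, 7, acceptors))  # 7
print(acceptors[0])              # Acceptor(min_p=1, acc_p=1, acc_v=7)
print(propose(1, 9, acceptors))  # None: p1 is not higher than any min-p, so no replies
```

A real deployment also needs retries with higher p, persistent acceptor state and actual networking; the sketch only captures the decision rules from the two phases.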

Page 37: Consensus in distributed computing

[Diagram: proposer P holds value 7; acceptors A1, A2 and A3 each start with MIN-P 0, ACC-P -, ACC-V -]

v7 is proposed with p1

Page 38: Consensus in distributed computing

[Diagram: A1 has received (p1, 7) and now has MIN-P 1; A2 and A3 are still at MIN-P 0. No acceptor has an ACC-P or ACC-V yet.]

v7 is proposed with p1

Page 39: Consensus in distributed computing

[Diagram: A1 and A2 have received (p1, 7) and now have MIN-P 1; A3 is still at MIN-P 0]

v7 is proposed with p1

Page 40: Consensus in distributed computing

[Diagram: all three acceptors have received (p1, 7) and now have MIN-P 1, ACC-P -, ACC-V -]

v7 is proposed with p1

Page 41: Consensus in distributed computing

[Diagram: all acceptors at MIN-P 1; one acceptor has replied to P with ACC-P -, ACC-V -]

v7 is proposed with p1

Page 42: Consensus in distributed computing

[Diagram: all acceptors at MIN-P 1; two acceptors have replied to P with ACC-P -, ACC-V -]

v7 is proposed with p1

Page 43: Consensus in distributed computing

[Diagram: all acceptors at MIN-P 1; P holds two replies with ACC-P -, ACC-V -]

Has majority! Since acc-p and acc-v are null in every reply, no acceptor in this majority has accepted a value yet, so we are free to propose our own value (7)

Page 44: Consensus in distributed computing

[Diagram: P sends (p1, 7) to the acceptors, which are all at MIN-P 1, ACC-P -, ACC-V -]

Now, we send out p and v in the accept phase

Page 45: Consensus in distributed computing

[Diagram: all three acceptors now have MIN-P 1, ACC-P 1, ACC-V 7]

Acceptors update acc-p and acc-v

Page 46: Consensus in distributed computing

[Diagram: all acceptors at MIN-P 1, ACC-P 1, ACC-V 7; one acceptor replies to P]

Accept!

Page 47: Consensus in distributed computing

[Diagram: all acceptors at MIN-P 1, ACC-P 1, ACC-V 7; two acceptors have replied to P]

Accept! Accept!

Page 48: Consensus in distributed computing

[Diagram: all acceptors at MIN-P 1, ACC-P 1, ACC-V 7; two acceptors have replied Accept! to P]

Oh look, we have a majority! v7 is the terminated value then!

Page 49: Consensus in distributed computing

[Diagram: all acceptors at MIN-P 1, ACC-P 1, ACC-V 7; the third acceptor’s reply arrives after the majority has already been reached]

Third acceptor: Accept? :(

P: Shuddup, nobody loves you

Page 50: Consensus in distributed computing

PAXOS - MULTI PROPOSERS

▸ What if there were multiple proposers?

▸ Brace yourself, It’s Complicated™ (not really)

Page 51: Consensus in distributed computing

[Diagram: two proposers, P1 (holding 7) and P2. P1’s (p1, 7) has reached A1, A2 and A3, which are all at MIN-P 1, ACC-P -, ACC-V -]

Page 52: Consensus in distributed computing

[Diagram: proposers P1 and P2; all acceptors at MIN-P 1; two acceptors have replied to P1 with ACC-P -, ACC-V -]

Page 53: Consensus in distributed computing

[Diagram: P1’s round has completed, so all acceptors hold MIN-P 1, ACC-P 1, ACC-V 7. P2 starts a proposal with value 5.]

v5 is proposed with p2

Page 54: Consensus in distributed computing

[Diagram: the acceptors still hold ACC-P 1, ACC-V 7 from P1’s round; one acceptor replies to P2 with ACC-P 1, ACC-V 7]

Page 55: Consensus in distributed computing

[Diagram: having seen ACC-P 1, ACC-V 7 in the replies, P2 now holds 7 instead of 5]

The value of p2 is changed to 7

Page 56: Consensus in distributed computing

[Diagram: P1 and P2 both hold 7; the acceptors hold ACC-P 1, ACC-V 7]

Broadcast accept phase with p2 and v7

Page 57: Consensus in distributed computing

[Diagram: after P2’s accept phase the acceptors hold MIN-P 2, ACC-P 2, ACC-V 7, and two acceptors reply to P2 with (p2, 7)]

Both proposers succeed, and both end up with v7! No blocking here.
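The multi-proposer walkthrough can be replayed with the same acceptor rules. This is a self-contained Python sketch, assuming in-process acceptors, instant delivery and integer proposal numbers (1 for p1, 2 for p2). `Acceptor` and `run_proposer` are hypothetical names, and unlike the slides every acceptor is shown replying.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Acceptor:
    # Same rules as the Proposal Phase and Accept Phase slides.
    min_p: int = 0
    acc_p: Optional[int] = None
    acc_v: Optional[Any] = None

    def prepare(self, p):
        if p > self.min_p:
            self.min_p = p
            return (self.acc_p, self.acc_v)
        return None

    def accept(self, p, v):
        if p < self.min_p:
            return False
        self.acc_p = self.min_p = p
        self.acc_v = v
        return True


def run_proposer(name, p, v, acceptors):
    """One full round for one proposer, printing what happens along the way."""
    majority = len(acceptors) // 2 + 1
    replies = [r for r in (a.prepare(p) for a in acceptors) if r is not None]
    if len(replies) < majority:
        print(f"{name}: no majority for p{p}")
        return None
    accepted = [r for r in replies if r[0] is not None]
    if accepted:
        new_v = max(accepted, key=lambda r: r[0])[1]
        if new_v != v:
            print(f"{name}: value changed from {v} to {new_v}")
        v = new_v
    oks = sum(a.accept(p, v) for a in acceptors)
    print(f"{name}: p{p} accepted by {oks}/{len(acceptors)} -> v{v}")
    return v if oks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
run_proposer("P1", 1, 7, acceptors)  # P1: p1 accepted by 3/3 -> v7
run_proposer("P2", 2, 5, acceptors)  # P2: value changed from 5 to 7
                                     # P2: p2 accepted by 3/3 -> v7
print(acceptors[0])                  # Acceptor(min_p=2, acc_p=2, acc_v=7)
```

If P2's proposal had reached the acceptors before P1's accept phase, the acceptors would reject P1's (p1, 7) because p1 is now below their min-p, and P1 would have to restart with a higher p.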

Page 58: Consensus in distributed computing

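BASIC PAXOS

▸ This is BASIC Paxos: 2PC with a twist (a majority quorum)

▸ It has vulnerabilities! (e.g. duelling proposers can keep pre-empting each other with ever-higher proposal numbers, so progress is not guaranteed)

▸ Best of 2PC (safety), with much better liveness: it keeps working as long as a majority of acceptors is up

▸ Most consensus algorithms are variants of Paxos

▸ Forms the basis of much distributed computing research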
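The quorum twist works because any two majorities of the same group overlap in at least one acceptor, which is how a later proposer learns about a value that an earlier majority accepted. A small Python check of that overlap, written as an illustrative sketch; the function names are hypothetical.

```python
from itertools import combinations


def majority(n):
    """Smallest number of nodes that is more than half of n."""
    return n // 2 + 1


def all_quorums_intersect(nodes):
    """Check that every pair of majority-sized quorums shares at least one node."""
    quorums = [set(q) for q in combinations(nodes, majority(len(nodes)))]
    return len(quorums), all(a & b for a, b in combinations(quorums, 2))


print(all_quorums_intersect(["A1", "A2", "A3"]))              # (3, True)
print(all_quorums_intersect(["A1", "A2", "A3", "A4", "A5"]))  # (10, True)
```

With 5 acceptors, any 3 form a quorum, so Paxos keeps deciding while up to 2 acceptors are down, whereas 2PC needs every node to answer.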

Page 59: Consensus in distributed computing

CLOSING…

▸ Basic Paxos is not Byzantine Fault Tolerant

▸ It is a challenge to create a consensus protocol (termination, agreement, validity) that is Byzantine Fault Tolerant

▸ Nakamoto Consensus (aka Bitcoin consensus) skirts around Byzantine problems by imposing proof-of-work

▸ Raft is a Paxos-like consensus algorithm designed to be easier to understand, used in etcd and Consul

Page 60: Consensus in distributed computing

PAXOS - BEST GEEKY PICKUP LINE NEVER

Ruben Tan
