Papers We Love San Francisco Edition, July 24th, 2014. Henry Robinson (@henryr)

Impossibility of Consensus with One Faulty Process - Papers We Love SF

Sep 08, 2014


Slides from the Papers We Love SF July meetup on Impossibility of Distributed Consensus with One Faulty Process (http://macs.citadel.edu/rudolphg/csci604/ImpossibilityofConsensus.pdf)
Transcript
Page 1

Papers We Love San Francisco Edition

July 24th, 2014

Henry Robinson (@henryr)

Page 2

• Software engineer at Cloudera since 2009

• My interests are in databases and distributed systems

• I write about them - in particular, about papers in those areas - at http://the-paper-trail.org

Page 4

Papers of which we are quite fond

Page 5

• Impossibility of Distributed Consensus with One Faulty Process, by Fischer, Lynch and Paterson (1985)

• Dijkstra award winner 2001

Page 6

• Walk through the proof (leaving rigour for the paper itself)

• Show how this gives rise to a framework for thinking about distributed systems

Page 7

Consensus

or: agreeing to agree

Page 8

• Consensus is the problem of having a set of processes agree on a value proposed by one of those processes

Page 9

• Validity: the value agreed upon must have been proposed by some process

• Termination: at least one non-faulty process eventually decides

• Agreement: all deciding processes agree on the same value

Page 10

• Validity: the value agreed upon must have been proposed by some process - safety

• Termination: at least one non-faulty process eventually decides - liveness

• Agreement: all deciding processes agree on the same value - safety
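To make the safety/liveness split concrete, here is a minimal Python sketch (not from the paper) that checks the three properties against the final state of a single finished run. The `Outcome` record and `check_consensus` function are hypothetical names. Note the asymmetry: a violation of validity or agreement is visible in a finite run, whereas a finite run can only show that termination has happened so far, never that it will not happen eventually.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    """Final state of one process in a finished run (hypothetical record)."""
    proposal: int
    decision: Optional[int]  # None if the process never decided
    faulty: bool = False


def check_consensus(outcomes: list[Outcome]) -> dict[str, bool]:
    """Check the three consensus properties against one finished run."""
    proposals = {o.proposal for o in outcomes}
    decisions = [o.decision for o in outcomes if o.decision is not None]
    return {
        # Validity (safety): every decided value was proposed by some process.
        "validity": all(d in proposals for d in decisions),
        # Agreement (safety): all deciding processes chose the same value.
        "agreement": len(set(decisions)) <= 1,
        # Termination (liveness): some non-faulty process has decided.
        # False here only means "not yet": a finite run cannot refute liveness.
        "termination": any(o.decision is not None
                           for o in outcomes if not o.faulty),
    }


run = [Outcome(0, 0), Outcome(1, 0), Outcome(1, None, faulty=True)]
print(check_consensus(run))
# {'validity': True, 'agreement': True, 'termination': True}
```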

Page 11

Transactional Commit

Should I commit this transaction?

[Magic consensus protocol]

YES! / No :(

Page 12

Replicated State Machines

[Figure: a client and three nodes (Node 1, Node 2, Node 3); each node holds a log with entries N-3, N-2, N-1, and the new entry N = S]

1: Client proposes "state N should be S"

2: Magic consensus protocol

3: New state written to log
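Why consensus is enough for state machine replication can be sketched in a few lines of Python, assuming (hypothetically) that consensus has already fixed the command in every log slot and that commands are simple `(key, value)` assignments. Because applying a command is deterministic, replicas that replay the same log reach the same state; replicas that apply the same commands in different orders may not. The consensus step itself is assumed here, not implemented.

```python
# Hypothetical command format: (key, value) means "set key to value".
Command = tuple[str, int]


def apply(state: dict[str, int], command: Command) -> dict[str, int]:
    """Deterministic state transition: the same input always yields the same output."""
    key, value = command
    new_state = dict(state)
    new_state[key] = value
    return new_state


def replay(log: list[Command]) -> dict[str, int]:
    """Rebuild a replica's state by applying its log entries in slot order."""
    state: dict[str, int] = {}
    for command in log:
        state = apply(state, command)
    return state


# Assume consensus has already decided the command for every slot,
# so every replica holds an identical copy of the log.
agreed_log: list[Command] = [("x", 1), ("y", 2), ("x", 3)]
replicas = [replay(list(agreed_log)) for _ in range(3)]
assert all(r == replicas[0] for r in replicas)
print(replicas[0])  # {'x': 3, 'y': 2}

# Without agreement on the order, replicas diverge.
print(replay([("x", 1), ("x", 3)]), replay([("x", 3), ("x", 1)]))  # {'x': 3} {'x': 1}
```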

Page 13

Strong Leader Election

1: Who’s the leader?

Page 14

Strong Leader Election

A cast of millions

1: Who’s the leader?

2: Magic consensus protocol

Page 15

Strong Leader Election

A cast of millions

1: Who’s the leader?

2: Magic consensus protocol

3: There can only be one

Page 16

What does FLP actually say?

Page 17

Fischer

Page 18

Fischer Lynch

Page 19

Fischer Lynch Paterson

Page 20

Fischer Lynch Paterson

Choose at most two.

Page 21

Distributed consensus is impossible when at least one process might fail

Page 22

Distributed consensus is impossible when at least one process might fail

“[a] surprising result”

Page 23

Distributed* consensus is impossible when at least one process might fail

*i.e. message passing

Page 24

Distributed consensus is impossible when at least one process might fail

Termination, Validity, Agreement

Page 25

Distributed consensus is impossible when at least one process might fail

No algorithm solves consensus in every case

Page 26

Distributed consensus is impossible when at least one process might fail

Crash failures

Page 27

Hierarchy of Failure Modes

Crash failures: fail by stopping

Page 28

Hierarchy of Failure Modes

Omission failures: fail by dropping messages

Crash failures: fail by stopping

Page 29

Hierarchy of Failure Modes

Byzantine failures: fail by doing whatever the hell I like

Omission failures: fail by dropping messages

Crash failures: fail by stopping
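The nesting in this hierarchy can be made concrete with a small Python sketch. It assumes a deliberately simplified, hypothetical model in which a process's behaviour is just the list of messages it would send if it were correct, and each failure mode transforms that list. Because each mode is a special case of the next one up, an impossibility result proved for crash failures (as FLP is) also holds for omission and Byzantine failures.

```python
from typing import Callable

# Hypothetical model: a process's behaviour is the list of messages it would
# send if it were correct; each failure mode transforms that list.
Outbox = list[str]


def crash(outbox: Outbox, crash_after: int) -> Outbox:
    """Crash failure: behave correctly, then stop forever."""
    return outbox[:crash_after]


def omission(outbox: Outbox, dropped: set[int]) -> Outbox:
    """Omission failure: silently drop some messages (chosen by position)."""
    return [m for i, m in enumerate(outbox) if i not in dropped]


def byzantine(outbox: Outbox, behaviour: Callable[[Outbox], Outbox]) -> Outbox:
    """Byzantine failure: send anything at all instead of the correct messages."""
    return behaviour(outbox)


correct = ["propose 0", "ack 1", "ack 2", "decide 0"]

# Crashing after two messages is the omission failure that drops everything after them...
assert crash(correct, 2) == omission(correct, set(range(2, len(correct))))
# ...and every omission failure is one particular Byzantine behaviour.
assert omission(correct, {1}) == byzantine(correct, lambda o: omission(o, {1}))
# Byzantine processes can also do what no omission failure can, such as lie.
print(byzantine(correct, lambda o: ["decide 1"]))  # ['decide 1']
```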

Page 30

More on the system model

Page 31

• The system model is the abstraction we layer over messy computers and networks in order to actually reason about them.

Page 32

• Message deliveries are the only way that nodes may communicate

• Messages are delivered in any order

• But they are never lost (cf. crash model vs. omission model), and each is delivered exactly once

Page 33

• Nodes do not have access to a shared clock

• So they cannot jointly estimate the passage of time (and a slow node is indistinguishable from a crashed one)

• Messages are the only way that nodes may co-ordinate with each other
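A minimal Python sketch of this message system follows. It assumes a random scheduler stands in for the adversary that chooses the delivery order; `MessageBuffer` and its methods are hypothetical names. Note that the interface has no notion of time at all, which is exactly why a node waiting for a message cannot tell a slow sender from a crashed one.

```python
import random
from collections import Counter


class MessageBuffer:
    """Sketch of the asynchronous message system assumed by FLP.

    Pending messages form a multiset of (destination, payload) pairs. The
    scheduler may deliver any pending message next (here it picks at random),
    so there is no ordering guarantee, but nothing is ever lost and each send
    is delivered exactly once. There is deliberately no clock anywhere.
    """

    def __init__(self, seed: int = 0) -> None:
        self.pending: Counter = Counter()
        self.rng = random.Random(seed)

    def send(self, dest: str, payload: str) -> None:
        self.pending[(dest, payload)] += 1

    def deliver(self) -> tuple[str, str]:
        """Remove and return one pending message; raises IndexError if none."""
        msg = self.rng.choice(sorted(self.pending.elements()))
        self.pending[msg] -= 1
        if self.pending[msg] == 0:
            del self.pending[msg]
        return msg


buf = MessageBuffer(seed=42)
for i in range(3):
    buf.send("p1", f"m{i}")
received = [buf.deliver()[1] for _ in range(3)]
print(sorted(received))  # ['m0', 'm1', 'm2']: each delivered once, in some order
```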

Page 34

The Proof itself

Page 35

Some definitions

• Configuration: the state of every node in the system, plus the set of undelivered (sent but not yet received) messages

• Initial configuration: the configuration at time 0, determined by the value each node in the system proposes

• Univalent: a configuration from which only one decision value is possible, no matter which messages are received next (0-valent and 1-valent configurations can only decide 0 or 1 respectively)

• Bivalent: a configuration from which either decision value is still possible.
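Valency can be computed by brute force for a protocol small enough to explore every delivery order. The Python sketch below uses a hypothetical toy protocol, not one from the paper: proposers P0 and P1 send their inputs to a coordinator C, which decides whichever value arrives first and forwards it. Because this protocol has finitely many reachable configurations, `valency` can simply collect every decision reachable from a configuration; all names here are illustrative.

```python
from functools import lru_cache

# Hypothetical toy protocol (not from the paper): proposers P0 and P1 each send
# their input to a coordinator C, which decides the first value it receives and
# forwards that decision to both proposers. It satisfies agreement and validity
# but is not fault tolerant: if C crashes, nobody ever decides.

Msg = tuple[str, str, int]        # (destination, sender, value)
Config = tuple[tuple, frozenset]  # ((P0, P1, C decisions), pending messages)


def initial(x0: int, x1: int) -> Config:
    """Initial configuration: nothing decided, both proposals in flight."""
    return ((None, None, None), frozenset({("C", "P0", x0), ("C", "P1", x1)}))


def step(config: Config, msg: Msg) -> Config:
    """Deliver one pending message and return the resulting configuration."""
    (d0, d1, dc), pending = config
    pending = pending - {msg}
    dest, _sender, value = msg
    if dest == "C" and dc is None:
        dc = value  # the coordinator decides the first value it hears
        pending = pending | {("P0", "C", value), ("P1", "C", value)}
    elif dest == "P0":
        d0 = value
    elif dest == "P1":
        d1 = value
    return ((d0, d1, dc), pending)


@lru_cache(maxsize=None)
def valency(config: Config) -> frozenset:
    """Every decision value reachable from `config` under some delivery order."""
    reachable = {d for d in config[0] if d is not None}
    for msg in config[1]:
        reachable |= valency(step(config, msg))
    return frozenset(reachable)


def classify(config: Config) -> str:
    values = valency(config)
    return "bivalent" if len(values) > 1 else f"{min(values)}-valent"


print(classify(initial(0, 0)))                        # 0-valent
print(classify(initial(0, 1)))                        # bivalent
print(classify(step(initial(0, 1), ("C", "P1", 1))))  # 1-valent
```

The bivalent initial configuration becomes univalent after a single delivery, and the protocol still fits FLP: a crash of C leaves the system stuck.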

Page 36

Proof sketch

Initial, ‘undecided’, configuration

Page 37

Proof sketch

Initial, ‘undecided’, configuration

Undecided state

Messages delivered

Page 38

Proof sketch

Initial, ‘undecided’, configuration

Undecided state

Messages delivered

More messages delivered

Page 39

Proof sketch

Initial, ‘undecided’, configuration

Undecided state

Messages delivered

Lemma 2: This always exists!

More messages delivered

Page 40

Proof sketch

Initial, ‘undecided’, configuration

Undecided state

Messages delivered

Lemma 2: This always exists!

Lemma 3: You can always get here!

More messages delivered

Page 41

Lemma 2: Communication Matters

Page 42

2-node system

C: 00, V: 1

C: 01, V: 0

C: 11, V: 0

C: 10, V: 1

(C: XY means process 0 has initial value X and process 1 has initial value Y; V is the configuration's valency)

Page 43

2-node system

C: 00, V: 1

C: 01, V: 0

C: 11, V: 0

C: 10, V: 1

These two configurations differ only at one node, but their valencies are different

(C: XY means process 0 has initial value X and process 1 has initial value Y)

Page 45

2-node system

C: 00, V: 1

C: 01, V: 0

C: 11, V: 0

C: 10, V: 1

(C: XY means process 0 has initial value X and process 1 has initial value Y)

I decided 1!

All executions of the protocol (i.e. sets of messages delivered)

Page 46

2-node system

C: 00, V: 1

C: 01, V: 0

C: 11, V: 0

C: 10, V: 1

(C: XY means process 0 has initial value X and process 1 has initial value Y)

I decided 0!

All executions of the protocol (i.e. sets of messages delivered)

Page 47

2-node system

C: 00, V: 1

C: 01, V: 0

C: 11, V: 0

C: 10, V: 1

(C: XY means process 0 has initial value X and process 1 has initial value Y)

I decided 0!

What if process 1 fails? Are the configurations any different?

I decided 1!

Page 48

2-node system

C: 00, V: 1

C: 01, V: 0

C: 11, V: 0

C: 10, V: 1

(C: XY means process 0 has initial value X and process 1 has initial value Y)

I decided 0!

For the remaining processes: no difference in initial state, but a different outcome?!

I decided 1!

Same execution

Page 49

Every protocol has an undecided (‘bivalent’) initial state
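The chain argument behind Lemma 2 can be sketched in Python; `chain`, `find_critical_pair` and the `majority` valency assignment are hypothetical names chosen for illustration. Suppose, for contradiction, that every initial configuration were univalent, so valency is a function from inputs to 0 or 1. Line up the initial configurations so that neighbours differ in exactly one process's input (here 00..0, 10..0, ..., 11..1 rather than the slide's 00, 01, 11, 10 ordering, but it plays the same role). Validity makes the two ends 0-valent and 1-valent, so the valency must flip somewhere.

```python
def chain(n: int) -> list[tuple[int, ...]]:
    """Initial configurations 00..0, 10..0, 110..0, ..., 11..1.

    Neighbours in the chain differ in exactly one process's input.
    """
    return [tuple([1] * k + [0] * (n - k)) for k in range(n + 1)]


def find_critical_pair(n, valency):
    """Return (a, b, p): neighbouring initial configurations with different
    valencies, and the process p whose input differs between them."""
    configs = chain(n)
    # Validity pins the endpoints: all-0 inputs must decide 0, all-1 must decide 1.
    assert valency(configs[0]) == 0 and valency(configs[-1]) == 1
    for a, b in zip(configs, configs[1:]):
        if valency(a) != valency(b):
            p = next(i for i in range(n) if a[i] != b[i])
            return a, b, p
    raise AssertionError("unreachable: the valency must change somewhere")


# Hypothetical valency assignment for a 3-process protocol in which every
# initial configuration is univalent (the assumption Lemma 2 refutes).
def majority(c):
    return 1 if 2 * sum(c) > len(c) else 0


print(find_critical_pair(3, majority))  # ((1, 0, 0), (1, 1, 0), 1)
```

If process p crashes before taking any steps, the other processes cannot tell a from b, so a run that decides without p reaches the same decision from both. That contradicts their different valencies, so some initial configuration must be bivalent.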

Page 50

Lemma 3: Indecisiveness is Sticky

Page 51

[Figure: from configuration C (bivalent), a branch of configurations reached without delivering e (“e-not-delivered”) and a branch of configurations in which e arrived last (“e-arrived-last”)]

Some message e is sent in C

Page 52

[Figure: from configuration C (bivalent), a branch of configurations reached without delivering e (“e-not-delivered”) and a branch of configurations in which e arrived last (“e-arrived-last”); the latter form configuration set D]

Configuration set D: one of these must be bivalent

Some message e is sent in C
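The two sets in Lemma 3 can be built mechanically: first every configuration reachable from C without ever delivering e, then D by delivering e last in each of them. The Python sketch below does this with a generic search; `lemma3_sets`, `toy_pending` and `toy_deliver` are hypothetical names, and the toy model (messages are plain labels and delivering one sends nothing new) is an assumption chosen only to keep the search small.

```python
from typing import Callable, Hashable, Iterable


def lemma3_sets(
    C: Hashable,
    e: Hashable,
    pending: Callable[[Hashable], Iterable[Hashable]],
    deliver: Callable[[Hashable, Hashable], Hashable],
) -> tuple[set, set]:
    """Return (c_without_e, d_set) for a configuration C and a pending message e.

    c_without_e: every configuration reachable from C without delivering e.
    d_set:       every configuration obtained from those by delivering e last.
    """
    c_without_e = {C}
    frontier = [C]
    while frontier:
        config = frontier.pop()
        for msg in pending(config):
            if msg == e:
                continue  # e stays pending: it is never delivered in this set
            nxt = deliver(config, msg)
            if nxt not in c_without_e:
                c_without_e.add(nxt)
                frontier.append(nxt)
    d_set = {deliver(config, e) for config in c_without_e}
    return c_without_e, d_set


# Toy model (an assumption): a configuration is the history of delivered
# messages plus the pending set, and delivering a message sends nothing new.
def toy_pending(config):
    return config[1]


def toy_deliver(config, msg):
    history, pend = config
    return history + (msg,), pend - {msg}


start = ((), frozenset({"a", "b", "e"}))
c_without_e, d_set = lemma3_sets(start, "e", toy_pending, toy_deliver)
print(len(c_without_e), len(d_set))  # 5 5
print(all(history[-1] == "e" for history, _ in d_set))  # True
```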

Page 53

• Consider the possibilities:

• If one of the configurations in D is bivalent, we’re done

• Otherwise, show that the absence of a bivalent configuration leads to a contradiction

• Do this by first showing that D must contain both 0-valent and 1-valent configurations

• and then that this leads to a contradiction

Page 54

[Figure: configuration C (bivalent); a 0-valent configuration in which e has not been received; a 0-valent configuration in D in which e has been received]

Either the protocol reaches the 0-valent configuration before receiving e, in which case receiving e there gives a configuration in D that is also 0-valent…

1. C moves to a 0-valent configuration before receiving e

2. e is received

Page 55

[Figure: configuration C (bivalent); a configuration in D in which e has been received; a 0-valent configuration reached from it]

Or the protocol gets to the 0-valent configuration after receiving e, in which case the configuration where e was received is in D and must also be 0-valent

1. e is received

2. The 0-valent configuration is reached

Page 56

Now for the contradiction

Page 57

• There must be two configurations C0 and C1, both reached from C without receiving e, that are separated by a single message m, such that receiving e in Ci moves the system to an i-valent configuration Di

• We will write that as Ci + e = Di

• So C0 + m = C1

• and C0 + m + e = C1 + e = D1

• and C0 + e = D0

Page 58

• Now consider the destinations of m and e. If they go to different processes, their receipt is commutative

• C0 + m + e = D1

• C0 + e + m = D0 + m = D1

• Contradiction: D0 is 0-valent, so no configuration reachable from it (including D1) can be 1-valent!
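The commutativity step (FLP's Lemma 1) can be checked in a toy Python model. The model is an assumption for illustration: each process's local state is the sequence of payloads it has received, and receiving a message sends nothing new; `receive` is a hypothetical name. Deliveries to different processes touch disjoint local states, so their order does not matter; two deliveries to the same process can be told apart by that process, which is why the proof needs a separate argument for that case.

```python
# Toy model (an assumption): each process's local state is the sequence of
# payloads it has received so far, and receiving a message sends nothing new.
Msg = tuple[int, str]  # (destination process, payload)


def receive(config, msg: Msg):
    """Deliver msg. Only the destination process's local state changes."""
    states, pending = config
    dest, payload = msg
    new_states = list(states)
    new_states[dest] = new_states[dest] + (payload,)
    return tuple(new_states), pending - {msg}


m, e = (0, "m"), (1, "e")  # different destinations
C0 = (((), (), ()), frozenset({m, e}))
print(receive(receive(C0, m), e) == receive(receive(C0, e), m))  # True

m2, e2 = (0, "m"), (0, "e")  # same destination: the order is visible to p
C0_same = (((), (), ()), frozenset({m2, e2}))
print(receive(receive(C0_same, m2), e2) == receive(receive(C0_same, e2), m2))  # False
```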

Page 59

• Instead, e and m might go to the same process p.

• Consider a deciding computation R from C0 in which p takes no steps (i.e. p looks like it has failed). Such an R must exist, because the protocol has to tolerate one faulty process

• To get from C0 to D0 and D1, only e and m have been received, so only p took any steps

• So R can also be applied to both D0 and D1

Page 60

• Since D0 is 0-valent and D1 is 1-valent, D0 + R is 0-valent and D1 + R is 1-valent.

Page 61

• Now remember:

• A = C0 + R

• D1 = C0 + m + e

• D0 = C0 + e

• But R commutes with e and m, since R involves no steps by p:

• C0 + R + m + e = A + m + e = D1 + R => 1-valent

• C0 + R + e = A + e = D0 + R => 0-valent

• So A is bivalent, yet A ends a deciding run and must be univalent: contradiction!

Page 62

• Let e be some message that is pending (sent but not yet received) in configuration C. Let 𝒞 be the set of configurations reachable from C without receiving e, and let D be the set of configurations obtained by receiving e in some configuration of 𝒞 (i.e. e is received last).

• D either contains a bivalent configuration, or both 0- and 1-valent configurations. If it contains a bivalent configuration, we’re done. So assume it does not.

• Then there must be some C0 and C1 in 𝒞 where C0 + e is 0-valent but C1 + e is 1-valent, and C1 = C0 + e’

• Consider two possibilities for the destinations of e’ and e. If they are not the same, then C0 + e + e’ = C0 + e’ + e = C1 + e = D1, which is 1-valent. But C0 + e = D0 is 0-valent, and D1 is reachable from it: contradiction.

• If they are the same process p, then let A be the configuration reached by a deciding run from C0 in which p does nothing (looks like it failed). We can also apply that run from D0 and D1 to get to E0 and E1. But we can get from A to either E0 or E1 by applying e or e’ + e, so A is bivalent even though it was reached by a deciding run. This is a contradiction.

Page 63

What are the consequences?

Page 64

“These results do not show that such problems cannot be “solved” in practice; rather, they point up the need for more refined models of distributed computing that better reflect realistic assumptions about processor and communication timings, and for less stringent requirements on the solution to such problems. (For example, termination might be required only with probability 1.)”

Page 65

Paxos

• Paxos cleverly defers the liveness problem to its leader election scheme

• If leader election is perfect, so is Paxos!

• But perfect leader election is solvable iff consensus is.

• Impossibilities all the way down…

Page 66

Randomized Consensus

• Nice way to circumvent technical impossibilities: make their probability vanishingly small

• Ben-Or gave an algorithm that terminates with probability 1

• (But convergence might be slow: the expected number of rounds can grow exponentially with the number of processes)
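FLP's argument applies to deterministic protocols; Ben-Or's escapes it by letting processes flip coins. Below is a simplified Python simulation loosely following the crash-tolerant version of Ben-Or's protocol as it is commonly presented (two phases per round, n > 2f). The lock-step rounds and the random choice of which n - f messages each process hears are simplifications and assumptions, not part of the original algorithm, and `ben_or` is a hypothetical name.

```python
import random
from typing import Optional


def ben_or(inputs: list[int], f: int, seed: int,
           max_rounds: int = 1000) -> list[Optional[int]]:
    """Simplified simulation of Ben-Or's crash-tolerant binary consensus.

    Rounds run in lock step (a simplification), and in each phase every process
    hears from a random subset of n - f processes, which is all it may wait for
    when up to f processes could have crashed. No process actually crashes here.
    """
    n = len(inputs)
    assert n > 2 * f, "this version of the protocol needs n > 2f"
    rng = random.Random(seed)
    x = list(inputs)  # each process's current estimate
    decision: list[Optional[int]] = [None] * n
    for _ in range(max_rounds):
        # Phase 1: report estimates; propose v only if more than n/2 of ALL n reported v.
        reports = list(x)
        proposals: list[Optional[int]] = []
        for _p in range(n):
            seen = rng.sample(reports, n - f)
            strict_majority = [v for v in (0, 1) if 2 * seen.count(v) > n]
            proposals.append(strict_majority[0] if strict_majority else None)
        # Phase 2: a proposal needs a strict majority, so at most one value is proposed.
        for p in range(n):
            votes = [v for v in rng.sample(proposals, n - f) if v is not None]
            if votes:
                x[p] = votes[0]             # adopt the proposed value...
                if len(votes) >= f + 1 and decision[p] is None:
                    decision[p] = votes[0]  # ...and decide it on f + 1 votes
            else:
                x[p] = rng.randint(0, 1)    # no information: flip a coin
        if all(d is not None for d in decision):
            return decision
    raise RuntimeError("no decision within max_rounds")


print(ben_or([1, 1, 1], f=1, seed=0))                  # [1, 1, 1]
print(len(set(ben_or([0, 1, 0, 1, 1], f=2, seed=7))))  # 1
```

Agreement holds on every run that returns; termination is only guaranteed with probability 1, so `max_rounds` is just a safety valve for the simulation.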

Page 67

Failure Detectors

• Deep connection between the ability to tell whether a machine has failed and the ability to solve consensus.

• Lots of research into ‘weak’ failure detectors, and how weak they can be and still solve consensus
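Practical failure detectors are usually built from heartbeats and timeouts. The Python sketch below follows the common idea behind an "eventually perfect" detector: suspect a peer that has been silent too long, and double its timeout whenever a suspicion turns out to be wrong. The class and method names are hypothetical, and the sketch only works under the assumption that message delays are eventually bounded; in FLP's fully asynchronous model, where there is no clock, no detector like this can be accurate.

```python
class EventuallyPerfectDetector:
    """Heartbeat failure detector with adaptive timeouts (a sketch).

    A peer is suspected once no heartbeat has arrived within its timeout. If a
    suspected peer later proves to be alive, the suspicion is withdrawn and its
    timeout doubled, so that, *assuming* message delays are eventually bounded,
    false suspicions eventually stop.
    """

    def __init__(self, peers: list[str], initial_timeout: float = 1.0) -> None:
        self.timeout = {p: initial_timeout for p in peers}
        self.last_heard = {p: 0.0 for p in peers}
        self.suspected: set[str] = set()

    def heartbeat(self, peer: str, now: float) -> None:
        self.last_heard[peer] = now
        if peer in self.suspected:
            self.suspected.discard(peer)  # we were wrong: it is alive
            self.timeout[peer] *= 2       # so be more patient with it next time

    def check(self, now: float) -> set[str]:
        for peer, heard in self.last_heard.items():
            if now - heard > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)


fd = EventuallyPerfectDetector(["a", "b"])
fd.heartbeat("a", now=0.5)
print(fd.check(now=1.2))  # {'b'}: nothing from b for 1.2 > 1.0 time units
fd.heartbeat("b", now=1.5)  # b was only slow: unsuspect it, double its timeout
print(fd.check(now=3.0))  # {'a'}: a silent for 2.5 > 1.0; b silent 1.5 <= 2.0
```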

Page 68

FLP vs CAP

Page 69

• FLP and CAP are not the same thing (see http://the-paper-trail.org/blog/flp-and-cap-arent-the-same-thing/)

• FLP is a stronger result, because its failure model is more benign (crash-stop vs. omission): consensus is impossible even when messages are never lost

Page 70

• Theorem: CAP is actually really boring

Page 71

Further reading

Page 72

• A Hundred Impossibility Proofs for Distributed Computing (Lynch, 1989)

• The Weakest Failure Detector for Solving Consensus (Chandra, Hadzilacos and Toueg, 1996)

• Sharing Memory Robustly in Message-Passing Systems (Attiya et al., 1995)

• Wait-Free Synchronization (Herlihy, 1991)

• Another Advantage of Free Choice: Completely Asynchronous Agreement Protocols (Ben-Or, 1983)