Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2009
Principles of Reliable Distributed Systems
Lecture 5: Synchronous (Uniform) Consensus
Spring 2009
Idit Keidar
Today’s Material
• Distributed Algorithms, Nancy Lynch, Ch. 6
• Distributed Computing, Attiya and Welch, Ch. 5
Reminder: State Machine Replication (SMR)
[Figure: Client A and Client B submit requests to the replicas through atomic broadcast]
Replica Coordination Requirements
• Agreement: all replicas receive all client requests
  – What happens when a replica (server) fails?
  – What happens when a client fails?
• Order: replicas process requests in the same order
Uniform Atomic Broadcast
• Uniform Reliable Broadcast
  – Validity: if a correct process broadcasts m then all correct processes eventually deliver m
  – Uniform Agreement: if any process delivers m then all correct processes eventually deliver m
  – Integrity: m is delivered by a correct process at most once, and only if it was previously broadcast
• Uniform Total Order
  – If any two processes deliver both m and m’, they deliver them in the same order
Today’s Problem: Uniform Consensus
Each process has an input, should decide on an output (one-shot problem)
• Uniform Agreement: every two decisions are the same
• Validity: every decision is an input of one of the processes
• Termination: eventually all correct processes decide
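The three properties can be checked mechanically on a recorded run. The sketch below is only an illustration: the function name, the dictionary encoding of a run, and the reading of "eventually" as "by the end of the recorded run" are assumptions, not part of the lecture.

```python
def check_uniform_consensus(inputs, decisions, correct):
    """Check one finished run against the Uniform Consensus properties.

    inputs:    dict pid -> input value (all processes)
    decisions: dict pid -> decided value (every process that decided,
               including processes that crashed after deciding)
    correct:   set of pids that never failed in this run
    """
    decided_values = set(decisions.values())
    return {
        # Uniform: decisions of faulty processes count as well.
        "uniform_agreement": len(decided_values) <= 1,
        # Every decision is the input of some process.
        "validity": decided_values <= set(inputs.values()),
        # Assumption: the run is long enough that "eventually" has passed.
        "termination": all(p in decisions for p in correct),
    }


if __name__ == "__main__":
    inputs = {1: 4, 2: 9, 3: 4}
    # p2 decided 9 and then crashed; p1 and p3 decided 4.
    print(check_uniform_consensus(inputs, {1: 4, 2: 9, 3: 4}, correct={1, 3}))
```

In the usage example the correct processes agree with each other, yet uniform agreement is violated because the faulty process p2 decided differently before crashing.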
(Uniform) Consensus versus (Uniform) Atomic Broadcast
• From Atomic Broadcast to Consensus
• From Consensus to Atomic Broadcast – Homework question
• From now on, we will focus mainly on consensus, and keep in mind that it suffices for Atomic Broadcast and SMR
Today’s Model(s)
• Round-based synchronous
• Static set P = {p1, …, pn} of processes
• Reliable links
  – What happens if links can fail?
• Fault tolerance:
1. Crash failures
2. Byzantine failures
Synchronous Round-Based Model
• Synchronous rounds:
1. Send messages to any set of processes;
2. Receive messages from this round;
3. Do local processing (possibly decide, halt)
Model 1: Round-Based Failstop
• If pi does not crash in step 1 of round r, and pj does not crash in or before step 2 of round r, then any message sent by pi to pj in round r is received by pj in round r
• Note: If pi crashes in step 1 of a round, then any subset of the messages pi sends in this round can be lost
Round-Based Failstop Model
• If a message from pj is expected, and no message from pj is received, then pj is suspected
• If pi is suspected in round r, pi fails in round r or r-1, and no further messages from pi will arrive
[Figure: timeline of p1, p2, p3 over round 1 and round 2]
• p1 crashes in round 2, step 1
• p2 receives p1’s round 2 message
• p3 suspects p1 in round 2
t-Resilient Algorithm
• t is a threshold on the number of potential failures
  – The algorithm is correct as long as no more than t processes fail
• In the following algorithm, 0 ≤ t < n
• We denote by f the number of actual failures that occur in a given run, 0 ≤ f ≤ t
• We’d like t to be big (robust algorithm)
  – But f will usually be small (failures are rare)
Example: t=0 versus f=0
• Think of a simple algorithm for t=0
• What happens if we run this algorithm where failures do occur?
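One candidate answer, offered only as an illustration and not necessarily the algorithm intended in the lecture: with t=0, every process sends its input to all and decides the minimum value it receives after a single round. The sketch below (function name and crash encoding are assumptions) shows what can go wrong when a process crashes in the middle of step 1.

```python
def one_round_min(inputs, crash=None):
    """One-round algorithm that is correct only when t = 0.

    inputs: dict pid -> input value
    crash:  optional (pid, recipients); pid crashes during step 1 and only
            the processes in `recipients` receive its message
            (hypothetical encoding of a single crash).
    Returns dict pid -> decision for the processes that did not crash.
    """
    decisions = {}
    for q in sorted(inputs):
        if crash and q == crash[0]:
            continue  # the crashed process never reaches the decide step
        received = [
            value
            for p, value in inputs.items()
            if not (crash and p == crash[0] and q not in crash[1])
        ]
        decisions[q] = min(received)
    return decisions


if __name__ == "__main__":
    inputs = {1: 0, 2: 1, 3: 1}
    print(one_round_min(inputs))                   # failure-free run
    print(one_round_min(inputs, crash=(1, {2})))   # p1 reaches only p2
```

In the second run p2 sees p1's value 0 while p3 does not, so the two surviving processes decide differently: an algorithm designed for t=0 gives no guarantee once f > 0.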
Notation
• P = {p1, …, pn} is the set of processes
• initi is pi’s initial value (input)
• The decide action determines the output
• The code is shown for process pi
• Local variables of pi are denoted: vi, Alivei
t-Resilient Failstop Uniform Consensus Algorithm
vi = initi; Alivei = P
in every round 1 ≤ k ≤ t+2:
    send vi to all
    receive round k messages
    for all pj
        if (received vj) then vi = min(vi, vj)
        otherwise pj is suspected
    if ( (∀pj ∈ Alivei : received vj = vi) && !decided ) then decide vi
    for all pj
        if (suspect pj) then Alivei = Alivei \ {pj}
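The pseudocode above can be exercised with a small simulation. This is a minimal sketch under stated assumptions: the function name and the `crashes` encoding of the failure pattern are hypothetical, "send to all" is taken to include the sender itself, and a process that crashes does so during step 1 of its crash round.

```python
def failstop_uniform_consensus(inputs, t, crashes=None):
    """Simulate the (t+2)-round failstop algorithm.

    inputs:  dict pid -> initial value init_i
    crashes: dict pid -> (round, recipients); pid crashes in step 1 of that
             round and only `recipients` still receive its message.
    Returns (decisions, decision_round), both keyed by pid.
    """
    crashes = crashes or {}
    procs = sorted(inputs)
    v = dict(inputs)
    alive = {p: set(procs) for p in procs}
    crashed, decisions, decision_round = set(), {}, {}

    for k in range(1, t + 3):                      # rounds 1 .. t+2
        # Step 1: send v_i to all (a crashing sender reaches only a subset).
        inbox = {p: {} for p in procs}
        for p in procs:
            if p in crashed:
                continue
            recipients = procs
            if p in crashes and crashes[p][0] == k:
                recipients = crashes[p][1]
                crashed.add(p)
            for q in recipients:
                inbox[q][p] = v[p]

        # Steps 2 and 3: receive, update v_i, maybe decide, update Alive_i.
        for p in procs:
            if p in crashed:
                continue
            got = inbox[p]
            v[p] = min([v[p], *got.values()])
            if p not in decisions and all(
                q in got and got[q] == v[p] for q in alive[p]
            ):
                decisions[p] = v[p]
                decision_round[p] = k
            alive[p] -= set(procs) - set(got)      # drop suspected processes
    return decisions, decision_round


if __name__ == "__main__":
    inputs = {1: 5, 2: 3, 3: 7, 4: 1}
    print(failstop_uniform_consensus(inputs, t=2))
    # p4 crashes in round 1 after reaching only p1.
    print(failstop_uniform_consensus(inputs, t=2, crashes={4: (1, [1])}))
```

In both runs the loop keeps going until round t+2 even after every process has decided, which matches the pseudocode: deciding and halting are separate events.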
Proof: Validity
• Lemma: For every process pi, vi always holds the initial value initj of some process pj.
Proof: Uniform Agreement
• Lemma: If there exist a value v, a round r, and a process pi such that all processes that are in Alivei at the beginning of round r send v in round r, then v is the only possible decision value from r onward.
Proof: Uniform Agreement (Cont’d)
• From the Lemma, we get that if some process decides v in round r, then v is the only possible decision value from r onward.
• Now look at the first round in which some process decides.
Termination Lemma
• After a round r in which no process fails, all processes have the same vi forever
• Proof:
  – Because all receive the same messages in r,
  – By induction…
Proof: Termination 1/2
• Consider a run where f processes fail
  – There are at most f rounds with failures
  – There are at most f rounds when Alivei changes at any correct pi
  – Alivei can change to reflect a failure either in the round of the failure or in the ensuing round
Proof: Termination 2/2
• In f+2 rounds, there is at least one failure-free round and later at least one round in which Alivei does not change
  – Thus, from the Termination Lemma, after at most f+2 rounds, there is a round in which Alivei does not change and all received values are the same
How Long Does it Take?
• Early-deciding: in a run with f failures, decision is reached by the end of round f+2
• This is optimal
  – For Uniform Consensus, but not for Consensus
  – As long as f < t-1
Deciding vs. Stopping (Halting)
• The algorithm is not early-stopping:
  – It continues running for t+2 rounds
  – Even after reaching a decision
• Homework question: can you change the algorithm to be early-stopping?
  – Stop (halt) after f+k rounds in runs with t ≥ f ≥ 0 failures for some constant k
Model 2: Byzantine Faults
Synchronous Byzantine Consensus
The Byzantine Generals Problem
• First formulation of the consensus problem [Pease, Shostak, Lamport 80]
[Figure: two conflicting messages, “Let’s attack” and “Let’s not attack”]
Byzantine Faults
• Faulty processes can behave arbitrarily, i.e., they don’t have to follow the protocol, e.g.,
  – can suffer benign failures: crash, timing;
  – can send bogus values in messages;
  – can send messages at the wrong time;
  – can send different messages to different processes; etc.
• Captures software bugs, hacker intrusions
Byzantine Nodes can Lead Correct Nodes to Conflicting Decisions
Correct nodes cannot know whom to believe
[Figure: two conflicting messages, “Let’s vote out Marina” and “Let’s vote out Guy”]
Byzantine-Fault-Tolerant (BFT) Consensus
• Only non-uniform makes sense. Why?
• Recall, we defined consensus as follows:
  – Agreement: correct processes’ decisions are the same
  – Termination: eventually all correct processes decide
  – Validity: decision is input of one process
• Problem?
Validity: Take II
• Strong unanimity: If the input of all the correct processes is v then no correct process decides a value other than v
• How resilient can an algorithm satisfying this property be?
  – Homework: prove this!
Consensus w/ Strong Unanimity
Each process has input, should decide on output
• Agreement: correct processes’ decisions are the same
• Validity (Strong Unanimity): If the input of all the correct processes is v then no correct process decides a value other than v
• Termination: eventually all correct processes decide
2 Byzantine Models
1. Authenticated
  – Uses digital signatures
  – Assumes PKI (Public Key Infrastructure)
2. Un-authenticated
  – No digital signatures
  – Secure point-to-point communication
  – Over the Internet: implemented with symmetric keys
1. Authenticated (Byzantine) Model
• Authentication: The receiver of a message can ascertain its origin
  – An intruder cannot masquerade as someone else
• Integrity: The receiver of a message can verify that it has not been modified in transit
  – An intruder cannot substitute a false message for a legitimate one
• Nonrepudiation: A sender cannot falsely deny later that he sent a message
Implementing Authentication
• Uses a Cryptographic Public Key Infrastructure (PKI)
• Each process has a well-known public key and a matching private key
• Mp is message M signed by p’s private key
  – Only p can generate Mp
  – Every process can verify p’s signature on Mp using p’s public key
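The sign-with-private-key, verify-with-public-key pattern can be illustrated with textbook RSA over a hash. This is a toy sketch only: the fixed primes, the lack of padding, and all names are assumptions made for illustration, and nothing here is secure or taken from the lecture.

```python
import hashlib

# Toy key pair for a process p (assumption: two fixed, well-known Mersenne
# primes; a real PKI would use properly generated keys and a vetted library).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                                   # public modulus
E = 65537                                   # p's public exponent
D = pow(E, -1, (P - 1) * (Q - 1))           # p's private exponent (Python 3.8+)


def digest(message: str) -> int:
    """Hash the message to an integer smaller than N."""
    return int.from_bytes(hashlib.sha256(message.encode()).digest(), "big") % N


def sign(message: str, private_exponent: int = D) -> int:
    """Only the holder of the private exponent can compute this value."""
    return pow(digest(message), private_exponent, N)


def verify(message: str, signature: int, public_exponent: int = E) -> bool:
    """Anyone who knows the public key (N, E) can run this check."""
    return pow(signature, public_exponent, N) == digest(message)


if __name__ == "__main__":
    m = "Yossy does not have to submit homework assignment 2"
    s = sign(m)
    print(verify(m, s))                                   # genuine message
    print(verify("Yossy must submit assignment 2", s))    # altered message
```

The pair (message, signature) can be forwarded to a third party, who can run `verify` without contacting the signer; this is the property the following slides rely on.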
Exploiting Authentication
• All messages are signed by their source
• Every receiver can verify the message
• Signed messages can be forwarded as proof
  – “I can prove that Idit said that I don’t have to submit this homework assignment”: the message “Yossy does not have to submit homework assignment 2”, signed by Idit
• Liars can be exposed
Today’s Model 2
• Round-based synchronous
• Static set P = {p1, …, pn} of processes
• t-out-of-n Byzantine (arbitrary) failures
  – t < n/2
• Authentication
Exponential Information Gathering (EIG) Algorithms
• Forward all received messages in each round, for t+1 rounds:
In round 1:
    send your value to all
In later rounds:
    for every received message m (w/out my_id)
        forward m + my_id to all
EIG with Signatures for t < n/2
send vi signed by pi to all
in every round 2 ≤ k ≤ t+1:
    for every received message m:
        if (m has k-1 different valid signatures and not mine) then send m signed by pi to all
Validi = { vj signed by pj | all messages with t+1 different valid signatures starting with pj’s have same value vj }
decide on most common value in Validi
in case of a tie, choose the default value
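A small simulation can make the signature chains concrete. The sketch below rests on several assumptions that are not in the slides: signatures are modelled abstractly as a tuple of signer ids (so forgery is impossible by construction), "send to all" includes the sender itself, and Byzantine processes are limited to sending different values to different recipients in round 1 and staying silent afterwards. The names `eig_signed`, `byzantine`, and `default` are hypothetical.

```python
from collections import Counter


def eig_signed(inputs, t, byzantine=None, default=0):
    """inputs:    dict pid -> value, for the correct processes
    byzantine: dict pid -> {recipient: value} sent in round 1
    Returns dict pid -> decision for the correct processes."""
    byzantine = byzantine or {}
    procs = sorted(set(inputs) | set(byzantine))
    correct = [p for p in procs if p not in byzantine]

    # Round 1: a message is (value, chain of signer ids).
    inbox = {p: set() for p in procs}
    for p in correct:
        for q in procs:
            inbox[q].add((inputs[p], (p,)))
    for b, lies in byzantine.items():
        for q, value in lies.items():
            inbox[q].add((value, (b,)))
    received = {p: set(inbox[p]) for p in correct}

    # Rounds 2 .. t+1: countersign and forward chains of length k-1.
    for k in range(2, t + 2):
        outbox = {p: set() for p in procs}
        for p in correct:
            for value, chain in inbox[p]:
                if len(chain) == k - 1 and p not in chain:
                    for q in procs:
                        outbox[q].add((value, chain + (p,)))
        inbox = outbox
        for p in correct:
            received[p] |= inbox[p]

    decisions = {}
    for p in correct:
        # Group values carried by full chains (t+1 signatures) by originator.
        by_origin = {}
        for value, chain in received[p]:
            if len(chain) == t + 1:
                by_origin.setdefault(chain[0], set()).add(value)
        # An originator stays in Valid only if all its chains agree.
        valid = [next(iter(vals)) for vals in by_origin.values() if len(vals) == 1]
        top = Counter(valid).most_common(2)
        tie = len(top) == 2 and top[0][1] == top[1][1]
        decisions[p] = default if (not top or tie) else top[0][0]
    return decisions


if __name__ == "__main__":
    # n = 3, t = 1: p3 tells p1 "0" and p2 "1"; both correct inputs are 1.
    print(eig_signed({1: 1, 2: 1}, t=1, byzantine={3: {1: 0, 2: 1}}))
```

In the usage run, p1 and p2 each end up holding two chains that start with p3's signature but carry different values, so p3 is left out of Valid at both of them and both decide 1.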
Signatures Expose Liars
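[Figure: Dan signs two conflicting messages, “Let’s vote out Marina” and “Let’s vote out Guy”; the copies forwarded by Guy and by Marina both carry Dan’s signature, so the conflict is exposed]
Remove from Valid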
Validity
• Need to prove Strong Unanimity: If the input of all correct processes is v then no correct process decides a value other than v
• Claim: At every correct pi, for all correct pj, Validi includes vj signed by pj
• Validity follows
Agreement
• Claim: For two correct processes pi and pj, Validi and Validj include the same values
• Agreement follows
Termination
• Decide always happens after t+1 rounds
Can We Improve the Resilience?
Validity: Take III
• Weak unanimity: If the input of all the correct processes is v and no process fails then no correct process decides a value other than v
• Does this prevent a trivial solution?
• Resilience?
  – See recitation
Summary of Known Results
• Synchronous, Byzantine Fault-Tolerant, t-resilient consensus algorithms:
  – Strong unanimity with authentication: iff t < n/2
    • As we just saw
  – Weak unanimity with authentication: iff t < n
    • Recitation
  – Without authentication: iff t < n/3
    • Next week
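The three bounds translate directly into the largest number of Byzantine processes a system of n processes can tolerate. The helper below is a hypothetical convenience, with the setting names chosen here for illustration; it only restates the inequalities above as integer arithmetic.

```python
def max_byzantine_faults(n: int, setting: str) -> int:
    """Largest t allowed by the bounds in the summary.

    setting (names are assumptions made for this sketch):
      "strong_auth"     : strong unanimity with authentication, t < n/2
      "weak_auth"       : weak unanimity with authentication,   t < n
      "unauthenticated" : without authentication,               t < n/3
    """
    divisor = {"strong_auth": 2, "weak_auth": 1, "unauthenticated": 3}[setting]
    # Largest integer t with t < n / divisor.
    return (n - 1) // divisor


if __name__ == "__main__":
    for n in (3, 4, 7):
        print(n, {s: max_byzantine_faults(n, s)
                  for s in ("strong_auth", "weak_auth", "unauthenticated")})
```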