CS434/534: Topics in Network Systems
High-Level Programming for Programmable Networks: Distributed Network OS: Replicated Data Store
Yang (Richard) Yang, Computer Science Department, Yale University, 208A Watson
Email: [email protected]
http://zoo.cs.yale.edu/classes/cs434/
Acknowledgements: Paxos slides are based on the Raft usability slides linked on the Schedule page.
Recap: A Common Multi-Server (Data Store) Arch: Replicated Log
- Replicated operations log => replicated state machine
  - All servers execute same commands in same order
- Consensus module ensures proper log replication
- System makes progress as long as any majority of servers are up
[Figure: clients send commands (e.g., shl) to servers; each server has a consensus module, a log (add, jmp, mov, shl), and a state machine.]
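The replicated-state-machine idea can be illustrated with a minimal sketch. The key-value machine, the command tuples, and the names `KeyValueStateMachine` and `replay` are assumptions made for illustration; the slides only say that all servers execute the same commands in the same order.

```python
from dataclasses import dataclass, field


@dataclass
class KeyValueStateMachine:
    """Toy deterministic state machine (hypothetical command format).

    Commands are ("set", key, value) or ("del", key).
    """
    state: dict = field(default_factory=dict)

    def apply(self, command):
        op = command[0]
        if op == "set":
            _, key, value = command
            self.state[key] = value
        elif op == "del":
            self.state.pop(command[1], None)
        else:
            raise ValueError(f"unknown command: {command!r}")


def replay(log):
    """Apply every command of a log, in order, to a fresh state machine."""
    machine = KeyValueStateMachine()
    for command in log:
        machine.apply(command)
    return machine.state


# The same log replicated on three servers yields the same state on each.
replicated_log = [("set", "x", 1), ("set", "y", 2), ("del", "x"), ("set", "y", 3)]
server_states = [replay(list(replicated_log)) for _ in range(3)]
```

Because the machine is deterministic, agreement on the log is enough to get agreement on the state; reordering two commands (for example the two writes to "y") can change the result, which is why the order must be agreed on as well.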
Outline
- Admin and recap
- Controller framework supporting programmable networks
  - architecture
  - data model and operations: OpenDaylight as an example
  - distributed data store
Roadmap Today
Two general approaches to achieving replicated log (consensus in general):
- Symmetric, leader-less:
  - All servers have equal roles
  - Clients can contact any server
  - We will see Basic Paxos as an example
- Asymmetric, leader-based:
  - At any given time, one server is in charge, others accept its decisions
  - Clients communicate with the leader
  - We will see Raft, which uses a leader
  - OpenDaylight is based on Raft
Outline
- Admin and recap
- Controller framework supporting programmable networks
  - architecture
  - data model and operations: OpenDaylight as an example
  - distributed data store
    - overview
    - basic Paxos
- Leslie Lamport, 1989
- Nearly synonymous with consensus

Warning
“The dirty little secret of the NSDI community is that at most five people really, truly understand every part of Paxos ;-).” (NSDI reviewer)
“There are significant gaps between the description of the Paxos algorithm and the needs of a real-world system ... the final system will be based on an unproven protocol.” (Chubby authors)
The Paxos Approach
- Basic Paxos (“single decree”):
  - One or more servers propose values (what is a value in our context?)
  - System must agree on a single value (what is chosen in our context?)
    - Only one value is ever chosen
- Multi-Paxos:
  - Combine several instances of Basic Paxos to agree on a series of values forming the log
Requirements for Basic Paxos
- Safety:
  - Only a single value may be chosen
  - A server never learns that a value has been chosen unless it really has been
- Liveness (as long as majority of servers up and communicating with reasonable timeliness):
  - Some proposed value is eventually chosen
  - If a value is chosen, servers eventually learn about it
Paxos Components
- Proposers:
  - Handle client requests
  - Active: put forth particular values to be chosen
- Acceptors:
  - Passive: respond to messages from proposers
  - Responses represent votes that form consensus
  - Store chosen value, state of the decision process
For this class:
  - Each Paxos server contains both components
Outline
- Admin and recap
- Controller framework supporting programmable networks
  - architecture
  - data model and operations: OpenDaylight as an example
  - distributed data store
    - overview
    - Paxos
      - basic Paxos
        - structure
        - initial design options
Design I: A Single Acceptor
- A single acceptor chooses value
- Problem:
  - What if acceptor crashes after choosing?
- Solution:
  - Quorum: multiple acceptors (3, 5, ...)
  - As long as majority of acceptors available, system can make progress
[Figure: proposers send values (add, jmp, shl, sub) to a single acceptor, which is shown holding jmp.]
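The quorum arithmetic behind "3, 5, ..." can be checked with a short sketch. The function names are hypothetical. The last helper verifies by brute force the property that majority quorums rely on: any two majorities share at least one acceptor.

```python
from itertools import combinations


def majority(n_acceptors: int) -> int:
    """Smallest number of acceptors that forms a majority."""
    return n_acceptors // 2 + 1


def tolerated_failures(n_acceptors: int) -> int:
    """How many acceptors may be down while a majority is still available."""
    return n_acceptors - majority(n_acceptors)


def majorities_always_intersect(n_acceptors: int) -> bool:
    """Brute-force check that every pair of majority quorums overlaps."""
    quorums = [set(q) for q in combinations(range(n_acceptors), majority(n_acceptors))]
    return all(a & b for a, b in combinations(quorums, 2))


sizes = {n: (majority(n), tolerated_failures(n)) for n in (3, 4, 5)}
```

With these definitions, 3 acceptors tolerate 1 failure and 5 tolerate 2, while 4 acceptors still tolerate only 1, which is one reason odd cluster sizes are the usual choice.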
Design II
- Design decision: which value to accept?
  - Design: Acceptor accepts only first value it receives
- Problem:
  - Split votes: if simultaneous proposals, no value might have majority
=> An Acceptor must be able to sometimes “change mind” to accept multiple (different) values
[Figure: timeline for s1..s5 with three simultaneous proposals, accept?(red), accept?(blue), accept?(green). The responses are accepted(red) twice, accepted(blue) twice, and accepted(green) once, so no value has a majority.]
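The split-vote problem can be reproduced with a small sketch. The arrival orders below are an assumption chosen to match the vote counts in the figure (two red, two blue, one green); the names `first_value_only` and `chosen_value` are hypothetical.

```python
from collections import Counter


def first_value_only(arrivals):
    """Design II acceptor rule: each acceptor keeps only the first value it receives.

    `arrivals` maps an acceptor name to the proposed values in arrival order.
    """
    return {acceptor: values[0] for acceptor, values in arrivals.items() if values}


def chosen_value(accepted, n_acceptors):
    """Return the value accepted by a majority, or None if there is none."""
    needed = n_acceptors // 2 + 1
    for value, count in Counter(accepted.values()).items():
        if count >= needed:
            return value
    return None


# Hypothetical arrival orders for three simultaneous proposals.
arrivals = {
    "s1": ["red"],
    "s2": ["red", "blue"],
    "s3": ["blue", "red"],
    "s4": ["blue"],
    "s5": ["green", "blue"],
}
accepted = first_value_only(arrivals)
result = chosen_value(accepted, n_acceptors=5)  # no value reaches 3 votes
```

Since no acceptor may change its vote under this rule, the system is stuck once the votes split, which motivates letting acceptors accept more than one value.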
Design III
- Design decision: which value to accept?
  - Design: Acceptor accepts every value it receives
- Problem:
  - Conflicting choices
Solution: If a value has been chosen => future proposals propose/choose that same value
[Figure: timeline for s1..s5 with two proposals, accept?(red) and accept?(blue). Three accepted(red) responses lead to "Red Chosen" and three accepted(blue) responses lead to "Blue Chosen".]
Basic Paxos Protocol Structure
- Phase 1: broadcast Prepare RPCs
  - Find out about any chosen values
  - Block older proposals that have not yet completed
- Phase 2: broadcast Accept RPCs
  - Ask acceptors to accept a specific value
Proposal Numbers
- Each proposal has a unique number
  - Higher numbers take priority over lower numbers
  - It must be possible for a proposer to choose a new proposal number higher than anything it has seen/used before
- One simple approach:
  - Each server stores maxRound: the largest Round Number it has seen so far
  - To generate a new proposal number:
    - Increment maxRound
    - Concatenate with Server Id
  - Proposers must persist maxRound on disk: must not reuse proposal numbers after crash/restart
[Figure: a Proposal Number is made of a Round Number and a Server Id.]
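The "simple approach" above might be sketched as follows. This is one possible reading, not the course's implementation: a proposal number is modeled as the tuple (round, server id), so that Python's tuple ordering compares rounds first; the class name, the JSON file format, and the `observe` method are assumptions.

```python
import json
import os
import tempfile


class ProposalNumberGenerator:
    """Generates unique, increasing proposal numbers as (round, server_id) tuples."""

    def __init__(self, server_id, path):
        self.server_id = server_id
        self.path = path              # hypothetical file that stores maxRound
        self.max_round = self._load()

    def _load(self):
        try:
            with open(self.path) as f:
                return json.load(f)["max_round"]
        except FileNotFoundError:
            return 0                  # first start: no round used yet

    def _persist(self):
        # Write to a temporary file, then rename, so a crash never leaves a torn file.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"max_round": self.max_round}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def observe(self, proposal):
        """Remember the round of a proposal number seen from another server."""
        round_number, _ = proposal
        if round_number > self.max_round:
            self.max_round = round_number
            self._persist()

    def next(self):
        """Increment maxRound, persist it, and attach the server id."""
        self.max_round += 1
        self._persist()
        return (self.max_round, self.server_id)


path = os.path.join(tempfile.mkdtemp(), "s1.json")
gen = ProposalNumberGenerator(server_id=1, path=path)
first = gen.next()                 # round 1 from server 1
gen.observe((4, 5))                # saw proposal 4.5 from s5
second = gen.next()                # must be higher than 4.5
restarted = ProposalNumberGenerator(server_id=1, path=path)
third = restarted.next()           # after a restart, numbers are not reused
```

Putting the round in the high-order position makes a later round win regardless of server id, and the server id breaks ties between servers that pick the same round.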
Basic Paxos Protocol
Step numbers give the overall order of the exchange between proposers and acceptors.

Proposers
1) Choose new proposal number n
2) Broadcast Prepare(n) to all servers
4) When responses received from majority:
   - If any acceptedValues returned, replace value with acceptedValue for highest acceptedProposal
5) Broadcast Accept(n, value) to all servers
7) When responses received from majority:
   - Any rejections (result > n)? goto (1)
   - Otherwise, value is chosen

Acceptors
3) Respond to Prepare(n):
   - If n > minProposal then minProposal = n
   - Return(acceptedProposal, acceptedValue)
6) Respond to Accept(n, value):
   - If n ≥ minProposal then acceptedProposal = minProposal = n; acceptedValue = value
   - Return(minProposal)

Acceptors must record minProposal, acceptedProposal, and acceptedValue on stable storage (disk)
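The protocol above can be exercised with a minimal in-memory sketch. Several things here are assumptions rather than part of the slides: direct method calls stand in for RPCs, nothing is written to stable storage, the sentinel `NO_PROPOSAL` marks "nothing accepted yet", and each proposer is handed an explicit list of acceptors that is assumed to be a majority. The names `Acceptor`, `run_prepare`, and `run_accept` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Tuple

Proposal = Tuple[int, int]       # (round number, server id), e.g. 3.1 -> (3, 1)
NO_PROPOSAL: Proposal = (0, 0)   # assumption: sentinel for "nothing accepted yet"


@dataclass
class Acceptor:
    # A real acceptor must keep these three fields on stable storage.
    min_proposal: Proposal = NO_PROPOSAL
    accepted_proposal: Proposal = NO_PROPOSAL
    accepted_value: Any = None

    def prepare(self, n: Proposal):
        """Step 3: raise minProposal if n is higher, report what was accepted."""
        if n > self.min_proposal:
            self.min_proposal = n
        return self.accepted_proposal, self.accepted_value

    def accept(self, n: Proposal, value: Any) -> Proposal:
        """Step 6: accept unless a higher-numbered Prepare has been seen."""
        if n >= self.min_proposal:
            self.accepted_proposal = self.min_proposal = n
            self.accepted_value = value
        return self.min_proposal


def run_prepare(n, value, quorum):
    """Steps 2-4: return the value the proposer must send in its Accept."""
    replies = [acceptor.prepare(n) for acceptor in quorum]
    best_proposal, best_value = max(replies, key=lambda reply: reply[0])
    return value if best_proposal == NO_PROPOSAL else best_value


def run_accept(n, value, quorum):
    """Steps 5-7: True if no acceptor in the quorum rejected (result > n)."""
    results = [acceptor.accept(n, value) for acceptor in quorum]
    return all(result <= n for result in results)


# s1 proposes X as 3.1 to s1..s3; afterwards s5 proposes Y as 4.5 to s3..s5.
s1, s2, s3, s4, s5 = (Acceptor() for _ in range(5))
value_1 = run_prepare((3, 1), "X", [s1, s2, s3])
chosen_1 = run_accept((3, 1), value_1, [s1, s2, s3])
value_2 = run_prepare((4, 5), "Y", [s3, s4, s5])   # s3 reports X, so Y is dropped
chosen_2 = run_accept((4, 5), value_2, [s3, s4, s5])
```

In this run the second proposer learns about X through s3, the acceptor shared by both majorities, and re-proposes X instead of its own value Y. A real proposer that sees a rejection would go back to step 1 with a new proposal number; that retry loop is left out of the sketch.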
Basic Paxos Protocol: Exercise
- Basic setting: single proposer
  - s1 proposes X
[Figure: timeline for s1..s5; s1 proposes value X.]

Question
- Q: Who knows that a value has been chosen?
- Q: If other servers want to know the chosen value, what should they do?
Basic Paxos Protocol: Exercise
- Setting: a new proposal arrives after a value is already chosen
[Figure: timeline for s1..s5; the proposers' own values are X and Y. Messages shown: three P 3.1, three A 3.1 X, three P 4.5, three A 4.5 X. Legend: "P 3.1" is "Prepare proposal 3.1 (from s1)"; "A 4.5 X" is "Accept proposal 4.5 with value X (from s5)".]
Basic Paxos Protocol: Race Condition I
- Previous value not chosen, but new proposer sees it:
  - New proposer will use existing value
  - Both proposers can succeed
[Figure: timeline for s1..s5; the proposers' own values are X and Y. Messages shown: three P 3.1, three A 3.1 X, three P 4.5, three A 4.5 X.]
Basic Paxos Protocol: Race Condition II
- Previous value not chosen, new proposer doesn’t see it:
  - New proposer chooses its own value
  - Older proposal blocked
[Figure: timeline for s1..s5; the proposers' own values are X and Y. Messages shown: three P 3.1, three A 3.1 X, three P 4.5, three A 4.5 Y.]
Summary of Cases
- Three possibilities when later proposal prepares:
  1. Previous value already chosen:
     - New proposer will find it and use it
  2. Previous value not chosen, but new proposer sees it:
     - New proposer will use existing value
     - Both proposers can succeed
  3. Previous value not chosen, new proposer doesn’t see it:
     - New proposer chooses its own value
     - Older proposal blocked
Liveness
- Competing proposers can livelock:
[Figure: timeline for s1..s5 in which two proposers keep preempting each other. Messages shown, three of each: P 3.1, P 3.5, A 3.1 X, P 4.1, A 3.5 Y, P 5.5, A 4.1 X.]
- Potential solutions:
  - randomized delay before restarting, give other proposers a chance to finish choosing
  - use leader election instead
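The first potential solution, a randomized delay before restarting, might look like the sketch below. The slides do not say how the delay is drawn; the growing upper bound (exponential backoff with a cap) is an assumption, and `try_once` is a hypothetical callable that runs both Paxos phases with a fresh proposal number and returns the chosen value, or None if the proposal was rejected.

```python
import random
import time


def backoff_delay(attempt, base=0.05, cap=2.0, rng=random):
    """Random delay in seconds before retry number `attempt` (1, 2, ...)."""
    upper = min(cap, base * (2 ** attempt))
    return rng.uniform(0, upper)


def propose_with_backoff(try_once, max_attempts=10, sleep=time.sleep, rng=random):
    """Retry a rejected proposal, waiting a random time between attempts."""
    for attempt in range(1, max_attempts + 1):
        result = try_once()
        if result is not None:
            return result
        # Waiting a random time makes it unlikely that two proposers restart
        # in lockstep and preempt each other again.
        sleep(backoff_delay(attempt, rng=rng))
    return None


# Usage with a stand-in proposer that is rejected twice and then succeeds.
delays = []
outcomes = iter([None, None, "X"])
result = propose_with_backoff(lambda: next(outcomes),
                              sleep=delays.append,      # record instead of sleeping
                              rng=random.Random(0))
```

Randomization only makes livelock unlikely; it does not rule it out, which is one argument for the second option, electing a leader.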
Outline
- Admin and recap
- Controller framework supporting programmable networks
  - architecture
  - data model and operations: OpenDaylight as an example
  - distributed data store
    - overview
    - Paxos
      - basic Paxos
      - multi-Paxos
Multi-Paxos
- Separate instance of Basic Paxos for each entry in the log:
  - Add index argument to Prepare and Accept (selects entry in log)
[Figure: a client sends a command (shl) to a server containing a consensus module, a log (add, jmp, mov, shl), and a state machine; the consensus module talks to the other servers.]
1. Client sends command to server
2. Server uses Paxos to choose command as value for a log entry
3. Server waits for previous log entries to be applied, then applies new command to state machine
4. Server returns result from state machine to client
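Step 3 above, waiting for earlier log entries before applying a new one, can be sketched as follows. How each entry gets chosen is out of scope here: `mark_chosen` is a hypothetical hook that the consensus module would call once Basic Paxos has chosen a command for an index, and the running-total state machine is a toy assumption.

```python
class LogServer:
    """Applies chosen log entries to a toy state machine strictly in index order."""

    def __init__(self):
        self.chosen = {}          # log index -> chosen command
        self.next_to_apply = 0    # lowest index not yet applied
        self.total = 0            # toy state machine: a running total
        self.results = {}         # log index -> result to return to the client

    def _apply(self, command):
        op, amount = command
        if op != "add":
            raise ValueError(f"unknown command: {command!r}")
        self.total += amount
        return self.total

    def mark_chosen(self, index, command):
        """Record a chosen entry and apply every entry that is now unblocked.

        Returns the list of indexes applied by this call.
        """
        self.chosen[index] = command
        applied = []
        while self.next_to_apply in self.chosen:
            i = self.next_to_apply
            self.results[i] = self._apply(self.chosen[i])
            applied.append(i)
            self.next_to_apply += 1
        return applied


server = LogServer()
blocked = server.mark_chosen(1, ("add", 5))    # entry 0 not chosen yet: nothing applied
unblocked = server.mark_chosen(0, ("add", 2))  # now entries 0 and 1 are applied in order
```

Entries may be chosen out of order, but they are applied in index order, so the result returned for entry 1 reflects entry 0 as well.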
Multi-Paxos Issues
- Which log entry to use for a given client request?
- Performance optimizations
- Ensuring full replication
- Client protocol
- Configuration changes
Note: Multi-Paxos not specified precisely in literature
Selecting Log Entries
- When request arrives from client:
  - Find first log entry not known to be chosen
  - Run Basic Paxos to propose client’s command for this index
  - Prepare returns acceptedValue?