CS434/534: Topics in Network Systems High-Level Programming for Programmable Networks: Distributed Network OS: Replicated Data Store Yang (Richard) Yang Computer Science Department Yale University 208A Watson Email: [email protected] http://zoo.cs.yale.edu/classes/cs434/ Acknowledgements: Paxos slides are based on the Raft usability slides linked on the Schedule page.

CS434/534: Topics in Network Systemszoo.cs.yale.edu/classes/cs434/cs434-2019-spring/... · Previous value already chosen: •New proposer will find it and use it 2. Previous value

Jul 26, 2020



CS434/534: Topics in Network Systems

High-Level Programming for Programmable Networks: Distributed Network OS: Replicated Data Store

Yang (Richard) Yang
Computer Science Department
Yale University
208A Watson

Email: [email protected]

http://zoo.cs.yale.edu/classes/cs434/

Acknowledgements: Paxos slides are based on the Raft usability slides linked on the Schedule page.

Outline

- Admin and recap
- Controller software framework (network OS) supporting programmable networks
  o architecture
  o data model and operations: OpenDaylight as an example
  o distributed data store
    • overview
    • basic Paxos
    • multi-Paxos

Recap: Proactive HL Programming

- Objective: proactively and fully generate the multi-table datapath pipeline
- Basic ideas:
  o A function f consists of a sequence of instructions I1, I2, …, In, and one can
    • (conceptually) consider each instruction in the dataflow as a table
    • identify instructions with compact representation (symbolic analysis and direct state plug-in)
    • use state propagation to compute compact tables

Map<MAC, Switch> hostTable; // {11->sw1, 22->sw2, 33->sw2, others swu}
Map<MAC, Cond> condTable;   // {11->S, 22->C, 33->C, others UK}

0.  Route onPacketIn(Packet p) {
L1.   Switch dstSw = hostTable.get( p.dstMac() );
L2.   Cond dstCond = condTable.get( p.dstMac() );
L3.   Route aRoutes = AllPairCondRoutes();
L4.   return aRoutes.get(dstSw, dstCond);
    }

State propagation (program point, state):
L1, EMPTY
L2, dstSw=sw1, dstMac=11
L2, dstSw=sw2, dstMac=22
L2, dstSw=sw2, dstMac=33
L2, dstSw=swu, dstMac=*
L4, dstSw=sw1, dstCond=S
L4, dstSw=sw2, dstCond=C
L4, dstSw=swu, dstCond=UK
L∞, aRoutes(sw1, S)
L∞, aRoutes(sw2, C)
L∞, aRoutes(swu, UK)

Table computed for the hostTable lookup:

Pri  p.dstMac  Action
2    11        dstSw=sw1
2    22        dstSw=sw2
2    33        dstSw=sw2
1    *         dstSw=swu

Table computed for the condTable lookup:

Pri  p.dstMac  Action
2    11        dstCond=S
2    22        dstCond=C
2    33        dstCond=C
1    *         dstCond=UK

Pipeline design for constraints and optimization

[Figure: dataflow graph over the packet fields srcMac, dstPort, and dstMac, with guards g1 = dstPort < 1025 and g2 = openPorts.contain(dstPort). Node labels: dstSw = SW_FW, dstCond = V; egress = Drop; dstSw = hostTable(dstMac), dstCond = condTable(dstMac); phi(dstSw), phi(dstCond); srcSw = hostTable(srcMac); egress = f(dstCond, dstSw, srcSw). Edges are labeled g1, !g1, g2, !g2.]

Recap: HL NOS Architecture

[Figure: architecture diagram with the labels Program, logically centralized data store, Network View, Service/Policy, and NE Datapath (two instances).]

Key component - data store:
  • Data model
  • Data access model
  • Data store availability

Key goal: provide applications with high-level views and make the views highly available (e.g., 99.99%) and scalable.

Recap: OpenDaylight Software Architecture

[Figure: OpenDaylight controller architecture. Labels: Controller; Model-Driven SAL (MD-SAL); Messaging; Data Store; Config Subsystem; Protocol Plugin (several); App/Service Plugin (several); RESTCONF; NETCONF Server; Network Devices; Applications; Remote Controller Instance (two).]

Recap: Data Model - YANG Data Tree

https://wiki.opendaylight.org/view/OpenDaylight_Controller:Architectural_Framework

Recap: Data Operation Model - Transactions

- Operations included in transactions

ReadWriteTransaction transaction = dataBroker.newReadWriteTransaction();
Optional<Node> nodeOptional;
nodeOptional = transaction.read(
    LogicalDataStore.OPERATIONAL,
    n1InstanceIdentifier);

transaction.put(
    LogicalDataStore.CONFIG,
    n2InstanceIdentifier,
    topologyNodeBuilder.build());

transaction.delete(
    LogicalDataStore.CONFIG,
    n3InstanceIdentifier);

CheckedFuture future;
future = transaction.submit();

[Figure: Datastore tree with /operational and /config roots; under network-topo (BGPv4, overlay1) the nodes subtrees hold n1, n2, and n3. The transaction box lists n1 (read), n2 (put), and n3 (delete), matching the code above.]

Recap: Multi-Server NOS

[Figure: Clients exchange request/response messages with a group of Servers.]

Discussion: why multiple servers?

Recap: A Common Multi-Server (Data Store) Arch: Replicated Log

- Replicated operations log => replicated state machine
  o All servers execute same commands in same order
- Consensus module ensures proper log replication
- System makes progress as long as any majority of servers are up

[Figure: three Servers, each with a Consensus Module, a Log (add, jmp, mov, shl), and a State Machine; Clients submit the command shl.]
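The claim that a replicated log gives a replicated state machine rests on the state machine being deterministic. The sketch below illustrates this with a made-up one-register machine; the class names and the meaning given to the commands `add`, `shl`, and `neg` are assumptions for illustration only, not part of the slides or of OpenDaylight.

```java
import java.util.List;

// Hypothetical deterministic state machine: a single integer register.
final class RegisterMachine {
    private long value = 1;

    // Apply one command. The command semantics are illustrative assumptions.
    void apply(String command) {
        switch (command) {
            case "add": value += 3; break;
            case "shl": value <<= 1; break;
            case "neg": value = -value; break;
            default: throw new IllegalArgumentException("unknown command: " + command);
        }
    }

    long value() { return value; }

    // Replay a whole log, in order, on a fresh machine.
    static long replay(List<String> log) {
        RegisterMachine m = new RegisterMachine();
        for (String c : log) {
            m.apply(c);
        }
        return m.value();
    }
}

final class ReplicatedLogDemo {
    public static void main(String[] args) {
        List<String> log = List.of("add", "shl", "neg");
        // Every replica that applies the same log in the same order reaches the same state.
        System.out.println(RegisterMachine.replay(log));                          // -8
        // A replica that applied the same commands in a different order diverges.
        System.out.println(RegisterMachine.replay(List.of("shl", "add", "neg"))); // -5
    }
}
```

The second call shows why agreeing on the set of commands is not enough: the servers must also agree on their order, which is what the consensus module provides.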

Outline

- Admin and recap
- Controller framework supporting programmable networks
  o architecture
  o data model and operations: OpenDaylight as an example
  o distributed data store

Roadmap Today

Two general approaches to achieving replicated log (consensus in general):
- Symmetric, leader-less:
  o All servers have equal roles
  o Clients can contact any server
  o We will see Basic Paxos as an example
- Asymmetric, leader-based:
  o At any given time, one server is in charge, others accept its decisions
  o Clients communicate with the leader
  o We will see Raft, which uses a leader
  o OpenDaylight is based on Raft

Outline

- Admin and recap
- Controller framework supporting programmable networks
  o architecture
  o data model and operations: OpenDaylight as an example
  o distributed data store
    • overview
    • basic Paxos

- Leslie Lamport, 1989
- Nearly synonymous with consensus

Warning

“The dirty little secret of the NSDI community is that at most five people really, truly understand every part of Paxos ;-).”
- NSDI reviewer

“There are significant gaps between the description of the Paxos algorithm and the needs of a real-world system ... the final system will be based on an unproven protocol.”
- Chubby authors

The Paxos Approach

- Basic Paxos (“single decree”):
  o One or more servers propose values (what is a value in our context?)
  o System must agree on a single value (what is chosen in our context?)
    • Only one value is ever chosen
- Multi-Paxos:
  o Combine several instances of Basic Paxos to agree on a series of values forming the log

Requirements for Basic Paxos

- Safety:
  o Only a single value may be chosen
  o A server never learns that a value has been chosen unless it really has been
- Liveness (as long as majority of servers up and communicating with reasonable timeliness):
  o Some proposed value is eventually chosen
  o If a value is chosen, servers eventually learn about it

Paxos Components

- Proposers:
  o Handle client requests
  o Active: put forth particular values to be chosen
- Acceptors:
  o Passive: respond to messages from proposers
  o Responses represent votes that form consensus
  o Store chosen value, state of the decision process
- For this class:
  o Each Paxos server contains both components

Outline

- Admin and recap
- Controller framework supporting programmable networks
  o architecture
  o data model and operations: OpenDaylight as an example
  o distributed data store
    • overview
    • Paxos
      – basic Paxos
        » structure
        » initial design options

Design I: A Single Acceptor

- A single acceptor chooses value
- Problem:
  o What if acceptor crashes after choosing?
- Solution:
  o Quorum: multiple acceptors (3, 5, ...)
  o As long as majority of acceptors available, system can make progress

[Figure: Proposers send the values add, jmp, shl, and sub to a single Acceptor, which chooses jmp.]
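The quorum sizes mentioned above follow from simple arithmetic: a majority of n acceptors is floor(n/2) + 1, and the group keeps making progress while at most n minus that many acceptors are down. A minimal sketch, with hypothetical class and method names:

```java
final class Quorum {
    // Smallest number of acceptors that forms a majority of n.
    static int majority(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("need at least one acceptor");
        }
        return n / 2 + 1;
    }

    // Number of acceptor crashes the group survives while a majority is still reachable.
    static int tolerableFailures(int n) {
        return n - majority(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 3, 4, 5}) {
            System.out.println(n + " acceptors: majority=" + majority(n)
                    + ", tolerable failures=" + tolerableFailures(n));
        }
    }
}
```

With a single acceptor no failure is tolerated, which is the problem Design I runs into. Going from 3 to 4 acceptors does not help (both tolerate one failure), which is why odd sizes such as 3 and 5 are the usual choice.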

Design II

- Design decision: which value to accept?
  o Design: Acceptor accepts only first value it receives
- Problem:
  o Split votes: if simultaneous proposals, no value might have majority

=> An Acceptor must be able to sometimes “change mind” to accept multiple (different) values

[Figure: timeline for servers s1-s5. Three simultaneous proposals accept?(red), accept?(blue), and accept?(green) result in accepted(red) twice, accepted(blue) twice, and accepted(green) once, so no value has a majority.]

Design III

- Design decision: which value to accept?
  o Design: Acceptor accepts every value it receives
- Problem:
  o Conflicting choices

Solution: If a value has been chosen => future proposals propose/choose that same value

[Figure: timeline for servers s1-s5. accept?(red) collects three accepted(red) responses ("Red Chosen"); accept?(blue) collects three accepted(blue) responses ("Blue Chosen"), so two different values end up chosen.]

Outline

- Admin and recap
- Controller framework supporting programmable networks
  o architecture
  o data model and operations: OpenDaylight as an example
  o distributed data store
    • overview
    • Paxos
      – basic Paxos
        » structure
        » initial design options
        » design details

Basic Paxos Protocol Structure

Two-phase approach:
- Phase 1: broadcast Prepare RPCs
  o Find out about any chosen values
  o Block older proposals that have not yet completed
- Phase 2: broadcast Accept RPCs
  o Ask acceptors to accept a specific value

Proposal Numbers

- Each proposal has a unique number
  o Higher numbers take priority over lower numbers
  o It must be possible for a proposer to choose a new proposal number higher than anything it has seen/used before
- One simple approach:
  o Each server stores maxRound: the largest Round Number it has seen so far
  o To generate a new proposal number:
    • Increment maxRound
    • Concatenate with Server Id
  o Proposers must persist maxRound on disk: must not reuse proposal numbers after crash/restart

[Figure: a Proposal Number is the Round Number followed by the Server Id.]
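The "increment maxRound, concatenate with Server Id" scheme can be sketched as a pair compared by round first and server id second. The class names below are hypothetical, and the required persistence of `maxRound` is only noted in a comment rather than implemented.

```java
// Proposal number = (round, serverId), ordered by round first, then by server id.
final class ProposalNumber implements Comparable<ProposalNumber> {
    final long round;
    final int serverId;

    ProposalNumber(long round, int serverId) {
        this.round = round;
        this.serverId = serverId;
    }

    @Override
    public int compareTo(ProposalNumber other) {
        int byRound = Long.compare(round, other.round);
        return byRound != 0 ? byRound : Integer.compare(serverId, other.serverId);
    }

    @Override
    public String toString() {
        return round + "." + serverId;   // same notation as "3.1" and "4.5" on the slides
    }
}

final class ProposalNumberGenerator {
    private final int serverId;
    // Largest round seen so far. A real proposer must persist this to disk before
    // using a new number; this in-memory sketch skips that step.
    private long maxRound;

    ProposalNumberGenerator(int serverId, long persistedMaxRound) {
        this.serverId = serverId;
        this.maxRound = persistedMaxRound;
    }

    // Record a round number seen in a message from another server.
    void observe(ProposalNumber seen) {
        maxRound = Math.max(maxRound, seen.round);
    }

    // Increment maxRound and concatenate with this server's id.
    ProposalNumber next() {
        maxRound++;
        return new ProposalNumber(maxRound, serverId);
    }
}

final class ProposalNumberDemo {
    public static void main(String[] args) {
        ProposalNumberGenerator s1 = new ProposalNumberGenerator(1, 2);
        ProposalNumber first = s1.next();
        System.out.println(first);                          // 3.1

        ProposalNumberGenerator s5 = new ProposalNumberGenerator(5, 0);
        s5.observe(first);
        ProposalNumber second = s5.next();
        System.out.println(second);                         // 4.5
        System.out.println(second.compareTo(first) > 0);    // true
    }
}
```

Because the server id breaks ties, two servers that pick the same round (for example 3.1 and 3.5) still produce distinct, totally ordered numbers.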

Basic Paxos Protocol

Proposers
1) Choose new proposal number n
2) Broadcast Prepare(n) to all servers
4) When responses received from majority:
   o If any acceptedValues returned, replace value with acceptedValue for highest acceptedProposal
5) Broadcast Accept(n, value) to all servers
7) When responses received from majority:
   o Any rejections (result > n)? goto (1)
   o Otherwise, value is chosen

Acceptors
3) Respond to Prepare(n):
   o If n > minProposal then minProposal = n
   o Return(acceptedProposal, acceptedValue)
6) Respond to Accept(n, value):
   o If n ≥ minProposal then
       acceptedProposal = minProposal = n
       acceptedValue = value
   o Return(minProposal)

Acceptors must record minProposal, acceptedProposal, and acceptedValue on stable storage (disk)
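The acceptor and proposer rules above can be exercised in a single process. The following is a minimal sketch under stated assumptions: RPCs are replaced by direct method calls, stable storage is replaced by in-memory fields, and a proposal number "round.serverId" is encoded as the long round * 10 + serverId (so 3.1 becomes 31), which only works for server ids below 10. All class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Reply to Prepare: highest accepted proposal (0 = none) and its value (null = none).
final class PrepareReply {
    final long acceptedProposal;
    final String acceptedValue;

    PrepareReply(long acceptedProposal, String acceptedValue) {
        this.acceptedProposal = acceptedProposal;
        this.acceptedValue = acceptedValue;
    }
}

final class Acceptor {
    // The protocol requires these on stable storage; kept in memory in this sketch.
    private long minProposal = 0;
    private long acceptedProposal = 0;
    private String acceptedValue = null;

    // Acceptor rule for Prepare(n).
    PrepareReply prepare(long n) {
        if (n > minProposal) {
            minProposal = n;
        }
        return new PrepareReply(acceptedProposal, acceptedValue);
    }

    // Acceptor rule for Accept(n, value); the caller treats a result > n as a rejection.
    long accept(long n, String value) {
        if (n >= minProposal) {
            acceptedProposal = minProposal = n;
            acceptedValue = value;
        }
        return minProposal;
    }
}

final class Proposer {
    // One attempt with proposal number n against the reachable acceptors.
    // Returns the chosen value, or null if the attempt must restart with a higher n.
    static String propose(long n, String value, List<Acceptor> reachable, int clusterSize) {
        if (reachable.size() < clusterSize / 2 + 1) {
            return null;                       // no majority reachable
        }
        long highest = 0;
        for (Acceptor a : reachable) {         // Phase 1: Prepare
            PrepareReply r = a.prepare(n);
            if (r.acceptedValue != null && r.acceptedProposal > highest) {
                highest = r.acceptedProposal;
                value = r.acceptedValue;       // adopt value of highest acceptedProposal
            }
        }
        for (Acceptor a : reachable) {         // Phase 2: Accept
            if (a.accept(n, value) > n) {
                return null;                   // rejected by a newer proposal
            }
        }
        return value;
    }
}

final class BasicPaxosDemo {
    public static void main(String[] args) {
        List<Acceptor> all = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            all.add(new Acceptor());
        }
        // s1 proposes X with number 3.1 to three acceptors: X is chosen.
        System.out.println(Proposer.propose(31, "X", all.subList(0, 3), 5));
        // s5 proposes Y with number 4.5 to an overlapping majority: it finds X and uses it.
        System.out.println(Proposer.propose(45, "Y", all.subList(2, 5), 5));
        // A stale proposal 3.1 sent to acceptors that promised 4.5 is rejected (null).
        System.out.println(Proposer.propose(31, "Z", all.subList(2, 5), 5));
    }
}
```

The second call works because any two majorities of five acceptors share at least one member, so the later proposer always sees the already chosen value during Prepare.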

Basic Paxos Protocol: Exercise

- Basic setting: single proposer
  o s1 proposes X

[Figure: empty timeline for servers s1-s5; s1 starts with value X.]

Question

- Q: Who knows that a value has been chosen?
- Q: If other servers want to know the chosen value, what should they do?

Basic Paxos Protocol: Exercise

- Setting: a new proposal arrives after a value is already chosen

[Figure: timeline for servers s1-s5 with initial values X and Y. Three servers receive P 3.1 followed by A 3.1 X; three servers then receive P 4.5 followed by A 4.5 X. Legend: "P 3.1" means "Prepare proposal 3.1 (from s1)"; "A 4.5 X" means "Accept proposal 4.5 with value X (from s5)".]

Basic Paxos Protocol: Race Condition I

- Previous value not chosen, but new proposer sees it:
  o New proposer will use existing value
  o Both proposers can succeed

[Figure: timeline for servers s1-s5 with initial values X and Y. Messages shown: P 3.1 (three servers), A 3.1 X (three servers), P 4.5 (three servers), A 4.5 X (three servers).]

Basic Paxos Protocol: Race Condition II

- Previous value not chosen, new proposer doesn’t see it:
  o New proposer chooses its own value
  o Older proposal blocked

[Figure: timeline for servers s1-s5 with initial values X and Y. Messages shown: P 3.1 (three servers), A 3.1 X (three servers), P 4.5 (three servers), A 4.5 Y (three servers).]

Summary of Cases

- Three possibilities when later proposal prepares:
  1. Previous value already chosen:
     • New proposer will find it and use it
  2. Previous value not chosen, but new proposer sees it:
     • New proposer will use existing value
     • Both proposers can succeed
  3. Previous value not chosen, new proposer doesn’t see it:
     • New proposer chooses its own value
     • Older proposal blocked

Liveness

- Competing proposers can livelock:

[Figure: timeline for servers s1-s5. Messages shown, each to three servers: P 3.1, P 3.5, A 3.1 X, P 4.1, A 3.5 Y, P 5.5, A 4.1 X. The two proposers keep issuing higher-numbered Prepares, so neither proposer's Accepts succeed.]

- Potential solutions:
  o randomized delay before restarting, give other proposers a chance to finish choosing
  o use leader election instead
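The "randomized delay before restarting" option could look like the sketch below. The uniform distribution, the upper bound parameter, and the class and method names are assumptions; the slides do not prescribe how the delay is drawn.

```java
import java.util.Random;

final class RetryDelay {
    // Uniform random delay in [0, maxDelayMillis) to wait before restarting
    // the protocol with a new proposal number. The distribution and the
    // bound are illustrative assumptions.
    static long nextDelayMillis(Random rng, long maxDelayMillis) {
        if (maxDelayMillis <= 0) {
            throw new IllegalArgumentException("maxDelayMillis must be positive");
        }
        return (long) (rng.nextDouble() * maxDelayMillis);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // Two competing proposers draw independent delays, so one of them is
        // likely to finish both phases before the other restarts.
        System.out.println("proposer A waits " + nextDelayMillis(rng, 50) + " ms");
        System.out.println("proposer B waits " + nextDelayMillis(rng, 50) + " ms");
    }
}
```

Randomization only makes livelock unlikely; it does not rule it out, which is one motivation for the leader-based alternative.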

Outline

- Admin and recap
- Controller framework supporting programmable networks
  o architecture
  o data model and operations: OpenDaylight as an example
  o distributed data store
    • overview
    • Paxos
      – basic Paxos
      – multi-Paxos

Multi-Paxos

- Separate instance of Basic Paxos for each entry in the log:
  o Add index argument to Prepare and Accept (selects entry in log)

1. Client sends command to server
2. Server uses Paxos to choose command as value for a log entry
3. Server waits for previous log entries to be applied, then applies new command to state machine
4. Server returns result from state machine to client

[Figure: a Client sends the command shl to a Server containing a Consensus Module, a Log (add, jmp, mov, shl), and a State Machine; the Consensus Module communicates with Other Servers.]

Multi-Paxos Issues

- Which log entry to use for a given client request?
- Performance optimizations:
- Ensuring full replication
- Client protocol
- Configuration changes

Note: Multi-Paxos not specified precisely in literature

Selecting Log Entries

- When request arrives from client:
  o Find first log entry not known to be chosen
  o Run Basic Paxos to propose client’s command for this index
  o Prepare returns acceptedValue?
    • Yes: finish choosing acceptedValue, start again
    • No: choose client’s command

[Figure: two panels, "Logs Before" and "Logs After", showing log entries 1-7 on servers s1, s2, and s3 (commands mov, add, cmp, sub, ret, shl) while the client command jmp is placed; entries marked "Known Chosen" are highlighted.]
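The entry-selection loop can be sketched as follows. Basic Paxos itself is abstracted behind a hypothetical function `(index, command) -> chosen value`, indices are 0-based here (the figure uses 1-based), and the sketch assumes client commands are unique so that "my command was chosen" can be detected by equality.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

final class LogEntrySelector {
    // chosen.get(i) is the value known to be chosen for index i, or null if unknown.
    private final List<String> chosen = new ArrayList<>();
    // Hypothetical stand-in for one Basic Paxos instance per log index:
    // given (index, proposed command), returns the value that ends up chosen.
    private final BiFunction<Integer, String, String> basicPaxos;

    LogEntrySelector(BiFunction<Integer, String, String> basicPaxos) {
        this.basicPaxos = basicPaxos;
    }

    void markChosen(int index, String value) {
        while (chosen.size() <= index) {
            chosen.add(null);
        }
        chosen.set(index, value);
    }

    private int firstNotKnownChosen() {
        int i = 0;
        while (i < chosen.size() && chosen.get(i) != null) {
            i++;
        }
        return i;
    }

    // Returns the index at which the client's command was chosen.
    int submit(String command) {
        while (true) {
            int index = firstNotKnownChosen();
            String result = basicPaxos.apply(index, command);
            markChosen(index, result);
            if (result.equals(command)) {
                return index;          // no earlier accepted value: our command was chosen
            }
            // An earlier accepted value was chosen for this index: start again.
        }
    }

    List<String> log() {
        return chosen;
    }

    public static void main(String[] args) {
        // Values some acceptor had already accepted for indices 2 and 3 (assumed data).
        Map<Integer, String> accepted = new HashMap<>();
        accepted.put(2, "cmp");
        accepted.put(3, "sub");
        LogEntrySelector server = new LogEntrySelector((i, c) -> accepted.getOrDefault(i, c));
        server.markChosen(0, "mov");
        server.markChosen(1, "add");
        System.out.println(server.submit("jmp"));   // 4
        System.out.println(server.log());           // [mov, add, cmp, sub, jmp]
    }
}
```

As a side effect of placing its own command, the server finishes choosing the values that earlier proposers left incomplete.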

Selecting Log Entries, cont’d

- Servers can handle multiple client requests concurrently:
  o Select different log entries for each
- Must apply commands to state machine in log order
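When entries are chosen concurrently they may become known out of order, so the server has to hold back later entries until every earlier one is chosen. A minimal sketch, with hypothetical names and with the state machine reduced to a list of applied commands:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class InOrderApplier {
    // Chosen entries that cannot be applied yet because an earlier index is missing.
    private final TreeMap<Integer, String> pending = new TreeMap<>();
    // Stand-in for the state machine: the commands applied so far, in order.
    private final List<String> applied = new ArrayList<>();
    private int nextToApply = 0;

    // Called whenever a command becomes known to be chosen for an index (any order).
    void onChosen(int index, String command) {
        if (index < nextToApply) {
            return;                                   // already applied
        }
        pending.put(index, command);
        // Apply the longest contiguous run of entries that is now available.
        while (pending.containsKey(nextToApply)) {
            applied.add(pending.remove(nextToApply));
            nextToApply++;
        }
    }

    List<String> applied() {
        return applied;
    }

    public static void main(String[] args) {
        InOrderApplier server = new InOrderApplier();
        server.onChosen(1, "add");
        System.out.println(server.applied());   // []
        server.onChosen(0, "mov");
        System.out.println(server.applied());   // [mov, add]
        server.onChosen(3, "ret");
        server.onChosen(2, "cmp");
        System.out.println(server.applied());   // [mov, add, cmp, ret]
    }
}
```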

Leaderless Multi-Paxos Discussion

- What are the inefficiencies of concurrent, leaderless Multi-Paxos?
  o With multiple concurrent proposers, conflicts and restarts are likely (higher load → more conflicts)
  o 2 rounds of RPCs for each value chosen (Prepare, Accept)

Prepare for Next Class

- Stare at Paxos and play with examples to better understand it
- Read Raft
- Start to think about projects