OPODIS 05 Reconfigurable Distributed Storage for Dynamic Networks Gregory Chockler, Seth Gilbert, Vincent Gramoli, Peter M Musial, Alexander A Shvartsman

Dec 19, 2015

Page 1:

OPODIS 05

Reconfigurable Distributed Storage for Dynamic Networks

Gregory Chockler, Seth Gilbert, Vincent Gramoli, Peter M Musial, Alexander A Shvartsman

Page 2:

Goals

Reconfigurable Distributed Storage (RDS)
• Atomic consistency (read/write)
• Fault tolerance

…in dynamic and asynchronous systems.

Page 3:

Distributed Storage

Page 4:

Distributed Storage

Data is replicated at several network locations

Page 5:

Distributed Storage

Write

Read

Operation policy

Page 6:

…in Dynamic Networks

Page 7:

Distributed Storage in Dynamic Networks

Page 8:

Distributed Storage in Dynamic Networks

[Figure labels: leaving nodes, joining nodes]

Page 9:

Distributed Storage in Dynamic Networks

Page 10:

Distributed Storage in Dynamic Networks

…requires a reconfiguration process.

Page 11:

Distributed Storage in Dynamic Networks

…by achieving agreement.

Page 12:

Model

• Distributed
– Connected set of processors
– Each processor has a unique id i ∈ I
– MWMR (multi-writer, multi-reader): any processor is a potential client
• Asynchronous
– Asynchronous processors
– Point-to-point asynchronous unreliable channels
• Dynamic
– Processors join and leave the system
– Processors may crash

Page 13:

What is a configuration?

• Configuration = <members, read-quorums, write-quorums>
– members is a set of processors,
– read-quorums and write-quorums are two sets of quorums; for every RQ ∈ read-quorums and WQ ∈ write-quorums:
  • RQ ⊆ members
  • WQ ⊆ members
  • RQ ∩ WQ ≠ ∅ (only within a given configuration)
• Every client maintains a set of configurations, initially containing the default one.
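
The quorum constraints on this slide can be checked mechanically. The following is a minimal sketch, not the authors' code: the names `Configuration`, `is_valid`, and `majority_configuration` are hypothetical, and processor ids are assumed to be integers.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, Iterable

Quorum = FrozenSet[int]  # assumption: processor ids are integers


@dataclass(frozen=True)
class Configuration:
    """Hypothetical container for <members, read-quorums, write-quorums>."""
    members: FrozenSet[int]
    read_quorums: FrozenSet[Quorum]
    write_quorums: FrozenSet[Quorum]

    def is_valid(self) -> bool:
        # Every read quorum and write quorum must be a subset of members.
        quorums = self.read_quorums | self.write_quorums
        if not all(q <= self.members for q in quorums):
            return False
        # Every read quorum must intersect every write quorum.
        return all(rq & wq for rq in self.read_quorums for wq in self.write_quorums)


def majority_configuration(members: Iterable[int]) -> Configuration:
    """Use every minimal majority of the members as both a read and a write quorum."""
    ms = frozenset(members)
    size = len(ms) // 2 + 1
    majorities = frozenset(frozenset(c) for c in combinations(sorted(ms), size))
    return Configuration(ms, majorities, majorities)
```

Any two majorities of the same member set share at least one processor, so `majority_configuration({1, 2, 3}).is_valid()` holds, whereas a configuration with read quorum {1, 2} and write quorum {3, 4} is rejected.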

Page 14:

Single Object Operations Overview

After [ABD95]
• tag = <c,i> ∈ N × I, val a possible value

• val = Read()i:
  (<c,j>,val) = query(); [prop(<c,j>,val);]

• Write(val)i:
  (<c’,j>,val’) = query(); prop(<c’+1,i>,val);

1. (tag,val) ← query(NULL): gathers the (tag,val) pairs of all processors of an RQ and returns the pair with the largest tag.

2. NULL ← prop(tag,val): updates the (tag,val) pairs at all processors of a WQ.

Write tag

Read tag
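
The two-phase read and write can be sketched in a few lines. This is only an illustration under strong assumptions: replicas live in one process, calls are synchronous, nothing fails, and the quorums are passed in explicitly. The names `Replica`, `read`, and `write` are hypothetical; `query` and `prop` follow the slide.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Tuple

Tag = Tuple[int, int]  # <c, i>: counter, then processor id; tuples compare lexicographically


@dataclass
class Replica:
    tag: Tag = (0, 0)
    val: Any = None


def query(replicas: Dict[int, Replica], read_quorum: Iterable[int]) -> Tuple[Tag, Any]:
    """Gather (tag, val) from every processor of a read quorum; return the pair with the largest tag."""
    best = max((replicas[p] for p in read_quorum), key=lambda r: r.tag)
    return best.tag, best.val


def prop(replicas: Dict[int, Replica], write_quorum: Iterable[int], tag: Tag, val: Any) -> None:
    """Update (tag, val) at every processor of a write quorum, never moving a replica backwards."""
    for p in write_quorum:
        if tag > replicas[p].tag:
            replicas[p].tag, replicas[p].val = tag, val


def read(replicas, read_quorum, write_quorum) -> Any:
    tag, val = query(replicas, read_quorum)
    prop(replicas, write_quorum, tag, val)  # the bracketed (optional) prop step on the slide
    return val


def write(replicas, read_quorum, write_quorum, i: int, val: Any) -> None:
    (c, _j), _old = query(replicas, read_quorum)
    prop(replicas, write_quorum, (c + 1, i), val)  # new tag: incremented counter, writer's id


replicas = {p: Replica() for p in (1, 2, 3)}
write(replicas, {1, 2}, {2, 3}, i=1, val="x")
result = read(replicas, {1, 3}, {1, 2})
```

Here the read quorum {1, 3} overlaps the write quorum {2, 3} at processor 3, which is why the read observes the written value.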

Page 15:

Reconfiguration Design Goals

• Sound
– Totally ordered configurations
• Flexible
– No dependencies between configurations
• Non-intrusive
– Allows concurrent read/write operations
• Fast
– Strengthening fault tolerance

Page 16:

Decoupling Reconfiguration

• Reconfiguration = Replacing Configurations
– {I} Installing a new configuration
– {R} Removing old configuration(s)
• If {R} ≺ {I}: operations are delayed
• If {I} ≺ {R}: a stronger configuration viability assumption is required

Page 17:

Solution

({R} ≺ {I}) ({I} ≺ {R})

{I} // {R}

Tighter coupling between removal and installation

Page 18:

RDS Reconfiguration

• Reconfiguration is based on Paxos (a three-phase, leader-based consensus algorithm)
• l is the leader
• c is the current configuration
• configs is the set of active configurations
• A ballot has a unique identifier b and a value v, which is a configuration
• Paxos phases:
– Prepare: l creates a new ballot and chooses/gets the value to propose.
– Propose: l proposes <b,v> and gathers votes from a majority.
– Propagate: l propagates the decision.
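
As a rough illustration of the Prepare phase, the sketch below creates a larger ballot and picks the value to propose using the standard Paxos rule (adopt the value of the highest ballot already voted for, otherwise propose the requested configuration). The slide only says a ballot has a unique identifier; modelling it as a (counter, leader id) pair is an assumption, and `BallotId`, `next_ballot`, and `choose_value` are hypothetical names.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class BallotId(NamedTuple):
    counter: int
    leader: int  # assumption: ties between equal counters are broken by leader id


def next_ballot(seen: Iterable[BallotId], leader: int) -> BallotId:
    """Create a ballot identifier larger than every identifier seen so far."""
    highest = max((b.counter for b in seen), default=0)
    return BallotId(highest + 1, leader)


def choose_value(replies: Iterable[Optional[Tuple[BallotId, str]]], requested: str) -> str:
    """Pick the configuration to propose from the prepare-phase replies.

    Each reply is None (no earlier vote) or (ballot voted in, configuration voted for).
    """
    voted = [r for r in replies if r is not None]
    if not voted:
        return requested  # nothing was voted for earlier: propose the requested configuration
    return max(voted, key=lambda r: r[0])[1]  # adopt the value of the highest earlier ballot


b = next_ballot([BallotId(3, 1), BallotId(5, 2)], leader=1)
v = choose_value([None, (BallotId(2, 1), "c2"), (BallotId(4, 2), "c4")], requested="c_new")
```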

Page 19:

RDS Reconfiguration

[Diagram: leader l, quorums RQ and WQ; request Recon(c,c’)]

Page 20:

RDS Reconfiguration

[Diagram: leader l, quorums RQ and WQ; request Recon(c,c’)]

Prepare phase
• Creates a new larger ballot b

Page 21:

RDS Reconfiguration

[Diagram: leader l, quorums RQ and WQ; request Recon(c,c’)]

Prepare phase
Messages shown: <1a, b>

Page 22:

RDS Reconfiguration

[Diagram: leader l, quorums RQ and WQ; request Recon(c,c’)]

Prepare phase
Messages shown: <1a, b>; <1b, b, configs, <b’’, c’’>>
• Updates its ballot’s value v with the one received
• Updates its configs set

Page 23:

RDS Reconfiguration

[Diagram: leader l, quorums RQ and WQ; request Recon(c,c’)]

Propose phase
Messages shown: <1a, b>; <1b, b, configs, <b’’, c’’>>; <2a, b, c, v>

Page 24:

RDS Reconfiguration

[Diagram: leader l, quorums RQ and WQ; request Recon(c,c’)]

Propose phase
Messages shown: <1a, b>; <1b, b, configs, <b’’, c’’>>; <2a, b, c, v>; <2b, b, c, v, tag, val>
• Update their tag and val
• Add v to their configs set

Page 25:

RDS Reconfiguration

[Diagram: leader l, quorums RQ and WQ; request Recon(c,c’)]

Propagation phase
Messages shown: <1a, b>; <1b, b, configs, <b’’, c’’>>; <2a, b, c, v>; <2b, b, c, v, tag, val>; <3a, c, v, tag, val>
• Update their tag and val
• Remove configuration c from their configs set
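
The local effect of the <2b, …> and <3a, …> messages on a receiving node, as described on the last two slides, can be sketched as follows. This covers only the state updates named there (tag, val, and the configs set); message transport, ballots, and vote counting are left out. `NodeState`, `on_2b`, and `on_3a` are hypothetical names, and configurations are represented by plain strings for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Set, Tuple


@dataclass
class NodeState:
    tag: Tuple[int, int] = (0, 0)
    val: Any = None
    configs: Set[str] = field(default_factory=set)  # active configurations

    def _adopt(self, tag: Tuple[int, int], val: Any) -> None:
        # Keep only the most recent (tag, val) pair.
        if tag > self.tag:
            self.tag, self.val = tag, val

    def on_2b(self, v: str, tag: Tuple[int, int], val: Any) -> None:
        """Propose phase: update tag and val, add the new configuration v."""
        self._adopt(tag, val)
        self.configs.add(v)

    def on_3a(self, c: str, v: str, tag: Tuple[int, int], val: Any) -> None:
        """Propagation phase: update tag and val, remove the old configuration c."""
        self._adopt(tag, val)
        self.configs.discard(c)


node = NodeState(configs={"c"})
node.on_2b("c2", (1, 1), "x")
configs_after_propose = set(node.configs)
node.on_3a("c", "c2", (0, 5), "stale")
```

Between the two messages the node holds both `c` and `c2` as active, which is what lets installation and removal proceed as part of one reconfiguration instance.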

Page 26:

Proving Atomicity

• Ordering configurations
Theorem 1: The set of installed configurations in the system is totally ordered.

• Ordering operations
Theorem 2: If operation 1 precedes operation 2, then 1’s tag is not larger than 2’s tag.

Page 27:

Additional Assumptions

• Eventual stabilization with
– Unique leader l
– Message delay bound d (unknown to the algorithm)
– Gossip with frequency d
– Restricted reconfiguration rate
– Some quorums remain alive in active configurations

[Timeline labels: ts, 2d, tl]

ts: system stabilization time
tl: algorithm stabilization time
Let tr be the request time.

Page 28:

Reconfiguration Latency

Worst case scenario: the last reconfiguration was done by a different leader.

[Timeline from max(tl, tr) to te: Prepare (2d), Propose (2d), Propagate (d); total 5d]

te: end time, reconfiguration is complete

Page 29:

Reconfiguration Latency

Other cases: the leader made the previous reconfiguration.

[Timeline from max(tl, tr) to te: Propose (2d), Propagate (d); total 3d]

te: end time, reconfiguration is complete
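
Summing the per-phase delays from the two latency timelines gives the totals stated on the slides. The symbols T_worst and T_other are introduced here only for this worked sum, and the last line reads the timelines as an upper bound on te, which is an interpretation.

```latex
\begin{align*}
T_{\text{worst}} &= \underbrace{2d}_{\text{Prepare}} + \underbrace{2d}_{\text{Propose}} + \underbrace{d}_{\text{Propagate}} = 5d,\\
T_{\text{other}} &= \underbrace{2d}_{\text{Propose}} + \underbrace{d}_{\text{Propagate}} = 3d,\\
t_e &\le \max(t_l, t_r) + T, \qquad T \in \{T_{\text{worst}},\, T_{\text{other}}\}.
\end{align*}
```

The 2d saved in the second case is the Prepare round trip, which a leader that made the previous reconfiguration does not need to repeat.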

Page 30:

Operation Latency

Phase latency:
• 2d is sufficient for the phase round trip.
• In some cases (pending reconfiguration), the phase might be delayed twice.

Operation latency:
• Operations are bounded by 8d.
• In some cases, the propagation phase of the read operation can be skipped, leading to a possible bound of 2d.

[Timeline: 1st round trip (2d), new configuration discovered, 2nd round trip (2d)]
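
One way to read the 8d bound, assuming that an operation has two phases (query and propagation) and that a phase takes at most two round trips of 2d each when a new configuration is discovered in the middle of it:

```latex
\begin{align*}
T_{\text{phase}} &\le \underbrace{2d}_{\text{1st round trip}} + \underbrace{2d}_{\text{2nd round trip}} = 4d,\\
T_{\text{operation}} &\le \underbrace{4d}_{\text{query}} + \underbrace{4d}_{\text{propagation}} = 8d.
\end{align*}
```

Under the same reading, a read that skips its propagation phase and needs only one round trip completes within 2d.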

Page 31:

Experimental Results

• IOA specification translated to Java code, following a set of rules.
• Implementation of the Attiya, Bar-Noy, and Dolev algorithm (“ABD”, without reconfiguration) and of RDS, which shares parts of the ABD code.
• Using majority-based configurations.
• Measuring operation latency
1. While varying configuration size
2. While varying algorithm instances

Page 32:

Experimental Results

• Operation latency of RDS is competitive with ABD, confirming the theory.

• Reconfiguration messages contain operation information which might accelerate operations in RDS.

Page 33:

Conclusion

• RDS, Reconfigurable Distributed Storage.
• With sound, flexible, non-intrusive and fast reconfiguration.
• It solves two problems in one: configuration replacement and consensus.
• Reconfiguration is inexpensive (time).
• Fault tolerance is strengthened.
• RAMBO can become more aggressive: it is exactly what we did here!