Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms CS 249 Project Fall 2005 Wing Wong

Feb 22, 2016


Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms

CS 249 Project, Fall 2005

Wing Wong

Outline
- Introduction
- Asynchronous distributed systems, distributed computations, consistency
- Two different strategies to construct global states
  - Monitor passively observes the system (reactive architecture)
  - Monitor actively interrogates the system (snapshot protocol)
- Properties of global predicates
- Sample applications: deadlock detection and debugging

Introduction
- global state = union of the local states of the individual processes
- many problems in distributed computing require:
  - construction of a global state, and
  - evaluation of whether that state satisfies some predicate Φ
- difficulties:
  - uncertainties in message delays
  - uncertainties in the relative speeds of computations
- as a result, the global state obtained can be obsolete, incomplete, or inconsistent

Distributed Systems
- collection of sequential processes $p_1, p_2, \ldots, p_n$
- unidirectional communication channels between pairs of processes
- channels are reliable
- messages may be delivered out of order
- network is strongly connected (not necessarily completely connected)

Asynchronous Distributed Systems
- no bounds on relative process speeds
- no bounds on message delays
- no synchronized local clocks
- communication is the only possible mechanism for synchronization

Distributed Computations
- a distributed program is executed by a collection of processes
- each process executes a sequence of events
- communication happens through the events send(m) and receive(m), where m is the message identifier

Distributed Computations
- $h_i = e_i^1 e_i^2 \ldots$: local history of process $p_i$
  - canonical enumeration
  - total order imposed by sequential execution
- $h_i^k = e_i^1 e_i^2 \ldots e_i^k$: initial prefix of $h_i$ containing the first $k$ events
- $H = h_1 \cup \cdots \cup h_n$: global history containing all events
  - does not specify the relative timing between events

Distributed Computations
- to order events, define a binary relation “→” that captures “cause and effect”:
  - $e \to e'$ if and only if $e$ “causally precedes” $e'$
- concurrent events: neither $e \to e'$ nor $e' \to e$; write $e \parallel e'$
- distributed computation = partially ordered set defined by $(H, \to)$

Distributed Computations
[Figure: example computation in which $e_2^1 \to e_3^6$ and $e_2^2 \parallel e_3^6$]

Global States, Cuts and Runs
- $\sigma_i^k$: local state of process $p_i$ after event $e_i^k$
- $\Sigma = (\sigma_1, \ldots, \sigma_n)$: global state of the distributed computation, an n-tuple of local states
- cut $C = h_1^{c_1} \cup \cdots \cup h_n^{c_n}$, also written $(c_1, \ldots, c_n)$: a subset of the global history $H$

Global States, Cuts and Runs
- $(\sigma_1^{c_1}, \ldots, \sigma_n^{c_n})$: global state corresponding to cut $C$
- $(e_1^{c_1}, \ldots, e_n^{c_n})$: frontier of cut $C$, the set of last events
- run: a total ordering $R$ that includes all events in the global history and is consistent with each local history

Global States, Cuts and Runs
- cut $C = (5,2,4)$; cut $C' = (3,2,6)$
- a consistent run: $R = e_3^1 e_1^1 e_3^2 e_2^1 e_3^3 e_3^4 e_2^2 e_1^2 e_3^5 e_1^3 e_1^4 e_1^5 e_3^6 e_2^3 e_1^6$

Consistency
- cut $C$ is consistent if for all events $e$ and $e'$: $(e \in C) \land (e' \to e) \Rightarrow e' \in C$
  - that is, $C$ is closed under the causal precedence relation
- a consistent global state corresponds to a consistent cut
- run $R$ is consistent if, for all events, $e \to e'$ implies that $e$ appears before $e'$ in $R$
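The closure condition can be checked mechanically. The following is a minimal Python sketch, not part of the original slides. It assumes that processes are numbered from 0, that a cut is given as the list $(c_1, \ldots, c_n)$ of prefix lengths, and that each message is described by its send and receive events. The names `Event` and `is_consistent_cut` are illustrative.

```python
from typing import List, Tuple

# An event is (process index, 1-based position in that process's local history).
Event = Tuple[int, int]


def is_consistent_cut(cut: List[int], messages: List[Tuple[Event, Event]]) -> bool:
    """Return True if the cut is closed under causal precedence.

    cut[i] is the number of events of process i included in the cut.
    messages is a list of (send_event, receive_event) pairs.
    """

    def in_cut(event: Event) -> bool:
        process, position = event
        return position <= cut[process]

    # A cut made of local prefixes is already closed under local order,
    # so only message edges can violate closure: a receive without its send.
    return all(in_cut(send) for send, recv in messages if in_cut(recv))


# One message: first event of process 0 is received as second event of process 1.
example_messages = [((0, 1), (1, 2))]
```

Because a cut is built from prefixes of the local histories, it is enough to check each message once: the cut is inconsistent exactly when it contains a receive event without the matching send event.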

Consistency
- run $R = e^1 e^2 \ldots$ results in a sequence of global states $\Sigma^0 \Sigma^1 \Sigma^2 \ldots$
- $\Sigma^i$ is obtained from $\Sigma^{i-1}$ by some process executing event $e^i$; we say $\Sigma^{i-1}$ leads to $\Sigma^i$
- denote the transitive closure of the leads-to relation by $\leadsto_R$
- $\Sigma'$ is reachable from $\Sigma$ in run $R$ iff $\Sigma \leadsto_R \Sigma'$

Lattice of Global States
- lattice = set of all consistent global states, along with the leads-to relation
- $\Sigma^{k_1 \ldots k_n}$ = shorthand for the global state $(\sigma_1^{k_1}, \ldots, \sigma_n^{k_n})$
- $k_1 + \cdots + k_n$ = level of the global state in the lattice

Lattice of Global States
- path = sequence of global states of increasing level (downwards in the figure)
- each path corresponds to a consistent run
- a possible path: $\Sigma^{00}\ \Sigma^{01}\ \Sigma^{11}\ \Sigma^{21}\ \Sigma^{31}\ \Sigma^{32}\ \Sigma^{42}\ \Sigma^{43}\ \Sigma^{44}\ \Sigma^{54}\ \Sigma^{64}\ \Sigma^{65}$

Observing Distributed Computations (reactive architecture)
- processes notify the monitor process $p_0$ whenever they execute an event
- the monitor constructs an observation as the sequence of events corresponding to the notification messages
- problem: the observation may be inconsistent because of variability in notification message delays

Observing Distributed Computations
[Figure not included in the transcript]

Observing Distributed Computations
- any permutation of run $R$ is a possible observation
- what we need: a delivery rule at the monitor process that restores message order
- what we have: First-In-First-Out (FIFO) delivery, implemented with sequence numbers
  - for every source-destination pair $p_i$, $p_j$: $send_i(m) \to send_i(m') \Rightarrow deliver_j(m) \to deliver_j(m')$
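FIFO delivery on top of a channel that may reorder messages can be sketched as follows. This is an illustrative Python sketch under the assumption that each source numbers its messages 0, 1, 2, ...; the class `FifoDelivery` and its method names are not from the slides.

```python
from collections import defaultdict


class FifoDelivery:
    """Delivers messages from each source in the order they were sent."""

    def __init__(self):
        self.next_seq = defaultdict(int)   # next sequence number expected per source
        self.pending = defaultdict(dict)   # source -> {sequence number: payload}
        self.delivered = []                # (source, payload) in delivery order

    def receive(self, source, seq, payload):
        # Buffer the message, then deliver every message that is now in order.
        self.pending[source][seq] = payload
        while self.next_seq[source] in self.pending[source]:
            expected = self.next_seq[source]
            self.delivered.append((source, self.pending[source].pop(expected)))
            self.next_seq[source] += 1
```

A message that arrives early stays in `pending` until every earlier message from the same source has been delivered. Nothing is guaranteed about the order of messages from different sources, which is why FIFO alone does not make observations consistent.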

Delivery Rule 1
- assume:
  - a global real-time clock
  - message delays bounded by δ
- each process includes a timestamp (real-time clock value) when notifying $p_0$ of a local event $e$
- DR1: At time $t$, deliver all received messages with timestamps up to $t - \delta$ in increasing timestamp order
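A minimal Python sketch of DR1 follows. It assumes the monitor is driven by explicit calls to a hypothetical `tick(now)` method that passes in the current global clock value; the class name `DR1Monitor` and the use of a heap are illustrative choices, not part of the slides.

```python
import heapq


class DR1Monitor:
    """Delivers notifications in timestamp order once they are at least delta old."""

    def __init__(self, delta):
        self.delta = delta        # assumed upper bound on message delay
        self.heap = []            # (timestamp, arrival counter, event)
        self.arrivals = 0         # tie-breaker so events are never compared
        self.delivered = []

    def receive(self, timestamp, event):
        heapq.heappush(self.heap, (timestamp, self.arrivals, event))
        self.arrivals += 1

    def tick(self, now):
        # Any message stamped <= now - delta that will ever arrive has arrived.
        while self.heap and self.heap[0][0] <= now - self.delta:
            _, _, event = heapq.heappop(self.heap)
            self.delivered.append(event)
```

The rule is safe because a notification stamped $t - \delta$ or earlier must have reached the monitor by time $t$, so no message with a smaller timestamp can show up after delivery.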

Delivery Rule 1
- let $RC(e)$ denote the value of the global clock when $e$ is executed
- the real-time clock satisfies the Clock Condition: $e \to e' \Rightarrow RC(e) < RC(e')$
- but logical clocks also satisfy the Clock Condition…

Logical Clocks
- event orderings based on increasing clock values
- $LC(e_i)$ denotes the value of the logical clock when $e_i$ is executed by $p_i$
- each sent message $m$ contains a timestamp $TS(m)$
- update rules by $p_i$ at occurrence of $e_i$:
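The update rules appeared as a figure on the slide and did not survive in the transcript. The sketch below uses the standard rules for Lamport's logical clocks, which is an assumption about what the slide showed: increment the clock on an internal or send event, and on a receive event take the maximum of the local clock and the message timestamp before incrementing. The class and method names are illustrative.

```python
class LamportClock:
    """Logical clock of a single process."""

    def __init__(self):
        self.value = 0

    def internal_or_send(self):
        # Internal or send event: advance the clock by one.
        # For a send event, the returned value is the timestamp TS(m).
        self.value += 1
        return self.value

    def receive(self, timestamp):
        # Receive event: jump past both the local clock and the message timestamp.
        self.value = max(self.value, timestamp) + 1
        return self.value
```

With these rules, a send event always has a smaller clock value than the matching receive event, which is what gives the Clock Condition across processes.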

Logical Clocks
[Figure not included in the transcript]

Delivery Rule 2
- replace the real-time clock by a logical clock
- need the gap-detection property:
  - given events $e$, $e'$ where $LC(e) < LC(e')$, determine whether some event $e''$ exists such that $LC(e) < LC(e'') < LC(e')$
- a message $m$ is “stable” at $p$ if no future message with a timestamp smaller than $TS(m)$ can be received by $p$

Delivery Rule 2
- with FIFO, when $p_0$ receives $m$ from $p_i$ with timestamp $TS(m)$, it can be certain that no other message $m'$ from $p_i$ with $TS(m') \le TS(m)$ will arrive later
- message $m$ at $p_0$ is guaranteed stable once $p_0$ has received at least one message with timestamp $> TS(m)$ from every other process
- DR2: Deliver all received messages that are stable at $p_0$ in increasing timestamp order
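The stability test can be sketched in Python as follows. The sketch assumes FIFO channels from every process to the monitor, logical timestamps that start at 1, and a tie-break on the sender name for equal timestamps. The class `DR2Monitor` and its fields are illustrative names.

```python
class DR2Monitor:
    """Delivers messages in timestamp order once they are stable at the monitor."""

    def __init__(self, processes):
        self.latest = {p: 0 for p in processes}  # highest timestamp seen per process
        self.pending = []                        # (timestamp, sender, event)
        self.delivered = []

    def receive(self, sender, timestamp, event):
        self.latest[sender] = max(self.latest[sender], timestamp)
        self.pending.append((timestamp, sender, event))
        self._deliver_stable()

    def _is_stable(self, timestamp, sender):
        # Stable: every other process has already sent something with a larger timestamp.
        return all(self.latest[p] > timestamp for p in self.latest if p != sender)

    def _deliver_stable(self):
        self.pending.sort(key=lambda item: (item[0], item[1]))
        while self.pending and self._is_stable(self.pending[0][0], self.pending[0][1]):
            _, _, event = self.pending.pop(0)
            self.delivered.append(event)
```

A process that stops sending notifications blocks delivery of everything stamped at or above its last timestamp, which is one way DR2 can delay delivery unnecessarily.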

Strong Clock Condition
- DR1 and DR2 conservatively assume $RC(e) < RC(e')$ (or $LC(e) < LC(e')$) $\Rightarrow e \to e'$
- recall that RC and LC guarantee only the Clock Condition: $e \to e' \Rightarrow RC(e) < RC(e')$
- so DR1 and DR2 can unnecessarily delay delivery
- want a timing mechanism TC that gives the Strong Clock Condition: $e \to e' \equiv TC(e) < TC(e')$

Timing Mechanism 1 - Causal Histories
- use the causal history as the “clock” value
- causal history of $e$ = set of all events that causally precede event $e$, together with $e$ itself: $\theta(e) = \{e' \in H \mid e' \to e\} \cup \{e\}$
- $\theta(e)$ is the smallest consistent cut that includes $e$
- projection of $\theta(e)$ on process $p_i$: $\theta_i(e) = \theta(e) \cap h_i$

Timing Mechanism 1 - Causal Histories
[Figure not included in the transcript]

Timing Mechanism 1 - Causal Histories
To maintain causal histories:
- $\theta$ is initially empty
- if $e_i$ is an internal or send event:
  $\theta(e_i) = \{e_i\} \cup \theta(\text{previous local event of } p_i)$
- if $e_i$ is the receive of message $m$ by $p_i$ from $p_j$:
  $\theta(e_i) = \{e_i\} \cup \theta(\text{previous local event of } p_i) \cup \theta(\text{corresponding send event at } p_j)$
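The two maintenance rules translate directly into set operations. The Python sketch below is illustrative: events are represented by arbitrary hashable labels, and the function names are not from the slides.

```python
def internal_or_send(event, previous_history):
    """Causal history of an internal or send event."""
    return frozenset({event}) | previous_history


def receive(event, previous_history, send_history):
    """Causal history of a receive event; send_history travels with the message."""
    return frozenset({event}) | previous_history | send_history


def causally_precedes(history_e, history_f):
    """e -> f exactly when the history of e is a proper subset of the history of f."""
    return history_e < history_f
```

Each message has to carry the full causal history of its send event, so the sets only grow as the computation proceeds.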

Timing Mechanism 1 - Causal Histories
[Figure: example of updating causal histories at a new send event and a new receive event (new events $e_1^5$ and $e_2^3$)]

Timing Mechanism 1 - Causal Histories
- can interpret clock comparison as set inclusion: $e \to e' \equiv \theta(e) \subset \theta(e')$
  (why not set membership, $[e \to e' \equiv e \in \theta(e')]$?)
- unfortunately, causal histories grow too rapidly

Timing Mechanism 2 - Vector Clocks
- note:
  - the projection $\theta_i(e) = h_i^k$ for some unique $k$
  - $e_i^r \in \theta_i(e)$ for all $r < k$
- so a single number $k$ can represent $\theta_i(e)$
- $\theta(e) = \theta_1(e) \cup \cdots \cup \theta_n(e)$
- represent the entire causal history by an n-dimensional vector clock $VC(e)$, where for all $1 \le i \le n$: $VC(e)[i] = k$ if and only if $\theta_i(e) = h_i^k$

Timing Mechanism 2 - Vector Clocks
[Figure not included in the transcript]

Timing Mechanism 2 - Vector Clocks
To maintain the vector clock:
- each process $p_i$ initializes VC to contain all zeros
- update rules by $p_i$ at occurrence of $e_i$ keep the following interpretation:
  - $VC(e_i)[i] \equiv$ number of events $p_i$ has executed up to and including $e_i$
  - $VC(e_i)[j] \equiv$ number of events of $p_j$ that causally precede event $e_i$ of $p_i$
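One common way to maintain a clock with this interpretation is sketched below in Python. The concrete rules are an assumption (the transcript states only what the components count): increment the local component on every event, and on a receive first take the component-wise maximum with the message timestamp. Processes are indexed from 0 here, and the class name is illustrative.

```python
class VectorClock:
    """Vector clock of process i in a system of n processes."""

    def __init__(self, n, i):
        self.v = [0] * n
        self.i = i

    def internal_or_send(self):
        # Count one more local event; for a send, the result is the timestamp TS(m).
        self.v[self.i] += 1
        return tuple(self.v)

    def receive(self, timestamp):
        # Merge the sender's knowledge, then count the receive event itself.
        self.v = [max(a, b) for a, b in zip(self.v, timestamp)]
        self.v[self.i] += 1
        return tuple(self.v)
```

Compared with causal histories, each message now carries n integers instead of a set of events.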

Timing Mechanism 2 - Vector Clocks
[Figure: the new send event and new receive event example, with causal histories replaced by vector clocks]

Vector Clock Comparison
- define the “less than” relation: $V < V' \equiv (V \ne V') \land (\forall k : 1 \le k \le n : V[k] \le V'[k])$
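The comparison is a partial order, so two vectors can be incomparable. A small Python sketch, with illustrative function names, assuming both vectors have the same length:

```python
def vc_less(v, w):
    """V < W: the vectors differ and every component of V is <= that of W."""
    return tuple(v) != tuple(w) and all(a <= b for a, b in zip(v, w))


def vc_concurrent(v, w):
    """Neither V < W nor W < V, for two distinct vectors."""
    return tuple(v) != tuple(w) and not vc_less(v, w) and not vc_less(w, v)
```

For example, $(1, 0) < (1, 2)$, while $(1, 0)$ and $(0, 1)$ are incomparable.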

Properties of Vector Clocks
1. Strong Clock Condition: $e \to e' \equiv VC(e) < VC(e')$
2. Simple Strong Clock Condition: given event $e_i$ of $p_i$ and event $e_j$ of $p_j$, $i \ne j$:
   $e_i \to e_j \equiv VC(e_i)[i] \le VC(e_j)[i]$

Properties of Vector Clocks
3. Test for Concurrency: given event $e_i$ of $p_i$ and event $e_j$ of $p_j$:
   $e_i \parallel e_j \equiv (VC(e_i)[i] > VC(e_j)[i]) \land (VC(e_j)[j] > VC(e_i)[j])$
4. Pairwise Inconsistent: given event $e_i$ of $p_i$ and $e_j$ of $p_j$, $i \ne j$, the events $e_i$ and $e_j$ are pairwise inconsistent (cannot belong to the frontier of the same consistent cut) if and only if
   $(VC(e_i)[i] < VC(e_j)[i]) \lor (VC(e_j)[j] < VC(e_i)[j])$

(concurrent)

Properties of Vector Clocks
5. Consistent Cut: the frontier contains no pairwise inconsistent events:
   $VC(e_i^{c_i})[i] \ge VC(e_j^{c_j})[i]$ for all $1 \le i, j \le n$
6. Counting the number of events that causally precede $e_i$:
   $\#(e_i) = \left(\sum_{j=1}^{n} VC(e_i)[j]\right) - 1$
   e.g. for the event in the figure, # events = 4 + 1 + 3 - 1 = 7
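Properties 5 and 6 are easy to evaluate once the vector clocks of the frontier events are known. The Python sketch below is illustrative. It assumes processes are indexed from 0 and that `frontier[i]` holds the vector clock of the last event of process i in the cut, with the all-zero vector standing in for a process that has no event in the cut.

```python
def is_consistent_frontier(frontier):
    """Property 5: no frontier event knows of more events of p_i than p_i's own frontier event."""
    n = len(frontier)
    return all(frontier[i][i] >= frontier[j][i] for i in range(n) for j in range(n))


def events_causally_preceding(vc):
    """Property 6: sum of the components, minus one for the event itself."""
    return sum(vc) - 1
```

The second function reproduces the arithmetic on the slide: a clock value of (4, 1, 3) gives 4 + 1 + 3 - 1 = 7 preceding events.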

Properties of Vector Clocks
7. Weak Gap-Detection: given event $e_i$ of $p_i$ and $e_j$ of $p_j$, if $VC(e_i)[k] < VC(e_j)[k]$ for some $k \ne j$, then there exists an event $e_k$ such that
   $\lnot(e_k \to e_i) \land (e_k \to e_j)$

Causal Delivery and Vector Clocks
- assume processes increment the local component of VC only for events that are notified to the monitor $p_0$
- $p_0$ maintains a set $M$ of messages received but not yet delivered
- suppose we have:
  - a message $m$ from $p_j$
  - $m'$ = last message delivered from process $p_k$, $k \ne j$

Causal Delivery and Vector Clocks
To deliver $m$, $p_0$ must verify:
1. no earlier message from $p_j$ is undelivered (i.e. $TS(m)[j] - 1$ messages have been delivered from $p_j$)
2. there is no undelivered message $m''$ from $p_k$ such that $send_k(m') \to send_k(m'') \to send_j(m)$, $k \ne j$ (i.e. check whether $TS(m')[k] \ge TS(m)[k]$ for all $k \ne j$)

Causal Delivery and Vector Clocks
- $p_0$ maintains an array $D[1 \ldots n]$ where $D[i] = TS(m_i)[i]$, $m_i$ being the last message delivered from $p_i$
- e.g. in the figure, delivery of $m$ is delayed until $m''$ is received and delivered
[Figure: timelines of $P_0$, $P_k$ and $P_j$ with messages $m'$, $m''$ and $m$, and vector clock values $(0,0)$, $(1,0)$, $(2,0)$, $(2,1)$]

Delivery Rule 3
- Causal Delivery: for all messages $m$, $m'$, sending processes $p_i$, $p_j$ and destination process $p_k$:
  $send_i(m) \to send_j(m') \Rightarrow deliver_k(m) \to deliver_k(m')$
- DR3 (Causal Delivery): Deliver message $m$ from process $p_j$ as soon as
  $D[j] = TS(m)[j] - 1$, and $D[k] \ge TS(m)[k]$ for all $k \ne j$
- $p_0$ sets $D[j]$ to $TS(m)[j]$ after delivery of $m$
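DR3 can be sketched in Python as a monitor that buffers messages until both conditions hold. The sketch indexes processes from 0 rather than 1, and the class `CausalMonitor` and its fields are illustrative names. The timestamps in the usage note are an assumed reading of the example figure, with $p_k$ as process 0 and $p_j$ as process 1.

```python
class CausalMonitor:
    """Monitor p0 applying DR3 to notification messages stamped with vector clocks."""

    def __init__(self, n):
        self.D = [0] * n       # D[i] = TS(m_i)[i] for the last message delivered from p_i
        self.pending = []      # set M: (sender, timestamp, event) not yet delivered
        self.delivered = []

    def _deliverable(self, j, ts):
        next_from_sender = self.D[j] == ts[j] - 1
        nothing_missing = all(self.D[k] >= ts[k] for k in range(len(ts)) if k != j)
        return next_from_sender and nothing_missing

    def receive(self, j, ts, event):
        self.pending.append((j, tuple(ts), event))
        progress = True
        while progress:                      # one delivery may unblock others
            progress = False
            for item in list(self.pending):
                sender, stamp, ev = item
                if self._deliverable(sender, stamp):
                    self.pending.remove(item)
                    self.delivered.append(ev)
                    self.D[sender] = stamp[sender]
                    progress = True
```

If the monitor receives $m'$ stamped (1, 0), then $m$ stamped (2, 1), then $m''$ stamped (2, 0), it delivers $m'$ immediately, holds $m$ because $D[0] = 1 < 2$, and releases $m$ only after $m''$ has been delivered.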

Causal Delivery and Hidden Channels
- causal delivery should be applied to closed systems
- with hidden channels (communication channels external to the system), the monitor can reach incorrect conclusions

Active Monitoring - Distributed Snapshots
- monitor $p_0$ requests the states of the other processes and combines them into a global state
- assume channels implement FIFO delivery
- channel state $\chi_{i,j}$ for the channel from $p_i$ to $p_j$: messages sent by $p_i$ and not yet received by $p_j$

Distributed Snapshots
- notation:
  - $IN_i$ = set of processes that have direct channels to $p_i$
  - $OUT_i$ = set of processes to which $p_i$ has a channel
- for each execution of the snapshot protocol, process $p_i$ records its local state $\sigma_i$ and the states of its incoming channels ($\chi_{j,i}$ for all $p_j \in IN_i$)

Distributed Snapshots
Snapshot Protocol (Chandy-Lamport)
1. $p_0$ starts the protocol by sending itself a “take snapshot” message
2. when $p_i$ receives the “take snapshot” message for the first time, from process $p_f$:
   - $p_i$ records its local state $\sigma_i$ and relays the “take snapshot” message along all outgoing channels
   - channel state $\chi_{f,i}$ is set to empty
   - $p_i$ starts recording messages on its other incoming channels

Distributed Snapshots
Snapshot Protocol (Chandy-Lamport)
3. when $p_i$ receives the “take snapshot” message beyond the first time, from process $p_s$:
   - $p_i$ stops recording messages along the channel from $p_s$
   - channel state $\chi_{s,i}$ is the set of messages that have been recorded
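The three steps can be exercised in a small single-threaded simulation. The Python sketch below is illustrative and makes several assumptions that are not in the slides: the network is fully connected, application messages are integer amounts moved between balances, the test harness chooses which channel delivers next, and the initiator records on all of its incoming channels instead of literally sending itself a message. All class and method names are hypothetical.

```python
from collections import deque

MARKER = "take snapshot"


class Process:
    def __init__(self, pid, state):
        self.pid = pid
        self.state = state            # application state (an integer balance)
        self.recorded_state = None    # sigma_i, once recorded
        self.channel_state = {}       # sender j -> recorded chi_{j,i}
        self.recording = {}           # sender j -> messages recorded so far


class Network:
    def __init__(self, states):
        self.procs = {pid: Process(pid, s) for pid, s in states.items()}
        # One FIFO channel per ordered pair of processes.
        self.chan = {(a, b): deque() for a in states for b in states if a != b}

    def send(self, src, dst, amount):
        self.procs[src].state -= amount
        self.chan[(src, dst)].append(amount)

    def start_snapshot(self, pid):
        self._record(self.procs[pid], first_from=None)

    def _record(self, p, first_from):
        p.recorded_state = p.state
        for (a, b) in self.chan:
            if a == p.pid:
                self.chan[(a, b)].append(MARKER)   # relay on every outgoing channel
            elif b == p.pid:
                if a == first_from:
                    p.channel_state[a] = []        # chi_{f,i} is empty
                else:
                    p.recording[a] = []            # start recording this channel

    def deliver(self, src, dst):
        msg = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded_state is None:
                self._record(p, first_from=src)    # step 2: first marker
            else:
                p.channel_state[src] = p.recording.pop(src)   # step 3
        else:
            p.state += msg
            if src in p.recording:
                p.recording[src].append(msg)
```

In this toy setting, the recorded balances plus the recorded channel contents always add up to the initial total, which is a quick sanity check that the snapshot is consistent.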

Distributed Snapshots
[Figure: example execution of the snapshot protocol between $p_1$ and $p_2$; dashed arrows indicate “take snapshot” messages]
- constructed global state: $\Sigma^{23}$; $\chi_{1,2}$ empty; $\chi_{2,1} = \{m\}$

Properties of Snapshots
- let
  - $\Sigma^s$ = global state constructed by the protocol
  - $\Sigma^a$ = global state when the protocol was initiated
  - $\Sigma^f$ = global state when the protocol terminated
- $\Sigma^s$ is guaranteed to be consistent
- the actual run that the system followed may not pass through $\Sigma^s$, but there exists a run $R$ such that $\Sigma^a \leadsto_R \Sigma^s \leadsto_R \Sigma^f$

Properties of Snapshots
- $\Sigma^a = \Sigma^{21}$
- $\Sigma^f = \Sigma^{55}$
- the actual run $r$ does not pass through $\Sigma^s$ ($= \Sigma^{23}$)

Properties of Snapshots
- but $\Sigma^{21} \leadsto \Sigma^{23} \leadsto \Sigma^{55}$

Properties of Global Predicates
- we now have two methods for global predicate evaluation:
  - monitor passively observing runs
  - monitor actively constructing snapshots
- the utility of either approach depends (in part) on properties of the predicate

Stable Predicates
- communication delays => $\Sigma^s$ can only reflect some past state of the system
- stable predicate: once it becomes true, it remains true
  - e.g. deadlock, termination, loss of all tokens, unreachable storage
- if Φ is stable, then
  - (Φ is true in $\Sigma^s$) => (Φ is true in $\Sigma^f$), and
  - (Φ is false in $\Sigma^s$) => (Φ is false in $\Sigma^a$)

Stable Predicates
- deadlock detection through snapshots (p.29, 30)

Stable Predicates
- deadlock detection using the reactive protocol (p.31, 32)

Nonstable Predicates
- e.g. debugging, checking whether queue lengths exceed some thresholds
- two problems:
  1. the condition may not persist long enough for it to be true when the predicate is evaluated
  2. if a predicate Φ is found to be true, we do not know whether Φ ever held during the actual run

Nonstable Predicates
- e.g. monitoring the condition (x = y)
  - there are 7 states where (x = y) holds, but it no longer holds after state $\Sigma^{54}$
- e.g. (y - x) = 2
  - the condition holds only in $\Sigma^{31}$ and $\Sigma^{41}$
  - the monitor might detect (y - x) = 2 even if the actual run never goes through $\Sigma^{31}$ or $\Sigma^{41}$

Nonstable Predicates
- detecting a nonstable predicate in a single constructed global state is therefore of very little value

Nonstable Predicates
With observations, we can extend predicates:
- Possibly(Φ): There exists a consistent observation O of the computation such that Φ holds in a global state of O
- Definitely(Φ): For every consistent observation O of the computation, there exists a global state of O in which Φ holds
- e.g. Possibly((y - x) = 2), Definitely(x = y)

Nonstable Predicates
- use of extended predicates in debugging: if Φ describes some erroneous state, then Possibly(Φ) indicates a bug, even if the erroneous state is not observed during an actual run
- if predicate Φ is stable, then Possibly(Φ) ≡ Definitely(Φ)

Consist Global States 63

Detecting Possibly and Definitely Φ

detection based on the lattice of consistent global states

If any global state in the lattice satisfies Φ, then Possibly(Φ) holds

Definitely(Φ) requires all possible runs to pass through a global state that satisfies Φ

Detecting Possibly and Definitely Φ
- Possibly((y - x) = 2)
- Definitely(y = x) (why?)

Detecting Possibly and Definitely Φ
- maintain a set current of global states, built at progressively increasing levels of the lattice
- any member of current satisfies Φ => Possibly(Φ) is true

Detecting Possibly and Definitely Φ
- iteratively construct the set of global states of level l that are reachable without passing through a state that satisfies Φ
- set empty => Definitely(Φ) is true
- set contains the final state => Definitely(Φ) is false
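Both detection procedures walk the lattice one level at a time. The Python sketch below is illustrative and assumes the monitor already holds, for every process i, the vector clock `vcs[i][k]` and the local state `states[i][k]` after its first k events (index 0 is the initial state with an all-zero clock). Processes are indexed from 0, and the helper `make_detector` is a hypothetical name.

```python
def make_detector(vcs, states):
    """Return (possibly, definitely) for a fully recorded computation."""
    n = len(vcs)
    initial = (0,) * n
    final = tuple(len(v) - 1 for v in vcs)

    def consistent(g):
        # Frontier test: no process knows more about p_i than p_i's own prefix contains.
        return all(g[i] >= vcs[j][g[j]][i] for i in range(n) for j in range(n))

    def value(g):
        return tuple(states[i][g[i]] for i in range(n))

    def successors(g):
        # Consistent global states one level below g in the lattice.
        for i in range(n):
            if g[i] < final[i]:
                s = g[:i] + (g[i] + 1,) + g[i + 1:]
                if consistent(s):
                    yield s

    def possibly(phi):
        current = {initial}
        while current:
            if any(phi(*value(g)) for g in current):
                return True
            current = {s for g in current for s in successors(g)}
        return False

    def definitely(phi):
        # Keep only states reachable without passing through a state satisfying phi.
        current = set() if phi(*value(initial)) else {initial}
        while current:
            if final in current:
                return False
            current = {s for g in current for s in successors(g) if not phi(*value(s))}
        return True

    return possibly, definitely


# Assumed toy computation: p0 holds x, p1 holds y; p0's first event sends a
# message that p1 receives as its first event.
example_vcs = [[(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 1), (1, 2)]]
example_states = [[0, 1, 2], [0, 1, 2]]
```

In the toy computation, the state with x = 1 and y = 2 is consistent, so Possibly(y - x = 1) holds, but the run that finishes both events of p0 before p1 receives the message never passes through it, so Definitely(y - x = 1) does not.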

Conclusions
- many distributed system problems require recognizing certain global conditions
- two approaches to constructing global states:
  - reactive-architecture based
  - snapshot based
- both rely on a timing mechanism that captures the causal precedence relation
- the approaches were applied to distributed deadlock detection and debugging
- the solutions can be adapted to deal with nonstable predicates, multiple observations and failures