Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms CS 249 Project Fall 2005 Wing Wong
Feb 22, 2016
Consist Global States 2
Outline
Introduction
Asynchronous distributed systems, distributed computations, consistency
Two different strategies to construct global states
  Monitor passively observes the system (reactive-architecture)
  Monitor actively interrogates the system (snapshot protocol)
Properties of global predicates
Sample applications: deadlock detection and debugging

Introduction
global state = union of local states of individual processes
many problems in distributed computing require:
  construction of a global state, and
  evaluation of whether the state satisfies some predicate Φ
difficulties:
  uncertainties in message delays
  uncertainties in relative speeds of computations
the global state obtained can be obsolete, incomplete, or inconsistent

Distributed Systems
collection of sequential processes p1, p2, …, pn
unidirectional communication channels between pairs of processes
reliable channels
messages may be delivered out of order
network strongly connected (not necessarily completely connected)

Asynchronous Distributed Systems
no bounds on relative process speeds
no bounds on message delays
no synchronized local clocks
communication is the only possible mechanism for synchronization

Distributed Computations
distributed program executed by a collection of processes
each process executes a sequence of events
communication through events send(m) and receive(m), where m is a message identifier

Distributed Computations
$h_i = e_i^1 e_i^2 \ldots$
  local history of process $p_i$
  canonical enumeration: total order imposed by sequential execution
$h_i^k = e_i^1 e_i^2 \ldots e_i^k$
  initial prefix of $h_i$ containing the first k events
$H = h_1 \cup \ldots \cup h_n$
  global history containing all events
  does not specify relative timing between events

Distributed Computations
to order events, define a binary relation “→” that captures “cause-and-effect”:
  e → e’ if and only if e “causally precedes” e’
concurrent events: neither e → e’ nor e’ → e; write e || e’
distributed computation = partially ordered set defined by (H, →)

Distributed Computations
[Figure: space-time diagram of an example computation]
example relations: $e_2^1 \to e_3^6$; $e_2^2 \parallel e_3^6$

Global States, Cuts and Runs
$\sigma_i^k$: local state of process $p_i$ after event $e_i^k$
$\Sigma = (\sigma_1, \ldots, \sigma_n)$: global state of the distributed computation, an n-tuple of local states
cut $C = h_1^{c_1} \cup \ldots \cup h_n^{c_n}$, or $(c_1, \ldots, c_n)$: a subset of the global history H

Global States, Cuts and Runs
$(\sigma_1^{c_1}, \ldots, \sigma_n^{c_n})$: global state corresponding to cut C
$(e_1^{c_1}, \ldots, e_n^{c_n})$: frontier of cut C, the set of last events
run: a total ordering R that includes all events in the global history and is consistent with each local history

Global States, Cuts and Runs
cut C = (5,2,4); cut C’ = (3,2,6)
a consistent run $R = e_3^1\, e_1^1\, e_3^2\, e_2^1\, e_3^3\, e_3^4\, e_2^2\, e_1^2\, e_3^5\, e_1^3\, e_1^4\, e_1^5\, e_3^6\, e_2^3\, e_1^6$

Consistency
cut C is consistent if, for all events e and e’, (e ∈ C) ∧ (e’ → e) ⇒ e’ ∈ C
  i.e. C is closed under the causal precedence relation
a consistent global state corresponds to a consistent cut
run R is consistent if, for all events, e → e’ implies that e appears before e’ in R

Consistency
run $R = e^1 e^2 \ldots$ results in a sequence of global states $\Sigma^0 \Sigma^1 \Sigma^2 \ldots$
$\Sigma^i$ is obtained from $\Sigma^{i-1}$ by some process executing event $e^i$; we say $\Sigma^{i-1}$ leads to $\Sigma^i$
denote the transitive closure of the leads-to relation by ~>R
Σ’ is reachable from Σ in run R iff Σ ~>R Σ’

Lattice of Global States
lattice = set of all consistent global states, along with the leads-to relation
$\Sigma^{k_1 \ldots k_n}$ = shorthand for global state $(\sigma_1^{k_1}, \ldots, \sigma_n^{k_n})$
$k_1 + \ldots + k_n$ = level of the global state in the lattice

Lattice of Global States
path = sequence of global states of increasing level (downwards in the lattice diagram)
each path corresponds to a consistent run
a possible path: Σ00 Σ01 Σ11 Σ21 Σ31 Σ32 Σ42 Σ43 Σ44 Σ54 Σ64 Σ65

Observing Distributed Computations (reactive-architecture)
processes notify monitor process p0 whenever they execute an event
monitor constructs an observation as the sequence of events corresponding to the notification messages
problem: the observation may be inconsistent due to variability in notification message delays

Observing Distributed Computations
any permutation of run R is a possible observation
we need: a delivery rule at the monitor process to restore message order
we have: First-In-First-Out (FIFO) delivery, using sequence numbers
  for all source-destination pairs pi, pj: sendi(m) → sendi(m’) ⇒ deliverj(m) → deliverj(m’)

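The FIFO delivery rule above can be sketched with per-sender sequence numbers. This is a minimal illustration, not the slides' own code: the class name `FifoReceiver`, the 0-based sequence numbers and the string payloads are all assumptions made for the example.

```python
from collections import defaultdict


class FifoReceiver:
    """Delivers messages from each sender in send order (hypothetical sketch)."""

    def __init__(self):
        self.next_seq = defaultdict(int)   # next sequence number expected per sender
        self.pending = defaultdict(dict)   # sender -> {seq: payload} held back
        self.delivered = []                # (sender, payload) in delivery order

    def receive(self, sender, seq, payload):
        # Buffer the message, then deliver every consecutive message now available.
        self.pending[sender][seq] = payload
        while self.next_seq[sender] in self.pending[sender]:
            n = self.next_seq[sender]
            self.delivered.append((sender, self.pending[sender].pop(n)))
            self.next_seq[sender] = n + 1


monitor = FifoReceiver()
monitor.receive("p1", 1, "second event of p1")   # arrives early, held back
monitor.receive("p2", 0, "first event of p2")
monitor.receive("p1", 0, "first event of p1")    # releases both p1 messages
```

Note that FIFO only orders messages from the same sender; messages from different senders can still be delivered in an order that violates causality.
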
Delivery Rule 1
assume:
  a global real-time clock
  message delays bounded by δ
each process includes a timestamp (real-time clock value) when notifying p0 of a local event e
DR1: At time t, deliver all received messages with timestamps up to t – δ in increasing timestamp order

Delivery Rule 1
let RC(e) denote the value of the global clock when e is executed
the real-time clock satisfies the Clock Condition: e → e’ ⇒ RC(e) < RC(e’)
but logical clocks also satisfy the Clock Condition

Logical Clocks
event orderings based on increasing clock values
LC(ei) denotes the value of the logical clock when ei is executed by pi
each sent message m contains a timestamp TS(m)
update rules by pi at occurrence of ei:
  LC(ei) := LC + 1 if ei is an internal or send event
  LC(ei) := max(LC, TS(m)) + 1 if ei = receive(m)

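A minimal sketch of a logical clock, assuming the standard Lamport update rules (increment on internal and send events, take the maximum with the message timestamp and then increment on receive). The class and method names are hypothetical.

```python
class LogicalClock:
    """Lamport-style logical clock for one process (illustrative sketch)."""

    def __init__(self):
        self.value = 0

    def tick(self):
        # Internal or send event: LC := LC + 1. For a send, the result is TS(m).
        self.value += 1
        return self.value

    def receive(self, ts):
        # Receive event for message m with TS(m) = ts: LC := max(LC, TS(m)) + 1.
        self.value = max(self.value, ts) + 1
        return self.value


p1, p2 = LogicalClock(), LogicalClock()
p1.tick()                   # internal event at p1
ts = p1.tick()              # send event at p1; ts travels with the message
p2.tick()                   # internal event at p2
lc_recv = p2.receive(ts)    # receive event at p2
```

The send causally precedes the receive, and the clock values reflect that (`ts < lc_recv`). The converse does not hold: a smaller clock value does not imply causal precedence.
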
Delivery Rule 2
replace the real-time clock by a logical clock
need the gap-detection property:
  given events e, e’ where LC(e) < LC(e’), determine whether some event e’’ exists such that LC(e) < LC(e’’) < LC(e’)
message m is “stable” at p if no future messages with timestamps smaller than TS(m) can be received by p

Delivery Rule 2
with FIFO, when p0 receives m from pi with timestamp TS(m), it can be certain that no other message m’ from pi with TS(m’) ≤ TS(m) will arrive later
message m at p0 is guaranteed stable once p0 has received at least one message from every other process with timestamp > TS(m)
DR2: Deliver all received messages that are stable at p0 in increasing timestamp order

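DR2 can be sketched as follows, assuming FIFO channels into the monitor and integer logical-clock timestamps. The class name `StableDeliveryMonitor` and the example payloads are hypothetical.

```python
class StableDeliveryMonitor:
    """Delivers stable messages in increasing timestamp order (illustrative sketch)."""

    def __init__(self, processes):
        self.latest = {p: 0 for p in processes}   # largest timestamp seen per process
        self.pending = []                         # (timestamp, sender, payload)
        self.delivered = []

    def _stable(self, ts, sender):
        # Stable once every other process has sent something with a larger timestamp.
        return all(self.latest[p] > ts for p in self.latest if p != sender)

    def receive(self, sender, ts, payload):
        self.latest[sender] = max(self.latest[sender], ts)
        self.pending.append((ts, sender, payload))
        for msg in sorted(self.pending):
            if self._stable(msg[0], msg[1]):
                self.pending.remove(msg)
                self.delivered.append(msg[2])


monitor = StableDeliveryMonitor(["p1", "p2"])
monitor.receive("p2", 2, "b")   # not stable: nothing from p1 yet
monitor.receive("p1", 1, "a")   # "a" is stable (p2 has sent timestamp 2 > 1)
monitor.receive("p1", 3, "c")   # makes "b" stable; "c" itself must wait
```

The last message stays undelivered until p2 sends something with a larger timestamp, which shows how this rule can delay delivery.
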
Strong Clock Condition
DR1, DR2 conservatively assume RC(e) < RC(e’) (or LC(e) < LC(e’)) ⇒ e → e’
recall that RC and LC only guarantee the Clock Condition: e → e’ ⇒ RC(e) < RC(e’)
so DR1, DR2 can unnecessarily delay delivery
want a timing mechanism TC that gives the Strong Clock Condition: e → e’ ≡ TC(e) < TC(e’)

Timing Mechanism 1 - Causal Histories
causal history as “clock” value
  set of all events that causally precede event e, together with e itself: θ(e) = {e’ ∈ H | e’ → e} ∪ {e}
  smallest consistent cut that includes e
projection of θ(e) on process pi: θi(e) = θ(e) ∩ hi

Timing Mechanism 1 - Causal Histories
To maintain causal histories:
  θ initially empty
  if ei is an internal or send event:
    θ(ei) = {ei} ∪ θ(previous local event of pi)
  if ei = receive of message m by pi from pj:
    θ(ei) = {ei} ∪ θ(previous local event of pi) ∪ θ(corresponding send event at pj)

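The maintenance rules above can be sketched with Python sets. Events are represented as hypothetical `(process id, event number)` pairs, and the causal history of a send is assumed to travel with the message.

```python
class CausalHistoryProcess:
    """Tracks θ for the most recent local event of one process (illustrative sketch)."""

    def __init__(self, pid):
        self.pid = pid
        self.count = 0
        self.theta = frozenset()   # θ of the previous local event, initially empty

    def _new_event(self, extra=frozenset()):
        self.count += 1
        event = (self.pid, self.count)             # stands for event number `count` of pid
        self.theta = self.theta | extra | {event}  # union rule from the slide
        return event, self.theta

    def internal_or_send(self):
        return self._new_event()

    def receive(self, sender_theta):
        # sender_theta is θ(send event), carried on the message.
        return self._new_event(sender_theta)


p1, p2 = CausalHistoryProcess(1), CausalHistoryProcess(2)
e11, th11 = p1.internal_or_send()   # send event at p1
e21, th21 = p2.internal_or_send()   # internal event at p2
e22, th22 = p2.receive(th11)        # p2 receives p1's message
```

Here `th11 < th22` (proper subset) holds because the send precedes the receive, while `th11` and `th21` are not subsets of one another, so those two events are concurrent.
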
Timing Mechanism 1 - Causal Histories
[Figure: causal-history updates for a new send event and a new receive event; new events $e_1^5$ and $e_2^3$]

Timing Mechanism 1 - Causal Histories
can interpret clock comparison as set inclusion: e → e’ ≡ θ(e) ⊂ θ(e’)
(why not set membership, [e → e’ ≡ e ∈ θ(e’)]?)
unfortunately, causal histories grow too rapidly

Timing Mechanism 2 - Vector Clocks
note:
  projection $\theta_i(e) = h_i^k$ for some unique k
  $e_i^r \in \theta_i(e)$ for all r < k
  so a single number k can represent $\theta_i(e)$
θ(e) = θ1(e) ∪ … ∪ θn(e)
represent the entire causal history by an n-dimensional vector clock VC(e), where for all 1 ≤ i ≤ n: VC(e)[i] = k if and only if $\theta_i(e) = h_i^k$

Timing Mechanism 2 - Vector Clocks
To maintain vector clock:
  each process pi initializes VC to contain all zeros
  update rules by pi at occurrence of ei:
    VC(ei)[i] := VC[i] + 1 if ei is an internal or send event
    VC(ei) := max(VC, TS(m)) componentwise, then VC(ei)[i] := VC[i] + 1, if ei = receive(m)
  VC(ei)[i] ≡ number of events pi has executed up to and including ei
  VC(ei)[j] ≡ number of events of pj that causally precede event ei of pi

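A minimal sketch of vector clock maintenance, assuming the usual update rules: increment the local component on every event, and on a receive first take the componentwise maximum with the message timestamp. Process indices are 0-based here, and the class name is hypothetical.

```python
class VectorClockProcess:
    """Vector clock for one of n processes (illustrative sketch, 0-based index)."""

    def __init__(self, index, n):
        self.i = index
        self.vc = [0] * n          # all zeros initially

    def internal_or_send(self):
        self.vc[self.i] += 1
        return tuple(self.vc)      # for a send, this is the timestamp TS(m)

    def receive(self, ts):
        # Componentwise maximum with TS(m), then count the receive event itself.
        self.vc = [max(a, b) for a, b in zip(self.vc, ts)]
        self.vc[self.i] += 1
        return tuple(self.vc)


p1, p2, p3 = (VectorClockProcess(i, 3) for i in range(3))
ts = p1.internal_or_send()      # p1 sends m
p2.internal_or_send()           # internal event at p2
v = p2.receive(ts)              # p2 receives m
ts2 = p2.internal_or_send()     # p2 sends m2
w = p3.receive(ts2)             # p3 receives m2 and learns about p1's event too
```

After the last step p3 knows of one event of p1 even though it never communicated with p1 directly.
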
Timing Mechanism 2 - Vector Clocks
[Figure: vector clock updates for a new send event and a new receive event, comparing causal histories with vector clocks]

Vector Clock Comparison
Define “less than” relation: V < V’ ≡ (V ≠ V’) ∧ (∀k: 1 ≤ k ≤ n: V[k] ≤ V’[k])

Properties of Vector Clocks
1. Strong Clock Condition:
   e → e’ ≡ VC(e) < VC(e’)
2. Simple Strong Clock Condition: given event ei of pi and event ej of pj, i ≠ j
   ei → ej ≡ VC(ei)[i] ≤ VC(ej)[i]

Properties of Vector Clocks
3. Test for Concurrency: given event ei of pi and event ej of pj
   ei || ej ≡ (VC(ei)[i] > VC(ej)[i]) ∧ (VC(ej)[j] > VC(ei)[j])
4. Pairwise Inconsistent: given event ei of pi and ej of pj, i ≠ j, the events ei, ej cannot belong to the frontier of the same consistent cut if and only if
   (VC(ei)[i] < VC(ej)[i]) ∨ (VC(ej)[j] < VC(ei)[j])

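The comparison relation and the concurrency and pairwise-inconsistency tests can be written directly from these definitions. This is a sketch with hypothetical function names; vector clocks are tuples and process indices are 0-based.

```python
def vc_less(v, w):
    """V < V': the vectors differ and every component of V is <= that of V'."""
    return v != w and all(a <= b for a, b in zip(v, w))


def concurrent(vi, i, vj, j):
    """Property 3, for an event of process i (clock vi) and one of process j (clock vj)."""
    return vi[i] > vj[i] and vj[j] > vi[j]


def pairwise_inconsistent(vi, i, vj, j):
    """Property 4: the two events cannot both lie on the frontier of a consistent cut."""
    return vi[i] < vj[i] or vj[j] < vi[j]


a = (1, 0, 0)   # first event of process 0 (a send)
b = (1, 2, 0)   # event of process 1 that has received that send
c = (0, 0, 1)   # first event of process 2, unrelated to the others
d = (2, 1, 0)   # event of process 1 that depends on the second event of process 0
```

With these values, `a` precedes `b`, `a` and `c` are concurrent, and `a` together with `d` is pairwise inconsistent because `d` depends on a later event of process 0 than `a`.
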
Properties of Vector Clocks
5. Consistent Cut: the frontier contains no pairwise inconsistent events
   $VC(e_i^{c_i})[i] \geq VC(e_j^{c_j})[i]$, for all 1 ≤ i, j ≤ n
6. Counting the number of events that causally precede ei:
   #(ei) = (Σj=1..n VC(ei)[j]) – 1
   e.g. for an event whose vector clock has components 4, 1 and 3: number of events = 4 + 1 + 3 – 1 = 7

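Properties 5 and 6 translate into two short checks. The sketch below assumes the frontier is given as a list whose entry i is the vector clock of the last event of process i in the cut (an all-zero vector if the cut contains no event of that process); the function names are hypothetical.

```python
def is_consistent_cut(frontier):
    """Property 5: frontier[i][i] >= frontier[j][i] for all i, j (0-based indices)."""
    n = len(frontier)
    return all(frontier[i][i] >= frontier[j][i]
               for i in range(n) for j in range(n))


def events_preceding(vc):
    """Property 6: number of events that causally precede the event with clock vc."""
    return sum(vc) - 1


good_cut = [(2, 0, 0), (1, 2, 0), (0, 0, 1)]
bad_cut = [(1, 0, 0), (2, 1, 0), (0, 0, 0)]   # process 1 has seen 2 events of process 0
```

In `bad_cut` the frontier event of process 1 depends on the second event of process 0, but the cut includes only one event of process 0, so the cut is not closed under causal precedence.
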
Properties of Vector Clocks
7. Weak Gap-Detection: given event ei of pi and ej of pj, if VC(ei)[k] < VC(ej)[k] for some k ≠ j, there exists an event ek such that ¬(ek → ei) ∧ (ek → ej)

Causal Delivery and Vector Clocks
assume processes increment local component of VC only for events notified to monitor p0
p0 maintains set M for messages received but not yet delivered
suppose we have:
  m = a message received from pj
  m’ = the last message delivered from process pk, k ≠ j

Causal Delivery and Vector Clocks
To deliver m, p0 must verify:
1. no earlier message from pj is undelivered (i.e. TS(m)[j] – 1 messages have been delivered from pj)
2. there is no undelivered message m’’ from pk such that sendk(m’) → sendk(m’’) → sendj(m), k ≠ j (i.e. whether TS(m’)[k] ≥ TS(m)[k] for all k ≠ j)

Causal Delivery and Vector Clocks
p0 maintains array D[1…n] where D[i] = TS(mi)[i], mi being the last message delivered from pi
e.g. in the figure, delivery of m is delayed until m’’ is received and delivered
[Figure: space-time diagram of P0, Pk and Pj with vector timestamps (0,0), (1,0), (2,0), (2,1), showing messages m’, m’’ and m]

Delivery Rule 3
Causal Delivery: for all messages m, m’, sending processes pi, pj and destination process pk
  sendi(m) → sendj(m’) ⇒ deliverk(m) → deliverk(m’)
DR3 (Causal Delivery): Deliver message m from process pj as soon as both
  D[j] = TS(m)[j] – 1, and
  D[k] ≥ TS(m)[k] for all k ≠ j
p0 sets D[j] to TS(m)[j] after delivery of m

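DR3 can be sketched as a monitor that buffers messages in M and re-checks the two conditions whenever something new arrives. The class name `CausalMonitor`, the 0-based process indices and the example timestamps (modelled on the pk, pj example above, with pk as index 0 and pj as index 1) are assumptions of this sketch.

```python
class CausalMonitor:
    """Causal delivery at the monitor using vector timestamps (illustrative sketch)."""

    def __init__(self, n):
        self.D = [0] * n        # D[i] = TS(m_i)[i] of the last message delivered from process i
        self.M = []             # received but not yet delivered: (sender, timestamp, payload)
        self.delivered = []

    def _deliverable(self, j, ts):
        return self.D[j] == ts[j] - 1 and all(
            self.D[k] >= ts[k] for k in range(len(self.D)) if k != j)

    def receive(self, j, ts, payload):
        self.M.append((j, ts, payload))
        progress = True
        while progress:                       # a delivery may unblock other messages
            progress = False
            for msg in list(self.M):
                sender, stamp, body = msg
                if self._deliverable(sender, stamp):
                    self.M.remove(msg)
                    self.delivered.append(body)
                    self.D[sender] = stamp[sender]
                    progress = True


p0 = CausalMonitor(2)
p0.receive(0, (1, 0), "m'")     # delivered at once
p0.receive(1, (2, 1), "m")      # held back: D[0] = 1 < 2
p0.receive(0, (2, 0), "m''")    # delivered, which then releases m
```
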
Causal Delivery and Hidden Channels
causal delivery should be applied only to closed systems
the monitor can reach an incorrect conclusion when there are hidden channels (communication channels external to the system)

Active Monitoring - Distributed Snapshots
monitor p0 requests the states of the other processes and combines them into a global state
assume channels implement FIFO delivery
channel state χi,j for the channel from pi to pj: messages sent by pi but not yet received by pj

Distributed Snapshots
notation:
  INi = set of processes having direct channels to pi
  OUTi = set of processes to which pi has a channel
for each execution of the snapshot protocol, process pi records its local state σi and the states of its incoming channels (χj,i for all pj ∈ INi)

Distributed Snapshots
Snapshot Protocol (Chandy-Lamport)
1. p0 starts the protocol by sending itself a “take snapshot” message
2. when receiving the “take snapshot” message for the first time, from process pf:
   pi records its local state σi and relays the “take snapshot” message along all outgoing channels
   channel state χf,i is set to empty
   pi starts recording messages on its other incoming channels

Distributed Snapshots
Snapshot Protocol (Chandy-Lamport)
3. when receiving the “take snapshot” message beyond the first time, from process ps:
   pi stops recording messages along the channel from ps
   channel state χs,i is the set of messages that have been recorded

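The three protocol steps can be exercised in a small single-threaded simulation. This is only a sketch under stated assumptions: the `Network` and `Process` classes are hypothetical, channels are FIFO queues, the application simply moves integer amounts between processes, and the initiator is an ordinary process rather than a separate monitor.

```python
from collections import deque

MARKER = "take snapshot"


class Process:
    def __init__(self, pid, state):
        self.pid, self.state = pid, state
        self.recorded_state = None    # local state once recorded
        self.channel_state = {}       # recorded state of each incoming channel, keyed by sender
        self.recording = set()        # incoming channels still being recorded


class Network:
    def __init__(self, states, links):
        self.procs = {pid: Process(pid, s) for pid, s in states.items()}
        self.chan = {link: deque() for link in links}   # FIFO queue per (src, dst)

    def transfer(self, src, dst, amount):
        self.procs[src].state -= amount
        self.chan[(src, dst)].append(amount)

    def start_snapshot(self, pid):
        # Step 1: the initiator "sends itself" the marker, modelled as a direct call.
        self._record(self.procs[pid], first_from=None)

    def _record(self, p, first_from):
        # Step 2: record local state, relay the marker, start recording other channels.
        p.recorded_state = p.state
        incoming = [s for (s, d) in self.chan if d == p.pid]
        p.channel_state = {s: [] for s in incoming}
        p.recording = {s for s in incoming if s != first_from}
        for (s, d) in self.chan:
            if s == p.pid:
                self.chan[(s, d)].append(MARKER)

    def deliver(self, src, dst):
        msg = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded_state is None:
                self._record(p, first_from=src)
            else:
                p.recording.discard(src)      # Step 3: stop recording this channel
        else:
            if src in p.recording:
                p.channel_state[src].append(msg)
            p.state += msg


net = Network({1: 10, 2: 20}, [(1, 2), (2, 1)])
net.transfer(2, 1, 5)     # message m in transit from p2 to p1
net.start_snapshot(1)     # p1 records its state and sends the marker to p2
net.deliver(2, 1)         # p1 receives m after recording: m belongs to the channel state
net.deliver(1, 2)         # p2 gets the marker and records its state
net.deliver(2, 1)         # p1 gets p2's marker and stops recording
snap = {pid: p.recorded_state for pid, p in net.procs.items()}
```

In this run the recorded local states plus the recorded channel contents add up to the same total as the initial states, which is what a consistent snapshot of this application should preserve.
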
Distributed Snapshots
[Figure: execution of the snapshot protocol by p1 and p2; dashed arrows indicate “take snapshot” messages]
constructed global state: Σ23; χ1,2 empty; χ2,1 = {m}

Properties of Snapshots
Let
  Σs = global state constructed
  Σa = global state when the protocol was initiated
  Σf = global state when the protocol terminated
Σs is guaranteed to be consistent
the actual run that the system followed may not pass through Σs
but there exists a run R such that Σa ~>R Σs ~>R Σf

Properties of Snapshots
Σa = Σ21
Σf = Σ55
the actual run r does not pass through Σs (= Σ23)
but Σ21 ~> Σ23 ~> Σ55

Properties of Global Predicates
Now we have two methods for global predicate evaluation:
  monitor passively observing runs
  monitor actively constructing snapshots
the utility of either approach depends (in part) on properties of the predicate

Stable Predicates
communication delays ⇒ Σs can only reflect some past state of the system
stable predicate: once it becomes true, it remains true
  e.g. deadlock, termination, loss of all tokens, unreachable storage
if Φ is stable, then
  (Φ is true in Σs) ⇒ (Φ is true in Σf), and
  (Φ is false in Σs) ⇒ (Φ is false in Σa)

Stable Predicates
deadlock detection through snapshots (p.29, 30)
deadlock detection using a reactive protocol (p.31, 32)

Nonstable Predicates
e.g. debugging, checking whether queue lengths exceed some thresholds
Two problems:
1. the condition may not persist long enough for it to be true when the predicate is evaluated
2. if a predicate Φ is found true, we do not know whether Φ ever held during the actual run

Nonstable Predicates
e.g. monitoring condition (x = y)
  7 states where (x = y) holds, but it no longer holds after state Σ54
e.g. (y – x) = 2
  condition holds only in Σ31 and Σ41
  monitor might detect (y – x) = 2 even if the actual run never goes through Σ31 or Σ41

there is very little value in detecting a nonstable predicate in this way

Nonstable Predicates
With observations, can extend predicates:
  Possibly(Φ): there exists a consistent observation O of the computation such that Φ holds in a global state of O
  Definitely(Φ): for every consistent observation O of the computation, there exists a global state of O in which Φ holds
e.g. Possibly((y – x) = 2), Definitely(x = y)

Nonstable Predicates
use of extended predicates in debugging:
  if Φ = some erroneous state, then Possibly(Φ) indicates a bug, even if it is not observed during an actual run
if predicate Φ is stable, then Possibly(Φ) ≡ Definitely(Φ)

Detecting Possibly and Definitely Φ
detection based on the lattice of consistent global states
If any global state in the lattice satisfies Φ, then Possibly(Φ) holds
Definitely(Φ) requires all possible runs to pass through a global state that satisfies Φ
[Figure: lattice of consistent global states for the example computation]
in the example lattice: Possibly((y – x) = 2), Definitely(y = x) (why?)

Detecting Possibly and Definitely Φ
Possibly(Φ): construct the set of global states, current, with progressively increasing levels
any member of current satisfies Φ ⇒ Possibly(Φ) is true

Detecting Possibly and Definitely Φ
Definitely(Φ): iteratively construct the set of global states of level l that are reachable without passing through a state that satisfies Φ
set empty ⇒ Definitely(Φ) is true
set contains the final state ⇒ Definitely(Φ) is false

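Both detection procedures can be sketched as a level-by-level walk over the lattice. The sketch assumes the monitor already knows the vector clock of every event: `vcs[i][k-1]` is the clock of the k-th event of process i (0-based process index). A global state is a tuple of event counts, and the predicate `phi` takes such a tuple. All names and the two-process example are hypothetical.

```python
def consistent(cut, vcs):
    """A cut is consistent if no included event depends on an excluded one."""
    n = len(cut)
    for j in range(n):
        if cut[j] == 0:
            continue
        vc = vcs[j][cut[j] - 1]            # clock of the frontier event of process j
        if any(vc[i] > cut[i] for i in range(n)):
            return False
    return True


def successors(cut, vcs):
    """Consistent global states one level below cut in the lattice."""
    for i in range(len(cut)):
        if cut[i] < len(vcs[i]):
            nxt = cut[:i] + (cut[i] + 1,) + cut[i + 1:]
            if consistent(nxt, vcs):
                yield nxt


def possibly(phi, vcs):
    current = {tuple(0 for _ in vcs)}
    while current:
        if any(phi(c) for c in current):
            return True
        current = {s for c in current for s in successors(c, vcs)}
    return False


def definitely(phi, vcs):
    final = tuple(len(v) for v in vcs)
    # States of the current level reachable without passing through a phi-state.
    current = {c for c in {tuple(0 for _ in vcs)} if not phi(c)}
    while current:
        if final in current:
            return False
        current = {s for c in current for s in successors(c, vcs) if not phi(s)}
    return True


# p1 holds x, p2 holds y; the second event of p2 receives a message sent by
# the second event of p1.
vcs = [[(1, 0), (2, 0)], [(0, 1), (2, 2)]]
xs, ys = [0, 1, 2], [0, 1, 2]              # value of x (y) after k events of p1 (p2)

r1 = possibly(lambda c: ys[c[1]] - xs[c[0]] == 1, vcs)
r2 = definitely(lambda c: ys[c[1]] - xs[c[0]] == 1, vcs)
r3 = possibly(lambda c: ys[c[1]] - xs[c[0]] == 2, vcs)
r4 = definitely(lambda c: xs[c[0]] == 2 and ys[c[1]] == 1, vcs)
```

In this example the state with x = 0 and y = 2 is not consistent, so `(y - x) == 2` is not even possible, while every run must pass through the state with x = 2 and y = 1 just before the receive event.
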
Conclusions
many distributed system problems require recognizing certain global conditions
two approaches to constructing global states:
  reactive-architecture based
  snapshot based
timing mechanism that captures the causal precedence relation
application to distributed deadlock detection and debugging
solutions can be adapted to deal with nonstable predicates, multiple observations and failures