Distributed Algorithms. Luc J. B. Onana Alima, Seif Haridi.
Post on 06-Jan-2018
Distributed Algorithms
Luc J. B. Onana Alima Seif Haridi
2
Introduction
• What is a distributed system? A set of autonomous processors interconnected in some way.
• What is a distributed algorithm (protocol)? Concurrently executing components, each on a separate processor.
• Distributed algorithms can be extremely complex: many components run concurrently; locality; failure; non-determinism; independent inputs; no global clock; uncertain message delivery; uncertain message ordering; etc.
•Can we understand everything about their executions?
3
Ch9: Models of Distributed Computation
• Preliminaries: Notations; Assumptions
• Causality: Lamport Timestamps; Vector Timestamps; Causal Communication
• Distributed Snapshots: Modeling a Distributed Computation; Execution DAG Predicates; Failures in Distributed Systems
4
Ch9 Models: Preliminaries Assumptions
A1: No shared variables among processors
A2: On each processor there are a number of executing threads
A3: Communication is by sending and receiving messages; send(dest, action, param) is non-blocking;
A4: Event-driven algorithms: reaction upon receipt of a declared event. Events: sending or receiving a message; etc. An event is buffered until it is handled; a dedicated thread handles some events at any time.
5
Ch9 Models: Preliminaries Notations Waiting for events
wait for A1, A2, …, An
  on Ai(source; param) do code to handle Ai, 1 <= i <= n end
end

Waiting for an event from p up to T seconds:
wait until p sends (event; param), timeout = T
  on timeout do timeout action end
  on event(param) from p do successful response actions end
end
6
Ch9 Models: Preliminaries Notations Waiting for events
wait for A1, A2, …, An
  on Ai(source; param) do code to handle Ai, 1 <= i <= n end
end

Waiting for an event from p up to T seconds:
wait for p
  on timeout do time-out action end
  on Ai(param) from p do action end
end
7
Ch9 Models: Preliminaries Notations Waiting for responses from a set of processors up to T seconds
wait up to T seconds for (event; param) messages
  on event do <message handling code> end
To be considered if necessary.
8
Ch9 Models: Preliminaries. Concurrency control within an instance of a protocol
Definition: Let P be a protocol. If the instance of P at processor q consists of threads T1, T2, T3, …, Tn, we say that T1, T2, …, Tn are in the same family.
They access the same set of variables; hence the need for concurrency control.
Assumption used, A5: Once a thread gains control of the processor, it does not release control to a thread of the same family until it is blocked.
9
Ch9 Models: Causality
There is no global time in a distributed system; processors cannot make simultaneous observations of global states. Causality serves as a supporting property.
Provided traveling backward in time is excluded, distributed systems are causal: the cause precedes the effect.
The sending of a message precedes the receipt of that message
10
Ch9 Models: Causality. System composition
We assume a distributed system composed of a set of processors P = {p1, …, pM}.
Each processor reacts upon receipt of an event
Two classes of events:
External/communication events: sending a message; receiving a message.
Internal events: local input/output; raising of a signal; decision on a commit point (database); etc.
11
Ch9 Models: Causality. Notations:
E: the set of all possible events in our system.
Ep: the set of all events in E that occur at processor p.
We are interested in defining orders between events Why?
In many cases, orders are necessary for coordinating distributed activities (e.g. many concurrency control algorithms use ordering of events; we'll see this later).
12
Ch9 Models: Causality. Orders between events: 1) on the same processor p. Order <p: e <p e' means "e occurs before e' in p". If e and e' occur on the same processor p then either e <p e' or e' <p e, i.e. in the same processor events are totally ordered.
[Space-time diagram: events e and e' on processor p's timeline, with e <p e']
13
Ch9 Models: Causality. Orders between events: 2) of sending message m and receiving message m. Order <m: if e is the sending of message m and e' the receipt of message m, then e <m e'.
14
Ch9 Models: Causality. Orders between events: 3) in general (i.e. all events in E are considered). Order <H: "happens-before", or "can causally affect".
Definition: <H is the union of <p and <m (for all p, m), closed under transitivity (i.e. if e1 <H e2 and e2 <H e3 then e1 <H e3).
Definition: we define a causal path from e to e' as a sequence of events e1, e2, …, en such that 1) e = e1 and e' = en; 2) for each i in {1,…,n-1}, ei <p ei+1 or ei <m ei+1 for some p, m.
Thus, e <H e' if and only if there is a causal path from e to e'.
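The causal-path characterization above can be checked mechanically on a happens-before DAG. A minimal Python sketch (event names and the `succ` map are illustrative, not from the slides; `succ` maps each event to its immediate <p and <m successors):

```python
def happens_before(succ, e1, e2):
    """Return True iff there is a causal path from e1 to e2 in the DAG.

    succ: dict mapping each event to the set of its immediate successors.
    """
    stack, seen = [e1], set()
    while stack:
        x = stack.pop()
        for y in succ.get(x, ()):
            if y == e2:
                return True            # found a causal path e1 ... y = e2
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False                       # e2 not reachable: no causal path
```

Two events are concurrent exactly when neither is reachable from the other.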
15
Ch9 Models: Causality. Happens-before is a partial order
It is possible to have two events e and e' (e ≠ e') such that neither e <H e' nor e' <H e.
If two events e and e’ are such that neither e <H e’ nor e’ <H e, then e and e’ are concurrent and we write e || e’ The possibility of concurrent events implies that the happens-before (<H) relation is a partial order
16
Ch9 Models: Causality. Space-time diagram: happens-before DAG
[Space-time diagram over processors p1, p2, p3 showing events e1..e8 and the messages between them]
No causal path from e1 to e2 nor from e2 to e1: e1 and e2 are concurrent.
No causal path from e1 to e6 nor from e6 to e1: e1 and e6 are concurrent.
No causal path from e2 to e6 nor from e6 to e2: e2 and e6 are concurrent.
Dependencies must point forward in time.
17
Ch9 Models: Causality. Space-time diagram: happens-before DAG
[The same space-time diagram over p1, p2, p3 with events e1..e8]
Compare:
e1 and e7;
e1 and e8;
e5 and e2;
e4 and e6
18
Ch9 Models: Causality. Global Logical Clock (timestamps)
Although there is no global time in a distributed system, a Global Logical Clock (GLC) that assigns a total order to the events in a distributed system is very useful.
Such a global logical clock can be used to arbitrate requests for resources in a fair manner, break deadlocks, etc. A GLC should assign a timestamp t(e) to each event e such that t(e) < t(e') or t(e') < t(e) for e ≠ e'; furthermore, the order imposed by the GLC should be consistent with <H, that is, if e <H e' then t(e) < t(e').
19
Ch9 Models: Causality. Lamport's Algorithm. Gives a Global Logical Clock consistent with <H.
Each event e receives an integer e.TS such that e <H e' ⇒ e.TS < e'.TS. Concurrent events (unrelated by <H) are ordered according to the processor address (assume these are integers).
Timestamps: t(e) = (e.TS, p) when e occurs at processor p. Ordering of timestamps: (e.TS, p) < (e'.TS, q) iff e.TS < e'.TS, or (e.TS = e'.TS and p < q).
20
Ch9 Models: CausalityLamport’s Algorithm (cont.)
Each processor p maintains a local timestamp my_TS
Each processor attaches its timestamp to all messages that it sends
21
Ch9 Models: Causality. Lamport's timestamp algorithm
Initially, my_TS = 0
wait for any event e
  on e do
    if e is the receipt of message m then
      my_TS := max(m.TS, my_TS) + 1; e.TS := my_TS
    elseif e is an internal event then
      my_TS := my_TS + 1; e.TS := my_TS
    elseif e is the sending of message m then
      my_TS := my_TS + 1; e.TS := my_TS; m.TS := my_TS
  end
end
22
Ch9 Models: CausalityLamport’s Algorithm (cont.)
Lamport's algorithm ensures that e <H e' ⇒ e.TS < e'.TS. Reason: if e1 <p e2 or e1 <m e2 then
e2 is assigned a higher timestamp than e1. Note: it is easy to see that the algorithm presented does not by itself assign a total order to the events in the system; the processor address is used to break ties.
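The rules above can be sketched in Python (an illustrative sketch, not the course's own code; the class and method names are ours). Each processor keeps my_TS and stamps events with (TS, address) pairs, so Python's tuple comparison gives exactly the tie-breaking total order just described:

```python
class LamportClock:
    """One processor's Lamport clock; events are stamped (TS, address)."""

    def __init__(self, addr):
        self.addr = addr      # processor address, used to break ties
        self.my_ts = 0        # local integer timestamp

    def internal_event(self):
        self.my_ts += 1
        return (self.my_ts, self.addr)

    def send_event(self):
        # the (TS, addr) pair is attached to the outgoing message
        self.my_ts += 1
        return (self.my_ts, self.addr)

    def receive_event(self, m_ts):
        # merge with the sender's timestamp before counting this event
        self.my_ts = max(m_ts, self.my_ts) + 1
        return (self.my_ts, self.addr)
```

For example, if p1 sends a message to p2, the receive event's timestamp is strictly larger than the send event's, and tuple comparison orders them accordingly.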
23
Ch9 Models: Causality. Lamport's timestamps illustrated
[Space-time diagram over p1, p2, p3: events labeled with Lamport timestamps (1,1), (2,1), (3,1) on p1; (1,2), (2,2), (3,2) on p2; (1,3), (4,3) on p3]
Why is e7 labeled (3,1)? Why is e8 labeled (4,3)?
24
Ch9 Models: Causality. Lamport's timestamp algorithm has the following properties:
Completely distributed
Simple
Fault tolerant
Minimal overhead
Many applications
25
Ch9 Models: Causality. Vector Timestamps. Lamport timestamps guarantee that if e <H e' then e.TS < e'.TS, but there is no guarantee that if e.TS < e'.TS then e <H e'.
Problem: given two arbitrary events e and e’ in E, we want to determine if they are causally related
Why is this problem interesting?
26
Ch9 Models: Causality. Knowing when two events are causally related is useful. To see this, consider the following H-DAG, in which O is a mobile object.
[Space-time diagram over p1, p2, p3: p1 migrates O to p2 (message m1) and tells p3 "On p2" (message m2); p3 asks p2 "Where is O?" (message m3); m3 reaches p2 before m1, so p2 answers "I don't know". Error!]
When you debug the system after the red line, you will find that the object is at p2.
So why doesn't p2 know where the object is?
27
Ch9 Models: Causality. Causally-precedes relation <c between messages. Let s(m) be the event of sending message m and r(m) the event of receiving message m. Definition: m1 <c m2 if s(m1) <H s(m2). A causality violation occurs when there are messages m1 and m2 and a processor p such that s(m1) <H s(m2) and r(m2) <p r(m1).
[Diagram: p1 performs s(m1) then s(m2); p2 performs r(m2) then r(m1)]
The simplest form of causality violation:the sending events are on the same processor p1
the receiving events are on the same processor p2
28
Ch9 Models: Causality Causality violation (ex: distributed object system)
When p3 receives the "I don't know" message from p2, p3 has inconsistent information: from p1, p3 knows O is on p2, but from p2, p3 knows O is not on p2!
The source of the problem is: m1 <c m3 but r(m3) <p2 r(m1), i.e. there is a causality violation.
Thus, for two events e and e', if we know exactly whether e <H e', then we can detect causality violations.
Vector timestamps give us this.
29
Ch9 Models: Causality. Vector Timestamps. Idea: each event e indicates, for each processor p, all events at p that are causally before e.
30
Ch9 Models: Causality. The idea illustrated
[Space-time diagram over p1, p2, p3, p4 with events numbered per processor; the highlighted region shows, for a given event e, all events that causally precede e]
31
Ch9 Models: Causality. Vector Timestamps. Idea: each event e indicates which events in each processor p causally precede e.
Each event e has a vector timestamp e.VT such that e.VT <V e'.VT ⇔ e <H e'.
e.VT is an array with an entry for each processor p.
For any processor p, e.VT[p] is an integer, and e.VT[p] = k means that e causally follows the first k events that occur at p (one assumes that each event follows itself).
32
Ch9 Models: Causality. The meaning of e.VT[p] illustrated
[Space-time diagram over p1, p2, p3, p4 with events numbered per processor; for the highlighted event e: e.VT[p1] = 3, e.VT[p2] = 6, e.VT[p3] = 4, e.VT[p4] = 2]
33
Ch9 Models: Causality. Vector Timestamps. The ordering <V on vector timestamps is defined as: e.VT <V e'.VT iff a) e.VT[i] <= e'.VT[i] for all i in {1,..,M}, and b) there is a j in {1,..,M} such that e.VT[j] < e'.VT[j].
Examples: (1,0,3) <V (2,0,5); (1,1,3) <V (2,1,3); but (1,1,3) is not <V (1,0,3), and (1,1,3) is not <V (1,1,3).
Property: e.VT <V e’.VT only if e’ causally follows every event that e causally follows
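Conditions a) and b) translate directly into code. A small Python sketch (the function names are ours, not from the slides):

```python
def vt_less(vt1, vt2):
    """vt1 <V vt2: every entry <= the other's, and at least one strictly <."""
    return (all(a <= b for a, b in zip(vt1, vt2))
            and any(a < b for a, b in zip(vt1, vt2)))

def concurrent(vt1, vt2):
    """Neither vector dominates the other: the events are concurrent."""
    return not vt_less(vt1, vt2) and not vt_less(vt2, vt1) and vt1 != vt2
```

Running it on the slide's examples reproduces them: (1,0,3) <V (2,0,5) and (1,1,3) <V (2,1,3) hold, while (1,1,3) <V (1,0,3) and (1,1,3) <V (1,1,3) do not.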
34
Ch9 Models: Causality. Comparison of vector timestamps illustrated
[Space-time diagram over p1, p2, p3, p4 highlighting events e1, e2, e3]
e1.VT = (5,4,1,3); e2.VT = (3,6,4,2); e3.VT = (0,0,1,3); e3.VT <V e1.VT.
No causal path from e1 to e2 nor from e2 to e1: e1 and e2 are concurrent.
35
Ch9 Models: Causality. The property illustrated
[Space-time diagram over p1, p2, p3, p4 highlighting events e and e']
We have that e.VT = (0,1,4,2), e'.VT = (3,6,4,2), and e.VT <V e'.VT.
e’ causally follows every event that e causally follows
36
Ch9 Models: Causality. Vector timestamps algorithm
Initially, my_VT = [0,…,0]
wait for any event e
  on e do
    if e is the receipt of message m then
      for i := 1 to M do my_VT[i] := max(m.VT[i], my_VT[i]) end;
      my_VT[self] := my_VT[self] + 1;
      e.VT := my_VT
    elseif e is an internal event then
      my_VT[self] := my_VT[self] + 1; e.VT := my_VT
    elseif e is the sending of message m then
      my_VT[self] := my_VT[self] + 1; e.VT := my_VT; m.VT := my_VT
  end
end
Here we assume that each processor knows the names of all the processors in the system
How can we achieve this assumption ?
We’ll see later
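The update rules can be sketched in Python (an illustrative sketch with our own names; processor indices 0..M-1). On receipt, the local vector is merged componentwise with the message's vector and only the processor's own entry is then incremented:

```python
class VectorClock:
    """One processor's vector clock over m processors."""

    def __init__(self, self_id, m):
        self.self_id = self_id       # index of this processor in 0..m-1
        self.my_vt = [0] * m

    def internal_event(self):
        self.my_vt[self.self_id] += 1
        return list(self.my_vt)

    def send_event(self):
        # the returned vector is also attached to the outgoing message
        self.my_vt[self.self_id] += 1
        return list(self.my_vt)

    def receive_event(self, m_vt):
        # componentwise max with the message's vector, then count this event
        self.my_vt = [max(a, b) for a, b in zip(self.my_vt, m_vt)]
        self.my_vt[self.self_id] += 1
        return list(self.my_vt)
```

E.g. with three processors, a send at p0 yields [1,0,0], and its receipt at p1 yields [1,1,0], which dominates the send's vector.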
37
Ch9 Models: Causality Vector Timestamp algorithm
Ensures: e <H e' ⇒ e.VT <V e'.VT
Reason: 1) e <p e': the case of events at the same processor p; e.VT <V e'.VT. 2) e <m e': the case of the receipt of message m; e.VT <V e'.VT.
38
Ch9 Models: Causality Vector Timestamp algorithm
Ensures: e.VT <V e'.VT ⇒ e <H e'
Reason: assume ¬(e <H e'); two cases are to be considered.
1) If e' <H e then e'.VT <V e.VT (from the previous slide).
[Diagram: e' is the k-th event at processor p; e causally follows the l-th event at p, with l > k]
And e.VT[p] = l > k = e'.VT[p], which implies that ¬(e.VT <V e'.VT).
39
Ch9 Models: Causality Vector Timestamp algorithm
Ensures (cont.): e.VT <V e'.VT ⇒ e <H e'
Reason: assume ¬(e <H e'); two cases are to be considered. 2) If additionally ¬(e' <H e), i.e. e || e', then ¬(e'.VT <V e.VT) and ¬(e.VT <V e'.VT).
40
Ch9 Models: Causality. Detecting causality violation in the distributed object system example
If we know, for every pair of events, whether they are causally related, we can detect causality violations in the distributed object system example by installing a causality-violation detector at every processor.
[Space-time diagram of the object migration example over p1, p2, p3 (messages m1, m2, m3), with a vector timestamp attached to each event: (1,0,0), (2,0,1), (3,0,1), (3,0,2), (3,0,3), (3,1,3), (3,2,3), (3,3,3), (3,2,4), (0,0,1)]
If we attach a vector timestamp to each event (and message) of the distributed object system example, then each processor can detect a causality violation.
E.g. p2 can detect that a causality violation occurs when it receives m1: m1 <c m3 but r(m3) <p2 r(m1).
41
Ch9 Models: Causality. Causal communication
Causality violations can lead to undesirable situations.
A processor usually cannot choose the order in which messages arrive.
But a processor can decide the order in which applications executing on it have messages delivered to them.
This leads to the need for communication subsystems with specified properties; e.g. one may require a communication subsystem that delivers messages in causal order.
Advantage: the design of many distributed algorithms would be easier (e.g. a simple object migration protocol).
42
Ch9 Models: Causality. Causal communication
Can we build a communication subsystem that guarantees delivery of messages in causal order?
No for unicast message sending;
yes for multicast.
43
Ch9 Models: Causality
Causal communication (an attempted solution)
Idea: hold back messages that arrive "too soon"; deliver a held-back message m only when you are assured that you will not receive an m' such that m' causally precedes m.
The implementation of this idea is similar to the implementation of FIFO communication.
[Layered diagram: Applications above the communication subsystem (CSS) above the Network]
44
Ch9 Models: Causality. FIFO communication (TCP): the problem
Assume: 1) p and q are connected by an oriented communication line from p to q that satisfies: messages sent are eventually received; messages sent by p can arrive at q in any order.
2) q delivers messages received from p to an application A running at q.
The problem is to devise a distributed algorithm that enables processor q to deliver to A the messages received from p in the order p sent them.
45
Ch9 Models: Causality. FIFO communication: implementation (idea)
The solution consists of one algorithm for p and one for q.
Algorithm for p: p sequentially numbers each message it sends to q.
q knows that messages should be sequentially numbered.
Algorithm for q (idea): upon receipt of a message m with sequence number x, if q has not yet delivered the message with sequence number x-1, q delays the delivery of m until m can be delivered in sequence.
46
Ch9 Models: Causality. FIFO communication: implementation (idea)
Algorithm for q (idea, cont.)
[Diagram: on receipt of message number x, if messages 1..x-1 have been delivered there is no hole, so deliver; otherwise there is a hole, so buffer]
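The hole-detection idea for q can be sketched as follows (an illustrative Python sketch; the class and variable names are ours):

```python
class FifoReceiver:
    """q's side: deliver p's messages in sequence-number order."""

    def __init__(self):
        self.next_seq = 1    # next sequence number that may be delivered
        self.buffer = {}     # held-back messages, keyed by sequence number

    def receive(self, seq, msg):
        """Buffer msg, then deliver everything up to the first hole."""
        delivered = []
        self.buffer[seq] = msg
        while self.next_seq in self.buffer:     # no hole at next_seq
            delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return delivered
```

If message 2 arrives before message 1, it is buffered; the arrival of message 1 then releases both in order.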
47
Ch9 Models: Causality. Causal communication: implementation (idea)
Assumption (PTP): all point-to-point messages are delivered in the order sent.
Instead of using sequence numbers (as for the FIFO implementation), we use timestamps.
Lamport timestamps or vector timestamps can be used. Idea: whenever processor q receives a message m from processor p, q holds back m until it is assured that no message m' <c m will be delivered from any other processor.
48
Ch9 Models: Causality. Causal communication: implementation (idea, variables used)
self: the identity of this processor.
blocked[i]: queue of blocked messages received from pi.
earliest[i]: (head(blocked[i])).timestamp, or the unit vector 1_i if blocked[i] is empty.
delivery_list: messages in delivery_list are causally ordered.
[Diagram: queues blocked[1..M] with their earliest[1..M] timestamps feeding delivery_list]
49
Ch9 Models: Causality. Causal communication: implementation (idea, variables update)
When processor self receives a message m from p, it performs the following steps in order:
Step 1: if blocked[p] is empty then earliest[p] is set to m.timestamp; /* because assumption (PTP) guarantees that no earlier message can be received from p */
Step 2: enqueue message m to blocked[p];
Step 3: unblock, one after another, all blocked messages that can be unblocked; add each unblocked message to delivery_list; update earliest if necessary.
How do we determine when a message can be unblocked?
Step 4: deliver the messages in delivery_list.
50
Ch9 Models: Causality. Causal communication: implementation (idea, variables update). Step 3 detailed: assume we use vector timestamps.
Step 3 refined: unblock, one after another, all blocked messages that can be unblocked. The message m at the head of the holding queue for processor k can be unblocked only if the "time" of processor k according to message m is smaller than the "time" of processor k according to any other message m', if any, at the head of a holding queue. More precisely, blocked[k] can be unblocked only if (∀i ∈ {1,..,M}, i ≠ k, i ≠ self : earliest[k][i] < earliest[i][i]). Thus, the details of Step 3 are:
51
Ch9 Models: Causality. Causal communication: implementation (idea, variables update). Step 3 detailed (cont.): blocked[k] can be unblocked only if (∀i ∈ {1,..,M}, i ≠ k, i ≠ self : earliest[k][i] < earliest[i][i]). Combining this condition with the fact that messages are unblocked one after another, we obtain a while loop:
while (∃k ∈ {1,..,M} : blocked[k] ≠ empty ∧ (∀i ∈ {1,..,M}, i ≠ k, i ≠ self : earliest[k][i] < earliest[i][i])) do
  remove the first message of blocked[k] and add this message to delivery_list;
  if blocked[k] ≠ empty then
    earliest[k] := (head(blocked[k])).timestamp /* vector timestamp */
  else
    earliest[k] := earliest[k] + 1_k
end
Deliver the messages in delivery_list.
52
Ch9 Models: Causality. Causal communication: implementation (the complete scheme)
Initially, for each k in {1,..,M}: earliest[k] := 1_k; blocked[k] := empty
wait for a message from any processor
  on the receipt of message m from processor p do
    delivery_list := empty; Step 1; Step 2; Step 3; Step 4
  end
end
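Steps 1-4 can be simulated on the receiving side as a sketch (illustrative Python with our own names; processor indices 0..M-1, vector timestamps as lists, and the unblocking condition earliest[k][i] < earliest[i][i] from Step 3):

```python
from collections import deque

class CausalReceiver:
    """The blocked[]/earliest[] causal-delivery scheme at one processor."""

    def __init__(self, self_id, m):
        self.self_id = self_id
        self.m = m
        self.blocked = [deque() for _ in range(m)]
        # earliest[k] starts as the unit vector 1_k
        self.earliest = [[1 if i == k else 0 for i in range(m)]
                         for k in range(m)]

    def receive(self, p, vt, msg):
        delivery_list = []
        if not self.blocked[p]:                 # Step 1: queue was empty
            self.earliest[p] = list(vt)
        self.blocked[p].append((vt, msg))       # Step 2: enqueue
        progress = True                         # Step 3: unblock repeatedly
        while progress:
            progress = False
            for k in range(self.m):
                if self.blocked[k] and all(
                    self.earliest[k][i] < self.earliest[i][i]
                    for i in range(self.m)
                    if i != k and i != self.self_id
                ):
                    _, unblocked = self.blocked[k].popleft()
                    delivery_list.append(unblocked)
                    if self.blocked[k]:
                        self.earliest[k] = list(self.blocked[k][0][0])
                    else:
                        self.earliest[k][k] += 1   # earliest[k] + 1_k
                    progress = True
        return delivery_list                    # Step 4: deliver
```

For example, at p2 (self_id 2) in a 3-processor system: a message from p1 stamped (1,2,0) that causally follows an undelivered message from p0 stamped (1,0,0) is held back until the p0 message arrives, and then both are delivered in causal order.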
53
Ch9 Models: Causality. Detecting causality violation in the distributed object system example (recalled)
[Recall of the earlier space-time diagram: the object migration example over p1, p2, p3 (messages m1, m2, m3) with vector timestamps attached to each event]
If we know, for every pair of events, whether they are causally related, we can detect causality violations by installing a causality-violation detector at every processor.
54
Ch9 Models: Causality. A problem with the causal communication implementation previously given
One problem with the algorithm presented for causal communication is that
the communication subsystem at processor self might never deliver some messages.
55
Ch9 Models: Causality. Causal communication: the problem illustrated
[Space-time diagram over p1, p2, p3, p4 with vector timestamps (1,0,0,0), (2,0,1,0), (3,0,1,0), (3,0,2,0), (3,0,3,0), (3,1,3,0), (1,0,0,2), (0,0,1,0); message M is never delivered by the communication subsystem running at processor p2]
self is processor p2; blocked[p3] ≠ empty and M = head(blocked[p3]), with earliest[p3][p1] = 3; and
blocked[p1] = empty, earliest[p1][p1] = 1;
blocked[p4] = empty, earliest[p4][p1] = 1.
So the unblocking condition earliest[p3][p1] < earliest[p1][p1] never holds, and M stays blocked forever.
56
Ch9 Models: Distributed Snapshots. Assumptions/definitions
The system is connected, that is, there is a path between every pair of processors. Ci,j: channel from pi to pj.
Communication channels are reliable and FIFO: messages sent are eventually received, in order.
The state of Ci,j is the ordered list of messages sent by pi but not yet received at pj (we will soon make this definition precise).
State of a processor (at an instant) is the assignment of a value to each variable of that processor;
57
Ch9 Models: Distributed Snapshots. Assumptions (cont.)
Global state of the system: (S,L), where S = (s1,…,sM) are the processor states and L the channel states.
A global state cannot be taken instantaneously; it must be computed in a distributed manner.
The problem: Devise a distributed algorithm that computes a consistent global state.
What do we mean by consistent global state?
58
Ch9 Models: Distributed Snapshots. Meaning of consistent global state. Example 1
[Diagram: processors p and q connected by channels Cp,q and Cq,p]
Two possible states for each processor: s0, s1.
In s0 the processor does not have the token; in s1 the processor has the token.
The system contains exactly one token, which moves back and forth between p and q. Initially, p has the token. Events: sending/receiving the token.
59
Ch9 Models: Distributed Snapshots. Meaning of consistent global state. Global states of the system of Example 1
[Diagram: the reachable global states of the token-passing system; in each, the token is either at p, in Cp,q, at q, or in Cq,p]
60
Ch9 Models: Distributed Snapshots. Meaning of consistent global state (informal). A global state G is consistent if it is one that could have occurred.
[Diagram: the actual transitions of the system, and a global state G not on the actual run]
The output of the snapshot algorithm can be G!
Consider a system with two possible runs (non-determinism).
61
Ch9 Models: Distributed Snapshots. Consistent global state (formal). S = {s1,..,sM}; oi: the event of observing si at pi; O(S) = {o1,..,oM}. Definition: S is a consistent cut iff {o1,..,oM} is consistent with causality.
Definition: {o1,..,oM} is consistent with causality iff (∀e, oi : e ∈ Ei ∧ e <H oi : (∀e' : e' ∈ Ej ∧ e' <H e : e' <H oj))
Notation: s(m) = the event of sending m; r(m) = the event of receiving m.
[Diagram (intuition): e occurs before the observation oi at pi; e' at pj satisfies e' <H e, so e' must occur before the observation oj]
62
Ch9 Models: Distributed Snapshots. Making "message sent but not yet received" precise. Definition: given O(S) = {o1,..,oM} and a message m, if s(m) <pi oi and oj <pj r(m), then m is sent but not yet received (relative to O).
[Diagram: p1 observes o1 and p2 observes o2, with messages m1, m2, m3 in flight from p1 to p2]
p2 observes its state, then asks p1 to do the same. The global state resulting from o1 and o2 must contain m1, m2, m3.
63
Ch9 Models: Distributed Snapshots. Meaning of consistent global state (cont.). Definition: a global state (S,L) is consistent if S is a consistent cut and L contains all messages sent but not yet received (relative to O(S)).
64
Ch9 Models: Distributed Snapshots. Examples of global states (questions)
[Two space-time diagrams over p1, p2, p3: the first with observation points o1, o2, o3; the second with o'1, o'2, o'3]
Is O = {o1, o2, o3} consistent with causality?
Is O' = {o'1, o'2, o'3} consistent with causality?
65
Ch9 Models: Distributed Snapshots. Why is a consistent global state useful (an example)?
Processors p1 and p2 make use of resources r1 and r2.
A deadlock global state of a distributed system is one in which there is a cycle in the wait-for graph.
Deadlock property: once a distributed system enters a deadlock state, all subsequent global states are deadlock states.
[Space-time diagram over p1, r1, r2, p2 with Req/Ok/Rel messages and observation points marked 1 through 4]
Assume we have a "tough guy" called the deadlock detector, whose goal is to observe the processors and the resources at some points of their processing and then check whether there is a cycle in the wait-for graph; if so, he claims that there is a deadlock.
Our guy observes the processors and the resources at the points marked 1 through 4.
66
Ch9 Models: Distributed Snapshots. Why is a consistent global state useful (example, cont.)?
[Same space-time diagram over p1, r1, r2, p2, observation points 1 through 4]
The deadlock detector observes the processors and the resources at the points marked 1 through 4 and finds:
[Wait-for graph with a cycle among p1, r1, r2, p2, where x → y means x is waiting for y]
To see why, assume a correct transaction for using a resource consists of three steps: Req, Ok, Rel.
67
Ch9 Models: Distributed Snapshots. Why is a consistent global state useful (example, cont.)?
[Same space-time diagram and wait-for graph as on the previous slide]
The deadlock detector observes the processors and the resources at the points marked 1 through 4 and finds a cycle in the wait-for graph.
Is there actually a deadlock in the system?
The answer is NO. There is only a phantom deadlock. The claim of our guy is due to the fact that he made an inconsistent observation, which led to a wrong result!
68
Ch9 Models: Distributed Snapshots. The snapshot algorithm (informal)
Uses special messages: snapshot tokens (stok). There are two types of participating processors: initiating and others.
The algorithm for the initiating processor:
records its state;
sends a stok on each outgoing channel;
starts to record the state of incoming channels.
Recording of the state of an incoming channel c is finished when a stok is received along it.
69
Ch9 Models: Distributed Snapshots. The snapshot algorithm (informal, cont.)
Uses special messages: snapshot tokens (stok). Types of participating processors: initiating and others.
The algorithm for any other processor:
records its state on receipt of a stok for the first time (assume the first stok is received along c);
records the state of c as empty;
sends one stok on each outgoing channel;
starts to record the state of all other incoming channels.
Recording of the state of an incoming channel c' ≠ c is finished when a stok is received along it.
70
Ch9 Models: Distributed Snapshots. The snapshot algorithm (idea, cont.)
Notation: T(p,state): the time at p when p records its state; T(p,stok,c): the time at p when p receives a stok along c. The state of an incoming channel c of p is the sequence of messages that p receives in the interval ] T(p,state), T(p,stok,c) [. Recall that the state of c is recorded by p.
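One processor's rules can be sketched as a small state machine (an illustrative Python sketch with our own names; the sending of stok on outgoing channels is left to the caller, as the slides only constrain when it happens):

```python
class SnapshotProcessor:
    """One processor's side of the snapshot (stok) algorithm."""

    def __init__(self, incoming):
        self.incoming = list(incoming)  # names of incoming channels
        self.state = None               # recorded local state (None = not yet)
        self.chan_state = {}            # channel -> recorded message sequence
        self.recording = set()          # incoming channels still being recorded

    def _record(self, local_state):
        # record local state and start recording every incoming channel;
        # the caller must also send a stok on every outgoing channel now
        self.state = local_state
        self.chan_state = {c: [] for c in self.incoming}
        self.recording = set(self.incoming)

    def initiate(self, local_state):
        self._record(local_state)

    def on_stok(self, channel, local_state=None):
        if self.state is None:          # first stok: record state; the channel
            self._record(local_state)   # it arrived on is recorded as empty
        self.recording.discard(channel) # recording of this channel is finished

    def on_message(self, channel, msg):
        if channel in self.recording:   # received in ]T(p,state), T(p,stok,c)[
            self.chan_state[channel].append(msg)

    def done(self):
        return self.state is not None and not self.recording
```

In the token-passing run that follows, p records s0, then receives the token and finally the stok, so the token ends up in the recorded state of Cq,p.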
71
Ch9 Models: Distributed Snapshots. The snapshot algorithm illustrated: taking a snapshot of a token-passing system
[Sequence of diagrams over processors p and q with channels Cp,q and Cq,p:
p records its state s0 and sends a stok on Cp,q;
q receives the stok: q records its state and the state of Cp,q (Lpq = {}), then sends a stok on Cq,p;
p receives the token, and when the stok arrives p records the state of Cq,p (Lqp, containing the token received in between).
Recorded global state: S = {s0, s0}; L = {Lpq, Lqp}.]
72
Ch9 Models: Distributed Snapshots. Applications of snapshots
Detecting stable state predicates (or properties).
A state predicate P is said to be stable if P(G) ⇒ P(G') for every G' that is reachable from G.
Examples: deadlock; termination; loss of the token; etc.
73
Ch9 Models: Distributed Snapshots. The snapshot algorithm (in the book)
Accounts for the possibility of different concurrent snapshots; to achieve this, each snapshot is identified by the name of the initiating processor.
A processor might initiate a new snapshot while the first is still being collected; to handle this, version numbers are used (for simplicity, when a processor r requests a new version of the snapshot, the old snapshot is cancelled).
Diffusing computation: one useful technique for designing distributed algorithms.
74
Ch9 Models: Diffusing computation
Assume a connected network (i.e. for each pair of processors in the system, there is a path connecting them) and that messages sent are eventually received.
The problem: a processor p has a piece of information Info that it wants to send to all other processors.
[Diagram: a network of processors with initiator p]
Processors that are directly connected are called neighbors. Each processor knows its neighbors.
75
Ch9 Models: Diffusing computation. Diffusing computation (a solution)
The algorithm for the initiator i: for each neighbor k, send(k, Info).
The algorithm for any other processor:
wait for message from any neighbor
  on receipt of Info from some neighbor p do
    for each neighbor k ≠ p, send(k, Info)
  end
end
There are two problems with this algorithm:
Problem 1: there might be unprocessed messages left in some channels.
Problem 2: processor p does not know if and when all other processors have received Info.
76
Ch9 Models: Diffusing computation
Diffusing computation (a solution, cont.). Solution to problems 1 and 2: we want the initiator to be informed of the fact that all the processors have received Info.
Variables used:
my_neighbors: the set of identities of all my neighbors;
my_wlist: the list of neighbors from which I am waiting for a message containing Info.
The algorithm for the initiator i:
Step 1: for each k in my_neighbors, send(k, Info)
Step 2: my_wlist := my_neighbors;
while my_wlist is not empty do
  wait for message from any k in my_wlist
  on receipt of Info from k in my_wlist do my_wlist := my_wlist \ {k} end
end
77
Ch9 Models: Diffusing computation
(A solution, cont.: the algorithm for a non-initiating processor consists of three steps, Step 1, Step 2 and Step 3, in that order.)
Step 1: wait for a message from any k in my_neighbors
  on receipt of Info from k do
    my_parent := k;
    for each j in my_neighbors \ {k}, send(j, Info)
  end
end
Step 2: my_wlist := my_neighbors \ {my_parent};
while my_wlist is not empty do
  wait for message from any k in my_wlist
  on receipt of Info from k in my_wlist do my_wlist := my_wlist \ {k} end
end
Step 3: send(my_parent, Info)
Why is this distributed algorithm correct (i.e. each processor receives Info, the initiator eventually learns that each processor has received Info, and there is no deadlock)?
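Under the stated assumptions (connected network, messages eventually received), the three steps can be simulated in a single process to watch the spanning tree emerge (an illustrative Python sketch; the deque plays the role of the network, and all names are ours):

```python
from collections import deque

def diffuse(neighbors, initiator):
    """Simulate the diffusing computation; returns the my_parent pointers
    (a spanning tree rooted at the initiator) once all echoes are in.

    neighbors: dict mapping each node to the set of its neighbors.
    """
    parent = {initiator: None}                      # my_parent per node
    wlist = {initiator: set(neighbors[initiator])}  # my_wlist per node
    net = deque((initiator, k) for k in neighbors[initiator])  # in-flight Info
    while net:
        src, dst = net.popleft()
        if dst not in parent:                   # Step 1: first receipt of Info
            parent[dst] = src
            wlist[dst] = set(neighbors[dst]) - {src}
            for j in wlist[dst]:
                net.append((dst, j))
            if not wlist[dst]:                  # a leaf: nothing to wait for,
                net.append((dst, src))          # Step 3: report to parent now
        else:                                   # Step 2: a further Info message
            wlist[dst].discard(src)
            if not wlist[dst] and parent[dst] is not None:
                net.append((dst, parent[dst]))  # Step 3: report to parent
    return parent
```

On the 4-node ring a-b, a-c, b-d, c-d with initiator a, every node ends up with a parent that is one of its neighbors, and the initiator's waiting list has drained, which is exactly the termination guarantee the slide asks about.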
78
Ch9 Models: Diffusing computation
[Diagram: the network rooted at p, highlighting the channels along which processors received Info for the first time]
Spanning tree construction: a spanning tree of a graph is a tree whose nodes are all those in the graph and whose edges are a subset of those in the graph.
79
Ch9 Models: Distributed computation. Formal models (non-deterministic interleaving): understand how distributed computations actually occur.
Intuition: a distributed system has:
Global states: (S,L), see Snapshots. Initially, each processor is in an initial local state and each communication channel is empty.
Events: the occurrence of an event causes a transition of the system from the current global state to a new global state.
Computations: sequences of events from initial global states.
80
Ch9 Models: Distributed computation. More precisely: an event e = (p, s, s', m, c); p ∈ P; s, s' local states of p; m ∈ M ∪ {NULL} (M = the set of all possible messages); c ∈ C ∪ {NULL} (C = the set of all channels).
Interpretation of e = (p, s, s', m, c): e takes p from s to s' and possibly sends or receives m on c.
If m (and c) is NULL then e is an internal event; no channel is affected by the occurrence of e.
Otherwise, if c is an incoming channel then m is removed from c; if c is an outgoing channel then m is added to c.
81
Ch9 Models: Distributed computation. Occurrence of an event (execution of an event): an event e = (p, s, s', m, c) can occur in a global state G only if some condition, termed the enabling condition of e, is satisfied in G.
The enabling condition of e=(p,s,s’,m,c) is a condition on the state of p and the channels attached to p; example: the program counter has a specific value;
Transition of the system: If e=(p,s,s’,m,c) can occur in G, then the execution of e by p changes the global state by changing only the state of p and possibly the state of one channel attached to p.
82
Ch9 Models: Distributed computation. More precisely (cont.): two functions. Let G be a global state and e an event.
Ready(G) = the set of all events that can occur in G.
Next(G, e) = the global state just after the occurrence of e.
Assume: G0 = the initial global state; Gi = the global state when event ei occurs; seq = <e0, e1, …, en> a sequence of events.
Definition: seq is a computation of the system if
1) (∀i ∈ {0,…,n} : ei ∈ Ready(Gi))
2) (∀i ∈ {0,…,n} : Gi+1 = Next(Gi, ei))
Note: non-deterministic selection in Ready(Gi).
83
Ch9 Models: Distributed computation. Correctness:
State predicate: an assertion on global states. Correctness property: an assertion on computations.
Definition: A distributed algorithm is correct if each of its computations satisfies the correctness property.
Proving correctness: Show that each global state reachable from the initial global state satisfies some well-defined state predicate. In general, one uses invariant assertions.
84
Ch9 Models: Distributed computation. ``Eventually´´ and ``Always´´ properties. Let G0 be an initial global state; R(G0) = all computations that start in G0; A a state predicate; Q an assertion on computations.
eventually(A, G0, Q) means: starting from G0, for any computation for which Q holds, there is a global state that satisfies A (from now on, something good will happen).
always(A, G0, Q) means: A is always true starting from G0, for any computation for which Q holds.
85
Ch9 Models: Distributed computation. Failures in a distributed system
In a distributed system, failures occur: an additional complication in designing distributed algorithms.
For a distributed system to be dependable, fault tolerance must be incorporated.
A fault-tolerant algorithm is one which minimizes the impact of certain faults on the service provided by the system.
Fault classification:
fail-stop; timing faults;
Byzantine; transient faults; etc.