Distribution Seminar
Chien-Liang Fok
[email protected]
Dec 31, 2015
April 19, 2023 CS 687 SP03 Chien-Liang Fok

Topic One
Clocks, Order, and Mutual Exclusion in a Distributed System
Leslie Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System", Communications of the ACM, 21(7), pp. 558-565, July 1978.

What is a Distributed System?
A collection of processes communicating through message passing.

Representation of a Process
A process is a sequence of events
It may not be possible to tell which of two events in different processes occurred first.
– "E_x occurs before E_y" is only a partial order
[Figure: three processes, each drawn as a sequence of events E1 E2 E3 … En]

The a → b relation
"a → b" denotes "a occurs before b"
We want to develop an algorithm that uses this relation to determine a total ordering of events in our system
Real clocks are not easily synchronized
– Must define the truth of a → b without using physical clocks

Truth of A → B
A → B is true when:
– A occurs before B in the same process
– A and B are in different processes; A is the sending of a message and B is the receipt of that message
– A → C and C → B
[Figure: process timelines showing events A and B for the cases above]

Concurrency
Events a and b are concurrent iff: ¬(a → b) ∧ ¬(b → a)
This implies that a cannot causally affect b and vice versa
Note that for any event e, ¬(e → e) is true
→ is an irreflexive partial ordering on the set of all events in the system

Concurrency (Continued)
Given the space-time diagram:
– P1 → P2, Q1 → Q3
– P1 → Q1, Q3 → P3
– ¬(P2 → Q2) ∧ ¬(Q2 → P2)
P2 and Q2 are concurrent!
[Figure: space-time diagram of processes P and Q; axis: Time; legend: Event, Sent Message]

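The concurrency claim can be checked mechanically. The sketch below is only an illustration: it assumes that each process orders its own events (P1, P2, P3 and Q1, Q2, Q3) and that P1 → Q1 and Q3 → P3 are the two messages in the diagram. The function names `happened_before` and `concurrent` are hypothetical.

```python
from itertools import product

def happened_before(edges):
    """Return the transitive closure of the direct 'happened before' edges."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

def concurrent(x, y, hb):
    """Two events are concurrent iff neither happened before the other."""
    return (x, y) not in hb and (y, x) not in hb

# Assumed edges: local order within P and Q, plus two messages.
edges = {
    ("P1", "P2"), ("P2", "P3"),   # order inside process P
    ("Q1", "Q2"), ("Q2", "Q3"),   # order inside process Q
    ("P1", "Q1"), ("Q3", "P3"),   # messages (assumption)
}
hb = happened_before(edges)
p2_q2_concurrent = concurrent("P2", "Q2", hb)
```

Under these assumptions `p2_q2_concurrent` is true, while P1 and Q3 are ordered through the message from P1 to Q1.
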
Logical Clocks
Assign a number to each event such that
if a → b then C(a) < C(b)
where C(x) denotes the number assigned to event x.
– This is known as the clock condition
Note: the converse is not true

Clock "tick"
Occurs between events on a process
– Let a and b be two events such that C(a) = 4 and C(b) = 7
– Ticks 5, 6, and 7 occur between a and b
A tick line connects like-times between two or more processes
– Must be between any two events on a process
– Every message line must cross a tick line

Tick Lines: Visual Representation
[Figure: space-time diagrams with tick lines; axis: Time; legend: Event, Sent Message, Tick Line]

Clock Implementation
How do we implement a clock that satisfies the clock condition?
Add a register to each process
– Increment it after each event
– Augment each message with a timestamp T_m
– When a message is received, increase the clock register's value to be greater than both its current value and T_m

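The register scheme above can be sketched in a few lines. This is a minimal illustration rather than code from the seminar: the class name `LamportClock` and its method names are hypothetical, and counting a send and a receive as one event each is an assumption.

```python
class LamportClock:
    """A per-process logical clock register (illustrative sketch)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the register for a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned value is the timestamp T_m."""
        return self.tick()

    def receive(self, t_m):
        """Move the register past both its current value and T_m."""
        self.time = max(self.time, t_m) + 1
        return self.time

p, q = LamportClock(), LamportClock()
p.tick()                 # a local event on p
t_m = p.send()           # p sends a message stamped T_m
q_time = q.receive(t_m)  # q's register jumps past T_m
```

Because the receiver always ends up with a value larger than T_m, the receive event is numbered after the send event, which is what the clock condition requires for messages.
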
Obtaining Total Order
Using a system of clocks, a total ordering of all events in the system is easy
– Order events by the time they occur
– Break ties by using an arbitrary ordering of processes
Let a ⇒ b be true if C(a) < C(b), or if C(a) = C(b) and a's process is preferred over b's
– The ⇒ relation is NOT unique!

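As a small sketch of the tie-breaking rule, the total order can be expressed as a sort key of (clock value, process rank). The event tuples and the rank tables below are made-up illustration data, and `total_order` is a hypothetical helper name.

```python
def total_order(events, rank):
    """Sort (name, clock, process) tuples by clock, then by process rank."""
    return sorted(events, key=lambda e: (e[1], rank[e[2]]))

# Hypothetical events: (event name, logical clock value, process).
events = [("a", 2, "P"), ("b", 1, "Q"), ("c", 2, "Q"), ("d", 1, "P")]

# Two different arbitrary orderings of the processes.
prefer_p = [name for name, _, _ in total_order(events, {"P": 0, "Q": 1})]
prefer_q = [name for name, _, _ in total_order(events, {"P": 1, "Q": 0})]
```

Here `prefer_p` is `["d", "b", "a", "c"]` and `prefer_q` is `["b", "d", "c", "a"]`: both are valid total orders that respect the clock values, which illustrates why the relation is not unique.
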
Using Total Order to Solve the Mutual Exclusion Problem
Only one process can be granted the resource at a time
Processes must be granted the resource in the order they requested it
Every process that is granted the resource will eventually release it

Mutual Exclusion Algorithm (1/4)
Implement the system of clocks
– Each process has a clock register
Augment each process with a request queue rq
– Initialize all rq's to contain message t_0:p_0
– t_0 is smaller than all clock registers
– p_0 is the process initially granted the resource

Mutual Exclusion Algorithm (2/4)
p_i requests the resource by sending t_m:p_i to every other process and putting t_m:p_i in rq_i
– When p_j receives t_m:p_i, it adds it to rq_j and sends an acknowledgement to p_i
[Figure: p_i exchanging Request and Acknowledge messages with the other processes]

Mutual Exclusion Algorithm (3/4)
p_i releases the resource by sending a release message to everyone and removing t_m:p_i from rq_i
– When p_j receives the release message, it removes t_m:p_i from rq_j
[Figure: p_i sending its message to the other processes]

Mutual Exclusion Algorithm (4/4)
p_i is granted the resource when both:
– t_m:p_i precedes all other requests in rq_i in the total order
– p_i has received a message from everyone else with a timestamp greater than t_m
Note: this can be determined locally

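The two grant conditions can be written as a purely local check. The sketch below is an assumption-laden illustration: the function name `can_enter`, the representation of requests as `(timestamp, process)` pairs, and the `latest_seen` table of the newest timestamp received from each other process are all hypothetical.

```python
def can_enter(my_request, request_queue, latest_seen, rank):
    """Return True if the process holding my_request may take the resource.

    my_request    -- (t_m, pid) of this process's own request
    request_queue -- set of (timestamp, pid) requests known locally
    latest_seen   -- pid -> newest timestamp received from each other process
    rank          -- pid -> position in the arbitrary process ordering
    """
    def key(req):
        return (req[0], rank[req[1]])

    if my_request not in request_queue:
        return False
    # Condition 1: own request comes first in the total order.
    is_first = all(key(my_request) < key(other)
                   for other in request_queue if other != my_request)
    # Condition 2: every other process has sent something later than t_m.
    heard_from_all = all(ts > my_request[0] for ts in latest_seen.values())
    return is_first and heard_from_all

rank = {"p0": 0, "p1": 1, "p2": 2}
mine = (5, "p1")
granted = can_enter(mine, {mine, (7, "p2")}, {"p0": 6, "p2": 7}, rank)
blocked_by_silence = can_enter(mine, {mine, (7, "p2")}, {"p0": 4, "p2": 7}, rank)
blocked_by_earlier = can_enter(mine, {mine, (3, "p0")}, {"p0": 6, "p2": 7}, rank)
```

Only the first call returns true. The second fails because nothing later than t_m has arrived from p0 yet, and the third fails because an earlier request is still queued. No call needs any information beyond the process's own queue and received messages.
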
Vulnerabilities
Two vulnerabilities:
– If just one process deadlocks, the whole system dies
– Anomalous behavior if the system model does not match the real-life model

Anomalous Behavior
User b's message may have a lower timestamp than a's
[Figure: one user telling another "Hey!! Request it!"]

Addressing Anomalous Behavior
Introduce the strong clock condition, defined over a stronger relation a ↠ b (standing in for Lamport's bold arrow: "happened before", also counting causality that passes outside the system)
For any two events a and b in the system, if a ↠ b then C(a) < C(b)
This is not supported by our clock system!

Solution to Anomalous Behavior
Use real clocks, let
– C_i(t) = reading of clock i at time t
– κ = maximum clock error rate (seconds off per second)
– ε = maximum error between clocks, in seconds
– μ = lower bound on message transmission delay (less than the shortest transmission time)
To prevent anomalous behavior, ensure: C_i(t + μ) − C_j(t) > 0
For the above to be true, it suffices that μ ≥ ε/(1 − κ)
See end of paper for derivation/proof.

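To get a feel for the bound, here is a small arithmetic sketch. It uses Lamport's symbols (κ for the clock error rate, ε for the error between clocks, μ for the transmission-delay bound); the function name `min_transmission_bound` and the numeric values are illustrative assumptions, not figures from the seminar.

```python
def min_transmission_bound(epsilon, kappa):
    """Smallest mu that satisfies mu >= epsilon / (1 - kappa)."""
    if not 0 <= kappa < 1:
        raise ValueError("kappa must satisfy 0 <= kappa < 1")
    return epsilon / (1 - kappa)

# Hypothetical values: clocks agree to within 1 ms and drift by
# at most one microsecond per second.
bound = min_transmission_bound(epsilon=0.001, kappa=1e-6)
```

With these assumed values the bound is only slightly above 1 ms, so anomalous behavior is ruled out whenever every message takes longer than that to arrive.
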
Topic 2
Vector Clocks
Friedemann Mattern, "Virtual Time and Global States of Distributed Systems".
– Cosnard M. et al. (Eds.): Proc. Workshop on Parallel and Distributed Algorithms, North-Holland / Elsevier, pp. 215-226, 1989
– Reprinted in: Z. Yang, T.A. Marsland (Eds.), "Global States and Time in Distributed Systems", IEEE, 1994, pp. 123-133.

Motivation
Previously, partially ordered events were mapped into a total order
– Two simultaneous events are treated as if one occurred first
– Useful information is lost
Linear ordering of events is sometimes not good enough

The Nature of Time
Time has the following properties
– Transitivity
– Irreflexivity
– Linearity
– Eternity: goes on forever
– Density: can be divided into infinitesimally small units

The BIG Question
Can we develop a virtual clock system that preserves more of the system's timing properties?
– Lamport's algorithm does not preserve causal independence

Vector time
Suppose a deity can observe all clocks and store their values in a vector
[Figure: time axis showing the observed vector after each event: 001, 011, 111, 121, 122, 132, 232]

Goal
Create an algorithm so each process gets an optimal approximation of this vector

Technique
Replace the register in each process with a vector.
– Size of vector = number of processes
– Update the value of own time in the vector as usual
– When receiving a message, update the estimated time of the other processes from the current time vector and the timestamp vector within the message (component-wise maximum)

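A minimal sketch of this technique follows. The class name `VectorClock` and its methods are hypothetical, and the message pattern in the usage lines is an assumption: it is one pattern that happens to reproduce the timestamps 001, 011, 100, 021, 022, 131, 200 listed on the next slide, since the diagram itself did not survive.

```python
class VectorClock:
    """Vector clock for process number pid out of n (illustrative sketch)."""

    def __init__(self, n, pid):
        self.v = [0] * n
        self.pid = pid

    def tick(self):
        """Advance this process's own component for a local event."""
        self.v[self.pid] += 1
        return list(self.v)

    def send(self):
        """Sending is an event; the returned copy travels with the message."""
        return self.tick()

    def receive(self, timestamp):
        """Take the component-wise maximum, then count the receive event."""
        self.v = [max(a, b) for a, b in zip(self.v, timestamp)]
        return self.tick()

p1, p2, p3 = (VectorClock(3, i) for i in range(3))
a = p3.send()        # [0, 0, 1]
b = p2.receive(a)    # [0, 1, 1]
c = p1.send()        # [1, 0, 0]
d = p2.send()        # [0, 2, 1]
e = p3.receive(d)    # [0, 2, 2]
f = p2.receive(c)    # [1, 3, 1]
g = p1.tick()        # [2, 0, 0]
```

Each process only ever increments its own component, so its own entry is exact, while the other entries are the best estimates it could have learned from the messages received so far.
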
Using our Technique
[Figure: space-time diagram; axis: Time. Vector timestamps computed by the processes: 001, 011, 100, 021, 022, 131, 200. Ideal (globally observed) vectors: 001, 011, 111, 121, 122, 132, 232]

Things to Note
Each process has perfect local information
Given two time vectors u, v
– u ≤ v iff ∀i : u[i] ≤ v[i]
– u < v iff (u ≤ v) ∧ (u ≠ v)
– u || v iff ¬(u < v) ∧ ¬(v < u)
u ≤ v and u < v are partial orders
u || v is the concurrent relation
– It is NOT transitive!!

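The three relations translate directly into code. The sketch below uses hypothetical function names (`leq`, `lt`, `concurrent`) and made-up vectors chosen to show that the concurrent relation is not transitive.

```python
def leq(u, v):
    """u <= v iff every component of u is <= the matching component of v."""
    return all(a <= b for a, b in zip(u, v))

def lt(u, v):
    """u < v iff u <= v and the vectors differ."""
    return leq(u, v) and u != v

def concurrent(u, v):
    """u || v iff neither vector is strictly less than the other."""
    return not lt(u, v) and not lt(v, u)

# Hypothetical time vectors.
u, v, w = [1, 0, 0], [0, 1, 0], [2, 0, 0]
u_v = concurrent(u, v)   # u || v
v_w = concurrent(v, w)   # v || w
u_w = concurrent(u, w)   # but u < w, so u and w are not concurrent
```

Here u || v and v || w both hold, yet u < w, so concurrency cannot be chained the way an ordering can.
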
Cuts
A timing diagram with consistent cuts can be modified so the cuts are straight lines.
[Figure: timing diagram showing a consistent cut and an inconsistent cut; axis: Time; legend: Event, Sent Message, Cut Event, Cut Line]

Time at a Cut
The time vector of a consistent cut is the time of each individual process at the cut
[Figure: timing diagram with two cuts, Cut 1 and Cut 2; axis: Time; legend: Event, Sent Message, Cut Event, Cut Line]

Key Point
For any set of events:
{the lattice of consistent cuts} = {the lattice of possible time vectors}
Thus, an event e causally precedes an event e' iff TimeVector(e) < TimeVector(e')

Applications
When do we care which event in a set of concurrent events occurred first?
Distributed debugging
– Must consider causal relationships between events in the system
Performance analysis
– Vector clocks allow one to calculate potential concurrency

Topic 3
Global Snapshot
K. Mani Chandy and Leslie Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Transactions on Computer Systems, 3(1), pp. 63-75, February 1985.

Motivation
Knowing the global state of the system is useful

Restriction
Cannot pause the system
Cannot prevent the system from performing regular processing

Problem
Given the restrictions, it is impossible to obtain the global state
– Cannot halt the system
– System is constantly changing
What can we do?

Goal
Find a possible global state
Can determine the value of stable system properties
– Deadlock
– Termination
By definition, if a possible global state satisfies a stable property, the system will forever satisfy this property.

The Model
Processes send and receive messages through channels
[Figure: processes connected by channels]

Global State
Consists of:
– Process states: What's the state of the process?
– Channel states: What messages are being sent through it?

The Algorithm
Sender process p:
– Record state
– Send marker on all outgoing channels
Receiver process q, on receiving a marker on a channel:
– If q has not recorded its state: record state, and record that channel's state as empty
– If q has recorded its state: record the channel's state as all messages received after recording, but before receiving the marker

The Algorithm (2/2)
Assuming a strongly connected graph, all processes will have recorded
– Process state
– Channel state for each incoming channel
Consolidate these records to form the global snapshot
– The paper proves this snapshot is reachable from one that actually existed.
– Enough to prove stable properties of the system

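The marker rules can be tried out on a toy system. The sketch below is an illustrative simulation, not the paper's formulation: the `Process` class, the `connect` helper, the account-balance application state, and the use of a `deque` as a FIFO channel are all assumptions made for the example.

```python
from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance      # application state (hypothetical)
        self.out = {}               # destination name -> outgoing channel
        self.inc = {}               # source name -> incoming channel
        self.recorded_state = None  # None until the state has been recorded
        self.channel_state = {}     # source name -> recorded in-flight messages
        self.recording = set()      # incoming channels still being recorded

    def send(self, dest, amount):
        """Regular processing: move an amount to another process."""
        self.balance -= amount
        self.out[dest].append(amount)

    def record(self):
        """Record own state, then send a marker on all outgoing channels."""
        self.recorded_state = self.balance
        self.recording = set(self.inc)
        self.channel_state = {src: [] for src in self.inc}
        for channel in self.out.values():
            channel.append(MARKER)

    def deliver(self, src):
        """Receive the next message on the channel from src."""
        msg = self.inc[src].popleft()
        if msg == MARKER:
            if self.recorded_state is None:
                self.record()       # this channel stays recorded as empty
            self.recording.discard(src)
        else:
            self.balance += msg
            if src in self.recording:   # after recording, before the marker
                self.channel_state[src].append(msg)

def connect(a, b):
    """Create a FIFO channel from process a to process b."""
    channel = deque()
    a.out[b.name] = channel
    b.inc[a.name] = channel

p, q = Process("p", 100), Process("q", 50)
connect(p, q)
connect(q, p)
p.send("q", 10)    # in flight from p to q
q.send("p", 5)     # in flight from q to p
p.record()         # p starts the snapshot
p.send("q", 20)    # regular processing continues after recording
q.deliver("p")     # q receives 10
q.deliver("p")     # q receives the marker and records its state
p.deliver("q")     # p receives 5, logged as channel state
p.deliver("q")     # p receives q's marker; recording is finished
snapshot_total = (p.recorded_state + q.recorded_state
                  + sum(p.channel_state["q"]) + sum(q.channel_state["p"]))
```

In this run p records 90, q records 55, and the channel from q to p is recorded as holding 5, so the snapshot accounts for all 150 units even though the system never paused.
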
Conclusions
Virtual clocks are necessary since physical clocks are not easily synchronized
Vector clocks improve upon Lamport's virtual clock since they preserve causal independence
Global snapshots of possible states are useful for determining stable properties of the system