Brian Mitchell ([email protected]) - Distributed Systems
Synchronization In Distributed Systems
[Figure: distributed shared-memory architecture: N processors, each with a cache, connected through an interconnection network to N memory modules, each with a directory]
• Previously we saw how processes in a distributed system communicate with each other
– RPC's & Group Communications
• Another important consideration in distributed systems is how processes cooperate and synchronize with each other
• In single-CPU systems, critical regions, mutual exclusion, and other synchronization problems are generally solved using methods such as semaphores and monitors
• In a distributed system, techniques involving semaphores and monitors do not work, and other techniques are needed
• Even determining whether event A happened before or after event B requires careful thought
• In a distributed system it is generally not desirable to collect information from distributed processes in one place and then make decisions
– Desire to make decisions based on distributed information
• Distributed systems (and associated algorithms) have the following properties
– The relevant information is scattered among multiple machines
– Processes make decisions based only on local information
– A single point of failure in a distributed system should be avoided
– No common clock or other precise global time source exists
• In a non-distributed system, time is unambiguous
– When a process needs to know the time it makes a system call and the kernel returns the time maintained by the hardware
• When multiple CPUs are present in a distributed system we cannot count on all clocks running at exactly the same rate
• Even if all clocks are initialized to exactly the same time at exactly the same instant, over time the clocks will differ
– Due to clock skew, because not all timers are exact
• If it is not possible to synchronize all clocks in a distributed system to produce a single, unambiguous time standard, can distributed processes instead agree on a consistent view of logical time?
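One well-known realization of logical time is a Lamport-style counter; a minimal sketch, with class and method names that are illustrative rather than taken from the course material:

```python
class LamportClock:
    """A Lamport logical clock: a per-process counter, no real time involved."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        # Any internal event advances the local clock
        self.time += 1
        return self.time

    def send(self):
        # Advance the clock and use its value as the message timestamp
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Jump past the sender's timestamp, then advance
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
t = a.send()       # a's clock advances to 1; message carries timestamp 1
b.receive(t)       # b's clock becomes max(0, 1) + 1 = 2
assert b.time > t  # the receive event is logically ordered after the send
```

The key property is that if event X causally precedes event Y, then X's timestamp is smaller than Y's, even though no physical clocks are synchronized.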
• The following material is derived from the research paper provided on the web by Raynal and Singhal entitled Logical Time: A Way to Capture Causality in Distributed Systems
• Before we investigate this paper, some fundamental Computer Science background needs to be reviewed:
– Functions
– Relations (especially binary relations)
– Partial Orders
– Total Orders
– Monotonicity
– Isomorphism
• The foundation for functions and relations is set theory
• Consider two sets A and B. The Cartesian product of A and B, denoted AxB, is the set of pairs in which the first component is chosen from A and the second component is chosen from B
• Formally the Cartesian product AxB is defined as:
AxB = {(a,b) | a ∈ A and b ∈ B}
• Example of Cartesian product:
Let A = {1,2,3} and B = {4,5} then
AxB = { (1,4), (1,5), (2,4), (2,5), (3,4), (3,5) }
• If A has n elements (notation |A| = n) and B has m elements, then the cardinality of AxB, denoted |AxB|, is nm
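The definition and the cardinality rule can be checked directly; a small sketch using Python's standard `itertools.product`:

```python
from itertools import product

A = {1, 2, 3}
B = {4, 5}

# The Cartesian product AxB as a set of ordered pairs (a, b)
AxB = set(product(A, B))

assert (1, 4) in AxB and (3, 5) in AxB
assert len(AxB) == len(A) * len(B)  # |AxB| = n * m = 3 * 2 = 6
```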
• We say a function f: A → B is an isomorphism if it has an inverse
• A function f: A → B is an isomorphism if and only if it is a bijection
• This makes sense because a bijection clearly defines a mapping between the domain and range where each element in the domain is mapped to a unique element in the range
• Thus, each element in the range of the function is associated with a unique element in the domain of the function
– This association is the inverse
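A minimal sketch of this inverse construction, using a dict as a finite function (the example mapping is illustrative):

```python
# A finite function f: {1,2,3} -> {'a','b','c'} represented as a dict
f = {1: 'a', 2: 'b', 3: 'c'}

# f is a bijection: every element of the range appears exactly once
assert len(set(f.values())) == len(f)

# Because f is a bijection, the inverse mapping is well defined
f_inv = {v: k for k, v in f.items()}

# Composing f with its inverse returns each domain element unchanged
assert all(f_inv[f[x]] == x for x in f)
```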
• A partial order is a transitive and antisymmetric binary relation
• A total order is a partial order (transitive and antisymmetric) where every pair of elements in the domain is comparable
– If R is a total order and a and b are two elements in the domain, then either aRb or bRa is true
– Every total order is reflexive: taking a = b, the requirement that aRb or bRa hold for every pair forces aRa to hold
• The comparison operator ≤ is a total order
– ≤ is transitive (inspection)
– ≤ is antisymmetric: if a ≤ b and b ≤ a both hold, then a = b
– Every element in the domain of ≤ is comparable. Consider the integers: for any two integers a, b either a ≤ b or b ≤ a holds. If a = b then both a ≤ b and b ≤ a hold, because a ≤ a and b ≤ b hold
• The comparison operator < is a partial order
– < is transitive (inspection)
– < is antisymmetric (vacuously: a < b and b < a never both hold)
– < is not comparable for every element. Consider a = b: then neither aRb nor bRa is true, because a < a is false and b < b is false
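The properties claimed for ≤ and < can be verified by brute force over a small finite domain; a sketch, with illustrative helper names:

```python
# Check order properties of <= and < over a small integer domain
domain = range(-3, 4)

def transitive(R):
    return all(R(a, c)
               for a in domain for b in domain for c in domain
               if R(a, b) and R(b, c))

def antisymmetric(R):
    # If aRb and bRa both hold, then a must equal b
    return all(a == b for a in domain for b in domain if R(a, b) and R(b, a))

def total(R):
    # Every pair of elements is comparable
    return all(R(a, b) or R(b, a) for a in domain for b in domain)

le = lambda a, b: a <= b
lt = lambda a, b: a < b

assert transitive(le) and antisymmetric(le) and total(le)  # <= is a total order
assert transitive(lt) and antisymmetric(lt)                # < is a partial order
assert not total(lt)                                       # a < a never holds
```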
• When a process wants to enter a critical region, it builds a message containing
– The critical region that it wants to enter
– Its process number
– The current time (logical time)
• The message constructed is sent to all processes, possibly including itself
• The sending of messages is assumed to be reliable (every message is acknowledged)
• May use reliable group communications
• When a process receives a message, the action that it takes depends on its state with respect to the critical region named in the message
• Based on the received message, one of three cases must be handled:
– CASE 1: The receiver is not in the critical region and does not want to enter it. Thus it sends back an “OK” message to the sender.
– CASE 2: The receiver is already in the critical region. It handles this case by not replying to the sender and queuing the request for later use.
– CASE 3: The receiver wants to enter the critical region but has not yet done so. The receiver compares the timestamp in the message to its own logical timestamp.
• If the timestamp in the message is lower, the receiver sends back an “OK” message to the sender
• If the timestamp is higher, the receiver queues the incoming request and sends nothing
• After sending out requests asking permission to enter the critical region, the process sits and waits until all other processes have given permission
• Recall that the sender's original message will be queued by receiver processes if the request cannot be immediately granted
– CASE 2 & 3 (previous slide)
• As soon as the sender receives permission from all of the processes in the distributed system, the sender enters the critical section
• The sender must also notify all processes in its queue when it exits the critical section
– The sender also deletes the entries from the queue
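The request handling and exit behavior described above can be sketched as a single handler class. This is a sketch only: the state names (`in_cs`, `wants_cs`, `deferred`) are illustrative assumptions, and message delivery is abstracted into plain method calls:

```python
from collections import deque

class Process:
    """Sketch of the per-process state for timestamp-based mutual exclusion."""

    def __init__(self, pid):
        self.pid = pid
        self.in_cs = False        # currently inside the critical region
        self.wants_cs = False     # has requested entry but not yet entered
        self.my_timestamp = None  # logical timestamp of our own request
        self.deferred = deque()   # queued requests (CASES 2 and 3)

    def on_request(self, sender, timestamp):
        """Handle an incoming request: return "OK" or defer (return None)."""
        if not self.in_cs and not self.wants_cs:
            return "OK"                   # CASE 1: not interested, grant
        if self.in_cs:
            self.deferred.append(sender)  # CASE 2: queue for later
            return None
        # CASE 3: we also want the region; the lower timestamp wins
        if timestamp < self.my_timestamp:
            return "OK"
        self.deferred.append(sender)
        return None

    def exit_critical_region(self, send_ok):
        # On exit, grant every deferred request and clear the queue
        self.in_cs = False
        while self.deferred:
            send_ok(self.deferred.popleft())


p = Process(1)
reply = p.on_request(2, timestamp=5)            # CASE 1: p is idle
p.in_cs = True
deferred_reply = p.on_request(3, timestamp=6)   # CASE 2: deferred
granted = []
p.exit_critical_region(granted.append)          # process 3 now gets its OK
```

Note that the slides do not specify what happens when two requests carry equal timestamps; this sketch simply defers them, whereas a full implementation would break ties (for example, by process number).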
A Token Ring Distributed Mutual Exclusion Algorithm
• This algorithm is based on organizing processes into a logical ring
• Each process knows about its neighbor
• A token is circulated around the ring
• If a process does not want to enter a critical region it passes the token to its neighbor upon receipt
• If a process wants to enter a critical region it waits until it receives the token, then it:
– Holds the token
– Enters its critical region
– Performs its operations
– Leaves its critical region
– Passes the token to its neighbor
• A process is not allowed to immediately enter another critical region on the same token; it must pass the token to its neighbor and wait for it to circulate around the ring
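The token-passing rules above can be sketched as a small simulation of one circulation of the token; the class and field names here are illustrative:

```python
class RingProcess:
    """Sketch of one process in the logical ring."""

    def __init__(self, pid, wants_cs=False):
        self.pid = pid
        self.wants_cs = wants_cs
        self.entered = []  # record of critical-region entries

    def on_token(self):
        # Hold the token only while inside the critical region
        if self.wants_cs:
            self.entered.append(self.pid)  # enter, do work, leave
            self.wants_cs = False
        # Either way, the token then passes to the neighbor (the loop below)


# One circulation of the token around a four-process ring;
# processes 0 and 2 want to enter their critical regions
ring = [RingProcess(i, wants_cs=(i % 2 == 0)) for i in range(4)]
for p in ring:  # the token visits each process in ring order
    p.on_token()

entries = [pid for p in ring for pid in p.entered]
assert entries == [0, 2]  # only the requesters entered, one at a time
```

Mutual exclusion follows from there being exactly one token: at most one process can hold it, so at most one process is in a critical region at any moment.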
• When the original initiator of the ELECTION message receives the message again (after traversing the ring), the message will contain the process IDs of all of the operating processes in the distributed environment
• The original initiator of the ELECTION message then extracts the ID of the largest process and generates a COORDINATOR message containing the largest process's ID
• This message is then sent around the ring
• Each process in the ring then records who the new coordinator is
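The ELECTION and COORDINATOR phases above can be sketched as a simulation over a list of live process IDs; the function and parameter names are illustrative:

```python
def run_election(live_ids, initiator_index):
    """Simulate the ELECTION and COORDINATOR phases on a logical ring."""
    n = len(live_ids)
    # ELECTION phase: the message circulates once, collecting each live ID
    message = []
    for step in range(n):
        message.append(live_ids[(initiator_index + step) % n])
    # The initiator receives its own message back and picks the largest ID
    coordinator = max(message)
    # COORDINATOR phase: the winner's ID circulates; every process records it
    return {pid: coordinator for pid in live_ids}


# Process at index 2 (ID 1) initiates; the largest live ID, 7, wins
recorded = run_election([3, 7, 1, 5], initiator_index=2)
assert all(c == 7 for c in recorded.values())
```

Because the message visits every live process before returning, every operating process appears in it, so the maximum is taken over the whole ring regardless of who initiated.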