CS 704D Advanced OS

Distributed Mutual Exclusion & Memory

Nov 17, 2014


Debasis Das

Distributed mutual exclusion, Lamport's algorithm, the Ricart-Agrawala algorithm, distributed shared memory systems
Transcript
Page 1: Title Slide

CS 704D Advanced OS
Debasis Das

Page 2: Distributed Mutual Exclusion

Page 3: Complexities in Distributed Systems

- Absence of shared memory
- Inter-node communication delays can be considerable
- The global system state cannot be observed by the constituent machines, due to communication delays, component failures, and the absence of shared memory
- There are many more modes of failure, yet failing soft is a goal

Page 4: Some Considerations

- Policies/strategies developed for a distributed system can be made applicable in the uniprocessor case
- However, policies/strategies developed for the uniprocessor case cannot be extended to the distributed case
- The distributed case can be simulated by adding a central resource allocator
- This increases traffic to the central allocator, and the system fails when the allocator fails
- Election of a successor would then be needed

Page 5: Required Assumptions

- Messages exchanged by a pair of communicating processes are received in the same order as they were generated (the pipelining property)
- Every message is received without errors, and there are no duplicates
- The underlying network keeps all nodes fully connected: any node can communicate with every other node

Page 6: Desirable Properties of Algorithms

- All nodes should have an equal amount of information
- Each node makes decisions on the basis of local information; the algorithm should ensure that nodes make consistent and coherent decisions
- All nodes reach decisions with about equal effort
- Failure of a node should not cause a complete breakdown; the ability to reach a decision and to access resources should not be affected

Page 7: Time & Ordering of Events: The Happened-Before Relationship

A logical clock needs to ensure the following:
- If a and b are events in the same process and a comes before b, then a -> b
- If event a is the sending of a message and b is the receipt of that message in another process, then a -> b
- The relationship is transitive: if a -> b and b -> c, then a -> c
- If a and b have no happened-before relationship, a and b are said to be concurrent

Page 8: Time & Ordering of Events: Logical Clock Properties

The clock condition: if a -> b then C(a) < C(b). It is satisfied if:
- for events a and b in a process Pi, if a comes before b then Ci(a) < Ci(b), and
- if a is the sending of message m by process Pi and b is the receipt of that message by process Pj, then Ci(a) < Cj(b)

Page 9: Time & Ordering of Events: Logical Clock Implementation

- Process Pi increments its clock Ci between successive events
- A message m is timestamped at its send event a, so that Tm = Ci(a)
- The receiving process adjusts its clock to Cj := max(Cj + 1, Tm + 1), so that the receive event is stamped later than the send (a sketch follows)
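A minimal sketch of these three rules, assuming nothing beyond the slide; the class and method names are illustrative:

```python
# Lamport logical clock: a sketch of the three implementation rules.

class LamportClock:
    def __init__(self):
        self.c = 0

    def tick(self):
        """Rule 1: increment the clock between successive local events."""
        self.c += 1
        return self.c

    def send(self):
        """Rule 2: the send event is ticked and its value Tm = Ci(a)
        travels with the message."""
        return self.tick()

    def receive(self, t_m):
        """Rule 3: stamp the receive event later than both the local
        clock and the message timestamp: Cj := max(Cj + 1, Tm + 1)."""
        self.c = max(self.c + 1, t_m + 1)
        return self.c

p1, p2 = LamportClock(), LamportClock()
t_m = p1.send()         # P1 sends m with timestamp 1
print(p2.receive(t_m))  # P2 stamps the receipt at 2, later than the send
```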

Page 10: Total Ordering

a => b only when:
- Ci(a) "less than" Cj(b), or
- Ci(a) = Cj(b) and Pi "less than" Pj

A simple way to implement the "less than" relation on processes is to assign a unique number to each process and define Pi "less than" Pj as i < j (see the comparison sketch below).
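Treating each event as a (clock value, process number) pair makes the total order a plain tuple comparison; a tiny illustrative check:

```python
# Total order on events written as (Ci, i) pairs: clocks are compared
# first, and ties are broken by the unique process number.

def precedes(a, b):
    """True if event a => b; Python compares tuples lexicographically."""
    return a < b

print(precedes((5, 1), (5, 2)))  # True: equal clocks, process 1 wins the tie
print(precedes((6, 1), (5, 2)))  # False: clock 6 is later than clock 5
```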

Page 11: Lamport's Algorithm

- Initiator i: process Pi requires exclusive access to a resource. It sends the timestamped message request(Ti, i), where Ti = Ci, to all the other processes.
- Other processes (j != i): when Pj receives the request, it places the request on its own queue and sends a reply with timestamp (Tj, j) to process Pi.
- Pi is allowed access only when:
  - Pi's request is at the front of its queue, and
  - all replies are timestamped later than Pi's request
- Pi releases the resource by sending a suitably timestamped release message
- Pj then removes Pi's request from its request queue
- Cost: 3(N-1) messages; works best on bus-based systems where broadcast costs are minimal (a sketch of the per-process handlers follows)
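A sketch of the per-process state and handlers in Lamport's algorithm, assuming a reliable `send(dest, message)` callback is supplied by the surrounding system; all names are illustrative and the message transport is abstracted away:

```python
import heapq

class LamportMutex:
    """One process's view of Lamport's mutual exclusion algorithm."""

    def __init__(self, pid, peers):
        self.pid = pid
        self.peers = peers       # ids of the other N-1 processes
        self.clock = 0
        self.queue = []          # request queue, ordered by (T, pid)
        self.replies = set()

    def request(self, send):
        """Initiator: timestamp the request and broadcast it."""
        self.clock += 1
        heapq.heappush(self.queue, (self.clock, self.pid))
        for j in self.peers:
            send(j, ('request', self.clock, self.pid))

    def on_request(self, t, i, send):
        """Other process: queue the request, reply with a timestamp."""
        self.clock = max(self.clock + 1, t + 1)
        heapq.heappush(self.queue, (t, i))
        send(i, ('reply', self.clock, self.pid))

    def on_reply(self, t, j):
        # A reply is generated after the request is received, so the
        # clock rules guarantee it carries a later timestamp; it is
        # enough to record who has replied.
        self.clock = max(self.clock + 1, t + 1)
        self.replies.add(j)

    def may_enter(self):
        """Access is allowed when the own request heads the queue and
        all N-1 replies have arrived."""
        return (bool(self.queue) and self.queue[0][1] == self.pid
                and self.replies == set(self.peers))

    def release(self, send):
        """Remove the own request and broadcast a timestamped release."""
        heapq.heappop(self.queue)
        self.replies.clear()
        self.clock += 1
        for j in self.peers:
            send(j, ('release', self.clock, self.pid))

    def on_release(self, i):
        """Drop the releasing process's request from the local queue."""
        self.queue = [(t, p) for (t, p) in self.queue if p != i]
        heapq.heapify(self.queue)
```

The 3(N-1) message cost is visible directly: one round of N-1 messages each for requests, replies, and releases.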

Page 12: Ricart-Agrawala Algorithm

- Initiator i: process Pi requires exclusive access to a resource. It sends the timestamped message request(Ti, i), where Ti = Ci, to all the other processes.
- Other processes (j != i): when Pj receives the request, it reacts as follows:
  - if Pj is not requesting the resource, it sends a timestamped reply
  - if Pj needs the resource and its own timestamp precedes Pi's timestamp, Pi's request is retained (the reply is deferred); otherwise a timestamped reply is returned
- Pi is allowed access once replies from all the other processes have arrived
- On release, Pi sends the deferred replies for each pending request
- Cost: 2(N-1) messages (a sketch follows)
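The same skeleton with the Ricart-Agrawala refinement: a process holding an earlier conflicting request defers its reply instead of queueing and later releasing, which is what saves the third round of N-1 messages. Again a hedged sketch with illustrative names and the transport abstracted away:

```python
class RicartAgrawala:
    """One process's view of the Ricart-Agrawala algorithm."""

    def __init__(self, pid, peers):
        self.pid = pid
        self.peers = peers        # ids of the other N-1 processes
        self.clock = 0
        self.requesting = False
        self.my_stamp = None      # (T, pid) of the own pending request
        self.deferred = []        # requesters answered only on release
        self.replies = set()

    def request(self, send):
        """Initiator: timestamp the request and broadcast it."""
        self.clock += 1
        self.my_stamp = (self.clock, self.pid)
        self.requesting = True
        for j in self.peers:
            send(j, ('request', self.clock, self.pid))

    def on_request(self, t, i, send):
        """Reply at once unless the own pending request precedes it."""
        self.clock = max(self.clock + 1, t + 1)
        if self.requesting and self.my_stamp < (t, i):
            self.deferred.append(i)    # own request is earlier: defer
        else:
            send(i, ('reply', self.clock, self.pid))

    def on_reply(self, j):
        self.replies.add(j)

    def may_enter(self):
        """Access is granted once all N-1 replies have arrived."""
        return self.replies == set(self.peers)

    def release(self, send):
        """Leaving the critical section answers every deferred request."""
        self.requesting = False
        self.replies.clear()
        for i in self.deferred:
            self.clock += 1
            send(i, ('reply', self.clock, self.pid))
        self.deferred.clear()
```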

Page 13: Distributed Shared Memory

- A software abstraction over loosely coupled systems
- Provides a shared-memory style of operation over the underlying IPC/RPC mechanisms
- Can be implemented in the OS kernel or in a runtime system
- Also known as a Distributed Shared Virtual Memory (DSVM) system
- The shared space exists only virtually

Page 14: DSM Architecture

[Figure: several nodes, each with CPU(s) and a memory-mapping unit over local memory, connected by a communication network; a distributed shared memory layer spans all the nodes.]

Page 15: DSM Architecture (continued)

- Unlike tightly coupled systems, this shared memory is entirely virtual
- The shared space is partitioned into blocks
- Local memory is treated as a large local cache
- If the requested data is not available locally, a network fault is generated
- The OS, through a message, requests the block from the node holding it and gets it migrated to the node where the fault occurred (a toy sketch follows)
- Data may be replicated locally
- Configurations vary depending on which replication and migration policies are used
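A toy model of the fault-and-migrate path described above; the block table and node names are invented for illustration:

```python
# Fault-driven block migration: an access to a block that is not held
# locally raises a "network fault", and the handler migrates the whole
# block from its current holder before the access proceeds.

location = {0: 'node1', 1: 'node2'}                      # block -> holder
memory = {'node1': {0: b'data0'}, 'node2': {1: b'data1'}}

def access(node, block):
    if block not in memory[node]:                        # block fault
        holder = location[block]
        memory[node][block] = memory[holder].pop(block)  # migrate it
        location[block] = node                           # update location
    return memory[node][block]                           # now a local access

print(access('node1', 1))   # faults, migrates block 1 from node2
```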

Page 16: Design Issues

- Granularity (block size): smaller blocks mean more faults and more traffic; larger blocks suit jobs with higher locality
- Structure: the layout of data; depends on the application
- Coherence & access synchronization: like the cache coherence situation in a shared-memory multiprocessor
- Data location & access: which data to replicate, and where to locate it
- Replacement strategy
- Thrashing
- Heterogeneity

Page 17: Granularity

Page 18: Block Size Selection Factors

- Large block sizes are favored, as the overheads of transferring a smaller block and a larger one are not too different
- Paging overhead: paging overheads also favor larger block sizes; the application should thus have high locality of reference
- Directory size: smaller blocks mean a larger directory and larger management overhead
- Thrashing: thrashing is likely to increase with larger block sizes
- False sharing: larger block sizes increase its probability; the consequence is more thrashing

Page 19: Page Size as Block Size

The page size is the preferred DSM block size. The advantages are:
- Existing page-fault hardware can be used as the block-fault mechanism, and memory coherence can be handled in the page-fault handlers
- Access control can be managed with the existing memory-mapping system
- If the page size is less than the packet size, there is no extra transfer overhead
- The page size has proved, over time, to be the right unit as far as memory contention is concerned

Page 20: Structure

Page 21: Structure of the Shared Memory Space

Approaches to structuring:
- No structure: a linear array of memory; easy to design
- By data type: granularity per variable; complex to handle
- As a database: a tuple space with associative memory; primitives need to be added to languages, and access to shared data is not transparent

Page 22: Consistency Models

Page 23: Consistency Models

- Strict consistency
- Sequential consistency
- Causal consistency
- Pipelined random access memory (PRAM) consistency
- Processor consistency
- Weak consistency
- Release consistency

Page 24: Strict Consistency Model

- The value read from a memory address is the same as the latest write at that address
- Writes become visible to all nodes
- Needs an absolute ordering of memory read/write operations; a global time base is required (to define "most recent")
- Nearly impossible to implement

Page 25: Sequential Consistency Model

- All processes should see the same ordering of reads and writes
- The exact interleaving does not matter
- No memory operation is started until the earlier operations have completed
- Acceptable in most applications

Page 26: Causal Consistency Model

- Operations are seen in the same (correct) order when they are causally related
- If w2 follows w1 and is causally related to it (for example, w2 was issued after its process read the value written by w1), then w1, w2 is the order every process should see
- Writes may be seen in different orders when they are not causally related

Page 27: Pipelined RAM Consistency Model

- All writes of a single process are seen in the same order by the other processes (as in a pipeline)
- However, writes by different processes may appear in different orders: (w11, w12) and (w21, w22) can be seen either as (w11, w12) followed by (w21, w22), or as (w21, w22) followed by (w11, w12)
- Simple to implement

Page 28: Processor Consistency Model

- Adds memory coherence to the PRAM model
- That is, if the writes are to a particular memory location, then all processes should see those writes in the same order, so that memory coherence is maintained

Page 29: Weak Consistency Model

- Changes to memory can be made visible after a set of changes has happened (for example, at the end of a critical section)
- Isolated accesses to a variable are rare; usually there are several accesses and then none at all
- The difficulty is that the system cannot know when to make the changes visible
- Application programmers take care of this through a synchronization variable
- Necessarily:
  - all accesses to the sync variable must follow the strongest consistency (sequential)
  - all pending writes must be completed before an access to the sync variable is allowed
  - all previous accesses to the sync variable must be completed before another access is allowed

Page 30: Release Consistency Model

- The weak consistency model requires that:
  - all changes made by a process are propagated to all nodes, and
  - all changes made at other nodes are propagated to the process's node
- Acquire and release operations on the sync variable are used so that only one of the two propagations above needs to be done at each synchronization point (a sketch follows)
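An illustrative, process-local sketch of the acquire/release idea, using a lock as the synchronization variable; the propagation step is modeled by printing, since real DSM plumbing is out of scope here:

```python
import threading

sync_var = threading.Lock()   # the synchronization variable
shared = {'x': 0}             # this node's view of shared memory
pending = []                  # writes buffered until the release

def acquire():
    # Acquire: would pull changes made at other nodes into this node
    # (a no-op here, because everything in this sketch is one process).
    sync_var.acquire()

def write(key, value):
    shared[key] = value
    pending.append((key, value))     # buffered, not yet propagated

def release():
    # Release: propagate all buffered writes in one batch, then free
    # the sync variable. Only one propagation direction is needed per
    # synchronization point, which is the point of the model.
    for key, value in pending:
        print(f'propagate {key}={value}')
    pending.clear()
    sync_var.release()

acquire()
write('x', 1)
write('x', 2)
release()    # other nodes need to see the two writes only now
```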

Page 31: Discussion of the Models

- The strict consistency model is difficult to implement and is almost never implemented
- The sequential consistency model is the most commonly used
- Causal, PRAM, processor, weak, and release consistency are implemented in many DSM systems; programmers need to intervene
- Weak and release consistency provide explicit sync variables to help with consistency

Page 32: Implementing the Sequential Consistency Model

How sequential consistency is implemented depends on which replication/migration strategies are allowed:
- Non-replicated, non-migrating blocks (NRNMBs)
- Non-replicated, migrating blocks (NRMBs)
- Replicated, migrating blocks (RMBs)
- Replicated, non-migrating blocks (RNMBs)

Page 33: NRNMB

- All requests for a block are routed through the OS and MMU to the single copy, which is not replicated and does not move
- Can cause a bottleneck, because memory accesses are serialized
- Parallelism is not possible

Page 34: NRMB

- No copies are kept; when required, the entire block is moved to the node that needs it
- Advantages:
  - no communication cost for subsequent accesses, since all accesses become local
  - applications can take advantage of locality; applications with high locality perform better
- Disadvantages:
  - prone to thrashing
  - no advantage from parallelism

Page 35: Data Locating in NRMB

- Broadcast:
  - on a fault, a request is broadcast and the current owner sends the block
  - broadcasts cause communication overhead
- Centralized server:
  - the request is sent to the server; the server asks the node holding the block to send it to the requesting node, and updates the location information
- Fixed distributed server:
  - the fault handler finds the mapping of the block to its specific server, sends the request, and gets the block
- Dynamic distributed server:
  - a fault causes a local lookup of the probable owner; the request goes to that node, which either holds the block or points to another probable owner; the block is fetched and the location information updated (a sketch follows)
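The dynamic distributed server scheme is essentially a "probable owner" chase; a small illustrative sketch with an invented hint table:

```python
# Each node keeps a hint about who owns the block. A fault follows the
# hints until the real owner is found, then compresses the chain so
# that later faults resolve in fewer hops.

probable_owner = {'A': 'B', 'B': 'C', 'C': 'D', 'D': 'D'}  # D owns it

def locate(start):
    path, node = [], start
    while probable_owner[node] != node:   # not yet at the true owner
        path.append(node)
        node = probable_owner[node]
    for visited in path:                  # update stale hints en route
        probable_owner[visited] = node
    return node

print(locate('A'))        # 'D'; A, B and C now point straight at D
```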

Page 36: RMB

- Replication is used to increase parallelism
- Reads can be done locally; writes have overheads
- Systems with a high read/write ratio can amortize the write overhead over many reads
- Maintaining coherence across the replicated blocks is an issue
- The two basic protocols used are:
  - write-invalidate
  - write-update

Page 37: Coherence Protocols

- Write-invalidate:
  - on a write fault, the fault handler copies the block from one of the nodes to its own node
  - it invalidates all the other copies, then writes the data
  - if another node needs the block now, the updated block is replicated to it
- Write-update:
  - on a write fault, copy the block to the local node and update the data
  - send the address and new data to all the replicas
  - the operation resumes after all the writes are done

A toy comparison of the two follows.
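This sketch contrasts the two protocols on a single replicated block; the replica table is invented, and a real DSM would do this inside the page-fault handler:

```python
# 'copies' maps each node holding a valid replica to its block contents.

copies = {'A': [1, 2, 3], 'B': [1, 2, 3], 'C': [1, 2, 3]}

def write_invalidate(writer, offset, value):
    """Invalidate every other replica, then write locally; the other
    nodes must re-fetch the block on their next access."""
    for node in list(copies):
        if node != writer:
            del copies[node]          # just an invalidation signal
    copies[writer][offset] = value

def write_update(writer, offset, value):
    """Write locally and push the (address, new data) pair to every
    replica; all copies stay valid, but every write costs messages."""
    for node in copies:
        copies[node][offset] = value

write_invalidate('A', 0, 9)
print(copies)                         # only A still holds a valid copy
```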

Page 38: Comparison

- Write-update typically needs a global sequencer to make sure all nodes see the writes in the same sequence
- Also, its operations carry the full write data; together this is a significant communication overhead
- Write-invalidate does not need all that, just an invalidation signal
- Write-invalidate is thus the more often used method

Page 39: Data Locating in the RMB Strategy

- The owner of a block, the node that most recently had write access to it, needs to be located
- The nodes holding a valid copy need to be tracked
- Use one of the following:
  - broadcasting
  - the centralized server algorithm
  - the fixed distributed server algorithm
  - the dynamic distributed server algorithm

Page 40: RNMB

- Replicas are maintained, but blocks do not migrate
- Consistency is maintained by updating all the replicas through a write-update-like process

Page 41: Data Locating in the RNMB Strategy

- Replica locations do not change
- Replicas are kept consistent
- Read requests can go to any node that has the data block
- Writes go through a global sequencer

Page 42: Munin: A Release-Consistent DSM System

- Structure: a collection of shared variables
- Each shared variable goes in a separate memory page
- acquireLock and releaseLock operations are used for synchronization
- A different consistency protocol is applied to each type of shared variable used in the system: read-only, migratory, write-shared, producer-consumer, result, reduction, and conventional

Page 43: Replacement Strategy

Page 44: Replacement Strategy

Shared memory blocks are replicated and/or migrated, so two decisions need to be made:
- which block is to be replaced
- where the replaced block should go

Page 45: Blocks to Replace

- Usage-based vs. non-usage-based classification
- Fixed space vs. variable space
- Classification of blocks for replacement: unused, nil, read-only, read-owned, writable (a sketch of this as a priority order follows)
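Read as an eviction priority (unused blocks are the cheapest to evict, writable blocks the dearest, since a modified owned block must be saved somewhere), the classification can be sketched as follows; treating the listing order as the priority order is an assumption, not stated on the slide:

```python
# Illustrative eviction priority following the classification above.

PRIORITY = ['unused', 'nil', 'read-only', 'read-owned', 'writable']

def choose_victim(blocks):
    """blocks: block id -> state; pick the cheapest block to evict."""
    return min(blocks, key=lambda b: PRIORITY.index(blocks[b]))

print(choose_victim({7: 'writable', 3: 'read-only', 9: 'nil'}))  # 9
```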

Page 46: Place for the Replaced Block

- Use local secondary storage
- Use the memory space of other nodes: store the block in free memory at some other node; free-memory status information needs to be exchanged, piggybacked on normal communication messages

Page 47: Thrashing

Page 48: Thrashing Situations

- DSM allows migration, so migration back and forth leads to thrashing
- Data blocks keep migrating between nodes due to interleaved accesses by processes
- Read-only blocks are repeatedly invalidated soon after replication

Page 49: Thrashing Reduction Strategies

- Application-controlled locks
- Locking a block to a node for a time t; deciding t can be a very difficult issue
- Tuning the coherence strategy to the usage pattern; the transparency of the memory system is compromised

Page 50: Other Approaches to DSM

Page 51: Approaches

- Data caching managed by the OS
- Data caching managed by the MMU
- Data caching managed by the language runtime system

Page 52: Heterogeneous DSM

Page 53: Features of Heterogeneous DSM

Data conversion:
- Structuring the DSM as a collection of source-language objects
- Allowing only one type of data in a block (has complications): memory fragmentation, compilation issues
- An entire page is converted even though only a small part may be used before transfer
- Not transparent; user-provided conversion routines may be required

Page 54: Advantages of DSM

Page 55: Advantages

- A simpler abstraction
- Better portability of distributed applications
- Better performance for some systems
- A flexible communication environment
- Ease of process migration