Distributed Shared Memory Systems and Programming
By: Kenzie MacNeil
Adapted from Parallel Programming Techniques and Applications Using Networked Workstations and Parallel Computers by Barry Wilkinson and Michael Allen.

Dec 27, 2015

Transcript
Page 1

Distributed Shared Memory Systems and Programming

By: Kenzie MacNeil

Adapted from Parallel Programming Techniques and Applications Using Networked Workstations and Parallel Computers by Barry Wilkinson and Michael Allen.

Page 2

Distributed Shared Memory Systems

• Shared memory programming model on a cluster

• Has physically distributed and separate memory

• Programming viewpoint:
– Memory is grouped together and sharable between processes

• Known as Distributed Shared Memory (DSM)

Page 3

Distributed Shared Memory Systems

• Can be achieved by software or hardware
• Software:
– Easy to use on clusters
– Inferior to using explicit message passing on the same cluster

• Utilizes the same techniques as true shared memory systems (Chapter 8)

Page 4

Distributed Shared Memory

• Shared memory programming is generally more convenient than message passing

• Data can be accessed by individual processors without explicitly sending data

• Shared data has to be controlled
– Locks or other means

• Both message passing and shared memory often require synchronization

Page 5

Distributed Shared Memory

• Distributed shared memory is a group of interconnected computers that appears to have a single memory with a single address space

• Each computer has its own memory, which is physically distributed

• Any memory location can be accessed by any processor in the cluster
– Regardless of whether the memory resides locally

Page 6

Distributed Shared Memory

Page 7

Advantages of DSM

• Normal shared memory programming techniques can be used

• Easily scalable, compared to traditional bus-connected shared memory multiprocessors

• Message passing is hidden from the user
• Can handle complex and large databases without replicating or sending the data to processes

Page 8

Disadvantages of DSM

• Lower performance than true shared memory multiprocessor systems

• Must provide for protection against simultaneous access to shared data – Locks, etc.

• Little programmer control over actual messages being generated

• Incur performance penalties when compared to message passing routines on a cluster

Page 9

Hardware DSM Systems

• Special network interfaces and cache coherence circuits are required

• Several interfaces exist that support shared memory operations

• Higher level of performance
• More expensive

Page 10

Software DSM Systems

• Requires no hardware changes
• Performed by software routines
• Software layer added between the operating system and the applications
– Kernel may or may not be modified

• Software layer can be
– Page based
– Shared variable based
– Object based

Page 11

Page Based DSM

• Existing virtual memory is used to instigate movement of data between computers

• Occurs when page referenced does not reside locally

• Referred to as a virtual shared memory system
• Page based systems include:
– The first DSM system by Li (1986), TreadMarks (1996), Locust (1998)

Page 12

Page Based DSM System

Page 13

Page Based DSM Disadvantages

• Size of the unit of data, a page, can be too big
• More than the specific data is usually referenced
– Leads to longer messages

• Not portable, because they are tied to a particular virtual memory hardware and software

• False sharing effects appear at the page level
– A situation in which different processors require different parts of a page without actually sharing any information, yet the whole page must be shared by each process to access those parts

Page 14

Shared Variable DSM

• Only variables declared as shared are transferred

• Transferred on demand
– Paging mechanism is not used

• Software routines perform the actions
• Shared variable DSM approaches include:
– Munin (1990), JIAJIA (1999), Adsmith (1996)

Page 15

Object Based DSM

• Shared data is embodied in objects
– Includes data items and procedures/methods
– Methods are used to access the data

• Similar to shared variable approach, even considered an extension

• Easily implemented in OO languages

Page 16

Managing Shared Data

• Many ways a processor can be given access to shared data

• Simplest is the use of a central server
– Responsible for all read/write operations on shared data
– Requests are sent to this server
– Operations occur sequentially on the server
– Implements a single reader/single writer policy
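As a concrete illustration, the central-server scheme can be modeled in a few lines of Python. This is a toy, single-machine sketch; in a real system each `handle` call would be a message over the network, and the class and method names here are invented, not taken from any real DSM library.

```python
# Toy model of a central-server DSM: one server owns all shared
# variables and serializes every read/write request, giving a
# single reader / single writer policy.

class CentralServer:
    def __init__(self):
        self.store = {}                       # all shared variables live here

    def handle(self, request):
        op, name, *value = request
        if op == "write":
            self.store[name] = value[0]       # writes applied in arrival order
            return "ack"
        return self.store.get(name)           # "read"

class Client:
    """Stand-in for a processor; every access becomes a server request."""
    def __init__(self, server):
        self.server = server

    def write(self, name, value):
        return self.server.handle(("write", name, value))

    def read(self, name):
        return self.server.handle(("read", name))

server = CentralServer()
p0, p1 = Client(server), Client(server)
p0.write("x", 42)
print(p1.read("x"))   # both processors see the same value: 42
```

Because every operation funnels through one server, consistency is trivial, but so is the bottleneck the next slides describe.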

Page 17

Managing Shared Data

• The single reader/writer policy incurs a bottleneck
• Additional servers can be added to relieve this bottleneck by dividing the shared variables among them
• However, multiple copies of the data are preferable
– Allows simultaneous access to the data by different processors
– A coherence policy must be used to maintain these copies

Page 18

Multiple Reader / Single Writer

• Allows multiple processors to read shared data
– Which can be achieved by replicating data

• Allows only one processor, the owner, to alter data at any instant

• When an owner alters data, two policies are available:
– Update policy
– Invalidate policy

Page 19

Multiple Reader/Single Writer Policy

• Update policy
– Utilizes broadcast
– All copies are altered to reflect the broadcast message
• Invalidate policy
– All unaltered copies of the data are flagged as invalid
– Requires a processor to make a request to the owner
– Any copies of the data that are not accessed remain invalid
• Both policies must be implemented reliably
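The two policies can be contrasted with a toy Python model. All names here are invented, and a real protocol would exchange network messages rather than update objects in place; the sketch only shows the bookkeeping difference between broadcasting the new value and flagging copies invalid.

```python
# Toy multiple-reader/single-writer directory: on a write, the owner
# either broadcasts the new value (update policy) or marks the other
# copies invalid (invalidate policy); an invalid copy is re-fetched
# from the owner on the next read.

class Copy:
    def __init__(self):
        self.value, self.valid = None, False

class Directory:
    def __init__(self, num_procs, policy):
        self.copies = [Copy() for _ in range(num_procs)]
        self.policy = policy                     # "update" or "invalidate"
        self.owner_value = None

    def write(self, writer, value):
        self.owner_value = value
        for i, c in enumerate(self.copies):
            if i == writer or self.policy == "update":
                c.value, c.valid = value, True   # writer's copy / broadcast
            else:
                c.valid = False                  # flag copy as invalid

    def read(self, reader):
        c = self.copies[reader]
        if not c.valid:                          # fetch from owner on demand
            c.value, c.valid = self.owner_value, True
        return c.value

d = Directory(3, "invalidate")
d.write(0, 7)
print(d.copies[1].valid)   # False: copy flagged invalid, not refreshed yet
print(d.read(1))           # 7: re-fetched from the owner on access
```

Note how, under invalidate, copies that are never read again cost nothing, while update pays the broadcast for every copy whether or not it is used.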

Page 20

Multiple Reader/Single Writer Policy

• Page based approach
• The complete page, which holds the variable, is transferred
• A variable stored on a page that is not shared will still be moved or invalidated with it
• Protocols are offered by systems like TreadMarks to allow multiple writers to a single page

Page 21

Achieving Consistent Memory in DSM

• Memory consistency addresses when the current value of a shared variable is seen by other processors

• Various models are available:
– Strict consistency
– Sequential consistency
– Relaxed consistency
– Weak consistency
– Release consistency
– Lazy release consistency

Page 22

Strict Consistency

• A read returns the value of the most recent write to the shared variable

• As soon as a variable is altered, all other processors are informed
– Can be done by update or invalidate

• Disadvantage is the large number of messages; changes are also not instantaneous

• In relaxed memory consistency, writes are delayed to reduce message passing

Page 23

Strict Consistency

Page 24

Sequential and Weak Consistency

• Sequential consistency: the result of any execution is the same as some interleaving of the individual programs

• Weak consistency: synchronization operations are used by the programmer to enforce sequential consistency

• Any access to shared data can be controlled with synchronization operations
– Locks, etc.

Page 25

Release Consistency

• Extension of weak consistency
• Specified synchronization operations
– Acquire operation, used before a shared variable or variables are to be read
– Release operation, used after the shared variable or variables have been altered
• Acquire is performed with a lock operation
• Release is performed with an unlock operation
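The acquire/release pattern maps directly onto lock/unlock. A minimal sketch, using Python threads as stand-ins for DSM processes (in a real DSM the release is also the point where the writes become visible to other processors):

```python
# Acquire/release sketch: shared data is only touched between an
# acquire (lock) and a release (unlock), so writes from different
# workers never interleave.
import threading

lock = threading.Lock()
shared = {"x": 0}

def worker():
    lock.acquire()          # acquire: before reading/writing shared data
    shared["x"] += 1        # protected read-modify-write
    lock.release()          # release: after the shared data is altered

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared["x"])          # 4: all four increments observed
```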

Page 26

Release Consistency

Page 27

Lazy Release Consistency

• Version of release consistency
• Updating is only done at the time of an acquire rather than at a release
• Generates fewer messages than release consistency

Page 28

Lazy Release Consistency

Page 29

Distributed Shared Memory Programming Primitives

• Four fundamental and necessary operations of shared memory programming:
– Process/thread creation and termination
– Shared data creation
– Mutual exclusion synchronization: controlled access to shared data
– Process/thread and event synchronization

• Typically provided by user-level library calls

Page 30

Process Creation

• A set of routines is defined by DSM systems
– Such as Adsmith and TreadMarks
• Used to start a new process if process creation is supported
– dsm_spawn(filename, num_processes);

Page 31

Shared Data Creation

• A routine is necessary to declare shared data
– dsm_shared(&x); or shared int x;
– Dynamically creates memory space for shared data in the manner of a C malloc

• Afterwards the memory space can be discarded

Page 32

Shared Data Access

• Various forms of data access are provided depending on the memory consistency used

• Some systems provide efficient routines for different classes of accesses

• Adsmith provides three types of accesses:
– Ordinary accesses
– Synchronization accesses
– Non-synchronization accesses

Page 33

Synchronization Accesses

• Two principal forms:
– Global synchronization and process-process pair synchronization
• Global synchronization is usually done through barrier routines
• Process-process pair synchronization can be done by the same routine or by separate routines, through simple synchronous send/receive routines

• DSM systems could also provide their own routines
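Global synchronization through a barrier can be illustrated with threads standing in for DSM processes; here Python's threading.Barrier plays the role of the DSM barrier routine:

```python
# Barrier synchronization: no thread enters phase 2 until every
# thread has finished phase 1.
import threading

N = 4
barrier = threading.Barrier(N)
order = []                          # append is atomic under the GIL

def phase_worker(i):
    order.append(("phase1", i))
    barrier.wait()                  # block until all N threads arrive
    order.append(("phase2", i))

threads = [threading.Thread(target=phase_worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All N phase1 entries must precede every phase2 entry.
print(all(e[0] == "phase1" for e in order[:N]))   # True
```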

Page 34

Overlapping Computations with Communications

• Can be provided by starting a nonblocking communication before its results are needed
– Called a prefetch routine

• The program continues execution after the prefetch has been called and while the data is being fetched

• Could even be done speculatively
• A special mechanism must be in place to handle memory exceptions
• Similar to the speculative load mechanism used in advanced processors that overlap memory operations with program execution
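The prefetch idea can be sketched as a nonblocking fetch overlapped with local computation. Python threads model the overlap; remote_fetch is an invented stand-in for a remote memory access:

```python
# Prefetch sketch: start a nonblocking fetch of remote data, overlap
# it with local computation, and block only when the value is needed.
from concurrent.futures import ThreadPoolExecutor
import time

def remote_fetch(name):
    time.sleep(0.05)                          # stands in for network latency
    return {"x": 10}[name]

with ThreadPoolExecutor() as pool:
    future = pool.submit(remote_fetch, "x")   # prefetch: returns immediately
    local = sum(range(1000))                  # overlapped local computation
    x = future.result()                       # wait only when the data is used

print(local + x)   # 499510
```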

Page 35

Distributed Shared Memory Programming

• DSM programming on a cluster uses the same concepts as shared memory programming on a shared memory multiprocessor system

• Uses user-level library routines or methods
• Message passing is hidden from the user

Page 36

Basic Shared-Variable Implementation

• The simplest DSM implementation is to use a shared variable approach with user-level DSM library routines
– Sitting on top of an existing message passing system, such as MPI
– Routines can be embodied into classes and methods

• The routines could send messages to a central location that is responsible for the shared variables

Page 37

Simple DSM System using a Centralized Server

Single reader/writer protocol

Page 38

Basic Shared-Variable Implementation

• A simple DSM system using a centralized server can easily result in a bottleneck

• One method to reduce this bottleneck is to have multiple servers running on different processors

• Each server responsible for specific shared variables

• This is a single reader / single writer protocol
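A sketch of the multiple-server idea, with each server owning the shared variables that hash to it. All names are invented, and a real system would send a message to the processor running the chosen server rather than call it directly:

```python
# Multiple-server DSM sketch: shared variables are partitioned among
# servers by hashing the variable name, so each server handles only
# its own subset of requests.

class Server:
    def __init__(self):
        self.store = {}          # variables this server is responsible for

servers = [Server() for _ in range(3)]

def server_for(name):
    # Fixed variable-to-server mapping; stable within one run.
    return servers[hash(name) % len(servers)]

def dsm_write(name, value):
    server_for(name).store[name] = value

def dsm_read(name):
    return server_for(name).store[name]

dsm_write("x", 1)
dsm_write("y", 2)
print(dsm_read("x"), dsm_read("y"))   # 1 2
```

Because each variable always maps to the same server, this remains a single reader/single writer protocol per variable, just with the load spread over several servers.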

Page 39

Simple DSM System using Multiple Servers

Page 40

Basic Shared-Variable Implementation

• Can also provide multiple reader capability
• A specific server is responsible for the shared variable
• Other local copies are invalidated

Page 41

Simple DSM System using Multiple Servers and Multiple Reader Policy

Page 42

Overlapping Data Groups

• Existing interconnection structure
• Access patterns of the application
• Static overlapping
– Defined by the programmer prior to execution

• Shared variables can migrate according to usage

Page 43

Symmetrical Multiprocessor System with Overlapping Data Regions

Page 44

Simple DSM System using Multiple Servers and Multiple Reader Policy

Page 45

Questions or Comments?