Super computers Parallel Processing By: Lecturer \ Aisha Dawood
Feb 24, 2016

Brendan McQuade

Transcript
Page 1: Super computers Parallel Processing

Super computers Parallel Processing

By:

Lecturer \ Aisha Dawood

Page 2: Super computers Parallel Processing

Communication Model of Parallel Platforms

•There are two primary forms of data exchange between parallel tasks - accessing a shared data space and exchanging messages.

•Platforms that provide a shared data space are called shared-address-space machines or multiprocessors.

•Platforms that support messaging are also called message passing platforms or multicomputers.

Page 3: Super computers Parallel Processing

Shared-Address-Space Platforms

•Part (or all) of the memory is accessible to all processors.

•Processors interact by modifying data objects stored in this shared-address-space.

•If the time taken by a processor to access any memory word in the system (global or local) is identical, the platform is classified as a uniform memory access (UMA) machine; otherwise, it is a non-uniform memory access (NUMA) machine (caches are not considered).

Page 4: Super computers Parallel Processing

NUMA and UMA Shared-Address-Space Platforms

Typical shared-address-space architectures: (a) Uniform-memory-access shared-address-space computer; (b) Uniform-memory-access shared-address-space computer with caches and memories; (c) Non-uniform-memory-access shared-address-space computer with local memory only.

Page 5: Super computers Parallel Processing

NUMA and UMA Shared-Address-Space Platforms

• The distinction between NUMA and UMA platforms is important from the point of view of algorithm design. On NUMA machines, accessing local memory is cheaper than accessing global memory, so algorithms and data structures must be built to keep accesses local.

• Programming these platforms is easier since reads and writes are implicitly visible to other processors (global memory space).

• However, read-write access to shared data must be coordinated.

• Caches in such machines require coordinated access to multiple copies. This leads to the cache coherence problem.

• Cache coherence problem: multiple copies of a single memory word exist, and they must stay consistent while multiple processors change that word.

• A weaker model of these machines provides an address map, but not coordinated access. These models are called non-cache-coherent shared-address-space machines.
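The need to coordinate read-write access to shared data can be sketched in a few lines. This is only an illustration using Python threads as stand-ins for processors; the function name `run_counter` and its parameters are hypothetical and do not come from the slides.

```python
import threading


def run_counter(num_threads=4, increments=10_000):
    """Several threads update one shared counter; a lock coordinates the writes."""
    shared = {"counter": 0}      # a data object in the shared address space
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            # The read-modify-write below is performed by one thread at a time.
            with lock:
                shared["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"]
```

Every thread sees the updates of the others without any explicit message, which is what makes the model easy to program; the lock is the coordination the slide refers to.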

Page 6: Super computers Parallel Processing

Shared-Address-Space vs. Shared Memory Machines

•It is important to note the difference between the terms shared address space and shared memory.

•We refer to the former as a programming abstraction and to the latter as a physical machine attribute.

• It is possible to provide a shared address space using a physically distributed memory.
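One way to picture a shared address space on top of physically distributed memory is an address map that turns a global address into a (node, offset) pair. The sketch below assumes a simple block distribution; the class `DistributedSharedSpace` and its methods are hypothetical names chosen for illustration, not an actual system's interface.

```python
class DistributedSharedSpace:
    """Toy model: one global address range backed by per-node local memories."""

    def __init__(self, num_nodes, words_per_node):
        self.num_nodes = num_nodes
        self.words_per_node = words_per_node
        # Each node owns its own physically separate block of memory.
        self.local_memories = [[0] * words_per_node for _ in range(num_nodes)]

    def locate(self, address):
        """Map a global address to (owning node, offset in that node's memory)."""
        if not 0 <= address < self.num_nodes * self.words_per_node:
            raise IndexError("address outside the shared address space")
        return divmod(address, self.words_per_node)

    def read(self, address):
        node, offset = self.locate(address)
        return self.local_memories[node][offset]

    def write(self, address, value):
        node, offset = self.locate(address)
        self.local_memories[node][offset] = value
```

The programmer only uses `read` and `write` on global addresses (the programming abstraction), while the data physically lives in one node's local memory (the machine attribute).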

Page 7: Super computers Parallel Processing

Message-Passing Platforms

• The logical machine view of a message-passing platform consists of p processing nodes, each with its own exclusive address space (exclusive memory).

• Interactions in such platforms between processes running on different nodes must be accomplished using messages, hence the name message passing.

• These platforms are programmed using (variants of) send and receive primitives.

• Libraries such as MPI and PVM provide such primitives.

Page 8: Super computers Parallel Processing

Message Passing vs. Shared Address Space Platforms

•Message passing requires little hardware support, other than a network, and more programming support.

•Shared address space platforms can easily emulate message passing. The reverse is more difficult to do (in an efficient manner).
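The claim that a shared address space can emulate message passing can be sketched with per-task mailboxes kept in shared memory. This is a minimal illustration using Python threads; `SharedMemoryMailboxes`, `send`, `receive`, and `demo` are hypothetical names and are not the MPI or PVM primitives.

```python
import threading
from collections import deque


class SharedMemoryMailboxes:
    """Emulates send/receive using queues that live in shared memory."""

    def __init__(self, num_tasks):
        self._boxes = [deque() for _ in range(num_tasks)]
        self._cond = threading.Condition()

    def send(self, src, dest, payload):
        # Writing into the destination's mailbox plays the role of a send.
        with self._cond:
            self._boxes[dest].append((src, payload))
            self._cond.notify_all()

    def receive(self, me):
        # Block until a message is present, then remove and return it.
        with self._cond:
            while not self._boxes[me]:
                self._cond.wait()
            return self._boxes[me].popleft()


def demo():
    """Task 0 sends a list to task 1; task 1 replies with its sum."""
    boxes = SharedMemoryMailboxes(2)
    result = []

    def task0():
        boxes.send(0, 1, [1, 2, 3])
        result.append(boxes.receive(0))

    def task1():
        src, data = boxes.receive(1)
        boxes.send(1, src, sum(data))

    threads = [threading.Thread(target=task0), threading.Thread(target=task1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

Here `demo()` returns the pair (sender, payload) that task 0 received. The reverse emulation would need every remote read and write to be turned into messages, which is why it is harder to do efficiently.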

Page 9: Super computers Parallel Processing

Physical Organization of Parallel Platforms

We begin this discussion with an ideal parallel machine called the Parallel Random Access Machine, or PRAM.

Page 10: Super computers Parallel Processing

Architecture of an Ideal Parallel Computer

•A natural extension of the Random Access Machine (RAM) serial architecture is the Parallel Random Access Machine, or PRAM.

•PRAMs consist of p processors and a global memory of unbounded size that is uniformly accessible to all processors.

•Processors share a common clock but may execute different instructions in each cycle.

Page 11: Super computers Parallel Processing

Architecture of an Ideal Parallel Computer

• Depending on how simultaneous memory accesses are handled, PRAMs can be divided into four subclasses.

▫ Exclusive-read, exclusive-write (EREW) PRAM. Access to a memory location is exclusive; no concurrent read or write operations are allowed (weakest model, minimum concurrency).

▫ Concurrent-read, exclusive-write (CREW) PRAM. Multiple read accesses to a memory location are allowed; multiple write accesses are serialized.

▫ Exclusive-read, concurrent-write (ERCW) PRAM. Multiple write accesses to a memory location are allowed, but read accesses are serialized.

▫ Concurrent-read, concurrent-write (CRCW) PRAM. Multiple read and write accesses to a common memory location are allowed; this is the most powerful PRAM model.
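The four subclasses can be compared by checking whether one cycle's worth of memory accesses is allowed under each model. The sketch below is a simplification: it only counts same-kind accesses to each address and does not model a read and a write hitting the same address in one cycle. The names `MODELS` and `step_is_allowed` are hypothetical.

```python
from collections import Counter

# model name -> (concurrent reads allowed, concurrent writes allowed)
MODELS = {
    "EREW": (False, False),
    "CREW": (True, False),
    "ERCW": (False, True),
    "CRCW": (True, True),
}


def step_is_allowed(model, accesses):
    """accesses: list of (processor, op, address) issued in the same cycle,
    where op is "R" for a read or "W" for a write."""
    concurrent_read, concurrent_write = MODELS[model]
    reads = Counter(addr for _, op, addr in accesses if op == "R")
    writes = Counter(addr for _, op, addr in accesses if op == "W")
    if not concurrent_read and any(n > 1 for n in reads.values()):
        return False   # two processors read one location in the same cycle
    if not concurrent_write and any(n > 1 for n in writes.values()):
        return False   # two processors write one location in the same cycle
    return True
```

For example, two processors reading address 5 in the same cycle is rejected under EREW and ERCW but accepted under CREW and CRCW.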

Page 12: Super computers Parallel Processing

Architecture of an Ideal Parallel Computer

• What does concurrent write mean, anyway?

Concurrent reads do not cause any problems, but concurrent write accesses must be resolved.

The most commonly used protocols to resolve concurrent writes are:

▫ Common: write only if all values are identical.

▫ Arbitrary: write the data from a randomly selected processor; the other writes fail.

▫ Priority: follow a predetermined priority list; the processor with the highest priority succeeds and the rest fail.

▫ Sum: write the sum of all data items; this can be extended to any operator defined on the quantities being written.
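The four resolution protocols can be sketched as one function that decides what ends up in the memory location. This is an illustrative sketch only; the function name, the dictionary representation of the writes, and the convention of returning None when nothing is written are assumptions, not part of the PRAM definition.

```python
import operator
import random
from functools import reduce


def resolve_concurrent_write(protocol, writes, priority=None, op=operator.add):
    """writes: dict mapping processor id -> value, all aimed at one location.
    Returns the value stored, or None if the write does not take place."""
    if not writes:
        raise ValueError("no writes to resolve")
    values = list(writes.values())
    if protocol == "common":
        # The write succeeds only if every processor writes the same value.
        return values[0] if all(v == values[0] for v in values) else None
    if protocol == "arbitrary":
        # One randomly selected processor succeeds; the others fail.
        return writes[random.choice(sorted(writes))]
    if protocol == "priority":
        if priority is None:
            raise ValueError("priority protocol needs a priority list")
        # priority lists processor ids from highest to lowest priority.
        winner = next(p for p in priority if p in writes)
        return writes[winner]
    if protocol == "sum":
        # With the default op this is the sum; any binary operator works.
        return reduce(op, values)
    raise ValueError(f"unknown protocol: {protocol}")
```

Passing `op=max` to the "sum" protocol shows the extension to other operators mentioned in the last bullet.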

Page 13: Super computers Parallel Processing

Interconnection Networks for Parallel Computers

• Interconnection networks carry data between processors and to memory.

• Interconnects are made of switches and links (wires, fiber).

• Interconnection networks are classified as static or dynamic.

• Static networks consist of point-to-point communication links among processing nodes and are also referred to as direct networks.

• Dynamic networks are built using switches and communication links that connect nodes dynamically. Dynamic networks are also referred to as indirect networks.

Page 14: Super computers Parallel Processing

Static and Dynamic Interconnection Networks

Classification of interconnection networks: (a) a static network; and (b) a dynamic network.

Page 15: Super computers Parallel Processing

Interconnection Networks

• Switch functionality:

▫ Switches map a fixed number of inputs to outputs.

▫ Switches may also support internal buffering, for when the requested output port is busy.

▫ Switches may support routing, to prevent collisions on the network.

▫ Switches may support multicast (same output on multiple ports).

• The total number of ports on a switch is the degree of the switch.

• The cost of a switch is affected by mapping cost and packaging cost.

Page 16: Super computers Parallel Processing

Interconnection Networks: Network Interfaces

• Processors connect to the network via a network interface which has input and output ports that pipe data into and out of the network.

• The network interface may be placed on the I/O bus or the memory bus. The latter supports higher bandwidth, since I/O buses are slower.

• The relative speeds of the I/O and memory buses impact the performance of the network.

Page 17: Super computers Parallel Processing

Network Topologies

• A variety of network topologies have been proposed and implemented.

• These topologies trade off performance for cost.

• Commercial machines often implement hybrids of multiple topologies for reasons of packaging, cost, and available components.

Page 18: Super computers Parallel Processing

Network Topologies: Bus based network

• Some of the simplest and earliest parallel machines used buses.

• All processors access a common bus for exchanging data.

• The distance between any two nodes is 1 in a bus. The bus also provides a convenient broadcast medium.

• However, the bandwidth of the shared bus is a major bottleneck as the number of nodes increases.

• The demand on bus bandwidth can be reduced by making the majority of the data accessed local to the node, so that only remote data is accessed over the bus.
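A back-of-the-envelope sketch of the last point, under a deliberately simple assumption: every processor issues the same number of memory accesses, and a fixed fraction of them is served from memory local to the node. The function name and parameters are hypothetical.

```python
def bus_transfers(num_processors, accesses_per_processor, local_fraction):
    """Number of accesses that must cross the shared bus in this toy model."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    # Only the non-local share of each processor's accesses uses the bus.
    remote_per_processor = accesses_per_processor * (1.0 - local_fraction)
    return num_processors * remote_per_processor
```

With 10 processors issuing 1000 accesses each, the bus carries 10000 transfers when nothing is local and 2500 when three quarters of the accesses are local. The bus load still grows linearly with the number of processors, which is the bottleneck the slide describes.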

Page 19: Super computers Parallel Processing

Network Topologies: Bus based network

Bus-based interconnects (a) with no local caches; (b) with local memory/caches.

Since much of the data accessed by processors is local to the processor, a local memory can improve the performance of bus-based machines.

Page 20: Super computers Parallel Processing

Network Topologies: Crossbars

A completely non-blocking crossbar network connecting p processors to b memory banks.

A crossbar network uses a p×m grid of switches to connect p inputs to m outputs in a non-blocking manner.

Page 21: Super computers Parallel Processing

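Network Topologies: Crossbars

•The cost of a crossbar network connecting p processors to b memory banks grows with the number of switches (p×b), so it rises quickly as the number of processing nodes increases. Using fewer memory banks than processors to limit this cost leads to blocked memory accesses, since some processors cannot reach a free bank.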

Page 22: Super computers Parallel Processing

Network Topologies: Multistage Networks

•Crossbars have excellent performance scalability but poor cost scalability.

•Buses have excellent cost scalability, but poor performance scalability.

•Multistage interconnects strike a compromise between these extremes.

Page 23: Super computers Parallel Processing

Network Topologies: Multistage Networks

The schematic of a typical multistage interconnection network.

Page 24: Super computers Parallel Processing

Network Topologies: Multistage Omega Network

•One of the most commonly used multistage interconnects is the Omega network.

•This network consists of log p stages, where p is the number of inputs/outputs.

•At each stage, input i is connected to output j if: j = 2i for 0 ≤ i ≤ p/2 − 1, or j = 2i + 1 − p for p/2 ≤ i ≤ p − 1.

Page 25: Super computers Parallel Processing

Network Topologies: Multistage Omega Network

Each stage of the Omega network implements a perfect shuffle as follows:

A perfect shuffle interconnection for eight inputs and outputs.
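The perfect shuffle can be sketched in code. The usual rule is j = 2i for inputs in the lower half and j = 2i + 1 − p for inputs in the upper half, which is the same as rotating the binary representation of i left by one bit. The function names below are hypothetical, and p is assumed to be a power of two.

```python
def perfect_shuffle(i, p):
    """Output j that input i is wired to in a perfect shuffle of p lines."""
    if p < 2 or p & (p - 1):
        raise ValueError("p must be a power of two")
    if not 0 <= i < p:
        raise ValueError("input index out of range")
    return 2 * i if i < p // 2 else 2 * i + 1 - p


def rotate_left(i, p):
    """Same mapping, written as a one-bit left rotation of i's log2(p) bits."""
    bits = p.bit_length() - 1
    return ((i << 1) | (i >> (bits - 1))) & (p - 1)
```

For eight inputs, inputs 0 to 7 map to outputs 0, 2, 4, 6, 1, 3, 5, 7, and both functions agree on every input.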

Page 26: Super computers Parallel Processing

Network Topologies: Multistage Omega Network

•The perfect shuffle patterns are connected using 2×2 switches.

•The switches operate in two modes: cross-over or pass-through.

Two switching configurations of the 2 × 2 switch: (a) Pass-through; (b) Cross-over.

Page 27: Super computers Parallel Processing

Network Topologies: Multistage Omega Network

A complete Omega network with the perfect shuffle interconnects and switches can now be illustrated:

A complete omega network connecting eight inputs and eight outputs.
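To see why an Omega network is cheaper than a crossbar, the switch counts can be compared. The sketch assumes p is a power of two, that each of the log p stages holds p/2 switches (each 2×2 switch serves two of the p lines), and that a crossbar needs one switching point per grid cell. The function names are hypothetical.

```python
def crossbar_switch_count(p, m=None):
    """Switching points in a p x m crossbar grid (m defaults to p)."""
    m = p if m is None else m
    return p * m


def omega_switch_count(p):
    """2 x 2 switches in an Omega network with p inputs and p outputs."""
    if p < 2 or p & (p - 1):
        raise ValueError("p must be a power of two")
    stages = p.bit_length() - 1      # log2(p) stages
    return (p // 2) * stages         # p/2 switches per stage
```

For eight inputs this gives 12 switches against 64 crossbar points; for 1024 inputs, 5120 against 1048576.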

Page 28: Super computers Parallel Processing

Network Topologies: Multistage Omega Network – Routing

An example of blocking in omega network: one of the messages (010 to 111 or 110 to 100) is blocked at link AB.

Blocking is the most important property of the Omega network: an access to one memory location by one processor may block an access to another memory location by another processor. Networks with this property are called blocking networks.
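The blocking example can be reproduced with a small routing sketch. It assumes the common destination-tag scheme: at each stage a message passes through the perfect shuffle, then the 2×2 switch sends it to its upper or lower output according to the next bit of the destination, most significant bit first. Each link is identified here by a (stage, line) pair; this labelling and the function names are assumptions, since the figure's labels A and B are not part of the transcript.

```python
def omega_route(source, dest, p):
    """Links used by a message from source to dest, as (stage, line) pairs."""
    if p < 2 or p & (p - 1):
        raise ValueError("p must be a power of two")
    n = p.bit_length() - 1                     # number of stages, log2(p)
    pos = source
    links = []
    for stage in range(n):
        # Perfect shuffle: rotate the n-bit line number left by one bit.
        pos = ((pos << 1) | (pos >> (n - 1))) & (p - 1)
        # The switch sets the lowest bit to the next destination bit
        # (pass-through if it already matches, cross-over otherwise).
        want = (dest >> (n - 1 - stage)) & 1
        pos = (pos & ~1) | want
        links.append((stage, pos))
    return links


def first_conflict(route_a, route_b):
    """First link that both routes need at the same stage, or None."""
    for a, b in zip(route_a, route_b):
        if a == b:
            return a
    return None
```

Routing 010 to 111 and 110 to 100 in an eight-input network, both messages need line 101 leaving the first stage, so one of them has to wait. Routing 000 to 000 and 111 to 111 shares no link, so those two messages proceed together.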