Introduction to Parallel Computing George Karypis Parallel Programming Platforms
Introduction to Parallel Computing - CSE User Home Pageskarypis/parbook/Lectures/GK... · System Software Parallel Operating ... -match goes via pass-through ... Useful in the early

Apr 08, 2018

vothien
Introduction to Parallel Computing

George Karypis
Parallel Programming Platforms

Elements of a Parallel Computer

Hardware
  Multiple Processors
  Multiple Memories
  Interconnection Network

System Software
  Parallel Operating System
  Programming Constructs to Express/Orchestrate Concurrency

Application Software
  Parallel Algorithms

Goal: Utilize the Hardware, System, & Application Software to either
  Achieve Speedup: ideally Tp = Ts/p
  Solve problems requiring a large amount of memory.

Parallel Computing Platform

Logical Organization
  The user’s view of the machine as it is presented via its system software

Physical Organization
  The actual hardware architecture

Physical Architecture is to a large extent independent of the Logical Architecture

Logical Organization Elements

Control Mechanism
  SISD/SIMD/MIMD/MISD
    Single/Multiple Instruction Stream & Single/Multiple Data Stream
  SPMD: Single Program Multiple Data

Logical Organization Elements

Communication Model
  Shared-Address Space
    UMA/NUMA/ccNUMA
  Message-Passing

Physical Organization

Ideal Parallel Computer Architecture
  PRAM: Parallel Random Access Machine
  PRAM Models
    EREW/ERCW/CREW/CRCW
    Exclusive/Concurrent Read and/or Write
  Concurrent Writes are resolved via
    Common/Arbitrary/Priority/Sum
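
The four concurrent-write rules can be illustrated with a small sketch. It assumes all writes in the list target the same memory cell in the same step, and that under the Priority rule the lowest processor id wins; both the function name and that priority convention are assumptions made for illustration, not something the slides specify.

```python
def resolve_concurrent_write(writes, policy):
    """Resolve simultaneous writes to one PRAM cell.

    writes: list of (processor_id, value) pairs issued in the same step.
    policy: "common", "arbitrary", "priority", or "sum".
    """
    if not writes:
        raise ValueError("no writes to resolve")
    values = [value for _, value in writes]
    if policy == "common":
        # Allowed only if every processor writes the same value.
        if len(set(values)) != 1:
            raise ValueError("common CRCW requires identical values")
        return values[0]
    if policy == "arbitrary":
        # Any single writer may succeed; this sketch picks the first listed.
        return values[0]
    if policy == "priority":
        # Assumption: the smallest processor id has the highest priority.
        return min(writes, key=lambda w: w[0])[1]
    if policy == "sum":
        # The cell receives the sum of all written values.
        return sum(values)
    raise ValueError(f"unknown policy: {policy}")


writes = [(2, 5), (0, 7), (1, 5)]
print(resolve_concurrent_write(writes, "priority"))
print(resolve_concurrent_write(writes, "sum"))
```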

Physical Organization

Interconnection Networks (ICNs)
  Provide processor-to-processor and processor-to-memory connections
  Networks are classified as:
    Dynamic
      The network consists of switching elements that the various processors attach to
      indirect network
      Historically used to link processors to memory
      shared-memory systems
    Static
      Consists of a number of point-to-point links
      direct network
      Historically used to link processors to processors
      distributed-memory systems

Static & Dynamic ICNs

Evaluation Metrics for ICNs

Diameter
  The maximum distance between any two nodes
  Smaller the better.
Connectivity
  The minimum number of arcs that must be removed to break the network into two disconnected networks
  Larger the better
  Measures the multiplicity of paths
Bisection width
  The minimum number of arcs that must be removed to partition the network into two equal halves.
  Larger the better
Bisection bandwidth
  Applies to networks with weighted arcs; weights correspond to the link width (how much data it can transfer)
  The minimum volume of communication allowed between any two halves of a network
  Larger the better
Cost
  The number of links in the network
  Smaller the better
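
Two of these metrics, diameter and cost, are easy to compute for a small static network. The sketch below builds a ring and a hypercube as adjacency sets and measures both; the helper names (`ring`, `hypercube`, `diameter`, `cost`) are illustrative, and the code assumes a connected, undirected network.

```python
from collections import deque


def ring(p):
    """Ring of p nodes: node i is linked to i-1 and i+1 (mod p)."""
    return {i: {(i - 1) % p, (i + 1) % p} for i in range(p)}


def hypercube(d):
    """d-dimensional hypercube: nodes differing in exactly one bit are linked."""
    return {i: {i ^ (1 << k) for k in range(d)} for i in range(1 << d)}


def diameter(adj):
    """Maximum shortest-path distance over all node pairs (BFS from each node)."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


def cost(adj):
    """Number of links; each undirected link appears in two adjacency sets."""
    return sum(len(neighbors) for neighbors in adj.values()) // 2


for name, net in [("ring(8)", ring(8)), ("hypercube(3)", hypercube(3))]:
    print(name, "diameter:", diameter(net), "cost:", cost(net))
```

For the same eight nodes, the hypercube has the smaller diameter and the ring has the smaller cost, which is the trade-off the metrics are meant to expose.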

Metrics and Dynamic Networks

Network Topologies

Bus-Based Networks
  Shared medium
  Information is broadcast
  Evaluation:
    Diameter: O(1)
    Connectivity: O(1)
    Bisection width: O(1)
    Cost: O(p)

Network Topologies

Crossbar Networks
  Switch-based network
  Supports simultaneous connections
  Evaluation:
    Diameter: O(1)
    Connectivity: O(1)?
    Bisection width: O(p)?
    Cost: O(p^2)

Network Topologies

Multistage Interconnection Networks

Multistage Switch Architecture

Pass-through

Cross-over

Connecting the Various Stages

Blocking in a Multistage Switch

Routing is done by comparing the bit-level representation of source and destination addresses.
  - match goes via pass-through
  - mismatch goes via cross-over
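
The bit-comparison rule can be sketched for an omega-style network. The code assumes p = 2^bits inputs, a perfect-shuffle (rotate-left) connection in front of every stage, and that stage i compares the i-th most significant bits of source and destination; the slides' figure is not available here, so those wiring details and the function name `omega_route` are assumptions.

```python
def omega_route(src, dst, bits):
    """Return the per-stage switch settings and the output reached.

    Before each stage the current position is perfect-shuffled (rotated
    left by one bit). The switch then either keeps the low bit
    (pass-through) or flips it (cross-over) so that it equals the
    destination bit for that stage.
    """
    mask = (1 << bits) - 1
    pos = src
    settings = []
    for stage in range(bits):
        # Perfect shuffle: rotate the address left by one bit.
        pos = ((pos << 1) | (pos >> (bits - 1))) & mask
        wanted = (dst >> (bits - 1 - stage)) & 1
        if (pos & 1) == wanted:
            settings.append("pass-through")  # source and destination bits match
        else:
            settings.append("cross-over")    # bits differ, so swap outputs
            pos ^= 1
    return settings, pos


settings, reached = omega_route(0b010, 0b111, 3)
print(settings, format(reached, "03b"))
```

Two messages block each other when they need the same output of the same switch in the same stage; comparing the intermediate positions of two calls to a routine like this one is one way to detect that.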

Network Topologies

Complete and star-connected networks.

Network Topologies

Cartesian Topologies

Network Topologies

Hypercubes

Network Topologies

Trees

Summary of Performance Metrics

Physical Organization

Cache Coherence in Shared Memory Systems
  A certain level of consistency must be maintained for multiple copies of the same data
  Required to ensure proper semantics and correct program execution
    serializability
  Two general protocols for dealing with it
    invalidate & update

Invalidate/Update Protocols

Invalidate/Update Protocols

The preferred scheme depends on the characteristics of the underlying application
  frequency of reads/writes to shared variables
Classical trade-off between communication overhead (updates) and idling (stalling in invalidates)
Additional problems with false sharing
Existing schemes are based on the invalidate protocol
  A number of approaches have been developed for maintaining the state/ownership of the shared data
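
The trade-off between the two protocols can be made concrete with a toy model. The sketch tracks a single shared variable, counts one message per write that must notify other caches, and counts a miss whenever a processor touches the variable without a valid copy. It is a deliberate simplification (no bus timing, no cache-line states), and the function name and trace format are assumptions.

```python
def simulate(trace, num_procs, protocol):
    """Count coherence events for one shared variable.

    trace: list of (processor, op) with op "r" (read) or "w" (write).
    protocol: "invalidate" or "update".
    """
    if protocol not in ("invalidate", "update"):
        raise ValueError(f"unknown protocol: {protocol}")
    valid = [False] * num_procs  # does each cache hold a valid copy?
    stats = {"misses": 0, "invalidates": 0, "updates": 0}
    for proc, op in trace:
        if not valid[proc]:
            # No valid local copy: the processor stalls to fetch the data.
            stats["misses"] += 1
            valid[proc] = True
        if op == "w":
            sharers = [q for q in range(num_procs) if q != proc and valid[q]]
            if sharers and protocol == "invalidate":
                stats["invalidates"] += 1  # one message drops all other copies
                for q in sharers:
                    valid[q] = False
            elif sharers:
                stats["updates"] += 1      # one message refreshes all other copies
    return stats


# Processor 0 writes repeatedly; processor 1 reads only before and after.
trace = [(0, "r"), (1, "r")] + [(0, "w")] * 5 + [(1, "r")]
print("invalidate:", simulate(trace, 2, "invalidate"))
print("update:    ", simulate(trace, 2, "update"))
```

On this trace the invalidate protocol sends a single message but makes processor 1 miss on its final read, while the update protocol sends a message for every write and avoids that miss.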

Communication Costs in Parallel Systems

Message-Passing Systems
  The communication cost of a data-transfer operation depends on:
    start-up time: ts
      add headers/trailers, error correction, execute the routing algorithm, establish the connection between source & destination
    per-hop time: th
      time to travel between two directly connected nodes.
      node latency
    per-word transfer time: tw
      1/channel-width

Store-and-Forward & Cut-Through Routing

Cut-through Routing Deadlocks

Messages 0, 1, 2, and 3 need to go to nodes A, B, C, and D, respectively

Communication Model Used for this Class

We will assume that the cost of sending a message of size m is:

  tcomm = ts + m tw

In general true because ts is much larger than th, and for most of the algorithms that we will study m tw is much larger than l th, where l is the number of links (hops) the message traverses.
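
A quick numeric check shows why dropping the per-hop term is usually harmless. The sketch compares the standard cut-through cost, ts + l th + m tw, with the simplified cost that omits l th. The parameter values are made up for illustration and only respect the stated assumptions (ts much larger than th, m tw much larger than l th).

```python
def cut_through_cost(m, l, ts, th, tw):
    """Cut-through routing cost for an m-word message over l hops."""
    return ts + l * th + m * tw


def simplified_cost(m, ts, tw):
    """Same cost with the per-hop term l * th dropped."""
    return ts + m * tw


# Hypothetical parameters in arbitrary time units.
ts, th, tw = 100, 1, 2
m, l = 1000, 10

full = cut_through_cost(m, l, ts, th, tw)
simple = simplified_cost(m, ts, tw)
relative_error = (full - simple) / full

print("full:", full, "simplified:", simple)
print("relative error: {:.2%}".format(relative_error))
```

With these values the two costs differ by exactly l * th time units, well under one percent of the total. The approximation degrades for very short messages sent over many hops.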

Routing Mechanisms

Routing:
  The algorithm used to determine the path that a message will take to go from the source to the destination
  Can be classified along different dimensions
    minimal vs non-minimal
    deterministic vs adaptive

Dimension Ordered Routing

There is a predefined ordering of the dimensions
Messages are routed along the dimensions in that order until they cannot move any further
  X-Y routing for meshes
  E-cube routing for hypercubes
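
Both schemes fit in a few lines. The sketch below assumes a 2-D mesh without wraparound whose nodes are (x, y) pairs, and a hypercube whose nodes are integers; for E-cube routing it corrects differing bits from the least significant dimension upward, which is one common convention. Function names are illustrative.

```python
def xy_route(src, dst):
    """X-Y routing on a 2-D mesh: finish the X dimension, then the Y dimension."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:                 # move along X until the column matches
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                 # then move along Y until the row matches
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def ecube_route(src, dst, dims):
    """E-cube routing on a hypercube: fix differing bits in dimension order."""
    path = [src]
    cur = src
    for k in range(dims):          # dimension 0 is the least significant bit
        if (cur ^ dst) & (1 << k):
            cur ^= 1 << k          # traverse the link along dimension k
            path.append(cur)
    return path


print(xy_route((0, 0), (2, 1)))
print([format(n, "03b") for n in ecube_route(0b010, 0b111, 3)])
```

Both routes are minimal and deterministic: the path depends only on the source and destination, never on network load.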

Topology Embeddings

Mapping between networks
  Useful in the early days of parallel computing when topology-specific algorithms were being developed.
Embedding quality metrics
  dilation
    maximum number of links an edge is mapped to
  congestion
    maximum number of edges mapped onto a single link
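
Dilation and congestion can be measured directly for a small embedding. The sketch maps an 8-node ring onto a 3-dimensional hypercube in two ways: with the reflected Gray code, a standard choice for embedding rings and meshes, and with the identity mapping for contrast. It assumes each ring edge is carried along the E-cube path between the images of its endpoints; that routing choice and all names are assumptions.

```python
from collections import Counter


def gray(i):
    """Reflected Gray code: consecutive values differ in exactly one bit."""
    return i ^ (i >> 1)


def ecube_links(a, b, dims):
    """Hypercube links used on the E-cube path from node a to node b."""
    links = []
    cur = a
    for k in range(dims):
        if (cur ^ b) & (1 << k):
            nxt = cur ^ (1 << k)
            links.append((min(cur, nxt), max(cur, nxt)))  # undirected link
            cur = nxt
    return links


def embedding_quality(guest_edges, mapping, dims):
    """Return (dilation, congestion) of an embedding into a hypercube."""
    load = Counter()
    dilation = 0
    for u, v in guest_edges:
        links = ecube_links(mapping[u], mapping[v], dims)
        dilation = max(dilation, len(links))  # longest path an edge maps to
        load.update(links)                    # edges carried by each link
    congestion = max(load.values()) if load else 0
    return dilation, congestion


ring_edges = [(i, (i + 1) % 8) for i in range(8)]
gray_map = {i: gray(i) for i in range(8)}
identity_map = {i: i for i in range(8)}

print("gray code:", embedding_quality(ring_edges, gray_map, 3))
print("identity: ", embedding_quality(ring_edges, identity_map, 3))
```

The Gray-code mapping places ring neighbors on hypercube neighbors, so every ring edge uses exactly one link; the identity mapping stretches some edges across several links.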

Mapping a Cartesian Topology onto a Hypercube

Cool things ☺

Mapping a Cartesian Topology onto a Hypercube