Introduction to Parallel Computing
Post on 08-Apr-2018
Elements of a Parallel Computer
  Hardware
    - Multiple processors
    - Multiple memories
    - Interconnection network
  System Software
    - Parallel operating system
    - Programming constructs to express/orchestrate concurrency
  Application Software
    - Parallel algorithms
  Goal: utilize the hardware, system, and application software to either
    - Achieve speedup: Tp = Ts/p (the ideal case, where Ts is the serial time and Tp the time on p processors)
    - Solve problems requiring a large amount of memory
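As a rough illustration of the speedup goal above, the sketch below computes the ideal parallel time Tp = Ts/p, plus the speedup and efficiency obtained from a measured parallel time. The function names and sample timings are hypothetical and not from the slides; speedup is taken as Ts/Tp and efficiency as speedup divided by p.

```python
def ideal_parallel_time(ts: float, p: int) -> float:
    """Best-case parallel time when the work divides perfectly over p processors."""
    return ts / p


def speedup(ts: float, tp: float) -> float:
    """Ratio of serial time to parallel time."""
    return ts / tp


def efficiency(ts: float, tp: float, p: int) -> float:
    """Fraction of the ideal speedup that was actually achieved."""
    return speedup(ts, tp) / p


if __name__ == "__main__":
    ts, p = 100.0, 4                     # hypothetical serial time and processor count
    print(ideal_parallel_time(ts, p))    # ideal Tp
    print(efficiency(ts, 50.0, p))       # efficiency if the measured Tp were 50.0
```

An efficiency below 1 indicates time lost to communication or idling relative to the ideal Tp = Ts/p.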
Parallel Computing Platform
  Logical Organization
    - The user’s view of the machine as presented via its system software
  Physical Organization
    - The actual hardware architecture
  The physical architecture is to a large extent independent of the logical architecture
Logical Organization Elements
  Control Mechanism
    - SISD/SIMD/MIMD/MISD: Single/Multiple Instruction stream & Single/Multiple Data stream
    - SPMD: Single Program Multiple Data
Physical Organization
  Ideal Parallel Computer Architecture
    - PRAM: Parallel Random Access Machine
  PRAM Models
    - EREW/ERCW/CREW/CRCW: Exclusive/Concurrent Read and/or Write
    - Concurrent writes are resolved via Common/Arbitrary/Priority/Sum
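The four concurrent-write resolution rules can be sketched as a single function that decides what one memory cell holds after several processors write to it in the same step. This is a minimal illustration, not a PRAM simulator: the function name, the `(processor_id, value)` representation, and the choice that a lower processor id means higher priority are all assumptions.

```python
def resolve_concurrent_write(writes, policy):
    """Return the value stored after simultaneous writes to one cell.

    writes: non-empty list of (processor_id, value) pairs.
    policy: "common", "arbitrary", "priority", or "sum".
    """
    if not writes:
        raise ValueError("at least one write is required")
    values = [value for _, value in writes]
    if policy == "common":
        # The write is legal only if every processor writes the same value.
        if len(set(values)) != 1:
            raise ValueError("common policy requires identical values")
        return values[0]
    if policy == "arbitrary":
        # Any one writer may succeed; this sketch simply takes the first listed.
        return values[0]
    if policy == "priority":
        # Assumption: the lowest processor id has the highest priority.
        return min(writes, key=lambda w: w[0])[1]
    if policy == "sum":
        # The cell receives the sum of all written values.
        return sum(values)
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    writes = [(2, 5), (0, 7), (1, 5)]
    for policy in ("arbitrary", "priority", "sum"):
        print(policy, resolve_concurrent_write(writes, policy))
```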
Physical Organization
  Interconnection Networks (ICNs)
    - Provide processor-to-processor and processor-to-memory connections
    - Networks are classified as:
      Dynamic
        - The network consists of switching elements that the various processors attach to
        - Also called an indirect network
        - Historically used to link processors to memory (shared-memory systems)
      Static
        - Consists of a number of point-to-point links
        - Also called a direct network
        - Historically used to link processors to processors (distributed-memory systems)
Evaluation Metrics for ICNs
  Diameter
    - The maximum distance between any two nodes
    - Smaller the better
  Connectivity
    - The minimum number of arcs that must be removed to break the network into two disconnected networks
    - Larger the better
    - Measures the multiplicity of paths
  Bisection width
    - The minimum number of arcs that must be removed to partition the network into two equal halves
    - Larger the better
  Bisection bandwidth
    - Applies to networks with weighted arcs; the weights correspond to the link width (how much data it can transfer)
    - The minimum volume of communication allowed between any two halves of the network
    - Larger the better
  Cost
    - The number of links in the network
    - Smaller the better
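For small networks, several of these metrics can be computed directly from an adjacency list. The sketch below measures diameter by breadth-first search, bisection width by brute force over all equal splits, and cost by counting links. It assumes a connected, undirected network with an even number of nodes; the helper names and the ring example are illustrative, and the brute-force search is only practical for a handful of nodes.

```python
from collections import deque
from itertools import combinations


def ring(p):
    """Adjacency sets of a p-node ring (illustrative topology, p >= 3)."""
    return {i: {(i - 1) % p, (i + 1) % p} for i in range(p)}


def diameter(adj):
    """Largest shortest-path distance between any two nodes (BFS from each node)."""
    def eccentricity(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return max(dist.values())
    return max(eccentricity(s) for s in adj)


def bisection_width(adj):
    """Fewest links crossing any split into two equal halves (brute force)."""
    nodes = sorted(adj)
    best = None
    for group in combinations(nodes, len(nodes) // 2):
        side = set(group)
        cut = sum(1 for u in side for v in adj[u] if v not in side)
        best = cut if best is None else min(best, cut)
    return best


def cost(adj):
    """Number of undirected links."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


if __name__ == "__main__":
    net = ring(8)
    print(diameter(net), bisection_width(net), cost(net))
```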
Network Topologies
  Bus-Based Networks
    - Shared medium
    - Information is broadcast
    - Evaluation:
        Diameter: O(1)
        Connectivity: O(1)
        Bisection width: O(1)
        Cost: O(p)
Network Topologies
  Crossbar Networks
    - Switch-based network
    - Supports simultaneous connections
    - Evaluation:
        Diameter: O(1)
        Connectivity: O(1)?
        Bisection width: O(p)?
        Cost: O(p^2)
Blocking in a Multistage Switch
  Routing is done by comparing the bit-level representation of the source and destination addresses.
    - A match goes via pass-through
    - A mismatch goes via cross-over
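The bit-comparison rule can be sketched as follows. The first function applies the rule exactly as stated, comparing one address bit per stage. The other two go further and assume an omega network (a perfect shuffle before each stage of 2x2 switches, most significant bit first); the slides do not name the topology, so that wiring, the function names, and the sample addresses are assumptions made only to show how two messages can need the same link.

```python
def switch_settings(src, dst, n_bits):
    """Per-stage switch setting, comparing address bits from the most significant."""
    settings = []
    for i in range(n_bits - 1, -1, -1):
        same = ((src >> i) & 1) == ((dst >> i) & 1)
        settings.append("pass-through" if same else "cross-over")
    return settings


def output_links(src, dst, n_bits):
    """Link label the message occupies after each stage (assumed omega wiring).

    After stage k the label is the source address shifted left by k bits,
    with the k leading destination bits shifted in on the right.
    """
    mask = (1 << n_bits) - 1
    return [((src << k) | (dst >> (n_bits - k))) & mask
            for k in range(1, n_bits + 1)]


def blocks(msg_a, msg_b, n_bits):
    """True if the two (src, dst) messages need the same link after some stage."""
    links_a = output_links(msg_a[0], msg_a[1], n_bits)
    links_b = output_links(msg_b[0], msg_b[1], n_bits)
    return any(a == b for a, b in zip(links_a, links_b))


if __name__ == "__main__":
    print(switch_settings(0b010, 0b111, 3))
    print(blocks((0b010, 0b111), (0b110, 0b100), 3))
```

Under these assumptions the two sample messages collide after the first stage, so one of them has to wait even though their destinations differ.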
Physical Organization
  Cache Coherence in Shared Memory Systems
    - A certain level of consistency must be maintained for multiple copies of the same data
    - Required to ensure proper semantics and correct program execution
      - serializability
    - Two general protocols for dealing with it
      - invalidate & update
Invalidate/Update Protocols
  - The preferred scheme depends on the characteristics of the underlying application
    - frequency of reads/writes to shared variables
  - Classical trade-off between communication overhead (updates) and idling (stalling in invalidates)
  - Additional problems with false sharing
  - Existing schemes are based on the invalidate protocol
    - A number of approaches have been developed for maintaining the state/ownership of the shared data
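The trade-off between update traffic and invalidate stalls can be made concrete with a toy model of a single shared variable. This is a deliberately simplified sketch, not a real coherence protocol: it tracks only which processors hold a valid copy, counts one coherence message per other holder on each write, and counts a miss whenever a processor touches the variable without a valid copy. The function name, the trace format, and both access traces are hypothetical, and false sharing is not modeled.

```python
def simulate(trace, protocol):
    """Return (coherence_messages, misses) for accesses to one shared variable.

    trace: list of (processor_id, op) with op "r" (read) or "w" (write).
    protocol: "invalidate" or "update".
    """
    if protocol not in ("invalidate", "update"):
        raise ValueError(f"unknown protocol: {protocol}")
    valid = set()      # processors that currently hold a valid cached copy
    messages = 0       # invalidate or update messages sent to other holders
    misses = 0         # accesses that stall to fetch the current value
    for proc, op in trace:
        if proc not in valid:
            misses += 1
            valid.add(proc)
        if op == "w":
            messages += len(valid - {proc})
            if protocol == "invalidate":
                valid = {proc}     # all other copies become invalid
    return messages, misses


if __name__ == "__main__":
    # One processor writes repeatedly; the other reads only at the end.
    write_heavy = [(1, "r")] + [(0, "w")] * 4 + [(1, "r")]
    # Every write is followed by a read on the other processor.
    ping_pong = [(1, "r")] + [(0, "w"), (1, "r")] * 3
    for name, trace in (("write_heavy", write_heavy), ("ping_pong", ping_pong)):
        for protocol in ("invalidate", "update"):
            print(name, protocol, simulate(trace, protocol))
```

In this model the write-heavy trace sends fewer messages under invalidate, while the ping-pong trace incurs fewer misses under update, which is the read/write frequency dependence the slide describes.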
Communication Costs in Parallel Systems
  Message-Passing Systems
    The communication cost of a data-transfer operation depends on:
    - start-up time: ts
      - add headers/trailers, error correction, execute the routing algorithm, establish the connection between source & destination
    - per-hop time: th
      - time to travel between two directly connected nodes
      - node latency
    - per-word transfer time: tw
      - 1/channel-width
Communication Model Used for this Class
  We will assume that the cost of sending a message of size m is:
    tcomm = ts + m*tw
  This is generally true because ts is much larger than th, and for most of the algorithms that we will study m*tw is much larger than l*th, where l is the number of hops.
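To see how little the per-hop term contributes, the sketch below compares a cost that includes it, ts + l*th + m*tw, with the simplified form that drops it. The full expression is an assumption based on the three parameters listed for message-passing systems (with l the number of hops), and the numeric values are hypothetical, chosen only so that ts is much larger than th and m*tw is much larger than l*th.

```python
def full_cost(ts, th, tw, m, l):
    """Assumed cost with the per-hop term: start-up + hops + per-word transfer."""
    return ts + l * th + m * tw


def simplified_cost(ts, tw, m):
    """Cost with the per-hop term dropped."""
    return ts + m * tw


if __name__ == "__main__":
    ts, th, tw = 100, 1, 2      # hypothetical time units
    m, l = 1000, 4              # message size in words, number of hops
    full = full_cost(ts, th, tw, m, l)
    simple = simplified_cost(ts, tw, m)
    print(full, simple, (full - simple) / full)
```

With these values the per-hop term accounts for well under one percent of the total, which is why the simplified model ignores it.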
Routing Mechanisms
  Routing:
    - The algorithm used to determine the path that a message will take to go from the source to the destination
    - Can be classified along different dimensions
      - minimal vs non-minimal
      - deterministic vs adaptive
Dimension Ordered Routing
  - There is a predefined ordering of the dimensions
  - Messages are routed along the dimensions in that order until they cannot move any further
    - X-Y routing for meshes
    - E-cube routing for hypercubes
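Both schemes can be sketched in a few lines. The X-Y router below moves along the x dimension until the column matches and only then along y; the E-cube router corrects differing address bits one dimension at a time. Function names and the coordinate and address representations are illustrative, the mesh is assumed to have no wraparound links, and the E-cube ordering from the least significant bit upward is an assumed convention.

```python
def xy_route(src, dst):
    """Nodes visited on a 2-D mesh when routing along x first, then y."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:                  # exhaust the x dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                  # then move along y
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def ecube_route(src, dst, n_bits):
    """Nodes visited on an n_bits-dimensional hypercube under E-cube routing."""
    path = [src]
    cur = src
    for i in range(n_bits):         # assumed order: least significant bit first
        if ((cur ^ dst) >> i) & 1:  # addresses differ in dimension i
            cur ^= 1 << i           # cross the link in that dimension
            path.append(cur)
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
    print([format(n, "03b") for n in ecube_route(0b010, 0b111, 3)])
```

Because the dimension order is fixed, both routers are deterministic and minimal: the path depends only on the source and destination, and its length equals their distance.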
Topology Embeddings
  Mapping between networks
    - Useful in the early days of parallel computing when topology-specific algorithms were being developed
  Embedding quality metrics
    - dilation: maximum number of links an edge is mapped to
    - congestion: maximum number of edges mapped on a single link
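Both metrics can be measured for a concrete embedding. The sketch below maps an 8-node ring onto a 3-dimensional hypercube in two ways, a reflected Gray code and the identity mapping, and reports dilation and congestion for each. It assumes each ring edge is carried along the hypercube path that fixes differing bits from the least significant upward; that path choice, the helper names, and the two mappings are illustrative assumptions.

```python
from collections import Counter


def hypercube_links(a, b, n_bits):
    """Undirected hypercube links on the assumed path from node a to node b."""
    links, cur = [], a
    for i in range(n_bits):
        if ((cur ^ b) >> i) & 1:
            nxt = cur ^ (1 << i)
            links.append(frozenset((cur, nxt)))
            cur = nxt
    return links


def ring_edges(p):
    """Edges of a p-node ring."""
    return [(i, (i + 1) % p) for i in range(p)]


def dilation(edges, mapping, n_bits):
    """Largest number of host links that any single guest edge is mapped to."""
    return max(len(hypercube_links(mapping[u], mapping[v], n_bits))
               for u, v in edges)


def congestion(edges, mapping, n_bits):
    """Largest number of guest edges that share any single host link."""
    load = Counter()
    for u, v in edges:
        load.update(hypercube_links(mapping[u], mapping[v], n_bits))
    return max(load.values())


if __name__ == "__main__":
    edges = ring_edges(8)
    gray = [i ^ (i >> 1) for i in range(8)]   # reflected Gray code
    identity = list(range(8))
    for name, mapping in (("gray", gray), ("identity", identity)):
        print(name, dilation(edges, mapping, 3), congestion(edges, mapping, 3))
```

The Gray-code mapping places neighboring ring nodes on neighboring hypercube nodes, so every ring edge uses exactly one link; the identity mapping stretches some edges across several links and makes edges share links.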