Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Parallel Programming in C with MPI and OpenMP
Michael J. Quinn
Mar 31, 2015
Chapter 1
Motivation and History
Outline
- Motivation
- Modern scientific method
- Evolution of supercomputing
- Modern parallel computers
- Seeking concurrency
- Data clustering case study
- Programming parallel computers
Why Faster Computers?
- Solve compute-intensive problems faster
  - Make infeasible problems feasible
  - Reduce design time
- Solve larger problems in same amount of time
  - Improve answer’s precision
  - Reduce design time
- Gain competitive advantage
Definitions
- Parallel computing
  - Using a parallel computer to solve a single problem faster
- Parallel computer
  - Multiple-processor system supporting parallel programming
- Parallel programming
  - Programming in a language that supports concurrency explicitly
Why MPI?
- MPI = “Message Passing Interface”
- Standard specification for message-passing libraries
- Libraries available on virtually all parallel computers
- Free libraries also available for networks of workstations or commodity clusters
Why OpenMP?
- OpenMP is an application programming interface (API) for shared-memory systems
- Supports higher-performance parallel programming of symmetric multiprocessors
Classical Science
(Figure: Nature, Observation, Theory, Physical Experimentation)
Modern Scientific Method
(Figure: Nature, Observation, Theory, Physical Experimentation, Numerical Simulation)
Evolution of Supercomputing
- World War II
  - Hand-computed artillery tables
  - Need to speed computations
  - ENIAC
- Cold War
  - Nuclear weapon design
  - Intelligence gathering
  - Code-breaking
Supercomputer
- General-purpose computer
- Solves individual problems at high speeds, compared with contemporary systems
- Typically costs $10 million or more
- Traditionally found in government labs
Commercial Supercomputing
- Started in capital-intensive industries
  - Petroleum exploration
  - Automobile manufacturing
- Other companies followed suit
  - Pharmaceutical design
  - Consumer products
50 Years of Speed Increases
- ENIAC: 350 flops
- Today: > 1 trillion flops
CPUs 1 Million Times Faster
- Faster clock speeds
- Greater system concurrency
  - Multiple functional units
  - Concurrent instruction execution
  - Speculative instruction execution
Systems 1 Billion Times Faster
- Processors are 1 million times faster
- Combine thousands of processors
- Parallel computer
  - Multiple processors
  - Supports parallel programming
- Parallel computing = Using a parallel computer to execute a program faster
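A quick check of the arithmetic behind the slide title. It takes the round numbers at face value and assumes that the processors' speeds simply add (ideal speedup, which real programs rarely reach). Under those assumptions, the product agrees in order of magnitude with the ENIAC-to-today figures on the speed-increase slide:

```latex
\underbrace{10^{6}}_{\text{faster processors}} \times \underbrace{10^{3}}_{\text{thousands of processors}} = 10^{9},
\qquad
\frac{10^{12}\ \text{flops}}{350\ \text{flops}} \approx 2.9 \times 10^{9}
```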
Microprocessor Revolution
(Figure: speed (log scale) versus time for micros, minis, mainframes, and supercomputers)
Modern Parallel Computers
- Caltech’s Cosmic Cube (Seitz and Fox)
- Commercial copy-cats
  - nCUBE Corporation
  - Intel’s Supercomputer Systems Division
  - Lots more
- Thinking Machines Corporation
Copy-cat Strategy
- Microprocessor
  - 1% speed of supercomputer
  - 0.1% cost of supercomputer
- Parallel computer = 1000 microprocessors
  - 10 × speed of supercomputer
  - Same cost as supercomputer
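The copy-cat arithmetic works out as follows. Here S and C (symbols introduced for this calculation) are the speed and cost of one supercomputer. The calculation assumes the 1000 microprocessors combine with ideal speedup:

```latex
\text{speed} = 1000 \times 0.01\,S = 10\,S,
\qquad
\text{cost} = 1000 \times 0.001\,C = C
```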
Why Didn’t Everybody Buy One?
- Supercomputer ≠ sum of its CPUs
  - Computation rate ≠ throughput
  - Inadequate I/O
- Software
  - Inadequate operating systems
  - Inadequate programming environments
After the “Shake Out”
- IBM
- Hewlett-Packard
- Silicon Graphics
- Sun Microsystems
Commercial Parallel Systems
- Relatively costly per processor
- Primitive programming environments
- Focus on commercial sales
- Scientists looked for alternatives
Beowulf Concept
- NASA (Sterling and Becker)
- Commodity processors
- Commodity interconnect
- Linux operating system
- Message Passing Interface (MPI) library
- High performance/$ for certain applications
Accelerated Strategic Computing Initiative
- U.S. nuclear policy changes
  - Moratorium on testing
  - Production of new weapons halted
- Numerical simulations needed to maintain existing stockpile
- Five supercomputers costing up to $100 million each
ASCI White (10 teraops/sec)
Seeking Concurrency
- Data dependence graphs
- Data parallelism
- Functional parallelism
- Pipelining
Data Dependence Graph
- Directed graph
- Vertices = tasks
- Edges = dependences
Data Parallelism
- Independent tasks apply same operation to different elements of a data set
- Okay to perform operations concurrently

for i ← 0 to 99 do
    a[i] ← b[i] + c[i]
endfor
Functional Parallelism
Independent tasks apply different operations to Independent tasks apply different operations to different data elementsdifferent data elements
First and second statementsFirst and second statements Third and fourth statementsThird and fourth statements
a 2b 3m (a + b) / 2s (a2 + b2) / 2v s - m2
Pipelining
- Divide a process into stages
- Produce several items simultaneously
Partial Sums Pipeline
(Figure: four-stage pipeline. The first stage passes a[0] through as p[0]; each later stage adds a[i] to p[i-1] to produce p[i], for i = 1, 2, 3.)
Data Clustering
- Data mining = looking for meaningful patterns in large data sets
- Data clustering = organizing a data set into clusters of “similar” items
- Data clustering can speed retrieval of related items
Document Vectors
(Figure: document vectors plotted against the terms Moon and Rocket for Alice in Wonderland, A Biography of Jules Verne, The Geology of Moon Rocks, and The Story of Apollo 11)
Document Clustering
Clustering Algorithm
- Compute document vectors
- Choose initial cluster centers
- Repeat
  - Compute performance function
  - Adjust centers
- Until function value converges or max iterations have elapsed
- Output cluster centers
Data Parallelism Opportunities
- Operation being applied to a data set
- Examples
  - Generating document vectors
  - Finding closest center to each vector
  - Picking initial values of cluster centers
Functional Parallelism Opportunities
- Draw data dependence diagram
- Look for sets of nodes such that there are no paths from one node to another
Data Dependence Diagram
(Figure: data dependence diagram with tasks Build document vectors, Choose cluster centers, Compute function value, Adjust cluster centers, and Output cluster centers)
Programming Parallel Computers
- Extend compilers: translate sequential programs into parallel programs
- Extend languages: add parallel operations
- Add parallel language layer on top of sequential language
- Define totally new parallel language and compiler system
Strategy 1: Extend Compilers
- Parallelizing compiler
  - Detect parallelism in sequential program
  - Produce parallel executable program
- Focus on making Fortran programs parallel
Extend Compilers (cont.)
- Advantages
  - Can leverage millions of lines of existing serial programs
  - Saves time and labor
  - Requires no retraining of programmers
  - Sequential programming easier than parallel programming
Extend Compilers (cont.)
- Disadvantages
  - Parallelism may be irretrievably lost when programs are written in sequential languages
  - Performance of parallelizing compilers on a broad range of applications is still up in the air
Extend Language
- Add functions to a sequential language
  - Create and terminate processes
  - Synchronize processes
  - Allow processes to communicate
Extend Language (cont.)
- Advantages
  - Easiest, quickest, and least expensive
  - Allows existing compiler technology to be leveraged
  - New libraries can be ready soon after new parallel computers are available
Extend Language (cont.)
- Disadvantages
  - Lack of compiler support to catch errors
  - Easy to write programs that are difficult to debug
Add a Parallel Programming Layer
- Lower layer
  - Core of computation
  - Process manipulates its portion of data to produce its portion of result
- Upper layer
  - Creation and synchronization of processes
  - Partitioning of data among processes
- A few research prototypes have been built based on these principles
Create a Parallel Language
- Develop a parallel language “from scratch”
  - occam is an example
- Add parallel constructs to an existing language
  - Fortran 90
  - High Performance Fortran
  - C*
New Parallel Languages (cont.)
- Advantages
  - Allows programmer to communicate parallelism to compiler
  - Improves probability that executable will achieve high performance
- Disadvantages
  - Requires development of new compilers
  - New languages may not become standards
  - Programmer resistance
Current Status
- Low-level approach is most popular
  - Augment existing language with low-level parallel constructs
  - MPI and OpenMP are examples
- Advantages of low-level approach
  - Efficiency
  - Portability
- Disadvantage: More difficult to program and debug
Summary (1/2)
- High performance computing
  - U.S. government
  - Capital-intensive industries
  - Many companies and research labs
- Parallel computers
  - Commercial systems
  - Commodity-based systems
Summary (2/2)
- Power of CPUs keeps growing exponentially
- Parallel programming environments changing very slowly
- Two standards have emerged
  - MPI library, for processes that do not share memory
  - OpenMP directives, for processes that do share memory