PRINCIPLES OF PARALLEL PROGRAMMING
Calvin Lin Department of Computer Sciences The University of Texas at Austin
Lawrence Snyder Department of Computer Science and Engineering
University of Washington, Seattle
PEARSON
Addison Wesley
Boston San Francisco New York
London Toronto Sydney Tokyo Singapore Madrid
Mexico City Munich Paris Cape Town Hong Kong Montreal
Contents
PART 1 Foundations
Chapter 1 Introduction
The Power and Potential of Parallelism
    Parallelism, a Familiar Concept
    Parallelism in Computer Programs
    Multi-Core Computers, an Opportunity
    Even More Opportunities to Use Parallel Hardware
    Parallel Computing versus Distributed Computing
    System Level Parallelism
    Convenience of Parallel Abstractions
Examining Sequential and Parallel Programs
    Parallelizing Compilers
    A Paradigm Shift
    Parallel Prefix Sum
Parallelism Using Multiple Instruction Streams
    The Concept of a Thread
    A Multithreaded Solution to Counting 3s
The Goals: Scalability and Performance Portability
    Scalability
    Performance Portability
Principles First
Performance Trade-Offs
    Communication versus Computation
    Memory versus Parallelism
    Overhead versus Parallelism
Measuring Performance
    Execution Time
    Speedup
    Superlinear Speedup
    Efficiency
    Concerns with Speedup
    Scaled Speedup versus Fixed-Size Speedup
Scalable Performance
    Scalable Performance Is Difficult to Achieve
    Implications for Hardware
    Implications for Software
    Scaling the Problem Size
Chapter Summary
Historical Perspective
Exercises
PART 2 Parallel Abstractions
Chapter 4 First Steps Toward Parallel Programming
Data and Task Parallelism
    Definitions
    Illustrating Data and Task Parallelism
The Peril-L Notation
    Extending C
    Parallel Threads
    Synchronization and Coordination
    Memory Model
    Synchronized Memory
Reduce and Scan
    The Reduce Abstraction
OpenMP
    The Count 3s Example
    Semantic Limitations on parallel for
    Reduction
    Thread Behavior and Interaction
    Sections
    Summary of OpenMP
MPI: The Message Passing Interface
    The Count 3s Example
    Groups and Communicators
    Point-to-Point Communication
    Collective Communication
    Example: Successive Over-Relaxation
    Performance Issues
    Safety Issues
Partitioned Global Address Space Languages
    Co-Array Fortran
    Unified Parallel C
    Titanium
Chapter Summary
Historical Perspective
Exercises
Chapter 8 ZPL and Other Global View Languages
The ZPL Programming Language
Basic Concepts of ZPL
    Regions
    Array Computation
Life, an Example
    The Problem
    The Solution
    How It Works
    The Philosophy of Life
Distinguishing Features of ZPL
    Regions
    Statement-Level Indexing
    Restrictions Imposed by Regions
    Performance Model
    Addition by Subtraction
Manipulating Arrays of Different Ranks
    Partial Reduce
    Flooding
    The Flooding Principle
    Data Manipulation, an Example
    Flood Regions
    Matrix Multiplication
Reordering Data with Remap
    Index Arrays
    Remap
    Ordering Example
Parallel Execution of ZPL Programs
    Role of the Compiler
    Specifying the Number of Processes
Assigning Regions to Processes
    Array Allocation
    Scalar Allocation
    Work Assignment
Performance Model
    Applying the Performance Model: Life
    Applying the Performance Model: SUMMA
    Summary of the Performance Model
NESL Parallel Language
    Language Concepts
    Matrix Product Using Nested Parallelism
    NESL Complexity Model
Chapter Summary
Historical Perspective
Exercises
Chapter 9 Assessing the State of the Art
Four Important Properties of Parallel Languages
Graphics Processing Units
Cell Processors
Attached Processors
Summary
Grid Computing
Transactional Memory
    Comparison with Locks
    Implementation Issues
    Open Research Issues
MapReduce
Problem Space Promotion
Emerging Languages
    Chapel
    Fortress
    X10
Chapter Summary
Historical Perspective
Exercises
Incremental Development
Focus on the Parallel Structure
Testing the Parallel Structure
Sequential Programming
Be Willing to Write Extra Code
Controlling Parameters during Testing
Functional Debugging