Transcript
Page 1: Parallel Programming

1

Parallel Programming

Aaron Bloomfield

CS 415

Fall 2005

Page 2: Parallel Programming

2

Why Parallel Programming?

• Predict weather
• Predict spread of SARS
• Predict path of hurricanes
• Predict oil slick propagation
• Model growth of bio-plankton/fisheries
• Structural simulations
• Predict path of forest fires
• Model formation of galaxies
• Simulate nuclear explosions

Page 3: Parallel Programming

3

Code that can be parallelized

do i = 1 to max

a[i] = b[i] + c[i] * d[i]

end do
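As an illustrative sketch that is not in the original slides, the same loop in C (the names n, a, b, c, d are assumed) makes the independence explicit: each iteration writes only a[i] and reads only b[i], c[i], and d[i], so no iteration depends on another and they can run in any order.

    /* Sketch in C of the loop above; names n, a, b, c, d are assumed. */
    void vector_madd(int n, double *a, const double *b,
                     const double *c, const double *d) {
        for (int i = 0; i < n; i++) {
            /* Iteration i touches only index i, so iterations are independent. */
            a[i] = b[i] + c[i] * d[i];
        }
    }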

Page 4: Parallel Programming

4

Parallel Computers

• Programming model types
  – Shared memory
  – Message passing

Page 5: Parallel Programming

5

Distributed Memory Architecture

• Each processor has direct access only to its local memory
• Processors are connected via a high-speed interconnect
• Data structures must be distributed
• Data exchange is done via explicit processor-to-processor communication: send/receive messages
• Programming models
  – Widely used standard: MPI
  – Others: PVM, Express, P4, Chameleon, PARMACS, ...

[Figure: processors P0 ... Pn, each with its own local memory, connected by a communication interconnect]

Page 6: Parallel Programming

6

Message Passing Interface

MPI provides:
• Point-to-point communication
• Collective operations
  – Barrier synchronization
  – Gather/scatter operations
  – Broadcast, reductions
• Different communication modes
  – Synchronous/asynchronous
  – Blocking/non-blocking
  – Buffered/unbuffered
• Predefined and derived datatypes
• Virtual topologies
• Parallel I/O (MPI 2)
• C/C++ and Fortran bindings

• http://www.mpi-forum.org
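To make the point-to-point mode concrete, here is a minimal MPI sketch in C (not part of the original deck) in which rank 0 sends one integer to rank 1 with blocking send/receive:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* Blocking send of one int to rank 1, tag 0 */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Blocking receive of one int from rank 0, tag 0 */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }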

Page 7: Parallel Programming

7

Shared Memory Architecture

• Processors have direct access to global memory and I/O through a bus or fast switching network
• Cache coherency protocol guarantees consistency of memory and I/O accesses
• Each processor also has its own memory (cache)
• Data structures are shared in the global address space
• Concurrent access to shared memory must be coordinated
• Programming models
  – Multithreading (thread libraries)
  – OpenMP

[Figure: processors P0 ... Pn, each with its own cache, connected by a shared bus to global shared memory]

Page 8: Parallel Programming

8

OpenMP

• OpenMP: portable shared memory parallelism
• Higher-level API for writing portable multithreaded applications
• Provides a set of compiler directives and library routines for parallel application programmers
• API bindings for Fortran, C, and C++

http://www.OpenMP.org
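As a sketch that is not in the original deck, the independent loop from page 3 can be parallelized with a single OpenMP directive (C binding; array names follow that slide, and the code assumes compilation with OpenMP enabled, e.g. -fopenmp):

    /* OpenMP splits the iteration space of this independent loop
       across the threads of the parallel region. */
    void vector_madd_omp(int n, double *a, const double *b,
                         const double *c, const double *d) {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            a[i] = b[i] + c[i] * d[i];
        }
    }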

Page 9: Parallel Programming

9

Page 10: Parallel Programming

10

Approaches

• Parallel Algorithms

• Parallel Language

• Message passing (low-level)

• Parallelizing compilers

Page 11: Parallel Programming

11

Parallel Languages

• CSP - Hoare’s notation for parallelism as a network of sequential processes exchanging messages.

• Occam - Real language based on CSP. Used for the transputer, in Europe.

Page 12: Parallel Programming

12

Fortran for parallelism

• Fortran 90 - Array language. Triplet notation for array sections. Operations and intrinsic functions possible on array sections.

• High Performance Fortran (HPF) - Similar to Fortran 90, but includes data layout specifications to help the compiler generate efficient code.

Page 13: Parallel Programming

13

More parallel languages

• ZPL - array-based language at UW. Compiles into C code (highly portable).

• C* - C extended for parallelism

Page 14: Parallel Programming

14

Object-Oriented

• Concurrent Smalltalk

• Threads in Java, Ada, thread libraries for use in C/C++
  – This uses a library of parallel routines

Page 15: Parallel Programming

15

Functional

• NESL, Multilisp

• Id & Sisal (more dataflow)

Page 16: Parallel Programming

16

Parallelizing Compilers

Automatically transform a sequential program into a parallel program.

1. Identify loops whose iterations can be executed in parallel.

2. Often done in stages.

Q: Which loops can be run in parallel?

Q: How should we distribute the work/data?

Page 17: Parallel Programming

17

Data Dependences

Flow dependence - RAW. Read-After-Write. A "true" dependence. Read a value after it has been written into a variable.

Anti-dependence - WAR. Write-After-Read. Write a new value into a variable after the old value has been read.

Output dependence - WAW. Write-After-Write. Write a new value into a variable and then later on write another value into the same variable.

Page 18: Parallel Programming

18

Example

1: A = 90;

2: B = A;

3: C = A + D;

4: A = 5;
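Applying the definitions from page 17 to this example (annotation added here, not on the original slide):

• 1 -> 2 and 1 -> 3: flow (RAW) dependences: A is read after statement 1 writes it
• 2 -> 4 and 3 -> 4: anti-dependences (WAR): statement 4 overwrites A after statements 2 and 3 read it
• 1 -> 4: output dependence (WAW): A is written by statement 1 and again by statement 4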

Page 19: Parallel Programming

19

Dependencies

A parallelizing compiler must identify loops that do not have dependences BETWEEN ITERATIONS of the loop.

Example:

do I = 1, 1000
   A(I) = B(I) + C(I)
   D(I) = A(I)
end do

Page 20: Parallel Programming

20

Example

Fork one thread for each processor

Each thread executes the loop:

do I = my_lo, my_hi

A(I) = B(I) + C(I)

D(I) = A(I)

end do

Wait for all threads to finish before proceeding.
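A hedged sketch, not from the slides, of this fork/join pattern in C using POSIX threads; NTHREADS stands in for the number of processors, and each thread's my_lo/my_hi range is computed from its index (compile with -pthread):

    #include <pthread.h>

    #define N        1000
    #define NTHREADS 4

    static double A[N], B[N], C[N], D[N];

    struct range { int lo, hi; };   /* each thread's slice [lo, hi) */

    static void *worker(void *arg) {
        struct range *r = arg;
        for (int i = r->lo; i < r->hi; i++) {
            A[i] = B[i] + C[i];
            D[i] = A[i];
        }
        return NULL;
    }

    int main(void) {
        pthread_t tid[NTHREADS];
        struct range rng[NTHREADS];
        int chunk = N / NTHREADS;

        /* Fork one thread per "processor", each with its own range. */
        for (int t = 0; t < NTHREADS; t++) {
            rng[t].lo = t * chunk;
            rng[t].hi = (t == NTHREADS - 1) ? N : (t + 1) * chunk;
            pthread_create(&tid[t], NULL, worker, &rng[t]);
        }
        /* Wait for all threads to finish before proceeding. */
        for (int t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);
        return 0;
    }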

Page 21: Parallel Programming

21

Another Example

do I = 1, 1000

A(I) = B(I) + C(I)

D(I) = A(I+1)

end do

Page 22: Parallel Programming

22

Yet Another Example

do I = 1, 1000

A( X(I) ) = B(I) + C(I)

D(I) = A( X(I) )

end do

Page 23: Parallel Programming

23

Parallel Compilers

• Two concerns:
• Parallelizing code
  – Compiler will move code around to uncover parallel operations
• Data locality
  – If a parallel operation has to get data from another processor’s memory, that’s bad

Page 24: Parallel Programming

24

Distributed computing

• Take a big task that has natural parallelism
• Split it up among many different computers across a network

• Examples: SETI@Home, prime number searches, Google Compute, etc.

• Distributed computing is a form of parallel computing