Page 1: Parallel Programming

1

Parallel Programming

Aaron Bloomfield

CS 415

Fall 2005

Page 2: Parallel Programming

2

Why Parallel Programming?

• Predict weather
• Predict spread of SARS
• Predict path of hurricanes
• Predict oil slick propagation
• Model growth of bio-plankton/fisheries
• Structural simulations
• Predict path of forest fires
• Model formation of galaxies
• Simulate nuclear explosions

Page 3: Parallel Programming

3

Code that can be parallelized

do i = 1, max
   a(i) = b(i) + c(i) * d(i)
end do

Page 4: Parallel Programming

4

Parallel Computers

• Programming model types
  – Shared memory
  – Message passing

Page 5: Parallel Programming

5

Distributed Memory Architecture

• Each processor has direct access only to its local memory
• Processors are connected via a high-speed interconnect
• Data structures must be distributed
• Data exchange is done via explicit processor-to-processor communication: send/receive messages
• Programming Models
  – Widely used standard: MPI
  – Others: PVM, Express, P4, Chameleon, PARMACS, ...

[Diagram: processors P0, P1, ..., Pn, each with its own local memory, connected by a communication interconnect]

Page 6: Parallel Programming

6

Message Passing Interface

MPI provides:
• Point-to-point communication
• Collective operations
  – Barrier synchronization
  – Gather/scatter operations
  – Broadcast, reductions
• Different communication modes
  – Synchronous/asynchronous
  – Blocking/non-blocking
  – Buffered/unbuffered
• Predefined and derived datatypes
• Virtual topologies
• Parallel I/O (MPI-2)
• C/C++ and Fortran bindings

• http://www.mpi-forum.org
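
As a concrete illustration of the point-to-point mode listed above, here is a minimal C sketch (added here; it is not from the original slides) that sends one integer from rank 0 to rank 1 with MPI_Send/MPI_Recv:

/* Minimal MPI point-to-point example (illustrative sketch).
   Compile with mpicc, run with: mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* Blocking send of one int to rank 1, tag 0 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Blocking receive of one int from rank 0, tag 0 */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}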

Page 7: Parallel Programming

7

Shared Memory Architecture

• Processors have direct access to global memory and I/O through a bus or fast switching network
• A cache coherency protocol guarantees consistency of memory and I/O accesses
• Each processor also has its own memory (cache)
• Data structures are shared in a global address space
• Concurrent access to shared memory must be coordinated
• Programming Models
  – Multithreading (thread libraries)
  – OpenMP

[Diagram: processors P0, P1, ..., Pn, each with its own cache, connected to global shared memory via a shared bus]

Page 8: Parallel Programming

8

OpenMP

• OpenMP: portable shared memory parallelism
• Higher-level API for writing portable multithreaded applications
• Provides a set of compiler directives and library routines for parallel application programmers
• API bindings for Fortran, C, and C++

http://www.OpenMP.org
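
To tie this back to the loop on Page 3, here is a minimal C/OpenMP sketch (added for illustration; the array size MAX and the function name compute are chosen here) that parallelizes that loop with a single compiler directive:

/* Parallelizing the Page-3 loop with an OpenMP directive (illustrative sketch).
   Compile with, e.g., gcc -fopenmp */
#include <omp.h>

#define MAX 1000   /* array size assumed for illustration */

void compute(double a[MAX], const double b[MAX],
             const double c[MAX], const double d[MAX])
{
    int i;

    /* Each iteration is independent, so the compiler/runtime may
       divide the iterations among the threads. */
    #pragma omp parallel for
    for (i = 0; i < MAX; i++) {
        a[i] = b[i] + c[i] * d[i];
    }
}

The directive expresses the parallelism; no explicit thread creation or synchronization is written by the programmer.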

Page 9: Parallel Programming

9

Page 10: Parallel Programming

10

Approaches

• Parallel Algorithms

• Parallel Language

• Message passing (low-level)

• Parallelizing compilers

Page 11: Parallel Programming

11

Parallel Languages

• CSP - Hoare’s notation for parallelism as a network of sequential processes exchanging messages.

• Occam - Real language based on CSP. Used for the transputer in Europe.

Page 12: Parallel Programming

12

Fortran for parallelism

• Fortran 90 - Array language. Triplet notation for array sections. Operations and intrinsic functions possible on array sections.

• High Performance Fortran (HPF) - Similar to Fortran 90, but includes data layout specifications to help the compiler generate efficient code.

Page 13: Parallel Programming

13

More parallel languages

• ZPL - array-based language at UW. Compiles into C code (highly portable).

• C* - C extended for parallelism

Page 14: Parallel Programming

14

Object-Oriented

• Concurrent Smalltalk

• Threads in Java, Ada, thread libraries for use in C/C++
  – This uses a library of parallel routines

Page 15: Parallel Programming

15

Functional

• NESL, Multiplisp

• Id & Sisal (more dataflow)

Page 16: Parallel Programming

16

Parallelizing Compilers

Automatically transform a sequential program into a parallel program.

1. Identify loops whose iterations can be executed in parallel.

2. Often done in stages.

Q: Which loops can be run in parallel?

Q: How should we distribute the work/data?

Page 17: Parallel Programming

17

Data Dependences

Flow dependence - RAW. Read-After-Write. A "true" dependence. Read a value after it has been written into a variable.

Anti-dependence - WAR. Write-After-Read. Write a new value into a variable after the old value has been read.

Output dependence - WAW. Write-After-Write. Write a new value into a variable and then later on write another value into the same variable.

Page 18: Parallel Programming

18

Example

1: A = 90;

2: B = A;

3: C = A + D;

4: A = 5;
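
Reading the example in terms of the dependence types on the previous page (the annotations below are added here; they are not on the original slide):

/* The four statements with the dependences a compiler would record
   (variables declared only to make the fragment compile). */
void dependence_example(void)
{
    int A, B, C, D = 0;

    A = 90;      /* S1 */
    B = A;       /* S2: flow (RAW) on A, written by S1 */
    C = A + D;   /* S3: flow (RAW) on A, written by S1 */
    A = 5;       /* S4: anti (WAR) w.r.t. the reads in S2 and S3,
                        output (WAW) w.r.t. the write in S1 */
}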

Page 19: Parallel Programming

19

Dependencies

A parallelizing compiler must identify loops that do not have dependences BETWEEN ITERATIONS of the loop.

Example:

do I = 1, 1000
   A(I) = B(I) + C(I)
   D(I) = A(I)
end do

Page 20: Parallel Programming

20

Example

Fork one thread for each processor

Each thread executes the loop:

do I = my_lo, my_hi

A(I) = B(I) + C(I)

D(I) = A(I)

end do

Wait for all threads to finish before proceeding.
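
One way this fork/join scheme could look in C with POSIX threads (an illustrative sketch; the slide does not prescribe a particular thread library, and NTHREADS, worker, and parallel_loop are names chosen here):

/* Fork one thread per "processor", give each a block of iterations,
   then join them all. Compile with: gcc -pthread */
#include <pthread.h>

#define N        1000
#define NTHREADS 4                       /* one thread per processor */

double A[N+1], B[N+1], C[N+1], D[N+1];   /* indices 1..N used, Fortran-style */

struct range { int lo, hi; };

static void *worker(void *arg)
{
    struct range *r = arg;
    for (int i = r->lo; i <= r->hi; i++) {   /* do I = my_lo, my_hi */
        A[i] = B[i] + C[i];
        D[i] = A[i];
    }
    return NULL;
}

void parallel_loop(void)
{
    pthread_t tid[NTHREADS];
    struct range rng[NTHREADS];
    int chunk = N / NTHREADS;

    /* Fork: each thread gets a contiguous block of iterations. */
    for (int t = 0; t < NTHREADS; t++) {
        rng[t].lo = t * chunk + 1;
        rng[t].hi = (t == NTHREADS - 1) ? N : (t + 1) * chunk;
        pthread_create(&tid[t], NULL, worker, &rng[t]);
    }

    /* Join: wait for all threads to finish before proceeding. */
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
}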

Page 21: Parallel Programming

21

Another Example

do I = 1, 1000

A(I) = B(I) + C(I)

D(I) = A(I+1)
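! D(I) reads A(I+1), which iteration I+1 writes: a loop-carried (cross-iteration) dependence, so these iterations cannot simply run in parallel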

end do

Page 22: Parallel Programming

22

Yet Another Example

do I = 1, 1000

A( X(I) ) = B(I) + C(I)

D(I) = A( X(I) )
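! The subscript X(I) is not known at compile time; if X(I) repeats for different I, iterations read and write the same element of A, so the compiler cannot prove the loop is safe to parallelize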

end do

Page 23: Parallel Programming

23

Parallel Compilers

• Two concerns:

• Parallelizing code
  – Compiler will move code around to uncover parallel operations

• Data locality
  – If a parallel operation has to get data from another processor’s memory, that’s bad

Page 24: Parallel Programming

24

Distributed computing

• Take a big task that has natural parallelism
• Split it up among many different computers across a network

• Examples: SETI@Home, prime number searches, Google Compute, etc.

• Distributed computing is a form of parallel computing