MPI Programming
Hamid Reza Tajozzakerin, Sharif University of Technology

Dec 19, 2015

Transcript
Page 1:

MPI Programming

Hamid Reza Tajozzakerin

Sharif University of Technology

Page 2:

Introduction

Message-Passing Interface (MPI): a library of functions and macros

Objectives: define an international long-term standard API for portable parallel applications and get all hardware vendors involved in implementations of this standard; define a target system for parallelizing compilers

Can be used in C, C++, and Fortran

The MPI Forum (http://www.mpi-forum.org/) brings together all contributing parties

Page 3:

The User’s View

Figure: the user's view of MPI. Several processes run on each processor, and all processes exchange messages only through the communication system (MPI).

Page 4:

Programming with MPI

Include the header file mpi.h in the source code

Initialize the MPI environment: MPI_Init(&argc, &argv)
Must be called once, and only once, before any other MPI function

At the end of the program: MPI_Finalize()
Cleans up any unfinished business left by MPI

General MPI programs follow this structure (a minimal sketch is shown below)
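A minimal sketch of such a program in C (the printed message is only illustrative):

    /* minimal MPI skeleton: initialize, do the parallel work, finalize */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);      /* must precede any other MPI call */

        printf("Hello from an MPI process\n");   /* the actual work goes here */

        MPI_Finalize();              /* no MPI calls are allowed after this */
        return 0;
    }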

Page 5:

Programming with MPI (cont.)

Get your own process ID (rank): MPI_Comm_rank(MPI_Comm comm, int *rank)
The first argument is a communicator
Communicator: a collection of processes that can send messages to each other

Get the number of processes (including oneself): MPI_Comm_size(MPI_Comm comm, int *size)
size: the number of processes in comm
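For example, using MPI_COMM_WORLD (the communicator that contains all processes):

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of the calling process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes in the communicator */
    printf("Process %d of %d\n", rank, size);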

Page 6:

What is a message? Message = Data + Envelope
Envelope: additional information needed for the message to be communicated successfully

The envelope contains:
Rank of the sender (who sent the message); can be a wildcard: MPI_ANY_SOURCE
Rank of the receiver (who receives the message); no wildcard for the destination
A tag, used to distinguish messages received from a single process; can be a wildcard: MPI_ANY_TAG
Communicator

Page 7:

Point-to-Point Communication

A send operation can be
Blocking: continuation is possible only after the hand-over to the communication system has been completed (the buffer can be re-used)
Non-blocking: immediate continuation is possible (check whether the message has been sent and the buffer can be re-used)

Page 8:

Point-to-Point Communication (cont.)

Four types of point-to-point send operations, each of them available in a blocking and a non-blocking variant

Standard (regular) send: MPI_SEND or MPI_ISEND
Asynchronous; the system decides whether or not to buffer messages to be sent
Successful completion may depend on a matching receive

Buffered send: MPI_BSEND or MPI_IBSEND
Asynchronous, but buffering of messages to be sent by the system is enforced

Synchronous send: MPI_SSEND or MPI_ISSEND
Synchronous, i.e. the send operation is not completed before the receiver has started to receive the message

Page 9:

Point-to-Point Communication (cont.)

Ready send: MPI_RSEND or MPI_IRSEND
The send may be started only if the matching receive has already been posted; if no corresponding receive operation is available, the result is undefined
Could be replaced by a standard send with no effect other than performance

Meaning of blocking vs. non-blocking (variants with 'I'):
Blocking: the send operation is not completed before the send buffer can be reused
Non-blocking: immediate continuation; the user has to make sure that the buffer is not corrupted before the transfer completes (see the sketch below)
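A sketch of a non-blocking send in C; dest and tag are illustrative values, and the buffer must not be modified until MPI_Wait (or a successful MPI_Test) reports completion:

    double buf[100];
    int dest = 1, tag = 0;
    MPI_Request request;
    MPI_Status  status;

    MPI_Isend(buf, 100, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD, &request);

    /* ... computation that does not touch buf ... */

    MPI_Wait(&request, &status);   /* after this, buf may safely be reused */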

Page 10:

Point-to-Point Communication (cont.)

One receive function:
Blocking MPI_Recv: the receive operation is completed when the message has been completely written into the receive buffer
Non-blocking MPI_Irecv: continuation immediately after the receiving has begun

Can be combined with the four send modes

Page 11:

Point-to-Point Communication (cont.)

Syntax:
MPI_SEND(buf, count, datatype, dest, tag, comm)
MPI_RECV(buf, count, datatype, source, tag, comm, status)

where
void *buf              pointer to the beginning of the buffer
int count              number of data objects
int source             process ID of the sending process
int dest               process ID of the destination process
int tag                ID of the message
MPI_Datatype datatype  data type of the data objects
MPI_Comm comm          communicator (see later)
MPI_Status *status     object containing message information

In the non-blocking versions, there is one additional argument (a request handle) for checking the completion of the communication.
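A sketch with these signatures: process 0 sends an array of doubles to process 1 (the buffer size and tag are arbitrary choices here):

    int rank;
    double data[10];
    MPI_Status status;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* ... fill data ... */
        MPI_Send(data, 10, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(data, 10, MPI_DOUBLE, 0, 99, MPI_COMM_WORLD, &status);
    }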

Page 12:

Testing Whether a Message Has Arrived

MPI_Buffer_attach(...): lets MPI provide a buffer

MPI_Probe(...) / MPI_Iprobe(...): blocking / non-blocking test of whether a message has arrived, without actually receiving it

MPI_Test(...): checks whether a send or receive operation has completed

MPI_Wait(...): causes the process to wait until a send or receive operation has been completed

MPI_Get_count(...): provides the length of a received message
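A sketch combining MPI_Probe and MPI_Get_count to receive a message whose length is not known in advance (the malloc'ed int buffer is an illustrative choice; <stdlib.h> is needed for malloc):

    MPI_Status status;
    int count;

    MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);  /* wait until some message arrives */
    MPI_Get_count(&status, MPI_INT, &count);                          /* how many ints it contains */

    int *buf = malloc(count * sizeof(int));
    MPI_Recv(buf, count, MPI_INT, status.MPI_SOURCE, status.MPI_TAG,
             MPI_COMM_WORLD, &status);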

Page 13:

Data Types

Standard MPI data types:
MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG, MPI_UNSIGNED, MPI_FLOAT, MPI_DOUBLE, MPI_LONG_DOUBLE, MPI_BYTE (8 binary digits), MPI_PACKED

Page 14:

Grouping Data

Why? The fewer messages sent, the better the overall performance

Three mechanisms:
Count parameter: group data having the same basic type as an array
Derived types
Pack/Unpack

Page 15:

Building Derived Types

Specify the types of the members of the derived type
Specify the number of elements of each type
Calculate the addresses of the members
Calculate displacements (relative locations)
Create the derived type: MPI_Type_struct(...)
Commit it: MPI_Type_commit(...)
(a sketch of these steps follows below)
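A sketch of these steps for a struct holding one int and one double; the struct itself is an illustrative assumption, and MPI_Type_create_struct is the current name of the MPI_Type_struct constructor:

    struct particle { int id; double mass; } p;

    int          blocklens[2] = { 1, 1 };
    MPI_Datatype types[2]     = { MPI_INT, MPI_DOUBLE };
    MPI_Aint     displs[2], base;
    MPI_Datatype particle_type;

    MPI_Get_address(&p, &base);            /* addresses of the members ...  */
    MPI_Get_address(&p.id,   &displs[0]);
    MPI_Get_address(&p.mass, &displs[1]);
    displs[0] -= base;                     /* ... turned into displacements */
    displs[1] -= base;

    MPI_Type_create_struct(2, blocklens, displs, types, &particle_type);
    MPI_Type_commit(&particle_type);       /* now usable in sends and receives */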

Page 16:

Other Derived Datatype Constructors

MPI_Type_contiguous(...): constructs an array consisting of count elements of type oldtype that lie in contiguous memory

MPI_Type_vector(...): constructs an MPI array with an element-to-element distance (stride)

MPI_Type_indexed(...): constructs an MPI array with different block lengths
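For example, MPI_Type_vector can describe one column of a row-major N x N matrix of doubles (the matrix and the receiver are illustrative assumptions):

    #define N 8
    double A[N][N];
    MPI_Datatype column_type;

    /* N blocks of 1 double each, consecutive blocks N doubles apart */
    MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column_type);
    MPI_Type_commit(&column_type);

    /* send column 0 of A to process 1 with tag 0 */
    MPI_Send(&A[0][0], 1, column_type, 1, 0, MPI_COMM_WORLD);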

Page 17:

Packing and Unpacking

Elements of a complex data structure can be packed, sent, and unpacked again element by element: expensive and error-prone

Pack: store noncontiguous data in a contiguous memory location
Unpack: copy data from a contiguous buffer into noncontiguous memory locations

MPI functions for explicit packing and unpacking:
MPI_Pack(...): packs data into a buffer
MPI_Unpack(...): unpacks data from the buffer
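A sketch of explicit packing: an int and a double are packed into one buffer, sent as MPI_PACKED, and unpacked in the same order on the receiving side (buffer size and tag are arbitrary choices):

    int rank, n = 5, position = 0;
    double x = 3.14;
    char buffer[100];
    MPI_Status status;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* pack both values into one contiguous buffer and send it */
        MPI_Pack(&n, 1, MPI_INT,    buffer, 100, &position, MPI_COMM_WORLD);
        MPI_Pack(&x, 1, MPI_DOUBLE, buffer, 100, &position, MPI_COMM_WORLD);
        MPI_Send(buffer, position, MPI_PACKED, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* receive the packed buffer and unpack in the same order */
        MPI_Recv(buffer, 100, MPI_PACKED, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Unpack(buffer, 100, &position, &n, 1, MPI_INT,    MPI_COMM_WORLD);
        MPI_Unpack(buffer, 100, &position, &x, 1, MPI_DOUBLE, MPI_COMM_WORLD);
    }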

Page 18:

Collective Communication

Why? Many applications require not only point-to-point communication, but also collective communication operations

Collective communication operations: Broadcast, Gather, Scatter, All-to-All, Reduce

Page 19:

Broadcast

Figure: broadcast. The contents of the send buffer of P0 are copied into the receive buffers of P0 through P3.

Page 20:

Gather

Figure: gather. The send buffers of P0 through P3 are collected into the receive buffer of the root process.

Page 21:

Scatter

Figure: scatter. The send buffer of the root process is split into pieces that are distributed to the receive buffers of P0 through P3.

Page 22:

All to All

Figure: all-to-all. Each process sends a distinct block (A, B, C, D) of its send buffer to every process, so every receive buffer ends up with one block from each process.

Page 23:

Reduce

Figure: reduce. The send buffers of P0 through P3 are combined element-wise with a reduction operation, and the result is stored in the receive buffer of the root process.

Page 24:

All Reduce

Figure: allreduce. The send buffers of P0 through P3 are combined with a reduction operation, and the result is stored in the receive buffers of all processes.

Page 25:

Collective Communication (cont.)

Important application scenario: distribute the elements of vectors or matrices among several processors

Some functions offered by MPI:
MPI_Barrier(...): synchronization barrier; a process waits for the other group members, and when all of them have reached the barrier they can all continue
MPI_Bcast(...): sends the data to all members of the group given by a communicator (hence more a multicast than a broadcast)
MPI_Gather(...): collects data from the group members

Page 26:

Collective Communication (cont.)

MPI_Allgather(...): gather-to-all; data are collected from all processes, and all of them get the collection
MPI_Scatter(...): classical scatter operation; distribution of data among processes
MPI_Reduce(...): executes a reduce operation
MPI_Allreduce(...): executes a reduce operation where all processes get its result
MPI_Op_create(...) and MPI_Op_free(...): define a new reduce operation or remove it, respectively

Note that all of the functions above operate with respect to a communicator (hence not necessarily global communication); a short sketch combining broadcast and reduction follows below
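A sketch combining two of these operations: the root broadcasts a parameter, every process computes a partial result, and MPI_Reduce sums the partial results at the root (the helper function is hypothetical):

    int rank, n;
    double local, total;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) n = 1000;                        /* root chooses the parameter */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* now every process has n    */

    local = partial_computation(rank, n);           /* hypothetical helper */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    /* total is valid only on rank 0; MPI_Allreduce would make it valid everywhere */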

Page 27:

Process Groups and Communicators

Messages are tagged for identification: the message tag is the message ID
Again: process groups allow restricted message exchange and restricted collective communication
Process groups are ordered sets of processes
Each process is uniquely identified within its group via its local (group-related) process ID, or rank
Ordering starts with zero, with successive numbering
Global identification of a process via the pair (process group, rank)

Page 28:

Process Groups and Communicators

MPI communicators: a concept for working with contexts
Communicator = process group + message context
MPI offers intra-communicators for collective communication within a process group, and inter-communicators for (point-to-point) communication between two process groups
Default (including all processes): MPI_COMM_WORLD
MPI provides many functions for working with process groups and communicators

Page 29:

Working with Communicators

To create a new communicator:
Make a list of the processes in the new communicator
Get the group of an existing communicator: MPI_Comm_group(...)
Create the new group from the list: MPI_Group_incl(...)
Create the actual communicator: MPI_Comm_create(...)
Note: to create several communicators simultaneously, use MPI_Comm_split(...)
(a sketch of these steps follows below)
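A sketch of these steps in C, building a communicator that contains the first three processes of MPI_COMM_WORLD (the size and the rank list are illustrative choices):

    int ranks[3] = { 0, 1, 2 };            /* processes to include in the new communicator */
    MPI_Group world_group, new_group;
    MPI_Comm  new_comm;

    MPI_Comm_group(MPI_COMM_WORLD, &world_group);        /* group underlying MPI_COMM_WORLD */
    MPI_Group_incl(world_group, 3, ranks, &new_group);   /* subgroup with the listed ranks  */
    MPI_Comm_create(MPI_COMM_WORLD, new_group, &new_comm);
    /* processes that are not in new_group get new_comm == MPI_COMM_NULL */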

Page 30:

Process Topologies

Provide a convenient naming mechanism for the processes of a group
Assist the runtime system in mapping the processes onto hardware
Only for intra-communicators
Virtual topology: a set of processes represented by a graph
Most common topologies: meshes, tori

Page 31:

Some useful functions

MPI_Comm_rank(...): indicates the rank of the process that calls it
MPI_Comm_size(...): returns the size of the group
MPI_Comm_dup(...): creates a new communicator with the same attributes as the input communicator
MPI_Comm_free(MPI_Comm *comm): frees the communicator and sets the handle to MPI_COMM_NULL

Page 32:

An example of a Cartesian grid (figure): the upper number is the rank, the lower pair is the (row, col) coordinates

Page 33:

Cartesian Topology Functions

MPI_Cart_create(...): returns a handle to a new communicator to which the Cartesian topology information is attached
MPI_Dims_create(...): selects a balanced distribution of processes over the dimensions
MPI_Cartdim_get(...): returns the number of dimensions
MPI_Cart_get(...): returns information on the topology
MPI_Cart_sub(...): partitions a Cartesian topology into Cartesian subgrids of lower dimension
MPI_Cart_coords(...), MPI_Cart_rank(...): translate between ranks and Cartesian coordinates
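A sketch that creates a 2D periodic Cartesian topology and queries the coordinates of the calling process (letting MPI_Dims_create choose the grid dimensions):

    int size, rank, coords[2];
    int dims[2]    = { 0, 0 };   /* 0 means: let MPI_Dims_create choose */
    int periods[2] = { 1, 1 };   /* periodic (torus) in both dimensions */
    MPI_Comm cart_comm;

    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Dims_create(size, 2, dims);
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart_comm);

    MPI_Comm_rank(cart_comm, &rank);
    MPI_Cart_coords(cart_comm, rank, 2, coords);   /* (row, col) of this process */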

Page 34:

DCT Parallelism

Page 35:

Preliminary

DCT: Discrete Cosine Transform
2D DCT: a 1D DCT applied twice
2D-DCT equation:

    Y = C X C^T

X: N x N input matrix
C: N x N matrix defined as

    c(n,m) = k_n cos[ (2m+1) n pi / (2N) ],  where k_n = sqrt(1/N) when n = 0 and k_n = sqrt(2/N) otherwise

Y contains the DCT coefficients
The main operation is matrix multiplication

Page 36:

Fox's Algorithm

Multiply two square matrices
Assume two matrices A = (a_ij) and B = (b_ij)
The matrices are of order n
Assume the number of processes p is a perfect square, so p = q^2
n_bar = n/q is an integer
Each process holds a block of A and a block of B, each a matrix of order n/q

Page 37:

Fox's Algorithm (cont.)

For example, p = 9 and n = 6: the processes form a 3 x 3 grid (q = 3), and each process holds a 2 x 2 block of A and of B

Page 38:

Fox's Algorithm (cont.)

Page 39:

Fox's Algorithm (cont.)

At step 'step', the chosen submatrix in the r-th row is A_{r,u}, where u = (r + step) mod q
Example: at step = 0 these multiplications are done:
r = 0: A00*B00, A00*B01, A00*B02
r = 1: A11*B10, A11*B11, A11*B12
r = 2: A22*B20, A22*B21, A22*B22
The other multiplications are done in the subsequent steps
The processes communicate with each other so that the product of the two matrices is obtained

Page 40:

Implementation of the Algorithm

Treat each row of processes as a communicator, and each column of processes as a communicator:
MPI_Cart_sub(grid->Com, var_coor, &row_com);
MPI_Cart_sub(grid->Com, var_coor, &col_com);
(var_coor holds the remain_dims flags and differs for the row and column case; see the sketch below)
More general communicator construction functions could be used instead:
MPI_Group_incl(group, q, ranks, &row_group);
MPI_Comm_create(comm, row_group, &row_comm);
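A sketch of the row and column communicators used in Fox's algorithm, assuming a 2D Cartesian communicator grid_comm like the one created earlier:

    MPI_Comm row_comm, col_comm;
    int remain_row[2] = { 0, 1 };   /* keep dimension 1: processes sharing a row    */
    int remain_col[2] = { 1, 0 };   /* keep dimension 0: processes sharing a column */

    MPI_Cart_sub(grid_comm, remain_row, &row_comm);
    MPI_Cart_sub(grid_comm, remain_col, &col_comm);

    /* in each step, the chosen A block is broadcast along row_comm and the
       B blocks are shifted circularly along col_comm */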

Page 41:

Implementations of MPI

An MPI implementation consists of:
a subroutine library with all MPI functions
include files for the calling application program
a startup script (usually called mpirun, but not standardized)

MPICH: supports both Linux and Microsoft Windows

Many other MPI implementations are available, e.g. LAM, which supports MPI programming on networks of Unix workstations
See other implementations and their features at:
http://www.lam-mpi.org/mpi/implementations/fulllist.php

Page 42:

Implementations of MPI (cont.)

IMPI: Interoperable MPI
A protocol specification that allows multiple MPI implementations to cooperate on a single MPI job
Any correct MPI program will run correctly under IMPI
Divided into four parts:
Startup/shutdown protocols
Data transfer protocol
Collective algorithms
A centralized IMPI conformance testing methodology

Page 43:

Extensions to MPI

External Interfaces
One-sided Communication
Dynamic Resource Management
Extended Collective Operations
Bindings
Real Time
Some of these features are still subject to change

Page 44:

Questions?