Parallel Programming with MPI Prof. Sivarama Dandamudi School of Computer Science Carleton University
Jan 02, 2016
Carleton University © S. Dandamudi 2
Introduction

Problem:
- Lack of a standard for message-passing routines
- Big issue: portability problems
MPI defines a core set of library routines (API) for message passing (more than 125 functions in total!)

Several commercial and public-domain implementations exist:
- Cray, IBM, Intel
- MPICH from Argonne National Laboratory
- LAM from Ohio Supercomputer Center/Indiana University
Introduction (cont’d)
Some Additional Goals [Snir et al. 1996]
- Allows efficient communication
  - Avoids memory-to-memory copying
  - Allows computation and communication overlap (non-blocking communication)
- Allows implementation in a heterogeneous environment
- Provides a reliable communication interface
  - Users don't have to worry about communication failures
MPI

MPI is large but not complex:
- 125 functions
- But only 6 functions are needed to write a simple MPI program:
  MPI_Init, MPI_Finalize, MPI_Comm_size, MPI_Comm_rank, MPI_Send, MPI_Recv
MPI (cont’d)
Before any other MPI function is called, we must initialize:
  MPI_Init(&argc, &argv)

To indicate the end of MPI calls:
  MPI_Finalize()
- Cleans up the MPI state
- Should be the last MPI function call
MPI (cont’d)
A typical program structure:

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    . . .
    /* main program */
    . . .
    MPI_Finalize();
}
MPI (cont’d)
MPI uses communicators to group processes that communicate with each other
Predefined communicator: MPI_COMM_WORLD
consists of all processes running when the program begins execution
Sufficient for simple programs
MPI (cont’d)
Process rank
- Similar to mytid in PVM

MPI_Comm_rank(MPI_Comm comm, int *rank)
- First argument: communicator
- Second argument: returns the process rank
MPI (cont’d)
Number of processes

MPI_Comm_size(MPI_Comm comm, int *size)
- First argument: communicator
- Second argument: returns the number of processes

Example:
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs)
MPI (cont’d)
Sending a message (blocking version):

MPI_Send( void* buf, int count,
          MPI_Datatype datatype,
          int dest, int tag,
          MPI_Comm comm )

- Buffer description: buf, count, datatype
- Destination specification: dest, tag, comm

Data types include:
  MPI_CHAR, MPI_INT, MPI_LONG, MPI_FLOAT, MPI_DOUBLE
MPI (cont’d)
Receiving a message (blocking version):

MPI_Recv( void* buf, int count,
          MPI_Datatype datatype,
          int source, int tag,
          MPI_Comm comm,
          MPI_Status *status )

Wildcard specifications are allowed:
  MPI_ANY_SOURCE, MPI_ANY_TAG
MPI (cont’d)
Receiving a message: status of the received message

Status gives two pieces of information directly (useful when wildcards are used):
- status.MPI_SOURCE: gives the identity of the source
- status.MPI_TAG: gives the tag information
MPI (cont’d)
Receiving a message: status also gives message size information indirectly

MPI_Get_count( MPI_Status *status,
               MPI_Datatype datatype,
               int *count )

Takes the status and the datatype as inputs and returns the number of elements received via count.
MPI (cont’d)
Non-blocking communication: prefix send and recv with "I" (for immediate)
- MPI_Isend
- MPI_Irecv

Completion operations are needed to see whether the operation has completed:
- MPI_Wait
- MPI_Test
MPI (cont’d)
Sending a message (non-blocking version)
MPI_Isend( void* buf, int count,
MPI_Datatype datatype,
int dest, int tag,
MPI_Comm comm,
MPI_Request *request )
Returns the request handle
MPI (cont’d)
Receiving a message (non-blocking version)
MPI_Irecv( void* buf, int count,
MPI_Datatype datatype,
int source, int tag,
MPI_Comm comm,
MPI_Request *request )
Same arguments as MPI_Isend, except that source replaces dest
MPI (cont’d)
How do we know when a non-blocking operation is done? Use MPI_Test or MPI_Wait.
- Completion of a send indicates: the sender can access the send buffer
- Completion of a receive indicates: the receive buffer contains the message
MPI (cont’d)
MPI_Test returns the status without waiting for the operation to complete:

MPI_Test( MPI_Request *request, int *flag,
          MPI_Status *status )

- request: the request handle
- flag: operation status, true if completed
- status: if flag is true, gives the status
MPI (cont’d)
MPI_Wait waits until the operation is completed:

MPI_Wait( MPI_Request *request,
          MPI_Status *status )

- request: the request handle
- status: gives the status
MPI Collective Communication

Several functions are provided to support collective communication. Some examples:
- MPI_Barrier (barrier synchronization)
- MPI_Bcast (broadcast)
- MPI_Scatter
- MPI_Gather
- MPI_Reduce (global reduction)
MPI Collective Communication (cont’d)
MPI_Barrier blocks the caller until all group members have called it:

MPI_Barrier( MPI_Comm comm )

The call returns at any process only after all group members have entered the call.
MPI Collective Communication (cont’d)
MPI_Bcast broadcasts a message from root to all processes of the group
MPI_Bcast( void* buf, int count,
MPI_Datatype datatype,
int root,
MPI_Comm comm )
MPI Collective Communication (cont’d)
MPI_Scatter distributes data from the root process to all the others in the group:

MPI_Scatter( void* send_buf, int send_count,
             MPI_Datatype send_type,
             void* recv_buf, int recv_count,
             MPI_Datatype recv_type,
             int root, MPI_Comm comm )
MPI Collective Communication (cont’d)
MPI_Gather is the inverse of the scatter operation (gathers data and stores it in rank order):

MPI_Gather( void* send_buf, int send_count,
            MPI_Datatype send_type,
            void* recv_buf, int recv_count,
            MPI_Datatype recv_type,
            int root, MPI_Comm comm )
MPI Collective Communication (cont’d)
MPI_Reduce performs global reduction operations such as sum, max, min, AND, etc.:

MPI_Reduce( void* send_buf, void* recv_buf,
            int count, MPI_Datatype datatype,
            MPI_Op operation,
            int root, MPI_Comm comm )
MPI Collective Communication (cont’d)
Predefined reduce operations include:
  MPI_MAX    maximum
  MPI_MIN    minimum
  MPI_SUM    sum
  MPI_PROD   product
  MPI_LAND   logical AND
  MPI_BAND   bitwise AND
  MPI_LOR    logical OR
  MPI_BOR    bitwise OR
  MPI_LXOR   logical XOR
  MPI_BXOR   bitwise XOR