Introduction to Parallel Programming at MCSR
• Message Passing Computing
– Processes coordinate and communicate results via calls to message passing library routines
– Programmers “parallelize” the algorithm and add message calls
– At MCSR, this is via MPI programming with C or Fortran
• Sweetgum – Origin 2800 Supercomputer
• Mimosa – Beowulf Cluster with 219 Nodes
• Shared Memory Computing
– Processes or threads coordinate and communicate results via shared memory variables
– Care must be taken not to modify the wrong memory areas
– At MCSR, this is via OpenMP programming with C or Fortran on sweetgum
Transcript
Introduction to Parallel Programming at MCSR
• Message Passing Computing
– Processes coordinate and communicate results via calls to message passing library routines
– Programmers “parallelize” the algorithm and add message calls
– At MCSR, this is via MPI programming with C or Fortran
• C MPI Example
ierror = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
Message Passing Basics
• Each process has a rank, or id number
– 0, 1, 2, … n-1, where there are n processes
• Once a process knows the size (the total number of processes), it also knows the ranks (id #’s) of the other processes, and can send or receive a message to/from any other process.
• Buf – starting location of data
• Count – number of elements
• Datatype – MPI_INTEGER, MPI_REAL, MPI_CHARACTER, … (Fortran names; C uses MPI_INT, MPI_FLOAT, MPI_CHAR, …)
• Destination – rank of process to whom msg being sent
• Source – rank of sender from whom msg being received, or MPI_ANY_SOURCE
• Tag – integer chosen by program to indicate type of message, or MPI_ANY_TAG
• Communicator – id’s the process team, e.g., MPI_COMM_WORLD
• Status – the result of the call (such as the # data items received)
Synchronous Message Passing
• Message calls may be blocking or nonblocking
• Blocking Send
– Waits to return until the message has been received by the destination process
– This synchronizes the sender with the receiver
• Nonblocking Send
– Return is immediate, without regard for whether the message has been transferred to the receiver
– DANGER: Sender must not change the variable containing the old message before the transfer is done
– MPI_Isend() is nonblocking
Synchronous Message Passing
• Locally Blocking Send
– The message is copied from the send parameter variable to an intermediate buffer in the calling process
– Returns as soon as the local copy is complete
– Does not wait for the receiver to transfer the message from the buffer
– Does not synchronize
– The sender’s message variable may safely be reused immediately
– MPI_Send() is locally blocking
Synchronous Message Passing
• Blocking Receive
– The call waits until a message matching the given tag has been received from the specified source process
– MPI_Recv() is blocking
• Nonblocking Receive
– If this process has a qualifying message waiting, retrieves that message and returns
– If no messages have been received yet, returns anyway
– Used if the receiver has other work it can be doing while it waits
– Status tells the receiver whether the message was received
– MPI_Irecv() is nonblocking
– MPI_Wait() and MPI_Test() can be used to periodically check to see if the message is ready, and finally wait for it, if desired
Collective Message Passing
• Broadcast
– Sends a message from one process to all processes in the group
• Scatter
– Distributes each element of a data array to a different process for computation
• Gather
– The reverse of scatter: retrieves data elements into an array from multiple processes
Collective Message Passing w/MPI
MPI_Bcast()          Broadcast from root to all other processes
MPI_Gather()         Gathers values from a group of processes
MPI_Scatter()        Scatters buffer in parts to group of processes
MPI_Alltoall()       Sends data from all processes to all processes
MPI_Reduce()         Combine values on all processes to single value
MPI_Reduce_scatter() Combine values and scatter the results
Message Passing Deadlock
• Deadlock can occur when all critical processes are waiting for messages that never come, or waiting for buffers to clear out so that their own messages can be sent
• Possible Causes
– Program/algorithm errors
– Message and buffer sizes
• Solutions
– Order operations more carefully
– Use nonblocking operations
– Add debugging output statements to your code to find the problem
Portable Batch System (PBS) on SGI
• Sweetgum: – PBS Pro 5.1.4 is installed on sweetgum.
Queue | Max # Processors per User Job | Max # Running Jobs per Queue | Memory Limit per User Job | CPU Time Limit per User Job | Special Validation Required