Distributed-Memory Programming Using MPIGAP
Vladimir Janjic
International Workshop “Parallel Programming in GAP”
Aug 2013
What is MPIGAP?
• A library for distributed-memory programming in GAP
  – Based on the ParGAP package by Gene Cooperman
  – Uses the MPI library for communication between distributed nodes
  – Each node can itself be a multicore processor, so shared-memory operations within a node are supported
  – Easiest to use in a batch/SIMD way
• Supports distributed implementations of many shared-memory parallel primitives
  – RunTask, TaskResult, WaitTask, …
• Also supports explicit copying/sharing of objects between distributed nodes and explicit task placement
  – RemoteCopyObj, RemotePushObj, SendTask, …
• In the final version, it will support implicitly-distributed data structures and skeletons
What is this MPI thing?
• Message Passing Interface
  – Standardised and portable message-passing protocol
• Contains operations for sending and receiving binary messages between nodes in a distributed system
• Supports point-to-point and collective operations, synchronisation primitives (barriers), …
• Bindings exist for C, C++, Fortran, Python, …
• The two best-known C implementations are MPICH and Open MPI
MPIGAP Architecture
Object Marshalling
Low-Level MPI Bindings
Global Object Pointers
Shared Objects
Distributed Tasks
Implicitly Distributed Data
Skeletons
User Application
Object Marshalling
• We work with GAP objects, but MPI can only send binary data, so we need a way to convert GAP objects into a binary (string) representation and back
• This conversion is called object marshalling
• Two marshalling methods are currently supported:
  – Object serialisation (SerializeToNativeString, DeserializeNativeString): the default method
  – IO pickling (IO_Pickle, IO_Unpickle): requires the IO package
• Object serialisation: faster, less general, not architecture independent
• IO pickling: slower, more general, architecture independent
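As a language-neutral illustration of what marshalling means, the sketch below turns an object into a byte string and back. It uses Python's standard pickle module purely as an analogy. It is not MPIGAP code, the helper names marshal and unmarshal are made up for this sketch, and it says nothing about how SerializeToNativeString or IO_Pickle work internally.

```python
import pickle

def marshal(obj):
    """Convert an in-memory object into a byte string that could be sent as a message."""
    return pickle.dumps(obj)

def unmarshal(data):
    """Rebuild an equal (but separate) object from the byte string."""
    return pickle.loads(data)

original = {"name": "perm", "images": [2, 3, 1]}
wire = marshal(original)      # what would travel between nodes
restored = unmarshal(wire)    # what the receiving node would work with

print(type(wire).__name__, restored == original, restored is original)
```

The receiver ends up with an equal copy, not the original object. That is why the later layers of the deck (handles and shared objects) are needed when several nodes must refer to one and the same object.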
What method of object marshalling should I use?
• Most of the time, there is no need to worry about this
• By default, object serialisation is used
  – To use IO pickling, set the MPIGAP_MARSHALLING variable to "Pickle" in your init file
• If you can, use object serialisation
• Use IO pickling if
  – you need to transfer “unusual” objects between nodes (ones that have no serialisation primitive installed), or
  – you are working on a platform where not all nodes have the same architecture
Low-level MPI Bindings
• Borrowed from the ParGAP package
• Simplified GAP bindings for a small subset of MPI operations
  – MPI_Send(msg, dest), MPI_Binsend(msg, size, dest)
  – MPI_Recv(msg, [source, [tag]]), MPI_ProbeRecv(msg)
  – MPI_Probe(), MPI_Iprobe()
  – MPI_comm_rank(), MPI_Get_size(), MPI_Get_source()
• All of these bindings work with strings
• You need to explicitly marshal your objects into strings using one of the marshalling methods
  – For IO pickling, use MPI_Send
  – For object serialisation, use MPI_Binsend
• Not recommended for general use: too low-level
  – Use them only if your application cannot be written using the task/shared object abstractions
Some Useful (Readonly) Variables
• processId
  – rank (id) of a node in a distributed system
• commSize
  – number of nodes in the system
Global Object Pointers (Handles)
• A global object handle is a GAP object that represents a global pointer to an object
• Handles can be copied to multiple distributed nodes
  – They can be used to access the same object from different nodes
• Handles (and the underlying shared objects) are managed in a semi-automatic way
  – Reference counting
  – Manual creation and destruction
Creation of Global Object Handles
• CreateHandleFromObj (obj, accessType)
  – Creates a handle for object obj (with access type accessType) on the node where it is called
• Access types limit the operations that can be used on the underlying objects:
  – ACCESS_TYPE.READ_ONLY
  – ACCESS_TYPE.READ_WRITE
  – ACCESS_TYPE.VOLATILE
• Internally, handles are identified by a combination of node id and local id (unique on a node)
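The identification scheme in the last bullet can be pictured with a small Python model. This is only a sketch of the idea (a pair of node id and per-node counter). The names HandleId, Node and new_handle_id are hypothetical and do not come from MPIGAP.

```python
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class HandleId:
    node_id: int    # rank of the node that created the handle
    local_id: int   # counter value, unique only within that node

class Node:
    """Hypothetical stand-in for one distributed node."""
    def __init__(self, node_id):
        self.node_id = node_id
        self._counter = itertools.count(1)   # per-node source of local ids

    def new_handle_id(self):
        return HandleId(self.node_id, next(self._counter))

node0, node1 = Node(0), Node(1)
a = node0.new_handle_id()
b = node0.new_handle_id()
c = node1.new_handle_id()

# a and c share the same local id, yet the pairs are globally distinct
print(a, b, c)
```

Because each node only needs its own counter, ids can be allocated without any communication between nodes.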
Opening and Closing of Handles
• Before you do anything with a handle, you need to open it
  – Each thread that works with the handle needs to open it separately
  – Open(handle)
• After a thread finishes working with a handle, it needs to close it
  – Close(handle)
Distribution of Global Object Handles
• SendHandle (handle, node)
  – Sends handle to the distributed node node
  – SendHandle (h, 1);
• SendAndAssignHandle (handle, node, name)
  – Sends handle to the distributed node node, creates a global variable called name there, and assigns the handle to it
  – SendAndAssignHandle (handle, node, "h");
Accessing underlying objects
• GetHandleObj (handle)
  – Returns the underlying object of handle
  – This does not copy the object to the node where it is called
• SetHandleObj (handle, obj)
  – Sets the underlying object of handle to obj
  – Only if handle is not read-only
• SetHandleObjList (handle, index, obj)
  – If the underlying object of handle is a list, puts obj at position index in that list
  – Only if handle is not read-only
• CAUTION: Wrapping an object in a handle automatically shares that object (and the handle), so a lock needs to be obtained before using it
Global Object Handles Example
• Initially, node 0 holds the list x = [1,2,3]; node 1 holds nothing
Global Object Handles Example (2)
• On node 0:
    h := CreateHandleObj(x, ACCESS_TYPES.READ_WRITE);
• Node 0 now has a handle h that points to the list [1,2,3]
Global Object Handles Example (3)
• On node 0:
    SendAndAssignHandle(h, 1, "h");
• Both nodes now have a handle h that points to the same list [1,2,3]
Global Object Handles Example (4)
• On node 1:
    SetByHandleList(h, 2, 4);
• The list that both handles point to is now [1,4,3]
Operations on Shared Objects
• Global object handles let the user hold pointers on different distributed nodes that point to the same object
• On their own, they do not let the user transfer (copy/move) objects between nodes
• That is where operations on shared objects come into play
• These operations use global object handles to copy, push, clone and pull the objects pointed to by handles between nodes
Copying of Shared Objects
• Allowed for read-only and volatile handles
• RemoteCopyObj(handle, dest)
  – Copies the object that handle points to to the dest node
• RemoteCloneObj(handle)
  – Copies the object that handle points to from the node that owns the object
  – RemoteCloneObj(handle) called on the dest node has the same effect as calling RemoteCopyObj(handle, dest) on the node that owns the object pointed to by handle
• If handle does not exist on the destination node, RemoteCopyObj will also copy it there
Pushing of Shared Objects
• Allowed for all types of handles
• RemotePushObj(handle, dest)
  – Pushes the object that handle points to to the dest node
  – The dest node becomes the owner of the object
• RemotePullObj(handle)
  – Pulls the object that handle points to from the node that owns the object
  – The node on which this is called becomes the owner of the object
  – RemotePullObj(handle) called on the dest node has the same effect as calling RemotePushObj(handle, dest) on the node that owns the object pointed to by handle
• As with copying, if handle does not exist on the dest node, RemotePushObj copies it there
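The difference between copying and pushing can be summarised in a toy Python model. It encodes only the rules stated on the two slides above (copy is refused for read-write handles and leaves the owner unchanged, push is always allowed and moves ownership). The class SharedObjectModel and its methods are hypothetical names for this sketch, not MPIGAP functions.

```python
import copy

READ_ONLY, READ_WRITE, VOLATILE = "READ_ONLY", "READ_WRITE", "VOLATILE"

class SharedObjectModel:
    """Toy model: tracks which nodes hold an instance and which node owns the object."""
    def __init__(self, value, access, owner):
        self.access = access
        self.owner = owner
        self.location = {owner: value}   # node id -> local instance

    def remote_copy(self, dest):
        # copying is modelled as allowed only for read-only and volatile handles
        if self.access == READ_WRITE:
            raise PermissionError("copying is not allowed for read-write handles")
        self.location[dest] = copy.deepcopy(self.location[self.owner])

    def remote_push(self, dest):
        # pushing moves the object, and dest becomes the owner
        value = self.location.pop(self.owner)
        self.location[dest] = value
        self.owner = dest

rw = SharedObjectModel([1, 2, 3], READ_WRITE, owner=0)
try:
    rw.remote_copy(1)
    copy_error = None
except PermissionError as exc:
    copy_error = str(exc)
rw.remote_push(1)                # allowed: node 1 now owns the only instance

ro = SharedObjectModel([1, 2, 3], READ_ONLY, owner=0)
ro.remote_copy(1)                # allowed: both nodes hold an instance, node 0 still owns it

print(copy_error)
print(rw.owner, sorted(rw.location))
print(ro.owner, sorted(ro.location))
```

Restricting copies to read-only and volatile handles plausibly avoids having several writable instances that could silently diverge; the slides state the rule but not the rationale, so treat that reading as an assumption.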
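Shared Objects Example (1.1)
• Initially, neither node 0 nor node 1 holds a handle or an object
Shared Objects Example (1.2)
• On node 0:
    h := CreateHandleObj([1,2,3], ACCESS_TYPES.READ_WRITE);
• Node 0 now has a handle h that points to the list [1,2,3]
Shared Objects Example (1.3)
• On node 0:
    h := RemotePushObj(h, 1);
• Both nodes now have the handle h, and the list [1,2,3] is owned by node 1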
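Shared Objects Example (2.1)
• On node 0:
    h := CreateHandleObj([1,2,3], ACCESS_TYPES.READ_WRITE);
• Node 0 now has a handle h that points to the list [1,2,3]
Shared Objects Example (2.2)
• On node 0:
    h := RemoteCopyObj(h, 1);
• This is an error: h is a read-write handle, and copying is allowed only for read-only and volatile handles
• The list [1,2,3] stays on node 0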
Shared Objects Example (3.1)
• On node 0:
    h := CreateHandleObj([1,2,3], ACCESS_TYPES.READ_ONLY);
• Node 0 now has a handle h that points to the list [1,2,3]
Shared Objects Example (3.2)
• On node 0:
    h := RemoteCopyObj(h, 1);
• The copy succeeds because h is read-only: node 0 and node 1 each hold a copy of [1,2,3]
Shared Objects Example (4.1)
• On node 0:
    h := CreateHandleObj([1,2,3], ACCESS_TYPES.READ_WRITE);
• Node 0 now has a handle h that points to the list [1,2,3]
Shared Objects Example (4.2)
• On node 0:
    SendAndAssignHandle(h, 1, "h");
• Both nodes now have the handle h; the list [1,2,3] is still on node 0
Shared Objects Example (4.3)
• On node 1:
    RemotePullObj(h);
• Both nodes still have the handle h; the list [1,2,3] is now owned by node 1
Explicit Task Placement
• MPIGAP supports explicit task placement on nodes
• CreateTask ([taskArgs]) creates a task (but does not execute it)
  – taskArgs is a list whose first element is the task function name and whose remaining elements are the task arguments
• SendTask(t, dest) sends the task t to the destination node dest
• SendTask creates a handle for the task result and returns this handle
• The task result can be obtained using TaskResult(t), or it can be fetched back to the node that called SendTask
Explicit Task Placement Example
    DeclareGlobalFunction("f");
    InstallGlobalFunction(f, function(handle, num)
      local l, res;
      res := [];
      l := GetHandleObj(handle);
      # take a read lock on the shared list before reading it
      atomic readonly l do
        res := List(l, x -> x + num);
      od;
      return res;
    end);

    if processId = 0 then
      h := CreateHandleFromObj([1,2,3,4,5]);
      t := CreateTask(["f", h, 1]);   # task: call f(h, 1)
      SendTask(t, 1);                 # run it on node 1
      result := TaskResult(t);
    fi;
Explicit Task Placement Example (2)
    DeclareGlobalFunction("f");
    InstallGlobalFunction(f, function(handle, num)
      local l, res;
      res := [];
      l := GetHandleObj(handle);
      # take a read lock on the shared list before reading it
      atomic readonly l do
        res := List(l, x -> x + num);
      od;
      return res;
    end);

    if processId = 0 then
      h := CreateHandleFromObj([1,2,3,4,5]);
      RemotePushObj(h, 1);            # move the list to node 1 first
      t := CreateTask(["f", h, 1]);   # task: call f(h, 1)
      SendTask(t, 1);                 # run it on node 1, where the list now lives
      result := TaskResult(t);
    fi;
Implicit Task Placement
• Some of the task primitives have distributed implementations with almost the same API as their shared-memory counterparts
  – RunTask(f, arg1, arg2, …, argN)
  – TaskResult(t)
  – WaitTask(t)
• Tasks are “magically” distributed to the nodes
• Minor differences in the API:
  – If object serialisation is used as the marshalling method, the first argument of RunTask needs to be a function name (rather than a function object)
  – Functions that are used for tasks need to be global (implemented using DeclareGlobalFunction and InstallGlobalFunction)
Implicit Task Placement -- Task Distribution
• Tasks are distributed over the nodes using work-stealing (more about it in a minute)
  – Work-stealing needs to be enabled using the StartStealing() function
  – It can be turned off using the StopStealing() function
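To make the idea of work-stealing concrete, here is a small single-threaded Python simulation. It is a sketch of the general technique, not of MPIGAP's scheduler. The policy used here (an idle node steals one task at a time from the tail of the longest queue) and the function name simulate are assumptions of this sketch.

```python
from collections import deque

def simulate(num_tasks, num_nodes=2):
    """Round-based simulation: each node runs at most one task per round."""
    queues = [deque() for _ in range(num_nodes)]
    queues[0].extend(range(num_tasks))          # all work starts on node 0
    executed = [[] for _ in range(num_nodes)]   # task ids run by each node
    steals = 0
    while any(queues):
        for node in range(num_nodes):
            if not queues[node]:
                # idle node: choose the node with the longest queue as victim
                victim = max(range(num_nodes), key=lambda v: len(queues[v]))
                if queues[victim]:
                    queues[node].append(queues[victim].pop())   # steal from the tail
                    steals += 1
            if queues[node]:
                executed[node].append(queues[node].popleft())   # run own work from the head
    return executed, steals

executed, steals = simulate(6)
print(executed, steals)
```

Even though every task is created on node 0, the idle node ends up doing half of the work without node 0 ever having to decide where tasks go. That is the sense in which tasks are distributed implicitly.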
Implicit Task Placement -- Details (1) to (6)
[Figure sequence over six slides. Each node contains a Message Manager, a Task Manager, worker threads (Worker 1 … Worker n on Node 0, Worker 1 … Worker m on Node 1) and a task queue. Slide (1) shows Node 0 alone; slides (2) to (6) show Node 0 and Node 1. Slides (3) and (6) show a STEAL_MSG message between the nodes.]
Implicit Task Placement Example -- Bad Fibonacci
    DeclareGlobalFunction("badFib");
    InstallGlobalFunction(badFib, function(n)
      local t1, t2;
      if n < 3 then
        return 1;
      else
        # each recursive call becomes a separate task
        t1 := RunTask("badFib", n-1);
        t2 := RunTask("badFib", n-2);
        return TaskResult(t1) + TaskResult(t2);
      fi;
    end);

    if processId = 0 then
      res := badFib(20);
      Print(res, "\n");
    fi;
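One likely reason this example is called “bad” is the sheer number of tiny tasks it creates; the slides do not say so explicitly, so treat this as an interpretation. The Python sketch below only counts how many badFib invocations the recursion above triggers. It does not run any tasks, and the function name bad_fib_calls is made up for the sketch.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def bad_fib_calls(n):
    """Number of badFib invocations triggered by one call badFib(n)."""
    if n < 3:
        return 1
    # the call itself plus everything below its two sub-tasks
    return 1 + bad_fib_calls(n - 1) + bad_fib_calls(n - 2)

calls = bad_fib_calls(20)
tasks = calls - 1   # every invocation except the top-level one is started by RunTask
print(calls, tasks)
```

Each of these tasks does at most one addition, so task creation and communication are likely to dominate the run time. A common remedy is to fall back to sequential recursion below some threshold.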
Summary
• MPIGAP supports distributed-memory computing, with multiple threads within each distributed node
• Supports sharing objects across multiple distributed nodes, and explicit moving of objects between nodes
• Supports task management in a distributed setting, where each node has multiple worker threads
  – Implicit task placement using RunTask
  – Explicit task placement using SendTask
• Still to come: implicitly distributed data structures and skeletons (ParList, ParDivideConquer, ParMasterWorker etc.)