Page 1: Operating Systems - Distributed Parallel Computing

UNIVERSITY OF MASSACHUSETTS AMHERST • Department of Computer Science

Operating Systems (CMPSCI 377)

Distributed Parallel Programming

Emery Berger

University of Massachusetts Amherst

Page 2: Operating Systems - Distributed Parallel Computing


Outline

Previously:

Programming with threads

Shared memory, single machine

Today:

Distributed parallel programming

Message passing

some material adapted from slides by Kathy Yelick, UC Berkeley

Page 3: Operating Systems - Distributed Parallel Computing


Why Distribute?

SMP (symmetric multiprocessor): easy to program, but limited

Bus becomes bottleneck when processors not operating locally

Typically < 32 processors

$$$

[Diagram: processors P1 ... Pn, each with a cache ($), connected by a network/bus to shared memory]

Page 4: Operating Systems - Distributed Parallel Computing


Distributed Memory

Vastly different platforms:

Networks of workstations

Supercomputers

Clusters

Page 5: Operating Systems - Distributed Parallel Computing


Distributed Architectures

Distributed memory machines: local memory but no global memory

Individual nodes often SMPs

Network interface (NI) for all interprocessor communication – message passing

[Diagram: nodes P0, P1, ..., Pn, each with its own memory and network interface (NI), connected by an interconnect]

Page 6: Operating Systems - Distributed Parallel Computing


Message Passing

Program: # independent communicating processes

Thread + local address space only

Shared data: partitioned

Communicate by send & receive events

Cluster = messages sent over sockets (see the sketch after the diagram)

[Diagram: processes P0, P1, ..., Pn, each with its own private copies of s and i (s: 12 / i: 2, s: 14 / i: 3, s: 11 / i: 1); values are exchanged over the network with matching send P1,s and receive Pn,s operations, after which a process can compute y = ..s ...]
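On a cluster, these send and receive events ultimately travel over sockets. Below is a minimal sketch (not from the slides) of two processes exchanging an integer through a socket; it uses a local Unix-domain socket pair and fork() purely so the example is self-contained, whereas a real cluster would use TCP sockets between machines.

    /* Sketch: one process sends an int, the other receives it, over a socket. */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
            perror("socketpair");
            return 1;
        }
        if (fork() == 0) {              /* child acts as the sender          */
            close(fds[0]);
            int s = 12;                 /* its private copy of s             */
            send(fds[1], &s, sizeof(s), 0);      /* the "send P1,s" event    */
            close(fds[1]);
            return 0;
        }
        close(fds[1]);                  /* parent acts as the receiver       */
        int s = 0;
        recv(fds[0], &s, sizeof(s), 0);          /* the "receive Pn,s" event */
        printf("receiver got s = %d\n", s);
        close(fds[0]);
        wait(NULL);
        return 0;
    }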

Page 7: Operating Systems - Distributed Parallel Computing


Message Passing

Pros: efficient

Makes data sharing explicit

Can communicate only what is strictly necessary for computation

No coherence protocols, etc.

Cons: difficult

Requires manual partitioning

Divide up problem across processors

Unnatural model (for some)

Deadlock-prone (hurray)
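To make "deadlock-prone" concrete, here is a sketch (not in the slides, using the MPI calls introduced on the following pages) of the classic trap: two processes each post a blocking send to the other before posting a receive. If the library cannot buffer a message this large, both sends block forever waiting for a receive that is never reached.

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    #define N (1 << 22)   /* big enough that the sends are unlikely to be buffered */

    int main(int argc, char * argv[]) {
        int rank;
        int * out = malloc(N * sizeof(int));
        int * in  = malloc(N * sizeof(int));
        MPI_Status status;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        int other = 1 - rank;                   /* assumes exactly two processes */

        /* may block waiting for a matching receive on the other side */
        MPI_Send(out, N, MPI_INT, other, 0, MPI_COMM_WORLD);
        /* if both sends block, this receive is never reached */
        MPI_Recv(in,  N, MPI_INT, other, 0, MPI_COMM_WORLD, &status);

        printf("Process %d finished (only if the sends were buffered)\n", rank);
        free(out);
        free(in);
        MPI_Finalize();
        return 0;
    }

Reordering so that one side receives first, or using the non-blocking calls shown at the end of the deck, breaks the cycle.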

Page 8: Operating Systems - Distributed Parallel Computing


Message Passing Interface

Library approach to message-passing

Supports most common architectural abstractions

Vendors supply optimized versions

⇒ programs run on different machines, but with (somewhat) different performance

Bindings for popular languages

Especially Fortran, C

Also C++, Java

Page 9: Operating Systems - Distributed Parallel Computing


MPI execution model

Spawns multiple copies of same program (SPMD = single program, multiple data)

Each one is a different “process” (different local memory)

Can act differently by determining which processor “self” corresponds to

Page 10: Operating Systems - Distributed Parallel Computing


An Example

% mpirun -np 10 exampleProgram

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char * argv[]) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("Hello world from process %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

Page 11: Operating Systems - Distributed Parallel Computing

An Example (continued)

Same program as above; the callout highlights MPI_Init(&argc, &argv): initializes MPI (passes arguments in).

Page 12: Operating Systems - Distributed Parallel Computing

An Example (continued)

Same program as above; the callout highlights MPI_Comm_size(MPI_COMM_WORLD, &size): returns # of processes in the “world”.

Page 13: Operating Systems - Distributed Parallel Computing

An Example (continued)

Same program as above; the callout highlights MPI_Comm_rank(MPI_COMM_WORLD, &rank): which process am I?

Page 14: Operating Systems - Distributed Parallel Computing

An Example (continued)

Same program as above; the callout highlights MPI_Finalize(): we’re done sending messages.

Page 15: Operating Systems - Distributed Parallel Computing


An Example

% mpirun -np 10 exampleProgram
Hello world from process 5 of 10
Hello world from process 3 of 10
Hello world from process 9 of 10
Hello world from process 0 of 10
Hello world from process 2 of 10
Hello world from process 4 of 10
Hello world from process 1 of 10
Hello world from process 6 of 10
Hello world from process 8 of 10
Hello world from process 7 of 10
%

// what happened? (The ten processes run concurrently, so their output lines interleave in a nondeterministic order.)

Page 16: Operating Systems - Distributed Parallel Computing


Message Passing

Messages can be sent directly to another processor

MPI_Send, MPI_Recv

Or to all processors

MPI_Bcast (does send or receive)

Page 17: Operating Systems - Distributed Parallel Computing


Send/Recv Example

Send data from process 0 to all

“Pass it along” communication

Operations:

MPI_Send(data*, count, MPI_INT, dest, 0, MPI_COMM_WORLD);

MPI_Recv(data*, count, MPI_INT, source, 0, MPI_COMM_WORLD, &status);

Page 18: Operating Systems - Distributed Parallel Computing


Send & Receive

Send integer input in a ring

    int main(int argc, char * argv[]) {
        int rank, value, size;
        MPI_Status status;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        do {
            if (rank == 0) {
                scanf("%d", &value);
                MPI_Send(&value, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
            } else {
                MPI_Recv(&value, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &status);
                if (rank < size - 1)
                    MPI_Send(&value, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
            }
            printf("Process %d got %d\n", rank, value);
        } while (value >= 0);
        MPI_Finalize();
        return 0;
    }

Page 19: Operating Systems - Distributed Parallel Computing

Send & Receive (continued)

Same program as above; the callout asks for the send destination: rank + 1, the next process.

Page 20: Operating Systems - Distributed Parallel Computing

Send & Receive (continued)

Same program as above; the callout asks where the receive comes from: rank - 1, the previous process.

Page 21: Operating Systems - Distributed Parallel Computing

Send & Receive (continued)

Same program as above; the callouts point at the message tag: the 0 argument in each MPI_Send and MPI_Recv.

Page 22: Operating Systems - Distributed Parallel Computing


Exercise

Compute expensiveComputation(i) on n processors; process 0 computes & prints sum


    // MPI_Send(&value, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
    int main(int argc, char * argv[]) {
        int rank, size;
        MPI_Status status;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rank == 0) {
            int sum = 0;

            printf("sum = %d\n", sum);
        } else {

        }
        MPI_Finalize();
        return 0;
    }
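One possible completion of this skeleton (a sketch, not the solution given in the slides): every rank computes expensiveComputation(rank), non-zero ranks send their result to process 0, and process 0 receives one value from each of them and prints the total. expensiveComputation is assumed to be supplied by the assignment; a trivial stand-in is included so the sketch compiles.

    #include <stdio.h>
    #include <mpi.h>

    /* Stand-in for the exercise's expensiveComputation(i), assumed to be
       provided elsewhere in the real assignment. */
    static int expensiveComputation(int i) {
        return i * i;
    }

    int main(int argc, char * argv[]) {
        int rank, size;
        MPI_Status status;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rank == 0) {
            int sum = expensiveComputation(0);   /* process 0 contributes its own share */
            for (int i = 1; i < size; i++) {
                int value;
                /* one result from each other process, matching its MPI_Send below */
                MPI_Recv(&value, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &status);
                sum += value;
            }
            printf("sum = %d\n", sum);
        } else {
            int value = expensiveComputation(rank);
            MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }

With the collective operations mentioned later in the deck, the receive loop collapses into a single call, e.g. MPI_Reduce(&value, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD), made by every process.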

Page 23: Operating Systems - Distributed Parallel Computing


Broadcast

Send and receive: point-to-point

Can also broadcast data

Source sends to everyone else


Page 24: Operating Systems - Distributed Parallel Computing


Broadcast

Repeatedly broadcast input (one integer) to all

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char * argv[]) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        do {
            if (rank == 0)
                scanf("%d", &value);
            MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
            printf("Process %d got %d\n", rank, value);
        } while (value >= 0);
        MPI_Finalize();
        return 0;
    }

Page 25: Operating Systems - Distributed Parallel Computing

Broadcast (continued)

Same program as above; the callout points at &value: the value to send or receive.

Page 26: Operating Systems - Distributed Parallel Computing

Broadcast (continued)

Same program as above; the callout asks how many to send/receive: the count argument, 1.

Page 27: Operating Systems - Distributed Parallel Computing

Broadcast (continued)

Same program as above; the callout asks for the datatype: MPI_INT.

Page 28: Operating Systems - Distributed Parallel Computing

Broadcast (continued)

Same program as above; the callout asks who is “root” for the broadcast: process 0, the fourth argument.

Page 29: Operating Systems - Distributed Parallel Computing


Communication Flavors

Basic communication

blocking = wait until done

point-to-point = from me to you

broadcast = from me to everyone

Non-blocking

Think create & join, fork & wait…

MPI_Isend, MPI_Irecv (see the sketch after this list)

MPI_Wait, MPI_Waitall, MPI_Test

Collective
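A minimal sketch of the non-blocking flavor (not from the slides): each process posts an MPI_Irecv and an MPI_Isend, is free to do other work while the messages are in flight, and then calls MPI_Waitall before touching the buffers. The ring-neighbor arithmetic is an assumption chosen just to give the sketch something to communicate.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char * argv[]) {
        int rank, size, sendval, recvval;
        MPI_Request reqs[2];
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        sendval = rank;
        int next = (rank + 1) % size;           /* assumed ring of neighbors */
        int prev = (rank + size - 1) % size;

        /* post both operations; each call returns immediately */
        MPI_Irecv(&recvval, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&sendval, 1, MPI_INT, next, 0, MPI_COMM_WORLD, &reqs[1]);

        /* ... do useful computation here while the messages are in flight ... */

        /* like join/wait: block until both operations have completed */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        printf("Process %d received %d from process %d\n", rank, recvval, prev);
        MPI_Finalize();
        return 0;
    }

Collective operations (MPI_Bcast above, and others such as MPI_Reduce) are the third flavor: every process in the communicator participates in a single call.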

Page 30: Operating Systems - Distributed Parallel Computing


The End

Page 31: Operating Systems - Distributed Parallel Computing


Scaling Limits

(From Pat Worley, ORNL)

Kernel used in atmospheric models

99% floating point ops; multiplies/adds

Sweeps through memory with little reuse

One “copy” of code running independently on varying numbers of procs