Advanced Hybrid MPI/OpenMP Parallelization Paradigms for Nested Loop Algorithms onto Clusters of SMPs

Nikolaos Drosinos and Nectarios Koziris

National Technical University of Athens
Computing Systems Laboratory

{ndros,nkoziris}@cslab.ece.ntua.gr
www.cslab.ece.ntua.gr

2/10/2003 EuroPVM/MPI 2003 2

Overview

• Introduction
• Pure MPI Model
• Hybrid MPI-OpenMP Models
  • Hyperplane Scheduling
  • Fine-grain Model
  • Coarse-grain Model
• Experimental Results
• Conclusions – Future Work

Introduction

• Motivation:
  • SMP clusters
  • Hybrid programming models
• Mostly fine-grain MPI-OpenMP paradigms
• Mostly DOALL parallelization

• Contribution:
  • 3 programming models for the parallelization of nested loop algorithms
    • pure MPI
    • fine-grain hybrid MPI-OpenMP
    • coarse-grain hybrid MPI-OpenMP
  • Advanced hyperplane scheduling
    • minimizes the need for synchronization
    • overlaps computation with communication

• Algorithmic Model:

    FOR j_0 = min_0 TO max_0 DO
      ...
        FOR j_{n-1} = min_{n-1} TO max_{n-1} DO
          Computation(j_0, …, j_{n-1});
        ENDFOR
      ...
    ENDFOR

• Perfectly nested loops
• Constant flow data dependencies

• Target Architecture: SMP clusters

Pure MPI Model

• Tiling transformation groups iterations into atomic execution units (tiles)
• Pipelined execution
• Overlapping computation with communication
• Makes no distinction between inter-node and intra-node communication
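
As a minimal sketch of the tiling transformation, the C fragment below sweeps a small two-dimensional iteration space first point by point and then tile by tile. The extents (`N1`, `N2`), the tile sizes (`TS1`, `TS2`), the one-cell boundary padding and all function names are assumptions made for illustration; none of them come from the slides. The recurrence reads only from the points at (i-1, j) and (i, j-1), so visiting the tiles in lexicographic order and executing each tile atomically should reproduce the untiled result.

```c
#include <string.h>

#define N1 10   /* extent of the outer loop (assumed) */
#define N2 8    /* extent of the inner loop (assumed) */
#define TS1 4   /* tile size along the outer dimension (assumed) */
#define TS2 3   /* tile size along the inner dimension (assumed) */

/* Row 0 and column 0 are boundary padding, so A[i-1][j] and A[i][j-1]
 * are valid for every interior point 1 <= i <= N1, 1 <= j <= N2. */
typedef long grid_t[N1 + 1][N2 + 1];

/* Boundary cells are set to 1, interior cells to 0. */
void init_grid(grid_t A) {
    memset(A, 0, sizeof(grid_t));
    for (int i = 0; i <= N1; ++i) A[i][0] = 1;
    for (int j = 0; j <= N2; ++j) A[0][j] = 1;
}

/* Original loop nest: one iteration at a time. */
void sweep_untiled(grid_t A) {
    for (int i = 1; i <= N1; ++i)
        for (int j = 1; j <= N2; ++j)
            A[i][j] = A[i - 1][j] + A[i][j - 1];
}

static int min_int(int a, int b) { return a < b ? a : b; }

/* Tiled loop nest: the two outer loops enumerate tiles, the two inner
 * loops execute all iterations of one tile before moving on. */
void sweep_tiled(grid_t A) {
    for (int ti = 1; ti <= N1; ti += TS1)
        for (int tj = 1; tj <= N2; tj += TS2)
            for (int i = ti; i < min_int(ti + TS1, N1 + 1); ++i)
                for (int j = tj; j < min_int(tj + TS2, N2 + 1); ++j)
                    A[i][j] = A[i - 1][j] + A[i][j - 1];
}

/* Returns 1 if both grids hold identical values. */
int grids_equal(grid_t A, grid_t B) {
    return memcmp(A, B, sizeof(grid_t)) == 0;
}
```

In a distributed setting each tile would additionally exchange its boundary values with neighbouring processes; that part is left out here to keep the sketch independent of any MPI installation.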

• Example:

    FOR j1 = 0 TO 9 DO
      FOR j2 = 0 TO 7 DO
        A[j1,j2] := A[j1-1,j2] + A[j1,j2-1];
      ENDFOR
    ENDFOR

  $D = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

[Figure: iteration space with axes j1 and j2, distributed over NODE0 and NODE1, each with CPU0 and CPU1; 4 MPI nodes]

    tile_0 = nod_0;
    …
    tile_{n-2} = nod_{n-2};
    FOR tile_{n-1} = 0 TO (max_{n-1} - min_{n-1}) / x DO
      Pack(snd_buf, tile_{n-1} - 1, nod);
      MPI_Isend(snd_buf, dest(nod));
      MPI_Irecv(recv_buf, src(nod));
      Compute(tile);
      MPI_Waitall;
      Unpack(recv_buf, tile_{n-1} + 1, nod);
    END FOR

Hyperplane Scheduling

• Implements coarse-grain parallelism assuming inter-tile data dependencies
• Tiles are organized into data-independent subsets (groups)
• Tiles of the same group can be concurrently executed by multiple threads
• Barrier synchronization between threads
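
The grouping idea can be sketched in C for the simplest case. The fragment below assumes a two-dimensional grid of `T1` x `T2` tiles in which every tile depends on its left and lower neighbours, so that the group of a tile is the hyperplane index `t1 + t2`. The grid size and all function names are hypothetical and serve only to make the idea concrete.

```c
#define T1 5   /* tiles along the first dimension (assumed) */
#define T2 4   /* tiles along the second dimension (assumed) */

/* Group (hyperplane) index of a tile. With dependencies on (t1-1, t2)
 * and (t1, t2-1), tiles with equal t1 + t2 do not depend on each other. */
int tile_group(int t1, int t2) {
    return t1 + t2;
}

/* Number of groups needed to cover the whole tile grid. */
int num_groups(void) {
    return T1 + T2 - 1;
}

/* Number of tiles that belong to group g and can run concurrently. */
int group_size(int g) {
    int count = 0;
    for (int t1 = 0; t1 < T1; ++t1) {
        int t2 = g - t1;
        if (t2 >= 0 && t2 < T2)
            ++count;
    }
    return count;
}

/* Returns 1 if executing the groups in increasing order satisfies every
 * inter-tile dependency, i.e. each predecessor lies in an earlier group. */
int schedule_is_valid(void) {
    for (int t1 = 0; t1 < T1; ++t1) {
        for (int t2 = 0; t2 < T2; ++t2) {
            int g = tile_group(t1, t2);
            if (t1 > 0 && tile_group(t1 - 1, t2) >= g) return 0;
            if (t2 > 0 && tile_group(t1, t2 - 1) >= g) return 0;
        }
    }
    return 1;
}
```

Because every predecessor of a tile sits in a strictly earlier group, one barrier between consecutive groups is enough to enforce the dependencies in this setting.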

[Figure: iteration space with axes j1 and j2, distributed over NODE0 and NODE1, each with CPU0 and CPU1; 2 MPI nodes x 2 OpenMP threads]

    #pragma omp parallel
    {
      group_0 = nod_0;
      …
      group_{n-2} = nod_{n-2};
      tile_0 = nod_0 * m_0 + th_0;
      …
      tile_{n-2} = nod_{n-2} * m_{n-2} + th_{n-2};
      FOR(group_{n-1}) {
        tile_{n-1} = group_{n-1} - \sum_{i=0}^{n-2} tile_i;
        if (0 <= tile_{n-1} <= (max_{n-1} - min_{n-1}) / t)
          compute(tile);
        #pragma omp barrier
      }
    }

Fine-grain Model

• Incremental parallelization of computationally intensive parts
• Relatively straightforward from pure MPI
• Threads (re)spawned at computation
• Inter-node communication outside of multi-threaded part
• Thread synchronization through implicit barrier of omp parallel directive

    FOR(group_{n-1}) {
      Pack(snd_buf, tile_{n-1} - 1, nod);
      MPI_Isend(snd_buf, dest(nod));
      MPI_Irecv(recv_buf, src(nod));
      #pragma omp parallel
      {
        thread_id = omp_get_thread_num();
        if (valid(tile, thread_id, group_{n-1}))
          Compute(tile);
      }
      MPI_Waitall;
      Unpack(recv_buf, tile_{n-1} + 1, nod);
    }
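
A compilable approximation of the fine-grain structure, restricted to a single node, might look as follows. It is only a sketch: the MPI calls are reduced to comments so that the code builds without an MPI installation, the kernel is a two-dimensional recurrence with dependencies on (i-1, j) and (i, j-1), and `#pragma omp parallel for` stands in for the explicit thread-id test of the pseudocode. The grid size, the tile size `TS` and all function names are assumptions.

```c
#include <string.h>

#define N1 10                       /* grid extent, first dimension (assumed) */
#define N2 8                        /* grid extent, second dimension (assumed) */
#define TS 2                        /* square tile size (assumed) */
#define T1 ((N1 + TS - 1) / TS)     /* tiles along the first dimension */
#define T2 ((N2 + TS - 1) / TS)     /* tiles along the second dimension */

/* Row 0 and column 0 are boundary padding. */
typedef long grid_t[N1 + 1][N2 + 1];

/* Boundary cells are set to 1, interior cells to 0. */
void init_grid(grid_t A) {
    memset(A, 0, sizeof(grid_t));
    for (int i = 0; i <= N1; ++i) A[i][0] = 1;
    for (int j = 0; j <= N2; ++j) A[0][j] = 1;
}

/* Executes all iterations of tile (t1, t2). */
static void compute_tile(grid_t A, int t1, int t2) {
    int i_end = (t1 + 1) * TS < N1 ? (t1 + 1) * TS : N1;
    int j_end = (t2 + 1) * TS < N2 ? (t2 + 1) * TS : N2;
    for (int i = t1 * TS + 1; i <= i_end; ++i)
        for (int j = t2 * TS + 1; j <= j_end; ++j)
            A[i][j] = A[i - 1][j] + A[i][j - 1];
}

/* Fine-grain style: a parallel region is opened and closed once per group. */
void sweep_fine_grain(grid_t A) {
    for (int g = 0; g <= T1 + T2 - 2; ++g) {
        /* Pack, MPI_Isend and MPI_Irecv would be issued here, outside the
         * multi-threaded part. */
        #pragma omp parallel for
        for (int t1 = 0; t1 < T1; ++t1) {
            int t2 = g - t1;
            if (t2 >= 0 && t2 < T2)
                compute_tile(A, t1, t2);
        }
        /* The implicit barrier at the end of the parallel loop separates
         * consecutive groups. MPI_Waitall and Unpack would follow here. */
    }
}
```

Compiled without OpenMP support the pragma is ignored and the loop runs sequentially with the same result, which makes the sketch easy to check.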

Coarse-grain Model

• SPMD paradigm
• Requires more programming effort
• Threads are only spawned once
• Inter-node communication inside multi-threaded part (requires MPI_THREAD_MULTIPLE)
• Thread synchronization through explicit barrier (omp barrier directive)

    #pragma omp parallel
    {
      thread_id = omp_get_thread_num();
      FOR(group_{n-1}) {
        #pragma omp master
        {
          Pack(snd_buf, tile_{n-1} - 1, nod);
          MPI_Isend(snd_buf, dest(nod));
          MPI_Irecv(recv_buf, src(nod));
        }
        if (valid(tile, thread_id, group_{n-1}))
          Compute(tile);
        #pragma omp master
        {
          MPI_Waitall;
          Unpack(recv_buf, tile_{n-1} + 1, nod);
        }
        #pragma omp barrier
      }
    }
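
The coarse-grain structure can be approximated on a single node in the same spirit. In the sketch below the parallel region is opened once, each thread picks its tiles from the current group by thread id, and an explicit barrier separates the groups. The MPI calls of the master thread appear only as comments, and the kernel, the grid size, the tile size `TS`, the cyclic tile-to-thread assignment and all function names are assumptions chosen for illustration.

```c
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define N1 10                       /* grid extent, first dimension (assumed) */
#define N2 8                        /* grid extent, second dimension (assumed) */
#define TS 2                        /* square tile size (assumed) */
#define T1 ((N1 + TS - 1) / TS)     /* tiles along the first dimension */
#define T2 ((N2 + TS - 1) / TS)     /* tiles along the second dimension */

/* Row 0 and column 0 are boundary padding. */
typedef long grid_t[N1 + 1][N2 + 1];

/* Boundary cells are set to 1, interior cells to 0. */
void init_grid(grid_t A) {
    memset(A, 0, sizeof(grid_t));
    for (int i = 0; i <= N1; ++i) A[i][0] = 1;
    for (int j = 0; j <= N2; ++j) A[0][j] = 1;
}

/* Executes all iterations of tile (t1, t2). */
static void compute_tile(grid_t A, int t1, int t2) {
    int i_end = (t1 + 1) * TS < N1 ? (t1 + 1) * TS : N1;
    int j_end = (t2 + 1) * TS < N2 ? (t2 + 1) * TS : N2;
    for (int i = t1 * TS + 1; i <= i_end; ++i)
        for (int j = t2 * TS + 1; j <= j_end; ++j)
            A[i][j] = A[i - 1][j] + A[i][j - 1];
}

/* Coarse-grain style: threads are created once and loop over all groups. */
void sweep_coarse_grain(grid_t A) {
    #pragma omp parallel
    {
        int thread_id = 0;
        int num_threads = 1;
#ifdef _OPENMP
        thread_id = omp_get_thread_num();
        num_threads = omp_get_num_threads();
#endif
        for (int g = 0; g <= T1 + T2 - 2; ++g) {
            /* The master thread would pack and post MPI_Isend/MPI_Irecv here. */
            for (int t1 = thread_id; t1 < T1; t1 += num_threads) {
                int t2 = g - t1;
                if (t2 >= 0 && t2 < T2)
                    compute_tile(A, t1, t2);
            }
            /* The master thread would call MPI_Waitall and unpack here. */
            #pragma omp barrier
        }
    }
}
```

Every thread executes the same number of group iterations, so all threads reach the barrier the same number of times. Without OpenMP support the region runs on one thread and yields the same values.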

Summary: Fine-grain vs Coarse-grain

| Fine-grain                                                          | Coarse-grain                                                                           |
|---------------------------------------------------------------------|----------------------------------------------------------------------------------------|
| Threads are re-spawned                                              | Threads are only spawned once                                                          |
| Inter-node MPI communication outside of multi-threaded region       | Inter-node MPI communication inside multi-threaded region, handled by master thread    |
| Intra-node synchronization through implicit barrier (omp parallel)  | Intra-node synchronization through explicit OpenMP barrier                             |

Experimental Results

• 8-node SMP Linux Cluster (800 MHz PIII, 128 MB RAM, kernel 2.4.20)
• MPICH v.1.2.5 (--with-device=ch_p4, --with-comm=shared)
• Intel C++ compiler 7.0 (-O3 -mcpu=pentiumpro -static)
• FastEthernet interconnection
• ADI micro-kernel benchmark (3D)

Alternating Direction Implicit (ADI)

• Unitary data dependencies
• 3D Iteration Space (X x Y x Z)

[Figure: 3D iteration space with axes X, Y and Z]

ADI – 4 nodes

[Figure: Pure MPI vs. Hybrid, axes X and Y; legend: MPI, OpenMP]
[Figure: Pure MPI vs. Hybrid for the cases X < Y and X > Y, axes X and Y]
[Figure: ADI X=512 Y=512 Z=8192 – 4 nodes]
[Figure: ADI X=128 Y=512 Z=8192 – 4 nodes]
[Figure: ADI X=512 Y=128 Z=8192 – 4 nodes]

ADI – 2 nodes

[Figure: Pure MPI vs. Hybrid, axes X and Y; legend: MPI, OpenMP]
[Figure: Pure MPI vs. Hybrid for the cases X < Y and X > Y, axes X and Y]
[Figure: ADI X=128 Y=512 Z=8192 – 2 nodes]
[Figure: ADI X=256 Y=512 Z=8192 – 2 nodes]
[Figure: ADI X=512 Y=512 Z=8192 – 2 nodes]
[Figure: ADI X=512 Y=256 Z=8192 – 2 nodes]
[Figure: ADI X=512 Y=128 Z=8192 – 2 nodes]
[Figure: ADI X=128 Y=512 Z=8192 – 2 nodes; legend: Computation, Communication]
[Figure: ADI X=512 Y=128 Z=8192 – 2 nodes; legend: Computation, Communication]

Conclusions

• Nested loop algorithms with arbitrary data dependencies can be adapted to the hybrid parallel programming paradigm
• Hybrid models can be competitive with the pure MPI paradigm
• The coarse-grain hybrid model can be more efficient than the fine-grain one, but it is also more complicated
• Programming efficiently in OpenMP is not easier than programming efficiently in MPI

Future Work

• Application of methodology to real applications and benchmarks
• Work balancing for coarse-grain model
• Performance evaluation on advanced interconnection networks (SCI, Myrinet)
• Generalization as compiler technique

Questions?

http://www.cslab.ece.ntua.gr/~ndros