A new Approach to MPI Collective Communication Implementations

T. Hoefler (1,2), J. Squyres (3), G. Fagg (4), G. Bosilca (4), W. Rehm (2), A. Lumsdaine (1)

(1) Open Systems Lab, Indiana University
(2) Computer Architecture Group, Technical University of Chemnitz
(3) Cisco Systems, San Jose
(4) University of Tennessee, Dept. of Computer Science

2nd Austrian Grid Symposium - DAPSYS'06, Innsbruck, Austria, 22nd September 2006

Page 2

Outline

1 Introduction
   Known Problems
   State of the Art
   Open MPI
   Design Goals

2 Framework Architecture
   Software Architecture
   Initialization
   Runtime Selection

3 Conclusions


Page 4

Known Problems

- huge number of different collective algorithms and implementations
- hardware-dependent collective implementations
- no framework that offers run-time selection exists
- selection of the optimal algorithm is not trivial, because
   - it depends on MPI parameters (message size, communicator)
   - the decision lies in the critical path
   - different implementations only work for certain parameters
   - every process has to choose the same algorithm (run-time decision)


Page 5

Predictive Performance Models

Prediction is Possible: the LogP family (LogGP) predicts accurately

- L: hardware latency
- o: host overhead (can be divided into o_r and o_s)
- g: gap between consecutive messages (bandwidth-limiting)
- G: gap between each byte of a message
- P: number of processes

Collective Operations: all collective operations based on point-to-point messages can be predicted with Log(G)P!
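For illustration (this example is not from the slides): a binomial-tree broadcast of an s-byte message among P processes takes ceil(log2(P)) rounds, and each point-to-point message costs roughly o + (s-1)G + L + o in LogGP. A minimal sketch under these assumptions, with made-up parameter values:

```python
import math

def loggp_msg_time(s, L, o, G):
    """LogGP cost of one point-to-point message of s bytes:
    send overhead + per-byte gaps + latency + receive overhead."""
    return o + (s - 1) * G + L + o

def binomial_bcast_time(P, s, L, o, G):
    """Predicted binomial-tree broadcast time: ceil(log2(P)) rounds,
    each forwarding the full message over one point-to-point link."""
    rounds = math.ceil(math.log2(P))
    return rounds * loggp_msg_time(s, L, o, G)

# Hypothetical parameters (microseconds): L=5, o=2, G=0.01 per byte
t = binomial_bcast_time(8, 1024, 5.0, 2.0, 0.01)  # 3 rounds
```

Doubling P adds one round, while the per-byte gap G dominates for large messages; this is the kind of closed-form estimate an evaluation function can compute cheaply at run time.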


Page 7

Further Problems

- LogP vs. hardware-optimized implementations ⇒ need a common denominator: seconds to assess running time
- hardware implementations have to offer predictive models
- bypassing must be possible (for optimized implementations)


Page 8

State of the Art

- most implementations use suboptimal hard-coded switching points (MPICH(2), MVAPICH, LAM/MPI, Open MPI)
- the "tuned" Open MPI component experiments with dynamic selection from a fixed set of algorithms (no hardware optimization)
- Open MPI allows coarse-grained third-party coll modules
⇒ no flexible selection framework available yet


Page 9

Open MPI

⇒ merged FT-MPI, LA-MPI, LAM/MPI, PACX-MPI
- implements MPI-2
- support for different networks (TCP, GM, MX, MVAPI, OpenIB, Portals, SM)
- modular framework architecture
   - some frameworks: PML, BTL, COLL, ...
   - easy addition of new ideas
   - clearly defined interfaces
   - binary modules (vendor)


Page 10

Goals of our Design

⇒ redesign of the coll v1 framework in Open MPI 1.0/1.1
- enable fine-grained selection
- efficient run-time decision
- bypassing/fast-pathing
- modular approach / third-party (binary) modules
- automatic usage of the best available module


Page 12

Terms

component: functionality without resources, provided by the implementer

module: communicator-specific instance of an implementation

query: request to a component to return communicator-specific modules

implementation: implementation of a collective operation

opaque functions: non-visible functions in collective modules


Page 13

Software Architecture

[Figure: software architecture. Component A provides, e.g., a Broadcast module (*broadcast_fn_1, *broadcast_fn_2, *broadcast_eval_fn), a Barrier module (*barrier_fn, *barrier_eval_fn), and an Alltoall module (*alltoall_fn_1, *alltoall_fn_2, *alltoall_eval_fn). Component B provides its own Broadcast modules (*broadcast_fn, *broadcast_eval_fn) and a Gather module (*gather_fn_1, *gather_fn_2, *gather_eval_fn). Each module exposes one or more implementation functions plus an evaluation function.]


Page 14

Actions During MPI_INIT

[Flowchart: Did the user force anything?
  no  → load all available components
  yes → load the selected components
Then open each component: if it wants to run, return it in the component list; otherwise close and unload it.]
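The selection flow on this slide can be sketched in a few lines (the component objects and field names here are hypothetical, not the actual Open MPI MCA interface):

```python
def select_components(available, forced=None):
    """MPI_INIT-style selection: load the user-forced components if any,
    otherwise all available ones; open each and keep only the components
    that want to run, closing and unloading the rest."""
    loaded = [c for c in available if c["name"] in forced] if forced else list(available)
    selected = []
    for comp in loaded:
        if comp["wants_to_run"]:       # open/query succeeded
            selected.append(comp)
        # else: the component is closed and unloaded (simply dropped here)
    return selected

components = [
    {"name": "basic", "wants_to_run": True},
    {"name": "tuned", "wants_to_run": True},
    {"name": "hw",    "wants_to_run": False},  # e.g. required hardware absent
]
```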


Page 15

Actions During Communicator Construction

[Flowchart: While any components are left, query each component with the communicator and add the returned modules to the avail_<op> array. Then unify the module array and construct the function list. For each collective operation: if there is only one module left, put it at the communicator as directly callable; otherwise put the decision function at the communicator.]
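A sketch of this construction step, assuming a module is an (implementation, estimator) pair and each component is a query callback taking the communicator (both illustrative, not the real Open MPI API):

```python
def build_coll_function(components, comm):
    """Communicator construction: query every component for a
    comm-specific module.  With exactly one available module it is
    installed as a direct call; otherwise a run-time decision
    function over all available modules is installed."""
    avail = [m for m in (query(comm) for query in components) if m is not None]
    if len(avail) == 1:
        return avail[0][0]            # direct callable: no decision overhead
    def decision_fn(*args):
        # pick the module whose estimator predicts the lowest time
        impl, _ = min(avail, key=lambda m: m[1](*args))
        return impl(*args)
    return decision_fn

# Two hypothetical broadcast modules: linear scales with size, tree is flat
linear = (lambda n: "linear", lambda n: float(n))
tree   = (lambda n: "tree",   lambda n: 5.0)
bcast  = build_coll_function([lambda c: linear, lambda c: tree], "comm0")
```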


Page 16

Architecture

- all returned modules are attached to the communicator
- each module offers an evaluation function
- the evaluation function returns a pointer to the fastest function and an estimated time
- the estimation is up to the implementer (model, previous benchmarks, ...)
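Two illustrative estimation strategies an implementer might use, one model-based and one reusing previously benchmarked timings (all names and numbers here are hypothetical):

```python
def model_eval(size, L=5.0, o=2.0, G=0.01, rounds=3):
    """Model-based estimate: a LogGP-style closed-form prediction."""
    return rounds * (2 * o + (size - 1) * G + L)

# Previously measured timings for a few message sizes (made-up numbers)
BENCHMARKS = {64: 11.0, 1024: 40.0, 65536: 900.0}

def bench_eval(size):
    """Benchmark-based estimate: reuse the timing of the nearest
    previously measured message size."""
    nearest = min(BENCHMARKS, key=lambda s: abs(s - size))
    return BENCHMARKS[nearest]
```

Both return a time in the common denominator (seconds, or any fixed unit), which is what makes predictions from different modules comparable.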


Page 17

Invocation

[Flowchart: Are the MPI arguments in the cache?
  yes → call the fastest (cached) function.
  no  → while an untested module is left, query the module for its estimated running time; then put the fastest to the cache, clean up all modules but the winner, and call the fastest.]


Page 18

Decision Overhead

Cache Hit: a single access in a hash table

Cache Miss: cost depends on the number of modules; each module is queried and returns a model- or benchmark-based prediction

Cache Friendliness (collective invocations / cache misses, hit rate):
ABINIT/Band: 295/16 (94.6%)
ABINIT/CG: 53887/75 (99.9%)
CPMD: 15428/85 (99.4%)
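The hit/miss behavior can be sketched as a hash table keyed on the call signature (the key and module representation here are simplified for illustration):

```python
class DecisionCache:
    """Cache the winning implementation per argument signature so the
    per-module evaluation only runs on a cache miss."""
    def __init__(self, modules):
        self.modules = modules        # list of (impl_fn, estimate_fn)
        self.cache = {}               # hash table: signature -> impl_fn
        self.hits = 0
        self.misses = 0

    def call(self, size):
        key = size                    # real key: (operation, comm, count, datatype, ...)
        if key in self.cache:
            self.hits += 1
        else:
            self.misses += 1          # query each module for its estimate
            impl, _ = min(self.modules, key=lambda m: m[1](size))
            self.cache[key] = impl
        return self.cache[key](size)
```

Repeated calls with the same arguments ("argument locality") hit the cache, which is what produces the high hit rates measured for the applications above.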



Page 21

Conclusions and Future Work

Conclusions
- easy, flexible, and reliable scheme
- optimized for the common case
- uses "argument locality"

Future Work
- implement the system in Open MPI
- analyze more applications
