A new Approach to MPI Collective Communication Implementations

T. Hoefler (1,2), J. Squyres (3), G. Fagg (4), G. Bosilca (4), W. Rehm (2), A. Lumsdaine (1)

1 Open Systems Lab, Indiana University
2 Computer Architecture Group, Technical University of Chemnitz
3 Cisco Systems, San Jose
4 University of Tennessee, Dept. of Computer Science

2nd Austrian Grid Symposium - DAPSYS'06, Innsbruck, Austria, 22nd September 2006
Outline

1. Introduction
   - Known Problems
   - State of the Art
   - Open MPI
   - Design Goals
Known Problems
- huge number of different collective algorithms and implementations
- hardware-dependent collective implementations
- no framework exists that offers run-time selection
- selecting the optimal algorithm is not trivial, because
  - it depends on the MPI parameters (message size, communicator)
  - the decision lies in the critical path
  - different implementations only work for certain parameters
  - every process has to choose the same one (run-time decision)
Predictive Performance Models
Prediction is Possible
The LogP family (LogGP) predicts accurately:
- L: hardware latency
- o: host overhead (can be divided into o_r and o_s)
- g: gap between consecutive messages (bandwidth-limiting)
- G: gap between each byte of a message
- P: number of processes

Collective Operations
All collective operations based on point-to-point messages can be predicted with Log(G)P!
Further Problems
- LogP vs. HW-optimized implementations ⇒ need a common denominator: seconds to assess running time
- HW implementations have to offer predictive models
- bypassing must be possible (for optimized implementations)
State of the Art
- most implementations use suboptimal hard-coded switching points (MPICH(2), MVAPICH, LAM/MPI, Open MPI)
- the "tuned" Open MPI component experiments with dynamic selection over a fixed set of algorithms (no HW optimization)
- Open MPI allows coarse-grained third-party coll modules
⇒ no flexible selection framework available yet
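The hard-coded switching points criticized here typically look like the following sketch; the algorithm names and thresholds are invented for illustration and do not quote any library's actual source:

```c
#include <stddef.h>

/* Hypothetical broadcast algorithm choices. */
enum bcast_alg { BCAST_BINOMIAL, BCAST_CHAIN, BCAST_PIPELINE };

/* Typical hard-coded decision: fixed message-size and
 * communicator-size thresholds compiled into the library.  Every
 * process deterministically picks the same algorithm, but the
 * thresholds are oblivious to the actual network hardware. */
enum bcast_alg choose_bcast(size_t msg_size, int comm_size)
{
    if (comm_size <= 8 || msg_size < 2048)
        return BCAST_BINOMIAL;   /* small: latency-bound */
    if (msg_size < 65536)
        return BCAST_CHAIN;      /* medium messages */
    return BCAST_PIPELINE;       /* large: bandwidth-bound */
}
```

The thresholds are usually tuned once on a reference machine, which is exactly why they end up suboptimal elsewhere.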
Open MPI
⇒ merged FT-MPI, LA-MPI, LAM/MPI, PACX-MPI
- implements MPI-2
- support for different networks (TCP, GM, MX, MVAPI, OpenIB, Portals, SM)
- modular framework architecture
  - some frameworks: PML, BTL, COLL, ...
  - easy addition of new ideas
  - clearly defined interfaces
  - binary modules (vendor)
Goals of our Design
⇒ redesign of the coll v1 framework in Open MPI 1.0/1.1
- enable fine-grained selection
- efficient run-time decision
- bypassing/fast-pathing
- modular approach / third-party (binary) modules
- automatic usage of the best available module
- all returned modules are attached to the communicator
- each module offers an evaluation function
- the evaluation function returns a pointer to the fastest function and an estimated time
- the estimation is up to the implementer (model, previous benchmark, ...)
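The selection step described above might be sketched as follows; the struct and function names are invented for illustration and do not reproduce the real Open MPI coll interface, and the two toy modules use fabricated cost estimates:

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical collective-function and module interfaces.  Each
 * module's evaluation function returns its estimated running time
 * in seconds and hands back its fastest implementation for the
 * given parameters. */
typedef int (*coll_fn_t)(void *buf, size_t count);

typedef struct {
    const char *name;
    double (*evaluate)(size_t msg_size, int comm_size, coll_fn_t *fn);
} coll_module_t;

/* Two toy modules with fabricated cost models (not real ones). */
static int impl_linear(void *buf, size_t count) { (void)buf; (void)count; return 0; }
static int impl_tree(void *buf, size_t count)   { (void)buf; (void)count; return 0; }

static double eval_linear(size_t m, int p, coll_fn_t *fn)
{
    *fn = impl_linear;           /* cost grows linearly with p */
    return (double)p * (1e-6 + (double)m * 1e-9);
}

static double eval_tree(size_t m, int p, coll_fn_t *fn)
{
    (void)p;
    *fn = impl_tree;             /* toy: flat "4 rounds" cost */
    return 4.0 * (1e-6 + (double)m * 1e-9);
}

/* Pick the module promising the lowest estimated time.  Because
 * the estimates depend only on (msg_size, comm_size), every
 * process arrives at the same choice without extra communication. */
coll_fn_t select_fastest(coll_module_t *mods, int n,
                         size_t msg_size, int comm_size)
{
    coll_fn_t best_fn = NULL, fn;
    double best = DBL_MAX;
    for (int i = 0; i < n; i++) {
        double t = mods[i].evaluate(msg_size, comm_size, &fn);
        if (t < best) { best = t; best_fn = fn; }
    }
    return best_fn;
}
```

With these toy estimates the linear module wins on two processes and the tree module wins on sixteen, which is the kind of parameter-dependent crossover the evaluation functions are meant to capture.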