Page 1:

High Performance MPI on IBM 12x InfiniBand Architecture

Abhinav Vishnu, Brad Benton, and Dhabaleswar K. Panda

{vishnu, panda}@cse.ohio-state.edu, {brad.benton}@us.ibm.com

Page 2:

Presentation Road-Map

• Introduction and Motivation
• Background
• Enhanced MPI design for IBM 12x Architecture
• Performance Evaluation
• Conclusions and Future Work

Page 3:

Introduction and Motivation

• Demand for more compute power is driven by parallel applications
  – Molecular Dynamics (NAMD), car crash simulations (LS-DYNA), ...
• Cluster sizes have been increasing steadily to meet these demands
  – ~9K processors (Sandia Thunderbird, ASCI Q)
  – Larger-scale clusters are planned using upcoming multi-core architectures
• MPI is the primary programming model used for writing these applications

Page 4:

Emergence of InfiniBand

• Interconnects with very low latency and very high throughput have become available
  – InfiniBand, Myrinet, Quadrics, ...
• InfiniBand
  – High performance and open standard
  – Advanced features
• PCI-Express based InfiniBand adapters are becoming popular
  – 8X (1X ~ 2.5 Gbps) with Double Data Rate (DDR) support
  – MPI designs for these adapters are emerging
• Compared to PCI-Express, GX+ I/O bus based adapters are also emerging
  – 4X and 12X link support

Page 5:

InfiniBand Adapters

[Diagram: HCAs attach to the host chipset through an I/O bus interface and to the network through two ports (P1, P2). The I/O bus may be PCI-X (4x bidirectional), PCI-Express (16x bidirectional), or GX+ (>24x bidirectional bandwidth); the ports drive 4x or 12x InfiniBand links (SDR/DDR).]

MPI designs for PCI-Express based adapters are coming up; IBM 12x InfiniBand adapters on GX+ are also coming up.

Page 6:

Problem Statement

• How do we design an MPI with low overhead for the IBM 12x InfiniBand Architecture?
• What are the performance benefits of the enhanced design over existing designs?
  – Point-to-point communication
  – Collective communication
  – MPI applications

Page 7:

Presentation Road-Map

• Introduction and Motivation
• Background
• Enhanced MPI design for IBM 12x Architecture
• Performance Evaluation
• Conclusions and Future Work

Page 8:

Overview of InfiniBand

• An interconnect technology to connect I/O nodes and processing nodes
• InfiniBand provides multiple transport semantics
  – Reliable Connection
    • Supports reliable notification and Remote Direct Memory Access (RDMA)
  – Unreliable Datagram
    • Data delivery is not reliable; send/recv is supported
  – Reliable Datagram
    • Currently not implemented by vendors
  – Unreliable Connection
    • Notification is not supported
• InfiniBand uses a queue pair (QP) model for data transfer (a verbs-level sketch follows below)
  – Send queue (for send operations)
  – Receive queue (not involved in RDMA operations)
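
As context for the QP model, here is a minimal sketch of posting an RDMA write to a QP's send queue with the OpenFabrics verbs API (the layer MVAPICH builds on); the QP setup, memory registration, and the exchange of the remote address and rkey are assumed to have happened during connection establishment.

```c
/* Hedged sketch: post one RDMA write on a reliably connected QP.
 * Assumes qp, the local registered memory region (mr), and the peer's
 * remote_addr/rkey were exchanged out of band during connection setup. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

static int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                           void *buf, size_t len,
                           uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge;
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&sge, 0, sizeof(sge));
    sge.addr   = (uintptr_t)buf;      /* local registered buffer */
    sge.length = (uint32_t)len;
    sge.lkey   = mr->lkey;

    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE; /* no receive posted at the target */
    wr.send_flags          = IBV_SEND_SIGNALED; /* generate a local completion */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.wr.rdma.remote_addr = remote_addr;       /* target virtual address */
    wr.wr.rdma.rkey        = rkey;              /* target memory key */

    return ibv_post_send(qp, &wr, &bad_wr);     /* 0 on success */
}
```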

Page 9:

Multipathing Configurations

[Diagram: multi-rail configurations built from multiple adapters and multiple ports, connected through one or more switches; a combination of these is also possible. Multi-rail can also be used over multiple send/recv engines.]

Page 10:

Presentation Road-Map

• Introduction and Motivation
• Background
• Enhanced MPI design for IBM 12x Architecture
• Performance Evaluation
• Conclusions and Future Work

Page 11:

MPI Design for 12x Architecture

[Design diagram: the ADI layer above the InfiniBand layer contains a Communication Scheduler with pluggable Scheduling Policies (EPC), a Completion Notifier, and a Communication Marker; eager and rendezvous point-to-point as well as collective messages are scheduled over multiple QPs per port, with completion notification fed back to the ADI layer.]

Based on: Jiuxing Liu, Abhinav Vishnu and Dhabaleswar K. Panda, "Building Multi-rail InfiniBand Clusters: MPI-level Design and Performance Evaluation," SuperComputing 2004.

Page 12:

Discussion on Scheduling Policies

[Diagram: candidate scheduling policies are Even Striping, Reverse Multiplexing, Round Robin, Binding, and the proposed Enhanced Pt-to-Pt and Collective (EPC) policy. Striping-based policies incur overhead from multiple stripes and multiple completions; the trade-off differs for non-blocking, blocking, and collective communication.]

Page 13:

EPC Characteristics

• For small messages, the round-robin policy is used
  – Striping leads to overhead for small messages
• Otherwise, the policy depends on the type of communication (a minimal sketch of this selection follows below):

  Communication type             Policy
  Point-to-point, blocking       Striping
  Point-to-point, non-blocking   Round robin
  Collective                     Striping
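
The following is a minimal sketch of how such a selection and an even-striping step could look; the threshold constant, enum names, and the striping helper are illustrative assumptions, not the actual MVAPICH code.

```c
/* Hedged sketch of EPC-style policy selection and even striping across QPs. */
#include <stddef.h>

enum msg_kind { PT2PT_BLOCKING, PT2PT_NONBLOCKING, COLLECTIVE };
enum policy   { ROUND_ROBIN, STRIPING };

#define SMALL_MSG_THRESHOLD 8192   /* assumed striping cutoff in bytes */

static enum policy epc_select(enum msg_kind kind, size_t len)
{
    if (len < SMALL_MSG_THRESHOLD)      /* striping overhead dominates */
        return ROUND_ROBIN;
    if (kind == PT2PT_NONBLOCKING)      /* avoid waiting on multiple completions */
        return ROUND_ROBIN;
    return STRIPING;                    /* blocking pt-to-pt and collectives */
}

/* Even striping across num_qps QPs: each QP carries one contiguous stripe. */
static void stripe_message(const char *buf, size_t len, int num_qps,
                           void (*post)(int qp_idx, const char *chunk, size_t n))
{
    size_t stripe = (len + num_qps - 1) / num_qps;
    for (int i = 0; i < num_qps; i++) {
        size_t off = (size_t)i * stripe;
        if (off >= len)
            break;
        size_t n = (len - off < stripe) ? (len - off) : stripe;
        post(i, buf + off, n);          /* e.g., posts an RDMA write on QP i */
    }
}
```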

Page 14:

MVAPICH/MVAPICH2

• We have used MVAPICH as our MPI framework for the enhanced design
• MVAPICH/MVAPICH2
  – High performance MPI-1/MPI-2 implementations over InfiniBand and iWARP
  – Have powered many supercomputers in the TOP500 supercomputing rankings
  – Currently being used by more than 450 organizations (academia and industry worldwide)
  – http://nowlab.cse.ohio-state.edu/projects/mpi-iba
• The enhanced design is available with MVAPICH
  – It will become available with MVAPICH2 in upcoming releases

Page 15:

Presentation Road-Map

• Introduction and Motivation
• Background
• Enhanced MPI design for IBM 12x Architecture
• Performance Evaluation
• Conclusions and Future Work

Page 16:

Experimental Test-Bed

• The experimental test-bed consists of:
  – Power5-based systems with SLES9 SP2
  – GX+ bus at 950 MHz clock speed
  – Linux kernel version 2.6.9
  – 2.8 GHz processors with 8 GB of memory
  – TS120 switch for connecting the adapters
• One adapter with one port is used for communication
  – The objective is to see the benefit of using only one physical port

Page 17:

Ping-Pong Latency Test

• EPC adds insignificant overhead to the small-message latency
• Large-message latency is reduced by 41% using EPC with the IBM 12x architecture (the ping-pong pattern is sketched below)
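
For reference, a minimal sketch of a two-process ping-pong latency test of the kind reported here (in the spirit of the OSU micro-benchmarks); the message size, iteration count, and tag are illustrative.

```c
/* Hedged sketch of an MPI ping-pong latency test between ranks 0 and 1. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int iters = 1000, size = 4 * 1024 * 1024;   /* illustrative values */
    int rank;
    char *buf = malloc(size);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double start = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    if (rank == 0)   /* one-way latency = half the average round-trip time */
        printf("latency: %.2f us\n", (MPI_Wtime() - start) * 1e6 / (2.0 * iters));

    free(buf);
    MPI_Finalize();
    return 0;
}
```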

Page 18:

Small Messages Throughput

• Unidirectional bandwidth doubles for small messages using EPC
• Bidirectional bandwidth does not improve with an increasing number of QPs, due to the copy bandwidth limitation

Page 19:

Large Messages Throughput

• EPC improves the unidirectional and bidirectional throughput significantly for medium-size messages
• We can achieve a peak unidirectional bandwidth of 2731 MB/s and a peak bidirectional bandwidth of 5421 MB/s (the bandwidth test pattern is sketched below)
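
A minimal sketch of a unidirectional bandwidth test of this kind: the sender keeps a window of non-blocking sends outstanding so the link stays busy. The window size, message size, and the one-byte acknowledgement are illustrative.

```c
/* Hedged sketch of a windowed unidirectional MPI bandwidth test. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int window = 64, iters = 20, size = 1024 * 1024;  /* illustrative */
    int rank;
    char *buf = malloc(size);
    MPI_Request req[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double start = MPI_Wtime();
    for (int it = 0; it < iters; it++) {
        if (rank == 0) {
            for (int w = 0; w < window; w++)   /* keep the pipe full */
                MPI_Isend(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(window, req, MPI_STATUSES_IGNORE);
            MPI_Recv(buf, 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE); /* ack */
        } else if (rank == 1) {
            for (int w = 0; w < window; w++)
                MPI_Irecv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(window, req, MPI_STATUSES_IGNORE);
            MPI_Send(buf, 1, MPI_CHAR, 0, 1, MPI_COMM_WORLD);                    /* ack */
        }
    }
    if (rank == 0) {
        double mb = (double)size * window * iters / 1.0e6;
        printf("bandwidth: %.1f MB/s\n", mb / (MPI_Wtime() - start));
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
```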

Page 20:

Collective Communication

• MPI_Alltoall shows significant benefits for large messages (a usage sketch follows below)
• MPI_Bcast shows more benefits for very large messages
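
For readers unfamiliar with the collective being measured, a minimal usage sketch of MPI_Alltoall, in which every rank exchanges one block with every other rank; the block size is illustrative.

```c
/* Hedged sketch: each rank sends `block` bytes to, and receives `block`
 * bytes from, every rank in the communicator. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    const int block = 256 * 1024;                 /* illustrative block size in bytes */
    int nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    char *sendbuf = malloc((size_t)block * nprocs);
    char *recvbuf = malloc((size_t)block * nprocs);
    memset(sendbuf, 0, (size_t)block * nprocs);   /* dummy payload */

    MPI_Alltoall(sendbuf, block, MPI_CHAR, recvbuf, block, MPI_CHAR, MPI_COMM_WORLD);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```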

Page 21:

NAS Parallel Benchmarks

• For class A and class B problem sizes, the x1 configuration shows improvement
• There is no degradation for other configurations on Fourier Transform

Page 22:

NAS Parallel Benchmarks

• Integer Sort shows 7-11% improvement for x1 configurations
• Other NAS Parallel Benchmarks do not show performance degradation

Page 23:

Presentation Road-Map

• Introduction and Motivation
• Background
• Enhanced MPI design for IBM 12x Architecture
• Performance Evaluation
• Conclusions and Future Work

Page 24:

Conclusions

• We presented an enhanced design for the IBM 12x InfiniBand Architecture
  – EPC (Enhanced Point-to-Point and Collective communication)
• We have implemented our design and evaluated it with micro-benchmarks, collectives, and MPI application kernels
• IBM 12x HCAs can significantly improve communication performance
  – 41% for the ping-pong latency test
  – 63-65% for the unidirectional and bidirectional bandwidth tests
  – 7-13% improvement in performance for the NAS Parallel Benchmarks
• We achieve peak unidirectional and bidirectional bandwidths of 2731 MB/s and 5421 MB/s, respectively

Page 25:

Future Directions

• We plan to evaluate EPC with multi-rail configurations on upcoming multi-core systems
  – Multi-port configurations
  – Multi-HCA configurations
• Scalability studies of using multiple QPs on large-scale clusters
  – Impact of QP caching
• Network fault tolerance

Page 26:

Acknowledgements

Our research is supported by the following organizations:

• Current funding support by (sponsor logos shown on slide)
• Current equipment support by (sponsor logos shown on slide)

Page 27:

Web Pointers

http://nowlab.cse.ohio-state.edu/

MVAPICH web page: http://mvapich.cse.ohio-state.edu

E-mail: {vishnu, panda}@cse.ohio-state.edu, {brad.benton}@us.ibm.com