A CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of Illinois at Urbana-Champaign Abhinav Bhatele, Eric Bohm, Laxmikant V. Kale Parallel Programming Laboratory Euro-Par 2009
Page 1

A CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS

University of Illinois at Urbana-Champaign

Abhinav Bhatele, Eric Bohm, Laxmikant V. Kale
Parallel Programming Laboratory

Euro-Par 2009

Page 2

Abhinav Bhatele @ Euro-Par 2009

Outline

Motivation
Solution: Mapping of OpenAtom
Performance Benefits
Bigger Picture:
  Resources Needed
  Heuristic Solutions
  Automatic Mapping

August 27th, 2009

Page 3

OpenAtom

Ab initio molecular dynamics code
Considers electrostatic interactions between the nuclei and electrons
Calculates different energy terms
Divided into different phases with a lot of communication


Page 4

OpenAtom on Blue Gene/L


[Plot: time per step (secs) vs. no. of cores (512 to 8192); series: w32 Default]

Runs on Blue Gene/L at IBM T J Watson Research Center, CO mode

w32 = 32 water molecules with 70 Ry cutoff

Page 5

The problem lies in …


Performance Analysis and Visualization Tool: Projections (part of Charm++) – Timeline View

Page 6

Solution: Topology Aware Mapping

Page 7

Processor Virtualization

User View / System View

Programmer: decomposes the computation into objects

Runtime: maps the computation onto the processors


Page 8

Benefits of Charm++


Computation is divided into objects (chares), also called virtual processors (VPs)

Separates decomposition from mapping: VPs can be flexibly mapped to actual physical processors (PEs)
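To make the separation of decomposition and mapping concrete, here is a minimal Python sketch under hypothetical assumptions (16 VPs, 4 PEs). The decomposition stays fixed while the mapping function is swapped; the names `block_map`, `round_robin_map` and `place` are illustrative only and are not part of the Charm++ API.

```python
from collections import Counter
from typing import Callable, Dict

# A mapping function takes (vp, num_vps, num_pes) and returns a PE index.
MapFn = Callable[[int, int, int], int]


def block_map(vp: int, num_vps: int, num_pes: int) -> int:
    """Give each PE one contiguous chunk of VPs."""
    return vp * num_pes // num_vps


def round_robin_map(vp: int, num_vps: int, num_pes: int) -> int:
    """Deal VPs out to PEs cyclically."""
    return vp % num_pes


def place(num_vps: int, num_pes: int, mapper: MapFn) -> Dict[int, int]:
    """Build the VP -> PE table. The decomposition (num_vps) is fixed;
    only the mapper changes."""
    return {vp: mapper(vp, num_vps, num_pes) for vp in range(num_vps)}


if __name__ == "__main__":
    blocked = place(16, 4, block_map)
    cyclic = place(16, 4, round_robin_map)
    print(blocked[6], cyclic[6])  # VP 6 lands on different PEs
    # Both mappings put 4 VPs on every PE; only locality differs.
    print(Counter(blocked.values()) == Counter(cyclic.values()))
```

Because the application only ever talks to VPs, a topology-aware mapper can replace either function without touching the decomposition.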

Page 9

Topology Manager API†

The application needs information such as:
  Dimensions of the partition
  Rank to physical coordinates and vice versa

TopoManager: a uniform API
  On BG/L and BG/P: provides a wrapper for system calls
  On XT3/4/5: there are no such system calls
  Provides a clean and uniform interface to the application


† http://charm.cs.uiuc.edu/~bhatele/phd/topomgr.htm
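The rank-to-coordinate queries listed above can be illustrated with a small Python sketch. It is not the TopoManager API: the class `TorusTopology`, its method names, the X-fastest rank ordering and the one-rank-per-node layout are all hypothetical, chosen only to show the arithmetic such an interface hides from the application.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TorusTopology:
    """Hypothetical nx x ny x nz torus with one rank per node and ranks
    numbered X-fastest (an assumption; real machines may differ)."""
    nx: int
    ny: int
    nz: int

    def rank_to_coords(self, rank: int) -> Tuple[int, int, int]:
        x = rank % self.nx
        y = (rank // self.nx) % self.ny
        z = rank // (self.nx * self.ny)
        return x, y, z

    def coords_to_rank(self, x: int, y: int, z: int) -> int:
        return x + self.nx * (y + self.ny * z)

    @staticmethod
    def _wrap_dist(a: int, b: int, n: int) -> int:
        # Shortest distance along one torus dimension (links wrap around).
        d = abs(a - b)
        return min(d, n - d)

    def hops(self, r1: int, r2: int) -> int:
        """Minimal number of links between two ranks on the torus."""
        x1, y1, z1 = self.rank_to_coords(r1)
        x2, y2, z2 = self.rank_to_coords(r2)
        return (self._wrap_dist(x1, x2, self.nx)
                + self._wrap_dist(y1, y2, self.ny)
                + self._wrap_dist(z1, z2, self.nz))


if __name__ == "__main__":
    topo = TorusTopology(8, 8, 8)
    print(topo.rank_to_coords(73), topo.hops(0, 7))
```

With wrap-around links, ranks 0 and 7 in an 8-wide row are neighbors; on a mesh without wrap-around links the same pair would be 7 hops apart, which is why the application needs to know the partition shape.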

Page 10

Parallelization using Charm++


Eric Bohm, Glenn J. Martyna, Abhinav Bhatele, Sameer Kumar, Laxmikant V. Kale, John A. Gunnels, and Mark E. Tuckerman. Fine Grained Parallelization of the Car-Parrinello ab initio MD Method on Blue Gene/L. IBM J. of R. and D.: Applications of Massively Parallel Systems, 52(1/2):159-174, 2008.

Page 11

Mapping Challenge


Load balancing: multiple VPs per PE

Multiple groups of communicating objects:
  Intra-group communication
  Inter-group communication

Conflicting communication requirements

Page 12

Topology Mapping of Chare Arrays


RealSpace and GSpace have state-wise communication

Paircalculator and GSpace have plane-wise communication
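These two patterns pull a GSpace object toward different neighbors. A minimal Python sketch, assuming a hypothetical X × Y × Z mesh and a simplified 2D GSpace array indexed by (state, plane), shows one way to honor both at once: spread states over the X-Y face and planes along Z. The function `map_gspace` is hypothetical and illustrates the idea only; it is not the mapping scheme used in OpenAtom.

```python
from typing import Tuple


def map_gspace(state: int, plane: int, nstates: int, nplanes: int,
               dims: Tuple[int, int, int]) -> Tuple[int, int, int]:
    """Hypothetical placement of GSpace[state, plane] on an X x Y x Z mesh.

    - All planes of one state share an (x, y) column, so state-wise
      partners (e.g. RealSpace for that state) can sit in a compact region.
    - All states of one plane share a Z slab, so plane-wise partners
      (e.g. the pair calculators for that plane) stay within that slab.
    """
    nx, ny, nz = dims
    xy = state * (nx * ny) // nstates  # block states over the X-Y face
    z = plane * nz // nplanes          # block planes along Z
    return xy % nx, xy // nx, z


if __name__ == "__main__":
    # 32 states x 16 planes on a 4 x 4 x 8 mesh: 4 objects per processor.
    print(map_gspace(5, 3, 32, 16, (4, 4, 8)))
```

The catch, and the reason mapping is hard in practice, is that the other arrays (RealSpace, PairCalculator) then have to be placed to match, and their own load must still be balanced.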

Page 13

Performance Improvements on BG/L


[Plot: time per step (secs) vs. no. of cores (512 to 8192); series: w32 Default, w32 Topology]

Runs on Blue Gene/L at IBM T J Watson Research Center, CO mode, Year: 2006

w32 = 32 water molecules with 70 Ry cutoff

Page 14

Improved Timeline Views


Page 15

Results on Blue Gene/L


[Plot: time per step (secs) vs. no. of cores (1024 to 16384); series: w256 Default, w256 Topology, GST_BIG Default, GST_BIG Topology]

GST_BIG = 64 Ge, 128 Sb and 256 Te atoms with 20 Ry cutoff

Runs on Blue Gene/L at IBM T J Watson Research Center, CO mode

Page 16

Results on Blue Gene/P


[Plot: time per step (secs) vs. no. of cores (1024 to 8192); series: w256 Default, w256 Topology]

w256 = 256 water molecules with 70 Ry cutoff

Runs on Blue Gene/P at Argonne National Laboratory, VN mode

Page 17

Results on Cray XT3


[Plot: time per step (secs) vs. no. of cores (512 to 2048); series: w256 Default, w256 Topology, GST_BIG Default, GST_BIG Topology]

Runs on Cray XT3 (Bigben) at Pittsburgh Supercomputing Center, VN mode (with a system reservation to obtain complete 3D mesh shapes)

Page 18

Performance Analysis


[Plot: idle time (secs) vs. no. of cores (1024 to 8192); series: Default, Topology]

Performance Analysis and Visualization Tool: Projections – idle time summed across all processors

w256M_70Ry on Blue Gene/L

Page 19

Reduction in Communication Volume


[Plot: bandwidth (GB) vs. no. of cores (1024 to 8192); series: Default, Topology]

Data obtained from Blue Gene/P’s Universal Performance Counters

w256M_70Ry on Blue Gene/P
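Placement matters for network volume because every byte a message carries crosses one link per hop. A common way to estimate this load is the hop-bytes metric: the sum over all messages of message size × hops. The sketch below assumes, for brevity, a hypothetical 8-node ring and a hypothetical ring-shaped communication graph; `ring_hops` and `hop_bytes` are illustrative names, and the numbers are not derived from the counter data above.

```python
from typing import Dict, List, Tuple

# (source task, destination task, bytes): a hypothetical communication graph.
Message = Tuple[int, int, int]


def ring_hops(a: int, b: int, n: int) -> int:
    """Shortest distance between processors a and b on an n-node ring."""
    d = abs(a - b)
    return min(d, n - d)


def hop_bytes(messages: List[Message], placement: Dict[int, int], n: int) -> int:
    """Total bytes x links traversed, given a task -> processor placement."""
    return sum(size * ring_hops(placement[src], placement[dst], n)
               for src, dst, size in messages)


if __name__ == "__main__":
    msgs = [(i, (i + 1) % 8, 1000) for i in range(8)]  # each task talks to its successor
    aligned = {i: i for i in range(8)}                 # neighbors stay 1 hop apart
    scattered = {i: (3 * i) % 8 for i in range(8)}     # neighbors end up 3 hops apart
    print(hop_bytes(msgs, aligned, 8), hop_bytes(msgs, scattered, 8))
```

Both placements send the same 8000 bytes, but the scattered one puts three times as much traffic on the links, which is the kind of difference a per-link counter would expose.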

Page 20

Relative Performance Improvement


[Plot: % improvement vs. no. of cores (512 to 8192); series: Cray XT3, Blue Gene/L, Blue Gene/P]

w256M_70Ry

Page 21

Bigger picture


Different kinds of applications:
  Computation bound / communication bound
  Latency tolerant / latency sensitive

Technique:
  Obtain the processor topology and the application communication graph
  Heuristic techniques for mapping
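One simple heuristic in this family can be sketched in Python under stated assumptions: a hypothetical 2D mesh of processors, a symmetric weighted communication graph, and one task per processor. Tasks are visited breadth-first from the heaviest communicator, and each goes to the free processor that minimizes weighted hops to its already-placed neighbors. This is a generic greedy illustration with hypothetical names, not the specific heuristic developed in this work.

```python
from collections import deque
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
Graph = Dict[int, Dict[int, int]]  # graph[u][v] = bytes exchanged (assumed symmetric)


def mesh_hops(a: Coord, b: Coord) -> int:
    """Manhattan distance on a 2D mesh without wrap-around links."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def bfs_order(graph: Graph) -> List[int]:
    """Visit tasks breadth-first, starting each component at its heaviest
    task and expanding heavier edges first."""
    roots = sorted(graph, key=lambda t: -sum(graph[t].values()))
    seen, order = set(), []
    for root in roots:
        if root in seen:
            continue
        seen.add(root)
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in sorted(graph[u], key=lambda n: -graph[u][n]):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return order


def placement_cost(task: int, proc: Coord, graph: Graph,
                   placement: Dict[int, Coord]) -> int:
    """Weighted hops from proc to the already-placed neighbors of task."""
    return sum(w * mesh_hops(proc, placement[n])
               for n, w in graph[task].items() if n in placement)


def greedy_map(graph: Graph, width: int, height: int) -> Dict[int, Coord]:
    """Place each task on the cheapest free processor of a width x height mesh."""
    free = [(x, y) for y in range(height) for x in range(width)]
    if len(graph) > len(free):
        raise ValueError("more tasks than processors")
    placement: Dict[int, Coord] = {}
    for task in bfs_order(graph):
        best = min(free, key=lambda p: placement_cost(task, p, graph, placement))
        free.remove(best)
        placement[task] = best
    return placement


if __name__ == "__main__":
    ring = {0: {1: 1, 3: 1}, 1: {0: 1, 2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1, 0: 1}}
    print(greedy_map(ring, 2, 2))
```

Greedy choices are not optimal in general: for a 4-task chain 0-1-2-3 on a 4×1 line, this BFS order places task 1 at one end and leaves the 1-2 edge two hops long.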

Page 22

Why does distance affect message latencies?

Consider a 3D mesh/torus interconnect. Message latencies can be modeled by

$$T = \frac{L_f}{B} \times D + \frac{L}{B}$$

where $L_f$ = length of a flit, $B$ = bandwidth, $D$ = number of hops, $L$ = message size.

When $L_f \times D \ll L$, the first term is negligible.
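A small numeric sketch of this contention-free model shows when the distance term matters. The parameters are hypothetical (32-byte flits, links of 1 byte/ns, i.e. 1 GB/s, and 8 hops), chosen for illustration rather than measured on any of the machines above.

```python
def latency_ns(flit_bytes: float, bw_bytes_per_ns: float,
               hops: int, msg_bytes: float) -> float:
    """Contention-free model: (Lf / B) * D + L / B."""
    return (flit_bytes / bw_bytes_per_ns) * hops + msg_bytes / bw_bytes_per_ns


def distance_share(flit_bytes: float, bw_bytes_per_ns: float,
                   hops: int, msg_bytes: float) -> float:
    """Fraction of the modeled latency contributed by the hop term."""
    hop_term = (flit_bytes / bw_bytes_per_ns) * hops
    return hop_term / latency_ns(flit_bytes, bw_bytes_per_ns, hops, msg_bytes)


if __name__ == "__main__":
    # Hypothetical parameters: 32-byte flits, 1 byte/ns links, 8 hops.
    for size in (256, 65536):
        print(size, latency_ns(32, 1.0, 8, size), distance_share(32, 1.0, 8, size))
```

With these values the hop term is half the latency of a 256-byte message but under 0.4% for a 64 KB message, which is why the model alone suggests distance matters little for large messages.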


But in the presence of contention …

Page 23

Automatic Topology Aware Mapping


Many MPI applications exhibit a simple two-dimensional near-neighbor communication pattern

Examples: MILC, WRF, POP, Stencil, …
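For such 2D patterns, one well-known folding trick maps a px × py process grid onto an X × Y × Z mesh with px = X and py = Y × Z by snaking the second grid dimension through the Y-Z plane, so every (non-periodic) 2D neighbor is exactly one hop away. The sketch below is a hypothetical illustration (`fold_2d_to_3d` is not from any library) and not the automatic mapping framework referred to here; wrap-around neighbors of a periodic 2D grid are not guaranteed to stay one hop apart.

```python
from typing import Tuple

Coord3 = Tuple[int, int, int]


def fold_2d_to_3d(i: int, j: int, dims: Coord3) -> Coord3:
    """Map grid point (i, j) of an X x (Y*Z) grid onto an X x Y x Z mesh.

    Column index j runs along Y, then reverses direction on the next Z
    level (boustrophedon order), so consecutive j values are 1 hop apart.
    """
    nx, ny, nz = dims
    if not (0 <= i < nx and 0 <= j < ny * nz):
        raise ValueError("grid point outside an X x (Y*Z) grid")
    z, r = divmod(j, ny)
    y = r if z % 2 == 0 else ny - 1 - r
    return i, y, z


def mesh_hops(a: Coord3, b: Coord3) -> int:
    """Manhattan distance on a 3D mesh without wrap-around links."""
    return sum(abs(p - q) for p, q in zip(a, b))


if __name__ == "__main__":
    dims = (4, 3, 4)  # hypothetical 4 x 3 x 4 mesh hosting a 4 x 12 grid
    print(fold_2d_to_3d(0, 2, dims), fold_2d_to_3d(0, 3, dims))
```

At the fold (j = 2 to j = 3 above) the snake order keeps the same y and steps once in z, which is what keeps every grid neighbor adjacent.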

Page 24

Acknowledgements:
  Shawn Brown and Chad Vizino (PSC)
  Glenn Martyna, Sameer Kumar, Fred Mintzer (IBM)
  TeraGrid for running time on Bigben (XT3)
  ANL for running time on Blue Gene/P

Funding: DOE Grant B341494 (CSAR), DOE Grant DE-FG05-08OR23332 (ORNL LCF) and NSF Grant ITR 0121357

E-mail: [email protected]
http://charm.cs.illinois.edu