A CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS

Abhinav Bhatele, Eric Bohm, Laxmikant V. Kale
Parallel Programming Laboratory
University of Illinois at Urbana-Champaign

Euro-Par 2009

Outline

- Motivation
- Solution: Mapping of OpenAtom
- Performance Benefits
- Bigger Picture:
  - Resources Needed
  - Heuristic Solutions
  - Automatic Mapping

August 27th, 2009

Abhinav Bhatele @ Euro-Par 2009

OpenAtom

- Ab initio molecular dynamics code
- Considers electrostatic interactions between the nuclei and electrons
- Calculates different energy terms
- Divided into different phases with a lot of communication

OpenAtom on Blue Gene/L

[Figure: Time per step (secs) vs. No. of cores (512 to 8192); series: w32 Default]

Runs on Blue Gene/L at IBM T J Watson Research Center, CO mode

w32 = 32 water molecules with 70 Ry cutoff

The problem lies in …

Performance Analysis and Visualization Tool: Projections (part of Charm++) – Timeline View

Solution: Topology Aware Mapping

Processor Virtualization

[Figure: User View vs. System View]

Programmer: Decomposes the computation into objects

Runtime: Maps the computation onto the processors

Benefits of Charm++

- Computation is divided into objects/chares/virtual processors (VPs)
- Separates decomposition from mapping
- VPs can be flexibly mapped to actual physical processors (PEs)
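The split between decomposition and mapping can be illustrated with a small Python sketch. This is not Charm++ code: `place`, `round_robin` and `blocked` are hypothetical names, and a real runtime can also migrate objects for load balancing. The number of VPs is fixed by the decomposition; only the pluggable map function decides where each VP runs.

```python
from typing import Callable, Dict, List

# (vp, num_vps, num_pes) -> pe; hypothetical signature for a pluggable mapping
MapFn = Callable[[int, int, int], int]

def round_robin(vp: int, num_vps: int, num_pes: int) -> int:
    """Deal VPs out to PEs cyclically."""
    return vp % num_pes

def blocked(vp: int, num_vps: int, num_pes: int) -> int:
    """Give each PE a contiguous block of VPs of near-equal size."""
    return vp * num_pes // num_vps

def place(num_vps: int, num_pes: int, map_fn: MapFn) -> Dict[int, List[int]]:
    """Return PE -> list of VPs. The decomposition (num_vps) stays fixed;
    only map_fn changes where each VP lands."""
    placement: Dict[int, List[int]] = {pe: [] for pe in range(num_pes)}
    for vp in range(num_vps):
        placement[map_fn(vp, num_vps, num_pes)].append(vp)
    return placement

if __name__ == "__main__":
    print(place(8, 3, round_robin))
    print(place(8, 3, blocked))
```

Swapping `round_robin` for `blocked` changes the placement of the same eight VPs without touching the code that created them, which is the property a topology-aware mapping relies on.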

Topology Manager API†

The application needs information such as:
- Dimensions of the partition
- Rank to physical coordinates and vice versa

TopoManager: a uniform API
- On BG/L and BG/P: provides a wrapper for system calls
- On XT3/4/5: there are no such system calls
- Provides a clean and uniform interface to the application

† http://charm.cs.uiuc.edu/~bhatele/phd/topomgr.htm
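The queries listed above come down to converting between ranks and torus coordinates and measuring distance. The sketch below is a hypothetical Python stand-in, not the TopoManager API (whose method names are not given on this slide). It assumes one rank per node and X-fastest rank ordering, which real machines may not use.

```python
from typing import Tuple

Coord = Tuple[int, int, int]

class TorusTopology:
    """Hypothetical stand-in for a topology manager on a 3D torus or mesh.

    Assumes one rank per node and X-fastest rank ordering.
    """

    def __init__(self, nx: int, ny: int, nz: int, torus: bool = True) -> None:
        self.dims = (nx, ny, nz)
        self.torus = torus  # False models a mesh (no wraparound links)

    def rank_to_coords(self, rank: int) -> Coord:
        nx, ny, _ = self.dims
        return rank % nx, (rank // nx) % ny, rank // (nx * ny)

    def coords_to_rank(self, x: int, y: int, z: int) -> int:
        nx, ny, _ = self.dims
        return x + nx * (y + ny * z)

    def hops(self, a: int, b: int) -> int:
        """Manhattan distance, taking wraparound links if the network is a torus."""
        total = 0
        for ca, cb, n in zip(self.rank_to_coords(a), self.rank_to_coords(b), self.dims):
            d = abs(ca - cb)
            total += min(d, n - d) if self.torus else d
        return total

if __name__ == "__main__":
    t = TorusTopology(8, 8, 8)
    print(t.rank_to_coords(511), t.hops(0, 7))
```

On an 8 x 8 x 8 torus, ranks 0 and 7 sit at opposite ends of the X dimension but are one hop apart through the wraparound link; on a mesh of the same shape they are seven hops apart.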

Parallelization using Charm++

Eric Bohm, Glenn J. Martyna, Abhinav Bhatele, Sameer Kumar, Laxmikant V. Kale, John A. Gunnels, and Mark E. Tuckerman. Fine Grained Parallelization of the Car-Parrinello ab initio MD Method on Blue Gene/L. IBM J. of R. and D.: Applications of Massively Parallel Systems, 52(1/2):159-174, 2008.

Mapping Challenge

- Load balancing: multiple VPs per PE
- Multiple groups of communicating objects:
  - Intra-group communication
  - Inter-group communication
- Conflicting communication requirements

Topology Mapping of Chare Arrays

RealSpace and GSpace have state-wise communication; PairCalculator and GSpace have plane-wise communication.
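A toy model of these conflicting requirements: assume GSpace is a 2D grid of objects indexed by (state, plane), so a state-wise group is one row and a plane-wise group is one column. The hypothetical helpers below tile the grid onto PEs in equal blocks and count how many PEs each kind of group spans. This is an illustration under those assumptions, not OpenAtom's actual mapping.

```python
from itertools import product
from typing import Dict, Tuple

Obj = Tuple[int, int]  # (state, plane) index of a GSpace-like object (assumed layout)

def block_map(n_states: int, n_planes: int, bs: int, bp: int) -> Dict[Obj, int]:
    """Tile the states x planes grid with bs x bp blocks, one block per PE.

    Every PE gets bs * bp objects, so load is the same for any block shape.
    Assumes bs divides n_states and bp divides n_planes.
    """
    blocks_per_row = n_planes // bp
    return {(s, p): (s // bs) * blocks_per_row + p // bp
            for s, p in product(range(n_states), range(n_planes))}

def group_spans(mapping: Dict[Obj, int], n_states: int, n_planes: int) -> Tuple[float, float]:
    """Average number of distinct PEs spanned by a state-wise group (fixed state)
    and by a plane-wise group (fixed plane)."""
    state = sum(len({mapping[s, p] for p in range(n_planes)}) for s in range(n_states)) / n_states
    plane = sum(len({mapping[s, p] for s in range(n_states)}) for p in range(n_planes)) / n_planes
    return state, plane

if __name__ == "__main__":
    for bs, bp in ((1, 16), (4, 4), (16, 1)):
        print((bs, bp), group_spans(block_map(16, 16, bs, bp), 16, 16))
```

With 16 objects per PE in every case, shrinking the span of one kind of group grows the span of the other; a real mapping must also place the PEs of each group close together on the torus.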

Performance Improvements on BG/L

[Figure: Time per step (secs) vs. No. of cores (512 to 8192); series: w32 Default, w32 Topology]

Runs on Blue Gene/L at IBM T J Watson Research Center, CO mode, Year: 2006

w32 = 32 water molecules with 70 Ry cutoff

Improved Timeline Views

Results on Blue Gene/L

[Figure: Time per step (secs) vs. No. of cores (1024 to 16384); series: w256 Default, w256 Topology, GST_BIG Default, GST_BIG Topology]

GST_BIG = 64 Ge, 128 Sb and 256 Te molecules with 20 Ry cutoff

Runs on Blue Gene/L at IBM T J Watson Research Center, CO mode

Results on Blue Gene/P

[Figure: Time per step (secs) vs. No. of cores (1024 to 8192); series: w256 Default, w256 Topology]

w256 = 256 water molecules with 70 Ry cutoff

Runs on Blue Gene/P at Argonne National Laboratory, VN mode

Results on Cray XT3

[Figure: Time per step (secs) vs. No. of cores (512 to 2048); series: w256 Default, w256 Topology, GST_BIG Default, GST_BIG Topology]

Runs on Cray XT3 (Bigben) at Pittsburgh Supercomputing Center, VN mode (with system reservation to obtain complete 3D mesh shapes)

Performance Analysis

[Figure: Idle time (secs) vs. No. of cores (1024 to 8192); series: Default, Topology]

Performance Analysis and Visualization Tool: Projections. Idle time summed across all processors.

w256M_70Ry on Blue Gene/L

Reduction in Communication Volume

[Figure: Bandwidth (GB) vs. No. of cores (1024 to 8192); series: Default, Topology]

Data obtained from Blue Gene/P's Universal Performance Counters

w256M_70Ry on Blue Gene/P

Relative Performance Improvement

[Figure: % Improvement vs. No. of cores (512 to 8192); series: Cray XT3, Blue Gene/L, Blue Gene/P]

w256M_70Ry

Bigger picture

Different kinds of applications:
- Computation bound
- Communication bound
  - Latency tolerant
  - Latency sensitive

Technique:
- Obtain the processor topology and the application communication graph
- Apply heuristic techniques for mapping
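One way to turn this technique into code is a greedy heuristic. The sketch assumes the communication graph is given as symmetric per-edge byte counts and uses hop-bytes (bytes multiplied by hops, summed over edges) as the cost. Both the metric and the BFS-order greedy placement are illustrative assumptions, not the specific heuristics from this talk.

```python
from collections import deque
from typing import Dict, Tuple

Coord = Tuple[int, int, int]
Graph = Dict[int, Dict[int, int]]  # object -> {neighbor: bytes exchanged}, assumed symmetric

def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Hop count on a 3D torus: shortest way around each ring."""
    return sum(min(abs(p - q), n - abs(p - q)) for p, q, n in zip(a, b, dims))

def hop_bytes(graph: Graph, place: Dict[int, Coord], dims: Coord) -> int:
    """Total bytes x hops, counting each undirected edge once."""
    return sum(w * torus_hops(place[u], place[v], dims)
               for u, nbrs in graph.items() for v, w in nbrs.items() if u < v)

def greedy_map(graph: Graph, dims: Coord) -> Dict[int, Coord]:
    """Visit objects in BFS order and put each on the free node that minimizes
    hop-bytes to its already-placed neighbors (ties go to the first free node
    in X-fastest order)."""
    nx, ny, nz = dims
    free = [(x, y, z) for z in range(nz) for y in range(ny) for x in range(nx)]
    if len(graph) > len(free):
        raise ValueError("more objects than nodes")
    place: Dict[int, Coord] = {}
    seen = set()
    for start in sorted(graph):  # loop handles disconnected components
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            placed = [(place[v], w) for v, w in graph[u].items() if v in place]
            best = min(free, key=lambda c: sum(w * torus_hops(c, pc, dims) for pc, w in placed))
            free.remove(best)
            place[u] = best
            for v in sorted(graph[u]):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return place

if __name__ == "__main__":
    chain = {0: {1: 10}, 1: {0: 10, 2: 10}, 2: {1: 10, 3: 10}, 3: {2: 10}}
    mapping = greedy_map(chain, (4, 1, 1))
    print(mapping, hop_bytes(chain, mapping, (4, 1, 1)))
```

For a four-object chain on a 4-node ring, the greedy placement puts consecutive objects on adjacent nodes (30 hop-bytes), whereas an arbitrary placement such as 0, 2, 1, 3 costs 50. The scan over all free nodes is quadratic, which is acceptable for a sketch but not for large machines.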

Why does distance affect message latencies?

Consider a 3D mesh/torus interconnect

Message latencies can be modeled by

$$\frac{L_f}{B} \times D + \frac{L}{B}$$

where $L_f$ = length of a flit, $B$ = bandwidth, $D$ = number of hops, and $L$ = message size.

When $L_f \times D \ll L$, the first term is negligible.

But in the presence of contention …
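A worked version of the latency model above, with made-up parameters (32-byte flits, links of 100 bytes per microsecond, 8 hops) chosen only to show the trend. Like the model, it ignores contention entirely, which is exactly the case the slide warns about.

```python
def modeled_latency(flit_len: float, bandwidth: float, hops: int, msg_len: float) -> float:
    """(Lf / B) * D + L / B, the contention-free model from the slide.

    Units are up to the caller; here lengths are in bytes and bandwidth is
    in bytes per microsecond, so the result is in microseconds.
    """
    return (flit_len / bandwidth) * hops + msg_len / bandwidth

def distance_share(flit_len: float, bandwidth: float, hops: int, msg_len: float) -> float:
    """Fraction of the modeled latency contributed by the hop-dependent first term."""
    return (flit_len / bandwidth) * hops / modeled_latency(flit_len, bandwidth, hops, msg_len)

if __name__ == "__main__":
    # Hypothetical parameters: 32-byte flits, 100 bytes/us links, 8 hops.
    for msg_len in (64, 65536):
        t = modeled_latency(32, 100, 8, msg_len)
        share = distance_share(32, 100, 8, msg_len)
        print(f"L={msg_len:>6} B: {t:8.2f} us, distance share {share:.1%}")
```

With these numbers, an 8-hop, 64-byte message spends 80% of its modeled time on distance, while a 64 KiB message spends well under 1%, matching the condition $L_f \times D \ll L$.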

Automatic Topology Aware Mapping

Many MPI applications exhibit a simple two-dimensional near-neighbor communication pattern.
Examples: MILC, WRF, POP, Stencil, …
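For a two-dimensional near-neighbor pattern, a small illustration of why rank placement matters. The sketch compares an assumed default placement (row-major MPI ranks laid out X-fastest on the mesh) with a hypothetical "fold" that keeps one grid dimension along X and snakes the other through the Y-Z plane. It assumes a P_X x P_Y process grid with P_X = N_X and P_Y = N_Y * N_Z, and a mesh without wraparound links.

```python
from typing import Callable, Tuple

Coord = Tuple[int, int, int]
MapFn = Callable[[int, int, Coord, int], Coord]  # (i, j, mesh dims, P_Y) -> mesh coords

def default_map(i: int, j: int, dims: Coord, py: int) -> Coord:
    """Assumed default: row-major rank i * py + j placed X-fastest on the mesh."""
    nx, ny, _ = dims
    r = i * py + j
    return r % nx, (r // nx) % ny, r // (nx * ny)

def folded_map(i: int, j: int, dims: Coord, py: int) -> Coord:
    """Hypothetical fold: keep i along X; snake j through the Y-Z plane so that
    j and j + 1 are always one hop apart (py is unused, kept for a uniform signature)."""
    _, ny, _ = dims
    z, y = divmod(j, ny)
    if z % 2 == 1:
        y = ny - 1 - y  # reverse direction on odd Z layers
    return i, y, z

def mesh_hops(a: Coord, b: Coord) -> int:
    """Hop count on a mesh (no wraparound links)."""
    return sum(abs(p - q) for p, q in zip(a, b))

def avg_neighbor_hops(map_fn: MapFn, px: int, py: int, dims: Coord) -> float:
    """Average mesh hops over all near-neighbor pairs of a non-periodic 2D grid."""
    total = pairs = 0
    for i in range(px):
        for j in range(py):
            for di, dj in ((1, 0), (0, 1)):
                ni, nj = i + di, j + dj
                if ni < px and nj < py:
                    total += mesh_hops(map_fn(i, j, dims, py), map_fn(ni, nj, dims, py))
                    pairs += 1
    return total / pairs

if __name__ == "__main__":
    dims = (4, 4, 4)
    for fn in (default_map, folded_map):
        print(fn.__name__, avg_neighbor_hops(fn, 4, 16, dims))
```

For a 4 x 16 grid on a 4 x 4 x 4 mesh, every neighbor pair in the fold is one hop apart, versus 144/108 (about 1.33) hops for the assumed default. The real gap depends heavily on each machine's default rank order.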

Acknowledgements:
- Shawn Brown and Chad Vizino (PSC)
- Glenn Martyna, Sameer Kumar, Fred Mintzer (IBM)
- TeraGrid for running time on Bigben (XT3)
- ANL for running time on Blue Gene/P

Funding: DOE Grant B341494 (CSAR), DOE Grant DE-FG05-08OR23332 (ORNL LCF) and NSF Grant ITR 0121357

E-mail: [email protected]
http://charm.cs.illinois.edu