Jan 17, 2016

Transcript
Page 1

Supporting Efficient Execution in Heterogeneous Distributed Computing Environments with Cactus and Globus

Gabrielle Allen, Thomas Dramlitsch, Ian Foster, Nick Karonis, Matei Ripeanu, Ed Seidel, Brian Toonen

Proceedings of Supercomputing 2001 (Winning Paper for Gordon Bell Prize - Special Category)

Presenter

Imran Patel

Page 2

Outline

Introduction

Computational Grids

Grid-enabled Cactus Toolkit

Experimental Results

Ghostzones and Compression

Adaptive Strategies

Conclusion

Page 3

Introduction

Widespread use of numerical simulation techniques has led to a high demand for traditional high-performance computing resources (supercomputers).

Low-end computers are becoming increasingly powerful and are connected with high-speed networks.

“Computational Grids” aim to tie these scattered resources into an integrated infrastructure.

Applications for the grid include large-scale simulations, which demand large amounts of resources for increased throughput.

Page 4

Introduction: The Problem

Heterogeneous and dynamically behaving resources make the development of grid-enabled applications extremely difficult.

One approach is to develop computational frameworks which hide this complexity from the programmer.

Cactus-G: a simulation framework that combines grid-aware components with a grid-enabled message-passing library (MPICH-G2).

Page 5

Computational Grids

Computational Grids differ from other parallel computing environments:

A grid may have nodes with different processor speeds, memory sizes, etc.

Grids may have widely different network interconnects and topologies.

Resource availability varies in a grid.

Nodes in a grid may have different software configurations.

Page 6

Computational Grids – Programming Techniques

To overcome these problems, some generic techniques have been devised:

Irregular Data Distributions: use application/network/node information.

Grid-aware Communication schedules: overlapping/grouping, dedicated nodes

Redundant Computation: at the expense of reduced communication.

Protocol Tuning: TCP tweaks, compression
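As a concrete illustration of the protocol-tuning idea, the sketch below enlarges a TCP socket's send/receive buffers so a high-latency WAN path can keep more data in flight (buffer on the order of bandwidth times round-trip delay). The 4 MB figure is an illustrative placeholder, not a value from the paper.

```python
import socket

def tune_tcp_socket(sock, buf_bytes=4 * 1024 * 1024):
    # Larger socket buffers let a high-latency WAN path keep more data
    # in flight. The 4 MB default is an illustrative placeholder.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buf_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buf_bytes)
    # The kernel may clamp (or, on Linux, double) the requested size,
    # so report what was actually granted.
    return (sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
snd_buf, rcv_buf = tune_tcp_socket(sock)
sock.close()
```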

Page 7

Cactus-G

Cactus is a modular and parallel simulation environment used by scientists and engineers in the fields of numerical relativity, astrophysics, climate modeling, etc.

Cactus design consists of a core (flesh) which connects to application modules (thorns).

Various thorns exist for services such as Globus Toolkit, PETSc library, visualization, etc.

Cactus is highly portable and parallel since it uses abstraction APIs which themselves are implemented as thorns.

MPICH-G2 exploits Globus services and provides faster communication and QoS.

Page 8

Cactus-G: Architecture

Application thorns need not be grid-aware.

An example of a grid-aware Cactus thorn is PUGH, which provides MPI-based parallelism.

The DUROC library handles process management.

Page 9

Experimental Results: Setup

A Fortran application for solving numerical relativity problems.

57 3-d variables, 780 flops per gridpoint per iteration.

N x N x 6 x ghostzone_size x 8 variables need to be synchronized at each processor.

Total 1500 CPUs organized in a 5 x 12 x 25 3-d mesh
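The setup numbers above reduce to two small helpers. The slide's formula multiplies N x N x 6 (faces) x ghostzone_size by 8; whether that trailing 8 counts variables or bytes per double-precision value is ambiguous from the slide, so the sketch keeps it as a parameter (my reading, not the paper's).

```python
def sync_volume(N, ghostzone_size, factor=8):
    # Slide formula: N * N * 6 (faces) * ghostzone_size * 8 values must be
    # synchronized per processor. The meaning of the trailing 8 (number of
    # variables vs. bytes per value) is left as a parameter.
    return N * N * 6 * ghostzone_size * factor

def flops_per_iteration(gridpoints, flops_per_point=780):
    # 780 flops per gridpoint per iteration, per the slide.
    return gridpoints * flops_per_point

# Illustrative numbers for a 100^3 block per processor with 1-deep ghostzones:
volume = sync_volume(100, 1)
work = flops_per_iteration(100 ** 3)
```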

Page 10

Experimental Results: Setup

4 supercomputers at SDSC and NCSA:

1024-CPU IBM Power-SP (306 MFlop/s).

One 256-CPU and two 128-CPU SGI Origin2000 systems (168 MFlop/s).

Intra-machine: 200 MB/s. Inter-machine: 100 MB/s. SDSC<->NCSA: 3 MB/s on a 622 Mb/s link.

Page 11

Communication Optimizations

Communication/Computation Overlap: Processors across WANs were given fewer grid points so that they could overlap their communication with computation.

Compression: A Cactus thorn that exploits the regularity of the data, compressing messages with the libz library.

Ghostzones: Larger ghostzones were used for efficient communication at the expense of redundant computations.
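The compression optimization above can be sketched with Python's zlib module (the same library as libz): smooth, regular field data, a stand-in for discretized physics variables, compresses far better than incompressible noise.

```python
import os
import struct
import zlib

# A smooth 1-D "field" (values of a parabola) packed as doubles -- a
# stand-in for the regular simulation data the compression thorn sees.
smooth = b"".join(struct.pack("<d", (i / 1000.0) ** 2) for i in range(1000))
# Incompressible reference data of the same size.
noisy = os.urandom(len(smooth))

smooth_ratio = len(zlib.compress(smooth)) / len(smooth)
noisy_ratio = len(zlib.compress(noisy)) / len(noisy)
# Regular data shrinks; random data does not (zlib even adds overhead).
```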

Page 12

Performance Metrics

Flop/s rate and efficiency are used as metrics.

Total execution time (t_tot): measured using MPI_Wtime().

Expected computation time (t_comp): ideal time extrapolated from a single node.

Flop count F: calculated using hardware counters; 780 flops per gridpoint per iteration.

Flop/s rate = F * num_gridpts * num_iterations / t_tot

Efficiency E = t_comp / t_tot
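The two metrics reduce to one-liners (function and argument names are mine):

```python
def flop_rate(num_gridpts, num_iterations, t_tot, F=780):
    # Flop/s rate = F * num_gridpts * num_iterations / t_tot,
    # with F = 780 flops per gridpoint per iteration.
    return F * num_gridpts * num_iterations / t_tot

def efficiency(t_comp, t_tot):
    # E = t_comp / t_tot: ideal (single-node-extrapolated) computation
    # time over measured total execution time.
    return t_comp / t_tot
```

For example, a run whose ideal computation time is 63.3 s out of 100 s measured has E = 0.633, matching the best 4-supercomputer figure on the next slide.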

Page 13

Performance Figures

4 supercomputers: 42 GFlop/s, 14% efficiency.

Compression + 10 ghostzones: 249 GFlop/s, 63.3% efficiency.

Smaller run on 120 + 1020 = 1140 processors: 292 GFlop/s, 88% efficiency.

Page 14

Ghostzones

Increasing the ghostzone size can reduce latency overhead by transferring fewer messages with the same amount of total data.

Increasing the ghostzone size beyond a certain point does not give any benefits and wastes memory.
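A toy cost model (mine, not the paper's) captures both effects: a ghostzone of depth g lets neighbors exchange once every g iterations, amortizing per-message latency as latency / g, while each extra layer adds redundant computation, so deepening eventually stops paying off. All constants below are illustrative.

```python
def per_iteration_cost(g, latency, transfer, redundant_compute):
    # latency / g: per-message latency amortized over g iterations.
    # redundant_compute * (g - 1): extra work on the g - 1 extra layers.
    return latency / g + transfer + redundant_compute * (g - 1)

def best_ghostzone(latency, transfer=0.01, redundant_compute=0.001, max_g=20):
    costs = {g: per_iteration_cost(g, latency, transfer, redundant_compute)
             for g in range(1, max_g + 1)}
    return min(costs, key=costs.get)

# A high-latency WAN link favors deep ghostzones; a low-latency LAN does not.
wan_g = best_ghostzone(latency=0.1)      # seconds per message, illustrative
lan_g = best_ghostzone(latency=0.0001)
```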

Page 15

Compression

For increased throughput across WANs.

Compression was found to be highly useful since the data is regular/smooth.

Since the smoothness of the data changes over time, the effectiveness of compression can also change, so adaptive compression is needed.

Page 16

Adaptive Strategies - Compression

Ideally, optimal values of the ghostzone/compression parameters would be predicted.

We don't want to rely on detailed network characteristics.

For example: change the compression state based on efficiency, averaged over N iterations.
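The rule sketched above might look like the following; the 0.5 threshold and the direction of the decision (compress when efficiency is low, i.e. the run is communication-bound) are my assumptions, not values from the paper.

```python
def should_compress(efficiency_history, N, threshold=0.5):
    # Average efficiency over the last N iterations. Persistently low
    # efficiency suggests the run is communication-bound, where the CPU
    # cost of compression pays off. The threshold is an assumed knob.
    window = efficiency_history[-N:]
    return sum(window) / len(window) < threshold

run = [0.9] * 10 + [0.2] * 5   # efficiency collapses mid-run
compress_now = should_compress(run, N=5)
```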

Page 17

Adaptive Ghostzone Sizes

Adapting the ghostzone size is challenging: many ghostzone sizes are possible, and changes require memory re-allocations, cause ripple effects, and need extra data from neighbors.

Start with a size of 1 and increase/decrease it in accordance with the measured efficiency.
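The increase/decrease rule can be sketched as simple hill climbing; the reversal heuristic and the size bound are my assumptions about how such a controller would work, not the paper's exact algorithm.

```python
def next_ghostzone(size, direction, prev_eff, curr_eff, max_size=10):
    # Keep stepping the ghostzone size in the same direction while
    # efficiency improves; reverse when it drops. Sizes are clamped to
    # [1, max_size] to bound memory re-allocation.
    if curr_eff < prev_eff:
        direction = -direction
    size = min(max(size + direction, 1), max_size)
    return size, direction

# Start at size 1 growing upward; the last change improved efficiency,
# so the controller grows the ghostzone again.
size, direction = next_ghostzone(1, +1, prev_eff=0.50, curr_eff=0.60)
```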

Page 18

Adaptive Ghostzone Sizes

Page 19

Further Information

The Cactus Framework and Toolkit: Design and Applications. Tom Goodale et al. Vector and Parallel Processing (VECPAR 2002).

Grid Aware Parallelizing Algorithms. Thomas Dramlitsch, Gabrielle Allen, Edward Seidel. Journal of Parallel and Distributed Computing.