Cactus in GrADS
Dave Angulo, Ian Foster
Matei Ripeanu, Michael Russell
Distributed Systems Laboratory
The University of Chicago
With: Gabrielle Allen, Thomas Dramlitsch, Ed Seidel, John Shalf, Thomas Radke
Distributed Systems Lab ARGONNE CHICAGO
Presentation Outline
Cactus Overview
– Architecture
– Applications
Cactus and Grid computing
– Metacomputing, Worms, …
Proposed Cactus-GrADS project
– The “Cactus-G worm”
– Tequila thorn and architecture
– Issues
What is Cactus?
Cactus is a freely available, modular, portable and manageable environment for collaboratively developing parallel, high-performance, multidimensional simulations
– Originally developed for astrophysics, but nothing about it is astrophysics-specific
Cactus Applications
Example output from Numerical Relativity Simulations
Cactus Architecture
Codes are constructed by linking a small core (flesh) with selected modules (thorns)
– Custom linking/configuration tools
Core provides basic management services
A wide variety of thorns are supported
– Numerical methods
– Grids and domain decompositions
– Visualization and steering
– Etc.
Cactus Architecture
[Architecture diagram: Configure/CST/Make assemble Thorns (the Computational Toolkit and other toolkits) around the Cactus Flesh, which runs atop many operating systems: AIX, NT, Linux, Unicos, Solaris, HP-UX, SuperUX, Irix, OSF]
Cactus Applications
A Cactus “application” is just another thorn, “linked” with other tool thorns
Numerous astrophysics applications
– E.g., calculate Schwarzschild event horizons for colliding black holes
Potential candidates for GrADS work
– Elliptical Solver, BenchADM
– Both use a 3-D grid abstract topology
Cactus Model (cont.)
Building an executable
[Diagram: Cactus source = Flesh + thorns (IOBasic, IOASCII, WaveToy, LDAP, Worm, …), built against a Configuration that specifies:]
• Compiler options
• Tool options
• MPI options
• HDF5 options
Running Cactus
Parameter File
• Specify which thorns to activate
• Specify global parameters
• Specify restricted parameters
• Specify private parameters
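A parameter file for the standard WaveToy example might look roughly like this; the exact thorn and parameter names are illustrative here, not a verbatim copy of a shipped example:

```
# Which thorns to activate
ActiveThorns = "PUGH WaveToyC IOBasic IOASCII"

cactus::cctk_itlast  = 100    # global parameter: last iteration
driver::global_nsize = 30     # restricted parameter, set via the driver
wavetoy::amplitude   = 1.0    # private parameter of the WaveToy thorn
```

Parameters are namespaced by the thorn (or implementation) that declares them, which is what makes the global/restricted/private distinction enforceable.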
Parallelism in Cactus
Distributed memory model: each thorn is passed a section of the global grid
The parallel driver (implemented in a thorn) can use whatever method it likes to decompose the grid across processors and exchange ghost zone information; each thorn is presented with a standard interface, independent of the driver
Standard driver distributed with Cactus (PUGH) is for a parallel unigrid and uses MPI for the communication layer
PUGH can do custom processor decomposition and static load balancing
AMR driver also provided
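To make the ghost-zone idea concrete, here is a minimal single-process sketch in Python; it is not the real PUGH/MPI code, and all function names are invented. It decomposes a 1-D grid and refreshes one ghost point per interior boundary:

```python
# Sketch of a PUGH-like 1-D decomposition with one ghost zone per side.

def decompose(global_grid, nprocs, nghost=1):
    """Split the global grid into per-process sections plus ghost zones."""
    n = len(global_grid)
    size = n // nprocs
    sections = []
    for p in range(nprocs):
        lo, hi = p * size, (p + 1) * size
        sections.append({
            "own": list(global_grid[lo:hi]),
            # Ghost zones are empty at the physical boundaries.
            "left_ghost": list(global_grid[max(lo - nghost, 0):lo]),
            "right_ghost": list(global_grid[hi:hi + nghost]),
        })
    return sections

def exchange_ghosts(sections):
    """Refresh ghost zones from neighbours (stand-in for MPI send/recv)."""
    for p, sec in enumerate(sections):
        if p > 0:
            sec["left_ghost"] = sections[p - 1]["own"][-1:]
        if p < len(sections) - 1:
            sec["right_ghost"] = sections[p + 1]["own"][:1]
    return sections

grid = list(range(8))                         # a toy 8-point global grid
parts = exchange_ghosts(decompose(grid, nprocs=4))
print(parts[1]["left_ghost"], parts[1]["own"], parts[1]["right_ghost"])
# → [1] [2, 3] [4]
```

The point of the standard interface is that a thorn only ever sees its own section plus filled ghost zones; whether they were filled by PUGH, an AMR driver, or something else is invisible to it.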
Cactus and Grid Computing: General Observations
Reasons to work with Cactus
– Rich structure, computationally intensive, numerous opportunities for Grid computing
– Talented and motivated developer/user community
Issues
– At core, relatively simple structure
– Cactus system is relatively complex
– User community is relatively small
Cactus-G: Possible Opportunities
“Metacomputing”: use heterogeneous systems as source of low-cost cycles
– Departmental pool or multi-site system
Dynamic resource selection, e.g.
– “Cheapest” resources to achieve interactivity
– “Fastest” resource for best turnaround
– “Best” resolution to meet turnaround goal
– Spawn independent tasks: e.g., analysis
– Migration to “better” resource for all above
Cactus-G: Common Building Blocks
Resource selection based on resource and application characterizations
Implementation and management of distributed output
(De)centralized logging, accounting for resource usage, parameter selection, etc.
Fault discovery, recovery, tolerance
Code/executable management and creation
Next-generation Cactus that increases flexibility with respect to parameter selection
Proposed Cactus-G Challenge Problem: Cactus-G Worm
Migrate to “faster/cheaper/bigger” system
– When system identified by resource discovery
– When resource requirements change
Why?
– Tests much of the machinery required for Cactus-G (source code mgmt, discovery, …)
– Places substantial demands on GrADS
– Good potential to show real benefit
– Migration approach simplifies infrastructure demands (MPI-2 support not required)
Cactus-G Worm: Basic Architecture and Operation
[Architecture diagram: the Cactus “flesh”, carrying the “Tequila” thorn and the application & other thorns, interacts with compute resources, code repositories, storage resources, the Grid Information Service, the GrADS Resource Selector, and the Application Manager; the Resource Selector queries the Information Service, which stores models, etc.]
(0) Possible user input
(1) Adaptation request / (1’) Resource notification
(2) Resource request
(3) Write checkpoint
(4) Migration request
(5) Cactus startup
(6) Load code
(7) Read checkpoint
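The numbered steps can be read as a simple control flow. The sketch below is hypothetical: the class and method names are invented for illustration and do not reflect the real GrADS or Cactus interfaces:

```python
# Illustrative sketch of worm steps (2)-(7); not the real GrADS API.

class ResourceSelector:
    def select(self, requirements):
        # Step (2): the real selector would query the Grid Information
        # Service; here we return a canned "bag" of resources.
        return {"host": "new-cluster.example.org", "cpus": requirements["cpus"]}

class ApplicationManager:
    def migrate(self, checkpoint, resources):
        # Steps (4)-(7): start Cactus on the new resources, load the
        # code from a repository, and restart from the checkpoint.
        return {"running_on": resources["host"], "restored": checkpoint}

def tequila_adapt(selector, manager, state):
    """Driven by the Tequila thorn after an adaptation request (1)."""
    resources = selector.select({"cpus": 16})        # (2) resource request
    checkpoint = {"iteration": state["iteration"]}   # (3) write checkpoint
    return manager.migrate(checkpoint, resources)    # (4)-(7) migrate

run = tequila_adapt(ResourceSelector(), ApplicationManager(),
                    {"iteration": 420})
print(run["running_on"])
# → new-cluster.example.org
```

Routing the restart through the Application Manager, rather than having Tequila restart Cactus directly, is what gives the security and robustness advantages mentioned on the next slide.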
Tequila Thorn Functions
Initiates adaptation on application request or on notification of new resources
– Can include user input (e.g., HTTP thorn)
Requests resources from external entity
– GIS or ResourceSelector
Checkpoints application
Contacts Application Manager to request restart on new resources
– AppManager has security, robustness advantages vs. direct restart
Cactus-G Worm: Approach
1) Uniproc Tequila thorn that speaks to GIS, adapts periodically [done: Cactus group]
2) Tequila thorn that speaks to UCSD Resource Selector [current focus]
3) Integrate accurate performance models
4) Support multiprocessor execution
5) Detailed evaluation
6) Add adaptation triggers: e.g., contract violation, new regime, user input
Tequila Thorn + ResourceSelector
ResourceSelector must be set up as a service
Tequila thorn sends request for new bag of resources
ResourceSelector responds with the new bag
Current Status
Tequila thorn prototype developed that speaks to ResourceSelector
Dummy ResourceSelector that returns a static bag of resources
Demonstrated Cactus+Tequila operating
Performance model developed
Expected by May: multiprocessor support, ResourceSelector interface, real performance model
Open Issues
Should we move more management logic into Application Manager?
How does Contract Monitor fit into architecture?
How does PPS fit into the architecture?
How do COP and the Application Launcher fit into the architecture (Cactus has its own launcher and compiles its own code)?
How does Pablo fit into the architecture (which thorns are monitored, is the flesh monitored)?
The End
Request and Response
The Request to the ResourceSelector will be stored in the InformationService
Only the pointer to the data in the IS will be passed to the ResourceSelector
The Response from the ResourceSelector will also be stored in the IS
Only the pointer to the data in the IS will be passed back.
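A minimal sketch of this pointer-passing convention, with a Python dictionary standing in for the Information Service; all names and the key format are invented for illustration:

```python
# Sketch: pass only pointers (keys) through the Information Service,
# never the AART data itself. Not the real GrADS/IS interface.

information_service = {}   # stand-in for the Grid Information Service

def store(data):
    """Store a record in the IS and return a pointer (key) to it."""
    key = f"aart-{len(information_service)}"
    information_service[key] = data
    return key

def resource_selector(request_ptr):
    """Read the request from the IS, write the response back, return a pointer."""
    request = information_service[request_ptr]          # retrieve AART
    bag = {"resources": ["hostA", "hostB"], "for": request["app"]}
    return store(bag)                                   # response stays in the IS

request_ptr = store({"app": "BenchADM", "cpus": 32})    # Tequila stores the request
response_ptr = resource_selector(request_ptr)           # only pointers cross the wire
print(information_service[response_ptr]["resources"])
# → ['hostA', 'hostB']
```

The payloads persist in the IS after the exchange, which is what supports the persistence requirement discussed on the Requirements slide.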
Tequila communication overview
[Diagram: the Tequila thorn inside Cactus and the ResourceSelector communicate via the Information Service]
Cactus Architecture in GrADS
[Architecture diagram: as on the earlier “Cactus Architecture” slide, Configure/CST/Make assemble Thorns (the Computational Toolkit and other toolkits) around the Flesh, atop AIX, NT, Linux, Unicos, Solaris, HP-UX, SuperUX, Irix, and OSF; a GrADS communication library is added as a further toolkit]
Communication Details step 1
Event sent to Tequila thorn requesting restart
Communication Details step 2
Tequila stores AART in the IS
Communication Details step 3
Tequila sends request to ResourceSelector passing pointer to data in IS
Communication Details step 4
ResourceSelector retrieves AART from IS
Communication Details step 5
ResourceSelector stores bag of resources (in AART) in IS
Communication Details step 6
ResourceSelector responds to Tequila passing pointer to data in IS
Communication Details step 7
Tequila retrieves AART with new bag of resources from IS
Requirements
Using the IS for communication adds overhead.
Why do this?
GrADS requirement 1: do some things (e.g., compile) at one time and have the results stored in a persistent storage area; pick these stored results up later and complete other phases.
Sample Tequila Scenario
User asks to run an ADM simulation 400x400x400 for 1000 timesteps in 10s.
Resource selector contacted to obtain virtual machines
Best virtual machine selected based on performance model
AM starts Cactus on that virtual machine (and monitors execution; Contracts?)
User (or application manager) decides that the computation advances too slowly and decides to search for a better virtual machine
AM finds a better machine, commands the Cactus run to checkpoint, transfers files, and restarts Cactus