From the Heroic to the Logistical
Programming Model Implications of New Supercomputing Applications

Ian Foster
Computation Institute
Argonne National Lab & University of Chicago
May 10, 2015
What will we do with 1+ Exaflops and 1M+ cores?
Or, If You Prefer, A Worldwide Grid (or Cloud)
EGEE
1) Tackle Bigger and Bigger Problems
Computational Scientist as Hero
2) Tackle More Complex Problems
Computational Scientist as Logistics Officer
“More Complex Problems”
• Ensemble runs to quantify climate model uncertainty
• Identify potential drug targets by screening a database of ligand structures against target proteins
• Study economic model sensitivity to parameters
• Analyze turbulence dataset from many perspectives
• Perform numerical optimization to determine optimal resource assignment in energy problems
• Mine collection of data from advanced light sources
• Construct databases of computed properties of chemical compounds
• Analyze data from the Large Hadron Collider
• Analyze log data from 100,000-node parallel computations
Programming Model Issues
• Massive task parallelism
• Massive data parallelism
• Integrating black-box applications
• Complex task dependencies (task graphs)
• Failure, and other execution management issues
• Data management: input, intermediate, output
• Dynamic computations (task graphs)
• Dynamic data access to large, diverse datasets
• Long-running computations
• Documenting provenance of data products
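Many of these issues show up even in the simplest many-task campaigns. As a rough sketch, not drawn from the talk and using a hypothetical ./simulate executable and file names, the basic pattern of dispatching large numbers of independent black-box invocations with simple failure handling looks like this in plain Python:

```python
# Illustrative sketch only: many loosely coupled tasks, each an invocation of
# a black-box executable, with naive retry on failure. Real systems (Swift,
# Falkon, DAGMan) add data staging, provenance, and smarter fault handling.
import subprocess
from concurrent.futures import ProcessPoolExecutor, as_completed

def run_task(task_id, max_retries=3):
    """Run one black-box application instance; retry on non-zero exit."""
    for _ in range(max_retries):
        result = subprocess.run(
            ["./simulate", "--input", f"in_{task_id}.dat",
             "--output", f"out_{task_id}.dat"])
        if result.returncode == 0:
            return task_id
    raise RuntimeError(f"task {task_id} failed after {max_retries} attempts")

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=64) as pool:
        futures = [pool.submit(run_task, i) for i in range(100_000)]
        for f in as_completed(futures):
            f.result()  # re-raises if a task exhausted its retries
```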
Problem Types
[Figure: problem classes plotted by number of tasks (1 to 1M) against input data size (Lo to Hi)]
• Few tasks, little data: heroic MPI tasks
• Few tasks, much data: data analysis, mining
• Many tasks, little data: many loosely coupled tasks
• Many tasks, much data: much data and complex tasks
An Incomplete and Simplistic View of Programming Models and Tools
• Many tasks: DAGMan+Pegasus; Karajan+Swift
• Much data: MapReduce/Hadoop; Dryad
• Complex tasks, much data: Dryad, Pig, Sawzall; Swift+Falkon
• Single task, modest data: MPI, etc., etc., etc.
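To make the “much data” row concrete: the MapReduce model expresses data-parallel analysis as a map over records followed by a keyed reduction. A minimal, illustrative word count in that style (plain Python, not the Hadoop API):

```python
# Minimal MapReduce-style word count, for illustration only. Hadoop and Dryad
# supply the same programming model plus distributed storage, shuffling of
# intermediate pairs, and fault tolerance.
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Emit (word, 1) pairs for one input record."""
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    """Group pairs by key and sum the counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = ["the quick brown fox", "the lazy dog", "the fox"]
print(reduce_phase(chain.from_iterable(map_phase(d) for d in documents)))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```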
Many Tasks: Climate Ensemble Simulations (using FOAM, 2005)
• NCAR computer + grad student: 160 ensemble members in 75 days
• TeraGrid + “Virtual Data System”: 250 ensemble members in 4 days
Image courtesy Pat Behling and Yun Liu, UW Madison
Many, Many Tasks: Identifying Potential Drug Targets
2M+ ligands × protein target(s)
(Mike Kubal, Benoit Roux, and others)
[Figure: virtual screening workflow, from start to report]
• Inputs: PDB protein descriptions (1 protein, ~1MB); ZINC 3-D structures (2M ligands, ~6 GB); manually prepared DOCK6 and FRED receptor files (1 per protein, defining the pocket to bind to); NAB script template and parameters (defining flexible residues and # of MD steps)
• DOCK6 / FRED screening: ~4M tasks × 60 s × 1 CPU ≈ 60K CPU-hours; select best ~5K complexes
• Amber scoring (1. AmberizeLigand, 2. AmberizeReceptor, 3. AmberizeComplex, 4. perl: gen NAB script, 5. run NAB script): ~10K tasks × 20 min × 1 CPU ≈ 3K CPU-hours; select best ~500
• GCMC: ~500 tasks × 10 hr × 100 CPUs ≈ 500K CPU-hours
For 1 target: 4 million tasks, 500,000 CPU-hours (50 CPU-years)
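The overall shape of the campaign, cheap docking of millions of ligands, selection of the best candidates, then much more expensive rescoring of the survivors, is easy to express as a two-stage many-task pipeline. A hedged sketch in plain Python, not the actual Swift/Falkon scripts; dock() and amber_score() are placeholders for the real tools:

```python
# Sketch of the screen -> select -> rescore pattern used in virtual screening.
# dock() and amber_score() are placeholders; real code would invoke DOCK6/FRED
# and the Amber/NAB toolchain through the task execution system.
import random
from concurrent.futures import ProcessPoolExecutor

def dock(ligand, receptor):
    """Stage 1: fast docking of one ligand against the receptor pocket."""
    return ligand, random.random()          # placeholder score

def amber_score(ligand, receptor):
    """Stage 2: expensive MD-based rescoring of one surviving candidate."""
    return ligand, random.random()          # placeholder score

def screen(ligands, receptor, keep_dock=5000, keep_amber=500):
    with ProcessPoolExecutor() as pool:
        scored = list(pool.map(dock, ligands, [receptor] * len(ligands)))
        best = sorted(scored, key=lambda s: s[1])[:keep_dock]       # best ~5K
        survivors = [lig for lig, _ in best]
        rescored = list(pool.map(amber_score, survivors,
                                 [receptor] * len(survivors)))
    return sorted(rescored, key=lambda s: s[1])[:keep_amber]        # best ~500
```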
DOCK on SiCortex
• CPU cores: 5760
• Tasks: 92160
• Elapsed time: 12821 sec (does not include ~800 sec to stage input data)
• Compute time: 1.94 CPU-years
• Average task time: 660.3 sec
Ioan Raicu, Zhao Zhang
DOCK on BG/P: ~1M Tasks on 118,000 CPUs
• CPU cores: 118784
• Tasks: 934803
• Elapsed time: 7257 sec
• Compute time: 21.43 CPU-years
• Average task time: 667 sec
• Relative efficiency: 99.7% (from 16 to 32 racks)
• Utilization: 99.6% sustained, 78.3% overall
Per-task data access:
• GPFS: 1 script (~5KB), 2 file reads (~10KB), 1 file write (~10KB)
• RAM (cached from GPFS on first task per node): 1 binary (~7MB), static input data (~45MB)
[Figure: task activity vs. time (secs)]
Ioan Raicu, Zhao Zhang, Mike Wilde
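The per-node caching noted above, where the binary and static input data are pulled from GPFS by the first task on a node and then reused from RAM by every later task on that node, is what keeps 118K CPUs busy despite the slow shared filesystem. A hedged sketch of that idea, with made-up paths and no locking:

```python
# Sketch of per-node caching of static inputs (paths are hypothetical).
# The first task on a node stages the shared-filesystem data to node-local
# storage (here a RAM disk); subsequent tasks read the local copy.
import os
import shutil

SHARED = "/gpfs/app/static"       # slow, shared filesystem (e.g. GPFS)
LOCAL = "/dev/shm/app/static"     # fast, node-local storage (RAM disk)

def local_copy(filename):
    """Return a node-local path for filename, staging it in if needed."""
    src = os.path.join(SHARED, filename)
    dst = os.path.join(LOCAL, filename)
    if not os.path.exists(dst):
        os.makedirs(LOCAL, exist_ok=True)
        shutil.copy(src, dst)     # paid once per node, not once per task
    return dst

# Each task then opens local_copy("static_input.dat") instead of the GPFS path.
# A production version would also guard against two tasks staging concurrently.
```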
Managing 120K CPUs
[Figure: Falkon dispatch architecture, using high-speed local disk on the compute nodes in front of slower shared storage]
MARS Economic Model Parameter Study
• 2,048 BG/P CPU cores
• Tasks: 49,152
• Micro-tasks: 7,077,888
• Elapsed time: 1,601 secs
• CPU hours: 894
[Figure: idle CPUs, busy CPUs, wait queue length, and completed micro-tasks vs. time (sec)]
Zhao Zhang, Mike Wilde
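The ratio between the 49,152 dispatched tasks and the 7,077,888 micro-tasks (about 144 to 1) reflects bundling: many short model evaluations are packed into each dispatched task so that per-task dispatch overhead is amortized. A hedged sketch of that idea in plain Python (not Falkon itself):

```python
# Illustrative micro-task bundling: pack many cheap evaluations into each
# dispatched task so dispatch overhead is paid per bundle, not per evaluation.
from concurrent.futures import ProcessPoolExecutor

def evaluate(params):
    """One micro-task: a single cheap model evaluation (placeholder math)."""
    return sum(p * p for p in params)

def run_bundle(bundle):
    """One dispatched task: a whole batch of micro-tasks."""
    return [evaluate(p) for p in bundle]

def chunks(seq, size):
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

if __name__ == "__main__":
    parameter_sets = [(i, i + 1, i + 2) for i in range(10_000)]
    with ProcessPoolExecutor() as pool:
        results = [r for batch in pool.map(run_bundle,
                                           chunks(parameter_sets, 144))
                   for r in batch]
```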
AstroPortal Stacking Service
• Purpose: on-demand “stacks” of random locations within a ~10TB dataset (Sloan/SDSS data), delivered via a web page or web service
• Challenge: rapid access to 10-10K “random” files; time-varying load
Sample workloads:

Locality   Number of Objects   Number of Files
1          111700              111700
1.38       154345              111699
2          97999               49000
3          88857               29620
4          76575               19145
5          60590               12120
10         46480               4650
20         40460               2025
30         23695               790
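For context, a “stack” here co-adds many small image cutouts into one image, so each request touches many files chosen essentially at random across the dataset. A toy illustration, assuming a hypothetical file layout and synthetic pixel data rather than real SDSS images:

```python
# Toy illustration of a "stack": co-add small cutouts read from many files.
# read_cutout() returns synthetic data; the real service reads SDSS images.
import numpy as np

def read_cutout(path, x, y, size=32):
    """Read one size x size cutout centred at (x, y) from an image file."""
    rng = np.random.default_rng(abs(hash((path, x, y))) % (2**32))
    return rng.random((size, size))

def stack(locations, size=32):
    """Sum cutouts from many (file, x, y) locations into a single image."""
    total = np.zeros((size, size))
    for path, x, y in locations:
        total += read_cutout(path, x, y, size)
    return total

result = stack([(f"img_{i:05d}.fits", 100, 200) for i in range(500)])
```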
AstroPortal Stacking Service with Data Diffusion
• Aggregate throughput: 39 Gb/s, 10X higher than GPFS
• Reduced load on GPFS: 0.49 Gb/s, 1/10 of the original load
[Figure: aggregate throughput (Gb/s) vs. locality, for data diffusion (local, cache-to-cache, GPFS) and for GPFS alone (FIT and GZ workloads)]
Big performance gains as locality increases
[Figure: time (ms) per stack per CPU vs. locality, for data diffusion and GPFS (GZ and FIT workloads)]
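Data diffusion, roughly, lets compute nodes cache the files they have already fetched and routes later accesses to those caches (local or on a peer node) before falling back to shared storage; locality in the workload is what makes the caches pay off. A much-simplified sketch, ignoring the real system's index service, eviction policy, and data-aware scheduling:

```python
# Much-simplified sketch of data diffusion: try the local cache, then a
# peer's cache (cache-to-cache transfer), and only then shared storage.
local_cache = {}          # file name -> bytes cached on this node
peer_caches = [{}, {}]    # stand-ins for other nodes' caches

def read_shared(name):
    """Fetch from slow shared storage (e.g. GPFS); placeholder contents."""
    return b"contents of " + name.encode()

def fetch(name):
    if name in local_cache:                    # 1. local cache hit
        return local_cache[name]
    for peer in peer_caches:                   # 2. cache-to-cache transfer
        if name in peer:
            local_cache[name] = peer[name]
            return peer[name]
    data = read_shared(name)                   # 3. miss: go to shared storage
    local_cache[name] = data                   #    ...and diffuse it locally
    return data
```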
Ioan Raicu, 11:15am TOMORROW
Montage Benchmark
(Yong Zhao, Ioan Raicu, U.Chicago; B. Berriman, J. Good, Caltech; J. Jacob, D. Katz, JPL)
• MPI: ~950 lines of C for one stage
• Pegasus: ~1200 lines of C + tools to generate DAG for specific dataset
• SwiftScript: ~92 lines for any dataset
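Part of why the scripted version stays short is that it is written against logical datasets rather than enumerated files: the same parallel loop applies to whatever inputs are discovered at run time. A rough Python analogue of that style (not SwiftScript; the stage function is a placeholder for the corresponding Montage tool):

```python
# Rough analogue of dataset-independent scripting: discover inputs at run
# time and apply the same parallel stage to any dataset, large or small.
import glob
from concurrent.futures import ProcessPoolExecutor

def project(image):
    """Reproject one input image (placeholder; real code would invoke the
    corresponding Montage tool here)."""
    return image.replace(".fits", "_proj.fits")

if __name__ == "__main__":
    images = glob.glob("input/*.fits")        # works for any dataset size
    with ProcessPoolExecutor() as pool:
        projected = list(pool.map(project, images))
    # ...further stages (background matching, co-addition) would follow
```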
Summary
• Peta- and exa-scale computers enable us to tackle new problems at greater scales
  - Parameter studies, ensembles, interactive data analysis, “workflows” of various kinds
• Such apps frequently stress petascale hardware and software in interesting ways
• New programming models and tools required
  - Mixed task/data parallelism, task management, complex data management, failure, …
  - Tools (DAGMan, Swift, Hadoop, …) exist but need refinement
• Interesting connections to distributed systems
More info: www.ci.uchicago.edu/swift