
Many Task Applications for Grids and Supercomputers


Ian Foster

Invited talk at CLADE 2008 in Boston: http://www.cs.okstate.edu/clade2008/.
Transcript
Page 1: Many Task Applications for Grids and Supercomputers

Ian Foster

Computation Institute

Argonne National Lab & University of Chicago

From the Heroic to the Logistical

Programming Model Implications of New Supercomputing Applications

Page 2: Many Task Applications for Grids and Supercomputers

What will we do with 1+ Exaflops and 1M+ cores?

Page 3: Many Task Applications for Grids and Supercomputers

Or, If You Prefer, A Worldwide Grid (or Cloud)

EGEE

Page 4: Many Task Applications for Grids and Supercomputers

1) Tackle Bigger and Bigger Problems

Computational Scientist as Hero

Page 5: Many Task Applications for Grids and Supercomputers

2) Tackle More Complex Problems

Computational Scientist as Logistics Officer

Page 6: Many Task Applications for Grids and Supercomputers

“More Complex Problems”

- Ensemble runs to quantify climate model uncertainty
- Identify potential drug targets by screening a database of ligand structures against target proteins
- Study economic model sensitivity to parameters
- Analyze turbulence dataset from many perspectives
- Perform numerical optimization to determine optimal resource assignment in energy problems
- Mine collection of data from advanced light sources
- Construct databases of computed properties of chemical compounds
- Analyze data from the Large Hadron Collider
- Analyze log data from 100,000-node parallel computations

Page 7: Many Task Applications for Grids and Supercomputers

Programming Model Issues

- Massive task parallelism
- Massive data parallelism
- Integrating black-box applications
- Complex task dependencies (task graphs)
- Failure and other execution management issues
- Data management: input, intermediate, output
- Dynamic computations (task graphs)
- Dynamic data access to large, diverse datasets
- Long-running computations
- Documenting provenance of data products

Page 8: Many Task Applications for Grids and Supercomputers

Problem Types

[Chart: problem types classified by number of tasks (1 to 1K to 1M, horizontal axis) and input data size (Lo/Med/Hi, vertical axis): heroic MPI tasks (few tasks); data analysis and mining (much data, few tasks); many loosely coupled tasks (many tasks, modest data); much data and complex tasks (many tasks, much data).]

Page 9: Many Task Applications for Grids and Supercomputers

An Incomplete and Simplistic View of Programming Models and Tools

- Many Tasks: DAGMan+Pegasus, Karajan+Swift
- Much Data: MapReduce/Hadoop, Dryad
- Complex Tasks, Much Data: Dryad, Pig, Sawzall, Swift+Falkon
- Single task, modest data: MPI, etc., etc., etc.
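
To make the “Many Tasks” row concrete, here is a minimal Python sketch, assuming nothing about any tool above, of the pattern these systems automate at far larger scale: dispatch a bag of independent black-box application runs, track failures, and retry.

```python
# Minimal many-task sketch (an illustration, not the implementation of
# DAGMan, Swift, or Falkon): run independent black-box application
# invocations in parallel and retry any that fail.
from concurrent.futures import ProcessPoolExecutor, as_completed
import subprocess

def run_task(cmd):
    # Each task wraps one black-box application run.
    return subprocess.run(cmd, capture_output=True, check=True)

def run_many(commands, retries=2):
    failed = []
    with ProcessPoolExecutor() as pool:
        futures = {pool.submit(run_task, c): c for c in commands}
        for fut in as_completed(futures):
            try:
                fut.result()
            except subprocess.CalledProcessError:
                failed.append(futures[fut])  # execution management: record failure
    if failed and retries > 0:
        run_many(failed, retries - 1)        # simple retry policy
```

Real many-task systems replace the local process pool with dispatch to thousands of nodes and add data staging and provenance tracking on top of this loop.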

Page 10: Many Task Applications for Grids and Supercomputers

Many Tasks: Climate Ensemble Simulations (using FOAM, 2005)

- NCAR computer + grad student: 160 ensemble members in 75 days
- TeraGrid + “Virtual Data System”: 250 ensemble members in 4 days

Image courtesy Pat Behling and Yun Liu, UW Madison

Page 11: Many Task Applications for Grids and Supercomputers

Many, Many Tasks: Identifying Potential Drug Targets

2M+ ligands × protein target(s)

(Mike Kubal, Benoit Roux, and others)

Page 12: Many Task Applications for Grids and Supercomputers

[Workflow diagram: virtual screening pipeline for one protein target, start to end]

Inputs: PDB protein descriptions (1 protein, ~1 MB); ZINC 3-D structures (2M ligands, ~6 GB); manually prepared DOCK6 and FRED receptor files (1 per protein, each defining the pocket to bind to); NAB script template and parameters (defining flexible residues and number of MD steps).

- Stage 1, DOCK6 + FRED docking of ligands against the receptor: ~4M tasks × 60 s × 1 CPU ≈ 60K CPU-hours; select the best ~5K ligands from each tool.
- Stage 2, Amber prep and scoring of the resulting complexes (1. AmberizeLigand, 2. AmberizeReceptor, 3. AmberizeComplex, 4. perl: generate NAB script via BuildNABScript, 5. RunNABScript): ~10K tasks × 20 min × 1 CPU ≈ 3K CPU-hours; select the best ~500.
- Stage 3, GCMC: ~500 runs × 10 hr × 100 CPUs ≈ 500K CPU-hours.

For 1 target: ~4 million tasks, ~500,000 CPU-hours (50 CPU-years).
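
Independent of any particular engine, the pipeline above is naturally expressed as a three-stage dataflow. The Python sketch below is hypothetical: dock, amber_score, and gcmc are placeholder callables standing in for the real DOCK6/FRED, Amber, and GCMC invocations.

```python
# Hypothetical dataflow sketch of the screening pipeline; dock,
# amber_score, and gcmc stand in for the real codes described above.
def select_best(scored, n):
    # Keep the n best-scoring ligands (assuming lower score is better).
    return [lig for lig, score in sorted(scored, key=lambda r: r[1])[:n]]

def screen_target(protein, ligands, dock, amber_score, gcmc):
    # Stage 1: one docking task per ligand (~4M tasks across both tools).
    best_5k = select_best([(l, dock(protein, l)) for l in ligands], 5000)
    # Stage 2: ~10K Amber scoring tasks (~20 min each).
    best_500 = select_best([(l, amber_score(protein, l)) for l in best_5k], 500)
    # Stage 3: ~500 GCMC runs (~10 hr on 100 CPUs each).
    return [(l, gcmc(protein, l)) for l in best_500]
```

Each list comprehension is an independent bag of tasks, which is what makes the pipeline such a natural fit for many-task execution engines.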

Page 13: Many Task Applications for Grids and Supercomputers

DOCK on SiCortex

- CPU cores: 5,760
- Tasks: 92,160
- Elapsed time: 12,821 s
- Compute time: 1.94 CPU-years
- Average task time: 660.3 s (does not include ~800 s to stage input data)

Ioan Raicu, Zhao Zhang

Page 14: Many Task Applications for Grids and Supercomputers

DOCK on BG/P: ~1M Tasks on 118,000 CPUs

- CPU cores: 118,784
- Tasks: 934,803
- Elapsed time: 7,257 s
- Compute time: 21.43 CPU-years
- Average task time: 667 s
- Relative efficiency: 99.7% (from 16 to 32 racks)
- Utilization: 99.6% sustained, 78.3% overall

Per-task I/O via GPFS: 1 script (~5 KB), 2 file reads (~10 KB), 1 file write (~10 KB). Cached in RAM from GPFS on the first task per node: 1 binary (~7 MB), static input data (~45 MB).

Ioan Raicu, Zhao Zhang, Mike Wilde
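
The reported figures are internally consistent. Assuming overall utilization means total task compute time divided by available core-time over the run, a quick check reproduces the slide's number:

```python
# Consistency check of the BG/P run above (assuming overall utilization
# = total task compute time / (cores x elapsed wall-clock time)).
cores, elapsed_s = 118_784, 7_257
compute_core_s = 21.43 * 365 * 24 * 3600       # 21.43 CPU-years in core-seconds
utilization = compute_core_s / (cores * elapsed_s)
print(f"overall utilization = {utilization:.1%}")  # ~78.4%, vs. 78.3% reported
```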

Page 15: Many Task Applications for Grids and Supercomputers

Managing 120K CPUs

[Architecture figure: Falkon dispatching tasks across the machine, staging data between slower shared storage and high-speed local disk.]
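
A minimal sketch of the caching idea in that picture, assumed for illustration rather than taken from Falkon's code: the first task to need a shared input on a node copies it from slow shared storage to fast local disk, and later tasks on that node read the local copy, mirroring the "cached from GPFS on first task per node" behavior noted earlier.

```python
# First-task-per-node caching sketch (illustrative; LOCAL_CACHE is a
# hypothetical node-local path, not a real Falkon setting).
import os
import shutil

LOCAL_CACHE = "/tmp/app_cache"

def localize(shared_path):
    """Return a node-local copy of a file held on slow shared storage."""
    local_path = os.path.join(LOCAL_CACHE, os.path.basename(shared_path))
    if not os.path.exists(local_path):        # only the first task copies
        os.makedirs(LOCAL_CACHE, exist_ok=True)
        shutil.copy(shared_path, local_path)  # the one read of shared storage
    return local_path                         # later tasks read local disk
```

A production system would also coordinate concurrent copies and evict cold entries; the point is that one shared-storage read can serve every task on the node.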

Page 16: Many Task Applications for Grids and Supercomputers

MARS Economic Model Parameter Study

- 2,048 BG/P CPU cores
- Tasks: 49,152
- Micro-tasks: 7,077,888
- Elapsed time: 1,601 secs
- CPU-hours: 894

[Plot: busy and idle CPU cores (0-2,000, left axis), wait queue length, and completed micro-tasks (0-8,000,000, right axis) over elapsed time (0-1,440 secs).]

Zhao Zhang, Mike Wilde

Page 17: Many Task Applications for Grids and Supercomputers

Page 18: Many Task Applications for Grids and Supercomputers

AstroPortal Stacking Service

Purpose: on-demand “stacks” of image cutouts at random locations within a ~10 TB dataset, delivered via a web page or web service.

Challenge: rapid access to 10-10K “random” files under time-varying load.

[Figure: several Sloan image cutouts summed into a single stacked image.]

Sample workloads (Sloan data):

Locality | Number of Objects | Number of Files
1        | 111,700           | 111,700
1.38     | 154,345           | 111,699
2        | 97,999            | 49,000
3        | 88,857            | 29,620
4        | 76,575            | 19,145
5        | 60,590            | 12,120
10       | 46,480            | 4,650
20       | 40,460            | 2,025
30       | 23,695            | 790
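
Reading the table, the locality column appears to be the ratio of objects requested to distinct files touched; that reading is my inference from the numbers rather than stated on the slide, and it checks out:

```python
# Check that locality ~ objects / files for the workload table above.
objects = [111_700, 154_345, 97_999, 88_857, 76_575, 60_590, 46_480, 40_460, 23_695]
files   = [111_700, 111_699, 49_000, 29_620, 19_145, 12_120,  4_650,  2_025,    790]
print([round(o / f, 2) for o, f in zip(objects, files)])
# -> [1.0, 1.38, 2.0, 3.0, 4.0, 5.0, 10.0, 19.98, 29.99]
```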

Page 19: Many Task Applications for Grids and Supercomputers

AstroPortal Stacking Service with Data Diffusion

- Aggregate throughput: 39 Gb/s, 10X higher than GPFS alone
- Reduced load on GPFS: 0.49 Gb/s, 1/10 of the original load
- Big performance gains as locality increases

[Plot: aggregate throughput (Gb/s) vs. locality (1 to 30), with series for data diffusion throughput (local, cache-to-cache, GPFS) and raw GPFS throughput (FIT, GZ).]

[Plot: time (ms) per stack per CPU vs. locality (1 to 30, plus ideal), with series for data diffusion (GZ, FIT) and GPFS (GZ, FIT).]

Ioan Raicu, 11:15am TOMORROW

Page 20: Many Task Applications for Grids and Supercomputers

B. Berriman, J. Good (Caltech); J. Jacob, D. Katz (JPL)

Page 21: Many Task Applications for Grids and Supercomputers

Montage Benchmark (Yong Zhao, Ioan Raicu, U. Chicago)

- MPI: ~950 lines of C for one stage
- Pegasus: ~1,200 lines of C + tools to generate a DAG for a specific dataset
- SwiftScript: ~92 lines for any dataset

Page 22: Many Task Applications for Grids and Supercomputers

Summary

- Peta- and exa-scale computers enable us to tackle new problems at greater scales: parameter studies, ensembles, interactive data analysis, “workflows” of various kinds
- Such apps frequently stress petascale hardware and software in interesting ways
- New programming models and tools are required: mixed task/data parallelism, task management, complex data management, failure handling, ...
- Tools (DAGMan, Swift, Hadoop, ...) exist but need refinement
- Interesting connections to distributed systems

More info: www.ci.uchicago.edu/swift

Page 23: Many Task Applications for Grids and Supercomputers