Page 1: Issues in Advanced Computing: A US Perspective

Robert Rosner
Enrico Fermi Institute and Departments of Astronomy & Astrophysics and Physics, The University of Chicago and Argonne National Laboratory

Astrofisica computazionale in Italia: modelli e metodi di visualizzazione
Bologna, Italy, July 5, 2002

Page 2:

An outline of what I will discuss

• Defining “advanced computing”: “advanced” vs. “high-performance”
• Overview of scientific computing in the US today: where, with what, who pays, …? What has been the “roadmap”? The challenge from Japan
• What are the challenges? Technical and sociological
• What is one to do?
  • Hardware: What does $600M ($2M/$20M/$60M) per year buy you?
  • Software: What does $4.0M/year for 5 years buy you?
• Conclusions

Page 3:

Advanced vs. high-performance computing

• “Advanced computing” encompasses the frontiers of computer use:
  • Massive archiving/databases
  • High-performance networks and high data-transfer rates
  • Advanced data analysis and visualization techniques/hardware
  • Forefront high-performance computing (= peta/teraflop computing)
• “High-performance computing” is a tiny subset, and encompasses the frontiers of:
  • Computing speed (“wall clock time”)
  • Application memory footprint

Page 4:

Ingredients of US advanced computing today

• Major program areas:
  • Networking: TeraGrid, I-WIRE, …
  • Grid computing: Globus, GridFTP, …
  • Scalable numerical tools: DOE/ASCI and SciDAC, NSF CS
  • Advanced visualization: software, computing hardware, displays
  • Computing hardware: tera/petaflop initiatives
• The major advanced computing science initiatives:
  • Data-intensive science (incl. “data mining”): virtual observatories, digital sky surveys, bioinformatics, LHC science, …
  • Complex-systems science: multi-physics/multi-scale numerical simulations
  • Code verification and validation

Page 5:

Example: Grid Science

Page 6:

Specific Example: Sloan Digital Sky Survey Analysis

Image courtesy SDSS

Page 7:

Specific Example: Sloan Digital Sky Survey Analysis

Size distribution of galaxy clusters?

[Figure: galaxy cluster size distribution; log-log histogram of Number of Clusters (1 to 100,000) vs. Number of Galaxies (1 to 100), computed with the Chimera Virtual Data System + iVDGL Data Grid (many CPUs).]

Example courtesy I. Foster (UChicago/Argonne)
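The analysis behind the plot, counting how many clusters contain a given number of galaxies, reduces to a simple histogram. A minimal sketch in Python (the input list here is hypothetical toy data; the actual SDSS analysis ran as grid jobs on many CPUs):

```python
from collections import Counter

def size_distribution(cluster_sizes):
    """Count how many clusters contain each number of galaxies.

    cluster_sizes: one entry per cluster, giving its galaxy count.
    Returns a dict mapping galaxy count -> number of clusters.
    """
    return dict(Counter(cluster_sizes))

# Hypothetical toy input: five clusters with 3, 3, 7, 3, and 7 galaxies.
dist = size_distribution([3, 3, 7, 3, 7])
print(dist)  # {3: 3, 7: 2}
```

Plotted on log-log axes over the full survey, this distribution is the figure shown on the slide.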

Page 8:

Specific Example: Toward Petaflop Computing

Proposed DOE Distributed National Computational Sciences Facility

[Diagram: ANL, NERSC/LBNL, and CCS/ORNL linked by a multiple-10 GbE, fault-tolerant terabit backplane (the NCSF backplane); anchor facilities host petascale systems, satellite facilities host terascale systems.]

Example courtesy R. Stevens (UChicago/Argonne)

Page 9:

Specific Example: NSF-funded 13.6 TF Linux TeraGrid

[Network diagram: the four TeraGrid sites, connected at > 10 Gb/s over OC-48/OC-12/OC-3 links (Juniper M40/M160 routers; peerings to ESnet, HSCC, MREN/Abilene, Starlight, vBNS, Calren, NTON), with Myrinet Clos spines, GbE, and FibreChannel/HPSS or UniTree storage within sites. Each site adds clusters of quad-processor McKinley servers (64p or 128p @ 4 GF, 8-12 GB memory/server):
• NCSA: 500 nodes, 8 TF, 4 TB memory, 240 TB disk (plus 1500p Origin, 1024p IA-32, 320p IA-64, UniTree)
• SDSC: 256 nodes, 4.1 TF, 2 TB memory, 225 TB disk (plus 1176p IBM SP Blue Horizon, Sun E10K/Starcat, HPSS)
• Caltech: 32 nodes, 0.5 TF, 0.4 TB memory, 86 TB disk (plus 256p HP X-Class, 128p HP V2500, 92p IA-32, HPSS)
• Argonne: 64 nodes, 1 TF, 0.25 TB memory, 25 TB disk (plus 574p IA-32 Chiba City, 128p Origin, HR display & VR facilities)]

Cost: ~$53M, FY01-03

Page 10:

Re-thinking the role of computing in science

• Computer science (= informatics) research is typically carried out as a traditional academic-style research operation:
  • Mix of basic research (applied math, CS, …) and applications (PETSc, MPICH, Globus, …)
  • Traditional “outreach” meant providing packaged software to others
• New intrusiveness/ubiquity of computing brings opportunities:
  • E.g., integrate computational science into the natural sciences
  • Computational science as the fourth component of astrophysical science: observations, theory, experiment, computational science
• The key step: to motivate and drive informatics developments by the applications discipline

Page 11:

What are the challenges? The hardware …

• Staying along the Moore’s Law trajectory
• Reliability/redundancy/“soft” failure modes: the ASCI Blue Mountain experience …
• Improving “efficiency”:
  • “efficiency” = actual performance/peak performance
  • Typical numbers on tuned codes for US machines: ~5-15% (!!)
  • Critical issue: memory speed vs. processor speed
  • US vs. Japan: do we examine hardware architecture?
• Network speed/capacity
• Storage speed/capacity
• Visualization:
  • Display technology
  • Computing technology (rendering, ray tracing, …)

Page 12:

What are the challenges? The software …

• Programming models: MPI vs. OpenMP vs. …
• Language interoperability (F77, F90/95, HPF, C, C++, Java, …); glue languages: scripts, Python, …
• Algorithms:
  • Scalability
  • Reconciling time/spatial scalings (example: radiation hydrodynamics)
  • Data organization/databases
  • Data analysis/visualization
• Coding and code architecture:
  • Code complexity (debugging, optimization, code repositories, access control, V&V)
  • Code reuse and code modularity
  • Load balancing
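One facet of the load-balancing problem above, splitting N cells of work evenly across P processors, has a standard block-decomposition answer. A minimal sketch (illustrative only; production AMR codes use far more elaborate dynamic schemes):

```python
def block_decompose(n_cells, n_procs):
    """Assign n_cells of work to n_procs ranks as evenly as possible.

    Returns one (start, stop) half-open index range per rank;
    rank workloads differ by at most one cell.
    """
    base, extra = divmod(n_cells, n_procs)
    ranges, start = [], 0
    for rank in range(n_procs):
        # The first `extra` ranks each take one additional cell.
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

# 10 cells over 4 ranks: workloads of 3, 3, 2, 2 cells.
print(block_decompose(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Static decompositions like this break down when per-cell cost varies in space and time, which is exactly why load balancing remains a listed challenge.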

Page 13:

What are the challenges? The sociology …

How do we get astronomers, applied mathematicians, computer scientists, … to talk to one another productively?

• Overcoming cultural gap(s): language, research style, …
• Overcoming history
• Overcoming territoriality: who’s in charge?
  • Computer scientists doing astrophysics?
  • Astrophysicists doing computer science?
• Initiation: top-down or bottom-up?
  • Anecdotal evidence is that neither works well, if at all
• Possible solutions include:
  • Promote acculturation (mix): theory institutes and centers
  • Encourage collaboration: institutional incentives/seed funds
  • Lead by example: construct “win-win” projects, change “other” to “us”
    • ASCI/Alliance centers at Caltech, Chicago, Illinois, Stanford, Utah

Page 14:

The Japanese example: focus

[Diagram: science targets of the Earth Simulator]

• Atmospheric and oceanographic science:
  • High-resolution global models: predictions of global warming, etc.
  • High-resolution regional models: predictions of El Niño events and Asian monsoons, etc.
  • High-resolution local models: predictions of weather disasters (typhoons, localized torrential downpours, downbursts, etc.)
• Solid earth science, describing the entire solid earth as a system:
  • Global dynamic model: simulation of earthquake generation processes, seismic wave tomography
  • Regional model: description of crust/mantle activity in the Japanese Archipelago region
• Other HPC applications: biology, energy science, space physics, etc.

Information courtesy: Keiji Tani, Earth Simulator Research and Development Center, Japan Atomic Energy Research Institute

Page 15:

Using the science to define requirements

Requirements for the Earth Simulator: necessary CPU capabilities for atmospheric circulation models:

  Horizontal mesh | Present     | Earth Simulator | CPU ops ratio
  Global model    | 50-100 km   | 5-10 km         | ~100
  Regional model  | 20-30 km    | 1 km            | few 100s
  Layers          | several 10s | 100-200         | few 10s
  Time mesh       | 1           | 1/10            | 10

Necessary memory footprint for a 10 km mesh, assuming 150-300 words per grid point:
4000 × 2000 × 200 × (150-300) × 2 × 8 bytes = 3.84-7.68 TB

The CPU must be at least 20 times faster than present computers for atmospheric circulation models; memory comparable to NERSC Seaborg.

• Effective performance, NERSC Glenn Seaborg: ~0.05 × 5 Tops ~ 0.25 Tops
• Effective performance of the Earth Simulator: > 5 Tops
• Main memory of the Earth Simulator: > 8 TB
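The memory estimate on this slide is a straightforward multiplication, redone here explicitly (following the slide's formula: 8-byte words, a factor of 2 for working copies, and 1 TB taken as 10¹² bytes, which is what reproduces the quoted figures):

```python
def footprint_tb(nx, ny, nz, words_per_point, copies=2, bytes_per_word=8):
    """Memory footprint in TB (10**12 bytes) for a gridded model."""
    return nx * ny * nz * words_per_point * copies * bytes_per_word / 1e12

# 10 km-mesh atmospheric model: 4000 x 2000 x 200 grid, 150-300 words/point.
print(footprint_tb(4000, 2000, 200, 150))  # 3.84 TB
print(footprint_tb(4000, 2000, 200, 300))  # 7.68 TB
```

Both endpoints land within the Earth Simulator's > 8 TB of main memory, which is the point of the requirements exercise.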

Page 16:

What is the result, ~$600M later?

Page 17:

What is the result, ~$600M later?

Architecture: MIMD-type, distributed-memory parallel system, consisting of computing nodes with tightly coupled vector-type multi-processors that share main memory

Performance: peak performance ~ 40 TFLOPS; assuming an efficiency of ~12.5%, the effective performance for the atmospheric circulation model is > 5 TFLOPS (recently, well over 30% efficiency has been achieved [!!])

                                    Earth Simulator          Seaborg
  Total number of processor nodes:  640                      208
  Number of PEs per node:           8                        16
  Total number of PEs:              5120                     3328
  Peak performance of each PE:      8 Gops                   1.5 Gops
  Peak performance of each node:    64 Gops                  24 Gops
  Main memory:                      10 TB (total)            > 4.7 TB
  Shared memory per node:           16 GB                    16-64 GB
  Interconnection network:          single-stage crossbar
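The quoted figures are internally consistent, as a quick check shows (numbers taken from the slide; the 12.5% efficiency is the assumption stated above):

```python
# Earth Simulator figures from the slide.
nodes, pes_per_node, gflops_per_pe = 640, 8, 8.0

# Peak: 640 nodes x 8 PEs/node x 8 Gops/PE.
peak_tflops = nodes * pes_per_node * gflops_per_pe / 1000
print(peak_tflops)  # 40.96 -- the ~40 TFLOPS peak quoted

# Effective performance at the assumed 12.5% efficiency.
effective = peak_tflops * 0.125
print(effective)  # 5.12 -- matches the "> 5 TFLOPS" effective figure
```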

Page 18:

The US Strategy: “Layering”

[Diagram: three-tier layered computing strategy]

• Small-system computing capability (0.1 GF - 10 GF): local (university) resources; $3-5K capital costs, < $0.5K operating costs
• Mid-range computing/archiving capability (~1.0 TF / ~100 TB archive): local centers (example: Argonne); $3-5M capital costs, ~$2-3M operating costs
• High-end computing capability (10+ TF): major centers (example: NERSC); > $100M capital costs, ~$20-30M operating costs

Page 19:

The US example: focusing software advances

The DOE/ASCI challenge: how can application software development be sped up, and take advantage of the latest advances in physics, applied math, computer science, …?

The ASCI solution: do an experiment
• Create 5 groups at universities, in a variety of areas of “multi-physics”:
  • Astrophysics (Chicago), shocked materials (Caltech), jet turbines (Stanford), accidental large-scale fires (U. Utah), solid-fuel rockets (U. Illinois/Urbana)
• Fund well, at ~$20M total for 5 years (~$45M for 10 years)
• Allow each Center to develop its own computing science infrastructure
• Make continued funding contingent on meeting specific, pre-identified goals
• Results? See example, after 5 years!

The SciDAC solution: do an experiment
• Create a mix of applications and computer science/applied math groups
• Create funding-based incentives for collaborations; forbid “rolling one’s own” solutions
  • Example: application groups funded at ~15-30% of ASCI/Alliance groups
• Results? Not yet clear (effort ~1 year old)

Page 20:

Example: The Chicago ASCI/Alliance Center

• Funded starting Oct. 1, 1997; 5-year anniversary Oct. 1, 2002, with possible extension for another 5 years
• Collaboration between:
  • University of Chicago (Astrophysics, Physics, Computer Science, Math, and 3 institutes [Fermi Institute, Franck Institute, Computation Institute])
  • Argonne National Laboratory (Mathematics and Computer Science)
  • Rensselaer Polytechnic Institute (Computer Science)
  • Univ. of Arizona/Tucson (Astrophysics)
  • “Outside collaborators”: SUNY/Stony Brook (relativistic radiation hydrodynamics), U. Illinois/Urbana (radiation hydrodynamics), U. Iowa (Hall MHD), U. Palermo (solar/time-dependent ionization), UC Santa Cruz (flame modeling), U. Torino (MHD, relativistic hydrodynamics)
• Extensive “validation” program with external experimental groups:
  • Los Alamos, Livermore, Princeton/PPPL, Sandia, U. Michigan, U. Wisconsin

Page 21:

What does $4.0M/yr for 5 years buy?

[Image montage of FLASH simulations: cellular detonation; compressed turbulence; helium burning on neutron stars; Richtmyer-Meshkov instability; laser-driven shock instabilities; nova outbursts on white dwarfs; flame-vortex interactions; wave breaking on white dwarfs; Type Ia supernova; intracluster interactions; magnetic Rayleigh-Taylor; Rayleigh-Taylor instability; relativistic accretion onto neutron stars; gravitational collapse/Jeans instability; Orszag-Tang MHD vortex]

The Flash code:
1. Is modular
2. Has a modern CS-influenced architecture
3. Can solve a broad range of (astro)physics problems
4. Is highly portable
   a. Can run on all ASCI platforms
   b. Runs on all other available massively-parallel systems
5. Can utilize all processors on available MPPs
6. Scales well, and performs well
7. Is extensively (and constantly) verified/validated
8. Is available on the web: http://flash.uchicago.edu
9. Has won a major prize (Gordon Bell 2001)
10. Has been used to solve significant science problems:
    • (nuclear) flame modeling
    • Wave breaking

Page 22:

Conclusions

• Key first steps:
  • Answer the question: is the future imposed? planned? opportunistic?
  • Answer the question: what is the role of various institutions, and of individuals?
  • Agree on specific science goals: What do you want to accomplish? Who are you competing with?
• Key second steps:
  • Ensure funding support for the long term (= expected project duration)
  • Construct a science “roadmap”
  • Define specific science milestones
• Key operational steps:
  • Allow for early mistakes
  • Insist on meeting specific science milestones by mid-project

Page 23:

And that brings us to …

Questions and Discussion