Sidney Fernbach Director of Computing at Lawrence Livermore National Laboratory 1952-1982

Transcript
Page 1

Sidney Fernbach
Director of Computing at
Lawrence Livermore National Laboratory
1952-1982

Page 2

Fernbach Awardees

1993 David Bailey
1994 Charles Peskin
1995 Paul Woodward
1996 Gary Glatzmaier
1997 Charbel Farhat
1998 Phillip Colella
1999 Michael Norman
2000 Stephen Attaway
2002 Robert Harrison
2003 Jack Dongarra
2004 Marsha Berger
2005 John Bell
2006 Ed Seidel
2007 David Keyes

William Gropp, Mitchell Smooke, Tony Chan, Xiao-Chuan Cai, Barry Smith, David Young, Dana Knoll, M. Driss Tidriri, V. Venkatakrishnan, Dimitri Mavriplis, C. Timothy Kelley, Omar Ghattas, Lois C. McInnes, Dinesh Kaushik, John Shadid, Kyle Anderson, Carol Woodward, Florin Dobrian, Daniel Reynolds, Yuan He

Page 3

David Keyes
Applied Physics & Applied Mathematics
Columbia University
&
Towards Optimal Petascale Simulations (TOPS) Center
U.S. DOE SciDAC Program

A nonlinearly implicit manifesto*

2007 Sidney Fernbach Lecture
www.columbia.edu/~kd2112/Fernbach_2007.pdf

* a public declaration of principles and intentions

Page 4

got implicitness?

Page 5

Going implicit? Why you would, if you could:
- multiscale problems with good scale separation
- coupled problems (“multiphysics”)
- problems with uncertain or controllable inputs (optimization: design, control, inversion, assimilation)

You can, so you should:
- optimal and scalable algorithms known
- freely available software
- reasonable learning curve that harvests legacy code

Page 6

got Jacobians?

circa 2002

Page 7

Current focus on Jacobian-free implicit methods

Two stories to track in supercomputing:
- raise the peak capability: higher capability for hero users (first frontier)
- lower the entry threshold: best practices for all users (“new” frontier)

[Image: New York Blue at BNL (#10 on the Top 500)]

Jacobian a steep price, in terms of coding:
- very valuable to have, but not necessary
- approximations thereto often sufficient
- meanwhile, automatic differentiation tools lowering the threshold

Page 8

Recent “E3” report highlights weaknesses of explicit methods

“The dominant computational solution strategy over the past 30 years has been the use of first-order-accurate operator-splitting, semi-implicit and explicit time integration methods, and decoupled nonlinear solution strategies. Such methods have not provided the stability properties needed to perform accurate simulations over the dynamical time-scales of interest. Moreover, in most cases, numerical errors and means for controlling such errors are understood heuristically at best.”

– E3 report, 2007

Page 9

Recent E3 report highlights opportunities for implicit methods

“Research in linear and nonlinear solvers remains a critical focus area because the solvers provide the foundation for more advanced solution methods. In fact, as modeling becomes more sophisticated to increasingly include optimization, uncertainty quantification, perturbation analysis, and more, the speed and robustness of the linear and nonlinear solvers will directly determine the scope of feasible problems to be solved.”

– E3 report, 2007

Page 10

Some reports predicated on scalable implicit solvers…

[Images: covers of community reports dated 2002, 2003, 2003-2004 (2 vols.), 2006, and 2007, including the Fusion Simulation Project (June 2007) and Mathematical Challenges for the Department of Energy (December 2007), among many other developments]

Page 11

Plan of presentation

Motivations for implicit solvers
- one-dimensional model problems, linear and nonlinear
- portfolio of real-world problems

State of the art for large-scale nonlinearly implicit solvers
- brief look at algorithms and software
- intuition about how they scale

Illustrative stories from the trenches
- an undergraduate semester project “gone Broadway”
- community code simulations supporting the international magnetically confined fusion energy program

Invitation and acknowledgments

Page 12

“Explicit” versus “implicit”

Model problem, the 1D heat equation:

$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}, \qquad \nu \equiv \frac{\Delta t}{(\Delta x)^2}$$

Explicit methods evaluate a function of state data at the prior time, to update each component of the current state independently:

$$U_j^{n+1} = U_j^n + \nu \left( U_{j+1}^n - 2U_j^n + U_{j-1}^n \right)$$

equivalent to matrix-vector multiplication, in linear problems.

Implicit methods solve a function of state data at the current time, to update all components simultaneously:

$$U_j^{n+1} - \nu \left( U_{j+1}^{n+1} - 2U_j^{n+1} + U_{j-1}^{n+1} \right) = U_j^n$$

equivalent to inverting a matrix, in linear problems.
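The contrast can be made concrete in a few lines. Below is a minimal sketch of one explicit and one implicit step for the scheme above (the mesh size, ν, and initial data are illustrative, not from the lecture; dense linear algebra is used for clarity):

```python
# Minimal sketch of the two schemes for the 1D heat equation
# (Dirichlet boundaries held at zero).
import numpy as np

n, nu = 50, 0.4                  # interior points; nu = dt/dx^2 (illustrative)
T = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # (-1, 2, -1) stencil

def step_explicit(U):
    # U^{n+1} = U^n + nu (U_{j+1} - 2U_j + U_{j-1}): a matrix-vector multiply
    return U - nu * (T @ U)

def step_implicit(U):
    # (I + nu T) U^{n+1} = U^n: "inverting a matrix"
    return np.linalg.solve(np.eye(n) + nu * T, U)

U = np.sin(np.pi * np.arange(1, n + 1) / (n + 1))    # smooth initial data
for _ in range(100):
    U = step_implicit(U)   # stable for any nu; explicit needs nu <= 1/2
```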

Page 13

Explicit methods can be unstable – linear example

Implicit: $U_j^{n+1} - \nu \left( U_{j+1}^{n+1} - 2U_j^{n+1} + U_{j-1}^{n+1} \right) = U_j^n$ — stable for all ν.

Explicit: $U_j^{n+1} = U_j^n + \nu \left( U_{j+1}^n - 2U_j^n + U_{j-1}^n \right)$ — unstable for ν > 1/2.

[Figure: snapshots of the initial data and the solution after 1, 25, and 50 steps, for Δt = 0.0012 versus Δt = 0.0013; c/o K. Morton & D. Mayers, 2005]

Page 14

Explicit methods can be unphysically oscillatory – nonlinear example

$$\frac{\partial u}{\partial t} = \frac{\partial}{\partial x}\left( \alpha(u,x)\,\frac{\partial u}{\partial x} \right); \quad \alpha(u,x) \equiv \kappa(s); \quad s \equiv u_x(x); \quad \kappa(s) = \begin{cases} \kappa_0, & |s| \le s_{crit} \\ \kappa_0 + k\,(|s| - s_{crit})^{1/2}, & |s| > s_{crit} \end{cases}$$

Linearly implicit, nonlinearly explicit (the coefficient α lagged at time n):

$$U_j^{n+1} - \nu \left[ \alpha_{j+1/2}^{n} \left( U_{j+1}^{n+1} - U_j^{n+1} \right) - \alpha_{j-1/2}^{n} \left( U_j^{n+1} - U_{j-1}^{n+1} \right) \right] = U_j^n$$

Linearly and nonlinearly implicit (α evaluated at time n+1):

$$U_j^{n+1} - \nu \left[ \alpha_{j+1/2}^{n+1} \left( U_{j+1}^{n+1} - U_j^{n+1} \right) - \alpha_{j-1/2}^{n+1} \left( U_j^{n+1} - U_{j-1}^{n+1} \right) \right] = U_j^n$$

[Figure: solution history at station 10 — oscillatory for the linearly implicit scheme, non-oscillatory for the fully implicit scheme]
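A companion sketch of the two time-stepping choices (hypothetical constants κ₀, k, s_crit; Δx absorbed into ν and s; SciPy's general-purpose fsolve standing in for a production Newton solver). The only difference is where α is evaluated:

```python
import numpy as np
from scipy.optimize import fsolve

nu = 1.0                              # nu = dt/dx^2 (illustrative)
kappa0, k, s_crit = 1.0, 10.0, 0.5    # hypothetical model constants

def kappa(s):
    s = np.abs(s)
    return np.where(s <= s_crit,
                    kappa0,
                    kappa0 + k * np.sqrt(np.maximum(s - s_crit, 0.0)))

def residual(U_new, U_old, alpha):
    # alpha lives at half-points j+1/2; ends held fixed (Dirichlet)
    R = np.empty_like(U_new)
    R[0], R[-1] = U_new[0] - U_old[0], U_new[-1] - U_old[-1]
    flux = alpha * np.diff(U_new)          # alpha_{j+1/2} (U_{j+1} - U_j)
    R[1:-1] = U_new[1:-1] - nu * np.diff(flux) - U_old[1:-1]
    return R

def step_linearly_implicit(U_old):
    alpha = kappa(np.diff(U_old))          # freeze alpha at time n
    return fsolve(residual, U_old, args=(U_old, alpha))

def step_fully_implicit(U_old):
    def R(U_new):                          # alpha evaluated at time n+1 too
        return residual(U_new, U_old, kappa(np.diff(U_new)))
    return fsolve(R, U_old)

U = np.linspace(1.0, 0.0, 21)              # illustrative initial profile
U = step_fully_implicit(U)
```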

Page 15

However – implicit methods can be unruly and expensive

Property          Explicit             Naïve Implicit
Complexity        O(N)                 O(N^c), e.g., c = 7/3
Workspace         O(N)                 O(N^w), e.g., w = 5/3
Communication     nearest neighbor*    global, in principle
Synchronization   once per step        many times per step
Concurrency       O(N)                 limited
Performance       predictable          data-dependent
Reliability       robust when stable   uncertain

* plus the estimation of the stable step size

Page 16

Many simulation opportunities are multiscale

Multiple spatial scales
- interfaces, fronts, layers
- thin relative to domain size, δ << L

Multiple temporal scales
- fast waves
- small transit times relative to convection or diffusion, τ << T

The analyst must isolate the dynamics of interest and model the rest in a system that can be discretized over a more modest range of scales
- often involves filtering of high-frequency modes, quasi-equilibrium assumptions, etc.
- may lead to an infinitely “stiff” subsystem requiring implicit treatment

[Image: Richtmyer-Meshkov instability, c/o A. Mirin, LLNL]

Page 17

e.g., DOE’s SciDAC* portfolio is multiscale

[Diagram: many applications drive the common technologies (CS, Math); the common technologies respond]

* Scientific Discovery through Advanced Computing

Page 18

Examples of scale-separated features of multiscale problems
- gravity surface waves in global climate
- Alfvén waves in tokamaks
- acoustic waves in aerodynamics
- fast transients in detailed-kinetics chemical reaction
- bond vibrations in protein folding (?)

Explicit methods are restricted to marching out the long-scale dynamics on the short scales. Implicit methods can “step over” or “filter out” the dynamically irrelevant short scales with equilibrium assumptions, ignoring stability bounds. (Accuracy bounds must still be satisfied; for long time steps, one can use high-order temporal integration schemes!)

Page 19

IBM’s BlueGene/P: 72K quad-core procs w/ 2 FMADD @ 850 MHz = 1.008 Pflop/s

Chip: 4 processors; 13.6 GF/s; 8 MB EDRAM
Compute Card: 1 chip; 13.6 GF/s; 2 GB DDRAM
Node Card: 32 compute cards; 435 GF/s; 64 GB
Rack: 32 node cards; 14 TF/s; 2 TB
System: 72 racks; 1 PF/s; 144 TB

Thread concurrency: 288K (or 294,912) processors

On the floor at Argonne National Laboratory by early 2009.

What’s “big iron” for, if not multiscale?

“You go to [petascale] with the [computers] you have, not with the [computers] you want.” – paraphrase of a recent U.S. Secretary of Defense

Page 20

Review: two definitions of scalability

“Strong scaling”
- execution time decreases in inverse proportion to the number of processors
- fixed-size problem overall
- often instead graphed as its reciprocal, “speedup”

“Weak scaling”
- execution time remains constant, as problem size and processor number are increased in proportion
- fixed-size problem per processor
- also known as “Gustafson scaling”

[Plots: log T vs. log p with N constant (strong scaling: good = slope −1, poor flattens) and with N ∝ p (weak scaling: good = slope 0, poor grows)]

Page 21

Explicit methods do not weak scale!

Illustrate for CFL-limited explicit time stepping.* Definitions:
- d-dimensional domain, length scale L
- (d+1)-dimensional space-time, time scale T
- h: computational mesh cell size
- τ: computational time step size
- τ = O(h^α): stability bound on time step
- n = L/h: number of mesh cells in each dimension
- N = n^d: number of mesh cells overall
- M = T/τ: number of time steps overall
- O(N): total work to perform one time step
- O(MN): total work to solve problem
- P: number of processors
- S: storage per processor
- PS: total storage on all processors (= N)

Parallel wall clock time is $O(MN/P) \propto (T/\tau)(PS)/P \propto T\,S^{1+\alpha/d}\,P^{\alpha/d}$, since $\tau \propto h^\alpha \propto 1/n^\alpha = 1/N^{\alpha/d} = 1/(PS)^{\alpha/d}$.

Example: explicit wave problem in 3D (α = 1, d = 3)

Domain      10³×10³×10³   10⁴×10⁴×10⁴   10⁵×10⁵×10⁵
Exe. time   1 day         10 days       3 months

Example: explicit diffusion problem in 2D (α = 2, d = 2)

Domain      10³×10³   10⁴×10⁴    10⁵×10⁵
Exe. time   1 day     3 months   27 years

* assuming dynamics needs to be followed only on coarse scales
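A quick numeric check of the estimate and the two tables (a sketch, assuming the slide's one-day baseline at n = 10³):

```python
# Under weak scaling the wall clock grows like the step count M = T/tau,
# i.e., proportionally to n^alpha for fixed physical time T.
def wall_time_days(n, alpha, n0=1e3, base_days=1.0):
    return base_days * (n / n0) ** alpha

for n in (1e3, 1e4, 1e5):
    wave = wall_time_days(n, alpha=1)    # 3D wave: 1, 10, 100 days
    diff = wall_time_days(n, alpha=2)    # 2D diffusion: 1, 100, 10000 days
    print(f"n = {n:.0e}: wave {wave:g} d, diffusion {diff:g} d")
# 100 days ~ 3 months; 10000 days ~ 27 years, matching the tables
```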

Page 22

Many simulation opportunities are multiphysics

Interfacial coupling
- ocean-atmosphere coupling in climate
- core-edge coupling in tokamaks
- fluid-structure vibrations in aerodynamics
- boundary layer-bulk phenomena in fluids
- surface-bulk phenomena in solids

Bulk-bulk coupling
- radiation-hydrodynamics
- magneto-hydrodynamics

Coupled systems may admit destabilizing modes not present in either system alone.

[Image: SST anomalies, c/o A. Czaja, MIT]

Page 23

Many simulation opportunities face uncertainty
- climate prediction
- subsurface contaminant transport or petroleum recovery, and seismology
- medical imaging
- stellar dynamics, e.g., supernovae
- nondestructive evaluation of structures

Uncertainty can be in
- constitutive laws
- initial conditions
- boundary conditions

Sensitivity, optimization, parameter estimation, and boundary control require the ability to apply the inverse action of the Jacobian – available in all Newton-like implicit methods.

[Image: subsurface property estimation, c/o Roxar]

Page 24

Forward vs. inverse problems

Forward problem: given the model, forcing/BCs, parameters, and ICs, compute the solution.

Inverse problem: given the model, forcing/BCs, solution, and ICs (plus regularization), recover the parameters.

Page 25

Applications requiring scalable solvers – conventional and progressive

Magnetically confined fusion
- Poisson problems
- nonlinear coupling of multiple physics codes

Accelerator design
- Maxwell eigenproblems
- shape optimization subject to PDE constraints

Porous media flow
- div-grad Darcy problems
- parameter estimation

Page 26

The TOPS “Center for Enabling Technology” spans 4 labs & 5 universities

Towards Optimal Petascale Simulations

Mission: enable scientists and engineers to take full advantage of petascale hardware by overcoming the scalability bottlenecks of traditional solvers, and assist users to move beyond “one-off” simulations, to validation and optimization (1 of 3 such centers)

Columbia University, University of Colorado, University of Texas, University of California at San Diego, Lawrence Livermore National Laboratory, Sandia National Laboratories

Page 27

TOPS institutions

Towards Optimal Petascale Simulations

[Map: UCB/LBNL, ANL, UT, CU, LLNL, UCSD, CU-B, SNL; legend: TOPS lab (4), TOPS university (5)]

Page 28

TOPS is building a toolchain of proven solver components that interoperate
- We aim to carry users from “one-off” solutions to the full scientific agenda of sensitivity, stability, and optimization (from heroic point studies to systematic parametric studies), all in one software suite
- TOPS solvers are nested, from applications-hardened linear solvers outward, leveraging common distributed data structures
- Communication and performance-oriented details are hidden, so users deal with mathematical objects throughout
- TOPS features these trusted packages, whose functional dependences are illustrated (right): Hypre, PETSc, ScaLAPACK, SUNDIALS, SuperLU, TAO, Trilinos

[Diagram: functional dependences among Optimizer, Sens. Analyzer, Time integrator, Nonlinear solver, Eigensolver, Linear solver; arrows indicate dependence]

These are in use and actively debugged in dozens of high-performance computing environments, in dozens of application domains, by thousands of user groups around the world.

Page 29

TOPS software is brought to you by … many of the finest computational scientists alive!

Adams, Cai, Demmel, Falgout, Ghattas, Heroux, Hu, Husbands, Kaushik, Knepley, Li, Manteuffel, McCormick, McInnes, Moré, Munson, Ng, Norris, Reynolds, Serban, Smith, Woodward, C. Yang, U. Yang

Page 30

It’s all about algorithms (at the petascale)

Given, for example:
- a “physics” phase that scales as O(N)
- a “solver” phase that scales as O(N^{3/2})
- computation is almost all solver after several doublings

Most applications groups have not yet “felt” this curve in their gut; as users actually get into queues with more than 4K processors, this will change. (The arithmetic is sketched below.)

[Chart: weak scaling limit, assuming 100% efficiency in both physics and solver phases, over problem-size multiples 1 to 1024; the solver takes 50% of the time on 128 procs and 97% on 128K procs]
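The arithmetic behind the chart is easy to reproduce (a sketch assuming perfect efficiency in both phases, as the slide does):

```python
# Physics work ~ N, solver work ~ N^{3/2}; with P ~ N (weak scaling),
# per-processor time tracks work / P, so the solver's share grows with N.
def solver_fraction(doublings, solver_exp=1.5):
    physics = 2.0 ** doublings                 # O(N) phase after k doublings
    solver = 2.0 ** (solver_exp * doublings)   # O(N^{3/2}) phase
    return solver / (solver + physics)

print(f"{solver_fraction(0):.0%}")    # 50% at the 128-proc baseline
print(f"{solver_fraction(10):.0%}")   # ~97% at 128K procs (2^10 times more)
```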

Page 31

Legacy solvers will not port to the petascale

[Diagram: linear system interfaces atop linear solvers (GMG, FAC, Hybrid, AMGe, ILU, …) and data layouts (structured, composite, block-structured, unstructured, CSR); annotation: “Nonscalable solution!”]

Page 32

Reminder: solvers evolve underneath “Ax = b”

Advances in algorithmic efficiency rival advances in hardware architecture. Consider Poisson’s equation, ∇²u = f, on a cube of size N = n³:

Year   Method        Reference                 Storage   Flops
1947   GE (banded)   Von Neumann & Goldstine   n^5       n^7
1950   Optimal SOR   Young                     n^3       n^4 log n
1971   CG-MILU       Reid                      n^3       n^3.5 log n
1984   Full MG       Brandt                    n^3       n^3

If n = 64, this implies an overall reduction in flops of ~16 million.*

* Six months is reduced to 1 second.
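The footnote is simple arithmetic, sketched below: the flops ratio between banded Gaussian elimination and full multigrid is n⁴, and six months happens to be about 1.6 × 10⁷ seconds.

```python
# Worked check of the table for n = 64: n^7 flops (banded GE) versus
# n^3 flops (full multigrid) is a ratio of n^4.
n = 64
print(f"flops ratio: {n**7 / n**3:.3g}")        # ~1.68e7, i.e., ~16 million
print(f"six months:  {182 * 24 * 3600:.3g} s")  # ~1.57e7 seconds
```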

Page 33

Algorithms and Moore’s Law

This advance took place over a span of about 36 years, or 24 doubling times for Moore’s Law. 2^24 ≈ 16 million ⇒ the same as the factor from algorithms alone!

16 million speedup from each: algorithmic and architectural advances work together!

[Chart: relative speedup vs. year]

Page 34

Implicit methods can scale! 2004 Gordon Bell “special” prize

The 2004 Bell Prize in the “special category” went to an implicit, unstructured-grid bone mechanics simulation:
- 0.5 Tflop/s sustained on 4 thousand procs of ASCI White
- large-deformation analysis
- in production in a bone mechanics lab

[Images: cortical bone, trabecular bone; c/o M. Adams, Columbia]

Page 35

Implicit methods can scale! 1999 Gordon Bell “special” prize

The 1999 Bell Prize in the “special category” went to implicit, unstructured-grid aerodynamics problems:
- 0.23 Tflop/s sustained on 3 thousand processors of Intel’s ASCI Red
- 11 million degrees of freedom
- incompressible and compressible Euler flow
- employed in NASA analysis/design missions

[Image: transonic “Lambda” shock, Mach contours on surfaces]

Page 36

SPMD parallelism w/ domain decomposition puts off the limitation of Amdahl in weak scaling

Partitioning of the grid induces block structure on the system matrix (Jacobian). Computation scales with area; communication scales with perimeter; their ratio is fixed in weak scaling.

[Diagram: domain partitioned into Ω1, Ω2, Ω3; matrix rows A21, A22, A23 assigned to proc “2”]

Page 37

DD relevant to any local stencil formulation: finite differences, finite elements, finite volumes
- these lead to sparse Jacobian matrices J (row i couples node i to its neighbors)
- however, the inverses are generally dense; even the factors suffer unacceptable fill-in in 3D
- want to solve in subdomains only, and use that to precondition the full sparse problem

Page 38

No “scalable” without “optimal”
- “Optimal” for a theoretical numerical analyst means a method whose floating-point complexity grows at most linearly in the data of the problem, N, or (more practically, and almost as good) linearly times a polylog term
- For iterative methods, this means that both the cost per iteration and the number of iterations must be O(N log^p N)
- Cost per iteration must include communication cost as the processor count increases in weak scaling, P ∝ N; BlueGene, for instance, permits this with its log-diameter hardware global reduction
- The number of iterations comes from the condition number for linear iterative methods; Newton’s superlinear convergence is important for nonlinear iterations

Page 39

Why optimal algorithms?

The more powerful the computer, the greater the importance of optimality. Example:
- Suppose Alg1 solves a problem in time C N², where N is the input size
- Suppose Alg2 solves the same problem in time C N log₂ N
- Suppose Alg1 and Alg2 parallelize perfectly on a machine of 1,000,000 processors

In constant time (compared to serial), Alg1 can run a problem 1,000X larger, whereas Alg2 can run a problem nearly 65,000X larger.

Page 40

Components of scalable solvers for PDEs

Subspace solvers (alone unscalable: either too many iterations or too much fill-in)
- elementary smoothers
- incomplete factorizations
- full direct factorizations

Global linear preconditioners (optimal combinations of subspace solvers)
- Schwarz and Schur methods
- multigrid

Linear accelerators (mat-vec algorithms)
- Krylov methods

Nonlinear rootfinders (vec-vec algorithms + linear solves)
- Newton-like methods

Page 41

Newton-Krylov-Schwarz: a PDE applications “workhorse”

Newton (nonlinear solver; asymptotically quadratic):

$$F(u) \approx F(u_c) + F'(u_c)\,\delta u = 0, \qquad u = u_c + \lambda\,\delta u$$

Krylov (accelerator; spectrally adaptive):

$$J\,\delta u = -F, \qquad \delta u = \arg\min_{x \in V} \|Jx + F\|, \qquad V \equiv \{F, JF, J^2F, \ldots\}$$

Schwarz (preconditioner; parallelizable):

$$M^{-1} J\,\delta u = -M^{-1} F, \qquad M^{-1} = \sum_i R_i^T \left( R_i J R_i^T \right)^{-1} R_i$$
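For concreteness, here is a minimal one-level additive Schwarz application of M⁻¹ (a pedagogical sketch with dense subdomain solves, not TOPS code; the block sizes and overlap are arbitrary):

```python
# M^{-1} r = sum_i R_i^T (R_i J R_i^T)^{-1} R_i r over overlapping blocks.
import numpy as np

def additive_schwarz_apply(J, r, subdomains):
    z = np.zeros_like(r)
    for idx in subdomains:                      # idx plays the role of R_i
        Ji = J[np.ix_(idx, idx)]                # R_i J R_i^T
        z[idx] += np.linalg.solve(Ji, r[idx])   # R_i^T (Ji)^{-1} R_i r
    return z

# e.g., a 1D Laplacian with four overlapping blocks:
n = 100
J = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
blocks = [np.arange(max(0, s - 2), min(n, s + 27)) for s in range(0, n, 25)]
z = additive_schwarz_apply(J, np.ones(n), blocks)
```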

Page 42

“Secret sauce” #1: iterative correction w/ each step O(N)

The most basic idea in iterative methods for Ax = b:

$$x \leftarrow x + B^{-1}(b - Ax)$$

Evaluate the residual accurately, but solve approximately, where $B^{-1}$ is an approximate inverse to A. A sequence of complementary solves can be used, e.g., with $B_1$ first and then $B_2$, one has

$$x \leftarrow x + \left[ B_1^{-1} + B_2^{-1} - B_2^{-1} A B_1^{-1} \right](b - Ax)$$

Optimal polynomials of $B^{-1}A$ lead to various preconditioned Krylov methods. Scale recurrence, e.g., with $B_2^{-1} = R^T (R A R^T)^{-1} R$, leads to multilevel methods.
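In code, the basic correction loop is a few lines (a sketch; here B⁻¹ is simple Jacobi (diagonal) scaling, one illustrative choice):

```python
# x <- x + B^{-1}(b - Ax): residual evaluated accurately, solve done
# approximately. Converges, though slowly without complementary solves.
import numpy as np

def iterative_correction(A, b, apply_Binv, iters=500):
    x = np.zeros_like(b)
    for _ in range(iters):
        x += apply_Binv(b - A @ x)
    return x

A = 2*np.eye(50) - np.eye(50, k=1) - np.eye(50, k=-1)
b = np.ones(50)
x = iterative_correction(A, b, lambda r: r / np.diag(A))
print(np.linalg.norm(b - A @ x))   # residual shrinks with more iterations
```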

Page 43

“Secret sauce” #2: treat each error component in an optimal subspace

A Multigrid V-cycle:
- apply the smoother on the finest grid
- restriction: transfer from fine to coarse grid; the first coarse grid has fewer cells (less work & storage)
- recursively apply this idea until we have an easy problem to solve
- prolongation: transfer from coarse to fine grid

c/o R. Falgout, LLNL
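A compact recursive V-cycle for the 1D Poisson problem makes the picture concrete (a pedagogical sketch, not hypre: weighted-Jacobi smoothing, full-weighting restriction, linear-interpolation prolongation):

```python
import numpy as np

def laplacian(n):
    return (n + 1)**2 * (2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))

def smooth(A, x, b, sweeps=2, w=2/3):
    d = np.diag(A)
    for _ in range(sweeps):
        x = x + w * (b - A @ x) / d        # weighted Jacobi
    return x

def v_cycle(x, b, n):
    A = laplacian(n)
    x = smooth(A, x, b)                    # pre-smooth
    if n <= 3:
        return np.linalg.solve(A, b)       # coarsest grid: direct solve
    r = b - A @ x
    nc = (n - 1) // 2                      # standard 2:1 coarsening
    rc = np.array([(r[2*i] + 2*r[2*i+1] + r[2*i+2]) / 4 for i in range(nc)])
    ec = v_cycle(np.zeros(nc), rc, nc)     # recurse on coarse error equation
    e = np.zeros(n)                        # prolongate the correction
    for i in range(nc):
        e[2*i] += ec[i] / 2; e[2*i+1] += ec[i]; e[2*i+2] += ec[i] / 2
    return smooth(A, x + e, b)             # post-smooth

n = 63
b = np.random.rand(n)
x = np.zeros(n)
for _ in range(10):
    x = v_cycle(x, b, n)
print(np.linalg.norm(laplacian(n) @ x - b))   # small after a few cycles
```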

Page 44

“Secret sauce” #3: skip the Jacobian

In the Jacobian-Free Newton-Krylov (JFNK) method for F(u) = 0, a Krylov method solves the linear Newton correction equation, requiring Jacobian-vector products. These are approximated by the Fréchet derivative

$$J(u)\,v \approx \frac{1}{\epsilon}\left[ F(u + \epsilon v) - F(u) \right]$$

(where ε is chosen with a fine balance between approximation and floating-point rounding error) or by automatic differentiation, so that the actual Jacobian elements are never explicitly needed.

One builds the Krylov space on a true F′(u) (to within numerical approximation).

[Portrait: Carl Jacobi]
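The matrix-free product takes only a handful of lines (a sketch; the ε heuristic shown is one common choice, not necessarily the lecture's):

```python
import numpy as np

def jacobian_vector_product(F, u, v, Fu=None):
    Fu = F(u) if Fu is None else Fu
    nv = np.linalg.norm(v)
    if nv == 0.0:
        return np.zeros_like(u)
    # balance truncation error against floating-point rounding error
    eps = np.sqrt(np.finfo(float).eps) * (1.0 + np.linalg.norm(u)) / nv
    return (F(u + eps * v) - Fu) / eps

# e.g., F(u) = u**3 - 1 has diagonal Jacobian 3u^2:
F = lambda u: u**3 - 1
u, v = np.array([1.0, 2.0]), np.array([1.0, 1.0])
print(jacobian_vector_product(F, u, v))   # ~ [3., 12.]
```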

Page 45

“Secret sauce” #4: use the user’s solver to precondition

Almost any code to solve F(u) = 0 computes a residual and invokes some process to compute an update to u based on the residual, defining a (weakly converging) nonlinear iteration

$$M : F(u^k) \mapsto \delta u, \qquad u^{k+1} \leftarrow u^k + \delta u$$

M is, in effect, a preconditioner and can be applied directly within a Jacobian-free Newton context. This is the “physics-based preconditioning” strategy discussed in the E3 report.

Page 46

Example: fast spin-up of ocean circulation model using Jacobian-free Newton-Krylov
- State vector, u(t)
- Propagation operator (this is any code) Φ(u, t): u(t) = Φ(u(0), t); here, a single-layer quasi-geostrophic ocean forced by surface Ekman pumping, damped with biharmonic hyperviscosity
- Task: find the state u that repeats every period T (assumed known)
- Difficulty: direct integration (DI) to find the steady state may require thousands of years of physical time
- Innovation: pose as a Jacobian-free NK rootfinding problem, F(u) = 0, where F(u) ≡ u − Φ(u(0), T); the Jacobian is dense – one would never think of forming it! (A sketch of this fixed-point formulation follows below.)

[Images: converged streamfunction; difference between DI and NK (10⁻¹⁴)]
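SciPy ships a Jacobian-free Newton-Krylov driver, so the fixed-point formulation can be sketched in a few lines (the propagator below is a toy stand-in for the quasi-geostrophic code, purely illustrative):

```python
import numpy as np
from scipy.optimize import newton_krylov

def propagate(u0, T, dt=0.01):
    # toy damped, forced dynamics standing in for Phi(u(0), T)
    u = u0.copy()
    for _ in range(int(T / dt)):
        u += dt * (-0.5 * u + np.sin(np.arange(u.size)))
    return u

T = 1.0
F = lambda u: u - propagate(u, T)        # F(u) = u - Phi(u, T)
u_star = newton_krylov(F, np.zeros(8))   # never forms the dense Jacobian
print(np.linalg.norm(F(u_star)))
```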

Page 47

Example: fast spin-up of ocean circulation model using Jacobian-free Newton-Krylov
- 2-3 orders of magnitude speedup of Jacobian-free NK relative to direct integration (DI)
- OGCM: Helfrich-Holland integrator
- implemented in PETSc as an undergraduate research project

c/o T. Merlis, Caltech, Dept. Environmental Science & Engineering

Page 48

SciDAC’s Fusion Simulation Project: support of the international fusion program

J. Fusion Energy 20: 135-196 (2001), updated in June 2007, Kritz & Keyes, eds.

ITER: $12B – “the way” (Latin)

Fusion by 2017; criticality by 2022

“Big Iron” meets “Big Copper”

Page 49

$$\Phi(u) = 0$$

Page 50

Scaling fusion simulations up to ITER

ITER: International Thermonuclear Experimental Reactor; 2017 – first experiments, in Cadarache, France

10¹² needed (explicit uniform baseline)

[Diagram: small tokamak → large tokamak → huge tokamak; c/o S. Jardin, PPPL]

Page 51

Where to find 12 orders of magnitude in 10 years?

Hardware: 3 orders
- 1.5 orders: increased processor speed and efficiency
- 1.5 orders: increased concurrency

Software: 9 orders
- 1 order: higher-order discretizations – same accuracy can be achieved with many fewer elements
- 1 order: flux-surface following gridding – less resolution required along than across field lines
- 4 orders: adaptive gridding – zones requiring refinement are <1% of ITER volume and resolution requirements away from them are ~10² less severe
- 3 orders: implicit solvers – mode growth time 9 orders longer than Alfvén-limited CFL

Algorithmic improvements bring a yottascale (10²⁴) calculation down to petascale (10¹⁵)!

Page 52

Comments on ITER simulation roadmap
- Increased processor speed: 10 years is 6.5 Moore doubling times
- Increased concurrency: BG/L is already 2¹⁷ procs; MHD now routinely at ca. 2¹²
- Higher-order discretizations: low-order preconditioning of high-order discretizations
- Flux-surface following gridding: in SciDAC, this is ITAPS; evolve mesh to approximately follow flux surfaces
- Adaptive gridding: in SciDAC, this is APDEC; Cartesian AMR
- Implicit solvers: in SciDAC, this is TOPS; Newton-Krylov w/ multigrid preconditioning

Page 53

Illustrations from computational MHD

M3D code (Princeton)
- multigrid replaces block Jacobi/ASM preconditioner, for optimality
- new algorithm callable across the Ax = b interface

NIMROD code (General Atomics)
- direct elimination replaces PCG solver, for robustness
- scalable implementation of an old algorithm for Ax = b

The fusion community may use more cycles on unclassified U.S. DOE computers than any other (e.g., 32% of all cycles at NERSC in 2003). Well over 90% of these cycles are spent solving linear systems in M3D and NIMROD, which are prime U.S. code contributions to the designing of ITER.

Page 54

NIMROD: direct elimination for robustness

NIMROD code
- high-order finite elements in each poloidal crossplane
- complex, nonsymmetric linear systems with 10K-100K unknowns in 2D (>90% of execution time)

TOPS collaboration
- replacement of diagonally scaled Krylov with SuperLU, a supernodal parallel sparse direct solver
- 2D tests run 100× faster; 3D production runs are ~5× faster

c/o D. Schnack, et al.

Page 55

M3D: multigrid for optimality

M3D code
- unstructured mesh, hybrid FE/FD discretization with C0 elements in each poloidal crossplane
- real linear systems (>90% of execution time)

TOPS collaboration
- replacement of additive Schwarz (ASM) preconditioner with algebraic multigrid (AMG) from Hypre
- achieved mesh-independent convergence rate, ~5× improvement in execution time

[Chart: execution time (0-700) vs. processor count (3, 12, 27, 48, 75) for ASM-GMRES vs. AMG-FGMRES]

c/o S. Jardin, et al.

Page 56

Algebraic multigrid a key algorithmic technology
- The discrete operator is defined for the finest grid by the application itself, and for many recursively derived levels with successively fewer degrees of freedom, for solver purposes
- Unlike geometric multigrid, AMG is not restricted to problems with “natural” coarsenings derived from the grid alone
- Optimality (cost per cycle) is intimately tied to the ability to coarsen aggressively
- Convergence scalability (number of cycles) and parallel efficiency are also sensitive to the rate of coarsening

Solvers are scaling: algebraic multigrid (AMG) on BG/L (hypre), for $-\Delta u = f$
- Figure shows a weak scaling result for AMG out to 120K processors, with one 25×25×25 block per processor (up to 1.875B dofs)
- While much research and development remains, multigrid will clearly be practical at BG/P-scale concurrency

[Chart: seconds (0-20) vs. processors (0-120K), from 15.6K dofs to ~2B dofs]

c/o U. M. Yang, LLNL

Page 57

Resistive MHD prototype implicit solver

Magnetic reconnection: the breaking and reconnecting of oppositely directed magnetic field lines in a plasma, replacing hot plasma core with cool plasma, halting the fusion process

Replace explicit updates with implicit Newton-Krylov from SUNDIALS, with a factor of ~5× improvement in execution time

[Image: current (J = ∇ × B)]

J. Birn et al., “Geospace Environmental Modeling (GEM) magnetic reconnection challenge,” J. Geophys. Res. 106 (2001) 3715-3719.

c/o D. Reynolds, UCSD

Page 58

Resistive MHD: implicit solver, example #2

Magnetic reconnection: the previous example was compressible, in primitive variables; this example is incompressible, in streamfunction/vorticity form

Replace explicit updates with implicit Newton-Krylov from PETSc, with a factor of ~5× speedup

c/o F. Dobrian, ODU

Page 59

“What changed were simulations that showed that the new ITER design will, in fact, be capable of achieving and sustaining burning plasma.”
– Ray Orbach, Undersecretary of Energy

The U.S. role in multi-billion-dollar international projects will increasingly depend upon large-scale simulation, as exemplified by the 2003 Congressional testimony of Ray Orbach, above.

Page 60

TOPS’ wishlist for MHD collaborations – “Asymptopia”

Engage at a higher level than Ax = b: Newton-Krylov-Schwarz/MG on the coupled nonlinear system

Sensitivity analyses
- validation studies

Stability analyses
- “routine” outer loop on steady-state solutions

Optimization
- parameter identification
- design of facilities (accelerators, tokamaks, power plants, etc.)
- control of experiments

Page 61

A “perfect storm” for scientific simulation

(dates are symbolic)
- scientific models (1686)
- numerical algorithms (1947)
- computer architecture (1976)
- scientific software engineering (1992)

[Diagram: applications atop hardware infrastructure and architectures]

Page 62

TOPS dreams that users will…
- Understand the range of algorithmic options, with tradeoffs: e.g., memory vs. time, computation vs. communication, inner iteration work vs. outer
- Try all reasonable options “easily”: without recoding or extensive recompilation
- Know how their solvers are performing: with access to detailed profiling information
- Intelligently drive solver research: e.g., publish joint papers with algorithm researchers
- Simulate truly new physics, free from solver limits: e.g., finer meshes, complex coupling, full nonlinearity

User’s Rights

Page 63

People who share the 2007 Fernbach (with the year in which we began substantive collaborations)

1982 William Gropp, UIUC
1984 Mitchell Smooke, Yale
1984 Tony Chan, UCLA
1989 Xiao-Chuan Cai, CU-Boulder
1990 Barry Smith, Argonne
1991 David Young, Boeing
1992 Dana Knoll, Idaho Nat Lab
1992 M. Driss Tidriri, Iowa State
1993 V. Venkatakrishnan, Boeing
1993 Dimitri Mavriplis, U Wyoming
1995 C. Timothy Kelley, NCSU
1995 Omar Ghattas, UT-Austin
1995 Lois C. McInnes, Argonne
1996 Dinesh Kaushik, Argonne
1997 John Shadid, Sandia
1997 Kyle Anderson, UT-C
1997 Carol Woodward, LLNL
2001 Florin Dobrian, ODU
2002 Daniel Reynolds, UCSD
2006 Yuan He, Columbia

Page 64

Primary support through the years
- DOE/SC SciDAC ISIC/CET project (2001-present)
- DOE/NNSA ASCI Level-2 center (1998-2001)
- NSF Multidisciplinary Computational Challenges center (1995-1998)
- NASA Computational Aerosciences project (1995-1998)
- NSF Presidential Young Investigator (1989-1994)

Page 65

Sidney Fernbach
Director of Computing at
Lawrence Livermore National Laboratory
1952-1982

Page 66

EOF