Transcript
Page 1: Dongwook's talk on High-Performance Computing

Dongwook Lee Flash Center at the University of Chicago

Overcoming Challenges in High Performance Computing: High-order Numerical Methods for

Large-Scale Scientific Computing and Plasma Simulations

Department of Applied Mathematics and Statistics University of California, Santa Cruz

February 24, 2014

FLASH Simulation of a 3D Core-collapse Supernova Courtesy of S. Couch

MIRA, BG/Q, Argonne National Lab 49,152 nodes, 786,432 cores

Page 2: Dongwook's talk on High-Performance Computing

Introducing…

Computer Genius + Math Genius = High Performance Computing

Page 3: Dongwook's talk on High-Performance Computing

High Performance Computing (HPC)

‣To solve large problems in science, engineering, or business

‣Modern HPC architectures have

▪ increasing number of cores

▪ declining memory/core

‣This trend will continue for the foreseeable future

Page 4: Dongwook's talk on High-Performance Computing

High Performance Computing (HPC)

‣This tension between computation & memory brings a paradigm shift in numerical algorithms for HPC

‣To enable scientific computing on HPC architectures:

▪ efficient parallel computing (e.g., data parallelism, task parallelism, MPI, multi-threading, GPU accelerators, etc.)

▪ better numerical algorithms for HPC

Page 5: Dongwook's talk on High-Performance Computing

Numerical Algorithms for HPC

‣Numerical algorithms should conform to the abundance of computing power and the scarcity of memory

‣But…

▪ without losing solution accuracy

▪ maintaining maximum solution stability

▪ faster convergence to “correct” solution

Page 6: Dongwook's talk on High-Performance Computing

Large-Scale Astrophysics Codes

▪ FLASH (Flash group, U of Chicago)

▪ PLUTO (Mignone, U of Torino)

▪ CHOMBO (Colella, LBL)

▪ CASTRO (Almgren, Woosley, LBL, UCSC)

▪ MAESTRO (Zingale, Almgren, SUNY, LBL)

▪ ENZO (Bryan, Norman, Abel, Enzo group)

▪ BATS-R-US (CSEM, U of Michigan)

▪ RAMSES (Teyssier, CEA)

▪ CHARM (Miniati, ETH)

▪ AMRVAC (Toth, Keppens, CPA, K.U.Leuven)

▪ ATHENA (Stone, Princeton)

▪ ORION (Klein, McKee, U of Berkeley)

▪ ASTROBear (Frank, U of Rochester)

▪ ART (Kravtsov, Klypin, U of Chicago)

▪ NIRVANA (Ziegler, Leibniz-Institut für Astrophysik Potsdam), and others

[Diagram: giga-scale (current laptop/desktop) → peta-scale (current HPC) → future HPC (?)]

Page 7: Dongwook's talk on High-Performance Computing

The FLASH Code

‣FLASH is a free, open-source code for astrophysics and HEDP
▪ modular, multi-physics, adaptive mesh refinement (AMR), parallel (MPI & OpenMP), finite-volume Eulerian compressible code for solving hydrodynamics and MHD
▪ professionally software-engineered and maintained (daily regression test suite, code verification/validation), inline/online documentation
▪ 8500 downloads, 1500 authors, 1000 papers
▪ FLASH can run on various platforms, from laptops to supercomputing (peta-scale) systems such as IBM BG/P and BG/Q

Page 8: Dongwook's talk on High-Performance Computing

FLASH Simulations

[Image montage: cosmological cluster formation; supersonic MHD turbulence; Type Ia SN; RT; CCSN; ram pressure stripping; laser slab; rigid body structure; accretion torus; LULI/Vulcan experiments: B-field generation/amplification]

Page 9: Dongwook's talk on High-Performance Computing

FLASH’s Multi-Physics Capabilities

Astrophysics
▪ hydrodynamics, MHD, RHD, cosmology, hybrid PIC
▪ EoS: gamma laws, multi-gamma, Helmholtz
▪ nuclear physics and other source terms
▪ external gravity, self-gravity
▪ active and passive particles (used for PIC, laser ray tracing, dark matter, tracer particles)
▪ material properties

High Energy Density Physics
▪ multi-temperature (1T, 2T, & 3T) hydrodynamics and MHD
▪ implicit electron thermal conduction using HYPRE
▪ flux-limited multi-group diffusion approximation for radiative transfer
▪ multi-material support: EoS and opacity (tabular & analytic)
▪ laser energy deposition using ray tracing
▪ rigid body structures

✓ Fortran, C, Python; > 1.2 million lines (25% comments!)
✓ Extensive documentation available in the User’s Guide
✓ Scalable to tens of thousands of processors with AMR

Page 10: Dongwook's talk on High-Performance Computing

Numerical Algorithms · Astrophysics · High Energy Density Physics

My Research Work (coming soon!)

Page 11: Dongwook's talk on High-Performance Computing

My Collaborators

▪ M. Ruszkowski (U of Michigan)

▪ J. ZuHone (NASA, Goddard)

▪ K. Murawski (UMCS, Poland)

▪ F. Cattaneo (U of Chicago)

▪ P. Ricker (UIUC)

▪ M. Bruggen, R. Banerjee (U of Hamburg)

▪ M. Shin (U of Oxford)

▪ P. Oh, S. Ji (UCSB)

▪ I. Parrish (UCB)

▪ E. Zweibel (UWM)

▪ A. Deane (UMD)

▪ A. Dubey, P. Colella, J. Bachan, C. Daley (LBL)

▪ C. Federrath (Monash U, Australia)

▪ R. Fisher (UMass Dartmouth)

▪ G. Gregori, J. Meinecke (U of Oxford)

▪ P. Drake (U of Michigan)

▪ R. Yurchak (LULI, France)

▪ F. Miniati (ETH, Switzerland)

Astrophysics

High Energy Density Physics

Page 12: Dongwook's talk on High-Performance Computing

Parallelization, Optimization & Speedup

‣Adaptive Mesh Refinement with Paramesh

▪ standard MPI (Message Passing Interface) parallelism
▪ domain decomposition distributed over multiple processor units
▪ distributed memory

[Figure: uniform grid; octree-based block AMR; patch-based AMR; single block]

Page 13: Dongwook's talk on High-Performance Computing

Parallelization, Optimization & Speedup

[Figure: BG/Q node layout — 16 cores/node, 16 GB/node, 4 threads/core]

Page 14: Dongwook's talk on High-Performance Computing

Parallelization, Optimization & Speedup

‣Multi-threading (shared memory) using OpenMP directives
▪ more parallel computing on BG/Q using hardware threads on a core
▪ 16 cores/node, 4 threads/core

[Figure: "thread block list" vs. "thread within block" strategies; e.g., 5 leaf blocks in a single MPI rank, 2 threads/core (or 2 threads/rank)]

Page 15: Dongwook's talk on High-Performance Computing

FLASH Scaling Test on BG/Q

VESTA: 2 racks, 1024 nodes/rack
MIRA: 48 racks, 1024 nodes/rack
BG/Q: 16 cores/node, 4 threads/core, 16 GB/node

[Figure: FLASH scaling tests on BG/Q. x-axis: Mira nodes (ranks) at 512 (4k), 1024 (8k), 2048 (16k), 4096 (32k), 8192 (64k), 16384 (128k), 32768 (256k), 8 threads/rank. y-axes: 10^-8 core-hours/zone/step and 10^-7 core-hours/zone/IO write. Curves: Total evolution, MHD, Gravity, Grid, IO (1 write), ν-Leakage, Ideal. Panels: RTFlame, strong scaling (4 threads/rank); CCSN, weak scaling (8 threads/rank).]

Page 16: Dongwook's talk on High-Performance Computing

Numerical Algorithms, Solvers,

Godunov Schemes, High-Order Methods,

Oscillations, Stability,

Convergence, etc…

Page 17: Dongwook's talk on High-Performance Computing

Discretization Approaches

‣Finite Volume (FV)

▪ shock capturing, compressible flows, structured/unstructured grids

▪ hard to achieve high-order (higher than 2nd order)

‣Finite Difference (FD)

▪ smooth flows, incompressible, high-order methods

▪ non-conservative, simple geometry

‣Finite Element (FE)

▪ arbitrary geometry, basis functions, continuous solutions

▪ harder to code; problems at strong gradients

‣Spectral Element (SE)

‣Discontinuous Galerkin (DG)

Page 18: Dongwook's talk on High-Performance Computing

FV Godunov Scheme for Hyperbolic System

‣The system of conservation laws (hyperbolic PDE) in 1D:
\[ \frac{\partial U}{\partial t} + \frac{\partial F}{\partial x} = 0 \]

‣A discrete integral form (i.e., finite-volume):
\[ U_i^{n+1} = U_i^n - \frac{\Delta t}{\Delta x}\left( F_{i+1/2}^n - F_{i-1/2}^n \right) \]

‣The Godunov scheme seeks time-averaged fluxes at each interface by solving the self-similar solution of the Riemann problem:
\[ U^{*}_{i+1/2}(x/t) = RP\big(U_i^n,\, U_{i+1}^n\big), \qquad F_{i+1/2}^n = F\big(U^{*}_{i+1/2}(0)\big) = F\big(U_i^n,\, U_{i+1}^n\big) \]

[Figure: Riemann fan in the (x, t) plane between piecewise-constant (first-order) states U_i^n and U_{i+1}^n, with self-similar solution U*(x/t) containing a rarefaction, a contact discontinuity, and a shock.]
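To make the update above concrete, here is a minimal sketch in Python of a first-order Godunov step for scalar linear advection, where the interface Riemann problem is solved exactly by upwinding. This is an illustrative toy (the grid size, CFL number, and the name godunov_advection are arbitrary choices), not FLASH code.

import numpy as np

def godunov_advection(u, a, dx, dt, nsteps):
    """First-order Godunov update U_i^{n+1} = U_i^n - dt/dx (F_{i+1/2} - F_{i-1/2})
    for u_t + a u_x = 0 on a periodic grid of piecewise-constant states."""
    for _ in range(nsteps):
        uL = u                            # left state at interface i+1/2 is cell i
        uR = np.roll(u, -1)               # right state is cell i+1 (periodic wrap)
        # Exact Riemann solution for linear advection: upwind flux
        F = a * np.where(a >= 0.0, uL, uR)        # F_{i+1/2}
        u = u - dt / dx * (F - np.roll(F, 1))     # F_{i+1/2} - F_{i-1/2}
    return u

# Example: advect a square pulse once around a periodic unit domain
N = 200
x = (np.arange(N) + 0.5) / N
u0 = np.where(np.abs(x - 0.5) < 0.1, 1.0, 0.0)
a, cfl = 1.0, 0.8
dx = 1.0 / N
dt = cfl * dx / abs(a)
u1 = godunov_advection(u0.copy(), a, dx, dt, nsteps=int(round(1.0 / (a * dt))))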

Page 19: Dongwook's talk on High-Performance Computing

A Discrete World of FV

[Figure: the solution U(x, t^n) over cells centered at x_{i-1}, x_i, x_{i+1}.]

Page 20: Dongwook's talk on High-Performance Computing

A Discrete World of FV

\[ u(x, t^n) \approx P_i(x), \quad x \in \big(x_{i-1/2},\, x_{i+1/2}\big) \]

piecewise polynomial reconstruction on each cell (centered at x_{i-1}, x_i, x_{i+1})

\[ u_L = P_{i+1}(x_{i+1/2}), \qquad u_R = P_i(x_{i+1/2}) \]
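A minimal sketch of such a reconstruction (generic piecewise-linear/minmod, not the FLASH implementation; function names are illustrative): it returns the two one-sided values meeting at each interface x_{i+1/2}, i.e. P_i(x_{i+1/2}) from cell i and P_{i+1}(x_{i+1/2}) from cell i+1.

import numpy as np

def minmod(a, b):
    """Minmod limiter: zero at local extrema, otherwise the smaller slope."""
    return np.where(a * b > 0.0, np.where(np.abs(a) < np.abs(b), a, b), 0.0)

def plm_states(u):
    """Piecewise-linear (PLM) reconstruction P_i(x) = u_i + s_i (x - x_i)/dx
    on a periodic grid; returns the two one-sided values at interface i+1/2."""
    s = minmod(u - np.roll(u, 1), np.roll(u, -1) - u)        # limited slope per cell
    from_cell_i   = u + 0.5 * s                               # P_i at x_{i+1/2}
    from_cell_ip1 = np.roll(u, -1) - 0.5 * np.roll(s, -1)     # P_{i+1} at x_{i+1/2}
    return from_cell_i, from_cell_ip1

# Example: the limiter keeps the reconstruction flat at the extremum of a hat profile
u = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 0.0])
print(plm_states(u))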

Page 21: Dongwook's talk on High-Performance Computing

A Discrete World of FV

At each interface we solve a Riemann problem (RP) and obtain F_{i+1/2}.

Page 22: Dongwook's talk on High-Performance Computing

A Discrete World of FV

We are ready to advance our solution in time and get new volume-averaged states

\[ U_i^{n+1} = U_i^n - \frac{\Delta t}{\Delta x}\left( F_{i+1/2} - F_{i-1/2} \right) \]

Page 23: Dongwook's talk on High-Performance Computing

It Gets Much More Complicated in Reality

▪ In 3D we have 6 interfaces per cell

▪ 2 transverse RPs per interface

▪ 12 RPs are needed for the maximum Courant stability limit, Ca ~ 1

▪ Expensive!

Page 24: Dongwook's talk on High-Performance Computing

Computational Advantages In Unsplit Solvers

‣New Efficient Unsplit Algorithm (Lee & Deane, 2009; Lee, 2013)

‣Most unsplit schemes need 12 Riemann solves (see Table)

‣3D unsplit solvers in FLASH need
▪ 3 Riemann solves in hydro & 6 Riemann solves in MHD with the maximum Courant stability limit, Ca ~ 1

Mignone et al. 2007

Page 25: Dongwook's talk on High-Performance Computing

Stability, Consistency and Convergence

‣Lax Equivalence Theorem (for linear problem, P. Lax, 1956)

▪ The only convergent schemes are those that are both consistent and stable

▪Hard to show that the numerical solution converges to the original solution of the PDE; relatively easy to show consistency and stability of numerical schemes

‣In practice, non-linear problems adopt the linear theory as guidance

▪ code verification (code-to-code comparison)

▪ code validation (code-to-experiment, code-to-analytical-solution comparisons)

▪ self-convergence test over grid resolutions (a good measure of numerical accuracy)
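In practice, a self-convergence test reduces to measuring an error norm at two or more resolutions and estimating the observed order from the error ratio. A small sketch (the error values below are hypothetical):

import numpy as np

def observed_order(err_coarse, err_fine, refinement_ratio=2.0):
    """Observed convergence order p from errors at two resolutions:
    err ~ C h^p  =>  p = log(err_coarse / err_fine) / log(refinement_ratio)."""
    return np.log(err_coarse / err_fine) / np.log(refinement_ratio)

# Example with made-up L1 errors from three grid resolutions
errors = {64: 2.1e-3, 128: 5.4e-4, 256: 1.4e-4}   # hypothetical numbers
res = sorted(errors)
for n_c, n_f in zip(res[:-1], res[1:]):
    print(n_c, "->", n_f, "observed order ~", observed_order(errors[n_c], errors[n_f]))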

Page 26: Dongwook's talk on High-Performance Computing

High-Order Polynomial Reconstruction

• Godunov's order-barrier theorem (1959): monotonicity-preserving advection schemes are at most first-order! (Oh no…)
• Only true for linear PDE theory (YES!)
• High-order "polynomial" schemes became available using non-linear slope limiters (70's and 80's: Boris, van Leer, Zalesak, Colella, Harten, Shu, Engquist, etc.)
• Can't avoid oscillations completely (non-TVD): instability grows (numerical INSTABILITY!)

[Figure: FOG, PLM, and PPM reconstruction profiles.]

Page 27: Dongwook's talk on High-Performance Computing

Shu-Osher Problem: 1D Mach 3 Shock

Page 28: Dongwook's talk on High-Performance Computing

Low-Order vs. High-Order

[Figure: 1st-order vs. high-order solutions against the reference solution.]

1st order: 3200 cells (50 MB), 160 sec, 3828 steps
vs.
High-order: 200 cells (10 MB), 9 sec, 266 steps

Page 29: Dongwook's talk on High-Performance Computing

Circularly Polarized Alfven Wave (CPAW)

▪A CPAW problem propagates smoothly varying oscillations of the transverse components of velocity and magnetic field

▪The initial condition is an exact nonlinear solution of the MHD equations

▪The decay of the max of Vz and Bz is solely due to numerical dissipation: direct measurement of numerical diffusion (Ryu, Jones & Frank, ApJ, 1995)

[Excerpt shown from A. Mignone et al., J. Comput. Phys. 229 (2010) 5896-5920, p. 5907. Fig. A.3 caption: long-term decay of circularly polarized Alfvén waves after 16.5 time units (~100 wave periods); the maximum of the vertical velocity component is plotted versus time for the WENO-Z and WENO+3 schemes against a second-order TVD scheme, and the right panel shows the analogous decay of Bz for LimO3 and MP5, at resolution 120 × 20 and Courant number 0.4. The accompanying text notes that these results agree with previous investigations and strongly support the idea that problems involving complex wave interactions may benefit from higher-order schemes.]

Page 30: Dongwook's talk on High-Performance Computing

Outperformance of High-Order: CPAW

To reach the same L1 norm error (source: Mignone & Tzeferacos, 2010, JCP):

▪ MP5 (5th order): resolution 32, avg. comp. time/step 0.221 (×5/3) sec, total 15 s (×5/3) = 25 s
▪ PPM-CT (overall 2nd order): resolution 256, avg. comp. time/step 38.4 sec, total 2h 42m 50s

▪ More computational work & less memory

▪ Better suited for HPC

▪ Easier in FD; harder in FV

▪High-order schemes are better at preserving solution accuracy on AMR

Page 31: Dongwook's talk on High-Performance Computing

Numerical Oscillations

‣ In general, numerical (spurious) oscillations happen

‣near steep gradients (Gibbs oscillations; see the demonstration below)

‣lack of numerical dissipation (high-order schemes)

‣lack of numerical stability (Courant condition)

‣if present, the numerical solution is invalid

‣Controlling oscillations is crucial for solution accuracy & stability

‣More complicated situations (see LeVeque):

‣carbuncle/even-odd decoupling instability (Quirk, 1992)

‣start-up error

‣slow-moving shocks (see next)
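A quick toy demonstration of the Gibbs-type overshoot mentioned above (not a production test): fitting an unlimited high-degree polynomial through samples of a step function produces values outside the data range near the jump, which is exactly what limiters are designed to suppress.

import numpy as np

# Samples of a step function on a 7-point stencil
x = np.arange(-3.0, 4.0)            # -3 ... 3
u = np.where(x < 0.5, 0.0, 1.0)     # jump between x = 0 and x = 1

# Unlimited degree-6 interpolating polynomial through all samples
coeffs = np.polyfit(x, u, deg=6)
x_fine = np.linspace(-3.0, 3.0, 601)
p = np.polyval(coeffs, x_fine)

# The interpolant over/undershoots the data range [0, 1] near the jump:
# this is the spurious oscillation that slope limiters / TVD constraints control.
print("max overshoot:", p.max() - 1.0, "max undershoot:", p.min())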

Page 32: Dongwook's talk on High-Performance Computing

Numerical Oscillations

‣Slow-moving shocks:
‣unphysical oscillations can exponentially grow in time, especially near strong and slowly moving shocks (Woodward & Colella, 1984)
‣Jin & Liu (1996); Donat & Marquina (1996); Karni & Canic (1997); Arora & Roe (1997); Stiriba & Donat (2003); Lee (2010)

First-order Godunov method (Source: LeVeque)

Page 33: Dongwook's talk on High-Performance Computing

PPM Oscillations for Slow-Moving Shocks

✓Standard 3rd-order PPM suffers from unphysical oscillations for the MHD Brio-Wu shock tube (Brio & Wu, 1988)

✓A fix is available by applying an upwind slope limiter for PPM (Lee, 2010)

✓Upwind PPM behaves very similarly to WENO5, reducing oscillations!

standard PPM (Colella & Woodward, 1984)

upwind PPM (Lee, 2010)

WENO5 (Jiang & Shu, 1996)

Bad!

Improved!

Reference solution

Page 34: Dongwook's talk on High-Performance Computing

Traditional High-Order Schemes

‣ Traditional approaches to an (N+1)th-order scheme take an Nth-degree polynomial reconstruction

▪ only in the normal direction (e.g., FOG, MH, PPM, ENO, WENO, etc.)

▪with monotonicity controls (e.g., slope limiters, artificial viscosity)

‣ High-order in FV is tricky (when compared to FD)

▪ volume-averaged quantities (quadrature rules)

▪ preserving conservation w/o losing accuracy

▪ the higher the order, the larger the stencil

▪ high-order temporal update (ODE solvers, e.g., RK3, RK4, etc.; see the sketch below)

[Figure: 2D stencils for 2nd-order PLM and 3rd-order PPM.]

Page 35: Dongwook's talk on High-Performance Computing

High-Order using Gaussian Processes (GP)

▪ Gaussian Processes (GP) are a class of stochastic processes that yield sampling data from a function that is probabilistically constrained, but not exactly known

▪C. Graziani, P. Tzeferacos & D. Lee (Flash, U of Chicago)

▪Our high-order GP interpolation scheme is based on:

▪ samples (i.e., volume-averaged data points) of the function

▪ train the GP model on the samples by means of Bayes’ theorem

▪ the posterior mean function is our high-order interpolant of the unknown function

▪ The result is to pass from an “agnostic” prior model (a mean function and a covariance kernel) to a data-informed posterior model (an updated mean function and covariance)

Page 36: Dongwook's talk on High-Performance Computing

Agnostic Prior Model

A GP is defined through (1) a mean function and (2) a symmetric positive-definite integral kernel K(x, y):

‣ Mean function
‣ Kernel (covariance function)
‣ Write the samples as a vector
‣ The likelihood function (the probability of f given the GP model)
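In standard GP-regression notation (an assumed reconstruction of the quantities named above, following the usual textbook presentation; the specific symbols Σ and ℓ are assumptions), these objects are:

\[ f(x) \sim \mathcal{GP}\big(\bar{f}(x),\, K(x,y)\big), \qquad
   K(x,y) = \Sigma^2 \exp\!\left[-\frac{(x-y)^2}{2\ell^2}\right] \quad \text{(squared exponential)}, \]
\[ \mathbf{f} = \big(f(x_1), \dots, f(x_N)\big)^{T}, \qquad
   \mathcal{L}(\mathbf{f}) = (2\pi)^{-N/2}\det(\mathbf{K})^{-1/2}
   \exp\!\left[-\tfrac{1}{2}\,(\mathbf{f}-\bar{\mathbf{f}})^{T}\mathbf{K}^{-1}(\mathbf{f}-\bar{\mathbf{f}})\right],
   \qquad \mathbf{K}_{ij} = K(x_i, x_j). \]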

Page 37: Dongwook's talk on High-Performance Computing

Data-Informed Posterior Model

‣ We want to predict the unknown function f probabilistically at a new point x*
‣ The augmented likelihood function is then the joint Gaussian of (f, f*), with the covariance matrix augmented by K(x*, x_i) and K(x*, x*)

The result is to pass from an agnostic prior model (a mean function and a covariance kernel) to a data-informed posterior model (an updated mean function and covariance).

Page 38: Dongwook's talk on High-Performance Computing

‣ Bayes’ Theorem gives

Updated Mean Function

The result is to pass from an agnostic prior model (a mean function and a covariance kernel) to a data-informed posterior model (an updated mean function and covariance)

Our high-order interpolated value: a Gaussian probability distribution on the unknown function value f*
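In the same assumed textbook notation, conditioning on the samples via Bayes' theorem gives the posterior:

\[ f_* \mid \mathbf{f} \sim \mathcal{N}\!\big(\tilde{f}_*,\, \tilde{\Sigma}_*^2\big), \qquad
   \tilde{f}_* = \bar{f}(x_*) + \mathbf{k}_*^{T}\mathbf{K}^{-1}\big(\mathbf{f}-\bar{\mathbf{f}}\big), \qquad
   \tilde{\Sigma}_*^2 = K(x_*, x_*) - \mathbf{k}_*^{T}\mathbf{K}^{-1}\mathbf{k}_*,
   \qquad [\mathbf{k}_*]_i = K(x_*, x_i). \]

The posterior mean \(\tilde{f}_*\) is the high-order interpolated value, and \(\tilde{\Sigma}_*^2\) quantifies its uncertainty.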

Page 39: Dongwook's talk on High-Performance Computing

Truly Multidimensional Use of Stencil

The current GP interpolation method in FLASH targets smooth flow tests. For this, we use the squared exponential (SE) covariance and interpolate on a "blocky sphere" stencil of radius R.

‣ The SE covariance has C^∞ native functions, and thus can provide spectral convergence rates when the underlying approximated function is itself C^∞

[Figure: 2D stencil for GP vs. 2D stencils for 2nd-order PLM and 3rd-order PPM.]
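A minimal numerical sketch of the same idea, as pointwise GP regression with an SE kernel and a zero prior mean (the real FLASH scheme works with volume-averaged data and multidimensional stencils, which this toy omits; the names and length scale are illustrative):

import numpy as np

def se_kernel(x, y, ell=2.0):
    """Squared-exponential covariance K(x, y) = exp(-(x - y)^2 / (2 ell^2))."""
    return np.exp(-0.5 * ((x[:, None] - y[None, :]) / ell) ** 2)

def gp_interpolate(x_s, f_s, x_star, ell=2.0):
    """Posterior-mean GP interpolant at x_star given samples (x_s, f_s),
    with zero prior mean: f* = k*^T K^{-1} f."""
    K = se_kernel(x_s, x_s, ell)
    k_star = se_kernel(np.atleast_1d(np.asarray(x_star, dtype=float)), x_s, ell)
    weights = np.linalg.solve(K, f_s)          # K^{-1} f
    return (k_star @ weights).item()

# Example: interpolate cell-centered samples to an interface at x = 0.5
x_cells = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # a 5-point 1D stencil (radius 2 cells)
f_cells = np.sin(x_cells)
print(gp_interpolate(x_cells, f_cells, 0.5), np.sin(0.5))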

Page 40: Dongwook's talk on High-Performance Computing

Revisited: 1D Mach 3 Shock

[Figure: Shu-Osher problem; reference PLM solution on 1600 cells compared with GP (spectral), WENO-Z (5th), PPM (3rd), PLM (2nd), and FOG (1st).]

Page 41: Dongwook's talk on High-Performance Computing

Results on Smooth Flows I

• 2D advection of an isentropic vortex along the domain diagonal on a periodic box (R = 2Δ, ℓ = 6Δ)

Page 42: Dongwook's talk on High-Performance Computing

Results on Smooth Flows II

• 1D advection of a Gaussian profile (ℓ = 12Δ, R = 2Δ)

Page 43: Dongwook's talk on High-Performance Computing

Exponential Convergence Rate

• 1D advection of a Gaussian profile (ℓ = 12Δ, R = 2Δ)

[Figure: error vs. resolution, showing regions of spatial error dominance and temporal error dominance.]

Page 44: Dongwook's talk on High-Performance Computing

Implicit Solver in Unsplit Hydro

‣Spatial parallelism & optimization (MPI, multi-threading, numerical algorithm improvements, coding optimizations)

‣Temporal optimization:

▪ overcome small diffusion (parabolic PDE) time scales

▪ Jacobian-Free Newton-Krylov fully implicit solver (e.g., Knoll and Keyes, 2004; Toth et al., 2006) for the unsplit hydro solver

▪ NSF grant (PHY-0903997), 2009-2012, $400K

▪ Dongwook Lee (PI), Guohua Xia (postdoc), Shravan Gopal & Prateeti Mohapatra (scientists)

▪ GMRES with preconditioner
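A schematic of the Jacobian-free Newton-Krylov idea (an illustrative sketch, not the FLASH implementation): Newton iteration on a residual R(U) = 0 in which each Krylov (GMRES) iteration needs only Jacobian-vector products, approximated by a finite difference of the residual, so the Jacobian matrix is never formed. The preconditioner is omitted here; names and tolerances are arbitrary.

import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def jfnk_solve(residual, u0, newton_tol=1e-8, max_newton=20, fd_eps=1e-7):
    """Jacobian-free Newton-Krylov: solve residual(u) = 0 without forming J."""
    u = u0.copy()
    for _ in range(max_newton):
        r = residual(u)
        if np.linalg.norm(r) < newton_tol:
            break
        # Matrix-free Jacobian-vector product: J v ~ (R(u + eps v) - R(u)) / eps
        def jv(v):
            v = np.ravel(v)
            eps = fd_eps * (1.0 + np.linalg.norm(u)) / max(np.linalg.norm(v), 1e-30)
            return (residual(u + eps * v) - r) / eps
        J = LinearOperator((u.size, u.size), matvec=jv, dtype=float)
        du, info = gmres(J, -r)      # inner Krylov solve, unpreconditioned here
        u = u + du
    return u

# Example: a small nonlinear system, u_i^3 + u_i - b_i = 0
b = np.linspace(1.0, 2.0, 8)
sol = jfnk_solve(lambda u: u**3 + u - b, np.ones_like(b))
print(sol**3 + sol - b)   # residual should be near zero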

Page 45: Dongwook's talk on High-Performance Computing

Future Applications

[Background: excerpts from Couch & O'Connor, submitted to ApJ 2013 October 21.
Figure 13 caption: pseudo-color slices of entropy at four postbounce times for s27 fheat 1.05 3D; convection is already strong by 100 ms, and as the explosion sets in it becomes volume-filling, with large high-entropy bubbles pushing the shock outward in an asymmetrical fashion, similar to Ott et al. (2013).
Figure 14 caption: turbulent kinetic energy spectra E_ℓ measured from the non-radial velocity, averaged over a 10 km-wide shell centered at r = 125 km and over 10 ms centered at t_pb = 150 ms, for the s15 and s27 models in 2D and 3D (reference slopes ℓ^-1, ℓ^-5/3, ℓ^-3); in all cases the 2D simulations show much greater kinetic energy density on large scales (small ℓ) than 3D.
The accompanying text decomposes the non-radial kinetic energy density in spherical harmonics,
\[ \epsilon_{\ell m} = \oint \sqrt{\rho(\theta,\phi)}\, v_t(\theta,\phi)\, Y_{\ell}^{m}(\theta,\phi)\, d\Omega, \qquad v_t = \big[ v_\theta^2 + v_\phi^2 \big]^{1/2}, \qquad E_\ell = \sum_{m=-\ell}^{\ell} \epsilon_{\ell m}^2 \ \big[\mathrm{erg\ cm^{-3}}\big], \]
and argues that non-radial kinetic energy on large scales correlates with the vigor of the explosion, and that 2D turbulence differs fundamentally from 3D because of the inverse energy cascade.]

‣In core-collapse SNe, the sound speed at the core of the proto-neutron star reaches up to ~1/3 of the speed of light

‣This results in a very small time step in these regions, where there is no shock

‣Hybrid time stepping (implicit solutions at the core; explicit solutions elsewhere) will bring a huge computing acceleration for CCSN

FLASH Simulation of a 3D Core-collapse Supernova (entropy), Courtesy of S. Couch

Page 46: Dongwook's talk on High-Performance Computing

Summary

• Novel mathematical algorithms and ideas in designing a scientific code, and in performing state-of-the-art simulations, are the most important keys to success in scientific computing on HPC

• High-order methods are a good approach to embodying the desired tradeoff between memory and computation in future HPC

• Building a good large-scale scientific code with computational accuracy, stability, efficiency, and modularity, especially with multi-physics capabilities, requires combining various research fields, including mathematics, physics (and other fields of science), and computer science

Page 47: Dongwook's talk on High-Performance Computing

Future Work!


▪More high-order reconstruction methods

▪High-order quadrature rules for multi-dimensions

▪More studies on GP

▪ covariance kernels, applications to AMR prolongation, more convergence studies, GP for FD, etc.

▪High-order temporal integrations (RK3, RK4, etc.)

▪Hybrid explicit-implicit solvers


▪More fine-grained threading

▪ Keep up with HPC trends

▪GPU accelerators

Page 48: Dongwook's talk on High-Performance Computing

FLASH's Recent Computing Time Awards

Type Ia SN (105M hr, 2013)
Shock-Generated Magnetic Fields (40M hr, 2013)
Turbulent Nuclear Combustion (150M hr, 2013)
Core-Collapse SN (30M hr, 2013)

Page 49: Dongwook's talk on High-Performance Computing

Thank you! Questions?

Supersonic Banana Slug

Page 50: Dongwook's talk on High-Performance Computing

Supplementary Slides

Page 51: Dongwook's talk on High-Performance Computing

High-Order Numerical Algorithms

‣ Provide more accurate numerical solutions using

▪ fewer grid points (= memory savings)

▪ higher-order mathematical approximations (promoting floating point operations, i.e., computation)

▪ faster convergence to the solution

Page 52: Dongwook's talk on High-Performance Computing

Unsplit Hydro/MHD Solvers

‣Spatial Evolution (PDE evolution with Finite-volume):

‣Reconstruction Methods:

‣Polynomial-based: 1st-order Godunov, 2nd-order PLM, 3rd-order PPM, 5th-order WENO

‣Gaussian Process model-based: spectral-order GP (very new!)

‣Riemann Solvers:

‣Rusanov, HLLE, HLLC, HLLD (MHD), Marquina, Roe, Hybrid (a generic HLL sketch follows below)

‣Temporal Evolution (ODE evolution):

‣2nd-order characteristic tracing (predictor-corrector type) method
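As an illustration of the Riemann-solver family listed above, here is a generic single-interface HLL flux (a textbook sketch, not the FLASH HLLE/HLLC/HLLD implementation; the wave-speed estimates SL and SR must be supplied by the calling scheme):

def hll_flux(UL, UR, FL, FR, SL, SR):
    """HLL approximate Riemann flux at one interface.
    UL, UR: left/right conserved states; FL, FR: their physical fluxes;
    SL, SR: estimates of the slowest/fastest signal speeds."""
    if SL >= 0.0:
        return FL                      # all waves move right: take the left flux
    if SR <= 0.0:
        return FR                      # all waves move left: take the right flux
    # Intermediate (star) region flux from the integral form of the Riemann fan
    return (SR * FL - SL * FR + SL * SR * (UR - UL)) / (SR - SL)

# Example: Burgers' equation F(u) = u^2 / 2 with simple wave-speed estimates
uL, uR = 2.0, -1.0
print(hll_flux(uL, uR, 0.5 * uL**2, 0.5 * uR**2, min(uL, uR), max(uL, uR)))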