Enzo-P / Cello
Scalable Adaptive Mesh Refinement for Astrophysics and Cosmology

James Bordner (1), Michael L. Norman (1), Brian O’Shea (2)
(1) University of California, San Diego, San Diego Supercomputer Center
(2) Michigan State University, Department of Physics and Astronomy

Extreme Scaling Workshop 2012, Blue Waters / XSEDE
16 July 2012


Enzo overview

Parallel astrophysics and cosmology
  implemented in C++ / Fortran
  approximately 150K SLOC
  parallelized using MPI / OpenMP
Vast range of scales
  astrophysical fluid dynamics
  hydrodynamic cosmology
Adaptive mesh refinement (AMR)
Growing development community
[ Norman et al ]


Enzo’s physics and algorithms

Eulerian hydrodynamics
  piecewise-parabolic method (PPM)
Lagrangian dark matter
  particle-mesh method (PM)
Self-gravity
  FFTs on root-level grid
  multigrid on refinement patches
Local physics
  heating, cooling, chemistry, etc.
MHD, RHD (ray-tracing or implicit FLD)
[ John Wise ]


Enzo’s pursuit of scalability

Enzo born in early 1990s
“Extreme” meant 100 processors
Continual scalability improvements
  MPI/OpenMP parallelism
  “neighbor-finding” algorithm
  I/O optimizations
Further improvement getting harder
  increasing scalability requirements
  easy improvements already made
Motivates concurrent rewriting
  Enzo-P: “petascale” Enzo fork
  Cello: AMR framework
[ Sam Skillman, Matt Turk ]


Enzo’s AMR data structure

[Figure: Enzo AMR hierarchy, with labels "root grid", "root patch", and "refinement patch"]
Patch-based SAMR
Each patch is a C++ object
Patches assigned to processes
N_P root patches, grid size ≈ 64^3, 128^3
Refinement patches generally smaller
Refinement patches initially local to parent
Load balancing relocates refinement patches
Patch data (grids, particles) are distributed
AMR hierarchy structure is replicated
[ Tom Abel, John Wise, Ralf Kaehler ]


Enzo’s timestepping

Adaptive timestepping by level
  parallel within level
  less computation by O(N_L)
  reduced parallel efficiency

[Figure: space-time diagram (axes x and t) of refinement levels L=0 to L=3, with level updates numbered 1 to 20 and timesteps dt_0, dt_1, dt_2]

EvolveLevel(L, dt_{L-1})
  1. refresh level L ghosts
  2. compute timestep dt_L
  3. advance level L by dt_L
  4. EvolveLevel(L+1, dt_L)
  5. correct fluxes
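The EvolveLevel recursion can be sketched in a few lines of Python. This is a minimal illustration, not Enzo's code: the while loop that subcycles a level until it catches up with its parent is an assumption implied by, but not written in, the five steps, and `compute_dt`, `max_level`, and `log` are hypothetical names. The ghost refresh and flux correction are left as comments.

```python
from typing import Callable, List, Tuple


def evolve_level(level: int, dt_parent: float, max_level: int,
                 compute_dt: Callable[[int], float],
                 log: List[Tuple[int, float]]) -> None:
    """Advance `level` through one parent timestep of length `dt_parent`."""
    elapsed = 0.0
    eps = 1e-12 * dt_parent
    # Assumed subcycling loop: repeat until this level reaches the parent's time.
    while dt_parent - elapsed > eps:
        # 1. refresh level ghosts (omitted in this sketch)
        # 2. compute timestep, clipped so the level lands exactly on the parent time
        dt = min(compute_dt(level), dt_parent - elapsed)
        # 3. advance level by dt (here we only record the step)
        log.append((level, dt))
        # 4. recurse into the next finer level, which must cover the same dt
        if level < max_level:
            evolve_level(level + 1, dt, max_level, compute_dt, log)
        # 5. correct fluxes (omitted in this sketch)
        elapsed += dt


steps: List[Tuple[int, float]] = []
evolve_level(0, 1.0, 2, lambda lev: 1.0 / 2 ** lev, steps)
print([lev for lev, _ in steps])
```

With three levels and a timestep that halves per level, one root step triggers two level-1 steps and four level-2 steps. Each level finishes all of its substeps before its parent moves on, which is the between-level synchronization that costs parallel efficiency.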


Enzo’s scaling issues

Memory usage
  AMR structure is non-scalable
  ghost zone layer three zones deep
  memory fragmentation
Data locality
  disrupted by load balancing
Parallel task definition
  widely varying patch sizes
  granularity determined by AMR
Parallel task scheduling
  parallel within a level
  synchronization between levels
[ Elizabeth Tasker ]
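To see why a ghost layer three zones deep matters for memory, the storage overhead can be worked out directly. The sketch below assumes cubical grids with ghost zones on every face. The grid sizes in the loop are illustrative choices, not figures from the talk, and `ghost_overhead` is a hypothetical helper name.

```python
def ghost_overhead(n: int, ghost_depth: int = 3, dim: int = 3) -> float:
    """Ratio of stored zones (active + ghost) to active zones for an n^dim grid."""
    return ((n + 2 * ghost_depth) / n) ** dim


# Illustrative grid sizes (assumed); the overhead grows quickly as grids shrink.
for n in (8, 16, 32, 64):
    print(f"{n:3d}^3 active zones -> {ghost_overhead(n):.2f}x storage")
```

Under these assumptions a 64^3 grid stores about 1.3 times its active zones, while a 16^3 grid stores about 2.6 times, so small refinement patches pay proportionally more for their ghost layers.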


Talk outline

1. Enzo
   1. overview
   2. design
   3. scaling issues
2. Talk outline
3. Enzo-P / Cello
   1. overview
   2. design
   3. scaling solutions
   4. implementation using Charm++
   5. recursively generated parallel data structures


Enzo-P / Cello overview

Enzo-P intended to be “petascale Enzo”

Cello is a scalable AMR framework

Parallelism using Charm++

MPI as a backup

25K SLOC

Work in progress
  PPM HD / PPML MHD on distributed Cartesian grids
  prototyping Charm++ AMR implementations


Cello’s AMR data structure

[Figure: Cello AMR data structure, with labels "Forest", "Tree", "Patch", and "Block"]
Tree-based SAMR
Patches define unit of refinement
  cubical, varying size
Blocks define parallel tasks
  flexible size and shape
One Block per Charm++ chare
Tree structure optionally distributed


Cello’s timestepping

Optional adaptive timestepping
  includes by Patch or Block
  local synchronization
  parallelism between levels

[Figure: space-time diagram (axes x and t) of timestepping on levels L=0 to L=3, with numbered updates and timesteps dt_0, dt_1, dt_2]

Quantized timesteps
  avoids “sliver timesteps” dt ≈ 0
  but dt’s smaller than optimal
  reduced numerical roundoff
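One way to picture quantized timesteps is to round each locally computed timestep down to the root timestep divided by a power of two. The talk does not state the quantization rule, so the power-of-two choice here is an assumption, and `quantize_dt`, `dt_stable`, and `dt_root` are hypothetical names.

```python
def quantize_dt(dt_stable: float, dt_root: float) -> float:
    """Largest dt_root / 2**k (k >= 0) that does not exceed dt_stable."""
    if dt_stable <= 0.0 or dt_root <= 0.0:
        raise ValueError("timesteps must be positive")
    dt = dt_root
    # Halve until the step fits under the locally stable timestep.
    while dt > dt_stable:
        dt *= 0.5
    return dt


# A stable step of 0.3 against a root step of 1.0 is rounded down to 0.25.
print(quantize_dt(0.3, 1.0))
```

Without quantization, a block taking steps of 0.3 inside a parent step of 1.0 would end with a short remainder step of about 0.1; with the rounded step of 0.25 it takes four equal steps and lands exactly on the parent time. The price, as the slide notes, is that 0.25 is smaller than the 0.3 the block could have used.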


Cello’s improvements to scaling
Reducing replicated AMR structure

Fewer replicated patches
  “patch-merging” technique
  truncates full subtrees
  ≈ 2 to 3× reduction
[Figure: AMR tree, original vs. patch-merged]

Smaller replicated patches
  Enzo: size of grid = 1544 bytes
  Cello: size of Node = 16 bytes
  ≈ 100× reduction!
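The per-patch sizes translate into per-process memory once a patch count is chosen. The two byte counts below come from the slide; the patch count of one million is a hypothetical figure picked only to make the arithmetic concrete.

```python
ENZO_GRID_BYTES = 1544   # per replicated patch, from the slide
CELLO_NODE_BYTES = 16    # per replicated patch, from the slide


def replicated_bytes(num_patches: int, bytes_per_patch: int) -> int:
    """Memory each process spends on the replicated hierarchy structure."""
    return num_patches * bytes_per_patch


num_patches = 1_000_000  # hypothetical hierarchy size (assumption)
enzo = replicated_bytes(num_patches, ENZO_GRID_BYTES)
cello = replicated_bytes(num_patches, CELLO_NODE_BYTES)
print(f"Enzo : {enzo / 1e6:.0f} MB per process")
print(f"Cello: {cello / 1e6:.0f} MB per process")
print(f"ratio: {enzo / cello:.1f}x")
```

The exact ratio is 1544 / 16 = 96.5, which the slide rounds to about 100×. Because the structure is replicated, this cost is paid on every process regardless of how many processes share the patch data.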


Cello’s improvements to scaling
Distributed AMR Structure

Forest of Trees
  each assigned a process range
  simple indexing
  load balancing issues
Space-filling curve
  improved load balancing
  requires global scan
  greater surface area
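The space-filling-curve option can be illustrated with a Morton (Z-order) curve. The talk does not say which curve Cello uses, so Morton ordering is an assumption here, and `morton3` and `partition_by_curve` are hypothetical helpers. The running sum over block weights plays the role of the global scan mentioned on the slide.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]


def morton3(ix: int, iy: int, iz: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of (ix, iy, iz) into a Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def partition_by_curve(blocks: List[Tuple[Coord, float]],
                       num_procs: int) -> Dict[Coord, int]:
    """Assign blocks (coord, positive weight) to ranks in curve order."""
    ordered = sorted(blocks, key=lambda blk: morton3(*blk[0]))
    total = sum(weight for _, weight in ordered)
    owner: Dict[Coord, int] = {}
    running = 0.0  # prefix sum of weights along the curve (the "global scan")
    for coord, weight in ordered:
        # Use the midpoint of the block's weight interval to pick its rank.
        rank = int((running + 0.5 * weight) * num_procs / total)
        owner[coord] = min(num_procs - 1, rank)
        running += weight
    return owner


# Eight equally weighted blocks on a 2x2x2 grid, split over four ranks.
grid = [((x, y, z), 1.0) for x in range(2) for y in range(2) for z in range(2)]
print(partition_by_curve(grid, 4))
```

Each rank receives a contiguous segment of the curve, so the load is even by construction, but a segment need not be a compact box, which is one way to read the "greater surface area" trade-off.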


Cello’s improvements to scaling
Dynamically load balancing AMR block data

Dynamic Load Balancing
Use Charm++ load balancing
Space-filling curves
  equally distribute load
  maintain data locality
  no parent-child communication
Use measured performance
  computation
  memory usage


Cello’s improvements to scaling
Parallel tasks

Task definition
  Flexible choice of size / shape
  Reduced size variability
    constant grid Block sizes
    variable subcycling, # particles
[Figure: Enzo vs. Cello]

Task scheduling
  Charm++: asynchronous, data-driven
  Blocks advance when ghosts refreshed
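The rule that Blocks advance when their ghosts are refreshed can be emulated without Charm++. The sketch below is plain Python, not the Charm++ API: a `Block` stands in for a chare, a shuffled list of pending messages stands in for asynchronous delivery, and the periodic 1D chain of blocks is an assumed layout chosen for brevity.

```python
import random
from collections import defaultdict


class Block:
    """Stand-in for one chare: advances a cycle once all neighbor ghosts arrive."""

    def __init__(self, index, neighbors):
        self.index = index
        self.neighbors = neighbors
        self.cycle = 0
        self.ghosts = defaultdict(int)  # cycle -> ghost messages received so far


def run(num_blocks, num_cycles, seed=0):
    rng = random.Random(seed)
    blocks = [Block(i, [(i - 1) % num_blocks, (i + 1) % num_blocks])
              for i in range(num_blocks)]
    # Every block starts by sending its cycle-0 ghosts to both neighbors.
    pending = [(nb, 0) for blk in blocks for nb in blk.neighbors]
    trace = []
    while pending:
        # Deliver messages in random order to mimic asynchronous arrival.
        dest, cycle = pending.pop(rng.randrange(len(pending)))
        blk = blocks[dest]
        blk.ghosts[cycle] += 1
        # No global barrier: a block advances as soon as its own ghosts are in.
        while (blk.cycle < num_cycles
               and blk.ghosts[blk.cycle] == len(blk.neighbors)):
            blk.cycle += 1
            trace.append((blk.index, blk.cycle))
            if blk.cycle < num_cycles:
                pending.extend((nb, blk.cycle) for nb in blk.neighbors)
    return blocks, trace


blocks, trace = run(num_blocks=8, num_cycles=5)
print(trace[:10])
```

The trace shows blocks reaching different cycles at different moments; the only ordering enforced is between a block and its immediate neighbors, in contrast to the level-wide synchronization of the Enzo scheme.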


Cello’s implementation using Charm++
Charm++ program structure

[Figure: "A Charm++ Program", showing Main with Main() and chares ChareA, ChareB, ChareC with entry methods pV(), pW(), pX(), pY(), pZ()]

Charm++ program
  Charm++ objects are chares
  invoke entry methods
  communicate via messages
Charm++ runtime system
  maps chares to processors
  schedules entry methods
  migrates chares to load balance
Additional scalability features
  fault tolerance: checkpoint/restart
  dynamic load balancing


Cello’s implementation using Charm++
Charm++ collections of chares

Chare Arrays
  distributed array of chares
  migratable elements
  flexible indexing
Chare Groups
  one chare per processor (non-migratable)
Chare Nodegroups
  one chare per node (non-migratable)


Cello’s implementation using Charm++
Three implementation strategies

1. Single chare array
     efficient: single access
     restricted Tree depth
   [Figure: one chare array of P, T, and B elements under a Hierarchy (H)]

2. Composite chare arrays
     “tree” of chare arrays
     less restricted Tree depth
   [Figure: separate chare arrays of T, P, and B elements, labeled Forest (F), Tree (T), Patch (P)]

3. Singleton chares
     no depth restrictions
     possible performance issues
     how to generate...?
   [Figure: individual P, T, and B chares]


Recursively generated parallel data structures
Generating a “software network”

1. Start with single Seed
     conceptually complete parallel data structure (p.d.s.)
     with processor range
2. grow() spawns remote Seeds
     conceptually partitioned p.d.s.
     with processor subranges
     Seeds interlinked
3. Recurse to individual elements
     final Seeds complete p.d.s.
     scaffolding: previous Seeds

[Figure: software network, with labels "data structure", "topology", "root seed", and "scaffolding"]
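The three generation steps can be sketched as a recursive range split. This is a sequential, single-process illustration under stated assumptions, not Cello's implementation: the `Seed` class, its fields, and the `fanout` parameter are hypothetical, only `grow()` is named on the slide, and neighbor links are omitted.

```python
from typing import List, Optional


class Seed:
    """Owns a half-open processor range [lo, hi) of the data structure."""

    def __init__(self, lo: int, hi: int, depth: int = 0,
                 parent: Optional["Seed"] = None) -> None:
        self.lo, self.hi, self.depth = lo, hi, depth
        self.parent = parent             # link used for reductions
        self.children: List["Seed"] = []  # links used for distributions

    def grow(self, fanout: int = 2) -> None:
        """Split the range into subranges and recurse until each holds one element."""
        size = self.hi - self.lo
        if size <= 1:
            return  # final Seed: a single element of the data structure
        parts = min(fanout, size)
        for i in range(parts):
            lo = self.lo + (size * i) // parts
            hi = self.lo + (size * (i + 1)) // parts
            child = Seed(lo, hi, self.depth + 1, parent=self)
            self.children.append(child)
            child.grow(fanout)  # in a parallel setting this would run remotely


def leaves(seed: Seed) -> List[Seed]:
    """Collect the final Seeds in range order."""
    if not seed.children:
        return [seed]
    return [leaf for child in seed.children for leaf in leaves(child)]


root = Seed(0, 1024)
root.grow()
final = leaves(root)
print(len(final), max(leaf.depth for leaf in final))
```

Each Seed keeps one parent link and at most `fanout` child links, so links per node stay bounded, and the recursion depth for 1024 elements with a fanout of 2 is 10. If sibling subtrees grew concurrently on their own processors, that depth is what would set the generation time.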


Recursively generated parallel data structures
Using the software network

Three types of Seed links
  parent: reductions
  neighbor: collaborations
  child: distributions
[Figure: a Seed with its parent Seed (reduce link), neighbor Seeds (neighbor links), and child Seeds (broadcast links)]

Scalable
  links per node: O(1)
  generation: ≈ O(log N)
Load balancing
  space-filling curves
  hierarchical
Usable in Cello
  grids / octrees “easy”
  SeedGrid, SeedTree, etc.
Usable with MPI
  send encoded seed creation
  link: process rank + pointer


Summary

                   Enzo               Enzo-P / Cello
Parallelization    MPI                Charm++
SAMR               patch-based        tree-based
AMR structure      replicated         optionally distributed
Remote patches     1544 bytes         16 bytes
Timestepping       level-adaptive     block-adaptive
Block sizes        ×1000 variation    constant
Task scheduling    level-parallel     dependency-driven
Load balancing     patch migration    space-filling curves
Data locality      LB conflict        no LB conflict

http://cello-project.org

NSF PHY-1104819, AST-0808184
