Staggered mesh methods for MHD- and charged particle simulations of astrophysical turbulence
Staggered mesh methods for MHD-
and charged particle simulations of astrophysical turbulence
Åke Nordlund
Niels Bohr Institute for
Astronomy, Physics, and Geophysics
University of Copenhagen
Context examples
Star Formation: The IMF is a result of the statistics of MHD turbulence
Planet Formation: Gravitational fragmentation (or not!)
Stars: Turbulent convection determines structure & BCs
Stellar coronae & chromospheres: Heated by magnetic dissipation
Charged particle contexts
Solar Flares: To what extent is MHD OK? Particle acceleration mechanisms? Reconnection & dissipation?
Gamma-Ray Bursts: Relativistic collisionless shocks? Weibel instability creates B? Synchrotron radiation or jitter radiation?
Overview
MHD methods Godunov-like vs. direct Staggered mesh vs. centered method
Radiative transfer Fast & cheap methods
Charged particle dynamics Methods & examples
Solving the (M)HD Partial Differential Equations (PDEs)
Godunov-type methods: solve the local Riemann problem (approximately)
OK in ideal-gas hydro; MHD has 7 waves, 648 combinations (cf. Schnack's talk)
Constrained Transport (CT)
Gets increasingly messy when adding gravity ... non-ideal equation of state (ionization) ... radiation ...
Direct methods: evaluate right hand sides (RHS)
High-order spatial derivatives & interpolations: spectral, compact, or local stencils
e.g., 6th-order derivatives, 5th-order interpolations (see the stencil sketch below)
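For concreteness, a minimal sketch of such a local stencil; the weights are the standard 6th-order first-derivative coefficients on a staggered grid (they follow from Taylor expansion about the midpoint), but the function name and interface are illustrative, not the Stagger Code's actual API:

  module stagger_sketch
  contains
    ! Sketch: 6th-order first derivative on a staggered 1-D grid.
    ! d(i) approximates df/dx at x(i) + dx/2, i.e. half a zone up.
    function ddup(f, dx) result(d)
      real, intent(in) :: f(:), dx
      real :: d(size(f))
      real, parameter :: a = 75./64., b = -25./384., c = 3./640.
      integer :: i
      d = 0.
      do i = 3, size(f)-3
        d(i) = (a*(f(i+1)-f(i)) + b*(f(i+2)-f(i-1)) + c*(f(i+3)-f(i-2)))/dx
      end do
    end function
  end module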
Step solution forward in time Runge-Kutta type methods (e.g. 3rd order):
Adams-Bashforth, Hyman's method, RK3-2N
RK3-2N saves memory: it uses only F and dF/dt (hence 2N); see the sketch below
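As an illustration of the 2N idea, here is a minimal sketch using the classic Williamson (1980) low-storage RK3 coefficients; the Stagger Code's actual coefficients and loop structure may differ:

  program rk3_2n_demo
    ! Sketch: 2N-storage 3rd-order Runge-Kutta (Williamson 1980).
    ! Only the state f and one scratch slot df are stored: hence "2N".
    implicit none
    real, parameter :: alpha(3) = (/ 0., -5./9., -153./128. /)
    real, parameter :: beta(3)  = (/ 1./3., 15./16., 8./15. /)
    real :: f, df, dt
    integer :: it, sub
    f = 1.0 ; df = 0.0 ; dt = 0.01
    do it = 1, 100                  ! integrate df/dt = -f to t = 1
      do sub = 1, 3
        df = alpha(sub)*df - f      ! fold the new RHS into the scratch slot
        f  = f + beta(sub)*dt*df    ! update the state in place
      end do
    end do
    print *, 'f(1) =', f, '  exact =', exp(-1.0)
  end program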
Which variables? Conservative ones!
Mass, momentum, and internal energy
Internal rather than total energy: consider cases where magnetic or kinetic energy dominates
Total energy is nevertheless well conserved: e.g., less than 0.5% change in a Mach 5 supersonic 3D turbulence test (Wengen)
Dissipation
Working with internal energy also means that all dissipation (kinetic to thermal, magnetic to thermal) must be explicit
Shock- and current-sheet-capturing schemes: the negative part of the velocity divergence captures shocks; the analogous construct for the cross-field velocity captures current sheets (see the sketch below)
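Schematically, the shock-capturing part enters through a diffusion coefficient of roughly the following form; this is a sketch with illustrative coefficients nu1..nu3, not the code's actual routine:

  ! Sketch: scalar diffusion coefficient with shock capture.
  ! cs: fast-mode speed, u: velocity, divu: velocity divergence.
  subroutine diffusion_coeff(cs, u, divu, dx, nu)
    implicit none
    real, intent(in)  :: cs(:), u(:), divu(:), dx
    real, intent(out) :: nu(size(cs))
    real, parameter :: nu1=0.02, nu2=0.02, nu3=0.5  ! illustrative values
    ! max(-divu,0) is nonzero only in converging flow, so the extra
    ! diffusion switches on in shocks and vanishes in expansions.
    nu = dx*( nu1*cs + nu2*abs(u) + nu3*dx*max(-divu, 0.) )
  end subroutine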
Advantages
Much simpler: HD ~ 700 flops/point, MHD ~ 1100 flops/point (6th/5th order in space)
Compare ENZO ~ 10,000 flops/point, FLASH ~ 20,000 flops/point
Trivial to extend: non-ideal equation of state, radiative energy transfer, relativistic
Direct method: disadvantages?
Smaller Courant numbers allowed: 3 sub-step limit ~ 0.6 (runs at 0.5); 2 sub-step limit ~ 0.4 (runs at 0.333)
PPM typically runs at 0.8, i.e. a factor ~1.6 further per full step (unless directionally split)
Comparison of hydro flops per full step: ~2,000 (direct, 3 sub-steps) vs ~10,000 (ENZO/PPM, FLASH/PPM); the direct method stays ahead even after allowing for its shorter steps
One also needs to compare flops per second: cache use?
Perhaps much more diffusive?
2D Implosion Test
A 2D implosion test indicates not: a square domain with a central, rotated low-pressure square generates a thin 'jet' with vortex pairs; the jet moves very slowly, in approximate pressure equilibrium, and is essentially a wrinkled 2D contact discontinuity
See Jim Stone's test pages, with references
Imagine: non-ideal EOS + shocks + radiation + conduction along B
Ionization: large to small across a shock. Radiation: thick to thin across a shock. Heat conduction only along B ...
Riemann solver? Any volunteers? Operator and/or direction split? With anisotropic resistivity & heat conduction?!
Non-ideal EOS + radiation + MHD: Validation?
Godunov-type methods: no exact solutions to check against; difficult to validate
Direct methods: need only check the conservation laws
Mass & momentum are not changed directly; energy conservation is easy to verify
Valid equations + stable methods → valid results
Staggered Mesh Code (Nordlund et al.)
Cell centered mass and thermal energy densities
Face-centered momenta and magnetic fields
Edge-centered electric fields and electric currents
Advantages:
• simplicity; OpenMP (MPI between boxes)
• consistency (e.g., div B = 0)
• conservative; handles extreme Mach numbers
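The div B = 0 consistency follows directly from the staggering: with B face-centered and E edge-centered, the discrete curl and divergence are built from one-dimensional difference operators D_x, D_y, D_z that commute, so

\[
\frac{\partial}{\partial t}\left(\nabla\cdot\mathbf{B}\right)
  = -\,\nabla\cdot\left(\nabla\times\mathbf{E}\right)
  = -\left[(D_x D_y - D_y D_x)E_z + (D_y D_z - D_z D_y)E_x + (D_z D_x - D_x D_z)E_y\right]
  = 0
\]

to machine precision.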
Code Philosophy
Simplicity: F90/95 for ease of development; simplicity minimizes operator count; conservative (per-volume) variables
Can nevertheless handle SNe in the ISM
Accuracy: 6th/5th order in space, 3rd order in time
Speed: about 650,000 zone-updates/sec on a laptop
Code Development Stages
1. Simplest possible code: dynamic allocation
No need to recompile for different resolutions; F95 array-valued function calls
P4 speed is the SAME as with subroutine calls
2. SMP/OMP version: OpenMP directives added
Uses auto-parallelization and/or OpenMP on SUN, SGI & IBM
3. MPI version for clusters: implemented with CACTUS (see www.cactuscode.org)
Scales to an arbitrary number of CPUs
CACTUS Provides
"flesh" (application interface)
Handles cluster communication, e.g. MPI (but not limited to MPI)
Handles GRID computing (presently experimental)
Handles grid refinement and adaptive meshes (AMR not yet available)
"thorns" (applications and services)
Parallel I/O; parameter control (live!); diagnostic output
X-Y plots, JPEG slices, isosurfaces
MHD (mhd.f90)

\[ \mathbf{J} = \nabla\times\mathbf{B} , \qquad \mathbf{E} = -\,\mathbf{u}\times\mathbf{B} + \eta\,\mathbf{J} , \]
\[ \mathbf{F} = \mathbf{J}\times\mathbf{B} , \qquad Q_{\mathrm{Joule}} = \eta J^2 , \qquad \frac{\partial\mathbf{B}}{\partial t} = -\,\nabla\times\mathbf{E} . \]
Example Code Induction Equation
stagger-code/src-simple: Makefile (with includes for OS- and host-dependencies); subdirectories with optional code:
INITIAL (initial values) BOUNDARIES EOS (equation of state) FORCING EXPLOSIONS COOLING EXPERIMENTS
stagger-code/src (SMP production) Ditto Makefile and subdirs
CACTUS_Stagger_Code Code becomes a ”thorn” in the CACTUS ”flesh”
Simple version (array-valued function calls):

  !----------------------------------------------------
  ! Magnetic field's time derivative, dBdt = - curl(E)
  !----------------------------------------------------
    dBxdt = dBxdt + ddzup(Ey) - ddyup(Ez)
    dBydt = dBydt + ddxup(Ez) - ddzup(Ex)
    dBzdt = dBzdt + ddyup(Ex) - ddxup(Ey)

SMP version (scratch arrays, OpenMP over planes):

  !----------------------------------------------------
  ! Magnetic field's time derivative, dBdt = - curl(E)
  !----------------------------------------------------
    call ddzup_set(Ey, scr1) ; call ddyup_set(Ez, scr2)
  !$omp parallel do private(iz)
    do iz=1,mz
      dBxdt(:,:,iz) = dBxdt(:,:,iz) + scr1(:,:,iz) - scr2(:,:,iz)
    end do
    call ddxup_set(Ez, scr1) ; call ddzup_set(Ex, scr2)
  !$omp parallel do private(iz)
    do iz=1,mz
      dBydt(:,:,iz) = dBydt(:,:,iz) + scr1(:,:,iz) - scr2(:,:,iz)
    end do
    call ddyup_set(Ex, scr1) ; call ddxup_set(Ey, scr2)
  !$omp parallel do private(iz)
    do iz=1,mz
      dBzdt(:,:,iz) = dBzdt(:,:,iz) + scr1(:,:,iz) - scr2(:,:,iz)
    end do
Plain F95 version:

  SUBROUTINE mhd(eta,Ux,Uy,Uz,Bx,By,Bz,dpxdt,dpydt,dpzdt,dedt,dBxdt,dBydt,dBzdt)
    USE params
    USE stagger
    real, dimension(mx,my,mz) :: &
      eta,Ux,Uy,Uz,Bx,By,Bz,dpxdt,dpydt,dpzdt,dedt,dBxdt,dBydt,dBzdt
  !hpf$ distribute (*,*,block) :: &
  !hpf$   eta,Ux,Uy,Uz,Bx,By,Bz,dpxdt,dpydt,dpzdt,dedt,dBxdt,dBydt,dBzdt
    real, allocatable, dimension(:,:,:) :: &
      Jx,Jy,Jz,Ex,Ey,Ez, &
      Bx_y,Bx_z,By_x,By_z,Bz_x,Bz_y,scr1,scr2
  !hpf$ distribute (*,*,block) :: &
  !hpf$   Jx,Jy,Jz,Ex,Ey,Ez, &
  !hpf$   Bx_y,Bx_z,By_x,By_z,Bz_x,Bz_y,scr1,scr2
CACTUS version (a "thorn"):

  SUBROUTINE mhd(CCTK_ARGUMENTS)
    USE hd_params
    USE stagger_params
    USE stagger
    IMPLICIT NONE
    DECLARE_CCTK_ARGUMENTS
    DECLARE_CCTK_PARAMETERS
    DECLARE_CCTK_FUNCTIONS
    CCTK_REAL, allocatable, dimension(:,:,:) :: &
      Jx, Jy, Jz, Ex, Ey, Ez, &
      Bx_y, Bx_z, By_x, By_z, Bz_x, Bz_y
Physics (staggered mesh code)
Equation of state. Qualitative: H+He+Me. Accurate: lookup table
Opacity. Qualitative: H-minus. Accurate: lookup table
Radiative energy transfer. Qualitative: vertical + a few (4) rays. Accurate: comprehensive set of rays (see the table-lookup sketch below)
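The 'accurate' branches amount to table interpolation; a minimal sketch, assuming a table tabulated uniformly in (ln rho, ln e) (the actual table variables and routine names may differ):

  ! Sketch: bilinear equation-of-state table lookup.
  subroutine eos_lookup(tbl, lnr0, dlnr, lne0, dlne, lnr, lne, val)
    implicit none
    real, intent(in)  :: tbl(:,:), lnr0, dlnr, lne0, dlne, lnr, lne
    real, intent(out) :: val
    integer :: i, j
    real :: p, q
    i = min(max(int((lnr-lnr0)/dlnr)+1, 1), size(tbl,1)-1)
    j = min(max(int((lne-lne0)/dlne)+1, 1), size(tbl,2)-1)
    p = (lnr-lnr0)/dlnr - (i-1)        ! fractional position in the cell
    q = (lne-lne0)/dlne - (j-1)
    val = (1.-p)*(1.-q)*tbl(i,j)   + p*(1.-q)*tbl(i+1,j) &
        + (1.-p)*q    *tbl(i,j+1) + p*q    *tbl(i+1,j+1)
  end subroutine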
Staggered Mesh Code Details
Dynamic memory allocation: any grid size; no recompilation
Parallelized: shared memory with OpenMP (and auto-)parallelization; MPI directly (Galsgaard) or via CACTUS
Organization: Makefile includes
Experiments: EXPERIMENTS/$(EXPERIMENT).mkf
Selectable features: eq. of state, cooling & conduction, boundaries
OS and compiler dependencies hidden: OS/$(MACHTYPE).f90, OS/$(HOST).mkf, OS/$(COMPILER).mkf
Radiative Transfer Requirements
Comprehensive: need at least 20-25 (double) rays, i.e. 4-5 frequency bins (recent paper) and at least 5 directions
Speed issue: would like 25 rays to add negligible time
Benchmark Timing Results (microseconds/point/substep; 128x105x128)

                                       Pentium 4, 2 GHz    Alpha EV7, 1.3 GHz
                                       (dcsc.sdu.dk)       (hyades)
                                       each    accum       each    accum
  mass+momentum    fixed mesh          1.80    1.80        1.57    1.57
                   variable mesh       -       -           -       -
  mhd              fixed mesh          1.01    2.81        0.93    2.50
                   variable mesh       -       -           -       -
  energy           fixed mesh          0.42    3.23        0.37    2.87
                   variable mesh       -       -           -       -
  equation of state  ideal             -       3.23        -       2.87
                   H+He subroutine     0.98    -           -       -
                   H+He lookup table   0.11    3.33        -       2.87
  opacity          H-minus             0.20    -           -       -
                   lookup table        0.09    3.42        -       2.87
  radiative        Feautrier           0.026   132 rays    0.046   63 rays
  transfer         Hermite splines     0.027   129 rays    0.047   61 rays
  (per ray)        Integral            0.045   76 rays     0.080   36 rays

(the 'rays' column evidently gives the number of rays whose added cost would equal the accumulated cost of the rest of the code)
Altix Itanium-2 Scaling
Applications
Star Formation
Planet Formation
Stars
Stellar coronae & chromospheres
Star Formation
Nordlund & Padoan 2002
Key feature: intermittency!
What does it mean in this context? Low-density, high-velocity gas fills most of the volume! High-density, low-velocity features occupy very little space, but carry much of the mass!
How does it influence star formation? It greatly simplifies understanding it!
Inertial dynamics in most of the volume!
Collapsing features are relatively well defined!
Turbulence Diagnostics of Molecular Clouds
Padoan, Boldyrev, Langer & Nordlund, ApJ 2002 (astro-ph/0207568)
Numerical (250³ simulation) & Analytical IMF
Padoan & Nordlund (astro-ph/0205019)
Low Mass IMF
Padoan & Nordlund, ApJ 2004 (astro-ph/0205019)
Planet formation; gas collapse
Coronal Heating: Initial Magnetic Field
Potential extrapolation of AR 9114
Coronal Heating: TRACE 195 Loops
Current sheet hierarchy
Current sheet hierarchy: close-up
Scan through hierarchy: dissipation
Hm, the dissipation looks pretty intermittent: large, nice, empty areas to ignore with an AMR code, right?
Note that all features rotate as we scan through; this means that these current sheets are all curved in the 3rd dimension.
Electric current J. This is still the dissipation; let's replace it by the electric current, as a check!
Hm, not quite as empty, but the electric current is at least mostly weak, right?
[Figure: J vs. log(J)]
So, let's replace the current with the log of the current, to see the levels of the hierarchy better!
Log of the electric current
Not really much to win with AMR here, if we want to cover the hierarchy!
Solar & stellar surface MHD
Faculae
Sunspots
Chromospheres
Coronae
Faculae: Center-to-Limb Variation
Radiative transfer
'Exact' radiative energy transfer is not expensive: it allows up to ~100 rays per point for 2x the CPU-time, and parallelizes well (with MPI or OpenMP)
Reasons for not using Flux-Limited Diffusion: it is not the right answer (e.g., missing shadows), and it is not cheaper
Radiative Transfer: Significance
Cosmology: End of Dark Ages
Star Formation Feedback: evaporation of molecular clouds Dense phases of the collapse
Planet Formation External illumination of discs Structure and cooling of discs
Stellar surfaces Surface cooling: the driver of convection
Radiative transfer methods
Fast local solvers: Feautrier schemes, (often) the fastest; optimized integral solutions, the simplest (a Feautrier sketch follows below)
A new approach to parallelizing RT: solve within each domain, with no boundary radiation; then propagate and accumulate solutions globally
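A minimal sketch of the Feautrier idea (illustrative discretization: uniform dtau, simple boundary conditions): with P = (I+ + I-)/2, the two directions of the transfer equation combine into d2P/dtau2 = P - S, which a tridiagonal (Thomas) elimination solves in O(n):

  ! Sketch: Feautrier solve of d2P/dtau2 = P - S on a uniform tau grid.
  ! BCs: no incoming radiation at the surface (dP/dtau = P), P -> S at depth.
  subroutine feautrier(dtau, S, P)
    implicit none
    real, intent(in)  :: dtau, S(:)
    real, intent(out) :: P(size(S))
    real, dimension(size(S)) :: a, b, c, r
    integer :: k, n
    n = size(S)
    do k = 2, n-1                     ! interior second-order discretization
      a(k) = -1./dtau**2
      b(k) = 1. + 2./dtau**2
      c(k) = -1./dtau**2
      r(k) = S(k)
    end do
    b(1) = 1. + 1./dtau ; c(1) = -1./dtau ; r(1) = 0.  ! surface BC
    a(n) = 0. ; b(n) = 1. ; r(n) = S(n)                ! depth BC
    do k = 2, n                       ! forward elimination
      b(k) = b(k) - a(k)*c(k-1)/b(k-1)
      r(k) = r(k) - a(k)*r(k-1)/b(k-1)
    end do
    P(n) = r(n)/b(n)
    do k = n-1, 1, -1                 ! back substitution
      P(k) = (r(k) - c(k)*P(k+1))/b(k)
    end do
  end subroutine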
Moments of the radiation field
Give up, adopting some approximation? Flux Limited Diffusion
Did someone say ”shadows”??
Or, solve as it stands? Fast solvers Parallelize
Did someone say ”difficult”?
Phew, 7 variables!?!
Rays Through Each Grid Point
Interpolate source function to rays in each plane
How many rays are needed?
Depends entirely on the geometry
For stellar surfaces, surprisingly few! 1 vertical + 4 slanted, rotating
1% accuracy in the mean Q; a few % in the fluctuating Q
[Figure: 8 rays vs. 48 rays; see plots]
Radiative transfer steps
Interpolate source function(s) and opacity Simple translation of planes – fast
Solve along rays May be done in parallel (distribute rays)
Interpolate back to rectangular mesh Inverse of 1st interpolation (negative shift)
Add up: integrate over angles (and possibly frequencies or bins); see the sketch below
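In outline, the four steps for one slanted direction might look as follows; a toy sketch only, in which integer plane shifts (cshift) stand in for the code's 5th-order interpolation and the ray solver is a placeholder:

  program ray_sweep_demo
    ! Sketch of the four radiative-transfer steps for one slanted ray
    ! direction: tilt, solve, shift back, accumulate. Illustrative only.
    implicit none
    integer, parameter :: mx=16, my=16, mz=16, sx=1  ! sx: x-offset per plane
    real :: S(mx,my,mz), q(mx,my,mz), Q(mx,my,mz), wmu
    integer :: k
    call random_number(S) ; Q = 0. ; wmu = 0.25      ! wmu: quadrature weight
    do k = 1, mz
      S(:,:,k) = cshift(S(:,:,k), shift=k*sx, dim=1)   ! 1. tilt planes onto ray
    end do
    q = S               ! 2. placeholder for the Feautrier/integral solver
    do k = 1, mz
      q(:,:,k) = cshift(q(:,:,k), shift=-k*sx, dim=1)  ! 3. shift back to mesh
    end do
    Q = Q + wmu*q                                      ! 4. angular quadrature
  end program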
Along straight rays, solve

\[ \frac{dI}{d\tau} = I - S . \]

Or actually, solve directly for the cooling, q \equiv I - S:

\[ \frac{dq}{d\tau} = q - \frac{dS}{d\tau} . \]
[Figures: source function (input); new source function (input)]
Formal (and useful) solutions
For simplicity, let's consider the standard formulation

\[ \frac{dI}{d\tau} = I - S , \]

which has the formal solution

\[ I(\tau) = I(\tau_0)\, e^{-|\tau-\tau_0|} + \int_{\tau_0}^{\tau} S(\tau')\, e^{-|\tau-\tau'|}\, d\tau' . \]
Doubly useful
As a direct method: very accurate, if S(τ) is piecewise parabolic; the slowness of exp() can be largely avoided (see the sketch below)
As a basis for domain decomposition: add 'remote' contributions separately!
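A sketch of one step of the integral method, with S taken linear in tau across the interval for brevity (the code uses piecewise parabolic S); note that only one exp() per step is needed:

  ! Sketch: one ray step of the integral solution, S linear in tau.
  subroutine integral_step(dtau, S0, S1, I)
    implicit none
    real, intent(in)    :: dtau, S0, S1  ! optical depth step; S at both ends
    real, intent(inout) :: I             ! intensity, updated in place
    real :: ex, ex1
    ex  = exp(-dtau)                     ! the only exp() in the step
    ex1 = 1. - ex
    ! exact integral of (S0 + (S1-S0)*t/dtau) * exp(t-dtau) over [0,dtau]
    I = I*ex + S0*ex1 + (S1 - S0)*(dtau - ex1)/dtau
  end subroutine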
Direct solution, integral form
How to parallelize (Heinemann, Dobler, Nordlund & Brandenburg, in prep.)
Solve for the intensity generated internally in each domain, separately and in parallel
Then propagate and accumulate the boundary intensities, modified only by trivial optical depth factors
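Per ray and domain, the accumulation amounts to

\[
I_{\mathrm{out}} = I_{\mathrm{intrinsic}} + I_{\mathrm{in}}\, e^{-\tau_{\mathrm{domain}}} ,
\]

where \( \tau_{\mathrm{domain}} \) is the optical depth of the domain along the ray: each processor first computes \( I_{\mathrm{intrinsic}} \) with zero incoming intensity, and the incoming boundary intensities are then swept across processors, attenuated only by the precomputed factors \( e^{-\tau_{\mathrm{domain}}} \).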
Putting it together
The Transfer Equation & Parallelization
[Animation: the analytic solution, with domains laid out along the ray direction, one per processor. Each processor first performs the intrinsic calculation within its own domain; successive communication steps then sweep the accumulated boundary intensities downstream along the ray, processor by processor, finishing with an intrinsic calculation.]
Pencil Code (Brandenburg et al.): CPU-time per ray-point
[Plot; ignore the outliers (bad node distribution)]
About 160 nsec/pt/ray; can be improved by a factor of 4-5!
CPU-time per point (Pencil Code)
Timing Results, Stagger Code (microseconds/point/substep; 128x105x128)

                                       Pentium 4, 2 GHz    Alpha EV7, 1.3 GHz
                                       (dcsc.sdu.dk)       (hyades)
                                       each    accum       each    accum
  mass+momentum    fixed mesh          1.80    1.80        1.57    1.57
  mhd              fixed mesh          1.01    2.81        0.93    2.50
  energy           fixed mesh          0.42    3.23        0.37    2.87
  equation of state  lookup table      0.11    3.33        -       2.87
  opacity          lookup table        0.09    3.42        -       2.87
  radiative        Feautrier           0.026   132 rays    0.046   63 rays
  transfer         Hermite             0.027   129 rays    0.047   61 rays
  (per ray)        Integral            0.045   76 rays     0.080   36 rays
Radiative Transfer Conclusions
The methods are conceptually simple, fast, robust, and scale well in parallel environments
Collisionless shocks
Not an artist's rendering! Shows electrical current filaments in a collisionless shock simulation
with ~10⁹ particles and ~3×10⁹ mesh zones
Particle-in-Cell (PIC) code
Steps:
Relativistic particle move, using B & E; uses relativistic momenta; about 3×10⁵ particle updates/sec on a P4 laptop; parallelizes nearly linearly (OpenMP on Altix)
Gather fields: nᵢ, nₑ, jᵢ, jₑ; 2nd order, Triangular Shaped Clouds (TSC; see the weight sketch below)
Push B & E, staggered in space and time (a 1-D sketch follows after the Maxwell equations below)
Electrostatic solver
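For reference, the standard 1-D TSC weights used in such a gather (the 3-D weight is the product of three 1-D weights); a minimal sketch:

  ! Sketch: 1-D Triangular Shaped Cloud (TSC) weights for gathering a
  ! mesh field at a particle position x (in cell units).
  subroutine tsc_weights(x, i, w)
    implicit none
    real, intent(in)    :: x        ! particle position in units of dx
    integer, intent(out) :: i       ! index of the nearest cell center
    real, intent(out)   :: w(-1:1)  ! weights for cells i-1, i, i+1
    real :: d
    i = nint(x)
    d = x - i                       ! offset from the cell center, |d| <= 1/2
    w(-1) = 0.5*(0.5 - d)**2
    w( 0) = 0.75 - d**2
    w(+1) = 0.5*(0.5 + d)**2        ! the three weights sum to 1
  end subroutine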
Based on an original 2-D, non-relativistic code by Michael Hesse, GSFC
3-D, relativistic version developed by Frederiksen, Haugbølle, Hededal & Nordlund, Copenhagen
Use of Maxwell's Equations in the code

\[ \nabla\cdot\mathbf{E} = \rho/\epsilon_0 , \qquad \nabla\cdot\mathbf{B} = 0 , \]
\[ \nabla\times\mathbf{E} = -\,\frac{\partial\mathbf{B}}{\partial t} , \qquad \nabla\times\mathbf{B} = \mu_0\,\mathbf{J} + \frac{1}{c^2}\,\frac{\partial\mathbf{E}}{\partial t} . \]
[Figure: fields on the mesh; sampled particles]
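The staggered-in-space-and-time field push is leapfrog-like; a minimal 1-D sketch in units with c = ε₀ = 1 (a transverse wave: Ey on integer points and times, Bz on half points and half times), not the actual 3-D routine:

  ! Sketch: leapfrog update of E and B on a 1-D staggered (Yee-type) mesh.
  subroutine push_fields(Ey, Bz, Jy, dt, dx)
    implicit none
    real, intent(inout) :: Ey(:), Bz(:)
    real, intent(in)    :: Jy(:), dt, dx
    integer :: i, n
    n = size(Ey)
    do i = 1, n-1
      Bz(i) = Bz(i) - dt*(Ey(i+1) - Ey(i))/dx             ! dB/dt = -curl E
    end do
    do i = 2, n
      Ey(i) = Ey(i) - dt*(Bz(i) - Bz(i-1))/dx - dt*Jy(i)  ! dE/dt = curl B - J
    end do
  end subroutine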
Basic tests: wave propagation, etc.
Example: Single electron
Electron & proton circling in separate orbits; relativistic, γ = 10
NOTE: resolution implications of high γ!
Far field: Synchrotron radiation
The Weibel Instability: well known and understood
First principles, anisotropic PDFs: Weibel 1959, Fried 1959, Yoon & Davidson 1987
Numerical studies, electron-positron, 2-D: Wallace & Epperlein 1991, Yang et al. 1994, Kazimura et al. 1998 (ApJ)
Numerical studies, relativistic, ion-electron: Califano et al. 1997, '98, '99, '00, '01, '02, ...
Application to GRBs: Medvedev & Loeb 1999, Medvedev 2000, '01, ...
The Weibel Instability (two-stream)
(Weibel 1959, Medvedev & Loeb 1999)
Experiments: 3-D
Of the order 200×200×800 mesh, ~10⁹ particles
Cold beam from the left; carries negligible magnetic field
Hits denser plasma, initially field-free; Weibel instability → B, E
So, what is this?
A Weibel-like instability at high γ: initial scales ~ the skin depth; conventional expectation: restricted to the skin depth
Generated fields propagate at v ~ c; fluctuations 'ride' on the beam; losses are supported by the beam population; scales grow down the line!!
[Figure: field structure along and across the beam]
Electron and ion current channels
Coherent Structures in Collisionless Shocks
Ion and electron structures
A non-Fermi acceleration scenario
Hededal, Haugbølle, Frederiksen and Nordlund (2004)astro-ph/0408558
Electrons are accelerated instantaneously inside the Debye cylinder surrounding the ion current channels.
Electron path near ion channel
CH note: 10%-40% optically dark (HETE, BeppoSAX); 50% detected in radio.
Hededal, Haugbølle, Frederiksen and Nordlund (2004), astro-ph/0408558
Perspectives for the future
Star Formation: Is turbulent fragmentation the main mechanism? How important are magnetic fields for the IMF? Include radiative transfer during collapse! Magnetic fields are also important during collapse!
Planet Formation: RT important for initial conditions ... as well as for disc structure and cooling
Stellar surfaces: include approximate RT in simulations of chromospheres
Solar Plans
Convection: from granulation to supergranulation scales
Sunspots, Faculae
Chromosphere
Corona
[Figure: domain sizes 20 Mm, 30 Mm, 50 Mm]