Staggered mesh methods for MHD- and charged particle simulations of astrophysical turbulence
Staggered mesh methods for MHD-
and charged particle simulations of astrophysical turbulence
Åke Nordlund
Niels Bohr Institute for
Astronomy, Physics, and Geophysics
University of Copenhagen
Context examples
Star Formation: The IMF is a result of the statistics of MHD turbulence
Planet Formation: Gravitational fragmentation (or not!)
Stars: Turbulent convection determines structure & BCs
Stellar coronae & chromospheres: Heated by magnetic dissipation
Charged particle contexts
Solar Flares: To what extent is MHD OK? Particle acceleration mechanisms? Reconnection & dissipation?
Gamma-Ray Bursts: Relativistic collisionless shocks? Weibel instability creates B? Synchrotron radiation or jitter radiation?
Overview
MHD methods Godunov-like vs. direct Staggered mesh vs. centered method
Radiative transfer Fast & cheap methods
Charged particle dynamics Methods & examples
Solving the (M)HD Partial Differential Equations (PDEs)
Godunov-type methods: solve the local Riemann problem (approximately)
OK in ideal-gas hydro; MHD has 7 waves, 648 combinations (cf. Schnack's talk)
Constrained Transport (CT)
Gets increasingly messy when adding gravity ... non-ideal equation of state (ionization) ... radiation ...
Direct methods: evaluate right hand sides (RHS)
High-order spatial derivatives & interpolations: spectral, compact, or local stencils
e.g., 6th-order derivatives, 5th-order interpolations (see the stencil sketch below)
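For concreteness, a minimal sketch of such a local stencil; the weights are the standard 6th-order first-derivative coefficients on a staggered grid (they follow from Taylor expansion about the midpoint), but the function name and interface are illustrative, not the Stagger Code's actual API:

  module stagger_sketch
  contains
    ! Sketch: 6th-order first derivative on a staggered 1-D grid.
    ! d(i) approximates df/dx at x(i) + dx/2, i.e. half a zone up.
    function ddup(f, dx) result(d)
      real, intent(in) :: f(:), dx
      real :: d(size(f))
      real, parameter :: a = 75./64., b = -25./384., c = 3./640.
      integer :: i
      d = 0.
      do i = 3, size(f)-3
        d(i) = (a*(f(i+1)-f(i)) + b*(f(i+2)-f(i-1)) + c*(f(i+3)-f(i-2)))/dx
      end do
    end function
  end module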
Step solution forward in time Runge-Kutta type methods (e.g. 3rd order):
Adams-Bashforth, Hyman's method, RK3-2N
RK3-2N saves memory: it uses only F and dF/dt (hence 2N); see the sketch below
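As an illustration of the 2N idea, here is a minimal sketch using the classic Williamson (1980) low-storage RK3 coefficients; the Stagger Code's actual coefficients and loop structure may differ:

  program rk3_2n_demo
    ! Sketch: 2N-storage 3rd-order Runge-Kutta (Williamson 1980).
    ! Only the state f and one scratch slot df are stored: hence "2N".
    implicit none
    real, parameter :: alpha(3) = (/ 0., -5./9., -153./128. /)
    real, parameter :: beta(3)  = (/ 1./3., 15./16., 8./15. /)
    real :: f, df, dt
    integer :: it, sub
    f = 1.0 ; df = 0.0 ; dt = 0.01
    do it = 1, 100                  ! integrate df/dt = -f to t = 1
      do sub = 1, 3
        df = alpha(sub)*df - f      ! fold the new RHS into the scratch slot
        f  = f + beta(sub)*dt*df    ! update the state in place
      end do
    end do
    print *, 'f(1) =', f, '  exact =', exp(-1.0)
  end program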
Which variables? Conservative ones!
Mass, momentum, and internal energy
Internal rather than total energy: consider cases where magnetic or kinetic energy dominates
Total energy is nevertheless well conserved: e.g., less than 0.5% change in a Mach 5 supersonic 3D turbulence test (Wengen)
Dissipation
Working with internal energy also means that all dissipation (kinetic to thermal, magnetic to thermal) must be explicit
Shock- and current-sheet-capturing schemes: the negative part of the velocity divergence captures shocks; the analogous construct for the cross-field velocity captures current sheets (see the sketch below)
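Schematically, the shock-capturing part enters through a diffusion coefficient of roughly the following form; this is a sketch with illustrative coefficients nu1..nu3, not the code's actual routine:

  ! Sketch: scalar diffusion coefficient with shock capture.
  ! cs: fast-mode speed, u: velocity, divu: velocity divergence.
  subroutine diffusion_coeff(cs, u, divu, dx, nu)
    implicit none
    real, intent(in)  :: cs(:), u(:), divu(:), dx
    real, intent(out) :: nu(size(cs))
    real, parameter :: nu1=0.02, nu2=0.02, nu3=0.5  ! illustrative values
    ! max(-divu,0) is nonzero only in converging flow, so the extra
    ! diffusion switches on in shocks and vanishes in expansions.
    nu = dx*( nu1*cs + nu2*abs(u) + nu3*dx*max(-divu, 0.) )
  end subroutine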
Advantages
Much simpler: HD ~ 700 flops/point, MHD ~ 1100 flops/point (6th/5th order in space)
Compare ENZO ~ 10,000 flops/point, FLASH ~ 20,000 flops/point
Trivial to extend: non-ideal equation of state, radiative energy transfer, relativistic
Direct method: disadvantages?
Smaller Courant numbers allowed: 3 sub-step limit ~ 0.6 (runs at 0.5); 2 sub-step limit ~ 0.4 (runs at 0.333)
PPM typically runs at 0.8, i.e. a factor ~1.6 further per full step (unless directionally split)
Comparison of hydro flops per full step: ~2,000 (direct, 3 sub-steps) vs ~10,000 (ENZO/PPM, FLASH/PPM); the direct method stays ahead even after allowing for its shorter steps
One also needs to compare flops per second: cache use?
Perhaps much more diffusive?
2D Implosion Test
A 2D implosion test indicates not: a square domain with a central, rotated low-pressure square generates a thin 'jet' with vortex pairs; the jet moves very slowly, in approximate pressure equilibrium, and is essentially a wrinkled 2D contact discontinuity
See Jim Stone's test pages, with references
Imagine: non-ideal EOS + shocks + radiation + conduction along B
Ionization: large to small across a shock. Radiation: thick to thin across a shock. Heat conduction only along B ...
Riemann solver? Any volunteers? Operator and/or direction split? With anisotropic resistivity & heat conduction?!
Non-ideal EOS + radiation + MHD: Validation?
Godunov-type methods: no exact solutions to check against; difficult to validate
Direct methods: need only check the conservation laws
Mass & momentum are not changed directly; energy conservation is easy to verify
Valid equations + stable methods → valid results
Staggered Mesh Code (Nordlund et al.)
Cell centered mass and thermal energy densities
Face-centered momenta and magnetic fields
Edge-centered electric fields and electric currents
Advantages:
• simplicity; OpenMP (MPI between boxes)
• consistency (e.g., div B = 0)
• conservative; handles extreme Mach numbers
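The div B = 0 consistency follows directly from the staggering: with B face-centered and E edge-centered, the discrete curl and divergence are built from one-dimensional difference operators D_x, D_y, D_z that commute, so

\[
\frac{\partial}{\partial t}\left(\nabla\cdot\mathbf{B}\right)
  = -\,\nabla\cdot\left(\nabla\times\mathbf{E}\right)
  = -\left[(D_x D_y - D_y D_x)E_z + (D_y D_z - D_z D_y)E_x + (D_z D_x - D_x D_z)E_y\right]
  = 0
\]

to machine precision.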
Code Philosophy
Simplicity: F90/95 for ease of development; simplicity minimizes operator count; conservative (per-volume) variables
Can nevertheless handle SNe in the ISM
Accuracy: 6th/5th order in space, 3rd order in time
Speed: about 650,000 zone-updates/sec on a laptop
Code Development Stages
1. Simplest possible code: dynamic allocation
No need to recompile for different resolutions; F95 array-valued function calls
P4 speed is the SAME as with subroutine calls
2. SMP/OMP version: OpenMP directives added
Uses auto-parallelization and/or OpenMP on SUN, SGI & IBM
3. MPI version for clusters: implemented with CACTUS (see www.cactuscode.org)
Scales to an arbitrary number of CPUs
CACTUS Provides
"flesh" (application interface)
Handles cluster communication, e.g. MPI (but not limited to MPI)
Handles GRID computing (presently experimental)
Handles grid refinement and adaptive meshes (AMR not yet available)
"thorns" (applications and services)
Parallel I/O; parameter control (live!); diagnostic output
X-Y plots, JPEG slices, isosurfaces
MHD (mhd.f90)

\[ \mathbf{J} = \nabla\times\mathbf{B} , \qquad \mathbf{E} = -\,\mathbf{u}\times\mathbf{B} + \eta\,\mathbf{J} , \]
\[ \mathbf{F} = \mathbf{J}\times\mathbf{B} , \qquad Q_{\mathrm{Joule}} = \eta J^2 , \qquad \frac{\partial\mathbf{B}}{\partial t} = -\,\nabla\times\mathbf{E} . \]
Example Code Induction Equation
stagger-code/src-simple: Makefile (with includes for OS- and host-dependencies); subdirectories with optional code:
INITIAL (initial values) BOUNDARIES EOS (equation of state) FORCING EXPLOSIONS COOLING EXPERIMENTS
stagger-code/src (SMP production) Ditto Makefile and subdirs
CACTUS_Stagger_Code Code becomes a ”thorn” in the CACTUS ”flesh”
Simple version (array-valued function calls):

  !----------------------------------------------------
  ! Magnetic field's time derivative, dBdt = - curl(E)
  !----------------------------------------------------
    dBxdt = dBxdt + ddzup(Ey) - ddyup(Ez)
    dBydt = dBydt + ddxup(Ez) - ddzup(Ex)
    dBzdt = dBzdt + ddyup(Ex) - ddxup(Ey)

SMP version (scratch arrays, OpenMP over planes):

  !----------------------------------------------------
  ! Magnetic field's time derivative, dBdt = - curl(E)
  !----------------------------------------------------
    call ddzup_set(Ey, scr1) ; call ddyup_set(Ez, scr2)
  !$omp parallel do private(iz)
    do iz=1,mz
      dBxdt(:,:,iz) = dBxdt(:,:,iz) + scr1(:,:,iz) - scr2(:,:,iz)
    end do
    call ddxup_set(Ez, scr1) ; call ddzup_set(Ex, scr2)
  !$omp parallel do private(iz)
    do iz=1,mz
      dBydt(:,:,iz) = dBydt(:,:,iz) + scr1(:,:,iz) - scr2(:,:,iz)
    end do
    call ddyup_set(Ex, scr1) ; call ddxup_set(Ey, scr2)
  !$omp parallel do private(iz)
    do iz=1,mz
      dBzdt(:,:,iz) = dBzdt(:,:,iz) + scr1(:,:,iz) - scr2(:,:,iz)
    end do
Plain F95 version:

  SUBROUTINE mhd(eta,Ux,Uy,Uz,Bx,By,Bz,dpxdt,dpydt,dpzdt,dedt,dBxdt,dBydt,dBzdt)
    USE params
    USE stagger
    real, dimension(mx,my,mz) :: &
      eta,Ux,Uy,Uz,Bx,By,Bz,dpxdt,dpydt,dpzdt,dedt,dBxdt,dBydt,dBzdt
  !hpf$ distribute (*,*,block) :: &
  !hpf$   eta,Ux,Uy,Uz,Bx,By,Bz,dpxdt,dpydt,dpzdt,dedt,dBxdt,dBydt,dBzdt
    real, allocatable, dimension(:,:,:) :: &
      Jx,Jy,Jz,Ex,Ey,Ez, &
      Bx_y,Bx_z,By_x,By_z,Bz_x,Bz_y,scr1,scr2
  !hpf$ distribute (*,*,block) :: &
  !hpf$   Jx,Jy,Jz,Ex,Ey,Ez, &
  !hpf$   Bx_y,Bx_z,By_x,By_z,Bz_x,Bz_y,scr1,scr2
CACTUS version (a "thorn"):

  SUBROUTINE mhd(CCTK_ARGUMENTS)
    USE hd_params
    USE stagger_params
    USE stagger
    IMPLICIT NONE
    DECLARE_CCTK_ARGUMENTS
    DECLARE_CCTK_PARAMETERS
    DECLARE_CCTK_FUNCTIONS
    CCTK_REAL, allocatable, dimension(:,:,:) :: &
      Jx, Jy, Jz, Ex, Ey, Ez, &
      Bx_y, Bx_z, By_x, By_z, Bz_x, Bz_y
Physics (staggered mesh code)
Equation of state. Qualitative: H+He+Me. Accurate: lookup table
Opacity. Qualitative: H-minus. Accurate: lookup table
Radiative energy transfer. Qualitative: vertical + a few (4) rays. Accurate: comprehensive set of rays (see the table-lookup sketch below)
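The 'accurate' branches amount to table interpolation; a minimal sketch, assuming a table tabulated uniformly in (ln rho, ln e) (the actual table variables and routine names may differ):

  ! Sketch: bilinear equation-of-state table lookup.
  subroutine eos_lookup(tbl, lnr0, dlnr, lne0, dlne, lnr, lne, val)
    implicit none
    real, intent(in)  :: tbl(:,:), lnr0, dlnr, lne0, dlne, lnr, lne
    real, intent(out) :: val
    integer :: i, j
    real :: p, q
    i = min(max(int((lnr-lnr0)/dlnr)+1, 1), size(tbl,1)-1)
    j = min(max(int((lne-lne0)/dlne)+1, 1), size(tbl,2)-1)
    p = (lnr-lnr0)/dlnr - (i-1)        ! fractional position in the cell
    q = (lne-lne0)/dlne - (j-1)
    val = (1.-p)*(1.-q)*tbl(i,j)   + p*(1.-q)*tbl(i+1,j) &
        + (1.-p)*q    *tbl(i,j+1) + p*q    *tbl(i+1,j+1)
  end subroutine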
Staggered Mesh Code Details
Dynamic memory allocation: any grid size; no recompilation
Parallelized: shared memory with OpenMP (and auto-)parallelization; MPI directly (Galsgaard) or via CACTUS
Organization: Makefile includes
Experiments: EXPERIMENTS/$(EXPERIMENT).mkf
Selectable features: eq. of state, cooling & conduction, boundaries
OS and compiler dependencies hidden: OS/$(MACHTYPE).f90, OS/$(HOST).mkf, OS/$(COMPILER).mkf
Radiative Transfer Requirements
Comprehensive: need at least 20-25 (double) rays, i.e. 4-5 frequency bins (recent paper) and at least 5 directions
Speed issue: would like 25 rays to add negligible time
Benchmark Timing Results (microseconds/point/substep; 128x105x128)

                                       Pentium 4, 2 GHz    Alpha EV7, 1.3 GHz
                                       (dcsc.sdu.dk)       (hyades)
                                       each    accum       each    accum
  mass+momentum    fixed mesh          1.80    1.80        1.57    1.57
                   variable mesh       -       -           -       -
  mhd              fixed mesh          1.01    2.81        0.93    2.50
                   variable mesh       -       -           -       -
  energy           fixed mesh          0.42    3.23        0.37    2.87
                   variable mesh       -       -           -       -
  equation of state  ideal             -       3.23        -       2.87
                   H+He subroutine     0.98    -           -       -
                   H+He lookup table   0.11    3.33        -       2.87
  opacity          H-minus             0.20    -           -       -
                   lookup table        0.09    3.42        -       2.87
  radiative        Feautrier           0.026   132 rays    0.046   63 rays
  transfer         Hermite splines     0.027   129 rays    0.047   61 rays
  (per ray)        Integral            0.045   76 rays     0.080   36 rays

(the 'rays' column evidently gives the number of rays whose added cost would equal the accumulated cost of the rest of the code)
Altix Itanium-2 Scaling
Applications
Star Formation
Planet Formation
Stars
Stellar coronae & chromospheres
Star Formation
Nordlund & Padoan 2002
Key feature: intermittency!
What does it mean in this context? Low-density, high-velocity gas fills most of the volume! High-density, low-velocity features occupy very little space, but carry much of the mass!
How does it influence star formation? It greatly simplifies understanding it!
Inertial dynamics in most of the volume!
Collapsing features are relatively well defined!
Turbulence Diagnostics of Molecular Clouds
Padoan, Boldyrev, Langer & Nordlund, ApJ 2002 (astro-ph/0207568)
Numerical (250³ simulation) & Analytical IMF
Padoan & Nordlund (astro-ph/0205019)
Low Mass IMF
Padoan & Nordlund, ApJ 2004 (astro-ph/0205019)
Planet formation; gas collapse
Coronal Heating: Initial Magnetic Field
Potential extrapolation of AR 9114
Coronal Heating: TRACE 195 Loops
Current sheet hierarchy
Current sheet hierarchy: close-up
Scan through hierarchy: dissipation
Hm, the dissipation looks pretty intermittent: large, nice, empty areas to ignore with an AMR code, right?
Note that all features rotate as we scan through; this means that these current sheets are all curved in the 3rd dimension.
Electric current J. This is still the dissipation; let's replace it by the electric current, as a check!
Hm, not quite as empty, but the electric current is at least mostly weak, right?
[Figure: J vs. log(J)]
So, let's replace the current with the log of the current, to see the levels of the hierarchy better!
Log of the electric current
Not really much to win with AMR here, if we want to cover the hierarchy!
Solar & stellar surface MHD
Faculae
Sunspots
Chromospheres
Coronae
Faculae: Center-to-Limb Variation
Radiative transfer
'Exact' radiative energy transfer is not expensive: it allows up to ~100 rays per point for 2x the CPU-time, and parallelizes well (with MPI or OpenMP)
Reasons for not using Flux-Limited Diffusion: it is not the right answer (e.g., missing shadows), and it is not cheaper
Radiative Transfer: Significance
Cosmology: End of Dark Ages
Star Formation Feedback: evaporation of molecular clouds Dense phases of the collapse
Planet Formation External illumination of discs Structure and cooling of discs
Stellar surfaces Surface cooling: the driver of convection
Radiative transfer methods
Fast local solvers: Feautrier schemes, (often) the fastest; optimized integral solutions, the simplest (a Feautrier sketch follows below)
A new approach to parallelizing RT: solve within each domain, with no boundary radiation; then propagate and accumulate solutions globally
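A minimal sketch of the Feautrier idea (illustrative discretization: uniform dtau, simple boundary conditions): with P = (I+ + I-)/2, the two directions of the transfer equation combine into d2P/dtau2 = P - S, which a tridiagonal (Thomas) elimination solves in O(n):

  ! Sketch: Feautrier solve of d2P/dtau2 = P - S on a uniform tau grid.
  ! BCs: no incoming radiation at the surface (dP/dtau = P), P -> S at depth.
  subroutine feautrier(dtau, S, P)
    implicit none
    real, intent(in)  :: dtau, S(:)
    real, intent(out) :: P(size(S))
    real, dimension(size(S)) :: a, b, c, r
    integer :: k, n
    n = size(S)
    do k = 2, n-1                     ! interior second-order discretization
      a(k) = -1./dtau**2
      b(k) = 1. + 2./dtau**2
      c(k) = -1./dtau**2
      r(k) = S(k)
    end do
    b(1) = 1. + 1./dtau ; c(1) = -1./dtau ; r(1) = 0.  ! surface BC
    a(n) = 0. ; b(n) = 1. ; r(n) = S(n)                ! depth BC
    do k = 2, n                       ! forward elimination
      b(k) = b(k) - a(k)*c(k-1)/b(k-1)
      r(k) = r(k) - a(k)*r(k-1)/b(k-1)
    end do
    P(n) = r(n)/b(n)
    do k = n-1, 1, -1                 ! back substitution
      P(k) = (r(k) - c(k)*P(k+1))/b(k)
    end do
  end subroutine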
Moments of the radiation field
Give up, adopting some approximation? Flux Limited Diffusion
Did someone say ”shadows”??
Or, solve as it stands? Fast solvers Parallelize
Did someone say ”difficult”?
Phew, 7 variables!?!
Rays Through Each Grid Point
Interpolate source function to rays in each plane
How many rays are needed?
Depends entirely on the geometry
For stellar surfaces, surprisingly few! 1 vertical + 4 slanted, rotating
1% accuracy in the mean Q; a few % in the fluctuating Q
[Figure: 8 rays vs. 48 rays; see plots]
Radiative transfer steps
Interpolate source function(s) and opacity Simple translation of planes – fast
Solve along rays May be done in parallel (distribute rays)
Interpolate back to rectangular mesh Inverse of 1st interpolation (negative shift)
Add up: integrate over angles (and possibly frequencies or bins); see the sketch below
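In outline, the four steps for one slanted direction might look as follows; a toy sketch only, in which integer plane shifts (cshift) stand in for the code's 5th-order interpolation and the ray solver is a placeholder:

  program ray_sweep_demo
    ! Sketch of the four radiative-transfer steps for one slanted ray
    ! direction: tilt, solve, shift back, accumulate. Illustrative only.
    implicit none
    integer, parameter :: mx=16, my=16, mz=16, sx=1  ! sx: x-offset per plane
    real :: S(mx,my,mz), q(mx,my,mz), Q(mx,my,mz), wmu
    integer :: k
    call random_number(S) ; Q = 0. ; wmu = 0.25      ! wmu: quadrature weight
    do k = 1, mz
      S(:,:,k) = cshift(S(:,:,k), shift=k*sx, dim=1)   ! 1. tilt planes onto ray
    end do
    q = S               ! 2. placeholder for the Feautrier/integral solver
    do k = 1, mz
      q(:,:,k) = cshift(q(:,:,k), shift=-k*sx, dim=1)  ! 3. shift back to mesh
    end do
    Q = Q + wmu*q                                      ! 4. angular quadrature
  end program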
Along straight rays, solve

\[ \frac{dI}{d\tau} = I - S . \]

Or actually, solve directly for the cooling, q \equiv I - S:

\[ \frac{dq}{d\tau} = q - \frac{dS}{d\tau} . \]
[Figures: source function (input); new source function (input)]
Formal (and useful) solutions
For simplicity, let's consider the standard formulation

\[ \frac{dI}{d\tau} = I - S , \]

which has the formal solution

\[ I(\tau) = I(\tau_0)\, e^{-|\tau-\tau_0|} + \int_{\tau_0}^{\tau} S(\tau')\, e^{-|\tau-\tau'|}\, d\tau' . \]
Doubly useful
As a direct method: very accurate, if S(τ) is piecewise parabolic; the slowness of exp() can be largely avoided (see the sketch below)
As a basis for domain decomposition: add 'remote' contributions separately!
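A sketch of one step of the integral method, with S taken linear in tau across the interval for brevity (the code uses piecewise parabolic S); note that only one exp() per step is needed:

  ! Sketch: one ray step of the integral solution, S linear in tau.
  subroutine integral_step(dtau, S0, S1, I)
    implicit none
    real, intent(in)    :: dtau, S0, S1  ! optical depth step; S at both ends
    real, intent(inout) :: I             ! intensity, updated in place
    real :: ex, ex1
    ex  = exp(-dtau)                     ! the only exp() in the step
    ex1 = 1. - ex
    ! exact integral of (S0 + (S1-S0)*t/dtau) * exp(t-dtau) over [0,dtau]
    I = I*ex + S0*ex1 + (S1 - S0)*(dtau - ex1)/dtau
  end subroutine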
Direct solution, integral form
How to parallelize (Heinemann, Dobler, Nordlund & Brandenburg, in prep.)
Solve for the intensity generated internally in each domain, separately and in parallel
Then propagate and accumulate the boundary intensities, modified only by trivial optical depth factors
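Per ray and domain, the accumulation amounts to

\[
I_{\mathrm{out}} = I_{\mathrm{intrinsic}} + I_{\mathrm{in}}\, e^{-\tau_{\mathrm{domain}}} ,
\]

where \( \tau_{\mathrm{domain}} \) is the optical depth of the domain along the ray: each processor first computes \( I_{\mathrm{intrinsic}} \) with zero incoming intensity, and the incoming boundary intensities are then swept across processors, attenuated only by the precomputed factors \( e^{-\tau_{\mathrm{domain}}} \).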
Putting it together
The Transfer Equation & Parallelization
[Animation: the analytic solution, with domains laid out along the ray direction, one per processor. Each processor first performs the intrinsic calculation within its own domain; successive communication steps then sweep the accumulated boundary intensities downstream along the ray, processor by processor, finishing with an intrinsic calculation.]
Pencil Code (Brandenburg et al.): CPU-time per ray-point
[Plot; ignore the outliers (bad node distribution)]
About 160 nsec/pt/ray; can be improved by a factor of 4-5!
CPU-time per point (Pencil Code)
Timing Results, Stagger Code (microseconds/point/substep; 128x105x128)

                                       Pentium 4, 2 GHz    Alpha EV7, 1.3 GHz
                                       (dcsc.sdu.dk)       (hyades)
                                       each    accum       each    accum
  mass+momentum    fixed mesh          1.80    1.80        1.57    1.57
  mhd              fixed mesh          1.01    2.81        0.93    2.50
  energy           fixed mesh          0.42    3.23        0.37    2.87
  equation of state  lookup table      0.11    3.33        -       2.87
  opacity          lookup table        0.09    3.42        -       2.87
  radiative        Feautrier           0.026   132 rays    0.046   63 rays
  transfer         Hermite             0.027   129 rays    0.047   61 rays
  (per ray)        Integral            0.045   76 rays     0.080   36 rays
Radiative Transfer Conclusions
The methods are conceptually simple, fast, robust, and scale well in parallel environments
Collisionless shocks
Not an artist's rendering! Shows electrical current filaments in a collisionless shock simulation
with ~10⁹ particles and ~3×10⁹ mesh zones
Particle-in-Cell (PIC) code
Steps:
Relativistic particle move, using B & E; uses relativistic momenta; about 3×10⁵ particle updates/sec on a P4 laptop; parallelizes nearly linearly (OpenMP on Altix)
Gather fields: nᵢ, nₑ, jᵢ, jₑ; 2nd order, Triangular Shaped Clouds (TSC; see the weight sketch below)
Push B & E, staggered in space and time (a 1-D sketch follows after the Maxwell equations below)
Electrostatic solver
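For reference, the standard 1-D TSC weights used in such a gather (the 3-D weight is the product of three 1-D weights); a minimal sketch:

  ! Sketch: 1-D Triangular Shaped Cloud (TSC) weights for gathering a
  ! mesh field at a particle position x (in cell units).
  subroutine tsc_weights(x, i, w)
    implicit none
    real, intent(in)    :: x        ! particle position in units of dx
    integer, intent(out) :: i       ! index of the nearest cell center
    real, intent(out)   :: w(-1:1)  ! weights for cells i-1, i, i+1
    real :: d
    i = nint(x)
    d = x - i                       ! offset from the cell center, |d| <= 1/2
    w(-1) = 0.5*(0.5 - d)**2
    w( 0) = 0.75 - d**2
    w(+1) = 0.5*(0.5 + d)**2        ! the three weights sum to 1
  end subroutine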
Based on an original 2-D, non-relativistic code by Michael Hesse, GSFC
3-D, relativistic version developed by Frederiksen, Haugbølle, Hededal & Nordlund, Copenhagen
Use of Maxwell's Equations in the code

\[ \nabla\cdot\mathbf{E} = \rho/\epsilon_0 , \qquad \nabla\cdot\mathbf{B} = 0 , \]
\[ \nabla\times\mathbf{E} = -\,\frac{\partial\mathbf{B}}{\partial t} , \qquad \nabla\times\mathbf{B} = \mu_0\,\mathbf{J} + \frac{1}{c^2}\,\frac{\partial\mathbf{E}}{\partial t} . \]
[Figure: fields on the mesh; sampled particles]
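The staggered-in-space-and-time field push is leapfrog-like; a minimal 1-D sketch in units with c = ε₀ = 1 (a transverse wave: Ey on integer points and times, Bz on half points and half times), not the actual 3-D routine:

  ! Sketch: leapfrog update of E and B on a 1-D staggered (Yee-type) mesh.
  subroutine push_fields(Ey, Bz, Jy, dt, dx)
    implicit none
    real, intent(inout) :: Ey(:), Bz(:)
    real, intent(in)    :: Jy(:), dt, dx
    integer :: i, n
    n = size(Ey)
    do i = 1, n-1
      Bz(i) = Bz(i) - dt*(Ey(i+1) - Ey(i))/dx             ! dB/dt = -curl E
    end do
    do i = 2, n
      Ey(i) = Ey(i) - dt*(Bz(i) - Bz(i-1))/dx - dt*Jy(i)  ! dE/dt = curl B - J
    end do
  end subroutine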
Basic tests: wave propagation, etc.
Example: Single electron
Electron & proton circling in separate orbits; relativistic, γ = 10
NOTE: resolution implications of high γ!
Far field: Synchrotron radiation
The Weibel Instability: well known and understood
First principles, anisotropic PDFs: Weibel 1959, Fried 1959, Yoon & Davidson 1987
Numerical studies, electron-positron, 2-D: Wallace & Epperlein 1991, Yang et al. 1994, Kazimura et al. 1998 (ApJ)
Numerical studies, relativistic, ion-electron: Califano et al. 1997, '98, '99, '00, '01, '02, ...
Application to GRBs: Medvedev & Loeb 1999, Medvedev 2000, '01, ...
The Weibel Instability (two-stream)
(Weibel 1959, Medvedev & Loeb 1999)
Experiments: 3-D
Of the order 200×200×800 mesh, ~10⁹ particles
Cold beam from the left; carries negligible magnetic field
Hits denser plasma, initially field-free; Weibel instability → B, E
So, what is this?
A Weibel-like instability at high γ: initial scales ~ the skin depth; conventional expectation: restricted to the skin depth
Generated fields propagate at v ~ c; fluctuations 'ride' on the beam; losses are supported by the beam population; scales grow down the line!!
[Figure: field structure along and across the beam]
Electron and ion current channels
Coherent Structures in Collisionless Shocks
Ion and electron structures
A non-Fermi acceleration scenario
Hededal, Haugbølle, Frederiksen and Nordlund (2004)astro-ph/0408558
Electrons are accelerated instantaneously inside the Debye cylinder surrounding the ion current channels.
Electron path near ion channel
CH note: 10%-40% optically dark (HETE, BeppoSAX); 50% detected in radio.
Hededal, Haugbølle, Frederiksen and Nordlund (2004), astro-ph/0408558
Perspectives for the future
Star Formation: Is turbulent fragmentation the main mechanism? How important are magnetic fields for the IMF? Include radiative transfer during collapse! Magnetic fields are also important during collapse!
Planet Formation: RT important for initial conditions ... as well as for disc structure and cooling
Stellar surfaces: include approximate RT in simulations of chromospheres
Solar Plans
Convection: from granulation to supergranulation scales
Sunspots, Faculae
Chromosphere
Corona
[Figure: domain sizes 20 Mm, 30 Mm, 50 Mm]