1/20
Efficient multigrid solvers for mixed finite element discretisations in NWP models
Colin Cotter†, David Ham†, Lawrence Mitchell†, Eike Hermann Müller*, Robert Scheichl*
*University of Bath, †Imperial College London
16th ECMWF Workshop on High Performance Computing in Meteorology, October 2014
2/20
Motivation
[Diagram: separation of concerns. The domain specialist works with equations, algorithms and high-level code (discretisation); the computational scientist provides parallel optimised code; firedrake, PyOP2 and PETSc sit in between.]
Use case: complex iterative solver for the pressure correction equation (a minimal sketch of the user's side follows below).
* All credit for developing firedrake/PyOP2 goes to David Ham and his group at Imperial; I am giving this talk as a "user".
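As a rough illustration of what "high-level code" means here, the sketch below writes a mixed Helmholtz-type system in firedrake's UFL notation and hands the solver choice to PETSc as an options dictionary. It is my own minimal example, not the code from the talk: the flat unit-square mesh, the constant profile φ*, the value of ω, the right-hand sides and the stock PETSc fieldsplit options are all stand-in assumptions (the talk uses an icosahedral sphere grid and a custom matrix-free multigrid preconditioner).

```python
# Minimal firedrake sketch (illustrative assumptions, not the talk's code):
# the domain specialist writes the weak form; PETSc handles the linear solve.
from firedrake import *

mesh = UnitSquareMesh(64, 64)           # stand-in for the icosahedral sphere grid
V = FunctionSpace(mesh, "RT", 1)        # lowest-order Raviart-Thomas velocity space
Q = FunctionSpace(mesh, "DG", 0)        # piecewise-constant pressure space
W = V * Q                               # mixed space

u, phi = TrialFunctions(W)
w, psi = TestFunctions(W)

omega = Constant(0.5)                   # assumed time-step parameter
phi_star = Constant(1.0)                # reference profile, taken constant here
x, y = SpatialCoordinate(mesh)
r_phi = sin(pi * x) * sin(pi * y)       # illustrative right-hand side
r_u = Constant((0.0, 0.0))

# Weak form of: phi + omega div(phi* u) = r_phi,  -u - omega grad(phi) = r_u,
# with the gradient term integrated by parts (boundary term dropped).
a = (psi * phi + omega * phi_star * psi * div(u)
     - inner(w, u) + omega * div(w) * phi) * dx
L = (psi * r_phi + inner(w, r_u)) * dx

sol = Function(W)
solve(a == L, sol,
      solver_parameters={"ksp_type": "gmres",            # outer Krylov method
                         "pc_type": "fieldsplit",        # block preconditioner
                         "pc_fieldsplit_type": "schur"}) # via a Schur complement
```

The point of the sketch is the division of labour: the weak form and function spaces are the algorithmic description, while everything below `solve` is configuration passed through to PETSc, so the backend can change without touching the discretisation.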
3/20
Separation of concerns and high-level abstractions
[Figure, left panel "Weak scaling, total solution time": solver time for CG and multigrid on HECToR (AMD Opteron, 16 to 65536 CPU cores) and Titan (NVIDIA Kepler, 1 to 16384 GPUs). Right panel "Absolute performance": achieved FLOPs from GigaFLOP to PetaFLOP scale, with the single-GPU and Titan theoretical peaks marked.]
0.5 · 10¹² dofs on 16384 GPUs of Titan, 0.78 PFLOPs, 20%-50% of peak memory bandwidth; finite volume discretisation, simpler geometry.
4/20
Motivation
Challenges
Finite element discretisation (Met Office next generation dynamical core), unstructured grids ⇒ more complicated
Duplicate effort for CPU, GPU (Xeon Phi, ...) implementation & optimisation: reinvent the wheel?
Mixing of two issues: algorithmic and computational
Goal
Separation of concerns (algorithmic ↔ computational) ⇒ rapid algorithm development and testing at a high abstraction level
Performance portability
Reuse existing, tested and optimised tools and libraries
⇒ Choice of correct abstraction(s)
5/20
This talk
Iterative solver for Helmholtz equation:
φ + ω ∇·(φ* u) = r_φ
−u − ω ∇φ = r_u
Mixed finite element discretisation, icosahedral grid
Matrix-free multigrid preconditioner for pressure correction
Performance-portable firedrake/PyOP2 toolchain [Rathgeber et al., 2012 & 2014]
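For completeness, a brief note on where the pressure correction equation comes from (my own elimination sketch, assuming the signs above and no boundary contribution): the second equation gives u = −ω∇φ − r_u, and substituting into the first yields the positive definite Helmholtz problem φ − ω²∇·(φ*∇φ) = r_φ + ω∇·(φ* r_u) for the pressure increment φ alone; this is the equation the matrix-free multigrid preconditioner targets.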
Convergence history
Inner solve: 3 CG iterations with multigrid preconditioner (see the sketch after the figure)
Icosahedral grid, 327,680 cells
4 multigrid levels
[Figure: relative residual ‖r‖₂/‖r₀‖₂ versus iteration (0 to 20) for the DG0+RT0 and DG1+BDFM1 discretisations, each with Richardson and GMRES outer iterations.]
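To make the "CG iterations with a multigrid preconditioner" structure concrete, here is a self-contained NumPy sketch of the same solver pattern on a 1D model operator (I − ω² d²/dx²). Everything in it (the model problem, smoother, grid hierarchy and parameters) is an illustrative assumption; the talk's actual solver is the matrix-free firedrake implementation referenced on the last slide.

```python
# NumPy sketch of a few CG iterations preconditioned by a geometric multigrid
# V-cycle, on the 1D model operator (I - omega^2 d2/dx2) with Dirichlet BCs.
# All choices below are illustrative assumptions, not the talk's configuration.
import numpy as np

omega = 1.0

def apply_A(x, h):
    """Matrix-free application of the model Helmholtz operator."""
    y = x.copy()                       # boundary rows act as the identity
    y[1:-1] += (omega / h)**2 * (2.0 * x[1:-1] - x[:-2] - x[2:])
    return y

def jacobi(x, b, h, nu=2, damping=0.6):
    """A few damped Jacobi smoothing sweeps."""
    diag = 1.0 + 2.0 * (omega / h)**2
    for _ in range(nu):
        x = x + damping * (b - apply_A(x, h)) / diag
    return x

def vcycle(b, h, level):
    """One multigrid V-cycle with zero initial guess (used as preconditioner)."""
    x = np.zeros_like(b)
    if level == 0 or b.size <= 5:
        return jacobi(x, b, h, nu=20)  # many sweeps stand in for a coarse solve
    x = jacobi(x, b, h)                              # pre-smoothing
    r = b - apply_A(x, h)
    rc = r[::2].copy()                               # full-weighting restriction
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    ec = vcycle(rc, 2.0 * h, level - 1)              # coarse-grid correction
    e = np.zeros_like(b)                             # linear interpolation
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return jacobi(x + e, b, h)                       # post-smoothing

def mg_pcg(b, h, levels, maxiter=3):
    """Preconditioned CG; one V-cycle per preconditioner application."""
    x = np.zeros_like(b)
    r = b - apply_A(x, h)
    z = vcycle(r, h, levels)
    p, rz = z.copy(), r @ z
    for _ in range(maxiter):
        Ap = apply_A(p, h)
        alpha = rz / (p @ Ap)
        x, r = x + alpha * p, r - alpha * Ap
        z = vcycle(r, h, levels)
        rz_new = r @ z
        p, rz = z + (rz_new / rz) * p, rz_new
    return x, np.linalg.norm(r) / np.linalg.norm(b)

n = 257                                # fine-grid points (including boundaries)
h = 1.0 / (n - 1)
b = np.sin(np.pi * np.linspace(0.0, 1.0, n))
b[0] = b[-1] = 0.0
x, rel_res = mg_pcg(b, h, levels=4, maxiter=3)
print(f"relative residual after 3 MG-preconditioned CG iterations: {rel_res:.2e}")
```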
17/20
Efficiency
Single node run on ARCHER, lowest order, 5.2 · 10⁶ cells [preliminary]
[Figure: time [s] (0 to 7) and achieved memory bandwidths (between 1.3 GB/s and 17.1 GB/s) for the total solve, the pressure solve and the Helmholtz operator application.]
STREAM triad: 74.1 GB/s per node ⇒ up to ≈ 23% of peak bandwidth (17.1 / 74.1 ≈ 0.23)
18/20
Parallel scalability
Weak scaling on ARCHER [preliminary]
[Figure: weak scaling of the solution time [s] on 6 to 1536 cores, with the number of cells growing from 1.3 to 335.5 million, comparing hypre BoomerAMG with the matrix-free multigrid for the DG0+RT0 and DG1+BDFM1 discretisations.]
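As a side note, the AMG baseline in this comparison can be reached from firedrake simply by passing PETSc options; the snippet below is an illustrative guess at such a configuration, not the settings actually used for these runs.

```python
# Illustrative PETSc options selecting hypre BoomerAMG as the preconditioner
# for the (assembled) pressure system; exact settings in the talk are unknown.
amg_parameters = {
    "ksp_type": "cg",              # Krylov method
    "pc_type": "hypre",            # use hypre through PETSc
    "pc_hypre_type": "boomeramg",  # classical algebraic multigrid
}
# e.g. solve(a == L, sol, solver_parameters=amg_parameters)
```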
19/20
Conclusion
Summary
Matrix-free multigrid preconditioner for pressure correction equation
Implementation in firedrake/PyOP2/PETSc framework:
Performance portability
Correct abstractions ⇒ separation of concerns
Outlook
Test on GPU backend
Extend to 3d: regular vertical grid ⇒ improved caching ⇒ improved memory BW
Improve and extend parallel scaling
Full 3d dynamical core (Colin Cotter)
20/20
References
The firedrake project: http://firedrakeproject.org/
F. Rathgeber et al.: Firedrake: automating the finite element method by composing abstractions. In preparation.
F. Rathgeber et al.: PyOP2: A High-Level Framework for Performance-Portable Simulations on Unstructured Meshes. High Performance Computing, Networking, Storage and Analysis, SC Companion, pp. 1116-1123, Los Alamitos, CA, USA, 2012. IEEE Computer Society.
E. Müller et al.: Petascale elliptic solvers for anisotropic PDEs on GPU clusters. Submitted to Parallel Computing (2014) [arXiv:1402.3545].
E. Müller, R. Scheichl: Massively parallel solvers for elliptic PDEs in Numerical Weather- and Climate Prediction. QJRMS (2013) [arXiv:1307.2036].
Helmholtz solver code on GitHub: https://github.com/firedrakeproject/firedrake-helmholtzsolver