Petascale Computational Fluid Dynamics with Python on GPUs
F.D. Witherden, P.E. Vincent
Department of Aeronautics Imperial College London
Introduction
• Computational fluid dynamics (CFD) is the bedrock of several high-tech industries.
• Practitioners want to perform unsteady, scale-resolving simulations in the vicinity of complex geometries.
Image courtesy of A.S. Ayer
The Need for FLOP/s
• From The Opportunities and Challenges of Exascale Computing, US DOE, fall 2010.
RMAX != RPEAK
• FLOP/s are great…
• if you can get them.
• Most commercial codes struggle to get ~10% of peak on CPUs.
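The gap between RMAX and RPEAK can be made concrete with a short calculation. What follows is a minimal sketch: the function name `percent_of_peak` and the throughput figures passed to it are illustrative assumptions, not measurements of any particular code or machine.

```python
def percent_of_peak(sustained_flops, peak_flops_per_device, n_devices=1):
    """Return sustained throughput as a percentage of theoretical peak."""
    if peak_flops_per_device <= 0 or n_devices <= 0:
        raise ValueError("peak and device count must be positive")
    return 100.0 * sustained_flops / (peak_flops_per_device * n_devices)


# Hypothetical example: a code sustaining 50 GFLOP/s on a CPU whose
# theoretical peak is 500 GFLOP/s achieves 10% of peak.
cpu_pct = percent_of_peak(50e9, 500e9)
print(f"{cpu_pct:.1f}% of peak")
```

The same function applies to a multi-device run by passing the per-device peak and the device count.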
PyFR
• A high-order compressible Navier-Stokes solver for unstructured grids.
• Designed from the ground up to run on NVIDIA GPUs.
• Written entirely in Python!
The Py in PyFR
• Leverages PyCUDA and mpi4py.
• Makes extensive use of run-time code generation.
• All compute performed on device.
• Overhead from the Python interpreter < 1%.
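Run-time code generation can be sketched with nothing more than the standard library. The example below is an assumption-laden illustration, not PyFR's own machinery: `string.Template` stands in for a full templating engine, and the kernel, `render_axpy` and its parameters are hypothetical. It only produces CUDA C source as a string; on a machine with PyCUDA and a GPU, such a string could then be handed to `pycuda.compiler.SourceModule`, a step omitted here so the sketch runs anywhere.

```python
from string import Template

# Kernel source with placeholders that are filled in at run time.
# string.Template stands in for a full templating engine here.
KERNEL_TEMPLATE = Template("""\
__global__ void axpy_${dtype}(int n, const ${dtype}* x, ${dtype}* y)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < n)
        y[i] += ${alpha}*x[i];
}
""")


def render_axpy(dtype, alpha):
    """Generate CUDA C source specialised for one dtype and one constant."""
    if dtype not in ("float", "double"):
        raise ValueError(f"unsupported dtype: {dtype}")
    # Bake the constant into the source so the compiler can fold it.
    suffix = "f" if dtype == "float" else ""
    return KERNEL_TEMPLATE.substitute(dtype=dtype, alpha=f"{alpha!r}{suffix}")


src = render_axpy("double", 0.5)
print(src)
```

Because the source is built while the program runs, quantities known only at start-up (precision, constants, sizes) can be written into the kernel as literals.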
The FR in PyFR
• Uses the flux reconstruction (FR) approach;
• can recover well-known schemes including nodal Discontinuous Galerkin (DG) methods.
• Lots of element-local structured compute.
The FR in PyFR
• The majority of operations are block-by-panel type matrix multiplications, C = AB:
• where N ~ 10⁵ and N ≫ (M, K).
[Figure: the product C = AB, with dimensions labelled M, K and N.]
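The shape of a block-by-panel product, and the reason it parallelises well, can be sketched in plain Python. The assignment of dimensions (A is M × K, B is K × N, C is M × N) is an assumption read off the figure labels, N is scaled down from ~10⁵ to keep the run short, and the list-based `matmul` is a reference implementation rather than anything a production code would use.

```python
import random


def matmul(a, b):
    """Dense C = A B for row-major lists of lists (reference, not fast)."""
    k, n = len(b), len(b[0])
    return [[sum(row[p]*b[p][j] for p in range(k)) for j in range(n)]
            for row in a]


def column_panel(b, start, stop):
    """Return columns start..stop-1 of B."""
    return [row[start:stop] for row in b]


# Assumed shapes: A is M x K (small), B is K x N (wide), C is M x N.
M, K, N = 4, 3, 1000
rng = random.Random(0)
A = [[rng.random() for _ in range(K)] for _ in range(M)]
B = [[rng.random() for _ in range(N)] for _ in range(K)]

C = matmul(A, B)

# Each column of C depends only on the matching column of B, so B can be
# split into independent panels and the results concatenated.
left = matmul(A, column_panel(B, 0, N // 2))
right = matmul(A, column_panel(B, N // 2, N))
C_panels = [l + r for l, r in zip(left, right)]

flops = 2*M*K*N  # one multiply and one add per (i, p, j) triple
print(C == C_panels, flops)
```

With N ≫ (M, K) the small matrix A is reused across every column of B, which is what gives the operation its element-local, structured character.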
The FR in PyFR
• In parallel only simple halo exchanges are required between MPI ranks.
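A halo exchange can be illustrated without MPI by simulating the ranks in a single process. This is a serial sketch under stated assumptions: the 1D layout, the `exchange_halos` helper and the two toy subdomains are all hypothetical, and the list copies stand in for what paired send/receive calls (for example `Sendrecv` in mpi4py) would do between real ranks.

```python
def exchange_halos(subdomains):
    """Copy edge values between neighbouring subdomains (1D, non-periodic).

    Each subdomain is a list laid out as [left halo, interior..., right halo].
    """
    # Snapshot the edge interior values first so every "rank" sends the
    # data it held before the exchange, as a paired send/receive would.
    left_edges = [s[1] for s in subdomains]
    right_edges = [s[-2] for s in subdomains]

    for rank, sub in enumerate(subdomains):
        if rank > 0:
            sub[0] = right_edges[rank - 1]
        if rank < len(subdomains) - 1:
            sub[-1] = left_edges[rank + 1]
    return subdomains


# Two hypothetical ranks, each owning three interior values; None marks
# a halo slot that has not been filled yet.
ranks = [[None, 1.0, 2.0, 3.0, None],
         [None, 4.0, 5.0, 6.0, None]]
exchange_halos(ranks)
print(ranks)
```

Only the values on subdomain boundaries move; interior data never leaves its owner, which is why the communication pattern stays simple.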
The FR in PyFR
• FR is a great fit for modern hardware.
• Previous GTC talks have outlined the key tenets of an efficient multi-GPU capable implementation:
• GTC 2014 — PyFR: Technical Challenges of Bringing Next Generation Fluid Dynamics to GPUs
• GTC 2015 — GiMMiK: Generating Bespoke Matrix Multiplication Kernels
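The idea of generating a matrix multiplication kernel for one specific, known matrix can be sketched in a few lines. This is not GiMMiK's interface: the functions `generate_kernel` and `compile_kernel`, the example matrix, and the choice to emit Python rather than CUDA are all assumptions made so the example is self-contained. Skipping zero entries is shown as one plausible specialisation.

```python
def generate_kernel(a, name="bespoke_matvec"):
    """Emit Python source for y = A x with A's entries baked in."""
    lines = [f"def {name}(x):", "    return ["]
    for row in a:
        # Skip exact zeros so they cost nothing at run time.
        terms = [f"{v!r}*x[{j}]" for j, v in enumerate(row) if v != 0.0]
        lines.append("        " + (" + ".join(terms) or "0.0") + ",")
    lines.append("    ]")
    return "\n".join(lines)


def compile_kernel(source, name="bespoke_matvec"):
    """Compile generated source and return the resulting function."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


# Hypothetical sparse operator matrix, known before the kernel is built.
A = [[1.0, 0.0, 0.0],
     [0.0, 0.5, -0.5],
     [0.0, 0.0, 0.0]]

src = generate_kernel(A)
kernel = compile_kernel(src)
print(src)
print(kernel([2.0, 4.0, 8.0]))
```

The generated function contains three multiplications instead of the nine a generic 3 × 3 product would perform, because the matrix is fixed at generation time.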
PyFR Scaling
• Evaluated on the Piz Daint cluster at CSCS.
• Test case is a NACA 0021 aerofoil at a high angle of attack.
Animation courtesy of J.S. Park
PyFR Strong Scaling
[Figure: percentage of peak FLOP/s (axis 0 to 100) against number of K20X GPUs (50, 100, 200, 400).]
PyFR Weak Scaling
[Figure: percentage of peak FLOP/s (axis 0 to 100) against number of K20X GPUs (2, 4, 8, 40, 80, 160, 2000).]
1.31 PFLOP/s
So The Solver Scales
• There’s a lot more to a code than just the solver…
• and it all needs to scale.
Traditional Visualisation
• Traditional visualisation pipeline with PyFR:
Traditional Visualisation
• Disk I/O…
• like device↔host transfers, only slower
• …much slower!
[Figure: bandwidth in MiB/s (axis 0 to 7000) for device↔host transfers and for disk.]
In-situ Visualisation
• Cut out the middle men…
• Using ParaView Catalyst it is possible to avoid disk I/O…
In-situ Visualisation
• Pipeline with Catalyst…
• majority of processing performed on the host with VTK.
[Figure: pipeline from solution to triangle list.]
In-situ Visualisation
• Can we do better?
• Yes!
• Interface with PyFR using the plugin infrastructure.
In-situ Visualisation
[Figure: PyFR plugin, CUDA pointer, C++ shared library.]
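The plugin arrangement can be sketched as a callback that the solver invokes after each step and that forwards a handle to the solution rather than a copy. Every name below (`IsoSurfacePlugin`, `ToySolver`, `solution_handle`) is hypothetical and this is not PyFR's plugin API; the integer handle is a placeholder for a device pointer, and the `visualise` callable stands in for an entry point in a shared library.

```python
class IsoSurfacePlugin:
    """Hypothetical plugin that hands solution data to a visualiser."""

    def __init__(self, nsteps, visualise):
        self.nsteps = nsteps        # call the visualiser every nsteps steps
        self.visualise = visualise  # stand-in for a shared-library entry point

    def __call__(self, solver):
        if solver.step % self.nsteps == 0:
            # Pass a handle to the data rather than a copy of it.
            self.visualise(solver.solution_handle, solver.step)


class ToySolver:
    """Hypothetical solver that runs its plugins after every time step."""

    def __init__(self, plugins):
        self.step = 0
        self.solution_handle = 0xDEADBEEF  # placeholder for a device pointer
        self.plugins = plugins

    def advance(self):
        self.step += 1
        for plugin in self.plugins:
            plugin(self)


calls = []
solver = ToySolver([IsoSurfacePlugin(5, lambda ptr, step: calls.append(step))])
for _ in range(12):
    solver.advance()
print(calls)
```

Passing a handle keeps the data where it already lives, which is the property that lets the downstream processing stay on the device.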
In-situ Visualisation
• Pipeline with Catalyst and VTK-m…
• all compute performed on the device.
[Figure: pipeline from solution to triangle list.]
In-situ Visualisation
• Kitware
• Utkarsh Ayachit
• T.J. Corona
• David DeMarle
• Berk Geveci
• Robert Maynard
• Robert O’Bara
• Patrick O’Leary
• NVIDIA
• Bhushan Desam
• Tom Fogal
• Peter Messmer
• Jeremy Purches
• Imperial College
• Arvind Iyer
• Jin Seok Park
• Brian Vermeire
• ORNL
• Jack Wells
• Zenotech
• Mark Allan
• Jamil Appa
• Andrei Cimpoeru
• David Standingford
In-situ Visualisation
Animation courtesy of A.S. Ayer
Summary
• Funded and supported by
• Any questions?
• E-mail: [email protected]
• Website: http://pyfr.org