Petascale Computational Fluid Dynamics with Python on GPUs
F.D. Witherden, P.E. Vincent
Department of Aeronautics, Imperial College London
Introduction
• Computational fluid dynamics (CFD) is the bedrock of several high-tech industries.
• Practitioners increasingly want to perform unsteady, scale-resolving simulations in the vicinity of complex geometries.
Image courtesy of A.S. Ayer
The Need for FLOP/s
• From The Opportunities and Challenges of Exascale Computing, US DOE, fall 2010.
RMAX != RPEAK
• FLOP/s are great…
• if you can get them.
• Most commercial codes struggle to get ~10% of peak on CPUs.
PyFR
• A high-order compressible Navier-Stokes solver for unstructured grids.
• Designed from the ground up to run on NVIDIA GPUs.
• Written entirely in Python!
The Py in PyFR
• Leverages PyCUDA and mpi4py.
• Makes extensive use of run-time code generation.
• All compute performed on device.
• Overhead from the Python interpreter < 1%.
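A minimal sketch of what run-time code generation can look like, assuming nothing about PyFR's actual templates: the kernel name `scale_add`, the helper `render_kernel` and the coefficient list are hypothetical, and the standard-library `string.Template` is used here purely for illustration. The point is that Python only builds a source string, so constants can be baked in as literals and zero terms dropped before the device compiler ever sees the code.

```python
from string import Template

# Hypothetical CUDA kernel template (illustrative only, not taken from PyFR).
KERNEL_TEMPLATE = Template("""\
__global__ void scale_add(int n, const $dtype* x, $dtype* y)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < n)
    {
$body
    }
}
""")


def render_kernel(dtype, coeffs):
    """Return CUDA source with the coefficients baked in as literals."""
    suffix = "f" if dtype == "float" else ""
    lines = [
        f"        y[i + {j}*n] += {c!r}{suffix}*x[i];"
        for j, c in enumerate(coeffs)
        if c != 0.0  # zero coefficients generate no code at all
    ]
    return KERNEL_TEMPLATE.substitute(dtype=dtype, body="\n".join(lines))


src = render_kernel("double", [0.5, 0.0, 2.0])
print(src)
```

With PyCUDA, a string such as `src` could then be handed to `pycuda.compiler.SourceModule` for compilation at run time; that step is omitted here so the sketch runs without a GPU.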
The FR in PyFR
• Uses the flux reconstruction (FR) approach;
• can recover well-known schemes, including nodal discontinuous Galerkin (DG) methods.
• Lots of element-local structured compute.
The FR in PyFR
• Majority of operations are block-by-panel type matrix multiplications:
C = A B, with A of size M × K, B of size K × N and C of size M × N,
• where N ~ 10^5 and N ≫ (M, K).
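The shape of a block-by-panel product can be sketched in plain Python as follows. This assumes A is a small M × K block and B a wide K × N panel; the helper names `matmul` and `gemm_flops` are illustrative, and the sizes are deliberately tiny so that the example runs instantly. A real implementation would call a GPU matrix multiplication routine instead.

```python
def matmul(a, b):
    """Compute C = A B for list-of-lists matrices A (M x K) and B (K x N)."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must agree"
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            aip = a[i][p]
            # The innermost loop runs over the long dimension N
            for j in range(n):
                c[i][j] += aip * b[p][j]
    return c


def gemm_flops(m, k, n):
    """One multiply and one add per (i, p, j) triple."""
    return 2 * m * k * n


# Small block A (M = 3, K = 2) applied to a wide panel B (K = 2, N = 8)
a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [[float(j) for j in range(8)], [1.0] * 8]
c = matmul(a, b)
print(c[0])
print(gemm_flops(3, 2, 8))
```

Because N is far larger than M and K, the same small block A is reused across every column of B, which is part of why the operation maps well onto wide parallel hardware.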
The FR in PyFR
• In parallel, only simple halo exchanges are required between MPI ranks.
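To illustrate the halo exchange pattern without needing an MPI launcher, the sketch below simulates several ranks inside a single process. It is a stand-in under stated assumptions: a one-dimensional partition, one ghost cell at each end of every rank's data, and a hypothetical helper named `exchange_halos`. In a real run each copy would be a message between neighbouring ranks, for example via `Sendrecv` in mpi4py.

```python
def exchange_halos(parts):
    """Fill each simulated rank's ghost cells from its neighbours.

    parts holds one list per rank, laid out as
    [left ghost, interior values..., right ghost].
    """
    # "Send" phase: record the boundary values each rank would transmit
    send_left = [p[1] for p in parts]
    send_right = [p[-2] for p in parts]

    # "Receive" phase: copy neighbour boundary values into the ghost cells
    for r, p in enumerate(parts):
        if r > 0:
            p[0] = send_right[r - 1]
        if r < len(parts) - 1:
            p[-1] = send_left[r + 1]
    return parts


ranks = [
    [None, 1, 2, None],
    [None, 3, 4, None],
    [None, 5, 6, None],
]
exchange_halos(ranks)
print(ranks)
```

Only the boundary values travel between ranks; the interior data never moves, which keeps the communication volume small relative to the compute.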
The FR in PyFR
• FR is a great fit for modern hardware.
• Previous GTC talks have outlined the key tenets of an efficient multi-GPU-capable implementation:
• GTC 2014 — PyFR: Technical Challenges of Bringing Next Generation Fluid Dynamics to GPUs
• GTC 2015 — GiMMiK: Generating Bespoke Matrix Multiplication Kernels
PyFR Scaling
• Evaluated on the Piz Daint cluster at CSCS.
• Test case is a NACA 0021 aerofoil at a high angle of attack.
Animation courtesy of J.S. Park
PyFR Strong Scaling
[Figure: percentage of peak FLOP/s (axis 0 to 100) against number of K20X GPUs (50, 100, 200, 400).]
PyFR Weak Scaling
[Figure: percentage of peak FLOP/s (axis 0 to 100) against number of K20X GPUs (2, 4, 8, 40, 80, 160, 2000); annotated with 1.31 PFLOP/s.]
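A rough sketch of how a sustained rate translates into a percentage of peak. Two assumptions are made that the slide does not state: that the 1.31 PFLOP/s annotation refers to the 2000 GPU point, and that the run was in double precision, for which NVIDIA quotes a K20X peak of 1.31 TFLOP/s. The helper name `percent_of_peak` is illustrative.

```python
def percent_of_peak(sustained_flops, n_devices, peak_per_device):
    """Sustained rate as a percentage of the aggregate theoretical peak."""
    return 100.0 * sustained_flops / (n_devices * peak_per_device)


SUSTAINED = 1.31e15      # 1.31 PFLOP/s, as annotated on the slide
N_GPUS = 2000            # assumption: the annotation belongs to this point
K20X_PEAK_DP = 1.31e12   # assumption: double-precision peak per K20X

pct = percent_of_peak(SUSTAINED, N_GPUS, K20X_PEAK_DP)
print(f"{pct:.1f}% of peak")
```

Under these assumptions the result is about half of peak; with a different precision or device count the figure would change accordingly.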
So The Solver Scales
• There’s a lot more to a code than just the solver…
• and it all needs to scale.
Traditional Visualisation
• Traditional visualisation pipeline with PyFR:
Traditional Visualisation
• Disk I/O…
• like device↔host transfers, only slower
• …much slower!
[Figure: bar chart of bandwidth in MiB/s (axis 0 to 7000) comparing device↔host transfers with disk.]
In-situ Visualisation
• Cut out the middle men…
• Using ParaView Catalyst it is possible to avoid disk I/O…
In-situ Visualisation
• Pipeline with Catalyst…
• majority of processing performed on the host with VTK.
[Diagram: pipeline from solution to triangle list.]
In-situ Visualisation
• Can we do better?
• Yes!
• Interface with PyFR using the plugin infrastructure.
In-situ Visualisation
[Diagram labels: C++ shared library, CUDA pointer, PyFR plugin.]
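The idea of a plugin handing a raw pointer to compiled code can be sketched with the standard-library `ctypes` module. Everything named here is hypothetical: the class `HypotheticalPlugin`, the entry point signature, and the use of a host buffer as a stand-in for a CUDA device pointer. A Python callback also plays the role of the C++ shared library so that the example runs without one.

```python
import ctypes

# Hypothetical C entry point signature: int process(void *data, size_t nbytes)
PROCESS_FN = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t)

received = []


def _fake_process(ptr, nbytes):
    # Stand-in for the compiled library: record what was handed over
    received.append((ptr, nbytes))
    return 0


fake_process = PROCESS_FN(_fake_process)


class HypotheticalPlugin:
    """Calls into the library every `every` steps, passing only an address."""

    def __init__(self, process, every):
        self.process = process
        self.every = every

    def __call__(self, step, address, nbytes):
        if step % self.every == 0:
            # No data is copied; only the pointer value crosses the boundary
            return self.process(ctypes.c_void_p(address), nbytes)
        return None


# A host buffer stands in for device memory in this sketch
buf = ctypes.create_string_buffer(64)
addr = ctypes.addressof(buf)

plugin = HypotheticalPlugin(fake_process, every=10)
skipped = plugin(5, addr, 64)
status = plugin(10, addr, 64)
print(skipped, status, received == [(addr, 64)])
```

Passing an address rather than the data itself is what allows the visualisation code to operate on the solution in place, without a round trip through host memory.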
In-situ Visualisation
• Pipeline with Catalyst and VTK-m…
• all compute performed on the device.
[Diagram: pipeline from solution to triangle list.]
In-situ Visualisation
• Kitware
    • Utkarsh Ayachit
    • T.J. Corona
    • David DeMarle
    • Berk Geveci
    • Robert Maynard
    • Robert O’Bara
    • Patrick O’Leary
• NVIDIA
    • Bhushan Desam
    • Tom Fogal
    • Peter Messmer
    • Jeremy Purches
• Imperial College
    • Arvind Iyer
    • Jin Seok Park
    • Brian Vermeire
• ORNL
    • Jack Wells
• Zenotech
    • Mark Allan
    • Jamil Appa
    • Andrei Cimpoeru
    • David Standingford
In-situ Visualisation
Animation courtesy of A.S. Ayer
Summary
• Funded and supported by
• Any questions?
• E-mail: [email protected]
• Website: http://pyfr.org