parpp3d++ – a Parallel HPC Research Code for CFD

Block smoothers in parallel multigrid methods; Hitachi SR8000 vs. Linux cluster

Sven H.M. Buijssen <[email protected]>, Stefan Turek <[email protected]>

Institute of Applied Mathematics, University of Dortmund, Germany

6th HLRS Workshop (Stuttgart, October 6-7, 2003)

Outline of this talk

• Presentation of a portable research code for the simulation of 3-D incompressible nonstationary flows.

• Comparison of run times and scaling on Hitachi SR8000, Cray T3E (Jülich) and Linux cluster HELICS.

• Use of block smoothers in parallel multigrid methods ↪→ (numerical) consequences for parallel efficiencies.

• Some current and recent applications of the code.

The underlying problem

• The incompressible nonstationary Navier–Stokes equations

  u_t − ν∆u + (u·∇)u + ∇p = f,   ∇·u = 0

  have to be solved.

• Finite element discretisation of this system of PDEs leads to huge systems of (non-)linear equations (> 10^7 unknowns per time step).

• Solving with parallel multigrid methods (chosen for their optimal numerical complexity for ill-conditioned PDE problems).
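After discretisation in space and time, each time step requires the solution of a nonlinear saddle point system. In a common textbook notation (the symbols below are illustrative, not taken from the slides: M mass matrix, L discrete Laplacian, K(u) discrete convection, B discrete gradient, k time step), it reads:

```latex
\begin{aligned}
  \bigl[M + \theta k\,(\nu L + K(u^{n+1}))\bigr]\,u^{n+1} + k\,B\,p^{n+1} &= g^{n+1},\\
  B^{T}\,u^{n+1} &= 0,
\end{aligned}
```

where g^{n+1} collects the right-hand side f and the terms from the previous time step. The first block row is nonlinear in u, the second enforces the discrete divergence constraint.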

Mathematical background (1)

Numerics applied to solve the Navier–Stokes equations:

• (implicit) 2nd order discretisation in time (both Fractional-Step-Θ and Crank–Nicolson scheme ↪→ adaptive time stepping possible)

• finite element approach for space discretisation (non-parametric non-conforming Q̃1/Q0 ansatz) ↪→ hexahedral grids

[figure: hexahedral element with face midpoints M1–M6]

• Stabilisation of the convective term with the (Samarskij) upwind scheme
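Both time discretisations are instances of the one-step θ-scheme; a standard textbook sketch (notation assumed, with N(u)u collecting diffusion and convection):

```latex
\frac{u^{n+1}-u^{n}}{k}
  + \theta\,N(u^{n+1})\,u^{n+1} + \nabla p^{n+1}
  = -(1-\theta)\,N(u^{n})\,u^{n}
  + \theta f^{n+1} + (1-\theta) f^{n}
```

With θ = 1/2 this is the Crank–Nicolson scheme (2nd order). The Fractional-Step-Θ scheme performs three such substeps of sizes θk, (1−2θ)k, θk per macro time step with θ = 1 − √2/2, which is also 2nd order accurate and strongly A-stable.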

Mathematical background (2)

Numerics applied to solve the Navier–Stokes equations (cont.):

• Within each time step:

– Discrete Projection method to decouple the velocity–pressure problem

– The resulting nonlinear Burgers equation in u is solved by a fixed point defect correction method (outer loop) and multigrid (inner loop)

– The remaining linear problem in p (Pressure Poisson problem, ill-conditioned!) is solved with a multigrid-preconditioned conjugate gradient method
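The outer fixed point defect correction loop can be sketched on a generic nonlinear system F(u) = 0. This is a minimal illustration, not the code from the talk: the inner multigrid solve is replaced by an exact solve with the Jacobian J(u), and the model problem is an invented 1-D "Burgers-like" system.

```python
import numpy as np

def defect_correction(F, J, u0, tol=1e-10, max_it=50, omega=1.0):
    """Fixed point defect correction: u <- u + omega * C^{-1} * (-F(u)).

    C should approximate the Jacobian J(u); here we use J(u) itself and
    solve directly (standing in for the inner multigrid loop of the talk).
    """
    u = u0.copy()
    for it in range(max_it):
        d = -F(u)                                   # current defect
        if np.linalg.norm(d) < tol:
            return u, it
        u = u + omega * np.linalg.solve(J(u), d)    # preconditioned update
    return u, max_it

# Invented model problem: F(u) = A u + u*u - b with A a 1-D Laplacian stencil.
n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
F = lambda u: A @ u + u * u - b
J = lambda u: A + 2.0 * np.diag(u)                  # Jacobian of F
u, iters = defect_correction(F, J, np.zeros(n))
```

With omega = 1 and C = J(u) the loop coincides with Newton's method; the real code uses a fixed (cheaper) preconditioner and damping.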

Mathematical background (4)

Parallelisation strategy:

• Domain decomposition using a graph-oriented partitioning tool (Metis or PARTY library)

• Uniform refinement of each parallel block, typically 4-6 times

• Local communication between at most two adjacent parallel blocks (due to the FEM ansatz: Q̃1/Q0 !)

• Block smoothing
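The refinement step fixes the local problem size per process: one uniform refinement splits every hexahedron into 8, so the cell count per parallel block grows by a factor of 8 per level. A small sketch (the coarse block size of 10 cells is an assumption for illustration):

```python
def cells_after_refinement(coarse_cells: int, levels: int) -> int:
    """Uniform refinement of hexahedra: each level splits every cell into 8."""
    return coarse_cells * 8 ** levels

# A hypothetical parallel block of 10 coarse hexahedra, refined 4-6 times
# (the typical range quoted on the slide):
sizes = {L: cells_after_refinement(10, L) for L in (4, 5, 6)}
# 4 levels -> 40960 cells per block, 6 levels -> 2621440 cells per block
```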

Mathematical background (5)

Smoothing:

• The numerical and computational complexity of multigrid stands or falls with the smoothing algorithm used.

• Smoothers, however, generally have a highly recursive character.

Idea of block smoothing:

• Avoid direct parallelisation of the global smoother (significant amount of communication!)

• Instead: apply the same smoothing algorithm within each parallel block only (parallel block = one patch of elements from the partitioning algorithm)
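The effect of block smoothing can be demonstrated on a minimal 1-D model (my own sketch, not from the talk): a block Jacobi iteration for the 1-D Poisson system, where each diagonal block plays the role of one parallel block. With one block the solve is exact; with one block per unknown it degenerates into point Jacobi, the limit case mentioned on the slide, and the iteration count grows accordingly.

```python
import numpy as np

def block_jacobi_iterations(n=64, n_blocks=1, tol=1e-8, max_it=100000):
    """Iterations needed to solve the 1-D Poisson system A u = b with
    block Jacobi, using n_blocks non-overlapping diagonal blocks."""
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    # Block diagonal preconditioner P: drop all couplings between blocks
    # (this is what smoothing only inside each parallel block amounts to).
    P = np.zeros_like(A)
    size = n // n_blocks
    for i in range(n_blocks):
        s = slice(i * size, (i + 1) * size)
        P[s, s] = A[s, s]
    u = np.zeros(n)
    for it in range(1, max_it + 1):
        r = b - A @ u
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return it
        u += np.linalg.solve(P, r)
    return max_it

# More blocks (i.e. more "processes") -> more iterations:
iters = {nb: block_jacobi_iterations(n_blocks=nb) for nb in (1, 4, 16, 64)}
```

The iteration counts grow monotonically from a couple of iterations for one block to thousands for the point-Jacobi limit, which is exactly the weakened smoothing property discussed next.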

Mathematical background (6)

Consequences of block smoothing, with an increasing number of parallel processes:

• It takes more than 1 iteration to spread information across the grid (weakened smoothing property). In the limit case, the block smoother turns into a simple Jacobi iteration!

• Thus, the number of multigrid sweeps will increase. Significantly?

– for the Burgers equation: probably not, it is well-conditioned (scaling with time step k)

– for the discrete Pressure Poisson equation: probably, a problem with condition number O(h^−2)
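The different sensitivity of the two subproblems can be made plausible by standard condition number estimates (a textbook sketch, assuming a quasi-uniform mesh of width h and time step k):

```latex
% Burgers step: the system matrix is roughly M + \theta k\,\nu L, hence
\kappa\bigl(M + \theta k\,\nu L\bigr) = \mathcal{O}\bigl(1 + k\,\nu\,h^{-2}\bigr)
\quad\text{(close to 1 for small } k\text{)};
% Pressure Poisson step: a discrete Laplacian, independent of k:
\kappa(L) = \mathcal{O}(h^{-2}).
```

So the Burgers systems stay benign as k shrinks, while the pressure systems remain ill-conditioned on fine grids, and any weakening of the smoother hits them first.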

Implementation

• Code written in C/C++ (thrifty usage of comfortable, but performance-reducing language elements)

• designed to run on most MPP units running some kind of Unix flavour and providing an MPI environment (tested on clusters of Sun, SGI & Alpha workstations, Linux PCs, Cray T3E, SR8000, ...)

↪→ does not incorporate explicit vector processing routines

• has a well-tested sequential (F77) counterpart from the FEATFLOW package (author: Turek et al., since 1985)

Numerical section

• Roughly a dozen different flow problems have been simulated so far

• Pars pro toto, the typical effects that can be observed will be illustrated on the basis of the DFG benchmark 3D-2Z from 1995 ("channel flow around a cylinder")

DFG benchmark 3D-2Z from 1995:

[figure: channel with cylindrical obstacle; inflow boundary, outflow boundary]

Re = 500

aspect ratio ≈ 20

Grids:

[figure: 2x refined grid, side view]

We did some long-term simulation (TEnd = 20s) ...

• degrees of freedom: 32 million (9 GByte RAM)

• #time steps: 6,500

... computed lift and drag coefficients ...

[figure: lift coefficient (roughly -0.1 to 0.06) and drag coefficient (roughly 0.5 to 4.5) over the simulation time 0-18s]

... and studied the scaling of the program on different platforms for this problem (64, 128, 256 cpus) (T = [0, 1])

[figure: run times [h] on Cray T3E-1200, Hitachi SR8000 and HELICS for 64, 128 and 256 cpus, each bar annotated with the fraction of time spent in communication, ranging from 16% to 62%]

(HELICS = Linux cluster at IWR Heidelberg, 512 Athlons 1.4 GHz, Myrinet, www.helics.de)

Observations (1)

• Hitachi SR8000 is conspicuously in last position

– sCC compiler (latest release) used

– run times with g++ are worse

– code does not compile with KCC (although it does on the Cray T3E-1200)

• The fact that the Hitachi is outperformed by a (much cheaper) Linux cluster (by a factor of 2-3 on average) has been observed for different problem sizes, degrees of parallelism and geometries. That's the price for just using MPI and not incorporating vector processing techniques directly into the code.

Observations (2)

• The performance of a single cpu on the SR8000 is simply not appropriately exploited by the code

• But: how many of the codes being granted access to the SR8000 are especially designed for this architecture? What percentage?

• I bet: a significant percentage uses the SR8000 as one MPP unit among others, too (relying on MPI and compiler optimisations of the code)

Observations (3)

Things are not entirely bad on the SR8000!

• best communication network, least time spent in communication routines

• hence, scaling is the best of all platforms tested yet

Reduction in run time for a different problem

[figure: run time (min) vs. # cpus for Alpha cluster, Alpha ES40 (cxx), Alpha ES40 (g++), Cray T3E-1200, Linux cluster, Sun Enterprise 3500]

(Lid-Driven Cavity, 11 million d.o.f., 100 time steps)

Parallel efficiencies

• Parallel efficiencies are rather good with few cpus: 0.9 - 0.95

• But they then drop to 0.6 - 0.7 at higher degrees of parallelism (as long as the problem size is reasonably big)

Reason?

• Communication time increases at half the speed at which the parallel efficiencies drop ↪→ there must be a different effect!
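Parallel efficiency here is the usual ratio E(p) = T(1) / (p · T(p)), or, relative to the smallest run that fits in memory, E = (T(p0) · p0) / (T(p) · p). A tiny sketch with made-up timings, purely to illustrate the definition (the hours below are not measurements from the talk):

```python
def parallel_efficiency(t_ref: float, p_ref: int, t_p: float, p: int) -> float:
    """Efficiency of a run on p cpus relative to a reference run on p_ref cpus."""
    return (t_ref * p_ref) / (t_p * p)

# Hypothetical timings in hours, for illustration only:
t16, t128 = 40.0, 7.5
eff = parallel_efficiency(t16, 16, t128, 128)   # falls in the reported 0.6-0.7 band
```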

The Pressure Poisson Problem

• Solving the Pressure Poisson problem takes 10-15% of the overall run time on 1-2 cpus

• Same problem, 64 cpus or more: 50-65% of the run time! What is going on?

[figure: number of multigrid iterations vs. # cpus (1, 2, 4, 8, 64, 128, 256); iteration counts grow from 3 on a single cpu up to around 30 at the highest cpu counts]
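The jump from 10-15% to 50-65% is consistent with the iteration growth alone. A back-of-the-envelope model (my own sketch, not from the talk): if the Poisson solve initially takes a fraction f0 of the run time and its cost grows by a factor r while everything else stays fixed, its new share of the run time is f0·r / (1 − f0 + f0·r).

```python
def poisson_share(f0: float, r: float) -> float:
    """New run-time share of the pressure Poisson solve when its cost grows
    by a factor r and the rest of the time step stays unchanged."""
    return f0 * r / ((1.0 - f0) + f0 * r)

# Assumed inputs: 12% initial share, iterations up by a factor of 10
# (roughly 3 -> 30 as in the figure):
share = poisson_share(0.12, 10.0)   # lands inside the observed 50-65% band
```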

Remarks

• The deterioration depends on the aspect ratios.

• The speedup is not that bad if one compares e.g. 16 vs. 128 cpus

• But: the performance of the code is not yet satisfying enough to kick off incorporating more features like

– heat transfer (Boussinesq)

– k-ε model

– free surfaces

– multiphase flow (bubble column reactors)

which already exist in the sequential code.

Remarks

• Hence, we're looking for a solver for the Pressure Poisson problem which does not deteriorate with the number of cpus

• And we have found a candidate. The implementation is being done in the context of the projects ScaRC and FEAST.

A current application

BMBF project 03C0348A: design of ceramic wall reactors

• Intention: development of ceramic wall reactors and ceramic plate heat exchangers as micro reactors for heterogeneously catalysed gas phase reactions.

• Main aim: increasing the performance of the reactor by optimising its geometry to gain an equally distributed velocity field.

• Given this, the partners (Institute of Chemical Engineering, University of Dortmund, and Hermsdorfer Institute for Technical Ceramics) will try to calibrate catalytic activity, diffusive mass transport and heat removal to attain an optimal temperature distribution.

Sketch of the overall geometry of the ceramic wall reactor and flow directions

[figure: reactor geometry with inflow nozzle, outflow nozzle and some obstacles of a suitable shape]

• 2 dozen different geometries so far

• average problem size: 60 million d.o.f., 100 time steps to the stationary limit case

• 12h with 16 cpus on the SR8000 per simulation

Velocity field for some geometries

[figure: velocity fields for several reactor geometries]

Conclusion and Outlook

• We have a parallel solver for the 3-D incompressible nonstationary Navier–Stokes equations which is fast, robust and portable.

• Exploitation of the performance of computers like the Cray T3E-1200 and the SR8000 is still too poor (due to implementation and numerics).

• But hopefully, my access to the SR8000 will not be revoked within the next days!

• A new package is currently being written which incorporates both better numerics (hardly any deterioration) and hardware-oriented implementation techniques (vectorisation, better cache exploitation). First tests show that we can reach nearly 30-50% of the machine's peak performance.