GPU-accelerated CFD Simulations for Turbomachinery …...Aissa M. , Verstraete ,T., and Vuik, c.. " Toward a GPU-aware comparison of explicit and implicit CFD simulations on structured

Jun 29, 2020


GPU-accelerated CFD Simulations for Turbomachinery Design Optimization

Mohamed H. Aissa

Co-promotor: Dr. Tom Verstraete

Promotor: Prof. C. Vuik

www.researchgate.net/profile/Mohamed_Aissa3


Can your simulation profit from the GPU?

• What is a GPU?
• How fast is it?
• How to use it?


Multi-core vs many-core

[Diagram: a multi-core system, where each processor has its own control unit and memory and the processors are linked by a communication network, next to a many-core system, where several processors share one control unit and one memory.]


Massively parallel systems (e.g. GPU) as a trade-off

[Diagram: one control unit and one memory driving a large array of ALUs.]

Source: theguardian.com


How fast is it?

[Figure: algorithms placed along a "Performance Gain" axis: LU, QR, SpMV, ray tracing, FFT, image processing.]

How to use a GPU

[Figure: "Ease of Use" versus "Performance Gain"; labeled entries: GPU libraries (cuFFT, cuBLAS, …) and Lattice Boltzmann.]


Airplanes are getting more efficient

Engine optimization is a main contributor


Topic of Interest

TurboLab Stator (1/4)

• Nblades = 15
• Chord length fixed
• Casing fixture

[Figure: stator geometry with dimension labels 60 mm, d = 10 mm, h = 20 mm, and d = 2 mm.]


TurboLab (2/4): Boundary conditions and summary

• Inlet P0: 102713.0 Pa
• Inlet T0: 294.314 K
• Inlet whirl angle: 42°
• Inlet pitch angle: 0°
• Mass flow: 9 kg/s

Objectives:
• Lower axial deviation
• Lower total pressure loss


TurboLab (3/4): Parametrization

21 design variables


TurboLab (4/4): Optimization results

[Figure: optimization results; labels: 1.7 %, IT074IND6, 60 %, 0.17 %.]

Every point is a costly CFD optimization: need for an HPC solution.


How beneficial are GPUs, a quick literature check:

• Acceleration is case-dependent (from 1x to 1000x).

• Reported speedups sometimes contradict each other.

• Some publications are very critical of GPUs for scientific computations:
– Lee et al., "Debunking the 100x GPU vs. CPU myth"
– Vuduc et al., "On the limits of GPU acceleration"


Main objective: A more tangible GPU potential

• CFD GPU solvers
• Classification of CFD operations
• Proof-of-concept: Optimization cases
• Summary and Conclusions

* All icons in this document are from Flaticon.com


Numerical Scheme:

Implicit Time Stepping

Explicit Time Stepping


• Explicit time integration


Explicit solver

• Application: Steady RANS simulation
• Solved equations: RANS (SA model)
• Discretization (2nd order): Roe scheme + flux limiter, explicit 4-stage RK
• Mesh: Multi-block, structured
• Acceleration:
  – 2-level multigrid
  – Implicit residual smoothing
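The explicit 4-stage Runge-Kutta update listed above can be sketched in a few lines of Python. The stage coefficients (1/4, 1/3, 1/2, 1) are the classic low-storage choice and are an assumption, since the slides do not state which coefficients the solver uses. `decay_residual` is a hypothetical scalar stand-in for the RANS residual.

```python
import math

# Assumed stage coefficients (classic low-storage 4-stage scheme);
# the slides do not specify the actual values.
ALPHAS = (0.25, 1.0 / 3.0, 0.5, 1.0)


def rk4_step(u0, dt, residual):
    """Advance du/dt = -residual(u) by one 4-stage explicit step.

    Every stage restarts from u0, so only u0 and the latest stage
    value need to be kept in memory.
    """
    u = u0
    for alpha in ALPHAS:
        u = u0 - alpha * dt * residual(u)
    return u


def decay_residual(u, lam=2.0):
    """Hypothetical stand-in for the flow residual: du/dt = -lam * u."""
    return lam * u


if __name__ == "__main__":
    u, dt = 1.0, 0.1
    for _ in range(10):
        u = rk4_step(u, dt, decay_residual)
    # Compare the numerical value at t = 1 with the exact solution.
    print(u, math.exp(-2.0))
```

Each stage only needs the residual of the previous stage, which is why such schemes map well to one-thread-per-cell execution.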


Explicit solver

[Figure: speedup over one Xeon E3 core (0 to 180) versus number of cells in thousands (0 to 500) for the GTX980 and K40; labeled values: 162 and 90.]


Convective Flux Evaluation (1/3)


Convective Flux (2/3): Thread mapping possibilities

• Face-wise mapping is not thread-safe.
• Cell-based mapping is thread-safe, but with redundancy.
• Direction-based mapping is thread-safe, with less redundancy.
• Multicoloring (MC) face-based mapping is thread-safe, with no redundancy.
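To make the difference between the mappings concrete, here is a minimal 1D Python sketch, not the solver's actual kernels. It compares a two-color face-based accumulation with a cell-based one. The averaging flux, the sign convention, and the restriction to interior faces are all assumptions chosen for illustration.

```python
def face_flux(left, right):
    """Hypothetical flux at the face between two cells (simple average)."""
    return 0.5 * (left + right)


def residual_multicolor(u):
    """Face-based accumulation in two colored passes (interior faces only).

    Face f sits between cells f and f+1. Faces of one color (even or
    odd f) never share a cell, so every face within a pass could be
    handled by its own thread without write conflicts.
    """
    res = [0.0] * len(u)
    evaluations = 0
    for color in (0, 1):
        for f in range(color, len(u) - 1, 2):  # independent within a pass
            flux = face_flux(u[f], u[f + 1])
            evaluations += 1
            res[f] -= flux       # flux leaves cell f
            res[f + 1] += flux   # and enters cell f+1
    return res, evaluations


def residual_cell_based(u):
    """Cell-based accumulation: each cell recomputes both of its faces."""
    n = len(u)
    res = [0.0] * n
    evaluations = 0
    for c in range(n):  # each cell writes only to res[c]
        if c > 0:
            res[c] += face_flux(u[c - 1], u[c])
            evaluations += 1
        if c < n - 1:
            res[c] -= face_flux(u[c], u[c + 1])
            evaluations += 1
    return res, evaluations


if __name__ == "__main__":
    u = [1.0, 2.0, 4.0, 8.0, 16.0]
    print(residual_multicolor(u))
    print(residual_cell_based(u))
```

Both functions return the same residual, but the cell-based version evaluates every interior face twice, which is the redundancy mentioned in the list above.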

Convective Flux (3/3): Multicolored (MC) vs redundant (Red)

• MC, run 1: N/2 threads; strided read, compute residual at face i, strided store (cells i, i+1).
• MC, run 2: N/2 threads; strided read, compute residual at face i+1, strided store (cells i+1, i+2).
• Red, run 1: N threads; coalesced read, compute residual at face i and at face i+1, coalesced store (cell i).

[Figure: memory bandwidth [GB/s] (0 to 250) versus stride length (1 to 8, in units of 4 bytes) for strided memory access on a GTX780, measured with A[i*stride]=B[i*stride]+C[i*stride].]
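A rough way to see why bandwidth drops with the stride is to count how many memory segments one group of threads touches. The Python sketch below is a simplified model, not a measurement: the group size of 32 threads and the 128-byte segment size are assumed values, and real hardware behavior depends on the GPU generation and cache configuration.

```python
def segments_touched(stride, warp_size=32, word_bytes=4, segment_bytes=128):
    """Count the memory segments hit when thread i reads word i*stride.

    warp_size and segment_bytes are assumed values for illustration.
    """
    segments = {(i * stride * word_bytes) // segment_bytes
                for i in range(warp_size)}
    return len(segments)


def useful_fraction(stride, warp_size=32, word_bytes=4, segment_bytes=128):
    """Fraction of the transferred bytes that the threads actually use."""
    moved = segments_touched(stride, warp_size, word_bytes, segment_bytes)
    return (warp_size * word_bytes) / (moved * segment_bytes)


if __name__ == "__main__":
    for stride in range(1, 9):
        print(stride, segments_touched(stride), useful_fraction(stride))
```

In this model a stride of 2 already halves the useful share of each transfer, which is in line with the cost of the strided reads and stores in the multicolored variant.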

                        MC       Red
Face fluxes per call    N/2      2N
Total face fluxes       N        2N
Time per call [ms]      0.28     0.71
Total time [ms]         0.56     0.71
Operations ratio        -        2x
Total speedup           1.26x    -

1.26x instead of 2x: the cost of strided access.

Convergence Acceleration on GPU (1/3)

• The explicit solver is well adapted to the GPU architecture.
• Flow convergence is slow (CFL limitation).
• Convergence acceleration is needed.

Convergence acceleration methods on the GPU?
• Multigrid
• Implicit residual smoothing

Convergence Acceleration on GPU (2/3): Multigrid is also fast on the GPU

• Solve on the fine grid
• Interpolate the solution and residual to the coarse grid
• Solve on the coarse grid, assisted by the fine-grid residual
• Prolongate the coarse correction to the fine grid

[Figure: cost ratio T(2 grids)/T(1 grid) (0 to 2) versus number of cells in thousands (0 to 500) for CPU and GPU; labeled values: 1.81, 1.58, 1.35, and the ideal 1.125.]

The cost of a 2-grid scheme converges to the ideal cost of 1.125.
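One plausible reading of the ideal value of 1.125 is sketched below in Python. It assumes that the coarse grid keeps every second cell in each of three directions, that the work scales with the cell count, and that grid transfers are free. The slides do not spell out this derivation, so treat it as an assumption.

```python
def two_grid_cost_ratio(dimensions=3, coarsening=2):
    """Ideal T(2 grids) / T(1 grid) when work scales with cell count.

    Assumes the coarse grid keeps every `coarsening`-th cell in each of
    `dimensions` directions and that transfers between grids are free.
    """
    coarse_fraction = (1.0 / coarsening) ** dimensions
    return 1.0 + coarse_fraction


if __name__ == "__main__":
    print(two_grid_cost_ratio())  # 3D, factor-2 coarsening: 1.125
```

Measured ratios above this value would then reflect the coarse grid being too small to keep the device busy, plus the transfer overhead that the model ignores.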


Convergence Acceleration on GPU (3/3): Implicit Residual Smoothing on GPU

• A higher CFL number causes oscillations in the solution.

• A smoother residual reduces the oscillations, which allows higher CFL numbers.

• Smoothing: a diffusion equation, which requires solving a tridiagonal system.

[Figure: structured grid with indices i: 0 -> Ni and j: 0 -> Nj.]

Interesting envelope (CFL increase with IRS: 2x to 3x)
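The tridiagonal solve behind implicit residual smoothing can be sketched with the Thomas algorithm. The Python example below assumes the common form -eps*x[i-1] + (1 + 2*eps)*x[i] - eps*x[i+1] = r[i] along one grid line; the value of `epsilon` and the boundary treatment are assumptions, not taken from the slides.

```python
def smooth_residual(residual, epsilon):
    """Solve -eps*x[i-1] + (1 + 2*eps)*x[i] - eps*x[i+1] = residual[i].

    Thomas algorithm for one grid line; neighbours outside the line are
    simply dropped, which is an assumed boundary treatment.
    """
    n = len(residual)
    if n == 0:
        return []
    diag = 1.0 + 2.0 * epsilon
    off = -epsilon
    c_prime = [0.0] * n  # modified upper diagonal
    d_prime = [0.0] * n  # modified right-hand side

    # Forward elimination
    c_prime[0] = off / diag
    d_prime[0] = residual[0] / diag
    for i in range(1, n):
        denom = diag - off * c_prime[i - 1]
        c_prime[i] = off / denom
        d_prime[i] = (residual[i] - off * d_prime[i - 1]) / denom

    # Back substitution
    smoothed = [0.0] * n
    smoothed[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        smoothed[i] = d_prime[i] - c_prime[i] * smoothed[i + 1]
    return smoothed


if __name__ == "__main__":
    spike = [0.0, 0.0, 1.0, 0.0, 0.0]
    print(smooth_residual(spike, epsilon=0.5))
```

A single line is solved sequentially, but on a structured grid every line in a given direction is an independent system, so one natural option is to assign one line per thread.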

• Implicit time integration

Implicit Time Stepping is more Stable but …


GMRES + Preconditioner


ILU is costly on GPU

• ILU-GMRES: small gain on every iteration, but the ILU setup is slow.

• MCILU-GMRES: multi-colored ILU is fast only for small problems.

[Figure: speedup (0 to 10) versus number of rows (264k, 1058k, 2249k, 4631k); series: Krylov speedup and ILU speedup.]

[Figure: speedup (0 to 9) versus number of rows (264k, 1058k, 2249k, 4631k); series: Krylov speedup and MCILU speedup.]

Why not a Jacobi preconditioner (PC)?

[Figure: speedup (0 to 180) versus number of rows (264k, 1058k, 2249k, 4631k); series: Krylov speedup and Jacobi speedup.]

• Jacobi-GMRES: very fast, but stable only for small time steps.

• Jacobi-GMRES: the speedup decreases for higher CFL numbers.
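The contrast between a Jacobi and an ILU-type preconditioner on parallel hardware can be illustrated with a small Python sketch. It uses dense lists and a hypothetical 3x3 lower-triangular matrix purely for readability; it is not how a sparse GPU solver stores or applies its factors.

```python
def jacobi_apply(diagonal, r):
    """Jacobi preconditioner: z[i] = r[i] / A[i][i].

    Each entry depends only on its own row, so all entries could be
    computed at once by independent threads.
    """
    return [ri / di for ri, di in zip(r, diagonal)]


def lower_triangular_solve(lower, r):
    """Forward substitution, as used when applying an ILU factor.

    z[i] needs z[0..i-1] first, so the rows form a dependency chain.
    """
    z = []
    for i, row in enumerate(lower):
        partial = sum(row[j] * z[j] for j in range(i))
        z.append((r[i] - partial) / row[i])
    return z


if __name__ == "__main__":
    # Hypothetical small lower-triangular matrix and right-hand side.
    A = [[4.0, 0.0, 0.0],
         [1.0, 5.0, 0.0],
         [0.0, 2.0, 8.0]]
    r = [8.0, 12.0, 20.0]
    print(jacobi_apply([row[i] for i, row in enumerate(A)], r))
    print(lower_triangular_solve(A, r))
```

The Jacobi application has no dependencies between rows, while the triangular solve has to proceed row after row unless the rows are reordered, for example by multicoloring.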

On-demand factorization

[Figure: stacked bars of Solve [s] and Assemble [s] for CPU ILU, GPU ILU, CPU OD-ILU, and GPU OD-ILU; data labels: 5602, 3412, 3586, 1178 and 16568, 579, 16691, 591. GPU speedup over CPU: 5.55x with ILU and 11.46x with OD-ILU.]
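The two quoted speedups can be cross-checked against the bar labels of the on-demand factorization chart. The Python sketch below assumes that the fused label "56023412" reads as 5602 and 3412, and that the two label groups pair up per configuration in the order shown. Which group is "Solve" and which is "Assemble" does not matter for the totals and is left open.

```python
# Bar labels read from the slide; pairing the two label groups into
# per-configuration totals is an assumption.
PARTS = {
    "CPU ILU": (5602, 16568),
    "GPU ILU": (3412, 579),
    "CPU OD-ILU": (3586, 16691),
    "GPU OD-ILU": (1178, 591),
}


def total_time(name):
    """Sum of the two stacked bar segments, in seconds."""
    return sum(PARTS[name])


def gpu_speedup(preconditioner):
    """CPU total divided by GPU total for 'ILU' or 'OD-ILU'."""
    cpu = total_time("CPU " + preconditioner)
    gpu = total_time("GPU " + preconditioner)
    return cpu / gpu


if __name__ == "__main__":
    for pc in ("ILU", "OD-ILU"):
        print(pc, gpu_speedup(pc))
```

Under this pairing the totals reproduce the 5.55x and 11.46x figures to within rounding, which supports the reading of the labels.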


Classification (1/2): GPU controversy

[Illustration (blind men and the elephant) with speech bubbles: "Speedup 1000x", "Speedup 10x", "Actually GPU is slower".]

Picture modified from: https://pixabay.com/en/blind-men-elephant-story-feel-see-1458438/

• GPU: thousands of lightweight cores.

• Explicit solver: 10x to 100x speedup.

• Implicit solver: 1x to 10x speedup.

We need a classification.


Classification (2/2): CFD operations

Full article: Aissa, M., Verstraete, T., and Vuik, C. "Toward a GPU-aware comparison of explicit and implicit CFD simulations on structured meshes." Computers & Mathematics with Applications 74.1 (2017): 201-217.

Performance Comparison: Explicit/Implicit

[Figure: bars for GPU Explicit, GPU Implicit, CPU Implicit, and CPU Explicit; labels: 136x, 7x, 3x, Ref.]

[Figure: log-log plot of normalized wall time (tick values from 0.001 to 1000) for implicit CPU, explicit CPU, implicit GPU, and explicit GPU; annotation: 136x.]


Example of a stator optimization

Rc = 14 (low CFL for implicit = 15)

[Figure: log-log plot (tick values from 0.001 to 1000) comparing implicit CPU, explicit CPU, implicit GPU, and explicit GPU.]

LS82 cascade

Rc = 457 (explicit solver: bad flow convergence)

[Figure: log-log plot (tick values from 0.001 to 1000) comparing implicit CPU, explicit CPU, implicit GPU, and explicit GPU.]

LS82 cascade: Results


LS82 cascade: Optimized blade


Summary

Choice of explicit vs implicit: the convergence ratio is decisive.

Explicit RANS: 100x-180x speedup.

Implicit RANS: 10x-20x speedup (due to slow preconditioning).

On-demand preconditioning: 3x faster, but a GPU-friendlier preconditioner is needed.

The classification: an operation-specific acceleration offers more insight.

Can your simulation profit from the GPU?

• Where do you situate your algorithm (slide 4: QR to ray tracing)?
• Do you need double precision (for half precision, an FPGA is faster)?
• Are you ready to code (otherwise OpenACC is easier to use)?
• Has anyone provided a classification for the operations used in your field?

Thanks for your attention

Dr. Mohamed H. Aissa

Turbomachinery & Propulsion Department

Email: [email protected]

www.researchgate.net/profile/Mohamed_Aissa3