GPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile 1


GPU Computing with CUDA
Lecture 9 - Applications - CFD

Christopher Cooper
Boston University

August, 2011
UTFSM, Valparaíso, Chile


Outline of lecture

‣ Overview of CFD
- Navier-Stokes equations
- Types of problems
- Discretization methods
‣ “Conventional” CFD
‣ Port CFD codes to CUDA
‣ Efforts
‣ Example problem: implicit heat transfer


CFD - Introduction

‣ Numerical modeling of fluid systems
‣ Navier-Stokes equation: momentum conservation

$$\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} = -\frac{\nabla p}{\rho} + \nu \nabla^2 \mathbf{u}$$

‣ Types of problems:
- Incompressible
- Compressible (non-viscous approximation)
- Shallow water
- Biphasic flows, ...


CFD - Introduction

‣ Earliest: Richardson (1910)
- Human computers
- Quickest averaged 2000 operations a week
‣ CFD development is tied to computers!
- 50s-60s: use of digital computers, finite difference methods
- 70s: finite element methods, spectral methods
- 80s: finite volume methods
- 90s: application to diverse industries


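CFD - Main discretization methods

‣ Finite difference

$$\frac{\partial u}{\partial x} \approx \frac{u_{i+1,j} - u_{i,j}}{\Delta x}$$

‣ Finite volume

$$\frac{\partial}{\partial t} \int_{\Omega} \mathbf{U} \, d\Omega + \int_{\partial\Omega} \mathbf{F} \cdot \mathbf{n} \, ds = 0$$

$$\frac{\partial \mathbf{U}}{\partial t} + \frac{\partial \mathbf{F}}{\partial x} = 0$$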
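A minimal Python sketch of the forward difference on this slide, applied along one grid line. The function name `forward_difference` and the sample function u(x) = x^2 are illustrative assumptions, not part of the lecture. On a GPU, each index `i` would typically be handled by one thread.

```python
def forward_difference(u, dx):
    """First-order forward difference: (u[i+1] - u[i]) / dx for every i but the last."""
    return [(u[i + 1] - u[i]) / dx for i in range(len(u) - 1)]


# Hypothetical example: u(x) = x**2 sampled on a uniform grid with spacing dx.
dx = 0.1
x = [i * dx for i in range(11)]
u = [xi ** 2 for xi in x]
dudx = forward_difference(u, dx)
# For u = x**2 the forward difference is 2*x + dx, so the truncation error is dx.
```

The error shrinks linearly with `dx`, which is why this stencil is called first order.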


CFD - Main discretization methods

‣ Finite element method

$$\int_{\Omega} \nabla u \cdot \nabla v \, dx = \int_{\Omega} f v \, dx$$

‣ Spectral methods

$$\widehat{\frac{\partial u}{\partial x}} = i k \hat{u}$$
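The spectral relation can be sketched in a few lines of Python: transform, multiply each mode by ik, transform back. This is only an illustration under stated assumptions: the data is periodic, the transform is a naive O(N^2) DFT written out by hand (a real code would call an FFT library such as CUFFT), and the names `dft` and `spectral_derivative` are made up for this example.

```python
import cmath
import math


def dft(values, sign):
    """Naive O(N^2) DFT; sign=-1 is the forward transform, sign=+1 the unscaled inverse."""
    n = len(values)
    return [
        sum(values[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def spectral_derivative(u, length):
    """Differentiate periodic samples u on [0, length) by multiplying each mode by i*k."""
    n = len(u)
    u_hat = dft(u, -1)
    du_hat = []
    for j in range(n):
        # Map the DFT index to a signed integer mode number.
        mode = j if j <= (n - 1) // 2 else j - n
        # The Nyquist mode of an even-length transform is zeroed for odd derivatives.
        if n % 2 == 0 and j == n // 2:
            mode = 0
        k = 2 * math.pi * mode / length
        du_hat.append(1j * k * u_hat[j])
    return [(v / n).real for v in dft(du_hat, +1)]


# Hypothetical test case: d/dx sin(x) on [0, 2*pi) should reproduce cos(x).
n = 16
length = 2 * math.pi
x = [length * m / n for m in range(n)]
du = spectral_derivative([math.sin(xm) for xm in x], length)
```

For a smooth periodic function like this one, the result matches the exact derivative to rounding error, in contrast to the first-order error of a forward difference.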


CFD - Main discretization methods

‣ Mesh free methods
- Smoothed Particle Hydrodynamics
- Vortex methods
- Radial Basis Functions
- ...

$$\frac{\partial u}{\partial x} = \sum_{i=0}^{N} \alpha_i \frac{\partial \phi_i}{\partial x}$$


CFD - Fluid Modeling

‣ Fluid flow is a multi-scale phenomenon

$$Re = \frac{V D}{\nu}$$

- We need about $Re^{9/4}$ mesh points in 3D (and total work of order $Re^3$) to reproduce all scales!
- Turbulence modeling: approximate turbulence effects
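To get a feel for these numbers, here is a small Python sketch. The Reynolds number formula is the one on the slide. The scaling estimates are assumptions brought in from the usual Kolmogorov argument rather than from the lecture: grid points of order Re^(9/4) in 3D, and total work (grid points times time steps) of order Re^3, which is how the slide's Re^3 figure is read here. The function names and the sample flow are hypothetical.

```python
def reynolds_number(velocity, length, kinematic_viscosity):
    """Re = V * D / nu."""
    return velocity * length / kinematic_viscosity


def dns_estimates(re):
    """Rough order-of-magnitude cost of resolving every scale in 3D."""
    grid_points = re ** (9 / 4)   # assumption: (L / eta)^3 with L / eta ~ Re^(3/4)
    total_work = re ** 3          # assumption: grid points times number of time steps
    return grid_points, total_work


# Hypothetical flow: water (nu ~ 1e-6 m^2/s) moving at 1 m/s past a 0.1 m body.
re = reynolds_number(1.0, 0.1, 1e-6)
points, work = dns_estimates(re)
```

Even at this modest Reynolds number of about 1e5, the estimate asks for more than 1e11 grid points, which is why turbulence is usually modeled rather than resolved.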


Conventional CFD

‣ Unstructured grids
- Unstructured sparse matrices
‣ Incompressible: $\nabla \cdot \mathbf{u} = 0$
- Projection methods
‣ Implicit
- Linear solvers
‣ Modeled turbulence
- Reduced number of points


Conventional CFD

‣ CFD is a tough problem for the GPU:
- Memory bound problems
‣ Also, people need to be convinced
- Old legacy codes
- How to port old codes to the GPU?
‣ On the other hand, CFD codes have
- SIMD structure
- Single precision arithmetic
- Large data sets
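"Memory bound" can be made concrete with a roofline-style estimate: the attainable rate is the smaller of the peak compute rate and the arithmetic intensity times the memory bandwidth. The Python sketch below is only illustrative. The flop and byte counts for a 5-point stencil are rough assumptions, and the hardware figures are round hypothetical numbers, not the specification of any particular card.

```python
def attainable_gflops(flops_per_point, bytes_per_point, peak_gflops, bandwidth_gb_s):
    """Roofline-style bound: min(peak compute, arithmetic intensity * bandwidth)."""
    intensity = flops_per_point / bytes_per_point   # flops per byte moved
    return min(peak_gflops, intensity * bandwidth_gb_s)


# Hypothetical 5-point stencil in single precision: about 6 flops per point and,
# with perfect reuse of neighbours, one 4-byte read plus one 4-byte write per point.
bound = attainable_gflops(6, 8, peak_gflops=1000.0, bandwidth_gb_s=100.0)
```

With these assumed numbers the bound is 75 GFLOP/s, far below the assumed 1000 GFLOP/s peak, so memory traffic rather than arithmetic limits the kernel.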


Porting a code to GPU

‣ Option 1: Accelerate the existing code
‣ Option 2: Rewrite code from scratch
‣ Option 3: Rethink algorithms

[Arrow beside the options: potential acceleration]

Next slides credits: J. Cohen - NVIDIA


Option 1: Accelerate existing code

‣ Easiest way
‣ Probably not huge speedup
‣ Libraries like Cusp or CUFFT may be useful


Option 1: Accelerate existing code - SpeedIT

‣ SpeedIT (OpenFOAM)
- Ported linear solvers to GPU
- Supports multi-GPU

Mesh          Speedup
20x20         -100x
96x96x96      2.4x
128x128x32    2.0x

CPU: Xeon X5650
GPU: M2050

Ville Tossavainen (Seeinside Ltd.)


Option 1: Accelerate existing code - FEAST

‣ FEAST (Finite Element Analysis and Solution Tools)
- High level abstraction approach
- Isolate “accelerable” parts of code
- Ports solver to GPU: Multigrid

Strzodka, Goddeke, Behr (2009)

CPU: Opteron 2214, 4 nodes
GPU: GTX 8800

Acceleration fraction: 75%
Local speedup: 11.5x
Global speedup: 3.8x
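The gap between local and global speedup is the familiar Amdahl's law effect: whatever is not accelerated sets a ceiling. The Python sketch below uses the plain textbook form of that law, which is an assumption on my part and not necessarily the model behind the figures on the slide. The function name is hypothetical.

```python
def global_speedup(accelerated_fraction, local_speedup):
    """Amdahl's law: overall speedup when only part of the runtime is accelerated."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / local_speedup)


# With 75% of the runtime accelerated, the ceiling is 4x however fast the GPU part gets.
ceiling = global_speedup(0.75, float("inf"))
# The same simple model evaluated with an 11.5x local speedup.
estimate = global_speedup(0.75, 11.5)
```

The ceiling of 4x shows why the reported 3.8x is already close to the best possible for a 75% acceleration fraction. The simple model itself gives roughly 3.2x for these inputs, so it does not reproduce the slide's figure exactly and should be read as a rough guide only.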


Option 2: Rewrite whole code

‣ First need to think about
- What is the total application speedup that you can get
- How does rewrite compare to accelerator approach
- Good design
- What global optimizations are possible


Option 2: Rewrite whole code - cuIBM

‣ Immersed Boundary Method on GPU (cuIBM)
- Finite difference code with immersed boundary no-slip condition
- 2 linear systems: implicit diffusion and projection
- Reported speedup: 7x

Layton, Krishnan, Barba 2011

[Figure: time [s] per operation, average over 16000 timesteps. Legend: AXPY, Apply BCs, Conversion, Force Calculation, Force Output, Generate bc1, Generate r2, Generate rN, MMM, Mat-vec, Mem Transfer, Output, Preconditioner, Solve 1, Solve 2, Transfer q, Transpose, Update B, Update QT]


Option 2: Rewrite whole code - cuIBM

[Figure: bar chart of time [s], grouped into CPU and GPU. Cases: pyAMG blackbox, pyAMG Smoothed Aggregation, Cusp Non-PC, Cusp Diagonal PC, Cusp Scaled Bridson, Cusp Smoothed Aggregation]

With a good preconditioner, the GPU is 9x faster; there is not much difference in the other cases (best is 1.6x faster)


Option 2: Rewrite whole code - Open Current

‣ Developed by Jonathan Cohen at NVIDIA
‣ Compared a highly optimized CPU code and GPU code
- CPU: Fortran, 8-core 2.5 GHz Xeon (8 threads with MPI and OpenMP)
- GPU: CUDA, Tesla C1060
‣ Solved Rayleigh-Bénard convection with a finite difference code


Option 2: Rewrite whole code - Open Current

Resolution     CUDA time/step [ms]   Fortran time/step [ms]   Speedup
64x64x32       24                    47                       2.0x
128x128x64     79                    327                      4.1x
256x256x128    498                   4070                     8.2x
384x384x192    1616                  13670                    8.5x
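The speedup column is simply the Fortran time per step divided by the CUDA time per step. A short Python check with the timings copied from the table (the variable and function names are illustrative):

```python
# (resolution, CUDA ms/step, Fortran ms/step) as listed in the table.
timings = [
    ("64x64x32", 24, 47),
    ("128x128x64", 79, 327),
    ("256x256x128", 498, 4070),
    ("384x384x192", 1616, 13670),
]


def speedups(rows):
    """Speedup = CPU time per step / GPU time per step, rounded to one decimal."""
    return {name: round(cpu_ms / gpu_ms, 1) for name, gpu_ms, cpu_ms in rows}


result = speedups(timings)
```

The ratios reproduce the tabulated 2.0x, 4.1x, 8.2x and 8.5x, and make the trend explicit: the GPU advantage grows with problem size and levels off for the two largest grids.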


Option 3: Rethink numerical algorithms

‣ Most time consuming alternative!
‣ Maybe new architectures require new numerics
‣ Find methods that map well to the hardware
- Maybe we overlooked something in the past because it was impractical


Option 3: Rethink numerical algorithms - DG

‣ Discontinuous Galerkin Methods
- Arithmetically intensive
- Mainly local
‣ Klockner et al. used DG to solve conservation laws

GPU: T1060
CPU: Xeon E5472


Implicit heat equation solver

‣ Conventional CFD usually is dominated by Poisson type solvers
- Projection methods
- Implicit solvers to avoid stability constraints

$$\frac{\partial u}{\partial t} = \alpha \nabla^2 u$$

Stability constraint of explicit time stepping:

$$\frac{\alpha k}{h^2} < 0.5$$

‣ Heat equation with Crank-Nicolson
- No stability constraint!


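Implicit heat equation solver

[Figure: square domain with sides labeled 3.5; two boundaries at T = 200 and two at T = 0]

α = 0.645
k = 1e-5
N = 128

$$\frac{\partial u}{\partial t} = \alpha \nabla^2 u$$

$$\frac{u^{n+1}_{i,j} - u^{n}_{i,j}}{k} = \frac{\alpha}{2} \left( \frac{u^{n}_{i,j+1} + u^{n}_{i,j-1} + u^{n}_{i+1,j} + u^{n}_{i-1,j} - 4u^{n}_{i,j}}{h^2} + \frac{u^{n+1}_{i,j+1} + u^{n+1}_{i,j-1} + u^{n+1}_{i+1,j} + u^{n+1}_{i-1,j} - 4u^{n+1}_{i,j}}{h^2} \right)$$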

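Plugging the problem parameters into the ratio αk/h² shows where this setup sits relative to the 0.5 limit quoted for explicit time stepping. The grid spacing is not given on the slides, so the sketch below assumes N points across a side of length 3.5 with both boundaries included, that is h = 3.5/(N-1).

```python
alpha, k, n, side = 0.645, 1e-5, 128, 3.5   # values from the slide
h = side / (n - 1)                           # assumption: N points including both boundaries
ratio = alpha * k / h ** 2                   # the quantity bounded by 0.5 for explicit stepping
explicit_k_max = 0.5 * h ** 2 / alpha        # largest k that keeps alpha*k/h^2 below 0.5
```

Under that assumption the ratio is about 0.0085, so this particular time step would also satisfy the explicit limit. The benefit of Crank-Nicolson is that k could be raised well beyond `explicit_k_max` without the scheme becoming unstable.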

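Implicit heat equation solver

$$a u^{n+1}_{i,j-1} + a u^{n+1}_{i,j+1} + a u^{n+1}_{i-1,j} + a u^{n+1}_{i+1,j} + b u^{n+1}_{i,j} = RHS_{i,j}$$

$$a = -\frac{\alpha k}{2h^2}, \qquad b = 1 - 4a = 1 + \frac{2\alpha k}{h^2}$$

$$RHS_{i,j} = u^{n}_{i,j} + \frac{\alpha k}{2h^2} \left( u^{n}_{i,j-1} + u^{n}_{i,j+1} + u^{n}_{i-1,j} + u^{n}_{i+1,j} - 4u^{n}_{i,j} \right) - BC^{n+1}$$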

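A minimal Python sketch of how the coefficients and the right-hand side could be computed on a full grid that stores the boundary values in its outer rows and columns. Two points are assumptions rather than statements from the lecture: the boundary values are taken to be constant in time, and the BC term is read as a times the known boundary neighbours at step n+1, moved to the right-hand side. The function names are hypothetical.

```python
def cn_coefficients(alpha, k, h):
    """Return (a, b) with a = -alpha*k/(2*h^2) and b = 1 - 4*a."""
    a = -alpha * k / (2.0 * h * h)
    return a, 1.0 - 4.0 * a


def build_rhs(u, a):
    """RHS at the interior points of the full grid u (boundary values included in u)."""
    n = len(u)
    rhs = [[0.0] * (n - 2) for _ in range(n - 2)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            lap = u[i][j - 1] + u[i][j + 1] + u[i - 1][j] + u[i + 1][j] - 4.0 * u[i][j]
            value = u[i][j] - a * lap            # -a equals alpha*k/(2*h^2)
            # Known boundary neighbours at step n+1 move to the right-hand side.
            for p, q in ((i, j - 1), (i, j + 1), (i - 1, j), (i + 1, j)):
                if p in (0, n - 1) or q in (0, n - 1):
                    value -= a * u[p][q]
            rhs[i - 1][j - 1] = value
    return rhs


# Hypothetical check: a uniform field on a 5 x 5 grid (3 x 3 interior points).
a, b = cn_coefficients(alpha=0.645, k=1e-5, h=3.5 / 127)
u = [[1.0] * 5 for _ in range(5)]
rhs = build_rhs(u, a)
```

For the uniform field the discrete Laplacian vanishes, so the right-hand side is 1 in the middle and picks up one boundary contribution per boundary neighbour near the edges. On a GPU each (i, j) pair would map naturally to one thread.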

Implicit heat equation solver

$$[A] = I + \frac{\alpha k}{2h^2} \cdot [\text{Poisson}]$$

$$[A] \, u^{n+1} = RHS$$

$[A]$ size $(N-2)^2 \times (N-2)^2$
$RHS$ size $(N-2)^2$
$u^{n+1}$ size $(N-2)^2$
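One way to solve this system without ever storing the matrix is a matrix-free Krylov method. The sketch below uses unpreconditioned conjugate gradient, which is my choice for illustration; the slides do not say which solver is used for this example. It assumes [A] has b on the diagonal and a on the four neighbour couplings, as in the stencil form of the scheme, and all function names are hypothetical.

```python
def apply_a(x, a, b):
    """Apply [A] to an (N-2) x (N-2) interior field using the 5-point stencil."""
    m = len(x)
    y = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            s = b * x[i][j]
            if i > 0:
                s += a * x[i - 1][j]
            if i < m - 1:
                s += a * x[i + 1][j]
            if j > 0:
                s += a * x[i][j - 1]
            if j < m - 1:
                s += a * x[i][j + 1]
            y[i][j] = s
    return y


def dot(p, q):
    """Inner product of two 2D fields."""
    return sum(pv * qv for pr, qr in zip(p, q) for pv, qv in zip(pr, qr))


def axpy(scale, x, y):
    """Return scale * x + y for 2D fields."""
    return [[scale * xv + yv for xv, yv in zip(xr, yr)] for xr, yr in zip(x, y)]


def conjugate_gradient(rhs, a, b, tol=1e-12, max_iter=500):
    """Solve [A] x = rhs with plain CG; [A] is symmetric positive definite here."""
    m = len(rhs)
    x = [[0.0] * m for _ in range(m)]
    r = [row[:] for row in rhs]      # residual for the zero initial guess
    p = [row[:] for row in r]        # first search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        ap = apply_a(p, a, b)
        step = rr / dot(p, ap)
        x = axpy(step, p, x)
        r = axpy(-step, ap, r)
        rr_new = dot(r, r)
        p = axpy(rr_new / rr, p, r)  # new direction: r + beta * p
        rr = rr_new
    return x


# Hypothetical small case: 4 x 4 interior points with a = -0.1 and b = 1 - 4a = 1.4.
a, b = -0.1, 1.4
rhs = [[1.0] * 4 for _ in range(4)]
u_new = conjugate_gradient(rhs, a, b)
```

Every building block here (stencil application, dot product, AXPY) is data parallel, which is what makes this kind of solver a natural fit for the GPU. A library such as Cusp provides these pieces along with preconditioners.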