Accelerated ANSYS Fluent: Algebraic Multigrid on a GPU Robert Strzodka NVAMG Project Lead
Source: on-demand.gputechconf.com/...Strzodka-Accelerated-ANSYS-Fluent.pdf

May 11, 2018

Accelerated ANSYS Fluent:

Algebraic Multigrid on a GPU

Robert Strzodka

NVAMG Project Lead

A Parallel Success Story in Five Steps

Step 1: Understand Application

ANSYS Fluent Computational Fluid Dynamics

Step 2: Identify Bottleneck in Coupled Solver of Incompressible NS

Non-linear iterations:

Assemble Linear System of Equations

Solve Linear System of Equations: Ax = b (accelerate this first)

Converged? No: repeat the iteration. Yes: Stop

Runtime: ~33% assembly, ~67% linear solve
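Why the larger stage is worth accelerating first can be sketched with Amdahl's law. The snippet below is an illustration only: it assumes the ~67% share belongs to the linear solve, and the function name `overall_speedup` and the stage speedups tried are hypothetical, not figures from the talk.

```python
def overall_speedup(accelerated_fraction, stage_speedup):
    """Amdahl's law: speedup of the whole run when one stage gets faster.

    accelerated_fraction: share of the original runtime spent in that stage.
    stage_speedup: how much faster that stage alone becomes.
    """
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / stage_speedup)


# Assumption: the linear solve takes ~67% of the runtime.
for stage_speedup in (2.0, 5.0, 10.0):
    print(stage_speedup, round(overall_speedup(0.67, stage_speedup), 2))
```

Under this assumption the untouched ~33% caps the overall gain at roughly 3x, however fast the solve becomes.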

Step 3: Parallelize Algorithm

Algebraic Multigrid (AMG)

[Diagram: multigrid V-cycle over three levels. On the way down each level does Pre-smooth, then Restrict Res(idual); on the way up each level does Prolongate Corr(ection), then Post-smooth.]
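The cycle in the diagram can be sketched in a few lines. This is a minimal illustration, not the nvAMG implementation: it assumes a 1D Poisson model matrix `scale * tridiag(-1, 2, -1)`, uses geometric transfer operators (full weighting and linear interpolation) instead of algebraic coarsening, and smooths with weighted Jacobi. All function names are hypothetical.

```python
def residual(x, b, scale):
    """r = b - A x for A = scale * tridiag(-1, 2, -1), zero boundary values."""
    n = len(x)
    r = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        r.append(b[i] - scale * (2.0 * x[i] - left - right))
    return r


def smooth(x, b, scale, omega=2.0 / 3.0):
    """One weighted-Jacobi sweep: x += omega * D^-1 * (b - A x)."""
    r = residual(x, b, scale)
    return [xi + omega * ri / (2.0 * scale) for xi, ri in zip(x, r)]


def restrict(r):
    """Full-weighting restriction of a fine residual to the coarse level."""
    n_coarse = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range(n_coarse)]


def prolongate(e_coarse, n):
    """Linear interpolation of a coarse correction to n fine unknowns."""
    e = [0.0] * n
    for j, value in enumerate(e_coarse):
        e[2 * j] += 0.5 * value
        e[2 * j + 1] += value
        e[2 * j + 2] += 0.5 * value
    return e


def v_cycle(x, b, scale=1.0, pre=1, post=1):
    """One V-cycle; len(x) must be 2**k - 1."""
    n = len(x)
    if n == 1:  # coarsest level: solve the 1x1 system exactly
        return [b[0] / (2.0 * scale)]
    for _ in range(pre):  # pre-smooth
        x = smooth(x, b, scale)
    r_coarse = restrict(residual(x, b, scale))  # restrict residual
    # For these transfer operators the Galerkin coarse matrix is the same
    # stencil with a quarter of the scale.
    e_coarse = v_cycle([0.0] * len(r_coarse), r_coarse, scale / 4.0, pre, post)
    correction = prolongate(e_coarse, n)  # prolongate correction
    x = [xi + ei for xi, ei in zip(x, correction)]
    for _ in range(post):  # post-smooth
        x = smooth(x, b, scale)
    return x


n = 31
b = [1.0] * n
x = [0.0] * n
r0 = max(abs(v) for v in residual(x, b, 1.0))
for _ in range(20):
    x = v_cycle(x, b)
reduction = max(abs(v) for v in residual(x, b, 1.0)) / r0
print(reduction)
```

Every loop over unknowns inside `smooth`, `restrict` and `prolongate` is independent per entry, which is what makes the solve phase a good fit for fine-grained parallel hardware.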

Step 4: Create Library of Production Quality Parallel Iterative Solvers

People (NVIDIA and ANSYS)

Assemble a great team

Collaborate closely

Algorithms

Innovate with parallelism

Understand numerical tradeoffs

Software

Invest in library design and testing

Optimize for GPUs

Step 5: Enjoy Acceleration

ANSYS Fluent 14.5 with nvAMG Solver

Chart: ANSYS Fluent AMG Solver Time (Sec), lower is better

CPU configuration | Dual Socket CPU | Dual Socket CPU + Tesla C2075 | Speedup
2 x Xeon X5650, Only 1 Core Used | 2832 | 517 | 5.5x
2 x Xeon X5650, All 12 Cores Used | 933 | 517 | 1.8x

Helix Model

Helix geometry

1.2M Hex cells

Unsteady, laminar

Coupled PBNS, DP

AMG F-cycle on CPU

AMG V-cycle on GPU

NOTE:

• This is a performance preview

• GPU support is a beta feature

• All jobs solver time only
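The speedup labels on the chart can be checked against the solver times. The sketch below assumes that 2832 s and 933 s are the CPU-only bars (1 core and 12 cores) and that 517 s is the time with the Tesla C2075 in both cases; the dictionary keys are made-up labels.

```python
# Solver times in seconds, paired as assumed in the text above.
cpu_only = {"1 core": 2832.0, "12 cores": 933.0}
with_gpu = {"1 core": 517.0, "12 cores": 517.0}

# Speedup = CPU-only time divided by CPU + GPU time.
speedups = {name: cpu_only[name] / with_gpu[name] for name in cpu_only}

for name, value in speedups.items():
    print(name, round(value, 1))
```

With this pairing the ratios round to the 5.5x and 1.8x printed on the slide.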

More about nvAMG

nvAMG Library - Interaction

Supported matrix formats

Scalar and block CSR

Single and double precision

Infrastructure

CUDA, Thrust

NVIDIA GPUs, tuned for Tesla K20X

Integration

Dynamically linkable library

Public C interface with flexible text parameters

C++ plugin system for low-level extensions

nvAMG Library - Solvers

Library of nested solvers for large sparse Ax=b

Nesting creates a solver hierarchy, e.g. [diagram with BiCGstab, AMG, Jacobi, MC-DILU]

Example solvers

Jacobi: simple local (neighbor) operations, no/little setup

BiCGStab: local and global operations, no setup

MC-DILU: graph coloring and factorization at setup

AMG: multi-level scheme, on each level: graph coarsening and matrix-matrix products at setup

Accelerate state-of-the-art multi-level linear solvers in targeted application domains

Primary Targets: CFD and Reservoir Simulation

Other domains will follow

Focus on difficult-to-parallelize algorithms

Parallelize both setup and solve phases

Difficult problems: parallel graph algorithms, sparse matrix manipulation, parallel smoothers

No groups have successfully mapped production-quality algorithms to fine-grained parallel architectures

Ensure NVIDIA architecture team understands these applications and is influenced by them

Solvers

Jacobi Solver – Trivial Parallelism

Defect correction with preconditioner M

In case of Jacobi

Ds may be small blocks themselves, e.g. 4x4
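The slide's formulas did not survive in the transcript. In its standard textbook form, defect correction with a preconditioner M iterates x_new = x + M^-1 (b - A x), and Jacobi takes M = D, the diagonal of A. The sketch below shows that scalar case only; the function name `jacobi_defect_correction` is hypothetical, dense lists stand in for a sparse format, and a block variant would invert small diagonal blocks instead of dividing by scalars.

```python
def jacobi_defect_correction(A, b, iterations=50):
    """Iterate x <- x + D^-1 (b - A x), with D the diagonal of A."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Defect (residual) of the current iterate.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Each unknown is updated independently of the others, so this
        # loop could run as one thread per unknown.
        x = [x[i] + r[i] / A[i][i] for i in range(n)]
    return x


A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [6.0, 12.0, 14.0]  # chosen so that the exact solution is [1, 2, 3]
solution = jacobi_defect_correction(A, b)
print(solution)
```

The example matrix is diagonally dominant, which is what lets plain Jacobi converge here; in the multigrid setting the same sweep serves as a smoother, not as a standalone solver.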

ILU Solvers – Coloring Enables Parallelism

Incomplete LU factorization: M = L U ≈ A

Graph coloring allows parallel setup and solve

With m unknowns and p colors, about m/p unknowns (those sharing one color) can be processed in parallel
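How coloring exposes parallelism can be sketched with a simple greedy coloring of the matrix graph. This is an illustration under assumptions: the talk does not say which coloring algorithm nvAMG uses, a sequential greedy pass is shown here for clarity, and the names `greedy_coloring` and `color_groups` are hypothetical.

```python
def greedy_coloring(adjacency):
    """Give each vertex the smallest color not used by a colored neighbor."""
    colors = {}
    for vertex in sorted(adjacency):
        used = {colors[u] for u in adjacency[vertex] if u in colors}
        color = 0
        while color in used:
            color += 1
        colors[vertex] = color
    return colors


def color_groups(colors):
    """Group vertices by color; a group has no edges between its members."""
    groups = {}
    for vertex, color in sorted(colors.items()):
        groups.setdefault(color, []).append(vertex)
    return groups


# Graph of a 6x6 tridiagonal matrix: unknown i couples to i-1 and i+1.
m = 6
adjacency = {i: [j for j in (i - 1, i + 1) if 0 <= j < m] for i in range(m)}
colors = greedy_coloring(adjacency)
groups = color_groups(colors)
print(groups)
```

In this example m = 6 and p = 2, so each color holds m/p = 3 unknowns that do not depend on one another and can be factored or substituted at the same time.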

From Geometric to Algebraic Multigrid

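Level hierarchy: $A_h x_h = b_h$, $A_{2h} x_{2h} = b_{2h}$, $A_{4h} x_{4h} = b_{4h}$

Restriction operators: $R_{2h}$, $R_{4h}$

Prolongation operators: $P_{2h}$, $P_{4h}$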

From Fine to Coarse Matrix

[Figure panels: Aggregation; Sparse matrix-matrix product]
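The step from fine to coarse matrix can be illustrated for one simple choice of transfer operators. The sketch assumes unsmoothed aggregation: the prolongation P is piecewise constant (entry 1 where a fine unknown belongs to an aggregate) and the restriction is its transpose, so each coarse entry is the sum of the fine entries that connect two aggregates. The talk does not state that nvAMG uses exactly this variant, and `coarse_matrix` is a hypothetical name; dense lists are used only for readability.

```python
def coarse_matrix(A, aggregate_of):
    """Coarse matrix P^T A P for piecewise-constant P.

    aggregate_of[i] is the aggregate (coarse unknown) of fine unknown i.
    """
    n_coarse = max(aggregate_of) + 1
    A_coarse = [[0.0] * n_coarse for _ in range(n_coarse)]
    for i, row in enumerate(A):
        for j, a_ij in enumerate(row):
            if a_ij != 0.0:
                A_coarse[aggregate_of[i]][aggregate_of[j]] += a_ij
    return A_coarse


# 1D Poisson stencil on 4 unknowns, aggregated in pairs.
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
A_coarse = coarse_matrix(A, [0, 0, 1, 1])
print(A_coarse)
```

Larger aggregates shrink the coarse matrix faster, which is one way to read the `agg8` and `agg2` labels in the solver configurations of the timing slides.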

Parallel Sparse Matrix-Matrix Product

Galerkin product in AMG: $A_{2h} = R_{2h} A_h P_{2h}$

In general: A x B = C

Two parallel steps

Find the number of non-zeroes per row of C

Compute the column indices and values per row of C

[Figure: sparsity patterns of A x B = C; legend: Non zero, Zero]
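The two steps can be sketched for matrices in CSR format. This is a sequential illustration with hypothetical names (`spgemm_csr` and the `*_ptr`, `*_col`, `*_val` arrays), not the library's kernel. The loop over rows is independent in both steps, which is where the parallelism comes from, and a prefix sum over the counts from the first step tells each row where to write in the second.

```python
def spgemm_csr(a_ptr, a_col, a_val, b_ptr, b_col, b_val):
    """C = A x B for CSR inputs; returns (c_ptr, c_col, c_val)."""
    n_rows = len(a_ptr) - 1

    # Step 1: number of (structural) non-zeroes per row of C.
    counts = []
    for i in range(n_rows):
        columns = set()
        for k in range(a_ptr[i], a_ptr[i + 1]):
            j = a_col[k]
            columns.update(b_col[b_ptr[j]:b_ptr[j + 1]])
        counts.append(len(columns))

    # Prefix sum of the counts gives the row pointers of C.
    c_ptr = [0]
    for count in counts:
        c_ptr.append(c_ptr[-1] + count)
    c_col = [0] * c_ptr[-1]
    c_val = [0.0] * c_ptr[-1]

    # Step 2: column indices and values per row of C.
    for i in range(n_rows):
        accumulator = {}
        for k in range(a_ptr[i], a_ptr[i + 1]):
            j = a_col[k]
            for l in range(b_ptr[j], b_ptr[j + 1]):
                col = b_col[l]
                accumulator[col] = accumulator.get(col, 0.0) + a_val[k] * b_val[l]
        for offset, col in enumerate(sorted(accumulator)):
            c_col[c_ptr[i] + offset] = col
            c_val[c_ptr[i] + offset] = accumulator[col]
    return c_ptr, c_col, c_val


# A = [[1, 2], [0, 3]] and B = [[4, 0], [5, 6]] in CSR form.
c_ptr, c_col, c_val = spgemm_csr([0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0],
                                 [0, 1, 3], [0, 0, 1], [4.0, 5.0, 6.0])
print(c_ptr, c_col, c_val)
```

Applying the routine twice, first to A_h and P_2h and then to R_2h and the result, would give the Galerkin product named on the slide.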

nvAMG results for different Ax=b

Hardware

K20X: Kepler architecture, Tesla K20X GPU Accelerator

C2090: Fermi architecture, Tesla C2090 GPU Accelerator

3930K(6): Sandy Bridge architecture, Core i7-3930K, 6 cores

AMG Timings on Regular Discretizations

CPU Fluent solver: AMG(F-cycle, agg8, DILU, 0pre, 3post)

GPU nvAMG solver: AMG(V-cycle, agg8, MC-DILU, 0pre, 3post)

[Bar chart, lower is better: Helix (hex 208K) and Helix (tet 1173K) on K20X, C2090 and 3930K(6); vertical axis from 0 to 1.4]

AMG Timings on Irregular Discretizations

CPU Fluent solver: AMG(F-cycle, agg8, DILU, 0pre, 3post)

GPU nvAMG solver: AMG(V-cycle, agg2, MC-DILU, 0pre, 3post)

[Bar chart, lower is better: Airfoil (hex 784K) and Aircraft (hex 1798K) on K20X, C2090 and 3930K(6); vertical axis from 0 to 9]

ANSYS and NVIDIA Collaboration Roadmap

Release | ANSYS Mechanical | ANSYS Fluent
13.0, Dec 2010 | SMP, Single GPU, Sparse and PCG/JCG Solvers | -
14.0, Dec 2011 | + Distributed ANSYS; + Multi-node Support | Radiation Heat Transfer (beta)
14.5, Oct 2012 | + Multi-GPU Support; + Hybrid PCG; + Kepler GPU Support | + Radiation HT; + GPU AMG Solver (beta), Single GPU
15.0, Mid-2013 | CUDA 5 + Kepler Tuning | Multi-GPU AMG Solver

A Parallel Success Story in Five Steps

Step 1: Understand Application

Step 2: Identify Bottlenecks

Step 3: Parallelize Algorithms

Step 4: Create Library

People (Team + Collaboration)

Algorithms (Innovation + Mathematics)

Software (Design + Optimization)

Step 5: Enjoy Acceleration

Big praise for nvAMG and ANSYS team

Welcome to ANSYS Fluent with nvAMG

Starting with single GPU support as a beta feature in 14.5

Questions?