Introducing GPUs to a Commercial Reservoir Simulator Dominic Walsh, Paul Woodhams, Schlumberger Yongpeng Zhang, Ken Esler, Stone Ridge Technology Inc.
Introducing GPUs to a Commercial Reservoir Simulator · 2015. 3. 19. · Introducing GPUs to a Commercial Reservoir Simulator Dominic Walsh, Paul Woodhams, Schlumberger Yongpeng Zhang,

Feb 27, 2021

Page 1: Introducing GPUs to a Commercial Reservoir Simulator

Dominic Walsh, Paul Woodhams, Schlumberger

Yongpeng Zhang, Ken Esler, Stone Ridge Technology Inc.

Page 2: Reservoir Simulation

• Purpose: estimate reserves; predict optimal recovery and production strategy
• Input: rock, fluid, and well data; production history
• Model size: 10^4 cells (laptops) to 10^9 cells (Linux clusters)
• Uncertainty: multiple realizations
• Embedded: Network, Plant, Economics

Very computationally demanding

[Figure: Client Model Size (Millions Cells) by year, 2011 to 2014; vertical axis 0 to 2000.]

Page 3: History and Problem Size

• GPUs have been very successful in the seismic domain
• Seismic clusters are acquiring GPUs and InfiniBand, which makes them simulation ready
• Clients are being constrained by power envelopes
• New GPU simulator?
  – ECLIPSE (circa 1984) is the industry standard
  – INTERSECT (circa 2010) is the “high fidelity” simulator
  – Testing and validation is measured in man-decades
  – User base migration is expected to take 5-10 years

Can we take advantage of new GPU hardware while preserving this investment?

Page 4: Structure

• Reservoir:
  – Deposition: semi-structured grids
  – Finite volume: low-order stencils
  – Static structure
  – Time stepping: implicit and adaptive
  – Up to billions of cells
• Wells:
  – Pipe flow
  – Introduce local structure
  – Up to 10^5 wells

[Figure: Reservoir Grid and Wells]

Page 5: Irregularity and Nonlinearity

• Many small tightly-coupled sub-problems
• Time varying structure
• Complicated fluid and phase modelling
• Per-cell nonlinear systems
• Possibly non-reversible rock models

[Figure: Well Structure. Labels: Terminal (THP), Internal constraint, Inflow model, Flow node.]

[Figure: Phase Envelope]

Page 6: Phase I: Thermal Linear Solver

• Thermal
• Single Box
• Linear Solver

Code volume
Small problem size
Fully Implicit
Windows workstation
Amdahl

[Figure: THERMAL 16X Parallel Runtime %. Legend: FM, Reporting, Properties, Matrix, Linear Solver, Other. Values as extracted: 0.21, 0.44, 4.02, 8.22, 71.55, 10.19.]

[Figure: Lines of Code, Solver vs. Engine; axis 0% to 100%.]
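The "Amdahl" note on this slide can be made concrete with a short calculation. The sketch below assumes that the 71.55 value in the runtime chart is the linear solver share (the values are matched to the legend by order, which the transcript does not confirm). The function name `amdahl_speedup` and the sample solver speedups are illustrative only.

```python
def amdahl_speedup(accelerated_fraction: float, component_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated (Amdahl's law)."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if component_speedup <= 0.0:
        raise ValueError("component_speedup must be positive")
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / component_speedup)


# Assumption: the linear solver is 71.55% of the THERMAL 16X parallel runtime.
SOLVER_FRACTION = 0.7155

if __name__ == "__main__":
    for s in (2.0, 5.0, 26.0):
        overall = amdahl_speedup(SOLVER_FRACTION, s)
        print(f"solver {s:>4.0f}X -> overall {overall:.2f}X")
    # Upper bound as solver time goes to zero: 1 / (1 - fraction).
    print(f"limit -> {1.0 / (1.0 - SOLVER_FRACTION):.2f}X")
```

Under that assumption the whole-run speedup is capped at roughly 3.5X however fast the solver becomes, which is consistent with the deck's later point that the linear solver alone is not enough.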

Page 7: Test Model: THERM

• Small: 1 M cells and 9 well pairs
• Thermal: CH4 + Bitumen
• 2.5 yrs. steam injection
• Very strong transitions
• Numerically very demanding

[Figures: Property Distribution; Solution Distribution]

Page 9: Preliminary indicators

Good: ~5X
Challenge†: ~0.9X-0.75X

[Figure: Sparse Matrix Multiply; bars IX 16X, MKL 16X, HYBRID K20X; vertical axis 0 to 0.09.]

[Figure: Algorithmic Weakening; Number of Linear Iterations (0% to 140%) for CPU, GPU AMG, GPU ILU.]

† Fine-Grained Parallel Preconditioners for Fast GPU-based Solvers, Dimitar Lukarski, GTC 2012; High Performance Algebraic Multigrid for Commercial Applications, Jonathan Cohen, GTC 2013

Page 10: Offload: MPI & Multiplexing

• INTERSECT:
  – MPI process per domain
• Device shared memory?
  – Only Linux
  – Not Windows
• Use threads to drive multiple cards
  – C++, NOT OpenMP
  – CUDA 7
• Transfer:
  – Stage on host side
  – Pinned

[Figure: process layout. Labels: Proc 0 ... Proc 7, SHMEM, Driver, GPU 0, GPU 1.]

[Figure: timeline (Time axis). Labels: 16 IX Processes, Standard Parallel IX, Sleeping Threads, Multithreaded GPU Solver, One Thread/GPU, Sleeping Processes, Standard Parallel IX.]
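The multiplexing idea on this slide (many simulator processes, one driver thread per GPU, everything else asleep during the solve) can be sketched without any GPU at all. The example below is a Python analogue and not the deck's C++/CUDA implementation: Python threads stand in for the MPI processes, a queue stands in for the shared-memory hand-off, and a trivial sum stands in for the GPU solve. The class `DeviceDriver`, the round-robin mapping in `device_for`, and `run_domains` are all hypothetical names introduced for illustration.

```python
import queue
import threading


class DeviceDriver:
    """One worker thread per (simulated) device; workers sleep until work arrives."""

    def __init__(self, num_devices: int):
        self.queues = [queue.Queue() for _ in range(num_devices)]
        self.threads = [
            threading.Thread(target=self._worker, args=(dev,), daemon=True)
            for dev in range(num_devices)
        ]
        for t in self.threads:
            t.start()

    def _worker(self, dev: int) -> None:
        while True:
            item = self.queues[dev].get()  # blocks: the thread sleeps here
            if item is None:               # shutdown sentinel
                break
            rank, rhs, done = item
            # Stand-in for a GPU solve; a real driver would launch kernels here.
            done["result"] = (dev, rank, sum(rhs))
            done["event"].set()

    def device_for(self, rank: int) -> int:
        # Hypothetical mapping: domains are dealt round-robin to devices.
        return rank % len(self.queues)

    def solve(self, rank: int, rhs):
        """Called by a domain; the caller sleeps until its device thread is done."""
        done = {"event": threading.Event(), "result": None}
        self.queues[self.device_for(rank)].put((rank, rhs, done))
        done["event"].wait()
        return done["result"]

    def shutdown(self) -> None:
        for q in self.queues:
            q.put(None)
        for t in self.threads:
            t.join()


def run_domains(num_domains: int = 16, num_devices: int = 2):
    """Simulate 16 domains sharing 2 devices, as in the slide's diagram."""
    driver = DeviceDriver(num_devices)
    results = [None] * num_domains

    def domain(rank: int) -> None:
        results[rank] = driver.solve(rank, [float(rank)] * 4)

    callers = [threading.Thread(target=domain, args=(r,)) for r in range(num_domains)]
    for c in callers:
        c.start()
    for c in callers:
        c.join()
    driver.shutdown()
    return results
```

Calling `run_domains()` returns one `(device, rank, value)` tuple per domain. The point of the design is that only the per-device threads are active during the solve, while the submitting domains block on an event rather than spin.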

Page 11: Transfer Cost

• Transfer cost is a significant fraction of complete CPU solve
• Naïve implementation not sufficient

[Figure: CPU and GPU timelines (Time axis). Labels: Props, Jacobian, Matrix (PCI bus), Setup, Solve, Solution (small).]

Page 12: Overlapping & CPR

• CPR is a composite preconditioner!
  – Pressure is 1/16th size
  – AMG: small but costly
  – Second stage is relatively cheap
• Use streams
  – per matrix
  – per thread/GPU
• Lambdas in CUDA 7
• Use mixed precision

[Figure: CPU and GPU timelines with overlap (Time axis). Labels: Props, Jacobian, Setup, Solve, Solution (small); annotation: Pressure is 1/16th size.]
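One plausible reading of the overlap described here is that the small pressure block is copied to the device first, so the costly AMG setup can start while the rest of the matrix is still crossing the PCI bus on another stream. The toy timing model below illustrates that reading only. The class `SolveCosts`, both functions, and every cost value are hypothetical placeholders, not measurements from the deck; the only number taken from the slide is the 1/16 pressure fraction, and the model further assumes copy time scales with that fraction.

```python
from dataclasses import dataclass


@dataclass
class SolveCosts:
    """Hypothetical per-solve costs in arbitrary time units (not measured data)."""
    transfer_full: float        # host-to-device copy of the full matrix
    pressure_setup: float       # AMG setup on the pressure system
    second_stage_setup: float   # setup of the cheaper second-stage preconditioner
    solve: float                # iterations once both stages are ready
    pressure_fraction: float = 1.0 / 16.0  # from the slide: pressure is 1/16th size


def serial_time(c: SolveCosts) -> float:
    """Naive schedule: copy everything, then set up both stages, then solve."""
    return c.transfer_full + c.pressure_setup + c.second_stage_setup + c.solve


def overlapped_time(c: SolveCosts) -> float:
    """Copy the pressure block first, then overlap the remaining copy with AMG setup."""
    t_pressure_copy = c.transfer_full * c.pressure_fraction
    t_rest_copy = c.transfer_full - t_pressure_copy
    # Two streams run side by side: AMG setup and the remainder of the copy.
    t_overlap = max(c.pressure_setup, t_rest_copy)
    return t_pressure_copy + t_overlap + c.second_stage_setup + c.solve


if __name__ == "__main__":
    costs = SolveCosts(transfer_full=16.0, pressure_setup=10.0,
                       second_stage_setup=1.0, solve=5.0)
    print("serial:    ", serial_time(costs))
    print("overlapped:", overlapped_time(costs))
```

With these placeholder numbers the serial schedule takes 32 units and the overlapped one 22, because the AMG setup is hidden behind the remaining transfer. Real gains depend on the actual ratio of transfer to setup cost.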

Page 13: THERM Results

• Good solver speedup
• Still carrying a lot of non-solver time
• Marginal benefit

[Figure: Elapsed Time (seconds, 0 to 12000) for 16X CPU, "+K40", "+2xK40", "+4xK20X"; series: Linear Solver Solve s, Linear Solver Setup s, Not Linear Solver.]

[Figure: Linear Solver Speedup (0 to 3.5) for 16X CPU, "+K40", "+2xK40", "+4xK20X".]

Page 14: Larger Model: THERM_L (4M)

• Better solver speedup
• More work on cards
• Bigger impact

[Figure: Linear Solver Speedup (0 to 6) for 16X CPU, "+4xM2090", "+4xK40".]

[Figure: Elapsed Time (seconds, 0 to 100000) for 16X CPU, "+4xM2090", "+4xK40"; series: Not Linear Solver, Linear Solver Setup s, Linear Solver Solve s.]

Page 15: Implications: GPU vs CPU

• Currently need 48 nodes to match GPU performance
• Adding more CPUs does not give further speedup
• Amdahl

[Figure: THERM_L CPU Elapsed (hr.), 0 to 30, for 16, 32, 48, 16+4M2090, 16+4K40.]

Page 16: THERM_L Strong Scaling

26X Linear Solver speedup

[Figure: Linear Solver Speedup (0 to 30). Categories as extracted: 16X CPU, 4xM2090, 4xK40, 16 M2090, 4 K40, 16 K40, 2 K80, 4 K80, 8 K80, 16 K80. Group labels: CUDASolver V1 (Single Node); CUDASolver V2 (MPI, Cluster Ready); Single-Node; Multi-Node.]

Page 17: THERM_XL (16M): Strong Scaling

[Figure: CPU vs. GPU Linear Solver Time (0 to 25000) for 60MPI, 120MPI_II, 240MPI_II, 320MPI, 480MPI, 54MPI+18K40, 108MPI+36K40, 180+60K40, 216MPI+72K40.]

Page 18: Compute Density

[Figure: Linear Solver time (hrs, 0 to 8) vs. Number of Nodes (0 to 20); series: CPU Best Time, GPU Worst Time, GPU Best Time; annotations: 2.2X, 5.4X, 4.7X More Nodes.]

Page 20: Next Steps

• Commercialize current solution
  – Lessons learnt: CPU solver
  – Cluster hardware implications?
• Linear solver is not enough: extend GPU
  – Wells: too small and too complicated, remain on CPU
  – Reservoir: Jacobian construction, property calculation
• Requirements:
  – Single code base: OpenACC? Custom?
  – Overlapping rework

Page 21: Thanks and Acknowledgements

Many thanks to Jonathan Cohen, Julian Demouth, Patrice Castonguay, Justin Luitjens, Ken Hester & Doug Holt

Questions?