Devito - Automated fast finite difference computation

Navjot Kukreja, Mathias Louboutin, Felippe Vieira, Fabio Luporini, Michael Lange, Gerard Gorman

WolfHPC 2016, November 13, 2016

Page 1: Devito - Automated fast finite difference computation

Introduction

Devito

Example - Seismic Imaging

Devito: Automated fast finite difference computation

Navjot Kukreja, Mathias Louboutin, Felippe Vieira, Fabio Luporini, Michael Lange, Gerard Gorman

WolfHPC 2016

November 13, 2016

Page 2: Devito - Automated fast finite difference computation


Finite Difference Methods

$$ f'(a) \approx \frac{f(a + h) - f(a)}{h} \qquad (1) $$

Taylor series expansion

Calculate derivatives of any order with relative simplicity

Mathematically simple method for solving PDEs

Figure: Discretizing a function on a grid (Wikipedia)
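Equation (1) can be tried out numerically. The following is a minimal sketch, not part of the original slides: the helper name forward_diff and the choice of sin as test function are assumptions made for illustration. It shows the error of the forward difference shrinking roughly in proportion to h.

```python
import math

def forward_diff(f, a, h):
    """Forward-difference approximation of f'(a), as in equation (1)."""
    return (f(a + h) - f(a)) / h

# Compare against the exact derivative of sin at a = 1, which is cos(1).
steps = (1e-1, 1e-2, 1e-3)
errors = [abs(forward_diff(math.sin, 1.0, h) - math.cos(1.0)) for h in steps]

for h, err in zip(steps, errors):
    print(f"h = {h:g}: error = {err:.2e}")
```

Each tenfold reduction of h reduces the error by about a factor of ten, which is the first-order behaviour predicted by the Taylor series expansion.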

Page 3: Devito - Automated fast finite difference computation


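Why FD?

The acoustic wave equation

$$ \frac{\partial^2 u}{\partial x^2} - \frac{1}{c^2} \frac{\partial^2 u}{\partial t^2} = 0 \qquad (2) $$

Discretized:

$$ u_i^{n+1} = -u_i^{n-1} + 2u_i^n + C^2 \left( u_{i+1}^n - 2u_i^n + u_{i-1}^n \right) \qquad (3) $$

Figure: Mesh in space and time for a 1D wave equation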
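The update rule (3) translates almost directly into code. The sketch below is an illustration only: the function name step, the fixed zero boundary values, and the standing-wave test case are assumptions, and C is taken to be the Courant number c*dt/dx. With C = 1 the scheme reproduces the standing wave sin(pi x) cos(pi t) exactly at the grid points, which gives a convenient check.

```python
import math

def step(u_prev, u_now, C):
    """Advance one time step with equation (3); end points are held at zero."""
    C2 = C * C
    u_next = [0.0] * len(u_now)
    for i in range(1, len(u_now) - 1):
        u_next[i] = (-u_prev[i] + 2.0 * u_now[i]
                     + C2 * (u_now[i + 1] - 2.0 * u_now[i] + u_now[i - 1]))
    return u_next

N = 20            # number of grid intervals on [0, 1]
dx = 1.0 / N
C = 1.0           # Courant number; dt = C * dx for wave speed c = 1

def exact(n):
    """Standing wave sin(pi x) cos(pi t) sampled at time level n."""
    return [math.sin(math.pi * i * dx) * math.cos(math.pi * n * dx)
            for i in range(N + 1)]

u_prev, u_now = exact(0), exact(1)
for n in range(1, 11):
    u_prev, u_now = u_now, step(u_prev, u_now, C)

# u_now now holds time level 11
max_err = max(abs(a - b) for a, b in zip(u_now, exact(11)))
print(f"max error after 10 steps: {max_err:.2e}")
```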

Page 4: Devito - Automated fast finite difference computation


Why the wave equation?

Figure: Offshore seismic survey

Seismic imaging
  RTM
    Inputs: Velocity Model, Seismic Data
    Output: Seismic image
  FWI
    Inputs: Initial Velocity Model, Seismic Data
    Output: Improved Velocity Model

Page 5: Devito - Automated fast finite difference computation


Pure Python Implementation


Page 6: Devito - Automated fast finite difference computation


Why does it need to be fast?

Large number of operations: $\approx 5000$ FLOPs per loop iteration of a 16th order TTI kernel

Realistic problems have large grids: $1580 \times 1580 \times 1130 \approx 2.82$ billion points (SEAM Benchmark)

$2.82 \times 10^{9} \times 5000 \times 3000$ (time steps) $\times 2$ (forward-reverse) $\approx 8.5 \times 10^{16}$ FLOPs per iteration of FWI

Typically $\approx 30000$ FWI iterations ($\approx 2.5 \times 10^{21}$ FLOPs $= 2.5 \times 10^{9}$ TFLOPs)

$\approx 135$ wall-clock hours on the TACC Stampede (ideally)
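The estimate above can be reproduced with a few lines of arithmetic. This sketch only restates the numbers from the slide; the variable names are illustrative, and the last line derives the sustained rate implied by the quoted 135 hours rather than any measured machine figure.

```python
# Cost estimate for FWI, using the figures quoted on the slide.
grid_points = 1580 * 1580 * 1130      # SEAM benchmark grid
flops_per_point = 5000                # 16th order TTI kernel
timesteps = 3000
passes = 2                            # forward and reverse propagation

flops_per_fwi_iteration = grid_points * flops_per_point * timesteps * passes
fwi_iterations = 30000
total_flops = flops_per_fwi_iteration * fwi_iterations
total_tflops = total_flops / 1e12

# Sustained rate needed to finish in 135 wall-clock hours
implied_flops_per_second = total_flops / (135 * 3600)

print(f"grid points:            {grid_points:.3e}")
print(f"FLOPs per FWI iteration: {flops_per_fwi_iteration:.3e}")
print(f"total:                  {total_flops:.3e} FLOPs = {total_tflops:.3e} TFLOPs")
print(f"implied sustained rate: {implied_flops_per_second:.3e} FLOP/s")
```

The implied rate is on the order of several PFLOP/s, which is why the slide qualifies the 135 hours with "ideally".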

Page 7: Devito - Automated fast finite difference computation


Why automated?

Computer Science

Fast code is complex
  Loop blocking
  OpenMP clauses
  Vectorisation - intrinsics
  Memory - alignment, NUMA
  Factorization
  FMA

Fast code is platform dependent
  Intrinsics
  CUDA
  OpenCL

Fast code is error prone

Geophysics

Change of discretizations

Change of physics
  Anisotropy - VTI/TTI
  Elastic equation

Boundary conditions

Continuous acquisition

Page 8: Devito - Automated fast finite difference computation


SymPy - Symbolic computation in Python

Symbolic computer algebra system written in pure Python

Features
  Complex symbolic expressions as Python object trees
  Symbolic manipulation routines and interfaces
  Convert symbolic expressions to numeric functions
    Python or NumPy functions
    C or Fortran kernels

For specialised domains generating C code is not enough!
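The three SymPy features listed above can each be shown in a line or two. This is a small sketch using standard SymPy calls (srepr, simplify, lambdify, ccode); the expression itself is an arbitrary example chosen for illustration.

```python
import sympy as sp

x, h = sp.symbols("x h")

# 1. Expressions are Python object trees; srepr prints the tree structure.
tree = sp.srepr(x + h)

# 2. Symbolic manipulation: sin^2 + cos^2 collapses to 1.
expr = sp.sin(x)**2 + sp.cos(x)**2 + x * h
simplified = sp.simplify(expr)

# 3a. Conversion to a numeric Python function.
f = sp.lambdify((x, h), simplified)
value = f(2.0, 0.5)

# 3b. Conversion to a C expression string.
c_source = sp.ccode(simplified)

print(tree)
print(simplified, value)
print(c_source)
```

The C output here is a single expression string. Loops, data movement and parallel pragmas are not produced by this step, which is the gap the last bullet on the slide points at.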

Page 9: Devito - Automated fast finite difference computation


Devito - a prototype Finite Difference DSL

Devito - A Finite Difference DSL for seismic imaging

Aimed at creating fast high-order inversion kernels

Development is driven by "real world" problems

Based on SymPy expressions

The acoustic wave equation:

$$ m \frac{\partial^2 u}{\partial t^2} + \eta \frac{\partial u}{\partial t} - \nabla^2 u = 0 \qquad (4) $$

can be written as

eqn = m * u.dt2 + eta * u.dt - u.laplace

Devito auto-generates optimised C kernel code

OpenMP threading and vectorisation pragmas

Cache blocking and auto-tuning

Symbolic stencil optimisation (e.g. CSE, hoisting)
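To see what "based on SymPy expressions" buys, the damped wave equation (4) can be discretized in time and solved for the next time level using plain SymPy. This is a sketch under stated assumptions, not Devito's implementation: the symbols u_prev, u_now, u_next and lap (a stand-in for the discrete Laplacian) are hypothetical names, and central differences in time are assumed for both time derivatives.

```python
import sympy as sp

m, eta, s, lap = sp.symbols("m eta s lap")          # s is the time step
u_prev, u_now, u_next = sp.symbols("u_prev u_now u_next")

# Central differences in time (an assumption made for this sketch)
u_dt2 = (u_next - 2 * u_now + u_prev) / s**2
u_dt = (u_next - u_prev) / (2 * s)

# Equation (4) with the Laplacian left as an opaque symbol
eqn = m * u_dt2 + eta * u_dt - lap

# Rearranging for the unknown time level gives the explicit update stencil
stencil = sp.solve(eqn, u_next)[0]
print(sp.simplify(stencil))
```

With eta set to zero the result reduces to 2*u_now - u_prev + s**2*lap/m, the undamped update that equation (3) expresses in one dimension.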

Page 10: Devito - Automated fast finite difference computation


Devito

Figure: An overview of Devito's architecture. The stages, from user input to executable function, are:

Devito Data Objects: TimeData('u', shape=()), DenseData('m', shape=())
  Act as symbols in expressions and as numpy arrays

Stencil Equation: eqn = m * u.dt2 - u.laplace
  Expands to symbolic kernel (finite-difference)

Devito Operator: Operator(eqn)
  Transforms stencil into indexed format

Devito Propagator
  Autogenerates C code

Devito Compiler: GCC, Clang, Intel, Intel® Xeon Phi™
  Compiles and loads platform-specific executable function

Page 11: Devito - Automated fast finite difference computation


Devito

Real-world applications need more than PDE solvers
  File I/O and support for large datasets
  Non-PDE kernel code, e.g. sparse point interpolation
  Ability to easily interface with complex outer code

Devito follows the principle of graceful degradation
  Circumvent restrictions to the high-level API by customisation
  Devito translates high-level PDE-based stencils into "matrix index" format in steps

# High-level expression equivalent to f.dx2
(-2*f(x, y) + f(x - h, y) + f(x + h, y)) / h**2

# Low-level expression with explicit indexing
(-2*f[x, y] + f[x - 1, y] + f[x + 1, y]) / h**2

Allows custom functionality in auto-generated kernels
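The step from the function form to the indexed form can be mimicked in plain SymPy. The sketch below is only an illustration of the idea and does not claim to be how Devito performs the translation: the helper to_indexed is a hypothetical name, and it simply replaces every call f(...) by an indexed access in which the spacing h inside the arguments becomes a unit grid offset.

```python
import sympy as sp
from sympy.core.function import AppliedUndef

x, y, h = sp.symbols("x y h")
f = sp.Function("f")

# High-level expression equivalent to f.dx2 (copied from the slide)
high_level = (-2 * f(x, y) + f(x - h, y) + f(x + h, y)) / h**2

def to_indexed(expr, spacing, base_name="f"):
    """Replace function calls by indexed accesses with unit offsets."""
    base = sp.IndexedBase(base_name)
    mapping = {}
    for call in expr.atoms(AppliedUndef):
        # An argument such as x - h becomes the index x - 1
        indices = tuple(arg.subs(spacing, 1) for arg in call.args)
        mapping[call] = base[indices]
    # xreplace only touches the function calls, so 1/h**2 is preserved
    return expr.xreplace(mapping)

low_level = to_indexed(high_level, h)
print(low_level)
```

Once the expression is in indexed form, extra terms that are not derived from a PDE can be added to it by hand before code generation, which is the kind of customisation the slide refers to.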

Page 12: Devito - Automated fast finite difference computation


Seismic Imaging

Full waveform inversion
  Acoustic and TTI wave equations of varying spatial order
  Numerically verified against industrial standard software on standard datasets
  Achieved performance is also comparable to industrial standard software

Page 13: Devito - Automated fast finite difference computation


FWI


Page 14: Devito - Automated fast finite difference computation


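Example code for forward propagation

# Import paths are assumptions; they changed between early Devito releases.
from sympy import Eq, solve, symbols
from devito import DenseData, TimeData, Operator

# Symbolic time step (s) and grid spacing (h) that appear in the stencil
s, h = symbols('s h')

def forward(model, nt, dt, spacing, order=2):
    # 'spacing' is the numeric grid spacing. It is named differently from the
    # symbol h so that the substitution below maps the symbol to a value.
    shape = model.shape
    m = DenseData(name="m", shape=shape, space_order=order)
    m.data[:] = model
    u = TimeData(name="u", shape=shape, time_dim=nt, time_order=2,
                 space_order=order, save=True)
    eta = DenseData(name="eta", shape=shape, space_order=order)
    # Derive stencil from symbolic equation
    eqn = m * u.dt2 - u.laplace + eta * u.dt
    stencil = solve(eqn, u.forward)[0]
    op = Operator(stencils=Eq(u.forward, stencil), nt=nt,
                  subs={s: dt, h: spacing}, shape=shape, forward=True)
    # Source injection code omitted for brevity
    op.apply()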

Page 15: Devito - Automated fast finite difference computation


// #include directives omitted for brevity
extern "C" int ForwardOperator(double *u_vec, double *damp_vec, double *m_vec,
                               double *src_vec, float *src_coords_vec,
                               double *rec_vec, float *rec_coords_vec,
                               long i1block, long i2block)
{
  double (*u)[130][130][130] = (double (*)[130][130][130]) u_vec;
  double (*damp)[130][130] = (double (*)[130][130]) damp_vec;
  double (*m)[130][130] = (double (*)[130][130]) m_vec;
  double (*src)[1] = (double (*)[1]) src_vec;
  float (*src_coords)[3] = (float (*)[3]) src_coords_vec;
  double (*rec)[101] = (double (*)[101]) rec_vec;
  float (*rec_coords)[3] = (float (*)[3]) rec_coords_vec;
  {
    #pragma omp parallel
    for (int i4 = 0; i4<149; i4+=1)
    {
      {
        #pragma omp for schedule(static)
        for (int i1b = 1; i1b<129 - (128 % i1block); i1b+=i1block)
          for (int i2b = 1; i2b<129 - (128 % i2block); i2b+=i2block)
            for (int i1 = i1b; i1<i1b+i1block; i1++)
              for (int i2 = i2b; i2<i2b+i2block; i2++)
              {
                #pragma omp simd aligned(damp, m, u:64)
                for (int i3 = 1; i3<129; i3++)
                {
                  double temp1 = damp[i1][i2][i3];
                  double temp2 = m[i1][i2][i3];
                  double temp4 = u[i4 - 1][i1][i2][i3];
                  double temp5 = u[i4 - 2][i1][i2][i3];
                  u[i4][i1][i2][i3] = ...
                }
              }
        for (int i1 = 129 - (128 % i1block); i1<129; i1++)
          for (int i2 = 1; i2<129 - (128 % i2block); i2++)
          {
            #pragma omp simd aligned(damp, m, u:64)
            for (int i3 = 1; i3<129; i3++)
            {
              double temp1 = damp[i1][i2][i3];
              double temp2 = m[i1][i2][i3];
              double temp4 = u[i4 - 1][i1][i2][i3];
              double temp5 = u[i4 - 2][i1][i2][i3];
              u[i4][i1][i2][i3] = ...
            }
          }
        for (int i1 = 1; i1<129; i1++)
          for (int i2 = 129 - (128 % i2block); i2<129; i2++)
          {
            #pragma omp simd aligned(damp, m, u:64)
            for (int i3 = 1; i3<129; i3++)
            {
              double temp1 = damp[i1][i2][i3];
              double temp2 = m[i1][i2][i3];
              double temp4 = u[i4 - 1][i1][i2][i3];
              double temp5 = u[i4 - 2][i1][i2][i3];
              u[i4][i1][i2][i3] = ...
            }
          }
        // Source and Receiver code omitted for brevity
      }
    }
  }
  return 0;
}
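The generated kernel splits the two outer spatial loops into full blocks plus two remainder loops for the points that do not fill a whole block. A short Python sketch can confirm that this loop structure visits every point exactly once. The function name blocked_points and the block sizes 24 and 10 are assumptions chosen for illustration; the bounds 1 and 129 mirror the C code above.

```python
def blocked_points(lo, hi, block1, block2):
    """Enumerate (i1, i2) in the same order of loop nests as the C kernel."""
    extent = hi - lo                   # 128 interior points per dimension
    r1 = hi - (extent % block1)        # first i1 index outside the full blocks
    r2 = hi - (extent % block2)        # first i2 index outside the full blocks
    points = []
    # Main nest: full blocks only
    for i1b in range(lo, r1, block1):
        for i2b in range(lo, r2, block2):
            for i1 in range(i1b, i1b + block1):
                for i2 in range(i2b, i2b + block2):
                    points.append((i1, i2))
    # Remainder in i1, restricted to the blocked i2 range
    for i1 in range(r1, hi):
        for i2 in range(lo, r2):
            points.append((i1, i2))
    # Remainder in i2, over the full i1 range
    for i1 in range(lo, hi):
        for i2 in range(r2, hi):
            points.append((i1, i2))
    return points

points = blocked_points(1, 129, 24, 10)
full_grid = {(i1, i2) for i1 in range(1, 129) for i2 in range(1, 129)}
print(len(points), len(full_grid))
```

Because the block sizes are runtime arguments of the kernel (i1block, i2block), the same compiled code can be timed with different block sizes, which is what makes auto-tuning possible without recompilation.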

Page 16: Devito - Automated fast finite difference computation


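# Import paths are assumptions; they changed between early Devito releases.
from sympy import Eq, solve, symbols
from devito import DenseData, TimeData, Operator

# Symbolic time step (s) and grid spacing (h) that appear in the stencil
s, h = symbols('s h')

def adjoint(model, nt, dt, spacing, spc_order=2):
    # 'spacing' is the numeric grid spacing, kept distinct from the symbol h
    m = DenseData(name="m", shape=model.shape)
    m.data[:] = model
    v = TimeData(name='v', shape=model.shape, time_dim=nt,
                 time_order=2, space_order=spc_order,
                 save=False)
    damp = DenseData(name="damp", shape=model.shape)
    # Derive stencil from symbolic equation
    eqn = m * v.dt2 - v.laplace - damp * v.dt
    stencil = solve(eqn, v.backward)[0]
    # Add spacing substitutions
    subs = {s: dt, h: spacing}
    # The stencil updates v (the slide wrote u here, which is undefined)
    op = Operator(stencils=Eq(v.backward, stencil), nt=nt,
                  shape=model.shape, subs=subs, forward=False)
    op.apply()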

Page 17: Devito - Automated fast finite difference computation


Performance

Performance of acoustic forward operator

Intel Xeon E5-2690v2 10C 3GHz and an Intel® Xeon Phi™ Knights Corner

Model size 201 x 201 x 70 + 40 ABC

Grid size 15m, 10Hz Ricker wavelet source

Figure: Two roofline plots of single precision performance (GFlops/s) against arithmetic intensity (Flops/Byte) for stencils of 2nd, 4th, 6th, 8th, 10th, 12th, 14th and 16th order. The maximum achievable performance is marked at 451 GFlops/s in the first plot and at 1534 GFlops/s in the second.
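For readers unfamiliar with this type of plot, the roofline bound can be sketched in a few lines: attainable performance is the smaller of the compute ceiling and the product of arithmetic intensity and memory bandwidth. The 451 GFlops/s ceiling is the value marked in the first plot; the memory bandwidth of 50 GB/s is a purely hypothetical number chosen for illustration and is not taken from the slides.

```python
PEAK_GFLOPS = 451.0      # ceiling marked in the first plot
BANDWIDTH_GBS = 50.0     # hypothetical memory bandwidth, for illustration only

def roofline(arithmetic_intensity):
    """Attainable GFlops/s for a kernel with the given Flops/Byte."""
    return min(PEAK_GFLOPS, arithmetic_intensity * BANDWIDTH_GBS)

# Same range of arithmetic intensities as the horizontal axis of the plots
intensities = [0.0625 * 2**k for k in range(10)]   # 0.0625 ... 32
bounds = [roofline(ai) for ai in intensities]

# Intensity at which a kernel stops being limited by memory bandwidth
ridge_point = PEAK_GFLOPS / BANDWIDTH_GBS

for ai, bound in zip(intensities, bounds):
    print(f"{ai:8.4f} Flops/Byte -> {bound:7.2f} GFlops/s")
```

Kernels to the left of the ridge point are limited by memory traffic, kernels to the right by the compute ceiling.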

Page 18: Devito - Automated fast finite difference computation


Performance

2D Diffusion equation on a single core

Page 19: Devito - Automated fast finite difference computation


Performance Optimizations

Automated code optimizations:
  OpenMP and vectorization pragmas
  Loop blocking and auto-tuning for block size
  Automated roofline plotting for performance analysis

Symbolic Optimizations
  Common Subexpression elimination
    Reduces compilation time from hours to seconds for large stencils
    Enables further factorization techniques to reduce flops

Potential future optimizations
  Polyhedral compilation (time blocking)
  Automated data layout optimizations
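Common subexpression elimination is available directly in SymPy as sympy.cse. The following sketch applies it to a small made-up expression; it illustrates the transformation only and makes no claim about how Devito invokes it internally.

```python
import sympy as sp

x, y = sp.symbols("x y")

# The subexpression (x + y) occurs three times
expr = (x + y)**2 + sp.sqrt(x + y) + 1 / (x + y)

# cse returns a list of (temporary, subexpression) pairs and the reduced expressions
replacements, reduced = sp.cse(expr)

for temp, sub in replacements:
    print(f"{temp} = {sub}")
print(reduced[0])

# Substituting the temporaries back recovers the original expression
rebuilt = reduced[0]
for temp, sub in reversed(replacements):
    rebuilt = rebuilt.subs(temp, sub)
```

In generated C code each replacement pair becomes a temporary variable, similar in spirit to the temp1, temp2, ... assignments visible in the kernel listing.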

Page 20: Devito - Automated fast finite difference computation


Verification

Adjoint Test and Gradient Test
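The slide does not spell out the tests. A common form of the adjoint test is the dot-product check: for a linear operator A with adjoint A^T, the inner products <A x, y> and <x, A^T y> must agree to rounding error for random x and y. The sketch below assumes that form and uses a small random matrix as a stand-in for the forward and adjoint propagators; all names are illustrative.

```python
import random

def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

random.seed(0)
A = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # stand-in forward operator
x = [random.uniform(-1, 1) for _ in range(4)]   # plays the role of the source
y = [random.uniform(-1, 1) for _ in range(3)]   # plays the role of the data

lhs = dot(matvec(A, x), y)               # <A x, y>
rhs = dot(x, matvec(transpose(A), y))    # <x, A^T y>
rel_err = abs(lhs - rhs) / max(abs(lhs), abs(rhs), 1e-30)

print(f"<Ax, y> = {lhs:.12f}, <x, A^T y> = {rhs:.12f}, relative error = {rel_err:.1e}")
```

For wave propagators the same check would be run with the forward and adjoint operators in place of the matrix products. A relative error far above machine precision indicates that the two operators are not true adjoints of each other.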


Page 21: Devito - Automated fast finite difference computation


Conclusions

Devito: A finite difference DSL for seismic imaging
  Symbolic problem description (PDEs) via SymPy
  Low-level API for kernel customisation
  Automated performance optimisation

Devito is driven by real-world scientific problems
  Not yet another stencil compiler
  Bridge the gap between stencil compilers and real-world applications

Future work:
  Extend feature range to facilitate more science
  MPI parallelism for larger models
  Integrate stencil or polyhedral compiler backends
  Additional symbolic optimisation (factorisation, hoisting, etc.)
  Integrate automated verification tools to catch compiler bugs

Page 22: Devito - Automated fast finite difference computation


Thank you

Publications

N. Kukreja, M. Louboutin, F. Vieira, F. Luporini, M. Lange, and G. Gorman. Devito: automated fast finite difference computation. Accepted for WOLFHPC16, to appear in ACM SIGHPC, 2016.

M. Lange, N. Kukreja, M. Louboutin, F. Luporini, F. Vieira, V. Pandolfo, P. Velesko, P. Kazakas, and G. Gorman. Devito: Towards a generic Finite Difference DSL using Symbolic Python. Accepted for PyHPC2016, to appear in ACM SIGHPC, 2016.

M. Louboutin, M. Lange, N. Kukreja, F. Herrmann, and G. Gorman. Performance prediction of finite-difference solvers for different computer architectures. Submitted to Computers and Geosciences, 2016.