
Code Generation for High-Level Problem Specification and HPC

Hans Petter Langtangen

Center for Biomedical Computing, Simula Research Laboratory

Dept. of Informatics, University of Oslo

Advances in Numerical Algorithms and HPC, April 14, 2014

FEniCS Mint

1 FEniCS: HPC Finite Element Solution of PDEs

2 Mint: HPC Finite Difference Solution of PDEs

3 Algorithms and HPC

FEniCS solves PDEs by the finite element method

Input: finite element formulation of the PDE problem, $\int_\Omega \nabla u \cdot \nabla v \, dx + \int_\Omega f v \, dx$, in Python

Output: C++ code loaded back in Python

Python module with C++ definition of element matrix/vector
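To make the input/output idea concrete, here is a minimal pure-Python sketch of code generation. It is not the FEniCS form compiler: it emits Python (rather than C++) source for the element matrix of a 1D linear Lagrange element and loads the generated code back with exec. The names generate_element_matrix_source and load_generated, the REFERENCE table, and the restriction to 1D are assumptions made for this illustration.

```python
# Toy "form compiler": emit specialized source code for a 2x2 element matrix
# on an interval of length h, then load the generated code back into Python.
# Hypothetical illustration only; FEniCS generates C++ instead.

# Reference matrices for linear Lagrange basis functions on [0, 1]
# (stiffness: integral of phi_i'*phi_j', mass: integral of phi_i*phi_j),
# paired with the geometry factor that maps them to a cell of length h.
REFERENCE = {
    "stiffness": ([[1.0, -1.0], [-1.0, 1.0]], "1.0/h"),
    "mass": ([[1.0 / 3, 1.0 / 6], [1.0 / 6, 1.0 / 3]], "h"),
}


def generate_element_matrix_source(form_name):
    """Return Python source for a function computing the element matrix."""
    reference, scaling = REFERENCE[form_name]
    lines = [
        f"def {form_name}_element_matrix(h):",
        f"    s = {scaling}  # geometry factor for a cell of length h",
        "    A = [[0.0, 0.0], [0.0, 0.0]]",
    ]
    # Unroll the loops: one assignment per matrix entry, constants inlined.
    for i in range(2):
        for j in range(2):
            lines.append(f"    A[{i}][{j}] = {reference[i][j]!r}*s")
    lines.append("    return A")
    return "\n".join(lines) + "\n"


def load_generated(form_name):
    """Compile the generated source and return the resulting function."""
    namespace = {}
    exec(generate_element_matrix_source(form_name), namespace)
    return namespace[f"{form_name}_element_matrix"]


stiffness = load_generated("stiffness")
mass = load_generated("mass")
print(generate_element_matrix_source("stiffness"))
print(stiffness(0.5))
```

The generated function contains no loops and no table lookups; everything that can be decided at generation time has been inlined, which is the basic reason generated code can be fast.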











Python + large-scale simulation + HPC = true

Some HPC projects in Python:

General FE PDE solution: FEniCS, http://www.fenics.org

General FV PDE solution: FiPy, http://www.ctcms.nist.gov/fipy

Hyperbolic PDEs w/finite volumes: PyClaw, http://kingkong.amath.washington.edu/clawpack/users/pyclaw/index.html

Discontinuous Galerkin FE PDE solver: hedge, http://mathema.tician.de/software/hedge

Andreas Klöckner has many HPC tools for Python, http://andreask.cs.illinois.edu/software/

Quantum mechanics: GPAW, https://wiki.fysik.dtu.dk/gpaw/

N-body dynamics: pNbody, http://obswww.unige.ch/~revaz/pNbody/

FEniCS tries to combine four contradictory goals

Simplicity

$\int_\Omega a\nabla u\cdot\nabla v\,dx$ → dot(a*grad(u), grad(v))*dx

Generality

Linear $a(u, v) = L(v)$ or nonlinear $F(u; v) = 0$ variational problem

Efficiency

Generated C++ code tailored to the problem + efficient third-party libraries (PETSc, Trilinos, ...)

Reliability

Given a goal $M(u)$ and tolerance $\epsilon$, compute $u$ such that $\|M(u_e) - M(u)\| \le \epsilon$ ($u_e$: exact solution)

[Figure: diagram linking Generality, Efficiency, and Reliability through Code Generation]

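"Hello, world!" for PDEs: $-\nabla\cdot(k\nabla u) = f$

$$-\nabla\cdot(k\nabla u) = f \quad \text{in } \Omega$$

$$u = g \quad \text{on } \partial\Omega_D$$

$$-k\frac{\partial u}{\partial n} = \alpha(u - u_0) \quad \text{on } \partial\Omega_R$$

Variational problem: find $u \in V$ such that

$$F = \int_\Omega k\nabla u\cdot\nabla v\,dx - \int_\Omega f v\,dx + \int_{\partial\Omega_R} \alpha(u - u_0)\,v\,ds = 0 \quad \forall\, v \in V$$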

Implementation:

F = dot(k*grad(u), grad(v))*dx - f*v*dx + alpha*(u-u0)*v*ds
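For reference, the variational form follows from the PDE by a standard integration by parts. The steps below are a sketch under two assumptions that the slide does not state explicitly: the test function $v$ vanishes on the Dirichlet boundary $\partial\Omega_D$, and $\partial\Omega = \partial\Omega_D \cup \partial\Omega_R$.

```latex
\begin{align*}
-\int_\Omega \nabla\cdot(k\nabla u)\,v\,dx &= \int_\Omega f v\,dx
  && \text{multiply by } v \text{ and integrate} \\
\int_\Omega k\nabla u\cdot\nabla v\,dx
  - \int_{\partial\Omega} k\frac{\partial u}{\partial n}\,v\,ds &= \int_\Omega f v\,dx
  && \text{integrate by parts} \\
\int_\Omega k\nabla u\cdot\nabla v\,dx
  + \int_{\partial\Omega_R} \alpha(u-u_0)\,v\,ds &= \int_\Omega f v\,dx
  && v = 0 \text{ on } \partial\Omega_D,\;
     -k\frac{\partial u}{\partial n} = \alpha(u-u_0) \text{ on } \partial\Omega_R
\end{align*}
```

Moving the right-hand side over gives $F = 0$, and each integral corresponds to one term of the implementation line.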

http://en.wikibooks.org/wiki/Computer_Programming/Hello_world



The complete "Hello, world!" program

from dolfin import *

# Mesh and function space
mesh = Mesh('mydomain.xml.gz')
V = FunctionSpace(mesh, 'Lagrange', degree=1)

# Dirichlet condition u = g on the boundary part marked 1
dOmega_D = MeshFunction('uint', mesh, 'myboundary.xml.gz')
g = Constant(0.0)
bc = DirichletBC(V, g, dOmega_D, 1)

# Variational problem
u = TrialFunction(V)
v = TestFunction(V)
f = Constant(2.0)
k = Expression('A*x[1]*sin(pi*q*x[0])', A=4.5, q=1)
alpha = 10; u0 = 2
F = dot(k*grad(u), grad(v))*dx - f*v*dx + alpha*(u-u0)*v*ds
a = lhs(F); L = rhs(F)

# Solve and visualize
u = Function(V)  # finite element function to compute
solve(a == L, u, bc)
plot(u)
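A dependency-free 1D analogue may help to see what such a program computes. The sketch below is not FEniCS code: it assumes the interval $\Omega = (0, 1)$, constant $k$ and $f$, a uniform mesh of linear Lagrange elements, the Dirichlet condition at $x = 0$, and the Robin condition at $x = 1$. The function name solve_1d and all default parameter values are made up for the illustration.

```python
def solve_1d(n=10, k=1.0, f=2.0, g=0.0, alpha=10.0, u0=2.0):
    """P1 finite elements for -(k u')' = f on (0, 1),
    u(0) = g, -k u'(1) = alpha*(u(1) - u0). Returns the nodal values."""
    h = 1.0 / n
    # Tridiagonal system: lower, diagonal, upper bands and right-hand side.
    lower = [0.0] * (n + 1)
    diag = [0.0] * (n + 1)
    upper = [0.0] * (n + 1)
    rhs = [0.0] * (n + 1)
    for e in range(n):          # assemble the element contributions
        i, j = e, e + 1
        diag[i] += k / h
        diag[j] += k / h
        upper[i] -= k / h       # matrix entry (i, j)
        lower[j] -= k / h       # matrix entry (j, i)
        rhs[i] += f * h / 2
        rhs[j] += f * h / 2
    # Robin term alpha*(u - u0)*v, evaluated at the boundary point x = 1.
    diag[n] += alpha
    rhs[n] += alpha * u0
    # Dirichlet condition at x = 0: replace the first equation by u_0 = g.
    diag[0], upper[0], rhs[0] = 1.0, 0.0, g
    # Thomas algorithm (Gaussian elimination for tridiagonal systems).
    for i in range(1, n + 1):
        m = lower[i] / diag[i - 1]
        diag[i] -= m * upper[i - 1]
        rhs[i] -= m * rhs[i - 1]
    u = [0.0] * (n + 1)
    u[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        u[i] = (rhs[i] - upper[i] * u[i + 1]) / diag[i]
    return u


u = solve_1d()
print(u[-1])  # value at the Robin boundary x = 1
```

The three ingredients of the variational form are visible in the loop and the two boundary modifications: the stiffness term, the load term, and the Robin term, which contributes both to the matrix and to the right-hand side.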

http://www.gnu.org/fun/jokes/helloworld.html

Example of an autogenerated element matrix routine

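Mixed formulation of $-\nabla\cdot(k\nabla u) = f$

PDE problem:

$$\nabla\cdot q = f \quad \text{in } \Omega$$

$$-k^{-1} q = \nabla u \quad \text{in } \Omega$$

Variational problem: find $(u, q) \in V \times Q$ such that

$$F = \int_\Omega \nabla\cdot q\, v\,dx - \int_\Omega f v\,dx + \int_\Omega k^{-1} q\cdot p\,dx + \int_\Omega \nabla\cdot p\, u\,dx = 0 \quad \forall\, (v, p) \in V \times Q$$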

Principal implementation line:

F = div(q)*v*dx - f*v*dx + (1./k)*dot(q,p)*dx + div(p)*u*dx

The program

from dolfin import *

N = 8                  # mesh resolution (value assumed here)
mesh = UnitCube(N, N, N)
Q = FunctionSpace(mesh, "BDM", 1)
V = FunctionSpace(mesh, "DG", 0)
W = Q * V
q, u = TrialFunctions(W)
p, v = TestFunctions(W)
f = Expression('x[0] > L/2 ? a : 0', L=1, a=2)
k = Constant(1.0)      # coefficient k (value assumed here)
F = div(q)*v*dx - f*v*dx + (1./k)*dot(q,p)*dx + div(p)*u*dx
a = lhs(F); L = rhs(F)
A = assemble(a)
b = assemble(L)
qu = Function(W)       # compound (q,u) field to be solved for
solve(A, qu.vector(), b, 'gmres', 'ilu')
q, u = qu.split()

Discontinuous Galerkin method for −∇ · (k∇u) = f

Variational problem:

$$F = \int_\Omega \nabla u\cdot\nabla v\,dx - \int_\Gamma \langle\nabla u\rangle\cdot[v n]\,dS - \int_\Gamma [u n]\cdot\langle\nabla v\rangle\,dS + \frac{\alpha}{h}\int_\Gamma [u n]\cdot[v n]\,dS - \int_\Omega f v\,dx = 0$$

Implementation:

F = dot(grad(u), grad(v))*dx \
  - dot(jump(u, n), avg(grad(v)))*dS \
  - dot(avg(grad(u)), jump(v, n))*dS \
  + alpha/h*dot(jump(u, n), jump(v, n))*dS \
  - f*v*dx


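Fluid flow "Hello, world!": Stokes' problem

Stokes' problem for slow viscous flow:

$$-\nabla^2 u + \nabla p = f$$

$$\nabla\cdot u = 0$$

Variational problem: find $(u, p) \in V \times Q$ such that

$$F = \int_\Omega (\nabla v : \nabla u - \nabla\cdot v\, p + v\cdot f)\,dx + \int_\Omega q\,\nabla\cdot u\,dx = 0 \quad \forall\, (v, q) \in V \times Q$$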

Fluid flow "Hello, world!" code

# mesh and the Dirichlet condition bc are assumed to be defined
V = VectorFunctionSpace(mesh, 'Lagrange', 2)
Q = FunctionSpace(mesh, 'Lagrange', 1)
W = V * Q                 # Taylor-Hood mixed finite element
v, q = TestFunctions(W)
u, p = TrialFunctions(W)
f = Constant((0, 0))
F = (inner(grad(v), grad(u)) - div(v)*p + q*div(u))*dx + dot(v, f)*dx
a = lhs(F); L = rhs(F)
up = Function(W)
solve(a == L, up, bc)     # solve variational problem
# or
A = assemble(a); b = assemble(L)
solve(A, up.vector(), b)  # solve linear system
u, p = up.split()

Again, code ≈ math

Key mathematical formula:

$$F = \int_\Omega (\nabla v : \nabla u - \nabla\cdot v\, p + v\cdot f)\,dx + \int_\Omega q\,\nabla\cdot u\,dx$$

Key code line:

F = (inner(grad(v), grad(u)) - div(v)*p + dot(v, f))*dx + q*div(u)*dx

FEniCS supports a rich set of finite elements

Lagrange$_q$ ($P_q$), DG$_q$, BDM$_q$, BDFM$_q$, RT$_q$, Nedelec 1st/2nd kind, Crouzeix-Raviart, Arnold-Winther, $P_q\Lambda^k$, $P_q^-\Lambda^k$, Morley, Hermite, Argyris, Bell, ...

Hyperelasticity (Fung model for biological tissues)

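Mathematical problem:

$F = I + \nabla u$ ($u$: unknown displacement)

$C = F^T F$

$E = (C - I)/2$

$\psi = \frac{\lambda}{2}(\mathrm{tr}\,E)^2 + K\exp((EA) : E)$ (material law)

$P = \frac{\partial\psi}{\partial E}$ (stress tensor)

$F = \int_\Omega P : \nabla v\,dx$ (nonlinear variational form)

$J = \frac{\partial F}{\partial u}$ (Jacobian, "tangent stiffness")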

Hyperelasticity implementation

V = VectorFunctionSpace(mesh, 'Lagrange', order)
v = TestFunction(V)
u = TrialFunction(V)
u_ = Function(V)  # computed solution
I = Identity(u_.cell().d)
F = I + grad(u_)
J = det(F)
C = F.T * F
E = (C-I)/2
# Material law
lambda_ = Constant(1.0)
A = Expression([['1.0 + x[0]', '0.3'], ['0.3', '2.3']])
K = Constant(1.0)
psi = lambda_/2 * tr(E)**2 + K*exp(inner(A*E,E))
P = F*diff(psi, E)  # symbolic differentiation
F = inner(P, grad(v))*dx
J = derivative(F, u_, u)  # symbolic differentiation
A = assemble(J)
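The two differentiation steps in the implementation above (stress from the energy density, Jacobian from the residual) are what keep it short. As a dependency-free illustration of obtaining a derivative automatically from an energy expression, the sketch below uses forward-mode automatic differentiation with dual numbers on a scalar analogue $\psi(E) = \frac{\lambda}{2}E^2 + K\exp(aE^2)$. This is a different technique from the symbolic differentiation used in FEniCS; the Dual class, the scalar reduction, and the parameter a (standing in for the tensor $A$) are assumptions for the example.

```python
import math


class Dual:
    """Number of the form val + der*eps with eps**2 = 0."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule for the derivative part.
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)

    __rmul__ = __mul__


def exp(x):
    """exp for Dual numbers, using the chain rule."""
    return Dual(math.exp(x.val), math.exp(x.val) * x.der)


def psi(E, lam=1.0, K=1.0, a=2.3):
    """Scalar analogue of the Fung-type energy density."""
    return (lam / 2) * E * E + K * exp(a * E * E)


def stress(E):
    """P = d(psi)/dE, obtained by seeding the derivative of E with 1."""
    return psi(Dual(E, 1.0)).der


E = 0.1
print(stress(E))
```

Only psi is written by hand; the stress is never coded explicitly, which mirrors the role of diff and derivative in the FEniCS program.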

The generated code is an order of magnitude faster than standard hand-written code

[Figure: Computing the element stiffness matrix with linear Lagrange elements. Time in seconds (log scale, 10^-8 to 10^-4) for mass matrix, Poisson, Navier-Stokes, and elasticity; hand-written vs. generated code]

The generated code is an order of magnitude faster than standard hand-written code

Speed-up vs. standard quadrature

Form               q=1  q=2  q=3  q=4  q=5  q=6   q=7
Mass 2D             12   31   50   78  108  147   183
Mass 3D             21   81  189  355  616  881  1442
Poisson 2D           8   29   56   86  129  144   189
Poisson 3D           9   56  143  259  427  341   285
Navier-Stokes 2D    32   33   53   37    -    -     -
Navier-Stokes 3D    77  100   61   42    -    -     -
Elasticity 2D       10   43   67   97    -    -     -
Elasticity 3D       14   87  103  134    -    -     -

Parallel computing

Distributed computing via MPI:

Terminal> mpirun -n 32 python myprog.py

Shared memory via OpenMP:

# In program
parameters['num_threads'] = Q  # Q: number of threads

Some applications of FEniCS

[Figures: fluid flow, hyperelasticity, fluid-structure interaction, mantle flow, electrophysiology, block preconditioning]




Automated error control

Input:

$a(u, v) = L(v)$ or $F(u; v) = 0$

Goal $M(u)$

$\epsilon > 0$

Output:

$u$ such that $\|M(u_e) - M(u)\| \le \epsilon$ ($u_e$: exact solution)

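FEniCS automatically generates a posteriori error estimators and refinement indicators

[Figure: adaptive loop with the stages Solve (solution $u_h$), Dual, Estimate, Indicate (indicators $\eta_T$ for cells $T \in \mathcal{T}_h$), and Refine; the stopping criterion is $\eta_h < \epsilon$]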
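The control flow of such an adaptive loop can be sketched without FEniCS. The example below is a simplified stand-in, not the goal-oriented algorithm that FEniCS generates: the "solve" step is the trapezoidal rule on a 1D mesh, the goal is the integral $M = \int_0^1 f\,dx$, and the cell indicator is the difference between Simpson's rule and the trapezoidal rule on each cell (there is no dual problem). The function adaptive_goal, the marking fraction 0.5, and the integrand are assumptions for the illustration.

```python
import math


def adaptive_goal(f, tol, n0=8, max_iter=100):
    """Refine a mesh on [0, 1] until the estimated error in the
    trapezoidal approximation of the integral of f is below tol."""
    mesh = [i / n0 for i in range(n0 + 1)]
    for _ in range(max_iter):
        goal, indicators = 0.0, []
        for a, b in zip(mesh[:-1], mesh[1:]):
            h = b - a
            trapezoid = h * (f(a) + f(b)) / 2                 # "solve"
            simpson = h * (f(a) + 4 * f((a + b) / 2) + f(b)) / 6
            goal += trapezoid
            indicators.append(abs(simpson - trapezoid))      # "indicate"
        estimate = sum(indicators)                            # "estimate"
        if estimate < tol:
            break
        # "refine": bisect every cell whose indicator is at least
        # half of the largest indicator.
        threshold = 0.5 * max(indicators)
        new_mesh = [mesh[0]]
        for (a, b), eta in zip(zip(mesh[:-1], mesh[1:]), indicators):
            if eta >= threshold:
                new_mesh.append((a + b) / 2)
            new_mesh.append(b)
        mesh = new_mesh
    return goal, estimate, mesh


f = lambda x: math.exp(-50 * (x - 0.3) ** 2)
goal, estimate, mesh = adaptive_goal(f, tol=1e-4)
print(len(mesh) - 1, goal, estimate)
```

The user-facing inputs are the same as on the slide: a problem (here the integrand), a goal, and a tolerance. The mesh is refined only where the indicators are large, so the cells cluster around the peak of the integrand.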

Error control: just define the goal and the tolerance

# Define variational form as usual
F = ....
# Define goal functional
M = dot(mu*(grad(u) + grad(u).T), n)*ds(FLAP)
tol = 1E-3
solve(F == 0, u, bc, M=M, tol=tol)

Example: compute shear stress in a bone implant

Polymer-fluid mixture

Nonlinear hyperelasticity

Complicated constitutive law

Novel mixed displacement-stress discretization via Arnold-Winther element

Adaptivity pays off, but would be really difficult to implement by hand in this case




FEniCS really pays off in computational turbulence

Scope:

The jungle of Reynolds-Averaged Navier-Stokes (RANS) models: k-ε, k-ω, v²-f, various tensor models, ...

Should be easy to implement and compare...

Various models

Various linearizations

Coupled vs. segregated solution

Picard vs. Newton iteration

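Example: k-ε model (unknowns: $\boldsymbol{u}$, $p$, $k$, $\epsilon$)

$$\frac{\partial \boldsymbol{u}}{\partial t} + \boldsymbol{u}\cdot\nabla\boldsymbol{u} = -\frac{1}{\varrho}\nabla p + \nu\nabla^2\boldsymbol{u} + \boldsymbol{f} - \nabla\cdot\overline{\boldsymbol{u}'\boldsymbol{u}'}$$

$$\nabla\cdot\boldsymbol{u} = 0$$

$$\nabla\cdot\overline{\boldsymbol{u}'\boldsymbol{u}'} = -2\frac{k^2}{\epsilon}\,\frac{1}{2}\left(\nabla\boldsymbol{u} + \nabla\boldsymbol{u}^T\right) + \frac{2}{3}kI$$

$$\frac{\partial k}{\partial t} + \boldsymbol{u}\cdot\nabla k = \nabla\cdot(\nu_k\nabla k) + P_k - \epsilon - D$$

$$\frac{\partial \epsilon}{\partial t} + \boldsymbol{u}\cdot\nabla\epsilon = \nabla\cdot(\nu_\epsilon\nabla\epsilon) + (C_{\epsilon 1}P_k - C_{\epsilon 2}f_2\,\epsilon)\frac{\epsilon}{k} + E$$

$$\epsilon = 2\nu\,\overline{\boldsymbol{s}:\boldsymbol{s}}, \qquad \boldsymbol{s} = \frac{1}{2}\left(\nabla\boldsymbol{u}' + (\nabla\boldsymbol{u}')^T\right)$$

$$\nu_k = \nu + \frac{\nu_T}{\sigma_k}$$

...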

Multi-physics problems with large systems of PDEs

Solve a system of PDEs, e.g.,

L(u1, u2, ..., u6) = 0

in some grouping into subsystems, e.g.,

L1(u1, u2) = 0

L2(u3) = 0

L3(u4, u5, u6) = 0

Segregated solve (iteration) between subsystems

One scalar/vector PDE solver is compact in FEniCS

Large PDE systems require tedious, repetitive code

Let’s automate!
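The segregated strategy can be sketched as a generic driver that sweeps over the subsystems, each solved for its own unknowns with the other fields frozen at their latest values. The scalar toy equations, the function names, and the convergence criterion below are assumptions for illustration; they are not an existing FEniCS interface.

```python
def solve_sub1(fields):
    """Subsystem L1(u1, u2) = 0: 2*u1 + u2 = 3 + 0.1*u3 and u1 - u2 = 0,
    solved for (u1, u2) with u3 frozen."""
    u1 = (3 + 0.1 * fields["u3"]) / 3
    fields["u1"], fields["u2"] = u1, u1


def solve_sub2(fields):
    """Subsystem L2(u3) = 0: u3 = 1 + 0.2*u1, solved for u3 with u1 frozen."""
    fields["u3"] = 1 + 0.2 * fields["u1"]


def segregated_solve(subsystems, fields, tol=1e-12, max_iter=50):
    """Sweep over the subsystems until no field changes by more than tol."""
    for iteration in range(1, max_iter + 1):
        old = dict(fields)
        for solve in subsystems:
            solve(fields)           # updates its own unknowns in place
        change = max(abs(fields[name] - old[name]) for name in fields)
        if change < tol:
            return iteration
    raise RuntimeError("segregated iteration did not converge")


fields = {"u1": 0.0, "u2": 0.0, "u3": 0.0}
iterations = segregated_solve([solve_sub1, solve_sub2], fields)
print(iterations, fields)
```

The driver knows nothing about the individual equations, so regrouping the unknowns into different subsystems only changes the list passed to segregated_solve. That separation is the part worth automating when the system is large.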




Key issue: how to linearize nonlinear PDEs

PDE for turbulent kinetic energy $k$ (unknowns: $u$, $k$, $\epsilon$)

$$0 = -u\cdot\nabla k + \nabla\cdot(\nu_k(u, k, \epsilon)\nabla k) + P_k(u, k, \epsilon) - \epsilon$$

Typical linearization (underscore subscript: old value)

$$0 = -u_{-}\cdot\nabla k + \nabla\cdot(\nu_{k-}\nabla k) + P_{k-} - \epsilon$$

Linearization, i.e., implicit vs. explicit treatment, is a matter of inserting or removing an underscore

Variational form ($k$ and $v_k$ are trial and test functions)

$$F_k = -\int_\Omega u_{-}\cdot\nabla k\, v_k\,dx - \int_\Omega \nu_{k-}\nabla k\cdot\nabla v_k\,dx + \int_\Omega (P_{k-} - \epsilon)\,v_k\,dx$$

Corresponding code

F_k = - inner(dot(u_, grad(k)), v_k)*dx \
      - nu_k_*inner(grad(k), grad(v_k))*dx \
      + (P_k_ - e)*v_k*dx

Example of various explicit/implicit treatments of a term

Implicit treatment of ε in coupled k-ε system:

$F_k = \ldots + \int_\Omega \epsilon\, v_k\,dx$ → e*v_k*dx

Explicit treatment of ε for decoupled k-ε system:

$F_k = \ldots + \int_\Omega \epsilon_{-}\, v_k\,dx$ → e_*v_k*dx

Explicit treatment of ε, but implicit term in k eq.:

$F_k = \ldots + \int_\Omega \epsilon_{-}\frac{k}{k_{-}}\, v_k\,dx$ → e_*k/k_*v_k*dx

Weighted combination in coupled k-ε system:

$F_k = \ldots + \int_\Omega \left((1 - w)\epsilon_{-}k + w\epsilon k_{-}\right)\frac{1}{k_{-}}\, v_k\,dx$ → (1/k_)*((1-w)*e_*k + w*e*k_)*v_k*dx
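How the weight w enters a nonlinear iteration can be illustrated on a toy problem without spatial dependence. The sketch below is a stand-in built on several assumptions, not the channel-flow solver from the slides: it takes one backward Euler step of the decaying-turbulence model $dk/dt = -\epsilon$, $d\epsilon/dt = -C_2\epsilon^2/k$ and treats the ε sink in the k equation with the weighted form $((1-w)\epsilon_{-}k + w\epsilon k_{-})/k_{-}$ inside a Picard iteration. The function name picard_step, the parameter values, and the lagged linearization of the ε equation are choices made for this example.

```python
def picard_step(k0, e0, dt, w, C2=1.92, tol=1e-13, max_iter=100):
    """One backward Euler step for dk/dt = -e, de/dt = -C2*e**2/k.
    Returns (k, e, number of Picard iterations)."""
    k_, e_ = k0, e0                       # old (underscore) values
    for iteration in range(1, max_iter + 1):
        r = e_ / k_
        # e equation, linearized as e**2/k -> e_*e/k_:
        #   (e - e0)/dt = -C2*r*e
        e = e0 / (1 + dt * C2 * r)
        # k equation with the weighted sink ((1-w)*e_*k + w*e*k_)/k_:
        #   (k - k0)/dt = -(1-w)*r*k - w*e
        k = (k0 - dt * w * e) / (1 + dt * (1 - w) * r)
        converged = max(abs(k - k_), abs(e - e_)) < tol
        k_, e_ = k, e
        if converged:
            return k, e, iteration
    raise RuntimeError("Picard iteration did not converge")


for w in (0.0, 0.5, 1.0):
    k, e, n = picard_step(k0=1.0, e0=1.0, dt=0.1, w=w)
    print(w, n, k, e)
```

At convergence the old and new values coincide, so every choice of w gives the same solution of the nonlinear step; w only changes the path the iteration takes to get there.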




Impact of w on convergence of nonlinear iterations

Channel flow, $Re_\tau = 395$

[Figure: normalized residual (log scale, 10^2 down to 10^-10) vs. number of iterations (0 to 50) for w = 0.0, 0.25, 0.5, 0.75, 1.0; panels (a) Launder-Sharma, (b) Jones-Launder, (c) Chien]

Proof of concept: 18 highly coupled nonlinear PDEs

The elliptic relaxation model:

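$$\frac{\partial R_{ij}}{\partial t} + u_k\frac{\partial R_{ij}}{\partial x_k} + \frac{\partial T_{kij}}{\partial x_k} = G_{ij} + P_{ij} - \epsilon_{ij}$$

$$L^2\nabla^2 f_{ij} - f_{ij} = -\frac{G^h_{ij}}{k} - \frac{2A_{ij}}{T}$$

+ standard eqs. for $u$, $p$, $k$, $\epsilon$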

Coupled implementation of variational forms:

class RF_1(TurbModel):
    def form(self, R, R_, v_R, k_, e_, P_, nu, u_, f, f_, v_f,
             A_, Gh, Cmu, T_, L_, **kwargs):
        # Reynolds stress equation
        Fr = inner(dot(grad(R), u_), v_R)*dx \
           + nu*inner(grad(R), grad(v_R))*dx \
           + inner(Cmu*T_*dot(grad(R), R_), grad(v_R))*dx \
           - inner(k_*f, v_R)*dx - inner(P_, v_R)*dx \
           + inner(R*e_*(1./k_), v_R)*dx
        # Elliptic relaxation equation
        Ff = inner(grad(f), grad(L_**2*v_f))*dx + inner(f, v_f)*dx \
           - (1./k_)*inner(Gh, v_f)*dx - (2./T_)*inner(A_, v_f)*dx
        return Fr + Ff

What is the efficiency loss of so much flexibility?

Validation: "DNS" of turbulent flow in a channel

Codes: FEniCS-based (Oasis), OpenFOAM (http://www.openfoam.com/), CDP (http://www.stanford.edu/group/cits/research/combustor/cdp.html)

CPU-time with 32 processors:

FEniCS, OpenFOAM: 1.0
CDP: 1.5

FEniCS needs twice as much memory as CDP

Conclusions:

80% time in Krylov solvers
the splitting algorithm for N-S (# linear systems) is key
preconditioning is key
HPC of discretization details not so important

Linear systems: PETSc w/Hypre AMG

Summary

Links

FEniCS: http://fenicsproject.org
FEniCS book: https://launchpad.net/fenics-book
http://simula.mint.no
https://launchpad.net/cbcpdesys

Acknowledgments

A. Logg (Chalmers/Simula), G. Wells (Cambridge), M. Alnæs (Simula), M. Mortensen (Univ. of Oslo), Didem Unat (UCSD/Simula), S. Baden (UCSD), X. Cai (Simula), J. Hake (Simula), M. Rognes (Simula)
