Code Generation for High-Level Problem Specification and HPC Hans Petter Langtangen Center for Biomedical Computing Simula Research Laboratory Dept. of Informatics, University of Oslo Advances in Numerical Algorithms and HPC, April 14, 2014
FEniCS Mint
1 FEniCS: HPC Finite Element Solution of PDEs
2 Mint: HPC Finite Difference Solution of PDEs
3 Algorithms and HPC
FEniCS solves PDEs by the finite element method
Input: finite element formulation of the PDE problem, $\int_\Omega \nabla u \cdot \nabla v \, dx + \int_\Omega f v \, dx$, in Python
Output: C++ code loaded back in Python
Python module with C++ definition of element matrix/vector
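The idea of generating problem-specific code and loading it back into Python can be sketched without FEniCS. The following is only a toy illustration under stated assumptions: the function names `generate_element_matrix_source` and `compile_source` are hypothetical, the setting is a 1D linear (P1) element for $\int a u' v' dx$, and the generated code is Python source rather than the C++ that FEniCS emits.

```python
def generate_element_matrix_source(coefficient: float) -> str:
    """Return Python source for the P1 element matrix of  int a u'v' dx.

    The coefficient a is written into the generated code as a literal,
    so the generated routine is specialized to this particular problem.
    """
    c = repr(float(coefficient))
    return (
        "def element_matrix(h):\n"
        f"    s = {c} / h\n"
        "    return [[s, -s], [-s, s]]\n"
    )


def compile_source(source: str):
    """Execute the generated source and return the element_matrix function."""
    namespace = {}
    exec(source, namespace)
    return namespace["element_matrix"]


source = generate_element_matrix_source(coefficient=4.5)
element_matrix = compile_source(source)
A_e = element_matrix(0.5)  # element matrix for a cell of length h = 0.5
```

The design point is that all problem-dependent decisions are made once, at generation time, and the routine that runs inside the assembly loop contains only arithmetic.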
Python + large-scale simulation + HPC = true
Some HPC projects in Python:
General FE PDE solution: FEniCS, www.fenics.org
General FV PDE solution: FiPy, www.ctcms.nist.gov/fipy
Hyperbolic PDEs w/finite volumes: PyClaw, kingkong.amath.washington.edu/clawpack/users/pyclaw/
Discontinuous Galerkin FE PDE solver: hedge
Andreas Klöckner has many HPC tools for Python
Quantum mechanics: GPAW, wiki.fysik.dtu.dk/gpaw
N-body dynamics: pNbody, obswww.unige.ch/~revaz/pNbody
FEniCS tries to combine four contradictory goals
Simplicity: $\int_\Omega a \nabla u \cdot \nabla v \, dx$ → dot(a*grad(u), grad(v))*dx
Generality: linear $a(u, v) = L(v)$ or nonlinear $F(u; v) = 0$ variational problem
Efficiency: generated C++ code tailored to the problem + efficient third-party libraries (PETSc, Trilinos, ...)
Reliability: given a goal $M(u)$ and tolerance $\epsilon$, compute $u$ such that $\|M(u_e) - M(u)\| \le \epsilon$ ($u_e$: exact solution)
Generality and efficiency are combined through code generation
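"Hello, world!" for PDEs: $-\nabla \cdot (k \nabla u) = f$
$-\nabla \cdot (k \nabla u) = f$ in $\Omega$
$u = g$ on $\partial\Omega_D$
$-k \frac{\partial u}{\partial n} = \alpha (u - u_0)$ on $\partial\Omega_R$
Variational problem: find $u \in V$ such that
$F = \int_\Omega k \nabla u \cdot \nabla v \, dx - \int_\Omega f v \, dx + \int_{\partial\Omega_R} \alpha (u - u_0) v \, ds = 0 \quad \forall v \in V$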
Implementation:
F = dot(k*grad(u), grad(v))*dx - f*v*dx + alpha*(u-u0)*v*ds
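The step from the PDE to the variational problem is integration by parts. A short derivation, assuming that the test function $v$ vanishes on $\partial\Omega_D$ and that the boundary consists only of $\partial\Omega_D$ and $\partial\Omega_R$:

```latex
\begin{align*}
-\int_\Omega \nabla\cdot(k\nabla u)\, v \, dx
  &= \int_\Omega k\nabla u\cdot\nabla v \, dx
     - \int_{\partial\Omega} k\frac{\partial u}{\partial n}\, v \, ds \\
  &= \int_\Omega k\nabla u\cdot\nabla v \, dx
     + \int_{\partial\Omega_R} \alpha (u - u_0)\, v \, ds
\end{align*}
```

The second line uses $v = 0$ on $\partial\Omega_D$ and the Robin condition $-k\,\partial u/\partial n = \alpha(u - u_0)$ on $\partial\Omega_R$. Subtracting $\int_\Omega f v \, dx$ gives the form $F$.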
The complete "Hello, world!" program
from dolfin import *

# Mesh and boundary markers are read from user-supplied files
mesh = Mesh('mydomain.xml.gz')
V = FunctionSpace(mesh, 'Lagrange', degree=1)
dOmega_D = MeshFunction('uint', mesh, 'myboundary.xml.gz')

# Dirichlet condition u = g where the boundary marker equals 1
g = Constant(0.0)
bc = DirichletBC(V, g, dOmega_D, 1)

u = TrialFunction(V)
v = TestFunction(V)
f = Constant(2.0)
k = Expression('A*x[1]*sin(pi*q*x[0])', A=4.5, q=1)
alpha = 10; u0 = 2

# Variational form; lhs/rhs split it into bilinear and linear parts
F = dot(k*grad(u), grad(v))*dx - f*v*dx + alpha*(u-u0)*v*ds
a = lhs(F); L = rhs(F)
u = Function(V)  # finite element function to compute
solve(a == L, u, bc)
plot(u)
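What the program above automates (assembly, boundary conditions, linear solve) can be spelled out by hand in one dimension. The sketch below is not FEniCS code: it is a plain-Python analogue on the interval (0, 1) with linear elements, under the simplifying assumptions of constant k and f, a Dirichlet condition at x = 0 and the Robin condition at x = 1. The function name `solve_1d_robin` is hypothetical.

```python
def solve_1d_robin(n_cells, k=1.0, f=2.0, g=0.0, alpha=10.0, u0=2.0):
    """P1 finite elements for -(k u')' = f on (0, 1),
    with u(0) = g and -k u'(1) = alpha*(u(1) - u0)."""
    n = n_cells + 1            # number of nodes
    h = 1.0 / n_cells
    lower = [0.0] * n          # sub-diagonal   (lower[i] multiplies u[i-1])
    diag = [0.0] * n
    upper = [0.0] * n          # super-diagonal (upper[i] multiplies u[i+1])
    b = [0.0] * n

    # Assemble element contributions k/h*[[1,-1],[-1,1]] and f*h/2*[1,1]
    for e in range(n_cells):
        i, j = e, e + 1
        diag[i] += k / h
        diag[j] += k / h
        upper[i] -= k / h
        lower[j] -= k / h
        b[i] += f * h / 2
        b[j] += f * h / 2

    # Robin condition at x = 1: boundary term alpha*(u - u0)*v
    diag[-1] += alpha
    b[-1] += alpha * u0

    # Dirichlet condition at x = 0: replace the first equation by u[0] = g
    diag[0], upper[0], b[0] = 1.0, 0.0, g

    # Thomas algorithm (Gaussian elimination for a tridiagonal system)
    for i in range(1, n):
        m = lower[i] / diag[i - 1]
        diag[i] -= m * upper[i - 1]
        b[i] -= m * b[i - 1]
    u = [0.0] * n
    u[-1] = b[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (b[i] - upper[i] * u[i + 1]) / diag[i]
    return u


u = solve_1d_robin(n_cells=8)
```

For these default values the exact solution is u(x) = -x² + (32/11)x, which offers a simple check of the boundary-condition handling.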
Example of an autogenerated element matrix routine
Mixed formulation of −∇ · (k∇u) = f
PDE problem:
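$\nabla \cdot q = f$ in $\Omega$
$-k^{-1} q = \nabla u$ in $\Omega$
Variational problem: find $(u, q) \in V \times Q$ such that
$F = \int_\Omega \nabla \cdot q \, v \, dx - \int_\Omega f v \, dx + \int_\Omega k^{-1} q \cdot p \, dx + \int_\Omega \nabla \cdot p \, u \, dx = 0 \quad \forall (v, p) \in V \times Q$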
Principal implementation line:
F = div(q)*v*dx - f*v*dx + (1./k)*dot(q,p)*dx + div(p)*u*dx
The program
from dolfin import *

N = 8              # cells per direction (assumed; N is not set on the slide)
k = Constant(1.0)  # coefficient k (assumed; k is not defined on the slide)
mesh = UnitCube(N, N, N)
Q = FunctionSpace(mesh, "BDM", 1)
V = FunctionSpace(mesh, "DG", 0)
W = Q * V  # mixed function space
q, u = TrialFunctions(W)
p, v = TestFunctions(W)
f = Expression('x[0] > L/2 ? a : 0', L=1, a=2)
F = div(q)*v*dx - f*v*dx + (1./k)*dot(q, p)*dx + div(p)*u*dx
a = lhs(F); L = rhs(F)
A = assemble(a)
b = assemble(L)
qu = Function(W)  # compound (q, u) field to be solved for
solve(A, qu.vector(), b, 'gmres', 'ilu')
q, u = qu.split()
Discontinuous Galerkin method for −∇ · (k∇u) = f
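Variational problem:
$F = \int_\Omega \nabla u \cdot \nabla v \, dx - \int_\Gamma \langle \nabla u \rangle \cdot [v n] \, dS - \int_\Gamma [u n] \cdot \langle \nabla v \rangle \, dS + \frac{\alpha}{h} \int_\Gamma [u n] \cdot [v n] \, dS - \int_\Omega f v \, dx = 0$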
Implementation:
F = dot(grad(u), grad(v))*dx \
- dot(jump(u, n), avg(grad(v)))*dS \
- dot(avg(grad(u)), jump(v, n))*dS \
+ alpha/h*dot(jump(u, n), jump(v, n))*dS \
- f*v*dx
a = lhs(F); L = rhs(F)
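The operators in the form above act on the two values a discontinuous function takes on either side of an interior facet. A small plain-Python illustration in 1D, assuming the usual definitions jump(u, n) = u('+')*n('+') + u('-')*n('-') and avg(u) = (u('+') + u('-'))/2; the helper names `jump_n` and `avg`, and the choice of which cell is the '+' side, are assumptions made for this sketch.

```python
def jump_n(u_plus, u_minus, n_plus=1.0):
    """jump(u, n) = u('+')*n('+') + u('-')*n('-'), where n('-') = -n('+')."""
    return u_plus * n_plus + u_minus * (-n_plus)


def avg(u_plus, u_minus):
    """avg(u) = (u('+') + u('-'))/2."""
    return 0.5 * (u_plus + u_minus)


# Piecewise-constant (DG0) values on four cells of a 1D mesh
cell_values = [1.0, 3.0, 3.0, 0.0]

# Interior facet i sits between cell i ('+' side) and cell i+1 ('-' side)
jumps = [jump_n(cell_values[i], cell_values[i + 1]) for i in range(3)]
averages = [avg(cell_values[i], cell_values[i + 1]) for i in range(3)]
```

The jump vanishes on the facet where the two neighboring values agree, which is why the penalty term alpha/h*dot(jump(u, n), jump(v, n))*dS only acts where the function is actually discontinuous.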
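Fluid flow "Hello, world!": Stokes’ problem
Stokes’ problem for slow viscous flow:
$-\nabla^2 u + \nabla p = f$
$\nabla \cdot u = 0$
Variational problem: find $(u, p) \in V \times Q$ such that
$F = \int_\Omega (\nabla v : \nabla u - \nabla \cdot v \, p + v \cdot f) \, dx + \int_\Omega q \, \nabla \cdot u \, dx = 0 \quad \forall (v, q) \in V \times Q$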
Fluid flow "Hello, world!" code
# mesh and the Dirichlet condition bc are assumed to be defined already
V = VectorFunctionSpace(mesh, 'Lagrange', 2)
Q = FunctionSpace(mesh, 'Lagrange', 1)
W = V * Q  # Taylor-Hood mixed finite element
v, q = TestFunctions(W)
u, p = TrialFunctions(W)
f = Constant((0, 0))
F = (inner(grad(v), grad(u)) - div(v)*p + q*div(u))*dx + dot(v, f)*dx
a = lhs(F); L = rhs(F)
up = Function(W)
solve(a == L, up, bc)  # solve variational problem
# or
A = assemble(a); b = assemble(L)
bc.apply(A, b)  # impose the Dirichlet condition on the linear system
solve(A, up.vector(), b)  # solve linear system
u, p = up.split()
Again, code ≈ math
Key mathematical formula:
$F = \int_\Omega (\nabla v : \nabla u - \nabla \cdot v \, p + v \cdot f) \, dx + \int_\Omega q \, \nabla \cdot u \, dx$
Key code line:
F = (inner(grad(v), grad(u)) - div(v)*p + dot(f, v) + q*div(u))*dx
FEniCS supports a rich set of finite elements
Lagrange$_q$ (P$_q$), DG$_q$, BDM$_q$, BDFM$_q$, RT$_q$, Nédélec 1st/2nd kind, Crouzeix–Raviart, Arnold–Winther, $P_q\Lambda^k$, $P_q^-\Lambda^k$, Morley, Hermite, Argyris, Bell, ...
Hyperelasticity (Fung model for biological tissues)
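Mathematical problem:
$F = I + \nabla \mathbf{u}$ ($\mathbf{u}$: unknown displacement)
$C = F^T F$
$E = (C - I)/2$
$\psi = \frac{\lambda}{2} (\mathrm{tr}\, E)^2 + K \exp((EA) : E)$ (material law)
$P = \frac{\partial \psi}{\partial E}$ (stress tensor)
$F = \int_\Omega P : \nabla \mathbf{v} \, dx$ (nonlinear variational form)
$J = \frac{\partial F}{\partial \mathbf{u}}$ (Jacobian, "tangent stiffness")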
Hyperelasticity implementation
# mesh and the polynomial degree order are assumed to be defined already
V = VectorFunctionSpace(mesh, 'Lagrange', order)
v = TestFunction(V)
u = TrialFunction(V)
u_ = Function(V)  # computed solution
I = Identity(u_.cell().d)
F = I + grad(u_)  # deformation gradient
J = det(F)
C = F.T * F
E = variable((C-I)/2)  # variable() lets diff() differentiate w.r.t. E
# Material law
lambda_ = Constant(1.0)
A = Expression([['1.0 + x[0]', '0.3'], ['0.3', '2.3']])
K = Constant(1.0)
psi = lambda_/2 * tr(E)**2 + K*exp(inner(A*E, E))
P = F*diff(psi, E)  # symbolic differentiation
F = inner(P, grad(v))*dx  # from here on, F is the variational form
J = derivative(F, u_, u)  # symbolic differentiation
A = assemble(J)
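To see what the symbolic differentiation of the energy saves, the derivative of psi with respect to E can be worked out by hand and checked numerically. The sketch below uses plain Python with 2x2 matrices; it follows the code line `inner(A*E, E)`, evaluates the slide's A at x[0] = 0, and picks an arbitrary displacement gradient. All helper names (`matmul`, `dpsi_dE`, `dpsi_dE_fd`, ...) and the numerical values of `grad_u` are assumptions for illustration.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]


def inner(X, Y):
    return sum(X[i][j] * Y[i][j] for i in range(2) for j in range(2))


LAMBDA, K = 1.0, 1.0
A = [[1.0, 0.3], [0.3, 2.3]]  # the slide's A evaluated at x[0] = 0


def psi(E):
    trace = E[0][0] + E[1][1]
    return LAMBDA / 2 * trace**2 + K * math.exp(inner(matmul(A, E), E))


def dpsi_dE(E):
    """Hand-derived: lambda*tr(E)*I + K*exp((A E):E)*(A + A^T) E."""
    trace = E[0][0] + E[1][1]
    w = K * math.exp(inner(matmul(A, E), E))
    AE, ATE = matmul(A, E), matmul(transpose(A), E)
    return [[LAMBDA * trace * (i == j) + w * (AE[i][j] + ATE[i][j])
             for j in range(2)] for i in range(2)]


def dpsi_dE_fd(E, eps=1e-6):
    """Central finite differences, perturbing each entry of E separately."""
    D = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            Ep = [row[:] for row in E]
            Em = [row[:] for row in E]
            Ep[i][j] += eps
            Em[i][j] -= eps
            D[i][j] = (psi(Ep) - psi(Em)) / (2 * eps)
    return D


grad_u = [[0.10, 0.05], [0.00, -0.02]]  # arbitrary test value
I2 = [[1.0, 0.0], [0.0, 1.0]]
F = [[I2[i][j] + grad_u[i][j] for j in range(2)] for i in range(2)]
C = matmul(transpose(F), F)
E = [[(C[i][j] - I2[i][j]) / 2 for j in range(2)] for i in range(2)]
S_exact = dpsi_dE(E)
S_fd = dpsi_dE_fd(E)
```

Changing the material law means redoing the hand derivation in `dpsi_dE`, whereas in the FEniCS code only the `psi` line changes.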
The generated code is an order of magnitude faster than standard hand-written code
[Figure: time (s, log scale from 10^-8 to 10^-4) for computing the element stiffness matrix with linear Lagrange elements, hand-written vs. generated code, for mass matrix, Poisson, Navier-Stokes, and elasticity]
The generated code is an order of magnitude faster than standard hand-written code
Speed-up vs standard quadrature
Form               q=1   q=2   q=3   q=4   q=5   q=6   q=7
Mass 2D             12    31    50    78   108   147   183
Mass 3D             21    81   189   355   616   881  1442
Poisson 2D           8    29    56    86   129   144   189
Poisson 3D           9    56   143   259   427   341   285
Navier–Stokes 2D    32    33    53    37     -     -     -
Navier–Stokes 3D    77   100    61    42     -     -     -
Elasticity 2D       10    43    67    97     -     -     -
Elasticity 3D       14    87   103   134     -     -     -
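One way generated code can beat a standard quadrature loop is by precomputing integrals on a reference cell, so that the run-time work per element is a small scaling or contraction. The slides do not say which optimization produced the numbers above, so the following is only a sketch of that idea for the 1D linear-element mass matrix; the function names are hypothetical.

```python
import math

# Two-point Gauss rule on the reference interval [0, 1]
GAUSS_POINTS = [0.5 - 0.5 / math.sqrt(3), 0.5 + 0.5 / math.sqrt(3)]
GAUSS_WEIGHTS = [0.5, 0.5]


def basis(X):
    """Linear basis functions on the reference interval."""
    return [1.0 - X, X]


def mass_matrix_quadrature(h):
    """Standard approach: loop over quadrature points for every element."""
    A = [[0.0, 0.0], [0.0, 0.0]]
    for X, w in zip(GAUSS_POINTS, GAUSS_WEIGHTS):
        phi = basis(X)
        for i in range(2):
            for j in range(2):
                A[i][j] += w * phi[i] * phi[j] * h
    return A


# Integrals of phi_i*phi_j over the reference interval, computed once
REFERENCE_TENSOR = [[1 / 3, 1 / 6], [1 / 6, 1 / 3]]


def mass_matrix_precomputed(h):
    """Precomputed approach: scale the reference tensor by the cell size."""
    return [[h * a for a in row] for row in REFERENCE_TENSOR]


A_quad = mass_matrix_quadrature(0.25)
A_pre = mass_matrix_precomputed(0.25)
```

Both routines return the same matrix; the second does no basis-function evaluation at run time, and the gap widens as the polynomial degree q and the number of quadrature points grow.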
Parallel computing
Distributed computing via MPI:
Terminal> mpirun -n 32 python myprog.py
Shared memory via OpenMP:
# In program
parameters['num_threads'] = Q  # Q: number of threads
Some applications of FEniCS
[Figures: fluid flow, hyperelasticity, fluid-structure interaction, mantle flow, electrophysiology, block preconditioning]
Automated error control
Input: $a(u, v) = L(v)$ or $F(u; v) = 0$, goal $M(u)$, tolerance $\epsilon > 0$
Output: $u$ such that $\|M(u_e) - M(u)\| \le \epsilon$ ($u_e$: exact solution)
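FEniCS automatically generates a posteriori error estimators and refinement indicators
[Diagram: adaptive loop Solve ($u_h$) → Estimate, using the dual problem ($\eta_h$) → Indicate ($\{\eta_T\}_{T \in \mathcal{T}_h}$) → Refine, repeated until $\eta_h < \epsilon$]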
Error control: just define the goal and the tolerance
# Define variational form as usual
F = ....
# Define goal functional
M = dot(mu*(grad(u) + grad(u).T), n)*ds(FLAP)
tol = 1E-3
solve(F == 0, u, bc, M=M, tol=tol)
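The solve-estimate-indicate-refine loop behind such a call can be illustrated on a much simpler goal. The sketch below is an analogue, not the FEniCS algorithm: the "goal" is the integral of a function over an interval, the "solve" step is the trapezoidal rule, and the error estimate compares against Simpson's rule instead of solving a dual problem. The function name `adaptive_goal` and the marking threshold of half the largest indicator are assumptions.

```python
import math


def adaptive_goal(f, a, b, tol, max_iter=100):
    """Adaptively approximate the goal M = integral of f over [a, b]."""
    cells = [(a + i * (b - a) / 4, a + (i + 1) * (b - a) / 4) for i in range(4)]
    for _ in range(max_iter):
        # Solve: trapezoidal approximation of the goal on each cell
        trap = [(r - l) * (f(l) + f(r)) / 2 for l, r in cells]
        # Estimate: compare with Simpson's rule, cell by cell
        simp = [(r - l) * (f(l) + 4 * f((l + r) / 2) + f(r)) / 6
                for l, r in cells]
        indicators = [abs(s - t) for s, t in zip(simp, trap)]
        eta = sum(indicators)
        if eta < tol:
            break
        # Indicate: mark cells whose indicator is at least half the largest
        threshold = 0.5 * max(indicators)
        # Refine: bisect the marked cells
        new_cells = []
        for (l, r), ind in zip(cells, indicators):
            if ind >= threshold:
                m = (l + r) / 2
                new_cells += [(l, m), (m, r)]
            else:
                new_cells.append((l, r))
        cells = new_cells
    return sum(trap), eta, cells


M_h, eta, cells = adaptive_goal(math.sqrt, 0.0, 1.0, tol=1e-4)
```

Because the square root is singular at 0, the refinement concentrates cells near the left end of the interval, which is the behavior one hopes for from goal-oriented adaptivity.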
Example: compute shear stress in a bone implant
Polymer-fluid mixture
Nonlinear hyperelasticity
Complicated constitutive law
Novel mixed displacement-stress discretization via Arnold-Winther element
Adaptivity pays off, but would be really difficult to implement by hand in this case
FEniCS really pays off in computational turbulence
Scope:
The jungle of Reynolds-Averaged Navier-Stokes (RANS) models: k-ε, k-ω, v²-f, various tensor models, ...
Should be easy to implement and compare...
Various models
Various linearizations
Coupled vs. segregated solution
Picard vs. Newton iteration
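Example: k-ε model (unknowns: $\mathbf{u}$, $p$, $k$, $\varepsilon$)
$\frac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u} = -\frac{1}{\varrho} \nabla p + \nu \nabla^2 \mathbf{u} + \mathbf{f} - \nabla \cdot \overline{\mathbf{u}'\mathbf{u}'}$
$\nabla \cdot \mathbf{u} = 0$
$\nabla \cdot \overline{\mathbf{u}'\mathbf{u}'} = -2 \frac{k^2}{\varepsilon} \frac{1}{2} (\nabla \mathbf{u} + \nabla \mathbf{u}^T) + \frac{2}{3} k I$
$\frac{\partial k}{\partial t} + \mathbf{u} \cdot \nabla k = \nabla \cdot (\nu_k \nabla k) + P_k - \varepsilon - D$
$\frac{\partial \varepsilon}{\partial t} + \mathbf{u} \cdot \nabla \varepsilon = \nabla \cdot (\nu_\varepsilon \nabla \varepsilon) + (C_{\varepsilon 1} P_k - C_{\varepsilon 2} f_2 \varepsilon) \frac{\varepsilon}{k} + E$
$\varepsilon = 2 \nu \, \overline{\mathbf{s} : \mathbf{s}}, \quad \mathbf{s} = \frac{1}{2} (\nabla \mathbf{u}' + (\nabla \mathbf{u}')^T)$
$\nu_k = \nu + \frac{\nu_T}{\sigma_k}$
...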
Multi-physics problems with large systems of PDEs
Solve a system of PDEs, e.g.,
L(u1, u2, ..., u6) = 0
in some grouping into subsystems, e.g.,
L1(u1, u2) = 0
L2(u3) = 0
L3(u4, u5, u6) = 0
Segregated solve (iteration) between subsystems
One scalar/vector PDE solver is compact in FEniCS
Large PDE systems require tedious, repetitive code
Let’s automate!
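A segregated solve between subsystems can be sketched with a tiny algebraic stand-in for the PDEs. In the example below, subsystem 1 couples (u1, u2) and subsystem 2 contains u3; each is solved with the other's unknowns frozen at their latest values, and the sweep is repeated until nothing changes. The equations and the function names are invented purely for illustration.

```python
def solve_subsystem_1(u3):
    """Solve 4*u1 + u2 = 1 + u3 and u1 + 3*u2 = 2 for (u1, u2), u3 frozen."""
    det = 4.0 * 3.0 - 1.0 * 1.0
    r1, r2 = 1.0 + u3, 2.0
    return (3.0 * r1 - 1.0 * r2) / det, (4.0 * r2 - 1.0 * r1) / det


def solve_subsystem_2(u1):
    """Solve 5*u3 = 3 + u1 for u3, with u1 frozen."""
    return (3.0 + u1) / 5.0


def segregated_solve(tol=1e-12, max_iter=100):
    """Iterate between the two subsystems until the update is below tol."""
    u1 = u2 = u3 = 0.0
    for iteration in range(1, max_iter + 1):
        u1_new, u2_new = solve_subsystem_1(u3)
        u3_new = solve_subsystem_2(u1_new)
        change = max(abs(u1_new - u1), abs(u2_new - u2), abs(u3_new - u3))
        u1, u2, u3 = u1_new, u2_new, u3_new
        if change < tol:
            break
    return (u1, u2, u3), iteration


(u1, u2, u3), n_iter = segregated_solve()
```

The loop body is the repetitive part: with many subsystems and several possible groupings, writing it out by hand for each variant is exactly the tedium the slide proposes to automate.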
Key issue: how to linearize nonlinear PDEs
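PDE for turbulent kinetic energy $k$ (unknowns: $u$, $k$, $\varepsilon$)
$0 = -u \cdot \nabla k + \nabla \cdot (\nu_k(u, k, \varepsilon) \nabla k) + P_k(u, k, \varepsilon) - \varepsilon$
Typical linearization (underscore subscript: old value)
$0 = -u_{\_} \cdot \nabla k + \nabla \cdot (\nu_{k\_} \nabla k) + P_{k\_} - \varepsilon$
Linearization, i.e., implicit vs explicit treatment, is a matter of inserting or removing an underscore
Variational form ($k$ and $v_k$ are trial and test functions)
$F_k = -\int_\Omega u_{\_} \cdot \nabla k \, v_k \, dx - \int_\Omega \nu_{k\_} \nabla k \cdot \nabla v_k \, dx + \int_\Omega (P_{k\_} - \varepsilon) v_k \, dx$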
Corresponding code
F_k = - inner(dot(u_ , grad(k)), v_k)*dx \
- nu_k_*inner(grad(k), grad(v_k))*dx \
+ (P_k_ - e)*v_k*dx
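The underscore convention is the same one used in a Picard (successive substitution) iteration: coefficients are evaluated at the old value, which leaves a linear problem for the new value. A minimal scalar sketch, assuming a made-up nonlinear equation nu(k)*k = P with nu(k) = 1 + k**2 that stands in for the PDE; the function name `picard` is hypothetical.

```python
def picard(P=1.0, k0=1.0, tol=1e-12, max_iter=200):
    """Solve nu(k)*k = P with nu(k) = 1 + k**2 by Picard iteration."""
    k_ = k0                    # trailing underscore: value from the previous iteration
    for iteration in range(1, max_iter + 1):
        nu_ = 1.0 + k_**2      # coefficient evaluated at the old value
        k = P / nu_            # the linearized equation nu_*k = P
        if abs(k - k_) < tol:
            return k, iteration
        k_ = k                 # the new value becomes the old value
    raise RuntimeError("Picard iteration did not converge")


k, n_iter = picard()
```

Moving the underscore changes the scheme without changing the structure of the code: writing `1.0 + k_*k` in place of `1.0 + k_**2`, for instance, would treat one factor implicitly and lead to a different linear equation for `k`.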
Example of various explicit/implicit treatments of a term
Implicit treatment of ε in coupled k-ε system:
$F_k = \ldots + \int_\Omega \varepsilon \, v_k \, dx$ → e*v_k*dx
Explicit treatment of ε for decoupled k-ε system:
$F_k = \ldots + \int_\Omega \varepsilon_{\_} \, v_k \, dx$ → e_*v_k*dx
Explicit treatment of ε, but implicit term in k eq.:
$F_k = \ldots + \int_\Omega \varepsilon_{\_} \frac{k}{k_{\_}} v_k \, dx$ → e_*k/k_*v_k*dx
Weighted combination in coupled k-ε system:
$F_k = \ldots + \int_\Omega ((1 - w) \varepsilon_{\_} k + w \varepsilon k_{\_}) \frac{1}{k_{\_}} v_k \, dx$ → (1/k_)*((1-w)*e_*k + w*e*k_)*v_k*dx
Impact of w on convergence of nonlinear iterations. Channel flow, $Re_\tau = 395$
[Figure: normalized residual (log scale, 10^2 down to 10^-10) vs. number of iterations (0 to 50) for w = 0.0, 0.25, 0.5, 0.75, 1.0; panels: (a) Launder-Sharma, (b) Jones-Launder, (c) Chien]
Proof of concept: 18 highly coupled nonlinear PDEs
The elliptic relaxation model:
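$\frac{\partial R_{ij}}{\partial t} + u_k \frac{\partial R_{ij}}{\partial x_k} + \frac{\partial T_{kij}}{\partial x_k} = G_{ij} + P_{ij} - \varepsilon_{ij}$
$L^2 \nabla^2 f_{ij} - f_{ij} = -\frac{G^h_{ij}}{k} - \frac{2 A_{ij}}{T}$
+ standard eqs. for $u$, $p$, $k$, $\varepsilon$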
Coupled implementation of variational forms:
class RF_1(TurbModel):
    def form(self, R, R_, v_R, k_, e_, P_, nu, u_, f, f_, v_f,
             A_, Gh, Cmu, T_, L_, **kwargs):
        # Transport equation for the Reynolds stresses R
        Fr = (inner(dot(grad(R), u_), v_R)*dx
              + nu*inner(grad(R), grad(v_R))*dx
              + inner(Cmu*T_*dot(grad(R), R_), grad(v_R))*dx
              - inner(k_*f, v_R)*dx - inner(P_, v_R)*dx
              + inner(R*e_*(1./k_), v_R)*dx)
        # Elliptic relaxation equation for f
        Ff = (inner(grad(f), grad(L_**2*v_f))*dx + inner(f, v_f)*dx
              - (1./k_)*inner(Gh, v_f)*dx - (2./T_)*inner(A_, v_f)*dx)
        return Fr + Ff
What is the efficiency loss of so much flexibility?
Validation: "DNS" of turbulent flow in a channel
Codes: FEniCS-based (Oasis), OpenFOAM, CDP
CPU-time with 32 processors:
FEniCS, OpenFOAM: 1.0
CDP: 1.5
FEniCS needs twice as much memory as CDP
Conclusions:
80% time in Krylov solvers
the splitting algorithm for N-S (# linear systems) is key
preconditioning is key
HPC of discretization details not so important
Linear systems: PETSc w/Hypre AMG
Summary
Links
fenicsproject.org, FEniCS book
simula.mint.no
launchpad.net/cbcpdesys
Acknowledgments
A. Logg (Chalmers/Simula), G. Wells (Cambridge), M. Alnæs (Simula), M. Mortensen (Univ. of Oslo), Didem Unat (UCSD/Simula), S. Baden (UCSD), X. Cai (Simula), J. Hake (Simula), M. Rognes (Simula)