Using GPUs for the Boundary Element Method
Christopher Cooper, Lorena Barba
Boston University
November, 2011
Pan-American Advanced Studies Institute NSF Program
Boundary Element Method - Formulation
‣ Numerical method for PDEs
∇²φ(x)
φ(x) = ∫_Γ ψ(x, x′) dΓ′
Only on boundary!
[Figure: volume mesh vs. surface mesh]
Boundary Element Method - Matrix Formulation
‣ Apply for all boundary elements Γ_j at x = x_i
[Figure: boundary discretized into elements Γ_j, with points x_0, x_1, x_2, x_3]
[A]{X} = [B]{Y}
{X}: unknown boundary values
{Y}: known boundary values
Boundary Element Method - Applications
Advanced Numerical Solutions ansol.us
Acoustics
Stokes flow
FastBEM urbana.mie.uc.edu
Electrostatics
Yokota et al. 2011
Boundary Element Method - Limitations
‣ Dense matrix-vector multiplications scale as O(N²)!
- Assembling the RHS: [B]{Y} = {b}
- Krylov subspace linear solver (GMRES, CG, ...): [A]{X} = {b}
Naive approach allows for only a few thousand elements!
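To make the O(N²) limit concrete, here is a rough cost estimate for a dense N × N system. It is only a sketch: it assumes double-precision (8-byte) entries and counts one multiply and one add per matrix entry, and `dense_cost` is a hypothetical helper name, not part of any library.

```python
def dense_cost(n, bytes_per_entry=8):
    """Return (storage in GiB, floating-point operations per matvec) for a dense n x n matrix."""
    storage_gib = n * n * bytes_per_entry / 2**30
    flops = 2 * n * n  # one multiply and one add per matrix entry
    return storage_gib, flops


for n in (1_000, 10_000, 100_000, 1_000_000):
    gib, flops = dense_cost(n)
    print(f"N = {n:>9,d}: {gib:14.3f} GiB stored, {flops:.1e} flops per matvec")
```

Both numbers grow by a factor of 100 each time N grows by a factor of 10, which is why storing the matrix stops being practical long before the mesh gets large.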
Fast Multipole Method (FMM)
‣ Fast interaction calculation algorithm
‣ Approximates far field
φ(x) = Σ_{i=0}^{N−1} α_i ψ(|x − x_i|)
[Figure: evaluation point x, cluster center x_c, source points x_i]
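The far-field idea can be illustrated with a small sketch that compares the direct sum against the crudest possible approximation: all weights lumped into a single source at the cluster center x_c. The kernel ψ(r) = 1/r, the random cluster, and the function names are assumptions made for illustration. A real FMM keeps more expansion terms and organizes clusters in a tree, so this shows only the principle.

```python
import math
import random


def direct_sum(x, sources, weights):
    """phi(x) = sum_i alpha_i * psi(|x - x_i|) with psi(r) = 1/r, term by term."""
    return sum(a / math.dist(x, xi) for xi, a in zip(sources, weights))


def far_field_sum(x, center, weights):
    """Lowest-order approximation: lump all weights into one source at the center."""
    return sum(weights) / math.dist(x, center)


random.seed(0)
center = (0.0, 0.0, 0.0)
# 1000 sources scattered in a unit cube around the center (illustrative data).
sources = [tuple(random.uniform(-0.5, 0.5) for _ in range(3)) for _ in range(1000)]
weights = [random.uniform(0.5, 1.5) for _ in sources]


def relative_error(distance):
    """Relative error of the lumped approximation at a point `distance` from the center."""
    x = (distance, 0.0, 0.0)
    exact = direct_sum(x, sources, weights)
    return abs(far_field_sum(x, center, weights) - exact) / abs(exact)


for distance in (2.0, 10.0, 50.0):
    print(f"distance {distance:5.1f}: relative error {relative_error(distance):.2e}")
```

The direct sum costs one kernel evaluation per source, while the lumped version costs one evaluation per cluster. That trade of accuracy for work is what the far-field approximation exploits.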
BEM - FMM
‣ BEM matrix-vector multiplications are interaction calculations
φ(x) = Σ_{i=0}^{N−1} α_i G(|x − x_i|)
PDE → Integral → N-body
BEM - FMM
Liu. Fast Multipole Boundary Element Method
BEM - FMM
‣ Accelerates matrix-vector computation:
- Far field is approximated -> O(N) calculations
‣ No need to store the matrix
- Values computed on the fly -> O(N) storage
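The storage point can be sketched by computing the same matrix-vector product twice: once from a stored matrix, once with every entry recomputed when it is needed. This is an illustration under assumptions: the softened 1/r kernel (with a hypothetical regularization `eps` so the diagonal stays finite) stands in for a real BEM kernel, and the on-the-fly version here is still O(N²) work, since it does not include the far-field approximation.

```python
import math


def kernel(xi, xj, eps=1e-3):
    """Softened 1/r kernel; eps is a hypothetical regularization so i == j stays finite."""
    return 1.0 / math.sqrt(math.dist(xi, xj) ** 2 + eps ** 2)


def matvec_stored(points, v):
    """Build the full N x N matrix first, then multiply: O(N^2) storage."""
    A = [[kernel(xi, xj) for xj in points] for xi in points]
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]


def matvec_on_the_fly(points, v):
    """Recompute each entry as it is used: only the O(N) input and output vectors are stored."""
    return [sum(kernel(xi, xj) * vj for xj, vj in zip(points, v)) for xi in points]


points = [(0.1 * i, 0.0, 0.0) for i in range(50)]
v = [1.0] * len(points)
stored = matvec_stored(points, v)
on_the_fly = matvec_on_the_fly(points, v)
print(max(abs(s - f) for s, f in zip(stored, on_the_fly)))
```

A Krylov solver only ever asks for the product of the matrix with a vector, so the second form is all it needs.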
BEM FMM - Previous work
“Petascale direct numerical simulation of blood flow on 200K cores and heterogeneous architectures”
Rahimian et al.
Fast BEM Lu et al.
BEM FMM and GPUs
‣ BEM and FMM provide an important reduction in computation
BEM: mesh reduction
FMM: reduce computation
GPU: look for further reduction from hardware!
BEM FMM and GPUs
‣ FMM maps well to GPUs
FMM - Available resources
‣ ExaFMM: FMM library that runs on GPUs
- Shown to scale well to thousands of GPUs
- Performs on the order of petaflops
64 billion in 100 seconds
1.0 PFlops
4K GPUs on TSUBAME
Weak scaling
FMM - Available resources
‣ ExaFMM: an open source FMM library that runs on GPUs
- Released at SC’11
How far can we go?
‣ Bioelectrostatics
Billions of elements!
How far can we go?
‣ Acoustics
- Helmholtz equation solved for each frequency
FMM BEM - Implementation
Only once: Set up elements → Construct tree → Assemble RHS (FMM)
N iterations!: GMRES (FMM) ⇄ Check residual
Solution
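The structure of this pipeline (right-hand side assembled once, one matrix-vector product per solver iteration, residual check to stop) can be sketched in a few lines. Everything specific here is an assumption for illustration: the Gaussian kernel plus a diagonal shift is a made-up symmetric positive definite stand-in for [A] and [B], the direct O(N²) product stands in for the FMM, tree construction is skipped, and conjugate gradient is swapped in for GMRES because it is shorter to write. Real BEM matrices are generally non-symmetric, which is why GMRES is used in practice.

```python
import math
import random


def make_matvec(points, shift):
    """Return v -> A v with A_ij = exp(-|x_i - x_j|^2) + shift * delta_ij, never stored."""
    def matvec(v):
        out = []
        for xi, vi in zip(points, v):
            acc = shift * vi
            for xj, vj in zip(points, v):
                acc += math.exp(-math.dist(xi, xj) ** 2) * vj
            out.append(acc)
        return out
    return matvec


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=200):
    """Solve A x = b for symmetric positive definite A, given only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)
    rr = dot(r, r)
    b_norm = math.sqrt(dot(b, b))
    for iteration in range(1, max_iter + 1):
        Ap = matvec(p)  # the expensive step that an FMM would replace
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        if math.sqrt(rr_new) <= tol * b_norm:  # check residual
            return x, iteration
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter


# Only once: set up "elements" and assemble the right-hand side.
random.seed(1)
points = [(random.random(), random.random(), random.random()) for _ in range(60)]
matvec = make_matvec(points, shift=1.0)      # stand-in for [A]
rhs_matvec = make_matvec(points, shift=0.5)  # stand-in for [B]
known = [1.0] * len(points)                  # known boundary values {Y}
b = rhs_matvec(known)

# N iterations: Krylov solve with one matvec per iteration.
x, iterations = conjugate_gradient(matvec, b)
print(f"stopped after {iterations} iterations")
```

Because the solver touches the matrix only through `matvec`, replacing that one function with an FMM call (and keeping the data on the GPU between iterations) leaves the rest of the loop unchanged.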
Main challenges
‣ Porting the whole BEM to the GPU
- Avoid excessive CPU-GPU communication
GMRES (FMM) ⇄ Check residual: N iterations!
Main challenges
‣ Constructing a BEM in the ExaFMM framework
- Needed for large numbers of elements
- Use ExaFMM’s domain decomposition for parallelization
Conclusions and future work
‣ BEM FMM is a very good candidate for GPU acceleration
‣ Intelligent use of tools such as ExaFMM will allow us to solve cool engineering problems!