BigDFT
BigDFT
Wavelets
Operations
Performances
O(N )
Minimal basis
Performance
Fragment
Applications
Perspectives
Conclusions
Linear Scaling DFT based on Daubechies Wavelets for Massively Parallel Architecture
L. Ratcliff (1), Stephan Mohr (1,2), Paul Boulanger (1,3), Luigi Genovese (1), Damien Caliste (1), Stefan Goedecker (2), Thierry Deutsch
(1) Laboratoire de Simulation Atomistique (L_Sim), CEA Grenoble
(2) Institut für Physik, Universität Basel, Switzerland
(3) Institut Néel, CNRS, Grenoble
Centre Blaise Pascal, Lyon
Laboratoire de Simulation Atomistique http://inac.cea.fr/L Sim Thierry Deutsch
Outline
1 The machinery of BigDFT (cubic scaling version)
   Mathematics of the wavelets
   Main Operations
   MPI/OpenMP and GPU performances
2 O(N) (linear scaling) BigDFT approach
   Minimal basis set approach
   Performance and Accuracy
   Fragment approach
   Applications
   Perspectives
3 Conclusions
A basis for nanosciences: the BigDFT project
STREP European project: BigDFT (2005-2008)
Four partners, 15 contributors:
CEA-INAC Grenoble (T. Deutsch), U. Basel (S. Goedecker), U. Louvain-la-Neuve (X. Gonze), U. Kiel (R. Schneider)
Aim: to develop an ab initio DFT code based on Daubechies wavelets, to be integrated in ABINIT and distributed under the GNU GPL license
After six years, the BigDFT project is still alive and well:
• BigDFT formalism from an HPC perspective
• Wavelets for O(N) DFT
• Resonant states of open quantum systems
Why do we use wavelets in BigDFT?
Adaptivity
One grid, two resolution levels in BigDFT:
• 1 scaling function ("coarse region")
• 1 scaling function and 7 wavelets ("fine region")
Ideal for big inhomogeneous systems
Efficient Poisson solver, capable of handling different boundary conditions: free, wire, surface, periodic
Explicit treatment of charged systems
Established code with many capabilities
[Figure: scaling function φ(x) and wavelet ψ(x) plotted against x]
A brief description of wavelet theory
Two kinds of basis functions
A multi-resolution real-space basis
The functions can be classified according to the resolution level they span.
Scaling functions
The functions of low resolution level are a linear combination of high-resolution functions:
$$\phi(x) = \sum_{j=-m}^{m} h_j \, \phi(2x - j)$$
Centered on a resolution-dependent grid: $\phi_j = \phi_0(x - j)$.
A brief description of wavelet theory
Wavelets
They contain the degrees of freedom (DoF) needed to complete the information which is lacking due to the coarseness of the resolution.
$$\phi(2x) = \sum_{j=-m}^{m} h_j \, \phi(x - j) + \sum_{j=-m}^{m} g_j \, \psi(x - j)$$
Increase the resolution without modifying the grid spacing
SF + W = same DoF as SF of higher resolution
$$\psi(x) = \sum_{j=-m}^{m} g_j \, \phi(2x - j)$$
All functions have compact support, centered on grid points.
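The statement that scaling functions plus wavelets carry the same degrees of freedom as the scaling functions one level finer can be tried out numerically. The following is a minimal sketch, not BigDFT code. It assumes the shortest non-trivial Daubechies filter (four coefficients), chosen only for brevity since the slides do not give the filter values, normalised so that the coefficients sum to √2, and it assumes periodic boundaries. The function names are illustrative.

```python
import numpy as np

# Daubechies filter with 4 taps (illustrative choice), normalised so sum(h) == sqrt(2).
s3 = np.sqrt(3.0)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
# Matching wavelet (high-pass) filter: g_k = (-1)^k * h_{3-k}.
g = np.array([(-1) ** k * h[3 - k] for k in range(4)])


def _indices(n_coarse):
    """Fine-grid indices (2*i + k) mod n used by coarse point i (periodic wrap)."""
    n_fine = 2 * n_coarse
    return (2 * np.arange(n_coarse)[:, None] + np.arange(4)[None, :]) % n_fine


def analysis(c_fine):
    """Split fine scaling-function coefficients into coarse SF and wavelet parts."""
    idx = _indices(len(c_fine) // 2)
    return c_fine[idx] @ h, c_fine[idx] @ g


def synthesis(s, d):
    """Rebuild the fine scaling-function coefficients from coarse SF + wavelets."""
    idx = _indices(len(s))
    c_fine = np.zeros(2 * len(s))
    # Accumulate h_k * s_i + g_k * d_i onto fine index (2*i + k) mod n.
    np.add.at(c_fine, idx, np.outer(s, h) + np.outer(d, g))
    return c_fine


rng = np.random.default_rng(0)
c = rng.standard_normal(16)          # 16 fine-level coefficients
s, d = analysis(c)                   # 8 coarse SF + 8 wavelet coefficients
c_back = synthesis(s, d)
print("reconstructed:", np.allclose(c_back, c))
print("same number of DoF:", len(s) + len(d) == len(c))
```

Because the filters are orthonormal, the synthesis step is simply the transpose of the analysis step, which is why no matrix inversion appears.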
Adaptivity of the mesh
Atomic positions (H2O)
Adaptivity of the mesh
Fine grid (high resolution)
Adaptivity of the mesh
Coarse grid (low resolution)
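The Hamiltonian-related quantities can be calculated up to machine precision in the given basis.
The accuracy is only limited by the basis set, $O(h_{\mathrm{grid}}^{14})$.
Exact evaluation of kinetic energy
Obtained by convolution with filters:
$$f(x) = \sum_{\ell} c_\ell \, \phi_\ell(x), \qquad \nabla^2 f(x) = \sum_{\ell} \tilde{c}_\ell \, \phi_\ell(x),$$
$$\tilde{c}_\ell = \sum_{j} c_j \, a_{\ell - j}, \qquad a_\ell \equiv \int \phi_0(x) \, \partial_x^2 \phi_\ell(x) \, \mathrm{d}x$$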
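The structure of the coefficient update, one output coefficient as a sum over input coefficients weighted by a filter that depends only on the index difference, can be sketched in a few lines. This is a hedged illustration on a periodic one-dimensional grid. The filter values below are a stand-in (the three-point finite-difference Laplacian), not the Daubechies filter of the slides, whose values are not given here; the function name `apply_filter` is hypothetical.

```python
import numpy as np


def apply_filter(c, a):
    """Return c_tilde[l] = sum_j c[j] * a[l - j] on a periodic grid.

    `a` maps an integer offset (l - j) to a filter value.
    """
    c_tilde = np.zeros_like(c)
    for offset, value in a.items():
        # np.roll(c, offset)[l] == c[l - offset]
        c_tilde += value * np.roll(c, offset)
    return c_tilde


n = 256
hgrid = 2 * np.pi / n
x = hgrid * np.arange(n)

# Stand-in filter: 3-point finite-difference Laplacian, NOT the Daubechies filter.
a = {-1: 1.0 / hgrid**2, 0: -2.0 / hgrid**2, 1: 1.0 / hgrid**2}

c = np.sin(x)                  # coefficients of a smooth periodic test function
c_tilde = apply_filter(c, a)   # approximates the second derivative, -sin(x)
max_error = np.max(np.abs(c_tilde + np.sin(x)))
print("max deviation from -sin(x):", max_error)
```

With the stand-in filter the result is only second-order accurate in the grid spacing; the point of the sketch is the convolution structure, which is the same whatever the filter values.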
http://www.bigdft.org version 1.7.x
• Isolated, surfaces and 3D-periodic boundary conditions (k-points, symmetries)
• All XC functionals of the ABINIT package (libXC library)
• Hybrid functionals, Fock exchange operator
• Direct Minimisation and Mixing routines (metals)
• Local geometry optimizations (with constraints)
• External electric fields (surfaces BC)
• Born-Oppenheimer MD
• Vibrations
• Unoccupied states
• Empirical van der Waals interactions
• Saddle point searches (NEB, Granot & Baer)
• All these functionalities are GPU-compatible
Optimal for isolated systems
Test case: cinchonidine molecule (44 atoms)
[Figure: absolute energy precision (Ha) versus number of degrees of freedom, comparing plane waves (Ec = 40, 90 and 125 Ha) with wavelets (h = 0.4 and 0.3 bohr)]
Allows a systematic approach for molecules
Considerably faster than plane-wave codes: 10 (5) times faster than ABINIT (CPMD)
Charged systems can be treated explicitly in the same computing time
Systematic basis set
Two parameters for tuning the basis
The grid spacing hgrid
The extension of the low-resolution points crmult
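To get a feeling for how the two parameters control the basis size, the sketch below counts the points of a regular grid that fall inside spheres around the atoms. It is an illustration under stated assumptions, not the BigDFT grid generator: it assumes that the coarse region is the union of spheres of radius crmult times a per-species radius, and both the water-like geometry and the radii are made-up values. The function name `count_coarse_points` is hypothetical.

```python
import numpy as np


def count_coarse_points(positions, radii, hgrid, crmult):
    """Count grid points lying within crmult * radius of at least one atom."""
    positions = np.asarray(positions, dtype=float)
    reach = crmult * np.asarray(radii, dtype=float)
    lower = (positions - reach[:, None]).min(axis=0)
    upper = (positions + reach[:, None]).max(axis=0)
    # Regular grid of spacing hgrid, aligned on integer multiples of hgrid.
    axes = [hgrid * np.arange(np.floor(lo / hgrid), np.ceil(up / hgrid) + 1)
            for lo, up in zip(lower, upper)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    inside = np.zeros(gx.shape, dtype=bool)
    for (px, py, pz), r in zip(positions, reach):
        inside |= (gx - px) ** 2 + (gy - py) ** 2 + (gz - pz) ** 2 <= r ** 2
    return int(inside.sum())


# Hypothetical water-like geometry (bohr) and made-up per-species radii (bohr).
positions = [[0.0, 0.0, 0.0], [1.43, 1.11, 0.0], [-1.43, 1.11, 0.0]]
radii = [1.2, 1.0, 1.0]

for hgrid in (0.6, 0.4, 0.3):
    for crmult in (4.0, 6.0):
        n_points = count_coarse_points(positions, radii, hgrid, crmult)
        print(f"hgrid={hgrid:.1f} bohr  crmult={crmult:.0f}  coarse points={n_points}")
```

In this model, halving hgrid multiplies the number of points by roughly eight, while crmult changes the volume covered by the grid.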
Massively parallel
Two kinds of parallelisation
By orbitals (Hamiltonian application, preconditioning)
By components (overlap matrices, orthogonalisation)
A few (but big) packets of data
More demanding in bandwidth than in latency
No need of fast network
Optimal speedup (eff. ∼ 85%), also for big systems
Cubic scaling code
For systems bigger than 500 atoms (1500 orbitals), the orthonormalisation operation is predominant (N³)
Orbital distribution scheme
Used for the application of the Hamiltonian
The Hamiltonian (convolutions) is applied separately onto each wavefunction
[Figure: orbitals ψ1 to ψ5, each stored whole on one of the MPI processes 0, 1 and 2]
Coefficient distribution scheme
Used for scalar product & orthonormalisation
BLAS routines (level 3) are called, then the result is reduced
[Figure: the coefficients of every orbital ψ1 to ψ5 split across MPI processes 0, 1 and 2]
Communications are performed via MPI_ALLTOALLV
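The two data layouts and the "matrix product, then reduce" pattern can be mimicked without MPI. The sketch below is a single-process NumPy emulation under simple assumptions: three pretend processes, five orbitals and twelve coefficients per orbital, all arbitrary. List entries stand in for the memory of each process, and the regrouping step stands in for the all-to-all exchange.

```python
import numpy as np

n_proc, n_orb, n_coef = 3, 5, 12
rng = np.random.default_rng(1)
psi = rng.standard_normal((n_orb, n_coef))   # row i = coefficients of orbital i

# Orbital distribution: each "process" owns whole orbitals (rows).
orb_blocks = np.array_split(np.arange(n_orb), n_proc)
by_orbital = [psi[rows, :] for rows in orb_blocks]

# Coefficient distribution: each "process" owns a slice of the coefficients
# (columns) of every orbital. In an MPI code this regrouping is the
# all-to-all exchange.
coef_blocks = np.array_split(np.arange(n_coef), n_proc)
by_coefficient = [
    np.vstack([block[:, cols] for block in by_orbital]) for cols in coef_blocks
]

# Each process forms a partial overlap matrix with one matrix-matrix product;
# summing the partial matrices plays the role of the reduction.
partial = [local @ local.T for local in by_coefficient]
overlap = sum(partial)

print("matches the full overlap matrix:", np.allclose(overlap, psi @ psi.T))
```

The partial products are dense matrix-matrix multiplications, which is the kind of operation level-3 BLAS routines are designed for.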
OpenMP parallelisation
Innermost parallelisation level
(Almost) any BigDFT operation is parallelised via OpenMP
✓ Useful for memory-demanding calculations
✓ Allows further increase of speedups
✓ Saves MPI processes and intra-node message passing
✗ Less efficient than MPI
✗ Compiler and system dependent
✗ OpenMP sections should be regularly maintained
[Figure: OpenMP speedup versus number of MPI processes, for 2, 3 and 6 OpenMP threads]
High Performance Computing
Localisation & Orthogonality → Data locality
Principal code operations can be intensively optimised
Optimal for application on supercomputers
Little communication, big packets of data
No need of fast network
Optimal speedup
Efficiency of the order of 90%, up to thousands of processors
[Figure: parallel efficiency (%) versus number of processors for systems of 32, 65, 173, 257 and 1025 atoms; data points are labelled with the number of orbitals per processor]
Task repartition for a small system (ZnO, 128 atoms)
[Figure: share of run time (percent) spent in Comms, LinAlg, Conv, CPU and Other, together with total time in seconds (log scale) and efficiency (%), versus number of cores from 16 to 576, with 1 OpenMP thread per core]
Data repartition optimal for hardware accelerators (GPU)
Graphics Processing Units can be used to speed up the computation
Using GPUs in a Big systematic DFT code
Nature of the operations
Operators approach via convolutions
Linear algebra due to orthogonality of the basis
Communications and calculations do not interfere
→ A number of operations can be accelerated
Evaluating GPU convenience
Three levels of evaluation
1 Bare speedups: GPU kernels vs. CPU routines
   Are the operations suitable for GPU?
2 Full code speedup on one process
   Amdahl's law: are there hot-spot operations?
3 Speedup in a (massively?) parallel environment
   The MPI layer adds an extra level of complexity
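The second level of evaluation rests on Amdahl's law: if a fraction p of the run time is accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A short sketch makes the consequence concrete. The fractions and kernel speedups below are illustrative numbers, not measurements from BigDFT.

```python
def amdahl_speedup(accelerated_fraction, kernel_speedup):
    """Overall speedup when a fraction of the run time is accelerated."""
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / kernel_speedup)


# Illustrative numbers only (not measurements from BigDFT).
for fraction in (0.5, 0.9, 0.99):
    for kernel in (10.0, 50.0):
        overall = amdahl_speedup(fraction, kernel)
        print(f"fraction={fraction:.2f}  kernel x{kernel:.0f}  ->  overall x{overall:.2f}")
```

Even a very fast kernel gives little overall gain unless the accelerated operations dominate the run time, which is why the search for hot-spot operations comes before any parallel benchmark.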
Hybrid and Heterogeneous runs with OpenCL
NVidia S2070 and ATI HD 6970, each connected to a Nehalem workstation
BigDFT may run on both
Sample BigDFT run: Graphene, 4 C atoms, 52 kpts
No. of Flop: 8.053 · 10^12

MPI        1      1      4      1      4      8
GPU        NO     NV     NV     ATI    ATI    NV + ATI
Time (s)   6020   300    160    347    197    109
Speedup    1      20.07  37.62  17.35  30.55  55.23
GFlop/s    1.34   26.84  50.33  23.2   40.87  73.87
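The speedup and GFlop/s rows follow from the run times and the flop count: speedup is the CPU-only time divided by the run time, and GFlop/s is the flop count divided by the run time. The sketch below recomputes both from the numbers in the table above. The recomputed values agree with the reported ones to within about 0.01, which is consistent with the times being rounded to whole seconds.

```python
flop = 8.053e12   # total floating-point operations of the sample run

# (MPI processes, GPU, time in s, reported speedup, reported GFlop/s)
runs = [
    (1, "NO", 6020, 1.0, 1.34),
    (1, "NV", 300, 20.07, 26.84),
    (4, "NV", 160, 37.62, 50.33),
    (1, "ATI", 347, 17.35, 23.2),
    (4, "ATI", 197, 30.55, 40.87),
    (8, "NV + ATI", 109, 55.23, 73.87),
]

reference_time = runs[0][2]   # CPU-only run
derived = []
for n_mpi, gpu, time_s, reported_speedup, reported_gflops in runs:
    speedup = reference_time / time_s
    gflops = flop / time_s / 1e9
    derived.append((speedup, gflops))
    print(f"MPI={n_mpi} GPU={gpu:8s} speedup={speedup:6.2f} "
          f"(reported {reported_speedup})  GFlop/s={gflops:6.2f} "
          f"(reported {reported_gflops})")
```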
Next Step: handling of Load (un)balancing
Configuration space of the cage-like boron clusters
Stabilize the buckyball configuration of B80 systems
PRB 83 081403(R), 2011
[Figure: cage-like boron clusters B80 and B12@B68; label 14:6]
Configuration space of the cage-like boron clusters
Stabilize the buckyball configuration of B80 systems
PRB 83 081403(R), 2011
[Figure: Sc3N@B80 and B12@B68 configurations; labels 8:12, -12.76 eV and -13.55 eV on B12@B68]
Outline
1 The machinery of BigDFT (cubic scaling version)
   Mathematics of the wavelets
   Main Operations
   MPI/OpenMP and GPU performances
2 O(N) (linear scaling) BigDFT approach
   Minimal basis set approach
   Performance and Accuracy
   Fragment approach
   Applications
   Perspectives
3 Conclusions
Scaling of BigDFT (I)
We can reach systems containing up to a few hundred electrons thanks to wavelet properties and efficient parallelization:
Efficient parallelization for 1000s of CPUs
• ∼ 97% of code parallelized with MPI
• ∼ 93% of code parallelized with OpenMP
[Figure: 960 atoms]
Accuracy: Total energies and forces
Several systems, 44 – 450 atoms
• compared with standard BigDFT
• total energies accuracy: ∼ 1 meV/atom
• forces accuracy: ∼ a few meV/bohr
Geometry optimization
[Figure: geometry optimization of a 288-atom system comparing the Cubic, Reset and Reformat schemes; panels show RMS force (eV/Å), time (min.), cumulative time (min.) and ∆R (Å)]
Further savings through support function reuse
• Pulay forces not needed
• lower prefactor than single point
Energy differences
Silicon cluster with a vacancy defect
• 291 atoms
• need 9 support functions per Si
• accuracy in defect energy of 12 meV
                  pristine (eV)   vacancy (eV)   ∆ (eV)    ∆ − ∆cubic (meV)
cubic             −20674.2        −20563.1       111.167   –
4/1 (sp-like)     −20667.6        −20556.5       111.038   129
9/1 (spd-like)    −20672.9        −20561.7       111.155   12
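The last column is the deviation of each defect energy ∆ = E(vacancy) − E(pristine) from the cubic-scaling reference, converted to meV and quoted as a magnitude. The sketch below redoes this arithmetic with the numbers from the table above. Note that the total energies are printed to 0.1 eV only, so ∆ recomputed from them can differ from the tabulated ∆ by up to about 0.1 eV; the meV column is therefore derived from the tabulated ∆ values.

```python
# Total energies in eV (pristine, vacancy), as printed in the table.
totals = {
    "cubic": (-20674.2, -20563.1),
    "4/1 (sp-like)": (-20667.6, -20556.5),
    "9/1 (spd-like)": (-20672.9, -20561.7),
}
# Defect energies in eV as printed in the table (more digits than the totals).
delta = {"cubic": 111.167, "4/1 (sp-like)": 111.038, "9/1 (spd-like)": 111.155}

errors_mev = {}
for label, (pristine, vacancy) in totals.items():
    from_totals = vacancy - pristine                  # limited by 0.1 eV rounding
    errors_mev[label] = abs(delta[label] - delta["cubic"]) * 1000.0
    print(f"{label:15s} delta from totals = {from_totals:7.1f} eV, "
          f"tabulated = {delta[label]:.3f} eV, "
          f"deviation from cubic = {errors_mev[label]:.0f} meV")
```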
Fragment approach
• By dividing a system into fragments, we can avoid optimizing the support functions entirely for large systems
• Substantially reduces the cost (need an efficient reformatting)
• Many useful applications, including the explicit treatment of solvents
• A necessary first step towards a tight-binding like approach, with each atom as a fragment
Reformatting the minimal basis set in the same grid
Support function reuse
As we have seen, the reuse of support functions lends itself to some interesting applications, e.g. geometry optimizations, charged systems
For this, we need an accurate and efficient reformatting scheme
[Figure: E − Ecubic (meV/atom, log scale from 0.001 to 100) for Cubic eggbox, Linear eggbox and Linear template]
Transfer integrals and site energies
[Figure: four panels of energy (eV) versus rotation angle (0 to 90): J12 for the LUMO, J12 for the HOMO, e1,2 for the LUMO and e1,2 for the HOMO, each comparing DZ, TZP and TMB]
BigDFT compared to ADF fragment approach
• support functions from molecules reused in dimers
• Application to OLED (Organic Light-Emitting Diode)
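For readers unfamiliar with the quantities plotted, the sketch below shows how site energies and a transfer integral are commonly obtained once fragment orbitals and a dimer Hamiltonian are available in a common basis. It assumes the usual definitions, e_i = ⟨φ_i|H|φ_i⟩ and J12 = ⟨φ_1|H|φ_2⟩, which the slides do not spell out. The Hamiltonian is a made-up 4 × 4 matrix in an orthonormal basis, and the two fragment orbitals do not overlap, so no overlap correction is applied.

```python
import numpy as np

# Toy symmetric Hamiltonian (eV) in an orthonormal 4-function basis: made-up numbers.
# Basis functions 0-1 sit on fragment 1, basis functions 2-3 on fragment 2.
hamiltonian = np.array([
    [-5.0, 0.3, 0.1, 0.0],
    [0.3, -2.0, 0.0, 0.05],
    [0.1, 0.0, -5.1, 0.3],
    [0.0, 0.05, 0.3, -2.1],
])

# One frontier orbital per isolated fragment, expanded in the dimer basis.
phi_1 = np.array([0.8, 0.6, 0.0, 0.0])
phi_2 = np.array([0.0, 0.0, 0.8, 0.6])

e_1 = phi_1 @ hamiltonian @ phi_1     # site energy of fragment 1
e_2 = phi_2 @ hamiltonian @ phi_2     # site energy of fragment 2
j_12 = phi_1 @ hamiltonian @ phi_2    # transfer integral between the fragments

print(f"e_1 = {e_1:.3f} eV, e_2 = {e_2:.3f} eV, J_12 = {j_12:.3f} eV")
```

When the fragment orbitals overlap, an orthogonalisation step is normally added before reading off these quantities; that step is left out of this sketch.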
Transfer integrals for OLEDs (I)
We want to consider environment effects in realistic 'host-guest' morphologies: 6192 atoms, 100 molecules
Host molecule
Guest molecule
Transfer integrals for OLEDs (II)
Transfer integrals (left) and site energies (right) calculated with (bottom) and without (top) constrained DFT