Lin Lin
Computational Research Division, Lawrence Berkeley National Laboratory
Laboratoire Jacques-Louis Lions,
Paris 6, June 2013
Supported by Luis Alvarez fellowship in LBNL, DOE SciDAC and BES Partnership.
Fast Algorithms for Electronic Structure Analysis
Acknowledgment
Collaborators of past and ongoing projects on this topic:
• Roberto Car, Princeton University
• Mohan Chen, Princeton University
• Weinan E, Princeton University and Peking University
• Alberto Garcia, Instituto de Ciencia de Materiales de Barcelona
• Lixin He, University of Science and Technology of China
• Georg Huhs, Barcelona Supercomputing Center
• Mathias Jacquelin, Lawrence Berkeley National Laboratory
• Juan Meza, UC Merced
• Jianfeng Lu, Duke University
• Chao Yang, Lawrence Berkeley National Laboratory
• Lexing Ying, Stanford University
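Electronic structure theory
Main goal: Given fixed atomic positions $\{R_\alpha\}_{\alpha=1}^{M}$, compute the ground state electron energy $E_e(\{R_\alpha\})$. Useful in a large number of applications.
Ground state electron wavefunction $\Psi_e(r_1,\cdots,r_N;\{R_\alpha\})$:
$$\left( -\frac{1}{2}\sum_{i=1}^{N}\Delta_i - \sum_{i=1}^{N}\sum_{\alpha=1}^{M}\frac{Z_\alpha}{|r_i - R_\alpha|} + \frac{1}{2}\sum_{i,j=1,\, i\neq j}^{N}\frac{1}{|r_i - r_j|} \right)\Psi_e = E_e(\{R_\alpha\})\,\Psi_e$$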
Curse of dimensionality
The fundamental laws necessary to the mathematical treatment of large parts of physics and the whole of chemistry are thus fully known, and the difficulty lies only in the fact that application of these laws leads to equations that are too complex to be solved.
–P. Dirac, 1929
Pople diagram
John Pople, Nobel Prize in Chemistry, 1998
[Figure: accuracy (vertical axis) versus number of atoms (horizontal axis, 10 to 10000) for CI, CCSD(T), RPA, MP2, DFT and TB.]
Density functional theory (DFT): best compromise between efficiency and accuracy. Most widely used electronic structure theory for condensed matter systems.
Density functional theory [S. Redner, Citation Statistics from 110 Years of Physical Review]
Kohn-Sham density functional theory
• Efficient: Single particle theory
• Accurate: Exact ground state energy for
1) Very costly step.
2) Limiting practical calculations to hundreds of atoms
Cubic scaling of KSDFT
• KS orbitals are delocalized in the global domain.
• $N$ atoms. $O(N)$ grid points. $O(N)$ KS orbitals.
• Orthogonalization of an $O(N) \times O(N)$ matrix ⇒ $O(N^3)$ scaling, regardless of which eigensolver is used. Cannot efficiently use high performance supercomputers.
• Conclusion: DO NOT directly treat KS orbitals that are delocalized in the global domain.
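A back-of-the-envelope sketch of the cubic scaling argument above. The per-atom constants (`grid_per_atom`, `orbitals_per_atom`) are made-up illustrative values, not numbers from the talk; only the growth rate with the number of atoms matters.

```python
def orthogonalization_flops(n_atoms, grid_per_atom=100, orbitals_per_atom=2):
    """Leading-order flop estimate for orthogonalizing the KS orbitals.

    grid_per_atom and orbitals_per_atom are hypothetical constants chosen
    for illustration only.
    """
    n_grid = grid_per_atom * n_atoms      # O(N) grid points
    n_orb = orbitals_per_atom * n_atoms   # O(N) KS orbitals
    # Gram-Schmidt on an n_grid x n_orb matrix costs about 2 * n_grid * n_orb**2 flops.
    return 2 * n_grid * n_orb ** 2


# Doubling the number of atoms multiplies the estimated work by 2**3 = 8.
ratio = orthogonalization_flops(2000) / orthogonalization_flops(1000)
print(ratio)
```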
Evaluation: Alternatives?
• Linear scaling algorithms
  • Near-sightedness [Kohn, 1996]
  • Truncation-based algorithms: low to intermediate accuracy
  • Only applicable to insulators.
[Bowler and Miyazaki, Rep. Prog. Phys 2012] “…The second challenge is that of metallic systems: there is no clear route to linear-scaling solution for systems with low or zero gaps and extended electronic structure…”
• Difficult task:
  • Accurate and efficient
  • Uniformly applicable to metals as well as insulators.
• Trapezoid rule for periodic function gives geometric convergence
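A small numerical check of the geometric convergence claim. The integrand $1/(2+\cos x)$ is a hypothetical test function picked because its integral over one period is known in closed form ($2\pi/\sqrt{3}$); it is not the contour integrand used in the talk.

```python
import numpy as np


def trapezoid_periodic(f, n):
    """n-point trapezoid rule on [0, 2*pi) for a 2*pi-periodic function f."""
    x = 2.0 * np.pi * np.arange(n) / n
    return 2.0 * np.pi / n * np.sum(f(x))


def f(x):
    # Smooth periodic test integrand (an illustrative choice).
    return 1.0 / (2.0 + np.cos(x))


exact = 2.0 * np.pi / np.sqrt(3.0)

# Each doubling of n roughly squares the relative error until roundoff is reached.
errors = [abs(trapezoid_periodic(f, n) - exact) for n in (4, 8, 16)]
for n, err in zip((4, 8, 16), errors):
    print(n, err)
```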
Pole expansion
Numerical result
H: Tight binding model on a 2D grid
Outline
PEXSI: Pole EXpansion Selected Inversion
• Pole Expansion • Selected Inversion • How it works in practice
Selected inversion
$$\rho \approx \mathrm{diag}\sum_{i=1}^{Q}\frac{\omega_i}{H - z_i I}$$
• All the diagonal elements of an inverse matrix.
• $H$ is a sparse matrix, but $(H - z_i I)^{-1}$ is a full matrix.
• Naïve approach: $O(N^3)$.
• Need selected inversion.
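To make the structure "density = diagonal of a weighted sum of shifted inverses" concrete, here is a minimal sketch that swaps in the textbook Matsubara expansion of the Fermi-Dirac function, $f(x) = \frac{1}{2} - \frac{2}{\beta}\sum_{k\ge 0}\mathrm{Re}\,\frac{1}{x - i\omega_k}$ with $\omega_k = (2k+1)\pi/\beta$. This is not the pole expansion of the talk: it is used only because its poles and weights have a closed form, and it needs far more poles than the 40 to 80 quoted elsewhere in these slides. The 1D tight-binding chain, `mu`, `beta` and the function names are all illustrative assumptions.

```python
import numpy as np


def fermi_diag_exact(H, mu, beta):
    """Reference: diagonal of f(H - mu) by full diagonalization, O(N^3)."""
    xi, V = np.linalg.eigh(H)
    occ = 1.0 / (1.0 + np.exp(beta * (xi - mu)))
    return np.einsum("ij,j,ij->i", V, occ, V)


def fermi_diag_matsubara(H, mu, beta, n_poles):
    """Diagonal of f(H - mu) from a (slowly converging) Matsubara pole sum."""
    n = H.shape[0]
    eye = np.eye(n)
    rho = np.full(n, 0.5)
    for k in range(n_poles):
        z = mu + 1j * (2 * k + 1) * np.pi / beta  # pole location
        G = np.linalg.inv(H - z * eye)            # dense inverse, for clarity only
        rho -= (2.0 / beta) * np.real(np.diag(G))  # only diag(G) is needed
    return rho


# Hypothetical test Hamiltonian: 1D tight-binding chain with hopping -1.
n = 10
H = -(np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1))
rho_ref = fermi_diag_exact(H, mu=0.3, beta=4.0)
rho_pole = fermi_diag_matsubara(H, mu=0.3, beta=4.0, n_poles=2000)
max_diff = np.max(np.abs(rho_ref - rho_pole))
```

Each term uses only the diagonal of one shifted inverse, which is what selected inversion computes without forming the full matrix.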
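Selected inversion: basic idea
• $LDL^T$ factorization
$$A = \begin{pmatrix} A_{11} & A_{21}^T \\ A_{21} & \hat{A}_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ L_{21} & I \end{pmatrix} \begin{pmatrix} A_{11} & 0 \\ 0 & S_{22} \end{pmatrix} \begin{pmatrix} 1 & L_{21}^T \\ 0 & I \end{pmatrix}$$
$$L_{21} = A_{21}A_{11}^{-1}, \qquad S_{22} = \hat{A}_{22} - A_{21}L_{21}^T$$
• Inversion
$$A^{-1} = \begin{pmatrix} A_{11}^{-1} + L_{21}^T S_{22}^{-1} L_{21} & -L_{21}^T S_{22}^{-1} \\ -S_{22}^{-1} L_{21} & S_{22}^{-1} \end{pmatrix}$$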
Observation: If $L_{21}$ is sparse, computing $L_{21}^T S_{22}^{-1} L_{21}$ requires only the rows and columns of $S_{22}^{-1}$ corresponding to the sparsity pattern of $L_{21}$.
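The block inversion formula can be checked numerically. The sketch below assembles $A^{-1}$ from $A_{11}$, $L_{21}$ and $S_{22}^{-1}$ for a small random symmetric positive definite matrix (a hypothetical test case) and compares against a dense inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)      # symmetric positive definite test matrix

A11 = A[0, 0]                    # scalar pivot
A21 = A[1:, :1]                  # column below the pivot, shape (n-1, 1)
A22 = A[1:, 1:]

L21 = A21 / A11                  # L21 = A21 * A11^{-1}
S22 = A22 - A21 @ L21.T          # Schur complement
S22_inv = np.linalg.inv(S22)

# Assemble the inverse block by block.
A_inv = np.empty_like(A)
A_inv[0, 0] = 1.0 / A11 + (L21.T @ S22_inv @ L21)[0, 0]
A_inv[:1, 1:] = -L21.T @ S22_inv
A_inv[1:, :1] = -S22_inv @ L21
A_inv[1:, 1:] = S22_inv

max_err = np.max(np.abs(A_inv - np.linalg.inv(A)))
```

In this dense example every entry of $S_{22}^{-1}$ is touched; the saving described in the observation appears only when $L_{21}$ has few nonzeros.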
Selected inversion
• $A = LDL^T$: $A^{-1}$ restricted to the non-zero pattern of $L$ is “self-contained”. Exact method in exact arithmetic.
• For a KS Hamiltonian discretized by a local basis set, the cost of selected inversion is $O(N)$ for 1D systems, $O(N^{1.5})$ for 2D systems, and $O(N^2)$ for 3D systems.
• Combined with pole expansion: at most $O(N^2)$ scaling for solving the Kohn-Sham problem.
• The idea of selected inversion dates back to [Erisman and Tinney, 1975], [Takahashi et al, 1973]; for electronic structure [LL-Lu-Ying-Car-E, 2009]; for quantum transport [Li, Darve et al, 2008]
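A minimal sketch of selected inversion in the simplest setting, a symmetric tridiagonal matrix (a 1D nearest-neighbor model), where the non-zero pattern of $L$ is the diagonal plus the first sub-diagonal. The function name and the test matrix are illustrative assumptions; this is not the SelInv package, and it assumes no pivoting is needed.

```python
import numpy as np


def selinv_tridiagonal(a, b):
    """Diagonal and first sub-diagonal of inv(A) for symmetric tridiagonal A.

    a: main diagonal (length n); b: sub-diagonal (length n-1).
    Assumes the LDL^T factorization exists without pivoting
    (e.g. A is positive definite). Cost is O(n).
    """
    n = len(a)
    d = np.empty(n)
    l = np.empty(n - 1)
    # Forward pass: LDL^T factorization.
    d[0] = a[0]
    for i in range(n - 1):
        l[i] = b[i] / d[i]
        d[i + 1] = a[i + 1] - l[i] * b[i]
    # Backward pass: apply the block inversion formula from the last pivot up,
    # touching only entries inside the non-zero pattern of L.
    inv_diag = np.empty(n)
    inv_sub = np.empty(n - 1)
    inv_diag[-1] = 1.0 / d[-1]
    for i in range(n - 2, -1, -1):
        inv_sub[i] = -l[i] * inv_diag[i + 1]
        inv_diag[i] = 1.0 / d[i] + l[i] ** 2 * inv_diag[i + 1]
    return inv_diag, inv_sub


n = 8
a = np.full(n, 2.5)
b = np.full(n - 1, -1.0)
A = np.diag(a) + np.diag(b, -1) + np.diag(b, 1)
inv_diag, inv_sub = selinv_tridiagonal(a, b)
ref = np.linalg.inv(A)   # dense reference, for checking only
```

The backward pass never forms the full inverse, yet the selected entries it returns agree with the dense reference up to roundoff.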
SelInv: Numerical results
SelInv: a selected inversion package for general sparse symmetric matrices, written in FORTRAN. [LL-Yang-Meza-Lu-Ying-E, TOMS, 2011]
Force
$$F_I = -\mathrm{Tr}\left[\gamma \frac{\partial H}{\partial R_I}\right] + \mathrm{Tr}\left[\gamma_E \frac{\partial S}{\partial R_I}\right]$$
• Including both the Hellmann-Feynman force and the Pulay force
• Energy density matrix
$$\gamma_E = C f_E(\Xi - \mu) C^T, \qquad f_E(x - \mu) = x f(x - \mu)$$
• Pole expansion with the same shift but different weight
• The same selected elements of $(H - z_i S)^{-1}$
• Similar treatment for other physical quantities
[LL-Chen-Yang-He, JPCM, 2013, in press]
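For reference, a dense sketch of the density matrix $\gamma = C f(\Xi - \mu) C^T$ and the energy density matrix $\gamma_E$ built directly from the generalized eigenproblem $HC = SC\Xi$. This is the $O(N^3)$ route that the pole expansion avoids; the random test matrices, `mu`, `beta` and the function names are illustrative assumptions.

```python
import numpy as np


def fermi_dirac(x, beta):
    return 1.0 / (1.0 + np.exp(beta * x))


def density_matrices(H, S, mu, beta):
    """Return (gamma, gamma_E, xi, occ) for the problem H C = S C Xi."""
    # Reduce to a standard symmetric problem with the Cholesky factor S = R R^T.
    R = np.linalg.cholesky(S)
    R_inv = np.linalg.inv(R)
    xi, Y = np.linalg.eigh(R_inv @ H @ R_inv.T)
    C = R_inv.T @ Y                     # normalized so that C^T S C = I
    occ = fermi_dirac(xi - mu, beta)
    gamma = (C * occ) @ C.T             # C f(Xi - mu) C^T
    gamma_E = (C * (xi * occ)) @ C.T    # C f_E(Xi - mu) C^T, f_E(x - mu) = x f(x - mu)
    return gamma, gamma_E, xi, occ


# Hypothetical small test problem with a non-orthogonal basis.
rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
H = 0.5 * (M + M.T)
P = rng.standard_normal((n, n))
S = np.eye(n) + 0.05 * (P @ P.T)

gamma, gamma_E, xi, occ = density_matrices(H, S, mu=0.0, beta=10.0)
n_electrons = np.trace(gamma @ S)       # equals sum(occ)
band_energy = np.trace(gamma_E @ S)     # equals sum(xi * occ) and Tr(gamma H)
```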
Numerical examples with atomic orbitals
Boron Nitride Nanotube
Carbon Nanotube
Sparsity is the key
Accuracy of the pole expansion
Efficiency of the selected inversion
Carbon nanotube (metallic)
SZ: single-zeta (4 basis functions per atom)
DZP: double-zeta with polarization (13 basis functions per atom)
All on a single core, 80 poles (not parallelized) and 2 iterations for chemical potential.
Geometry optimization: BNNT
Truncated BNNT. 504 B atoms, 504 N atoms, 16 H atoms
PEXSI in parallel
• Distributed memory parallel selected inversion for general matrices (factorization is based on SuperLU_DIST); preliminary version scalable to 64 ~ 256 procs. A more efficient version is in progress (ongoing work with Mathias Jacquelin and Chao Yang).
• Pole expansion is parallelized. With 40 poles used in practice, PEXSI can scale to 256*40 ~ 10,000 procs.
• C++ implementation. Nearly black-box interface, being integrated into SIESTA (ongoing work with Alberto Garcia, Georg Huhs and Chao Yang).
C-BN-C layered system, weak scaling for more than 10,000 atoms. All examples use 40*256 = 10240 procs on Hopper.
ScaLAPACK performance: 230 sec for 2532 atoms using 768 processors; it does not scale beyond that.
Conclusion
• Pole Expansion and Selected Inversion (PEXSI) method for KSDFT at large scale.
• Based on the sparsity of the Hamiltonian and overlap matrices. Requires a local basis set with a small number of basis functions per atom (such as NAO and GTO; not applicable to PW).
• Accurate calculation of density, total energy, free energy and force (no truncation) for insulating and metallic systems.
• $O(N)$ for quasi-1D systems, $O(N^{1.5})$ for quasi-2D systems, and $O(N^2)$ for 3D bulk systems.
• Black-box: suitable for all codes using localized basis sets such as atomic orbitals.