Hierarchic Data Structures for Sparse Matrix Representation in Large-scale DFT/HF Calculations
Paweł Sałek, Theoretical Chemistry, KTH, Stockholm
16-18 October 2007, Linköping
Block size determined by the architecture performance, not chemistry.
Low-overhead random element access (see the sketch below).
Blocked algorithms easy to express:
1. Matrix multiplication, also by transposed matrices.
2. Use of matrix symmetry.
3. INverse CHolesky factorisation (INCH).
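A minimal sketch of how such a blocked hierarchy could look (an illustrative quad-tree layout, not the actual HML classes; the leaf block size, the names and the "null pointer means zero block" convention are assumptions):

    // Illustrative sketch only: a quad-tree of fixed-size dense leaf blocks.
    // The block size is chosen for cache/BLAS efficiency, not from chemistry.
    #include <array>
    #include <memory>

    constexpr int BS = 32;                      // assumed leaf block size

    struct Leaf {                               // dense BS x BS block
        std::array<double, BS * BS> a{};        // column-major storage
        double& at(int i, int j) { return a[i + BS * j]; }
    };

    struct Node {                               // one level of the hierarchy
        // 2 x 2 grid of children; a null pointer stands for an all-zero block.
        std::array<std::unique_ptr<Node>, 4> child;
        std::unique_ptr<Leaf> leaf;             // set only at the lowest level
    };

    // Random element access descends one branch per level, so its cost grows
    // only with the (small) depth of the hierarchy: low-overhead access.
    double get(const Node& n, int i, int j, int half) {
        if (n.leaf) return n.leaf->at(i, j);
        int q = (i >= half ? 2 : 0) + (j >= half ? 1 : 0);
        if (!n.child[q]) return 0.0;            // absent block: element is zero
        return get(*n.child[q], i % half, j % half, half / 2);
    }

Blocked algorithms then operate block-by-block on this tree, which is what makes the operations listed above easy to express.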
Block Size Tradeoff
Smaller block size → more opportunity for screening.
Lowest-level (block) multiplication expressed in terms of BLAS calls.
Template expansion will generate (instantiate) code for all the remaining hierarchy levels (see the sketch below).
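A hedged sketch of the template-expansion idea: the leaf specialization does its multiply with one BLAS call, every higher hierarchy level is the same generic code instantiated recursively, and screened-out (null) blocks are skipped for free. The class names, the 2x2 block layout and the CBLAS binding are assumptions for illustration, not the HML interface:

    #include <cblas.h>        // assumes a CBLAS-style interface (MKL, OpenBLAS, ...)
    #include <memory>

    struct Leaf {
        static constexpr int BS = 32;           // assumed leaf block size
        double a[BS * BS];                      // dense column-major block
    };

    // Leaf level: C += A * B done by a single dgemm call.
    inline void multiplyAdd(const Leaf& A, const Leaf& B, Leaf& C) {
        cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                    Leaf::BS, Leaf::BS, Leaf::BS,
                    1.0, A.a, Leaf::BS, B.a, Leaf::BS,
                    1.0, C.a, Leaf::BS);
    }

    // Generic level: a 2 x 2 grid of sub-blocks of type T, where T is either
    // another HierMat level or the Leaf at the bottom of the hierarchy.
    template <class T>
    struct HierMat {
        std::unique_ptr<T> sub[2][2];           // nullptr == all-zero block
    };

    // The same code instantiated once per remaining hierarchy level.
    // Screening: multiplications involving absent sub-blocks are skipped.
    template <class T>
    void multiplyAdd(const HierMat<T>& A, const HierMat<T>& B, HierMat<T>& C) {
        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j)
                for (int k = 0; k < 2; ++k) {
                    if (!A.sub[i][k] || !B.sub[k][j]) continue;      // zero block
                    if (!C.sub[i][j]) C.sub[i][j].reset(new T());    // allocate on demand
                    multiplyAdd(*A.sub[i][k], *B.sub[k][j], *C.sub[i][j]);
                }
    }

With this layout a three-level matrix is just HierMat<HierMat<Leaf>>, and the compiler instantiates multiplyAdd once per level; transposed variants and use of symmetry can be added as further overloads in the same style.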
Intel MKL vs HML Benchmark
HML design allows for easy implementation of symmetric matrix multiplication (sysq: S = αT² + βS) as needed by TC2: sysq is roughly twice as fast as general sparse multiplications (see the sketch after the figure).
[Figure: Matrix multiplication benchmark on water clusters/3-21G. Time (seconds, 0 to 50) vs. matrix size (0 to 15000) for dgemm (MKL), dgemm (HML, τ = 10⁻⁶) and dsysq (HML, τ = 10⁻⁶).]
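For context, a hedged sketch of where the symmetric square pays off: TC2-style purification repeatedly squares a symmetric matrix, so a sysq-like kernel sits in the hot loop, and exploiting the symmetry of the result is what buys the factor of two. Everything below (the dense Mat stand-in, the purify driver, the fixed iteration count) is an illustrative assumption; only the role of the symmetric square comes from the slide:

    #include <vector>

    // Dense stand-in for the hierarchic matrix, only to keep the sketch
    // self-contained; in HML the square() below would be the sparse sysq kernel.
    struct Mat {
        int n;
        std::vector<double> a;                   // column-major n x n
        explicit Mat(int n_) : n(n_), a(n_ * n_, 0.0) {}
        double& operator()(int i, int j)       { return a[i + n * j]; }
        double  operator()(int i, int j) const { return a[i + n * j]; }
    };

    double trace(const Mat& x) {
        double t = 0.0;
        for (int i = 0; i < x.n; ++i) t += x(i, i);
        return t;
    }

    // S = T * T for symmetric T; since the result is symmetric too, a real
    // sysq needs only about half of the block products of a general multiply.
    Mat square(const Mat& t) {
        Mat s(t.n);
        for (int j = 0; j < t.n; ++j)
            for (int k = 0; k < t.n; ++k)
                for (int i = 0; i < t.n; ++i)
                    s(i, j) += t(i, k) * t(k, j);
        return s;
    }

    // TC2-style purification sweep (schematic): starting from a symmetric X
    // with eigenvalues in [0, 1], apply X <- X^2 or X <- 2X - X^2 so that
    // trace(X) is driven towards n_occ, the number of occupied orbitals.
    void purify(Mat& x, double n_occ, int max_iter) {
        for (int it = 0; it < max_iter; ++it) {
            Mat x2 = square(x);                  // the sysq call in HML
            const bool shrink = trace(x) > n_occ;
            for (int p = 0; p < x.n * x.n; ++p)
                x.a[p] = shrink ? x2.a[p] : 2.0 * x.a[p] - x2.a[p];
        }
    }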
OpenMP Parallelization
Parallel programs necessary to efficiently use modern multi-core hardware.
OpenMP less invasive and easier to load-balance.
Problems with scaling and... compiler support.
Poor compiler support! GNU gcc is the only reliable, OpenMP-enabled compiler known to us so far.
Details of OpenMP Parallelization
Pick a level in the hierarchy, run a parallel loop with dynamic scheduling over it (see the sketch below).
Approach trivial to implement.
Higher levels: coarse load distribution.
Lower levels: thread startup overhead.
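A minimal sketch of this scheme under illustrative assumptions (the task list gathered from the chosen level, the Block layout and the function names are not the HML API): one OpenMP loop with dynamic scheduling hands whole sub-multiplications to threads as they become free.

    #include <utility>
    #include <vector>

    struct Block {                              // dense block at the chosen level
        static constexpr int BS = 32;
        double a[BS * BS] = {};                 // column-major storage
    };

    // Serial kernel: c += a * b on dense blocks (plain loops here; the real
    // code would descend further and eventually call BLAS).
    void multiplyAddBlock(const Block& a, const Block& b, Block& c) {
        for (int j = 0; j < Block::BS; ++j)
            for (int k = 0; k < Block::BS; ++k)
                for (int i = 0; i < Block::BS; ++i)
                    c.a[i + Block::BS * j] +=
                        a.a[i + Block::BS * k] * b.a[k + Block::BS * j];
    }

    // All contributions to one output block; each task owns a distinct C block,
    // so tasks are independent and need no locking.
    struct BlockTask {
        Block* c;                                                   // output C_ij
        std::vector<std::pair<const Block*, const Block*>> terms;   // (A_ik, B_kj)
    };

    // schedule(dynamic): threads grab tasks as they finish, which balances the
    // load when blocks differ strongly in sparsity (and hence in cost).
    void multiplyLevel(std::vector<BlockTask>& tasks) {
        #pragma omp parallel for schedule(dynamic)
        for (int t = 0; t < static_cast<int>(tasks.size()); ++t)
            for (const auto& ab : tasks[t].terms)
                multiplyAddBlock(*ab.first, *ab.second, *tasks[t].c);
    }

Picking a higher level of the hierarchy gives fewer, larger tasks (coarser load distribution); picking a lower one gives many small tasks and more scheduling and thread overhead, which is exactly the tradeoff listed above.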
Exceptions and OpenMP
OpenMP and C++ exceptions do interact.
Threads must catch any exceptions that are generated; the behavior is undefined otherwise.
We do the right thing (in case you ask).
#pragma omp parallel for
for (int i = 0; i < MAX; i++) {
    try {
        // Heavy lifting here
    } catch (...) {
        /* Handle it nicely. */
    }
}
Compiler Problems
GNU C++: OpenMP support since 4.1(?). No problems found. Sequential performance lower than its competitors.
Portland C++: fairly warns that it cannot handle exceptions and OpenMP at the same time. An honest warning, but...
Intel C++: 3 versions tried. All of them had bugs either in sequential code or in OpenMP parallelization.
    8.1 fails to generate correct sequential code; miscompiles OpenMP code as well.
    9.1 works sequentially; the compiler crashes when invoked with the -openmp flag.
    10.0 fails to generate correct sequential code.
    Support tickets with Intel are open.
OpenMP Speedup
Timings taken on a 1.5 GHz Itanium 2, 4 CPUs (luc2, PDC), 4 threads.
Glycine-alanine chain with 1600+ atoms, HF method. GNU C++.