MLD2P4: a package of parallel algebraic multilevel preconditioners
Pasqua D’Ambra, Institute for High-Performance Computing and Networking (ICAR-CNR), Naples Branch, Italy
Bologna, March 2008
Joint work with Daniela di Serafino, Second University of Naples, and Salvatore Filippone, University of Rome “Tor-Vergata”
Pasqua D'Ambra - Bologna March 2008
Overview
Motivations: background and objectives
MLD2P4 (Multi-Level Domain Decomposition Parallel Preconditioners Package based on PSBLAS): algorithms and computational kernels; software architecture
Some results and applications
Background
Large-scale applications have to solve $Ax = b$, where the linear system matrix $A$ is:
real or complex, and square;
large and sparse;
distributed among parallel processors.
Matrix dimensions and entries, conditioning, sparsity pattern and coupling among variables vary along the simulation.
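For readers less familiar with sparse storage, the following is a minimal Python sketch (not PSBLAS code, which is written in Fortran) of how a large sparse matrix can be held in compressed sparse row (CSR) form: entries are assembled from (row, column, value) triplets, duplicate contributions are summed, and the matrix-vector product touches only the stored nonzeros. The function names and the 1D Poisson test matrix are illustrative assumptions.

```python
def coo_to_csr(n, triplets):
    """Assemble an n x n sparse matrix from (row, col, value) triplets into CSR.

    Duplicate (row, col) entries are summed, as happens when several
    contributions to the same matrix entry are inserted.
    """
    rows = [dict() for _ in range(n)]
    for i, j, v in triplets:
        rows[i][j] = rows[i].get(j, 0.0) + v
    row_ptr, col_idx, values = [0], [], []
    for i in range(n):
        for j in sorted(rows[i]):  # column indices sorted within each row
            col_idx.append(j)
            values.append(rows[i][j])
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


def csr_matvec(row_ptr, col_idx, values, x):
    """Return y = A x using only the stored nonzeros of A."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def poisson_1d_triplets(n):
    """Triplets of the tridiagonal 1D Poisson matrix tridiag(-1, 2, -1)."""
    trips = []
    for i in range(n):
        trips.append((i, i, 2.0))
        if i > 0:
            trips.append((i, i - 1, -1.0))
        if i < n - 1:
            trips.append((i, i + 1, -1.0))
    return trips


if __name__ == "__main__":
    A = coo_to_csr(5, poisson_1d_triplets(5))
    print(csr_matvec(*A, [1.0] * 5))  # [1.0, 0.0, 0.0, 0.0, 1.0]
```

Storage grows with the number of nonzeros (here 3n - 2) rather than with n squared, which is what makes very large systems tractable.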
Background (cont’d)
What is the best method/preconditioner? There is no absolute winner: experimentation is needed.
Reliable preconditioners require access to the complete matrix.
Parallel implementation is not trivial.
Interfacing with application software is required: custom-made interfaces to parallel legacy codes; different interfaces for different preconditioners/solvers.
Objectives
Designing and implementing a suite of algebraic preconditioners based on linear algebra kernels for parallel sparse matrix computations, aiming at:
flexibility: different preconditioners through a single API;
portability and efficiency: standard base software for serial kernels and data communications;
simplicity of usage: modern (OO) Fortran 95 features and auxiliary routines for smooth integration with legacy codes.
MLD2P4: Multi-Level Domain Decomposition Parallel Preconditioners Package based on PSBLAS
Available preconditioners:
diagonal;
block-Jacobi;
additive Schwarz with arbitrary overlap;
algebraic multi-level Schwarz.
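The idea of offering several preconditioners behind one interface can be sketched in Python as a build step (set up from the matrix A) and an apply step (compute z = M^{-1} r). The names DiagPrec, BlockJacobiPrec and make_prec are hypothetical and do not reproduce the MLD2P4 Fortran API. The blocks are contiguous row blocks solved exactly by dense Gaussian elimination, a stand-in for the ILU or exact sparse factorizations used in practice.

```python
def solve_dense(M, b):
    """Gaussian elimination with partial pivoting; adequate for tiny dense blocks."""
    n = len(b)
    a = [row[:] + [b[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n + 1):
                a[r][c] -= f * a[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (a[k][n] - sum(a[k][c] * x[c] for c in range(k + 1, n))) / a[k][k]
    return x


class DiagPrec:
    """Diagonal (point-Jacobi) preconditioner: z_i = r_i / a_ii."""

    def build(self, A):
        self.d = [A[i][i] for i in range(len(A))]
        return self

    def apply(self, r):
        return [ri / di for ri, di in zip(r, self.d)]


class BlockJacobiPrec:
    """Block-Jacobi: exact solves with the diagonal blocks of a row-block partition."""

    def __init__(self, nblocks):
        self.nblocks = nblocks

    def build(self, A):
        n = len(A)
        edges = [n * p // self.nblocks for p in range(self.nblocks + 1)]
        self.blocks = [(lo, hi, [row[lo:hi] for row in A[lo:hi]])
                       for lo, hi in zip(edges[:-1], edges[1:])]
        return self

    def apply(self, r):
        z = [0.0] * len(r)
        for lo, hi, Ablk in self.blocks:
            z[lo:hi] = solve_dense(Ablk, r[lo:hi])  # blocks are independent
        return z


def make_prec(kind, **opts):
    """Hypothetical single entry point: the preconditioner is selected by name."""
    if kind == "diag":
        return DiagPrec()
    if kind == "bjac":
        return BlockJacobiPrec(opts.get("nblocks", 2))
    raise ValueError(f"unknown preconditioner: {kind}")


if __name__ == "__main__":
    A = [[4.0, -1.0, 0.0, 0.0], [-1.0, 4.0, -1.0, 0.0],
         [0.0, -1.0, 4.0, -1.0], [0.0, 0.0, -1.0, 4.0]]
    r = [1.0, 2.0, 3.0, 4.0]
    for kind in ("diag", "bjac"):
        print(kind, make_prec(kind, nblocks=2).build(A).apply(r))
```

Because the caller only sees build and apply, switching preconditioner is a change of one string, and each block of the block-Jacobi solve could be handled by a different process.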
PSBLAS: Parallel Sparse Basic Linear Algebra Subprograms
PSBLAS (Filippone et al., http://www.ce.uniroma2.it/psblas/): basic linear algebra operations with sparse matrices on MIMD architectures
Software layers shown in the diagram:
Application layer: iterative sparse linear solvers (CG, BiCG, CGS, BiCGSTAB, RGMRES,…)
Kernels: parallel sparse matrix operations (matrix-matrix products, matrix-vector products, …) and parallel sparse matrix management (allocate, build, update, …)
Base software: MPI; BLACS (Basic Linear Algebra Communication Subprograms); SBLAS (Duff et al.)
Languages indicated in the diagram: F95, F77
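As an illustration of the solver layer, the following Python sketch implements preconditioned conjugate gradient (CG in the list above) against two callbacks: a matrix-vector product and a preconditioner application. It is a generic textbook PCG, not the PSBLAS implementation; the function names, the relative-residual stopping rule and the 1D Poisson operator are assumptions made for the example.

```python
import math


def pcg(matvec, b, prec, x0=None, tol=1e-6, maxit=1000):
    """Preconditioned conjugate gradient for symmetric positive-definite A.

    matvec(v) must return A v and prec(r) an approximation of A^{-1} r given by
    a symmetric positive-definite preconditioner.  Iterates until
    ||r_k||_2 / ||r_0||_2 < tol or maxit iterations; returns (x, k, rel_res).
    """
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    r = [bi - ai for bi, ai in zip(b, matvec(x))]
    r0 = math.sqrt(sum(v * v for v in r))
    if r0 == 0.0:
        return x, 0, 0.0
    z = prec(r)
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    rel = 1.0
    for k in range(1, maxit + 1):
        q = matvec(p)
        alpha = rz / sum(pi * qi for pi, qi in zip(p, q))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rel = math.sqrt(sum(v * v for v in r)) / r0
        if rel < tol:
            return x, k, rel
        z = prec(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]  # beta = rz_new / rz
        rz = rz_new
    return x, maxit, rel


def poisson_matvec(x):
    """y = A x for the 1D Poisson matrix tridiag(-1, 2, -1), stored implicitly."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


if __name__ == "__main__":
    # Unit right-hand side, null starting guess, diagonal (Jacobi) preconditioner.
    b = [1.0] * 20
    x, its, rel = pcg(poisson_matvec, b, lambda r: [ri / 2.0 for ri in r])
    print(its, rel)
```

Keeping the preconditioner behind a callback is what lets one Krylov solver be paired with any of the preconditioners of the package.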
MLD2P4 design: algorithms
Algebraic multi-level Schwarz preconditioners based on smoothed aggregation:
good trade-off between parallelism and convergence;
optimal scalability for symmetric positive-definite matrices;
the algebraic framework allows general-purpose application.
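One level of smoothed aggregation can be sketched in a few lines: group unknowns into aggregates, build a piecewise-constant tentative prolongator, smooth it with one damped-Jacobi step, and form the coarse matrix by the Galerkin product P^T A P. This Python sketch uses dense lists on a tiny 1D Poisson matrix and several assumptions: aggregates are fixed-size groups of consecutive unknowns (a simplification of the graph-based aggregation used in practice), and the damping is omega = 2/3, roughly the common choice 4/(3 rho(D^{-1}A)) for this matrix, whose rho(D^{-1}A) is close to 2. All names are hypothetical.

```python
def matmul(X, Y):
    """Dense product X Y on lists of lists; fine for a tiny illustration."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def poisson_1d(n):
    """Dense 1D Poisson matrix tridiag(-1, 2, -1)."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
            for i in range(n)]


def aggregate_consecutive(n, size):
    """Map each unknown to an aggregate of `size` consecutive unknowns (simplified)."""
    return [i // size for i in range(n)]


def smoothed_aggregation_level(A, agg, omega):
    """Return (P, Ac): smoothed prolongator and Galerkin coarse matrix P^T A P."""
    n = len(A)
    nc = max(agg) + 1
    # Tentative prolongator: column j is the indicator vector of aggregate j.
    Pt = [[1.0 if agg[i] == j else 0.0 for j in range(nc)] for i in range(n)]
    # Damped-Jacobi smoother S = I - omega D^{-1} A, applied to Pt.
    S = [[(1.0 if i == j else 0.0) - omega * A[i][j] / A[i][i] for j in range(n)]
         for i in range(n)]
    P = matmul(S, Pt)
    Ac = matmul(transpose(P), matmul(A, P))
    return P, Ac


if __name__ == "__main__":
    A = poisson_1d(9)
    P, Ac = smoothed_aggregation_level(A, aggregate_consecutive(9, 3), 2.0 / 3.0)
    for row in Ac:
        print([round(v, 4) for v in row])
```

The coarse matrix is three times smaller, stays symmetric positive definite, and is obtained from A alone, which is what makes the method algebraic; repeating the step on Ac gives further levels.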
One-level Schwarz: basic ingredients
$A$ ($n \times n$) with symmetric sparsity pattern; adjacency graph of $A$
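The adjacency graph is what turns a disjoint partition of the unknowns into overlapping subdomains. A minimal Python sketch, assuming the common construction in which each subdomain is grown by delta layers of graph neighbours (delta = 0 gives block-Jacobi, delta > 0 gives additive Schwarz with overlap delta); the function names are hypothetical.

```python
def adjacency_graph(pattern):
    """Adjacency graph of A: vertex i is linked to j when a_ij != 0 and i != j.

    `pattern` is a list of (i, j) index pairs of nonzeros; the pattern is
    assumed symmetric, so the resulting graph is undirected.
    """
    graph = {}
    for i, j in pattern:
        graph.setdefault(i, set())
        graph.setdefault(j, set())
        if i != j:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def overlapping_subdomains(graph, parts, delta):
    """Grow each disjoint subdomain by `delta` layers of graph neighbours."""
    result = []
    for part in parts:
        dom = set(part)
        frontier = set(part)
        for _ in range(delta):
            # Vertices adjacent to the last layer that are not yet in the subdomain.
            frontier = {j for i in frontier for j in graph[i]} - dom
            dom |= frontier
        result.append(sorted(dom))
    return result


if __name__ == "__main__":
    n = 8  # tridiagonal pattern: a chain of 8 vertices
    pattern = [(i, j) for i in range(n) for j in (i - 1, i, i + 1) if 0 <= j < n]
    g = adjacency_graph(pattern)
    parts = [list(range(0, 4)), list(range(4, 8))]
    print(overlapping_subdomains(g, parts, 1))
```

Only the sparsity pattern is used, so the same construction applies to any matrix with a symmetric pattern, independently of the underlying PDE or mesh.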
P. D’Ambra, D. di Serafino, S. Filippone, On the Development of PSBLAS-based Parallel Two-level Schwarz Preconditioners, Applied Numerical Mathematics, 57, 2007.
2-level hybrid Schwarz preconditioner, with RAS (restricted additive Schwarz)/ILU(0) as the 1-level preconditioner
Distributed coarsest matrix, solved by 4 sweeps of block Jacobi with ILU(0) (2LDI) or with UMFPACK (2LDU) on the diagonal blocks
3-level hybrid Schwarz preconditioner, with RAS/ILU(0) as the 1-level preconditioner
Distributed coarsest matrix, solved by 4 sweeps of block Jacobi with ILU(0) (3LDI) or with UMFPACK (3LDU) on the diagonal blocks
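"Hybrid" Schwarz preconditioners are commonly understood as additive within a level and multiplicative between levels. A minimal two-level Python sketch, assuming the coarse correction is applied first and the one-level preconditioner afterwards on the updated residual (post-smoothing; the slides do not state which ordering is used). Stand-ins: exact block-Jacobi solves replace RAS/ILU(0), an exact dense coarse solve replaces the 4 block-Jacobi sweeps, and a piecewise-constant (unsmoothed) prolongator P is used. All function names are hypothetical.

```python
def solve_dense(M, b):
    """Gaussian elimination with partial pivoting; adequate for tiny dense systems."""
    n = len(b)
    a = [row[:] + [b[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n + 1):
                a[r][c] -= f * a[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (a[k][n] - sum(a[k][c] * x[c] for c in range(k + 1, n))) / a[k][k]
    return x


def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def block_jacobi(A, nblocks):
    """One-level stand-in: exact solves on contiguous diagonal blocks of A."""
    n = len(A)
    edges = [n * p // nblocks for p in range(nblocks + 1)]

    def apply(r):
        z = [0.0] * n
        for lo, hi in zip(edges[:-1], edges[1:]):
            z[lo:hi] = solve_dense([row[lo:hi] for row in A[lo:hi]], r[lo:hi])
        return z
    return apply


def two_level_hybrid(A, P, smoother):
    """Return r -> z, multiplicative between levels.

    Coarse correction y = P Ac^{-1} P^T r with Ac = P^T A P, then one
    application of the one-level preconditioner to the residual r - A y.
    """
    n, nc = len(P), len(P[0])
    AP = [matvec(A, [P[i][j] for i in range(n)]) for j in range(nc)]  # columns of A P
    Ac = [[sum(P[i][k] * AP[j][i] for i in range(n)) for j in range(nc)]
          for k in range(nc)]

    def apply(r):
        rc = [sum(P[i][k] * r[i] for i in range(n)) for k in range(nc)]  # P^T r
        yc = solve_dense(Ac, rc)
        y = [sum(P[i][k] * yc[k] for k in range(nc)) for i in range(n)]  # P yc
        Ay = matvec(A, y)
        s = smoother([ri - ai for ri, ai in zip(r, Ay)])
        return [yi + si for yi, si in zip(y, s)]
    return apply


if __name__ == "__main__":
    n = 8
    A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
         for i in range(n)]
    P = [[1.0 if i // 2 == k else 0.0 for k in range(4)] for i in range(n)]
    M = two_level_hybrid(A, P, block_jacobi(A, 2))
    print(M([1.0] * n))
```

A third level, as in the 3LDI/3LDU configurations, would replace the exact coarse solve with a recursive call of the same construction on Ac.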
Stopping criterion: $\|r_k\| / \|r_0\| < 10^{-6}$ or maxit iterations
Unit right-hand side and null starting guess
Row-block distribution of matrices: number of submatrices = number of processors
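The experimental setup can be made concrete with two small helpers. This Python sketch assumes a balanced contiguous split of the rows (the slides only state that there is one submatrix per processor) and the 2-norm in the relative-residual test; both function names are hypothetical. With a unit right-hand side and a null starting guess, r_0 = b.

```python
import math


def row_block_ranges(n, nprocs):
    """Split rows 0..n-1 into nprocs contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n, nprocs)
    ranges, start = [], 0
    for p in range(nprocs):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))  # half-open range [start, start + size)
        start += size
    return ranges


def converged(r_k, r_0, tol=1e-6):
    """Relative residual test ||r_k||_2 / ||r_0||_2 < tol."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))
    return norm(r_k) < tol * norm(r_0)


if __name__ == "__main__":
    # thm1 has n = 600000 rows; on 64 processors each block gets 9375 rows.
    print(row_block_ranges(600000, 64)[:2])
    print(converged([1e-7] * 4, [1.0] * 4))
```

Writing the test as norm(r_k) < tol * norm(r_0) avoids a division and behaves sensibly when the residual is exactly zero.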
thm matrices: number of iterations
Test matrix thm1: n = 600000, nnz = 2996800
Platform: 64 Intel Itanium dual-processor nodes connected by QSNetII
np = number of processors; OV = overlap
OV=0
np   RAS   2LDI   2LDU   3LDI   3LDU
 1   613    190      -     70      -
 2   705    184      -     72      -
 4   761    206      -     74      -
 8   688    202     44     67     28
16   748    211     61     70     36
32   766    186     81     69     51
64   809    196    113     86     68
OV=1
np   RAS   2LDI   2LDU   3LDI   3LDU
 1   613    190      -     70      -
 2   923    183      -     76      -
 4   684    178      -     63      -
 8   937    191     34     62     27
16   688    172     57     68     33
32   714    181     74     65     45
64   720    180    107     77     62
thm matrices: execution times and speed-ups (OV=1; best execution times: 3LDU)
64 Intel Itanium dual-processor nodes connected by QSNetII
Application test case
Large eddy simulation (LES) of incompressible turbulent flows in a bi-periodic channel
Main computational kernel: nonsymmetric and singular linear systems arising from an elliptic PDE with Neumann boundary conditions
A. Aprovitola, P. D’Ambra, F. M. Denaro, D. di Serafino, S. Filippone, Application of Parallel Algebraic Multilevel Domain Decomposition Preconditioners in Large-Eddy Simulations of Wall-bounded Turbulent Flows: First Experiments, RT-ICAR-NA-2007-02, July 2007.
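Why Neumann boundary conditions make the systems singular can be seen on a symmetric 1D model: every row of the matrix sums to zero, so the constant vector lies in the null space, and a solution exists only if the right-hand side is orthogonal to the null space of A^T. This Python sketch uses a cell-centred 1D Laplacian (an assumption, not the LES discretization) and removes the mean of b to make it compatible. In the nonsymmetric case of the application, the left null vector generally need not be constant, so this simple projection would have to be adapted. The function names are hypothetical.

```python
def neumann_1d(n):
    """Cell-centred 1D Laplacian with homogeneous Neumann conditions (symmetric model)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i > 0:
            A[i][i - 1] = -1.0
            A[i][i] += 1.0
        if i < n - 1:
            A[i][i + 1] = -1.0
            A[i][i] += 1.0
    return A


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def make_compatible(b):
    """Remove the component of b along the constant vector (the null space of A^T here)."""
    mean = sum(b) / len(b)
    return [bi - mean for bi in b]


if __name__ == "__main__":
    A = neumann_1d(6)
    print(matvec(A, [1.0] * 6))  # constants are mapped to zero: A is singular
    print(sum(make_compatible([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])))
```

Any solution is determined only up to an additive constant, which Krylov solvers can tolerate as long as the right-hand side is compatible.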