Jorge Castellanos - [email protected]. CLCAR 2009, Mérida, Venezuela, Septiembre 24/2009 1 A Cholesky Out-of-Core factorization Jorge Castellanos Germán Larrazábal Centro Multidisciplinario de Visualización y Cómputo Científico (CEMVICC) Facultad Experimental de Ciencias y Tecnología (FACYT) Universidad de Carabobo Valencia - Venezuela
27
Embed
Jorge Castellanos - [email protected]. CLCAR 2009, Mérida, Venezuela, Septiembre 24/[email protected] 1 A Cholesky Out-of-Core factorization Jorge Castellanos.
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Out-of-core - motivationThe computational kernel that consumes the more of CPU time in a engineer numerical simulation package is the linear solver
The linear solver is mainly based on a classic solving method, i.e., direct or iterative method
The problem matrix associated to the equations system is sparse and have a big size
Iterative methods are highly parallel (based on matrix-vector product) but they are not general and they need preconditioning
Direct methods (Cholesky, LDLT or LU) are more general (inputs: matrix A and rhs vector b), but they main advantage is the amount of memory needed when the problem size A increases
Algorithms which are designed to have efficient performance when their data structure is stored in disk are called “out-of-core algorithms” (Toledo, 1999).
Out-of core applications handle very large data sets to store in conventional internal memory
There are a group of applications (parallel out-of-core) with data sets whose address space exceeds the capacity of the virtual memory
The out-of-core concept permits users to solve efficiently large problems using inexpensive computers
Storage in disk is cheaper than storage in main memory (DRAM) and its actual cost rate is 1 to 100 (dec, 2007)
An out-of-core algorithm running on a machine with a limited memory can give a better cost/performance ratio than in-core algorithms running on a machine with enough memory
In 1984, J. Reid creates the TREESOLV, written in fortran to solve big linear equation sytems based on the multifrontal algorithm
In 1999, E. Rothberg and R. Shreiber, review the implementation of 3 out-of-core methods for the Cholesky factorization. Each is based on a partitioning of the matrix into panels
En 2009, J. Reid and J. Scott present a Cholesky Out-of-core solver written in fortran 95, whose operation is based on a virtual memory package that provides the facilities to read/write hard disk files
To support the nesting of ForOOCMatrix_Row macros, the cache used in earlier versions was modified to have multiple ways
To mitigate the effect of misses increasing caused by the nested macro, a Reference Prediction Table (RPT) based on the proposal of Chen & Baer (1994) was incorporated
To ensure that the input/output cache data worked in parallel with computation to hide the latency caused by misses and pre-fetches an Outstanding Request List based on the scheme proposed by D. Kroft (1987) was implemented
The out-of-core kernel supports efficiently the Cholesky sparse matrix factorization because it shows big saves in memory use with overheads less than the 148% in CPU time
For a better performance of the out-of-core support, we believe it is important to have a high efficiency prefetch algorithm, as in this work, which reduces the adverse effect of cache misses
The prefetch algorithm must operate in parallel with the computation to take advantage of multicore technology present in most of the modern personal computers
Future workTo incorporate improvements in the parallelization of the prefetch algorithm to reduce the penalty in execution time
To study the effect of the block size of the cache and the total size of the cache (number of blocks) in order to define heuristics for automatic selection of these parameters
To extend the use of the out-of-core layer to other UCSparseLib library functions, including direct methods: LU and LDLT
To implement the out-of-core support for other methods of solving sparse linear systems such as iterative methods and algebraic multigrid
ReferencesJ. Castellanos and G. Larrazábal, Implementación out-of-core para producto matriz-vector y transpuesta de matrices dispersas, Conferencia Latinoamericana de Computación de Alto Rendimiento, Santa Marta, Colombia. Pags. 250--256. ISBN: 978--958--708--299--9, 2007.
J. Castellanos and G. Larrazábal, Soporte out-of-core para operaciones básicas con matrices dispersas, En Desarrollo y avances en métodos numéricos para ingeniería y ciencias aplicadas, Sociedad Venezolana de Métodos Numéricos en Ingeniería, Caracas, Venezuela. ISBN: 978--980--7161--00--8, 2008.
T.F. Chen and J.L. Baer, A performance study of software and hardware data prefetching schemes, In International Symposium on Computer architecture, Proceedings of the 21st annual international symposium on Computer Architecture, Chicago Ill, USA, Pages 223-232, 1994.Z.
Bai, J. Demmel, J. Dongarra, A. Ruhe and H. van der Vorst, Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, SIAM, Philadelphia, 2000.
ReferencesN. I. M. Gould, J. A.~Scott and Y. Hu, A numerical evaluation of sparse direct solvers for the solution of large sparse symmetric linear systems of equations, ACM Trans. Math. Softw. 33,2, Article 10, 2007.
G. Karypis and V. Kumar, A Fast and Highly Quality Multilevel Scheme for Partitioning Irregular Graphs, SIAM Journal on Scientific Computing, Vol. 20, No. 1, pp. 359—392, 1999.
D. Kroft, Lockup-free instruction fetch/prefetch cache organization, In International Symposium on Computer architecture, Proceedings of the 8th annual symposium on Computer Architecture, Minneapolis, Minnesota USA, Pages 81-87, 1981.
G. Larrazábal, UCSparseLib: Una biblioteca numérica para resolver sistemas lineales dispersos, Simulación Numérica y Modelado Computacional, SVMNI, TC19--TC25, ISBN:980-6745-00-0, 2004.
ReferencesG. Larrazábal, Técnicas algebráicas de precondicionamiento para la resolución de sistemas lineales, Departamento de Arquitectura de Computadores (DAC), Universidad Politécnica de Cataluña, Barcelona, Spain. Tesis Doctoral ISBN: 84--688--1572--1, 2002.
D.A. Patterson and J.L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, Third Edition, 2005.
J. Reid, TREESOLV, a Fortran package for solving large sets of linear finite element equations, Report CSS 155. AERE Harwell, Harwell, U.K., 1984.
J. Reid and J. Scott, An out-of-core sparse Cholesky solver, ACM Transactions on Mathematical Software (TOMS), Volume 36, Issue 2, Article No. 9, 2009.
E. Rothberg and R. Schreiber, Efficient Methods for Out-of-Core Sparse Cholesky Factorization, SIAM Journal on Scientific Computing, Vol 21, Issue 1, pages: 129 - 144, 1999.
S. Toledo, A survey of out-of-core algorithms in numerical linear algebra, In External Memory Algorithms and Visualization, J. Abello and J. S. Vitter, Eds., DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 1999.