Dense Linear Algebra on Heterogeneous Platforms: State of the Art and Trends
Paolo Bientinesi
AICES, RWTH Aachen
[email protected]
ComplexHPC Spring School 2013
Heterogeneous computing - Impact on algorithms
June 7th, 2013
Uppsala University, Sweden
Dense Linear Algebra
Matrix equations: $AX + XB = C$, $A = \frac{A + A^{-1}}{2}$, ...
Linear systems: $Ax = b$, $AX = B$, least squares, ...
Eigenproblems: $Ax = \lambda x$, $AX = BX\Lambda$, SVD, ...
Support routines: factorizations, reductions, ...
Organization in layers
Other libraries: ScaLAPACK, Elemental, PETSc, ...
LAPACK: $LU = A$, $LL^T = A$, $QR = A$, $Q^T T Q = A$, ...
BLAS:
BLAS-3: $C := C + AB$, $C := L^{-1}B$, with $A, B, C, L \in \mathbb{R}^{n \times n}$
BLAS-2: $y := y + Ax$, $y := L^{-1}x$, with $A, L \in \mathbb{R}^{n \times n}$, $x, y \in \mathbb{R}^n$
BLAS-1: $y := y + \alpha x$, $\mathrm{dot} := \alpha + x^T y$, with $x, y \in \mathbb{R}^n$, $\alpha \in \mathbb{R}$
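To make the three BLAS levels concrete, the sketch below writes one representative kernel per level as plain Python loops. The names `axpy`, `gemv`, and `gemm` borrow the standard BLAS routine names, but the signatures are simplified assumptions (lists of rows, no strides, no transposition flags, no scaling of the output), so this is an illustration of the operations rather than the BLAS interface itself.

```python
def axpy(alpha, x, y):
    """BLAS-1: y := y + alpha * x, for vectors x and y of equal length."""
    return [yi + alpha * xi for xi, yi in zip(x, y)]


def gemv(A, x, y):
    """BLAS-2: y := y + A x, with A stored as a list of rows."""
    return [yi + sum(aij * xj for aij, xj in zip(row, x))
            for row, yi in zip(A, y)]


def gemm(A, B, C):
    """BLAS-3: C := C + A B, with all matrices stored as lists of rows."""
    n, k, m = len(A), len(B), len(B[0])
    return [[C[i][j] + sum(A[i][p] * B[p][j] for p in range(k))
             for j in range(m)]
            for i in range(n)]


# Small hypothetical inputs, chosen only for illustration.
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [1.0, 0.0]]
C = [[1.0, 1.0], [1.0, 1.0]]
x = [1.0, 1.0]
y = [0.0, 0.0]

print(axpy(2.0, x, y))   # [2.0, 2.0]
print(gemv(A, x, y))     # [3.0, 7.0]
print(gemm(A, B, C))     # [[3.0, 2.0], [5.0, 4.0]]
```

The loop nesting depth (one, two, three) mirrors the level number; in practice these operations would be delegated to an optimized BLAS library.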
Example: $AX = B$ (full $A$)
[Diagram: the linear system is decomposed into the kernels below]
Linear system: $AX = B$
LU factorization: $LU = A$
Triangular system: $LX = B$ (appears twice in the diagram)
Gemm: $C = AB + C$ (appears three times in the diagram)
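The decomposition above can be sketched in a few lines: factor $A$ as $LU$, then solve two triangular systems. The code below is a minimal unblocked illustration under stated assumptions: it uses no pivoting (so every pivot must be nonzero, whereas LAPACK's LU uses partial pivoting), stores matrices as lists of rows, and the helper names `lu_nopivot`, `solve_lower_unit`, `solve_upper`, and `solve` are hypothetical. The Gemm-based blocked variants shown in the diagram are not reproduced here.

```python
def lu_nopivot(A):
    """Return (L, U) with L unit lower triangular, U upper triangular, L U = A.

    Assumption: no pivoting, so every pivot U[k][k] must be nonzero.
    """
    n = len(A)
    U = [row[:] for row in A]
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U


def solve_lower_unit(L, B):
    """Forward substitution: solve L Y = B for unit lower triangular L."""
    n, m = len(B), len(B[0])
    Y = [row[:] for row in B]
    for i in range(n):
        for k in range(i):
            for j in range(m):
                Y[i][j] -= L[i][k] * Y[k][j]
    return Y


def solve_upper(U, Y):
    """Back substitution: solve U X = Y for upper triangular U."""
    n, m = len(Y), len(Y[0])
    X = [row[:] for row in Y]
    for i in reversed(range(n)):
        for k in range(i + 1, n):
            for j in range(m):
                X[i][j] -= U[i][k] * X[k][j]
        for j in range(m):
            X[i][j] /= U[i][i]
    return X


def solve(A, B):
    """Solve A X = B via LU = A, then L Y = B, then U X = Y."""
    L, U = lu_nopivot(A)
    return solve_upper(U, solve_lower_unit(L, B))


# Hypothetical 2x2 example with two right-hand sides.
A = [[4.0, 3.0], [6.0, 3.0]]
B = [[10.0, 7.0], [12.0, 9.0]]
X = solve(A, B)
print(X)   # [[1.0, 1.0], [2.0, 1.0]]
```

Each right-hand side column of $B$ is handled by the same two triangular solves, which is why the factorization cost is paid only once.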
Why BLAS-3? Why GEMM?
BLAS       #FLOPS    Mem. refs.   Ratio
Level 3    $2n^3$    $4n^2$       $n/2$
Level 2    $2n^2$    $n^2$        $2$
Level 1    $2n$      $3n$         $2/3$
BLAS-3: $C := C + AB$, with $A, B, C \in \mathbb{R}^{n \times n}$
BLAS-2: $y := y + Ax$, with $A \in \mathbb{R}^{n \times n}$, $x, y \in \mathbb{R}^n$
BLAS-1: $y := y + \alpha x$, with $x, y \in \mathbb{R}^n$, $\alpha \in \mathbb{R}$
Moral for BLAS-3: the larger the problem the better, as long as it fits in memory.
GEMM is the building block for all the other BLAS-3 kernels, and for LAPACK.
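The ratios in the table can be checked numerically. The short sketch below evaluates the flop and memory-reference counts exactly as listed in the table for a few problem sizes; the function name `blas_counts` and the chosen values of `n` are assumptions made for illustration.

```python
from fractions import Fraction


def blas_counts(n):
    """Flop and memory-reference counts from the table, for problem size n."""
    return {
        "Level 3": (2 * n**3, 4 * n**2),
        "Level 2": (2 * n**2, n**2),
        "Level 1": (2 * n, 3 * n),
    }


for n in (10, 100, 1000):
    for level, (flops, mem) in blas_counts(n).items():
        ratio = Fraction(flops, mem)
        print(f"n={n:5d}  {level}: {ratio} flops per memory reference")
```

Only the Level 3 ratio grows with `n` (5, 50, 500 for the sizes above); the Level 2 and Level 1 ratios stay fixed at 2 and 2/3, which is the quantitative reason for preferring BLAS-3 kernels.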
Part 1: Blocked algorithms
Simple example: Cholesky factorization
Input: Matrix A, symmetric and positive definite.
Goal: Determine L (lower triangular matrix) such that $LL^T = A$
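As a reference point for the problem statement, here is a minimal unblocked Cholesky factorization in plain Python. It is a sketch under stated assumptions, not the blocked algorithm this part is concerned with: it processes one column at a time, stores matrices as lists of rows, and the function name `cholesky` and the test matrix are hypothetical.

```python
import math


def cholesky(A):
    """Return lower triangular L with L L^T = A.

    Assumption: A is symmetric positive definite; only its lower
    triangle is read.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract the squares already placed in row j.
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s / L[j][j]
    return L


# Hypothetical symmetric positive definite input.
A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
L = cholesky(A)
for row in L:
    print(row)
# [2.0, 0.0, 0.0]
# [1.0, 2.0, 0.0]
# [1.0, 1.0, 2.0]
```

The inner sums here are vector operations (BLAS-1 and BLAS-2 style); a blocked formulation would instead update whole submatrices at once so that most of the work lands in BLAS-3 kernels.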