Lecture 2: From Algorithms to Code Robert van de Geijn The University of Texas at Austin

Feb 07, 2018

Page 1: Lecture From Algorithms to Code - Scientific · PDF file• Basic Linear Algebra Subprograms – Level 1 BLAS – Level 2 BLAS – Level 3 BLAS Part II: The FLAME Way • The FLAME@lab

Lecture 2: From Algorithms to Code

Robert van de Geijn
The University of Texas at Austin


Overview

• Motivating Example: Cholesky Factorization

Part I: The Traditional Way
• Basic Linear Algebra Subprograms
  – Level 1 BLAS
  – Level 2 BLAS
  – Level 3 BLAS

Part II: The FLAME Way
• The FLAME@lab API
• The FLAMEC API (Application Programming Interface)
• Elemental (Targeting distributed memory architectures)


A Motivating Example: Cholesky Factorization

[Figure for the Cholesky factorization example; surviving index labels: i, j, k]

Performance


Part I: The Traditional Way


Basic Linear Algebra Subprograms

• Interface to commonly used fundamental linear algebra functionality

– Level‐1 BLAS: vector‐vector operations

– Level‐2 BLAS: matrix‐vector operations

– Level‐3 BLAS: matrix‐matrix operations


Level‐1 BLAS

• C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh. Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Soft., 5(3):308–323, Sept. 1979.

• Meant to allow portable high performance to be achieved on the vector supercomputers of the 1970s.

• Used to code LINPACK (the predecessor of LAPACK, which in turn is the predecessor of libflame)


BLAS1: Examples
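
The examples on this slide did not survive extraction. As a stand-in, here is a minimal C++ sketch of two typical vector-vector operations: axpy (y := alpha x + y) and dot (the inner product of x and y). The function names and signatures are simplified for illustration and are not the actual BLAS interface, which also takes the vector length and strides as arguments.

```cpp
#include <cstddef>
#include <vector>

// y := alpha * x + y. Illustrative only; assumes y is at least as long as x.
void axpy(double alpha, const std::vector<double>& x, std::vector<double>& y) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += alpha * x[i];
    }
}

// Returns the inner product of x and y. Assumes y is at least as long as x.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Both routines touch every vector element once and perform two floating point operations per element.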


Cholesky Factorization

[Figure: partitioning of the matrix for the Cholesky factorization; surviving dimension labels: j, n‐j, k, n‐k+1]

The Problem with Vector‐Vector Operations

• Perform O(n) computation with O(n) data

• Memory is much slower than floating point arithmetic

• If vectors are in main memory, “feeding the beast” becomes a problem

• When used to compute an operation like Cholesky factorization, the vectors are in main memory
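
To make these points concrete, here is a sketch (an illustration, not the code from the slides) of a Cholesky factorization built only from vector-vector operations. The name chol_level1 and the column-major storage with leading dimension lda are assumptions made for this example. The matrix is assumed to be symmetric positive definite; only its lower triangle is read and overwritten.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Overwrites the lower triangle of the n x n matrix A (column-major,
// leading dimension lda) with its Cholesky factor L, so that A = L L^T.
// Hypothetical helper for illustration; no error checking is done.
void chol_level1(std::size_t n, std::vector<double>& a, std::size_t lda) {
    for (std::size_t j = 0; j < n; ++j) {
        double* col_j = a.data() + j * lda;
        col_j[j] = std::sqrt(col_j[j]);
        // Vector scaling: divide the part of column j below the diagonal.
        for (std::size_t i = j + 1; i < n; ++i) {
            col_j[i] /= col_j[j];
        }
        // One axpy per remaining column: col_k := col_k - L(k,j) * col_j.
        for (std::size_t k = j + 1; k < n; ++k) {
            double* col_k = a.data() + k * lda;
            const double alpha = -col_j[k];
            for (std::size_t i = k; i < n; ++i) {
                col_k[i] += alpha * col_j[i];
            }
        }
    }
}
```

Each axpy in the inner loop reads two column segments and writes one back while doing only two floating point operations per element, so for large matrices the loop is limited by memory traffic rather than arithmetic.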


Memory Hierarchy

[Figure: memory hierarchy pyramid, top to bottom: registers, L1, L2, main memory. The top is “fast” and “expensive”; the bottom is “slow” and “cheap”.]


Performance


Level‐2 BLAS

• Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Richard J. Hanson. An extended set of FORTRAN basic linear algebra subprograms. ACM Trans. Math. Soft., 14(1):1–17, March 1988.

• Improve data reuse as memory became slower relative to floating point arithmetic


BLAS2: Examples
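
The examples on this slide did not survive extraction. As a stand-in, here is a minimal C++ sketch of two typical matrix-vector operations: a matrix-vector multiply (y := A x + y) and a rank-1 update (A := A + alpha x y^T). The names gemv and ger echo the BLAS naming, but the signatures are simplified assumptions for this example (the real routines take additional scaling, transpose, and stride arguments). Matrices are stored in column-major order with leading dimension lda.

```cpp
#include <cstddef>
#include <vector>

// y := A x + y, where A is m x n, x has n entries, and y has m entries.
void gemv(std::size_t m, std::size_t n,
          const std::vector<double>& a, std::size_t lda,
          const std::vector<double>& x, std::vector<double>& y) {
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            y[i] += a[i + j * lda] * x[j];
        }
    }
}

// A := A + alpha * x * y^T, where A is m x n, x has m entries, y has n entries.
void ger(std::size_t m, std::size_t n, double alpha,
         const std::vector<double>& x, const std::vector<double>& y,
         std::vector<double>& a, std::size_t lda) {
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            a[i + j * lda] += alpha * x[i] * y[j];
        }
    }
}
```

Both loops run down columns in the inner loop so that the matrix is accessed with unit stride.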


Cholesky Factorization

[Figure: partitioning of the matrix for the Cholesky factorization; surviving dimension labels: j, n‐j]

The Problem with Matrix‐Vector Operations

• Perform O(n^2) computation with O(n^2) data

• Memory is much slower than floating point arithmetic

• If matrix is in main memory, “feeding the beast” becomes a problem

• When used to compute an operation like Cholesky factorization, the matrix is in main memory


Performance


Level‐3 BLAS

• Jack J. Dongarra, Jeremy Du Croz, Sven Hammarling, and Iain Duff. A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Soft., 16(1):1–17, March 1990.

• Further improve data reuse as cache memories became popular


BLAS3: Examples


The Benefits of Matrix‐Matrix Operations

• Perform O(n^3) computation with O(n^2) data

• Overcomes the memory bottleneck
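
A rough count shows why the ratio matters. The estimates below assume n x n matrices, vectors of length n, and that every operand is read from memory once and every result is written once; they are back-of-the-envelope figures, not measurements from the lecture.

```latex
\begin{align*}
\text{axpy } (y := \alpha x + y):\quad
  & \frac{2n \text{ flops}}{3n \text{ memops}} = \frac{2}{3} \\
\text{gemv } (y := A x + y):\quad
  & \frac{2n^2 \text{ flops}}{n^2 + 3n \text{ memops}} \approx 2 \\
\text{gemm } (C := A B + C):\quad
  & \frac{2n^3 \text{ flops}}{4n^2 \text{ memops}} = \frac{n}{2}
\end{align*}
```

Only for the matrix-matrix operation does the amount of computation per memory access grow with n, which is what lets a careful implementation keep data in the caches and amortize the cost of moving it.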


Optimizing Matrix‐Matrix Multiplication

[Figure sequence: data moving through the memory hierarchy during matrix‐matrix multiplication; surviving labels: Memory, CPU, Caches, L2‐Cache, L1‐Cache, Registers, Pack A]

Performance of GotoBLAS


Casting Algorithms in Terms of BLAS3

• Algorithms that cast most computation in terms of level‐2 BLAS are often called unblocked algorithms

• Algorithms that cast most computation in terms of level‐3 BLAS are often called blocked algorithms

• We need to derive an algorithm that casts most computation in terms of level‐3 BLAS


do j = 1, n, nb
   jb = min( nb, n-j+1 )
   call chol( jb, A( j, j ), lda )
   call dtrsm( 'Right', 'Lower triangular', 'Transpose', 'Nonunit diag', &
               n-j-jb+1, jb, 1.0d00, A( j, j ), lda, A( j+jb, j ), lda )
   call dsyrk( 'Lower triangular', 'No transpose', n-j-jb+1, jb, &
               -1.0d00, A( j+jb, j ), lda, 1.0d00, A( j+jb, j+jb ), lda )
enddo
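
For readers who want to run the loop without a BLAS library, here is a self-contained C++ sketch of the same blocked algorithm. It is an illustration under stated assumptions, not the lecture's code: the helpers chol_unb, trsm_rlt, and syrk_ln are hypothetical plain-loop stand-ins for chol, dtrsm, and dsyrk, indices are 0-based, and the matrix is column-major with leading dimension lda. An optimized BLAS would replace the two helper loops that do most of the work.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Unblocked Cholesky of an n x n block; overwrites its lower triangle with L.
void chol_unb(std::size_t n, double* a, std::size_t lda) {
    for (std::size_t j = 0; j < n; ++j) {
        a[j + j * lda] = std::sqrt(a[j + j * lda]);
        for (std::size_t i = j + 1; i < n; ++i) a[i + j * lda] /= a[j + j * lda];
        for (std::size_t k = j + 1; k < n; ++k)
            for (std::size_t i = k; i < n; ++i)
                a[i + k * lda] -= a[i + j * lda] * a[k + j * lda];
    }
}

// B := B * inv(L)^T for the m x n block B and n x n lower triangular L
// (the role played by dtrsm above).
void trsm_rlt(std::size_t m, std::size_t n, const double* l, double* b,
              std::size_t lda) {
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t k = 0; k < j; ++k)
            for (std::size_t i = 0; i < m; ++i)
                b[i + j * lda] -= b[i + k * lda] * l[j + k * lda];
        for (std::size_t i = 0; i < m; ++i) b[i + j * lda] /= l[j + j * lda];
    }
}

// Lower triangle of C := C - B * B^T for the m x m block C and m x n block B
// (the role played by dsyrk above).
void syrk_ln(std::size_t m, std::size_t n, const double* b, double* c,
             std::size_t lda) {
    for (std::size_t j = 0; j < m; ++j)
        for (std::size_t p = 0; p < n; ++p)
            for (std::size_t i = j; i < m; ++i)
                c[i + j * lda] -= b[i + p * lda] * b[j + p * lda];
}

// Blocked Cholesky with block size nb (nb must be at least 1).
void chol_blk(std::size_t n, std::size_t nb, std::vector<double>& a,
              std::size_t lda) {
    for (std::size_t j = 0; j < n; j += nb) {
        const std::size_t jb = std::min(nb, n - j);
        const std::size_t rest = n - j - jb;  // rows below the diagonal block
        double* a11 = a.data() + j + j * lda;
        chol_unb(jb, a11, lda);
        if (rest == 0) break;                 // nothing left to update
        double* a21 = a.data() + (j + jb) + j * lda;
        double* a22 = a.data() + (j + jb) + (j + jb) * lda;
        trsm_rlt(rest, jb, a11, a21, lda);
        syrk_ln(rest, jb, a21, a22, lda);
    }
}
```

Passing pointers to sub-blocks together with the leading dimension mirrors how the Fortran code passes A( j, j ) and lda, and it also shows where the index arithmetic starts to get hard to follow.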

[Figure: the same loop shown next to the partitioned matrix; surviving dimension labels: j, jb, n‐j‐jb+1]


Performance


Disadvantages of using BLAS3

• Indexing gets confusing


Part II: The FLAME Way


FLAME APIs

• Paolo Bientinesi, Enrique S. Quintana‐Orti, and Robert A. van de Geijn. Representing linear algebra algorithms in code: the FLAME application program interfaces. ACM Trans. Math. Soft., 2005.

• APIs have been defined for M‐script (MATLAB, Octave, MathScript), C, LabVIEW, C++ with MPI, …

• Used to implement
  – libflame (modern alternative for LAPACK)
  – Elemental (modern alternative for ScaLAPACK)


Cholesky factorization in FLAME@lab

function [ A_out ] = Chol_blk_var3( A, nb_alg )

  [ ATL, ATR, ...
    ABL, ABR ] = FLA_Part_2x2( A, ...
                               0, 0, 'FLA_TL' );

  while ( size( ATL, 1 ) < size( A, 1 ) )

    b = min( size( ABR, 1 ), nb_alg );

    [ A00, A01, A02, ...
      A10, A11, A12, ...
      A20, A21, A22 ] = ...
        FLA_Repart_2x2_to_3x3( ATL, ATR, ...
                               ABL, ABR, b, b, 'FLA_BR' );

    %----------------------------------------------------------%

    A11 = Chol_unb_var1( A11 );
    A21 = A21 * inv( tril( A11 ) )';
    A22 = A22 - tril( A21 * A21' );

    %----------------------------------------------------------%

    [ ATL, ATR, ...
      ABL, ABR ] = ...
        FLA_Cont_with_3x3_to_2x2( A00, A01, A02, ...
                                  A10, A11, A12, ...
                                  A20, A21, A22, 'FLA_TL' );

  end

  A_out = [ ATL, ATR
            ABL, ABR ];

return


FLAME@lab Demo

(Turn on QuickTime recording!)


Cholesky factorization in FLAMEC

int Chol_blk_var3( FLA_Obj A, int nb_alg )
{
  < declarations >

  FLA_Part_2x2( A, &ATL, &ATR,
                   &ABL, &ABR, 0, 0, FLA_TL );

  while ( FLA_Obj_length( ATL ) < FLA_Obj_length( A ) ){

    b = min( FLA_Obj_length( ABR ), nb_alg );

    FLA_Repart_2x2_to_3x3
      ( ATL, /**/ ATR,       &A00, /**/ &A01, &A02,
      /* ************* */   /* ******************** */
                             &A10, /**/ &A11, &A12,
        ABL, /**/ ABR,       &A20, /**/ &A21, &A22,
        b, b, FLA_BR );
    /*--------------------------------------------------*/
    Chol_unb_var3( A11 );
    FLA_Trsm( FLA_RIGHT, FLA_LOWER_TRIANGULAR,
              FLA_TRANSPOSE, FLA_NONUNIT_DIAG,
              FLA_ONE, A11, A21 );
    FLA_Syrk( FLA_LOWER_TRIANGULAR, FLA_NO_TRANSPOSE,
              FLA_MINUS_ONE, A21, FLA_ONE, A22 );
    /*--------------------------------------------------*/
    FLA_Cont_with_3x3_to_2x2
      ( &ATL, /**/ &ATR,       A00, A01, /**/ A02,
                               A10, A11, /**/ A12,
      /* ************** */  /* ****************** */
        &ABL, /**/ &ABR,       A20, A21, /**/ A22,
        FLA_TL );
  }
}


FLAMEC Demo


From FLAME algorithm to Elemental implementation

PartitionDownDiagonal
( A, ATL, ATR,
     ABL, ABR, 0 );
while( ABR.Height() > 0 )
{
    RepartitionDownDiagonal
    ( ATL, /**/ ATR,  A00, /**/ A01, A02,
     /*************/ /****************/
           /**/       A10, /**/ A11, A12,
      ABL, /**/ ABR,  A20, /**/ A21, A22 );

    A21_VC_Star.AlignWith( A22 );
    A21_MC_Star.AlignWith( A22 );
    A21_MR_Star.AlignWith( A22 );
    //----------------------------------------------------//
    A11_Star_Star = A11;
    advanced::internal::LocalChol( Lower, A11_Star_Star );
    A11 = A11_Star_Star;

    A21_VC_Star = A21;
    basic::internal::LocalTrsm
    ( Right, Lower, ConjugateTranspose, NonUnit,
      (F)1, A11_Star_Star, A21_VC_Star );

    A21_MC_Star = A21_VC_Star;
    A21_MR_Star = A21_VC_Star;

    // (A21^T[* ,MC])^T A21^H[* ,MR]
    // = A21[MC,* ] A21^H[* ,MR] = (A21 A21^H)[MC,MR]
    basic::internal::LocalTriangularRankK
    ( Lower, ConjugateTranspose,
      (F)-1, A21_MC_Star, A21_MR_Star, (F)1, A22 );

    A21 = A21_MC_Star;
    //----------------------------------------------------//
    A21_VC_Star.FreeAlignments();
    A21_MC_Star.FreeAlignments();
    A21_MR_Star.FreeAlignments();

    SlidePartitionDownDiagonal
    ( ATL, /**/ ATR,  A00, A01, /**/ A02,
           /**/       A10, A11, /**/ A12,
     /*************/ /*****************/
      ABL, /**/ ABR,  A20, A21, /**/ A22 );
}

Jack Poulson et al. “Elemental: A New Framework for Distributed Memory Dense Matrix Computations.” TOMS. Accepted (pending final approval).


Summary

• BLAS are widely used in scientific computing

• Casting computation in terms of matrix‐matrix multiplication facilitates high performance

• Abstraction is a wonderful thing


Welcome to the Wonderful World of FLAME

Willkommen in der wunderbaren Welt der FLAME