Transforming Linear Algebra Libraries: From Abstraction to Parallelism
Ernie Chan
HIPS 2010, April 19, 2010

Motivation

Statically

Outline

  Inversion of a Triangular Matrix
  Requisite Semantic Information
  Static Generation of a Directed Acyclic Graph
  Performance
  Conclusion

Inversion of a Triangular Matrix

Formal Linear Algebra Methods Environment (FLAME)
  High-level abstractions for expressing linear algebra algorithms

Triangular Inversion (Trinv)
  R := U^{-1}

Inversion of a Triangular Matrix

LAPACK-style Implementation

      DO J = 1, N, NB
         JB = MIN( NB, N-J+1 )
*        A12 := -inv( A11 ) * A12
         CALL DTRSM( 'Left', 'Upper', 'No transpose', 'Non-unit',
     $               JB, N-J-JB+1, -ONE, A( J, J ), LDA,
     $               A( J, J+JB ), LDA )
*        A02 := A01 * A12 + A02
         CALL DGEMM( 'No transpose', 'No transpose',
     $               J-1, N-J-JB+1, JB, ONE, A( 1, J ), LDA,
     $               A( J, J+JB ), LDA, ONE, A( 1, J+JB ), LDA )
*        A01 := A01 * inv( A11 )
         CALL DTRSM( 'Right', 'Upper', 'No transpose', 'Non-unit',
     $               J-1, JB, ONE, A( J, J ), LDA,
     $               A( 1, J ), LDA )
*        A11 := inv( A11 )
         CALL DTRTI2( 'Upper', 'Non-unit',
     $                JB, A( J, J ), LDA, INFO )
      ENDDO
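
The four updates in the loop body can be traced with a block size of one, so that every block is a scalar. The following is a minimal pure-Python sketch under that assumption, not the LAPACK or FLAME implementation; the function name `trinv_upper` is hypothetical.

```python
def trinv_upper(U):
    """Invert an upper triangular matrix (list of rows), block size 1."""
    n = len(U)
    A = [row[:] for row in U]          # work on a copy
    for j in range(n):
        a11 = A[j][j]
        # A12 := -inv(A11) * A12   (row j, right of the diagonal)
        for c in range(j + 1, n):
            A[j][c] = -A[j][c] / a11
        # A02 := A01 * A12 + A02   (uses the old A01 and the new A12)
        for r in range(j):
            for c in range(j + 1, n):
                A[r][c] += A[r][j] * A[j][c]
        # A01 := A01 * inv(A11)    (column j, above the diagonal)
        for r in range(j):
            A[r][j] /= a11
        # A11 := inv(A11)
        A[j][j] = 1.0 / a11
    return A
```

Multiplying the input by the returned matrix should give the identity up to rounding, which is a convenient check that the order of the four updates has been preserved.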

Inversion of a Triangular Matrix

FLASH
  Matrix of matrices

Inversion of a Triangular Matrix

FLA_Part_2x2( A,    &ATL, &ATR,
                    &ABL, &ABR,     0, 0, FLA_TL );

while ( FLA_Obj_length( ATL ) < FLA_Obj_length( A ) )
{
  FLA_Repart_2x2_to_3x3( ATL, /**/ ATR,       &A00, /**/ &A01, &A02,
                      /* ************* */   /* ******************** */
                                              &A10, /**/ &A11, &A12,
                         ABL, /**/ ABR,       &A20, /**/ &A21, &A22,
                         1, 1, FLA_BR );
  /*-------------------------------------------------------*/
  FLASH_Trsm( FLA_LEFT, FLA_UPPER_TRIANGULAR,
              FLA_NO_TRANSPOSE, FLA_NONUNIT_DIAG,
              FLA_MINUS_ONE, A11, A12 );
  FLASH_Gemm( FLA_NO_TRANSPOSE, FLA_NO_TRANSPOSE,
              FLA_ONE, A01, A12, FLA_ONE, A02 );
  FLASH_Trsm( FLA_RIGHT, FLA_UPPER_TRIANGULAR,
              FLA_NO_TRANSPOSE, FLA_NONUNIT_DIAG,
              FLA_ONE, A11, A01 );
  FLASH_Trinv( FLA_UPPER_TRIANGULAR, FLA_NONUNIT_DIAG, A11 );
  /*-------------------------------------------------------*/
  FLA_Cont_with_3x3_to_2x2( &ATL, /**/ &ATR,       A00, A01, /**/ A02,
                                                     A10, A11, /**/ A12,
                            /* ************** */  /* ****************** */
                            &ABL, /**/ &ABR,       A20, A21, /**/ A22,
                            FLA_TL );
}

Inversion of a Triangular Matrix

Extensible Markup Language (XML)

<?xml version="1.0" encoding="ISO-8859-1"?>
<Function name="FLA_Trinv" type="blk" variant="3">
  <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
  <Declaration>
    <Operand type="matrix" direction="TL->BR" inout="both">A</Operand>
  </Declaration>
  <Loop>
    <Guard>A</Guard>
    <Update>
      <Statement name="FLA_Trsm">
        <Option type="side">FLA_LEFT</Option>
        <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
        <Option type="trans">FLA_NO_TRANSPOSE</Option>
        <Option type="diag">FLA_NONUNIT_DIAG</Option>
        <Parameter>FLA_MINUS_ONE</Parameter>
        <Parameter partition="11">A</Parameter>
        <Parameter partition="12">A</Parameter>
      </Statement>
      <Statement name="FLA_Gemm">
        <Option type="trans">FLA_NO_TRANSPOSE</Option>
        <Option type="trans">FLA_NO_TRANSPOSE</Option>
        <Parameter>FLA_ONE</Parameter>
        <Parameter partition="01">A</Parameter>
        <Parameter partition="12">A</Parameter>
        <Parameter>FLA_ONE</Parameter>
        <Parameter partition="02">A</Parameter>
      </Statement>
      <Statement name="FLA_Trsm">
        <Option type="side">FLA_RIGHT</Option>
        <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
        <Option type="trans">FLA_NO_TRANSPOSE</Option>
        <Option type="diag">FLA_NONUNIT_DIAG</Option>
        <Parameter>FLA_ONE</Parameter>
        <Parameter partition="11">A</Parameter>
        <Parameter partition="01">A</Parameter>
      </Statement>
      <Statement name="FLA_Trinv">
        <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
        <Option type="diag">FLA_NONUNIT_DIAG</Option>
        <Parameter partition="11">A</Parameter>
      </Statement>
    </Update>
  </Loop>
</Function>
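
As an illustration of how such a description might be consumed, the sketch below reads the statements of the loop body with Python's standard `xml.etree.ElementTree` module. It assumes a trimmed copy of the XML (options and the XML declaration omitted), and the helper name `list_statements` is hypothetical; the slides do not show the actual tool that processes the XML.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Trinv description: parameters only, options omitted.
TRINV_XML = """
<Function name="FLA_Trinv" type="blk" variant="3">
  <Loop>
    <Guard>A</Guard>
    <Update>
      <Statement name="FLA_Trsm">
        <Parameter>FLA_MINUS_ONE</Parameter>
        <Parameter partition="11">A</Parameter>
        <Parameter partition="12">A</Parameter>
      </Statement>
      <Statement name="FLA_Gemm">
        <Parameter>FLA_ONE</Parameter>
        <Parameter partition="01">A</Parameter>
        <Parameter partition="12">A</Parameter>
        <Parameter>FLA_ONE</Parameter>
        <Parameter partition="02">A</Parameter>
      </Statement>
      <Statement name="FLA_Trinv">
        <Parameter partition="11">A</Parameter>
      </Statement>
    </Update>
  </Loop>
</Function>
"""

def list_statements(xml_text):
    """Return [(statement name, [argument names])] in document order."""
    root = ET.fromstring(xml_text)
    calls = []
    for stmt in root.iter("Statement"):
        args = []
        for p in stmt.findall("Parameter"):
            part = p.get("partition")
            # Operand "A" with partition "12" refers to block A12.
            args.append(p.text + part if part else p.text)
        calls.append((stmt.get("name"), args))
    return calls
```

Calling `list_statements(TRINV_XML)` yields one entry per `<Statement>`, with partitioned operands spelled as in the FLASH code (`A11`, `A12`, and so on).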

Outline

  Inversion of a Triangular Matrix
  Requisite Semantic Information
  Static Generation of a Directed Acyclic Graph
  Performance
  Conclusion

Requisite Semantic Information

Partitioning Scheme

<?xml version="1.0" encoding="ISO-8859-1"?>
<Function name="FLA_Trinv" type="blk" variant="3">
  <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
  <Declaration>
    <Operand type="matrix" direction="TL->BR" inout="both">A</Operand>
  </Declaration>
  <Loop>
    <Guard>A</Guard> <!-- while m( ATL ) < m( A ) -->
    <Update>
      <Statement name="FLA_Trsm">
        <!-- 'Left', 'Upper', 'No transpose', 'Non-unit', -ONE, A11, A12 -->
      </Statement>
      <Statement name="FLA_Gemm">
        <!-- 'No transpose', 'No transpose', ONE, A01, A12, ONE, A02 -->
      </Statement>
      <Statement name="FLA_Trsm">
        <!-- 'Right', 'Upper', 'No transpose', 'Non-unit', ONE, A11, A01 -->
      </Statement>
      <Statement name="FLA_Trinv">
        <!-- 'Upper', 'Non-unit', A11 -->
      </Statement>
    </Update>
  </Loop>
</Function>

Requisite Semantic Information

Problem Size*

<?xml version="1.0" encoding="ISO-8859-1"?>
<Function name="FLA_Trinv" type="blk" variant="3">
  <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
  <Declaration>
    <Operand type="matrix" direction="TL->BR" inout="both">A</Operand>
  </Declaration>
  <Loop>
    <Guard>A</Guard> <!-- while m( ATL ) < m( A ) -->
    <Update>
      <Statement name="FLA_Trsm">
        <!-- 'Left', 'Upper', 'No transpose', 'Non-unit', -ONE, A11, A12 -->
      </Statement>
      <Statement name="FLA_Gemm">
        <!-- 'No transpose', 'No transpose', ONE, A01, A12, ONE, A02 -->
      </Statement>
      <Statement name="FLA_Trsm">
        <!-- 'Right', 'Upper', 'No transpose', 'Non-unit', ONE, A11, A01 -->
      </Statement>
      <Statement name="FLA_Trinv">
        <!-- 'Upper', 'Non-unit', A11 -->
      </Statement>
    </Update>
  </Loop>
</Function>

Requisite Semantic Information

Updates

<?xml version="1.0" encoding="ISO-8859-1"?>
<Function name="FLA_Trinv" type="blk" variant="3">
  <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
  <Declaration>
    <Operand type="matrix" direction="TL->BR" inout="both">A</Operand>
  </Declaration>
  <Loop>
    <Guard>A</Guard> <!-- while m( ATL ) < m( A ) -->
    <Update>
      <Statement name="FLA_Trsm">
        <!-- 'Left', 'Upper', 'No transpose', 'Non-unit', -ONE, A11, A12 -->
      </Statement>
      <Statement name="FLA_Gemm">
        <!-- 'No transpose', 'No transpose', ONE, A01, A12, ONE, A02 -->
      </Statement>
      <Statement name="FLA_Trsm">
        <!-- 'Right', 'Upper', 'No transpose', 'Non-unit', ONE, A11, A01 -->
      </Statement>
      <Statement name="FLA_Trinv">
        <!-- 'Upper', 'Non-unit', A11 -->
      </Statement>
    </Update>
  </Loop>
</Function>

Requisite Semantic Information

Input and Output Parameters

<?xml version="1.0" encoding="ISO-8859-1"?>
<Function name="FLA_Trsm">
  <Declaration>
    <Operand type="scalar" inout="in">alpha</Operand>
    <Operand type="matrix" inout="in">A</Operand>
    <Operand type="matrix" inout="both">B</Operand>
  </Declaration>
</Function>
<Function name="FLA_Gemm">
  <Declaration>
    <Operand type="scalar" inout="in">alpha</Operand>
    <Operand type="matrix" inout="in">A</Operand>
    <Operand type="matrix" inout="in">B</Operand>
    <Operand type="scalar" inout="in">beta</Operand>
    <Operand type="matrix" inout="both">C</Operand>
  </Declaration>
</Function>
<Function name="FLA_Trinv">
  <Declaration>
    <Operand type="matrix" inout="both">A</Operand>
  </Declaration>
</Function>
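
One way to read these declarations: an operand marked `in` is only read, while an operand marked `both` is read and written. The sketch below turns the matrix operands of a call into read and write sets. It is an illustration under that reading; the table `SIGNATURES`, the function `read_write_sets`, and the `out` value (which the slides do not show) are assumptions.

```python
# Matrix operand directions copied from the declarations (scalars omitted).
SIGNATURES = {
    "FLA_Trsm":  [("A", "in"), ("B", "both")],
    "FLA_Gemm":  [("A", "in"), ("B", "in"), ("C", "both")],
    "FLA_Trinv": [("A", "both")],
}

def read_write_sets(name, blocks):
    """Map the actual block arguments of a call to (reads, writes)."""
    reads, writes = set(), set()
    for (_formal, inout), block in zip(SIGNATURES[name], blocks):
        if inout in ("in", "both"):
            reads.add(block)
        if inout in ("out", "both"):   # "out" is hypothetical, not in the slides
            writes.add(block)
    return reads, writes
```

For the Gemm update of Trinv, `read_write_sets("FLA_Gemm", ["A01", "A12", "A02"])` reports that all three blocks are read and only `A02` is written.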

Outline

  Inversion of a Triangular Matrix
  Requisite Semantic Information
  Static Generation of a Directed Acyclic Graph
  Performance
  Conclusion

Static Generation of a DAG

Code Generation
  Convert XML representation to FLASH code generation intermediary
    Annotated with input and output information
  Create directed acyclic graph (DAG) by statically unrolling the loop
    Operations on submatrix blocks (tasks) are vertices
    Data dependencies between tasks are edges

Static Generation of a DAG

Data Dependencies
  Flow (read-after-write)
    S1: A = B + C;
    S2: D = A + E;
  Anti (write-after-read)
    S3: F = A + G;
    S4: A = H + I;
  Output (write-after-write)
    S5: A = J + K;
    S6: A = L + M;
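
The three kinds of dependence can be detected mechanically from the variables each statement writes and reads. The sketch below encodes S1 through S6 as (writes, reads) pairs and classifies an ordered pair of statements; the function name `classify` and the tuple encoding are illustrative choices, not part of the presentation.

```python
def classify(first, second):
    """Dependences of `second` on `first`, given (writes, reads) sets in program order."""
    w1, r1 = first
    w2, r2 = second
    kinds = []
    if w1 & r2:
        kinds.append("flow")     # read-after-write
    if r1 & w2:
        kinds.append("anti")     # write-after-read
    if w1 & w2:
        kinds.append("output")   # write-after-write
    return kinds

S1 = ({"A"}, {"B", "C"})   # A = B + C
S2 = ({"D"}, {"A", "E"})   # D = A + E
S3 = ({"F"}, {"A", "G"})   # F = A + G
S4 = ({"A"}, {"H", "I"})   # A = H + I
S5 = ({"A"}, {"J", "K"})   # A = J + K
S6 = ({"A"}, {"L", "M"})   # A = L + M
```

With these encodings, `classify(S1, S2)`, `classify(S3, S4)` and `classify(S5, S6)` report flow, anti and output dependence respectively.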

Static Generation of a DAG

Problem Size
  Problem size cannot be determined a priori
  Fix the block size or loop unrolling factor
    Balance between instruction footprint and data granularity of tasks

Example
  Trinv on 3x3 matrix of blocks
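
To make the trade-off concrete, the sketch below unrolls the Trinv loop for a matrix of nb x nb blocks and names one task per updated block. It assumes that each Trsm, Gemm and Trinv call on a partition decomposes into one task per output block, which matches the 3x3 example; the function name `unroll_trinv` is hypothetical.

```python
def unroll_trinv(nb):
    """Task names per iteration for Trinv on an nb x nb matrix of blocks."""
    iterations, count = [], 0
    for j in range(nb):
        ops = (["Trsm"] * (nb - j - 1)          # A12 := -inv(A11) * A12
               + ["Gemm"] * (j * (nb - j - 1))  # A02 := A01 * A12 + A02
               + ["Trsm"] * j                   # A01 := A01 * inv(A11)
               + ["Trinv"])                     # A11 := inv(A11)
        names = []
        for op in ops:
            names.append(f"{op}{count}")        # tasks are numbered in program order
            count += 1
        iterations.append(names)
    return iterations
```

For nb = 3 this produces ten tasks in three iterations. Under the same assumption the task count is nb(nb+1)(nb+2)/6, so the unrolled code grows cubically with the number of blocks per dimension, which is the instruction-footprint side of the balance.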

Static Generation of a DAG

Trinv
  Iteration 1: Trsm0, Trsm1, Trinv2

Static Generation of a DAG

Trinv
  Iteration 2: Trsm3, Gemm4, Trsm5, Trinv6

Static Generation of a DAG

Trinv
  Iteration 3: Trsm7, Trsm8, Trinv9

Static Generation of a DAG

[Figure: directed acyclic graph over the tasks Trsm0, Trsm1, Trinv2, Trsm3, Gemm4, Trsm5, Trinv6, Trsm7, Trsm8, Trinv9]
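
A DAG over these ten tasks can be derived by recording which blocks each task reads and writes and adding an edge wherever a later task has a flow, anti or output dependence on an earlier one. The sketch below is one such construction, not the generator used in the presentation; it assumes one task per output block, treats every written block as also read (`inout="both"`), and compares all pairs, so it also emits edges that are implied transitively. The names `trinv_tasks` and `build_dag` are hypothetical.

```python
def trinv_tasks(nb):
    """Unroll Trinv; each task is (name, reads, writes) over block coordinates."""
    tasks = []

    def add(op, reads, write):
        # The output block is declared inout="both", so it is read as well.
        tasks.append((f"{op}{len(tasks)}", set(reads) | {write}, {write}))

    for j in range(nb):
        for c in range(j + 1, nb):                 # A12 := -inv(A11) * A12
            add("Trsm", [(j, j)], (j, c))
        for r in range(j):                         # A02 := A01 * A12 + A02
            for c in range(j + 1, nb):
                add("Gemm", [(r, j), (j, c)], (r, c))
        for r in range(j):                         # A01 := A01 * inv(A11)
            add("Trsm", [(j, j)], (r, j))
        add("Trinv", [], (j, j))                   # A11 := inv(A11)
    return tasks


def build_dag(tasks):
    """Edges (u, v): later task v depends on earlier task u."""
    edges = set()
    for v, (name_v, reads_v, writes_v) in enumerate(tasks):
        for name_u, reads_u, writes_u in tasks[:v]:
            flow = writes_u & reads_v
            anti = reads_u & writes_v
            output = writes_u & writes_v
            if flow or anti or output:
                edges.add((name_u, name_v))
    return edges
```

For example, `build_dag(trinv_tasks(3))` contains the edge from Trsm0 to Trinv2, an anti-dependence: Trsm0 reads the diagonal block that Trinv2 overwrites. Tasks with no incoming edge (here Trsm0, Trsm1 and Trsm3) are ready to run immediately.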

Outline

  Inversion of a Triangular Matrix
  Requisite Semantic Information
  Static Generation of a Directed Acyclic Graph
  Performance
  Conclusion

Performance

LabVIEW
  Graphical, data flow programming language (G)
  Anti-dependencies cannot exist in G
    Copies are made when wire is split

Performance

Target Architecture
  16-core AMD processor
    4 socket quad-core Opteron 1.9 GHz
    4 GB of RAM per socket
  LabVIEW 8.6
    Windows XP
  Basic Linear Algebra Subprograms (BLAS)
    MKL 7.2

Performance

Results
  Parallelism
    Exploit parallelism inherent within DAG
  Hierarchical matrix storage
    Spatial locality
  Overhead
    Copy matrix from flat row-major storage to hierarchical matrix and back
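
The copy overhead mentioned above comes from converting between a flat row-major array and a matrix of blocks. The sketch below shows one possible pair of conversion routines in plain Python. It assumes square blocks that divide the matrix evenly and row-major order inside each block; the slides do not specify the actual layout, and the names `to_blocks` and `from_blocks` are hypothetical.

```python
def to_blocks(flat, n, b):
    """Copy an n x n row-major list into an (n/b) x (n/b) grid of b x b blocks."""
    assert n % b == 0, "sketch assumes the block size divides n"
    nb = n // b
    return [[[flat[(I * b + i) * n + (J * b + j)]
              for i in range(b) for j in range(b)]   # block stored contiguously
             for J in range(nb)]
            for I in range(nb)]


def from_blocks(blocks, n, b):
    """Inverse of to_blocks: copy the blocks back into a flat row-major list."""
    flat = [0.0] * (n * n)
    for I, block_row in enumerate(blocks):
        for J, blk in enumerate(block_row):
            for i in range(b):
                for j in range(b):
                    flat[(I * b + i) * n + (J * b + j)] = blk[i * b + j]
    return flat
```

Each block ends up contiguous in memory, which is the source of the spatial locality; both directions touch every element once, so the copy costs O(n^2) work on top of the O(n^3) inversion.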

Outline

  Inversion of a Triangular Matrix
  Requisite Semantic Information
  Static Generation of a Directed Acyclic Graph
  Performance
  Conclusion

Conclusion

Instantiate linear algebra algorithm using a code generation intermediary

Statically produce a directed acyclic graph by fixing block size or loop unrolling factor

XML → FLASH → DAG

Acknowledgments

Jim Nagle, Robert van de Geijn
We thank the other members of the FLAME team for their support

Funding
  National Instruments
  NSF Grants CCF-0540926 and CCF-0702714

Conclusion

More Information

http://www.cs.utexas.edu/~flame

Questions?

[email protected]