
Page 1: A Note on the Performance of Sparse Matrix-vector ...moreno/HPCA-ACA-2009/Sardar-ACA-2009.pdf · Hierarchical Memory Systems Necessity of implementing efficient y = Ax Sparse matrix

Outline

Hierarchical Memory Systems

Necessity of implementing efficient y = Ax

Sparse matrix

Column ordering algorithms

Experiments

Conclusion and Future Work

A Note on the Performance of Sparse Matrix-vector Multiplication with Column Reordering

Sardar Anisul Haque
University of Western Ontario, Ontario, Canada

Shahadat Hossain
University of Lethbridge, Alberta, Canada

June 25, 2009


1 Hierarchical Memory Systems

2 Necessity of implementing efficient y = Ax

3 Sparse matrix

4 Column ordering algorithms
    Column Intersection ordering
    Similarity ordering
    Local Improvement ordering
    Binary reflected gray code ordering

5 Experiments
    Experimental Setup
    Experimental result

6 Conclusion and Future Work


Principle of Locality

The principle of locality states that most programs do not access their code and data uniformly. There are two main types of locality:

1 Spatial locality: most programs tend to access data stored at nearby memory addresses, often sequentially.

2 Temporal locality: most programs tend to reuse data that was accessed recently.


Performance gap between CPU speed and main memory speed

CPU speed improvement: 35% to 55% per year.

Main memory latency improvement: 7% per year.


Hierarchical Memory Systems


Data Locality in sparse matrix-vector multiplication


Computing y = Ax on modern superscalar architectures often exhibits

Poor data locality.

A large volume of load operations from memory compared to the floating point operations.

Indirect access to the data.

Loop overhead.
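
The points above can be seen in a minimal sketch of y = Ax for a matrix held in Compressed Row Storage (CRS), one of the storage schemes named later in the talk. The function name `spmv_crs` and the array names `values`, `col_idx`, and `row_ptr` are illustrative assumptions, not taken from the slides.

```python
def spmv_crs(values, col_idx, row_ptr, x):
    """Compute y = A x for A stored in CRS (array names are assumptions)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):                      # loop overhead: one short loop per row
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Three loads (values[k], col_idx[k], x[...]) for one multiply-add.
            # The entry of x that is read depends on col_idx[k]: indirect access.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Small 3 x 4 example:
# [[1, 0, 2, 0],
#  [0, 3, 0, 0],
#  [4, 0, 0, 5]]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 3]
row_ptr = [0, 2, 3, 5]
y = spmv_crs(values, col_idx, row_ptr, [1.0, 1.0, 1.0, 1.0])
```

When the column indices within a row are scattered, consecutive reads of x land far apart in memory, which is the locality problem that column reordering tries to reduce.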


Improving Data Locality of x in computing y = Ax

Preprocess A by permuting its rows or columns in such a way that

the number of nonzero blocks is reduced, to improve the spatial locality of x.

the nonzeros of each column are consecutive, to improve the temporal locality of x.

But this preprocessing phase can be computationally expensive.


Conjugate Gradient Algorithm

An iterative method to obtain a numerical solution of a large system of linear equations Ax = b.

In this method, A remains unchanged and we need to multiply it with a vector.

The method may require a good number of iterations before convergence.
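
To make the role of the matrix-vector product concrete, here is a minimal sketch of the textbook conjugate gradient iteration, assuming A is symmetric positive definite. The function names, the tolerance, and the dense 2 x 2 test matrix are assumptions chosen for illustration; in practice `matvec` would be a sparse kernel.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b, where matvec(v) returns A v (A assumed SPD)."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x for the starting guess x = 0
    p = list(r)                 # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)          # the only use of A: one product per iteration
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical 2 x 2 SPD system, stored densely only to keep the sketch short.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(lambda v: [dot(row, v) for row in A], b)
```

Because A is fixed and the product is repeated every iteration, a one-time preprocessing cost such as column reordering can be spread over many products.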


Storage schemes for sparse matrices

The names of some well-known storage schemes for sparse matrices are given below.

Compressed Row Storage (CRS) scheme.

Fixed-size Block Storage (FSB) scheme.

Block Compressed Row Storage (BCRS) scheme.
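
As a small illustration of the first scheme in the list, the sketch below converts a dense matrix into the three arrays commonly used for CRS. The function name `dense_to_crs` and the array names are assumptions; the slides do not prescribe a particular layout.

```python
def dense_to_crs(dense):
    """Return (values, col_idx, row_ptr) for a dense list-of-rows matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)      # nonzero value
                col_idx.append(j)     # its column index
        row_ptr.append(len(values))   # where the next row starts
    return values, col_idx, row_ptr


crs = dense_to_crs([[1, 0, 2, 0],
                    [0, 3, 0, 0],
                    [4, 0, 0, 5]])
```

Row i occupies positions `row_ptr[i]` up to, but not including, `row_ptr[i + 1]` of `values` and `col_idx`.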


FSB Scheme

We define a nonzero block as a sequence of k ≥ 1 contiguous nonzero elements in a row.

We denote this storage scheme by FSBl, where the last character l is the length of the nonzero block. For example, FSB2 is the fixed-size block storage scheme of length 2. In the FSBl scheme, the given sparse matrix A is expressed as a sum of two matrices A1 and A2: A1 stores all the nonzero blocks of size l, and A2 stores the rest (in the CRS scheme).
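
The split A = A1 + A2 can be sketched for a single row as follows. The slide does not say how runs longer than l are handled, so this sketch assumes each maximal run of contiguous nonzeros is cut greedily from the left into chunks of length l, with any leftover elements going to A2. The function name `split_fsb` and its return format are hypothetical.

```python
def split_fsb(row_cols, l):
    """Split one row's sorted nonzero column indices for an FSBl-style scheme.

    Returns (blocks, rest): the start columns of the length-l blocks (for A1)
    and the remaining column indices (for A2, stored in CRS).
    Assumption: longer runs are cut greedily from the left.
    """
    blocks, rest = [], []
    i, n = 0, len(row_cols)
    while i < n:
        # Find the maximal run of consecutive column indices starting at i.
        j = i
        while j + 1 < n and row_cols[j + 1] == row_cols[j] + 1:
            j += 1
        run = row_cols[i:j + 1]
        full = len(run) // l                  # number of complete length-l chunks
        blocks.extend(run[c * l] for c in range(full))
        rest.extend(run[full * l:])           # leftover elements of this run
        i = j + 1
    return blocks, rest


blocks, rest = split_fsb([0, 1, 2, 5, 7, 8], 2)
```

Storing only the start column of each block is what saves index loads: one column index serves l values.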


A = A1 + A2 considering l = 2


Column ordering problem

We define the column ordering problem as follows.

Given an m × n sparse matrix A, find a permutation of its columns that minimizes β, where β is the total number of nonzero blocks in A.
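
The objective β can be evaluated for any candidate permutation with a short routine like the one sketched here. The representation (one set of nonzero column indices per row) and the names `count_nonzero_blocks`, `rows`, and `perm` are assumptions made for illustration.

```python
def count_nonzero_blocks(rows, perm):
    """Count beta, the number of nonzero blocks, after permuting columns.

    rows: list of sets, rows[i] holds the column indices of nonzeros in row i.
    perm: perm[k] is the original column placed at position k.
    """
    position = {col: k for k, col in enumerate(perm)}
    beta = 0
    for cols in rows:
        pos = sorted(position[c] for c in cols)
        # A block starts at every nonzero whose left neighbour is not a nonzero.
        beta += sum(1 for t, p in enumerate(pos) if t == 0 or p != pos[t - 1] + 1)
    return beta


rows = [{0, 2}, {0, 2, 3}, {1}]
beta_before = count_nonzero_blocks(rows, [0, 1, 2, 3])
beta_after = count_nonzero_blocks(rows, [0, 2, 3, 1])
```

In this toy example, moving column 1 to the end brings columns 0, 2, and 3 together and reduces β from 5 to 3.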


Weight of intersection

Columns j and l of matrix A are said to intersect if there is a row i such that $a_{ij} \neq 0$ and $a_{il} \neq 0$. The weight of intersection of any two columns j and l, denoted by $w_{jl}$, is the number of rows in which they intersect.
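
With each column represented as the set of row indices where it is nonzero, the weight of intersection is just the size of a set intersection. The sketch below assumes that representation; the names are illustrative.

```python
def intersection_weight(col_j, col_l):
    """w_jl: number of rows in which both columns have a nonzero."""
    return len(col_j & col_l)


# Columns of a hypothetical 3 x 4 matrix, as sets of row indices.
columns = [{0, 1}, {2}, {0, 1}, {1}]
w02 = intersection_weight(columns[0], columns[2])
w01 = intersection_weight(columns[0], columns[1])
w23 = intersection_weight(columns[2], columns[3])
```

Columns 0 and 1 do not intersect at all here, so their weight is 0.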


Column ordering algorithms

The names of some column ordering algorithms are given below.

Column intersection ordering.

Similarity ordering.

Local Improvement ordering.

Binary reflected gray code ordering.


Column intersection ordering algorithm


Similarity ordering algorithm

In this column ordering algorithm, the weight of intersection between two columns i and j is the number of rows in which both columns have a zero or both have a nonzero.
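
Under the same set-of-rows representation of a column, this similarity weight equals the number of rows m minus the number of rows where exactly one of the two columns is nonzero. The sketch below uses that identity; the function name and the example columns are assumptions.

```python
def similarity_weight(col_i, col_j, m):
    """Rows where both columns are zero or both are nonzero, out of m rows."""
    # col_i ^ col_j is the set of rows where exactly one column is nonzero.
    return m - len(col_i ^ col_j)


# Columns of a hypothetical matrix with m = 3 rows, as sets of row indices.
s_same = similarity_weight({0, 1}, {0, 1}, 3)
s_partial = similarity_weight({0, 1}, {1}, 3)
s_opposite = similarity_weight({0, 1}, {2}, 3)
```

Unlike the plain intersection weight, this measure also rewards rows where both columns are zero.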


Similarity ordering algorithm (contd..)


Local Improvement ordering algorithm


Local Improvement ordering algorithm (contd..)


Binary reflected gray code ordering

The main scientific contribution of this thesis is as follows:

We propose a column ordering algorithm based on binary reflected gray code for sparse matrices. To the best of our knowledge, we are the first to consider gray codes for column ordering in sparse matrix-vector multiplication. We call it binary reflected gray code ordering, or the BRGC algorithm.


Binary reflected gray code

$$G^{p} = \left[\, 0G^{p-1}_{0},\ \ldots,\ 0G^{p-1}_{2^{p-1}-1},\ 1G^{p-1}_{2^{p-1}-1},\ \ldots,\ 1G^{p-1}_{0} \,\right]$$

$$G^{3} = [000,\ 001,\ 011,\ 010,\ 110,\ 111,\ 101,\ 100]$$
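
The recursive definition translates directly into code: prefix the previous code with 0, then prefix its reversal with 1. This is a minimal sketch of the standard construction; the function name `brgc` and the use of strings for code words are choices made here, not taken from the slides.

```python
def brgc(p):
    """Return the p-bit binary reflected gray code as a list of bit strings."""
    if p == 0:
        return [""]                       # the single empty code word
    prev = brgc(p - 1)
    # 0-prefixed copy in order, followed by a 1-prefixed copy in reverse.
    return ["0" + c for c in prev] + ["1" + c for c in reversed(prev)]


g3 = brgc(3)
```

Consecutive code words differ in exactly one bit, which is the property that makes the code attractive for placing similar sparsity patterns next to each other.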


Binary reflected gray code ordering algorithm (BRGC)(contd..)


Example: cavity26


Example: bcsstk35


Data locality and column ordering algorithms

Let π be the column permutation found by the column intersection ordering, local improvement ordering, or similarity ordering algorithm. Here column π[i + 1] is found by looking at the nonzeros of column π[i]. But the data locality of A should be evaluated over more than pairs of columns.
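
The pairwise nature of such orderings can be illustrated with a generic greedy scheme: repeatedly append the unplaced column with the largest weight relative to the last placed column. This is an illustrative sketch only, not the authors' exact algorithms; the function name, the start column, and the tie-breaking rule are assumptions.

```python
def greedy_pairwise_order(columns, weight, start=0):
    """Build a permutation where pi[i + 1] depends only on column pi[i].

    columns: list of sets of row indices (nonzero pattern of each column).
    weight:  function of two column patterns, larger means more similar.
    """
    remaining = set(range(len(columns))) - {start}
    perm = [start]
    while remaining:
        last = columns[perm[-1]]
        # max() over a sorted list keeps the smallest index among ties.
        nxt = max(sorted(remaining), key=lambda c: weight(last, columns[c]))
        perm.append(nxt)
        remaining.remove(nxt)
    return perm


columns = [{0, 1}, {2}, {0, 1}, {1}]
perm = greedy_pairwise_order(columns, lambda a, b: len(a & b))
```

Each step consults only one neighbouring column, so a choice that looks good locally can still separate columns that would have formed longer nonzero blocks together.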


Data locality and column ordering algorithms (contd..)


Data locality and column ordering algorithms (contd..)


Features of BRGC ordering algorithm

It improves both the temporal and the spatial locality of x in computing y = Ax.

The initial column ordering of the input matrix has no effect on it.

It does not change the sparsity structure of a banded matrix much.


Table: Computing platforms

Name                 Compaq                     ibm                      sun
Processor name       AMD Athlon(tm) 64 3500+    Intel Pentium 4          UltraSPARC-IIe
Processor speed      2.2 GHz                    2.8 GHz                  550 MHz
RAM                  512 MB                     1 GB                     384 MB
OS                   Linux                      Linux                    Sun Solaris
L2 cache             512 KB                     512 KB                   256 KB
L2 cache type        16-way set associative     8-way set associative    8-way set associative and direct mapped
L2 cache line size   64 bytes                   64 bytes                 64 bytes


Input matrices

26 matrices from linear programming, structural, optimization, economic, and circuit simulation problems, among others.

Source: Tim Davis, University of Florida Sparse Matrix Collection, url: http://www.cise.ufl.edu/research/sparse. Access Date: April 10, 2008.


Performance measure

We use CPU time (for example, $t_{A,\mathrm{SpMxV}(compaq,\,crs,\,O_{brgc})}$) as the performance measure.

Performance ratio

We define the performance ratio as

$$r_{A,\mathrm{SpMxV}(pl,ss,ra)} = \frac{t_{A,\mathrm{SpMxV}(pl,ss,ra)}}{\min\{t_{A,\mathrm{SpMxV}(pl,ss,\mathrm{ANY})}\}}$$


Evaluation method

Finally, the performance of SpMxV(pl, ss, ra) can be measured by the following cumulative distribution function:

$$\rho_{\mathrm{SpMxV}(pl,ss,ra)}(\tau) = \frac{1}{|\Gamma|}\,\mathrm{size}\{A \in \Gamma : r_{A,\mathrm{SpMxV}(pl,ss,ra)} \le \tau\},$$

where Γ is the set of input matrices.
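
The performance ratio and the cumulative distribution function can be computed as sketched here for one fixed platform and storage scheme. The function names and the dictionary layout are assumptions, and the timings are made-up placeholders, not measurements from the experiments.

```python
def performance_ratios(times):
    """times[A][ra] is the CPU time for matrix A under ordering ra.

    Returns r[A][ra] = times[A][ra] / (best time for A over all orderings).
    """
    return {
        A: {ra: t / min(by_ra.values()) for ra, t in by_ra.items()}
        for A, by_ra in times.items()
    }


def profile(ratios, ra, tau):
    """rho(tau): fraction of matrices whose ratio under ra is at most tau."""
    hits = sum(1 for by_ra in ratios.values() if by_ra[ra] <= tau)
    return hits / len(ratios)


# Placeholder timings for three hypothetical matrices and two orderings.
times = {
    "m1": {"brgc": 1.0, "orig": 2.0},
    "m2": {"brgc": 3.0, "orig": 3.0},
    "m3": {"brgc": 4.0, "orig": 2.0},
}
ratios = performance_ratios(times)
rho_at_1 = profile(ratios, "brgc", 1.0)
rho_at_2 = profile(ratios, "brgc", 2.0)
```

The value at τ = 1 is the fraction of matrices on which an ordering is the fastest (or tied), and the curve reaches 1 once τ covers its worst ratio.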


Conclusion

If the nonzeros of a sparse matrix are very sparsely distributed, or the number of nonzero blocks is very high, then permuting the rows or columns of that sparse matrix is necessary.

The fixed-size block storage scheme performs better than the CRS and BCRS schemes.

We found that BRGC ordering is competitive with other column ordering algorithms for sparse matrix-vector multiplication.


Future direction

Applicability of BRGC ordering to other sparse matrix problems requires further investigation.

Using register blocking and cache blocking methods in sparse matrix-vector multiplication in addition to BRGC ordering.

Applying BRGC ordering in fixed-size block storage schemes (both rows and columns) of sparse matrices.


Thank you