Page 1:

Lecture 9: Dense Matrices and Decomposition

1

CSCE 569 Parallel Computing

Department of Computer Science and Engineering
Yonghong Yan

[email protected]
http://cse.sc.edu/~yanyh

Page 2:

Review: Parallel Algorithm Design and Decomposition

• Introduction to Parallel Algorithms
  – Tasks and Decomposition
  – Processes and Mapping

• Decomposition Techniques
  – Recursive Decomposition
  – Data Decomposition
  – Exploratory Decomposition
  – Hybrid Decomposition

• Characteristics of Tasks and Interactions
  – Task Generation, Granularity, and Context
  – Characteristics of Task Interactions

2

Page 3:

Decomposition, Tasks, and Dependency Graphs

• Decompose work into tasks that can be executed concurrently
• Decomposition can be done in many different ways
• Tasks may be of the same, different, or even indeterminate sizes
• Task dependency graph:
  – node = task
  – edge = control dependence, output-input dependency
  – No dependency == parallelism

3

Page 4:

Degree of Concurrency

• Definition: the number of tasks that can be executed in parallel
• May change over program execution

• Metrics
  – Maximum degree of concurrency
    • Maximum number of concurrent tasks at any point during execution
  – Average degree of concurrency
    • The average number of tasks that can be processed in parallel over the execution of the program

• Speedup: serial_execution_time / parallel_execution_time
• Inverse relationship between degree of concurrency and task granularity
  – Task granularity ↑ (fewer tasks): degree of concurrency ↓
  – Task granularity ↓ (more tasks): degree of concurrency ↑

4

Page 5:

Critical Path Length

• A directed path: a sequence of tasks that must be serialized
  – Executed one after another

• Critical path:
  – The longest weighted path through the graph

• Critical path length: shortest time in which the program can be finished
  – Lower bound on parallel execution time

5

[Figure: task dependency graph of a building project]

Page 6:

Critical Path Length and Degree of Concurrency

Database query task dependency graphs

Questions:
• What are the tasks on the critical path for each dependency graph?
• What is the shortest parallel execution time?
• How many processors are needed to achieve the minimum time?
• What is the maximum degree of concurrency?
• What is the average parallelism (average degree of concurrency)?
  – Total amount of work / (critical path length) = 2.33 (63/27) and 1.88 (64/34) for the two graphs

6

Page 7:

Task Interaction Graphs, Granularity, and Communication

• Finer task granularity → more overhead of task interactions
  – Overhead as a ratio of the useful work of a task

• Example: sparse matrix-vector product interaction graph

• Assumptions:
  – each dot product term (A[i][j]*b[j]) takes unit time to process
  – each communication (edge) causes an overhead of a unit time

• If node 0 is a task: communication = 3; computation = 4
• If nodes 0, 4, and 5 are a task: communication = 5; computation = 15
  – coarser-grain decomposition → smaller communication/computation ratio (3/4 vs 5/15)

7

Page 8:

Processes and Mapping

A good mapping must minimize parallel execution time by:

• Mapping independent tasks to different processes
  – Maximize concurrency

• Giving tasks on the critical path high priority for being assigned to processes

• Minimizing interaction between processes
  – mapping tasks with dense interactions to the same process

• Difficulty: these criteria often conflict with each other
  – E.g. no decomposition, i.e. one task, minimizes interaction but gives no speedup at all!

8

Page 9:

Recursive Decomposition: Min

9

/* Serial minimum of an array of n elements */
int serial_min(int A[], int n) {
    int min = A[0];
    for (int i = 1; i < n; i++)
        if (A[i] < min) min = A[i];
    return min;
}

/* Recursive (divide-and-conquer) minimum */
int recursive_min(int A[], int n) {
    if (n == 1) return A[0];
    int lmin = recursive_min(A, n / 2);
    int rmin = recursive_min(&A[n / 2], n - n / 2);
    return (lmin < rmin) ? lmin : rmin;
}

Finding the minimum in a vector using divide-and-conquer

Applicable to other associative operations, e.g. sum, AND, ...
Known as a reduction operation
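Below is a minimal OpenMP sketch (not from the slides) of the same reduction pattern applied with the min operator; the function name parallel_min is illustrative, and it assumes an OpenMP version that supports the min reduction operator (3.1 or later).

#include <omp.h>
#include <limits.h>

/* Parallel min via an OpenMP reduction clause: each thread keeps a
   private running minimum and the runtime combines them at the end. */
int parallel_min(const int *A, int n) {
    int result = INT_MAX;
    #pragma omp parallel for reduction(min:result)
    for (int i = 0; i < n; i++)
        if (A[i] < result) result = A[i];
    return result;
}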

Page 10:

Data Decomposition -- The Most Commonly Used Approach

• Steps:
  1. Identify the data on which computations are performed.
  2. Partition this data across various tasks.

• Partitioning induces a decomposition of the problem, i.e. the computation is partitioned

• Data can be partitioned in various ways
  – Critical for parallel performance

• Decomposition based on
  – output data
  – input data
  – input + output data
  – intermediate data

10

Page 11:

Output Data Decomposition: Example

Count the frequency of item sets in database transactions

11

• Decompose the item sets to count
  – each task computes the total count for each of its item sets
  – append the per-task counts to produce the total count result

Page 12:

Input Data Partitioning: Example

12

Page 13:

Dense Matrix Algorithms

• Dense linear algebra and BLAS
• Image processing / stencil
• Iterative methods

13

Page 14:

Motifs

The Motifs (formerly "Dwarfs") from "The Berkeley View" (Asanovic et al.) form key computational patterns

14

The Landscape of Parallel Computing Research: A View from Berkeley
http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf

Page 15:

Dense Linear Algebra

• Software libraries for solving linear systems

• BLAS (Basic Linear Algebra Subprograms)
  – Vector, matrix-vector, matrix-matrix operations

• Linear systems: Ax = b
• Least squares: choose x to minimize ||Ax - b||₂
  – Overdetermined or underdetermined
  – Unconstrained, constrained, weighted

• Eigenvalues and eigenvectors of symmetric matrices
  – Standard (Ax = λx), generalized (Ax = λBx)

• Eigenvalues and eigenvectors of unsymmetric matrices
  – Eigenvalues, Schur form, eigenvectors, invariant subspaces
  – Standard, generalized

• Singular values and vectors (SVD)
  – Standard, generalized

• Different matrix structures
  – Real, complex; symmetric, Hermitian, positive definite; dense, triangular, banded, ...

• Level of detail
  – Simple drivers
  – Expert drivers with error bounds, extra precision, other options
  – Lower-level routines ("apply a certain kind of orthogonal transformation", matmul, ...)

15

Page 16:

BLAS (Basic Linear Algebra Subprograms)

• BLAS 1, 1973-1977
  – 15 operations (mostly) on vectors (1-d arrays)
    • "AXPY" (y = α·x + y), dot product, scale (x = α·x)
  – Up to 4 versions of each (S/D/C/Z), 46 routines, 3300 LOC
  – Why BLAS 1? They do O(n¹) ops on O(n¹) data: AXPY
    • 2n flops on 3n reads/writes
    • Computational intensity = (2n)/(3n) = 2/3

16

Page 17:

BLAS 2

• BLAS 2, 1984-1986
  – 25 operations (mostly) on matrix/vector pairs
  – "GEMV": y = α·A·x + β·y, "GER": A = A + α·x·yᵀ, x = T⁻¹·x
  – Up to 4 versions of each (S/D/C/Z), 66 routines, 18K LOC

• Why BLAS 2? They do O(n²) ops on O(n²) data
  – Computational intensity still just ~(2n²)/(n²) = 2

17

[Figure: matrix-vector multiplication]

Page 18:

BLAS 3

• BLAS 3, 1987-1988
  – 9 operations (mostly) on matrix/matrix pairs
    • "GEMM": C = α·A·B + β·C, C = α·A·Aᵀ + β·C, B = T⁻¹·B
  – Up to 4 versions of each (S/D/C/Z), 30 routines, 10K LOC
  – Why BLAS 3? They do O(n³) ops on O(n²) data
    • Computational intensity (2n³)/(4n²) = n/2 – big at last!
    • Good for machines with caches, deep memory hierarchy

18

A[M][K] * B[K][N] = C[M][N]

Page 19:

Decomposition for AXPY, Matrix-Vector, and Matrix Multiplication

19

Page 20:

BLAS 1: AXPY

• y = α·x + y
  – x and y are vectors of size N
    • In C: x[N], y[N]
  – α is a scalar

• Decomposition is simple
  – N iterations (N elements of x and y) are distributed among threads
  – 1:1 mapping between iterations and elements of x and y
  – x and y are shared
  – A parallel sketch is shown below

20

[Figure: iterations distributed among threads T0 and T1 in chunks of 3 (chunk=3)]
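A minimal OpenMP sketch of the AXPY decomposition described on this slide (the function name and float element type are my choices, not from the slides):

#include <omp.h>

/* y = alpha*x + y: the N independent iterations are distributed among
   threads; x and y are shared, and each iteration touches one element. */
void axpy(int n, float alpha, const float *x, float *y) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}

Adding schedule(static, 3) to the pragma would hand iterations out in chunks of 3, matching the chunk=3 distribution in the figure.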

Page 21:

BLAS 2: Matrix-Vector Multiplication

• y = A·x
  – A[M][N], x[N], y[M]

• Row-wise decomposition (a sketch follows below)

21

[Code sketch on slide: each thread computes a block of Mt rows starting at i_start]
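A hedged sketch of the row-wise decomposition, reusing the slide's Mt and i_start names for the per-thread row block; the function signature and C99 variable-length-array parameters are my own choices:

#include <omp.h>

/* y = A*x with row-wise decomposition: each thread computes a
   contiguous block of rows. A is stored row-major as A[M][N]. */
void matvec_row1d(int M, int N, const float A[][N], const float *x, float *y) {
    #pragma omp parallel
    {
        int nthreads = omp_get_num_threads();
        int tid = omp_get_thread_num();
        int Mt = (M + nthreads - 1) / nthreads;      /* rows per thread      */
        int i_start = tid * Mt;
        int i_end = (i_start + Mt < M) ? i_start + Mt : M;
        for (int i = i_start; i < i_end; i++) {
            float sum = 0.0f;
            for (int j = 0; j < N; j++)
                sum += A[i][j] * x[j];
            y[i] = sum;
        }
    }
}

The same decomposition can be written more compactly with #pragma omp parallel for over the row loop; the explicit form above just makes Mt and i_start visible.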

Page 22:

BLAS 3: Dense Matrix Multiplication

22

A[M][K] * B[K][N] = C[M][N]

• Base
• Base_1: column-major order of access
• row1D_dist
• column1D_dist
• rowcol2D_dist

• Decomposition is to calculate Mt and Nt

[Figure: M, K, and N dimensions of the matrices]

Page 23:

BLAS 3: Dense Matrix Multiplication

23

• Row-based 1-D decomposition (a sketch follows below)

[Figure: A x B = C with the rows of A and C partitioned among threads T0-T3]
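A minimal sketch of the row-based 1-D decomposition (the function name and types are illustrative, not the course's matmul code):

#include <omp.h>

/* C = A*B with row-wise 1-D decomposition: the M rows of C (and A)
   are split among threads; B is read by every thread. */
void matmul_row1d(int M, int K, int N,
                  const float A[][K], const float B[][N], float C[][N]) {
    #pragma omp parallel for
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++) {
            float sum = 0.0f;
            for (int k = 0; k < K; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
}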

Page 24:

BLAS 3: Dense Matrix Multiplication

24

• Column-based 1-D decomposition

[Figure: A x B = C with the columns of B and C partitioned among threads T0-T3]

Page 25:

BLAS 3: Dense Matrix Multiplication

25

• Row/column-based 2-D decomposition (see the sketch below)

• Needs nested parallelism
  – export OMP_NESTED=true
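One possible sketch of a row/column 2-D decomposition using nested parallel regions, as the slide's OMP_NESTED=true setting suggests; the two-by-two thread split and the function name are my assumptions:

#include <omp.h>

/* C = A*B with a 2-D (row x column) decomposition using nested OpenMP
   parallel regions: the outer loop splits rows of C among 2 threads,
   the nested region splits columns among another 2 threads. Requires
   nested parallelism to be enabled (e.g. OMP_NESTED=true). */
void matmul_rowcol2d(int M, int K, int N,
                     const float A[][K], const float B[][N], float C[][N]) {
    #pragma omp parallel for num_threads(2)       /* split rows    */
    for (int i = 0; i < M; i++) {
        #pragma omp parallel for num_threads(2)   /* split columns */
        for (int j = 0; j < N; j++) {
            float sum = 0.0f;
            for (int k = 0; k < K; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
    }
}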

Page 26:

Dense Matrix Algorithms

• Dense linear algebra and BLAS
• Image processing / stencil
• Iterative methods

26

Page 27:

What is Multimedia?

• Multimedia is a combination of text, graphics, sound, animation, and video that is delivered interactively to the user by electronic or digitally manipulated means.

https://en.wikipedia.org/wiki/Multimedia

Videos contain frames (images)

Page 28:

Image Format and Processing

• Pixels
  – Images are matrices of pixels

• Binary images
  – Each pixel is either 0 or 1

Page 29:

Image Format and Processing

• Pixels
  – Images are matrices of pixels

• Grayscale images
  – Each pixel value normally ranges from 0 (black) to 255 (white)
  – 8 bits per pixel

Page 30:

Image Format and Processing

• Pixels
  – Images are matrices of pixels

• Color images
  – Each pixel has three/four values (4 bits or 8 bits each), each representing a color scale

Page 31:

Histogram

• An image histogram is a graph of pixel intensity (on the x-axis) versus number of pixels (on the y-axis). The x-axis has all available gray levels, and the y-axis indicates the number of pixels that have a particular gray-level value.

31

https://www.allaboutcircuits.com/technical-articles/image-histogram-characteristics-machine-learning-image-processing/

Page 32:

Histograms of a Monochrome Image

32

http://homepages.inf.ed.ac.uk/rbf/BOOKS/PHILLIPS/cips2edsrc/HIST.C

Page 33:

Histogram of Color Images

• Image density

33

https://docs.opencv.org/3.4.0/d3/dc1/tutorial_basic_linear_transform.html

Page 34:

OpenMP Parallelization of Histogram

• Decomposition based on output (pixel values, 0-255)
  – Each thread searches the whole image but counts only the pixels with the values it is responsible for
    • E.g. with 4 threads: 0-63 for thread 0, 64-127 for thread 1, ...

• Decomposition based on the input (image)
  – Each thread searches part of the image, counting all the pixels it sees into a local partial histogram
  – Add up all the partial histograms

A sketch of the input decomposition is shown after this list.

34
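A minimal sketch of the input-based decomposition with per-thread partial histograms (an 8-bit grayscale image is assumed; the function name is illustrative):

#include <omp.h>
#include <string.h>

/* Histogram of an 8-bit grayscale image using input decomposition:
   each thread scans part of the image into a private partial
   histogram, then the partial histograms are added up. */
void histogram(const unsigned char *image, int num_pixels, int hist[256]) {
    memset(hist, 0, 256 * sizeof(int));
    #pragma omp parallel
    {
        int local[256] = {0};                /* per-thread partial histogram */
        #pragma omp for nowait
        for (int i = 0; i < num_pixels; i++)
            local[image[i]]++;
        #pragma omp critical                 /* combine partial histograms   */
        for (int b = 0; b < 256; b++)
            hist[b] += local[b];
    }
}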

Page 35:

Image Filtering

• Changing pixel values by doing a convolution between a kernel (filter) and an image.

Page 36:

Image Filtering: The Magic of the Filter Matrix

• http://lodev.org/cgtutor/filtering.html
• https://en.wikipedia.org/wiki/Kernel_(image_processing)

• It is the basis of convolutional neural networks

Page 37:

Convolutional Neural Networks for Object Detection

• Pooling: a sample-based discretization process

37

http://cs231n.github.io/convolutional-networks/

Page 38:

OpenMP Parallelization of Image Filtering

• Decomposition according to the input image
• Since the input and output images are separate, parallelization is straightforward
  – Could be row 1-D, column 1-D, or row-column 2-D

• Watch for false sharing when writing the boundaries of the output image

A sketch of a row-wise parallel filter is shown below.

38
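A hedged sketch of a row-wise (row 1-D) parallel convolution; the fixed 3x3 kernel, the border handling, and the names are my simplifications:

#include <omp.h>

/* 3x3 convolution of a grayscale image, with rows of the output
   distributed among threads (row 1-D decomposition). Border pixels
   are skipped for simplicity. */
void filter3x3(int H, int W, const float in[][W], float out[][W],
               const float k[3][3]) {
    #pragma omp parallel for
    for (int i = 1; i < H - 1; i++)
        for (int j = 1; j < W - 1; j++) {
            float sum = 0.0f;
            for (int di = -1; di <= 1; di++)
                for (int dj = -1; dj <= 1; dj++)
                    sum += k[di + 1][dj + 1] * in[i + di][j + dj];
            out[i][j] = sum;
        }
}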

Page 39:

Dense Matrix Algorithms

• Dense linear algebra and BLAS
• Image processing / stencil
• Iterative methods

39

Page 40:

Iterative Methods

• Iterative methods can be expressed in the general form: x(k) = F(x(k-1))

  Hopefully: x(k) → s (the solution of my problem)

• Wide variety of computational science problems
  – CFD, molecular dynamics, weather/climate forecast, cosmology, ...

• Will it converge? How rapidly?

Page 41:

Iterative Stencil Applications

Loop until some condition is true:

  Perform computation, which involves communicating with the N, E, W, S neighbors of a point (5-point stencil)

  [Convergence test?]

A stencil is similar to image filtering/convolution

x(k) = F(x(k-1))

Page 42:

jacobi.c

• Assignments 2 and 3:

42

https://passlab.github.io/CSCE569/Assignment_2/jacobi.c

Page 43:

Jacobi

• An iterative method for approximating the solution to a system of linear equations.

• Ax = b, where the i-th equation is

  a_{i,1} x_1 + a_{i,2} x_2 + ... + a_{i,n} x_n = b_i

• The a's and b's are known; we want to solve for the x's. Each Jacobi iteration updates

  x_i = (1/a_{i,i}) [ b_i − Σ_{j≠i} a_{i,j} x_j ]

Page 44:

OpenMP Parallelization of Jacobi

• Similar to image filtering
  – Enclosed by a while loop to make it iterative

• omp parallel for the outer while loop
• omp for for the inner for loops
• single and reduction are needed

A sketch of this structure is shown below.

44
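A minimal sketch (not the assignment's jacobi.c) of the structure listed above: one parallel region encloses the while loop, omp for splits the sweeps, a reduction accumulates the error, and single guards the serial updates; the 4-point averaging stencil, array names, and convergence test are illustrative assumptions.

#include <omp.h>
#include <math.h>

/* The parallel region encloses the iterative while loop; every thread
   evaluates the loop condition on the shared error and iteration count. */
void jacobi(int n, int m, double u[n][m], double unew[n][m],
            double tol, int max_iters) {
    double error = tol + 1.0;
    int k = 0;
    #pragma omp parallel
    {
        while (error > tol && k < max_iters) {
            #pragma omp barrier              /* all threads have read the condition */
            #pragma omp single
            error = 0.0;                     /* reset before the reduction          */
            #pragma omp for reduction(+:error)
            for (int i = 1; i < n - 1; i++)
                for (int j = 1; j < m - 1; j++) {
                    unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                                       + u[i][j-1] + u[i][j+1]);
                    double d = unew[i][j] - u[i][j];
                    error += d * d;
                }
            #pragma omp for                  /* copy back for the next sweep        */
            for (int i = 1; i < n - 1; i++)
                for (int j = 1; j < m - 1; j++)
                    u[i][j] = unew[i][j];
            #pragma omp single
            { error = sqrt(error); k++; }    /* implicit barrier follows            */
        }
    }
}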

Page 45:

Ghost Cell Exchange

• For Assignment 3:

45

Page 46:

Background: C Multidimensional Arrays

46

Page 47:

Vectors/Matrices and Arrays in C

• C has row-major storage for multidimensional arrays
  – A[2][2] is followed by A[2][3] in memory

• 3-dimensional array
  – B[3][100][100]

• Think of it as a recursive definition
  – A[4][10][32]

47

[Figure: memory layout of char A[4][4]]

Page 48:

Column Major

Fortran is column major

48

Page 49:

Array Layout: Why We Care?

1. It makes a big difference for access speed
• For performance, set up code to go in row-major order in C
  – Caching: each read from memory brings adjacent elements into the cache line
• (Bad) example: 4 vs 16 accesses
  – matmul_base_1

49

/* Bad: column-major traversal of a row-major C array (strided access) */
for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
        A[j][i] = value;
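For contrast, a sketch of the cache-friendly order, where the inner loop walks contiguous row-major memory:

/* Good: row-major traversal; consecutive j accesses hit the same cache line */
for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
        A[i][j] = value;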

Page 50:

Array Layout: Why We Care?

2. It affects decomposition and data movement
• Decomposition may create submatrices that are in non-contiguous memory locations, e.g. A3 and B1
• Submatrices in contiguous memory locations of a 2-D row-major matrix:
  – A single-row submatrix, e.g. A2
  – A submatrix formed from adjacent rows with full column length, e.g. A1

50

[Figure: submatrices A1, A2, A3, and B1 of a 2-D row-major matrix]

Page 51:

Array Layout: Why We Care?

2. It affects decomposition and submatrices
• Row- or column-wise distribution of a 2-D row-major array
• Number of data movements to exchange data between T0 and T1:
  – Row-wise: one memory copy each
  – Column-wise: 16 copies each

51

[Figure: row-wise vs column-wise distribution of a 2-D array among threads T0-T3]

Page 52:

Arrays and Pointers in C

• In C, an array is a pointer + dimensionality
  – They are literally the same in binary, i.e. a pointer to the first element, referenced as the base address
• Cast and assignment from array to pointer, e.g. int A[M][N]
  – A, &A[0][0], and A[0] have the same value, i.e. the pointer to the first element of the array
• Cast a pointer to an array
  – int *ap; int (*A)[N] = (int (*)[N])ap; then A[i][j] ...
• Address calculation for array references
  – Address of A[i][j] = A + (i*N + j)*sizeof(int)

52
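A small self-contained demo of these points (the sizes M and N and the variable names are illustrative):

#include <stdio.h>

#define M 3
#define N 4

int main(void) {
    int A[M][N];
    int *ap = &A[0][0];               /* flat pointer to the first element */
    int (*B)[N] = (int (*)[N])ap;     /* cast the pointer back to an array */

    /* A, &A[0][0], and A[0] all refer to the same base address */
    printf("%p %p %p\n", (void *)A, (void *)&A[0][0], (void *)A[0]);

    /* Row-major address calculation: &A[i][j] == base + (i*N + j) elements */
    int i = 1, j = 2;
    printf("%d\n", &A[i][j] == ap + i * N + j);   /* prints 1 */
    printf("%d\n", &B[i][j] == &A[i][j]);         /* prints 1 */
    return 0;
}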