
Parallel Cholesky Decomposition in Julia (6.338 Project)courses.csail.mit.edu/18.337/2011/projects/Mysore_report.pdf · Parallel Cholesky Decomposition in Julia (6.338 Project) Omar


Parallel Cholesky Decomposition in Julia (6.338 Project)

Omar Mysore

December 16, 2011

Project Introduction

Sparse matrices dominate numerical simulation and technical computing. Their applications are practically limitless, ranging from solving partial differential equations to convex optimization to big data. Because of the wide use of sparse matrices, the objective of this project was to implement sparse Cholesky decomposition in parallel using the Julia language. In addition to developing the Cholesky decomposition feature in the Julia language and investigating the effects of parallelization, this project served the purpose of aiding the development of sparse matrix support in Julia.

A working sparse parallel Cholesky decomposition solver was developed. Although the current implementation is limited in speed and capability, it is hoped that it can serve as a basis for further development. The remainder of this report discusses the basics of Cholesky decomposition and the algorithm used, the opportunities and methods of parallelization, the results, and the potential for future work.

Cholesky Decomposition

The Cholesky decomposition of a symmetric positive definite matrix A determines the lower-triangular matrix L, where LL^T = A. Although it is limited to symmetric positive definite matrices, these matrices often appear in fields such as convex optimization. The following equations are taken from [2].

The basic dense Cholesky decomposition algorithm consists of repeatedly performing the factorization below on a matrix A of size n, where d is a scalar, v is an (n-1)-by-1 column vector, and C is the remaining (n-1)-by-(n-1) block.
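For reference, the standard form of this step, written with the symbols defined above, is sketched here. It is a reconstruction from the surrounding text rather than a copy of the report's own typesetting, and the name C for the trailing block is taken from the next paragraph.

```latex
A = \begin{pmatrix} d & v^T \\ v & C \end{pmatrix}
  = \begin{pmatrix} \sqrt{d} & 0 \\ v/\sqrt{d} & I \end{pmatrix}
    \begin{pmatrix} 1 & 0 \\ 0 & C - \dfrac{v v^T}{d} \end{pmatrix}
    \begin{pmatrix} \sqrt{d} & v^T/\sqrt{d} \\ 0 & I \end{pmatrix}
```

Multiplying the three factors back together gives d, v^T and v in the first row and column, and (C - vv^T/d) + vv^T/d = C in the trailing block.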

Once this factorization is completed, the first column of L is determined. Then the same factorization is performed on C - vv^T/d, and the process is repeated until all of the columns of L are found. Similarly, the same process can be done for block matrices:
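A block version consistent with the scalar step can be written as follows. This is a sketch under assumed notation: B is the leading diagonal block, V the block below it, and L_B the Cholesky factor of B (so B = L_B L_B^T). These symbols are introduced here for illustration.

```latex
A = \begin{pmatrix} B & V^T \\ V & C \end{pmatrix}
  = \begin{pmatrix} L_B & 0 \\ V L_B^{-T} & I \end{pmatrix}
    \begin{pmatrix} I & 0 \\ 0 & C - V B^{-1} V^T \end{pmatrix}
    \begin{pmatrix} L_B^T & L_B^{-1} V^T \\ 0 & I \end{pmatrix}
```

With B reduced to the scalar d and V to the vector v, this is the same as the scalar factorization.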

Cholesky Decomposition for Sparse Matrices

For sparse matrices, several additional steps are taken in order to take advantage of the substantially fewer nonzero values. First, the fill-ins and the structure of L are determined without calculating the values. Next, the tree of dependencies is found, and finally the values of L are calculated. For a detailed explanation of the method summarized in this report, see [2].


The first step in sparse Cholesky decomposition is to determine the structure of L without explicitly determining L. All of the following images are taken from John Gilbert's slides [1]. Suppose A has the structure below:

Then, the goal would be to determine the structure of L, which is shown below:

The red dots are known as the fill-ins, since these values are nonzero in L but zero in A. Although the structure of L is known, none of the specific values are known. In order to determine the structure of L, the graphs of the matrices are used. Below is the graph of the previously shown matrix A:


From this graph, the graph of L can easily be determined by connecting the higher-numbered neighbors of each node/column in the graph. Below is the graph of L with the fill-ins in red:
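In code, the rule just described can be sketched in a few lines. The following Python sketch is only an illustration and is not the report's Julia code: the function name `symbolic_fill`, the 0-based indexing, the input format and the 4-node example graph are all assumptions made here. Columns are visited in increasing order, and the higher-numbered neighbors of each column are connected pairwise. Every edge added this way is a fill-in.

```python
def symbolic_fill(n, lower_edges):
    """Return (cols, fill) for an n-by-n symmetric sparsity pattern.

    lower_edges holds (i, j) pairs with i > j: the off-diagonal nonzeros
    in the lower triangle of A (0-based, hypothetical input format).
    cols[j] is the set of off-diagonal row indices in column j of L.
    """
    cols = [set() for _ in range(n)]
    for i, j in lower_edges:
        cols[j].add(i)

    fill = []
    for j in range(n):                    # visit columns in increasing order
        nbrs = sorted(cols[j])            # higher-numbered neighbors of j
        for pos, a in enumerate(nbrs):
            for b in nbrs[pos + 1:]:      # connect every pair (a, b) with a < b
                if b not in cols[a]:
                    cols[a].add(b)
                    fill.append((b, a))   # entry (b, a) of L is a fill-in
    return cols, fill


# A 4-cycle 0-1-2-3-0: eliminating node 0 connects its neighbors 1 and 3.
cols, fill = symbolic_fill(4, [(1, 0), (3, 0), (2, 1), (3, 2)])
print(fill)
```

Because fill-ins created by an early column are visible when later columns are visited, this serial version also captures fill that cascades from one column to the next.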

Once the structure of L is determined, the dependency tree can be determined. In order to calculate the dependency tree, the parent of each node must be determined. The parent of a given node/column is the minimum row index of a nonzero value in the given column of L, not including the diagonal value. For example, for the graph of L previously shown, the dependency tree is shown below:
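The parent rule translates almost directly into code. The sketch below is a hypothetical Python illustration, not the report's `pary` function: it uses 0-based indices and marks a root with `None`, whereas the Julia code in Appendix B assigns the last column as the parent when a column has no off-diagonal nonzero. The input is the column structure of L, one set of off-diagonal row indices per column.

```python
def elimination_parents(cols):
    """cols[j] is the set of off-diagonal row indices in column j of L.

    The parent of column j is the smallest such row index; a column with
    no off-diagonal nonzero is treated as a root (None).
    """
    return [min(rows) if rows else None for rows in cols]


def children_of(parents):
    """Invert the parent array into a list of children per node."""
    kids = [[] for _ in parents]
    for j, p in enumerate(parents):
        if p is not None:
            kids[p].append(j)
    return kids


# Structure of L for a 4-cycle graph after fill-in (assumed example).
cols = [{1, 3}, {2, 3}, {3}, set()]
parents = elimination_parents(cols)
kids = children_of(parents)
print(parents, kids)
```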


After the dependency tree is determined, the values of L can be calculated. The basic equations for determining each column of L are shown below:

For each column j in A

    Determine the frontal matrix F_j, where

    And,

    Here, T[j] - {j} represents all of the nodes below j in the tree (its children and their descendants), which must already have been determined in order to calculate L_j.
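For reference, the two relations can be written as follows. This is a reconstruction based on the definitions in [2] and on the `frontconstruct` code in Appendix B, not the report's own typesetting. The indices i_1, ..., i_r (the off-diagonal nonzero rows of column j of L) and the symbol U_j for the remaining update block are notation assumed here.

```latex
F_j =
\begin{pmatrix}
a_{jj}    & a_{j i_1} & \cdots & a_{j i_r} \\
a_{i_1 j} &           &        &           \\
\vdots    &           & 0      &           \\
a_{i_r j} &           &        &
\end{pmatrix}
- \sum_{k \in T[j]-\{j\}}
\begin{pmatrix} l_{jk} \\ l_{i_1 k} \\ \vdots \\ l_{i_r k} \end{pmatrix}
\begin{pmatrix} l_{jk} & l_{i_1 k} & \cdots & l_{i_r k} \end{pmatrix}

F_j =
\begin{pmatrix} l_{jj} & 0 \\ \ell_j & I \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & U_j \end{pmatrix}
\begin{pmatrix} l_{jj} & \ell_j^T \\ 0 & I \end{pmatrix},
\qquad
\ell_j = \begin{pmatrix} l_{i_1 j} & \cdots & l_{i_r j} \end{pmatrix}^T
```

Reading off the first column gives l_jj as the square root of the (1,1) entry of F_j and the remaining entries of column j as the rest of the first column of F_j divided by l_jj, which is what `frontconstruct` computes.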

This is the basic formulation for determining the sparse Cholesky decomposition. All equations and images used in this and the previous section were obtained from [1] and [2]. For more details see [2].
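The column-by-column calculation can be illustrated with a short serial sketch. It is written in Python with plain lists and is an assumption-laden simplification, not the report's implementation: it stores L densely and subtracts the contributions of all earlier columns rather than only those in the subtree of j. Earlier columns outside the subtree have a zero in row j, so they contribute nothing and the result is the same. Only the first column of the frontal matrix is formed, since that is all that is needed for column j of L.

```python
import math


def cholesky_columns(A):
    """Compute L with L * L^T = A, one column at a time.

    For column j, f is the first column of the frontal matrix: the entries
    A[i][j] for i >= j, minus the updates from previously computed columns.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        f = [A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
             for i in range(j, n)]
        ljj = math.sqrt(f[0])             # diagonal entry of column j
        L[j][j] = ljj
        for offset, i in enumerate(range(j + 1, n), start=1):
            L[i][j] = f[offset] / ljj     # off-diagonal entries of column j
    return L


# Small symmetric positive definite example (assumed for illustration).
A = [[4.0, 2.0, 0.0],
     [2.0, 5.0, 2.0],
     [0.0, 2.0, 5.0]]
L = cholesky_columns(A)
print(L)
```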


Parallelization and Implementation in Julia

For sparse Cholesky decomposition, there are three primary opportunities for parallelization: first, in determining the structure of L; second, in simultaneously computing columns that are independent of one another; third, in turning the addition step for each column into a parallel reduction.

In order to determine the structure of L, each column or block of columns can be sent to a different processor, and the necessary fill-ins can be determined. Once the fill-ins are determined, they can be added to L. The following Julia function demonstrates this:

    function fillinz(L)
        L = tril(L)
        L = spones(L)
        refs = [remote_call((mod(i,nprocs()))+1, pfillz, L[:,i], i) | i=1:size(L)[1]]
        for i = 1:size(L)[1]
            q = fetch(refs[i])
            for j = 1:length(q)/2
                L[q[2*j-1], q[2*j]] = 1
            end
        end
        return L
    end

The input to this function is the matrix A for which we would like the Cholesky decomposition. The vector refs sends each column to a different processor, and the for-loop obtains the results and adds the fill-ins.

The tree structure of dependencies allows for further parallelization. As previously discussed, for a matrix A, the dependency tree might look like the following figure:


In this case, columns 1, 2, 4, and 6 of L can all be determined without any other columns of L, and they can be determined simultaneously. The following for-loop, which is part of the main sparse Cholesky function, performs this process of going through the levels of the tree and sending all of the columns at the same level to different processors:

    for i = 1:size(kids)[1]
        k = []
        for j = 1:size(kids)[1]
            if tree[j] == i-1
                k = vcat(k,[j])
            end
        end
        refs = [remote_call(mod(i,nprocs())+1, spcholhelp, A, L, k[i], kids) | i=1:length(k)]
        for m = 1:length(refs)
            lcol = fetch(refs[m])
            L[:,k[m]] = lcol
        end
    end

In this for-loop, the index i loops through all of the possible values for the number of children a column can have. The vector refs contains all of the columns of L that are calculated in parallel for a given stage of the tree.
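The staging idea can be sketched independently of Julia. The Python sketch below is a hypothetical illustration: the names `descendant_counts`, `stages` and `run_by_stage` are invented here, indices are 0-based, and a thread pool stands in for the report's `remote_call` and `fetch`. Columns are grouped by how many nodes lie below them in the tree, which appears to be what the `tree` vector in Appendix B counts. Since a parent always has more nodes below it than any of its children, processing the groups in increasing order respects every dependency.

```python
from concurrent.futures import ThreadPoolExecutor


def descendant_counts(parent):
    """Count, for each node, how many nodes lie below it in the tree."""
    counts = [0] * len(parent)
    for j in range(len(parent)):
        p = parent[j]
        while p is not None:              # walk up from j to the root
            counts[p] += 1
            p = parent[p]
    return counts


def stages(parent):
    """Group columns with equal descendant counts, smallest count first."""
    by_count = {}
    for j, c in enumerate(descendant_counts(parent)):
        by_count.setdefault(c, []).append(j)
    return [by_count[c] for c in sorted(by_count)]


def run_by_stage(parent, work, max_workers=4):
    """Apply work(j) to every column, one stage at a time."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for cols in stages(parent):
            # Columns within a stage do not depend on one another.
            for j, out in zip(cols, pool.map(work, cols)):
                results[j] = out
    return results


# Assumed example tree: 0 and 1 are children of 2; 2 and 3 are children of 4.
parent = [2, 2, 4, 4, None]
print(stages(parent))
```

In this example the leaves 0, 1 and 3 form the first stage, followed by column 2 and then the root.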

The final level of parallelization is within the function that determines the values of the columns of L. During the calculation, a number of matrices equal to the number of children of the given column must be added. Rather than adding serially, this is done with a parallel for-loop. It is shown below:

    addz = @parallel (+) for i = 1:size(kids)[1]
        -L[nzs,convert(Int16,kids[i])] * L[nzs,convert(Int16,kids[i])]'
    end
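The shape of this computation, many independent outer products followed by a sum, can be sketched as a map followed by a fold. The Python below is a hypothetical illustration with invented names (`neg_outer`, `mat_add`, `update_sum`). It forms the terms concurrently with a thread pool and then adds them one after another, so it shows the structure of the reduction rather than the performance behavior of Julia's parallel for-loop.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def neg_outer(v):
    """Return -v * v^T as a list of rows."""
    return [[-a * b for b in v] for a in v]


def mat_add(X, Y):
    """Entrywise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def update_sum(child_columns, size, max_workers=4):
    """Sum of -v * v^T over the given columns; zeros if there are none."""
    zeros = [[0] * size for _ in range(size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        terms = list(pool.map(neg_outer, child_columns))  # independent terms
    return reduce(mat_add, terms, zeros)                  # combine with +


addz = update_sum([[1, 2], [3, 0]], size=2)
print(addz)
```

The explicit zero matrix plays the role of the `else` branch in `frontconstruct`, which returns zeros when a column has no children.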

Results and Discussion

The primary objective of the project was to write a function that performs Cholesky decomposition on a sparse matrix. This function is called spchol(), and its input argument is the matrix. The function appears to work for all matrices tested. A very simple example is shown:

julia> A
10x10 Float64 Array:
 121.0   33.0    0.0    0.0   33.0    0.0   55.0    0.0   44.0    0.0
  33.0   10.0    0.0    0.0   12.0    0.0   17.0    2.0   12.0    0.0
   0.0    0.0    1.0    0.0    0.0    9.0    0.0    0.0    0.0    0.0
   0.0    0.0    0.0    1.0    8.0    0.0    0.0    0.0    0.0    0.0
  33.0   12.0    0.0    8.0   83.0    0.0   23.0    6.0   12.0    2.0
   0.0    0.0    9.0    0.0    0.0  162.0   18.0   63.0    0.0    0.0
  55.0   17.0    0.0    0.0   23.0   18.0   38.0   18.0   20.0    4.0
   0.0    2.0    0.0    0.0    6.0   63.0   18.0   54.0    0.0    3.0
  44.0   12.0    0.0    0.0   12.0    0.0   20.0    0.0   17.0    0.0
   0.0    0.0    0.0    0.0    2.0    0.0    4.0    3.0    0.0   14.0

julia> u=spchol(A)
10x10 Float64 Array:
 11.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
  3.0   1.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
  0.0   0.0   1.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
  0.0   0.0   0.0   1.0   0.0   0.0   0.0   0.0   0.0   0.0
  3.0   3.0   0.0   8.0   1.0   0.0   0.0   0.0   0.0   0.0
  0.0   0.0   9.0   0.0   0.0   9.0   0.0   0.0   0.0   0.0
  5.0   2.0   0.0   0.0   2.0   2.0   1.0   0.0   0.0   0.0
  0.0   2.0   0.0   0.0   0.0   7.0   0.0   1.0   0.0   0.0
  4.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   1.0   0.0
  0.0   0.0   0.0   0.0   2.0   0.0   0.0   3.0   0.0   1.0

julia> u*u'
10x10 Float64 Array:
 121.0   33.0    0.0    0.0   33.0    0.0   55.0    0.0   44.0    0.0
  33.0   10.0    0.0    0.0   12.0    0.0   17.0    2.0   12.0    0.0
   0.0    0.0    1.0    0.0    0.0    9.0    0.0    0.0    0.0    0.0
   0.0    0.0    0.0    1.0    8.0    0.0    0.0    0.0    0.0    0.0
  33.0   12.0    0.0    8.0   83.0    0.0   23.0    6.0   12.0    2.0
   0.0    0.0    9.0    0.0    0.0  162.0   18.0   63.0    0.0    0.0
  55.0   17.0    0.0    0.0   23.0   18.0   38.0   18.0   20.0    4.0
   0.0    2.0    0.0    0.0    6.0   63.0   18.0   54.0    0.0    3.0
  44.0   12.0    0.0    0.0   12.0    0.0   20.0    0.0   17.0    0.0
   0.0    0.0    0.0    0.0    2.0    0.0    4.0    3.0    0.0   14.0

julia> A
10x10 Float64 Array:
 121.0   33.0    0.0    0.0   33.0    0.0   55.0    0.0   44.0    0.0
  33.0   10.0    0.0    0.0   12.0    0.0   17.0    2.0   12.0    0.0
   0.0    0.0    1.0    0.0    0.0    9.0    0.0    0.0    0.0    0.0
   0.0    0.0    0.0    1.0    8.0    0.0    0.0    0.0    0.0    0.0
  33.0   12.0    0.0    8.0   83.0    0.0   23.0    6.0   12.0    2.0
   0.0    0.0    9.0    0.0    0.0  162.0   18.0   63.0    0.0    0.0
  55.0   17.0    0.0    0.0   23.0   18.0   38.0   18.0   20.0    4.0
   0.0    2.0    0.0    0.0    6.0   63.0   18.0   54.0    0.0    3.0
  44.0   12.0    0.0    0.0   12.0    0.0   20.0    0.0   17.0    0.0
   0.0    0.0    0.0    0.0    2.0    0.0    4.0    3.0    0.0   14.0

Above, A and u*u' are identical.

Tests were conducted to see the effects of parallelization. All matrices were generated in the same manner, by first declaring A=tril(round(10*sprand(n,n,.3))+eye(n)); and then A=A*A'. The results are shown in the table below:

Size      Time on 1 processor   Time on 4 processors   Parallel to serial ratio
10x10     0.0025 s              0.189 s                76
50x50     0.101 s               0.52 s                 5
100x100   2.8 s                 2.8 s                  1
150x150   24.5 s                22.1 s                 0.9
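The recipe used to generate the test matrices can be mirrored outside Julia. The following Python sketch is an approximation under stated assumptions: the function name `random_spd` and the `seed` parameter are invented here, and each lower-triangular entry is made nonzero with probability 0.3 to imitate `sprand(n,n,.3)`. The identity is added to the diagonal, so the lower-triangular factor B has diagonal entries of at least 1 and is therefore nonsingular, which makes A = B*B' symmetric positive definite.

```python
import random


def random_spd(n, density=0.3, seed=0):
    """Build A = B * B^T from a random lower-triangular B with unit shift."""
    rng = random.Random(seed)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):            # lower triangle only, as with tril
            if rng.random() < density:
                B[i][j] = float(round(10 * rng.random()))
        B[i][i] += 1.0                    # the "+ eye(n)" term
    A = [[sum(B[i][k] * B[j][k] for k in range(n)) for j in range(n)]
         for i in range(n)]
    return A, B


A, B = random_spd(6)
for row in A:
    print(row)
```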


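For smaller matrices, serial execution is faster because the communication between processors dominates: at 10x10 the 4-processor run takes about 76 times as long as the serial run. As the size of the matrices increases, the parallel run improves relative to serial, reaching parity at 100x100 and a ratio of 0.9 at 150x150. More tests need to be conducted in order to understand the behavior of this function.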
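The ratio column can be re-derived from the two timing columns as the 4-processor time divided by the 1-processor time. A short Python check, using only the numbers reported in the table (the variable names are illustrative):

```python
# (time on 1 processor, time on 4 processors) in seconds, from the table.
timings = {
    "10x10": (0.0025, 0.189),
    "50x50": (0.101, 0.52),
    "100x100": (2.8, 2.8),
    "150x150": (24.5, 22.1),
}

# Parallel-to-serial ratio: values above 1 mean the parallel run was slower.
ratios = {size: t4 / t1 for size, (t1, t4) in timings.items()}

for size, ratio in ratios.items():
    print(f"{size}: {ratio:.2f}")
```

Rounded as in the table, these reproduce the reported values of 76, 5, 1 and 0.9.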

Limitations and Further Work

As stated previously, more tests need to be conducted in order to understand the performance and limitations of the spchol() function. Additionally, a major limitation is that although the algorithm is designed for sparse matrices, it currently only works for sparse matrices stored in full (dense) form (and of course for full matrices). This is also a possible cause of the steep increase in run time with matrix dimension shown in the results. Initially the algorithm was developed this way because of errors involving indexing columns of matrices that were declared as sparse. While these errors were recently fixed, the algorithm currently still does not work for sparse matrices unless they are declared as full. The next step is to debug this.

Thoughts and Questions About Julia

As stated previously, the spchol() function presented in this report currently treats all of the matrices and vectors as full. For example, the function tril() is called by spchol():


    function tril(B)
        c = zeros(size(B)[1],size(B)[1])
        for i = 1:size(B)[1]
            c[i:size(B)[1],i] = B[i:size(B)[1],i]
        end
        return c
    end

Clearly this function treats B and c as dense matrices. I believe this raises several questions. Firstly, if sparse matrix functions are implemented in Julia, should they be designed to work for both dense and sparse matrices? For example, if tril assumed B was sparse and accessed B.nzval, then it could never work for dense matrices. Another question this raises is: what would cause a function designed for dense matrices not to work for a sparse matrix? If I run the spchol() function on a sparse matrix that is declared as a sparse matrix, I get incorrect results or errors. It only works if I make the sparse matrix full. This is a bit problematic and should be addressed in the future. While attempting to debug this, I found an interesting result:

julia> A
10-by-10 sparse matrix with 21 nonzeros:
    [1, 1]   =  6.0
    [2, 1]   =  9.0
    [3, 1]   =  11.0
    [7, 1]   =  1.0
    [9, 1]   =  2.0
    [2, 2]   =  1.0
    [6, 2]   =  15.0
    [3, 3]   =  1.0
    [6, 3]   =  6.0
    [10, 3]  =  5.0
    [4, 4]   =  1.0
    [8, 4]   =  3.0
    [5, 5]   =  1.0
    [6, 5]   =  6.0
    [6, 6]   =  9.0
    [7, 7]   =  1.0
    [8, 7]   =  12.0
    [8, 8]   =  3.0
    [9, 8]   =  1.0
    [9, 9]   =  9.0
    [10, 10] =  1.0

julia> full(A*A')
10x10 Float64 Array:
  36.0   54.0   66.0    0.0    0.0    0.0    6.0    0.0   12.0    0.0
  54.0   82.0   99.0    0.0    0.0   15.0    9.0    0.0   18.0    0.0
  66.0   99.0  122.0    0.0    0.0    6.0    0.0    0.0    0.0    0.0
   0.0    0.0    0.0    1.0    0.0    0.0    0.0    0.0    0.0    0.0
   0.0    0.0    0.0    0.0    1.0    0.0    0.0    0.0    0.0    0.0
   0.0   15.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
   6.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
   0.0    0.0    0.0    3.0    0.0    0.0    0.0    0.0    0.0    0.0
  12.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
   0.0    0.0    5.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0

julia> full(A)*full(A')
10x10 Float64 Array:
  36.0   54.0   66.0    0.0    0.0    0.0    6.0    0.0   12.0    0.0
  54.0   82.0   99.0    0.0    0.0   15.0    9.0    0.0   18.0    0.0
  66.0   99.0  122.0    0.0    0.0    6.0   11.0    0.0   22.0    5.0
   0.0    0.0    0.0    1.0    0.0    0.0    0.0    3.0    0.0    0.0
   0.0    0.0    0.0    0.0    1.0    6.0    0.0    0.0    0.0    0.0
   0.0   15.0    6.0    0.0    6.0  378.0    0.0    0.0    0.0   30.0
   6.0    9.0   11.0    0.0    0.0    0.0    2.0   12.0    2.0    0.0
   0.0    0.0    0.0    3.0    0.0    0.0   12.0  162.0    3.0    0.0
  12.0   18.0   22.0    0.0    0.0    0.0    2.0    3.0   86.0    0.0
   0.0    0.0    5.0    0.0    0.0   30.0    0.0    0.0    0.0   26.0

Upon inspection, full(A*A') is not equal to (full(A))*(full(A))'. These should be equal, since both expressions compute the same product, but something seems to be causing a problem in the sparse case.
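One way to check which of the two results is the expected one is to recompute the product independently. The Python sketch below is an illustration with invented helper names (`sparse_aat`, `dense_aat`). It stores the 21 nonzeros listed above in a dictionary keyed by the 1-based (row, column) pairs as printed, forms A*A' once from the sparse entries and once from a dense copy, and compares the two.

```python
# Nonzeros of A as printed above, keyed by 1-based (row, column).
entries = {
    (1, 1): 6.0, (2, 1): 9.0, (3, 1): 11.0, (7, 1): 1.0, (9, 1): 2.0,
    (2, 2): 1.0, (6, 2): 15.0, (3, 3): 1.0, (6, 3): 6.0, (10, 3): 5.0,
    (4, 4): 1.0, (8, 4): 3.0, (5, 5): 1.0, (6, 5): 6.0, (6, 6): 9.0,
    (7, 7): 1.0, (8, 7): 12.0, (8, 8): 3.0, (9, 8): 1.0, (9, 9): 9.0,
    (10, 10): 1.0,
}


def sparse_aat(entries):
    """A * A^T from nonzeros only: entries sharing a column k contribute
    A[i][k] * A[j][k] to position (i, j)."""
    by_col = {}
    for (i, k), v in entries.items():
        by_col.setdefault(k, []).append((i, v))
    out = {}
    for col in by_col.values():
        for i, vi in col:
            for j, vj in col:
                out[(i, j)] = out.get((i, j), 0.0) + vi * vj
    return out


def dense_aat(entries, n):
    """A * A^T computed from a dense copy of A."""
    A = [[0.0] * n for _ in range(n)]
    for (i, k), v in entries.items():
        A[i - 1][k - 1] = v
    return [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


S = sparse_aat(entries)
D = dense_aat(entries, 10)
agree = all(S.get((i + 1, j + 1), 0.0) == D[i][j]
            for i in range(10) for j in range(10))
print(agree, D[5][5])
```

In this sketch both routes give the same matrix, and its (6, 6) entry is 378, which agrees with the full(A)*full(A') output above rather than with full(A*A').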


Conclusion

The function spchol() is presented in this report. As stated in the previous section, some work and testing still need to be done. I would be happy to contribute in any way possible.

References

[1] John Gilbert's slides from Day 1 of Sparse Matrix Days at MIT. Available at http://www.cs.ucsb.edu/~gilbert/talks/talks.htm.

[2] Liu, Joseph W. H. "The Multifrontal Method for Sparse Matrix Solution: Theory and Practice." SIAM Review, Vol. 34, No. 1 (Mar., 1992), pp. 82-109.

Acknowledgements

I would like to thank Alan Edelman, Jeff Bezanson, and Viral Shah. Additionally, I would like to thank everyone who has been developing the Julia language.

Appendix A: Running the code

All of the code is found in spcholp.j and must be run with @everywhere load("spcholp.j"). To find the Cholesky decomposition of A, run spchol(A).


Appendix B: The Code for spcholp.j:

    function spones(A)
        A = sparse(A)
        A.nzval = ones(length(A.nzval),)
        return full(A)
    end

    function tril(B)
        c = zeros(size(B)[1],size(B)[1])
        for i = 1:size(B)[1]
            c[i:size(B)[1],i] = B[i:size(B)[1],i]
        end
        return c
    end

    @everywhere function pfillz(b,i)
        r = []
        for j = 1:(length(b))
            if b[j] == 1
                for k = 1:(length(b))
                    if (j>i) && (k>j) && b[k] == 1
                        r = vcat(r,[k,j])
                    end
                end
            end
        end
        return r
    end


    function fillinz(L)
        L = tril(L)
        L = spones(L)
        refs = [remote_call((mod(i,nprocs()))+1, pfillz, L[:,i], i) | i=1:size(L)[1]]
        for i = 1:size(L)[1]
            q = fetch(refs[i])
            for j = 1:length(q)/2
                L[q[2*j-1], q[2*j]] = 1
            end
        end
        return L
    end

    function nzindex(g)
        b = []
        for i = 1:length(g)
            if g[i] != 0
                b = vcat(b,[i])
            end
        end
        return b
    end

    function pary(L)
        u = size(L)
        par = zeros(u[1]-1,1)
        for m = 1:(u[1]-1)
            dad = nzindex(L[:,m])
            if size(dad)[1] == 1
                par[m] = length(L[:,m])
            else
                par[m] = dad[2]
            end
        end
        return par
    end


    function kiddies(par)
        dim = length(par)+1
        kids = zeros(dim,dim)
        for i = 1:length(par)
            kids[i,par[i]] = i
        end
        for j = 1:size(kids)[1]
            p = nzindex(kids[:,j])
            for k = 1:length(p)
                kids[:,j] = kids[:,j] + kids[:,p[k]]
            end
        end
        return kids
    end


    @everywhere function frontconstruct(A,L,colj,kids)
        col = L[:,colj]
        nzs = nzindex(col)
        Uj = zeros(length(nzs),length(nzs))
        if size(kids)[1] != 0
            addz = @parallel (+) for i = 1:size(kids)[1]
                -L[nzs,convert(Int16,kids[i])] * L[nzs,convert(Int16,kids[i])]'
            end
        else
            addz = zeros(length(nzs),length(nzs))
        end
        Fj = zeros(length(nzs),length(nzs))
        Fj[:,1] = A[nzs,colj]
        Fj[1,:] = A[colj,nzs]
        F = Fj+Uj+addz
        alpha = sqrt(F[1,1]);
        r = F[:,1]
        lz = vcat(alpha,(1/alpha)*r[2:length(r)])
        return lz
    end


    @everywhere function spcholhelp(A,L,i,kids)
        kidset = nonzeros(kids[:,i])
        if length(kidset) == 0
            kidset = []
        end
        lz = frontconstruct(A,L,i,kidset)
        for j = 1:size(A)[1]
            if L[j,i] == 1
                L[j,i] = lz[1]
                lz = lz[2:length(lz)]
            end
        end
        return L[:,i]
    end


    function spchol(A)
        B = A
        L = fillinz(B)
        par = pary(L)
        kids = kiddies(par)
        tree = zeros(size(kids)[1])
        for i = 1:size(kids)[1]
            tree[i] = length(nonzeros(kids[:,i]))
        end
        for i = 1:size(kids)[1]
            k = []
            for j = 1:size(kids)[1]
                if tree[j] == i-1
                    k = vcat(k,[j])
                end
            end
            refs = [remote_call(mod(i,nprocs())+1, spcholhelp, A, L, k[i], kids) | i=1:length(k)]
            for m = 1:length(refs)
                lcol = fetch(refs[m])
                L[:,k[m]] = lcol
            end
        end
        return L
    end