CSE5304—Project Proposal Parallel Matrix Multiplication Tian Mi
Jan 03, 2016
A naive version with MPI
[Figure: processors P1, P2, …, Pi, …, PN and the result matrix; each Pi contributes its own part of the result]
Processor 0 reads the input file
Processor 0 distributes one matrix (MPI_Scatter)
Processor 0 broadcasts the other matrix (MPI_Bcast)
All processors, in parallel, multiply their piece of the data
Processor 0 gathers the result (MPI_Gather)
Processor 0 writes the result to the output file
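The steps above can be sketched without MPI as a serial simulation. This is a minimal illustration, not the project's implementation: the function names (`split_rows`, `multiply_block`, `naive_parallel_matmul`) are hypothetical, the matrices are plain Python lists, and the "processors" are list entries handled one after another. It assumes the row count is divisible by the number of processors, as an equal-count MPI_Scatter would.

```python
def split_rows(matrix, num_procs):
    """Cut the rows into num_procs equal contiguous chunks (the scatter step)."""
    n = len(matrix)
    if n % num_procs != 0:
        raise ValueError("row count must be divisible by num_procs")
    rows_per_proc = n // num_procs
    return [matrix[r * rows_per_proc:(r + 1) * rows_per_proc]
            for r in range(num_procs)]


def multiply_block(a_rows, b):
    """Multiply a chunk of rows of A by the full matrix B (the local work)."""
    b_cols = list(zip(*b))  # columns of B, so each entry is a dot product
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a_rows]


def naive_parallel_matmul(a, b, num_procs):
    """Simulate scatter(A) + broadcast(B) + local multiply + gather(C)."""
    chunks = split_rows(a, num_procs)            # scatter: one chunk each
    b_copies = [b for _ in range(num_procs)]     # broadcast: everyone sees B
    partial = [multiply_block(chunk, b_copy)     # each "processor" works alone
               for chunk, b_copy in zip(chunks, b_copies)]
    return [row for block in partial for row in block]  # gather, in rank order


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(naive_parallel_matmul(a, b, num_procs=2))  # [[19, 22], [43, 50]]
```

Each simulated processor needs only its own rows of the first matrix but the whole second matrix, which is why one matrix is scattered and the other broadcast.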
Data generation
Data generation in R with package “igraph”
Integers in the range [-1000, 1000]
Matrix size:
Matrix       512*512    1024*1024   2048*2048   4096*4096
File size    2.69 MB    10.7 MB     43.1 MB     172 MB
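For readers without R, comparable test data could be produced as below. This is a swapped-in Python sketch, not the R/igraph procedure used for the project. The function names and the whitespace-separated text layout are assumptions, since the slides do not describe the file format.

```python
import random


def generate_matrix(n, low=-1000, high=1000, seed=None):
    """Return an n x n matrix of random integers in [low, high], inclusive."""
    rng = random.Random(seed)  # a seed makes the data reproducible
    return [[rng.randint(low, high) for _ in range(n)] for _ in range(n)]


def matrix_to_text(matrix):
    """Render one row per line, values separated by spaces (assumed layout)."""
    return "\n".join(" ".join(str(v) for v in row) for row in matrix) + "\n"


if __name__ == "__main__":
    small = generate_matrix(4, seed=0)
    print(matrix_to_text(small), end="")
```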
Result
Data size: 1024*1024
# Processors   Experiments (s)   Average (s)   Speedup
1              44 41 45 37 42    41.8          1
2              23 20 21 19 22    21            1.99
4              11 10 19 18 16    14.8          2.82
8              10 9 8 9 10       9.2           4.54
16             9 9 11 9 6        8.8           4.75
32             8 10 8 7 7        8             5.23
64             8 8 8 8 8         8             5.23
128            10 9 6 8 9        8.4           4.98
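The Average and Speedup columns follow from the raw timings: the average is the mean of the five runs, and the speedup is the single-processor average divided by the average at that processor count. A small sketch of that arithmetic, using the first four rows of the table (the function name `speedup_table` is hypothetical):

```python
def speedup_table(timings):
    """Map processor count -> (average time, speedup vs. the 1-processor run).

    timings: dict of processor count -> list of measured times in seconds.
    """
    averages = {p: sum(runs) / len(runs) for p, runs in timings.items()}
    baseline = averages[1]  # single-processor average is the reference
    return {p: (avg, baseline / avg) for p, avg in averages.items()}


if __name__ == "__main__":
    runs = {
        1: [44, 41, 45, 37, 42],
        2: [23, 20, 21, 19, 22],
        4: [11, 10, 19, 18, 16],
        8: [10, 9, 8, 9, 10],
    }
    for procs, (avg, speedup) in speedup_table(runs).items():
        print(procs, round(avg, 1), round(speedup, 2))
```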
Result
Data size: 1024*1024
[Figure: time (s) vs. # processors (1, 2, 4, 8, 16, 32, 64, 128)]
Result
Data size: 1024*1024
[Figure: speedup vs. # processors (1, 2, 4, 8, 16, 32, 64, 128)]
Result
Data size: 2048*2048
# Processors   Time (s)   Speedup
1              751        1
2              498        1.508032
4              258        2.910853
8              127        5.913386
16             84         8.940476
32             51         14.72549
64             55         13.65455
128            48         15.64583
Result
Data size: 2048*2048
[Figure: time (s) vs. # processors (1, 2, 4, 8, 16, 32, 64, 128)]
Result
Data size: 2048*2048
[Figure: speedup vs. # processors (1, 2, 4, 8, 16, 32, 64, 128)]
Result
Data size: 4096*4096
# Processors   Time (s)   Speedup
1              5920       1
2              3630       1.630854
4              2813       2.104515
8              925        6.4
16             745        7.946309
32             576        10.27778
64             n/a        n/a
128            n/a        n/a
Analysis
To see superlinear speedup, increase the computation, which is not dominant enough: use a larger matrix and larger integers
However, a larger matrix or long integers will also increase the communication time (broadcast, scatter, gather)
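The trade-off can be made concrete with a rough cost model. This is an assumed simplification, not a measurement from the project: it counts multiply-adds and matrix elements per worker and ignores latency, network topology, and file I/O. For an n*n matrix on p processors, each worker performs n^3/p multiply-adds, receives n^2/p elements of the scattered matrix and n^2 of the broadcast matrix, and returns n^2/p elements of the result. The function names are hypothetical.

```python
def work_per_process(n, p):
    """Multiply-adds per worker: (n/p rows) * (n columns) * (n terms each)."""
    return n ** 3 / p


def elements_moved_per_process(n, p):
    """Elements a worker receives or sends: scatter + broadcast + gather."""
    scattered = n * n / p   # its rows of the first matrix
    broadcast = n * n       # the whole second matrix
    gathered = n * n / p    # its rows of the result
    return scattered + broadcast + gathered


def compute_to_comm_ratio(n, p):
    """Multiply-adds per element moved; algebraically this is n / (p + 2)."""
    return work_per_process(n, p) / elements_moved_per_process(n, p)


if __name__ == "__main__":
    for n in (1024, 2048, 4096):
        print(n, [round(compute_to_comm_ratio(n, p), 1) for p in (2, 8, 32, 128)])
```

Under this model the ratio grows with n and shrinks with p. A larger matrix makes computation more dominant even though communication grows too, and adding processors at a fixed size makes communication relatively more expensive.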
Cannon's algorithm--Example
http://www.vampire.vanderbilt.edu/education-outreach/me343_fall2008/notes/parallelMM_10_09.pdf
Cannon's algorithm
Still implementing and debugging
No results to share at present
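For reference, the textbook form of Cannon's algorithm can be sketched as a serial simulation. This is a generic illustration under stated assumptions, not the project's MPI code: the q*q process grid is a nested list, the "shifts" are list re-indexing rather than messages, the matrices are square with a size divisible by q, and all function names are hypothetical.

```python
def to_blocks(matrix, q):
    """Split an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    n = len(matrix)
    if n % q != 0 or any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square with size divisible by q")
    s = n // q
    return [[[row[j * s:(j + 1) * s] for row in matrix[i * s:(i + 1) * s]]
             for j in range(q)] for i in range(q)]


def from_blocks(blocks):
    """Reassemble a grid of blocks into one matrix."""
    rows = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            rows.append([x for block in block_row for x in block[r]])
    return rows


def block_multiply_add(c, a, b):
    """In place: c += a * b for square blocks of equal size."""
    size = len(a)
    for i in range(size):
        for k in range(size):
            a_ik = a[i][k]
            for j in range(size):
                c[i][j] += a_ik * b[k][j]


def cannon_matmul(a, b, q):
    """Multiply a by b on a simulated q x q grid using Cannon's algorithm."""
    s = len(a) // q
    a_blk, b_blk = to_blocks(a, q), to_blocks(b, q)
    # Initial alignment: row i of A shifts left by i, column j of B up by j.
    a_blk = [[a_blk[i][(j + i) % q] for j in range(q)] for i in range(q)]
    b_blk = [[b_blk[(i + j) % q][j] for j in range(q)] for i in range(q)]
    c_blk = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Every grid position multiplies the two blocks it currently holds.
        for i in range(q):
            for j in range(q):
                block_multiply_add(c_blk[i][j], a_blk[i][j], b_blk[i][j])
        # Then A blocks shift left by one and B blocks shift up by one.
        a_blk = [[a_blk[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b_blk = [[b_blk[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return from_blocks(c_blk)


if __name__ == "__main__":
    print(cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], q=2))
```

Unlike the naive version, no grid position ever holds a whole matrix: each one keeps a single block of each operand and passes it to a neighbour after every step.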
Thank you
Questions & Comments?