Matrix Multiplication on Two Interconnected Processors. Brett A. Becker and Alexey Lastovetsky, Heterogeneous Computing Laboratory, School of Computer Science and Informatics, University College Dublin. HeteroPar’06, Barcelona, Sept. 28, 2006


Dec 19, 2015


Matrix Multiplication on Two Interconnected Processors

Brett A. Becker and Alexey Lastovetsky

Heterogeneous Computing Laboratory

School of Computer Science and Informatics

University College Dublin

_______________________________________________________

HeteroPar’06 Barcelona Sept. 28, 2006


Outline

● Motivation and Goals

● Introduction: ‘Straight-Line’ Partitionings

● The ‘Square-Corner’ Partitioning - Minimizing the Total Volume of Communication

● MPI Experiments / Results

● Conclusion / Future Work


Motivation and Goals

● Partitioning algorithms for matrix-matrix multiplication (MMM) designed for n processors produce partitionings that are not always optimal on a small number of processors.

● We seek to lower the Total Volume of Communication by utilizing a new partitioning strategy.

● Our ultimate interest is to determine if the Square-Corner partitioning is a viable technique for deployment on 2 interconnected Clusters.


Background: Straight-Line Partitioning

Total Volume of Inter-Processor Communication (TVC) is proportional to the Sum of Half-Perimeters (S):

$$S = \sum_{i=1}^{p} (w_i + h_i)$$

Lower Bound (L) of S is when all partitions are square:

$$L = 2 \sum_{i=1}^{p} \sqrt{a_i}$$
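The two quantities can be sketched in a few lines of Python. This is only an illustration: it assumes every partition is an axis-aligned rectangle described by a (width, height) pair, so that its area is width × height, and the function names are hypothetical rather than taken from the talk.

```python
import math

def sum_half_perimeters(rects):
    """S: sum of (width + height) over all rectangular partitions."""
    return sum(w + h for w, h in rects)

def lower_bound(rects):
    """L: 2 * sum of sqrt(area); S equals L when every partition is square."""
    return 2 * sum(math.sqrt(w * h) for w, h in rects)

# Four equal square partitions of an 8 x 8 matrix: S meets the bound.
squares = [(4, 4)] * 4
# Four equal column strips of the same matrix: same areas, larger S.
strips = [(2, 8)] * 4

print("squares:", sum_half_perimeters(squares), lower_bound(squares))
print("strips: ", sum_half_perimeters(strips), lower_bound(strips))
```

Both layouts give each processor the same area, so L is identical; only the shape of the partitions changes S.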


Straight-Line Partitioning

From: Olivier Beaumont, Vincent Boudet, Fabrice Rastello and Yves Robert, “Matrix-Matrix Multiplication on Heterogeneous Platforms”, IEEE Transactions on Parallel and Distributed Systems, 2001, Vol.12, No.10, pp.1033-1051.

[Figure: Average and Minimum values of S/L for two million randomly generated areas]


Background: Straight-Line Partitioning - 2 Processors

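$$S = \sum_{i=1}^{2} (w_i + h_i) = w_1 + h_1 + w_2 + h_2 = 3N$$

$$L = 2 \sum_{i=1}^{2} \sqrt{a_i}$$

$L \to 2N$ as the smaller partition's area tends to 0.

The Straight-Line Partitioning cannot meet the lower bound, L.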
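A small numerical check makes the gap visible. The sketch below assumes an N × N matrix cut by a single vertical line at column `x`, giving two strips of widths `x` and `N - x`; the function name and the sample values are illustrative only.

```python
import math

def straight_line(n, x):
    """Return (S, L) for an n x n matrix cut by one vertical line at column x."""
    rects = [(x, n), (n - x, n)]                       # (width, height) per strip
    s = sum(w + h for w, h in rects)                   # always 3n
    l = 2 * sum(math.sqrt(w * h) for w, h in rects)    # approaches 2n as x -> 0
    return s, l

n = 1000
straight_rows = [(x, *straight_line(n, x)) for x in (500, 100, 10, 1)]
for x, s, l in straight_rows:
    print(f"x={x:4d}  S={s}  L={l:.1f}  S/L={s / l:.3f}")
```

S stays at 3N wherever the line is drawn, while L falls towards 2N as the smaller strip narrows, so the ratio S/L moves away from 1 rather than towards it.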


Background: Straight-Line Partitioning - 2 Processors

$\mathrm{TVC} \to N^2$ as the smaller partition's area tends to 0.

Total Volume of Inter-Processor Communication (TVC) = $N^2$


Introduction: Square-Corner Partitioning

$\mathrm{TVC} \to 0$ as the smaller partition shrinks to 0.

N2TVC


Square-Corner Partitioning

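$$S = \sum_{i=1}^{2} (w_i + h_i) = w_1 + h_1 + w_2 + h_2$$

$S \to 2N$ as the smaller partition's area tends to 0.

$$L = 2 \sum_{i=1}^{2} \sqrt{a_i}$$

$L \to 2N$ as the smaller partition's area tends to 0.

The Square-Corner Partitioning can meet the lower bound, L.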
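The same check can be run for the Square-Corner layout. The sketch assumes the slower processor holds a `side` × `side` square in one corner and the faster processor holds the remainder, whose width and height are both taken as N (it spans every row and every column). These conventions, and the function name, are assumptions made for illustration.

```python
import math

def square_corner(n, side):
    """Return (S, L) when one processor owns a side x side corner square."""
    s = (n + n) + (side + side)                          # w1 + h1 + w2 + h2
    l = 2 * (math.sqrt(n * n - side * side) + side)      # 2 * (sqrt(a1) + sqrt(a2))
    return s, l

n = 1000
corner_rows = [(side, *square_corner(n, side)) for side in (500, 100, 10, 1)]
for side, s, l in corner_rows:
    print(f"side={side:4d}  S={s}  L={l:.1f}  S/L={s / l:.4f}")
```

Here both S and L tend to 2N as the square shrinks, so S/L approaches 1.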


Square-Corner Partitioning

[Figure: Average and Minimum values of S/L for 2 million randomly generated areas, Power Ratio > 3:1]

Adapted From: Olivier Beaumont, Vincent Boudet, Fabrice Rastello and Yves Robert, “Matrix-Matrix Multiplication on Heterogeneous Platforms”, IEEE Transactions on Parallel and Distributed Systems, 2001, Vol.12, No.10, pp.1033-1051.


Square-Corner Partitioning - Minimizing the TVC

Theorem: The Square-Corner Partitioning has a lower Total Volume of Communication than the Straight-Line Partitioning, provided the processor power ratio is > 3:1.

Theorem: The Total Volume of Communication is minimized when the slower processor’s partition is a square.
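The 3:1 threshold can be reproduced numerically under two stated assumptions. First, the Straight-Line TVC is taken as N², as given earlier in the talk. Second, the Square-Corner TVC is assumed to be 2 × N × side, where `side` is the edge of the slower processor's square and areas are split in proportion to processor power, so that side² = N² / (ratio + 1). The second expression is an assumption of this sketch rather than a formula quoted from the slides, and the function names are hypothetical.

```python
import math

def tvc_straight_line(n):
    """Straight-Line TVC for an n x n matrix."""
    return n * n

def tvc_square_corner(n, ratio):
    """Assumed Square-Corner TVC for a power ratio of ratio:1."""
    side = n / math.sqrt(ratio + 1)   # slower processor's area is n*n / (ratio + 1)
    return 2 * n * side

n = 1000
for ratio in (1, 2, 3, 4, 5, 10):
    sl = tvc_straight_line(n)
    sc = tvc_square_corner(n, ratio)
    print(f"ratio {ratio:2d}:1  straight-line={sl}  square-corner={sc:.0f}")
```

Under these assumptions the two volumes coincide at a ratio of exactly 3:1, and the Square-Corner volume is smaller for any larger ratio.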


Results: Square-Corner Partitioning

Matrix-Matrix Multiplication, N=6500, Bandwidth = 80Mb/s

Lower TVC → Lower Communication Time → Lower Execution Time

Average Reduction in Communication Time = 45%

Average Reduction in Execution Time = 14%


Results: Square-Corner Partitioning

Matrix-Matrix Multiplication, N=6500, Bandwidth = 380Mb/s

Average Reduction in Communication Time = 44%

Lower TVC → Lower Communication Time → Lower Execution Time

Average Reduction in Execution Time = 10%


Square-Corner Partitioning - Overlapping Communication and Computation

A sub-partition of Processor 1’s C Partition is Immediately Calculable
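To see why such a sub-partition exists, consider a sketch in which Processor 2 owns the bottom-right `side` × `side` corner of A, B and C, and Processor 1 owns everything else. The corner position, the small matrix size and the helper names are all assumptions chosen for illustration. An element C[i][j] needs row i of A and column j of B, so Processor 1 can compute it without waiting whenever that row and column lie wholly inside its own partition.

```python
def matmul_block(a, b, rows, cols):
    """Compute C[i][j] for i in rows and j in cols from rows of a and columns of b."""
    inner = range(len(b))
    return [[sum(a[i][k] * b[k][j] for k in inner) for j in cols] for i in rows]

n, side = 6, 2
a = [[i * n + j for j in range(n)] for i in range(n)]
b = [[(i + 2 * j) % 5 for j in range(n)] for i in range(n)]

def owned_by_p1(i, j):
    """Assumption: Processor 2 owns the bottom-right side x side corner."""
    return not (i >= n - side and j >= n - side)

# Rows of A and columns of B that lie entirely inside Processor 1's partition.
free_rows = [i for i in range(n) if all(owned_by_p1(i, k) for k in range(n))]
free_cols = [j for j in range(n) if all(owned_by_p1(k, j) for k in range(n))]

# This block of C needs no remote data, so it can be computed while data is in transit.
immediate = matmul_block(a, b, free_rows, free_cols)
print(len(free_rows), "x", len(free_cols), "block of C needs no communication")
```

With these sizes the block covers the first N − side rows and columns of C; the rest of Processor 1's partition has to wait for the elements held by Processor 2.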


Square-Corner Partitioning - Overlapping Communication and Computation

Overlapping more than doubled the advantage of the Square-Corner algorithm.

● No Overlapping → 17% faster than Straight-Line algorithm.

● Overlapping → 39% faster than Straight-Line algorithm.

Algorithm                         Execution Time   Speedup
Straight-Line                     83s              0.94
Square-Corner (No Overlapping)    69s              1.13
Square-Corner (Overlapping)       51s              1.53
Sequential                        78s              N/A

MM Multiplication, N=4500, Bandwidth=100Mb/s, Ratio=5:1
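The speedup and percentage figures follow directly from the measured times. The short calculation below assumes that speedup means sequential time divided by parallel time, and that "faster" means the relative reduction in execution time against Straight-Line; both readings are consistent with the reported numbers.

```python
times = {
    "Straight-Line": 83,
    "Square-Corner (No Overlapping)": 69,
    "Square-Corner (Overlapping)": 51,
}
sequential = 78
baseline = times["Straight-Line"]

# Speedup relative to the sequential run, rounded as in the table.
speedups = {name: round(sequential / t, 2) for name, t in times.items()}
# Relative reduction in execution time against Straight-Line, in whole percent.
gains = {name: round(100 * (baseline - t) / baseline) for name, t in times.items()}

for name in times:
    print(f"{name}: speedup {speedups[name]}, {gains[name]}% faster than Straight-Line")
```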


Square-Corner Partitioning - Two Cluster Architecture

Total of 20 Homogeneous Nodes in 2 Clusters


Square-Corner Partitioning - Two Clusters

Algorithm        Execution Time   Speedup
Straight-Line    123s             1.04
Square-Corner    115s             1.11
Sequential       128s             N/A

MM Multiplication, N=9000, Bandwidth=100Mb/s

All Machines are Homogeneous. One Cluster of 4, One Cluster of 16


Conclusions

● The Square-Corner Partitioning reduces the Total Volume of Communication provided the processor power ratio is > 3:1

● The possibility of Overlapping Communication and Computation can bring further reductions in Execution Time

● The Square-Corner Partitioning is viable on Two Clusters


Current and Future Work

● We have successfully extended the Square-Corner Partitioning to Three Processors

To do:

● Experiment on more Two-Cluster architectures

● Overlap Communication and Computation on Two Clusters

● Extend the Three-Processor Algorithm to Three Clusters


Acknowledgements

This work was supported by: