A CONDENSATION-BASED LOW COMMUNICATION LINEAR SYSTEMS SOLVER UTILIZING CRAMER'S RULE
Ken Habgood, Itamar Arel
Department of Electrical Engineering & Computer Science
The University of Tennessee
GABRIEL CRAMER (1704-1752)
EECS Department / University of Tennessee http://mil.engr.utk.edu
Outline
Motivation & problem statement
Algorithm review
Numerical accuracy & stability
Parallel Implementation
Communication Results
Source: http://tridane.faculty.asu.edu
Introduction
Mainstream approach: Gaussian Elimination e.g. LU decomposition
Looking for a lower communication overhead, efficient parallel solver
Targeting an unpopular approach: Cramer’s Rule
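For reference, Cramer's rule gives each unknown as a ratio of determinants, x_i = det(A_i) / det(A), where A_i is A with column i replaced by b. The following is a minimal textbook sketch in plain Python, not the solver presented in these slides; the helper names `det` and `cramer_solve` are illustrative, and the cofactor-expansion determinant is only practical for very small matrices.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (tiny matrices only)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        # Minor: drop row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


def cramer_solve(A, b):
    """Solve A x = b with Cramer's rule: x_i = det(A_i) / det(A)."""
    det_A = det(A)
    if det_A == 0:
        raise ValueError("matrix is singular")
    x = []
    for i in range(len(A)):
        # A_i is A with column i replaced by the right-hand side b.
        A_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        x.append(det(A_i) / det_A)
    return x


# 2x + y = 3, x + 3y = 5
x = cramer_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
print(x)
```

Evaluating each determinant independently like this is what makes the naive approach expensive, which is the cost a condensation-based scheme aims to avoid.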
Proposed Algorithm Flow
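Matrix “Mirroring”
\[
\begin{bmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\
a_{2,1} & a_{2,2} & a_{2,3} & a_{2,4} \\
a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} \\
a_{4,1} & a_{4,2} & a_{4,3} & a_{4,4}
\end{bmatrix}
\;\xrightarrow{\text{mirror}}\;
\begin{bmatrix}
a_{1,4} & a_{1,3} & a_{1,2} & a_{1,1} \\
a_{2,4} & a_{2,3} & a_{2,2} & a_{2,1} \\
a_{3,4} & a_{3,3} & a_{3,2} & a_{3,1} \\
a_{4,4} & a_{4,3} & a_{4,2} & a_{4,1}
\end{bmatrix}
\]
Mirroring example
Applying Chio’s condensation yields:
\[
\begin{bmatrix}
a'_{3,3} & a'_{3,4} \\
a'_{4,3} & a'_{4,4}
\end{bmatrix}
\qquad
\begin{bmatrix}
a'_{3,2} & a'_{3,1} \\
a'_{4,2} & a'_{4,1}
\end{bmatrix}
\]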
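A small sketch of Chio's condensation may help here. Each step pivots on the top-left entry p and replaces the n x n matrix with an (n-1) x (n-1) matrix whose entries are p*a[i][j] - a[i][0]*a[0][j], with det(A) = det(B) / p^(n-2). The code below is the textbook form with no pivot exchanges, not the authors' implementation; the function names and the test matrix are made up for illustration, and exact rational arithmetic is used so the checks are exact.

```python
from fractions import Fraction


def chio_step(M):
    """One Chio condensation step: n x n -> (n-1) x (n-1), pivoting on M[0][0]."""
    p = M[0][0]
    if p == 0:
        raise ZeroDivisionError("zero pivot; a row or column exchange would be needed")
    n = len(M)
    return [[p * M[i][j] - M[i][0] * M[0][j] for j in range(1, n)]
            for i in range(1, n)]


def chio_det(M):
    """Determinant via repeated condensation, undoing the p**(n-2) scaling."""
    M = [[Fraction(v) for v in row] for row in M]
    scale = Fraction(1)
    while len(M) > 1:
        scale *= M[0][0] ** (len(M) - 2)
        M = chio_step(M)
    return M[0][0] / scale


def mirror(M):
    """Reverse the column order of M."""
    return [row[::-1] for row in M]


# Hypothetical 4 x 4 example matrix.
A = [[2, 1, 3, 1],
     [1, 3, 1, 2],
     [3, 1, 4, 1],
     [1, 2, 1, 5]]

print(chio_det(A))                      # determinant of A
print(chio_det(mirror(A)))              # determinant of the mirrored matrix
print(chio_step(chio_step(A)))          # 2 x 2 block left from columns 3, 4
print(chio_step(chio_step(mirror(A))))  # 2 x 2 block left from columns 2, 1
```

Two condensation steps on A leave a block built from the last two columns, while the same steps on the mirror leave a block built from the first two, which is why both halves of the unknowns become reachable. For a 4 x 4 matrix, reversing the columns takes two column swaps, so the determinant is unchanged.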
Accuracy and Numerical Stability
Backward error estimation
Theoretical estimate of rounding error
E matrix depends on two items:
The largest element in A or b
The growth factor of the algorithm
Same growth factor as LU-decomposition with partial pivoting
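As a rough illustration of what a backward error measures, the sketch below computes the standard normwise backward error of a computed solution x_hat, eta = ||b - A x_hat|| / (||A|| ||x_hat|| + ||b||), using infinity norms. This is the generic textbook quantity, not necessarily the exact estimate used on the slide; the function names and the 2 x 2 test system are assumptions made for the example.

```python
def inf_norm_vec(v):
    """Infinity norm of a vector: largest absolute entry."""
    return max(abs(t) for t in v)


def inf_norm_mat(M):
    """Infinity norm of a matrix: largest absolute row sum."""
    return max(sum(abs(t) for t in row) for row in M)


def residual(A, x, b):
    """r = b - A x."""
    return [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]


def backward_error(A, x_hat, b):
    """Normwise backward error of x_hat as a solution of A x = b."""
    r = residual(A, x_hat, b)
    return inf_norm_vec(r) / (inf_norm_mat(A) * inf_norm_vec(x_hat) + inf_norm_vec(b))


A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 5.0]
eta_exact = backward_error(A, [0.8, 1.4], b)         # true solution
eta_perturbed = backward_error(A, [0.8001, 1.4], b)  # slightly wrong solution
print(eta_exact, eta_perturbed)
```

A small eta means x_hat exactly solves a nearby system (A + E) x = b + f with small relative perturbations, which is the sense in which an algorithm is called backward stable.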
Forward Error Comparisons
Matrix Size    κ(A)        Max Matlab   Max GSL    Avg Matlab   Avg GSL
1000 x 1000    506930      2.39E-09     1.93E-10   1.03E-10     5.38E-12
2000 x 2000    790345      4.52E-09     5.36E-09   1.01E-10     7.27E-12
3000 x 3000    1540152     1.95E-08     1.84E-08   1.12E-10     2.09E-11
4000 x 4000    12760599    4.81E-08     5.62E-08   1.43E-10     7.91E-11
5000 x 5000    765786      2.92E-08     4.39E-08   1.18E-10     3.46E-11
6000 x 6000    1499430     8.67E-08     8.70E-08   1.37E-10     6.04E-11
7000 x 7000    3488010     9.92E-08     8.95E-08   1.27E-10     5.15E-11
8000 x 8000    8154020     9.09E-08     9.43E-08   1.86E-10     7.85E-11
Forward Error - Residual
Matrix Size    κ(A)        Max Residual   Avg Residual
1000 x 1000    506930      3.14E-08       4.46E-09
2000 x 2000    790345      6.72E-09       9.48E-10
3000 x 3000    1540152     2.79E-08       3.28E-09
4000 x 4000    12760599    1.06E-05       1.34E-06
5000 x 5000    765786      2.00E-08       2.65E-09
6000 x 6000    1499430     2.95E-08       3.86E-09
7000 x 7000    3488010     1.99E-08       2.44E-09
8000 x 8000    8154020     1.94E-08       2.32E-09
MATLAB Matrix Gallery
Special Matrix
Avg Matlab Residual
Matlab Residual
clement — Tridiagonal matrix with zero diagonal entries 1.40E-05 7.43E+133 7.85E+144
orthog — Orthogonal and nearly orthogonal matrices 1.03E-07 1.09E-14 2.80E-08
randjorth — Random J-orthogonal matrix 1.55E-04 1.68E-00 1.13E-04
Serial Performance
Results support the theoretical ~2.5x complexity ratio relative to LU decomposition
Algorithm Processing Flow
Overview of Parallel Implementation
Parallel Implementation (cont'd)
Communication Complexity
Two phases of parallel communication:
Parallel Chio’s
Gather Columns
Overall Bandwidth
[Equations: communication cost of each phase and overall bandwidth (8 bytes/double) as functions of N, P, and F]
N: original matrix size, P: number of processors, F: gather columns size
Communication Overhead
Where’s the Breakeven Point?
Point at which communication “dead time” matches computational workload
[Equation: breakeven derivation relating N, p, and d_C]
Assuming d_C = 0.05 and N = 1000, the breakeven point would be P ≈ 142 processors
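The quoted breakeven figure can be sanity-checked numerically. The values N = 1000, d_C = 0.05, and P of about 142 are consistent with the simple relation p = sqrt(N / d_C); note that this relation is an inference from those three numbers, not a formula recovered from the slide, and the function name below is hypothetical.

```python
import math


def breakeven_processors(n, d_c):
    """Assumed breakeven relation p = sqrt(N / d_C), inferred from the quoted numbers."""
    return math.sqrt(n / d_c)


p = breakeven_processors(1000, 0.05)
print(p, math.ceil(p))
```

Under this assumed relation the breakeven processor count grows with the square root of the matrix size, so larger problems tolerate more processors before communication dead time dominates.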
overhead
Many more “broadcasts” than “unicasts”
Communication is a function of problem size, not number of processors
Next steps …
Optimize parallel implementation
Sparse matrix version