-
Serial Iterative Methods / Parallel Iterative Methods / Preconditioning
Parallel Numerical Algorithms
Chapter 4 – Sparse Linear Systems
Section 4.3 – Iterative Methods
Michael T. Heath and Edgar Solomonik
Department of Computer Science
University of Illinois at Urbana-Champaign
CS 554 / CSE 512
Michael T. Heath and Edgar Solomonik Parallel Numerical
Algorithms 1 / 52
-
Outline
1 Serial Iterative Methods
    Stationary Iterative Methods
    Krylov Subspace Methods
2 Parallel Iterative Methods
    Partitioning
    Ordering
    Communication-Avoiding Iterative Methods
    Chaotic Relaxation
3 Preconditioning
    Simple Preconditioners
    Domain Decomposition
    Incomplete LU
-
Iterative Methods for Linear Systems
Iterative methods for solving linear system Ax = b begin with initial guess for solution and successively improve it until solution is as accurate as desired
In theory, infinite number of iterations might be required to converge to exact solution
In practice, iteration terminates when residual ‖b − Ax‖, or some other measure of error, is as small as desired
Iterative methods are especially useful when matrix A is sparse because, unlike direct methods, no fill is incurred
-
Jacobi Method
Beginning with initial guess $x^{(0)}$, Jacobi method computes next iterate by solving for each component of x in terms of others
$$x_i^{(k+1)} = \Bigl( b_i - \sum_{j \ne i} a_{ij} x_j^{(k)} \Bigr) / a_{ii}, \quad i = 1, \dots, n$$
If D, L, and U are diagonal, strict lower triangular, and strict upper triangular portions of A, then Jacobi method can be written
$$x^{(k+1)} = D^{-1} \bigl( b - (L + U) x^{(k)} \bigr)$$
-
Jacobi Method
Jacobi method requires nonzero diagonal entries, which can usually be accomplished by permuting rows and columns if not already true
Jacobi method requires duplicate storage for x, since no component can be overwritten until all new values have been computed
Components of new iterate do not depend on each other, so they can be computed simultaneously
Jacobi method does not always converge, but it is guaranteed to converge under conditions that are often satisfied (e.g., if matrix is strictly diagonally dominant), though convergence rate may be very slow
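As a minimal sketch of the componentwise formula above, the following pure-Python function runs Jacobi sweeps on a small dense system. The name `jacobi`, the max-norm residual test, the tolerance, and the 3 × 3 test system are illustrative assumptions, not part of the slides.

```python
def jacobi(A, b, x0, tol=1e-10, max_iter=1000):
    """Jacobi iteration for Ax = b, with A stored as a list of rows."""
    n = len(b)
    x = list(x0)
    for k in range(max_iter):
        # Each new component uses only the old iterate, hence the second vector x_new.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        # Terminate when the max-norm residual ||b - A x_new|| is small enough.
        res = max(
            abs(b[i] - sum(A[i][j] * x_new[j] for j in range(n))) for i in range(n)
        )
        x = x_new
        if res < tol:
            return x, k + 1
    return x, max_iter


# Hypothetical strictly diagonally dominant system with exact solution [1, 2, 3].
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x, iters = jacobi(A, b, [0.0, 0.0, 0.0])
```

The list comprehension that builds `x_new` has no dependence between its entries, which is the property that makes the method easy to parallelize.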
-
Gauss-Seidel Method
Faster convergence can be achieved by using each new component value as soon as it has been computed rather than waiting until next iteration
This gives Gauss-Seidel method
$$x_i^{(k+1)} = \Bigl( b_i - \sum_{j < i} a_{ij} x_j^{(k+1)} - \sum_{j > i} a_{ij} x_j^{(k)} \Bigr) / a_{ii}$$
Using same notation as for Jacobi, Gauss-Seidel method can be written
$$x^{(k+1)} = (D + L)^{-1} \bigl( b - U x^{(k)} \bigr)$$
-
Gauss-Seidel Method
Gauss-Seidel requires nonzero diagonal entries
Gauss-Seidel does not require duplicate storage for x, since component values can be overwritten as they are computed
But each component depends on previous ones, so they must be computed successively
Gauss-Seidel does not always converge, but it is guaranteed to converge under conditions that are somewhat weaker than those for Jacobi method (e.g., if matrix is symmetric and positive definite)
Gauss-Seidel converges about twice as fast as Jacobi, but may still be very slow
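A short sketch of the in-place update, under the same assumptions as a dense list-of-rows matrix and a max-norm residual test; the name `gauss_seidel` and the test system are hypothetical.

```python
def gauss_seidel(A, b, x0, tol=1e-10, max_iter=1000):
    """Gauss-Seidel iteration for Ax = b, overwriting x in place."""
    n = len(b)
    x = list(x0)
    for k in range(max_iter):
        for i in range(n):
            # For j < i, x[j] already holds the new value; for j > i it holds the old one.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
        res = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))
        if res < tol:
            return x, k + 1
    return x, max_iter


# Hypothetical test system with exact solution [1, 2, 3].
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x_gs, iters_gs = gauss_seidel(A, b, [0.0, 0.0, 0.0])
```

Only one copy of `x` is kept, but the loop over `i` carries a dependence from each component to the next, so it cannot be run in parallel as written.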
-
SOR Method
Successive over-relaxation (SOR) uses step to next Gauss-Seidel iterate as search direction with fixed search parameter ω
SOR computes next iterate as
$$x^{(k+1)} = x^{(k)} + \omega \bigl( x_{GS}^{(k+1)} - x^{(k)} \bigr)$$
where $x_{GS}^{(k+1)}$ is next iterate given by Gauss-Seidel
Equivalently, next iterate is weighted average of current iterate and next Gauss-Seidel iterate
$$x^{(k+1)} = (1 - \omega) x^{(k)} + \omega x_{GS}^{(k+1)}$$
If A is symmetric, a forward SOR sweep followed by a backward SOR sweep can be written as the application of a symmetric matrix; this is the Symmetric Successive Over-Relaxation (SSOR) method
-
SOR Method
ω is fixed relaxation parameter chosen to accelerate convergence
ω > 1 gives over-relaxation, while ω < 1 gives under-relaxation (ω = 1 gives Gauss-Seidel method)
SOR diverges unless 0 < ω < 2, but choosing optimal ω is difficult in general except for special classes of matrices
With optimal value for ω, convergence rate of SOR method can be order of magnitude faster than that of Gauss-Seidel
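The effect of ω can be illustrated with a small sketch. It assumes the 1-D model problem tridiag(−1, 2, −1), one of the special classes for which an optimal value is known in closed form, ω = 2 / (1 + sin(π/(n+1))); the function name `sor`, the problem size, and the tolerance are illustrative choices.

```python
import math


def sor(A, b, x0, omega, tol=1e-10, max_iter=10000):
    """SOR iteration for Ax = b; omega = 1 reduces to Gauss-Seidel."""
    n = len(b)
    x = list(x0)
    for k in range(max_iter):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x_gs = (b[i] - s) / A[i][i]                 # Gauss-Seidel value for component i
            x[i] = (1.0 - omega) * x[i] + omega * x_gs  # weighted average with old value
        res = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))
        if res < tol:
            return x, k + 1
    return x, max_iter


# Assumed model problem: 1-D Laplacian of order n with right-hand side of ones.
n = 20
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
b = [1.0] * n
omega_opt = 2.0 / (1.0 + math.sin(math.pi / (n + 1)))  # closed form for this model problem only

_, it_gs = sor(A, b, [0.0] * n, 1.0)
_, it_sor = sor(A, b, [0.0] * n, omega_opt)
print("Gauss-Seidel iterations:", it_gs, " SOR iterations:", it_sor)
```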
-
Conjugate Gradient Method
If A is n × n symmetric positive definite matrix, then quadratic function
$$\phi(x) = \tfrac{1}{2} x^T A x - x^T b$$
attains minimum precisely when Ax = b
Optimization methods have form
$$x_{k+1} = x_k + \alpha s_k$$
where α is search parameter chosen to minimize objective function $\phi(x_k + \alpha s_k)$ along $s_k$
For method of steepest descent, $s_k = -\nabla \phi(x_k)$
-
Conjugate Gradient Method
For special case of quadratic problem,
    Negative gradient is residual vector
    $$-\nabla \phi(x) = b - A x = r$$
    Optimal line search parameter is given by
    $$\alpha = r_k^T s_k / s_k^T A s_k$$
    Successive search directions can easily be A-orthogonalized by three-term recurrence
Using these properties, we obtain conjugate gradient method (CG) for linear systems
-
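Conjugate Gradient Method
$x_0$ = initial guess
$r_0 = b - A x_0$
$s_0 = r_0$
for k = 0, 1, 2, . . .
    $\alpha_k = r_k^T r_k / s_k^T A s_k$
    $x_{k+1} = x_k + \alpha_k s_k$
    $r_{k+1} = r_k - \alpha_k A s_k$
    $\beta_{k+1} = r_{k+1}^T r_{k+1} / r_k^T r_k$
    $s_{k+1} = r_{k+1} + \beta_{k+1} s_k$
end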
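The recurrence above translates almost line for line into code. The following is a sketch only: it assumes dense list-of-rows storage and a 2-norm residual stopping test, and the helper names `matvec`, `dot`, and `conjugate_gradient` as well as the 2 × 2 test system are hypothetical.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    """CG for symmetric positive definite A; returns (x, iterations used)."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]   # r_0 = b - A x_0
    s = list(r)                                        # s_0 = r_0
    rr = dot(r, r)
    for k in range(max_iter):
        if rr ** 0.5 < tol:
            return x, k
        As = matvec(A, s)
        alpha = rr / dot(s, As)
        x = [xi + alpha * si for xi, si in zip(x, s)]
        r = [ri - alpha * asi for ri, asi in zip(r, As)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        s = [ri + beta * si for ri, si in zip(r, s)]
        rr = rr_new
    return x, max_iter


# Hypothetical SPD system with exact solution [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iters = conjugate_gradient(A, b, [0.0, 0.0])
```

Each iteration needs one matrix-vector product, two inner products, and three vector updates, which are the basic operations that a parallel implementation has to distribute.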
-
Conjugate Gradient Method
Key features that make CG method effective
    Short recurrence determines search directions that are A-orthogonal (conjugate)
    Error is minimal over space spanned by search directions generated so far
Minimum error property implies that method produces exact solution after at most n steps
In practice, rounding error causes loss of orthogonality that spoils finite termination property, so method is used iteratively
-
Conjugate Gradient Method
Error is reduced at each iteration by factor of
$$(\sqrt{\kappa} - 1) / (\sqrt{\kappa} + 1)$$
on average, where
$$\kappa = \mathrm{cond}(A) = \|A\| \cdot \|A^{-1}\| = \lambda_{\max}(A) / \lambda_{\min}(A)$$
Thus, convergence tends to be rapid if matrix is well-conditioned, but can be arbitrarily slow if matrix is ill-conditioned
But convergence also depends on clustering of eigenvalues of A
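To get a feel for how the reduction factor depends on κ, the sketch below estimates how many iterations are needed for a given overall error reduction if every iteration achieved exactly that factor. This is a rough model only (the function name and the sample values of κ are assumptions), and clustering of eigenvalues can make actual convergence much faster.

```python
import math


def cg_iterations_estimate(kappa, reduction):
    """Smallest k with ((sqrt(kappa) - 1) / (sqrt(kappa) + 1))**k <= reduction."""
    rate = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    if rate == 0.0:
        return 1  # kappa = 1: a single step suffices in this model
    return math.ceil(math.log(reduction) / math.log(rate))


for kappa in (10.0, 1e2, 1e4):
    print("kappa =", kappa, " estimated iterations:", cg_iterations_estimate(kappa, 1e-6))
```

The estimate grows roughly in proportion to √κ, which is why reducing the condition number by preconditioning pays off.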
-
Nonsymmetric Krylov Subspace Methods
CG is not directly applicable to nonsymmetric or indefinite systems
CG cannot be generalized to nonsymmetric systems without sacrificing one of its two key properties (short recurrence and minimum error)
Nevertheless, several generalizations have been developed for solving nonsymmetric systems, including GMRES, QMR, CGS, BiCG, and Bi-CGSTAB
These tend to be less robust and require more storage than CG, but they can still be very useful for solving large nonsymmetric systems
-
Parallel Implementation
Iterative methods for linear systems are composed of basic operations such as
    vector updates (saxpy)
    inner products
    matrix-vector multiplication
    solution of triangular systems
In parallel implementation, both data and operations are partitioned across multiple tasks
In addition to communication required for these basic operations, necessary convergence test may require additional communication (e.g., sum or max reduction)
-
Partitioning of Vectors
Iterative methods typically require several vectors, including solution x, right-hand side b, residual r = b − Ax, and possibly others
Even when matrix A is sparse, these vectors are usually dense
These dense vectors are typically uniformly partitioned among p tasks, with given task holding same set of component indices of each vector
Thus, vector updates require no communication, whereas inner products of vectors require reductions across tasks, at cost we have already seen
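The difference between the two operations can be made concrete with a serial simulation in which each task is represented by one sublist. This is a sketch under assumptions: the helper names are hypothetical, p is assumed to divide n, and the final `sum` merely stands in for a reduction across tasks.

```python
def partition(v, p):
    """Split vector v into p contiguous pieces of equal size (assumes p divides len(v))."""
    size = len(v) // p
    return [v[t * size:(t + 1) * size] for t in range(p)]


def partitioned_axpy(alpha, x_parts, y_parts):
    """y <- alpha * x + y: every task works on its own piece, no communication."""
    return [[alpha * xi + yi for xi, yi in zip(xp, yp)]
            for xp, yp in zip(x_parts, y_parts)]


def partitioned_dot(x_parts, y_parts):
    """Inner product: local partial sums followed by a sum reduction across tasks."""
    partial = [sum(xi * yi for xi, yi in zip(xp, yp))
               for xp, yp in zip(x_parts, y_parts)]
    return sum(partial)  # stands in for the reduction


p = 4
x_parts = partition([float(i) for i in range(1, 9)], p)
y_parts = partition([1.0] * 8, p)
z_parts = partitioned_axpy(2.0, x_parts, y_parts)
d = partitioned_dot(x_parts, y_parts)
```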
-
Partitioning of Sparse Matrix
Sparse matrix A can be partitioned among tasks by rows, by columns, or by submatrices
Partitioning by submatrices may give uneven distribution of nonzeros among tasks; indeed, some submatrices may contain no nonzeros at all
Partitioning by rows or by columns tends to yield more uniform distribution because sparse matrices typically have about same number of nonzeros in each row or column
-
Row Partitioning of Sparse Matrix
Suppose that each task is assigned n/p rows, yielding p tasks, where for simplicity we assume that p divides n
In dense matrix-vector multiplication, since each task owns only n/p components of vector operand, communication is required to obtain remaining components
If matrix is sparse, however, few components may actually be needed, and these should preferably be stored in neighboring tasks
Assignment of rows to tasks by contiguous blocks or cyclically would not, in general, result in desired proximity of vector components
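Which vector components a task actually needs follows directly from the sparsity pattern of its rows. The sketch below computes, for a contiguous block row partition, the remotely owned components of x that each task must receive; the function name and the tridiagonal test pattern are assumptions. The tridiagonal case is deliberately favorable, since every task then needs values only from adjacent blocks, which is not true for general sparse matrices.

```python
def remote_components(rows, n, p):
    """rows[i] lists the column indices of nonzeros in row i.

    Task t owns rows and vector components t*n/p .. (t+1)*n/p - 1
    (assumes p divides n). Returns, per task, the sorted list of
    vector components it needs that are owned by other tasks.
    """
    size = n // p
    needed = []
    for t in range(p):
        cols = set()
        for i in range(t * size, (t + 1) * size):
            cols.update(j for j in rows[i] if j // size != t)
        needed.append(sorted(cols))
    return needed


# Hypothetical pattern: tridiagonal matrix of order 8 on 4 tasks.
n, p = 8, 4
rows = [[j for j in (i - 1, i, i + 1) if 0 <= j < n] for i in range(n)]
needed = remote_components(rows, n, p)
```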
-
Graph Partitioning
Desired data locality can be achieved by partitioning graph of matrix, or partitioning underlying grid or mesh for finite difference or finite element problem
For example, graph can be partitioned into p pieces by nested dissection, and vector components corresponding to nodes in each resulting piece assigned to same task, with neighboring pieces assigned to neighboring tasks
Then matrix-vector product requires relatively little communication, and only between neighboring tasks
-
Two-Dimensional Partitioning
-
Sparse MatVec with 2-D Partitioning
Partition entries of both x and y across processors
Partition entries of A accordingly
(a) Send entries $x_j$ to processors with nonzero $a_{ij}$ for some i
(b) Local multiply-add: $y_i = y_i + a_{ij} x_j$
(c) Send partial inner products to relevant processors
(d) Local sum: sum partial inner products
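Steps (a) through (d) can be simulated serially to see where data moves. In this sketch the processors form an assumed pr × pc grid, processor (bi, bj) holds the nonzeros of block (bi, bj) of A, and dictionaries stand in for per-processor memory; all names are hypothetical and no real message passing takes place.

```python
def spmv_2d(entries, x, n, pr, pc):
    """y = A x with nonzeros {(i, j): a_ij} spread over a pr x pc processor grid.

    Assumes pr and pc divide n.
    """
    rb, cb = n // pr, n // pc
    # (a) each processor receives the x_j matching columns of its nonzeros
    local_x = {}
    for (i, j) in entries:
        local_x.setdefault((i // rb, j // cb), {})[j] = x[j]
    # (b) local multiply-add into partial inner products
    partial = {}
    for (i, j), a in entries.items():
        proc = (i // rb, j // cb)
        acc = partial.setdefault(proc, {})
        acc[i] = acc.get(i, 0.0) + a * local_x[proc][j]
    # (c) partial inner products for row i go to the owner of y_i,
    # (d) which sums them
    y = [0.0] * n
    for acc in partial.values():
        for i, v in acc.items():
            y[i] += v
    return y


# Hypothetical example: tridiag(-1, 2, -1) of order 4 on a 2 x 2 grid.
n = 4
entries = {}
for i in range(n):
    entries[(i, i)] = 2.0
    if i > 0:
        entries[(i, i - 1)] = -1.0
    if i < n - 1:
        entries[(i, i + 1)] = -1.0
y = spmv_2d(entries, [1.0, 2.0, 3.0, 4.0], n, 2, 2)
```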
-
Types of Graph Partitioning Methods
Coordinate-based
    Coordinate bisection
    Inertial
    Geometric
Multilevel
Coordinate-free
    Level structure
    Spectral
    Combinatorial refinement (e.g., Kernighan-Lin)
-
Graph Partitioning Software
Chaco: http://www.cs.sandia.gov/~bahendr/chaco.html
Jostle: http://staffweb.cms.gre.ac.uk/~c.walshaw/jostle/
Meshpart: http://www.cerfacs.fr/algor/Softs/MESHPART/
Metis/ParMetis: http://glaros.dtc.umn.edu/gkhome/views/metis
Mondriaan: http://www.math.uu.nl/people/bisseling/Mondriaan/mondriaan.html
Party: http://wwwcs.uni-paderborn.de/fachbereich/AG/monien/RESEARCH/PART/party.html
Scotch: http://www.labri.fr/perso/pelegrin/scotch/
Zoltan: http://www.cs.sandia.gov/Zoltan/
-
Coordinate-based partitioning
Assume graph embedded in d-dimensional space, so we have coordinates for nodes
Find centerpoints; every hyperplane that includes one is a good partition
    G. Miller, S. Teng, W. Thurston, S. Vavasis, 1997
Possible to find vertex separators of size $O(n^{(d-1)/d})$ for meshes with constant aspect ratio (max relative distance of edges in space)
-
Coordinate-free partitioning
Partitioning is hard for arbitrary graphs
Level partitioning – breadth-first-search (BFS) tree levels
Maximal k-independent sets – select vertices separated by distance k and create partitions by combining vertices with nearest vertex in k-independent set
Spectral partitioning looks at eigenvalues of Laplacian matrix L of a graph G = (V, E),
$$l_{ij} = \begin{cases} \mathrm{degree}(V(i)) & : i = j \\ -1 & : (i, j) \in E \\ 0 & : (i, j) \notin E,\ i \ne j \end{cases}$$
the eigenvector of L with the second smallest eigenvalue (the Fiedler vector) provides a good partition of G
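A small sketch of spectral bisection follows. It assumes a connected graph and approximates the Fiedler vector by power iteration on cI − L with the constant vector projected out, where c = 2 · (max degree) bounds the largest eigenvalue of L; this is a simple stand-in for the Lanczos-type eigensolvers used in practice. The function name, the iteration count, and the two-triangle test graph are illustrative assumptions.

```python
def fiedler_vector(n, edges, iters=500):
    """Approximate eigenvector of the graph Laplacian for its second smallest eigenvalue."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = [len(a) for a in adj]
    c = 2.0 * max(deg)                      # Gershgorin bound on largest eigenvalue of L
    x = [float(i) for i in range(n)]        # arbitrary nonconstant starting vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]         # remove component along the constant vector
        # y = (c I - L) x, using (L x)_i = deg_i * x_i - sum of x over neighbours of i
        y = [c * x[i] - (deg[i] * x[i] - sum(x[j] for j in adj[i])) for i in range(n)]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    return x


# Hypothetical graph: two triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
fv = fiedler_vector(6, edges)
part = [v >= 0.0 for v in fv]               # split nodes by sign of Fiedler vector entry
```

Splitting by sign separates the two triangles, cutting only the single edge that joins them.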
-
Parallel Jacobi
We have already seen example of this approach with Jacobi method for 1-D Laplace equation
Contiguous groups of variables are assigned to each task, so most communication is internal, and external communication is limited to nearest neighbors in 1-D mesh
More generally, Jacobi method usually parallelizes well if underlying grid is partitioned in this manner, since all components of x can be updated simultaneously
-
Parallel Gauss-Seidel and SOR
Unfortunately, Gauss-Seidel and SOR methods require successive updating of solution components in given order (in effect, solving triangular system), rather than permitting simultaneous updating as in Jacobi method
-
Row-Wise Ordering for 2-D Grid
G(A): 4 × 4 grid with nodes numbered row by row
13 14 15 16
 9 10 11 12
 5  6  7  8
 1  2  3  4
[Figure: sparsity pattern of corresponding 16 × 16 matrix A]
-
Red-Black Ordering
Apparent sequential order can be broken, however, if components are reordered according to coloring of underlying graph
For 5-point discretization on square grid, for example, color alternate nodes in each dimension red and others black, giving color pattern of chess or checker board
Then all red nodes can be updated simultaneously, as can all black nodes, so algorithm proceeds in alternating phases, first updating all nodes of one color, then those of other color, repeating until convergence
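The two-phase structure can be sketched for the 5-point Laplace stencil on a square grid with fixed boundary values. The function name, the grid size, the zero right-hand side, and the boundary data (taken from u = i + j, which the discrete stencil reproduces exactly) are assumptions made for illustration.

```python
def red_black_sweep(u):
    """One red-black Gauss-Seidel sweep for the 5-point Laplace stencil.

    u is a square grid stored as a list of rows; boundary entries stay fixed.
    """
    n = len(u)
    for color in (0, 1):
        # Nodes with (i + j) % 2 == color have neighbours of the other color only,
        # so all updates within this phase are independent of one another.
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 == color:
                    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                      + u[i][j - 1] + u[i][j + 1])


n = 6
exact = [[float(i + j) for j in range(n)] for i in range(n)]
# Start from zero in the interior, exact values on the boundary.
u = [[exact[i][j] if i in (0, n - 1) or j in (0, n - 1) else 0.0
      for j in range(n)] for i in range(n)]
for sweep in range(200):
    red_black_sweep(u)
err = max(abs(u[i][j] - exact[i][j]) for i in range(n) for j in range(n))
```

In a parallel code each phase would be followed by a halo exchange, so one sweep costs two communication steps instead of a fully sequential pass.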
-
Red-Black Ordering for 2-D Grid
G(A): 4 × 4 grid with nodes of one color numbered 1 to 8 and nodes of other color numbered 9 to 16
15  7 16  8
 5 13  6 14
11  3 12  4
 1  9  2 10
[Figure: sparsity pattern of corresponding 16 × 16 matrix A]
-
Multicolor Orderings
More generally, arbitrary graph requires more colors, so there is one phase per color in parallel algorithm
Nodes must also be partitioned among tasks, and load should be balanced for each color
Reordering nodes may affect convergence rate, however, so gain in parallel performance may be offset somewhat by slower convergence rate
-
Sparse Triangular Systems
More generally, multicolor ordering of graph of matrix enhances parallel performance of sparse triangular solution by identifying sets of solution components that can be computed simultaneously (rather than in usual sequential order for triangular solution)
-
Parallel 1D 2-pt Stencil
Normally, halo exchange done before every stencil
application
-
In-time Blocking
Instead apply stencil repeatedly before larger halo exchange
-
Analysis of in-time blocking for 1D mesh
For 1-D 2-pt stencil (3-pt stencil similar)
Consider t steps, and execute s without messages
Bring down latency cost by a factor of s; for t = Θ(n), we improve latency cost from Θ(αn) to Θ(αp) with s = n/p
Also improves flop-to-byte ratio of local subcomputations
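The trade of extra halo data for fewer messages can be checked with a serial simulation. The sketch assumes a periodic 1-D 2-pt averaging stencil, u_i ← (u_{i−1} + u_i)/2, with p tasks on a ring, p dividing n, s dividing t, and s ≤ n/p; the function names and the message counter are hypothetical.

```python
def stencil_step(vals):
    """Apply the 2-pt stencil to vals[1:], each point using its left neighbour.

    The result is one entry shorter because the leftmost value has no neighbour.
    """
    return [0.5 * (vals[i - 1] + vals[i]) for i in range(1, len(vals))]


def run_blocked(u, p, t, s):
    """Run t stencil steps on p tasks, exchanging a halo of width s every s steps."""
    m = len(u) // p
    parts = [u[k * m:(k + 1) * m] for k in range(p)]
    messages = 0
    for _ in range(t // s):
        # One message per task: the last s values of its left neighbour on the ring.
        halos = [parts[(k - 1) % p][-s:] for k in range(p)]
        messages += p
        new_parts = []
        for k in range(p):
            vals = halos[k] + parts[k]
            for _step in range(s):
                vals = stencil_step(vals)   # ghost region shrinks by one per step
            new_parts.append(vals)
        parts = new_parts
    return [v for part in parts for v in part], messages


u0 = [float((7 * i) % 5) for i in range(16)]
reference, _ = run_blocked(u0, 1, 4, 1)      # single task, exchange every step
naive, msgs_naive = run_blocked(u0, 4, 4, 1)
blocked, msgs_blocked = run_blocked(u0, 4, 4, 4)
```

With s = 4 each task sends one message instead of four, at the price of recomputing the ghost values that its neighbour also computes.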
-
Analysis of in-time blocking for 2-D mesh
For 2-D mesh, there is more complexity
Consider t steps and execute $s \le \sqrt{n/p}$ without messages
Need to do asymptotically more computation and interprocessor communication if $s > \sqrt{n/p}$
-
Asynchronous or Chaotic Relaxation
Using updated values for solution components in Gauss-Seidel and SOR methods improves convergence rate, but limits parallelism and requires synchronization
Alternatively, in computing next iterate, each processor could use most recent value it has for each solution component, rather than waiting for latest value on any processor
This approach, sometimes called asynchronous or chaotic relaxation, can be effective, but stochastic behavior complicates analysis of convergence and convergence rate
-
Preconditioning
Convergence rate of iterative methods depends on condition number and can often be substantially accelerated by preconditioning
Apply method to $M^{-1}A$, where M is chosen so that $M^{-1}A$ is better conditioned than A, and systems of form Mz = Ay are easily solved for z
Typically, M is (block)-diagonal or triangular
-
Basic Preconditioners: M for $M^{-1}A$
Polynomial (most commonly Chebyshev)
    $M^{-1} = \mathrm{poly}(A)$
    neither M nor $M^{-1}$ explicitly formed, latter applied by multiple SpMVs
Diagonal (Jacobi) (diagonal scaling)
    M = diag(d)
    sometimes can use d = diag(A), easy but ineffective
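When diagonal scaling with d = diag(A) helps and when it does not can be seen on two hand-picked 2 × 2 symmetric positive definite matrices, for which the eigenvalue ratio λmax/λmin of $M^{-1}A$ has a closed form. This is only a sketch: both matrices and the helper names are assumptions chosen for illustration.

```python
import math


def eig_ratio_2x2(M):
    """lambda_max / lambda_min of a 2x2 matrix with real positive eigenvalues."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = math.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / (tr - disc)


def jacobi_precondition(A):
    """Return D^{-1} A with D = diag(A)."""
    return [[A[i][j] / A[i][i] for j in range(2)] for i in range(2)]


badly_scaled = [[1.0, 0.1], [0.1, 100.0]]       # ill-conditioning comes from row scaling
constant_diag = [[2.0, -1.9], [-1.9, 2.0]]      # ill-conditioning comes from coupling

for name, A in (("badly scaled", badly_scaled), ("constant diagonal", constant_diag)):
    print(name, eig_ratio_2x2(A), eig_ratio_2x2(jacobi_precondition(A)))
```

For the badly scaled matrix the ratio drops from about 100 to nearly 1, whereas for the constant-diagonal matrix it is unchanged, since $D^{-1}A$ is then just a multiple of A. Matrices from discretized PDEs on uniform grids tend to resemble the second case.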
-
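Block Preconditioners: M for $M^{-1}A$
Consider partitioning
$$A = \begin{bmatrix} B & E \\ F & C \end{bmatrix} = \begin{bmatrix} B_1 & & & E_1 & & \\ & \ddots & & & \ddots & \\ & & B_p & & & E_p \\ F_1 & & & C_{11} & \cdots & C_{1p} \\ & \ddots & & \vdots & \ddots & \vdots \\ & & F_p & C_{p1} & \cdots & C_{pp} \end{bmatrix}$$
Block-diagonal (domain decomposition)
$$M = \begin{bmatrix} B & \\ & I \end{bmatrix} \quad \text{so} \quad M^{-1}A \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} y_1 + B^{-1} E y_2 \\ F y_1 + C y_2 \end{bmatrix}$$
iterative methods with $M^{-1}A$ can compute each $B_i^{-1}$ in parallel to get and apply $B^{-1}$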
-
Sparse Preconditioners: M for $M^{-1}A$
Incomplete LU (ILU) computes an approximate LU factorization, ignoring fill entries throughout regular sparse LU
Let $S \subseteq \mathbb{N}^2$ be a sparsity mask and compute L, U on S
Level-0 ILU factorization
$$\mathrm{ILU}[0] : (i, j) \in S \text{ if and only if } a_{ij} \ne 0$$
Given $[L^{(0)}, U^{(0)}] \leftarrow \mathrm{ILU}[0](A)$, our preconditioner will be $M = L^{(0)} U^{(0)} \approx A$
Level-1 ILU factorization
$$\mathrm{ILU}[1] : (i, j) \in S \text{ if } \exists k,\ l_{ik}^{(0)} u_{kj}^{(0)} \ne 0 \text{ or } a_{ij} \ne 0$$
(generate fill only if updates are from non-newly-filled entries)
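A compact sketch of ILU[0] follows, using dense storage purely for readability (a real implementation would work on a sparse format). It performs row-wise Gaussian elimination but applies updates only at positions inside the mask S, the nonzero pattern of A. The function name and the arrow-shaped test matrix are assumptions.

```python
def ilu0(A):
    """ILU[0] of A (list of rows); returns (L, U) with unit lower triangular L."""
    n = len(A)
    in_mask = [[A[i][j] != 0.0 for j in range(n)] for i in range(n)]  # sparsity mask S
    W = [row[:] for row in A]                    # working copy, overwritten in place
    for i in range(1, n):
        for k in range(i):
            if not in_mask[i][k]:
                continue
            W[i][k] /= W[k][k]                   # multiplier l_ik
            for j in range(k + 1, n):
                if in_mask[i][j]:                # discard fill outside S
                    W[i][j] -= W[i][k] * W[k][j]
    L = [[W[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[W[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


# Hypothetical arrow matrix: exact LU would fill in the whole trailing 3 x 3 block.
A = [[4.0, 1.0, 1.0, 1.0],
     [1.0, 4.0, 0.0, 0.0],
     [1.0, 0.0, 4.0, 0.0],
     [1.0, 0.0, 0.0, 4.0]]
L, U = ilu0(A)
LU = [[sum(L[i][k] * U[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
```

In this example the product `LU` agrees with A at every position where A is nonzero and differs only at the discarded fill positions, for instance entry (1, 2).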
-
Parallel Incomplete LU
When the number of nonzeros per row is small, computing ILU[0] is as hard as triangular solves with the factors
Elimination tree is given by spanning tree of original graph, filled graph = original graph
No need for symbolic factorization, and lower memory usage
However, no general accuracy guarantees
ILU can be approximated iteratively with high concurrency (see Chow and Patel, 2015)
-
References – Iterative Methods
R. Barrett, M. Berry, T. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine and H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, 1994
A. Greenbaum, Iterative Methods for Solving Linear Systems, SIAM, 1997
Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed., SIAM, 2003
H. A. van der Vorst, Iterative Krylov Methods for Large Linear Systems, Cambridge University Press, 2003
-
References – Parallel Iterative Methods
J. W. Demmel, M. T. Heath, and H. A. van der Vorst, Parallel numerical linear algebra, Acta Numerica 2:111-197, 1993
J. J. Dongarra, I. S. Duff, D. C. Sorensen, and H. A. van der Vorst, Numerical Linear Algebra for High-Performance Computers, SIAM, 1998
I. S. Duff and H. A. van der Vorst, Developments and trends in the parallel solution of linear systems, Parallel Computing 25:1931-1970, 1999
M. T. Jones and P. E. Plassmann, The efficient parallel iterative solution of large sparse linear systems, A. George, J. R. Gilbert, and J. Liu, eds., Graph Theory and Sparse Matrix Computation, Springer-Verlag, 1993, pp. 229-245
H. A. van der Vorst and T. F. Chan, Linear system solvers: sparse iterative methods, D. E. Keyes, A. Sameh, and V. Venkatakrishnan, eds., Parallel Numerical Algorithms, pp. 91-118, Kluwer, 1997
-
References – Preconditioning
T. F. Chan and H. A. van der Vorst, Approximate and incomplete factorizations, D. E. Keyes, A. Sameh, and V. Venkatakrishnan, eds., Parallel Numerical Algorithms, pp. 167-202, Kluwer, 1997
M. J. Grote and T. Huckle, Parallel preconditioning with sparse approximate inverses, SIAM J. Sci. Comput. 18:838-853, 1997
Y. Saad, Highly parallel preconditioners for general sparse matrices, G. Golub, A. Greenbaum, and M. Luskin, eds., Recent Advances in Iterative Methods, pp. 165-199, Springer-Verlag, 1994
H. A. van der Vorst, High performance preconditioning, SIAM J. Sci. Stat. Comput. 10:1174-1185, 1989
E. Chow and A. Patel, Fine-grained parallel incomplete LU factorization, SIAM J. Sci. Comput. 37(2):C169-C193, 2015
-
References – Graph Partitioning
B. Hendrickson and T. Kolda, Graph partitioning models for parallel computing, Parallel Computing 26:1519-1534, 2000
G. Karypis and V. Kumar, A fast and high quality multilevel scheme for partitioning irregular graphs, SIAM J. Sci. Comput. 20:359-392, 1999
G. L. Miller, S.-H. Teng, W. Thurston, and S. A. Vavasis, Automatic mesh partitioning, A. George, J. R. Gilbert, and J. Liu, eds., Graph Theory and Sparse Matrix Computation, Springer-Verlag, 1993, pp. 57-84
A. Pothen, Graph partitioning algorithms with applications to scientific computing, D. E. Keyes, A. Sameh, and V. Venkatakrishnan, eds., Parallel Numerical Algorithms, pp. 323-368, Kluwer, 1997
C. Walshaw and M. Cross, Parallel optimisation algorithms for multilevel mesh partitioning, Parallel Comput. 26:1635-1660, 2000
-
References – Chaotic Relaxation
R. Barlow and D. Evans, Synchronous and asynchronous iterative parallel algorithms for linear systems, Comput. J. 25:56-60, 1982
R. Bru, L. Elsner, and M. Neumann, Models of parallel chaotic iteration methods, Linear Algebra Appl. 103:175-192, 1988
G. M. Baudet, Asynchronous iterative methods for multiprocessors, J. ACM 25:226-244, 1978
D. Chazan and W. L. Miranker, Chaotic relaxation, Linear Algebra Appl. 2:199-222, 1969
A. Frommer and D. B. Szyld, On asynchronous iterations, J. Comput. Appl. Math. 123:201-216, 2000