The Conjugate Gradient Method
Tom Lyche
University of Oslo
Norway
Plan for the day
The method
Algorithm
Implementation of test problems
Complexity
Derivation of the method
Convergence
The conjugate gradient method
Restricted to positive definite systems: Ax = b, where A ∈ R^{n,n} is positive definite.
Generates a sequence {x_k} by x_{k+1} = x_k + α_k p_k, where p_k is a vector, the search direction, and α_k is a scalar determining the step length.
In general we find the exact solution in at most n iterations.
For many problems the error becomes small after only a few iterations.
It is therefore both a direct method and an iterative method.
The rate of convergence depends on the square root of the condition number of A.
The name of the game
Conjugate means orthogonal; orthogonal gradients. But why gradients?
Consider minimizing the quadratic function Q : R^n → R given by Q(x) := (1/2) x^T A x − x^T b.
The minimum is obtained by setting the gradient equal to zero:
∇Q(x) = Ax − b = 0, i.e., the linear system Ax = b. Find the solution by solving r := b − Ax = 0.
The sequence {x_k} is such that the residuals {r_k} := {b − A x_k} are orthogonal with respect to the usual inner product in R^n.
The search directions are also orthogonal, but with respect to a different inner product.
The algorithm
Start with some x_0. Set p_0 = r_0 = b − A x_0.
For k = 0, 1, 2, . . .
    x_{k+1} = x_k + α_k p_k,        α_k = (r_k^T r_k) / (p_k^T A p_k)
    r_{k+1} = b − A x_{k+1} = r_k − α_k A p_k
    p_{k+1} = r_{k+1} + β_k p_k,    β_k = (r_{k+1}^T r_{k+1}) / (r_k^T r_k)
Example
Solve
    [  2  −1 ] [ x_1 ]   [ 1 ]
    [ −1   2 ] [ x_2 ] = [ 0 ]
Start with x_0 = 0.
p_0 = r_0 = b = [1, 0]^T
α_0 = (r_0^T r_0)/(p_0^T A p_0) = 1/2,  x_1 = x_0 + α_0 p_0 = [0, 0]^T + (1/2)[1, 0]^T = [1/2, 0]^T
r_1 = r_0 − α_0 A p_0 = [1, 0]^T − (1/2)[2, −1]^T = [0, 1/2]^T,  r_1^T r_0 = 0
β_0 = (r_1^T r_1)/(r_0^T r_0) = 1/4,  p_1 = r_1 + β_0 p_0 = [0, 1/2]^T + (1/4)[1, 0]^T = [1/4, 1/2]^T
α_1 = (r_1^T r_1)/(p_1^T A p_1) = 2/3
x_2 = x_1 + α_1 p_1 = [1/2, 0]^T + (2/3)[1/4, 1/2]^T = [2/3, 1/3]^T
r_2 = 0, so x_2 is the exact solution.
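These hand computations are easy to check numerically. The following MATLAB lines (a sketch, not part of the original slides) reproduce the iterates above:

A = [2 -1; -1 2]; b = [1; 0];          % the 2-by-2 example
x = [0; 0]; r = b - A*x; p = r;        % x0 = 0, p0 = r0 = b
for k = 1:2
    alpha = (r'*r)/(p'*(A*p));         % step length alpha_k
    x = x + alpha*p;                   % x_{k+1}
    rnew = r - alpha*(A*p);            % r_{k+1}
    beta = (rnew'*rnew)/(r'*r);        % beta_k
    p = rnew + beta*p; r = rnew;       % p_{k+1}
    disp(x')                           % prints [1/2 0], then [2/3 1/3]
end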
Exact method and iterative method
Orthogonality of the residuals implies that x_m is equal to the solution x of Ax = b for some m ≤ n.
For if x_k ≠ x for all k = 0, 1, . . . , n − 1, then r_k ≠ 0 for k = 0, 1, . . . , n − 1, and {r_0, . . . , r_{n−1}} is an orthogonal basis for R^n. But then r_n ∈ R^n is orthogonal to all vectors in R^n, so r_n = 0 and hence x_n = x.
So the conjugate gradient method finds the exact solution in at most n iterations.
The convergence analysis shows that ‖x − x_k‖_A typically becomes small quite rapidly, so we can stop the iteration with k much smaller than n.
It is this rapid convergence which makes the method interesting and, in practice, an iterative method.
Conjugate Gradient Algorithm
[Conjugate Gradient Iteration] The positive definite linear system Ax = b is solved by the conjugate gradient method. x is a starting vector for the iteration. The iteration is stopped when ‖r_k‖_2/‖r_0‖_2 ≤ tol or k > itmax. itm is the number of iterations used.
function [x,itm] = cg(A,b,x,tol,itmax)
r = b - A*x; p = r; rho = r'*r; rho0 = rho;
for k = 0:itmax
    if sqrt(rho/rho0) <= tol       % ||r_k||_2/||r_0||_2 <= tol
        itm = k; return
    end
    t = A*p;                       % the matrix times vector product
    a = rho/(p'*t);                % step length alpha_k
    x = x + a*p; r = r - a*t;      % update iterate and residual
    rhos = rho; rho = r'*r;
    p = r + (rho/rhos)*p;          % next search direction
end
itm = itmax + 1;                   % tolerance not met in itmax iterations
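As a usage sketch (the matrix construction below with spdiags and kron is an assumption; the slides build the same matrix with a tridiagonal helper), the Poisson test problem can be solved with:

m = 50; n = m^2; e = ones(m,1);                 % illustrative grid size
T = spdiags([-e 2*e -e], -1:1, m, m);           % tridiag(-1,2,-1)
A = kron(speye(m), T) + kron(T, speye(m));      % Poisson matrix, n-by-n
b = ones(n,1)/(m+1)^2;                          % h^2*F with F = ones
[x, itm] = cg(A, b, zeros(n,1), 1e-8, 1000);    % itm is well below n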
A family of test problems
We can test the methods on the Kronecker sum matrix
A = C1 ⊗ I + I ⊗ C2 =

  [ C1              ]   [ cI  bI             ]
  [    C1           ]   [ bI  cI  bI         ]
  [       ...       ] + [     ..  ..  ..     ]
  [          C1     ]   [         bI  cI  bI ]
  [             C1  ]   [             bI  cI ]

where C1 = tridiag(a, c, a) ∈ R^{m,m} and C2 = tridiag(b, c, b) ∈ R^{m,m}.
A is positive definite if c > 0 and c ≥ |a| + |b|.
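In MATLAB the matrix can be assembled with kron; note that kron(X,Y) replaces each entry x_ij of X by the block x_ij*Y, so the two factors appear in the opposite order from the ⊗ notation on this slide. A sketch:

m = 3; a = -1; b = -1; c = 2;                   % Poisson values
C1 = spdiags(ones(m,1)*[a c a], -1:1, m, m);    % tridiag(a,c,a)
C2 = spdiags(ones(m,1)*[b c b], -1:1, m, m);    % tridiag(b,c,b)
A = kron(speye(m), C1) + kron(C2, speye(m));    % C1 blocks on the diagonal, bI off
full(A)                                         % the 9-by-9 matrix on the next slide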
m = 3, n = 9
A =
2c a 0 b 0 0 0 0 0
a 2c a 0 b 0 0 0 0
0 a 2c 0 0 b 0 0 0
b 0 0 2c a 0 b 0 0
0 b 0 a 2c a 0 b 0
0 0 b 0 a 2c 0 0 b
0 0 0 b 0 0 2c a 0
0 0 0 0 b 0 a 2c a
0 0 0 0 0 b 0 a 2c
a = b = −1, c = 2: Poisson matrix
a = b = 1/9, c = 5/18: averaging matrix
Averaging problem
λ_{j,k} = 2c + 2a cos(jπh) + 2b cos(kπh),  j, k = 1, 2, . . . , m,  where h = 1/(m+1).
a = b = 1/9, c = 5/18:
λ_max = 5/9 + (4/9) cos(πh),   λ_min = 5/9 − (4/9) cos(πh)
cond_2(A) = λ_max/λ_min = (5 + 4 cos(πh))/(5 − 4 cos(πh)) ≤ 9.
2D formulation for test problems
x = vec(V), r = vec(R), p = vec(P), with V, R, P ∈ R^{m,m}.
Ax = b  ⇔  DV + VE = h²F,  where D = tridiag(a,c,a) ∈ R^{m,m} and E = tridiag(b,c,b) ∈ R^{m,m}.
If p = vec(P), then Ap = vec(DP + PE).
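A small numerical check of this identity (the spdiags construction is assumed, as before):

m = 4; a = -1; b = -1; c = 2;
D = spdiags(ones(m,1)*[a c a], -1:1, m, m);     % tridiag(a,c,a)
E = spdiags(ones(m,1)*[b c b], -1:1, m, m);     % tridiag(b,c,b)
A = kron(speye(m), D) + kron(E, speye(m));      % the Kronecker sum matrix
P = rand(m); p = P(:);                          % p = vec(P)
norm(A*p - reshape(D*P + P*E, [], 1))           % about 0: Ap = vec(DP + PE)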
Testing
[Testing Conjugate Gradient] A = trid(a,c,a,m) ⊗ I_m + I_m ⊗ trid(b,c,b,m) ∈ R^{m²,m²}
function [V,it] = cgtest(m,a,b,c,tol,itmax)
h = 1/(m+1); R = h*h*ones(m);
D = sparse(tridiagonal(a,c,a,m));  % tridiagonal(a,c,a,m): m-by-m helper
E = sparse(tridiagonal(b,c,b,m));
V = zeros(m,m); P = R; rho = sum(sum(R.*R)); rho0 = rho;
for k = 1:itmax
    if sqrt(rho/rho0) <= tol       % converged
        it = k; return
    end
    T = D*P + P*E;                 % t = A*p in 2D form
    al = rho/sum(sum(P.*T));       % step length (a is already an argument)
    V = V + al*P; R = R - al*T;    % update iterate and residual
    rhos = rho; rho = sum(sum(R.*R));
    P = R + (rho/rhos)*P;          % next search direction
end
it = itmax + 1;                    % tolerance not met in itmax iterations
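For example (a hypothetical call; the values of m, tol, and itmax are illustrative), the two test problems of the following slides would be run as:

[V,itPois] = cgtest(100,-1,-1,2,1e-8,2000);       % Poisson: a = b = -1, c = 2
[V,itAvg]  = cgtest(100,1/9,1/9,5/18,1e-8,2000);  % averaging: a = b = 1/9, c = 5/18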
The Averaging Problem
n    2 500   10 000   40 000   1 000 000   4 000 000
K       22       22       21          21          20

Table 1: The number of iterations K for the averaging problem on a √n × √n grid; x_0 = 0, tol = 10⁻⁸.

Both the condition number and the required number of iterations are independent of the size of the problem.
The convergence is quite rapid.
Poisson Problem
λ_{j,k} = 2c + 2a cos(jπh) + 2b cos(kπh),  j, k = 1, 2, . . . , m.
a = b = −1, c = 2:
λ_max = 4 + 4 cos(πh),   λ_min = 4 − 4 cos(πh)
cond_2(A) = λ_max/λ_min = (1 + cos(πh))/(1 − cos(πh)) = cond_2(T).
cond_2(A) = O(n), since 1 − cos(πh) ≈ π²h²/2 and h = 1/(m+1) with n = m².
The Poisson problem
n      2 500   10 000   40 000   160 000
K        140      294      587      1168
K/√n    1.86     1.87     1.86      1.85
Using CG in the form of Algorithm 8 with tol = 10⁻⁸ and x_0 = 0, we list K, the required number of iterations, and K/√n.
The results show that K is much smaller than n and appears to be proportional to √n.
This is the same speed as for SOR, and we don't have to estimate any acceleration parameter!
√n is essentially the square root of the condition number of A.
Complexity
The work involved in each iteration is:
1. one matrix times vector product (t = Ap),
2. two inner products (p^T t and r^T r),
3. three vector-plus-scalar-times-vector updates (x = x + ap, r = r − at, and p = r + (rho/rhos)p).
The dominating part of the computation is statement 1. Note that for our test problems A has only O(5n) nonzero elements. Therefore, taking advantage of the sparseness of A, we can compute t in O(n) flops. With such an implementation the total number of flops in one iteration is O(n).
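A quick check of the sparsity claim (a sketch using the same assumed construction as before):

m = 100; e = ones(m,1);
T = spdiags([-e 2*e -e], -1:1, m, m);           % tridiag(-1,2,-1)
A = kron(speye(m), T) + kron(T, speye(m));      % Poisson test matrix
nnz(A)/size(A,1)                                % about 5 nonzeros per row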
More Complexity
How many flops do we need to solve the test problems by the conjugate gradient method to within a given tolerance?
Averaging problem: O(n) flops. This is optimal for a problem with n unknowns. Same as SOR and better than the fast method based on the FFT.
Discrete Poisson problem: O(n^{3/2}) flops, the same as SOR and the fast method.
Cholesky algorithm: O(n²) flops, both for the averaging and the Poisson problem.
Analysis and Derivation of the Method
Theorem 3 (Orthogonal Projection). Let S be a subspace of a finite dimensional real or complex inner product space (V, F, ⟨·,·⟩). To each x ∈ V there is a unique vector p ∈ S such that

⟨x − p, s⟩ = 0,  for all s ∈ S.  (1)

[Figure: the orthogonal projection p of x onto the subspace S; the error x − p is orthogonal to S.]
Best Approximation
Theorem 4 (Best Approximation). Let S be a subspace of a finite dimensional real or complex inner product space (V, F, ⟨·,·⟩). Let x ∈ V and p ∈ S. The following statements are equivalent:
1. ⟨x − p, s⟩ = 0 for all s ∈ S.
2. ‖x − s‖ > ‖x − p‖ for all s ∈ S with s ≠ p.
If (v_1, . . . , v_k) is an orthogonal basis for S, then

p = Σ_{i=1}^{k} (⟨x, v_i⟩ / ⟨v_i, v_i⟩) v_i.  (2)
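A quick numerical illustration of (2) (a sketch; with an orthonormal basis the denominators ⟨v_i, v_i⟩ equal 1):

X = orth(randn(5,2));          % columns: an orthonormal basis for S
x = randn(5,1);
p = X*(X'*x);                  % formula (2): p = sum <x,v_i> v_i
disp(X'*(x - p))               % about 0: x - p is orthogonal to S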
Derivation of CG
Ax = b, A ∈ R^{n,n} positive definite, x, b ∈ R^n.
Standard inner product: (x, y) := x^T y for x, y ∈ R^n.
A-inner product: ⟨x, y⟩ := x^T A y = (x, Ay) = (Ax, y).
A-norm: ‖x‖_A := √(x^T A x).
W_0 = {0}, W_1 = span{b}, W_2 = span{b, Ab}, W_k = span{b, Ab, A²b, . . . , A^{k−1}b}.
W_0 ⊂ W_1 ⊂ W_2 ⊂ · · · ⊂ W_k,  dim(W_k) ≤ k,  and w ∈ W_k ⇒ Aw ∈ W_{k+1}.
x_k ∈ W_k is chosen so that ⟨x_k − x, w⟩ = 0 for all w ∈ W_k.
p_0 = r_0 := b,  p_j = r_j − Σ_{i=0}^{j−1} (⟨r_j, p_i⟩ / ⟨p_i, p_i⟩) p_i,  j = 1, . . . , k.
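A small experiment (a sketch with a random positive definite A, not from the slides) confirms the two orthogonality properties behind this derivation: the residuals are orthogonal in the standard inner product, the search directions in the A-inner product:

n = 5; B = randn(n); A = B'*B + n*eye(n);   % a random positive definite A
b = randn(n,1); x = zeros(n,1);
r = b; p = r; Rmat = r; Pmat = p;           % store residuals and directions
for k = 1:n-1
    t = A*p; al = (r'*r)/(p'*t);
    x = x + al*p; rn = r - al*t;
    p = rn + ((rn'*rn)/(r'*r))*p; r = rn;
    Rmat = [Rmat r]; Pmat = [Pmat p];
end
disp(Rmat'*Rmat)     % nearly diagonal: (r_i, r_j) = 0 for i ~= j
disp(Pmat'*A*Pmat)   % nearly diagonal: <p_i, p_j> = 0 for i ~= j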
Convergence
Theorem 5. Suppose we apply the conjugate gradient method to a positive definite system Ax = b. Then the A-norms of the errors satisfy

‖x − x_k‖_A / ‖x − x_0‖_A ≤ 2 ((√κ − 1)/(√κ + 1))^k,  for k ≥ 0,

where κ = cond_2(A) = λ_max/λ_min is the 2-norm condition number of A.
This theorem explains what we observed in the previous section, namely that the number of iterations is linked to √κ, the square root of the condition number of A. Indeed, the following corollary gives an upper bound for the number of iterations in terms of √κ.
Corollary 6. If for some ε > 0 we have k ≥ (1/2)√κ ln(2/ε), then ‖x − x_k‖_A / ‖x − x_0‖_A ≤ ε.
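To connect the corollary back to the Poisson table (a sketch; the grid sizes match the table, whose observed counts were K = 140, 294, 587, 1168):

eps0 = 1e-8;
for m = [50 100 200 400]                     % n = m^2 as in the Poisson table
    h = 1/(m+1);
    kappa = (1 + cos(pi*h))/(1 - cos(pi*h)); % cond_2(A) for the Poisson matrix
    kbound = ceil(0.5*sqrt(kappa)*log(2/eps0));
    fprintf('n = %6d: k <= %5d suffices\n', m^2, kbound);
end
% prints bounds of roughly 311, 615, 1223, 2440: within a factor of
% about two of the observed iteration counts.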