Modern iterative methods

For basic iterative methods, convergence is linear; modern iterative methods converge faster.
– Krylov subspace methods
• Steepest descent method
• Conjugate gradient (CG) method --- most popular
• Preconditioned CG (PCG) method
• GMRES for nonsymmetric matrices
– Other methods (read yourself)
• Chebyshev iterative method
• Lanczos methods
• Conjugate gradient normal residual (CGNR)

Recall the basic iteration (from a splitting of A) and the minimization form of the problem:
$$Ax = b \iff Dx = Rx + c \implies x^{(m+1)} = D^{-1}Rx^{(m)} + D^{-1}c,$$
$$\min_{x \in \mathbb{R}^n} \phi(x) := \frac{1}{2}x^T A x - x^T b.$$
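For reference, a minimal NumPy sketch of such a basic iteration (the Jacobi splitting $A = D - R$, so $c = b$ here; function and variable names are my own, not from the slides):

```python
import numpy as np

def jacobi(A, b, x0, tol=1e-10, max_iter=10_000):
    """Basic iteration x^(m+1) = D^{-1} R x^(m) + D^{-1} c with A = D - R, c = b."""
    D = np.diag(A)                 # diagonal part of A (stored as a vector)
    R = np.diag(D) - A             # remainder, so that A = diag(D) - R
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        x_new = (R @ x + b) / D    # one linearly convergent sweep
        if np.linalg.norm(x_new - x) <= tol:
            break
        x = x_new
    return x_new
```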
Modern iterative methods

Ideas:
– Minimizing the residual
– Projecting to a Krylov subspace

Thm: If A is an n-by-n real symmetric positive definite matrix, then the linear system and the minimization problem
$$Ax = b \qquad\text{and}\qquad \min_{x \in \mathbb{R}^n} \phi(x) = \min_{x \in \mathbb{R}^n}\left(\frac{1}{2}x^T A x - x^T b\right)$$
have the same solution
$$x^* = A^{-1}b, \qquad \phi(x^*) = -\frac{1}{2}b^T A^{-1} b.$$
Proof: see details in class.

Given a search direction $d^{(m)}$, one step of line minimization reads
$$x^{(m+1)} = x^{(m)} + \alpha_m d^{(m)}, \qquad \alpha_m = \operatorname*{argmin}_{\alpha}\, \phi(x^{(m)} + \alpha d^{(m)}).$$
Steepest descent method

Suppose we have an approximation $x_c \approx x^*$. Choose the direction as the negative gradient of $\phi$:
$$d_c = -\nabla \phi(x)\big|_{x = x_c} = -(Ax_c - b) = b - Ax_c =: r_c$$
– If $r_c = b - Ax_c = 0$, then $x_c$ is the exact solution!
– Else, choose $\alpha_c$ to minimize $\phi(x_c + \alpha d_c)$.
Steepest descent method

Computation: expand $\phi$ along the line $x_c + \alpha d_c$,
$$\phi(x_c + \alpha d_c) = \frac{1}{2}(x_c + \alpha d_c)^T A (x_c + \alpha d_c) - (x_c + \alpha d_c)^T b = \phi(x_c) + \alpha\, d_c^T (A x_c - b) + \frac{\alpha^2}{2}\, d_c^T A\, d_c.$$
Setting the derivative with respect to $\alpha$ to zero, choose $\alpha_c$ as
$$\alpha_c = \frac{d_c^T d_c}{d_c^T A\, d_c} = \frac{r_c^T r_c}{r_c^T A\, r_c} \qquad (\text{since } d_c = r_c).$$
Algorithm – Steepest descent method

    Initial guess x^(0)
    Compute r_0 = b - A x^(0) & set m = 0
    while ||r_m||_2 > eps (e.g. eps = 10^(-10)) do
        alpha_m = (r_m^T r_m) / (r_m^T A r_m)
        x^(m+1) = x^(m) + alpha_m r_m  &  r_(m+1) = b - A x^(m+1)
        m = m + 1
    endwhile
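A direct NumPy transcription of this algorithm (a sketch; the function and variable names are mine, not from the slides):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    """Solve A x = b, with A symmetric positive definite, by steepest descent."""
    x = x0.astype(float).copy()
    r = b - A @ x                    # r_0 = b - A x^(0)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:
            break
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)   # exact line minimization along r_m
        x += alpha * r               # x^(m+1) = x^(m) + alpha_m r_m
        r -= alpha * Ar              # r_(m+1) = b - A x^(m+1) = r_m - alpha_m A r_m
    return x
```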
Theory – Steepest descent method

Suppose A is symmetric positive definite.
– Define the A-inner product: $(x, y)_A := (Ax, y) = y^T A x$
– Define the A-norm: $\|x\|_A := \sqrt{(x, x)_A} = \sqrt{x^T A x}$

In this notation, the steepest descent method reads: initial guess $x^{(0)}$ & $r_0 = b - Ax^{(0)}$, then
$$\alpha_m = \frac{(r_m, r_m)}{(r_m, r_m)_A}, \qquad x^{(m+1)} = x^{(m)} + \alpha_m r_m, \qquad r_{m+1} = b - Ax^{(m+1)}.$$
Theory

Thm: For the steepest descent method, we have
$$\phi(x^{(m+1)}) - \phi(x^*) \le \left(1 - \frac{1}{k_2(A)}\right)\left(\phi(x^{(m)}) - \phi(x^*)\right),$$
where $x^* = A^{-1}b$ and $\phi(x^*) = -\frac{1}{2}b^T A^{-1} b$.
Proof: Exercise.
Theory

Rewrite the steepest descent method: along the direction $r_m$, compare the exact line-minimization step with a step of arbitrary size $\tilde\alpha$:
$$x^{(m+1)} = x^{(m)} + \alpha_m r_m, \qquad \tilde x^{(m+1)} = x^{(m)} + \tilde\alpha\, r_m, \qquad \alpha_m = \frac{(r_m, r_m)}{(r_m, r_m)_A}.$$
Let the errors be
$$e^{(m)} := x^{(m)} - x^*, \qquad \tilde e^{(m+1)} := \tilde x^{(m+1)} - x^*.$$
Lemma: For the method, we have
$$(e^{(m+1)}, r_m)_A = 0, \qquad\text{and hence}\qquad \|e^{(m+1)}\|_A \le \|\tilde e^{(m+1)}\|_A \quad\text{for any } \tilde\alpha.$$
Theory

Thm: For the steepest descent method, we have
$$\|e^{(m+1)}\|_A \le \frac{k_2(A) - 1}{k_2(A) + 1}\, \|e^{(m)}\|_A,$$
i.e. the algorithm converges monotonically in the sense of the A-norm.
Proof: See details in class (or as an exercise).
Steepest descent method

Performance
– Converges globally, for any initial data
– If $k_2(A) = O(1)$, then it converges very fast
– If $k_2(A) \gg 1$, then it converges very slowly!!!

Geometric interpretation
– The contour plots of $\phi$ are long, flat ellipses!!
– The local best direction (steepest descent direction) is not necessarily a good global direction
– Computational experience shows that the method suffers a decreasing convergence rate after a few iteration steps because the search directions become linearly dependent!!! (See the numerical illustration below.)
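A quick numerical illustration of this dependence on $k_2(A)$ (my own test problem, using diagonal SPD matrices so that the condition number is known exactly):

```python
import numpy as np

def sd_iterations(A, b, tol=1e-8, max_iter=100_000):
    """Number of steepest-descent steps until ||r||_2 <= tol."""
    x = np.zeros_like(b)
    r = b - A @ x
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol:
            return k
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)
        x += alpha * r
        r -= alpha * Ar
    return max_iter

b = np.ones(2)
for kappa in (2.0, 100.0, 1000.0):
    A = np.diag([1.0, kappa])               # SPD with k_2(A) = kappa
    print(int(kappa), sd_iterations(A, b))  # iteration count grows roughly like kappa
```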
Conjugate gradient (CG) method

Since A is symmetric positive definite, the A-norm
$$\|x\|_A = \sqrt{(x, x)_A} = \sqrt{x^T A x}$$
is well defined. In the CG method, the direction vectors are chosen to be A-orthogonal (and are called conjugate vectors), i.e.
$$(d^{(m)})^T A\, d^{(i)} = 0, \qquad i \ne m.$$
CG method

In addition, we take the new direction vector as a linear combination of the old direction vector and the descent direction:
$$d^{(m+1)} = r_{m+1} + \beta_m d^{(m)}, \qquad r_{m+1} = b - A x^{(m+1)}.$$
By the assumption $(d^{(m+1)})^T A\, d^{(m)} = 0$, we get
$$0 = (r_{m+1} + \beta_m d^{(m)})^T A\, d^{(m)} \quad\implies\quad \beta_m = -\frac{r_{m+1}^T A\, d^{(m)}}{(d^{(m)})^T A\, d^{(m)}}.$$
Algorithm – CG Method

    Choose initial guess x^(0)
    Compute r_0 = b - A x^(0) & set d_0 = r_0
    For m = 0, 1, ..., do
        Compute alpha_m = (r_m^T r_m) / (d_m^T A d_m)
        x^(m+1) = x^(m) + alpha_m d_m  &  r_(m+1) = r_m - alpha_m A d_m
        If ||r_(m+1)||_2 <= eps (e.g. eps = 10^(-10)), then
            stop
        endif
        beta_m = (r_(m+1)^T r_(m+1)) / (r_m^T r_m)  &  d_(m+1) = r_(m+1) + beta_m d_m
    endfor

(This $\beta_m$ is equivalent to the expression $-r_{m+1}^T A\, d^{(m)} / (d^{(m)})^T A\, d^{(m)}$ derived on the previous slide.)
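A direct NumPy transcription of this algorithm (a sketch; the names are mine, not from the slides):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    """Solve A x = b, with A symmetric positive definite, by the CG method."""
    x = x0.astype(float).copy()
    r = b - A @ x            # residual r_0
    d = r.copy()             # first direction d_0 = r_0
    rr = r @ r
    for _ in range(len(b)):  # at most n steps in exact arithmetic
        Ad = A @ d
        alpha = rr / (d @ Ad)        # alpha_m = (r_m^T r_m) / (d_m^T A d_m)
        x += alpha * d               # x^(m+1) = x^(m) + alpha_m d_m
        r -= alpha * Ad              # r_(m+1) = r_m - alpha_m A d_m
        rr_new = r @ r
        if np.sqrt(rr_new) <= tol:   # stop when the residual is small
            break
        beta = rr_new / rr           # beta_m = (r_(m+1)^T r_(m+1)) / (r_m^T r_m)
        d = r + beta * d             # d_(m+1) = r_(m+1) + beta_m d_m
        rr = rr_new
    return x
```

The loop is capped at n passes because of the finite-termination property; in floating-point arithmetic one may allow a few more iterations.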
An example

Consider
$$A = \begin{pmatrix} 5 & 1 & 1 \\ 1 & 5 & 1 \\ 1 & 1 & 5 \end{pmatrix}, \qquad b = \begin{pmatrix} 7 \\ 7 \\ 7 \end{pmatrix}, \qquad Ax = b, \qquad x^* = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$
Initial guess:
$$x^{(0)} = (0.0000,\ 0.0000,\ 0.0000)^T.$$
The first step gives $r_0 = d_0 = (7.0000,\ 7.0000,\ 7.0000)^T$, $\alpha_0 = 0.1429$, and the approximate solution
$$x^{(1)} = (1.0003,\ 1.0003,\ 1.0003)^T.$$
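One can verify this step with a few lines of NumPy (my own check, not part of the slides). Note that in exact arithmetic $\alpha_0 = 1/7$ and $x^{(1)}$ is already the exact solution, because $r_0$ happens to be an eigenvector of $A$; the value 1.0003 comes from the rounded $\alpha_0 = 0.1429$.

```python
import numpy as np

A = np.array([[5., 1., 1.], [1., 5., 1.], [1., 1., 5.]])
b = np.array([7., 7., 7.])
x0 = np.zeros(3)

r0 = b - A @ x0                       # r_0 = d_0 = (7, 7, 7)^T
alpha0 = (r0 @ r0) / (r0 @ (A @ r0))
print(alpha0)                         # 0.142857... (rounded to 0.1429 on the slide)
print(x0 + alpha0 * r0)               # [1. 1. 1.] = exact solution in one step
```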
CG method

In the CG method, the directions $d_m$ & $d_{m+1}$ are A-orthogonal, i.e. $(d^{(m+1)})^T A\, d^{(m)} = 0$ (orthogonality with respect to the A-inner product)!

Define the linear space spanned by the direction vectors as
$$\operatorname{span}\{d_0, d_1, \ldots, d_m\} = \Big\{\, y \ \Big|\ y = \sum_{i=0}^{m} \gamma_i d_i,\ \gamma_i \in \mathbb{R} \,\Big\}.$$
Lemma: In the CG method, for m = 0, 1, ..., we have
$$\operatorname{span}\{d_0, d_1, \ldots, d_m\} = \operatorname{span}\{r_0, r_1, \ldots, r_m\} = \operatorname{span}\{r_0, A r_0, \ldots, A^m r_0\}.$$
– Proof: See details in class or as an exercise.
CG method

In the CG method, $d_{m+1}$ is A-orthogonal to $d_0, d_1, \ldots, d_m$, i.e. to $\operatorname{span}\{d_0, d_1, \ldots, d_m\}$, with respect to the A-inner product.

Lemma: In the CG method, we have
$$(d^{(i)})^T A\, d^{(j)} = 0, \qquad r_i^T r_j = 0, \qquad i \ne j.$$
– Proof: See details in class or as an exercise.

Thm (error estimate for the CG method): for m = 0, 1, 2, ..., we have
$$\|e^{(m)}\|_A \le 2\left(\frac{\sqrt{k_2(A)} - 1}{\sqrt{k_2(A)} + 1}\right)^m \|e^{(0)}\|_A,$$
where $e^{(m)} = x^{(m)} - x^*$, $\|x\|_A = \sqrt{x^T A x}$, and $k_2(A) = \lambda_{\max}(A)/\lambda_{\min}(A)$.
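These orthogonality relations are easy to observe numerically. The sketch below (my own construction) stores all CG residuals and directions for a small random SPD system and checks that $r_i^T r_j \approx 0$ and $d_i^T A\, d_j \approx 0$ for $i \ne j$:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)        # random symmetric positive definite matrix
b = rng.standard_normal(6)

# Run CG, keeping every residual r_m and direction d_m
x = np.zeros(6)
r = b - A @ x
d = r.copy()
R, D = [r.copy()], [d.copy()]
for _ in range(6):
    Ad = A @ d
    alpha = (r @ r) / (d @ Ad)
    x += alpha * d
    r_new = r - alpha * Ad
    beta = (r_new @ r_new) / (r @ r)
    d = r_new + beta * d
    r = r_new
    R.append(r.copy()); D.append(d.copy())

R, D = np.array(R), np.array(D)
print(np.max(np.abs(np.triu(R @ R.T, k=1))))      # r_i^T r_j,   i < j: ~0
print(np.max(np.abs(np.triu(D @ A @ D.T, k=1))))  # d_i^T A d_j, i < j: ~0
```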
CG method

Computational cost
– At each iteration, 2 matrix-vector multiplications; this can be further reduced to 1 matrix-vector multiplication (compute $A d_m$ once and reuse it in the updates of both $x$ and $r$)
– After at most n steps, we get the exact solution (in exact arithmetic)!!!

Convergence rate depends on the condition number
– $k_2(A) = O(1)$: converges very fast!!
– $k_2(A) \gg 1$: converges slowly, but can be accelerated by preconditioning!!
Preconditioning

Idea: Replace $Ax = b$ by $\tilde A \tilde x = \tilde b$ with
$$\tilde A = C^{-1} A\, C^{-1}, \qquad \tilde x = C x, \qquad \tilde b = C^{-1} b,$$
satisfying
– C is symmetric positive definite
– $\tilde A$ is well-conditioned, i.e. $k_2(\tilde A) \ll k_2(A)$
– $\tilde A \tilde x = \tilde b$ can be easily solved

Conditions for choosing the preconditioning matrix
– $k_2(\tilde A)$ as small as possible
– Systems with C (i.e. $Cz = r$) are easy to solve
– Trade-off between these two requirements
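For intuition, here is a small demonstration (my own example) of how even the simple diagonal choice $C = \operatorname{diag}(A)^{1/2}$ can shrink the condition number:

```python
import numpy as np

A = np.diag([1.0, 10.0, 100.0, 1000.0]) + 0.1 * np.ones((4, 4))
Cinv = np.diag(1.0 / np.sqrt(np.diag(A)))  # C^{-1} for C = diag(A)^{1/2}
A_tilde = Cinv @ A @ Cinv                  # preconditioned matrix C^{-1} A C^{-1}

print(np.linalg.cond(A))        # k_2(A): large (~1000)
print(np.linalg.cond(A_tilde))  # k_2(A~): close to 1
```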
Algorithm – PCG Method

    Choose initial guess x^(0) & compute r_0 = b - A x^(0)
    Solve C r~_0 = r_0 & set d_0 = r~_0
    For m = 0, 1, ..., do
        Compute alpha_m = (r~_m^T r_m) / (d_m^T A d_m)
        x^(m+1) = x^(m) + alpha_m d_m  &  r_(m+1) = r_m - alpha_m A d_m
        If ||r_(m+1)||_2 <= eps (e.g. eps = 10^(-10)), then
            stop
        endif
        Solve C r~_(m+1) = r_(m+1)
        beta_m = (r~_(m+1)^T r_(m+1)) / (r~_m^T r_m)  &  d_(m+1) = r~_(m+1) + beta_m d_m
    endfor
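A minimal NumPy sketch of this algorithm, using the diagonal (Jacobi) preconditioner $C = \operatorname{diag}(A)$ as one simple choice from the list on the next slide (all names are mine; z plays the role of $\tilde r$):

```python
import numpy as np

def pcg(A, b, x0, tol=1e-10, max_iter=None):
    """Preconditioned CG with the Jacobi preconditioner C = diag(A)."""
    C = np.diag(A)               # store only the diagonal; solving Cz = r is trivial
    x = x0.astype(float).copy()
    r = b - A @ x
    z = r / C                    # solve C z_0 = r_0
    d = z.copy()
    zr = z @ r
    for _ in range(max_iter or len(b)):
        Ad = A @ d
        alpha = zr / (d @ Ad)    # alpha_m = (z_m^T r_m) / (d_m^T A d_m)
        x += alpha * d
        r -= alpha * Ad
        if np.linalg.norm(r) <= tol:
            break
        z = r / C                # solve C z_(m+1) = r_(m+1)
        zr_new = z @ r
        beta = zr_new / zr       # beta_m = (z_(m+1)^T r_(m+1)) / (z_m^T r_m)
        d = z + beta * d
        zr = zr_new
    return x
```

Solving $Cz = r$ must stay cheap relative to a matrix-vector product; for the Jacobi choice it is a single elementwise division.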
Preconditioning

Ways to choose the matrix C (read yourself)
– Diagonal part of A
– Tridiagonal part of A
– m-step Jacobi preconditioner
– Symmetric Gauss-Seidel preconditioner
– SSOR preconditioner
– Incomplete Cholesky decomposition
– Incomplete block preconditioning
– Preconditioning based on domain decomposition
– …
Extension of CG method to nonsymmetric matrices

Biconjugate gradient (BiCG) method
– Solves $Ax = b$ & $A^T y = b$ simultaneously
– Works well when A is positive definite but not symmetric
– If A is symmetric, BiCG reduces to CG

Conjugate gradient squared (CGS) method
– Useful when A has a special formula for computing $Ax$ but its transpose does not
– Multiplication by A is efficient, but multiplication by its transpose is not
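In practice one typically calls library implementations; SciPy, for instance, provides both solvers (a usage sketch with a made-up nonsymmetric test matrix):

```python
import numpy as np
from scipy.sparse.linalg import bicg, cgs

rng = np.random.default_rng(1)
A = 10 * np.eye(20) + rng.standard_normal((20, 20))  # nonsymmetric, positive definite
b = rng.standard_normal(20)

x1, info1 = bicg(A, b)   # info == 0 signals convergence
x2, info2 = cgs(A, b)
print(info1, np.linalg.norm(A @ x1 - b))
print(info2, np.linalg.norm(A @ x2 - b))
```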
Krylov subspace methods

Problem I. Linear system:
$$Ax = b$$
Problem II. Variational formulation: find $x \in \mathbb{R}^n$ such that
$$(Ax, v) = (b, v) \qquad \forall\, v \in \mathbb{R}^n$$
Problem III. Minimization problem:
$$\min_{x \in \mathbb{R}^n} \phi(x) := \frac{1}{2}x^T A x - x^T b = \frac{1}{2}(Ax, x) - (b, x)$$
– Thm 1: Problem I is equivalent to Problem II
– Thm 2: If A is symmetric positive definite, all three problems are equivalent
Krylov subspace methods

To reduce the problem size, we replace $\mathbb{R}^n$ by a subspace
$$S_m = \operatorname{span}\{d_0, d_1, \ldots, d_{m-1}\} \qquad \text{with an initial guess } x^{(0)},$$
and seek
$$x^{(m)} = x^{(0)} + \sum_{k=0}^{m-1} \gamma_k d_k \ \in\ x^{(0)} + S_m.$$

Subspace minimization:
– Find $x^{(m)} \in x^{(0)} + S_m$
– Such that $\phi(x^{(m)}) = \min_{x \in x^{(0)} + S_m} \phi(x) = \min_{x \in x^{(0)} + S_m} \frac{1}{2}(Ax, x) - (b, x)$

Subspace projection:
– Find $x^{(m)} \in x^{(0)} + S_m$
– Such that $(Ax^{(m)}, v) = (b, v)$ for all $v \in S_m$, i.e. $(Ax^{(m)} - b, d_k) = 0$ for $0 \le k \le m-1$
Krylov subspace methods

To determine the coefficients $\gamma_0, \ldots, \gamma_{m-1}$, we have the
– Normal equations: substituting $x^{(m)} = x^{(0)} + \sum_k \gamma_k d_k$ into the projection conditions gives $\sum_{k=0}^{m-1} (A d_k, d_j)\,\gamma_k = (b - Ax^{(0)}, d_j)$, $0 \le j \le m-1$
– It is a linear system of size m!!

m = 1: line minimization, or line search, or 1D projection.
By converting this formula into an iteration, we reduce the original problem to a sequence of line minimizations (successive line minimization), as sketched below.
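As a concrete sketch of this projection step (my own illustration): assemble the m-by-m normal equations $G\gamma = \rho$, with $G_{jk} = (A d_k, d_j)$ and $\rho_j = (b - Ax^{(0)}, d_j)$, and solve for the coefficients; with a single direction (m = 1) this is exactly one line minimization.

```python
import numpy as np

def subspace_projection(A, b, x0, D):
    """Project A x = b onto x0 + span(columns of D); D is n x m."""
    G = D.T @ A @ D               # normal-equations matrix, G_jk = (A d_k, d_j)
    rho = D.T @ (b - A @ x0)      # right-hand side, rho_j = (b - A x0, d_j)
    gamma = np.linalg.solve(G, rho)
    return x0 + D @ gamma         # x^(m) = x^(0) + sum_k gamma_k d_k

# m = 1: a single line minimization along d_0 (here the steepest-descent direction)
A = np.array([[5., 1., 1.], [1., 5., 1.], [1., 1., 5.]])
b = np.array([7., 7., 7.])
x0 = np.zeros(3)
d0 = (b - A @ x0).reshape(3, 1)
print(subspace_projection(A, b, x0, d0))   # [1. 1. 1.]
```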