
PART III

(5) Computation with Markov Chains: Iterative Methods

- Vector Norms and Matrix Norms
- Power Method for Matrix Eigenvalues
- Iterative Methods for Systems of Linear Equations
- Spectral Radius
- Steepest Descent Method and Conjugate Gradient Method

(6) Markovian Queueing Networks, Manufacturing and Re-manufacturing Systems

- A Single Markovian Queue (M/M/s/n-s-1)
- Two-Queue Free Models
- Two-Queue Overflow Free Models
- Manufacturing and Re-manufacturing Systems

Computational science is now the third paradigm of science, complementing theory and experiment.

Kenneth G. Wilson, Nobel Laureate in Physics. (Wikipedia)

http://hkumath.hku.hk/~wkc/course/part3.pdf


5 Computation with Markov Chains: Iterative Methods

5.1 From Numerical Analysis to Scientific Computing

• It was not until the 20th century that "Numerical Analysis" became a recognized mathematical discipline, although numerical methods for obtaining approximate solutions to many mathematical problems existed in ancient times.

• In fact, many numerical methods bear the names of great mathematicians such as Gauss, Fourier, Jacobi and Newton.

• For example, Newton's divided difference method is a famous interpolation formula for fitting a polynomial to a given set of points, and Gaussian elimination is a direct method for solving a system of linear equations.


Figure 1: Gauss (1777-1855) (Left) Fourier (1768-1830) (Right)

Figure 2: Jacobi (1804-1851) (Left) Newton (1642-1727) (Right) Taken from Wikipedia


• The following are two books on the history of Numerical Analysis and Scientific Computing:

(1) H. H. Goldstine, A History of Numerical Analysis from the 15th Through the 19th Century, Springer, New York, 1977.

(2) H. H. Goldstine, The Computer from Pascal to von Neumann, Princeton University Press, Princeton, NJ, 1972.

• You can also consult the slides on the topic "Key Moments in the History of Numerical Analysis" (pdf) by Michele Benzi. (The remainder of the notes on the history of Numerical Analysis is partially taken from these slides.)

http://history.siam.org/pdf/nahist_Benzi.pdf


• There was very little work in Numerical Analysis / Computational Mathematics before 1940. The following is a remarkable early work on numerical methods for solving differential equations.

• L. F. Richardson, The approximate arithmetical solution by finite differences of physical problems involving differential equations, with an application to stresses in a masonry dam, Philosophical Transactions of the Royal Society of London, A, 210 (1910) 307-357.

• Theoretically speaking, approximating a differential equation by the finite difference method is an excellent idea. However, solving a large system of linear equations (thousands or millions of unknowns) is computationally infeasible without a computer! Thus, at the time, research in this area seemed to be of little practical use.


• Two important events triggered the development of Numerical Analysis and Scientific Computing:

(i) the Second World War (1939);
(ii) the invention of the first digital computer (1945).

• Due to the Second World War, many European refugees, especially from Nazi Germany and Fascist Italy, moved to the US.

• Many of them were scientists and made important contributions to the war effort.


Figure 3: Courant (1888-1972) Taken from Wikipedia.

One important giant was Richard Courant, the successor of David Hilbert as director of the famous Mathematical Institute in Göttingen (a leading center for research in quantum physics in the 1920s-30s). Courant left Germany in 1933 as he was classified as a Jew by the Nazis. After one year in Cambridge, Courant went to New York City, where he became a professor at New York University in 1936. He was given the task of founding an institute for graduate studies in mathematics, a task which he carried out very successfully. The Courant Institute of Mathematical Sciences (as it was renamed in 1964) continues to be one of the most respected research centers in applied mathematics. (Taken from Wikipedia.)


Figure 4: von Neumann (1903-1957) Taken from Wikipedia.

Another giant is John von Neumann. Between 1926 and 1930 he taught at the University of Berlin as a Privatdozent, the youngest in its history. His father, Max von Neumann, died in 1929. In 1930, von Neumann, his mother, and his brothers emigrated to the United States. Von Neumann was invited to Princeton University, New Jersey, in 1930, and, subsequently, was one of the first four people selected for the faculty of the Institute for Advanced Study (two of the others being Albert Einstein and Kurt Gödel), where he remained a mathematics professor from its formation in 1933 until his death. (Taken from Wikipedia.)


Figure 5: The ENIAC (1945). Taken from Wikipedia.

The first large-scale electronic computer, the Electronic Numerical Integrator and Computer (ENIAC), was built in 1945.


Figure 6: Goldstine (1913-2004) Taken from Wikipedia.

After World War II, Goldstine joined von Neumann and Burks at the Institute for Advanced Study in Princeton, where they built a computer referred to as the IAS machine. The IAS machine influenced the design of IBM's early computers through von Neumann, who was a consultant to IBM. When von Neumann died in 1957, the IAS computer project was terminated. Von Neumann and Goldstine started the pioneering work in numerical linear algebra. (Taken from Wikipedia.)


Figure 7: Hestenes (1906-1991) (Left) Lanczos (1893-1974)(Middle) Stiefel (1909-1978) (Right) Taken from Wikipedia

M. Hestenes, together with C. Lanczos and E. Stiefel, invented the conjugate gradient (CG) method in the 1950s.

The CG method is an iterative method for solving symmetric positive definite linear systems.


Figure 8: Young (1923-2008) Taken from Wikipedia.

David M. Young designed the Successive Over-Relaxation (SOR) method in the 1950s, a variant of the Gauss-Seidel method for solving a system of linear equations with faster convergence. He is also called Dr. SOR.


5.2 Vector Norms and Matrix Norms

Definition 1 On a vector space V, a norm is a function ∥ · ∥ from V to the set of non-negative real numbers such that

(i) ∥x∥ > 0 ∀x ∈ V with x ≠ 0;

(ii) ∥λx∥ = |λ|∥x∥ ∀x ∈ V, λ ∈ R;

(iii) ∥x + y∥ ≤ ∥x∥ + ∥y∥ ∀x,y ∈ V .

Proposition 1 The following are three popular vector norms:¹

(a) ℓ2-norm: ∥x∥2 = (∑_{i=1}^n x_i²)^{1/2}, where x = (x_1, . . . , x_n)^T;

(b) ℓ1-norm: ∥x∥1 = ∑_{i=1}^n |x_i|;

(c) ℓ∞-norm: ∥x∥∞ = max_{1≤i≤n} {|x_i|}.

¹ An iterative method produces a sequence of approximate solutions to a system of linear equations. To measure the error of these approximations, we have to introduce a measurement: the vector norm and the matrix norm.


Proof: (a) (i) We note that if x ≠ 0, then at least one x_i ≠ 0. Thus

(∑_{i=1}^n x_i²)^{1/2} > 0.

(ii) We note that

∥λx∥2 = (∑_{i=1}^n (λx_i)²)^{1/2} = |λ| (∑_{i=1}^n x_i²)^{1/2} = |λ|∥x∥2.

(iii) Moreover, using the Cauchy-Schwarz inequality in the middle step, we also have

((∑_{i=1}^n x_i²)^{1/2} + (∑_{i=1}^n y_i²)^{1/2})²
= ∑_{i=1}^n x_i² + ∑_{i=1}^n y_i² + 2 (∑_{i=1}^n x_i²)^{1/2} (∑_{i=1}^n y_i²)^{1/2}
≥ ∑_{i=1}^n x_i² + ∑_{i=1}^n y_i² + 2 ∑_{i=1}^n |x_i||y_i|
= ∑_{i=1}^n (|x_i| + |y_i|)² ≥ ∑_{i=1}^n |x_i + y_i|².

Hence the result follows.


(b) (i) We note that if x ≠ 0, then at least one x_i ≠ 0. Thus

∑_{i=1}^n |x_i| > 0.

(ii) Second, we have

∥λx∥1 = ∑_{i=1}^n |λx_i| = |λ| ∑_{i=1}^n |x_i| = |λ|∥x∥1.

(iii) Moreover, we have

∑_{i=1}^n |x_i| + ∑_{i=1}^n |y_i| = ∑_{i=1}^n (|x_i| + |y_i|) ≥ ∑_{i=1}^n |x_i + y_i|.

Hence the result follows.


(c) (i) If x ≠ 0, then at least one x_i ≠ 0. Thus we have

max_i {|x_i|} > 0.

(ii) We also have

∥λx∥∞ = max_i {|λx_i|} = |λ| max_i {|x_i|}.

(iii) Finally, we have

∥x + y∥∞ = max_i {|x_i + y_i|} ≤ max_i {|x_i|} + max_i {|y_i|} = ∥x∥∞ + ∥y∥∞.

Hence the result follows.

• We have the above THREE popular vector norms. Is there any other vector norm? In fact, the answer is yes, and we shall introduce a family of vector norms with the above three vector norms as particular cases.
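The norms above are straightforward to evaluate numerically. Below is a minimal NumPy sketch (the helper name lp_norm is mine, not part of the notes) that evaluates ∥x∥p directly from the definition; taking p large already illustrates how the ℓp-norm approaches the ℓ∞-norm discussed in Section 5.2.2.

import numpy as np

def lp_norm(x, p):
    # (sum_i |x_i|^p)^(1/p) for p >= 1
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0, 1.0])
print(lp_norm(x, 1))        # l1-norm: 8.0
print(lp_norm(x, 2))        # l2-norm: sqrt(26), approx. 5.0990
print(lp_norm(x, 100))      # already close to the l-infinity norm
print(np.max(np.abs(x)))    # l-infinity norm: 4.0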


5.2.1 The ℓp-norm

For p ≥ 1, the following is a vector norm:

∥x∥p = (∑_{i=1}^n |x_i|^p)^{1/p}.

(i) It is clear that if x ≠ 0 then ∥x∥p > 0.

(ii) We have

∥λx∥p = (∑_{i=1}^n |λx_i|^p)^{1/p} = |λ| (∑_{i=1}^n |x_i|^p)^{1/p} = |λ|∥x∥p.

(iii) Finally, we have to show that ∥x + y∥p ≤ ∥x∥p + ∥y∥p, i.e.

(∑_{i=1}^n |x_i + y_i|^p)^{1/p} ≤ (∑_{i=1}^n |x_i|^p)^{1/p} + (∑_{i=1}^n |y_i|^p)^{1/p}.

For p = 1 we have already proved this, so we shall consider p > 1.


Lemma 1 Let p > 1 and define q such that 1/p + 1/q = 1. Then for any non-negative a and b, we have

a^{1/p} b^{1/q} ≤ a/p + b/q.

Proof: The inequality is clearly true for b = 0 or b = a.

Case 1: Assume that a ≥ b > 0 and let x = a/b ≥ 1. Dividing both sides by b, the inequality can be re-written as

x^{1/p} − 1 ≤ (1/p)(x − 1) for x ≥ 1.

Define

f(x) = x^{1/p} − 1 − (1/p)(x − 1),

so that

f′(x) = (1/p) x^{1/p − 1} − 1/p = (1/p)(x^{−1/q} − 1).

Since f(1) = 0 and f′(x) ≤ 0 for x ≥ 1, we have f(x) ≤ 0 for x ≥ 1.

Case 2: Assume b > a > 0 and let x = b/a. The proof is similar to Case 1.


Lemma 2 Let p > 1 and q be defined such that 1/p + 1/q = 1. Then

∑_{i=1}^n |x_i y_i| ≤ (∑_{i=1}^n |x_i|^p)^{1/p} (∑_{i=1}^n |y_i|^q)^{1/q}.

Proof: Let

A = (∑_{i=1}^n |x_i|^p)^{1/p},  B = (∑_{i=1}^n |y_i|^q)^{1/q},  a_i = |x_i|^p / A^p,  b_i = |y_i|^q / B^q.

By Lemma 1 we have, for i = 1, 2, . . . , n,

a_i^{1/p} b_i^{1/q} = |x_i||y_i| / (AB) ≤ a_i/p + b_i/q.

Summing over i and noting that ∑_i a_i = ∑_i b_i = 1, we get

∑_{i=1}^n |x_i||y_i| ≤ AB ∑_{i=1}^n (a_i/p + b_i/q) = (1/p + 1/q) AB = AB = (∑_{i=1}^n |x_i|^p)^{1/p} (∑_{i=1}^n |y_i|^q)^{1/q}.


• Now for p > 1 and x, y ≠ 0, we have

∑_{i=1}^n |x_i + y_i|^p = ∑_{i=1}^n |x_i + y_i||x_i + y_i|^{p−1} ≤ ∑_{i=1}^n |x_i||x_i + y_i|^{p−1} + ∑_{i=1}^n |y_i||x_i + y_i|^{p−1}.

By Lemma 2, we have

∑_{i=1}^n |x_i||x_i + y_i|^{p−1} ≤ (∑_{i=1}^n |x_i|^p)^{1/p} (∑_{i=1}^n |x_i + y_i|^{(p−1)q})^{1/q}

and

∑_{i=1}^n |y_i||x_i + y_i|^{p−1} ≤ (∑_{i=1}^n |y_i|^p)^{1/p} (∑_{i=1}^n |x_i + y_i|^{(p−1)q})^{1/q}.

Hence, since (p − 1)q = p,

∑_{i=1}^n |x_i + y_i|^p ≤ [ (∑_{i=1}^n |x_i|^p)^{1/p} + (∑_{i=1}^n |y_i|^p)^{1/p} ] (∑_{i=1}^n |x_i + y_i|^p)^{1/q},

and the result follows.


5.2.2 The ℓ∞-norm

For the ℓ∞-norm, one can regard it as

lim_{p→∞} ∥x∥p = lim_{p→∞} (∑_{i=1}^n |x_i|^p)^{1/p}.

Let |x| = max_{1≤i≤n} {|x_i|}. We note that

∥x∥p = |x| · (∑_{i=1}^n (|x_i|/|x|)^p)^{1/p}

and |x_i|/|x| ≤ 1. Thus we have

1 ≤ (∑_{i=1}^n (|x_i|/|x|)^p)^{1/p} ≤ n^{1/p}

and

1 ≤ lim_{p→∞} (∑_{i=1}^n (|x_i|/|x|)^p)^{1/p} ≤ lim_{p→∞} n^{1/p} = 1.


• We conclude that

lim_{p→∞} ∥x∥p = lim_{p→∞} (∑_{i=1}^n |x_i|^p)^{1/p} = max_{1≤i≤n} {|x_i|},

and therefore we define

∥x∥∞ = max_{1≤i≤n} {|x_i|}.

Definition 2 The matrix norm of an n × n square matrix A is defined as

∥A∥M = sup {∥Au∥ : u ∈ Rn, ∥u∥ = 1}. (5.1)

Remark 1 We note that ∥ · ∥ is a vector norm; it can be ∥ · ∥∞, ∥ · ∥1 or ∥ · ∥2. A matrix norm is induced by a vector norm.


Proposition 2 If ∥ · ∥ is any vector norm on Rn, then (5.1) defines a norm on the linear space of all n × n matrices.

Proof: (i) Suppose A ≠ 0, say A_ij ≠ 0. We let

x = (0, 0, · · · , 0, 1, 0, · · · , 0)^T  (1 in the j-th entry),  so that ∥x∥ = 1

and ∥Ax∥ = ∥A_j∥ > 0, where A_j is the j-th column of A. Hence ∥A∥M > 0.

(ii) We have

∥λA∥M = sup{∥λAu∥ : ∥u∥ = 1} = |λ| sup{∥Au∥ : ∥u∥ = 1} = |λ|∥A∥M.

(iii) Finally, we have to show that ∥A + B∥M ≤ ∥A∥M + ∥B∥M:

∥A + B∥M = sup{∥(A + B)u∥ : ∥u∥ = 1}
≤ sup{∥Au∥ + ∥Bu∥ : ∥u∥ = 1}
≤ sup{∥Au∥ : ∥u∥ = 1} + sup{∥Bu∥ : ∥u∥ = 1}
= ∥A∥M + ∥B∥M.


Proposition 3 We have ∥Ax∥ ≤ ∥A∥M∥x∥.

Proof: Case 1: For x = 0, the result is obvious.

Case 2: For x ≠ 0, let

u = x / ∥x∥,

so that ∥u∥ = 1. We have

∥A∥M ≥ ∥Au∥ = ∥Ax∥ / ∥x∥.

Hence

∥A∥M ≥ (1/∥x∥) ∥Ax∥,

and therefore ∥A∥M ∥x∥ ≥ ∥Ax∥.

Remark 2 If A = I then

∥A∥M = sup{∥Au∥ : u ∈ Rn, ∥u∥ = 1} = sup{∥u∥ : ∥u∥ = 1} = 1.

That is to say ∥I∥M = 1.


Proposition 4 ∥AB∥M ≤ ∥A∥M · ∥B∥M .

Proof: From Proposition 3 above, we have, for all x ∈ Rn,

∥ABx∥ ≤ ∥A∥M ∥Bx∥ ≤ ∥A∥M ∥B∥M ∥x∥.

Hence

∥AB∥M = sup{∥ABx∥ : x ∈ Rn, ∥x∥ = 1}
≤ sup{∥A∥M ∥B∥M ∥x∥ : x ∈ Rn, ∥x∥ = 1}
= ∥A∥M · ∥B∥M.


Proposition 5

∥A∥M∞ = max_{1≤i≤n} ∑_{j=1}^n |A_ij|.

Proof:

∥A∥M∞ = sup {∥Ax∥∞ : x ∈ Rn, ∥x∥∞ = 1} = sup_{∥x∥∞=1} {∥Ax∥∞}
= sup_{∥x∥∞=1} { max_{1≤i≤n} |∑_{j=1}^n A_ij x_j| }
= max_{1≤i≤n} sup_{∥x∥∞=1} { |∑_{j=1}^n A_ij x_j| } = max_{1≤i≤n} { ∑_{j=1}^n |A_ij| }.

• Note that

sup_{∥x∥∞=1} {−2x_1 + 3x_2 − 4x_3} = 2 + 3 + 4 = 9.

The supremum above is achieved by taking x_1 = x_3 = −1 and x_2 = 1.


Proposition 6

∥A∥M1 = max_{1≤j≤n} { ∑_{i=1}^n |A_ij| }.

Proof: We have

∥A∥M1 = sup {∥Ax∥1 : x ∈ Rn, ∥x∥1 = 1} = sup_{∥x∥1=1} {∥Ax∥1}
= sup_{∥x∥1=1} { ∑_{i=1}^n |∑_{j=1}^n A_ij x_j| }
≤ sup_{∥x∥1=1} { ∑_{j=1}^n |x_j| ∑_{i=1}^n |A_ij| } = 1 · max_{1≤j≤n} { ∑_{i=1}^n |A_ij| }.

We note that if

max_{1≤j≤n} { ∑_{i=1}^n |A_ij| } = ∑_{i=1}^n |A_ik|,

then this bound is attained by letting

x = e_k = (0, 0, . . . , 0, 1, 0, · · · , 0)^T  (1 in the k-th entry).


Proposition 7

∥A∥M2 = √(λmax(A^T A)).

Proof: We note that

∥A∥²M2 = sup{∥Ax∥²2 : ∥x∥²2 = 1}.

Since A^T A is a symmetric matrix, there exists a matrix P such that

A^T A = P^T D P and P^T P = I.

Here D is a diagonal matrix containing all the eigenvalues of A^T A. Hence

∥Ax∥²2 = (Ax)^T (Ax) = x^T A^T A x = (Px)^T D (Px).

We also observe that

x^T x = 1 if and only if (Px)^T (Px) = x^T P^T P x = 1.

Hence, by letting y = Px, we have

∥A∥²M2 = sup{y^T D y : ∥y∥²2 = 1} = λmax(A^T A).


Example 1 Suppose

A = [ 2 −1 0 ]
    [ 3  2 1 ]
    [ 0  1 2 ].

Then we have

∥A∥M1 = max{5, 4, 3} = 5,
∥A∥M∞ = max{3, 6, 3} = 6,
∥A∥M2 = √(λmax(A^T A)) = √16.5498 = 4.0681.
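A quick numerical check of Example 1 (a minimal sketch; the induced 1- and ∞-norms are available directly from numpy.linalg.norm):

import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [3.0,  2.0, 1.0],
              [0.0,  1.0, 2.0]])

norm1   = np.linalg.norm(A, 1)        # max column sum of |A_ij|
norminf = np.linalg.norm(A, np.inf)   # max row sum of |A_ij|
norm2   = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))  # sqrt of lambda_max(A^T A)

print(norm1, norminf, norm2)          # 5.0  6.0  approx. 4.0681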


Definition 3 A sequence of vectors v_k converges to a vector v, i.e.,

lim_{k→∞} v_k = v,

if

lim_{k→∞} ∥v_k − v∥ = 0.

Example 2 Consider

v_k = ( 1 − 1/2^k , 1 + 1/3^k )^T ∈ R².

We note that

v_k − (1, 1)^T = ( −1/2^k , 1/3^k )^T.

Hence

∥v_k − (1, 1)^T∥1 = 1/2^k + 1/3^k  and  ∥v_k − (1, 1)^T∥∞ = 1/2^k.

Both of these tend to 0 as k → ∞. We say lim_{k→∞} v_k = v = (1, 1)^T.

Remark 3 The same concept can be applied to a sequence of matrices. One can replace the vector norm ∥ · ∥ by a matrix norm ∥ · ∥M.


Proposition 8 If A is an n × n matrix such that ∥A∥M < 1, then (I − A)^{−1} exists and equals

∑_{k=0}^∞ A^k.

Proof: Suppose that (I − A) is not invertible. Then there exists x ≠ 0 such that (I − A)x = 0.

• Let

u = x/∥x∥,

so that ∥u∥ = 1 and u = Au.

This implies that ∥A∥M ≥ 1, a contradiction. Hence (I − A) is invertible and (I − A)^{−1} exists.

• To prove

(I − A)^{−1} = ∑_{k=0}^∞ A^k,

we will prove that

(I − A) · (∑_{k=0}^∞ A^k) = I.


• In other words, we wish to prove

lim_{m→∞} ∥ (I − A) ∑_{k=0}^m A^k − I ∥M = 0.

We note that

(I − A) ∑_{k=0}^m A^k = ∑_{k=0}^m A^k − ∑_{k=0}^m A^{k+1} = I − A^{m+1}.

Therefore

lim_{m→∞} ∥ (I − A) ∑_{k=0}^m A^k − I ∥M = lim_{m→∞} ∥A^{m+1}∥M ≤ lim_{m→∞} ∥A∥M^{m+1} = 0.


Example 3

A = [ 0.5 0.4 ]
    [ 0.3 0.6 ],     ∥A∥M∞ = 0.9 < 1,

A^100 = [ 1.1 1.5 ] × 10^{−5},
        [ 1.1 1.5 ]

(I − A)^{−1} = [  0.5 −0.4 ]^{−1} = I + A + A² + · · · = [ 5    5    ]
               [ −0.3  0.4 ]                             [ 3.75 6.25 ].


• Hence we have

I + A + · · · + A^100 ≈ [ 4.9999 4.9999 ]
                        [ 3.7499 6.2499 ].

We denote the truncation error by E, where

∥E∥M∞ = ∥A^101 + A^102 + · · · ∥M∞.

We have

∥E∥M∞ ≤ ∥A^101∥M∞ + ∥A^102∥M∞ + · · ·
      ≤ ∥A∥M∞^101 + ∥A∥M∞^102 + · · ·
      = 0.9^101 + 0.9^102 + · · ·
      = 0.9^101 · (1 / (1 − 0.9)) = 10 · 0.9^101.
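A short sketch of Example 3 (the variable names are mine): it sums the truncated Neumann series and compares it with (I − A)^{−1}; the observed error is indeed below the bound 10 · 0.9^101.

import numpy as np

A = np.array([[0.5, 0.4],
              [0.3, 0.6]])

# truncated Neumann series I + A + ... + A^100
S = np.eye(2)
P = np.eye(2)
for _ in range(100):
    P = P @ A
    S = S + P

exact = np.linalg.inv(np.eye(2) - A)
print(S)                             # approx. [[5, 5], [3.75, 6.25]]
print(np.max(np.abs(S - exact)))     # below the bound 10 * 0.9**101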


5.3 Power Method for Matrix Eigenvalues

We discuss the problem of computing the dominant eigenvalue and its corresponding eigenvector of a square matrix. Let the n × n matrix A satisfy:

(i) There is a single eigenvalue of maximum modulus. Let the eigenvalues λ1, λ2, · · · , λn be labeled so that

|λ1| > |λ2| ≥ |λ3| ≥ · · · ≥ |λn|.

(ii) To briefly discuss the idea, we assume that there is a linearly independent set of n unit eigenvectors. This means that there is a basis

{u^(1), u^(2), · · · , u^(n)}

for Rn such that

A u^(i) = λ_i u^(i),  i = 1, 2, · · · , n,  and ∥u^(i)∥ = 1.


• Beginning with an initial vector x^(0), we write

x^(0) = a_1 u^(1) + a_2 u^(2) + · · · + a_n u^(n).

Here {u^(i)} is a basis (of unit vectors) for Rn.

• Now

A^k x^(0) = a_1 A^k u^(1) + · · · + a_n A^k u^(n)
          = a_1 λ_1^k u^(1) + · · · + a_n λ_n^k u^(n)   (because A u^(i) = λ_i u^(i))
          = λ_1^k { a_1 u^(1) + (λ_2/λ_1)^k a_2 u^(2) + · · · + (λ_n/λ_1)^k a_n u^(n) }.

• We remark that the convergence "speed" of the power method depends on the "gap" between |λ1| and |λ2|. That is to say, the smaller the value of |λ2|/|λ1|, the faster the rate will be, as one can observe that

1 > |λ2/λ1| ≥ |λ3/λ1| ≥ · · · ≥ |λn/λ1|.


• Since

|λ_i| / |λ_1| < 1  for i = 2, . . . , n,

we have

lim_{k→∞} |λ_i|^k / |λ_1|^k = 0  for i = 2, . . . , n.

Hence we have

A^k x^(0) ≈ a_1 λ_1^k u^(1).

• Define

r_{k+1} = A^{k+1} x^(0) / ∥A^k x^(0)∥,  so that r_{k+1} = A r_k / ∥r_k∥.

We note that

lim_{k→∞} r_{k+1} = lim_{k→∞} a_1 λ_1^{k+1} u^(1) / ∥a_1 λ_1^k u^(1)∥ = λ_1 u^(1),

where ∥ · ∥ can be ∥ · ∥1, ∥ · ∥2 or ∥ · ∥∞. Therefore we have

lim_{k→∞} r_{k+1} / ∥r_{k+1}∥ = u^(1),

and λ_1 can be found by comparing A u^(1) and u^(1).


Example 4 Consider

A = [ 2 1 0 ]
    [ 1 2 1 ]
    [ 0 1 2 ],   with initial guess x^(0) = (1, 1, 1)^T.

We take the vector norm ∥ · ∥ to be ∥ · ∥2:

x^(1) = (1.7321, 2.3094, 1.7321)^T,
x^(2) = (1.7150, 2.4010, 1.7150)^T,
x^(3) = (1.7086, 2.4121, 1.7086)^T,
x^(4) = (1.7074, 2.4139, 1.7074)^T,

∥r_1∥2 = 3.3665, ∥r_2∥2 = 3.4128, ∥r_3∥2 = 3.4142, ∥r_4∥2 = 3.4142.

• Therefore λ_1 ≈ 3.4142 and u^(1) ≈ (1.7074, 2.4139, 1.7074)^T.
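A minimal sketch of the power method applied to Example 4 (the function name is mine; here λ1 is recovered with a Rayleigh quotient rather than by comparing Au^(1) with u^(1) entrywise):

import numpy as np

def power_method(A, x0, iters=50):
    # power iteration: r_{k+1} = A r_k / ||r_k||_2
    r = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = A @ r / np.linalg.norm(r)
    u = r / np.linalg.norm(r)      # approximate dominant eigenvector
    lam = (A @ u) @ u              # Rayleigh quotient approximates lambda_1
    return lam, u

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, u = power_method(A, [1.0, 1.0, 1.0])
print(lam)   # approx. 3.4142 = 2 + sqrt(2)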


5.4 Iterative Solutions for Matrix Equations

Proposition 9 If ∥H∥M < 1, then the iterative scheme

x_{k+1} = H x_k + r

will converge to the solution of the linear system (I − H)x = r for any given x_0.

Proof: We note that

x_{k+1} = H x_k + r = H² x_{k−1} + (I + H) r = · · · = H^{k+1} x_0 + ∑_{m=0}^k H^m r.

By Proposition 8 we have

lim_{k→∞} H^{k+1} x_0 = 0  and  lim_{k→∞} ∑_{m=0}^k H^m r = (I − H)^{−1} r.

Hence

lim_{k→∞} x_{k+1} = (I − H)^{−1} r,

which is the solution of the linear system (I − H)x = r.


Proposition 10 If A and B are n × n matrices such that ∥I − AB∥M < 1, then A and B are invertible. Furthermore,

A^{−1} = B ∑_{k=0}^∞ (I − AB)^k  and  B^{−1} = ∑_{k=0}^∞ (I − AB)^k A.

Proof: From Proposition 8, the matrix AB is invertible, and this implies that both A and B are invertible. Now

(AB)^{−1} = ∑_{k=0}^∞ (I − AB)^k,

i.e.

B^{−1} A^{−1} = (AB)^{−1} = ∑_{k=0}^∞ (I − AB)^k.

Hence we have

B^{−1} = ∑_{k=0}^∞ (I − AB)^k · A  and  A^{−1} = B · ∑_{k=0}^∞ (I − AB)^k.


Proposition 11 If ∥I − B^{−1}A∥M < 1, then the following iterative scheme²

x_{k+1} = x_k + B^{−1}(b − A x_k) = B^{−1}b + (I − B^{−1}A) x_k

(with r = B^{−1}b and H = I − B^{−1}A) will converge to the solution of the linear system Ax = b.

Proof: Using Proposition 10, we have

B^{−1} = ∑_{k=0}^∞ (I − AB)^k A  if ∥I − AB∥M < 1.

Replacing B by A and A by B^{−1}, we get

A^{−1} = ∑_{k=0}^∞ (I − B^{−1}A)^k B^{−1}

if ∥I − B^{−1}A∥M < 1. Clearly we have

lim_{k→∞} ∥(I − B^{−1}A)^{k+1}∥M = 0.

Therefore, by Proposition 9 with H = I − B^{−1}A and r = B^{−1}b, x_k converges to the solution of Ax = b.

² This is called the preconditioning technique. To solve Ax = b, if ∥I − A∥M ≥ 1, try to find a matrix B such that ∥I − B^{−1}A∥M < 1. Then we solve B^{−1}Ax = B^{−1}b instead of the original system.


Example 5 Consider the following n × n linear system

A_n x = r   (n ≥ 2),

where A_n has diagonal entries 2n and all off-diagonal entries equal to 1:

A_n = [ 2n  1  · · ·  1 ]
      [ 1  2n  · · ·  1 ]
      [ ...  . . .   ...]
      [ 1   · · · 1  2n ].

Suggest a preconditioner matrix B such that the iterative scheme

x_{k+1} = (I_n − B^{−1}A_n) x_k + B^{−1} r

converges to the solution of A_n x = r for any given x_0. With your suggestion, what is the computational cost in each iteration?


• Take B = 2n I_n. Then we can check that

∥I_n − B^{−1}A_n∥M1 = (n − 1)/(2n) < 1/2 < 1.

Hence the scheme converges to the solution of A_n x = r.

• In each iteration, the main computational cost comes from the matrix-vector multiplication of the form B^{−1}A_n y. The cost of forming A_n y is O(n) because

A_n y = ((2n − 1) I_n + (1, 1, · · · , 1)^T (1, 1, · · · , 1)) y
      = (2n − 1) y + (1, 1, · · · , 1)^T ∑_{i=1}^n y_i.

It is clear that the cost is O(n). Furthermore, there is no cost in forming B^{−1}.

• Since B is a diagonal matrix, the cost of B^{−1}y is O(n). Finally, the addition of two vectors in Rn is O(n). Therefore the total cost per iteration is O(n).
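A small sketch of Example 5 under the choice B = 2n I_n (the function name is mine); each iteration applies A_n in O(n) operations using A_n = (2n − 1)I_n + e e^T:

import numpy as np

def solve_example5(n, r, iters=100):
    # preconditioned iteration x_{k+1} = x_k + B^{-1}(r - A_n x_k), B = 2n I_n
    x = np.zeros(n)
    for _ in range(iters):
        Ax = (2 * n - 1) * x + np.sum(x)    # A_n x computed in O(n)
        x = x + (r - Ax) / (2 * n)
    return x

n = 5
r = np.ones(n)
x = solve_example5(n, r)
A = (2 * n - 1) * np.eye(n) + np.ones((n, n))   # dense A_n, only for checking
print(np.max(np.abs(A @ x - r)))                # should be tiny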


5.5 Iterative Methods Based on Matrix Splitting

• There are at least three different ways of splitting a matrix A. For example, with

A = [ 1/2 1/3  0  ]
    [ 1/3  1  1/3 ]
    [  0  1/3 1/2 ],

we can write

A = [ 1 0 0 ]   [ −1/2 1/3   0  ]
    [ 0 1 0 ] + [ 1/3   0   1/3 ]      (case 1)
    [ 0 0 1 ]   [  0   1/3 −1/2 ]

  = [ 1/2 0  0  ]   [  0  1/3  0  ]
    [  0  1  0  ] + [ 1/3  0  1/3 ]    (case 2)
    [  0  0 1/2 ]   [  0  1/3  0  ]

  = [ 1/2  0   0  ]   [ 0 1/3  0  ]
    [ 1/3  1   0  ] + [ 0  0  1/3 ]    (case 3)
    [  0  1/3 1/2 ]   [ 0  0   0  ]

  = S + (A − S).

• Now

Ax = (S + (A − S))x = b,

and therefore Sx + (A − S)x = b. Hence we may write

x = S^{−1}b − S^{−1}(A − S)x,

where we assume that S^{−1} exists.


• Given an initial guess x_0 of the solution of Ax = b, one may consider the following iterative scheme:

x_{k+1} = S^{−1}b − S^{−1}(A − S) x_k,   (5.2)

where −S^{−1}(A − S) is the iteration matrix.

• Clearly if x_k → x as k → ∞, then we have

x = A^{−1}b.

• From the results in the previous sections, we know that Eq. (5.2) converges if and only if there is a matrix norm ∥ · ∥M such that

∥S^{−1}(A − S)∥M < 1.

Therefore we have the following proposition.

Proposition 12 If

∥S^{−1}(A − S)∥M < 1,

then the iterative scheme (5.2) converges to the solution of

Ax = b.


Example 6 Let A be the matrix in Section 5.5 and let b = (5, 10, 5)^T. We use x_0 = (0, 0, 0)^T as the initial guess.

Case 1: S = I (the 3 × 3 identity matrix). Then

x_{k+1} = b − (A − I) x_k
        = [ 5  ]   [ −1/2 1/3   0  ]
          [ 10 ] − [ 1/3   0   1/3 ] x_k,
          [ 5  ]   [  0   1/3 −1/2 ]

x_1 = (5.0000, 10.0000, 5.0000)^T,
x_2 = (4.1667, 6.6667, 4.1667)^T,
x_3 = (4.8611, 7.2222, 4.8611)^T,
x_4 = (5.0231, 6.7593, 5.0231)^T,
...
x_30 = (5.9983, 6.0014, 5.9983)^T.

When S = I, this is called the Richardson method.


Example 7 Case 2: S = Diag(1/2, 1, 1/2). Therefore

x_{k+1} = S^{−1}b − S^{−1}(A − S) x_k
        = (10, 10, 10)^T − [  0  2/3  0  ]
                           [ 1/3  0  1/3 ] x_k,
                           [  0  2/3  0  ]

x_1 = (10.0000, 10.0000, 10.0000)^T,
x_2 = (3.3333, 3.3333, 3.3333)^T,
x_3 = (7.7778, 7.7778, 7.7778)^T,
...
x_30 = (6.0000, 6.0000, 6.0000)^T.

When S = Diag(a_11, · · · , a_nn), this is called the Jacobi method.


Example 8 Case 3: S is the lower triangular part of A (including the diagonal):

S = [ 1/2  0   0  ]
    [ 1/3  1   0  ]
    [  0  1/3 1/2 ].

Then

x_{k+1} = S^{−1}b − S^{−1}(A − S) x_k
        = (10, 20/3, 50/9)^T − S^{−1} [ 0 1/3  0  ]
                                      [ 0  0  1/3 ] x_k,
                                      [ 0  0   0  ]

x_1 = (10.0000, 20/3, 50/9)^T,
x_2 = (5.5556, 6.2963, 5.8025)^T,
x_3 = (5.8025, 6.1317, 5.9122)^T,
x_4 = (5.9122, 6.0585, 5.9610)^T,
...
x_14 = (6.0000, 6.0000, 6.0000)^T.

When S is the lower triangular part of the matrix A, this method is called the Gauss-Seidel method.
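The three splittings are easy to experiment with numerically. Below is a sketch (the names are mine) that runs the generic scheme (5.2) with the Jacobi and Gauss-Seidel choices of S on the system of Examples 6-8:

import numpy as np

A = np.array([[0.5, 1/3, 0.0],
              [1/3, 1.0, 1/3],
              [0.0, 1/3, 0.5]])
b = np.array([5.0, 10.0, 5.0])

def splitting_iteration(A, b, S, iters=30):
    # x_{k+1} = S^{-1} b - S^{-1}(A - S) x_k, starting from x_0 = 0
    x = np.zeros_like(b)
    for _ in range(iters):
        x = np.linalg.solve(S, b - (A - S) @ x)
    return x

S_jacobi = np.diag(np.diag(A))      # Jacobi: diagonal part of A
S_gs = np.tril(A)                   # Gauss-Seidel: lower triangular part of A
print(splitting_iteration(A, b, S_jacobi))   # approaches (6, 6, 6)
print(splitting_iteration(A, b, S_gs))       # approaches (6, 6, 6) faster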


5.6 Spectral Radius

Definition 4 Given an n × n square matrix A, the spectral radius of A is defined as

ρ(A) = max{|λ| : det(A − λI) = 0},

or in other words, if λ1, λ2, · · · , λn are the eigenvalues of A, then ρ(A) = max_i{|λi|}.

Example 9

A = A1 + A2 ≡ [ 0 0 ] + [ 0 −1 ] = [ 0 −1 ]
              [ 1 0 ]   [ 0  0 ]   [ 1  0 ].

The eigenvalues of A are ±i and |i| = |−i| = 1. Therefore ρ(A) = 1 in this case, while ρ(A1) = ρ(A2) = 0.

Remark 4 Here ρ does NOT define a norm for square matrices. We note that

1 = ρ(A1 + A2) > ρ(A1) + ρ(A2) = 0 + 0 = 0.

Proposition 13 If A = PBP−1 then ρ(A) = ρ(B).

Proof: Because A and B have the same set of eigenvalues.


Lemma 3 Every square matrix A is unitarily similar to an upper triangular matrix C, i.e., there exists S such that

S* · A · S = C and S · S* = I.

Proposition 14 Every square matrix A is similar to an upper triangular matrix whose off-diagonal elements are arbitrarily small.

Proof: From Lemma 3 we have

S^{−1} A S = [ c_11 c_12 · · ·  c_1n      ]
             [      c_22 c_23  ...        ]
             [           . . . c_{n−1,n}  ]
             [  0              c_nn       ],

an upper triangular matrix.


• Define

E = diag(ε, ε², . . . , ε^n)

to be a diagonal matrix, where ε ≠ 0.

• Now we have

E^{−1}(S^{−1} A S) E = [ c_11 εc_12 ε²c_13 · · · ε^{n−1}c_1n ]
                       [       c_22  εc_23 · · ·  ...        ]
                       [              . . .     εc_{n−1,n}   ]
                       [  0                       c_nn       ],

i.e.

(SE)^{−1} A (SE) = diag(c_11, . . . , c_nn) + U,

where SE depends on ε and

|U_ij| = |c_ij ε^{j−i}| ≤ ε |c_ij| for j > i,  U_ij = 0 for j ≤ i.

We note that ∥U∥M∞ ≤ ε ∥C∥M∞ ≤ ε′, which can be made arbitrarily small.


Proposition 15 For any square matrix A, we have

ρ(A) = inf_{∥·∥M} {∥A∥M}.

Proof: We first show that

ρ(A) ≤ inf_{∥·∥M} {∥A∥M}.

• Let λ be an eigenvalue of A and x be the corresponding eigenvector. Then

|λ|∥x∥ = ∥λx∥ = ∥Ax∥ ≤ ∥A∥M ∥x∥

(using, in turn, a property of the vector norm, the eigenvalue relation and Proposition 3 on matrix norms).

• Hence |λ| ≤ ∥A∥M, where ∥ · ∥M is an arbitrary matrix norm.

• This implies that

ρ(A) ≤ ∥A∥M for every ∥ · ∥M,

and therefore

ρ(A) ≤ inf_{∥·∥M} {∥A∥M}.


Next we wish to show that

inf_{∥·∥M} {∥A∥M} ≤ ρ(A).

• For any square matrix A, by Proposition 14 there exists S_ε such that

S_ε^{−1} A S_ε = diag(λ1, λ2, . . . , λn) + T,

where ∥T∥M∞ ≤ ε for any ε > 0.

• We have

∥S_ε^{−1} A S_ε∥M∞ ≤ ∥diag(λ1, . . . , λn)∥M∞ + ∥T∥M∞ ≤ ρ(A) + ε,

because the λ_i are the eigenvalues of A.


• We note that

∥A∥M_ε := ∥S_ε^{−1} A S_ε∥M∞

defines a matrix norm.

• Therefore this matrix norm satisfies ∥A∥M_ε ≤ ρ(A) + ε.

Since ε can be arbitrarily small (but not equal to zero),

inf_{∥·∥M} {∥A∥M} ≤ ρ(A).

Remark 5 If

ρ(A) < 1,

then there exists a matrix norm ∥ · ∥M such that ∥A∥M < 1.


Proposition 16 The iterative scheme

x_k = G x_{k−1} + c

converges to

(I − G)^{−1} c

for any starting vectors x_0 and c if and only if ρ(G) < 1.

Proof: We note that

x_1 = G x_0 + c;
x_2 = G² x_0 + G c + c;
...
x_k = G^k x_0 + ∑_{j=0}^{k−1} G^j c.

Suppose ρ(G) < 1. Then there exists ∥ · ∥M such that ∥G∥M < 1, and therefore ∥G^k∥M → 0 as k → ∞.


• We have

∑_{j=0}^{k−1} G^j → (I − G)^{−1} as k → ∞.

Hence

x_k → (I − G)^{−1} c as k → ∞.

• Conversely, suppose ρ(G) ≥ 1. Then there exists u ≠ 0 such that

G u = λ u and |λ| ≥ 1.

Let x_0 = c = u. Then

x_k = λ^k u + ∑_{j=0}^{k−1} λ^j u = ( ∑_{j=0}^k λ^j ) u,

and the sum ∑_{j=0}^k λ^j (which equals (1 − λ^{k+1})/(1 − λ) for λ ≠ 1) does not converge as k → ∞ when |λ| ≥ 1.


Proposition 17 (Gershgorin's theorem) The eigenvalues of an n × n matrix A are contained in the union of the following n disks D_i, where

D_i = { z ∈ C : |z − A_ii| ≤ ∑_{j=1, j≠i}^n |A_ij| }.

Proof: • Let λ be an eigenvalue of A and x be its corresponding eigenvector, scaled such that ∥x∥∞ = |x_i| = 1.

• This can be done by dividing x by |x_i| = max_j{|x_j|}.

• Since Ax = λx, we have

λ x_i = ∑_{j=1}^n A_ij x_j, and therefore (λ − A_ii) x_i = ∑_{j=1, j≠i}^n A_ij x_j.

Hence

|λ − A_ii| = |(λ − A_ii) x_i| ≤ ∑_{j=1, j≠i}^n |A_ij x_j| ≤ ∑_{j=1, j≠i}^n |A_ij|.

Therefore λ ∈ D_i.
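A short sketch (the function name is mine) that computes the Gershgorin disks of a matrix and checks that every eigenvalue lies in at least one of them:

import numpy as np

def gershgorin_disks(A):
    # return (center, radius) of each Gershgorin disk
    centers = np.diag(A)
    radii = np.sum(np.abs(A), axis=1) - np.abs(centers)   # row sums without the diagonal
    return list(zip(centers, radii))

A = np.array([[2.0, -1.0, 0.0],
              [3.0,  2.0, 1.0],
              [0.0,  1.0, 2.0]])
disks = gershgorin_disks(A)
eigs = np.linalg.eigvals(A)
for lam in eigs:
    # every eigenvalue lies in at least one disk
    assert any(abs(lam - c) <= r + 1e-12 for c, r in disks)
print(disks, eigs)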


Proposition 18 If Q is the column stochastic matrix of a Markov chain (a non-negative matrix with all column sums equal to one), then ρ(Q^k) = 1.

Proof: We note that 1Q = 1, where 1 = (1, 1, . . . , 1).

• Therefore

1 Q^k = 1.

This means that 1 is an eigenvalue of Q^k. Thus we conclude that ρ(Q^k) ≥ 1.

• By using Gershgorin's theorem and the facts that all the entries of Q^k are non-negative and all the column sums of Q^k are equal to one, we have ρ(Q^k) ≤ 1.

• Hence we conclude that ρ(Q^k) = 1.


Proposition 19 The iterative scheme

x_{k+1} = S^{−1}b − S^{−1}(A − S) x_k = (I − S^{−1}A) x_k + S^{−1}b

converges to A^{−1}b if and only if ρ(I − S^{−1}A) < 1.

Proof: Take G = I − S^{−1}A and c = S^{−1}b in Proposition 16.

Proposition 20 If A is row (or column) diagonally dominant, i.e., for i = 1, 2, . . . , n,

2|A_ii| > ∑_{j=1}^n |A_ij|   (respectively 2|A_jj| > ∑_{i=1}^n |A_ij| for each column j),

then the Gauss-Seidel method converges for any starting x_0.

Proof: Let S be the lower triangular part of A (including the diagonal). From Proposition 19 above, one only needs to show that ρ(I − S^{−1}A) < 1.

• Let λ be an eigenvalue of (I − S^{−1}A) and x its corresponding eigenvector, scaled such that ∥x∥∞ = 1. We want to show |λ| < 1.


• We have

(I − S^{−1}A)x = λx,

and therefore

Sx − Ax = λSx for x = (x_1, x_2, . . . , x_n)^T.

In other words, we have

[ 0 −a_12 · · ·     −a_1n    ]       [ a_11  0   · · ·   0   ]
[       0  ...       ...     ] x = λ [ a_21 a_22  ...    ...  ] x.
[           . . . −a_{n−1,n} ]       [  ...       . . .   0   ]
[ 0  · · ·            0      ]       [ a_n1 · · · · · ·  a_nn ]

• Therefore

−(a_12 x_2 + · · · + a_1n x_n) = λ a_11 x_1,
−(a_23 x_3 + · · · + a_2n x_n) = λ (a_21 x_1 + a_22 x_2),
...
−a_{n−1,n} x_n = λ (a_{n−1,1} x_1 + · · · + a_{n−1,n−1} x_{n−1}).


• In general we have

−∑_{j=i+1}^n a_ij x_j = λ ∑_{j=1}^i a_ij x_j  for i = 1, · · · , n − 1.

• Since ∥x∥∞ = 1, there exists i such that

|x_i| = 1 ≥ |x_j| for all j.

• For this i we have

|λ||a_ii| = |λ a_ii x_i| ≤ ∑_{j=i+1}^n |a_ij| + |λ| ∑_{j=1}^{i−1} |a_ij|,

and therefore

|λ| ≤ ( ∑_{j=i+1}^n |a_ij| ) / ( |a_ii| − ∑_{j=1}^{i−1} |a_ij| ) < 1.


5.7 The Successive Over-Relaxation (SOR) Method

Solving Ax = b, one may split A as follows:

A = (L + wD) + (1 − w)D + U,

where L is the strictly lower triangular part, D the diagonal part and U the strictly upper triangular part of A.

Example 10

[ 2 1 ]   [ 0 0 ]     [ 2 0 ]           [ 2 0 ]   [ 0 1 ]
[ 1 2 ] = [ 1 0 ] + w [ 0 2 ] + (1 − w) [ 0 2 ] + [ 0 0 ]
            L           D                  D         U

One may consider the iterative scheme with S = L + wD as follows:

x_{n+1} = S^{−1}b + S^{−1}(S − A) x_n = S^{−1}b + (I − S^{−1}A) x_n.

We remark that

I − S^{−1}A = I − (L + wD)^{−1}A.
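A sketch of the SOR scheme in the form used in these notes, with S = L + wD (so that w = 1 reproduces Gauss-Seidel); the function name and the small test system are mine:

import numpy as np

def sor_solve(A, b, w, iters=100):
    # x_{k+1} = S^{-1} b + (I - S^{-1} A) x_k with S = L + w D
    L = np.tril(A, -1)              # strictly lower triangular part
    D = np.diag(np.diag(A))         # diagonal part
    S = L + w * D
    x = np.zeros_like(b)
    for _ in range(iters):
        x = np.linalg.solve(S, b - (A - S) @ x)
    return x

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([4.0, 5.0])
print(sor_solve(A, b, w=1.0))    # Gauss-Seidel limit: approx. (1, 2)
print(sor_solve(A, b, w=0.9))    # another admissible choice with w > 1/2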


• Moreover, when w = 1, the method is just the Gauss-Seidel method. This method is called the SOR method.

• It is clear that the method converges if and only if the iteration matrix has a spectral radius less than one.

Proposition 21 The SOR method converges to the solution of Ax = b if and only if

ρ(I − (L + wD)^{−1}A) < 1.

• We are going to prove that if A is a positive definite Hermitian matrix and w > 1/2, then the SOR method converges. Let us recall that:

1. A is Hermitian if A = A*.

2. Define < x, y > = ∑_{i=1}^n x̄_i y_i. Then < x, λy > = λ < x, y > and < λx, y > = λ̄ < x, y >.

3. If A is Hermitian then < Ax, y > = < x, Ay >.

4. A is positive definite if < Ax, x > > 0 for x ≠ 0.


Proposition 22 Let A be a positive definite Hermitian matrix. If w > 1/2, then the SOR method converges.

Proof: We will show that ρ(I − S^{−1}A) < 1, so that the SOR method converges.

• Let λ be an eigenvalue of

G = I − S^{−1}A

and x be the corresponding eigenvector.

• Therefore we have

Gx = (I − S^{−1}A)x = λx. (5.3)

Let

y = (I − G)x = S^{−1}Ax. (5.4)

Then we have

Sy = (L + wD)y = S · S^{−1}Ax = Ax.


Hence we conclude that

< Ly + wDy, y > = < Ly, y > + < wDy, y > = < Ax, y >. (5.5)

We note that

AGx = A(I − S^{−1}A)x = Ax − Ay   (by (5.3) and (5.4)) (5.6)
    = Sy − Ay = (S − A)y (5.7)
    = (−(1 − w)D − U)y. (5.8)

Hence

< y, AGx > = −< y, Dy > + < y, wDy > − < y, Uy >. (5.9)

Adding Eq. (5.5) and Eq. (5.9) together, we get

< Ly, y > + < wDy, y > − < y, Dy > + < y, wDy > − < y, Uy > = < Ax, y > + < y, AGx >. (5.10)

We note that

< Ly, y > = < y, L*y > = < y, Uy >,

so we have

< wDy, y > − < y, (1 − w)Dy > = < Ax, y > + < y, AGx >. (5.11)


Since

y = (I − G)x = x − λx = (1 − λ)x,

we have

< wDy, y > = < w(1 − λ)Dx, (1 − λ)x > = w (1 − λ̄)(1 − λ) < Dx, x >

and

< y, Dy > = < (1 − λ)x, (1 − λ)Dx > = (1 − λ̄)(1 − λ) < x, Dx >.

We note that (1 − λ̄)(1 − λ) = |1 − λ|² and

< y, wDy > = |1 − λ|² w < Dx, x >.

Hence the L.H.S. of Eq. (5.11) becomes

(2w − 1)|1 − λ|² < Dx, x >,

and the R.H.S. of Eq. (5.11) becomes

< Ax, y > + < y, AGx > = < Ax, (1 − λ)x > + < (1 − λ)x, λAx >
                       = (1 − λ) < Ax, x > + (1 − λ̄)λ < x, Ax >
                       = (1 − |λ|²) < Ax, x >.


• Now we observe that

(2w − 1) |1 − λ|² < Dx, x > = (1 − |λ|²) < Ax, x >,

where (2w − 1) > 0, |1 − λ|² ≥ 0 and < Dx, x >, < Ax, x > > 0.

Thus we have |λ| ≤ 1, but we wish to prove |λ| < 1. We shall show that |λ| ≠ 1.

Case 1: If |λ| = 1 and λ ≠ 1, then the right-hand side is 0 while the left-hand side equals (2w − 1)|1 − λ|² < Dx, x > > 0 (for instance, for λ = −1 it equals 4(2w − 1) < Dx, x >). This is NOT possible.

Case 2: If λ = 1, then y = (1 − λ)x = 0. We have 0 = Sy = Ax with x ≠ 0. Again this is NOT possible, since A is positive definite.

Hence we have |λ| < 1, i.e. ρ(I − S^{−1}A) < 1.

Proposition 23 If A is a positive definite Hermitian matrix, then the Gauss-Seidel method converges to the solution of Ax = b.

Proof: Take w = 1 and apply Proposition 22. The result follows.


5.8 Steepest Descent Method and Conjugate Gradient Method

We consider the problem of solving Ax = b such that

(1) A is an n × n matrix;
(2) A is symmetric, i.e., A^T = A;
(3) A is positive definite, i.e., x^T A x > 0 for x ≠ 0.

Remark 6 Condition (3) implies that A^{−1} exists.

Recall the properties of the inner product in Rn:

< x, y > = x^T y = ∑_{i=1}^n x_i y_i.

(i) < x, y > = < y, x >;
(ii) < αx, y > = α < x, y >;
(iii) < x, αy > = α < x, y >;
(iv) < x + y, z > = < x, z > + < y, z >;
(v) < x, Ay > = < A^T x, y >.


Proposition 24 If A is symmetric and positive definite, then the problem of solving

Ax = b

is equivalent to the problem of minimizing

q(x) = < x, Ax > − 2 < x, b >.

Proof: Let v be a vector and t be a scalar. We consider the function

q(x + tv) = < x + tv, A(x + tv) > − 2 < x + tv, b >
          = < x, Ax > + t < x, Av > + t < v, Ax > + t² < v, Av > − 2 < x, b > − 2t < v, b >
          = q(x) + 2t < v, Ax > − 2t < v, b > + t² < v, Av >
          = q(x) + 2t < v, Ax − b > + t² < v, Av >.


• Now one can regard this as a function of t:

q(x + tv) = f(t) = q(x) + 2 < v, Ax − b > t + < v, Av > t².

In fact it is a quadratic function in t. Moreover, f(t) attains its minimum at the t satisfying f′(t) = 0, i.e.

2 < v, Ax − b > + 2 < v, Av > t = 0.

• Solving the equation, we have

t* = < v, b − Ax > / < v, Av >.

• We remark that < v, Av > ≠ 0 because A is positive definite (for v ≠ 0).


• Therefore

q(x + t*v) = q(x) + t* {2 < v, Ax − b > + < v, Av > t*}
           = q(x) + t* {2 < v, Ax − b > + < v, b − Ax >}
           = q(x) + t* {< v, Ax − b >}
           = q(x) − < v, b − Ax >² / < v, Av >,

where the last term is non-negative.

• We note that a reduction in the value of q(x) always occurs in passing from x to x + t*v, unless < v, b − Ax > = 0; in that case v is orthogonal to b − Ax.

• So if b − Ax ≠ 0, then we can find a vector v such that

< v, b − Ax > ≠ 0 and q(x + t*v) < q(x),

and x is NOT the minimizer of q(x).

• If b − Ax = 0, then q(x + t*v) = q(x) for any vector v (indeed q(x + tv) = q(x) + t² < v, Av > ≥ q(x) for any t and v). Therefore x is the minimizer.


• One may design an iterative method by using the idea in Proposition 24.

• Given A, an n × n symmetric positive definite matrix, and b, an n × 1 vector.

• With x_0, an initial guess of the solution of Ax = b, we develop an iterative algorithm, namely the steepest descent method. The iterative method reads:

Input: Max, A, b, x_0, Error-tol.
k = 0; r_0 = b − A x_0 (initial residual).
While ∥r_k∥ ≥ Error-tol and k < Max
    r_k = b − A x_k;
    t_k = < r_k, r_k > / < r_k, A r_k >;
    x_{k+1} = x_k + t_k · r_k;
    k = k + 1;
end


We remark that r_k is the search direction and t_k is the step size. In the iterative method, t = t* from Proposition 24 with

v = r = b − Ax.

Example 11

A = [ 2 1 ],  b = [ 4 ],  x_0 = [ 0 ].
    [ 1 2 ]       [ 5 ]         [ 0 ]

k    x_k                  t        ∥r_k∥2
1    (1.34, 1.68)^T       0.3361   6.4031
2    (0.98, 1.97)^T       0.9762   0.4724
3    (1.01, 1.99)^T       0.3361   0.1012
4    (0.99, 1.99)^T       0.9762   0.0075
5    (1.00, 1.99)^T       0.3361   0.0016
...

The true solution is (1, 2)^T. The steepest descent method is rarely used on its own because its convergence rate is often "too slow".
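A sketch of the steepest descent method above applied to Example 11 (the function name is mine):

import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    # steepest descent for symmetric positive definite A;
    # the residual is used as the search direction
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = b - A @ x                   # residual = search direction
        if np.linalg.norm(r) < tol:
            break
        t = (r @ r) / (r @ (A @ r))     # optimal step size t*
        x = x + t * r
    return x

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([4.0, 5.0])
print(steepest_descent(A, b, np.zeros(2)))   # approx. (1, 2)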


5.8.1 Conjugate Gradient Method

Definition 5 (A-orthonormality) Assuming that A is an n × n symmetric positive definite matrix, suppose that a set of vectors {u_1, u_2, · · · , u_n} is provided and has the property

⟨u_i, A u_j⟩ = δ_ij,

where

δ_ij = 1 if i = j and δ_ij = 0 if i ≠ j.

This property is called A-orthonormality. Clearly it is a generalization of ordinary orthonormality, where A = I_n.

Remark 7 Here ∥x∥A² = < x, Ax > defines a norm on Rn.

Proposition 25 Let

U = (u_1, · · · , u_n).

Then

U^T A U = I_n.

Proof: It follows from the definition.


Proposition 26 The set {u_1, · · · , u_n} forms a basis for Rn.

Proof: We need only show that the {u_i} are linearly independent. Suppose

∑_{i=1}^n α_i u_i = 0.

Then, for j = 1, · · · , n,

0 = ⟨∑_{i=1}^n α_i u_i, A u_j⟩ = ∑_{i=1}^n α_i ⟨u_i, A u_j⟩ = α_j ⟨u_j, A u_j⟩ = α_j.

Hence α_j = 0 for j = 1, · · · , n.

• This shows that the {u_i} are linearly independent and hence form a basis for Rn.


Proposition 27 Let {u_1, · · · , u_n} be an A-orthonormal system. Define the following recursive scheme:

x_i = x_{i−1} + ⟨b − A x_{i−1}, u_i⟩ u_i

for i = 1, 2, · · · , n, in which x_0 is an arbitrary vector in Rn. Then we have

A x_n = b.

Proof: Define

t_i = ⟨b − A x_{i−1}, u_i⟩.

The iterative method reads

x_i = x_{i−1} + t_i u_i.

We note that

A x_i = A x_{i−1} + t_i A u_i.

Therefore

A x_n = A x_{n−1} + t_n A u_n = A x_{n−2} + t_{n−1} A u_{n−1} + t_n A u_n = · · ·


• Finally we have

A x_n = A x_0 + t_1 A u_1 + · · · + t_n A u_n.

• Now

⟨A x_n − b, u_i⟩ = ⟨A x_0 − b, u_i⟩ + t_i.

Since

t_i = ⟨b − A x_{i−1}, u_i⟩
    = ⟨b − A x_0 + A x_0 − A x_1 + A x_1 + · · · − A x_{i−1}, u_i⟩
    = ⟨b − A x_0, u_i⟩ + ⟨A x_0 − A x_1, u_i⟩ + ⟨A x_1 − A x_2, u_i⟩ + · · · + ⟨A x_{i−2} − A x_{i−1}, u_i⟩
    = ⟨b − A x_0, u_i⟩ + ⟨−t_1 A u_1, u_i⟩ + · · · + ⟨−t_{i−1} A u_{i−1}, u_i⟩
    = ⟨b − A x_0, u_i⟩,

we have

⟨A x_n − b, u_i⟩ = 0 for i = 1, · · · , n, and hence A x_n − b = 0,

because A x_n − b is orthogonal to all the u_i and must therefore be the zero vector.


Definition 6 (A-orthogonality) Assuming A is an n × n symmetric positive definite matrix, a set of vectors

{v_1, · · · , v_n}

is said to be A-orthogonal if

⟨v_i, A v_j⟩ = 0 whenever i ≠ j.

Proposition 27 can be extended as follows.

Proposition 28 Let

{v_1, · · · , v_n}

be an A-orthogonal system of non-zero vectors for a symmetric positive definite n × n matrix A. Define

x_i = x_{i−1} + (⟨b − A x_{i−1}, v_i⟩ / ⟨v_i, A v_i⟩) v_i,

in which x_0 is arbitrary. Then A x_n = b.


• The CG algorithm reads:

Given an initial guess x_0, A, b, Max, tol:
r_0 = b − A x_0;
v_0 = r_0;
For k = 0 to Max − 1 do
    If ∥v_k∥2 = 0 then stop
    t_k = < r_k, r_k > / < v_k, A v_k >;
    x_{k+1} = x_k + t_k v_k;
    r_{k+1} = r_k − t_k A v_k;
    If < r_{k+1}, r_{k+1} > < tol then stop
    v_{k+1} = r_{k+1} + (< r_{k+1}, r_{k+1} > / < r_k, r_k >) v_k;
end;
Output x_{k+1}, ∥r_{k+1}∥2.
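A sketch of the CG algorithm above (the function name is mine); for a symmetric positive definite system it reaches the solution in at most n steps in exact arithmetic:

import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-12, max_iter=None):
    # conjugate gradient method for symmetric positive definite A
    x = np.asarray(x0, dtype=float)
    r = b - A @ x
    v = r.copy()
    rr = r @ r
    steps = len(b) if max_iter is None else max_iter
    for _ in range(steps):
        Av = A @ v
        t = rr / (v @ Av)            # step size t_k
        x = x + t * v
        r = r - t * Av               # updated residual
        rr_new = r @ r
        if rr_new < tol:
            break
        v = r + (rr_new / rr) * v    # next search direction
        rr = rr_new
    return x

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([4.0, 5.0])
print(conjugate_gradient(A, b, np.zeros(2)))   # approx. (1, 2)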


• The main computational cost in the CG algorithm comes from the matrix-vector multiplication of the form Ax. It takes at most O(n²) operations.

• If A is not symmetric, one can consider the normal equation:

A^T A x = A^T b.

• The algorithm converges in at most n steps (in exact arithmetic). However, it can be faster, as the convergence rate of this method also depends on the spectrum of the matrix A_n. For example, if the spectrum of A_n is contained in an interval, i.e. σ(A_n) ⊆ [a, b], then the error in the i-th iteration satisfies

∥e_i∥ / ∥e_0∥ ≤ 2 ((√b − √a)/(√b + √a))^i,

i.e. the convergence rate is linear. Hence an approximate upper bound for the number of iterations required to make the relative error ∥e_i∥/∥e_0∥ ≤ δ is given by

(1/2)(√(b/a) − 1) log(2/δ) + 1.


• Very often the CG method is used with a matrix called a preconditioner to accelerate its convergence rate.

• A good preconditioner C should satisfy the following conditions:

(i) The matrix C can be constructed easily;

(ii) Given a right-hand-side vector r, the linear system Cy = r can be solved efficiently; and

(iii) The spectrum (or singular values) of the preconditioned system C^{−1}A should be clustered around one.


• In the Preconditioned Conjugate Gradient (PCG) method, we solve the linear system

C^{−1}Ax = C^{−1}b

instead of the original linear system

Ax = b.

We expect that the faster convergence rate of the PCG method more than compensates for the extra cost of solving the preconditioner system

Cy = r

in each iteration step of the PCG method.


• Apart from the condition-number approach, condition (iii) is also very commonly used in proving convergence rates. In the following we give the definition of clustering.

Definition 7 We say that a sequence of matrices S_n of size n has a clustered spectrum around one if for all ϵ > 0 there exist non-negative integers n_0 and n_1 such that for all n > n_0, at most n_1 eigenvalues of the matrix

S*_n S_n − I_n

have absolute values larger than ϵ.

• One sufficient condition for the matrix to have eigenvalues clustered around one is that

H_n = I_n + L_n,

where I_n is the n × n identity matrix and L_n is a low-rank matrix (rank(L_n) is bounded above, independently of the matrix size n).


5.9 A Summary of Learning Outcomes

1. Able to compute a given vector norm, for example ∥·∥1, ∥·∥2 and ∥·∥∞.

2. Able to define and compute a matrix norm (induced by a vector norm via the sup definition) and examples such as ∥·∥M1, ∥·∥M2 and ∥·∥M∞.

3. Able to show some properties of a matrix norm, e.g. ∥A · B∥M ≤ ∥A∥M · ∥B∥M.

4. Able to recognize and apply the iterative scheme x_{k+1} = (I − A)x_k + r for solving Ax = r under the condition that ∥I − A∥M < 1 for some matrix norm.

5. Able to recognize the preconditioning technique: find a matrix B such that

∥I − B^{−1}A∥M < 1

for some matrix norm and such that systems with B are easy to solve; then apply the iterative scheme to solve B^{−1}Ax = B^{−1}r instead of Ax = r.

6. Able to apply the power method for the largest eigenvalue and the corresponding eigenvector of a square matrix.

7. Able to program and apply conjugate gradient type methods for solving systems of linear equations.


6 Markovian Queueing Networks, Manufacturing and Re-manufacturing Systems

6.1 A Single Markovian Queue (M/M/s/n-s-1)

• λ: input rate (Arrival Rate).
• µ: output rate (Service Rate, Production Rate).

[Figure: the M/M/s/n−s−1 queue — arrivals at rate λ join a waiting line with n − s − 1 buffer places and are served by s parallel exponential servers, each of rate µ. The diagram marks empty buffer places, customers waiting in the queue, and customers being served.]


6.1.1 The Steady-state Distribution

• Let pi be the steady-state probability that there are i customers in the queueing system. Here p = (p0, . . . , pn−1)^T is the steady-state probability vector.

• Important for system performance analysis, e.g. the average waiting time of the customers in the long run.

• B. Bunday, Introduction to Queueing Theory, Arnold, N.Y., (1996).

• Here the pi are governed by the Kolmogorov equations:

[Figure: balance of rates at state i — outgoing rates λ (to state i + 1) and iµ (to state i − 1); incoming rates λ from state i − 1 and (i + 1)µ from state i + 1. For 0 < i < s this gives (λ + iµ) pi = λ pi−1 + (i + 1)µ pi+1; for i ≥ s the service rate is capped at sµ.]


[Figure: the Markov chain of the M/M/s/n−s−1 queue — a birth-death chain on the states 0, 1, . . . , n − 1 with birth rate λ and death rate min(i, s)µ in state i.]

• We are solving: A0 p0 = 0, Σi pi = 1, pi ≥ 0.

• A0, the generator matrix, is given by the n × n tridiagonal matrix:

A0 =
[  λ     −µ                                    ]
[ −λ    λ+µ    −2µ                             ]
[       −λ    λ+2µ    −3µ                      ]
[               ·       ·       ·              ]
[              −λ     λ+sµ    −sµ              ]
[                       ·       ·       ·      ]
[                      −λ     λ+sµ    −sµ      ]
[                              −λ      sµ      ]
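• A minimal Python sketch that builds A0 and computes the steady-state distribution by solving A0 p = 0 together with Σ pi = 1 (the parameter values below are illustrative assumptions):

```python
import numpy as np

def mmsn_generator(lam, mu, s, n):
    """Generator A0 of the M/M/s/n-s-1 queue (n states), as displayed above."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = (lam if i < n - 1 else 0.0) + min(i, s) * mu
        if i > 0:
            A[i, i - 1] = -lam                    # arrival moves state i-1 -> i
        if i < n - 1:
            A[i, i + 1] = -min(i + 1, s) * mu     # departure moves state i+1 -> i
    return A

# Illustrative parameters: lam = 1, mu = 0.8, s = 2, n = 8.
A0 = mmsn_generator(1.0, 0.8, 2, 8)

# Solve A0 p = 0 with the normalization sum(p) = 1 via least squares.
M = np.vstack([A0, np.ones(8)])
rhs = np.zeros(9); rhs[-1] = 1.0
p, *_ = np.linalg.lstsq(M, rhs, rcond=None)
print(p, p.sum())
```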


6.2 Two-Queue Free Models

[Figure: two independent Markovian queues — queue 1 with arrival rate λ1, s1 servers of rate µ1 and n1 − s1 − 1 buffer places; queue 2 with arrival rate λ2, s2 servers of rate µ2 and n2 − s2 − 1 buffer places.]


• Let pi,j be the steady-state probability that there are i customers in queue 1 and j customers in queue 2. Then the Kolmogorov equations for the two-queue network are:

[Figure: balance of rates at state (i, j) — outgoing rates λ1, λ2, iµ1 and jµ2; incoming rates λ1 from (i − 1, j), λ2 from (i, j − 1), (i + 1)µ1 from (i + 1, j) and (j + 1)µ2 from (i, j + 1).]


• Again we have to solve A1 p = 0, Σi,j pij = 1, pij ≥ 0.

• The generator matrix A1 is separable (no interaction between the queues):

A1 = A0 ⊗ I + I ⊗ A0.

• Kronecker tensor product of two matrices An×r and Bm×k:

An×r ⊗ Bm×k =
[ a11 B   a12 B   · · ·   a1r B ]
[ a21 B   a22 B   · · ·   a2r B ]
[   ·       ·       ·       ·   ]
[ an1 B   an2 B   · · ·   anr B ]

which is an nm × rk matrix.

• It is easy to check that the Markov chain of the queueing system is irreducible and that the unique solution is p = p0 ⊗ p0, where p0 is the steady-state probability vector of the single queue.
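• A small Python check of this tensor structure (the single-queue parameters are illustrative assumptions; steady_state solves the generator equation by least squares, as in the earlier sketch):

```python
import numpy as np

def steady_state(A):
    """Solve A p = 0 with sum(p) = 1 (columns of the generator sum to zero)."""
    m = A.shape[0]
    M = np.vstack([A, np.ones(m)])
    rhs = np.zeros(m + 1); rhs[-1] = 1.0
    p, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return p

# A small single-queue generator A0 (illustrative: s = 1, n = 4 states).
lam, mu, n = 1.0, 1.5, 4
A0 = np.zeros((n, n))
for i in range(n):
    A0[i, i] = (lam if i < n - 1 else 0.0) + (mu if i > 0 else 0.0)
    if i > 0:
        A0[i, i - 1] = -lam
    if i < n - 1:
        A0[i, i + 1] = -mu

I = np.eye(n)
A1 = np.kron(A0, I) + np.kron(I, A0)     # separable two-queue generator
p0 = steady_state(A0)
p1 = steady_state(A1)
print(np.allclose(p1, np.kron(p0, p0)))  # expect True: p = p0 ⊗ p0
```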


6.3 Two-Queue Overflow Networks

[Figure: the two-queue overflow network — the same two queues as in the free model, with an additional overflow path: an arrival to queue 1 that finds it full overflows to queue 2.]


• The generator matrix A2 is given by

A2 = A0 ⊗ I + I ⊗ A0 + Diag(0, . . . , 0, 1) ⊗ R0,

where

R0 = λ1 ×
[  1                        ]
[ −1    1                   ]
[        ·     ·            ]
[             −1    1       ]
[                  −1    0  ]

describes the overflow discipline of the queueing system.

• In fact, we may write

A2 = A1 + Diag(0, . . . , 0, 1) ⊗ R0.

• Unfortunately, an analytic solution for the steady-state distribution p is not available.


• The generator matrices are sparse and have block structures.

• Direct methods (LU decomposition will result in dense factors L and U) are not efficient in general.

• Block Gauss-Seidel (BGS) is the usual approach for the above queueing problems. Its convergence is not fast, and the number of iterations in general increases linearly with the size of the generator matrix.

• A fast algorithm should make use of the block structure and the sparsity of the generator matrices. We shall apply Preconditioned Conjugate Gradient type methods.

• R. Varga, Matrix Iterative Analysis, Prentice-Hall, N.J., (1963).


6.4 Practical Examples of Queueing Systems

6.4.1 The Telecommunication System

[Figure: the telecommunication system — n external queues (queue i with arrival rate λi and si servers of rate µi) overflow into a main queue of size N with its own arrival rate λ and s servers of rate µ.]

• K. Hellstern, The Analysis of a Queue Arising in Overflow Models, IEEETrans. Commun., 37 (1989).

• W. Ching, R. Chan and X. Zhou, Circulant Preconditioners for Markov Modulated Poisson Processes and Their Applications to Manufacturing Systems, SIAM J. Matrix Anal., 17 (1997).


• We may regard the telecommunication network as an (MMPP/M/s/s+N) queueing system.

• An MMPP is a Poisson process whose instantaneous rate is itself a stationary random process which varies according to an irreducible n-state Markov chain (when n = 1, it is just the Poisson process).

• Important in the analysis of the blocking probability and the system utilization.

• M. Neuts, Matrix-Geometric Solutions in Stochastic Models, Johns Hopkins University Press, M.D., (1981).

• J. Flood, Telecommunication Switching Traffic and Networks, Prentice-Hall, N.Y., (1995).


• Generator matrix is given by

A3 =
[ Q+Γ      −µI                                        ]
[  −Γ    Q+Γ+µI     −2µI                              ]
[            ·         ·         ·                    ]
[           −Γ    Q+Γ+sµI      −sµI                   ]
[                      ·          ·        ·          ]
[                     −Γ     Q+Γ+sµI     −sµI         ]
[                               −Γ       Q+sµI        ]

((N + 1)-block by (N + 1)-block), where

Γ = Λ + λI2n,

Q = (Q1 ⊗ I2 ⊗ · · · ⊗ I2) + (I2 ⊗ Q2 ⊗ I2 ⊗ · · · ⊗ I2) + · · · + (I2 ⊗ · · · ⊗ I2 ⊗ Qn),

Λ = (Λ1 ⊗ I2 ⊗ · · · ⊗ I2) + (I2 ⊗ Λ2 ⊗ I2 ⊗ · · · ⊗ I2) + · · · + (I2 ⊗ · · · ⊗ I2 ⊗ Λn),

Qj =
[  σj1   −σj2 ]
[ −σj1    σj2 ]

and

Λj =
[ λj   0 ]
[  0   0 ].


6.4.2 The Manufacturing System of Two Machines in Tandem

[Figure: two machines in tandem — machine M1 (rate µ1) feeds buffer B1 of size l, which feeds machine M2 (rate µ2), which feeds buffer B2 of size N; demand arrives at B2 at rate λ.]

• We search for the optimal buffer sizes l and N (N >> l) which (1) minimize the average running cost, (2) maximize the throughput, or (3) minimize the blocking and starving rates.

• G. Yamazaki, T. Kawashima and H. Sakasegawa, Reversibility of Tandem Blocking Queueing Systems, Manag. Sci., 31 (1985).

• W. Ching, Iterative Methods for Manufacturing Systems of Two Stations in Tandem, Applied Mathematics Letters, 11 (1998).


• The generator matrix is of the form:

A4 =
[ Λ+µ1I        −Σ                                     ]
[  −µ1I     Λ+D+µ1I       −Σ                          ]
[               ·           ·         ·               ]
[             −µ1I      Λ+D+µ1I      −Σ               ]
[                          −µ1I      Λ+D              ]

((l + 1)-block by (l + 1)-block), where

Λ =
[ 0   −λ                 ]
[      λ    −λ           ]
[            ·      ·    ]
[                 λ   −λ ]
[                      λ ]

Σ =
[ 0                   ]
[ µ2    0             ]
[        ·      ·     ]
[              µ2   0 ]

and D = Diag(µ2, · · · , µ2, 0).


6.4.3 The Re-Manufacturing System

[Figure: the re-manufacturing system — returned products join a return inventory of size N, are re-manufactured at rate γ, and feed the inventory of (serviceable) product, which can also be replenished by an outside procurement of size Q.]

• There are two types of inventory to manage: the serviceable product and the returned product. The recycling process is modelled by an M/M/1/N queue.

• The serviceable product inventory level and the outside procurements are controlled by an (r, Q) continuous review policy. Here r is the outside procurement level and Q is the procurement quantity. We assume that N >> Q.

• M. Fleischmann, Quantitative Models for Reverse Logistics, LNEMS 501, Springer, Berlin (2001).

• W. Yuen, W. Ching and M. Ng, A Direct Method for Solving Block-Toeplitz with Near-Circulant-Block Systems with Applications to Hybrid Manufacturing Systems, Numerical Linear Algebra with Applications, 12 (2005) 957-966.


• The generator matrix is given by

A5 =
[  B     −λI                            ]
[ −L      B     −λI                     ]
[          ·      ·       ·             ]
[                −L       B      −λI    ]
[ −λI                    −L       BQ    ]

where

L =
[ 0   µ                ]
[     0    µ           ]
[           ·     ·    ]
[                0   µ ]
[                    0 ]

B = λIN+1 +
[  γ                          ]
[ −γ    γ+µ                   ]
[         ·        ·          ]
[                −γ    γ+µ    ]
[                      −γ   µ ]

and BQ = B − Diag(0, µ, . . . , µ).


6.5 Circulant-based Preconditioners

• Circulant matrices are Toeplitz matrices (constant entries along each diagonal) such that each column is a cyclic shift of its preceding column.

• The class of circulant matrices is denoted by F.

• C ∈ F implies that C can be diagonalized by the Fourier matrix F:

C = F*ΛF.

Hence C−1x = F*Λ−1Fx.

• The eigenvalues of a circulant matrix have an analytic form, which facilitates the spectral analysis of the preconditioned matrix.

• C−1x can be computed in O(n log n) operations via the Fast Fourier Transform (FFT).
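• A minimal Python illustration of this O(n log n) solve (the small circulant below is an assumed example): the eigenvalues Λ of a circulant matrix are the discrete Fourier transform of its first column, so C−1x reduces to two FFTs and one inverse FFT.

```python
import numpy as np

def circulant_solve(c, x):
    """Solve C y = x, where C is the circulant matrix with first column c,
    using C = F* Lambda F: the eigenvalues of C are fft(c)."""
    return np.real(np.fft.ifft(np.fft.fft(x) / np.fft.fft(c)))

# Check against a dense solve on a small example.
c = np.array([4.0, -1.0, 0.0, -1.0])                    # first column of C
C = np.column_stack([np.roll(c, k) for k in range(4)])  # dense circulant
x = np.array([1.0, 2.0, 3.0, 4.0])
print(np.allclose(circulant_solve(c, x), np.linalg.solve(C, x)))  # True
```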

• P. Davis, Circulant Matrices, John Wiley and Sons, N.J. (1985).


• In the queueing problems considered above:

A =
[  λ     −µ                                    ]
[ −λ    λ+µ    −2µ                             ]
[          ·      ·       ·                    ]
[         −λ    λ+sµ    −sµ                    ]
[                  ·       ·       ·           ]
[                 −λ    λ+sµ    −sµ            ]
[                         −λ      sµ           ]

s(A) =
[ λ+sµ    −sµ                              −λ  ]
[  −λ    λ+sµ    −sµ                           ]
[           ·       ·       ·                  ]
[          −λ    λ+sµ     −sµ                  ]
[                   ·        ·       ·         ]
[                  −λ     λ+sµ     −sµ         ]
[ −sµ                       −λ     λ+sµ        ]

We have rank(A− s(A)) = s + 1.


6.5.1 The Telecommunication System

• A3 = I ⊗ Q + A ⊗ I + R ⊗ Λ, where

R =
[  1                       ]
[ −1    1                  ]
[       −1     ·           ]
[               ·    1     ]
[                   −1   0 ]

• s(A3) = s(I) ⊗ Q + s(A) ⊗ I + s(R) ⊗ Λ, with s(I) = I and

s(R) =
[  1                    −1 ]
[ −1    1                  ]
[       −1     ·           ]
[               ·    1     ]
[                   −1   1 ]


6.5.2 The Manufacturing System of Two Machines in Tandem

• Circulant-based approximation of A4:

s(A4) =
[ s(Λ)+µ1I          −s(Σ)                                     ]
[   −µ1I      s(Λ)+s(D)+µ1I       −s(Σ)                       ]
[                 ·                  ·            ·           ]
[               −µ1I         s(Λ)+s(D)+µ1I      −s(Σ)         ]
[                                  −µ1I        s(Λ)+s(D)      ]

((l + 1)-block by (l + 1)-block), where

s(Λ) =
[  λ   −λ                ]
[       λ    −λ          ]
[             ·      ·   ]
[                 λ   −λ ]
[ −λ                  λ  ]

s(Σ) =
[ 0                  µ2 ]
[ µ2    0               ]
[         ·      ·      ]
[              µ2    0  ]

and s(D) = Diag(µ2, · · · , µ2, µ2).


6.5.3 The Re-manufacturing System

• Circulant-based approximation of A5:

s(A5) =
[  s(B)    −λI                                 ]
[ −s(L)    s(B)    −λI                         ]
[             ·       ·        ·               ]
[                  −s(L)     s(B)     −λI      ]
[  −λI                      −s(L)     s(BQ)    ]

where

s(L) =
[ 0   µ                ]
[     0    µ           ]
[           ·     ·    ]
[                0   µ ]
[ µ                  0 ]

s(B) = λIN+1 +
[ γ+µ                     −γ ]
[ −γ    γ+µ                  ]
[          ·       ·         ]
[                −γ    γ+µ   ]

and s(BQ) = s(B) − µI.


• In fact all the generator matrices A take the form

A = Σ_{i=1}^{m} Ai1 ⊗ Ai2 ⊗ · · · ⊗ Ain,

where Ai1 is relatively huge in size.

• Our preconditioner is defined as

C = Σ_{i=1}^{m} s(Ai1) ⊗ Ai2 ⊗ · · · ⊗ Ain.

• We note that

(F ⊗ I ⊗ · · · ⊗ I)* · C · (F ⊗ I ⊗ · · · ⊗ I) = Σ_{i=1}^{m} Λi1 ⊗ Ai2 ⊗ · · · ⊗ Ain = ⊕_{k=1}^{ℓ} ( Σ_{i=1}^{m} λ_{k,i1} Ai2 ⊗ · · · ⊗ Ain ),

which is a block-diagonal matrix. Here Λi1 = Diag(λ_{1,i1}, . . . , λ_{ℓ,i1}) is the diagonal matrix of eigenvalues of the circulant s(Ai1).


• One advantage of our preconditioner is that it can easily be inverted in parallel on a parallel computer. This therefore saves a lot of computational cost.

• Theorem: If all the parameters stay fixed, then the preconditioned matrix has singular values clustered around 1. Thus we expect our PCG method to converge very fast.

• Ai1 ≈ Toeplitz, up to a rank (s + 1) perturbation, ≈ s(Ai1), up to a rank (s + 1) perturbation.

• R. Chan and W. Ching, Circulant Preconditioners for Stochastic Automata Networks, Numerische Mathematik, (2000).


6.5.4 Numerical Results

• Since the generator A is non-symmetric, we use a generalized CG method, the Conjugate Gradient Squared (CGS) method. This method does not require multiplications with A^T.

• Our proposed method is applied to the following systems.

(1) The Telecommunication System.

(2) The Manufacturing Systems of Two Machines in Tandem.

(3) The Re-Manufacturing System.

• P. Sonneveld, A Fast Lanczos-type Solver for Non-symmetric Linear Systems, SIAM J. Sci. Comput., 10 (1989).

• Stopping criterion:

||rn||2 / ||r0||2 < 10^−10,

where rn is the residual at the nth iteration.
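• A generic SciPy illustration of calling the CGS solver and measuring the relative residual (the sparse test matrix below is an assumed example, not one of the queueing generators; note that SciPy's default tolerance is looser than the 10^−10 used in the tables):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cgs

# A nonsymmetric sparse tridiagonal test system (illustrative).
n = 200
A = sp.diags([-1.0, 3.0, -2.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = cgs(A, b)                      # info == 0 means CGS converged
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
print(info, rel_res)                     # relative residual ||r_n|| / ||r_0||
```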


6.5.5 The Telecommunications System

• n, number of external queues; N, size of the main queue.

• Cost per iteration:

        I                C                 BGS
   O(n 2^n N)    O(n 2^n N log N)    O((2^n)^2 N)

• Number of iterations (s = 2):

                 n = 1                  n = 4
    N        I     C    BGS         I     C    BGS
    32      155    8    171        161   13    110
    64      ∗∗     7    242        ∗∗    13    199
   128      ∗∗     8    366        ∗∗    14    317
   256      ∗∗     8    601        ∗∗    14    530
   512      ∗∗     8    ∗∗         ∗∗    14    958

• ’∗∗’ means greater than 1000 iterations.


6.5.6 The Manufacturing Systems of Two Machines in Tandem

• l, size of the first buffer; N, size of the second buffer.

• Cost per iteration:

       I           C           BGS
     O(lN)    O(lN log N)     O(lN)

• Number of iterations:

                 l = 1                 l = 4
     N       I     C    BGS        I     C    BGS
    32       34    5     72        64    10    72
    64      129    7    142       139    11   142
   128      ∗∗     8    345        ∗∗    12   401
   256      ∗∗     8    645        ∗∗    12    ∗∗
  1024      ∗∗     8    ∗∗         ∗∗    12    ∗∗

• ’∗∗’ means greater than 1000 iterations.


6.5.7 The Re-Manufacturing System

• Q, size of the serviceable inventory; N, size of the return inventory.

• Cost per iteration:

       I           C           BGS
     O(QN)    O(QN log N)     O(QN)

• Number of iterations:

                Q = 2                  Q = 3                  Q = 4
     N       I     C    BGS        I     C    BGS        I     C    BGS
   100      246    8    870       ∗∗    14   1153       ∗∗    19   1997
   200      ∗∗    10   1359       ∗∗    14    ∗∗        ∗∗    19    ∗∗
   400      ∗∗    10    ∗∗        ∗∗    14    ∗∗        ∗∗    19    ∗∗
   800      ∗∗    10    ∗∗        ∗∗    14    ∗∗        ∗∗    19    ∗∗

• ’∗∗’ means greater than 2000 iterations.


6.6 A Summary of Learning Outcomes

1. Able to recognize and apply networks of Markovian queues for real-world problems.

2. Able to apply the network models to real applications in manufacturing systems, re-manufacturing systems and telecommunication networks.

3. Able to apply the circulant-based preconditioning techniques.
