The University of Sussex – Department of Mathematics
G1110 & 852G1 – Numerical Linear Algebra
Lecture Notes – Autumn Term 2010
Kerstin Hesse
Figure 1: Geometric explanation of the Householder matrix H(w).
Lecture notes and course material by Holger Wendland, David Kay, and others, who taught
the course ‘Numerical Linear Algebra’ at the University of Sussex, served as a starting point
for the current lecture notes. The current notes have about twice as many pages as the
previous version. Apart from corrections and improvements, many new examples and some
linear algebra revision sections have been added compared to the previous lecture notes.
Contents
Introduction and Motivation
0.1 Motivation: An Interpolation Problem
0.2 Motivation: A Boundary Value Problem
The vector 0 is the zero vector, where all entries are zero.
We denote by ei in Rn (or in Cn) the standard ith basis vector containing a one in the ith
component and zeros elsewhere. For example in R3 and C3, the standard basis vectors are
e1 = (1, 0, 0)^T,   e2 = (0, 1, 0)^T,   e3 = (0, 0, 1)^T.
The vectors x1,x2, . . . ,xm in Rn (or in Cn) are linearly independent if the following holds
true: If
∑_{j=1}^{m} aj xj = a1 x1 + a2 x2 + . . . + am xm = 0    (1.1)
with the real (complex) numbers a1, a2, . . . , am, then the numbers aj , j = 1, 2, . . . , m, are all
zero.
In other words, x1,x2, . . . ,xm are linearly independent if the only real (complex) numbers
a1, a2, . . . , am for which (1.1) holds are a1 = a2 = . . . = am = 0.
The vectors x1, x2, . . . , xm in Rn (or in Cn) are linearly dependent if they are not linearly
independent. This means x1,x2, . . . ,xm in Rn (or in Cn) are linearly dependent, if there exist
real (complex) numbers a1, a2, . . . , am not all zero such that (1.1) holds.
Any m > n vectors in Rn (in Cn) are linearly dependent.
Any set of n linearly independent vectors in Rn (in Cn) is a basis for Rn (for Cn). If
v1,v2, . . . ,vn is a basis for Rn (for Cn), then the following holds: For every vector x in Rn
(in Cn), there exist uniquely determined real (complex) numbers a1, a2, . . . , an such that
x = ∑_{j=1}^{n} aj vj = a1 v1 + a2 v2 + . . . + an vn.
For a column vector x in Rn or in Cn with entries x1, x2, . . . , xn, we denote by x^T := (x1, x2, . . . , xn) the transposed (row) vector.
Likewise the transpose of a row vector y = (y1, y2, . . . , yn) is the corresponding column vector y^T with entries y1, y2, . . . , yn.
For a column vector x ∈ Cn with entries x1, x2, . . . , xn, we denote by x^* := x̄^T = (x̄1, x̄2, . . . , x̄n) the conjugate (row) vector.
Here, ȳ denotes the complex conjugate of y ∈ C, that is, if y = a + i b with a, b ∈ R and i the imaginary unit, then ȳ = a − i b. Likewise the conjugate of a complex row vector y = (y1, y2, . . . , yn) is the corresponding conjugate column vector y^* := ȳ^T with entries ȳ1, ȳ2, . . . , ȳn.
For complex numbers y = a + i b ∈ C with a, b ∈ R, we have
|y| = √(ȳ y) = √((a − i b)(a + i b)) = √(a^2 + b^2).
The Euclidean inner product of two real-valued vectors x,y ∈ Rn is given by
x^T y = ∑_{j=1}^{n} xj yj = x1 y1 + x2 y2 + . . . + xn yn.
We note that the Euclidean inner product for Rn is symmetric, that is, xT y = yT x for any
x,y ∈ Rn.
The Euclidean inner product of two complex vectors x,y ∈ Cn is given by
x^* y = x̄^T y = ∑_{j=1}^{n} x̄j yj = x̄1 y1 + x̄2 y2 + . . . + x̄n yn.
The Euclidean inner product for Cn is Hermitian: x^* y is the complex conjugate of y^* x for any x, y ∈ Cn.
The Euclidean norm of a vector x ∈ Rn (or x ∈ Cn) is defined by
‖x‖_2 = √(x^T x) = (∑_{j=1}^{n} |xj|^2)^{1/2}   or   ‖x‖_2 = √(x^* x) = (∑_{j=1}^{n} |xj|^2)^{1/2}, respectively.
The geometric interpretation of the Euclidean norm ‖x‖2 of a vector x in Rn or Cn is that ‖x‖2
measures the length of x.
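As a quick numerical aside (a sketch, not part of the original notes), the Euclidean inner product and norm can be checked in MATLAB, where the apostrophe operator is the conjugate transpose:

x = [1; 2i];  y = [3; 1+1i];
ip  = x' * y      % x^* y = 5 - 2i
nrm = norm(x, 2)  % sqrt(x^* x) = sqrt(5)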
We say that two vectors x and y in Rn (or in Cn) are orthogonal (to each other) if xT y = 0 (or
if x∗ y = 0, respectively). For vectors in Rn, this means geometrically that the angle between
these two vectors is π/2, that is 90◦. A set of m vectors x1,x2, . . . ,xm in Rn (or in Cn) is called
orthogonal, if they are mutually orthogonal, that is, if xj is orthogonal to xk whenever j ≠ k.
It is easily checked that a set of orthogonal vectors is, in particular, also linearly independent.
A basis v1,v2, . . . ,vn of Rn is called an orthonormal basis for Rn if the basis vectors have
all length one and are mutually orthogonal, that is
‖vj‖_2 = 1 for j = 1, 2, . . . , n,   and   vj^T vk = 0 for j, k = 1, 2, . . . , n with j ≠ k.
Likewise, a basis v1,v2, . . . ,vn of Cn is called an orthonormal basis for Cn if the basis vectors
have all length one and are mutually orthogonal, that is,
‖vj‖_2 = 1 for j = 1, 2, . . . , n,   and   vj^* vk = 0 for j, k = 1, 2, . . . , n with j ≠ k.
An orthonormal basis v1,v2, . . . ,vn of Rn has a very useful property: Any vector x ∈ Rn
has the representation
x = ∑_{j=1}^{n} (vj^T x) vj    (1.2)
as a linear combination of the basis vectors v1,v2, . . . ,vn.
The validity of (1.2) is easily established: Assume that x = ∑_{j=1}^{n} aj vj, and take the Euclidean inner product with vk. Because v1, v2, . . . , vn form an orthonormal basis, vk^T vj = 0 if j ≠ k and vk^T vj = ‖vk‖_2^2 = 1 if j = k. Thus
vk^T x = vk^T (∑_{j=1}^{n} aj vj) = ∑_{j=1}^{n} aj vk^T vj = ak vk^T vk = ak ‖vk‖_2^2 = ak.
Replacing aj = vj^T x in x = ∑_{j=1}^{n} aj vj now verifies (1.2).
Analogously to (1.2) an orthonormal basis v1,v2, . . . ,vn of Cn has the following property: Any
vector x ∈ Cn has the representation
x = ∑_{j=1}^{n} (vj^* x) vj    (1.3)
as a linear combination with respect to the orthonormal basis v1,v2, . . . ,vn.
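As a small numerical check (a sketch, not part of the original notes), the expansion (1.3) can be verified in MATLAB for any orthonormal basis; here the basis vectors are the columns of a unitary 2 × 2 matrix chosen purely for illustration:

V = [1 1; 1i -1i] / sqrt(2);   % columns v1, v2 are orthonormal in C^2
x = [2; 3-1i];                 % arbitrary vector
a = V' * x;                    % coefficients a_j = v_j^* x
norm(x - V*a)                  % || x - sum_j (v_j^* x) v_j || is ~0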
Exercise 1 State the properties of an inner product/scalar product for a complex vector space
V , and verify that the Euclidean inner product for Cn has these properties.
Exercise 2 Show formula (1.3).
1.2 Matrices
The matrix A ∈ Cm×n (or A ∈ Rm×n) is an m × n (m rows and n columns) matrix with
complex-valued (or real-valued) entries:
A := (ai,j) = (ai,j)_{1≤i≤m; 1≤j≤n} := [ a1,1  a1,2  · · ·  a1,n ; a2,1  a2,2  · · ·  a2,n ; . . . ; am,1  am,2  · · ·  am,n ],
where ai,j ∈ C (and ai,j ∈ R, respectively).
Occasionally, we will denote the column vectors of a matrix A = (ai,j) in Cm×n (or in Rm×n)
by aj , j = 1, 2, . . . , n, that is
A := (a1, a2, . . . , an),   with aj = (a1,j, a2,j, . . . , am,j)^T ∈ Cm (or ∈ Rm), j = 1, 2, . . . , n.
To denote the (i, j)th entry ai,j of A = (ai,j) we may occasionally also write Ai,j := ai,j or
A(i, j) := ai,j.
A matrix is called square if it has the same number of rows and columns. Thus square matrices are matrices in Cn×n or Rn×n. The diagonal of a square matrix A = (ai,j) in Cn×n or in Rn×n consists of the entries aj,j, j = 1, 2, . . . , n.
Vectors are special cases of matrices, and a column vector x ∈ Cn (or x ∈ Rn) is just an n × 1
matrix. Likewise a row vector in Cn (or Rn) is just a 1 × n matrix.
Two matrices of special importance are the m×n zero matrix, and, among the square matrices,
the n × n identity matrix: The zero matrix in Cm×n and in Rm×n is the m × n matrix that
has all entries zero. We denote the m × n zero matrix by 0. The identity matrix in Cn×n
and in Rn×n is the n × n matrix in which the entries on the diagonal are all one and all other
entries are zero. We denote the n × n identity matrix by I. For example, in C3×3 and R3×3 we have
0 = [ 0 0 0 ; 0 0 0 ; 0 0 0 ]   and   I = [ 1 0 0 ; 0 1 0 ; 0 0 1 ].
The scalar multiplication of a matrix A = (ai,j) in Cm×n (or in Rm×n) with a complex (or
real) number µ is defined componentwise, that is, µ A in Cm×n (or in Rm×n, respectively) is
defined by
(µ A)i,j := µ ai,j, i = 1, 2, . . . , m; j = 1, 2, . . . , n. (1.4)
The addition of two m× n matrices A = (ai,j) and B = (bi,j) in Cm×n (or in Rm×n) is defined
componentwise, that is, A + B in Cm×n (or in Rm×n, respectively) is defined by
(A + B)i,j := ai,j + bi,j, i = 1, 2, . . . , m; j = 1, 2, . . . , n. (1.5)
The set Cm×n (or Rm×n) of complex (or real) m×n matrices with the scalar multiplication (1.4)
and the addition (1.5) forms a complex vector space (or real vector space, respectively).
The matrix multiplication A B of A = (ai,j) ∈ Cm×n and B = (bi,j) ∈ Cn×p (or A = (ai,j) ∈ Rm×n and B = (bi,j) ∈ Rn×p) gives the matrix C = (ci,j) ∈ Cm×p (or C = (ci,j) ∈ Rm×p, respectively) with entries ci,j := ∑_{k=1}^{n} ai,k bk,j.
where we use that xn, xn−1, . . . , xj+1 are known from previous steps. We can continue this
procedure until we have computed all xj , j = n, n − 1, . . . , 2, 1. This procedure is called back
substitution and the following theorem summarizes what we have derived just now.
Theorem 2.15 (back substitution algorithm)
Let A = (ai,j) in Cn×n (or in R
n×n) be an upper triangular matrix that is also invertible.
Then the solution to Ax = b can be computed with O(n2) elementary operations via
xj = (1/aj,j) (bj − ∑_{k=j+1}^{n} aj,k xk),   j = n, n − 1, . . . , 2, 1.    (2.16)
For the definition of the Landau symbol O see Section 1.6.
Proof of Theorem 2.15. Essentially we have derived the theorem above before stating
it; the only part of the statement that needs some consideration are the O(n2) elementary
operations. Let us consider (2.16) for a given fixed j. The computation of xj involves n − j
additions/subtractions and n− j + 1 multiplications/divisions, that is, 2n + 1− 2j elementary
operations. Thus the back substitution algorithm needs in total
∑_{j=1}^{n} (2n + 1 − 2j) = (2n + 1) n − 2 ∑_{j=1}^{n} j = (2n + 1) n − (n + 1) n = n^2,
that is, O(n2) elementary operations. 2
Example 2.16 (back substitution)
Solve the following linear system with an upper triangular matrix with back substitution:
[ 1 0 2 ; 0 1 −1 ; 0 0 −3 ] (x1, x2, x3)^T = (3, 0, 6)^T.
Solution: Using back substitution, we have
x3 = 6/(−3) = −2,   x2 = (1/1) (0 − (−1) x3) = x3 = −2,   x1 = (1/1) (3 − 0 · x2 − 2 x3) = 3 − 2 · (−2) = 7.
Thus the solution is x = (7,−2,−2)T . 2
The MATLAB code for the implementation of the back substitution algorithm is:
function x = back_sub(U,b)
% executes the back substitution algorithm for solving U x = b
% input:  U = n by n upper triangular matrix
%         b = 1 by n vector, right-hand side
% output: x = 1 by n vector
n = size(U,1);
x = zeros(1,n);
x(n) = b(n) / U(n,n);
for i = n-1:-1:1
    x(i) = (b(i) - U(i,i+1:n) * x(i+1:n)') / U(i,i);
end
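For instance, the function can be called on the system from Example 2.16 (the right-hand side may be passed as a row or a column vector, since it is only indexed entrywise):

U = [1 0 2; 0 1 -1; 0 0 -3];
b = [3 0 6];
x = back_sub(U, b)   % returns x = [7 -2 -2], cf. Example 2.16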
Exercise 26 Solve the following linear system by hand with the back substitution algorithm:
[ 1 1 1 ; 0 2 2 ; 0 0 3 ] (x1, x2, x3)^T = (−1, 3, 6)^T.
Exercise 27 Solve the following linear system by hand with the back substitution algorithm:
[ 2 −1 3 1 ; 0 1 2 −1 ; 0 0 −2 1 ; 0 0 0 3 ] (x1, x2, x3, x4)^T = (12, −3, 1, 9)^T.
Exercise 28 Show that the upper triangular matrices with diagonal elements all different from
zero, with the usual matrix multiplication, form a (multiplicative) group.
Exercise 29 Forward substitution: Consider a linear system Ax = b, where A ∈ Rn×n
is a lower triangular matrix, b ∈ Rn the given right-hand side, and x ∈ Rn the unknown
solution. Analogously to the back substitution algorithm we can formulate a forward sub-
stitution algorithm to compute xj, j = 1, 2, . . . , n − 1, n. Derive the forward substitution
algorithm.
2.3 Schur Factorization: A Triangular Canonical Form
In this section we encounter the Schur factorization, which guarantees that for any matrix A ∈ Cn×n there exists a unitary matrix S ∈ Cn×n such that S^* A S is an upper triangular matrix. Since
the matrix S is unitary, we have S−1 = S∗ and therefore S∗ A S = S−1 A S, that is, we have
a basis transformation (or similarity transformation) with a unitary matrix that transforms A
into an upper triangular matrix. The proof of the Schur factorization is constructive in that
we will explicitly construct the matrix S with the help of so-called Householder matrices or
elementary Hermitian matrices.
Definition 2.17 (Householder matrix or elementary Hermitian matrix)
A Householder matrix or elementary Hermitian matrix is any matrix of the form
H(w) := I − 2 w w^*,   where w ∈ Cn with w^* w = ‖w‖_2^2 = 1 or w = 0.    (2.17)
Figure 2.1 illustrates that the Householder matrix H(w) with w ≠ 0 represents a reflection in the hyperplane
Sw = {z ∈ Cn : w∗ z = 0}
that is orthogonal to w. Indeed, consider any vector a ∈ Cn and decompose it into the component in the direction of w and the orthogonal part (which lies in Sw):
a = (w^* a) w + (a − (w^* a) w).    (2.18)
If we apply H(w) to the vector a, then, from (2.18) and w∗ w = 1,
H(w) a = (I − 2 w w^*) a = a − 2 w w^* a
       = (w^* a) w + (a − (w^* a) w) − 2 w w^* ((w^* a) w + (a − (w^* a) w))
       = (w^* a) w + (a − (w^* a) w) − 2 w w^* ((w^* a) w)
       = (w^* a) w + (a − (w^* a) w) − 2 (w^* a) (w^* w) w
       = −(w^* a) w + (a − (w^* a) w),
where we have used that a− (w∗ a)w is orthogonal to w. From the last representation we see
that H(w) a is indeed a reflection of a on the hyperplane Sw.
Figure 2.1: Householder transformation: in the picture, w̃ denotes the projection of a onto the hyperplane Sw. Then a = (w^* a) w + w̃, and since w^* w̃ = 0 and w^* w = 1, we find H(w) a = (I − 2 w w^*)((w^* a) w + w̃) = (w^* a) w + w̃ − 2 (w^* a) w = −(w^* a) w + w̃.
Example 2.18 (Householder matrix)
Let w = (0,−3/5, 4/5)T . Then ‖w‖2 = 1, and the matrix
H(w) = I − 2 (0, −3/5, 4/5)^T (0, −3/5, 4/5) = I − 2 [ 0 0 0 ; 0 9/25 −12/25 ; 0 −12/25 16/25 ] = [ 1 0 0 ; 0 7/25 24/25 ; 0 24/25 −7/25 ]
is a 3 × 3 Householder matrix. 2
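As a quick numerical aside (a sketch, not from the original notes), the matrix from this example can be checked in MATLAB:

w = [0; -3/5; 4/5];
H = eye(3) - 2*(w*w');    % H(w) = I - 2 w w^*
norm(H - H')              % Hermitian (here: symmetric), ~0
norm(H*H - eye(3))        % H^2 = I since H is Hermitian and unitary, ~0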
The next lemma states some properties of Householder matrices.
Lemma 2.19 (properties of Householder matrices)
A Householder matrix H(w), given by (2.17), has the following properties:
(i) H(w) is Hermitian, that is, (H(w))^* = H(w).
(ii) H(w) is invertible/non-singular.
(iii) det(H(w)) = −1 for w ≠ 0.
(iv) H(w) is unitary, that is, (H(w))^{−1} = (H(w))^*. Hence, a product of Householder matrices is unitary.
(v) Storing H(w) only requires the n elements of w.
Proof of Lemma 2.19. From (A B)∗ = B∗ A∗, (A + B)∗ = A∗ + B∗, and (A∗)∗ = A we find
(H(w))^* = (I − 2 w w^*)^* = I^* − 2 (w w^*)^* = I − 2 (w^*)^* w^* = I − 2 w w^* = H(w),
thus proving (i).
Next we work out det(H(w)). If w = 0 then H(w) = I and hence det(H(w)) = 1. If w ≠ 0, then we will show that the eigenvalues of H(w) are 1, with multiplicity n − 1, and −1, with multiplicity 1. Therefore det(H(w)) = (−1) · 1^{n−1} = −1, which proves (iii).
Consider as before the hyperplane Sw = {z ∈ Cn : w^* z = 0}, which is an (n − 1)-dimensional subspace of Cn. For any vector a ∈ Sw, we have w^* a = 0, and hence for a ∈ Sw,
H(w) a = (I − 2 w w^*) a = a − 2 w (w^* a) = a,
that is, a is an eigenvector of H(w) corresponding to the eigenvalue λ = 1. Since dim(Sw) = n − 1, the eigenvalue λ = 1 has at least n − 1 linearly independent eigenvectors and hence has at least multiplicity n − 1. Now consider the vector w itself. Then, since w^* w = 1,
H(w) w = (I − 2 w w^*) w = w − 2 w (w^* w) = −w,
that is, w is an eigenvector corresponding to the eigenvalue λ = −1. Combining these results, we see that the eigenvalue λ = 1 has multiplicity n − 1 and the eigenvalue λ = −1 has multiplicity 1.
From the proof so far it is clear that H(w) is invertible, since we have established that its
determinant is different from zero. Thus (ii) holds true.
To show that H(w) is unitary, we have to show that
(H(w))^* H(w) = H(w) (H(w))^* = I.
Since (H(w))^* = H(w) from (i), it is enough to show that H(w) H(w) = I. Indeed,
H(w) H(w) = (I − 2 w w^*)(I − 2 w w^*) = I − 4 w w^* + 4 (w w^*)(w w^*) = I − 4 w w^* + 4 w (w^* w) w^* = I − 4 w w^* + 4 w w^* = I,
where we have used the associativity of matrix multiplication and w∗ w = 1.
That the product of Householder matrices is also unitary follows from the following general
statement: if A and B in Cn×n are unitary, then A B is unitary. Indeed (A B)∗ = B∗ A∗ =
B−1 A−1, and we know that B−1 A−1 is the inverse matrix to A B.
Statement (v) is evident. 2
Lemma 2.20 (construction of Householder matrices)
Let x and y be given vectors in Cn such that y∗ y = x∗ x and y∗ x = x∗ y. Then there exists
a Householder matrix (or an elementary Hermitian matrix) H(w), such that H(w)x = y
and H(w)y = x. If x and y are real then so is w.
Proof of Lemma 2.20. If x = y then w = 0 and H(w) = I. If x ≠ y we define
w = (x − y) / √((x − y)^* (x − y)) = (x − y) / ‖x − y‖_2.    (2.19)
Clearly, w^* w = 1, and from (2.19)
H(w) x = (I − 2 w w^*) x = x − (x − y) [2 (x − y)^* x] / [(x − y)^* (x − y)],    (2.20)
and, using x^* x = y^* y and x^* y = y^* x,
2 (x − y)^* x = 2 x^* x − 2 y^* x = (x^* x + y^* y) − (y^* x + x^* y) = (x − y)^* (x − y).    (2.21)
Substituting (2.21) into (2.20) now yields H(w) x = y. From (H(w))^{−1} = (H(w))^* = H(w) (see (i) and (iv) in Lemma 2.19) and H(w) x = y, we have
H(w) y = H(w) (H(w) x) = (H(w) H(w)) x = I x = x.
If the vectors x and y are real, then, from the definition (2.19), clearly the vector w is also
real. 2
Lemma 2.20 is often used to map a given vector x = (x1, x2, . . . , xn)T onto a scalar multiple of
the first unit vector e1 = (1, 0, 0, . . . , 0)T , that is, we want to find a Householder matrix H(w)
such that
H(w)x = c e1
with a suitable complex number c. From the conditions in Lemma 2.20, we find
(c e1)^* (c e1) = (c̄ c)(e1^* e1) = |c|^2 = x^* x = ‖x‖_2^2   ⇒   |c| = ‖x‖_2
and
(c e1)^* x = c̄ x1,   x^* (c e1) = x̄1 c (the complex conjugate of c̄ x1)   ⇒   c̄ x1 ∈ R,
and thus c is a real multiple of x1. Combining both properties, we see that we may take
c = (x1/|x1|) ‖x‖_2,
and we choose the vector w of the Householder matrix from (2.19) as
w = (x − (x1/|x1|) ‖x‖_2 e1) / ‖x − (x1/|x1|) ‖x‖_2 e1‖_2.
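This choice of c and w can be turned into a small MATLAB helper. The function below is a sketch under the assumption x1 ≠ 0 (the name householder_vector is ours, not from the notes); it returns w = 0, so that H(w) = I, when x is already a multiple of e1:

function w = householder_vector(x)
% Sketch: Householder vector w with H(w) x = c e_1, c = (x_1/|x_1|)*||x||_2.
% Assumes x(1) ~= 0; returns w = 0 (so H(w) = I) if x is already c*e_1.
n = length(x);
c = (x(1)/abs(x(1))) * norm(x, 2);
y = zeros(n, 1);  y(1) = c;        % target vector c*e_1
d = x - y;
if norm(d, 2) < 1e-14 * norm(x, 2)
    w = zeros(n, 1);
else
    w = d / norm(d, 2);
end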
We can now use Lemma 2.20 to prove the following Theorem.
Theorem 2.21 (Schur factorization)
Let A be a matrix in Cn×n. Then there exists a unitary matrix S ∈ Cn×n such that S∗ A S
is an upper triangular matrix. This is known as the Schur factorization of A.
Proof of Theorem 2.21. The proof is given by induction over n.
Initial step: Clearly the result holds for n = 1.
Induction step n − 1 → n: Assume now the result holds for all (n − 1)× (n − 1) matrices. We
need to show that the result also holds for all n × n matrices.
Let x be an eigenvector to some eigenvalue λ of A, that is
Ax = λx, where x 6= 0. (2.22)
By Lemma 2.20 and our considerations after this lemma, there exists a Householder matrix
H(w1) such that
H(w1) x = c1 e1   and   H(w1) e1 = (1/c1) x,    (2.23)
with |c1| = ‖x‖_2 ≠ 0. Using (H(w1))^* = H(w1) and (2.22) and (2.23), we have
(H(w1))^* A H(w1) e1 = (1/c1) H(w1) A x = (λ/c1) H(w1) x = (λ/c1) c1 e1 = λ e1.
Since (H(w1))^* A H(w1) e1 is the first column of (H(w1))^* A H(w1), we may write the matrix (H(w1))^* A H(w1) in the block form
(H(w1))^* A H(w1) = H(w1) A H(w1) = [ λ  a^* ; 0  A(1) ],    (2.24)
for some vector a ∈ Cn−1 and some (n − 1) × (n − 1) matrix A(1).
By the induction hypothesis there exists an (n − 1) × (n − 1) unitary matrix V , such that
V ∗ A(1) V = T , where T is an upper triangular (n − 1) × (n − 1) matrix. Consider the matrix
S = H(w1) [ 1  0^T ; 0  V ]   with   S^* = [ 1  0^T ; 0  V^* ] (H(w1))^*.
From V^* V = V V^* = I and (H(w1))^* H(w1) = H(w1) (H(w1))^* = I, it follows that S S^* = S^* S = I,
that is, the matrix S is unitary. We will now show that S∗ A S is an upper triangular matrix.
From (2.24)
S^* A S = [ 1  0^T ; 0  V^* ] (H(w1))^* A H(w1) [ 1  0^T ; 0  V ]
        = [ 1  0^T ; 0  V^* ] [ λ  a^* ; 0  A(1) ] [ 1  0^T ; 0  V ]
        = [ 1  0^T ; 0  V^* ] [ λ  a^* V ; 0  A(1) V ]
        = [ λ  a^* V ; 0  V^* A(1) V ]
        = [ λ  a^* V ; 0  T ].
Hence, S∗ A S is upper triangular. 2
Example 2.22 (Schur factorization)
Find the Schur factorization of the matrix
A = [ 3 0 1 ; 0 3 0 ; 1 0 3 ].
Solution: To do this we follow the steps of the proof of Theorem 2.21. First we find the
eigenvalues of A.
p(A, λ) = det(λ I − A) = det [ λ−3  0  −1 ; 0  λ−3  0 ; −1  0  λ−3 ]
        = (λ − 3)^3 − (λ − 3) = (λ − 3) [(λ − 3)^2 − 1] = (λ − 3)(λ − 2)(λ − 4),
and we see that the eigenvalues are λ1 = 4, λ2 = 3, and λ3 = 2.
We choose λ2 = 3, find a corresponding eigenvector, and determine the Householder matrix
that maps the eigenvector onto c e1. For λ = λ2 = 3, the eigenvector x2 satisfies the linear
system
[ 0 0 −1 ; 0 0 0 ; −1 0 0 ] x2 = 0   ⇒   x2 = α (0, 1, 0)^T,   α ∈ R.
For α = 1, ‖e1‖_2 = ‖x2‖_2 = 1 and e1^* x2 = x2^* e1 = 0. Hence, we choose the vector w1 for the first Householder matrix as
w1 = (x2 − e1) / ‖x2 − e1‖_2 = (1/√2) (−1, 1, 0)^T.
The corresponding Householder matrix is given by
H(w1) = I − 2 w1 w1^* = I − 2 · (1/(√2)^2) (−1, 1, 0)^T (−1, 1, 0) = I − [ 1 −1 0 ; −1 1 0 ; 0 0 0 ] = [ 0 1 0 ; 1 0 0 ; 0 0 1 ],
and it is easily verified that indeed H(w1) x2 = e1 and H(w1) e1 = x2. Now we execute the matrix multiplication (H(w1))^* A H(w1) = H(w1) A H(w1) and obtain
H(w1) A H(w1) = [ 0 1 0 ; 1 0 0 ; 0 0 1 ] [ 3 0 1 ; 0 3 0 ; 1 0 3 ] [ 0 1 0 ; 1 0 0 ; 0 0 1 ]
             = [ 0 1 0 ; 1 0 0 ; 0 0 1 ] [ 0 3 1 ; 3 0 0 ; 0 1 3 ]
             = [ 3 0 0 ; 0 3 1 ; 0 1 3 ].
Now we consider the 2 × 2 submatrix
A(1) = [ 3 1 ; 1 3 ],
and determine its eigenvalues:
p(A(1), λ) = det(λ I − A(1)) = det [ λ−3  −1 ; −1  λ−3 ] = (λ − 3)^2 − 1 = (λ − 2)(λ − 4).
The eigenvalues of A(1) are λ1 = 4 and λ2 = 2. We choose λ2 = 2 and find a corresponding
eigenvector x2(1):
[ −1 −1 ; −1 −1 ] x2(1) = 0   ⇒   x2(1) = α (1, −1)^T,   α ∈ R.
For α = 1, ‖x2(1)‖_2 = ‖√2 e1‖_2 = √2, where e1 is now the first unit vector in R2, and we have (x2(1))^* (√2 e1) = (√2 e1)^* x2(1). The vector w2(1) of the Householder matrix in R2 is given by
w2(1) = (x2(1) − √2 e1) / ‖x2(1) − √2 e1‖_2 = ((1 − √2)^2 + 1)^{−1/2} (1 − √2, −1)^T = (2 (2 − √2))^{−1/2} (1 − √2, −1)^T.
Thus the Householder matrix in R2 is given by
H(w2(1)) = I − 2 w2(1) (w2(1))^* = I − (2 − √2)^{−1} (1 − √2, −1)^T (1 − √2, −1)
         = I + (√2 (1 − √2))^{−1} [ (1 − √2)^2   −(1 − √2) ; −(1 − √2)   1 ]
         = I + (√2)^{−1} [ (1 − √2)   −1 ; −1   (1 − √2)^{−1} ]
         = [ (√2)^{−1}   −(√2)^{−1} ; −(√2)^{−1}   −(√2)^{−1} ] = −(1/√2) [ −1  1 ; 1  1 ].
The corresponding unitary matrix in R3 is then given by
H2 := [ 1  0  0 ; 0  (√2)^{−1}  −(√2)^{−1} ; 0  −(√2)^{−1}  −(√2)^{−1} ],
and
H2 (H(w1) A H(w1)) H2 = [ 1 0 0 ; 0 (√2)^{−1} −(√2)^{−1} ; 0 −(√2)^{−1} −(√2)^{−1} ] [ 3 0 0 ; 0 3 1 ; 0 1 3 ] [ 1 0 0 ; 0 (√2)^{−1} −(√2)^{−1} ; 0 −(√2)^{−1} −(√2)^{−1} ]
= [ 1 0 0 ; 0 (√2)^{−1} −(√2)^{−1} ; 0 −(√2)^{−1} −(√2)^{−1} ] [ 3 0 0 ; 0 √2 −2√2 ; 0 −√2 −2√2 ]
= [ 3 0 0 ; 0 2 0 ; 0 0 4 ].
Thus we have
S^* A S = [ 3 0 0 ; 0 2 0 ; 0 0 4 ]
with the unitary matrix
S := H(w1) H2 = [ 0 1 0 ; 1 0 0 ; 0 0 1 ] [ 1 0 0 ; 0 (√2)^{−1} −(√2)^{−1} ; 0 −(√2)^{−1} −(√2)^{−1} ] = [ 0 (√2)^{−1} −(√2)^{−1} ; 1 0 0 ; 0 −(√2)^{−1} −(√2)^{−1} ].
In this example the Schur factorization has brought A into diagonal form, but in general this
is not the case, and we only obtain an upper triangular matrix. 2
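For comparison (an aside, not part of the original notes): MATLAB has a built-in schur function, so the factorization of this example can be cross-checked numerically; the option 'complex' requests the triangular (complex) Schur form.

A = [3 0 1; 0 3 0; 1 0 3];
[S, T] = schur(A, 'complex');   % A = S*T*S', S unitary, T upper triangular
norm(S'*A*S - T)                % ~0; the diagonal of T holds the eigenvalues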
We note here that the Schur factorization is mainly of ‘theoretical interest’ and is not intended as a practical algorithm for implementation. For example, you can use the Schur factorization to prove the statements investigated in the remark and the exercise below.
Remark 2.23 (special case of Theorem 2.21 for A∗ = A)
An important consequence of Theorem 2.21 is that if A is Hermitian, that is, A∗ = A, then the
upper triangular matrix S∗ A S is also Hermitian, and thus S∗ A S = S−1 A S must be a real
diagonal matrix. Furthermore, A is Hermitian if and only if all eigenvalues are real
and there are n orthonormal eigenvectors!
Exercise 30 Construct a Householder matrix that maps the vector (2, 0, 1)^T onto the vector (√5, 0, 0)^T.
Exercise 31 Let A ∈ Cn×n be a Hermitian matrix, that is, A^* = Ā^T = A. Without using
any known results, but by just exploiting the definition of an Hermitian matrix, show that A
has only real eigenvalues.
Exercise 32 Let A be a Hermitian matrix.
(a) By using the Schur factorization, show that there exists a unitary matrix S such that
S∗ A S = U , where U is a real diagonal matrix.
(b) Use the Schur factorization to show that Cn has an orthonormal basis of eigenvectors of A.
(c) Show that A is positive definite if and only if all eigenvalues are positive.
(d) Show that if A is positive definite, then det(A) > 0.
(e) Show that A is positive definite if and only if A = Q^* Q with some matrix Q satisfying det(Q) ≠ 0.
2.4 Vector Norms
In this section we briefly revise some material on norms that has been covered in ‘Further
Analysis’.
Definition 2.24 (norm, normed linear space, and unit vector)
Let V be a complex (or real) vector space. A norm for V is a function ‖ · ‖ : V → R with
the following properties: For all x,y ∈ V and α ∈ C (or α ∈ R) we have
(i) ‖x‖ ≥ 0; and ‖x‖ = 0 if and only if x = 0,
(ii) ‖αx‖ = |α| ‖x‖, and
(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality).
A vector space V with a norm ‖ · ‖ is called a normed vector space or normed linear
space.
A unit vector with respect to the norm ‖ · ‖ is a vector x ∈ V such that ‖x‖ = 1.
Example 2.25 (norms on Cn and Rn)
Here is a list of the most important vector norms for Cn (or Rn): for x = (x1, x2, . . . , xn)^T in Cn or Rn, define the p-norms by
(a) 1-norm: ‖x‖_1 := ∑_{j=1}^{n} |xj| = |x1| + |x2| + . . . + |xn|,
(b) 2-norm or Euclidean norm: ‖x‖_2 := (∑_{j=1}^{n} |xj|^2)^{1/2},
(c) p-norm: ‖x‖_p := (∑_{j=1}^{n} |xj|^p)^{1/p} = (|x1|^p + |x2|^p + . . . + |xn|^p)^{1/p}, for 1 ≤ p < ∞,
(d) ∞-norm: ‖x‖_∞ := max_{1≤j≤n} |xj|.
We will use these p-norms frequently in this course, in particular, for p ∈ {1, 2,∞}. 2
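In MATLAB these norms are available through the built-in norm function; a quick illustration (a sketch, not from the notes):

x = [3; -4; 0; 2];
norm(x, 1)     % 1-norm: |3|+|-4|+|0|+|2| = 9
norm(x, 2)     % Euclidean norm: sqrt(29)
norm(x, inf)   % infinity-norm: 4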
Example 2.26 (unit balls with respect to some norms on Rn)
The unit ball in Rn with respect to the p-norm ‖ · ‖_p is defined to be the set
Bp := {x ∈ Rn : ‖x‖_p ≤ 1}.
For R2, Figure 2.2 below shows the unit balls B1, B2, and B∞ with respect to the 1-norm,
the Euclidean norm, and the ∞-norm, respectively. Only the unit ball with respect to the
Euclidean norm looks like we imagine a ball. 2
The following theorem generalizes the Cauchy-Schwarz inequality.
Figure 2.2: The unit balls Bp in R2 with respect to the p-norms ‖ · ‖_1 (blue), ‖ · ‖_2 (black), and ‖ · ‖_∞ (red), respectively.
Theorem 2.27 (Hölder’s inequality)
The p-norms ‖ · ‖_p, 1 ≤ p ≤ ∞, for Cn (and Rn), defined in Example 2.25 above, satisfy Hölder’s inequality:
|x^* y| ≤ ‖x‖_p ‖y‖_q,   where 1/p + 1/q = 1.
The special case p = q = 2 is known as the Cauchy-Schwarz inequality.
The next definition has far reaching consequences.
Definition 2.28 (equivalent norms)
Two norms ‖ · ‖ and ||| · ||| for a (real or complex) vector space V are called equivalent if there are two positive constants c1 and c2 such that
c1 ‖x‖ ≤ |||x||| ≤ c2 ‖x‖   for all x ∈ V.
A norm allows us to define the notion of convergence of sequences.
Definition 2.29 (convergence with respect to a norm)
Let V be a (real or complex) vector space, and let ‖·‖ : V → R be a norm for V . A sequence
{xk} ⊂ V converges with respect to ‖ · ‖ to x ∈ V if
lim_{k→∞} ‖xk − x‖ = 0.
If two norms are equivalent, then they define the same notion of convergence.
Theorem 2.30 (equivalence of all norms on Cn (or Rn))
On Cn (or Rn) all norms are equivalent.
Proof of Theorem 2.30. It suffices to show that all norms are equivalent to the 1-norm
‖ · ‖_1. Let ‖ · ‖ be any norm on Cn. The representation x = ∑_{j=1}^{n} xj ej of any vector x = (x1, x2, . . . , xn)^T shows that
‖x‖ = ‖∑_{j=1}^{n} xj ej‖ ≤ ∑_{j=1}^{n} ‖xj ej‖ = ∑_{j=1}^{n} |xj| ‖ej‖ ≤ (max_{1≤j≤n} ‖ej‖) ‖x‖_1 =: M ‖x‖_1,
with M := max1≤j≤n ‖ej‖. From this we can conclude that the norm ‖·‖ is Lipschitz continuous
with respect to the 1-norm ‖ · ‖1, that is,
| ‖x‖ − ‖y‖ | ≤ ‖x − y‖ ≤ M ‖x − y‖_1   for all x, y ∈ Cn.
(See Exercise 35 for the first inequality.) Since the unit sphere S1 = {x ∈ Cn : ‖x‖_1 = 1} in Cn is closed and bounded, the unit sphere S1 is compact. Hence the norm ‖ · ‖ attains its
minimum and maximum on S1 (because it is a continuous function), that is, there are positive
constants c1 and c2 such that
c1 ≤ ‖x‖ ≤ c2 for all x ∈ Cn with ‖x‖1 = 1. (2.25)
For general x ∈ Cn \ {0}, we apply (2.25) to the unit vector y = x/‖x‖1 and obtain
c1 ≤ (1/‖x‖_1) ‖x‖ ≤ c2   ⇒   c1 ‖x‖_1 ≤ ‖x‖ ≤ c2 ‖x‖_1.    (2.26)
The second estimate in (2.26) holds for all x ∈ Cn \ {0} and trivially also for x = 0. Thus we have verified that ‖ · ‖ and ‖ · ‖_1 are equivalent. 2
Example 2.31 (equivalent norms on Rn)
For example, we have for all x ∈ Rn
‖x‖_2 ≤ ‖x‖_1 ≤ √n ‖x‖_2,
‖x‖_∞ ≤ ‖x‖_2 ≤ √n ‖x‖_∞,
‖x‖_∞ ≤ ‖x‖_1 ≤ n ‖x‖_∞.
The estimates will be proved in Exercise 36 below. 2
Exercise 33 Show that ‖ · ‖1 and ‖ · ‖∞ are norms for Cn.
Exercise 34 By considering the inequality
0 ≤ (α x + β y)^T (α x + β y)
and choosing α, β ∈ R appropriately, for x,y ∈ Rn, prove the Cauchy-Schwarz inequality.
Exercise 35 Prove the lower triangle inequality: If ‖ · ‖ is a norm for a vector space V ,
then | ‖x‖ − ‖y‖ | ≤ ‖x − y‖ for all x, y ∈ V.
Hint: Consider writing y = x + (y − x), and use the triangle inequality.
Exercise 36 Show the inequalities in Example 2.31.
2.5 Matrix Norms
When analyzing matrix algorithms we will require the use of matrix norms, since they allow
us to estimate whether the matrix is ‘well-conditioned’. If the matrix is not ‘well-conditioned’ (which usually means that the matrix is close to being singular), then the quality of a numerically computed solution can be poor.
We recall that the set of all complex (or real) m × n matrices with the usual component-wise
matrix addition and scalar multiplication forms a complex (or real) vector space.
Lemma 2.32 (m × n matrices form a vector space)
The set Cm×n (or Rm×n) of all complex (or real) m × n matrices with the component-wise
addition of A = (ai,j) and B = (bi,j), defined by
(A + B)i,j := ai,j + bi,j , i = 1, 2, . . . , m; j = 1, 2, . . . , n,
and the component-wise scalar multiplication of α ∈ C (or α ∈ R) and A = (ai,j), defined
by
(α A)i,j := α ai,j , i = 1, 2, . . . , m; j = 1, 2, . . . , n,
is a complex (or real) vector space.
Proof of Lemma 2.32. The proof is straight-forward and was covered in Exercise 3. 2
Since Cm×n (and Rm×n) are vector spaces, we can introduce norms on them. For many purposes,
it is convenient to introduce norms, which are ‘induced’ by given vector norms on Cm and Cn
(and Rm and Rn, respectively). We will now explore the concept of an ‘induced matrix norm’.
In this section, we write ‖·‖(m) to indicate that ‖·‖(m) is a norm on an m-dimensional vector
space (usually Cm or Rm).
Let A ∈ Cm×n. Since the unit sphere S = {x ∈ Cn : ‖x‖(n) = 1} in the finite dimensional
space Cn with norm ‖ · ‖(n) is closed and bounded, and hence compact, the real-valued function
‖Ax‖(m) assumes its supremum on the unit sphere S. Thus
sup_{x∈Cn, ‖x‖(n)=1} ‖A x‖(m) = ‖A x′‖(m) =: C   for some x′ ∈ Cn with ‖x′‖(n) = 1,    (2.27)
and, in particular, the supremum has a finite value C. We also see from (2.27) that
‖A x‖(m) ≤ C for all x ∈ Cn with ‖x‖(n) = 1   ⇔   ‖A y‖(m) ≤ C ‖y‖(n) for all y ∈ Cn,
where the equivalence follows by setting x = y/‖y‖(n) for y ≠ 0. This motivates the definition
of the induced matrix norm.
Definition 2.33 (induced matrix norm)
Let ‖ · ‖(m) and ‖ · ‖(n) be norms on Cm (or Rm) and Cn (or Rn), respectively, and let A ∈ Cm×n (or Rm×n). The induced matrix norm of A is defined by
‖A‖(m,n) := sup_{x∈Cn, x≠0} ‖A x‖(m) / ‖x‖(n) = sup_{x∈Cn, ‖x‖(n)=1} ‖A x‖(m).
Obviously, for any x ∈ Cn (or Rn) with x ≠ 0,
‖A x‖(m) / ‖x‖(n) ≤ ‖A‖(m,n),
and hence,
‖A x‖(m) ≤ ‖A‖(m,n) ‖x‖(n)   for all x ∈ Cn (or x ∈ Rn).    (2.28)
Exercise 37 Let ‖ · ‖(m) and ‖ · ‖(n) be norms for Cm and Cn, respectively, and let A ∈ Cm×n.
Show that the induced matrix norm ‖ · ‖(m,n) satisfies the properties of a norm.
Definition 2.34 (compatible matrix norm)
Let ‖ · ‖(m) and ‖ · ‖(n) be norms on Cm (or Rm) and Cn (or Rn), respectively, and let ‖ · ‖(m,n) denote the induced matrix norm on Cm×n (or Rm×n). Let ||| · ||| denote another matrix norm on Cm×n (or Rm×n). We say that the matrix norm ||| · ||| is compatible with the induced matrix norm ‖ · ‖(m,n) if
‖A x‖(m) ≤ |||A||| ‖x‖(n)   for all x ∈ Cn (or x ∈ Rn).
We observe that by the definition of the induced matrix norm ‖ · ‖(m,n), any compatible matrix norm ||| · ||| clearly satisfies
‖A‖(m,n) ≤ |||A|||   for all A ∈ Cm×n (or A ∈ Rm×n),    (2.29)
and, in fact, (2.29) characterizes a compatible matrix norm.
The next two definitions introduce important matrix norms.
Definition 2.35 (Frobenius norm)
The Frobenius norm of an m × n matrix A = (ai,j) (in Cm×n or Rm×n) is given by
‖A‖_F = (∑_{i=1}^{m} ∑_{j=1}^{n} |ai,j|^2)^{1/2}.    (2.30)
Definition 2.36 (induced matrix p-norms)
Let A ∈ Cm×n (or A ∈ Rm×n). For 1 ≤ p ≤ ∞, equip Cm (or Rm) and Cn (or Rn) with the
corresponding p-norm, where
‖x‖_p := (∑_{j=1}^{d} |xj|^p)^{1/p},   x ∈ Cd (or x ∈ Rd),   1 ≤ p < ∞,
and
‖x‖_∞ := max_{1≤j≤d} |xj|,   x ∈ Cd (or x ∈ Rd).
Then the induced matrix p-norms for A ∈ Cm×n (or A ∈ Rm×n) are given by
‖A‖_p = sup_{x∈Cn, x≠0} ‖A x‖_p / ‖x‖_p = sup_{x∈Cn, ‖x‖_p=1} ‖A x‖_p.
The next theorem gives more explicit formulas for some induced matrix p-norms.
Theorem 2.37 (important induced matrix p-norms)
Let A = (ai,j) be an m × n matrix in Cm×n (or Rm×n), and let σ1, σ2, . . . , σn be the non-
negative eigenvalues of the Hermitian matrix A∗ A (or of the symmetric matrix AT A,
respectively). Then the induced matrix p-norms, for p ∈ {1, 2,∞}, are given by
‖A‖_1 = max_{1≤j≤n} (∑_{i=1}^{m} |ai,j|) = max_{1≤j≤n} ‖aj‖_1,    (2.31)
‖A‖_2 = √(max_{1≤j≤n} |σj|) = √(max_{1≤j≤n} σj),    (2.32)
‖A‖_∞ = max_{1≤i≤m} (∑_{j=1}^{n} |ai,j|),    (2.33)
where the vector aj denotes the jth column vector of A.
In Theorem 2.37, note that since A∗A (or AT A) is Hermitian (or symmetric, respectively), due
to Theorem 2.10 (and Theorem 2.11, respectively), the matrix A∗A (and AT A) has only real
eigenvalues. Let σ be an eigenvalue of A^*A (or A^T A), and let x ≠ 0 be a corresponding eigenvector. Then for A ∈ Cm×n
A^* A x = σ x  ⇒  x^* A^* A x = σ x^* x  ⇒  ‖A x‖_2^2 = σ ‖x‖_2^2  ⇒  σ = ‖A x‖_2^2 / ‖x‖_2^2 ≥ 0,
and for A ∈ Rm×n
A^T A x = σ x  ⇒  x^T A^T A x = σ x^T x  ⇒  ‖A x‖_2^2 = σ ‖x‖_2^2  ⇒  σ = ‖A x‖_2^2 / ‖x‖_2^2 ≥ 0.
Thus we see that the eigenvalues of A∗A (and AT A, respectively) are indeed non-negative.
Proof of Theorem 2.37. To derive the induced matrix 1-norm of a matrix A ∈ Cm×n, consider x ∈ Cn with ‖x‖_1 = ∑_{j=1}^{n} |xj| = 1. Then for such x, with aj = (a1,j, a2,j, . . . , am,j)^T,
‖A x‖_1 = ‖∑_{j=1}^{n} xj aj‖_1 ≤ ∑_{j=1}^{n} ‖xj aj‖_1 ≤ ∑_{j=1}^{n} |xj| ‖aj‖_1 ≤ (max_{1≤j≤n} ‖aj‖_1) ∑_{k=1}^{n} |xk| = (max_{1≤j≤n} ‖aj‖_1) ‖x‖_1 = max_{1≤j≤n} ‖aj‖_1,    (2.34)
where we have used ‖x‖_1 = 1 in the last step. Furthermore, we may choose x = ek, where the index k maximizes ‖aj‖_1, that is, ‖ak‖_1 = max_{1≤j≤n} ‖aj‖_1. Then
‖A ek‖_1 = ‖ak‖_1 = max_{1≤j≤n} ‖aj‖_1.    (2.35)
From (2.35) we see that the upper bound in (2.34) is attained for x = ek and hence from (2.34)
and (2.35)
max_{1≤j≤n} ‖aj‖_1 = ‖ak‖_1 = ‖A ek‖_1 ≤ sup_{x∈Cn, ‖x‖_1=1} ‖A x‖_1 ≤ max_{1≤j≤n} ‖aj‖_1,
which (using the sandwich theorem) verifies (2.31).
In the case of the induced matrix 2-norm of a matrix A ∈ Cm×n, we have
‖A‖_2 = sup_{x∈Cn, ‖x‖_2=1} ‖A x‖_2 = sup_{x∈Cn, ‖x‖_2=1} √((A x)^* (A x)) = sup_{x∈Cn, ‖x‖_2=1} √(x^* A^* A x).    (2.36)
Since A^*A is Hermitian, there are n orthonormal eigenvectors z1, z2, . . . , zn corresponding to
the real non-negative eigenvalues σ1, σ2, . . . , σn. Let x = α1 z1 + α2 z2 + · · · + αn zn so that
x∗ x = |α1|2 + |α2|2 + · · · + |αn|2. Then
x^* A^* A x = (∑_{j=1}^{n} αj zj)^* A^* A (∑_{k=1}^{n} αk zk)
            = (∑_{j=1}^{n} αj zj)^* (∑_{k=1}^{n} αk A^* A zk)
            = (∑_{j=1}^{n} αj zj)^* (∑_{k=1}^{n} αk σk zk)
            = ∑_{j=1}^{n} ∑_{k=1}^{n} ᾱj αk σk zj^* zk
            = σ1 |α1|^2 + σ2 |α2|^2 + . . . + σn |αn|^2
            ≤ (max_{1≤j≤n} σj) (|α1|^2 + |α2|^2 + . . . + |αn|^2)
            = (max_{1≤j≤n} σj) x^* x = (max_{1≤j≤n} σj) ‖x‖_2^2,    (2.37)
where we have used A∗A zk = σk zk, k = 1, 2, . . . , n, and the orthonormality z∗j zk = δj,k of the
eigenvectors z1, z2, . . . , zn. Hence, from (2.36) and (2.37)
‖A‖_2 = sup_{x∈Cn, ‖x‖_2=1} √(x^* A^* A x) ≤ sup_{x∈Cn, ‖x‖_2=1} √(max_{1≤j≤n} σj) ‖x‖_2 = √(max_{1≤j≤n} σj).    (2.38)
Finally, let k be such that σk = max_{1≤j≤n} σj. Then choosing x = zk gives equality, since
‖A zk‖_2^2 = zk^* A^* A zk = zk^* (σk zk) = σk ‖zk‖_2^2 = σk = max_{1≤j≤n} σj.    (2.39)
Thus, from (2.39), (2.38), and ‖zk‖_2 = 1,
√(max_{1≤j≤n} σj) = ‖A zk‖_2 ≤ sup_{x∈Cn, ‖x‖_2=1} ‖A x‖_2 = ‖A‖_2 ≤ √(max_{1≤j≤n} σj),    (2.40)
and we see from the sandwich theorem that ‖A‖_2 = √(max_{1≤j≤n} σj).
Finally, for the induced matrix ∞-norm, we get, using |xj | ≤ ‖x‖∞ for all j = 1, 2, . . . , n,
‖A‖_∞ = sup_{x∈Cn, ‖x‖_∞=1} ‖A x‖_∞ = sup_{x∈Cn, ‖x‖_∞=1} max_{1≤i≤m} | ∑_{j=1}^{n} ai,j xj | ≤ sup_{x∈Cn, ‖x‖_∞=1} max_{1≤i≤m} ∑_{j=1}^{n} |ai,j| |xj|
       ≤ sup_{x∈Cn, ‖x‖_∞=1} (max_{1≤i≤m} ∑_{j=1}^{n} |ai,j|) ‖x‖_∞ = max_{1≤i≤m} ∑_{j=1}^{n} |ai,j| = ∑_{j=1}^{n} |ak,j|,    (2.41)
for some index k. To show that this upper bound is attained we may choose the vector x = (x1, x2, . . . , xn)^T with components
xj = āk,j / |ak,j|,   j = 1, 2, . . . , n,
which satisfies ‖x‖_∞ = 1 since |xj| = |ak,j|/|ak,j| = 1 for all j = 1, 2, . . . , n. Then
‖A‖_∞ ≥ ‖A x‖_∞ = max_{1≤i≤m} | ∑_{j=1}^{n} ai,j āk,j / |ak,j| | ≥ | ∑_{j=1}^{n} ak,j āk,j / |ak,j| | = ∑_{j=1}^{n} |ak,j|^2 / |ak,j| = ∑_{j=1}^{n} |ak,j|.    (2.42)
Since the lower bound in (2.42) and the upper bound in (2.41) coincide, we see from the sand-
wich theorem that (2.33) holds true. 2
Example 2.38 (matrix norms)
Consider the real 3 × 3 matrix A, defined by
A = [ 1 0 −4 ; 2 0 2 ; 0 3 0 ].
Then
‖A‖_1 = max_{1≤j≤3} (∑_{i=1}^{3} |ai,j|) = max{3, 3, 6} = 6,
and
‖A‖_∞ = max_{1≤i≤3} (∑_{j=1}^{3} |ai,j|) = max{5, 4, 3} = 5.
We compute A^T A and obtain
A^T A = [ 1 2 0 ; 0 0 3 ; −4 2 0 ] [ 1 0 −4 ; 2 0 2 ; 0 3 0 ] = [ 5 0 0 ; 0 9 0 ; 0 0 20 ].
Since the matrix A^T A is diagonal, its eigenvalues are the diagonal entries, that is, A^T A has the eigenvalues σ1 = 20, σ2 = 9, and σ3 = 5. Hence, we have
‖A‖_2 = √(max{20, 9, 5}) = √20. 2
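The values in this example can be cross-checked with MATLAB's built-in matrix norms (a sketch, not part of the notes); norm(A,2) returns the largest singular value, which equals √(max σj):

A = [1 0 -4; 2 0 2; 0 3 0];
norm(A, 1)      % maximal column sum = 6
norm(A, inf)    % maximal row sum = 5
norm(A, 2)      % sqrt(20), largest singular value of A
norm(A, 'fro')  % Frobenius norm (2.30), sqrt(34)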
Let ‖·‖(ℓ), ‖·‖(m) and ‖·‖(n) be norms on Cℓ, Cm and Cn, respectively. Let A be an ℓ×m matrix
where the second term is a geometric progression. Hence, using δ ≤ 1/2 and δ ≤ ε/(2r) by definition of δ,
‖S^{−1} A S‖_∞ ≤ ρ(A) + δ r (1 − δ^{n−1})/(1 − δ) ≤ ρ(A) + δ r/(1 − δ) ≤ ρ(A) + (ε/(2r)) · r · 2 ≤ ρ(A) + ε.    (2.48)
Since ‖S^{−1} A S‖_∞ is the matrix norm induced by the vector norm ‖x‖_{S^{−1},∞} := ‖S^{−1} x‖_∞, x ∈ Cn (see Exercise 46 below), from (i) and (2.48),
ρ(A) ≤ ‖S^{−1} A S‖_∞ ≤ ρ(A) + ε,
and the proof is complete. 2
Exercise 46 Let S ∈ Cn×n be an invertible matrix. Show that ‖x‖ := ‖S−1x‖∞ defines a norm
for Cn. Show that this norm induces the matrix norm
‖A‖ := ‖S−1 A S‖∞, A ∈ Cn×n.
The next result (see Theorem 2.45 below) considers a particular type of series of matrices, the
so-called Neumann series, and conditions for its convergence. The Neumann series can be seen
as a generalization of the geometric series. Theorem 2.45 below also gives a condition on A
under which the matrix I − A is non-singular (or invertible).
Definition 2.44 (convergence of sequences of matrices w.r.t. matrix norm)
Let {Xr} be a sequence of m × n matrices in Cm×n or Rm×n. We say that the sequence {Xr} converges with respect to a given matrix norm ‖ · ‖, with limit X ∈ Cm×n or X ∈ Rm×n, respectively, if for every ε > 0 there is an N = N(ε) such that
‖X − Xr‖ < ε   for all r ≥ N.
Equivalently, {Xr} in Cm×n (or in Rm×n) converges with respect to ‖ · ‖ to X ∈ Cm×n (or
X ∈ Rm×n) if
lim_{r→∞} ‖Xr − X‖ = 0.
In ‘Further Analysis’, we have learned that convergence can be defined for any normed linear
space; the definition above is just one special case, where the linear space is the set of real or
complex m×n matrices and where the norm is the given matrix norm. From the general notion
of convergence in a normed linear space it is known that the limit of a convergent sequence is
unique. However, we can also easily verify this for the concrete case in Definition 2.44 above.
Assume the sequence {Xr} has two limits X and Y. Then, given ε > 0, there exist N = N(ε) and M = M(ε) such that
‖X − Xr‖ < ε/2 for all r ≥ N,   and   ‖Y − Xr‖ < ε/2 for all r ≥ M.
Choose R := max{N, M}; then from the triangle inequality for r ≥ R,
Finally, by the assumption that (6.21) holds for i = j, we have that
A^j p1 = A (A^{j−1} p1) = A (∑_{k=1}^{j} µk pk) = A (∑_{k=1}^{j−1} µk pk) + µj A pj,    (6.26)
with some coefficients µ1, µ2, . . . , µj. From the assumption that (6.21) holds for i = j, the first
term in the last expression in (6.26) can be written as
A (∑_{k=1}^{j−1} µk pk) = A (∑_{ℓ=0}^{j−2} δℓ A^ℓ p1) = ∑_{ℓ=0}^{j−2} δℓ A^{ℓ+1} p1 = ∑_{ℓ=1}^{j−1} δ_{ℓ−1} A^ℓ p1,
with some coefficients δ0, δ1, . . . , δj−2. Thus the first term in the last representation in (6.26) is
in Kj(p1, A) = span{r1, r2, . . . , rj}. Since αj ≠ 0 for j ≤ m, from (6.19),
A pj = (1/αj) (rj − rj+1),    (6.27)
and we see that the second term in the last representation of (6.26) is in span{r1, r2, . . . , rj+1}. Thus A^j p1 is in span{r1, r2, . . . , rj+1}, and from (6.21) for i = j, we conclude
Kj+1(p1, A) ⊂ span{r1, r2, . . . , rj+1}.    (6.28)
Combining (6.23), (6.25), and (6.28), yields that (6.21) holds also for i = j + 1.
It remains to show that dim(Ki(p1, A)) = i. Since the CG method runs through m steps and i ≤ m, we have ri ≠ 0. Thus by the definition of pi, we have pi ≠ 0, and hence the search directions p1, p2, . . . , pi are non-zero and, from Theorem 6.6, A-conjugate. Thus, from Lemma 6.4, the search directions p1, p2, . . . , pi are linearly independent, and hence
dim(Ki(p1, A)) = dim(span{p1, p2, . . . , pi}) = i,
which concludes the proof. 2
Now we can show that in formula (6.18) on page 133 βj,i = 0 for 1 ≤ i ≤ j − 1.
Remark 6.10 (proof that βj,i = 0 for 1 ≤ i ≤ j − 1 in formula (6.18))
The definition of the new search direction (6.17) was
pj+1 = rj+1 + ∑_{k=1}^{j} βj,k pk,
and we determined, from the demand that the search directions p1,p2, . . . ,pj+1 are A-
conjugate,
βj,i = − (r_{j+1}^T A pi) / (pi^T A pi),   1 ≤ i ≤ j.
We claimed that βj,i = 0 for 1 ≤ i ≤ j − 1, and now we are in the position to prove this
claim: From (6.19), using that αi ≠ 0 for 1 ≤ i ≤ j − 1 (since the CG method has not yet terminated),
A pi = (1/αi) (ri − ri+1),
and thus for 1 ≤ i ≤ j − 1
r_{j+1}^T A pi = (1/αi) r_{j+1}^T (ri − ri+1) = (1/αi) (r_{j+1}^T ri − r_{j+1}^T ri+1) = 0
from Theorem 6.6 (2).
Recall that the true solution x of Ax = b and the approximations xi of the CG method can
be written as (see (6.12))
x = x1 + ∑_{j=1}^{n} αj pj   and   xi = x1 + ∑_{j=1}^{i−1} αj pj,   1 ≤ i ≤ n + 1.    (6.29)
Moreover, by Lemma 6.9, it is possible to represent an arbitrary element x̃ from the affine linear space x1 + Ki−1(p1, A) by
x̃ = x1 + ∑_{j=1}^{i−1} βj pj,    (6.30)
and, in particular, xi ∈ x1 + Ki−1(p1, A). Since the directions p1,p2, . . . ,pn are A-conjugate,
we can conclude from (6.29) and (6.30) that
‖x − xi‖_A^2 = ‖∑_{j=i}^{n} αj pj‖_A^2 = (∑_{j=i}^{n} αj pj)^T A (∑_{k=i}^{n} αk pk)
             = ∑_{j=i}^{n} ∑_{k=i}^{n} αj αk pj^T A pk = ∑_{j=i}^{n} |αj|^2 pj^T A pj
             ≤ ∑_{j=i}^{n} |αj|^2 pj^T A pj + ∑_{j=1}^{i−1} |αj − βj|^2 pj^T A pj
             = ‖∑_{j=i}^{n} αj pj + ∑_{j=1}^{i−1} (αj − βj) pj‖_A^2
             = ‖∑_{j=1}^{n} αj pj − ∑_{j=1}^{i−1} βj pj‖_A^2
             = ‖x − x1 − ∑_{j=1}^{i−1} βj pj‖_A^2 = ‖x − x̃‖_A^2,
where x̃ is the vector given by (6.30). Since this holds for an arbitrary x̃ ∈ x1 + Ki−1(p1, A), we have proved that
‖x − xi‖_A^2 ≤ ‖x − x̃‖_A^2   for all x̃ ∈ x1 + Ki−1(p1, A).
We formulate this as a theorem.
Theorem 6.11 (CG method gives best approximations in affine Krylov spaces)
Let A ∈ Rn×n be symmetric and positive definite and b ∈ Rn. Let the CG method applied to solving A x = b stop after m ≤ n steps. The approximation xi, i ∈ {1, 2, . . . , m}, from the CG method gives the best approximation to the solution x of A x = b in the affine Krylov space x1 + Ki−1(p1, A) with respect to the A-norm ‖ · ‖_A. That is,
‖x − xi‖_A ≤ ‖x − x̃‖_A   for all x̃ ∈ x1 + Ki−1(p1, A).    (6.31)
We note that (6.31) implies that
‖x − xi‖_A = min_{x̃ ∈ x1+Ki−1(p1,A)} ‖x − x̃‖_A = inf_{x̃ ∈ x1+Ki−1(p1,A)} ‖x − x̃‖_A.
The same idea shows that the iteration sequence is ‘monotone’, that is, ‖x − xi‖_A is strictly monotonically decreasing as i increases.
Corollary 6.12 (error of CG iterations is decreasing)
Let A ∈ Rn×n be symmetric and positive definite and b ∈ Rn. Let the CG method for the solution of A x = b stop after m ≤ n steps. The sequence {xi} of approximations xi of the CG method is monotone in the sense that
‖x − xi+1‖_A < ‖x − xi‖_A   for all 1 ≤ i ≤ m.    (6.32)
Before we give the formal proof of Corollary 6.12, we observe that (6.32) with ≤ instead of < is
entirely natural from Theorem 6.11: Since xi is the best approximation of x in x1+Ki−1(p1, A),
and since
x1 + Ki−1(p1, A) ⊂ x1 + Ki(p1, A),
the best approximation in the larger affine space x1 +Ki(p1, A) cannot have a larger error than
the best approximation in x1 + Ki−1(p1, A).
Proof of Corollary 6.12. Using the same notation as in the proof of Theorem 6.11, we have
from (6.29)
x − xi = ∑_{j=i}^{n} αj pj = ∑_{j=i+1}^{n} αj pj + αi pi = (x − xi+1) + αi pi.
Thus, using ‖y‖_A^2 = y^T A y and the fact that pi is A-conjugate to x − xi+1 = ∑_{j=i+1}^{n} αj pj,
‖x − xi‖_A^2 = ‖(x − xi+1) + αi pi‖_A^2
             = [(x − xi+1) + αi pi]^T A [(x − xi+1) + αi pi]
             = (x − xi+1)^T A (x − xi+1) + 2 αi (x − xi+1)^T A pi + |αi|^2 pi^T A pi
             = ‖x − xi+1‖_A^2 + |αi|^2 pi^T A pi
             ≥ ‖x − xi+1‖_A^2,    (6.33)
where we have used in the last step that pi^T A pi ≥ 0 since A is positive definite.
Since the CG method stops after m steps and since i ≤ m, we have ri ≠ 0, and, from Theorem 6.6 (3), pi ≠ 0 and αi ≠ 0, and thus |αi|^2 pi^T A pi > 0. Hence from (6.33), ‖x − xi‖_A > ‖x − xi+1‖_A. 2
Next, let us rewrite this approximation problem in the original basis of the Krylov space. In
doing so, we denote the set of all polynomials of degree less than or equal to n by πn.
For a polynomial P(t) = ∑_{j=0}^{i−1} γj t^j in πi−1 and a matrix A ∈ Rn×n we write
P(A) := ∑_{j=0}^{i−1} γj A^j   and   P(A) x = ∑_{j=0}^{i−1} γj A^j x.
If x is an eigenvector of A with eigenvalue λ, that is, A x = λ x, then, from A^j x = λ^j x, clearly
P(A) x = ∑_{j=0}^{i−1} γj λ^j x = (∑_{j=0}^{i−1} γj λ^j) x = P(λ) x.    (6.34)
Theorem 6.13 (estimate of best approximation in affine Krylov space)
Let A ∈ Rn×n be symmetric and positive definite, having the eigenvalues λ1 ≥ λ2 ≥ . . . ≥
λn > 0. Let b ∈ Rn. Let x denote the solution of Ax = b, and let xi, i = 1, 2, . . . , m + 1,
denote the approximations from the CG method, where we assume that the CG method stops
after m ≤ n steps. Then
‖x − xi‖_A ≤ ( inf_{P ∈ πi−1, P(0)=1} [ max_{1≤j≤n} |P(λj)| ] ) ‖x − x1‖_A.
Proof of Theorem 6.13. Let us express an arbitrary x̃ ∈ x1 + Ki−1(p1, A), where i ∈ {1, 2, . . . , m + 1}, as
x̃ = x1 + ∑_{j=1}^{i−1} γj A^{j−1} p1 =: x1 + Q(A) p1,    (6.35)
introducing the polynomial
Q(t) = ∑_{j=1}^{i−1} γj t^{j−1}
in πi−2, where we set Q = 0 for i = 1. Using p1 = r1 = b − A x1 = A x − A x1 = A (x − x1) yields, for x̃ given by (6.35),
x − x̃ = x − (x1 + Q(A) p1) = (x − x1) − Q(A) A (x − x1) = (I − Q(A) A) (x − x1).
Therefore, using (I − Q(A) A)^T = I − A Q(A) = I − Q(A) A (since A^T = A and since A commutes with Q(A)),
‖x − x̃‖_A^2 = ‖(I − Q(A) A)(x − x1)‖_A^2
            = (x − x1)^T (I − Q(A) A) A (I − Q(A) A) (x − x1)
            =: (x − x1)^T P(A) A P(A) (x − x1),    (6.36)
with the polynomial P in πi−1 defined by
P(t) = 1 − t Q(t) = 1 − ∑_{j=1}^{i−1} γj t^j = t^0 − ∑_{j=1}^{i−1} γj t^j.
Clearly P(0) = 1, and thus we have shown, for every x̃ ∈ x1 + Ki−1(p1, A) with i ∈ {1, 2, . . . , m}, that
‖x − x̃‖_A^2 = (x − x1)^T P(A) A P(A) (x − x1),
with some polynomial P ∈ πi−1 satisfying P(0) = 1.
On the other hand, if P̃ ∈ πi−1 with P̃(0) = 1 is given, then we can define the polynomial Q̃(t) = (1 − P̃(t))/t in πi−2, which leads to an element x̃ from x1 + Ki−1(p1, A), defined by
x̃ = x1 + Q̃(A) p1.
For this x̃ the calculation in (6.36) above holds, with
P(t) = 1 − t Q̃(t) = 1 − t (1 − P̃(t))/t = P̃(t).
Thus we see that for every P ∈ πi−1 with P(0) = 1, there exists some x̃ in x1 + Ki−1(p1, A) such that
‖x − x̃‖_A^2 = (x − x1)^T P(A) A P(A) (x − x1).
Thus we have shown that
inf_{x̃ ∈ x1+Ki−1(p1,A)} ‖x − x̃‖_A = inf_{P ∈ πi−1, P(0)=1} √((x − x1)^T P(A) A P(A) (x − x1)).    (6.37)
Theorem 6.11 therefore yields, for i = 1, 2, . . . , m + 1,
‖x − xi‖_A = min_{x̃ ∈ x1+Ki−1(p1,A)} ‖x − x̃‖_A = inf_{x̃ ∈ x1+Ki−1(p1,A)} ‖x − x̃‖_A
           = inf_{P ∈ πi−1, P(0)=1} √((x − x1)^T P(A) A P(A) (x − x1)).    (6.38)
Next, let w1,w2, . . . ,wn be an orthonormal basis of Rn consisting of eigenvectors of the positive
definite symmetric matrix A associated to the eigenvalues λ1, λ2, . . . , λn. Then, we can represent
every vector using this basis. In particular, with such a representation
x − x1 = ∑_{j=1}^{n} ρj wj,
with some coefficients ρ1, ρ2, . . . , ρn ∈ R. Thus we can conclude that
P(A) A P(A) (x − x1) = ∑_{j=1}^{n} ρj P(A) A P(A) wj = ∑_{j=1}^{n} ρj [P(λj)]^2 λj wj,
where we have used (6.34). This leads to (using wj^T wk = 0 if j ≠ k)
(x − x1)^T P(A) A P(A) (x − x1) = ∑_{j=1}^{n} ∑_{k=1}^{n} ρk ρj [P(λj)]^2 λj wk^T wj
    = ∑_{j=1}^{n} |ρj|^2 |P(λj)|^2 λj
    ≤ (max_{1≤ℓ≤n} |P(λℓ)|^2) ∑_{j=1}^{n} |ρj|^2 λj
    = (max_{1≤ℓ≤n} |P(λℓ)|^2) (∑_{j=1}^{n} ρj wj)^T A (∑_{k=1}^{n} ρk wk)
    = (max_{1≤ℓ≤n} |P(λℓ)|^2) ‖x − x1‖_A^2.
Substituting this into (6.38) and using
max_{1≤ℓ≤n} |P(λℓ)|^2 = (max_{1≤ℓ≤n} |P(λℓ)|)^2
gives the desired inequality. 2
Since clearly
max_{1≤ℓ≤n} |P(λℓ)| ≤ max_{λ∈[λn,λ1]} |P(λ)|,
we have, from Theorem 6.13, the following upper bound for the error:
‖x − xi‖_A ≤ ( inf_{P ∈ πi−1, P(0)=1} ‖P‖_{L∞([λn,λ1])} ) ‖x − x1‖_A,    (6.39)
with the supremum norm ‖P‖_{L∞([a,b])} = sup_{x∈[a,b]} |P(x)|.
Note that the smallest eigenvalue λn and the largest eigenvalue λ1 can be replaced by estimates λ̃n ≤ λn and λ̃1 ≥ λ1.
As stated in the theorem below (which we will not prove) the minimum of the term in the
round parentheses in (6.39) can be computed explicitly. Its solution involves the Chebychev
polynomials defined by
Tn(t) := cos(n arccos(t)),   t ∈ [−1, 1].
We observe that clearly
|Tn(t)| ≤ 1 for all t ∈ [−1, 1], (6.40)
since | cos(ϕ)| ≤ 1 for all ϕ ∈ R.
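A small numerical check of (6.40) in MATLAB (a sketch, not from the notes):

n = 5;
t = linspace(-1, 1, 201);
Tn = cos(n * acos(t));    % T_n(t) = cos(n*arccos(t)) on [-1,1]
max(abs(Tn))              % equals 1, consistent with (6.40)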
Theorem 6.14 (minimization problem)
Let λ1 > λn > 0. In the problem
inf { ‖P‖_{L∞([λn,λ1])} : P ∈ πi−1 with P(0) = 1 }
the infimum is attained by
P*(t) = T_{i−1}( (2t − λ1 − λn)/(λ1 − λn) ) / T_{i−1}( (λ1 + λn)/(λ1 − λn) ),   t ∈ [λn, λ1].
The Chebychev polynomials satisfy the following inequality:
(1/2) ( (1 + √t)/(1 − √t) )^n ≤ | Tn( (1 + t)/(1 − t) ) |,   t ∈ [0, 1).    (6.41)
To apply this estimate, and derive the final estimate on the convergence of the CG method, we
set γ = λn/λ1 ∈ (0, 1) and use
(λ1 + λn)/(λ1 − λn) = (1 + λn/λ1)/(1 − λn/λ1) = (1 + γ)/(1 − γ).    (6.42)
From (6.39), Theorem 6.14, and (6.41), we have
‖x − xi‖_A ≤ ( inf_{P ∈ πi−1, P(0)=1} ‖P‖_{L∞([λn,λ1])} ) ‖x − x1‖_A
           = ( sup_{t∈[λn,λ1]} |T_{i−1}( (2t − λ1 − λn)/(λ1 − λn) )| / |T_{i−1}( (λ1 + λn)/(λ1 − λn) )| ) ‖x − x1‖_A
           ≤ ( 1 / |T_{i−1}( (1 + γ)/(1 − γ) )| ) ‖x − x1‖_A
           ≤ 2 ( (1 − √γ)/(1 + √γ) )^{i−1} ‖x − x1‖_A,    (6.43)
where we have used (6.40) and (6.42) in the third step and the lower bound from (6.41) in the
fourth step. Finally (6.43) and the fact that (see Remark 3.4)
γ = λn/λ1   ⇒   1/γ = λ1 · (1/λn) = ‖A‖_2 ‖A^{−1}‖_2 = κ2(A)
show the following result.
Theorem 6.15 (CG method error estimate)
Let A ∈ Rn×n be symmetric and positive definite, and let λ1 ≥ λ2 ≥ . . . ≥ λn > 0 be the eigenvalues of A. Let b ∈ Rn, and let x be the solution of A x = b. The sequence of iterations {xi} generated by the CG method satisfies the error estimate
‖x − xi‖_A ≤ 2 ‖x − x1‖_A ( (1 − √γ)/(1 + √γ) )^{i−1} = 2 ‖x − x1‖_A ( (√κ2(A) − 1)/(√κ2(A) + 1) )^{i−1},
where γ = 1/κ2(A) = λn/λ1.
The method also converges if the matrix is almost singular (that is, if λd+1 ≈ . . . ≈ λn ≈ 0), because (√κ2(A) − 1)/(√κ2(A) + 1) < 1. If λd+1 ≈ . . . ≈ λn ≈ 0, then κ2(A) is very large, and one should try to use appropriate matrix manipulations to reduce the condition number
and one should try to use appropriate matrix manipulations to reduce the condition number
of A, such as a transformation of the smallest eigenvalues to zero and of the bigger eigenvalues
to an interval of the form (λ, λ(1 + ε)) ⊂ (0,∞) with ε as small as possible. In general it does
not matter if, during this process, the rank of the matrix is reduced. Generally, this leads only
to a small additional error but also to a significant increase in the convergence rate of the CG
method.
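For reference alongside this error estimate, the CG iteration itself can be sketched in a few lines of MATLAB. The sketch below uses the standard textbook form of the updates; the variable names and indexing are ours and need not match (6.17)–(6.19) exactly.

function x = cg_sketch(A, b, x, maxit, tol)
% Sketch of the CG method for symmetric positive definite A.
r = b - A*x;  p = r;
for i = 1:maxit
    Ap    = A*p;
    alpha = (r'*r) / (p'*Ap);   % step length along search direction p
    x     = x + alpha*p;
    rnew  = r - alpha*Ap;       % updated residual
    if norm(rnew, 2) < tol, break, end
    beta  = (rnew'*rnew) / (r'*r);
    p     = rnew + beta*p;      % new A-conjugate search direction
    r     = rnew;
end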
Exercise 88 Let A ∈ Rn×n be a real positive definite symmetric n×n matrix. Let λmin denote
the smallest eigenvalue of A, and let λmax denote the largest eigenvalue of A. Prove that
λmin = inf_{x∈Rn\{0}} (x^T A x)/(x^T x)   and   λmax = sup_{x∈Rn\{0}} (x^T A x)/(x^T x).
Chapter 7
Calculation of Eigenvalues
In this last chapter of the lecture notes, we are concerned with the numerical computation of
the eigenvalues and eigenvectors of a square matrix. Let A ∈ Cn×n be a square matrix. A
non-zero vector x ∈ Cn is an eigenvector of A and λ ∈ C is its corresponding eigenvalue if
Ax = λx.
The eigenvalues are the roots of the characteristic polynomial
p(A, λ) = det(λ I − A
).
However, for larger matrices it is not practicable to actually compute the characteristic polyno-
mial, let alone its roots. Hence, we will look at more feasible methods to numerically determine
eigenvalues and eigenvectors.
7.1 Basic Localisation Techniques
A very rough way of getting an estimate of the location of the eigenvalues is given by the
following theorem.
Theorem 7.1 (Gershgorin disks)
The eigenvalues of a matrix A in Cn×n (or in Rn×n) are contained in the union ⋃_{j=1}^{n} Kj of the disks
Kj := { λ ∈ C : |λ − aj,j| ≤ ∑_{k=1, k≠j}^{n} |aj,k| },   1 ≤ j ≤ n.
Proof of Theorem 7.1. For an eigenvalue λ ∈ C of A we can choose an eigenvector
x ∈ Cn \ {0} with ‖x‖∞ = max1≤i≤n |xi| = 1. From Ax − λx = 0 we can conclude that
(∑_{k=1}^{n} ai,k xk) − λ xi = (ai,i − λ) xi + ∑_{k=1, k≠i}^{n} ai,k xk = 0,   1 ≤ i ≤ n,
and thus
(ai,i − λ) xi = − ∑_{k=1, k≠i}^{n} ai,k xk,   1 ≤ i ≤ n.
If we pick an index i = j with |xj| = ‖x‖_∞ = 1, then the statement follows via
|aj,j − λ| = |(aj,j − λ) xj| = | ∑_{k=1, k≠j}^{n} aj,k xk | ≤ ∑_{k=1, k≠j}^{n} |aj,k| |xk| ≤ (max_{1≤i≤n} |xi|) ∑_{k=1, k≠j}^{n} |aj,k| ≤ ∑_{k=1, k≠j}^{n} |aj,k|,
where we have used ‖x‖∞ = 1 by assumption. Therefore, we know that the eigenvalue λ is
contained in the disk Kj where j is an index for which |xj | = ‖x‖∞ = 1. Since the eigen-
value λ was arbitrary, we know that all eigenvalues are contained in the union of the disks Kj ,
j = 1, 2, . . . , n. 2
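The disk centres aj,j and radii ∑_{k≠j} |aj,k| are easy to compute; a short MATLAB sketch (with a hypothetical example matrix, not from the notes):

A = [4 1 0; -1 2 1; 0 2 7];
centres = diag(A);                          % a_{j,j}
radii   = sum(abs(A), 2) - abs(centres);    % sum_{k ~= j} |a_{j,k}| (row sums)
disp([centres radii])   % every eigenvalue lies in some disk |z - centre| <= radius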
Exercise 89 Consider the matrix
A = [ 3/2  0  1/2 ; 0  3  0 ; 1/2  0  3/2 ].
(a) Compute the exact eigenvalues of A by hand.
(b) Use Theorem 7.1 on the Gershgorin disks to estimate the location of the eigenvalues.
The Rayleigh quotient introduced below allows us to approximate an eigenvalue, if we are able
to approximate a corresponding eigenvector.
Definition 7.2 (Rayleigh quotient)
The Rayleigh quotient of a vector x ∈ Rn \ {0} with respect to a real matrix A ∈ Rn×n is the scalar
R(x) := (x^T A x)/(x^T x).
The Rayleigh quotient of a vector x ∈ Cn \ {0} with respect to a complex matrix A ∈ Cn×n is the scalar
R(x) := (x^* A x)/(x^* x).
Obviously, if x is an eigenvector then R(x) is the corresponding eigenvalue: Indeed, if Ax = λx,
then
R(x) = (x^T A x)/(x^T x) = (λ x^T x)/(x^T x) = λ   and   R(x) = (x^* A x)/(x^* x) = (λ x^* x)/(x^* x) = λ,    (7.1)
respectively.
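A quick MATLAB illustration of (7.1) (a sketch with hypothetical data, not from the notes):

A = [2 1; 1 2];
R = @(x) (x'*A*x) / (x'*x);   % Rayleigh quotient for real A
R([1; 1])      % [1;1] is an eigenvector of A, so this returns the eigenvalue 3
R([1; 0.9])    % a nearby vector gives a value close to 3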
We will deal now with symmetric matrices A = AT in Rn×n. For such matrices it is known
(see Theorem 2.11) that they have n real eigenvalues λ1 ≥ λ2 ≥ · · · ≥ λn, and that there
are n corresponding orthonormal eigenvectors w1,w2, . . . ,wn ∈ Rn, that is, the eigenvectors
w1,w2, . . . ,wn satisfy
wj^T wk = δj,k,   j, k = 1, 2, . . . , n.
In particular, w1,w2, . . . ,wn form an orthonormal basis of Rn. Then, every x ∈ Rn can be
represented as
x = ∑_{k=1}^{n} ck wk,    (7.2)
and the coefficients can be determined via
wj^T x = ∑_{k=1}^{n} ck wj^T wk = ∑_{k=1}^{n} ck δj,k = cj   ⇒   cj = wj^T x.
Moreover, the Euclidean norm of x, given by (7.2), can be computed via
‖x‖_2^2 = (∑_{j=1}^{n} cj wj)^T (∑_{k=1}^{n} ck wk) = ∑_{j=1}^{n} ∑_{k=1}^{n} cj ck wj^T wk = ∑_{j=1}^{n} ∑_{k=1}^{n} cj ck δj,k = ∑_{k=1}^{n} ck^2.
Theorem 7.3 (convergence of the Rayleigh quotient)
Suppose A ∈ Rn×n is symmetric. Assume that we have a sequence {x(j)} of vectors in Rn
which converges to an eigenvector wJ of A with eigenvalue λJ , and assume that {x(j)} is
normalized, that is, ‖x(j)‖2 = 1 for all j. Then we have
lim_{j→∞} R(x^(j)) = R(wJ) = λJ,    (7.3)
and
|R(x^(j)) − R(wJ)| = O(‖x^(j) − wJ‖_2^2).    (7.4)
Proof of Theorem 7.3. Since the Rayleigh quotient R(x) depends continuously on x, the
first equality in (7.3) is clear and the second equality follows from (7.1).
It remains to show (7.4). For this purpose, we use that, since A is symmetric, there are n
eigenvalues λ1 ≥ λ2 ≥ . . . ≥ λn and n corresponding orthonormal eigenvectors w1,w2, . . . ,wn.
With this orthonormal basis of eigenvectors, x(j) has a representation
x^(j) = ∑_{k=1}^{n} ck wk,    (7.5)
where we suppress in the notation the fact that the coefficients ck = ck^(j) also depend on j. Then,
since Awj = λj wj , we find
A x^(j) = A (∑_{k=1}^{n} ck wk) = ∑_{k=1}^{n} ck A wk = ∑_{k=1}^{n} ck λk wk.    (7.6)
Since wj^T wk = 0 if j ≠ k, and wj^T wj = ‖wj‖_2^2 = 1 for all j = 1, 2, . . . , n, and ‖x^(j)‖_2^2 = (x^(j))^T x^(j) = 1 for all j, from (7.6) and (7.5),
R(x^(j)) = (x^(j))^T A x^(j) / ((x^(j))^T x^(j)) = (x^(j))^T A x^(j)
         = (∑_{i=1}^{n} ci wi)^T (∑_{k=1}^{n} ck λk wk)
         = ∑_{k=1}^{n} ∑_{i=1}^{n} ci ck λk wi^T wk = ∑_{k=1}^{n} λk ck^2,
where we have used wi^T wk = δi,k.
Using R(wJ) = λJ from (7.1), this gives
R(x^(j)) − R(wJ) = ∑_{k=1}^{n} λk ck^2 − λJ = λJ (cJ^2 − 1) + ∑_{k=1, k≠J}^{n} λk ck^2.
Thus
|R(x^(j)) − R(wJ)| ≤ |λJ| |cJ^2 − 1| + ∑_{k=1, k≠J}^{n} |λk| ck^2
                   ≤ (max_{1≤i≤n} |λi|) ( |cJ^2 − 1| + ∑_{k=1, k≠J}^{n} ck^2 )
                   = (max_{1≤i≤n} |λi|) ( |(2 cJ − 2) + (cJ^2 − 2 cJ + 1)| + ∑_{k=1, k≠J}^{n} ck^2 )
                   = (max_{1≤i≤n} |λi|) ( |2 (cJ − 1) + (cJ − 1)^2| + ∑_{k=1, k≠J}^{n} ck^2 )
                   ≤ (max_{1≤i≤n} |λi|) ( 2 |cJ − 1| + (cJ − 1)^2 + ∑_{k=1, k≠J}^{n} ck^2 ).    (7.7)
From ‖x^{(j)}‖_2 = ‖w_J‖_2 = 1 and (7.5) and w_i^T w_k = 0 if i ≠ k,

‖x^{(j)} − w_J‖_2^2 = (x^{(j)} − w_J)^T (x^{(j)} − w_J) = ‖x^{(j)}‖_2^2 + ‖w_J‖_2^2 − 2 w_J^T x^{(j)}
= 2 − 2 w_J^T ( Σ_{k=1}^{n} c_k w_k ) = 2 − 2 Σ_{k=1}^{n} c_k w_J^T w_k = 2 − 2 Σ_{k=1}^{n} c_k δ_{k,J} = 2 − 2 c_J = 2 (1 − c_J). (7.8)
On the other hand, from (7.5),
‖x^{(j)} − w_J‖_2^2 = ‖ Σ_{k=1}^{n} c_k w_k − w_J ‖_2^2 = ‖ (c_J − 1) w_J + Σ_{k=1,k≠J}^{n} c_k w_k ‖_2^2

= ( (c_J − 1) w_J + Σ_{i=1,i≠J}^{n} c_i w_i )^T ( (c_J − 1) w_J + Σ_{k=1,k≠J}^{n} c_k w_k )

= (c_J − 1)^2 w_J^T w_J + 2 (c_J − 1) Σ_{k=1,k≠J}^{n} c_k w_J^T w_k + Σ_{i=1,i≠J}^{n} Σ_{k=1,k≠J}^{n} c_i c_k w_i^T w_k

= (c_J − 1)^2 + Σ_{k=1,k≠J}^{n} c_k^2, (7.9)

since w_J^T w_J = 1, w_J^T w_k = 0 for k ≠ J, and w_i^T w_k = δ_{i,k}.
Substituting (7.8) and (7.9) into the last line of (7.7) yields
|R(x^{(j)}) − R(w_J)| ≤ ( max_{1≤i≤n} |λ_i| ) ( ‖x^{(j)} − w_J‖_2^2 + ‖x^{(j)} − w_J‖_2^2 ) = 2 ( max_{1≤i≤n} |λ_i| ) ‖x^{(j)} − w_J‖_2^2,

which proves (7.4). □
In analogy to Theorem 7.3, we can prove the following corresponding theorem for Hermitian
matrices in Cn×n.
Theorem 7.4 (convergence of Rayleigh quotient)
Suppose A ∈ Cn×n is Hermitian. Assume that we have a sequence {x(j)} of vectors in Cn
which converges to an eigenvector wJ of A with eigenvalue λJ , and assume that {x(j)} is
normalized, that is, ‖x(j)‖2 = 1 for all j. Then we have
lim_{j→∞} R(x^{(j)}) = R(w_J) = λ_J,

and

|R(x^{(j)}) − R(w_J)| = O( ‖x^{(j)} − w_J‖_2^2 ).
Theorems 7.3 and 7.4 tell us that the Rayleigh quotient R(x^{(j)}) converges with quadratic order to λ_J as {x^{(j)}} converges to the eigenvector w_J.
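The quadratic order of convergence can be observed numerically. In the following sketch (our own experiment with an arbitrarily chosen symmetric matrix) we perturb a normalized eigenvector w_J by a random vector of size delta, renormalize, and watch the error of the Rayleigh quotient behave like delta^2:

A = [4 1 0; 1 3 1; 0 1 2];          % symmetric test matrix (our choice)
[V,D] = eig(A);
wJ = V(:,1);  lambdaJ = D(1,1);     % eigenpair used in the test
for delta = [1e-1 1e-2 1e-3]
    x = wJ + delta * randn(3,1);    % perturbed eigenvector, ||x - wJ|| of size delta
    x = x / norm(x);                % renormalize so that ||x||_2 = 1
    err = abs( (x'*A*x)/(x'*x) - lambdaJ );
    fprintf('delta = %8.1e   |R(x) - lambda_J| = %8.1e\n', delta, err);
end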
7.2 The Power Method
In this section we no longer assume that A ∈ Rn×n is symmetric. Suppose A has a dominant
eigenvalue λ1, that is, we have
|λ1| > |λ2| ≥ |λ3| ≥ · · · ≥ |λn|. (7.10)
For such matrices, it is now our goal to determine the dominant eigenvalue λ1 and a corre-
sponding eigenvector. Also assume that there exist n real eigenvalues λ1, λ2, . . . , λn ∈ R and
n linearly independent eigenvectors w_1, w_2, . . . , w_n ∈ R^n corresponding to the real eigenvalues
λ1, λ2, . . . , λn, and satisfying ‖w1‖2 = ‖w2‖2 = . . . = ‖wn‖2 = 1. (Note that this is not always
the case, since we have neither assumed that A is symmetric nor that the n eigenvalues are dis-
tinct. Since A is not assumed to be symmetric, the eigenvectors w1,w2, . . . ,wn can in general
not be chosen orthogonal to each other.) Then the eigenvectors w1,w2, . . . ,wn form a basis of
Rn, and we can represent every x as
x = Σ_{j=1}^{n} c_j w_j
with some coefficients c1, c2, . . . , cn ∈ R. Using Awj = λj wj, j = 1, 2, . . . , n, shows that
A^m x = Σ_{j=1}^{n} c_j A^m w_j = Σ_{j=1}^{n} c_j λ_j^m w_j = λ_1^m ( c_1 w_1 + Σ_{j=2}^{n} c_j (λ_j/λ_1)^m w_j ) =: λ_1^m ( c_1 w_1 + R_m ), (7.11)
with the remainder term
R_m := Σ_{j=2}^{n} c_j (λ_j/λ_1)^m w_j. (7.12)
The vector sequence {Rm} tends to the zero vector 0 for m → ∞, since the triangle inequality,
‖wj‖2 = 1 for all j = 1, 2 . . . , n, and |λj/λ1| < 1 for j = 2, . . . , n (from (7.10)) imply
‖R_m‖_2 = ‖ Σ_{j=2}^{n} c_j (λ_j/λ_1)^m w_j ‖_2 ≤ Σ_{j=2}^{n} |c_j| |λ_j/λ_1|^m ‖w_j‖_2 = Σ_{j=2}^{n} |c_j| |λ_j/λ_1|^m → 0   as m → ∞.
If c_1 ≠ 0, this means that from (7.11)

(1/λ_1^m) A^m x = c_1 w_1 + R_m → c_1 w_1   as m → ∞,

and c_1 w_1 is an eigenvector of A for the eigenvalue λ_1. Of course, this is so far only of limited value, since we do not know the eigenvalue λ_1 and hence cannot form the quotient (A^m x)/λ_1^m. Another problem comes from the fact that the norm of A^m x converges to zero if |λ_1| < 1 and to infinity if |λ_1| > 1. Both problems can be resolved if we normalize appropriately.
For example, if we take the Euclidean norm of Am x, then from the last step in (7.11) and
‖wj‖2 = 1 for j = 1, 2, . . . , n,
‖A^m x‖_2^2 = ‖ λ_1^m (c_1 w_1 + R_m) ‖_2^2 = λ_1^{2m} (c_1 w_1 + R_m)^T (c_1 w_1 + R_m)
= λ_1^{2m} ( |c_1|^2 w_1^T w_1 + 2 c_1 w_1^T R_m + R_m^T R_m ) = λ_1^{2m} ( |c_1|^2 + r_m ), (7.13)

where w_1^T w_1 = 1 and R_m^T R_m = ‖R_m‖_2^2 have been used, and where r_m ∈ R is defined by

r_m := 2 c_1 w_1^T R_m + ‖R_m‖_2^2. (7.14)

Clearly lim_{m→∞} R_m = 0 implies lim_{m→∞} r_m = 0. Thus, from (7.13), we can write

‖A^m x‖_2 = |λ_1|^m √(|c_1|^2 + r_m),   with lim_{m→∞} r_m = 0. (7.15)
From (7.15), we find
‖A^{m+1} x‖_2 / ‖A^m x‖_2 = |λ_1| ( ‖A^{m+1} x‖_2 / |λ_1|^{m+1} ) / ( ‖A^m x‖_2 / |λ_1|^m ) = |λ_1| √(|c_1|^2 + r_{m+1}) / √(|c_1|^2 + r_m) → |λ_1|   for m → ∞, (7.16)
which gives the real eigenvalue λ1 up to its sign. To determine also its sign and a corresponding
eigenvector, we refine the method as follows.
Definition 7.5 (power method or von Mises iteration)
Let A ∈ R^{n×n}, and assume that A has n real eigenvalues λ_1, λ_2, . . . , λ_n ∈ R (counted with multiplicity) and a corresponding set of real normalized eigenvectors w_1, w_2, . . . , w_n ∈ R^n that form a basis of R^n. (That is, w_1, w_2, . . . , w_n are linearly independent, and ‖w_j‖_2 = 1 and A w_j = λ_j w_j, j = 1, 2, . . . , n.) Suppose that A has a dominant eigenvalue λ_1, that is, |λ_1| > |λ_j| for j = 2, 3, . . . , n. The power method or von Mises iteration is defined in the following way: First, we choose a starting vector

x^{(0)} = Σ_{j=1}^{n} c_j w_j,   with c_1 ≠ 0,

and set y^{(0)} = x^{(0)} / ‖x^{(0)}‖_2. Then, for m = 1, 2, . . . we compute:

1. x^{(m)} = A y^{(m−1)},

2. y^{(m)} = σ_m x^{(m)} / ‖x^{(m)}‖_2, where the sign σ_m ∈ {−1, 1} is chosen such that (y^{(m)})^T y^{(m−1)} ≥ 0.
The choice of the sign σm means that the angle between y(m−1) and y(m) is in [0, π/2], which
means that we are avoiding a ‘jump’ when moving from y(m−1) to y(m).
The condition c_1 ≠ 0 is usually satisfied simply because of numerical rounding errors and is no real restriction. We note that c_1 ≠ 0 is equivalent to w_1^T x^{(0)} ≠ 0.
The power method can be implemented with the following MATLAB code:
function [lambda,v] = power_method(A,z,J)
%
% executes the power method or von Mises iteration
%
% input: A = real n by n matrix with real eigenvalues
% and n linearly independent real eigenvectors;
% it is assumed that A has a dominant eigenvalue
% z = n by 1 starting vector for the power method
% J = number of iterations
%
% output: lambda = approximation of dominant eigenvalue
% v = approximation of corresponding normalized eigenvector
%
y = z / norm(z);
for j = 1:J
x = A * y;
sigma = sign(x' * y);
if sigma == 0, sigma = 1; end  % guard: keep sigma in {-1,1} if x'*y happens to be zero
y = sigma * ( x / norm(x) );
end
lambda = sigma * norm(x);
v = y;
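Assuming that the function above is saved as power_method.m, it might be called as follows; the matrix is the one from Example 7.7 below, whose dominant eigenvalue is λ_1 = 2.

A = [0 -2 2; -2 -3 2; -3 -6 5];    % matrix from Example 7.7
z = [1; 1; 1];                     % starting vector with c_1 ~= 0
[lambda, v] = power_method(A, z, 30);
% lambda should be close to 2, and v close to (1, 0, 1)^T / sqrt(2) up to sign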
The theorem below gives information about the convergence of the power method.
Theorem 7.6 (convergence of power method)
Let A ∈ R^{n×n}, and assume that A has n real eigenvalues λ_1, λ_2, . . . , λ_n ∈ R (counted with
multiplicity) and a corresponding set of real normalized eigenvectors w1,w2, . . . ,wn ∈ Rn
that form a basis of Rn. (That is, w1,w2, . . . ,wn are linearly independent, and ‖wj‖2 = 1
and Awj = λj wj, j = 1, 2, . . . , n.) Suppose that A has a dominant eigenvalue λ1, that is,
|λ1| > |λj| for j = 2, 3, . . . , n. Then the iterations of the power method satisfy:
(i) ‖x^{(m)}‖_2 → |λ_1| for m → ∞,

(ii) y^{(m)} converges to a normalized eigenvector of A with the eigenvalue λ_1,

(iii) σ_m → sign(λ_1) for m → ∞, that is, σ_m = sign(λ_1) for sufficiently large m.
Before we prove the theorem, we give a numerical example.
Example 7.7 (power method)
Consider the real 3 × 3 matrix
A = (  0  −2   2
      −2  −3   2
      −3  −6   5 ).
From Example 2.4 we know that the eigenvalues of this matrix are λ1 = 2, λ2 = 1, and λ3 = −1
and that the eigenvectors are real. A normalized eigenvector to the dominant eigenvalue λ_1 = 2 is given by w_1 = (1/√2) (1, 0, 1)^T, and normalized eigenvectors to the eigenvalues λ_2 = 1 and λ_3 = −1 are given by w_2 = (1/√5) (2, −1, 0)^T and w_3 = (1/√2) (0, 1, 1)^T.
We compute the first two steps of the power method with starting vector x^{(0)} = (1, 1, 1)^T by hand. It can easily be verified that x^{(0)} = −√2 w_1 + √5 w_2 + 2√2 w_3, that is, the condition c_1 = −√2 ≠ 0 is satisfied.
We start with
y^{(0)} = x^{(0)} / ‖x^{(0)}‖_2 = (1/√3) (1, 1, 1)^T.
In the first step of the power method we find
x^{(1)} = A y^{(0)} = (1/√3) A (1, 1, 1)^T = (1/√3) (0, −3, −4)^T.
Since
σ_1 = sign( (y^{(0)})^T x^{(1)} ) = sign( (1/3) (1, 1, 1) (0, −3, −4)^T ) = sign( −7/3 ) = −1

and

‖x^{(1)}‖_2 = (1/√3) √((−3)^2 + (−4)^2) = 5/√3,
we find
y^{(1)} = σ_1 x^{(1)} / ‖x^{(1)}‖_2 = −(√3/5) (1/√3) (0, −3, −4)^T = (1/5) (0, 3, 4)^T.
In the second step of the power method we find
x^{(2)} = A y^{(1)} = (1/5) A (0, 3, 4)^T = (1/5) (2, −1, 2)^T.
Since
σ_2 = sign( (y^{(1)})^T x^{(2)} ) = sign( (1/25) (0, 3, 4) (2, −1, 2)^T ) = sign( 5/25 ) = 1

and

‖x^{(2)}‖_2 = (1/5) √(2^2 + (−1)^2 + 2^2) = √9 / 5 = 3/5,
we find
y^{(2)} = σ_2 x^{(2)} / ‖x^{(2)}‖_2 = (5/3) (1/5) (2, −1, 2)^T = (1/3) (2, −1, 2)^T.
Using the MATLAB code given above, we obtain increasingly accurate approximations to λ_1 = 2 and to the corresponding normalized eigenvector w_1.
Note that, since we do not require the product of the Householder matrices, we do not explicitly
produce the Householder matrices, thereby reducing storage significantly.
In the kth step of this method the main work lies in calculating the two matrix-vector multi-
plications (see MATLAB code above) and each of these matrix-vector multiplications requires
O(nk) floating point operations. This is done n − 2 times. Therefore, the overall number of
floating point operations is O(n^3).
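In practice one rarely codes the reduction to Hessenberg form by hand: MATLAB's built-in function hess computes an upper Hessenberg matrix H that is orthogonally similar to A, which is exactly the form exploited by the QR algorithm of the next section. A minimal sketch (our own illustration, with a random test matrix):

A = randn(6);          % random 6 x 6 test matrix
[P, H] = hess(A);      % P orthogonal, H = P' * A * P upper Hessenberg
norm(P' * A * P - H)   % of the order of rounding errors
norm(tril(H, -2))      % entries below the first subdiagonal are (numerically) zero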
7.6 QR Algorithm
This method aims at computing all eigenvalues of a given matrix A ∈ R^{n×n} simultaneously.
It benefits from the Hessenberg form of a matrix.
Definition 7.21 (QR method for the computation of eigenvalues)
Let A ∈ R^{n×n} and define A_0 := A. For m = 0, 1, 2, . . . decompose A_m in the form A_m = Q_m R_m with an orthogonal matrix Q_m and an upper triangular matrix R_m (that is, we use the QR factorization of A_m). Then, form the swapped product

A_{m+1} := R_m Q_m. (7.42)
Since Q_m is an orthogonal matrix, from A_m = Q_m R_m we have Q_m^T A_m = R_m. Obviously, from (7.42),

A_{m+1} = Q_m^T A_m Q_m = Q_m^{−1} A_m Q_m. (7.43)
The representation (7.43) shows that all matrices Am in the QR method have the same eigen-
values as the initial matrix A0 = A.
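A bare-bones version of this iteration is easy to write down with MATLAB's built-in qr. The following sketch (without shifts, deflation or a convergence test, which a practical implementation would add) performs a fixed number of QR steps and returns the diagonal of the last iterate as approximations to the eigenvalues:

function lambda = qr_method(A, J)
% basic (unshifted) QR method: A_0 = A, A_{m+1} = R_m * Q_m
Am = A;
for m = 1:J
    [Qm, Rm] = qr(Am);   % QR factorization A_m = Q_m R_m
    Am = Rm * Qm;        % swapped product, see (7.42)
end
lambda = diag(Am);       % approximations to the eigenvalues of A
end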
Theorem 7.22 (properties of the QR transformation)
Let Am be the matrices computed in the QR method. The QR transformation Am 7→ Am+1
respects the upper Hessenberg form of a matrix, that is, if Am is an upper Hessenberg matrix,
then A_{m+1} is also an upper Hessenberg matrix. In particular, a symmetric tridiagonal matrix remains a symmetric tridiagonal matrix under the QR transformation. If A ∈ R^{n×n} is in upper Hessenberg form, then its QR factorization can be computed in O(n^2) operations.
Proof of Theorem 7.22. From (7.43) we can immediately conclude that if Am is symmetric
so is A_{m+1}. Indeed, if A_m = A_m^T, then

A_{m+1}^T = ( Q_m^T A_m Q_m )^T = Q_m^T A_m^T (Q_m^T)^T = Q_m^T A_m Q_m = A_{m+1}.
If Am is in upper Hessenberg form then we can compute the QR factorization of Am in n − 1
steps. In each step, a transformation is performed by multiplication from the left with a
Householder matrix, where in the jth step the Householder matrix maps the entry with index
(j + 1, j) to zero. If we denote the Householder matrix in the jth step by Hj+1,j then we have
that
H_{n,n−1} · · · H_{3,2} H_{2,1} A_m = R_m

with an upper triangular matrix R_m. Since the Householder matrices are orthogonal matrices,

A_m = ( H_{2,1}^T H_{3,2}^T · · · H_{n,n−1}^T ) R_m = Q_m R_m   with   Q_m = H_{2,1}^T H_{3,2}^T · · · H_{n,n−1}^T.
Thus the matrix A_{m+1} in the QR method is given by

A_{m+1} = R_m Q_m = R_m ( H_{2,1}^T H_{3,2}^T · · · H_{n,n−1}^T ). (7.44)

From the properties of the Householder matrices H_{j+1,j}, right-multiplication with the transposes of the Householder matrices only modifies entries of R_m with indices (i, j), where i ≤ j + 1. Thus A_{m+1} is also an upper Hessenberg matrix.
If A is a symmetric tridiagonal matrix, then we know from the beginning of this proof that Am
is also symmetric. In particular, a symmetric tridiagonal matrix is in upper Hessenberg form,
and thus Am is in upper Hessenberg form. Since Am is also symmetric, we see that Am is a
symmetric tridiagonal matrix. □
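The preservation of the Hessenberg form can also be observed numerically. In the following sketch (our own check, using MATLAB's built-in hess and qr) one QR step is applied to a random upper Hessenberg matrix and the entries below the subdiagonal of the result are inspected:

A = hess(randn(6));     % random 6 x 6 matrix in upper Hessenberg form
[Q, R] = qr(A);         % QR factorization A_m = Q_m R_m
A1 = R * Q;             % one QR step: A_{m+1} = R_m Q_m
norm(tril(A1, -2))      % of the order of rounding errors, so A1 is again upper Hessenberg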
For the analysis of the convergence of the QR method, we need two auxiliary results. The first
one concerns the uniqueness of the QR factorization of a matrix A.
Lemma 7.23 (uniqueness of QR factorization)
Let A ∈ Rn×n be non-singular. If A = Q R with an orthogonal matrix Q and an upper
triangular matrix R, then Q and R are unique up to the signs of the diagonal entries.
Proof of Lemma 7.23. Assume that we have two QR decompositions A = Q1 R1 and
A = Q2 R2. This gives
Q_1 R_1 = Q_2 R_2   ⇔   R_1 R_2^{−1} = Q_1^T Q_2 =: S,
which shows that S is an orthogonal matrix and an upper triangular matrix. (Note that the
inverse of an upper triangular matrix is also upper triangular and that the product of two upper
triangular matrices is also an upper triangular matrix.) The inverse S^{−1} of the upper triangular matrix S is also upper triangular. Since S^{−1} = S^T as S is an orthogonal matrix, we also have that S^T is an upper triangular matrix. Hence we have that S is also a lower triangular matrix. Thus S has to be a diagonal matrix. An orthogonal diagonal matrix can only have diagonal entries ±1. If we fix the signs of the diagonal entries of R_1 to be the same as those of R_2, it follows that S = (s_{i,j}) can only be the identity (since s_{i,i} = (R_1 R_2^{−1})_{i,i} = (R_1)_{i,i} (R_2^{−1})_{i,i} because R_1 and R_2 are upper triangular, and sign( (R_2^{−1})_{i,i} ) = sign( (R_2)_{i,i} )). Then

R_1 R_2^{−1} = Q_1^T Q_2 = I   ⇒   R_1 = R_2 and Q_1 = Q_2,

and we see that, apart from the signs of the diagonal entries, the matrices Q and R of the QR factorization are uniquely determined. □
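Lemma 7.23 is the reason why one may always normalize a QR factorization of a non-singular matrix so that R has positive diagonal entries. In MATLAB this normalization can be done as follows (a small sketch, assuming no diagonal entry of R is exactly zero):

A = randn(5);                % non-singular test matrix (with probability one)
[Q, R] = qr(A);
S = diag(sign(diag(R)));     % orthogonal diagonal matrix with entries +1 or -1
Q = Q * S;  R = S * R;       % still A = Q*R, now with positive diagonal entries in R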
The second lemma is technical and will be needed in the proof of the convergence of the QR
method.
Lemma 7.24 (technical lemma for QR method)

Let D := diag(d_1, d_2, . . . , d_n) ∈ R^{n×n} be a diagonal matrix with |d_1| > |d_2| > · · · > |d_n| > 0, and let L = (ℓ_{i,j}) ∈ R^{n×n} be a normalized lower triangular matrix. Let L_m denote the lower triangular matrix with entries ℓ_{i,j} (d_i/d_j)^m for i ≥ j. Then, we have

L_m D^m = D^m L   ⇔   L_m = D^m L (D^{−1})^m,   m ∈ N_0. (7.45)

Furthermore, L_m converges linearly to the identity matrix for m → ∞.
Proof of Lemma 7.24. First we observe that multiplication from the left or right of any
matrix B with a diagonal matrix D will leave any zero entries in B invariant. Therefore,
since Dm and (D−1)m are diagonal matrices and L is a normalized lower triangular matrix,
the matrix Dm L (D−1)m is also a lower triangular matrix. Next we compute its entries: Since
D^m = diag(d_1^m, d_2^m, . . . , d_n^m) and (D^{−1})^m = diag(d_1^{−m}, d_2^{−m}, . . . , d_n^{−m}), we obtain from matrix multiplication

( D^m L (D^{−1})^m )_{i,j} = d_i^m ℓ_{i,j} d_j^{−m} = ℓ_{i,j} (d_i/d_j)^m,   1 ≤ j ≤ i ≤ n,

and we see that indeed D^m L (D^{−1})^m = L_m. We have (L_m)_{i,i} = ℓ_{i,i} = 1, since L is a normalized lower triangular matrix, and, since |d_i| < |d_j| if j < i,

|(L_m)_{i,j}| = |ℓ_{i,j}| |d_i/d_j|^m → 0   for 1 ≤ j < i ≤ n,

which shows that L_m converges linearly to the identity matrix. □
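The statement of Lemma 7.24 is easy to check numerically. The sketch below (our own illustration, with diagonal entries satisfying |d_1| > |d_2| > |d_3| > 0) forms L_m = D^m L (D^{−1})^m for increasing m and watches it approach the identity matrix:

D = diag([3 2 1]);                 % |d_1| > |d_2| > |d_3| > 0
L = [1 0 0; 0.5 1 0; -0.3 0.7 1];  % normalized lower triangular matrix
for m = [1 5 10 20]
    Lm = D^m * L / D^m;            % L_m = D^m L (D^{-1})^m
    fprintf('m = %2d,  ||L_m - I|| = %8.1e\n', m, norm(Lm - eye(3)));
end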
Theorem 7.25 (convergence of QR method)
Let A ∈ R^{n×n} be non-singular, and assume that A has n real eigenvalues λ_1, λ_2, . . . , λ_n ∈ R and n linearly independent corresponding real eigenvectors w_1, w_2, . . . , w_n ∈ R^n. Let T ∈ R^{n×n} be the matrix of corresponding eigenvectors of A, that is, T = (w_1, w_2, . . . , w_n), and assume that T^{−1} possesses an LU factorization without pivoting. Then the matrices A_m = (a_{i,j}^{(m)}) created by the QR method have the following properties:

(i) The sub-diagonal elements converge to zero, that is, a_{i,j}^{(m)} → 0 for m → ∞ for all i > j.

(ii) The sequences {A_{2m}} and {A_{2m+1}} each converge to an upper triangular matrix.

(iii) The diagonal elements converge to the eigenvalues of A, that is, a_{i,i}^{(m)} → λ_{π(i)} for m → ∞ for all 1 ≤ i ≤ n, where π : {1, 2, . . . , n} → {1, 2, . . . , n} is some permutation of the numbers 1, 2, . . . , n.

Furthermore, the sequence of the matrices Q_m created by the QR method converges to an orthogonal diagonal matrix, that is, to a diagonal matrix having only 1 or −1 on the diagonal.
Proof of Theorem 7.25. From (7.43), we already know that all generated matrices Am have
the same eigenvalues as A. Let Qm and Rm be the matrices generated in the mth step of the
QR method (that is, A_m = Q_m R_m and A_{m+1} = R_m Q_m), and introduce the notation

Q̂_{m−1} := Q_0 Q_1 · · · Q_{m−1}   and   R̂_{m−1} := R_{m−1} · · · R_1 R_0

for the accumulated products. From the definition of Q̂_{m−1} as a product of unitary matrices we see that Q̂_{m−1} is a unitary matrix, and from the definition of R̂_{m−1} as a product of upper triangular matrices, we see that the matrix R̂_{m−1} is an upper triangular matrix.

From repeated application of (7.43) we have

A_m = Q_{m−1}^T A_{m−1} Q_{m−1} = Q_{m−1}^T Q_{m−2}^T A_{m−2} Q_{m−2} Q_{m−1} = . . . = Q̂_{m−1}^T A Q̂_{m−1},

that is,

A_m = Q̂_{m−1}^T A Q̂_{m−1}. (7.46)

By induction we can also show that the powers of A satisfy

A^m = Q̂_{m−1} R̂_{m−1}. (7.47)

Indeed, from the QR factorization A = Q_0 R_0 = Q̂_0 R̂_0, (7.47) holds true for m = 1. Assume that (7.47) holds true for m. Then for m + 1

A^{m+1} = A A^m = A Q̂_{m−1} R̂_{m−1} = Q̂_{m−1} A_m R̂_{m−1} = Q̂_{m−1} (Q_m R_m) R̂_{m−1} = Q̂_m R̂_m,

where we have used A Q̂_{m−1} = Q̂_{m−1} A_m (from (7.46)) in the third step and the QR factorization A_m = Q_m R_m in the fourth step.

Since Q̂_{m−1} is unitary and R̂_{m−1} is upper triangular, (7.47) is a QR factorization A^m = Q̂_{m−1} R̂_{m−1} of A^m. Since A is non-singular, the mth power A^m is also non-singular, and by Lemma 7.23 the QR factorization of A^m is unique if we assume that all upper triangular matrices R_i, and hence also R̂_i, have positive diagonal elements. Thus (7.47) is the unique QR factorization of A^m, where the upper triangular matrix R̂_{m−1} has positive diagonal entries.
If we define D := diag(λ_1, λ_2, . . . , λ_n), then we have the relation

A T = T D   ⇔   A = T D T^{−1},

which leads to

A^m = T D^m T^{−1}.

By assumption, we have an LU factorization of T^{−1}, that is, T^{−1} = L U, where L is a normalized lower triangular matrix and U is an upper triangular matrix. This leads to

A^m = T D^m L U. (7.48)
By Lemma 7.24 there is a sequence {L_m} of lower triangular matrices, defined by L_m := D^m L (D^{−1})^m, which converges to I and which satisfies

D^m L = L_m D^m. (7.49)

Substitution of (7.49) into (7.48) leads to

A^m = T L_m D^m U. (7.50)

Using the QR factorization

T = Q R (7.51)

of T with positive diagonal elements in R, we derive from substituting (7.51) into (7.50)

A^m = Q R L_m D^m U. (7.52)

Since the matrices L_m converge to I as m → ∞, the matrices R L_m have to converge to R as m → ∞. If we now compute a QR factorization of R L_m as

R L_m = Q̃_m R̃_m, (7.53)

again with positive diagonal elements in R̃_m, then we can conclude from the uniqueness of the QR factorization that Q̃_m has to converge to the identity matrix. Substituting (7.53) into (7.52), the equation (7.52) can hence be rewritten as

A^m = Q Q̃_m R̃_m D^m U. (7.54)
Next, let us introduce the diagonal and orthogonal matrices

∆_m := diag(s_1^{(m)}, . . . , s_n^{(m)})   with   s_i^{(m)} := sign(λ_i^m u_{i,i}),

where u_{i,i} are the diagonal elements of U. Then, because of ∆_m^2 = I, we can rewrite the representation of A^m in (7.54) as

A^m = ( Q Q̃_m ∆_m ) ( ∆_m R̃_m D^m U ). (7.55)

Since the signs of the diagonal elements of the upper triangular matrix R̃_m D^m U coincide with those of D^m U and hence with those of ∆_m, we see that (7.55) is a QR factorization of A^m with positive diagonal elements in the upper triangular matrix.
If we compare this QR factorization with the QR factorization (see (7.47))

A^m = Q̂_{m−1} R̂_{m−1},

we note that in both cases the upper triangular matrix in the QR factorization has positive diagonal entries. Thus we can conclude from the uniqueness of the QR factorization that

Q̂_{m−1} = Q Q̃_m ∆_m. (7.56)
Since Q̃_m converges to the identity matrix, the matrices Q̂_m converge to Q apart from the signs of the columns. From (7.46), we have

A_m = Q̂_{m−1}^T A Q̂_{m−1},

and from this and (7.56) we can derive (using that Q̃_m^T = Q̃_m^{−1}, since Q̃_m is orthogonal, and ∆_m^{−1} = ∆_m)

A_m = Q̂_{m−1}^{−1} A Q̂_{m−1} = ( Q Q̃_m ∆_m )^{−1} A ( Q Q̃_m ∆_m ) = ∆_m Q̃_m^{−1} Q^{−1} A Q Q̃_m ∆_m = ∆_m Q̃_m^{−1} R (T^{−1} A T) R^{−1} Q̃_m ∆_m,

where we have used in the last step that Q = T R^{−1} from (7.51). Since Q̃_m^{−1} → I, T^{−1} A T = D and Q̃_m → I, if m tends to infinity then, because of ∆_{2m} = ∆_0 and ∆_{2m+1} = ∆_1, we can conclude the convergence of {A_{2m}} and {A_{2m+1}} to ∆_0 R D R^{−1} ∆_0 and ∆_1 R D R^{−1} ∆_1, respectively. Both limit matrices are upper triangular and have the same elements on the diagonal, which proves (ii). Thus we have that, if i > j, a_{i,j}^{(m)} → 0 for m → ∞, which proves statement (i). Moreover, we can conclude that for the diagonal entries of A_m
lim_{m→∞} a_{i,i}^{(m)} = ( ∆_0 R D R^{−1} ∆_0 )_{i,i} = ( ∆_1 R D R^{−1} ∆_1 )_{i,i},   1 ≤ i ≤ n. (7.57)
Since ∆_0 R D R^{−1} ∆_0 = ( ∆_0 R ) D ( ∆_0 R )^{−1}, this matrix has the same eigenvalues as D = diag(λ_1, λ_2, . . . , λ_n). Since ∆_0 R D R^{−1} ∆_0 is an upper triangular matrix, we know that its diagonal entries are the eigenvalues of A, and thus

( ∆_0 R D R^{−1} ∆_0 )_{i,i} = ( ∆_0 (Q^{−1} T) D (Q^{−1} T)^{−1} ∆_0 )_{i,i} = ( ∆_0 Q^T A Q ∆_0 )_{i,i},   i = 1, 2, . . . , n, (7.58)

are the eigenvalues of A. We cannot conclude that ( ∆_0 R D R^{−1} ∆_0 )_{i,i} is the eigenvalue λ_i, since the unitary matrix Q in the last representation in (7.58) could provide a basis transformation Q^T A Q that yields a permutation of the eigenvalues. Combining (7.58) and (7.57) finally yields statement (iii).
Finally, from the definition of Q̂_m and from (7.56),

Q_m = Q̂_{m−1}^{−1} Q̂_m = ∆_m^{−1} Q̃_m^{−1} Q^{−1} Q Q̃_{m+1} ∆_{m+1} = ∆_m Q̃_m^{−1} Q̃_{m+1} ∆_{m+1}.

Using again the convergence of Q̃_m to the identity matrix I, we see that Q_m converges to ∆_m ∆_{m+1} = ∆_0 ∆_1 = diag(sign(λ_1), . . . , sign(λ_n)), which is an orthogonal diagonal matrix with diagonal entries 1 or −1. This proves the last claim of the theorem. □