CHAPTER 1
Matrix Algebra
In this chapter we collect results related to matrix algebra which are
relevant to this book. Some specific topics which are typically not
found in standard books are also covered here.
1.1. Preliminaries
Standard notation for this chapter is given here. Matrices are denoted
by capital letters A, B, etc. They can be rectangular with m rows
and n columns. Their elements or entries are referred to with small
letters a_{ij}, b_{ij}, etc., where i denotes the i-th row and j denotes
the j-th column of the matrix. Thus
A = \begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{bmatrix}
Mostly we consider complex matrices belonging to C^{m×n}. Sometimes
we will restrict our attention to real matrices belonging to R^{m×n}.
Definition 1.1 [Square matrix] An m × n matrix is called a square
matrix if m = n.
Definition 1.2 [Tall matrix] An m × n matrix is called a tall matrix
if m > n, i.e. the number of rows is greater than the number of columns.
Definition 1.3 [Wide matrix] An m × n matrix is called a wide
matrix if m < n, i.e. the number of columns is greater than the number of rows.
Definition 1.4 [Main diagonal] Let A = [a_{ij}] be an m × n matrix.
The main diagonal consists of the entries a_{ij} where i = j, i.e. the main
diagonal is {a_{11}, a_{22}, …, a_{kk}} where k = min(m, n). The main diagonal
is also known as the leading diagonal, major diagonal, primary
diagonal or principal diagonal. The entries of A which are not
on the main diagonal are known as off-diagonal entries.
Definition 1.5 [Diagonal matrix] A diagonal matrix is a matrix
(usually a square matrix) whose entries outside the main diagonal
are zero.
Whenever we refer to a diagonal matrix which is not square, we
will use the term rectangular diagonal matrix.
A square diagonal matrix A is also represented by diag(a_{11}, a_{22}, …, a_{nn}),
which lists only the diagonal (non-zero) entries of A.
The transpose of a matrix A is denoted by A^T while the Hermitian
transpose is denoted by A^H. For real matrices A^T = A^H.
When matrices are square, the number of rows and columns are
both equal to n and they belong to C^{n×n}.
If not specified, square matrices will be of size n × n and rectangular
matrices of size m × n. If not specified, vectors (column vectors)
will be of size n × 1 and belong to either R^n or C^n. Corresponding
row vectors will be of size 1 × n.
For statements which are valid both for real and complex matrices,
sometimes we might say that matrices belong to F^{m×n} while the scalars
belong to F and vectors belong to F^n, where F refers to either the field
of real numbers or the field of complex numbers. Note that this is not
consistently followed at the moment. Most results are written only for
C^{m×n} while still being applicable for R^{m×n}.
The identity matrix in F^{n×n} is denoted as I_n, or simply I whenever
the size is clear from context.
Sometimes we will write a matrix in terms of its column vectors. We
will use the notation
A = \begin{bmatrix} a_1 & a_2 & \dots & a_n \end{bmatrix}
indicating n columns.
When we write a matrix in terms of its row vectors, we will use the
notation
A = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix}
indicating m rows, with a_i being column vectors whose transposes form
the rows of A.
The rank of a matrix A is written as rank(A), while the determinant
as det(A) or |A|.
We say that an m × n matrix A is left-invertible if there exists an
n×m matrix B such that BA = I. We say that an m× n matrix A is
right-invertible if there exists an n×m matrix B such that AB = I.
We say that a square matrix A is invertible when there exists another
square matrix B of the same size such that AB = BA = I. A square
matrix is invertible iff it is both left and right invertible. The inverse of a
square invertible matrix is denoted by A^{-1}.
A special left or right inverse is the pseudo inverse, which is denoted
by A†.
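As a small illustration of these notions, the following Python sketch (a hypothetical 3 × 2 example, not taken from the text) exhibits a tall matrix with a left inverse but no right inverse:

```python
# A is a tall 3x2 matrix; B is a left inverse of A (BA = I), found by
# inspection. Illustrative example only, not a general construction.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 0],
     [0, 1],
     [1, 1]]
B = [[1, 0, 0],
     [0, 1, 0]]

print(matmul(B, A))  # [[1, 0], [0, 1]] -- the 2x2 identity, so B is a left inverse
print(matmul(A, B))  # 3x3 but NOT the identity, so B is not a right inverse
```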
Column space of a matrix is denoted by C(A), the null space by N (A),
and the row space by R(A).
We say that a matrix is symmetric when A = A^T, and conjugate
symmetric or Hermitian when A^H = A.
When a square matrix is not invertible, we say that it is singular. A
non-singular matrix is invertible.
The eigen values of a square matrix are written as λ1, λ2, . . . while the
singular values of a rectangular matrix are written as σ1, σ2, . . . .
The inner product or dot product of two column / row vectors u and
v belonging to R^n is defined as
u · v = ⟨u, v⟩ = \sum_{i=1}^{n} u_i v_i. (1.1.1)
The inner product or dot product of two column / row vectors u and
v belonging to C^n is defined as
u · v = ⟨u, v⟩ = \sum_{i=1}^{n} u_i \bar{v_i}. (1.1.2)
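A small Python sketch of the two inner products (assuming the common convention in which the second argument is conjugated in the complex case, so that ⟨u, u⟩ is real and non-negative):

```python
# Real and complex inner products in pure Python. The complex version
# conjugates the second argument (assumed convention).

def dot_real(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def dot_complex(u, v):
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))

u = [1 + 2j, 3 - 1j]
print(dot_complex(u, u))                # (15+0j): real and non-negative
print(dot_real([1, 2, 3], [4, 5, 6]))   # 32
```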
1.1.1. Block matrix
Definition 1.6 A block matrix is a matrix whose entries themselves
are matrices, with the following constraints:
(1) Entries in every row are matrices with the same number of
rows.
(2) Entries in every column are matrices with the same number
of columns.
Let A be an m × n block matrix. Then
A = \begin{bmatrix}
A_{11} & A_{12} & \dots & A_{1n} \\
A_{21} & A_{22} & \dots & A_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
A_{m1} & A_{m2} & \dots & A_{mn}
\end{bmatrix} (1.1.3)
where A_{ij} is a matrix with r_i rows and c_j columns.
A block matrix is also known as a partitioned matrix.
Example 1.1: 2 × 2 block matrices. Quite frequently we will be using
2 × 2 block matrices.
P = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}. (1.1.4)
An example:
P = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
We have
P_{11} = \begin{bmatrix} a & b \\ d & e \end{bmatrix}, \quad
P_{12} = \begin{bmatrix} c \\ f \end{bmatrix}, \quad
P_{21} = \begin{bmatrix} g & h \end{bmatrix}, \quad
P_{22} = \begin{bmatrix} i \end{bmatrix}.
• P_{11} and P_{12} have 2 rows.
• P_{21} and P_{22} have 1 row.
• P_{11} and P_{21} have 2 columns.
• P_{12} and P_{22} have 1 column.
□
Lemma 1.1 Let A = [A_{ij}] be an m × n block matrix with A_{ij} being
an r_i × c_j matrix. Then A is an r × c matrix where
r = \sum_{i=1}^{m} r_i (1.1.5)
and
c = \sum_{j=1}^{n} c_j. (1.1.6)
Remark. Sometimes it is convenient to think of a regular matrix as a
block matrix whose entries are 1× 1 matrices themselves.
Definition 1.7 [Multiplication of block matrices] Let A = [A_{ij}]
be an m × n block matrix with A_{ij} being a p_i × q_j matrix. Let
B = [B_{jk}] be an n × p block matrix with B_{jk} being a q_j × r_k matrix.
Then the two block matrices are compatible for multiplication
and their multiplication is defined by C = AB = [C_{ik}] where
C_{ik} = \sum_{j=1}^{n} A_{ij} B_{jk} (1.1.7)
and C_{ik} is a p_i × r_k matrix.
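Definition 1.7 can be sketched in Python by treating each block as an ordinary matrix (list of rows); the 1 × 2 by 2 × 1 block example below is hypothetical:

```python
# Block matrix multiplication per definition 1.7: C_ik = sum_j A_ij B_jk,
# where A and B are 2-D grids of blocks and each block is a list of rows.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def block_matmul(A, B):
    n = len(A[0])                      # number of block-columns of A
    C = []
    for i in range(len(A)):
        row = []
        for k in range(len(B[0])):
            acc = matmul(A[i][0], B[0][k])
            for j in range(1, n):
                acc = matadd(acc, matmul(A[i][j], B[j][k]))
            row.append(acc)
        C.append(row)
    return C

# 1x2 block row times 2x1 block column: C_11 = A_11 B_11 + A_12 B_21
A = [[ [[1, 0]], [[2]] ]]           # blocks of sizes 1x2 and 1x1
B = [[ [[1], [1]] ], [ [[3]] ]]     # blocks of sizes 2x1 and 1x1
print(block_matmul(A, B))           # single 1x1 block [[7]] = 1*1 + 0*1 + 2*3
```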
Definition 1.8 A block diagonal matrix is a block matrix
whose off diagonal entries are zero matrices.
1.2. Linear independence, span, rank
1.2.1. Spaces associated with a matrix
Definition 1.9 The column space of a matrix is defined as the
vector space spanned by columns of the matrix.
Let A be an m × n matrix with
A = \begin{bmatrix} a_1 & a_2 & \dots & a_n \end{bmatrix}
Then the column space is given by
C(A) = \{ x ∈ F^m : x = \sum_{i=1}^{n} α_i a_i \text{ for some } α_i ∈ F \}. (1.2.1)
Definition 1.10 The row space of a matrix is defined as the
vector space spanned by rows of the matrix.
Let A be an m × n matrix with
A = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix}
Then the row space is given by
R(A) = \{ x ∈ F^n : x = \sum_{i=1}^{m} α_i a_i \text{ for some } α_i ∈ F \}. (1.2.2)
1.2.2. Rank
Definition 1.11 [Column rank] The column rank of a matrix
is defined as the maximum number of columns which are linearly
independent. In other words column rank is the dimension of the
column space of a matrix.
Definition 1.12 [Row rank] The row rank of a matrix is defined
as the maximum number of rows which are linearly independent.
In other words row rank is the dimension of the row space of a
matrix.
Theorem 1.2 The column rank and row rank of a matrix are
equal.
Definition 1.13 [Rank] The rank of a matrix is defined to be
equal to its column rank which is equal to its row rank.
Lemma 1.3 For an m× n matrix A
0 ≤ rank(A) ≤ min(m,n). (1.2.3)
Lemma 1.4 The rank of a matrix is 0 if and only if it is a zero
matrix.
Definition 1.14 [Full rank matrix] An m × n matrix A is called
full rank if
rank(A) = min(m,n).
In other words it is either a full column rank matrix or a full row
rank matrix or both.
Lemma 1.5 [Rank of product of two matrices] Let A be an m × n matrix and B be an n × p matrix. Then
rank(AB) ≤ min(rank(A), rank(B)). (1.2.4)
Lemma 1.6 [Post-multiplication with a full row rank matrix] Let
A be an m× n matrix and B be an n× p matrix. If B is of rank
n then
rank(AB) = rank(A). (1.2.5)
Lemma 1.7 [Pre-multiplication with a full column rank matrix]
Let A be an m × n matrix and B be an n × p matrix. If A is of
rank n then
rank(AB) = rank(B). (1.2.6)
Lemma 1.8 The rank of a diagonal matrix is equal to the number
of non-zero elements on its main diagonal.
Proof. The columns which correspond to zero diagonal entries
are zero columns. The other columns are linearly independent.
The number of linearly independent rows is also the same. Hence their
count gives us the rank of the matrix. □
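This lemma is easy to check numerically. The following Python sketch (an illustrative implementation, not from the text) computes the rank by Gaussian elimination over exact fractions and applies it to a rectangular diagonal matrix:

```python
# Rank via Gaussian elimination with exact rational arithmetic, used to
# check lemma 1.8 on a rectangular diagonal matrix with 2 non-zero
# diagonal entries.
from fractions import Fraction

def rank(M):
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        # find a pivot at or below row r in column c
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

D = [[3, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 5, 0]]
print(rank(D))  # 2, matching the number of non-zero diagonal entries
```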
1.3. Invertible matrices
Definition 1.15 [Invertible] A square matrix A is called invert-
ible if there exists another square matrix B of same size such that
AB = BA = I.
The matrix B is called the inverse of A and is denoted as A^{-1}.
Lemma 1.9 If A is invertible then its inverse A^{-1} is also invertible
and the inverse of A^{-1} is nothing but A.
Lemma 1.10 Identity matrix I is invertible.
Proof.
I I = I =⇒ I^{-1} = I. □
Lemma 1.11 If A is invertible then columns of A are linearly
independent.
Proof. Assume A is invertible, then there exists a matrix B such
that
AB = BA = I.
Assume that the columns of A are linearly dependent. Then there exists
u ≠ 0 such that
Au = 0 =⇒ BAu = 0 =⇒ Iu = 0 =⇒ u = 0,
a contradiction. Hence the columns of A are linearly independent. □
Lemma 1.12 If an n × n matrix A is invertible then the columns of
A span F^n.
Proof. Assume A is invertible, then there exists a matrix B such
that
AB = BA = I.
Now let x ∈ F^n be an arbitrary vector. We need to show that there
exists α ∈ F^n such that
x = Aα.
But
x = Ix = ABx = A(Bx).
Thus if we choose α = Bx, then
x = Aα.
Thus the columns of A span F^n. □
Lemma 1.13 If A is invertible, then the columns of A form a basis
for F^n.
Proof. In F^n a basis is a set of vectors which is linearly independent
and spans F^n. By lemma 1.11 and lemma 1.12, the columns of
an invertible matrix A satisfy both conditions. Hence they form a
basis. □
Lemma 1.14 If A is invertible then A^T is invertible.
Proof. Assume A is invertible; then there exists a matrix B such
that
AB = BA = I.
Applying the transpose on both sides we get
B^T A^T = A^T B^T = I.
Thus B^T is the inverse of A^T and A^T is invertible. □
Lemma 1.15 If A is invertible then A^H is invertible.
Proof. Assume A is invertible; then there exists a matrix B such
that
AB = BA = I.
Applying the conjugate transpose on both sides we get
B^H A^H = A^H B^H = I.
Thus B^H is the inverse of A^H and A^H is invertible. □
Lemma 1.16 If A and B are invertible then AB is invertible.
Proof. We note that
(AB)(B^{-1} A^{-1}) = A (B B^{-1}) A^{-1} = A I A^{-1} = I.
Similarly
(B^{-1} A^{-1})(AB) = B^{-1} (A^{-1} A) B = B^{-1} I B = I.
Thus B^{-1} A^{-1} is the inverse of AB. □
Lemma 1.17 The set of n×n invertible matrices under the matrix
multiplication operation form a group.
Proof. We verify the properties of a group
Closure: If A and B are invertible then AB is invertible. Hence the
set is closed.
Associativity: Matrix multiplication is associative.
Identity element: I is invertible and AI = IA = A for all invertible
matrices.
Inverse element: If A is invertible then A−1 is also invertible.
Thus the set of invertible matrices is indeed a group under matrix
multiplication. □
Lemma 1.18 An n × n matrix A is invertible if and only if it is
full rank i.e.
rank(A) = n.
Corollary 1.19. The rank of an invertible matrix and its inverse are
same.
1.3.1. Similar matrices
Definition 1.16 [Similar matrices] An n×n matrix B is similar
to an n× n matrix A if there exists an n× n non-singular matrix
C such that
B = C^{-1} A C.
Lemma 1.20 If B is similar to A then A is similar to B. Thus
similarity is a symmetric relation.
Proof.
B = C^{-1} A C =⇒ A = C B C^{-1} =⇒ A = (C^{-1})^{-1} B C^{-1}.
Thus there exists a matrix D = C^{-1} such that
A = D^{-1} B D.
Thus A is similar to B. □
Lemma 1.21 Similar matrices have same rank.
Proof. Let B be similar to A. Then there exists an invertible
matrix C such that
B = C^{-1} A C.
Since C is invertible, we have rank(C) = rank(C^{-1}) = n. Now
using lemma 1.6, rank(AC) = rank(A), and using lemma 1.7 we have
rank(C^{-1}(AC)) = rank(AC) = rank(A). Thus
rank(B) = rank(A).
□
Lemma 1.22 Similarity is an equivalence relation on the set of
n× n matrices.
Proof. Let A, B, C be n × n matrices. A is similar to itself through
the invertible matrix I. If A is similar to B then, by lemma 1.20, B is
similar to A. If B is similar to A via P s.t. B = P^{-1} A P and C is similar
to B via Q s.t. C = Q^{-1} B Q, then C is similar to A via PQ, since
C = (PQ)^{-1} A (PQ). Thus similarity is an equivalence relation on the
set of square matrices, and if A is any n × n matrix then the set of n × n
matrices similar to A forms an equivalence class. □
1.3.2. Gram matrices
Definition 1.17 The Gram matrix of the columns of A is given by
G = A^H A. (1.3.1)
Definition 1.18 The Gram matrix of the rows of A is given by
G = A A^H. (1.3.2)
Remark. Usually when we talk about the Gram matrix of a matrix we
are looking at the Gram matrix of its column vectors.
Remark. For a real matrix A ∈ R^{m×n}, the Gram matrix of its column
vectors is given by A^T A and the Gram matrix of its row vectors is
given by A A^T.
The following results apply equally well to the real case.
Lemma 1.23 The columns of a matrix are linearly dependent if
and only if the Gram matrix of its column vectors A^H A is not
invertible.
Proof. Let A be an m × n matrix and G = A^H A be the Gram
matrix of its columns.
If the columns of A are linearly dependent, then there exists a vector u ≠ 0
such that
Au = 0.
Thus
Gu = A^H A u = 0.
Hence the columns of G are also dependent and G is not invertible.
Conversely, let us assume that G is not invertible. Then the columns of G
are dependent and there exists a vector v ≠ 0 such that
Gv = 0.
Now
v^H G v = v^H A^H A v = (Av)^H (Av) = ‖Av‖_2^2.
From the previous equation, we have
‖Av‖_2^2 = 0 =⇒ Av = 0.
Since v ≠ 0, the columns of A are linearly dependent. □
Corollary 1.24. The columns of a matrix are linearly independent if
and only if the Gram matrix of its column vectors A^H A is invertible.
Proof. The columns of A can be dependent only if the Gram matrix is
not invertible. Thus if the Gram matrix is invertible, then the columns
of A are linearly independent.
The Gram matrix is not invertible only if the columns of A are linearly
dependent. Thus if the columns of A are linearly independent then the
Gram matrix is invertible. □
Corollary 1.25. Let A be a full column rank matrix. Then A^H A is
invertible.
Lemma 1.26 The null spaces of A and of its Gram matrix A^H A
coincide, i.e.
N(A) = N(A^H A). (1.3.3)
Proof. Let u ∈ N(A). Then
Au = 0 =⇒ A^H A u = 0.
Thus
u ∈ N(A^H A) =⇒ N(A) ⊆ N(A^H A).
Now let u ∈ N(A^H A). Then
A^H A u = 0 =⇒ u^H A^H A u = 0 =⇒ ‖Au‖_2^2 = 0 =⇒ Au = 0.
Thus we have
u ∈ N(A) =⇒ N(A^H A) ⊆ N(A). □
Lemma 1.27 The rows of a matrix A are linearly dependent if and
only if the Gram matrix of its row vectors A A^H is not invertible.
Proof. The rows of A are linearly dependent if and only if the columns
of A^H are linearly dependent, i.e. there exists a vector v ≠ 0 s.t.
A^H v = 0.
Thus
Gv = A A^H v = 0.
Since v ≠ 0, G is not invertible.
Converse: assuming that G is not invertible, there exists a vector u ≠ 0
Thus B − λI is similar to A − λI. Hence, due to lemma 1.48, their
determinants are equal, i.e.
det(B − λI) = det(A − λI).
This means that the characteristic polynomials of A and B are the same.
Since eigen values are nothing but the roots of the characteristic polynomial,
they are the same too. This means that the spectrum (the set
of distinct eigen values) is the same. □
Corollary 1.70. If A and B are similar to each other then
(a) An eigen value has the same algebraic and geometric multiplicity for
both A and B.
(b) The (not necessarily distinct) eigen values of A and B are the same.
Although the eigen values are the same, the eigen vectors are different.
Lemma 1.71 Let A and B be similar with
B = C^{-1} A C
for some invertible matrix C. If u is an eigen vector of A for an
eigen value λ, then C^{-1} u is an eigen vector of B for the same
eigen value.
Proof. u is an eigen vector of A for the eigen value λ. Thus we
have
Au = λu.
Thus
B C^{-1} u = C^{-1} A C C^{-1} u = C^{-1} A u = C^{-1} λ u = λ C^{-1} u.
Now u ≠ 0 and C^{-1} is non-singular. Thus C^{-1} u ≠ 0. Thus C^{-1} u is an
eigen vector of B. □
Theorem 1.72 [Geometric vs. algebraic multiplicity] Let λ be an
eigen value of a square matrix A. Then the geometric multiplicity
of λ is less than or equal to its algebraic multiplicity.
Corollary 1.73. If an n×n matrix A has n distinct eigen values, then
each of them has a geometric (and algebraic) multiplicity of 1.
Proof. The algebraic multiplicity of an eigen value is greater than
or equal to 1, and the sum of the algebraic multiplicities cannot exceed n.
Since there are n distinct eigen values, each of them has algebraic
multiplicity 1. Now the geometric multiplicity of an eigen value is greater
than or equal to 1 and less than or equal to its algebraic multiplicity,
hence it is also 1. □
Corollary 1.74. Let an n × n matrix A have k distinct eigen values
λ_1, λ_2, …, λ_k with algebraic multiplicities r_1, r_2, …, r_k and geometric
multiplicities g_1, g_2, …, g_k respectively. Then
\sum_{i=1}^{k} g_i ≤ \sum_{i=1}^{k} r_i ≤ n.
Moreover if
\sum_{i=1}^{k} g_i = \sum_{i=1}^{k} r_i
then
g_i = r_i for every i.
1.6.4. Linear independence of eigen vectors
Theorem 1.75 [Linear independence of eigen vectors for distinct
eigen values] Let A be an n × n square matrix. Let x1, x2, . . . , xk
be any k eigen vectors of A for distinct eigen values λ1, λ2, . . . , λk
respectively. Then x1, x2, . . . , xk are linearly independent.
Proof. We first prove the simpler case with 2 eigen vectors x1 and
x2 and corresponding eigen values λ1 and λ2 respectively.
Let there be a linear relationship between x_1 and x_2 given by
α_1 x_1 + α_2 x_2 = 0.
Multiplying both sides with (A − λ_1 I) we get
α_1 (A − λ_1 I) x_1 + α_2 (A − λ_1 I) x_2 = 0
=⇒ α_1 (λ_1 − λ_1) x_1 + α_2 (λ_2 − λ_1) x_2 = 0
=⇒ α_2 (λ_2 − λ_1) x_2 = 0.
Since λ_1 ≠ λ_2 and x_2 ≠ 0, hence α_2 = 0.
Similarly by multiplying with (A − λ2I) on both sides, we can show
that α1 = 0. Thus x1 and x2 are linearly independent.
Now for the general case, consider a linear relationship between x_1, x_2, …, x_k
given by
α_1 x_1 + α_2 x_2 + ⋯ + α_k x_k = 0.
Multiplying by \prod_{i=1, i ≠ j}^{k} (A − λ_i I) and using the fact that λ_i ≠ λ_j if
i ≠ j, we get α_j = 0. Thus the only linear relationship is the trivial
one. This completes the proof. □
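A minimal Python check of theorem 1.75 on a hypothetical 2 × 2 example; for two vectors in the plane, linear independence is equivalent to a non-zero 2 × 2 determinant:

```python
# The symmetric matrix [[2, 1], [1, 2]] has eigen values 3 and 1 with
# eigen vectors (1, 1) and (1, -1); we verify the eigen pairs and then
# confirm independence via the determinant test.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[2, 1], [1, 2]]
x1, lam1 = [1, 1], 3
x2, lam2 = [1, -1], 1

print(matvec(A, x1) == [lam1 * v for v in x1])  # True
print(matvec(A, x2) == [lam2 * v for v in x2])  # True

det = x1[0] * x2[1] - x1[1] * x2[0]
print(det)  # -2, non-zero: x1 and x2 are linearly independent
```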
For eigen values with geometric multiplicity greater than 1, there are
multiple linearly independent eigen vectors corresponding to the eigen
value. In this context, the above theorem can be generalized further.
Theorem 1.76 Let λ_1, λ_2, …, λ_k be k distinct eigen values of
A. Let {x_1^j, x_2^j, …, x_{g_j}^j} be any g_j linearly independent eigen vectors
from the eigen space of λ_j, where g_j is the geometric multiplicity
of λ_j. Then the combined set of eigen vectors given by
{x_1^1, …, x_{g_1}^1, …, x_1^k, …, x_{g_k}^k}, consisting of \sum_{j=1}^{k} g_j eigen vectors, is
linearly independent.
This result puts an upper limit on the number of linearly independent
eigen vectors of a square matrix.
Lemma 1.77 Let {λ_1, …, λ_k} represent the spectrum of an n × n
matrix A. Let g_1, …, g_k be the geometric multiplicities of λ_1, …, λ_k
respectively. Then the number of linearly independent eigen vectors
for A is
\sum_{i=1}^{k} g_i.
Moreover if
\sum_{i=1}^{k} g_i = n
then a set of n linearly independent eigen vectors of A can be found
which forms a basis for F^n.
1.6.5. Diagonalization
Diagonalization is one of the fundamental operations in linear algebra.
This section discusses diagonalization of square matrices in depth.
Definition 1.31 [Diagonalizable matrix] An n × n matrix A is
said to be diagonalizable if it is similar to a diagonal matrix.
In other words there exists an n × n non-singular matrix P such
that D = P−1AP is a diagonal matrix. If this happens then we
say that P diagonalizes A or A is diagonalized by P .
Remark.
D = P^{-1} A P ⇐⇒ PD = AP ⇐⇒ P D P^{-1} = A. (1.6.16)
We note that if we restrict to real matrices, then P and D should
also be real. If A ∈ C^{n×n} (it may still be real) then P and D can be
complex.
The next theorem is the culmination of a variety of results studied so
far.
Theorem 1.78 [Properties of diagonalizable matrices] Let A be a
diagonalizable matrix with D = P^{-1} A P being its diagonalization.
Let D = diag(d_1, d_2, …, d_n). Then the following hold:
(a) rank(A) = rank(D), which equals the number of non-zero entries
on the main diagonal of D.
(b) det(A) = d_1 d_2 ⋯ d_n.
(c) tr(A) = d_1 + d_2 + ⋯ + d_n.
(d) The characteristic polynomial of A is
p(λ) = (−1)^n (λ − d_1)(λ − d_2) ⋯ (λ − d_n).
(e) The spectrum of A comprises the distinct scalars on the diagonal
of D.
(f) The (not necessarily distinct) eigen values of A are the diagonal
elements of D.
(g) The columns of P are (linearly independent) eigen vectors of
A.
(h) The algebraic and geometric multiplicities of an eigen value λ
of A equal the number of diagonal elements of D that equal λ.
Proof. From definition 1.31 we note that D and A are similar.
Due to lemma 1.48,
det(A) = det(D).
Due to lemma 1.47,
det(D) = \prod_{i=1}^{n} d_i.
Now due to lemma 1.39,
tr(A) = tr(D) = \sum_{i=1}^{n} d_i.
Further, due to lemma 1.69 the characteristic polynomial and spectrum
of A and D are the same. Due to lemma 1.67 the eigen values of D are
nothing but its diagonal entries. Hence they are also the eigen values
of A.
D = P^{-1} A P =⇒ AP = PD.
Now writing
P = \begin{bmatrix} p_1 & p_2 & \dots & p_n \end{bmatrix}
we have
AP = \begin{bmatrix} A p_1 & A p_2 & \dots & A p_n \end{bmatrix}
= PD = \begin{bmatrix} d_1 p_1 & d_2 p_2 & \dots & d_n p_n \end{bmatrix}.
Thus the p_i are eigen vectors of A.
Since the characteristic polynomials of A and D are the same, the
algebraic multiplicities of the eigen values are the same.
From lemma 1.71 we get that there is a one to one correspondence
between the eigen vectors of A and D through the change of basis
given by P. Thus the linear independence relationships between the
eigen vectors remain the same. Hence the geometric multiplicities of
individual eigen values are also the same.
This completes the proof. □
So far we have verified various results which are available if a matrix A
is diagonalizable. We haven’t yet identified the conditions under which
A is diagonalizable. We note that not every matrix is diagonalizable.
The following theorem gives necessary and sufficient conditions under
which a matrix is diagonalizable.
Theorem 1.79 An n × n matrix A is diagonalizable by an n × n
non-singular matrix P if and only if the columns of P are (linearly
independent) eigen vectors of A.
Proof. We note that since P is non-singular, the columns of P
must be linearly independent.
The necessary condition part was proven in theorem 1.78. We now
show that if P consists of n linearly independent eigen vectors of A,
then A is diagonalizable.
Let the columns of P be p_1, p_2, …, p_n and the corresponding (not
necessarily distinct) eigen values be d_1, d_2, …, d_n. Then
A p_i = d_i p_i.
Thus, by letting D = diag(d_1, d_2, …, d_n), we have
AP = PD.
Now since the columns of P are linearly independent, P is invertible.
This gives us
D = P^{-1} A P.
Thus A is similar to a diagonal matrix D. This validates the sufficient
condition. □
A corollary follows.
Corollary 1.80. An n × n matrix A is diagonalizable if and only if there
exists a linearly independent set of n eigen vectors of A.
Now we know that geometric multiplicities of eigen values of A provide
us information about linearly independent eigenvectors of A.
Corollary 1.81. Let A be an n × n matrix. Let λ_1, λ_2, …, λ_k be its k
distinct eigen values (comprising its spectrum). Let g_j be the geometric
multiplicity of λ_j. Then A is diagonalizable if and only if
\sum_{i=1}^{k} g_i = n. (1.6.17)
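The diagonalization D = P^{-1} A P can be verified numerically; the sketch below uses a hypothetical 2 × 2 matrix with eigen values 3 and 1, and exact fractions so the result is exact:

```python
# Columns of P are eigen vectors of A; P^{-1} A P comes out diagonal
# with the eigen values on the diagonal (definition 1.31).
from fractions import Fraction as F

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(M):
    """Closed-form inverse of a 2x2 matrix."""
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[ M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d,  M[0][0] / d]]

A = [[F(2), F(1)], [F(1), F(2)]]     # eigen values 3 and 1
P = [[F(1), F(1)], [F(1), F(-1)]]    # columns: eigen vectors (1,1), (1,-1)

D = matmul(matmul(inv2(P), A), P)
print(D)  # equals diag(3, 1)
```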
1.6.6. Symmetric matrices
This subsection is focused on real symmetric matrices.
Following is a fundamental property of real symmetric matrices.
Theorem 1.82 Every real symmetric matrix has an eigen value.
The proof of this result is beyond the scope of this book.
Lemma 1.83 Let A be an n×n real symmetric matrix. Let λ1 and
λ2 be any two distinct eigen values of A and let x1 and x2 be any
two corresponding eigen vectors. Then x1 and x2 are orthogonal.
Proof. By definition we have A x_1 = λ_1 x_1 and A x_2 = λ_2 x_2. Thus
x_2^T A x_1 = λ_1 x_2^T x_1
=⇒ x_1^T A^T x_2 = λ_1 x_1^T x_2
=⇒ x_1^T A x_2 = λ_1 x_1^T x_2
=⇒ λ_2 x_1^T x_2 = λ_1 x_1^T x_2
=⇒ (λ_1 − λ_2) x_1^T x_2 = 0
=⇒ x_1^T x_2 = 0.
Thus x_1 and x_2 are orthogonal. In between we took the transpose on both
sides, and used the facts that A = A^T and λ_1 − λ_2 ≠ 0. □
Definition 1.32 [Orthogonally diagonalizable matrix] A real n × n
matrix A is said to be orthogonally diagonalizable if there
exists an orthogonal matrix U which can diagonalize A, i.e.
D = U^T A U
is a real diagonal matrix.
Lemma 1.84 Every orthogonally diagonalizable matrix A is sym-
metric.
Proof. We have a diagonal matrix D such that
A = U D U^T.
Taking the transpose on both sides we get
A^T = U D^T U^T = U D U^T = A.
Thus A is symmetric. □
Theorem 1.85 Every real symmetric matrix A is orthogonally diagonalizable.
We skip the proof of this theorem.
1.6.7. Hermitian matrices
Following is a fundamental property of Hermitian matrices.
Theorem 1.86 Every Hermitian matrix has an eigen value.
The proof of this result is beyond the scope of this book.
Lemma 1.87 The eigenvalues of a Hermitian matrix are real.
Proof. Let A be a Hermitian matrix and let λ be an eigen value
of A. Let u be a corresponding eigen vector. Then
Au = λu
=⇒ u^H A^H = \bar{λ} u^H
=⇒ u^H A^H u = \bar{λ} u^H u
=⇒ u^H A u = \bar{λ} u^H u
=⇒ λ u^H u = \bar{λ} u^H u
=⇒ ‖u‖_2^2 (λ − \bar{λ}) = 0
=⇒ λ = \bar{λ},
thus λ is real. We used the facts that A = A^H and u ≠ 0 =⇒ ‖u‖_2 ≠ 0. □
Lemma 1.88 Let A be an n × n complex Hermitian matrix. Let
λ1 and λ2 be any two distinct eigen values of A and let x1 and
x2 be any two corresponding eigen vectors. Then x1 and x2 are
orthogonal.
Proof. By definition we have A x_1 = λ_1 x_1 and A x_2 = λ_2 x_2. Thus
x_2^H A x_1 = λ_1 x_2^H x_1
=⇒ x_1^H A^H x_2 = λ_1 x_1^H x_2
=⇒ x_1^H A x_2 = λ_1 x_1^H x_2
=⇒ λ_2 x_1^H x_2 = λ_1 x_1^H x_2
=⇒ (λ_1 − λ_2) x_1^H x_2 = 0
=⇒ x_1^H x_2 = 0.
Thus x_1 and x_2 are orthogonal. In between we took the conjugate transpose
on both sides, and used the facts that A = A^H, the eigen values of A are
real (lemma 1.87), and λ_1 − λ_2 ≠ 0. □
Definition 1.33 [Unitary diagonalizable matrix] A complex n × n
matrix A is said to be unitary diagonalizable if there exists a
unitary matrix U which can diagonalize A, i.e.
D = U^H A U
is a complex diagonal matrix.
Lemma 1.89 Let A be a unitary diagonalizable matrix whose di-
agonalization D is real. Then A is Hermitian.
Proof. We have a real diagonal matrix D such that
A = U D U^H.
Taking the conjugate transpose on both sides we get
A^H = U D^H U^H = U D U^H = A.
Thus A is Hermitian. We used the fact that D^H = D since D is
real. □
Theorem 1.90 Every Hermitian matrix A is unitary diagonaliz-
able.
We skip the proof of this theorem. The theorem means that if A is
Hermitian then it admits a factorization A = U Λ U^H with U unitary
and Λ diagonal.
Definition 1.34 [Eigen value decomposition of a Hermitian matrix]
Let A be an n × n Hermitian matrix. Let λ_1, …, λ_n be its
eigen values such that |λ_1| ≥ |λ_2| ≥ ⋯ ≥ |λ_n|. Let
Λ = diag(λ_1, …, λ_n).
Let U be a unitary matrix consisting of orthonormal eigen vectors
corresponding to λ_1, …, λ_n. Then the eigen value decomposition
of A is defined as
A = U Λ U^H. (1.6.18)
If the λ_i are distinct, then the decomposition is unique. If they are
not distinct, then it is not unique.
Remark. Let Λ be a diagonal matrix as in definition 1.34. Consider
some vector x ∈ C^n. Then
x^H Λ x = \sum_{i=1}^{n} λ_i |x_i|^2. (1.6.19)
Now if λ_i ≥ 0 then
x^H Λ x ≤ λ_1 \sum_{i=1}^{n} |x_i|^2 = λ_1 ‖x‖_2^2.
Also
x^H Λ x ≥ λ_n \sum_{i=1}^{n} |x_i|^2 = λ_n ‖x‖_2^2.
Lemma 1.91 Let A be a Hermitian matrix with non-negative eigen
values. Let λ_1 be its largest and λ_n be its smallest eigen value. Then
λ_n ‖x‖_2^2 ≤ x^H A x ≤ λ_1 ‖x‖_2^2 ∀ x ∈ C^n. (1.6.20)
Proof. A has an eigen value decomposition given by
A = U Λ U^H.
Let x ∈ C^n and let v = U^H x. Clearly ‖x‖_2 = ‖v‖_2. Then
x^H A x = x^H U Λ U^H x = v^H Λ v.
From the previous remark we have
λ_n ‖v‖_2^2 ≤ v^H Λ v ≤ λ_1 ‖v‖_2^2.
Thus we get
λ_n ‖x‖_2^2 ≤ x^H A x ≤ λ_1 ‖x‖_2^2. □
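A quick numerical check of the bounds for the special case A = Λ (diagonal), with hypothetical values:

```python
# For a diagonal Hermitian matrix the quadratic form x^H A x is squeezed
# between lambda_n ||x||^2 and lambda_1 ||x||^2 (lemma 1.91).

def quad_form_diag(lams, x):
    """x^H diag(lams) x for a complex vector x."""
    return sum(l * abs(xi) ** 2 for l, xi in zip(lams, x))

def norm_sq(x):
    return sum(abs(xi) ** 2 for xi in x)

lams = [5.0, 2.0, 1.0]          # lambda_1 >= ... >= lambda_n >= 0
x = [1 + 1j, 2.0, -1.0]

q = quad_form_diag(lams, x)
print(lams[-1] * norm_sq(x) <= q <= lams[0] * norm_sq(x))  # True
```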
1.6.8. Miscellaneous properties
This subsection lists some miscellaneous properties of eigen values of a
square matrix.
Lemma 1.92 λ is an eigen value of A if and only if λ + k is an
eigen value of A + kI. Moreover A and A + kI share the same
eigen vectors.
Proof.
Ax = λx
⇐⇒ Ax + kx = λx + kx
⇐⇒ (A + kI)x = (λ + k)x. (1.6.21)
Thus λ is an eigen value of A with an eigen vector x if and only if λ + k
is an eigen value of A + kI with an eigen vector x. □
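A small Python check of the lemma on a hypothetical 2 × 2 example:

```python
# Shifting A by kI shifts every eigen value by k while keeping the
# eigen vector (lemma 1.92).

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def shift(A, k):
    """A + kI."""
    return [[A[i][j] + (k if i == j else 0) for j in range(len(A))]
            for i in range(len(A))]

A = [[2, 1], [1, 2]]
x, lam, k = [1, 1], 3, 10

print(matvec(A, x))            # [3, 3]  = lam * x
print(matvec(shift(A, k), x))  # [13, 13] = (lam + k) * x
```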
1.6.9. Diagonally dominant matrices
Definition 1.35 [Diagonally dominant matrix] Let A = [a_{ij}] be a
square matrix in C^{n×n}. A is called diagonally dominant if
|a_{ii}| ≥ \sum_{j ≠ i} |a_{ij}|
holds true for all 1 ≤ i ≤ n, i.e. the absolute value of each diagonal
element is greater than or equal to the sum of the absolute values of
the off-diagonal elements in its row.
Definition 1.36 [Strictly diagonally dominant matrix] Let A =
[a_{ij}] be a square matrix in C^{n×n}. A is called strictly diagonally
dominant if
|a_{ii}| > \sum_{j ≠ i} |a_{ij}|
holds true for all 1 ≤ i ≤ n, i.e. the absolute value of each diagonal
element is strictly greater than the sum of the absolute values of the
off-diagonal elements in its row.
Example 1.2: Strictly diagonally dominant matrix. Let us consider
A = \begin{bmatrix}
−4 & −2 & −1 & 0 \\
−4 & 7 & 2 & 0 \\
3 & −4 & 9 & 1 \\
2 & −1 & −3 & 15
\end{bmatrix}
We can see that the strict diagonal dominance condition is satisfied for
each row as follows:
row 1: |−4| > |−2| + |−1| + |0| = 3
row 2: |7| > |−4| + |2| + |0| = 6
row 3: |9| > |3| + |−4| + |1| = 8
row 4: |15| > |2| + |−1| + |−3| = 6
□
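The row-wise condition of definition 1.36 is easy to code; the sketch below re-checks the matrix of example 1.2:

```python
# Row-wise strict diagonal dominance check (definition 1.36).

def is_strictly_diagonally_dominant(A):
    return all(abs(row[i]) > sum(abs(v) for j, v in enumerate(row) if j != i)
               for i, row in enumerate(A))

A = [[-4, -2, -1,  0],
     [-4,  7,  2,  0],
     [ 3, -4,  9,  1],
     [ 2, -1, -3, 15]]

print(is_strictly_diagonally_dominant(A))  # True
```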
Strictly diagonally dominant matrices have a very special property.
They are always non-singular.
Theorem 1.93 Strictly diagonally dominant matrices are non-
singular.
Proof. Suppose that A is strictly diagonally dominant and singular. Then
there exists a vector u ∈ C^n with u ≠ 0 such that
Au = 0. (1.6.22)
Let
u = \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix}^T.
We first show that the entries of u cannot all be equal in magnitude. Let
us assume that they are, i.e.
c = |u_1| = |u_2| = ⋯ = |u_n|.
Since u ≠ 0, c ≠ 0. Now for any row i in (1.6.22), we have
\sum_{j=1}^{n} a_{ij} u_j = 0
=⇒ \sum_{j=1}^{n} ± a_{ij} c = 0
=⇒ \sum_{j=1}^{n} ± a_{ij} = 0
=⇒ ∓ a_{ii} = \sum_{j ≠ i} ± a_{ij}
=⇒ |a_{ii}| = \left| \sum_{j ≠ i} ± a_{ij} \right|
=⇒ |a_{ii}| ≤ \sum_{j ≠ i} |a_{ij}| (using the triangle inequality)
but this contradicts our assumption that A is strictly diagonally dominant.
Thus the entries of u cannot all be equal in magnitude.
Let us now assume that the largest entry in u lies at index i with
|u_i| = c. Without loss of generality we can scale u down by c to
get another vector in which all entries are less than or equal to 1 in
magnitude while the i-th entry is ±1, i.e. u_i = ±1 and |u_j| ≤ 1 for all
other entries.
Now from (1.6.22) we get, for the i-th row,
\sum_{j=1}^{n} a_{ij} u_j = 0
=⇒ ± a_{ii} = \sum_{j ≠ i} u_j a_{ij}
=⇒ |a_{ii}| ≤ \sum_{j ≠ i} |u_j a_{ij}| ≤ \sum_{j ≠ i} |a_{ij}|
which again contradicts our assumption that A is strictly diagonally
dominant.
Hence strictly diagonally dominant matrices are non-singular. □
1.6.10. Gershgorin’s theorem
We are now ready to examine Gershgorin's theorem, which provides very
useful bounds on the spectrum of a square matrix.
Theorem 1.94 Every eigen value λ of a square matrix A ∈ C^{n×n}
satisfies
|λ − a_{ii}| ≤ \sum_{j ≠ i} |a_{ij}| for some i ∈ {1, 2, …, n}. (1.6.23)
Proof. The proof is a straightforward application of the non-singularity of strictly diagonally dominant matrices.

We know that for an eigen value $\lambda$, $\det(\lambda I - A) = 0$, i.e. the matrix $(\lambda I - A)$ is singular. Hence it cannot be strictly diagonally dominant, due to theorem 1.93.

Thus, looking at each row $i$ of $(\lambda I - A)$, we can say that
$$|\lambda - a_{ii}| > \sum_{j \neq i} |a_{ij}|$$
cannot be true for all rows simultaneously, i.e. it must fail for at least one row. This means that there exists at least one row $i$ for which
$$|\lambda - a_{ii}| \le \sum_{j \neq i} |a_{ij}|$$
holds true. ∎
What this theorem means is pretty simple. Consider a disc in the complex plane for the $i$-th row of $A$ whose center is given by $a_{ii}$ and whose radius is given by $r_i = \sum_{j\neq i} |a_{ij}|$, i.e. the sum of the magnitudes of all non-diagonal entries in the $i$-th row.
There are n such discs corresponding to n rows in A. (1.6.23) means
that every eigen value must lie within the union of these discs. It
cannot lie outside.
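The disc containment is easy to verify numerically. A sketch using numpy (variable names are ours):

```python
import numpy as np

A = np.array([[-4, -2, -1,  0],
              [-4,  7,  2,  0],
              [ 3, -4,  9,  1],
              [ 2, -1, -3, 15]], dtype=float)

centers = np.diag(A)                              # c_i = a_ii
radii = np.abs(A).sum(axis=1) - np.abs(centers)   # r_i = sum_{j != i} |a_ij|

eigvals = np.linalg.eigvals(A)
# each eigen value must lie in at least one disc |lambda - c_i| <= r_i
in_union = [bool(np.any(np.abs(lam - centers) <= radii)) for lam in eigvals]
```

For this strictly diagonally dominant matrix, every disc also excludes the origin, which is another way of seeing theorem 1.93.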
This idea is crystallized in the following definition.
Definition 1.37 [Gershgorin's disc] For the $i$-th row of matrix $A$ we define the radius $r_i = \sum_{j\neq i} |a_{ij}|$ and the center $c_i = a_{ii}$. Then the set given by
$$D_i = \{z \in \mathbb{C} : |z - a_{ii}| \le r_i\}$$
is called the $i$-th Gershgorin's disc of $A$.
We note that the definition is equally valid for real as well as complex matrices. For real matrices, the centers of the discs lie on the real line. For complex matrices, the centers may lie anywhere in the complex plane.
Clearly there is nothing magical about the rows of A. We can as well
consider the columns of A.
Theorem 1.95 Every eigen value of a matrix $A$ must lie in a Gershgorin disc corresponding to the columns of $A$, where the Gershgorin disc for the $j$-th column is given by
$$D_j = \{z \in \mathbb{C} : |z - a_{jj}| \le r_j\}$$
with
$$r_j = \sum_{i \neq j} |a_{ij}|.$$
Proof. We know that the eigen values of $A$ are the same as the eigen values of $A^T$, and the columns of $A$ are nothing but the rows of $A^T$. Hence the eigen values of $A$ must satisfy the conditions of theorem 1.94 w.r.t. the matrix $A^T$. This completes the proof. ∎
1.7. Singular values
In the previous section we saw the diagonalization of square matrices, which resulted in an eigen value decomposition of the matrix. This matrix factorization is very useful, yet it is not applicable in all situations. In particular, the eigen value decomposition is useless if the square matrix is not diagonalizable, or if the matrix is not square at all. Moreover, the decomposition is particularly useful only for real symmetric or Hermitian matrices, where the diagonalizing matrix is an F-unitary matrix (see definition 1.23). Otherwise, one has to consider the inverse of the diagonalizing matrix also.
Fortunately there happens to be another decomposition which applies
to all matrices and it involves just F-unitary matrices.
Definition 1.38 [Singular value] A non-negative real number $\sigma$ is a singular value for a matrix $A \in \mathbb{F}^{m\times n}$ if and only if there exist unit-length vectors $u \in \mathbb{F}^m$ and $v \in \mathbb{F}^n$ such that
$$Av = \sigma u \qquad (1.7.1)$$
and
$$A^H u = \sigma v \qquad (1.7.2)$$
hold. The vectors $u$ and $v$ are called left-singular and right-singular vectors for $\sigma$ respectively.
We first present the basic result of singular value decomposition. We
will not prove this result completely although we will present proofs of
some aspects.
Theorem 1.96 For every $A \in \mathbb{F}^{m\times n}$ with $k = \min(m,n)$, there exist two $\mathbb{F}$-unitary matrices $U \in \mathbb{F}^{m\times m}$ and $V \in \mathbb{F}^{n\times n}$ and a sequence of real numbers
$$\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_k \ge 0$$
such that
$$U^H A V = \Sigma \qquad (1.7.3)$$
where
$$\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \dots, \sigma_k) \in \mathbb{F}^{m\times n}.$$
The non-negative real numbers $\sigma_i$ are the singular values of $A$ as per definition 1.38.
The sequence of real numbers σi doesn’t depend on the particular
choice of U and V .
$\Sigma$ is rectangular with the same size as $A$. The singular values of $A$ lie on the principal diagonal of $\Sigma$. All other entries in $\Sigma$ are zero.
It is certainly possible that some of the singular values are 0 themselves.
Remark. Since $U^H A V = \Sigma$, we have
$$A = U\Sigma V^H. \qquad (1.7.4)$$
Definition 1.39 [Singular value decomposition] The decomposi-
tion of a matrix A ∈ Fm×n given by
A = UΣV H (1.7.5)
is known as its singular value decomposition.
Remark. When F is R then the decomposition simplifies to
UTAV = Σ (1.7.6)
and
A = UΣV T . (1.7.7)
Remark. Clearly there can be at most k = min(m,n) distinct singular
values of A.
Remark. We can also write
AV = UΣ. (1.7.8)
Remark. Let us expand
$$A = U\Sigma V^H = \begin{bmatrix} u_1 & u_2 & \dots & u_m \end{bmatrix} [\sigma_{ij}] \begin{bmatrix} v_1^H \\ v_2^H \\ \vdots \\ v_n^H \end{bmatrix} = \sum_{i=1}^m \sum_{j=1}^n \sigma_{ij} u_i v_j^H.$$
Remark. Alternatively, let us expand
$$\Sigma = U^H A V = \begin{bmatrix} u_1^H \\ u_2^H \\ \vdots \\ u_m^H \end{bmatrix} A \begin{bmatrix} v_1 & v_2 & \dots & v_n \end{bmatrix} = [u_i^H A v_j].$$
This gives us
$$\sigma_{ij} = u_i^H A v_j. \qquad (1.7.9)$$
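The identity $U^H A V = \Sigma$ can be checked directly with numpy. Note that `numpy.linalg.svd` returns $V^H$ (not $V$), so we transpose-conjugate to recover $V$; the matrix here is random, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

U, s, Vh = np.linalg.svd(A)          # A = U @ Sigma @ Vh, s sorted descending
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)

# U^H A V should equal the rectangular diagonal matrix Sigma
err = np.abs(U.conj().T @ A @ Vh.conj().T - Sigma).max()
```

The residual `err` is at the level of floating-point round-off.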
The following lemma verifies that $\Sigma$ indeed consists of singular values of $A$ as per definition 1.38.
Lemma 1.97 Let A = UΣV H be a singular value decomposition
of A. Then the main diagonal entries of Σ are singular values.
The first k = min(m,n) column vectors in U and V are left and
right singular vectors of A.
Proof. We have
$$AV = U\Sigma.$$
Let us expand the R.H.S.:
$$U\Sigma = \Big[\sum_{j=1}^m u_{ij}\sigma_{jk}\Big] = [u_{ik}\sigma_k] = \begin{bmatrix} \sigma_1 u_1 & \sigma_2 u_2 & \dots & \sigma_k u_k & 0 & \dots & 0 \end{bmatrix}$$
where the 0 columns at the end appear $n - k$ times. Expanding the L.H.S. we get
$$AV = \begin{bmatrix} Av_1 & Av_2 & \dots & Av_n \end{bmatrix}.$$
Thus by comparing both sides we get
$$Av_i = \sigma_i u_i \text{ for } 1 \le i \le k$$
and
$$Av_i = 0 \text{ for } k < i \le n.$$
Now let us start with
$$A = U\Sigma V^H \implies A^H = V\Sigma^H U^H \implies A^H U = V\Sigma^H.$$
Let us expand the R.H.S.:
$$V\Sigma^H = \Big[\sum_{j=1}^n v_{ij}\sigma_{jk}\Big] = [v_{ik}\sigma_k] = \begin{bmatrix} \sigma_1 v_1 & \sigma_2 v_2 & \dots & \sigma_k v_k & 0 & \dots & 0 \end{bmatrix}$$
where the 0 columns appear $m - k$ times. Expanding the L.H.S. we get
$$A^H U = \begin{bmatrix} A^H u_1 & A^H u_2 & \dots & A^H u_m \end{bmatrix}.$$
Thus by comparing both sides we get
$$A^H u_i = \sigma_i v_i \text{ for } 1 \le i \le k$$
and
$$A^H u_i = 0 \text{ for } k < i \le m.$$
We now consider the three cases.

For $m = n$, we have $k = m = n$, and we get
$$Av_i = \sigma_i u_i, \quad A^H u_i = \sigma_i v_i \ \text{ for } 1 \le i \le m.$$
Thus $\sigma_i$ is a singular value of $A$, and $u_i$ is a left singular vector while $v_i$ is a right singular vector.

For $m < n$, we have $k = m$. We get for the first $m$ vectors in $V$
$$Av_i = \sigma_i u_i, \quad A^H u_i = \sigma_i v_i \ \text{ for } 1 \le i \le m.$$
Finally, for the remaining $n - m$ vectors in $V$, we can write
$$Av_i = 0.$$
They belong to the null space of $A$.

For $m > n$, we have $k = n$. We get for the first $n$ vectors in $U$
$$Av_i = \sigma_i u_i, \quad A^H u_i = \sigma_i v_i \ \text{ for } 1 \le i \le n.$$
Finally, for the remaining $m - n$ vectors in $U$, we can write
$$A^H u_i = 0. \qquad \blacksquare$$
Lemma 1.98 $\Sigma\Sigma^H$ is an $m\times m$ matrix given by
$$\Sigma\Sigma^H = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2, 0, 0, \dots, 0)$$
where the number of 0's following $\sigma_k^2$ is $m - k$.

Lemma 1.99 $\Sigma^H\Sigma$ is an $n\times n$ matrix given by
$$\Sigma^H\Sigma = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2, 0, 0, \dots, 0)$$
where the number of 0's following $\sigma_k^2$ is $n - k$.
Lemma 1.100 [Rank and singular value decomposition] Let $A \in \mathbb{F}^{m\times n}$ have a singular value decomposition given by
$$A = U\Sigma V^H.$$
Then
$$\operatorname{rank}(A) = \operatorname{rank}(\Sigma). \qquad (1.7.10)$$
In other words, the rank of $A$ is the number of non-zero singular values of $A$. Since the singular values are ordered in descending order in $\Sigma$, the first $r$ singular values $\sigma_1, \dots, \sigma_r$ are the non-zero ones.

Proof. This is a straightforward application of lemma 1.6 and lemma 1.7. Further, since the only non-zero values in $\Sigma$ appear on its main diagonal, its rank is the number of non-zero singular values $\sigma_i$. ∎
Corollary 1.101. Let $r = \operatorname{rank}(A)$. Then $\Sigma$ can be split as a block matrix
$$\Sigma = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} \qquad (1.7.11)$$
where $\Sigma_r = \operatorname{diag}(\sigma_1, \sigma_2, \dots, \sigma_r)$ is an $r\times r$ diagonal matrix of the non-zero singular values. All other sub-matrices in $\Sigma$ are 0.
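Lemma 1.100 suggests a practical way to compute rank: count the singular values above a small tolerance. A sketch (the matrix is an arbitrary rank-2 construction of ours):

```python
import numpy as np

# A rank-2 matrix built from two outer products of independent vectors
x = np.arange(1.0, 5.0)          # [1, 2, 3, 4]
y = np.ones(4)
A = np.outer(x, x) + np.outer(y, np.arange(4.0))

s = np.linalg.svd(A, compute_uv=False)
numerical_rank = int(np.sum(s > 1e-10 * s[0]))   # non-zero singular values
```

This is essentially what `numpy.linalg.matrix_rank` does internally, with its own default tolerance.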
Lemma 1.102 The eigen values of the Hermitian matrix $A^H A \in \mathbb{F}^{n\times n}$ are $\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2, 0, 0, \dots, 0$ with $n - k$ 0's after $\sigma_k^2$. Moreover, the eigen vectors are the columns of $V$.

Proof.
$$A^H A = \left(U\Sigma V^H\right)^H U\Sigma V^H = V\Sigma^H U^H U\Sigma V^H = V\Sigma^H\Sigma V^H.$$
We note that $A^H A$ is Hermitian. Hence $A^H A$ is diagonalized by $V$ and the diagonalization of $A^H A$ is $\Sigma^H\Sigma$. Thus the eigen values of $A^H A$ are $\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2, 0, 0, \dots, 0$ with $n - k$ 0's after $\sigma_k^2$. Clearly
$$(A^H A)V = V(\Sigma^H\Sigma),$$
thus the columns of $V$ are the eigen vectors of $A^H A$. ∎
Lemma 1.103 The eigen values of the Hermitian matrix $AA^H \in \mathbb{F}^{m\times m}$ are $\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2, 0, 0, \dots, 0$ with $m - k$ 0's after $\sigma_k^2$. Moreover, the eigen vectors are the columns of $U$.

Proof.
$$AA^H = U\Sigma V^H\left(U\Sigma V^H\right)^H = U\Sigma V^H V\Sigma^H U^H = U\Sigma\Sigma^H U^H.$$
We note that $AA^H$ is Hermitian. Hence $AA^H$ is diagonalized by $U$ and the diagonalization of $AA^H$ is $\Sigma\Sigma^H$. Thus the eigen values of $AA^H$ are $\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2, 0, 0, \dots, 0$ with $m - k$ 0's after $\sigma_k^2$. Clearly
$$(AA^H)U = U(\Sigma\Sigma^H),$$
thus the columns of $U$ are the eigen vectors of $AA^H$. ∎
Lemma 1.104 The Gram matrices $AA^H$ and $A^H A$ share the same eigen values except for some extra 0's. Their eigen values are the squares of the singular values of $A$, together with some extra 0's. In other words, the singular values of $A$ are the square roots of the non-zero eigen values of the Gram matrices $AA^H$ or $A^H A$.
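This relationship between the Gram matrices and the singular values is easy to confirm numerically; here is a sketch for a random real $5\times 3$ matrix (so $k = 3$ and $AA^H$ carries $m - k = 2$ extra zero eigen values):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

s = np.linalg.svd(A, compute_uv=False)                 # k = min(5, 3) = 3
eig_AHA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # 3 eigen values
eig_AAH = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]   # 5 eigen values

# eigen values of A^H A are the squared singular values;
# A A^H has the same ones plus two (near-)zero eigen values
err1 = np.abs(eig_AHA - s**2).max()
err2 = np.abs(eig_AAH[:3] - s**2).max()
```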
1.7.1. The largest singular value
Lemma 1.105 For all $u \in \mathbb{F}^n$ the following holds:
$$\|\Sigma u\|_2 \le \sigma_1 \|u\|_2. \qquad (1.7.12)$$
Moreover, for all $u \in \mathbb{F}^m$ the following holds:
$$\|\Sigma^H u\|_2 \le \sigma_1 \|u\|_2. \qquad (1.7.13)$$

Proof. Let us expand the term $\Sigma u$:
$$\Sigma u = \begin{bmatrix} \sigma_1 u_1 & \sigma_2 u_2 & \dots & \sigma_k u_k & 0 & \dots & 0 \end{bmatrix}^T.$$
Now since $\sigma_1$ is the largest singular value,
$$|\sigma_i u_i| \le |\sigma_1 u_i| \quad \forall\, 1 \le i \le k.$$
Thus
$$\sum_{i=1}^{n} |\sigma_1 u_i|^2 \ge \sum_{i=1}^{n} |\sigma_i u_i|^2$$
(with the convention $\sigma_i = 0$ for $i > k$), or
$$\sigma_1^2 \|u\|_2^2 \ge \|\Sigma u\|_2^2.$$
The result follows.

A simpler representation of $\Sigma u$ can be given using corollary 1.101. Let $r = \operatorname{rank}(A)$. Thus
$$\Sigma = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix}.$$
We split the entries in $u$ as $u = [(u_1, \dots, u_r)\ (u_{r+1}, \dots, u_n)]^T$. Then
$$\Sigma u = \begin{bmatrix} \Sigma_r \begin{bmatrix} u_1 & \dots & u_r \end{bmatrix}^T \\ 0 \end{bmatrix} = \begin{bmatrix} \sigma_1 u_1 & \sigma_2 u_2 & \dots & \sigma_r u_r & 0 & \dots & 0 \end{bmatrix}^T.$$
Thus
$$\|\Sigma u\|_2^2 = \sum_{i=1}^r |\sigma_i u_i|^2 \le \sigma_1^2 \sum_{i=1}^r |u_i|^2 \le \sigma_1^2 \|u\|_2^2.$$
The second result can be proven similarly. ∎
Lemma 1.106 Let σ1 be the largest singular value of an m × n
matrix A. Then
‖Ax‖2 ≤ σ1‖x‖2 ∀ x ∈ Fn. (1.7.14)
Moreover
‖AHx‖2 ≤ σ1‖x‖2 ∀ x ∈ Fm. (1.7.15)
Proof.
$$\|Ax\|_2 = \|U\Sigma V^H x\|_2 = \|\Sigma V^H x\|_2$$
since $U$ is unitary. Now from the previous lemma we have
$$\|\Sigma V^H x\|_2 \le \sigma_1 \|V^H x\|_2 = \sigma_1 \|x\|_2$$
since $V^H$ is also unitary. Thus we get the result
$$\|Ax\|_2 \le \sigma_1 \|x\|_2 \quad \forall\, x \in \mathbb{F}^n.$$
Similarly,
$$\|A^H x\|_2 = \|V\Sigma^H U^H x\|_2 = \|\Sigma^H U^H x\|_2$$
since $V$ is unitary. Now from the previous lemma we have
$$\|\Sigma^H U^H x\|_2 \le \sigma_1 \|U^H x\|_2 = \sigma_1 \|x\|_2$$
since $U^H$ is also unitary. Thus we get the result
$$\|A^H x\|_2 \le \sigma_1 \|x\|_2 \quad \forall\, x \in \mathbb{F}^m. \qquad \blacksquare$$
There is a direct connection between the largest singular value and
2-norm of a matrix (see section 1.8.6).
Corollary 1.107. The largest singular value of $A$ is nothing but its 2-norm, i.e.
$$\sigma_1 = \max_{\|u\|_2 = 1} \|Au\|_2.$$
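This corollary can be illustrated numerically: numpy's matrix 2-norm agrees with the largest singular value, and no unit vector we sample stretches further. A sketch (the sampling is our illustration, not a proof):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))

sigma1 = np.linalg.svd(A, compute_uv=False)[0]
spectral = np.linalg.norm(A, 2)                  # numpy's matrix 2-norm

# sample the unit sphere: no ||Au||_2 should exceed sigma1
X = rng.standard_normal((6, 1000))
X /= np.linalg.norm(X, axis=0)
sampled_max = np.linalg.norm(A @ X, axis=0).max()
```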
1.7.2. SVD and pseudo inverse
Lemma 1.108 [Pseudo-inverse of Σ] Let $A = U\Sigma V^H$ and let $r = \operatorname{rank}(A)$. Let $\sigma_1, \dots, \sigma_r$ be the $r$ non-zero singular values of $A$. Then the Moore-Penrose pseudo-inverse of $\Sigma$ is the $n\times m$ matrix $\Sigma^\dagger$ given by
$$\Sigma^\dagger = \begin{bmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{bmatrix} \qquad (1.7.16)$$
where $\Sigma_r = \operatorname{diag}(\sigma_1, \dots, \sigma_r)$.

Essentially, $\Sigma^\dagger$ is obtained by transposing $\Sigma$ and inverting all its non-zero (positive real) entries.

Proof. Straightforward application of lemma 1.32. ∎
Corollary 1.109. The ranks of $\Sigma$ and its pseudo-inverse $\Sigma^\dagger$ are the same, i.e.
$$\operatorname{rank}(\Sigma) = \operatorname{rank}(\Sigma^\dagger). \qquad (1.7.17)$$
Proof. The numbers of non-zero diagonal entries in $\Sigma$ and $\Sigma^\dagger$ are the same. ∎
Lemma 1.110 Let A be an m× n matrix and let A = UΣV H be
its singular value decomposition. Let Σ† be the pseudo inverse of
Σ as per lemma 1.108. Then the Moore-Penrose pseudo-inverse of
A is given by
A† = V Σ†UH . (1.7.18)
Proof. As usual we verify the requirements for a Moore-Penrose pseudo-inverse as per definition 1.19. We note that since $\Sigma^\dagger$ is the pseudo-inverse of $\Sigma$, it already satisfies the necessary criteria.

First requirement:
$$AA^\dagger A = U\Sigma V^H V\Sigma^\dagger U^H U\Sigma V^H = U\Sigma\Sigma^\dagger\Sigma V^H = U\Sigma V^H = A.$$
Second requirement:
$$A^\dagger AA^\dagger = V\Sigma^\dagger U^H U\Sigma V^H V\Sigma^\dagger U^H = V\Sigma^\dagger\Sigma\Sigma^\dagger U^H = V\Sigma^\dagger U^H = A^\dagger.$$
We now consider
$$AA^\dagger = U\Sigma V^H V\Sigma^\dagger U^H = U\Sigma\Sigma^\dagger U^H.$$
Thus
$$\left(AA^\dagger\right)^H = \left(U\Sigma\Sigma^\dagger U^H\right)^H = U\left(\Sigma\Sigma^\dagger\right)^H U^H = U\Sigma\Sigma^\dagger U^H = AA^\dagger$$
since $\Sigma\Sigma^\dagger$ is Hermitian.

Finally we consider
$$A^\dagger A = V\Sigma^\dagger U^H U\Sigma V^H = V\Sigma^\dagger\Sigma V^H.$$
Thus
$$\left(A^\dagger A\right)^H = \left(V\Sigma^\dagger\Sigma V^H\right)^H = V\left(\Sigma^\dagger\Sigma\right)^H V^H = V\Sigma^\dagger\Sigma V^H = A^\dagger A$$
since $\Sigma^\dagger\Sigma$ is also Hermitian.

This completes the proof. ∎
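The construction $A^\dagger = V\Sigma^\dagger U^H$ can be reproduced in a few lines of numpy and compared against `numpy.linalg.pinv`; the Moore-Penrose conditions of the proof are checked directly (a sketch assuming a full column rank random matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))                  # full column rank (a.s.)

U, s, Vh = np.linalg.svd(A, full_matrices=True)
Sigma_pinv = np.zeros((3, 5))
Sigma_pinv[:3, :3] = np.diag(1.0 / s)            # transpose Sigma, invert non-zeros
A_pinv = Vh.conj().T @ Sigma_pinv @ U.conj().T   # A† = V Σ† U^H

err_vs_numpy = np.abs(A_pinv - np.linalg.pinv(A)).max()
mp1 = np.abs(A @ A_pinv @ A - A).max()           # A A† A = A
mp2 = np.abs(A_pinv @ A @ A_pinv - A_pinv).max() # A† A A† = A†
```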
We can also connect the rank of $A$ with the rank of its pseudo-inverse.

Corollary 1.111. The ranks of any $m\times n$ matrix $A$ and its pseudo-inverse $A^\dagger$ are the same, i.e.
$$\operatorname{rank}(A) = \operatorname{rank}(A^\dagger). \qquad (1.7.19)$$
Proof. We have $\operatorname{rank}(A) = \operatorname{rank}(\Sigma)$. Also, it is easy to verify that $\operatorname{rank}(A^\dagger) = \operatorname{rank}(\Sigma^\dagger)$. Using corollary 1.109 completes the proof. ∎
Lemma 1.112 Let $A$ be an $m\times n$ matrix and let $A^\dagger$ be its $n\times m$ pseudo-inverse as per lemma 1.110. Let $k = \min(m,n)$ denote the number of singular values and let $r = \operatorname{rank}(A)$ denote the number of non-zero singular values of $A$. Let $\sigma_1, \dots, \sigma_r$ be the non-zero singular values of $A$. Then the number of singular values of $A^\dagger$ is the same as that of $A$, the non-zero singular values of $A^\dagger$ are
$$\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_r},$$
while all other $k - r$ singular values of $A^\dagger$ are zero.

Proof. $k = \min(m,n)$ is the number of singular values of both $A$ and $A^\dagger$. Since the ranks of $A$ and $A^\dagger$ are the same, the numbers of their non-zero singular values are the same. Now look at
$$A^\dagger = V\Sigma^\dagger U^H$$
where
$$\Sigma^\dagger = \begin{bmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{bmatrix}.$$
Clearly $\Sigma_r^{-1} = \operatorname{diag}\left(\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_r}\right)$. Thus, expanding the R.H.S., we get
$$A^\dagger = \sum_{i=1}^r \frac{1}{\sigma_i} v_i u_i^H$$
where $v_i$ and $u_i$ are the first $r$ columns of $V$ and $U$ respectively. If we reverse the order of the first $r$ columns of $U$ and $V$ and reverse the first $r$ diagonal entries of $\Sigma^\dagger$, the R.H.S. remains the same while $A^\dagger$ is expressed in the standard singular value decomposition form (the diagonal entries $1/\sigma_r \ge \dots \ge 1/\sigma_1$ now appear in descending order). Thus $\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_r}$ are indeed the non-zero singular values of $A^\dagger$. ∎
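The reciprocal relationship between the singular values of $A$ and $A^\dagger$ is easy to spot-check (a sketch with a random full rank matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))

s = np.linalg.svd(A, compute_uv=False)                  # descending
s_pinv = np.linalg.svd(np.linalg.pinv(A), compute_uv=False)

# the singular values of A† are the reciprocals 1/sigma_i (here r = k = 3)
err = np.abs(np.sort(s_pinv) - np.sort(1.0 / s)).max()
```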
1.7.3. Full column rank matrices
In this subsection we consider some specific results related to singular
value decomposition of a full column rank matrix.
We will consider $A$ to be an $m\times n$ matrix in $\mathbb{F}^{m\times n}$ with $m \ge n$ and $\operatorname{rank}(A) = n$. Let $A = U\Sigma V^H$ be its singular value decomposition. From lemma 1.100 we observe that there are $n$ non-zero singular values of $A$, which we denote by $\sigma_1, \sigma_2, \dots, \sigma_n$. We define
$$\Sigma_n = \operatorname{diag}(\sigma_1, \sigma_2, \dots, \sigma_n).$$
Clearly $\Sigma$ is a $2\times 1$ block matrix given by
$$\Sigma = \begin{bmatrix} \Sigma_n \\ 0 \end{bmatrix}$$
where the lower $0$ is an $(m-n)\times n$ zero matrix. From here we obtain that $\Sigma^H\Sigma$ is the $n\times n$ matrix
$$\Sigma^H\Sigma = \Sigma_n^2$$
where
$$\Sigma_n^2 = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2).$$
Lemma 1.113 Let $A$ be a full column rank matrix with singular value decomposition $A = U\Sigma V^H$. Then $\Sigma^H\Sigma = \Sigma_n^2 = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2)$ and $\Sigma^H\Sigma$ is invertible.

Proof. Since all singular values are non-zero, $\Sigma_n^2$ is invertible. Thus
$$\left(\Sigma^H\Sigma\right)^{-1} = \left(\Sigma_n^2\right)^{-1} = \operatorname{diag}\left(\frac{1}{\sigma_1^2}, \frac{1}{\sigma_2^2}, \dots, \frac{1}{\sigma_n^2}\right). \qquad (1.7.20) \qquad \blacksquare$$
Lemma 1.114 Let $A$ be a full column rank matrix with singular value decomposition $A = U\Sigma V^H$. Let $\sigma_1$ be its largest singular value and $\sigma_n$ its smallest singular value. Then
$$\sigma_n^2\|x\|_2 \le \|\Sigma^H\Sigma x\|_2 \le \sigma_1^2\|x\|_2 \quad \forall\, x \in \mathbb{F}^n. \qquad (1.7.21)$$

Proof. Let $x \in \mathbb{F}^n$. We have
$$\|\Sigma^H\Sigma x\|_2^2 = \|\Sigma_n^2 x\|_2^2 = \sum_{i=1}^n |\sigma_i^2 x_i|^2.$$
Now since
$$\sigma_n \le \sigma_i \le \sigma_1,$$
we have
$$\sigma_n^4 \sum_{i=1}^n |x_i|^2 \le \sum_{i=1}^n |\sigma_i^2 x_i|^2 \le \sigma_1^4 \sum_{i=1}^n |x_i|^2,$$
thus
$$\sigma_n^4\|x\|_2^2 \le \|\Sigma^H\Sigma x\|_2^2 \le \sigma_1^4\|x\|_2^2.$$
Applying square roots, we get
$$\sigma_n^2\|x\|_2 \le \|\Sigma^H\Sigma x\|_2 \le \sigma_1^2\|x\|_2 \quad \forall\, x \in \mathbb{F}^n. \qquad \blacksquare$$
We recall from corollary 1.25 that the Gram matrix of the column vectors of $A$, $G = A^H A$, is full rank and invertible.
Lemma 1.115 Let $A$ be a full column rank matrix with singular value decomposition $A = U\Sigma V^H$. Let $\sigma_1$ be its largest singular value and $\sigma_n$ its smallest singular value. Then
$$\sigma_n^2\|x\|_2 \le \|A^H A x\|_2 \le \sigma_1^2\|x\|_2 \quad \forall\, x \in \mathbb{F}^n. \qquad (1.7.22)$$

Proof.
$$A^H A = (U\Sigma V^H)^H(U\Sigma V^H) = V\Sigma^H\Sigma V^H.$$
Let $x \in \mathbb{F}^n$. Let
$$u = V^H x \implies \|u\|_2 = \|x\|_2.$$
Let
$$r = \Sigma^H\Sigma u.$$
Then from the previous lemma we have
$$\sigma_n^2\|u\|_2 \le \|\Sigma^H\Sigma u\|_2 = \|r\|_2 \le \sigma_1^2\|u\|_2.$$
Finally,
$$A^H A x = V\Sigma^H\Sigma V^H x = Vr.$$
Thus
$$\|A^H A x\|_2 = \|r\|_2.$$
Substituting, we get
$$\sigma_n^2\|x\|_2 \le \|A^H A x\|_2 \le \sigma_1^2\|x\|_2 \quad \forall\, x \in \mathbb{F}^n. \qquad \blacksquare$$
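The two-sided bound of lemma 1.115 can be sampled numerically; here is a sketch for a random full column rank matrix (with a small slack for floating-point error):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3))                  # full column rank (a.s.)
s = np.linalg.svd(A, compute_uv=False)
s1, sn = s[0], s[-1]

X = rng.standard_normal((3, 500))                # 500 random test vectors
norms_x = np.linalg.norm(X, axis=0)
norms_Gx = np.linalg.norm(A.T @ A @ X, axis=0)   # ||A^H A x||_2 column-wise

lower_ok = bool(np.all(sn**2 * norms_x <= norms_Gx + 1e-9))
upper_ok = bool(np.all(norms_Gx <= s1**2 * norms_x + 1e-9))
```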
There are bounds for the inverse of the Gram matrix also. First let us establish the inverse of the Gram matrix.
Lemma 1.116 Let $A$ be a full column rank matrix with singular value decomposition $A = U\Sigma V^H$. Let the singular values of $A$ be $\sigma_1, \dots, \sigma_n$. Let the Gram matrix of the columns of $A$ be $G = A^H A$. Then
$$G^{-1} = V\Psi V^H$$
where
$$\Psi = \operatorname{diag}\left(\frac{1}{\sigma_1^2}, \frac{1}{\sigma_2^2}, \dots, \frac{1}{\sigma_n^2}\right).$$

Proof. We have
$$G = V\Sigma^H\Sigma V^H.$$
Thus
$$G^{-1} = \left(V\Sigma^H\Sigma V^H\right)^{-1} = \left(V^H\right)^{-1}\left(\Sigma^H\Sigma\right)^{-1}V^{-1} = V\left(\Sigma^H\Sigma\right)^{-1}V^H.$$
From lemma 1.113 we have
$$\Psi = \left(\Sigma^H\Sigma\right)^{-1} = \operatorname{diag}\left(\frac{1}{\sigma_1^2}, \frac{1}{\sigma_2^2}, \dots, \frac{1}{\sigma_n^2}\right).$$
This completes the proof. ∎
We can now state the bounds:
Lemma 1.117 Let $A$ be a full column rank matrix with singular value decomposition $A = U\Sigma V^H$. Let $\sigma_1$ be its largest singular value and $\sigma_n$ its smallest singular value. Then
$$\frac{1}{\sigma_1^2}\|x\|_2 \le \left\|\left(A^H A\right)^{-1}x\right\|_2 \le \frac{1}{\sigma_n^2}\|x\|_2 \quad \forall\, x \in \mathbb{F}^n. \qquad (1.7.23)$$

Proof. From lemma 1.116 we have
$$G^{-1} = \left(A^H A\right)^{-1} = V\Psi V^H$$
where
$$\Psi = \operatorname{diag}\left(\frac{1}{\sigma_1^2}, \frac{1}{\sigma_2^2}, \dots, \frac{1}{\sigma_n^2}\right).$$
Let $x \in \mathbb{F}^n$. Let
$$u = V^H x \implies \|u\|_2 = \|x\|_2.$$
Let
$$r = \Psi u.$$
Then
$$\|r\|_2^2 = \sum_{i=1}^n \left|\frac{1}{\sigma_i^2}u_i\right|^2.$$
Thus
$$\frac{1}{\sigma_1^2}\|u\|_2 \le \|\Psi u\|_2 = \|r\|_2 \le \frac{1}{\sigma_n^2}\|u\|_2.$$
Finally,
$$\left(A^H A\right)^{-1}x = V\Psi V^H x = Vr.$$
Thus
$$\left\|\left(A^H A\right)^{-1}x\right\|_2 = \|r\|_2.$$
Substituting we get the result. ∎
1.7.4. Low rank approximation of a matrix
Definition 1.40 An $m\times n$ matrix $A$ is called low rank if
$$\operatorname{rank}(A) \ll \min(m, n). \qquad (1.7.24)$$
Remark. A matrix is low rank if the number of non-zero singular values of the matrix is much smaller than its dimensions.
The following is a simple procedure for making a low rank approximation of a given matrix $A$.

(1) Perform the singular value decomposition of $A$, given by $A = U\Sigma V^H$.
(2) Identify the singular values of $A$ in $\Sigma$.
(3) Keep the first $r$ singular values (where $r \ll \min(m,n)$ is the rank of the approximation) and set all other singular values to 0 to obtain $\hat{\Sigma}$.
(4) Compute $\hat{A} = U\hat{\Sigma} V^H$.
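The four steps above amount to a truncated SVD; a minimal sketch (the helper name is ours; the observation that the spectral-norm error equals $\sigma_{r+1}$ is the well-known Eckart-Young property, not stated in the text):

```python
import numpy as np

def low_rank_approx(A, r):
    """Rank-r approximation of A via truncated SVD."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vh[:r, :]   # U Sigma_hat V^H

rng = np.random.default_rng(6)
A = rng.standard_normal((8, 6))
A2 = low_rank_approx(A, 2)

s = np.linalg.svd(A, compute_uv=False)
err = np.linalg.norm(A - A2, 2)             # spectral-norm error = sigma_3
```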
1.8. Matrix norms
This section reviews various matrix norms on the vector space of com-
plex matrices over the field of complex numbers (Cm×n,C).
We know (Cm×n,C) is a finite dimensional vector space with dimension
mn. We will usually refer to it as Cm×n.
Matrix norms will follow the usual definition of norms for a vector
space.
Definition 1.41 A function $\|\cdot\| : \mathbb{C}^{m\times n} \to \mathbb{R}$ is called a matrix norm on $\mathbb{C}^{m\times n}$ if for all $A, B \in \mathbb{C}^{m\times n}$ and all $\alpha \in \mathbb{C}$ it satisfies the following.

Positivity:
$$\|A\| \ge 0$$
with $\|A\| = 0 \iff A = 0$.

Homogeneity:
$$\|\alpha A\| = |\alpha|\|A\|.$$

Triangle inequality:
$$\|A + B\| \le \|A\| + \|B\|.$$
We recall some of the standard results on normed vector spaces.
All matrix norms are equivalent. Let $\|\cdot\|$ and $\|\cdot\|'$ be two different matrix norms on $\mathbb{C}^{m\times n}$. Then there exist two constants $a, b > 0$ such that the following holds:
$$a\|A\| \le \|A\|' \le b\|A\| \quad \forall\, A \in \mathbb{C}^{m\times n}.$$
A matrix norm is a continuous function ‖ · ‖ : Cm×n → R.
1.8.1. Norms like lp on Cn
The following norms are quite like the $l_p$ norms on the finite dimensional complex vector space $\mathbb{C}^n$. They arise from the fact that the matrix vector space $\mathbb{C}^{m\times n}$ has a one-to-one correspondence with the complex vector space $\mathbb{C}^{mn}$.
Definition 1.42 Let $A \in \mathbb{C}^{m\times n}$ and $A = [a_{ij}]$. The matrix sum norm is defined as
$$\|A\|_S = \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|. \qquad (1.8.1)$$

Definition 1.43 Let $A \in \mathbb{C}^{m\times n}$ and $A = [a_{ij}]$. The matrix Frobenius norm is defined as
$$\|A\|_F = \left(\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2\right)^{\frac{1}{2}}. \qquad (1.8.2)$$

Definition 1.44 Let $A \in \mathbb{C}^{m\times n}$ and $A = [a_{ij}]$. The matrix max norm is defined as
$$\|A\|_M = \max_{\substack{1\le i\le m \\ 1\le j\le n}} |a_{ij}|. \qquad (1.8.3)$$
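All three element-wise norms are one-liners in numpy. A quick sketch on a small matrix of our choosing:

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

sum_norm = np.abs(A).sum()                  # (1.8.1): 1 + 2 + 3 + 4 = 10
frob_norm = np.sqrt((np.abs(A)**2).sum())   # (1.8.2): sqrt(1 + 4 + 9 + 16)
max_norm = np.abs(A).max()                  # (1.8.3): 4
```

`numpy.linalg.norm(A, 'fro')` computes the same Frobenius norm directly.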
1.8.2. Properties of Frobenius norm
We now prove some elementary properties of Frobenius norm.
Lemma 1.118 The Frobenius norm of a matrix is equal to the
Frobenius norm of its Hermitian transpose.
‖AH‖F = ‖A‖F . (1.8.4)
Proof. Let
$$A = [a_{ij}].$$
Then
$$A^H = [\bar{a}_{ji}]$$
and
$$\|A^H\|_F^2 = \sum_{j=1}^n \sum_{i=1}^m |a_{ij}|^2 = \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2 = \|A\|_F^2.$$
Now
$$\|A^H\|_F^2 = \|A\|_F^2 \implies \|A^H\|_F = \|A\|_F. \qquad \blacksquare$$
Lemma 1.119 Let $A \in \mathbb{C}^{m\times n}$ be written as a row of column vectors
$$A = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix}.$$
Then
$$\|A\|_F^2 = \sum_{j=1}^n \|a_j\|_2^2. \qquad (1.8.5)$$

Proof. We note that
$$\|a_j\|_2^2 = \sum_{i=1}^m |a_{ij}|^2.$$
Now
$$\|A\|_F^2 = \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2 = \sum_{j=1}^n \left(\sum_{i=1}^m |a_{ij}|^2\right) = \sum_{j=1}^n \|a_j\|_2^2. \qquad \blacksquare$$
We thus showed that the square of the Frobenius norm of a matrix is nothing but the sum of squares of the $l_2$ norms of its columns.
Lemma 1.120 Let $A \in \mathbb{C}^{m\times n}$ be written as a column of row vectors
$$A = \begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix}.$$
Then
$$\|A\|_F^2 = \sum_{i=1}^m \|a_i\|_2^2. \qquad (1.8.6)$$

Proof. We note that
$$\|a_i\|_2^2 = \sum_{j=1}^n |a_{ij}|^2.$$
Now
$$\|A\|_F^2 = \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2 = \sum_{i=1}^m \|a_i\|_2^2. \qquad \blacksquare$$
We now consider how the Frobenius norm is affected by the action of unitary matrices.

Let $A$ be an arbitrary matrix in $\mathbb{C}^{m\times n}$. Let $U \in \mathbb{C}^{m\times m}$ and $V \in \mathbb{C}^{n\times n}$ be unitary matrices.

Our first result is that multiplication by a unitary matrix does not change the Frobenius norm of a matrix.
Theorem 1.121 The Frobenius norm of a matrix is invariant to pre- or post-multiplication by a unitary matrix, i.e.
$$\|UA\|_F = \|A\|_F \qquad (1.8.7)$$
and
$$\|AV\|_F = \|A\|_F. \qquad (1.8.8)$$
Proof. We can write $A$ as
$$A = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix}.$$
So
$$UA = \begin{bmatrix} Ua_1 & \dots & Ua_n \end{bmatrix}.$$
Then applying lemma 1.119, clearly
$$\|UA\|_F^2 = \sum_{j=1}^n \|Ua_j\|_2^2.$$
But we know that unitary matrices are norm preserving. Hence
$$\|Ua_j\|_2^2 = \|a_j\|_2^2.$$
Thus
$$\|UA\|_F^2 = \sum_{j=1}^n \|a_j\|_2^2 = \|A\|_F^2,$$
which implies
$$\|UA\|_F = \|A\|_F.$$
Similarly, writing $A$ as a column of row vectors
$$A = \begin{bmatrix} r_1 \\ \vdots \\ r_m \end{bmatrix},$$
we have
$$AV = \begin{bmatrix} r_1 V \\ \vdots \\ r_m V \end{bmatrix}.$$
Then applying lemma 1.120, clearly
$$\|AV\|_F^2 = \sum_{i=1}^m \|r_i V\|_2^2.$$
But we know that unitary matrices are norm preserving. Hence
$$\|r_i V\|_2^2 = \|r_i\|_2^2.$$
Thus
$$\|AV\|_F^2 = \sum_{i=1}^m \|r_i\|_2^2 = \|A\|_F^2,$$
which implies
$$\|AV\|_F = \|A\|_F.$$
An alternative approach for the second part, using the first part, takes just one line:
$$\|AV\|_F = \|(AV)^H\|_F = \|V^H A^H\|_F = \|A^H\|_F = \|A\|_F.$$
In the above we use lemma 1.118 and the fact that $V$ being unitary implies that $V^H$ is also unitary. We have already shown that pre-multiplication by a unitary matrix preserves the Frobenius norm. ∎
Theorem 1.122 Let $A \in \mathbb{C}^{m\times n}$ and $B \in \mathbb{C}^{n\times P}$ be two matrices. Then the Frobenius norm of their product is less than or equal to the product of the Frobenius norms of the matrices themselves, i.e.
$$\|AB\|_F \le \|A\|_F \|B\|_F. \qquad (1.8.9)$$
Proof. We can write $A$ as
$$A = \begin{bmatrix} a_1^T \\ \vdots \\ a_m^T \end{bmatrix}$$
where the $a_i$ are column vectors corresponding to the rows of $A$. Similarly, we can write $B$ as
$$B = \begin{bmatrix} b_1 & \dots & b_P \end{bmatrix}$$
where the $b_j$ are the column vectors of $B$. Then
$$AB = \begin{bmatrix} a_1^T \\ \vdots \\ a_m^T \end{bmatrix}\begin{bmatrix} b_1 & \dots & b_P \end{bmatrix} = \begin{bmatrix} a_1^T b_1 & \dots & a_1^T b_P \\ \vdots & \ddots & \vdots \\ a_m^T b_1 & \dots & a_m^T b_P \end{bmatrix} = [a_i^T b_j].$$
Applying the Cauchy-Schwarz inequality, we have
$$|a_i^T b_j|^2 \le \|a_i\|_2^2 \|b_j\|_2^2.$$
Now
$$\|AB\|_F^2 = \sum_{i=1}^m \sum_{j=1}^P |a_i^T b_j|^2 \le \sum_{i=1}^m \sum_{j=1}^P \|a_i\|_2^2 \|b_j\|_2^2 = \left(\sum_{i=1}^m \|a_i\|_2^2\right)\left(\sum_{j=1}^P \|b_j\|_2^2\right) = \|A\|_F^2 \|B\|_F^2,$$
which implies
$$\|AB\|_F \le \|A\|_F \|B\|_F$$
by taking square roots on both sides. ∎
Corollary 1.123. Let A ∈ Cm×n and let x ∈ Cn. Then
‖Ax‖2 ≤ ‖A‖F‖x‖2.
Proof. We note that the Frobenius norm of a column matrix is the same as the $l_2$ norm of the corresponding column vector, i.e.
$$\|x\|_F = \|x\|_2 \quad \forall\, x \in \mathbb{C}^n.$$
Now applying theorem 1.122 we have
$$\|Ax\|_2 = \|Ax\|_F \le \|A\|_F \|x\|_F = \|A\|_F \|x\|_2 \quad \forall\, x \in \mathbb{C}^n. \qquad \blacksquare$$
It turns out that the Frobenius norm is intimately related to the singular value decomposition of a matrix.

Lemma 1.124 Let $A \in \mathbb{C}^{m\times n}$ have the singular value decomposition
$$A = U\Sigma V^H$$
with singular values $\sigma_1, \dots, \sigma_k$ where $k = \min(m,n)$. Then
$$\|A\|_F = \sqrt{\sum_{i=1}^k \sigma_i^2}. \qquad (1.8.10)$$

Proof.
$$A = U\Sigma V^H \implies \|A\|_F = \|U\Sigma V^H\|_F.$$
But
$$\|U\Sigma V^H\|_F = \|\Sigma V^H\|_F = \|\Sigma\|_F$$
since $U$ and $V$ are unitary matrices (see theorem 1.121). Now the only non-zero entries in $\Sigma$ are the singular values. Hence
$$\|A\|_F = \|\Sigma\|_F = \sqrt{\sum_{i=1}^k \sigma_i^2}. \qquad \blacksquare$$
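Lemma 1.124 is another identity worth spot-checking; a sketch with a random rectangular matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 7))

s = np.linalg.svd(A, compute_uv=False)     # k = min(5, 7) = 5 singular values
err = abs(np.linalg.norm(A, 'fro') - np.sqrt(np.sum(s**2)))
```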
1.8.3. Consistency of a matrix norm
Definition 1.45 A matrix norm $\|\cdot\|$ is called consistent on $\mathbb{C}^{n\times n}$ if
$$\|AB\| \le \|A\|\|B\| \qquad (1.8.11)$$
holds true for all $A, B \in \mathbb{C}^{n\times n}$. A matrix norm $\|\cdot\|$ is called consistent if it is defined on $\mathbb{C}^{m\times n}$ for all $m, n \in \mathbb{N}$ and (1.8.11) holds for all matrices $A, B$ for which the product $AB$ is defined. A consistent matrix norm is also known as a sub-multiplicative norm.

With this definition and the result in theorem 1.122, we can see that the Frobenius norm is consistent.
1.8.4. Subordinate matrix norm
A matrix operates on vectors from one space to generate vectors in
another space. It is interesting to explore the connection between the
norm of a matrix and norms of vectors in the domain and co-domain
of a matrix.
Definition 1.46 Let $m, n \in \mathbb{N}$ be given. Let $\|\cdot\|_\alpha$ be some norm on $\mathbb{C}^m$ and $\|\cdot\|_\beta$ be some norm on $\mathbb{C}^n$. Let $\|\cdot\|$ be some norm on matrices in $\mathbb{C}^{m\times n}$. We say that $\|\cdot\|$ is subordinate to the vector norms $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ if
$$\|Ax\|_\alpha \le \|A\|\|x\|_\beta \qquad (1.8.12)$$
for all $A \in \mathbb{C}^{m\times n}$ and for all $x \in \mathbb{C}^n$. In other words, the operation of $A$ increases the length of a vector by at most a factor equal to the norm of the matrix itself.

If $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ are the same, we say that $\|\cdot\|$ is subordinate to the vector norm $\|\cdot\|_\alpha$.
We have shown earlier in corollary 1.123 that Frobenius norm is sub-
ordinate to Euclidean norm.
1.8.5. Operator norm
We now consider the maximum factor by which a matrix A can increase
the length of a vector.
Definition 1.47 Let $m, n \in \mathbb{N}$ be given. Let $\|\cdot\|_\alpha$ be some norm on $\mathbb{C}^n$ and $\|\cdot\|_\beta$ be some norm on $\mathbb{C}^m$. For $A \in \mathbb{C}^{m\times n}$ we define
$$\|A\| \triangleq \|A\|_{\alpha\to\beta} \triangleq \max_{x \neq 0} \frac{\|Ax\|_\beta}{\|x\|_\alpha}. \qquad (1.8.13)$$
The ratio $\|Ax\|_\beta / \|x\|_\alpha$ represents the factor by which the length of $x$ is scaled by the operation of $A$; we simply pick the maximum value of this scaling factor. The norm as defined above is known as the $(\alpha\to\beta)$ operator norm, the $(\alpha\to\beta)$-norm, or simply the $\alpha$-norm if $\alpha = \beta$.

Of course, we need to verify that this definition satisfies all the properties of a norm.
Clearly if $A = 0$ then $Ax = 0$ always, hence $\|A\| = 0$.

Conversely, if $\|A\| = 0$ then $\|Ax\|_\beta = 0\ \forall\, x \in \mathbb{C}^n$. In particular this is true for the unit vectors $e_i \in \mathbb{C}^n$. The $i$-th column of $A$ is given by $Ae_i$, which is $0$. Thus each column in $A$ is $0$. Hence $A = 0$.

Now consider $c \in \mathbb{C}$:
$$\|cA\| = \max_{x \neq 0} \frac{\|cAx\|_\beta}{\|x\|_\alpha} = |c| \max_{x \neq 0} \frac{\|Ax\|_\beta}{\|x\|_\alpha} = |c|\|A\|.$$

We now present some useful observations on the operator norm before we can prove the triangle inequality for the operator norm.
For any x ∈ ker(A), Ax = 0 hence we only need to consider vectors
which don’t belong to the kernel of A.
Thus we can write
$$\|A\|_{\alpha\to\beta} = \max_{x \notin \ker(A)} \frac{\|Ax\|_\beta}{\|x\|_\alpha}. \qquad (1.8.14)$$
We also note that
$$\frac{\|A(cx)\|_\beta}{\|cx\|_\alpha} = \frac{|c|\|Ax\|_\beta}{|c|\|x\|_\alpha} = \frac{\|Ax\|_\beta}{\|x\|_\alpha} \quad \forall\, c \neq 0,\ x \neq 0.$$
Thus, it is sufficient to find the maximum over unit norm vectors:
$$\|A\|_{\alpha\to\beta} = \max_{\|x\|_\alpha = 1} \|Ax\|_\beta.$$
Note that since $\|x\|_\alpha = 1$, the term in the denominator goes away.
Lemma 1.125 The $(\alpha\to\beta)$-operator norm is subordinate to the vector norms $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$, i.e.
$$\|Ax\|_\beta \le \|A\|_{\alpha\to\beta}\|x\|_\alpha. \qquad (1.8.15)$$

Proof. For $x = 0$ the inequality is trivially satisfied. Now for $x \neq 0$, by definition we have
$$\|A\|_{\alpha\to\beta} \ge \frac{\|Ax\|_\beta}{\|x\|_\alpha} \implies \|A\|_{\alpha\to\beta}\|x\|_\alpha \ge \|Ax\|_\beta. \qquad \blacksquare$$
Remark. There exists a vector $x^* \in \mathbb{C}^n$ with unit norm ($\|x^*\|_\alpha = 1$) such that
$$\|A\|_{\alpha\to\beta} = \|Ax^*\|_\beta. \qquad (1.8.16)$$

Proof. Let $x' \neq 0$ be some vector which maximizes the expression
$$\frac{\|Ax\|_\beta}{\|x\|_\alpha}.$$
(Such a maximizer exists: the ratio is unchanged by scaling, so the maximization may be restricted to the unit sphere, which is compact, and the norm is continuous.) Then
$$\|A\|_{\alpha\to\beta} = \frac{\|Ax'\|_\beta}{\|x'\|_\alpha}.$$
Now consider $x^* = \frac{x'}{\|x'\|_\alpha}$. Thus $\|x^*\|_\alpha = 1$. We know that
$$\frac{\|Ax'\|_\beta}{\|x'\|_\alpha} = \|Ax^*\|_\beta.$$
Hence
$$\|A\|_{\alpha\to\beta} = \|Ax^*\|_\beta. \qquad \blacksquare$$
We are now ready to prove the triangle inequality for the operator norm.

Lemma 1.126 The operator norm as defined in definition 1.47 satisfies the triangle inequality.

Proof. Let $A$ and $B$ be some matrices in $\mathbb{C}^{m\times n}$. Consider the operator norm of the matrix $A + B$. From the previous remarks, there exists some vector $x^* \in \mathbb{C}^n$ with $\|x^*\|_\alpha = 1$ such that
$$\|A + B\| = \|(A+B)x^*\|_\beta.$$
Now
$$\|(A+B)x^*\|_\beta = \|Ax^* + Bx^*\|_\beta \le \|Ax^*\|_\beta + \|Bx^*\|_\beta.$$
From another remark we have
$$\|Ax^*\|_\beta \le \|A\|\|x^*\|_\alpha = \|A\| \quad \text{and} \quad \|Bx^*\|_\beta \le \|B\|\|x^*\|_\alpha = \|B\|$$
since $\|x^*\|_\alpha = 1$. Hence we have
$$\|A + B\| \le \|A\| + \|B\|. \qquad \blacksquare$$
It turns out that operator norm is also consistent under certain condi-
tions.
Lemma 1.127 Let $\|\cdot\|_\alpha$ be defined over all $m \in \mathbb{N}$ and let $\|\cdot\|_\beta = \|\cdot\|_\alpha$. Then the operator norm
$$\|A\|_\alpha = \max_{x \neq 0} \frac{\|Ax\|_\alpha}{\|x\|_\alpha}$$
is consistent.

Proof. We need to show that
$$\|AB\|_\alpha \le \|A\|_\alpha\|B\|_\alpha.$$
Now
$$\|AB\|_\alpha = \max_{x \neq 0} \frac{\|ABx\|_\alpha}{\|x\|_\alpha}.$$
We note that if $Bx = 0$, then $ABx = 0$. Hence we can rewrite this as
$$\|AB\|_\alpha = \max_{Bx \neq 0} \frac{\|ABx\|_\alpha}{\|x\|_\alpha}.$$
Now if $Bx \neq 0$ then $\|Bx\|_\alpha \neq 0$. Hence
$$\frac{\|ABx\|_\alpha}{\|x\|_\alpha} = \frac{\|ABx\|_\alpha}{\|Bx\|_\alpha}\cdot\frac{\|Bx\|_\alpha}{\|x\|_\alpha}$$
and
$$\max_{Bx \neq 0} \frac{\|ABx\|_\alpha}{\|x\|_\alpha} \le \max_{Bx \neq 0} \frac{\|ABx\|_\alpha}{\|Bx\|_\alpha} \max_{Bx \neq 0} \frac{\|Bx\|_\alpha}{\|x\|_\alpha}.$$
Clearly
$$\|B\|_\alpha = \max_{Bx \neq 0} \frac{\|Bx\|_\alpha}{\|x\|_\alpha}.$$
Furthermore,
$$\max_{Bx \neq 0} \frac{\|ABx\|_\alpha}{\|Bx\|_\alpha} \le \max_{y \neq 0} \frac{\|Ay\|_\alpha}{\|y\|_\alpha} = \|A\|_\alpha.$$
Thus we have
$$\|AB\|_\alpha \le \|A\|_\alpha\|B\|_\alpha. \qquad \blacksquare$$
1.8.6. p-norm for matrices
We recall the definition of the $l_p$ norms for vectors $x \in \mathbb{C}^n$:
$$\|x\|_p = \begin{cases} \left(\sum_{i=1}^n |x_i|^p\right)^{\frac{1}{p}} & p \in [1, \infty) \\ \max_{1\le i\le n} |x_i| & p = \infty. \end{cases}$$
The operator norms ‖ · ‖p defined from lp vector norms are of specific
interest.
Definition 1.48 The p-norm for a matrix $A \in \mathbb{C}^{m\times n}$ is defined as
$$\|A\|_p \triangleq \max_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|x\|_p = 1} \|Ax\|_p \qquad (1.8.17)$$
where $\|x\|_p$ is the standard $l_p$ norm for vectors in $\mathbb{C}^m$ and $\mathbb{C}^n$.

Remark. As per lemma 1.127, p-norms for matrices are consistent norms. They are also subordinate to the $l_p$ vector norms.
Special cases are considered for p = 1, 2 and ∞.
Theorem 1.128 Let $A \in \mathbb{C}^{m\times n}$.

For $p = 1$ we have
$$\|A\|_1 \triangleq \max_{1\le j\le n} \sum_{i=1}^m |a_{ij}|. \qquad (1.8.18)$$
This is also known as the max column sum norm.

For $p = \infty$ we have
$$\|A\|_\infty \triangleq \max_{1\le i\le m} \sum_{j=1}^n |a_{ij}|. \qquad (1.8.19)$$
This is also known as the max row sum norm.

Finally, for $p = 2$ we have
$$\|A\|_2 \triangleq \sigma_1 \qquad (1.8.20)$$
where $\sigma_1$ is the largest singular value of $A$. This is also known as the spectral norm.
Proof. Let
$$A = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix}.$$
Then
$$\|Ax\|_1 = \Big\|\sum_{j=1}^n x_j a_j\Big\|_1 \le \sum_{j=1}^n \|x_j a_j\|_1 = \sum_{j=1}^n |x_j|\|a_j\|_1 \le \max_{1\le j\le n}\|a_j\|_1 \sum_{j=1}^n |x_j| = \max_{1\le j\le n}\|a_j\|_1 \|x\|_1.$$
Thus
$$\|A\|_1 = \max_{x \neq 0} \frac{\|Ax\|_1}{\|x\|_1} \le \max_{1\le j\le n}\|a_j\|_1,$$
which is the maximum column sum. We need to show that this upper bound is indeed an equality. Indeed, for any $x = e_j$, where $e_j$ is the unit vector with 1 in the $j$-th entry and 0 elsewhere,
$$\|Ae_j\|_1 = \|a_j\|_1.$$
Thus
$$\|A\|_1 \ge \|a_j\|_1 \quad \forall\, 1 \le j \le n.$$
Combining the two, we see that
$$\|A\|_1 = \max_{1\le j\le n}\|a_j\|_1.$$

For $p = \infty$, we proceed as follows:
$$\|Ax\|_\infty = \max_{1\le i\le m}\Big|\sum_{j=1}^n a_{ij}x_j\Big| \le \max_{1\le i\le m}\sum_{j=1}^n |a_{ij}||x_j| \le \max_{1\le j\le n}|x_j| \max_{1\le i\le m}\sum_{j=1}^n |a_{ij}| = \|x\|_\infty \max_{1\le i\le m}\|a^i\|_1$$
where $a^i$ denotes the $i$-th row of $A$. This shows that
$$\|A\|_\infty \le \max_{1\le i\le m}\|a^i\|_1.$$
We need to show that this is indeed an equality. Fix an $i = k$ and choose $x$ such that
$$x_j = \operatorname{sgn}(a_{kj}).$$
Clearly $\|x\|_\infty = 1$. Then
$$\|Ax\|_\infty = \max_{1\le i\le m}\Big|\sum_{j=1}^n a_{ij}x_j\Big| \ge \Big|\sum_{j=1}^n a_{kj}x_j\Big| = \sum_{j=1}^n |a_{kj}| = \|a^k\|_1.$$
Thus
$$\|A\|_\infty \ge \max_{1\le i\le m}\|a^i\|_1.$$
Combining the two inequalities, we get
$$\|A\|_\infty = \max_{1\le i\le m}\|a^i\|_1.$$

The remaining case is $p = 2$. For any vector $x$ with $\|x\|_2 = 1$,
$$\|Ax\|_2 = \|U\Sigma V^H x\|_2 = \|U(\Sigma V^H x)\|_2 = \|\Sigma V^H x\|_2$$
since the $l_2$ norm is invariant to unitary transformations. Let $v = V^H x$. Then $\|v\|_2 = \|V^H x\|_2 = \|x\|_2 = 1$. Now
$$\|Ax\|_2 = \|\Sigma v\|_2 = \left(\sum_{j=1}^n |\sigma_j v_j|^2\right)^{\frac{1}{2}} \le \sigma_1 \left(\sum_{j=1}^n |v_j|^2\right)^{\frac{1}{2}} = \sigma_1 \|v\|_2 = \sigma_1.$$
This shows that
$$\|A\|_2 \le \sigma_1.$$
Now consider the vector $x$ for which $v = V^H x = (1, 0, \dots, 0)$, i.e. $x$ is the first column of $V$. Then
$$\|Ax\|_2 = \|\Sigma v\|_2 = \sigma_1.$$
Thus
$$\|A\|_2 \ge \sigma_1.$$
Combining the two, we get that $\|A\|_2 = \sigma_1$. ∎
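The three formulas of theorem 1.128 reduce to short numpy expressions; here is a sketch on a small matrix of our choosing, cross-checked against `numpy.linalg.norm`:

```python
import numpy as np

A = np.array([[ 1.0, -2.0,  3.0],
              [-4.0,  5.0, -6.0]])

one_norm = np.abs(A).sum(axis=0).max()   # max column sum: max(5, 7, 9) = 9
inf_norm = np.abs(A).sum(axis=1).max()   # max row sum: max(6, 15) = 15
two_norm = np.linalg.svd(A, compute_uv=False)[0]   # largest singular value
```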
1.8.7. The 2-norm

Theorem 1.129 Let $A \in \mathbb{C}^{n\times n}$ have singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n$. Let the eigen values of $A$ be $\lambda_1, \lambda_2, \dots, \lambda_n$ with $|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_n|$. Then the following hold:
$$\|A\|_2 = \sigma_1 \qquad (1.8.21)$$
and, if $A$ is non-singular,
$$\|A^{-1}\|_2 = \frac{1}{\sigma_n}. \qquad (1.8.22)$$
If $A$ is symmetric and positive definite, then
$$\|A\|_2 = \lambda_1 \qquad (1.8.23)$$
and, if $A$ is non-singular,
$$\|A^{-1}\|_2 = \frac{1}{\lambda_n}. \qquad (1.8.24)$$
If $A$ is normal, then
$$\|A\|_2 = |\lambda_1| \qquad (1.8.25)$$
and, if $A$ is non-singular,
$$\|A^{-1}\|_2 = \frac{1}{|\lambda_n|}. \qquad (1.8.26)$$
1.8.8. Unitary invariant norms
Definition 1.49 A matrix norm ‖ · ‖ on Cm×n is called unitary
invariant if ‖UAV ‖ = ‖A‖ for any A ∈ Cm×n and any unitary
matrices U ∈ Cm×m and V ∈ Cn×n.
We have already seen in theorem 1.121 that Frobenius norm is unitary
invariant.
It turns out that spectral norm is also unitary invariant.
1.8.9. More properties of operator norms
In this section we will focus on operator norms connecting normed
linear spaces (Cn, ‖ · ‖p) and (Cm, ‖ · ‖q). Typical values of p, q would
be in {1, 2,∞}.
We recall that
$$\|A\|_{p\to q} = \max_{x \neq 0} \frac{\|Ax\|_q}{\|x\|_p} = \max_{\|x\|_p = 1} \|Ax\|_q = \max_{\|x\|_p \le 1} \|Ax\|_q. \qquad (1.8.27)$$
Table 1 [5] shows how to compute different $(p\to q)$ norms. Some can be computed easily while others are NP-hard to compute.

Table 1. Typical $(p\to q)$ norms

p   q   $\|A\|_{p\to q}$         Calculation
1   1   $\|A\|_1$                Maximum $l_1$ norm of a column
1   2   $\|A\|_{1\to 2}$         Maximum $l_2$ norm of a column
1   ∞   $\|A\|_{1\to\infty}$     Maximum absolute entry of the matrix
2   1   $\|A\|_{2\to 1}$         NP-hard
2   2   $\|A\|_2$                Maximum singular value
2   ∞   $\|A\|_{2\to\infty}$     Maximum $l_2$ norm of a row
∞   1   $\|A\|_{\infty\to 1}$    NP-hard
∞   2   $\|A\|_{\infty\to 2}$    NP-hard
∞   ∞   $\|A\|_\infty$           Maximum $l_1$ norm of a row
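The easily computable entries of the table are one-liners in numpy. A sketch (the brute-force check exploits the fact that the extreme points of the $l_1$ unit ball are the signed unit vectors $\pm e_j$, so $\|A\|_{1\to 2}$ is attained at a column):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 5))

col_l2 = np.linalg.norm(A, axis=0)
row_l2 = np.linalg.norm(A, axis=1)

norm_1_to_2 = col_l2.max()               # max l2 norm of a column
norm_2_to_inf = row_l2.max()             # max l2 norm of a row
norm_1_to_inf = np.abs(A).max()          # max absolute entry

# check ||A||_{1->2} at the extreme points of the l1 ball: the columns of A
brute = max(np.linalg.norm(A @ e) for e in np.eye(5))
```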
The topological dual of the finite dimensional normed linear space $(\mathbb{C}^n, \|\cdot\|_p)$ is the normed linear space $(\mathbb{C}^n, \|\cdot\|_{p'})$ where
$$\frac{1}{p} + \frac{1}{p'} = 1.$$
The $l_2$-norm is the dual of the $l_2$-norm; it is self-dual. The $l_1$-norm and $l_\infty$-norm are duals of each other.

When a matrix $A$ maps from the space $(\mathbb{C}^n, \|\cdot\|_p)$ to the space $(\mathbb{C}^m, \|\cdot\|_q)$, we can view its conjugate transpose $A^H$ as a mapping from the space $(\mathbb{C}^m, \|\cdot\|_{q'})$ to $(\mathbb{C}^n, \|\cdot\|_{p'})$.
Theorem 1.130 Operator norm of a matrix always equals the op-
erator norm of its conjugate transpose. i.e.
‖A‖p→q = ‖AH‖q′→p′ (1.8.28)
where1
p+
1
p′= 1,
1
q+
1
q′= 1.
88 1. MATRIX ALGEBRA
Specific applications of this result are:
‖A‖2 = ‖AH‖2. (1.8.29)
This is obvious since the maximum singular value of a matrix and its
conjugate transpose are same.
‖A‖1 = ‖AH‖∞, ‖A‖∞ = ‖AH‖1. (1.8.30)
This is also obvious since the maximum absolute column sum of A is the
same as the maximum absolute row sum of AH and vice versa.
‖A‖1→∞ = ‖AH‖1→∞. (1.8.31)
‖A‖1→2 = ‖AH‖2→∞. (1.8.32)
‖A‖∞→2 = ‖AH‖2→1. (1.8.33)
We now need to show the result for the general case (arbitrary
1 ≤ p, q ≤ ∞).
Proof. TODO □
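Although the general proof is omitted here, the special cases (1.8.29)–(1.8.33) can be spot-checked numerically with the closed forms of table 1. A NumPy sketch with a random complex matrix (a check, not a proof):

```python
# Spot-check of the adjoint identities (1.8.30)-(1.8.33) using the
# closed-form norm expressions from Table 1.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))
AH = A.conj().T

max_col_l1 = lambda M: np.abs(M).sum(axis=0).max()      # ||.||_1
max_row_l1 = lambda M: np.abs(M).sum(axis=1).max()      # ||.||_inf
max_col_l2 = lambda M: np.linalg.norm(M, axis=0).max()  # ||.||_{1->2}
max_row_l2 = lambda M: np.linalg.norm(M, axis=1).max()  # ||.||_{2->inf}

assert np.isclose(max_col_l1(A), max_row_l1(AH))    # ||A||_1 = ||A^H||_inf
assert np.isclose(max_row_l1(A), max_col_l1(AH))    # ||A||_inf = ||A^H||_1
assert np.isclose(max_col_l2(A), max_row_l2(AH))    # ||A||_{1->2} = ||A^H||_{2->inf}
assert np.isclose(np.abs(A).max(), np.abs(AH).max())  # ||A||_{1->inf} = ||A^H||_{1->inf}
```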
Theorem 1.131
‖A‖1→p = max_{1≤j≤n} ‖aj‖p (1.8.34)
where aj denotes the j-th column in the column partition
A = [a1 a2 . . . an].
Proof. For any x,
‖Ax‖p = ‖Σ_{j=1}^{n} xj aj‖p
≤ Σ_{j=1}^{n} ‖xj aj‖p = Σ_{j=1}^{n} |xj| ‖aj‖p
≤ max_{1≤j≤n} ‖aj‖p Σ_{j=1}^{n} |xj|
= max_{1≤j≤n} ‖aj‖p ‖x‖1.
Thus,
‖A‖1→p = max_{x≠0} ‖Ax‖p / ‖x‖1 ≤ max_{1≤j≤n} ‖aj‖p.
We need to show that this upper bound is attained. Indeed, for x = ej,
where ej is the unit vector with 1 in the j-th entry and 0 elsewhere,
‖Aej‖p = ‖aj‖p.
Thus
‖A‖1→p ≥ ‖aj‖p for all 1 ≤ j ≤ n.
Combining the two bounds, we see that
‖A‖1→p = max_{1≤j≤n} ‖aj‖p. □
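The two halves of this proof can be seen numerically: random vectors on the l1 unit sphere never exceed the largest column p-norm, while the best standard basis vector attains it. A NumPy sketch (the matrix and the value of p are arbitrary choices):

```python
# Theorem 1.131 numerically: ||A||_{1->p} equals the largest column
# p-norm; random l1-unit vectors stay below the bound and the best
# standard basis vector e_j attains it exactly.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
p = 3.0

col_norms = [np.linalg.norm(A[:, j], p) for j in range(A.shape[1])]
col_bound = max(col_norms)

for _ in range(1000):
    x = rng.standard_normal(6)
    x /= np.abs(x).sum()                       # now ||x||_1 = 1
    assert np.linalg.norm(A @ x, p) <= col_bound + 1e-12

e = np.zeros(6)
e[int(np.argmax(col_norms))] = 1.0             # best column's basis vector
assert np.isclose(np.linalg.norm(A @ e, p), col_bound)
```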
Theorem 1.132
‖A‖p→∞ = max_{1≤i≤m} ‖ai‖q (1.8.35)
where ai denotes the i-th row of A and 1/p + 1/q = 1.
Proof. Using theorem 1.130, we get
‖A‖p→∞ = ‖AH‖1→q.
Using theorem 1.131, we get
‖AH‖1→q = max_{1≤i≤m} ‖ai‖q.
This completes the proof. □
Theorem 1.133 For two matrices A and B and 1 ≤ p, q, s ≤ ∞, we have
‖AB‖p→q ≤ ‖B‖p→s ‖A‖s→q. (1.8.36)
Proof. We start with
‖AB‖p→q = max_{‖x‖p=1} ‖A(Bx)‖q.
From lemma 1.125, we obtain
‖A(Bx)‖q ≤ ‖A‖s→q ‖Bx‖s.
Thus,
‖AB‖p→q ≤ ‖A‖s→q max_{‖x‖p=1} ‖Bx‖s = ‖A‖s→q ‖B‖p→s. □
Theorem 1.134 For two matrices A and B and p ≥ 1, we have
‖AB‖p→∞ ≤ ‖A‖∞→∞ ‖B‖p→∞. (1.8.37)
Proof. We start with
‖AB‖p→∞ = max_{‖x‖p=1} ‖A(Bx)‖∞.
From lemma 1.125, we obtain
‖A(Bx)‖∞ ≤ ‖A‖∞→∞ ‖Bx‖∞.
Thus,
‖AB‖p→∞ ≤ ‖A‖∞→∞ max_{‖x‖p=1} ‖Bx‖∞ = ‖A‖∞→∞ ‖B‖p→∞. □
Theorem 1.135
‖A‖p→∞ ≤ ‖A‖p→p. (1.8.38)
In particular
‖A‖1→∞ ≤ ‖A‖1. (1.8.39)
‖A‖2→∞ ≤ ‖A‖2. (1.8.40)
Proof. Choosing q = ∞ and s = p and applying theorem 1.133,
‖IA‖p→∞ ≤ ‖A‖p→p ‖I‖p→∞.
But by theorem 1.132, ‖I‖p→∞ is the maximum lq norm (1/p + 1/q = 1) of
any row of I, which is 1 since each row is a standard unit vector. Thus
‖A‖p→∞ ≤ ‖A‖p→p. □
Consider the expression
min_{z∈C(AH), z≠0} ‖Az‖q / ‖z‖p. (1.8.41)
Here z ∈ C(AH), z ≠ 0 means there exists some vector u ∉ ker(AH) such
that z = AHu.
This expression measures the factor by which the non-singular part of
A can decrease the length of a vector.
Theorem 1.136 [5] The following bound holds for every matrix A:
min_{z∈C(AH), z≠0} ‖Az‖q / ‖z‖p ≥ 1/‖A†‖q→p. (1.8.42)
If A is surjective (onto), then equality holds. When A is bijective
(one-one onto, square, invertible), the result implies
min_{z∈C(AH), z≠0} ‖Az‖q / ‖z‖p = 1/‖A−1‖q→p. (1.8.43)
Proof. The spaces C(AH) and C(A) have the same dimension, given by
rank(A). We recall that A†A is a projector onto the row space C(AH)
of A. Hence
w = Az ⇐⇒ z = A†w = A†Az for all z ∈ C(AH).
As a result we can write
‖z‖p / ‖Az‖q = ‖A†w‖p / ‖w‖q
whenever z ∈ C(AH), z ≠ 0. Now
( min_{z∈C(AH), z≠0} ‖Az‖q / ‖z‖p )^{−1} = max_{z∈C(AH), z≠0} ‖z‖p / ‖Az‖q
= max_{w∈C(A), w≠0} ‖A†w‖p / ‖w‖q
≤ max_{w≠0} ‖A†w‖p / ‖w‖q.
When A is surjective, C(A) = Cm. Hence
max_{w∈C(A), w≠0} ‖A†w‖p / ‖w‖q = max_{w≠0} ‖A†w‖p / ‖w‖q
and the inequality becomes an equality. Finally
max_{w≠0} ‖A†w‖p / ‖w‖q = ‖A†‖q→p,
which completes the proof. □
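For p = q = 2 and an invertible A, (1.8.43) says the minimum of ‖Az‖2/‖z‖2 over nonzero z is 1/‖A−1‖2, i.e. the smallest singular value. A NumPy sketch illustrating this (the random matrix is generically invertible):

```python
# (1.8.43) for p = q = 2: over nonzero z, ||Az||_2 / ||z||_2 is bounded
# below by 1/||A^{-1}||_2, which equals the smallest singular value.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))                # generically invertible

sigma = np.linalg.svd(A, compute_uv=False)     # singular values, descending
bound = 1 / np.linalg.norm(np.linalg.inv(A), 2)
assert np.isclose(bound, sigma[-1])

ratios = []
for _ in range(2000):
    z = rng.standard_normal(4)
    ratios.append(np.linalg.norm(A @ z) / np.linalg.norm(z))
assert min(ratios) >= bound - 1e-12            # the bound is never violated
```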
1.8.10. Row column norms
Definition 1.50 Let A be an m × n matrix with rows a1, . . . , am.
Then we define
‖A‖p,∞ ≜ max_{1≤i≤m} ‖ai‖p = max_{1≤i≤m} ( Σ_{j=1}^{n} |aij|^p )^{1/p} (1.8.44)
where 1 ≤ p < ∞, i.e. we take the p-norms of all row vectors and
then find the maximum.
We define
‖A‖∞,∞ ≜ max_{i,j} |aij|. (1.8.45)
This is equivalent to taking the l∞ norm of each row and then taking
the maximum of all the norms.
For 1 ≤ p, q < ∞, we define the norm
‖A‖p,q ≜ [ Σ_{i=1}^{m} ( ‖ai‖p )^q ]^{1/q}, (1.8.46)
i.e., we compute the p-norm of every row vector to form another
vector and then take the q-norm of that vector.
Note that the norm ‖A‖p,∞ is different from the operator norm ‖A‖p→∞.
Similarly ‖A‖p,q is different from ‖A‖p→q.
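Definition 1.50 is easy to implement directly. The helper name row_col_norm below is hypothetical (not from any library); it takes the p-norm of every row and then the q-norm of the resulting vector:

```python
# Row-column norm ||A||_{p,q} per definition 1.50: p-norm of each row,
# then q-norm of the vector of row norms (max for q = infinity).
import numpy as np

def row_col_norm(A, p, q):
    row_norms = np.linalg.norm(A, ord=p, axis=1)   # p-norm of each row
    if np.isinf(q):
        return row_norms.max()                     # ||A||_{p,inf}
    return np.linalg.norm(row_norms, ord=q)

A = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [5.0, 12.0]])

# Row l2 norms are 5, 0, 13, so ||A||_{2,inf} = 13.
assert np.isclose(row_col_norm(A, 2, np.inf), 13.0)
# ||A||_{2,2} recovers the Frobenius norm.
assert np.isclose(row_col_norm(A, 2, 2), np.linalg.norm(A, 'fro'))
```

Note that ‖A‖2,2 is just the Frobenius norm, while ‖A‖2,∞ is the maximum row l2 norm appearing in table 1 as the 2 → ∞ operator norm.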
Theorem 1.137
‖A‖p,∞ = ‖A‖q→∞ (1.8.47)
where 1/p + 1/q = 1.
Proof. From theorem 1.132 we get
‖A‖q→∞ = max_{1≤i≤m} ‖ai‖p.
This is exactly the definition of ‖A‖p,∞. □
Theorem 1.138
‖A‖1→p = ‖AH‖p,∞. (1.8.48)
Proof. By theorem 1.130,
‖A‖1→p = ‖AH‖q→∞
where 1/p + 1/q = 1. From theorem 1.137,
‖AH‖q→∞ = ‖AH‖p,∞. □
Theorem 1.139 For any two matrices A, B, we have
‖AB‖p,∞ ≤ ‖A‖∞→∞ ‖B‖p,∞. (1.8.49)
Proof. Let q be such that 1/p + 1/q = 1. From theorem 1.134, we have
‖AB‖q→∞ ≤ ‖A‖∞→∞ ‖B‖q→∞.
From theorem 1.137,
‖AB‖q→∞ = ‖AB‖p,∞
and
‖B‖q→∞ = ‖B‖p,∞.
Thus
‖AB‖p,∞ ≤ ‖A‖∞→∞ ‖B‖p,∞. □
Theorem 1.140 [Relations between (p, q) norms and (p → q) norms]
‖A‖1,∞ = ‖A‖∞→∞ (1.8.50)
‖A‖2,∞ = ‖A‖2→∞ (1.8.51)
‖A‖∞,∞ = ‖A‖1→∞ (1.8.52)
‖A‖1→1 = ‖AH‖1,∞ (1.8.53)
‖A‖1→2 = ‖AH‖2,∞ (1.8.54)
Proof. The first three are straightforward applications of theorem
1.137. The next two are applications of theorem 1.138. See also
table 1. □
1.8.11. Block diagonally dominant matrices and generalized
Gershgorin disc theorem
In [1] the idea of diagonally dominant matrices (see section 1.6.9) has
been generalized to block matrices using matrix norms. We consider
the specific case with spectral norm.
Definition 1.51 [Block diagonally dominant matrix] Let A be a
square matrix in Cn×n which is partitioned in the following manner:
A =
A11 A12 . . . A1k
A21 A22 . . . A2k
. . .
Ak1 Ak2 . . . Akk
(1.8.56)
where each of the submatrices Aij is a square matrix of size m × m.
Thus n = km.
A is called block diagonally dominant if
‖Aii‖2 ≥ Σ_{j≠i} ‖Aij‖2
holds true for all 1 ≤ i ≤ k. If the inequality holds strictly
for all i, then A is called a block strictly diagonally dominant
matrix.
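Definition 1.51 can be tested numerically by partitioning the matrix into k × k blocks of size m × m and comparing each diagonal block's spectral norm against the sum of the off-diagonal block norms in its block row. The helper name is_block_diag_dominant below is hypothetical:

```python
# Check block diagonal dominance (definition 1.51) with spectral norms.
import numpy as np

def is_block_diag_dominant(A, m, strict=False):
    k = A.shape[0] // m                       # number of block rows
    blocks = [[A[i*m:(i+1)*m, j*m:(j+1)*m] for j in range(k)]
              for i in range(k)]
    for i in range(k):
        diag = np.linalg.norm(blocks[i][i], 2)
        off = sum(np.linalg.norm(blocks[i][j], 2) for j in range(k) if j != i)
        # strict dominance requires diag > off; plain dominance diag >= off
        if (diag <= off) if strict else (diag < off):
            return False
    return True

# Strongly block-diagonal example: dominance holds strictly ...
A = 10 * np.eye(4) + 0.1 * np.ones((4, 4))
assert is_block_diag_dominant(A, 2, strict=True)
# ... while the all-ones matrix is not strictly dominant.
assert not is_block_diag_dominant(np.ones((4, 4)), 2, strict=True)
```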
Theorem 1.141 If the partitioned matrix A of definition 1.51 is
block strictly diagonally dominant, then it is nonsingular.
For proof see [1].
This leads to the generalized Gershgorin disc theorem.
Theorem 1.142 Let A be a square matrix in Cn×n which is
partitioned in the following manner:
A =
A11 A12 . . . A1k
A21 A22 . . . A2k
. . .
Ak1 Ak2 . . . Akk
(1.8.57)
where each of the submatrices Aij is a square matrix of size m × m.
Then each eigenvalue λ of A satisfies
‖λI − Aii‖2 ≤ Σ_{j≠i} ‖Aij‖2 for some i ∈ {1, 2, . . . , k}. (1.8.58)
For proof see [1].
Since the 2-norm of a Hermitian positive semidefinite matrix is nothing
but its largest eigenvalue, the theorem directly applies.
Corollary 1.143. Let A be a Hermitian positive semidefinite matrix.
Let A be partitioned as in theorem 1.142. Then its 2-norm ‖A‖2 satisfies
|‖A‖2 − ‖Aii‖2| ≤ Σ_{j≠i} ‖Aij‖2 for some i ∈ {1, 2, . . . , k}. (1.8.59)
1.9. Miscellaneous topics
1.9.1. Hadamard product
Usually standard linear algebra books don't dwell much on element-wise
or component-wise products of vectors or matrices. Yet in certain
contexts and algorithms, this is quite useful. We define the notation in
this section. For further details see [3], [2] and [4].
Definition 1.52 The Hadamard product of two matrices A =
[aij] and B = [bij] with the same dimensions (not necessarily square)
with entries in a given ring R is the entry-wise product A ◦ B ≜
[aij bij], which has the same dimensions as A and B.
Example 1.3: Hadamard product Let
A =
1 2
3 4
and B =
5 −6
7 −3
Then
A ◦ B =
5 −12
21 −12
□
The Hadamard product is commutative, associative and distributive over
addition.
Naturally, it can also be defined for column vectors and row vectors.
The reason why this product is not mentioned in linear algebra texts
is that it is inherently basis dependent. But this product has a
number of uses in statistics and analysis.
In analysis, a similar concept is the point-wise product of functions,
defined as
(f · g)(x) = f(x)g(x).
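In NumPy the Hadamard product of example 1.3 is simply the `*` operator on arrays (entry-wise multiplication), as opposed to the matrix product `@`:

```python
# Example 1.3: the Hadamard product via entry-wise array multiplication.
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, -6],
              [7, -3]])

H = A * B                       # Hadamard product, entry-wise
assert (H == np.array([[5, -12], [21, -12]])).all()

# Commutative, unlike the ordinary matrix product A @ B.
assert (A * B == B * A).all()
```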
1.10. Digest
1.10.1. Norms
All norms are equivalent.
Sum norm
‖A‖S = Σ_{i=1}^{m} Σ_{j=1}^{n} |aij|.
Frobenius norm
‖A‖F = ( Σ_{i=1}^{m} Σ_{j=1}^{n} |aij|^2 )^{1/2}.
Max norm
‖A‖M = max_{1≤i≤m, 1≤j≤n} |aij|.
Frobenius norm of Hermitian transpose
‖AH‖F = ‖A‖F .
Frobenius norm as sum of norms of column vectors
‖A‖F^2 = Σ_{j=1}^{n} ‖aj‖2^2.
Frobenius norm as sum of norms of row vectors
‖A‖F^2 = Σ_{i=1}^{m} ‖ai‖2^2.
Frobenius norm invariance w.r.t. unitary matrices
‖UA‖F = ‖A‖F
‖AV‖F = ‖A‖F.
Frobenius norm is consistent:
‖AB‖F ≤ ‖A‖F‖B‖F .
corollary 1.123
‖Ax‖2 ≤ ‖A‖F‖x‖2.
Frobenius norm in terms of singular values
‖A‖F = ( Σ_{i=1}^{n} σi^2 )^{1/2}.
Consistent norms
‖AB‖ ≤ ‖A‖‖B‖
also known as sub-multiplicative norm.
Subordinate matrix norm
‖Ax‖β ≤ ‖A‖ ‖x‖α.
(α → β) Operator norm
‖A‖ ≜ ‖A‖α→β ≜ max_{x≠0} ‖Ax‖β / ‖x‖α.
‖A‖α→β = max_{x∉ker(A)} ‖Ax‖β / ‖x‖α = max_{‖x‖α=1} ‖Ax‖β.
(α→ β) norm is subordinate
‖Ax‖β ≤ ‖A‖α→β‖x‖α.
There exists a unit norm vector x∗ such that
‖A‖α→β = ‖Ax∗‖β.
α→ α-norms are consistent
‖A‖α = max_{x≠0} ‖Ax‖α / ‖x‖α
‖AB‖α ≤ ‖A‖α ‖B‖α.
p-norm
‖A‖p ≜ max_{x≠0} ‖Ax‖p / ‖x‖p = max_{‖x‖p=1} ‖Ax‖p.
Closed form p-norms
‖A‖1 ≜ max_{1≤j≤n} Σ_{i=1}^{m} |aij|.
‖A‖∞ ≜ max_{1≤i≤m} Σ_{j=1}^{n} |aij|.
2-norm
‖A‖2 ≜ σ1.
non-singular
‖A−1‖2 = 1/σn.
symmetric and positive definite
‖A‖2 = λ1.
non-singular
‖A−1‖2 = 1/λn.
normal
‖A‖2 = |λ1|.
non-singular
‖A−1‖2 = 1/|λn|.
Unitary invariant norm ‖UAV ‖ = ‖A‖ for any A ∈ Cm×n and any
unitary U and V .
Typical (p → q) norms: see table 1.
Dual norm and conjugate transpose
‖A‖p→q = ‖AH‖q′→p′, where 1/p + 1/p′ = 1 and 1/q + 1/q′ = 1.
‖A‖2 = ‖AH‖2.
‖A‖1 = ‖AH‖∞, ‖A‖∞ = ‖AH‖1.
‖A‖1→∞ = ‖AH‖1→∞.
‖A‖1→2 = ‖AH‖2→∞.
‖A‖∞→2 = ‖AH‖2→1.
‖A‖1→p = max_{1≤j≤n} ‖aj‖p.
‖A‖p→∞ = max_{1≤i≤m} ‖ai‖q with 1/p + 1/q = 1.
Consistency of p→ q norm
‖AB‖p→q ≤ ‖B‖p→s‖A‖s→q.
Consistency of p→∞ norm
‖AB‖p→∞ ≤ ‖A‖∞→∞‖B‖p→∞.
Dominance of p→∞ norm by p→ p norm
‖A‖p→∞ ≤ ‖A‖p→p.
‖A‖1→∞ ≤ ‖A‖1.
‖A‖2→∞ ≤ ‖A‖2.
Restricted minimum property
min_{z∈C(AH), z≠0} ‖Az‖q / ‖z‖p ≥ 1/‖A†‖q→p.
If A is surjective (onto), then the equality holds. When A is bijective