CHAPTER 1
Matrix Algebra
In this chapter we collect results related to matrix algebra which are
relevant to this book. Some specific topics which are typically not
found in standard books are also covered here.
1.1. Preliminaries
Standard notation for this chapter is given here. Matrices are denoted
by capital letters A, B, etc. They can be rectangular with m rows
and n columns. Their elements or entries are referred to with small
letters a_{ij}, b_{ij}, etc., where i denotes the i-th row and j denotes
the j-th column of the matrix. Thus
A = \begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{bmatrix}
Mostly we consider complex matrices belonging to C^{m×n}. Sometimes
we will restrict our attention to real matrices belonging to R^{m×n}.
Definition 1.1 [Square matrix] An m × n matrix is called a square
matrix if m = n.
Definition 1.2 [Tall matrix] An m × n matrix is called a tall matrix
if m > n, i.e. the number of rows is greater than the number of columns.
Definition 1.3 [Wide matrix] An m × n matrix is called a wide
matrix if m < n, i.e. the number of columns is greater than the number of rows.
Definition 1.4 [Main diagonal] Let A = [a_{ij}] be an m × n matrix.
The main diagonal consists of the entries a_{ij} where i = j, i.e. the main
diagonal is {a_{11}, a_{22}, …, a_{kk}} where k = min(m, n). The main diagonal
is also known as the leading diagonal, major diagonal, primary
diagonal or principal diagonal. The entries of A which are not
on the main diagonal are known as off-diagonal entries.
Definition 1.5 [Diagonal matrix] A diagonal matrix is a matrix
(usually a square matrix) whose entries outside the main diagonal
are zero.
Whenever we refer to a diagonal matrix which is not square, we
will use the term rectangular diagonal matrix.
A square diagonal matrix A is also represented by diag(a_{11}, a_{22}, …, a_{nn}),
which lists only the diagonal (non-zero) entries of A.
The transpose of a matrix A is denoted by A^T while the Hermitian
transpose is denoted by A^H. For real matrices A^T = A^H.
When matrices are square, the number of rows and columns are
both equal to n and they belong to C^{n×n}.
If not specified, square matrices will be of size n × n and rectangular
matrices of size m × n. If not specified, vectors (column vectors)
will be of size n × 1 and belong to either R^n or C^n. Corresponding
row vectors will be of size 1 × n.
For statements which are valid both for real and complex matrices,
sometimes we might say that matrices belong to F^{m×n} while the scalars
belong to F and vectors belong to F^n, where F refers to either the field
of real numbers or the field of complex numbers. Note that this is not
consistently followed at the moment. Most results are written only for
C^{m×n} while still being applicable for R^{m×n}.
The identity matrix in F^{n×n} is denoted as I_n, or simply I whenever
the size is clear from context.
Sometimes we will write a matrix in terms of its column vectors. We
will use the notation
A = \begin{bmatrix} a_1 & a_2 & \dots & a_n \end{bmatrix}
indicating n columns.
When we write a matrix in terms of its row vectors, we will use the
notation
A = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix}
indicating m rows, with a_i being column vectors whose transposes form
the rows of A.
The rank of a matrix A is written as rank(A), while the determinant
as det(A) or |A|.
We say that an m × n matrix A is left-invertible if there exists an
n×m matrix B such that BA = I. We say that an m× n matrix A is
right-invertible if there exists an n×m matrix B such that AB = I.
We say that a square matrix A is invertible when there exists another
square matrix B of the same size such that AB = BA = I. A square
matrix is invertible iff it is both left and right invertible. The inverse of a
square invertible matrix is denoted by A^{-1}.
A special left or right inverse is the pseudo inverse, which is denoted
by A†.
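As a small illustration of these notions, the following Python sketch (a hypothetical 3 × 2 example, not taken from the text) exhibits a tall matrix with a left inverse but no right inverse:

```python
# A is a tall 3x2 matrix; B is a left inverse of A (BA = I), found by
# inspection. Illustrative example only, not a general construction.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 0],
     [0, 1],
     [1, 1]]
B = [[1, 0, 0],
     [0, 1, 0]]

print(matmul(B, A))  # [[1, 0], [0, 1]] -- the 2x2 identity, so B is a left inverse
print(matmul(A, B))  # 3x3 but NOT the identity, so B is not a right inverse
```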
Column space of a matrix is denoted by C(A), the null space by N (A),
and the row space by R(A).
We say that a matrix is symmetric when A = A^T, and conjugate
symmetric or Hermitian when A^H = A.
When a square matrix is not invertible, we say that it is singular. A
non-singular matrix is invertible.
The eigen values of a square matrix are written as λ1, λ2, . . . while the
singular values of a rectangular matrix are written as σ1, σ2, . . . .
The inner product or dot product of two column / row vectors u and
v belonging to R^n is defined as
u · v = ⟨u, v⟩ = \sum_{i=1}^{n} u_i v_i. (1.1.1)
The inner product or dot product of two column / row vectors u and
v belonging to C^n is defined as
u · v = ⟨u, v⟩ = \sum_{i=1}^{n} u_i \bar{v_i}. (1.1.2)
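A small Python sketch of the two inner products (assuming the common convention in which the second argument is conjugated in the complex case, so that ⟨u, u⟩ is real and non-negative):

```python
# Real and complex inner products in pure Python. The complex version
# conjugates the second argument (assumed convention).

def dot_real(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def dot_complex(u, v):
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))

u = [1 + 2j, 3 - 1j]
print(dot_complex(u, u))                # (15+0j): real and non-negative
print(dot_real([1, 2, 3], [4, 5, 6]))   # 32
```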
1.1.1. Block matrix
Definition 1.6 A block matrix is a matrix whose entries themselves
are matrices, with the following constraints:
(1) Entries in every row are matrices with the same number of
rows.
(2) Entries in every column are matrices with the same number
of columns.
Let A be an m × n block matrix. Then
A = \begin{bmatrix}
A_{11} & A_{12} & \dots & A_{1n} \\
A_{21} & A_{22} & \dots & A_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
A_{m1} & A_{m2} & \dots & A_{mn}
\end{bmatrix} (1.1.3)
where A_{ij} is a matrix with r_i rows and c_j columns.
A block matrix is also known as a partitioned matrix.
Example 1.1: 2 × 2 block matrices. Quite frequently we will be using
2 × 2 block matrices.
P = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}. (1.1.4)
An example:
P = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
We have
P_{11} = \begin{bmatrix} a & b \\ d & e \end{bmatrix}, \quad
P_{12} = \begin{bmatrix} c \\ f \end{bmatrix}, \quad
P_{21} = \begin{bmatrix} g & h \end{bmatrix}, \quad
P_{22} = \begin{bmatrix} i \end{bmatrix}.
• P_{11} and P_{12} have 2 rows.
• P_{21} and P_{22} have 1 row.
• P_{11} and P_{21} have 2 columns.
• P_{12} and P_{22} have 1 column.
□
Lemma 1.1 Let A = [A_{ij}] be an m × n block matrix with A_{ij} being
an r_i × c_j matrix. Then A is an r × c matrix where
r = \sum_{i=1}^{m} r_i (1.1.5)
and
c = \sum_{j=1}^{n} c_j. (1.1.6)
Remark. Sometimes it is convenient to think of a regular matrix as a
block matrix whose entries are 1× 1 matrices themselves.
Definition 1.7 [Multiplication of block matrices] Let A = [A_{ij}]
be an m × n block matrix with A_{ij} being a p_i × q_j matrix. Let
B = [B_{jk}] be an n × p block matrix with B_{jk} being a q_j × r_k matrix.
Then the two block matrices are compatible for multiplication
and their multiplication is defined by C = AB = [C_{ik}] where
C_{ik} = \sum_{j=1}^{n} A_{ij} B_{jk} (1.1.7)
and C_{ik} is a p_i × r_k matrix.
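Definition 1.7 can be sketched in Python by treating each block as an ordinary matrix (list of rows); the 1 × 2 by 2 × 1 block example below is hypothetical:

```python
# Block matrix multiplication per definition 1.7: C_ik = sum_j A_ij B_jk,
# where A and B are 2-D grids of blocks and each block is a list of rows.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def block_matmul(A, B):
    n = len(A[0])                      # number of block-columns of A
    C = []
    for i in range(len(A)):
        row = []
        for k in range(len(B[0])):
            acc = matmul(A[i][0], B[0][k])
            for j in range(1, n):
                acc = matadd(acc, matmul(A[i][j], B[j][k]))
            row.append(acc)
        C.append(row)
    return C

# 1x2 block row times 2x1 block column: C_11 = A_11 B_11 + A_12 B_21
A = [[ [[1, 0]], [[2]] ]]           # blocks of sizes 1x2 and 1x1
B = [[ [[1], [1]] ], [ [[3]] ]]     # blocks of sizes 2x1 and 1x1
print(block_matmul(A, B))           # single 1x1 block [[7]] = 1*1 + 0*1 + 2*3
```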
Definition 1.8 A block diagonal matrix is a block matrix
whose off diagonal entries are zero matrices.
1.2. Linear independence, span, rank
1.2.1. Spaces associated with a matrix
Definition 1.9 The column space of a matrix is defined as the
vector space spanned by columns of the matrix.
Let A be an m × n matrix with
A = \begin{bmatrix} a_1 & a_2 & \dots & a_n \end{bmatrix}
Then the column space is given by
C(A) = \{ x ∈ F^m : x = \sum_{i=1}^{n} α_i a_i \text{ for some } α_i ∈ F \}. (1.2.1)
Definition 1.10 The row space of a matrix is defined as the
vector space spanned by rows of the matrix.
Let A be an m × n matrix with
A = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix}
Then the row space is given by
R(A) = \{ x ∈ F^n : x = \sum_{i=1}^{m} α_i a_i \text{ for some } α_i ∈ F \}. (1.2.2)
1.2.2. Rank
Definition 1.11 [Column rank] The column rank of a matrix
is defined as the maximum number of columns which are linearly
independent. In other words column rank is the dimension of the
column space of a matrix.
Definition 1.12 [Row rank] The row rank of a matrix is defined
as the maximum number of rows which are linearly independent.
In other words row rank is the dimension of the row space of a
matrix.
Theorem 1.2 The column rank and row rank of a matrix are
equal.
Definition 1.13 [Rank] The rank of a matrix is defined to be
equal to its column rank which is equal to its row rank.
Lemma 1.3 For an m× n matrix A
0 ≤ rank(A) ≤ min(m,n). (1.2.3)
Lemma 1.4 The rank of a matrix is 0 if and only if it is a zero
matrix.
Definition 1.14 [Full rank matrix] An m × n matrix A is called
full rank if
rank(A) = min(m,n).
In other words it is either a full column rank matrix or a full row
rank matrix or both.
Lemma 1.5 [Rank of product of two matrices] Let A be an m × n matrix and B be an n × p matrix. Then
rank(AB) ≤ min(rank(A), rank(B)). (1.2.4)
Lemma 1.6 [Post-multiplication with a full row rank matrix] Let
A be an m× n matrix and B be an n× p matrix. If B is of rank
n then
rank(AB) = rank(A). (1.2.5)
Lemma 1.7 [Pre-multiplication with a full column rank matrix]
Let A be an m × n matrix and B be an n × p matrix. If A is of
rank n then
rank(AB) = rank(B). (1.2.6)
Lemma 1.8 The rank of a diagonal matrix is equal to the number
of non-zero elements on its main diagonal.
Proof. The columns which correspond to zero diagonal entries
are zero columns. The other columns are linearly independent.
The number of linearly independent rows is also the same. Hence their
count gives us the rank of the matrix. □
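This lemma is easy to check numerically. The following Python sketch (an illustrative implementation, not from the text) computes the rank by Gaussian elimination over exact fractions and applies it to a rectangular diagonal matrix:

```python
# Rank via Gaussian elimination with exact rational arithmetic, used to
# check lemma 1.8 on a rectangular diagonal matrix with 2 non-zero
# diagonal entries.
from fractions import Fraction

def rank(M):
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        # find a pivot at or below row r in column c
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

D = [[3, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 5, 0]]
print(rank(D))  # 2, matching the number of non-zero diagonal entries
```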
1.3. Invertible matrices
Definition 1.15 [Invertible] A square matrix A is called invert-
ible if there exists another square matrix B of same size such that
AB = BA = I.
The matrix B is called the inverse of A and is denoted as A^{-1}.
Lemma 1.9 If A is invertible then its inverse A^{-1} is also invertible
and the inverse of A^{-1} is nothing but A.
Lemma 1.10 Identity matrix I is invertible.
Proof.
I I = I =⇒ I^{-1} = I. □
Lemma 1.11 If A is invertible then columns of A are linearly
independent.
Proof. Assume A is invertible, then there exists a matrix B such
that
AB = BA = I.
Assume that the columns of A are linearly dependent. Then there exists
u ≠ 0 such that
Au = 0 =⇒ BAu = 0 =⇒ Iu = 0 =⇒ u = 0,
a contradiction. Hence the columns of A are linearly independent. □
Lemma 1.12 If an n × n matrix A is invertible then the columns of
A span F^n.
Proof. Assume A is invertible, then there exists a matrix B such
that
AB = BA = I.
Now let x ∈ F^n be an arbitrary vector. We need to show that there
exists α ∈ F^n such that
x = Aα.
But
x = Ix = ABx = A(Bx).
Thus if we choose α = Bx, then
x = Aα.
Thus the columns of A span F^n. □
Lemma 1.13 If A is invertible, then the columns of A form a basis
for F^n.
Proof. In F^n a basis is a set of vectors which is linearly independent
and spans F^n. By lemma 1.11 and lemma 1.12, the columns of
an invertible matrix A satisfy both conditions. Hence they form a
basis. □
Lemma 1.14 If A is invertible then A^T is invertible.
Proof. Assume A is invertible; then there exists a matrix B such
that
AB = BA = I.
Applying the transpose on both sides we get
B^T A^T = A^T B^T = I.
Thus B^T is the inverse of A^T and A^T is invertible. □
Lemma 1.15 If A is invertible then A^H is invertible.
Proof. Assume A is invertible; then there exists a matrix B such
that
AB = BA = I.
Applying the conjugate transpose on both sides we get
B^H A^H = A^H B^H = I.
Thus B^H is the inverse of A^H and A^H is invertible. □
Lemma 1.16 If A and B are invertible then AB is invertible.
Proof. We note that
(AB)(B^{-1} A^{-1}) = A (B B^{-1}) A^{-1} = A I A^{-1} = I.
Similarly
(B^{-1} A^{-1})(AB) = B^{-1} (A^{-1} A) B = B^{-1} I B = I.
Thus B^{-1} A^{-1} is the inverse of AB. □
Lemma 1.17 The set of n×n invertible matrices under the matrix
multiplication operation form a group.
Proof. We verify the properties of a group
Closure: If A and B are invertible then AB is invertible. Hence the
set is closed.
Associativity: Matrix multiplication is associative.
Identity element: I is invertible and AI = IA = A for all invertible
matrices.
Inverse element: If A is invertible then A−1 is also invertible.
Thus the set of invertible matrices is indeed a group under matrix
multiplication. □
Lemma 1.18 An n × n matrix A is invertible if and only if it is
full rank i.e.
rank(A) = n.
Corollary 1.19. The rank of an invertible matrix and its inverse are
same.
1.3.1. Similar matrices
Definition 1.16 [Similar matrices] An n×n matrix B is similar
to an n× n matrix A if there exists an n× n non-singular matrix
C such that
B = C^{-1} A C.
Lemma 1.20 If B is similar to A then A is similar to B. Thus
similarity is a symmetric relation.
Proof.
B = C^{-1} A C =⇒ A = C B C^{-1} =⇒ A = (C^{-1})^{-1} B C^{-1}.
Thus there exists a matrix D = C^{-1} such that
A = D^{-1} B D.
Thus A is similar to B. □
Lemma 1.21 Similar matrices have same rank.
Proof. Let B be similar to A. Then there exists an invertible
matrix C such that
B = C^{-1} A C.
Since C is invertible, we have rank(C) = rank(C^{-1}) = n. Now
using lemma 1.6, rank(AC) = rank(A), and using lemma 1.7 we have
rank(C^{-1}(AC)) = rank(AC) = rank(A). Thus
rank(B) = rank(A).
□
Lemma 1.22 Similarity is an equivalence relation on the set of
n× n matrices.
Proof. Let A, B, C be n × n matrices. A is similar to itself through
the invertible matrix I. If A is similar to B then, by lemma 1.20, B is
similar to A. If B is similar to A via P s.t. B = P^{-1} A P and C is similar
to B via Q s.t. C = Q^{-1} B Q, then C is similar to A via PQ, since
C = (PQ)^{-1} A (PQ). Thus similarity is an equivalence relation on the
set of square matrices, and if A is any n × n matrix then the set of n × n
matrices similar to A forms an equivalence class. □
1.3.2. Gram matrices
Definition 1.17 The Gram matrix of the columns of A is given by
G = A^H A. (1.3.1)
Definition 1.18 The Gram matrix of the rows of A is given by
G = A A^H. (1.3.2)
Remark. Usually when we talk about the Gram matrix of a matrix we
are looking at the Gram matrix of its column vectors.
Remark. For a real matrix A ∈ R^{m×n}, the Gram matrix of its column
vectors is given by A^T A and the Gram matrix of its row vectors is
given by A A^T.
The following results apply equally well to the real case.
Lemma 1.23 The columns of a matrix are linearly dependent if
and only if the Gram matrix of its column vectors A^H A is not
invertible.
Proof. Let A be an m × n matrix and G = A^H A be the Gram
matrix of its columns.
If the columns of A are linearly dependent, then there exists a vector u ≠ 0
such that
Au = 0.
Thus
Gu = A^H A u = 0.
Hence the columns of G are also dependent and G is not invertible.
Conversely, let us assume that G is not invertible. Then the columns of G
are dependent and there exists a vector v ≠ 0 such that
Gv = 0.
Now
v^H G v = v^H A^H A v = (Av)^H (Av) = ‖Av‖_2^2.
From the previous equation, we have
‖Av‖_2^2 = 0 =⇒ Av = 0.
Since v ≠ 0, the columns of A are linearly dependent. □
Corollary 1.24. The columns of a matrix are linearly independent if
and only if the Gram matrix of its column vectors A^H A is invertible.
Proof. The columns of A can be dependent only if the Gram matrix is
not invertible. Thus if the Gram matrix is invertible, then the columns
of A are linearly independent.
The Gram matrix is not invertible only if the columns of A are linearly
dependent. Thus if the columns of A are linearly independent then the
Gram matrix is invertible. □
Corollary 1.25. Let A be a full column rank matrix. Then A^H A is
invertible.
Lemma 1.26 The null spaces of A and of its Gram matrix A^H A
coincide, i.e.
N(A) = N(A^H A). (1.3.3)
Proof. Let u ∈ N(A). Then
Au = 0 =⇒ A^H A u = 0.
Thus
u ∈ N(A^H A) =⇒ N(A) ⊆ N(A^H A).
Now let u ∈ N(A^H A). Then
A^H A u = 0 =⇒ u^H A^H A u = 0 =⇒ ‖Au‖_2^2 = 0 =⇒ Au = 0.
Thus we have
u ∈ N(A) =⇒ N(A^H A) ⊆ N(A). □
Lemma 1.27 The rows of a matrix A are linearly dependent if and
only if the Gram matrix of its row vectors A A^H is not invertible.
Proof. The rows of A are linearly dependent if and only if the columns
of A^H are linearly dependent, i.e. there exists a vector v ≠ 0 s.t.
A^H v = 0.
Thus
Gv = A A^H v = 0.
Since v ≠ 0, G is not invertible.
Converse: assuming that G is not invertible, there exists a vector u ≠ 0
Thus B − λI is similar to A − λI. Hence, due to lemma 1.48, their
determinants are equal, i.e.
det(B − λI) = det(A − λI).
This means that the characteristic polynomials of A and B are the same.
Since eigen values are nothing but the roots of the characteristic polynomial,
they are the same too. This means that the spectrum (the set
of distinct eigen values) is the same. □
Corollary 1.70. If A and B are similar to each other then
(a) An eigen value has the same algebraic and geometric multiplicity for
both A and B.
(b) The (not necessarily distinct) eigen values of A and B are the same.
Although the eigen values are the same, the eigen vectors are different.
Lemma 1.71 Let A and B be similar with
B = C^{-1} A C
for some invertible matrix C. If u is an eigen vector of A for an
eigen value λ, then C^{-1} u is an eigen vector of B for the same
eigen value.
Proof. u is an eigen vector of A for the eigen value λ. Thus we
have
Au = λu.
Thus
B C^{-1} u = C^{-1} A C C^{-1} u = C^{-1} A u = C^{-1} λ u = λ C^{-1} u.
Now u ≠ 0 and C^{-1} is non-singular. Thus C^{-1} u ≠ 0. Thus C^{-1} u is an
eigen vector of B. □
Theorem 1.72 [Geometric vs. algebraic multiplicity] Let λ be an
eigen value of a square matrix A. Then the geometric multiplicity
of λ is less than or equal to its algebraic multiplicity.
Corollary 1.73. If an n×n matrix A has n distinct eigen values, then
each of them has a geometric (and algebraic) multiplicity of 1.
Proof. The algebraic multiplicity of an eigen value is greater than
or equal to 1, and the sum of the algebraic multiplicities cannot exceed n.
Since there are n distinct eigen values, each of them has algebraic
multiplicity 1. Now the geometric multiplicity of an eigen value is greater
than or equal to 1 and less than or equal to its algebraic multiplicity,
hence it is also 1. □
Corollary 1.74. Let an n × n matrix A have k distinct eigen values
λ_1, λ_2, …, λ_k with algebraic multiplicities r_1, r_2, …, r_k and geometric
multiplicities g_1, g_2, …, g_k respectively. Then
\sum_{i=1}^{k} g_i ≤ \sum_{i=1}^{k} r_i ≤ n.
Moreover if
\sum_{i=1}^{k} g_i = \sum_{i=1}^{k} r_i
then
g_i = r_i for every i.
1.6.4. Linear independence of eigen vectors
Theorem 1.75 [Linear independence of eigen vectors for distinct
eigen values] Let A be an n × n square matrix. Let x1, x2, . . . , xk
be any k eigen vectors of A for distinct eigen values λ1, λ2, . . . , λk
respectively. Then x1, x2, . . . , xk are linearly independent.
Proof. We first prove the simpler case with 2 eigen vectors x1 and
x2 and corresponding eigen values λ1 and λ2 respectively.
Let there be a linear relationship between x_1 and x_2 given by
α_1 x_1 + α_2 x_2 = 0.
Multiplying both sides with (A − λ_1 I) we get
α_1 (A − λ_1 I) x_1 + α_2 (A − λ_1 I) x_2 = 0
=⇒ α_1 (λ_1 − λ_1) x_1 + α_2 (λ_2 − λ_1) x_2 = 0
=⇒ α_2 (λ_2 − λ_1) x_2 = 0.
Since λ_1 ≠ λ_2 and x_2 ≠ 0, hence α_2 = 0.
Similarly by multiplying with (A − λ2I) on both sides, we can show
that α1 = 0. Thus x1 and x2 are linearly independent.
Now for the general case, consider a linear relationship between x_1, x_2, …, x_k
given by
α_1 x_1 + α_2 x_2 + ⋯ + α_k x_k = 0.
Multiplying by \prod_{i=1, i ≠ j}^{k} (A − λ_i I) and using the fact that λ_i ≠ λ_j if
i ≠ j, we get α_j = 0. Thus the only linear relationship is the trivial
one. This completes the proof. □
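A minimal Python check of theorem 1.75 on a hypothetical 2 × 2 example; for two vectors in the plane, linear independence is equivalent to a non-zero 2 × 2 determinant:

```python
# The symmetric matrix [[2, 1], [1, 2]] has eigen values 3 and 1 with
# eigen vectors (1, 1) and (1, -1); we verify the eigen pairs and then
# confirm independence via the determinant test.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[2, 1], [1, 2]]
x1, lam1 = [1, 1], 3
x2, lam2 = [1, -1], 1

print(matvec(A, x1) == [lam1 * v for v in x1])  # True
print(matvec(A, x2) == [lam2 * v for v in x2])  # True

det = x1[0] * x2[1] - x1[1] * x2[0]
print(det)  # -2, non-zero: x1 and x2 are linearly independent
```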
For eigen values with geometric multiplicity greater than 1, there are
multiple linearly independent eigen vectors corresponding to the eigen
value. In this context, the above theorem can be generalized further.
Theorem 1.76 Let λ_1, λ_2, …, λ_k be k distinct eigen values of
A. Let {x_1^j, x_2^j, …, x_{g_j}^j} be any g_j linearly independent eigen vectors
from the eigen space of λ_j, where g_j is the geometric multiplicity
of λ_j. Then the combined set of eigen vectors given by
{x_1^1, …, x_{g_1}^1, …, x_1^k, …, x_{g_k}^k}, consisting of \sum_{j=1}^{k} g_j eigen vectors, is
linearly independent.
This result puts an upper limit on the number of linearly independent
eigen vectors of a square matrix.
Lemma 1.77 Let {λ_1, …, λ_k} represent the spectrum of an n × n
matrix A. Let g_1, …, g_k be the geometric multiplicities of λ_1, …, λ_k
respectively. Then the number of linearly independent eigen vectors
for A is
\sum_{i=1}^{k} g_i.
Moreover if
\sum_{i=1}^{k} g_i = n
then a set of n linearly independent eigen vectors of A can be found
which forms a basis for F^n.
1.6.5. Diagonalization
Diagonalization is one of the fundamental operations in linear algebra.
This section discusses diagonalization of square matrices in depth.
Definition 1.31 [Diagonalizable matrix] An n × n matrix A is
said to be diagonalizable if it is similar to a diagonal matrix.
In other words there exists an n × n non-singular matrix P such
that D = P−1AP is a diagonal matrix. If this happens then we
say that P diagonalizes A or A is diagonalized by P .
Remark.
D = P^{-1} A P ⇐⇒ PD = AP ⇐⇒ P D P^{-1} = A. (1.6.16)
We note that if we restrict to real matrices, then P and D should
also be real. If A ∈ C^{n×n} (it may still be real) then P and D can be
complex.
The next theorem is the culmination of a variety of results studied so
far.
Theorem 1.78 [Properties of diagonalizable matrices] Let A be a
diagonalizable matrix with D = P^{-1} A P being its diagonalization.
Let D = diag(d_1, d_2, …, d_n). Then the following hold:
(a) rank(A) = rank(D), which equals the number of non-zero entries
on the main diagonal of D.
(b) det(A) = d_1 d_2 ⋯ d_n.
(c) tr(A) = d_1 + d_2 + ⋯ + d_n.
(d) The characteristic polynomial of A is
p(λ) = (−1)^n (λ − d_1)(λ − d_2) ⋯ (λ − d_n).
(e) The spectrum of A comprises the distinct scalars on the diagonal
of D.
(f) The (not necessarily distinct) eigen values of A are the diagonal
elements of D.
(g) The columns of P are (linearly independent) eigen vectors of
A.
(h) The algebraic and geometric multiplicities of an eigen value λ
of A equal the number of diagonal elements of D that equal λ.
Proof. From definition 1.31 we note that D and A are similar.
Due to lemma 1.48,
det(A) = det(D).
Due to lemma 1.47,
det(D) = \prod_{i=1}^{n} d_i.
Now due to lemma 1.39,
tr(A) = tr(D) = \sum_{i=1}^{n} d_i.
Further, due to lemma 1.69 the characteristic polynomial and spectrum
of A and D are the same. Due to lemma 1.67 the eigen values of D are
nothing but its diagonal entries. Hence they are also the eigen values
of A.
D = P^{-1} A P =⇒ AP = PD.
Now writing
P = \begin{bmatrix} p_1 & p_2 & \dots & p_n \end{bmatrix}
we have
AP = \begin{bmatrix} A p_1 & A p_2 & \dots & A p_n \end{bmatrix}
= PD = \begin{bmatrix} d_1 p_1 & d_2 p_2 & \dots & d_n p_n \end{bmatrix}.
Thus the p_i are eigen vectors of A.
Since the characteristic polynomials of A and D are the same, the
algebraic multiplicities of the eigen values are the same.
From lemma 1.71 we get that there is a one to one correspondence
between the eigen vectors of A and D through the change of basis
given by P. Thus the linear independence relationships between the
eigen vectors remain the same. Hence the geometric multiplicities of
individual eigen values are also the same.
This completes the proof. □
So far we have verified various results which are available if a matrix A
is diagonalizable. We haven’t yet identified the conditions under which
A is diagonalizable. We note that not every matrix is diagonalizable.
The following theorem gives necessary and sufficient conditions under
which a matrix is diagonalizable.
Theorem 1.79 An n × n matrix A is diagonalizable by an n × n
non-singular matrix P if and only if the columns of P are (linearly
independent) eigen vectors of A.
Proof. We note that since P is non-singular, the columns of P
must be linearly independent.
The necessary condition part was proven in theorem 1.78. We now
show that if P consists of n linearly independent eigen vectors of A,
then A is diagonalizable.
Let the columns of P be p_1, p_2, …, p_n and the corresponding (not
necessarily distinct) eigen values be d_1, d_2, …, d_n. Then
A p_i = d_i p_i.
Thus, by letting D = diag(d_1, d_2, …, d_n), we have
AP = PD.
Now since the columns of P are linearly independent, P is invertible.
This gives us
D = P^{-1} A P.
Thus A is similar to a diagonal matrix D. This validates the sufficient
condition. □
A corollary follows.
Corollary 1.80. An n × n matrix A is diagonalizable if and only if there
exists a linearly independent set of n eigen vectors of A.
Now we know that geometric multiplicities of eigen values of A provide
us information about linearly independent eigenvectors of A.
Corollary 1.81. Let A be an n × n matrix. Let λ_1, λ_2, …, λ_k be its k
distinct eigen values (comprising its spectrum). Let g_j be the geometric
multiplicity of λ_j. Then A is diagonalizable if and only if
\sum_{i=1}^{k} g_i = n. (1.6.17)
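The diagonalization D = P^{-1} A P can be verified numerically; the sketch below uses a hypothetical 2 × 2 matrix with eigen values 3 and 1, and exact fractions so the result is exact:

```python
# Columns of P are eigen vectors of A; P^{-1} A P comes out diagonal
# with the eigen values on the diagonal (definition 1.31).
from fractions import Fraction as F

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(M):
    """Closed-form inverse of a 2x2 matrix."""
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[ M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d,  M[0][0] / d]]

A = [[F(2), F(1)], [F(1), F(2)]]     # eigen values 3 and 1
P = [[F(1), F(1)], [F(1), F(-1)]]    # columns: eigen vectors (1,1), (1,-1)

D = matmul(matmul(inv2(P), A), P)
print(D)  # equals diag(3, 1)
```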
1.6.6. Symmetric matrices
This subsection is focused on real symmetric matrices.
Following is a fundamental property of real symmetric matrices.
Theorem 1.82 Every real symmetric matrix has an eigen value.
The proof of this result is beyond the scope of this book.
Lemma 1.83 Let A be an n×n real symmetric matrix. Let λ1 and
λ2 be any two distinct eigen values of A and let x1 and x2 be any
two corresponding eigen vectors. Then x1 and x2 are orthogonal.
Proof. By definition we have A x_1 = λ_1 x_1 and A x_2 = λ_2 x_2. Thus
x_2^T A x_1 = λ_1 x_2^T x_1
=⇒ x_1^T A^T x_2 = λ_1 x_1^T x_2
=⇒ x_1^T A x_2 = λ_1 x_1^T x_2
=⇒ λ_2 x_1^T x_2 = λ_1 x_1^T x_2
=⇒ (λ_1 − λ_2) x_1^T x_2 = 0
=⇒ x_1^T x_2 = 0.
Thus x_1 and x_2 are orthogonal. In between we took the transpose on both
sides, and used the facts that A = A^T and λ_1 − λ_2 ≠ 0. □
Definition 1.32 [Orthogonally diagonalizable matrix] A real n × n
matrix A is said to be orthogonally diagonalizable if there
exists an orthogonal matrix U which can diagonalize A, i.e.
D = U^T A U
is a real diagonal matrix.
Lemma 1.84 Every orthogonally diagonalizable matrix A is sym-
metric.
Proof. We have a diagonal matrix D such that
A = U D U^T.
Taking the transpose on both sides we get
A^T = U D^T U^T = U D U^T = A.
Thus A is symmetric. □
Theorem 1.85 Every real symmetric matrix A is orthogonally diagonalizable.
We skip the proof of this theorem.
1.6.7. Hermitian matrices
Following is a fundamental property of Hermitian matrices.
Theorem 1.86 Every Hermitian matrix has an eigen value.
The proof of this result is beyond the scope of this book.
Lemma 1.87 The eigenvalues of a Hermitian matrix are real.
Proof. Let A be a Hermitian matrix and let λ be an eigen value
of A. Let u be a corresponding eigen vector. Then
Au = λu
=⇒ u^H A^H = \bar{λ} u^H
=⇒ u^H A^H u = \bar{λ} u^H u
=⇒ u^H A u = \bar{λ} u^H u
=⇒ λ u^H u = \bar{λ} u^H u
=⇒ ‖u‖_2^2 (λ − \bar{λ}) = 0
=⇒ λ = \bar{λ},
thus λ is real. We used the facts that A = A^H and u ≠ 0 =⇒ ‖u‖_2 ≠ 0. □
Lemma 1.88 Let A be an n × n complex Hermitian matrix. Let
λ1 and λ2 be any two distinct eigen values of A and let x1 and
x2 be any two corresponding eigen vectors. Then x1 and x2 are
orthogonal.
Proof. By definition we have A x_1 = λ_1 x_1 and A x_2 = λ_2 x_2. Thus
x_2^H A x_1 = λ_1 x_2^H x_1
=⇒ x_1^H A^H x_2 = λ_1 x_1^H x_2
=⇒ x_1^H A x_2 = λ_1 x_1^H x_2
=⇒ λ_2 x_1^H x_2 = λ_1 x_1^H x_2
=⇒ (λ_1 − λ_2) x_1^H x_2 = 0
=⇒ x_1^H x_2 = 0.
Thus x_1 and x_2 are orthogonal. In between we took the conjugate transpose
on both sides, and used the facts that A = A^H, the eigen values of A are
real (lemma 1.87), and λ_1 − λ_2 ≠ 0. □
Definition 1.33 [Unitary diagonalizable matrix] A complex n × n
matrix A is said to be unitary diagonalizable if there exists a
unitary matrix U which can diagonalize A, i.e.
D = U^H A U
is a complex diagonal matrix.
Lemma 1.89 Let A be a unitary diagonalizable matrix whose di-
agonalization D is real. Then A is Hermitian.
Proof. We have a real diagonal matrix D such that
A = U D U^H.
Taking the conjugate transpose on both sides we get
A^H = U D^H U^H = U D U^H = A.
Thus A is Hermitian. We used the fact that D^H = D since D is
real. □
Theorem 1.90 Every Hermitian matrix A is unitary diagonaliz-
able.
We skip the proof of this theorem. The theorem means that if A is
Hermitian then it admits a factorization A = U Λ U^H with U unitary
and Λ diagonal.
Definition 1.34 [Eigen value decomposition of a Hermitian matrix]
Let A be an n × n Hermitian matrix. Let λ_1, …, λ_n be its
eigen values such that |λ_1| ≥ |λ_2| ≥ ⋯ ≥ |λ_n|. Let
Λ = diag(λ_1, …, λ_n).
Let U be a unitary matrix consisting of orthonormal eigen vectors
corresponding to λ_1, …, λ_n. Then the eigen value decomposition
of A is defined as
A = U Λ U^H. (1.6.18)
If the λ_i are distinct, then the decomposition is unique. If they are
not distinct, then it is not unique.
Remark. Let Λ be a diagonal matrix as in definition 1.34. Consider
some vector x ∈ C^n. Then
x^H Λ x = \sum_{i=1}^{n} λ_i |x_i|^2. (1.6.19)
Now if λ_i ≥ 0 then
x^H Λ x ≤ λ_1 \sum_{i=1}^{n} |x_i|^2 = λ_1 ‖x‖_2^2.
Also
x^H Λ x ≥ λ_n \sum_{i=1}^{n} |x_i|^2 = λ_n ‖x‖_2^2.
Lemma 1.91 Let A be a Hermitian matrix with non-negative eigen
values. Let λ_1 be its largest and λ_n be its smallest eigen value. Then
λ_n ‖x‖_2^2 ≤ x^H A x ≤ λ_1 ‖x‖_2^2 ∀ x ∈ C^n. (1.6.20)
Proof. A has an eigen value decomposition given by
A = U Λ U^H.
Let x ∈ C^n and let v = U^H x. Clearly ‖x‖_2 = ‖v‖_2. Then
x^H A x = x^H U Λ U^H x = v^H Λ v.
From the previous remark we have
λ_n ‖v‖_2^2 ≤ v^H Λ v ≤ λ_1 ‖v‖_2^2.
Thus we get
λ_n ‖x‖_2^2 ≤ x^H A x ≤ λ_1 ‖x‖_2^2. □
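A quick numerical check of the bounds for the special case A = Λ (diagonal), with hypothetical values:

```python
# For a diagonal Hermitian matrix the quadratic form x^H A x is squeezed
# between lambda_n ||x||^2 and lambda_1 ||x||^2 (lemma 1.91).

def quad_form_diag(lams, x):
    """x^H diag(lams) x for a complex vector x."""
    return sum(l * abs(xi) ** 2 for l, xi in zip(lams, x))

def norm_sq(x):
    return sum(abs(xi) ** 2 for xi in x)

lams = [5.0, 2.0, 1.0]          # lambda_1 >= ... >= lambda_n >= 0
x = [1 + 1j, 2.0, -1.0]

q = quad_form_diag(lams, x)
print(lams[-1] * norm_sq(x) <= q <= lams[0] * norm_sq(x))  # True
```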
1.6.8. Miscellaneous properties
This subsection lists some miscellaneous properties of eigen values of a
square matrix.
Lemma 1.92 λ is an eigen value of A if and only if λ + k is an
eigen value of A + kI. Moreover A and A + kI share the same
eigen vectors.
Proof.
Ax = λx
⇐⇒ Ax + kx = λx + kx
⇐⇒ (A + kI)x = (λ + k)x. (1.6.21)
Thus λ is an eigen value of A with an eigen vector x if and only if λ + k
is an eigen value of A + kI with an eigen vector x. □
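A small Python check of the lemma on a hypothetical 2 × 2 example:

```python
# Shifting A by kI shifts every eigen value by k while keeping the
# eigen vector (lemma 1.92).

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def shift(A, k):
    """A + kI."""
    return [[A[i][j] + (k if i == j else 0) for j in range(len(A))]
            for i in range(len(A))]

A = [[2, 1], [1, 2]]
x, lam, k = [1, 1], 3, 10

print(matvec(A, x))            # [3, 3]  = lam * x
print(matvec(shift(A, k), x))  # [13, 13] = (lam + k) * x
```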
1.6.9. Diagonally dominant matrices
Definition 1.35 [Diagonally dominant matrix] Let A = [a_{ij}] be a
square matrix in C^{n×n}. A is called diagonally dominant if
|a_{ii}| ≥ \sum_{j ≠ i} |a_{ij}|
holds true for all 1 ≤ i ≤ n, i.e. the absolute value of each diagonal
element is greater than or equal to the sum of the absolute values of
the off-diagonal elements in its row.
Definition 1.36 [Strictly diagonally dominant matrix] Let A =
[a_{ij}] be a square matrix in C^{n×n}. A is called strictly diagonally
dominant if
|a_{ii}| > \sum_{j ≠ i} |a_{ij}|
holds true for all 1 ≤ i ≤ n, i.e. the absolute value of each diagonal
element is strictly greater than the sum of the absolute values of the
off-diagonal elements in its row.
Example 1.2: Strictly diagonally dominant matrix. Let us consider
A = \begin{bmatrix}
−4 & −2 & −1 & 0 \\
−4 & 7 & 2 & 0 \\
3 & −4 & 9 & 1 \\
2 & −1 & −3 & 15
\end{bmatrix}
We can see that the strict diagonal dominance condition is satisfied for
each row as follows:
row 1: |−4| > |−2| + |−1| + |0| = 3
row 2: |7| > |−4| + |2| + |0| = 6
row 3: |9| > |3| + |−4| + |1| = 8
row 4: |15| > |2| + |−1| + |−3| = 6
□
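The row-wise condition of definition 1.36 is easy to code; the sketch below re-checks the matrix of example 1.2:

```python
# Row-wise strict diagonal dominance check (definition 1.36).

def is_strictly_diagonally_dominant(A):
    return all(abs(row[i]) > sum(abs(v) for j, v in enumerate(row) if j != i)
               for i, row in enumerate(A))

A = [[-4, -2, -1,  0],
     [-4,  7,  2,  0],
     [ 3, -4,  9,  1],
     [ 2, -1, -3, 15]]

print(is_strictly_diagonally_dominant(A))  # True
```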
Strictly diagonally dominant matrices have a very special property.
They are always non-singular.
Theorem 1.93 Strictly diagonally dominant matrices are non-
singular.
Proof. Suppose that A is strictly diagonally dominant and singular. Then
there exists a vector u ∈ C^n with u ≠ 0 such that
Au = 0. (1.6.22)
Let
u = \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix}^T.
We first show that the entries of u cannot all be equal in magnitude. Let
us assume that they are, i.e.
c = |u_1| = |u_2| = ⋯ = |u_n|.
Since u ≠ 0, c ≠ 0. Now for any row i in (1.6.22), we have
\sum_{j=1}^{n} a_{ij} u_j = 0
=⇒ \sum_{j=1}^{n} ± a_{ij} c = 0
=⇒ \sum_{j=1}^{n} ± a_{ij} = 0
=⇒ ∓ a_{ii} = \sum_{j ≠ i} ± a_{ij}
=⇒ |a_{ii}| = \left| \sum_{j ≠ i} ± a_{ij} \right|
=⇒ |a_{ii}| ≤ \sum_{j ≠ i} |a_{ij}| (using the triangle inequality)
but this contradicts our assumption that A is strictly diagonally dominant.
Thus the entries of u cannot all be equal in magnitude.
Let us now assume that the largest entry in u lies at index i with
|u_i| = c. Without loss of generality we can scale u down by c to
get another vector in which all entries are less than or equal to 1 in
magnitude while the i-th entry is ±1, i.e. u_i = ±1 and |u_j| ≤ 1 for all
other entries.
Now from (1.6.22) we get, for the i-th row,
\sum_{j=1}^{n} a_{ij} u_j = 0
=⇒ ± a_{ii} = \sum_{j ≠ i} u_j a_{ij}
=⇒ |a_{ii}| ≤ \sum_{j ≠ i} |u_j a_{ij}| ≤ \sum_{j ≠ i} |a_{ij}|
which again contradicts our assumption that A is strictly diagonally
dominant.
Hence strictly diagonally dominant matrices are non-singular. □
1.6.10. Gershgorin’s theorem
We are now ready to examine Gershgorin's theorem, which provides very
useful bounds on the spectrum of a square matrix.
Theorem 1.94 Every eigen value λ of a square matrix A ∈ C^{n×n}
satisfies
|λ − a_{ii}| ≤ \sum_{j ≠ i} |a_{ij}| for some i ∈ {1, 2, …, n}. (1.6.23)
Proof. The proof is a straightforward application of the non-singularity of strictly diagonally dominant matrices.

We know that for an eigen value $\lambda$, $\det(\lambda I - A) = 0$, i.e. the matrix $(\lambda I - A)$ is singular. Hence it cannot be strictly diagonally dominant, due to theorem 1.93.

Thus, looking at each row $i$ of $(\lambda I - A)$, we can say that
$$|\lambda - a_{ii}| > \sum_{j \neq i} |a_{ij}|$$
cannot be true for all rows simultaneously, i.e. it must fail for at least one row. This means that there exists at least one row $i$ for which
$$|\lambda - a_{ii}| \le \sum_{j \neq i} |a_{ij}|$$
holds true. ∎
What this theorem means is pretty simple. Consider a disc in the complex plane for the $i$-th row of $A$ whose center is given by $a_{ii}$ and whose radius is given by $r_i = \sum_{j\neq i} |a_{ij}|$, i.e. the sum of the magnitudes of all non-diagonal entries in the $i$-th row.
There are n such discs corresponding to n rows in A. (1.6.23) means
that every eigen value must lie within the union of these discs. It
cannot lie outside.
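The disc containment is easy to verify numerically. A sketch using numpy (variable names are ours):

```python
import numpy as np

A = np.array([[-4, -2, -1,  0],
              [-4,  7,  2,  0],
              [ 3, -4,  9,  1],
              [ 2, -1, -3, 15]], dtype=float)

centers = np.diag(A)                              # c_i = a_ii
radii = np.abs(A).sum(axis=1) - np.abs(centers)   # r_i = sum_{j != i} |a_ij|

eigvals = np.linalg.eigvals(A)
# each eigen value must lie in at least one disc |lambda - c_i| <= r_i
in_union = [bool(np.any(np.abs(lam - centers) <= radii)) for lam in eigvals]
```

For this strictly diagonally dominant matrix, every disc also excludes the origin, which is another way of seeing theorem 1.93.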
This idea is crystallized in the following definition.
Definition 1.37 [Gershgorin's disc] For the $i$-th row of matrix $A$ we define the radius $r_i = \sum_{j\neq i} |a_{ij}|$ and the center $c_i = a_{ii}$. Then the set given by
$$D_i = \{z \in \mathbb{C} : |z - a_{ii}| \le r_i\}$$
is called the $i$-th Gershgorin's disc of $A$.
We note that the definition is equally valid for real as well as complex matrices. For real matrices, the centers of the discs lie on the real line. For complex matrices, the centers may lie anywhere in the complex plane.
Clearly there is nothing magical about the rows of A. We can as well
consider the columns of A.
Theorem 1.95 Every eigen value of a matrix $A$ must lie in a Gershgorin disc corresponding to the columns of $A$, where the Gershgorin disc for the $j$-th column is given by
$$D_j = \{z \in \mathbb{C} : |z - a_{jj}| \le r_j\}$$
with
$$r_j = \sum_{i \neq j} |a_{ij}|.$$
Proof. We know that the eigen values of $A$ are the same as the eigen values of $A^T$, and the columns of $A$ are nothing but the rows of $A^T$. Hence the eigen values of $A$ must satisfy the conditions of theorem 1.94 w.r.t. the matrix $A^T$. This completes the proof. ∎
1.7. Singular values
In the previous section we saw the diagonalization of square matrices, which resulted in an eigen value decomposition of the matrix. This matrix factorization is very useful, yet it is not applicable in all situations. In particular, the eigen value decomposition is useless if the square matrix is not diagonalizable, or if the matrix is not square at all. Moreover, the decomposition is particularly useful only for real symmetric or Hermitian matrices, where the diagonalizing matrix is an F-unitary matrix (see definition 1.23). Otherwise, one has to consider the inverse of the diagonalizing matrix also.
Fortunately there happens to be another decomposition which applies
to all matrices and it involves just F-unitary matrices.
Definition 1.38 [Singular value] A non-negative real number $\sigma$ is a singular value for a matrix $A \in \mathbb{F}^{m\times n}$ if and only if there exist unit-length vectors $u \in \mathbb{F}^m$ and $v \in \mathbb{F}^n$ such that
$$Av = \sigma u \qquad (1.7.1)$$
and
$$A^H u = \sigma v \qquad (1.7.2)$$
hold. The vectors $u$ and $v$ are called left-singular and right-singular vectors for $\sigma$ respectively.
We first present the basic result of singular value decomposition. We
will not prove this result completely although we will present proofs of
some aspects.
Theorem 1.96 For every $A \in \mathbb{F}^{m\times n}$ with $k = \min(m,n)$, there exist two $\mathbb{F}$-unitary matrices $U \in \mathbb{F}^{m\times m}$ and $V \in \mathbb{F}^{n\times n}$ and a sequence of real numbers
$$\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_k \ge 0$$
such that
$$U^H A V = \Sigma \qquad (1.7.3)$$
where
$$\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \dots, \sigma_k) \in \mathbb{F}^{m\times n}.$$
The non-negative real numbers $\sigma_i$ are the singular values of $A$ as per definition 1.38.
The sequence of real numbers σi doesn’t depend on the particular
choice of U and V .
$\Sigma$ is rectangular with the same size as $A$. The singular values of $A$ lie on the principal diagonal of $\Sigma$. All other entries in $\Sigma$ are zero.
It is certainly possible that some of the singular values are 0 themselves.
Remark. Since $U^H A V = \Sigma$, we have
$$A = U\Sigma V^H. \qquad (1.7.4)$$
Definition 1.39 [Singular value decomposition] The decomposi-
tion of a matrix A ∈ Fm×n given by
A = UΣV H (1.7.5)
is known as its singular value decomposition.
Remark. When F is R then the decomposition simplifies to
UTAV = Σ (1.7.6)
and
A = UΣV T . (1.7.7)
Remark. Clearly there can be at most k = min(m,n) distinct singular
values of A.
Remark. We can also write
AV = UΣ. (1.7.8)
Remark. Let us expand
$$A = U\Sigma V^H = \begin{bmatrix} u_1 & u_2 & \dots & u_m \end{bmatrix} [\sigma_{ij}] \begin{bmatrix} v_1^H \\ v_2^H \\ \vdots \\ v_n^H \end{bmatrix} = \sum_{i=1}^m \sum_{j=1}^n \sigma_{ij} u_i v_j^H.$$
Remark. Alternatively, let us expand
$$\Sigma = U^H A V = \begin{bmatrix} u_1^H \\ u_2^H \\ \vdots \\ u_m^H \end{bmatrix} A \begin{bmatrix} v_1 & v_2 & \dots & v_n \end{bmatrix} = [u_i^H A v_j].$$
This gives us
$$\sigma_{ij} = u_i^H A v_j. \qquad (1.7.9)$$
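The identity $U^H A V = \Sigma$ can be checked directly with numpy. Note that `numpy.linalg.svd` returns $V^H$ (not $V$), so we transpose-conjugate to recover $V$; the matrix here is random, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

U, s, Vh = np.linalg.svd(A)          # A = U @ Sigma @ Vh, s sorted descending
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)

# U^H A V should equal the rectangular diagonal matrix Sigma
err = np.abs(U.conj().T @ A @ Vh.conj().T - Sigma).max()
```

The residual `err` is at the level of floating-point round-off.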
The following lemma verifies that $\Sigma$ indeed consists of singular values of $A$ as per definition 1.38.
Lemma 1.97 Let A = UΣV H be a singular value decomposition
of A. Then the main diagonal entries of Σ are singular values.
The first k = min(m,n) column vectors in U and V are left and
right singular vectors of A.
Proof. We have
$$AV = U\Sigma.$$
Let us expand the R.H.S.:
$$U\Sigma = \Big[\sum_{j=1}^m u_{ij}\sigma_{jk}\Big] = [u_{ik}\sigma_k] = \begin{bmatrix} \sigma_1 u_1 & \sigma_2 u_2 & \dots & \sigma_k u_k & 0 & \dots & 0 \end{bmatrix}$$
where the 0 columns at the end appear $n - k$ times. Expanding the L.H.S. we get
$$AV = \begin{bmatrix} Av_1 & Av_2 & \dots & Av_n \end{bmatrix}.$$
Thus by comparing both sides we get
$$Av_i = \sigma_i u_i \text{ for } 1 \le i \le k$$
and
$$Av_i = 0 \text{ for } k < i \le n.$$
Now let us start with
$$A = U\Sigma V^H \implies A^H = V\Sigma^H U^H \implies A^H U = V\Sigma^H.$$
Let us expand the R.H.S.:
$$V\Sigma^H = \Big[\sum_{j=1}^n v_{ij}\sigma_{jk}\Big] = [v_{ik}\sigma_k] = \begin{bmatrix} \sigma_1 v_1 & \sigma_2 v_2 & \dots & \sigma_k v_k & 0 & \dots & 0 \end{bmatrix}$$
where the 0 columns appear $m - k$ times. Expanding the L.H.S. we get
$$A^H U = \begin{bmatrix} A^H u_1 & A^H u_2 & \dots & A^H u_m \end{bmatrix}.$$
Thus by comparing both sides we get
$$A^H u_i = \sigma_i v_i \text{ for } 1 \le i \le k$$
and
$$A^H u_i = 0 \text{ for } k < i \le m.$$
We now consider the three cases.

For $m = n$, we have $k = m = n$, and we get
$$Av_i = \sigma_i u_i, \quad A^H u_i = \sigma_i v_i \ \text{ for } 1 \le i \le m.$$
Thus $\sigma_i$ is a singular value of $A$, and $u_i$ is a left singular vector while $v_i$ is a right singular vector.

For $m < n$, we have $k = m$. We get for the first $m$ vectors in $V$
$$Av_i = \sigma_i u_i, \quad A^H u_i = \sigma_i v_i \ \text{ for } 1 \le i \le m.$$
Finally, for the remaining $n - m$ vectors in $V$, we can write
$$Av_i = 0.$$
They belong to the null space of $A$.

For $m > n$, we have $k = n$. We get for the first $n$ vectors in $U$
$$Av_i = \sigma_i u_i, \quad A^H u_i = \sigma_i v_i \ \text{ for } 1 \le i \le n.$$
Finally, for the remaining $m - n$ vectors in $U$, we can write
$$A^H u_i = 0. \qquad \blacksquare$$
Lemma 1.98 $\Sigma\Sigma^H$ is an $m\times m$ matrix given by
$$\Sigma\Sigma^H = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2, 0, 0, \dots, 0)$$
where the number of 0's following $\sigma_k^2$ is $m - k$.

Lemma 1.99 $\Sigma^H\Sigma$ is an $n\times n$ matrix given by
$$\Sigma^H\Sigma = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2, 0, 0, \dots, 0)$$
where the number of 0's following $\sigma_k^2$ is $n - k$.
Lemma 1.100 [Rank and singular value decomposition] Let $A \in \mathbb{F}^{m\times n}$ have a singular value decomposition given by
$$A = U\Sigma V^H.$$
Then
$$\operatorname{rank}(A) = \operatorname{rank}(\Sigma). \qquad (1.7.10)$$
In other words, the rank of $A$ is the number of non-zero singular values of $A$. Since the singular values are ordered in descending order in $\Sigma$, the first $r$ singular values $\sigma_1, \dots, \sigma_r$ are the non-zero ones.

Proof. This is a straightforward application of lemma 1.6 and lemma 1.7. Further, since the only non-zero values in $\Sigma$ appear on its main diagonal, its rank is the number of non-zero singular values $\sigma_i$. ∎
Corollary 1.101. Let $r = \operatorname{rank}(A)$. Then $\Sigma$ can be split as a block matrix
$$\Sigma = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} \qquad (1.7.11)$$
where $\Sigma_r = \operatorname{diag}(\sigma_1, \sigma_2, \dots, \sigma_r)$ is an $r\times r$ diagonal matrix of the non-zero singular values. All other sub-matrices in $\Sigma$ are 0.
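Lemma 1.100 suggests a practical way to compute rank: count the singular values above a small tolerance. A sketch (the matrix is an arbitrary rank-2 construction of ours):

```python
import numpy as np

# A rank-2 matrix built from two outer products of independent vectors
x = np.arange(1.0, 5.0)          # [1, 2, 3, 4]
y = np.ones(4)
A = np.outer(x, x) + np.outer(y, np.arange(4.0))

s = np.linalg.svd(A, compute_uv=False)
numerical_rank = int(np.sum(s > 1e-10 * s[0]))   # non-zero singular values
```

This is essentially what `numpy.linalg.matrix_rank` does internally, with its own default tolerance.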
Lemma 1.102 The eigen values of the Hermitian matrix $A^H A \in \mathbb{F}^{n\times n}$ are $\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2, 0, 0, \dots, 0$ with $n - k$ 0's after $\sigma_k^2$. Moreover, the eigen vectors are the columns of $V$.

Proof.
$$A^H A = \left(U\Sigma V^H\right)^H U\Sigma V^H = V\Sigma^H U^H U\Sigma V^H = V\Sigma^H\Sigma V^H.$$
We note that $A^H A$ is Hermitian. Hence $A^H A$ is diagonalized by $V$ and the diagonalization of $A^H A$ is $\Sigma^H\Sigma$. Thus the eigen values of $A^H A$ are $\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2, 0, 0, \dots, 0$ with $n - k$ 0's after $\sigma_k^2$. Clearly
$$(A^H A)V = V(\Sigma^H\Sigma),$$
thus the columns of $V$ are the eigen vectors of $A^H A$. ∎
Lemma 1.103 The eigen values of the Hermitian matrix $AA^H \in \mathbb{F}^{m\times m}$ are $\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2, 0, 0, \dots, 0$ with $m - k$ 0's after $\sigma_k^2$. Moreover, the eigen vectors are the columns of $U$.

Proof.
$$AA^H = U\Sigma V^H\left(U\Sigma V^H\right)^H = U\Sigma V^H V\Sigma^H U^H = U\Sigma\Sigma^H U^H.$$
We note that $AA^H$ is Hermitian. Hence $AA^H$ is diagonalized by $U$ and the diagonalization of $AA^H$ is $\Sigma\Sigma^H$. Thus the eigen values of $AA^H$ are $\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2, 0, 0, \dots, 0$ with $m - k$ 0's after $\sigma_k^2$. Clearly
$$(AA^H)U = U(\Sigma\Sigma^H),$$
thus the columns of $U$ are the eigen vectors of $AA^H$. ∎
Lemma 1.104 The Gram matrices $AA^H$ and $A^H A$ share the same eigen values except for some extra 0's. Their eigen values are the squares of the singular values of $A$, together with some extra 0's. In other words, the singular values of $A$ are the square roots of the non-zero eigen values of the Gram matrices $AA^H$ or $A^H A$.
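This relationship between the Gram matrices and the singular values is easy to confirm numerically; here is a sketch for a random real $5\times 3$ matrix (so $k = 3$ and $AA^H$ carries $m - k = 2$ extra zero eigen values):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

s = np.linalg.svd(A, compute_uv=False)                 # k = min(5, 3) = 3
eig_AHA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # 3 eigen values
eig_AAH = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]   # 5 eigen values

# eigen values of A^H A are the squared singular values;
# A A^H has the same ones plus two (near-)zero eigen values
err1 = np.abs(eig_AHA - s**2).max()
err2 = np.abs(eig_AAH[:3] - s**2).max()
```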
1.7.1. The largest singular value
Lemma 1.105 For all $u \in \mathbb{F}^n$ the following holds:
$$\|\Sigma u\|_2 \le \sigma_1 \|u\|_2. \qquad (1.7.12)$$
Moreover, for all $u \in \mathbb{F}^m$ the following holds:
$$\|\Sigma^H u\|_2 \le \sigma_1 \|u\|_2. \qquad (1.7.13)$$

Proof. Let us expand the term $\Sigma u$:
$$\Sigma u = \begin{bmatrix} \sigma_1 u_1 & \sigma_2 u_2 & \dots & \sigma_k u_k & 0 & \dots & 0 \end{bmatrix}^T.$$
Now since $\sigma_1$ is the largest singular value,
$$|\sigma_i u_i| \le |\sigma_1 u_i| \quad \forall\, 1 \le i \le k.$$
Thus
$$\sum_{i=1}^{n} |\sigma_1 u_i|^2 \ge \sum_{i=1}^{n} |\sigma_i u_i|^2$$
(with the convention $\sigma_i = 0$ for $i > k$), or
$$\sigma_1^2 \|u\|_2^2 \ge \|\Sigma u\|_2^2.$$
The result follows.

A simpler representation of $\Sigma u$ can be given using corollary 1.101. Let $r = \operatorname{rank}(A)$. Thus
$$\Sigma = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix}.$$
We split the entries in $u$ as $u = [(u_1, \dots, u_r)\ (u_{r+1}, \dots, u_n)]^T$. Then
$$\Sigma u = \begin{bmatrix} \Sigma_r \begin{bmatrix} u_1 & \dots & u_r \end{bmatrix}^T \\ 0 \end{bmatrix} = \begin{bmatrix} \sigma_1 u_1 & \sigma_2 u_2 & \dots & \sigma_r u_r & 0 & \dots & 0 \end{bmatrix}^T.$$
Thus
$$\|\Sigma u\|_2^2 = \sum_{i=1}^r |\sigma_i u_i|^2 \le \sigma_1^2 \sum_{i=1}^r |u_i|^2 \le \sigma_1^2 \|u\|_2^2.$$
The second result can be proven similarly. ∎
Lemma 1.106 Let σ1 be the largest singular value of an m × n
matrix A. Then
‖Ax‖2 ≤ σ1‖x‖2 ∀ x ∈ Fn. (1.7.14)
Moreover
‖AHx‖2 ≤ σ1‖x‖2 ∀ x ∈ Fm. (1.7.15)
Proof.
$$\|Ax\|_2 = \|U\Sigma V^H x\|_2 = \|\Sigma V^H x\|_2$$
since $U$ is unitary. Now from the previous lemma we have
$$\|\Sigma V^H x\|_2 \le \sigma_1 \|V^H x\|_2 = \sigma_1 \|x\|_2$$
since $V^H$ is also unitary. Thus we get the result
$$\|Ax\|_2 \le \sigma_1 \|x\|_2 \quad \forall\, x \in \mathbb{F}^n.$$
Similarly,
$$\|A^H x\|_2 = \|V\Sigma^H U^H x\|_2 = \|\Sigma^H U^H x\|_2$$
since $V$ is unitary. Now from the previous lemma we have
$$\|\Sigma^H U^H x\|_2 \le \sigma_1 \|U^H x\|_2 = \sigma_1 \|x\|_2$$
since $U^H$ is also unitary. Thus we get the result
$$\|A^H x\|_2 \le \sigma_1 \|x\|_2 \quad \forall\, x \in \mathbb{F}^m. \qquad \blacksquare$$
There is a direct connection between the largest singular value and
2-norm of a matrix (see section 1.8.6).
Corollary 1.107. The largest singular value of $A$ is nothing but its 2-norm, i.e.
$$\sigma_1 = \max_{\|u\|_2 = 1} \|Au\|_2.$$
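This corollary can be illustrated numerically: numpy's matrix 2-norm agrees with the largest singular value, and no unit vector we sample stretches further. A sketch (the sampling is our illustration, not a proof):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))

sigma1 = np.linalg.svd(A, compute_uv=False)[0]
spectral = np.linalg.norm(A, 2)                  # numpy's matrix 2-norm

# sample the unit sphere: no ||Au||_2 should exceed sigma1
X = rng.standard_normal((6, 1000))
X /= np.linalg.norm(X, axis=0)
sampled_max = np.linalg.norm(A @ X, axis=0).max()
```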
1.7.2. SVD and pseudo inverse
Lemma 1.108 [Pseudo-inverse of Σ] Let $A = U\Sigma V^H$ and let $r = \operatorname{rank}(A)$. Let $\sigma_1, \dots, \sigma_r$ be the $r$ non-zero singular values of $A$. Then the Moore-Penrose pseudo-inverse of $\Sigma$ is the $n\times m$ matrix $\Sigma^\dagger$ given by
$$\Sigma^\dagger = \begin{bmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{bmatrix} \qquad (1.7.16)$$
where $\Sigma_r = \operatorname{diag}(\sigma_1, \dots, \sigma_r)$.

Essentially, $\Sigma^\dagger$ is obtained by transposing $\Sigma$ and inverting all its non-zero (positive real) entries.

Proof. Straightforward application of lemma 1.32. ∎
Corollary 1.109. The ranks of $\Sigma$ and its pseudo-inverse $\Sigma^\dagger$ are the same, i.e.
$$\operatorname{rank}(\Sigma) = \operatorname{rank}(\Sigma^\dagger). \qquad (1.7.17)$$
Proof. The numbers of non-zero diagonal entries in $\Sigma$ and $\Sigma^\dagger$ are the same. ∎
Lemma 1.110 Let A be an m× n matrix and let A = UΣV H be
its singular value decomposition. Let Σ† be the pseudo inverse of
Σ as per lemma 1.108. Then the Moore-Penrose pseudo-inverse of
A is given by
A† = V Σ†UH . (1.7.18)
Proof. As usual we verify the requirements for a Moore-Penrose pseudo-inverse as per definition 1.19. We note that since $\Sigma^\dagger$ is the pseudo-inverse of $\Sigma$, it already satisfies the necessary criteria.

First requirement:
$$AA^\dagger A = U\Sigma V^H V\Sigma^\dagger U^H U\Sigma V^H = U\Sigma\Sigma^\dagger\Sigma V^H = U\Sigma V^H = A.$$
Second requirement:
$$A^\dagger AA^\dagger = V\Sigma^\dagger U^H U\Sigma V^H V\Sigma^\dagger U^H = V\Sigma^\dagger\Sigma\Sigma^\dagger U^H = V\Sigma^\dagger U^H = A^\dagger.$$
We now consider
$$AA^\dagger = U\Sigma V^H V\Sigma^\dagger U^H = U\Sigma\Sigma^\dagger U^H.$$
Thus
$$\left(AA^\dagger\right)^H = \left(U\Sigma\Sigma^\dagger U^H\right)^H = U\left(\Sigma\Sigma^\dagger\right)^H U^H = U\Sigma\Sigma^\dagger U^H = AA^\dagger$$
since $\Sigma\Sigma^\dagger$ is Hermitian.

Finally we consider
$$A^\dagger A = V\Sigma^\dagger U^H U\Sigma V^H = V\Sigma^\dagger\Sigma V^H.$$
Thus
$$\left(A^\dagger A\right)^H = \left(V\Sigma^\dagger\Sigma V^H\right)^H = V\left(\Sigma^\dagger\Sigma\right)^H V^H = V\Sigma^\dagger\Sigma V^H = A^\dagger A$$
since $\Sigma^\dagger\Sigma$ is also Hermitian.

This completes the proof. ∎
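The construction $A^\dagger = V\Sigma^\dagger U^H$ can be reproduced in a few lines of numpy and compared against `numpy.linalg.pinv`; the Moore-Penrose conditions of the proof are checked directly (a sketch assuming a full column rank random matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))                  # full column rank (a.s.)

U, s, Vh = np.linalg.svd(A, full_matrices=True)
Sigma_pinv = np.zeros((3, 5))
Sigma_pinv[:3, :3] = np.diag(1.0 / s)            # transpose Sigma, invert non-zeros
A_pinv = Vh.conj().T @ Sigma_pinv @ U.conj().T   # A† = V Σ† U^H

err_vs_numpy = np.abs(A_pinv - np.linalg.pinv(A)).max()
mp1 = np.abs(A @ A_pinv @ A - A).max()           # A A† A = A
mp2 = np.abs(A_pinv @ A @ A_pinv - A_pinv).max() # A† A A† = A†
```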
We can also connect the rank of $A$ with the rank of its pseudo-inverse.

Corollary 1.111. The ranks of any $m\times n$ matrix $A$ and its pseudo-inverse $A^\dagger$ are the same, i.e.
$$\operatorname{rank}(A) = \operatorname{rank}(A^\dagger). \qquad (1.7.19)$$
Proof. We have $\operatorname{rank}(A) = \operatorname{rank}(\Sigma)$. Also, it is easy to verify that $\operatorname{rank}(A^\dagger) = \operatorname{rank}(\Sigma^\dagger)$. Using corollary 1.109 completes the proof. ∎
Lemma 1.112 Let $A$ be an $m\times n$ matrix and let $A^\dagger$ be its $n\times m$ pseudo-inverse as per lemma 1.110. Let $k = \min(m,n)$ denote the number of singular values and let $r = \operatorname{rank}(A)$ denote the number of non-zero singular values of $A$. Let $\sigma_1, \dots, \sigma_r$ be the non-zero singular values of $A$. Then the number of singular values of $A^\dagger$ is the same as that of $A$, the non-zero singular values of $A^\dagger$ are
$$\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_r},$$
while all other $k - r$ singular values of $A^\dagger$ are zero.

Proof. $k = \min(m,n)$ is the number of singular values of both $A$ and $A^\dagger$. Since the ranks of $A$ and $A^\dagger$ are the same, the numbers of their non-zero singular values are the same. Now look at
$$A^\dagger = V\Sigma^\dagger U^H$$
where
$$\Sigma^\dagger = \begin{bmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{bmatrix}.$$
Clearly $\Sigma_r^{-1} = \operatorname{diag}\left(\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_r}\right)$. Thus, expanding the R.H.S., we get
$$A^\dagger = \sum_{i=1}^r \frac{1}{\sigma_i} v_i u_i^H$$
where $v_i$ and $u_i$ are the first $r$ columns of $V$ and $U$ respectively. If we reverse the order of the first $r$ columns of $U$ and $V$ and reverse the first $r$ diagonal entries of $\Sigma^\dagger$, the R.H.S. remains the same while $A^\dagger$ is expressed in the standard singular value decomposition form (the diagonal entries $1/\sigma_r \ge \dots \ge 1/\sigma_1$ now appear in descending order). Thus $\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_r}$ are indeed the non-zero singular values of $A^\dagger$. ∎
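The reciprocal relationship between the singular values of $A$ and $A^\dagger$ is easy to spot-check (a sketch with a random full rank matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))

s = np.linalg.svd(A, compute_uv=False)                  # descending
s_pinv = np.linalg.svd(np.linalg.pinv(A), compute_uv=False)

# the singular values of A† are the reciprocals 1/sigma_i (here r = k = 3)
err = np.abs(np.sort(s_pinv) - np.sort(1.0 / s)).max()
```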
1.7.3. Full column rank matrices
In this subsection we consider some specific results related to singular
value decomposition of a full column rank matrix.
We will consider $A$ to be an $m\times n$ matrix in $\mathbb{F}^{m\times n}$ with $m \ge n$ and $\operatorname{rank}(A) = n$. Let $A = U\Sigma V^H$ be its singular value decomposition. From lemma 1.100 we observe that there are $n$ non-zero singular values of $A$, which we denote by $\sigma_1, \sigma_2, \dots, \sigma_n$. We define
$$\Sigma_n = \operatorname{diag}(\sigma_1, \sigma_2, \dots, \sigma_n).$$
Clearly $\Sigma$ is a $2\times 1$ block matrix given by
$$\Sigma = \begin{bmatrix} \Sigma_n \\ 0 \end{bmatrix}$$
where the lower $0$ is an $(m-n)\times n$ zero matrix. From here we obtain that $\Sigma^H\Sigma$ is the $n\times n$ matrix
$$\Sigma^H\Sigma = \Sigma_n^2$$
where
$$\Sigma_n^2 = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2).$$
Lemma 1.113 Let $A$ be a full column rank matrix with singular value decomposition $A = U\Sigma V^H$. Then $\Sigma^H\Sigma = \Sigma_n^2 = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2)$ and $\Sigma^H\Sigma$ is invertible.

Proof. Since all singular values are non-zero, $\Sigma_n^2$ is invertible. Thus
$$\left(\Sigma^H\Sigma\right)^{-1} = \left(\Sigma_n^2\right)^{-1} = \operatorname{diag}\left(\frac{1}{\sigma_1^2}, \frac{1}{\sigma_2^2}, \dots, \frac{1}{\sigma_n^2}\right). \qquad (1.7.20) \qquad \blacksquare$$
Lemma 1.114 Let $A$ be a full column rank matrix with singular value decomposition $A = U\Sigma V^H$. Let $\sigma_1$ be its largest singular value and $\sigma_n$ its smallest singular value. Then
$$\sigma_n^2\|x\|_2 \le \|\Sigma^H\Sigma x\|_2 \le \sigma_1^2\|x\|_2 \quad \forall\, x \in \mathbb{F}^n. \qquad (1.7.21)$$

Proof. Let $x \in \mathbb{F}^n$. We have
$$\|\Sigma^H\Sigma x\|_2^2 = \|\Sigma_n^2 x\|_2^2 = \sum_{i=1}^n |\sigma_i^2 x_i|^2.$$
Now since
$$\sigma_n \le \sigma_i \le \sigma_1,$$
we have
$$\sigma_n^4 \sum_{i=1}^n |x_i|^2 \le \sum_{i=1}^n |\sigma_i^2 x_i|^2 \le \sigma_1^4 \sum_{i=1}^n |x_i|^2,$$
thus
$$\sigma_n^4\|x\|_2^2 \le \|\Sigma^H\Sigma x\|_2^2 \le \sigma_1^4\|x\|_2^2.$$
Applying square roots, we get
$$\sigma_n^2\|x\|_2 \le \|\Sigma^H\Sigma x\|_2 \le \sigma_1^2\|x\|_2 \quad \forall\, x \in \mathbb{F}^n. \qquad \blacksquare$$
We recall from corollary 1.25 that the Gram matrix of the column vectors of $A$, $G = A^H A$, is full rank and invertible.
Lemma 1.115 Let $A$ be a full column rank matrix with singular value decomposition $A = U\Sigma V^H$. Let $\sigma_1$ be its largest singular value and $\sigma_n$ its smallest singular value. Then
$$\sigma_n^2\|x\|_2 \le \|A^H A x\|_2 \le \sigma_1^2\|x\|_2 \quad \forall\, x \in \mathbb{F}^n. \qquad (1.7.22)$$

Proof.
$$A^H A = (U\Sigma V^H)^H(U\Sigma V^H) = V\Sigma^H\Sigma V^H.$$
Let $x \in \mathbb{F}^n$. Let
$$u = V^H x \implies \|u\|_2 = \|x\|_2.$$
Let
$$r = \Sigma^H\Sigma u.$$
Then from the previous lemma we have
$$\sigma_n^2\|u\|_2 \le \|\Sigma^H\Sigma u\|_2 = \|r\|_2 \le \sigma_1^2\|u\|_2.$$
Finally,
$$A^H A x = V\Sigma^H\Sigma V^H x = Vr.$$
Thus
$$\|A^H A x\|_2 = \|r\|_2.$$
Substituting, we get
$$\sigma_n^2\|x\|_2 \le \|A^H A x\|_2 \le \sigma_1^2\|x\|_2 \quad \forall\, x \in \mathbb{F}^n. \qquad \blacksquare$$
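The two-sided bound of lemma 1.115 can be sampled numerically; here is a sketch for a random full column rank matrix (with a small slack for floating-point error):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3))                  # full column rank (a.s.)
s = np.linalg.svd(A, compute_uv=False)
s1, sn = s[0], s[-1]

X = rng.standard_normal((3, 500))                # 500 random test vectors
norms_x = np.linalg.norm(X, axis=0)
norms_Gx = np.linalg.norm(A.T @ A @ X, axis=0)   # ||A^H A x||_2 column-wise

lower_ok = bool(np.all(sn**2 * norms_x <= norms_Gx + 1e-9))
upper_ok = bool(np.all(norms_Gx <= s1**2 * norms_x + 1e-9))
```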
There are bounds for the inverse of the Gram matrix also. First let us establish the inverse of the Gram matrix.
Lemma 1.116 Let $A$ be a full column rank matrix with singular value decomposition $A = U\Sigma V^H$. Let the singular values of $A$ be $\sigma_1, \dots, \sigma_n$. Let the Gram matrix of the columns of $A$ be $G = A^H A$. Then
$$G^{-1} = V\Psi V^H$$
where
$$\Psi = \operatorname{diag}\left(\frac{1}{\sigma_1^2}, \frac{1}{\sigma_2^2}, \dots, \frac{1}{\sigma_n^2}\right).$$

Proof. We have
$$G = V\Sigma^H\Sigma V^H.$$
Thus
$$G^{-1} = \left(V\Sigma^H\Sigma V^H\right)^{-1} = \left(V^H\right)^{-1}\left(\Sigma^H\Sigma\right)^{-1}V^{-1} = V\left(\Sigma^H\Sigma\right)^{-1}V^H.$$
From lemma 1.113 we have
$$\Psi = \left(\Sigma^H\Sigma\right)^{-1} = \operatorname{diag}\left(\frac{1}{\sigma_1^2}, \frac{1}{\sigma_2^2}, \dots, \frac{1}{\sigma_n^2}\right).$$
This completes the proof. ∎
We can now state the bounds:
Lemma 1.117 Let $A$ be a full column rank matrix with singular value decomposition $A = U\Sigma V^H$. Let $\sigma_1$ be its largest singular value and $\sigma_n$ its smallest singular value. Then
$$\frac{1}{\sigma_1^2}\|x\|_2 \le \left\|\left(A^H A\right)^{-1}x\right\|_2 \le \frac{1}{\sigma_n^2}\|x\|_2 \quad \forall\, x \in \mathbb{F}^n. \qquad (1.7.23)$$

Proof. From lemma 1.116 we have
$$G^{-1} = \left(A^H A\right)^{-1} = V\Psi V^H$$
where
$$\Psi = \operatorname{diag}\left(\frac{1}{\sigma_1^2}, \frac{1}{\sigma_2^2}, \dots, \frac{1}{\sigma_n^2}\right).$$
Let $x \in \mathbb{F}^n$. Let
$$u = V^H x \implies \|u\|_2 = \|x\|_2.$$
Let
$$r = \Psi u.$$
Then
$$\|r\|_2^2 = \sum_{i=1}^n \left|\frac{1}{\sigma_i^2}u_i\right|^2.$$
Thus
$$\frac{1}{\sigma_1^2}\|u\|_2 \le \|\Psi u\|_2 = \|r\|_2 \le \frac{1}{\sigma_n^2}\|u\|_2.$$
Finally,
$$\left(A^H A\right)^{-1}x = V\Psi V^H x = Vr.$$
Thus
$$\left\|\left(A^H A\right)^{-1}x\right\|_2 = \|r\|_2.$$
Substituting we get the result. ∎
1.7.4. Low rank approximation of a matrix
Definition 1.40 An $m\times n$ matrix $A$ is called low rank if
$$\operatorname{rank}(A) \ll \min(m, n). \qquad (1.7.24)$$
Remark. A matrix is low rank if the number of non-zero singular values of the matrix is much smaller than its dimensions.
The following is a simple procedure for making a low rank approximation of a given matrix $A$.

(1) Perform the singular value decomposition of $A$, given by $A = U\Sigma V^H$.
(2) Identify the singular values of $A$ in $\Sigma$.
(3) Keep the first $r$ singular values (where $r \ll \min(m,n)$ is the rank of the approximation) and set all other singular values to 0 to obtain $\hat{\Sigma}$.
(4) Compute $\hat{A} = U\hat{\Sigma} V^H$.
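The four steps above amount to a truncated SVD; a minimal sketch (the helper name is ours; the observation that the spectral-norm error equals $\sigma_{r+1}$ is the well-known Eckart-Young property, not stated in the text):

```python
import numpy as np

def low_rank_approx(A, r):
    """Rank-r approximation of A via truncated SVD."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vh[:r, :]   # U Sigma_hat V^H

rng = np.random.default_rng(6)
A = rng.standard_normal((8, 6))
A2 = low_rank_approx(A, 2)

s = np.linalg.svd(A, compute_uv=False)
err = np.linalg.norm(A - A2, 2)             # spectral-norm error = sigma_3
```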
1.8. Matrix norms
This section reviews various matrix norms on the vector space of com-
plex matrices over the field of complex numbers (Cm×n,C).
We know (Cm×n,C) is a finite dimensional vector space with dimension
mn. We will usually refer to it as Cm×n.
Matrix norms will follow the usual definition of norms for a vector
space.
Definition 1.41 A function $\|\cdot\| : \mathbb{C}^{m\times n} \to \mathbb{R}$ is called a matrix norm on $\mathbb{C}^{m\times n}$ if for all $A, B \in \mathbb{C}^{m\times n}$ and all $\alpha \in \mathbb{C}$ it satisfies the following.

Positivity:
$$\|A\| \ge 0$$
with $\|A\| = 0 \iff A = 0$.

Homogeneity:
$$\|\alpha A\| = |\alpha|\|A\|.$$

Triangle inequality:
$$\|A + B\| \le \|A\| + \|B\|.$$
We recall some of the standard results on normed vector spaces.
All matrix norms are equivalent. Let $\|\cdot\|$ and $\|\cdot\|'$ be two different matrix norms on $\mathbb{C}^{m\times n}$. Then there exist two constants $a, b > 0$ such that the following holds:
$$a\|A\| \le \|A\|' \le b\|A\| \quad \forall\, A \in \mathbb{C}^{m\times n}.$$
A matrix norm is a continuous function ‖ · ‖ : Cm×n → R.
1.8.1. Norms like lp on Cn
The following norms are quite like the $l_p$ norms on the finite dimensional complex vector space $\mathbb{C}^n$. They arise from the fact that the matrix vector space $\mathbb{C}^{m\times n}$ has a one-to-one correspondence with the complex vector space $\mathbb{C}^{mn}$.
Definition 1.42 Let $A \in \mathbb{C}^{m\times n}$ and $A = [a_{ij}]$. The matrix sum norm is defined as
$$\|A\|_S = \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|. \qquad (1.8.1)$$

Definition 1.43 Let $A \in \mathbb{C}^{m\times n}$ and $A = [a_{ij}]$. The matrix Frobenius norm is defined as
$$\|A\|_F = \left(\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2\right)^{\frac{1}{2}}. \qquad (1.8.2)$$

Definition 1.44 Let $A \in \mathbb{C}^{m\times n}$ and $A = [a_{ij}]$. The matrix max norm is defined as
$$\|A\|_M = \max_{\substack{1\le i\le m \\ 1\le j\le n}} |a_{ij}|. \qquad (1.8.3)$$
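All three element-wise norms are one-liners in numpy. A quick sketch on a small matrix of our choosing:

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

sum_norm = np.abs(A).sum()                  # (1.8.1): 1 + 2 + 3 + 4 = 10
frob_norm = np.sqrt((np.abs(A)**2).sum())   # (1.8.2): sqrt(1 + 4 + 9 + 16)
max_norm = np.abs(A).max()                  # (1.8.3): 4
```

`numpy.linalg.norm(A, 'fro')` computes the same Frobenius norm directly.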
1.8.2. Properties of Frobenius norm
We now prove some elementary properties of Frobenius norm.
Lemma 1.118 The Frobenius norm of a matrix is equal to the
Frobenius norm of its Hermitian transpose.
‖AH‖F = ‖A‖F . (1.8.4)
Proof. Let
$$A = [a_{ij}].$$
Then
$$A^H = [\bar{a}_{ji}]$$
and
$$\|A^H\|_F^2 = \sum_{j=1}^n \sum_{i=1}^m |a_{ij}|^2 = \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2 = \|A\|_F^2.$$
Now
$$\|A^H\|_F^2 = \|A\|_F^2 \implies \|A^H\|_F = \|A\|_F. \qquad \blacksquare$$
Lemma 1.119 Let $A \in \mathbb{C}^{m\times n}$ be written as a row of column vectors
$$A = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix}.$$
Then
$$\|A\|_F^2 = \sum_{j=1}^n \|a_j\|_2^2. \qquad (1.8.5)$$

Proof. We note that
$$\|a_j\|_2^2 = \sum_{i=1}^m |a_{ij}|^2.$$
Now
$$\|A\|_F^2 = \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2 = \sum_{j=1}^n \left(\sum_{i=1}^m |a_{ij}|^2\right) = \sum_{j=1}^n \|a_j\|_2^2. \qquad \blacksquare$$
We thus showed that the square of the Frobenius norm of a matrix is nothing but the sum of squares of the $l_2$ norms of its columns.
Lemma 1.120 Let $A \in \mathbb{C}^{m\times n}$ be written as a column of row vectors
$$A = \begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix}.$$
Then
$$\|A\|_F^2 = \sum_{i=1}^m \|a_i\|_2^2. \qquad (1.8.6)$$

Proof. We note that
$$\|a_i\|_2^2 = \sum_{j=1}^n |a_{ij}|^2.$$
Now
$$\|A\|_F^2 = \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2 = \sum_{i=1}^m \|a_i\|_2^2. \qquad \blacksquare$$
We now consider how the Frobenius norm is affected by the action of unitary matrices.

Let $A$ be an arbitrary matrix in $\mathbb{C}^{m\times n}$. Let $U \in \mathbb{C}^{m\times m}$ and $V \in \mathbb{C}^{n\times n}$ be unitary matrices.

Our first result is that multiplication by a unitary matrix does not change the Frobenius norm of a matrix.
Theorem 1.121 The Frobenius norm of a matrix is invariant to pre- or post-multiplication by a unitary matrix, i.e.
$$\|UA\|_F = \|A\|_F \qquad (1.8.7)$$
and
$$\|AV\|_F = \|A\|_F. \qquad (1.8.8)$$
Proof. We can write $A$ as
$$A = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix}.$$
So
$$UA = \begin{bmatrix} Ua_1 & \dots & Ua_n \end{bmatrix}.$$
Then applying lemma 1.119, clearly
$$\|UA\|_F^2 = \sum_{j=1}^n \|Ua_j\|_2^2.$$
But we know that unitary matrices are norm preserving. Hence
$$\|Ua_j\|_2^2 = \|a_j\|_2^2.$$
Thus
$$\|UA\|_F^2 = \sum_{j=1}^n \|a_j\|_2^2 = \|A\|_F^2,$$
which implies
$$\|UA\|_F = \|A\|_F.$$
Similarly, writing $A$ as a column of row vectors
$$A = \begin{bmatrix} r_1 \\ \vdots \\ r_m \end{bmatrix},$$
we have
$$AV = \begin{bmatrix} r_1 V \\ \vdots \\ r_m V \end{bmatrix}.$$
Then applying lemma 1.120, clearly
$$\|AV\|_F^2 = \sum_{i=1}^m \|r_i V\|_2^2.$$
But we know that unitary matrices are norm preserving. Hence
$$\|r_i V\|_2^2 = \|r_i\|_2^2.$$
Thus
$$\|AV\|_F^2 = \sum_{i=1}^m \|r_i\|_2^2 = \|A\|_F^2,$$
which implies
$$\|AV\|_F = \|A\|_F.$$
An alternative approach for the second part, using the first part, takes just one line:
$$\|AV\|_F = \|(AV)^H\|_F = \|V^H A^H\|_F = \|A^H\|_F = \|A\|_F.$$
In the above we use lemma 1.118 and the fact that $V$ being unitary implies that $V^H$ is also unitary. We have already shown that pre-multiplication by a unitary matrix preserves the Frobenius norm. ∎
Theorem 1.122 Let $A \in \mathbb{C}^{m\times n}$ and $B \in \mathbb{C}^{n\times P}$ be two matrices. Then the Frobenius norm of their product is less than or equal to the product of the Frobenius norms of the matrices themselves, i.e.
$$\|AB\|_F \le \|A\|_F \|B\|_F. \qquad (1.8.9)$$
Proof. We can write $A$ as
$$A = \begin{bmatrix} a_1^T \\ \vdots \\ a_m^T \end{bmatrix}$$
where the $a_i$ are column vectors corresponding to the rows of $A$. Similarly, we can write $B$ as
$$B = \begin{bmatrix} b_1 & \dots & b_P \end{bmatrix}$$
where the $b_j$ are the column vectors of $B$. Then
$$AB = \begin{bmatrix} a_1^T \\ \vdots \\ a_m^T \end{bmatrix}\begin{bmatrix} b_1 & \dots & b_P \end{bmatrix} = \begin{bmatrix} a_1^T b_1 & \dots & a_1^T b_P \\ \vdots & \ddots & \vdots \\ a_m^T b_1 & \dots & a_m^T b_P \end{bmatrix} = [a_i^T b_j].$$
Applying the Cauchy-Schwarz inequality, we have
$$|a_i^T b_j|^2 \le \|a_i\|_2^2 \|b_j\|_2^2.$$
Now
$$\|AB\|_F^2 = \sum_{i=1}^m \sum_{j=1}^P |a_i^T b_j|^2 \le \sum_{i=1}^m \sum_{j=1}^P \|a_i\|_2^2 \|b_j\|_2^2 = \left(\sum_{i=1}^m \|a_i\|_2^2\right)\left(\sum_{j=1}^P \|b_j\|_2^2\right) = \|A\|_F^2 \|B\|_F^2,$$
which implies
$$\|AB\|_F \le \|A\|_F \|B\|_F$$
by taking square roots on both sides. ∎
Corollary 1.123. Let A ∈ Cm×n and let x ∈ Cn. Then
‖Ax‖2 ≤ ‖A‖F‖x‖2.
Proof. We note that the Frobenius norm of a column matrix is the same as the $l_2$ norm of the corresponding column vector, i.e.
$$\|x\|_F = \|x\|_2 \quad \forall\, x \in \mathbb{C}^n.$$
Now applying theorem 1.122 we have
$$\|Ax\|_2 = \|Ax\|_F \le \|A\|_F \|x\|_F = \|A\|_F \|x\|_2 \quad \forall\, x \in \mathbb{C}^n. \qquad \blacksquare$$
It turns out that the Frobenius norm is intimately related to the singular value decomposition of a matrix.

Lemma 1.124 Let $A \in \mathbb{C}^{m\times n}$ have the singular value decomposition
$$A = U\Sigma V^H$$
with singular values $\sigma_1, \dots, \sigma_k$ where $k = \min(m,n)$. Then
$$\|A\|_F = \sqrt{\sum_{i=1}^k \sigma_i^2}. \qquad (1.8.10)$$

Proof.
$$A = U\Sigma V^H \implies \|A\|_F = \|U\Sigma V^H\|_F.$$
But
$$\|U\Sigma V^H\|_F = \|\Sigma V^H\|_F = \|\Sigma\|_F$$
since $U$ and $V$ are unitary matrices (see theorem 1.121). Now the only non-zero entries in $\Sigma$ are the singular values. Hence
$$\|A\|_F = \|\Sigma\|_F = \sqrt{\sum_{i=1}^k \sigma_i^2}. \qquad \blacksquare$$
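Lemma 1.124 is another identity worth spot-checking; a sketch with a random rectangular matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 7))

s = np.linalg.svd(A, compute_uv=False)     # k = min(5, 7) = 5 singular values
err = abs(np.linalg.norm(A, 'fro') - np.sqrt(np.sum(s**2)))
```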
1.8.3. Consistency of a matrix norm
Definition 1.45 A matrix norm $\|\cdot\|$ is called consistent on $\mathbb{C}^{n\times n}$ if
$$\|AB\| \le \|A\|\|B\| \qquad (1.8.11)$$
holds true for all $A, B \in \mathbb{C}^{n\times n}$. A matrix norm $\|\cdot\|$ is called consistent if it is defined on $\mathbb{C}^{m\times n}$ for all $m, n \in \mathbb{N}$ and (1.8.11) holds for all matrices $A, B$ for which the product $AB$ is defined. A consistent matrix norm is also known as a sub-multiplicative norm.

With this definition and the result in theorem 1.122, we can see that the Frobenius norm is consistent.
1.8.4. Subordinate matrix norm
A matrix operates on vectors from one space to generate vectors in
another space. It is interesting to explore the connection between the
norm of a matrix and norms of vectors in the domain and co-domain
of a matrix.
Definition 1.46 Let $m, n \in \mathbb{N}$ be given. Let $\|\cdot\|_\alpha$ be some norm on $\mathbb{C}^m$ and $\|\cdot\|_\beta$ be some norm on $\mathbb{C}^n$. Let $\|\cdot\|$ be some norm on matrices in $\mathbb{C}^{m\times n}$. We say that $\|\cdot\|$ is subordinate to the vector norms $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ if
$$\|Ax\|_\alpha \le \|A\|\|x\|_\beta \qquad (1.8.12)$$
for all $A \in \mathbb{C}^{m\times n}$ and for all $x \in \mathbb{C}^n$. In other words, the operation of $A$ increases the length of a vector by at most a factor equal to the norm of the matrix itself.

If $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ are the same, we say that $\|\cdot\|$ is subordinate to the vector norm $\|\cdot\|_\alpha$.
We have shown earlier in corollary 1.123 that Frobenius norm is sub-
ordinate to Euclidean norm.
1.8.5. Operator norm
We now consider the maximum factor by which a matrix A can increase
the length of a vector.
Definition 1.47 Let $m, n \in \mathbb{N}$ be given. Let $\|\cdot\|_\alpha$ be some norm on $\mathbb{C}^n$ and $\|\cdot\|_\beta$ be some norm on $\mathbb{C}^m$. For $A \in \mathbb{C}^{m\times n}$ we define
$$\|A\| \triangleq \|A\|_{\alpha\to\beta} \triangleq \max_{x \neq 0} \frac{\|Ax\|_\beta}{\|x\|_\alpha}. \qquad (1.8.13)$$
The ratio $\|Ax\|_\beta / \|x\|_\alpha$ represents the factor by which the length of $x$ is scaled by the operation of $A$; we simply pick the maximum value of this scaling factor. The norm as defined above is known as the $(\alpha\to\beta)$ operator norm, the $(\alpha\to\beta)$-norm, or simply the $\alpha$-norm if $\alpha = \beta$.

Of course, we need to verify that this definition satisfies all the properties of a norm.
Clearly if $A = 0$ then $Ax = 0$ always, hence $\|A\| = 0$.

Conversely, if $\|A\| = 0$ then $\|Ax\|_\beta = 0\ \forall\, x \in \mathbb{C}^n$. In particular this is true for the unit vectors $e_i \in \mathbb{C}^n$. The $i$-th column of $A$ is given by $Ae_i$, which is $0$. Thus each column in $A$ is $0$. Hence $A = 0$.

Now consider $c \in \mathbb{C}$:
$$\|cA\| = \max_{x \neq 0} \frac{\|cAx\|_\beta}{\|x\|_\alpha} = |c| \max_{x \neq 0} \frac{\|Ax\|_\beta}{\|x\|_\alpha} = |c|\|A\|.$$

We now present some useful observations on the operator norm before we can prove the triangle inequality for the operator norm.
For any x ∈ ker(A), Ax = 0 hence we only need to consider vectors
which don’t belong to the kernel of A.
Thus we can write
$$\|A\|_{\alpha\to\beta} = \max_{x \notin \ker(A)} \frac{\|Ax\|_\beta}{\|x\|_\alpha}. \qquad (1.8.14)$$
We also note that
$$\frac{\|A(cx)\|_\beta}{\|cx\|_\alpha} = \frac{|c|\|Ax\|_\beta}{|c|\|x\|_\alpha} = \frac{\|Ax\|_\beta}{\|x\|_\alpha} \quad \forall\, c \neq 0,\ x \neq 0.$$
Thus, it is sufficient to find the maximum over unit norm vectors:
$$\|A\|_{\alpha\to\beta} = \max_{\|x\|_\alpha = 1} \|Ax\|_\beta.$$
Note that since $\|x\|_\alpha = 1$, the term in the denominator goes away.
Lemma 1.125 The $(\alpha\to\beta)$-operator norm is subordinate to the vector norms $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$, i.e.
$$\|Ax\|_\beta \le \|A\|_{\alpha\to\beta}\|x\|_\alpha. \qquad (1.8.15)$$

Proof. For $x = 0$ the inequality is trivially satisfied. Now for $x \neq 0$, by definition we have
$$\|A\|_{\alpha\to\beta} \ge \frac{\|Ax\|_\beta}{\|x\|_\alpha} \implies \|A\|_{\alpha\to\beta}\|x\|_\alpha \ge \|Ax\|_\beta. \qquad \blacksquare$$
Remark. There exists a vector $x^* \in \mathbb{C}^n$ with unit norm ($\|x^*\|_\alpha = 1$) such that
$$\|A\|_{\alpha\to\beta} = \|Ax^*\|_\beta. \qquad (1.8.16)$$

Proof. Let $x' \neq 0$ be some vector which maximizes the expression
$$\frac{\|Ax\|_\beta}{\|x\|_\alpha}.$$
(Such a maximizer exists: the ratio is unchanged by scaling, so the maximization may be restricted to the unit sphere, which is compact, and the norm is continuous.) Then
$$\|A\|_{\alpha\to\beta} = \frac{\|Ax'\|_\beta}{\|x'\|_\alpha}.$$
Now consider $x^* = \frac{x'}{\|x'\|_\alpha}$. Thus $\|x^*\|_\alpha = 1$. We know that
$$\frac{\|Ax'\|_\beta}{\|x'\|_\alpha} = \|Ax^*\|_\beta.$$
Hence
$$\|A\|_{\alpha\to\beta} = \|Ax^*\|_\beta. \qquad \blacksquare$$
We are now ready to prove the triangle inequality for the operator norm.

Lemma 1.126 The operator norm as defined in definition 1.47 satisfies the triangle inequality.

Proof. Let $A$ and $B$ be some matrices in $\mathbb{C}^{m\times n}$. Consider the operator norm of the matrix $A + B$. From the previous remarks, there exists some vector $x^* \in \mathbb{C}^n$ with $\|x^*\|_\alpha = 1$ such that
$$\|A + B\| = \|(A+B)x^*\|_\beta.$$
Now
$$\|(A+B)x^*\|_\beta = \|Ax^* + Bx^*\|_\beta \le \|Ax^*\|_\beta + \|Bx^*\|_\beta.$$
From another remark we have
$$\|Ax^*\|_\beta \le \|A\|\|x^*\|_\alpha = \|A\| \quad \text{and} \quad \|Bx^*\|_\beta \le \|B\|\|x^*\|_\alpha = \|B\|$$
since $\|x^*\|_\alpha = 1$. Hence we have
$$\|A + B\| \le \|A\| + \|B\|. \qquad \blacksquare$$
It turns out that operator norm is also consistent under certain condi-
tions.
Lemma 1.127 Let $\|\cdot\|_\alpha$ be defined over all $m \in \mathbb{N}$ and let $\|\cdot\|_\beta = \|\cdot\|_\alpha$. Then the operator norm
$$\|A\|_\alpha = \max_{x \neq 0} \frac{\|Ax\|_\alpha}{\|x\|_\alpha}$$
is consistent.

Proof. We need to show that
$$\|AB\|_\alpha \le \|A\|_\alpha\|B\|_\alpha.$$
Now
$$\|AB\|_\alpha = \max_{x \neq 0} \frac{\|ABx\|_\alpha}{\|x\|_\alpha}.$$
We note that if $Bx = 0$, then $ABx = 0$. Hence we can rewrite this as
$$\|AB\|_\alpha = \max_{Bx \neq 0} \frac{\|ABx\|_\alpha}{\|x\|_\alpha}.$$
Now if $Bx \neq 0$ then $\|Bx\|_\alpha \neq 0$. Hence
$$\frac{\|ABx\|_\alpha}{\|x\|_\alpha} = \frac{\|ABx\|_\alpha}{\|Bx\|_\alpha}\cdot\frac{\|Bx\|_\alpha}{\|x\|_\alpha}$$
and
$$\max_{Bx \neq 0} \frac{\|ABx\|_\alpha}{\|x\|_\alpha} \le \max_{Bx \neq 0} \frac{\|ABx\|_\alpha}{\|Bx\|_\alpha} \max_{Bx \neq 0} \frac{\|Bx\|_\alpha}{\|x\|_\alpha}.$$
Clearly
$$\|B\|_\alpha = \max_{Bx \neq 0} \frac{\|Bx\|_\alpha}{\|x\|_\alpha}.$$
Furthermore,
$$\max_{Bx \neq 0} \frac{\|ABx\|_\alpha}{\|Bx\|_\alpha} \le \max_{y \neq 0} \frac{\|Ay\|_\alpha}{\|y\|_\alpha} = \|A\|_\alpha.$$
Thus we have
$$\|AB\|_\alpha \le \|A\|_\alpha\|B\|_\alpha. \qquad \blacksquare$$
1.8.6. p-norm for matrices
We recall the definition of the $l_p$ norms for vectors $x \in \mathbb{C}^n$:
$$\|x\|_p = \begin{cases} \left(\sum_{i=1}^n |x_i|^p\right)^{\frac{1}{p}} & p \in [1, \infty) \\ \max_{1\le i\le n} |x_i| & p = \infty. \end{cases}$$
The operator norms ‖ · ‖p defined from lp vector norms are of specific
interest.
Definition 1.48 The p-norm for a matrix $A \in \mathbb{C}^{m\times n}$ is defined as
$$\|A\|_p \triangleq \max_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|x\|_p = 1} \|Ax\|_p \qquad (1.8.17)$$
where $\|x\|_p$ is the standard $l_p$ norm for vectors in $\mathbb{C}^m$ and $\mathbb{C}^n$.

Remark. As per lemma 1.127, p-norms for matrices are consistent norms. They are also subordinate to the $l_p$ vector norms.
Special cases are considered for p = 1, 2 and ∞.
Theorem 1.128 Let $A \in \mathbb{C}^{m\times n}$.

For $p = 1$ we have
$$\|A\|_1 \triangleq \max_{1\le j\le n} \sum_{i=1}^m |a_{ij}|. \qquad (1.8.18)$$
This is also known as the max column sum norm.

For $p = \infty$ we have
$$\|A\|_\infty \triangleq \max_{1\le i\le m} \sum_{j=1}^n |a_{ij}|. \qquad (1.8.19)$$
This is also known as the max row sum norm.

Finally, for $p = 2$ we have
$$\|A\|_2 \triangleq \sigma_1 \qquad (1.8.20)$$
where $\sigma_1$ is the largest singular value of $A$. This is also known as the spectral norm.
Proof. Let
$$A = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix}.$$
Then
$$\|Ax\|_1 = \Big\|\sum_{j=1}^n x_j a_j\Big\|_1 \le \sum_{j=1}^n \|x_j a_j\|_1 = \sum_{j=1}^n |x_j|\|a_j\|_1 \le \max_{1\le j\le n}\|a_j\|_1 \sum_{j=1}^n |x_j| = \max_{1\le j\le n}\|a_j\|_1 \|x\|_1.$$
Thus
$$\|A\|_1 = \max_{x \neq 0} \frac{\|Ax\|_1}{\|x\|_1} \le \max_{1\le j\le n}\|a_j\|_1,$$
which is the maximum column sum. We need to show that this upper bound is indeed an equality. Indeed, for any $x = e_j$, where $e_j$ is the unit vector with 1 in the $j$-th entry and 0 elsewhere,
$$\|Ae_j\|_1 = \|a_j\|_1.$$
Thus
$$\|A\|_1 \ge \|a_j\|_1 \quad \forall\, 1 \le j \le n.$$
Combining the two, we see that
$$\|A\|_1 = \max_{1\le j\le n}\|a_j\|_1.$$

For $p = \infty$, we proceed as follows:
$$\|Ax\|_\infty = \max_{1\le i\le m}\Big|\sum_{j=1}^n a_{ij}x_j\Big| \le \max_{1\le i\le m}\sum_{j=1}^n |a_{ij}||x_j| \le \max_{1\le j\le n}|x_j| \max_{1\le i\le m}\sum_{j=1}^n |a_{ij}| = \|x\|_\infty \max_{1\le i\le m}\|a^i\|_1$$
where $a^i$ denotes the $i$-th row of $A$. This shows that
$$\|A\|_\infty \le \max_{1\le i\le m}\|a^i\|_1.$$
We need to show that this is indeed an equality. Fix an $i = k$ and choose $x$ such that
$$x_j = \operatorname{sgn}(a_{kj}).$$
Clearly $\|x\|_\infty = 1$. Then
$$\|Ax\|_\infty = \max_{1\le i\le m}\Big|\sum_{j=1}^n a_{ij}x_j\Big| \ge \Big|\sum_{j=1}^n a_{kj}x_j\Big| = \sum_{j=1}^n |a_{kj}| = \|a^k\|_1.$$
Thus
$$\|A\|_\infty \ge \max_{1\le i\le m}\|a^i\|_1.$$
Combining the two inequalities, we get
$$\|A\|_\infty = \max_{1\le i\le m}\|a^i\|_1.$$

The remaining case is $p = 2$. For any vector $x$ with $\|x\|_2 = 1$,
$$\|Ax\|_2 = \|U\Sigma V^H x\|_2 = \|U(\Sigma V^H x)\|_2 = \|\Sigma V^H x\|_2$$
since the $l_2$ norm is invariant to unitary transformations. Let $v = V^H x$. Then $\|v\|_2 = \|V^H x\|_2 = \|x\|_2 = 1$. Now
$$\|Ax\|_2 = \|\Sigma v\|_2 = \left(\sum_{j=1}^n |\sigma_j v_j|^2\right)^{\frac{1}{2}} \le \sigma_1 \left(\sum_{j=1}^n |v_j|^2\right)^{\frac{1}{2}} = \sigma_1 \|v\|_2 = \sigma_1.$$
This shows that
$$\|A\|_2 \le \sigma_1.$$
Now consider the vector $x$ for which $v = V^H x = (1, 0, \dots, 0)$, i.e. $x$ is the first column of $V$. Then
$$\|Ax\|_2 = \|\Sigma v\|_2 = \sigma_1.$$
Thus
$$\|A\|_2 \ge \sigma_1.$$
Combining the two, we get that $\|A\|_2 = \sigma_1$. ∎
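The three formulas of theorem 1.128 reduce to short numpy expressions; here is a sketch on a small matrix of our choosing, cross-checked against `numpy.linalg.norm`:

```python
import numpy as np

A = np.array([[ 1.0, -2.0,  3.0],
              [-4.0,  5.0, -6.0]])

one_norm = np.abs(A).sum(axis=0).max()   # max column sum: max(5, 7, 9) = 9
inf_norm = np.abs(A).sum(axis=1).max()   # max row sum: max(6, 15) = 15
two_norm = np.linalg.svd(A, compute_uv=False)[0]   # largest singular value
```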
1.8.7. The 2-norm

Theorem 1.129 Let $A \in \mathbb{C}^{n\times n}$ have singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n$. Let the eigen values of $A$ be $\lambda_1, \lambda_2, \dots, \lambda_n$ with $|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_n|$. Then the following hold:
$$\|A\|_2 = \sigma_1 \qquad (1.8.21)$$
and, if $A$ is non-singular,
$$\|A^{-1}\|_2 = \frac{1}{\sigma_n}. \qquad (1.8.22)$$
If $A$ is symmetric and positive definite, then
$$\|A\|_2 = \lambda_1 \qquad (1.8.23)$$
and, if $A$ is non-singular,
$$\|A^{-1}\|_2 = \frac{1}{\lambda_n}. \qquad (1.8.24)$$
If $A$ is normal, then
$$\|A\|_2 = |\lambda_1| \qquad (1.8.25)$$
and, if $A$ is non-singular,
$$\|A^{-1}\|_2 = \frac{1}{|\lambda_n|}. \qquad (1.8.26)$$
1.8.8. Unitary invariant norms
Definition 1.49 A matrix norm ‖ · ‖ on Cm×n is called unitary
invariant if ‖UAV ‖ = ‖A‖ for any A ∈ Cm×n and any unitary
matrices U ∈ Cm×m and V ∈ Cn×n.
We have already seen in theorem 1.121 that Frobenius norm is unitary
invariant.
It turns out that spectral norm is also unitary invariant.
1.8.9. More properties of operator norms
In this section we will focus on operator norms connecting normed
linear spaces (Cn, ‖ · ‖p) and (Cm, ‖ · ‖q). Typical values of p, q would
be in {1, 2,∞}.
We recall that
$$\|A\|_{p\to q} = \max_{x \neq 0} \frac{\|Ax\|_q}{\|x\|_p} = \max_{\|x\|_p = 1} \|Ax\|_q = \max_{\|x\|_p \le 1} \|Ax\|_q. \qquad (1.8.27)$$
Table 1 [5] shows how to compute different $(p\to q)$ norms. Some can be computed easily while others are NP-hard to compute.

Table 1. Typical $(p\to q)$ norms

p   q   $\|A\|_{p\to q}$         Calculation
1   1   $\|A\|_1$                Maximum $l_1$ norm of a column
1   2   $\|A\|_{1\to 2}$         Maximum $l_2$ norm of a column
1   ∞   $\|A\|_{1\to\infty}$     Maximum absolute entry of the matrix
2   1   $\|A\|_{2\to 1}$         NP-hard
2   2   $\|A\|_2$                Maximum singular value
2   ∞   $\|A\|_{2\to\infty}$     Maximum $l_2$ norm of a row
∞   1   $\|A\|_{\infty\to 1}$    NP-hard
∞   2   $\|A\|_{\infty\to 2}$    NP-hard
∞   ∞   $\|A\|_\infty$           Maximum $l_1$ norm of a row
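The easily computable entries of the table are one-liners in numpy. A sketch (the brute-force check exploits the fact that the extreme points of the $l_1$ unit ball are the signed unit vectors $\pm e_j$, so $\|A\|_{1\to 2}$ is attained at a column):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 5))

col_l2 = np.linalg.norm(A, axis=0)
row_l2 = np.linalg.norm(A, axis=1)

norm_1_to_2 = col_l2.max()               # max l2 norm of a column
norm_2_to_inf = row_l2.max()             # max l2 norm of a row
norm_1_to_inf = np.abs(A).max()          # max absolute entry

# check ||A||_{1->2} at the extreme points of the l1 ball: the columns of A
brute = max(np.linalg.norm(A @ e) for e in np.eye(5))
```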
The topological dual of the finite dimensional normed linear space $(\mathbb{C}^n, \|\cdot\|_p)$ is the normed linear space $(\mathbb{C}^n, \|\cdot\|_{p'})$ where
$$\frac{1}{p} + \frac{1}{p'} = 1.$$
The $l_2$-norm is the dual of the $l_2$-norm; it is self-dual. The $l_1$-norm and $l_\infty$-norm are duals of each other.

When a matrix $A$ maps from the space $(\mathbb{C}^n, \|\cdot\|_p)$ to the space $(\mathbb{C}^m, \|\cdot\|_q)$, we can view its conjugate transpose $A^H$ as a mapping from the space $(\mathbb{C}^m, \|\cdot\|_{q'})$ to $(\mathbb{C}^n, \|\cdot\|_{p'})$.
Theorem 1.130 Operator norm of a matrix always equals the op-
erator norm of its conjugate transpose. i.e.
‖A‖p→q = ‖AH‖q′→p′ (1.8.28)
where1
p+
1
p′= 1,
1
q+
1
q′= 1.
88 1. MATRIX ALGEBRA
Specific applications of this result are:
‖A‖2 = ‖AH‖2. (1.8.29)
This is obvious since the maximum singular value of a matrix and its
conjugate transpose are same.
‖A‖1 = ‖AH‖∞, ‖A‖∞ = ‖AH‖1. (1.8.30)
This is also obvious since the maximum absolute column sum of A is the
same as the maximum absolute row sum of AH and vice versa.
‖A‖1→∞ = ‖AH‖1→∞. (1.8.31)
‖A‖1→2 = ‖AH‖2→∞. (1.8.32)
‖A‖∞→2 = ‖AH‖2→1. (1.8.33)
We now need to show the result for the general case (arbitrary
1 ≤ p, q ≤ ∞).
Proof. TODO □
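Although the general proof is omitted here, the special cases (1.8.29)–(1.8.33) can be spot-checked numerically with the closed forms of table 1. A NumPy sketch with a random complex matrix (a check, not a proof):

```python
# Spot-check of the adjoint identities (1.8.30)-(1.8.33) using the
# closed-form norm expressions from Table 1.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))
AH = A.conj().T

max_col_l1 = lambda M: np.abs(M).sum(axis=0).max()      # ||.||_1
max_row_l1 = lambda M: np.abs(M).sum(axis=1).max()      # ||.||_inf
max_col_l2 = lambda M: np.linalg.norm(M, axis=0).max()  # ||.||_{1->2}
max_row_l2 = lambda M: np.linalg.norm(M, axis=1).max()  # ||.||_{2->inf}

assert np.isclose(max_col_l1(A), max_row_l1(AH))    # ||A||_1 = ||A^H||_inf
assert np.isclose(max_row_l1(A), max_col_l1(AH))    # ||A||_inf = ||A^H||_1
assert np.isclose(max_col_l2(A), max_row_l2(AH))    # ||A||_{1->2} = ||A^H||_{2->inf}
assert np.isclose(np.abs(A).max(), np.abs(AH).max())  # ||A||_{1->inf} = ||A^H||_{1->inf}
```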
Theorem 1.131
‖A‖1→p = max_{1≤j≤n} ‖aj‖p (1.8.34)
where aj denotes the j-th column in the column partition
A = [a1 a2 . . . an].
Proof. For any x,
‖Ax‖p = ‖Σ_{j=1}^{n} xj aj‖p
≤ Σ_{j=1}^{n} ‖xj aj‖p = Σ_{j=1}^{n} |xj| ‖aj‖p
≤ max_{1≤j≤n} ‖aj‖p Σ_{j=1}^{n} |xj|
= max_{1≤j≤n} ‖aj‖p ‖x‖1.
Thus,
‖A‖1→p = max_{x≠0} ‖Ax‖p / ‖x‖1 ≤ max_{1≤j≤n} ‖aj‖p.
We need to show that this upper bound is attained. Indeed, for x = ej,
where ej is the unit vector with 1 in the j-th entry and 0 elsewhere,
‖Aej‖p = ‖aj‖p.
Thus
‖A‖1→p ≥ ‖aj‖p for all 1 ≤ j ≤ n.
Combining the two bounds, we see that
‖A‖1→p = max_{1≤j≤n} ‖aj‖p. □
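The two halves of this proof can be seen numerically: random vectors on the l1 unit sphere never exceed the largest column p-norm, while the best standard basis vector attains it. A NumPy sketch (the matrix and the value of p are arbitrary choices):

```python
# Theorem 1.131 numerically: ||A||_{1->p} equals the largest column
# p-norm; random l1-unit vectors stay below the bound and the best
# standard basis vector e_j attains it exactly.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
p = 3.0

col_norms = [np.linalg.norm(A[:, j], p) for j in range(A.shape[1])]
col_bound = max(col_norms)

for _ in range(1000):
    x = rng.standard_normal(6)
    x /= np.abs(x).sum()                       # now ||x||_1 = 1
    assert np.linalg.norm(A @ x, p) <= col_bound + 1e-12

e = np.zeros(6)
e[int(np.argmax(col_norms))] = 1.0             # best column's basis vector
assert np.isclose(np.linalg.norm(A @ e, p), col_bound)
```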
Theorem 1.132
‖A‖p→∞ = max_{1≤i≤m} ‖ai‖q (1.8.35)
where ai denotes the i-th row of A and 1/p + 1/q = 1.
Proof. Using theorem 1.130, we get
‖A‖p→∞ = ‖AH‖1→q.
Using theorem 1.131, we get
‖AH‖1→q = max_{1≤i≤m} ‖ai‖q.
This completes the proof. □
Theorem 1.133 For two matrices A and B and 1 ≤ p, q, s ≤ ∞, we have
‖AB‖p→q ≤ ‖B‖p→s ‖A‖s→q. (1.8.36)
Proof. We start with
‖AB‖p→q = max_{‖x‖p=1} ‖A(Bx)‖q.
From lemma 1.125, we obtain
‖A(Bx)‖q ≤ ‖A‖s→q ‖Bx‖s.
Thus,
‖AB‖p→q ≤ ‖A‖s→q max_{‖x‖p=1} ‖Bx‖s = ‖A‖s→q ‖B‖p→s. □
Theorem 1.134 For two matrices A and B and p ≥ 1, we have
‖AB‖p→∞ ≤ ‖A‖∞→∞ ‖B‖p→∞. (1.8.37)
Proof. We start with
‖AB‖p→∞ = max_{‖x‖p=1} ‖A(Bx)‖∞.
From lemma 1.125, we obtain
‖A(Bx)‖∞ ≤ ‖A‖∞→∞ ‖Bx‖∞.
Thus,
‖AB‖p→∞ ≤ ‖A‖∞→∞ max_{‖x‖p=1} ‖Bx‖∞ = ‖A‖∞→∞ ‖B‖p→∞. □
Theorem 1.135
‖A‖p→∞ ≤ ‖A‖p→p. (1.8.38)
In particular
‖A‖1→∞ ≤ ‖A‖1. (1.8.39)
‖A‖2→∞ ≤ ‖A‖2. (1.8.40)
Proof. Choosing q = ∞ and s = p and applying theorem 1.133,
‖IA‖p→∞ ≤ ‖A‖p→p ‖I‖p→∞.
But by theorem 1.132, ‖I‖p→∞ is the maximum lq norm (1/p + 1/q = 1) of
any row of I, which is 1 since each row is a standard unit vector. Thus
‖A‖p→∞ ≤ ‖A‖p→p. □
Consider the expression
min_{z∈C(AH), z≠0} ‖Az‖q / ‖z‖p. (1.8.41)
Here z ∈ C(AH), z ≠ 0 means there exists some vector u ∉ ker(AH) such
that z = AHu.
This expression measures the factor by which the non-singular part of
A can decrease the length of a vector.
Theorem 1.136 [5] The following bound holds for every matrix A:
min_{z∈C(AH), z≠0} ‖Az‖q / ‖z‖p ≥ 1/‖A†‖q→p. (1.8.42)
If A is surjective (onto), then equality holds. When A is bijective
(one-one onto, square, invertible), the result implies
min_{z∈C(AH), z≠0} ‖Az‖q / ‖z‖p = 1/‖A−1‖q→p. (1.8.43)
Proof. The spaces C(AH) and C(A) have the same dimension, given by
rank(A). We recall that A†A is a projector onto the row space C(AH)
of A. Hence
w = Az ⇐⇒ z = A†w = A†Az for all z ∈ C(AH).
As a result we can write
‖z‖p / ‖Az‖q = ‖A†w‖p / ‖w‖q
whenever z ∈ C(AH), z ≠ 0. Now
( min_{z∈C(AH), z≠0} ‖Az‖q / ‖z‖p )^{−1} = max_{z∈C(AH), z≠0} ‖z‖p / ‖Az‖q
= max_{w∈C(A), w≠0} ‖A†w‖p / ‖w‖q
≤ max_{w≠0} ‖A†w‖p / ‖w‖q.
When A is surjective, C(A) = Cm. Hence
max_{w∈C(A), w≠0} ‖A†w‖p / ‖w‖q = max_{w≠0} ‖A†w‖p / ‖w‖q
and the inequality becomes an equality. Finally
max_{w≠0} ‖A†w‖p / ‖w‖q = ‖A†‖q→p,
which completes the proof. □
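For p = q = 2 and an invertible A, (1.8.43) says the minimum of ‖Az‖2/‖z‖2 over nonzero z is 1/‖A−1‖2, i.e. the smallest singular value. A NumPy sketch illustrating this (the random matrix is generically invertible):

```python
# (1.8.43) for p = q = 2: over nonzero z, ||Az||_2 / ||z||_2 is bounded
# below by 1/||A^{-1}||_2, which equals the smallest singular value.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))                # generically invertible

sigma = np.linalg.svd(A, compute_uv=False)     # singular values, descending
bound = 1 / np.linalg.norm(np.linalg.inv(A), 2)
assert np.isclose(bound, sigma[-1])

ratios = []
for _ in range(2000):
    z = rng.standard_normal(4)
    ratios.append(np.linalg.norm(A @ z) / np.linalg.norm(z))
assert min(ratios) >= bound - 1e-12            # the bound is never violated
```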
1.8.10. Row column norms
Definition 1.50 Let A be an m × n matrix with rows a1, . . . , am.
Then we define
‖A‖p,∞ ≜ max_{1≤i≤m} ‖ai‖p = max_{1≤i≤m} ( Σ_{j=1}^{n} |aij|^p )^{1/p} (1.8.44)
where 1 ≤ p < ∞, i.e. we take the p-norms of all row vectors and
then find the maximum.
We define
‖A‖∞,∞ ≜ max_{i,j} |aij|. (1.8.45)
This is equivalent to taking the l∞ norm of each row and then taking
the maximum of all the norms.
For 1 ≤ p, q < ∞, we define the norm
‖A‖p,q ≜ [ Σ_{i=1}^{m} ( ‖ai‖p )^q ]^{1/q}, (1.8.46)
i.e., we compute the p-norm of every row vector to form another
vector and then take the q-norm of that vector.
Note that the norm ‖A‖p,∞ is different from the operator norm ‖A‖p→∞.
Similarly ‖A‖p,q is different from ‖A‖p→q.
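Definition 1.50 is easy to implement directly. The helper name row_col_norm below is hypothetical (not from any library); it takes the p-norm of every row and then the q-norm of the resulting vector:

```python
# Row-column norm ||A||_{p,q} per definition 1.50: p-norm of each row,
# then q-norm of the vector of row norms (max for q = infinity).
import numpy as np

def row_col_norm(A, p, q):
    row_norms = np.linalg.norm(A, ord=p, axis=1)   # p-norm of each row
    if np.isinf(q):
        return row_norms.max()                     # ||A||_{p,inf}
    return np.linalg.norm(row_norms, ord=q)

A = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [5.0, 12.0]])

# Row l2 norms are 5, 0, 13, so ||A||_{2,inf} = 13.
assert np.isclose(row_col_norm(A, 2, np.inf), 13.0)
# ||A||_{2,2} recovers the Frobenius norm.
assert np.isclose(row_col_norm(A, 2, 2), np.linalg.norm(A, 'fro'))
```

Note that ‖A‖2,2 is just the Frobenius norm, while ‖A‖2,∞ is the maximum row l2 norm appearing in table 1 as the 2 → ∞ operator norm.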
Theorem 1.137
‖A‖p,∞ = ‖A‖q→∞ (1.8.47)
where 1/p + 1/q = 1.
Proof. From theorem 1.132 we get
‖A‖q→∞ = max_{1≤i≤m} ‖ai‖p.
This is exactly the definition of ‖A‖p,∞. □
Theorem 1.138
‖A‖1→p = ‖AH‖p,∞. (1.8.48)
Proof. By theorem 1.130,
‖A‖1→p = ‖AH‖q→∞
where 1/p + 1/q = 1. From theorem 1.137,
‖AH‖q→∞ = ‖AH‖p,∞. □
Theorem 1.139 For any two matrices A, B, we have
‖AB‖p,∞ ≤ ‖A‖∞→∞ ‖B‖p,∞. (1.8.49)
Proof. Let q be such that 1/p + 1/q = 1. From theorem 1.134, we have
‖AB‖q→∞ ≤ ‖A‖∞→∞ ‖B‖q→∞.
From theorem 1.137,
‖AB‖q→∞ = ‖AB‖p,∞
and
‖B‖q→∞ = ‖B‖p,∞.
Thus
‖AB‖p,∞ ≤ ‖A‖∞→∞ ‖B‖p,∞. □
Theorem 1.140 [Relations between (p, q) norms and (p → q) norms]
‖A‖1,∞ = ‖A‖∞→∞ (1.8.50)
‖A‖2,∞ = ‖A‖2→∞ (1.8.51)
‖A‖∞,∞ = ‖A‖1→∞ (1.8.52)
‖A‖1→1 = ‖AH‖1,∞ (1.8.53)
‖A‖1→2 = ‖AH‖2,∞ (1.8.54)
Proof. The first three are straightforward applications of theorem
1.137. The next two are applications of theorem 1.138. See also
table 1. □
1.8.11. Block diagonally dominant matrices and generalized
Gershgorin disc theorem
In [1] the idea of diagonally dominant matrices (see section 1.6.9) has
been generalized to block matrices using matrix norms. We consider
the specific case with spectral norm.
Definition 1.51 [Block diagonally dominant matrix] Let A be a
square matrix in Cn×n which is partitioned in the following manner:
A =
A11 A12 . . . A1k
A21 A22 . . . A2k
. . .
Ak1 Ak2 . . . Akk
(1.8.56)
where each of the submatrices Aij is a square matrix of size m × m.
Thus n = km.
A is called block diagonally dominant if
‖Aii‖2 ≥ Σ_{j≠i} ‖Aij‖2
holds true for all 1 ≤ i ≤ k. If the inequality holds strictly
for all i, then A is called a block strictly diagonally dominant
matrix.
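Definition 1.51 can be tested numerically by partitioning the matrix into k × k blocks of size m × m and comparing each diagonal block's spectral norm against the sum of the off-diagonal block norms in its block row. The helper name is_block_diag_dominant below is hypothetical:

```python
# Check block diagonal dominance (definition 1.51) with spectral norms.
import numpy as np

def is_block_diag_dominant(A, m, strict=False):
    k = A.shape[0] // m                       # number of block rows
    blocks = [[A[i*m:(i+1)*m, j*m:(j+1)*m] for j in range(k)]
              for i in range(k)]
    for i in range(k):
        diag = np.linalg.norm(blocks[i][i], 2)
        off = sum(np.linalg.norm(blocks[i][j], 2) for j in range(k) if j != i)
        # strict dominance requires diag > off; plain dominance diag >= off
        if (diag <= off) if strict else (diag < off):
            return False
    return True

# Strongly block-diagonal example: dominance holds strictly ...
A = 10 * np.eye(4) + 0.1 * np.ones((4, 4))
assert is_block_diag_dominant(A, 2, strict=True)
# ... while the all-ones matrix is not strictly dominant.
assert not is_block_diag_dominant(np.ones((4, 4)), 2, strict=True)
```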
Theorem 1.141 If the partitioned matrix A of definition 1.51 is
block strictly diagonally dominant, then it is nonsingular.
For proof see [1].
This leads to the generalized Gershgorin disc theorem.
Theorem 1.142 Let A be a square matrix in Cn×n which is
partitioned in the following manner:
A =
A11 A12 . . . A1k
A21 A22 . . . A2k
. . .
Ak1 Ak2 . . . Akk
(1.8.57)
where each of the submatrices Aij is a square matrix of size m × m.
Then each eigenvalue λ of A satisfies
‖λI − Aii‖2 ≤ Σ_{j≠i} ‖Aij‖2 for some i ∈ {1, 2, . . . , k}. (1.8.58)
For proof see [1].
Since the 2-norm of a Hermitian positive semidefinite matrix is nothing
but its largest eigenvalue, the theorem directly applies.
Corollary 1.143. Let A be a Hermitian positive semidefinite matrix.
Let A be partitioned as in theorem 1.142. Then its 2-norm ‖A‖2 satisfies
|‖A‖2 − ‖Aii‖2| ≤ Σ_{j≠i} ‖Aij‖2 for some i ∈ {1, 2, . . . , k}. (1.8.59)
1.9. Miscellaneous topics
1.9.1. Hadamard product
Usually standard linear algebra books don't dwell much on element-wise
or component-wise products of vectors or matrices. Yet in certain
contexts and algorithms, this is quite useful. We define the notation in
this section. For further details see [3], [2] and [4].
Definition 1.52 The Hadamard product of two matrices A =
[aij] and B = [bij] with the same dimensions (not necessarily square)
with entries in a given ring R is the entry-wise product A ◦ B ≜
[aij bij], which has the same dimensions as A and B.
Example 1.3: Hadamard product Let
A =
1 2
3 4
and B =
5 −6
7 −3
Then
A ◦ B =
5 −12
21 −12
□
The Hadamard product is commutative, associative and distributive over
addition.
Naturally, it can also be defined for column vectors and row vectors.
The reason why this product is not mentioned in linear algebra texts
is that it is inherently basis dependent. But this product has a
number of uses in statistics and analysis.
In analysis, a similar concept is the point-wise product of functions,
defined as
(f · g)(x) = f(x)g(x).
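In NumPy the Hadamard product of example 1.3 is simply the `*` operator on arrays (entry-wise multiplication), as opposed to the matrix product `@`:

```python
# Example 1.3: the Hadamard product via entry-wise array multiplication.
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, -6],
              [7, -3]])

H = A * B                       # Hadamard product, entry-wise
assert (H == np.array([[5, -12], [21, -12]])).all()

# Commutative, unlike the ordinary matrix product A @ B.
assert (A * B == B * A).all()
```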
1.10. Digest
1.10.1. Norms
All norms are equivalent.
Sum norm
‖A‖S = Σ_{i=1}^{m} Σ_{j=1}^{n} |aij|.
Frobenius norm
‖A‖F = ( Σ_{i=1}^{m} Σ_{j=1}^{n} |aij|^2 )^{1/2}.
Max norm
‖A‖M = max_{1≤i≤m, 1≤j≤n} |aij|.
Frobenius norm of Hermitian transpose
‖AH‖F = ‖A‖F .
Frobenius norm as sum of norms of column vectors
‖A‖F^2 = Σ_{j=1}^{n} ‖aj‖2^2.
Frobenius norm as sum of norms of row vectors
‖A‖F^2 = Σ_{i=1}^{m} ‖ai‖2^2.
Frobenius norm invariance w.r.t. unitary matrices
‖UA‖F = ‖A‖F
‖AV‖F = ‖A‖F.
Frobenius norm is consistent:
‖AB‖F ≤ ‖A‖F‖B‖F .
corollary 1.123
‖Ax‖2 ≤ ‖A‖F‖x‖2.
Frobenius norm in terms of singular values
‖A‖F = ( Σ_{i=1}^{n} σi^2 )^{1/2}.
Consistent norms
‖AB‖ ≤ ‖A‖‖B‖
also known as sub-multiplicative norm.
Subordinate matrix norm
‖Ax‖β ≤ ‖A‖ ‖x‖α.
(α → β) Operator norm
‖A‖ ≜ ‖A‖α→β ≜ max_{x≠0} ‖Ax‖β / ‖x‖α.
‖A‖α→β = max_{x∉ker(A)} ‖Ax‖β / ‖x‖α = max_{‖x‖α=1} ‖Ax‖β.
(α→ β) norm is subordinate
‖Ax‖β ≤ ‖A‖α→β‖x‖α.
There exists a unit norm vector x∗ such that
‖A‖α→β = ‖Ax∗‖β.
α→ α-norms are consistent
‖A‖α = max_{x≠0} ‖Ax‖α / ‖x‖α
‖AB‖α ≤ ‖A‖α ‖B‖α.
p-norm
‖A‖p ≜ max_{x≠0} ‖Ax‖p / ‖x‖p = max_{‖x‖p=1} ‖Ax‖p.
Closed form p-norms
‖A‖1 ≜ max_{1≤j≤n} Σ_{i=1}^{m} |aij|.
‖A‖∞ ≜ max_{1≤i≤m} Σ_{j=1}^{n} |aij|.
2-norm
‖A‖2 ≜ σ1.
non-singular
‖A−1‖2 = 1/σn.
symmetric and positive definite
‖A‖2 = λ1.
non-singular
‖A−1‖2 = 1/λn.
normal
‖A‖2 = |λ1|.
non-singular
‖A−1‖2 = 1/|λn|.
Unitary invariant norm ‖UAV ‖ = ‖A‖ for any A ∈ Cm×n and any
unitary U and V .
Typical (p → q) norms: see table 1.
Dual norm and conjugate transpose
‖A‖p→q = ‖AH‖q′→p′, where 1/p + 1/p′ = 1 and 1/q + 1/q′ = 1.
‖A‖2 = ‖AH‖2.
‖A‖1 = ‖AH‖∞, ‖A‖∞ = ‖AH‖1.
‖A‖1→∞ = ‖AH‖1→∞.
‖A‖1→2 = ‖AH‖2→∞.
‖A‖∞→2 = ‖AH‖2→1.
‖A‖1→p = max_{1≤j≤n} ‖aj‖p.
‖A‖p→∞ = max_{1≤i≤m} ‖ai‖q with 1/p + 1/q = 1.
Consistency of p→ q norm
‖AB‖p→q ≤ ‖B‖p→s‖A‖s→q.
Consistency of p→∞ norm
‖AB‖p→∞ ≤ ‖A‖∞→∞‖B‖p→∞.
Dominance of p→∞ norm by p→ p norm
‖A‖p→∞ ≤ ‖A‖p→p.
‖A‖1→∞ ≤ ‖A‖1.
‖A‖2→∞ ≤ ‖A‖2.
Restricted minimum property
min_{z∈C(AH), z≠0} ‖Az‖q / ‖z‖p ≥ 1/‖A†‖q→p.
If A is surjective (onto), then the equality holds. When A is bijective