
Chapter 2

Linear Algebra

JORGE REBAZA

Undoubtedly, one of the subjects in mathematics that has become more indis-pensable than ever is linear algebra. Several application problems involve atsome stage solving linear systems, the computation of eigenvalues and eigen-vectors, linear transformations, bases of vector subspaces, and matrix factoriza-tions, to mention a few. One very important characteristic of linear algebra isthat as a first course, it requires only very basic prerequisites so that it can betaught very early at undergraduate level; at the same time, mastering vectorspaces, linear transformations and their natural extensions to function spacesis essential for researchers in any area of applied mathematics. Linear algebrahas innumerable applications, including differential equations, least-square so-lutions and optimization, demography, electrical engineering, fractal geometry,communication networks, compression, search engines, social sciences, etc. Inthe next sections we briefly review the concepts of linear algebra that we willneed later on.

2.1 Notation and Terminology

We start this section by defining an m × n matrix as a rectangular array of elements arranged in m rows and n columns; we say the matrix is of order m × n. We usually denote the elements of a matrix A of order m × n as aij,


where i = 1, . . . , m, j = 1, . . . , n, and we write the matrix A as

A = [ a11  a12  · · ·  a1n
      a21  a22  · · ·  a2n
      ...
      am1  am2  · · ·  amn ].

Although the elements of a matrix can be real or complex numbers, here we will mostly consider the entries of a given matrix to be real unless otherwise stated. In most cases, we will take care of stating the complex version of some definitions and results.

Matrix addition and product. Given two arbitrary matrices Am×n and Bm×n, we define the matrix

C = A + B

by adding the entries of A and B componentwise. That is, for i = 1, . . . , m, j = 1, . . . , n we have

cij = aij + bij .

This means that the addition of matrices is well defined only for matrices of the same order.

Now consider two arbitrary matrices Am×p and Bp×n. Then we define the product matrix Cm×n = A · B as

cij = ∑_{k=1}^{p} aik bkj,

where i = 1, . . . , m, j = 1, . . . , n. This means that to obtain the entry (i, j) of the product matrix, we multiply the i-th row of A with the j-th column of B entry-wise and add their products.

Observe that for the product to be well-defined, the number of columns of A has to agree with the number of rows of B.

Example 2.1.1 Let A = [ 4 −5 ; 3 2 ; 1 6 ], B = [ −1 2 ; 0 3 ]. Then, for instance, to obtain the entry c32 of the product matrix C = AB we multiply entrywise the third row of A with the second column of B: (1)(2) + (6)(3) = 20. Thus, we get

C = A · B = [ 4 −5 ; 3 2 ; 1 6 ] [ −1 2 ; 0 3 ] = [ −4 −7 ; −3 12 ; −1 20 ].

• MATLAB commands: A + B, A*B.
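For instance, the product in Example 2.1.1 can be checked directly (a minimal sketch; any recent MATLAB or Octave session should reproduce it):

    A = [4 -5; 3 2; 1 6];   % 3 x 2 matrix
    B = [-1 2; 0 3];        % 2 x 2 matrix
    C = A*B                 % returns [-4 -7; -3 12; -1 20]
    % B*A is not defined here: the inner dimensions (2 and 3) do not agree.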

Given a matrix Bp×n, we can denote its columns with b1, . . . , bn, where each bi is a p-dimensional vector. Accordingly, we will write

B = [b1 · · · bn].

In such case, we can write a product of matrices as

AB = A [b1 · · · bn] = [Ab1 · · · Abn], (2.1)

so that Ab1, . . . , Abn are the columns of the matrix AB.

Similarly, if we denote with a1, . . . , am the rows of a matrix Am×p, then

AB = [ a1 ; . . . ; am ] B = [ a1B ; . . . ; amB ].

For an arbitrary matrix Am×n we denote by AT its transpose matrix, that is, the matrix of order n × m where the rows of A have been exchanged for columns and vice versa. For instance,

If A = [ 6 −4 7 ; −2 3 8 ],   then   AT = [ 6 −2 ; −4 3 ; 7 8 ].

Remark 2.1 If Am×n is a complex matrix, its adjoint matrix is denoted by A∗, the conjugate transpose of A (transpose A and conjugate each entry). For instance,

If A = [ 3  −4 + i ; −2i  5 + 2i ],   then   A∗ = [ 3  2i ; −4 − i  5 − 2i ].


• MATLAB command: A’

The sum and product of matrices satisfy the following properties (see Exercise 2.1):

(A + B)T = AT + BT ,   (AB)T = BT AT .   (2.2)

Definition 2.2 The trace of a square matrix A of order n is defined as the sum of its diagonal elements, that is,

tr(A) = ∑_{i=1}^{n} aii.   (2.3)

• MATLAB command: trace(A)

A particular and very special case is that of matrices of order n × 1. Such a matrix is usually called an n-dimensional vector. That is, here we consider vectors as column vectors, and we will use the notation

x = [ x1 ; . . . ; xn ] = [x1 · · · xn]T

for a typical vector. This replaces the usual notation x = (x1, . . . , xn), which we reserve to denote a point, and at the same time it will allow us to perform matrix-vector operations in agreement with their dimensions. This also closely follows the notation used in MATLAB.

The first two vectors below are column vectors, whereas the third is a row vector.

[ 9 ; 4 ; 3 ],   [4 −3 5]T ,   [1 8 5].


2.2 Vector and Matrix Norms

It is always important and useful to have a notion of the “size” or magnitude of a vector or a matrix, just as we understand the magnitude of a real number by using its absolute value. In fact, a norm can be understood as the generalization of the absolute value function to a higher dimensional case. This is especially useful in numerical analysis for estimating the magnitude of the error when approximating the solution to a given problem.

Definition 2.3 A vector norm, denoted by ‖·‖, is a real function which satisfies the following properties for arbitrary n-dimensional vectors x and y and for arbitrary real or complex α:

(i) ‖x‖ ≥ 0,

(ii) ‖x‖ = 0 if and only if x = 0,

(iii) ‖α x‖ = |α| ‖x‖,

(iv) ‖x + y‖ ≤ ‖x‖ + ‖y‖.

There are several norms that can be defined for vectors. Here we mention the most commonly used.

Let x = [x1 · · · xn]T , xi ∈ R. Then, we define

‖x‖1 = ∑_{i=1}^{n} |xi|               Sum norm

‖x‖2 = ( ∑_{i=1}^{n} xi^2 )^{1/2}     Euclidean norm

‖x‖∞ = max_{i=1,...,n} |xi|           Maximum norm        (2.4)

• MATLAB commands: norm(x,1), norm(x), norm(x,inf)

Example 2.2.1 Let x = [3 −2 4 √7 ]T . Then,

‖x‖1 = |3| + |−2| + |4| + |√7| ≈ 11.6458,

‖x‖2 = √(9 + 4 + 16 + 7) = 6,

‖x‖∞ = max{ |3|, |−2|, |4|, |√7| } = 4.
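The values in Example 2.2.1 can be reproduced with the commands above (a quick sketch):

    x = [3; -2; 4; sqrt(7)];
    norm(x,1)     % sum norm: approximately 11.6458
    norm(x)       % Euclidean norm: 6
    norm(x,inf)   % maximum norm: 4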


Figure 2.1: Unit ball in (a) ‖ · ‖1, (b) ‖ · ‖2, (c) ‖ · ‖∞

Remark 2.4 In general, for an arbitrary vector x ∈ Rn, the following inequalities (see Exercise 2.4) relate the three norms above:

‖x‖∞ ≤ ‖x‖2 ≤ ‖x‖1. (2.5)

Example 2.2.2 The unit ball in Rn is the set {x ∈ Rn : ‖x‖ ≤ 1}. The geometrical shape of the unit ball varies according to what norm is used. For the case n = 2, the unit balls for the three norms in (2.4) are shown in Figure 2.1.

Note: For simplicity of notation, ‖ · ‖ will always denote the Euclidean norm ‖ · ‖2 for vectors, unless otherwise stated.

We now introduce the notion of norms for a matrix, in some sense as a generalization of vector norms.

Definition 2.5 A matrix norm is a real function that for arbitrary matrices A and B and arbitrary real or complex α, satisfies

(i) ‖A‖ ≥ 0,

(ii) ‖A‖ = 0 if and only if A = 0,

(iii) ‖α A‖ = |α| ‖A‖,

(iv) ‖A + B‖ ≤ ‖A‖ + ‖B‖,

(v) ‖AB‖ ≤ ‖A‖ ‖B‖.

Let A be an m × n matrix. Here are some of the most common matrix norms:


‖A‖1 = max_{j=1,...,n} ∑_{i=1}^{m} |aij|              Maximum column sum norm

‖A‖F = ( ∑_{i=1}^{m} ∑_{j=1}^{n} |aij|^2 )^{1/2}      Frobenius norm

‖A‖∞ = max_{i=1,...,m} ∑_{j=1}^{n} |aij|              Maximum row sum norm        (2.6)

• MATLAB command: norm(A,1), norm(A,’fro’), norm(A,inf)

Remark 2.6 The Frobenius norm is also known as the Euclidean norm for matrices, and in some textbooks it is denoted as ‖ · ‖2. We will reserve this notation for a different matrix norm.

Example 2.2.3 Let A = [ −1 3 2 ; 2 0 −2 ; 4 −3 1 ]. Then,

‖A‖1 = max {7, 6, 5} = 7.

‖A‖F = (1 + 9 + 4 + 4 + 0 + 4 + 16 + 9 + 1)^{1/2} ≈ 6.9282.

‖A‖∞ = max {6, 4, 8} = 8.
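As a sketch, these three values can be verified with the MATLAB commands listed above:

    A = [-1 3 2; 2 0 -2; 4 -3 1];
    norm(A,1)       % maximum column sum: 7
    norm(A,'fro')   % Frobenius norm: approximately 6.9282
    norm(A,inf)     % maximum row sum: 8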

Remark 2.7 There are two other ways of defining the Frobenius norm that are useful for certain computations, as we illustrate later on. Let Am×n be an arbitrary matrix, and denote its columns with a1, . . . , an. Then,

(a) ‖A‖F^2 = ‖a1‖2^2 + · · · + ‖an‖2^2.   (2.7)

(b) ‖A‖F^2 = tr(AT A).   (2.8)

It seems natural to think that vector norms can be directly generalized to obtain matrix norms (after all, m × n matrices can be thought of as vectors of dimension m·n). However, not all vector norms directly become matrix norms (see Exercise 2.11).


A matrix norm can be induced or constructed from a vector norm by defining for 1 ≤ p ≤ ∞

‖A‖p = max_{‖x‖p=1} ‖Ax‖p,   (2.9)

where the norms on the right hand side are vector norms. This is the way to correctly extend a vector norm to obtain a matrix norm.

Given a vector x of norm ‖x‖, when it is multiplied by A we get the new vector Ax of norm ‖Ax‖. Thus, we can interpret the matrix norm (2.9) as a natural way to measure how much the vector x can be stretched or shrunk by A.

The definition of the p-norm of a matrix in (2.9) is not easy to implement or compute in general. Fortunately, there are alternative ways to compute such a norm for some particular values of p.

Example 2.2.4 It can be proved (see Exercise 2.12) that for p = 1 and p = ∞, the p-norms in (2.9) can be directly computed through the corresponding definitions in (2.6).

Example 2.2.5 For p = 2, the p-norm in (2.9) can be computed as the square root of the largest eigenvalue of the symmetric matrix AT A. That is,

‖A‖2 = max { √λ : λ is an eigenvalue of AT A }.   (2.10)

For instance, for the matrix of Example 2.2.3, we get ‖A‖2 ≈ 5.9198.
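A sketch of this computation, using eig for the eigenvalues of AT A:

    A = [-1 3 2; 2 0 -2; 4 -3 1];
    sqrt(max(eig(A'*A)))   % approximately 5.9198
    norm(A)                % the 2-norm as computed by MATLAB; same value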

Remark 2.8 It is possible to show that an equivalent way to define the matrix norm (2.9) is

‖A‖p = sup_{x≠0} ‖Ax‖p / ‖x‖p.   (2.11)

Remark 2.9 In general, inequalities for matrix norms similar to the ones in (2.5) are not true. However, it is still true that

‖A‖2 ≤ ‖A‖F , (2.12)

where ‖ · ‖2 is given by (2.9).


Finally, we mention another important property that relates matrix and vector norms. We say that the norm of an m × n matrix A is consistent with the norm of an n-dimensional vector x if

‖Ax‖ ≤ ‖A‖ ‖x‖, (2.13)

for any x ∈ Rn. Then, it is straightforward to see that the p-norms defined by (2.9) or (2.11) are all consistent.

2.3 Dot Product and Orthogonality

Quite often we will need to use the so-called dot product or inner product between two vectors x = [x1 · · · xn]T and y = [y1 · · · yn]T , defined as

x · y = < x, y > = x1y1 + · · · + xnyn. (2.14)

Since vectors can also be considered to be n × 1 matrices, it is common to represent the inner product as matrix multiplication. Thus, we will consider all these notations to be equivalent:

x · y = < x, y > = xT y. (2.15)

Observe that with the definition above,

‖x‖2^2 = xT x.

For arbitrary vectors x, y, z ∈ Rn and arbitrary real scalar c, the dot product satisfies the following properties:

(1) < x, x > ≥ 0,

(2) < x, x > = 0 if and only if x = 0,

(3) < cx, y > = c < x, y >,

(4) < x + y, z > = < x, z > + < y, z >,

(5) < x, y > = < y, x > .

Remark 2.10 In the complex case, writing conj(·) for complex conjugation, we have:

(3a) < x, cy > = conj(c) < x, y >,

(5a) < x, y > = conj(< y, x >).   (2.16)


Following (2.15), we also see that for arbitrary Am×n, x ∈ Rn, y ∈ Rm,

< Ax, y > = (Ax)T y = xT AT y = < x, AT y > .   (2.17)

Remark 2.11 The dot product introduced here is a particular case of the general inner product function studied in the context of inner product vector spaces, where the elements are not restricted to real n-dimensional vectors.

There is a special kind of vector that is very useful in several instances in linear algebra and matrix computations: the so-called orthonormal vectors, which are orthogonal (perpendicular) to each other and are unit vectors; that is, x and y are orthonormal if

xT y = 0, and ‖x‖ = ‖y‖ = 1.

For example, the following set of vectors is orthonormal:

v1 = [ 2/√5  0  −1/√5 ]T ,   v2 = [ 1/√5  0  2/√5 ]T ,   v3 = [0 −1 0]T .

In fact, we can readily verify that

v1T v2 = v1T v3 = v2T v3 = 0,   and   ‖v1‖ = ‖v2‖ = ‖v3‖ = 1.
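A quick numerical check (a sketch; collecting the vectors as columns of a matrix performs all the verifications at once):

    v1 = [2/sqrt(5); 0; -1/sqrt(5)];
    v2 = [1/sqrt(5); 0;  2/sqrt(5)];
    v3 = [0; -1; 0];
    V = [v1 v2 v3];
    V'*V    % the 3 x 3 identity, up to round-off: orthonormal columns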

2.4 Special Matrices

We will be using matrices at almost every place in this book, matrices of different kinds and properties. Here we list some of the most common matrices we will encounter.

2.4.1 Diagonal and triangular matrices

A square matrix A of order n is called diagonal if aij = 0 for all i ≠ j.

Example 2.4.1 A = [ 4 0 0 ; 0 1 0 ; 0 0 −9 ].


• MATLAB command: diag

A square matrix A of order n is called upper (resp. lower) triangular if aij = 0 for all i > j (resp. for all i < j).

Example 2.4.2 A = [ 4 2 5 ; 0 1 3 ; 0 0 9 ],   B = [ 4 0 0 ; 5 1 0 ; 7 1 9 ].

The matrix A is upper triangular, and B is lower triangular.

• MATLAB commands: triu, tril

Remark 2.12 If the matrix A is rectangular, say of order m × n, we say it is upper (resp. lower) trapezoidal if aij = 0 for all i > j (resp. for all i < j).

A more general case of triangular matrices is that of block triangular matrices. For example, the following 5 × 5 matrix is block upper triangular:

A = [ −8  1  0  9  3
       4 −2  8  5  5
       0  0  5  7  1
       0  0 −9 −8  4
       0  0  0  0 −6 ]  =  [ A11 A12 A13 ; 0 A22 A23 ; 0 0 A33 ],

where for instance, the zero in the (2,1) entry of the matrix on the right represents the corresponding 2 × 2 zero block of the matrix on the left. Block diagonal matrices can be defined in a similar way. We will encounter this type of matrix in Section 2.10 when we compute eigenvalues of a matrix and when computing special vector subspaces in Chapter 8.

2.4.2 Hessenberg matrices

A matrix An×n is called an upper Hessenberg matrix if aij = 0 for all i > j + 1. Such matrices take the form:

A = [ a11  a12  a13  · · ·  a1,n−1  a1n
      a21  a22  a23  · · ·  a2,n−1  a2n
      0    a32  a33  · · ·  a3,n−1  a3n
      0    0    a43  · · ·  a4,n−1  a4n
      ...                   ...
      0    0    0    · · ·  an,n−1  ann ].   (2.18)


In other words, it is an upper triangular matrix with an additional subdiagonal below the main diagonal.

A matrix A is called lower Hessenberg if AT is upper Hessenberg.

• MATLAB command: hess(A)

2.4.3 Nonsingular and inverse matrices

In the set of real numbers R, the number 1 is the multiplicative identity, meaning that a · 1 = 1 · a = a for any real number a. This is generalized to what is called the identity matrix of order n, denoted by I. It consists of a diagonal of ones, with every other entry zero (see (2.21) below), and has the property that

AI = I A = A,

for any matrix A.

• MATLAB command: eye(n).

A square matrix A of order n is called nonsingular if its determinant is not zero: det(A) ≠ 0. Otherwise it is called singular.

Example 2.4.3 det [ 4 2 9 ; 1 1 0 ; 7 1 3 ] = −48. Hence, the matrix is nonsingular.

• MATLAB command: det(A).

Remark 2.13 Two very useful properties of determinants are the following:

det(AB) = det A det B,
det A−1 = 1/det A.   (2.19)

In one dimension, every real number a ≠ 0 has a unique multiplicative inverse 1/a, and it is obvious that a(1/a) = (1/a)a = 1. This has a natural generalization


to matrices. A nonsingular matrix An×n has a unique n × n inverse matrix, denoted by A−1, with the property that

AA−1 = A−1A = I. (2.20)

For this reason, nonsingular matrices are also called invertible.

• MATLAB command: inv(A).

Example 2.4.4 Let A = [ 4 0 9 ; 1 −1 0 ; −2 1 0 ]. Since det(A) = −9, the matrix is nonsingular and has an inverse: A−1 = [ 0 −1 −1 ; 0 −2 −1 ; 1/9 4/9 4/9 ], and we can verify

that in fact we have

AA−1 = A−1A = I = [ 1 0 0 ; 0 1 0 ; 0 0 1 ].   (2.21)

Computing the inverse of a matrix is not an easy task. There is more than one analytical or exact way to do this; e.g. using the adjoint method, we have

A−1 = adjoint(A)/det(A).

However, this method is rarely used in practice, and we do not discuss it here. A second and more efficient method is through Gauss elimination (see Section 4.1). In general, the computation of the inverse of a matrix has to be done numerically, and great care has to be taken due to the potentially large accumulation of errors for some matrices. Thus, in practice it is customary to avoid computing the inverse of a matrix explicitly, and some other options must be used. We will discuss these issues later on.

Remark 2.14 For the inverse of a product of nonsingular matrices we have

(A1A2 · · · Ak)^{−1} = Ak^{−1} A_{k−1}^{−1} · · · A1^{−1}.   (2.22)


2.4.4 Symmetric and positive definite matrices

A square matrix A of order n is called symmetric if A = AT , that is, the matrix equals its transpose. If A = (aij), then we can also say A is symmetric if aij = aji, for i, j = 1, . . . , n.

Example 2.4.5 The following matrices are symmetric

A = [ 1 4 ; 4 9 ],   A = [ 1 −2 7 ; −2 3 5 ; 7 5 6 ].

Example 2.4.6 For an arbitrary matrix A, the matrices A + AT and AAT are symmetric.

Remark 2.15 If An×n is complex, we say A is Hermitian if A = A∗. For instance, the following matrix is Hermitian:

A = [ 1  4 − 2i ; 4 + 2i  −9 ].

In the set of real numbers, we can define a number a to be positive if ax^2 > 0 for all numbers x ≠ 0. The generalization of this definition to matrices leads to the so-called positive definite matrices. A square matrix A is positive definite if

xT Ax > 0, (2.23)

for all vectors x ≠ 0. If instead we have xT Ax ≥ 0, then the matrix is said to be positive semidefinite.

In several applications, especially optimization, we will encounter matrices that are both symmetric and positive definite. That is why several textbooks simply assume positive definite matrices to be symmetric; we call such matrices symmetric positive definite (spd). See also Remark 2.16 below.

• MATLAB command: chol(A) (Choleski)


Example 2.4.7 The following matrices are spd:

A = [ 4 10 ; 10 26 ],   B = [ 100 15 0.01 ; 15 2.3 0.01 ; 0.01 0.01 1 ].

It is obvious that A is symmetric. To see why it is positive definite, we can verify that xT Ax > 0 for any vector x ≠ 0:

[x1 x2] [ 4 10 ; 10 26 ] [ x1 ; x2 ] = (2x1 + 5x2)^2 + x2^2 > 0.

Example 2.4.8 The matrix AT A is symmetric positive semidefinite, for any matrix Am×n.

Example 2.4.9 The matrix A = [ a b ; −b a ] with a > 0 is positive definite.

Remark 2.16 It is important to state that a positive definite matrix does not need to be symmetric, as in the case of the matrix A = [ 1 0 ; 2 2 ].

Definition 2.17 Let A be a square matrix of order n, and let m < n. A principal submatrix of A is a matrix obtained by deleting any n − m rows and the corresponding columns. A leading principal submatrix is obtained by deleting the last n − m rows and columns.

Notation. We can use an index set I ⊆ {1, . . . , n} to represent which rows and columns of A are used to form the principal submatrix. For instance, if

A = [ 1 4 7 ; 2 5 8 ; 3 6 9 ],   then   A({1, 3}) = [ 1 7 ; 3 9 ],   A({1, 2}) = [ 1 4 ; 2 5 ].

The last one is a leading principal submatrix of A.

Remark 2.18 A positive definite matrix has all its principal submatrices positive definite. Thus, in particular, all the diagonal elements of a positive definite matrix are positive.


2.4.5 Matrix exponential

A central tool in the theory and applications of differential equations, dynamical systems and control theory is the matrix exponential of a square matrix A of order n. This special type of matrix is defined as

eA = ∑_{k=0}^{∞} Ak/k! = I + A/1! + A2/2! + · · · + Ak/k! + · · · .   (2.24)

This is an extension of the real exponential function

ex = 1 + x/1! + x2/2! + · · · + xk/k! + · · ·   (x ∈ R)

to n dimensions. The series (2.24) always converges for any square matrix A.

Example 2.4.10 Let A = [ 3 0 ; 0 5 ]. Since A0 = I and

A2 = [ 3^2 0 ; 0 5^2 ],   A3 = [ 3^3 0 ; 0 5^3 ],   . . . ,   Ak = [ 3^k 0 ; 0 5^k ],

we have

eA = [ ∑_{k=0}^{∞} 3^k/k!   0 ; 0   ∑_{k=0}^{∞} 5^k/k! ] = [ e^3 0 ; 0 e^5 ].

Example 2.4.11 Let A = [ 0 1 −2 ; 0 0 4 ; 0 0 0 ]. Then, A2 = [ 0 0 4 ; 0 0 0 ; 0 0 0 ], A3 = 0, and Ak is the zero matrix for any k ≥ 3. Then eA = I + A + (1/2)A2, and therefore eA = [ 1 1 0 ; 0 1 4 ; 0 0 1 ].

In Chapter 8 we will consider several other examples of a matrix exponential.

• MATLAB command: expm(A)
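For instance, a sketch checking Example 2.4.11 numerically:

    A = [0 1 -2; 0 0 4; 0 0 0];
    expm(A)   % returns [1 1 0; 0 1 4; 0 0 1]
    % Note that expm(A) is the matrix exponential; exp(A), in contrast,
    % exponentiates entry by entry, which is a different operation.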


2.4.6 Permutation matrices

In matrix computations we often need to perform permutations of rows or columns of a matrix; that is, we exchange the entries of two given rows or columns, respectively. This can be achieved simply through matrix multiplication by a special type of square matrix.

A permutation matrix P can be defined as the identity matrix with some of its rows reordered as needed.

Example 2.4.12 The following are permutation matrices

P1 = [ 1 0 0 ; 0 0 1 ; 0 1 0 ],   P2 = [ 0 0 1 0 ; 0 0 0 1 ; 0 1 0 0 ; 1 0 0 0 ],   P3 = [ 0 0 0 1 ; 1 0 0 0 ; 0 0 1 0 ; 0 1 0 0 ].

Observe that for the matrices above we have:

P1^2 = I,   P2^4 = I,   P3^3 = I.

Thus, to recover the identity matrix, we need to multiply the permutation matrix P by itself as many times as the number of rows that have been reordered in the identity (or, as the number of zeros on the diagonal of P ).

In general, due to the way they are defined, permutation matrices satisfy the important property

P T P = I = P P T . (2.25)

Therefore they are invertible, and P−1 = P T .

Multiplying an arbitrary matrix A by a permutation matrix P from the left will result in row permutations in A, and multiplication from the right will result in column permutations.

Example 2.4.13 Let A = [ 1 5 8 ; 2 0 5 ; 3 3 6 ], and P = [ 1 0 0 ; 0 0 1 ; 0 1 0 ]. Then, since the permutation matrix P is the identity after permuting its rows (or columns) 2 and 3, we can permute the corresponding rows (or columns) of A by multiplying:

P A = [ 1 5 8 ; 3 3 6 ; 2 0 5 ],   or   AP = [ 1 8 5 ; 2 5 0 ; 3 6 3 ].


It is clear that if P is a permutation matrix, then P T also is (but in general, P T ≠ P , as we see from the matrices P2, P3 in Example 2.4.12). Then, we can simultaneously interchange rows and columns of a given matrix A through the operation P T AP . In the example below, AP is the matrix A with the first and third columns interchanged. Then, the product P T AP interchanges the first and third rows of AP .

Example 2.4.14 Let A = [ 0 7 9 0 ; 3 0 0 0 ; 6 4 0 2 ; 1 0 5 0 ] and P = [ 0 0 1 0 ; 0 1 0 0 ; 1 0 0 0 ; 0 0 0 1 ]. Then,

P T AP = [ 0 4 6 2 ; 0 0 3 0 ; 9 7 0 0 ; 5 0 1 0 ].

• MATLAB command: colperm(A), flipud(A), fliplr(A)

2.4.7 Orthogonal matrices

And last, but definitely not least, we consider the type of matrix that is essential and the favorite one in matrix computations, numerical analysis, and applied mathematics in general, due to its very important and remarkable properties.

An m × n matrix Q is called orthogonal if

QT Q = I. (2.26)

This definition implies (see Exercise 2.38) that the columns of an orthogonal matrix Qm×n are orthonormal, in the ‖ · ‖2 norm.

In particular, if the matrix Q is square, then QT Q = Q QT = I. In this case, the following statements are equivalent:

(a) Q is orthogonal

(b) Q−1 = QT .


(c) The columns of Q are orthonormal.

(d) The rows of Q are orthonormal.

Example 2.4.15 Permutation matrices are obviously orthogonal. See (2.25).

Example 2.4.16 The following matrices (the first for any real α) are also orthogonal.

Q = [ cos α  −sin α ; sin α  cos α ],   Q = [ 1/√3  −1/√2 ; 1/√3  0 ; 1/√3  1/√2 ].

Remark 2.19 In the complex case, we say a matrix Um×n is unitary if

U∗U = I,

where as before U∗ denotes the adjoint of U .

The definition (2.26) and the statements (a) - (d) above immediately set orthogonal matrices apart from most other matrices, and would be enough to consider them very important in theory and applications. But there is more.

A very important and useful fact about orthogonal matrices is that they preserve the norm of a vector or a matrix. More precisely, we have the following

Theorem 2.20 Let Q be an orthogonal matrix. For any vector x ∈ Rn and an arbitrary matrix A we have

‖Q x‖2 = ‖x‖2,   and   ‖Q A‖2 = ‖A‖2.   (2.27)

Proof. First, observe that ‖Qx‖2^2 = xT QT Qx = xT x = ‖x‖2^2. Using this fact, we now have

‖QA‖2 = max_{‖x‖2=1} ‖QAx‖2 = max_{‖x‖2=1} ‖Ax‖2 = ‖A‖2.


The next important property of an orthogonal matrix is especially related to what is known as sensitivity in the solution of several problems. First we need the following

Definition 2.21 The condition number of a square matrix A is defined as

cond (A) = ‖A‖ ‖A−1‖. (2.28)

This number represents how well or ill conditioned a matrix can be, in the sense of how much we can rely on the computations performed with such a matrix, such as solving systems of equations, due to the potential accumulation of round-off errors. Computations involving matrices with large condition numbers are potentially very inaccurate; roughly speaking, with cond(A) ≈ 10^k, one can expect to lose about k digits of precision in numerical computations.

For an arbitrary matrix A, the condition number has the property

cond(A) ≥ 1,

and for any orthogonal matrix Q we have:

cond(Q) = 1,

the minimum possible. See Exercise 2.45.

It is important to remark that due to these properties, errors are not magnified when performing matrix computations with orthogonal matrices. For instance, when reducing a matrix A to a triangular form, operations with orthogonal matrices such as successive products of the type QA are performed safely because the condition number of QA is the same as that of A. In general, with products of the type MA, there is the actual risk that the condition number of MA is much larger than that of A, making the computations numerically unstable and unreliable. This does not happen with orthogonal matrices. See also Exercises 2.42 and 2.43.

• MATLAB command: cond(A)
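For example, a sketch contrasting an orthogonal matrix with a notoriously ill-conditioned one (the Hilbert matrix, available in MATLAB as hilb):

    a = pi/7;                            % any angle gives an orthogonal Q
    Q = [cos(a) -sin(a); sin(a) cos(a)];
    cond(Q)                              % 1, up to round-off
    cond(hilb(8))                        % of order 10^10: about 10 digits lost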

Remark 2.22 It is important to stress again that orthogonal matrices are the ideal tools to use in numerical computations. They are stable, errors are not magnified, and there is no need to compute inverses. In addition, their condition number is 1, the best possible.


Now we introduce one of the most important orthogonal matrices, extensively used in matrix computations, and which we will use explicitly in Chapter 4. This matrix has the additional and convenient property of symmetry.

Definition 2.23 (Householder matrix). Let u be a unit vector in Rn. Then the matrix

H = I − 2uuT   (2.29)

is called a Householder matrix.

Theorem 2.24 The matrix H in (2.29) is orthogonal and symmetric.

Proof. Since (uuT )T = uuT , we have HT = H and hence H is symmetric. In addition, since uT u = 1, we get

HT H = (I − 2uuT )T (I − 2uuT ) = (I − 2uuT ) (I − 2uuT ) = I − 4uuT + 4uuT uuT = I,

and therefore H is orthogonal.

A Householder matrix (2.29) is also known as a Householder reflection because given an arbitrary vector x ∈ Rn, the vector Hx is a reflection of x with respect to the hyperplane u⊥, which is the set of all vectors perpendicular to u. In other words, the vectors x and Hx have exactly the same orthogonal projection onto that hyperplane.

Example 2.4.17 Let u = [ 1/√2  1/√2 ]T . Then, the associated Householder matrix is

H = I − 2uuT = [ 0 −1 ; −1 0 ].

Now let x = [0 2]T ; then Hx = [−2 0]T , which as we can see from Figure 2.2 is a reflection of x with respect to u⊥.
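A sketch reproducing this example:

    u = [1; 1]/sqrt(2);
    H = eye(2) - 2*(u*u')   % returns [0 -1; -1 0]
    x = [0; 2];
    H*x                     % returns [-2; 0]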

Remark 2.25 For an arbitrary vector u ≠ 0, a Householder matrix is defined as

H = I − 2uuT/(uT u).   (2.30)


Figure 2.2: Householder reflection of x.

Orthonormal extensions. One important application of Householder matrices is extending a given vector x ∈ Rn to a set of n orthonormal vectors. The idea is to get an orthogonal matrix H with its first column being the vector x (normalized, if necessary). This tool will be crucial e.g. when proving several theorems in Chapter 4.

Let x ∈ Rn be a vector with ‖x‖2 = 1, so that xT x = 1. Define the vector

u = x − e1.

First observe that uT x = xT x − e1T x = 1 − e1T x, and that uT u = xT x − xT e1 − e1T x + e1T e1 = 2(1 − e1T x), so that uT u = 2uT x. Then,

Hx = ( I − 2uuT/(uT u) ) x = x − 2u(uT x)/(uT u) = x − (2uT x)u/(uT u) = x − u = e1.

The columns of this Householder matrix H are of course orthonormal, but according to the equality Hx = e1, we have that x = HT e1 = He1. That is, x is the first column of H.

Example 2.4.18 Let x = (1/6) [4 0 −4 2]T . Take u = x − e1 = (1/6) [−2 0 −4 2]T .


Then, the matrix

H = I − 2uuT/(uT u) = (1/6) [ 4 0 −4 2 ; 0 6 0 0 ; −4 0 −2 4 ; 2 0 4 4 ]

is orthogonal, with x as its first column.
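A sketch of this construction, following (2.30):

    x = [4; 0; -4; 2]/6;              % a unit vector
    u = x - [1; 0; 0; 0];             % u = x - e1
    H = eye(4) - 2*(u*u')/(u'*u);
    H(:,1)                            % equals x: x is the first column of H
    H'*H                              % the identity: the columns are orthonormal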

We will see some more applications of Householder matrices in Chapter 4.

2.5 Vector Spaces

Although vector spaces can be very general, we restrict our attention to vector spaces in finite dimensions, and unless otherwise stated, we will always consider the scalars to be real numbers.

We know that if we add, subtract, multiply or divide (except division by zero) two real numbers, we get a real number again. We know that there is the number zero and the number one, with the properties that x + 0 = 0 + x = x and x · 1 = 1 · x = x. We also know that the real numbers have the commutative, associative and distributive properties; namely: x + y = y + x, x · y = y · x, x + (y + z) = (x + y) + z, etc. Because of all these properties, working with real numbers is simple and flexible, and we usually take all these properties for granted.

Since linear algebra deals not only with real numbers but also, and especially, with vectors and matrices, we would like to provide vectors and matrices with properties similar to those of real numbers, so that we can handle them in a simple manner. It happens that this is possible not only for vectors and matrices but also for several other objects. Thus, in general we have the following

Definition 2.26 A set V of elements (called vectors) having two operations, addition and scalar multiplication, is a vector space if the following properties are satisfied:

1. u + v ∈ V , for all u, v ∈ V .

2. c u ∈ V , for any scalar c and u ∈ V .


3. u + v = v + u, for all u, v ∈ V .

4. u + (v + w) = (u + v) + w, for all u, v, w ∈ V .

5. There exists a zero vector such that u + 0 = u, for all u ∈ V .

6. 1u = u, for all u ∈ V .

7. For every u in V , there exists a negative vector −u, such that u + (−u) = 0.

8. c (u + v) = cu + cv, for every scalar c and all u, v ∈ V .

9. (c1 + c2)u = c1u + c2u, for all scalars c1, c2 and all u ∈ V .

10. c1(c2u) = (c1c2)u, for all scalars c1, c2 and all u ∈ V .

Example 2.5.1 The real numbers with the usual addition and multiplication obviously form a vector space.

Example 2.5.2 The set Rn with the usual operations of addition and scalar multiplication forms a vector space. This is somehow the “canonical” example of a vector space. In fact, when we think about vectors we tend to think about objects of the form [x1 · · · xn]T , with the usual geometric representation as arrows. But vectors are elements of more general sets.

Example 2.5.3 The set of matrices with the usual addition and scalar multiplication (a real number times a matrix) forms a vector space.

Example 2.5.4 The set of functions with the real numbers as their domain, and with the usual addition and scalar multiplication, forms a vector space.

Given a vector space V , some of its subsets may be vector spaces on their own, with the same operations of addition and scalar multiplication. More formally, we can characterize them through the following


Definition 2.27 Let V be a vector space and U a nonempty subset of V . Then, we say U is a vector subspace of V if

(i) u + v ∈ U , for all u, v ∈ U .

(ii) c u ∈ U , for all scalars c and all u ∈ U .

This definition simply says that operations performed with elements of U will generate vectors that stay within U .

Example 2.5.5 Let V = R2; then V is a vector space (see Example 2.5.2). Let U = {(x, y) ∈ R2 : y = x}; in other words, U is the line y = x. Clearly, U is a subset of V . We need to verify the conditions in Definition 2.27.

(i) Let u, v be arbitrary elements in U . Then, u = (u1, u1) and v = (v1, v1), for some real numbers u1, v1. Thus, u + v = (u1 + v1, u1 + v1), and hence u + v ∈ U .

(ii) Let c be any real scalar, and u an arbitrary vector in U . Then, c u = c (u1, u1) = (cu1, cu1), and therefore c u ∈ U .

Hence, U is a subspace of V . See Figure 2.3.

Figure 2.3: The line y = x is a subspace U of V = R2.

Example 2.5.6 Let again V = R2, and let U = {(x, y) ∈ R2 : y = x + 1}. Then, U is not a vector subspace of V (see Exercise 2.49).

Observe that since condition (ii) in Definition 2.27 must be true for c = 0, a vector subspace (and in general any vector space) must contain the zero vector. The subset U in Example 2.5.6 fails to be a vector subspace because it does not contain the origin.


Example 2.5.7 Let V be the vector space of all continuous functions on an interval [a, b]. Then, the set U of all polynomials of degree at most n, defined on the interval [a, b], is a vector subspace of V .

Example 2.5.8 Let V be the vector space of square matrices of order n. Then, the set U of symmetric matrices of order n is a vector subspace of V .

Example 2.5.9 (Sum of subspaces). If U and W are subspaces of V , then the set

U + W = {u + w : u ∈ U, w ∈ W}   (2.31)

(called the sum of U and W ) is a subspace of V .

Here is a specific example of a sum of subspaces.

Example 2.5.10 Let V be the vector space of square real matrices of order 2, and let

U = { [ 0 a ; 0 b ] : a, b ∈ R },   W = { [ 0 0 ; c d ] : c, d ∈ R }.

Clearly, U and W are subspaces of V , and

U + W = { [ 0 e ; f g ] : e, f, g ∈ R },

which is a vector subspace of V .

Example 2.5.11 A very important example of a subspace is the column space of a matrix, which is introduced in Section 2.8.


2.6 Linear Independence and Basis

The concepts introduced in this section are related to a fundamental notion in linear algebra: a linear combination of a set of vectors v1, . . . , vk, which is an expression of the form

c1v1 + · · · + ckvk,

where the c1, . . . , ck are scalars.

Definition 2.28 Let V be a vector space, and U ⊆ V a vector subspace of V . A set of vectors S = {v1, . . . , vk} in V is said to span U if any vector x ∈ U can be written as a linear combination of elements in S, that is,

x = c1v1 + · · · + ckvk, (2.32)

for some scalars c1, . . . , ck. This linear combination does not need to be unique.

Example 2.6.1 Let V = R2. Then, the set S1 = {[−1 1]T , [0 1]T , [2 2]T } spans V ; that is, any vector in R2 can be expressed as a linear combination of the vectors in S1. For instance,

[ 3 ; −4 ] = −5 [ −1 ; 1 ] + 3 [ 0 ; 1 ] − [ 2 ; 2 ].   (2.33)

However, the combination in (2.33) is not unique. Indeed, instead of the scalars −5, 3, −1 in (2.33), we could also use −7, 7, −2 and the new combination would still give us [3 −4]T . This indicates again that in general the linear combination (2.32) need not be unique.

Example 2.6.2 Again, let V = R2, and let S2 = {[−1 1]T , [2 2]T }. Observe that S2 is formed with only two vectors of S1 from Example 2.6.1, but it also spans V . In particular, we can write

[ 3 ; −4 ] = −(7/2) [ −1 ; 1 ] − (1/4) [ 2 ; 2 ].   (2.34)

Thus, it seems that although S1 does span V , it has one element too many. This is not the case with S2. Moreover, the combination (2.34) is unique.


Example 2.6.3 If the sets of vectors T1, T2 span the subspaces V and W respectively, then the set T1 ∪ T2 spans the sum subspace V + W .

In fact, let T1 = {v1, . . . , vp} and T2 = {w1, . . . , wq}, and let x ∈ V + W be arbitrary. Then x = v + w, for some v ∈ V and w ∈ W . Then, we can write x = (α1v1 + · · · + αpvp) + (β1w1 + · · · + βqwq), which implies that x lies in the span of T1 ∪ T2.

The fundamental difference between the sets S1 and S2 of Examples 2.6.1 and 2.6.2 lies in a central concept in linear algebra, given in the following

Definition 2.29 A set of vectors {v1, ..., vn} is said to be linearly independent if

c1v1 + · · · + cnvn = 0   implies   c1 = · · · = cn = 0.   (2.35)

In simple words, this definition says that if v1, ..., vn are linearly independent, none of them can be written as a combination of the others. (Otherwise, say c1 ≠ 0; then, we can write v1 as the linear combination v1 = −(c2/c1)v2 − · · · − (cn/c1)vn.)

Example 2.6.4 Let V = R2 and let v1 = [2 1]T and v2 = [1 3]T . To show linear independence, assume that c1v1 + c2v2 = [0 0]T . Then, this equality gives the system

2c1 + c2 = 0
c1 + 3c2 = 0,

whose solution is c1 = c2 = 0. Therefore, v1 and v2 are linearly independent. Observe that in this case, linear independence means the vectors are not parallel or multiples of each other. See Figure 2.4.

Example 2.6.5 The set S1 in Example 2.6.1 is not linearly independent. In fact, any of the vectors in S1 can be written as a combination of the other two. For instance,

[ 0 ; 1 ] = (1/2) [ −1 ; 1 ] + (1/4) [ 2 ; 2 ].

However, the set S2 in Example 2.6.2 is linearly independent.


Figure 2.4: v1 and v2 in Example 2.6.4 are linearly independent

Example 2.6.6 Let V = P2 be the vector space of real polynomials of degree at most 2. The set { f1 = x^2 + 3x − 1, f2 = x + 3, f3 = 2x^2 − x + 1 } is linearly independent. In fact, if c1f1 + c2f2 + c3f3 = 0, we get the system

c1 + 2c3 = 0
3c1 + c2 − c3 = 0
−c1 + 3c2 + c3 = 0

whose solution is c1 = c2 = c3 = 0.

A set S of vectors could span a vector space V but not necessarily be linearly independent, and some other set may be linearly independent but not span V . If a given set of vectors is linearly independent and at the same time spans a given vector space, then it is a very special set.

Definition 2.30 A set of vectors B = {v1, ..., vn} is said to be a basis of a vector space V if it spans V and is linearly independent.

The most important application of this definition is that if x is an arbitrary element of V , then it can be written in a unique form as a linear combination of the elements of the basis. That is,

x = c1v1 + · · · + cnvn, (2.36)

for some unique scalars c1, . . . , cn. In other words, the scalars or coefficients ci in (2.36) uniquely determine the vector x on the given basis. For a different basis, there will be different coefficients, but again this combination is unique, on that


basis. From Definition 2.30 we can say that a basis B is a genuine (though not unique) representative of a vector space V .

Example 2.6.7 The set of vectors

B = { e1 = [1 0 · · · 0]T , e2 = [0 1 · · · 0]T , . . . , en = [0 0 · · · 1]T }

is a basis for V = Rn. It is known as the canonical or standard basis of Rn. Observe for instance that

[ 8 ; 3 ; 4 ] = 8 [ 1 ; 0 ; 0 ] + 3 [ 0 ; 1 ; 0 ] + 4 [ 0 ; 0 ; 1 ].

Thus, in the basis B = {e1, e2, e3}, the (unique) coefficients in the linear combination are exactly the entries of the given vector. That is why it is called the canonical or standard basis.

• MATLAB commands: orth(A).

Example 2.6.8 The set of vectors

B = { v1 = [1 0 2]T , v2 = [−2 1 8]T , v3 = [0 2 0]T }

is a basis of R3, so that any vector x ∈ R3 can be written as a unique combination of such basis vectors. For instance,

[ 8 ; 3 ; 4 ] = 6 [ 1 ; 0 ; 2 ] − [ −2 ; 1 ; 8 ] + 2 [ 0 ; 2 ; 0 ].

This means that on the basis B above, the vector [8 3 4]T is fully represented by its coordinates {6, −1, 2}. Similarly, on the basis B of Example 2.6.7, the same vector is represented by its corresponding coordinates {8, 3, 4}.
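Finding the coordinates of a vector on a basis amounts to solving a linear system whose coefficient matrix has the basis vectors as columns. A sketch for Example 2.6.8:

    V = [1 -2 0; 0 1 2; 2 8 0];   % columns are v1, v2, v3
    c = V \ [8; 3; 4]             % returns the coordinates [6; -1; 2]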

Example 2.6.9 The set of matrices

B = { [ 1 0 0 0 ; 0 0 0 0 ; 0 0 0 0 ; 0 0 0 0 ], [ 0 1 0 0 ; 0 0 0 0 ; 0 0 0 0 ; 0 0 0 0 ], · · · , [ 0 0 0 0 ; 0 0 0 0 ; 0 0 0 0 ; 0 0 0 1 ] }   (2.37)


is a basis for the vector space of square matrices of order 4, and just as in Example 2.6.7, it is known as the canonical or standard basis of that space. This basis is particularly important (among other things) in image compression, which we study in Chapter 6.

Example 2.6.10 The set B = { f1 = 1, f2 = x, f3 = x^2 } is a basis for the vector space P2 of the polynomials of degree at most two.

It is not difficult to see that the set B in Example 2.6.10 spans P2. Let us see that it is also linearly independent. Assume that c1f1 + c2f2 + c3f3 = 0. This gives c1 · 1 + c2x + c3x^2 = 0. Comparing coefficients, we immediately get c2 = c3 = 0, leaving us with c1 · 1 = 0. Thus, c1 must also be zero. This proves B is linearly independent.

The next is a particular case of a very interesting and useful class of polynomials called Bernstein polynomials, which form a basis for the vector space Pn of polynomials of degree at most n.

Example 2.6.11 The following set of polynomials

B0,3(t) = (1 − t)^3,
B1,3(t) = 3t(1 − t)^2,
B2,3(t) = 3t^2(1 − t),
B3,3(t) = t^3

is a basis for the vector space P3 of the real polynomials of degree at most three. In fact, any polynomial of degree at most three can be expressed as a unique combination of the Bernstein polynomials above. For instance, if p(t) = 25t^3 − 21t^2 − 3t + 2, then

p(t) = 2B0,3(t) + B1,3(t) − 7B2,3(t) + 3B3,3(t).
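This identity can be checked by comparing coefficient vectors (in descending powers of t), a sketch:

    B0 = [-1  3 -3  1];   % (1 - t)^3
    B1 = [ 3 -6  3  0];   % 3t(1 - t)^2
    B2 = [-3  3  0  0];   % 3t^2(1 - t)
    B3 = [ 1  0  0  0];   % t^3
    p  = 2*B0 + B1 - 7*B2 + 3*B3   % returns [25 -21 -3 2]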

We will study Bernstein polynomials and their applications in detail later in Section 2.12.

Intuition tells us that the real line R is one-dimensional, the plane R2 is two-dimensional, and the space R3 is three-dimensional. This coincides with the


number of vectors in the bases of each of these spaces, e.g. B = {e1, e2} is the canonical basis of R2. These ideas generalize into the following

Definition 2.31 If a vector space V has a basis consisting of m vectors, then we say the dimension of V is m, and we write dim(V ) = m.

Example 2.6.12 In Example 2.5.5, dim(V ) = 2 and dim(U) = 1.

Remark 2.32 If U is a subspace of V , then dim(U) ≤ dim(V ); if dim(U) = dim(V ), then we must have U = V .

Example 2.6.13 The basis in Example 2.6.10 has 3 elements; therefore the dimension of the vector space P2 of real polynomials of degree at most 2 is 3. Similarly, the basis in Example 2.6.11 indicates that the dimension of the vector space P3 is 4. In general, the dimension of Pn is n + 1.

The next result expresses the relationship between the dimensions of two vector subspaces and that of their sum.

Theorem 2.33 If V and W are subspaces of U , then

dim(V + W ) = dim(V ) + dim(W ) − dim(V ∩ W ). (2.38)

Proof. Let R = {u1, . . . , ur} be a basis of V ∩ W . For some positive integers m and n, extend this set R to form the set BV = {u1, . . . , ur, v1, . . . , vm}, a basis of V , and the set BW = {u1, . . . , ur, w1, . . . , wn}, a basis of W . In Example 2.6.3 we saw that the set B = BV ∪ BW spans V + W . Let us prove that B is also linearly independent. Thus, assume that

∑_{i=1}^{r} aiui + ∑_{j=1}^{m} bjvj + ∑_{k=1}^{n} ckwk = 0   (2.39)

for some scalars ai, bj, ck. This implies that

∑_{i=1}^{r} aiui + ∑_{j=1}^{m} bjvj = − ∑_{k=1}^{n} ckwk


Since the left hand side is a vector in V , the vector ∑_{k=1}^{n} ckwk lies in V ∩ W and therefore can be written as ∑_{k=1}^{n} ckwk = ∑_{i=1}^{r} diui, for some scalars di. Thus,

∑_{k=1}^{n} ckwk − ∑_{i=1}^{r} diui = 0.

But since BW is linearly independent, we must have ck = 0 and di = 0 for all k = 1, . . . , n and i = 1, . . . , r. Then from (2.39) we get

∑_{i=1}^{r} aiui + ∑_{j=1}^{m} bjvj = 0.

Similarly, since BV is linearly independent, we must have ai = 0 and bj = 0 for all i = 1, . . . , r and j = 1, . . . , m. Thus, B is linearly independent and therefore a basis of V + W . Finally, using the definition of dimension we get

dim(V + W ) = r + m + n = (r + m) + (r + n) − r = dim V + dim W − dim(V ∩ W ).

2.7 Orthogonalization and Direct Sums

Given a set of linearly independent vectors, it is always possible to transform it into an orthonormal set. In particular, this means that given a basis of a vector space, we can always transform such a basis into an orthonormal one. In fact, due to some technical advantages, several applications start by considering only orthonormal bases. One well-known procedure to orthonormalize a linearly independent set is the following

Gram-Schmidt process: Given m linearly independent vectors {v1, ..., vm} in Rn (m ≤ n), an orthonormal set of vectors {q1, ..., qm} can be obtained by defining


q1 = v1/‖v1‖,

w2 = v2 − (v2T q1) q1,                      q2 = w2/‖w2‖,

w3 = v3 − (v3T q1) q1 − (v3T q2) q2,        q3 = w3/‖w3‖,

. . .

wm = vm − ∑_{k=1}^{m−1} (vmT qk) qk,        qm = wm/‖wm‖.        (2.40)

Observe that in general the vector v2 − q1 need not be orthogonal to q1. This is why, in the definition of w2, the vector q1 is first rescaled by the factor v2T q1, so that w2 is orthogonal to q1. A similar idea is applied to the remaining vectors wi. See Figure 2.5.

Figure 2.5: Gram-Schmidt: w2 is orthogonal to q1

• MATLAB command: orth(A).

Example 2.7.1 Let v1 = [2 −1 0]T , v2 = [1 0 −1]T , v3 = [3 7 −1]T . Then, following (2.40),

‖v1‖ = √5,   q1 = (1/√5) [2 −1 0]T ,

w2 = v2 − (2/√5) q1 = (1/5) [1 2 −5]T ,   ‖w2‖ = √30/5,

q2 = w2/‖w2‖ = (1/√30) [1 2 −5]T ,

w3 = v3 − (−1/√5) q1 − (22/√30) q2 = (1/3) [8 16 8]T ,   ‖w3‖ = (8/3)√6,

q3 = w3/‖w3‖ = (1/√6) [1 2 1]T .

The set { q1, q2, q3} is orthonormal. See Figure 2.6.

Figure 2.6: Gram-Schmidt orthonormalization

Direct Sums: In some special cases, every element of a vector space can be expressed as a unique sum of two vectors, each one lying in one of two different vector subspaces. More formally, we have the following

Definition 2.34 Let V be a vector space, and U and W be two subspaces of V . Then, we say that U and W form a direct sum of V , denoted by V = U ⊕ W , if

U ⊕ W = {v ∈ V : v = u + w, u ∈ U, w ∈ W, U ∩ W = 0}. (2.41)

This definition implies that V is spanned by U and W , and that every vector in V can be uniquely expressed as v = u + w.

Remark 2.35 If V = U ⊕ W , then U and W are called complementary spaces. The vectors u and w in (2.41) are called the projection of v onto U along W and the projection of v onto W along U , respectively.


Example 2.7.2 Let V = R3, and consider two subspaces: U the xy plane, and W the line spanned by w = [1 1 1]T . Then, V = U ⊕ W . In fact, any vector v = [x y z]T can be expressed as a unique sum of a vector of U and a vector of W :

[ x ; y ; z ] = [ x − z ; y − z ; 0 ] + [ z ; z ; z ].

Observe that U ∩ W = {0}.

Example 2.7.3 In Example 2.5.10, we can observe that U ∩ W ≠ {0} and therefore the sum U + W is not a direct sum.

For a very useful example of direct sums, consider now an arbitrary vector subspace U of a vector space V . We can define another vector subspace, the orthogonal complement of U , which consists of all vectors that are orthogonal to each vector in U (it is usually called the “perp” of U):

U⊥ = {v ∈ V : vT u = 0, ∀u ∈ U}. (2.42)

A direct sum of subspaces can be obtained by using the perp of a vector subspace. In fact, we have the following

Theorem 2.36 Let V be a vector space, and U an arbitrary vector subspace of V . Then,

V = U ⊕ U⊥.

Proof. Let v be an arbitrary vector in V , and let {u1, . . . , um} be an orthonormal basis of U . Then, similar to the way we defined the vectors in the Gram-Schmidt process (2.40), let

u = (vT u1)u1 + · · · + (vT um)um.

Then, the vector w = v − u is orthogonal to each ui, i = 1, . . . , m, and therefore it is in U⊥. Hence, v = u + w, where u ∈ U and w ∈ U⊥. Also, it is clear that the only intersection of U and U⊥ is the zero vector, for if u ∈ U and u ∈ U⊥, then

‖u‖^2 = uT u = 0.

This proves that in fact, V = U ⊕ U⊥.


Example 2.7.4 If V = R3 and U is the XY-plane, then U⊥ is the Z-axis. Then, it is clear that V = U ⊕ U⊥. In fact, every vector [x y z]T can be uniquely written as [x y z]T = [x y 0]T + [0 0 z]T .

Example 2.7.5 A matrix An×n is said to be skew symmetric if A = −AT . If we let S be the subspace of n × n symmetric matrices and K the subspace of n × n skew symmetric matrices, then

Rn×n = S ⊕ K.

Here is an example of a skew symmetric matrix: A = [ 0 2 5 ; −2 0 −4 ; −5 4 0 ].

Note. In the complex case, we say that a matrix A is skew Hermitian if A = −A∗.

Later on we will see some particular and useful examples of direct sums.

2.8 Column Space, Row Space and Null Space

Associated to every m×n matrix A there are three fundamental vector spaces: the column space of A, col(A), the row space of A, row(A), and the null space of A, N(A), which are important in several applications, e.g. in the solution or least squares solution of a linear system Ax = b, in web page ranking, and in information retrieval, topics that we consider in detail later on. Here we will use the concepts of linear independence, span, etc. that we have introduced in the previous sections.

Definition 2.37 The column space of a matrix Am×n is defined as the vector subspace of Rm spanned by the columns of A.


This definition says that col(A) is the vector subspace formed by the vectors obtained through all possible linear combinations of the columns of A.

Another appropriate and useful definition of the column space is given as

col(A) = { y ∈ Rm : y = Ax, for x ∈ Rn }. (2.43)

In other words, the column space of A is the set of vectors that can be expressed as Ax, for some vector x. In this sense, the column space is also known as the range of A, or the image of A, and it is usually denoted as R(A), or Im(A).

That is,

col(A) = R(A).

Remark 2.38 Observe that each column of Am×n is an m-dimensional vector; that is why col(A) is a subspace of Rm.

Remark 2.39 Combining Definition 2.37 and (2.43), we conclude that for any x ∈ Rn, Ax is a combination of the columns of A.

Definition 2.40 The dimension of the column space of a matrix Am×n (the number of linearly independent columns of A) is called the rank of A.

• MATLAB command: rank(A).

Example 2.8.1 Let A = [1 −2; 5 3; −4 7], and let x = [x1 x2]T be arbitrary. Then,

Ax = [x1 − 2x2; 5x1 + 3x2; −4x1 + 7x2] = x1 [1 5 −4]T + x2 [−2 3 7]T.

This clearly illustrates that in fact Ax is a combination of the columns of A, for any x. We also observe that in this case, since the two columns of A are linearly independent, col(A) is 2-dimensional; that is, geometrically it is a plane in R3 spanned by the two columns of the matrix A, and thus rank(A) = 2.


Example 2.8.2 Let A = [2 −1 0; 0 1 −1; 2 0 −1]. By definition, col(A) is the set of vectors of the form Ax, for any x. In this case,

Ax = [2x1 − x2; x2 − x3; 2x1 − x3] = x1 [2 0 2]T + x2 [−1 1 0]T + x3 [0 −1 −1]T.

It is very important to observe in Example 2.8.2 that although it is true that col(A) is spanned by (all) the columns of A, and A has 3 columns, they are not linearly independent. In fact, it is easy to verify that, for instance, the third column is a combination of the first two, and that these first two columns are linearly independent. This means that the rank of A, and the dimension of col(A), is two. Thus, we can say that col(A) is also spanned by just the first two columns of A, and hence it has dimension two. In other words, for Example 2.8.2, the column space of A is the subspace of vectors that can be obtained via linear combinations of the form

col(A) = c1 [2 0 2]T + c2 [−1 1 0]T,

where c1, c2 are arbitrary scalars. The col(A) is therefore a plane in R3.

Remark 2.41 We have expressed col(A) as a combination of the first two columns, but it can also be expressed as a combination of the last two, or the first and the last. In any case, there are only two linearly independent columns.

• MATLAB command: orth
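For instance, for the matrix of Example 2.8.2, rank and orth confirm the discussion above. A small sketch (the coefficients −1/2 and −1 can be checked by hand):

    A = [2 -1 0; 0 1 -1; 2 0 -1];            % matrix of Example 2.8.2
    rank(A)                                  % returns 2
    Q = orth(A);                             % 3-by-2 orthonormal basis of col(A)
    disp(A(:,3) - (-0.5*A(:,1) - A(:,2)))    % third column is a combination of the first two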

Naturally, we can also consider the subspace spanned by the columns of AT, or in other words, by the rows of A, and we denote this space by row(A).

Definition 2.42 The row space of a matrix Am×n is defined as the vector subspace of Rn spanned by the rows of A.


Example 2.8.3 For the matrix A of Example 2.8.1, we have seen that dim col(A) = 2. If we now denote the first, second and third rows of that matrix with r1, r2 and r3 respectively, then we can prove that r2 = −47 r1 − 13 r3; that is, r2 is a linear combination of the other two rows, which means that the three rows are not linearly independent. And in fact, we must have dim row(A) = 2. In this case, the row space of A is the set of linear combinations of the form

row(A) = c1 [1 −2] + c2 [−4 7],

where c1, c2 are arbitrary scalars. Thus, row(A) spans the whole space R2.

It is a very remarkable fact that the row space has the same dimension as the column space of a matrix A. More formally,

Theorem 2.43 Let A be an m × n matrix. Then,

rank (A) = dim col(A) = dim row(A). (2.44)

Proof. Denote the rows of A by r1, r2, . . . , rm, and let dim row(A) = k. Assume that B = {v1, . . . , vk} is a basis of row(A). Then, each row vector ri, i = 1, . . . , m can be written as

ri = ci1v1 + ci2v2 + · · · + cikvk,

for some scalars ci1, . . . , cik. But the j-th component of each row vector ri is nothing else than aij. Thus, with respect to each j-th component (j = 1, . . . , n), the above equation can be written as

aij = ci1v1j + ci2v2j + · · · + cikvkj, i = 1, . . . ,m

which is the same as

[a1j a2j · · · amj]T = v1j [c11 c21 · · · cm1]T + v2j [c12 c22 · · · cm2]T + · · · + vkj [c1k c2k · · · cmk]T,   j = 1, . . . , n.

This tells us that the columns of A are linear combinations of the k vectors [c1i c2i · · · cmi]T, i = 1, . . . , k. Therefore,

dim col(A) ≤ k = dim row(A).

Applying the ideas above to AT, we can also show that dim row(A) ≤ dim col(A), and therefore conclude that indeed dim col(A) = dim row(A).


We next state an important result on the rank of a product of matrices.

Theorem 2.44 Let A and B be two arbitrary matrices whose product AB is well defined. Then,

(a) rank(AB) ≤ rank(A), and rank(AB) ≤ rank(B).

(b) If B is nonsingular, then rank(AB) = rank(A).

If A is nonsingular, then rank(AB) = rank(B).

Proof. (a) If we denote the columns of B by b1, . . . , bn, then we have

AB = A[b1 · · · bn] = [Ab1 · · · Abn].

Now, recalling Remark 2.39, we observe that the columns Ab1, . . . , Abn of AB are linear combinations of the columns of A, and therefore we must have that rank(AB) ≤ rank(A). The second inequality is proved similarly using rows.

(b) If B is nonsingular, we have

rank(A) = rank(AB B−1) ≤ rank(AB).

Combining this with the first inequality in (a), we conclude that rank(AB) = rank(A). The proof of the last statement is similar.

We now define the null space of A, denoted as N(A).

Definition 2.45 The null space of a matrix Am×n is defined as the vector subspace of Rn:

N(A) = {x ∈ Rn : Ax = 0}. (2.45)

Example 2.8.4 Let A = [1 2 3; 2 4 6; 3 6 9]. From the definition, we see that the null space is the set {[x1 x2 x3]T : x1 + 2x2 + 3x3 = 0}. Geometrically, this is a plane (hence, N(A) is two-dimensional) passing through (0, 0, 0) and with normal vector [1 2 3]T. On the other hand, since the three columns of A are just multiples of each other, it is clear that col(A) is one-dimensional and is spanned by the same vector [1 2 3]T. So, for this particular example, col(A) and N(A) are orthogonal.

Note: It is very interesting to learn that in general, col(A) and N(AT) are orthogonal. In the example above, the matrix A is symmetric; that is why col(A) is also orthogonal to N(A). See Theorem 2.48 below and the discussion after it.

Even more, there are some orthogonality relationships between these subspaces, as the following theorem states.

Theorem 2.46 Let A be an arbitrary m × n matrix. Then,

col(A)⊥ = N(AT ), and col(AT )⊥ = N(A). (2.46)

Proof. Let v be any vector in col(A)⊥. For an arbitrary vector x ∈ Rn, we know that Ax ∈ col(A); then we have:

(Ax)T v = 0 ⇐⇒ xT (AT v) = 0 ⇐⇒ AT v = 0 ⇐⇒ v ∈ N(AT ).

This implies that in fact, col(A)⊥ = N(AT). The second part of the theorem is proved similarly.
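A quick numerical check of (2.46): in MATLAB, null(A') returns an orthonormal basis of N(AT), which should be orthogonal to col(A). A sketch with a rank-1 matrix of our choosing:

    A = [1 2; 2 4; 3 6];   % rank-1 example; col(A) is a line in R^3
    Q = orth(A);           % orthonormal basis of col(A)
    Z = null(A');          % orthonormal basis of N(A^T)
    disp(Q' * Z)           % ~zero: col(A) is orthogonal to N(A^T)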

In general, an arbitrary matrix Am×n generates the four subspaces: col(A), col(AT), N(A) and N(AT). These vector subspaces decompose Rm and Rn in direct sums, and their dimensions are related by closed formulas.

We can now state the Fundamental Theorem of Linear Algebra:

Theorem 2.47 Let Am×n be an arbitrary matrix. Then,

Rm = col(A) ⊕ N(AT),   Rn = col(AT) ⊕ N(A). (2.47)

Proof. The vector spaces col(A) and N(AT) are subspaces of Rm. Similarly, the vector spaces col(AT) and N(A) are subspaces of Rn. Now, to obtain the result, simply combine Theorems 2.36 and 2.46.


An immediate consequence of Theorems 2.43 and 2.47 is the following

Corollary 2.8.5 If Am×n is a matrix with rank(A) = r, then

dim col(A) = dim col(AT) = r,   dim N(A) = n − r,   dim N(AT) = m − r. (2.48)

At the same time, Corollary 2.8.5 implies the following important result.

Theorem 2.48 (Dimension Formula). Consider an m × n matrix A. Then,

dim col(A) + dim N(A) = n. (2.49)

Example 2.8.6 Let A = [1 4 7; 2 5 8; 3 6 9]. Observe that we can write the third column a3 as a combination of the first two a1, a2 as a3 = 2a2 − a1. And since these first two columns are linearly independent, this implies that col(A) is 2-dimensional. Geometrically then, col(A) is the plane −x + 2y − z = 0 passing through the origin, with normal vector [−1 2 −1]T. From Theorem 2.48 the null space N(A) is one-dimensional. In fact, solving the system Ax = 0, we observe that all solutions are of the form [x1 x2 x3]T, with x2 = −2x1, x3 = x1. That is, N(A) is a line spanned by the vector [−1 2 −1]T. Using this vector, we observe that N(A) is orthogonal to every row of A and therefore N(A) ⊥ col(AT), in agreement with Theorem 2.46, and finally, because N(A)⊥ = col(AT), we also have R3 = col(AT) ⊕ N(A), as in (2.47).

Example 2.8.7 Let A = [1 0 1 3 0; 0 1 1 2 0; 1 1 2 5 1; 0 0 0 0 0]. In this case the columns 1, 2 and 5 are linearly independent and they form a basis of col(A). That is, any vector in col(A) (in particular, the third and fourth columns of A) can be expressed as a unique combination of columns 1, 2 and 5. This means that geometrically, col(A) is a 3-dimensional hyperplane (in R4) spanned by those three columns, and therefore N(A) is a two-dimensional subspace of R5.


As stated before, one of the applications of column spaces, row spaces and null spaces is in the solution of linear systems. We can now put together the links between the solution of Ax = b and the vector spaces col(A) and N(A) in a theorem.

Theorem 2.49 Given an m × n matrix A, consider solving the system Ax = b. Then,

1. (Existence) The system has a solution if and only if b ∈ col(A).

2. (Uniqueness) The system has at most one solution for every b if and only if N(A) = {0}.

Proof. Existence: We have already seen that for arbitrary x, the vector Ax ∈ col(A). Then, for Ax = b to have a solution, b must lie in the same subspace.

Uniqueness: If N(A) ≠ {0}, then for b = 0 the system has, besides x = 0, another solution, contradicting uniqueness. On the other hand, assuming N(A) = {0}, if there is a b for which Ax = b has more than one solution, that is Ax1 = b and Ax2 = b, with x1 ≠ x2, then A(x1 − x2) = Ax1 − Ax2 = b − b = 0, which means that x1 − x2 = 0, or x1 = x2, a contradiction.

Remark 2.50 Observe that combining Theorem 2.48 with the uniqueness part of Theorem 2.49, we conclude that the system Ax = b has a unique solution if and only if dim col(A) = n, which means A has to be a full rank matrix, or, what is the same, all the columns of A must be linearly independent.

2.8.1 Linear transformations

Given two arbitrary vector spaces V and W we can define a function

T : V −→ W

such that for any two vectors u, v ∈ V and any scalar c, it satisfies

(a) T (u + v) = T (u) + T (v),

(b) T (cu) = c T (u).


Such a function T is called a linear transformation from V to W. In the case V = W, it is called a linear operator.

These transformations play an important role in the theory of linear algebra and we refer the reader to any standard textbook in linear algebra for a full discussion on general linear transformations. For completeness, here we just give a few remarks on how some terminology for a particular class of linear transformations is closely related to the theory introduced in this chapter.

More precisely, we consider linear transformations

T : Rn −→ Rm. (2.50)

Given a vector x ∈ Rn, the linear transformation T transforms such a vector x into a vector y ∈ Rm so that T(x) = y.

The important fact here is that any linear transformation of the form (2.50) has a matrix representation and, vice versa, any matrix Am×n corresponds to a linear transformation (2.50).

Example 2.8.8 Define

T (x1, x2, x3) = (2x1 − x2, 3x1 + 4x2 − x3).

We can write this transformation as T (x1, x2, x3) = (y1, y2), where

y1 = 2x1 − x2

y2 = 3x1 + 4x2 − x3

Or, in matrix-vector notation as

[y1; y2] = [2 −1 0; 3 4 −1] [x1; x2; x3].

That is, y = Ax.

Just as the transformation T, the matrix A takes a 3-dimensional vector x and via multiplication it transforms it into a 2-dimensional vector y.
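In MATLAB the transformation of Example 2.8.8 is nothing more than a matrix-vector product; a minimal sketch:

    A = [2 -1 0; 3 4 -1];    % matrix representation of T
    T = @(x) A * x;          % T : R^3 -> R^2 as an anonymous function
    y = T([1; 2; 3])         % returns [0; 8]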

Example 2.8.8 illustrates the fact that

To any linear transformation T : Rn → Rm, there corresponds a matrix Am×n.


Remark 2.51 The correspondence between a linear transformation and a matrix is true not only for the spaces Rn and Rm, but for any vector spaces V of dimension n and W of dimension m.

Since there is a correspondence between linear transformations and matrices, the same is true for some definitions in the two cases.

Given a linear transformation

T : V −→ W

we say that the vector space V is the domain and the vector space W is the codomain of T.

The set

R(T) = {y ∈ W : y = T(x), for some x ∈ V}

is called the range or image of T .

The set

Ker(T ) = {x ∈ V : T (x) = 0}

is called the kernel of T .

From these definitions we immediately see that the range of T is exactly the column space of the corresponding matrix A. Similarly, the kernel of T is exactly the null space of the matrix A.

Linear Transformations      Matrices
range                       column space
kernel                      null space

Accordingly, the rank of the linear transformation T is defined as the dimension of its range. The dimension of the kernel is called the nullity.

Some of the definitions and results on matrices we have introduced so far, and some others presented later, have their equivalent counterpart in the context of linear transformations. We speak of zero and identity linear transformations, similarity, eigenvalues, etc. For a linear transformation T : V → W, where V is a vector space of dimension n and W a vector space of dimension m, the dimension formula (2.49) now reads

dim R(T ) + dim Ker(T ) = n. (2.51)

Remark 2.52 The set of all linear operators on Rn, denoted by L(Rn), forms a normed vector space, with ‖T‖ = max_{‖x‖=1} ‖T(x)‖.

2.9 Orthogonal Projections

Definition 2.53 A projection matrix is a square matrix P such that P² = P. A projection matrix onto a subspace S is a projection matrix for which range(P) = S.

If P is a projection matrix onto a subspace S, then the condition P² = P says that for an arbitrary vector x, since Px is already in S, a new application of P, that is P(Px), should not move the vector Px at all: P(Px) = Px.

Example 2.9.1 Consider the vector subspace S of R2 to be the line y = x. Then, a projection matrix onto S is given by

P = [0 1; 0 1].

In fact, for an arbitrary vector v = [x y]T, we get Pv = [y y]T, which lies on S. Even more, it is easy to verify that P² = P.

Example 2.9.2 Let S be again the subspace of R2 determined by the line y = x, and let P = [1 1; 1 1]. Then, for any vector x = [x1 x2]T, the vector Px lies on S, that is, ran(P) = S. However, P is not a projection because P² ≠ P. In fact, if we take say x = [0 1]T, then (see Figure 2.7)

Px = [1 1]T but P²x = P(Px) = [2 2]T ≠ [1 1]T.



Figure 2.7: Example 2.9.2: P not a projection.

Example 2.9.3 Let S be the subspace of R3 spanned by u1 = [2 1 −1]T and u2 = [0 −1 1]T. Then the matrix

P = (1/4) [4 0 0; 2 1 1; −2 3 3]

is a projection matrix onto S. One can verify that for an arbitrary vector v ∈ R3, the vector Pv can be written as a linear combination of u1 and u2.

The projection matrices that have found the most applications are those that are also orthogonal.

Definition 2.54 An orthogonal projection matrix is a projection matrix P for which PT = P.

Example 2.9.4 The matrix

P = [1/2 1/2; 1/2 1/2]

is an orthogonal projection. It clearly satisfies P² = P and PT = P. It is in fact an orthogonal projection onto the subspace S determined by the line y = x (see Example 2.9.2).


Remark 2.55 It is important to note that an orthogonal projection matrix is not necessarily an orthogonal matrix. However, we will use orthogonal matrices to construct orthogonal projection matrices.

Explicit formulas for orthogonal projections. We want to see how to obtain an explicit expression for an orthogonal projection onto a subspace S, starting with the particular case when S = col(A).

Later we will see that for a general m × n matrix A, with m > n, the least squares solution to the system Ax = b is given by the solution of the so-called normal equations AT Ax = AT b. This is nothing else but a consequence of projecting b orthogonally onto col(A), as will be explained in detail in Chapter 5. Observe also that from the normal equations, if the columns of A are linearly independent, we can write (see Exercise 2.54)

x = (AT A)−1AT b.

The matrix

A† = (AT A)−1AT (2.52)

is called the pseudoinverse matrix of A.

Now, since Ax is in col(A), and Ax = A(AT A)−1AT b, then the matrix

P = A (AT A)−1AT (2.53)

has taken the vector b onto col(A), so we suspect such a matrix is an orthogonal projection onto col(A). In fact,

P² = A (AT A)−1AT A (AT A)−1AT = A (AT A)−1AT = P,

and since AT A is symmetric,

PT = A(AT A)−T AT = A(AT A)−1AT = P.

Example 2.9.5 Consider the matrix A = [2 1 0; 1 2 1; 0 1 2; 1 2 1]. Then, the matrix

P = A (AT A)−1AT = [1 0 0 0; 0 1/2 0 1/2; 0 0 1 0; 0 1/2 0 1/2]


will project an arbitrary vector in R4 orthogonally onto the col(A).

Remark 2.56 1) If the matrix A is square and nonsingular, then its pseudoinverse coincides with its inverse matrix. In fact, in this case, we have that A† = (AT A)−1AT = A−1A−T AT = A−1.

2) A† satisfies A†A = I and AA† = P.

• MATLAB command: pinv(A)

The definition (2.53) of the orthogonal projection onto col(A), although theoretically valid and illustrative, is not useful in practice, due especially to the fact that an inverse has to be computed, which is something that must always be avoided, if possible. A more practical, and by far more efficient, way to compute a projection matrix onto col(A) is through orthogonal matrices.

The idea is to compute an orthonormal basis {q1, . . . , qn} of col(A) and define the orthogonal matrix Q = [q1 · · · qn]. Then the projection is

P = Q QT . (2.54)

Even more, this approach is in fact true for a projection matrix onto any vector subspace S. That is, the orthogonal projection matrix onto a vector subspace S can be defined as in (2.54), where Q is a matrix whose columns form an orthonormal basis of S. See Exercise 2.62.
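Both constructions of the orthogonal projection onto col(A) are easy to compare numerically. A sketch using the matrix of Example 2.9.5 (note the backslash solve in place of an explicit inverse, in the spirit of the remark above):

    A  = [2 1 0; 1 2 1; 0 1 2; 1 2 1];   % matrix of Example 2.9.5
    P1 = A * ((A' * A) \ A');            % P = A (A'A)^{-1} A', via a solve
    Q  = orth(A);                        % orthonormal basis of col(A)
    P2 = Q * Q';                         % P = Q Q' as in (2.54)
    disp(norm(P1 - P2))                  % ~0: the two constructions agree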

Example 2.9.6 Let S be the subspace spanned by v1 = [1 2 3]T and v2 = [1 1 1]T. We can orthonormalize {v1, v2}, say by using Gram-Schmidt (2.40), to obtain the vectors q1 = (1/√14)[1 2 3]T, q2 = (1/√21)[4 1 −2]T. Then, we define the matrix Q = [q1 q2], with q1 and q2 as first and second column respectively, so that the projection matrix onto S is given by

P = Q QT = (1/6) [5 2 −1; 2 2 2; −1 2 5].

Thus, given any vector x ∈ R3, the vector Px will be in the subspace S.


Example 2.9.7 Consider the matrix A of Example 2.9.5. The vectors

v1 = [1 0 0 0]T, v2 = [0 1 0 1]T, v3 = [0 0 1 0]T

form a basis of col(A). They can be orthonormalized using Gram-Schmidt to get

q1 = [1 0 0 0]T, q2 = [0 1/√2 0 1/√2]T, q3 = [0 0 1 0]T.

Now define the matrix Q = [q1 q2 q3]. Then, the projection matrix onto col(A) is

P = Q QT = [1 0 0 0; 0 1/2 0 1/2; 0 0 1 0; 0 1/2 0 1/2],

which is exactly what we had obtained before by using P = A (AT A)−1AT.

Given an orthogonal projection P onto a subspace S, there is an obvious way of determining the orthogonal projection onto S⊥.

Theorem 2.57 If P is the orthogonal projection onto a subspace S, then I − P is the orthogonal projection onto S⊥.

Through these two orthogonal projection matrices, an arbitrary vector x ∈ Rn gets effectively decomposed into its orthogonal components. In other words,

x = Px + (I − P )x.

The orthogonality property can also be observed by multiplying both matrices:

P(I − P) = P − P² = P − P = 0, and (I − P)P = P − P² = P − P = 0. (2.55)

We will find this property very useful when we study dynamical systems in Chapter 8.

A projection matrix onto a subspace S is in general not unique, but there is only one orthogonal projection.

Theorem 2.58 Let V be a vector space and let S be a vector subspace of V. Then, the orthogonal projection matrix P onto S is unique.


Proof. Assume there are two orthogonal projections P1 and P2 onto the subspace S. Then, for any x ∈ V,

‖(P1 − P2)x‖² = ((P1x)T − (P2x)T)(P1x − P2x)
             = xT P1 x − (P1x)T P2 x − (P2x)T P1 x + xT P2 x
             = (P1x)T (I − P2)x + (P2x)T (I − P1)x.

But P1x, P2x ∈ S and (I − P1)x, (I − P2)x ∈ S⊥, so the last expression above is zero. Hence ‖(P1 − P2)x‖ = 0 for every x, and therefore P1 = P2.

2.10 Eigenvalues and Eigenvectors

Theorem 2.49 gives conditions for the existence and uniqueness of solutions of linear systems of equations of the form

Ax = b, (2.56)

where A is a general rectangular matrix. In particular, (2.56) has a solution if b is in col(A). For the special case when A is square, which is the case we are interested in, we ask instead for conditions under which the system has a unique solution. The answer can be given in terms of the singularity of the square matrix A.

According to Theorem 2.49, a system with a square coefficient matrix A of order n has a unique solution if dim N(A) = 0, or equivalently, dim col(A) = n; that is, all the columns of A must be linearly independent, and this is equivalent to saying that A is nonsingular. More precisely,

Theorem 2.59 Let A be a square matrix of order n. The system Ax = b has a unique solution if and only if the matrix A is nonsingular.

Remark 2.60 In particular, this theorem implies that if A is nonsingular, then the system Ax = 0 has the unique solution x = 0.


Two essential concepts associated with every square matrix are eigenvalues and eigenvectors; they contain important information about the matrix, some of its associated subspaces, and about the structure of problems and phenomena whose modeling contains such a matrix. There are numerous applications of eigenvalues and eigenvectors, within and outside mathematics, e.g. differential equations, control theory, Markov chains, web page ranking, image compression, etc. We will study in detail some of these applications later on.

Definition 2.61 Given a square matrix A of order n, we say that λ is an eigenvalue of A with associated eigenvector v ≠ 0 if

Av = λv. (2.57)

In general, eigenvalues are complex numbers, and eigenvectors are complex (n-dimensional) vectors, even when A is a real matrix.

One first and immediate geometric observation is that given an eigenvector v, equation (2.57) says that the vector Av is just a multiple of v, with larger or smaller magnitude. In fact, the line containing v is a one-dimensional vector space and Av lands on that same vector space. This is a simple instance of invariance, and we say that the space spanned by v is invariant with respect to A. We will return to this concept when we study matrix factorizations in Chapter 4 and dynamical systems in Chapter 8.

Observe that equation (2.57) can be written as

(A − λI)v = 0. (2.58)

Since we are looking for eigenvectors v ≠ 0, we need the matrix (A − λI) to be singular (see Remark 2.60). In other words, we need to require that

det(A − λI) = 0. (2.59)

Equation (2.59) is known as the characteristic equation. The left hand side is a polynomial in λ of degree n, and the solutions to this equation are the eigenvalues of A. On the other hand, for each given eigenvalue λ, the system (2.58) is used to find a corresponding eigenvector.

• MATLAB command: eig(A)
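With one output, eig returns the eigenvalues; with two, also a matrix of eigenvectors. A minimal sketch, using the matrix of Example 2.10.1 below:

    A = [1 -2; 0 3];
    lambda = eig(A)      % eigenvalues 1 and 3 (in some order)
    [V, D] = eig(A)      % columns of V are eigenvectors, D is diagonal
    disp(A*V - V*D)      % ~zero: Av = lambda*v, column by column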


Remark 2.62 Eigenvalues are uniquely determined as solutions of the characteristic equation (2.59), but eigenvectors are not unique, as they are solutions of the singular system (2.58).

Example 2.10.1 Let A = [1 −2; 0 3]. Then, the characteristic equation is

det [1 − λ  −2; 0  3 − λ] = (3 − λ)(1 − λ) = 0.

Then, the eigenvalues of A are λ1 = 3 and λ2 = 1. To find the corresponding eigenvectors we use equation (2.58) for each eigenvalue. First let us consider λ = 3. Then, equation (2.58) gives

[1 − 3  −2; 0  3 − 3] [v1; v2] = [−2 −2; 0 0] [v1; v2] = [0; 0],

from which v2 = −v1. Thus, any nonzero vector of the form v = [v1 −v1]T is an eigenvector of A corresponding to λ = 3. In particular, we may take v = [1 −1]T. In a similar way, any eigenvector associated with λ = 1 is of the form [v1 0]T, so in particular we may take v = [1 0]T.

Remark 2.63 In general, finding the eigenvalues of upper or lower triangular matrices like the one in Example 2.10.1 is immediate. The eigenvalues are nothing else but the diagonal entries.

Example 2.10.2 There is a particular matrix that appears very often, and its eigenvalues can be seen directly from the matrix itself:

If A = [a b; −b a], then λ1,2 = a ± i b. (2.60)

Example 2.10.3 Let A = [−2 −1 0; 1 −2 0; 0 0 3]. Then, the characteristic equation is

det [−2 − λ  −1  0; 1  −2 − λ  0; 0  0  3 − λ] = (3 − λ)(λ² + 4λ + 5) = 0,

or equivalently, λ³ + λ² − 7λ − 15 = 0.


The solution of this characteristic equation gives the eigenvalues of A: λ1,2 = −2 ± i and λ3 = 3. To find the corresponding eigenvectors, let us consider first λ1 = −2 + i. In this case, equation (2.58) gives

[−2 − (−2 + i)  −1  0; 1  −2 − (−2 + i)  0; 0  0  3 − (−2 + i)] [v1; v2; v3] = [−i −1 0; 1 −i 0; 0 0 5 − i] [v1; v2; v3] = [0; 0; 0].

From the first or second row, we get that v1 = i v2, and from the third row we get v3 = 0. Then, any eigenvector associated to λ1 is of the form [i v2  v2  0]T. Similarly, any eigenvector associated to λ2 is of the form [−i v2  v2  0]T. Finally, it is easy to see that any eigenvector associated to λ3 = 3 is of the form [0 0 v3]T, and in particular we can take w = [0 0 1]T.

In some applications, like in the solution of differential equations, it is necessary to extract real solutions by using eigenvectors (even if they are complex). In this case, using for instance the complex eigenvectors associated with λ1,2 = −2 ± i we can write

u ± i v := [0 1 0]T ± i [1 0 0]T.

In this way, we can obtain two (linearly independent) real vectors u and v that span a 2-dimensional subspace.

Remark 2.64 Using the fact that the determinant of any matrix is the same as that of its transpose, one can easily show that any matrix An×n and its transpose AT have the same eigenvalues. However, their eigenvectors are not necessarily the same.

The matrix A in Example 2.10.3 has a particular form. Observe that we can represent A as

A = [−2 −1 0; 1 −2 0; 0 0 3] = [A1 0; 0 A2].


Then, finding the eigenvalues of A reduces to finding the eigenvalues of A1 and A2 separately. In this case, according to (2.60), the eigenvalues of A1 are λ1,2 = −2 ± i, and the eigenvalue of A2 is λ3 = 3.

This is actually a general result in linear algebra to compute the eigenvalues of block diagonal matrices. Let us denote with σ(A) the set of eigenvalues of A (this set is known as the spectrum of A). Then, we have the following

Theorem 2.65 Let A = diag(A1, A2, . . . , Ak) be a block diagonal matrix. Then,

σ(A) = σ(A1) ∪ σ(A2) ∪ · · · ∪ σ(Ak).

A similar result applies if the matrix is block upper triangular. We have the following

Theorem 2.66 Consider the block upper triangular matrix

A = [B C; 0 D],

where B and D are square matrices of order say p and q respectively, for some positive integers p, q. Then, σ(A) = σ(B) ∪ σ(D), counting multiplicities.

Proof. Let λ be an eigenvalue of A with eigenvector v = [v1 v2]T, where v1 ∈ Rp and v2 ∈ Rq. Then,

Av = [B C; 0 D] [v1; v2] = λ [v1; v2].

We either have v2 = 0 or v2 ≠ 0. If v2 = 0, then v1 ≠ 0 and Bv1 = λv1, so that λ is an eigenvalue of B. If v2 ≠ 0, then Dv2 = λv2, so that λ is an eigenvalue of D. This proves that σ(A) ⊆ σ(B) ∪ σ(D). In addition, the two sets σ(A) and σ(B) ∪ σ(D) have the same cardinality. Therefore, they must be equal.


Remark. The result in Theorem 2.66 is also true if A is block lower triangular.

Example 2.10.4 The eigenvalues of A = [8 −2; 2 4] are λ = 6, 6, and the eigenvalues of A = [5 3 0; −3 5 0; 1 −2 7] are λ = 7, 5 ± 3i. Therefore, the eigenvalues of

A = [8 −2 3 5 8; 2 4 2 −1 4; 0 0 5 3 0; 0 0 −3 5 0; 0 0 1 −2 7]

are λ = 6, 6, 7, 5 ± 3i.
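The claim in Example 2.10.4 can be checked numerically; a sketch:

    B = [8 -2; 2 4];                  % eigenvalues 6, 6
    D = [5 3 0; -3 5 0; 1 -2 7];      % eigenvalues 7, 5 +/- 3i
    C = [3 5 8; 2 -1 4];              % the coupling block from the example
    A = [B C; zeros(3,2) D];          % block upper triangular, as in Theorem 2.66
    eig(A)                            % 6, 6, 7, 5+3i, 5-3i (in some order)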

We have seen that in general the eigenvalues of a matrix A can be real or complex. However, if the matrix is symmetric, then its eigenvalues are real. Even more, its eigenvectors are orthogonal! The proof of the following theorem is left as an exercise.

Theorem 2.67 Let An×n be a symmetric matrix. Then all its eigenvalues arereal, and eigenvectors corresponding to different eigenvalues are orthogonal.

Example 2.10.5 Let A = [3 −1 4; −1 0 1; 4 1 3]. Using the characteristic equation we find that the eigenvalues are λ1 = −2, λ2 = 1, and λ3 = 7. Let us compute the eigenvectors. For λ1 = −2, letting v = [x y z]T, equation (2.58) gives

5x − y + 4z = 0, −x + 2y + z = 0, 4x + y + 5z = 0.

From the second equation, z = x − 2y. Substituting this into the third equation we get x = y. Then, any eigenvector associated to λ1 = −2 is of the form u = [x x −x]T.

For λ2 = 1, equation (2.58) gives

2x − y + 4z = 0, −x − y + z = 0, 4x + y + 2z = 0.


From the second equation, z = x + y. Substituting this into the first equation, we get y = −2x. Then, any eigenvector associated to λ2 = 1 is of the form v = [x −2x −x]T.

For λ3 = 7, equation (2.58) gives

−4x − y + 4z = 0, −x − 7y + z = 0, 4x + y − 4z = 0.

From the second equation, z = x + 7y. Substituting this into the third equation, we get y = 0. Then, any eigenvector associated to λ3 = 7 is of the form w = [x 0 x]T. Now, it is simple to verify that in fact

uT v = 0, uT w = 0, vT w = 0,

which means that the eigenvectors are mutually orthogonal.

Note: In Example 2.10.5 a general fact is illustrated: since by definition, det(A − λI) = 0, in each set of three equations to determine the eigenvectors, each of the individual equations can be written in terms of the other two; that is why the solution is not unique (eigenvectors are not unique). In other words, in each of the sets of three equations, one is redundant and we can solve each system by using only two of its equations.

The next question is whether repeated eigenvalues of a general matrix can give linearly independent eigenvectors.

Example 2.10.6 Let A = [9 4 0; −6 −1 0; 6 4 3]. By solving the corresponding characteristic equation, the eigenvalues are λ1 = 5, λ2 = 3, λ3 = 3. Thus, we have repeated eigenvalues. The eigenvector corresponding to λ1 is w1 = [1 −1 1]T. To find the eigenvector(s) corresponding to λ2 = λ3, equation (2.58) with v = [x y z]T gives

6x + 4y = 0, −6x − 4y = 0, 6x + 4y = 0.

(The three equations are the same.) From the first equation we get x = −(2/3)y. The third entry z is free. Then, any eigenvector associated to λ2 = λ3 = 3 has the general form [−(2/3)y  y  z]T. We have two free variables, and in particular, by taking y = 0, z = 1 and then y = −3, z = 0 respectively, we get w2 = [0 0 1]T, w3 = [2 −3 0]T. Thus, even though there are repeated eigenvalues, we are still able to find a complete set of three linearly independent eigenvectors.


There is something very important to observe in Example 2.10.6. The eigenvectors w1, w2 and w3 not only are different but they are linearly independent. This is something that becomes very important in several applications, especially in the solution of differential equations. We will study this in more detail in Chapter 7.

In the examples above, we have seen that it was possible to find the same number of linearly independent eigenvectors and eigenvalues, even when the eigenvalues are repeated. However, this is not always the case. There are cases when repeated eigenvalues give a smaller number of linearly independent eigenvectors, so that it is not possible to form a complete set of linearly independent eigenvectors. In this case, the repeated eigenvalue is called defective, and we need to generalize the concept of eigenvector, so that we can complete our set of (generalized) eigenvectors.

More formally, if an eigenvalue λ is repeated k times, we say λ has algebraic multiplicity k; e.g. in Example 2.10.6 above, λ = 3 has algebraic multiplicity two. The number of linearly independent eigenvectors associated with an eigenvalue λ is called the geometric multiplicity of λ; e.g. in Example 2.10.6, the geometric multiplicity of λ = 3 is two, so in this particular case, algebraic and geometric multiplicity coincide. In general we have (see Exercise 4.60)

geometric multiplicity ≤ algebraic multiplicity

Thus, an eigenvalue is called defective if the above inequality is strict.

An eigenvalue with algebraic multiplicity 1 is called a simple eigenvalue.

For each eigenvalue λ of a matrix An×n, we can define a special set: the set of all the eigenvectors associated to it. If we add to this set the zero vector, then it becomes a subspace of Rn.

Definition 2.68 Let A be a square matrix of order n, and λ an eigenvalue of A. The set

E = {v ∈ Rn : Av = λv}

is called the eigenspace of A with respect to λ.

Several observations are in order.


1. We are considering all solutions of the equation Av = λv. Thus all eigenvectors of A associated to λ and also the zero vector are the elements of E.

2. The geometric multiplicity of λ is the dimension of its eigenspace E.

3. The linearly independent eigenvectors of A associated to λ form a basis of E.

4. The eigenspace of A can be defined as the nullspace of A − λI.

When the geometric multiplicity of an eigenvalue is strictly smaller than its algebraic multiplicity, that is, when the number of linearly independent eigenvectors is smaller than the number of times the corresponding eigenvalue is repeated, then the concept of an eigenvector needs to be generalized.

Definition 2.69 (Generalized Eigenvector) If λ is an eigenvalue of a matrix A, then a rank-r generalized eigenvector associated with λ is a vector v such that

(A − λI)^r v = 0, and (A − λI)^(r−1) v ≠ 0. (2.61)

Observe that a rank-1 generalized eigenvector is an ordinary eigenvector. That is,

(A − λI)v = 0, and v ≠ 0.

Example 2.10.7 Let A = [2 1 0; 0 2 1; 0 0 3]. Then, according to Remark 2.63, the eigenvalues are λ1,2 = 2, λ3 = 3. For λ = 2, equation (2.58) with v = [x y z]T gives y = 0, z = 0, and x is a free variable. This means that any eigenvector associated to λ1,2 = 2 is of the form [x 0 0]T. Thus, we can get only one (linearly independent) eigenvector. We need to find a generalized eigenvector associated to λ = 2. Following (2.61), the equation

(A − λI)² v = [0 0 1; 0 0 1; 0 0 1] [x; y; z] = [0; 0; 0]

gives z = 0. Thus, any rank-2 generalized eigenvector of A associated to λ1 = λ2 = 2 has the form [x y 0]T, with two free variables. In particular, we can take v1 = [1 0 0]T and v2 = [0 1 0]T. Observe that v1 is the eigenvector


that was determined before, and v2 is a generalized eigenvector. Even more, we can verify that in this case

(A − λI)v2 = v1 ≠ 0,

thus satisfying the definition in (2.61). Finally, solving (A − 3I)v = 0, we readily find that any eigenvector associated to λ3 = 3 is of the form [z z z]T, and in particular we can take v3 = [1 1 1]T.

We will see more about generalized eigenvectors in Chapter 7.

Left eigenvectors. It is possible to define eigenvectors that are actually row vectors. Even though the eigenvalues of A would be the same, these new eigenvectors are in general different from the (right) eigenvectors introduced before. Suppose λ is an eigenvalue of A with associated eigenvector x, so that Ax = λx. From Remark 2.64 we know that λ is also an eigenvalue of AT, for some eigenvector v, that is,

AT v = λv.

Applying the transpose operation to both sides of this equation, we obtain the following

Definition 2.70 We say a nonzero vector v ∈ Rn is a left eigenvector of a matrix An×n associated to an eigenvalue λ if

vT A = λvT . (2.62)

In other words, a left eigenvector of A associated to an eigenvalue λ is a (right) eigenvector of AT corresponding to the same eigenvalue λ.

Example 2.10.8 Let A = [5 4; 0 3]. For the eigenvalue λ = 3, any associated eigenvector x has the form x = [x1  −(1/2)x1]T, e.g. x = [2 −1]T. However, for the same eigenvalue λ = 3, any left eigenvector of A has the form v = [0 v2]T, e.g. v = [0 5]T. Thus, we have

[5 4; 0 3] [2; −1] = 3 [2; −1], and

[0 5] [5 4; 0 3] = 3 [0 5].
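Left eigenvectors are computed in MATLAB as (right) eigenvectors of AT, exactly as in Definition 2.70 (recent versions of eig can also return them directly as a third output). A sketch for Example 2.10.8:

    A = [5 4; 0 3];
    [W, D] = eig(A');                    % columns of W: left eigenvectors of A
    v = W(:, abs(diag(D) - 3) < 1e-8);   % left eigenvector for lambda = 3
    disp(v' * A - 3 * v')                % ~zero row vector: v'A = 3 v'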

2.11 Similarity

We have seen that several sets of vectors can be chosen as a basis of a given vector space, and that arbitrary vectors in that space get different representations depending on the basis being used. In a similar way, given an arbitrary matrix A, we would like to find a simpler matrix B that is related or is equivalent to A in the sense that it shares some of its properties (e.g. same eigenvalues). Although ideally we would like B to be diagonal (the simplest possible), for most applications it suffices if B is triangular, block diagonal or block triangular.

We start this topic with the following

Definition 2.71 Two square matrices A and B are said to be similar if there exists a nonsingular matrix P such that

B = P−1AP. (2.63)

Note: Equation (2.63) is sometimes written as B = P AP−1.

The following theorem states one of the most important applications of similarity.

Theorem 2.72 Similar matrices have the same eigenvalues, counting multiplicities.

Proof. The idea is to show that the characteristic polynomials of the similar matrices are equal, for if that is true, then they will have the same eigenvalues, counting multiplicities. Let A and B be similar. Then,

det(B − λI) = det(P−1AP − λI) = det(P−1AP − P−1λIP)
            = det(P−1(A − λI)P) = det(P−1) det(A − λI) det(P)
            = (1/det(P)) det(A − λI) det(P) = det(A − λI).


Example 2.11.1 The matrix A = [1 2; −1 3] is similar to B = [7 13; −2 −3]. In fact, you can verify that this is true by using the matrix P = [2 5; 1 3]. You can also verify that both A and B have the same eigenvalues λ = 2 ± i.

Ideally, we would like matrices to be similar to diagonal matrices, the simplest matrices possible. If this is true, the matrix has a special name.

Definition 2.73 A matrix An×n is called diagonalizable if it is similar to a diagonal matrix.

Given a matrix An×n, how can we find the matrix P that performs the similarity transformation (2.63)? One answer to this is the eigenvectors. If A has n linearly independent eigenvectors, then the columns of the matrix P can be defined as the eigenvectors of A. Such a matrix P is nonsingular because its columns are linearly independent. The existence of such a matrix P guarantees that A is diagonalizable. The following results come in handy.

Theorem 2.74 Let A be a square matrix of order n. Then,

A has distinct eigenvalues =⇒ A has n linearly independent eigenvectors ⇐⇒ A is diagonalizable

Proof. To prove the first part of the theorem, assume by contradiction that A has n distinct eigenvalues λ1, . . . , λn but that the corresponding set of eigenvectors V = {v1, . . . , vn} is not linearly independent. For some m < n, let {v1, . . . , vm} ⊂ V be the set with the maximum number of linearly independent eigenvectors of A. Then,

vm+1 = c1v1 + · · · + cmvm,

for some scalars c1, . . . , cm. Using the general equation (A − λI)v = 0 from (2.58) we have

0 = (A − λm+1I)vm+1 = (A − λm+1I)(c1v1 + · · · + cmvm)
  = (c1λ1v1 + · · · + cmλmvm) − (c1λm+1v1 + · · · + cmλm+1vm)
  = c1(λ1 − λm+1)v1 + · · · + cm(λm − λm+1)vm.


The linear independence of v1, . . . , vm implies that

c1(λ1 − λm+1) = · · · = cm(λm − λm+1) = 0,

and since the eigenvalues are distinct, we get c1 = · · · = cm = 0. But then vm+1 = 0, which is not possible because vm+1 is an eigenvector.

The proof of the second part is left as an exercise.

Example 2.11.2 Consider the matrix A = [1 2 1; 6 −1 0; −1 −2 −1]. The eigenvalues are λ1 = −4, λ2 = 3, λ3 = 0. The eigenvectors are v1 = [1 −2 −1]T, v2 = [2 3 −2]T, and v3 = [1 6 −13]T respectively, and they are linearly independent. Define

P = [v1 v2 v3] = [1 2 1; −2 3 6; −1 −2 −13]. Then,

P−1AP = [−4 0 0; 0 3 0; 0 0 0].

Thus, the matrix A is diagonalizable, and the diagonal matrix has the eigenvalues of A on its diagonal entries.
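The diagonalization of Example 2.11.2 takes one line to verify; a sketch (P\(A*P) is used instead of inv(P)*A*P, again avoiding an explicit inverse):

    A = [1 2 1; 6 -1 0; -1 -2 -1];
    P = [1 2 1; -2 3 6; -1 -2 -13];   % columns are the eigenvectors v1, v2, v3
    disp(P \ (A * P))                 % diag(-4, 3, 0), up to roundoff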

Invariant properties under similarity. We have seen that two similar matrices have the same eigenvalues. But not only eigenvalues are invariant under a similarity transformation. The following properties are also invariant (the proof of each is left as an exercise to the reader):

rank, determinant, trace, invertibility.

This means, e.g., that if A and B are similar matrices and A has rank k, then so does B, and if A is invertible, so is B.

We finish this section by mentioning some more properties and applications of the eigenvalues of a matrix A.


1. The determinant of a matrix A can be defined as the product of its eigenvalues. That is, if λ1, . . . , λn are the eigenvalues of A, then

det A = λ1 · λ2 · · · λn. (2.64)

2. From above, we see that A having one zero eigenvalue is necessary and sufficient to have a zero determinant. That is,

det A = 0 ⇐⇒ λi = 0, for some i = 1, . . . , n. (2.65)

3. The matrix norm ‖ · ‖2 given by (2.9) with p = 2 can also be defined through eigenvalues. More precisely, we compute the maximum eigenvalue of the symmetric matrix AT A. That is,

‖A‖2 = √(max λ(AT A)). (2.66)

2.12 Bezier Curves and Postscript Fonts

As we mentioned in Section 2.6, one powerful and useful concept in vector spaces is that of a basis. Given a vector space V of dimension n and a basis B = {v1, . . . , vn} of V, any element x ∈ V can be uniquely expressed as

x = c1v1 + · · · + cnvn, (2.67)

for some scalars c1, . . . , cn. We say that x is written as a linear combination of the elements of the basis B.

We want to see how this central concept in linear algebra can be used in a real-world application. Let V be the vector space of all real polynomials of degree at most n. This space has dimension n + 1, and one basis is given by the set

{1, t, t^2, . . . , t^n}.

Several other bases can be formed for this vector space, but here we consider one that is very useful in several practical applications. In particular, it has proved to be essential in the construction of special curves, such as those needed in defining letter fonts, the ones we use every day in text editing. The goal is to


construct the so-called Bezier curves, which are currently used in computer font rendering technologies such as Adobe PDF and Postscript.

The special basis we are interested in can be formed using the so-called Bernstein polynomials of degree n, which can be defined recursively as

B0,0(t) = 1,
Bi,n(t) = (1 − t)Bi,n−1(t) + t Bi−1,n−1(t),

where we take Bi,n(t) = 0, for i < 0 or i > n.

For example, the Bernstein polynomials of degrees 1, 2 and 3 are respectively

B0,1(t) = 1 − t, B1,1(t) = t;
B0,2(t) = (1 − t)^2, B1,2(t) = 2t(1 − t), B2,2(t) = t^2;
B0,3(t) = (1 − t)^3, B1,3(t) = 3t(1 − t)^2, B2,3(t) = 3t^2(1 − t), B3,3(t) = t^3. (2.68)

A much nicer way of obtaining the polynomials in (2.68) is just expanding 1. For instance, for the Bernstein cubic polynomials:

1^3 = [(1 − t) + t]^3 = (1 − t)^3 + 3t(1 − t)^2 + 3t^2(1 − t) + t^3.

Observe that each term in this expansion is one of the cubic Bernstein polynomials Bi,3, i = 0, . . . , 3. In general, we can write explicitly:

Bi,n(t) = (n choose i) t^i (1 − t)^(n−i),   i = 0, . . . , n.

Here we are mostly interested in cubic Bernstein polynomials; these are shown in Figure 2.8. They form a basis of the space of polynomials of degree at most three. Again, the most important fact is that any polynomial of degree at most three can be expressed as a unique linear combination of those four cubic Bernstein polynomials.

As an example, let P3(t) = −5t^3 + 15t^2 − 9t + 2. Then,

P3(t) = 2B0,3(t) − B1,3(t) + B2,3(t) + 3B3,3(t).

Now let us see the usefulness of these special polynomials. Assume you are given four points

Pi = (xi, yi), i = 0, . . . , 3,



Figure 2.8: The cubic Bernstein polynomials.

henceforth called control points, and that you are looking for a polynomial of degree at most three that passes through the first and last point, and whose graph roughly follows the shape of the polygonal path determined by the four points.

This is a problem quite different from that of interpolation, where the curve must pass through all the given points (a general Bezier curve always passes through the first and the last point, but not necessarily through the other points, which are mostly used to control the shape of the curve). The problem here has another application in mind: obtain an arbitrary plane curve and be able to modify its shape by redefining (moving) a few of the control points. This Bezier curve is to be defined as a polynomial of degree at most three, and following (2.67), it will be expressed as a combination of basis functions

C(t) = Σ_{i=0}^{3} Pi Bi,3(t),   t ∈ [0, 1], (2.69)

where Bi,3 are the cubic Bernstein polynomials, and Pi are the given control points.


For general linear combinations of the type (2.67), the main problem is to find the coefficients ci, but in the present case the coefficients Pi in (2.69) are the control points, which are always given.

An important fact is that we can express the Bezier curve (2.69) parametrically as

C(t) = (x(t), y(t)),

where

x(t) = Σ_{i=0}^{3} xi Bi,3(t),   y(t) = Σ_{i=0}^{3} yi Bi,3(t),   t ∈ [0, 1], (2.70)

with Pi = (xi, yi). Thus, a Bezier curve is a differentiable curve in the xy plane.

This is a very useful and illustrative application of the concept of basis of a vector space. Given four points Pi, i = 0, 1, 2, 3, we construct a smooth curve in parametric form (2.70), where for i = 0, 1, 2, 3, xi, yi are the given coordinates of the points and Bi,3 are the cubic Bernstein polynomials in (2.68), which form a basis for the space of polynomials of degree ≤ 3.
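A cubic Bezier curve takes only a few lines of MATLAB. The sketch below evaluates (2.70) on a grid of t values for the control points used in Example 2.12.1 below, and plots the curve together with its control polygon:

    P = [4 1; 3 2; 5 5; 7 3];       % control points P0..P3, one per row
    t = linspace(0, 1, 200)';
    B = [(1-t).^3, 3*t.*(1-t).^2, 3*t.^2.*(1-t), t.^3];   % cubic Bernstein basis (2.68)
    C = B * P;                      % points of the curve C(t)
    plot(C(:,1), C(:,2), P(:,1), P(:,2), 'o--')           % curve and control polygon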

2.12.1 Properties of Bezier curves

First and last point. An important property of the uniquely determined curve C(t) is that it will pass through the first and last point. In fact, using (2.70), we have for the first coordinate of the curve at t = 0 and t = 1:

x(0) = x0B0,3(0) + x1B1,3(0) + x2B2,3(0) + x3B3,3(0)

= x0 + 0 + 0 + 0 = x0, and

x(1) = x0B0,3(1) + x1B1,3(1) + x2B2,3(1) + x3B3,3(1)

= 0 + 0 + 0 + x3 = x3.

A similar result applies for the second coordinate y(t) of C(t) at t = 0 and t = 1.

Note: The fact that the curve passes through the first and the last points is true in general for Bernstein polynomials of arbitrary degree n, not only cubic.



Figure 2.9: Bezier curve of Example 2.12.1.

Example 2.12.1 As an illustration, let us find the Bezier curve for the control points P0 = (4, 1), P1 = (3, 2), P2 = (5, 5), and P3 = (7, 3).

Substituting the first and second coordinates of the given control points into (2.70) we have

x(t) = 4B0,3(t) + 3B1,3(t) + 5B2,3(t) + 7B3,3(t)
     = 4(1 − t)^3 + 9t(1 − t)^2 + 15t^2(1 − t) + 7t^3
     = 4 − 3t + 9t^2 − 3t^3.

Similarly

y(t) = 1B0,3(t) + 2B1,3(t) + 5B2,3(t) + 3B3,3(t)
     = (1 − t)^3 + 6t(1 − t)^2 + 15t^2(1 − t) + 3t^3
     = 1 + 3t + 6t^2 − 7t^3.

Thus,

C(t) = (4 − 3t + 9t^2 − 3t^3, 1 + 3t + 6t^2 − 7t^3), t ∈ [0, 1].

The curve and the control points are shown in Figure 2.9. Observe how the curve somehow follows the polygonal path determined by the control points.

Tangency property. The main reason why a Bezier curve roughly follows the shape determined by the control points and the segments joining them is that


the slope of the curve at P0 is the same as that of the segment joining P0 and P1. Similarly, the slope of the curve at P3 coincides with the slope of the segment joining P2 and P3. This is apparent in Figure 2.9.

In fact, with P0 = (x0, y0) and P1 = (x1, y1), the slope of the segment joining these two points is

m = (y1 − y0)/(x1 − x0).

Now let us find the slope of the Bezier curve at P0. In parametric form, the Bezier curve is

x(t) = (1 − t)^3 x0 + 3t(1 − t)^2 x1 + 3t^2(1 − t) x2 + t^3 x3
y(t) = (1 − t)^3 y0 + 3t(1 − t)^2 y1 + 3t^2(1 − t) y2 + t^3 y3. (2.71)

The corresponding derivatives are

dx/dt = −3(1 − t)^2 x0 + 3(1 − t)^2 x1 − 6t(1 − t) x1 + 6t(1 − t) x2 − 3t^2 x2 + 3t^2 x3,
dy/dt = −3(1 − t)^2 y0 + 3(1 − t)^2 y1 − 6t(1 − t) y1 + 6t(1 − t) y2 − 3t^2 y2 + 3t^2 y3.

Then, the slope of the curve at P0 is

dy/dx |_{t=0} = (dy/dt)/(dx/dt) |_{t=0} = (−3y0 + 3y1)/(−3x0 + 3x1) = (y1 − y0)/(x1 − x0) = m.

Similarly, it can be proved that the slope of the Bezier curve at P3 coincides with the slope of the segment joining P2 and P3.

Bezier curve and convex hull. The effectiveness and usefulness of Bezier curves lies in the great ease with which the shape of the curve can be modified, say by means of a mouse, by making adjustments to the control points. One good advantage is that the changes in the shape of the curve will be somewhat localized: the only term(s) in (2.71) that are modified are the ones involving the point(s) being moved.

Example 2.12.2 Suppose that we change (or move) the control points P1 and P2 of Example 2.12.1 to P1 = (3.7, 3) and P2 = (6.5, 4.5). Then, the new Bezier curve will be

C(t) = (4 − 0.9t + 9.3t^2 − 5.4t^3, 1 + 6t − 1.5t^2 − 2.5t^3), t ∈ [0, 1].


3 3.5 4 4.5 5 5.5 6 6.5 7 7.50

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

P0

P1

P2

P3

Figure 2.10: Modified Bezier curve.

In Figure 2.10 we show how, by moving these two control points, the shape of the curve has been modified accordingly, the new curve being pulled towards the new control points.
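The polynomials of both examples are easy to reproduce by expanding the Bernstein form into monomial coefficients. A small sketch (numpy assumed; bezier3_coeffs is our own helper name, not from the text):

```python
import numpy as np

def bezier3_coeffs(c0, c1, c2, c3):
    """Monomial coefficients [a0, a1, a2, a3] of
    c0(1-t)^3 + 3c1 t(1-t)^2 + 3c2 t^2(1-t) + c3 t^3."""
    return np.array([c0,
                     3*(c1 - c0),
                     3*(c0 - 2*c1 + c2),
                     -c0 + 3*c1 - 3*c2 + c3])

print(bezier3_coeffs(4, 3, 5, 7))      # x(t) of Example 2.12.1: [ 4, -3,  9,  -3 ]
print(bezier3_coeffs(4, 3.7, 6.5, 7))  # x(t) of Example 2.12.2: [ 4, -0.9, 9.3, -5.4 ]
print(bezier3_coeffs(1, 3.0, 4.5, 3))  # y(t) of Example 2.12.2: [ 1,  6, -1.5, -2.5 ]
```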

The fact that the Bezier curve follows the shape of the polygonal path determined by the control points and stays within the quadrilateral is more than just a geometrical fact or coincidence. There is a concrete linear algebra concept behind this.

Definition 2.75 The convex hull of a set S of n points S = {P0, . . . , Pn−1} is the smallest convex polygon that contains all the points in S.

This definition says that any given point in the convex hull of S lies either on the boundary of the polygon or in its interior. Intuitively, we can think of each point in S as a nail on a board; the convex hull is then the shape formed by a tight rubber band that surrounds all the nails. Two clear examples are given in Figures 2.9 and 2.10, where the convex hull of the control points is the quadrilateral given by the lines and its interior. The relationship between convex hulls and Bezier curves, apparent in both Figures 2.9 and 2.10, is that

A Bezier curve always lies in the convex hull of its set of control points.
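The boxed statement holds because the Bernstein weights form a partition of unity: at every t ∈ [0, 1] they are nonnegative and sum to 1, so C(t) is a convex combination of the control points. A short numeric confirmation (a sketch, numpy assumed):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1001)
B = np.stack([(1 - t)**3, 3*t*(1 - t)**2, 3*t**2*(1 - t), t**3])

print(B.min() >= 0)                     # True: no weight is ever negative on [0, 1]
print(np.allclose(B.sum(axis=0), 1.0))  # True: the four weights always sum to 1
```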


Figure 2.11: Midpoint property in Bezier curves.

Midpoint property. We already know that P0 = (x(0), y(0)) and that P3 = (x(1), y(1)). All other points of the Bezier curve are (x(t), y(t)), for some t ∈ (0, 1). One interesting question is whether we can characterize the point on the curve for which t = 1/2.

If we substitute t = 1/2 in the parametric equations (2.71) we obtain

x(1/2) = (1/8)x0 + (3/8)x1 + (3/8)x2 + (1/8)x3
y(1/2) = (1/8)y0 + (3/8)y1 + (3/8)y2 + (1/8)y3.

These are the coordinates of the point M on the curve. However, the interesting thing is that this same point can be obtained in a different way, one that is geometrically more intuitive (see Figure 2.11):

The point a is the midpoint between P0 and P1, the point b is the midpoint between P1 and P2, and the point c is the midpoint between P2 and P3. Similarly, the point d is the midpoint between a and b, and the point e is the midpoint between b and c. Finally, the sought point M = (x(1/2), y(1/2)) is the midpoint between d and e. We leave the verification of these statements as an exercise.

Even more, it can be shown that the part of the Bezier curve that goes from P0 to M can be defined using the control points P0, a, d and M. Similarly, the part of the Bezier curve that goes from M to P3 can be defined using the control points M, e, c, and P3.
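The construction is easy to replay numerically. A minimal sketch (numpy assumed), using the control points of Example 2.12.1:

```python
import numpy as np

P0, P1, P2, P3 = (np.array(p, dtype=float)
                  for p in [(4, 1), (3, 2), (5, 5), (7, 3)])

a = (P0 + P1) / 2   # midpoint of P0 P1
b = (P1 + P2) / 2   # midpoint of P1 P2
c = (P2 + P3) / 2   # midpoint of P2 P3
d = (a + b) / 2
e = (b + c) / 2
M = (d + e) / 2     # should equal (x(1/2), y(1/2))

# Direct evaluation at t = 1/2: C(1/2) = (P0 + 3P1 + 3P2 + P3)/8
print(M, (P0 + 3*P1 + 3*P2 + P3) / 8)   # both print [4.375 3.125]
```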

Bezier curve and center of mass. An interesting way of introducing Bezier curves is through the concept of center of mass of a set of point masses. Suppose we have four masses m0, m1, m2 and m3, located at points P0, P1, P2 and P3 respectively. Then, the center of mass of these four point masses is

P = (m0P0 + m1P1 + m2P2 + m3P3)/(m0 + m1 + m2 + m3).

Assume also that the masses are not constant but vary as functions of a parameter t according to the equations

m0 = (1 − t)³, m1 = 3t(1 − t)², m2 = 3t²(1 − t), m3 = t³,

for t ∈ [0, 1].

Since for any value of t we always have m0(t) + m1(t) + m2(t) + m3(t) = 1, the center of mass reduces to

P = m0P0 + m1P1 + m2P2 + m3P3.

Observe that for t = 0 the center of mass P is at P0, and for t = 1 it is located at P3. As t varies between 0 and 1, the center of mass P moves, describing a curve: a cubic Bezier curve. The masses described above are the cubic Bernstein polynomials.

2.12.2 Composite Bezier curves

In practical applications, curves will be more sophisticated than a single Bezier curve, but these sophisticated curves can be produced by using a sequence of Bezier curves that share common end points and are patched together, ensuring continuity of the final curve (but not necessarily differentiability). The curve obtained is a composite Bezier curve, also known as a Bezier spline.

Since a general Bezier curve always lies in the convex hull of its control points, oscillatory behavior will not be present. Also, changing the curve requires only local changes of some control points, minimizing in this way the total number of modifications.


One more observation is in order. When performing interpolation, say with cubic splines, the resulting curve is smooth, with continuous first and second derivatives, so that sharp corners are out of the question. Composite Bezier curves are more flexible: a sharp corner will be well defined, since only continuity is required (in some applications, such as creating postscript fonts, sharp corners are needed), and if we need smoothness at the points where two Bezier curves meet, it is sufficient to require (see Exercise 2.95) that the three control points involved (the point where they meet, the one before and the one after) be collinear. We illustrate these ideas with the following example.

Example 2.12.3 Let us find the composite Bezier curve for the following sets of control points:

{(−5.0, 0.5), (−12, 1.5), (−12, 4.5), (−5.0, 5.0)},
{(−5.0, 5.0), (−6.0, 3.0), (−2.0, 2.5), (−1.5, 5.0)},
{(−1.5, 5.0), (−1.5, 5.0), (−1.0, 4.4), (−1.0, 4.4)},
{(−1.0, 4.4), (−0.5, 4.6), (0.5, 4.6), (1.0, 4.4)},
{(1.0, 4.4), (1.0, 4.4), (1.5, 5.0), (1.5, 5.0)},
{(1.5, 5.0), (2.0, 2.5), (6.0, 3.0), (5.0, 5.0)},
{(5.0, 5.0), (12, 4.5), (12, 1.5), (5.0, 5.0)}.

Following the same process as in Example 2.12.1 to find one Bezier curve C(t), we start by finding the following seven Bezier curves corresponding to the seven sets of points above.

C1(t) = ( 21t² − 21t − 5, −4.5t³ + 6t² + 3t + 0.5 )
C2(t) = ( −8.5t³ + 15t² − 3t − 5, 1.5t³ + 4.5t² − 6t + 5 )
C3(t) = ( −t³ + 1.5t² − 1.5, 1.2t³ − 1.8t² + 5 )
C4(t) = ( 0.6t³ + 0.3t² + 2.7t − 1.4, −0.6t² + 0.6t + 4.4 )
C5(t) = ( −t³ + 1.5t² + 1, −1.2t³ + 1.8t² + 4.4 )
C6(t) = ( −8.5t³ + 10.5t² + 1.5t + 1.5, −1.5t³ + 9t² − 7.5t + 5 )
C7(t) = ( −21t² + 21t + 5, 9t³ − 7.5t² − 1.5t + 5 )

The resulting composite Bezier curve is shown in Figure 2.12 along with the control points. Observe that we can have sharp corners wherever needed. Also observe that if a particular Bezier curve is to be a line segment, then the corresponding four control points are listed with repetitions. For instance, for C3 we have the control points (−1.5, 5), (−1.5, 5), (−1, 4.4), (−1, 4.4).
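Evaluating a composite Bezier curve is just a loop over its segments; the shared endpoints give the continuity. A minimal sketch (numpy assumed), with the first two segments of this example spelled out:

```python
import numpy as np

def bezier3(P, t):
    """Evaluate one cubic segment with control points P at parameters t."""
    t = np.asarray(t, dtype=float)[:, None]
    B = np.hstack([(1 - t)**3, 3*t*(1 - t)**2, 3*t**2*(1 - t), t**3])
    return B @ np.asarray(P, dtype=float)

segments = [
    [(-5.0, 0.5), (-12, 1.5), (-12, 4.5), (-5.0, 5.0)],
    [(-5.0, 5.0), (-6.0, 3.0), (-2.0, 2.5), (-1.5, 5.0)],
    # ... the remaining five sets of control points listed above
]

t = np.linspace(0.0, 1.0, 100)
curve = np.vstack([bezier3(P, t) for P in segments])
# Each segment ends where the next begins, so 'curve' is continuous.
```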

Figure 2.12: Composite Bezier curve of Example 2.12.3.

One very important application of composite Bezier curves is the design of fonts. The polynomials used to create fonts may be quadratic or cubic (or even linear), depending on the application. TrueType typically uses quadratic composite Bezier curves, while Postscript uses cubic ones. In the next example we illustrate this real-world application of Bezier curves (and hence of Bernstein polynomials and bases of a vector space) in creating a postscript font.

Example 2.12.4 We can use the following set of points (listed without parentheses for convenience) to generate the Times Roman character R as the composition of 22 Bezier curves. See Figure 2.13.

0.00 5.70 0.00 5.70 0.00 5.55 0.00 5.55

0.00 5.55 0.60 5.55 0.80 5.35 0.80 4.80

0.80 4.80 0.80 4.80 0.80 0.90 0.80 0.90

0.80 0.90 0.80 0.30 0.60 0.15 0.00 0.15

0.00 0.15 0.00 0.15 0.00 0.00 0.00 0.00

0.00 0.00 0.00 0.00 2.40 0.00 2.40 0.00

2.40 0.00 2.40 0.00 2.40 0.15 2.40 0.15

2.40 0.15 1.80 0.15 1.65 0.30 1.65 0.90

1.65 0.90 1.65 0.90 1.65 2.60 1.65 2.60

1.65 2.60 1.65 2.60 2.65 2.62 2.65 2.62

2.65 2.62 2.65 2.62 4.10 0.00 4.10 0.00

4.10 0.00 4.10 0.00 5.50 0.00 5.50 0.00

5.50 0.00 5.50 0.00 5.50 0.15 5.50 0.15

5.50 0.15 5.15 0.15 4.95 0.20 4.76 0.50

4.76 0.50 4.80 0.50 3.50 2.74 3.50 2.74

3.50 2.74 5.38 3.00 5.10 5.70 3.05 5.70

3.05 5.70 3.05 5.70 0.00 5.70 0.00 5.70

1.65 3.00 1.65 3.00 1.65 5.00 1.65 5.00

1.65 5.00 1.65 5.30 1.75 5.35 1.84 5.35

1.84 5.35 1.84 5.35 2.30 5.35 2.30 5.35

2.30 5.35 4.40 5.20 4.20 3.00 2.40 3.00

2.40 3.00 2.40 3.00 1.65 3.00 1.65 3.00
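Reading each row of the table as one cubic segment (eight numbers giving four control points, an interpretation consistent with the 22 rows and 22 curves), the glyph can be traced in a few lines. A sketch (numpy assumed; only the first two rows are shown, the remaining twenty follow the same pattern):

```python
import numpy as np

rows = [
    "0.00 5.70 0.00 5.70 0.00 5.55 0.00 5.55",
    "0.00 5.55 0.60 5.55 0.80 5.35 0.80 4.80",
    # ... the remaining 20 rows of the table above
]

t = np.linspace(0.0, 1.0, 50)[:, None]
B = np.hstack([(1 - t)**3, 3*t*(1 - t)**2, 3*t**2*(1 - t), t**3])

# One (50, 2) array of curve points per row of the table
outline = [B @ np.array(r.split(), dtype=float).reshape(4, 2) for r in rows]
# Concatenating and plotting these segments (e.g., with matplotlib)
# reproduces the outline of the character R in Figure 2.13.
```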



Figure 2.13: Times Roman character as a composite Bezier curve.

2.13 Final Remarks and Further Reading

In this chapter we have introduced those concepts and techniques of linear algebra and matrix analysis that are needed for some of the applications covered in this book. In particular, in Section 2.12, concepts like linear combination, basis of a vector space, convex hull, etc., have been illustrated as very useful tools for the construction of Bezier curves and postscript fonts. More applications are presented in the following chapters.

Linear algebra and matrix analysis offer several other very interesting concepts and techniques not covered here that are also very important in applications. An extensive study of matrix norms and their properties, matrix computations and matrix analysis in general can be found in the classical books by Horn and Johnson [31] and Golub and Van Loan [21], the latter especially focused on the numerical and computational aspects of matrix algebra. An additional excellent reference for matrix computations, linear algebra and their applications is the book by Meyer [41].


2.14 Exercises

Exercise 2.1 Given two matrices A and B, show that

(A + B)T = AT + BT , and (AB)T = BT AT ,

whenever the sum or the product respectively are well defined.

Exercise 2.2 Show that for two matrices A and B whose product is well defined,

tr(AB) = tr(BA).

Exercise 2.3 Find a nonzero vector x = [x1 x2]T such that the sum, Euclidean and maximum norms all coincide.

Exercise 2.4 Show that for any n-dimensional vector x,

‖x‖∞ ≤ ‖x‖2 ≤ ‖x‖1.

Exercise 2.5 Prove the Cauchy-Schwarz inequality:

|xT y| ≤ ‖x‖2 ‖y‖2,

for any two vectors x, y ∈ Rn.

Exercise 2.6 Show that the Frobenius norm of a matrix is consistent, in the sense that

‖AB‖F ≤ ‖A‖F ‖B‖F ,

for any two matrices A,B whose product is well-defined.

Exercise 2.7 Let Am×n be a matrix such that ‖Ax‖2 ≤ ‖x‖2 for all vectors x ∈ Rn. Show that also ‖AT y‖2 ≤ ‖y‖2 for all vectors y ∈ Rm.

Exercise 2.8 Let A be an m × n matrix. Show that

‖Ax‖2 = ‖x‖2 for all x ∈ Rn if and only if AT A = I.


Exercise 2.9 Show that for any matrix norm induced from a vector norm as in (2.9), we have

‖AB‖ ≤ ‖A‖ ‖B‖, (2.72)

for any two matrices A and B for which the product is well-defined.

Exercise 2.10 Let

A = [ 3 −2  5
      0  8 −1
      2  2  7 ].

Find ‖A‖F using the definitions given in (2.6), (2.7) and (2.8).

Exercise 2.11 We may want to extend the vector norm ‖x‖ = max_i |xi| into a matrix norm by defining

‖A‖ = max_{1≤i,j≤n} |aij|.

However, this function cannot be considered a matrix norm because it does not satisfy the submultiplicative inequality (2.72). Provide a counterexample to confirm this statement.

Exercise 2.12 Show that for p = 1 and p = ∞, the p-norms in (2.9) coincide with the corresponding norms in (2.6).

Exercise 2.13 Show that the Frobenius norm in (2.6) can also be defined in the following two ways:

(a) ‖A‖F² = ‖a1‖2² + · · · + ‖an‖2².

(b) ‖A‖F² = tr(AT A).

Here, a1, . . . , an denote the columns of Am×n.

Exercise 2.14 Let A be an m × n matrix. Show that

‖A‖2 ≤ √( ‖A‖∞ ‖A‖1 ).

Exercise 2.15 Let An×n be an arbitrary matrix. We define its spectral radius as ρ(A) = max_{i=1,...,n} |λi|, where λi are the eigenvalues of A. Show that

ρ(A) ≤ ‖A‖,

for any matrix norm ‖ · ‖.


Exercise 2.16 Show that the product of two square lower triangular matrices is lower triangular.

Exercise 2.17 True or False? A symmetric Hessenberg matrix is tridiagonal, that is, a matrix whose nonzero elements lie on the main diagonal and on the diagonals directly below and above it.

Exercise 2.18 Show that the inverse of a nonsingular matrix An×n is unique.

Exercise 2.19 Let A be a nonsingular square matrix of order n. Show that if AX = I for some matrix Xn×n, then also XA = I.

Exercise 2.20 Show that if An×n is nonsingular, then AT is also nonsingular and

(AT )−1 = (A−1)T .

(We usually write this matrix as A−T ).

Exercise 2.21 Let the matrices A1, A2, . . . , Ak be nonsingular. Show that

(A1A2 · · · Ak)⁻¹ = Ak⁻¹ Ak−1⁻¹ · · · A1⁻¹.

Exercise 2.22 Show that a matrix An×n is nonsingular if there is a matrix norm such that ‖I − A‖ < 1. Then show that in such a case,

A−1 = ∑_{k=0}^{∞} (I − A)k.

Exercise 2.23 Let A be a square matrix of order n such that ‖I − A‖ < 1, for some matrix norm ‖ · ‖. Show that

‖A−1‖ ≤ 1/(1 − ‖I − A‖).

Exercise 2.24 Let An×n be an arbitrary matrix that can be expressed as A = M − N, for some matrices M and N with M nonsingular, and let B = M−1N. Show that if ‖B‖ < 1, for some matrix norm, then A is nonsingular.


Exercise 2.25 Let

A = [ 1 a a
      0 1 a
      0 0 1 ].

Show that its inverse is given by the matrix

A−1 = I + B + B2 + B3 + · · · ,

where B = I − A. Find A−1 explicitly.

Exercise 2.26 Let A be a square matrix of order n which is strictly diagonally dominant, that is,

|aii| > ∑_{j=1, j≠i}^{n} |aij|,  i = 1, 2, . . . , n.    (2.73)

Show that A is nonsingular.

Note: If the inequality in (2.73) is not strict, we say A is weakly diagonally dominant.

Exercise 2.27 Give an example of a matrix that is symmetric but not Hermitian, and an example of a matrix that is Hermitian but not symmetric.

Exercise 2.28 True or False? The diagonal entries of a Hermitian matrixmust be real.

Exercise 2.29 True or False? The sum of two symmetric matrices is a symmetric matrix.

Exercise 2.30 Verify that the matrices in Example 2.4.7 are positive definite, and that the matrices in Example 2.4.5 are not.

Exercise 2.31 Let

A = [ B C
      D E ]

be symmetric positive definite (spd), with B and E square. Show that B and E are also spd.

Exercise 2.32 Let A be a square matrix of order n. Show that

d/dt (eAt) = A eAt.


Exercise 2.33 Let An×n be nonsingular and assume that e−At → 0 as t → ∞. Show that

A−1 = ∫0^∞ e−At dt.

Exercise 2.34 Show that if a matrix Am×n has rank n, then there exists a matrix X such that XA = I. Is the matrix X unique? (X is called a left inverse of A.)

Exercise 2.35 Let v and q be two n-dimensional vectors, with ‖q‖ = 1. We define the projection of v along q as

Projq v = (vT q)q,

and therefore the component of v orthogonal to q is v − (vT q)q, just as in the Gram-Schmidt process. Let v = [−1 0 3]T and u = [1 0 2]T. Find the orthogonal projection of v along q = u/‖u‖.

Exercise 2.36 The vectors v1 = [−1 0 1]T, v2 = [0 −1 1]T and v3 = [1 1 0]T are linearly independent. Apply the Gram-Schmidt process to these vectors, but using the following dot product:

xT y := (1/3)x1y1 + 2x2y2 + x3y3.

Exercise 2.37 The same row and column permutation of the matrix A in Example 2.4.14 could have also been obtained by simply computing the product PAP. Is this true for any permutation matrix P?

Exercise 2.38 Show that the columns of an orthogonal matrix Qm×n are orthonormal vectors.

Exercise 2.39 Let Pn×n and Qn×n be two orthogonal matrices. Show that the matrix PQ is also orthogonal. Show however that the sum P + Q may fail to be orthogonal.

Exercise 2.40 Let x, y be two arbitrary vectors in Rn and Qn×n an orthogonal matrix. Let θ1 be the angle between x and y, and θ2 the angle between Qx and Qy. Show that

cos θ1 = cos θ2.


Exercise 2.41 Let Qm×n = [q1 q2 · · · qn] be an orthogonal matrix. What do you obtain when Gram-Schmidt is applied to the columns of Q?

Exercise 2.42 A matrix norm is called orthogonally invariant if for an arbitrary matrix Am×n and orthogonal matrices Q and U of appropriate dimensions, we have

‖QAUT ‖ = ‖A‖.

Show that the Frobenius and the 2-norm are orthogonally invariant.

Exercise 2.43 Show that the orthogonal invariance of a norm implies that multiplication by an orthogonal matrix does not magnify errors, in the following sense: Let An×n be an arbitrary matrix and Qn×n orthogonal. If an error E is introduced in A, then the error in QAQT has the same norm as E. Also show that for a general nonsingular matrix P, the error in PAP−1 is bounded by cond(P)‖E‖.

Exercise 2.44 Let An×n be a nonsingular matrix and b an arbitrary vector in Rn. Let x be the solution of Ax = b and suppose there is a perturbation δb of the vector b, so that the solution of the system is perturbed to x + δx, that is, A(x + δx) = b + δb. Show that

‖δx‖/‖x‖ ≤ cond(A) ‖δb‖/‖b‖.

Exercise 2.45 Let Qn×n be an orthogonal matrix. Prove that cond(Q) = 1.

Exercise 2.46 Let Q be an orthogonal matrix and A a nonsingular matrix. Show that

cond (QA) = cond (A).

Exercise 2.47 Show that no two vectors in R3 can span all of R3.

Exercise 2.48 Find the subspace spanned by the three vectors [2 3 1]T, [2 1 −5]T, and [2 4 4]T.


Exercise 2.49 Show that the set U in Example 2.5.6 is not a vector subspace of V.

Exercise 2.50 Let V be a vector space and S = {v1, . . . , vk} an arbitrary subset of V. Show that the set of all linear combinations of vectors from S is a subspace of V.

Exercise 2.51 Let B = {v1, . . . , vn} be a basis of a vector space V. Show that any subset of V containing more than n vectors is linearly dependent (not linearly independent).

Exercise 2.52 Let u and v be arbitrary linearly independent vectors in Rn. Show that for some values of the scalars c1 and c2, the vector x = c1u + c2v has both positive and negative components.

Exercise 2.53 Show that the set {f1 = 2x² + 1, f2 = x² + 4x, f3 = x² − 4x + 1} is linearly dependent.

Exercise 2.54 Show that if the columns of a matrix Am×n are linearly independent, then the matrix AT A has an inverse.

Exercise 2.55 Let S be a subspace of a vector space V . Show that

(S⊥)⊥ = S.

Exercise 2.56 Let B1 = {u1, . . . , un} and B2 = {w1, . . . , wn} be two orthonormal bases of Rn, and let a1, . . . , an and b1, . . . , bn be the coordinates of a vector x ∈ Rn in those bases respectively. Show that

a1² + · · · + an² = b1² + · · · + bn².

Exercise 2.57 Let

A = [ 4  0 1 −1 1
      0 −5 0  5 0
      0  0 3  4 1 ].

Find dim col(A) and dim N(A).


Exercise 2.58 Let A be an m × n matrix. Show that

N(AT A) = N(A) and rank(AT A) = rank(A).

Exercise 2.59 Let V be the vector space of square real matrices of order 2, and let

U = { [ 0 a ; 0 b ] : a, b ∈ R },   W = { [ 0 0 ; c d ] : c, d ∈ R },

where the semicolon separates matrix rows. Find dim(U + W) and verify formula (2.38).

Exercise 2.60 Consider the block matrix A = [A1 A2], for some matrices A1 and A2. Show that

col(A) = col(A1) + col(A2).

Exercise 2.61 Let P1 and P2 be orthogonal projection matrices onto two subspaces V1 and V2 of a vector space V. Show that

col(P1 + P2) = col(P1) + col(P2).

Exercise 2.62 Let the columns of a matrix Qm×n form an orthonormal basis of a vector subspace S. Show that

P = QQT (2.74)

is the orthogonal projection matrix onto S.

Exercise 2.63 Let A be a square matrix of order n, and λ an eigenvalue of A. Show that the set of all eigenvectors associated to λ, together with the zero vector, forms a subspace of Rn.

Exercise 2.64 Show that the determinant of a matrix An×n is given by the product of its eigenvalues, that is,

det A = λ1 · λ2 · · ·λn,

where some of the eigenvalues may be repeated.


Exercise 2.65 Consider the matrix

A = [ 2 1 0 3
      0 3 3 0
      4 1 0 1
      1 0 0 5 ].

Without any computation of the characteristic equation, determine one of the eigenvalues of A as well as an associated eigenvector.

Exercise 2.66 Suppose that the sum of the entries in each column of a square matrix A is 1. Show that 1 is an eigenvalue of AT.

Exercise 2.67 True or False? The eigenvalues of a positive definite matrix are positive numbers.

Exercise 2.68 Let u and v be right and left eigenvectors respectively of a matrix An×n for a given eigenvalue λ of A, with vT u = 1. Let Qn×(n−1) be an orthogonal matrix whose columns form a basis of v⊥. Show that the matrix P = [u Q] satisfies

P−1AP = [ λ 0
          0 B ],

for some square matrix B of order n − 1.

Exercise 2.69 Show by counterexample that the product of two symmetric matrices A and B is not necessarily symmetric. Show however that if A and B commute, then the product AB is also symmetric.

Exercise 2.70 True or False? The inverse of a nonsingular symmetric matrix is also symmetric.

Exercise 2.71 Let An×n be a symmetric matrix. Show that all its eigenvalues are real, and that eigenvectors corresponding to distinct eigenvalues are orthogonal.

Exercise 2.72 Let U = V ⊕ W, and let B1, B2 be bases of V and W respectively. Prove that B1 ∩ B2 = ∅ and that B1 ∪ B2 is a basis for U.


Exercise 2.73 For a matrix An×n define r = Av − λv ≠ 0, where v is a unit vector (we say v is an approximate eigenvector of A, with associated eigenvalue λ). Show that

(A + δA)v = λv,

where δA = −rvT .

Exercise 2.74 Suppose that An×n has n distinct eigenvalues with associated eigenvectors v1, . . . , vn and left eigenvectors w1, . . . , wn. Show that wi^T vj = 0 if i ≠ j, and wi^T vj ≠ 0 if i = j.

Exercise 2.75 (Gershgorin) Let A be a square matrix of order n, and for some nonsingular matrix P, let P−1AP = D + E, with D = diag(d1, . . . , dn) and E having zero diagonal entries. Show that

σ(A) ⊆ ∪_{i=1}^{n} Di,

where Di = { z ∈ C : |z − di| ≤ ∑_{j=1}^{n} |eij| }.

Hint: Show that (D − λI) + E is singular, and use Exercise 2.23 with p = ∞.

Exercise 2.76 Let D = diag(d1, . . . , dn) with all di distinct. Consider a scalar c ≠ 0 and assume that all entries of a vector u ∈ Rn are nonzero. Define the matrix

A = D + cuuT.

Show that D and A do not have any common eigenvalues.

Exercise 2.77 Let S and K represent the subspaces of symmetric and skew-symmetric n × n matrices respectively. Show that

Rn×n = S ⊕ K.

Hint: For any square matrix A of order n, the matrix A + AT is symmetric and A − AT is skew-symmetric.

Exercise 2.78 Let An×n be a symmetric matrix. Show that the matrix iA is skew-Hermitian.


Exercise 2.79 Let An×n be a skew-symmetric matrix. Show that A is singular when n is odd.

Exercise 2.80 True or False? If A is a skew-symmetric matrix, then eA is orthogonal.

Exercise 2.81 Let Pn×n be a projection matrix. Show that

col(P ) ⊕ N(P ) = Rn.

Exercise 2.82 Let P be a projection matrix on a vector space V. Show that for x ∈ V,

x ∈ col(P ) if and only if x = Px.

Exercise 2.83 Let x, y be two vectors in Rn such that yT x = 1. Show that

P = xyT is a projection matrix and that

‖P‖2 = ‖x‖2 ‖y‖2.

Exercise 2.84 Show that if P is a projection matrix, then each eigenvalue of P is either 0 or 1.

Exercise 2.85 Let u and v be right and left eigenvectors respectively of a matrix An×n, associated to a simple eigenvalue λ of A. Show that

P = (uvT)/(vT u)

is a projection matrix onto N(A − λI).

Exercise 2.86 Show that if two square matrices A and B are similar, then they must have the same rank.

Hint. Use Theorem 2.44.

Exercise 2.87 Let A and B be similar matrices. Show that for any positive integer k,

Bk = P−1AkP,

for some nonsingular matrix P .


Exercise 2.88 True or False? Similar matrices have the same eigenvectors.

Exercise 2.89 Let A be a square matrix of order n. Show that A has a complete set of n linearly independent eigenvectors if and only if A is diagonalizable.

Exercise 2.90 Show that two matrices A and B can be simultaneously diagonalized if and only if AB = BA.

Exercise 2.91 Let An×n be a symmetric matrix. Show that the algebraic multiplicity of any eigenvalue of A is equal to its geometric multiplicity.

Exercise 2.92 Let A and B be diagonalizable matrices such that AB = BA. Show that

eA+B = eAeB .

Exercise 2.93 Show that the matrix norm ‖ · ‖2 in (2.9) with p = 2 can also be defined as

‖A‖2 = √( λmax(AT A) ).


Figure 2.14: Control points for Exercise 2.97.


Exercise 2.94 Is it possible to use quadratic Bernstein polynomials and still satisfy the tangential properties at both endpoints of a Bezier curve?

Exercise 2.95 Show that to guarantee smoothness at a point Pi where two Bezier curves meet, the three points Pi−1, Pi, Pi+1 must be collinear.

Exercise 2.96 Verify the midpoint property of Bezier curves, that is, that the point M = (x(1/2), y(1/2)) can be obtained by first computing five other midpoints. Refer to Figure 2.11.

Exercise 2.97 Sketch the Bezier curves corresponding to the control points in Figure 2.14.

Exercise 2.98 Show that the slope of the Bezier curve at the midpoint M coincides with the slope of the segment joining d and M.

Exercise 2.99 True or False? It is possible for the control points P1 and P2 to lie on opposite sides of the corresponding Bezier curve.

Exercise 2.100 True or False? It is not possible to draw an ellipse with one single Bezier curve.