
    Essential Linear Algebra:

    The (Mostly) Real Story

    Ethan Bolker and Catalin Zara

    Version: 0.04

    July 24, 2013


    Contents

Chapter 1. The Linear Space Rn
    1.1. The linear space Rn
    1.2. Linear decompositions
    1.3. Planes in space

Chapter 2. Linear Systems
    2.1. Linear systems
    2.2. Row operations
    2.3. Echelon forms
    2.4. Application: flow patterns
    2.5. Solution sets

Chapter 3. Matrix Algebra
    3.1. Product of matrices
    3.2. Inverse Matrix
    3.3. Linear Systems and Inverses
    3.4. Elementary matrices
    3.5. Linear Operations on Matrices

Chapter 4. Fundamental Forms
    4.1. Fundamental Forms
    4.2. Rank of a Matrix
    4.3. Data Compression

Chapter 5. Linear Subspaces
    5.1. Column Space and Null Space
    5.2. Linear Subspaces
    5.3. Bases
    5.4. Systems of Generators
    5.5. Linear Independence
    5.6. Construction of Bases
    5.7. Coordinates in a Basis
    5.8. Change of Coordinates
    5.9. Dimension of a Subspace

Index


    CHAPTER 1

    The Linear Space Rn

    1.1. The linear space Rn

    The set Rn is the set of all lists of n real numbers:

    (1.1) Rn = {(x1, x2, . . . , xn) | x1, x2, . . . , xn real numbers} .

    In particular,

    R2 = {(x1, x2) | x1, x2 real numbers} = {(x, y) | x, y real numbers}

    and

    R3 = {(x1, x2, x3) | x1, x2, x3 real numbers} = {(x,y,z) | x,y,z real numbers}

When regarded as an element of Rn, such a list will be called a vector. Lists are used to represent, for example, points in Rn. Consider, for example, the space R3, with the intuitive Oxyz Cartesian coordinate system.

The origin of the system is the point O, which is identified with the list of its coordinates, (0, 0, 0). The point A obtained by moving one unit along Ox, then two units parallel to Oy, and then one unit parallel to Oz is identified with the list (1, 2, 1). The point B that corresponds to the list (-2, 1, 3) is reached by moving two units parallel to Ox, but in the opposite direction, then one unit parallel to Oy and then three units parallel to Oz.

The displacement vector from the point A(1, 2, 1) to the point B(-2, 1, 3) tells us how to reach B if we start at A: we can do that by moving 3 - 1 = 2 units in the direction Oz, then 1 - 2 = -1 units in the direction Oy (hence one unit parallel to Oy but in the opposite direction), and then (-2) - 1 = -3 units in the direction Ox. Therefore the displacement vector is

(1.2) AB = (-3, -1, 2) .

A displacement vector from O(0, 0, 0) is special and is called a position vector. The position vector OA is the vector (1, 2, 1) and the position vector OB is the vector (-2, 1, 3). Since the points A and B are identified by the same lists as the position vectors OA and OB, we will informally identify points with their position vectors.


An alternative notation for a vector X = (x1, x2, . . . , xn) in Rn is the column notation

(1.3) X = [x1]
          [x2]
          [..]
          [xn]

There are several operations on Rn that are important in linear algebra:

Addition of vectors:

(1.4) (x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn) .

Scaling of vectors:

(1.5) c (x1, . . . , xn) = (c x1, . . . , c xn) .

Dot product:

(1.6) (x1, x2, . . . , xn) · (y1, y2, . . . , yn) = x1 y1 + x2 y2 + · · · + xn yn .

The first two operations (addition and scaling of vectors) are essential for linear algebra; they define the linear structure of Rn. The third operation (dot product) is related to the metric structure of Rn; we will have a lot more to say about it later.

    Addition can be extended to any finite number of vectors. For example

    (1.7) u + v + w + q = (((u + v) + w) + q) .

    The addition of vectors is associative

    (1.8) (u + v) + w = u + (v + w) .

    and commutative

    (1.9) u + v = v + u .

An immediate consequence of these properties is that the sum of vectors does not depend on the order in which we add or group them:

u + v + w + q = (((u + v) + w) + q) = ((q + u) + (w + v)) .

We can combine the two operations on any finite number of vectors. Let c1, c2, . . . , ck be real numbers (also called scalars) and u1, u2, . . . , uk be vectors in some Rn. The vector

(1.10) w = c1 u1 + c2 u2 + · · · + ck uk

is called the linear combination of the vectors u1, u2, . . . , uk with scalars c1, c2, . . . , ck. One can also think of w as a weighted sum of u1, . . . , uk, with weights c1, . . . , ck.

Example 1.1. Let u1 = (1, 2, 1) and u2 = (-2, 1, 3). Then

3u1 - 2u2 = 3(1, 2, 1) - 2(-2, 1, 3) = (3, 6, 3) - (-4, 2, 6) = (7, 4, -3)


or, in the column notation,

3u1 - 2u2 = 3 [1]  - 2 [-2]  =  [3]  -  [-4]  =  [ 7]
              [2]      [ 1]     [6]     [ 2]     [ 4]
              [1]      [ 3]     [3]     [ 6]     [-3] .

If x1 and x2 are scalars, then

x1 u1 + x2 u2 = x1 [1]  + x2 [-2]  =  [ x1]  +  [-2x2]  =  [ x1 - 2x2]
                   [2]       [ 1]     [2x1]     [  x2]     [2x1 +  x2]
                   [1]       [ 3]     [ x1]     [ 3x2]     [ x1 + 3x2] .
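In computational terms, scaling and adding vectors are componentwise array operations. The following minimal sketch (assuming the numpy library; it is an illustration, not part of the text) reproduces the computation of Example 1.1:

    import numpy as np

    u1 = np.array([1, 2, 1])
    u2 = np.array([-2, 1, 3])

    # The linear combination 3*u1 - 2*u2, computed componentwise.
    w = 3 * u1 - 2 * u2
    print(w)  # [ 7  4 -3], matching Example 1.1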

    1.2. Linear decompositions

The reverse operation is called linear decomposition along given vectors: given vectors w, u1, . . . , uk in Rn, find scalars c1, . . . , ck (if possible) such that

w = c1 u1 + c2 u2 + · · · + ck uk .

Let X = (x1, x2) be a vector in R2. Then

(1.11) X = [x1]  =  [x1]  +  [ 0]  =  x1 [1]  +  x2 [0]  =  x1 e1 + x2 e2 ,
           [x2]     [ 0]     [x2]        [0]        [1]

where e1 = (1, 0) and e2 = (0, 1). Therefore every vector X in R2 can be written (in a unique way) as a linear combination of e1 = (1, 0) and e2 = (0, 1).

We consider now a slightly more complicated example. Let u1 = (1, 2, 1) and u2 = (-2, 1, 3), as in Example 1.1, and w = (1, 4, 3). To decompose w along u1 and u2 we need to write w as a linear combination of u1 and u2, that is, to find scalars x1 and x2 such that w = x1 u1 + x2 u2. This condition is

(1.12) [1]  = w = x1 u1 + x2 u2 = [ x1 - 2x2]
       [4]                        [2x1 +  x2]
       [3]                        [ x1 + 3x2] ,

hence the scalars x1 and x2 must be solutions of the system of equations

(1.13)   x1 - 2x2 = 1
        2x1 +  x2 = 4
         x1 + 3x2 = 3 .

There are many ways of solving this system of equations; for example, if we subtract the first equation from the third and twice the first equation from the second we get

(1.14)   x1 - 2x2 = 1
              5x2 = 2
              5x2 = 2 ,


and we can determine x2 both from the second and the third equations, by dividing the equations by 5:

(1.15)   x1 - 2x2 = 1
               x2 = 2/5
               x2 = 2/5 .

Fortunately we got the same value for x2, and if we replace it in the first equation we end up with x1 = 9/5 and x2 = 2/5. Therefore there is a unique way of writing w = (1, 4, 3) as a linear combination of u1 = (1, 2, 1) and u2 = (-2, 1, 3):

(1.16) [1]  =  (9/5) [1]  +  (2/5) [-2]
       [4]           [2]           [ 1]
       [3]           [1]           [ 3] .

We might not always be so fortunate to find the same value for x2: for example, if we try to decompose (1, 4, 4) along the same u1 and u2, then the system of equations is

(1.17)   x1 - 2x2 = 1
        2x1 +  x2 = 4
         x1 + 3x2 = 4 ,

and after the first step we get

(1.18)   x1 - 2x2 = 1
              5x2 = 2
              5x2 = 3 .

This system of equations does not have any solutions, so it is not possible to write (1, 4, 4) as a linear combination of (1, 2, 1) and (-2, 1, 3).

Some vectors can be written as linear combinations of u1 and u2, and some can not, depending on their components. To determine the condition(s) that the components x, y, z of w = (x, y, z) must satisfy for w to be a linear combination of u1 and u2, we need to determine the condition(s) that x, y, z should satisfy for the system of equations

(1.19)   x1 - 2x2 = x
        2x1 +  x2 = y
         x1 + 3x2 = z

with unknowns x1 and x2 to have a solution. If we subtract the first equation from the third and twice the first equation from the second we get

(1.20)   x1 - 2x2 = x
              5x2 = y - 2x
              5x2 = z - x ;


if we now subtract the second equation from the third we get

(1.21)   x1 - 2x2 = x
              5x2 = y - 2x
                0 = x - y + z .

If x - y + z ≠ 0, then the last equality is impossible, hence the system does not have any solutions. If x - y + z = 0, then the last equality is automatically true and does not impose any conditions on x1 and x2. We can then determine x2 = (y - 2x)/5 from the second equation, and then x1 = x + 2(y - 2x)/5 = (x + 2y)/5 from the first equation.

To conclude: if x - y + z ≠ 0, then (x, y, z) can not be written as a linear combination of (1, 2, 1) and (-2, 1, 3), and if x - y + z = 0 then (x, y, z) can be written in a unique way as a linear combination of (1, 2, 1) and (-2, 1, 3).
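The computation above can be automated: a least-squares solver applied to the overdetermined system (1.19) recovers the coefficients when they exist and exposes inconsistency otherwise. A sketch assuming numpy (the setup below is illustrative, not the authors' code):

    import numpy as np

    u1 = np.array([1, 2, 1])
    u2 = np.array([-2, 1, 3])
    U = np.column_stack([u1, u2])   # the 3-by-2 coefficient matrix of (1.19)

    for w in (np.array([1, 4, 3]), np.array([1, 4, 4])):
        # Least-squares solution of U c = w; w is a linear combination
        # of u1 and u2 exactly when the fit is exact.
        c, residual, rank, _ = np.linalg.lstsq(U, w, rcond=None)
        consistent = np.allclose(U @ c, w)
        print(w, c, consistent)
    # (1, 4, 3): c = (9/5, 2/5), consistent = True
    # (1, 4, 4): consistent = False, since 1 - 4 + 4 != 0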

    1.3. Planes in space

Geometrically, the vectors (x, y, z) in R3 that can be written as linear combinations of (1, 2, 1) and (-2, 1, 3) are position vectors for the points of the plane passing through O(0, 0, 0), A(1, 2, 1), and B(-2, 1, 3). A point M(x, y, z) is on that plane if and only if x - y + z = 0; we say that

(1.22) x - y + z = 0

is the implicit equation of the plane OAB. This equation can be written as

(1.23) (1, -1, 1) · (x, y, z) = 0 .

To understand the geometric meaning of the vector (1, -1, 1) we need to interpret what it means that the dot product of two vectors is zero. For that, consider the points X(x1, x2, x3) and Y(y1, y2, y3) in R3. The square of the Euclidean distance between these two points is

|XY|^2 = (x1 - y1)^2 + (x2 - y2)^2 + (x3 - y3)^2
       = (x1^2 + x2^2 + x3^2) + (y1^2 + y2^2 + y3^2) - 2(x1 y1 + x2 y2 + x3 y3)
       = |OX|^2 + |OY|^2 - 2 OX · OY .

Therefore

OX · OY = 0  ⟺  |XY|^2 = |OX|^2 + |OY|^2 ,

and by the Pythagorean Theorem, that is equivalent to OX and OY being perpendicular to each other.

Therefore the vectors in the plane OAB are the vectors perpendicular to (1, -1, 1); we say that (1, -1, 1) is a normal vector for the plane (OAB).

Extending this concept to Rn, we will say that two vectors X and Y in Rn are orthogonal vectors if and only if their dot product is zero: X · Y = 0.
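As a quick illustration (a numpy sketch, not part of the text), the position vectors of A and B are indeed orthogonal to the normal vector (1, -1, 1):

    import numpy as np

    n = np.array([1, -1, 1])     # normal vector of the plane OAB
    OA = np.array([1, 2, 1])
    OB = np.array([-2, 1, 3])

    # Vectors in the plane OAB have zero dot product with the normal.
    print(np.dot(n, OA), np.dot(n, OB))   # 0 0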

Suppose now that we did not know the geometric description of the set of points (x, y, z) in R3 satisfying the condition x - y + z = 0.

We can solve for x in terms of y and z and get x = y - z. Therefore, for every pair of values y = s and z = t, the point (s - t, s, t) is the only point


with the given second and third coordinates that is in the set of equation x - y + z = 0. Hence a position vector w = (x, y, z) corresponds to a point in the set of equation x - y + z = 0 if and only if there exist scalars s and t such that (x, y, z) = s(1, 1, 0) + t(-1, 0, 1) or, equivalently, if

(1.24)  x = s - t
        y = s
        z = t

for some scalars s and t. The equations above are parametric equations of the plane OAB. If instead we use the vectors (1, 2, 1) and (-2, 1, 3), we get a different set of parametric equations for the same plane (OAB):

(1.25)  x = s - 2t
        y = 2s + t        s, t real numbers .
        z = s + 3t

What if the right hand side of the equation is not zero? For example, what is the geometric description of the set of points (x, y, z) in R3 such that

(1.26) x - y + z = 2 ?

Again, if we solve for x and use y and z as parameters s and t, then

(1.27)  x = 2 + s - t
        y = s
        z = t

hence

(1.28) [x]  =  [2]  + s [1]  + t [-1]    ⟺    [x - 2]  =  s [1]  + t [-1]
       [y]     [0]      [1]      [ 0]         [  y  ]       [1]      [ 0]
       [z]     [0]      [0]      [ 1]         [  z  ]       [0]      [ 1]

Hence a point (x, y, z) is in the set x - y + z = 2 if and only if (x, y, z) - (2, 0, 0) is on the plane x - y + z = 0. Therefore the set x - y + z = 2 is a plane parallel to the plane (OAB) and passing through the point (2, 0, 0).

What if we want to determine equations for the plane passing through three points Q(2, -3, 1), R(3, -1, 2), and S(0, -2, 4)? This plane passes through Q(2, -3, 1) and is parallel to the directions QR = (1, 2, 1) and QS = (-2, 1, 3). Therefore, (x, y, z) is on the plane (QRS) if and only if (x - 2, y + 3, z - 1) is on the plane (OAB), hence if and only if

(1.29) (x - 2) - (y + 3) + (z - 1) = 0  ⟺  x - y + z = 6 .

Suppose now that we want to determine the intersection of two planes, for example, the intersection of the planes x + y - z = 1 and 2x + 2y - 3z = 3. A point (x, y, z) is in the intersection if and only if its coordinates x, y, z


satisfy the equations of both planes, hence if and only if (x, y, z) is a solution of the system of equations

(1.30)   x +  y -  z = 1
        2x + 2y - 3z = 3 .

If we subtract twice the first equation from the second we get

(1.31)  x + y - z = 1         x + y - z = 1
               -z = 1    ⟺            z = -1

If we substitute z = -1 in the first equation or, equivalently, if we add the second equation to the first, we get

(1.32)  x + y = 0         x = -y
            z = -1   ⟺    z = -1

We solved for x in terms of y; we treat y = s as a parameter and obtain the general solution

(1.33) [x]  =  [-s]  =  [ 0]  + s [-1]
       [y]     [ s]     [ 0]      [ 1]
       [z]     [-1]     [-1]      [ 0] .

For s = 0 we obtain a particular solution (x, y, z) = (0, 0, -1), hence one point in the intersection is the point C(0, 0, -1). All other points in the intersection differ from C by a multiple of the vector (-1, 1, 0). Geometrically, the intersection of the two planes is the line with direction (-1, 1, 0) passing through the point C(0, 0, -1).

Suppose now that we want to find the intersection of this line with the plane x - y + z = 0. A point on the line has coordinates (-s, s, -1) for some value of the parameter s. Such a point is on the plane x - y + z = 0 if and only if (-s) - (s) + (-1) = 0, hence if and only if s = -1/2. The corresponding point is the point D(1/2, -1/2, -1). This point D is the unique point in the intersection of the three planes x + y - z = 1, 2x + 2y - 3z = 3, and x - y + z = 0, hence the unique solution of the system of equations

(1.34)   x +  y -  z = 1
        2x + 2y - 3z = 3
         x -  y +  z = 0
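Since (1.34) is a square system with a unique solution, a standard linear solver finds the point D directly. A sketch assuming numpy:

    import numpy as np

    A = np.array([[1.0, 1.0, -1.0],
                  [2.0, 2.0, -3.0],
                  [1.0, -1.0, 1.0]])
    b = np.array([1.0, 3.0, 0.0])

    # The unique intersection point D of the three planes.
    print(np.linalg.solve(A, b))   # [ 0.5 -0.5 -1. ]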


    CHAPTER 2

    Linear Systems

    2.1. Linear systems

In the previous section we were able to solve systems of equations with several variables by adding, subtracting, and multiplying equations by nonzero scalars. The method described above works so nicely because of the particular form of the equations.

Suppose we have several (n) variables, named x1, x2, . . . , xn. An equation of the form

(2.1) a1 x1 + a2 x2 + · · · + an xn = b

is called a linear equation. An equation is linear if the left hand side is a sum of constant multiples of variables. In other words, all the variables appear with power 1, and there are no products of variables or other operations involved.

Now suppose we have several (m) linear equations with the same variables x1, x2, . . . , xn. A system of linear equations

(2.2)  a11 x1 + a12 x2 + · · · + a1n xn = b1
       a21 x1 + a22 x2 + · · · + a2n xn = b2
       ...
       am1 x1 + am2 x2 + · · · + amn xn = bm

is called a linear system. In this notation, aij is the coefficient of the variable xj in the i-th equation and bi is the right hand side of the i-th equation. Solving the system means finding values for the variables x1, . . . , xn that simultaneously satisfy all the equations.

The matrix of the linear system (2.2) is the rectangular table of coefficients

(2.3) A = [a11 a12 . . . a1n]
          [a21 a22 . . . a2n]
          [ ...          ...]
          [am1 am2 . . . amn]

A rectangular table of the form (2.3) is called a matrix, whether it is associated with a linear system or not. The horizontal lists are called rows and the vertical lists are called columns. Hence the matrix in (2.3) has m


rows and n columns, and we say that A is of size (or order) m × n, or that A is an m × n matrix, and read that as "m-by-n matrix".

Let A be an m × n matrix. Each of the n columns of A is a vector in Rm; these n vectors are denoted by col1(A), . . . , coln(A). Similarly, each of the m rows of A is a vector in Rn; these m vectors are denoted by row1(A), . . . , rowm(A):

(2.4)           col1(A)  col2(A)  ...  colj(A)  ...  coln(A)
     row1(A)  [  a11      a12     ...   a1j     ...   a1n  ]
     row2(A)  [  a21      a22     ...   a2j     ...   a2n  ]
     ...
     rowi(A)  [  ai1      ai2     ...   aij     ...   ain  ]
     ...
     rowm(A)  [  am1      am2     ...   amj     ...   amn  ]

Note that each column is also an m × 1 column matrix and each row is a 1 × n row matrix. It will be clear from the context when we refer to the rows and columns as vectors and when as matrices.

The element aij, sitting on the i-th row and j-th column, is called the (i, j) entry of the matrix. The entry aij is the i-th component of the j-th column of A, and the j-th component of the i-th row of A:

(2.5) aij = [colj(A)]i = [rowi(A)]j .

The matrix of a linear system does not include the right hand side of the equations; to remedy that unfortunate exclusion we define the augmented matrix of the system (2.2) to be the matrix

(2.6) Ā = [A | b] = [a11 a12 . . . a1n | b1]
                    [a21 a22 . . . a2n | b2]
                    [ ...              | ..]
                    [am1 am2 . . . amn | bm]

We will occasionally use the terminology "the linear system [A | b]" for the linear system that has the augmented matrix Ā = [A | b]. The divider emphasizes that the augmented matrix is a block matrix, with one m × n matrix (block) to the left and an m × 1 block to the right.

When m = n = 1, a linear system is just a linear equation ax = b, with a, b, and x real numbers and the operation between a and x the usual product of real numbers. Extending this to general linear systems, we define the product between an m × n matrix A and a column X in Rn by

(2.7) [a11 a12 . . . a1n] [x1]     [a11 x1 + a12 x2 + · · · + a1n xn]
      [a21 a22 . . . a2n] [x2]  =  [a21 x1 + a22 x2 + · · · + a2n xn]
      [ ...          ...] [..]     [                             ...]
      [am1 am2 . . . amn] [xn]     [am1 x1 + am2 x2 + · · · + amn xn]


The product AX can also be interpreted as

(2.8) AX = x1 col1(A) + · · · + xn coln(A) = [row1(A) · X]
                                             [row2(A) · X]
                                             [        ...]
                                             [rowm(A) · X] .

Hence AX is the vector in Rm obtained as a linear combination of the columns of A with coefficients given by the components of X.

For example

(2.9) [1 -2] [x1]     [ x1 - 2x2]
      [2  1] [x2]  =  [2x1 +  x2]
      [1  3]          [ x1 + 3x2]

and

(2.10) [1 -2 1] [0]     [-2]
       [2  1 3] [1]  =  [ 1] .
                [0]

More generally, if A is an m × n matrix and ej is the vector in Rn that has the j-th entry equal to 1 and all other entries equal to 0, then A ej = colj(A).

With the product notation, the linear system [A | b] has the matrix form AX = b. Each solution of the linear system is a way of writing the right hand side vector b as a linear combination of the columns of the coefficient matrix A.
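Both readings of (2.8) — a linear combination of columns, or a stack of row dot products — are easy to check numerically. A sketch assuming numpy, using the matrix of (2.9) and an arbitrary X:

    import numpy as np

    A = np.array([[1, -2],
                  [2, 1],
                  [1, 3]])
    X = np.array([3, 5])

    prod = A @ X                                     # the product AX
    cols = X[0] * A[:, 0] + X[1] * A[:, 1]           # combination of columns
    rows = np.array([A[i, :] @ X for i in range(3)]) # row-by-row dot products
    print(prod, np.allclose(prod, cols), np.allclose(prod, rows))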

We have seen that some linear systems have solutions and some do not. A linear system that has at least one solution is called consistent; a linear system that does not have any solutions is called inconsistent. When a linear system is consistent, it may have a unique solution or it may have infinitely many solutions. A linear system that has a unique solution is called determined; a linear system with more than one solution is called undetermined.

In general we need to solve the system in order to determine whether it is consistent or not; there is, however, one situation when we know from the beginning that the linear system is consistent: if the right hand sides of the linear equations are all equal to 0, then (0, 0, . . . , 0) is a solution of the linear system. A linear system [A | b] is called homogeneous if all the components of the vector b are zero; otherwise, the linear system is called nonhomogeneous.

To solve the linear system means to transform the equations into

(2.11)  x1 = c1
        x2 = c2
        ...
        xn = cn


or, equivalently, to transform the augmented matrix

(2.12) Ā = [A | b] = [a11 a12 . . . a1n | b1]
                     [a21 a22 . . . a2n | b2]
                     [ ...              | ..]
                     [am1 am2 . . . amn | bm]

into

(2.13) [1 0 . . . 0 | c1]
       [0 1 . . . 0 | c2]
       [...     ... | ..]
       [0 0 . . . 1 | cn]

Clearly this is not possible if the number of equations (m) is not the same as the number of variables (n). That special case deserves a special name: a matrix A is called a square matrix if the number of rows is the same as the number of columns.

Getting to the simplest form (2.11) will not always be possible; we will try to get to a form as close to it as possible. That would mean that we may have to solve for some of the variables in terms of the others.

    2.2. Row operations

We will try to achieve the simple form (2.11) with the following operations on equations:

• adding/subtracting a multiple of an equation to another equation;
• multiplying an equation by a nonzero number.

These correspond to row operations on augmented matrices:

• Add/subtract a multiple of a row to/from another row. The notation

  Ri ← Ri + a Rj

  will mean that the new row i is the old row i plus a times the old row j; no other rows are modified, hence the new row j is the same as the old row j.

• Multiply a row by a nonzero number. The notation

  Ri ← a Ri

  will mean that the new row i is a times the old row i; no other rows are modified.

A third operation may be required if we are picky about having the solution for x1 on the first row, the solution for x2 on the second row, and so on:

• Swap rows, corresponding to changing the order of equations. The notation

  Ri ↔ Rj


will mean that the new row i is the old row j and the new row j is the old row i; no other rows are modified.

The operations on rows/equations are reversible, hence after each change we have a system that is equivalent to the original one, in the sense that the current system has the same solutions as the original one.
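Each row operation is a one-line array update. A minimal sketch (assuming numpy; the matrix is the augmented matrix of the system (2.14) below, and the particular operations shown are illustrative):

    import numpy as np

    M = np.array([[1.0, 1.0, -1.0, 1.0],
                  [2.0, 2.0, -3.0, 3.0],
                  [1.0, -1.0, 1.0, 0.0]])

    M[1] = M[1] - 2 * M[0]    # R2 <- R2 - 2 R1
    M[2] = 0.5 * M[2]         # R3 <- (1/2) R3   (scaling by a nonzero number)
    M[[0, 2]] = M[[2, 0]]     # R1 <-> R3        (swap)
    print(M)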

Consider the linear system

(2.14)   x +  y -  z = 1
        2x + 2y - 3z = 3
         x -  y +  z = 0 .

To solve it means to change the augmented matrix from the left hand side of

(2.15) [1  1 -1 | 1]        [1 0 0 | a]
       [2  2 -3 | 3]   →    [0 1 0 | b]
       [1 -1  1 | 0]        [0 0 1 | c]

to the right hand side, using a combination of row operations. We will do that systematically, one column at a time.

The first entry on the first column is right (1 on both sides). The second entry needs to be changed from 2 to 0. We can achieve that by subtracting twice the first row from the second row (R2 ← R2 - 2R1). The third entry on the first column needs to be changed from 1 to 0, and we can do that by subtracting the first row from the third row (R3 ← R3 - R1). The order in which we perform these operations is not important, as we get the same result:

(2.16) [1  1 -1 | 1]   R2 ← R2 - 2R1   [1  1 -1 |  1]
       [2  2 -3 | 3]   R3 ← R3 - R1    [0  0 -1 |  1]
       [1 -1  1 | 0]        →          [0 -2  2 | -1] .

We are done with the first column and we move to the second one. To make the second column look like the second column of (2.15), we need a 0 on the first row, a 1 on the second row and a 0 on the third row. A naive attempt would be to swap the first two rows; but that would mess up the first column, so forget about it. There is no number that would multiply a 0 into 1, so it looks like we're stuck. At this stage the second equation is

(2.17) -z = 1 ,

and there is no way to solve it for y, since y is not even present! But y does appear in the third equation (-2y + 2z = -1), so swapping the second and third rows would be helpful:

(2.18) [1  1 -1 |  1]   R2 ↔ R3   [1  1 -1 |  1]
       [0  0 -1 |  1]      →      [0 -2  2 | -1]
       [0 -2  2 | -1]             [0  0 -1 |  1] .


We can now get a 1 on the second column by dividing the second row by -2 or, equivalently, by multiplying the second row with -1/2:

(2.19) [1  1 -1 |  1]   R2 ← -(1/2)R2   [1 1 -1 |   1]
       [0 -2  2 | -1]        →          [0 1 -1 | 1/2]
       [0  0 -1 |  1]                   [0 0 -1 |   1] .

To fix the other entries on the second column we only need to change the top 1 into a 0; we can do that by subtracting the second row from the first row:

(2.20) [1 1 -1 |   1]   R1 ← R1 - R2   [1 0  0 | 1/2]
       [0 1 -1 | 1/2]        →         [0 1 -1 | 1/2]
       [0 0 -1 |   1]                  [0 0 -1 |   1] .

Next we multiply the third row by -1

(2.21) [1 0  0 | 1/2]   R3 ← -R3   [1 0  0 | 1/2]
       [0 1 -1 | 1/2]      →       [0 1 -1 | 1/2]
       [0 0 -1 |   1]              [0 0  1 |  -1]

and add the third row to the second one

(2.22) [1 0  0 | 1/2]   R2 ← R2 + R3   [1 0 0 |  1/2]
       [0 1 -1 | 1/2]        →         [0 1 0 | -1/2]
       [0 0  1 |  -1]                  [0 0 1 |   -1] .

We achieved what we wanted: the last augmented matrix has an identity matrix on the left side. The original system has been transformed into the equivalent system

(2.23)  x = 1/2
        y = -1/2
        z = -1 ;

this system has the obvious solution (x, y, z) = (1/2, -1/2, -1), and that is the unique solution of the original system. Hence the linear system is consistent and determined.

The three planes of equations x + y - z = 1, 2x + 2y - 3z = 3, and x - y + z = 0 have a unique point in common, the point of coordinates (1/2, -1/2, -1).

    2.3. Echelon forms

Are we always going to be able to transform the matrix into the identity matrix by row operations? The answer is no, because we know from geometry that it is possible for three planes in space to have no points in common (imagine the top, bottom, and side of a box) or to have infinitely many points in common (imagine the pages of a book). Therefore glitches may occur in this row operation process. What could go wrong?


Essentially there is only one thing that could go wrong: at some point we are unable to find a nonzero entry to be changed into a 1. Those entries equal to 1 are designated as pivots. There are several different reasons why we might not be able to find more pivots, and each situation has different implications on the linear system.

A first situation is when we run out of columns before rows, as in the case of the linear system (1.13)

 x1 - 2x2 = 1
2x1 +  x2 = 4
 x1 + 3x2 = 3 .

In this case we are able to transform the augmented matrix

(2.24) [1 -2 | 1]        [1 0 | 9/5]
       [2  1 | 4]   →    [0 1 | 2/5]
       [1  3 | 3]        [0 0 |   0]

The last equation (0 · x1 + 0 · x2 = 0) can be ignored because it doesn't impose any condition on the unknowns, and all values of x1 and x2 would satisfy that third equation. The linear system has a unique solution, which can be read from the first two rows: x1 = 9/5, x2 = 2/5.

We also run out of columns before rows for the linear system (1.17)

 x1 - 2x2 = 1
2x1 +  x2 = 4
 x1 + 3x2 = 4 .

In this case we are able to transform the augmented matrix

(2.25) [1 -2 | 1]        [1 0 | 9/5]
       [2  1 | 4]   →    [0 1 | 2/5]
       [1  3 | 4]        [0 0 |   1]

The last equation (0 · x1 + 0 · x2 = 1) does not have any solutions, hence the linear system does not have solutions, so it is inconsistent.

For a system with m equations and n unknowns, we encounter this situation when n < m (more equations than unknowns) and the augmented matrix is transformed into

(2.26) [A | b] → [In | S]
                 [ 0 | C]

The matrix on the right is a slightly more complicated example of a block matrix, with four blocks: an n × n block In, an n × 1 block S, an (m - n) × n block 0, and an (m - n) × 1 block C. The last m - n rows have only zeros to the left of the divider; if the vector C is not the zero vector, the system has no solutions. If the vector C is zero, then S is the unique solution of the linear system.


A second situation when we do not reach the identity matrix is when we run out of rows before running out of columns. Consider for example the linear system

(2.27)  x - y + z = 0
        x + y - z = 1 ;

the augmented matrix is transformed to

(2.28) [1 -1  1 | 0]        [1 0  0 | 1/2]
       [1  1 -1 | 1]   →    [0 1 -1 | 1/2]

We found two pivots on the first two rows and the left two columns; there is no way to find more pivots, because we don't have any more rows. We treat the variable corresponding to the column without a pivot as a parameter and solve for the variables corresponding to columns with pivots:

(2.29)  x - y + z = 0        x = 1/2             x = 1/2
        x + y - z = 1   →    y - z = 1/2    →    y = 1/2 + t
                                                 z = t .

Therefore the solutions of the linear system have the form

(2.30) [x]  =  [    1/2]  =  [1/2]  + t [0]
       [y]     [1/2 + t]     [1/2]      [1]
       [z]     [      t]     [  0]      [1] .

The linear system is consistent and undetermined.

For a linear system with m equations and n unknowns, we encounter this second situation when n > m (more unknowns than equations) and the augmented matrix is transformed to

(2.31) [A | b] → [Im  P | S] .

We solve for the first m unknowns, using the remaining n - m ones as parameters.

A third situation may occur when we can't find a pivot where we want it even though we still have rows and columns available. Consider for example the linear system

(2.32)   x +  y -  z = 1
        2x + 2y - 3z = 3
         x +  y - 2z = 2 .

We choose the first pivot on position (1, 1) and perform the row operations to change the first column:

(2.33) [1 1 -1 | 1]        [1 1 -1 | 1]
       [2 2 -3 | 3]   →    [0 0 -1 | 1]
       [1 1 -2 | 2]        [0 0 -1 | 1] .


At this point we would like to build a pivot on position (2, 2). Unfortunately, that entry is 0, and row swapping doesn't help, since all the eligible positions on the second column (from the second row down) are also 0. The linear system now looks like

(2.34)  x + y - z = 1
               -z = 1
               -z = 1

The first variable (x) occurs only in the first equation, hence we can solve for it from that equation once the other variables are determined. Our attempt to build a pivot on the second column was motivated by the goal of solving for the second variable (y) from one of the remaining equations, and that is clearly impossible, because y doesn't occur in the remaining equations. But the third variable (z) does occur, so we have to move over and try to solve for z. In matrix terms, that translates into trying to build a pivot on the third column:

(2.35) [1 1 -1 | 1]        [1 1 0 |  0]
       [0 0 -1 | 1]   →    [0 0 1 | -1]
       [0 0 -1 | 1]        [0 0 0 |  0] .

Now we're out of columns, so we need to stop. We obtained a copy of the identity matrix I2, but not in the top left corner. Instead, the copy is on the first two rows but on columns 1 and 3. The variable corresponding to a column without a pivot is treated as a parameter (y = t) and we solve for the variables x and z corresponding to the columns of the copy of the identity matrix:

(2.36)  x + y = 0         x = -y         x = -t
            z = -1   →    z = -1    →    y = t
                                         z = -1 .

A similar situation occurs for the homogeneous linear system

(2.37)  2x +  y - z = 0
        2x + 2y     = 0
        2x + 3y + z = 0
         x + 2y + z = 0 ,

corresponding to finding the intersection of four planes 2x + y - z = 0, 2x + 2y = 0, 2x + 3y + z = 0, and x + 2y + z = 0 in R3, or to determining all the combinations of the three vectors (2, 2, 2, 1), (1, 2, 3, 2), and (-1, 0, 1, 1) in R4 that are equal to the zero vector. Row operations transform the augmented


matrix

(2.38) [2 1 -1 | 0]        [1 0 -1 | 0]
       [2 2  0 | 0]   →    [0 1  1 | 0]
       [2 3  1 | 0]        [0 0  0 | 0]
       [1 2  1 | 0]        [0 0  0 | 0] .

We built pivots on rows 1 and 2, but there is no way to continue with the other rows, because all the remaining eligible positions have entries zero. Even though we still have one more column and two more rows, there is nothing we can do at this point and we need to stop.

Let's now take a closer look at all the final versions of the matrices of the linear systems considered in the previous examples:

(2.39) [1 0 0]   [1 0]   [1 0  0]   [1 1 0]   [1 0 -1]
       [0 1 0] , [0 1] , [0 1 -1] , [0 0 1] , [0 1  1]
       [0 0 1]   [0 0]              [0 0 0]   [0 0  0]
                                              [0 0  0] .

All these matrices have several common characteristics:

• If a row has a nonzero entry, then the leftmost nonzero entry on that row is a 1. This entry is the pivot on that row and column. The column corresponds to the variable that is solved for from the corresponding equation.

• On a column that has a pivot, all the other entries are zero. That means that the variables we solve for occur in exactly one equation: the equation used to solve for each variable.

• The pivot on a row appears on a column to the right of the columns of the pivots on rows above that row. That means that we solve for variables in increasing order from left to right.

• If the entries on a row are all zero, then the entries on all rows after that are all zero. That means that we list all the zero equations at the bottom of the table.

There is another common characteristic: all entries are 0, 1 and -1. That is an unfortunate coincidence that needs to be fixed by finding better examples.

A matrix that satisfies all the conditions above is called a reduced row echelon form (RREF). Every matrix can be transformed into a reduced row echelon form by row operations. The following algorithm, called the Gauss-Jordan elimination


algorithm, can be used to transform a matrix into its reduced row echelon form, using row operations.

 1  /* initialize current position */
 2  (i, j) ← (1, 1)
 3  /* as long as the current position is within the matrix */
 4  while current position is valid do
 5      search for a nonzero entry on column j, on or below row i
 6      if successful then
 7          /* make the current (i, j) entry nonzero */
 9          stop at first nonzero entry
10          if needed, swap rows to bring that entry on row i
11          /* make the current (i, j) entry 1 */
12          divide row i by the (i, j) entry
14          /* make all other entries on column j equal to 0 */
15          for each row k other than row i do
16              /* make the (k, j) entry 0 */
17              subtract (k, j) entry times row i from row k
18          end
19          /* move down and to the right to the next row and column */
20          change the current position to (i + 1, j + 1)
21      else
22          /* move to the right to the next column */
23          change the current position to (i, j + 1)
24      end
25  end

The algorithm has the following valid modifications:

• In line 9, one could look not for the first nonzero entry, but for the one that would make the computations in line 12 as nice as possible. That works when you apply the algorithm by hand, and may reduce the rounding errors in computer implementations.

• In the loop starting on line 15, one can make only the entries below (i, j) zero and leave the ones above unmodified. That wouldn't produce a reduced row echelon form, but one could still use the scattered identity block to solve for the variables of the pivots, starting with the last and back substituting. The corresponding final matrix is called a row echelon form.

If a matrix A has a row echelon form with r pivots, then it has a reduced row echelon form with r pivots. While a matrix A with more than one pivot has infinitely many REFs, generically denoted by ref(A), we will show that every matrix A has a unique RREF, denoted by rref(A): whatever row operations we perform and in whatever order, we end up with the same RREF. The number of pivots in rref(A) is called the rank of the matrix A and is denoted by rank(A). The rank of a matrix can't exceed the number of rows or the number of columns, hence if A has size m × n, then rank(A) ≤ min(m, n).
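The pseudocode above translates almost line for line into Python. The following is a minimal sketch assuming numpy (an illustration of the algorithm, not a library routine), checked here on the matrix of (2.38):

    import numpy as np

    def rref(M, tol=1e-12):
        """Gauss-Jordan elimination: return (rref of M, number of pivots)."""
        A = M.astype(float).copy()
        m, n = A.shape
        i, j = 0, 0                            # current position (i, j)
        while i < m and j < n:                 # while the position is valid
            nonzero = np.nonzero(np.abs(A[i:, j]) > tol)[0]
            if nonzero.size:                   # found a pivot candidate
                k = i + nonzero[0]             # first nonzero entry on/below row i
                A[[i, k]] = A[[k, i]]          # swap it onto row i if needed
                A[i] = A[i] / A[i, j]          # make the (i, j) entry 1
                for r in range(m):             # zero out the rest of column j
                    if r != i:
                        A[r] = A[r] - A[r, j] * A[i]
                i, j = i + 1, j + 1            # move down and to the right
            else:
                j += 1                         # move right to the next column
        return A, i                            # i pivots found: i = rank(M)

    A = np.array([[2, 1, -1], [2, 2, 0], [2, 3, 1], [1, 2, 1]])
    R, rank = rref(A)
    print(R)      # the reduced row echelon form shown in (2.38)
    print(rank)   # 2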

    2.4. Application: flow patterns

    We apply Gauss-Jordan elimination to solve a more complex problem.The figure below

[Figure: a four-node network with nodes A, B, C, D; external flows mA, mB, mC, mD at the nodes and internal flows xAB, xAC, xBC, xCD, xDA along the edges.]

shows a four-node network, with connections along which traffic can flow. Imagine that they are water pipes. We will analyze network traffic assuming a conservation law: the amount of stuff entering any node is the same as the amount leaving.¹

    We need to organize and manipulate the data systematically.

First, we identify the decision variables. In other words, what is it exactly that we have to find? This problem asks us to find possible average rates, so the decision variables are the values of the five variables xAB, xAC, xBC, xCD, and xDA, where xPQ is the amount of traffic flowing from node P to node Q. If the network is a water supply system then the units of the variables might be gallons per hour. The arrows on the edges indicate the flow direction we count as positive.

Then we identify the conditions these variables must satisfy. At node A the total rate coming out must be the same as the total rate coming in, so we must have

xAB + xAC = xDA + mA

or, equivalently,

xAB + xAC - xDA = mA .

¹ To model a network of routers, with traffic along each link measured in megabits per second, or street intersections, with traffic measured in cars per hour, we need a more complicated structure, since traffic will be moving in both directions along an edge. If the rates in both directions were the same, the average transfer rate between the nodes would be zero, but that wouldn't really mean no traffic.


Considering the similar equations for the other nodes we obtain the following augmented matrix:

(2.40)          xAB  xAC  xBC  xCD  xDA
     node A  [   1    1    0    0   -1  | mA ]
     node B  [   1    0   -1    0    0  | mB ]
     node C  [   0    1    1   -1    0  | mC ]
     node D  [   0    0    0   -1    1  | mD ]

Even before solving this linear system we can make some remarks on its solutions, just by paying attention to the particularities of the network:

• A global conservation law implies that the rate coming into the system should be the same as the rate going out. Hence

(2.41) mA + mD = mB + mC

is a necessary condition for the system to be consistent; if this condition is not satisfied, then the system does not have any solutions.

• Flow along the sides of the triangle ACD is undetected by the outside. Hence whatever solution (xAB, xAC, xBC, xCD, xDA) we have, adding a multiple of (0, 1, 0, 1, 1) to it will also be a solution. Therefore, if the system has one solution, it has infinitely many.

We confirm these remarks and determine all the solutions using the Gauss-Jordan elimination algorithm. We get the following equivalent augmented matrix

(2.42)   xAB  xAC  xBC  xCD  xDA
      [   1    0   -1    0    0  | mB                ]
      [   0    1    1    0   -1  | mA - mB           ]
      [   0    0    0    1   -1  | mA - mB - mC      ]
      [   0    0    0    0    0  | mA + mD - mB - mC ]

The linear system is inconsistent unless

(2.43) mA + mD - mB - mC = 0 .

Therefore (2.43) is the only condition that must be satisfied for the system to have a solution; if (2.43) is satisfied, then the fourth equation can be ignored, since it is just 0 = 0.

We have found three pivots, on the first three rows and the columns 1, 2, and 4, corresponding to the variables xAB, xAC, and xCD. The other two variables will be treated as parameters, xBC = s and xDA = t, and the solutions of the linear system are of the form

(2.44) [xAB]     [         mB + s]     [     mB]       [ 1]       [0]
       [xAC]     [mA - mB - s + t]     [mA - mB]       [-1]       [1]
       [xBC]  =  [              s]  =  [      0]  + s  [ 1]  + t  [0]
       [xCD]     [        -mD + t]     [    -mD]       [ 0]       [1]
       [xDA]     [              t]     [      0]       [ 0]       [1] .


The vector corresponding to t encodes the flow along the sides of the triangle ACD; the one corresponding to s is related to the triangle ABC. Their sum, (1, 0, 1, 1, 1), corresponds to the quadrilateral ABCD. When s = t = 0, we obtain one particular solution (mB, mA - mB, 0, -mD, 0), and any other solution is the sum of this particular solution and a linear combination of the vectors (1, -1, 1, 0, 0) and (0, 1, 0, 1, 1).
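For concrete inflow/outflow values the conclusions can be verified numerically. A sketch assuming numpy, with illustrative values mA = 5, mB = 2, mC = 4, mD = 1 chosen to satisfy (2.43):

    import numpy as np

    mA, mB, mC, mD = 5.0, 2.0, 4.0, 1.0      # sample values: mA + mD = mB + mC
    # columns: xAB, xAC, xBC, xCD, xDA  (the matrix of (2.40))
    A = np.array([[1, 1, 0, 0, -1],
                  [1, 0, -1, 0, 0],
                  [0, 1, 1, -1, 0],
                  [0, 0, 0, -1, 1]], dtype=float)
    b = np.array([mA, mB, mC, mD])

    x0 = np.array([mB, mA - mB, 0, -mD, 0])  # particular solution (s = t = 0)
    h1 = np.array([1, -1, 1, 0, 0])          # triangle ABC direction
    h2 = np.array([0, 1, 0, 1, 1])           # triangle ACD direction

    for s, t in [(0, 0), (2, -1), (1, 3)]:
        x = x0 + s * h1 + t * h2
        print(np.allclose(A @ x, b))         # True for every s, t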

    2.5. Solution sets

In the flow example, we ended up with a reduced row echelon form with pivots on columns corresponding to the variables xAB, xAC, and xCD. The copy of the identity matrix is scattered because those variables were not the first listed. Going back to the original system and reordering the variables so that xAB, xAC, and xCD are listed first does not change the nature of the problem or of the linear system. Then the original augmented matrix would be

(2.45)          xAB  xAC  xCD  xBC  xDA
     node A  [   1    1    0    0   -1  | mA ]
     node B  [   1    0    0   -1    0  | mB ]
     node C  [   0    1   -1    1    0  | mC ]
     node D  [   0    0   -1    0    1  | mD ]

with the following reduced row echelon form:

(2.46)   xAB  xAC  xCD | xBC  xDA
      [   1    0    0  |  -1    0  | mB      ]
      [   0    1    0  |   1   -1  | mA - mB ]
      [   0    0    1  |   0   -1  | -mD     ]
      [   0    0    0  |   0    0  | 0       ] .

The solutions of the linear system can now be written as

(2.47) [xAB]     [         mB + s]     [     mB]       [ 1]       [0]
       [xAC]     [mA - mB - s + t]     [mA - mB]       [-1]       [1]
       [xCD]  =  [        -mD + t]  =  [    -mD]  + s  [ 0]  + t  [1]
       [xBC]     [              s]     [      0]       [ 1]       [0]
       [xDA]     [              t]     [      0]       [ 0]       [1] ,

or

(2.48) [xAB]     [     mB]     [ 1  0]
       [xAC]     [mA - mB]     [-1  1]
       [xCD]  =  [    -mD]  +  [ 0  1] [s]
       [xBC]     [      0]     [ 1  0] [t]
       [xDA]     [      0]     [ 0  1] .

More generally, by row operations and reordering of variables, the augmented matrix Ā = [A | b] of a linear system is reduced to an augmented


matrix of the form

(2.49)                        basic variables   free variables
       essential equations  [       Ir                P        | S ]
       redundant equations  [        0                0        | C ] .

This block matrix has six blocks: Ir (r × r), P (r × (n - r)), S (r × 1), two zero blocks of sizes (m - r) × r and (m - r) × (n - r), and a block C ((m - r) × 1).

The variables corresponding to columns with pivots are called basic variables and the ones corresponding to columns without pivots are free variables. The equations corresponding to the rows of zeros impose consistency conditions: if the vector C has any nonzero components, then the system is inconsistent. An equivalent statement is known as the Kronecker-Capelli Theorem: a linear system is consistent if and only if rank(A) = rank(Ā).

If the vector C is the zero vector, then those equations are irrelevant and we ignore them, because they impose no real conditions. The equations corresponding to rows with pivots are called essential equations. From each such equation we solve for one basic variable, treating the free variables as parameters.

Let Z be the vector given by the free variables. Then any solution of the linear system is of the form

(2.50) X = [S]  +  [   -P   ] Z = X0 + H Z .
           [0]     [I_(n-r) ]

When Z = 0 we get the particular solution X0; the difference X - X0 is a solution of the homogeneous linear system AX = 0, with the same matrix of coefficients. All solutions of the homogeneous system are linear combinations of the columns of the n × (n - r) matrix H.
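Once the blocks P and S of (2.49) are known, generating solutions via (2.50) is mechanical. A sketch assuming numpy; the blocks below are read off the flow example (2.46), with the same illustrative values as before:

    import numpy as np

    # Blocks of (2.49) for the flow network, basic variables (xAB, xAC, xCD) first.
    mA, mB, mC, mD = 5.0, 2.0, 4.0, 1.0
    P = np.array([[-1, 0],
                  [1, -1],
                  [0, -1]], dtype=float)
    S = np.array([mB, mA - mB, -mD])
    r, n = 3, 5

    X0 = np.concatenate([S, np.zeros(n - r)])   # particular solution
    H = np.vstack([-P, np.eye(n - r)])          # homogeneous solution directions
    Z = np.array([2.0, -1.0])                   # values of the free variables s, t
    X = X0 + H @ Z
    print(X)   # a solution (xAB, xAC, xCD, xBC, xDA) of the reordered system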

To summarize: let AX = b be a linear system with m equations and n variables, and (2.49) its reduced row echelon form, with the basic variables written first.

• If r < m and C ≠ 0, then the system is inconsistent. We can not write the vector b as a linear combination of the columns of the matrix A.

• If r = m or C = 0, then the system is consistent. The vector b can be written as a linear combination of the columns of A.

  - If r < n, then the system has infinitely many solutions; each depends on n - r parameters, one for each free variable. All solutions can be obtained by giving arbitrary values to the n - r free variables and then finding the remaining variables through a specified recipe. There are infinitely many ways of writing b as a linear combination of the columns of A.

  - If r = n, then the system has a unique solution. The vector b can be written in a unique way as a linear combination of the columns of A.


    CHAPTER 3

    Matrix Algebra

    3.1. Product of matrices

3.1.1. The Wish List. The simplest case of a linear system

a11 x1 + a12 x2 + · · · + a1m xm = b1
a21 x1 + a22 x2 + · · · + a2m xm = b2
...
an1 x1 + an2 x2 + · · · + anm xm = bn ,

is that of one equation (n = 1) and one variable (m = 1). In that situation we have the equation ax = b, and you learned how to solve such an equation a long time ago:

• If a ≠ 0, then we divide both sides by a and get x = b/a.
• If a = 0, then:
  - if b ≠ 0, then the equation has no solutions;
  - if b = 0, then the equation has infinitely many solutions, since every real number is a solution of the equation.

It would be nice to be able to write the linear system [A | b] as AX = b, and to solve it using a similar method. Here the role of a is taken by the matrix A of the linear system, the variable x is replaced by a variable column matrix X, and the right hand side b is replaced by a column matrix b. The main problems are that we do not know what it means to multiply a matrix and a column, and what it means to divide by a matrix.

To understand exactly what we need to define, let's go back to ax = b. If a ≠ 0, then there exists a real number a⁻¹ such that a⁻¹a = 1. We multiply to the left by a⁻¹ and use the associativity of real number multiplication to get

ax = b ⟺ a⁻¹(ax) = a⁻¹b ⟺ (a⁻¹a)x = a⁻¹b ⟺ 1 · x = a⁻¹b .

Since 1 · x = x for every real number x, we conclude that x = a⁻¹b, which can also be written as b/a.

To generalize this method to a linear system AX = b, we need similar operations for matrices:

• a product of a matrix A and a column X;
• a matrix I such that IX = X;
• an associative product of matrices, (C, A) ↦ CA;
• an inverse of a matrix, A ↦ A⁻¹, such that A⁻¹A = I.


3.1.2. Product of Matrix and Column. The starting point in our program is defining a product of a matrix and a column (matrix), and we have already done that in an earlier section (see (2.7)). If

A = [a11 a12 . . . a1m]        X = [x1]
    [a21 a22 . . . a2m]            [x2]
    [ ...          ...]    ,       [..]
    [an1 an2 . . . anm]            [xm]

then

AX = [a11 x1 + a12 x2 + · · · + a1m xm]
     [a21 x1 + a22 x2 + · · · + a2m xm]
     [                             ...]
     [an1 x1 + an2 x2 + · · · + anm xm]

The product of a matrix A of size n-by-m and a column X of size p-by-1 is defined only when m = p; that is, only when the number of columns of the matrix A is the same as the number of entries (rows) of the column X. If that is the case, then the result is a column (matrix) with n entries.

3.1.3. Identity Matrices. Our next task is the equivalent of 1. The defining property is that IX = X whenever the product makes sense. When X is an n-by-1 matrix, a matrix I such that IX = X must be of size n-by-n, and that indicates that we are looking for infinitely many I's: one for each size of the column matrix X. Therefore, we are looking for square matrices I of size n-by-n such that IX = X for all n-by-1 column matrices X.

Let In be an n-by-n square matrix such that In X = X for every column matrix with n rows. If In = (aij), then In X = X is equivalent to

[a11 a12 . . . a1n] [x1]     [x1]
[a21 a22 . . . a2n] [x2]  =  [x2]
[ ...          ...] [..]     [..]
[an1 an2 . . . ann] [xn]     [xn]

which in turn is equivalent to

(a11 - 1) x1 + a12 x2 + a13 x3 + · · · + a1n xn = 0
a21 x1 + (a22 - 1) x2 + a23 x3 + · · · + a2n xn = 0
a31 x1 + a32 x2 + (a33 - 1) x3 + · · · + a3n xn = 0
...
an1 x1 + an2 x2 + an3 x3 + · · · + (ann - 1) xn = 0

for all x1, x2, . . . , xn. That is possible if and only if a11 = a22 = · · · = ann = 1 and all other entries are zero.


What we just showed is that for every n, there exists a unique matrix In such that In X = X for all X of size n-by-1, and

In = diag(1, 1, . . . , 1) = [1 0 . . . 0]
                             [0 1 . . . 0]
                             [...     ...]
                             [0 0 . . . 1] .

This matrix In will be the equivalent of 1.

3.1.4. Product of Matrices. So far we have defined only the product of a matrix A and a column matrix X, such that the number of entries of X is the same as the number of columns of A. We now proceed to extending this operation to a product of two matrices, although some compatibility conditions will need to be imposed.

The key property we want for our product is associativity: (CA)B = C(AB). In particular, for B = X, a column matrix, we want to have (CA)X = C(AX). Note that we have already defined the operations in C(AX): we know how to multiply A and X, and the result is another column matrix Y = AX, and we know how to multiply C and Y. Then CA should be defined as the unique matrix F with the property that FX = C(AX). Note that we don't yet know whether such a matrix does in fact exist, nor do we know whether it is in fact unique.

First, there are some compatibility conditions: if C is a p-by-q matrix and A is an n-by-m matrix, then X must be an m-by-1 matrix for the product AX to make sense. Then Y = AX is an n-by-1 matrix, and the product CY makes sense if and only if q = n. If that is the case, then CY is a p-by-1 matrix. Since FX = CY, that means that F = CA must be a p-by-m matrix. To summarize:

• The product of a p-by-q and an n-by-m matrix is defined only when q = n; that is, only when the number of columns of the first (left) matrix is the same as the number of rows of the second matrix.
• If that is the case, then the result must be a p-by-m matrix; so (p-by-n)(n-by-m) → p-by-m.

Note that the first condition is asymmetric, giving different roles to the first and the second matrix in a product. If you conclude that order matters, then you're right.

To figure out exactly what the matrix F = CA should look like, we consider a simple example. Let

C = [a b]      A = [1 2 3]      X = [x1]
    [c d] ,        [4 5 6] ,        [x2]
                                    [x3]


Then

C(AX) = [a b] ( [1 2 3] [x1] )  =  [a b] [ x1 + 2x2 + 3x3]  =
        [c d] ( [4 5 6] [x2] )     [c d] [4x1 + 5x2 + 6x3]
                        [x3]

      = [a( x1 + 2x2 + 3x3) + b(4x1 + 5x2 + 6x3)]  =
        [c( x1 + 2x2 + 3x3) + d(4x1 + 5x2 + 6x3)]

      = [(a + 4b)x1 + (2a + 5b)x2 + (3a + 6b)x3]  =
        [(c + 4d)x1 + (2c + 5d)x2 + (3c + 6d)x3]

      = [a + 4b  2a + 5b  3a + 6b] [x1]
        [c + 4d  2c + 5d  3c + 6d] [x2]
                                   [x3]

Therefore we should define the product CA as

(3.1) CA = [a b] [1 2 3]  =  [a + 4b  2a + 5b  3a + 6b]
           [c d] [4 5 6]     [c + 4d  2c + 5d  3c + 6d]

The computation above suggests the following equivalent ways of defining the product of two matrices of compatible sizes.

Let A and B be matrices of sizes m-by-n and p-by-q, respectively. The product AB is defined if and only if n = p (that is, the number of columns of the first matrix is the same as the number of rows of the second matrix). If that is the case, then the product AB is the m-by-q matrix whose:

• columns are given by

(3.2) colj(AB) = A colj(B) ;

• rows are given by

(3.3) rowi(AB) = rowi(A) B ;

• entries are given by

(3.4) (AB)ij = rowi(A) · colj(B) = ai1 b1j + ai2 b2j + · · · + ain bnj

for all 1 ≤ i ≤ m, 1 ≤ j ≤ q.

Example. Let

A = [1 2]      B = [ 2 -1  3]      C = [-2  1]
    [3 4] ,        [-1  4 -2] ,        [ 2 -3]
                                       [ 3 -1]

Then

AB = [1 2] [ 2 -1  3]  =
     [3 4] [-1  4 -2]

   = [1·2 + 2·(-1)   1·(-1) + 2·4   1·3 + 2·(-2)]  =  [0  7 -1]
     [3·2 + 4·(-1)   3·(-1) + 4·4   3·3 + 4·(-2)]     [2 13  1] .


Similarly

(AB)C = [11 -20]      BC = [3   2]      A(BC) = [11 -20]
        [25 -38] ,         [4 -11] ,            [25 -38]

CB = [-5   6  -8]
     [ 7 -14  12]
     [ 7  -7  11]

Note that BA doesn't make sense, because B has three columns and A has only two rows.
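A quick numerical check of the example (a sketch assuming numpy), which also previews the associativity property discussed next:

    import numpy as np

    A = np.array([[1, 2], [3, 4]])
    B = np.array([[2, -1, 3], [-1, 4, -2]])
    C = np.array([[-2, 1], [2, -3], [3, -1]])

    print(A @ B)                                   # AB as computed above
    print(np.allclose((A @ B) @ C, A @ (B @ C)))   # True: (AB)C = A(BC)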

In the examples above we see that A(BC) = (AB)C. That is not a coincidence: matrix multiplication is indeed associative. We started by imposing the condition (AB)X = A(BX), that is, we imposed the associativity condition only when the last matrix was a column matrix. That condition implies that matrix multiplication is associative in general, and an easy way to see that is to observe that, since

colj(AB) = A colj(B) ,

we have

colj((AB)C) = (AB) colj(C) = A(B colj(C)) = A colj(BC) = colj(A(BC))

for all columns, hence (AB)C = A(BC).

Also, we started with a matrix I such that IX = X for all n-by-1 matrices, and concluded that In = diag(1, 1, . . . , 1) is the only such matrix. But the identity matrix In has the more general property that

In A = A ,    B In = B

for every matrix A that has n rows and every matrix B that has n columns.

One has to be careful with matrix products, since several properties of multiplication of real numbers are no longer valid for matrix products.

    multiplication of real numbers are no longer valid for matrix products.

    If a and b are real numbers, then the product ab is always defined. Thisis not true for matrices, since the product AB is defined only when thenumber of columns of A is the same as the number of rows of B.

    Ifa and b are real numbers, then ab = ba. This is in general not true formatrices. Matrix multiplication is not commutative: in general AB = BA.That may happen for several reasons:

    because only one of the operations makes sense (the case of A andB above);

    both operations make sense but the results dont even have thesame size (case of B and C above); even when both AB and BA make sense and have the same size

    (which automatically implies that A and B must be square of thesame size), as shown in the following example:

[1 2] [0 1]  =  [2 1]        [0 1] [1 2]  =  [3 4]
[3 4] [1 0]     [4 3] ,      [1 0] [3 4]     [1 2] .


Order matters, and that is a major difference between matrix multiplication and real number multiplication. Now, just because AB is in general not the same as BA, it doesn't mean that it is impossible to get AB = BA. For example, if B = A then clearly AB = BA, and even when B = A^2 we still have AB = A(AA) = (AA)A = BA. Similarly for higher powers.

If a and b are real numbers and ab = 0, then a = 0 or b = 0 (or both). As a consequence, if a, b, c are real numbers and a ≠ 0, then ab = ac implies b = c. In other words, we can simplify by nonzero real numbers. These implications are not true for matrices: it is possible that A ≠ 0, B ≠ 0, but AB = 0. We have already seen that when we found nonzero solutions to homogeneous systems: AX = 0 but neither A = 0 nor X = 0. Similarly, it is possible that A ≠ 0 and AB = AC but B ≠ C. We have already seen that when we found multiple solutions to the same linear system: AX = b = AY, but X ≠ Y. While in general one can't simplify by a matrix, there are cases when that is possible; more about that a bit later.

We conclude this subsection with another property of matrix multiplication. Consider, for example, the matrices

    cation. Consider, for example, matrices

    A =

    a bc d

    and B =

    1 2 34 5 6

    .

Then

AB = [a b] [1 2 3]  =  [a + 4b  2a + 5b  3a + 6b]  =
     [c d] [4 5 6]     [c + 4d  2c + 5d  3c + 6d]

   = [a  2a  3a]  +  [4b  5b  6b]  =  [a] [1 2 3]  +  [b] [4 5 6]  =
     [c  2c  3c]     [4d  5d  6d]     [c]             [d]

   = (col1 A)(row1 B) + (col2 A)(row2 B) .

More generally, if A is an n-by-m matrix and B is an m-by-p matrix, then

(3.5) AB = (col1 A)(row1 B) + · · · + (colm A)(rowm B) .
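Formula (3.5) writes AB as a sum of column-times-row (rank one) products, which is easy to verify numerically. A sketch assuming numpy:

    import numpy as np

    A = np.array([[1, 2], [3, 4]])
    B = np.array([[2, -1, 3], [-1, 4, -2]])

    # Sum of (col_k A)(row_k B), one rank-one matrix per k, as in (3.5).
    S = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
    print(np.allclose(S, A @ B))   # True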

    3.2. Inverse Matrix

The next step in our generalization program is the definition of an inverse for an n-by-m matrix A: to start, we need to find a matrix B such that BA is an identity matrix. For that to be possible, B must be an m-by-n matrix, and then BA should be Im; in this case AB is a square n-by-n matrix. Suppose we have found such a matrix B. Then, returning to the linear system AX = b, we can multiply both sides to the left by B and get

AX = b ⟹ B(AX) = Bb ⟺ (BA)X = Bb ⟺ X = Bb .

But if X = Bb, then AX = (AB)b, and that is not necessarily equal to b! It would be equal to b, however, if AB = In. This discussion motivates the following definition.

Let A be an n-by-m matrix. A matrix B is

• a left inverse of A if BA = I_m;
• a right inverse of A if AB = I_n;
• an inverse of A if it is both a left and a right inverse, that is, if AB = I_n and BA = I_m.

If A is a matrix, we say that A is invertible if there exists a matrix B such that AB = BA = I; in particular this implies that A must be a square matrix. The matrix B is called the inverse matrix of A, and we denote it by B = A^{-1}. Note the symmetric roles played by A and B; a consequence is that B is also invertible and B^{-1} = A. Note that an inverse of A (left, right, or both) must be of size m-by-n.

Our search for an inverse of A starts with finding matrices B such that AB = I_n (that is, with finding a right inverse). If AB = I_n, then the columns of AB are the same as the columns of I_n. In particular,

\[ \mathrm{col}_1(AB) = \mathrm{col}_1 I_n \iff A\,\mathrm{col}_1(B) = e_1 . \]

Hence the first column of B must be a solution of the linear system AX = e_1, with augmented matrix [A | e_1]. Similarly, the second column of B must be a solution of [A | e_2], and, in general, the j-th column of B must be a solution of [A | e_j] for all 1 ≤ j ≤ n. For each of these n linear systems we would perform row operations to transform the left-hand side into rref(A). We can do that simultaneously, starting with the augmented matrix [A | e_1 e_2 . . . e_n], that is, with [A | I_n]. Performing row operations, we end up with an augmented matrix [rref(A) | b_1 b_2 . . . b_n]. The j-th column of B must then be a solution of the linear system [rref(A) | b_j], for all 1 ≤ j ≤ n; therefore, a right inverse exists if and only if all these linear systems are consistent. Because of the particular form of the initial right-hand side (I_n), each row of the matrix [b_1 b_2 . . . b_n] has a nonzero entry. Therefore, all systems are consistent if and only if rref(A) does not have any zero rows, which is equivalent to rref(A) having one pivot in each row. That, in turn, is the same as imposing the condition rank(A) = n = #rows, which implies that n ≤ m, because the rank can't exceed the number of columns.

    We summarize the above argument:

• An n-by-m matrix A has a right inverse if and only if rank(A) = n.
• A right inverse for an n-by-m matrix A with rank(A) = n is obtained from [A | I_n] by performing row operations to change A to reduced row echelon form, [rref(A) | b_1 b_2 . . . b_n].
• Any solution of the linear system [rref(A) | b_j] is a possible choice for the j-th column of a right inverse B.

How about left inverses? The key observation is that if B is a left inverse of A, then A is a right inverse of B. So if A has a left inverse B, then B has a right inverse, which is equivalent to rank(B) = #rows of B = m. That implies m ≤ n, since the rank of B can't exceed the number of columns of B. Therefore, if A has both a left inverse and a right inverse, then n ≤ m ≤ n. That is possible only when n = m, hence only for square matrices. Therefore, if A is not a square matrix, then it can't have both left and right inverses.
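For a concrete non-square example, a right inverse of a full-row-rank matrix can be computed column by column as above; NumPy's pseudoinverse happens to return one such right inverse (there are many). A hedged sketch, with a matrix we chose for illustration:

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [4., 5., 6.]])     # 2-by-3, rank 2
    B = np.linalg.pinv(A)            # one right inverse among many
    print(np.round(A @ B, 10))       # the 2-by-2 identity
    print(np.round(B @ A, 10))       # NOT the 3-by-3 identity: B is no left inverse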

But if A is a square matrix of size (n, n), it doesn't automatically follow that A has an inverse. Returning to our discussion of right inverses, we concluded that A has a right inverse if and only if rref(A) has a pivot in each of the n rows. Since each of the n columns contains at most one pivot, that forces a pivot in every column as well. That is possible if and only if rref(A) = I_n, and then for every b_j, the linear system [rref(A) | b_j] has a unique solution, b_j itself.

To summarize: let A be a square (n, n) matrix.

• A has a right inverse if and only if rref(A) = I_n.
• If rref(A) = I_n, then A has a unique right inverse B, namely the matrix obtained from [A | I_n] → [I_n | B].

Example. Let

\[ A = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 3 & 2 \\ 3 & 8 & 2 \end{pmatrix} . \]

To

• determine whether A has a right inverse B, and
• determine the right inverse B, if it exists,

we start with [A | I_3] and row-reduce the left-hand side:

\[ \left(\begin{array}{ccc|ccc} 1 & 1 & 1 & 1 & 0 & 0 \\ 2 & 3 & 2 & 0 & 1 & 0 \\ 3 & 8 & 2 & 0 & 0 & 1 \end{array}\right) \to \cdots \to \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 10 & -6 & 1 \\ 0 & 1 & 0 & -2 & 1 & 0 \\ 0 & 0 & 1 & -7 & 5 & -1 \end{array}\right) \]

We conclude that

• rref(A) = I_3, hence A has a right inverse;
• the right inverse is

\[ B = \begin{pmatrix} 10 & -6 & 1 \\ -2 & 1 & 0 \\ -7 & 5 & -1 \end{pmatrix} . \]
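The arithmetic can be double-checked mechanically; a short NumPy verification of this example (our addition, for readers following along with a computer):

    import numpy as np

    A = np.array([[1., 1., 1.],
                  [2., 3., 2.],
                  [3., 8., 2.]])
    B = np.array([[10., -6.,  1.],
                  [-2.,  1.,  0.],
                  [-7.,  5., -1.]])
    print(np.allclose(A @ B, np.eye(3)))   # True: B is a right inverse
    print(np.allclose(B @ A, np.eye(3)))   # True: B is a left inverse as well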

In general, if rref(A) = I_n, then the right inverse B is obtained from I_n by performing the row operations that transform A into rref(A) = I_n. All those row operations are reversible, and that means that rref(B) = I_n, and therefore B has a right inverse, C. Hence AB = I_n and BC = I_n. If we multiply AB = I_n on the right by C, we get

\[ C = I_n C = (AB)C = A(BC) = A I_n = A . \]

Therefore C = A, so BA = I_n, which means that B is also a left inverse of A. To conclude, if A is a square (n, n) matrix, then

• A has an inverse if and only if rref(A) = I_n;
• if rref(A) = I_n, then the inverse of A is unique;
• the inverse of A, denoted from now on by A^{-1}, is obtained from [A | I_n] by row-reducing the left-hand side: [A | I_n] → [I_n | A^{-1}].


What can go wrong? Essentially, only one thing: we get a row of zeroes in rref(A), meaning that rref(A) ≠ I_n. For example, if

\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 6 \end{pmatrix} \]

and we try to find the inverse using the above method, then

\[ \left(\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 3 & 6 & 0 & 1 \end{array}\right) \to \left(\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & 0 & -3 & 1 \end{array}\right) , \]

hence rref(A) ≠ I_2, meaning that A does not have an inverse.

For 2-by-2 matrices it is easy to determine whether they are invertible or not, and to compute the inverse. Let

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]

be an arbitrary 2-by-2 matrix.

• If a = c = 0, then there is no way to get a pivot in the first column, hence the matrix A does not have an inverse in this case.
• If a ≠ 0, then by row operations we get

\[ \left(\begin{array}{cc|cc} a & b & 1 & 0 \\ c & d & 0 & 1 \end{array}\right) \to \left(\begin{array}{cc|cc} 1 & \frac{b}{a} & \frac{1}{a} & 0 \\ 0 & \frac{ad-bc}{a} & -\frac{c}{a} & 1 \end{array}\right) . \]

  – If ad − bc = 0, then the second row of rref(A) contains only zeroes, hence A does not have an inverse.
  – If ad − bc ≠ 0, then we can continue the row reduction and get

\[ \left(\begin{array}{cc|cc} a & b & 1 & 0 \\ c & d & 0 & 1 \end{array}\right) \to \left(\begin{array}{cc|cc} 1 & 0 & \frac{d}{ad-bc} & -\frac{b}{ad-bc} \\ 0 & 1 & -\frac{c}{ad-bc} & \frac{a}{ad-bc} \end{array}\right) , \]

  hence if ad − bc ≠ 0, then

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \begin{pmatrix} \frac{d}{ad-bc} & -\frac{b}{ad-bc} \\ -\frac{c}{ad-bc} & \frac{a}{ad-bc} \end{pmatrix} = \frac{1}{ad-bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} . \]

• If c ≠ 0, then we end up with a similar conclusion and result.

For example, if

\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} , \]

then 1 · 4 − 2 · 3 = −2 ≠ 0, hence A has an inverse, and

\[ A^{-1} = \frac{1}{-2} \begin{pmatrix} 4 & -2 \\ -3 & 1 \end{pmatrix} = \begin{pmatrix} -2 & 1 \\ \frac{3}{2} & -\frac{1}{2} \end{pmatrix} . \]
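The 2-by-2 formula translates directly into code. A minimal sketch (the function name is ours; the exact-zero test on the determinant is fine for this illustration, though floating-point code would normally use a tolerance):

    import numpy as np

    def inverse2x2(M):
        # 1/(ad - bc) * [[d, -b], [-c, a]], defined when ad - bc != 0
        (a, b), (c, d) = M
        det = a * d - b * c
        if det == 0:
            raise ValueError("ad - bc = 0: the matrix has no inverse")
        return np.array([[d, -b], [-c, a]]) / det

    print(inverse2x2(np.array([[1., 2.], [3., 4.]])))
    # [[-2.   1. ]
    #  [ 1.5 -0.5]]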

We conclude this subsection with an important property of multiplication of invertible matrices. Let A and B be invertible matrices of the same size, n-by-n. Is their product AB an invertible matrix? And if it is, how can one compute its inverse, (AB)^{-1}?

Suppose AB is invertible, and let C be its inverse. Then ABC = I_n, and multiplying on the left by A^{-1} we get BC = A^{-1}. If we now multiply, again on the left, by B^{-1}, we get C = B^{-1}A^{-1}. So if AB is invertible, the only candidate for (AB)^{-1} is C = B^{-1}A^{-1}. Notice that, at this point, we don't know yet whether AB is invertible or not. But

\[ (AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = A I_n A^{-1} = A A^{-1} = I_n , \]

which means that B^{-1}A^{-1} is a right inverse for AB. It is then also a left inverse, and therefore the inverse (AB)^{-1}.

Proposition 1. If A and B are invertible matrices of the same size, then AB is also an invertible matrix, and

\[ (AB)^{-1} = B^{-1}A^{-1} . \]
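Proposition 1 is easy to test numerically; in the sketch below (matrices chosen by us for illustration) note that the product of the inverses in the wrong order does not give the inverse of AB:

    import numpy as np

    A = np.array([[1., 2.], [3., 4.]])
    B = np.array([[0., 1.], [1., 1.]])
    lhs = np.linalg.inv(A @ B)
    print(np.allclose(lhs, np.linalg.inv(B) @ np.linalg.inv(A)))  # True
    print(np.allclose(lhs, np.linalg.inv(A) @ np.linalg.inv(B)))  # False: order matters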

    3.3. Linear Systems and Inverses

We now have all the pieces needed to generalize the simple method of solving a linear equation ax = b to a linear system AX = b.

If A has an inverse B = A^{-1} (which means A must be a square matrix), with A^{-1}A = I_n = AA^{-1}, then

\[ AX = b \implies A^{-1}(AX) = A^{-1}b \iff (A^{-1}A)X = A^{-1}b \iff X = A^{-1}b . \]

Hence if X is a solution of AX = b, then X = A^{-1}b. We are not done yet, because we have only shown that nothing else can be a solution, not that X = A^{-1}b is indeed a solution! However, that is quite simple: if X = A^{-1}b, then AX = A(A^{-1}b) = (AA^{-1})b = I_n b = b, hence X is indeed the unique solution of the linear system AX = b. (Notice how we used the left inverse to find the solution and the right inverse to prove that what we found is indeed a solution.)

Example. We use the inverse matrix method to solve the linear system

\[ \begin{cases} x_1 + x_2 + x_3 = 1 \\ 2x_1 + 3x_2 + 2x_3 = 2 \\ 3x_1 + 8x_2 + 2x_3 = 4 \end{cases} \]

The matrix of the linear system AX = b is

\[ A = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 3 & 2 \\ 3 & 8 & 2 \end{pmatrix} \]

and we have seen in a previous section that A has an inverse and

\[ A^{-1} = \begin{pmatrix} 10 & -6 & 1 \\ -2 & 1 & 0 \\ -7 & 5 & -1 \end{pmatrix} . \]

Then the linear system has a unique solution,

\[ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = A^{-1} \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix} = \begin{pmatrix} 10 & -6 & 1 \\ -2 & 1 & 0 \\ -7 & 5 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix} . \]
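The same computation in NumPy, using either the explicit inverse or the built-in solver (a sketch we added; for large systems one solves directly rather than forming A^{-1}):

    import numpy as np

    A = np.array([[1., 1., 1.],
                  [2., 3., 2.],
                  [3., 8., 2.]])
    b = np.array([1., 2., 4.])
    print(np.linalg.inv(A) @ b)      # [ 2.  0. -1.]
    print(np.linalg.solve(A, b))     # same solution, computed without A^{-1}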


We conclude this subsection with an observation that will be very helpful later: consider a homogeneous system AX = 0, with A a square (n, n) matrix (hence as many equations as variables). If rref(A) = I_n, then A has an inverse, hence the linear system has a unique solution, X = A^{-1}0 = 0. If rref(A) ≠ I_n, then rref(A) must have some rows of zeroes, and in turn that implies that it must have some columns without pivots. If that is the case, then the homogeneous system has infinitely many solutions. In particular, it has nonzero solutions. To conclude: if A is a square (n, n) matrix, then the following are equivalent:

• the homogeneous system AX = 0 has a nonzero solution;
• A does not have an inverse;
• rref(A) ≠ I_n.

3.4. Elementary Matrices

The row operations used to transform a matrix into a (reduced) row echelon form involve only one or two rows at a time. We study them in more detail, starting with 2 × 2 matrices. Let

\[ (3.6) \qquad A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]

be a matrix. The effect of the row operation R_2 → R_2 + hR_1,

\[ (3.7) \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \xrightarrow{R_2 \to R_2 + hR_1} \begin{pmatrix} a & b \\ c + ha & d + hb \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ h & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} , \]

is the same as left multiplication by a certain matrix. Similarly,

\[ (3.8) \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \xrightarrow{R_1 \to hR_1} \begin{pmatrix} ha & hb \\ c & d \end{pmatrix} = \begin{pmatrix} h & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]

and

\[ (3.9) \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \xrightarrow{R_1 \leftrightarrow R_2} \begin{pmatrix} c & d \\ a & b \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} . \]

For h ≠ 0, all these operations are reversible:

\[ (3.10) \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \xrightarrow{R_2 \to R_2 + hR_1} \begin{pmatrix} a & b \\ c + ha & d + hb \end{pmatrix} \xrightarrow{R_2 \to R_2 - hR_1} \begin{pmatrix} a & b \\ c & d \end{pmatrix} ; \]

for matrices,

\[ (3.11) \qquad \begin{pmatrix} 1 & 0 \\ -h & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ h & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ h & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -h & 1 \end{pmatrix} , \]

and similarly for the other two operations.

For higher-order matrices, the row operations are still equivalent to multiplication on the left by certain matrices; the blocks above occur in the rows involved and the corresponding columns.

The row operation R_i → R_i + kR_j on an n-by-m matrix A is equivalent to multiplication on the left by the matrix T_ij(k) = I_n + kE_ij, where E_ij is the n-by-n matrix with the ij-entry equal to 1 and all other entries 0.


(The 2-by-2 matrix in (3.7) is then T_21(h) = I_2 + hE_21.) The other row operation is the swapping of two rows, and that, too, can be achieved by left multiplication. For example, swapping the first and second rows is obtained by multiplication on the left by

\[ (3.12) \qquad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} . \]

In general, swapping the i-th and j-th rows of an n-by-m matrix is achieved by left multiplication by the matrix S_ij = I_n + E_ij + E_ji − E_ii − E_jj.

Performing a succession of row operations is then equivalent to successive multiplication, on the left, by matrices of the form T_ij(k) and S_ij.

For example, for 3 × 3 matrices, the row operations below correspond to left multiplication by the following matrices:

\[ (3.13) \qquad \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \xrightarrow{R_3 \to R_3 + hR_1} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ h & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \]

\[ (3.14) \qquad \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \xrightarrow{R_1 \to hR_1} \begin{pmatrix} h & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \]

\[ (3.15) \qquad \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \xrightarrow{R_1 \leftrightarrow R_3} \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \]

The square matrices that correspond to elementary row operations are called elementary matrices. Since all row operations are reversible, if E is an elementary matrix, then there exists an elementary matrix F such that EF = FE = I.
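The matrices T_ij(k) and S_ij are straightforward to build and test; a sketch (the helper names are ours, and NumPy indices are 0-based, so rows 1, 2, 3 of the text are rows 0, 1, 2 here):

    import numpy as np

    def T(i, j, k, n):
        """T_ij(k) = I_n + k E_ij: left multiplication adds k times row j to row i."""
        M = np.eye(n)
        M[i, j] += k
        return M

    def S(i, j, n):
        """S_ij: left multiplication swaps rows i and j."""
        M = np.eye(n)
        M[[i, j]] = M[[j, i]]
        return M

    A = np.arange(12.).reshape(3, 4)
    print(T(2, 0, -1.0, 3) @ A)      # R3 -> R3 - R1
    print(S(0, 2, 3) @ A)            # R1 <-> R3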

Consider the row operations performed in (2.15). We started with

\[ (3.16) \qquad \left(\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 2 & 2 & 3 & 3 \\ 1 & -1 & -1 & 0 \end{array}\right) \xrightarrow[R_3 \to R_3 - R_1]{R_2 \to R_2 - 2R_1} \left(\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & -2 & -2 & -1 \end{array}\right) . \]

The order in which we perform these two row operations is not important; from the matrix viewpoint, this means that the matrices that have the same effect as the row operations commute:

\[ \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix} . \]


The next steps correspond to left multiplications as follows:

\[ R_2 \leftrightarrow R_3 : \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \qquad R_2 \to -\frac{1}{2}R_2 : \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\frac{1}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
\[ R_1 \to R_1 - R_2 : \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad R_2 \to R_2 - R_3 : \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix} \]

Therefore the reduced row echelon form of the original augmented matrix is obtained through successive left multiplications, or directly by left multiplication by the product B below:

\[ B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\frac{1}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

Notice the order in which the matrices above occur in the product! While performing the above multiplications would provide a good exercise in matrix multiplication, we can save a lot of time with a simple remark: the result of the product is the matrix obtained by performing the same row operations on the identity matrix I_3:

\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \xrightarrow[R_3 \to R_3 - R_1]{R_2 \to R_2 - 2R_1} \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix} \xrightarrow{R_2 \leftrightarrow R_3} \begin{pmatrix} 1 & 0 & 0 \\ -1 & 0 & 1 \\ -2 & 1 & 0 \end{pmatrix} \xrightarrow{R_2 \to -\frac{1}{2}R_2} \begin{pmatrix} 1 & 0 & 0 \\ \frac{1}{2} & 0 & -\frac{1}{2} \\ -2 & 1 & 0 \end{pmatrix} \]
\[ \xrightarrow{R_1 \to R_1 - R_2} \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{2} & 0 & -\frac{1}{2} \\ -2 & 1 & 0 \end{pmatrix} \xrightarrow{R_2 \to R_2 - R_3} \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{5}{2} & -1 & -\frac{1}{2} \\ -2 & 1 & 0 \end{pmatrix} \]


Therefore the product of the six elementary matrices corresponding to the row operations is equal to

\[ (3.17) \qquad B = \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{5}{2} & -1 & -\frac{1}{2} \\ -2 & 1 & 0 \end{pmatrix} . \]

Moreover, one can check that

\[ \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{5}{2} & -1 & -\frac{1}{2} \\ -2 & 1 & 0 \end{pmatrix} \left(\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 2 & 2 & 3 & 3 \\ 1 & -1 & -1 & 0 \end{array}\right) = \left(\begin{array}{ccc|c} 1 & 0 & 0 & \frac{1}{2} \\ 0 & 1 & 0 & -\frac{1}{2} \\ 0 & 0 & 1 & 1 \end{array}\right) . \]

Wow! We solved the linear system AX = b simply by multiplying on the left by the magic matrix B. The property that made that possible is BA = I_3:

\[ (3.18) \qquad AX = b \implies BAX = Bb \iff I_3X = Bb \iff X = Bb . \]

Note that the first arrow points to the right only; the opposite implication would be true if we could simplify on the left by B. In general that is not possible, because B is a matrix! But the matrix B is obtained as a product of matrices corresponding to reversible row operations, hence if we successively reverse each operation, we end up with the matrix we started with. Let C be the matrix corresponding to this reversal. Then C is the product

\[ C = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \]

and CB = I_3. Then

\[ BAX = Bb \implies CBAX = CBb \iff I_3AX = I_3b \iff AX = b , \]

hence we can simplify on the left by B. (In fact C = A, and B = A^{-1}.)
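Multiplying the six elementary matrices in the correct order reproduces B; a NumPy sketch of the computation above (our addition):

    import numpy as np

    A = np.array([[1.,  1.,  1.],
                  [2.,  2.,  3.],
                  [1., -1., -1.]])

    T21 = np.array([[1., 0., 0.], [-2., 1., 0.], [0., 0., 1.]])  # R2 -> R2 - 2R1
    T31 = np.array([[1., 0., 0.], [0., 1., 0.], [-1., 0., 1.]])  # R3 -> R3 - R1
    S23 = np.array([[1., 0., 0.], [0., 0., 1.], [0., 1., 0.]])   # R2 <-> R3
    D2  = np.diag([1., -0.5, 1.])                                # R2 -> -(1/2)R2
    T12 = np.array([[1., -1., 0.], [0., 1., 0.], [0., 0., 1.]])  # R1 -> R1 - R2
    T23 = np.array([[1., 0., 0.], [0., 1., -1.], [0., 0., 1.]])  # R2 -> R2 - R3

    B = T23 @ T12 @ D2 @ S23 @ T31 @ T21  # later operations multiply on the left
    print(B)                              # the matrix (3.17)
    print(np.allclose(B @ A, np.eye(3)))  # True: B = A^{-1}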

    3.5. Linear Operations on Matrices

There are two more operations on matrices, motivated by operations on transformations: addition and scaling.

Let T: R^n → R^m, T(X) = AX, and S: R^n → R^m, S(X) = BX, be two linear transformations, with canonical matrices A and B of size m × n. Then T + S: R^n → R^m, (T + S)(X) = T(X) + S(X), is also a linear transformation; its canonical matrix is by definition the sum of the matrices A and B, denoted by A + B. Then

\[ \mathrm{col}_j(A + B) = (A + B)e_j = (T + S)(e_j) = T(e_j) + S(e_j) = Ae_j + Be_j = \mathrm{col}_j(A) + \mathrm{col}_j(B) , \]

hence the columns of A + B are the sums of the corresponding columns of A and B. Therefore

\[ (3.19) \qquad (A + B)_{ij} = A_{ij} + B_{ij} ; \]


the sum of matrices is computed entry-wise. Notice that the sum is defined only when the matrices have the same size.

Scaling is similar: if T: R^n → R^m, T(X) = AX, is a linear transformation and c a real number, then cT: R^n → R^m, (cT)(X) = cT(X), is also a linear transformation; its canonical matrix is denoted by cA. Then

\[ (3.20) \qquad \mathrm{col}_j(cA) = (cA)e_j = (cT)(e_j) = cT(e_j) = c(Ae_j) = c\,\mathrm{col}_j(A) , \]

hence

\[ (3.21) \qquad (cA)_{ij} = cA_{ij} ; \]

scaling of matrices is also entry-wise.

Let M_{m,n}(R) be the set of m × n matrices with all entries real numbers. On this set we have defined two operations similar to the linear operations defined on vectors: addition and scaling. From these we can define linear combinations of matrices of the same size, and a lot of properties and results based solely on the linear structure of R^n will continue to be valid for the space M_{m,n}(R).

At this point we have defined several operations on matrices: sum, scaling, and product. The scaling can be regarded as a product, since cA is the product of the matrix diag(c, . . . , c), with diagonal entries equal to c and other entries equal to 0, and the matrix A:

\[ (3.22) \qquad cA = \begin{pmatrix} c & 0 & \cdots & 0 \\ 0 & c & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c \end{pmatrix} A . \]

Sum and scaling are related in a way similar to R^n; sum and product are related to the similar operations on R. For example, by definition, (A + B)X = AX + BX for all column vectors X; more generally,

\[ (3.23) \qquad (A + B)C = AC + BC \]

whenever the operations on the matrices A, B, and C make sense. This is the analogue of the distributivity law for real numbers: (a + b)c = ac + bc.

Extra caution must be taken when dealing with products: for example, if a and b are real numbers, then a^2 − b^2 = (a − b)(a + b). A similar formula for matrices is in general false. Why? First, for A^2 and B^2 to be defined, A and B must be square matrices. Then, for A^2 − B^2 to be defined, the square matrices must have the same size. Even then, A^2 − B^2 is not necessarily equal to (A − B)(A + B) = A^2 + AB − BA − B^2, because AB is not necessarily equal to BA.
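A quick numerical confirmation, with matrices we chose for illustration:

    import numpy as np

    A = np.array([[1., 2.], [3., 4.]])
    B = np.array([[0., 1.], [1., 0.]])
    # False here, precisely because A @ B != B @ A for this pair:
    print(np.allclose((A - B) @ (A + B), A @ A - B @ B))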

There are two real numbers that are very special for sums and products: 0 and 1. For every real number a we have a + 0 = 0 + a = a, a · 1 = 1 · a = a, and a · 0 = 0 · a = 0. Since matrix operations are defined only when certain size compatibility conditions are met, the matrix equivalents of 0 and 1 are


size-dependent. The m × n matrix 0_{m,n}, with all entries equal to 0, is the zero matrix of size m × n. The m × m matrix

\[ (3.24) \qquad I_m = \mathrm{diag}(1, 1, \ldots, 1) = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} , \]

with all diagonal entries equal to 1 and all other entries equal to 0, is the identity matrix of order m. The corresponding linear transformation T: R^m → R^m is the identity transformation T(X) = X.

Then, if A is an m × n matrix,

\[ (3.25) \qquad A + 0_{m,n} = 0_{m,n} + A = A , \]
\[ (3.26) \qquad AI_n = I_mA = A , \]
\[ (3.27) \qquad 0_{p,m}A = 0_{p,n} , \qquad A\,0_{n,q} = 0_{m,q} . \]

If a, b, c, d, g, h, j, k are real numbers, then

\[ (3.28) \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} g & h \\ j & k \end{pmatrix} = \begin{pmatrix} ag + bj & ah + bk \\ cg + dj & ch + dk \end{pmatrix} ; \]

a similar formula holds when multiplying block matrices with blocks of compatible sizes:

\[ (3.29) \qquad \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} G & H \\ J & K \end{pmatrix} = \begin{pmatrix} AG + BJ & AH + BK \\ CG + DJ & CH + DK \end{pmatrix} . \]
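NumPy's np.block assembles a matrix from blocks, which makes (3.29) easy to test; a hedged sketch with random square blocks of compatible sizes:

    import numpy as np

    rng = np.random.default_rng(1)
    A, B, C, D, G, H, J, K = (rng.standard_normal((2, 2)) for _ in range(8))

    left  = np.block([[A, B], [C, D]])
    right = np.block([[G, H], [J, K]])
    blockwise = np.block([[A @ G + B @ J, A @ H + B @ K],
                          [C @ G + D @ J, C @ H + D @ K]])
    print(np.allclose(left @ right, blockwise))   # True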

These properties are particularly helpful when multiplying matrices that have lots of zeroes. For example, if P is an r × n matrix, X an r × 1 column matrix, and Y an n × 1 column matrix, then

\[ (3.30) \qquad \begin{pmatrix} I_r & P \\ 0_{m,r} & 0_{m,n} \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} X + PY \\ 0_{m,1} \end{pmatrix} . \]

Similarly, if A has size r × p, B has size r × q, C has size n × p, and D has size n × q, then

\[ (3.31) \qquad \begin{pmatrix} I_r & 0_{r,n} \\ 0_{m,r} & 0_{m,n} \end{pmatrix} \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} A & B \\ 0_{m,p} & 0_{m,q} \end{pmatrix} . \]


    CHAPTER 4

    Fundamental Forms

    4.1. Fundamental Forms

In a previous section we have seen that if a matrix A is invertible, then its reduced row echelon form rref(A) is the identity, and by keeping track of the row operations we can find its inverse A^{-1}. What if the matrix A is not invertible?

Example. Consider the matrix

\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} . \]

To find rref(A) we perform the following row operations:

• Replace R_2 by R_2 − 4R_1:

\[ \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -4 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \]

• Divide R_2 by −3; equivalently, multiply R_2 by −1/3:

\[ \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -\frac{1}{3} \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -\frac{1}{3} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -4 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \]

• Replace R_1 by R_1 − 2R_2:

\[ \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -\frac{1}{3} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -4 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \]

Therefore

\[ \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \end{pmatrix} = \begin{pmatrix} -\frac{5}{3} & \frac{2}{3} \\ \frac{4}{3} & -\frac{1}{3} \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} . \]

Note that the matrices corresponding to row operations are invertible (as expected, because the row operations themselves are reversible), and therefore their product is also invertible.

The computations above can be generalized to arbitrary sizes.

Proposition 2. Let A be an n-by-m matrix. There exists an invertible n-by-n matrix U such that rref(A) = UA.
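For the example above, U is the product of the three 2-by-2 matrices recorded during the reduction; the following sketch (our addition) checks Proposition 2 in that case:

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [4., 5., 6.]])
    U = np.array([[-5/3,  2/3],
                  [ 4/3, -1/3]])
    print(U @ A)                              # [[ 1.  0. -1.] [ 0.  1.  2.]] = rref(A)
    print(np.isclose(np.linalg.det(U), -1/3)) # True: U is invertible (det = -1/3)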


We can do even better if we allow similar operations on columns, not just on rows: for example, swapping two columns, rescaling a column, or adding to a column a multiple of another column. These operations correspond to multiplications on the right by matrices of the form T_ij(k) or S_ij. For example, we can continue the computations in the example above:

• Replace the third column, C_3, by C_3 + C_1:

\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

• Replace C_3 by C_3 − 2C_2:

\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{pmatrix} . \]

Therefore

\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} -\frac{5}{3} & \frac{2}{3} \\ \frac{4}{3} & -\frac{1}{3} \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{pmatrix} . \]

We do not dwell on the proof of this result now, because we will revisit it later, in the context of linear transformations, when we will better understand its significance and the roles the matrices U and V play. But we will discuss now an important application, related to data compression.

    4.2. Rank of a Matrix

These computations can easily be generalized to arbitrary matrices. For r ≤ min(m, n), let

\[ (4.1) \qquad I_{m,n,r} = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \]

be the m-by-n matrix that has a copy of I_r in the top left corner and zeroes everywhere else. The matrix I_{m,n,r} is called the fundamental form of a rank r matrix of size m-by-n.

And now a very important result.

Theorem. Let A be any m-by-n matrix. There exists a non-negative integer r ≤ min(m, n) and invertible matrices U and V such that

\[ (4.2) \qquad A = U I_{m,n,r} V ; \]

moreover, the value of r is unique.

Proof. The existence follows from the computations in the previous section: we have shown that there is an invertible matrix U such that UA = rref(A), and that, by continuing with column operations, there exists


an invertible matrix V such that UAV is of the form I_{m,n,r}, where r is the number of pivots in the reduced row echelon form. Then

\[ (4.3) \qquad A = U^{-1} I_{m,n,r} V^{-1} \]

and the existence part is proven, because both U^{-1} and V^{-1} are invertible.

The somewhat trickier part is to show that r is unique: we would be in real trouble if some matrix A had reduced row echelon forms with different numbers of pivots, because for each such rref we would get a different value for r. We will show that the value of r is unique by showing that the assumption that it is not leads to an invalid conclusion.

Suppose that there is some matrix A of size m × n, and invertible matrices U_1, U_2, V_1, V_2 such that

\[ (4.4) \qquad U_1 I_{m,n,r_1} V_1 = A = U_2 I_{m,n,r_2} V_2 \]

for values r_1 ≠ r_2; without loss of generality, suppose r_1 > r_2. Then

\[ (4.5) \qquad (U_2^{-1} U_1) I_{m,n,r_1} = I_{m,n,r_2} (V_2 V_1^{-1}) , \]

hence there would exist invertible matrices U and V such that

\[ (4.6) \qquad U I_{m,n,r_1} = I_{m,n,r_2} V . \]

We divide U and V into blocks, with the top left one of dimension r_1 × r_1. Then the equation

\[ (4.7) \qquad \begin{pmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{pmatrix} \begin{pmatrix} I_{r_1} & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} I_{r_1,r_1,r_2} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix} \]

implies U_{21} = 0 and

\[ (4.8) \qquad U_{11} = I_{r_1,r_1,r_2} V_{11} = \begin{pmatrix} I_{r_2} & 0 \\ 0 & 0 \end{pmatrix} V_{11} . \]

In particular, the last r_1 − r_2 rows of U_{11} contain only zero entries. It follows from these facts that in the first r_1 columns of U we can get at most r_2 < r_1 pivots. Therefore the reduced row echelon form rref(U) contains columns without pivots, and that contradicts the fact that U is invertible.

Therefore the number r is unique. □

The uniqueness of the number r allows us to define it as the rank of the matrix A. For example, since

\[ (4.9) \qquad A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 4 & 5 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix} , \]

the rank of the matrix A is 2.
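Factorization (4.9) can be checked directly, and NumPy's matrix_rank agrees with the r in the fundamental form (a sketch we added):

    import numpy as np

    U = np.array([[1., 2.], [4., 5.]])
    F = np.array([[1., 0., 0.],
                  [0., 1., 0.]])         # the fundamental form I_{2,3,2}
    V = np.array([[1., 0., -1.],
                  [0., 1.,  2.],
                  [0., 0.,  1.]])
    A = U @ F @ V
    print(A)                             # [[1. 2. 3.] [4. 5. 6.]]
    print(np.linalg.matrix_rank(A))      # 2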


    4.3. Data Compression

Let A be an n-by-m matrix of rank k. A priori, to encode the matrix A we need to store nm values, namely the entries of A. Using the fundamental form, we can write A as

\[ A = U I_{n,m,k} V = \mathrm{col}_1(U)\,\mathrm{row}_1(I_{n,m,k}V) + \cdots + \mathrm{col}_n(U)\,\mathrm{row}_n(I_{n,m,k}V) \]
\[ = \mathrm{col}_1(U)\,\mathrm{row}_1(V) + \cdots + \mathrm{col}_k(U)\,\mathrm{row}_k(V) , \]

because all the other rows of I_{n,m,k}V are zero. For each column of U we have n values, and for each row of V we have m values. Hence we need m + n values for each term of the sum, for a total of k(m + n) values. When k is close to min(m, n), that's usually more than nm values, but when k is small compared to min(m, n), then k(m + n) is much smaller than nm, and storing just those k columns of U and k rows of V compresses the data significantly.


    CHAPTER 5

    Linear Subspaces

    5.1. Column Space and Null Space

    To solve the linear system

\[ \begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1m}x_m = b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2m}x_m = b_2 \\ \qquad\qquad\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nm}x_m = b_n \end{cases} \]

(in matrix form, AX = b) means to find values for the variables x_1, . . . , x_m such that

\[ x_1 \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n1} \end{pmatrix} + \cdots + x_m \begin{pmatrix} a_{1m} \\ a_{2m} \\ \vdots \\ a_{nm} \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} , \]

hence to write b as a linear combination of the columns of A. Therefore the linear system AX = b is consistent if and only if the vector b can be written as a linear combination of the columns of A, and the solutions of the linear system correspond to all the different ways to write b as a linear combination of the columns of A.

Definition 5.1. Let U = {u_1, u_2, . . . , u_m} be a set of vectors in R^n. The span of the set U is the subset of vectors in R^n that can be written as a linear combination of u_1, u_2, . . . , u_m. The span of U is denoted by span(U) or span(u_1, . . . , u_m).

More explicitly,

\[ \mathrm{span}(u_1, \ldots, u_m) = \{ x_1u_1 + \cdots + x_mu_m \mid x_1, \ldots, x_m \text{ real numbers} \} . \]

Definition 5.2. The column space of a matrix is the span of its column vectors. The row space of a matrix is the span of its row vectors.

The column space of A is denoted by ColSp(A) and the row space by RowSp(A). If A is an n-by-m matrix, then the column space ColSp(A) is a subset of R^n, and the row space RowSp(A) is a subset of R^m. If A^T is the transpose of A, then ColSp(A) = RowSp(A^T) and RowSp(A) = ColSp(A^T).

With this terminology, the linear system AX = b is consistent if and only if the vector b is in the column space of A.

    With this terminology, the linear system AX = b is compatible if andonly if the vector b is in the column space of A.

    49

  • 7/27/2019 main-20130724

    50/74

    50 5. LINEAR SUBSPACES

    Example 5.1. Let

    u1 =

    1

    234

    , u2 = 1

    10

    2

    , and u3 = 2

    132

    be three vectors in R4. The vectors

    0000

    = 0 u1 + 0 u2 + 0 u3 and

    156

    10

    = 2u1 + (1) u2 + 0 u3

    are both in span(u1, u2, u3), because they can be written as linear combina-tions ofu1, u2, and u3. The span of{u1, u2, u3} is the subset ofR

    4 consisting

    of all vectors that can be written like that.To determine whether a vector

    b =

    b1b2b3b4

    is in the subset spanned by u1, u2, and u3, or, equivalently, in the columnspace of the matrix

    A =

    1 1 22 1 1

    3 0 34 2 2

    ,

we need to determine whether there exist some choices of x_1, x_2, and x_3 for which b = x_1u_1 + x_2u_2 + x_3u_3. In other words, we need to determine whether the linear system AX = b is consistent or not.

We do that using Gaussian elimination, but we don't have to go all the way. We can stop at

\[ \left(\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 2 & -1 & 1 & b_2 \\ 3 & 0 & 3 & b_3 \\ 4 & -2 & 2 & b_4 \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 0 & -3 & -3 & b_2 - 2b_1 \\ 0 & 0 & 0 & b_3 - b_2 - b_1 \\ 0 & 0 & 0 & b_4 - 2b_2 \end{array}\right) \]

because at this point we can already see what conditions the vector b should satisfy for the system AX = b to be consistent:

\[ \begin{cases} -b_1 - b_2 + b_3 = 0 \\ -2b_2 + b_4 = 0 \end{cases} \iff \begin{pmatrix} -1 & -1 & 1 & 0 \\ 0 & -2 & 0 & 1 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} . \]


Therefore b is in ColSp(A) if and only if b is a solution of the homogeneous system CY = 0, where

\[ C = \begin{pmatrix} -1 & -1 & 1 & 0 \\ 0 & -2 & 0 & 1 \end{pmatrix} . \]

We can solve this homogeneous system using a variant of Gauss-Jordan elimination: we already have a copy of I_2 inside C, in the third and fourth columns. If we use that copy to solve for b_3 and b_4, then we regard b_1 = s and b_2 = t as parameters and get the solutions as

\[ \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix} = s \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} + t \begin{pmatrix} 0 \\ 1 \\ 1 \\ 2 \end{pmatrix} \]

for some choice of parameters s and t.

In particular, if

    for some choice of parameters s and t.In particular, if

    w =

    2143

    , v1 =

    1010

    and v2 =

    0112

    ,

    then w is not in ColSp(A) (because 2 + 1 = 4), but both v1 and v2 arein ColSp(A). Moreover, a vector b is in ColSp(A) if and only if it can bewritten as a linear combination of v1 and v2. This shows that

    span(u1, u2, u3) = span(v1, v2)

    and

    ColSp

    1 1 22 1 13 0 34 2 2

    = ColSp

    1 00 11 10 2

    .The example we have just finished contains many other ideas that well

    develop further in this chapter.
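The membership test we derived is two linear conditions on b; the sketch below (ours) checks it indirectly, by asking whether the least-squares residual of AX = b vanishes:

    import numpy as np

    A = np.array([[1.,  1., 2.],
                  [2., -1., 1.],
                  [3.,  0., 3.],
                  [4., -2., 2.]])

    def in_colsp(b, tol=1e-10):
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        return bool(np.linalg.norm(A @ x - b) < tol)

    print(in_colsp(np.array([2., 1., 4., 3.])))  # False: w is not in ColSp(A)
    print(in_colsp(np.array([1., 0., 1., 0.])))  # True:  v1 is in ColSp(A)
    print(in_colsp(np.array([0., 1., 1., 2.])))  # True:  v2 is in ColSp(A)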

    Definition 5.3. The null space of a matrix A