Math 121A Linear Algebra
Neil Donaldson
Spring 2016
Text: Linear Algebra, Stephen Friedberg, Arnold Insel & Lawrence Spence, 4th Ed 2003, Prentice Hall.
1 Vector Spaces
1.1 Introduction
What is Linear Algebra?
Linearity is one of the most important properties in mathematics. A function is said to be linear if it preserves addition and scalar multiplication. More precisely, a function $L : V \to W$ between vector spaces $V$ and $W$ is linear if, for all vectors $v_1, v_2 \in V$ and all scalars $\lambda$, we have the following properties:

(a) $L(v_1 + v_2) = L(v_1) + L(v_2)$

(b) $L(\lambda v_1) = \lambda L(v_1)$
Linear algebra is simply the study of linear functions. You have already spent much of your mathematical career studying linear functions. For example:

- If $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$, and $L$ is multiplication by a real $m \times n$ matrix.

- If $L = \frac{d}{dx}$ is the usual differential operator, and $V$ is a vector space of differentiable functions. More generally, $L$ could be a linear differential operator such as $L = \frac{d^2}{dx^2} + 2x\frac{d}{dx} + x^2 + 1$, whence
$$L(y) = y'' + 2xy' + (x^2 + 1)y$$
The standard methods for solving linear differential equations such as $L(y) = 0$ are based on linear algebra.

- If $V$ is a vector space of integrable functions we could similarly define $L(f) = \int_a^x f(t)\,dt$.

- If you've studied group theory, the second part of the formula says that $L : (V,+) \to (W,+)$ is a homomorphism of Abelian groups.
In mathematics the word linear often indicates that a problem or structure is easy to deal with. Linear systems may be analyzed systematically using standard techniques. A non-linear system, by contrast, is likely to be much more difficult to attack: if one can solve a non-linear problem, it is often due to some one-off piece of trickery or luck.
What makes linear problems easy? The essence of why linear problems are easier is that one can use simple solutions as building blocks to construct more complex solutions. For example, the fact that integration is linear is what allows us to compute integrals of polynomials using only the power law:
$$\int x^2 + 5x^3\,dx = \int x^2\,dx + 5\int x^3\,dx \qquad\text{(linearity)}$$
$$= \tfrac{1}{3}x^3 + \tfrac{5}{4}x^4 + c \qquad\text{(power law)}$$
To reiterate, linearity says that we only need to know how to integrate powers $\int x^n\,dx = \frac{1}{n+1}x^{n+1}$ in order to be able to integrate all polynomials.
Here is a trickier example: consider the linear function
$$L : \mathbb{R}^2 \to \mathbb{R}^2$$
which rotates a point 30° clockwise around the origin. You should believe, although it is a tricky exercise at the moment to prove it, that $L$ is indeed linear. To discover a formula for $L$ it is enough to consider what $L$ does to the standard basis of $\mathbb{R}^2$, namely the vectors
$$\mathbf{i} = \begin{pmatrix}1\\0\end{pmatrix} \qquad \mathbf{j} = \begin{pmatrix}0\\1\end{pmatrix}$$
This is because if $v = \begin{pmatrix}x\\y\end{pmatrix}$ is any vector, then, by linearity,
$$v = x\mathbf{i} + y\mathbf{j} \implies L(v) = xL(\mathbf{i}) + yL(\mathbf{j})$$
Using the picture and a little trigonometry, it should be obvious that
$$L(\mathbf{i}) = \begin{pmatrix}\cos 30°\\ -\sin 30°\end{pmatrix} = \begin{pmatrix}\frac{\sqrt{3}}{2}\\ -\frac{1}{2}\end{pmatrix} \qquad L(\mathbf{j}) = \begin{pmatrix}\sin 30°\\ \cos 30°\end{pmatrix} = \begin{pmatrix}\frac{1}{2}\\ \frac{\sqrt{3}}{2}\end{pmatrix}$$
We therefore obtain
$$L\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}\frac{\sqrt{3}}{2}x + \frac{1}{2}y\\ -\frac{1}{2}x + \frac{\sqrt{3}}{2}y\end{pmatrix} = \begin{pmatrix}\frac{\sqrt{3}}{2} & \frac{1}{2}\\ -\frac{1}{2} & \frac{\sqrt{3}}{2}\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}$$
[Figure: the basis vectors $\mathbf{i}, \mathbf{j}$ and their images $L(\mathbf{i}), L(\mathbf{j})$ under the 30° clockwise rotation.]
In the above example, we only needed to know what the function $L$ did to the basis vectors $\mathbf{i}$ and $\mathbf{j}$ in order to completely determine the function. This is not a property shared by non-linear functions. For example, if $|v|$ is the length of $v \in \mathbb{R}^2$, then the function
$$f : \mathbb{R}^2 \to \mathbb{R}^2 : v \mapsto (|v|^2 + 1)v$$
is non-linear. Simply being told that $f(\mathbf{i}) = 2\mathbf{i}$ and $f(\mathbf{j}) = 2\mathbf{j}$ is insufficient for you to completely understand the function.
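These two claims are easy to test numerically. The following is a minimal sketch (my own illustration, not part of the original notes; it assumes numpy is available): the rotation matrix passes the additivity test on sample vectors, while $f$ fails it.

```python
import numpy as np

# 30 degrees clockwise rotation, as computed above
L = np.array([[np.sqrt(3)/2,  1/2],
              [-1/2,          np.sqrt(3)/2]])

def f(v):
    # The non-linear map f(v) = (|v|^2 + 1) v
    return (np.dot(v, v) + 1) * v

v1, v2 = np.array([1.0, 2.0]), np.array([3.0, -1.0])

# Linearity: L(v1 + v2) == L(v1) + L(v2)
print(np.allclose(L @ (v1 + v2), L @ v1 + L @ v2))   # True
# f fails the same test
print(np.allclose(f(v1 + v2), f(v1) + f(v2)))        # False
```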
A Review of $\mathbb{R}^2$ and $\mathbb{R}^3$

You have already spent some time thinking about the simple real vector spaces $\mathbb{R}^2$ and $\mathbb{R}^3$, where vectors may be thought of as arrows joining two points. In the picture we take the vector $v$ to join the origin and the point $P = (x, y)$. Writing $\mathbf{i}, \mathbf{j}$ for the standard basis vectors, we have several notations for $v$:
$$v = \vec{OP} = \begin{pmatrix}x\\y\end{pmatrix} = x\mathbf{i} + y\mathbf{j}$$
The column vector notation is used to distinguish the vector $\begin{pmatrix}x\\y\end{pmatrix}$ from the point $(x, y)$. The vector space $\mathbb{R}^2$ is simply the set of all such vectors.
In three dimensions we have a similar idea, except that a point now has three co-ordinates and we need the three standard basis vectors $\mathbf{i}, \mathbf{j}, \mathbf{k}$.
Scalar multiplication involves lengthening a vector by a real multiple: thus the vector $tv$ has components $tx$ and $ty$ and we may write
$$tv = \vec{OQ} = \begin{pmatrix}tx\\ty\end{pmatrix} = tx\mathbf{i} + ty\mathbf{j}$$
[Figures: the vector $v = \vec{OP}$ and its scaling $tv = \vec{OQ}$, in two and three dimensions.]
Vector addition is defined by the parallelogram law. Algebraically, if $v_1 = \begin{pmatrix}x_1\\y_1\end{pmatrix}$ and $v_2 = \begin{pmatrix}x_2\\y_2\end{pmatrix}$, then
$$v_1 + v_2 = \begin{pmatrix}x_1 + x_2\\ y_1 + y_2\end{pmatrix} = (x_1 + x_2)\mathbf{i} + (y_1 + y_2)\mathbf{j}$$
[Figure: the parallelogram law for $v + w$.]
An important subtlety of vector spaces is that there is no need for the vector $v$ to have its tail at the origin: direction and magnitude are all that matters.¹ This approach allows us to view the opposite edges of the above parallelogram as being the same vector. Vector addition then has the intuitive nose-to-tail interpretation.
[Figure: nose-to-tail addition of $u + v + w$.]

¹To labor the point, a directed line segment joining two points may be described as the ordered pair of those points $\big((a,b),(c,d)\big)$. Two such segments are equivalent vectors if and only if they have the same length and direction. Specifically, if we define $\sim$ on the set of pairs of points in the plane by
$$\big((a,b),(c,d)\big) \sim \big((p,q),(r,s)\big) \iff \begin{cases} c - a = r - p\\ d - b = s - q \end{cases}$$
then $\sim$ is an equivalence relation. A vector is nothing more than an equivalence class of directed line segments under $\sim$.
1.2 Vector Spaces
Vector spaces are the universes of linear algebra. In general, a vector space is a set with two operations (addition and scalar multiplication) which behave similarly to the intuitive structure of $\mathbb{R}^2$. What do we mean by this? There are certain identities which are obvious in $\mathbb{R}^2$, such as commutativity:
$$v + w = w + v$$
You can probably think of several more. The axioms of a vector space are simply that all these obvious identities hold. Precisely, we have the following definition.
Definition 1.1. A vector space (or linear space) $V$ over a field $F$ consists of a set $V$ together with two operations:

Vector Addition: If $v$ and $w$ are elements of $V$ then we can form the sum $v + w$.

Scalar Multiplication: If $v \in V$ and $\lambda \in F$ then we can form the product $\lambda v$.

Together, these sets and operations satisfy the following axioms:

G1: Closure under addition: $\forall v, w \in V$, $v + w \in V$
G2: Associativity of addition: $\forall u, v, w \in V$, $(u + v) + w = u + (v + w)$
G3: Identity for addition: $\exists 0 \in V$ such that $\forall v \in V$, $v + 0 = v$
G4: Inverse for addition: $\forall v \in V$, $\exists -v \in V$ such that $v + (-v) = 0$
G5: Commutativity of addition: $\forall v, w \in V$, $v + w = w + v$
A1: Closure under scalar multiplication: $\forall v \in V$, $\forall \lambda \in F$, $\lambda v \in V$
A2: Identity for scalar multiplication: $\forall v \in V$, $1v = v$
A3: Action of scalar multiplication: $\forall \lambda, \mu \in F$, $\forall v \in V$, $\lambda(\mu v) = (\lambda\mu)v$
D1: Distributivity I: $\forall v, w \in V$, $\forall \lambda \in F$, $\lambda(v + w) = \lambda v + \lambda w$
D2: Distributivity II: $\forall v \in V$, $\forall \lambda, \mu \in F$, $(\lambda + \mu)v = \lambda v + \mu v$

Elements of $V$ are called vectors, while elements of $F$ are scalars.
For those who have studied groups, the first five axioms say that $(V,+)$ is an Abelian group, while the next three say that the field $F$ has a left action on $V$. The distributivity axioms tell us how the two operations interact.
Fields: A field $F$ is a set which behaves very like the real numbers under addition and multiplication. Indeed, in almost all concrete examples of vector spaces that you will encounter, $F$ will be either the real numbers $\mathbb{R}$ or the complex numbers $\mathbb{C}$. The symbols 0 and 1 (as seen in the seventh axiom) will always refer to the additive and multiplicative identities in the field. Be careful to distinguish the scalar $0 \in F$ from the zero vector $0 \in V$.
Inverses and subtraction: Subtraction of vectors can be viewed as a binary operation. It is taken to mean addition of the inverse, namely
$$v - w := v + (-w)$$
In the vector space $\mathbb{R}^2$ this can be viewed pictorially.
[Figure: $v - w$ as the sum of $v$ and $-w$.]
Pictures and Intuition: Strictly speaking, the pictorial arrow interpretation is only valid in the vector spaces $\mathbb{R}^2$ and $\mathbb{R}^3$. This doesn't diminish the use of pictures in other spaces as a guide to your intuition. Any result will still have to be proved using the abstract definition of a vector space and/or the specific properties of your example, but the intuition obtained from drawing a picture can still be helpful for both your and your readers' understanding. This is similar to how Venn diagrams are useful, but do not constitute a proof when considering sets.
Important Examples

n-tuples: If $F$ is any field, then the set $F^n$ of n-tuples forms a vector space over $F$. That is,
$$F^n = \left\{ \begin{pmatrix}a_1\\a_2\\ \vdots\\ a_n\end{pmatrix} : a_1, \ldots, a_n \in F \right\}$$
where addition and scalar multiplication are defined by
$$\begin{pmatrix}a_1\\a_2\\ \vdots\\ a_n\end{pmatrix} + \begin{pmatrix}b_1\\b_2\\ \vdots\\ b_n\end{pmatrix} := \begin{pmatrix}a_1 + b_1\\ a_2 + b_2\\ \vdots\\ a_n + b_n\end{pmatrix} \qquad\text{and}\qquad \lambda\begin{pmatrix}a_1\\a_2\\ \vdots\\ a_n\end{pmatrix} := \begin{pmatrix}\lambda a_1\\ \lambda a_2\\ \vdots\\ \lambda a_n\end{pmatrix}$$
This is precisely the column vector notation we are used to in $\mathbb{R}^2$ and $\mathbb{R}^3$. We refer to the values $a_1, \ldots, a_n$ as the entries or components of a vector. It is tedious to do so, but each of the axioms of a vector space can be checked individually. For example, axiom D2 may be proved as follows:
$$(\lambda + \mu)\begin{pmatrix}a_1\\a_2\\ \vdots\\ a_n\end{pmatrix} = \begin{pmatrix}(\lambda+\mu)a_1\\ (\lambda+\mu)a_2\\ \vdots\\ (\lambda+\mu)a_n\end{pmatrix} = \begin{pmatrix}\lambda a_1 + \mu a_1\\ \lambda a_2 + \mu a_2\\ \vdots\\ \lambda a_n + \mu a_n\end{pmatrix} = \begin{pmatrix}\lambda a_1\\ \lambda a_2\\ \vdots\\ \lambda a_n\end{pmatrix} + \begin{pmatrix}\mu a_1\\ \mu a_2\\ \vdots\\ \mu a_n\end{pmatrix} = \lambda\begin{pmatrix}a_1\\a_2\\ \vdots\\ a_n\end{pmatrix} + \mu\begin{pmatrix}a_1\\a_2\\ \vdots\\ a_n\end{pmatrix}$$
The first, third and fourth equalities are from the definitions of addition and scalar multiplication, while the second equality holds because of the distributivity laws in the field $F$.
$m \times n$ Matrices: If $F$ is any field, then the set $M_{m\times n}(F)$ of $m \times n$ matrices with entries in $F$ forms a vector space. Vector addition in $M_{m\times n}(F)$ is the usual matrix addition, and scalar multiplication simply multiplies all entries of a matrix by the same constant. Indeed, by stacking the columns of a matrix, it should be clear that there is essentially no difference between the vector spaces $M_{m\times n}(F)$ and $F^{mn}$.
Sets of functions: Suppose that $D$ is a set and $F$ is a field. Then the set
$$\mathcal{F}(D, F) = \{ f : D \to F \}$$
of functions with domain $D$ and codomain $F$ forms a vector space over $F$. Vector addition and scalar multiplication are defined as follows:

Addition: If $f, g \in \mathcal{F}(D, F)$, then $f + g$ is the function defined by
$$\forall x \in D,\quad (f + g)(x) = f(x) + g(x)$$

Scalar multiplication: If $f \in \mathcal{F}(D, F)$ and $\lambda \in F$, then $\lambda f$ is the function defined by
$$\forall x \in D,\quad (\lambda f)(x) = \lambda(f(x))$$

It is important to note that $f + g$ and $\lambda f$ are vectors (i.e., functions). By contrast $f(x)$ is a scalar (an element of the field $F$). It is a common mistake² to refer to "the function $f(x)$".
We can restrict to certain types of functions, for example continuous functions, differentiable functions, polynomials, sums of trigonometric functions, etc., provided that these sets are closed under addition. We will think about this more in the next section.
Sequences form an important class of vector spaces. These can be viewed simply as functions whose domain is the set of natural numbers.
Basic Theorems for Vector Spaces

Just as in group theory there are certain basic facts about vector spaces that you will use without thinking. Strictly speaking, however, if these facts are not axioms, then they need to be proved.

Lemma 1.2.
1. Cancellation law: $x + z = y + z \implies x = y$.
2. Uniqueness of identity: The zero vector $0$ posited in axiom G3 is unique.
3. Uniqueness of inverse: Given $v \in V$, the vector $-v$ posited in axiom G4 is unique.
4. Action of additive identity in $F$: $\forall v \in V$, we have $0v = 0$.
5. Action of negatives: $\forall v \in V$, $\forall \lambda \in F$, we have $(-\lambda)v = -(\lambda v)$.
6. Action on zero vector: $\forall \lambda \in F$, we have $\lambda 0 = 0$.

Most of these are left as exercises: they are easiest if proved in order. For an example argument, consider number 4. Since $0 = 0 + 0$ in any field $F$, we apply axioms D2, G3, G5 and the cancellation law to see that
$$0v = (0 + 0)v = 0v + 0v \qquad\text{(Distributivity D2)}$$
$$\implies 0 + 0v = 0v + 0v \qquad\text{(Identity G3 and Commutativity G5)}$$
$$\implies 0 = 0v \qquad\text{(Cancellation Law)}$$
²Endemic among calculus students...
1.3 Subspaces

As in other areas of algebra,³ the prefix sub means that an object is a subset, while simultaneously retaining the algebraic structure of the original set.

Definition 1.3. Let $V$ be a vector space over $F$. A subset $W \subseteq V$ is a subspace of $V$ if it is also a vector space over $F$ with respect to the same addition and scalar multiplication operations as $V$.
A subspace $W$ is proper if it is a proper subset (i.e., $W \neq V$). The trivial subspace of $V$ is the point set $\{0\}$.
As a shorthand, we write $W \leq V$ for a subspace, to distinguish from $W$ merely being a subset.
All of the axioms of a vector space except G1, G3, G4 and A1 hold for any subset of $V$, so it is sufficient to check these. In fact, we need only check the two closure axioms, as the next result shows.

Theorem 1.4. Suppose that $W$ is a non-empty subset of a vector space $V$ over $F$. Then $W$ is a subspace if and only if the following two properties hold:

S1: Closed under addition: $\forall w_1, w_2 \in W$, we have $w_1 + w_2 \in W$.
S2: Closed under scalar multiplication: $\forall w \in W$, $\forall \lambda \in F$, we have $\lambda w \in W$.

Proof. If $W$ is a subspace of $V$, then S1 and S2 are simply the axioms G1 and A1, whence the above properties hold.
Conversely, suppose that the properties S1 and S2 hold. We therefore have that all of the axioms of a vector space except G3 and G4 hold for $W$. It remains to prove that these are also satisfied.
Since $W$ is non-empty, we may choose some $w \in W$. By Lemma 1.2, part 4, and property S2, we see that
$$0 = 0w \in W$$
Thus axiom G3 is satisfied.
Now let $-w \in V$ be the additive inverse (in $V$) of the vector $w \in W$. We need to see that $-w \in W$. But this is immediate since
$$-w = (-1)w \in W$$
by Lemma 1.2, part 5, and property S2. ∎

Note: one could also state the Theorem by additionally requiring that $0 \in W$. This removes the need to assume that $W$ is non-empty.
Examples

1. If $n \leq m$ then we may consider the subspace $W$ of $\mathbb{R}^m$ consisting of all vectors of the form
$$w = \begin{pmatrix}w_1\\ \vdots\\ w_m\end{pmatrix} \quad\text{where}\quad i > n \implies w_i = 0$$

³Cf. subgroup, subring, subfield, etc.
In essence, $W$ looks like the set of column vectors of the form
$$w = \begin{pmatrix}x\\0\end{pmatrix} \quad\text{where } x \in \mathbb{R}^n \text{ and } 0 \in \mathbb{R}^{m-n}$$
Some writers will use the notation $\mathbb{R}^n$ to mean any vector space over $\mathbb{R}$ which looks like⁴ the space of column vectors of length $n$. In this language we can therefore write
$$n \leq m \implies \mathbb{R}^n \leq \mathbb{R}^m$$
The challenge with this more general idea of $\mathbb{R}^n$ is that there are now many ways in which $\mathbb{R}^n$ could be viewed as a subspace of $\mathbb{R}^m$.
2. Let $I \subseteq \mathbb{R}$ be an open interval. We have seen that $V = \mathcal{F}(I, \mathbb{R})$ is a vector space over $\mathbb{R}$. The subset
$$C(I, \mathbb{R}) = \{ f \in V : f \text{ is continuous} \}$$
is a subspace of $V$. We simply need to check the Theorem:

S1: If $f, g : I \to \mathbb{R}$ are continuous, then $f + g : I \to \mathbb{R}$ is continuous.
S2: If $f : I \to \mathbb{R}$ is continuous and $\lambda \in \mathbb{R}$, then $\lambda f : I \to \mathbb{R}$ is continuous.

The proofs you give for these facts depend on using the definition of continuity.⁵
3. The vector space $C^1(I, \mathbb{R})$ of functions $f : I \to \mathbb{R}$ which are differentiable and with continuous derivative similarly forms a subspace of $C(I, \mathbb{R})$. This can be extended naturally to the spaces $C^m(I, \mathbb{R})$ and even $C^\infty(I, \mathbb{R})$. All power series which converge on $I$ are infinitely differentiable, and thus are elements of the vector space $C^\infty(I, \mathbb{R})$.

4. The space of degree $\leq n$ real polynomials $P_n(\mathbb{R})$ is a subspace of every $C^m(\mathbb{R}, \mathbb{R})$. Similarly the vector space of all polynomials $P(\mathbb{R})$.
5. The trace of an $n \times n$ matrix is the function $\operatorname{tr} : M_n(F) \to F$ defined by
$$\operatorname{tr} A = \sum_{i=1}^n a_{ii} = a_{11} + a_{22} + \cdots + a_{nn}$$
That is, we sum the terms on the main diagonal. The subset of trace-free matrices is denoted⁶
$$\mathfrak{sl}_n(F) = \{ A \in M_n(F) : \operatorname{tr} A = 0 \}$$
It is easy to check that $\mathfrak{sl}_n(F) \leq M_n(F)$; see the numerical sketch after the footnotes.

⁴The correct term is isomorphic to, as we will see later.
⁵For example, using the fact that $f$ is continuous at $x = a$ if $\lim_{x\to a} f(x) = f(a)$, you need to observe that
$$\lim_{x\to a}[f(x) + g(x)] = f(a) + g(a) = (f + g)(a)$$
⁶If you've taken group theory, the notation $\mathfrak{sl}_n$ should remind you of the special linear group $SL_n$. This is no accident: since $\det e^A = e^{\operatorname{tr} A}$, the relationship is that $A \in \mathfrak{sl}_n \implies \exp(A) \in SL_n$.
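Returning to example 5: the closure properties S1 and S2 follow from the linearity of the trace, and can be spot-checked numerically. A minimal sketch, my own illustration (not from the notes), assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_trace_free(n):
    """Build a random n x n matrix, then kill its trace via one diagonal entry."""
    A = rng.standard_normal((n, n))
    A[0, 0] -= np.trace(A)
    return A

A, B = random_trace_free(3), random_trace_free(3)

# S1 and S2 from Theorem 1.4: closure under addition and scalar multiplication
print(np.isclose(np.trace(A + B), 0.0))    # True
print(np.isclose(np.trace(2.5 * A), 0.0))  # True
```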
Intersections and Direct Sums

Since vector spaces are sets, we may take unions and intersections of them. Unfortunately only one of these is a vector space...

Theorem 1.5. If $V$ and $W$ are both subspaces of some larger vector space $U$, then their intersection $V \cap W$ is a subspace of both $V$ and $W$.

Proof. Since $V$ and $W$ are both subspaces of $U$, they both contain $0$ and so $V \cap W$ is non-empty.
Now suppose that $x, y \in V \cap W$ and $\lambda \in F$. Then, since $V$ and $W$ are both vector spaces, they are closed under addition and scalar multiplication, whence we have
$$x + y \in V,\quad x + y \in W,\quad \lambda x \in V,\quad \lambda x \in W$$
But then $x + y \in V \cap W$ and $\lambda x \in V \cap W$, whence properties S1 and S2 hold. $V \cap W$ is therefore a subspace of both $V$ and $W$. ∎
Example. Suppose that
$$V = \{x\mathbf{i} + z\mathbf{k} : x, z \in \mathbb{R}\} \qquad W = \{y\mathbf{j} + z\mathbf{k} : y, z \in \mathbb{R}\}$$
are the xz- and yz-planes respectively. Both $V$ and $W$ are clearly subspaces of $\mathbb{R}^3$. Their intersection is the subspace
$$V \cap W = \{z\mathbf{k} : z \in \mathbb{R}\}$$
otherwise known as the z-axis.

If we try to do the same thing for unions we hit a problem. Think of the easy counterexample: let $V = \{x\mathbf{i} : x \in \mathbb{R}\}$ and $W = \{y\mathbf{j} : y \in \mathbb{R}\}$ be the x- and y-axes, viewed as subspaces of $\mathbb{R}^2$. Their intersection is the trivial subspace $V \cap W = \{0\}$. However, their union
$$V \cup W = \{x\mathbf{i},\ y\mathbf{j} : x, y \in \mathbb{R}\}$$
is not a subspace of $\mathbb{R}^2$. It is nothing more than the position vectors of all the points on both axes. In particular, $V \cup W$ is not closed under addition:
$$\mathbf{i} \in V \text{ and } \mathbf{j} \in W \quad\text{but}\quad \mathbf{i} + \mathbf{j} \notin V \cup W$$
[Figure: $\mathbf{i} + \mathbf{j}$ lies outside the union of the axes $V$ and $W$.]
Instead we search for the smallest vector space which contains the union of $V$ and $W$.

Definition 1.6. Suppose that $V$ and $W$ are subspaces of $U$ with trivial intersection ($V \cap W = \{0\}$). The direct sum⁷ of $V$ and $W$ is the set
$$V \oplus W = \{v + w : v \in V,\ w \in W\}$$

⁷If we remove the requirement that $V \cap W$ be trivial, the set $V + W := \{v + w : v \in V, w \in W\}$ is called the sum of $V$ and $W$. We only use the circled-plus $\oplus$ symbol when $V \cap W = \{0\}$.
Examples

1. If $V = \{x\mathbf{i} : x \in \mathbb{R}\}$ and $W = \{y\mathbf{j} : y \in \mathbb{R}\}$ are both subspaces of $\mathbb{R}^2$, then $V \oplus W = \mathbb{R}^2$.

2. More generally, suppose that $V = \{tv : t \in \mathbb{R}\}$ and $W = \{sw : s \in \mathbb{R}\}$ are distinct, proper, non-trivial subspaces of $\mathbb{R}^2$. If we let $w^\perp$ be any vector perpendicular to $w$, then we observe, for any $x \in \mathbb{R}^2$, that
$$x - tv \in W \iff (x - tv) \cdot w^\perp = 0 \iff t = \frac{x \cdot w^\perp}{v \cdot w^\perp}$$
where $\cdot$ is the usual dot product of vectors. Since $v \notin W$ it is immediate that $v \cdot w^\perp \neq 0$, whence $t$ is properly defined. If we now choose $s$ so that $sw = x - tv$, it follows that
$$x = tv + sw$$
is the unique decomposition of $x$ in terms of $V$ and $W$. In particular, this shows that $V \oplus W = \mathbb{R}^2$.
[Figure: decomposing $x$ as $tv + sw$.]
The following properties of direct sums are straightforward to prove from the definition. Try it!

Theorem 1.7.
1. $V \oplus W$ is a subspace of $U$.
2. $V$ and $W$ are subspaces of $V \oplus W$.
3. If $X$ is a subspace of $U$ with the property that both $V$ and $W$ are subspaces of $X$, then $V \oplus W$ is a subspace of $X$.
4. If $x \in V \oplus W$, then there exist unique $v \in V$ and $w \in W$ such that $x = v + w$.

The third property essentially says that $V \oplus W$ is the smallest vector space containing both $V$ and $W$ as subspaces.
Advanced: a more general notion of direct sum. The final property allows us to make an alternative definition of direct sum, one which will look more familiar if you have studied group theory.

Definition 1.8. Suppose that $V$ and $W$ are any vector spaces over the same field $F$. Their direct sum is the vector space
$$V \oplus W := \{(v, w) : v \in V,\ w \in W\}$$
of ordered pairs, where addition and scalar multiplication are defined by
$$(v_1, w_1) + (v_2, w_2) := (v_1 + v_2, w_1 + w_2) \qquad \lambda(v, w) := (\lambda v, \lambda w)$$
In this definition, the vector space $V$ is in bijective correspondence with the subspace
$$\hat{V} := \{(v, 0_W) : v \in V\} \leq V \oplus W$$
$V \oplus W$ in the original definition is then the same as $\hat{V} \oplus \hat{W}$ under the new.
1.4 Linear Combinations

Definition 1.9. Suppose that $V$ is a vector space over $F$ and that $\{v_1, \ldots, v_n\}$ is a non-empty collection of vectors in $V$. A linear combination of these vectors is any vector of the form
$$a_1 v_1 + \cdots + a_n v_n \qquad (*)$$
where $a_1, \ldots, a_n \in F$ are the coefficients of the linear combination.
More generally, if $S$ is a non-empty subset of $V$, then a linear combination of vectors in $S$ is any expression of the form $(*)$ where all⁸ $v_1, \ldots, v_n \in S$.
The span of $S$ is the subset of all linear combinations of vectors in $S$:
$$\operatorname{Span}(S) = \{a_1 v_1 + \cdots + a_n v_n : n \in \mathbb{N},\ a_1, \ldots, a_n \in F,\ v_1, \ldots, v_n \in S\}$$
It is a convention that $\operatorname{Span}(\emptyset)$ is the trivial subspace $\{0\}$.
Examples

1. Let $S = \{\mathbf{i}, \mathbf{k}\} \subseteq \mathbb{R}^3$. The span of $S$ is the set of all linear combinations of the vectors $\mathbf{i}$ and $\mathbf{k}$: this is simply the xz-plane
$$\operatorname{Span}(S) = \{a\mathbf{i} + b\mathbf{k} : a, b \in \mathbb{R}\}$$

2. Let $S = \{v, w\} \subseteq \mathbb{R}^3$ where
$$v = \begin{pmatrix}1\\2\\-1\end{pmatrix} \qquad w = \begin{pmatrix}-1\\1\\2\end{pmatrix}$$
Then
$$\operatorname{Span}(S) = \left\{ a\begin{pmatrix}1\\2\\-1\end{pmatrix} + b\begin{pmatrix}-1\\1\\2\end{pmatrix} : a, b \in \mathbb{R} \right\}$$
These vectors comprise the plane through the origin spanned by $v$ and $w$: hence the use of the word span.
These examples should immediately suggest the following result to you:

Theorem 1.10. If $S$ is a subset of a vector space $V$, then $\operatorname{Span}(S)$ is a subspace of $V$.

Proof. According to Theorem 1.4 we need only show that $\operatorname{Span}(S)$ is closed under addition and scalar multiplication. This is tedious to write out, but comes straight from the definition of span.
Let $v, w \in \operatorname{Span}(S)$ and $\lambda \in F$. It follows that there exist vectors
$$v_1, \ldots, v_n, w_1, \ldots, w_m \in S$$
and scalars
$$a_1, \ldots, a_n, b_1, \ldots, b_m \in F$$

⁸Note: a linear combination must contain only finitely many terms.
such that
$$v = a_1 v_1 + \cdots + a_n v_n, \qquad w = b_1 w_1 + \cdots + b_m w_m$$
But then
$$v + w = a_1 v_1 + \cdots + a_n v_n + b_1 w_1 + \cdots + b_m w_m \in \operatorname{Span}(S)$$
and
$$\lambda v = \lambda a_1 v_1 + \cdots + \lambda a_n v_n \in \operatorname{Span}(S)$$
∎

Now think about why the following is an obvious corollary of our proof.

Corollary 1.11. If $W$ is a subspace of $V$ which contains all elements of a subset $S$ of $V$, then $\operatorname{Span}(S)$ is a subspace of $W$.
Generating sets

One of the primary purposes of considering spans of subsets of a vector space is to answer the following question:

What are the smallest subsets $S \subseteq V$ such that $\operatorname{Span}(S) = V$?

Such subsets $S$ will be known as bases of $V$. Before we get there, we will need to think, in the next section, about linear independence. In the meantime we can give a preliminary name to subsets which span $V$:

Definition 1.12. Suppose that $S$ is a subset of $V$ such that $\operatorname{Span}(S) = V$. We say that $S$ generates $V$.
Examples

1. $S = \{\mathbf{i}, \mathbf{j}\}$ generates $\mathbb{R}^2$.

2. $S = \{1 + x + x^2,\ x - x^2,\ 2 + 3x^2,\ 4x\}$ generates the vector space $P_2(\mathbb{R})$ of polynomials over $\mathbb{R}$ of degree $\leq 2$.
This second example may appear a little tricky, but it can be approached using a familiar matrix method from elementary linear algebra. Recall that
$$P_2(\mathbb{R}) = \{a + bx + cx^2 : a, b, c \in \mathbb{R}\}$$
Certainly $S \subseteq P_2(\mathbb{R})$. If we are to see that $\operatorname{Span}(S) = P_2(\mathbb{R})$ we need to see that any polynomial $a + bx + cx^2$ can be written as a linear combination of the elements of $S$. That is, we are required to solve the following problem: given any $a, b, c \in \mathbb{R}$, find coefficients $p, q, r, s \in \mathbb{R}$ such that
$$a + bx + cx^2 = p(1 + x + x^2) + q(x - x^2) + r(2 + 3x^2) + s(4x) \qquad (\dagger)$$
$$= (p + 2r) + (p + q + 4s)x + (p - q + 3r)x^2$$
Since two polynomials are equal if and only if their coefficients are equal, the problem becomes: given any $a, b, c \in \mathbb{R}$ find $p, q, r, s \in \mathbb{R}$ such that
$$a = p + 2r \qquad b = p + q + 4s \qquad c = p - q + 3r$$
Otherwise said, we are looking for a solution to the underdetermined matrix problem
$$\begin{pmatrix}a\\b\\c\end{pmatrix} = \begin{pmatrix}1 & 0 & 2 & 0\\ 1 & 1 & 0 & 4\\ 1 & -1 & 3 & 0\end{pmatrix}\begin{pmatrix}p\\q\\r\\s\end{pmatrix}$$
which can be represented as an augmented matrix
$$\left(\begin{array}{cccc|c}1 & 0 & 2 & 0 & a\\ 1 & 1 & 0 & 4 & b\\ 1 & -1 & 3 & 0 & c\end{array}\right)$$
Performing basic row operations, we can put the augmented matrix in reduced row echelon form:
$$\left(\begin{array}{cccc|c}1 & 0 & 2 & 0 & a\\ 0 & 1 & -2 & 4 & b-a\\ 0 & -1 & 1 & 0 & c-a\end{array}\right) \to \left(\begin{array}{cccc|c}1 & 0 & 2 & 0 & a\\ 0 & 1 & -2 & 4 & b-a\\ 0 & 0 & -1 & 4 & b+c-2a\end{array}\right) \to \left(\begin{array}{cccc|c}1 & 0 & 0 & 8 & 2b+2c-3a\\ 0 & 1 & 0 & -4 & 3a-b-2c\\ 0 & 0 & 1 & -4 & 2a-b-c\end{array}\right)$$
This says that we have three leading variables $p, q, r$ and one free variable $s$. Since $s$ can be any value we like, we may therefore take the solution
$$\begin{pmatrix}p\\q\\r\\s\end{pmatrix} = \begin{pmatrix}2b+2c-3a\\ 3a-b-2c\\ 2a-b-c\\ 0\end{pmatrix}$$
which, it may be readily checked, satisfies $(\dagger)$.
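As a sanity check, one can let a computer algebra system do the row reduction. The following is a minimal sketch (my own illustration, not from the notes), assuming sympy is available; it reproduces the reduced row echelon form symbolically.

```python
from sympy import Matrix, symbols

a, b, c = symbols('a b c')

# Augmented matrix for p(1+x+x^2) + q(x-x^2) + r(2+3x^2) + s(4x) = a + bx + cx^2
M = Matrix([[1,  0, 2, 0, a],
            [1,  1, 0, 4, b],
            [1, -1, 3, 0, c]])

rref, pivots = M.rref()
print(pivots)   # (0, 1, 2): p, q, r lead; s is free
print(rref)     # matches the reduced form computed by hand
```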
3. As a final example, we show that, in the vector space $P_3(\mathbb{R})$, the vector $v = x^3$ lies in the span of
$$S = \{1 - 2x^2,\ 1 + x - x^2,\ 1 + 2x + x^3\}$$
and that the vector $w = 1 + 3x + x^3$ does not.
(a) For the first part, we need to find coefficients $p, q, r$ such that
$$p(1 - 2x^2) + q(1 + x - x^2) + r(1 + 2x + x^3) = x^3$$
$$\iff \begin{cases} p + q + r = 0\\ q + 2r = 0\\ -2p - q = 0\\ r = 1 \end{cases}$$
This corresponds to the following augmented matrix, which we can easily put in reduced row echelon form:
$$\left(\begin{array}{ccc|c}1 & 1 & 1 & 0\\ 0 & 1 & 2 & 0\\ -2 & -1 & 0 & 0\\ 0 & 0 & 1 & 1\end{array}\right) \to \left(\begin{array}{ccc|c}1 & 0 & 0 & 1\\ 0 & 1 & 0 & -2\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 0\end{array}\right)$$
Since the last line corresponds to the consistent equation $0p + 0q + 0r = 0$, we conclude that $p = 1$, $q = -2$, $r = 1$ are suitable coefficients and that $v \in \operatorname{Span}(S)$.
(b) For the second part, we try to find coefficients $p, q, r$ such that
$$p(1 - 2x^2) + q(1 + x - x^2) + r(1 + 2x + x^3) = 1 + 3x + x^3$$
$$\iff \begin{cases} p + q + r = 1\\ q + 2r = 3\\ -2p - q = 0\\ r = 1 \end{cases} \qquad (\ddagger)$$
This corresponds to the following augmented matrix, which we can easily put in reduced row form:
$$\left(\begin{array}{ccc|c}1 & 1 & 1 & 1\\ 0 & 1 & 2 & 3\\ -2 & -1 & 0 & 0\\ 0 & 0 & 1 & 1\end{array}\right) \to \left(\begin{array}{ccc|c}1 & 0 & 0 & 0\\ 0 & 1 & 0 & 1\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1\end{array}\right)$$
Since the last line corresponds to the inconsistent equation $0p + 0q + 0r = 1$, we conclude that there are no coefficients $p, q, r$ satisfying $(\ddagger)$ and that, consequently, $w \notin \operatorname{Span}(S)$.
You should recall that augmented matrices simply encode the coefficients of a system of linear equations, and that each row operation merely replaces a system with a new system whose solution set is identical. In the context of the above example, the system of equations $(\ddagger)$ has identical solution set to the system
$$p = 0 \qquad q = 1 \qquad r = 1 \qquad 0 = 1$$
Since $0 = 1$ is a contradiction, the solution set is the empty set $\emptyset$. The original system $(\ddagger)$ therefore has no solutions.
1.5 Linear Dependence and Linear Independence

When considering the span of a set of vectors, we often have an inbuilt redundancy. For example, suppose that $\{v_1, \ldots, v_4\}$ is a generating set for the vector space $V$: that is
$$V = \operatorname{Span}\{v_1, \ldots, v_4\}$$
Suppose also that $v_2$ lies in the span of the other three vectors. Otherwise said, if $S = \{v_1, v_3, v_4\}$, then $v_2 \in \operatorname{Span}(S)$. It should be intuitively obvious that $v_2$ is redundant when it comes to generating $V$: that is,
$$V = \operatorname{Span}(S)$$
and $S$ is a smaller generating set for $V$.
The fact that $v_2 \in \operatorname{Span}\{v_1, v_3, v_4\}$ means that there exists a linear combination of $v_1, v_3, v_4$ equalling $v_2$:
$$\exists a_1, a_3, a_4 \in F \text{ such that } v_2 = a_1 v_1 + a_3 v_3 + a_4 v_4$$
The expression
$$a_1 v_1 - v_2 + a_3 v_3 + a_4 v_4 = 0$$
whereby a linear combination equals the zero vector, is known as a linear dependence. As we shall see, if you can remove a vector (in this case $v_2$) from a generating set while still generating the same space, then the generating set must be linearly dependent.
Definition 1.13. Suppose that $S = \{v_1, \ldots, v_n\}$ is a subset of a vector space $V$. We say that $S$ is a linearly dependent set if there exist scalars $a_1, \ldots, a_n \in F$, not all zero,⁹ for which
$$a_1 v_1 + \cdots + a_n v_n = 0$$
Such an equation is termed a linear dependence of the vectors $v_1, \ldots, v_n$.
If the vectors of $S$ are not linearly dependent, we say that they are linearly independent.
Examples

1. The vectors $v_1 = \begin{pmatrix}2\\1\\0\end{pmatrix}$, $v_2 = \begin{pmatrix}1\\1\\2\end{pmatrix}$ and $v_3 = \begin{pmatrix}7\\5\\6\end{pmatrix}$ are linearly dependent since
$$2v_1 + 3v_2 - v_3 = 0$$

2. Are the polynomials $v_1 = 1 - x^2$, $v_2 = x + 2x^2$ and $v_3 = 1 + 2x - x^2$ linearly dependent in the vector space $P_2(\mathbb{R})$?

⁹This condition is crucial! You can always write $0 = 0v_1 + \cdots + 0v_n$ (this is known as a trivial representation of $0$), but it tells you nothing about the vectors $v_1, \ldots, v_n$. A linear dependence is therefore a non-trivial representation of the zero vector.
$\{v_1, v_2, v_3\}$ are linearly dependent if and only if there exists a non-trivial solution $(a_1, a_2, a_3)$ to the system of linear equations
$$a_1(1 - x^2) + a_2(x + 2x^2) + a_3(1 + 2x - x^2) = 0$$
$$\iff \begin{cases} a_1 + a_3 = 0\\ a_2 + 2a_3 = 0\\ -a_1 + 2a_2 - a_3 = 0 \end{cases}$$
It is readily seen that the only solution is the trivial $(a_1, a_2, a_3) = (0, 0, 0)$, whence the polynomials are linearly independent.
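Such checks can be automated: the polynomials are independent exactly when the matrix whose columns are their coefficient vectors (with respect to $\{1, x, x^2\}$) has full rank. A minimal sketch, my own illustration assuming numpy:

```python
import numpy as np

# Columns: coefficient vectors of 1 - x^2, x + 2x^2, 1 + 2x - x^2
# with respect to the standard basis {1, x, x^2} of P_2(R)
A = np.array([[ 1, 0,  1],
              [ 0, 1,  2],
              [-1, 2, -1]])

# Full column rank <=> only the trivial solution <=> linearly independent
print(np.linalg.matrix_rank(A) == 3)  # True
```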
The polynomial example above illustrates the alternative definition of linear independence:¹⁰

Definition 1.14. Vectors $v_1, \ldots, v_n$ are linearly independent if
$$a_1 v_1 + \cdots + a_n v_n = 0 \implies a_1 = a_2 = \cdots = a_n = 0$$

Can we extend a linearly independent set?

Our main goal is the construction of a minimal generating set for a vector space. With this in mind, we consider what happens when we either shrink or extend certain sets of vectors. The next lemma should be easy for you to prove yourself.

Lemma 1.15. Suppose that $V$ is a vector space and suppose that $S_1 \subseteq S_2$ are nested subsets of $V$. Then:
1. If $S_1$ is linearly dependent, so is $S_2$.
2. If $S_2$ is linearly independent, so is $S_1$.
Next we consider extending a linearly independent subset:

Theorem 1.16. Suppose that $S$ is a linearly independent subset of $V$ and suppose that $v \in V$. Then $S \cup \{v\}$ is linearly independent if and only if $v \notin \operatorname{Span}(S)$.

Proof. ($\Rightarrow$) Suppose that $v \in \operatorname{Span}(S)$. Then there exists a finite subset $\{v_1, \ldots, v_n\} \subseteq S$ and scalars $a_1, \ldots, a_n$ such that
$$v = a_1 v_1 + \cdots + a_n v_n$$
But this says that $a_1 v_1 + \cdots + a_n v_n - v = 0$ is a linear dependence and so $S \cup \{v\}$ is linearly dependent.
($\Leftarrow$) Conversely, suppose that $v \notin \operatorname{Span}(S)$ and that $S \cup \{v\}$ is linearly dependent. Then there exist scalars $a, a_1, \ldots, a_n$ such that
$$av + a_1 v_1 + \cdots + a_n v_n = 0 \qquad (*)$$
If $a \neq 0$, we see that $v = -\frac{1}{a}(a_1 v_1 + \cdots + a_n v_n) \in \operatorname{Span}(S)$, which contradicts our assumption. It follows that $a = 0$, whence $(*)$ is a linear dependence of $S$, also a contradiction. ∎

¹⁰Recalling negation of quantifiers from elementary logic:
$$\forall a_i \in F,\quad a_1 v_1 + \cdots + a_n v_n = 0 \implies a_1 = a_2 = \cdots = a_n = 0$$
has negation
$$\exists a_i \in F \text{ such that } a_1 v_1 + \cdots + a_n v_n = 0 \text{ and } a_1, \ldots, a_n \text{ are not all zero}$$
Example. We revisit the example on page 11. Let $S = \{v, w\} \subseteq \mathbb{R}^3$, where $v = \begin{pmatrix}1\\2\\-1\end{pmatrix}$ and $w = \begin{pmatrix}-1\\1\\2\end{pmatrix}$. These are linearly independent.

1. If we let $u = \begin{pmatrix}2\\2\\0\end{pmatrix}$, then we easily see that $u \notin \operatorname{Span}(S)$. Indeed, if $u$ were in $\operatorname{Span}(S)$, then there would exist $a, b \in \mathbb{R}$ such that
$$u = a\begin{pmatrix}1\\2\\-1\end{pmatrix} + b\begin{pmatrix}-1\\1\\2\end{pmatrix} = \begin{pmatrix}1 & -1\\ 2 & 1\\ -1 & 2\end{pmatrix}\begin{pmatrix}a\\b\end{pmatrix}$$
Since the augmented matrix
$$\left(\begin{array}{cc|c}1 & -1 & 2\\ 2 & 1 & 2\\ -1 & 2 & 0\end{array}\right) \quad\text{has reduced row form}\quad \left(\begin{array}{cc|c}1 & -1 & 2\\ 0 & 1 & 2\\ 0 & 0 & 1\end{array}\right)$$
whose last line reads $0 = 1$, we would obtain a contradiction.
It follows that $\{u, v, w\}$ is a linearly independent set. Indeed, in this case we have
$$\operatorname{Span}\{u, v, w\} = \mathbb{R}^3$$

2. If we let $d = \begin{pmatrix}0\\6\\2\end{pmatrix}$, then
$$d = 2v + 2w$$
whence $d \in \operatorname{Span}\{v, w\}$ and $\{d, v, w\}$ is a linearly dependent set.
In the picture, it should be clear that $d$ lies in the plane spanned by $v, w$ and that $u$ does not.
1.6 Bases and Dimension
The concept of a basis for a vector space V is extremely
important. A basis can be thought of as either:
1. A linearly independent subset of V which is as large as
possible; or,
2. A generating set for V which is as small as possible.
Definition 1.17. A basis for a vector space V is a linearly
independent generating set for V.
Standard Bases: Many common vector spaces have special bases which are used more often than all others: these are known as standard bases. You should convince yourself that these bases really satisfy the definition.

Vector Space $V$     Standard Basis
$\mathbb{R}^2$       $\{\mathbf{i}, \mathbf{j}\}$
$\mathbb{R}^3$       $\{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$
$F^n$                $\{e_1, \ldots, e_n\}$ where $e_i$ is the column vector with $i$th entry 1 and all other entries 0
$M_{m\times n}(F)$   $\{E_{ij} : 1 \leq i \leq m,\ 1 \leq j \leq n\}$ where $E_{ij}$ is the matrix with $ij$th entry 1 and all other entries 0
$P_n(F)$             $\{1, x, x^2, \ldots, x^n\}$
$P(F)$               $\{1, x, x^2, x^3, \ldots\}$
For example, the standard bases of $P_3(\mathbb{R})$ and $M_2(\mathbb{R})$ are, respectively,
$$\{1, x, x^2, x^3\} \quad\text{and}\quad \{E_{11}, E_{12}, E_{21}, E_{22}\} = \left\{ \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}, \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}, \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix}, \begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix} \right\}$$
Think about our two conditions above for each of these examples. Can you add a vector to any of these bases so that the new set is still linearly independent? Can you remove a vector from any of the bases and still have a generating set? Hopefully your answer to both questions is always no. Indeed properties 1 and 2 hold in general.

1. If $\beta$ is a basis, then $\beta$ generates $V$, whence $\operatorname{Span}(\beta) = V$. It follows that, for any non-zero $v \in V$, we have $\beta \cup \{v\}$ linearly dependent (Theorem 1.16). We cannot therefore make the basis any larger without it failing to be linearly independent.

2. If $\beta$ is a basis and $v \in \beta$ then $\beta' := \beta \setminus \{v\}$ is certainly linearly independent (Lemma 1.15). However
$$v \notin \operatorname{Span}(\beta')$$
whence $\beta'$ is no longer a generating set for $V$.
Construction and Existence of Bases for Finite Dimensional Vector Spaces

Standard bases for the common examples above are all well and good, but we need to know whether all vector spaces have a basis. Almost as important, we need some strategies for finding them. This process is fairly difficult even for finite dimensional vector spaces. For infinite dimensional vector spaces, we postpone the discussion to the next section. The critical component and major challenge is the Exchange Theorem (or Replacement Theorem) which follows. Read it once then try an example or two: you are unlikely to get comfortable with it on the first read-through!

Definition 1.18. A vector space $V$ is termed finite dimensional if it is finitely generated: that is, if there exists a finite subset $S \subseteq V$ such that $\operatorname{Span}(S) = V$.
Throughout we use $|S|$ to denote the cardinality of a set $S$. The idea is to be able to replace elements in a spanning set $S$ one at a time with elements from a linearly independent set $X$, and that we will be able to use up the entirety of $X$ in this process, thus seeing that $|X| \leq |S|$.
Theorem 1.19 (Exchange Theorem). Let $V$ be a finite-dimensional vector space. Suppose also that

- $S$ is a finite generating set for $V$ (i.e., $\operatorname{Span}(S) = V$).
- $X$ is a linearly independent subset of $V$.

Then $\exists T \subseteq S$ such that $|T| = |X|$ and $\operatorname{Span}(X \cup (S \setminus T)) = V$. Furthermore, $|X| \leq |S|$.
The subset $T$ is sometimes referred to as the exchange.

Proof. Denote $n = |S|$ and $m = \min\{n, |X|\}$ (we shall see shortly that $|X| = m$, but at present we don't even know whether $X$ is finite).
Since $m \leq |X|$, we see that $\{x_1, \ldots, x_m\}$ is a subset of $X$. We make the following claim:
$$\forall k \in \{0, 1, \ldots, m\},\ \exists s_1, \ldots, s_k \in S \text{ such that } \operatorname{Span}\big(\{x_1, \ldots, x_k\} \cup (S \setminus \{s_1, \ldots, s_k\})\big) = V$$
We prove by induction:

Base case: If $k = 0$ then the claim is true, for $S$ spans $V$.

Induction step: Suppose the claim is true for some $k < m$. Thus we assume that
$$V = \operatorname{Span}\big(\{x_1, \ldots, x_k\} \cup (S \setminus \{s_1, \ldots, s_k\})\big)$$
Since $\{x_1, \ldots, x_k, s_{k+1}, \ldots, s_n\}$ spans $V$ it follows that there exist coefficients $a_i, b_j$ for which
$$x_{k+1} = a_1 x_1 + \cdots + a_k x_k + b_{k+1} s_{k+1} + \cdots + b_n s_n \qquad (*)$$
Since $(*)$ is a linear dependence where the coefficient in front of $x_{k+1}$ is non-zero, and because the elements of $X$ are linearly independent, it follows that at least one of the $b_j$'s is non-zero: WLOG we may take $b_{k+1} \neq 0$. Therefore
$$s_{k+1} = b_{k+1}^{-1}(x_{k+1} - a_1 x_1 - \cdots - a_k x_k - b_{k+2} s_{k+2} - \cdots - b_n s_n)$$
We may therefore eliminate $s_{k+1}$ from all linear combinations describing elements of $V$, at the cost of including $x_{k+1}$. We therefore have
$$V = \operatorname{Span}\big(\{x_1, \ldots, x_{k+1}\} \cup (S \setminus \{s_1, \ldots, s_{k+1}\})\big)$$
By induction, the claim is proved. Taking $k = m$ and setting $T = \{s_1, \ldots, s_m\}$ we see that
$$V = \operatorname{Span}\big(\{x_1, \ldots, x_m\} \cup (S \setminus T)\big)$$
It remains to see that $|X| > m$ is impossible. By the definition of $m$, if $|X| > m$, then $m = n = |S|$ and there must exist some $x_{m+1} \in X$. However, applying the induction step with $k = m$, we see that $(*)$ contains no terms $s_j$, whence
$$x_{m+1} = a_1 x_1 + \cdots + a_m x_m$$
But this contradicts the linear independence of $X$. Therefore
$$|X| = m \leq n = |S|$$
which completes the proof. ∎
Example of the Exchange Theorem. Let $V = \mathbb{R}^3$, $S = \{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$, $X = \left\{ \begin{pmatrix}2\\3\\5\end{pmatrix}, \begin{pmatrix}6\\9\\12\end{pmatrix} \right\}$.
Since $x_1 = 2\mathbf{i} + 3\mathbf{j} + 5\mathbf{k}$, and the coefficient of $\mathbf{i}$ is non-zero, put $s_1 = \mathbf{i}$. Now
$$x_2 = \begin{pmatrix}6\\9\\12\end{pmatrix} = 3x_1 + 0\mathbf{j} - 3\mathbf{k}$$
so we choose $s_2 = \mathbf{k}$. Therefore $T = \{\mathbf{i}, \mathbf{k}\}$ is the exchange, and we conclude that
$$\mathbb{R}^3 = \operatorname{Span}\{x_1, x_2, \mathbf{j}\}$$
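A quick numerical confirmation of the conclusion, my own sketch assuming numpy: the three vectors span $\mathbb{R}^3$ exactly when the matrix with those columns is invertible.

```python
import numpy as np

x1 = np.array([2.0, 3.0, 5.0])
x2 = np.array([6.0, 9.0, 12.0])
j  = np.array([0.0, 1.0, 0.0])

# Nonzero determinant <=> the columns form a basis of R^3
M = np.column_stack([x1, x2, j])
print(not np.isclose(np.linalg.det(M), 0))  # True: Span{x1, x2, j} = R^3
```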
Every Finitely Generated Vector Space has a Basis

Recall that if $V$ is finite-dimensional, then it has a finite generating set $S$. Given any linearly independent set $X$ in $V$, we may use the Exchange Theorem to create a new finite spanning set $X \cup (S \setminus T)$. Indeed all we need from the Exchange Theorem is that $X$ is a finite set, and the existence of a finite spanning set which contains $X$. Armed with this, we can now construct a basis.

Theorem 1.20 (Extension Theorem). Let $V$ be a finite-dimensional vector space. Suppose that $X$ and $S$ are subsets of $V$ such that $X$ is linearly independent, $X \subseteq S$, $\operatorname{Span}(S) = V$ and $|S| = n$ is finite. Then there exists a basis $\beta$ of $V$ such that $X \subseteq \beta \subseteq S$.

Proof. Let $m = |X|$ where $m \leq n$. Let $X = \{x_1, \ldots, x_m\}$. Then
$$X \subseteq S \implies \operatorname{Span}(X) \subseteq \operatorname{Span}(S) = V$$
If $\operatorname{Span}(X) = V$ then we are done: $X$ is a basis.
Otherwise, suppose that $\operatorname{Span}(X) \neq V$. Then $\exists s_{m+1} \in S$ such that $s_{m+1} \notin \operatorname{Span}(X)$. This means that
$$X \cup \{s_{m+1}\} = \{x_1, \ldots, x_m, s_{m+1}\}$$
is a linearly independent set in $V$.
Now consider $X \cup \{s_{m+1}\}$ in place of $X$ and repeat (induction). The process must stop in at most $n - m$ many steps since $S$ is a finite spanning set.¹¹ ∎
Now we are in the home straight. We know that finite-dimensional vector spaces have bases and we can immediately use the Exchange Theorem to compare their cardinalities.

Corollary 1.21 (Well-definition of Dimension). Suppose that $V$ is a finite-dimensional vector space. Suppose that $\beta_1, \beta_2$ are two bases of $V$. Then $|\beta_1| = |\beta_2|$.

Proof. Since $V$ is finite dimensional, such $\beta_1, \beta_2$ exist. Taking $X = \beta_1$, $S = \beta_2$ in the Exchange Theorem we see that $|\beta_1| \leq |\beta_2|$. Now repeat the argument with $\beta_1, \beta_2$ reversed. ∎

By the corollary we may now define dimension.

Definition 1.22. If $V$ is finite-dimensional, then its dimension $\dim V$ is the cardinality of any basis set.

¹¹In the worst case we would have $X \cup \{s_{m+1}, \ldots, s_n\} = S$ being the desired basis $\beta$.
Corollary 1.23. Suppose that $W$ is a subspace of a finite dimensional vector space $V$. Then
$$\dim W \leq \dim V$$
Moreover, if $\dim W = \dim V$ then $W = V$.

Proof. Let $X$ be a basis of $W$, let $B$ be a basis of $V$ and let $S = B \cup X$. Certainly $\operatorname{Span}(S) = V$. By the Extension Theorem, $\exists$ a basis $\beta$ satisfying $X \subseteq \beta \subseteq S$, whence
$$|X| \leq |\beta| \quad\text{which says}\quad \dim W \leq \dim V$$
Since $\beta$ and $X$ are finite sets, we have equality of dimension if and only if $\beta = X$. Thus $X$ is a basis of $V$ and so $W = V$. ∎
Example of the Extension Theorem. Find a basis as a subset of the following spanning set of $\mathbb{R}^3$:
$$S = \{v_1, v_2, v_3, v_4, v_5, v_6\} = \left\{ \begin{pmatrix}1\\2\\0\end{pmatrix}, \begin{pmatrix}2\\1\\1\end{pmatrix}, \begin{pmatrix}1\\1\\1\end{pmatrix}, \begin{pmatrix}5\\4\\2\end{pmatrix}, \begin{pmatrix}0\\1\\3\end{pmatrix}, \begin{pmatrix}1\\1\\4\end{pmatrix} \right\}$$
Start by observing that $v_1$ and $v_2$ are non-parallel, whence
$$X = \{v_1, v_2\}$$
is linearly independent. Since $X$ does not span $\mathbb{R}^3$, we need another vector from $S$. Observe that $v_5 \notin \operatorname{Span}(X)$, so we may choose $s_3 = v_5$ to obtain the linearly independent set
$$\beta = \{v_1, v_2, v_5\} = \left\{ \begin{pmatrix}1\\2\\0\end{pmatrix}, \begin{pmatrix}2\\1\\1\end{pmatrix}, \begin{pmatrix}0\\1\\3\end{pmatrix} \right\}$$
Since $\beta$ is a linearly independent set and $\dim \operatorname{Span}(\beta) = 3 = \dim \mathbb{R}^3$, Corollary 1.23 says that $\operatorname{Span}(\beta) = \mathbb{R}^3$ so that $\beta$ is a basis.
Characterization of a basis: Uniqueness of representation

One of the purposes of a basis is to be able to represent a vector in terms of its coefficients. For example, suppose that $\epsilon = \{1, x, x^2\}$ is the standard basis of $P_2(\mathbb{R})$ and define
$$[a + bx + cx^2]_\epsilon = \begin{pmatrix}a\\b\\c\end{pmatrix}$$
We refer to the vector $\begin{pmatrix}a\\b\\c\end{pmatrix} \in \mathbb{R}^3$ as the co-ordinate representation of $a + bx + cx^2$ with respect to the basis $\epsilon$. Such representations allow us to apply matrix methods to questions about vector spaces. It is an important fact that the co-ordinate representation of any vector with respect to a basis is unique. Moreover, this property essentially characterises the concept of a basis.
Theorem 1.24. Let $\beta = \{v_1, \ldots, v_n\}$ be a finite subset of a vector space $V$. Then $\beta$ is a basis if and only if each $v \in V$ has a unique representation with respect to $\beta$. Otherwise said, every $v \in V$ can be written as a unique linear combination
$$v = a_1 v_1 + \cdots + a_n v_n$$
where each $v_i \in \beta$.

Proof. ($\Rightarrow$) Suppose that $\beta$ is a basis. Then $\beta$ generates $V$ and so every vector $v \in V$ can be written as a linear combination of elements $v_i \in \beta$. Suppose that $\exists v \in V$ which has at least two representations with respect to $\beta$. Then we have
$$v = a_1 v_1 + \cdots + a_n v_n = b_1 v_1 + \cdots + b_n v_n$$
for some scalars $a_i, b_i \in F$. It follows that
$$(a_1 - b_1)v_1 + \cdots + (a_n - b_n)v_n = 0$$
which is a linear dependence on $\beta$. Contradiction.
($\Leftarrow$) Conversely, suppose that $\beta$ is not a basis. There are two possibilities:

(a) $\beta$ does not generate $V$. In this case, $\exists v \in V$ with no representation in terms of the $v_i$.
(b) $\beta$ generates $V$ but is linearly dependent. In this case there exists a linear dependence
$$c_1 v_1 + \cdots + c_n v_n = 0$$
But then, for any $v \in V$ we see that
$$v = a_1 v_1 + \cdots + a_n v_n = (a_1 + c_1)v_1 + \cdots + (a_n + c_n)v_n$$
are two genuinely different representations of $v$.

Either way, there exists some $v \in V$ without a unique representation in terms of $\beta$. ∎
Because of the Theorem, we may make the following definition.

Definition 1.25. If $\beta = \{v_1, \ldots, v_n\}$ is a basis of $V$ over $F$, and $v \in V$, we call the unique representation in Theorem 1.24
$$[v]_\beta = \begin{pmatrix}a_1\\ \vdots\\ a_n\end{pmatrix} \in F^n$$
the co-ordinate representation of $v$ with respect to $\beta$.
Example. With respect to the basis $\beta = \{1 - x,\ 1 + x^2,\ x - 2x^2\}$ of $P_2(\mathbb{R})$, the polynomial
$$v = 3 - 5x + 7x^2 = 2(1 - x) + (1 + x^2) - 3(x - 2x^2)$$
has co-ordinate representation $[v]_\beta = \begin{pmatrix}2\\1\\-3\end{pmatrix}$ with respect to $\beta$.
With respect to the basis $\gamma = \{2 - x,\ x^2,\ 1 + x\}$ we instead have
$$v = 3 - 5x + 7x^2 = \tfrac{8}{3}(2 - x) + 7x^2 - \tfrac{7}{3}(1 + x) \implies [v]_\gamma = \begin{pmatrix}8/3\\ 7\\ -7/3\end{pmatrix}$$
We will return to co-ordinate representations in the next chapter.
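Finding such co-ordinates is just solving a linear system: express each basis vector in the standard basis, stack them as columns, and solve. A minimal sketch, my own illustration assuming numpy:

```python
import numpy as np

# Columns: 1 - x, 1 + x^2, x - 2x^2 written in the standard basis {1, x, x^2}
B = np.array([[ 1, 1,  0],
              [-1, 0,  1],
              [ 0, 1, -2]], dtype=float)

v = np.array([3.0, -5.0, 7.0])   # the polynomial 3 - 5x + 7x^2

coords = np.linalg.solve(B, v)
print(coords)                    # [ 2.  1. -3.], i.e. [v]_beta
```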
1.7 Maximal linearly independent subsets (non-examinable)

In the previous section, we showed that every finite-dimensional (i.e., finitely generated) vector space has a basis. What about vector spaces which are not finitely generated? Does every vector space have a basis?

This is a subtle question. Take for instance the vector space $P(\mathbb{R})$ of all polynomials with coefficients in $\mathbb{R}$. We stated that its standard basis is the infinite set
$$\beta = \{1, x, x^2, x^3, \ldots\}$$
This certainly satisfies Definition 1.17; every polynomial is a finite combination of terms in $\beta$, and $\beta$ is a linearly independent set. Thus $P(\mathbb{R})$ is an infinite-dimensional vector space with a countable basis, and we could write $\dim P(\mathbb{R}) = \aleph_0$.
Here is a related example. Consider the vector space $V$ of power series with coefficients in $\mathbb{R}$. Our problem is that the vector
$$\sum_{n=0}^\infty x^n = 1 + x + x^2 + x^3 + \cdots$$
(which converges on $(-1, 1)$ to the function $\frac{1}{1-x}$) is an infinite combination of the vectors in $\beta$. We cannot therefore claim that $\beta$ forms a basis of $V$. But does $V$ have a basis?
There are two ways around this problem. The first is to consider extending the definition of linear combination to allow for infinite sums. The problem with this approach is that of convergence of sums. In an abstract vector space we only assume that
$$v, w \in V \implies v + w \in V$$
This allows us to conclude, by induction, that any finite sum $\sum_{i=1}^n v_i$ still lies in $V$. How do we know that $\sum_{n=1}^\infty v_n$ has meaning? In the abstract we don't: you have to make some additional convergence assumptions. If you study Banach and Hilbert spaces in an advanced analysis course, this is the type of approach you will follow. In the context of power series, even though $\beta$ is not a basis, it is typically much more useful to the study of $V$ than a basis would be.
An alternative approach is to appeal to the (somewhat) controversial axiom of choice from set theory. The axiom of choice can be shown to be equivalent to Zorn's Lemma, which follows. The idea is to consider the set $\mathcal{F}$ of all linearly independent subsets of a vector space $V$. Of course $\mathcal{F}$ is going to be very large! We can think of certain subsets of $\mathcal{F}$, called chains, where every pair of elements in the chain may be compared:

Definition 1.26. Let $\mathcal{F}$ be a set of sets. We say that a subset $\mathcal{C} \subseteq \mathcal{F}$ is a chain¹² in $\mathcal{F}$ if
$$\forall A, B \in \mathcal{C},\ \text{either } A \subseteq B \text{ or } B \subseteq A$$
We say that a chain $\mathcal{C}$ has an upper bound in $\mathcal{F}$ if there is some element $B \in \mathcal{F}$ such that
$$\forall A \in \mathcal{C},\ A \subseteq B$$
We say that $M \in \mathcal{F}$ is a maximal member of $\mathcal{F}$ if $M$ is a subset of no member of $\mathcal{F}$ except $M$ itself.

¹²Alternatively $\mathcal{C}$ is a nest, a tower, or is totally ordered.
The idea is that if $\mathcal{F}$ is taken to be the set of all linearly independent subsets in a vector space $V$, then a maximal element should be a basis. This should be completely obvious when $V$ is finite-dimensional. Consider, for example, $\beta = \{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$ as a basis of $\mathbb{R}^3$. Then $\beta$ is an upper bound for the chain
$$\mathcal{C} = \big\{ \{\mathbf{i}\},\ \{\mathbf{i}, \mathbf{j}\},\ \{\mathbf{i}, \mathbf{j}, \mathbf{k}\} \big\}$$
For an infinite-dimensional example, $\beta = \{1, x, x^2, \ldots\}$ as a basis of $P(\mathbb{R})$ is an upper bound for the chain
$$\mathcal{C} = \big\{ \{1\},\ \{1, x\},\ \{1, x, x^2\},\ \ldots \big\}$$
Read this second example carefully: the ellipsis dots are hiding infinitely many subsets! In particular the upper bound does not have to be an element of the chain.
It is this example that gives us the idea of how to find a basis in general: $\beta = \bigcup_{U \in \mathcal{C}} U$ is precisely the union of all of the elements of the chain $\mathcal{C}$.
Here are some of the details:

Definition 1.27. Let $V$ be a vector space. We define a subset $\beta \subseteq V$ to be a maximal linearly independent subset of $V$ if it satisfies the following two properties:

1. $\beta$ is linearly independent.
2. The only linearly independent subset of $V$ that contains $\beta$ is $\beta$ itself.

You should be able to convince yourself that:

Lemma 1.28. A subset $\beta \subseteq V$ is a basis of $V$ if and only if it is a maximal linearly independent subset.
Finally we need the additional input that makes this work for infinite-dimensional vector spaces.

Theorem 1.29 (Zorn's Lemma). Let $\mathcal{F}$ be a non-empty family of sets. Suppose that every chain $\mathcal{C}$ in $\mathcal{F}$ has an upper bound in $\mathcal{F}$: a member $M \in \mathcal{F}$ which contains every member of $\mathcal{C}$. Then $\mathcal{F}$ has a maximal member.

Theorem 1.30. Every vector space has a basis.

Proof. Let $\mathcal{F} = \{\text{linearly independent subsets of } V\}$. If $V = \{0\}$ we are done. Otherwise, $\exists$ some non-zero $v \in V$. Thus $\{v\} \in \mathcal{F}$ so that $\mathcal{F}$ is non-empty. Appealing to Zorn's Lemma, our job is to show that every chain in $\mathcal{F}$ has an upper bound in $\mathcal{F}$ which contains every member of said chain.
Suppose that $\mathcal{C} \subseteq \mathcal{F}$ is a chain, and define
$$M_{\mathcal{C}} = \bigcup_{U \in \mathcal{C}} U$$
We claim that $M_{\mathcal{C}}$ is an upper bound for $\mathcal{C}$ in $\mathcal{F}$. For this, we need to show two things:

1. $M_{\mathcal{C}} \in \mathcal{F}$: otherwise said, $M_{\mathcal{C}}$ is a linearly independent set.
2. $\forall A \in \mathcal{C}$, we have $A \subseteq M_{\mathcal{C}}$.
The latter is obvious from the definition of union! For the former, suppose that $u_1, \ldots, u_n \in M_{\mathcal{C}}$ are distinct vectors such that
$$a_1 u_1 + \cdots + a_n u_n = 0$$
By the total ordering of $\mathcal{C}$, we see¹³ that $\exists U \in \mathcal{C}$ such that $u_1, \ldots, u_n \in U$. But each $U$ is linearly independent, whence $a_1 = \cdots = a_n = 0$. It follows that $M_{\mathcal{C}} \in \mathcal{F}$.
We have shown that every chain in $\mathcal{F}$ has an upper bound $M_{\mathcal{C}}$ in $\mathcal{F}$. Applying Zorn's lemma, we see that $\mathcal{F}$ has a maximal element $\beta$, which is necessarily a basis of $V$. ∎
Such an argument (take the union over a chain to create an explicit upper bound so we can invoke Zorn's Lemma) is replicated in several other places in mathematics. If you study mathematics at graduate level, you will very likely see it again. A basis whose existence is justified by the Theorem is known as a Hamel basis of $V$. Disappointingly, Hamel bases are almost completely useless for computational purposes, but it is nice to know that they exist all the same!
The essential results (the Exchange/Extension Theorems and the uniqueness of representation) may be generalized to cover infinite-dimensional vector spaces: one just has to be careful with interpretation. For instance here are some generalizations:

- (Theorem 1.20) If $X \subseteq V$ is linearly independent then it may be extended to a basis of $V$. This may be proved by applying Zorn's Lemma to the family of all linearly independent subsets of $V$ containing $X$, exactly as in Theorem 1.30.

- (Corollary 1.23) All bases of an infinite-dimensional vector space have the same cardinality, whence dimension is well-defined. This is a bit trickier and requires an infinite-dimensional version of the Exchange Theorem.

- (Theorem 1.24) If $\beta$ is a basis of $V$, then for all non-zero $v \in V$ there is a unique finite subset $\{v_1, \ldots, v_n\} \subseteq \beta$ and unique non-zero scalars $a_1, \ldots, a_n$ such that
$$v = a_1 v_1 + \cdots + a_n v_n$$
Our only freedom is in the order of the vectors $v_i$.
To see this last, for instance, suppose that $v \in V$ is a non-zero vector. Since $\beta$ spans $V$ there certainly exists a finite linear combination for $v$ in terms of the elements of $\beta$. Suppose that there are two such combinations,
$$v = a_1 v_1 + \cdots + a_n v_n = b_1 w_1 + \cdots + b_m w_m \quad\text{where each } v_i, w_j \in \beta$$
WLOG we may assume that all $a_i, b_j$ are non-zero. Let $X = \{v_1, \ldots, v_n, w_1, \ldots, w_m\}$ (note that there might be repeats, so that $|X| \leq n + m$). Relabelling $X = \{x_1, \ldots, x_k\}$ where $k \leq n + m$, we obtain two linear combinations for $v$:
$$v = c_1 x_1 + \cdots + c_k x_k = d_1 x_1 + \cdots + d_k x_k$$
where at least some of the $c_i, d_i$ are non-zero. But this is now a linear dependence on the set $\beta$ unless $c_i = d_i$ for all $i$. It follows that $X = \{v_1, \ldots, v_n\}$ is the unique subset of $\beta$ such that $v = a_1 v_1 + \cdots + a_n v_n$ with all $a_i \neq 0$.

¹³Since $u_i \in \bigcup_{U \in \mathcal{C}} U$, $\exists U_i \in \mathcal{C}$ such that $u_i \in U_i$. Now let $U = U_1 \cup \cdots \cup U_n$. By total ordering, one of these $U_i$ contains all the others: this is $U$. Note that this only works for finite $n$!
2 Linear Transformations and Matrices

The standard systematic approach in algebra is to study a collection of sets which have a common structure, and the maps between them which preserve that structure. In the context of vector spaces this means maps which preserve the structure of vector addition and scalar multiplication.
2.1 Linear Transformations, Null Spaces and Ranges

Definition 2.1. Let $V$ and $W$ be vector spaces over the same field $F$. We say that a function $L : V \to W$ is linear if it satisfies the following properties:
$$\forall v_1, v_2 \in V,\ \forall \lambda \in F,\quad \begin{cases} L(v_1) + L(v_2) = L(v_1 + v_2)\\ \lambda L(v_1) = L(\lambda v_1) \end{cases}$$
The idea is that the operations of vector addition and scalar multiplication in $V$ and $W$ are compatible: we may add vectors in $V$ first then map to $W$, or we may map to $W$ first then add; the result must be the same.

You have met many examples of linear maps already (see, e.g., the intro).
Examples

1. Matrix multiplication: if $v \in F^n$ and $A \in M_{m\times n}(F)$, then
$$L : F^n \to F^m : v \mapsto Av$$
is linear. Spelling this out is tedious: for instance, the $i$th entry of the vector $A(x + y)$ is
$$[A(x + y)]_i = \sum_{j=1}^n a_{ij}(x_j + y_j) = \sum_{j=1}^n a_{ij}x_j + \sum_{j=1}^n a_{ij}y_j = [Ax]_i + [Ay]_i$$
which is precisely the $i$th entry of the vector $Ax + Ay$. Scalar multiplication is similar.

2. Differentiation: If $W_D$ is the set of functions with domain $D$ and $V_D$ is the subspace of differentiable functions in $W_D$, then
$$L : V_D \to W_D : f \mapsto \frac{df}{dx}$$
is linear.

3. If $C(\mathbb{R})$ is the set of continuous functions with domain $\mathbb{R}$, then
$$L : C(\mathbb{R}) \to \mathbb{R} : f \mapsto \int_a^b f(x)\,dx$$
is linear, for any constants $a, b$.
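These examples can all be tested numerically in the same spirit as the rotation example in the introduction. A minimal sketch checking the matrix-multiplication example, my own illustration assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))           # A in M_{4x3}(R)
x, y = rng.standard_normal(3), rng.standard_normal(3)
lam = 2.7

# L(x + y) = L(x) + L(y) and L(lam x) = lam L(x)
print(np.allclose(A @ (x + y), A @ x + A @ y))    # True
print(np.allclose(A @ (lam * x), lam * (A @ x)))  # True
```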
Definition 2.2. The set of linear maps from $V$ to $W$ is denoted¹⁴ $\mathcal{L}(V, W)$. If $V = W$ we simply write $\mathcal{L}(V)$ instead of $\mathcal{L}(V, V)$.
The zero function $0 \in \mathcal{L}(V, W)$ is the linear map defined by¹⁵
$$\forall v \in V,\quad 0 : v \mapsto 0_W$$
The identity function $I \in \mathcal{L}(V)$ is the linear map defined by
$$\forall v \in V,\quad I : v \mapsto v$$
definition of linearity.
Theorem 2.3. 1. L L(V, W) if and only if L preserves all linear
combinations: i.e.,
vi V, ai F, L(
n
i=1
aivi
)=
n
i=1
aiL(vi)
In particular, L(0) = 0.
2. L(V, W) is a vector space whose identity is the zero function
0 L(V, W).
Definition 2.4. Let $V$ and $W$ be vector spaces and $L \in \mathcal{L}(V, W)$. The range or image of $L$ is the usual image of $L$ viewed as a function:
$$\mathcal{R}(L) = \{L(v) \in W : v \in V\}$$
The null space or kernel of $L$ is the set of all vectors which are mapped to zero by $L$:
$$\mathcal{N}(L) = \{v \in V : L(v) = 0_W\}$$
Theorem 2.5. The null space and range of a linear map are subspaces of $V$ and $W$ respectively.

Proof. Everything comes from the formula
$$L(\lambda v_1 + v_2) = \lambda L(v_1) + L(v_2)$$
Clearly if $v_1, v_2 \in \mathcal{N}(L)$, so is $\lambda v_1 + v_2$. Similarly, for any $L(v_1), L(v_2) \in \mathcal{R}(L)$ we see that $\lambda L(v_1) + L(v_2) \in \mathcal{R}(L)$. ∎
Since the null space and range are vector spaces, they have a dimension:

Definition 2.6. The rank and nullity of a linear map $L \in \mathcal{L}(V, W)$ are the dimensions of the range and null-space of $L$ respectively:
$$\operatorname{rank} L = \dim \mathcal{R}(L) \qquad \operatorname{null} L = \dim \mathcal{N}(L)$$

¹⁴Some texts use $\hom(V, W)$ instead of $\mathcal{L}(V, W)$. This is short for homomorphism, literally 'same transformation', indicating that something, in this case the structure of addition and scalar multiplication, stays the same after applying the map. If $V = W$ the set of linear maps can also be written $\operatorname{End}(V)$, for endomorphism.
¹⁵Since we now have two vector spaces, you may find it helpful to explicitly distinguish between the zero in $V$ and the zero in $W$. This can be done with suffices, e.g., $0_V$, $0_W$.
We are moving towards an important result linking the dimensions of these spaces. As a preliminary step, we need a lemma.

Lemma 2.7. If $\beta$ is a basis for $V$, then $L(\beta) = \{L(v) : v \in \beta\}$ is a spanning set for $\mathcal{R}(L)$.

Proof. Let $L(v) \in \mathcal{R}(L)$. Then there is a finite combination
$$v = a_1 v_1 + \cdots + a_n v_n$$
where each $v_i \in \beta$. But then
$$L(v) = L\Big(\sum_{i=1}^n a_i v_i\Big) = \sum_{i=1}^n a_i L(v_i) \in \operatorname{Span}(L(\beta)) \qquad (*)$$
Thus $\mathcal{R}(L) \subseteq \operatorname{Span}(L(\beta))$.
Conversely, the right hand side of $(*)$ is a general element of $\operatorname{Span}(L(\beta))$, which is certainly in the range of $L$. Thus $\operatorname{Span}(L(\beta)) \subseteq \mathcal{R}(L)$. It follows that these subspaces are equal. ∎
In particular, note that we assumed nothing about the cardinality of the basis $\beta$; we could be talking about an infinite-dimensional vector space. Recall that every element of a vector space must be expressible as a finite combination of the basis vectors, even if the basis is infinite.

The critical relationship between rank, nullity and dimension is contained in the following theorem, also known as the dimension theorem.
Theorem 2.8 (Rank–Nullity). If $L \in \mathcal{L}(V, W)$, then
$$\operatorname{rank} L + \operatorname{null} L = \dim V$$

Proof. Suppose that $X$ is a basis of the null space $\mathcal{N}(L)$. By the Extension Theorem,¹⁶ we may extend this to a basis $\beta = X \cup Y$ of $V$, where $X \cap Y = \emptyset$. We claim that $L(Y)$ is a basis of $\mathcal{R}(L)$.
By Lemma 2.7, $L(\beta)$ spans the range of $L$. However $L(x) = 0$ for each $x \in X$, whence
$$\mathcal{R}(L) = \operatorname{Span}(L(Y))$$
It remains to see that $L(Y)$ is linearly independent: for this, note that if $y_1, \ldots, y_r \in Y$, then
$$\sum_{i=1}^r a_i L(y_i) = 0 \iff L\Big(\sum_{i=1}^r a_i y_i\Big) = 0 \iff \sum_{i=1}^r a_i y_i \in \mathcal{N}(L) \cap \operatorname{Span}(Y) = \{0\}$$
It follows that all $a_i = 0$, so that $L(Y)$ is linearly independent and thus a basis of the range $\mathcal{R}(L)$. Moreover, it is clear that, restricted to $Y$,
$$L|_Y : Y \to L(Y)$$
is a bijection, whence
$$\dim V = |\beta| = |X| + |Y| = \operatorname{null} L + |L(Y)| = \operatorname{null} L + \operatorname{rank} L$$
as required. ∎

¹⁶If $V$ is infinite-dimensional, we need the generalization on page 25, and we must interpret addition as that of cardinality. The remainder of the proof is unchanged.
Examples

1. Let $V = \mathbb{R}^3$ and $W = \mathbb{R}^4$, and let $L \in \mathcal{L}(V, W)$ be left-multiplication by the matrix
$$A = \begin{pmatrix}1 & 2 & 3\\ 0 & 1 & 1\\ 1 & 0 & 1\\ 0 & 3 & 3\end{pmatrix}$$
Since the range of this linear map is precisely the span of the columns of $A$, and because the third column is the sum of the first two, it is clear that
$$\mathcal{R}(L) = \operatorname{Span}\left\{ \begin{pmatrix}1\\0\\1\\0\end{pmatrix}, \begin{pmatrix}2\\1\\0\\3\end{pmatrix} \right\} \implies \operatorname{rank} L = 2$$
Moreover, by performing row operations in an attempt to solve $Ax = 0$ we obtain the reduced row echelon form
$$\begin{pmatrix}1 & 0 & 1\\ 0 & 1 & 1\\ 0 & 0 & 0\\ 0 & 0 & 0\end{pmatrix}$$
By solving these equations, it follows that the null space of the linear map is
$$\mathcal{N}(L) = \left\{ \begin{pmatrix}x\\y\\z\end{pmatrix} : x + z = 0 = y + z \right\} = \operatorname{Span}\left\{ \begin{pmatrix}-1\\-1\\1\end{pmatrix} \right\} \implies \operatorname{null} L = 1$$
The rank–nullity theorem simply reads $2 + 1 = 3$.
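The same computation can be delegated to a computer algebra system; a minimal sketch of my own, assuming sympy:

```python
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [0, 1, 1],
            [1, 0, 1],
            [0, 3, 3]])

print(A.rank())        # 2 = rank L
print(A.nullspace())   # [Matrix([-1, -1, 1])], so null L = 1
# rank + nullity = number of columns = dim R^3
print(A.rank() + len(A.nullspace()) == A.cols)  # True
```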
2. If $V = P_3(\mathbb{R})$ and $W = P_2(\mathbb{R})$ we may take the linear map $L \in \mathcal{L}(V, W)$ defined by differentiation. With respect to the standard bases,
$$L(a + bx + cx^2 + dx^3) = b + 2cx + 3dx^2$$
Clearly $\mathcal{N}(L) = \{a : a \in \mathbb{R}\} \leq P_3(\mathbb{R})$ is the 1-dimensional space of constants, and $\mathcal{R}(L) = P_2(\mathbb{R})$. It follows that
$$\operatorname{rank} L + \operatorname{null} L = 3 + 1 = 4 = \dim P_3(\mathbb{R})$$

3. (Non-examinable) Repeating example 2 with $V = W = P(\mathbb{R})$ we see that $L \in \mathcal{L}(P(\mathbb{R}))$ has
$$f \in \mathcal{N}(L) \iff f'(x) = 0 \iff f \text{ is constant}$$
Thus $\operatorname{null} L = 1$. However $\mathcal{R}(L) = P(\mathbb{R})$, since every polynomial is the derivative of another. Since $P(\mathbb{R})$ has a countable basis $\beta = \{1, x, x^2, \ldots\}$, we see that the rank–nullity theorem says
$$\aleph_0 + 1 = \aleph_0$$
which makes perfect sense in the context of addition of infinite cardinals.
Injective Linear Maps

Because of the extra structure of linearity, injective linear maps have a very straightforward characterization.¹⁷

Theorem 2.9. $L \in \mathcal{L}(V, W)$ is injective if and only if $\mathcal{N}(L) = \{0\}$.

Otherwise said, a linear map is injective if and only if its nullity is zero.

Proof. Let $v_1, v_2 \in V$. Then, by linearity,
$$L(v_1) = L(v_2) \iff L(v_1 - v_2) = 0_W \iff v_1 - v_2 \in \mathcal{N}(L)$$
from which the result is immediate. ∎
Clearly none of the above examples are injective. A couple of quick appeals to the rank–nullity theorem gives the following corollaries.

Corollary 2.10. If $\dim V > \dim W$ then there are no injective functions $L \in \mathcal{L}(V, W)$.

Proof. Since $\mathcal{R}(L)$ is a subspace of $W$, we have $\operatorname{rank} L = \dim \mathcal{R}(L) \leq \dim W$. However, if $\dim V > \dim W$ and $L : V \to W$ is an injective linear map, then $\operatorname{null} L = 0$. By the rank–nullity theorem, we conclude that
$$\operatorname{rank} L = \dim V > \dim W$$
which is a contradiction. ∎
Corollary 2.11. Let $V$ and $W$ be finite-dimensional with equal dimension, and assume that $L \in \mathcal{L}(V, W)$. The following are equivalent:

1. $L$ is injective.
2. $L$ is surjective.
3. $\operatorname{null} L = 0$.
4. $\operatorname{rank} L = \dim V$.

Proof. Observe the following:

- $L$ is injective $\iff \operatorname{null} L = 0 \iff \operatorname{rank} L = \dim V$ (holds for any vector spaces)
- $L$ is surjective $\iff \mathcal{R}(L) = W \iff \operatorname{rank} L = \dim W$

The implication $\operatorname{rank} L = \dim W \implies \mathcal{R}(L) = W$ is the only part to require that $W$ be finite-dimensional. Since we are assuming that $\dim W = \dim V$, we are done. ∎

¹⁷Recall that a function $f : A \to B$ is injective (one-to-one) if it never takes the same value twice. I.e., $\forall a_1, a_2 \in A$, $a_1 \neq a_2 \implies f(a_1) \neq f(a_2)$.
2.2 The Matrix Representation of a Linear Map

Recall Theorem 1.24 where we saw that any vector has a unique co-ordinate representation with respect to a basis. The same reasoning can be applied to a linear map $L \in \mathcal{L}(V, W)$.

Theorem 2.12 (Matrix representations). Suppose that $\beta = \{v_1, \ldots, v_n\}$ and $\gamma = \{w_1, \ldots, w_m\}$ are bases of $V$ and $W$ respectively. If $L \in \mathcal{L}(V, W)$ then there exists a unique matrix $A \in M_{m\times n}(F)$ such that
$$\forall v \in V,\quad [L(v)]_\gamma = A[v]_\beta \qquad (*)$$
Moreover, $A$ is the matrix whose $j$th column is the co-ordinate representation of $L(v_j)$ with respect to the basis $\gamma$:
$$A = \begin{pmatrix} | & & |\\ [L(v_1)]_\gamma & \cdots & [L(v_n)]_\gamma\\ | & & | \end{pmatrix} \qquad (**)$$

Definition 2.13. The matrix $A = (a_{ij})$ defined above is the matrix representation of $L$ with respect to $\beta$ and $\gamma$. We use the notation $A = [L]_\beta^\gamma$. If $L \in \mathcal{L}(V)$ and $\beta = \gamma$, we simply write $[L]_\beta$.
The Theorem can be summarized by the following commutative diagram. If $L \in \mathcal{L}(V, W)$, then the symbol $[\ ]_\beta$ represents mapping a vector to its representation with respect to the basis $\beta$, and $A$ means multiply by the matrix $A = [L]_\beta^\gamma$. What the diagrams mean is that you have two options to travel from $V$ to $F^m$ and each must produce the same result.

        L                              L
   V ------> W                 v |---------> L(v)
   |         |                 |               |
 [ ]_β     [ ]_γ               |               |
   v         v                 v               v
  F^n -----> F^m           [v]_β |----> [L(v)]_γ = [L]_β^γ [v]_β
        A                              A
Proof of Theorem. Suppose first that a matrix $A$ satisfying $(*)$ exists. Then it must satisfy $(*)$ for each of the basis vectors $v_1, \ldots, v_n$. However, with respect to the basis $\beta$, we simply have
$$[v_j]_\beta = e_j$$
where $e_j$ is the $j$th standard basis vector in $F^n$. Since $Ae_j$ is simply the $j$th column of $A$, it follows that $A$ must have the form claimed in $(**)$.
It remains to see that $A$ as defined by $(**)$ satisfies $(*)$ for every vector $v \in V$. For this, note that the unique representation of $v$ with respect to $\beta$ reads
$$v = b_1 v_1 + \cdots + b_n v_n \iff [v]_\beta = \begin{pmatrix}b_1\\ \vdots\\ b_n\end{pmatrix} \in F^n$$
Now observe, by $(**)$, that $A = (a_{ij})$ has $ij$th entry
$$a_{ij} = \big[[L(v_j)]_\gamma\big]_i$$
which is the $i$th entry of the representation of $L(v_j)$ with respect to the basis $\gamma$. By matrix multiplication, the column vector $A[v]_\beta \in F^m$ has $i$th entry
$$\big[A[v]_\beta\big]_i = \sum_{j=1}^n a_{ij}b_j = \sum_{j=1}^n \big[[L(v_j)]_\gamma\big]_i b_j = \Big[\sum_{j=1}^n b_j[L(v_j)]_\gamma\Big]_i = \Big[\Big[L\Big(\sum_{j=1}^n b_j v_j\Big)\Big]_\gamma\Big]_i = \big[[L(v)]_\gamma\big]_i$$
(by linearity/Theorem 2.3). Therefore the matrix $A$ defined by $(**)$ acts as we claim, and is the unique matrix satisfying $(*)$ by the first part of the proof. ∎
Since any list of n vectors {z1, . . . , zn} in W may be written in the form

\[ z_j = \sum_{i=1}^{m} w_i a_{ij} \]

for unique constants aij, the following corollary is immediate.

Corollary 2.14. A linear map L ∈ L(V, W) is completely determined by what it does to a basis. Otherwise said, if z1, . . . , zn ∈ W are any vectors in W, then there exists a unique linear map L ∈ L(V, W) such that L(vj) = zj for each j. The linear map in question is precisely that L for which [L]_β^γ = A.
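Corollary 2.14 translates directly into an algorithm: to represent a linear map, record the γ-co-ordinates of each L(vj) and stack them as columns, exactly as in (∗∗). Here is a minimal sketch in Python (numpy assumed; the helper name build_matrix is ours), using the rotation map from the introduction.

    import numpy as np

    def build_matrix(images):
        """Stack the co-ordinate vectors [L(v_j)]_gamma as the columns of A."""
        return np.column_stack(images)

    # "Rotate clockwise by 30 degrees": images of i and j in standard co-ordinates
    t = np.pi / 6
    L_i = np.array([np.cos(t), -np.sin(t)])
    L_j = np.array([np.sin(t),  np.cos(t)])
    A = build_matrix([L_i, L_j])

    # Linearity: L(xi + yj) = xL(i) + yL(j), i.e. A[v] = x[L(i)] + y[L(j)]
    v = np.array([2.0, 1.0])
    assert np.allclose(A @ v, 2 * L_i + 1 * L_j)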
Examples

1. If you return to the introduction, the matrix of the linear map "rotate clockwise by 30°" with respect to the standard basis ε = {i, j} of R² is

\[ [L]_\varepsilon = \begin{pmatrix} \frac{\sqrt{3}}{2} & \frac{1}{2} \\ -\frac{1}{2} & \frac{\sqrt{3}}{2} \end{pmatrix} \]

2. (See Example 2 on page 29.) With respect to the standard bases of P3(R) and P2(R), the matrix of "differentiate" is

\[ [L] = \begin{pmatrix} | & | & | & | \\ [L(1)] & [L(x)] & [L(x^2)] & [L(x^3)] \\ | & | & | & | \end{pmatrix} = \begin{pmatrix} | & | & | & | \\ [0] & [1] & [2x] & [3x^2] \\ | & | & | & | \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \]
3. Consider the vector space V = Span β, where β = {e^x sin 2x, e^x cos 2x}, and let L ∈ L(V) be the linear map "differentiate". Since

\[ L(e^x \sin 2x) = e^x \sin 2x + 2e^x \cos 2x, \qquad L(e^x \cos 2x) = e^x \cos 2x - 2e^x \sin 2x \]

we see that

\[ [L]_\beta = \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix} \]

In particular, given what you should already know about matrix multiplication, if f ∈ V, then

\[ [L(f)]_\beta = [L]_\beta [f]_\beta \implies [f]_\beta = [L]_\beta^{-1}[L(f)]_\beta = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}[L(f)]_\beta \]

Think about what this is saying: if g(x) = L(f)(x) = a e^x sin 2x + b e^x cos 2x, then g has an anti-derivative

\[ \int g(x)\,dx = \frac{a}{5} e^x(\sin 2x - 2\cos 2x) + \frac{b}{5} e^x(2\sin 2x + \cos 2x) \]

No need for integration by parts!
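If you want to double-check this trick, here is a short symbolic sketch (sympy assumed; the names are ours): invert [L]_β to get the co-ordinates of the anti-derivative, then confirm by differentiating.

    import sympy as sp

    x, a, b = sp.symbols('x a b')
    g = a * sp.exp(x) * sp.sin(2*x) + b * sp.exp(x) * sp.cos(2*x)

    # [L]_beta for "differentiate" on Span{e^x sin 2x, e^x cos 2x}
    L = sp.Matrix([[1, -2], [2, 1]])
    c = L.inv() * sp.Matrix([a, b])        # co-ordinates of an anti-derivative of g

    G = c[0] * sp.exp(x) * sp.sin(2*x) + c[1] * sp.exp(x) * sp.cos(2*x)
    assert sp.simplify(sp.diff(G, x) - g) == 0   # G' = g, no parts required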
4. (Tricky!) It is often easier to consider a linear map with respect to a basis which is chosen in order to make the matrix of the linear map as simple as possible. For example, suppose that L : R² → R² is the linear map defined by "reflect in the line y = 2x".

If you draw a picture, it should be clear that the vectors

\[ v_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad \text{and} \quad v_2 = \begin{pmatrix} -2 \\ 1 \end{pmatrix} \]

behave very nicely with respect to the linear map. Indeed

\[ L(v_1) = v_1, \qquad L(v_2) = -v_2 \]

If we take β = {v1, v2}, then β is a basis of R². Moreover, the matrix of L with respect to β is simply

\[ [L]_\beta = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \]

[Figure: the line y = 2x, the standard basis vectors i, j and their reflections L(i), L(j), together with v1, v2 and L(v2).]

This is much simpler than trying to calculate the matrix of L with respect to the standard basis ε = {i, j}: it is an exercise to see that¹⁸

\[ [L]_\varepsilon = \frac{1}{5}\begin{pmatrix} -3 & 4 \\ 4 & 3 \end{pmatrix} \]
5. Here is another example of the same idea. Let n ∈ R³ be a fixed unit vector (|n| = 1) and define L : R³ → R³ to be the linear map

\[ L : v \mapsto v - (v \cdot n)n \]

You should first convince yourself that this is linear! How could we find a matrix for this linear map? There are two potential approaches.

¹⁸ If you recall a previous class, 1 and −1 are the eigenvalues of the matrix [L]_ε, and v1, v2 the corresponding eigenvectors. [L]_β is then the diagonalization of [L]_ε. We will return to this concept later.
(a) Use the standard basis ε = {i, j, k}. For example, if n = (1/√5)i + (2/√5)k, then

\[ L(i) = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} - \frac{1}{\sqrt{5}}\begin{pmatrix} 1/\sqrt{5} \\ 0 \\ 2/\sqrt{5} \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 4 \\ 0 \\ -2 \end{pmatrix}, \qquad L(j) = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \qquad L(k) = \frac{1}{5}\begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix} \]

We therefore obtain the matrix

\[ [L]_\varepsilon = \frac{1}{5}\begin{pmatrix} 4 & 0 & -2 \\ 0 & 5 & 0 \\ -2 & 0 & 1 \end{pmatrix} \]

Repeating this in general, if n = n1 i + n2 j + n3 k, we have

\[ [L]_\varepsilon = \begin{pmatrix} 1 - n_1^2 & -n_1 n_2 & -n_1 n_3 \\ -n_1 n_2 & 1 - n_2^2 & -n_2 n_3 \\ -n_1 n_3 & -n_2 n_3 & 1 - n_3^2 \end{pmatrix} \]
(b) Alternatively, we can use a basis fitted more neatly to the linear map. Choose the first basis vector to be v1 = n, then choose any two other non-parallel vectors v2, v3 which are perpendicular to n. It follows that

\[ L(v_1) = 0, \qquad L(v_2) = v_2, \qquad L(v_3) = v_3 \]

With respect to the basis β = {v1, v2, v3}, we have the matrix

\[ [L]_\beta = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

Having considered the linear map thusly, it should be clear what it is doing: it is projecting onto the plane through the origin perpendicular to n, i.e. onto the subspace Span(v2, v3). In the case where n = (1/√5)i + (2/√5)k, we could easily choose v2 = j, v3 = 2i − k.

It should also be clear from the linear map's interpretation as a projection that its null-space is N(L) = Span(n) and its range is R(L) = Span(v2, v3).
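In co-ordinates, approach (a) amounts to the single formula [L]_ε = I − nnᵀ, since (v · n)n = (nnᵀ)v. A quick numerical sanity check (numpy assumed):

    import numpy as np

    n = np.array([1.0, 0.0, 2.0]) / np.sqrt(5)   # the unit vector from the example

    # [L]_eps = I - n n^T (outer product): projection onto n's orthogonal plane
    P = np.eye(3) - np.outer(n, n)
    assert np.allclose(P, np.array([[4, 0, -2], [0, 5, 0], [-2, 0, 1]]) / 5)

    # Null-space and range behave as claimed
    assert np.allclose(P @ n, 0)                 # N(L) = Span(n)
    v2 = np.array([0.0, 1.0, 0.0])               # v2 = j
    v3 = np.array([2.0, 0.0, -1.0])              # v3 = 2i - k
    assert np.allclose(P @ v2, v2) and np.allclose(P @ v3, v3)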
Summary The big take-away from all of this is the following:

    Linear Map = Matrix + Bases

More precisely, once you choose bases of finite-dimensional vector spaces, then any linear map between them is equivalent to a matrix.
2.3 Composition of Linear Maps and Matrix Multiplication

Given linear transformations L ∈ L(U, V) and M ∈ L(V, W), where all vector spaces have the same base field, it makes sense to consider the composition:¹⁹

\[ M \circ L : U \to W : u \mapsto M(L(u)) \]

Since

\[ (M \circ L)(\lambda u_1 + u_2) = M\bigl(L(\lambda u_1 + u_2)\bigr) = M\bigl(\lambda L(u_1) + L(u_2)\bigr) \quad \text{(linearity of } L\text{)} \]
\[ = \lambda M\bigl(L(u_1)\bigr) + M\bigl(L(u_2)\bigr) \quad \text{(linearity of } M\text{)} \]
\[ = \lambda (M \circ L)(u_1) + (M \circ L)(u_2) \]

we have proved the following.

Theorem 2.15. The composition of two linear maps is a linear map.
We now consider the matrix of the composition of linear maps.

Theorem 2.16. Suppose that L ∈ L(U, V) and M ∈ L(V, W), where U, V, W are finite-dimensional. Suppose also that α, β, γ are bases of U, V, W respectively, and that [L]_α^β and [M]_β^γ are the matrices of L, M with respect to these bases. Then the matrix of the composition M ∘ L with respect to α and γ is

\[ [M \circ L]_\alpha^\gamma = [M]_\beta^\gamma [L]_\alpha^\beta \]

Proof. Let α = {u1, . . . , un}, β = {v1, . . . , vm} and γ = {w1, . . . , wl}, and write A = [M]_β^γ, B = [L]_α^β and C = [ML]_α^γ. We simply compute what happens to each of the basis vectors of U:

\[ ML(u_k) = M\Bigl(\sum_{j=1}^{m} v_j B_{jk}\Bigr) = \sum_{j=1}^{m} M(v_j) B_{jk} = \sum_{j=1}^{m}\sum_{i=1}^{l} w_i A_{ij} B_{jk} = \sum_{i=1}^{l} w_i \Bigl(\sum_{j=1}^{m} A_{ij} B_{jk}\Bigr) \]

Since, by definition,

\[ ML(u_k) = \sum_{i=1}^{l} w_i C_{ik} \]

¹⁹ It is also common to write ML instead of M ∘ L for the composition.
it follows from the fact that γ = {w1, . . . , wl} is a basis that

\[ C_{ik} = \sum_{j=1}^{m} A_{ij} B_{jk} \]

Otherwise said, C = AB.

Corollary 2.17. If V is finite-dimensional with basis β, and M, L ∈ L(V), then

\[ [ML]_\beta = [M]_\beta [L]_\beta \]
Examples

1. Recall that the matrix of "rotate clockwise by 30°" with respect to the standard basis ε = {i, j} of R² is

\[ [L]_\varepsilon = \begin{pmatrix} \frac{\sqrt{3}}{2} & \frac{1}{2} \\ -\frac{1}{2} & \frac{\sqrt{3}}{2} \end{pmatrix} \]

It follows that L² ("rotate clockwise by 60°") has matrix

\[ [L^2]_\varepsilon = [L]_\varepsilon [L]_\varepsilon = \begin{pmatrix} \frac{\sqrt{3}}{2} & \frac{1}{2} \\ -\frac{1}{2} & \frac{\sqrt{3}}{2} \end{pmatrix}^2 = \begin{pmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} \\ -\frac{\sqrt{3}}{2} & \frac{1}{2} \end{pmatrix} \]

Similarly, rotation clockwise by 90° has matrix

\[ [L^3]_\varepsilon = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \]
2. Recall the example where we projected onto the plane perpendicular to n = (1/√5)i + (2/√5)k in R³. Suppose that we wanted to compute the matrix of the linear map defined by "project onto this plane" followed by "rotate 60° clockwise around the z-axis when viewed from above". By the previous part, if M is the linear map for rotation, it should be clear that M has the following matrix with respect to the standard basis ε = {i, j, k}:

\[ [M]_\varepsilon = \begin{pmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ -\frac{\sqrt{3}}{2} & \frac{1}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

It follows that the composite linear map M ∘ L has matrix

\[ [ML]_\varepsilon = [M]_\varepsilon [L]_\varepsilon = \frac{1}{2}\begin{pmatrix} 1 & \sqrt{3} & 0 \\ -\sqrt{3} & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \cdot \frac{1}{5}\begin{pmatrix} 4 & 0 & -2 \\ 0 & 5 & 0 \\ -2 & 0 & 1 \end{pmatrix} = \frac{1}{10}\begin{pmatrix} 4 & 5\sqrt{3} & -2 \\ -4\sqrt{3} & 5 & 2\sqrt{3} \\ -4 & 0 & 2 \end{pmatrix} \]
3. Also recall V = Span β, where β = {e^x sin 2x, e^x cos 2x}, and L ∈ L(V) is "differentiate". Then

\[ [L]_\beta = \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix} \]

Since the identity linear map I ∈ L(V) must have matrix I2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} with respect to any basis, it follows that [L]_β [M]_β = I2 if and only if [M]_β = [L]_β^{-1}. However, integration is the inverse process to differentiation, whence [M]_β must be the matrix of integration with respect to the basis β.
Definition 2.18. The Kronecker delta symbol δij is defined by

\[ \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \]

The n × n identity matrix In is the matrix whose ijth entry is δij: that is, (In)ij = δij.
Theorem 2.19. If V is an n-dimensional vector space with basis β, then [A]_β = In if and only if A ∈ L(V) is the identity transformation of V.

Proof. If A = I is the identity, and β = {v1, . . . , vn}, then clearly A(vi) = vi for each vi, and so the matrix of A is the identity matrix In. Conversely, by Theorem 2.12, a matrix representation is unique, whence A = I is the only linear map with matrix In.
Left-multiplication by matrices When dealing with vector spaces of the form F^n, it can be difficult to distinguish between matrix multiplication and linear maps. This is because they are essentially the same, once you've chosen bases! Since F^n has a standard basis, it will often appear as if no such choice has been made, and confusion arises. Worst of all, this confusion spills over into other vector spaces. To tidy this up, it is a good idea to rephrase some of the discussion in the context of linear maps L ∈ L(F^n, F^m).

Theorem 2.20. Let A ∈ M_{m×n}(F). Then left-multiplication of vectors in F^n by A results in a linear transformation

\[ L_A : F^n \to F^m : v \mapsto Av \]

If B ∈ M_{m×n}(F) is any matrix, and β, γ are the standard bases of F^n, F^m, then we have the following.

1. [L_A]_β^γ = A
2. L_A = L_B ⇔ A = B
3. L_{A+B} = L_A + L_B and L_{λA} = λL_A for all λ ∈ F
4. If T ∈ L(F^n, F^m), then there is a unique C ∈ M_{m×n}(F) such that T = L_C.
5. If E ∈ M_{n×p}(F), then L_{AE} = L_A L_E.
6. If m = n, then L_{In} = I_{F^n} (the linear map obtained by left-multiplication by the n × n identity matrix is the identity linear map).
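The dictionary in Theorem 2.20 is easy to make literal in code. In this sketch (numpy assumed; the name left_mult is ours), properties 3 and 5 become one-line assertions.

    import numpy as np

    def left_mult(A):
        """Return the linear map L_A : v |-> Av."""
        return lambda v: A @ v

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    E = np.array([[0.0, 1.0], [1.0, 0.0]])
    v = np.array([5.0, -1.0])

    # Property 3: L_{A+E} = L_A + L_E
    assert np.allclose(left_mult(A + E)(v), left_mult(A)(v) + left_mult(E)(v))
    # Property 5: L_{AE} = L_A . L_E  (composition on the right-hand side)
    assert np.allclose(left_mult(A @ E)(v), left_mult(A)(left_mult(E)(v)))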
2.4 Invertibility and Isomorphisms

In this section we restrict to linear maps which have inverses under composition.

Definition 2.21. Suppose that L : V → W is linear. A function M : W → V is an inverse of L if both

\[ L \circ M = I_W \quad \text{and} \quad M \circ L = I_V \]

where I_W and I_V are the identity maps on W and V respectively. If such an M exists, we say that L is invertible or an isomorphism. We say that vector spaces V, W are isomorphic if there exists an invertible L ∈ L(V, W).

Since invertible means the same as bijective, we note that L is invertible if and only if it is both injective and surjective.²⁰ The following results consist of the basic properties of inverses.
Theorem 2.22. 1. If L is invertible, then it has a unique inverse: we call this function L⁻¹.

2. If L ∈ L(V, W) and N ∈ L(U, V) are invertible, then L ∘ N ∈ L(U, W) is invertible, with inverse (L ∘ N)⁻¹ = N⁻¹ ∘ L⁻¹.

3. If L is invertible, then L⁻¹ is invertible, and (L⁻¹)⁻¹ = L.

4. If L ∈ L(V, W) is invertible, then L⁻¹ ∈ L(W, V): that is, L⁻¹ is itself linear.

Proof. 1. Suppose that L ∈ L(V, W) has two inverses, M and N. Then, in particular, we have

\[ L \circ M = I_W \quad \text{and} \quad N \circ L = I_V \]

Now precompose the first equation with N to obtain

\[ N \circ (L \circ M) = N \circ I_W \implies (N \circ L) \circ M = N \quad \text{(associativity of functional composition)} \implies I_V \circ M = N \implies M = N \]

2. By associativity of functional composition,

\[ (L \circ N) \circ (N^{-1} \circ L^{-1}) = L \circ (N \circ N^{-1}) \circ L^{-1} = L \circ I_V \circ L^{-1} = L \circ L^{-1} = I_W \]

Showing that (N⁻¹ ∘ L⁻¹) ∘ (L ∘ N) = I_U is similar.

3. This is simply a re-reading of L⁻¹ ∘ L = I_V and L ∘ L⁻¹ = I_W.

4. Suppose that L ∈ L(V, W) is invertible with inverse L⁻¹ : W → V. Let w1, w2 ∈ W and λ ∈ F. Since L is bijective, ∃v1, v2 ∈ V such that wi = L(vi). But then

\[ L^{-1}(\lambda w_1 + w_2) = L^{-1}\bigl(\lambda L(v_1) + L(v_2)\bigr) = L^{-1}\bigl(L(\lambda v_1 + v_2)\bigr) \quad \text{(linearity of } L\text{)} = \lambda v_1 + v_2 = \lambda L^{-1}(w_1) + L^{-1}(w_2) \]
²⁰ If L is invertible, the fact that L ∘ M = I_W is bijective forces L to be surjective and M injective; M ∘ L = I_V being bijective forces M surjective and L injective, whence both are bijective. Conversely, if L is bijective and w ∈ W, we know (surjectivity) that w = L(v) for some v ∈ V. Injectivity of L says that v is the only vector in V for which this is true. If we define L⁻¹(w) = v, then L is invertible.
Remark: "isomorphic" is an equivalence relation on the collection of all vector spaces. In particular, I_V is an isomorphism of any vector space with itself (reflexivity), while parts 2 and 3 of the above theorem show transitivity and symmetry respectively.

Invertibility and dimension Recall Corollary 2.11. In our new language this says that if V, W are finite-dimensional vector spaces of the same dimension, then

\[ L \in \mathcal{L}(V, W) \text{ is invertible} \iff \operatorname{null} L = 0 \iff \operatorname{rank} L = \dim V \]

What is interesting is that this can be turned on its head: equal dimension implies that the spaces are isomorphic.²¹
Theorem 2.23. Suppose that V and W are vector spaces over the same field.

1. If L ∈ L(V, W) is an isomorphism and β is a basis of V, then L(β) is a basis of W.

2. V and W are isomorphic if and only if dim V = dim W. In particular, if V and W are isomorphic, then V is finite-dimensional if and only if W is.

Proof. 1. By Lemma 2.7, if β is a basis of V, then L(β) is a spanning set for L(V). But L is surjective, thus L(V) = W, whence L(β) spans W.

Now suppose that v1, . . . , vn ∈ β and a1, . . . , an ∈ F for which

\[ a_1 L(v_1) + \cdots + a_n L(v_n) = 0 \]

Since L is injective, Theorem 2.9 says that we have a trivial null-space, N(L) = {0}. It follows that a1 v1 + · · · + an vn = 0 ⇒ a1, . . . , an = 0. Therefore L(β) is linearly independent and thus a basis of W.

2. By part 1, if β is a basis and L ∈ L(V, W) an isomorphism, then L|_β : β → L(β) is a bijection. It follows that bases of V and W have the same cardinality, whence dim V = dim W.

Conversely, if dim V = dim W and β, γ are bases of V and W respectively, then β and γ have the same cardinality. It follows that there exists a bijection f : β → γ. Since any linear map is defined by what it does to a basis, f gives rise to a unique linear map L : V → W: for any n ∈ N, if v1, . . . , vn are any elements of β, define

\[ L\Bigl(\sum_{i=1}^{n} a_i v_i\Bigr) := \sum_{i=1}^{n} a_i f(v_i) \]

It is a straightforward exercise to check that L is an isomorphism.
While the theorem applies even to infinite-dimensional vector spaces, something more pleasant happens in finite dimensions. Over a field F, there is, up to isomorphism, precisely one vector space of dimension n. The following corollaries are nothing more than restatements of Theorem 1.24 and Corollary 2.14 respectively, using the language of isomorphisms.

Corollary 2.24. 1. If V is a vector space over F with dimension n, then V is isomorphic to F^n. In particular, if β is a basis of V, then φ_β(v) := [v]_β defines an isomorphism φ_β : V → F^n.
²¹ This is very much in contrast to group theory, where the standard measure of size is the cardinality of the group. For example, Z4 and the Klein 4-group V have the same cardinality (four) but are not isomorphic.
2. If V, W are vector spaces over F of dimensions n and m respectively, then L(V, W) is a vector space over F of dimension mn. Moreover, if β, γ are bases of V, W respectively, then

\[ \Phi : \mathcal{L}(V, W) \to M_{m \times n}(F) : L \mapsto [L]_\beta^\gamma \]

is an isomorphism.
Examples

1. R⁶, P5(R), M_{2×3}(R), M_{3×2}(R) are all isomorphic because they are all vector spaces over the same field R with the same dimension 6.

2. It is incorrect to claim that P5(R) is isomorphic to M_{2×3}(C), since the base fields are different. Since any complex vector space can be viewed as a real vector space with twice the dimension, it follows that, as real vector spaces,

\[ \dim_{\mathbb{R}} P_5(\mathbb{R}) = 6 \neq 12 = \dim_{\mathbb{R}} M_{2\times 3}(\mathbb{C}) \]

3. (Non-examinable) It is very important to know the base field when talking about dimension and isomorphicity! For instance, is it ever correct to claim that R and C are isomorphic? As vector spaces over R, the answer is no, since dim_R R = 1 ≠ 2 = dim_R C. However, it can be shown that dim_Q R = 2^{ℵ₀} = dim_Q C, whence they are isomorphic as vector spaces over Q. More technically, when claiming that two objects are isomorphic, it is important to stress in what category you are working: groups, rings, fields, vector spaces over a given field, etc.
Bases and the dual space (non-examinable) Let us think about the corollary in terms of the basis-comparison language of the theorem. First note that if β = {v1, . . . , vn}, then φ_β(vi) = ei, so that φ_β simply maps the ith basis vector of β to the ith standard basis vector of F^n.

It is a little trickier to think about the isomorphism Φ in terms of bases. For this, define the linear functions fi : V → F by

\[ f_i(v_j) := \delta_{ij} \]

and let γ = {w1, . . . , wm} be a basis of W. It can be shown that every linear map L ∈ L(V, W) is a unique linear combination

\[ L = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}\, w_i f_j \]

so that the set {wi fj} forms a basis of L(V, W). The isomorphism Φ maps

\[ \Phi : w_i f_j \mapsto E_{ij} \]

where {Eij} is the standard basis of M_{m×n}(F). The elements {f1, . . . , fn} form a basis of the dual space

\[ V^* := \mathcal{L}(V, F) \]

Indeed the basis {f1, . . . , fn} is said to be the dual basis to β, and it follows that vi ↦ fi defines an isomorphism of V with its dual. Perhaps surprisingly, if V is infinite-dimensional, then the whole discussion breaks down and V is not isomorphic to its dual.
2.5 The Change of Co-ordinate Matrix

Suppose that V is finite-dimensional over F with basis β = {v1, . . . , vn}. We know that the map

\[ \varphi_\beta : V \to F^n : x \mapsto [x]_\beta \]

is an isomorphism of vector spaces. But what if we chose a different basis? Suppose that ε is also a basis of V. Then we have another isomorphism

\[ \varphi_\varepsilon : V \to F^n : x \mapsto [x]_\varepsilon \]

Since inverses and compositions of isomorphisms are also isomorphisms, it follows that

\[ \varphi_\beta \circ \varphi_\varepsilon^{-1} : F^n \to F^n : [x]_\varepsilon \mapsto [x]_\beta \tag{$*$} \]

is an isomorphism. However, by Theorem 2.20, any linear map in L(F^n) has to be left-multiplication by some matrix Q ∈ M_n(F). Otherwise said,

\[ \exists Q \in M_n(F) \text{ such that } \varphi_\beta \circ \varphi_\varepsilon^{-1} = L_Q \]

or more concretely,

\[ \exists Q \in M_n(F) \text{ such that } \forall [x]_\varepsilon \in F^n \text{ we have } (\varphi_\beta \circ \varphi_\varepsilon^{-1})([x]_\varepsilon) = Q[x]_\varepsilon \tag{$**$} \]

and moreover, the matrix Q is unique. This Q has a name:

Definition 2.25. Let V be an n-dimensional vector space over F with bases β and ε. The matrix Q ∈ M_n(F) whose corresponding linear map is

\[ L_Q = \varphi_\beta \circ \varphi_\varepsilon^{-1} : [x]_\varepsilon \mapsto [x]_\beta \]

is the change of co-ordinate matrix from ε to β.

We can compute the change of co-ordinate matrix explicitly. If β = {v1, . . . , vn} and ε = {e1, . . . , en} are our two bases, then we can find the jth column of the matrix Q by multiplying by the jth standard basis vector in F^n; according to (∗∗) this means multiplying by the co-ordinate vector of ej with respect to ε: by (∗) the result must be [ej]_β,

\[ Q \begin{pmatrix} \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \end{pmatrix} = Q[e_j]_\varepsilon = [e_j]_\beta \]
Rewriting the picture following Theorem 2.12 and taking W = V, we obtain a commutative diagram and the proof of:

Theorem 2.26. The change of co-ordinate matrix from ε to β is the matrix of the identity linear map on V with respect to ε and β. That is, if ε = {e1, . . . , en}, then

\[ Q = [I_V]_\varepsilon^\beta = \begin{pmatrix} | & & | \\ [e_1]_\beta & \cdots & [e_n]_\beta \\ | & & | \end{pmatrix} \]

    x ∈ V  -------I_V------->  x ∈ V
      |                          |
     φ_ε                        φ_β
      |                          |
    [x]_ε ∈ F^n ----L_Q----> [x]_β ∈ F^n
If this appeal to commutative diagrams is unconvincing, here is an alternative way of seeing this result. It is easiest to follow if we write scalar multiplication vector-first. Since β = {v1, . . . , vn} is a basis, for each ej ∈ ε we have unique constants Qij such that

\[ e_j = v_1 Q_{1j} + \cdots + v_n Q_{nj} = \sum_{i=1}^{n} v_i Q_{ij} \]

Then, for any x ∈ V, there exist unique scalars a1, . . . , an such that

\[ x = a_1 e_1 + \cdots + a_n e_n = \sum_{j=1}^{n} e_j a_j = \sum_{j=1}^{n}\sum_{i=1}^{n} v_i Q_{ij} a_j = \sum_{i=1}^{n} v_i \Bigl[\sum_{j=1}^{n} Q_{ij} a_j\Bigr] \]

It follows that

\[ [x]_\varepsilon = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \quad \text{and} \quad [x]_\beta = Q\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \]

Q is clearly the change of co-ordinate matrix, and its jth column Q[ej]_ε is manifestly the vector [ej]_β.
It is important to remember that the change of co-ordinate matrix is merely telling you how the co-ordinates of a vector v ∈ V change when a basis changes: nothing is happening to the vector v itself! An analogy is that [v]_ε and [v]_β are akin to looking at an object v through two different pairs of sunglasses: the two images might be different, but they are simply representations of the same unchanging object v.
Computing co-ordinate changes We've set up our notation so that the most common situation is easiest to describe. Suppose that V is a vector space with a standard basis ε = {e1, . . . , en}, and that β = {v1, . . . , vn} is some other basis. A typical problem involves converting co-ordinates with respect to the standard basis to those with respect to β: we therefore want the change of co-ordinate matrix Q = [I_V]_ε^β. Unfortunately a direct computation is difficult: we would need to find the co-ordinates of each standard basis vector ej with respect to a, perhaps, complicated basis β. It should be clear, however, that the inverse change of co-ordinate matrix is much easier to compute:

Q = [I_V]_ε^β = change of co-ordinate matrix from ε to β
Q⁻¹ = [I_V]_β^ε = change of co-ordinate matrix from β to ε

Since, typically, the basis β will be defined in terms of the standard basis ε, the columns of Q⁻¹ are simply the co-ordinates of the elements of β:

\[ Q^{-1} = [I_V]_\beta^\varepsilon = \begin{pmatrix} | & & | \\ [v_1]_\varepsilon & \cdots & [v_n]_\varepsilon \\ | & & | \end{pmatrix} \]

We now take the inverse of Q⁻¹ to obtain the desired matrix Q.
Example Let ε = {1, x, x²} be the standard basis of P2(R) and β = {1 + x, 2 − x², 4 − x²}. We want to compute Q = [I_{P2(R)}]_ε^β, the change of co-ordinate matrix from ε to β. Instead we compute its inverse:

\[ Q^{-1} = [I_V]_\beta^\varepsilon = \begin{pmatrix} | & | & | \\ [1+x]_\varepsilon & [2-x^2]_\varepsilon & [4-x^2]_\varepsilon \\ | & | & | \end{pmatrix} = \begin{pmatrix} 1 & 2 & 4 \\ 1 & 0 & 0 \\ 0 & -1 & -1 \end{pmatrix} \]

To check that this makes sense, consider the polynomial

\[ p(x) = 7(1 + x) - 2(2 - x^2) - (4 - x^2) \]

which has co-ordinate representation

\[ [p]_\beta = \begin{pmatrix} 7 \\ -2 \\ -1 \end{pmatrix} \]

Multiplying out p yields p(x) = −1 + 7x + 3x². Note that

\[ \begin{pmatrix} -1 \\ 7 \\ 3 \end{pmatrix} = [p]_\varepsilon = Q^{-1}\begin{pmatrix} 7 \\ -2 \\ -1 \end{pmatrix} = Q^{-1}[p]_\beta \]

as expected. Inverting the matrix allows us to find the desired change of co-ordinate matrix Q:

\[ Q = \frac{1}{2}\begin{pmatrix} 0 & 2 & 0 \\ -1 & 1 & -4 \\ 1 & -1 & 2 \end{pmatrix} \]

For example, to find the co-ordinate representation of r(x) = 2 + 3x + 4x² with respect to β, we compute

\[ [r]_\beta = Q[r]_\varepsilon = \frac{1}{2}\begin{pmatrix} 0 & 2 & 0 \\ -1 & 1 & -4 \\ 1 & -1 & 2 \end{pmatrix}\begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 3 \\ -\frac{15}{2} \\ \frac{7}{2} \end{pmatrix} \]

which can be checked by multiplying out

\[ r(x) = 3(1 + x) - \tfrac{15}{2}(2 - x^2) + \tfrac{7}{2}(4 - x^2) \]

Of course, all this is predicated on being willing to invert a 3 × 3 matrix.
This process can be combined with matrix representations of linear maps.

Theorem 2.27. Suppose that V is an n-dimensional vector space with bases β and ε, and let L ∈ L(V) be a linear map. If Q = [I_V]_ε^β is the change of co-ordinate matrix from ε to β, then the matrices of L with respect to ε and β are related by

\[ [L]_\varepsilon = Q^{-1}[L]_\beta Q \]

To prove this, simply trace through what happens to the representation of any vector x ∈ V with respect to ε:

\[ Q^{-1}[L]_\beta Q[x]_\varepsilon = Q^{-1}[L]_\beta [x]_\beta = Q^{-1}[L(x)]_\beta = [L(x)]_\varepsilon = [L]_\varepsilon [x]_\varepsilon \implies [L]_\varepsilon = Q^{-1}[L]_\beta Q \]

The matrices of the linear map L with respect to different bases are therefore similar, or conjugate.
Examples

1. First recall the example on page 33 where L : R² → R² is reflection in the line y = 2x. We recast this example in our new language. Let ε = {i, j} be the standard basis of R² and let β = {v1, v2} = {i + 2j, −2i + j} be an alternative basis. Since L(v1) = v1 and L(v2) = −v2, we see that the matrix of L with respect to the basis β is

\[ [L]_\beta = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \]

The change of co-ordinate matrix Q⁻¹ from β to ε is clearly given by

\[ Q^{-1} = [I_{\mathbb{R}^2}]_\beta^\varepsilon = \bigl([v_1]_\varepsilon, [v_2]_\varepsilon\bigr) = \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix} \]

It follows that the matrix of the linear map L with respect to the standard basis ε is

\[ [L]_\varepsilon = Q^{-1}[L]_\beta Q = \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \cdot \frac{1}{5}\begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} -3 & 4 \\ 4 & 3 \end{pmatrix} \]

exactly as claimed on page 33.
2. More generally, let L : R² → R² be reflection across the line making angle θ with the positive x-axis. Choose β = {v1, v2} so that v1 is parallel to the line and v2 perpendicular. Given that we may choose

\[ v_1 = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} \quad \text{and} \quad v_2 = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix} \]

the change of co-ordinate matrix from β to ε is

\[ Q^{-1} = [I_{\mathbb{R}^2}]_\beta^\varepsilon = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \]

The matrix of L with respect to β is [L]_β = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} as before, whence, invoking some trig identities, we see that the matrix of L with respect to the standard basis is

\[ [L]_\varepsilon = Q^{-1}[L]_\beta Q = \begin{pmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{pmatrix} \]

Observe that the columns of [L]_ε are [L(i)]_ε and [L(j)]_ε.

[Figure: the line at angle θ through the origin, with i, j, their reflections L(i), L(j), and the basis vectors v1 = (cos θ, sin θ), v2, L(v2).]
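The trig identities can be delegated to a computer algebra system; a quick sketch assuming sympy:

    import sympy as sp

    t = sp.symbols('theta')
    Qinv = sp.Matrix([[sp.cos(t), -sp.sin(t)],
                      [sp.sin(t),  sp.cos(t)]])       # columns [v1]_eps, [v2]_eps
    L_beta = sp.Matrix([[1, 0], [0, -1]])             # reflection w.r.t. beta

    L_eps = Qinv * L_beta * Qinv.inv()                # Q^{-1} [L]_beta Q
    target = sp.Matrix([[sp.cos(2*t), sp.sin(2*t)],
                        [sp.sin(2*t), -sp.cos(2*t)]])
    assert sp.simplify(L_eps - target) == sp.zeros(2, 2)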
Change of basis in general (non-examinable) The change of basis approach can be further generalized to apply to a linear map between different vector spaces, where we change basis on both spaces. This is a crazy result to attempt to work with by hand, but it does say what we should feed into a computer for a given example!

Theorem 2.28. Suppose that L : V → W is a linear map where dim V = n and dim W = m. Suppose moreover that ε and β are bases of V, and that γ and δ are bases of W. Then

\[ [L]_\varepsilon^\gamma = R^{-1}[L]_\beta^\delta Q \]

where Q = [I_V]_ε^β ∈ M_n(F) is the change of co-ordinate matrix from ε to β and R = [I_W]_γ^δ ∈ M_m(F) is the change of co-ordinate matrix from γ to δ. Otherwise said, the following diagram commutes.

    x ∈ V  -----------L----------->  L(x) ∈ W
      |                                  |
     φ_β                                φ_δ
      |                                  |
    [x]_β ∈ F^n ----[L]_β^δ----> [L(x)]_δ ∈ F^m
      ↑                                  ↑
      Q = [I_V]_ε^β                      R = [I_W]_γ^δ
      |                                  |
    [x]_ε ∈ F^n ----[L]_ε^γ----> [L(x)]_γ ∈ F^m

The above formula can also be viewed in the form

\[ [I_W]_\delta^\gamma [L]_\beta^\delta [I_V]_\varepsilon^\beta = [I_W \circ L \circ I_V]_\varepsilon^\gamma = [L]_\varepsilon^\gamma \]

by observing that matrices of linear maps can be multiplied if the basis of the domain of the first equals that of the codomain of the second.
Example The derivative operator D ∈ L(P3(R), P2(R)) defined by D(p)(x) = p′(x) has matrix

\[ [D]_\varepsilon^\gamma = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \]

with respect to the standard bases ε = {1, x, x², x³} and γ = {1, x, x²}.

Suppose that β = {1 + x, 1 − x, 2x + x², x³ − 1} and δ = {1 − x, 2 + x², x} are two new bases of P3(R) and P2(R) respectively. The change of co-ordinate matrices from these bases back to the standard bases are

\[ Q^{-1} = [I_{P_3(\mathbb{R})}]_\beta^\varepsilon = \begin{pmatrix} 1 & 1 & 0 & -1 \\ 1 & -1 & 2 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad R^{-1} = [I_{P_2(\mathbb{R})}]_\delta^\gamma = \begin{pmatrix} 1 & 2 & 0 \\ -1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \]
The matrix of D with respect to β and δ is therefore

\[ [D]_\beta^\delta = R[D]_\varepsilon^\gamma Q^{-1} = \begin{pmatrix} 1 & 0 & -2 \\ 0 & 0 & 1 \\ 1 & 1 & -2 \end{pmatrix}\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}\begin{pmatrix} 1 & 1 & 0 & -1 \\ 1 & -1 & 2 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & -1 & 2 & -6 \\ 0 & 0 & 0 & 3 \\ 1 & -1 & 4 & -6 \end{pmatrix} \]

We can check that this works on an example: let

\[ p(x) = 3(1 + x) + 2(1 - x) - 4(2x + x^2) + 5(x^3 - 1) \]

be written with respect to β. Then

\[ (Dp)(x) = p'(x) = 3 - 2 - 4(2 + 2x) + 15x^2 = -7 - 8x + 15x^2 = -37(1 - x) + 15(2 + x^2) - 45x \]

with respect to δ. However,

\[ [D]_\beta^\delta [p]_\beta = \begin{pmatrix} 1 & -1 & 2 & -6 \\ 0 & 0 & 0 & 3 \\ 1 & -1 & 4 & -6 \end{pmatrix}\begin{pmatrix} 3 \\ 2 \\ -4 \\ 5 \end{pmatrix} = \begin{pmatrix} -37 \\ 15 \\ -45 \end{pmatrix} = [Dp]_\delta \]

as required.
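As promised, this is exactly what one feeds to a computer; a sketch of the whole example (numpy assumed):

    import numpy as np

    D = np.array([[0, 1, 0, 0],
                  [0, 0, 2, 0],
                  [0, 0, 0, 3]], dtype=float)      # [D] w.r.t. the standard bases

    Qinv = np.array([[1, 1, 0, -1],                # beta -> standard on P3(R)
                     [1, -1, 2, 0],
                     [0, 0, 1, 0],
                     [0, 0, 0, 1]], dtype=float)
    Rinv = np.array([[1, 2, 0],                    # delta -> standard on P2(R)
                     [-1, 0, 1],
                     [0, 1, 0]], dtype=float)

    D_new = np.linalg.inv(Rinv) @ D @ Qinv         # [D] w.r.t. beta and delta
    assert np.allclose(D_new, [[1, -1, 2, -6], [0, 0, 0, 3], [1, -1, 4, -6]])

    p_beta = np.array([3, 2, -4, 5], dtype=float)  # the test polynomial p
    assert np.allclose(D_new @ p_beta, [-37, 15, -45])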
3 Elementary Matrix Operations and Systems of Linear Equations

3.1 Elementary Matrix Operations and Elementary Matrices

This should all be revision.

Definition 3.1. Let A be an m × n matrix. We can define three families of transformations, determined by what they do to the rows of A.

Type I Swap any two of the rows. Leave the rest alone.

Type II Multiply one row by a non-zero constant and leave the others alone.

Type III Add a scalar multiple of one row to another. Leave the other rows alone.

These transformations are termed elementary row operations. Applying the same approach to columns yields the elementary column operations.
Given our earlier discussion of linear maps an