Math 121A Linear Algebra
Neil Donaldson
Spring 2016
Text: Linear Algebra, Stephen Friedberg, Arnold Insel & Lawrence Spence, 4th Ed 2003, Prentice Hall.
1 Vector Spaces
1.1 Introduction
What is Linear Algebra?
Linearity is one of the most important properties in mathematics. A function is said to be linear if it preserves addition and scalar multiplication. More precisely, a function $L : V \to W$ between vector spaces $V$ and $W$ is linear if, for all vectors $v_1, v_2 \in V$ and all scalars $\lambda$, we have the following properties:

(a) $L(v_1 + v_2) = L(v_1) + L(v_2)$

(b) $L(\lambda v_1) = \lambda L(v_1)$
Linear algebra is simply the study of linear functions. You have already spent much of your mathematical career studying linear functions. For example:

- If $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$, and $L$ is multiplication by a real $m \times n$ matrix.

- If $L = \frac{d}{dx}$ is the usual differential operator, and $V$ is a vector space of differentiable functions. More generally, $L$ could be a linear differential operator such as $L = \frac{d^2}{dx^2} + 2x\frac{d}{dx} + x^2 + 1$, whence
$$L(y) = y'' + 2xy' + (x^2 + 1)y$$
The standard methods for solving linear differential equations such as $L(y) = 0$ are based on linear algebra.

- If $V$ is a vector space of integrable functions we could similarly define $L(f) = \int_a^x f(t)\,dt$.

- If you've studied group theory, the second part of the formula says that $L : (V,+) \to (W,+)$ is a homomorphism of Abelian groups.
In mathematics the word linear often indicates that a problem or structure is easy to deal with. Linear systems may be analyzed systematically using standard techniques. A non-linear system, by contrast, is likely to be much more difficult to attack: if one can solve a non-linear problem, it is often due to some one-off piece of trickery or luck.
What makes linear problems easy? The essence of why linear problems are easier is that one can use simple solutions as building blocks to construct more complex solutions. For example, the fact that integration is linear is what allows us to compute integrals of polynomials using only the power law:
$$\int x^2 + 5x^3\,dx = \int x^2\,dx + 5\int x^3\,dx \qquad\text{(linearity)}$$
$$= \tfrac{1}{3}x^3 + \tfrac{5}{4}x^4 + c \qquad\text{(power law)}$$
To reiterate, linearity says that we only need to know how to integrate powers $\int x^n\,dx = \frac{1}{n+1}x^{n+1}$ in order to be able to integrate all polynomials.
Here is a trickier example: consider the linear function
$$L : \mathbb{R}^2 \to \mathbb{R}^2$$
which rotates a point 30° clockwise around the origin. You should believe, although it is a tricky exercise at the moment to prove it, that $L$ is indeed linear. To discover a formula for $L$ it is enough to consider what $L$ does to the standard basis of $\mathbb{R}^2$, namely the vectors
$$\mathbf{i} = \begin{pmatrix}1\\0\end{pmatrix} \qquad \mathbf{j} = \begin{pmatrix}0\\1\end{pmatrix}$$
This is because if $v = \begin{pmatrix}x\\y\end{pmatrix}$ is any vector, then, by linearity,
$$v = x\mathbf{i} + y\mathbf{j} \implies L(v) = xL(\mathbf{i}) + yL(\mathbf{j})$$
Using the picture and a little trigonometry, it should be obvious that
$$L(\mathbf{i}) = \begin{pmatrix}\cos 30°\\ -\sin 30°\end{pmatrix} = \begin{pmatrix}\frac{\sqrt{3}}{2}\\ -\frac{1}{2}\end{pmatrix} \qquad L(\mathbf{j}) = \begin{pmatrix}\sin 30°\\ \cos 30°\end{pmatrix} = \begin{pmatrix}\frac{1}{2}\\ \frac{\sqrt{3}}{2}\end{pmatrix}$$
We therefore obtain
$$L\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}\frac{\sqrt{3}}{2}x + \frac{1}{2}y\\ -\frac{1}{2}x + \frac{\sqrt{3}}{2}y\end{pmatrix} = \begin{pmatrix}\frac{\sqrt{3}}{2} & \frac{1}{2}\\ -\frac{1}{2} & \frac{\sqrt{3}}{2}\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}$$
[Figure: the basis vectors $\mathbf{i}, \mathbf{j}$ and their images $L(\mathbf{i}), L(\mathbf{j})$ under the 30° clockwise rotation.]
In the above example, we only needed to know what the function $L$ did to the basis vectors $\mathbf{i}$ and $\mathbf{j}$ in order to completely determine the function. This is not a property shared by non-linear functions. For example, if $|v|$ is the length of $v \in \mathbb{R}^2$, then the function
$$f : \mathbb{R}^2 \to \mathbb{R}^2 : v \mapsto (|v|^2 + 1)v$$
is non-linear. Simply being told that $f(\mathbf{i}) = 2\mathbf{i}$ and $f(\mathbf{j}) = 2\mathbf{j}$ is insufficient for you to completely understand the function.
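These two claims are easy to test numerically. The following is a minimal sketch (my own illustration, not part of the original notes; it assumes numpy is available): the rotation matrix passes the additivity test on sample vectors, while $f$ fails it.

```python
import numpy as np

# 30 degrees clockwise rotation, as computed above
L = np.array([[np.sqrt(3)/2,  1/2],
              [-1/2,          np.sqrt(3)/2]])

def f(v):
    # The non-linear map f(v) = (|v|^2 + 1) v
    return (np.dot(v, v) + 1) * v

v1, v2 = np.array([1.0, 2.0]), np.array([3.0, -1.0])

# Linearity: L(v1 + v2) == L(v1) + L(v2)
print(np.allclose(L @ (v1 + v2), L @ v1 + L @ v2))   # True
# f fails the same test
print(np.allclose(f(v1 + v2), f(v1) + f(v2)))        # False
```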
A Review of $\mathbb{R}^2$ and $\mathbb{R}^3$

You have already spent some time thinking about the simple real vector spaces $\mathbb{R}^2$ and $\mathbb{R}^3$, where vectors may be thought of as arrows joining two points. In the picture we take the vector $v$ to join the origin and the point $P = (x, y)$. Writing $\mathbf{i}, \mathbf{j}$ for the standard basis vectors, we have several notations for $v$:
$$v = \vec{OP} = \begin{pmatrix}x\\y\end{pmatrix} = x\mathbf{i} + y\mathbf{j}$$
The column vector notation is used to distinguish the vector $\begin{pmatrix}x\\y\end{pmatrix}$ from the point $(x, y)$. The vector space $\mathbb{R}^2$ is simply the set of all such vectors.
In three dimensions we have a similar idea, except that a point now has three co-ordinates and we need the three standard basis vectors $\mathbf{i}, \mathbf{j}, \mathbf{k}$.
Scalar multiplication involves lengthening a vector by a real multiple: thus the vector $tv$ has components $tx$ and $ty$ and we may write
$$tv = \vec{OQ} = \begin{pmatrix}tx\\ty\end{pmatrix} = tx\mathbf{i} + ty\mathbf{j}$$
[Figures: the vector $v = \vec{OP}$ and its scaling $tv = \vec{OQ}$, in two and three dimensions.]
Vector addition is defined by the parallelogram law. Algebraically, if $v_1 = \begin{pmatrix}x_1\\y_1\end{pmatrix}$ and $v_2 = \begin{pmatrix}x_2\\y_2\end{pmatrix}$, then
$$v_1 + v_2 = \begin{pmatrix}x_1 + x_2\\ y_1 + y_2\end{pmatrix} = (x_1 + x_2)\mathbf{i} + (y_1 + y_2)\mathbf{j}$$
[Figure: the parallelogram law for $v + w$.]
An important subtlety of vector spaces is that there is no need for the vector $v$ to have its tail at the origin: direction and magnitude are all that matters.¹ This approach allows us to view the opposite edges of the above parallelogram as being the same vector. Vector addition then has the intuitive nose-to-tail interpretation.
[Figure: nose-to-tail addition of $u + v + w$.]

¹To labor the point, a directed line segment joining two points may be described as the ordered pair of those points $\big((a,b),(c,d)\big)$. Two such segments are equivalent vectors if and only if they have the same length and direction. Specifically, if we define $\sim$ on the set of pairs of points in the plane by
$$\big((a,b),(c,d)\big) \sim \big((p,q),(r,s)\big) \iff \begin{cases} c - a = r - p\\ d - b = s - q \end{cases}$$
then $\sim$ is an equivalence relation. A vector is nothing more than an equivalence class of directed line segments under $\sim$.
1.2 Vector Spaces
Vector spaces are the universes of linear algebra. In general, a vector space is a set with two operations (addition and scalar multiplication) which behave similarly to the intuitive structure of $\mathbb{R}^2$. What do we mean by this? There are certain identities which are obvious in $\mathbb{R}^2$, such as commutativity:
$$v + w = w + v$$
You can probably think of several more. The axioms of a vector space are simply that all these obvious identities hold. Precisely, we have the following definition.
Definition 1.1. A vector space (or linear space) $V$ over a field $F$ consists of a set $V$ together with two operations:

Vector Addition: If $v$ and $w$ are elements of $V$ then we can form the sum $v + w$.

Scalar Multiplication: If $v \in V$ and $\lambda \in F$ then we can form the product $\lambda v$.

Together, these sets and operations satisfy the following axioms:

G1: Closure under addition: $\forall v, w \in V$, $v + w \in V$
G2: Associativity of addition: $\forall u, v, w \in V$, $(u + v) + w = u + (v + w)$
G3: Identity for addition: $\exists 0 \in V$ such that $\forall v \in V$, $v + 0 = v$
G4: Inverse for addition: $\forall v \in V$, $\exists -v \in V$ such that $v + (-v) = 0$
G5: Commutativity of addition: $\forall v, w \in V$, $v + w = w + v$
A1: Closure under scalar multiplication: $\forall v \in V$, $\forall \lambda \in F$, $\lambda v \in V$
A2: Identity for scalar multiplication: $\forall v \in V$, $1v = v$
A3: Action of scalar multiplication: $\forall \lambda, \mu \in F$, $\forall v \in V$, $\lambda(\mu v) = (\lambda\mu)v$
D1: Distributivity I: $\forall v, w \in V$, $\forall \lambda \in F$, $\lambda(v + w) = \lambda v + \lambda w$
D2: Distributivity II: $\forall v \in V$, $\forall \lambda, \mu \in F$, $(\lambda + \mu)v = \lambda v + \mu v$

Elements of $V$ are called vectors, while elements of $F$ are scalars.
For those who have studied groups, the first five axioms say that $(V,+)$ is an Abelian group, while the next three say that the field $F$ has a left action on $V$. The distributivity axioms tell us how the two operations interact.
Fields: A field $F$ is a set which behaves very like the real numbers under addition and multiplication. Indeed, in almost all concrete examples of vector spaces that you will encounter, $F$ will be either the real numbers $\mathbb{R}$ or the complex numbers $\mathbb{C}$. The symbols 0 and 1 (as seen in the seventh axiom) will always refer to the additive and multiplicative identities in the field. Be careful to distinguish the scalar $0 \in F$ from the zero vector $0 \in V$.
Inverses and subtraction: Subtraction of vectors can be viewed as a binary operation. It is taken to mean addition of the inverse, namely
$$v - w := v + (-w)$$
In the vector space $\mathbb{R}^2$ this can be viewed pictorially.
[Figure: $v - w$ as the sum of $v$ and $-w$.]
Pictures and Intuition: Strictly speaking, the pictorial arrow interpretation is only valid in the vector spaces $\mathbb{R}^2$ and $\mathbb{R}^3$. This doesn't diminish the use of pictures in other spaces as a guide to your intuition. Any result will still have to be proved using the abstract definition of a vector space and/or the specific properties of your example, but the intuition obtained from drawing a picture can still be helpful for both your and your readers' understanding. This is similar to how Venn diagrams are useful, but do not constitute a proof when considering sets.
Important Examples

n-tuples: If $F$ is any field, then the set $F^n$ of n-tuples forms a vector space over $F$. That is,
$$F^n = \left\{ \begin{pmatrix}a_1\\a_2\\ \vdots\\ a_n\end{pmatrix} : a_1, \ldots, a_n \in F \right\}$$
where addition and scalar multiplication are defined by
$$\begin{pmatrix}a_1\\a_2\\ \vdots\\ a_n\end{pmatrix} + \begin{pmatrix}b_1\\b_2\\ \vdots\\ b_n\end{pmatrix} := \begin{pmatrix}a_1 + b_1\\ a_2 + b_2\\ \vdots\\ a_n + b_n\end{pmatrix} \qquad\text{and}\qquad \lambda\begin{pmatrix}a_1\\a_2\\ \vdots\\ a_n\end{pmatrix} := \begin{pmatrix}\lambda a_1\\ \lambda a_2\\ \vdots\\ \lambda a_n\end{pmatrix}$$
This is precisely the column vector notation we are used to in $\mathbb{R}^2$ and $\mathbb{R}^3$. We refer to the values $a_1, \ldots, a_n$ as the entries or components of a vector. It is tedious to do so, but each of the axioms of a vector space can be checked individually. For example, axiom D2 may be proved as follows:
$$(\lambda + \mu)\begin{pmatrix}a_1\\a_2\\ \vdots\\ a_n\end{pmatrix} = \begin{pmatrix}(\lambda+\mu)a_1\\ (\lambda+\mu)a_2\\ \vdots\\ (\lambda+\mu)a_n\end{pmatrix} = \begin{pmatrix}\lambda a_1 + \mu a_1\\ \lambda a_2 + \mu a_2\\ \vdots\\ \lambda a_n + \mu a_n\end{pmatrix} = \begin{pmatrix}\lambda a_1\\ \lambda a_2\\ \vdots\\ \lambda a_n\end{pmatrix} + \begin{pmatrix}\mu a_1\\ \mu a_2\\ \vdots\\ \mu a_n\end{pmatrix} = \lambda\begin{pmatrix}a_1\\a_2\\ \vdots\\ a_n\end{pmatrix} + \mu\begin{pmatrix}a_1\\a_2\\ \vdots\\ a_n\end{pmatrix}$$
The first, third and fourth equalities are from the definitions of addition and scalar multiplication, while the second equality holds because of the distributivity laws in the field $F$.
$m \times n$ Matrices: If $F$ is any field, then the set $M_{m\times n}(F)$ of $m \times n$ matrices with entries in $F$ forms a vector space. Vector addition in $M_{m\times n}(F)$ is the usual matrix addition, and scalar multiplication simply multiplies all entries of a matrix by the same constant. Indeed, by stacking the columns of a matrix, it should be clear that there is essentially no difference between the vector spaces $M_{m\times n}(F)$ and $F^{mn}$.
Sets of functions: Suppose that $D$ is a set and $F$ is a field. Then the set
$$\mathcal{F}(D, F) = \{ f : D \to F \}$$
of functions with domain $D$ and codomain $F$ forms a vector space over $F$. Vector addition and scalar multiplication are defined as follows:

Addition: If $f, g \in \mathcal{F}(D, F)$, then $f + g$ is the function defined by
$$\forall x \in D,\quad (f + g)(x) = f(x) + g(x)$$

Scalar multiplication: If $f \in \mathcal{F}(D, F)$ and $\lambda \in F$, then $\lambda f$ is the function defined by
$$\forall x \in D,\quad (\lambda f)(x) = \lambda(f(x))$$

It is important to note that $f + g$ and $\lambda f$ are vectors (i.e., functions). By contrast $f(x)$ is a scalar (an element of the field $F$). It is a common mistake² to refer to "the function $f(x)$".
We can restrict to certain types of functions, for example continuous functions, differentiable functions, polynomials, sums of trigonometric functions, etc., provided that these sets are closed under addition. We will think about this more in the next section.
Sequences form an important class of vector spaces. These can be viewed simply as functions whose domain is the set of natural numbers.
Basic Theorems for Vector Spaces

Just as in group theory there are certain basic facts about vector spaces that you will use without thinking. Strictly speaking, however, if these facts are not axioms, then they need to be proved.

Lemma 1.2.
1. Cancellation law: $x + z = y + z \implies x = y$.
2. Uniqueness of identity: The zero vector $0$ posited in axiom G3 is unique.
3. Uniqueness of inverse: Given $v \in V$, the vector $-v$ posited in axiom G4 is unique.
4. Action of additive identity in $F$: $\forall v \in V$, we have $0v = 0$.
5. Action of negatives: $\forall v \in V$, $\forall \lambda \in F$, we have $(-\lambda)v = -(\lambda v)$.
6. Action on zero vector: $\forall \lambda \in F$, we have $\lambda 0 = 0$.

Most of these are left as exercises: they are easiest if proved in order. For an example argument, consider number 4. Since $0 = 0 + 0$ in any field $F$, we apply axioms D2, G3, G5 and the cancellation law to see that
$$0v = (0 + 0)v = 0v + 0v \qquad\text{(Distributivity D2)}$$
$$\implies 0 + 0v = 0v + 0v \qquad\text{(Identity G3 and Commutativity G5)}$$
$$\implies 0 = 0v \qquad\text{(Cancellation Law)}$$
²Endemic among calculus students...
1.3 Subspaces

As in other areas of algebra,³ the prefix sub means that an object is a subset, while simultaneously retaining the algebraic structure of the original set.

Definition 1.3. Let $V$ be a vector space over $F$. A subset $W \subseteq V$ is a subspace of $V$ if it is also a vector space over $F$ with respect to the same addition and scalar multiplication operations as $V$.
A subspace $W$ is proper if it is a proper subset (i.e., $W \neq V$). The trivial subspace of $V$ is the point set $\{0\}$.
As a shorthand, we write $W \leq V$ for a subspace, to distinguish from $W$ merely being a subset.
All of the axioms of a vector space except G1, G3, G4 and A1 hold for any subset of $V$, so it is sufficient to check these. In fact, we need only check the two closure axioms, as the next result shows.

Theorem 1.4. Suppose that $W$ is a non-empty subset of a vector space $V$ over $F$. Then $W$ is a subspace if and only if the following two properties hold:

S1: Closed under addition: $\forall w_1, w_2 \in W$, we have $w_1 + w_2 \in W$.
S2: Closed under scalar multiplication: $\forall w \in W$, $\forall \lambda \in F$, we have $\lambda w \in W$.

Proof. If $W$ is a subspace of $V$, then S1 and S2 are simply the axioms G1 and A1, whence the above properties hold.
Conversely, suppose that the properties S1 and S2 hold. We therefore have that all of the axioms of a vector space except G3 and G4 hold for $W$. It remains to prove that these are also satisfied.
Since $W$ is non-empty, we may choose some $w \in W$. By Lemma 1.2, part 4, and property S2, we see that
$$0 = 0w \in W$$
Thus axiom G3 is satisfied.
Now let $-w \in V$ be the additive inverse (in $V$) of the vector $w \in W$. We need to see that $-w \in W$. But this is immediate since
$$-w = (-1)w \in W$$
by Lemma 1.2, part 5, and property S2. ∎

Note: one could also state the Theorem by additionally requiring that $0 \in W$. This removes the need to assume that $W$ is non-empty.
Examples

1. If $n \leq m$ then we may consider the subspace $W$ of $\mathbb{R}^m$ consisting of all vectors of the form
$$w = \begin{pmatrix}w_1\\ \vdots\\ w_m\end{pmatrix} \quad\text{where}\quad i > n \implies w_i = 0$$

³Cf. subgroup, subring, subfield, etc.
In essence, $W$ looks like the set of column vectors of the form
$$w = \begin{pmatrix}x\\0\end{pmatrix} \quad\text{where } x \in \mathbb{R}^n \text{ and } 0 \in \mathbb{R}^{m-n}$$
Some writers will use the notation $\mathbb{R}^n$ to mean any vector space over $\mathbb{R}$ which looks like⁴ the space of column vectors of length $n$. In this language we can therefore write
$$n \leq m \implies \mathbb{R}^n \leq \mathbb{R}^m$$
The challenge with this more general idea of $\mathbb{R}^n$ is that there are now many ways in which $\mathbb{R}^n$ could be viewed as a subspace of $\mathbb{R}^m$.
2. Let $I \subseteq \mathbb{R}$ be an open interval. We have seen that $V = \mathcal{F}(I, \mathbb{R})$ is a vector space over $\mathbb{R}$. The subset
$$C(I, \mathbb{R}) = \{ f \in V : f \text{ is continuous} \}$$
is a subspace of $V$. We simply need to check the Theorem:

S1: If $f, g : I \to \mathbb{R}$ are continuous, then $f + g : I \to \mathbb{R}$ is continuous.
S2: If $f : I \to \mathbb{R}$ is continuous and $\lambda \in \mathbb{R}$, then $\lambda f : I \to \mathbb{R}$ is continuous.

The proofs you give for these facts depend on using the definition of continuity.⁵
3. The vector space $C^1(I, \mathbb{R})$ of functions $f : I \to \mathbb{R}$ which are differentiable and with continuous derivative similarly forms a subspace of $C(I, \mathbb{R})$. This can be extended naturally to the spaces $C^m(I, \mathbb{R})$ and even $C^\infty(I, \mathbb{R})$. All power series which converge on $I$ are infinitely differentiable, and thus are elements of the vector space $C^\infty(I, \mathbb{R})$.

4. The space of degree $\leq n$ real polynomials $P_n(\mathbb{R})$ is a subspace of every $C^m(\mathbb{R}, \mathbb{R})$. Similarly the vector space of all polynomials $P(\mathbb{R})$.
5. The trace of an $n \times n$ matrix is the function $\operatorname{tr} : M_n(F) \to F$ defined by
$$\operatorname{tr} A = \sum_{i=1}^n a_{ii} = a_{11} + a_{22} + \cdots + a_{nn}$$
That is, we sum the terms on the main diagonal. The subset of trace-free matrices is denoted⁶
$$\mathfrak{sl}_n(F) = \{ A \in M_n(F) : \operatorname{tr} A = 0 \}$$
It is easy to check that $\mathfrak{sl}_n(F) \leq M_n(F)$; see the numerical sketch after the footnotes.

⁴The correct term is isomorphic to, as we will see later.
⁵For example, using the fact that $f$ is continuous at $x = a$ if $\lim_{x\to a} f(x) = f(a)$, you need to observe that
$$\lim_{x\to a}[f(x) + g(x)] = f(a) + g(a) = (f + g)(a)$$
⁶If you've taken group theory, the notation $\mathfrak{sl}_n$ should remind you of the special linear group $SL_n$. This is no accident: since $\det e^A = e^{\operatorname{tr} A}$, the relationship is that $A \in \mathfrak{sl}_n \implies \exp(A) \in SL_n$.
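Returning to example 5: the closure properties S1 and S2 follow from the linearity of the trace, and can be spot-checked numerically. A minimal sketch, my own illustration (not from the notes), assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_trace_free(n):
    """Build a random n x n matrix, then kill its trace via one diagonal entry."""
    A = rng.standard_normal((n, n))
    A[0, 0] -= np.trace(A)
    return A

A, B = random_trace_free(3), random_trace_free(3)

# S1 and S2 from Theorem 1.4: closure under addition and scalar multiplication
print(np.isclose(np.trace(A + B), 0.0))    # True
print(np.isclose(np.trace(2.5 * A), 0.0))  # True
```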
Intersections and Direct Sums

Since vector spaces are sets, we may take unions and intersections of them. Unfortunately only one of these is a vector space...

Theorem 1.5. If $V$ and $W$ are both subspaces of some larger vector space $U$, then their intersection $V \cap W$ is a subspace of both $V$ and $W$.

Proof. Since $V$ and $W$ are both subspaces of $U$, they both contain $0$ and so $V \cap W$ is non-empty.
Now suppose that $x, y \in V \cap W$ and $\lambda \in F$. Then, since $V$ and $W$ are both vector spaces, they are closed under addition and scalar multiplication, whence we have
$$x + y \in V,\quad x + y \in W,\quad \lambda x \in V,\quad \lambda x \in W$$
But then $x + y \in V \cap W$ and $\lambda x \in V \cap W$, whence properties S1 and S2 hold. $V \cap W$ is therefore a subspace of both $V$ and $W$. ∎
Example. Suppose that
$$V = \{x\mathbf{i} + z\mathbf{k} : x, z \in \mathbb{R}\} \qquad W = \{y\mathbf{j} + z\mathbf{k} : y, z \in \mathbb{R}\}$$
are the xz- and yz-planes respectively. Both $V$ and $W$ are clearly subspaces of $\mathbb{R}^3$. Their intersection is the subspace
$$V \cap W = \{z\mathbf{k} : z \in \mathbb{R}\}$$
otherwise known as the z-axis.

If we try to do the same thing for unions we hit a problem. Think of the easy counterexample: let $V = \{x\mathbf{i} : x \in \mathbb{R}\}$ and $W = \{y\mathbf{j} : y \in \mathbb{R}\}$ be the x- and y-axes, viewed as subspaces of $\mathbb{R}^2$. Their intersection is the trivial subspace $V \cap W = \{0\}$. However, their union
$$V \cup W = \{x\mathbf{i},\ y\mathbf{j} : x, y \in \mathbb{R}\}$$
is not a subspace of $\mathbb{R}^2$. It is nothing more than the position vectors of all the points on both axes. In particular, $V \cup W$ is not closed under addition:
$$\mathbf{i} \in V \text{ and } \mathbf{j} \in W \quad\text{but}\quad \mathbf{i} + \mathbf{j} \notin V \cup W$$
[Figure: $\mathbf{i} + \mathbf{j}$ lies outside the union of the axes $V$ and $W$.]
Instead we search for the smallest vector space which contains the union of $V$ and $W$.

Definition 1.6. Suppose that $V$ and $W$ are subspaces of $U$ with trivial intersection ($V \cap W = \{0\}$). The direct sum⁷ of $V$ and $W$ is the set
$$V \oplus W = \{v + w : v \in V,\ w \in W\}$$

⁷If we remove the requirement that $V \cap W$ be trivial, the set $V + W := \{v + w : v \in V, w \in W\}$ is called the sum of $V$ and $W$. We only use the circled-plus $\oplus$ symbol when $V \cap W = \{0\}$.
Examples

1. If $V = \{x\mathbf{i} : x \in \mathbb{R}\}$ and $W = \{y\mathbf{j} : y \in \mathbb{R}\}$ are both subspaces of $\mathbb{R}^2$, then $V \oplus W = \mathbb{R}^2$.

2. More generally, suppose that $V = \{tv : t \in \mathbb{R}\}$ and $W = \{sw : s \in \mathbb{R}\}$ are distinct, proper, non-trivial subspaces of $\mathbb{R}^2$. If we let $w^\perp$ be any vector perpendicular to $w$, then we observe, for any $x \in \mathbb{R}^2$, that
$$x - tv \in W \iff (x - tv) \cdot w^\perp = 0 \iff t = \frac{x \cdot w^\perp}{v \cdot w^\perp}$$
where $\cdot$ is the usual dot product of vectors. Since $v \notin W$ it is immediate that $v \cdot w^\perp \neq 0$, whence $t$ is properly defined. If we now choose $s$ so that $sw = x - tv$, it follows that
$$x = tv + sw$$
is the unique decomposition of $x$ in terms of $V$ and $W$. In particular, this shows that $V \oplus W = \mathbb{R}^2$.
[Figure: decomposing $x$ as $tv + sw$.]
The following properties of direct sums are straightforward to prove from the definition. Try it!

Theorem 1.7.
1. $V \oplus W$ is a subspace of $U$.
2. $V$ and $W$ are subspaces of $V \oplus W$.
3. If $X$ is a subspace of $U$ with the property that both $V$ and $W$ are subspaces of $X$, then $V \oplus W$ is a subspace of $X$.
4. If $x \in V \oplus W$, then there exist unique $v \in V$ and $w \in W$ such that $x = v + w$.

The third property essentially says that $V \oplus W$ is the smallest vector space containing both $V$ and $W$ as subspaces.
Advanced: a more general notion of direct sum. The final property allows us to make an alternative definition of direct sum, one which will look more familiar if you have studied group theory.

Definition 1.8. Suppose that $V$ and $W$ are any vector spaces over the same field $F$. Their direct sum is the vector space
$$V \oplus W := \{(v, w) : v \in V,\ w \in W\}$$
of ordered pairs, where addition and scalar multiplication are defined by
$$(v_1, w_1) + (v_2, w_2) := (v_1 + v_2, w_1 + w_2) \qquad \lambda(v, w) := (\lambda v, \lambda w)$$
In this definition, the vector space $V$ is in bijective correspondence with the subspace
$$\hat{V} := \{(v, 0_W) : v \in V\} \leq V \oplus W$$
$V \oplus W$ in the original definition is then the same as $\hat{V} \oplus \hat{W}$ under the new.
1.4 Linear Combinations

Definition 1.9. Suppose that $V$ is a vector space over $F$ and that $\{v_1, \ldots, v_n\}$ is a non-empty collection of vectors in $V$. A linear combination of these vectors is any vector of the form
$$a_1 v_1 + \cdots + a_n v_n \qquad (*)$$
where $a_1, \ldots, a_n \in F$ are the coefficients of the linear combination.
More generally, if $S$ is a non-empty subset of $V$, then a linear combination of vectors in $S$ is any expression of the form $(*)$ where all⁸ $v_1, \ldots, v_n \in S$.
The span of $S$ is the subset of all linear combinations of vectors in $S$:
$$\operatorname{Span}(S) = \{a_1 v_1 + \cdots + a_n v_n : n \in \mathbb{N},\ a_1, \ldots, a_n \in F,\ v_1, \ldots, v_n \in S\}$$
It is a convention that $\operatorname{Span}(\emptyset)$ is the trivial subspace $\{0\}$.
Examples

1. Let $S = \{\mathbf{i}, \mathbf{k}\} \subseteq \mathbb{R}^3$. The span of $S$ is the set of all linear combinations of the vectors $\mathbf{i}$ and $\mathbf{k}$: this is simply the xz-plane
$$\operatorname{Span}(S) = \{a\mathbf{i} + b\mathbf{k} : a, b \in \mathbb{R}\}$$

2. Let $S = \{v, w\} \subseteq \mathbb{R}^3$ where
$$v = \begin{pmatrix}1\\2\\-1\end{pmatrix} \qquad w = \begin{pmatrix}-1\\1\\2\end{pmatrix}$$
Then
$$\operatorname{Span}(S) = \left\{ a\begin{pmatrix}1\\2\\-1\end{pmatrix} + b\begin{pmatrix}-1\\1\\2\end{pmatrix} : a, b \in \mathbb{R} \right\}$$
These vectors comprise the plane through the origin spanned by $v$ and $w$: hence the use of the word span.
These examples should immediately suggest the following result to you:

Theorem 1.10. If $S$ is a subset of a vector space $V$, then $\operatorname{Span}(S)$ is a subspace of $V$.

Proof. According to Theorem 1.4 we need only show that $\operatorname{Span}(S)$ is closed under addition and scalar multiplication. This is tedious to write out, but comes straight from the definition of span.
Let $v, w \in \operatorname{Span}(S)$ and $\lambda \in F$. It follows that there exist vectors
$$v_1, \ldots, v_n, w_1, \ldots, w_m \in S$$
and scalars
$$a_1, \ldots, a_n, b_1, \ldots, b_m \in F$$

⁸Note: a linear combination must contain only finitely many terms.
such that
$$v = a_1 v_1 + \cdots + a_n v_n, \qquad w = b_1 w_1 + \cdots + b_m w_m$$
But then
$$v + w = a_1 v_1 + \cdots + a_n v_n + b_1 w_1 + \cdots + b_m w_m \in \operatorname{Span}(S)$$
and
$$\lambda v = \lambda a_1 v_1 + \cdots + \lambda a_n v_n \in \operatorname{Span}(S)$$
∎

Now think about why the following is an obvious corollary of our proof.

Corollary 1.11. If $W$ is a subspace of $V$ which contains all elements of a subset $S$ of $V$, then $\operatorname{Span}(S)$ is a subspace of $W$.
Generating sets

One of the primary purposes of considering spans of subsets of a vector space is to answer the following question:

What are the smallest subsets $S \subseteq V$ such that $\operatorname{Span}(S) = V$?

Such subsets $S$ will be known as bases of $V$. Before we get there, we will need to think, in the next section, about linear independence. In the meantime we can give a preliminary name to subsets which span $V$:

Definition 1.12. Suppose that $S$ is a subset of $V$ such that $\operatorname{Span}(S) = V$. We say that $S$ generates $V$.
Examples

1. $S = \{\mathbf{i}, \mathbf{j}\}$ generates $\mathbb{R}^2$.

2. $S = \{1 + x + x^2,\ x - x^2,\ 2 + 3x^2,\ 4x\}$ generates the vector space $P_2(\mathbb{R})$ of polynomials over $\mathbb{R}$ of degree $\leq 2$.
This second example may appear a little tricky, but it can be approached using a familiar matrix method from elementary linear algebra. Recall that
$$P_2(\mathbb{R}) = \{a + bx + cx^2 : a, b, c \in \mathbb{R}\}$$
Certainly $S \subseteq P_2(\mathbb{R})$. If we are to see that $\operatorname{Span}(S) = P_2(\mathbb{R})$ we need to see that any polynomial $a + bx + cx^2$ can be written as a linear combination of the elements of $S$. That is, we are required to solve the following problem: given any $a, b, c \in \mathbb{R}$, find coefficients $p, q, r, s \in \mathbb{R}$ such that
$$a + bx + cx^2 = p(1 + x + x^2) + q(x - x^2) + r(2 + 3x^2) + s(4x) \qquad (\dagger)$$
$$= (p + 2r) + (p + q + 4s)x + (p - q + 3r)x^2$$
Since two polynomials are equal if and only if their coefficients are equal, the problem becomes: given any $a, b, c \in \mathbb{R}$ find $p, q, r, s \in \mathbb{R}$ such that
$$a = p + 2r \qquad b = p + q + 4s \qquad c = p - q + 3r$$
Otherwise said, we are looking for a solution to the underdetermined matrix problem
$$\begin{pmatrix}a\\b\\c\end{pmatrix} = \begin{pmatrix}1 & 0 & 2 & 0\\ 1 & 1 & 0 & 4\\ 1 & -1 & 3 & 0\end{pmatrix}\begin{pmatrix}p\\q\\r\\s\end{pmatrix}$$
which can be represented as an augmented matrix
$$\left(\begin{array}{cccc|c}1 & 0 & 2 & 0 & a\\ 1 & 1 & 0 & 4 & b\\ 1 & -1 & 3 & 0 & c\end{array}\right)$$
Performing basic row operations, we can put the augmented matrix in reduced row echelon form:
$$\left(\begin{array}{cccc|c}1 & 0 & 2 & 0 & a\\ 0 & 1 & -2 & 4 & b-a\\ 0 & -1 & 1 & 0 & c-a\end{array}\right) \to \left(\begin{array}{cccc|c}1 & 0 & 2 & 0 & a\\ 0 & 1 & -2 & 4 & b-a\\ 0 & 0 & -1 & 4 & b+c-2a\end{array}\right) \to \left(\begin{array}{cccc|c}1 & 0 & 0 & 8 & 2b+2c-3a\\ 0 & 1 & 0 & -4 & 3a-b-2c\\ 0 & 0 & 1 & -4 & 2a-b-c\end{array}\right)$$
This says that we have three leading variables $p, q, r$ and one free variable $s$. Since $s$ can be any value we like, we may therefore take the solution
$$\begin{pmatrix}p\\q\\r\\s\end{pmatrix} = \begin{pmatrix}2b+2c-3a\\ 3a-b-2c\\ 2a-b-c\\ 0\end{pmatrix}$$
which, it may be readily checked, satisfies $(\dagger)$.
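As a sanity check, one can let a computer algebra system do the row reduction. The following is a minimal sketch (my own illustration, not from the notes), assuming sympy is available; it reproduces the reduced row echelon form symbolically.

```python
from sympy import Matrix, symbols

a, b, c = symbols('a b c')

# Augmented matrix for p(1+x+x^2) + q(x-x^2) + r(2+3x^2) + s(4x) = a + bx + cx^2
M = Matrix([[1,  0, 2, 0, a],
            [1,  1, 0, 4, b],
            [1, -1, 3, 0, c]])

rref, pivots = M.rref()
print(pivots)   # (0, 1, 2): p, q, r lead; s is free
print(rref)     # matches the reduced form computed by hand
```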
3. As a final example, we show that, in the vector space $P_3(\mathbb{R})$, the vector $v = x^3$ lies in the span of
$$S = \{1 - 2x^2,\ 1 + x - x^2,\ 1 + 2x + x^3\}$$
and that the vector $w = 1 + 3x + x^3$ does not.
(a) For the first part, we need to find coefficients $p, q, r$ such that
$$p(1 - 2x^2) + q(1 + x - x^2) + r(1 + 2x + x^3) = x^3$$
$$\iff \begin{cases} p + q + r = 0\\ q + 2r = 0\\ -2p - q = 0\\ r = 1 \end{cases}$$
This corresponds to the following augmented matrix, which we can easily put in reduced row echelon form:
$$\left(\begin{array}{ccc|c}1 & 1 & 1 & 0\\ 0 & 1 & 2 & 0\\ -2 & -1 & 0 & 0\\ 0 & 0 & 1 & 1\end{array}\right) \to \left(\begin{array}{ccc|c}1 & 0 & 0 & 1\\ 0 & 1 & 0 & -2\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 0\end{array}\right)$$
Since the last line corresponds to the consistent equation $0p + 0q + 0r = 0$, we conclude that $p = 1$, $q = -2$, $r = 1$ are suitable coefficients and that $v \in \operatorname{Span}(S)$.
(b) For the second part, we try to find coefficients $p, q, r$ such that
$$p(1 - 2x^2) + q(1 + x - x^2) + r(1 + 2x + x^3) = 1 + 3x + x^3$$
$$\iff \begin{cases} p + q + r = 1\\ q + 2r = 3\\ -2p - q = 0\\ r = 1 \end{cases} \qquad (\ddagger)$$
This corresponds to the following augmented matrix, which we can easily put in reduced row form:
$$\left(\begin{array}{ccc|c}1 & 1 & 1 & 1\\ 0 & 1 & 2 & 3\\ -2 & -1 & 0 & 0\\ 0 & 0 & 1 & 1\end{array}\right) \to \left(\begin{array}{ccc|c}1 & 0 & 0 & 0\\ 0 & 1 & 0 & 1\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1\end{array}\right)$$
Since the last line corresponds to the inconsistent equation $0p + 0q + 0r = 1$, we conclude that there are no coefficients $p, q, r$ satisfying $(\ddagger)$ and that, consequently, $w \notin \operatorname{Span}(S)$.
You should recall that augmented matrices simply encode the coefficients of a system of linear equations, and that each row operation merely replaces a system with a new system whose solution set is identical. In the context of the above example, the system of equations $(\ddagger)$ has identical solution set to the system
$$p = 0 \qquad q = 1 \qquad r = 1 \qquad 0 = 1$$
Since $0 = 1$ is a contradiction, the solution set is the empty set $\emptyset$. The original system $(\ddagger)$ therefore has no solutions.
1.5 Linear Dependence and Linear Independence

When considering the span of a set of vectors, we often have an inbuilt redundancy. For example, suppose that $\{v_1, \ldots, v_4\}$ is a generating set for the vector space $V$: that is
$$V = \operatorname{Span}\{v_1, \ldots, v_4\}$$
Suppose also that $v_2$ lies in the span of the other three vectors. Otherwise said, if $S = \{v_1, v_3, v_4\}$, then $v_2 \in \operatorname{Span}(S)$. It should be intuitively obvious that $v_2$ is redundant when it comes to generating $V$: that is,
$$V = \operatorname{Span}(S)$$
and $S$ is a smaller generating set for $V$.
The fact that $v_2 \in \operatorname{Span}\{v_1, v_3, v_4\}$ means that there exists a linear combination of $v_1, v_3, v_4$ equalling $v_2$:
$$\exists a_1, a_3, a_4 \in F \text{ such that } v_2 = a_1 v_1 + a_3 v_3 + a_4 v_4$$
The expression
$$a_1 v_1 - v_2 + a_3 v_3 + a_4 v_4 = 0$$
whereby a linear combination equals the zero vector, is known as a linear dependence. As we shall see, if you can remove a vector (in this case $v_2$) from a generating set while still generating the same space, then the generating set must be linearly dependent.
Definition 1.13. Suppose that $S = \{v_1, \ldots, v_n\}$ is a subset of a vector space $V$. We say that $S$ is a linearly dependent set if there exist scalars $a_1, \ldots, a_n \in F$, not all zero,⁹ for which
$$a_1 v_1 + \cdots + a_n v_n = 0$$
Such an equation is termed a linear dependence of the vectors $v_1, \ldots, v_n$.
If the vectors of $S$ are not linearly dependent, we say that they are linearly independent.
Examples

1. The vectors $v_1 = \begin{pmatrix}2\\1\\0\end{pmatrix}$, $v_2 = \begin{pmatrix}1\\1\\2\end{pmatrix}$ and $v_3 = \begin{pmatrix}7\\5\\6\end{pmatrix}$ are linearly dependent since
$$2v_1 + 3v_2 - v_3 = 0$$

2. Are the polynomials $v_1 = 1 - x^2$, $v_2 = x + 2x^2$ and $v_3 = 1 + 2x - x^2$ linearly dependent in the vector space $P_2(\mathbb{R})$?

⁹This condition is crucial! You can always write $0 = 0v_1 + \cdots + 0v_n$ (this is known as a trivial representation of $0$), but it tells you nothing about the vectors $v_1, \ldots, v_n$. A linear dependence is therefore a non-trivial representation of the zero vector.
$\{v_1, v_2, v_3\}$ are linearly dependent if and only if there exists a non-trivial solution $(a_1, a_2, a_3)$ to the system of linear equations
$$a_1(1 - x^2) + a_2(x + 2x^2) + a_3(1 + 2x - x^2) = 0$$
$$\iff \begin{cases} a_1 + a_3 = 0\\ a_2 + 2a_3 = 0\\ -a_1 + 2a_2 - a_3 = 0 \end{cases}$$
It is readily seen that the only solution is the trivial $(a_1, a_2, a_3) = (0, 0, 0)$, whence the polynomials are linearly independent.
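Such checks can be automated: the polynomials are independent exactly when the matrix whose columns are their coefficient vectors (with respect to $\{1, x, x^2\}$) has full rank. A minimal sketch, my own illustration assuming numpy:

```python
import numpy as np

# Columns: coefficient vectors of 1 - x^2, x + 2x^2, 1 + 2x - x^2
# with respect to the standard basis {1, x, x^2} of P_2(R)
A = np.array([[ 1, 0,  1],
              [ 0, 1,  2],
              [-1, 2, -1]])

# Full column rank <=> only the trivial solution <=> linearly independent
print(np.linalg.matrix_rank(A) == 3)  # True
```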
The polynomial example above illustrates the alternative definition of linear independence:¹⁰

Definition 1.14. Vectors $v_1, \ldots, v_n$ are linearly independent if
$$a_1 v_1 + \cdots + a_n v_n = 0 \implies a_1 = a_2 = \cdots = a_n = 0$$

Can we extend a linearly independent set?

Our main goal is the construction of a minimal generating set for a vector space. With this in mind, we consider what happens when we either shrink or extend certain sets of vectors. The next lemma should be easy for you to prove yourself.

Lemma 1.15. Suppose that $V$ is a vector space and suppose that $S_1 \subseteq S_2$ are nested subsets of $V$. Then:
1. If $S_1$ is linearly dependent, so is $S_2$.
2. If $S_2$ is linearly independent, so is $S_1$.
Next we consider extending a linearly independent subset:

Theorem 1.16. Suppose that $S$ is a linearly independent subset of $V$ and suppose that $v \in V$. Then $S \cup \{v\}$ is linearly independent if and only if $v \notin \operatorname{Span}(S)$.

Proof. ($\Rightarrow$) Suppose that $v \in \operatorname{Span}(S)$. Then there exists a finite subset $\{v_1, \ldots, v_n\} \subseteq S$ and scalars $a_1, \ldots, a_n$ such that
$$v = a_1 v_1 + \cdots + a_n v_n$$
But this says that $a_1 v_1 + \cdots + a_n v_n - v = 0$ is a linear dependence and so $S \cup \{v\}$ is linearly dependent.
($\Leftarrow$) Conversely, suppose that $v \notin \operatorname{Span}(S)$ and that $S \cup \{v\}$ is linearly dependent. Then there exist scalars $a, a_1, \ldots, a_n$ such that
$$av + a_1 v_1 + \cdots + a_n v_n = 0 \qquad (*)$$
If $a \neq 0$, we see that $v = -\frac{1}{a}(a_1 v_1 + \cdots + a_n v_n) \in \operatorname{Span}(S)$, which contradicts our assumption. It follows that $a = 0$, whence $(*)$ is a linear dependence of $S$, also a contradiction. ∎

¹⁰Recalling negation of quantifiers from elementary logic:
$$\forall a_i \in F,\quad a_1 v_1 + \cdots + a_n v_n = 0 \implies a_1 = a_2 = \cdots = a_n = 0$$
has negation
$$\exists a_i \in F \text{ such that } a_1 v_1 + \cdots + a_n v_n = 0 \text{ and } a_1, \ldots, a_n \text{ are not all zero}$$
Example. We revisit the example on page 11. Let $S = \{v, w\} \subseteq \mathbb{R}^3$, where $v = \begin{pmatrix}1\\2\\-1\end{pmatrix}$ and $w = \begin{pmatrix}-1\\1\\2\end{pmatrix}$. These are linearly independent.

1. If we let $u = \begin{pmatrix}2\\2\\0\end{pmatrix}$, then we easily see that $u \notin \operatorname{Span}(S)$. Indeed, if $u$ were in $\operatorname{Span}(S)$, then there would exist $a, b \in \mathbb{R}$ such that
$$u = a\begin{pmatrix}1\\2\\-1\end{pmatrix} + b\begin{pmatrix}-1\\1\\2\end{pmatrix} = \begin{pmatrix}1 & -1\\ 2 & 1\\ -1 & 2\end{pmatrix}\begin{pmatrix}a\\b\end{pmatrix}$$
Since the augmented matrix
$$\left(\begin{array}{cc|c}1 & -1 & 2\\ 2 & 1 & 2\\ -1 & 2 & 0\end{array}\right) \quad\text{has reduced row form}\quad \left(\begin{array}{cc|c}1 & -1 & 2\\ 0 & 1 & 2\\ 0 & 0 & 1\end{array}\right)$$
whose last line reads $0 = 1$, we would obtain a contradiction.
It follows that $\{u, v, w\}$ is a linearly independent set. Indeed, in this case we have
$$\operatorname{Span}\{u, v, w\} = \mathbb{R}^3$$

2. If we let $d = \begin{pmatrix}0\\6\\2\end{pmatrix}$, then
$$d = 2v + 2w$$
whence $d \in \operatorname{Span}\{v, w\}$ and $\{d, v, w\}$ is a linearly dependent set.
In the picture, it should be clear that $d$ lies in the plane spanned by $v, w$ and that $u$ does not.
1.6 Bases and Dimension
The concept of a basis for a vector space V is extremely
important. A basis can be thought of as either:
1. A linearly independent subset of V which is as large as
possible; or,
2. A generating set for V which is as small as possible.
Definition 1.17. A basis for a vector space V is a linearly
independent generating set for V.
Standard Bases: Many common vector spaces have special bases which are used more often than all others: these are known as standard bases. You should convince yourself that these bases really satisfy the definition.

Vector Space $V$     Standard Basis
$\mathbb{R}^2$       $\{\mathbf{i}, \mathbf{j}\}$
$\mathbb{R}^3$       $\{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$
$F^n$                $\{e_1, \ldots, e_n\}$ where $e_i$ is the column vector with $i$th entry 1 and all other entries 0
$M_{m\times n}(F)$   $\{E_{ij} : 1 \leq i \leq m,\ 1 \leq j \leq n\}$ where $E_{ij}$ is the matrix with $ij$th entry 1 and all other entries 0
$P_n(F)$             $\{1, x, x^2, \ldots, x^n\}$
$P(F)$               $\{1, x, x^2, x^3, \ldots\}$
For example, the standard bases of $P_3(\mathbb{R})$ and $M_2(\mathbb{R})$ are, respectively,
$$\{1, x, x^2, x^3\} \quad\text{and}\quad \{E_{11}, E_{12}, E_{21}, E_{22}\} = \left\{ \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}, \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}, \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix}, \begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix} \right\}$$
Think about our two conditions above for each of these examples. Can you add a vector to any of these bases so that the new set is still linearly independent? Can you remove a vector from any of the bases and still have a generating set? Hopefully your answer to both questions is always no. Indeed properties 1 and 2 hold in general.

1. If $\beta$ is a basis, then $\beta$ generates $V$, whence $\operatorname{Span}(\beta) = V$. It follows that, for any non-zero $v \in V$, we have $\beta \cup \{v\}$ linearly dependent (Theorem 1.16). We cannot therefore make the basis any larger without it failing to be linearly independent.

2. If $\beta$ is a basis and $v \in \beta$ then $\beta' := \beta \setminus \{v\}$ is certainly linearly independent (Lemma 1.15). However
$$v \notin \operatorname{Span}(\beta')$$
whence $\beta'$ is no longer a generating set for $V$.
Construction and Existence of Bases for Finite Dimensional Vector Spaces

Standard bases for the common examples above are all well and good, but we need to know whether all vector spaces have a basis. Almost as important, we need some strategies for finding them. This process is fairly difficult even for finite dimensional vector spaces. For infinite dimensional vector spaces, we postpone the discussion to the next section. The critical component and major challenge is the Exchange Theorem (or Replacement Theorem) which follows. Read it once then try an example or two: you are unlikely to get comfortable with it on the first read-through!

Definition 1.18. A vector space $V$ is termed finite dimensional if it is finitely generated: that is, if there exists a finite subset $S \subseteq V$ such that $\operatorname{Span}(S) = V$.
Throughout we use $|S|$ to denote the cardinality of a set $S$. The idea is to be able to replace elements in a spanning set $S$ one at a time with elements from a linearly independent set $X$, and that we will be able to use up the entirety of $X$ in this process, thus seeing that $|X| \leq |S|$.
Theorem 1.19 (Exchange Theorem). Let $V$ be a finite-dimensional vector space. Suppose also that

- $S$ is a finite generating set for $V$ (i.e., $\operatorname{Span}(S) = V$).
- $X$ is a linearly independent subset of $V$.

Then $\exists T \subseteq S$ such that $|T| = |X|$ and $\operatorname{Span}(X \cup (S \setminus T)) = V$. Furthermore, $|X| \leq |S|$.
The subset $T$ is sometimes referred to as the exchange.

Proof. Denote $n = |S|$ and $m = \min\{n, |X|\}$ (we shall see shortly that $|X| = m$, but at present we don't even know whether $X$ is finite).
Since $m \leq |X|$, we see that $\{x_1, \ldots, x_m\}$ is a subset of $X$. We make the following claim:
$$\forall k \in \{0, 1, \ldots, m\},\ \exists s_1, \ldots, s_k \in S \text{ such that } \operatorname{Span}\big(\{x_1, \ldots, x_k\} \cup (S \setminus \{s_1, \ldots, s_k\})\big) = V$$
We prove by induction:

Base case: If $k = 0$ then the claim is true, for $S$ spans $V$.

Induction step: Suppose the claim is true for some $k < m$. Thus we assume that
$$V = \operatorname{Span}\big(\{x_1, \ldots, x_k\} \cup (S \setminus \{s_1, \ldots, s_k\})\big)$$
Since $\{x_1, \ldots, x_k, s_{k+1}, \ldots, s_n\}$ spans $V$ it follows that there exist coefficients $a_i, b_j$ for which
$$x_{k+1} = a_1 x_1 + \cdots + a_k x_k + b_{k+1} s_{k+1} + \cdots + b_n s_n \qquad (*)$$
Since $(*)$ is a linear dependence where the coefficient in front of $x_{k+1}$ is non-zero, and because the elements of $X$ are linearly independent, it follows that at least one of the $b_j$'s is non-zero: WLOG we may take $b_{k+1} \neq 0$. Therefore
$$s_{k+1} = b_{k+1}^{-1}(x_{k+1} - a_1 x_1 - \cdots - a_k x_k - b_{k+2} s_{k+2} - \cdots - b_n s_n)$$
We may therefore eliminate $s_{k+1}$ from all linear combinations describing elements of $V$, at the cost of including $x_{k+1}$. We therefore have
$$V = \operatorname{Span}\big(\{x_1, \ldots, x_{k+1}\} \cup (S \setminus \{s_1, \ldots, s_{k+1}\})\big)$$
By induction, the claim is proved. Taking $k = m$ and setting $T = \{s_1, \ldots, s_m\}$ we see that
$$V = \operatorname{Span}\big(\{x_1, \ldots, x_m\} \cup (S \setminus T)\big)$$
It remains to see that $|X| > m$ is impossible. By the definition of $m$, if $|X| > m$, then $m = n = |S|$ and there must exist some $x_{m+1} \in X$. However, applying the induction step with $k = m$, we see that $(*)$ contains no terms $s_j$, whence
$$x_{m+1} = a_1 x_1 + \cdots + a_m x_m$$
But this contradicts the linear independence of $X$. Therefore
$$|X| = m \leq n = |S|$$
which completes the proof. ∎
Example of the Exchange Theorem. Let $V = \mathbb{R}^3$, $S = \{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$, $X = \left\{ \begin{pmatrix}2\\3\\5\end{pmatrix}, \begin{pmatrix}6\\9\\12\end{pmatrix} \right\}$.
Since $x_1 = 2\mathbf{i} + 3\mathbf{j} + 5\mathbf{k}$, and the coefficient of $\mathbf{i}$ is non-zero, put $s_1 = \mathbf{i}$. Now
$$x_2 = \begin{pmatrix}6\\9\\12\end{pmatrix} = 3x_1 + 0\mathbf{j} - 3\mathbf{k}$$
so we choose $s_2 = \mathbf{k}$. Therefore $T = \{\mathbf{i}, \mathbf{k}\}$ is the exchange, and we conclude that
$$\mathbb{R}^3 = \operatorname{Span}\{x_1, x_2, \mathbf{j}\}$$
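A quick numerical confirmation of the conclusion, my own sketch assuming numpy: the three vectors span $\mathbb{R}^3$ exactly when the matrix with those columns is invertible.

```python
import numpy as np

x1 = np.array([2.0, 3.0, 5.0])
x2 = np.array([6.0, 9.0, 12.0])
j  = np.array([0.0, 1.0, 0.0])

# Nonzero determinant <=> the columns form a basis of R^3
M = np.column_stack([x1, x2, j])
print(not np.isclose(np.linalg.det(M), 0))  # True: Span{x1, x2, j} = R^3
```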
Every Finitely Generated Vector Space has a Basis

Recall that if $V$ is finite-dimensional, then it has a finite generating set $S$. Given any linearly independent set $X$ in $V$, we may use the Exchange Theorem to create a new finite spanning set $X \cup (S \setminus T)$. Indeed all we need from the Exchange Theorem is that $X$ is a finite set, and the existence of a finite spanning set which contains $X$. Armed with this, we can now construct a basis.

Theorem 1.20 (Extension Theorem). Let $V$ be a finite-dimensional vector space. Suppose that $X$ and $S$ are subsets of $V$ such that $X$ is linearly independent, $X \subseteq S$, $\operatorname{Span}(S) = V$ and $|S| = n$ is finite. Then there exists a basis $\beta$ of $V$ such that $X \subseteq \beta \subseteq S$.

Proof. Let $m = |X|$ where $m \leq n$. Let $X = \{x_1, \ldots, x_m\}$. Then
$$X \subseteq S \implies \operatorname{Span}(X) \subseteq \operatorname{Span}(S) = V$$
If $\operatorname{Span}(X) = V$ then we are done: $X$ is a basis.
Otherwise, suppose that $\operatorname{Span}(X) \neq V$. Then $\exists s_{m+1} \in S$ such that $s_{m+1} \notin \operatorname{Span}(X)$. This means that
$$X \cup \{s_{m+1}\} = \{x_1, \ldots, x_m, s_{m+1}\}$$
is a linearly independent set in $V$.
Now consider $X \cup \{s_{m+1}\}$ in place of $X$ and repeat (induction). The process must stop in at most $n - m$ many steps since $S$ is a finite spanning set.¹¹ ∎
Now we are in the home straight. We know that finite-dimensional vector spaces have bases and we can immediately use the Exchange Theorem to compare their cardinalities.

Corollary 1.21 (Well-definition of Dimension). Suppose that $V$ is a finite-dimensional vector space. Suppose that $\beta_1, \beta_2$ are two bases of $V$. Then $|\beta_1| = |\beta_2|$.

Proof. Since $V$ is finite dimensional, such $\beta_1, \beta_2$ exist. Taking $X = \beta_1$, $S = \beta_2$ in the Exchange Theorem we see that $|\beta_1| \leq |\beta_2|$. Now repeat the argument with $\beta_1, \beta_2$ reversed. ∎

By the corollary we may now define dimension.

Definition 1.22. If $V$ is finite-dimensional, then its dimension $\dim V$ is the cardinality of any basis set.

¹¹In the worst case we would have $X \cup \{s_{m+1}, \ldots, s_n\} = S$ being the desired basis $\beta$.
Corollary 1.23. Suppose that $W$ is a subspace of a finite dimensional vector space $V$. Then
$$\dim W \leq \dim V$$
Moreover, if $\dim W = \dim V$ then $W = V$.

Proof. Let $X$ be a basis of $W$, let $B$ be a basis of $V$ and let $S = B \cup X$. Certainly $\operatorname{Span}(S) = V$. By the Extension Theorem, $\exists$ a basis $\beta$ satisfying $X \subseteq \beta \subseteq S$, whence
$$|X| \leq |\beta| \quad\text{which says}\quad \dim W \leq \dim V$$
Since $\beta$ and $X$ are finite sets, we have equality of dimension if and only if $\beta = X$. Thus $X$ is a basis of $V$ and so $W = V$. ∎
Example of the Extension Theorem. Find a basis as a subset of the following spanning set of $\mathbb{R}^3$:
$$S = \{v_1, v_2, v_3, v_4, v_5, v_6\} = \left\{ \begin{pmatrix}1\\2\\0\end{pmatrix}, \begin{pmatrix}2\\1\\1\end{pmatrix}, \begin{pmatrix}1\\1\\1\end{pmatrix}, \begin{pmatrix}5\\4\\2\end{pmatrix}, \begin{pmatrix}0\\1\\3\end{pmatrix}, \begin{pmatrix}1\\1\\4\end{pmatrix} \right\}$$
Start by observing that $v_1$ and $v_2$ are non-parallel, whence
$$X = \{v_1, v_2\}$$
is linearly independent. Since $X$ does not span $\mathbb{R}^3$, we need another vector from $S$. Observe that $v_5 \notin \operatorname{Span}(X)$, so we may choose $s_3 = v_5$ to obtain the linearly independent set
$$\beta = \{v_1, v_2, v_5\} = \left\{ \begin{pmatrix}1\\2\\0\end{pmatrix}, \begin{pmatrix}2\\1\\1\end{pmatrix}, \begin{pmatrix}0\\1\\3\end{pmatrix} \right\}$$
Since $\beta$ is a linearly independent set and $\dim \operatorname{Span}(\beta) = 3 = \dim \mathbb{R}^3$, Corollary 1.23 says that $\operatorname{Span}(\beta) = \mathbb{R}^3$ so that $\beta$ is a basis.
Characterization of a basis: Uniqueness of representation

One of the purposes of a basis is to be able to represent a vector in terms of its coefficients. For example, suppose that $\epsilon = \{1, x, x^2\}$ is the standard basis of $P_2(\mathbb{R})$ and define
$$[a + bx + cx^2]_\epsilon = \begin{pmatrix}a\\b\\c\end{pmatrix}$$
We refer to the vector $\begin{pmatrix}a\\b\\c\end{pmatrix} \in \mathbb{R}^3$ as the co-ordinate representation of $a + bx + cx^2$ with respect to the basis $\epsilon$. Such representations allow us to apply matrix methods to questions about vector spaces. It is an important fact that the co-ordinate representation of any vector with respect to a basis is unique. Moreover, this property essentially characterises the concept of a basis.
Theorem 1.24. Let $\beta = \{v_1, \ldots, v_n\}$ be a finite subset of a vector space $V$. Then $\beta$ is a basis if and only if each $v \in V$ has a unique representation with respect to $\beta$. Otherwise said, every $v \in V$ can be written as a unique linear combination
$$v = a_1 v_1 + \cdots + a_n v_n$$
where each $v_i \in \beta$.

Proof. ($\Rightarrow$) Suppose that $\beta$ is a basis. Then $\beta$ generates $V$ and so every vector $v \in V$ can be written as a linear combination of elements $v_i \in \beta$. Suppose that $\exists v \in V$ which has at least two representations with respect to $\beta$. Then we have
$$v = a_1 v_1 + \cdots + a_n v_n = b_1 v_1 + \cdots + b_n v_n$$
for some scalars $a_i, b_i \in F$. It follows that
$$(a_1 - b_1)v_1 + \cdots + (a_n - b_n)v_n = 0$$
which is a linear dependence on $\beta$. Contradiction.
($\Leftarrow$) Conversely, suppose that $\beta$ is not a basis. There are two possibilities:

(a) $\beta$ does not generate $V$. In this case, $\exists v \in V$ with no representation in terms of the $v_i$.
(b) $\beta$ generates $V$ but is linearly dependent. In this case there exists a linear dependence
$$c_1 v_1 + \cdots + c_n v_n = 0$$
But then, for any $v \in V$ we see that
$$v = a_1 v_1 + \cdots + a_n v_n = (a_1 + c_1)v_1 + \cdots + (a_n + c_n)v_n$$
are two genuinely different representations of $v$.

Either way, there exists some $v \in V$ without a unique representation in terms of $\beta$. ∎
Because of the Theorem, we may make the following definition.

Definition 1.25. If $\beta = \{v_1, \ldots, v_n\}$ is a basis of $V$ over $F$, and $v \in V$, we call the unique representation in Theorem 1.24
$$[v]_\beta = \begin{pmatrix}a_1\\ \vdots\\ a_n\end{pmatrix} \in F^n$$
the co-ordinate representation of $v$ with respect to $\beta$.
Example. With respect to the basis $\beta = \{1 - x,\ 1 + x^2,\ x - 2x^2\}$ of $P_2(\mathbb{R})$, the polynomial
$$v = 3 - 5x + 7x^2 = 2(1 - x) + (1 + x^2) - 3(x - 2x^2)$$
has co-ordinate representation $[v]_\beta = \begin{pmatrix}2\\1\\-3\end{pmatrix}$ with respect to $\beta$.
With respect to the basis $\gamma = \{2 - x,\ x^2,\ 1 + x\}$ we instead have
$$v = 3 - 5x + 7x^2 = \tfrac{8}{3}(2 - x) + 7x^2 - \tfrac{7}{3}(1 + x) \implies [v]_\gamma = \begin{pmatrix}8/3\\ 7\\ -7/3\end{pmatrix}$$
We will return to co-ordinate representations in the next chapter.
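Finding such co-ordinates is just solving a linear system: express each basis vector in the standard basis, stack them as columns, and solve. A minimal sketch, my own illustration assuming numpy:

```python
import numpy as np

# Columns: 1 - x, 1 + x^2, x - 2x^2 written in the standard basis {1, x, x^2}
B = np.array([[ 1, 1,  0],
              [-1, 0,  1],
              [ 0, 1, -2]], dtype=float)

v = np.array([3.0, -5.0, 7.0])   # the polynomial 3 - 5x + 7x^2

coords = np.linalg.solve(B, v)
print(coords)                    # [ 2.  1. -3.], i.e. [v]_beta
```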
1.7 Maximal linearly independent subsets (non-examinable)

In the previous section, we showed that every finite-dimensional (i.e., finitely generated) vector space has a basis. What about vector spaces which are not finitely generated? Does every vector space have a basis?

This is a subtle question. Take for instance the vector space $P(\mathbb{R})$ of all polynomials with coefficients in $\mathbb{R}$. We stated that its standard basis is the infinite set
$$\beta = \{1, x, x^2, x^3, \ldots\}$$
This certainly satisfies Definition 1.17; every polynomial is a finite combination of terms in $\beta$, and $\beta$ is a linearly independent set. Thus $P(\mathbb{R})$ is an infinite-dimensional vector space with a countable basis, and we could write $\dim P(\mathbb{R}) = \aleph_0$.
Here is a related example. Consider the vector space $V$ of power series with coefficients in $\mathbb{R}$. Our problem is that the vector
$$\sum_{n=0}^\infty x^n = 1 + x + x^2 + x^3 + \cdots$$
(which converges on $(-1, 1)$ to the function $\frac{1}{1-x}$) is an infinite combination of the vectors in $\beta$. We cannot therefore claim that $\beta$ forms a basis of $V$. But does $V$ have a basis?
There are two ways around this problem. The first is to consider extending the definition of linear combination to allow for infinite sums. The problem with this approach is that of convergence of sums. In an abstract vector space we only assume that
$$v, w \in V \implies v + w \in V$$
This allows us to conclude, by induction, that any finite sum $\sum_{i=1}^n v_i$ still lies in $V$. How do we know that $\sum_{n=1}^\infty v_n$ has meaning? In the abstract we don't: you have to make some additional convergence assumptions. If you study Banach and Hilbert spaces in an advanced analysis course, this is the type of approach you will follow. In the context of power series, even though $\beta$ is not a basis, it is typically much more useful to the study of $V$ than a basis would be.
An alternative approach is to appeal to the (somewhat) controversial axiom of choice from set theory. The axiom of choice can be shown to be equivalent to Zorn's Lemma, which follows. The idea is to consider the set $\mathcal{F}$ of all linearly independent subsets of a vector space $V$. Of course $\mathcal{F}$ is going to be very large! We can think of certain subsets of $\mathcal{F}$, called chains, where every pair of elements in the chain may be compared:

Definition 1.26. Let $\mathcal{F}$ be a set of sets. We say that a subset $\mathcal{C} \subseteq \mathcal{F}$ is a chain¹² in $\mathcal{F}$ if
$$\forall A, B \in \mathcal{C},\ \text{either } A \subseteq B \text{ or } B \subseteq A$$
We say that a chain $\mathcal{C}$ has an upper bound in $\mathcal{F}$ if there is some element $B \in \mathcal{F}$ such that
$$\forall A \in \mathcal{C},\ A \subseteq B$$
We say that $M \in \mathcal{F}$ is a maximal member of $\mathcal{F}$ if $M$ is a subset of no member of $\mathcal{F}$ except $M$ itself.

¹²Alternatively $\mathcal{C}$ is a nest, a tower, or is totally ordered.
The idea is that if $\mathcal{F}$ is taken to be the set of all linearly independent subsets in a vector space $V$, then a maximal element should be a basis. This should be completely obvious when $V$ is finite-dimensional. Consider, for example, $\beta = \{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$ as a basis of $\mathbb{R}^3$. Then $\beta$ is an upper bound for the chain
$$\mathcal{C} = \big\{ \{\mathbf{i}\},\ \{\mathbf{i}, \mathbf{j}\},\ \{\mathbf{i}, \mathbf{j}, \mathbf{k}\} \big\}$$
For an infinite-dimensional example, $\beta = \{1, x, x^2, \ldots\}$ as a basis of $P(\mathbb{R})$ is an upper bound for the chain
$$\mathcal{C} = \big\{ \{1\},\ \{1, x\},\ \{1, x, x^2\},\ \ldots \big\}$$
Read this second example carefully: the ellipsis dots are hiding infinitely many subsets! In particular the upper bound does not have to be an element of the chain.
It is this example that gives us the idea of how to find a basis in general: $\beta = \bigcup_{U \in \mathcal{C}} U$ is precisely the union of all of the elements of the chain $\mathcal{C}$.
Here are some of the details:

Definition 1.27. Let $V$ be a vector space. We define a subset $\beta \subseteq V$ to be a maximal linearly independent subset of $V$ if it satisfies the following two properties:

1. $\beta$ is linearly independent.
2. The only linearly independent subset of $V$ that contains $\beta$ is $\beta$ itself.

You should be able to convince yourself that:

Lemma 1.28. A subset $\beta \subseteq V$ is a basis of $V$ if and only if it is a maximal linearly independent subset.
Finally we need the additional input that makes this work for infinite-dimensional vector spaces.

Theorem 1.29 (Zorn's Lemma). Let $\mathcal{F}$ be a non-empty family of sets. Suppose that every chain $\mathcal{C}$ in $\mathcal{F}$ has an upper bound in $\mathcal{F}$: a member $M \in \mathcal{F}$ which contains every member of $\mathcal{C}$. Then $\mathcal{F}$ has a maximal member.

Theorem 1.30. Every vector space has a basis.

Proof. Let $\mathcal{F} = \{\text{linearly independent subsets of } V\}$. If $V = \{0\}$ we are done. Otherwise, $\exists$ some non-zero $v \in V$. Thus $\{v\} \in \mathcal{F}$ so that $\mathcal{F}$ is non-empty. Appealing to Zorn's Lemma, our job is to show that every chain in $\mathcal{F}$ has an upper bound in $\mathcal{F}$ which contains every member of said chain.
Suppose that $\mathcal{C} \subseteq \mathcal{F}$ is a chain, and define
$$M_{\mathcal{C}} = \bigcup_{U \in \mathcal{C}} U$$
We claim that $M_{\mathcal{C}}$ is an upper bound for $\mathcal{C}$ in $\mathcal{F}$. For this, we need to show two things:

1. $M_{\mathcal{C}} \in \mathcal{F}$: otherwise said, $M_{\mathcal{C}}$ is a linearly independent set.
2. $\forall A \in \mathcal{C}$, we have $A \subseteq M_{\mathcal{C}}$.
The latter is obvious from the definition of union! For the former, suppose that $u_1, \ldots, u_n \in M_{\mathcal{C}}$ are distinct vectors such that
$$a_1 u_1 + \cdots + a_n u_n = 0$$
By the total ordering of $\mathcal{C}$, we see¹³ that $\exists U \in \mathcal{C}$ such that $u_1, \ldots, u_n \in U$. But each $U$ is linearly independent, whence $a_1 = \cdots = a_n = 0$. It follows that $M_{\mathcal{C}} \in \mathcal{F}$.
We have shown that every chain in $\mathcal{F}$ has an upper bound $M_{\mathcal{C}}$ in $\mathcal{F}$. Applying Zorn's lemma, we see that $\mathcal{F}$ has a maximal element $\beta$, which is necessarily a basis of $V$. ∎
Such an argument (take the union over a chain to create an explicit upper bound so we can invoke Zorn's Lemma) is replicated in several other places in mathematics. If you study mathematics at graduate level, you will very likely see it again. A basis whose existence is justified by the Theorem is known as a Hamel basis of $V$. Disappointingly, Hamel bases are almost completely useless for computational purposes, but it is nice to know that they exist all the same!
The essential results (the Exchange/Extension Theorems and the uniqueness of representation) may be generalized to cover infinite-dimensional vector spaces: one just has to be careful with interpretation. For instance here are some generalizations:

- (Theorem 1.20) If $X \subseteq V$ is linearly independent then it may be extended to a basis of $V$. This may be proved by applying Zorn's Lemma to the family of all linearly independent subsets of $V$ containing $X$, exactly as in Theorem 1.30.

- (Corollary 1.23) All bases of an infinite-dimensional vector space have the same cardinality, whence dimension is well-defined. This is a bit trickier and requires an infinite-dimensional version of the Exchange Theorem.

- (Theorem 1.24) If $\beta$ is a basis of $V$, then for all non-zero $v \in V$ there is a unique finite subset $\{v_1, \ldots, v_n\} \subseteq \beta$ and unique non-zero scalars $a_1, \ldots, a_n$ such that
$$v = a_1 v_1 + \cdots + a_n v_n$$
Our only freedom is in the order of the vectors $v_i$.
To see this last, for instance, suppose that $v \in V$ is a non-zero vector. Since $\beta$ spans $V$ there certainly exists a finite linear combination for $v$ in terms of the elements of $\beta$. Suppose that there are two such combinations,
$$v = a_1 v_1 + \cdots + a_n v_n = b_1 w_1 + \cdots + b_m w_m \quad\text{where each } v_i, w_j \in \beta$$
WLOG we may assume that all $a_i, b_j$ are non-zero. Let $X = \{v_1, \ldots, v_n, w_1, \ldots, w_m\}$ (note that there might be repeats, so that $|X| \leq n + m$). Relabelling $X = \{x_1, \ldots, x_k\}$ where $k \leq n + m$, we obtain two linear combinations for $v$:
$$v = c_1 x_1 + \cdots + c_k x_k = d_1 x_1 + \cdots + d_k x_k$$
where at least some of the $c_i, d_i$ are non-zero. But this is now a linear dependence on the set $\beta$ unless $c_i = d_i$ for all $i$. It follows that $X = \{v_1, \ldots, v_n\}$ is the unique subset of $\beta$ such that $v = a_1 v_1 + \cdots + a_n v_n$ with all $a_i \neq 0$.

¹³Since $u_i \in \bigcup_{U \in \mathcal{C}} U$, $\exists U_i \in \mathcal{C}$ such that $u_i \in U_i$. Now let $U = U_1 \cup \cdots \cup U_n$. By total ordering, one of these $U_i$ contains all the others: this is $U$. Note that this only works for finite $n$!
2 Linear Transformations and Matrices

The standard systematic approach in algebra is to study a collection of sets which have a common structure, and the maps between them which preserve that structure. In the context of vector spaces this means maps which preserve the structure of vector addition and scalar multiplication.
2.1 Linear Transformations, Null Spaces and Ranges

Definition 2.1. Let $V$ and $W$ be vector spaces over the same field $F$. We say that a function $L : V \to W$ is linear if it satisfies the following properties:
$$\forall v_1, v_2 \in V,\ \forall \lambda \in F,\quad \begin{cases} L(v_1) + L(v_2) = L(v_1 + v_2)\\ \lambda L(v_1) = L(\lambda v_1) \end{cases}$$
The idea is that the operations of vector addition and scalar multiplication in $V$ and $W$ are compatible: we may add vectors in $V$ first then map to $W$, or we may map to $W$ first then add; the result must be the same.

You have met many examples of linear maps already (see, e.g., the intro).
Examples

1. Matrix multiplication: if $v \in F^n$ and $A \in M_{m\times n}(F)$, then
$$L : F^n \to F^m : v \mapsto Av$$
is linear. Spelling this out is tedious: for instance, the $i$th entry of the vector $A(x + y)$ is
$$[A(x + y)]_i = \sum_{j=1}^n a_{ij}(x_j + y_j) = \sum_{j=1}^n a_{ij}x_j + \sum_{j=1}^n a_{ij}y_j = [Ax]_i + [Ay]_i$$
which is precisely the $i$th entry of the vector $Ax + Ay$. Scalar multiplication is similar.

2. Differentiation: If $W_D$ is the set of functions with domain $D$ and $V_D$ is the subspace of differentiable functions in $W_D$, then
$$L : V_D \to W_D : f \mapsto \frac{df}{dx}$$
is linear.

3. If $C(\mathbb{R})$ is the set of continuous functions with domain $\mathbb{R}$, then
$$L : C(\mathbb{R}) \to \mathbb{R} : f \mapsto \int_a^b f(x)\,dx$$
is linear, for any constants $a, b$.
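These examples can all be tested numerically in the same spirit as the rotation example in the introduction. A minimal sketch checking the matrix-multiplication example, my own illustration assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))           # A in M_{4x3}(R)
x, y = rng.standard_normal(3), rng.standard_normal(3)
lam = 2.7

# L(x + y) = L(x) + L(y) and L(lam x) = lam L(x)
print(np.allclose(A @ (x + y), A @ x + A @ y))    # True
print(np.allclose(A @ (lam * x), lam * (A @ x)))  # True
```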
Definition 2.2. The set of linear maps from $V$ to $W$ is denoted¹⁴ $\mathcal{L}(V, W)$. If $V = W$ we simply write $\mathcal{L}(V)$ instead of $\mathcal{L}(V, V)$.
The zero function $0 \in \mathcal{L}(V, W)$ is the linear map defined by¹⁵
$$\forall v \in V,\quad 0 : v \mapsto 0_W$$
The identity function $I \in \mathcal{L}(V)$ is the linear map defined by
$$\forall v \in V,\quad I : v \mapsto v$$
definition of linearity.
Theorem 2.3. 1. L L(V, W) if and only if L preserves all linear
combinations: i.e.,
vi V, ai F, L(
n
i=1
aivi
)=
n
i=1
aiL(vi)
In particular, L(0) = 0.
2. L(V, W) is a vector space whose identity is the zero function
0 L(V, W).
Definition 2.4. Let $V$ and $W$ be vector spaces and $L \in \mathcal{L}(V, W)$. The range or image of $L$ is the usual image of $L$ viewed as a function:
$$\mathcal{R}(L) = \{L(v) \in W : v \in V\}$$
The null space or kernel of $L$ is the set of all vectors which are mapped to zero by $L$:
$$\mathcal{N}(L) = \{v \in V : L(v) = 0_W\}$$
Theorem 2.5. The null space and range of a linear map are subspaces of $V$ and $W$ respectively.

Proof. Everything comes from the formula
$$L(\lambda v_1 + v_2) = \lambda L(v_1) + L(v_2)$$
Clearly if $v_1, v_2 \in \mathcal{N}(L)$, so is $\lambda v_1 + v_2$. Similarly, for any $L(v_1), L(v_2) \in \mathcal{R}(L)$ we see that $\lambda L(v_1) + L(v_2) \in \mathcal{R}(L)$. ∎
Since the null space and range are vector spaces, they have a dimension:

Definition 2.6. The rank and nullity of a linear map $L \in \mathcal{L}(V, W)$ are the dimensions of the range and null-space of $L$ respectively:
$$\operatorname{rank} L = \dim \mathcal{R}(L) \qquad \operatorname{null} L = \dim \mathcal{N}(L)$$

¹⁴Some texts use $\hom(V, W)$ instead of $\mathcal{L}(V, W)$. This is short for homomorphism, literally 'same transformation', indicating that something, in this case the structure of addition and scalar multiplication, stays the same after applying the map. If $V = W$ the set of linear maps can also be written $\operatorname{End}(V)$, for endomorphism.
¹⁵Since we now have two vector spaces, you may find it helpful to explicitly distinguish between the zero in $V$ and the zero in $W$. This can be done with suffices, e.g., $0_V$, $0_W$.
We are moving towards an important result linking the dimensions of these spaces. As a preliminary step, we need a lemma.

Lemma 2.7. If $\beta$ is a basis for $V$, then $L(\beta) = \{L(v) : v \in \beta\}$ is a spanning set for $\mathcal{R}(L)$.

Proof. Let $L(v) \in \mathcal{R}(L)$. Then there is a finite combination
$$v = a_1 v_1 + \cdots + a_n v_n$$
where each $v_i \in \beta$. But then
$$L(v) = L\Big(\sum_{i=1}^n a_i v_i\Big) = \sum_{i=1}^n a_i L(v_i) \in \operatorname{Span}(L(\beta)) \qquad (*)$$
Thus $\mathcal{R}(L) \subseteq \operatorname{Span}(L(\beta))$.
Conversely, the right hand side of $(*)$ is a general element of $\operatorname{Span}(L(\beta))$, which is certainly in the range of $L$. Thus $\operatorname{Span}(L(\beta)) \subseteq \mathcal{R}(L)$. It follows that these subspaces are equal. ∎
In particular, note that we assumed nothing about the cardinality of the basis $\beta$; we could be talking about an infinite-dimensional vector space. Recall that every element of a vector space must be expressible as a finite combination of the basis vectors, even if the basis is infinite.

The critical relationship between rank, nullity and dimension is contained in the following theorem, also known as the dimension theorem.
Theorem 2.8 (Rank–Nullity). If $L \in \mathcal{L}(V, W)$, then
$$\operatorname{rank} L + \operatorname{null} L = \dim V$$

Proof. Suppose that $X$ is a basis of the null space $\mathcal{N}(L)$. By the Extension Theorem,¹⁶ we may extend this to a basis $\beta = X \cup Y$ of $V$, where $X \cap Y = \emptyset$. We claim that $L(Y)$ is a basis of $\mathcal{R}(L)$.
By Lemma 2.7, $L(\beta)$ spans the range of $L$. However $L(x) = 0$ for each $x \in X$, whence
$$\mathcal{R}(L) = \operatorname{Span}(L(Y))$$
It remains to see that $L(Y)$ is linearly independent: for this, note that if $y_1, \ldots, y_r \in Y$, then
$$\sum_{i=1}^r a_i L(y_i) = 0 \iff L\Big(\sum_{i=1}^r a_i y_i\Big) = 0 \iff \sum_{i=1}^r a_i y_i \in \mathcal{N}(L) \cap \operatorname{Span}(Y) = \{0\}$$
It follows that all $a_i = 0$, so that $L(Y)$ is linearly independent and thus a basis of the range $\mathcal{R}(L)$. Moreover, it is clear that, restricted to $Y$,
$$L|_Y : Y \to L(Y)$$
is a bijection, whence
$$\dim V = |\beta| = |X| + |Y| = \operatorname{null} L + |L(Y)| = \operatorname{null} L + \operatorname{rank} L$$
as required. ∎

¹⁶If $V$ is infinite-dimensional, we need the generalization on page 25, and we must interpret addition as that of cardinality. The remainder of the proof is unchanged.
Examples

1. Let $V = \mathbb{R}^3$ and $W = \mathbb{R}^4$, and let $L \in \mathcal{L}(V, W)$ be left-multiplication by the matrix
$$A = \begin{pmatrix}1 & 2 & 3\\ 0 & 1 & 1\\ 1 & 0 & 1\\ 0 & 3 & 3\end{pmatrix}$$
Since the range of this linear map is precisely the span of the columns of $A$, and because the third column is the sum of the first two, it is clear that
$$\mathcal{R}(L) = \operatorname{Span}\left\{ \begin{pmatrix}1\\0\\1\\0\end{pmatrix}, \begin{pmatrix}2\\1\\0\\3\end{pmatrix} \right\} \implies \operatorname{rank} L = 2$$
Moreover, by performing row operations in an attempt to solve $Ax = 0$ we obtain the reduced row echelon form
$$\begin{pmatrix}1 & 0 & 1\\ 0 & 1 & 1\\ 0 & 0 & 0\\ 0 & 0 & 0\end{pmatrix}$$
By solving these equations, it follows that the null space of the linear map is
$$\mathcal{N}(L) = \left\{ \begin{pmatrix}x\\y\\z\end{pmatrix} : x + z = 0 = y + z \right\} = \operatorname{Span}\left\{ \begin{pmatrix}-1\\-1\\1\end{pmatrix} \right\} \implies \operatorname{null} L = 1$$
The rank–nullity theorem simply reads $2 + 1 = 3$.
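The same computation can be delegated to a computer algebra system; a minimal sketch of my own, assuming sympy:

```python
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [0, 1, 1],
            [1, 0, 1],
            [0, 3, 3]])

print(A.rank())        # 2 = rank L
print(A.nullspace())   # [Matrix([-1, -1, 1])], so null L = 1
# rank + nullity = number of columns = dim R^3
print(A.rank() + len(A.nullspace()) == A.cols)  # True
```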
2. If $V = P_3(\mathbb{R})$ and $W = P_2(\mathbb{R})$ we may take the linear map $L \in \mathcal{L}(V, W)$ defined by differentiation. With respect to the standard bases,
$$L(a + bx + cx^2 + dx^3) = b + 2cx + 3dx^2$$
Clearly $\mathcal{N}(L) = \{a : a \in \mathbb{R}\} \leq P_3(\mathbb{R})$ is the 1-dimensional space of constants, and $\mathcal{R}(L) = P_2(\mathbb{R})$. It follows that
$$\operatorname{rank} L + \operatorname{null} L = 3 + 1 = 4 = \dim P_3(\mathbb{R})$$

3. (Non-examinable) Repeating example 2 with $V = W = P(\mathbb{R})$ we see that $L \in \mathcal{L}(P(\mathbb{R}))$ has
$$f \in \mathcal{N}(L) \iff f'(x) = 0 \iff f \text{ is constant}$$
Thus $\operatorname{null} L = 1$. However $\mathcal{R}(L) = P(\mathbb{R})$, since every polynomial is the derivative of another. Since $P(\mathbb{R})$ has a countable basis $\beta = \{1, x, x^2, \ldots\}$, we see that the rank–nullity theorem says
$$\aleph_0 + 1 = \aleph_0$$
which makes perfect sense in the context of addition of infinite cardinals.
Injective Linear Maps

Because of the extra structure of linearity, injective linear maps have a very straightforward characterization.¹⁷

Theorem 2.9. $L \in \mathcal{L}(V, W)$ is injective if and only if $\mathcal{N}(L) = \{0\}$.

Otherwise said, a linear map is injective if and only if its nullity is zero.

Proof. Let $v_1, v_2 \in V$. Then, by linearity,
$$L(v_1) = L(v_2) \iff L(v_1 - v_2) = 0_W \iff v_1 - v_2 \in \mathcal{N}(L)$$
from which the result is immediate. ∎
Clearly none of the above examples are injective. A couple of quick appeals to the rank–nullity theorem gives the following corollaries.

Corollary 2.10. If $\dim V > \dim W$ then there are no injective functions $L \in \mathcal{L}(V, W)$.

Proof. Since $\mathcal{R}(L)$ is a subspace of $W$, we have $\operatorname{rank} L = \dim \mathcal{R}(L) \leq \dim W$. However, if $\dim V > \dim W$ and $L : V \to W$ is an injective linear map, then $\operatorname{null} L = 0$. By the rank–nullity theorem, we conclude that
$$\operatorname{rank} L = \dim V > \dim W$$
which is a contradiction. ∎
Corollary 2.11. Let $V$ and $W$ be finite-dimensional with equal dimension, and assume that $L \in \mathcal{L}(V, W)$. The following are equivalent:

1. $L$ is injective.
2. $L$ is surjective.
3. $\operatorname{null} L = 0$.
4. $\operatorname{rank} L = \dim V$.

Proof. Observe the following:

- $L$ is injective $\iff \operatorname{null} L = 0 \iff \operatorname{rank} L = \dim V$ (holds for any vector spaces)
- $L$ is surjective $\iff \mathcal{R}(L) = W \iff \operatorname{rank} L = \dim W$

The implication $\operatorname{rank} L = \dim W \implies \mathcal{R}(L) = W$ is the only part to require that $W$ be finite-dimensional. Since we are assuming that $\dim W = \dim V$, we are done. ∎

¹⁷Recall that a function $f : A \to B$ is injective (one-to-one) if it never takes the same value twice. I.e., $\forall a_1, a_2 \in A$, $a_1 \neq a_2 \implies f(a_1) \neq f(a_2)$.
2.2 The Matrix Representation of a Linear Map

Recall Theorem 1.24 where we saw that any vector has a unique co-ordinate representation with respect to a basis. The same reasoning can be applied to a linear map $L \in \mathcal{L}(V, W)$.

Theorem 2.12 (Matrix representations). Suppose that $\beta = \{v_1, \ldots, v_n\}$ and $\gamma = \{w_1, \ldots, w_m\}$ are bases of $V$ and $W$ respectively. If $L \in \mathcal{L}(V, W)$ then there exists a unique matrix $A \in M_{m\times n}(F)$ such that
$$\forall v \in V,\quad [L(v)]_\gamma = A[v]_\beta \qquad (*)$$
Moreover, $A$ is the matrix whose $j$th column is the co-ordinate representation of $L(v_j)$ with respect to the basis $\gamma$:
$$A = \begin{pmatrix} | & & |\\ [L(v_1)]_\gamma & \cdots & [L(v_n)]_\gamma\\ | & & | \end{pmatrix} \qquad (**)$$

Definition 2.13. The matrix $A = (a_{ij})$ defined above is the matrix representation of $L$ with respect to $\beta$ and $\gamma$. We use the notation $A = [L]_\beta^\gamma$. If $L \in \mathcal{L}(V)$ and $\beta = \gamma$, we simply write $[L]_\beta$.
The Theorem can be summarized by the following commutative diagram. If $L \in \mathcal{L}(V, W)$, then the symbol $[\ ]_\beta$ represents mapping a vector to its representation with respect to the basis $\beta$, and $A$ means multiply by the matrix $A = [L]_\beta^\gamma$. What the diagrams mean is that you have two options to travel from $V$ to $F^m$ and each must produce the same result.

        L                              L
   V ------> W                 v |---------> L(v)
   |         |                 |               |
 [ ]_β     [ ]_γ               |               |
   v         v                 v               v
  F^n -----> F^m           [v]_β |----> [L(v)]_γ = [L]_β^γ [v]_β
        A                              A
Proof of Theorem. Suppose first that a matrix $A$ satisfying $(*)$ exists. Then it must satisfy $(*)$ for each of the basis vectors $v_1, \ldots, v_n$. However, with respect to the basis $\beta$, we simply have
$$[v_j]_\beta = e_j$$
where $e_j$ is the $j$th standard basis vector in $F^n$. Since $Ae_j$ is simply the $j$th column of $A$, it follows that $A$ must have the form claimed in $(**)$.
It remains to see that $A$ as defined by $(**)$ satisfies $(*)$ for every vector $v \in V$. For this, note that the unique representation of $v$ with respect to $\beta$ reads
$$v = b_1 v_1 + \cdots + b_n v_n \iff [v]_\beta = \begin{pmatrix}b_1\\ \vdots\\ b_n\end{pmatrix} \in F^n$$
Now observe, by $(**)$, that $A = (a_{ij})$ has $ij$th entry
$$a_{ij} = \big[[L(v_j)]_\gamma\big]_i$$
which is the $i$th entry of the representation of $L(v_j)$ with respect to the basis $\gamma$. By matrix multiplication, the column vector $A[v]_\beta \in F^m$ has $i$th entry
$$\big[A[v]_\beta\big]_i = \sum_{j=1}^n a_{ij}b_j = \sum_{j=1}^n \big[[L(v_j)]_\gamma\big]_i b_j = \Big[\sum_{j=1}^n b_j[L(v_j)]_\gamma\Big]_i = \Big[\Big[L\Big(\sum_{j=1}^n b_j v_j\Big)\Big]_\gamma\Big]_i = \big[[L(v)]_\gamma\big]_i$$
(by linearity/Theorem 2.3). Therefore the matrix $A$ defined by $(**)$ acts as we claim, and is the unique matrix satisfying $(*)$ by the first part of the proof. ∎
Since any list of n vectors {z1, . . . , zn} in W may be written in the form

\[ z_j = \sum_{i=1}^{m} w_i a_{ij} \]

for unique constants aij, the following corollary is immediate.

Corollary 2.14. A linear map L ∈ L(V, W) is completely determined by what it does to a basis. Otherwise said, if z1, . . . , zn ∈ W are any vectors in W, then there exists a unique linear map L ∈ L(V, W) such that L(vj) = zj for each j. The linear map in question is precisely that L for which [L]_β^γ = A.
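Corollary 2.14 translates directly into an algorithm: to represent a linear map, record the γ-co-ordinates of each L(vj) and stack them as columns, exactly as in (∗∗). Here is a minimal sketch in Python (numpy assumed; the helper name build_matrix is ours), using the rotation map from the introduction.

    import numpy as np

    def build_matrix(images):
        """Stack the co-ordinate vectors [L(v_j)]_gamma as the columns of A."""
        return np.column_stack(images)

    # "Rotate clockwise by 30 degrees": images of i and j in standard co-ordinates
    t = np.pi / 6
    L_i = np.array([np.cos(t), -np.sin(t)])
    L_j = np.array([np.sin(t),  np.cos(t)])
    A = build_matrix([L_i, L_j])

    # Linearity: L(xi + yj) = xL(i) + yL(j), i.e. A[v] = x[L(i)] + y[L(j)]
    v = np.array([2.0, 1.0])
    assert np.allclose(A @ v, 2 * L_i + 1 * L_j)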
Examples

1. If you return to the introduction, the matrix of the linear map "rotate clockwise by 30°" with respect to the standard basis ε = {i, j} of R² is

\[ [L]_\varepsilon = \begin{pmatrix} \frac{\sqrt{3}}{2} & \frac{1}{2} \\ -\frac{1}{2} & \frac{\sqrt{3}}{2} \end{pmatrix} \]

2. (See Example 2 on page 29.) With respect to the standard bases of P3(R) and P2(R), the matrix of "differentiate" is

\[ [L] = \begin{pmatrix} | & | & | & | \\ [L(1)] & [L(x)] & [L(x^2)] & [L(x^3)] \\ | & | & | & | \end{pmatrix} = \begin{pmatrix} | & | & | & | \\ [0] & [1] & [2x] & [3x^2] \\ | & | & | & | \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \]
3. Consider the vector space V = Span β, where β = {e^x sin 2x, e^x cos 2x}, and let L ∈ L(V) be the linear map "differentiate". Since

\[ L(e^x \sin 2x) = e^x \sin 2x + 2e^x \cos 2x, \qquad L(e^x \cos 2x) = e^x \cos 2x - 2e^x \sin 2x \]

we see that

\[ [L]_\beta = \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix} \]

In particular, given what you should already know about matrix multiplication, if f ∈ V, then

\[ [L(f)]_\beta = [L]_\beta [f]_\beta \implies [f]_\beta = [L]_\beta^{-1}[L(f)]_\beta = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}[L(f)]_\beta \]

Think about what this is saying: if g(x) = L(f)(x) = a e^x sin 2x + b e^x cos 2x, then g has an anti-derivative

\[ \int g(x)\,dx = \frac{a}{5} e^x(\sin 2x - 2\cos 2x) + \frac{b}{5} e^x(2\sin 2x + \cos 2x) \]

No need for integration by parts!
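If you want to double-check this trick, here is a short symbolic sketch (sympy assumed; the names are ours): invert [L]_β to get the co-ordinates of the anti-derivative, then confirm by differentiating.

    import sympy as sp

    x, a, b = sp.symbols('x a b')
    g = a * sp.exp(x) * sp.sin(2*x) + b * sp.exp(x) * sp.cos(2*x)

    # [L]_beta for "differentiate" on Span{e^x sin 2x, e^x cos 2x}
    L = sp.Matrix([[1, -2], [2, 1]])
    c = L.inv() * sp.Matrix([a, b])        # co-ordinates of an anti-derivative of g

    G = c[0] * sp.exp(x) * sp.sin(2*x) + c[1] * sp.exp(x) * sp.cos(2*x)
    assert sp.simplify(sp.diff(G, x) - g) == 0   # G' = g, no parts required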
4. (Tricky!) It is often easier to consider a linear map with respect to a basis which is chosen in order to make the matrix of the linear map as simple as possible. For example, suppose that L : R² → R² is the linear map defined by "reflect in the line y = 2x".

If you draw a picture, it should be clear that the vectors

\[ v_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad \text{and} \quad v_2 = \begin{pmatrix} -2 \\ 1 \end{pmatrix} \]

behave very nicely with respect to the linear map. Indeed

\[ L(v_1) = v_1, \qquad L(v_2) = -v_2 \]

If we take β = {v1, v2}, then β is a basis of R². Moreover, the matrix of L with respect to β is simply

\[ [L]_\beta = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \]

[Figure: the line y = 2x, the standard basis vectors i, j and their reflections L(i), L(j), together with v1, v2 and L(v2).]

This is much simpler than trying to calculate the matrix of L with respect to the standard basis ε = {i, j}: it is an exercise to see that¹⁸

\[ [L]_\varepsilon = \frac{1}{5}\begin{pmatrix} -3 & 4 \\ 4 & 3 \end{pmatrix} \]
5. Here is another example of the same idea. Let n ∈ R³ be a fixed unit vector (|n| = 1) and define L : R³ → R³ to be the linear map

\[ L : v \mapsto v - (v \cdot n)n \]

You should first convince yourself that this is linear! How could we find a matrix for this linear map? There are two potential approaches.

¹⁸ If you recall a previous class, 1 and −1 are the eigenvalues of the matrix [L]_ε, and v1, v2 the corresponding eigenvectors. [L]_β is then the diagonalization of [L]_ε. We will return to this concept later.
(a) Use the standard basis ε = {i, j, k}. For example, if n = (1/√5)i + (2/√5)k, then

\[ L(i) = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} - \frac{1}{\sqrt{5}}\begin{pmatrix} 1/\sqrt{5} \\ 0 \\ 2/\sqrt{5} \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 4 \\ 0 \\ -2 \end{pmatrix}, \qquad L(j) = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \qquad L(k) = \frac{1}{5}\begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix} \]

We therefore obtain the matrix

\[ [L]_\varepsilon = \frac{1}{5}\begin{pmatrix} 4 & 0 & -2 \\ 0 & 5 & 0 \\ -2 & 0 & 1 \end{pmatrix} \]

Repeating this in general, if n = n1 i + n2 j + n3 k, we have

\[ [L]_\varepsilon = \begin{pmatrix} 1 - n_1^2 & -n_1 n_2 & -n_1 n_3 \\ -n_1 n_2 & 1 - n_2^2 & -n_2 n_3 \\ -n_1 n_3 & -n_2 n_3 & 1 - n_3^2 \end{pmatrix} \]
(b) Alternatively, we can use a basis fitted more neatly to the linear map. Choose the first basis vector to be v1 = n, then choose any two other non-parallel vectors v2, v3 which are perpendicular to n. It follows that

\[ L(v_1) = 0, \qquad L(v_2) = v_2, \qquad L(v_3) = v_3 \]

With respect to the basis β = {v1, v2, v3}, we have the matrix

\[ [L]_\beta = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

Having considered the linear map thusly, it should be clear what it is doing: it is projecting onto the plane through the origin perpendicular to n, i.e. onto the subspace Span(v2, v3). In the case where n = (1/√5)i + (2/√5)k, we could easily choose v2 = j, v3 = 2i − k.

It should also be clear from the linear map's interpretation as a projection that its null-space is N(L) = Span(n) and its range is R(L) = Span(v2, v3).
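In co-ordinates, approach (a) amounts to the single formula [L]_ε = I − nnᵀ, since (v · n)n = (nnᵀ)v. A quick numerical sanity check (numpy assumed):

    import numpy as np

    n = np.array([1.0, 0.0, 2.0]) / np.sqrt(5)   # the unit vector from the example

    # [L]_eps = I - n n^T (outer product): projection onto n's orthogonal plane
    P = np.eye(3) - np.outer(n, n)
    assert np.allclose(P, np.array([[4, 0, -2], [0, 5, 0], [-2, 0, 1]]) / 5)

    # Null-space and range behave as claimed
    assert np.allclose(P @ n, 0)                 # N(L) = Span(n)
    v2 = np.array([0.0, 1.0, 0.0])               # v2 = j
    v3 = np.array([2.0, 0.0, -1.0])              # v3 = 2i - k
    assert np.allclose(P @ v2, v2) and np.allclose(P @ v3, v3)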
Summary The big take-away from all of this is the following:

    Linear Map = Matrix + Bases

More precisely, once you choose bases of finite-dimensional vector spaces, then any linear map between them is equivalent to a matrix.
2.3 Composition of Linear Maps and Matrix Multiplication

Given linear transformations L ∈ L(U, V) and M ∈ L(V, W), where all vector spaces have the same base field, it makes sense to consider the composition:¹⁹

\[ M \circ L : U \to W : u \mapsto M(L(u)) \]

Since

\[ (M \circ L)(\lambda u_1 + u_2) = M\bigl(L(\lambda u_1 + u_2)\bigr) = M\bigl(\lambda L(u_1) + L(u_2)\bigr) \quad \text{(linearity of } L\text{)} \]
\[ = \lambda M\bigl(L(u_1)\bigr) + M\bigl(L(u_2)\bigr) \quad \text{(linearity of } M\text{)} \]
\[ = \lambda (M \circ L)(u_1) + (M \circ L)(u_2) \]

we have proved the following.

Theorem 2.15. The composition of two linear maps is a linear map.
We now consider the matrix of the composition of linear maps.

Theorem 2.16. Suppose that L ∈ L(U, V) and M ∈ L(V, W), where U, V, W are finite-dimensional. Suppose also that α, β, γ are bases of U, V, W respectively, and that [L]_α^β and [M]_β^γ are the matrices of L, M with respect to these bases. Then the matrix of the composition M ∘ L with respect to α and γ is

\[ [M \circ L]_\alpha^\gamma = [M]_\beta^\gamma [L]_\alpha^\beta \]

Proof. Let α = {u1, . . . , un}, β = {v1, . . . , vm} and γ = {w1, . . . , wl}, and write A = [M]_β^γ, B = [L]_α^β and C = [ML]_α^γ. We simply compute what happens to each of the basis vectors of U:

\[ ML(u_k) = M\Bigl(\sum_{j=1}^{m} v_j B_{jk}\Bigr) = \sum_{j=1}^{m} M(v_j) B_{jk} = \sum_{j=1}^{m}\sum_{i=1}^{l} w_i A_{ij} B_{jk} = \sum_{i=1}^{l} w_i \Bigl(\sum_{j=1}^{m} A_{ij} B_{jk}\Bigr) \]

Since, by definition,

\[ ML(u_k) = \sum_{i=1}^{l} w_i C_{ik} \]

¹⁹ It is also common to write ML instead of M ∘ L for the composition.
it follows from the fact that γ = {w1, . . . , wl} is a basis that

\[ C_{ik} = \sum_{j=1}^{m} A_{ij} B_{jk} \]

Otherwise said, C = AB.

Corollary 2.17. If V is finite-dimensional with basis β, and M, L ∈ L(V), then

\[ [ML]_\beta = [M]_\beta [L]_\beta \]
Examples

1. Recall that the matrix of "rotate clockwise by 30°" with respect to the standard basis ε = {i, j} of R² is

\[ [L]_\varepsilon = \begin{pmatrix} \frac{\sqrt{3}}{2} & \frac{1}{2} \\ -\frac{1}{2} & \frac{\sqrt{3}}{2} \end{pmatrix} \]

It follows that L² ("rotate clockwise by 60°") has matrix

\[ [L^2]_\varepsilon = [L]_\varepsilon [L]_\varepsilon = \begin{pmatrix} \frac{\sqrt{3}}{2} & \frac{1}{2} \\ -\frac{1}{2} & \frac{\sqrt{3}}{2} \end{pmatrix}^2 = \begin{pmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} \\ -\frac{\sqrt{3}}{2} & \frac{1}{2} \end{pmatrix} \]

Similarly, rotation clockwise by 90° has matrix

\[ [L^3]_\varepsilon = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \]
2. Recall the example where we projected onto the plane perpendicular to n = (1/√5)i + (2/√5)k in R³. Suppose that we wanted to compute the matrix of the linear map defined by "project onto this plane" followed by "rotate 60° clockwise around the z-axis when viewed from above". By the previous part, if M is the linear map for rotation, it should be clear that M has the following matrix with respect to the standard basis ε = {i, j, k}:

\[ [M]_\varepsilon = \begin{pmatrix} \frac{1}{2} & \frac{\sqrt{3}}{2} & 0 \\ -\frac{\sqrt{3}}{2} & \frac{1}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

It follows that the composite linear map M ∘ L has matrix

\[ [ML]_\varepsilon = [M]_\varepsilon [L]_\varepsilon = \frac{1}{2}\begin{pmatrix} 1 & \sqrt{3} & 0 \\ -\sqrt{3} & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \cdot \frac{1}{5}\begin{pmatrix} 4 & 0 & -2 \\ 0 & 5 & 0 \\ -2 & 0 & 1 \end{pmatrix} = \frac{1}{10}\begin{pmatrix} 4 & 5\sqrt{3} & -2 \\ -4\sqrt{3} & 5 & 2\sqrt{3} \\ -4 & 0 & 2 \end{pmatrix} \]
3. Also recall V = Span β, where β = {e^x sin 2x, e^x cos 2x}, and L ∈ L(V) is "differentiate". Then

\[ [L]_\beta = \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix} \]

Since the identity linear map I ∈ L(V) must have matrix I2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} with respect to any basis, it follows that [L]_β [M]_β = I2 if and only if [M]_β = [L]_β^{-1}. However, integration is the inverse process to differentiation, whence [M]_β must be the matrix of integration with respect to the basis β.
Definition 2.18. The Kronecker delta symbol δij is defined by

\[ \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \]

The n × n identity matrix In is the matrix whose ijth entry is δij: that is, (In)ij = δij.
Theorem 2.19. If V is an n-dimensional vector space with basis β, then [A]_β = In if and only if A ∈ L(V) is the identity transformation of V.

Proof. If A = I is the identity, and β = {v1, . . . , vn}, then clearly A(vi) = vi for each vi, and so the matrix of A is the identity matrix In. Conversely, by Theorem 2.12, a matrix representation is unique, whence A = I is the only linear map with matrix In.
Left-multiplication by matrices When dealing with vector spaces of the form F^n, it can be difficult to distinguish between matrix multiplication and linear maps. This is because they are essentially the same, once you've chosen bases! Since F^n has a standard basis, it will often appear as if no such choice has been made, and confusion arises. Worst of all, this confusion spills over into other vector spaces. To tidy this up, it is a good idea to rephrase some of the discussion in the context of linear maps L ∈ L(F^n, F^m).

Theorem 2.20. Let A ∈ M_{m×n}(F). Then left-multiplication of vectors in F^n by A results in a linear transformation

\[ L_A : F^n \to F^m : v \mapsto Av \]

If B ∈ M_{m×n}(F) is any matrix, and β, γ are the standard bases of F^n, F^m, then we have the following.

1. [L_A]_β^γ = A
2. L_A = L_B ⇔ A = B
3. L_{A+B} = L_A + L_B and L_{λA} = λL_A for all λ ∈ F
4. If T ∈ L(F^n, F^m), then there is a unique C ∈ M_{m×n}(F) such that T = L_C.
5. If E ∈ M_{n×p}(F), then L_{AE} = L_A L_E.
6. If m = n, then L_{In} = I_{F^n} (the linear map obtained by left-multiplication by the n × n identity matrix is the identity linear map).
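The dictionary in Theorem 2.20 is easy to make literal in code. In this sketch (numpy assumed; the name left_mult is ours), properties 3 and 5 become one-line assertions.

    import numpy as np

    def left_mult(A):
        """Return the linear map L_A : v |-> Av."""
        return lambda v: A @ v

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    E = np.array([[0.0, 1.0], [1.0, 0.0]])
    v = np.array([5.0, -1.0])

    # Property 3: L_{A+E} = L_A + L_E
    assert np.allclose(left_mult(A + E)(v), left_mult(A)(v) + left_mult(E)(v))
    # Property 5: L_{AE} = L_A . L_E  (composition on the right-hand side)
    assert np.allclose(left_mult(A @ E)(v), left_mult(A)(left_mult(E)(v)))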
2.4 Invertibility and Isomorphisms

In this section we restrict to linear maps which have inverses under composition.

Definition 2.21. Suppose that L : V → W is linear. A function M : W → V is an inverse of L if both

\[ L \circ M = I_W \quad \text{and} \quad M \circ L = I_V \]

where I_W and I_V are the identity maps on W and V respectively. If such an M exists, we say that L is invertible or an isomorphism. We say that vector spaces V, W are isomorphic if there exists an invertible L ∈ L(V, W).

Since invertible means the same as bijective, we note that L is invertible if and only if it is both injective and surjective.²⁰ The following results consist of the basic properties of inverses.
Theorem 2.22. 1. If L is invertible, then it has a unique inverse: we call this function L⁻¹.

2. If L ∈ L(V, W) and N ∈ L(U, V) are invertible, then L ∘ N ∈ L(U, W) is invertible, with inverse (L ∘ N)⁻¹ = N⁻¹ ∘ L⁻¹.

3. If L is invertible, then L⁻¹ is invertible, and (L⁻¹)⁻¹ = L.

4. If L ∈ L(V, W) is invertible, then L⁻¹ ∈ L(W, V): that is, L⁻¹ is itself linear.

Proof. 1. Suppose that L ∈ L(V, W) has two inverses, M and N. Then, in particular, we have

\[ L \circ M = I_W \quad \text{and} \quad N \circ L = I_V \]

Now precompose the first equation with N to obtain

\[ N \circ (L \circ M) = N \circ I_W \implies (N \circ L) \circ M = N \quad \text{(associativity of functional composition)} \implies I_V \circ M = N \implies M = N \]

2. By associativity of functional composition,

\[ (L \circ N) \circ (N^{-1} \circ L^{-1}) = L \circ (N \circ N^{-1}) \circ L^{-1} = L \circ I_V \circ L^{-1} = L \circ L^{-1} = I_W \]

Showing that (N⁻¹ ∘ L⁻¹) ∘ (L ∘ N) = I_U is similar.

3. This is simply a re-reading of L⁻¹ ∘ L = I_V and L ∘ L⁻¹ = I_W.

4. Suppose that L ∈ L(V, W) is invertible with inverse L⁻¹ : W → V. Let w1, w2 ∈ W and λ ∈ F. Since L is bijective, ∃v1, v2 ∈ V such that wi = L(vi). But then

\[ L^{-1}(\lambda w_1 + w_2) = L^{-1}\bigl(\lambda L(v_1) + L(v_2)\bigr) = L^{-1}\bigl(L(\lambda v_1 + v_2)\bigr) \quad \text{(linearity of } L\text{)} = \lambda v_1 + v_2 = \lambda L^{-1}(w_1) + L^{-1}(w_2) \]
²⁰ If L is invertible, the fact that L ∘ M = I_W is bijective forces L to be surjective and M injective; M ∘ L = I_V being bijective forces M surjective and L injective, whence both are bijective. Conversely, if L is bijective and w ∈ W, we know (surjectivity) that w = L(v) for some v ∈ V. Injectivity of L says that v is the only vector in V for which this is true. If we define L⁻¹(w) = v, then L is invertible.
Remark: "isomorphic" is an equivalence relation on the collection of all vector spaces. In particular, I_V is an isomorphism of any vector space with itself (reflexivity), while parts 2 and 3 of the above theorem show transitivity and symmetry respectively.

Invertibility and dimension Recall Corollary 2.11. In our new language this says that if V, W are finite-dimensional vector spaces of the same dimension, then

\[ L \in \mathcal{L}(V, W) \text{ is invertible} \iff \operatorname{null} L = 0 \iff \operatorname{rank} L = \dim V \]

What is interesting is that this can be turned on its head: equal dimension implies that the spaces are isomorphic.²¹
Theorem 2.23. Suppose that V and W are vector spaces over the same field.

1. If L ∈ L(V, W) is an isomorphism and β is a basis of V, then L(β) is a basis of W.

2. V and W are isomorphic if and only if dim V = dim W. In particular, if V and W are isomorphic, then V is finite-dimensional if and only if W is.

Proof. 1. By Lemma 2.7, if β is a basis of V, then L(β) is a spanning set for L(V). But L is surjective, thus L(V) = W, whence L(β) spans W.

Now suppose that v1, . . . , vn ∈ β and a1, . . . , an ∈ F for which

\[ a_1 L(v_1) + \cdots + a_n L(v_n) = 0 \]

Since L is injective, Theorem 2.9 says that we have a trivial null-space, N(L) = {0}. It follows that a1 v1 + · · · + an vn = 0 ⇒ a1, . . . , an = 0. Therefore L(β) is linearly independent and thus a basis of W.

2. By part 1, if β is a basis and L ∈ L(V, W) an isomorphism, then L|_β : β → L(β) is a bijection. It follows that bases of V and W have the same cardinality, whence dim V = dim W.

Conversely, if dim V = dim W and β, γ are bases of V and W respectively, then β and γ have the same cardinality. It follows that there exists a bijection f : β → γ. Since any linear map is defined by what it does to a basis, f gives rise to a unique linear map L : V → W: for any n ∈ N, if v1, . . . , vn are any elements of β, define

\[ L\Bigl(\sum_{i=1}^{n} a_i v_i\Bigr) := \sum_{i=1}^{n} a_i f(v_i) \]

It is a straightforward exercise to check that L is an isomorphism.
While the theorem applies even to infinite-dimensional vector spaces, something more pleasant happens in finite dimensions. Over a field F, there is, up to isomorphism, precisely one vector space of dimension n. The following corollaries are nothing more than restatements of Theorem 1.24 and Corollary 2.14 respectively, using the language of isomorphisms.

Corollary 2.24. 1. If V is a vector space over F with dimension n, then V is isomorphic to F^n. In particular, if β is a basis of V, then φ_β(v) := [v]_β defines an isomorphism φ_β : V → F^n.
²¹ This is very much in contrast to group theory, where the standard measure of size is the cardinality of the group. For example, Z4 and the Klein 4-group V have the same cardinality (four) but are not isomorphic.
2. If V, W are vector spaces over F of dimensions n and m respectively, then L(V, W) is a vector space over F of dimension mn. Moreover, if β, γ are bases of V, W respectively, then

\[ \Phi : \mathcal{L}(V, W) \to M_{m \times n}(F) : L \mapsto [L]_\beta^\gamma \]

is an isomorphism.
Examples

1. R⁶, P5(R), M_{2×3}(R), M_{3×2}(R) are all isomorphic because they are all vector spaces over the same field R with the same dimension 6.

2. It is incorrect to claim that P5(R) is isomorphic to M_{2×3}(C), since the base fields are different. Since any complex vector space can be viewed as a real vector space with twice the dimension, it follows that, as real vector spaces,

\[ \dim_{\mathbb{R}} P_5(\mathbb{R}) = 6 \neq 12 = \dim_{\mathbb{R}} M_{2\times 3}(\mathbb{C}) \]

3. (Non-examinable) It is very important to know the base field when talking about dimension and isomorphicity! For instance, is it ever correct to claim that R and C are isomorphic? As vector spaces over R, the answer is no, since dim_R R = 1 ≠ 2 = dim_R C. However, it can be shown that dim_Q R = 2^{ℵ₀} = dim_Q C, whence they are isomorphic as vector spaces over Q. More technically, when claiming that two objects are isomorphic, it is important to stress in what category you are working: groups, rings, fields, vector spaces over a given field, etc.
Bases and the dual space (non-examinable) Let us think about the corollary in terms of the basis-comparison language of the theorem. First note that if β = {v1, . . . , vn}, then φ_β(vi) = ei, so that φ_β simply maps the ith basis vector of β to the ith standard basis vector of F^n.

It is a little trickier to think about the isomorphism Φ in terms of bases. For this, define the linear functions fi : V → F by

\[ f_i(v_j) := \delta_{ij} \]

and let γ = {w1, . . . , wm} be a basis of W. It can be shown that every linear map L ∈ L(V, W) is a unique linear combination

\[ L = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}\, w_i f_j \]

so that the set {wi fj} forms a basis of L(V, W). The isomorphism Φ maps

\[ \Phi : w_i f_j \mapsto E_{ij} \]

where {Eij} is the standard basis of M_{m×n}(F). The elements {f1, . . . , fn} form a basis of the dual space

\[ V^* := \mathcal{L}(V, F) \]

Indeed the basis {f1, . . . , fn} is said to be the dual basis to β, and it follows that vi ↦ fi defines an isomorphism of V with its dual. Perhaps surprisingly, if V is infinite-dimensional, then the whole discussion breaks down and V is not isomorphic to its dual.
2.5 The Change of Co-ordinate Matrix

Suppose that V is finite-dimensional over F with basis β = {v1, . . . , vn}. We know that the map

\[ \varphi_\beta : V \to F^n : x \mapsto [x]_\beta \]

is an isomorphism of vector spaces. But what if we chose a different basis? Suppose that ε is also a basis of V. Then we have another isomorphism

\[ \varphi_\varepsilon : V \to F^n : x \mapsto [x]_\varepsilon \]

Since inverses and compositions of isomorphisms are also isomorphisms, it follows that

\[ \varphi_\beta \circ \varphi_\varepsilon^{-1} : F^n \to F^n : [x]_\varepsilon \mapsto [x]_\beta \tag{$*$} \]

is an isomorphism. However, by Theorem 2.20, any linear map in L(F^n) has to be left-multiplication by some matrix Q ∈ M_n(F). Otherwise said,

\[ \exists Q \in M_n(F) \text{ such that } \varphi_\beta \circ \varphi_\varepsilon^{-1} = L_Q \]

or more concretely,

\[ \exists Q \in M_n(F) \text{ such that } \forall [x]_\varepsilon \in F^n \text{ we have } (\varphi_\beta \circ \varphi_\varepsilon^{-1})([x]_\varepsilon) = Q[x]_\varepsilon \tag{$**$} \]

and moreover, the matrix Q is unique. This Q has a name:

Definition 2.25. Let V be an n-dimensional vector space over F with bases β and ε. The matrix Q ∈ M_n(F) whose corresponding linear map is

\[ L_Q = \varphi_\beta \circ \varphi_\varepsilon^{-1} : [x]_\varepsilon \mapsto [x]_\beta \]

is the change of co-ordinate matrix from ε to β.

We can compute the change of co-ordinate matrix explicitly. If β = {v1, . . . , vn} and ε = {e1, . . . , en} are our two bases, then we can find the jth column of the matrix Q by multiplying by the jth standard basis vector in F^n; according to (∗∗) this means multiplying by the co-ordinate vector of ej with respect to ε: by (∗) the result must be [ej]_β,

\[ Q \begin{pmatrix} \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \end{pmatrix} = Q[e_j]_\varepsilon = [e_j]_\beta \]
Rewriting the picture following Theorem 2.12 and taking W = V, we obtain a commutative diagram and the proof of:

Theorem 2.26. The change of co-ordinate matrix from ε to β is the matrix of the identity linear map on V with respect to ε and β. That is, if ε = {e1, . . . , en}, then

\[ Q = [I_V]_\varepsilon^\beta = \begin{pmatrix} | & & | \\ [e_1]_\beta & \cdots & [e_n]_\beta \\ | & & | \end{pmatrix} \]

    x ∈ V  -------I_V------->  x ∈ V
      |                          |
     φ_ε                        φ_β
      |                          |
    [x]_ε ∈ F^n ----L_Q----> [x]_β ∈ F^n
If this appeal to commutative diagrams is unconvincing, here is an alternative way of seeing this result. It is easiest to follow if we write scalar multiplication vector-first. Since β = {v1, . . . , vn} is a basis, for each ej ∈ ε we have unique constants Qij such that

\[ e_j = v_1 Q_{1j} + \cdots + v_n Q_{nj} = \sum_{i=1}^{n} v_i Q_{ij} \]

Then, for any x ∈ V, there exist unique scalars a1, . . . , an such that

\[ x = a_1 e_1 + \cdots + a_n e_n = \sum_{j=1}^{n} e_j a_j = \sum_{j=1}^{n}\sum_{i=1}^{n} v_i Q_{ij} a_j = \sum_{i=1}^{n} v_i \Bigl[\sum_{j=1}^{n} Q_{ij} a_j\Bigr] \]

It follows that

\[ [x]_\varepsilon = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \quad \text{and} \quad [x]_\beta = Q\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \]

Q is clearly the change of co-ordinate matrix, and its jth column Q[ej]_ε is manifestly the vector [ej]_β.
It is important to remember that the change of co-ordinate matrix is merely telling you how the co-ordinates of a vector v ∈ V change when a basis changes: nothing is happening to the vector v itself! An analogy is that [v]_ε and [v]_β are akin to looking at an object v through two different pairs of sunglasses: the two images might be different, but they are simply representations of the same unchanging object v.
Computing co-ordinate changes We've set up our notation so that the most common situation is easiest to describe. Suppose that V is a vector space with a standard basis ε = {e1, . . . , en}, and that β = {v1, . . . , vn} is some other basis. A typical problem involves converting co-ordinates with respect to the standard basis to those with respect to β: we therefore want the change of co-ordinate matrix Q = [I_V]_ε^β. Unfortunately a direct computation is difficult: we would need to find the co-ordinates of each standard basis vector ej with respect to a, perhaps, complicated basis β. It should be clear, however, that the inverse change of co-ordinate matrix is much easier to compute:

Q = [I_V]_ε^β = change of co-ordinate matrix from ε to β
Q⁻¹ = [I_V]_β^ε = change of co-ordinate matrix from β to ε

Since, typically, the basis β will be defined in terms of the standard basis ε, the columns of Q⁻¹ are simply the co-ordinates of the elements of β:

\[ Q^{-1} = [I_V]_\beta^\varepsilon = \begin{pmatrix} | & & | \\ [v_1]_\varepsilon & \cdots & [v_n]_\varepsilon \\ | & & | \end{pmatrix} \]

We now take the inverse of Q⁻¹ to obtain the desired matrix Q.
Example Let ε = {1, x, x²} be the standard basis of P2(R) and β = {1 + x, 2 − x², 4 − x²}. We want to compute Q = [I_{P2(R)}]_ε^β, the change of co-ordinate matrix from ε to β. Instead we compute its inverse:

\[ Q^{-1} = [I_V]_\beta^\varepsilon = \begin{pmatrix} | & | & | \\ [1+x]_\varepsilon & [2-x^2]_\varepsilon & [4-x^2]_\varepsilon \\ | & | & | \end{pmatrix} = \begin{pmatrix} 1 & 2 & 4 \\ 1 & 0 & 0 \\ 0 & -1 & -1 \end{pmatrix} \]

To check that this makes sense, consider the polynomial

\[ p(x) = 7(1 + x) - 2(2 - x^2) - (4 - x^2) \]

which has co-ordinate representation

\[ [p]_\beta = \begin{pmatrix} 7 \\ -2 \\ -1 \end{pmatrix} \]

Multiplying out p yields p(x) = −1 + 7x + 3x². Note that

\[ \begin{pmatrix} -1 \\ 7 \\ 3 \end{pmatrix} = [p]_\varepsilon = Q^{-1}\begin{pmatrix} 7 \\ -2 \\ -1 \end{pmatrix} = Q^{-1}[p]_\beta \]

as expected. Inverting the matrix allows us to find the desired change of co-ordinate matrix Q:

\[ Q = \frac{1}{2}\begin{pmatrix} 0 & 2 & 0 \\ -1 & 1 & -4 \\ 1 & -1 & 2 \end{pmatrix} \]

For example, to find the co-ordinate representation of r(x) = 2 + 3x + 4x² with respect to β, we compute

\[ [r]_\beta = Q[r]_\varepsilon = \frac{1}{2}\begin{pmatrix} 0 & 2 & 0 \\ -1 & 1 & -4 \\ 1 & -1 & 2 \end{pmatrix}\begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 3 \\ -\frac{15}{2} \\ \frac{7}{2} \end{pmatrix} \]

which can be checked by multiplying out

\[ r(x) = 3(1 + x) - \tfrac{15}{2}(2 - x^2) + \tfrac{7}{2}(4 - x^2) \]

Of course, all this is predicated on being willing to invert a 3 × 3 matrix.
This process can be combined with matrix representations of linear maps.

Theorem 2.27. Suppose that V is an n-dimensional vector space with bases β and ε, and let L ∈ L(V) be a linear map. If Q = [I_V]_ε^β is the change of co-ordinate matrix from ε to β, then the matrices of L with respect to ε and β are related by

\[ [L]_\varepsilon = Q^{-1}[L]_\beta Q \]

To prove this, simply trace through what happens to the representation of any vector x ∈ V with respect to ε:

\[ Q^{-1}[L]_\beta Q[x]_\varepsilon = Q^{-1}[L]_\beta [x]_\beta = Q^{-1}[L(x)]_\beta = [L(x)]_\varepsilon = [L]_\varepsilon [x]_\varepsilon \implies [L]_\varepsilon = Q^{-1}[L]_\beta Q \]

The matrices of the linear map L with respect to different bases are therefore similar, or conjugate.
Examples

1. First recall the example on page 33 where L : R² → R² is reflection in the line y = 2x. We recast this example in our new language. Let ε = {i, j} be the standard basis of R² and let β = {v1, v2} = {i + 2j, −2i + j} be an alternative basis. Since L(v1) = v1 and L(v2) = −v2, we see that the matrix of L with respect to the basis β is

\[ [L]_\beta = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \]

The change of co-ordinate matrix Q⁻¹ from β to ε is clearly given by

\[ Q^{-1} = [I_{\mathbb{R}^2}]_\beta^\varepsilon = \bigl([v_1]_\varepsilon, [v_2]_\varepsilon\bigr) = \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix} \]

It follows that the matrix of the linear map L with respect to the standard basis ε is

\[ [L]_\varepsilon = Q^{-1}[L]_\beta Q = \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \cdot \frac{1}{5}\begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} -3 & 4 \\ 4 & 3 \end{pmatrix} \]

exactly as claimed on page 33.
2. More generally, let L : R² → R² be reflection across the line making angle θ with the positive x-axis. Choose β = {v1, v2} so that v1 is parallel to the line and v2 perpendicular. Given that we may choose

\[ v_1 = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} \quad \text{and} \quad v_2 = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix} \]

the change of co-ordinate matrix from β to ε is

\[ Q^{-1} = [I_{\mathbb{R}^2}]_\beta^\varepsilon = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \]

The matrix of L with respect to β is [L]_β = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} as before, whence, invoking some trig identities, we see that the matrix of L with respect to the standard basis is

\[ [L]_\varepsilon = Q^{-1}[L]_\beta Q = \begin{pmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{pmatrix} \]

Observe that the columns of [L]_ε are [L(i)]_ε and [L(j)]_ε.

[Figure: the line at angle θ through the origin, with i, j, their reflections L(i), L(j), and the basis vectors v1 = (cos θ, sin θ), v2, L(v2).]
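The trig identities can be delegated to a computer algebra system; a quick sketch assuming sympy:

    import sympy as sp

    t = sp.symbols('theta')
    Qinv = sp.Matrix([[sp.cos(t), -sp.sin(t)],
                      [sp.sin(t),  sp.cos(t)]])       # columns [v1]_eps, [v2]_eps
    L_beta = sp.Matrix([[1, 0], [0, -1]])             # reflection w.r.t. beta

    L_eps = Qinv * L_beta * Qinv.inv()                # Q^{-1} [L]_beta Q
    target = sp.Matrix([[sp.cos(2*t), sp.sin(2*t)],
                        [sp.sin(2*t), -sp.cos(2*t)]])
    assert sp.simplify(L_eps - target) == sp.zeros(2, 2)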
Change of basis in general (non-examinable) The change of basis approach can be further generalized to apply to a linear map between different vector spaces, where we change basis on both spaces. This is a crazy result to attempt to work with by hand, but it does say what we should feed into a computer for a given example!

Theorem 2.28. Suppose that L : V → W is a linear map where dim V = n and dim W = m. Suppose moreover that ε and β are bases of V, and that γ and δ are bases of W. Then

\[ [L]_\varepsilon^\gamma = R^{-1}[L]_\beta^\delta Q \]

where Q = [I_V]_ε^β ∈ M_n(F) is the change of co-ordinate matrix from ε to β and R = [I_W]_γ^δ ∈ M_m(F) is the change of co-ordinate matrix from γ to δ. Otherwise said, the following diagram commutes.

    x ∈ V  -----------L----------->  L(x) ∈ W
      |                                  |
     φ_β                                φ_δ
      |                                  |
    [x]_β ∈ F^n ----[L]_β^δ----> [L(x)]_δ ∈ F^m
      ↑                                  ↑
      Q = [I_V]_ε^β                      R = [I_W]_γ^δ
      |                                  |
    [x]_ε ∈ F^n ----[L]_ε^γ----> [L(x)]_γ ∈ F^m

The above formula can also be viewed in the form

\[ [I_W]_\delta^\gamma [L]_\beta^\delta [I_V]_\varepsilon^\beta = [I_W \circ L \circ I_V]_\varepsilon^\gamma = [L]_\varepsilon^\gamma \]

by observing that matrices of linear maps can be multiplied if the basis of the domain of the first equals that of the codomain of the second.
Example The derivative operator D ∈ L(P3(R), P2(R)) defined by D(p)(x) = p′(x) has matrix

\[ [D]_\varepsilon^\gamma = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \]

with respect to the standard bases ε = {1, x, x², x³} and γ = {1, x, x²}.

Suppose that β = {1 + x, 1 − x, 2x + x², x³ − 1} and δ = {1 − x, 2 + x², x} are two new bases of P3(R) and P2(R) respectively. The change of co-ordinate matrices from these bases back to the standard bases are

\[ Q^{-1} = [I_{P_3(\mathbb{R})}]_\beta^\varepsilon = \begin{pmatrix} 1 & 1 & 0 & -1 \\ 1 & -1 & 2 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad R^{-1} = [I_{P_2(\mathbb{R})}]_\delta^\gamma = \begin{pmatrix} 1 & 2 & 0 \\ -1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \]
The matrix of D with respect to β and δ is therefore

\[ [D]_\beta^\delta = R[D]_\varepsilon^\gamma Q^{-1} = \begin{pmatrix} 1 & 0 & -2 \\ 0 & 0 & 1 \\ 1 & 1 & -2 \end{pmatrix}\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}\begin{pmatrix} 1 & 1 & 0 & -1 \\ 1 & -1 & 2 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & -1 & 2 & -6 \\ 0 & 0 & 0 & 3 \\ 1 & -1 & 4 & -6 \end{pmatrix} \]

We can check that this works on an example: let

\[ p(x) = 3(1 + x) + 2(1 - x) - 4(2x + x^2) + 5(x^3 - 1) \]

be written with respect to β. Then

\[ (Dp)(x) = p'(x) = 3 - 2 - 4(2 + 2x) + 15x^2 = -7 - 8x + 15x^2 = -37(1 - x) + 15(2 + x^2) - 45x \]

with respect to δ. However,

\[ [D]_\beta^\delta [p]_\beta = \begin{pmatrix} 1 & -1 & 2 & -6 \\ 0 & 0 & 0 & 3 \\ 1 & -1 & 4 & -6 \end{pmatrix}\begin{pmatrix} 3 \\ 2 \\ -4 \\ 5 \end{pmatrix} = \begin{pmatrix} -37 \\ 15 \\ -45 \end{pmatrix} = [Dp]_\delta \]

as required.
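As promised, this is exactly what one feeds to a computer; a sketch of the whole example (numpy assumed):

    import numpy as np

    D = np.array([[0, 1, 0, 0],
                  [0, 0, 2, 0],
                  [0, 0, 0, 3]], dtype=float)      # [D] w.r.t. the standard bases

    Qinv = np.array([[1, 1, 0, -1],                # beta -> standard on P3(R)
                     [1, -1, 2, 0],
                     [0, 0, 1, 0],
                     [0, 0, 0, 1]], dtype=float)
    Rinv = np.array([[1, 2, 0],                    # delta -> standard on P2(R)
                     [-1, 0, 1],
                     [0, 1, 0]], dtype=float)

    D_new = np.linalg.inv(Rinv) @ D @ Qinv         # [D] w.r.t. beta and delta
    assert np.allclose(D_new, [[1, -1, 2, -6], [0, 0, 0, 3], [1, -1, 4, -6]])

    p_beta = np.array([3, 2, -4, 5], dtype=float)  # the test polynomial p
    assert np.allclose(D_new @ p_beta, [-37, 15, -45])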
3 Elementary Matrix Operations and Systems of Linear Equations

3.1 Elementary Matrix Operations and Elementary Matrices

This should all be revision.

Definition 3.1. Let A be an m × n matrix. We can define three families of transformations, determined by what they do to the rows of A.

Type I Swap any two of the rows. Leave the rest alone.

Type II Multiply one row by a non-zero constant and leave the others alone.

Type III Add a scalar multiple of one row to another. Leave the other rows alone.

These transformations are termed elementary row operations. Applying the same approach to columns yields the elementary column operations.
Given our earlier discussion of linear maps an