Calculus II Derivatives in Approximation, John McCuan
people.math.gatech.edu/~rohrs/1502.pdf

Calculus II

Derivatives in Approximation

John McCuan

Page 2: Calculus II Derivatives in Approximationpeople.math.gatech.edu/~rohrs/1502.pdf · I. Approximation by Polynomials (Power Series) II. Integration and the Fundamental Theorem of Calculus

Contents

Course Outline
Preface (for math teachers)
Prologue (Lecture 1)
The Geometric Interpretation
The Physical Interpretation
Integration
Approximation of Functions∗
The Fancy Chain Rule∗
Introduction to Vector Spaces and Linear Algebra
Note to the Student
Lecture 17 More General Functions
Lectures 18–19 Additional Properties of Rn
Lecture 19 Linear Combinations and Bases
Mini-Lecture on Determinants
Lectures 20–21 The Dot Product
Lecture 22 Visualizing a Linear Functional
Lecture 22 The derivative and differential of a functional
Summary Insert for Lecture 23
Lectures 24–25 The Cross Product
Lecture 26 Physical Quantities: Work and Torque
Lecture 27 The derivative of f : R1 → Rn; tangent vector
Summary Insert for Lectures 17–27
Lecture 28 Outline of Linear Algebra
Lecture 29 Lines and Planes (12.6–7)
Lectures 30–31 More Linear Algebra
Lecture 32 Kernels
Lecture 33 Consequences of the Rank-Nullity Theorem; Systems of Linear Equations
Lecture 33a Proof of the Rank-Nullity Theorem
Lectures 34–35 Determinants and Invertibility
Lecture 36 Changing Bases
Lecture 37 Applications
Lecture 38 From Approximating Data to Approximating Solutions
Lecture 39 The Closest Point Problem Part 1: Gram-Schmidt Orthonormalization
Glossary of Notation
End Notes
Notes on Abstract Functions
1 The Graph of a Function

∗Lectures to be added.


Course Outline

0. Introduction/Overview

I. Approximation of functions by polynomials (power series)

II. Integration and the Fundamental Theorem of Calculus

III. Approximating more general functions

Most of the material for this course comes from Calculus by Salas, Hille, and Etgen. The notes that follow pertain to a portion of section III which deals with Chapter 12 of the text and the subject of linear algebra. A more detailed (day by day) description of the course along with a full list of homework assignments may be found at

http://www.math.gatech.edu/~mccuan/courses/1502


Preface (for math teachers)

Many people have recently been interested in the expository aspects of linear algebra and in teaching linear algebra as a part of (or during) Calculus in particular. This direction, it seems to me, has arisen in two quite different contexts. On the one hand, there is a demand in technically oriented universities like Georgia Tech for engineering students to become familiar with matrices and linear transformations at an early stage (as freshmen) in preparation for courses like digital signal processing or robotics (where dynamical systems play a prominent role). On the other hand, there has been a movement in the more elite departments to “do multivariable calculus right” by presenting the relatively modern multilinear underpinnings of the subject.∗

Both of these contexts involve a relatively high level of scientific inclination on the part of the students and of expectations on the part of the faculty.

In my opinion, much of the need for matrix theory in engineering related disciplines has little to do with calculus, and is better satisfied via a dedicated course. Nevertheless, calculus cannot be abandoned, and requiring two mathematics courses (calculus and linear algebra) in one term is administratively unlikely. This situation, it seems to me, has led to two modes of presentation at Georgia Tech. Some instructors devote approximately two thirds of one semester to an essentially classical presentation of linear algebra (solving linear equations) made suitably (and carefully) elementary in order to suit freshmen; the first one third of the course involves a dash of power series. Some instructors have adopted the modern point of view.

It is against this backdrop that these notes were written. They attempt to take a middle road between the classical presentation of linear algebra and the modern presentation of multivariable calculus. At the same time, there is an attempt to unify the topics of power series and linear algebra—at least philosophically. The unifying philosophical theme is perhaps worth mentioning. In the first semester of calculus, we introduce the derivative with several interpretations. The derivative is defined as a limit of difference quotients and interpreted geometrically as the slope of a tangent line, and physically, as a rate of change. There is a third, less emphasized interpretation which provides our unifying theme. The derivative is a building block or ingredient in the construction of an approximation—and in the construction of approximating functions in particular. This is a practical interpretation which is less emphasized because it lacks the attractive geometric and physical intuitions associated with the other interpretations, and it is somewhat more abstract and technical. Accordingly, the classical groundwork for third semester calculus usually involves extending the limit definition and geometric (tangent planes) and physical (temperature gradients) interpretations to real valued functions of several variables. There may also be mention of approximation and differentials, but it is usually at the end and not emphasized. The intent of these notes is to provide this emphasis in the second semester of Calculus and to present power series, incidentally, as the logical conclusion of nth order approximation.

∗This philosophy is represented notably by the book Vector Calculus, Linear Algebra, and Differential Forms, A Unified Approach by Hubbard and Hubbard.

In short, we begin with (a review of) first order approximation of real valued functions of a single real variable (f : R1 → R1). We use derivatives to build an nth order (Taylor) approximation, consider the possibility of letting n tend to ∞ to get a series representation, and finally discuss what is required to approximate f : Rn → Rm at least to first order.

One final goal of this presentation is to provide some “exercise” for the students in differentiation and integration. After a semester of sequences, series, and linear algebra, I find that the integration and differentiation required in the third semester comes as quite a shock to most students. Thus, we do well to incorporate a dash of integration by parts and the chain rule. Along these lines, I take the liberty to introduce early the chain rule for functions of several variables (dubbed “a fancy chain rule”) and differentiation under the integral sign. On the one hand, these techniques allow the student to make the empowering claim, “I can differentiate any function you can write down.” On the other hand, the chain rule is used to motivate the definition of the gradient as the vector of partial derivatives in Lecture 23.

All in all, the course breaks into three parts.

I. Approximation by Polynomials (Power Series)

II. Integration and the Fundamental Theorem of Calculus

III. Approximating Vector Valued Functions of Several Variables.

Approximating integrals is included in the second part. The first two parts are, for my purposes, covered adequately in Calculus by Salas, Hille, and Etgen. These notes concentrate on the third part, which is covered in two passes. The first pass is an enhanced version of Chapter 12 of Salas, Hille, and Etgen. The second pass emphasizes linear algebra.

Atlanta, December 2001 John McCuan


Prologue: Functions in Calculus I

Calculus is about differentiating and integrating functions. We will review differentiation and integration presently, but first let us agree on an abstract definition of a function.

Definition 1 (function) Given two sets X and Y , a function is a rule which assigns to each x in X a unique y in Y .

[Figure: a rule f carrying elements of the set X to elements of the set Y .]

This sounds very simple, and it is, but the notion of a function takes some time to get used to. The student should take time, as the course progresses, to ask (and answer) the questions

(1) “What function is under discussion?”

(2) “What is the domain X?”

(3) “What is the target Y ?”

In a first calculus course, the emphasis is on real valued functions of a single real variable. This means the domain X is the real numbers,† and the target is also the real numbers. That is, X = Y = R. There is a wonderful notation that collects these three entities (the rule f with its domain and target). It is f : X → Y . One reads this “f is a function from X to Y .” In Calculus I, f : R → R, and we can picture the action of the function “dynamically” as above.

†Sometimes the domain is a subset of the real numbers, like an interval, but let’s ignore such details for now.


In short, the derivative of f is a limit of difference quotients:

f′(x0) = lim_{h→0} [f(x0 + h) − f(x0)] / h

(when this limit‡ exists). Calculus I also comes with two important interpretations of this limit.
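The limit can also be watched numerically: as h shrinks, the difference quotient settles down to f′(x0). A minimal sketch (the function x² and the point x0 = 3 are illustrative choices, not from the text):

```python
def diff_quotient(f, x0, h):
    """The difference quotient (f(x0 + h) - f(x0)) / h."""
    return (f(x0 + h) - f(x0)) / h

# Example: f(x) = x**2 at x0 = 3, where f'(3) = 6 exactly.
f = lambda x: x ** 2
for h in (0.1, 0.01, 0.001):
    # The quotient equals 6 + h here, so it approaches 6 as h -> 0.
    print(h, diff_quotient(f, 3.0, h))
```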

The Geometric Interpretation

This involves the graph of the function. To emphasize that this is a new object distinct from the function we state formally a definition.§

Definition 2 (Graph of a function) Given a function f : X → Y , the graph of f is the set of ordered pairs (x, y) such that x ∈ X and y = f(x). Notationally, we have

graph(f) = {(x, f(x)) : x ∈ X}.

In the case under discussion X = R1 and Y = R1 and, thanks to Descartes, we customarily draw a nice picture by turning one of the lines on its side:

‡The student may be bewildered by the emphasis on this limit definition both now and in Calculus I since it was hardly ever used directly to calculate a derivative. The student should note first, that this rigorous definition of the derivative is one of man’s greatest achievements, and so mathematicians are proud of it. Secondly, all of the useful rules for computing derivatives ultimately rest on this limit. Thirdly, it will play an important role in this semester of calculus.

§There are two kinds of definitions in these notes; ones that are stated formally and ones that are stated informally. It is important to understand both and to be able to switch their presentations. (Exercise: Give a formal definition of derivative.)


[Figure: the graph of f in the Cartesian plane, with axes X and Y and the point (x, f(x)) plotted above x.]

Exercise 1 How many times can a horizontal line intersect the graph of a function?

Now for the geometric interpretation.

Interpretation 1 (Derivative) The derivative of f : R1 → R1 at x0 is the slope of the tangent line to graph(f) at (x0, f(x0)).

[Figure: the graph of f with its tangent line at (x0, f(x0)), of slope m = f′(x0).]

Exercise 2 How is this interpretation related to the definition?

Exercise 3 What is the equation of the tangent line?

The Physical Interpretation

For this, we interpret the numerator of the difference quotient as a change in value. And that’s what it is. It’s the value of f at x0 + h minus the value of f at x0. Secondly, we think of the variable x as time. To emphasize this point, let’s replace all the x’s with t’s:

f′(t0) = lim_{h→0} [f(t0 + h) − f(t0)] / h,

numerator = f(t0 + h) − f(t0).

If f is, for example, the fuel in a rocket, and at time t0 = 5 there is 1002 lbs. of fuel and at time t = 5 + .7 there is 1000 lbs. of fuel, then the change in value is −2 lbs.

What is the rate of change? We remember that (rate) × (time) = (distance). Of course, it’s a little funny to think of distance in pounds, but it makes good sense to do so. The rate, then, should be the distance divided by the time:

[f(t0 + h) − f(t0)] / (t0 + h − t0).

This is just the difference quotient, and we see that as h tends to 0, the difference quotient should get closer and closer to the instantaneous rate of change of f with respect to time (or with respect to x, whatever x happens to be).

Exercise 4 The pressure in a certain cylinder is inversely proportional to the volume.

[Figure: a cylinder with volume V.]

(a) If the pressure is 100 when the volume is 1/100 and 50 when the volume is 2/100, what is the rate of change of the pressure with respect to this volume change?

(b) What is the instantaneous change in pressure with respect to volume?

Interpretation 2 (Derivative) The derivative of a function f : R1 → R1 is the instantaneous rate of change of the function values with respect to change in the domain parameter.


Integration

This course is mostly about differentiation, but we should not forget about integration entirely. Remember that the integral of a function f : R1 → R1 is also defined as a limit:

∫_a^b f(x) dx = lim_{‖P‖→0} ∑_{j=0}^{k−1} f(x*_j)(x_{j+1} − x_j).

This looks rather more complicated, but it’s not too bad. The big questions with their answers are as follows:

1. What is P?

Answer: P is a partition of the interval [a, b]. This is just an ordered set of numbers:

a = x0 < x1 < · · · < xk = b.

These partition the interval:

[Figure: the interval [a, b] on a number line, subdivided by the points x1, x2, . . . , x_{k−1}.]

2. What is ‖P‖?

Answer: This is the norm of the partition. It is the length of the largest subinterval:

‖P‖ = max_{0≤j≤k−1} (x_{j+1} − x_j).

To say that ‖P‖ tends to 0 means that the lengths of all the subintervals get small. To say that the resulting limit exists and is a number L = ∫_a^b f(x) dx means that whenever the norm of the partition is small enough, then any Riemann sum (the thing that we’re taking the limit of) must be close to L. In the jargon of limits, for any ε > 0, there is some δ > 0 such that ‖P‖ < δ implies

| L − ∑_{j=0}^{k−1} f(x*_j)(x_{j+1} − x_j) | < ε.


3. What is x*_j?

Answer: Given a partition P, x*_j is a choice of a point in the subinterval [x_j, x_{j+1}]. Thus, for each partition, there are lots of choices of Riemann sums corresponding to different choices of the points x*_j. For example, we could take the left endpoint to get the Riemann sum

∑_{j=0}^{k−1} f(x_j)(x_{j+1} − x_j).

We could take the midpoint:

∑_{j=0}^{k−1} f((x_j + x_{j+1})/2)(x_{j+1} − x_j).

Or we could let x*_j be the point that gives the maximum value for f on [x_j, x_{j+1}]. This gives what is called an upper (Riemann) sum.

As mentioned above, no matter which choice you make, it is required that the Riemann sum be close to the value of the integral whenever the norm of the partition is small. This provides a basis for approximating integrals, as we shall see later.
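The different choices of x*_j are easy to compare numerically. A minimal sketch (the function x², the interval [0, 1], and the uniform partition below are illustrative choices, not from the text):

```python
def riemann_sum(f, partition, choose):
    """Sum f(x*_j) * (x_{j+1} - x_j), with x*_j picked by `choose`."""
    return sum(choose(f, a, b) * (b - a)
               for a, b in zip(partition, partition[1:]))

left = lambda f, a, b: f(a)
midpoint = lambda f, a, b: f((a + b) / 2)

def upper(f, a, b, samples=1000):
    # Approximate the maximum of f on [a, b] by dense sampling.
    return max(f(a + (b - a) * k / samples) for k in range(samples + 1))

f = lambda x: x ** 2
P = [k / 10 for k in range(11)]  # uniform partition of [0, 1], ||P|| = 0.1
# The left, midpoint, and upper sums all approach the exact value 1/3
# as the norm of the partition shrinks.
```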

Exercise 5 Find the upper and lower Riemann sums of the function f(x) = x² with respect to the partition P = {−7, −3, −1/2, 1, 5}.

Integration of functions also has some useful interpretations. Geometrically, ∫_a^b f(x) dx is the area “under” the graph of f .

Exercise 6 Explain what Riemann sums have to do with area.

Exercise 7 Explain why there are quotation marks around “under” in the sentence above.

Physically, the integral represents a change in the value of a quantity whose rate is given by f .

Example 1 If the speed of a car as it pulls up to a stop light is given by r(t) = 90 e^{1/(t²−1)} (and it stops at time t = 1 minute), then the distance travelled by the car while stopping is given by

d = ∫_0^1 90 e^{1/(t²−1)} dt.
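Assuming this reading of the integrand, the stopping distance can be estimated with a midpoint Riemann sum; the midpoints never touch t = 1, where the exponent runs off to −∞ and the speed vanishes:

```python
import math

def speed(t):
    # r(t) = 90 * exp(1 / (t**2 - 1)); as t -> 1 from the left the
    # exponent tends to -infinity, so the speed tends to 0 (the car stops).
    return 90.0 * math.exp(1.0 / (t * t - 1.0))

# Midpoint-rule approximation of d = integral of r(t) over [0, 1].
n = 10_000
h = 1.0 / n
d = sum(speed((k + 0.5) * h) for k in range(n)) * h
# Sanity bound: the integrand stays between 0 and 90/e (about 33.1),
# so d must land strictly between 0 and about 33.1.
```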


Exercise 8 Explain the foregoing formula in terms of (rate) × (time). Hint: Riemann sum.

Exercise 9 Why (mathematically) does the car stop at t = 1?

There are also techniques of integration that should be reviewed.

Exercise 10 ∫_0^1 sec x dx.

Exercise 11 ∫_0^1 x² sin x dx.

Exercise 12 ∫_{−π}^{π} sin² x dx.

Approximation of Functions∗

The Fancy Chain Rule∗

∗Lectures to be added.


Introduction to Vector Spaces

and Linear Algebra:

a Calculus Based Approach

John McCuan

29th March 2004


Note to the Student

These notes are intended as a supplement to Chapter 12 of Calculus by Salas, Hille, and Etgen and (primarily) as an introduction to Linear Algebra. It should be noted by the student that Linear Algebra is an extensive subject in its own right and this introduction limits itself, to some extent, to those topics which are of immediate importance in understanding the derivative of a transformation f : Rn → Rm. While there are other (perhaps many) additional topics in linear algebra that are useful in various contexts, the material covered in these notes has direct relevance to Chapters 13–15 of Salas, Hille, and Etgen, though these authors do not include an explicit discussion of linear algebra in the text.


Lecture 17 More General Functions

So far in calculus we have studied functions that have domains and ranges in R (the real numbers). The emphasis has been on analyzing (with differentiation and integration) the function itself, and we haven’t paid much attention to the properties of the set R.

From now on we will consider more carefully the structure of the domain and target sets. These sets will be in Rn where n is some positive integer. The structure of this set Rn is discussed in Chapter 12. For today’s lecture we will use only three things:

1. Definition:

Rn = {(x1, x2, . . . , xn) : x1, x2, . . . , xn ∈ R}.

2. You can add elements in Rn: If

X = (x1, . . . , xn) ∈ Rn and Y = (y1, . . . , yn) ∈ Rn,

then

X + Y = (x1 + y1, . . . , xn + yn).

3. You can multiply an element X = (x1, . . . , xn) ∈ Rn by a constant c ∈ R:

cX = (cx1, . . . , cxn).

With these three notions, we can talk about linearity.

Definition 3 A function L : Rn → Rm is linear if

L(X + Y ) = L(X) + L(Y ) and L(cX) = cL(X)

whenever X, Y ∈ Rn and c ∈ R1.
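The two conditions in Definition 3 can be spot-checked numerically for any concrete map. A minimal sketch (the sample vectors and the two maps below are arbitrary illustrations; the second map is affine but not linear, which the check detects):

```python
def is_linear_sample(L, X, Y, c, tol=1e-9):
    """Check L(X + Y) = L(X) + L(Y) and L(cX) = cL(X) on one sample."""
    add = tuple(x + y for x, y in zip(X, Y))
    scale = tuple(c * x for x in X)
    additive = all(abs(a - (b + d)) < tol
                   for a, b, d in zip(L(add), L(X), L(Y)))
    homogeneous = all(abs(a - c * b) < tol
                      for a, b in zip(L(scale), L(X)))
    return additive and homogeneous

L = lambda X: (5 * X[0] + 2 * X[1],)      # the map of Example 3: linear
A = lambda X: (5 * X[0] + 2 * X[1] + 1,)  # shifted by 1: NOT linear
```

A single passing sample is of course no proof of linearity, but a single failing sample (as with A here) is a proof of non-linearity.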

An affine function is a function of the form

P0 + L(X − X0)


where P0 is a fixed point in Rm, L : Rn → Rm is linear, and X0 is a fixed point in Rn.

OBJECTIVE (of differential calculus): To approximate a function f : Rn → Rm by an affine function.

More precisely, we want to approximate the value of f locally near X0 ∈ Rn. The “zero order approximation” for f is f(X0); this gives us the P0 in the affine function.

In summary, we want an approximation

f(X) ∼ f(X0) + L(X − X0).

The key ingredient in constructing the linear function L is (you guessed it!) the derivative of f . (Whatever that is.) So we want to know three things:

1. What is the derivative of f (at X0)?

2. How does one use the derivative of f to construct the linear function L? What is the relation between the two?

3. What does that “∼” symbol (“should be close to”) in the approximation formula really mean?
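In the familiar one-variable case the affine approximation is just the tangent line f(x0) + f′(x0)(x − x0), and one can watch the error shrink as x approaches x0. A minimal sketch (sin and the point 0.5 are illustrative choices):

```python
import math

def affine_approx(f, df, x0):
    """Return the tangent-line (affine) approximation of f at x0."""
    return lambda x: f(x0) + df(x0) * (x - x0)

T = affine_approx(math.sin, math.cos, 0.5)
for h in (0.1, 0.01, 0.001):
    err = abs(math.sin(0.5 + h) - T(0.5 + h))
    # The error shrinks roughly like h**2, i.e. faster than h itself.
    print(h, err)
```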

OK, enough abstract stuff and strategy! Let’s at least look at one or two concrete examples of linear functions.

Example 2 L : R → R. (This is just multiplication by m.)

L(x) = mx. (L(x + y) = m(x + y) = mx + my = L(x) + L(y) . . . )

We can look at the graph:

[Figure: the graph of L(x) = mx, a line through the origin with slope mx/x = m; an input of length x maps to an output of length mx.]


Example 3 L : R2 → R.

L(x, y) = 5x + 2y. (Check that it’s linear.)

We can picture this like this:

[Figure: the map L from R2 to R; the point (3, 2) is carried to 19.]

We write (3, 2) ↦ 19. We can also draw the graph:

[Figure: the graph of L, a plane through the origin (0, 0, 0) with slope 5 in the x-direction and slope 2 in the y-direction.]

Example 4 L(x, y) = 3x − 2y.

[Figure: the map L carries (3, 2) to 5; the graph is a plane through the origin with slope 3 in the x-direction and slope −2 in the y-direction.]


Example 5 L : R2 → R2,

L(x, y) = (3x − 2y, x).

Now we can’t draw the graph (why?)

[Figure: the map L from R2 to R2.]

What does the map look like? What does it do? Finding the answer to these questions takes some work, but here is the answer: there are some special directions. They are (2, 1) and (1, 1). What happens when we apply L along these directions?

L(2, 1) = (4, 2) = 2(2, 1)

L(1, 1) = (1, 1)

[Figure: the vectors (1, 1) and (2, 1); L fixes (1, 1) and sends (2, 1) to (4, 2).]

(1, 1) ↦ (1, 1) (Read: “(1, 1) maps to itself.”).

(2, 1) ↦ 2(2, 1) (“(2, 1) maps to twice (2, 1).”).

In fact, this kind of thing happens for the entire lines along these directions.


[Figure: the line “y = x” is FIXED: each point (t, t) = t(1, 1) maps to itself.]

L(t, t) = (3t − 2t, t) = (t, t)

[Figure: the line “y = (1/2)x” is INVARIANT but stretched by a factor of 2: each point t(2, 1) maps to 2t(2, 1).]

L(2t, t) = (6t − 2t, 2t) = 2(2t, t)

Now we can sort of see what the whole map does. Imagine a circle centered at (0, 0):

[Figure: a circle centered at the origin. What is its image under L?]


[Figure: the image of the circle under L, an elongated oval.]

The image of a circle is some curve that is stretched like an ellipse. (Do you think it is an ellipse?)

We should have a pretty good idea now of what the map L : (x, y) ↦ (3x − 2y, x) does. It stretches by a factor of 2 along the direction (2, 1) and leaves the line y = x fixed. □
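The special directions of Example 5 can be verified directly in a few lines (t = 3 below is an arbitrary sample value, kept an integer so the comparisons are exact):

```python
def L(x, y):
    # The linear map of Example 5.
    return (3 * x - 2 * y, x)

# (1, 1) maps to itself; (2, 1) maps to twice itself.
print(L(1, 1))  # (1, 1)
print(L(2, 1))  # (4, 2)

# The same happens along the whole lines through these directions:
t = 3
assert L(t, t) == (t, t)                    # the line y = x is fixed
assert L(2 * t, t) == (2 * (2 * t), 2 * t)  # y = x/2 is stretched by 2
```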

Notice that this gives us a nicer picture than we had of the map in Example 4. We first apply the map in Example 5 and then “collapse” or project onto the x-axis:

Exercise 13 a) We could define an affine map to be one of the form Q0 + L(X). Why is this equivalent to the definition given above?

b) Prove that L(0) = 0 for any linear map L. Why is this fact important for the first order approximation formula

f(X) ∼ f(X0) + L(X − X0)?

Exercise 14 In Example 5 we discussed the image of a circle under the linear map L(x, y) = (3x − 2y, x). How does the image depend on the radius of the circle?


Exercise 15 Again for the map in Example 5, find the direction of maximum stretching, i.e., which point on the circle gets mapped farthest from the origin (0, 0). Hint: Express the points on the circle as r(cos θ, sin θ).

Exercise 16 Describe the linear map in Example 2 in terms of “stretching.” What happens if m < 0?

Exercise 17 Describe the linear maps L : R2 → R2 given by

(a) L(x, y) = (3x, 2y)

and

(b) L(x, y) = (2y, 3x).

Do circles centered at (0, 0) map to ellipses under these maps?

Exercise 18 Below is a more accurate picture of the image of the unit circle under the map (x, y) ↦ (3x − 2y, x). This picture was made using Mathematica. Indicate the image of (1, 0) on the picture.

[Figure: left, the unit circle on axes running from −1 to 1; right, its image under the map, an elongated oval on axes running roughly from −3 to 3 horizontally and −1 to 1 vertically.]

Produce a figure like this using a mathematical software package (Mathematica, Maple, Matlab, . . . ) for the map (x, y) ↦ (x, 2x − 3y).
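If Python is at hand instead, the same kind of picture can be produced by sampling the circle; a sketch for the map in the figure above (the plotting step assumes matplotlib is installed, so it is left as a comment):

```python
import math

def L(x, y):
    # The map whose image is drawn in Exercise 18.
    return (3 * x - 2 * y, x)

# Sample the unit circle and push each point through L.
ts = [2 * math.pi * k / 360 for k in range(361)]
circle = [(math.cos(t), math.sin(t)) for t in ts]
image = [L(x, y) for x, y in circle]

# To draw it (assumes matplotlib):
# import matplotlib.pyplot as plt
# plt.plot(*zip(*circle)); plt.plot(*zip(*image))
# plt.gca().set_aspect(1); plt.show()
```

The first coordinate of the image is 3 cos t − 2 sin t, whose amplitude is √13 ≈ 3.61; that matches the horizontal extent of the picture and connects with Exercise 15.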


Lectures 18–19 Additional Properties of Rn

§12.1 One can measure the distance between points in Rn:

d(X, Y ) = √( ∑_{j=1}^{n} (x_j − y_j)² ).

(Note: This is a generalization of the Pythagorean theorem

[Figure: points X = (x1, x2) and Y = (y1, y2) in the plane; d(X, Y ) is the hypotenuse of a right triangle with legs x1 − y1 and x2 − y2.]

d(X, Y ) = √((x1 − y1)² + (x2 − y2)²).)

The distance from a point X to the origin is called the norm of X:

‖X‖ = √( ∑_{j=1}^{n} x_j² ).

The set of all points equidistant from a fixed point is a sphere; if the dimension of Rn is 4 or greater we sometimes say hypersphere. Note that the distance between two points can be expressed as the norm of their difference: d(X, Y ) = ‖X − Y ‖. Thus, the equation of the sphere with center P0 and radius r is

‖X − P0‖² = r².
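These formulas translate directly into code. A minimal sketch (the helper names are illustrative):

```python
import math

def dist(X, Y):
    """Euclidean distance in R^n: the norm of the difference X - Y."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(X, Y)))

def norm(X):
    """Distance from X to the origin."""
    return dist(X, (0,) * len(X))

def on_sphere(X, P0, r, tol=1e-9):
    """Does X satisfy ||X - P0||^2 = r^2, up to rounding?"""
    return abs(dist(X, P0) ** 2 - r ** 2) < tol
```

For instance, dist((1, 2, 3), (4, −5, 2)) gives √59, the length d(P, Q) computed in Example 6 below.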

Example 6 (12.1.37) P = (1, 2, 3), Q = (4,−5, 2), R = (0, 0, 0). Show that △PQR is a right triangle.

d(P, Q) = √((1 − 4)² + (2 + 5)² + (3 − 2)²) = √(9 + 49 + 1) = √59

d(P, R) = √(1 + 4 + 9) = √14

d(Q, R) = √(16 + 25 + 4) = √45


(√45)² + (√14)² = 59 = (√59)²

(right angle at R). □
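Example 6 can be double-checked numerically (the dist helper below is just the distance formula of §12.1, written out for this check):

```python
import math

def dist(X, Y):
    # The distance formula of 12.1.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(X, Y)))

P, Q, R = (1, 2, 3), (4, -5, 2), (0, 0, 0)
# Pythagoras with the right angle at R: the two sides meeting at R
# are the legs, and PQ is the hypotenuse.
assert math.isclose(dist(P, R) ** 2 + dist(Q, R) ** 2, dist(P, Q) ** 2)
```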

Example 7 (12.1.38) (5,−1, 3), (4, 2, 1), and (2, 1, 0) are the midpoints of the sides of △PQR. Find P , Q, R.

(5,−1, 3) = midpoint of PQ, (4, 2, 1) = midpoint of PR, (2, 1, 0) = midpoint of QR.

For the first coordinates: 5 = (p1 + q1)/2, 4 = (p1 + r1)/2, 2 = (q1 + r1)/2, so

p1 + q1 = 10
p1 + r1 = 8
q1 + r1 = 4

Subtracting the second equation from the first gives q1 − r1 = 2; together with q1 + r1 = 4 this yields 2q1 = 6, so q1 = 3, p1 = 7, r1 = 1.

Other coordinates are similar. □
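Since each vertex equals the sum of its two adjacent midpoints minus the opposite one (e.g. M_PQ + M_PR − M_QR = (P+Q)/2 + (P+R)/2 − (Q+R)/2 = P), the whole example can be finished in a few lines. A sketch, with tuples standing in for the points:

```python
def vadd(X, Y):
    return tuple(x + y for x, y in zip(X, Y))

def vsub(X, Y):
    return tuple(x - y for x, y in zip(X, Y))

def midpoint(X, Y):
    return tuple((x + y) / 2 for x, y in zip(X, Y))

m_pq, m_pr, m_qr = (5, -1, 3), (4, 2, 1), (2, 1, 0)

# Each vertex = sum of its two adjacent midpoints minus the opposite one.
P = vsub(vadd(m_pq, m_pr), m_qr)   # (7, 0, 4)
Q = vsub(vadd(m_pq, m_qr), m_pr)   # (3, -2, 2)
R = vsub(vadd(m_pr, m_qr), m_pq)   # (1, 4, -2)

assert midpoint(P, Q) == (5.0, -1.0, 3.0)
```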

Problems: 12.1. 6, 10, 11, 20, 24, 33, 39

Vectors and Points: A Subtle Difference

In the last lecture we talked about elements in Rn. In this lecture we’ve been calling them points. When we talk about points we mean elements of Rn with primary reference to their position.

Elements of Rn can also be considered with reference to length and direction. In this case we use the term vector.

[Figure: a vector X drawn as an arrow from the origin.]

Length of a vector X means the distance from X to the origin. The direction of X is the direction one moves in going from the origin to X.

In most instances, making a distinction between points and vectors is not crucial, and the terms can be used interchangeably. (If you wish to ponder the difference, §1.1 of Hubbard and Hubbard has a discussion.)


There is one instance where the distinction can be really helpful. This is in the approximation formula where there are really two copies of Rn, one in which positions (points) are important and one in which directions (vectors) are important:

f(X) ∼ f(X0) + L(X − X0).

[Figure: points X0 and X, with the vector X − X0 drawn from X0 to X.]

In this case, the X and X0 appearing in f(X) and f(X0) should be thought of as points in Dom(f). The X − X0 should be pictured as a vector with initial point at X0 and terminal point at X. In this way, X − X0 is a vector in a second copy of Rn (that happens to have its origin at X0). We will discuss this point again later. For now we consider some facts about vectors that are useful for manipulating them:

• Vectors ~X and ~Y are parallel if there is a constant α for which ~Y = α ~X.

Example 8 (12.3.19)

~X = (1,−1, 2), ~Y = (2,−1, 2), ~Z = (3,−3, 6), ~W = (−2, 2,−4)

Solution:

~Z = 3(1,−1, 2) = 3~X, so ~Z ‖ ~X with α = 3 > 0 (“same direction”).

~W = −2(1,−1, 2) = −2~X, so ~W ‖ ~X with α = −2 < 0 (“opposite direction”).

~Y ∦ ~X, because if (we assume) ~X = (1,−1, 2) = α~Y = (2α,−α, 2α), then

1 = 2α ⇒ α = 1/2
−1 = −α ⇒ α = 1

a contradiction. □

• ‖~X‖ ≥ 0

• ‖~X‖ = 0 ⇒ ~X = 0

• ‖α~X‖ = |α| ‖~X‖

• ‖~X + ~Y‖ ≤ ‖~X‖ + ‖~Y‖ (Triangle inequality)

[Figure: ~X + ~Y drawn as the diagonal of the parallelogram with sides ~X and ~Y.]

Note: ~X + ~Y is the diagonal of the parallelogram spanned by ~X and ~Y. To be precise, this parallelogram is all the points

α~X + β~Y where α, β ∈ [0, 1].

More generally, given ~X and ~Y and constants α, β,

α~X + β~Y

is called a linear combination of ~X and ~Y. The set of all vectors ~Z = α~X + β~Y where ~X and ~Y are fixed but α and β take on all real values is called the span of ~X and ~Y.

Exercise 19 Extend the definitions of linear combination and span to any number of vectors ~X1, ~X2, . . . , ~Xk ∈ Rn.


Example 9 (12.3.20)

|‖X‖ − ‖Y ‖| ≤ ‖X − Y ‖

Proof. We know ‖X − Y‖ ≤ ‖X‖ + ‖−Y‖ = ‖X‖ + ‖Y‖. But this doesn't help much. (It has all the right terms, but the sign of ‖Y‖ is wrong.)

Let's get rid of the absolute values. Case 1: ‖X‖ ≥ ‖Y‖.

In this case we want to show ‖X‖ − ‖Y ‖ ≤ ‖X − Y ‖, i.e.,

‖X‖ ≤ ‖X − Y ‖ + ‖Y ‖.

Setting X − Y = Z we want

‖Y + Z‖ ≤ ‖Z‖ + ‖Y ‖.

which is clearly true by the triangle inequality. Case 2: ‖Y‖ > ‖X‖ (Exercise). □

Exercise 20 Do Case 2.

Problems 12.3. 1, 18–20, 24, 27, 28

Linear Combinations and Bases (Lecture 19)

There are n special vectors in R^n:

e1 = (1, 0, . . . , 0)

e2 = (0, 1, 0, . . . , 0)

...

en = (0, . . . , 0, 1).

(In R^2 and R^3 physicists use the names ı̂ = e1, ĵ = e2, k̂ = e3.)


Notice that each X = (x1, . . . , xn) ∈ R^n has a unique decomposition

X = ∑_{j=1}^n cj ej.

In other words, if one writes X as a linear combination of e1, . . . , en, then the coefficients cj are unique. What are they?

Exercise 21 If X = (x1, . . . , xn) = ∑_{j=1}^n cj ej, then what are c1, . . . , cn?

Any set of vectors v1, . . . , vn with this property (that any vector can be written uniquely as a linear combination of v1, . . . , vn) is called a basis of R^n. It may not be obvious how to find the coefficients:

Example 10 (12.3.29) v1 = (2, 0,−1), v2 = (1, 3, 5), v3 = (−1, 1, 1)

X = (1, 1, 6)

(1, 1, 6) = c1(2, 0,−1) + c2(1, 3, 5) + c3(−1, 1, 1)

= (2c1 + c2 − c3, 3c2 + c3,−c1 + 5c2 + c3)

2c1 + c2 − c3 = 1

3c2 + c3 = 1

−c1 + 5c2 + c3 = 6

This is a system of 3 linear equations in 3 unknowns. There are various methods for solving such systems: Cramer's rule, inverse of a matrix, Gaussian elimination. We will mention the first two later. For now we use Gaussian elimination, which in its ad hoc form is called substitution:

c3 = 1 − 3c2, so

2c1 + c2 − 1 + 3c2 = 1
−c1 + 5c2 + 1 − 3c2 = 6

or

2c1 + 4c2 = 2
−c1 + 2c2 = 5.

Multiplying the second equation by 2 and adding, we get

8c2 = 12,

c2 = 3/2,

c1 = −2,

c3 = −7/2.


Thus,

X = (1, 1, 6) = −2(2, 0, −1) + (3/2)(1, 3, 5) − (7/2)(−1, 1, 1) = −2v1 + (3/2)v2 − (7/2)v3.

Note that these are the only possibilities.

We still haven’t shown that v1, v2, v3 is a basis. Why not?

Fact 1 There are always exactly n vectors in a basis for R^n. (Fewer than n won't be enough to span. More than n won't give unique decompositions, even if they do span.)

Example 11 (1, 0, 0), (2, 0, 0), and (1, 1, 0) do not span R^3.

Exercise 22 Prove it.

Fact 2 The determinant of the matrix with columns v1, . . . , vn will be nonzero exactly when v1, . . . , vn is a basis for R^n. See the mini-lecture on determinants at the end of this lecture.

Example 12

det [1 2 1; 0 0 1; 0 0 0] = 1·det[0 1; 0 0] − 0·det[2 1; 0 0] + 0·det[2 1; 0 1] = 0.  Not a basis.

det [2 1 −1; 0 3 1; −1 5 1] = 2·det[3 1; 5 1] − 0·det[1 −1; 5 1] − 1·det[1 −1; 3 1] = −4 − 4 = −8 ≠ 0.

(2, 0, −1), (1, 3, 5), (−1, 1, 1) is a basis for R^3.

We will talk more about bases later (and why these two facts are true), but for now, let's apply what we know about bases etc. to the question

How do you construct a linear transformation?

We begin with the representation

X = ∑_{j=1}^n xj ej.


It follows that

L(X) = L(∑ xj ej) = ∑ L(xj ej) = ∑ xj L(ej).   (1)

This is interesting. Consider the case L : R^n → R^1:

L(X) = ∑ xj L(ej).

In this case L(e1), . . . , L(en) are numbers. This says that L is determined by (multiplication by) n fixed numbers (sort of). The numbers are m1 = L(e1), . . . , mn = L(en). Once we know the numbers, we know L:

L(X) = ∑ mj xj.

Look familiar? (Remember a linear function L : R^1 → R^1 is given by a formula L(x) = mx.) We can make the resemblance even more striking by letting m = (m1, . . . , mn) and writing m · X = ∑ mj xj:

L(X) = m · X.

Now we know exactly what it takes to construct a linear transformation L : R^n → R^1. It takes n numbers (slopes) m1, . . . , mn, and L(X) = ∑ mj xj.

Exercise 23 Verify that L(X) = m · X is linear for any constant vector m = (m1, . . . , mn).

Exercise 24 (Cramer's Rule) In this lecture and the previous one, there were three times that we solved a system of linear equations. Find those three systems of equations and try to solve them using Cramer's rule: If

a1x1 + · · · + akxk = c1

b1x1 + · · · + bkxk = c2

...

z1x1 + · · · + zkxk = ck

is a system of k linear equations for k unknowns x1, . . . , xk AND the system has a unique solution (x1, . . . , xk), then

xj = det(Mj) / det(M)


where M is the coefficient matrix

M = [a1 · · · ak; . . . ; z1 · · · zk]

and Mj is the matrix you get by replacing the jth column of M with the column (c1, . . . , ck).
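Cramer's rule as stated can be sketched directly in Python (the `det` and `cramer` helpers are our own; cofactor expansion is only practical for small k):

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def cramer(M, c):
    """Solve M x = c by Cramer's rule: x_j = det(M_j) / det(M)."""
    d = det(M)
    xs = []
    for j in range(len(M)):
        # M_j: replace the jth column of M with the right-hand side c.
        Mj = [row[:j] + [cj] + row[j + 1:] for row, cj in zip(M, c)]
        xs.append(Fraction(det(Mj), d))
    return xs

# One of the three systems: the basis-coefficient system from Example 10.
M = [[2, 1, -1], [0, 3, 1], [-1, 5, 1]]
print(det(M))              # -8, as computed in Example 12
print(cramer(M, [1, 1, 6]))  # [Fraction(-2, 1), Fraction(3, 2), Fraction(-7, 2)]
```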

Exercise 25 Show that {e1, . . . , en} is a basis for R^n. (This is called the standard basis for R^n; e1, . . . , en are called the standard basis vectors.)

Mini-Lecture on Determinants

A matrix is a rectangular array of numbers:

M = [a11 · · · a1m; . . . ; an1 · · · anm].

This one has m columns and n rows. It's called an n × m matrix (read "n by m matrix"). If a matrix is square, then there is a number associated with it called the determinant. The determinant of a matrix M may be denoted either by det(M) or by putting vertical bars around the entries in the array:

det(M) = |a11 · · · a1n; . . . ; an1 · · · ann|.

Given a square matrix M, one computes det(M) according to the following recipe.

If n = 1, det(M) = a11.
If n = 2, det(M) = a11a22 − a12a21.

17

Page 33: Calculus II Derivatives in Approximationpeople.math.gatech.edu/~rohrs/1502.pdf · I. Approximation by Polynomials (Power Series) II. Integration and the Fundamental Theorem of Calculus

If n > 2, then we need to know the "sign diagram." This is an array the same size as the matrix that contains alternating signs beginning in the upper-left corner (the 1,1-slot):

+ − + · · ·
− + − · · ·
+ − + · · ·
⋮

For a 3 × 3 matrix, the sign diagram is

+ − +
− + −
+ − +

You can also remember that the sign (−1)^{i+j} goes in the i, j-slot.

To compute the determinant, pick any row or column. Say I pick the first column. In that case,

det(M) = (sign 1,1) a11 D11 + (sign 2,1) a21 D21 + · · · + (sign n,1) an1 Dn1

where (sign k,1) is the corresponding sign from the sign diagram; a11, . . . , an1 are the entries of the first column, and Dk1 is the corresponding minor determinant. A k, j-minor determinant is just the determinant of the matrix you get when you delete the row and column of akj (i.e., the kth row and jth column).

For example, if

M = [1 2 3; 4 5 6; 7 8 9],

then the minor of a11 = 1 is

[5 6; 8 9].

The 1,1 minor determinant is (5)(9) − (6)(8) = −3. So the determinant of M is

det(M) = 1(−3) − 4[(2)(9) − (3)(8)] + 7[(2)(6) − (3)(5)] = −3 + 24 − 21 = 0.


It's an interesting (and amazing) fact that the determinant can be calculated by "expanding" along any row or column. For example, a 3 × 3 determinant is also (expanding along the second row, whose signs are − + −)

−a21(a12a33 − a13a32) + a22(a11a33 − a31a13) − a23(a11a32 − a31a12).
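One can test the "any row gives the same answer" claim numerically. A Python sketch (rows are 0-indexed; `minor` and `det_along_row` are our own helpers):

```python
def minor(M, i, j):
    """The matrix M with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det_along_row(M, i):
    """Cofactor expansion of det(M) along row i (0-indexed)."""
    if len(M) == 1:
        return M[0][0]
    # (-1)^(i+j) is the sign-diagram entry in the i, j-slot.
    return sum((-1) ** (i + j) * M[i][j] * det_along_row(minor(M, i, j), 0)
               for j in range(len(M)))

M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# The same value comes out no matter which row we expand along.
print([det_along_row(M, i) for i in range(3)])  # [0, 0, 0]
```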

Exercise 26 Show that you can get the recipe for a 2 × 2 determinant from the general n > 2 case.

Exercise 27 Compute a 4 × 4 determinant two different ways, first expanding along a row, then along a column.

Lectures 20–21 The Dot Product

The product that defines a linear transformation L : R^n → R^1 is a special case of the dot product of two vectors:

X · Y = ∑_{j=1}^n xj yj.

This again is a notion that has independent interest. The dot product is closely related to the angle between two vectors.

Definition 4 The angle between vectors X and Y is defined by the equation

cos θ = (X · Y) / (‖X‖ ‖Y‖).

Two vectors are perpendicular if and only if X · Y = 0.

Example 13 (12.4.44) Find the angle between the diagonal of a cube and the diagonal of one of its faces.

[Figure: a cube of side a, with long diagonal D = (a, a, a), face diagonal d = (a, a, 0), and angle θ between them.]


cos θ = (a² + a² + 0) / (√(3a²) · √(2a²)) = 2/√6,

θ = arccos(2/√6).
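The same computation in a short Python sketch (the `angle` helper is ours; it implements the defining equation cos θ = X·Y/(‖X‖ ‖Y‖)):

```python
import math

def angle(X, Y):
    # cos(theta) = (X . Y) / (||X|| ||Y||), with theta in [0, pi].
    dot = sum(x * y for x, y in zip(X, Y))
    nX = math.sqrt(sum(x * x for x in X))
    nY = math.sqrt(sum(y * y for y in Y))
    return math.acos(dot / (nX * nY))

a = 1.0
D = (a, a, a)    # diagonal of the cube
d = (a, a, 0.0)  # diagonal of one face
print(angle(D, d))                  # about 0.6155 rad (about 35.26 degrees)
print(math.acos(2 / math.sqrt(6)))  # the same value
```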

Note: We said that the dot product is used to "define" the angle between two vectors (and this is the way you should remember it), but this equation cos θ = X · Y/(‖X‖ ‖Y‖) has a proof from the Law of Cosines. (You can read about it on pg. 730 of Salas, Hille, and Etgen.)

BE SURE TO UNDERSTAND AND MEMORIZE THE PROPERTIES OF THE DOT PRODUCT ON PAGE 729.

Example 14 (12.4.8)

X · (X − Y) + Y · (Y + X) = ‖X‖² − X · Y + ‖Y‖² + X · Y = ‖X‖² + ‖Y‖².

The most common use of the dot product is in determining if two vectors are perpendicular (⊥):

X ⊥ Y ⇔ X · Y = 0.

Example 15 (12.4.47) Find all vectors perpendicular to (1, 2, 1) and (3,−4, 2).

Solution: We look for (x, y, z) satisfying

x + 2y + z = 0 = 3x − 4y + 2z.

(Note that we don’t expect a unique solution. Why?)

Eliminating y gives 5x + 4z = 0, so z = −(5/4)x; substituting back into the two equations gives

−(1/4)x + 2y = 0
(1/2)x − 4y = 0,

so y = x/8. Any vector

(t, t/8, −5t/4) = t(1, 1/8, −5/4) = τ(8, 1, −10)

works; check it.
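It is worth doing the suggested check. A two-line Python sketch (`dot` is our own helper):

```python
def dot(X, Y):
    return sum(x * y for x, y in zip(X, Y))

V = (8, 1, -10)
# V should be perpendicular to both of the given vectors.
print(dot(V, (1, 2, 1)), dot(V, (3, -4, 2)))  # 0 0
```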

Note: All the scalings of an element of R^n give the parameterization of a line in R^n:

L(t) = t~V.


This function L : R^1 → R^n. Is it linear?

[Figure: the interval [0, 1] in R^1 mapped onto the segment from 0 to ~V in R^n.]

The image is a line. □

A VERY IMPORTANT USE OF THE NORM (unit vectors)

If X ≠ 0, then X/‖X‖ is a vector of length 1 (in the same direction as X).

Proof:

‖X/‖X‖‖ = (1/‖X‖) ‖X‖ = 1. □

A vector with length 1 is called a unit vector.

Exercise 28 Show that the standard basis vectors e1, . . . , en are pairwise orthogonal and that each one is a unit vector. A collection of vectors with these two properties is called orthonormal.

Example 16 (12.4.15) Find the angle between (3, −1, −2) and (1, 2, −2).

Unit vectors: (1/√14)(3, −1, −2), (1/3)(1, 2, −2).

cos θ = 5/(3√14) ⇒ θ = cos⁻¹(5/(3√14)). (Is this really π/3?)

Projections and Components: An important use of the dot product.

[Figure: vectors X and Y with angle θ between them; comp_X(Y) marks the length of the shadow of Y on X.]

The component of Y along X is the scalar ‖Y‖ cos θ. Or alternatively,

comp_X(Y) = (X · Y)/‖X‖ = Y · ~U


where ~U is the unit vector in the X direction. The projection of Y along X is the scaled vector

comp_X(Y) ~U = (Y · ~U) ~U = ((X · Y)/‖X‖²) X.
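A Python sketch of the component and projection formulas (the helper names are ours):

```python
import math

def dot(X, Y):
    return sum(x * y for x, y in zip(X, Y))

def comp(Y, X):
    # Scalar component of Y along X: (X . Y) / ||X||.
    return dot(X, Y) / math.sqrt(dot(X, X))

def proj(Y, X):
    # Vector projection of Y along X: ((X . Y) / ||X||^2) X.
    c = dot(X, Y) / dot(X, X)
    return tuple(c * x for x in X)

Y = (3.0, 4.0, 0.0)
X = (2.0, 0.0, 0.0)
print(comp(Y, X))  # 3.0  (the shadow of Y on the x-axis)
print(proj(Y, X))  # (3.0, 0.0, 0.0)
```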

Direction Angles and Direction Cosines:
Direction Angles: The angles that a vector X makes with the coordinate axes (i.e., with the standard basis vectors).
Direction Cosines: The cosines of the direction angles. There are n direction angles for a vector in R^n:

cos γj = (X · ej)/‖X‖,   j = 1, . . . , n.

Exercise 29 Prove ∑_{j=1}^n cos² γj = 1.

Schwarz's Inequality: X · Y ≤ ‖X‖ ‖Y‖.

Exercise 30 Show that |X · Y| ≤ ‖X‖ ‖Y‖.

Lecture 22 Visualizing a Linear Functional

We begin with a digression concerning linear vector fields L : R^n → R^n.

Review: In Lecture 16 we discussed a linear transformation L : R^2 → R^2 that admitted some special directions. In particular, L(2, 1) = 2(2, 1), so that the line y = x/2 was invariant under the mapping, and it was stretched by 2. This way of thinking about a linear map (as a map that stretches vectors) is especially useful for linear maps L : R^1 → R^1.

Example 17 L(x) = 2x

In some ways this is a more useful picture than graph(L).

[Figure: the interval [−1, 1] mapped by L onto [−2, 2].]

Example 18 L(x) = −(1/2)x.


[Figure: the interval [−1, 1] mapped by L onto [−1/2, 1/2], with a flip.]

(A minus sign indicates a flip.)

It would be nice if we could use special directions and the idea of stretching to understand all linear maps L : R^n → R^n (or even L : R^2 → R^2). Unfortunately, that is not enough in general. BUT when there is a basis of special directions, we are in relatively good shape.

Key Fact: If you're looking at L : R^n → R^n and you happen to be able to find a basis {v1, . . . , vn} of special directions along which

L(vj) = λj vj

for some scalars λj, then L is easy to understand. L just scales by the factors λj in the special vj directions and combines these scalings linearly on all the other vectors. To be more explicit about this "combining" business, any vector X = ∑ aj vj (since v1, . . . , vn is a basis) and

L(X) = ∑ aj λj vj.

Thus, the image of a (hyper) sphere is a (hyper) ellipsoid, maybe with some flips.

Now, we can apply this to L : R^n → R^1 as follows. Consider the mapping L̃ : R^n → R^n given by

L̃(X) = (x1, . . . , xn−1, ∑ mj xj) = (x1, . . . , xn−1, L(X)).

This map L̃ always admits a basis of special invariant directions. For example, if mn ≠ 1 we take

v1 = (m2, −m1, 0, . . . , 0)
v2 = (m3, 0, −m1, . . . , 0)
...
vn−1 = (mn − 1, 0, . . . , 0, −m1)
vn = (0, . . . , 0, 1).


It is clear that L̃(vj) = vj for j = 1, . . . , n − 2. Also

L̃(vn−1) = (mn − 1, 0, . . . , 0, m1(mn − 1) − mn m1) = vn−1;

L̃(vn) = (0, . . . , 0, mn) = mn vn.

To see that v1, . . . , vn form a basis, we take the determinant of the matrix with v1, . . . , vn as rows. Expanding along the last row, and then expanding the resulting minor along its last row, gives

det [m2 −m1 0 · · · 0; m3 0 −m1 · · · 0; . . . ; mn − 1 0 · · · 0 −m1; 0 0 · · · 0 1] = ±(mn − 1) m1^{n−2} ≠ 0.

Challenge Problem. Find v1, . . . , vn when mn = 1.

Thus, L̃ is a transformation that we understand rather well, and L is just the projection of L̃ onto en.

Example 19 X = (x, y, z),

L(X) = 3x − 5y + 2z,

v1 = (−5, −3, 0), v2 = (1, 0, −3), v3 = (0, 0, 1),

det [−5 1 0; −3 0 0; 0 −3 1] = 3 ≠ 0.

L̃(X) = (x, y, 3x − 5y + 2z),

L̃(v1) = (−5, −3, 0) = v1,
L̃(v2) = (1, 0, −3) = v2,
L̃(v3) = (0, 0, 2) = 2v3.


[Figure: a sphere and its image under L̃.]

Exercise 31 Use mathematical software to produce images like those shown above.

Lecture 23 The derivative and differential of a functional

In this lecture, we consider a function f : R^n → R^1. For example, it might be the case that n = 3 and f(X) is the temperature at the point X. We could consider X in some small set in R^3 like a room. Or f might give the temperature at every point X in the universe, assuming of course that the universe is R^3. In any event, recall that we want an approximation for f(X) when X is near X0 ∈ R^n.

[Figure: nearby points X0 and X.]

The "zero order approximation" for f(X) is just f(X0), but we'd like something a little better than that. We want a linear function L : R^n → R^1 such that

f(X) ∼ f(X0) + L(X − X0).

Way back in Lecture 19 we figured out that L must be determined by some numbers (n of them): m1, . . . , mn. Once we know the n numbers, then

L(X − X0) = m · (X − X0)


where m = (m1, . . . , mn). But what is the vector m?

Think back for a minute to the case n = 1. In that case f : R^1 → R^1, and f(x) ∼ f(x0) + L(x − x0) where L(x − x0) = m(x − x0), and m = f′(x0) is the derivative of f at x0. It turns out that the m in this more general case is also the derivative of f at X0. But in this case, the derivative is a vector; it is often called by the special name gradient and denoted Df or ∇f.

We can use what we know about the one variable case to figure out what the components of the gradient must be. Here's how. Imagine that X = X0 + t e1. This means that X is located directly in the e1 direction from X0:

[Figure: X displaced from X0 in the e1 direction.]

In this case, f(X) = f(X0 + t e1) can be thought of as a function of a single variable t. Let's call this function g(t) = f(X0 + t e1). One-variable calculus tells us that

g(t) ∼ g(0) + g′(0) t.   (1)

But we can figure out g′(0) using the fancy chain rule:

g′(0) = ∂f/∂x1 (X0).

So we can rewrite (1) in terms of f to get

f(X) ∼ f(X0) + ∂f/∂x1 (X0) t

and compare this to the general formula

f(X) ∼ f(X0) + (m1, . . . , mn) · ((X0 + t e1) − X0) = f(X0) + m1 t.

This suggests that m1 = ∂f/∂x1 (X0). Similarly, one can choose X in a way that suggests mj = ∂f/∂xj (X0) for each j = 1, 2, . . . , n.


Exercise 32 What choice of X and what function g of a single variable does one use to see this?

Note: We have not exactly proved that mj must be ∂f/∂xj (X0), but we will be able to do that once we know what the "∼" symbol means.

In summary,

f(X) ∼ f(X0) + ∇f(X0) · (X − X0)

where

∇f(X0) = (∂f/∂x1, . . . , ∂f/∂xn)

is the vector of partial derivatives of f. The derivative of a real valued function of several variables is the vector of partial derivatives. It is also called the gradient or total derivative.

Exercise 33 (a) Approximate f(3.02, 1.95) using the facts

f(3, 2) = 18, ∇f(3, 2) = (30,−9).

(b) Check that f(x, y) = x³y − x²y² is a function on R² satisfying these conditions.

Exercise 34 (a) Find a parameterization r(t) = p0 + tv of the line segment from X0 to X. Make your function r satisfy r(0) = X0 and r(‖X − X0‖) = X.

(b) Assume f : R^n → R^1 and use the fancy chain rule on g(t) = f ◦ r(t) to find g′(0). What kind of approximation does this give you for f(X)?

(c) (bonus) What does the "∼" symbol mean in one-variable calculus? Can you use your answer to guess what it means in general?

A Vexing Change of Notation: Differentials

The linear function we have called L appears in the approximation formula as ∇f(X0) · (X − X0). Notice that this expression depends on f, X0 and X.


One could criticize our notation L(X − X0) on the basis that it only records the argument of L and ignores the dependence on f (because of the gradient ∇f) and on X0 (because the gradient is evaluated at X0).

Well, there is another notation for L that records these dependencies. It is

df_X0.

Thus, the linear function L is called the differential of f at X0. With this notation the approximation formula becomes

f(X) ∼ f(X0) + df_X0(X − X0).

To emphasize what we have said, we repeat: df_X0 is a linear map whose value at a vector is obtained by taking the dot product of that vector with ∇f(X0).

Unfortunately, whenever differential notation is used, the dependence on f is emphasized to the exclusion of all else. That is, writers of textbooks invariably omit the subscript X0 (indicating where the differential is "based") and the argument X − X0. See Salas, Hille, and Etgen pg. 185 and 935. Thus, one finds

f(X) ∼ f(X0) + df.

One is to understand from this formula that "df" means df_X0(X − X0) (or what we have called L(X − X0)).

Example 20 (15.7.1) Find the linear functional used to approximate f(x, y) = x³y − x²y²

(a) Near X0 = (x0, y0). Answer: df = (3x0²y0 − 2x0y0²)(x − x0) + (x0³ − 2x0²y0)(y − y0)

(b) Near X0 = (3, 2). Answer: df = 30(x − 3) − 9(y − 2)

(c) Approximate f(3.02, 1.95) using this differential. Answer: f(3.02, 1.95) ∼ 18 + 30(.02) + 9(.05) = 19.05.
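The approximation in (c) can be compared with the true value numerically. A Python sketch, with the partials computed by hand as in (a):

```python
def f(x, y):
    return x**3 * y - x**2 * y**2

def grad_f(x, y):
    # (df/dx, df/dy) = (3x^2 y - 2x y^2, x^3 - 2x^2 y), computed by hand.
    return (3 * x**2 * y - 2 * x * y**2, x**3 - 2 * x**2 * y)

x0, y0 = 3.0, 2.0
m1, m2 = grad_f(x0, y0)  # (30.0, -9.0)
approx = f(x0, y0) + m1 * (3.02 - x0) + m2 * (1.95 - y0)
print(approx)         # about 19.05, the differential approximation
print(f(3.02, 1.95))  # about 19.03, the true value nearby
```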

Problems: 15.7.29, 30, 31, 33, 35.

Summary Insert for Lecture 23

1. A linear function L : R^n → R^1 is built from a constant vector m = (m1, . . . , mn). If you have this vector, then L(X) = m · X.


2. The affine approximation for (a differentiable function) f : R^n → R^1 is given by

f(X) ∼ f(X0) + L(X − X0)

where L : R^n → R^1 is the linear function built from

∇f(X0) = (∂f/∂x1 (X0), . . . , ∂f/∂xn (X0)).

3. The linear map L : R^n → R^1 is a somewhat complicated thing, but it's important to understand because it can approximate locally any differentiable function. One can understand it this way: There is an associated linear map L̃ : R^n → R^n with n linearly independent eigenvectors v1, . . . , vn. L̃ leaves v1, . . . , vn−1 fixed and scales vn by some factor. (This distorts R^n in a simple ellipsoidal fashion.) And L is the projection of L̃ onto en.

Lectures 24–25 The Cross Product

This discussion is essentially contained in §12.5 of Salas, Hille, and Etgen. Problems: 12.5.13, 21, 26, 28, 31, 39, 42.

Lecture 26 Physical Quantities: Work and Torque

Work

If a point p moves in R^3 and a force ~F acts at that point, then work is done.

Remarks 1 (The definition of force?) Force is (apparently) an undefined physical quantity that "tends" to cause acceleration. Intuitively, force has magnitude and direction; it is thus represented by a vector.

2. (Newton's Second Law?) In the situation described above, it is important to note that the motion of the point p is not required to satisfy Newton's second law in the sense that ~F = m p̈. The reason for this is that Newton's law assumes the left side of the equation represents the sum of all forces acting. Our scenario does not have in mind that ~F is necessarily the only force acting.


Our objective is to describe quantitatively (with an exact number) how much work is done by ~F. We begin with a simple case: p(t) = p0 + t~v is a linear motion for t ∈ [0, T] and ~F is constant.

[Figure: a point moving along a segment from p(0) to p(T) with constant force ~F.]

In this case, the work done by ~F to accomplish the motion is (by definition)

W = ~F · (p(T) − p(0)) = ~F · T~v.   (2)

Notice that W is exactly the component of ~F along ~v times the length of the motion.

Exercise 35 Verify this statement.

Note also that work can be negative (if ~F pulls against the motion).

Now say the force varies in time and acts along a path of motion in R^3,

p : [0, T] → R^3.

[Figure: a curved path from p(0) to p(T) with the force F(t) acting along it.]

If F and p are smooth, then on each short section of path, F is almost constant and p(t) almost traces out a straight line segment at constant speed.


[Figure: the path divided into short pieces; on [tj, tj+1] the force F(t) is nearly constant and the path nearly straight.]

This suggests that we partition the interval [0, T] and try to express the work W as an integral. That is,

W = ∑ Wj

where Wj is the work done by F on [tj, tj+1]. Hopefully, we can get the same number W as a limit of Riemann sums ∑ W̃j, where each W̃j is close to Wj and comes from our formula (2). Thus, we want

W̃j = ~F · (tj+1 − tj) vj = ~F · vj (tj+1 − tj)

for some constant vectors ~F and vj. WHAT SHOULD THESE VECTORS BE?

A good candidate for the force vector is fairly easy to come up with. If F is continuous, then all values that it takes on a small interval [tj, tj+1] should be about the same. So we could take any one:

[Figure: the force sampled at some t∗j in [tj, tj+1].]

W̃j = F(t∗j) · vj (tj+1 − tj).

This is really looking quite like a summand in a Riemann sum. That's good.

Choosing the vector vj is somewhat more difficult. Remember vj should be a vector with the property that some affine function pj(t) = p∗j + t vj is


a good approximation for p(t) on [tj, tj+1]. The student should have enough experience with affine approximation by now to guess that vj has something to do with the derivative of p : R^1 → R^n (whatever that is). Let's use what we know about motion to see (i.e., guess) the relation.

First of all, the point pj(tj) should be close to p(tj). Why don't we make it exactly that? This means that p∗j = p(tj) − tj vj, or

pj(t) = p(tj) + (t − tj) vj.

It should also be true that pj(tj+1) should be close to p(tj+1). We could make it exactly that (see Exercise 36 below), but let's use a weaker condition as suggested by our picture with the tangent segment. Namely, we want the lengths close:

‖p(tj+1) − p(tj)‖ ∼ ‖pj(tj+1) − pj(tj)‖ = (tj+1 − tj) ‖vj‖.

In particular, the length of vj should be close to

‖p(tj+1) − p(tj)‖ / (tj+1 − tj).

This we recognize as a distance divided by a time. In other words, it is a rate (or speed). Again, if p is a smooth motion, this number should be close to the instantaneous speed s(t∗j) of p at any time t∗j ∈ [tj, tj+1]. This we know how to compute:

‖vj‖ = s(t∗j) = lim_{h→0} ‖p(t∗j + h) − p(t∗j)‖ / h.   (3)

This gives us a good candidate for the length of vj, and it even looks like some kind of derivative, as we expected. In fact, the quotient in (3) would look just like the difference quotient used to calculate p′(t∗j) if it weren't for the norm appearing in the numerator. We're tempted to just take those norms away and look at the limit

vj = lim_{h→0} (p(t∗j + h) − p(t∗j)) / h.   (4)

Unfortunately, p is vector valued, so we don't really know how to deal with such a limit (yet). What we can do is take away the limit too and just look at


the vector (p(t∗j + h) − p(t∗j))/h. We even look at the components p = (x, y, z), where we see honest to goodness difference quotients:

( (x(t∗j + h) − x(t∗j))/h, (y(t∗j + h) − y(t∗j))/h, (z(t∗j + h) − z(t∗j))/h ).

In each component we know how to take a limit as h → 0, and this is what we do. We set

vj = p′(t∗j) ≡ (x′(t∗j), y′(t∗j), z′(t∗j)).

The above equations both tell us what vj should be and define the derivative of p. More will be said concerning the derivative of such a function in the next section (and much more yet in Calculus III), but note for now that (4) suggests p′(t∗j) is a vector that is tangent to the image of p at p(t∗j) (it is a limit of scaled secants) and has length equal to the speed of the parameterization.

Let us go back to work. We have arrived at the expression

W̃j = F(t∗j) · p′(t∗j)(tj+1 − tj)

for the approximate amount of work done by F on [tj, tj+1]. This gives rise to the Riemann sum

∑ F(t∗j) · p′(t∗j)(tj+1 − tj),

which converges to the integral

W = ∫_0^T F(t) · p′(t) dt.   (5)

Exercise 36 Let pj : [tj, tj+1] → R^3 be a parameterization of the segment from p(tj) to p(tj+1). Write down this parameterization explicitly and use it to derive the integral in (5). Hint: Break into components and use the mean value theorem.

Note: The force F is often given as a function of position in R^3 rather than time. In such a case (5) becomes

W = ∫_0^T F(p(t)) · p′(t) dt.

Notice that in this setting F is thought of as a force acting solely along the path (rather than along the path and in time). This is an example of something called a path integral.


Example 21 (17.1.19) Find the work done by the force F(x, y, z) = (x², xy, z²) on a point that moves along the circular helix r(t) = (cos t, sin t, t) for t ∈ [0, 2π].

Solution:

W = ∫_0^{2π} F(r(t)) · r′(t) dt
  = ∫_0^{2π} (cos² t, cos t sin t, t²) · (−sin t, cos t, 1) dt
  = ∫_0^{2π} (−cos² t sin t + cos² t sin t + t²) dt
  = ∫_0^{2π} t² dt
  = (1/3)(2π)³.  □
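The limit of Riemann sums in (5) can be watched converging numerically. A Python sketch for Example 21 (midpoint sampling; all names are ours):

```python
import math

def F(x, y, z):
    return (x**2, x * y, z**2)

def r(t):
    return (math.cos(t), math.sin(t), t)

def r_prime(t):
    return (-math.sin(t), math.cos(t), 1.0)

def work(a, b, n):
    # Riemann sum for W = integral of F(r(t)) . r'(t) dt, sampling midpoints t*_k.
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        total += sum(Fi * vi for Fi, vi in zip(F(*r(t)), r_prime(t))) * h
    return total

exact = (2 * math.pi) ** 3 / 3
print(work(0.0, 2 * math.pi, 10000), exact)  # both about 82.683
```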

Torque

Torque is the quantity corresponding to force in the rotational formalism. It is most naturally considered a scalar τ. One finds however, from the rotational equivalent of Newton's Second Law, τ = Iω̇ (where I is rotational mass or moment of inertia and ω is angular velocity), that τ = ‖r‖ ‖F‖ sin α, where r is the position vector with respect to the center of rotation, ~F is the force acting, and α is the angle between the two. Thus, it is natural to define a torque vector by

~τ = r × ~F.

Exercise 37 Explain why I = m‖r‖² is chosen as rotational mass (Hint: kinetic energy) and derive from τ = Iω̇ the equation τ = ‖r‖ ‖F‖ sin α.

Lecture 27 The derivative of f : R^1 → R^n; tangent vector

In the course of discussing work, we defined the derivative of a function f : R^1 → R^n. We now recast that discussion in the context of affine approximation. In this setting, the image of f is a curve in R^n. We again consider


an affine approximation

f(x) ∼ f(x0) + L(x − x0)

where L is some linear map (L : R^1 → R^n). From the fundamental equation (2) for linear maps, we see that

L(x) = x L(1).

Thus, the action of L is determined by scaling a single vector L(1). Again, this vector (which is the key ingredient in constructing L) is the derivative of f at x0. And, as mentioned in the previous lecture, L(1) has the explicit value

f′(x0) = (f′1(x0), . . . , f′n(x0))

where f has component functions f1, . . . , fn. Perhaps the easiest way to see that this is the correct vector is to face head on the meaning of the symbol "∼" in the approximation formula.

The Symbol “∼”

We now generalize momentarily to f : R^n → R^m and the approximation formula

f(X) ∼ f(X0) + L(X − X0).

What this means is that when X is close to X0, f(X) is close to f(X0) + L(X − X0). One way to express this might be to say

lim_{‖X−X0‖→0} ‖f(X) − [f(X0) + L(X − X0)]‖ = 0.   (6)

Notice that this kind of property would hold even for a "zero order approximation,"

lim_{‖X−X0‖→0} ‖f(X) − f(X0)‖ = 0,

as long as f is continuous. (In fact, this is the definition of what it means for f to be continuous.) For a "first order approximation" we want the convergence in (6) to be "faster." How do we express that?


We want f(X) − [f(X0) + L(X − X0)] to converge to zero faster than something else that's converging to zero, namely ‖X − X0‖. Thus, the "first order ∼" means

lim_{‖X−X0‖→0} ‖f(X) − [f(X0) + L(X − X0)]‖ / ‖X − X0‖ = 0.   (7)

This means that f(X) − [f(X0) + L(X − X0)] goes to zero fairly quickly. Think about it.

Back to Parametrized Curves

In the particular case f : R^1 → R^n, (7) becomes

lim_{x→x0} ‖f(x) − f(x0) − (x − x0) L(1)‖ / |x − x0| = lim_{x→x0} ‖ (f(x) − f(x0))/(x − x0) − L(1) ‖ = 0.

Breaking into components, f = (f1, . . . , fn), L(1) = (v1, . . . , vn), and we have

lim_{x→x0} | (fj(x) − fj(x0))/(x − x0) − vj |
   ≤ lim_{x→x0} √( ∑_{j=1}^n ( (fj(x) − fj(x0))/(x − x0) − vj )² )
   = lim_{x→x0} ‖ (f(x) − f(x0))/(x − x0) − L(1) ‖ = 0.

We recognize limx→x0

(fj(x)− fj(x0))/(x− x0) as f ′j(x0); write x = x0 + h if you

don’t. Therefore, vj = f ′j(x0), and

L(1) = (f ′1(x0), . . . , f

′n(x0))

as expected.
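The conclusion that the component-wise difference quotients of a curve approach the component derivatives can be tested directly. A minimal sketch, using a hypothetical curve (cos t, sin t, t) not taken from the text:

```python
import math

def f(t):
    # a hypothetical parametrized curve f : R^1 -> R^3
    return (math.cos(t), math.sin(t), t)

def diff_quotient(t0, h):
    # component-wise (f_j(t0 + h) - f_j(t0)) / h
    a, b = f(t0 + h), f(t0)
    return tuple((aj - bj) / h for aj, bj in zip(a, b))

t0 = 1.0
exact = (-math.sin(t0), math.cos(t0), 1.0)  # L(1) = (f1'(t0), f2'(t0), f3'(t0))
approx = diff_quotient(t0, 1e-6)
# each component of the difference quotient is close to the exact derivative
```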

Functionals and first order approximation

We can also apply our new definition of “∼” to clean up our discussion of functionals in Lecture 23. If f : R^n → R^1, the partial derivative with respect to xj of f at X0 is defined by the limit

∂f/∂xj (X0) = lim_{h→0} ( f(X0 + h ej) − f(X0) ) / h.    (8)


It’s easy to check that this equation agrees with our intuitive definition of “leaving all the other variables (other than xj) fixed.”

Now, we want to find L : R^n → R^1, i.e., we want m = (m1, . . . , mn) with L(X − X0) = m · (X − X0), such that

lim_{‖X−X0‖→0} | f(X) − f(X0) − m · (X − X0) | / ‖X − X0‖ = 0.    (9)

If this is going to hold no matter what X is, then we can set X = X0 + h ej and let h → 0. With this choice (9) becomes

lim_{|h|→0} | f(X0 + h ej) − f(X0) − mj h | / |h| = lim_{|h|→0} | ( f(X0 + h ej) − f(X0) ) / h − mj | = 0.

Combining this equation with (8), one sees (Exercise 38) that mj = ∂f/∂xj (X0).

Exercise 38 Show that mj = ∂f/∂xj (X0). Hint:

| mj − ∂f/∂xj (X0) | ≤ | mj − ( f(X0 + h ej) − f(X0) ) / h | + | ( f(X0 + h ej) − f(X0) ) / h − ∂f/∂xj (X0) |.
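Definition (8) suggests a direct numeric check: approximate each partial derivative by its difference quotient with a small h. A sketch, with a made-up functional f(x, y) = x²y + 3y (not from the text), whose gradient at (2, 1) is (2xy, x² + 3) = (4, 7):

```python
def f(X):
    # a hypothetical functional f : R^2 -> R^1
    x, y = X
    return x * x * y + 3 * y

def partial(f, X0, j, h=1e-6):
    # the difference quotient (f(X0 + h e_j) - f(X0)) / h from (8)
    Xh = list(X0)
    Xh[j] += h
    return (f(tuple(Xh)) - f(X0)) / h

X0 = (2.0, 1.0)
grad = (partial(f, X0, 0), partial(f, X0, 1))
# grad is close to the exact gradient (4, 7)
```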

Summary Insert for Lectures 17–27

We have studied the structure of R^n.

1. There is an algebraic structure that comes from adding elements in R^n and multiplying them by constants.

2. These can be related to geometry in the plane (the diagonals of a parallelogram, triangle inequality, and scaling of lengths).

3. There is a more intricate geometric structure that comes from the dot product (angles between vectors).

4. We can write various equations using the things we know (sphere, line, plane).

5. Some things are special to R^3 (cross product).

Along the way, we have encountered various functions on R^n and have especially discussed certain linear functions.


1. A linear function L : R^n → R^1 (called a functional) is determined by a constant vector (m1, . . . , mn) and the dot product.

2. A linear function L : R^n → R^n (called a vector field) can be visualized if it admits a basis of eigenvectors.

3. A linear function L : R^1 → R^n (called a parameterized line) is determined by a single vector L(1).

4. The affine approximation for f : R^n → R^1 is built from L(X − X0) = ∇f(X0) · (X − X0).

5. The affine approximation for f : R^1 → R^n is built from L(t − t0) = f′(t0)(t − t0).

6. We did not discuss the derivative of f : R^n → R^n or of f : R^n → R^m in general.

Lecture 28 Outline of Linear Algebra

Roughly speaking, linear algebra is the study of linear maps. One reason to study linear maps is that many functions arising in applications are approximated well near a point by a linear map. (This statement is slightly imprecise; we should really say that the approximation is by an affine map, but the key ingredient in building affine maps is the linear part.) In other words, the derivative of a function is closely related to a linear map, so if you want to do calculus with vector valued functions of several variables, then you need to know some things about linear maps.

But there are other reasons to study linear maps. One of those reasons is simply to understand the solutions of systems of linear equations. Notice that we have solved several such systems in the course of our discussion. Linear algebra provides a much deeper insight into what we can expect to find as solutions of such systems and how to find those solutions. (This aspect generalizes the discussion in high school algebra of 2 × 2 systems which represent the intersection of two lines.) Another reason to study linear algebra is that linear maps are (as we shall see) closely related to matrices. There are lots and lots of applications of matrices. The data from a table of statistics, for example, is often manipulated using “matrix algebra.” If these manipulations are understood in terms of linear maps, then one can better understand the implications of the statistics. And there are other applications. We will primarily focus on calculus applications and applications to systems of linear equations.


We have already covered some important preliminaries for linear algebra. Namely, the properties of bases:

1. Definition: A basis is a collection of vectors v1, v2, . . . that allow unique representation of each vector X as a linear combination Σ cj vj.

2. There are always n vectors in a basis for R^n.

3. There are standard basis vectors in R^n.

4. Bases can be “detected” using determinants of matrices.

From these properties, we obtained the key formula for linear maps L : R^n → R^m:

L(X) = Σ_{j=1}^n xj L(ej).    (10)

We now begin a more systematic study of linear algebra beginning with the consequences of this formula. The key points are as follows:

1. Linear maps are related to matrix multiplication.
2. Images of subspaces are subspaces.
3. Kernels are subspaces.
4. Dimensions must “add up.”
5. Change of basis (or how to change complicated linear transformations into simple ones).

Don’t worry if you don’t know what all the words in these key points mean. The reason for listing them is that there are only five of them; five simple things to learn. And the first one (which is perhaps the most important) is really easy, and we can do it now.

Look at (10). If L(ej) is an element in R^m, it looks something like (a1j, a2j, . . . , amj). Notice that I need to index these coefficients with a “j” because there are n different vectors L(ej), each of which is in R^m. So we have

L(e1) = (a11, a21, . . . , am1)
L(e2) = (a12, a22, . . . , am2)
...
L(en) = (a1n, a2n, . . . , amn).


So aij is the ith component of the image of the jth standard basis vector.

To improve the way we visualize (10), let’s think of it as a formula for column vectors. Thus, we’ll think of L(X) as a column vector and each of the L(ej) as a column vector. In this way (10) becomes

L(X) = ( Σ xj a1j )
       ( Σ xj a2j )
       (   ...    )
       ( Σ xj amj ).

Note that each of the components of the vector on the right is a dot product of X with some vector (ai1, ai2, . . . , ain).* This is exactly the kind of thing we described as a matrix multiplication. That is, (10) can be written as

L(X) = AX    (11)

where A is the matrix with columns L(ej). The rows are the vectors (ai1, . . . , ain) for i = 1, . . . , m:

A = ( a11 a12 · · · a1n )
    ( a21 a22 · · · a2n )    (12)
    (        ...        )
    ( am1 am2 · · · amn ).

The matrix A has m rows and n columns; it is an “m × n” matrix. Thus, the action of any linear map L : R^n → R^m is determined by mn numbers; there is a one-to-one correspondence between linear transformations and matrices. In the case L : R^n → R^1, the matrix is just a vector:

L(X) = (m1 · · · mn) (x1, . . . , xn)^T = ( Σ mj xj ).

The matrix in (12) is called the matrix corresponding to L. Conversely, the linear transformation defined by L(X) = AX is called the linear transformation corresponding to A.

*Remember the case L : R^n → R^1 in which we only had one component here; in that case L was determined by a dot product of X with some vector (m1, . . . , mn).


Exercise 39 Find the matrices corresponding to the following transformations.

a. L(x, y, z) = (2x + 3y − z, 3x + z).
b. r(x) = (x, 3x, −2x).
c. L(x, y) = 3x − 2y.
d. L(x, y) = (3x − 2y, x).
e. L(x, y) = (3x, 2y).
f. L(x, y) = (3y, 2x).
g. L(x, y) = (x², xy).

Lecture 29 Lines and Planes (12.6–7)

12.6 Lines

We already know how to parameterize lines. We need a point and a vector:

ℓ(t) = p0 + tv.

This information can come to us in various forms.

Example 22 (12.6.7) Through two points: (1, 0, 3) and (2, −1, 4).

v = (2, −1, 4) − (1, 0, 3) = (1, −1, 1)
ℓ(t) = (1, 0, 3) + t(1, −1, 1)

Example 23 (12.6.3) A point and a direction. Through (3, 1, 0) in the e3 direction:

ℓ(t) = (3, 1, 0) + t(0, 0, 1).

Example 24 A system of equations ax + by = c, dx + ey = f. That is, {(x, y, z) : ax + by = c and dx + ey = f}. Let’s rewrite this:

a11 x + a12 y = b1
a21 x + a22 y = b2,

which we can write as a matrix-vector equation:

AX = v


where A = (aij), X = (x, y)^T, v = (b1, b2)^T. This will have a unique solution if det A ≠ 0. Why?

Answer: These equations represent two lines in R^2. They will intersect in a single point unless they have the same slope. The slopes are −a11/a12 and −a21/a22, so the equations can be solved as long as a11/a12 ≠ a21/a22, i.e., a11 a22 − a12 a21 ≠ 0.
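The det A ≠ 0 criterion can be illustrated with Cramer’s rule for a 2 × 2 system. This is a sketch; the particular systems below are made-up examples:

```python
def solve2x2(a11, a12, a21, a22, b1, b2):
    # Cramer's rule; a unique solution exists exactly when det A != 0
    det = a11 * a22 - a12 * a21
    if det == 0:
        return None  # parallel (or coincident) lines: no unique solution
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return (x, y)

# two non-parallel lines meet in exactly one point:
sol = solve2x2(1, 1, 1, -1, 3, 1)       # x + y = 3, x - y = 1  ->  (2, 1)
# same slope (det = 0): no unique solution:
none_sol = solve2x2(1, 2, 2, 4, 3, 1)
```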

Exercise 40 The lines may still intersect if the slopes are the same. How?

Assume det A ≠ 0; then there is a unique (x0, y0) in the intersection. Nothing is said about z, so it can be anything. This is a line:

ℓ(t) = (x0, y0, 0) + t(0, 0, 1).

Exercise 41 When does {(x, y, z) : ax + by = c, dx + ez = f} define a line?

Example 25 (12.6.11) 2x − 4y = −14, 4y − z = 12.

Solution 1: Write the system as

( 2 −4  0 ) ( x )   ( −14 )
( 0  4 −1 ) ( y ) = (  12 )
( 0  0  0 ) ( z )   (   0 ).

Set z = t; then we can try to solve for x and y in the z = t plane:

( 2 −4 ) ( x )   (  −14  )
( 0  4 ) ( y ) = ( 12 + t )

x = (−56 + 4(12 + t))/8 = (4t − 8)/8 = (1/2)t − 1,
y = 2(12 + t)/8 = (2t + 24)/8 = (1/4)t + 3.

ℓ1(t) = ((1/2)t − 1, (1/4)t + 3, t) = (−1, 3, 0) + t(1/2, 1/4, 1).

Solution 2: Set y = t. Then x = (4t − 14)/2 = 2t − 7 and z = 4t − 12.

ℓ2(t) = (2t − 7, t, 4t − 12) = (−7, 0, −12) + t(2, 1, 4).

ARE THESE THE SAME LINES?


Scale the direction vector in ℓ1: ℓ̃1(t) = (−1, 3, 0) + t(2, 1, 4). Pick a different point: ℓ̃1(−3) = (−7, 0, −12). ℓ1 and ℓ2 are different parameterizations of the same line. Speed of ℓ1 = √(1/4 + 1/16 + 1). Speed of ℓ2 = √(4 + 1 + 16).
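Whether two parametrizations trace the same line can be checked mechanically: the directions must be parallel (cross product zero), and the displacement between base points must itself be parallel to the direction. A sketch using exact rational arithmetic on the data of Example 25:

```python
from fractions import Fraction as F

def same_line(p1, v1, p2, v2):
    # two parametrizations p + t*v describe the same line in R^3 when the
    # directions are parallel and p2 - p1 is a multiple of the direction
    def parallel(u, w):
        # u x w = 0
        cx = u[1] * w[2] - u[2] * w[1]
        cy = u[2] * w[0] - u[0] * w[2]
        cz = u[0] * w[1] - u[1] * w[0]
        return cx == cy == cz == 0
    diff = tuple(b - a for a, b in zip(p1, p2))
    return parallel(v1, v2) and parallel(diff, v1)

l1 = ((F(-1), F(3), F(0)), (F(1, 2), F(1, 4), F(1)))
l2 = ((F(-7), F(0), F(-12)), (F(2), F(1), F(4)))
# the two parametrizations from Example 25 trace the same line
```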

Example 26 Distance from a point to a line [12.7.29]: the point (1, 2, 3) and the line ℓ(t) = (1, 0, 2) + t(1, −2, 3). Here q − p = (0, 2, 1), and the cross product

(1, −2, 3) × (0, 2, 1) = (−8, −1, 2)

has length √5 · √14 sin α = √69, so

√5 sin α = √69/√14.

In general, the distance from q to ℓ(t) = p + tv is

‖q − p‖ sin α = ‖(p − q) × v‖ / ‖v‖.
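The distance formula can be verified numerically against Example 26; a minimal sketch:

```python
import math

def dist_point_line(q, p, v):
    # || (p - q) x v || / ||v||, the formula above
    d = tuple(pi - qi for pi, qi in zip(p, q))
    c = (d[1] * v[2] - d[2] * v[1],
         d[2] * v[0] - d[0] * v[2],
         d[0] * v[1] - d[1] * v[0])
    return math.sqrt(sum(ci * ci for ci in c)) / math.sqrt(sum(vi * vi for vi in v))

# Example 26: the point (1, 2, 3) and the line l(t) = (1, 0, 2) + t(1, -2, 3)
d = dist_point_line((1, 2, 3), (1, 0, 2), (1, -2, 3))
# d agrees with sqrt(69/14)
```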

12.7 Planes

The parametric representation of a 2-plane in R^n is

p(s, t) = p0 + s v1 + t v2

where v1 and v2 are linearly independent, i.e., neither vector is a scaling of the other. A hyperplane in R^n is parametrically represented by

p(μ1, . . . , μn−1) = p0 + Σ_{j=1}^{n−1} μj vj

where v1, . . . , vn−1 are n − 1 linearly independent vectors, i.e., the only way Σ_{j=1}^{n−1} cj vj can be the zero vector is if c1 = c2 = · · · = cn−1 = 0.

Exercise 42 Show that this definition of linearly independent agrees with the one we used for two vectors.


The notions of a 2-plane in R^n and a hyperplane in R^n coincide in the special case n = 3. Let’s concentrate on that case for a moment. An important geometric entity related to a (hyper)plane Π in R^3 is the normal direction (or normal vector). This can be obtained by taking the cross product w = v1 × v2 = (a, b, c). Given a point p0 = (x0, y0, z0) in the plane Π, every other point p = (x, y, z) ∈ Π satisfies

(p − p0) · w = 0.    (13)

In this way, Π can be described by a single equation:

[Figure: the plane through p0 with normal vector w and a second point p]

(x − x0)a + (y − y0)b + (z − z0)c = 0    (14)

or

ax + by + cz = d    (15)

where d = a x0 + b y0 + c z0. Conversely, any equation of the form (13), (14), or (15) defines a plane in R^3.

Exercise 43 Find a parametric representation for Π = {(x, y, z) : 3x+2y−z = 7}.

Example 27 (12.7.23) Distance from (2, −1, 3) to 2x + 4y − z + 1 = 0.

Solution: A normal vector is w = (2, 4, −1) and p0 = (0, 0, 1) is on the plane.

[Figure: the point q = (2, −1, 3), the point p0 on the plane, and the segment of length ‖q − p0‖]

distance = |proj_w(q − p0)| = | ‖q − p0‖ cos α | = | (1/‖w‖) (q − p0) · w | = |aξ + bη + cζ − d| / √(a² + b² + c²) = |(2, −1, 2) · (2, 4, −1)| / √21 = 2/√21.  □
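The plane-distance computation of Example 27 is easy to replicate; a minimal sketch:

```python
import math

def dist_point_plane(q, a, b, c, d):
    # |a*x + b*y + c*z - d| / sqrt(a^2 + b^2 + c^2) for q = (x, y, z)
    x, y, z = q
    return abs(a * x + b * y + c * z - d) / math.sqrt(a * a + b * b + c * c)

# Example 27: the plane 2x + 4y - z + 1 = 0, i.e., 2x + 4y - z = -1
d = dist_point_plane((2, -1, 3), 2, 4, -1, -1)
# d agrees with 2/sqrt(21)
```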

Example 28 (12.7.15) Angle between the planes

5(x − 1) − 3(y + 2) + 2z = 0
x + 3(y − 1) + 2(z − 4) = 0

The unit normals are w1 = (5, −3, 2)/√38 and w2 = (1, 3, 2)/√14, so cos α = w1 · w2 = 0 and α = π/2.  □
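The angle computation in Example 28 amounts to cos α = w1 · w2 / (‖w1‖ ‖w2‖); a quick numeric check:

```python
import math

def angle_between_planes(w1, w2):
    # the angle between the normals: cos(alpha) = w1.w2 / (||w1|| ||w2||)
    dot = sum(a * b for a, b in zip(w1, w2))
    n1 = math.sqrt(sum(a * a for a in w1))
    n2 = math.sqrt(sum(b * b for b in w2))
    return math.acos(dot / (n1 * n2))

# Example 28: normals (5, -3, 2) and (1, 3, 2); the planes are orthogonal
alpha = angle_between_planes((5, -3, 2), (1, 3, 2))
```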

It’s also true that a hyperplane in R^n can be described by a single equation. In fact, it’s the same equation as (13); the only difference is that p, p0 and w are vectors in R^n. The vector w can be any vector normal to v1, . . . , vn−1, though such a vector is rather more difficult to find when n > 3.* Nevertheless, the set

Π = {p ∈ R^n : (p − p0) · w = 0}

is a hyperplane in R^n. It’s also true that given a non-zero vector w ∈ R^n, there are n − 1 linearly independent vectors v1, . . . , vn−1 orthogonal to w such that Π is parameterized on R^(n−1) by p(μ) = p0 + Σ μj vj. The vectors v1, . . . , vn−1 are also not easy to find in general.*

*Unless you know some linear algebra.

Remark 1 We have discussed two hypersurfaces, the hypersphere and the hyperplane. In general, a hypersurface is a set in R^n with the property that, associated to every small enough part S of it, there is a function f : R^n → R^1 such that ∇f ≠ 0 on S and {X ∈ R^n : f(X) = 0} ∩ S = S.

Exercise 44 Verify that hyperspheres and hyperplanes are hypersurfaces.

Lectures 30–31 More Linear Algebra

Abstractly, a vector space is a set that is closed under addition and scalar multiplication. (“Closed” just means you can’t get out of the set by applying these operations.) For example, the plane z = 1 in R^3 is not closed under addition because (0, 0, 1) and (0, 1, 1) are in there, but (0, 0, 1) + (0, 1, 1) = (0, 1, 2) is not.

The prototypical examples of vector spaces are the spaces R^n, n = 1, 2, . . . .

Think about this: The set of all scalings of (1, 1) is a subset of R^2 that is closed under addition and scalar multiplication.

a1(1, 1) + a2(1, 1) = (a1 + a2)(1, 1)
ca(1, 1) = (ca)(1, 1).

Let’s call this set V = {a(1, 1) : a ∈ R} ⊆ R^2. This set V is a vector space. It is also a subset of another vector space, R^2. We call such sets subspaces.

Exercise 45 Show that if v1, . . . , vk are vectors in R^n, then V = { Σ_{j=1}^k aj vj : aj ∈ R } (the set of all linear combinations of v1, . . . , vk) is a subspace of R^n.

Exercise 46 Is the line {p0 + tv : t ∈ R^1} a subspace of R^n?

Exercise 47 If L : V → W is a linear map of a vector space V into another vector space W, and V0 is a subspace of V, then L(V0) = {L(v) : v ∈ V0} is a subspace of W.


Exercise 47 asserts that the image of a subspace under a linear map is a subspace. This fact can help us understand the action of linear maps. Before pursuing this line of thought further, however, let’s firm up what we know about bases.

Theorem 1 Any subspace V of R^n has a basis of finitely many vectors v1, . . . , vk. Thus, V is exactly the collection of linear combinations { Σ aj vj : aj ∈ R }.

Proof. Let v1 be any non-zero vector in V. Notice that

V1 = {a v1 : a ∈ R}

is a subspace of V. If V1 = V, great! We’re done.

If not, then there is a non-zero vector v2 ∈ V \ V1 (i.e., v2 ∈ V but is not a scaling of v1). We claim that {v1, v2} is a linearly independent set. To see this, assume a1 v1 + a2 v2 = 0. If a2 ≠ 0, then v2 = −(a1/a2) v1, which contradicts the fact that v2 ∉ V1. Therefore, a2 = 0. This means a1 v1 = 0 and, consequently, a1 = 0 too.

By Exercise 45,

V2 = { Σ_{j=1}^2 aj vj : aj ∈ R }

is a subspace of V. If V2 = V, then again, we’re done. If not, we add another vector to our linearly independent subset of V.

Exercise 48 Show that if Vℓ = { Σ_{j=1}^ℓ aj vj : aj ∈ R } ≠ V and v1, . . . , vℓ are linearly independent, then there is a linearly independent set {v1, . . . , vℓ+1} ⊆ V.

Obviously this procedure must stop. When it does, we’ll have a linearly independent set {v1, . . . , vk} that spans V.  □

Remark 2 Our original definition of a basis v1, . . . , vk required that the vectors span the space and provide unique representation, i.e., if Σ cj vj = Σ dj vj then cj = dj for j = 1, . . . , k. Now we have the concept of linear independence (from Lecture 29). These notions are related as follows.

Exercise 49 Show that any basis is a set of linearly independent vectors. Conversely, show that any spanning set of linearly independent vectors is a basis.


Now that we’ve gotten that out of the way, it’s time for a

Confession: To be strictly rigorous, we need to know why the procedure in the proof of Theorem 1 can’t go on forever and produce an infinite set {v1, v2, . . . } of linearly independent vectors in V.

This is related to Fact 1 in Lecture 19 that every basis for R^n has n vectors. The following technical lemma will help patch things up.

Lemma 1 If V is a vector space that is spanned by m vectors w1, . . . , wm, then every set of more than m vectors in V is linearly dependent.

Proof. It is enough to show that any m + 1 vectors must be linearly dependent. We will use induction on m.

Say m = 1. Assume we have two (linearly independent) vectors v1 and v2. Since m = 1,

v1 = a11 w1
v2 = a21 w1.

It is easily checked that a11 v2 − a21 v1 = 0. This contradicts the linear independence of v1 and v2 unless a11 = a21 = 0, and in that case v1 = v2 = 0, which is also impossible. This settles the case m = 1.

Now say Lemma 1 holds for m = 1, . . . , k.

What if k + 1 vectors span V and we have k + 2 (linearly independent) vectors v1, . . . , vk+2 in V? Let’s say V is the span of w1, . . . , wk+1. A nice notation for this is V = ⟨w1, . . . , wk+1⟩. Since these wj span, we can write

v1 = Σ_{j=1}^{k+1} a1j wj
v2 = Σ_{j=1}^{k+1} a2j wj
...
vk+2 = Σ_{j=1}^{k+1} ak+2,j wj.

At least one of the aij is non-zero. We can assume (by renumbering the vi and the wj if necessary) that a11 ≠ 0. This means that v1/a11 = w1 + Σ_{j=2}^{k+1} (a1j/a11) wj.


Using this, we can effectively eliminate the w1 term from the expressions for v2, . . . , vk+2. To be precise,

v2 − (a21/a11) v1 = Σ_{j=2}^{k+1} b2j wj
v3 − (a31/a11) v1 = Σ_{j=2}^{k+1} b3j wj
...
vk+2 − (ak+2,1/a11) v1 = Σ_{j=2}^{k+1} bk+2,j wj

for some coefficients bij. Therefore, these vectors are in ⟨w2, . . . , wk+1⟩ and, by the lemma in the case m = k, we know they are linearly dependent. That is, there are numbers μ2, . . . , μk+2, not all zero, for which

μ2 v2 + μ3 v3 + · · · + μk+2 vk+2 − Σ_{j=2}^{k+2} μj (aj1/a11) v1 = 0.

Since this is a linear combination of v1, . . . , vk+2 with some non-zero coefficient, we have that v1, . . . , vk+2 are linearly dependent.

This proves the lemma.  □

Looking back at the proof of Theorem 1 now gives

Corollary 1 The basis of any subspace of R^n must have k ≤ n elements.

Exercise 50 Show that any two bases of a subspace of R^n must have the same number of elements.

From Exercise 50 we get, in particular, that Fact 1 is true.

Corollary 2 Any basis of R^n must have n elements.

Exercise 50 also allows us to make the following important definition.

Definition 5 (dimension) The dimension of a subspace of R^n is the number of elements in a basis for that subspace.


We now ask: If V is a subspace in R^n of dimension k and L : R^n → R^m, what dimension can L(V) have?

Exercise 51 Show that dim L(V) ≤ min{k, m}. Give examples (of L) to show that any dimension ℓ ≤ min{k, m} is possible.

Lecture 32 Kernels

We now pursue further the question “if L : V → W, what does the image of L look like?” Here we have in mind that V is a subspace of R^n and W is a subspace of R^m. If this seems too confusing, just think about the case V = R^n and W = R^m—there’s really not much difference.

We already know an important fact about the image L(V). It’s a subspace. We also know that dim L(V) ≤ min{dim W, dim V}. In this lecture we expand our understanding of dim L(V).

Hopefully, it has not escaped your notice that some domain vectors can be mapped by a linear transformation to 0 ∈ R^m. These vectors form a subspace of the domain V.

Definition/Proposition 1 The kernel or null space of a linear transformation L : V → W is the subspace defined by

K = K(L) = {v ∈ V : L(v) = 0}.

Exercise 52 Show that K(L) is a subspace.

Exercise 53 Find K = K(L) if L(x, y) = 3x. (Here we are just taking L : R^2 → R^1, though one could consider L on some subspace of R^2, like V = {c(1, 1) : c ∈ R} or the y-axis {c(0, 1) : c ∈ R}.)

The most important thing about the kernel is that its dimension tells us exactly what the dimension of the image L(V) is. This works as follows. If you look at all the vectors in V, there are some (in the kernel) that get collapsed to 0 by L and others that play some role in spanning the image L(V). Dimension-wise, this is a complete description of what L does to the vectors in V, that is,

dim V = dim K + dim L(V).

This is called the rank-nullity theorem.
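The rank-nullity identity can be verified numerically for a concrete matrix by computing the rank with Gaussian elimination over the rationals. The matrix A below is a made-up example whose third column is the sum of the first two, so its kernel is one-dimensional:

```python
from fractions import Fraction as F

def rank(A):
    # rank via Gaussian elimination with exact rational arithmetic
    M = [[F(x) for x in row] for row in A]
    r, rows, cols = 0, len(M), len(M[0])
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c] / M[r][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# L : R^3 -> R^3 with a one-dimensional kernel (third column = first + second)
A = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 2]]
n = 3
nullity = n - rank(A)  # rank-nullity: dim V = rank + nullity
```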


Definition 6 The rank and nullity of a transformation L are numbers:

rank(L) = dim L(V),
nullity(L) = dim K(L).

Exercise 54 Verify the rank-nullity theorem for each of the cases described in Exercise 53.

Exercise 55 If {v1, v2} is a basis for V and L : V → V is a linear transformation with L(v1) = v1 + v2 and L(v2) = −v1 − v2, find the rank and nullity of L.

Exercise 56 Let V be an n-dimensional vector space and let L : V → R^1 be a nonzero linear transformation. Let a be a real number. Show that {v ∈ V : L(v) = a} is a hyperplane in V (i.e., an affine space of dimension n − 1).

Exercise 57 Let L : R^n → R^n. Show that rank(L) is the number of linearly independent columns in the matrix corresponding to L.

Lecture 33 Consequences of the Rank-Nullity Theorem; Systems of Linear Equations

We begin with an example. In high school one learns how to solve a linear system of equations like

a11 x1 + a12 x2 = b1
a21 x1 + a22 x2 = b2.    (16)

(One solves for x1 and x2 in terms of the a’s and b’s.) One learns in high school (or 8th grade maybe) that there are three cases:

non-parallel lines (exactly one solution)
two parallel lines (no solution)
the same parallel line—twice (lots of solutions)


We want to view this system from a new perspective and use what we know about kernels and images. First of all, we recognize the system as a single matrix-vector equation for the vector (x1, x2):

( a11 a12 ) ( x1 )   ( b1 )
( a21 a22 ) ( x2 ) = ( b2 ).    (17)

We can make this look simpler if we write X = (x1, x2), b = (b1, b2), and let A be the matrix of the a’s:

AX = b.    (18)

Finally, we remember that L(X) = AX is the formula for a linear transformation L : R^2 → R^2, so the equation becomes

L(X) = b,    (19)

and we are asking the simple questions: (1) is the vector b in the image of L, and (2) if so, which vectors X map to b?

From our new point of view, we see a different collection of cases—some we didn’t even think about before. We know that L(R^2) is a subspace of two or fewer dimensions.

CASE 1 dim L(R^2) = 0. This means that K(L) = R^2, i.e., everything maps to 0.

Exercise 58 Show that this happens only if all the a’s are 0, i.e., A is the zero matrix.

In this case, our system becomes

0 = b1,
0 = b2,

and it doesn’t involve x’s at all. Either these two equations are true or they’re not true. If the vector b = 0, then all vectors X ∈ R^2 are solutions. If b ≠ 0, there is no solution.

CASE 2 dim L(R^2) = 1. In this case, the image is a one-dimensional subspace (a line through the origin), and the kernel has one dimension as well; K = {cv : c ∈ R}, where v is some non-zero vector in the domain with L(v) = 0.


The vector b may either fall on the image line or not. Thus, both cases with parallel lines are grouped together here.

[Figure: b off the image line L(R^2) (two parallel lines) and b on the image line L(R^2) (the same parallel line—twice)]

In order to see that there are infinitely many solutions in the second case, we make the following key observation.

Proposition 1 If X0 is a solution of AX = b and Y ∈ K, then X0 + Y is also a solution. Conversely, every solution X is of the form X0 + Y for some Y ∈ K.

Proof. To see the first assertion, simply note that

L(X0 + Y) = L(X0) + 0 = b.

For the second assertion, let Y = X − X0. Then

L(X − X0) = L(X) − L(X0) = b − b = 0.

Thus, Y = X − X0 ∈ K.  □

We see now exactly what the solution set is:

{X = X0 + cv : c ∈ R}.

This is a line (an affine hyperplane) in R^2.

[Figure: the kernel K and the solution line through X0 in the domain, mapped by L onto the point b in the image line L(R^2)]
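Proposition 1 can be illustrated with a singular 2 × 2 system. The map L, the particular solution X0, and the kernel direction v below are made-up example data (not from the text):

```python
# The system x + 2y = 3 (written twice) has a one-dimensional kernel
# spanned by v = (-2, 1) and particular solution X0 = (3, 0).

def L(X):
    # the singular linear map corresponding to rows (1, 2) and (1, 2)
    x, y = X
    return (x + 2 * y, x + 2 * y)

X0, v, b = (3, 0), (-2, 1), (3, 3)

# every X0 + c*v solves L(X) = b, as the proposition asserts
solutions = [tuple(x0 + c * vi for x0, vi in zip(X0, v)) for c in range(-2, 3)]
```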


Remark 3 The old high school picture has the advantage that it nicely distinguishes between having a solution and not having a solution. These are both crammed into case 2 in our new point of view. Notice, however, that a tiny change in the a’s or b’s in case 2 can change having a solution into not having one. In this sense, the property of having a solution in case 2 is unstable and is not much different from having no solution. As we will see, the property of having a solution is stable when the mapping has full rank, and the perspective of linear mappings does a nice job of pointing this out. Furthermore, this new point of view is more versatile in higher dimensions—that’s why the emphasis is on 2 × 2 systems in high school.

CASE 3 L(R^2) = R^2; K(L) = {0}. In this case, not only is there a solution, but every b in (the target) R^2 is covered exactly once by the image of L. To see this, note that if AX1 = AX2 = b, then L(X1 − X2) = 0, so X1 − X2 ∈ K = {0}. So X1 = X2. This nice situation is duplicated in general and can be nicely characterized, as we shall now see.

Theorem 2 If L : V → V, then the following are equivalent:

(1) L has an inverse (i.e., L is invertible).
(2) L is onto (i.e., L(V) = V).
(3) L is one-to-one (i.e., L(X) = L(Y) ⇒ X = Y).

Proof. (1) ⇒ (2). Notice that when we say L has “an inverse,” we mean a map L⁻¹ : V → V that “undoes” L. Thus, (2) and (3) really follow immediately from (1). Nevertheless, we give a formal proof in this case. Let w ∈ V. Set v = L⁻¹(w). Then L(v) = L(L⁻¹(w)) = w, so L is onto.

(2) ⇒ (3). If L(V) = V, then by the rank-nullity theorem, dim K(L) = 0, i.e., there is exactly one element, namely 0, that maps to 0. Now assume L(X) = L(Y). Then L(X − Y) = 0. So X − Y = 0. So X = Y. So L is one-to-one.

(3) ⇒ (1). If L is one-to-one, then L is invertible on its image. The only thing to show is that L(V) = V. Note that dim K(L) = 0 (since the only thing that maps to 0 is 0). Therefore, by rank-nullity, dim L(V) = dim V − dim K = dim V. So L(V) is a subspace of V with the same dimension. It follows that L(V) = V.  □

Exercise 59 Show that if V is a subspace of W and dim V = dim W, then V = W.


Corollary 3 If L : R^n → R^n is one-to-one, then L(X) = b has a unique solution for every b ∈ R^n.

Looking back at equations (17), (18), (19), the student should begin to suspect that the invertibility of L should have something to do with the matrix inverse of A and the linear independence of the columns L(e1), . . . , L(en) of A. If not, here are some exercises to make you suspect such connections. In each case L : R^n → R^n is linear.

Exercise 60 Show that L(e1), . . . , L(en) are the columns of A, i.e., L(ej) = (a1j, . . . , anj) where A = (aij) is the matrix corresponding to L.

Exercise 61 Show that dim L(R^n) (the rank of L) is the number of linearly independent columns in the matrix of L.

Exercise 62 Show that if the columns of A are linearly independent, then L(X) = b has a unique solution for every b ∈ R^n. (Hint: Use Theorem 2.)

Before giving an expanded discussion of determinants in the next lecture (and a derivation of Fact 2 from Lecture 19), we describe what happens in general if L is not onto. Let us assume that K(L) = ⟨v1, . . . , vk⟩ has dimension k > 0. We see from Proposition 1 that the solution set of L(X) = b is either empty or has the form

{X0 + Σ cj vj : cj ∈ R},

where X0 is any particular solution of the equation. Such a set is called a k-dimensional affine subspace. This terminology is misleading since the set itself is not a subspace unless X0 = 0. But it is certainly an affine translation of the kernel (which is a subspace).

In all of the cases just described, one can change the a’s or the b’s by an arbitrarily small amount and get a system of linear equations with no solution. This is the property that really makes the case K = {0} far superior to the others.

Furthermore, it can be argued that the case K = {0} is the most likely or most common.


“n equations in n unknowns”

Any student of science in a university will hear many times “this is a system of n equations in n unknowns, so we know it defines a unique solution.” We have seen that this statement is not, strictly speaking, correct even for systems of linear equations—there are sometimes infinitely many solutions, or no solutions. Nevertheless, we are now in a position to understand why this is a common refrain.

If the n equations happen to be linear equations (and if they’re not, presumably they can be approximated by linear equations), then having a unique solution is equivalent to having the columns of a matrix A be linearly independent. Now if you just happen to pick n column vectors in R^n, are they more likely to be linearly independent or dependent? It turns out they are more likely to be linearly independent.

Think about picking two vectors in R^2. The first choice gives you a single linearly independent set {v1}, unless you have been so unfortunate that you picked v1 = 0. Now is it more likely to pick a vector v2 in ⟨v1⟩ or R^2 \ ⟨v1⟩? Certainly in terms of area it’s more likely to get v2 ∈ R^2 \ ⟨v1⟩, since ⟨v1⟩ is a line with no area and R^2 \ ⟨v1⟩ has infinite area. The same kind of reasoning applies to picking n vectors in R^n.

Lecture 33a Proof of the Rank-Nullity Theorem

Theorem 3 If L : V → W is linear, then

dim V = dim L(V ) + dim K(L).

Proof. Let {v1, . . . , vk} be a basis for K(L).

Lemma 2 There is a basis for V with v1, . . . , vk in it. More generally, any basis of a subspace can be "completed" to a basis for the entire ambient space.

Exercise 63 Prove Lemma 2.

Let {v1, . . . , vk, . . . , vn} be a basis for V . Then note that dim V = n. It is enough to show that {L(vk+1), . . . , L(vn)} is a basis for L(V ).


The vectors L(vk+1), . . . , L(vn) clearly span L(V ) since 〈L(v1), . . . , L(vn)〉 = L(V ) and L(v1), . . . , L(vk) are all zero. To be explicit, any w ∈ L(V ) has w = L(v) for some v, and v = ∑_{j=1}^n aj vj. Therefore,

w = L(v) = ∑_{j=1}^n aj L(vj) = ∑_{j=k+1}^n aj L(vj).

Thus, 〈L(vk+1), . . . , L(vn)〉 = L(V ).

We need to show linear independence. Assume ∑_{j=k+1}^n aj L(vj) = 0. Since ∑ aj L(vj) = L(∑ aj vj), it follows that ∑_{j=k+1}^n aj vj ∈ K(L). Therefore,

∑_{j=k+1}^n aj vj = ∑_{j=1}^k aj vj

for some a1, . . . , ak. Therefore, ∑_{j=1}^k (−aj)vj + ∑_{j=k+1}^n aj vj = 0. Since v1, . . . , vn is a basis, we must have a1 = · · · = an = 0, and ak+1 = · · · = an = 0 in particular.

This proves that {L(vk+1), . . . , L(vn)} is a basis for L(V ), and dim L(V ) + dim K(L) = (n − k) + k = n = dim V . □

Lectures 34–35 Determinants and Invertibility

We have discussed how to compute determinants and the fact (Fact 2, Lecture 19) that a determinant is non-zero exactly when the columns of the matrix are linearly independent. This fact is complicated to prove directly from the definition of the determinant. In fact, the real problem is that we haven't given a solid definition of determinants, but only a procedure for computing them. We said that you take any row (or column) and use the entries (with appropriate signs) as coefficients for a linear combination of certain cofactor determinants of smaller size. Each cofactor matrix is obtained by deleting the row and column of the coefficient element. So

| 2  1 −1 |
| 0  3  1 | = 2 | 3 1 | − 0 | 1 −1 | − 1 | 1 −1 |
|−1  5  1 |     | 5 1 |     | 5  1 |     | 3  1 |


where we have expanded along the first column. We could equally well have expanded along the second row to get

| 2  1 −1 |
| 0  3  1 | = −0 | 1 −1 | + 3 | 2 −1 | − 1 | 2 1 |
|−1  5  1 |      | 5  1 |     |−1  1 |     |−1 5 |.

You can check that these are the same number. In fact, this "recipe" always works, but it's complicated to prove. We will leave the justification to a more in-depth course on linear algebra.† For now, we will be content to assume the definition makes sense and that we understand it. For intuition, it will also be worthwhile to memorize a couple of other facts. Throughout this lecture, let A be an n × n matrix (aij) with columns v1, . . . , vn and corresponding linear transformation L. Denote the determinant of A by det A. After the recipe/definition, the most important fact everyone should know about determinants is the following.

Fact 3 The n-dimensional volume of the parallelepiped in R^n spanned by v1, . . . , vn is given by | det A|.

Exercise 64 Prove Fact 3 when n = 3 or when vj = ej, j = 1, . . . , n.
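The cofactor recipe above can be checked mechanically. The following is a small pure-Python sketch (not part of the original notes; the helper names are ours) that computes a determinant by cofactor expansion along any chosen row and confirms that all three rows give the same value for the 3 × 3 example above.

```python
def minor(A, i, j):
    """Matrix A with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A, row=0):
    """Cofactor expansion of det A along the given row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    # Signs alternate according to (-1)^(row + j), the sign diagram.
    return sum((-1) ** (row + j) * A[row][j] * det(minor(A, row, j))
               for j in range(n))

A = [[2, 1, -1],
     [0, 3, 1],
     [-1, 5, 1]]

# Expanding along any of the three rows gives the same number.
print([det(A, r) for r in range(3)])  # → [-8, -8, -8]
```

The recursion simply applies the recipe to each smaller cofactor matrix until it reaches a 1 × 1 determinant.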

Fact 4 The sign of det A gives the orientation of v1, . . . , vn. If det A > 0, then v1, . . . , vn is called a positively oriented basis.

Exercise 65 Show that a positively oriented basis is the same as a right-handed basis when n = 3.

We now proceed to accumulate other important facts about determinants.

Theorem 4 (i) If two column vectors in A are equal, then det A = 0.

(ii) If Ā is obtained from A by replacing the ith column vi by vi + vj for any j ≠ i, then

det Ā = det A.

†The interested student is encouraged to consult the book Linear Algebra, An Introductory Approach by Charles Curtis.


(iii) If Ā is obtained from A by replacing the ith column vi by αvi, then

det Ā = α det A.

(iv) If Ā is obtained from A by interchanging two columns, then

det Ā = − det A.

Proof. (i) By induction. If n = 2 and the two columns are equal, then

A = | a11 a11 |
    | a21 a21 |

and det A = a11a21 − a11a21 = 0.

Assume inductively that the result holds for all k × k matrices, and look at a (k + 1) × (k + 1) matrix in which the same column (a1j , . . . , ak+1 j) appears twice. Expanding along any other column, it is easy to see that all the k × k cofactor matrices will have two columns that match. Thus, all the k × k cofactors have zero determinant by the inductive hypothesis. Therefore, the big matrix has zero determinant too.

(ii) This follows from (i) since det Ā = det A + det A0, where A0 is a matrix with the column vj appearing twice.

(iii) This also follows easily from the definition by expanding along the αvi column. □

Exercise 66 Prove statement (iv) by induction.

Perhaps more important than the proof are the heuristic arguments that correspond to Fact 3:

(i) The parallelepiped spanned by n − 1 vectors cannot enclose any n-volume in R^n.


(ii) Adding one column to another column that is already there only skews the parallelepiped; the volume remains the same.

(iii) Scaling one dimension scales the volume by the same factor.

(iv) Switching the order of two columns changes the orientation. 2

We now generalize certain aspects of Theorem 4.

Theorem 5 (v) The mapping v ↦ det Ā, where Ā is obtained by replacing the ith column of A by v, is linear.

(vi) If Ā is obtained by replacing the ith column vi by vi + ∑_{j≠i} αj vj, then

det Ā = det A.

Exercise 67 Prove Theorem 5. (Hint: Prove (v) directly from the recipe/definition; then use (v), (ii), and (iii) to prove (vi).)

Theorem 5 part (vi) makes it clear why one direction of Fact 2 holds.

Derivation of Fact 2. If the columns of A are linearly dependent, then one of the columns, say vi, can be expressed in terms of the others:

vi = ∑_{j≠i} αj vj.


On the other hand, it follows from (vi) that

det A = det Ā,

where Ā is obtained from A by replacing vi with vi − ∑_{j≠i} αj vj = 0. Clearly, det Ā = 0.

We have shown that if det A ≠ 0, then v1, . . . , vn are linearly independent. In order to obtain the reverse direction, we will need what is probably the single most useful fact about determinants. (This is also complicated to prove, so we will not prove it.)

Fact 5 (The Product Formula for Determinants) If A and B are n × n matrices, then

det(AB) = det A det B.

Now we can see the other direction. If the columns of A are linearly independent, then we can express the standard basis vectors e1, . . . , en as linear combinations of v1, . . . , vn. We write

ei = ∑_j βij vj.

We recognize the sum on the right as the ith column of AB, where B is the transposed matrix with βji in the ij-th slot. Thus,

1 = det(AB) = det A det B.

There is therefore no way that det A can be zero. □

Exercise 68 Verify Fact 5 when n = 2 or 3.
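A quick numerical spot check of the product formula, with two arbitrarily chosen 2 × 2 matrices (a sketch, not a proof; helper names are ours):

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[2, 1], [-1, 3]]
B = [[0, 4], [5, -2]]

# det(AB) and det(A)det(B) should agree.
print(det2(matmul(A, B)), det2(A) * det2(B))  # → -140 -140
```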

We end this lecture with one more fact that should be memorized and follows easily from our definition of determinants.

Fact 6 If A^T is the transposed matrix obtained from A by exchanging the rows and columns, then

det A^T = det A.

Exercise 69 Prove Fact 6 from the recipe/definition of determinants.

For a rigorous treatment of the definition of determinants, from which our other unproved facts (namely Fact 5 and Fact 6) follow fairly easily, the interested student is referred to Chapter 5 of Linear Algebra, An Introductory Approach by Charles Curtis.


Two last comments on determinants and matrices

1. Testing the linear independence of n vectors v1, . . . , vn is equivalent to showing that the system of linear equations (for the coefficients)

∑ aj v1j = 0
∑ aj v2j = 0
...
∑ aj vnj = 0

has a unique solution, namely a1 = · · · = an = 0. Our discussion shows that v1, . . . , vn will be linearly independent exactly when the matrix with v1, . . . , vn as columns has nonzero determinant.

2. When det A ≠ 0, the corresponding linear transformation L has an inverse. Associated to L^{−1} is a matrix we call A^{−1}. This inverse matrix has the properties AA^{−1} = I = A^{−1}A (matrix multiplication) and is given by the formula

A^{−1} = (1/det A) (A^cof)^T.

(A^cof is the matrix obtained from A by replacing the ij-th entry with the appropriate sign times the ij-th cofactor determinant. The "appropriate sign" is the one from the sign diagram; see the Mini-Lecture on Determinants.)
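The cofactor formula for A^{−1} can be sketched in a few lines of Python (the helper names minor, det, inverse are ours, not from the notes); exact rational arithmetic with Fraction avoids rounding error.

```python
from fractions import Fraction

def minor(A, i, j):
    """Matrix A with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def inverse(A):
    """A^{-1} = (1/det A) (A^cof)^T, the formula from the notes."""
    n, d = len(A), Fraction(det(A))
    cof = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)]
           for i in range(n)]
    # Transpose the cofactor matrix and divide by the determinant.
    return [[cof[j][i] / d for j in range(n)] for i in range(n)]

A = [[2, 1, -1], [0, 3, 1], [-1, 5, 1]]
Ainv = inverse(A)
# Check that A * A^{-1} is the identity.
I = [[sum(A[i][k] * Ainv[k][j] for k in range(3)) for j in range(3)]
     for i in range(3)]
print(I)  # → the 3x3 identity (entries as Fractions)
```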

Lecture 36 Changing Bases

We have one final topic to cover in linear algebra. This is perhaps the first topic to which we have come that should be considered truly abstract. The student will be happy to know, however, that we return at this point to eigenvalues and eigenvectors.


Remember that some linear transformations are especially simple to understand. They are those corresponding to diagonal matrices

| λ1           0 |
|     λ2         |
|        . . .   |
| 0           λn |.

The transformation corresponding to such a matrix scales each unit vector ej by the factor λj (with perhaps a flip if λj < 0).

[Figure: the unit circle and its image under the map

L(X) = | 2  0  | X,
       | 0 1/2 |

an ellipse stretched horizontally and compressed vertically.]

Furthermore, we've noted the possibility that a transformation scales like this—but in some other directions rather than along the standard basis vectors:

[Figure: the unit circle and its skewed elliptical image under a map that scales along two non-standard directions.]

This is more complicated, but the technique of changing bases allows us to think of the two situations as the same. In effect, we will "straighten out" the skewness of the mapping L. We return to an example of Lecture 17:

L(x, y) = (3x − 2y, x).


We first solve for the eigenvalues and eigenvectors:

{ 3x − 2y = λx          { (3 − λ)x − 2y = 0
{ x = λy        i.e.,   { x − λy = 0.

On the face of it, we have two nonlinear equations for three unknowns x, y, and λ. Using our theory, however, we know that the linear system for x and y will have nonzero solutions∗ exactly when the determinant

| 3 − λ  −2 |
|   1    −λ | = 0.

This gives us a single (polynomial) equation for the single variable λ! That equation is

λ² − 3λ + 2 = 0.

The solutions (eigenvalues) are λ1 = 1 and λ2 = 2. For each possibility, we can return to the original system and solve for x and y:

λ = 1: { 2x − 2y = 0        λ = 2: { x − 2y = 0
       { x − y = 0      or         { x − 2y = 0.

In both cases, the two equations are essentially the same. This is what we should expect—why? These systems of equations are describing the one-dimensional kernel of some linear map.

The first map is L(x, y) = (2x − 2y, x − y), and its kernel is the line x = y. This is not just any line; it is a subspace—the subspace 〈(1, 1)〉.

The second map is L(x, y) = (x − 2y, x − 2y), and its kernel is 〈(2, 1)〉. Notice that (2, 1) is any nonzero solution, or equivalently any nonzero element of the kernel, or equivalently any nonzero point on the line.
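These computations are easy to check by machine. Here is a short pure-Python sketch (not part of the original notes) that solves the characteristic equation λ² − 3λ + 2 = 0 with the quadratic formula and verifies that (1, 1) and (2, 1) really are eigenvectors.

```python
# A = [[3, -2], [1, 0]]; det(A - t I) = (3 - t)(-t) + 2 = t^2 - 3t + 2.
a, b, c = 1, -3, 2
disc = (b * b - 4 * a * c) ** 0.5
eigenvalues = sorted([(-b - disc) / (2 * a), (-b + disc) / (2 * a)])
print(eigenvalues)  # → [1.0, 2.0]

def apply(A, v):
    """Matrix-vector product for a 2x2 matrix."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

A = [[3, -2], [1, 0]]
print(apply(A, [1, 1]))  # → [1, 1]  (eigenvector for lambda = 1)
print(apply(A, [2, 1]))  # → [4, 2]  (= 2 * (2, 1), eigenvector for lambda = 2)
```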

Exercise 70 Find the eigenvalues and eigenvectors associated to the following maps:

(1) L(x, y) = (4x + 5y,−x − 2y)

∗Remember that eigenvectors are nonzero by definition.


(2) L(x, y) = (−3x − 2y, 2x − 2y)

(3) L(x, y) = (x, x − 2y)

(4) L(x, y, z) = (y, z,−6x − 11y − 6z)

(5) L(x, y) = (x − 2y, x − y)

(6) L(x, y) = (y,−x).

OK, now comes the tricky part. We're going to make up a new vector space, and it's going to look like R^2. But its basis vectors are going to be the eigenvectors v1 = (1, 1) and v2 = (2, 1). Notice that this really does look like R^2. The vectors in this new space are of the form a1v1 + a2v2, so they are determined by two coefficient numbers (a1, a2). But this ordered pair is a bit suspicious because it does not denote a1e1 + a2e2 as we formerly agreed; it denotes a1v1 + a2v2. Nevertheless, in the abstract vector space 〈v1, v2〉, I am allowed to think of v1 and v2 as my designated standard basis vectors.

Now let's repeat this and get it straight. I have a linear transformation L : R^2 → R^2, where R^2 is the standard R^2 and the matrix for L is

| 3 −2 |
| 1  0 |.

I'm also going to talk about an abstract vector space of dimension 2, call it A, which looks just like R^2 as a set but in which points (a1, a2) correspond to vectors a1v1 + a2v2.

Notice in particular that (1, 0) corresponds to v1 and (0, 1) corresponds to v2. That's why we call v1 and v2 designated standard basis vectors. Now this is going to get really confusing—since we are writing (1, 0) to mean two different things: e1 ∈ R^2 and the vector corresponding to v1 in A. We need some way to keep track and tell us how to relabel vectors to get between R^2 and A:

R^2 —L→ R^2
 ↑            ↓
 A            A

If we're given (a1, a2) ∈ A, then this corresponds to a1v1 + a2v2 in R^2. That's easy. If we're given (x, y) ∈ R^2, however, it's not so obvious how to get the corresponding vector in A. Fortunately, linearity saves the day, for the map M : A → R^2 given by M(a1, a2) = a1v1 + a2v2 is a linear map. (Check it!)


Furthermore, if we have a basis of eigenvectors (as we do in this case), then M is invertible.

Exercise 71 What is the matrix B that corresponds to M?

Exercise 72 Find B−1.

Exercise 73 Find M−1 : R2 → A.

An invertible transformation M : R^2 → R^2 is called a change of basis. Its matrix B is called a change of basis matrix. These changes of basis allow us to define a new linear map L̄ : A → A by L̄ = M^{−1} ◦ L ◦ M.

R^2 —L→ R^2
M ↑        ↓ M^{−1}
 A —L̄→  A.

Since A is really just an abstract R^2, it makes perfect sense to write down a matrix Ā associated to L̄. To do this, we need to express the images of (1, 0) ∈ A and (0, 1) ∈ A as ordered pairs in A.

M(1, 0) = v1 ∈ R^2
L ◦ M(1, 0) = L(v1) = λ1v1 ∈ R^2
M^{−1} ◦ L ◦ M(1, 0) = M^{−1}(λ1v1) = λ1M^{−1}(v1) = λ1(1, 0) ∈ A.

This gives us the first column of Ā. Similarly, the second column of Ā is L̄(0, 1) = (0, λ2). So the matrix of L̄ is

| λ1  0 |   | 1 0 |
| 0  λ2 | = | 0 2 |.

This is much simpler than the original matrix A but encodes the crucial information that L stretches in two invariant directions.


[Figure: the unit circle and its image under L, showing the stretching along the two invariant directions v1 and v2.]

The diagonal matrix Ā is called the matrix of L with respect to the basis {v1, v2}. One also says that A can be diagonalized. Of course, Ā = B^{−1}AB.
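The identity Ā = B^{−1}AB can be verified directly. A sketch in pure Python, using exact Fraction arithmetic and the 2 × 2 inverse formula (helper names are ours; B has the eigenvectors v1 = (1, 1) and v2 = (2, 1) as its columns):

```python
from fractions import Fraction

B = [[Fraction(1), Fraction(2)],
     [Fraction(1), Fraction(1)]]
A = [[Fraction(3), Fraction(-2)],
     [Fraction(1), Fraction(0)]]

# 2x2 inverse: swap the diagonal, negate the off-diagonal, divide by det B.
d = B[0][0] * B[1][1] - B[0][1] * B[1][0]
Binv = [[B[1][1] / d, -B[0][1] / d],
        [-B[1][0] / d, B[0][0] / d]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Abar = matmul(matmul(Binv, A), B)
print(Abar)  # → diag(1, 2), i.e. [[1, 0], [0, 2]] as Fractions
```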

We have essentially shown:

Theorem 6 If L admits a basis of eigenvectors, then after a change of basis L has the form

L̄(x1, . . . , xn) = (λ1x1, . . . , λnxn),

that is, L̄ corresponds to the diagonal matrix with λ1, . . . , λn on the diagonal.

Furthermore, if there is a basis of eigenvectors, you should be able to findthem, their corresponding eigenvalues, and the appropriate change of basis.

Example 29 Diagonalize the matrix

A = | 1 1 1 1 |
    | 1 1 1 1 |
    | 1 1 1 1 |
    | 1 1 1 1 |.

Solution: We look for vectors X ≠ 0 with AX = λX, that is, (A − λI)X = 0.


Such vectors will exist if det(A − λI) = 0. Subtracting adjacent columns (which does not change the determinant, by Theorem 5(vi)),

det(A − λI) = det | 1−λ  1    1    1   |       | −λ   0    0    1   |
                  | 1    1−λ  1    1   | = det |  λ  −λ    0    1   |
                  | 1    1    1−λ  1   |       |  0   λ   −λ    1   |
                  | 1    1    1    1−λ |       |  0   0    λ   1−λ  |.

Expanding along the first column, and then expanding each 3 × 3 cofactor along its first column as well,

= −λ det | −λ   0   1  | − λ det | 0   0   1  |
         |  λ  −λ   1  |         | λ  −λ   1  |
         |  0   λ  1−λ |         | 0   λ  1−λ |

= −λ{ −λ(λ² − 2λ) + λ² } − λ·λ²

= λ⁴ − 4λ³ = λ³(λ − 4).

Therefore, λ1 = λ2 = λ3 = 0 and λ4 = 4.

For λ = 0, the system (A − λI)X = 0 reduces to the single equation

x + y + z + w = 0 (a hyperplane in R^4),

and we may take

v1 = (1, 0, 0, −1), v2 = (1, 0, −1, 0), v3 = (1, −1, 0, 0).

These are linearly independent, since av1 + bv2 + cv3 = (a + b + c, −c, −b, −a) vanishes only when a = b = c = 0.

For λ = 4, the equations x + y + z + w = 4x = 4y = 4z = 4w force x = y = z = w, so we may take

v4 = (1, 1, 1, 1).

Exercise 74 Show that if v1, . . . , vk are linearly independent eigenvectors corresponding to an eigenvalue λ, and vk+1 is an eigenvector corresponding to λk+1 ≠ λ, then {v1, . . . , vk+1} is linearly independent.


The change of basis matrix is

B = |  1  1  1  1 |
    |  0  0 −1  1 |
    |  0 −1  0  1 |
    | −1  0  0  1 |,

and the diagonal form is

Ā = | 0 0 0 0 |
    | 0 0 0 0 |
    | 0 0 0 0 |
    | 0 0 0 4 |.

Exercise 75 Compute B^{−1} and verify that B^{−1}AB = Ā.
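The eigenvectors claimed in Example 29 are easy to verify by matrix-vector multiplication; a short sketch (not part of the original notes):

```python
def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

A = [[1] * 4 for _ in range(4)]  # the all-ones matrix of Example 29

# Three independent eigenvectors for the eigenvalue 0 (x + y + z + w = 0):
for v in ([1, 0, 0, -1], [1, 0, -1, 0], [1, -1, 0, 0]):
    print(apply(A, v))  # → [0, 0, 0, 0] each time

# One eigenvector for the eigenvalue 4:
print(apply(A, [1, 1, 1, 1]))  # → [4, 4, 4, 4]
```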

Exercise 76 Diagonalize the matrix

A = |  1  0  0 |
    |  1  0 −2 |
    | −1  1 −3 |.

Unfortunately, not all matrices are diagonalizable. Equivalently, not all linear transformations admit a basis of eigenvectors. A more complete course on linear algebra takes up the alternatives that are available for simplifying and analyzing the action of a linear map L when this happens. We end our discussion with two general conditions under which there is a basis of eigenvectors. The second one is important.

Theorem 7 If L : R^n → R^n has n real and distinct eigenvalues, then there is a basis of eigenvectors.

Theorem 8 If L : R^n → R^n corresponds to a symmetric matrix (A^T = A), then not only is there a basis of eigenvectors, but there is an orthonormal basis of eigenvectors. This means that the change of basis is essentially a rigid motion.

Exercise 77 Find an orthonormal basis of eigenvectors for the linear map in Example 29.


Lecture 37 Applications

1. Ordinary Differential Equations

One of the most fundamental equations in mathematics is

x′ = x. (20)

This is an equation for a function x = x(t), and the solution is x(t) = ke^t, where k is an arbitrary constant. (Check that it works.) In applications, one often finds systems of ordinary differential equations that look like

| x′ |   | 3 −2 | | x |
| y′ | = | 1  0 | | y |.

(Of course the matrix could be bigger and have different entries; X′ = AX.) Writing this out, we have

{ x′ = 3x − 2y
{ y′ = x.

Notice that the equations are coupled, so it may be very complicated to find a solution. If the matrix A admits eigenvectors, however, we can use what we know about (20) to find solutions.

Theorem 9 If λ and v are an eigenvalue and corresponding eigenvector for A, then

X(t) = ke^{λt} v

is a solution.

Exercise 78 Prove Theorem 9.

In fact, if you can find a basis of eigenvectors, then you can find all the solutions.

Theorem 10 If v1, . . . , vn are a basis of eigenvectors for the matrix A, then every solution X = X(t) of the system of ODEs

X′ = AX

is given by X(t) = ∑ kj e^{λj t} vj for some choice of constants k1, . . . , kn.


Exercise 79 Solve the initial value problem X′ = AX, X(0) = e1 if

(i) A = | 2 0 |
        | 0 3 |,

(ii) A = | 2  3 |
         | 0 −4 |,

(iii) A = | 1/√2  −1/√2 |
          | 1/√2   1/√2 |.

Of course, things are more complicated if there fails to be a basis of eigenvectors. You can learn about these cases in a course on differential equations.

Before leaving this application, we point out that solving a system X′ = AX is, in effect, integrating a vector field. The mapping L(X) = AX appearing on the right of the ODE X′ = AX assigns a vector to each point in R^n. One is asked to find a curve X : R^1 → R^n whose tangent vector matches the prescribed vector field.

[Figure: the vector field X ↦ AX in the plane, with integral curves following the arrows.]

II. Differentiating Transformations; Approximation

The derivative of a function f : R^n → R^m is the m × n matrix of partial derivatives of f:

Df = | ∂f1/∂x1 · · · ∂f1/∂xn |
     |   ...           ...   |
     | ∂fm/∂x1 · · · ∂fm/∂xn |.


When evaluated at a point X0, the derivative Df(X0) corresponds to a linear transformation L : R^n → R^m given by

L(X) = Df(X0)X.

Example 30 Let f(x, y) = (x², xy, x³). Then the rows of Df are (2x, 0), (y, x), and (3x², 0), so

Df(1, 1) = | 2 0 |
           | 1 1 |
           | 3 0 |

and

L(ξ, η) = (2ξ, ξ + η, 3ξ).

For X near X0, one has the approximation formula

f(X) ∼ f(X0) + L(X − X0).

This means

lim_{X→X0} ‖f(X) − (f(X0) + L(X − X0))‖ / ‖X − X0‖ = 0.

In the example,

(x², xy, x³) ∼ (1, 1, 1) + (2(x − 1), (x − 1) + (y − 1), 3(x − 1)),

so

((1.1)², (1.1)(.9), (1.1)³) ∼ (1, 1, 1) + (2(.1), .1 − .1, 3(.1)) = (1.2, 1, 1.3).

The actual value is

((1.1)², (1.1)(.9), (1.1)³) = (1.21, .99, 1.331).

Example 31


Lecture 38 From Approximating Data to Approximating Solutions

Scientists and engineers often collect data. For example, measurements of the output of an experiment u1, u2, . . . , uℓ may be made at times t1, t2, . . . , tℓ. The graphs of such data often resemble one of the pictures in Figure 1.

Figure 1: Data from an experiment

Some general observations may be made about such graphs. One might say that the first collection of data looks superlinear and the second sublinear. Usually, however, it is desirable to make more precise statements concerning, at least, the supposed growth of a curve passing through (or near) the data points and representing some formula underlying the experiment.

Let us suppose that the underlying formula is a power law

f(t) = µt^α, (21)

which is an often-used hypothesis. Then if we graph the logarithms of the data, vj = log10 uj, against the logarithms of the times, sj = log10 tj, we have

vj = log10(µtj^α) = log10 µ + αsj. (22)

Thus, we expect the new data points to lie on a line.


Figure 2: The log data

Of course, owing to the facts that two points determine a line and that there is always some error in measurements, we do not expect all the points to lie precisely on any one line. But which is the best line that fits the data?

One answer to this question involves what is called the least squares fit. The least squares fit is the line determined by b and α with the property that it minimizes

∑_{j=1}^ℓ |αsj + b − vj|² (23)

among all possible choices of α and b. Once the best α and b are obtained, it is easy to work out the power law (α is the power and µ = 10^b).

This problem can be approached directly, as a calculus problem. Since α should minimize

φ(α) = ∑_{j=1}^ℓ |αsj + b − vj|², (24)

no matter what b is, we should have φ′(α) = 0, or

∑_{j=1}^ℓ sj(αsj + b − vj) = 0. (25)

Similarly, differentiating with respect to b, we find that we need

∑_{j=1}^ℓ (αsj + b − vj) = 0. (26)


Upon rewriting, we have

( ∑_{j=1}^ℓ sj² ) α + ( ∑_{j=1}^ℓ sj ) b = ∑_{j=1}^ℓ sjvj

( ∑_{j=1}^ℓ sj ) α + ℓb = ∑_{j=1}^ℓ vj, (27)

which is a system of two linear equations in two unknowns. In particular, the whole thing reduces to a fairly simple linear algebra problem.

To better see what's happening, however, and to develop a viewpoint which applies to many other complicated problems, we will now describe a different approach.

We will start with something that looks a little tricky. Define a linear map L : R^2 → R^ℓ by

L(α, b) = (s1α + b, s2α + b, . . . , sℓα + b). (28)

The quantity we want to minimize then becomes

‖L(α, b) − v‖². (29)

Notice that this is a non-negative quantity, and finding a vector (α, b) for which this quantity is zero is equivalent to solving the equation L(α, b) = v. We know it is unlikely that this equation can be solved; we are looking at ℓ equations in two unknowns, and solving the equation means finding a line passing through all the points (s1, v1), (s2, v2), . . . , (sℓ, vℓ), after all. These considerations motivate the following definition.

Definition 7 If L : R^n → R^m and x0 ∈ R^n minimizes ‖L(x) − v‖², then x0 is called a (least squares) approximate solution of L(x) = v.

Geometrically, the image of L is some subspace, and we are trying to find the point in the image closest to some given point; see Figure 3.


Figure 3: Image of L, with a given point v and the closest point L(x0) = v̄ in L(R^n)

Practically speaking, our approximation/minimization problem breaks down into two pieces.

Geometric Problem: Find the point v̄ ∈ Im(L) closest to v.

Algebraic Problem: Solve L(x0) = v̄.

Intuition suggests that the first problem has a unique solution, and this turns out to be the case. For the second problem, remember that L : R^2 → R^ℓ, so there's most likely no solution—but remember, we have adjusted things so that we know there is a solution of L(x0) = v̄. In such a case, it would be nice to have a formula for it. We will tackle each problem separately in the next lecture.


Data Plotted in Figure 1

 t     u        t    u
 .5    1.6      3    2.9
 .8    3.5     15    4.8
2.3   17.6     18    5.2
3.5   32.6     21    5.4
5.1   57.6

Log Data Plotted in Figure 2

  s     v        s     v
−.3    .2      .48   .46
−.1    .54    1.18   .68
 .36  1.25    1.26   .72
 .54  1.51    1.32   .72
 .71  1.76

Lecture 39 The Closest Point Problem Part 1: Gram-Schmidt Orthonormalization

The Problem: Given L : R^n → R^m (linear) and v ∈ R^m, find the vector v̄ ∈ L(R^n) closest to v.

Solution: Project v onto Im(L) = L(R^n).


Figure 4: Projecting v onto Im(L)

That is, we must find a vector v̄ ∈ Im(L) and a vector w normal to Im(L) such that v = v̄ + w.

This suggests some questions:

1. What does normal to Im(L) mean?

2. Do vectors such as v̄ and w exist?

3. If such vectors exist, are they unique?

4. Do they give us what we want, i.e., will v̄ be the vector in Im(L) closest to v?

Here are some answers:

1. Normal to Im(L) means w · p = 0 for every p ∈ Im(L). But note that, unlike hyperplanes, subspaces do not usually have a unique normal direction.


Figure 5: A line in R^3 has infinitely many different normal directions at each point

2. Yes, such vectors do exist.

Let {v1, . . . , vk} be an orthonormal basis for Im(L). The vector v̄ is then given by

v̄ = ∑_{j=1}^k (v · vj) vj.

This choice forces w = v − v̄, so we have v = v̄ + w. Since it is clear that v̄ ∈ Im(L), because {v1, . . . , vk} is a basis for Im(L), we only need to check that w is orthogonal to Im(L).

Let p ∈ Im(L) and find some additional vectors vk+1, . . . , vm so that {v1, . . . , vk, vk+1, . . . , vm} is an orthonormal basis for R^m. Then

v = ∑_{j=1}^m (v · vj) vj

and

p = ∑_{j=1}^k (p · vj) vj.


Using the expression above for v, we find

w · p = (v − v̄) · p
      = ( ∑_{j=1}^m (v · vj) vj − ∑_{j=1}^k (v · vj) vj ) · ∑_{j=1}^k (p · vj) vj
      = ( ∑_{j=k+1}^m (v · vj) vj ) · ( ∑_{j=1}^k (p · vj) vj )
      = 0.

The discussion above provides us with the desired vectors v̄ and w, but it raises two more important questions, which we will answer later.

2a. Can we find an orthonormal basis for Im(L) and, if so, how?

2b. Can we complete an orthonormal basis for Im(L) to an orthonormal basis for all of R^m and, if so, how?

Now, let us answer question 3.

3. Yes, they are unique. Here's why. Say

v = v̄ + w = v̄′ + w′

with v̄, v̄′ ∈ Im(L) and w, w′ ⊥ Im(L). Then v̄ − v̄′ + w − w′ = 0. This implies

0 = |v̄ − v̄′ + w − w′|²
  = |v̄ − v̄′|² + 2(v̄ − v̄′) · (w − w′) + |w − w′|²
  = |v̄ − v̄′|² + |w − w′|²

since both v̄ and v̄′ are orthogonal to both w and w′. Since the terms |v̄ − v̄′|² and |w − w′|² are nonnegative, we must have |v̄ − v̄′| = 0 = |w − w′|, i.e., v̄ = v̄′ and w = w′.


4. Yes, v̄ and w do the job. If p ∈ Im(L), then

|v − p|² = |v − v̄ + v̄ − p|²
         = |w + v̄ − p|²
         = |w|² + |v̄ − p|²   (since w ⊥ (v̄ − p) ∈ Im(L))
         = |v − v̄|² + |v̄ − p|²
         ≥ |v − v̄|²,

and equality holds only if p = v̄.

Now let us return to questions 2a and 2b.

How to find an orthonormal basis

Let w1, . . . , wk be any basis for L(R^n) = Im(L). Set

v1 = w1/|w1|.

(Then v1 is a unit vector.) Set

v2 = (w2 − (w2 · v1)v1) / |w2 − (w2 · v1)v1|.

Note that v2 is also clearly a unit vector. Furthermore, we have

v1 · v2 = (v1 · w2 − (w2 · v1)(v1 · v1)) / |w2 − (w2 · v1)v1| = 0.

Therefore, {v1, v2} is an orthonormal set. It is also easy to check that 〈v1, v2〉 = 〈w1, w2〉, i.e., the spans are the same. We can continue this procedure to exchange {w1, . . . , wk} for an orthonormal basis {v1, . . . , vk} of the same vector space 〈w1, . . . , wk〉. The next vector is

v3 = (w3 − (w3 · v1)v1 − (w3 · v2)v2) / |w3 − (w3 · v1)v1 − (w3 · v2)v2|.

You should check that this choice makes {v1, v2, v3} an orthonormal set with the same span as {w1, w2, w3}. For the jth step, we choose

vj = (wj − ∑_{i=1}^{j−1} (wj · vi)vi) / |wj − ∑_{i=1}^{j−1} (wj · vi)vi|.


This procedure is called Gramm-Schmidt Orthonormalization. This answersquestion 2a. Let us now consider question 2b.

Let {v1, . . . , vk} be an orthonormal basis (ONB) for Im(L). We ask the question: Is there any vector in Rm that is not in ⟨v1, . . . , vk⟩? If not, then we are done. If there is such a vector w, then {v1, . . . , vk, w} spans a new space. And you can check the assertion of the following.

Exercise 80 {v1, . . . , vk, w} is a basis.

Thus, using Gram-Schmidt, we obtain an ONB {v1, . . . , vk+1}, and we can repeat what we have done to complete the basis.
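This completion step can also be sketched in code (my own sketch, not from the notes: the candidates for w are simply the standard basis vectors e1, . . . , em, each tested by whether anything is left over after projecting it onto the current span):

```python
# Completing an ONB of a subspace of R^m to an ONB of all of R^m:
# try e1,...,em; whenever one is not in the current span, orthonormalize
# it against the vectors found so far and keep it.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def complete_onb(vs, m):
    vs = [list(v) for v in vs]
    for j in range(m):
        e = [1.0 if i == j else 0.0 for i in range(m)]
        # remainder of e after projecting onto the span of vs
        u = list(e)
        for v in vs:
            c = dot(e, v)
            u = [ui - c * vi for ui, vi in zip(u, v)]
        length = dot(u, u) ** 0.5
        if length > 1e-10:          # e was not in the span: keep it
            vs.append([ui / length for ui in u])
    return vs

# start from an ONB of a plane in R^3 and complete it
basis = complete_onb([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], 3)
assert len(basis) == 3
```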

Now, in theory, we are in pretty good shape to solve the closest point problem. There is, however, one practical question remaining.

How do we find a basis for Im(L) to start with?

Beyond that: Could we program a computer to solve the whole thing?

We will take up these questions in the next lectures.


Glossary of Notation

Rn “n-dimensional Euclidean space” (see Lecture 17)

e1, . . . , en “The standard basis vectors of Rn” (see Lecture 19)

î, ĵ, k̂ “The standard basis vectors in R2 or R3”

f : X → Y “f is a function from X to Y ”

(In this course X and Y will usually be subsets of some n-dimensional Euclidean space.)

x ↦ y “x maps to y” (There must be a function somewhere for this to happen.)

d(X, Y ) or dist(X, Y ) “The distance from X to Y ” (Lecture 18)

Notes on notation for vectors (1) There is no special notation for vectors. If we want to emphasize the properties of vectors as points, we might use capital letters like X, Y, and P or lower case letters like p, r, or ℓ (as we often do for parameterized curves), or we might use lower case letters like v, u, w, ej, . . . if we want to stress vector properties. We will avoid the use of some small letters like x, y, and t to denote vectors because we (very often) use them to denote scalars. Many books use the boldface versions of these to denote vectors: x = (x1, . . . , xn), but we will avoid that for the most part. There is also the possibility of putting an arrow over the top to emphasize that “this quantity is a vector”: ~x = (x1, . . . , xn). We won’t rely much on this, because it makes typesetting more difficult, but it is a very nice and useful practice for you to provide these little arrows in your handwritten notes, at least while you’re getting used to vectors.

(2) [column vectors -vs- row vectors] There is another typesetting consideration. For most purposes (matrix multiplication, for example), it is best to use column vectors:

x =

⎡ x1 ⎤
⎢ ⋮  ⎥
⎣ xn ⎦ .

This is very inconvenient for typesetting. Consider the sentences in the last paragraph where vectors are written as rows (x1, . . . , xn). There is a mathematically rigorous solution for this notational dilemma. One observes that


the operation of transposing (“flipping along the diagonal,” or exchanging row and column indices) changes column vectors into row vectors. Thus, we can write a column vector as (x1, . . . , xn)^T, where the superscript T represents the transpose. We will take a more casual approach and use column vectors (n × 1 matrices) and row vectors (1 × n matrices) interchangeably (unless we indicate otherwise, by using a T superscript, for example). Please be warned that the same casual approach will not apply to other matrices.

End Notes

Affine. The word affine means “like” (n.b., an “affinity” is a “liking”). The connotation seems to be that one likes what is like one’s self. (“Birds of a feather flock together.”) Thus, when you translate an object in Rn, you get a copy that is “like” the original.

Hyper-. The prefix hyper- comes from the Greek word ὑπέρ, which means over or past. It indicates (probably) that we are dealing with dimensions that are “beyond” our natural perception. Or perhaps, more vaguely, that the ambient space is one dimension more than (“over”) the object.


Appendix

Notes on Functions Math 1502

These notes give a more mature (i.e., abstract) approach to some things that should already be basically known by the students.

Let X and Y be any sets.

Definition 8 (function) A function from X to Y is a rule or correspondence which assigns to each x ∈ X a unique y ∈ Y. (X is the domain of the function, Y is the target, and we write f : X → Y.)

Example 32 Take X = {A, B, C, D} and Y = {1, 2, 3, 4}. Here are three different ways to specify functions f : X → Y.

function notation (y = f(x))

f1(A) = 1, f1(B) = 1, f1(C) = 1, f1(D) = 1.

The function f1 is an example of a constant function, and we could also write simply f1 ≡ 1. Here is a second function.

f2(A) = 1, f2(B) = 2, f2(C) = 3, f2(D) = 4.

specify by mapping symbol (↦)

f3 : A ↦ 2, B ↦ 2, C ↦ 3, D ↦ 1.

f4 : A ↦ 1, B ↦ 1, C ↦ 4, D ↦ 4.

specify by pairings

f5 : (A, 4), (B, 4), (C, 2), (D, 3).

Things to Notice:

1. The rule or correspondence must apply to each element of the domain.

A ↦ 1, C ↦ 3, D ↦ 1

does not specify a function on X = {A, B, C, D}, because it does not specify what happens to B.


2. Not all target values must get hit.

3. Target values may get hit more than once, n.b., the constant function.

4. Domain values cannot get sent to two different places.

(A, 2), (B, 1), (C, 3), (A, 3), (D, 2).

does not specify a function; where does A get sent?
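These rules are easy to check mechanically when a function is specified by pairings (a sketch of my own; the pair lists are the ones from the examples above): every element of the domain must appear exactly once as a first component.

```python
# A list of pairs specifies a function on X exactly when every element
# of X appears exactly once as a first component.

def is_function(pairs, domain):
    firsts = [x for x, _ in pairs]
    return all(firsts.count(x) == 1 for x in domain)

X = {"A", "B", "C", "D"}

f5 = [("A", 4), ("B", 4), ("C", 2), ("D", 3)]
assert is_function(f5, X)                      # a genuine function

bad1 = [("A", 1), ("C", 3), ("D", 1)]          # B is never sent anywhere
assert not is_function(bad1, X)

bad2 = [("A", 2), ("B", 1), ("C", 3), ("A", 3), ("D", 2)]
assert not is_function(bad2, X)                # A is sent to two places
```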

Activity With X = {A, B, C, D} and Y = {1, 2, 3, 4} as above, specify three different functions g1, g2, g3 : Y → X using function notation, mapping symbols, and ordered pairs. For each function, indicate the range.

1 The Graph of a Function

Definition 9 Given a function f : X → Y , the graph of f is the set

G = {(x, f(x)) ∈ X × Y : x ∈ X}.

Note 1 X × Y (read “X cross Y”) is the collection of ordered pairs with first component from X and second component from Y.
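On a finite domain, the graph is literally a finite set of pairs that a program can build (a sketch, using the constant function f1 from Example 32 as the example):

```python
# The graph of f : X -> Y is the set of ordered pairs (x, f(x)).

X = {"A", "B", "C", "D"}

def f1(x):
    return 1          # the constant function f1 from Example 32

graph = {(x, f1(x)) for x in X}
assert graph == {("A", 1), ("B", 1), ("C", 1), ("D", 1)}
```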

Specifying a function by pairings is just giving the graph.

Exercise Draw the graph of the function f5 specified above.
