ENG1091 - Lecture Notes 2011

MONASH UNIVERSITY - SCHOOL OF MATHEMATICAL SCIENCES

ENG1091 Vectors

Lecture 1 · vector arithmetic revision · dot product · cross product

Text Reference: §4.1 - 4.2

Vectors and Lines, a quick review.

Many quantities in nature are completely specified by one number (called the magnitude of the

quantity) and are usually referred to as scalar quantities. Some examples are temperature, time,

length, and mass.

However, certain quantities require both a magnitude and a direction to specify them. To

say that a boat sailed 10 kilometers (km) does not specify where it went. It is necessary to

give the direction too; perhaps it sailed 10 km northwest. We then describe the position of the

boat by giving its displacement relative to some point, a quantity that involves distance as

well as direction. Quantities that require both a magnitude and a direction to describe them are

called vectors. Other examples include velocity and force. Vector quantities will be denoted by

boldface type: u, v, w, and so on. In handwritten work vectors are denoted by v˜ or by v with an arrow over it. The vector that joins the two points A and B is denoted by $\overrightarrow{AB}$ or by AB in boldface.

A vector v can be represented geometrically as a directed line segment or arrow. The magnitude

of a vector v will be denoted by ‖v‖ and is sometimes referred to as the length of v because it is represented by the length of the arrow.

Two vectors v and w are equal (written v = w) if they have the same length and the same direction. Thus, for example, the two vectors in the diagram are equal even though their initial and terminal points are different!

[Diagram: equal vectors v and w.]

There is one vector that has no direction whatsoever: the zero vector 0.

Given a vector v, the vector that has the same length as v but the opposite direction is the negative of the vector v, denoted −v.

[Diagram: the vectors v and −v.]

When we multiply a vector by a scalar λ we multiply the length of the vector by |λ|, without changing its direction (unless the scalar is negative, in which case the direction is reversed). Two vectors are parallel if one is a scalar multiple of the other.

That is, if a = λb then a is parallel to b.

[Diagram: the vectors −½v, v and 2v.]

ENG1091 Mathematics for Engineering page 1

If u and v are two vectors we define their sum u + v by adding the vectors ‘head to tail’, which is to say we attach the tail of the second vector, v, to the head of the first, u. The sum u + v is then the vector drawn from the tail of the first vector to the head of the last.

This method also allows us to add several vectors at once.

[Diagram: vectors a, b and c placed head to tail, with their sum a + b + c.]

Should it happen that vectors add together forming a loop, so that the end point is the same as the initial point, then the vector sum is 0. Thus, for example, if A, B, C are any three points in space,

\[ \overrightarrow{AB} + \overrightarrow{BC} + \overrightarrow{CA} = \mathbf{0}. \]

We can also add two vectors u and v geometrically by drawing them from the same point and

completing a parallelogram with the two vectors as adjacent sides. The diagonal vector drawn

from the common tail to the common head point is then the vector u+ v.

From the parallelogram method of vector addition we see that u+ v = v + u.

The opposing diagonal, drawn towards v, is the vector v − u.

The unit coordinate vectors.

Vectors of length one unit are called unit vectors. The unit vectors parallel to the positive x and y axes in the plane are labelled i and j.

In three-dimensional space we add a further unit vector, k, parallel to the z axis.

Any vector r in space can be written as a combination of multiples of i, j and k. The coefficients of i, j and k are called its rectangular components.

r = (x, y, z) = xi + yj + zk

[Diagram: the x, y and z axes with the unit vectors i, j and k.]

The magnitudes of vectors given in component form.

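Using Pythagoras’ theorem it is an easy matter to find the lengths of vectors.

In three dimensions, where we have v = ai + bj + ck,

\[ \|\mathbf{v}\| = \|a\mathbf{i} + b\mathbf{j} + c\mathbf{k}\| = \sqrt{a^2 + b^2 + c^2}. \]

Example:

\[ \|\mathbf{i} - \mathbf{j} + \mathbf{k}\| = \sqrt{(1)^2 + (-1)^2 + (1)^2} = \sqrt{3}. \]

In two dimensions the length of v = ai + bj is given by

\[ \|\mathbf{v}\| = \|a\mathbf{i} + b\mathbf{j}\| = \sqrt{a^2 + b^2}. \]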
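The length formula can be checked numerically. The following is a minimal sketch in Python; the function name `norm` and the use of tuples for components are assumptions made for illustration only.

```python
import math

def norm(v):
    """Length of a vector given by its rectangular components."""
    return math.sqrt(sum(c * c for c in v))

# The worked example i - j + k, whose length should be sqrt(3).
print(norm((1, -1, 1)), math.sqrt(3))

# A two-dimensional vector 3i + 4j.
print(norm((3, 4)))
```

The same function handles two and three dimensions because it simply sums the squares of however many components it is given.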


The Scalar or “Dot” Product

In the previous section we saw how vectors can be added/subtracted together, and we saw how

to multiply them by scalars. The question naturally arises: is it possible to multiply two vectors

together?

There are two types of vector multiplication that are generally useful: the scalar or dot product, and the vector or cross product. Now for a word of warning. Many of the rules we take for granted in ordinary arithmetic don’t hold when it comes to vector multiplication. When we look at the vector cross product later this lecture we will see that a × b ≠ b × a. We will also see that there is no such thing as vector division: vectors don’t have reciprocals! Of course we don’t just multiply vectors for fun; we do it because it has useful applications.

First, consider the scalar product. One modern use of the scalar product is the projection of

a 3D image on a 2D screen and to do it in such a way as to convince the viewer that he/she is

looking at a 3D image.

Given two vectors a and b, we define their scalar or ‘dot’ product as

a · b = ‖a‖ ‖b‖ cos θ

where θ is the angle between the two vectors.

Note that a · b is a scalar quantity; it is not a vector.

Historically the reason that the scalar product was studied is that in physics the work done by

a force F in moving an object a displacement d is the dot product of force with displacement,

i.e. W = F · d.

From the definition we immediately get the following:

(i) a · a = ‖a‖² (because the angle between a vector a and itself is 0).

(ii) If a ⊥ b then a · b = 0.

The dot products of the unit vectors i, j and k.

Given the definition above we see that

i · j = j · k = k · i = 0

and i · i = j · j = k · k = 1

Properties of the Dot Product

(i) a · b = b · a (the dot product is commutative)

(ii) λa · b = a · λb = λ (a · b), for any scalar λ

(iii) a · (b + c) = a · b + a · c (the dot product is distributive)

Notice that the expression a · (b · c) has absolutely no meaning because it is attempting to form a dot product of the vector a with the scalar b · c.

The expression a (b · c) has a meaning, though it is better written as (b · c) a. The expression (b · c) a means to multiply the vector a by the scalar b · c, resulting in a vector having the same or opposite direction as a and of length |b · c| ‖a‖.


Notice how we can use the distributive law to simplify the dot product of two vectors given in

component form: Let a = a1i+ a2j+ a3k, and b = b1i+ b2j+ b3k then

a · b = (a1i+ a2j+ a3k) · (b1i+ b2j+ b3k)

= a1i · (b1i+ b2j+ b3k) + a2j · (b1i+ b2j+ b3k) + a3k · (b1i+ b2j+ b3k)

= a1i · b1i+ a2j · b2j+ a3k · b3k (since i · j = j · k = i · k = 0)

= a1b1 + a2b2 + a3b3 (since i · i = j · j = k · k = 1)

This gives a computational formula for evaluating a · b = a1b1 + a2b2 + a3b3

Example: This next example should convince you that there is no such thing as being able to ‘cancel’ out common vectors from a dot product.

Let a = 2i − j + 4k, b = −i + 2k, and c = 3i. Show a · b = a · c. Comment.

a · b = (2)(−1) + (−1)(0) + (4)(2) = 6 and a · c = (2)(3) + (−1)(0) + (4)(0) = 6.

Observe that b ≠ c.

We conclude it is not possible to cancel out vectors (even non-zero vectors) from a dot product like we can in ordinary arithmetic.
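The component formula a · b = a1b1 + a2b2 + a3b3 and the ‘no cancellation’ example can be reproduced in a few lines. This is only an illustrative sketch; the helper name `dot` is an assumption, not part of the notes.

```python
def dot(a, b):
    """Dot product from rectangular components: a1*b1 + a2*b2 + a3*b3."""
    return sum(x * y for x, y in zip(a, b))

a = (2, -1, 4)   # a = 2i - j + 4k
b = (-1, 0, 2)   # b = -i + 2k
c = (3, 0, 0)    # c = 3i

# Both products equal 6 even though b and c are different vectors.
print(dot(a, b), dot(a, c), b == c)
```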

As a geometrical application we use the dot product to find the angle between two vectors:

\[ \cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{\|\mathbf{a}\|\,\|\mathbf{b}\|}. \]

Example: Find the angle between the main diagonal of a cube and the diagonal of a face which

it meets:

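This angle will be the same regardless of the size of the cube, so let’s assume the cube has a side length equal to 1.

Then the face diagonal a is i + k and the main diagonal b is i + j + k.

Now a · b = (1)(1) + (0)(1) + (1)(1) = 2,

\[ \|\mathbf{a}\| = \sqrt{(1)^2 + (1)^2} = \sqrt{2} \quad\text{and}\quad \|\mathbf{b}\| = \sqrt{(1)^2 + (1)^2 + (1)^2} = \sqrt{3}, \]

giving

\[ \cos\theta = \frac{2}{\sqrt{2}\,\sqrt{3}}, \]

from which θ = 35.26°.

[Diagram: a cube showing the angle θ between a face diagonal and the main diagonal.]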
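The cube calculation can be checked numerically with the formula cos θ = a · b / (‖a‖ ‖b‖). The sketch below is illustrative only; the helper names `dot` and `angle_deg` are assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def angle_deg(a, b):
    """Angle between two non-zero vectors, in degrees."""
    cos_theta = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
    # Guard against tiny rounding errors pushing the value outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

face_diagonal = (1, 0, 1)   # i + k
main_diagonal = (1, 1, 1)   # i + j + k
print(round(angle_deg(face_diagonal, main_diagonal), 2))  # 35.26
```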

The dot product provides a very easy way of telling when two vectors are perpendicular.

If a · b = 0 (with a and b non-zero) then θ = 90° and we write a ⊥ b.

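Example: Show that the points P (2, 1, −3), Q (4, 2, −5) and R (3, 3, −1) are the vertices of a right angled triangle.

\[ \overrightarrow{PQ} = \overrightarrow{OQ} - \overrightarrow{OP} = (4\mathbf{i} + 2\mathbf{j} - 5\mathbf{k}) - (2\mathbf{i} + \mathbf{j} - 3\mathbf{k}) = 2\mathbf{i} + \mathbf{j} - 2\mathbf{k} \]

\[ \overrightarrow{PR} = \overrightarrow{OR} - \overrightarrow{OP} = (3\mathbf{i} + 3\mathbf{j} - \mathbf{k}) - (2\mathbf{i} + \mathbf{j} - 3\mathbf{k}) = \mathbf{i} + 2\mathbf{j} + 2\mathbf{k} \]

\[ \overrightarrow{QR} = \overrightarrow{OR} - \overrightarrow{OQ} = (3\mathbf{i} + 3\mathbf{j} - \mathbf{k}) - (4\mathbf{i} + 2\mathbf{j} - 5\mathbf{k}) = -\mathbf{i} + \mathbf{j} + 4\mathbf{k}, \]

and from these we find $\overrightarrow{PQ} \cdot \overrightarrow{PR} = (2)(1) + (1)(2) + (-2)(2) = 0$, so we conclude the triangle is right angled at P.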
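A quick way to locate the right angle is to compute the dot product of each pair of side vectors and see which one vanishes. This is a sketch under the assumption that points are stored as tuples; `sub` and `dot` are hypothetical helper names.

```python
def sub(p, q):
    """Vector from q to p, i.e. p - q componentwise."""
    return tuple(x - y for x, y in zip(p, q))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

P, Q, R = (2, 1, -3), (4, 2, -5), (3, 3, -1)
PQ, PR, QR = sub(Q, P), sub(R, P), sub(R, Q)

# Only the pair of sides meeting at P has zero dot product.
print(dot(PQ, PR), dot(PQ, QR), dot(PR, QR))
```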


The Vector or “Cross” Product

This is a way of ‘multiplying’ two vectors together which results in a vector. Given two vectors a = a1i + a2j + a3k and b = b1i + b2j + b3k, we define their ‘vector’ or ‘cross’ product as

a × b = (‖a‖ ‖b‖ sin θ) n

where θ is the angle between the two vectors, and n is the unit vector perpendicular to both a and b, in a right-hand rule direction.

Note:

(i) a × b = −b × a

(ii) If θ = 0° then a × b = 0

(iii) If θ = 90° then ‖a × b‖ = ‖a‖ ‖b‖

The cross products of the unit coordinate vectors i, j and k.

Given the definition above we see that

i× j = k

j× k = i

k× i = j

and i× i = j× j = k× k = 0

Properties of the Cross Product

(i) a× b = − (b× a) cross product is anti-commutative

(ii) λa× b = a× λb =λ (a× b) , for any scalar λ

(iii) a× (b+ c) = a× b+ a× c cross product is distributive

(iv) a × (b × c) ≠ (a × b) × c (in general) non-associativity of the cross product

So if a = a1i+ a2j+ a3k, and b = b1i+ b2j+ b3k then

a× b = (a1i+ a2j+ a3k)× (b1i+ b2j+ b3k)

= (a1b1) i× i+ (a1b2) i× j+ (a1b3) i× k

+ (a2b1) j× i+ (a2b2) j× j+ (a2b3) j× k

+ (a3b1)k× i+ (a3b2)k× j+ (a3b3)k× k


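Continuing, and using i × i = j × j = k × k = 0 together with i × j = k, j × k = i, k × i = j (and the opposite sign when the order is reversed):

\[ \mathbf{a} \times \mathbf{b} = (a_2 b_3 - a_3 b_2)\,\mathbf{i} - (a_1 b_3 - a_3 b_1)\,\mathbf{j} + (a_1 b_2 - a_2 b_1)\,\mathbf{k} \]

\[ = \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix} \mathbf{i} - \begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix} \mathbf{j} + \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} \mathbf{k} \qquad \text{(note the ‘−’ in the j term)} \]

\[ = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} \]

A geometrical application of the cross-product:

Two vectors a and b, if drawn from the same point, define a parallelogram.

Now we can determine the area of the parallelogram by breaking it up into two identical triangles.

Area = 2 × ½ × base × perpendicular height

\[ A = \|\mathbf{a}\|\,\|\mathbf{b}\| \sin\theta = \|\mathbf{a} \times \mathbf{b}\| \]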

Examples

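(a) Let P, Q, R be the points P (2, 1, −3), Q (3, 4, 7) and R (1, −2, 3). Find the area of the parallelogram which has PQ and PR as adjacent sides.

\[ \overrightarrow{PQ} = \overrightarrow{OQ} - \overrightarrow{OP} = (3\mathbf{i} + 4\mathbf{j} + 7\mathbf{k}) - (2\mathbf{i} + \mathbf{j} - 3\mathbf{k}) = \mathbf{i} + 3\mathbf{j} + 10\mathbf{k} \]

\[ \overrightarrow{PR} = \overrightarrow{OR} - \overrightarrow{OP} = (\mathbf{i} - 2\mathbf{j} + 3\mathbf{k}) - (2\mathbf{i} + \mathbf{j} - 3\mathbf{k}) = -\mathbf{i} - 3\mathbf{j} + 6\mathbf{k} \]

So

\[ \overrightarrow{PQ} \times \overrightarrow{PR} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 3 & 10 \\ -1 & -3 & 6 \end{vmatrix} = \mathbf{i} \begin{vmatrix} 3 & 10 \\ -3 & 6 \end{vmatrix} - \mathbf{j} \begin{vmatrix} 1 & 10 \\ -1 & 6 \end{vmatrix} + \mathbf{k} \begin{vmatrix} 1 & 3 \\ -1 & -3 \end{vmatrix} \]

\[ = \big((3)(6) - (-3)(10)\big)\mathbf{i} - \big((1)(6) - (-1)(10)\big)\mathbf{j} + \big((1)(-3) - (-1)(3)\big)\mathbf{k} = 48\mathbf{i} - 16\mathbf{j} \]

Hence

\[ \text{Area} = \left\|\overrightarrow{PQ} \times \overrightarrow{PR}\right\| = \sqrt{(48)^2 + (-16)^2} = 16\sqrt{3^2 + 1} = 16\sqrt{10}. \]

(b) Find the area of the triangle QPR:

\[ \text{Area of } \triangle QPR = \tfrac{1}{2}\left\|\overrightarrow{PQ} \times \overrightarrow{PR}\right\| = 8\sqrt{10}. \]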
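The component formula for a × b and the area calculation above can be verified with a short script. This is a minimal sketch; the function name `cross` and the tuple representation are assumptions made for illustration.

```python
import math

def cross(a, b):
    """Cross product from components, matching the 3x3 determinant expansion."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2,      # i component
            a3 * b1 - a1 * b3,      # j component (note the sign flip)
            a1 * b2 - a2 * b1)      # k component

PQ = (1, 3, 10)
PR = (-1, -3, 6)
n = cross(PQ, PR)
parallelogram_area = math.sqrt(sum(c * c for c in n))

print(n)                                          # (48, -16, 0)
print(parallelogram_area, 16 * math.sqrt(10))     # parallelogram, part (a)
print(parallelogram_area / 2, 8 * math.sqrt(10))  # triangle, part (b)
```

Swapping the arguments, `cross(PR, PQ)`, gives the negative of `n`, which illustrates the anti-commutative property.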



ENG1091 Vectors

Lecture 2 · lines in 3D

Text Reference: §4.3.1

1. Revision of straight lines in two dimensional space

We are all quite familiar with the two-dimensional representation of a line as y = mx+ b, (called

its Cartesian equation) where m is the slope and b is the y-intercept. Students should also be

familiar with the point-slope equation of a straight line:

(y − y0) = m(x− x0) (1)

Given any two points (x1, y1) and (x2, y2) in the x-y plane, we can readily get the equation of the line passing through these two points by finding the slope m = (y1 − y2)/(x1 − x2), and using this value in equation (1) above. The basic equation of a straight line is unique up to a scalar factor, regardless of which point is chosen as (x0, y0).

It would be natural to try to extend the equation of line in 2D space to 3D space. Perhaps one

might consider z = m1x + m2y + b. Unfortunately, this does not work, indeed, we will see in a

future lecture that this is actually the Cartesian equation of a plane in three-dimensional space.

2. Equations of straight lines in three dimensional space

In three-dimensional space, the concept of a slope is not so easily defined. Instead of slope, a

straight line will have an orientation associated with it that can be represented as a vector. The

line is then fully defined by a point on the line, say A, and an orientation vector, say v. Note that the magnitude of the orientation vector doesn’t actually matter: as long as we travel in the right direction, we stay on the line. Working in Cartesian coordinates, we can define the point A = (a, b, c) by its position vector, and v as a vector with components (p, q, r). Then the position vector $\overrightarrow{OP}$ of any point P on the line is given by

\[ \overrightarrow{OP} = \overrightarrow{OA} + \overrightarrow{AP} \quad\text{where}\quad \overrightarrow{AP} = t\mathbf{v} \]

for some scalar t. We can define the equation of a line r(t) as:

\[ \mathbf{r}(t) = \overrightarrow{OP} = \overrightarrow{OA} + t\mathbf{v}, \]

i.e. (x, y, z) = (a, b, c) + t(p, q, r)

This is the vector equation of a line. The variable t, which can take on any real value, is known

as the parametric variable. Breaking this equation up into the three components we obtain the

parametric equations of a straight line:

x(t) = a+ pt,

y(t) = b+ qt,

z(t) = c+ rt.


(Note: students may have actually been introduced to parametric variables when learning

trigonometry. A circle of radius a centred at the origin can be represented by the paramet-

ric equations x = a cos(t) and y = a sin(t), where t can represent the angle from the x-axis.)

If we are given two-points, say A (x1, y1, z1) and B (x2, y2, z2), then the line between these two

points can be readily found by defining the orientation (direction) vector as the vector from A

to B.

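Example 1: Define the (vector and parametric) equation of the line between the points A (2, 3, 4) and B (1, 1, 1).

\[ \overrightarrow{AB} = (\mathbf{i} + \mathbf{j} + \mathbf{k}) - (2\mathbf{i} + 3\mathbf{j} + 4\mathbf{k}) = -\mathbf{i} - 2\mathbf{j} - 3\mathbf{k} = \mathbf{v} \]

Equation of line:

\[ \overrightarrow{OP} = \overrightarrow{OA} + t\mathbf{v} = (2\mathbf{i} + 3\mathbf{j} + 4\mathbf{k}) + t(-\mathbf{i} - 2\mathbf{j} - 3\mathbf{k}) = (2 - t)\,\mathbf{i} + (3 - 2t)\,\mathbf{j} + (4 - 3t)\,\mathbf{k} \]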

Parametrically:

x(t) = 2− t,

y(t) = 3− 2t,

z(t) = 4− 3t.

Notice how the parametric variable works. If t = 0, we are at one point, A (2, 3, 4) , and if t = 1

we are at the other point, B (1, 1, 1) .

Example 2: From the previous example, find the value of t that defines the point (0, −1, −2).

Again, any value of t defines some point on the line. The value t = 1/2 defines the mid-point of AB.

Equate x values: solve 2 − t = 0, from which t = 2. If this value of t gives matching y and z values we know the point (0, −1, −2) is on the line. Otherwise the point lies off the line.

With t = 2, y(2) = 3 − 4 = −1 and z(2) = 4 − 6 = −2. Therefore we conclude the point (0, −1, −2) is on the line.

With t = 1/2, x(1/2) = 2 − 1/2 = 3/2, y(1/2) = 3 − 1 = 2, and z(1/2) = 4 − 3/2 = 5/2; so (3/2, 2, 5/2) is the midpoint of AB.
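The procedure of Examples 1 and 2 (evaluate the line at a given t, or solve one component for t and check the others) can be sketched as follows. The function names are hypothetical, and exact fractions are used so that the equality test is not affected by rounding; integer inputs are assumed.

```python
from fractions import Fraction

def point_on_line(a, v, t):
    """Point with position vector OA + t v."""
    return tuple(ai + t * vi for ai, vi in zip(a, v))

def parameter_of(a, v, p):
    """Return t such that OA + t v reaches p, or None if p is off the line.

    Assumes integer components and a non-zero direction vector v.
    """
    for ai, vi, pi in zip(a, v, p):
        if vi != 0:
            t = Fraction(pi - ai, vi)   # solve one component for t
            break
    # The other components must agree with the same value of t.
    return t if point_on_line(a, v, t) == tuple(p) else None

A = (2, 3, 4)
v = (-1, -2, -3)

print(point_on_line(A, v, 0), point_on_line(A, v, 1))   # the points A and B
print(parameter_of(A, v, (0, -1, -2)))                  # 2
print(point_on_line(A, v, Fraction(1, 2)))              # midpoint of AB
print(parameter_of(A, v, (0, 0, 0)))                    # None: not on the line
```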

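Also note, however, that the equation of a line is not unique. The line between the points (2, 3, 4) and (0, −1, −2) is the same line as the one found in the first example, but the equation looks different:

\[ \mathbf{v} = (-\mathbf{j} - 2\mathbf{k}) - (2\mathbf{i} + 3\mathbf{j} + 4\mathbf{k}) = -2\mathbf{i} - 4\mathbf{j} - 6\mathbf{k} \]

\[ \overrightarrow{OP} = \overrightarrow{OA} + t\mathbf{v} = (2\mathbf{i} + 3\mathbf{j} + 4\mathbf{k}) + t(-2\mathbf{i} - 4\mathbf{j} - 6\mathbf{k}) = (2 - 2t)\,\mathbf{i} + (3 - 4t)\,\mathbf{j} + (4 - 6t)\,\mathbf{k} \]

So

x(t) = 2 − 2t,

y(t) = 3 − 4t,

z(t) = 4 − 6t.

The equation looks different, but is it really? (Replacing t by t/2 recovers the parametric equations of Example 1, so both describe the same set of points.)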


Finally note that the equation of a line can be manipulated to eliminate the parametric variable t. In this form the equation of the line is:

\[ \frac{x - a}{p} = \frac{y - b}{q} = \frac{z - c}{r} \]

This is sometimes called the algebraic equation of a straight line. Students should note that given this form of the equation of a line, we can immediately read off a point on the line and its orientation vector.

Example 3: Given the relation

\[ \frac{x + 2}{1} = \frac{y}{-2} = \frac{z - 3}{2} \]

find any two points on the line.

Solution: By examining the general form in the previous equation we see x = −2, y = 0, z = 3 is one such point (equate each numerator to zero).

Now of course the choice of zero is completely arbitrary; we can equally equate each fraction to 1 (or any real number). Doing this:

(x + 2)/1 = 1 giving x = −1

y/(−2) = 1 giving y = −2

(z − 3)/2 = 1 giving z = 5

Thus the point (−1, −2, 5) is also on the line.

Importantly, a direction vector for the line can also be read off namely: v = i− 2j+ 2k. This

choice of v is unique up to scalar multiplication, (i.e. the only other direction vectors for this

line are non-zero scalar multiples of i− 2j+ 2k).

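We have a problem if the orientation vector is parallel to any of the coordinate planes (in particular, if it is parallel to one of the axes). In such a case at least one of p, q or r would be equal to zero, and the corresponding fraction in the algebraic form is undefined. For that reason it is best to initially work with the parametric representation and then find the algebraic form.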

After understanding the basic principles of lines, more sophisticated problems can be attempted.

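Example 4: Find the minimum distance between the point B (1, 2, 3) and the line defined by

\[ \frac{x + 2}{1} = \frac{y}{-2} = \frac{z - 3}{2} \]

Which point on the line is closest to the point B (1, 2, 3)?

Solution: A point on the line is A (−2, 0, 3) and a direction vector for the line is v = i − 2j + 2k.

The point B (1, 2, 3) is not on the line. (Check this.)

The shortest distance between the point B and the line is

\[ d = \left\|\overrightarrow{AB}\right\| \sin\theta = \frac{\left\|\overrightarrow{AB} \times \mathbf{v}\right\|}{\|\mathbf{v}\|} \qquad \text{(draw a diagram)} \]

where θ is the angle between $\overrightarrow{AB}$ and v.

Now $\overrightarrow{AB} = (\mathbf{i} + 2\mathbf{j} + 3\mathbf{k}) - (-2\mathbf{i} + 3\mathbf{k}) = 3\mathbf{i} + 2\mathbf{j}$ and

\[ \overrightarrow{AB} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 3 & 2 & 0 \\ 1 & -2 & 2 \end{vmatrix} = 4\mathbf{i} - 6\mathbf{j} - 8\mathbf{k}. \]

The shortest distance is thus

\[ d = \frac{\|4\mathbf{i} - 6\mathbf{j} - 8\mathbf{k}\|}{\|\mathbf{i} - 2\mathbf{j} + 2\mathbf{k}\|} = \frac{\sqrt{16 + 36 + 64}}{3} = \frac{2\sqrt{29}}{3}. \]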

The closest point to B

Solution

Converting the equation of the line into parametric form we have:

x = −2 + t, y = −2t, z = 3 + 2t

So a general point on the line is P (−2 + t, −2t, 3 + 2t).

Hence

\[ \overrightarrow{BP} = \overrightarrow{OP} - \overrightarrow{OB} = (-3 + t)\,\mathbf{i} + (-2 - 2t)\,\mathbf{j} + 2t\,\mathbf{k} \]

[Key step!!] The closest point P on the line must satisfy $\overrightarrow{BP} \cdot \mathbf{v} = 0$.

Now

\[ \overrightarrow{BP} \cdot \mathbf{v} = (-3 + t)(1) + (-2 - 2t)(-2) + 2t(2) = 1 + 9t = 0 \quad\text{when } t = -\tfrac{1}{9}. \]

Hence the closest point is P (−2 + t, −2t, 3 + 2t) with t = −1/9:

\[ \left(-2 - \tfrac{1}{9},\ \tfrac{2}{9},\ 3 - \tfrac{2}{9}\right) = \left(-\tfrac{19}{9},\ \tfrac{2}{9},\ \tfrac{25}{9}\right) \]
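Both parts of Example 4 can be checked numerically. The sketch below uses the cross-product formula for the distance and, for the closest point, the condition BP · v = 0, which for P = A + tv rearranges to t = (AB · v)/(v · v). The helper names are assumptions made for illustration.

```python
import math
from fractions import Fraction

def sub(p, q):
    return tuple(x - y for x, y in zip(p, q))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

A = (-2, 0, 3)      # a point on the line
v = (1, -2, 2)      # direction vector of the line
B = (1, 2, 3)       # the point off the line

AB = sub(B, A)
d = math.sqrt(dot(cross(AB, v), cross(AB, v))) / math.sqrt(dot(v, v))

# BP . v = 0 with P = A + t v gives t = (AB . v) / (v . v).
t = Fraction(dot(AB, v), dot(v, v))
P = tuple(a + t * c for a, c in zip(A, v))

print(d, 2 * math.sqrt(29) / 3)   # the two values agree
print(t, P)                       # t = -1/9, P = (-19/9, 2/9, 25/9)
```

As a consistency check, the length of BP computed from the closest point equals the distance d found from the cross product.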



ENG1091 Vectors

Lecture 3 · planes in 3D

Text Reference: §4.3.2

1. Planes in three-dimensional space

When defining a straight line in three-dimensional space, we needed a point on the line and an

orientation vector.

To define a plane in three-dimensional space, we need a point in the plane and a normal vector n to the plane. For our immediate purpose, the magnitude of n is not important, only its direction.

So let’s assume that we have some point on the plane which we label A (a, b, c) and we have a normal vector n = pi + qj + rk. We take a general point on the plane P (x, y, z). Now the vector $\overrightarrow{AP}$ lies in the plane and hence is perpendicular to n.

Thus

\[ \overrightarrow{AP} \cdot \mathbf{n} = 0. \]

This equation is the Cartesian equation of the plane.

Explicitly this becomes (x − a) p + (y − b) q + (z − c) r = 0, which can be simplified to the general form:

Ax + By + Cz = D.

Example 1: Find the equation of the plane that contains the point (2, 2, 3) and is normal to

the vector 〈−1, 1, 2〉 .

Solution: $\overrightarrow{AP}$ = 〈x, y, z〉 − 〈2, 2, 3〉 = 〈x − 2, y − 2, z − 3〉

$\overrightarrow{AP} \cdot \mathbf{n}$ = 〈x − 2, y − 2, z − 3〉 · 〈−1, 1, 2〉 = −(x − 2) + (y − 2) + 2(z − 3) = −x + y + 2z − 6

Hence the equation of the plane is −x + y + 2z − 6 = 0, or −x + y + 2z = 6.

Example 2: Find the equation of the plane going through the points (−1, 0, 4) , (2, 5, 0) ,

(2, 2,−1) .

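Solution: Label the points A (−1, 0, 4), B (2, 5, 0) and C (2, 2, −1). A normal vector n is given by $\mathbf{n} = \overrightarrow{AB} \times \overrightarrow{AC}$.

Now $\overrightarrow{AB}$ = 〈2, 5, 0〉 − 〈−1, 0, 4〉 = 〈3, 5, −4〉 and $\overrightarrow{AC}$ = 〈2, 2, −1〉 − 〈−1, 0, 4〉 = 〈3, 2, −5〉.

Thus

\[ \mathbf{n} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 3 & 5 & -4 \\ 3 & 2 & -5 \end{vmatrix} = \mathbf{i} \begin{vmatrix} 5 & -4 \\ 2 & -5 \end{vmatrix} - \mathbf{j} \begin{vmatrix} 3 & -4 \\ 3 & -5 \end{vmatrix} + \mathbf{k} \begin{vmatrix} 3 & 5 \\ 3 & 2 \end{vmatrix} = -17\mathbf{i} + 3\mathbf{j} - 9\mathbf{k}. \]

Now $\overrightarrow{AP}$ = 〈x, y, z〉 − 〈−1, 0, 4〉 = 〈x + 1, y, z − 4〉 and

$\overrightarrow{AP} \cdot \mathbf{n}$ = 〈x + 1, y, z − 4〉 · 〈−17, 3, −9〉 = 0,

that is −17x − 17 + 3y − 9z + 36 = 0,

giving the equation of the plane as 17x − 3y + 9z = 19.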


We should check that all three points satisfy the plane’s equation:

A (−1, 0, 4): 17x − 3y + 9z = −17 + 36 = 19 ✓

B (2, 5, 0): 17x − 3y + 9z = 34 − 15 = 19 ✓

C (2, 2, −1): 17x − 3y + 9z = 34 − 6 − 9 = 19 ✓
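The method of Example 2 (normal from a cross product, then n · AP = 0) can be sketched in code, together with the check that all three points satisfy the result. The function name `plane_through` is an assumption for illustration; it returns the normal n and the constant D in n · (x, y, z) = D.

```python
def sub(p, q):
    return tuple(x - y for x, y in zip(p, q))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_through(A, B, C):
    """Normal n and constant D of the plane n . (x, y, z) = D through A, B, C.

    Assumes the three points are not collinear (otherwise n is the zero vector).
    """
    n = cross(sub(B, A), sub(C, A))
    return n, dot(n, A)

A, B, C = (-1, 0, 4), (2, 5, 0), (2, 2, -1)
n, D = plane_through(A, B, C)
print(n, D)   # (-17, 3, -9) and -19, i.e. 17x - 3y + 9z = 19 after multiplying by -1

# Every one of the three points satisfies the equation.
print([dot(n, P) == D for P in (A, B, C)])
```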

There are two observations that can be made. Firstly, the equation of a plane in three-dimensional

space is unique (up to multiplication by a scalar constant). Secondly, parallel planes have the

same normal vector and hence will only differ by the constant D.

Example 3: Find the minimum distance between the parallel planes 2x + 3y − z = 6 and

2x+ 3y − z = 0.

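Let P (x1, y1, z1) be any point in the plane 2x + 3y − z = 6 and Q (x2, y2, z2) be any point in the plane 2x + 3y − z = 0. [Notice that the equations of the planes are arranged so that they have identical coefficients. Rearrange the equations if necessary; this is important for what comes next.]

The distance between two parallel planes with normal n is then (diagram)

\[ d = \left\|\overrightarrow{PQ}\right\| \cos\theta = \frac{\overrightarrow{PQ} \cdot \mathbf{n}}{\|\mathbf{n}\|} = \frac{\left(\overrightarrow{OQ} - \overrightarrow{OP}\right) \cdot \mathbf{n}}{\|\mathbf{n}\|} = \frac{\overrightarrow{OQ} \cdot \mathbf{n} - \overrightarrow{OP} \cdot \mathbf{n}}{\|\mathbf{n}\|}; \]

however $\overrightarrow{OQ} \cdot \mathbf{n}$ = 2x2 + 3y2 − z2 = 0 and similarly $\overrightarrow{OP} \cdot \mathbf{n}$ = 2x1 + 3y1 − z1 = 6.

Thus (and taking the absolute value since we seek a distance):

\[ d = \left| \frac{0 - 6}{\sqrt{2^2 + 3^2 + (-1)^2}} \right| = \frac{6}{\sqrt{14}}. \]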
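The final formula, d = |D2 − D1| / ‖n‖ for planes n · (x, y, z) = D1 and n · (x, y, z) = D2 written with identical coefficients, is easy to evaluate directly. A minimal sketch (the function name is hypothetical):

```python
import math

def parallel_plane_distance(n, d1, d2):
    """Distance between the planes n.(x, y, z) = d1 and n.(x, y, z) = d2.

    Assumes both equations are written with the same normal vector n.
    """
    return abs(d2 - d1) / math.sqrt(sum(c * c for c in n))

# 2x + 3y - z = 6 and 2x + 3y - z = 0
print(parallel_plane_distance((2, 3, -1), 6, 0), 6 / math.sqrt(14))
```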

2. Lines and Planes

Combining the knowledge of lines, planes and basic vector operations allows for a wide range of

problems to be addressed in three-dimensional space. For example, we can find:

• the minimum distance from a point to a plane,

• the minimum distance from a point to a line,

• the angle between two intersecting planes,

• the minimum distance between two non-intersecting lines.

Example 4: Find the line defined by the intersection of the planes −x+y+z = 2 and x+2y = 4

and the angle of intersection.

Solution: A direction vector of the line of intersection is easily found: it is normal to both −i + j + k and i + 2j and hence can be obtained using the cross product. The equation of the line of intersection itself is best found using Gauss elimination (next lecture).

A direction vector is

\[ \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ -1 & 1 & 1 \\ 1 & 2 & 0 \end{vmatrix} = -2\mathbf{i} + \mathbf{j} - 3\mathbf{k}. \]

(Of course any non-zero scalar multiple of this is also a direction vector.)

The angle between two planes is defined as being the angle between their normals (diagram).

(−i + j + k) · (i + 2j) = −1 + 2 = 1

\[ \|-\mathbf{i} + \mathbf{j} + \mathbf{k}\| = \sqrt{(-1)^2 + 1^2 + 1^2} = \sqrt{3} \quad\text{and}\quad \|\mathbf{i} + 2\mathbf{j}\| = \sqrt{5} \]

The angle θ between the planes is then given by

\[ \cos\theta = \frac{1}{\sqrt{3}\,\sqrt{5}}, \]

hence θ = 75.04°.
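Both results of Example 4 follow from the two normal vectors alone, so they are easy to recompute. This is an illustrative sketch with assumed helper names.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

n1 = (-1, 1, 1)   # normal of -x + y + z = 2
n2 = (1, 2, 0)    # normal of  x + 2y = 4

# The line of intersection is perpendicular to both normals.
direction = cross(n1, n2)

# The angle between the planes is the angle between their normals.
cos_theta = dot(n1, n2) / (math.sqrt(dot(n1, n1)) * math.sqrt(dot(n2, n2)))
theta = math.degrees(math.acos(cos_theta))

print(direction)         # (-2, 1, -3)
print(round(theta, 2))   # 75.04
```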

3. Parametric representation of a plane

Recall that straight lines have parametric equations giving x, y, z as function of one parametric

variable (usually t). Planes have parametric equations where x, y, z are given as functions of two

parametric variables (usually u and v).

Suppose we know a point P0 (a, b, c) in the plane and two non-parallel direction vectors w1 = pi + qj + rk and w2 = li + mj + nk also in the plane.

[Diagram: the plane through P0 containing w1 and w2; a general point with position vector r is reached from the origin O by adding u w1 and v w2 to the position vector of P0.]

Let r = xi + yj + zk denote the position vector of a general point P (x, y, z) in the plane, so that $\mathbf{r} = \overrightarrow{OP_0} + u\mathbf{w}_1 + v\mathbf{w}_2$ where u, v are any scalars (parameters).

This gives r = 〈x, y, z〉 = 〈a, b, c〉 + u 〈p, q, r〉 + v 〈l, m, n〉 and hence

x (u, v) = a+ pu+ lv,

y (u, v) = b+ qu+mv,

z (u, v) = c+ ru+ nv.

These 3 equations are the parametric equations of a plane. The fact that two parameters (u and v) are needed to describe it indicates that a plane is a 2-dimensional surface.

In more advanced mathematics (i.e. 2nd level maths), it will be imperative to represent surfaces

parametrically.


Example 5: Find a parametric representation of the plane going through the points (−1, 0, 4), (2, 5, 0) and (2, 2, −1).

Solution: Label the points P (−1, 0, 4), Q (2, 5, 0) and R (2, 2, −1).

Now a choice for w1 is $\overrightarrow{PQ}$ = 〈2, 5, 0〉 − 〈−1, 0, 4〉 = 〈3, 5, −4〉

and a choice for w2 is $\overrightarrow{PR}$ = 〈2, 2, −1〉 − 〈−1, 0, 4〉 = 〈3, 2, −5〉.

Check that these are non-parallel ✓. (Otherwise the three points are collinear and the question cannot be answered properly: there will be an infinite number of planes.)

In vector form the parametric equations are

r = (−1, 0, 4) + u (3, 5,−4) + v (3, 2,−5)

= (−1 + 3u+ 3v, 5u+ 2v, 4− 4u− 5v)

Hence

x (u, v) = −1 + 3u+ 3v,

y (u, v) = 5u+ 2v,

z (u, v) = 4− 4u− 5v.
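Since these are the same three points as in Example 2, every point produced by the parametric equations should satisfy the Cartesian equation 17x − 3y + 9z = 19 found there. The sketch below checks this for a few parameter values; `plane_point` is a hypothetical helper name.

```python
def plane_point(p0, w1, w2, u, v):
    """Point with position vector OP0 + u*w1 + v*w2."""
    return tuple(a + u * b + v * c for a, b, c in zip(p0, w1, w2))

P0 = (-1, 0, 4)
w1 = (3, 5, -4)
w2 = (3, 2, -5)

for u, v in [(0, 0), (1, 0), (0, 1), (2, -3)]:
    x, y, z = plane_point(P0, w1, w2, u, v)
    # The last value printed is 19 for every choice of u and v.
    print((u, v), (x, y, z), 17 * x - 3 * y + 9 * z)
```

The parameter pairs (0, 0), (1, 0) and (0, 1) reproduce the original points P, Q and R respectively.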



ENG1091 Systems of Linear Equations

Lecture 4 · echelon form · Gauss elimination

Text Reference: §5.5

Our object in this lecture is to solve a system of equations like

2x+ y + z + w = 4

4x+ y + 3z + 2w = 7

−2x+ z − w = 9.

Such a system is called linear because the left hand side of each equation is a linear function of the unknown variables x, y, z and w. Simple linear systems of 2 or 3 variables are commonly encountered in secondary school, and it is instructive to view an example before discussing a more general procedure.

Suppose we wish to solve a system like

x+ 2y = 3 (1)

2x− 3y = −8 (2)

One way to proceed is to multiply equation (1) by 2 and subtract this from equation (2):

x + 2y = 3 (1)

−7y = −14 (2a)

The reason why this is effective is that one of the variables is eliminated. Equation (2a) is now easily solved giving y = 2, and substituting this into equation (1) we find x = −1. Geometrically, the equations x + 2y = 3 and 2x − 3y = −8 represent two straight lines in the x-y plane which intersect at the point (−1, 2).

The important point is that both of the systems

x + 2y = 3
2x − 3y = −8

and

x + 2y = 3
−7y = −14

have identical solutions. Think about the operations we could perform on the two original equations.

We could

• interchange the two equations

• multiply either equation by any number we choose except zero, and

• add a multiple of one equation to the other.

Now performing any of these operations without thinking is not guaranteed to be effective but

at least we are assured that the resulting system of equations has an identical set of solutions.

Notice that the names of the variables are irrelevant: solving

x + 2y = 3
2x − 3y = −8

is exactly the same as solving the system

u + 2v = 3
2u − 3v = −8;

only the coefficients are important.

1. The first step in solving a linear system is to write the system in augmented matrix form. This is a way of writing the system using only the coefficients.

For example we write the system

2x + y + z + w − 4 = 0

4x + y + 3z + 2w = 7

−2x + z − w = 9

as

\[ \left[\begin{array}{cccc|c} 2 & 1 & 1 & 1 & 4 \\ 4 & 1 & 3 & 2 & 7 \\ -2 & 0 & 1 & -1 & 9 \end{array}\right]. \]

Notice each equation is written as a single row and that coefficients belonging to the same variable are written directly underneath each other. (Equation 3, which appears to have no y, has in fact a y-coefficient of zero.) Each constant term must be placed on the right hand side of the ‘equals’ sign (the ‘−4’ becomes +4 on the right hand side of equation 1) and the vertical partition is used to separate the left hand side from the right hand side. (Think of it as replacing all of the ‘equals’ signs.)

Example: Write the system

r + s + 2t = 0

2r − 3t = 1

6s − 5t = 0

in augmented matrix form.

Solution:

\[ \left[\begin{array}{ccc|c} 1 & 1 & 2 & 0 \\ 2 & 0 & -3 & 1 \\ 0 & 6 & -5 & 0 \end{array}\right]. \]

2. Gaussian elimination

Gaussian elimination is a systematic method of solving linear equations by first reducing the

corresponding system into an equivalent system, called row echelon form, where the unknowns

can be calculated by back substitution.

Example: Given the system

r + s + 2t = 0

s − 3t = 1

−5t = 5

find solutions to each of the variables using back substitution.

Solution:

t = 5/(−5) = −1

s = 1 + 3t = −2

r = −2t − s = 2 + 2 = 4

The system of equations in the last example has the augmented matrix

\[ \left[\begin{array}{ccc|c} 1 & 1 & 2 & 0 \\ 0 & 1 & -3 & 1 \\ 0 & 0 & -5 & 5 \end{array}\right] \]

which is one that is already in row echelon form. We saw how easy it is to find solutions of systems in this form.
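Back substitution on an augmented matrix in this triangular form can be sketched as follows. The function name `back_substitute` is hypothetical, and the sketch assumes a square system whose diagonal (pivot) entries are all non-zero. Exact fractions are used so no rounding occurs.

```python
from fractions import Fraction

def back_substitute(aug):
    """Solve an upper-triangular augmented system [U | c].

    Assumes n rows, n + 1 columns, and non-zero diagonal entries.
    """
    n = len(aug)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):                       # last row first
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(aug[i][n] - known) / aug[i][i]
    return x

aug = [[1, 1, 2, 0],
       [0, 1, -3, 1],
       [0, 0, -5, 5]]
r, s, t = back_substitute(aug)
print(r, s, t)   # 4 -2 -1
```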

Definition: A matrix is in row echelon form when

• the leading (non-zero) coefficient of each row (called the pivot entry) has zeros below it,

• the pivot entries of following rows are located in columns further to the right, and

• any rows which have no pivot (and therefore consist entirely of zeros) come last.


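Example: Given the following partitioned matrices, choose those which are in row echelon form:

A. \[ \left[\begin{array}{ccc|c} 1 & 1 & 2 & 0 \\ 1 & 1 & 13 & 1 \\ 0 & 0 & 1 & 5 \end{array}\right] \] no

B. \[ \left[\begin{array}{cccc|c} 1 & 0 & 2 & 0 & 2 \\ 0 & 1 & -3 & 0 & 1 \\ 0 & 0 & 0 & 1 & 10 \end{array}\right] \] yes

C. \[ \left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 5 \end{array}\right] \] yes

D. \[ \left[\begin{array}{ccc|c} 2 & 1 & 2 & 0 \\ 0 & 3 & -3 & 6 \\ 0 & 0 & 2 & 5 \end{array}\right] \] yes

E. \[ \left[\begin{array}{ccc|c} 1 & 1 & 2 & 0 \\ 0 & 3 & 13 & 1 \\ 0 & 0 & 0 & 5 \end{array}\right] \] yes

F. \[ \left[\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \end{array}\right] \] no

G. \[ \left[\begin{array}{cccccc|c} 1 & 2 & 0 & 1 & -3 & 1 & 0 \\ 0 & 0 & 0 & 1 & 2 & -3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 5 \end{array}\right] \] yes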

To obtain the equivalent row echelon form of a system we apply a sequence of the three elementary

row operations on the augmented matrix. As discussed above these row operations do not change

the solution set of the corresponding system of linear equations.

The three elementary row operations are:

• Interchanging two rows

• Multiplying a row by a non-zero scalar

• Adding to one row a multiple of another

2. Row echelon forms

To reduce a matrix to row echelon form systematically we follow these steps:

1. Locate the left-most column that doesn’t consist entirely of zeros.

2. Ensure that the top entry of this column is a non-zero entry. If necessary, interchange top

row with another row to achieve this.

3. Multiply this top row by the appropriate constant so that the first non-zero entry of this

row is 1. This entry is the pivot for that column. (It is not absolutely necessary that the

value of each pivot be 1 but this is certainly the most convenient value to have. As an

alternative to multiplying each row by a constant we can add/subtract multiples of other

rows to obtain a 1.)

4. Add a suitable multiple of this first row to each row below, so that all entries below this

pivot are 0.

5. Consider the submatrix obtained by removing the top row, and apply to this matrix steps

1 to 4.


Repeat steps 1-5 until the next submatrix under consideration has no rows left.

Example: Reduce the following matrix to row echelon form:

0 0 −2 0 12

3 6 −15 9 42

2 4 −5 6 −1

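Solution:

\[ \begin{bmatrix} 0 & 0 & -2 & 0 & 12 \\ 3 & 6 & -15 & 9 & 42 \\ 2 & 4 & -5 & 6 & -1 \end{bmatrix} \]

(1/3)R2 → R2, then R1 ↔ R2:

\[ \begin{bmatrix} 1 & 2 & -5 & 3 & 14 \\ 0 & 0 & -2 & 0 & 12 \\ 2 & 4 & -5 & 6 & -1 \end{bmatrix} \]

R3 − 2R1 → R3:

\[ \begin{bmatrix} 1 & 2 & -5 & 3 & 14 \\ 0 & 0 & -2 & 0 & 12 \\ 0 & 0 & 5 & 0 & -29 \end{bmatrix} \]

(−1/2)R2 → R2:

\[ \begin{bmatrix} 1 & 2 & -5 & 3 & 14 \\ 0 & 0 & 1 & 0 & -6 \\ 0 & 0 & 5 & 0 & -29 \end{bmatrix} \]

R3 − 5R2 → R3:

\[ \begin{bmatrix} 1 & 2 & -5 & 3 & 14 \\ 0 & 0 & 1 & 0 & -6 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \quad \leftarrow \text{row echelon form} \]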

Exercise: Find a row echelon form of the matrix

1 0 −1 0

2 1 0 8

0 1 −2 0

1 −1 −2 −6

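Solution:

\[ \begin{bmatrix} 1 & 0 & -1 & 0 \\ 2 & 1 & 0 & 8 \\ 0 & 1 & -2 & 0 \\ 1 & -1 & -2 & -6 \end{bmatrix} \]

R2 − 2R1 → R2 and R4 − R1 → R4:

\[ \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 8 \\ 0 & 1 & -2 & 0 \\ 0 & -1 & -1 & -6 \end{bmatrix} \]

R3 − R2 → R3 and R4 + R2 → R4:

\[ \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 8 \\ 0 & 0 & -4 & -8 \\ 0 & 0 & 1 & 2 \end{bmatrix} \]

(−1/4)R3 → R3:

\[ \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 8 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 1 & 2 \end{bmatrix} \]

R4 − R3 → R4:

\[ \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 8 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix} \quad \leftarrow \text{row echelon form} \]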
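The five reduction steps listed earlier can be turned into a short routine. What follows is a sketch, not a definitive implementation: the function name `row_echelon` is an assumption, pivots are always scaled to 1 (step 3), and exact fractions are used to avoid rounding. Because a row echelon form is not unique, a different sequence of row operations could give a different but equally valid answer.

```python
from fractions import Fraction

def row_echelon(rows):
    """Reduce a matrix (list of rows) to a row echelon form with unit pivots."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    top = 0                                   # first row of the current submatrix
    for col in range(n_cols):                 # step 1: scan columns left to right
        pivot = next((r for r in range(top, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue                          # column is all zeros below 'top'
        m[top], m[pivot] = m[pivot], m[top]   # step 2: interchange rows
        lead = m[top][col]
        m[top] = [x / lead for x in m[top]]   # step 3: scale the pivot to 1
        for r in range(top + 1, n_rows):      # step 4: clear entries below the pivot
            factor = m[r][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[top])]
        top += 1                              # step 5: move on to the submatrix
        if top == n_rows:
            break
    return m

example = [[0, 0, -2, 0, 12],
           [3, 6, -15, 9, 42],
           [2, 4, -5, 6, -1]]
exercise = [[1, 0, -1, 0],
            [2, 1, 0, 8],
            [0, 1, -2, 0],
            [1, -1, -2, -6]]

for row in row_echelon(example):
    print([str(x) for x in row])
for row in row_echelon(exercise):
    print([str(x) for x in row])
```

For these two matrices the routine happens to reproduce the row echelon forms obtained by hand above.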


3. Solving a system using Gaussian elimination: To solve the system

x + 3y + 2z = 1

2x + 7y + 3z = 2

−3x − 10y − 6z = −5

1. We write the augmented matrix:

\[ \left[\begin{array}{ccc|c} 1 & 3 & 2 & 1 \\ 2 & 7 & 3 & 2 \\ -3 & -10 & -6 & -5 \end{array}\right] \]

2. By performing appropriate row operations we find an equivalent row echelon form:

R2 − 2R1 → R2 and R3 + 3R1 → R3:

\[ \left[\begin{array}{ccc|c} 1 & 3 & 2 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 0 & -2 \end{array}\right] \]

R3 + R2 → R3:

\[ \left[\begin{array}{ccc|c} 1 & 3 & 2 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & -1 & -2 \end{array}\right] \]

(−1)R3 → R3:

\[ \left[\begin{array}{ccc|c} 1 & 3 & 2 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & 2 \end{array}\right] \]

3. Use back substitution to find the values of the unknowns, in this case:

z = 2, y = z = 2 and x = 1 − 2z − 3y = 1 − 4 − 6 = −9

So the three planes intersect in a single point: x = −9, y = 2, z = 2.

Note: The pivot in a column does not need to be equal to 1; any non-zero number would do.
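The three stages (augmented matrix, reduction to echelon form, back substitution) can be combined into one small solver. This sketch assumes a square system with exactly one solution; the function name `solve` is hypothetical. In line with the note above, it does not scale the pivots to 1 and simply divides by the pivot during back substitution.

```python
from fractions import Fraction

def solve(aug):
    """Gaussian elimination followed by back substitution.

    Assumes n equations in n unknowns with a unique solution.
    """
    m = [[Fraction(x) for x in row] for row in aug]
    n = len(m)
    for k in range(n):
        # Find a row with a non-zero entry in column k and move it up.
        p = next(r for r in range(k, n) if m[r][k] != 0)
        m[k], m[p] = m[p], m[k]
        # Add multiples of the pivot row to clear the entries below the pivot.
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            m[r] = [x - f * y for x, y in zip(m[r], m[k])]
    # Back substitution, last equation first.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

print([str(v) for v in solve([[1, 3, 2, 1],
                              [2, 7, 3, 2],
                              [-3, -10, -6, -5]])])   # x = -9, y = 2, z = 2
```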

Exercise:

Solve the systems:

(a) −2a − 2b + 3c = 1

2a − 2b + c = 1

a + b − c = −3

ANS: Solution is: a = −5/2, b = −11/2, c = −5

(b) r + s + 2t = 0

2r + 4s − 3t = 1

3r + 6s − 5t = 0

ANS: Solution is: r = −17, s = 11, t = 3

Example: Find a vector equation for the line which forms the solution set of

x + y − z = 3

2x + y + 2z = 1

(You will recall an example similar to this at the end of lecture 3.)

Writing the augmented matrix of this system and taking the system to row echelon form:

\[ \left[\begin{array}{ccc|c} 1 & 1 & -1 & 3 \\ 2 & 1 & 2 & 1 \end{array}\right] \]

R2 − 2R1 → R2:

\[ \left[\begin{array}{ccc|c} 1 & 1 & -1 & 3 \\ 0 & -1 & 4 & -5 \end{array}\right] \]

(−1)R2 → R2:

\[ \left[\begin{array}{ccc|c} 1 & 1 & -1 & 3 \\ 0 & 1 & -4 & 5 \end{array}\right] \]

Here is a system of equations with an infinite solution set.

Notice that the pivot entries correspond to the variables x and y. The non-pivot variable, z, is said to be free and is set equal to a parameter t.

Let z = t

y − 4z = 5 hence y = 5 + 4z = 5 + 4t

x + y − z = 3 hence x = 3 + z − y = −2 − 3t

(parametric form)

The solution can be written in vector form as

(x, y, z) = (−2 − 3t, 5 + 4t, t) = (−2, 5, 0) + t (−3, 4, 1)

or in algebraic form:

\[ \frac{x + 2}{-3} = \frac{y - 5}{4} = \frac{z - 0}{1}. \]

This shows that the solution is a straight line passing through the point (−2, 5, 0) and with direction vector −3i + 4j + k.

Example: (from the previous lecture) Find an equation of the line of intersection of the

planes −x+ y + z = 2 and x+ 2y = 4.

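Augmented matrix:

\[
\left[\begin{array}{ccc|c} -1 & 1 & 1 & 2 \\ 1 & 2 & 0 & 4 \end{array}\right]
\xrightarrow{R_2 + R_1 \to R_2}
\left[\begin{array}{ccc|c} -1 & 1 & 1 & 2 \\ 0 & 3 & 1 & 6 \end{array}\right]
\quad \text{(now in echelon form)}
\]

z is free, y = 2 − (1/3)z, x = −2 + z + y = (2/3)z.

Set z = 3t; then y = 2 − t, x = 2t and hence (x, y, z) = (0, 2, 0) + t (2, −1, 3). (Compare with the

direction vector found in that example.)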



ENG1091 Consistency of Linear Equations

Lecture 5 · no solution case · infinite solution case

Text Reference: §5.6

The equation systems given in the last lecture were rather special in the sense that they all had

solutions.

An example of this is the equation system

x + 2y = 3

2x − 3y = −8

which consists of two straight lines intersecting in the point (−1, 2).

But of course straight lines do not always intersect. The equation system

x + 2y = 3

2x + 4y = 1

represents two parallel straight lines and has no solution.

The question is how we can detect this systematically:

How do we use Gauss elimination to recognise when a system of

equations has no solution?

Notice what happens when we employ Gauss elimination to solve a system of equations like

x + 2y = 3

2x + 4y = 1

Augmented matrix:

\[
\left[\begin{array}{cc|c} 1 & 2 & 3 \\ 2 & 4 & 1 \end{array}\right]
\]

Converting to row echelon form (one step only, R2 − 2R1 → R2):

\[
\left[\begin{array}{cc|c} 1 & 2 & 3 \\ 0 & 0 & -5 \end{array}\right].
\]

Notice that in the last row all entries left of the partition are zero, and that there is a non-zero

number to the right of the partition. Since it is impossible for 0x + 0y = −5 we know that the

system has no solution.

Definition: A linear system of equations without solution is called inconsistent.

Now of course the previous example didn’t need Gauss elimination to demonstrate its inconsis-

tency. However, a system of 3 equations in 3 unknowns (represented by three planes in space) is

rather more complex. A 3 × 3 system of equations will be inconsistent if any of the following holds:

• the three planes are parallel (and not all the same plane),

• two planes are parallel (and distinct) and are intersected by the third,

• no two of the planes are parallel, but each pair of planes intersects in a line parallel to the other lines of intersection.

Geometrically the situation for higher dimensions (>3 unknowns) is more complex still, but

algebraically it is very easy to sort out provided we apply Gauss elimination.


The great advantage of Gauss elimination is that it takes the guess work out of equation manip-

ulation by being systematic. We can tell whether equations are inconsistent or not by using the

following very simple test:

When the augmented matrix corresponding to a system of inconsistent equations

is converted into a row echelon form, there will be at least one row where

all entries left of the partition are zero and there is a non-zero entry to the right

of the partition.

To put it another way, the row echelon form of an inconsistent linear system will have a row of

type

\[
\left[\begin{array}{ccccc|c} 0 & 0 & 0 & \cdots & 0 & * \end{array}\right]
\]

where ∗ is some non-zero number.

Moreover, the test is completely diagnostic: if no such row exists then the equation system must

have solutions.
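The test is mechanical enough to write as a few lines of Python. The sketch below is illustrative only; the function name `is_inconsistent` is an assumption of this example, and the input is expected to be an augmented matrix that has already been reduced to row echelon form, with the right-hand side stored as the last entry of each row.

```python
def is_inconsistent(echelon):
    """Return True if some row reads [0 0 ... 0 | c] with c non-zero.

    `echelon` is an augmented matrix in row echelon form; the last entry
    of each row is the number to the right of the partition.
    """
    return any(
        all(entry == 0 for entry in row[:-1]) and row[-1] != 0
        for row in echelon
    )

# Echelon form of x + 2y = 3, 2x + 4y = 1 (parallel lines).
parallel_lines = [[1, 2, 3], [0, 0, -5]]
# Echelon form of x + y - z = 3, 2x + y + 2z = 1 (a line of solutions).
line_of_solutions = [[1, 1, -1, 3], [0, 1, -4, 5]]

print(is_inconsistent(parallel_lines))
print(is_inconsistent(line_of_solutions))
```

A row consisting entirely of zeros, including the right-hand side, does not trigger the test; such a row is merely redundant.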

Example: The following partitioned matrices are row echelon forms corresponding to various

systems of linear equations. Which linear systems are inconsistent?

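\[
\text{A. } \left[\begin{array}{ccc|c} 1 & 1 & 2 & 0 \\ 0 & 1 & 13 & 1 \\ 0 & 0 & 0 & 5 \end{array}\right]
\qquad
\text{B. } \left[\begin{array}{cccc|c} 1 & 0 & 2 & 0 & 2 \\ 0 & 1 & -3 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]
\qquad
\text{C. } \left[\begin{array}{ccc|c} 1 & 1 & -3 & 0 \\ 0 & 2 & 1 & 4 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right]
\]

\[
\text{D. } \left[\begin{array}{ccc|c} 2 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]
\qquad
\text{E. } \left[\begin{array}{ccc|c} 1 & 0 & 2 & 0 \\ 0 & 2 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right]
\qquad
\text{F. } \left[\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{array}\right]
\]

\[
\text{G. } \left[\begin{array}{cccccc|c} 1 & 2 & 0 & 1 & -3 & 1 & 0 \\ 0 & 0 & 0 & 1 & 2 & -3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 5 \end{array}\right]
\]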

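Example: Show that the following system of equations is inconsistent by forming its augmented

matrix and then using row operations to convert it to a matrix in row echelon form:

x + 2z = 1

y − z = 0

x + y + z = 2

Solution

Augmented matrix (not in echelon form):

\[
\left[\begin{array}{ccc|c} 1 & 0 & 2 & 1 \\ 0 & 1 & -1 & 0 \\ 1 & 1 & 1 & 2 \end{array}\right]
\xrightarrow{R_3 - R_1 \to R_3}
\left[\begin{array}{ccc|c} 1 & 0 & 2 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 1 & -1 & 1 \end{array}\right]
\xrightarrow{R_3 - R_2 \to R_3}
\left[\begin{array}{ccc|c} 1 & 0 & 2 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right]
\]

The last row, [0 0 0 | 1], indicates inconsistency.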


What is the geometric interpretation of this inconsistent system?

Answer: Since none of the three planes are parallel (why?) we conclude that each pair of planes

intersects in a line parallel to the others.

[Examine the normal vectors (1, 0, 2) , (0, 1,−1) , (1, 1, 1) . Since no two of these is parallel neither

is there a pair of parallel planes.]

A system of linear equations that does not have solutions is said to be inconsistent, so obviously

a consistent system is one that does have solutions.

Now we encounter a remarkable fact: either a consistent linear system has a unique solution

(exactly one solution for each of the unknowns) or else it possesses infinitely many! To put it

another way, if a linear system of equations is known to have two different solutions (say) then

that system must have infinitely many solutions.

2. Systems with infinitely many solutions:

The augmented matrix of the system

x − 3y + z = 1

2x − 6y + 3z = 4

−x + 3y = 1

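reduces to an equivalent row echelon form as follows.

Working: Augmented matrix:

\[
\left[\begin{array}{ccc|c} 1 & -3 & 1 & 1 \\ 2 & -6 & 3 & 4 \\ -1 & 3 & 0 & 1 \end{array}\right]
\xrightarrow[R_3 + R_1 \to R_3]{R_2 - 2R_1 \to R_2}
\left[\begin{array}{ccc|c} 1 & -3 & 1 & 1 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 1 & 2 \end{array}\right]
\xrightarrow{R_3 - R_2 \to R_3}
\left[\begin{array}{ccc|c} 1 & -3 & 1 & 1 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{array}\right]
\quad \text{(row echelon form)}
\]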

The echelon form matrix gives us all the information concerning the original system. First of all

we notice there is no row of the type

\[
\left[\begin{array}{ccccc|c} 0 & 0 & 0 & \cdots & 0 & * \end{array}\right]
\]

where ∗ is non-zero, so we know that the system has solutions.

The third row is entirely zero and in effect is totally redundant. We ignore rows that consist

entirely of zeros.

From 2nd row we have z = 2.

Solving the first row for x we have x = 1− z + 3y = −1 + 3y (since z = 2).

So z = 2 and x = −1 + 3y where the choice for y is completely arbitrary. There are infinitely

many solutions, one for each value of y.

It is customary to assign a parameter to the free variable y. We can then write the solution set


as y = t, x = −1 + 3t, z = 2, where t is arbitrary.

What is the graphical interpretation of this consistent system?

Answer: The three planes x − 3y + z = 1, 2x − 6y + 3z = 4, and −x + 3y = 1 intersect

in a straight line in 3D space. This line has a vector equation (x, y, z) = (−1 + 3t, t, 2) =

(−1, 0, 2)+t (3, 1, 0) , and therefore passes through the point (−1, 0, 2) and points in the direction

of the vector 3i+ j+ 0k.

Example: Solve the 3× 4 system of linear equations:

2x+ y + z + w = 4

4x+ y + 3z + 2w = 7

−2x+ 2y + z − w = 9

Solution:

We write the system in augmented matrix form and use elementary row operations to convert

the system to an equivalent one in echelon form. (Gauss elimination.)

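Augmented matrix:

\[
[A \mid b] =
\left[\begin{array}{cccc|c} 2 & 1 & 1 & 1 & 4 \\ 4 & 1 & 3 & 2 & 7 \\ -2 & 2 & 1 & -1 & 9 \end{array}\right]
\]

\[
\xrightarrow{R_2 - 2R_1 \to R_2}
\left[\begin{array}{cccc|c} 2 & 1 & 1 & 1 & 4 \\ 0 & -1 & 1 & 0 & -1 \\ -2 & 2 & 1 & -1 & 9 \end{array}\right]
\xrightarrow{R_3 + R_1 \to R_3}
\left[\begin{array}{cccc|c} 2 & 1 & 1 & 1 & 4 \\ 0 & -1 & 1 & 0 & -1 \\ 0 & 3 & 2 & 0 & 13 \end{array}\right]
\xrightarrow{R_3 + 3R_2 \to R_3}
\left[\begin{array}{cccc|c} 2 & 1 & 1 & 1 & 4 \\ 0 & -1 & 1 & 0 & -1 \\ 0 & 0 & 5 & 0 & 10 \end{array}\right]
\]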

This time the pivot variables are x, y and z (since the pivot entries occur in columns 1,2, and 3,

corresponding to the variables x, y, z).

The free variable is w.

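w = free = t (say)

from row 3: 5z = 10 ∴ z = 2

from row 2: −y + z = −1 ∴ y = z + 1 = 3

from row 1: 2x + y + z + w = 4 ∴ x = 2 − (1/2)z − (1/2)y − (1/2)w = −1/2 − (1/2)t

Writing the solutions in vector form:

\[
\langle x, y, z, w \rangle
= \left\langle -\tfrac{1}{2} - \tfrac{1}{2}t,\ 3,\ 2,\ t \right\rangle
= \left\langle -\tfrac{1}{2},\ 3,\ 2,\ 0 \right\rangle + t \left\langle -\tfrac{1}{2},\ 0,\ 0,\ 1 \right\rangle .
\]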
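Reading off which unknowns are pivot variables and which are free can also be automated. The following is a small sketch under stated assumptions: the function name `pivot_and_free_columns` is invented for this illustration, columns are numbered from 0, and the input must already be an augmented matrix in row echelon form.

```python
def pivot_and_free_columns(echelon):
    """Split the unknowns of a row echelon form into pivot and free columns.

    The last entry of each row is the right-hand side and is ignored.
    Column indices start at 0, so column 0 is the first unknown.
    """
    n_unknowns = len(echelon[0]) - 1
    pivots = []
    for row in echelon:
        for col in range(n_unknowns):
            if row[col] != 0:          # first non-zero entry of the row is its pivot
                pivots.append(col)
                break
    free = [col for col in range(n_unknowns) if col not in pivots]
    return pivots, free

# Echelon form of the 3 x 4 example above (unknowns x, y, z, w).
print(pivot_and_free_columns([[2, 1, 1, 1, 4], [0, -1, 1, 0, -1], [0, 0, 5, 0, 10]]))
# Echelon form of the earlier 3 x 3 example (unknowns x, y, z).
print(pivot_and_free_columns([[1, -3, 1, 1], [0, 0, 1, 2], [0, 0, 0, 0]]))
```

In the first case the only free column is the last unknown w; in the second it is the middle unknown y, matching the parameter choices made in the worked solutions.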


Exercise:

The row echelon form of a system with unknowns r, s, t, and u, is

\[
\left[\begin{array}{cccc|c} 1 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]
\]

Describe the solutions of the system.

ANS: infinite number of solutions with s and u free. Row 2 gives t = 1 − u and row 1 gives r = 1 − s − u,

∴ (r, s, t, u) = (1 − s − u, s, 1 − u, u) where s, u are arbitrary.

Exercises: Solve the following systems of linear equations:

(a)

x − y − 2z = 3

x + 2y − z = 0

2x − y + z = 5

x − y − z = 3

ANS: unique solution x = 2, y = −1, z = 0

(b)

x + y + z = 2

x − y + z = 1

2x + 2z = 4

ANS: no solution

(c)

−a + b + c + 2d + e = 0

a − c + d − e = 1

2b + c − d − 2e = −1

ANS: infinite solution set where d and e are free.

Solving for a, b, c we get a = −2 + 6d+ 3e, b = 1− 3d, c = −3 + 7d+ 2e

i.e. (a, b, c, d, e) = (−2 + 6d+ 3e, 1− 3d,−3 + 7d+ 2e, d, e)

= (−2, 1,−3, 0, 0) + (6d,−3d, 7d, d, 0) + (3e, 0, 2e, 0, e)

= (−2, 1,−3, 0, 0) + d (6,−3, 7, 1, 0) + e (3, 0, 2, 0, 1)

showing that the solution set is a plane in 5D space



ENG1091 Matrices

Lecture 6 · matrices · matrix arithmetic

Text Reference: §5.1-5.2

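A matrix is a rectangular arrangement of numbers or variables, which can be either real or

complex, enclosed in square brackets. It is usual to denote matrices using capital letters. For

example:

\[
A = \begin{bmatrix} -2 & \tfrac{3}{4} & 5 & 0 \\ 7 & \pi & \sqrt{2} & -1 \\ -0.18 & 7 & 20 & -78 \end{bmatrix}
\]

is a matrix, whereas the arrangement

\[
B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & \end{bmatrix}
\]

is not a matrix, because its rows do not all have the same number of entries.

A matrix has rows, running left to right, and columns, running from top to bottom.

The matrix A has three rows and four columns and consists of 12 entries.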

• A matrix with m rows and n columns is called an m × n matrix; the matrix A in the

example is a 3 × 4 matrix.

• The position of each entry is determined by the row and column numbers. We use

subindices to indicate this, for example,

a24 is the entry in row 2, column 4. In matrix A, a24 = −1.

a13 is the entry in row 1, column 3. In matrix A, a13 = 5.

We use the notation A = [aij] to indicate that A is a matrix (hence the square brackets) whose

entries are generically indicated as aij. The notation A = [aij]m×n means that A is an m × n matrix.

Some special matrices

1. A 1 × n matrix is a row matrix or row vector, e.g. \( \begin{bmatrix} 1 & 2 & 4 & 3 \end{bmatrix} \) is a 1 × 4 row vector.

2. An m × 1 matrix is a column matrix or column vector; e.g.

\[
\mathbf{x} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
\]

is a 3 × 1 column vector. Column matrices are usually identified with ordinary vectors and for this reason it is common to use lower case boldface letters to denote them.

3. A matrix with the same number of rows and columns is called a square matrix; e.g. \( \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \) is a 2 × 2 matrix.

4. A zero or null matrix contains all zero entries. That is 0 = [0ij] where 0ij = 0 for all i and j.

Operations with matrices


1. Addition and subtraction

Addition and subtraction are possible only between matrices of the same order. These

are performed by adding or subtracting the corresponding entries respectively.

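Example:

\[
\begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -4 & 8 \end{bmatrix}
+ \begin{bmatrix} 7 & 12 \\ 6 & 1 \\ -3 & 5 \end{bmatrix}
= \begin{bmatrix} 8 & 11 \\ 9 & 6 \\ -7 & 13 \end{bmatrix}
\]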

The addition of matrices is commutative i.e. A+B = B +A.

2. Multiplication by scalars

Given a matrix A and a number k, the multiplication of A by the scalar k is obtained

by multiplying each entry of A by k.

For example let k = 3 and

\[
A = \begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -4 & 8 \end{bmatrix},
\quad \text{then} \quad
3A = 3 \begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -4 & 8 \end{bmatrix}
= \begin{bmatrix} 3 & -3 \\ 9 & 15 \\ -12 & 24 \end{bmatrix}
\]

Note that subtraction can be expressed in terms of a multiplication by the scalar k = −1 and an

addition: A − B = A + (−1)B.

For any matrix A, A − A = 0.

3. Multiplication

Two matrices A and B can be multiplied together only when the number of columns in

A equals the number of rows in B. To find the ij entry in the product AB we multiply

the entries along the ith row of A pairwise with entries on the jth column of B and then

add:

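\[
A = \begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -4 & 8 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & -1 & 3 \\ 2 & 4 & -2 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
\]

(a)

\[
AB = \begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -4 & 8 \end{bmatrix}
\begin{bmatrix} 1 & -1 & 3 \\ 2 & 4 & -2 \end{bmatrix}
= \begin{bmatrix}
1(1) + (-1)(2) & 1(-1) + (-1)(4) & 1(3) + (-1)(-2) \\
3(1) + 5(2) & 3(-1) + 5(4) & 3(3) + 5(-2) \\
-4(1) + 8(2) & -4(-1) + 8(4) & -4(3) + 8(-2)
\end{bmatrix}
= \begin{bmatrix} -1 & -5 & 5 \\ 13 & 17 & -1 \\ 12 & 36 & -28 \end{bmatrix}
\]

(b) AC is not defined (A has 2 columns but C has 3 rows).

(c)

\[
BA = \begin{bmatrix} 1 & -1 & 3 \\ 2 & 4 & -2 \end{bmatrix}
\begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -4 & 8 \end{bmatrix}
= \begin{bmatrix}
1(1) + (-1)(3) + 3(-4) & 1(-1) + (-1)(5) + 3(8) \\
2(1) + 4(3) + (-2)(-4) & 2(-1) + 4(5) + (-2)(8)
\end{bmatrix}
= \begin{bmatrix} -14 & 18 \\ 22 & 2 \end{bmatrix}
\]

This example demonstrates something very important: matrix multiplication is not usually

commutative, i.e. AB ≠ BA in general.

In fact AB and BA need not be of the same order, and even if one product AB is defined,

the other product, BA, need not be.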

In general if A = [aij]m×p and B = [bij]p×n then AB is defined and AB = C = [cij]m×n where

\[
c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ip}b_{pj} = \sum_{k=1}^{p} a_{ik}b_{kj}.
\]

To illustrate: the entry cij of the m × n product is built from the ith row of the m × p matrix A and the jth column of the p × n matrix B,

\[
c_{ij} =
\begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{ip} \end{bmatrix}
\begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{pj} \end{bmatrix}
= a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ip}b_{pj}.
\]
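The entry formula translates directly into code. The sketch below is an illustration only (the function name `matmul` and the nested-list representation are choices made here, not part of the notes); it forms each entry as the sum over k of a_ik b_kj and refuses to multiply when the sizes do not match.

```python
def matmul(A, B):
    """Product of an m x p matrix A and a p x n matrix B, as nested lists."""
    p = len(B)
    if len(A[0]) != p:
        raise ValueError("AB is not defined: columns of A must equal rows of B")
    # c_ij = a_i1 b_1j + a_i2 b_2j + ... + a_ip b_pj
    return [
        [sum(A[i][k] * B[k][j] for k in range(p)) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

A = [[1, -1], [3, 5], [-4, 8]]      # 3 x 2
B = [[1, -1, 3], [2, 4, -2]]        # 2 x 3
C = [[1], [2], [3]]                 # 3 x 1

print(matmul(A, B))   # 3 x 3 product AB
print(matmul(B, A))   # 2 x 2 product BA

try:
    matmul(A, C)
except ValueError as err:
    print(err)
```

The two printed products have different sizes, which is one concrete way to see that AB and BA are not interchangeable.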

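4. Examples:

A 2 × 2 matrix times a 2 × 2 matrix gives a 2 × 2 matrix:

\[
\begin{bmatrix} 2 & 3 \\ -1 & 5 \end{bmatrix}
\begin{bmatrix} -1 & 2 \\ -2 & 3 \end{bmatrix}
= \begin{bmatrix} 2(-1) + 3(-2) & 2(2) + 3(3) \\ -1(-1) + 5(-2) & -1(2) + 5(3) \end{bmatrix}
= \begin{bmatrix} -8 & 13 \\ -9 & 13 \end{bmatrix}
\]

A 2 × 3 matrix times a 3 × 3 matrix gives a 2 × 3 matrix:

\[
\begin{bmatrix} 1 & -2 & 3 \\ -4 & 5 & 6 \end{bmatrix}
\begin{bmatrix} -1 & 1 & 3 \\ 0 & 2 & -1 \\ 3 & 5 & 4 \end{bmatrix}
= \begin{bmatrix} -1 + 0 + 9 & 1 - 4 + 15 & 3 + 2 + 12 \\ 4 + 0 + 18 & -4 + 10 + 30 & -12 - 5 + 24 \end{bmatrix}
= \begin{bmatrix} 8 & 12 & 17 \\ 22 & 36 & 7 \end{bmatrix}
\]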


5. Important:

• We stress again that to be able to perform the matrix product AB there is a size

restriction:

the number of columns in A (the matrix on the left) must equal the number of rows

in B (the second matrix in the product). We then say that AB is defined.

• If A is an m × p matrix, and B is a p × n matrix, then AB is an m × n matrix.

6. Properties of matrix multiplication

If A, B, and C are matrices of appropriate sizes, and k is a scalar then:

• A(B + C) = AB + AC

• (B + C)A = BA + CA

• (AB)C = A(BC)

• k(AB) = (kA)B = A(kB)

• AB ≠ BA in general.

Exercises

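1. Find the following product of matrices:

\[
\begin{bmatrix} 2 & 1 \\ -3 & 5 \end{bmatrix}
\begin{bmatrix} 3 & -1 \\ -2 & 4 \end{bmatrix}
= \begin{bmatrix} 4 & 2 \\ -19 & 23 \end{bmatrix}
\]

2. The product in the reverse order, although possible, leads to a different matrix:

\[
\begin{bmatrix} 3 & -1 \\ -2 & 4 \end{bmatrix}
\begin{bmatrix} 2 & 1 \\ -3 & 5 \end{bmatrix}
= \begin{bmatrix} 9 & -2 \\ -16 & 18 \end{bmatrix}
\]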

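3. Given

\[
A = \begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix}, \quad
B = \begin{bmatrix} 0 \\ 7 \\ 8 \end{bmatrix}, \quad
C = \begin{bmatrix} 2 & 4 & 6 \\ 8 & 10 & 12 \end{bmatrix}, \quad
D = \begin{bmatrix} 9 & 8 & 7 & 6 \\ 5 & 4 & 3 & 2 \\ 1 & 0 & 9 & 8 \end{bmatrix}
\]

determine which of the following are defined and give their sizes (orders).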

(a) AB not defined

(b) AC 2× 3

(c) CD 2× 4

(d) AD not defined

(e) DC not defined

(f) CB 2× 1

(g) BC not defined

(h) (AC)D 2× 4

(i) A(CD) 2× 4



ENG1091 Matrices

Lecture 7 · transpose · matrix inverses

Text Reference: §5.4

The transpose of a matrix

The transpose of a matrix is obtained by interchanging its rows and columns. That is, the entries

of the ith row become the entries of the ith column.

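So, if A is an m × n matrix, then its transpose, denoted A^T, is an n × m matrix.

Example:

Let A be the 3 × 2 matrix

\[
A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \\ 5 & 8 \end{bmatrix}.
\]

Then A^T is the 2 × 3 matrix:

\[
A^T = \begin{bmatrix} 1 & 3 \\ 2 & 4 \\ 5 & 8 \end{bmatrix}^T
= \begin{bmatrix} 1 & 2 & 5 \\ 3 & 4 & 8 \end{bmatrix}.
\]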

Transpose of a product

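If A and B are such that AB is defined then (AB)^T = B^T A^T.

In words: the transpose of a matrix product is equal to the product of the individual transposes

taken in the reverse order.

First we illustrate this with an example.

Let

\[
A = \begin{bmatrix} 1 & -2 \\ 3 & 0 \end{bmatrix}, \quad
B = \begin{bmatrix} 2 & -1 & 4 \\ 0 & 1 & 3 \end{bmatrix};
\]

we have

\[
AB = \begin{bmatrix} 1 & -2 \\ 3 & 0 \end{bmatrix}
\begin{bmatrix} 2 & -1 & 4 \\ 0 & 1 & 3 \end{bmatrix}
= \begin{bmatrix} 2 & -3 & -2 \\ 6 & -3 & 12 \end{bmatrix}
\]

so that

\[
(AB)^T = \begin{bmatrix} 2 & -3 & -2 \\ 6 & -3 & 12 \end{bmatrix}^T
= \begin{bmatrix} 2 & 6 \\ -3 & -3 \\ -2 & 12 \end{bmatrix}
\]

On the other hand

\[
B^T A^T = \begin{bmatrix} 2 & 0 \\ -1 & 1 \\ 4 & 3 \end{bmatrix}
\begin{bmatrix} 1 & 3 \\ -2 & 0 \end{bmatrix}
= \begin{bmatrix} 2 & 6 \\ -3 & -3 \\ -2 & 12 \end{bmatrix}
\]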

Now we give a proof to show why (AB)^T = B^T A^T is always true.

Let A = [aij]m×p and B = [bij]p×n.

Then AB is an m × n matrix and for any i = 1, ..., m and j = 1, ..., n we have

\[
(AB)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ip}b_{pj} = \sum_{k=1}^{p} a_{ik}b_{kj}.
\]

Now the (i, j) entry of (AB)^T is the (j, i) entry of AB; this is found by swapping

i's and j's in the formula for (AB)ij:

\[
\left((AB)^T\right)_{ij} = \sum_{k=1}^{p} a_{jk}b_{ki}
= a_{j1}b_{1i} + a_{j2}b_{2i} + \cdots + a_{jp}b_{pi}
= b_{1i}a_{j1} + b_{2i}a_{j2} + \cdots + b_{pi}a_{jp}
= \sum_{k=1}^{p} b_{ki}a_{jk}
\]

= the sum of products found by multiplying, term by term, the

ith row of B^T with the jth column of A^T, and this is the (i, j)

entry of B^T A^T.
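The identity can also be checked numerically on the example used above. This is a quick sanity check rather than a proof; the helper names `transpose` and `matmul` are introduced only for this sketch.

```python
def transpose(M):
    """Interchange rows and columns of a nested-list matrix."""
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Matrix product; each entry is a row of A times a column of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, -2], [3, 0]]
B = [[2, -1, 4], [0, 1, 3]]

lhs = transpose(matmul(A, B))              # (AB)^T
rhs = matmul(transpose(B), transpose(A))   # B^T A^T
print(lhs)
print(lhs == rhs)
```

Note that the product in the other order, A^T B^T, is not even defined here, since A^T is 2 × 2 and B^T is 3 × 2.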

Special matrices

Some types of matrices that are particularly important are given below. This list is not exhaus-

tive.

• A symmetric matrix is one which is equal to its transpose: e.g.

\[
\begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}
\]

• Diagonal matrices are square matrices where any non-zero entries occur on the main diagonal: e.g.

\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]

• Identity matrices are diagonal matrices where the main diagonal entries are all 1's. For example

\[
I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]

If the size of the identity matrix can be understood from the context, or is irrelevant, the

symbol I is used.

If A is a square matrix and I is the identity matrix the same size as A, then AI = IA = A.

Identity matrices play a role analogous to the number 1 in ordinary arithmetic.

The inverse of a matrix

Definition: The inverse of a square n × n matrix A is an n × n matrix B (if one exists) such that AB = BA = I, where I is the n × n identity matrix.

Note: If such a B exists it is unique and we write it as A^{-1}.

Warning: A^{-1} does not mean 1/A.

If A has an inverse, then we say that matrix A is invertible or non-singular.

We can calculate A^{-1} by forming the augmented matrix consisting of A and I, the identity

matrix. We then apply a systematic sequence of row operations until, if possible, A becomes I.

Because exactly the same sequence of row operations is applied to I, in the process I will

have become A^{-1}.

Schematically:

\[
[A \mid I] \xrightarrow{\text{row operations}} [I \mid A^{-1}]
\]

Recall that elementary row operations are:

1. Interchanging two rows.

2. Multiplying a row by a non-zero scalar.

3. Adding to one row a multiple of another.

STAGE 1: Forward elimination process

1. Let C = [A | I] .

2. If there is a stage where C has a column consisting entirely of zeros, we stop immediately:

A has no inverse.

3. Ensure that the top left entry of C is a non-zero entry, which we will label as a. (If necessary,

interchange the top row with another row to achieve this.)

4. Multiply this row by 1/a so that the first non-zero entry of this row is 1. This entry is the

pivot for that column. (Alternatively this can sometimes be effected by row interchange.)

5. Add a suitable multiple of this first row to the rows below so that all entries in the

column below the pivot become 0.

If there is a stage where the sub-matrix of C left of the partition has a row consisting

entirely of zeros, we stop immediately: the matrix A has no inverse.

6. Consider the submatrix of C found by removing its 1st row and 1st column, regard this

as a new matrix C. Repeat steps 2-6 until the next submatrix under consideration has no

rows left.

7. Provided the algorithm has not been exited at steps 2 or 5 the full matrix is now in echelon

form. The pivots are all 1 and located on the main diagonal of the matrix left of the

partition.

STAGE 2: Backward elimination process

1. Notice that all pivots are 1 and are located on the main diagonal of the matrix left of the

partition. Locate the row containing the right-most pivot, (which must be in the bottom

row).

2. Add suitable multiples of this row to the rows above so that all entries in the column above

become 0.

3. Locate the next pivot by moving up the diagonal and repeat steps 2 and 3.

4. This procedure is repeated until the top left pivot is reached, at which point the full matrix

is [I | A^{-1}].
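For readers who want to experiment, the idea [A | I] → [I | A^{-1}] can be sketched in Python. This is an assumption-laden illustration rather than a transcription of the two stages above: the function name `inverse` is invented here, exact fractions are used, and forward and backward elimination are merged into a single pass (the Gauss-Jordan variant), clearing each pivot column above and below as soon as the pivot has been scaled to 1.

```python
from fractions import Fraction

def inverse(A):
    """Return the inverse of a square matrix A, or None if A has no inverse."""
    n = len(A)
    # Build C = [A | I] with exact rational entries.
    C = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row with a non-zero entry in this column (interchange if needed).
        pivot = next((r for r in range(col, n) if C[r][col] != 0), None)
        if pivot is None:
            return None                      # no pivot available: A is singular
        C[col], C[pivot] = C[pivot], C[col]
        # Scale the pivot row so that the pivot equals 1.
        p = C[col][col]
        C[col] = [v / p for v in C[col]]
        # Clear every other entry in the pivot column.
        for r in range(n):
            if r != col:
                f = C[r][col]
                C[r] = [x - f * y for x, y in zip(C[r], C[col])]
    # The right-hand block is now the inverse.
    return [row[n:] for row in C]

print(inverse([[2, 4], [1, 3]]))
print(inverse([[1, 1, -1], [1, 1, 0], [1, 1, 1]]))
```

The second call returns None because, after the first column is cleared, no non-zero entry is left to serve as the pivot of the second column.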


Examples:

Find inverses of the following (if they exist).

1. A =

0 1 1

109

0√

2 −4

0 3 π

, A has a column of zeros and hence no inverse.

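2.

\[
A = \begin{bmatrix} 1 & 1 & -1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}
\]

Step 1. We form

\[
C = [A \mid I] =
\left[\begin{array}{ccc|ccc} 1 & 1 & -1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 \end{array}\right]
\]

Steps 3-4. We note that C has a pivot in the top left entry and that this pivot is 1.

Subtract row 1 from row 2:

\[
\left[\begin{array}{ccc|ccc} 1 & 1 & -1 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 \end{array}\right]
\]

Subtract row 1 from row 3:

\[
\left[\begin{array}{ccc|ccc} 1 & 1 & -1 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 1 & 0 \\ 0 & 0 & 2 & -1 & 0 & 1 \end{array}\right]
\]

Step 5 is now complete.

Step 6. We apply the algorithm again to the submatrix of C found by deleting its 1st row and 1st column, that is, to rows 2 and 3 of

\[
\left[\begin{array}{ccc|ccc} 1 & 1 & -1 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 1 & 0 \\ 0 & 0 & 2 & -1 & 0 & 1 \end{array}\right]
\]

with the first column removed. Since this new matrix has a column of zeros (its first column), we conclude that the matrix

\[
\begin{bmatrix} 1 & 1 & -1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}
\]

has no inverse. (Exiting the algorithm at step 2.)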


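3. Find the inverse of

\[
A = \begin{bmatrix} 2 & 7 & 1 \\ 1 & 4 & -1 \\ 1 & 3 & 0 \end{bmatrix}
\]

Solution:

\[
[A \mid I] =
\left[\begin{array}{ccc|ccc} 2 & 7 & 1 & 1 & 0 & 0 \\ 1 & 4 & -1 & 0 & 1 & 0 \\ 1 & 3 & 0 & 0 & 0 & 1 \end{array}\right]
\]

\[
\xrightarrow{R_1 \leftrightarrow R_2}
\left[\begin{array}{ccc|ccc} 1 & 4 & -1 & 0 & 1 & 0 \\ 2 & 7 & 1 & 1 & 0 & 0 \\ 1 & 3 & 0 & 0 & 0 & 1 \end{array}\right]
\xrightarrow{R_2 - 2R_1 \to R_2}
\left[\begin{array}{ccc|ccc} 1 & 4 & -1 & 0 & 1 & 0 \\ 0 & -1 & 3 & 1 & -2 & 0 \\ 1 & 3 & 0 & 0 & 0 & 1 \end{array}\right]
\]

\[
\xrightarrow{R_3 - R_1 \to R_3}
\left[\begin{array}{ccc|ccc} 1 & 4 & -1 & 0 & 1 & 0 \\ 0 & -1 & 3 & 1 & -2 & 0 \\ 0 & -1 & 1 & 0 & -1 & 1 \end{array}\right]
\xrightarrow{(-1)R_2 \to R_2}
\left[\begin{array}{ccc|ccc} 1 & 4 & -1 & 0 & 1 & 0 \\ 0 & 1 & -3 & -1 & 2 & 0 \\ 0 & -1 & 1 & 0 & -1 & 1 \end{array}\right]
\]

\[
\xrightarrow{R_2 + R_3 \to R_3}
\left[\begin{array}{ccc|ccc} 1 & 4 & -1 & 0 & 1 & 0 \\ 0 & 1 & -3 & -1 & 2 & 0 \\ 0 & 0 & -2 & -1 & 1 & 1 \end{array}\right]
\xrightarrow{\left(-\frac{1}{2}\right)R_3 \to R_3}
\left[\begin{array}{ccc|ccc} 1 & 4 & -1 & 0 & 1 & 0 \\ 0 & 1 & -3 & -1 & 2 & 0 \\ 0 & 0 & 1 & \tfrac{1}{2} & -\tfrac{1}{2} & -\tfrac{1}{2} \end{array}\right]
\]

\[
\xrightarrow{R_2 + 3R_3 \to R_2}
\left[\begin{array}{ccc|ccc} 1 & 4 & -1 & 0 & 1 & 0 \\ 0 & 1 & 0 & \tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{3}{2} \\ 0 & 0 & 1 & \tfrac{1}{2} & -\tfrac{1}{2} & -\tfrac{1}{2} \end{array}\right]
\xrightarrow{R_1 + R_3 \to R_1}
\left[\begin{array}{ccc|ccc} 1 & 4 & 0 & \tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{1}{2} \\ 0 & 1 & 0 & \tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{3}{2} \\ 0 & 0 & 1 & \tfrac{1}{2} & -\tfrac{1}{2} & -\tfrac{1}{2} \end{array}\right]
\]

\[
\xrightarrow{R_1 - 4R_2 \to R_1}
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -\tfrac{3}{2} & -\tfrac{3}{2} & \tfrac{11}{2} \\ 0 & 1 & 0 & \tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{3}{2} \\ 0 & 0 & 1 & \tfrac{1}{2} & -\tfrac{1}{2} & -\tfrac{1}{2} \end{array}\right]
= [I \mid A^{-1}].
\]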

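Hence

\[
A^{-1} = \begin{bmatrix} -\tfrac{3}{2} & -\tfrac{3}{2} & \tfrac{11}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{3}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} & -\tfrac{1}{2} \end{bmatrix}
= \frac{1}{2} \begin{bmatrix} -3 & -3 & 11 \\ 1 & 1 & -3 \\ 1 & -1 & -1 \end{bmatrix}
\]

Check:

\[
\frac{1}{2} \begin{bmatrix} -3 & -3 & 11 \\ 1 & 1 & -3 \\ 1 & -1 & -1 \end{bmatrix}
\begin{bmatrix} 2 & 7 & 1 \\ 1 & 4 & -1 \\ 1 & 3 & 0 \end{bmatrix}
= \frac{1}{2} \begin{bmatrix} -6 - 3 + 11 & -21 - 12 + 33 & -3 + 3 + 0 \\ 2 + 1 - 3 & 7 + 4 - 9 & 1 - 1 + 0 \\ 2 - 1 - 1 & 7 - 4 - 3 & 1 + 1 + 0 \end{bmatrix}
= \frac{1}{2} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]

Strictly speaking we should also check that

\[
\begin{bmatrix} 2 & 7 & 1 \\ 1 & 4 & -1 \\ 1 & 3 & 0 \end{bmatrix}
\begin{bmatrix} -\tfrac{3}{2} & -\tfrac{3}{2} & \tfrac{11}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{3}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} & -\tfrac{1}{2} \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\]

however it is a known fact for square matrices that a left inverse is also a right inverse (and vice versa),

so a one-sided check is sufficient.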


Inverses of 2× 2 matrices

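Example: find the inverse of the matrix \( \begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix} \).

Solution:

\[
[A \mid I] =
\left[\begin{array}{cc|cc} 2 & 4 & 1 & 0 \\ 1 & 3 & 0 & 1 \end{array}\right]
\xrightarrow{\frac{1}{2}R_1 \to R_1}
\left[\begin{array}{cc|cc} 1 & 2 & \tfrac{1}{2} & 0 \\ 1 & 3 & 0 & 1 \end{array}\right]
\xrightarrow{R_2 - R_1 \to R_2}
\left[\begin{array}{cc|cc} 1 & 2 & \tfrac{1}{2} & 0 \\ 0 & 1 & -\tfrac{1}{2} & 1 \end{array}\right]
\xrightarrow{R_1 - 2R_2 \to R_1}
\left[\begin{array}{cc|cc} 1 & 0 & \tfrac{3}{2} & -2 \\ 0 & 1 & -\tfrac{1}{2} & 1 \end{array}\right].
\]

Hence

\[
\begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix}^{-1}
= \begin{bmatrix} \tfrac{3}{2} & -2 \\ -\tfrac{1}{2} & 1 \end{bmatrix}.
\]

However there is a simple formula for the inverse of 2 × 2 matrices.

The inverse of a 2 × 2 matrix:

Let \( A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \), and suppose ad − bc ≠ 0. Then A is invertible and

\[
A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.
\]

The number ad − bc is called the determinant of A and is denoted by \( \begin{vmatrix} a & b \\ c & d \end{vmatrix} \) or det(A).

The determinant of any square matrix A is also defined (see next lecture) and this number

determines whether or not A is invertible:

A square matrix A is invertible if and only if its determinant is non-zero.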
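The 2 × 2 formula is short enough to code directly. The sketch below is illustrative only; the function name `inverse_2x2` is an assumption of this example, and integer entries are assumed so that exact fractions can be used.

```python
from fractions import Fraction

def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] for integer entries, or None if ad - bc = 0."""
    det = a * d - b * c
    if det == 0:
        return None                    # determinant zero: not invertible
    # Swap a and d, negate b and c, divide everything by the determinant.
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

print(inverse_2x2(2, 4, 1, 3))   # the matrix inverted by row operations above
print(inverse_2x2(1, 2, 2, 4))   # determinant 1*4 - 2*2 = 0
```

The first result agrees with the inverse found by row operations; the second call shows the failure case singled out by the determinant condition.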

Using matrix methods to solve linear systems of equations

Consider the 3 × 3 linear system:

2x1 + 7x2 + x3 = 1

x1 + 4x2 − x3 = 4

x1 + 3x2 = 5

which can also be written in matrix form

\[
\begin{bmatrix} 2 & 7 & 1 \\ 1 & 4 & -1 \\ 1 & 3 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}.
\]

Any n × n linear system can be written in the form Ax = b, where x and b are column vectors

(matrices).

If A is invertible we can multiply on the left by A^{-1} and so obtain the unknown

matrix x:

Ax = b

A^{-1}Ax = A^{-1}b

Ix = A^{-1}b

giving x = A^{-1}b

This method is somewhat more restrictive than Gaussian elimination. It only works

for n × n systems: it produces a unique solution when det(A) ≠ 0, but it is

incapable of distinguishing between the no solution and infinite solution cases which

occur when det(A) = 0.

The main advantage of using the matrix inverse method occurs when working with multiple

systems of equations that share the same set of coefficients.
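To illustrate the reuse of one inverse for several right-hand sides, here is a short Python sketch. It takes the inverse of the coefficient matrix [2 7 1; 1 4 −1; 1 3 0] found earlier in this lecture as given and applies it to two right-hand sides; the helper name `apply` is made up for this illustration.

```python
from fractions import Fraction

# A^{-1} = (1/2) [[-3, -3, 11], [1, 1, -3], [1, -1, -1]], as computed earlier.
A_inv = [[Fraction(v, 2) for v in row]
         for row in [[-3, -3, 11], [1, 1, -3], [1, -1, -1]]]

def apply(M, b):
    """Multiply the matrix M by the column vector b."""
    return [sum(m * x for m, x in zip(row, b)) for row in M]

x_a = apply(A_inv, [1, 4, 5])     # right-hand side (1, 4, 5)
x_b = apply(A_inv, [-2, 4, 6])    # right-hand side (-2, 4, 6)
print(x_a)
print(x_b)
```

Once the inverse is known, each additional right-hand side costs only one matrix-vector multiplication, which is the saving the paragraph above refers to.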

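Example:

Solve:

(a)

2x1 + 7x2 + x3 = 1

x1 + 4x2 − x3 = 4

x1 + 3x2 = 5

and (b)

2x1 + 7x2 + x3 = −2

x1 + 4x2 − x3 = 4

x1 + 3x2 = 6

In (a) we have

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 2 & 7 & 1 \\ 1 & 4 & -1 \\ 1 & 3 & 0 \end{bmatrix}^{-1}
\begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix},
\]

and in (b)

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 2 & 7 & 1 \\ 1 & 4 & -1 \\ 1 & 3 & 0 \end{bmatrix}^{-1}
\begin{bmatrix} -2 \\ 4 \\ 6 \end{bmatrix}.
\]

Now

\[
\begin{bmatrix} 2 & 7 & 1 \\ 1 & 4 & -1 \\ 1 & 3 & 0 \end{bmatrix}^{-1}
= \frac{1}{2} \begin{bmatrix} -3 & -3 & 11 \\ 1 & 1 & -3 \\ 1 & -1 & -1 \end{bmatrix}
\quad \text{(shown above)},
\]

giving the solution to (a):

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \frac{1}{2} \begin{bmatrix} -3 & -3 & 11 \\ 1 & 1 & -3 \\ 1 & -1 & -1 \end{bmatrix}
\begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}
= \begin{bmatrix} 20 \\ -5 \\ -4 \end{bmatrix}
\]

and to (b):

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \frac{1}{2} \begin{bmatrix} -3 & -3 & 11 \\ 1 & 1 & -3 \\ 1 & -1 & -1 \end{bmatrix}
\begin{bmatrix} -2 \\ 4 \\ 6 \end{bmatrix}
= \begin{bmatrix} 30 \\ -8 \\ -6 \end{bmatrix}
\]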

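Exercise: Solve the following system of equations using matrix inversion followed by matrix

multiplication:

2x + 3y = 7

4x + y = 3

In matrix form:

\[
\begin{bmatrix} 2 & 3 \\ 4 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} 7 \\ 3 \end{bmatrix}
\]

If \( \begin{bmatrix} 2 & 3 \\ 4 & 1 \end{bmatrix}^{-1} \) exists we may write

\[
\begin{bmatrix} 2 & 3 \\ 4 & 1 \end{bmatrix}^{-1}
\begin{bmatrix} 2 & 3 \\ 4 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} 2 & 3 \\ 4 & 1 \end{bmatrix}^{-1}
\begin{bmatrix} 7 \\ 3 \end{bmatrix}
\]

Now \( \begin{vmatrix} 2 & 3 \\ 4 & 1 \end{vmatrix} = 2 - 12 = -10 \neq 0 \) so \( \begin{bmatrix} 2 & 3 \\ 4 & 1 \end{bmatrix}^{-1} \) exists.

The formula for the inverse of a 2 × 2 matrix gives

\[
\begin{bmatrix} 2 & 3 \\ 4 & 1 \end{bmatrix}^{-1}
= \frac{1}{2 - 12} \begin{bmatrix} 1 & -3 \\ -4 & 2 \end{bmatrix}
= \begin{bmatrix} -\tfrac{1}{10} & \tfrac{3}{10} \\ \tfrac{2}{5} & -\tfrac{1}{5} \end{bmatrix}
\]

So

\[
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} -\tfrac{1}{10} & \tfrac{3}{10} \\ \tfrac{2}{5} & -\tfrac{1}{5} \end{bmatrix}
\begin{bmatrix} 7 \\ 3 \end{bmatrix}
= \begin{bmatrix} \tfrac{1}{5} \\ \tfrac{11}{5} \end{bmatrix};
\]

giving x = 1/5 and y = 11/5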



ENG1091 Determinants

Lecture 8 · Determinants · Cramer’s Rule

Text Reference: §5.3

Determinants

The determinant of a 2 × 2 matrix \( A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \) is defined by

\[
\det A = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.
\]

As we noted in the previous lecture determinants are used to determine whether a square matrix

is invertible or not.

Determinants can also be used to solve n × n systems of linear equations using a rule known as Cramer’s rule.

For example the system

2x + 3y = 5

7x + 11y = 13

has the solution:

\[
x = \frac{\begin{vmatrix} 5 & 3 \\ 13 & 11 \end{vmatrix}}{\begin{vmatrix} 2 & 3 \\ 7 & 11 \end{vmatrix}}, \qquad
y = \frac{\begin{vmatrix} 2 & 5 \\ 7 & 13 \end{vmatrix}}{\begin{vmatrix} 2 & 3 \\ 7 & 11 \end{vmatrix}}
\]

The denominator is always the determinant of coefficients, and the determinant on the top line is obtained by

replacing the column containing the coefficients of the variable in question with the numbers on

the right hand side.

Evaluating the determinants gives

\[
x = \frac{55 - 39}{22 - 21}, \qquad y = \frac{26 - 35}{22 - 21}
\]

i.e.

x = 16, y = −9.

Cramer’s rule works provided the determinant of coefficients (the denominator) is non-zero;

when it is zero Cramer’s rule fails, and the system of equations has either no solutions or

infinitely many.
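For a 2 × 2 system the rule can be written out in a few lines of Python. The sketch below is illustrative; the function names `det2` and `cramer_2x2` are assumptions of this example, and integer coefficients are assumed so that exact fractions can be returned.

```python
from fractions import Fraction

def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix [[a, b], [c, d]]."""
    return a * d - b * c

def cramer_2x2(a, b, c, d, e, f):
    """Solve ax + by = e, cx + dy = f by Cramer's rule.

    Returns None when the determinant of coefficients is zero, in which
    case the rule cannot decide between no solution and infinitely many.
    """
    D = det2(a, b, c, d)
    if D == 0:
        return None
    x = Fraction(det2(e, b, f, d), D)   # first column replaced by right-hand side
    y = Fraction(det2(a, e, c, f), D)   # second column replaced by right-hand side
    return x, y

print(cramer_2x2(2, 3, 7, 11, 5, 13))   # the system 2x + 3y = 5, 7x + 11y = 13
print(cramer_2x2(1, 2, 2, 4, 3, 1))     # parallel lines: determinant is zero
```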

Determinants of larger matrices

The determinant is a number we assign to any square matrix. It plays an important role in

finding the inverse of a matrix, solving systems of equations, multiplication of vectors, finding

areas of triangles, etc.

To find the determinant of larger matrices we need to know about cofactors. A cofactor of a

particular entry in a matrix is the (smaller) determinant consisting of those elements which

remain if we removed the row and column belonging to that entry, together with a sign, + or −,depending on where the entry is located.

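Example: In the matrix

\[
A = \begin{bmatrix} -1 & 3 & 7 \\ 4 & 2 & -2 \\ 5 & 6 & 9 \end{bmatrix}
\]

the cofactor of the (2, 3) entry, namely −2, is the signed determinant

\[
-\begin{vmatrix} -1 & 3 \\ 5 & 6 \end{vmatrix}.
\]

The minus sign comes from the sign matrix:

\[
\begin{bmatrix} + & - & + & \cdots \\ - & + & - & \cdots \\ + & - & + & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}
\]

and the minor determinant \( \begin{vmatrix} -1 & 3 \\ 5 & 6 \end{vmatrix} \) is obtained by removing row 2 and column 3.

We refer to the cofactor of the (i, j) entry as Cij.

In the example above,

\[
C_{23} = -\begin{vmatrix} -1 & 3 \\ 5 & 6 \end{vmatrix} = -(-6 - 15) = 21.
\]

Example: Find, but do not evaluate, C41 in the matrix

\[
\begin{bmatrix} -1 & 4 & 3 & -7 \\ 2 & -3 & 9 & 1 \\ 1 & 8 & -6 & -1 \\ -1 & 2 & 1 & 10 \end{bmatrix}.
\]

This is clearly

\[
-\begin{vmatrix} 4 & 3 & -7 \\ -3 & 9 & 1 \\ 8 & -6 & -1 \end{vmatrix}.
\]

Note that the ‘−’ comes from the position, not the sign of the entry.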

How to evaluate determinants.

1. Choose any row or column.

2. For each position in the selected row or column, calculate the corresponding cofactor.

3. Form the product of each cofactor with the corresponding entry. The determinant is the

sum of these products.

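Example: Find

\[
\det A = \begin{vmatrix} -1 & 3 & 7 \\ 4 & 2 & -2 \\ 5 & 6 & 9 \end{vmatrix}.
\]

We choose to expand along the second row.

\[
\det A = 4 \times \left(-\begin{vmatrix} 3 & 7 \\ 6 & 9 \end{vmatrix}\right)
+ 2 \times \begin{vmatrix} -1 & 7 \\ 5 & 9 \end{vmatrix}
- 2 \times \left(-\begin{vmatrix} -1 & 3 \\ 5 & 6 \end{vmatrix}\right)
= -4(27 - 42) + 2(-9 - 35) + 2(-6 - 15) = -70.
\]

If we chose instead to expand along the 1st column the answer is the same.

\[
\det A = -1 \times \begin{vmatrix} 2 & -2 \\ 6 & 9 \end{vmatrix}
+ 4 \times \left(-\begin{vmatrix} 3 & 7 \\ 6 & 9 \end{vmatrix}\right)
+ 5 \times \begin{vmatrix} 3 & 7 \\ 2 & -2 \end{vmatrix}
= -30 + 4 \times 15 + 5 \times (-20) = -70
\]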


Which row or column? It is a remarkable fact that the answer is independent of the row or

column we choose.

As a practical consideration we would do well to choose that row/column that has the greatest

number of zeros.
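The cofactor procedure is naturally recursive, which makes it easy to sketch in Python. The function name `det` and the nested-list representation are choices made for this illustration; for simplicity the sketch always expands along the first row rather than hunting for the row or column with the most zeros.

```python
def det(M):
    """Determinant of a square matrix by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        # Minor: delete the first row and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        # The sign alternates + - + ... along the first row.
        total += (-1) ** j * entry * det(minor)
    return total

A = [[-1, 3, 7], [4, 2, -2], [5, 6, 9]]
print(det(A))
```

The result agrees with the value −70 obtained above by expanding along the second row and along the first column. The recursion is convenient for small matrices, but the amount of work grows very quickly with the size of the matrix.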

Example: Find the 4 × 4 determinant

\[
\begin{vmatrix} -1 & 4 & 3 & -7 \\ 0 & -3 & 9 & 1 \\ 0 & 0 & -6 & -1 \\ 0 & 0 & 0 & 10 \end{vmatrix}.
\]

An obvious choice is to expand along the 1st column:

\[
\begin{vmatrix} -1 & 4 & 3 & -7 \\ 0 & -3 & 9 & 1 \\ 0 & 0 & -6 & -1 \\ 0 & 0 & 0 & 10 \end{vmatrix}
= -1 \begin{vmatrix} -3 & 9 & 1 \\ 0 & -6 & -1 \\ 0 & 0 & 10 \end{vmatrix} + \text{other terms all zero.}
\]

And again:

\[
-1 \begin{vmatrix} -3 & 9 & 1 \\ 0 & -6 & -1 \\ 0 & 0 & 10 \end{vmatrix}
= (-1)(-3) \begin{vmatrix} -6 & -1 \\ 0 & 10 \end{vmatrix} + \text{other terms all zero}
= (-1)(-3)(-6 \times 10 - 0) = (-1)(-3)(-6)(10) = -180
\]

This is a general rule: the determinant of a triangular matrix (either upper or lower) is the product of the entries

along the main diagonal.

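Exercise: Find det B where

\[
B = \begin{bmatrix} -2 & 1 & 1 \\ 1 & 3 & 3 \\ 10 & 5 & 2 \end{bmatrix}.
\]

\[
\det B = \begin{vmatrix} -2 & 1 & 1 \\ 1 & 3 & 3 \\ 10 & 5 & 2 \end{vmatrix}
\]

(notice the different bracketing which distinguishes a matrix from its determinant)

\[
= -(1) \begin{vmatrix} 1 & 1 \\ 5 & 2 \end{vmatrix}
+ (3) \begin{vmatrix} -2 & 1 \\ 10 & 2 \end{vmatrix}
- (3) \begin{vmatrix} -2 & 1 \\ 10 & 5 \end{vmatrix}
\quad \text{(expanding along row 2)}
\]

\[
= -(-3) + 3(-14) - 3(-20) = 3 - 42 + 60 = 21.
\]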


Properties of Determinants

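To illustrate the following properties of determinants we will work with an arbitrary 3 × 3 matrix

\[
A = \begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}.
\]

We stress that all the following properties are true regardless of size.

1. Transpose property: det(A) = det(A^T)

\[
\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}
= \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}
\]

2. Scaling property: to multiply a determinant by a number we multiply any one row or column

by that number.

\[
k \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}
= \begin{vmatrix} a_1 & a_2 & ka_3 \\ b_1 & b_2 & kb_3 \\ c_1 & c_2 & kc_3 \end{vmatrix}
= \begin{vmatrix} ka_1 & ka_2 & ka_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}
\quad \text{(we may pick any row or column)}
\]

Hence

\[
\det(kA) = \det\left( k \begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix} \right)
= \begin{vmatrix} ka_1 & ka_2 & ka_3 \\ kb_1 & kb_2 & kb_3 \\ kc_1 & kc_2 & kc_3 \end{vmatrix}
= k^3 \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}
\]

If A is n × n then det(kA) = k^n det(A).

3. Interchange property: Swapping any two rows, or two columns, changes the sign of the

determinant, e.g.

\[
\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}
= -\begin{vmatrix} a_1 & a_2 & a_3 \\ c_1 & c_2 & c_3 \\ b_1 & b_2 & b_3 \end{vmatrix}
\]

Hence a matrix with two identical rows or columns has determinant zero (swapping the identical rows leaves the determinant unchanged but must also change its sign):

\[
\begin{vmatrix} a_1 & a_2 & a_3 \\ a_1 & a_2 & a_3 \\ c_1 & c_2 & c_3 \end{vmatrix} = 0.
\]

4. Elimination property: Adding a multiple of a row to another row does not alter the

value of a determinant. Similarly for columns, e.g.

\[
\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 + kc_1 & b_2 + kc_2 & b_3 + kc_3 \\ c_1 & c_2 & c_3 \end{vmatrix}
= \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}.
\]

5. Matrix multiplication property: Let A and B be square matrices of the same size (both

n × n), then

det(AB) = det A det B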


Special case: if A is invertible then AA^{-1} = I so that det(AA^{-1}) = det A det A^{-1} = det I = 1.

In particular det A ≠ 0, and

\[
\det\left(A^{-1}\right) = \frac{1}{\det A}.
\]

It can be shown that the condition det A ≠ 0 is also sufficient for invertibility, i.e.

A is invertible if and only if det A ≠ 0.
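The properties above can be spot-checked numerically. The sketch below is a sanity check on two particular 3 × 3 integer matrices, not a proof; the helper names `det` and `matmul`, and the choice k = 3, are assumptions made for this illustration.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def matmul(A, B):
    """Matrix product of nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[-1, 3, 7], [4, 2, -2], [5, 6, 9]]
B = [[-2, 1, 1], [1, 3, 3], [10, 5, 2]]
k, n = 3, 3

checks = {
    "transpose": det([list(col) for col in zip(*A)]) == det(A),
    "scaling": det([[k * v for v in row] for row in A]) == k ** n * det(A),
    "interchange": det([A[0], A[2], A[1]]) == -det(A),
    # Add k times row 3 to row 2; the determinant should not change.
    "elimination": det([A[0], [b + k * c for b, c in zip(A[1], A[2])], A[2]]) == det(A),
    "product": det(matmul(A, B)) == det(A) * det(B),
}
print(checks)
```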

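Application: Simplify the 3 × 3 Vandermonde determinant:

\[
\begin{vmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{vmatrix}
= \begin{vmatrix} 1 & a & a^2 \\ 0 & b - a & b^2 - a^2 \\ 1 & c & c^2 \end{vmatrix}
\qquad R_2 - R_1 \to R_2
\]

\[
= \begin{vmatrix} 1 & a & a^2 \\ 0 & b - a & b^2 - a^2 \\ 0 & c - a & c^2 - a^2 \end{vmatrix}
\qquad R_3 - R_1 \to R_3
\]

\[
= (b - a)(c - a) \begin{vmatrix} 1 & a & a^2 \\ 0 & 1 & b + a \\ 0 & 1 & c + a \end{vmatrix}
\qquad \text{taking out the common factors of } (b - a) \text{ from row 2 and } (c - a) \text{ from row 3}
\]

\[
= (b - a)(c - a) \begin{vmatrix} 1 & a & a^2 \\ 0 & 1 & b + a \\ 0 & 0 & c - b \end{vmatrix}
\qquad R_3 - R_2 \to R_3
\]

= (b − a)(c − a)(c − b), multiplying down the main diagonal to evaluate the determinant.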


Cramer’s Rule

Example: Solve the system of equations below using Cramer’s rule:

x+ 2y + z = 1

2x− 3y + 7z = −4

−x+ y − 3z = −1

Solution:

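\[
x = \frac{\begin{vmatrix} 1 & 2 & 1 \\ -4 & -3 & 7 \\ -1 & 1 & -3 \end{vmatrix}}{\begin{vmatrix} 1 & 2 & 1 \\ 2 & -3 & 7 \\ -1 & 1 & -3 \end{vmatrix}}, \quad
y = \frac{\begin{vmatrix} 1 & 1 & 1 \\ 2 & -4 & 7 \\ -1 & -1 & -3 \end{vmatrix}}{\begin{vmatrix} 1 & 2 & 1 \\ 2 & -3 & 7 \\ -1 & 1 & -3 \end{vmatrix}}, \quad
z = \frac{\begin{vmatrix} 1 & 2 & 1 \\ 2 & -3 & -4 \\ -1 & 1 & -1 \end{vmatrix}}{\begin{vmatrix} 1 & 2 & 1 \\ 2 & -3 & 7 \\ -1 & 1 & -3 \end{vmatrix}}.
\]

Each determinant is evaluated by using row operations to clear the first column below the top entry:

\[
\begin{vmatrix} 1 & 2 & 1 \\ -4 & -3 & 7 \\ -1 & 1 & -3 \end{vmatrix}
= \begin{vmatrix} 1 & 2 & 1 \\ 0 & 5 & 11 \\ 0 & 3 & -2 \end{vmatrix}
= \begin{vmatrix} 5 & 11 \\ 3 & -2 \end{vmatrix} = -43
\]

\[
\begin{vmatrix} 1 & 1 & 1 \\ 2 & -4 & 7 \\ -1 & -1 & -3 \end{vmatrix}
= \begin{vmatrix} 1 & 1 & 1 \\ 0 & -6 & 5 \\ 0 & 0 & -2 \end{vmatrix} = 12
\]

\[
\begin{vmatrix} 1 & 2 & 1 \\ 2 & -3 & -4 \\ -1 & 1 & -1 \end{vmatrix}
= \begin{vmatrix} 1 & 2 & 1 \\ 0 & -7 & -6 \\ 0 & 3 & 0 \end{vmatrix} = 18
\]

\[
\begin{vmatrix} 1 & 2 & 1 \\ 2 & -3 & 7 \\ -1 & 1 & -3 \end{vmatrix}
= \begin{vmatrix} 1 & 2 & 1 \\ 0 & -7 & 5 \\ 0 & 3 & -2 \end{vmatrix} = 14 - 15 = -1
\]

giving

\[
x = \frac{-43}{-1} = 43, \quad y = \frac{12}{-1} = -12, \quad z = \frac{18}{-1} = -18.
\]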



ENG1091 Eigenvalues & Eigenvectors

Lecture 9&10 Eigenvalues

Text Reference: §5.7

1. Homogeneous systems of equations

Generally speaking, when the determinant of an n× n system of equations is zero, we can only

deduce that the system has no solutions or infinitely many.

A homogeneous system of equations, introduced earlier (i.e. Ax = 0), always has the trivial

solution x = 0. If the determinant of coefficients of a homogeneous system is zero the system

must have infinitely many solutions.

Example 1: Let

$$
A = \begin{bmatrix} a & 1 & 3 \\ 2 & 2 & -1 \\ -2 & a & 1 \end{bmatrix}.
$$

Find the values of a such that Ax = 0 has non-trivial solutions.
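The notes leave Example 1 to be worked in the lecture. One possible working, offered as a sketch: non-trivial solutions exist exactly when $\det A = 0$, and expanding along the first row gives

$$
\det A = a\begin{vmatrix} 2 & -1 \\ a & 1 \end{vmatrix}
       - 1\begin{vmatrix} 2 & -1 \\ -2 & 1 \end{vmatrix}
       + 3\begin{vmatrix} 2 & 2 \\ -2 & a \end{vmatrix}
       = a(2 + a) - (2 - 2) + 3(2a + 4) = a^2 + 8a + 12 = (a + 2)(a + 6),
$$

so $\det A = 0$ when $a = -2$ or $a = -6$.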

2. Eigenvalues and eigenvectors

Definitions: Let A be an n×n matrix and x be an n×1 vector. Any scalar λ satisfying Ax = λx

for some non-zero x is called an eigenvalue of A. The corresponding non-zero vectors x for

which Ax = λx are called the eigenvectors of A corresponding to λ.

To quote from the textbook, “such problems arise naturally in many branches of engineering.

For example, in vibrations the eigenvalues and eigenvectors describe the frequency and mode of

vibration respectively, while in mechanics they represent principal stresses and the principal axes

of stress in bodies subject to external forces.”

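Example 2: Show that $x = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is an eigenvector of $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ corresponding to the eigenvalue λ = 3.

This is straightforward matrix arithmetic:

$$
\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 3 \\ 3 \end{bmatrix}
= 3\begin{bmatrix} 1 \\ 1 \end{bmatrix}
$$

Note also that if we multiply both sides of this equation by the scalar t we get

$$
\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} t \\ t \end{bmatrix}
= \begin{bmatrix} 3t \\ 3t \end{bmatrix}
= 3\begin{bmatrix} t \\ t \end{bmatrix}
$$

This demonstrates that any non-zero multiple of an eigenvector corresponding to λ is also an eigenvector.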


3. Finding eigenvalues

It is tempting to rewrite the equation Ax = λx as (A − λ)x = 0, but this cannot possibly be correct. Why?

We write instead (A − λI)x = 0, which is a homogeneous system of equations. We know the trivial solution x = 0 is always available, but we are interested only in the non-zero solutions (called eigenvectors). This requires the homogeneous system to have infinitely many solutions, which happens precisely when

$$
\det(A - \lambda I) = 0.
$$

Now $\det(A - \lambda I)$ is a polynomial of degree n in λ. It has n (possibly complex) linear factors, so

$$
\det(A - \lambda I) = (-1)^n (\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n);
$$

the roots of this polynomial are the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$.

Notes

1. It is possible for the roots of the polynomial to be repeated, so we should not assume that $\lambda_1, \lambda_2, \ldots, \lambda_n$ are necessarily distinct.

2. In this course we only deal with the case where the matrix A has real entries. When this is so, the polynomial $\det(A - \lambda I)$ has real coefficients, and if any of its roots are non-real they occur in complex conjugate pairs.

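Example 3: Find the eigenvalues of

$$
A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
$$

Solution: The characteristic polynomial is

$$
\det(A - \lambda I) = \begin{vmatrix} -\lambda & -1 \\ 1 & -\lambda \end{vmatrix} = \lambda^2 + 1,
$$

which is zero for $\lambda = \pm i$.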

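Example 4: Find the eigenvalues of

$$
A = \begin{bmatrix} 1 & 1 & -2 \\ -1 & 2 & 1 \\ 0 & 1 & -1 \end{bmatrix}
$$

Solution:

$$
A - \lambda I = \begin{bmatrix} 1 & 1 & -2 \\ -1 & 2 & 1 \\ 0 & 1 & -1 \end{bmatrix}
- \lambda \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1-\lambda & 1 & -2 \\ -1 & 2-\lambda & 1 \\ 0 & 1 & -1-\lambda \end{bmatrix}
$$

The characteristic polynomial:

$$
\begin{aligned}
\det(A - \lambda I)
&= \begin{vmatrix} 1-\lambda & 1 & -2 \\ -1 & 2-\lambda & 1 \\ 0 & 1 & -1-\lambda \end{vmatrix} \\
&= -\begin{vmatrix} -1 & 2-\lambda & 1 \\ 1-\lambda & 1 & -2 \\ 0 & 1 & -1-\lambda \end{vmatrix}
&& R_1 \leftrightarrow R_2 \\
&= -\begin{vmatrix} -1 & 2-\lambda & 1 \\ 0 & (1-\lambda)(2-\lambda)+1 & -1-\lambda \\ 0 & 1 & -1-\lambda \end{vmatrix}
&& R_2 + (1-\lambda)R_1 \to R_2 \\
&= -(-1)\begin{vmatrix} (1-\lambda)(2-\lambda)+1 & -1-\lambda \\ 1 & -1-\lambda \end{vmatrix}
&& \text{expanding along col 1} \\
&= (-1-\lambda)\begin{vmatrix} (1-\lambda)(2-\lambda)+1 & 1 \\ 1 & 1 \end{vmatrix}
&& \text{factoring col 2} \\
&= (-1-\lambda)\left[(1-\lambda)(2-\lambda)+1-1\right] \\
&= (-1-\lambda)(1-\lambda)(2-\lambda).
\end{aligned}
$$

Hence the eigenvalues are λ = 1, 2, −1.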
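The eigenvalues found in Example 4 can be checked by evaluating det(A − λI) directly. The sketch below uses plain Python; `det3` and `char_poly` are illustrative helper names, not something defined in the notes.

```python
def det3(m):
    """Determinant of a 3x3 matrix (list of rows), by cofactor expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_poly(A, lam):
    """Evaluate det(A - lam*I) for a 3x3 matrix A."""
    shifted = [[A[r][c] - (lam if r == c else 0) for c in range(3)] for r in range(3)]
    return det3(shifted)


A = [[1, 1, -2], [-1, 2, 1], [0, 1, -1]]

# Each claimed eigenvalue should make the characteristic polynomial vanish.
for lam in (1, 2, -1):
    print(lam, char_poly(A, lam))

# At a value that is not an eigenvalue, compare with the factored form.
lam = 3
print(char_poly(A, lam), (-1 - lam) * (1 - lam) * (2 - lam))
```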

4. Finding the eigenvectors of distinct eigenvalues

Having solved the characteristic equation (a polynomial equation of degree n) for its n roots (the eigenvalues), it still remains to find the corresponding eigenvectors. For the moment let's assume that the eigenvalues are distinct (non-repeated).

Let $e_j$ be an eigenvector that corresponds to the eigenvalue $\lambda_j$, so that $Ae_j = \lambda_j e_j$.

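Example 3 (again): Find the eigenvectors of

$$
A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
$$

Solution: Consider the eigenvalue λ = i:

$$
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= i\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\quad\text{is equivalent to}\quad
\begin{bmatrix} -i & -1 \\ 1 & -i \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
$$

The 2 × 2 case always leads to two equivalent equations, in this example $x_1 = ix_2$ and $-ix_1 = x_2$.

Thus the eigenvectors are t(i, 1) where $t \neq 0$.

Now λ = −i:

$$
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= -i\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\quad\text{is equivalent to}\quad
\begin{bmatrix} i & -1 \\ 1 & i \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
$$

hence $x_1 = -ix_2$, giving eigenvectors t(−i, 1) where $t \neq 0$.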

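Example 4 (again): Find the eigenvector corresponding to the dominant eigenvalue of

$$
A = \begin{bmatrix} 1 & 1 & -2 \\ -1 & 2 & 1 \\ 0 & 1 & -1 \end{bmatrix}
$$

Solution: The dominant eigenvalue is λ = 2.

Solving Ax = 2x:

$$
\begin{bmatrix} 1 & 1 & -2 \\ -1 & 2 & 1 \\ 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= 2\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
$$

and this is equivalent to the homogeneous system

$$
\begin{aligned}
-x_1 + x_2 - 2x_3 &= 0 \\
-x_1 + x_3 &= 0 \\
x_2 - 3x_3 &= 0
\end{aligned}
$$

The 'augmented' matrix is (we don't need to include the column of zeros):

$$
\begin{bmatrix} -1 & 1 & -2 \\ -1 & 0 & 1 \\ 0 & 1 & -3 \end{bmatrix}
\xrightarrow{(-1)R_1 \to R_1}
\begin{bmatrix} 1 & -1 & 2 \\ -1 & 0 & 1 \\ 0 & 1 & -3 \end{bmatrix}
\xrightarrow{R_1 + R_2 \to R_2}
\begin{bmatrix} 1 & -1 & 2 \\ 0 & -1 & 3 \\ 0 & 1 & -3 \end{bmatrix}
\xrightarrow{R_3 + R_2 \to R_3}
\begin{bmatrix} 1 & -1 & 2 \\ 0 & -1 & 3 \\ 0 & 0 & 0 \end{bmatrix}
$$

The non-pivot variable $x_3$ is chosen free, so $x_3 = t$.

$x_2 = 3x_3 = 3t$ from row 2, and $x_1 = x_2 - 2x_3 = t$ from row 1, giving $(x_1, x_2, x_3) = t(1, 3, 1)$;

hence an eigenvector corresponding to λ = 2 is ⟨1, 3, 1⟩.

Check:

$$
\begin{bmatrix} 1 & 1 & -2 \\ -1 & 2 & 1 \\ 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}
= \begin{bmatrix} 2 \\ 6 \\ 2 \end{bmatrix}
= 2\begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}.
$$

Note that the eigenvectors are only unique up to multiplication by scalars.

Also, while the 0 vector is always a solution to the system (A − λI)x = 0, the eigenvectors are the non-zero solutions.

The vector 0 is never an eigenvector.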

5. Finding the eigenvectors of repeated eigenvalues

If the eigenvalues of the matrix A are distinct, then it can be shown that the corresponding

eigenvectors are linearly independent. If, however, the eigenvalues are repeated, it may not be

possible to find n linearly independent eigenvectors. (By repeated roots of the characteristic

equation, we mean that two or more of the eigenvalues are the same.)

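Example 5: Find the eigenvalues of the matrix

$$
A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}.
$$

Solution: The characteristic polynomial is

$$
\det(A - \lambda I) = \begin{vmatrix} -\lambda & 0 & 1 \\ 0 & 1-\lambda & 2 \\ 0 & 0 & 1-\lambda \end{vmatrix} = -\lambda(1-\lambda)^2
$$

which has roots λ = 0 and λ = 1 (multiplicity = 2).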

It is not clear how many independent eigenvectors exist when λ = 1. The eigenvalue has a

multiplicity of 2, but that doesn’t assure us that there will be two independent eigenvectors.

Example 6: From the matrix in the previous example, find the eigenvector(s) corresponding to

each eigenvalue.

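Solution: Eigenvectors for λ = 1:

$$
A - \lambda I = A - I = \begin{bmatrix} -1 & 0 & 1 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix}
$$

which is in echelon form.

Solving

$$
\begin{bmatrix} -1 & 0 & 1 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
$$

we have $x_3 = 0$, $x_2 = t$ and $-x_1 + x_3 = 0$, so $x_1 = 0$ also.

Thus the eigenvectors for λ = 1 are

$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ t \\ 0 \end{bmatrix}
= t\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},
$$

that is, the non-zero multiples of the vector x = (0, 1, 0).

Eigenvectors for λ = 0:

$$
A - \lambda I = A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}
\xrightarrow{R_1 \leftrightarrow R_2}
\begin{bmatrix} 0 & 1 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\xrightarrow{R_3 - R_2 \to R_3}
\begin{bmatrix} 0 & 1 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}
\quad\text{(now in echelon form)}
$$

Solving

$$
\begin{bmatrix} 0 & 1 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
$$

we have $x_3 = 0$, $x_2 + 2x_3 = 0$ (and hence $x_2 = 0$) while $x_1$ is free and we set $x_1 = t$.

$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} t \\ 0 \\ 0 \end{bmatrix}
= t\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},
$$

that is, the non-zero multiples of the vector x = (1, 0, 0).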

Consider the next example.

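Example 7: Find the eigenvalues and corresponding eigenvectors for the matrix

$$
A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
$$

Solution: The characteristic polynomial is

$$
\det(A - \lambda I) = \begin{vmatrix} -\lambda & 0 & 0 \\ 0 & 1-\lambda & 0 \\ 1 & 0 & 1-\lambda \end{vmatrix} = -\lambda(1-\lambda)^2
$$

which has roots λ = 0 and λ = 1 (multiplicity = 2).

The eigenvectors:

λ = 1:

$$
A - \lambda I = A - I = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}
\xrightarrow{R_3 + R_1 \to R_3}
\begin{bmatrix} -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
$$

which is in echelon form.

Solving

$$
\begin{bmatrix} -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
$$

we have $-x_1 = 0$, while $x_2$ and $x_3$ are free, so we set $x_2 = s$ and $x_3 = t$.

$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ s \\ t \end{bmatrix}
= s\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix},
$$

that is, the non-zero linear combinations of (0, 1, 0) and (0, 0, 1) (s and t not both zero).

In this example we do have 2 independent eigenvectors for λ = 1.

λ = 0:

$$
A - \lambda I = A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\xrightarrow{R_3 \leftrightarrow R_1}
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
$$

which is in echelon form.

Solving

$$
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
$$

we have $x_2 = 0$, and $x_1 + x_3 = 0$ with $x_3$ free.

We set $x_3 = t$, giving

$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} -t \\ 0 \\ t \end{bmatrix}
= t\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix},
$$

that is, the non-zero multiples of the vector x = (−1, 0, 1).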


6. Properties of eigenvalues

Let us assume that we have an n × n matrix A with the eigenvalues $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_n$ (not necessarily distinct).

Property 1: The sum of the eigenvalues of A is equal to the sum of the elements of the diagonal of A.

$$
\sum_{i=1}^{n} \lambda_i = \lambda_1 + \lambda_2 + \cdots + \lambda_n = \sum_{i=1}^{n} a_{ii}
$$

(The right-hand summation is known as the trace of A.)

Property 2: The product of the eigenvalues of A is equal to the determinant of A.

$$
\prod_{i=1}^{n} \lambda_i = \lambda_1 \cdot \lambda_2 \cdots \lambda_n = \det(A)
$$

Property 3: If $A^{-1}$ exists, each of the eigenvalues of A must be non-zero (use Property 2). The eigenvalues of $A^{-1}$ are

$$
\frac{1}{\lambda_1}, \frac{1}{\lambda_2}, \ldots, \frac{1}{\lambda_n}
$$

Property 4: The eigenvalues of the transpose matrix $A^T$ are the same as those of A.

Property 5: If k is a scalar then the eigenvalues of kA are $k\lambda_1, k\lambda_2, k\lambda_3, \ldots, k\lambda_n$.

Property 6: If k is a scalar and I is the n × n identity matrix then the eigenvalues of $A \pm kI$ are respectively

$$
\lambda_1 \pm k, \ \lambda_2 \pm k, \ \lambda_3 \pm k, \ \ldots, \ \lambda_n \pm k.
$$

Property 7: If k is a positive integer then the eigenvalues of $A^k$ are

$$
\lambda_1^k, \ \lambda_2^k, \ \lambda_3^k, \ \ldots, \ \lambda_n^k
$$
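Properties 1, 2 and 6 can be illustrated with the matrix of Example 4, whose eigenvalues were found to be 1, 2 and −1. The sketch below is plain Python; the helper names and the choice k = 5 are illustrative assumptions.

```python
def det3(m):
    """Determinant of a 3x3 matrix (list of rows), by cofactor expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_poly(M, mu):
    """Evaluate det(M - mu*I) for a 3x3 matrix M."""
    return det3([[M[r][c] - (mu if r == c else 0) for c in range(3)] for r in range(3)])


A = [[1, 1, -2], [-1, 2, 1], [0, 1, -1]]
eigenvalues = [1, 2, -1]  # from Example 4

# Property 1: sum of eigenvalues equals the trace.
trace = sum(A[i][i] for i in range(3))
print(trace, sum(eigenvalues))

# Property 2: product of eigenvalues equals the determinant.
product = 1
for lam in eigenvalues:
    product *= lam
print(det3(A), product)

# Property 6 with k = 5: each lam + k should be an eigenvalue of A + kI.
k = 5
A_plus_kI = [[A[r][c] + (k if r == c else 0) for c in range(3)] for r in range(3)]
print([char_poly(A_plus_kI, lam + k) for lam in eigenvalues])
```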



ENG1091 Further Calculus

Lecture 11 · Implicit differentiation · Logarithmic differentiation

Text Reference: §8.3.14 & 8.4

Functions such as $f(x) = x\sin x$ express f(x) explicitly in terms of x. Expressions of the form $x^2 + y^2 = 4$, or $2y + x = 11$, define an implicit relationship between x and y. In these cases, we can still determine the rate of change of one variable with respect to the other by the technique known as implicit differentiation. As an illustration, consider the equation

$$
x^2 + y^2 = 4.
$$

The graph of this equation is the circle of radius 2 centred at (0, 0). This graph obviously is not the graph of a function since, for instance, there are two points on the graph whose x-coordinate is 1. (Functions must satisfy the vertical line test.)

We denote the slope (gradient) of the curve at the point $(1, \sqrt{3})$ by

$$
\left.\frac{dy}{dx}\right|_{(1,\sqrt{3})}
$$

In general, the slope (gradient) at the point (a, b) is denoted by

$$
\left.\frac{dy}{dx}\right|_{(a,b)}
$$

In a small vicinity of the point (a, b), the curve looks like the graph of a function. [That is, on this part of the curve, y = g(x) for some function g(x).] We say that this function is defined implicitly by the equation.

We obtain a formula for $\frac{dy}{dx}$ by differentiating both sides of the equation with respect to x while treating y as a function of x.

1. Find $\frac{dy}{dx}$ in the relation $x^2 + y^2 = 4$. (This simple example is used mainly to illustrate the general technique of implicit differentiation.)

Now we differentiate both sides of the equation with respect to x.

The first term $x^2$ has the derivative 2x as usual. We need to think of the second term $y^2$ as having the form $[g(x)]^2$. To differentiate we apply the chain rule:

$$
\frac{d}{dx}[g(x)]^2 = 2[g(x)]^1 g'(x)
$$

or, equivalently,

$$
\frac{d}{dx}\left(y^2\right) = 2y\frac{dy}{dx}.
$$

On the right-hand side of the original equation, the derivative of the constant function 4 is zero. Thus implicit differentiation of $x^2 + y^2 = 4$ yields

$$
2x + 2y\frac{dy}{dx} = 0.
$$

Solving for $\frac{dy}{dx}$, we have

$$
2y\frac{dy}{dx} = -2x, \qquad \frac{dy}{dx} = -\frac{2x}{2y} = -\frac{x}{y}.
$$

Notice that this slope (gradient) formula involves y as well as x. This reflects the fact that the slope (gradient) of the circle at a point depends on the y-coordinate as well as the x-coordinate.

At the point $(1, \sqrt{3})$ the slope (gradient) is

$$
\left.\frac{dy}{dx}\right|_{(1,\sqrt{3})} = \left.-\frac{x}{y}\right|_{(1,\sqrt{3})} = -\frac{1}{\sqrt{3}}
$$

At the point $(1, -\sqrt{3})$ the slope (gradient) is

$$
\left.\frac{dy}{dx}\right|_{(1,-\sqrt{3})} = \left.-\frac{x}{y}\right|_{(1,-\sqrt{3})} = \frac{1}{\sqrt{3}}
$$

When y = 0 the slope (gradient) is undefined, indicating that the tangent is vertical at these points.

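2. The equation $5x^2 - 6xy + 5y^2 = 16$ defines an ellipse, graphed below. Find a formula for $\frac{dy}{dx}$ in this relation.

[Figure: graph of the ellipse $5x^2 - 6xy + 5y^2 = 16$, with x and y axes running from −3 to 3]

Differentiate both sides with respect to x, taking care with the product 6xy and the chain rule on $5y^2$:

$$
\begin{aligned}
10x - \left(6 \cdot y + 6x \cdot \frac{dy}{dx}\right) + \frac{d}{dx}\left(5y^2\right) &= 0 && \text{using the product rule} \\
10x - 6y - 6x\frac{dy}{dx} + \frac{d}{dy}\left(5y^2\right)\frac{dy}{dx} &= 0 && \text{using the chain rule} \\
10x - 6y - 6x\frac{dy}{dx} + 10y\frac{dy}{dx} &= 0
\end{aligned}
$$

Grouping: $10x - 6y + \frac{dy}{dx}(10y - 6x) = 0$

Now solve for $\frac{dy}{dx}$:

$$
\text{ANS:}\quad \frac{dy}{dx} = \frac{6y - 10x}{10y - 6x} = \frac{3y - 5x}{5y - 3x}.
$$

(Observe also the faint lines 3y − 5x = 0 where the curve is horizontal and 5y − 3x = 0 where the curve is vertical.)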


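3. Find a formula for $\frac{dy}{dx}$ in the relation $x^3 + y^3 = 6xy$.

[Figure: graph of the curve $x^3 + y^3 = 6xy$, with x and y axes running from −5 to 5]

Differentiate both sides with respect to x, taking care with the chain rule on $y^3$ and the product rule on 6xy:

$$
\begin{aligned}
x^3 + y^3 &= 6xy \\
3x^2 + \frac{d}{dx}\left(y^3\right) &= 6 \cdot y + 6x \cdot \frac{dy}{dx} \\
3x^2 + \frac{d}{dy}\left(y^3\right)\frac{dy}{dx} &= 6y + 6x\frac{dy}{dx} \\
3x^2 + 3y^2\frac{dy}{dx} &= 6y + 6x\frac{dy}{dx} \\
\frac{dy}{dx}\left(3y^2 - 6x\right) &= 6y - 3x^2
\end{aligned}
$$

$$
\text{ANS:}\quad \frac{dy}{dx} = \frac{6y - 3x^2}{3y^2 - 6x} = \frac{2y - x^2}{y^2 - 2x}
$$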

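4. Problem type: Find the point(s) on the curve $(x-1)^2 + (y+2)^2 = 4$ where the gradient of the tangent is 1.

Differentiate both sides with respect to x:

$$
2(x-1) + 2(y+2)\frac{dy}{dx} = 0
\qquad\text{so}\qquad
\frac{dy}{dx} = -\left(\frac{x-1}{y+2}\right)
$$

This is equal to 1 where $-(x-1) = y+2$, i.e. along the line $y = -x - 1$. (Sketch)

Intersection with the curve $(x-1)^2 + (y+2)^2 = 4$:

$$
\begin{aligned}
(x-1)^2 + (-x+1)^2 &= 4 \\
(x-1)^2 + (-1)^2(x-1)^2 &= 4 \\
2(x-1)^2 = 4 \quad\text{or}\quad (x-1)^2 &= 2
\end{aligned}
$$

so $(x-1) = \pm\sqrt{2}$, giving $x = 1 + \sqrt{2}$ or $x = 1 - \sqrt{2}$.

The points of intersection are $\left(1+\sqrt{2},\, -2-\sqrt{2}\right)$ and $\left(1-\sqrt{2},\, -2+\sqrt{2}\right)$.

[Figure: sketch of the circle and the line $y = -x - 1$, with x axis from −2 to 4 and y axis from −5 to 1]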
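The two points can be checked numerically: each should lie on the circle and have tangent gradient 1. This is a small sketch using only the standard library; the function name `slope` is an illustrative choice.

```python
import math


def slope(x, y):
    """dy/dx = -(x - 1)/(y + 2), from implicit differentiation of the circle."""
    return -(x - 1) / (y + 2)


r2 = math.sqrt(2)
points = [(1 + r2, -2 - r2), (1 - r2, -2 + r2)]

for x, y in points:
    on_curve = (x - 1) ** 2 + (y + 2) ** 2  # should be 4, up to rounding
    print(round(on_curve, 12), round(slope(x, y), 12))
```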


Logarithmic Differentiation

The calculation of derivatives of complicated functions involving products, quotients, or powers

can often be simplified by taking logarithms. The method is called logarithmic differentiation.

Review of logarithm laws: If x, y, a > 0 (with a ≠ 1), and n ∈ R, then

1. $\log_a(xy) = \log_a(x) + \log_a(y)$

2. $\log_a\left(\frac{x}{y}\right) = \log_a(x) - \log_a(y)$

3. $\log_a(x^n) = n\log_a(x)$

4. $\log_a(1) = 0$

5. $\log_a(a) = 1$

Steps in Logarithmic Differentiation

1. Take logarithms of both sides of an equation y = f(x).

2. Differentiate implicitly with respect to x.

3. Solve the resulting equation for y′.

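Example 1 Differentiate $y = \dfrac{x^4\sqrt{x^2+1}}{(2x+3)^6}$

Take $\log_e$ of both sides:

$$
\begin{aligned}
\ln y &= \ln\left(x^4\sqrt{x^2+1}\right) - \ln\left((2x+3)^6\right) \\
&= \ln\left(x^4\right) + \ln\sqrt{x^2+1} - 6\ln(2x+3) \\
&= 4\ln(x) + \tfrac{1}{2}\ln\left(x^2+1\right) - 6\ln(2x+3)
\end{aligned}
$$

Now differentiate both sides implicitly with respect to x:

$$
\begin{aligned}
\frac{1}{y}\frac{dy}{dx} &= \frac{4}{x} + \frac{1}{2}\cdot\frac{2x}{x^2+1} - 6\cdot\frac{1}{2x+3}\cdot 2 \\
&= \frac{4}{x} + \frac{x}{x^2+1} - \frac{12}{2x+3}
\end{aligned}
$$

Now solve for $\frac{dy}{dx}$:

$$
\frac{dy}{dx} = y\left(\frac{4}{x} + \frac{x}{x^2+1} - \frac{12}{2x+3}\right)
= \frac{x^4\sqrt{x^2+1}}{(2x+3)^6}\left(\frac{4}{x} + \frac{x}{x^2+1} - \frac{12}{2x+3}\right)
$$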
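As the notes say elsewhere, a derivative can always be checked. A hedged numerical sketch: compare the formula from Example 1 with a central-difference estimate at a sample point. The function names and the step size `h` are illustrative assumptions.

```python
import math


def y(x):
    """y = x^4 * sqrt(x^2 + 1) / (2x + 3)^6, for x > 0."""
    return x**4 * math.sqrt(x**2 + 1) / (2 * x + 3) ** 6


def dy_formula(x):
    """Derivative obtained by logarithmic differentiation."""
    return y(x) * (4 / x + x / (x**2 + 1) - 12 / (2 * x + 3))


def dy_numeric(x, h=1e-6):
    """Central-difference approximation to dy/dx."""
    return (y(x + h) - y(x - h)) / (2 * h)


for x in (0.5, 1.0, 2.0):
    print(x, dy_formula(x), dy_numeric(x))
```

The two columns should agree to several significant figures; exact agreement is not expected because the central difference is only an approximation.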


Example 2 Find the derivative of $y = 3^x$

$$
\ln y = \ln\left(3^x\right) = x\ln 3
$$

$$
\frac{1}{y}\frac{dy}{dx} = \ln 3
$$

Now solve for $\frac{dy}{dx}$:

$$
\frac{dy}{dx} = y\ln 3 = 3^x\ln 3
$$

Hence the derivative of an exponential function is a constant multiple of the exponential function.

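Example 3 Find the derivative of $y = \dfrac{(x-1)^3(2x+5)^5}{(3-4x)^2(1+x^2)^3}$

$$
\begin{aligned}
\ln y &= \ln\left[(x-1)^3(2x+5)^5\right] - \ln\left[(3-4x)^2(1+x^2)^3\right] \\
&= 3\ln(x-1) + 5\ln(2x+5) - 2\ln(3-4x) - 3\ln(1+x^2)
\end{aligned}
$$

$$
\begin{aligned}
\frac{1}{y}\frac{dy}{dx} &= \frac{3}{x-1} + 5\cdot\frac{2}{2x+5} - 2\cdot\frac{-4}{3-4x} - 3\cdot\frac{2x}{1+x^2} \\
&= \frac{3}{x-1} + \frac{10}{2x+5} + \frac{8}{3-4x} - \frac{6x}{1+x^2}
\end{aligned}
$$

Now solve for $\frac{dy}{dx}$:

$$
\frac{dy}{dx} = y\left(\frac{3}{x-1} + \frac{10}{2x+5} + \frac{8}{3-4x} - \frac{6x}{1+x^2}\right)
= \frac{(x-1)^3(2x+5)^5}{(3-4x)^2(1+x^2)^3}\left(\frac{3}{x-1} + \frac{10}{2x+5} + \frac{8}{3-4x} - \frac{6x}{1+x^2}\right)
$$

Example 4 (Problem type): Determine the location and nature of any stationary points of $y = x^x$ and hence sketch the graph of $y = x^x$.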

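Find both $\frac{dy}{dx}$ and $\frac{d^2y}{dx^2}$ implicitly:

Take $\log_e$ of both sides of the equation $y = x^x$:

$$
\log_e y = x\log_e x,
$$

now differentiate implicitly, with a product rule on the right:

$$
\frac{1}{y}\frac{dy}{dx} = \log_e x + 1
\qquad\text{so}\qquad
\frac{dy}{dx} = y(\log_e x + 1) = x^x(\log_e x + 1).
$$

Notice that $\frac{dy}{dx} = 0$ at the point where $\log_e x + 1 = 0$ and only there, giving the coordinates of the stationary point as $\left(e^{-1}, \left(e^{-1}\right)^{e^{-1}}\right) = \left(e^{-1}, e^{-e^{-1}}\right)$.

For the second derivative we differentiate the equation $\frac{dy}{dx} = y(\log_e x + 1)$ implicitly:

$$
\begin{aligned}
\frac{d^2y}{dx^2} &= \frac{dy}{dx}(\log_e x + 1) + y\cdot\frac{d}{dx}(\log_e x + 1) && \text{(product rule)} \\
&= \frac{dy}{dx}(\log_e x + 1) + y\left(\frac{1}{x}\right) \\
&= x^x(\log_e x + 1)^2 + x^x\left(\frac{1}{x}\right) \\
&= x^x(\log_e x + 1)^2 + x^{x-1}.
\end{aligned}
$$

This is > 0 when $x = e^{-1}$ (in fact $\frac{d^2y}{dx^2} > 0$ for all x > 0). Therefore the point $\left(e^{-1}, e^{-e^{-1}}\right)$ is a local minimum point.

Sketch:

[Figure: sketch of $y = x^x$ against x]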
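The stationary point of $y = x^x$ can be checked numerically. The sketch below evaluates the first and second derivative formulas derived above at $x = e^{-1}$; the function names are illustrative.

```python
import math


def f(x):
    """y = x^x for x > 0."""
    return x**x


def df(x):
    """First derivative: x^x (ln x + 1)."""
    return x**x * (math.log(x) + 1)


def d2f(x):
    """Second derivative: x^x (ln x + 1)^2 + x^(x - 1)."""
    return x**x * (math.log(x) + 1) ** 2 + x ** (x - 1)


x0 = math.exp(-1)
print(x0, f(x0))         # stationary point (e^-1, e^(-1/e))
print(df(x0), d2f(x0))   # slope close to 0, second derivative positive

# Nearby values of f are larger, consistent with a local minimum.
print(f(x0 - 0.05) > f(x0), f(x0 + 0.05) > f(x0))
```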




Lecture 12 · Hyperbolic functions · Identities

Text Reference: §2.7, 8.3.12

1. Definitions: Trig functions are often called 'circular' functions because (cos t, sin t) lies on the curve $x^2 + y^2 = 1$ (i.e. the unit circle).

Hyperbolic functions have a very similar relationship with the hyperbola $x^2 - y^2 = 1$, with the point (cosh t, sinh t) lying on the right branch of this curve.

[Figure: the unit circle $x^2 + y^2 = 1$ with the point P(cos t, sin t), and the hyperbola $x^2 - y^2 = 1$ with the point P(cosh t, sinh t)]

The hyperbolic functions arise from certain combinations of exponential functions, and occur frequently in applications of mathematics. For example, the shape of a hanging wire (a catenary curve) is described by a 'cosh' expression.

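Definitions:

$\cosh x = \dfrac{e^x + e^{-x}}{2}$   pron. 'cosh'

$\sinh x = \dfrac{e^x - e^{-x}}{2}$   pron. 'shine'

$\tanh x = \dfrac{\sinh x}{\cosh x} = \dfrac{e^x - e^{-x}}{e^x + e^{-x}}$   pron. 'tanch'

The reciprocal functions can also be defined:

$\operatorname{sech} x = \dfrac{1}{\cosh x} = \dfrac{2}{e^x + e^{-x}}$   pron. 'sech' as in fetch

$\operatorname{csch} x = \dfrac{1}{\sinh x} = \dfrac{2}{e^x - e^{-x}}$   pron. 'cosech' as in go-fetch

$\coth x = \dfrac{1}{\tanh x} = \dfrac{\cosh x}{\sinh x} = \dfrac{e^x + e^{-x}}{e^x - e^{-x}}$   pron. 'coth' as in goth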


2. Graphs:

[Figure: $y = \cosh x$, shown with $y = \tfrac{1}{2}e^x$ and $y = \tfrac{1}{2}e^{-x}$]

[Figure: $y = \sinh x$, shown with $y = \tfrac{1}{2}e^x$ and $y = -\tfrac{1}{2}e^{-x}$]

[Figure: $y = \tanh x$, shown with the asymptotes $y = 1$ and $y = -1$]

[Figure: $y = \operatorname{csch} x$, shown with $y = \sinh x$]


3. Hyperbolic identities

Hyperbolic identities hold in similar ways to the trig identities; some of these include

a) $\cosh^2 x - \sinh^2 x = 1$

b) $1 - \tanh^2 x = \operatorname{sech}^2 x$

c) $\sinh(x + y) = \sinh x\cosh y + \cosh x\sinh y$

Osborn’s rule: In general, to obtain the formula for hyperbolic functions from the analogous

identity for circular functions, replace each circular function by the corresponding hyperbolic

function and change the sign of every product (or implied product) of two sines.

Examples

$\cos^2 x + \sin^2 x = 1$ becomes $\cosh^2 x - \sinh^2 x = 1$

$\cos 2x = \cos^2 x - \sin^2 x = 2\cos^2 x - 1 = 1 - 2\sin^2 x$ becomes $\cosh 2x = \cosh^2 x + \sinh^2 x = 2\cosh^2 x - 1 = 1 + 2\sinh^2 x$

$\sin 2x = 2\sin x\cos x$ becomes $\sinh 2x = 2\sinh x\cosh x$

$1 + \tan^2 x = \sec^2 x$ becomes $1 - \tanh^2 x = \operatorname{sech}^2 x$, because $\tan^2 x = \dfrac{\sin^2 x}{\cos^2 x}$ is an implied product of two sines

4. Relationship between circular and hyperbolic functions

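Euler's formula provides the link:

Since (i) $e^{i\theta} = \cos\theta + i\sin\theta$ and (ii) $e^{-i\theta} = \cos(-\theta) + i\sin(-\theta) = \cos\theta - i\sin\theta$,

adding: $2\cos\theta = e^{i\theta} + e^{-i\theta}$, from which $\cos\theta = \cosh(i\theta)$,

and subtracting (ii) from (i) gives $\sin\theta = \dfrac{e^{i\theta} - e^{-i\theta}}{2i} = \dfrac{\sinh(i\theta)}{i}$.

Then

$$
\begin{aligned}
\cosh i\theta &= \cos\theta \\
\sinh i\theta &= i\sin\theta \\
\cos i\theta &= \cosh\theta \\
\sin i\theta &= i\sinh\theta
\end{aligned}
$$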

These relationships provide the justification for Osborn’s rule.
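The four relationships can be spot-checked with Python's standard `cmath` module, which provides the circular and hyperbolic functions for complex arguments. The value of `theta` below is an arbitrary choice for illustration.

```python
import cmath
import math

theta = 0.7  # any real angle will do

pairs = {
    "cosh(i*theta) = cos(theta)": (cmath.cosh(1j * theta), math.cos(theta)),
    "sinh(i*theta) = i*sin(theta)": (cmath.sinh(1j * theta), 1j * math.sin(theta)),
    "cos(i*theta) = cosh(theta)": (cmath.cos(1j * theta), math.cosh(theta)),
    "sin(i*theta) = i*sinh(theta)": (cmath.sin(1j * theta), 1j * math.sinh(theta)),
}

for name, (lhs, rhs) in pairs.items():
    print(name, cmath.isclose(lhs, rhs, rel_tol=1e-12, abs_tol=1e-12))
```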

5. Derivatives of Hyperbolic Functions

These are easily found using differentiation of exponentials (again note the similarities with trig function derivatives).

$\dfrac{d}{dx}(\sinh x) = \cosh x$   (apply definition and use derivatives of $e^x$ and $e^{-x}$)

$\dfrac{d}{dx}(\cosh x) = \sinh x$   (apply definition and use derivatives of $e^x$ and $e^{-x}$)

$\dfrac{d}{dx}(\tanh x) = \dfrac{\cosh^2 x - \sinh^2 x}{\cosh^2 x} = \operatorname{sech}^2 x$   (apply definition and quotient rule, use identity $\cosh^2 x - \sinh^2 x = 1$)

$\dfrac{d}{dx}(\operatorname{csch} x) = -\dfrac{\cosh x}{\cosh^2 x - 1} = -\operatorname{csch} x\coth x$   (apply definition and quotient rule, use identity $\cosh^2 x - \sinh^2 x = 1$)

$\dfrac{d}{dx}(\operatorname{sech} x) = -\dfrac{\sinh x}{\cosh^2 x} = -\operatorname{sech} x\tanh x$   (apply definition and quotient rule)

$\dfrac{d}{dx}(\coth x) = \dfrac{\sinh^2 x - \cosh^2 x}{\sinh^2 x} = -\operatorname{csch}^2 x$   (apply definition and quotient rule, use identity $\cosh^2 x - \sinh^2 x = 1$)

Examples

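1. Find the derivative of $f(x) = \sqrt{\cosh x}$.

$$
\begin{aligned}
f'(x) &= \tfrac{1}{2}(\cosh x)^{-1/2}\,\frac{d}{dx}(\cosh x) && \text{(applying the chain rule)} \\
&= \tfrac{1}{2}(\cosh x)^{-1/2}(\sinh x) && \text{(using the derivative of } \cosh x \text{ obtained above)} \\
&= \frac{\sinh x}{2\sqrt{\cosh x}}
\end{aligned}
$$

2. Find the derivative of $f(x) = \cosh\sqrt{x}$.

$$
\begin{aligned}
f'(x) &= \sinh\left(\sqrt{x}\right)\frac{d}{dx}\left(\sqrt{x}\right) && \text{(applying the chain rule)} \\
&= \left(\sinh\left(\sqrt{x}\right)\right)\tfrac{1}{2}(x)^{-1/2} \\
&= \frac{\sinh\left(\sqrt{x}\right)}{2\sqrt{x}}
\end{aligned}
$$

3. Find $f'(x)$ if $f(x) = \sinh x + \cosh x$.

$$
f'(x) = \cosh x + \sinh x = f(x)
$$

Notice also that f(0) = 1.

There is only one function f which is equal to its derivative and which satisfies f(0) = 1, namely $f(x) = e^x$. It must be therefore that $\cosh x + \sinh x = e^x$.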




Lecture 13 · Inverse Hyperbolic functions · Log form · Derivatives

Text Reference: §2.7.5

Since the hyperbolic sine, sinh, and the hyperbolic tangent, tanh, are one-to-one, their inverses are fully defined without needing to consider domain restrictions. For the hyperbolic cosine, cosh, which is not one-to-one, we use a restricted domain of [0, ∞) to define its inverse. This function will be called the principal branch of cosh x.

Since the hyperbolic functions are defined in terms of the exp function, their inverses can be

defined in terms of natural logarithms. This definition is often called the logarithmic form of the

inverse.

Examples

1. Find the logarithmic form, domain and range of the inverse of the function f(x) = cosh x, x ≥ 0. Sketch the graph of y = f(x) and its inverse on the same axes.

The domain of f is given and its range is easily determined. From this we can write down the domain and range of the inverse.

            f          f⁻¹
domain:   [0, ∞)     [1, ∞)
range:    [1, ∞)     [0, ∞)

Deriving the log form of $\cosh^{-1} x$: let $y = \cosh^{-1} x$, so $\cosh y = x$.

$$
\begin{aligned}
x &= \tfrac{1}{2}\left(e^y + e^{-y}\right) \\
2x &= e^y + e^{-y} \\
2xe^y &= e^{2y} + 1 \\
0 &= e^{2y} - 2xe^y + 1 \\
e^y &= \frac{2x \pm \sqrt{4x^2 - 4}}{2} && \text{(quadratic formula)} \\
e^y &= \frac{2x + 2\sqrt{x^2 - 1}}{2} && \text{(the negative root is ignored since the range of } \cosh^{-1} \text{ is } [0, \infty) \text{ and } \therefore e^y \geq 1\text{)} \\
e^y &= x + \sqrt{x^2 - 1} \\
y &= \cosh^{-1} x = \ln\left(x + \sqrt{x^2 - 1}\right)
\end{aligned}
$$

(restricted) cosh x: domain [0, ∞), range [1, ∞)

$\cosh^{-1} x$: domain [1, ∞), range [0, ∞), log form $\ln\left(x + \sqrt{x^2 - 1}\right)$

[Figure: graphs of cosh x and $\cosh^{-1} x$ on the same axes]
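The log form can be compared with the library inverse `math.acosh` from Python's standard library. The function name `acosh_log` below is an illustrative assumption.

```python
import math


def acosh_log(x):
    """Log form of the inverse hyperbolic cosine: ln(x + sqrt(x^2 - 1)), for x >= 1."""
    if x < 1:
        raise ValueError("cosh^-1 x is only defined for x >= 1")
    return math.log(x + math.sqrt(x * x - 1))


for x in (1.0, 1.5, 2.0, 10.0):
    print(x, acosh_log(x), math.acosh(x))

# Round trip: cosh(cosh^-1 x) should return x.
print(math.cosh(acosh_log(2.0)))
```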

The derivatives of the inverse functions can be found by differentiating the logarithmic form, or

by implicit differentiation.

For f(x) = cosh−1 x, the derivative is found as follows:

For f(x) = sinh−1 x, the derivative is found as follows:
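The notes leave these two derivations to be completed in the lecture. One possible working by implicit differentiation, offered as a sketch:

$$
y = \cosh^{-1} x \;\Rightarrow\; \cosh y = x \;\Rightarrow\; \sinh y\,\frac{dy}{dx} = 1
\;\Rightarrow\; \frac{dy}{dx} = \frac{1}{\sinh y} = \frac{1}{\sqrt{\cosh^2 y - 1}} = \frac{1}{\sqrt{x^2 - 1}}, \quad x > 1,
$$

where the positive root is taken because $y \geq 0$ on the principal branch, so $\sinh y \geq 0$. Similarly,

$$
y = \sinh^{-1} x \;\Rightarrow\; \sinh y = x \;\Rightarrow\; \cosh y\,\frac{dy}{dx} = 1
\;\Rightarrow\; \frac{dy}{dx} = \frac{1}{\cosh y} = \frac{1}{\sqrt{1 + \sinh^2 y}} = \frac{1}{\sqrt{1 + x^2}},
$$

where the positive root is taken because $\cosh y > 0$ for all y.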


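Function | Domain | Range | Log Form | Derivative

$\sinh^{-1} x$ | $\mathbb{R}$ | $\mathbb{R}$ | $\ln\left(x + \sqrt{x^2 + 1}\right)$ | $\dfrac{1}{\sqrt{1 + x^2}}$

$\cosh^{-1} x$ | $[1, \infty)$ | $[0, \infty)$ | $\ln\left(x + \sqrt{x^2 - 1}\right)$ | $\dfrac{1}{\sqrt{x^2 - 1}}$

$\tanh^{-1} x$ | $(-1, 1)$ | $\mathbb{R}$ | $\dfrac{1}{2}\ln\left(\dfrac{1 + x}{1 - x}\right)$ | $\dfrac{1}{1 - x^2}$

$\operatorname{csch}^{-1} x$ | $\mathbb{R}\setminus\{0\}$ | $\mathbb{R}\setminus\{0\}$ | $\ln\left(\dfrac{1}{x} + \sqrt{\dfrac{1}{x^2} + 1}\right)$ | $\dfrac{-1}{|x|\sqrt{x^2 + 1}}$

$\operatorname{sech}^{-1} x$ | $(0, 1]$ | $[0, \infty)$ | $\ln\left(\dfrac{1}{x} + \sqrt{\dfrac{1}{x^2} - 1}\right)$ | $\dfrac{-1}{x\sqrt{1 - x^2}}$

$\coth^{-1} x$ | $\mathbb{R}\setminus[-1, 1]$ | $\mathbb{R}\setminus\{0\}$ | $\dfrac{1}{2}\ln\left(\dfrac{1 + x}{x - 1}\right)$ | $\dfrac{1}{1 - x^2}$

(Each row of the original table also shows a graph of the function.)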

Examples

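1. Show that $f(x) = \tanh^{-1} x$ is always increasing.

$$
\frac{d}{dx}\left(\tanh^{-1} x\right) = \frac{1}{1 - x^2}
$$

and $1 - x^2$ is positive on the domain of $\tanh^{-1}$, namely (−1, 1). Thus

$$
\frac{d}{dx}\left(\tanh^{-1} x\right) = \frac{1}{1 - x^2} > 0
$$

and hence $\tanh^{-1} x$ is always increasing.

2. Find the derivative of $\tanh^{-1}(\sin x)$.

$$
\begin{aligned}
\frac{d}{dx}\tanh^{-1}(\sin x) &= \frac{1}{1 - \sin^2 x}\cos x \\
&= \frac{1}{\cos^2 x}\cos x && \text{using } 1 - \sin^2 x = \cos^2 x \\
&= \sec x
\end{aligned}
$$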

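3. Evaluate $\displaystyle\int_0^1 \frac{dx}{\sqrt{1 + x^2}}$.

$$
\begin{aligned}
\int_0^1 \frac{dx}{\sqrt{1 + x^2}} &= \left[\sinh^{-1} x\right]_0^1 \\
&= \sinh^{-1}(1) - \sinh^{-1} 0 \\
&= \sinh^{-1}(1) = \log_e\left(1 + \sqrt{2}\right) && \text{using the log form of } \sinh^{-1}
\end{aligned}
$$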


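4. Find $\displaystyle\int_0^3 \frac{dx}{\sqrt{9 + 4x^2}}$.

$$
\begin{aligned}
\int_0^3 \frac{dx}{\sqrt{9 + 4x^2}} &= \int_0^3 \frac{dx}{3\sqrt{1 + \frac{4x^2}{9}}} \\
&= \int_0^3 \frac{dx}{3\sqrt{1 + \left(\frac{2x}{3}\right)^2}} \\
&= \frac{1}{3}\left[\frac{3}{2}\sinh^{-1}\left(\frac{2x}{3}\right)\right]_0^3 \\
&= \frac{1}{2}\sinh^{-1}(2) - 0 \\
&= \frac{1}{2}\log_e\left(2 + \sqrt{5}\right) && \text{using the log form of } \sinh^{-1}
\end{aligned}
$$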
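The value of this definite integral can be checked by numerical integration. The sketch below uses a hand-written composite Simpson's rule (an illustrative choice of method and of the name `simpson`, not something taken from the notes) and compares it with the closed form.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        # Interior points alternate between weight 4 (odd k) and weight 2 (even k).
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def integrand(x):
    return 1 / math.sqrt(9 + 4 * x * x)


numeric = simpson(integrand, 0.0, 3.0)
exact = 0.5 * math.log(2 + math.sqrt(5))
print(numeric, exact, 0.5 * math.asinh(2))
```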

5. Find $f'(x)$ when $f(x) = \sinh^{-1}(x^2)$.

$$
f'(x) = \frac{1}{\sqrt{1 + (x^2)^2}}\cdot 2x = \frac{2x}{\sqrt{1 + x^4}}
$$

6. Find the derivative of $\sinh^{-1}(\tan x)$. Comment in light of Q.2.

$$
\begin{aligned}
\frac{d}{dx}\sinh^{-1}(\tan x) &= \frac{1}{\sqrt{1 + \tan^2 x}}\sec^2 x \\
&= \frac{1}{\sec x}\sec^2 x && \text{using } 1 + \tan^2 x = \sec^2 x \\
&= \sec x \\
&= \frac{d}{dx}\tanh^{-1}(\sin x) && \text{from example 2}
\end{aligned}
$$

Notice also $\sinh^{-1}(\tan(0)) = 0$ and $\tanh^{-1}(\sin 0) = 0$.

Provided $\sinh^{-1}(\tan x)$ and $\tanh^{-1}(\sin x)$ are defined on the same interval, and one which includes x = 0 (for example $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$), we must conclude that they are the same function on this interval.




Lecture 14 · Integration by parts · use of complex exponential

Text Reference: §8.8.3

Differentiation techniques are usually fairly routine, following set rules and patterns. This is

not the case for antidifferentiation, where it can be far more challenging to find the appropriate

technique; some careful thinking must often be done to find the antiderivative. Sometimes an

antiderivative can’t be found in terms of elementary functions! Remember that all antiderivatives

can be checked by differentiation, be prepared to have a go (even guess the answer?) and check

it back through by differentiation.

e.g. Guess the answer for $\int x\cos x\,dx$

The formula for Integration by Parts arises from the Product Rule for Differentiation:

$$
\frac{d}{dx}\big(u(x)v(x)\big) = \frac{du}{dx}v(x) + u(x)\frac{dv}{dx}
$$

so

$$
\begin{aligned}
\int\left[\frac{du}{dx}v(x) + u(x)\frac{dv}{dx}\right]dx &= u(x)v(x) \\
\int\frac{du}{dx}v\,dx + \int u\frac{dv}{dx}\,dx &= uv \\
\int\frac{du}{dx}v\,dx &= uv - \int u\frac{dv}{dx}\,dx
\end{aligned}
$$

This effectively means that we are replacing the problem of finding $\int u'(x)v(x)\,dx$ with the (easier?) problem of finding $\int u(x)v'(x)\,dx$. To use this rule effectively, we have to be careful in choosing u and v. There are no general rules for choosing u and v, but the purpose is to obtain a simpler integral.

Integration by parts is often used when integrating a product (but not always) and is usually

the second technique we would think to employ. (The method of substitution, covered in the

ENG1090 and specialist maths syllabus, being the first.)

Examples:

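• $\int x\cos x\,dx$

$$
\begin{aligned}
\int x\cos x\,dx &= \int x\frac{d}{dx}(\sin x)\,dx \\
&= x\sin x - \int 1\cdot\sin x\,dx \\
&= x\sin x + \cos x + c
\end{aligned}
$$

• $\int xe^x\,dx$

$$
\begin{aligned}
\int xe^x\,dx &= \int x\frac{d}{dx}(e^x)\,dx \\
&= xe^x - \int 1\cdot e^x\,dx \\
&= xe^x - e^x + c
\end{aligned}
$$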


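• $\int x\ln x\,dx$

$$
\begin{aligned}
\int x\ln x\,dx &= \int \ln x\,\frac{d}{dx}\left(\frac{1}{2}x^2\right)dx \\
&= \frac{1}{2}x^2\ln x - \int\frac{1}{x}\cdot\left(\frac{1}{2}x^2\right)dx \\
&= \frac{1}{2}x^2\ln x - \frac{1}{2}\int x\,dx \\
&= \frac{1}{2}x^2\ln x - \frac{1}{4}x^2 + c
\end{aligned}
$$

• $\int \ln x\,dx$

$$
\begin{aligned}
\int \ln x\,dx &= \int \ln x\,\frac{d}{dx}(x)\,dx \\
&= x\ln x - \int\frac{1}{x}\cdot(x)\,dx \\
&= x\ln x - \int dx \\
&= x\ln x - x + c
\end{aligned}
$$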

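• $\int x^2e^x\,dx$

$$
\begin{aligned}
\int x^2e^x\,dx &= \int x^2\frac{d}{dx}(e^x)\,dx \\
&= x^2e^x - \int 2x\cdot e^x\,dx \\
&= x^2e^x - 2\int xe^x\,dx && \text{now integrate by parts again or use the result from example 2 above} \\
&= x^2e^x - 2\left(xe^x - e^x\right) + c
\end{aligned}
$$

• $\int \tan^{-1} x\,dx$

$$
\begin{aligned}
\int \tan^{-1} x\,dx &= \int \tan^{-1} x\,\frac{d}{dx}(x)\,dx \\
&= x\tan^{-1} x - \int x\frac{d}{dx}\left(\tan^{-1} x\right)dx \\
&= x\tan^{-1} x - \int\frac{x}{1 + x^2}\,dx \\
&= x\tan^{-1} x - \frac{1}{2}\int\frac{2x}{1 + x^2}\,dx \\
&= x\tan^{-1} x - \frac{1}{2}\log_e\left(1 + x^2\right) + c
\end{aligned}
$$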


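• $\int e^x\cos x\,dx$

$$
\begin{aligned}
\int e^x\cos x\,dx &= \int e^x\frac{d}{dx}(\sin x)\,dx \\
&= e^x\sin x - \int\frac{d}{dx}(e^x)\sin x\,dx \\
&= e^x\sin x - \int e^x\sin x\,dx \\
&= e^x\sin x + \int e^x\frac{d}{dx}(\cos x)\,dx \\
&= e^x\sin x + e^x\cos x - \int(\cos x)\frac{d}{dx}(e^x)\,dx \\
&= e^x\sin x + e^x\cos x - \int(\cos x)e^x\,dx
\end{aligned}
$$

so that

$$
2\int e^x\cos x\,dx = e^x\sin x + e^x\cos x
$$

Hence: $\displaystyle\int e^x\cos x\,dx = \frac{1}{2}e^x(\cos x + \sin x) + c.$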

Complex numbers are extremely useful in obtaining integrals of the type $\int e^{ax}\cos bx\,dx$ or $\int e^{ax}\sin bx\,dx$, and are usually much quicker than integration by parts.

To better understand these examples we are going to use two important facts about complex numbers:

$$
\frac{1}{a + ib} = \frac{1}{a + ib}\times\frac{a - ib}{a - ib} = \frac{a - ib}{a^2 + b^2}
\qquad\text{and}\qquad
e^{ibx} = \cos bx + i\sin bx
$$

so that

$$
e^{(a+ib)x} = e^{ax + ibx} = e^{ax}\cdot e^{ibx} = e^{ax}(\cos bx + i\sin bx)
$$

Examples

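• $\int e^x\cos x\,dx$

$$
\begin{aligned}
\int e^x\cos x\,dx &= \operatorname{Re}\int e^x(\cos x + i\sin x)\,dx \\
&= \operatorname{Re}\int e^x\cdot e^{ix}\,dx \\
&= \operatorname{Re}\int e^{x+ix}\,dx \\
&= \operatorname{Re}\int e^{x(1+i)}\,dx
\end{aligned}
$$

Now

$$
\begin{aligned}
\int e^{x(1+i)}\,dx &= \frac{1}{1+i}e^{x+ix} \\
&= \frac{1}{1+i}e^x(\cos x + i\sin x) \\
&= \frac{1-i}{2}e^x(\cos x + i\sin x).
\end{aligned}
$$

Taking the real part: $\displaystyle\int e^x\cos x\,dx = \frac{1}{2}e^x(\cos x + \sin x)$

Hence: $\displaystyle\int e^x\cos x\,dx = \frac{1}{2}e^x(\cos x + \sin x) + c.$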


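• $\int e^{-x}\sin 2x\,dx$

$$
\begin{aligned}
\int e^{-x}\sin 2x\,dx &= \operatorname{Im}\int e^{-x}(\cos(2x) + i\sin(2x))\,dx \\
&= \operatorname{Im}\int e^{-x}\cdot e^{i2x}\,dx \\
&= \operatorname{Im}\int e^{-x+2ix}\,dx \\
&= \operatorname{Im}\int e^{x(-1+2i)}\,dx
\end{aligned}
$$

Now

$$
\begin{aligned}
\int e^{x(-1+2i)}\,dx &= \frac{1}{-1+2i}e^{x(-1+2i)} \\
&= \frac{1}{-1+2i}e^{-x}(\cos 2x + i\sin 2x) \\
&= \frac{-1-2i}{5}e^{-x}(\cos 2x + i\sin 2x)
\end{aligned}
$$

Taking the imaginary part: $\displaystyle\int e^{-x}\sin 2x\,dx = -\frac{2}{5}e^{-x}\cos 2x - \frac{1}{5}e^{-x}\sin 2x + c$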

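• $\int e^{3x}\cos x\,dx$

$$
\begin{aligned}
\int e^{3x}\cos x\,dx &= \operatorname{Re}\int e^{3x}(\cos x + i\sin x)\,dx \\
&= \operatorname{Re}\int e^{3x}\cdot e^{ix}\,dx \\
&= \operatorname{Re}\int e^{x(3+i)}\,dx
\end{aligned}
$$

Now

$$
\begin{aligned}
\int e^{x(3+i)}\,dx &= \frac{1}{3+i}e^{3x}(\cos x + i\sin x) \\
&= \frac{3-i}{10}e^{3x}(\cos x + i\sin x)
\end{aligned}
$$

Taking the real part: $\displaystyle\int e^{3x}\cos x\,dx = \frac{1}{10}e^{3x}(3\cos x + \sin x) + c$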
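The complex-exponential method translates almost directly into code. The sketch below, using the standard `cmath` module, evaluates $e^{(a+ib)x}/(a+ib)$ and takes its real or imaginary part; the function names are illustrative assumptions, and the constant of integration is omitted.

```python
import cmath
import math


def int_exp_cos(a, b, x):
    """An antiderivative of e^(ax) cos(bx): Re[ e^((a+ib)x) / (a+ib) ]."""
    return (cmath.exp(complex(a, b) * x) / complex(a, b)).real


def int_exp_sin(a, b, x):
    """An antiderivative of e^(ax) sin(bx): Im[ e^((a+ib)x) / (a+ib) ]."""
    return (cmath.exp(complex(a, b) * x) / complex(a, b)).imag


x = 0.8  # arbitrary sample point

# Compare with the closed forms obtained in the examples above.
print(int_exp_cos(1, 1, x), 0.5 * math.exp(x) * (math.cos(x) + math.sin(x)))
print(int_exp_sin(-1, 2, x), -math.exp(-x) * (2 * math.cos(2 * x) + math.sin(2 * x)) / 5)
print(int_exp_cos(3, 1, x), math.exp(3 * x) * (3 * math.cos(x) + math.sin(x)) / 10)
```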



ENG1091 Limits of Functions

Lecture 15 · Limit properties · 'Squeeze' Principle · L'Hopital's Rule

Definition: We write

$$
\lim_{x\to a} f(x) = L
$$

and say “the limit of f(x), as x approaches a, equals L” if we can make the values of f(x) arbitrarily close to L (as close to L as we like) by taking x to be sufficiently close to a (on either side of a) but not equal to a.

Notice the phrase “but not equal to a” (that is, $x \neq a$) in the definition. This means that in finding the limit of f(x) as x approaches a, we are not interested in the value of the function at x = a. In fact f(x) may not even be defined when x = a. The only thing that matters is how f(x) is defined near a.

Illustrative example: Use a calculator and guess the value of

$$
\lim_{x\to 0}\frac{\sin x}{x}.
$$
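The calculator experiment can be reproduced with a short script. The function name `ratio` and the sample values of x (in radians) are illustrative choices.

```python
import math


def ratio(x):
    """sin(x)/x for x != 0 (x in radians)."""
    return math.sin(x) / x


# Approach 0 from both sides and watch the values settle.
for x in (1.0, 0.1, 0.01, 0.001, 0.0001):
    print(f"{x:>8} {ratio(x):.10f} {ratio(-x):.10f}")
```

The printed values approach a single number from both sides, which is the guess the example asks for. Note that the function is never evaluated at x = 0 itself.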

The limit laws.

The limit laws are listed below. Essentially they allow 'common sense' manipulation of limit expressions, following normal algebraic operations, e.g. the limit of a sum is the same as the sum of its limits. It is important to note that these laws can only be applied when the combining functions have an existing limit.

Suppose that c is a constant and the limits $\lim_{x\to a} f(x)$ and $\lim_{x\to a} g(x)$ exist. Then

1. $\displaystyle\lim_{x\to a}[f(x) + g(x)] = \lim_{x\to a} f(x) + \lim_{x\to a} g(x)$

2. $\displaystyle\lim_{x\to a}[f(x) - g(x)] = \lim_{x\to a} f(x) - \lim_{x\to a} g(x)$

3. $\displaystyle\lim_{x\to a}[cf(x)] = c\lim_{x\to a} f(x)$

4. $\displaystyle\lim_{x\to a}[f(x)\times g(x)] = \lim_{x\to a} f(x)\times\lim_{x\to a} g(x)$

5. $\displaystyle\lim_{x\to a}\frac{f(x)}{g(x)} = \frac{\lim_{x\to a} f(x)}{\lim_{x\to a} g(x)}$ if $\displaystyle\lim_{x\to a} g(x) \neq 0$.

6. To evaluate limits we will make frequent use of the continuous function rule:

Suppose $\displaystyle\lim_{x\to a} g(x) = b$ and f is continuous at b.

Then $\displaystyle\lim_{x\to a} f(g(x)) = f\left(\lim_{x\to a} g(x)\right) = f(b)$

To make effective use of rule 6 we will take it as known that the elementary functions (polynomial, exponential, logarithmic, trigonometric and hyperbolic functions) are continuous on their respective domains.

Examples: (examples 1-4 are evaluated using the limit laws above)


1. Evaluate $\displaystyle\lim_{x\to 1}\frac{x^2(6x+3)(2x-7)}{(x^3+4)(x+17)}$.

$$
\begin{aligned}
\lim_{x\to 1}\frac{x^2(6x+3)(2x-7)}{(x^3+4)(x+17)}
&= \frac{\lim_{x\to 1} x^2(6x+3)(2x-7)}{\lim_{x\to 1}(x^3+4)(x+17)} && \text{by rule 5, since } \lim_{x\to 1}(x^3+4)(x+17) \neq 0 \\
&= \frac{1(9)(-5)}{(5)(18)} \\
&= -\frac{1}{2}.
\end{aligned}
$$

In example 1 we could have found the limit by merely substituting in the value x = 1. If we could always evaluate limits by doing this the concept of a limit would be superfluous. However the notion of a limit of a function f(x) as x → a is most useful when f(x) is undefined at x = a.

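2. Find $\displaystyle\lim_{x\to 1}\frac{\frac{1}{x} - x}{\frac{1}{x} - 1}$

$$
\begin{aligned}
\lim_{x\to 1}\frac{\frac{1}{x} - x}{\frac{1}{x} - 1}
&= \lim_{x\to 1}\frac{1 - x^2}{1 - x} \\
&= \lim_{x\to 1}\frac{(1 - x)(1 + x)}{1 - x} \\
&= \lim_{x\to 1}(1 + x) \\
&= 2
\end{aligned}
$$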

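3. Find $\displaystyle\lim_{x\to 0}\frac{\frac{1}{x+4} - \frac{1}{4}}{x}$

$$
\begin{aligned}
\lim_{x\to 0}\frac{\frac{1}{x+4} - \frac{1}{4}}{x}
&= \lim_{x\to 0}\frac{4 - (x+4)}{x(x+4)4} \\
&= \lim_{x\to 0}\frac{-x}{x(x+4)4} \\
&= \lim_{x\to 0}\frac{-1}{(x+4)4} \\
&= -1\div\lim_{x\to 0}(x+4)4 \\
&= -\frac{1}{16}
\end{aligned}
$$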


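4. Find $\displaystyle\lim_{t\to 0}\frac{\sqrt{2-t} - \sqrt{2}}{t}$.

$$
\begin{aligned}
\lim_{t\to 0}\frac{\sqrt{2-t} - \sqrt{2}}{t}
&= \lim_{t\to 0}\frac{\sqrt{2-t} - \sqrt{2}}{t}\times\frac{\sqrt{2-t} + \sqrt{2}}{\sqrt{2-t} + \sqrt{2}} \\
&= \lim_{t\to 0}\frac{2 - t - 2}{t\left(\sqrt{2-t} + \sqrt{2}\right)} \\
&= \lim_{t\to 0}\frac{-t}{t\left(\sqrt{2-t} + \sqrt{2}\right)} \\
&= \lim_{t\to 0}\frac{-1}{\sqrt{2-t} + \sqrt{2}} \\
&= -1\div\lim_{t\to 0}\left(\sqrt{2-t} + \sqrt{2}\right) \\
&= -\frac{1}{2\sqrt{2}}
\end{aligned}
$$

The last four examples demonstrate the use of algebra in evaluating limits. However in evaluating most limits the use of algebra alone will not be sufficient. The next technique we introduce is much more powerful than algebraic methods.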


Indeterminate forms and L’Hopital’s rule

Applying the limit techniques (particularly direct substitution) discussed earlier can often lead to 'meaningless' expressions of the type $\frac{0}{0}$ or $\frac{\infty}{\infty}$. These are called indeterminate forms, since they have not correctly determined the true limit value.

However, if we 'zoom in' around x = a for 2 functions f and g, such that f(a) = g(a) = 0, we can see that the value of $\dfrac{f(x)}{g(x)} \approx \dfrac{f'(x)}{g'(x)}$.

[Figure: graphs of f and g near x = a, where both functions are zero]

This forms the basis of a rule known as L'Hopital's Rule: Suppose f and g are differentiable, with f(a) = g(a) = 0. If f′ and g′ are continuous (but $g'(x) \neq 0$ near a), then

$$
\lim_{x\to a}\frac{f(x)}{g(x)} = \lim_{x\to a}\frac{f'(x)}{g'(x)}.
$$

This rule can be applied for two-sided and one-sided limits, approaching a fixed value a or ±∞, which give the indeterminate form $\frac{0}{0}$ or $\frac{\infty}{\infty}$. To reduce expressions to a meaningful term, it may be necessary to apply L'Hopital's Rule two or more times.

Examples

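$\displaystyle\lim_{x\to 0}\frac{\sin 2x}{x}$

$$
\begin{aligned}
\lim_{x\to 0}\frac{\sin 2x}{x} & && \text{is of the form '}\tfrac{0}{0}\text{' so that L'Hopital's rule may be applied:} \\
&= \lim_{x\to 0}\frac{2\cos(2x)}{1} \\
&= \frac{2}{1} = 2
\end{aligned}
$$

$\displaystyle\lim_{x\to\infty}\frac{\ln x}{x}$

$$
\begin{aligned}
\lim_{x\to\infty}\frac{\ln x}{x} & && \text{is of the form '}\tfrac{\infty}{\infty}\text{' so that L'Hopital's rule may be applied:} \\
&= \lim_{x\to\infty}\frac{\left(\frac{1}{x}\right)}{1} \\
&= \lim_{x\to\infty}\frac{1}{x} = 0
\end{aligned}
$$

$\displaystyle\lim_{x\to 0}\frac{x - \sin x}{x^3}$

$$
\begin{aligned}
\lim_{x\to 0}\frac{x - \sin x}{x^3} & && \text{is of the form '}\tfrac{0}{0}\text{' so that L'Hopital's rule may be applied:} \\
&= \lim_{x\to 0}\frac{1 - \cos x}{3x^2} && \text{is of the form '}\tfrac{0}{0}\text{' so that L'Hopital's rule may be applied again} \\
&= \lim_{x\to 0}\frac{\sin x}{6x} && \text{is of the form '}\tfrac{0}{0}\text{'} \\
&= \lim_{x\to 0}\frac{\cos x}{6} \\
&= \frac{1}{6}
\end{aligned}
$$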
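A limit found by repeated use of L'Hopital's rule can be sanity-checked by evaluating the original quotient near the limit point. This is a sketch only; the name `quotient` is illustrative, and very small x is avoided because x − sin x then suffers from floating-point cancellation.

```python
import math


def quotient(x):
    """(x - sin x) / x^3 for x != 0."""
    return (x - math.sin(x)) / x**3


for x in (0.5, 0.1, 0.01, 0.001):
    print(x, quotient(x))

print(1 / 6)  # the value obtained with L'Hopital's rule
```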

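There are other types of indeterminate forms, involving combinations of 0 and ∞, dealt with as follows:

Indeterminate Product 0 · ∞

If $\lim_{x\to a} f(x)g(x) = 0\cdot\infty$, re-arrange $f(x)g(x)$ to $\dfrac{f(x)}{1/g(x)}$, then apply L'H Rule.

$\displaystyle\lim_{x\to 0^+} x\ln x$

$$
\begin{aligned}
\lim_{x\to 0^+} x\ln x & && \text{is of the form '}0\cdot\infty\text{' so that some rearrangement is necessary} \\
&= \lim_{x\to 0^+}\frac{\ln x}{(1/x)} && \text{is now of the form '}\tfrac{\infty}{\infty}\text{' so that L'Hopital's rule may be applied} \\
&= \lim_{x\to 0^+}\frac{x^{-1}}{-x^{-2}} \\
&= \lim_{x\to 0^+}\left(-\frac{x^2}{x}\right) \\
&= \lim_{x\to 0^+}(-x) = 0
\end{aligned}
$$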

Indeterminate Difference ∞ − ∞

If $\lim_{x\to a}[f(x)-g(x)]$ has the form ∞ − ∞, convert the expression to a single fraction, using common denominators, factorisation, or rationalisation, to produce a $\frac{0}{0}$ or $\frac{\infty}{\infty}$ form. Then apply L’Hopital’s Rule.

Examples

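$\lim_{x\to 0}\left[\dfrac{1}{x}-\dfrac{1}{\sin x}\right]$ is of the form ‘∞ − ∞’ so that some rearrangement is necessary:

\[\begin{aligned}
\lim_{x\to 0}\left[\frac{1}{x}-\frac{1}{\sin x}\right] &= \lim_{x\to 0}\frac{\sin x - x}{x\sin x} && \text{now of the form } \tfrac{0}{0}\text{, so L’Hopital’s rule may be applied}\\
&= \lim_{x\to 0}\frac{\cos x - 1}{\sin x + x\cos x} && \text{a } \tfrac{0}{0} \text{ form}\\
&= \lim_{x\to 0}\frac{-\sin x}{\cos x - x\sin x + \cos x} && \text{applying L’Hopital’s rule}\\
&= \frac{\lim_{x\to 0}(-\sin x)}{\lim_{x\to 0}(\cos x - x\sin x + \cos x)} && \text{applying rule (5)}\\
&= \frac{0}{2} = 0.
\end{aligned}\]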


Indeterminate Powers $0^0$, $1^\infty$, $\infty^0$.

For these indeterminate forms, begin with $y = f(x)^{g(x)}$.

Take the natural logarithm of both sides, to give $\ln y = g(x)\ln f(x)$.

The indeterminate product 0 · ∞ which results is then re-arranged as above.

Lastly, if $\lim_{x\to a}\ln y$ has been found to be L, then $\lim_{x\to a} y = e^{L}$.

Examples

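$\lim_{x\to 0^+} x^x$ is a ‘$0^0$’ form.

Suppose for the moment that $\lim_{x\to 0^+} x^x$ exists, so let $\lim_{x\to 0^+} x^x = L$. Then

\[ \ln\left(\lim_{x\to 0^+} x^x\right) = \ln L. \]

Since ln is continuous, $\ln\left(\lim_{x\to 0^+} x^x\right) = \lim_{x\to 0^+}\ln(x^x)$, so $\lim_{x\to 0^+}\ln(x^x) = \ln L$.

Now $\ln(x^x) = x\ln x$ and $\lim_{x\to 0^+} x\ln x = 0$ from the example above, hence $\ln L = 0$, giving $L = e^0 = 1$.

Hence $\lim_{x\to 0^+} x^x = 1$.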

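$\lim_{x\to 0^+}\left(1+\dfrac{2}{x}\right)^x$ is an ‘$\infty^0$’ form. (The limit is taken from the right, since $1+\frac{2}{x}$ is negative for small negative x.)

Suppose for the moment that the limit exists, so let $\lim_{x\to 0^+}\left(1+\frac{2}{x}\right)^x = L$. Then

\[ \ln\left(\lim_{x\to 0^+}\left(1+\frac{2}{x}\right)^x\right) = \ln L. \]

Since ln is continuous, $\ln\left(\lim_{x\to 0^+}\left(1+\frac{2}{x}\right)^x\right) = \lim_{x\to 0^+}\ln\left(1+\frac{2}{x}\right)^x$, so $\lim_{x\to 0^+}\ln\left(1+\frac{2}{x}\right)^x = \ln L$.

Now

\[\begin{aligned}
\lim_{x\to 0^+}\ln\left(1+\frac{2}{x}\right)^x &= \lim_{x\to 0^+} x\ln\left(1+\frac{2}{x}\right) && \text{a } 0\cdot\infty \text{ form}\\
&= \lim_{x\to 0^+}\frac{\ln\left(1+\frac{2}{x}\right)}{1/x} && \text{a } \tfrac{\infty}{\infty} \text{ form}\\
&= \lim_{x\to 0^+}\frac{\dfrac{1}{1+\frac{2}{x}}\cdot\left(-\dfrac{2}{x^2}\right)}{-\dfrac{1}{x^2}} && \text{applying L’Hopital’s rule}\\
&= \lim_{x\to 0^+}\frac{2}{1+\frac{2}{x}} = \lim_{x\to 0^+}\frac{2x}{x+2}\\
&= 0.
\end{aligned}\]

So $\ln L = 0$, giving $L = e^0 = 1$.

Hence $\lim_{x\to 0^+}\left(1+\dfrac{2}{x}\right)^x = 1$.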


The squeeze theorem

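If a function g(x) is ‘trapped’ between two other functions f and h such that f(x) ≤ g(x) ≤ h(x) near x = a, and $\lim_{x\to a} f(x) = \lim_{x\to a} h(x) = L$, then $\lim_{x\to a} g(x) = L$.

[Figure: graphs of f, g and h against x, with g lying between f and h and all three approaching L at x = a.]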

We can sometimes use this to evaluate limits of expressions where Limit Laws cannot successfully

be applied:

Example:

Show that $\lim_{x\to 0} x\sin\dfrac{1}{x} = 0$.

This limit cannot be found by finding $\lim_{x\to 0} x$ and $\lim_{x\to 0}\sin\frac{1}{x}$ and multiplying them together, since $\lim_{x\to 0}\sin\frac{1}{x}$ does not exist.

[Figure: graph of y = sin(1/x).]

We know that $-1 \le \sin\frac{1}{x} \le 1$, so we can introduce a ‘squeeze’ situation by using

\[ -|x| \le x\sin\frac{1}{x} \le |x|. \]

[Figure: graph of y = x sin(1/x).]

Now $\lim_{x\to 0}|x| = 0$ and $\lim_{x\to 0}(-|x|) = 0$, so we have $\lim_{x\to 0} x\sin\frac{1}{x} = 0$.

Example: A very common limit encountered by engineering students is

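\[ \lim_{x\to\infty} e^{-x}\sin x \]

Solution: We know that $-1 \le \sin x \le 1$, so we can form the ‘squeeze’ using

\[ -e^{-x} \le e^{-x}\sin x \le e^{-x}. \]

Now $\lim_{x\to\infty} e^{-x} = 0$ and similarly $\lim_{x\to\infty}\left(-e^{-x}\right) = 0$, hence $\lim_{x\to\infty} e^{-x}\sin x = 0$.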
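The squeeze in this example can be tabulated numerically. This is a small illustrative sketch: the function names and the sample points are choices made here, not part of the notes.

```python
import math

def squeezed(x):
    """The function being squeezed: e^(-x) sin x."""
    return math.exp(-x) * math.sin(x)

def bound(x):
    """The upper envelope e^(-x); the lower envelope is its negative."""
    return math.exp(-x)

samples = [1.0, 5.0, 10.0, 20.0, 40.0]
rows = [(x, -bound(x), squeezed(x), bound(x)) for x in samples]

for x, lower, middle, upper in rows:
    print(f"x = {x:5.1f}   {lower: .3e} <= {middle: .3e} <= {upper: .3e}")
```

At every sample point the middle value sits between the two envelopes, and both envelopes shrink towards 0 as x grows.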




Lecture 16 · Improper integrals

Text Reference: §9.2

Example: What is wrong with the following calculation?

\[ \int_{-1}^{1}\frac{1}{x^2}\,dx = \left[-x^{-1}\right]_{-1}^{1} = -1 - 1 = -2 \]

ANS: The function $f(x) = \frac{1}{x^2}$ is always positive. The definite integral of a positive function can never be negative. (Definite integrals give the ‘signed’ area between a curve and the x-axis. For a curve which is always positive this signed area must also be positive.)

We have applied the Fundamental Theorem of Calculus in circumstances where we were

not entitled to do so.

The Fundamental Theorem of Calculus which enables us to evaluate a definite integral by

taking an antiderivative of the integrand requires that the integrand be continuous over a finite

domain of integration [a, b].

The function $\frac{1}{x^2}$ is not continuous on the domain [−1, 1]. (In fact of course it is not even defined at x = 0, which lies in [−1, 1].)

If we break the integral up we obtain

\[ \int_{-1}^{1}\frac{1}{x^2}\,dx = \int_{0}^{1}\frac{1}{x^2}\,dx + \int_{-1}^{0}\frac{1}{x^2}\,dx. \]

However this introduces a new problem. The integrands in both these integrals are not Riemann integrable because they are not bounded. (The function $\frac{1}{x^2}$ is unbounded near x = 0.)

We can extend the theory of Riemann integration by introducing ‘improper integrals’.

There are two types of improper integrals:

• an expression like $\int_{1}^{\infty} e^{x}\,dx$ is improper because the domain of integration, in this case [1, ∞), is not bounded;

• expressions like $\int_{0}^{1}\frac{1}{x^2}\,dx$ where the range of the integrand is unbounded on the interval of integration. (In this case the function $\frac{1}{x^2}$ is unbounded on (0, 1].)

When the domain of integration is not finite we have a Type 1 improper integral.

When the integrand is unbounded at a particular point, but continuous elsewhere, we have

a Type 2 improper integral.

Type 1: Infinite intervals

For these integrals, we are attempting to find the area of an ‘infinite space’. To do this, we

evaluate the definite integral over a finite interval, and investigate the limit of the integral as the

interval is extended.


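Example

$\int_{1}^{\infty}\frac{1}{x^2}\,dx$ [geometrically this is the area under the curve $y = \frac{1}{x^2}$ to the right of x = 1].

\[\begin{aligned}
\int_{1}^{\infty}\frac{1}{x^2}\,dx &= \lim_{t\to\infty}\int_{1}^{t}\frac{1}{x^2}\,dx && \text{(diagram)}\\
&= \lim_{t\to\infty}\left[-x^{-1}\right]_{1}^{t}\\
&= \lim_{t\to\infty}\left(-\frac{1}{t}\right) + 1\\
&= 1.
\end{aligned}\]

We say that $\int_{1}^{\infty}\frac{1}{x^2}\,dx$ is convergent.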

We use the following definitions to evaluate these integrals:

To define $\int_{a}^{\infty} f(x)\,dx$ we require two things:

(i) that $\int_{a}^{t} f(x)\,dx$ exists for every number t ≥ a;

(ii) that $\lim_{t\to\infty}\int_{a}^{t} f(x)\,dx$ exists and is finite.

We then say that $\int_{a}^{\infty} f(x)\,dx$ converges.

Provided these two conditions are satisfied we define

\[ \int_{a}^{\infty} f(x)\,dx = \lim_{t\to\infty}\int_{a}^{t} f(x)\,dx. \]

A similar statement can be made regarding the definition of $\int_{-\infty}^{a} f(x)\,dx$.

The integral $\int_{-\infty}^{\infty} f(x)\,dx$ is also considered a Type 1 integral. We define

\[ \int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{0} f(x)\,dx + \int_{0}^{\infty} f(x)\,dx \]

provided the two improper integrals on the right are convergent independently.

Note: In each of these cases, if the limit exists and is finite, we say that the improper integral is convergent and that the limit is the value of the improper integral. If the limit fails to exist, the improper integral is divergent.

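Example

Determine if $\int_{0}^{\infty} e^{-2x}\,dx$ is convergent or divergent.

\[\begin{aligned}
\int_{0}^{\infty} e^{-2x}\,dx &= \lim_{t\to\infty}\int_{0}^{t} e^{-2x}\,dx\\
&= \lim_{t\to\infty}\left[-\frac{1}{2}e^{-2x}\right]_{0}^{t}\\
&= \lim_{t\to\infty}\left(-\frac{1}{2}e^{-2t}\right) + \frac{1}{2}\\
&= \frac{1}{2}. \quad\text{(The integral is convergent.)}
\end{aligned}\]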


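Example

Determine if $\int_{1}^{\infty}\frac{1}{x}\,dx$ is convergent or divergent.

\[\begin{aligned}
\int_{1}^{\infty}\frac{1}{x}\,dx &= \lim_{t\to\infty}\int_{1}^{t}\frac{1}{x}\,dx\\
&= \lim_{t\to\infty}\left[\log_e x\right]_{1}^{t}\\
&= \lim_{t\to\infty}(\log_e t) - 0\\
&= \infty \quad\text{since } \log_e x \text{ is an unbounded function as } x\to\infty.
\end{aligned}\]

This means the integral $\int_{1}^{\infty}\frac{1}{x}\,dx$ diverges.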

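Example

For what values of p is $\int_{1}^{\infty}\frac{1}{x^p}\,dx$ convergent?

The case where p = 1 was considered in the previous example, so we know that $\int_{1}^{\infty}\frac{1}{x^p}\,dx$ diverges when p = 1.

Now consider what happens if p ≠ 1:

\[\begin{aligned}
\int_{1}^{\infty}\frac{1}{x^p}\,dx &= \lim_{t\to\infty}\int_{1}^{t}\frac{1}{x^p}\,dx\\
&= \lim_{t\to\infty}\left[\frac{1}{1-p}x^{-p+1}\right]_{1}^{t} && \text{provided } p \ne 1\\
&= \lim_{t\to\infty}\left(\frac{1}{1-p}t^{-p+1}\right) - \frac{1}{1-p}.
\end{aligned}\]

Now $\frac{1}{1-p}t^{-p+1} \to \infty$ as t → ∞ if p < 1, while $\frac{1}{1-p}t^{-p+1} \to 0$ as t → ∞ if p > 1.

We conclude $\int_{1}^{\infty}\frac{1}{x^p}\,dx$ converges (to the value $\frac{1}{p-1}$) if p > 1 and diverges if p ≤ 1.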
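The behaviour of the integral from 1 to t as t grows can be tabulated directly from the antiderivative. The sketch below uses a hypothetical helper `integral_1_to_t` and a handful of illustrative values of p and t.

```python
import math

def integral_1_to_t(p, t):
    """Closed form of the integral of x**(-p) from 1 to t, for t >= 1."""
    if p == 1:
        return math.log(t)
    return (t ** (1 - p) - 1) / (1 - p)

for p in (0.5, 1.0, 1.5, 2.0):
    values = [integral_1_to_t(p, 10.0 ** k) for k in (1, 3, 6)]
    formatted = ", ".join(f"{v:.4f}" for v in values)
    print(f"p = {p}: t = 1e1, 1e3, 1e6 -> {formatted}")
```

For p = 0.5 and p = 1 the values keep growing as t increases, while for p = 1.5 and p = 2 they settle near 2 and 1, which is 1/(p − 1) in each case.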

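Example

Evaluate $\int_{-\infty}^{\infty}\frac{1}{1+x^2}\,dx$ or else explain why the integral diverges.

If $\int_{-\infty}^{\infty}\frac{1}{1+x^2}\,dx$ is to converge we require the (independent) convergence of both $\int_{0}^{\infty}\frac{1}{1+x^2}\,dx$ and $\int_{-\infty}^{0}\frac{1}{1+x^2}\,dx$.

Now

\[\begin{aligned}
\int_{0}^{\infty}\frac{1}{1+x^2}\,dx &= \lim_{t\to\infty}\int_{0}^{t}\frac{1}{1+x^2}\,dx\\
&= \lim_{t\to\infty}\left[\tan^{-1}x\right]_{0}^{t}\\
&= \lim_{t\to\infty}\left(\tan^{-1}t\right) - \tan^{-1}(0)\\
&= \frac{\pi}{2} - 0 = \frac{\pi}{2},
\end{aligned}\]

while

\[\begin{aligned}
\int_{-\infty}^{0}\frac{1}{1+x^2}\,dx &= \lim_{t\to-\infty}\int_{t}^{0}\frac{1}{1+x^2}\,dx\\
&= \lim_{t\to-\infty}\left[\tan^{-1}x\right]_{t}^{0}\\
&= 0 - \lim_{t\to-\infty}\left(\tan^{-1}t\right)\\
&= 0 - \left(-\frac{\pi}{2}\right) = \frac{\pi}{2}.
\end{aligned}\]

So both $\int_{0}^{\infty}\frac{1}{1+x^2}\,dx$ and $\int_{-\infty}^{0}\frac{1}{1+x^2}\,dx$ converge, and we have

\[ \int_{-\infty}^{\infty}\frac{1}{1+x^2}\,dx = \int_{0}^{\infty}\frac{1}{1+x^2}\,dx + \int_{-\infty}^{0}\frac{1}{1+x^2}\,dx = \pi. \]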

Type 2 - integrand unbounded at a single point

Suppose f is a function continuous on [a, b) but is not bounded at x = b, that is, $\lim_{x\to b^-} f(x) = \infty$ or $-\infty$.

Provided $\lim_{t\to b^-}\int_{a}^{t} f(x)\,dx$ exists, we define

\[ \int_{a}^{b} f(x)\,dx = \lim_{t\to b^-}\int_{a}^{t} f(x)\,dx. \]

The analogous definition can be made when f is not bounded at a:

Suppose f is a function continuous on (a, b] but is not bounded at x = a, that is, $\lim_{x\to a^+} f(x) = \infty$ or $-\infty$.

Provided $\lim_{t\to a^+}\int_{t}^{b} f(x)\,dx$ exists, we define

\[ \int_{a}^{b} f(x)\,dx = \lim_{t\to a^+}\int_{t}^{b} f(x)\,dx. \]

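Now we see why we have the apparent contradiction in the example $\int_{-1}^{1}\frac{1}{x^2}\,dx$.

The integral $\int_{-1}^{1}\frac{1}{x^2}\,dx$ is undefined because neither $\int_{0}^{1}\frac{1}{x^2}\,dx$ nor $\int_{-1}^{0}\frac{1}{x^2}\,dx$ exists.

(The failure of just one of these limits to exist results in the integral being undefined.)

\[\begin{aligned}
\int_{0}^{1}\frac{1}{x^2}\,dx &= \lim_{t\to 0^+}\int_{t}^{1}\frac{1}{x^2}\,dx\\
&= \lim_{t\to 0^+}\left[-x^{-1}\right]_{t}^{1}\\
&= \lim_{t\to 0^+}\left(-1 + \frac{1}{t}\right) = \infty. \quad\text{(The integral is divergent.)}
\end{aligned}\]

Similarly $\int_{-1}^{0}\frac{1}{x^2}\,dx$ diverges.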


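Example

Is the area under the curve $y = \frac{1}{\sqrt{x}}$ from x = 0 to x = 1 finite? If so, what is it?

Solution: The area, if it exists, is given by $\int_{0}^{1}\frac{1}{\sqrt{x}}\,dx$. This integral is improper since the integrand is unbounded at x = 0.

\[\begin{aligned}
\int_{0}^{1}\frac{1}{\sqrt{x}}\,dx &= \lim_{t\to 0^+}\int_{t}^{1}\frac{1}{\sqrt{x}}\,dx\\
&= \lim_{t\to 0^+}\left[2x^{1/2}\right]_{t}^{1}\\
&= \lim_{t\to 0^+}\left(2 - 2\sqrt{t}\right)\\
&= 2.
\end{aligned}\]

The area under the curve is finite and is equal to 2 square units.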

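Examples: Evaluate each of the following when they exist and explain the situation otherwise:

Find $\int_{0}^{1}\frac{1}{\sqrt{1-x^2}}\,dx$.

\[\begin{aligned}
\int_{0}^{1}\frac{1}{\sqrt{1-x^2}}\,dx &= \lim_{t\to 1^-}\int_{0}^{t}\frac{1}{\sqrt{1-x^2}}\,dx\\
&= \lim_{t\to 1^-}\left[\sin^{-1}x\right]_{0}^{t}\\
&= \lim_{t\to 1^-}\left(\sin^{-1}t - 0\right) = \sin^{-1}(1)\\
&= \pi/2.
\end{aligned}\]

Find $\int_{0}^{e}\ln x\,dx$.

\[\begin{aligned}
\int_{0}^{e}\ln x\,dx &= \lim_{t\to 0^+}\int_{t}^{e}\ln x\,dx && \text{(diagram)}\\
&= \lim_{t\to 0^+}\left[x\ln x - x\right]_{t}^{e} && \text{(see lecture 14: } \textstyle\int \ln x\,dx = x\ln x - x\text{)}\\
&= e\ln e - e - \lim_{t\to 0^+}(t\ln t - t)\\
&= e - e - 0 && \text{since } \lim_{t\to 0^+}(t\ln t) = 0\\
&= 0.
\end{aligned}\]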


The Comparison Test for Improper Integrals allows us to discuss the convergence of an

improper integral without evaluating it directly, by comparing it to a known or easier integral.

If f and g are continuous functions, where f(x) ≥ g(x) ≥ 0 for x ≥ a, then

1. $\int_{a}^{\infty} g(x)\,dx$ is convergent if $\int_{a}^{\infty} f(x)\,dx$ is convergent.

2. $\int_{a}^{\infty} f(x)\,dx$ is divergent if $\int_{a}^{\infty} g(x)\,dx$ is divergent.

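Example

Show that $\int_{1}^{\infty} e^{-x^2}\,dx$ is convergent. (This integral cannot be evaluated by elementary means since the antiderivative of $e^{-x^2}$ is not an elementary function.)

Solution: We compare the integrand $e^{-x^2}$ with $e^{-x}$.

Since $x^2 \ge x$ for all x ≥ 1 we have $\frac{1}{e^{x^2}} \le \frac{1}{e^{x}}$, i.e. $e^{-x^2} \le e^{-x}$ for all x ≥ 1 (in fact $e^{-x^2}$ approaches 0 at a very much faster rate than does $e^{-x}$).

So, using the comparison test, $\int_{1}^{\infty} e^{-x^2}\,dx$ converges if we can show $\int_{1}^{\infty} e^{-x}\,dx$ converges.

\[\begin{aligned}
\int_{1}^{\infty} e^{-x}\,dx &= \lim_{t\to\infty}\int_{1}^{t} e^{-x}\,dx\\
&= \lim_{t\to\infty}\left[-e^{-x}\right]_{1}^{t}\\
&= \lim_{t\to\infty}\left(-e^{-t}\right) + e^{-1}.
\end{aligned}\]

Now $\lim_{t\to\infty}\left(-e^{-t}\right)$ exists, in fact it is zero, and hence $\int_{1}^{\infty} e^{-x}\,dx$ converges to the value $e^{-1}$.

Thus $\int_{1}^{\infty} e^{-x^2}\,dx$ also converges. Its value (whatever it might be) is a number less than $e^{-1}$.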
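The comparison can be illustrated numerically. The sketch below is only an approximation under stated assumptions: it uses a simple midpoint rule (the helper `midpoint_integral` is written here for illustration) and truncates both integrals at x = 10, beyond which both integrands are negligibly small.

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

upper = 10.0  # assumption: the tails beyond x = 10 are negligible

gauss_part = midpoint_integral(lambda x: math.exp(-x * x), 1.0, upper)
exp_part = midpoint_integral(lambda x: math.exp(-x), 1.0, upper)

print(f"integral of exp(-x^2) on [1, {upper}] ~ {gauss_part:.6f}")
print(f"integral of exp(-x)   on [1, {upper}] ~ {exp_part:.6f}")
print(f"exp(-1)                            = {math.exp(-1):.6f}")
```

The first value comes out well below the second, which in turn sits just below $e^{-1}$, consistent with the bound obtained from the comparison test.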




Lecture 17 · slab and washer methods · shell method

Text Reference: §8.9.1

Volumes of solids

Most regular solids have a ‘formula’ to use to calculate their volume,

e.g. volume (sphere) = $\frac{4}{3}\pi r^3$,

volume (cone) = $\frac{1}{3}\pi r^2 h$, etc.

Where do these formulae come from, and how do we find volumes of other solids?

A shape is positioned along co-ordinate axes

and a representative slice is used for the cross-

sectional area.

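The width of this slice is taken as ∆x and thus the volume of a typical slice is

\[ \Delta V = A(x)\,\Delta x. \]

Summing the slices between x = a and x = b and letting ∆x → 0 gives

\[ V = \int_{a}^{b} A(x)\,dx. \]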

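Example 1 Find the volume of a sphere of radius r with centre at the origin.

\[ \Delta V = A(x)\,\Delta x = \pi\,[y(x)]^2\,\Delta x \]

So that

\[\begin{aligned}
V &= \int_{-r}^{r}\pi\left(\sqrt{r^2-x^2}\right)^2 dx\\
&= 2\pi\int_{0}^{r}\left(r^2-x^2\right)dx && \text{since } \sqrt{r^2-x^2} \text{ is an even function}\\
&= 2\pi\left[r^2x - \frac{1}{3}x^3\right]_{0}^{r}\\
&= 2\pi\left(r^3 - \frac{1}{3}r^3\right) = \frac{4}{3}\pi r^3.
\end{aligned}\]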

Slab method:

The sphere is an example of a solid of revolution. These are formed when a region of the Cartesian plane (in this case the region bounded by the x-axis and the upper half of the circle centred at the origin and of radius r) is rotated about the x-axis. The cross-section of a typical slice is then in the shape of a disk, and being circular has area

A = π(radius)² = π[f(x)]², where f(x) is the height of each slice above the x-axis and therefore the radius of each slab.

Thus, for the volume of a solid of revolution bounded by the x-axis, y = f(x), x = a and x = b, we have

\[ V = \int_{a}^{b}\pi\,[f(x)]^2\,dx. \]

Washer method

The volume formed by rotation around the x-axis of an area between two curves y = f(x) and y = g(x), with f(x) ≥ g(x) ≥ 0, can often be determined by using the washer method. For this we use

\[ V = \int_{a}^{b}\pi\left[f(x)^2 - g(x)^2\right]dx. \]

The shape created will be a washer, sitting perpendicular to the x -axis.

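Example 2 Find the volume of the solid formed when the region bounded by y = x and y = x² is rotated through 2π radians about the x-axis.

\[ \Delta V = A(x)\,\Delta x = \pi\left([f(x)]^2 - [g(x)]^2\right)\Delta x \]

\[\begin{aligned}
V &= \int_{a}^{b}\pi\left[f(x)^2 - g(x)^2\right]dx\\
&= \pi\int_{0}^{1}\left(x^2 - \left(x^2\right)^2\right)dx\\
&= \pi\left[\frac{1}{3}x^3 - \frac{1}{5}x^5\right]_{0}^{1}\\
&= \pi\left(\frac{1}{3} - \frac{1}{5}\right) = \frac{2\pi}{15}.
\end{aligned}\]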

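Example 3 Find the volume of the solid formed when the region bounded by y = x and y = x² is rotated through 2π radians about the y-axis.

\[ \Delta V = A(y)\,\Delta y = \pi\left([x_2(y)]^2 - [x_1(y)]^2\right)\Delta y \]

The y terminals are y = 0 and y = 1. The outer radius $x_2$ comes from y = x², that is $x_2(y) = \sqrt{y}$, and the inner radius is $x_1(y) = y$.

\[\begin{aligned}
V &= \pi\int_{0}^{1}\left[(\sqrt{y})^2 - y^2\right]dy\\
&= \pi\int_{0}^{1}\left(y - y^2\right)dy\\
&= \pi\left[\frac{1}{2}y^2 - \frac{1}{3}y^3\right]_{0}^{1}\\
&= \pi\left(\frac{1}{2} - \frac{1}{3}\right) = \frac{\pi}{6}.
\end{aligned}\]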


Shell method:

In finding the volume of a solid of revolution which has been rotated about the y-axis, it may

sometimes be more useful to find the volume using cylindrical (hollow) shells, where the shells

will be thin with axis the y-axis.

We use the fact that the shell opens to give a flat rectangular solid, where

\[ \Delta V = \text{length}\times\text{height}\times\text{thickness} = 2\pi x\cdot f(x)\cdot\Delta x \]

to arrive at the expression for the total volume of the solid of revolution

\[ V = \int_{a}^{b} 2\pi\times(\text{shell radius})\times(\text{shell height})\,dx = \int_{a}^{b} 2\pi x f(x)\,dx. \]

To use the Shell Method:

1. Draw the diagram, including a line to represent the radius perpendicular to the axis of

revolution.

2. Find the limits of integration, along the required axis of revolution.

3. Integrate the product 2 π (shell radius) (shell height) to give the total volume.

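Example 4 Find the volume of the solid obtained by rotating about the y-axis the region bounded by y = x(x − 1)² and y = 0. (To attempt this example using the washer method would be almost impossible.)

\[ \Delta V = 2\pi x\cdot x(x-1)^2\,\Delta x \]

\[\begin{aligned}
V &= 2\pi\int_{0}^{1} x^2(x-1)^2\,dx\\
&= 2\pi\int_{0}^{1}\left(x^4 - 2x^3 + x^2\right)dx\\
&= 2\pi\left[\frac{1}{5}x^5 - \frac{1}{2}x^4 + \frac{1}{3}x^3\right]_{0}^{1}\\
&= 2\pi\left(\frac{1}{5} - \frac{1}{2} + \frac{1}{3}\right)\\
&= \frac{\pi}{15}.
\end{aligned}\]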

[Figure: plot accompanying Example 4, with axes x and y.]

Example 5 (Example 3 again but this time via the shell method): Find the volume of the solid formed when the region bounded by y = x and y = x² is rotated through 2π radians about the y-axis.

\[ \Delta V = 2\pi x\left(x - x^2\right)\Delta x \]

\[\begin{aligned}
V &= 2\pi\int_{0}^{1} x\left(x - x^2\right)dx\\
&= 2\pi\left[\frac{1}{3}x^3 - \frac{1}{4}x^4\right]_{0}^{1}\\
&= 2\pi\cdot\frac{1}{12}\\
&= \frac{\pi}{6}, \quad\text{the same as that obtained previously.}
\end{aligned}\]

The answers obtained by either method are identical, but the shell method avoids the use of squaring.
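The agreement between the washer and shell calculations for this region can be checked numerically. The sketch below approximates both integrals with a midpoint rule; the helper `midpoint_integral` and the number of rectangles are illustrative choices, not part of the notes.

```python
import math

def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Washer method about the y-axis: outer radius sqrt(y), inner radius y.
washer = math.pi * midpoint_integral(
    lambda y: math.sqrt(y) ** 2 - y ** 2, 0.0, 1.0)

# Shell method about the y-axis: shell radius x, shell height x - x^2.
shell = 2 * math.pi * midpoint_integral(
    lambda x: x * (x - x ** 2), 0.0, 1.0)

print(f"washer ~ {washer:.8f}")
print(f"shell  ~ {shell:.8f}")
print(f"pi/6   = {math.pi / 6:.8f}")
```

Both approximations agree with π/6 to several decimal places, as expected since they measure the same solid.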

Example 6 Find the volume of the solid generated when the region bounded by $y = \frac{1}{x}$, y = 0, x = 1 and x = 10 is rotated about the y-axis, using cylindrical shells.

\[ \Delta V = 2\pi x\left(\frac{1}{x} - 0\right)\Delta x = 2\pi\,\Delta x \]

\[ V = 2\pi\int_{1}^{10} 1\,dx = 2\pi\cdot 9 = 18\pi \]

The next example shows that the shell method can also be used to find volumes of revolution

about the x axis.


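Example 7 The region bounded by $y = \sqrt{x}$, the x-axis, and the line x = 4 is revolved about the x-axis to generate a solid. Find its volume using shells.

\[ \Delta V = 2\pi y\,(4 - x)\,\Delta y = 2\pi y\left(4 - y^2\right)\Delta y \]

\[\begin{aligned}
V &= 2\pi\int_{0}^{2} y\left(4 - y^2\right)dy && \text{note use of y values as terminals}\\
&= 2\pi\left[2y^2 - \frac{1}{4}y^4\right]_{0}^{2}\\
&= 8\pi.
\end{aligned}\]

Here the shell method is more complicated than the washer method: $V = \pi\int_{0}^{4}\left(\sqrt{x}\right)^2 dx = 8\pi$.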



ENG1091 Sequences and Series

Lecture 18 · sequences · limits of sequences

Text Reference: §7.1, 7.2, 7.5

1. Definition: An infinite sequence is a special kind of function whose domain is a set of integers extending from some starting integer (usually 1) and then continuing indefinitely. The sequence $a_1, a_2, a_3, a_4, \ldots$ is the ordered list of function values of a function a where $a(n) = a_n$ at each positive integer n. We usually specify a sequence by giving its general term, the formula for $a_n$.

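2. Examples:

(a) $a_n = \left(-\frac{1}{2}\right)^n$: $\ -\frac{1}{2}, \frac{1}{4}, -\frac{1}{8}, \frac{1}{16}, \ldots$

(b) $a_n = \frac{n-1}{n}$: $\ 0, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \ldots$

(c) $a_n = (-1)^{n-1}$: $\ 1, -1, 1, -1, \ldots$

(d) $a_n = \frac{n^2}{2^n}$: $\ \frac{1}{2}, 1, \frac{9}{8}, 1, \frac{25}{32}, \frac{9}{16}, \ldots$

(e) $a_n = \frac{\cos\frac{n\pi}{2}}{n}$: $\ 0, -\frac{1}{2}, 0, \frac{1}{4}, 0, -\frac{1}{6}, \ldots$

(f) $a_n = \left(1+\frac{1}{n}\right)^n$: $\ 2, \left(\frac{3}{2}\right)^2, \left(\frac{4}{3}\right)^3, \left(\frac{5}{4}\right)^4, \ldots$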

3. Definition: An infinite sequence has a limit L if the terms of the sequence tend to that limit. This is all very well but it doesn’t say very much. A real (or complex) number L is the limit of a sequence $a_n$ if for any number ε > 0 there is a number N such that all terms of the sequence beyond N are within ε of L, that is, $|a_n - L| < \varepsilon$ for all n > N. Consult the picture on page 439 of your text for a visual illustration of this definition. When an infinite sequence $a_n$ has a limit L we write

\[ \lim_{n\to\infty} a_n = L. \]

We are not going to use this definition in any formal sense because we are going to establish

convergence or divergence of sequences using the limit theorems which follow. However it

is important to bear in mind that the proofs of these theorems depend ultimately on this

definition.

Not all sequences have limits and those that do are said to be convergent to their limit.

If a sequence has no limit we say it diverges.

Many people have a false idea of a limit as a number which the terms of the sequence ‘get closer to’ somehow. Notice example (e) above which has the limit 0. Notice also that it is

not true to say that successive terms are getting closer to zero, in fact each non-zero term

is farther away from zero than its predecessor, which of course is exactly zero.


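4. Examples:

(a) $a_n = \left(-\frac{1}{2}\right)^n$: $\ -\frac{1}{2}, \frac{1}{4}, -\frac{1}{8}, \frac{1}{16}, \ldots$ converges to 0.

(b) $a_n = \frac{n-1}{n}$: $\ 0, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \ldots$ converges to 1.

(c) $a_n = (-1)^{n-1}$: $\ 1, -1, 1, -1, \ldots$ diverges since it oscillates indefinitely between −1 and 1.

(d) $a_n = \frac{n^2}{2^n}$: $\ \frac{1}{2}, 1, \frac{9}{8}, 1, \frac{25}{32}, \frac{9}{16}, \ldots$ converges to 0.

(e) $a_n = \frac{\cos\frac{n\pi}{2}}{n}$: $\ 0, -\frac{1}{2}, 0, \frac{1}{4}, 0, -\frac{1}{6}, \ldots$ converges to 0.

(f) $a_n = \left(1+\frac{1}{n}\right)^n$: $\ 2, \left(\frac{3}{2}\right)^2, \left(\frac{4}{3}\right)^3, \left(\frac{5}{4}\right)^4, \ldots$ converges to e.

(g) $a_n = n$: $\ 1, 2, 3, 4, \ldots$ diverges since $a_n \to \infty$; we also say that $a_n$ is unbounded.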

5. Demonstrating divergence. Showing that a particular sequence diverges can in many

ways be more problematic.

(a) If we can show that the sequence is unbounded the sequence diverges. A sequence

an is unbounded if for all numbers M > 0 we may find an n such that |an| > M.

However, please remember that many bounded sequences are also divergent.

(b) If a sequence appears to have two or more different ‘limits’ the sequence diverges. It may happen, for example, that the sequence of odd terms of $a_n$ converges to a limit which is different to the limit of the sequence of even terms. This behaviour is apparent in the example (c) above.

(c) Many divergent sequences behave like the divergent sequence an = sin (n) . The range

of this sequence is dense in the set [−1, 1] which means we can pick any number in

[−1, 1] and specify any positive distance we like, then there exists an n such that

sin (n) is as close as we please to our chosen number.


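6. Sequence theorems

Suppose that c and p are constants and (unless stated otherwise) the limits $\lim_{n\to\infty} a_n$ and $\lim_{n\to\infty} b_n$ exist. Then

(a) $\lim_{n\to\infty}[a_n + b_n] = \lim_{n\to\infty} a_n + \lim_{n\to\infty} b_n$

(b) $\lim_{n\to\infty}[a_n - b_n] = \lim_{n\to\infty} a_n - \lim_{n\to\infty} b_n$

(c) $\lim_{n\to\infty}[c\,a_n] = c\lim_{n\to\infty} a_n$

(d) $\lim_{n\to\infty}[a_n \times b_n] = \lim_{n\to\infty} a_n \times \lim_{n\to\infty} b_n$

(e) (i) if $\lim_{n\to\infty} b_n \ne 0$ then $\lim_{n\to\infty}\dfrac{a_n}{b_n} = \dfrac{\lim_{n\to\infty} a_n}{\lim_{n\to\infty} b_n}$,

(ii) if $a_n$ is a bounded sequence and $|b_n| \to \infty$ then $\lim_{n\to\infty}\dfrac{a_n}{b_n} = 0$. (It is not necessary that $\lim_{n\to\infty} a_n$ exists.)

(f) $\lim_{n\to\infty}[a_n^{\,p}] = [\lim_{n\to\infty} a_n]^p$

Part (f) is really a special case of the Continuous function theorem which says that if f is a continuous function then $\lim_{n\to\infty}[f(a_n)] = f\left(\lim_{n\to\infty} a_n\right)$.

(g) $\lim_{n\to\infty} c = c$

(h) $\lim_{n\to\infty} c^n = 0$ if |c| < 1; the sequence $c^n$ converges to 1 if c = 1 and is divergent otherwise.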

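7. The following examples illustrate how the various properties listed above can be used to establish convergence of sequences and find their limits.

(a) $a_n = n$ diverges since $a_n$ is unbounded.

(b) $a_n = \frac{1}{n}$ converges to 0. Rather obvious but a special case of rule (e)(ii).

(c) $a_n = \dfrac{n^2 - 3n + 1}{2n^2 + 1}$

Write

\[ a_n = \frac{n^2 - 3n + 1}{2n^2 + 1} = \frac{n^2\left(1 - \frac{3}{n} + \frac{1}{n^2}\right)}{n^2\left(2 + \frac{1}{n^2}\right)} = \frac{1 - \frac{3}{n} + \frac{1}{n^2}}{2 + \frac{1}{n^2}}. \]

So

\[\begin{aligned}
\lim_{n\to\infty} a_n &= \frac{\lim_{n\to\infty}\left(1 - \frac{3}{n} + \frac{1}{n^2}\right)}{\lim_{n\to\infty}\left(2 + \frac{1}{n^2}\right)} && \text{(apply rule (e))}\\
&= \frac{1 - 0 + 0}{2 + 0} && \text{(apply rules (a), (e)(ii))}\\
&= \frac{1}{2}.
\end{aligned}\]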


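(d) $a_n = \dfrac{2n^2 + 3n + 1}{n^3 + 1}$

Write

\[ a_n = \frac{2n^2 + 3n + 1}{n^3 + 1} = \frac{n^2\left(2 + \frac{3}{n} + \frac{1}{n^2}\right)}{n^3\left(1 + \frac{1}{n^3}\right)} = \frac{1}{n}\cdot\frac{2 + \frac{3}{n} + \frac{1}{n^2}}{1 + \frac{1}{n^3}}. \]

So

\[\begin{aligned}
\lim_{n\to\infty} a_n &= \lim_{n\to\infty}\left(\frac{1}{n}\right)\times\frac{\lim_{n\to\infty}\left(2 + \frac{3}{n} + \frac{1}{n^2}\right)}{\lim_{n\to\infty}\left(1 + \frac{1}{n^3}\right)} && \text{(apply rules (d), (e))}\\
&= 0\times 2\\
&= 0.
\end{aligned}\]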

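(e) $a_n = \sqrt{n+1} - \sqrt{n}$

\[\begin{aligned}
a_n &= \frac{\sqrt{n+1} - \sqrt{n}}{1}\times\frac{\sqrt{n+1} + \sqrt{n}}{\sqrt{n+1} + \sqrt{n}} && \text{(a trick that often works with a difference of square roots)}\\
&= \frac{n + 1 - n}{\sqrt{n+1} + \sqrt{n}}\\
&= \frac{1}{\sqrt{n+1} + \sqrt{n}}.
\end{aligned}\]

So

\[ \lim_{n\to\infty} a_n = \lim_{n\to\infty}\frac{1}{\sqrt{n+1} + \sqrt{n}} = 0 \quad\text{(since the sequences } \sqrt{n+1}, \sqrt{n} \text{ are unbounded).} \]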

Exercises Find the limits of the following sequences if they exist, or if they are divergent explain why.

1. $a_n = \sqrt{n^2 + 2n} - n$ ANS: convergent: $\lim_{n\to\infty} a_n = 1$.

2. $a_n = \dfrac{n^2 - 4}{n + 5}$ ANS: divergent: $a_n = \frac{n^2 - 4}{n + 5}$ is not bounded.

3. $a_n = \ln(n+1) - \ln(2n-1)$ ANS: convergent: $\lim_{n\to\infty} a_n = \ln\frac{1}{2} = -\ln 2$.

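An important sequence

Show $\lim_{n\to\infty}\left(1 + \dfrac{x}{n}\right)^n = e^x$.

We use L’Hopital’s rule, but first we need to change the problem from a sequence limit to the limit of a function of a continuous variable.

Consider instead $\lim_{x\to\infty}\left(1 + \dfrac{a}{x}\right)^x = L(a)$.

Then

\[\begin{aligned}
\ln L(a) &= \ln\left(\lim_{x\to\infty}\left(1 + \frac{a}{x}\right)^x\right) = \lim_{x\to\infty}\ln\left(1 + \frac{a}{x}\right)^x\\
&= \lim_{x\to\infty} x\ln\left(1 + \frac{a}{x}\right) && \text{an } \infty\cdot 0 \text{ form}\\
&= \lim_{x\to\infty}\frac{\ln\left(1 + \frac{a}{x}\right)}{1/x} && \text{a } \tfrac{0}{0} \text{ form}\\
&= \lim_{x\to\infty}\frac{-\frac{a}{x^2}}{1 + \frac{a}{x}}\div\left(-\frac{1}{x^2}\right) && \text{applying L’Hopital’s rule}\\
&= \lim_{x\to\infty}\frac{a}{1 + \frac{a}{x}}\\
&= a
\end{aligned}\]

so $L(a) = e^a$, hence $\lim_{x\to\infty}\left(1 + \dfrac{a}{x}\right)^x = e^a$.

We conclude that the sequence limit also exists and $\lim_{n\to\infty}\left(1 + \dfrac{x}{n}\right)^n = e^x$.

Note that the existence of the function limit implies the existence of the corresponding sequence limit but not vice versa.

For example $\lim_{n\to\infty}\sin(2\pi n) = 0$ but $\lim_{x\to\infty}\sin(2\pi x)$ does not exist.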




Lecture 19&20 · series · geometric series · convergence

Text Reference: §7.6

Many students find this lecture very difficult and the material covered here is quite sparse. An

excellent account of this material can be found in the first chapter of Mathematical Methods in

the Physical Sciences by Mary L Boas. (Available in the Hargrave library.)

1. An infinite series is a formal sum of infinitely many terms; for example $a_1 + a_2 + a_3 + a_4 + \ldots$ is a series formed by adding the terms of the sequence $a_n$. This series is also denoted $\sum_{n=1}^{\infty} a_n$:

\[ \sum_{n=1}^{\infty} a_n = a_1 + a_2 + a_3 + a_4 + \ldots \]

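Examples:

1. $\sum_{n=1}^{\infty}\left(-\frac{1}{2}\right)^n = -\frac{1}{2} + \frac{1}{4} - \frac{1}{8} + \frac{1}{16} - \ldots$

2. $\sum_{n=1}^{\infty}\frac{n-1}{n} = 0 + \frac{1}{2} + \frac{2}{3} + \frac{3}{4} + \ldots$

3. $\sum_{n=1}^{\infty}(-1)^{n-1} = 1 - 1 + 1 - 1 + \ldots$

4. $\sum_{n=1}^{\infty}\frac{n^2}{2^n} = \frac{1}{2} + 1 + \frac{9}{8} + 1 + \frac{25}{32} + \frac{9}{16} + \ldots$

5. $\sum_{n=1}^{\infty} n = 1 + 2 + 3 + 4 + \ldots$

6. $\sum_{n=2}^{\infty}\frac{1}{\ln n} = \frac{1}{\ln 2} + \frac{1}{\ln 3} + \frac{1}{\ln 4} + \ldots$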

To every series $\sum_{n=1}^{\infty} a_n$ there is an associated sequence called the sequence of partial sums $s_n$ whose nth term is the sum of the first n terms of the series:

\[\begin{aligned}
s_1 &= a_1\\
s_2 &= a_1 + a_2\\
s_3 &= a_1 + a_2 + a_3\\
s_4 &= a_1 + a_2 + a_3 + a_4\\
&\ \,\vdots\\
s_n &= \sum_{k=1}^{n} a_k\\
&\ \,\vdots
\end{aligned}\]

Definition: We say that the series $\sum_{n=1}^{\infty} a_n$ converges to the sum s if the sequence of partial sums $s_n$, where $s_n = \sum_{k=1}^{n} a_k$, converges to s. If this is the case we write $\sum_{n=1}^{\infty} a_n = s$.

If the sequence of partial sums is a divergent sequence then the series $\sum_{n=1}^{\infty} a_n$ is said to diverge.

Recall what it means for a sequence $s_n$ to converge to L. Given any ε > 0 there exists N such that $|s_n - L| < \varepsilon$ for all n > N. In particular the distance between any two consecutive terms $s_n$ and $s_{n+1}$ must be less than 2ε whenever n > N. To see this:

\[\begin{aligned}
|s_{n+1} - s_n| &= |s_{n+1} - L + L - s_n|\\
&\le |s_{n+1} - L| + |L - s_n| && \text{by the triangle inequality}\\
&< \varepsilon + \varepsilon && \text{whenever } n > N.
\end{aligned}\]

But $|s_{n+1} - s_n| = |a_{n+1}|$ so the sequence $a_n$ converges to zero. Thus we have the following necessary condition for convergence.

Theorem: The infinite series $\sum_{n=1}^{\infty} a_n$ converges only if the parent sequence $a_n$ converges to zero.

Example: Discuss the convergence or divergence of the series $\sum_{n=1}^{\infty}\dfrac{n-1}{n+1}$.

We have $\lim_{n\to\infty} a_n = \lim_{n\to\infty}\dfrac{n-1}{n+1} = 1$. Since this is not zero the series $\sum_{n=1}^{\infty}\dfrac{n-1}{n+1}$ diverges.

Important note: The test $\lim_{n\to\infty} a_n = 0$ is a condition necessary for convergence; it is not sufficient.

Later on we show that the series $\sum_{n=1}^{\infty}\frac{1}{n}$ is a divergent series despite the fact that $\lim_{n\to\infty}\frac{1}{n} = 0$.

2. Geometric series

A series of the form

\[ a + ar + ar^2 + \ldots \]

where a ≠ 0 is called a geometric series. The number a is its first term and the number r is called the common ratio since it is the value of the ratio of any term to its predecessor.

Repeating decimals are infinite geometric series, e.g.

\[ 0.\overline{12} = 0.12121212\ldots = \frac{12}{100} + \frac{12}{10{,}000} + \frac{12}{1{,}000{,}000} + \ldots;\quad r = \frac{1}{100}. \]

Finding an explicit formula for $s_n$ for a geometric series is easy:

\[ s_n = a + ar + ar^2 + \ldots + ar^{n-1}, \qquad (1) \]

and

\[ r s_n = ar + ar^2 + ar^3 + \ldots + ar^{n}. \qquad (2) \]

Subtracting (2) from (1) gives $s_n - r s_n = a - ar^n$, hence (for r ≠ 1)

\[ s_n = \frac{a\,(1 - r^n)}{1 - r}. \]

• For |r| < 1 we have $\lim_{n\to\infty} r^n = 0$ and so the geometric series converges to

\[ \sum_{n=1}^{\infty} ar^{n-1} = \frac{a}{1 - r}. \]

• For |r| > 1 the sequence $ar^{n-1}$ is unbounded and so the geometric series diverges.

• For r = 1 and a ≠ 0 we have the divergent constant series a + a + a + ..., and for r = −1 we have the series a − a + a − a + ... whose partial sums alternate between a and 0, and hence it also diverges.
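The partial-sum formula and the limit a/(1 − r) can be checked with exact rational arithmetic. The following sketch uses illustrative values a = 3 and r = 1/2, and the helper names are hypothetical; it is a check on one example rather than a proof.

```python
from fractions import Fraction

def partial_sum(a, r, n):
    """Add the first n terms a + a*r + ... + a*r**(n-1) one at a time."""
    total, term = Fraction(0), Fraction(a)
    for _ in range(n):
        total += term
        term *= r
    return total

def closed_form(a, r, n):
    """The formula s_n = a(1 - r^n)/(1 - r), valid for r != 1."""
    return a * (1 - r ** n) / (1 - r)

a, r = Fraction(3), Fraction(1, 2)  # illustrative first term and common ratio

for n in (1, 2, 5, 10, 20):
    print(n, partial_sum(a, r, n), float(partial_sum(a, r, n)))
print("a / (1 - r) =", a / (1 - r))
```

The term-by-term sums match the closed form exactly, and they approach a/(1 − r) = 6 as n increases.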

Exercise Use the formula $\sum_{n=1}^{\infty} ar^{n-1} = \dfrac{a}{1-r}$ to find the fraction equivalent of the repeating decimal $0.\overline{12}$.

$0.\overline{12} = 0.121212\ldots$

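Exercises: Discuss the convergence or divergence of each of the following series:

1. Use partial fractions to show $\dfrac{1}{n(n+1)} = \dfrac{1}{n} - \dfrac{1}{n+1}$. Use this to find a formula for the nth partial sum $s_n$ of $\sum_{n=1}^{\infty}\dfrac{1}{n(n+1)}$. Hence show $\sum_{n=1}^{\infty}\dfrac{1}{n(n+1)}$ converges by finding the limit of $s_n$.

The nth partial sum is

\[ s_n = \sum_{k=1}^{n}\frac{1}{k(k+1)} = \left(\frac{1}{1} - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \ldots + \left(\frac{1}{n} - \frac{1}{n+1}\right) = 1 - \frac{1}{n+1}. \]

Hence $\sum_{n=1}^{\infty}\dfrac{1}{n(n+1)} = \lim_{n\to\infty}\left(1 - \dfrac{1}{n+1}\right) = 1$.

2. $\sum_{n=1}^{\infty}\dfrac{n-1}{\sqrt{n^2+1}}$.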


Tests for Series Convergence

The convergence or divergence of the geometric series was determined by finding a formula for the sequence of partial sums $s_n$. This is not always possible for more general series and hence the need to establish some tests which are sufficient to determine convergence or divergence.

For now we deal exclusively with positive series, that is series of the type $\sum_{n=1}^{\infty} a_n$ where $a_n \ge 0$ for all n.

1. Integral Test.

Example: Determine the convergence or divergence of the series $\sum_{n=1}^{\infty}\frac{1}{n^2}$.

Notice that all of the terms of the series are positive.

The essential idea of the integral test is that the series $\sum_{n=1}^{\infty}\frac{1}{n^2}$ and the improper integral $\int_{1}^{\infty}\frac{1}{x^2}\,dx$ either both converge, or both diverge (to ∞).

Now a quick calculation shows $\int_{1}^{\infty}\frac{1}{x^2}\,dx$ converges:

Notice that $\sum_{n=2}^{\infty}\frac{1}{n^2} < \int_{1}^{\infty}\frac{1}{x^2}\,dx$ (diagram) so that

\[ \sum_{n=1}^{\infty}\frac{1}{n^2} < 1 + \int_{1}^{\infty}\frac{1}{x^2}\,dx. \]

Since $a_n = \frac{1}{n^2}$ is always positive, the sequence of partial sums is increasing (since $s_{n+1} - s_n = a_{n+1} > 0$).

Therefore the sequence of partial sums is bounded above by $1 + \int_{1}^{\infty}\frac{1}{x^2}\,dx$.

An increasing sequence $s_n$ that is bounded above converges, hence the series $\sum_{n=1}^{\infty}\frac{1}{n^2}$ converges.

Example: Determine the convergence or divergence of the series $\sum_{n=1}^{\infty}\frac{1}{n}$.

Notice once again that all of the terms of the series are positive. This time the corresponding improper integral is $\int_{1}^{\infty}\frac{1}{x}\,dx$ which diverges (to ∞).

Calculation:

Notice that $\sum_{n=1}^{\infty}\frac{1}{n} > \int_{1}^{\infty}\frac{1}{x}\,dx$ (diagram).

Since $\int_{1}^{\infty}\frac{1}{x}\,dx$ is unbounded, the partial sums of $\sum_{n=1}^{\infty}\frac{1}{n}$ are also unbounded and therefore the series diverges.

Note: the divergent series $\sum_{n=1}^{\infty}\frac{1}{n}$ is called the harmonic series. It is rather special because it is an example of a series that diverges and yet whose parent sequence, $a_n = \frac{1}{n}$, converges to zero.


Example (p-series): Series of the form $\sum_{n=1}^{\infty}\frac{1}{n^p}$ are known collectively as p-series. By comparing with the corresponding integral $\int_{1}^{\infty}\frac{1}{x^p}\,dx$ a quick calculation shows:

$\sum_{n=1}^{\infty}\frac{1}{n^p}$ diverges for p ≤ 1 and $\sum_{n=1}^{\infty}\frac{1}{n^p}$ converges for p > 1.
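The contrast between p = 1 and p = 2 shows up clearly in the partial sums. The sketch below is illustrative only (the helper name and checkpoints are choices made here), and finitely many partial sums cannot by themselves establish convergence or divergence.

```python
def p_series_partial_sum(p, n):
    """Sum of 1/k^p for k = 1, ..., n."""
    return sum(1.0 / k ** p for k in range(1, n + 1))

checkpoints = (10, 1_000, 100_000)
harmonic = [p_series_partial_sum(1, n) for n in checkpoints]
squares = [p_series_partial_sum(2, n) for n in checkpoints]

for n, h, s in zip(checkpoints, harmonic, squares):
    print(f"n = {n:>7}:  p = 1 sum = {h:9.5f}   p = 2 sum = {s:.7f}")
```

The p = 1 sums keep climbing (roughly like ln n), whereas the p = 2 sums level off below 2, the bound $1 + \int_{1}^{\infty}\frac{1}{x^2}\,dx$ given by the integral comparison.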

2. The comparison test

The integral test works by comparing an infinite series with the corresponding improper integral.

Why not compare two series? This then is the comparison test.

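Example We have $\sum_{n=1}^{\infty}\frac{1}{n^2+1} \le \sum_{n=1}^{\infty}\frac{1}{n^2}$ because $\frac{1}{n^2+1} \le \frac{1}{n^2}$ for all n. We know $\sum_{n=1}^{\infty}\frac{1}{n^2}$ converges and since it dominates $\sum_{n=1}^{\infty}\frac{1}{n^2+1}$ this series must also converge. (Once again the fact that $\sum_{n=1}^{\infty}\frac{1}{n^2+1}$ and $\sum_{n=1}^{\infty}\frac{1}{n^2}$ are both series of positive terms is crucial here.)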

The precise statement of the comparison test is as follows:

Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ both be series of positive terms, and suppose that the convergence or divergence of $\sum_{n=1}^{\infty} b_n$ is known.

Showing convergence: If $\sum_{n=1}^{\infty} b_n$ converges and $a_n \le b_n$ for all n, then $\sum_{n=1}^{\infty} a_n$ converges.

Showing divergence: If $\sum_{n=1}^{\infty} b_n$ diverges and $a_n \ge b_n$ for all n, then $\sum_{n=1}^{\infty} a_n$ diverges.

Warning: When using the comparison test it is important to get the inequalities the correct way about and avoid using too coarse a comparison.

For example, it is true that $\frac{1}{n^2+1} \le \frac{1}{n}$ for all n and that $\sum_{n=1}^{\infty}\frac{1}{n}$ diverges. What can we say about the behaviour of $\sum_{n=1}^{\infty}\frac{1}{n^2+1}$ on the basis of this comparison? Absolutely nothing!

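Exercises: Discuss the convergence or divergence of each of the following series:

1. $\sum_{n=2}^{\infty}\dfrac{1}{n\ln n}$. [Compare with the integral $\int_{2}^{\infty}\dfrac{1}{x\ln x}\,dx$.]

2. $\sum_{n=1}^{\infty}\dfrac{e^n\cos^2 n}{\pi^n}$. [Compare with the geometric series $\sum_{n=1}^{\infty}\dfrac{e^n}{\pi^n}$.]

3. $\sum_{n=1}^{\infty}\dfrac{\sqrt{n} - 1}{n^2 + 1}$. [Compare with the p-series $\sum_{n=1}^{\infty}\dfrac{1}{n^{3/2}}$.]

4. $\sum_{n=1}^{\infty}\dfrac{n - 1}{2^n(n + 1)}$. [Compare with the geometric series $\sum_{n=1}^{\infty}\dfrac{1}{2^n}$.]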


The Ratio Test

Recall that the infinite geometric series $\sum_{n=1}^{\infty} ar^{n-1} = a + ar + ar^2 + \ldots$ converges for |r| < 1 and diverges for |r| > 1, where the common ratio r is the ratio of two consecutive terms of the geometric sequence, i.e. $r = \dfrac{a_{n+1}}{a_n}$.

The ratio test for convergence of a series is a generalisation of this to other types of series.

Ratio Test: Suppose we have a series $\sum_{n=1}^{\infty} a_n$ where $a_n > 0$ for all n, and for which $\lim_{n\to\infty}\dfrac{a_{n+1}}{a_n}$ either exists or is infinite.

Let $\rho = \lim_{n\to\infty}\dfrac{a_{n+1}}{a_n}$.

• If ρ < 1 then $\sum_{n=1}^{\infty} a_n$ converges. (As a consequence we get $\lim_{n\to\infty} a_n = 0$.)

• If ρ > 1 then $\lim_{n\to\infty} a_n = \infty$ and $\sum_{n=1}^{\infty} a_n$ diverges.

• If ρ = 1, then the ratio test fails as the series may converge, or diverge to ∞.

Notice that this test could also be used to test for convergence of a geometric series since in this case $\lim_{n\to\infty}\dfrac{a_{n+1}}{a_n} = \dfrac{a_{n+1}}{a_n} = r$, a constant.

Examples

1. $\sum_{n=1}^{\infty}\dfrac{1}{n^2}$ (ρ = 1 and therefore the ratio test fails, but we know this series converges by earlier tests)

2. $\sum_{n=1}^{\infty}\dfrac{2^n}{n!}$ (ρ = 0, series converges by ratio test)

3. $\sum_{n=1}^{\infty}\dfrac{n^{100}}{2^n}$ (ρ = $\frac{1}{2}$, series converges by ratio test)

4. $\sum_{n=1}^{\infty}\dfrac{n!}{n^n}$ (ρ = $\frac{1}{e}$, series converges by ratio test)

5. Use the ratio test to show the series $\sum_{n=1}^{\infty} n e^{-n}$ converges.
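The limiting ratios quoted in Examples 1, 2 and 4 can be estimated numerically. To avoid overflow for large n, the sketch below works with the logarithm of each term (using `math.lgamma(n + 1)` for ln n!); the helper `term_ratio`, the dictionary of series and the choice n = 10 000 are illustrative assumptions.

```python
import math

def term_ratio(log_term, n):
    """Return a_(n+1)/a_n given a function computing ln(a_n)."""
    return math.exp(log_term(n + 1) - log_term(n))

# ln(a_n) for three of the example series; lgamma(n + 1) equals ln(n!).
log_terms = {
    "1/n^2": lambda n: -2 * math.log(n),
    "2^n/n!": lambda n: n * math.log(2) - math.lgamma(n + 1),
    "n!/n^n": lambda n: math.lgamma(n + 1) - n * math.log(n),
}

ratios = {name: term_ratio(f, 10_000) for name, f in log_terms.items()}

for name, value in ratios.items():
    print(f"{name:>7}: a_(n+1)/a_n at n = 10000 is about {value:.6f}")
print(f"    1/e = {math.exp(-1):.6f}")
```

At n = 10 000 the ratios are already close to 1, 0 and 1/e respectively. A ratio near 1 is exactly the case where the test gives no information.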


Absolute and Conditional convergence

All of the series in the previous section were series of positive terms. We can now remove

this restriction and allow arbitrary terms an. We can obtain a series of positive terms from an

arbitrary series by replacing all the terms with their absolute values.

Definition: The series $\sum_{n=1}^{\infty} a_n$ is said to be absolutely convergent if the series $\sum_{n=1}^{\infty}|a_n|$ converges.

Absolute convergence Theorem: If a series converges absolutely then the series converges.

Thus the tests for series of positive terms can be used to establish the convergence of any series, by showing that it converges absolutely.

Example: Show the series $\sum_{n=1}^{\infty}\dfrac{(-1)^n}{n^2}$ converges absolutely.

However the absolute convergence test (if we call it that) is a sufficient condition for convergence, but it is not a necessary condition. Many series may fail to be absolutely convergent and yet are convergent just the same. We call such series conditionally convergent.

Example: The series $\sum_{n=1}^{\infty}\dfrac{(-1)^n}{n}$ does not converge absolutely because if we replace all the terms by their absolute values we get the divergent harmonic series $\sum_{n=1}^{\infty}\dfrac{1}{n}$.

However the alternating harmonic series $\sum_{n=1}^{\infty}\dfrac{(-1)^n}{n}$ converges (conditionally) as we will show.

We cannot use any of the tests previously discussed to show that the series $\sum_{n=1}^{\infty}\frac{(-1)^n}{n}$ converges as these tests apply only to series of positive terms. Generally speaking, to demonstrate convergence where the convergence is not absolute is usually quite difficult. We will discuss but one of many tests that do the job; this test is very easily applied but is quite restrictive as it can only be used on special types of series.

The Alternating series test. Suppose we have a series of the form $\sum_{n=1}^{\infty} (-1)^n a_n$ where the sequence $a_n$ satisfies:

(i) $a_n \ge 0$ for all n,

(ii) $\lim_{n\to\infty} a_n = 0$, and

(iii) $a_{n+1} \le a_n$ for all n.

Then the series $\sum_{n=1}^{\infty} (-1)^n a_n$ converges.


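Example: Consider the alternating harmonic series $\sum_{n=1}^{\infty} \frac{(-1)^n}{n}$.

(i) The series is of the required form with $a_n = \frac{1}{n}$. Clearly $a_n > 0$ for all n.

(ii) $\lim_{n\to\infty} \frac{1}{n} = 0$,

(iii) $a_n - a_{n+1} = \frac{1}{n} - \frac{1}{n+1} = \frac{1}{n(n+1)} > 0$ for all n and hence $a_{n+1} \le a_n$.

The three parts of the alternating series test are satisfied and we deduce that $\sum_{n=1}^{\infty} \frac{(-1)^n}{n}$ converges.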
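The convergence just established can be seen numerically. The short sketch below (an illustration only; the function name `partial_sum` is hypothetical) computes partial sums of the alternating harmonic series and compares them with −ln 2, which is the known value of this sum.

```python
import math

def partial_sum(N):
    """Return S_N = sum_{n=1}^{N} (-1)^n / n."""
    return sum((-1) ** n / n for n in range(1, N + 1))

for N in (10, 100, 1000, 10000):
    s = partial_sum(N)
    # The gap to -ln 2 shrinks roughly like 1/(2N): slow, but it does shrink.
    print(N, s, abs(s + math.log(2)))
```

The slow decay of the error is typical of conditionally convergent series.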


Example: Consider the series $\sum_{n=2}^{\infty} \frac{\cos n\pi}{\log_e n}$.

(i) Since $\cos n\pi = (-1)^n$ the series is of the required form with $a_n = \frac{1}{\log_e n}$. Since $\log_e n > 0$ for all n ≥ 2, we have $a_n > 0$.

(ii) Also, $\lim_{n\to\infty} \frac{1}{\log_e n} = 0$,

(iii) To show $a_n - a_{n+1} = \frac{1}{\log_e n} - \frac{1}{\log_e (n+1)} > 0$ for all n ≥ 2 is a little more awkward than for the previous example. One way of doing this is to show that the function $1/\log_e(x)$ is decreasing for all x ≥ 2. This is easy using calculus:

The function $1/\log_e(x)$ has derivative:

\[ \frac{d}{dx} \left(\log_e x\right)^{-1} = -\left(\log_e x\right)^{-2} \times \frac{d}{dx} \log_e x = -\frac{1}{x \left(\log_e x\right)^2} \]

This is clearly negative for x ≥ 2, and hence $1/\log_e(x)$ is a decreasing function. Thus $\frac{1}{\log_e n} - \frac{1}{\log_e (n+1)} > 0$ for all n ≥ 2.

All three parts of the alternating series test are satisfied and we deduce that $\sum_{n=2}^{\infty} \frac{\cos n\pi}{\log_e n}$ converges.


The alternating series test is quite restrictive as it cannot be used to show the conditional

convergence of series whose terms do not strictly alternate in sign.

For example, the series $\sum_{n=1}^{\infty} \frac{\sin n}{n}$ is also conditionally convergent, but its terms do not strictly alternate in sign. A more general test for conditional convergence (one which works for $\sum_{n=1}^{\infty} \frac{\sin n}{n}$) is Dirichlet’s test, but it will not be examined in this course.




Lecture 21 Taylor’s theorem

Text Reference: §9.4.1-9.9.4.2

The pragmatic reason for spending all this time on sequences and series was to get to Taylor

series. The idea of the Taylor series is to approximate a function with a power series. This series

can then be used to find values of the original function in an efficient manner. Calculators and

computers regularly use Taylor series expansions for more sophisticated functions.

To begin with, let’s construct an approximation to the function f(x) at the point a, given the

value of the function at the point and its slope. If we don’t really know anything about the

shape of the function, then we will stick with the basic approximation of a straight line.

f (x) ≈ f (a) + (x− a) f ′ (a)

If we are also given the second derivative evaluated at the point x = a, then we have an extra

constraint. Instead of a straight line, we can approximate f(x) with a parabola.

\[ f(x) \approx f(a) + (x-a) f'(a) + \frac{(x-a)^2}{2!} f''(a) \]

Example: Given f (2) = −1, f ′ (2) = 0, f ′′ (2) = −1, find the 2nd order polynomial approxima-

tion to f(x) about x = 2.

The Taylor polynomial of degree 2 is

\[ f(x) \approx f(a) + (x-a) f'(a) + \frac{(x-a)^2}{2!} f''(a) \]

and with a = 2 becomes

\[ f(x) \approx f(2) + (x-2) f'(2) + \frac{(x-2)^2}{2!} f''(2) = -1 - \frac{1}{2}(x-2)^2 . \]

This can readily be extended to higher order polynomials.

Example: Given, f (0) = 2, f ′ (0) = −1, f ′′ (0) = 3 and f ′′′ (0) = 1 find the 3rd order polynomial

approximation to f(x) about x = 0.

Now

\[ f(x) \approx f(a) + (x-a) f'(a) + \frac{(x-a)^2}{2!} f''(a) + \frac{(x-a)^3}{3!} f'''(a) \]

with a = 0 this becomes:

\[ f(x) \approx f(0) + x f'(0) + \frac{x^2}{2!} f''(0) + \frac{x^3}{3!} f'''(0) = 2 - x + \frac{3}{2}x^2 + \frac{1}{6}x^3 \]
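The two examples above follow the same recipe: each derivative value at a is divided by k! and multiplied by (x − a)^k. A minimal sketch of that recipe follows; the function name `taylor_poly` and its argument layout are assumptions made for illustration.

```python
from math import factorial

def taylor_poly(derivs, a):
    """Build the Taylor polynomial about x = a from derivs = [f(a), f'(a), f''(a), ...]."""
    def p(x):
        return sum(d * (x - a) ** k / factorial(k) for k, d in enumerate(derivs))
    return p

# First example: f(2) = -1, f'(2) = 0, f''(2) = -1
p2 = taylor_poly([-1, 0, -1], a=2)
# Second example: f(0) = 2, f'(0) = -1, f''(0) = 3, f'''(0) = 1
p3 = taylor_poly([2, -1, 3, 1], a=0)

print(p2(3.0))  # value of -1 - (x - 2)^2 / 2 at x = 3
print(p3(1.0))  # value of 2 - x + 3x^2/2 + x^3/6 at x = 1
```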


Taylor polynomials are approximations of the function f.

If we replace the notion of polynomial with infinite series the ‘approximately equals’ sign can be replaced by equality [under appropriate conditions]:

\[ f(x) = f(a) + (x-a) f'(a) + \frac{(x-a)^2}{2!} f''(a) + \cdots + \frac{(x-a)^n}{n!} f^{(n)}(a) + \cdots \tag{21.1} \]

This expression is known as the Taylor series of f.

[One of the conditions for equality in eq. (21.1) is that the Taylor series converges. The Taylor

series of some functions converge for all x, while others (typically) converge only on some interval

of the real line. Equality can only apply for those x for which the series converges.]

Example: Find the Taylor series for f(x) = ex about x = 1.

$f(x) = e^x$, $f(1) = e$

$f'(x) = e^x$, $f'(1) = e$

$f''(x) = e^x$, $f''(1) = e$

$f'''(x) = e^x$, $f'''(1) = e$

$f^{(4)}(x) = e^x$, $f^{(4)}(1) = e$

$\cdots$

Therefore the Taylor series of $f(x) = e^x$ about x = 1 is

\[ f(a) + (x-a) f'(a) + \frac{(x-a)^2}{2!} f''(a) + \cdots + \frac{(x-a)^n}{n!} f^{(n)}(a) + \cdots \]

\[ = e + e(x-1) + \frac{e}{2}(x-1)^2 + \frac{e}{3!}(x-1)^3 + \cdots \]

In the instance when the expansion is about the point x = 0, the Taylor series is then

called a Maclaurin series.

\[ f(x) = f(0) + x f'(0) + \frac{x^2}{2!} f''(0) + \cdots + \frac{x^n}{n!} f^{(n)}(0) + \cdots \]

Example: Find the Maclaurin series for f(x) = ln (1 + x) about x = 0.

$f(x) = \ln(1+x)$, $f(0) = 0$

$f'(x) = \frac{1}{1+x}$, $f'(0) = 1$

$f''(x) = \frac{-1}{(1+x)^2}$, $f''(0) = -1$

$f'''(x) = \frac{2}{(1+x)^3}$, $f'''(0) = 2$

$f^{(4)}(x) = -\frac{2 \cdot 3}{(1+x)^4} = -\frac{3!}{(1+x)^4}$, $f^{(4)}(0) = -3!$

$\cdots$

Therefore the Maclaurin series of $f(x) = \ln(1+x)$ is

\[ f(0) + x f'(0) + \frac{x^2}{2!} f''(0) + \cdots + \frac{x^n}{n!} f^{(n)}(0) + \cdots \]

\[ = 0 + 1 \cdot x - \frac{x^2}{2} + \frac{2!\, x^3}{3!} - \frac{3!\, x^4}{4!} + \cdots \]

\[ = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \]


Example: Find the Maclaurin series for f(x) = cos (x) about x = 0.

$f(x) = \cos x$, $f(0) = 1$

$f'(x) = -\sin x$, $f'(0) = 0$

$f''(x) = -\cos x$, $f''(0) = -1$

$f'''(x) = \sin x$, $f'''(0) = 0$

$f^{(4)}(x) = \cos x$, $f^{(4)}(0) = 1$

$\cdots$

Therefore the Maclaurin series of $f(x) = \cos x$ is

\[ f(0) + x f'(0) + \frac{x^2}{2!} f''(0) + \cdots + \frac{x^n}{n!} f^{(n)}(0) + \cdots \]

\[ = 1 + 0 \cdot x - \frac{x^2}{2!} + 0 \cdot \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots \]

\[ = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots \]
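To see how quickly the truncated series approaches the function, the sketch below sums the first few nonzero terms of the cosine series and compares the result with the library value. The function name `cos_maclaurin` and the test point x = 1 are arbitrary choices for illustration.

```python
import math

def cos_maclaurin(x, terms):
    """Sum the first `terms` nonzero terms of 1 - x^2/2! + x^4/4! - ..."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k) for k in range(terms))

x = 1.0
for terms in (1, 2, 3, 4, 5):
    approx = cos_maclaurin(x, terms)
    # The error is bounded by the first omitted term (alternating series).
    print(terms, approx, abs(approx - math.cos(x)))
```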

It can be rather tedious finding Taylor or Maclaurin series from scratch each time. Using a known

Taylor series it is possible to find the Taylor series of other related functions by substitution, or

by integrating or differentiating term by term.

Example: Find the Maclaurin series expansion of $\frac{1}{1+x}$, given the expansion of $\ln(1+x)$ from the earlier example.

Since $\frac{d}{dx} \ln(1+x) = \frac{1}{1+x}$, we differentiate the Maclaurin series for $\ln(1+x)$ term by term.

The Maclaurin series for $\frac{1}{1+x}$ is then

\[ \left( x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \right)' = 1 - x + x^2 - x^3 + \cdots \]

(This is an infinite geometric series with common ratio r = −x.)


Example: Find the Maclaurin series expansion of $\int \cosh x \, dx$.

First we find the Maclaurin series expansion of $\cosh x$:

$f(x) = \cosh x$, $f(0) = 1$

$f'(x) = \sinh x$, $f'(0) = 0$

$f''(x) = \cosh x$, $f''(0) = 1$

$f'''(x) = \sinh x$, $f'''(0) = 0$

$f^{(4)}(x) = \cosh x$, $f^{(4)}(0) = 1$

$\cdots$

Giving

\[ \cosh x = 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \frac{x^6}{6!} + \cdots \]

Note that the Maclaurin series for $\cos x$ can also be obtained from the identity $\cos x = \cosh(ix)$.

Now integrating term by term we obtain

\[ \int \cosh x \, dx = \int \left( 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \frac{x^6}{6!} + \cdots \right) dx = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \frac{x^7}{7!} + \cdots + C \]

With C = 0 we obtain the Maclaurin series expansion of $\sinh x$:

\[ \sinh x = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \frac{x^7}{7!} + \cdots \]

(This may be obtained directly of course from the Taylor series formula.)



ENG1091 Multivariable Calculus

Lecture 22&23 · partial derivatives · directional derivatives · chain rule

Text Reference: §9.6.1-9.6.5

1. Functions of several variables

Throughout our discussions on differentiation and integration we have examined functions with

only one independent variable. Yet we can think of any number of examples in engineering in

which a quantity is defined by two or more independent variables. The volume of a cylinder is a

function of the height of the cylinder and the radius of its base:

\[ V = \pi r^2 h \]

The density of ocean water is a function of its temperature and salinity:

ρ = ρ (T, σ)

For the moment let us focus on functions with two independent variables, x and y. For further

convenience, we can assume that x and y are our familiar Cartesian coordinates. Given an

arbitrary function of our two independent variables, z = f(x, y), it is possible to view the variable

z as the height above the x-y plane. This function of two variables is thus a three-dimensional

surface above the x-y plane, which, unfortunately, is very difficult to graph on a piece of paper. In

graphing f(x, y), it is common to draw lines of constant height z (i.e. contours). Such diagrams

are completely analogous to contour maps used in bushwalking and mountaineering.

It is worth the time to graph a few simple functions to help with future lectures.

Consider the contour maps/surface plots for the functions below:

$z_1 = \sqrt{16 - x^2 - y^2}$, $z_2 = 16 - x^2 - y^2$


z3 = 2x− y3

and

z4 = cos (x) cos (y) (not examinable)

It is worth noting that the function f (x, y) is often called a scalar field in vector calculus. Also,

we can readily extend this material to three dimensions and beyond; only it isn’t simple to draw

such functions on paper.


2. Partial differentiation: The aim of this section is to extend some of the principles of

basic calculus to functions with multiple independent variables. We begin with differentiation.

Thinking back to one independent variable, if f is a function of a single variable, x say, then we

define the derivative of f with respect to x as

\[ \frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} \]

Now if f is a function of two independent variables, x and y, then we can define the derivative

of f with respect to each of these variables as follows

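\[ \frac{\partial f}{\partial x} = \lim_{\Delta x \to 0} \left( \frac{\Delta f}{\Delta x} \right)_{y=\text{const}} = \lim_{\Delta x \to 0} \left( \frac{f(x + \Delta x, y) - f(x, y)}{\Delta x} \right)_{y=\text{const}} \tag{1} \]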

In this operation we treat y as a constant. It is basically ignored. Note the special notation used for the partial derivative. Note that $\frac{\partial f}{\partial x}$ and $\frac{df}{dx}$ have different meanings in multivariable calculus, so we need to be careful. The partial derivative with respect to y is similarly defined as

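\[ \frac{\partial f}{\partial y} = \lim_{\Delta y \to 0} \left( \frac{\Delta f}{\Delta y} \right)_{x=\text{const}} = \lim_{\Delta y \to 0} \left( \frac{f(x, y + \Delta y) - f(x, y)}{\Delta y} \right)_{x=\text{const}} \tag{2} \]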

where x is held constant throughout.

The basic concepts of differentiation (e.g. the product rule, quotient rule, associative and distributive properties) extend across to higher dimensions as expected.


Returning to our visualisation of z = f(x, y) as representing a height or a 3-D surface, the partial derivative $\frac{\partial z}{\partial x}$ represents the rate of change in height in the x direction, that is, the slope of the surface in the x direction.

Example: Find both partial derivatives of

\[ f(x, y) = \sin(xy) + x^2 + x/y \]

\[ \frac{\partial f}{\partial x} = \cos(xy) \cdot y + 2x + 1/y = y \cos(xy) + 2x + 1/y \]

\[ \frac{\partial f}{\partial y} = \cos(xy) \cdot x - x y^{-2} = x \cos(xy) - x y^{-2} \]

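Example: Given

\[ f(x, y) = \sin(xy) + x^2 + x/y, \]

find both $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ at the point (π, 1).

\[ \left. \frac{\partial f}{\partial x} \right|_{(\pi,1)} = \cos(\pi) + 2\pi + 1 = 2\pi \]

\[ \left. \frac{\partial f}{\partial y} \right|_{(\pi,1)} = \pi \cos(\pi) - \pi = -2\pi \]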
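The limit definitions (1) and (2) suggest a simple numerical check of these values: nudge one variable while holding the other fixed. The sketch below uses a central difference with a small step h = 1e-6; the helper names `partial_x` and `partial_y` and the step size are assumptions for illustration, not part of the notes.

```python
import math

def f(x, y):
    return math.sin(x * y) + x ** 2 + x / y

def partial_x(f, x, y, h=1e-6):
    """Central-difference estimate of df/dx with y held constant."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    """Central-difference estimate of df/dy with x held constant."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

fx = partial_x(f, math.pi, 1.0)
fy = partial_y(f, math.pi, 1.0)
print(fx, 2 * math.pi)    # numerical estimate vs the analytic value 2*pi
print(fy, -2 * math.pi)   # numerical estimate vs the analytic value -2*pi
```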

As the text notes, partial differentiation can readily be extended to instances of more than two

independent variables.

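Example (from text): Given

\[ f(x, y, z) = x y z^2 + 3xy - z \]

find $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$ and $\frac{\partial f}{\partial z}$.

\[ \frac{\partial f}{\partial x} = y z^2 + 3y, \qquad \frac{\partial f}{\partial y} = x z^2 + 3x, \qquad \frac{\partial f}{\partial z} = 2xyz - 1 \]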

Suppose we want to evaluate the partial derivative at a specified point. That is, we want to quantify the slope given a choice of x and y. Just as in one dimension, we must take the derivative first before plugging in the value of the variable we differentiated with respect to. Note that since y is held constant in calculating $\frac{\partial f}{\partial x}$, it doesn’t really matter when we substitute in the given value of y.


3. The gradient and directional derivatives

Staying in Cartesian coordinates, it is natural to extend the partial derivatives to include a direction. That is, we can turn them into a vector. Associating $\frac{\partial f}{\partial x}$ with the direction of x and $\frac{\partial f}{\partial y}$ with the direction of y, we can define the gradient of the field f(x, y) as

\[ \nabla f(x, y) = \frac{\partial f}{\partial x} \mathbf{i} + \frac{\partial f}{\partial y} \mathbf{j} \tag{3} \]

where i and j are the unit vectors in the direction of x and y, respectively. The gradient of the field f is often abbreviated as ‘grad f’ and given the notation ∇f.

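Example: Given the scalar field

\[ f(x, y) = \sqrt{16 - x^2 - y^2}, \]

calculate ∇f. Sketch these vectors on the contour map of f(x, y).

Solution:

\[ \nabla f(x, y) = \frac{\partial f}{\partial x} \mathbf{i} + \frac{\partial f}{\partial y} \mathbf{j} = \frac{1}{2} \left( 16 - x^2 - y^2 \right)^{-1/2} (-2x) \, \mathbf{i} + \frac{1}{2} \left( 16 - x^2 - y^2 \right)^{-1/2} (-2y) \, \mathbf{j} \]

\[ = \frac{-1}{\sqrt{16 - x^2 - y^2}} \left( x \mathbf{i} + y \mathbf{j} \right) \]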

Note that the gradient vector is always perpendicular to a level curve at a given point and

points towards the direction of increasing function value.

The previous example revealed a noteworthy point about the gradient. At all points the vectors

of the gradient are at right angles to the contour lines. In this two-dimensional, Cartesian

coordinate picture, the gradient points us in the direction of greatest change of our scalar field

f (x, y). Going back to our analogy of f (x, y) representing the contours of height on a map, the

gradient of f (x, y) gives us a vector that tells us the direction of the maximum slope and its

magnitude.

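Example: Given the scalar field f(x, y) = xy, draw the contour field, calculate ∇f and sketch the gradient vectors over the contour lines.

\[ \nabla f = \frac{\partial f}{\partial x} \mathbf{i} + \frac{\partial f}{\partial y} \mathbf{j} = y \mathbf{i} + x \mathbf{j} \]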

[Figure: blank x-y axes, with ticks from −5 to 5 on each axis, for sketching the contours of f(x, y) = xy and the gradient vectors.]


Please note that the gradient can readily be extended to higher dimensions.

Example: Given the scalar field

\[ f(x, y, z) = z + (x^2 + y^2) \]

calculate ∇f. Sketch a level surface f(x, y, z) = k for some suitable value of k and plot ∇f at a point on this surface. (The graphic illustrates the case k = 1, i.e. the surface $z + (x^2 + y^2) = 1$.)

[Figure: 3-D plot of the level surface $z + (x^2 + y^2) = 1$, with axes x, y and z.]

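Example: Given the scalar field $f(x, y, z) = x y z^2 + 3xy - z$ calculate ∇f.

\[ \nabla f = \frac{\partial f}{\partial x} \mathbf{i} + \frac{\partial f}{\partial y} \mathbf{j} + \frac{\partial f}{\partial z} \mathbf{k} = \left( y z^2 + 3y \right) \mathbf{i} + \left( x z^2 + 3x \right) \mathbf{j} + (2xyz - 1) \mathbf{k} \]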

Directional derivative

We’ve seen that ∇f is a vector that tells us the direction and magnitude of the greatest rate of change of the scalar field f(x, y). We can also use ∇f to find the rate of change of the scalar field f(x, y) in some arbitrary direction. This is known as the directional derivative. Specifically, if we are given a scalar field f(x, y) and a specified orientation to follow, say

\[ \mathbf{v} = v_x \mathbf{i} + v_y \mathbf{j}, \]

the unit vector having the same direction as v is $\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$ where $\|\mathbf{v}\| = \sqrt{v_x^2 + v_y^2}$; then the directional derivative $D_{\mathbf{v}} f$ is defined as

\[ D_{\mathbf{v}} f = \nabla f \cdot \left( \frac{\mathbf{v}}{\|\mathbf{v}\|} \right) \tag{4} \]

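Example: Given the scalar field f(x, y) = xy, find the directional derivative in the direction of

\[ \mathbf{v} = 3\mathbf{i} + 4\mathbf{j} \]

at the points (1, 1), (1, −1), and (−4, 3).

$\mathbf{v} = 3\mathbf{i} + 4\mathbf{j}$ so that $\|\mathbf{v}\| = \sqrt{(3)^2 + (4)^2} = 5$ and hence $\hat{\mathbf{v}} = \frac{3}{5} \mathbf{i} + \frac{4}{5} \mathbf{j}$.

\[ \nabla f = \frac{\partial f}{\partial x} \mathbf{i} + \frac{\partial f}{\partial y} \mathbf{j} = y \mathbf{i} + x \mathbf{j} \]

Hence $D_{\mathbf{v}} f(x, y) = \nabla f \cdot \hat{\mathbf{v}} = \frac{3}{5} y + \frac{4}{5} x$.

$D_{\mathbf{v}} f(1, 1) = \frac{7}{5}$, $D_{\mathbf{v}} f(1, -1) = \frac{1}{5}$, $D_{\mathbf{v}} f(-4, 3) = -\frac{7}{5}$.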
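Definition (4) translates directly into a few lines of code: normalise v, then take the dot product with the gradient. The sketch below repeats the example for f(x, y) = xy; the function names `grad_f` and `directional_derivative` are hypothetical.

```python
import math

def grad_f(x, y):
    """Gradient of f(x, y) = xy, i.e. (df/dx, df/dy) = (y, x)."""
    return (y, x)

def directional_derivative(grad, point, v):
    """Return grad(point) . v / ||v||, following equation (4)."""
    norm = math.hypot(*v)
    g = grad(*point)
    return sum(gi * vi for gi, vi in zip(g, v)) / norm

v = (3, 4)
points = [(1, 1), (1, -1), (-4, 3)]
results = [directional_derivative(grad_f, p, v) for p in points]
for p, d in zip(points, results):
    print(p, d)
```

Dividing by the norm matters: without it, the answer would scale with the length of v rather than depend only on its direction.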

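The definition of the directional derivative presented here differs in notation from that presented in the text. One would find that the definitions are identical in practice since:

\[ \frac{\mathbf{v}}{\|\mathbf{v}\|} = \frac{v_x \mathbf{i} + v_y \mathbf{j}}{\sqrt{v_x^2 + v_y^2}} = \frac{v_x}{\sqrt{v_x^2 + v_y^2}} \mathbf{i} + \frac{v_y}{\sqrt{v_x^2 + v_y^2}} \mathbf{j} = \cos(\alpha) \mathbf{i} + \sin(\alpha) \mathbf{j} \tag{5} \]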

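where α is the angle that the vector v makes with the x axis. Using the dot product, eq. (4) becomes:

\[ \nabla f \cdot \left( \frac{\mathbf{v}}{\|\mathbf{v}\|} \right) = \left( \frac{\partial f}{\partial x} \mathbf{i} + \frac{\partial f}{\partial y} \mathbf{j} \right) \cdot \left( \cos(\alpha) \mathbf{i} + \sin(\alpha) \mathbf{j} \right) = \frac{\partial f}{\partial x} \cos\alpha + \frac{\partial f}{\partial y} \sin\alpha \tag{6} \]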

Equation (6) is the definition of directional derivative (of functions of two variables) given in the

text.

The vector definition presented in these notes is, in general, far more widely used in mathematics

and engineering as it can readily be extended to other coordinate systems and higher dimensions.

4. The chain rule

In one dimension the chain rule was employed when f = f(x) and x = x(t). In such a case,

\[ \frac{df}{dt} = \frac{df}{dx} \times \frac{dx}{dt}. \]

When moving to multiple dimensions, the basic concept is extended.

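Suppose that we have z = f(x, y) and that x = x(s, t) and y = y(s, t). Here we have f as a function of two variables, and each of these variables, in turn, is a function of two variables. In this case we may find an expression for the change in f with respect to s and t.

\[ \frac{\partial z}{\partial s} = \frac{\partial f}{\partial x} \frac{\partial x}{\partial s} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial s} \]

and

\[ \frac{\partial z}{\partial t} = \frac{\partial f}{\partial x} \frac{\partial x}{\partial t} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial t} \]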

As the text notes, a good example of this is when undertaking a coordinate transformation. If

a function is defined in Cartesian coordinates, and we wish to change over to polar coordinates

(r, θ) then we need to recall the relations

x = r cos θ, and y = r sin θ.

In calculating the partial derivatives, one can either completely change coordinate systems first,

and then compute the partial derivatives, or apply the chain rule.

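Example: Given that the function z = sin(xy) is defined in a Cartesian coordinate system, find the partial derivatives $\frac{\partial z}{\partial r}$ and $\frac{\partial z}{\partial \theta}$.

\[ \frac{\partial z}{\partial x} = y \cos(xy), \qquad \frac{\partial z}{\partial y} = x \cos(xy) \]

From x = r cos θ, and y = r sin θ we have:

\[ \frac{\partial x}{\partial r} = \cos\theta, \qquad \frac{\partial x}{\partial \theta} = -r \sin\theta, \qquad \frac{\partial y}{\partial r} = \sin\theta, \qquad \frac{\partial y}{\partial \theta} = r \cos\theta \]

Now

\[ \frac{\partial z}{\partial r} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial r} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial r} = y \cos(xy) \cos\theta + x \cos(xy) \sin\theta \]

\[ = 2r \cos\theta \sin\theta \cos(xy) = r \cos(xy) \sin(2\theta) \]

\[ \frac{\partial z}{\partial \theta} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial \theta} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial \theta} = y \cos(xy) \cdot (-r \sin\theta) + x \cos(xy) \, r \cos\theta \]

\[ = \cos(xy) \left( r^2 \cos^2\theta - r^2 \sin^2\theta \right) = r^2 \cos(xy) \cos(2\theta) \]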
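As noted above, one can either change coordinates first and then differentiate, or apply the chain rule. The sketch below compares the two routes numerically at one arbitrarily chosen point (r = 1.3, θ = 0.7): the chain-rule formulas from the example against finite differences of z written directly in polar form. All function names and the sample point are assumptions for illustration.

```python
import math

def z_polar(r, theta):
    """z = sin(xy) with x = r cos(theta), y = r sin(theta) substituted directly."""
    return math.sin((r * math.cos(theta)) * (r * math.sin(theta)))

def dz_dr_chain(r, theta):
    """Chain-rule result from the example: r cos(xy) sin(2 theta)."""
    x, y = r * math.cos(theta), r * math.sin(theta)
    return r * math.cos(x * y) * math.sin(2 * theta)

def dz_dtheta_chain(r, theta):
    """Chain-rule result from the example: r^2 cos(xy) cos(2 theta)."""
    x, y = r * math.cos(theta), r * math.sin(theta)
    return r ** 2 * math.cos(x * y) * math.cos(2 * theta)

r0, t0, h = 1.3, 0.7, 1e-6
dr_numeric = (z_polar(r0 + h, t0) - z_polar(r0 - h, t0)) / (2 * h)
dt_numeric = (z_polar(r0, t0 + h) - z_polar(r0, t0 - h)) / (2 * h)
print(dr_numeric, dz_dr_chain(r0, t0))
print(dt_numeric, dz_dtheta_chain(r0, t0))
```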

Suppose now we have z = f(x, y) and that x and y are functions of a single variable t. Here we might think of x and y being our Cartesian coordinates again, but these values are functions of the time t. (Thus x(t) and y(t) define a path of some particle as it moves in the x-y plane.) We can then define a derivative of z with respect to t as follows:

\[ \frac{dz}{dt} = \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt} \]


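Example: Given $z(x, y) = x^2 y - y \ln x - 2x$ with the further relations $x(t) = t^2$ and $y(t) = \cos(t)$, find $\frac{dz}{dt}$ and evaluate it at the time t = π.

\[ \frac{dz}{dt} = \frac{\partial z}{\partial x} \frac{dx}{dt} + \frac{\partial z}{\partial y} \frac{dy}{dt} \]

\[ \frac{\partial}{\partial x} \left( x^2 y - y \ln x - 2x \right) = 2xy - \frac{y}{x} - 2 \]

\[ \frac{\partial}{\partial y} \left( x^2 y - y \ln x - 2x \right) = x^2 - \ln x \]

\[ \frac{dx}{dt} = 2t = 2\pi \text{ when } t = \pi \]

\[ \frac{dy}{dt} = -\sin t = 0 \text{ when } t = \pi \]

\[ \frac{dz}{dt} = \frac{\partial z}{\partial x} \frac{dx}{dt} + \frac{\partial z}{\partial y} \frac{dy}{dt} = \left( 2xy - \frac{y}{x} - 2 \right) 2\pi + 0 \]

\[ = \left( -2\pi^2 + \frac{1}{\pi^2} - 2 \right) 2\pi \quad \text{substituting } x = \pi^2 \text{ and } y = -1 \text{ when } t = \pi \]

\[ = -4\pi^3 + \frac{2}{\pi} - 4\pi \]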




Lecture 24 higher derivatives · total differential · exact differential

Text Reference: §9.6.7

1. Higher order derivatives

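We can extend partial differentiation to higher order derivatives. Given the function f(x, y), we can form four second order derivatives.

\[ \frac{\partial}{\partial x} \left( \frac{\partial f}{\partial x} \right) = \frac{\partial^2 f}{\partial x^2} = f_{xx} \tag{1} \]

\[ \frac{\partial}{\partial y} \left( \frac{\partial f}{\partial y} \right) = \frac{\partial^2 f}{\partial y^2} = f_{yy} \tag{2} \]

\[ \frac{\partial}{\partial y} \left( \frac{\partial f}{\partial x} \right) = \frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial y} (f_x) = f_{xy} \tag{3} \]

\[ \frac{\partial}{\partial x} \left( \frac{\partial f}{\partial y} \right) = \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x} (f_y) = f_{yx} \tag{4} \]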

Please note the order of the notation in these equations. The partial derivative within the brackets is the first operation, so in equation (3) the partial derivative with respect to x is undertaken first, and then the derivative with respect to y. Also note that there are functions for which equations (3) and (4) are NOT equal. However, for the purposes of this course, we will neglect these special cases and assume that the order of differentiation can readily be swapped. I.e., we will assume that

\[ \frac{\partial}{\partial y} \left( \frac{\partial f}{\partial x} \right) = \frac{\partial}{\partial x} \left( \frac{\partial f}{\partial y} \right). \]

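Example: Given

\[ f(x, y) = x^3 y^3 + \sin(y) \]

find $f_{xx}$, $f_{yy}$, $f_{xy}$ and $f_{yx}$.

\[ f_x = \frac{\partial}{\partial x} \left( x^3 y^3 + \sin(y) \right) = 3x^2 y^3, \qquad f_y = \frac{\partial}{\partial y} \left( x^3 y^3 + \sin(y) \right) = 3x^3 y^2 + \cos y \]

\[ f_{xx} = \frac{\partial}{\partial x} \left( 3x^2 y^3 \right) = 6x y^3 \]

\[ f_{xy} = \frac{\partial}{\partial y} \left( 3x^2 y^3 \right) = 9x^2 y^2 \]

\[ f_{yx} = \frac{\partial}{\partial x} \left( 3x^3 y^2 + \cos y \right) = 9x^2 y^2 \]

\[ f_{yy} = \frac{\partial}{\partial y} \left( 3x^3 y^2 + \cos y \right) = 6x^3 y - \sin y \]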


Extending this work to higher order derivatives, and/or functions of more than two independent

variables is straightforward.

2. The total differential and small errors

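Suppose we are given a function z = f(x, y), and we wish to estimate the change in z given a small change in x and y.

\[ \Delta z = f(x + \Delta x, y + \Delta y) - f(x, y) \]

This can readily be manipulated to

\[ \Delta z = \left[ f(x + \Delta x, y + \Delta y) - f(x + \Delta x, y) \right] + \left[ f(x + \Delta x, y) - f(x, y) \right] \approx \frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y \tag{5} \]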

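If we turn the change of independent variables into a vector

\[ d\mathbf{r} = \Delta x \, \mathbf{i} + \Delta y \, \mathbf{j} \]

then the total differential can be written succinctly as $\nabla f \cdot d\mathbf{r}$.

In the limiting case of ∆x → 0 and ∆y → 0 we can define the total differential as

\[ dz = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy \tag{6} \]

with dz ≈ ∆z.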

Example (from text): Find the total differential for the function $z(x, y) = x^2 y^3$.

\[ dz = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy = 2x y^3 \, dx + 3x^2 y^2 \, dy \]

The text notes that the concept of the total differential is commonly used in setting error estimates given some uncertainty in the independent variables. The relative error in a quantity u is defined as

\[ \left| \frac{du}{u} \right| \approx \left| \frac{\Delta u}{u} \right| \]


Example (from text): Find the relative error of the volume of a circular cylinder given the

radius r = 3± 0.01 and the height h = 5± 0.005.
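One way to work this example is sketched below. It is not the text's own solution, which these notes do not reproduce. With $V = \pi r^2 h$ the total differential is $dV = 2\pi r h \, dr + \pi r^2 \, dh$, so $\frac{dV}{V} = 2\frac{dr}{r} + \frac{dh}{h}$. The sketch assumes a worst-case estimate in which the two contributions add in absolute value; the function name `cylinder_relative_error` is hypothetical.

```python
import math

def cylinder_relative_error(r, h, dr, dh):
    """Worst-case |dV/V| for V = pi r^2 h, from dV = 2 pi r h dr + pi r^2 dh."""
    V = math.pi * r ** 2 * h
    dV = abs(2 * math.pi * r * h * dr) + abs(math.pi * r ** 2 * dh)
    return dV / V

rel = cylinder_relative_error(r=3, h=5, dr=0.01, dh=0.005)
print(f"relative error in V is about {rel:.5f}")
```

Under these assumptions the relative error is 2(0.01)/3 + 0.005/5, or roughly 0.77%.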

3. Exact differentials

In the previous topic, we started with a well-defined function z = f(x, y) and developed the

total differential in equation (24.6). The idea now is to start with something in the form of the

right-hand side of equation (24.6) and see if it is, indeed, an exact differential. Assume we have

P (x, y)dx+Q(x, y)dy. (7)

Does there exist a function z = f(x, y) such that

dz = P (x, y)dx+Q(x, y)dy?

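For this to hold we need

\[ P(x, y) = \frac{\partial f}{\partial x} \quad \text{and} \quad Q(x, y) = \frac{\partial f}{\partial y} \tag{8} \]

Assuming that f(x, y) has second order derivatives we have

\[ \frac{\partial P}{\partial y} = \frac{\partial}{\partial y} \left( \frac{\partial f}{\partial x} \right) = \frac{\partial^2 f}{\partial y \partial x} \quad \text{and} \quad \frac{\partial Q}{\partial x} = \frac{\partial}{\partial x} \left( \frac{\partial f}{\partial y} \right) = \frac{\partial^2 f}{\partial x \partial y} \]

so if the mixed partial derivatives are equal we have

\[ \frac{\partial P}{\partial y} = \frac{\partial Q}{\partial x}. \]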

Provided this equation is satisfied our original expression (24.7) may be considered an exact

differential. Note that this test does not tell us how to recover the original function f(x, y). This

must be done through integrating both parts of (24.8) to find a common function.

Example: Verify the expression

(2x+ 2y)dx+ (2x+ 1/y)dy

is an exact differential and recover the function defined by it.
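A possible route through this example, offered as a sketch rather than the notes' own solution: here P = 2x + 2y and Q = 2x + 1/y, so ∂P/∂y = 2 = ∂Q/∂x and the expression is exact. Integrating P with respect to x gives $x^2 + 2xy$ plus a function of y alone, and matching ∂f/∂y to Q suggests the candidate $f(x, y) = x^2 + 2xy + \ln|y|$ (up to a constant). The code below checks both the exactness condition and the candidate numerically at one arbitrary point; the helper names `d_dx` and `d_dy` are hypothetical.

```python
import math

def P(x, y):
    return 2 * x + 2 * y

def Q(x, y):
    return 2 * x + 1 / y

def f(x, y):
    """Candidate potential: x^2 + 2xy + ln|y| (the additive constant is dropped)."""
    return x ** 2 + 2 * x * y + math.log(abs(y))

def d_dx(g, x, y, h=1e-6):
    """Central-difference estimate of dg/dx."""
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, x, y, h=1e-6):
    """Central-difference estimate of dg/dy."""
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

x0, y0 = 1.5, 2.0  # arbitrary test point with y != 0
is_exact = abs(d_dy(P, x0, y0) - d_dx(Q, x0, y0)) < 1e-6
print("dP/dy == dQ/dx:", is_exact)
print(d_dx(f, x0, y0), P(x0, y0))  # df/dx should reproduce P
print(d_dy(f, x0, y0), Q(x0, y0))  # df/dy should reproduce Q
```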




Lecture 25 Taylor’s theorem in two dimensions · Optimisation

Text Reference: §9.7

4. Taylor’s theorem in two dimensions

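Taylor series can readily be extended to functions of two (or more) variables. For a function of two independent variables, f(x, y), we can make an expansion around the point (a, b) as follows.

\[ f(a + h, b + k) = f(a, b) + \frac{1}{1!} \left( h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y} \right) f(x, y) \Big|_{(a,b)} + \frac{1}{2!} \left( h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y} \right)^2 f(x, y) \Big|_{(a,b)} + \cdots + \frac{1}{n!} \left( h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y} \right)^n f(x, y) \Big|_{(a,b)} + \cdots \tag{25.1} \]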

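Some new notation has been introduced here:

\[ \left( h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y} \right)^r f(x, y) \Big|_{(a,b)} \equiv \left[ h^r \frac{\partial^r}{\partial x^r} + \binom{r}{1} h^{r-1} k \frac{\partial^r}{\partial x^{r-1} \partial y} + \cdots + \binom{r}{s} h^{r-s} k^s \frac{\partial^r}{\partial x^{r-s} \partial y^s} + \cdots + \binom{r}{r-1} h k^{r-1} \frac{\partial^r}{\partial x \partial y^{r-1}} + k^r \frac{\partial^r}{\partial y^r} \right] f(x, y) \Big|_{(a,b)} \]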

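For example:

\[ \left( h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y} \right)^3 f(x, y) \Big|_{(a,b)} = h^3 \frac{\partial^3 f}{\partial x^3} \Big|_{(a,b)} + 3h^2 k \frac{\partial^3 f}{\partial y \partial x^2} \Big|_{(a,b)} + 3h k^2 \frac{\partial^3 f}{\partial y^2 \partial x} \Big|_{(a,b)} + k^3 \frac{\partial^3 f}{\partial y^3} \Big|_{(a,b)} \]

With h = x − a and k = y − b this reads

\[ = (x - a)^3 f_{xxx}(a, b) + 3 (x - a)^2 (y - b) f_{yxx}(a, b) + 3 (x - a)(y - b)^2 f_{yyx}(a, b) + (y - b)^3 f_{yyy}(a, b) \]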

Here we have assumed that all of the nth order partial derivatives exist and are continuous in

some domain close to the point (a, b).

Example: Up to second order, find the Taylor series expansion to the function ln(xy) about the

point (1, 1) .
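A possible working of this example, given as a sketch: for f(x, y) = ln(xy) = ln x + ln y we have f(1, 1) = 0, $f_x = 1/x$ and $f_y = 1/y$ (both equal to 1 at the point), $f_{xx} = -1/x^2$ and $f_{yy} = -1/y^2$ (both equal to −1), and $f_{xy} = 0$. Writing h = x − 1 and k = y − 1, the second-order expansion is $h + k - \frac{1}{2}(h^2 + k^2)$. The code compares this polynomial with the exact value at two arbitrarily chosen nearby points; the function name `ln_xy_taylor2` is hypothetical.

```python
import math

def ln_xy_taylor2(x, y):
    """Second-order Taylor polynomial of ln(xy) about (1, 1)."""
    h, k = x - 1, y - 1
    # f(1,1) = 0, fx = fy = 1, fxx = fyy = -1, fxy = 0 at (1, 1)
    return h + k - 0.5 * (h ** 2 + k ** 2)

for x, y in [(1.1, 0.9), (1.05, 1.02)]:
    exact = math.log(x * y)
    approx = ln_xy_taylor2(x, y)
    print((x, y), exact, approx, abs(exact - approx))
```

The mismatch is third order in h and k, so it shrinks quickly as the point approaches (1, 1).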


Please note that the first order Taylor approximation to f(x, y) is

\[ T(x, y) = f(a, b) + \frac{1}{1!} \left( (x - a) \frac{\partial}{\partial x} + (y - b) \frac{\partial}{\partial y} \right) f(x, y) \Big|_{(a,b)} = f(a, b) + \frac{1}{1!} \left( (x - a) f_x(a, b) + (y - b) f_y(a, b) \right) \]

The equation

\[ z = f(a, b) + (x - a) f_x(a, b) + (y - b) f_y(a, b) \]

is the equation of the tangent plane in 3-D to the surface z = f(x, y) at the point (a, b, f(a, b)). This is analogous to earlier work with functions of one independent variable, f(x), in which the first order Taylor series approximation returned the tangent line.

5. Optimisation of unconstrained functions

We’ve learned that the local extrema of a differentiable function of one independent variable f(x) occur at critical points where the derivative f′(x) is equal to zero. If the derivative is equal to zero, then we can have a local minimum, maximum or point of inflection. We then used the second derivative to, hopefully, help us classify the critical point. We wish to extend this work to a function of two independent variables, f(x, y).

Using the Taylor series expansion just presented, we see that in the neighbourhood of the point (a, b) the change in f(x, y) is

\[ \Delta f = f(a + h, b + k) - f(a, b) = \left( h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y} \right) f(x, y) \Big|_{(a,b)} + \frac{1}{2!} \left( h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y} \right)^2 f(x, y) \Big|_{(a,b)} + \cdots \]

For an extremum, ∆f must be either strictly negative or strictly positive for all small h and k. Notice that the first term on the right-hand side depends linearly on h and k. Since these values can be either positive or negative, the first partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ must be zero for ∆f to be strictly positive or negative. This is a necessary condition, which then leaves our difference depending on the second order partial derivatives. Since we are only interested in very small values of h and k, we can ignore the higher order terms, as these will involve factors like $h^3$, which is much smaller than $h^2$. Ultimately we require

\[ \Delta f \approx \frac{1}{2} \left( h^2 f_{xx}(a, b) + 2hk f_{xy}(a, b) + k^2 f_{yy}(a, b) \right) \tag{25.2} \]

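to be either positive or negative. This expression can be manipulated as follows

\[ f_{xx}(a, b) \, \Delta f \approx \frac{1}{2} \left( h^2 \left( f_{xx}(a, b) \right)^2 + 2hk f_{xx}(a, b) f_{xy}(a, b) + k^2 f_{xx}(a, b) f_{yy}(a, b) \right) \]

Complete the square on the first two terms:

\[ = \frac{1}{2} \left[ \left( h f_{xx}(a, b) + k f_{xy}(a, b) \right)^2 - k^2 \left( f_{xy}(a, b) \right)^2 + k^2 f_{xx}(a, b) f_{yy}(a, b) \right] \]

\[ = \frac{1}{2} \left[ \left( h f_{xx}(a, b) + k f_{xy}(a, b) \right)^2 + k^2 \left( f_{xx}(a, b) f_{yy}(a, b) - \left( f_{xy}(a, b) \right)^2 \right) \right] \]

First, in order for ∆f to be strictly positive in the neighbourhood of a stationary point we require both

\[ \frac{\partial^2 f}{\partial x^2} \quad \text{and} \quad \frac{\partial^2 f}{\partial x^2} \frac{\partial^2 f}{\partial y^2} - \left( \frac{\partial^2 f}{\partial x \partial y} \right)^2 \tag{25.3} \]

be positive. This is thus a requirement for a local minimum.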

The results are summarised in the following theorem:

Let (a, b) be an interior point of the domain for the function f and suppose that the first and

second partial derivatives of f exist and are continuous on some circular disk with (a, b) as its

centre and contained in the domain of f. Assume that (a, b) is a critical point of f, so that

fx(a, b) = fy(a, b) = 0. Define

\[ \Delta = \begin{vmatrix} f_{xx}(a, b) & f_{xy}(a, b) \\ f_{yx}(a, b) & f_{yy}(a, b) \end{vmatrix} = f_{xx}(a, b) f_{yy}(a, b) - \left( f_{xy}(a, b) \right)^2 \]

Then:

1. If ∆ > 0 and fxx(a, b) < 0 or fyy(a, b) < 0, then (a, b) is a local maximum.

2. If ∆ > 0 and fxx(a, b) > 0 or fyy(a, b) > 0, then (a, b) is a local minimum.

3. If ∆ < 0, then (a, b) is a saddle point.

4. If ∆ = 0, then this test is inconclusive.

A saddle point, as the name suggests, is a point on the domain of f (x, y) where a minimum is

approached in one direction, but a maximum is approached from a different direction.

Example 1: Verify that the point (2, −1) is a local maximum for the function $f(x, y) = 1 - (x - 2)^2 - (y + 1)^2$.

Solution: $f_x = -2(x - 2)$ and $f_y = -2(y + 1)$, and these are zero when x = 2 and when y = −1. Hence there is a single stationary point, (2, −1).

To determine the nature of the stationary point we evaluate

\[ D(x, y) = \begin{vmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{vmatrix} = \begin{vmatrix} -2 & 0 \\ 0 & -2 \end{vmatrix} = 4 > 0 \]

so (2, −1) is either a local minimum or a local maximum.

Since $f_{xx} = -2 < 0$ we have that (2, −1) is a local maximum point and that the local maximum value of f is f(2, −1) = 1.


Example 2: Find the critical points of the function $f(x, y) = x^2 - 5xy + 3y^2 + 13y$. Determine

the nature of each stationary point.

Solution:

fx = 2x− 5y and fy = −5x+ 6y + 13 and these are zero when

2x− 5y = 0

−5x+ 6y = −13.

Using Cramer’s rule gives

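\[ x = \frac{\begin{vmatrix} 0 & -5 \\ -13 & 6 \end{vmatrix}}{\begin{vmatrix} 2 & -5 \\ -5 & 6 \end{vmatrix}} = 5 \quad \text{and} \quad y = \frac{\begin{vmatrix} 2 & 0 \\ -5 & -13 \end{vmatrix}}{\begin{vmatrix} 2 & -5 \\ -5 & 6 \end{vmatrix}} = 2 \]

So there is one stationary point: (5, 2).

Its nature:

\[ D(x, y) = \begin{vmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{vmatrix} = \begin{vmatrix} 2 & -5 \\ -5 & 6 \end{vmatrix} = 12 - 25 < 0 \]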

so (5, 2) is a saddle point.

Example 3: Show that the function $f(x, y) = x^3 - 3xy + y^3$ has two stationary (critical) points.

Find the second order partial derivatives of f and evaluate the determinant

\[ D(x, y) = \begin{vmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{vmatrix} \]

at each stationary point. Hence determine the nature of each stationary point.

If the function f has local maximum or minimum values find these.

Solution:

$f_x = 3x^2 - 3y$ and $f_x = 0$ when $x^2 = y$; $f_y = 3y^2 - 3x$ and $f_y = 0$ when $y^2 = x$.

Thus the only critical points occur when $x^4 = x$, i.e. when

\[ x^4 - x = x (x - 1) \left( x^2 + x + 1 \right) = 0, \]

namely at x = 0 and x = 1 (the quadratic factor has no real roots).

Hence f has two critical points, (0, 0) and (1, 1).

The nature of the two critical points:

$f_{xx} = 6x$, $f_{xy} = f_{yx} = -3$ and $f_{yy} = 6y$.

Hence

\[ D(x, y) = \begin{vmatrix} 6x & -3 \\ -3 & 6y \end{vmatrix} = 36xy - 9. \]

Now D (0, 0) < 0 indicating (0, 0) is a saddle point of f.

On the other hand D (1, 1) > 0 and fxx (1, 1) > 0 indicating that f has a local minimum at

(1, 1) , and its minimum value is f (1, 1) = −1.
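The classification rule in the theorem is mechanical enough to write down as a small function. The sketch below applies it to the second derivatives from the three examples, computed by hand from each function; the function name `classify` and the returned labels are assumptions made for illustration.

```python
def classify(fxx, fxy, fyy):
    """Second-derivative test at a critical point, using D = fxx*fyy - fxy**2."""
    D = fxx * fyy - fxy ** 2
    if D > 0:
        # With D > 0, fxx and fyy share a sign, so checking fxx is enough.
        return "local minimum" if fxx > 0 else "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"

# Example 1: f = 1 - (x-2)^2 - (y+1)^2 at (2, -1): fxx = -2, fxy = 0, fyy = -2
print(classify(-2, 0, -2))
# Example 2: f = x^2 - 5xy + 3y^2 + 13y at (5, 2): fxx = 2, fxy = -5, fyy = 6
print(classify(2, -5, 6))
# Example 3: f = x^3 - 3xy + y^3, with fxx = 6x, fxy = -3, fyy = 6y
print(classify(0, -3, 0))   # at (0, 0)
print(classify(6, -3, 6))   # at (1, 1)
```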




Lecture 26 Taylor’s theorem in two dimensions · Optimisation

Text Reference: §9.7

6. Optimisation of constrained functions (the technique of Lagrange multipliers)

In this section we wish to explore the optimisation of a function of several independent variables, given a constraint. In two dimensions this is often straightforward. For example, suppose we wanted to find the maximum of the function

\[ f(x, y) = 1 - (x - 2)^2 - (y + 1)^2 \]

subject to the constraint g(x, y) = x/y = 3. Visually, this could be done by drawing the contour map of f(x, y) and then drawing the straight line x = 3y over the top of the contours, on the same sheet of paper. The maximum contour value along the line is the solution we want to find. Mathematically, we could attack this problem by simple substitution. The constraint is equivalent to saying that x = 3y so the original function becomes

\[ f(y) = 1 - (3y - 2)^2 - (y + 1)^2 = -10y^2 + 10y - 4 \]

The extremum of this can readily be found by solving

\[ f'(y) = -20y + 10 = 0 \]

which has solution y = 1/2. Substituting into x = 3y we get x = 3/2. Since f″(y) = −20 < 0, this is a maximum.

Thus the point (3/2, 1/2) is the maximum point of the function

\[ f(x, y) = 1 - (x - 2)^2 - (y + 1)^2 \]

subject to the constraint g(x, y) = x/y = 3. We get the value f(3/2, 1/2) = −1.5 as a solution to the original problem.

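Example: Find the extrema for the function

\[ f(x, y) = 12xy - 4x^2 y - 3x y^2 \]

given the constraint x + 2y = 2.

Substituting x = 2 − 2y:

\[ 12xy - 4x^2 y - 3x y^2 = 12 (2 - 2y) y - 4 (2 - 2y)^2 y - 3 (2 - 2y) y^2 = 8y + 2y^2 - 10y^3 \]

\[ \frac{d}{dy} \left( 8y + 2y^2 - 10y^3 \right) = 8 + 4y - 30y^2 \]

\[ = 0 \text{ when } y = \frac{1}{15} + \frac{1}{15} \sqrt{61}, \text{ or } y = \frac{1}{15} - \frac{1}{15} \sqrt{61} \]

Substituting into x = 2 − 2y:

\[ x = 2 - 2 \left( \frac{1}{15} + \frac{1}{15} \sqrt{61} \right) = \frac{28}{15} - \frac{2}{15} \sqrt{61} \]

\[ \text{or } x = 2 - 2 \left( \frac{1}{15} - \frac{1}{15} \sqrt{61} \right) = \frac{28}{15} + \frac{2}{15} \sqrt{61} \]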

Suppose now that we are working with functions of three independent variables. Namely, suppose we wish to find the extrema of the function f(x, y, z) subject to the constraint

\[ g(x, y, z) = 0. \tag{26.1} \]

Sometimes we can manipulate the constraint and substitute it into the original function and lower the number of independent variables.

For example, consider the function

\[ f(x, y, z) = x^2 + xy + xz + y^2 z^2, \]

and the constraint

\[ g(x, y, z) = 2x^2 + 3y - z = 2; \]

then we could define $z = 2x^2 + 3y - 2$ and substitute this into f to leave it with two independent variables,

\[ f(x, y) = x^2 + xy + x (2x^2 + 3y - 2) + y^2 (2x^2 + 3y - 2)^2 \]

We are then back to optimising a function of two independent variables and we could approach the problem as was done in the previous section.

Please note however that this can be very tedious. We can actually manipulate this problem to present it in a manner that is usually easier to solve. Consider the constraint (26.1). This, in general, represents a surface in 3-D space. We will define small motions along this surface as ds = (dx, dy, dz). Without any loss of generality, we can consider this to be a vector in the 3-D Cartesian space. Since g(x, y, z) is constrained to be zero, we know that motion along this surface won’t change the value of g(x, y, z):

\[ dg = \nabla g \cdot d\mathbf{s} = \frac{\partial g}{\partial x} dx + \frac{\partial g}{\partial y} dy + \frac{\partial g}{\partial z} dz = 0 \]

Now assume that we are at the conditional stationary point, i.e. the point that both satisfies the constraint and optimises f(x, y, z) under this constraint. Then small motions along the surface will also require

\[ df = \nabla f \cdot d\mathbf{s} = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy + \frac{\partial f}{\partial z} dz = 0 \]

Using our basic understanding of the vector dot product we know that both ∇f and ∇g are perpendicular to every such ds. Thus ∇f and ∇g are parallel, and one may be expressed as a scalar multiple of the other:

\[ \nabla f - \lambda \nabla g = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right) - \lambda \left( \frac{\partial g}{\partial x}, \frac{\partial g}{\partial y}, \frac{\partial g}{\partial z} \right) = (0, 0, 0) \tag{26.2} \]


Here λ is basically another unknown variable. At this point in time, some students might be

asking what the advantage in all of this is. We have moved from our initial optimisation problem

with three unknowns (x, y and z ) to a system with four equations [(26.1) and the three of (26.2)]

and four unknowns (x, y, z and λ). Experience tells us that this new approach is often easier to

solve than the original problem. Please note that the variable λ is called the Lagrange multiplier

and the function

φ(x, y, z) = f(x, y, z)− λg(x, y, z)

is called the auxiliary function.

Example: Find the extrema of the function

$$f(x, y, z) = x^2 + y^2 + z^2$$

subject to the constraint

$$g(x, y, z) = x^2 + 2y^2 - z^2 - 1 = 0$$

Solution: The three Lagrange multiplier equations can be written:

$$\nabla f = (2x, 2y, 2z) = \lambda \nabla g = \lambda (2x, 4y, -2z)$$

The first equation $2x = 2\lambda x$ gives $\lambda = 1$ or $x = 0$.

If $\lambda = 1$ ($x$ is arbitrary) then the second component gives $2y = 4y$, hence $y = 0$; and the third component $2z = -2z$ gives $z = 0$.

Solving the constraint equation $x^2 + 2y^2 - z^2 - 1 = 0$ with $y = z = 0$ gives $x = \pm 1$.

The second equation $2y = 4\lambda y$ gives $\lambda = 1/2$ or $y = 0$.

If $\lambda = 1/2$, then $y$ can be arbitrary and equations 1 and 3 give $x = z = 0$. The constraint equation $x^2 + 2y^2 - z^2 - 1 = 0$ with $x = z = 0$ gives $y = \pm 1/\sqrt{2}$.

The third equation $2z = -2\lambda z$ gives $\lambda = -1$ or $z = 0$.

If $\lambda = -1$, then $z$ can be arbitrary and equations 1 and 2 give $x = y = 0$. The constraint equation $x^2 + 2y^2 - z^2 - 1 = 0$ becomes $-z^2 = 1$, which has no real solution.

There are thus 4 constrained extreme points: $(\pm 1, 0, 0)$ with $f(x, y, z) = 1$ and $(0, \pm 1/\sqrt{2}, 0)$ with $f(x, y, z) = 1/2$.
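The four candidate points found above can be checked numerically. The following is a minimal sketch in Python (the helper name `is_lagrange_point` and the tolerance are assumptions made for illustration, not part of the notes); it confirms that each point lies on the constraint surface and that $\nabla f = \lambda \nabla g$ there.

```python
import math

def f(x, y, z):
    return x**2 + y**2 + z**2

def g(x, y, z):
    return x**2 + 2 * y**2 - z**2 - 1

def grad_f(x, y, z):
    return (2 * x, 2 * y, 2 * z)

def grad_g(x, y, z):
    return (2 * x, 4 * y, -2 * z)

def is_lagrange_point(point, lam, tol=1e-12):
    """True if the point lies on g = 0 and grad f = lam * grad g there."""
    on_constraint = abs(g(*point)) < tol
    parallel = all(abs(df - lam * dg) < tol
                   for df, dg in zip(grad_f(*point), grad_g(*point)))
    return on_constraint and parallel

r = 1 / math.sqrt(2)
# (point, multiplier) pairs taken from the worked solution
candidates = [((1, 0, 0), 1.0), ((-1, 0, 0), 1.0),
              ((0, r, 0), 0.5), ((0, -r, 0), 0.5)]

for point, lam in candidates:
    print(point, lam, is_lagrange_point(point, lam), f(*point))
```

A point that is not on the constraint surface, such as $(0, 0, 1)$ with $\lambda = -1$, fails the same check, in agreement with the case that had no solution.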


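Example: Find the extrema of the function $f(x, y, z) = xyz$ subject to the constraint

$$g(x, y, z) = x^2 + y^2 + z^2 = 1.$$

Solution: The three Lagrange multiplier equations can be written:

$$\nabla f = (yz, xz, xy) = \lambda \nabla g = \lambda (2x, 2y, 2z)$$

$$\lambda = \frac{yz}{2x} = \frac{xz}{2y} = \frac{xy}{2z}$$

$$y^2 = x^2; \quad z^2 = y^2; \quad x^2 = z^2$$

$x^2 + y^2 + z^2 = 1$ so $3x^2 = 1 \Rightarrow x = \pm\frac{1}{\sqrt{3}}$

we have $y = \pm\frac{1}{\sqrt{3}}$; $z = \pm\frac{1}{\sqrt{3}}$

so eight points: $\left(\pm\frac{1}{\sqrt{3}}, \pm\frac{1}{\sqrt{3}}, \pm\frac{1}{\sqrt{3}}\right)$.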

Example: Use the method of Lagrange multipliers to find the maximum possible volume of a

cone inscribed in a sphere of radius a.

Solution: Let the cone have height h and radius r.

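The function to be maximised is $V = \frac{1}{3}\pi r^2 h$.

The fact that the cone is inscribed in the sphere leads to the constraint:

$$g(r, h) = r^2 + (h - a)^2 = a^2.$$

This time there are two Lagrange multiplier equations:

$$\nabla V = \left(\tfrac{2}{3}\pi r h, \tfrac{1}{3}\pi r^2\right) = \lambda \nabla g = \lambda \left(2r, 2(h - a)\right)$$

so

$$\lambda = \frac{\tfrac{2}{3}\pi r h}{2r} = \frac{\tfrac{1}{3}\pi r^2}{2(h - a)}$$

hence

$$\frac{2h}{r} = \frac{r}{h - a},$$

that is $r^2 = 2h^2 - 2ah$. Substituting this into the constraint gives

$$2h^2 - 2ah + h^2 - 2ah + a^2 = a^2$$

$$3h^2 - 4ah = 0 \text{ and hence } h(3h - 4a) = 0 \Rightarrow h = \frac{4a}{3} \quad (\text{or } h = 0)$$

From $r^2 + (h - a)^2 = a^2$ we get

$$r^2 = a^2 - (h - a)^2 = a^2 - \left(\frac{a}{3}\right)^2 = \frac{8}{9}a^2$$

$$r = \frac{2\sqrt{2}}{3}a$$

$$V_{max} = \frac{1}{3}\pi r^2 h = \frac{32\pi}{81}a^3$$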
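As a sanity check on this result, the constraint can be used to eliminate $r$ and the volume scanned directly over $0 < h < 2a$. This is only a brute-force sketch under the assumption $a = 1$ (the grid size `steps` is an arbitrary illustrative choice); it is not the Lagrange multiplier method itself.

```python
import math

def cone_volume(h, a):
    """Volume of the inscribed cone of height h, using r^2 = a^2 - (h - a)^2."""
    r_squared = a**2 - (h - a)**2
    return math.pi * r_squared * h / 3

a = 1.0          # assumed sphere radius for the check
steps = 20000    # hypothetical grid resolution

# Scan candidate heights strictly between 0 and 2a and keep the best one.
best_h = max((2 * a * k / steps for k in range(1, steps)),
             key=lambda h: cone_volume(h, a))

print(best_h, cone_volume(best_h, a))          # numerical maximiser
print(4 * a / 3, 32 * math.pi * a**3 / 81)     # values from the worked solution
```

The scanned maximiser should agree with $h = 4a/3$ to within the grid spacing.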



ENG1091 Ordinary Differential Equations

Lecture 27 · introduction · classification
Text Reference: §10.1-10.5

1. Introduction and definition

The derivatives of $y(x)$ are written

$$\frac{dy}{dx}, \quad \frac{d^2 y}{dx^2}, \quad \frac{d^3 y}{dx^3}$$

(or more concisely as $y'$, $y''$, $y'''$) for the first, second and third order derivatives.

Equations (or physical relationships) involving derivatives are known as differential equations.

Examples:

$$\frac{dy}{dx} = 5y + 2 \qquad (27.1)$$

$$y''' + y \cos x = 0 \qquad (27.2)$$

$$\frac{d^2 s}{dt^2} + t\frac{ds}{dt} + t^2 s = -t \qquad (27.3)$$

$$\ddot{x}\,x^2 + t^2 x = \ln t \qquad (27.4)$$

Although these examples have no particular physical relevance, there are many simple examples

of relevant differential equations. In basic calculus, the exponential function was commonly

defined through the differential equation

$$\frac{dN}{dt} = \lambda N$$

and was used to model ideal population growth.

2. Examples of Engineering Applications:

Ordinary Differential Equations (or ODEs) also have a number of basic engineering applications.

For example, Newtonian physics requires that the rate of change of momentum of a body equals the force applied to it. For simple gravity

$$m\frac{dv}{dt} = -mg \quad \text{or} \quad \frac{d^2 s}{dt^2} = -g$$

where $g$ is the acceleration due to gravity, $s$ is height, $v$ is velocity and $t$ is time. If a drag is considered, then the equation becomes

$$m\frac{dv}{dt} = -mg + bv^2 \quad \text{or} \quad \frac{d^2 s}{dt^2} = -g + \frac{b}{m}\left(\frac{ds}{dt}\right)^2$$

where $b$ is a constant. The dynamics of a spring can readily be modelled with an ODE. Here the resistance force is not gravity or drag, but rather it is proportional to the displacement.


Consider a basic problem in thermodynamics with the heating (or cooling) of a body to room

temperature. The rate of change of the temperature of the body $T_b$ is proportional to the temperature difference between the body and room temperature, $(T_b - T_r)$. Specifically the governing equation is

$$\frac{dT_b}{dt} = -\alpha (T_b - T_r)$$

Another classical example models an electrical circuit involving a resistor, an inductor and a

capacitor. If we define the inductance as L, the capacitance as C and the resistance as R, then

the current i(t) of the LCR circuit can be modeled as

$$L\frac{d^2 i}{dt^2} + R\frac{di}{dt} + \frac{1}{C}\,i = 0$$

3. Classification of ODEs

The notation $y(x)$ has commonly been used to define $y$ as a dependent function of the independent variable $x$. It is common to use $x$ or $t$ as the independent variable to signify position or time.

Given a differential equation, if the dependent variable is a function of only one independent variable, then the differential equation is classified as an ordinary differential equation (or ODE). All of the examples discussed so far have been ordinary differential equations.

In multivariable calculus the function $y$ (also called the dependent variable) might be a function of two or more independent variables. (For example $y$ might be a function of the displacement $x$ and the time $t$; we write $y(x, t)$.) The derivatives are then partial derivatives, $\frac{\partial y}{\partial x}$ and $\frac{\partial y}{\partial t}$. Equations involving partial derivatives are logically referred to as partial differential equations (or PDEs) and will be covered in 2nd level engineering maths. PDEs are commonly used to study fluid dynamics, heat flow and other engineering applications.

Differential equations will be further classified by their order, which is the order of the highest derivative that appears in the differential equation. Example 27.1 is a first order, ordinary differential equation. Example 27.2 is a 3rd order ODE. Examples 27.3 and 27.4 are both 2nd order ODEs.

Another important type of differential equation is the linear ODE. We define linear differential equations as those in which the dependent variable and its derivatives do not occur as products, raised to powers (other than one) or in non-linear functions. Otherwise the differential equation is said to be non-linear.

Examples 27.1, 27.2 and 27.3 are linear while example 27.4 is non-linear because of the $\ddot{x}\,x^2$ term.

A very important classification of a differential equation is whether it is homogeneous or not. A

homogeneous differential equation is one in which every term involves either the dependent

variable or derivatives of the dependent variable. Otherwise it is said to be non-homogeneous.


A non-homogeneous equation will have one or more terms that are either constant or that contain the independent variable only. Equation 27.1 is non-homogeneous because of the ‘2’ on the right hand side. Such a term will be referred to in these notes as the ‘inhomogeneous term’.

It is common to write differential equations with all terms involving the dependent variable (including derivatives) on the left-hand side of the equation and any remaining terms on the right. Thus if the right-hand side of the differential equation is zero, it is classified as homogeneous. Example 27.2 is homogeneous while the remaining examples are non-homogeneous. Note that homogeneous equations will always have the trivial solution $y(x) = 0$, while non-homogeneous equations will not.

In summary, example 27.1 is a first order, linear, non-homogeneous ordinary differential equation. Example 27.2 is a third order, linear, homogeneous ordinary differential equation. Example 27.3 is a second order, linear, non-homogeneous ordinary differential equation, and example 27.4 is a second order, non-linear, non-homogeneous ordinary differential equation.

4. Solving differential equations

Ideally a solution of an ODE would be an explicit representation of the dependent variable $y(x)$. Sometimes an analytic solution of an ODE may be found, but only in an implicit form, e.g. $H(x, y) = 0$, and sometimes no analytic solution to an ODE is possible.

For example, the exponential function

$$N(t) = \alpha e^{\lambda t}$$

is an explicit solution to the simple ODE

$$\frac{dN}{dt} = \lambda N$$

where α is an arbitrary constant.

As a second example, consider the 2nd order linear ODE

$$\ddot{x} + x = t$$

By inspection we can see that $x(t) = t$ is a solution to the ODE since the second derivative of $x(t)$ would be zero. A more general solution, however, would be $x(t) = A\sin(t) + B\cos(t) + t$, where $A$ and $B$ are arbitrary constants. To check this, differentiate twice:

$$x(t) = A\sin(t) + B\cos(t) + t$$

$$\dot{x}(t) = A\cos(t) - B\sin(t) + 1$$

$$\ddot{x}(t) = -A\sin(t) - B\cos(t)$$

so substituting in these values,

$$\ddot{x} + x = [-A\sin(t) - B\cos(t)] + [A\sin(t) + B\cos(t) + t] = t$$
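The same substitution check can be done numerically. The sketch below is an illustration only: it assumes a central finite-difference estimate of the second derivative (with a hypothetical step size `h`) and evaluates the residual of the ODE, which should be close to zero for any choice of the constants $A$ and $B$.

```python
import math

def x(t, A, B):
    """Proposed general solution x(t) = A sin t + B cos t + t."""
    return A * math.sin(t) + B * math.cos(t) + t

def second_derivative(func, t, h=1e-4):
    """Central-difference estimate of the second derivative of func at t."""
    return (func(t + h) - 2 * func(t) + func(t - h)) / h**2

def residual(t, A, B):
    """Left side minus right side of x'' + x = t; near zero for a solution."""
    xdd = second_derivative(lambda u: x(u, A, B), t)
    return xdd + x(t, A, B) - t

for t in (0.0, 1.0, 2.5):
    print(t, residual(t, A=2.0, B=-3.0))
```

The residual is not exactly zero because the derivative is only approximated; it shrinks to rounding level rather than vanishing identically.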

We define the general solution of an ODE as one that contains the arbitrary constants and

retains the maximum degrees of freedom possible. As demonstrated in the first example, the


solution to our first order ODE has one degree of freedom in its solution. The solution of the

second order ODE has two degrees of freedom in the solution. Both of these examples are linear

ODEs, and an nth order linear ODE will have n degrees of freedom in its general solution.

If the solution of an ODE contains no free constants, then we say that the solution is a particular

solution. Typically a particular solution is found by placing additional constraints on the ODE

that define the arbitrary constants. For example, the ODE

$$\frac{dN}{dt} = \lambda N$$

could be further constrained by the condition that $N(t) = 5$ when $t = 0$. So the solution would then have to be $N(t) = 5e^{\lambda t}$.

In the second example, the 2nd order linear ODE requires two constraints to fully define the arbitrary constants. These two constraints could be at different points in the domain (e.g. $x(0) = 4$ and $x(10) = -2$) or all the constraints could be given at the same point in the domain (e.g. $x(0) = 4$ and $\dot{x}(0) = 3$). The first set of constraints is called boundary conditions and the latter is called initial conditions. The choice typically reflects the nature of the physical problem. As there is only one constraint for first order linear ODEs, it doesn’t really matter what you call it (but it is common to refer to the single constraint as the initial condition). The statement of an ODE with the boundary (initial) conditions is commonly called a boundary (initial) value problem.

The ODE $\ddot{x} - 4x = 4t$ has the general solution

$$x(t) = Ae^{2t} + Be^{-2t} - t$$

(Use substitution to verify this.) The initial value problem

$$\ddot{x} - 4x = 4t, \quad x(0) = 0, \quad \dot{x}(0) = 3$$

in turn requires the particular solution

$$x(t) = e^{2t} - e^{-2t} - t$$

(Again, this can be verified through substitution.) In the coming lectures we will learn a number

of techniques for finding analytic solutions to a select set of ODEs. When analytic solutions

are not possible, one may be interested in employing a graphical approach (for 1st order ODEs)

and/or numerical techniques for higher order problems.

5. Graphical interpretation of first order ODEs

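Let us initially assume that we have a simple 1st order ODE that we can write in the form

$$\frac{dy}{dx} = F(x, y)$$

with no initial condition specified. At each point $(x, y)$ the ODE gives the slope $F(x, y)$ of the solution passing through that point. These slopes can then be drawn as short line segments on a grid of points, producing what is known as a direction field.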


[Figure: four slope fields, each drawn for $-4 \le x, y \le 4$: $\frac{dy}{dx} = y$, $\frac{dy}{dx} = -\frac{x}{y}$, $\frac{dy}{dx} = \frac{x}{y}$ and $\frac{dy}{dx} = \frac{y}{x}$.]

Given an initial condition, the solution can be mapped out graphically. This is known as a

solution curve. Different initial conditions will normally lead to different solutions. Simply

plotting a few arbitrary solution curves will produce a family of solution curves. In a preview

to a later lecture, this graphical technique is the basis of many common numerical techniques for

solving ODEs.
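One simple numerical technique built on this idea is the forward Euler method, which follows the direction field in short straight steps. The notes do not name a particular method at this point, so the sketch below is an assumption chosen for illustration; the function name `euler` and the step count are hypothetical.

```python
def euler(F, x0, y0, x_end, n_steps):
    """Follow the direction field dy/dx = F(x, y) from (x0, y0) to x_end.

    Each step moves a distance h in x along the local slope F(x, y).
    Returns the list of (x, y) points on the approximate solution curve.
    """
    h = (x_end - x0) / n_steps
    x, y = x0, y0
    points = [(x, y)]
    for _ in range(n_steps):
        y += h * F(x, y)   # move along the slope at the current point
        x += h
        points.append((x, y))
    return points

# Slope field dy/dx = y with initial condition y(0) = 1; exact solution is e^x.
curve = euler(lambda x, y: y, 0.0, 1.0, 1.0, 1000)
print(curve[-1])   # the y value should be close to e = 2.71828...
```

Using more steps brings the approximate curve closer to the exact solution curve, at the cost of more arithmetic.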




Lecture 28 · separable first order ODEs
Text Reference: §10.5

1. Separable equations

A number of techniques may be used to find analytic solutions of various ODEs. Perhaps the simplest approach applies to ODEs that are separable. By this we mean that the basic ODE can be re-written with all components of the dependent variable on one side of the equation, and all components of the independent variable on the other.

Example 1: $\frac{dy}{dx} = xy$ can be rewritten to

$$\frac{dy}{y} = x\,dx$$

Both the left and right hand side of the equation can be readily integrated:

$$\int \frac{dy}{y} = \int x\,dx$$

which leads to

$$\ln y = \frac{x^2}{2} + c$$

This can be further manipulated to

$$y(x) = c_1 e^{x^2/2}$$

where $c_1 = e^{c}$. One can readily verify by substitution that this is the general solution to the original 1st order linear ODE.

Example 2: Find the solution to the ODE

dy

dx=−xy

and verify that the solution does solve the ODE.

In general, the technique for separation of variables requires that the ODE be of the form

$$\frac{dx}{dt} = \frac{h(t)}{f(x)} \qquad (28.1)$$

which can be rewritten to $f(x)\,dx = h(t)\,dt$, and that both integrals may be solved, with $F(x) = \int f(x)\,dx$ and $H(t) = \int h(t)\,dt$.

Then the general solution of the separable ODE will be

$$F(x) - H(t) = c. \qquad (28.2)$$

Note that not all ODEs are separable.

Moreover, even if a 1st order ODE is separable, that does not mean that the components can be

integrated to get a neat analytic solution.


Example 3:

$$y' = e^{x + y}.$$

Example 4: Find the solution to the ODE

$$e^{x}\frac{dy}{dx} - 2y = 1$$

and verify that the solution does solve the ODE.


2. Substitution

Just as when we learned basic integration, simple substitutions may sometimes be able to transform the given ODE into a separable 1st order ODE. The standard example of this pertains to ODEs of the form:

$$\frac{dx}{dt} = f\left(\frac{x}{t}\right)$$

Here we can make the substitution $w = x/t$ or $x = tw$.

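Example (from text): Solve the ODE

$$t^2\frac{dx}{dt} = x^2 + xt$$

Write the DE as a function of $\frac{x}{t}$:

$$\frac{dx}{dt} = \left(\frac{x}{t}\right)^2 + \left(\frac{x}{t}\right)$$

Now use the substitution $x = tw$ so that

$$\frac{dx}{dt} = w + t\frac{dw}{dt}$$

and the DE becomes

$$w + t\frac{dw}{dt} = w^2 + w$$

so that

$$t\frac{dw}{dt} = w^2$$

Now separate:

$$\frac{dw}{w^2} = \frac{dt}{t}$$

integrate:

$$\int \frac{dw}{w^2} = \int \frac{dt}{t}$$

$$-w^{-1} = \ln t + C$$

hence, since $w = x/t$ and the sign of the arbitrary constant can be absorbed into $C$,

$$\frac{t}{x} = -\ln t + C$$

giving

$$x(t) = \frac{t}{C - \ln t}.$$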
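The result can be checked by substituting it back into the original ODE. The following sketch does this numerically, assuming a central-difference estimate of $dx/dt$ (the step size and the sample value of $C$ are illustrative choices, not from the notes); the residual $t^2 \dot{x} - (x^2 + xt)$ should be close to zero wherever $C - \ln t \neq 0$.

```python
import math

def x(t, C):
    """Solution obtained above: x(t) = t / (C - ln t)."""
    return t / (C - math.log(t))

def derivative(func, t, h=1e-6):
    """Central-difference estimate of the first derivative of func at t."""
    return (func(t + h) - func(t - h)) / (2 * h)

def residual(t, C):
    """t^2 x' - (x^2 + x t), which should vanish for a solution."""
    xt = x(t, C)
    return t**2 * derivative(lambda u: x(u, C), t) - (xt**2 + xt * t)

for t in (0.5, 1.0, 2.0):
    print(t, residual(t, C=3.0))
```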




Lecture 29 · first order linear ODEs
Text Reference: §10.5.9

1. First order linear ODEs

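As an initial point we will consider the general 1st order linear ODE,

$$\frac{dx}{dt} + p(t)\,x = q(t) \qquad (29.1)$$

which is homogeneous when $q(t) = 0$. When we have a linear d.e. in this form we multiply both sides of the equation by an integrating factor $g(t)$ that will make the LHS of the ODE the derivative of a product:

$$g(t)\frac{dx}{dt} + g(t)\,p(t)\,x = g(t)\,q(t)$$

The integrating factor is

$$g(t) = e^{\int p(t)\,dt} \qquad (29.2)$$

since then $\frac{dg}{dt} = g(t)\,p(t)$ and the LHS is exactly $\frac{d}{dt}\left(g(t)\,x\right)$.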

Example: Find the integrating factor and solve the ODE

$$\frac{dy}{dx} + xy = 0$$

[This equation could also be solved by separating the variables.]

Integrating factor: $g(x) = e^{\int x\,dx} = e^{\frac{1}{2}x^2}$

Multiply both sides: $e^{\frac{1}{2}x^2}\frac{dy}{dx} + x e^{\frac{1}{2}x^2} y = 0$

Combine the LHS into a single derivative: $\frac{d}{dx}\left(e^{\frac{1}{2}x^2} y\right) = 0$

Integrate both sides: $e^{\frac{1}{2}x^2} y = c$

$$y = c\,e^{-\frac{1}{2}x^2}$$

2. Non-homogeneous first order linear ODEs

Example: Solve the following ODE by finding the integrating factor.

$$\frac{dy}{dx} - \frac{y}{x} = 2$$

Integrating factor: $g(x) = e^{\int -\frac{1}{x}\,dx} = e^{-\ln x} = \frac{1}{x}$

Multiply both sides: $\frac{1}{x}\frac{dy}{dx} - \frac{1}{x^2}\,y = \frac{2}{x}$

Combine the LHS into a single derivative: $\frac{d}{dx}\left(\frac{1}{x}\,y\right) = \frac{2}{x}$

Integrate both sides: $\frac{y}{x} = \int \frac{2}{x}\,dx = 2\ln x + c$

so $y = 2x\ln x + cx$


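Example: Find the integrating factor and solve the initial value problem

$$t\frac{dx}{dt} + x = t^2 \quad \text{with } x(2) = 1/3.$$

Rewrite in standard form: $\frac{dx}{dt} + \frac{1}{t}\,x = t$

Integrating factor: $g(t) = e^{\int \frac{1}{t}\,dt} = e^{\ln t} = t$

Multiply both sides: $t\frac{dx}{dt} + x = t^2$

Combine the LHS into a single derivative: $\frac{d}{dt}(tx) = t^2$

Integrate both sides:

$$tx = \frac{1}{3}t^3 + C$$

so $x(t) = \frac{1}{3}t^2 + \frac{C}{t}$

Now use the initial condition:

$$x(2) = \frac{4}{3} + \frac{C}{2} = \frac{1}{3}, \text{ which gives } C = -2$$

Hence $x(t) = \frac{1}{3}t^2 - \frac{2}{t}$.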

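Example: Find the integrating factor and solve the initial value problem

$$\frac{dx}{dt} + 5x - t = e^{-2t}, \quad x(-1) = 0.$$

Rewrite in standard form: $\frac{dx}{dt} + 5x = e^{-2t} + t$

Integrating factor: $g(t) = e^{\int 5\,dt} = e^{5t}$

Multiply both sides: $e^{5t}\frac{dx}{dt} + 5e^{5t}x = e^{3t} + te^{5t}$

Combine the LHS into a single derivative: $\frac{d}{dt}\left(e^{5t}x\right) = e^{3t} + te^{5t}$

Integrate both sides (note the integration by parts):

$$e^{5t}x = \int \left(e^{3t} + te^{5t}\right) dt$$

$$= \frac{1}{3}e^{3t} + \frac{1}{5}\int t\,\frac{d}{dt}\left(e^{5t}\right) dt$$

$$= \frac{1}{3}e^{3t} + \frac{1}{5}\left(te^{5t}\right) - \frac{1}{5}\int e^{5t}\,dt$$

$$= \frac{1}{3}e^{3t} + \frac{1}{5}\left(te^{5t}\right) - \frac{1}{25}e^{5t} + C$$

so $x(t) = \frac{1}{3}e^{-2t} + \frac{1}{5}t - \frac{1}{25} + Ce^{-5t}$

Now use the initial condition:

$$x(-1) = \frac{1}{3}e^{2} - \frac{6}{25} + Ce^{5} = 0, \text{ which gives } C = -0.015\ldots$$

Hence $x(t) = \frac{1}{3}e^{-2t} + \frac{1}{5}t - \frac{1}{25} - 0.015e^{-5t}$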
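The constant $C$ and the final solution can be checked with a few lines of Python. This is a minimal sketch: the derivative `x_prime` is written out by hand from the formula for $x(t)$, and the function names are chosen here for illustration only.

```python
import math

# Constant fixed by x(-1) = 0:  e^2/3 - 6/25 + C e^5 = 0
C = (6 / 25 - math.exp(2) / 3) / math.exp(5)

def x(t):
    """Solution x(t) = e^{-2t}/3 + t/5 - 1/25 + C e^{-5t}."""
    return math.exp(-2 * t) / 3 + t / 5 - 1 / 25 + C * math.exp(-5 * t)

def x_prime(t):
    # Derivative of x(t), differentiated by hand term by term.
    return -2 * math.exp(-2 * t) / 3 + 1 / 5 - 5 * C * math.exp(-5 * t)

def residual(t):
    """dx/dt + 5x - t - e^{-2t}; zero when x solves the ODE."""
    return x_prime(t) + 5 * x(t) - t - math.exp(-2 * t)

print(C)          # close to the rounded value -0.015 quoted above
print(x(-1))      # the initial condition, close to 0
print(residual(0.0), residual(1.0))
```

Keeping $C$ as the exact expression $(6/25 - e^2/3)e^{-5}$ avoids the rounding error that the three-decimal value would introduce into the initial condition.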



ENG1091 Systems of Differential Equations

Lecture 30&31 · Homogeneous linear systems

Systems of differential equations (time continuous dynamical systems)

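Consider the linear differential equation system

$$\dot{x} = 2x - y$$

$$\dot{y} = x + y$$

To solve this system we need to find both $x$ and $y$ as explicit functions of $t$.

Now the first step in solving such a system is to write it in matrix form:

$$\begin{bmatrix} \frac{dx}{dt} \\ \frac{dy}{dt} \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

Which in vector form can be written:

$$\frac{d\mathbf{x}}{dt} = A\mathbf{x} \quad \text{where } \mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}, \text{ and } A = \begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix}. \qquad (30.1)$$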

This is a first order homogeneous system.

Such systems arise frequently in engineering applications. As an example let us consider the mechanical system consisting of two masses on two springs as shown in the diagram.

The displacements $y_1(t)$ and $y_2(t)$ are the displacements of the two masses from their equilibrium positions when the whole system is at rest.

The upper mass is connected to two springs and Hooke’s law gives an upward spring force of $-3y_1$ and a downward spring force of $2(y_2 - y_1)$ since the displacement of the lower spring is $(y_2 - y_1)$ from its equilibrium.

The lower mass experiences an upward spring force of $-2(y_2 - y_1)$.

So

$$\ddot{y}_1 = -3y_1 + 2(y_2 - y_1) = -5y_1 + 2y_2$$

$$\ddot{y}_2 = -2(y_2 - y_1).$$

This is a 2nd order system.


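To convert it to first order we let

$$x_1 = y_1, \quad x_2 = \dot{y}_1 \,(= \dot{x}_1), \quad x_3 = y_2, \quad x_4 = \dot{y}_2 \,(= \dot{x}_3)$$

and then

$$\dot{x}_1 = \dot{y}_1 = x_2$$

$$\dot{x}_2 = \ddot{y}_1 = -5x_1 + 2x_3$$

$$\dot{x}_3 = \dot{y}_2 = x_4$$

$$\dot{x}_4 = \ddot{y}_2 = 2x_1 - 2x_3$$

Written in matrix form this is

$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \\ \dot{x}_4 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ -5 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \\ 2 & 0 & -2 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}.$$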

We won’t solve the system in this particular example but the general solution takes a remarkably simple form provided we know the eigenvalues and eigenvectors of the matrix $A$.

The solution of first order homogeneous systems.

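Suppose we have a general first order homogeneous linear system of d.e.s:

$$\frac{d\mathbf{x}}{dt} = A\mathbf{x} \qquad (30.2)$$

where $\mathbf{x} = \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix}$ and $A$ is an $n \times n$ constant matrix.

If $A$ has $n$ linearly independent eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ corresponding to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ then the general solution to (30.2) is

$$\mathbf{x} = c_1 e^{\lambda_1 t}\mathbf{v}_1 + c_2 e^{\lambda_2 t}\mathbf{v}_2 + \ldots + c_n e^{\lambda_n t}\mathbf{v}_n \qquad (30.3)$$

Proof: (part)

Using $\mathbf{x} = c_1 e^{\lambda_1 t}\mathbf{v}_1 + c_2 e^{\lambda_2 t}\mathbf{v}_2 + \ldots + c_n e^{\lambda_n t}\mathbf{v}_n$ we have

$$\frac{d\mathbf{x}}{dt} = c_1 \lambda_1 e^{\lambda_1 t}\mathbf{v}_1 + c_2 \lambda_2 e^{\lambda_2 t}\mathbf{v}_2 + \ldots + c_n \lambda_n e^{\lambda_n t}\mathbf{v}_n.$$

Now, remembering that each of $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is an eigenvector so that $A\mathbf{v}_1 = \lambda_1\mathbf{v}_1$, $A\mathbf{v}_2 = \lambda_2\mathbf{v}_2$, ..., and $A\mathbf{v}_n = \lambda_n\mathbf{v}_n$ we have

$$\frac{d\mathbf{x}}{dt} = c_1 \lambda_1 e^{\lambda_1 t}\mathbf{v}_1 + c_2 \lambda_2 e^{\lambda_2 t}\mathbf{v}_2 + \ldots + c_n \lambda_n e^{\lambda_n t}\mathbf{v}_n$$

$$= c_1 e^{\lambda_1 t}A\mathbf{v}_1 + c_2 e^{\lambda_2 t}A\mathbf{v}_2 + \ldots + c_n e^{\lambda_n t}A\mathbf{v}_n$$

$$= A\left(c_1 e^{\lambda_1 t}\mathbf{v}_1 + c_2 e^{\lambda_2 t}\mathbf{v}_2 + \ldots + c_n e^{\lambda_n t}\mathbf{v}_n\right) = A\mathbf{x}.$$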


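Examples

Solve the system:

$$\frac{dx_1}{dt} = x_1 + x_2, \qquad \frac{dx_2}{dt} = 4x_1 - 2x_2$$

subject to the initial conditions: $x_1(0) = 1$, $x_2(0) = 6$.

Solution: First write the system in matrix form:

$$\begin{bmatrix} \frac{dx_1}{dt} \\ \frac{dx_2}{dt} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 4 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$

The matrix $\begin{bmatrix} 1 & 1 \\ 4 & -2 \end{bmatrix}$ has eigenvalues $-3, 2$ corresponding to eigenvectors $\begin{bmatrix} 1 \\ -4 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ respectively.

Show this:

The characteristic polynomial is

$$\det(A - \lambda I) = \begin{vmatrix} 1 - \lambda & 1 \\ 4 & -2 - \lambda \end{vmatrix} = (1 - \lambda)(-2 - \lambda) - 4$$

$$= -2 + \lambda + \lambda^2 - 4$$

$$= \lambda^2 + \lambda - 6$$

$$= (\lambda + 3)(\lambda - 2)$$

and hence the eigenvalues are $\lambda = -3$ and $\lambda = 2$.

Now for the eigenvectors:

For $\lambda = -3$ we solve $\begin{bmatrix} 1 & 1 \\ 4 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = -3 \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ (equivalently, $\begin{bmatrix} 4 & 1 \\ 4 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$)

hence $4x_1 + x_2 = 0$ yielding eigenvectors of the form $s\begin{bmatrix} 1 \\ -4 \end{bmatrix}$ for $s \neq 0$.

For $\lambda = 2$ we solve $\begin{bmatrix} 1 & 1 \\ 4 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 2 \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ (equivalently, $\begin{bmatrix} -1 & 1 \\ 4 & -4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$)

hence $x_1 = x_2$ yielding eigenvectors of the form $s\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ for $s \neq 0$.

Hence the general solution of the system is

$$\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = c_1 e^{-3t} \begin{bmatrix} 1 \\ -4 \end{bmatrix} + c_2 e^{2t} \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

Now we need to find $c_1$ and $c_2$. To do this, set $t = 0$ and solve $c_1 \begin{bmatrix} 1 \\ -4 \end{bmatrix} + c_2 \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 6 \end{bmatrix}$, giving $c_1 = -1$ and $c_2 = 2$.

Show this (Cramer’s rule):

We have

$$c_1 + c_2 = 1$$

$$-4c_1 + c_2 = 6$$

hence

$$c_1 = \frac{\begin{vmatrix} 1 & 1 \\ 6 & 1 \end{vmatrix}}{\begin{vmatrix} 1 & 1 \\ -4 & 1 \end{vmatrix}} \quad \text{and} \quad c_2 = \frac{\begin{vmatrix} 1 & 1 \\ -4 & 6 \end{vmatrix}}{\begin{vmatrix} 1 & 1 \\ -4 & 1 \end{vmatrix}}$$

so $c_1 = \frac{-5}{5} = -1$ and $c_2 = \frac{10}{5} = 2$.

Therefore the solution is

$$\mathbf{x} = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = -e^{-3t} \begin{bmatrix} 1 \\ -4 \end{bmatrix} + 2e^{2t} \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$

Explicitly this gives $x_1(t) = -e^{-3t} + 2e^{2t}$, $x_2(t) = 4e^{-3t} + 2e^{2t}$.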
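The explicit solution can be substituted back into the system as a check. The sketch below uses plain Python lists for the matrix and vectors; the helper name `mat_vec` and the hand-differentiated `x_dot` are written for this illustration and are not from the notes.

```python
import math

A = [[1, 1], [4, -2]]

def x(t):
    """Solution found above: x = -e^{-3t}(1, -4) + 2 e^{2t}(1, 1)."""
    return [-math.exp(-3 * t) + 2 * math.exp(2 * t),
            4 * math.exp(-3 * t) + 2 * math.exp(2 * t)]

def x_dot(t):
    # Term-by-term derivative of x(t).
    return [3 * math.exp(-3 * t) + 4 * math.exp(2 * t),
            -12 * math.exp(-3 * t) + 4 * math.exp(2 * t)]

def mat_vec(M, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

print(x(0.0))    # initial conditions x1(0), x2(0)
for t in (0.0, 0.5, 1.0):
    print(x_dot(t), mat_vec(A, x(t)))    # the two vectors should agree
```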

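Example (repeated eigenvalue)

Find the general solution of the system:

$$\frac{dx_1}{dt} = 6x_1 + x_2, \qquad \frac{dx_2}{dt} = -x_1 + 8x_2$$

Solution: The system in matrix form:

$$\begin{bmatrix} \frac{dx_1}{dt} \\ \frac{dx_2}{dt} \end{bmatrix} = \begin{bmatrix} 6 & 1 \\ -1 & 8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$

This time the matrix $\begin{bmatrix} 6 & 1 \\ -1 & 8 \end{bmatrix}$ has a single (repeated) eigenvalue of 7 corresponding to the eigenvector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

Show this:

$$\det(A - \lambda I) = \begin{vmatrix} 6 - \lambda & 1 \\ -1 & 8 - \lambda \end{vmatrix} = (6 - \lambda)(8 - \lambda) + 1$$

$$= \lambda^2 - 14\lambda + 49$$

$$= (\lambda - 7)(\lambda - 7)$$

and hence there is a single eigenvalue only, namely $\lambda = 7$.

For $\lambda = 7$ we solve $\begin{bmatrix} 6 & 1 \\ -1 & 8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 7 \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ (equivalently, $\begin{bmatrix} -1 & 1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$)

hence $-x_1 + x_2 = 0$ yielding eigenvectors of the form $s\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ for $s \neq 0$.

As this matrix only has one independent eigenvector the solution form (30.3) is incomplete.

While $\mathbf{x} = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = c_1 e^{7t} \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is a solution it is only part of the general solution.

The complete general solution cannot be obtained solely through eigenvalue/eigenvector methods.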


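Example (complex eigenvalues)

Find the complete general solution of the system:

$$\frac{dx_1}{dt} = x_1 + x_2, \qquad \frac{dx_2}{dt} = -4x_1 + x_2$$

Solution: Write the system in matrix form:

$$\begin{bmatrix} \frac{dx_1}{dt} \\ \frac{dx_2}{dt} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ -4 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$

The matrix $\begin{bmatrix} 1 & 1 \\ -4 & 1 \end{bmatrix}$ has eigenvalues $1 + 2i$, with corresponding eigenvector $\begin{bmatrix} 1 \\ 2i \end{bmatrix}$, and $1 - 2i$, with corresponding eigenvector $\begin{bmatrix} 1 \\ -2i \end{bmatrix}$.

Show this:

$$\det(A - \lambda I) = \begin{vmatrix} 1 - \lambda & 1 \\ -4 & 1 - \lambda \end{vmatrix} = (1 - \lambda)^2 + 4$$

$$= [(1 - \lambda) - 2i]\,[(1 - \lambda) + 2i]$$

and hence the eigenvalues are $\lambda = 1 \pm 2i$.

For $\lambda = 1 + 2i$ we solve $\begin{bmatrix} 1 & 1 \\ -4 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = (1 + 2i) \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ (equivalently, $\begin{bmatrix} -2i & 1 \\ -4 & -2i \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$)

hence $-2ix_1 + x_2 = 0$ yielding eigenvectors of the form $s\begin{bmatrix} 1 \\ 2i \end{bmatrix}$ for $s \neq 0$.

For $\lambda = 1 - 2i$ we solve $\begin{bmatrix} 1 & 1 \\ -4 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = (1 - 2i) \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ (equivalently, $\begin{bmatrix} 2i & 1 \\ -4 & 2i \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$)

hence $2ix_1 + x_2 = 0$ yielding eigenvectors of the form $s\begin{bmatrix} 1 \\ -2i \end{bmatrix}$ for $s \neq 0$.

The general solution (30.3) gives

$$\mathbf{x} = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = c_1 e^{(1 + 2i)t} \begin{bmatrix} 1 \\ 2i \end{bmatrix} + c_2 e^{(1 - 2i)t} \begin{bmatrix} 1 \\ -2i \end{bmatrix}.$$

Now

$$c_1 e^{(1 + 2i)t} \begin{bmatrix} 1 \\ 2i \end{bmatrix} + c_2 e^{(1 - 2i)t} \begin{bmatrix} 1 \\ -2i \end{bmatrix} = e^{t}\left( \begin{bmatrix} c_1(\cos 2t + i\sin 2t) \\ 2ic_1(\cos 2t + i\sin 2t) \end{bmatrix} + \begin{bmatrix} c_2(\cos 2t - i\sin 2t) \\ -2ic_2(\cos 2t - i\sin 2t) \end{bmatrix} \right)$$

$$= e^{t} \begin{bmatrix} (c_1 + c_2)\cos 2t + i(c_1 - c_2)\sin 2t \\ 2i(c_1 - c_2)\cos 2t - 2(c_1 + c_2)\sin 2t \end{bmatrix}$$

Setting $C_1 = c_1 + c_2$, and $C_2 = i(c_1 - c_2)$ (notice that $C_1$ and $C_2$ are real if and only if $c_1$ and $c_2$ are complex conjugates) we obtain

$$x_1(t) = e^{t}(C_1 \cos 2t + C_2 \sin 2t)$$

$$x_2(t) = e^{t}(2C_2 \cos 2t - 2C_1 \sin 2t)$$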
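The passage from the complex form to the real form can be illustrated numerically. In the sketch below the value of `c1` is an arbitrary hypothetical choice, with `c2` taken as its complex conjugate so that $C_1$ and $C_2$ come out real; the two forms should then agree and the complex form should have zero imaginary part.

```python
import cmath
import math

def complex_form(t, c1, c2):
    """x = c1 e^{(1+2i)t} (1, 2i) + c2 e^{(1-2i)t} (1, -2i)."""
    e1 = c1 * cmath.exp((1 + 2j) * t)
    e2 = c2 * cmath.exp((1 - 2j) * t)
    return (e1 + e2, 2j * e1 - 2j * e2)

def real_form(t, C1, C2):
    """x1 = e^t (C1 cos 2t + C2 sin 2t), x2 = e^t (2 C2 cos 2t - 2 C1 sin 2t)."""
    et = math.exp(t)
    return (et * (C1 * math.cos(2 * t) + C2 * math.sin(2 * t)),
            et * (2 * C2 * math.cos(2 * t) - 2 * C1 * math.sin(2 * t)))

c1 = 0.5 - 1.5j               # hypothetical complex constant
c2 = c1.conjugate()           # conjugate pair gives a real solution
C1 = (c1 + c2).real           # C1 = c1 + c2
C2 = (1j * (c1 - c2)).real    # C2 = i (c1 - c2)

for t in (0.0, 0.3, 1.0):
    print(complex_form(t, c1, c2), real_form(t, C1, C2))
```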



ENG1091 Second Order Differential Equations

Lecture 32&33 · Examples · Homogeneous problem
Text Reference: §10.8 & 10.9.1

1. Linear higher order ODEs.

Recall from the initial lecture on ODEs that linear differential equations were defined as those in which the dependent variable (or variables) and their derivatives do not occur as products, raised to powers (other than one) or in non-linear functions.

A general second order linear equation takes the form

$$p(x)\frac{d^2 y}{dx^2} + q(x)\frac{dy}{dx} + r(x)\,y = f(x) \qquad (32.1)$$

where, as the notation implies, $p$, $q$, $r$ and $f$ are functions of $x$ only. [Here the independent variable is $x$.] If $f(x) = 0$ the equation is homogeneous:

$$p(x)\frac{d^2 y}{dx^2} + q(x)\frac{dy}{dx} + r(x)\,y = 0 \qquad (32.2)$$

One feature common to all linear, homogeneous equations is that if $y_1(x)$ and $y_2(x)$ are both solutions of the homogeneous linear differential equation (32.2), then so is $ay_1(x) + by_2(x)$, where $a$ and $b$ are arbitrary constants.

If $y_p(x)$ is any solution of the non-homogeneous equation (32.1) and $y_1(x)$ and $y_2(x)$ are both solutions of (32.2), then for any constants $a$ and $b$, $y_p(x) + ay_1(x) + by_2(x)$ is a solution of (32.1).

We will make use of these facts as we progress.

2. Examples of 2nd order linear ODEs

Gravitation Acceleration

Consider a stone dropped from a tall building. Neglecting air resistance, its acceleration is given by

$$a = \frac{d^2 s}{dt^2} = g$$

where $g$ is gravity. This is a simple 2nd order linear non-homogeneous ODE. The velocity $v$ of the stone may readily be recovered by integration:

$$v = \frac{ds}{dt} = v_0 + gt$$

where $v_0$ is the initial velocity.

Its distance from the top of the building $s$ is given by

$$s = s_0 + v_0 t + gt^2/2$$

where $s_0$ is the initial distance from the top of the building.

Simple Harmonic Motion (Mass on a Spring)

Hooke’s Law: If the spring is stretched (or compressed) $s$ units from its natural length, it exerts a restoring force

$$F = -ks$$

where $k$ is the spring constant ($k > 0$).

Now net force = mass × acceleration, so

$$\frac{d^2 s}{dt^2} = -\frac{k}{m}s = -\omega^2 s \quad \text{where } \omega^2 = \frac{k}{m} > 0$$

Here $m$ is the mass. We will find that $\omega$ governs the frequency with which the system oscillates. Assuming that the spring oscillates about the position $s = 0$, then the solution to this ODE is

$$s(t) = A\cos\omega t + B\sin\omega t$$

or by identity $s(t) = C\sin(\omega t + D)$

$$\text{period (time for one complete oscillation)} = \frac{2\pi}{\omega}$$

$$\text{frequency} = \frac{1}{\text{period}} = \frac{\omega}{2\pi}$$

Example: Verify that $s(t) = A\cos\omega t + B\sin\omega t$ is a solution to $\frac{d^2 s}{dt^2} = -\omega^2 s$.

This type of motion is called simple harmonic motion.

Example: A spring with a mass of 2 kg has natural length 0.5 m. A force of 12.8 N is required to maintain it stretched to a length of 0.6 m. If the spring is stretched to a length of 0.6 m and then released with initial velocity 0, find the position of the mass at any time $t$.
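One way to set up this example is sketched below in Python. It assumes that "position" means the displacement $s$ from the natural length, so that $s(0) = 0.1$ m and $\dot{s}(0) = 0$; with $s(t) = A\cos\omega t + B\sin\omega t$ these conditions give $A = s(0)$ and $B = 0$. The variable names are illustrative only.

```python
import math

mass = 2.0                 # kg
natural_length = 0.5       # m
stretched_length = 0.6     # m
holding_force = 12.8       # N

extension = stretched_length - natural_length    # s(0), in metres
k = holding_force / extension                    # Hooke's law: |F| = k * s
omega = math.sqrt(k / mass)                      # omega^2 = k / m

def s(t):
    """Displacement from natural length for s(0) = extension, s'(0) = 0."""
    return extension * math.cos(omega * t)

period = 2 * math.pi / omega

print(k, omega, period)
print(s(0.0), s(period / 2))
```

Under these assumptions the spring constant is 128 N/m and $\omega = 8$ rad/s, so the mass oscillates 0.1 m either side of the natural length.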

Damped Oscillations

For a mass on a spring, the frictional force from air resistance increases with the velocity of the mass. The frictional force is often proportional to the velocity, so we can introduce a damping term of the form $D\frac{ds}{dt}$, where $D$ is a constant, called the damping constant, and $\frac{ds}{dt}$ is the velocity. The governing ODE remains a 2nd order linear ODE with constant coefficients.

$$m\frac{d^2 s}{dt^2} = -ks - D\frac{ds}{dt}$$

or

$$m\frac{d^2 s}{dt^2} + D\frac{ds}{dt} + ks = 0$$

3. Homogeneous 2nd order linear ODEs with constant coefficients

For the moment let us focus on homogeneous 2nd order linear ODEs with constant coefficients,

$$a\frac{d^2 s}{dt^2} + b\frac{ds}{dt} + cs = 0 \qquad (32.3)$$

where $a$, $b$, and $c$ are constants.

The general solution to a 2nd order linear ODE will be a family of functions with two linearly independent components, meaning two arbitrary constants. In the example of the falling body, the initial value problem requires a specification of velocity and position at some point in time. Since we are examining 2nd order ODEs, we could readily create a boundary value problem instead of an initial value problem by defining either the velocity or the position at two different points in time.

Let us assume that a solution of (32.3) has the form $s(t) = e^{\lambda t}$, for some (as yet unknown) value of $\lambda$.

Example: Verify that $s(t) = e^{\lambda t}$ is a solution to (32.3). Identify the constraint that is placed on $\lambda$.

Substituting we obtain:

$$\left(a\lambda^2 + b\lambda + c\right)e^{\lambda t} = 0 \qquad (32.4)$$

which, since $e^{\lambda t}$ is never zero, requires

$$a\lambda^2 + b\lambda + c = 0. \qquad (32.5)$$

Equation (32.5) is called the auxiliary or characteristic equation. It is a quadratic equation and has either two distinct real solutions, two complex conjugate solutions or one (repeated) real solution. These form the basis of three cases.

Case 1: $b^2 - 4ac > 0$.

There are two distinct real solutions,

$$\lambda_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

to the characteristic equation and so the general solution has the form

$$s(t) = C_1 e^{\lambda_1 t} + C_2 e^{\lambda_2 t} \qquad (32.6)$$

Here $C_1$ and $C_2$ are arbitrary constants. Now $e^{\lambda_1 t}$ and $e^{\lambda_2 t}$ are two independent solutions of (32.3) and because (32.6) involves 2 arbitrary constants the solution (32.6) is the full general solution of (32.3).


Example: Solve the ODE

$$\ddot{s} + 3\dot{s} + 2s = 0$$

with the constraints of $s(0) = -0.5$ and $\dot{s}(0) = 3$.

Solution:

The d.e. $\ddot{s} + 3\dot{s} + 2s = 0$ has the characteristic equation

$$\lambda^2 + 3\lambda + 2 = 0$$

which factorises: $(\lambda + 2)(\lambda + 1) = 0$

the equation has two real (unequal) roots $\lambda = -2, -1$

hence the general solution consists of linear combinations of the two independent solutions $e^{-t}$ and $e^{-2t}$

i.e. $s(t) = C_1 e^{-t} + C_2 e^{-2t}$

With the initial conditions $s(0) = -0.5$ and $\dot{s}(0) = 3$ we obtain:

$$-0.5 = C_1 + C_2$$

$$3 = -C_1 - 2C_2$$

giving us $C_1 = 2$ and $C_2 = -2.5$.

Hence $s(t) = 2e^{-t} - 2.5e^{-2t}$.
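The step from the initial conditions to $C_1$ and $C_2$ is a $2 \times 2$ linear solve. A minimal sketch for the distinct-root case is given below, using Cramer's rule on $C_1 + C_2 = s(0)$ and $\lambda_1 C_1 + \lambda_2 C_2 = \dot{s}(0)$; the function name `constants_distinct_roots` is hypothetical.

```python
def constants_distinct_roots(lam1, lam2, s0, v0):
    """Solve C1 + C2 = s0 and lam1*C1 + lam2*C2 = v0 by Cramer's rule.

    lam1, lam2 are the two distinct roots of the characteristic equation,
    s0 = s(0) and v0 = s'(0).
    """
    det = lam2 - lam1                  # determinant of [[1, 1], [lam1, lam2]]
    C1 = (s0 * lam2 - v0) / det
    C2 = (v0 - lam1 * s0) / det
    return C1, C2

# Values from the worked example: roots -1 and -2, s(0) = -0.5, s'(0) = 3.
print(constants_distinct_roots(-1, -2, -0.5, 3))
```

The same function applies to any Case 1 problem once the two roots are known; it is not valid for a repeated root, where the determinant is zero.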

[Figure: graph of the solution $s(t) = 2e^{-t} - 2.5e^{-2t}$ for $0 \le t \le 5$.]

Note that the solution passes over the t axis once and approaches it as t approaches infinity.

With different initial conditions, the solution needn’t pass over the axis at all. This case is

sometimes called overdamped.


Case 2: $b^2 - 4ac < 0$.

Here there are no real solutions to the characteristic equation, instead there are two complex conjugate solutions $\lambda_1 = p + qi$, $\lambda_2 = p - qi$ where $p = -b/2a$ and

$$q = \sqrt{4ac - b^2}/2a$$

The general solution can be written in the form of (32.6) but is usually simplified to

$$s(t) = e^{pt}\left(C_1 \cos(qt) + C_2 \sin(qt)\right) \qquad (32.7)$$

with the use of Euler’s equation,

$$e^{iqt} = \cos(qt) + i\sin(qt).$$

Exercise: Starting with equation (32.6) derive equation (32.7).


Example: Solve the ODE

$$\ddot{s} + 0.4\dot{s} + 4.04s = 0$$

with the constraints of $s(0) = 1$ and $\dot{s}(0) = -0.2$.

Solution: The d.e. $\ddot{s} + 0.4\dot{s} + 4.04s = 0$ has the characteristic equation

$$\lambda^2 + 0.4\lambda + 4.04 = 0$$

which has the solutions (using the quadratic formula $\lambda = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$)

$$\lambda = \frac{-0.4 \pm \sqrt{(0.4)^2 - 4 \times 1 \times 4.04}}{2}$$

$$= \frac{-0.4 \pm \sqrt{-16}}{2}$$

$$= \frac{-0.4 \pm 4i}{2} = -0.2 \pm 2i$$

The solution eq. (32.7) now becomes

$$s(t) = e^{-0.2t}\left(C_1 \cos(2t) + C_2 \sin(2t)\right)$$

This of course is the general solution of $\ddot{s} + 0.4\dot{s} + 4.04s = 0$.

With the initial conditions $s(0) = 1$ and $\dot{s}(0) = -0.2$ we obtain:

$$1 = e^{0}\left(C_1 \cos(0) + C_2 \sin(0)\right) = C_1$$

Now

$$\dot{s}(t) = -0.2e^{-0.2t}\left(C_1 \cos(2t) + C_2 \sin(2t)\right) + e^{-0.2t}\left(-2C_1 \sin(2t) + 2C_2 \cos(2t)\right)$$

so

$$-0.2 = \dot{s}(0) = -0.2C_1 + 2C_2$$

hence

$$2C_2 = -0.2 + 0.2 = 0$$

giving the specific solution: $s(t) = e^{-0.2t}\cos(2t)$.

Graph:

[Figure: plot of $s(t) = e^{-0.2t}\cos(2t)$ for $0 \le t \le 7$, with the vertical axis running from $-1$ to $1$.]

Note that while the solution is damped and $s(t)$ will approach 0 as $t$ approaches infinity, the solution oscillates about 0. This is sometimes called an underdamped system.

Case 3: $b^2 - 4ac = 0$.

There is only one distinct real solution ($\lambda = -b/2a$), and while $s(t) = e^{\lambda t}$ does satisfy the ODE, it alone is not the general solution, as we need a second linearly independent component. Another independent solution is $te^{\lambda t}$. The general solution of (32.3) is for this case

$$s(t) = (C_1 t + C_2)\,e^{\lambda t} \qquad (32.8)$$


Example: Solve the ODE

$$\ddot{s} + 2\dot{s} + s = 0$$

with the constraints of $s(0) = 3$ and $\dot{s}(0) = 5$.

Solution:

The d.e. $\ddot{s} + 2\dot{s} + s = 0$ has the characteristic equation

$$\lambda^2 + 2\lambda + 1 = 0$$

which factorises: $(\lambda + 1)^2 = 0$

the equation has two equal roots $\lambda = -1, -1$

hence the general solution consists of linear combinations of the two independent solutions $e^{-t}$ and (note this) $te^{-t}$

i.e. $s(t) = C_1 e^{-t} + C_2 te^{-t}$

With the initial conditions $s(0) = 3$ and $\dot{s}(0) = 5$ we obtain:

$$3 = C_1 e^{0} + 0$$

$$3 = C_1$$

Now $\dot{s}(t) = -C_1 e^{-t} + C_2\left(e^{-t} - te^{-t}\right)$ so $5 = \dot{s}(0) = -C_1 + C_2$

hence $C_2 = 8$ giving the specific solution:

$$s(t) = 3e^{-t} + 8te^{-t}$$

[Figure: graph of the solution $s(t) = 3e^{-t} + 8te^{-t}$ for $0 \le t \le 10$.]

Case 3 is sometimes called critically damped as it provides the quickest approach to $s = 0$. This is similar to case 1, but the damping is just sufficient to suppress vibrations.


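Summary: Solutions of

$$a\frac{d^2 s}{dt^2} + b\frac{ds}{dt} + cs = 0$$

Roots of $a\lambda^2 + b\lambda + c = 0$              General solution

Two real distinct roots $\lambda_1$ and $\lambda_2$       $s(t) = C_1 e^{\lambda_1 t} + C_2 e^{\lambda_2 t}$

Repeated (real) root $\lambda$                    $s(t) = C_1 e^{\lambda t} + C_2 te^{\lambda t}$

Two complex roots $p \pm iq$                  $s(t) = e^{pt}\left(C_1 \cos(qt) + C_2 \sin(qt)\right)$

where $C_1$, $C_2$ are arbitrary constants.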
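The summary table can be turned into a small lookup routine. The sketch below is one possible way to do this in Python; the function name `classify` and the returned text labels are assumptions for illustration. It uses the sign of the discriminant $b^2 - 4ac$ to pick the case and returns the roots together with the matching form of the general solution.

```python
import cmath

def classify(a, b, c):
    """Return (case, roots, form) for a s'' + b s' + c s = 0 with a != 0.

    The exact test disc == 0 is only reliable for coefficients that are
    represented exactly (e.g. small integers); this is a sketch, not a
    robust numerical routine.
    """
    disc = b * b - 4 * a * c
    if disc > 0:
        root = disc ** 0.5
        roots = ((-b + root) / (2 * a), (-b - root) / (2 * a))
        return "two real distinct roots", roots, "C1*exp(l1*t) + C2*exp(l2*t)"
    if disc == 0:
        roots = (-b / (2 * a),)
        return "repeated real root", roots, "C1*exp(l*t) + C2*t*exp(l*t)"
    root = cmath.sqrt(disc)            # purely imaginary square root
    roots = ((-b + root) / (2 * a), (-b - root) / (2 * a))
    return "two complex roots", roots, "exp(p*t)*(C1*cos(q*t) + C2*sin(q*t))"

# The three worked examples from this lecture.
print(classify(1, 3, 2))
print(classify(1, 0.4, 4.04))
print(classify(1, 2, 1))
```

For the complex case the real and imaginary parts of the first root give $p$ and $q$ in the table.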

4. Higher order linear ODEs with constant coefficients

The text notes that the method of solution developed here is not strictly limited to 2nd order equations. In particular a homogeneous nth order linear ODE with constant coefficients

$$a_n\frac{d^n s}{dt^n} + a_{n-1}\frac{d^{n-1} s}{dt^{n-1}} + \cdots + a_1\frac{ds}{dt} + a_0 s = 0 \qquad (32.9)$$

where $a_0$ through $a_n$ are constants, can be solved by assuming a solution in the general form of $s(t) = e^{\lambda t}$. As before this will lead to a characteristic equation, which is an nth order polynomial.

$$a_n\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 = 0 \qquad (32.10)$$

This polynomial will have $n$ roots (counted with multiplicity), which may be some combination of real, repeated and complex conjugate pairs. As the ODE is linear and of order $n$, we must have $n$ linearly independent components of the general solution. Please note that it is not trivial to solve a higher order polynomial analytically.

Example: Find the general solution to the ODE

$$\dddot{x} - 2\ddot{x} - 5\dot{x} + 6x = 0.$$

Hint: $\lambda = 1$ is one solution to the characteristic equation.

Solution: The d.e. $\dddot{x} - 2\ddot{x} - 5\dot{x} + 6x = 0$ has the characteristic equation

$$\lambda^3 - 2\lambda^2 - 5\lambda + 6 = 0$$

which factorises to

$$(\lambda - 1)(\lambda^2 - \lambda - 6) = 0 \quad \text{(using the hint)}$$

$$(\lambda - 1)(\lambda - 3)(\lambda + 2) = 0 \quad \text{(completing the factorisation)}$$

hence there are three roots

$$\lambda = 1,\ 3,\ -2$$

Since these roots are different there are no $te^{\lambda t}$ terms in the solution, and the solution is written:

$$x(t) = C_1 e^{t} + C_2 e^{3t} + C_3 e^{-2t}$$
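Using a known root to reduce the degree of the characteristic polynomial is synthetic division. The sketch below is our own illustration (the helper name `deflate` is hypothetical): dividing $\lambda^3 - 2\lambda^2 - 5\lambda + 6$ by $(\lambda - 1)$ should leave $\lambda^2 - \lambda - 6$ with zero remainder.

```python
def deflate(coeffs, root):
    """Divide a polynomial (coefficients listed highest power first)
    by (lambda - root) using synthetic division.

    Returns (quotient_coeffs, remainder); the remainder is 0
    exactly when `root` really is a root.
    """
    out = [coeffs[0]]
    for c in coeffs[1:]:
        out.append(c + root * out[-1])
    return out[:-1], out[-1]

quotient, remainder = deflate([1, -2, -5, 6], 1)
print(quotient, remainder)  # [1, -1, -6] 0
```

Applying `deflate` again to the quotient with the root 3 leaves the linear factor $\lambda + 2$.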


Example: Suppose a 4th order linear homogeneous ODE has the characteristic equation

$$(\lambda^2 + 1)(\lambda + 1)^2 = 0.$$

Find the homogeneous equation and find its general solution.

Solution:

The characteristic equation has roots $\lambda = \pm i$ from $\lambda^2 + 1 = 0$ and two equal roots $\lambda = -1, -1$ from $(\lambda + 1)^2 = 0$.

The roots $\lambda = \pm i$ provide the partial solution form (eq. 32.7)

$$x(t) = e^{0t}(C_1\cos(t) + C_2\sin(t)) = C_1\cos(t) + C_2\sin(t)$$

and the two equal roots $\lambda = -1, -1$ provide the remaining part of the solution (eq. 32.8):

$$x(t) = C_3 e^{-t} + C_4 t e^{-t}.$$

Combining:

$$x(t) = C_1\cos(t) + C_2\sin(t) + C_3 e^{-t} + C_4 t e^{-t}$$

We now write out the original form of the ODE. We need to expand the characteristic equation:

$$(\lambda^2 + 1)(\lambda + 1)^2 = (\lambda^2 + 1)(\lambda^2 + 2\lambda + 1) = \lambda^4 + 2\lambda^3 + 2\lambda^2 + 2\lambda + 1 = 0$$

whence we obtain (using dot notation now would be ridiculous)

$$\frac{d^4x}{dt^4} + 2\frac{d^3x}{dt^3} + 2\frac{d^2x}{dt^2} + 2\frac{dx}{dt} + x = 0$$
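Expanding a factored characteristic polynomial is just polynomial multiplication, which is easy to check by machine. A minimal sketch, with `poly_mul` a hypothetical helper of ours and coefficients listed from the highest power down:

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists
    (highest power first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

squared = poly_mul([1, 1], [1, 1])          # (lambda + 1)^2
char_poly = poly_mul([1, 0, 1], squared)    # times (lambda^2 + 1)
print(char_poly)  # [1, 2, 2, 2, 1]
```

The coefficient list reads off directly as the coefficients of the 4th, 3rd, 2nd, 1st and 0th derivatives in the ODE.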



ENG1091 Second Order Differential Equations

Lecture 34 & 35 · nonhomogeneous equations · Engineering application

Text Reference: §10.9.2 & 10.10

1. Non-homogeneous 2nd order linear ODEs with constant coefficients

Moving from the homogeneous to the non-homogeneous 2nd order linear ODE with constant coefficients means adding a term f(t) to the right-hand side of the equation.

$$a\frac{d^2s}{dt^2} + b\frac{ds}{dt} + cs = f(t) \qquad (34.1)$$


The non-homogeneous term f(t) cannot involve the dependent variable s, but it can be non-linear

in the independent variable t. The function f(t) is commonly called the forcing term and this

course will deal with the situation in which f(t) is either a polynomial, exponential or a circular

function.

The general solution s(t) to (34.1) is the sum of the solution $s_c(t)$ (called the complementary solution) to the homogeneous equation (also called the complementary equation) and a particular solution $s_p(t)$:

$$s(t) = s_c(t) + s_p(t)$$

The particular solution thus accounts for the forcing term f(t). The complementary solution $s_c(t)$ will already contain the two arbitrary constants necessary for the general solution.

Much as we had to make a wise guess in finding the solution to the homogeneous problem, we will have to make a wise guess for the nature of the particular solution. Having set the form of the particular solution, it remains to find the coefficients of its terms. The technique for this is commonly called the method of undetermined coefficients.

Consider the ODE

$$\frac{d^2s}{dt^2} - \frac{ds}{dt} - 2s = \sin t \qquad (34.2)$$

The solution to the homogeneous equation (called the complementary solution) can readily be found to be $s_c(t) = C_1 e^{-t} + C_2 e^{2t}$.

A particular solution $s_p(t)$ must take the form

$$s_p(t) = A\cos(t) + B\sin(t) \qquad (34.3)$$

where A and B are undetermined constants which need to be found.

We substitute (34.3) into the original ODE (34.2) and try to find appropriate values of A and B. Differentiating (34.3):

$$s_p'(t) = -A\sin(t) + B\cos(t) \qquad (34.4)$$

$$s_p''(t) = -A\cos(t) - B\sin(t) \qquad (34.5)$$


Substituting these equations back into (eq. 34.2) leads to

$$-A\cos(t) - B\sin(t) - (-A\sin(t) + B\cos(t)) - 2(A\cos(t) + B\sin(t)) = \sin(t)$$

$$-3A\cos(t) - 3B\sin(t) + A\sin(t) - B\cos(t) = \sin(t)$$

$$(-3A - B)\cos(t) + (A - 3B)\sin(t) = \sin(t)$$

This last equation must be true for all t. This is possible only if $A - 3B = 1$ and $-3A - B = 0$.

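$$A - 3B = 1$$

$$-3A - B = 0$$

$$A = \frac{\begin{vmatrix} 1 & -3 \\ 0 & -1 \end{vmatrix}}{\begin{vmatrix} 1 & -3 \\ -3 & -1 \end{vmatrix}} = \frac{-1}{-10}, \qquad B = \frac{\begin{vmatrix} 1 & 1 \\ -3 & 0 \end{vmatrix}}{\begin{vmatrix} 1 & -3 \\ -3 & -1 \end{vmatrix}} = \frac{3}{-10}$$

Thus A = 0.10 and B = −0.30.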

The general solution is now

$$\begin{aligned} s(t) &= s_c(t) + s_p(t) \\ &= C_1 e^{-t} + C_2 e^{2t} + A\cos(t) + B\sin(t) \quad \text{where } A = 0.10 \text{ and } B = -0.30 \\ &= C_1 e^{-t} + C_2 e^{2t} + 0.1\cos(t) - 0.3\sin(t) \end{aligned}$$
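The same steps (substitute, collect the cos and sin terms, solve the 2 by 2 system by Cramer's rule) work for any sinusoidal forcing. The sketch below is our own generalisation under stated assumptions: the name `sinusoid_particular` is hypothetical, the forcing is taken as $f_{\cos}\cos(kt) + f_{\sin}\sin(kt)$, and exact (integer or Fraction) inputs are assumed so that the arithmetic stays exact.

```python
from fractions import Fraction

def sinusoid_particular(a, b, c, k, f_cos, f_sin):
    """Return (A, B) such that A*cos(k t) + B*sin(k t) satisfies
    a*s'' + b*s' + c*s = f_cos*cos(k t) + f_sin*sin(k t).

    Collecting terms gives the linear system
        m*A + n*B = f_cos   (cos terms)
       -n*A + m*B = f_sin   (sin terms)
    with m = c - a*k^2 and n = b*k, solved here by Cramer's rule.
    """
    m = Fraction(c - a * k * k)
    n = Fraction(b * k)
    det = m * m + n * n
    if det == 0:
        raise ValueError("cos(kt), sin(kt) already solve the homogeneous equation")
    A = (f_cos * m - f_sin * n) / det
    B = (f_sin * m + f_cos * n) / det
    return A, B

print(sinusoid_particular(1, -1, -2, 1, 0, 1))  # (Fraction(1, 10), Fraction(-3, 10))
```

For equation (34.2) this returns A = 1/10 and B = −3/10, matching the hand calculation.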

Series RLC circuits

Consider an RLC electrical circuit externally driven by a voltage E(t).

[Figure: series circuit containing the source E, resistor R, inductor L and capacitor C.]

By Kirchhoff's voltage law,

$$L\frac{d^2Q}{dt^2} + R\frac{dQ}{dt} + \frac{1}{C}Q = E(t)$$

where Q is the charge on the capacitor at time t.

Differentiating this equation with respect to t (and remembering that $I = \frac{dQ}{dt}$) gives

$$L\frac{d^2I}{dt^2} + R\frac{dI}{dt} + \frac{1}{C}I = E'(t)$$

Example: Find the charge and current at time t in an RLC circuit if R = 40 Ω, L = 1 H, $C = 16 \times 10^{-4}$ F, $E(t) = 100\cos(10t)$ V, and the initial charge and current are both 0.

Solution: Substituting the values for L, R and C we obtain

$$\frac{d^2Q}{dt^2} + 40\frac{dQ}{dt} + 625Q = 100\cos(10t)$$

The homogeneous equation is

$$\frac{d^2Q}{dt^2} + 40\frac{dQ}{dt} + 625Q = 0$$

and this has characteristic equation

$$\lambda^2 + 40\lambda + 625 = 0$$

Using the quadratic formula $\lambda = \dfrac{-b \pm \sqrt{b^2 - 4ac}}{2a}$:

$$\lambda = \frac{-40 \pm \sqrt{1600 - 4 \times 1 \times 625}}{2} = \frac{-40 \pm \sqrt{-900}}{2} = -20 \pm 15i$$

This gives us the complementary function:

$$Q_c(t) = e^{-20t}(C_1\cos(15t) + C_2\sin(15t))$$

For the particular solution we try $Q_p(t) = A\cos(10t) + B\sin(10t)$ where A and B are undetermined constants which need to be found.

$$Q_p'(t) = -10A\sin(10t) + 10B\cos(10t)$$

$$Q_p''(t) = -100A\cos(10t) - 100B\sin(10t)$$

Substituting into

$$\frac{d^2Q}{dt^2} + 40\frac{dQ}{dt} + 625Q = 100\cos(10t)$$

we obtain

$$-100A\cos(10t) - 100B\sin(10t) + 40(-10A\sin(10t) + 10B\cos(10t)) + 625(A\cos(10t) + B\sin(10t)) = 100\cos(10t)$$

$$(525A + 400B)\cos(10t) + (-400A + 525B)\sin(10t) = 100\cos(10t)$$


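So

$$525A + 400B = 100 \quad\text{or}\quad 21A + 16B = 4$$

$$-400A + 525B = 0 \quad\text{or}\quad -16A + 21B = 0$$

$$A = \frac{\begin{vmatrix} 4 & 16 \\ 0 & 21 \end{vmatrix}}{\begin{vmatrix} 21 & 16 \\ -16 & 21 \end{vmatrix}} = \frac{84}{697}, \qquad B = \frac{\begin{vmatrix} 21 & 4 \\ -16 & 0 \end{vmatrix}}{\begin{vmatrix} 21 & 16 \\ -16 & 21 \end{vmatrix}} = \frac{64}{697}$$

$$Q(t) = Q_c(t) + Q_p(t) = e^{-20t}(C_1\cos(15t) + C_2\sin(15t)) + \frac{84}{697}\cos(10t) + \frac{64}{697}\sin(10t)$$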

We now find the values of $C_1$ and $C_2$ given $Q(0) = 0$ and $Q'(0) = 0$.

Substituting $Q(0) = 0$:

$$C_1 + \frac{84}{697} = 0$$

Finding $Q'(t)$:

$$Q'(t) = -20e^{-20t}(C_1\cos(15t) + C_2\sin(15t)) + 15e^{-20t}(-C_1\sin(15t) + C_2\cos(15t)) - \frac{840}{697}\sin(10t) + \frac{640}{697}\cos(10t)$$

Substituting $Q'(0) = 0$:

$$0 = -20C_1 + 15C_2 + \frac{640}{697}$$

Solving

$$C_1 = -\frac{84}{697}, \qquad -20C_1 + 15C_2 = -\frac{640}{697}$$

we obtain $C_1 = -\dfrac{84}{697}$ and $C_2 = -\dfrac{464}{2091}$, giving

$$Q(t) = e^{-20t}\left(-\frac{84}{697}\cos(15t) - \frac{464}{2091}\sin(15t)\right) + \frac{84}{697}\cos(10t) + \frac{64}{697}\sin(10t)$$
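The fractions here are easy to get wrong by hand, so an exact check is worthwhile. The following sketch is our own (the helper `transient_constants` and its argument names are hypothetical); it assumes a solution of the form $e^{pt}(C_1\cos(qt) + C_2\sin(qt)) + A\cos(kt) + B\sin(kt)$ and uses $Q(0) = C_1 + A$ and $Q'(0) = pC_1 + qC_2 + kB$.

```python
from fractions import Fraction

def transient_constants(p, q, k, A, B, q0=0, i0=0):
    """Return (C1, C2) for
    Q(t) = e^(p t) (C1 cos(q t) + C2 sin(q t)) + A cos(k t) + B sin(k t)
    matching Q(0) = q0 and Q'(0) = i0."""
    c1 = q0 - A                      # from Q(0)  = C1 + A
    c2 = (i0 - p * c1 - k * B) / q   # from Q'(0) = p*C1 + q*C2 + k*B
    return c1, c2

A, B = Fraction(84, 697), Fraction(64, 697)
c1, c2 = transient_constants(-20, 15, 10, A, B)
print(c1, c2)  # -84/697 -464/2091
```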

[Figure: graphs of Q(t) and $Q_p(t)$.]

Note that $e^{-20t} \to 0$ as $t \to \infty$ and both cos(15t) and sin(15t) are bounded functions. So for large values of t, $Q(t) \approx Q_p(t)$, and for this reason $Q_p(t)$ is called the steady state solution.
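How quickly the transient dies away can be seen numerically. This is a small illustrative sketch using the solution found above; the function names `q_steady` and `q_full` are ours.

```python
import math

def q_steady(t):
    """Steady state part Q_p(t) of the RLC example."""
    return (84 * math.cos(10 * t) + 64 * math.sin(10 * t)) / 697

def q_full(t):
    """Full solution Q(t) = transient + steady state."""
    transient = math.exp(-20 * t) * (
        -84 / 697 * math.cos(15 * t) - 464 / 2091 * math.sin(15 * t)
    )
    return transient + q_steady(t)

# Size of the transient |Q(t) - Q_p(t)| at a few times.
for t in (0.0, 0.1, 0.5, 1.0):
    print(t, abs(q_full(t) - q_steady(t)))
```

Because of the factor $e^{-20t}$, the difference is already far below the amplitude of $Q_p$ after about one second.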


We have so far considered only one type of external forcing for our non-homogeneous problem, namely sinusoidal forcing.

Fortunately we can cover a little more ground than this. Experience tells us that the method of undetermined coefficients can readily be employed when the forcing function is a polynomial or exponential, in addition to sinusoidal. Note that more complicated forcing may, hopefully, be readily approximated by some series involving these basic functions.

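Summary of undetermined coefficients

| $f(t)$ | try $s_p(t)$ |
|---|---|
| $a\cos(kt) + b\sin(kt)$ | $s_p(t) = A\cos(kt) + B\sin(kt)$ |
| $a_n t^n + \cdots + a_1 t + a_0$ | $s_p(t) = A_n t^n + \cdots + A_1 t + A_0$ |
| $e^{kt}$ | $s_p(t) = Ae^{kt}$ |
| $e^{kt}(a\cos(\omega t) + b\sin(\omega t))$ | $s_p(t) = e^{kt}(A\cos(\omega t) + B\sin(\omega t))$ |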

As the table suggests, if the forcing term f(t) is a polynomial, we anticipate that the particular

solution will be of this form (and degree) too.

Example: Find the general solution to the ODE

$$\frac{d^2x}{dt^2} + 6\frac{dx}{dt} + 9x = t^2$$

Solution: The homogeneous equation has characteristic equation

$$\lambda^2 + 6\lambda + 9 = 0$$

$$\lambda = -3, -3$$

$$x_c(t) = C_1 e^{-3t} + C_2 t e^{-3t}$$

A particular solution $x_p(t)$ must take the form $x_p(t) = at^2 + bt + c$ where a, b, c are undetermined constants which need to be found.

$$x_p'(t) = 2at + b$$

$$x_p''(t) = 2a$$

Substituting:

$$\frac{d^2x}{dt^2} + 6\frac{dx}{dt} + 9x = 2a + 6(2at + b) + 9(at^2 + bt + c) = 9at^2 + (9b + 12a)t + 2a + 9c + 6b \equiv t^2$$

$$9a = 1$$

$$9b + 12a = 0$$

$$2a + 9c + 6b = 0$$

$$a = \frac{1}{9}, \quad b = -\frac{4}{27}, \quad c = \frac{2}{27}$$

$$x(t) = x_c(t) + x_p(t) = C_1 e^{-3t} + C_2 t e^{-3t} + \frac{1}{9}t^2 - \frac{4}{27}t + \frac{2}{27}$$
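Matching coefficients for a quadratic forcing term gives a triangular system that can be solved from the highest power down. The sketch below is our own illustration (the name `quadratic_particular` is hypothetical); it assumes $c \neq 0$ and integer or Fraction inputs so that the results are exact.

```python
from fractions import Fraction

def quadratic_particular(a, b, c, p2, p1, p0):
    """Return (A2, A1, A0) with A2*t^2 + A1*t + A0 solving
    a*x'' + b*x' + c*x = p2*t^2 + p1*t + p0 (requires c != 0)."""
    if c == 0:
        raise ValueError("c = 0: a constant solves the homogeneous equation")
    c = Fraction(c)
    A2 = p2 / c                          # t^2 terms: c*A2 = p2
    A1 = (p1 - 2 * b * A2) / c           # t terms:   c*A1 + 2*b*A2 = p1
    A0 = (p0 - b * A1 - 2 * a * A2) / c  # constants: c*A0 + b*A1 + 2*a*A2 = p0
    return A2, A1, A0

print(quadratic_particular(1, 6, 9, 1, 0, 0))
```

For the example above this gives 1/9, −4/27 and 2/27, in agreement with the values of a, b and c found by hand.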

If the forcing term f(t) is an exponential, we anticipate that the particular solution will be one

also.


Example: Find the general solution to the ODE

$$\frac{d^2x}{dt^2} + 5\frac{dx}{dt} - 6x = e^{-t}$$

Solution: This has characteristic equation

$$\lambda^2 + 5\lambda - 6 = 0$$

$$(\lambda + 6)(\lambda - 1) = 0$$

$$\lambda = -6, 1$$

$$x_c(t) = C_1 e^{t} + C_2 e^{-6t}$$

A particular solution $x_p(t)$ must take the form

$$x_p(t) = ae^{-t}$$

where a is the undetermined constant.

$$x_p'(t) = -ae^{-t}$$

$$x_p''(t) = ae^{-t}$$

$$\frac{d^2x}{dt^2} + 5\frac{dx}{dt} - 6x = ae^{-t} - 5ae^{-t} - 6ae^{-t} = -10ae^{-t} \equiv e^{-t}$$

Clearly $a = -\dfrac{1}{10}$, so

$$x(t) = C_1 e^{t} + C_2 e^{-6t} - \frac{1}{10}e^{-t}$$

Problems arise if any term of $x_p(t)$ is a part of the complementary solution. In such a case multiply $x_p(t)$ by t (or by $t^2$ in the case of repeated roots).

Example: Find the general solution to the ODE

$$\frac{d^2x}{dt^2} + 5\frac{dx}{dt} - 6x = 7e^{t}$$

Solution: This has characteristic equation

$$\lambda^2 + 5\lambda - 6 = 0$$

$$\lambda = -6, 1$$

$$x_c(t) = C_1 e^{t} + C_2 e^{-6t}$$

Since $e^{t}$ is already part of the complementary solution, a particular solution $x_p(t)$ must take the form $x_p(t) = ate^{t}$, where a is the undetermined constant.

$$x_p'(t) = a(e^{t} + te^{t})$$

$$x_p''(t) = a(e^{t} + e^{t} + te^{t})$$

$$\frac{d^2x}{dt^2} + 5\frac{dx}{dt} - 6x = a(2e^{t} + te^{t}) + 5a(e^{t} + te^{t}) - 6a(te^{t}) = 0\,(te^{t}) + 7ae^{t} \equiv 7e^{t}$$

Clearly a = 1, giving $x(t) = C_1 e^{t} + C_2 e^{-6t} + te^{t}$.
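The "multiply by t" rule can be made concrete for exponential forcing $Fe^{kt}$. Writing $p(\lambda) = a\lambda^2 + b\lambda + c$, substitution shows the coefficient is $F/p(k)$ when k is not a root, $F/p'(k)$ with an extra factor t when k is a simple root, and $F/(2a)$ with a factor $t^2$ when k is a repeated root. The sketch below is our own summary of that (the function name is hypothetical), and it assumes integer or Fraction inputs so that the tests for a zero value are exact.

```python
from fractions import Fraction

def exponential_particular(a, b, c, F, k):
    """Particular solution A * t^m * e^(k t) of
    a*x'' + b*x' + c*x = F * e^(k t).  Returns (A, m)."""
    p = a * k * k + b * k + c   # characteristic polynomial at k
    dp = 2 * a * k + b          # its derivative at k
    if p != 0:
        return Fraction(F) / p, 0         # k is not a root
    if dp != 0:
        return Fraction(F) / dp, 1        # k is a simple root: multiply by t
    return Fraction(F) / (2 * a), 2       # k is a repeated root: multiply by t^2

print(exponential_particular(1, 5, -6, 1, -1))  # (Fraction(-1, 10), 0)
print(exponential_particular(1, 5, -6, 7, 1))   # (Fraction(1, 1), 1)
```

The two calls reproduce the last two examples: $-\frac{1}{10}e^{-t}$ for forcing $e^{-t}$, and $te^{t}$ for forcing $7e^{t}$.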


If the inhomogeneous term is composed of several functions whose particular solutions can be

individually found then we combine (add) our particular solutions.

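Example: Find the general solution to the ODE

$$\frac{d^2x}{dt^2} + 5\frac{dx}{dt} - 6x = t + e^{-2t}\sin t$$

Solution:

Consider the equation

$$\frac{d^2x}{dt^2} + 5\frac{dx}{dt} - 6x = t$$

A particular solution $x_p(t)$ must take the form $x_p(t) = at + b$. Substituting gives $5a - 6(at + b) \equiv t$, so $-6a = 1$ and $5a - 6b = 0$, which leads to the particular solution:

$$x_p(t) = -\frac{5}{36} - \frac{1}{6}t$$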

While the equation

$$\frac{d^2x}{dt^2} + 5\frac{dx}{dt} - 6x = e^{-2t}\sin t$$

has a particular solution $x_p(t)$ of the form $x_p(t) = ae^{-2t}\cos t + be^{-2t}\sin t$:

$$x_p'(t) = a\left(-2e^{-2t}\cos t - e^{-2t}\sin t\right) + b\left(-2e^{-2t}\sin t + e^{-2t}\cos t\right) = (-2a + b)e^{-2t}\cos t + (-a - 2b)e^{-2t}\sin t$$

$$x_p''(t) = (-2a + b)\left(-2e^{-2t}\cos t - e^{-2t}\sin t\right) + (-a - 2b)\left(-2e^{-2t}\sin t + e^{-2t}\cos t\right) = (3a - 4b)e^{-2t}\cos t + (4a + 3b)e^{-2t}\sin t$$

Now substitute into $\frac{d^2x}{dt^2} + 5\frac{dx}{dt} - 6x$:

$$[(3a - 4b) + 5(-2a + b) - 6a]e^{-2t}\cos t + [(4a + 3b) + 5(-a - 2b) - 6b]e^{-2t}\sin t = (-13a + b)e^{-2t}\cos t + (-a - 13b)e^{-2t}\sin t$$

so, comparing with the right-hand side $e^{-2t}\sin t$,

$$-13a + b = 0$$

$$-a - 13b = 1$$

from which we obtain $a = -\dfrac{1}{170}$, $b = -\dfrac{13}{170}$, which leads us to the particular solution:

$$x_p(t) = ae^{-2t}\cos t + be^{-2t}\sin t = e^{-2t}\left(-\frac{1}{170}\cos(t) - \frac{13}{170}\sin(t)\right)$$

Now we add particular solutions: for

$$\frac{d^2x}{dt^2} + 5\frac{dx}{dt} - 6x = t + e^{-2t}\sin t$$

we have

$$x_p(t) = -\frac{5}{36} - \frac{1}{6}t + e^{-2t}\left(-\frac{1}{170}\cos(t) - \frac{13}{170}\sin(t)\right)$$

The full general solution is

$$x(t) = x_c(t) + x_p(t) = C_1 e^{t} + C_2 e^{-6t} - \frac{5}{36} - \frac{1}{6}t + e^{-2t}\left(-\frac{1}{170}\cos(t) - \frac{13}{170}\sin(t)\right)$$

Note: inhomogeneous terms such as $e^{kt}\sin\omega t$ or $e^{kt}\cos\omega t$ can be handled much more easily using the complex exponential, which is given as an alternative to the above working.
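A combined particular solution like this one can be sanity-checked numerically by substituting it back into the ODE. The sketch below is only a rough check of our own devising: it approximates the derivatives by central differences with a step h = 0.001 (an arbitrary choice), so the residual is small but not exactly zero.

```python
import math

def xp(t):
    """Combined particular solution found above."""
    osc = math.exp(-2 * t) * (-math.cos(t) - 13 * math.sin(t)) / 170
    return -5 / 36 - t / 6 + osc

def residual(t, h=1e-3):
    """x'' + 5x' - 6x - (t + e^(-2t) sin t), with derivatives of xp
    approximated by central finite differences."""
    d1 = (xp(t + h) - xp(t - h)) / (2 * h)
    d2 = (xp(t + h) - 2 * xp(t) + xp(t - h)) / (h * h)
    forcing = t + math.exp(-2 * t) * math.sin(t)
    return d2 + 5 * d1 - 6 * xp(t) - forcing

for t in (0.0, 0.5, 1.0, 2.0):
    print(t, residual(t))
```

Residuals close to zero at several values of t suggest that no sign or coefficient has been lost.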


Alternative (finding the particular solution for forcing of the form $e^{kt}\sin\omega t$ or $e^{kt}\cos\omega t$) using complex exponentials:

For the particular solution of $\frac{d^2x}{dt^2} + 5\frac{dx}{dt} - 6x = e^{-2t}\sin t$, first we rewrite the equation as

$$\frac{d^2x}{dt^2} + 5\frac{dx}{dt} - 6x = e^{-2t}(\cos t + i\sin t) = e^{-2t}\cdot e^{it}$$

and we wish to take the imaginary part only.

We try $x_p(t) = \alpha e^{(-2+i)t}$ where $\alpha$ is an unknown complex constant.

Now

$$\begin{aligned} \frac{d^2x}{dt^2} + 5\frac{dx}{dt} - 6x &= \alpha(-2+i)^2 e^{(-2+i)t} + 5\alpha(-2+i)e^{(-2+i)t} - 6\alpha e^{(-2+i)t} \\ &= \alpha\left[(-2+i)^2 + 5(-2+i) - 6\right]e^{(-2+i)t} \\ &= \alpha\left[(4 - 4i - 1) - 10 + 5i - 6\right]e^{(-2+i)t} \quad \text{(notice this step)} \\ &= \alpha(-13 + i)e^{(-2+i)t} \end{aligned}$$

Now introduce the RHS: $\equiv e^{-2t}\cdot e^{it} = e^{(-2+i)t}$.

Now equate coefficients:

$$\alpha(-13 + i) = 1$$

$$\alpha = \frac{1}{-13+i} = \frac{1}{-13+i}\times\frac{-13-i}{-13-i} = \frac{1}{13^2+1}(-13-i) = \frac{1}{170}(-13-i)$$

So for the particular solution of $\frac{d^2x}{dt^2} + 5\frac{dx}{dt} - 6x = e^{-2t}\sin t$ we use the imaginary part of

$$\alpha e^{(-2+i)t} = \left(-\frac{13}{170} - \frac{1}{170}i\right)e^{-2t}(\cos t + i\sin t).$$

The 'i' term is $\left[\left(-\frac{1}{170}i\right)(\cos t) + \left(-\frac{13}{170}i\right)(\sin t)\right]e^{-2t}$, so the imaginary part of $\alpha e^{(-2+i)t}$ is

$$\left(-\frac{1}{170}\cos(t) - \frac{13}{170}\sin(t)\right)e^{-2t} = x_p(t)$$

as before.
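Python's built-in complex numbers make the complex-exponential method a one-line calculation. The sketch below is our own (the name `complex_amplitude` is hypothetical): it evaluates the characteristic polynomial at $z = k + i\omega$ and takes the reciprocal. For forcing $e^{kt}\sin\omega t$ the imaginary part of $\alpha$ is then the coefficient of $e^{kt}\cos\omega t$ and the real part is the coefficient of $e^{kt}\sin\omega t$.

```python
def complex_amplitude(a, b, c, k, omega):
    """Return alpha such that alpha * e^((k + i*omega) t) solves
    a*x'' + b*x' + c*x = e^((k + i*omega) t)."""
    z = complex(k, omega)
    p = a * z * z + b * z + c   # characteristic polynomial at z
    if p == 0:
        raise ZeroDivisionError("k + i*omega is a root of the characteristic equation")
    return 1 / p

alpha = complex_amplitude(1, 5, -6, -2, 1)
# Im(alpha * e^(kt) (cos wt + i sin wt)) = e^(kt) (Im(alpha) cos wt + Re(alpha) sin wt)
cos_coeff, sin_coeff = alpha.imag, alpha.real
print(cos_coeff, sin_coeff)  # approximately -1/170 and -13/170
```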

