
March 31, 2013 16-1

16. Systems of Linear Equations

1 Matrices and Systems of Linear Equations

An m × n matrix is an array $A = (a_{ij})$ of the form

$$
A = \begin{pmatrix}
a_{11} & \cdots & a_{1n} \\
a_{21} & \cdots & a_{2n} \\
\vdots & & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{pmatrix}
$$

where each $a_{ij}$ is a real or complex number. We sometimes call such an array A an m by n matrix. The matrix has m rows and n columns. The numbers m and n are called the row dimension and column dimension of A.

For 1 ≤ i ≤ m and 1 ≤ j ≤ n, the m × 1 matrix

$$
\begin{pmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{pmatrix}
$$

is called the j-th column of A and the 1 × n matrix

$$
(a_{i1} \; \ldots \; a_{in})
$$

is called the i-th row of A.

If $A = (a_{ij})$ is an m × n matrix, then its transpose, denoted $A^T$, is the n × m matrix $A^T = (a^T_{ij})$ defined by $a^T_{ij} = a_{ji}$ for each i, j.

If the row and column dimensions of the matrix A are equal, then we call A a square matrix, and we call the common value of its row and column dimensions its dimension. We will see that square matrices have special properties.

We can add m × n matrices as follows. If $A = (a_{ij})$ and $B = (b_{ij})$, then C = A + B is the matrix $(c_{ij})$ defined by

$$
c_{ij} = a_{ij} + b_{ij}.
$$

We can only multiply matrices A and B if A is m × n and B is n × p. That is, the number of columns of A is the same as the number of rows of B.

In that case, if $A = (a_{ij})$ and $B = (b_{jk})$, then C = A · B is the m × p matrix $C = (c_{ik})$ defined by

$$
c_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk}.
$$

Thus the element $c_{ik}$ is the dot product of the i-th row of A and the k-th column of B.

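Both matrix addition and matrix multiplication are associative. That is,

$$
(A + B) + C = A + (B + C), \qquad (AB)C = A(BC).
$$

Multiplication of matrices is not always commutative, even for square matrices. For instance, if

$$
A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix},
$$

then

$$
AB = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \quad \text{and} \quad BA = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}.
$$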
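The product formula and the failure of commutativity can be checked numerically. The following is a minimal sketch in Python, assuming matrices are stored as lists of rows; the function name `matmul` is only an illustrative choice and is not part of these notes.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (lists of rows)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])
    # c_ik is the dot product of row i of A with column k of B
    return [[sum(row[j] * B[j][k] for j in range(n)) for k in range(p)]
            for row in A]


A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
AB = matmul(A, B)
BA = matmul(B, A)
print(AB, BA, AB == BA)
```

Running it on the two matrices above reproduces the products AB and BA shown in the text and confirms that they differ.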

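Let us consider some matrices A, B and illustrate these concepts.

Example 1

$$
A = \begin{bmatrix} 2 & 3 \\ 1 & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} 3 & -1 & 2 \\ 1 & 2 & 3 \end{bmatrix}
$$

$$
C = A \cdot B = \begin{bmatrix} 9 & 4 & 13 \\ 5 & 3 & 8 \end{bmatrix}
$$

B · A is not defined, since B has 3 columns and A has 2 rows.

$$
A^T = \begin{bmatrix} 2 & 1 \\ 3 & 2 \end{bmatrix}
$$

Example 2

$$
A = \begin{pmatrix} 2 & -1 & 3 \\ 1 & 2 & 1 \\ 3 & -2 & 2 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 \\ -3 \\ -2 \end{pmatrix}
$$

$$
C = A \cdot B = \begin{pmatrix} -1 \\ -7 \\ 5 \end{pmatrix}
$$

$$
A^T = \begin{pmatrix} 2 & 1 & 3 \\ -1 & 2 & -2 \\ 3 & 1 & 2 \end{pmatrix}, \qquad B^T = \begin{bmatrix} 1 & -3 & -2 \end{bmatrix}, \qquad (A \cdot B)^T = \begin{bmatrix} -1 & -7 & 5 \end{bmatrix}
$$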

Fact. $(A \cdot B)^T = B^T \cdot A^T$.

Let $e_i$ be the n-vector with zeroes everywhere except in the i-th position and a 1 there. This is called the standard i-th unit n-vector.

The n × n matrix whose i-th row consists of a 1 in the i-th position and zeroes elsewhere is called the n × n identity matrix, and is denoted $I_n$ (or simply I if the context makes the size clear).

For any m × n matrix A we have

$$
I_m A = A I_n = A.
$$

2 Multiplication of matrices by row and column vectors

Let p and n be positive integers.

Let $u_1, u_2, \ldots, u_n$ be n vectors in $R^p$, and let $a_1, a_2, \ldots, a_n$ be n real numbers.

The expression

$$
u = a_1 u_1 + a_2 u_2 + \ldots + a_n u_n
$$

is called the linear combination of the vectors

$$
\{u_1, u_2, \ldots, u_n\}
$$

with coefficients $\{a_1, a_2, \ldots, a_n\}$.

Any expression of the above form is called a linear combination of vectors in $R^p$.

It is useful to note the following properties of matrix multiplication.

Let $A = (a_{ij})$ be m × n and let $B = (b_{jk})$ be n × p. Then, of course, C = A · B is m × p.

Let $C^r_i$ be the i-th row of C and $A^r_i$ be the i-th row of A. Then

$$
C^r_i = A^r_i \cdot B.
$$

Similarly, if $C^c_j$ is the j-th column of C and $B^c_j$ is the j-th column of B, then

$$
C^c_j = A \cdot B^c_j.
$$

We wish to write these matrix expressions as certain linear combinations. From the definitions, it follows that

$$
C^r_i = a_{i1} B^r_1 + a_{i2} B^r_2 + \ldots + a_{in} B^r_n = \sum_{j=1}^{n} a_{ij} B^r_j, \tag{1}
$$

and

$$
C^c_j = b_{1j} A^c_1 + b_{2j} A^c_2 + \ldots + b_{nj} A^c_n = \sum_{i=1}^{n} b_{ij} A^c_i. \tag{2}
$$

Thus, we see that

The i-th row of A · B is the linear combination of the rows of B with coefficients given by the i-th row of A

and

The j-th column of A · B is the linear combination of the columns of A with coefficients given by the j-th column of B.
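The two statements above can be illustrated on small matrices. The sketch below, in Python with matrices as lists of rows, builds one row and one column of A · B as linear combinations and compares them with the ordinary product. The helper names (`matmul`, `row_as_combination`, `col_as_combination`) and the 0-based indices are assumptions made for this illustration.

```python
def matmul(A, B):
    """Ordinary matrix product of lists of rows."""
    return [[sum(a * B[j][k] for j, a in enumerate(row)) for k in range(len(B[0]))]
            for row in A]


def row_as_combination(A, B, i):
    """Row i of A*B as sum_j a_ij * (row j of B), as in equation (1)."""
    row = [0] * len(B[0])
    for j, a_ij in enumerate(A[i]):
        row = [r + a_ij * b for r, b in zip(row, B[j])]
    return row


def col_as_combination(A, B, k):
    """Column k of A*B as sum_j b_jk * (column j of A), as in equation (2)."""
    col = [0] * len(A)
    for j in range(len(B)):
        col = [c + B[j][k] * A[i][j] for i, c in enumerate(col)]
    return col


A = [[2, 3], [1, 2]]
B = [[3, -1, 2], [1, 2, 3]]
C = matmul(A, B)
first_row = row_as_combination(A, B, 0)
last_col = col_as_combination(A, B, 2)
print(C, first_row, last_col)
```

Here `first_row` should agree with the first row of C, and `last_col` with its third column.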

3 Some properties of square matrices

An n × n matrix A is invertible if there is another n × n matrix B such that AB = BA = I. We also call A non-singular. A singular matrix is one that is not invertible.

The matrix B is unique and is called the inverse of A. It is usually written $A^{-1}$.

It is a fact that, if A and B are two n × n invertible matrices, then their product A · B is also invertible, and we have the formula

$$
(A \cdot B)^{-1} = B^{-1} \cdot A^{-1}.
$$

There is a useful number which we can associate to a square matrix, called its determinant. This is often written det(A).

For a 1 × 1 matrix $A = (a_{11})$, we set $\det(A) = a_{11}$.

If $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$, we set

$$
\det(A) = a_{11} a_{22} - a_{12} a_{21}.
$$

We define det(A) inductively for matrices of higher dimension.

Assuming that we know det(B) for all n × n matrices B, let A be an (n + 1) × (n + 1) matrix.

Define

$$
\det(A) = \sum_{i=1}^{n+1} (-1)^{i+1} a_{i1} \det(A(i \mid 1))
$$

where $A(i \mid 1)$ is the n × n matrix obtained by deleting the i-th row and first column of A.

This is called expansion by minors along the first column of A.

One gets the same answer by expanding by minors along any row or column.

As an example of expansion along the second row (assuming that it exists), take

$$
\det(A) = \sum_{j=1}^{n+1} (-1)^{2+j} a_{2j} \det(A(2 \mid j)).
$$
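The inductive definition translates directly into a recursive procedure. The following Python sketch assumes matrices are lists of rows with 0-based indices, so the sign $(-1)^{i+1}$ for 1-based i becomes `(-1) ** i`. The names `minor`, `det` and `det_along_row` are illustrative, and the 3 × 3 matrix is just a sample input. This direct recursion is meant to mirror the definition, not to be efficient for large matrices.

```python
def minor(A, i, j):
    """The matrix A(i | j): delete row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]


def det(A):
    """Determinant by expansion by minors along the first column."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(n))


def det_along_row(A, i):
    """Determinant by expansion by minors along row i (0-based)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (i + j) * A[i][j] * det(minor(A, i, j)) for j in range(n))


A = [[2, -1, 1], [2, -3, 4], [1, 2, 2]]
d_col = det(A)
d_row = det_along_row(A, 1)  # expansion along the second row
print(d_col, d_row)
```

Both expansions give the same value for the sample matrix, in line with the statement that any row or column may be used.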

An easy way to remember the signs in the preceding summation is to make an n × n matrix S whose entries are either "+" or "−" as follows. Let $S_{11}$ = "+". Make $S_{12}$ = "−", $S_{13}$ = "+", and continue alternating along the first row. For the second row, start with "−" and alternate along this row. Continue in this way to obtain the whole matrix S.

Examples of S for 2 × 2 and 3 × 3 matrices are

$$
\begin{bmatrix} + & - \\ - & + \end{bmatrix}, \qquad \begin{pmatrix} + & - & + \\ - & + & - \\ + & - & + \end{pmatrix}.
$$

Let 0 denote the n-vector all of whose entries are 0.

A collection $u_1, u_2, \ldots, u_k$ of vectors in $R^n$ is called a linearly independent set of vectors in $R^n$ if, whenever we have a linear combination

$$
\alpha_1 u_1 + \ldots + \alpha_k u_k = 0,
$$

with the $\alpha_i$ constants (scalars), we must have $\alpha_i = 0$ for every i.

Fact. For an n × n matrix A, each of the following conditions is equivalent to A being invertible.

1. the rows of A form a linearly independent set of vectors

2. the columns of A form a linearly independent set of vectors

3. for every vector b, the system

Ax = b

has a unique solution

Here we are writing b as a column vector (or as an n × 1 matrix).

4. det(A) ≠ 0

The function det maps the set of square matrices with real entries to the real numbers. (If the matrix A has complex entries, then det(A) is a complex number. In this case, the determinant function has similar properties. For simplicity, we emphasize real matrices.)

Additional properties of determinants:

1. For any square matrix A, $\det(A) = \det(A^T)$ (here $A^T$ is the transpose of A).

2. For any two square matrices A and B of the same dimension, det(A · B) = det(A) · det(B).

3. $\det(I_n) = 1$ for any n.

4. For an n × n matrix A and each 1 ≤ i ≤ n, let $A_i$ be its i-th row.

We may consider the determinant function as a function of the rows. That is, we may write

$$
\det(A) = \det(A_1, A_2, \ldots, A_n).
$$

Now, fix some i with 1 ≤ i ≤ n.

Suppose that A and B are n × n matrices which differ only in their i-th rows. That is, for j ≠ i, $A_j = B_j$. Let C be the matrix whose i-th row is the sum $aA_i + bB_i$, where a and b are constants, and whose other rows equal the corresponding rows of A and B.

Then,

$$
\det(C) = a \det(A) + b \det(B).
$$

This property of determinants is called multi-linearity as a function of the rows.

Replacing rows by columns, we also get that det(A) is multi-linear as a function of the columns.

5. If B is the matrix obtained from A by interchanging two rows, then det(B) = −det(A). One refers to this property by saying that det(A) is skew-symmetric as a function of the rows of A. The function A → det(A) is also skew-symmetric as a function of the columns of A.

Let us compute some determinants.

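$$
A = \begin{bmatrix} 2 & -1 \\ 3 & 2 \end{bmatrix}, \qquad \det(A) = 2(2) - 3(-1) = 7
$$

$$
A = \begin{pmatrix} 2 & -1 & 1 \\ 2 & -3 & 4 \\ 1 & 2 & 2 \end{pmatrix}, \qquad \det(A) = 2(-6 - 8) - 2(-2 - 2) + 1(-4 + 3) = -21
$$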

4 Systems of Linear Equations

We will write vectors $x = (x_1, \ldots, x_n)$ in $R^n$ both as row vectors and column vectors.

Matrices are useful for dealing with systems of linear equations.

Suppose we are given a system of m equations in n unknowns. We can write the system

$$
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \ldots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + a_{22} x_2 + \ldots + a_{2n} x_n &= b_2 \\
&\;\;\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \ldots + a_{mn} x_n &= b_m
\end{aligned} \tag{3}
$$

as a single vector equation

$$
Ax = b \tag{4}
$$

where A is the m × n matrix $(a_{ij})$, x is an unknown n-vector, and b is a known m-vector.

Comment: Strictly speaking, the left and right sides of (4) are m × 1 matrices. It is customary to ignore this and simply refer to these as m-vectors. Sometimes, we refer to 1 × n matrices as n-vectors. The context will make it clear what is being done.

We wish to determine whether the system (3) has a solution and, if so, how many solutions it has. Also, we seek a convenient way to represent the solutions.

Before we consider this in detail, we note that there is a nice geometric description of the expression (4).

4.1 Linear Maps and Matrices

Consider the map T(x) = A · x. This defines a map from $R^n$ to $R^m$.

To solve the equation (4), where A and b are given, we are looking for a vector $x \in R^n$ such that T(x) = b. That is, we want b to be in the set-theoretic image of T.

The map T has very special properties. It is what is called a linear map or a linear transformation from $R^n$ to $R^m$.

The definition is the following.

Definition 4.1 The map $T : R^n \to R^m$ is called linear if it preserves the operations of vector addition and scalar multiplication. That is, for any two vectors $x, y \in R^n$ and any real number α, we have

$$
T(x + y) = T(x) + T(y),
$$

and

$$
T(\alpha x) = \alpha T(x).
$$

One can easily see that maps from $R^n$ to $R^m$ defined using multiplication of vectors x by m × n matrices A as above are linear maps.

Conversely, it is relatively easy to prove that every linear map $T : R^n \to R^m$ is defined by multiplication by some m × n matrix.

To state this precisely, it is easier to drop our convention of writing vectors interchangeably as row or column vectors, and to stick to row vectors.

That is, we write $x = (x_1, x_2, \ldots, x_n)$ as a 1 × n matrix.

Then, we have the following

Proposition 4.2 If $T : R^n \to R^m$ is a linear map, then there is an n × m matrix A such that T(x) = x · A for all $x \in R^n$.

Proof. Let $\{e_1, e_2, \ldots, e_n\}$ denote the standard unit vectors in $R^n$, and let $\{f_1, f_2, \ldots, f_m\}$ denote the standard unit vectors in $R^m$.

Observe that writing a vector $x \in R^n$ as $x = (x_1, x_2, \ldots, x_n)$ amounts to the same as

$$
x = \sum_{i=1}^{n} x_i e_i.
$$

Similarly, if $y \in R^m$ is written as $y = (y_1, \ldots, y_m)$, then this is the same as

$$
y = \sum_{j=1}^{m} y_j f_j.
$$

Let $T : R^n \to R^m$ be a linear map.

By linearity, we have

$$
T(x) = T\Big(\sum_{i=1}^{n} x_i e_i\Big) = \sum_{i=1}^{n} x_i T(e_i). \tag{5}
$$

Now, each $T(e_i)$ is a vector in $R^m$, so there are real numbers $a_{ij}$ such that

$$
T(e_i) = \sum_{j=1}^{m} a_{ij} f_j.
$$

Inserting these expressions into (5) we get

$$
T(x) = \sum_{i=1}^{n} x_i \Big(\sum_{j=1}^{m} a_{ij} f_j\Big) = \sum_{i=1}^{n} \sum_{j=1}^{m} x_i a_{ij} f_j = \sum_{j=1}^{m} \Big(\sum_{i=1}^{n} x_i a_{ij}\Big) f_j.
$$

Let A be the n × m matrix given by $A = (a_{ij})$.

Thinking of $x = (x_1, x_2, \ldots, x_n)$ as a 1 × n matrix, and $y = T(x) = (y_1, y_2, \ldots, y_m)$ as a 1 × m matrix, the last expression says that $y_j = \sum_{i=1}^{n} x_i a_{ij}$, which is the j-th entry of x · A. Hence we have

$$
x \cdot A = y. \tag{6}
$$

QED.

Remark. To keep things in column vector format, we would only have to replace the matrix A by its transpose $A^T$ and write $T(x) = A^T \cdot x$.

4.2 Matrix methods for systems of linear equations

Let us return to the system of linear equations (3).

Our goal is to develop simple and effective ways of obtaining the solution set of the system.

We first observe that the set of solutions is unchanged if we do any combination of the following operations on the system (3) (including both the left and right sides of the equations).

1. interchange two rows

2. multiply a row by a non-zero constant

3. add a multiple of one equation to another.

These are called elementary row operations on the matrix.

Because the solutions don't change under these operations, we can use them to try to simplify the equations (i.e., put them into a form where we can determine the solutions).

First, we observe that it is not necessary to keep the variables when manipulating the equations.

From the system (3), we write the following matrix

$$
\left(\begin{array}{cccc|c}
a_{11} & a_{12} & \ldots & a_{1n} & b_1 \\
a_{21} & a_{22} & \ldots & a_{2n} & b_2 \\
\vdots & & & \vdots & \vdots \\
a_{m1} & a_{m2} & \ldots & a_{mn} & b_m
\end{array}\right) \tag{7}
$$

This is called the augmented matrix of the system.

We manipulate this matrix with the operations above to put the part corresponding to A in a better form so that we can read off the solutions.

Example 1. Consider the system

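$$
\begin{aligned}
2x_1 + 3x_2 &= 4 \\
x_1 - x_2 &= 6
\end{aligned}
$$

Of course, we can solve this by Cramer's rule. We first do this. Afterward, we use the method of row operations. This latter method provides a systematic way to handle general linear systems of equations.

Using Cramer's rule, we first write the system in terms of matrices:

$$
\begin{bmatrix} 2 & 3 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 4 \\ 6 \end{bmatrix}
$$

Then, we get

$$
x_1 = \frac{\det \begin{bmatrix} 4 & 3 \\ 6 & -1 \end{bmatrix}}{\det \begin{bmatrix} 2 & 3 \\ 1 & -1 \end{bmatrix}} = \frac{-22}{-5} = 22/5, \qquad
x_2 = \frac{\det \begin{bmatrix} 2 & 4 \\ 1 & 6 \end{bmatrix}}{\det \begin{bmatrix} 2 & 3 \\ 1 & -1 \end{bmatrix}} = \frac{8}{-5} = -8/5
$$

Next, let's do this with row operations. We denote row i by $R_i$.

Write the augmented matrix:

$$
\left(\begin{array}{cc|c} 2 & 3 & 4 \\ 1 & -1 & 6 \end{array}\right)
$$

Replace $R_1$ with $-2R_2 + R_1$:

$$
\left(\begin{array}{cc|c} 0 & 5 & -8 \\ 1 & -1 & 6 \end{array}\right)
$$

We can now read off that $5x_2 = -8$ and $x_1 - x_2 = 6$, so the solution is $x_2 = -8/5$, $x_1 = 6 + x_2 = 22/5$.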
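The Cramer's rule computation for a 2 × 2 system can be written as a short Python sketch. It assumes integer coefficients and uses exact fractions so that the output can be compared with the values found by hand; the names `det2` and `cramer_2x2` are illustrative only.

```python
from fractions import Fraction


def det2(M):
    """Determinant of a 2 x 2 matrix given as a list of two rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def cramer_2x2(A, b):
    """Solve A x = b for a 2 x 2 integer matrix A with det(A) != 0."""
    d = det2(A)
    if d == 0:
        raise ValueError("det(A) = 0: Cramer's rule does not apply")
    # Replace the first, then the second, column of A by b
    A1 = [[b[0], A[0][1]], [b[1], A[1][1]]]
    A2 = [[A[0][0], b[0]], [A[1][0], b[1]]]
    return Fraction(det2(A1), d), Fraction(det2(A2), d)


x1, x2 = cramer_2x2([[2, 3], [1, -1]], [4, 6])
print(x1, x2)
```

For the system of Example 1 this reproduces $x_1 = 22/5$ and $x_2 = -8/5$.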

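Example 2. Use row operations to solve

$$
\begin{aligned}
2x_1 + 3x_2 + x_3 &= 2 \\
x_1 - x_2 - x_3 &= 1 \\
3x_1 - 2x_2 &= 3
\end{aligned}
$$

The augmented matrix is:

$$
\left(\begin{array}{ccc|c} 2 & 3 & 1 & 2 \\ 1 & -1 & -1 & 1 \\ 3 & -2 & 0 & 3 \end{array}\right)
$$

Replace $R_1$ with $-2R_2 + R_1$:

$$
\left(\begin{array}{ccc|c} 0 & 5 & 3 & 0 \\ 1 & -1 & -1 & 1 \\ 3 & -2 & 0 & 3 \end{array}\right)
$$

Replace $R_3$ with $-3R_2 + R_3$:

$$
\left(\begin{array}{ccc|c} 0 & 5 & 3 & 0 \\ 1 & -1 & -1 & 1 \\ 0 & 1 & 3 & 0 \end{array}\right)
$$

Replace $R_2$ with $R_3 + R_2$, and $R_1$ with $-5R_3 + R_1$:

$$
\left(\begin{array}{ccc|c} 0 & 0 & -12 & 0 \\ 1 & 0 & 2 & 1 \\ 0 & 1 & 3 & 0 \end{array}\right)
$$

Now read off the solutions as:

$$
\begin{aligned}
x_3 &= 0 \\
x_2 &= -3x_3 = 0 \\
x_1 &= 1 - 2x_3 = 1
\end{aligned}
$$

or, $x_1 = 1$, $x_2 = 0$, $x_3 = 0$.

Example 3. Use row operations to solve the system

$$
\begin{aligned}
2x_1 + 3x_2 + x_3 &= 2 \\
x_1 - x_2 - x_3 &= 1 \\
3x_1 + 2x_2 &= 3
\end{aligned}
$$

Notice that this system differs from that in the preceding example only by a change of sign in the coefficient of $x_2$ in the third equation. The solutions, however, will change considerably. In Example 2, there was a unique solution. In this example, it will be seen that there are infinitely many solutions.

We begin with the augmented matrix, and proceed to use the row operations. Instead of writing each changed matrix, we combine several steps.

$$
\left(\begin{array}{ccc|c} 2 & 3 & 1 & 2 \\ 1 & -1 & -1 & 1 \\ 3 & 2 & 0 & 3 \end{array}\right)
$$

$-2R_2 + R_1$ followed by $-3R_2 + R_3$:

$$
\left(\begin{array}{ccc|c} 0 & 5 & 3 & 0 \\ 1 & -1 & -1 & 1 \\ 0 & 5 & 3 & 0 \end{array}\right)
$$

Since the first and third rows are now equal, this reduces to the two equations

$$
\begin{aligned}
x_1 &= 1 + x_2 + x_3 \\
5x_2 + 3x_3 &= 0
\end{aligned}
$$

so that

$$
\begin{aligned}
x_2 &= -(3/5)x_3 \\
x_1 &= 1 - (3/5)x_3 + x_3 = 1 + (2/5)x_3
\end{aligned}
$$

So we see that solutions have the form

$$
\begin{pmatrix} 1 + (2/5)x_3 \\ -(3/5)x_3 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + x_3 \begin{pmatrix} 2/5 \\ -3/5 \\ 1 \end{pmatrix}
$$

This shows that the set of solutions is a line in $R^3$ running through the point (1, 0, 0) with the direction (2/5, −3/5, 1).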

In general, the set of solutions of a system of 3 equations in 3 unknowns will be a subset of $R^3$ of one of the following types:

1. empty (i.e., there are no solutions)

2. a point (there is a unique solution)

3. a line (there are infinitely many solutions)

4. a plane (there are infinitely many solutions)

5. all of $R^3$ (the coefficients $a_{ij}$ and the right-hand sides $b_i$ are all 0).

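Example 4. Solve the system

$$
\begin{aligned}
2x_1 - 2x_2 + x_3 &= 3 \\
x_1 - x_2 - x_3 &= 1 \\
3x_1 + 2x_2 + 2x_3 &= 2
\end{aligned}
$$

We write the augmented matrix and do some row operations.

$$
\left(\begin{array}{ccc|c} 2 & -2 & 1 & 3 \\ 1 & -1 & -1 & 1 \\ 3 & 2 & 2 & 2 \end{array}\right)
$$

$-2R_2 + R_1$, $-3R_2 + R_3$:

$$
\left(\begin{array}{ccc|c} 0 & 0 & 3 & 1 \\ 1 & -1 & -1 & 1 \\ 0 & 5 & 5 & -1 \end{array}\right)
$$

$$
\begin{aligned}
x_3 &= 1/3 \\
x_2 &= -1/5 - x_3 = -1/5 - 1/3 = -3/15 - 5/15 = -8/15 \\
x_1 &= 1 + x_2 + x_3 = 1 - 8/15 + 5/15 = 12/15 = 4/5.
\end{aligned}
$$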
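The row-operation method used in Examples 1 to 4 can be organized into a systematic procedure. The Python sketch below handles only square systems with a unique solution and reports an error otherwise. It uses exact fractions, and the particular order of operations (clear one column at a time, scaling each pivot to 1) is a choice made for this illustration rather than the order used in the worked examples. The name `solve` is hypothetical.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by elementary row operations on the augmented matrix.

    Assumes A is n x n; raises ValueError if there is no unique solution.
    """
    n = len(A)
    # Augmented matrix [A | b] with exact rational entries
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Look for a row, at or below position col, with a non-zero entry
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the system does not have a unique solution")
        M[col], M[pivot] = M[pivot], M[col]       # interchange two rows
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]         # multiply a row by 1/pv
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]                     # add -f times row col to row r
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]


x = solve([[2, -2, 1], [1, -1, -1], [3, 2, 2]], [3, 1, 2])
print(x)
```

Applied to the system of Example 4 it returns the same values 4/5, −8/15, 1/3. On the system of Example 3, which has infinitely many solutions, this sketch raises an error instead of describing the solution line.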


5 Finding the inverse of a matrix

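The 2 × 2 matrix A is invertible if and only if det(A) ≠ 0.

Letting

$$
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},
$$

we can compute $A^{-1}$ from the formula

$$
A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}.
$$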

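Example 5. Find the matrix X such that

$$
\begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix} X = \begin{bmatrix} 3 & 0 \\ -2 & 1 \end{bmatrix}
$$

Here the matrix X is itself 2 × 2. This has the form A · X = B for 2 × 2 matrices.

If A is non-singular, then we get the answer from

$$
X = A^{-1} B.
$$

We check non-singularity by computing det(A). We get det(A) = 5, so the matrix is non-singular and its inverse is

$$
(1/5) \begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}
$$

so,

$$
X = A^{-1} B = (1/5) \begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ -2 & 1 \end{bmatrix} = \begin{bmatrix} 8/5 & -1/5 \\ -1/5 & 2/5 \end{bmatrix}
$$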
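As a check on Example 5, the 2 × 2 inverse formula can be coded directly. This is a small Python sketch assuming integer entries and exact fractions; `inverse_2x2` and `matmul` are illustrative names, not notation from these notes.

```python
from fractions import Fraction


def inverse_2x2(A):
    """Inverse of a 2 x 2 integer matrix via the determinant formula."""
    (a11, a12), (a21, a22) = A
    d = a11 * a22 - a12 * a21
    if d == 0:
        raise ValueError("det(A) = 0: the matrix is singular")
    return [[Fraction(a22, d), Fraction(-a12, d)],
            [Fraction(-a21, d), Fraction(a11, d)]]


def matmul(A, B):
    """Matrix product of lists of rows."""
    return [[sum(a * B[j][k] for j, a in enumerate(row)) for k in range(len(B[0]))]
            for row in A]


A = [[2, 1], [-1, 2]]
B = [[3, 0], [-2, 1]]
X = matmul(inverse_2x2(A), B)
print(X)
```

Multiplying A by the computed X should give back B, which is a convenient way to verify the hand computation.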

For higher dimensional matrices, the formula for the inverse is harder, so it is convenient to find another method.

Suppose that the dimension of A is n.

Then, we are looking for an n × n matrix B such that A · B = I where I is the n × n identity matrix.

Thus, we can form a big augmented matrix of the form

A | I.

This is an n × 2n matrix (with the vertical line in the middle).

Applying the row operation method for solving linear systems to this matrix, we do a sequence of successive row modifications. If A is actually invertible, then, at the end of doing these operations, we will get an n × 2n matrix of the form

I | B.

It turns out that the resulting matrix B is actually $A^{-1}$.

The proof of this is not difficult. It amounts to the following observation.

Let us use the notation op to denote any of the elementary row operations. Write $A \xrightarrow{op} B$ to mean that B is obtained from A by applying the elementary row operation op.

As above, let I be the n × n identity matrix.

Then the following is true.

Proposition. Let I denote the n × n identity matrix. Let op be any elementary row operation, and assume that $A \xrightarrow{op} B$ and $I \xrightarrow{op} D$.

Then,

B = DA

This means that we get B from A by a left multiplication by D, where D is obtained from I via the same elementary row operation used to get B from A.

Let us describe this in more detail. It will be convenient to have specific notations for the matrices obtained by applying elementary row operations to I.

These are of three types. Let i, j be any two integers with 1 ≤ i ≤ n and 1 ≤ j ≤ n.

1. Interchange of rows. Let $P_{ij}$ denote the matrix obtained by interchanging the i-th and j-th rows of I. (This is usually called a permutation matrix since it permutes the rows of I.)

2. Multiplication of a row by a non-zero constant. Let c be a non-zero constant, and let $E_{c,i}$ denote the matrix whose j-th row is the unit vector $e_j$ if j ≠ i, and whose i-th row is $c e_i$.

3. Addition of a multiple of row i to row j. Let $E_{ci+j}$ denote the matrix obtained from I by adding c times row i to row j.

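Let us consider some examples.

Example 6. Consider an interchange of the first two rows.

$$
A = \begin{pmatrix} 2 & 3 & 1 \\ 1 & -1 & 1 \\ 0 & 4 & 3 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & -1 & 1 \\ 2 & 3 & 1 \\ 0 & 4 & 3 \end{pmatrix}
$$

The associated permutation matrix is

$$
P_{12} = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
$$

Then, one sees that

$$
B = P_{12} A
$$

Example 7. Consider replacing row 1 by row 1 plus 2 times row 3.

$$
A = \begin{pmatrix} 2 & 3 & 1 \\ 1 & -1 & 1 \\ 0 & 4 & 3 \end{pmatrix}, \qquad B = \begin{pmatrix} 2 & 11 & 7 \\ 1 & -1 & 1 \\ 0 & 4 & 3 \end{pmatrix}
$$

The associated matrix is $E_{2 \cdot 3 + 1}$, with

$$
E_{2 \cdot 3 + 1} = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
$$

and

$$
B = E_{2 \cdot 3 + 1} A.
$$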
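Examples 6 and 7 can be reproduced by building the three types of elementary matrices and multiplying on the left. In the Python sketch below, the function names `P`, `E_scale` and `E_add` are stand-ins for $P_{ij}$, $E_{c,i}$ and $E_{ci+j}$; they take 1-based row indices to match the text, and `E_add` assumes i ≠ j.

```python
def identity(n):
    """The n x n identity matrix as a list of rows."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def matmul(A, B):
    """Matrix product of lists of rows."""
    return [[sum(a * B[j][k] for j, a in enumerate(row)) for k in range(len(B[0]))]
            for row in A]


def P(n, i, j):
    """Identity with rows i and j interchanged."""
    M = identity(n)
    M[i - 1], M[j - 1] = M[j - 1], M[i - 1]
    return M


def E_scale(n, c, i):
    """Identity with row i multiplied by the non-zero constant c."""
    M = identity(n)
    M[i - 1][i - 1] = c
    return M


def E_add(n, c, i, j):
    """Identity with c times row i added to row j (i != j)."""
    M = identity(n)
    M[j - 1][i - 1] += c
    return M


A = [[2, 3, 1], [1, -1, 1], [0, 4, 3]]
B6 = matmul(P(3, 1, 2), A)         # Example 6: interchange rows 1 and 2
B7 = matmul(E_add(3, 2, 3, 1), A)  # Example 7: row 1 plus 2 times row 3
print(B6, B7)
```

The same helpers can be used to try further examples, for instance multiplying `E_add(3, 2, 3, 1)` by `E_add(3, -2, 3, 1)` to see that the product is the identity.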

Try a few more examples yourself to get comfortable with this.

Now, notice that each of the matrices $P_{ij}$, $E_{c,i}$, $E_{ci+j}$ is invertible. Indeed, $P_{ij}^{-1} = P_{ij}$, $E_{c,i}^{-1} = E_{1/c,i}$, and $E_{ci+j}^{-1} = E_{-ci+j}$.

It follows that if we do a sequence of k row modifications to a matrix A, this amounts to successively multiplying it on the left by k matrices, each of one of the three types $P_{ij}$, $E_{c,i}$, $E_{ci+j}$.

This means that after k such modifications, the matrix A is replaced by

$$
D_1 D_2 \ldots D_k A
$$

where each $D_i$ is one of the $P_{ij}$, $E_{c,i}$ or $E_{ci+j}$ we just discussed (with $D_k$ corresponding to the first operation performed and $D_1$ to the last).

If we had an augmented matrix

A | I

and we do the same row modifications on it, then, at the end, we get

$$
D_1 D_2 \ldots D_k A \mid D_1 D_2 \ldots D_k I
$$

If we end up with I on the left, then $D_1 D_2 \ldots D_k = A^{-1}$. On the right we end up with $D_1 D_2 \ldots D_k I = D_1 D_2 \ldots D_k$. So we have written down $A^{-1}$.
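The procedure of reducing A | I to I | B can be sketched in a few lines of Python. The version below works with exact fractions and clears one column at a time; this ordering of the row operations is one possible choice, and the function name `inverse` is illustrative.

```python
from fractions import Fraction


def inverse(A):
    """Invert a square matrix by row-reducing the augmented matrix [A | I]."""
    n = len(A)
    # Build [A | I] with exact rational entries
    M = [[Fraction(v) for v in row] + [Fraction(int(r == c)) for c in range(n)]
         for r, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]       # interchange two rows
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]         # scale the pivot row
        for r in range(n):
            if r != col:
                f = M[r][col]                     # add -f times pivot row to row r
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    # The left half is now I; the right half is the inverse
    return [row[n:] for row in M]


A2 = [[2, 1], [-1, 2]]
inv2 = inverse(A2)
A3 = [[2, 3, 1], [1, -1, 1], [0, 4, 3]]
inv3 = inverse(A3)
print(inv2)
print(inv3)
```

Each row operation is applied to the full 2n-column row, which is exactly what multiplying A | I on the left by an elementary matrix does.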

5.1 A simple method to compute the inverse of a 3 × 3 matrix

Let $A = (a_{ij})$ be an invertible n × n matrix. We can use determinants and Cramer's rule to get a simple way to compute $A^{-1}$.

We first observe that Cramer's rule holds for every invertible square matrix:

The i-th coordinate of the solution to A · x = b has the form

$$
x_i = \frac{\det(A_i)}{\det(A)}
$$

where $A_i$ is the matrix obtained by replacing the i-th column of A by the column vector b.

Since $A^{-1}$ is the solution X of the matrix equation A · X = I, the i-th column of $A^{-1}$ is the solution to the matrix equation $A \cdot x = e_i^T$, where $e_i^T$ is the i-th unit column vector.

We apply Cramer's rule to the case of a 3 × 3 invertible matrix

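$$
A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
$$

Let $u_i$ denote the i-th column of $A^{-1}$.

Then,

$$
u_1 = \frac{1}{\det(A)} \begin{pmatrix}
\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} \\[2ex]
-\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} \\[2ex]
\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}
\end{pmatrix}, \qquad
u_2 = \frac{1}{\det(A)} \begin{pmatrix}
-\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} \\[2ex]
\begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} \\[2ex]
-\begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix}
\end{pmatrix},
$$

$$
u_3 = \frac{1}{\det(A)} \begin{pmatrix}
\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix} \\[2ex]
-\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix} \\[2ex]
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}
\end{pmatrix}
$$

In practice, this is easy to compute. After computing the determinant of A, it simply involves computing nine 2 × 2 determinants.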

5.2 The inverse of an n× n matrix A

Let A be an n × n matrix.

For each pair (i, j) of indexes (i.e., 1 ≤ i ≤ n, 1 ≤ j ≤ n), define the (i, j) entry of the n × n matrix C = Adj(A) by

$$
C_{ij} = (-1)^{i+j} \det(A(j \mid i))
$$

where $A(j \mid i)$ is the (n − 1) × (n − 1) matrix obtained by deleting the j-th row and the i-th column of A.

The matrix C = Adj(A) is called the classical adjoint of A.

The following theorem can be found in most books on Matrix Theory or Linear Algebra.

Theorem. For any square real or complex matrix A, we have

$$
A \cdot \mathrm{Adj}(A) = \det(A) \cdot I
$$

where I is the n × n identity matrix.

It follows that if det(A) ≠ 0, then

$$
A^{-1} = \frac{1}{\det(A)} \cdot \mathrm{Adj}(A) \tag{8}
$$
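The classical adjoint and formula (8) can be tried out on a small integer matrix. The following Python sketch assumes integer entries, 0-based indices, and a determinant computed by recursive expansion along the first column; the names `minor`, `det`, `adjugate` and `inverse_by_adjugate` are illustrative. Note the swapped indices in `minor(A, j, i)`, which reflect the definition of $C_{ij}$.

```python
from fractions import Fraction


def minor(A, i, j):
    """Delete row i and column j of A (0-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]


def det(A):
    """Determinant by expansion along the first column; det of the 0 x 0 matrix is 1."""
    n = len(A)
    if n == 0:
        return 1
    if n == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(n))


def adjugate(A):
    """Classical adjoint: C_ij = (-1)^(i+j) det(A(j | i))."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]


def inverse_by_adjugate(A):
    """A^{-1} = Adj(A) / det(A), for an integer matrix with det(A) != 0."""
    d = det(A)
    if d == 0:
        raise ValueError("det(A) = 0: the matrix is singular")
    return [[Fraction(c, d) for c in row] for row in adjugate(A)]


A = [[2, 3, 1], [1, -1, 1], [0, 4, 3]]
adjA = adjugate(A)
invA = inverse_by_adjugate(A)
print(det(A), adjA, invA)
```

Multiplying A by `adjA` should give det(A) times the identity, which is the content of the theorem.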

6 Eigenvalues and eigenvectors

Let A be an n × n matrix (real or complex).

An eigenvector is a non-zero vector ξ such that there is a scalar r such that Aξ = rξ. Note that the scalar r may be 0. When one can find such a ξ and a scalar r, one calls r an eigenvalue of the matrix A, and one calls ξ an eigenvector of A associated to r or for r.

If ξ is an eigenvector, then the unit vector in the direction of ξ, namely $\xi / |\xi|$, is called a unit eigenvector.

It turns out that if ξ is an eigenvector associated to the eigenvalue r, then so is any non-zero scalar multiple of ξ. It is sometimes useful to have a unique way to specify a particular vector in the set of non-zero scalar multiples of ξ. Accordingly we define the scaled version of a non-zero vector v to be the vector

$$
\sharp v = \frac{1}{v_i} v
$$

where $v_i$ is the first non-zero entry in v. We use the sharp symbol from music to denote the scaled version of v.

For instance, we have

$$
\sharp \begin{bmatrix} -2 \\ 3 \end{bmatrix} = \begin{pmatrix} 1 \\ \frac{3}{-2} \end{pmatrix}, \qquad
\sharp \begin{pmatrix} 0 \\ -3 \\ 4 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ \frac{4}{-3} \end{pmatrix} \tag{9}
$$


To understand the notion of eigenvector better, observe that the equation

Aξ = rξ

is equivalent to either of the equations

(rI − A)ξ = 0

or

(A− rI)ξ = 0

where I is the n × n identity matrix.

If either of these equations has a non-zero vector ξ as a solution, then it follows that

$$
\det(rI - A) = 0 \tag{10}
$$

The expression det(rI − A) is actually a polynomial of degree n in r of the form

$$
z(r) = r^n + a_{n-1} r^{n-1} + \ldots + a_0
$$

and the above equation can be written as z(r) = 0.

The polynomial z(r) is called the characteristic polynomial of the matrix A, and its roots are the eigenvalues of A.

The existence of these roots is provided by the following theorem.

Theorem (Fundamental Theorem of Algebra). Let

$$
p(r) = a_n r^n + a_{n-1} r^{n-1} + \ldots + a_0
$$

be a polynomial of degree n ≥ 1 (i.e. $a_n \neq 0$) with complex coefficients $a_0, a_1, \ldots, a_n$.

Then, there is a complex number α such that p(α) = 0.

Remark. The Euclidean algorithm for positive integers states that given two positive integers p > q there are integers k > 0 and 0 ≤ s < q such that

$$
p = kq + s
$$

There is an analogous result for polynomials. Let us denote by deg(z(r)) the degree of the polynomial z(r) (with real or complex coefficients).

Euclidean Algorithm for Polynomials:

Let p(r) and q(r) be two polynomials with deg(q(r)) < deg(p(r)). Then there are polynomials k(r) and s(r) such that

$$
p(r) = k(r) q(r) + s(r)
$$

and deg(s(r)) < deg(q(r)).

From this we have, for any complex polynomial z(r) and any complex number α, that there exist a complex polynomial q(r) with deg(z(r)) = deg(q(r)) + 1 and a complex number c such that

$$
z(r) = (r - \alpha) q(r) + c
$$

Note that if α is a root of z(r) (i.e., z(α) = 0), then c = 0. That is, the polynomial r − α is a factor of z(r).
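Division of a polynomial by a linear factor r − α can be carried out by the standard synthetic division (Horner) scheme, which is a different route to the same quotient and remainder than the general division statement above. The Python sketch below assumes the coefficients are listed from the highest power down; `divide_by_linear` is a hypothetical name, and the quadratic $r^2 - 3r + 2$ is just a sample input.

```python
def divide_by_linear(coeffs, alpha):
    """Return (q, c) with z(r) = (r - alpha) * q(r) + c.

    coeffs lists the coefficients of z from the highest power down;
    the quotient q is returned in the same format.
    """
    partial = []
    carry = 0
    for a in coeffs:
        carry = carry * alpha + a   # Horner step
        partial.append(carry)
    # The last value is the remainder c = z(alpha); the rest is the quotient
    return partial[:-1], partial[-1]


z = [1, -3, 2]                       # z(r) = r^2 - 3r + 2
q_root, c_root = divide_by_linear(z, 1)
q_other, c_other = divide_by_linear(z, 3)
print(q_root, c_root, q_other, c_other)
```

For α = 1 the remainder is 0, so r − 1 is a factor; for α = 3 the remainder equals z(3).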

Let us make repeated use of the Fundamental Theorem of Algebra on z(r).

There exists a root $r_1$ of z(r) and a polynomial $z_1(r)$ such that

$$
z(r) = (r - r_1) z_1(r)
$$

Next, there exists a root $r_2$ of $z_1(r)$ and a polynomial $z_2(r)$ such that

$$
z(r) = (r - r_1)(r - r_2) z_2(r)
$$

Continuing this way, we obtain all of the roots of z(r) and can express it as

$$
z(r) = a_n (r - r_1)(r - r_2) \cdots (r - r_n)
$$

where $a_n$ is the leading coefficient of z(r).

Note that the roots need not be distinct. So, we refer to this expression for z(r) as its factorization with multiplicities. We also say that the polynomial z(r) of degree n has n roots with multiplicity.

Remarks.

1. If z(r) is a polynomial of degree n with real coefficients, then its roots may be real or complex. If $r_1 = a + bi$ is such a complex root (i.e., b ≠ 0), the complex conjugate $\bar{r}_1 = a - bi$ is also a root of z(r).

2. The Fundamental Theorem of Algebra is an existence theorem. It gives no information about how to find the roots of a given polynomial z(r). For degree two, one can explicitly find the roots via the quadratic formula (i.e., involving square roots of expressions involving the coefficients).

For degrees three and four, there also are explicit formulas to find the roots in terms of taking expressions involving various roots of the coefficients. For degree greater than four, there are no such formulas that work in all cases. This surprising result (often called the unsolvability of the quintic) was proved by Abel in 1823. For more information look up the Abel-Ruffini Theorem on Wikipedia.

Note that some texts call the polynomial $z_1(r) = \det(A - rI)$ the characteristic polynomial of A. Since $z_1(r) = (-1)^n z(r)$, these two polynomials have the same roots, so to solve problems concerning eigenvalues, it really does not matter which definition is used.

Of course, even if A is real, the characteristic polynomial z(r) may have real or complex roots. A real eigenvalue will have associated eigenvectors which are also real, and a complex eigenvalue will only have associated complex eigenvectors (i.e. written as $u_1 + iu_2$ with $u_1$ and $u_2$ both real vectors and $u_2$ non-zero).

6.1 Simple formulas for the characteristic polynomials of 2 × 2 and 3 × 3 matrices

For any n × n matrix $A = (a_{ij})$, define the trace of A to be

$$
\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}
$$

Thus, tr(A) is simply the sum of the diagonal entries.

If

$$
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},
$$

then

$$
z(r) = \det(rI - A) = r^2 - \mathrm{tr}(A)\, r + \det(A).
$$

In the 3 × 3 case, the formula for z(r) is a bit more complicated:

If

$$
A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix},
$$

then

$$
z(r) = r^3 - \mathrm{tr}(A)\, r^2 + \left( \det\begin{pmatrix} e & f \\ h & i \end{pmatrix} + \det\begin{pmatrix} a & c \\ g & i \end{pmatrix} + \det\begin{pmatrix} a & b \\ d & e \end{pmatrix} \right) r - \det(A)
$$

Remark. There are also simple formulas for eigenvalues and eigenvectors for two dimensional square matrices. These are described in Section 17 of the Notes.

In order to find eigenvalues and eigenvectors for matrices A of dimension higher than 2, it is necessary to do more work on solving the associated linear systems.

For instance, if

$$
A = \begin{pmatrix} 1 & 2 & 2 \\ 2 & 1 & 3 \\ 1 & 1 & 0 \end{pmatrix}
$$

then the characteristic polynomial will be

$$
z(r) = r^3 - 2r^2 - 8r - 5
$$

Its roots are

$$
r_1 = \frac{3 + \sqrt{29}}{2}, \qquad r_2 = \frac{3 - \sqrt{29}}{2}, \qquad r_3 = -1
$$

The associated three eigenvectors are found by solving the three linear systems

$$
(A - rI)\xi = 0
$$

for each of the three values $r = r_1$, $r = r_2$, $r = r_3$.
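The 3 × 3 formula for z(r) and the roots quoted above can be verified numerically. This Python sketch computes the coefficients from the trace, the three principal 2 × 2 minors and the determinant; the function names are illustrative, and the two irrational roots are checked only up to floating-point rounding.

```python
import math


def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix with rows (a, b) and (c, d)."""
    return a * d - b * c


def char_poly_3x3(A):
    """Coefficients [1, c2, c1, c0] of z(r) = r^3 + c2 r^2 + c1 r + c0."""
    (a, b, c), (d, e, f), (g, h, i) = A
    trace = a + e + i
    minors = det2(e, f, h, i) + det2(a, c, g, i) + det2(a, b, d, e)
    determinant = a * det2(e, f, h, i) - b * det2(d, f, g, i) + c * det2(d, e, g, h)
    return [1, -trace, minors, -determinant]


def evaluate(coeffs, r):
    """Evaluate the polynomial at r by Horner's rule."""
    value = 0
    for coefficient in coeffs:
        value = value * r + coefficient
    return value


A = [[1, 2, 2], [2, 1, 3], [1, 1, 0]]
z = char_poly_3x3(A)
roots = [(3 + math.sqrt(29)) / 2, (3 - math.sqrt(29)) / 2, -1]
residuals = [evaluate(z, r) for r in roots]
print(z, residuals)
```

The coefficient list should match $r^3 - 2r^2 - 8r - 5$, and the residuals at the three roots should be zero up to rounding error.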

6.2 Subspaces of Rn, Null Space and Range of a Linear Map

It is convenient to have a name for subsets of $R^n$ which behave well under vector addition and scalar multiplication.

Definition. A subset W of $R^n$ is called a linear subspace (or, simply, a subspace) if it satisfies the following two properties.

1. For any two vectors v and w in W , we have v + w ∈ W .

2. For any vector v ∈ W and scalar α, we have αv ∈ W .

We often say that W is closed under vector addition and scalar multiplication.

Let W be a subspace of $R^n$ which contains at least one non-zero vector. A basis for W is a maximal linearly independent set $B = \{v_1, v_2, \ldots, v_k\}$ of vectors in W. This means that, for any vector w in W that is not already in B, the set

$$
B_1 = \{w\} \cup B
$$

is no longer linearly independent. That is, we cannot increase B inside W and keep it a linearly independent set.

It is a fact that any two bases of a subspace W have the same number of elements. This common number is called the dimension of W. Note that the single element subset {0} is a subspace of $R^n$. We define its dimension to be 0.

Let B be any subset of $R^n$. Then, the subspace spanned by B, denoted sp(B), is defined to be the set of finite linear combinations $a_1 v_1 + a_2 v_2 + \ldots + a_j v_j$ where $a_i$ is a scalar and $v_i$ is a vector in B for all i.

If $B = \{v_1, v_2, \ldots, v_k\}$ is a finite set, then we write $sp(v_1, v_2, \ldots, v_k)$ for sp(B).

Note that if dim(sp(B)) = d and k > d, then the set B cannot be linearly independent.

Examples.

1. The subspaces of $R^2$ consist of

(a) the set {0} consisting of the zero vector (dimension 0)

(b) the lines through the origin (dimension 1)

(c) all of $R^2$ (dimension 2)

2. The subspaces of $R^3$ consist of

(a) the set {0} consisting of the zero vector (dimension 0)

(b) the lines through the origin (dimension 1)

(c) the planes through the origin (dimension 2)

(d) all of $R^3$ (dimension 3)

3. Let 0 < k < n and consider the set of vectors $(x_1, x_2, \ldots, x_k, 0, \ldots, 0)$ in $R^n$ whose coordinates $x_i$ are zero for i > k. This is a subspace of dimension k spanned by the vectors $e_1, e_2, \ldots, e_k$ where $e_i$ is the standard unit vector whose only non-zero entry is a 1 in the i-th position.

Definition. Let $T : R^n \to R^p$ be a linear map with associated matrix A whose j-th column $A^c_j$ is $T(e_j)$. The null-space or kernel of T is the set of vectors v in $R^n$ such that T(v) = 0. The range of T is the set of vectors $w \in R^p$ such that there is a $v \in R^n$ with T(v) = w.

It is easy to show that the kernel of T and the range of T are subspaces of $R^n$ and $R^p$, respectively.

6.3 Some properties of eigenvalues and eigenvectors

First, we discuss some general facts about eigenvalues and eigenvectors which are valid in any dimension.

1. Let A be an n × n matrix and let $r_1$ be an eigenvalue of A. Let ξ and η be eigenvectors associated to $r_1$. Then, for arbitrary scalars α, β, we have that αξ + βη is also an eigenvector associated to $r_1$ provided that it is not the zero vector.

Proof.

Let v = αξ + βη and assume this is not 0.

We have


A(v) = A(αξ + βη)

= αAξ + βAη

= αr1ξ + βr1η

= r1(αξ + βη)

= r1v

Therefore v is also an eigenvector as required.

2. Let $r_1 \neq r_2$ be distinct eigenvalues of A with associated eigenvectors ξ, η, respectively. Then, ξ is not a multiple of η.

Proof.

Assume that ξ = αη for some α. Since neither vector is 0, we must have α ≠ 0.

Now,

$$
A\xi = r_1 \xi = r_1 \alpha \eta,
$$

$$
A\xi = A\alpha\eta = \alpha A\eta = \alpha r_2 \eta.
$$

So,

$$
r_1 \alpha \eta = r_2 \alpha \eta.
$$

Since α ≠ 0 and η ≠ 0, we get $r_1 = r_2$, which is a contradiction.

3. A real matrix may not have any real eigenvalues, but always has complex eigenvalues.

In the language of subspaces, if we consider all of the eigenvectors for $r_1$ and add the zero vector, then we get a subspace of $R^n$. This is called the eigenspace of $r_1$.

A simple method to find eigenvectors for 2 × 2 matrices

Let

$$
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
$$

be a 2 × 2 matrix with characteristic polynomial

$$
z(r) = r^2 - (a + d) r + ad - bc,
$$

and let $r_1$ be a root of z(r).

We wish to find a non-zero vector $v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$ such that

$$
Av = r_1 v \quad \text{or} \quad (A - r_1 I) v = 0
$$

with I equal to the 2 × 2 identity matrix.

That is, we wish to solve the system of equations

$$
\begin{aligned}
(a - r_1) v_1 + b v_2 &= 0 \\
c v_1 + (d - r_1) v_2 &= 0
\end{aligned}
$$

This is a homogeneous system of linear equations and, since it has a non-zero solution, the two equations must be multiples of each other. Thus, we only need to solve the first equation.

Case 1: b ≠ 0

Since b ≠ 0, we can let $v_1 = 1$ and get $v_2 = \frac{r_1 - a}{b}$ to solve the equation.

Hence, the vector

$$
v = \begin{pmatrix} 1 \\ \frac{r_1 - a}{b} \end{pmatrix}
$$

is an eigenvector for $r_1$.

Note that this works whether the root is real or complex. In the complex case, we get an associated complex eigenvector; there is no associated real eigenvector.

If the two roots of z(r) are $r_1, r_2$ with both real and distinct, then the same formula works for each of them.

That is, an eigenvector for $r_1$ is $v_1 = \begin{pmatrix} 1 \\ \frac{r_1 - a}{b} \end{pmatrix}$ and one for $r_2$ is $v_2 = \begin{pmatrix} 1 \\ \frac{r_2 - a}{b} \end{pmatrix}$.

If the roots are real and equal, then this gives one eigenvector $v_1$.

There are two possibilities that can occur: Either all eigenvectors are non-zero multiples of $v_1$ or there is a second eigenvector $v_2$ which is not a multiple of $v_1$. In this latter case all non-zero vectors in $R^2$ in fact are eigenvectors.

Case 2: b = 0 but c ≠ 0.

In this case, we use the second equation in a similar way and get the eigenvector $v_1 = \begin{pmatrix} \frac{r_1 - d}{c} \\ 1 \end{pmatrix}$. The cases of real and equal or complex eigenvalues are similar as well.

Examples.

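1.

$$
A = \begin{pmatrix} 3 & 8 \\ -1 & -6 \end{pmatrix}
$$

Characteristic polynomial: $z(r) = r^2 + 3r - 10 = (r + 5)(r - 2)$,

roots: r = −5, 2

Eigenvalues and Eigenvectors:

$$
r = -5, \quad v = \begin{pmatrix} 1 \\ \frac{r - 3}{8} \end{pmatrix} = \begin{pmatrix} 1 \\ \frac{-5 - 3}{8} \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}
$$

$$
r = 2, \quad v = \begin{pmatrix} 1 \\ \frac{r - 3}{8} \end{pmatrix} = \begin{pmatrix} 1 \\ \frac{2 - 3}{8} \end{pmatrix} = \begin{pmatrix} 1 \\ -1/8 \end{pmatrix}
$$

2.

$$
A = \begin{pmatrix} 4 & -1 \\ 1 & 4 \end{pmatrix}
$$

Characteristic polynomial: $z(r) = r^2 - 8r + 17$,

roots:

$$
r = \frac{8 \pm \sqrt{64 - 68}}{2} = 4 \pm i
$$

$$
r = 4 + i, \quad v = \begin{pmatrix} 1 \\ \frac{4 + i - 4}{-1} \end{pmatrix} = \begin{pmatrix} 1 \\ -i \end{pmatrix}
$$

$$
r = 4 - i, \quad v = \begin{pmatrix} 1 \\ \frac{4 - i - 4}{-1} \end{pmatrix} = \begin{pmatrix} 1 \\ i \end{pmatrix}
$$

3.

$$
A = \begin{pmatrix} 6 & 0 \\ 1 & 6 \end{pmatrix}
$$

Characteristic polynomial: $z(r) = r^2 - 12r + 36 = (r - 6)^2$,

roots: 6, 6

$$
r = 6, \quad v = \begin{pmatrix} 0 \\ 1 \end{pmatrix}
$$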
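The two cases of the method can be combined into one short routine. The Python sketch below finds the roots of the characteristic polynomial with the quadratic formula, using complex arithmetic so that real and complex roots are treated alike, and then applies the Case 1 or Case 2 formula. The names `eigenpairs_2x2` and `residual` are illustrative, and the diagonal case b = c = 0, which the method above does not cover, is simply rejected here.

```python
import cmath


def eigenpairs_2x2(A):
    """Return [(r1, v1), (r2, v2)] for a 2 x 2 matrix with b != 0 or c != 0."""
    (a, b), (c, d) = A
    trace, determinant = a + d, a * d - b * c
    root_disc = cmath.sqrt(trace * trace - 4 * determinant)
    pairs = []
    for r in ((trace + root_disc) / 2, (trace - root_disc) / 2):
        if b != 0:
            v = (1, (r - a) / b)          # Case 1
        elif c != 0:
            v = ((r - d) / c, 1)          # Case 2
        else:
            raise ValueError("b = c = 0: the matrix is diagonal")
        pairs.append((r, v))
    return pairs


def residual(A, r, v):
    """Largest absolute entry of A v - r v; zero for an exact eigenpair."""
    (a, b), (c, d) = A
    return max(abs(a * v[0] + b * v[1] - r * v[0]),
               abs(c * v[0] + d * v[1] - r * v[1]))


examples = [[[3, 8], [-1, -6]], [[4, -1], [1, 4]], [[6, 0], [1, 6]]]
results = [eigenpairs_2x2(A) for A in examples]
for A, pairs in zip(examples, results):
    for r, v in pairs:
        print(r, v, residual(A, r, v))
```

For the three example matrices the residuals should vanish up to rounding, and the computed vectors should agree with the eigenvectors found by hand (the repeated root 6 in the third example is returned twice with the same eigenvector).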
