Chapter 1
Matrix Algebra. Definitions and Operations
1.1 Matrices
Matrices play very important roles in the computation and analysis of several engineering problems. First, matrices allow for compact notations. As discussed below, matrices are collections of objects arranged in rows and columns. The symbols representing these collections can then induce an algebra, in which different operations such as matrix addition or matrix multiplication can be defined. (This compact representation has become even more significant today with the advent of computer software that allows simple statements such as A+B or A*B to be evaluated directly.)

Aside from the convenience in representation, once the matrices have been constructed, other internal properties can be derived and assessed. Properties such as determinant, rank, trace, eigenvalues and eigenvectors (all to be defined later) determine characteristics about the systems from which the matrices were obtained. These properties can then help in the analysis and improvement of the systems under study.

It can be argued that some problems may be solved without the use of matrices. However, as the complexity of the problem increases, matrices can help improve the tractability of the solution.
Definition 1.1 A matrix is a collection of objects, called the elements of the matrix, arranged in rows and columns. These elements could be numbers, e.g.

$$A = \begin{pmatrix} 1 & 0 & -0.3 \\ -2 & 3+i & \tfrac{1}{2} \end{pmatrix} \qquad \text{with } i = \sqrt{-1}$$
We restrict the discussion to matrices that contain elements for which binary operations such as addition, subtraction, multiplication and division among the elements make algebraic sense.

To distinguish the elements from the collection, we refer to the valid elements of the matrix as scalars. Thus, a scalar is not the same as a matrix having only one row and one column.

We will denote the element of matrix A positioned at the ith row and jth column as $a_{ij}$. We will use capital letters to denote matrices. For example, let matrix A have m rows and n columns,
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$
We will also use the symbol "[=]" to denote "has the size", i.e. $A [=] m \times n$ means A has m rows and n columns.
A row vector is simply a matrix having one row.
$$v = (v_1, v_2, \ldots, v_n)$$

If v has n elements, then v is said to have length n. Likewise, a column vector is simply a matrix having one column.

$$v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}$$

By default, "vector" will imply a column vector, unless it has been specified to be a row vector.
A square matrix is a matrix with the same number of columns and rows. Special cases include:
(b) For the operation AB, we say A pre-multiplies B and B post-multiplies A.
(c) When the number of columns of A is equal to the number of rows of B, then we say that A is conformable with B for the operation AB.

(d) In general, AB is not equal to BA. For those special cases in which AB = BA, we say that A commutes with B.
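These statements map directly onto software, as mentioned in the introduction. A minimal sketch in Python with NumPy (the matrices here are arbitrary illustrations, not from the text):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

print(A + B)                       # matrix addition, element by element
print(A @ B)                       # A pre-multiplies B
print(B @ A)                       # generally different from A @ B
print(np.allclose(A @ B, B @ A))   # False: this A and B do not commute
```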
4. Hadamard-Schur Product. Let $A = (a_{ij})$, $B = (b_{ij})$, $C = (c_{ij})$, then
$$A \circ B = C$$
if and only if
$$c_{ij} = a_{ij} b_{ij}$$
Condition: A, B and C all have the same size.
5. Kronecker (direct) Product. Let $A = (a_{ij}) [=] m \times n$, then
$$A \otimes B = C = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{pmatrix}$$
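Both products are one-liners in NumPy; a sketch with illustrative matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])

C = A * B            # Hadamard-Schur product: c_ij = a_ij * b_ij
K = np.kron(A, B)    # Kronecker product: each a_ij becomes the block a_ij * B
print(C)             # [[ 0 10] [18 28]]
print(K.shape)       # (4, 4) = (2*2, 2*2)
```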
6. Transpose. Let $A = (a_{ij})$, then the transpose of A, denoted $A^T$, is obtained by interchanging the rows and columns, i.e. $(A^T)_{ij} = a_{ji}$.¹ For example, suppose A is given by
$$A = \begin{pmatrix} a & b & c & d \\ e & f & g & h \end{pmatrix}$$
then the transpose is given by
$$A^T = \begin{pmatrix} a & e \\ b & f \\ c & g \\ d & h \end{pmatrix}$$
¹In other journals and books, the transpose symbol is an apostrophe (′), i.e. A′ instead of $A^T$.

If $A = A^T$, then A is said to be symmetric. If $A = -A^T$, then A is said to be skew-symmetric.

If the elements of the matrix come from the complex number field, then a related operation is the conjugate transpose $A^* = (\bar{a}_{ji})$, where $A = (a_{ij})$ and $\bar{a}$ is the complex conjugate of a. If $A = A^*$ then A is said to be Hermitian. If $A = -A^*$ then A is said to be skew-Hermitian.
7. Vectorization. Let $A = (a_{ij}) [=] m \times n$,
$$x = \mathrm{vec}(A) = \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \\ \vdots \\ a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{pmatrix}$$
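Flattening in column-major (Fortran) order reproduces vec; a sketch:

```python
import numpy as np

A = np.array([[1, 4],
              [2, 5],
              [3, 6]])            # A [=] 3 x 2

x = A.flatten(order='F')         # vec(A): stack the columns top to bottom
print(x)                         # [1 2 3 4 5 6]
```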
8. Determinant. Let A be a square matrix of size n, then the determinant of A is given by
$$\det(A) \text{ or } |A| = \sum_{k_1 \neq k_2 \neq \cdots \neq k_n} p(k_1, \ldots, k_n)\, a_{1,k_1} a_{2,k_2} \cdots a_{n,k_n} \tag{1.1}$$
where $p(k_1, \ldots, k_n) = (-1)^h$ is called the permutation index and h is equal to the number of flips needed to make the sequence $k_1, k_2, k_3, \ldots, k_n$ equal to the sequence $1, 2, 3, \ldots, n$.
Example 1.1
Let A be a 3×3 matrix, then the determinant of A is obtained as follows:
$$|A| = a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}$$
♦♦♦
From the definition given, we expect the summation to consist of n! terms. This definition is not usually used when doing actual determinant calculations. Instead, it is used more for proving some theorems which involve determinants. It is crucial to remember that (1.1) is the definition of a determinant (and not the computation method using the cofactor that is developed below).

9. Cofactor of $a_{ij}$. Let $A_{ij\downarrow}$ denote a new matrix obtained by deleting the ith row and jth column of A, then the cofactor of $a_{ij}$, denoted $\mathrm{cof}(a_{ij})$, is given by $(-1)^{i+j}|A_{ij\downarrow}|$. Using cofactors, the determinant of a matrix can be obtained recursively as follows:
(a) The determinant of a 1×1 matrix is equal to that element, e.g. |(a)| = a.
(b) The determinant of an n × n matrix can be obtained by column expansion
$$|A| = \sum_{i=1}^{n} a_{ik}\,\mathrm{cof}(a_{ik}) \qquad k \text{ is any one fixed column}$$
or by row expansion
$$|A| = \sum_{j=1}^{n} a_{kj}\,\mathrm{cof}(a_{kj}) \qquad k \text{ is any one fixed row}$$
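The recursion translates directly into code. A sketch (this does O(n!) work, so it mirrors the definition rather than a practical algorithm; np.linalg.det is the workhorse in practice):

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]                       # |(a)| = a
    total = 0.0
    for j in range(n):
        # A with row 1 and column j+1 deleted (0-based indices here)
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        # (-1)**(1+(j+1)) in 1-based indexing equals (-1)**j in 0-based
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = np.array([[2.0, -3.0, 1.0],
              [4.0,  2.0, 0.0],
              [1.0,  5.0, 3.0]])
print(det_cofactor(A), np.linalg.det(A))     # both 66.0
```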
10. Matrix Adjoint. The matrix adjoint of a square matrix, denoted adj(A), is obtained by first replacing each element $a_{ij}$ by its cofactor and then taking the transpose of the resulting matrix.

12. Inverse of a Square Matrix. The matrix, denoted by $A^{-1}$, is called the inverse of A if and only if
$$A^{-1} A = A A^{-1} = I$$
where I is the identity matrix.

Condition: The inverse of a square matrix exists only if its determinant is not equal to zero. A matrix whose determinant is zero is called a singular matrix.
Lemma 1.1 The inverse of a square matrix A can be obtained using the identity
$$A^{-1} = \frac{1}{|A|}\,\mathrm{adj}(A) \tag{1.2}$$
(see page 23 for proof)
Note that even though only nonsingular square matrices have inverses, all square matrices can still have matrix adjoints.
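A sketch of the adjoint construction and identity (1.2), checked against np.linalg.inv (illustrative matrix):

```python
import numpy as np

def adjoint(A):
    """Replace each a_ij by its cofactor, then transpose."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)   # cof(a_ij)
    return C.T

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
A_inv = adjoint(A) / np.linalg.det(A)          # identity (1.2)
print(np.allclose(A_inv, np.linalg.inv(A)))    # True
```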
13. Augmentation of Matrices. Let matrices $A [=] m \times n$ and $B [=] m \times s$ have the same number of rows, then row augmentation
$$C = \mathrm{augment}_{\mathrm{row}}(A, B) \tag{1.3}$$
is obtained by simply attaching each row of A to the corresponding row of B, i.e.
$$c_{ij} = \begin{cases} a_{ij} & \text{if } j \le n \\ b_{i,j-n} & \text{if } j > n \end{cases} \qquad i = 1, \ldots, m;\; j = 1, \ldots, n+s \tag{1.4}$$
Likewise, for $A [=] n \times m$ and $B [=] s \times m$, column augmentation
$$G = \mathrm{augment}_{\mathrm{col}}(A, B) \tag{1.5}$$
is obtained by simply attaching each column of A to the corresponding column of B, i.e.
$$g_{ij} = \begin{cases} a_{ij} & \text{if } i \le n \\ b_{i-n,j} & \text{if } i > n \end{cases} \qquad i = 1, \ldots, n+s;\; j = 1, \ldots, m \tag{1.6}$$
As a shorthand, we will also use vertical bars to represent row augmentation and horizontal bars to represent column augmentation, i.e.
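In NumPy, the two augmentations correspond to horizontal and vertical stacking; a sketch:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])           # A [=] 2 x 2
B = np.array([[5],
              [6]])              # B [=] 2 x 1

C = np.hstack((A, B))            # row augmentation (A | B): 2 x 3
G = np.vstack((A, A))            # column augmentation:      4 x 2
print(C.shape, G.shape)          # (2, 3) (4, 2)
```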
14. Block Matrix Operations. The reverse of the augmentation operation is the partitioning of matrices into blocks of matrices. Once partitioned, these block matrices obtain a set of operations that could take advantage of the special structure of the resulting matrix blocks.

Suppose that the matrices can be partitioned into different blocks of appropriate (i.e. conformable) sizes, then we have the following matrix block operations:
(a) Block matrix multiplication
$$\begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} E & F \\ G & H \end{pmatrix} = \begin{pmatrix} AE + BG & AF + BH \\ CE + DG & CF + DH \end{pmatrix} \tag{1.9}$$
(b) Block matrix determinant

Let $A [=] n \times n$, $D [=] m \times m$,
$$\det \begin{pmatrix} A & 0 \\ C & D \end{pmatrix} = \det(A)\det(D) \tag{1.10}$$
$$\det \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det(A)\det(D - CA^{-1}B) \quad \text{if } A \text{ is nonsingular} \tag{1.11}$$
$$\phantom{\det \begin{pmatrix} A & B \\ C & D \end{pmatrix}} = \det(D)\det(A - BD^{-1}C) \quad \text{if } D \text{ is nonsingular} \tag{1.12}$$
(see page 22 for proof of (1.10) and page 23 for proof of (1.11).)
(c) Block matrix inverse

Let
$$\Omega = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$$
where $A [=] n \times n$ and $\Gamma = D - CA^{-1}B [=] m \times m$ are both nonsingular, then
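Although the inverse formulas (1.13)-(1.16) do not survive in this copy, identities (1.10)-(1.12) are easy to verify numerically; a sketch with random conformable blocks (shifted diagonals keep A and D nonsingular):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n, n)) + 3 * np.eye(n)
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, m)) + 3 * np.eye(m)

Omega = np.block([[A, B],
                  [C, D]])

# (1.11): det(Omega) = det(A) * det(D - C A^{-1} B)
schur = D - C @ np.linalg.inv(A) @ B
print(np.isclose(np.linalg.det(Omega),
                 np.linalg.det(A) * np.linalg.det(schur)))    # True
```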
15. Derivative of a Matrix of Functions of One Variable. The derivative of a matrix is simply defined as the matrix of derivatives of each element of the matrix, i.e.
16. Partial Derivative of a Vector of Functions with Multiple Variables
Let $f(x_1, x_2, \ldots, x_n)$ be a scalar function in which $x_1, x_2, \ldots, x_n$ are independent variables; then we will denote the operation df/dx as the 1 × n row vector given by
$$\frac{d}{dx} f(x_1, x_2, \ldots, x_n) = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right) \tag{1.18}$$
If f is a vector of m functions, then we denote df/dx as the m × n matrix given by
$$\frac{d}{dx} \begin{pmatrix} f_1(x_1, \ldots, x_n) \\ f_2(x_1, \ldots, x_n) \\ \vdots \\ f_m(x_1, \ldots, x_n) \end{pmatrix} = \begin{pmatrix} \partial f_1/\partial x_1 & \partial f_1/\partial x_2 & \cdots & \partial f_1/\partial x_n \\ \partial f_2/\partial x_1 & \partial f_2/\partial x_2 & \cdots & \partial f_2/\partial x_n \\ \vdots & \vdots & \ddots & \vdots \\ \partial f_m/\partial x_1 & \partial f_m/\partial x_2 & \cdots & \partial f_m/\partial x_n \end{pmatrix} \tag{1.19}$$
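A forward-difference approximation of (1.19) makes a handy numerical check; a sketch (the function f below is an arbitrary illustration):

```python
import numpy as np

def jacobian_fd(f, x, h=1e-6):
    """Approximate the m x n matrix df/dx of (1.19) by forward differences."""
    x = np.asarray(x, dtype=float)
    f0 = np.atleast_1d(f(x))
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h
        J[:, j] = (np.atleast_1d(f(xp)) - f0) / h   # column j: partials w.r.t. x_j
    return J

f = lambda x: np.array([x[0] * x[1], x[0] ** 2 + np.sin(x[1])])
print(jacobian_fd(f, [1.0, 2.0]))
# analytic Jacobian at (1, 2): [[2, 1], [2, cos(2)]]
```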
17. Integral of a Matrix. The integral of a matrix is defined as the matrix of integrals of each element of the matrix, i.e.
Note that the identity matrix can also be represented by $I_n = (e_1 | \ldots | e_n)$. Using the unit vectors, we could construct matrices known as elementary matrix operators. These matrices, say E, manipulate the rows of matrix A by postmultiplication to yield matrix B, i.e. B = AE. Likewise, the transpose of matrix operator E works on the columns of matrix A by premultiplication, i.e. $E^T A = B$.

By using a sequence of elementary matrix operations, via both premultiplications and postmultiplications, one can reduce square matrices to upper triangular, lower triangular or diagonal form. This process is generally known as Gaussian elimination.²
1.3.1 Elementary Matrix Operators
We will group the elementary matrix operators into only three types: permutation, combination and scaling matrices.
1. Permutation Operator
$$E_{\mathrm{permute}}(k_1, \ldots, k_n) = \left( e_{k_1} \,|\, e_{k_2} \,|\, \cdots \,|\, e_{k_n} \right), \qquad k_1 \neq k_2 \neq \cdots \neq k_n \tag{1.21}$$
When A is postmultiplied by $E_{\mathrm{permute}}$, the columns of A will be rearranged according to the sequence given by $k_1, \ldots, k_n$. Likewise, if A is premultiplied by $E_{\mathrm{permute}}^T$, the rows are permuted according to the given sequence.
Example 1.2
$$A E_{\mathrm{permute}}(3, 1, 2) = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} a_{13} & a_{11} & a_{12} \\ a_{23} & a_{21} & a_{22} \\ a_{33} & a_{31} & a_{32} \end{pmatrix}$$

$$E_{\mathrm{permute}}(3, 1, 2)^T A = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} a_{31} & a_{32} & a_{33} \\ a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}$$
♦♦♦
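Building E_permute from unit vectors takes a few lines; a sketch reproducing Example 1.2:

```python
import numpy as np

def E_permute(seq):
    """E_permute(k1,...,kn) = (e_k1 | ... | e_kn), with a 1-based sequence."""
    n = len(seq)
    E = np.zeros((n, n))
    for col, k in enumerate(seq):
        E[k - 1, col] = 1.0            # column `col` is the unit vector e_k
    return E

A = np.arange(1, 10).reshape(3, 3).astype(float)
E = E_permute([3, 1, 2])
print(A @ E)      # columns of A rearranged to (a_.3 | a_.1 | a_.2)
print(E.T @ A)    # rows of A rearranged to (row 3, row 1, row 2)
```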
Properties:
²One can use the same procedures to reduce a non-square matrix to an upper or lower echelon form.
However, during the Gaussian elimination procedure that will be described later, we actually premultiply a matrix by $E_{\mathrm{combine}}$ – not its transpose. Doing so will have a different effect. For $\left[ E^{[i]}_{\mathrm{combine}}(v) \right] A$, the jth row (j ≠ i) of A will be changed as follows:
$$a^{\mathrm{new}}_{jk} = a^{\mathrm{old}}_{jk} - v_j\, a_{ik}$$
Properties:

(a) Determinant. By expanding along the ith column,
$$\left| E_{\mathrm{combine}} \right| = 1 \tag{1.25}$$

(b) Inverse
$$\left[ E^{[i]}_{\mathrm{combine}}(v) \right]^{-1} = E^{[i]}_{\mathrm{combine}}(2e_i - v) \tag{1.26}$$

(c) Decomposition
$$E^{[i]}_{\mathrm{combine}}(v) = \prod_{k \neq i} E^{[i]}_{\mathrm{combine}}(v_k e_k + e_i) \tag{1.27}$$
3. Scale Operator
$$E_{\mathrm{scale}}(\alpha_1, \alpha_2, \ldots, \alpha_n) = \mathrm{diag}(\alpha_1, \ldots, \alpha_n), \qquad \alpha_i \neq 0 \tag{1.28}$$
When matrix $E^{[i]}_{\mathrm{scale}}$ postmultiplies a matrix A, the ith column of A is scaled by a factor $\alpha_i$. Similarly, if matrix A is premultiplied by $E^{[i]}_{\mathrm{scale}}$, the ith row of A is scaled by a factor $\alpha_i$.
For a matrix $A [=] m \times n$, the Gaussian elimination (or reduction) procedure involves a series of premultiplications and postmultiplications by $E_{\mathrm{combine}}$, $E_{\mathrm{permute}}$ and $E_{\mathrm{scale}}$ so that
$$QAW = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \tag{1.31}$$
where Q and W are products of elementary matrix operators, and r ≤ min(m, n).

Gaussian elimination procedures are useful for understanding solutions of linear equations. However, these procedures can propagate round-off errors during the determination of the required elementary operators. Hence for large matrices, with the exception of sparse matrices, Gaussian elimination is used with caution. Instead, more efficient and stable numerical methods, e.g. singular value decomposition, are preferred in most applications involving linear equations. The Gaussian elimination algorithm included below is just one among many possible methods, and hence the operators Q and W are not unique.

Rather than focusing on the actual computations of Q and W, it is more instructive for our discussion in the next chapters to note the existence of operators Q and W so that equation (1.31) is true.
1.3.3 Gaussian Elimination Algorithm
Gaussian Elimination Algorithm: Given $A [=] n \times n$, the following procedure obtains the required operators Q and W. (The algorithm for a nonsquare A is left as an exercise.)
1. Initialize: M = A, β = 1, i = 1.

2. Reduce M to upper triangular form: while i ≤ n,

(a) Determine the pivot term, $\beta = m_{kj}$, where
$$(k, j) = \arg\left[ \max_{k, j \ge i} \left( |m_{kj}| \right) \right]$$
If β = 0 then exit the while-loop; otherwise

(b) Eliminate all terms in column j (except for the kth entry) using operator $Z_i$:
$$Z_i = E^{[k]}_{\mathrm{combine}}(v) \qquad \text{where} \qquad v_\ell = \begin{cases} 1 & \text{if } \ell = k \\ -m_{\ell j}/\beta & \text{if } \ell \neq k \end{cases}$$

(c) Interchange column i and column j using $G_i = P_{i,j}$ and interchange row i and row k using $H_i = P^T_{i,k}$, where
$$P_{i,\ell} = \begin{cases} I & \text{if } \ell = i \\ E_{\mathrm{permute}}(1, 2, \ldots, i-1, \ell, i+1, \ldots, \ell-1, i, \ell+1, \ldots, n) & \text{if } \ell > i \end{cases}$$
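The procedure can be sketched in NumPy (a sketch under the stated reading of the pivot search, k, j ≥ i; the helper name gauss_reduce is ours, and the elementary operators are applied as explicit row/column updates rather than as full matrices):

```python
import numpy as np

def gauss_reduce(A, tol=1e-12):
    """Return Q, W, r with Q @ A @ W = [[I_r, 0], [0, 0]] (up to round-off)."""
    M = A.astype(float).copy()
    m, n = M.shape
    Q, W = np.eye(m), np.eye(n)        # accumulate the elementary operators
    r = 0
    for i in range(min(m, n)):
        # pivot: largest |entry| of the remaining submatrix (full pivoting)
        sub = np.abs(M[i:, i:])
        k, j = np.unravel_index(np.argmax(sub), sub.shape)
        k, j = k + i, j + i
        if abs(M[k, j]) < tol:
            break                      # no nonzero pivot left
        # permutation operators: move the pivot to position (i, i)
        M[[i, k], :] = M[[k, i], :]; Q[[i, k], :] = Q[[k, i], :]
        M[:, [i, j]] = M[:, [j, i]]; W[:, [i, j]] = W[:, [j, i]]
        # scaling operator: make the pivot equal to 1
        Q[i, :] /= M[i, i]; M[i, :] /= M[i, i]
        # combination operators: clear the rest of column i (row ops, on Q)...
        for l in range(m):
            if l != i and M[l, i] != 0.0:
                Q[l, :] -= M[l, i] * Q[i, :]
                M[l, :] -= M[l, i] * M[i, :]
        # ...and the rest of row i (column ops, on W)
        for l in range(n):
            if l != i and M[i, l] != 0.0:
                W[:, l] -= M[i, l] * W[:, i]
                M[:, l] -= M[i, l] * M[:, i]
        r += 1
    return Q, W, r

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])
Q, W, r = gauss_reduce(A)
print(r)                            # 2: the second row is twice the first
print(np.round(Q @ A @ W, 10))      # I_2 block in the top-left corner
```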
1.4 Some Important Properties of Matrix Operations
Property 1. Let A and B be square matrices of the same size, then the determinant of the product AB is the product of the determinants of A and B,
$$|AB| = |A||B| \tag{1.32}$$
(see page 21 for proof)
Property 2. If any column of A is a linear combination of the other columns of A, then det(A) = 0. Likewise, if any row of A is a linear combination of the other rows of A, then det(A) = 0. (see page 21 for proof)

Since matrix B has column k filled with 0's, expanding along this column, we find |B| = 0. Using property 1, $|A|\,|E_{\mathrm{combine}}| = 0$. Since $|E_{\mathrm{combine}}| \neq 0$, it must be that |A| = 0.
• Proof of (1.10): (cf. page 10) First, suppose A = (a), a 1 × 1 matrix. Then by expanding along the first row,
$$\left| \begin{pmatrix} a & 0 \\ C & B \end{pmatrix} \right| = a|B|$$
Now assume that (1.10) is true for $A [=] (n-1) \times (n-1)$. Let
$$Z = \begin{pmatrix} G & 0 \\ C & B \end{pmatrix}$$
with $G [=] n \times n$. By expanding along the first row,
$$|Z| = \sum_{j=1}^{n+m} z_{1j}\,\mathrm{cof}(z_{1j})$$
where
$$z_{1j} = \begin{cases} g_{1j} & j \le n \\ 0 & j > n \end{cases} \qquad \mathrm{cof}(z_{1j}) = (-1)^{1+j} \left| \begin{pmatrix} G_{1j\downarrow} & 0 \\ C & B \end{pmatrix} \right|, \quad j \le n$$
then
$$|Z| = \sum_{j=1}^{n} g_{1j}\,\mathrm{cof}(g_{1j})\,|B| = |G||B|$$
Equation (1.10) is proved by induction starting from n = 1.
• Proof of (1.11): (cf. page 10) Using (1.9), with A nonsingular,
$$\begin{pmatrix} A & D \\ C & B \end{pmatrix} \begin{pmatrix} I & -A^{-1}D \\ 0 & I \end{pmatrix} = \begin{pmatrix} A & 0 \\ C & B - CA^{-1}D \end{pmatrix}$$
Applying (1.32) and (1.10),
$$\left| \begin{pmatrix} A & D \\ C & B \end{pmatrix} \right| |I| = |A|\,|B - CA^{-1}D|$$
• Proof of the matrix inverse formula (1.2): (cf. page 9) Let $B = A\,\mathrm{adj}(A) = (b_{ij})$; then
$$b_{kk} = \sum_{j=1}^{n} a_{kj}\,\mathrm{cof}(a_{kj}) = |A|$$
If i ≠ j,
$$b_{ij} = \sum_{\ell=1}^{n} a_{i\ell}\,\mathrm{cof}(a_{j\ell})$$
which is the same as the determinant, obtained via expansion along row j, of a matrix M whose row j has been replaced by row i, i.e.
$$M = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & \ddots & \vdots \\ a_{j1} & a_{j2} & \cdots & a_{jn} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \qquad \text{with } a_{jk} = a_{ik} \text{ (determinant obtained by expanding about row } j\text{)}$$
Because row j is equal to row i, property 2 implies $b_{ij} = |M| = 0$ for i ≠ j, or B = diag(|A|, |A|, ..., |A|). Thus $A\,\mathrm{adj}(A)/|A| = I$, or $A^{-1} = \mathrm{adj}(A)/|A|$.
• Proof of (1.40): d(Ax)/dx = A. (cf. page 20)

Let n = 1. Then $x = (x_1)$ and $A = (a_{11}, \ldots, a_{m1})^T$,
$$\frac{d}{dx}\, Ax = \begin{pmatrix} d(a_{11}x_1)/dx_1 \\ \vdots \\ d(a_{m1}x_1)/dx_1 \end{pmatrix} = A$$
Now assume that (1.40) is true for some $A [=] m \times n$. Let B = (A|v), i.e. append an m × 1 vector v to A as a new last column, and let $y = (x^T, \alpha)^T$ be a new vector of length n + 1, where α is a scalar variable.
• Proof of (1.41): $d(x^T A x)/dx = x^T(A + A^T)$ (cf. page 20)

Let n = 1,
$$\frac{d}{dx}\, x^T A x = 2 x_1 a_{11} = x^T (A + A^T)$$
Now assume that (1.41) is true for $A [=] n \times n$. Let $y = (x^T, \alpha)^T$ be a new vector of length n + 1, where α is a scalar variable. Let $B [=] (n+1) \times (n+1)$ be obtained by appending vectors of constants v and w and a scalar constant β,
$$B = \begin{pmatrix} A & v \\ w^T & \beta \end{pmatrix}$$
then
$$y^T B y = x^T A x + \alpha w^T x + x^T v \alpha + \alpha \beta \alpha$$
$$\frac{d}{d\begin{pmatrix} x \\ \alpha \end{pmatrix}}\,(y^T B y) = \left( \frac{\partial(y^T B y)}{\partial x},\ \frac{\partial(y^T B y)}{\partial \alpha} \right) = \left( x^T(A + A^T) + \alpha(w^T + v^T) \ \middle|\ x^T(w + v) + 2\alpha\beta \right)$$
$$= (x^T \mid \alpha) \begin{pmatrix} A + A^T & w + v \\ w^T + v^T & 2\beta \end{pmatrix} = y^T \left( \begin{pmatrix} A & v \\ w^T & \beta \end{pmatrix} + \begin{pmatrix} A^T & w \\ v^T & \beta \end{pmatrix} \right) = y^T (B + B^T)$$
where we took advantage of the fact that $x^T v$ and $x^T w$ are 1 × 1 matrices and are therefore symmetric. Thus (1.41) is proved by induction from n = 1.
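A finite-difference check of (1.41) on a random matrix; a sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

h = 1e-6
grad_fd = np.array([((x + h * e) @ A @ (x + h * e) - x @ A @ x) / h
                    for e in np.eye(n)])      # forward differences of x^T A x

print(np.allclose(grad_fd, x @ (A + A.T), atol=1e-4))   # True, per (1.41)
```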
Exercises
E1. Show that $A^T A$, $A A^T$ and $A + A^T$ are symmetric, and $A - A^T$ is skew-symmetric.

E2. Prove that the determinant of a triangular matrix is the product of the diagonals. (Hint: Use induction.)

E11. If $A^r = 0$, with $A^{r-1} \neq 0$ for some positive integer r, then A is said to be nilpotent of index r. Show that a triangular matrix whose diagonal elements are all zero is nilpotent.
E12. Show that equations (1.13)-(1.16) yield the required block matrices which form the inverse of a partitioned matrix. (Hint: Substitute to show that
$$\begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} W & X \\ Y & Z \end{pmatrix} = I$$
is true.)
E13. The Vandermonde matrix is a matrix having a special structure given by
$$V = \begin{pmatrix} \lambda_1^{n-1} & \lambda_1^{n-2} & \cdots & \lambda_1 & 1 \\ \lambda_2^{n-1} & \lambda_2^{n-2} & \cdots & \lambda_2 & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \lambda_n^{n-1} & \lambda_n^{n-2} & \cdots & \lambda_n & 1 \end{pmatrix}$$
Prove that the determinant is given by
$$|V| = \prod_{i<j} (\lambda_i - \lambda_j)$$
(Hint: Use the fact that the product series can be rearranged to become
$$\prod_{i<j} (\lambda_i - \lambda_j) = \sum_{k=1}^{n} \left( (-1)^{k+1} \lambda_k^{n-1} \prod_{i<j,\; i \neq k,\; j \neq k} (\lambda_i - \lambda_j) \right)$$
and then prove using induction starting with n = 2.)
E14. Let $A [=] m \times q$ and $B [=] q \times n$. Show that
$$\frac{d}{dt}(AB) = \left(\frac{d}{dt}A\right) B + A \left(\frac{d}{dt}B\right)$$
E15. Unsteady-state heat conduction in a flat rectangular plate, say of dimension L × W, can be described by

A finite difference approximation to the differential equation (with Δx = Δy, L = NΔx and W = MΔx) is given by
$$T_{n,m}(k+1) = \frac{1}{\mu}\left( \left[ T_{n-1,m}(k) + \left(\frac{\mu}{2} - 2\right) T_{n,m}(k) + T_{n+1,m}(k) \right] + \left[ T_{n,m-1}(k) + \left(\frac{\mu}{2} - 2\right) T_{n,m}(k) + T_{n,m+1}(k) \right] \right) \tag{1.42}$$
where
$$\mu = \frac{\alpha (\Delta x)^2}{\Delta t}$$
is the modulus, which needs to be > 4 for the approximation to be stable. T(k) is the temperature distribution at time t = kΔt of the rectangular grid, and T(k) is an N × M matrix of temperature values.
1. Obtain matrices A, B and C such that the finite difference equation in (1.42) can be written as
$$T(k+1) = A\,T(k) + T(k)\,B + C \tag{1.43}$$

2. (Computer project) Using available matrix software, obtain and animate the temperature profiles using (1.43) from k = 0 to k = 1000, using μ = 8, N = 50 and M = 100. (To reduce memory problems, store only the images at every 10th k-increment.)
E16. The general equation of an ellipse is given by
$$\left(\frac{x}{a_1}\right)^2 + \left(\frac{y}{a_2}\right)^2 - \frac{2xy}{a_1 a_2}\cos\delta = \sin^2\delta \tag{1.44}$$
where $a_1, a_2, \sin(\delta) \neq 0$.

Let $v = (x, y)^T$; find matrix A such that equation (1.44) can be written as
$$v^T A v = 1$$
(Note: $v^T A v = 1$ is the general equation of a conic and if A is positive definite, a property to be discussed in the next chapter, then the conic is an ellipse.)
E17. Discuss the changes needed for the Gaussian elimination algorithm given in section 1.3.3 to apply to nonsquare matrices of size n × m, n ≠ m.
E18. Show that tr(AB)=tr(BA) (assuming conformability conditions are met).
Chapter 2
The Linear Equation: Ax=b
In this chapter, we will treat the equation,
Ax = b (2.1)
in three perspectives. The first view is to treat (2.1) as the matrix description of a set of n linear equations in which $x = x_1, \ldots, x_n$ is the set of unknowns which needs to be evaluated such that all the equations are satisfied. The objective is to determine the set x. If no set exists that achieves equality, the problem is said to have no solution. This view will be referred to as the solution of linear equations.

The second view is to consider matrix $A = (a_1 | \ldots | a_n)$ as a set of column vectors, $a_i$. The column vector $x = (x_1, \ldots, x_n)^T$ is a set of scalar weights $x_i$ which linearly combine the columns of A. The problem is to find x such that the sum $a_1 x_1 + \cdots + a_n x_n$ matches vector b. In some cases, an exact match cannot be obtained. The problem is then relaxed to that of finding the closest match, e.g. in a least squared error sense. The column vectors of A form the basis set on which an algebra is performed. This view is thus referred to as the linear algebra treatment of (2.1).

The third view is to consider matrix A as an operator which moves the position of vector x to a new position described by vector b. In this treatment, the problem is not so much the evaluation of x. Instead, the main focus is the analysis of the properties of A, i.e. to determine what it does to various input vectors x as well as what output vectors b can result. This third view is the linear operator view of (2.1).

All three views are important in engineering applications. In solving linear equations, the main purpose is the exact determination of unknown values. The second view, i.e. the linear algebra perspective, is useful for mathematical modeling where the best approximation is the focus. The third view, the linear operator perspective, is useful for the design/redesign of operator A to achieve desired characteristics.

Unfortunately, the coexistence of these different views can cause confusion. One source of confusion is the term "coefficients". When solving linear equations, the rows of A are the coefficients of each equation. However, when taking the linear algebra view, x is the set of weights, or coefficients, and not matrix A.

Since all three viewpoints consider exactly the same equation (2.1), it is inevitable that the tools developed within each perspective will enter in the treatment of the other perspectives. Thus it is important to be able to distinguish among the three perspectives while still being able to temporarily switch views when needed.
2.1 Solution of Linear Equations
For a given set of linear equations,
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n \end{aligned} \tag{2.2}$$
where $a_{ij}$ and $b_i$ are fixed constants, the problem is to find the values of $x_1, x_2, \ldots, x_n$ which satisfy all the equations simultaneously.

The set of equations in (2.2) can be written in matrix notation,
$$Ax = b$$
where
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$
Three cases are possible: (1) there is a unique solution, (2) there are multiple (indeed infinite) solutions, and (3) no solution is possible that would satisfy all the given equations. Matrix theory will be used to determine which case the problem falls under, as well as the solutions (if they exist).
Case 1: Unique Solution Exists.
If $A^{-1}$ exists, then the solution is unique and is immediately obtained by premultiplying (2.1) by $A^{-1}$ to yield
$$x = A^{-1}b \tag{2.3}$$
In some applications, one might be interested only in a much smaller subset of $x_1, \ldots, x_n$. A method known as Cramer's rule is useful in these instances.

If A is nonsingular then the jth element of the solution vector x for Ax = b is given by
$$x_j = \frac{|\Omega(j)|}{|A|} \tag{2.4}$$
where Ω(j) is matrix A whose jth column is replaced by vector b.
(see page 64 for proof.)
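A sketch of Cramer's rule (2.4), with an arbitrary 2 × 2 system for illustration:

```python
import numpy as np

def cramer(A, b, j):
    """x_j via (2.4): replace column j of A by b (j is 1-based)."""
    Omega = A.astype(float).copy()
    Omega[:, j - 1] = b
    return np.linalg.det(Omega) / np.linalg.det(A)

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])
print(cramer(A, b, 1), cramer(A, b, 2))   # 1.0 3.0
print(np.linalg.solve(A, b))              # same answer
```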
Case 2: Multiple Solutions.
If A is singular then either an infinite number of solutions exist or no solution exists. Recall from the Gaussian elimination procedure discussed in section 1.3.2 that there exist nonsingular matrices Q and W such that
$$QAW = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \tag{2.5}$$
(if A is nonsingular then r = n and QAW = I)
Applying these matrices to (2.1),
$$(QAW)(W^{-1}x) = Qb$$
If we let $y = W^{-1}x$ and partition y and Q as follows:
$$y = \begin{pmatrix} y^{(1)} \\ y^{(2)} \end{pmatrix}, \qquad Q = \begin{pmatrix} Q^{(1)} \\ Q^{(2)} \end{pmatrix}$$
where $y^{(1)} [=] r \times 1$ and $Q^{(1)} [=] r \times n$, this results in two sets of matrix equations:
$$y^{(1)} = Q^{(1)}b \tag{2.6}$$
$$0\, y^{(2)} = Q^{(2)}b \tag{2.7}$$
Equation (2.7) can now be used to determine if a solution exists. A solution is possible only if $Q^{(2)}b = 0$.

If consistency is present, the solution is given by
$$x = W \begin{pmatrix} Q^{(1)}b \\ y^{(2)} \end{pmatrix}$$
in which $y^{(2)}$ becomes an (n − r) × 1 vector of arbitrary constants.
and r = 3. $Q^{(1)}$ consists of the first three rows of Q while $Q^{(2)}$ consists of the last row of Q. Since
$$Q^{(2)}b = (1, -2/5, -4/5, 0) \begin{pmatrix} 8 \\ 10 \\ 5 \\ 14 \end{pmatrix} = (0)$$
there are multiple solutions. Setting $y^{(2)} = (\alpha)$,
$$x = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & -1 \\ 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} \begin{pmatrix} 0 & -7/5 & 1/5 & 1 \\ 0 & -6/5 & -2/5 & 1 \\ 0 & 3 & 0 & -2 \end{pmatrix} \begin{pmatrix} 8 \\ 10 \\ 5 \\ 14 \end{pmatrix} \\ \alpha \end{pmatrix} = \begin{pmatrix} \alpha \\ \alpha \\ 2 - \alpha \\ 1 \end{pmatrix}$$

♦♦♦
If one is interested only in determining whether a solution exists or not, another index, called the rank, can be used. This index has already appeared earlier as r in (2.5).

Definition 2.1 The rank of a matrix A, denoted rank(A), is the size of the largest nonsingular square submatrix obtained from the columns and rows of A.

With this definition, and the property that |AB| = |A||B|, one could see that with B nonsingular,
$$\mathrm{rank}(A) = \mathrm{rank}(AB) = \mathrm{rank}(BA) \tag{2.8}$$
Since Q and W used in the Gaussian elimination procedure are nonsingular, the size of the identity matrix $I_r$ in (2.5) turns out to be the rank of A.
Next, obtain a partitioned matrix
$$G = (A \mid b)$$
and
$$QG = (QA \mid Qb) = \begin{pmatrix} V & Q^{(1)}b \\ 0 & Q^{(2)}b \end{pmatrix} \tag{2.9}$$
where V is a matrix formed by extracting the first r rows of $W^{-1}$, and rank(V) = rank(QA) = r. From (2.9) we arrive at the rank test for the existence of solutions.
Theorem 2.2 A solution exists for (2.1) if and only if
$$\mathrm{rank}(A) = \mathrm{rank}((A \mid b)) \tag{2.10}$$

For the special case of b = 0, the rank condition (2.10) is immediately satisfied. If A is nonsingular, the only solution is x = 0. However, if A is singular, nonzero solutions exist, even though x = 0 still satisfies Ax = 0. x = 0 is referred to as the trivial solution since it is always a solution of Ax = 0, while the nonzero solutions, x ≠ 0, are called the nontrivial solutions.
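The rank test (2.10) is a one-liner per case with np.linalg.matrix_rank; a sketch on a singular matrix with one consistent and one inconsistent right-hand side:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])        # singular, rank 1
b1 = np.array([3.0, 6.0])         # lies in the span of the columns of A
b2 = np.array([3.0, 5.0])         # does not

for b in (b1, b2):
    rA = np.linalg.matrix_rank(A)
    rAb = np.linalg.matrix_rank(np.column_stack((A, b)))
    print(rA, rAb, "solution exists" if rA == rAb else "no solution")
```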
Case 3: No Solution.
As a corollary to theorem 2.2 we have
Corollary 2.3 A solution does not exist for (2.1) if and only if
rank(A) < rank((A | b))
Suppose there is additional knowledge on which equations may be unreliable, say equation j. One could set $b_j$ as an unknown to determine a new value for $b_j$ to satisfy the rank test (2.10), i.e. determine $b_j$ such that $Q^{(2)}b = 0$.

Matrix A is the same as in example 2.1, so we can use Q and W determined from that example. However, there is no solution for this example because $Q^{(2)}b = (-19/5) \neq (0)$.
Suppose the second equation is unreliable. We want to find b2 so that Q(2)b = 0.
$$Q^{(2)}b = (1, -2/5, -4/5, 0) \begin{pmatrix} 3 \\ b_2 \\ 6 \\ 6 \end{pmatrix} = \left( -\frac{9 + 2 b_2}{5} \right)$$
i.e. set $b_2 = -9/2$. With this new vector b, the case of multiple solutions can now be solved.
♦♦♦
The results for solving Ax = b for x can be generalized to other linear problems. For instance, the Lyapunov equations for the unknown matrix X are given by the form
$$BX + XC = E \tag{2.11}$$
where $B [=] n \times n$, $C [=] m \times m$, $X [=] n \times m$ and $E [=] n \times m$. Except for the special case when either B or C is an identity matrix, one cannot solve (2.11) by a direct matrix inversion approach. However, using the properties of Kronecker products (cf. page 6) and vectorization operations (cf. page 7), one can easily transform problems like (2.11) into the familiar form of Ax = b. This transformation depends on the two lemmas given below.
Lemma 2.1 For matrices X[=]m × n and Y [=]m × n,
vec (X + Y ) = vec (X) + vec (Y ) (2.12)
Lemma 2.2 For matrices $B [=] p \times q$, $X [=] q \times r$ and $C [=] r \times s$,
$$\mathrm{vec}(BXC) = (C^T \otimes B)\,\mathrm{vec}(X)$$
Returning to (2.11), we can now apply both lemmas and obtain
$$\mathrm{vec}(BX + XC) = \mathrm{vec}(E)$$
$$(I_m \otimes B)\,\mathrm{vec}(X) + \left( C^T \otimes I_n \right) \mathrm{vec}(X) = \mathrm{vec}(E)$$
or
$$Ax = b$$
where
$$A = I_m \otimes B + C^T \otimes I_n$$
and x = vec(X), b = vec(E). With this transformation, the same criterion can be used on A to determine whether the linear equation has a unique, non-unique or no solution. Of course, after x has been determined, an additional step is required to reverse the vectorization operation to reshape x to have the same size as X.
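A sketch of the whole transformation for (2.11), with random conformable matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 2
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, m))
E = rng.standard_normal((n, m))

# BX + XC = E  ->  (I_m kron B + C^T kron I_n) vec(X) = vec(E)
A = np.kron(np.eye(m), B) + np.kron(C.T, np.eye(n))
x = np.linalg.solve(A, E.flatten(order='F'))   # vec(E) in column-major order
X = x.reshape((n, m), order='F')               # reverse the vectorization

print(np.allclose(B @ X + X @ C, E))           # True
```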
Exercises
E1. An N-stage evaporator system is shown in Figure 2.1. Brine having a mass
Figure 2.1: Flowsheet of an N-stage evaporator system.
fraction $x_{L_{N+1}}$ salt is fed to stage N at a rate $n_{L_{N+1}}$. Condensation of steam from stage (i − 1) is used to evaporate some water that is in the brine fed to stage i. By fixing the pressure at each stage i and assuming that the enthalpy of the brine

where $n_{V_0}$ can be extracted as the last element of $n = G^{-1}v$.
G is of size (2N + 1) × (2N + 1) but relatively sparse. There exist numerical methods which could take advantage of the sparseness. Another method is to manipulate the matrix equations. Show that $n_{V_0}$ can be obtained from (2.18), (2.19) and (2.20) to be
$$n_{V_0} = \frac{x_{L_{N+1}} n_{L_{N+1}} - g^T (AC - B)^{-1}(Af - h)}{g^T (AC - B)^{-1} q} \tag{2.25}$$
Note that equation (2.25) involves the inverse of a smaller matrix (AC − B) that is of size N × N.

(Hint: first solve for $n_V$ in equation (2.19), then substitute into (2.18). Finally, substitute into (2.20) and solve for $n_{V_0}$.)
4. Show that you can also derive equation (2.25) using Cramer's rule together with equations (1.11) and (1.13)-(1.16).
E2. For the multiple reversible first-order batch reaction shown in Figure 2.2, the
Figure 2.2: A schematic of three-component reversible reaction.
with $x_i$ being the concentration of component i (in units of mass/volume) and $k_{ij}$ being the specific rate constant of the formation of j from i. By noting that A is singular, the equation should yield multiple solutions. However, starting with initial concentration $x_{i0}$ for component i, the equilibrium is unique. What is the missing equation? Is it possible, even with the additional equation, that a case of non-unique solution or a case with no solution may still occur? Explain.
E3. Let $A [=] n \times n$, $X [=] n \times m$ and $C [=] n \times m$; show that equation AX = C can be rewritten as
$$(I_m \otimes A)x = c$$
where x = vec(X) and c = vec(C).
E4. Let $A [=] n \times n$, $B [=] m \times m$, $G [=] n \times n$, $H [=] m \times m$, $X [=] n \times m$ and $C [=] n \times m$. To solve for X in
$$AX + XB + GXH = C$$
determine D such that
$$Dx = c$$
where x = vec(X) and c = vec(C).
2.2 Linear Algebra Interpretation of Ax=b
The linear algebra perspective on the equation Ax = b is especially useful in mathematical modeling, where linear regressions are needed to determine unknown model parameters.

To illustrate, suppose a study has gathered some experimental data, shown in Table 2.1, to relate the effects of temperature and pressure on concentration.
The investigator wishes to obtain a linear model given by
C = αT + βP + γ (2.26)
that would best fit the data. To determine the unknown parameters of the model (2.26), let us first assume that the model will fit each data point.

It is highly unlikely that all the equations in (2.27) can be satisfied simultaneously. Further, matrix A is not a square matrix and does not have an inverse. What we will develop in this chapter is a pseudo-inverse, $A^\dagger$, which acts almost like an inverse in the sense that $x = A^\dagger b$ becomes the best approximate solution.
Consider a set of vectors $a_1, a_2, \ldots, a_m$, each having n elements,
$$a_1 = \begin{pmatrix} a_{11} \\ a_{12} \\ \vdots \\ a_{1n} \end{pmatrix}; \quad a_2 = \begin{pmatrix} a_{21} \\ a_{22} \\ \vdots \\ a_{2n} \end{pmatrix}; \quad \ldots; \quad a_m = \begin{pmatrix} a_{m1} \\ a_{m2} \\ \vdots \\ a_{mn} \end{pmatrix} \tag{2.28}$$
Each vector describes a point in an n-dimensional space.
1. Linear Combination and Basis Vectors.
A linear combination of vectors $a_1, \ldots, a_m$ (each of dimension n ≥ m) is a weighted sum of these vectors to yield another vector, say b, of the same dimension,
$$b = x_1 a_1 + \cdots + x_m a_m \tag{2.29}$$
where x1, x2, . . . , xm are scalars.
2. Span
In (2.29), if each scalar $x_i$ is allowed to take on a range of values, then vector b will reside in a subspace of dimension less than or equal to m. This subspace is called the span, and it contains all the vectors resulting from a linear combination of the spanning vectors, including the origin.
$$\mathrm{Span}(a_1, \ldots, a_m) = x_1 a_1 + \cdots + x_m a_m \tag{2.30}$$
where the scalars $x_i$ are real or complex depending on the application.
3. Euclidean Norm
The norm of a vector is the distance of the point from the origin. One specific measure, called the Euclidean norm, uses the Pythagorean theorem to obtain the distance between two points: one point is the position represented by the vector and the other point is the origin. The Euclidean norm¹ for vector $a_i$, denoted by $\|a_i\|$, is evaluated as follows:
$$\|a_i\| = \sqrt{a_{i1}^2 + \cdots + a_{in}^2} \tag{2.31}$$
Example 2.3
¹This norm is also known as the 2-norm, since each element is raised to the second power, and is also denoted by $\|a_i\|_2$.
Since there are three elements in vector $a_4$, the dimension is n = 3. The norm of $a_4$ is given by
$$\|a_4\| = \sqrt{(-2)^2 + 2^2 + 1^2} = 3$$
Figure 2.3 shows the point described by vector $a_4$ and its norm.
Figure 2.3: The norm of vector a4.
The span of $a_1$, $a_3$ and $a_4$ is given by
$$q = \mathrm{Span}(a_1, a_3, a_4) = x_1 a_1 + x_3 a_3 + x_4 a_4 = \begin{pmatrix} 2x_1 + 2x_3 - 2x_4 \\ -2x_1 + 2x_3 + 2x_4 \\ 2x_1 - 2x_3 + x_4 \end{pmatrix}$$
which turns out to be the whole three-dimensional space as $x_1$, $x_3$ and $x_4$ take on any real values. This is shown in Figure 2.4. Note that the vectors $a_1$, $a_3$
It turns out that the points $a_1$, $a_2$ and $a_3$, plus the origin, lie in a two-dimensional plane described by
$$y + z = 0$$
Thus the dimension of the span space could be less than m, the number of spanning vectors. It turns out that in this example $a_3 = -a_1 - 2a_2$, i.e. $a_3$ is linearly dependent on vectors $a_1$ and $a_2$. The notion of linear independence is defined more formally below.
♦♦♦
Definition 2.2 A set of vectors $a_1, a_2, \ldots, a_m$ are linearly independent if the only possibility for the linear combination
$$x_1 a_1 + x_2 a_2 + \cdots + x_m a_m = 0$$
to be true is for the scalars $x_1, \ldots, x_m$ to be all zero. Otherwise, the set is linearly dependent.

If the spanning vectors $a_1, a_2, \ldots, a_m$ are linearly independent, then they are called the basis vectors.
One method for checking linear independence is through the use of the Grammian matrix.
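The Grammian's definition falls into a gap of this copy; as noted later in this chapter, $A^T A$ is the Grammian for the columns of A, and it is nonsingular exactly when those columns are linearly independent. A sketch using the vectors $a_1$, $a_3$, $a_4$ of Example 2.3:

```python
import numpy as np

A = np.array([[ 2.0,  2.0, -2.0],     # columns are a1, a3, a4
              [-2.0,  2.0,  2.0],
              [ 2.0, -2.0,  1.0]])

G = A.T @ A                   # Grammian of the columns of A
print(np.linalg.det(G))       # nonzero => the columns are linearly independent
```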
(In this section, we will restrict the elements of the vectors and matrices to be real numbers.)
The corresponding scalar coefficients of a linear combination can be collected into a vector x,
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}$$
and if x is allowed to be arbitrary, the span is given by the product Ax.

If vector b is included in the span of Ax, equality for Ax = b is possible. On the other hand, if vector b is not in the span, the best one could do is to find a point in the span that is closest to the point b. This means that a value for x has to be found that would minimize the error vector
$$e = Ax - b$$
Figure 2.6: Graphical interpretation of the least square problem.
One choice for measuring the error is the Euclidean norm. However, for simpler calculations, the problem does not change if we use the square of the norm instead. The minimization problem is then formulated as
$$x_{\mathrm{lsq}} = \arg\left( \min_x \|Ax - b\|^2 \right)$$
This is often referred to as the least squares problem, and $x_{\mathrm{lsq}}$ is referred to as the least squares solution.²
To find the minimum of a function that is dependent on only one variable, the result from calculus states that at the minimum point, the derivative has to be zero, while at the same time the second derivative has to be positive (i.e. concave upwards). For situations in which the function to be minimized depends on more than one variable, we will need an extension that is given in theorem 2.5. It states that the partial derivatives with respect to each independent variable have to be zero. In addition, for the function to attain a minimum, it is sufficient that the matrix containing the second-order partial derivatives, called the Hessian, be positive definite.

²Strictly speaking, it should be "least squares error" problem and solution, but the term "error" is dropped since the method applies to more general situations where the objective functions do not involve errors.
Definition 2.4 A matrix $P [=] n \times n$ is positive definite if $x^T P x > 0$ for all x ≠ 0.
Example 2.4
Let A be given by
$$A = \begin{pmatrix} 3 & -1 \\ -1 & 2 \end{pmatrix}$$
then
$$x^T A x = (x_1, x_2)\, A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 3x_1^2 - 2x_1 x_2 + 2x_2^2 = 2x_1^2 + (x_1^2 - 2x_1 x_2 + x_2^2) + x_2^2 = 2x_1^2 + (x_1 - x_2)^2 + x_2^2$$
Thus A is positive definite because $x^T A x$ is always positive as long as $x_1$ and $x_2$ are not both zero. The process just shown is called completing the squares. (This method can be too complicated for large dimensions. There are other more efficient methods for determining positive definiteness of square matrices which will be discussed in later sections.) To show why positive definiteness is an extension of the concept of concavity, a plot of $x^T A x$ is shown in Figure 2.7.
♦♦♦
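The completing-the-squares test does not scale, as noted; for a quick numerical check, two standard alternatives (both connected to material the text defers to later sections) are sketched below on the same matrix:

```python
import numpy as np

A = np.array([[ 3.0, -1.0],
              [-1.0,  2.0]])

# a symmetric A is positive definite iff all its eigenvalues are positive
print(np.all(np.linalg.eigvalsh(A) > 0))   # True

# Cholesky factorization succeeds only for positive definite matrices
np.linalg.cholesky(A)                      # raises LinAlgError otherwise
```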
Theorem 2.5 Let f(x) be a scalar function of vector x. Then $x_{\mathrm{opt}}$ minimizes the value of f if
$$\frac{df}{dx}(x = x_{\mathrm{opt}}) = 0$$
and the Hessian matrix H is positive definite, where H is given by
$$H = \left( \frac{\partial^2 f}{\partial x_i\, \partial x_j} \right)$$
Figure 2.7: Plot of xT Ax, where A is positive definite.
Expanding the square of the norm of vector e,
$$\|e\|^2 = (Ax - b)^T (Ax - b) = (x^T A^T - b^T)(Ax - b) = x^T A^T A x - 2 b^T A x + b^T b$$
$$\frac{d}{dx}\|e\|^2 = 2 x^T A^T A - 2 b^T A$$
After taking the transpose and equating to zero,
$$A^T A\, x_{\mathrm{lsq}} = A^T b \tag{2.32}$$
If $A^T A$ is nonsingular,
$$x_{\mathrm{lsq}} = (A^T A)^{-1} A^T b \tag{2.33}$$
This equation is referred to as the normal equation.³ To determine if the solution (2.33) indeed yields a minimum, we need to check whether
$$\frac{d}{dx}\left( \frac{d}{dx}\|e\|^2 \right) = 2 A^T A$$

³The term is due to the matrix $A^T A$ being a normal matrix. As to be discussed later, a matrix N is normal if $N^T N = N N^T$, of which symmetric matrices are special cases.
is positive definite. If $A^T A$ is nonsingular, it is immediately a positive definite matrix. To show this, let s = Ax. Then
$$p = x^T (A^T A) x = (x^T A^T)(A x) = s^T s$$
and p > 0 as long as s ≠ 0.

Recall also that $A^T A$ is the Grammian for the columns of A. Thus, as long as the columns of A are linearly independent, $A^T A$ is guaranteed to be nonsingular and positive definite. This means that the normal equation yields a least squared error solution to Ax = b.
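A sketch of the normal equation on an overdetermined system (illustrative numbers), checked against NumPy's stock least squares solver:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([2.1, 3.9, 6.2, 8.1])

x_lsq = np.linalg.solve(A.T @ A, A.T @ b)     # normal equation (2.32)
print(x_lsq)
print(np.linalg.lstsq(A, b, rcond=None)[0])   # same answer, stabler solver
```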
Let us take another look at the normal equation (2.33). The product $[(A^T A)^{-1} A^T]$ acts like an inverse in solving the problem Ax = b. This group of terms is often referred to as the pseudoinverse of A, often denoted as $A^\dagger$. If A is square and nonsingular, the inverse of A and the pseudoinverse of A are exactly the same:
$$A^\dagger = (A^T A)^{-1} A^T = A^{-1}(A^T)^{-1} A^T = A^{-1}$$
For more general cases, e.g. including when $A^T A$ is singular, the pseudoinverse (sometimes called the Moore-Penrose generalized inverse) is defined as the matrix $A^\dagger$ that satisfies the following conditions:
1. AA† and A†A are Hermitian
2. AA†A = A
3. A†AA† = A†
2.2.3 Linear-in-parameters Model
Often times, the models required to fit the data can be rearranged into a form that is linear in the unknown parameters,
$$g(y_1, \ldots, y_n) = \alpha_1 f_1(y_1, \ldots, y_n) + \cdots + \alpha_m f_m(y_1, \ldots, y_n) \tag{2.34}$$
where $\alpha_1, \ldots, \alpha_m$ are the unknown parameters, $y_1, \ldots, y_n$ are the variables that are observed in an experiment, and $g, f_1, \ldots, f_m$ are functions which do not contain any unknown parameters.

Once in this form, the matrix notation becomes Ax = b, where

The subscript [i] denotes evaluation of the function using the values observed at the ith sample, and T > m is the total number of samples in the experiment. The unknown parameters can then be easily determined using the normal equation, as long as $A^T A$ is nonsingular, i.e. the columns of A need to be linearly independent.
Example 2.5
Consider the raw data from the laboratory for the determination of vapor pressure as a function of temperature:
Table 2.2: Raw Data from Vapor Pressure Experiment.
Suppose one wishes to determine the coefficients of the Antoine Equation given by:
$$\log_{10}(P^{\mathrm{vap}}) = A - \frac{B}{T + C} \tag{2.36}$$
where $P^{\mathrm{vap}}$ is the vapor pressure in mm Hg and T is the temperature in °C. The model given in (2.36) is not linear in the parameters A, B and C. We can rearrange the equation to make it amenable to the normal equation:
$$(T + C)\log_{10}(P^{\mathrm{vap}}) = (T + C)A - B$$
$$T \log_{10}(P^{\mathrm{vap}}) = -C \log_{10}(P^{\mathrm{vap}}) + AT + (AC - B) \tag{2.37}$$
Compared with (2.34), we have $g(T, P) = T \log_{10}(P^{\mathrm{vap}})$, $f_1(T, P) = \log_{10}(P^{\mathrm{vap}})$, $f_2(T, P) = T$ and $f_3(T, P) = 1$. These functions can be evaluated at each data point as shown in Table 2.3. The parameters are recast as $\alpha_1 = -C$, $\alpha_2 = A$ and $\alpha_3 = AC - B$.
Table 2.3: Function Evaluations for Different Samples.
Figure 2.8: Comparison of the Antoine model and raw data.
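The linearized fit can be wrapped in a few lines; a sketch (the function name antoine_fit is ours, and since the raw data of Table 2.2 is not reproduced here, the function takes the measured arrays as inputs):

```python
import numpy as np

def antoine_fit(T, Pvap):
    """Fit log10(Pvap) = A - B/(T + C) via the linearized form (2.37)."""
    T = np.asarray(T, dtype=float)
    logP = np.log10(np.asarray(Pvap, dtype=float))
    M = np.column_stack((logP, T, np.ones_like(T)))   # columns f1, f2, f3
    g = T * logP                                      # left-hand side g(T, P)
    a1, a2, a3 = np.linalg.solve(M.T @ M, M.T @ g)    # normal equation
    C = -a1                 # alpha1 = -C
    A = a2                  # alpha2 = A
    B = A * C - a3          # alpha3 = A*C - B
    return A, B, C
```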
2.2.4 Least Squares Approximation Under Equality Constraints.
Often times, the physics of the system requires that the model strictly pass through specified points. For instance, for multicomponent systems an fL-fV diagram requires that at least the mole fraction in the liquid phase fL = 0 when the mole fraction in the vapor phase fV = 0. Likewise, fL = 1 when fV = 1. Constraints can generally be handled by using methods such as calculus of variations. However, in cases where the constraints can be set up as systems of linear algebraic equations and where the model can be formulated in a form that is linear in parameters, a simple modification of the normal equation offers a more direct solution.
Let x be the model parameters, i.e.
$$u_1 x_1 + u_2 x_2 + \cdots + u_n x_n = y$$
or
$$Ux = Y \tag{2.38}$$
that need to be identified based on available measurements of $u_i$ and y. Also, suppose that there are r (< n) independent equality constraints, i.e.

where $Q_a$ is nonsingular. (Note: reordering or reindexing of the parameters may be necessary in most cases to have $Q_a$ nonsingular.) Then
$$x_a = -Q_a^{-1} Q_b x_b + Q_a^{-1} Z \tag{2.39}$$
Let $U_a$ be the matrix formed from the first r columns $u_1, \ldots, u_r$. Let $U_b$ be formed using the remaining n − r columns $u_{r+1}, \ldots, u_n$. Then equation (2.38) becomes
$$(U_a \mid U_b) \begin{pmatrix} x_a \\ x_b \end{pmatrix} = U_a x_a + U_b x_b = Y$$
Using $x_a$ from equation (2.39),
$$U_a(-Q_a^{-1} Q_b x_b + Q_a^{-1} Z) + U_b x_b = Y$$
$$(U_b - U_a Q_a^{-1} Q_b) x_b = (Y - U_a Q_a^{-1} Z)$$
The least squares solution of $x_b$ is then given by
$$x_b = (M^T M)^{-1} M^T W \tag{2.40}$$
where $M = U_b - U_a Q_a^{-1} Q_b$ and $W = Y - U_a Q_a^{-1} Z$. Finally, $x_a$ is evaluated by substitution of $x_b$ in equation (2.39).
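The elimination-plus-normal-equation recipe fits in one routine; a sketch (the helper name constrained_lsq is ours; parameters are assumed ordered so that Qa, the first r columns of Q, is nonsingular):

```python
import numpy as np

def constrained_lsq(U, Y, Q, Z):
    """Least squares for U x = Y subject to Q x = Z, per (2.39)-(2.40)."""
    r = Q.shape[0]
    Qa, Qb = Q[:, :r], Q[:, r:]
    Ua, Ub = U[:, :r], U[:, r:]
    QaQb = np.linalg.solve(Qa, Qb)           # Qa^{-1} Qb
    QaZ = np.linalg.solve(Qa, Z)             # Qa^{-1} Z
    M = Ub - Ua @ QaQb
    W = Y - Ua @ QaZ
    xb = np.linalg.solve(M.T @ M, M.T @ W)   # (2.40)
    xa = QaZ - QaQb @ xb                     # (2.39)
    return np.concatenate((xa, xb))
```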
Example 2.6
Consider the task of finding a second order polynomial model approximation for the fL-fV diagram,
$$f_V = \alpha f_L^2 + \beta f_L + \gamma \tag{2.41}$$
to fit the set of vapor-liquid equilibrium data for a binary system that is given in Table 2.4.

It is required that $f_L = 0$ at $f_V = 0$ and $f_L = 1$ at $f_V = 1$. (For some systems, an azeotrope point needs to be fixed at $f_{L,az} = f_{V,az}$, $0 < f_{L,az} < 1$.) The regression has to obtain a model which passes through these fixed points. Based on the model (2.41), the constraints are given as
Without imposing the constraints, i.e. applying the normal equation on Ux = Z, the fit is given by
$$f_V = -0.6349 f_L^2 + 1.5999 f_L + 0.0262 \tag{2.43}$$
The plots shown in Figure 2.9 compare the model fit that uses constraints with the one that does not.
Figure 2.9: Comparison of models using constraints and not using constraints ($f_L$ vs. $f_V$).
♦♦♦
Exercises
E1. Determine which of the following matrices are positive definite:
$$A = \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix} \qquad B = \begin{pmatrix} 3 & 4 \\ 1 & 0 \end{pmatrix} \qquad C = \begin{pmatrix} 0 & 2 \\ -2 & 0 \end{pmatrix}$$
E2. The linear regression model is given by
$$y = mx + b$$
where x and y are the independent and dependent variables, respectively, m is the slope of the line and b is the intercept. Show that the familiar formulas
E3. From the data given in Table 2.5, obtain the parameters $\alpha_1$, $\alpha_2$ and $\alpha_3$ that would yield the least squares fit using the following model:
$$z = (\alpha_1 w^2 + \alpha_2 w + \alpha_3)\sin(2\pi w) \tag{2.44}$$

E4. Given the vapor liquid equilibrium data shown in Table 2.6, obtain a 5th order polynomial fit that satisfies the following constraints:
$$f_L = 0 \text{ when } f_V = 0; \quad f_L = 1 \text{ when } f_V = 1; \quad f_L = 0.65 \text{ when } f_V = 0.65$$
2.2.5 Gram-Schmidt Orthogonalization
Suppose we are given a set of linearly independent vectors, say $x_1, x_2, \ldots, x_n$. These vectors are basis vectors that span an n-dimensional space. However, in order to obtain a better description of the span space, it is often better to use a different basis set of n vectors that are perpendicular (or orthogonal) to each other. For instance, for models that are linear-in-parameters, if the columns of data are closer to orthogonality (a property that is defined below), solving the normal equation becomes more computationally stable. The Gram-Schmidt algorithm is one procedure to obtain these mutually perpendicular basis vectors.
Definition 2.5 Let a and b be two vectors of the same length which could contain complex numbers. Then the inner product of a and b, denoted by $\langle a, b \rangle$, is given by
$$\langle a, b \rangle = a^* b \tag{2.45}$$
(Note that $\langle b, a \rangle = \overline{\langle a, b \rangle}$, the complex conjugate of $\langle a, b \rangle$.)

Definition 2.6 Let a and b be two vectors of the same length. Then a and b are orthogonal to each other if $\langle a, b \rangle = 0$. A set of vectors $z_1, \ldots, z_n$ is called an orthonormal set if
$$\langle z_i, z_j \rangle = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} \tag{2.46}$$
Gram-Schmidt Algorithm:
Given a set of linearly independent vectors: x1, . . . , xn
For large systems, the Gram-Schmidt algorithm described above has been shown to be sensitive to round-off errors. A modification to the Gram-Schmidt algorithm has been suggested and yields more stable results. In addition, this modification also yields a very useful factoring of a matrix $X = (x_1 | \cdots | x_n)$ containing the original vectors:
$$X = QR \tag{2.47}$$
where Q is the matrix containing orthogonal vectors and R is an upper triangular matrix. Due to the convention used by the originators of this factorization (or decomposition), it is referred to as the QR decomposition.
Modified Gram-Schmidt Orthogonalization (and QR Decomposition) Algorithm:
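Since the algorithm listing itself does not survive in this copy, here is a sketch of the modified procedure, which also returns the QR factors of (2.47); the classical variant would compute all of column k's projections from the original vector instead of the running one:

```python
import numpy as np

def mgs_qr(X):
    """QR factorization X = QR by modified Gram-Schmidt."""
    Q = X.astype(float).copy()
    n = Q.shape[1]
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]                   # normalize the kth vector
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ Q[:, j]      # projection onto q_k ...
            Q[:, j] -= R[k, j] * Q[:, k]     # ... removed immediately
    return Q, R

X = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q, R = mgs_qr(X)
print(np.allclose(Q @ R, X))                 # True: X = QR
print(np.allclose(Q.T @ Q, np.eye(2)))       # True: columns orthonormal
```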
In this section we will focus our attention on matrix A in the equation Ax = b. This view is to treat matrix A as an operator that transforms an input vector x and outputs a vector b, as shown in Figure 2.10.

Figure 2.10: Matrix as an operator.
If $A [=] m \times n$, then it transforms vectors of n dimensions into vectors of m dimensions. When A is square, the number of dimensions is preserved. If the elements of x describe the position of a point, then the effect of A is to move it to another position; see Figure 2.11 as an example. More generally, the operation is also called "mapping".
Figure 2.11: Example of A operating on x to yield b.
In physical systems, the elements of a vector may have units attached to the scalars. For instance, let
$$A = \begin{pmatrix} 2\,\mathrm{g/cc} & 0\,\mathrm{g/cc} \\ 1\,\mathrm{g/cc} & 1\,\mathrm{g/cc} \end{pmatrix}; \qquad x = \begin{pmatrix} 1\,\mathrm{cc} \\ 2\,\mathrm{cc} \end{pmatrix};$$
then
$$Ax = b = \begin{pmatrix} 2\,\mathrm{g} \\ 3\,\mathrm{g} \end{pmatrix}$$
For this case, one should be cautious about plotting x and b on the same graph. Even though both x and b have the same number of dimensions, they have different units and therefore reside in different spaces. Nonetheless, we will be dealing mostly with systems where both the input x and output b reside in the same space. Thus, unless otherwise noted, we can visualize A as simply repositioning x to b. Furthermore, we will mainly focus our discussion on square matrices in the sections below.
2.3.1 Orthogonal and Unitary Matrices
There are special types of operators which will not change the norm of a vector. The first type is the orthogonal matrix.
Definition 2.7 A square matrix A is an orthogonal matrix if
$$A^T A = A A^T = I \tag{2.48}$$
Recall from equation (2.31) that for vectors x of real elements, the distance measured using the Pythagorean theorem is given by
$$\|x\| = \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{x^T x}$$
Thus the norm of b = Ax is given by
$$\|b\|^2 = \|Ax\|^2 = (Ax)^T (Ax) = x^T (A^T A) x = x^T x = \|x\|^2$$
where the middle step follows from (2.48). This means that the operation of A is simply to change the position of x into another position described by b which has the same distance from the origin as x had.
Example 2.9
The matrix $R_{cw}$ defined below is a clockwise rotation operator which can be shown to be an orthogonal matrix (see Figure 2.12 for an example).
Figure 2.12: Example of Rcw operating on x to yield b.
To show that $R_{cw}$ is orthogonal, we check if it satisfies the condition stated in (2.48):
$$R_{cw}^T R_{cw} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} \cos^2\theta + \sin^2\theta & 0 \\ 0 & \cos^2\theta + \sin^2\theta \end{pmatrix} = I$$
Similarly, one can show $R_{cw} R_{cw}^T = I$.
♦♦♦
Note that the condition described by (2.48) states that for an orthogonal matrix, $A^T = A^{-1}$. Computationally, this means that the inverse of an orthogonal matrix can be found by simply taking the transpose. For instance, the transpose of the clockwise rotation operator $R_{cw}$ yields the counterclockwise rotation operator:
$$R_{ccw} = R_{cw}^T = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
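A numerical check of these rotation facts; a sketch with an arbitrary angle:

```python
import numpy as np

theta = 0.7
Rcw = np.array([[ np.cos(theta), np.sin(theta)],
                [-np.sin(theta), np.cos(theta)]])
x = np.array([3.0, 4.0])

print(np.allclose(Rcw.T @ Rcw, np.eye(2)))        # orthogonal: R^T R = I
print(np.isclose(np.linalg.norm(Rcw @ x),
                 np.linalg.norm(x)))              # norm preserved
print(np.allclose(np.linalg.inv(Rcw), Rcw.T))     # inverse = transpose
```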
Since we will be dealing soon with vectors having elements which are complex numbers, we need to generalize our definition of norms and orthogonality.

Definition 2.8 Let v be a vector whose elements are complex numbers; then the Euclidean norm of v, denoted $\|v\|$, is given by
$$\|v\| = \sqrt{v^* v}$$

Definition 2.9 A matrix A consisting of elements which are complex numbers is called a unitary matrix if
$$A^* A = A A^* = I \tag{2.51}$$
Example 2.10
An important example of a unitary matrix operator is the Householder Transformation Operator $U_w$ given by
$$U_w = I - \frac{2}{w^* w}\, w w^* \tag{2.52}$$
where w is a nonzero vector. Note that $U_w^* = U_w$.

To show that $U_w$ is unitary, we check for the condition
$$U_w^* U_w = \left(I - \frac{2}{w^* w}\, ww^*\right)^* \left(I - \frac{2}{w^* w}\, ww^*\right) = \left(I - \frac{2}{w^* w}\, ww^*\right)\left(I - \frac{2}{w^* w}\, ww^*\right)$$
$$= I - \frac{4}{w^* w}\, ww^* + \frac{4}{(w^* w)^2}\, ww^* ww^* = I - \frac{4}{w^* w}\, ww^* + \frac{4}{w^* w}\, ww^* = I$$
The action of $U_w$ on a vector x is to reflect this vector along a hyperplane perpendicular to w, as shown in Figure 2.13.

One could also use the Householder transformation operator to move a vector x to another vector y having the same norm as x by choosing w = x − y, i.e.
$$U_{x-y}\, x = \left( I - \frac{2}{(x-y)^*(x-y)}\, (x-y)(x-y)^* \right) x = x - \frac{1}{x^* x - y^* x}\left( x x^* x - y x^* x - x y^* x + y y^* x \right)$$
$$= \frac{x x^* x - x y^* x - x x^* x + y x^* x + x y^* x - y y^* x}{x^* x - y^* x} = y$$
♦♦♦

We will further analyze the characteristics of matrix operators, such as eigenvectors, eigenvalues and canonical forms, in the next chapter. The purpose of including the discussion of matrix operators in this section is simply to underline the fact that the simple-looking equation Ax = b has at least three different perspectives, in the last of which we begin to investigate the behavior of A itself rather than x or b.
Exercises

E1. Determine which of the following matrices are orthogonal or unitary:
$$\text{a) } E_{\mathrm{permute}}(5, 1, 4, 2, 3) \qquad \text{b) } B = \left( I - \frac{2}{w^* w}\, ww^* \right) \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \qquad \text{c) } A = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & \cos\theta \end{pmatrix}$$
E2. Let $x_1, x_2, \ldots, x_m$ be a collection of vectors of length n < m. Will the Gram-Schmidt orthogonalization procedure produce a new set of orthogonal vectors? Why or why not?
E3. Show that products of unitary matrices are unitary.
E4. Find an operator that would rotate a three dimensional vector by an angle equal to θ radians clockwise around: a) the z-axis, b) the y-axis, and c) the x-axis. Also find a single operator that would first rotate a point 30° counterclockwise around the z-axis and then 30° clockwise around the x-axis. (Note that three dimensional rotation operators are not commutative. Verify the operators found by using sample vectors and plot the vectors before and after being operated on by the matrices found.)
E5. In two dimensional graphics, the transformation is done on vectors given by
$$v = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$
to describe the x coordinate, the y coordinate, and a last entry equal to the constant 1 to allow translation of points. This extension then uses two operators, $G_{\mathrm{translate}}$ and $G_{\mathrm{cw\,rotate}}$:
$$G_{\mathrm{translate}} = \begin{pmatrix} 1 & 0 & x_{\mathrm{trans}} \\ 0 & 1 & y_{\mathrm{trans}} \\ 0 & 0 & 1 \end{pmatrix} \qquad G_{\mathrm{cw\,rotate}} = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
The curve shown in Figure 2.14 is generated using the data in Table 2.7.
1. Find an operator that would rotate the curve θ radians counterclockwise around the point (x, y) = (a, b). Test this operator on the data given in Table 2.7 with (a, b) = (4, −1.5) and θ = π/2.

2. Find an operator that would reflect the curve along a line that contains points $(x_1, y_1)$ and $(x_2, y_2)$. (Hint: you might need a corresponding Householder-type reflector operator that is a 3×3 matrix operator.) Test this operator on the data given in Table 2.7 with $(x_1, y_1) = (0, -1)$ and $(x_2, y_2) = (10, -3)$.
In the previous chapter, we have discussed three perspectives or uses of matrices for the equation Ax = b. In this chapter, we will further explore some analysis tools and decompositions (factorings) of matrices. Some of the results are useful in understanding how the elements affect the operational behavior of the matrix, while other results simply improve computational efficiency.

We will first introduce the concepts of eigenvalues and eigenvectors, followed by a list of properties pertaining to eigenvalues and eigenvectors. Next, we describe three important transformations that are strongly based on eigenvalues. These are Schur triangularization, diagonalization and the Jordan canonical formulation. Each of these transformations is useful in furthering the results of eigenvalue analysis. They are also useful in the evaluation of a function whose arguments are matrices. After a discussion of a few methods on how to handle matrix functions, we include a section on a particular numerical method that is widely used in the determination of eigenvalues, called the QR method. We then end this chapter with three more matrix decompositions that are useful in both matrix computation and analysis, namely the LU decomposition, the Singular Value Decomposition and the Polar Decomposition.
3.1 Eigenvalues and Eigenvectors
To study the characteristics of a particular matrix operator A, one can collect several pairs of x and b = Ax. Some of these pairs will behave more distinctively than others and will yield more information about the operator, i.e. somewhat like signatures of the operator. Specifically, we are interested in those vectors, v, which when operated on by matrix A, will result in a vector that is scaled by a factor λ,
$$Av = \lambda v \tag{3.1}$$
These vectors are known as the eigenvectors of A. For instance, if λ is real, this condition simply states that the eigenvectors v are special vectors to A, in which the only effect A has on v is to move the position radially out by a factor λ if |λ| > 1, or radially in if |λ| < 1. It could also flip v by 180° if λ < 0.
To determine these eigenvectors, we need to use the condition given in (3.1):
$$Av = \lambda v = \lambda I v$$
$$Av - \lambda I v = 0$$
$$(A - \lambda I)v = 0 \tag{3.2}$$
Obviously, v = 0 will always be a solution to the last equation (3.2). Thus the zero vector is called a trivial solution because it does not give us any more information about A other than A0 = 0. To obtain nontrivial solutions, we need the matrix (A − λI) to be singular, i.e.
$$\det(A - \lambda I) = 0 \tag{3.3}$$
Equation (3.3) is known as the characteristic equation of A. With (A − λI) singular, the solution to this equation is not unique, i.e. there is generally more than one eigenvalue. The set of values for λ which satisfy the characteristic equation (3.3) are known as the eigenvalues of A.

Using the definition of determinants, plus the fact that the elements of A − λI are either $a_{ij}$ or $a_{jj} - \lambda$, the characteristic equation can be expanded into a single polynomial of order n, where n is the size of A. Thus in the matrix case, the term characteristic polynomial equation is used interchangeably with characteristic equation. The polynomial is given by
$$p(\lambda) = \lambda^n + \beta_{n-1}\lambda^{n-1} + \cdots + \beta_1 \lambda + \beta_0$$
where $\beta_0, \beta_1, \ldots, \beta_{n-1}$ are coefficients resulting from the expansion of equation (3.3). The characteristic polynomial will yield n roots. The eigenvalues may either be real numbers or complex numbers. Some of the eigenvalues may even occur more than once. The collection of all the eigenvalues of A, including multiplicities, is also known as the spectrum of A, denoted by σ(A).

Once the eigenvalues are found, the eigenvectors can be obtained by substituting each eigenvalue, one at a time, into equation (3.2).

Recall that each of these equations yields infinitely many solutions. However, only the directions of these vectors are important. As a standard, the desired eigenvectors are those whose lengths equal 1. These vectors are known as normalized eigenvectors.
Example 3.1
Let
$$A = \begin{pmatrix} 2 & -3 \\ 4 & 2 \end{pmatrix}$$
then the characteristic equation becomes
$$\lambda^2 - 4\lambda + 16 = 0$$
whose roots are $\lambda_1 = 2 + 2\sqrt{3}\,i$ and $\lambda_2 = 2 - 2\sqrt{3}\,i$. Upon substitution, these eigenvalues are used to yield the following eigenvectors:
$$v_1 = \alpha \begin{pmatrix} \frac{\sqrt{3}}{2}\,i \\ 1 \end{pmatrix} \qquad \text{and} \qquad v_2 = \beta \begin{pmatrix} -\frac{\sqrt{3}}{2}\,i \\ 1 \end{pmatrix}$$
where α and β are just two arbitrary numbers. To obtain normalized vectors, we need α = 0.7559 and β = 0.7559.
♦♦♦
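The eigenvalues and eigenvectors of Example 3.1 are reproduced by np.linalg.eig; a sketch:

```python
import numpy as np

A = np.array([[2.0, -3.0],
              [4.0,  2.0]])
lam, V = np.linalg.eig(A)
print(lam)                                          # 2 ± 2*sqrt(3)i
print(np.allclose(A @ V[:, 0], lam[0] * V[:, 0]))   # Av = λv holds column-wise
```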
3.2 Properties of Eigenvalues and Eigenvectors.
In this section we will list some useful properties and identities that applies to eigenvaluesand eigenvectors. This will aid in both future analysis and computational efficiency. (Unlessotherwise noted, the proofs are left as exercises – mostly using (3.1).)
Property 1. Let B = S−1AS where S is nonsingular. Then the eigenvalues of A and Bare the same. (Note: S−1AS is called a similarity transformation of A.)
Property 2. Let λ be an eigenvalue of A, then λk is an eigenvalue of Ak as long as Ak
exist, k = . . . ,−2,−1, 0, 1, 2, . . .. The eigenvector corresponding to λ for A is thesame eigenvector corresponding to λk for Ak.
Property 3. Let λ be an eigenvalue of A and let α be a scalar, then αλ is an eigenvalueof αA.
Property 4. The eigenvalues of diagonal and triangular matrices are the diagonal en-tries of these matrices. The eigenvectors of a diagonal matrix D corresponding toeigenvalue djj is the unit vector ej.
Property 5. A and A^T have the same eigenvalues, but in general, they do not have the same set of eigenvectors.
Property 6. Eigenvectors corresponding to distinct eigenvalues are linearly independent.
(see page 103 for proof.)
Property 7. ∏_{i=1}^n λ_i = |A|. (Thus at least one of the eigenvalues of a singular matrix will be zero.)

(see page 104 for proof.)
Property 8. ∑_{i=1}^n λ_i = tr(A).

(see page 104 for proof.)
Property 9. The eigenvalues of Hermitian matrices are all real-valued. (Thus the eigenvalues of real-valued symmetric matrices are also all real.)
(see page 105 for proof.)
Property 10. The eigenvalues of skew-Hermitian matrices are all pure imaginary.
(see page 105 for proof.)
Property 11. The eigenvectors of a Hermitian matrix corresponding to distinct eigenvalues are orthogonal.
(see page 105 for proof.)
Property 12. The eigenvalues of block diagonal, upper block triangular, or lower block triangular matrices are the collection of all the eigenvalues of each of the block matrices on the diagonal.
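Several of these properties can be checked numerically. The following Python/NumPy sketch (with an arbitrary random test matrix) illustrates properties 1, 7, and 8:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
lam = np.linalg.eigvals(A)

# Property 7: the product of the eigenvalues equals the determinant.
assert np.isclose(np.prod(lam), np.linalg.det(A))

# Property 8: the sum of the eigenvalues equals the trace.
assert np.isclose(np.sum(lam), np.trace(A))

# Property 1: a similarity transformation preserves the spectrum.
S = rng.standard_normal((4, 4))            # almost surely nonsingular
B = np.linalg.inv(S) @ A @ S
assert np.allclose(np.sort_complex(np.linalg.eigvals(B)),
                   np.sort_complex(lam))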
3.3 Schur triangularization.
For any square matrix A, one can find a unitary matrix operator U such that U^*AU will yield an upper triangular matrix where all the eigenvalues appear on the diagonal. This approach is known as the Schur triangularization method. This result is useful in proving some properties of eigenvalues. Also, for certain types of matrices called normal matrices (to be defined later), the Schur triangularization method yields diagonal matrices.
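In practice the decomposition is available in standard libraries. A small Python sketch using SciPy's schur routine (with a random test matrix) follows:

import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

# output='complex' forces a truly triangular factor even when A has
# complex eigenvalues; Z is unitary and A = Z T Z*.
T, Z = schur(A, output='complex')

assert np.allclose(Z @ T @ Z.conj().T, A)
assert np.allclose(np.tril(T, -1), 0)     # T is upper triangular
# The eigenvalues of A appear on the diagonal of T.
assert np.allclose(np.sort_complex(np.diag(T)),
                   np.sort_complex(np.linalg.eigvals(A)))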
3.4 Matrix Diagonalization

For some square matrices A there exist similarity transformations, T^{−1}AT, T nonsingular, which will yield a diagonal matrix in which the diagonal elements are the eigenvalues of A, i.e.
T^{−1}AT = \begin{pmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λ_n \end{pmatrix}
These matrices which can be diagonalized are classified as diagonalizable matrices or semisimple matrices.
Two cases (not necessarily disjoint) are guaranteed to be diagonalizable. The first case involves those having all distinct eigenvalues. The second case involves normal matrices (to be defined below). Other diagonalizable matrices may have repeated eigenvalues but require some rank conditions.
3.4.1 Diagonalizable Class 1: All Eigenvalues Are Distinct.
The n eigenvalue equations,
Av1 = λ1v1
...
Avn = λnvn
can be rewritten compactly as follows:
AV = V Λ (3.4)
where V = (v_1 | · · · | v_n) and Λ = diag(λ_1, . . . , λ_n).
If all the eigenvalues are different from each other, then the corresponding eigenvectors v_1, . . . , v_n are all linearly independent (see property 6 on page 72). This means V is nonsingular, and diagonalization is obtained by premultiplying (3.4) by V^{−1} to yield
V^{−1}AV = Λ
Example 3.3
A = \begin{pmatrix} a & b \\ 0 & c \end{pmatrix},   a ≠ c

The eigenvalues are λ_1 = a and λ_2 = c, while the corresponding eigenvectors are

v_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}   and   v_2 = \begin{pmatrix} b \\ c − a \end{pmatrix}

up to arbitrary scaling. Since a ≠ c, these eigenvectors are linearly independent, V = (v_1 | v_2) is nonsingular, and V^{−1}AV = diag(a, c).

♦♦♦

As an example of a matrix with repeated eigenvalues that is still diagonalizable, consider

A = \begin{pmatrix} −1 & 0 & 0 \\ −1 & −2 & 0 \\ 0 & 0 & −2 \end{pmatrix}

whose eigenvalues are λ_1 = −2, with multiplicity 2, and λ_2 = −1. For λ_1 = −2,

(λ_1 I − A)v = \begin{pmatrix} −1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} v = 0

yields v = (0, α, β)^T;
choose α = 1, β = 0 for v1, and α = 0, β = 1 for v2. Next, to solve for v3,
(λ_2 I − A) v_3 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} v_3 = 0

v_3 = \begin{pmatrix} γ \\ −γ \\ 0 \end{pmatrix}
choose γ = 1. Thus,
V = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & −1 \\ 0 & 1 & 0 \end{pmatrix}

and

V^{−1}AV = \begin{pmatrix} −2 & 0 & 0 \\ 0 & −2 & 0 \\ 0 & 0 & −1 \end{pmatrix}
♦♦♦
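The diagonalization in this last example can be confirmed numerically (a short Python/NumPy sketch using the matrices above):

import numpy as np

A = np.array([[-1.0,  0.0,  0.0],
              [-1.0, -2.0,  0.0],
              [ 0.0,  0.0, -2.0]])
V = np.array([[0.0, 0.0,  1.0],
              [1.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])

# V^{-1} A V recovers the diagonal matrix of eigenvalues.
D = np.linalg.inv(V) @ A @ V
print(np.round(D, 12))    # diag(-2, -2, -1)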
3.5 Jordan Canonical Form
For nondiagonalizable matrices, the closest form to a diagonal matrix via a similarity transformation is the Jordan canonical form, given by the block diagonal matrix
J = \begin{pmatrix} J_1 & 0 & \cdots & 0 \\ 0 & J_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_k \end{pmatrix}   (3.8)

where J_i, called a Jordan block, is either a scalar or of the form

J_i = \begin{pmatrix} λ & 1 & \cdots & 0 \\ 0 & λ & \ddots & \vdots \\ \vdots & \vdots & \ddots & 1 \\ 0 & 0 & \cdots & λ \end{pmatrix}   (3.9)
The similarity matrix T that would transform a square matrix A to another matrix J in Jordan canonical form, i.e.

T^{−1}AT = J

is called the modal matrix. The columns of the modal matrix are also collectively called the canonical basis of A. The canonical basis is composed of vectors derived from eigenvector chains of different orders.
Definition 3.2 Given matrix A and one of its eigenvalues λ, the eigenvector chain with respect to λ of order r is

chain(A, λ, r) = (v_1, v_2, . . . , v_r)   (3.10)

where

(A − λI)^r v_r = 0,   (A − λI)^{r−1} v_r ≠ 0

v_j = (A − λI)v_{j+1},   j = (r − 1), . . . , 1
Note: If the order of the chain is 1, then the chain is composed of only one eigenvector.
Algorithm for Obtaining Chain(A,λ,r).
1. Obtain vector v_r to begin the chain.

(a) Construct matrix M,

M(λ, r) = \begin{pmatrix} (A − λI)^{r−1} & −I \\ (A − λI)^r & 0 \end{pmatrix}
(b) Use Gaussian elimination to obtain Q, W and qA such that
To obtain the canonical basis, we still need to determine the required eigenvector chains. To do so, we need to calculate the orders of matrix degeneracy with respect to an eigenvalue λ_i, denoted by N_{i,k}, which is just the difference in ranks of succeeding powers, i.e.

N_{i,k} = rank(A − λ_i I)^{k−1} − rank(A − λ_i I)^k   (3.11)
Using these orders of degeneracy, one can calculate the required orders for the eigenvector chains. The algorithm below describes in more detail the procedure for obtaining the canonical basis.
Algorithm for Obtaining Canonical Basis. Given A[=]n × n. For each distinct λ_i:

1. Determine multiplicity m_i.
2. Calculate order of required eigenvector chains.
Let
p_i = arg min_{1≤p≤n} [ rank(A − λ_i I)^p = n − m_i ]

then obtain ord_i = (γ_{i,1}, . . . , γ_{i,p_i}), where

γ_{i,k} = N_{i,k}   if k = p_i
γ_{i,k} = max( 0, N_{i,k} − ∑_{j=k+1}^{p_i} γ_{i,j} )   if k < p_i

where N_{i,k} = rank(A − λ_i I)^{k−1} − rank(A − λ_i I)^k.
3. Obtain the required eigenvector chains.
For each γ_{i,k} > 0, find γ_{i,k} sets of chain(A, λ_i, k) and add them to the collection of canonical basis vectors.
One can show that the eigenvector chains found will be linearly independent, i.e. T is nonsingular. Thus the Jordan canonical form can then be determined from the similarity transformation T^{−1}AT = J.
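Carrying out this bookkeeping by hand is tedious. For small matrices, a computer algebra system can produce the modal matrix and Jordan form directly; the following is a sketch using SymPy (the defective test matrix is illustrative):

import sympy as sp

A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 3]])     # lambda = 2 is repeated but has only one eigenvector

# jordan_form returns the modal matrix T and J such that A = T J T^{-1}.
T, J = A.jordan_form()
assert sp.simplify(T * J * T**-1 - A) == sp.zeros(3, 3)
print(J)    # one 2x2 Jordan block for lambda = 2, one 1x1 block for lambda = 3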
3.6 Functions of a Square Matrix

There are several functions of square matrices, such as sin(A), cos(A) and exp(A), which are comparable to their analogous scalar functions but will also have significant differences, especially because matrix multiplication is not commutative. We begin with the definition of well-defined functions.
Definition 3.3 Let f(x) be a function having a power series expansion

f(x) = ∑_{i=0}^∞ α_i x^i   (3.12)

which is convergent for |x| < R. Then the function of a square matrix A defined by

f(A) = ∑_{i=0}^∞ α_i A^i   (3.13)

is called a well-defined function if each eigenvalue has absolute value less than the radius of convergence R.
Of course it is not often advisable to actually calculate the power series of a square matrix directly from the definition. Instead, one could use several methods to simplify the evaluation. One method is to implement diagonalization when possible, or use the Jordan canonical form when matrix A is not diagonalizable. Another method is to make use of the Cayley-Hamilton theorem to produce a finite series that is equivalent to the power series.
3.6.1 Case 1: A is Diagonalizable
Let us first take the situation when A is diagonalizable. This means that there exists a nonsingular matrix T such that T^{−1}AT = D_λ, where D_λ is a diagonal matrix with the eigenvalues on the diagonal, or equivalently, A = T D_λ T^{−1}. Since A^i = T D_λ^i T^{−1}, each term of the power series transforms the same way, and

f(A) = ∑_{i=0}^∞ α_i T D_λ^i T^{−1} = T ( ∑_{i=0}^∞ α_i D_λ^i ) T^{−1} = T f(D_λ) T^{−1}

where f(D_λ) = diag( f(λ_1), . . . , f(λ_n) ).
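For instance, the matrix exponential of a diagonalizable matrix can be evaluated this way; a Python/NumPy sketch (with an illustrative 2 × 2 matrix, checked against SciPy's expm) follows:

import numpy as np
from scipy.linalg import expm

A = np.array([[ 0.0,  1.0],
              [-2.0, -3.0]])     # distinct eigenvalues -1 and -2

lam, T = np.linalg.eig(A)
# f(A) = T diag(f(lambda_1), ..., f(lambda_n)) T^{-1}, here with f = exp.
fA = T @ np.diag(np.exp(lam)) @ np.linalg.inv(T)

assert np.allclose(fA, expm(A))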
3.6.2 Case 2: A is Not Diagonalizable

If A is not diagonalizable, we will need to implement the Jordan canonical form instead, i.e. T^{−1}AT = J, where J is a block diagonal matrix with Jordan blocks on the diagonal.
Before we expand the power series, let us take a closer look at different powers of a Jordan block. Recall that a Jordan block is of the form given in (3.9).
3.6.3 Case 3: Using Finite Sums to Evaluate Matrix Functions
The method of using finite matrix sequences begins with the use of the Cayley-Hamilton theorem.
Theorem 3.3 For any square matrix A[=]n × n, whose characteristic polynomial is given by

charpoly(λ) = a_0 + a_1λ + · · · + a_nλ^n = 0   (3.26)

matrix A will also satisfy the characteristic polynomial with A replacing λ, i.e.

charpoly(A) = a_0 I + a_1 A + · · · + a_n A^n = 0   (3.27)
(see page 107 for proof of theorem 3.3)
Using the Cayley-Hamilton theorem, we can see that A^n can be written as a linear combination of A^i, i = 0, 1, . . . , (n − 1),

A^n = (−1/a_n)( a_0 I + · · · + a_{n−1}A^{n−1} )   (3.28)

The same is true of A^{n+1},

A^{n+1} = (−1/a_n)( a_0 A + · · · + a_{n−1}A^n )
        = (−1/a_n)( a_0 A + · · · + a_{n−1}[ (−1/a_n)( a_0 I + · · · + a_{n−1}A^{n−1} ) ] )
        = β_0 I + β_1 A + · · · + β_{n−1}A^{n−1}   (3.29)
We can continue this process and conclude that A^{n+j}, j > 0, can always be recast as a linear combination of I, A, . . . , A^{n−1}. This means that for any well-defined matrix function,

f(A) = c_0 I + c_1 A + · · · + c_{n−1}A^{n−1}   (3.30)

for some coefficients c_0, . . . , c_{n−1}. Therefore the key evaluations are on these coefficients, which require n linearly independent equations.
Since the suggested derivation of (3.30) was based on the characteristic polynomial, this equation should also hold if A is replaced by λ_i, an eigenvalue of A. Thus we can get m linearly independent equations from the m distinct eigenvalues:

f(λ_i) = c_0 + c_1 λ_i + · · · + c_{n−1} λ_i^{n−1},   i = 1, . . . , m
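A Python/NumPy sketch of this finite-sum evaluation (for f = exp and an illustrative 2 × 2 matrix with distinct eigenvalues) follows:

import numpy as np
from scipy.linalg import expm

A = np.array([[ 0.0,  1.0],
              [-2.0, -3.0]])       # eigenvalues -1 and -2, all distinct

lam = np.linalg.eigvals(A)
n = A.shape[0]

# Solve f(lambda_i) = c_0 + c_1 lambda_i + ... + c_{n-1} lambda_i^{n-1}
# for the coefficients of (3.30); the system matrix is Vandermonde.
W = np.vander(lam, n, increasing=True)
c = np.linalg.solve(W, np.exp(lam))

fA = c[0] * np.eye(n) + c[1] * A     # equation (3.30) with n = 2
assert np.allclose(fA, expm(A))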
As an alternative solution for verification, one can apply the power series definition (3.14) and truncate at a high order, say at the 20th power, and should find a very close match. In fact, since the power series was truncated, the finite sequence method actually gives the more accurate answer.
♦♦♦

¹Note that in cases where the degree of degeneracy of A introduces multiple Jordan blocks corresponding to the same eigenvalue, this method may not yield n linearly independent equations. In those cases, however, there exists a polynomial of lower order than the characteristic polynomial (called the minimal polynomial) such that the required number of coefficients will be equal to the number of linear equations obtained from the method just described.
3.7 Numerical Calculation of Eigenvalues

For large systems, the determination of eigenvalues and eigenvectors can become susceptible to numerical errors, especially since the roots of polynomials are very sensitive to small perturbations in the polynomial coefficients. Fortunately, there exist other more robust and accurate methods to determine eigenvalues aside from using the basic definitions. One of the more stable and reliable methods is the QR decomposition method. Recall from section 2.2.5 that the modified Gram-Schmidt method for orthogonalization also produces a decomposition of the original matrix A into a product of two matrices Q and R, where Q is unitary while R is upper triangular,
A = QR (3.32)
Since Q is unitary, (3.32) implies that the similarity transformation based on Q,

A^{(1)} = Q^{−1}AQ = RQ   (3.33)

only involves reversing the order of the matrix multiplication of Q and R. Since matrices A^{(1)} and A are similar, they have exactly the same eigenvalues. Due to the upper triangular form of R, one can show² that performing the QR decomposition iteratively on A^{(i)}, i = 1, 2, . . ., will reliably converge to a matrix that can be partitioned as follows:
A^{(i)} = \begin{pmatrix} A_{(1),11} & A_{(1),12} \\ 0 & A_{(1),22} \end{pmatrix}
where A_{(1),22} is either a scalar or a 2×2 submatrix. When A_{(1),22} is a scalar, then this scalar is one of the eigenvalues of A. When A_{(1),22} is a 2×2 submatrix, it can be solved to obtain a pair of complex conjugate eigenvalues of A. The same QR decomposition is then applied to the submatrix A_{(1),11}. This process continues until all the eigenvalues are obtained.
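A bare-bones Python/NumPy sketch of the basic (unshifted) iteration (the iteration count is illustrative):

import numpy as np

def qr_iteration(A, iters=200):
    # Each pass computes A = QR and replaces A by RQ = Q^{-1} A Q,
    # a similarity transformation that preserves the eigenvalues.
    G = A.copy()
    for _ in range(iters):
        Q, R = np.linalg.qr(G)
        G = R @ Q
    return G

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
G = qr_iteration(A)
# Real eigenvalues appear on the diagonal of G; complex conjugate pairs
# remain as 2 x 2 blocks on the diagonal.
print(np.round(np.diag(G), 4))
print(np.round(np.linalg.eigvals(A), 4))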
Although the QR method converges to the required eigenvalues, two enhancements to the method significantly help in accelerating the convergence. The first is called the shifted QR method. Here, instead of taking the QR decomposition of A^{(i)}, the decomposition is instead applied on
A^{(i)} − σ_i I = Q_i R_i
where σ_i is the element of A^{(i)} in the (n, n)th position. Then
A^{(i+1)} = R_i Q_i + σ_i I
         = Q_i^{−1}( A^{(i)} − σ_i I )Q_i + σ_i I
         = Q_i^{−1} A^{(i)} Q_i
²For a detailed proof, refer to G. H. Golub and C. Van Loan, Matrix Computations, third edition, 1996, Johns Hopkins University Press.
which is again a similarity transformation, thus preserving the set of eigenvalues.
The second enhancement to the algorithm is the use of Householder operators to first put A into the upper Hessenberg form. The upper Hessenberg form is a matrix in which all elements below the first subdiagonal are zero, e.g.

\begin{pmatrix} × & × & × & × \\ × & × & × & × \\ 0 & × & × & × \\ 0 & 0 & × & × \end{pmatrix}

where “×” denotes arbitrary values. Using Householder operators U_w (cf. page 63), which have already been shown to be unitary (and Hermitian), means that U_w A U_w is another similarity transformation and thus preserves the set of eigenvalues. Once in Hessenberg form, the shifted-QR method will have significantly fewer steps required to converge to a similar matrix where the eigenvalues are on the diagonal (or in 2×2 diagonal blocks for complex eigenvalues). The main idea is to use a Householder operator U_{x−y} to transform a vector x to another vector y of the same norm but in which only the first element is non-zero, i.e.
U_{x−y} x = ( I − (2 / ⟨x − y, x − y⟩) (x − y)(x − y)^* ) x = y,   with   y = \begin{pmatrix} ‖x‖ \\ 0 \\ \vdots \\ 0 \end{pmatrix}
(cf. example shown on page 63). This idea is then expanded to introduce different unitary transforms that iteratively eliminate the terms below the first subdiagonal.
Algorithm for Householder Transformations to Upper Hessenberg Form.
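A minimal Python/NumPy sketch of the reduction for real matrices (illustrative, using the target vector y = (‖x‖, 0, . . . , 0)^T described above):

import numpy as np

def to_hessenberg(A):
    # Zero out column k below the first subdiagonal using a Householder
    # operator U embedded in the identity; H <- U H U is a similarity
    # transformation since U is unitary and Hermitian (U^{-1} = U).
    H = np.array(A, dtype=float)
    n = H.shape[0]
    for k in range(n - 2):
        x = H[k + 1:, k]
        y = np.zeros_like(x)
        y[0] = np.linalg.norm(x)
        if np.linalg.norm(x - y) < 1e-14:   # column already in desired form
            continue
        w = (x - y) / np.linalg.norm(x - y)
        U = np.eye(n)
        U[k + 1:, k + 1:] -= 2.0 * np.outer(w, w)
        H = U @ H @ U
    return H

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
H = to_hessenberg(A)
assert np.allclose(np.tril(H, -2), 0, atol=1e-10)   # zeros below subdiagonal
print(np.round(np.sort_complex(np.linalg.eigvals(H)), 6))
print(np.round(np.sort_complex(np.linalg.eigvals(A)), 6))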
In this example, it was very fortuitous that the resulting Hessenberg form is already in block diagonal form, and the eigenvalues can be obtained from each submatrix block in the diagonal. This is not usually the case, and the QR (or shifted-QR) method is needed to obtain the eigenvalues.
♦♦♦
QR Algorithm for Obtaining Eigenvalues.
Given matrix A[=]n × n, and a specified tolerance, ε
1. Use Householder’s method to obtain Hessenberg Form.
G = H^T A H
2. Use QR to reduce G.
While G[=]m × m, m > 2:

Let σ = G_{m,m}.

• Case 1: (|G_{m,m−1}| > ε) and (|G_{m−1,m−2}| > ε).
Iterate on the following steps until either the condition of Case 2 or Case 3 results:

(a) Find Q and R such that QR = G − σI.

(b) Update G: G ←− RQ + σI.

• Case 2: (|G_{m,m−1}| ≤ ε).
Add G_{m,m} to the list of eigenvalues.
Update G by removing the last row and last column.

• Case 3: (|G_{m−1,m−2}| ≤ ε).
Compute the two eigenvalues of the trailing 2 × 2 submatrix of G and add them to the list of eigenvalues.
Update G by removing the last two rows and columns.
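A compact Python/NumPy sketch of this procedure (using SciPy's hessenberg routine for step 1; the tolerance and iteration cap are illustrative):

import numpy as np
from scipy.linalg import hessenberg

def shifted_qr_eigvals(A, eps=1e-10, max_iter=500):
    G = hessenberg(A)                  # step 1: Householder reduction
    eigs = []
    while G.shape[0] > 2:
        m = G.shape[0]
        for _ in range(max_iter):      # Case 1: shifted QR sweeps
            if abs(G[m - 1, m - 2]) <= eps or abs(G[m - 2, m - 3]) <= eps:
                break
            sigma = G[m - 1, m - 1]
            Q, R = np.linalg.qr(G - sigma * np.eye(m))
            G = R @ Q + sigma * np.eye(m)
        if abs(G[m - 1, m - 2]) <= eps:        # Case 2: real eigenvalue
            eigs.append(G[m - 1, m - 1])
            G = G[:m - 1, :m - 1]
        else:                                  # Case 3: complex pair
            eigs.extend(np.linalg.eigvals(G[m - 2:, m - 2:]))
            G = G[:m - 2, :m - 2]
    eigs.extend(np.linalg.eigvals(G))          # remaining 1x1 or 2x2 block
    return np.array(eigs)

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))
print(np.round(np.sort_complex(shifted_qr_eigvals(A)), 4))
print(np.round(np.sort_complex(np.linalg.eigvals(A)), 4))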
Since G_{2,1} = 0, there is no need to apply the shifted-QR method. We can simply find the eigenvalues of the 2 × 2 matrix in the lower right corner. This yields the complex pair −0.5314 ± 1.5023i, which are then included as eigenvalues.
Finally, after extracting the last 2 rows and 2 columns, G = (4.2768). Thus 4.2768 is the last eigenvalue in the list. Gathering all the values, the spectrum of A is given by: {−1.2856, 0.0716, −0.5314 ± 1.5023i, 4.2768}.
♦♦♦
3.8 Miscellaneous Matrix Decomposition
There are several other matrix decompositions possible. In this section, we will describe three more popular and useful decompositions:
LU Factorization: A = LU, where L is lower triangular and U is upper triangular. (Used for the solution of linear equations.)

Singular Value Decomposition: A = UΣV^*, where U and V are unitary and Σ is diagonal. (Used for generalized inverses.)

Polar Decomposition: A = UP, where U is unitary and P is positive definite, or A = SV, where S is positive definite and V is unitary. (Used for the analysis of operators.)
3.8.1 LU Decomposition
The main goal of the LU decomposition is to factor a square matrix A such that the solution of simultaneous linear equations is broken down into a sequence of two operations: forward substitution and backward substitution. Observe that if L is a lower triangular matrix and U is an upper triangular matrix, then with A = LU the general equation for the solution of linear equations becomes
Ax = b
L(Ux) = b
Ly = b   (3.35)

where

y = Ux   (3.36)
From (3.35), note that the first equation is given by

ℓ_{11} y_1 = b_1   ⟹   y_1 = b_1 / ℓ_{11}

The second equation can then use the value of y_1 just calculated,

ℓ_{21} y_1 + ℓ_{22} y_2 = b_2   ⟹   y_2 = ( b_2 − ℓ_{21} y_1 ) / ℓ_{22}
This process now continues until the last element of y is obtained.
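The two substitution sweeps are short to code; a Python/NumPy sketch (using SciPy's lu routine, which adds row pivoting, with an illustrative system) follows:

import numpy as np
from scipy.linalg import lu

def forward_sub(L, b):
    # Solve L y = b from the top row down, as in (3.35).
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def backward_sub(U, y):
    # Solve U x = y from the bottom row up, as in (3.36).
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[4.0, 3.0, 0.0],
              [6.0, 7.0, 2.0],
              [0.0, 5.0, 9.0]])
b = np.array([1.0, 2.0, 3.0])

P, L, U = lu(A)                 # SciPy's pivoted factorization: A = P L U
y = forward_sub(L, P.T @ b)     # forward substitution on L y = P^T b
x = backward_sub(U, y)          # backward substitution on U x = y
assert np.allclose(A @ x, b)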
3.8.2 Singular Value Decomposition

For any matrix A[=]m × n, there exist unitary matrices U and V such that A = UΣV^*, where Σ is a diagonal matrix whose nonzero entries σ_i = √λ_i are called the singular values of A, where λ_i is a positive eigenvalue of A^*A.
Algorithm for Singular Value Decomposition
1. Calculate the Eigenvalues and Eigenvectors of A∗A.
Since A∗A is Hermitian, the eigenvalues are all real and nonnegative. Moreover A∗Ais also normal, and thus diagonalizable, e.g. via Schur triangularization.
2. Build matrix Σ.
Σ = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix},   where D = diag(σ_1, . . . , σ_r)
3. Arrange eigenvectors into matrix V. Let V = [V_1 | V_2], where V_1 contains the r eigenvectors corresponding to the nonzero singular values in Σ.
4. Obtain U1.
U_1 = AV_1D^{−1}
5. Complete the columns of U. If m > r, augment U_1 with m − r column vectors of random numbers. Then use Gram-Schmidt (or QR) orthogonalization to obtain U.
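The steps above can be traced in Python/NumPy for a full-column-rank matrix, where step 5 is not needed (an illustrative sketch, checked against NumPy's own SVD):

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))          # m = 4, n = 3, full column rank

# Step 1: eigenvalues/eigenvectors of A*A (real symmetric here).
lam, V = np.linalg.eigh(A.T @ A)
idx = np.argsort(lam)[::-1]              # order singular values descending
lam, V = lam[idx], V[:, idx]

# Step 2: D = diag(sigma_1, ..., sigma_r) with sigma_i = sqrt(lambda_i).
D = np.diag(np.sqrt(lam))

# Step 4: U1 = A V D^{-1}.
U1 = A @ V @ np.linalg.inv(D)

assert np.allclose(U1 @ D @ V.T, A)                      # A = U Sigma V*
assert np.allclose(np.sqrt(lam),
                   np.linalg.svd(A, compute_uv=False))   # same singular values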
3.8.3 Polar Decomposition

The polar decomposition factors a square matrix A as

A = UP   (3.38)

in which P is a symmetric positive definite matrix and U is unitary. The main implication of the polar decomposition is to separate the operation of A into a sequence of operations on a vector, say x. First, the symmetric positive definite matrix P will have, by its symmetric property, all real positive eigenvalues and orthogonal eigenvectors; thus Px will simply stretch x according to the positive scales determined by the eigenvalues of P along the principal axes determined by the orthogonal set of eigenvectors. Then U(Px) applies a unitary matrix, which in most cases involves a rotation of the vector without changing its magnitude. This idea is useful when analyzing deformation of systems in which the two operators P and U need to be separately analyzed.
Algorithm for Polar Decomposition Given a matrix A[=]n × n,
1. Obtain the Grammian of A: G = A^*A.

2. Calculate P to be the positive square root of G.

Since G is normal, Schur triangularization should yield the diagonalization

G = QΛQ^*

where Q is unitary and Λ is the diagonal matrix of eigenvalues of G. Let P be the positive square root of G, i.e. P = QΛ^{1/2}Q^*, so that P² = G.

3. Obtain U from U = AP^{−1}.
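A Python/NumPy sketch of this algorithm, checked against SciPy's built-in polar routine (the test matrix is illustrative):

import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 3))

# Steps 1-2: G = A*A, diagonalize G = Q Lambda Q*, P = Q Lambda^{1/2} Q*.
G = A.T @ A
lam, Q = np.linalg.eigh(G)
P = Q @ np.diag(np.sqrt(lam)) @ Q.T      # positive square root: P P = G

# Step 3: U = A P^{-1}.
U = A @ np.linalg.inv(P)

assert np.allclose(U @ P, A)             # A = U P
assert np.allclose(U @ U.T, np.eye(3))   # U is unitary (orthogonal here)

# SciPy computes the same right polar decomposition directly.
U2, P2 = polar(A, side='right')
assert np.allclose(U2 @ P2, A) and np.allclose(P2, P)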
To see how each matrix affects different vectors, Figure 3.1 shows how the points of a unit circle are stretched by P. Specifically, note that the eigenvalues of P are λ_1 = 2 and λ_2 = 0.5, with the corresponding eigenvectors

v_1 = \begin{pmatrix} 0.7071 \\ −0.7071 \end{pmatrix}   and   v_2 = \begin{pmatrix} −0.7071 \\ −0.7071 \end{pmatrix}
Figure 3.1: The effects of P on the unit circle.
Along v_1, the points are stretched to twice their original size, while along v_2, the points are compressed to half their original size.
Afterwards, the operator U rotates the ellipse by an angle θ = tan^{−1}(0.5/0.866) = 30° counterclockwise.
♦♦♦
The main rationale for calling it a polar decomposition is due to its analogy to the scalar version, in which a complex number z can be represented by

z = r exp(iθ)

where r is a positive real number and the term exp(iθ) has a magnitude of 1, describing a rotation.

Figure 3.2: The effects of U on the ellipse formed by P on the unit circle.
There is a dual polar decomposition form to (3.38) in which
A = SV (3.39)
where S is the positive definite matrix and V is unitary. This polar decomposition first rotates the vector using V prior to the stretching obtained using S. Here, S = (AA^*)^{1/2} and V = S^{−1}A.
3.9 Proofs
• Proof that eigenvectors of distinct eigenvalues are linearly independent. (cf. property 6, page 72)

Let λ_1, . . . , λ_m be a set of distinct eigenvalues of A[=]n × n, with m ≤ n, and let v_1, . . . , v_m be the corresponding eigenvectors. Now search for a set of coefficients α_1, . . . , α_m such that

∑_{i=1}^m α_i v_i = 0

Premultiplying this equation repeatedly by A yields ∑_{i=1}^m α_i λ_i^k v_i = 0 for k = 0, 1, . . . , m − 1, which can be collected as

( α_1 v_1 | α_2 v_2 | · · · | α_m v_m ) \begin{pmatrix} 1 & λ_1 & \cdots & λ_1^{m−1} \\ \vdots & \vdots & & \vdots \\ 1 & λ_m & \cdots & λ_m^{m−1} \end{pmatrix} = 0

But the second matrix on the left is just a Vandermonde matrix, which is nonsingular because the λ_i are distinct (cf. exercise 13, page 26). Thus

α_1 v_1 = · · · = α_m v_m = 0

or

α_1 = · · · = α_m = 0

since the eigenvectors are nontrivial.
• Proof that ∏ λ_i = |A|. (cf. property 7, page 72)

Using the Schur triangularization procedure, an upper triangular matrix with the eigenvalues on the diagonal will result after a similarity transformation using a unitary matrix U. Since the determinant is invariant under a similarity transformation, and the determinant of a triangular matrix is the product of its diagonal elements, |A| = |U^*AU| = ∏_{i=1}^n λ_i.
for the eigenvectors contains k_1 arbitrary constants (cf. section 2.1, case 2). Thus there are k_1 linearly independent eigenvectors that can be obtained for λ_1. Likewise, there are k_2 linearly independent eigenvectors that can be obtained for λ_2, and so forth. Let the first set of k_1 eigenvectors v_1, . . . , v_{k_1} correspond to λ_1, while the subsequent set of k_2 eigenvectors v_{k_1+1}, . . . , v_{k_1+k_2} correspond to eigenvalue λ_2, and so forth. Each eigenvector from the first set is linearly independent from the other sets of eigenvectors, and the same can be said of the eigenvectors of the other sets. In the end, all the n eigenvectors obtained will form a linearly independent set.
• Proof of theorem 3.3, Cayley-Hamilton theorem: (cf. page 87)

Using the Jordan canonical decomposition, A = TJT^{−1}, where T is the modal matrix and J is a matrix in Jordan canonical form with m Jordan blocks,

charpoly(A) = T charpoly(J) T^{−1} = T \begin{pmatrix} charpoly(J_1) & & \\ & \ddots & \\ & & charpoly(J_m) \end{pmatrix} T^{−1}   (3.40)

The elements of charpoly(J_i) are either 0, charpoly(λ_i), or derivatives of charpoly(λ_i) multiplied by finite scalars. Thus the charpoly(J_i) are zero matrices, and the right hand side of equation (3.40) is a zero matrix.
Exercises
E1. Find the eigenvalues and eigenvectors of A, where

A = \begin{pmatrix} k + 1 & 3 & 0 \\ −1 & 2 & 1 \\ 0 & 1 & 3 \end{pmatrix}
E2. Prove properties 1 to 5 and 12 of eigenvalues and eigenvectors given on page 71.
E3. An n × n companion matrix is one that has the following form:

C = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ −α_0 & −α_1 & −α_2 & \cdots & −α_{n−1} \end{pmatrix}

Show that the characteristic polynomial of C is given by:

λ^n + ∑_{i=0}^{n−1} α_i λ^i = 0
Further, show that the eigenvector corresponding to each distinct eigenvalue λ_i of C is given by

v_i = \begin{pmatrix} 1 \\ λ_i \\ \vdots \\ λ_i^{n−1} \end{pmatrix}
E4. Determine whether or not all sums of normal matrices are also normal.
E5. Determine which matrices below are diagonalizable and which are not. If diagonalizable, obtain matrix T that diagonalizes the matrix; otherwise determine the modal matrix T that produces a similarity transformation to the Jordan canonical form.

a) A = \begin{pmatrix} 1 & −2 & 0 & 2 \\ 2 & 1 & −2 & 0 \\ 0 & 2 & 1 & −2 \\ −2 & 0 & 2 & 1 \end{pmatrix}

b) B = \begin{pmatrix} 4 & 1 & 2 \\ 1 & 3 & 1 \\ −1 & 0 & 2 \end{pmatrix}

c) C = \begin{pmatrix} 1 & 1 & −1 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}
E6. The polar decomposition of a square matrix A can be achieved using two sequences. One sequence, which was discussed in section 3.8.3, A = UP, will first stretch the vector using P and then rotate it with U. The other sequence, which is equally valid, is A = TV, where the vector is first rotated by an orthogonal matrix V and then stretched via a positive definite matrix T.

Obtain an algorithm for the evaluation of T and V. (Hint: T is the square root of AA^T.)