Yale Econ Math Camp
Apr 10, 2018

    MATH CAMP: Lecture 1

    1 Linear Algebra

    Simultaneous Linear Equations:

Example:

    3x_1 + 2x_2 = 6
    6x_1 + 7x_2 = -2

Now solve. Subtract twice the first equation from the second:

    (6x_1 + 7x_2) - 2(3x_1 + 2x_2) = -2 - 2(6)

    3x_2 = -14

    x_2 = -14/3

Substitute this into equation 1:

    3x_1 + 2(-14/3) = 6

    3x_1 = 6 + 28/3 = (18 + 28)/3 = 46/3

    x_1 = 46/9

The next two equations are equivalent to the first two.

    3x_1 + 2x_2 = 6
           3x_2 = -14

These are equivalent to the following pairs of equations.

    x_1 + (2/3)x_2 = 2          x_1 = 2 - (2/3)(-14/3) = 46/9
    x_2 = -14/3           or    x_2 = -14/3

The last pair is said to be row reduced and in echelon form. We want to do this more generally:

    a_11 x_1 + a_12 x_2 + ... + a_1N x_N = y_1
    a_21 x_1 + a_22 x_2 + ... + a_2N x_N = y_2
    ...
    a_M1 x_1 + a_M2 x_2 + ... + a_MN x_N = y_M


The a_mn and y_m are numbers and x_1, ..., x_N are unknowns. In order to be more systematic, we write the equations as

    Ax = y, where

    A = ( a_11 ... a_1N )
        (  .         .  )
        ( a_M1 ... a_MN )

is an M × N matrix,

    x = (x_1, ..., x_N)ᵀ

is an N-vector of unknowns, and

    y = (y_1, ..., y_M)ᵀ

is an M-vector of numbers. Ax is the M-vector

    ( a_11 x_1 + ... + a_1N x_N )
    ( a_21 x_1 + ... + a_2N x_N )
    (            ...            )
    ( a_M1 x_1 + ... + a_MN x_N )

Consider the following so-called elementary row operations on an M × N matrix A:

1. Multiply a row of A by a non-zero number.
2. Replace a row by that row plus c times another row, where c is a non-zero number.
3. Interchange two rows.

If the M × N matrix B is obtained from A by any one of these operations, then A and B are equivalent in the sense that the equations Bx = 0 and Ax = 0 have the same solutions. (Think this through.)

Similarly, if the M × (N + 1) matrix (B | z) is obtained from the M × (N + 1) matrix (A | y) by an elementary row operation, then the systems Bx = z and Ax = y have the same solutions.

Elementary row operations can transform any system Ax = y into a system Bx = z where

a) the first non-zero entry in any row of B is 1, and
b) each column of B that contains the leading non-zero entry of some row has all its other entries 0.

Definition: Such a matrix B is said to be row-reduced.

Example:

    ( 0 1 4 0 )
    ( 0 0 0 0 )
    ( 1 0 3 0 )
    ( 0 0 0 1 )

is row-reduced.


Example:

    ( 3 2 1 ) ( x_1 )   ( 3 )
    ( 6 4 2 ) ( x_2 ) = ( 6 )
    ( 6 8 5 ) ( x_3 )   ( 0 )

→

    ( 3 2 1 ) ( x_1 )   (  3 )
    ( 0 0 0 ) ( x_2 ) = (  0 )
    ( 0 4 3 ) ( x_3 )   ( -6 )

→

    ( 1 2/3 1/3 ) ( x_1 )   (    1 )
    ( 0  0   0  ) ( x_2 ) = (    0 )
    ( 0  1  3/4 ) ( x_3 )   ( -3/2 )

→

    ( 1 2/3 1/3 ) ( x_1 )   (    1 )
    ( 0  1  3/4 ) ( x_2 ) = ( -3/2 )
    ( 0  0   0  ) ( x_3 )   (    0 )

→

    ( 1 0 -1/6 ) ( x_1 )   (    2 )
    ( 0 1  3/4 ) ( x_2 ) = ( -3/2 )
    ( 0 0   0  ) ( x_3 )   (    0 )

The matrix

    ( 1 0 -1/6 )
    ( 0 1  3/4 )
    ( 0 0   0  )

is an example of a row reduced echelon matrix.
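The reduction above can be automated. The following Python sketch is this transcript's addition, not part of the original notes; the function name `rref` is our own. It row-reduces the augmented matrix of the example using only the three elementary row operations, with exact rational arithmetic:

```python
from fractions import Fraction

def rref(rows):
    """Reduce a matrix (a list of rows) to reduced row echelon form
    using only the three elementary row operations."""
    A = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(A), len(A[0])
    r = 0  # index of the next pivot row
    for col in range(n_cols):
        # Find a row at or below r with a non-zero entry in this column.
        pivot = next((i for i in range(r, n_rows) if A[i][col] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]        # operation 3: interchange rows
        p = A[r][col]
        A[r] = [x / p for x in A[r]]           # operation 1: scale a row
        for i in range(n_rows):                # operation 2: add a multiple of a row
            if i != r and A[i][col] != 0:
                c = A[i][col]
                A[i] = [a - c * b for a, b in zip(A[i], A[r])]
        r += 1
    return A

# Augmented matrix (A | y) of the system in the example above.
R = rref([[3, 2, 1, 3],
          [6, 4, 2, 6],
          [6, 8, 5, 0]])
```

Applied to the example, the left block of `R` reproduces the row reduced echelon matrix found above, and the last column carries the transformed right-hand side.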

Definition: A matrix is a row reduced echelon matrix if:

a) it is row reduced,
b) any row of zeros lies below all non-zero rows, and
c) if the first r rows are non-zero and the leading non-zero entry of row m is in column n_m, for m = 1, ..., r, then n_1 < n_2 < ... < n_r.

Example: The matrix

    ( 1 0 3 0 )
    ( 0 1 4 0 )
    ( 0 0 0 1 )
    ( 0 0 0 0 )

is a row reduced echelon matrix.

    Theorem : Every matrix is equivalent to a row reduced echelon matrix.

    Proof : This may be achieved by elementary row operations.

Theorem: If A is an M × N matrix and M < N, then the system Ax = 0 has a non-zero solution.

Proof: Let B be a row reduced echelon matrix equivalent to A; Ax = 0 if and only if Bx = 0. Let r be the number of non-zero rows of B. Then r ≤ M < N, and rows 1, ..., r are the non-zero rows of B. For 1 ≤ m ≤ r, let the leading non-zero entry of row m be in column n_m, where n_1 < n_2 < ... < n_r. Since r < N, there is an n such that 1 ≤ n ≤ N and n ≠ n_m for any m. For such an n, let x_n = 1, and for every other k with 1 ≤ k ≤ N and k ≠ n_m for all m, let x_k = 0. It is now possible to solve for x_{n_1}, ..., x_{n_r} so that Bx = 0. Then Ax = 0 and x ≠ 0.
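A concrete instance of the theorem, in Python (the language and the particular matrix are this transcript's illustration, not the notes'). The matrix is 2 × 3, so M < N and a non-zero solution must exist; following the proof, the free variable is set to 1 and the pivot variables are solved by back-substitution:

```python
from fractions import Fraction

# A is 2x3 (M = 2 < N = 3), so Ax = 0 must have a non-zero solution.
A = [[1, 2, 3],
     [4, 5, 6]]

# Row reduction of A puts pivots in columns 1 and 2, so column 3 is free.
# Set x_3 = 1 and solve for the pivot variables: x = (1, -2, 1).
x = [Fraction(1), Fraction(-2), Fraction(1)]

# Check that Ax = 0.
Ax = [sum(A[m][n] * x[n] for n in range(3)) for m in range(2)]
```

The vector x is non-zero and satisfies both equations, as the theorem guarantees.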

Theorem: If A is an N × N matrix and if Ax = 0 has no non-zero solution, then A is row equivalent to the N × N identity matrix

    I = ( 1     0 )
        (   ...   )
        ( 0     1 )

Proof: Let B be a row reduced echelon matrix that is row equivalent to A. Bx = 0 has no non-zero solution. Hence the number of non-zero rows of B is N. Hence B = I.

    2 Vector Spaces

Consider R^N = { v = (v_1, ..., v_N) | v_n ∈ R, for all n }, where R is the set of real numbers. We can define the following operations on R^N: if v and w belong to R^N, then v + w = (v_1, ..., v_N) + (w_1, ..., w_N) = (v_1 + w_1, ..., v_N + w_N), and if c ∈ R and v ∈ R^N, then cv = c(v_1, ..., v_N) = (cv_1, ..., cv_N). Let 0 = (0, 0, ..., 0) ∈ R^N. Observe that

a) x + y ∈ R^N, if x ∈ R^N and y ∈ R^N;
b) x + y = y + x;
c) there is a 0 ∈ R^N and 0 + x = x, for all x ∈ R^N;
d) for all x ∈ R^N, there is a unique -x ∈ R^N such that x + (-x) = 0;
e) 1x = x, for x ∈ R^N;
f) (c_1 c_2)x = c_1(c_2 x), for all numbers c_1 and c_2 and for all x ∈ R^N;
g) (x + y) + z = x + (y + z), for all x, y, and z in R^N;
h) c(x + y) = cx + cy, for numbers c and for x and y in R^N;
i) (c_1 + c_2)x = c_1 x + c_2 x, for all numbers c_1 and c_2 and for all x ∈ R^N.

Definition: A vector space consists of a set, V, together with operations of addition and multiplication by numbers, denoted x + y and rx, where x ∈ V, y ∈ V, and r ∈ R, and these operations satisfy (a)-(i) above with R^N everywhere replaced by V.

Definition: W is a subspace of V if W ⊆ V and W is a vector space under the operations on V.

Note: W ⊆ V is a subspace of V if v + w ∈ W, for all v, w ∈ W, and if cw ∈ W whenever c ∈ R and w ∈ W.

Example: V = { f : [0, 2π] → R | f(0) = f(2π) } is a vector space under the operations (f + g)(s) = f(s) + g(s), for all s ∈ [0, 2π], and (cf)(s) = cf(s), for all s.

W = { a sin + b cos | a ∈ R and b ∈ R } is a subspace of V. W is the linear span of sin and cos.


Definition: If V is a vector space, v ∈ V is said to be a linear combination of w_1, ..., w_N ∈ V if there are numbers c_1, ..., c_N such that v = c_1 w_1 + ... + c_N w_N.

Definitions: If w_1, ..., w_N ∈ V, their linear span is the set of all linear combinations of w_1, ..., w_N. The linear span of w_1, ..., w_N is a subspace of V and is the smallest subspace containing w_1, ..., w_N. The vectors w_1, ..., w_N are said to span V if V is the linear span of w_1, ..., w_N.

Now I try to get at the idea of the dimension of a vector space.

Definition: Vectors v_1, ..., v_N in V are linearly dependent if there exist numbers c_1, ..., c_N, not all zero, such that c_1 v_1 + ... + c_N v_N = 0.

Definition: Vectors v_1, ..., v_N in V are linearly independent if they are not dependent.

Example: The vectors (1, 0, 0), (0, 1, 0), and (1, 1, 0) are dependent in R³, since

    (1, 0, 0) + (0, 1, 0) - (1, 1, 0) = (0, 0, 0).

The vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1) are independent, since

    (0, 0, 0) = c_1 (1, 0, 0) + c_2 (0, 1, 0) + c_3 (0, 0, 1) = (c_1, c_2, c_3)

implies c_1 = c_2 = c_3 = 0.
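Such checks can be done mechanically: a list of vectors is independent exactly when Gaussian elimination on the matrix whose rows are the vectors leaves no zero row, that is, when the rank equals the number of vectors. A small Python sketch (the code and the helper name `rank` are this transcript's additions, not the notes'):

```python
from fractions import Fraction

def rank(rows):
    """Number of non-zero rows after Gaussian elimination,
    i.e. the dimension of the span of the given vectors."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    r = 0
    for col in range(n):
        pivot = next((i for i in range(r, m) if A[i][col] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, m):
            c = A[i][col] / A[r][col]
            A[i] = [a - c * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

dependent = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]      # rank 2 < 3 vectors
independent = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]    # rank 3 = 3 vectors
```

The dependent triple from the example above has rank 2, while the independent triple has rank 3.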

Example: sin and cos are independent, for suppose that a sin + b cos = 0. Then 0 = a sin(π/2) + b cos(π/2) = a and 0 = a sin(0) + b cos(0) = b, so that a = b = 0. The functions sin + 2 cos and -2 sin - 4 cos are dependent, since 2(sin + 2 cos) + (-2 sin - 4 cos) = 0.

Definition: A basis for a vector space V is a set of linearly independent vectors in V that spans V.

Example: Let

    e_n = (0, ..., 0, 1, 0, ..., 0) ∈ R^N,

with the 1 in the nth slot. Then e_1, ..., e_N is the standard basis for R^N.


Theorem: If v_1, ..., v_M span a vector space V, then any independent set of vectors in V has no more than M elements.

Proof: I must show that if N > M and w_1, ..., w_N are in V, then w_1, ..., w_N are linearly dependent. Since v_1, ..., v_M span V, w_n = Σ_{m=1}^M a_mn v_m, for all n and for some numbers a_1n, ..., a_Mn. If x_1, ..., x_N are numbers, then

    x_1 w_1 + ... + x_N w_N = Σ_{n=1}^N x_n w_n = Σ_{n=1}^N x_n Σ_{m=1}^M a_mn v_m
                            = Σ_{n=1}^N Σ_{m=1}^M a_mn x_n v_m = Σ_{m=1}^M ( Σ_{n=1}^N a_mn x_n ) v_m.

Since N > M, a previous theorem implies that there exist numbers x_1, ..., x_N, not all zero, such that Σ_n a_mn x_n = 0, for m = 1, ..., M. Hence x_1 w_1 + ... + x_N w_N = 0 and w_1, ..., w_N are linearly dependent.

Definition: A vector space is finite dimensional if it has a finite basis.

Corollary: If V is a finite dimensional vector space, then any two bases have the same number of elements.

Proof: If v_1, ..., v_M and w_1, ..., w_N are bases, then N ≤ M and M ≤ N.

Definition: The dimension of V is dim V = the number of vectors in a basis for V.

Lemma: Let v_1, ..., v_M in V be linearly independent and let w in V not belong to the span of v_1, ..., v_M. Then v_1, ..., v_M, w are linearly independent.

Proof: Suppose that c_1 v_1 + ... + c_M v_M + bw = 0. If b ≠ 0, then w = -(c_1/b) v_1 - ... - (c_M/b) v_M ∈ span(v_1, ..., v_M). This contradiction implies that b = 0. Therefore c_1 v_1 + ... + c_M v_M = 0. Since v_1, ..., v_M are linearly independent, c_m = 0, for all m.

Theorem: If W is a subspace of a vector space V of finite and positive dimension and W ≠ V, then dim W < dim V.

Proof: Let M = dim W and N = dim V. I must show that M < N. If W = {0}, then dim W = 0 < dim V. Suppose that W ≠ {0}. If w_1, ..., w_M are linearly independent vectors in W, they are linearly independent in V, and so M ≤ N. Therefore there is a linearly independent set of vectors in W with a largest number of elements, say w_1, ..., w_r. By the previous lemma, w_1, ..., w_r is a basis for W and r = dim W. Since W ≠ V, there is a v in V such that v ∉ W. By the previous lemma, w_1, ..., w_r, v are independent, and hence N ≥ M + 1 > M.


Theorem: If v_1, ..., v_N is a basis for V and v ∈ V, then the numbers c_1, ..., c_N such that v = Σ_{n=1}^N c_n v_n are unique.

Proof:

    Σ_{n=1}^N c_n v_n = v = Σ_{n=1}^N a_n v_n  ⟹  Σ_{n=1}^N (c_n - a_n) v_n = 0  ⟹  c_n - a_n = 0, for all n,

since v_1, ..., v_N are independent.


    MATH CAMP: Lecture 2

Definition: If V and W are vector spaces, T : V → W is linear, or a linear transformation, if T(a v_1 + b v_2) = a T(v_1) + b T(v_2), for all numbers a and b and for all vectors v_1 and v_2 in V.

Example: Let T : R^N → R^M be defined by T(x) = Ax, where A is an M × N matrix. Then T(ax + by) = A(ax + by) = aAx + bAy = aT(x) + bT(y).

Example: If f ∈ V = { f : [0, 2π] → R | f(0) = f(2π) }, let

    (Tf)(s) = f(s + π), if 0 ≤ s ≤ π,
    (Tf)(s) = f(s - π), if π ≤ s ≤ 2π.

T : V → V is linear. If [0, 2π) is thought of as a circle, the transformation T corresponds to rotating the circle counterclockwise 180° and then applying the function f.

Matrices can be used to represent any linear transformation from one finite dimensional vector space to another. Let T : V → W be linear, let v_1, ..., v_N be a basis for V, and let w_1, ..., w_M be a basis for W. If v ∈ V, there are unique numbers x_1, ..., x_N such that v = x_1 v_1 + ... + x_N v_N. Since T(v) ∈ W, there are unique numbers y_1, ..., y_M such that T(v) = y_1 w_1 + ... + y_M w_M. Since T(v_n) ∈ W, for each n, there are unique numbers a_1n, ..., a_Mn such that T(v_n) = a_1n w_1 + ... + a_Mn w_M. Therefore,

    T(v) = Σ_{n=1}^N x_n T(v_n) = Σ_{n=1}^N x_n Σ_{m=1}^M a_mn w_m = Σ_{m=1}^M ( Σ_{n=1}^N a_mn x_n ) w_m = Σ_{m=1}^M y_m w_m.

Let

    y = (y_1, ..., y_M)ᵀ,   x = (x_1, ..., x_N)ᵀ,   A = ( a_11 ... a_1N )
                                                        (  .         .  )
                                                        ( a_M1 ... a_MN )

Then y = Ax. The M × N matrix A represents T in that there is one and only one linear transformation T corresponding to A and one and only one matrix A corresponding to T, given the bases v_1, ..., v_N for V and w_1, ..., w_M for W.

Let S : W → Q be a linear transformation and let q_1, ..., q_J be a basis for Q. Let the J × M matrix B = (b_jm) represent S, so that

    S(w_m) = Σ_{j=1}^J b_jm q_j.


S∘T : V → Q is the linear transformation defined by (S∘T)(v) = S(T(v)). Then

    (S∘T)(v_n) = S(T(v_n)) = S( Σ_{m=1}^M a_mn w_m ) = Σ_{m=1}^M a_mn S(w_m)
               = Σ_{m=1}^M a_mn Σ_{j=1}^J b_jm q_j = Σ_{j=1}^J ( Σ_{m=1}^M b_jm a_mn ) q_j = Σ_{j=1}^J c_jn q_j,

where c_jn = Σ_{m=1}^M b_jm a_mn, so that the J × N matrix C = (c_jn) represents S∘T.

Definition: If A is an M × N matrix and B is a J × M matrix, the matrix C = BA, the product of B and A, is the J × N matrix defined by c_jn = Σ_{m=1}^M b_jm a_mn, and

    ( c_11 ... c_1N )   ( b_11 ... b_1M ) ( a_11 ... a_1N )
    (  .         .  ) = (  .         .  ) (  .         .  ) = BA.
    ( c_J1 ... c_JN )   ( b_J1 ... b_JM ) ( a_M1 ... a_MN )

Example:

    ( 1 -1 0 )   ( 2 3 2 )
    ( 0  1 0 )   ( 0 0 1 )  =  ( 2 3 1 )
                 ( 1 0 0 )     ( 0 0 1 )
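The entry-by-entry rule c_jn = Σ_m b_jm a_mn translates directly into code. This Python sketch is the transcript's addition (the notes contain no code); the matrices echo the example above:

```python
def matmul(B, A):
    """C = BA with c_jn = sum over m of b_jm * a_mn."""
    J, M, N = len(B), len(A), len(A[0])
    return [[sum(B[j][m] * A[m][n] for m in range(M)) for n in range(N)]
            for j in range(J)]

B = [[1, -1, 0],
     [0,  1, 0]]
A = [[2, 3, 2],
     [0, 0, 1],
     [1, 0, 0]]
C = matmul(B, A)   # a 2x3 matrix, since B is 2x3 and A is 3x3
```

Note that the product of a J × M and an M × N matrix is J × N, matching the dimension bookkeeping in the definition.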

Remark: If the M × N matrix A represents T and the J × M matrix B represents S, then the J × N matrix C = BA represents S∘T.

Note: The way in which a product of matrices is grouped does not affect the product. That is, if A is an M × N matrix, B is a J × M matrix, and C is a K × J matrix, then (CB)A = C(BA).

Definition: An N × N matrix A is invertible if there is an N × N matrix A⁻¹ such that

    A⁻¹A = AA⁻¹ = I = ( 1     0 )
                      (   ...   )
                      ( 0     1 )

I is called the N × N identity matrix and represents the identity function id_V : V → V, where V is an N dimensional vector space and id_V(v) = v, for all v ∈ V. Clearly, IA = AI = A, for any N × N matrix A.


Lemma: If A and B are invertible, then AB is invertible and (AB)⁻¹ = B⁻¹A⁻¹.

Proof:

    (B⁻¹A⁻¹)(AB) = B⁻¹(A⁻¹A)B = B⁻¹IB = B⁻¹B = I,
    (AB)(B⁻¹A⁻¹) = A(BB⁻¹)A⁻¹ = AIA⁻¹ = AA⁻¹ = I.
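A quick numerical check of the lemma in Python (an illustration added to this transcript; the particular 2 × 2 matrices are arbitrary invertible examples), using the adjugate formula for 2 × 2 inverses:

```python
from fractions import Fraction

def mul2(X, Y):
    # Product of two 2x2 matrices.
    return [[X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]],
            [X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]]]

def inv2(X):
    # Inverse of a 2x2 matrix via the adjugate formula.
    d = Fraction(X[0][0]*X[1][1] - X[0][1]*X[1][0])
    return [[X[1][1]/d, -X[0][1]/d],
            [-X[1][0]/d, X[0][0]/d]]

A = [[2, 1], [1, 1]]
B = [[1, 3], [0, 1]]

lhs = inv2(mul2(A, B))         # (AB)^{-1}
rhs = mul2(inv2(B), inv2(A))   # B^{-1} A^{-1}, inverses in reversed order
```

Reversing the order matters: mul2(inv2(A), inv2(B)) would in general not invert AB.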

Definition: A function f : V → W is invertible if there exists f⁻¹ : W → V such that f∘f⁻¹ = id_W and f⁻¹∘f = id_V. That is, f(f⁻¹(w)) = w, for all w ∈ W, and f⁻¹(f(v)) = v, for all v ∈ V.

Definition: f : V → W is onto if for every w ∈ W there exists a v ∈ V such that f(v) = w.

Definition: f : V → W is one to one if for every v ∈ V and v′ ∈ V such that v ≠ v′, f(v) ≠ f(v′).

Remarks:

1. f : V → W is onto if and only if there exists f⁻¹ : W → V such that f(f⁻¹(w)) = w, for all w ∈ W.

2. f is one to one if and only if there exists f⁻¹ : f(V) → V such that f⁻¹(f(v)) = v, for all v ∈ V, where f(V) = { f(v) | v ∈ V }.


    3. f is one to one and onto, if and only if f is invertible.

Theorem: If T : V → W is an invertible linear transformation, then T⁻¹ is linear.

Proof: Let w_1, w_2 ∈ W and c_1, c_2 ∈ R. Let v_1 = T⁻¹(w_1) and v_2 = T⁻¹(w_2). Since T is linear, T(c_1 v_1 + c_2 v_2) = c_1 T(v_1) + c_2 T(v_2) = c_1 w_1 + c_2 w_2. Therefore,

    c_1 T⁻¹(w_1) + c_2 T⁻¹(w_2) = c_1 v_1 + c_2 v_2 = T⁻¹(T(c_1 v_1 + c_2 v_2)) = T⁻¹(c_1 w_1 + c_2 w_2).

Proposition: Let T : V → V be a linear transformation and let v_1, ..., v_N be a basis for V. If A is the N × N matrix representing T with respect to v_1, ..., v_N, then T is invertible if and only if A is invertible.

Proof: If T is invertible and A⁻¹ denotes the matrix representing T⁻¹, then id_V = T⁻¹∘T, A⁻¹A represents T⁻¹∘T, and I represents id_V. Hence A⁻¹A = I. Similarly AA⁻¹ = I.

If A is invertible, A⁻¹ represents a linear transformation T⁻¹ : V → V and A⁻¹A = I = AA⁻¹, which implies that T⁻¹∘T = T∘T⁻¹ = id_V.

Theorem: If A is an N × N matrix, then the following are equivalent:

i) A is invertible,
ii) there is an N × N matrix B such that BA = I, and
iii) the system Ax = 0 has no non-zero solution.

Proof: (i) ⟹ (ii). Let B = A⁻¹.

(ii) ⟹ (iii). If BA = I and Ax = 0, then 0 = BAx = Ix = x, so that x is zero.

(iii) ⟹ (i). By a previous theorem, A is equivalent to the N × N identity matrix I. Equivalence is established via elementary row operations. Each elementary row operation on A corresponds to left multiplication by an invertible matrix P. I check this statement.


a) Multiplication of the rth row of A by c ≠ 0 corresponds to PA, where P is the identity matrix with the diagonal entry in row r, column r replaced by c:

    P = ( 1           )
        (   ...       )
        (     c       )   ← row r
        (       ...   )
        (           1 )

P⁻¹ is the same matrix with c replaced by 1/c.

b) Replacement of the rth row of A by row r plus c times row s, where c ≠ 0, corresponds to PA, where P is the identity matrix with an additional entry c in row r, column s:

    P = I + c E_rs,

where E_rs denotes the matrix with 1 in row r, column s and 0 elsewhere. The inverse undoes the operation by subtracting c times row s:

    P⁻¹ = I - c E_rs.

c) Interchange of rows r and s of A corresponds to multiplication of A on the left by the matrix P obtained from the identity matrix by interchanging rows r and s: P has a 1 in row r, column s, a 1 in row s, column r, 1 in every other diagonal position, and 0 elsewhere. Here

    P⁻¹ = P.

So I = P_1 P_2 ⋯ P_Q A, where P_q is invertible, for all q. Let P = P_1 P_2 ⋯ P_Q. P⁻¹ = P_Q⁻¹ ⋯ P_1⁻¹, so that P is invertible. I = PA, since A is row reduced to I via left multiplication by the matrix P. P⁻¹ = P⁻¹PA = (P⁻¹P)A = IA = A. Therefore, I = P⁻¹P = AP. Since PA = I = AP, A is invertible.

Corollary: If A is an N × N matrix and BA = I, for some N × N matrix B, then B = A⁻¹.

Proof: By the theorem, A is invertible. Therefore, BA = I implies that (BA)A⁻¹ = A⁻¹, so that B = BI = B(AA⁻¹) = (BA)A⁻¹ = A⁻¹.
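The proof of (iii) ⟹ (i) suggests an algorithm: the row operations that reduce A to I, applied simultaneously to I, produce A⁻¹, since both amount to left multiplication by P = A⁻¹. A Python sketch added by this transcript's editor (the name `inverse` is ours), row-reducing the augmented matrix (A | I):

```python
from fractions import Fraction

def inverse(A):
    """Invert A by row-reducing the augmented matrix (A | I).
    When the left block reaches I, the right block is A^{-1}."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(1 if i == j else 0) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Assumes A is invertible, so a pivot always exists.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                c = M[r][col]
                M[r] = [a - c * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

A = [[2, 1],
     [5, 3]]
A_inv = inverse(A)   # the right block after reduction
```

For this A, the right block comes out as ((3, -1), (-5, 2)), and one can check directly that it is a two-sided inverse of A.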

Definition: If T : V → W is a function, the image or range of T is { T(v) | v ∈ V }.

Definition: If T : V → W is a linear transformation, the null space or kernel of T is { v ∈ V | T(v) = 0 }.

Theorem: If T : V → W is a linear transformation, then the range of T is a subspace of W and the kernel of T is a subspace of V.

Proof: c_1 T(v_1) + c_2 T(v_2) = T(c_1 v_1 + c_2 v_2), so the range is closed under linear combinations. T(v_1) = 0 and T(v_2) = 0 imply that T(c_1 v_1 + c_2 v_2) = c_1 T(v_1) + c_2 T(v_2) = c_1 (0) + c_2 (0) = 0.


Definition: If T : V → W is a linear transformation, the rank of T is the dimension of the range of T and the nullity of T is the dimension of the null space of T.

Definition: If A is an M × N matrix, the column rank of A is the dimension of the linear span of the columns of A, and the row rank of A is the dimension of the linear span of the rows of A.

    I will later show that the row rank of A equals its column rank.

Theorem: Let T : V → W be a linear transformation. Then rank T + nullity T = dim V.

Proof: Let v_1, ..., v_K be a basis for the null space of T. Extend v_1, ..., v_K to a basis v_1, ..., v_K, v_{K+1}, ..., v_N of V. I show that T(v_{K+1}), ..., T(v_N) is a basis for the range of T. The vectors T(v_1), ..., T(v_N) span the range of T. Since T(v_n) = 0 if n ≤ K, T(v_{K+1}), ..., T(v_N) span the range of T. I show that T(v_{K+1}), ..., T(v_N) are independent, so that T(v_{K+1}), ..., T(v_N) is a basis for the range of T and hence rank T = N - K:

    Σ_{n=K+1}^N c_n T(v_n) = 0 = T( Σ_{n=K+1}^N c_n v_n )
    ⟹ Σ_{n=K+1}^N c_n v_n = Σ_{n=1}^K b_n v_n, for some numbers b_1, ..., b_K
    ⟹ Σ_{n=1}^K b_n v_n - Σ_{n=K+1}^N c_n v_n = 0
    ⟹ b_n = 0 and c_n = 0, for all n, since v_1, ..., v_N are independent.

Therefore, rank of T + nullity of T = (N - K) + K = N = dim V.

If T : V → W is a linear transformation, then for any w ∈ W, T⁻¹(w) = v + T⁻¹(0), where v is any vector in V such that T(v) = w and where v + T⁻¹(0) = { v + z | z ∈ T⁻¹(0) }. In order to see that this is so, let z ∈ T⁻¹(w). Then T(z - v) = T(z) - T(v) = w - w = 0. Hence z = v + (z - v) ∈ v + T⁻¹(0). Similarly, any point in v + T⁻¹(0) belongs to T⁻¹(w). It is possible to visualize the meaning of the assertion T⁻¹(w) = v + T⁻¹(0) by considering a linear transformation T : R² → R.


The function T portrayed in this diagram may be thought of as a projection of R² onto the vertical axis followed by a linear function from the vertical axis onto R.

Definition: If T : V → W is a linear transformation, T is non-singular if the kernel of T is {0}.

Remark: T is non-singular if and only if T is one to one, since T(v_1) = T(v_2) if and only if T(v_1 - v_2) = 0.

Lemma: If T : V → W is a linear transformation, then T is non-singular if and only if T(v_1), ..., T(v_N) are linearly independent whenever v_1, ..., v_N are linearly independent.

Proof: Suppose that T is non-singular. If v_1, ..., v_N are linearly independent, then

    c_1 T(v_1) + ... + c_N T(v_N) = 0 ⟹ T(c_1 v_1 + ... + c_N v_N) = 0
    ⟹ c_1 v_1 + ... + c_N v_N = 0 ⟹ c_1 = c_2 = ... = c_N = 0.

Suppose T carries independent vectors to independent vectors. Let v ≠ 0, where v ∈ V. The vector v by itself is independent. Therefore T(v) is independent. Therefore T(v) ≠ 0, since 0 is dependent. Therefore the kernel of T is {0}.

Theorem: Let T : V → W be linear and suppose that dim V = dim W. Then the following are equivalent.

1) T is invertible.
2) T is non-singular.
3) T is onto.
4) If v_1, ..., v_N is a basis of V, then T(v_1), ..., T(v_N) is a basis of W.
5) There is a basis v_1, ..., v_N of V such that T(v_1), ..., T(v_N) is a basis of W.


Proof: 1 ⟹ 2. Obvious.

2 ⟹ 3. Suppose that T is non-singular. Let v_1, ..., v_N be a basis of V. By the previous lemma, T(v_1), ..., T(v_N) are independent. Since dim W = N, T(v_1), ..., T(v_N) is a basis of W. If w ∈ W, w = c_1 T(v_1) + ... + c_N T(v_N) = T(c_1 v_1 + ... + c_N v_N). Therefore T is onto.

3 ⟹ 4. Let v_1, ..., v_N be a basis of V. Since these vectors span V and T is onto, T(v_1), ..., T(v_N) span W. Since dim W = N, T(v_1), ..., T(v_N) are independent. Therefore T(v_1), ..., T(v_N) is a basis of W.

4 ⟹ 5. Obvious.

5 ⟹ 1. Suppose that there is a basis v_1, ..., v_N of V such that T(v_1), ..., T(v_N) is a basis of W. Then rank T = dim W = dim V. Therefore, by a previous theorem, nullity of T = 0. Therefore T is one to one. Since rank T = dim W, T is onto. Therefore T is invertible.


    MATH CAMP: Lecture 3

Lemma: Let w_1, ..., w_M be independent vectors in the vector space W. Then the vectors u_1 = Σ_{m=1}^M a_m1 w_m, ..., u_K = Σ_{m=1}^M a_mK w_m are independent if and only if the column vectors

    (a_11, a_21, ..., a_M1), (a_12, a_22, ..., a_M2), ..., (a_1K, a_2K, ..., a_MK)

are independent.

Proof: Suppose that u_1, ..., u_K are independent. Then

    0 = c_1 (a_11, ..., a_M1) + ... + c_K (a_1K, ..., a_MK) = ( Σ_{k=1}^K c_k a_1k, ..., Σ_{k=1}^K c_k a_Mk )

implies that 0 = Σ_{m=1}^M (0) w_m = Σ_{m=1}^M Σ_{k=1}^K c_k a_mk w_m = Σ_{k=1}^K c_k Σ_{m=1}^M a_mk w_m = Σ_{k=1}^K c_k u_k, which in turn implies that c_1 = c_2 = ... = c_K = 0, since u_1, ..., u_K are independent. Hence the vectors (a_11, ..., a_M1), ..., (a_1K, ..., a_MK) are independent.

Suppose that (a_11, ..., a_M1), ..., (a_1K, ..., a_MK) are independent. To show that u_1, ..., u_K are independent, suppose that 0 = Σ_{k=1}^K c_k u_k = Σ_{k=1}^K c_k Σ_{m=1}^M a_mk w_m = Σ_{m=1}^M Σ_{k=1}^K c_k a_mk w_m, which implies that Σ_{k=1}^K c_k a_mk = 0, for all m, since w_1, ..., w_M are independent. Therefore,

    c_1 (a_11, ..., a_M1) + ... + c_K (a_1K, ..., a_MK) = 0,

which implies that c_1 = c_2 = ... = c_K = 0, so that u_1, ..., u_K are independent.

Corollary: If the M × N matrix A represents the linear transformation T : V → W, then rank T = column rank of A.

Proof: Let v_1, ..., v_N be a basis for V and let w_1, ..., w_M be the basis for W such that A is the representation of T with respect to these bases. Let K be the rank of T. Then dim(span(T(v_1), ..., T(v_N))) = K. There exist K of the vectors v_1, ..., v_N, say v_1, ..., v_K, such that T(v_1), ..., T(v_K) is a basis for the range of T, which equals the span of T(v_1), ..., T(v_N). By the lemma, the column vectors

    (a_11, a_21, ..., a_M1), ..., (a_1K, a_2K, ..., a_MK)

are independent. If n > K,

    Σ_{m=1}^M a_mn w_m = T(v_n) = Σ_{k=1}^K c_k T(v_k) = Σ_{k=1}^K c_k Σ_{m=1}^M a_mk w_m = Σ_{m=1}^M Σ_{k=1}^K c_k a_mk w_m,

for some numbers c_1, ..., c_K. Since w_1, ..., w_M are independent, a_mn = Σ_{k=1}^K c_k a_mk, for all m. That is, (a_1n, a_2n, ..., a_Mn) is in the linear span of (a_11, ..., a_M1), ..., (a_1K, ..., a_MK), and so these vectors are a basis for the linear span of the columns of A.

Duality

Definition: If V is a vector space, a linear functional on V is a linear function f : V → R. The set of all linear functionals on V is called the dual space of V and is denoted by V*.


Remark: V* is a vector space, where if f ∈ V* and g ∈ V* and a and b are numbers, af + bg : V → R is defined by (af + bg)(v) = af(v) + bg(v).

Let v_1, ..., v_N be a basis for V and let f ∈ V*. Then (f(v_1), ..., f(v_N)) is the matrix representation of f, so that if v = Σ_{n=1}^N c_n v_n ∈ V, then

    f(v) = Σ_{n=1}^N c_n f(v_n) = (f(v_1), ..., f(v_N)) (c_1, ..., c_N)ᵀ.

We may identify f with the N-vector (f(v_1), ..., f(v_N)).

For n = 1, ..., N, let f_n : V → R be defined by f_n(c_1 v_1 + ... + c_N v_N) = c_n. Then f_n ∈ V* and

    f_n(v_m) = 1, if m = n, and 0 otherwise.

That is, f_n(v_m) = δ_nm, where δ_nm is the Kronecker delta, so that the matrix representation of f_n is (f_n(v_1), ..., f_n(v_N)) = (0, ..., 0, 1, 0, ..., 0) = e_n, where e_n is the nth standard basis vector. If f ∈ V*, f = Σ_{n=1}^N f(v_n) f_n, so that f_1, ..., f_N span V*. The functions f_1, ..., f_N are linearly independent, for if Σ_n a_n f_n = 0, then 0 = (Σ_n a_n f_n)(v_k) = a_k f_k(v_k) = a_k, for all k. Therefore f_1, ..., f_N is a basis for V*. It is called the dual basis to v_1, ..., v_N. Therefore, dim V* = N = dim V.

Example: Let V = R^N and let e_1, ..., e_N be the standard basis of R^N. The dual basis of V* = (R^N)* is f_1, ..., f_N, where, for all n and k, f_n(e_k) = δ_nk. If y ∈ V*, y = Σ_{n=1}^N y_n f_n, for some numbers y_1, ..., y_N. If x ∈ R^N,

    y(x) = y( Σ_{k=1}^N x_k e_k ) = Σ_{n=1}^N y_n f_n( Σ_{k=1}^N x_k e_k ) = Σ_{n=1}^N y_n x_n,

since f_n( Σ_{k=1}^N x_k e_k ) = x_n. Therefore y may be identified with the vector (y_1, ..., y_N) and y(x) = Σ_{n=1}^N y_n x_n. Hence V* may be identified with R^N.

Definition: If V is a vector space and S is a subset of V, the annihilator of S is the set S° of linear functionals f on V such that f(v) = 0, for all v ∈ S.

Remark: S° is a subspace of V*.

Example: If W = { (t, ..., t) ∈ R^N | t ∈ R }, W° may be identified with { (y_1, ..., y_N) ∈ R^N | Σ_{n=1}^N y_n = 0 }.

Theorem: If V is a finite dimensional vector space and W is a subspace of V, then dim W + dim W° = dim V.


Proof: Let v_1, ..., v_K be a basis for W. Extend v_1, ..., v_K to a basis v_1, ..., v_K, v_{K+1}, ..., v_N of V. Let f_1, ..., f_N be the basis for V* dual to v_1, ..., v_N.

I show you that f_{K+1}, ..., f_N is a basis for W°. If n ≥ K + 1, f_n ∈ W°, since f_n(v_m) = 0, for m ≤ K, and any w ∈ W can be written w = Σ_{m=1}^K a_m v_m, for some numbers a_1, ..., a_K. The functions f_{K+1}, ..., f_N are linearly independent, since f_1, ..., f_N is a basis for V*. In order to show that f_{K+1}, ..., f_N is a basis for W°, it is sufficient to show that they span W°. If f ∈ V*, f = Σ_{n=1}^N f(v_n) f_n. If f ∈ W°, f(v_n) = 0, for n ≤ K. Therefore f = Σ_{n=K+1}^N f(v_n) f_n, and so f_{K+1}, ..., f_N span W°.

Theorem: If A is an M × N matrix, its row rank equals its column rank.

Proof: Let W ⊆ R^N be the linear span of the rows of A and let K = dim W = row rank of A. Then dim W° = N - K. W° may be viewed as a subset of R^N under the identification of (R^N)* with R^N. Under this identification, W° is the set of all solutions x of the equation Ax = 0.

Let T : R^N → R^M be the linear transformation with matrix representation A with respect to the standard bases of R^N and R^M. Then W° is the null space of T, and the range of T is the linear span of the columns of A. Therefore the column rank of A equals the rank of T. We know that rank of T + nullity of T = N. Therefore the column rank of A = rank of T = N - nullity of T = N - dim W° = N - (N - K) = K = row rank of A.
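The equality can be observed numerically by running Gaussian elimination once on the rows of A and once on the rows of its transpose (whose rows are the columns of A). This Python sketch is the transcript's illustration; the matrix is an arbitrary example whose second row is twice the first:

```python
from fractions import Fraction

def rank(rows):
    """Dimension of the span of the given vectors, by Gaussian elimination."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    r = 0
    for col in range(n):
        pivot = next((i for i in range(r, m) if A[i][col] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, m):
            c = A[i][col] / A[r][col]
            A[i] = [a - c * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]
row_rank = rank(A)                             # span of the rows of A
col_rank = rank([list(c) for c in zip(*A)])    # span of the columns of A
```

Both come out equal, as the theorem guarantees for every matrix.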

Inner Product

Definition: The standard inner product on R^N is the function · : R^N × R^N → R defined by

    x·y = Σ_{n=1}^N x_n y_n.

In this definition, the symbol × stands for Cartesian product.

Definition: If A and B are sets, the Cartesian product of A and B is A × B = { (a, b) | a ∈ A and b ∈ B }.

Definition: If x ∈ R^N, ||x|| = √(x·x) = √(x_1² + ... + x_N²) = the norm of x or the length of x.

Remarks:

1. If a and b are numbers and x, y, and z belong to R^N, then x·y = y·x and x·(ay + bz) = a x·y + b x·z. (These equations are easy to verify.)

2. If x ∈ R^N and y ∈ R^N and θ is the angle between x and y, then cos θ = x·y / (||x|| ||y||). (This equation is a little harder to verify.)


3. |x·y| ≤ ||x|| ||y||. This is called the Cauchy-Schwarz inequality. It follows from (2).

4. x is perpendicular or orthogonal to y if and only if cos θ = 0, if and only if x·y = 0.

5. x·x = Σ_{n=1}^N x_n² ≥ 0, and x·x = 0 implies that x = 0.

6. If y ∈ R^N, f(x) = y·x is a linear functional on R^N. Under the identification of (R^N)* with R^N mentioned earlier, y ∈ R^N is identified with the linear functional f(x) = y·x on R^N.

Definition: If S is a subset of R^N, let S⊥ = { y ∈ R^N | y·x = 0, for all x ∈ S }. S⊥ is called the orthogonal complement of S. Under the identification of R^N with (R^N)*, S⊥ is identified with S°.
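The Cauchy-Schwarz inequality of remark 3 is easy to probe numerically. A Python sketch added to this transcript (the random sampling scheme is our choice, not the notes'):

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

random.seed(0)
holds = True
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(5)]
    y = [random.uniform(-10, 10) for _ in range(5)]
    # |x.y| <= ||x|| ||y||, with a tiny tolerance for floating point error.
    if abs(dot(x, y)) > norm(x) * norm(y) * (1 + 1e-12):
        holds = False

# Remark 4: the standard basis vectors of R^2 are orthogonal.
perp = dot([1, 0], [0, 1])
```

Random trials are evidence, not a proof; the proof comes from remark 2, since |cos θ| ≤ 1.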

Remarks:

1. If W is a subspace of R^N, dim W⊥ = dim W° = N - dim W.

2. If W is a subspace of V, which is a subspace of R^N, we can write W⊥ = { v ∈ V | v·w = 0, for all w ∈ W } as the orthogonal complement of W in V (rather than in R^N). Every vector v ∈ V defines the linear functional f on V given by f(v′) = v·v′. Therefore, under the identification of R^N with (R^N)*, we have that V ⊆ V*. Since dim V* = dim V, V = V*. Hence W⊥ is identified with W° and dim W⊥ = dim V - dim W.


    MATH CAMP: Lecture 4

    Orthogonal Projections

Let W be a subspace of V, which is a subspace of R^N.

Definition: An orthogonal projection π : V → W is a linear function π : V → W such that v − π(v) ∈ W⊥, for all v ∈ V. That is, [v − π(v)]·w = 0, for all w ∈ W.

Example: V = R², W = {(t, t) | t is a number}, W⊥ = {(t, −t) | t ∈ R}. π(4, 14) = (9, 9), since (4, 14) − (9, 9) = (−5, 5) ∈ W⊥.

    Figure 1
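The projection in the example can be reproduced numerically. This is a sketch for the special case of a line in R², using the formula π(v) = ((v·w)/(w·w))w, a standard consequence of the definition rather than anything specific to these notes.

```python
def project_onto_line(v, w):
    """Orthogonal projection of v onto the line spanned by w in R^2."""
    dot = lambda a, b: a[0] * b[0] + a[1] * b[1]
    c = dot(v, w) / dot(w, w)
    return (c * w[0], c * w[1])

v = (4.0, 14.0)
w = (1.0, 1.0)                      # W = {(t, t)}
p = project_onto_line(v, w)
print(p)                            # (9.0, 9.0)
r = (v[0] - p[0], v[1] - p[1])
print(r)                            # (-5.0, 5.0), a point of W-perp
print(r[0] * w[0] + r[1] * w[1])    # 0.0: v - pi(v) is orthogonal to W
```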

    Theorem: An orthogonal projection exists and is unique.

Proof: Let v_1,…,v_K be a basis for W and let v_{K+1},…,v_M be a basis for W⊥ ∩ V, where M = dim V and K = dim W. I show that v_1,…,v_M is a basis for V. First I show that v_1,…,v_M are independent. Suppose that Σ_{n=1}^M c_n v_n = 0. Then c_1 v_1 + ⋯ + c_K v_K = −c_{K+1} v_{K+1} − ⋯ − c_M v_M. Hence, w = c_1 v_1 + ⋯ + c_K v_K ∈ W ∩ W⊥. Therefore, 0 = w·w, so that w = 0. Since v_1,…,v_K are independent, c_1 = ⋯ = c_K = 0. Since 0 = −c_{K+1} v_{K+1} − ⋯ − c_M v_M and v_{K+1},…,v_M are independent, c_{K+1} = ⋯ = c_M = 0. Therefore, v_1,…,v_M are independent. Since M = dim V, v_1,…,v_M is a basis for V.

If v ∈ V, then v = Σ_{n=1}^M c_n v_n. Let π(v) = Σ_{n=1}^K c_n v_n. Then, v − π(v) = Σ_{n=K+1}^M c_n v_n. Since v_n ∈ W⊥, for n > K, v − π(v) ∈ W⊥. Hence, π(v) exists.

In order to show that π(v) is unique, suppose that v ∈ V and v = w + (v − w), where w ∈ W and v − w ∈ W⊥. Then, w = Σ_{n=1}^K a_n v_n and v − w = Σ_{n=K+1}^M a_n v_n,


since v_1,…,v_K is a basis for W and v_{K+1},…,v_M is a basis for W⊥. Therefore, v = w + (v − w) = Σ_{n=1}^M a_n v_n. Since v_1,…,v_M is a basis for V, the numbers a_1,…,a_M are unique. Therefore, w = Σ_{n=1}^K a_n v_n = π(v).

Orthonormal Bases

Definition: A set of vectors v_1,…,v_M in R^N is said to be orthogonal if v_n·v_m = 0, whenever n ≠ m.

    Theorem: Orthogonal non-zero vectors are linearly independent.

Proof: Let v_1,…,v_M be orthogonal and suppose that 0 = Σ_{n=1}^M c_n v_n. For all k, 0 = v_k·0 = v_k·(Σ_{n=1}^M c_n v_n) = Σ_{n=1}^M c_n (v_k·v_n) = c_k (v_k·v_k). Since v_k·v_k > 0, c_k = 0. Hence c_k = 0, for all k.

Definition: A basis v_1,…,v_M for V is said to be orthonormal if it is orthogonal and if v_n·v_n = 1, for all n.

Remark: If v_1,…,v_M is an orthonormal basis for V, then for any v ∈ V, v = Σ_{n=1}^M (v·v_n)v_n; for if v = Σ_{n=1}^M a_n v_n, then, for any k, v·v_k = Σ_{n=1}^M a_n (v_n·v_k) = a_k (v_k·v_k) = a_k.

Remark: If v_1,…,v_M is an orthonormal basis for V and v ∈ V, then Σ_{n=1}^K (v·v_n)v_n is the orthogonal projection of v onto the linear span of v_1,…,v_K.

Theorem: Every vector subspace V of R^N has an orthonormal basis.

Proof: Let y_1,…,y_M be a basis for V. I define the v_k by induction on k. Let v_1 = y_1/√(y_1·y_1). Then, v_1·v_1 = 1. Suppose we are given v_1,…,v_k such that v_n·v_n = 1, if n ≤ k, v_1,…,v_k are orthogonal, and v_n is a linear combination of y_1,…,y_n, for n = 1,…,k. Let w_{k+1} = y_{k+1} − (y_{k+1}·v_1)v_1 − ⋯ − (y_{k+1}·v_k)v_k. Then, w_{k+1} ≠ 0, for otherwise y_1,…,y_{k+1} would be linearly dependent, and this would contradict the independence of y_1,…,y_M. If n ≤ k, w_{k+1}·v_n = y_{k+1}·v_n − (y_{k+1}·v_n)(v_n·v_n) = y_{k+1}·v_n − y_{k+1}·v_n = 0. Let v_{k+1} = w_{k+1}/√(w_{k+1}·w_{k+1}). Then v_{k+1}·v_n = 0, if n ≤ k, and v_{k+1}·v_{k+1} = 1. This completes the induction and hence the definition of v_1,…,v_M. Since v_1,…,v_M are independent and dim V = M, it follows that v_1,…,v_M is a basis for V.

The construction just presented is called the Gram–Schmidt orthogonalization process. Notice that in the inductive construction, w_{k+1} is the difference between


y_{k+1} and the projection of y_{k+1} onto the linear span of v_1,…,v_k, which equals the linear span of y_1,…,y_k.
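The inductive construction in the proof translates directly into code. The following is a minimal sketch in pure Python (the input vectors are an example of my choosing, and no attention is paid to numerical stability):

```python
import math

def gram_schmidt(ys):
    """Orthonormalize independent vectors y_1, ..., y_M, following the
    construction in the proof: subtract from each y_{k+1} its projection
    onto the span of v_1, ..., v_k, then normalize."""
    vs = []
    for y in ys:
        w = list(y)
        for v in vs:
            c = sum(yi * vi for yi, vi in zip(y, v))   # y_{k+1} . v_n
            w = [wi - c * vi for wi, vi in zip(w, v)]
        n = math.sqrt(sum(wi * wi for wi in w))
        vs.append([wi / n for wi in w])
    return vs

v1, v2 = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
print(sum(a * b for a, b in zip(v1, v2)))   # ~0.0: orthogonal
print(sum(a * a for a in v1))               # ~1.0: unit length
```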

    Determinants

Definition: A permutation of {1,…,N} is a one-to-one and onto function σ : {1,…,N} → {1,…,N}.

Every permutation can be expressed as a succession of interchanges of pairs. In Figure 2, the permutation on the left is the result of the three successive interchanges shown on the right. A permutation can be expressed in many ways as a succession of pairwise interchanges, but the number of pairwise interchanges for one permutation is either always odd or always even.

    Figure 2

Say that σ is odd if the number of interchanges is odd. Otherwise, σ is even. Let the sign of σ be sgn σ = 1, if σ is even, and sgn σ = −1, if σ is odd.

Let A = (a_{mn}) be an N × N matrix. The determinant of A is

det A = Σ_{σ a permutation of {1,…,N}} (sgn σ) a_{1,σ(1)} a_{2,σ(2)} ⋯ a_{N,σ(N)}.

That is, pick one entry from each row, every time from a different column, and multiply these N numbers together. The choice of columns defines a permutation of {1,…,N}. Multiply the product by the sign of this permutation. Add these products over all possible permutations. The sum is the determinant.
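The permutation-sum formula can be transcribed directly into code; the sign is computed by counting inversions, whose parity equals the parity of the number of pairwise interchanges. This brute-force sketch costs N! terms, so it is for illustration only.

```python
import math
from itertools import permutations

def sgn(p):
    """Sign of a permutation: +1 if even, -1 if odd (by inversion count)."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det(a):
    """Determinant via the permutation-sum formula above."""
    n = len(a)
    return sum(sgn(p) * math.prod(a[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

print(det([[1, 2], [3, 4]]))                     # 1*4 - 3*2 = -2
print(det([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))    # det I = 1
```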


If N = 1, A = (a_{11}) and det A = a_{11}. If N = 2, then A = ( a_{11} a_{12} ; a_{21} a_{22} ) and det A = a_{11}a_{22} − a_{21}a_{12}.

For any N, det I = 1, where I is the N × N identity matrix, the matrix with ones on the main diagonal and zeros elsewhere; I has N rows and N columns.

The determinant of an N × N matrix A may be considered to be a function of the N rows of A, each of which is a vector in R^N. In order to describe this function, I need the following notation.

Definition: If S is a set and K is a positive integer, let

S^K = S × S × ⋯ × S  (K times),

where × is the Cartesian product.

The determinant of N × N matrices is then a function det : R^N × ⋯ × R^N → R. It should be clear from the formula for the determinant that, for any n,

det(a_1,…,a_{n−1}, c a_n + d b_n, a_{n+1},…,a_N) = c det(a_1,…,a_{n−1}, a_n, a_{n+1},…,a_N) + d det(a_1,…,a_{n−1}, b_n, a_{n+1},…,a_N),

where a_1,…,a_N and b_n are N-vectors and c and d are numbers. Such a function is said to be multilinear.

Definition: A multilinear form on a vector space V is a function f : V^K → R such that, for k = 1,…,K, f(v_1,…,v_{k−1}, v_k, v_{k+1},…,v_K) is a linear function of v_k, where v_n is held fixed for n ≠ k.

If we interchange two rows of A, we change the sign of det A. For suppose that A′ is obtained from A by interchanging rows n and k, where n ≠ k. Let τ be the


permutation of {1,…,N} that interchanges n and k. Then

det A′ = Σ_σ (sgn σ) a′_{1,σ(1)} ⋯ a′_{N,σ(N)}
       = Σ_σ (sgn σ) a_{1,σ∘τ(1)} ⋯ a_{N,σ∘τ(N)}
       = −Σ_σ sgn(σ∘τ) a_{1,σ∘τ(1)} ⋯ a_{N,σ∘τ(N)}
       = −Σ_ρ (sgn ρ) a_{1,ρ(1)} ⋯ a_{N,ρ(N)} = −det A.

    A multilinear form with this property is said to be alternating.
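For N = 2 the alternating property is easy to see directly from the formula det A = a_{11}a_{22} − a_{21}a_{12}; a quick numerical check:

```python
def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix [[a, b], [c, d]]."""
    return a * d - c * b

print(det2(1, 2, 3, 4))   # -2
print(det2(3, 4, 1, 2))   # 2: interchanging the two rows flips the sign
```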

Definition: The multilinear form f : V^N → R is alternating if f(v_1,…,v_{k−1}, v_k, v_{k+1},…,v_{n−1}, v_n, v_{n+1},…,v_N) = −f(v_1,…,v_{k−1}, v_n, v_{k+1},…,v_{n−1}, v_k, v_{n+1},…,v_N), for any k < n.

Definition: The quadratic form q is said to be

1. positive definite if q(v) > 0, for all v ≠ 0;
2. positive semi-definite if q(v) ≥ 0, for all v;
3. negative definite if q(v) < 0, for all v ≠ 0; and
4. negative semi-definite if q(v) ≤ 0, for all v.

The same definitions apply to symmetric N × N matrices, for each of these represents a symmetric bilinear form. Thus, the symmetric N × N matrix A is positive definite if vᵀAv > 0, for any non-zero N-vector v, etc.

If A is an N × N matrix and 1 ≤ k ≤ N, let A_k be the k × k submatrix obtained by eliminating the last N − k rows and columns of A:

A_k = ( a_{11} ⋯ a_{1k} ; ⋮ ; a_{k1} ⋯ a_{kk} ).

Theorem: The N × N symmetric matrix A, and any bilinear form it represents, is

1. negative definite, if and only if (−1)^k det A_k > 0, for all k = 1,…,N; and

2. positive definite, if and only if det A_k > 0, for all k = 1,…,N.

Remarks: A symmetric N × N matrix A is positive definite if and only if all of its characteristic values are positive. A is positive semi-definite if and only if all of its characteristic values are non-negative. A is negative semi-definite if and only if all of its characteristic values are non-positive. A is negative definite if and only if all of its characteristic values are negative.
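The leading-principal-minor test is easy to apply numerically. The sketch below reuses the brute-force permutation-sum determinant (fine for small N) on an example matrix of my choosing:

```python
import math
from itertools import permutations

def det(a):
    n = len(a)
    def sgn(p):
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        return -1 if inv % 2 else 1
    return sum(sgn(p) * math.prod(a[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def is_positive_definite(a):
    """Check det A_k > 0 for every leading principal submatrix A_k."""
    n = len(a)
    return all(det([row[:k] for row in a[:k]]) > 0 for k in range(1, n + 1))

A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
print(is_positive_definite(A))         # True: the minors are 2, 3, 4
print(is_positive_definite([[-1.0]]))  # False
```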

Proof: I prove only the "only if" statement, and only for the positive definite case. The proofs of the other cases are similar.

Let λ be a characteristic value of A and let x be a corresponding characteristic vector. Then,

(A − λI)x = 0,

so that

Ax = λx,

and hence

xᵀAx = λxᵀx.

It follows that if A is positive definite, then 0 < xᵀAx = λxᵀx, so that λ > 0.


    Real Analysis

We know what it means for numbers to be close to each other. The part of real analysis we will use has to do with generalizations of the notion of closeness and applications of it.

Definition: An open ball of radius ε about a point y in R^N is B_ε(y) = {x ∈ R^N | ||x − y|| < ε}.


    Examples:

1. R^N and ∅ are closed in R^N.

2. [0, 1] is closed in R.

3. (0, 1) is not closed in R.

4. {(x_0, 0) ∈ R² | 0 ≤ x_0 ≤ 1} is closed in R².

Definition: A sequence in a set X is a function x : {1, 2,…} → X. It is denoted by x_n or x_1, x_2,….

Example: x_n = sin(n²), n = 1, 2,….

Definition: A sequence x_n in R^N converges to x if for every ε > 0, there exists an integer M such that ||x − x_n|| < ε, if n ≥ M. To indicate that x_n converges to x, we write lim_{n→∞} x_n = x.

Examples: lim_{n→∞} 1/n = 0. lim_{n→∞} (n − 1)/(n + 1) = 1. The sequence x_n = sin(n²) does not converge.
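The definition of convergence can be illustrated numerically: for x_n = (n − 1)/(n + 1) and the limit 1, |1 − x_n| = 2/(n + 1), so for any ε > 0 the tail condition holds once n is large enough. A small sketch:

```python
def x(n):
    return (n - 1) / (n + 1)

eps = 0.01
# |1 - x(n)| = 2 / (n + 1) is decreasing, so once below eps it stays there
M = next(n for n in range(1, 10**6) if abs(1.0 - x(n)) < eps)
print(M)   # for all n >= M, |1 - x(n)| < 0.01
```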

Theorem: A subset A of R^N is closed if and only if every convergent sequence in A converges to a point in A.

Proof: Suppose that A is closed and that x_n is a sequence in A converging to x. Suppose that x ∉ A. Because A is closed, R^N \ A is open, so that for some ε > 0, B_ε(x) ⊆ R^N \ A. That is, B_ε(x) ∩ A = ∅. Since lim_{n→∞} x_n = x, there is N such that x_n ∈ B_ε(x), if n ≥ N. But then, x_n ∉ A, if n ≥ N, which is impossible since x_n is a sequence in A.

Suppose that every convergent sequence in A converges to a point in A. If A is not closed, then R^N \ A is not open, so that there exists an x ∈ R^N \ A such that B_ε(x) ∩ A ≠ ∅, for every ε > 0. Then, for every positive integer n, there exists an x_n ∈ A such that ||x − x_n|| < 1/n. Since lim_{n→∞} x_n = x, x ∈ A, which is impossible. Therefore A is closed.

Theorem: If A and B are open subsets of R^N, then A ∩ B is open. If U is a collection of open subsets of R^N, then ∪_{U∈U} U is open.

If A and B are closed subsets of R^N, then A ∪ B is closed. If C is a collection of closed subsets of R^N, then ∩_{C∈C} C is closed.


Proof: I show that A ∩ B is open if A and B are open. If x ∈ A ∩ B, there is ε_A > 0 such that B_{ε_A}(x) ⊆ A, and there is ε_B > 0 such that B_{ε_B}(x) ⊆ B. Let ε = min(ε_A, ε_B). Then, B_ε(x) ⊆ B_{ε_A}(x) ⊆ A and B_ε(x) ⊆ B_{ε_B}(x) ⊆ B, so that B_ε(x) ⊆ A ∩ B. Therefore, A ∩ B is open.

I show that ∪_{U∈U} U is open. If x ∈ ∪_{U∈U} U, then x ∈ U′, for some U′ ∈ U. Since U′ is open, there is ε > 0 such that B_ε(x) ⊆ U′ ⊆ ∪_{U∈U} U. Therefore, ∪_{U∈U} U is open.

I show that A ∪ B is closed if A and B are closed. R^N \ (A ∪ B) = (R^N \ A) ∩ (R^N \ B). Since A and B are closed, R^N \ A and R^N \ B are open, so that (R^N \ A) ∩ (R^N \ B) is open, so that R^N \ (A ∪ B) is open and hence A ∪ B is closed.

If each C ∈ C is closed, R^N \ C is open. Therefore, ∪_{C∈C}(R^N \ C) is open, and so R^N \ (∩_{C∈C} C) is open. Therefore, ∩_{C∈C} C is closed.

Examples:

1. The intervals [1/n, 1] are closed, for n = 1, 2,…, yet ∪_{n=1}^∞ [1/n, 1] = (0, 1] = {x ∈ R | 0 < x ≤ 1} is not closed.

2. The intervals (−1/n, 1 + 1/n) are open, for n = 1, 2,…, yet ∩_{n=1}^∞ (−1/n, 1 + 1/n) = [0, 1] is not open.

Definition: If A ⊆ B ⊆ R^N, A is open in B if for every x ∈ A, there is an ε > 0 such that B_ε(x) ∩ B ⊆ A. A is closed in B if B \ A is open in B.

Examples:

1. (0, 1] = {x | 0 < x ≤ 1} is open in [0, 1] = {x | 0 ≤ x ≤ 1}, but is not open in R.

2. (0, 1] is closed in (0, 2), but is not closed in R.

3. {(x_0, 0) | 0 < x_0 < 1} is open in {(x_0, 0) | −∞ < x_0 < ∞}, though it is not open in R².

Theorem: If A ⊆ B ⊆ R^N, A is open in B if and only if A = B ∩ U, where U is open in R^N.

Proof: If A = B ∩ U, where U is open in R^N, then it should be clear that A is open in B.

Suppose that A is open in B. For each x ∈ A, let ε(x) > 0 be such that B_{ε(x)}(x) ∩ B ⊆ A. Let U = ∪_{x∈A} B_{ε(x)}(x). Then, U is open, as the union of open sets, and A = B ∩ U.


Definition: Let A ⊆ R^N, B ⊆ R^M, and f : A → B. Then, f is continuous if for every U ⊆ B that is open in B, f⁻¹(U) = {x ∈ A | f(x) ∈ U} is open in A.

Theorem: f : A → B is continuous if and only if for every C ⊆ B that is closed in B, f⁻¹(C) is closed in A.

Proof: f⁻¹(B \ C) = A \ f⁻¹(C). If f is continuous and C is closed in B, then f⁻¹(B \ C) is open in A, since B \ C is open in B. Therefore, A \ f⁻¹(C) is open in A, and hence f⁻¹(C) is closed in A.

Suppose that f⁻¹(C) is closed in A whenever C is closed in B. Then, if U ⊆ B is open in B, B \ U is closed in B, so that f⁻¹(B \ U) = A \ f⁻¹(U) is closed in A. Hence f⁻¹(U) is open in A, and so f is continuous.

Theorem: f : A → B is continuous if and only if lim_{n→∞} f(x_n) = f(lim_{n→∞} x_n), whenever x_n is a sequence in A that converges to a point in A.

Proof: Suppose that f is continuous and that x_n is a sequence in A converging to x in A. If ε > 0, B_ε(f(x)) ∩ B is open in B, so that A ∩ f⁻¹(B_ε(f(x))) is open in A. Since x ∈ f⁻¹(B_ε(f(x))), there is a δ > 0 such that A ∩ B_δ(x) ⊆ f⁻¹(B_ε(f(x))). Since lim_{n→∞} x_n = x, there is a positive integer N such that x_n ∈ B_δ(x), if n ≥ N. Therefore, if n ≥ N, x_n ∈ f⁻¹(B_ε(f(x))) and hence ||f(x_n) − f(x)|| < ε. Therefore lim_{n→∞} f(x_n) = f(x).

Suppose that whenever x_n is a sequence in A converging to a point x in A, lim_{n→∞} f(x_n) = f(lim_{n→∞} x_n). In order to show that f is continuous, let C be a subset of B that is closed in B. I must show that f⁻¹(C) is closed in A, that is, that A \ f⁻¹(C) is open in A. If A \ f⁻¹(C) is not open in A, there is an x ∈ A \ f⁻¹(C) such that for every ε > 0, B_ε(x) ∩ f⁻¹(C) ≠ ∅. Therefore, for every positive integer n, there exists x_n ∈ f⁻¹(C) such that ||x_n − x|| < 1/n. Therefore, lim_{n→∞} x_n = x, so that lim_{n→∞} f(x_n) = f(x). Since f(x_n) ∈ C and C is closed in B, f(x) ∈ C. Hence, x ∈ f⁻¹(C), which is impossible. Therefore, f is continuous.

Examples:

1. f : (0, ∞) → (0, ∞) defined by f(x) = 1/x is continuous.


2. f : [0, ∞) → [0, ∞) defined by f(x) = 0, if x = 0, and f(x) = 1/x, if x > 0, is not continuous.

3. f : [0, 1] → [0, 1] defined by f(x) = 1, if 0 ≤ x < 1/2, and f(x) = 0, if 1/2 ≤ x ≤ 1, is not continuous.

Definition: If f : A → B, f is continuous at x ∈ A if for every ε > 0, there is a δ > 0 such that ||f(x) − f(y)|| < ε, if ||x − y|| < δ.


Theorem: f : A → B is continuous at x if and only if for every sequence x_1, x_2,… in A that converges to x, lim_{n→∞} f(x_n) = f(x).

Proof: The argument should be clear, given what has been presented earlier.

Theorem: f is continuous on A if and only if f is continuous at every point in A.

Proof: This is so because f is continuous if and only if f(lim_{n→∞} x_n) = lim_{n→∞} f(x_n), for every sequence x_n in A converging to a point in A.


    MATH CAMP: Lecture 6

Definition: A sequence of numbers, x_1, x_2,…, is said to be Cauchy if for every ε > 0, there exists an integer N such that |x_n − x_m| < ε, if n > N and m > N.

The Completeness Property of the Real Numbers: If x_1, x_2,… is a Cauchy sequence of numbers, then there is a number x such that lim_{n→∞} x_n = x.

    The completeness property extends to vectors by applying it to each component.

Definition: A sequence of vectors in R^N, x_1, x_2,…, is said to be Cauchy if for every ε > 0, there exists an integer M such that ||x_n − x_m|| < ε, if n > M and m > M.

Lemma: If x_1, x_2,… is a Cauchy sequence of vectors in R^N, then there is a vector y ∈ R^N such that lim_{n→∞} x_n = y.

Proof: For each n, let x_n = (x_{n1},…,x_{nN}). If x_n is Cauchy, then for each k = 1,…,N, the sequence of numbers x_{nk} is Cauchy. Therefore, there is a number y_k such that lim_{n→∞} x_{nk} = y_k. Let y = (y_1,…,y_N). Then lim_{n→∞} x_n = y.

The completeness property of the real numbers may also be expressed by saying that every set of numbers with an upper bound has a least upper bound, or that every set of numbers with a lower bound has a greatest lower bound.

Definitions: If X is a set of numbers, the number b is an upper bound for X if x ≤ b, for all x in X. If X has an upper bound, X is said to be bounded from above. The number c is said to be a least upper bound for X if c is an upper bound for X and c ≤ b, for any upper bound b for X.

The least upper bound for X is denoted by lub X or sup X, which is read as the supremum of X. In an analogous fashion, we may define bounded from below, lower bound, and greatest lower bound. The greatest lower bound is written as glb X or as inf X, read as the infimum of X. Clearly glb X = −lub(−X), where −X = {−x | x belongs to X}, so that every set that is bounded from below has a greatest lower bound if and only if every set that is bounded from above has a least upper bound.

Least Upper Bound Property: Any set of numbers that is bounded from above has a least upper bound.


Theorem: The least upper bound property is equivalent to the completeness property.

Definition: A subset A of R^N is bounded if there is a number b > 0 such that ||x|| ≤ b, for all x ∈ A.

Definition: A subset A of R^N is compact if it is closed and bounded.

Definition: A subsequence of the sequence x_1, x_2,… is a sequence of the form x_{n_1}, x_{n_2},…, where n_{k+1} > n_k, for all k.

Theorem: Every convergent sequence in R^N is bounded.

Proof: Let x_1, x_2,… be a sequence in R^N that converges to x. There is a positive integer M such that ||x_n − x|| ≤ 1, if n > M. Then ||x_n|| ≤ max(||x_1||,…,||x_M||, ||x|| + 1), for all n.

Theorem: If x_1, x_2,… is a sequence in R^N that converges to x, then every subsequence of x_1, x_2,… converges to x.

Proof: If ε > 0, let M be a positive integer such that ||x_n − x|| < ε, if n ≥ M. If x_{n_1}, x_{n_2},… is a subsequence of x_1, x_2,…, then n_k ≥ k, for all k, so that ||x_{n_k} − x|| < ε, if k ≥ M. Therefore x_{n_1}, x_{n_2},… converges to x.

Theorem (Bolzano–Weierstrass): A subset A of R^N is compact if and only if every sequence in A has a subsequence that converges to a point in A.

Proof: Assume that every sequence in A has a subsequence that converges to a point in A. I show that A is closed and bounded. If A is unbounded, then for every positive integer n, there is an x_n ∈ A such that ||x_n|| > n. Let x_{n_1}, x_{n_2},… be a convergent subsequence of x_1, x_2,…. Then ||x_{n_k}|| ≥ n_k ≥ k, which goes to infinity as k goes to infinity, which is impossible since the sequence x_{n_1}, x_{n_2},… converges. This shows that A is bounded. I now show that A is closed. If A is not closed, there is a sequence x_1, x_2,… in A that converges to a point x not in A. Let x_{n_1}, x_{n_2},… be a subsequence of x_1, x_2,… that converges to a point in A. Since this subsequence must converge to x, x must belong to A. This contradiction proves that A is closed. Since A is closed and bounded, it is compact.

Assume now that A is compact. I show that every sequence in A has a subsequence that converges to a point in A. Let x_1, x_2,… be a sequence in A. Because A is bounded, it is contained in a cube C_1 = {y ∈ R^N | −b ≤ y_n ≤ b, for n = 1,…,N}, for


some positive number b. Divide C_1 in half along each dimension, obtaining 2^N subcubes, each with edges of length 2b/2 = b. One of these subcubes contains the point x_n for infinitely many n. Call this cube C_2. Suppose that cubes C_1,…,C_K have been defined, where C_1 ⊇ C_2 ⊇ ⋯ ⊇ C_K and, for each k, C_k has edges of length 2b/2^{k−1} = b·2^{2−k} and C_k contains x_n for infinitely many integers n. Divide C_K in half along each dimension, obtaining 2^N subcubes. One of these contains x_n for infinitely many n. Call this cube C_{K+1}. I have defined by induction on K a sequence of cubes C_1, C_2,… such that

1. for all k, C_k contains x_n, for infinitely many n;
2. C_1 ⊇ C_2 ⊇ ⋯; and
3. for all k, each edge of C_k has length b·2^{2−k}.

I now define a subsequence x_{n_k} of x_n by induction on k. Let x_{n_1} be one of x_1, x_2,… belonging to C_1. Suppose x_{n_1},…,x_{n_K} have been defined such that x_{n_k} ∈ C_k, for k = 1,…,K, and n_1 < n_2 < ⋯ < n_K. Since C_{K+1} contains infinitely many members of x_1, x_2,…, there exists an x_{n_{K+1}} ∈ C_{K+1} such that n_{K+1} > n_K. I have defined x_{n_1}, x_{n_2},… such that x_{n_k} ∈ C_k and n_k < n_{k+1}, for all k.

Since the diameter of C_k is b√N·2^{2−k}, which converges to zero as k goes to infinity, it follows that lim_{K→∞} sup_{k≥K, m≥K} ||x_{n_k} − x_{n_m}|| = 0. By the completeness property of the real numbers, there exists x ∈ R^N such that lim_{k→∞} x_{n_k} = x. Since A is closed and x_{n_k} ∈ A, for all k, x ∈ A. That is, the subsequence x_{n_k} converges to a point in A.

    I now establish another important property of compact sets.

Definition: An open cover of a subset A of R^N consists of a collection U of open sets in R^N such that A ⊆ ∪_{U∈U} U. That is, A is contained in the union of the sets U ∈ U.

Example: The set of all open intervals in R is an open cover of [0, 1].

Definition: If U is an open cover of A, a subcover consists of a collection of sets U in U whose union contains A.

Example: If U is the set of open intervals in R, the one interval (−1, 2) is a subcover of the cover of [0, 1].

Theorem (Heine–Borel): A subset A of R^N is compact if and only if every open cover of A contains a finite subcover.


Proof: Suppose that every open cover of A contains a finite subcover. I show that A is closed and bounded. In order to show that A is closed, let x ∈ R^N \ A and, for each m = 1, 2,…, let U_m = {y ∈ R^N | ||y − x|| > 1/m}. ∪_{m=1}^∞ U_m = R^N \ {x}, so that U_1, U_2,… is an open cover of A. Therefore, for some M, A ⊆ ∪_{m=1}^M U_m = U_M. Therefore, B_{1/M}(x) ∩ A = ∅. Hence R^N \ A is open and so A is closed.

In order to show that A is bounded, for m = 1, 2,…, let U_m = {x ∈ R^N | ||x|| < m}. The sets U_1, U_2,… form an open cover of A, so that, for some M, A ⊆ ∪_{m=1}^M U_m = U_M. Hence ||x|| < M, for all x ∈ A, and A is bounded.

Suppose now that A is closed and bounded and that the open cover U of A has no finite subcover. Since A is bounded, A is contained in a cube C_1; then C_1 ∩ A is non-empty and has no finite subcover. Suppose that cubes C_1 ⊇ C_2 ⊇ ⋯ ⊇ C_K have been defined such that, for each k > 1, C_{k−1} is the union of 2^N cubes congruent to C_k that intersect only along sides, and C_k ∩ A is not empty and has no finite subcover. Divide C_K into 2^N congruent subcubes that intersect only along sides. One of these subcubes, C_{K+1}, is such that A ∩ C_{K+1} is not empty and has no finite subcover. By induction on K, I have defined cubes C_1, C_2,… such that C_1 ⊇ C_2 ⊇ ⋯, lim_{K→∞} diam(C_K) = 0, and, for all K, C_K ∩ A is not empty and has no finite subcover.

By the completeness property of the real numbers, there is x ∈ ∩_{k=1}^∞ C_k. Also, for every k, there is x_k ∈ C_k ∩ A. Because lim_{k→∞} diam(C_k) = 0, it follows that lim_{k→∞} x_k = x. Since A is closed, x belongs to A. Since U covers A, there is a U in U such that U contains x. Since U is open, there is a positive number ε such that B_ε(x) ⊆ U. Because x_k ∈ C_k, lim_{k→∞} x_k = x, and lim_{k→∞} diam(C_k) = 0, there is a positive integer K such that C_K ⊆ B_ε(x) ⊆ U. Therefore U covers C_K ∩ A, contrary to hypothesis. This contradiction proves that every open cover of A contains a finite subcover.

Theorem: If A ⊆ R^N is compact and f : A → R^M is continuous, then f(A) = {f(x) | x ∈ A} is compact.

Proof: Let U be an open cover of f(A). For every U ∈ U, f⁻¹(U) is open in A. For each U ∈ U, let V_U be an open subset of R^N such that A ∩ V_U = f⁻¹(U). Then, V = {V_U | U ∈ U} is an open cover of A. Since A is compact, V has a finite subcover, V_{U_1},…,V_{U_k}. Then U_1,…,U_k is an open cover of f(A). Therefore, f(A) is compact.

Theorem: If A ⊆ R is compact and non-empty, then glb(A) ∈ A and lub(A) ∈ A.


Proof: Since A is compact, it is bounded, and hence glb(A) and lub(A) exist. By the definition of lub(A), there is a sequence x_1, x_2,… in A such that lim_{n→∞} x_n = lub(A). Since A is closed, lim_{n→∞} x_n ∈ A. A similar argument proves that glb(A) ∈ A.

Theorem: If A ⊆ R^N is compact and non-empty and f : A → R is continuous, then there exist x̲ and x̄ in A such that f(x̲) ≤ f(x) ≤ f(x̄), for all x ∈ A.

Proof: Since A is compact and f is continuous, f(A) is compact. Therefore, glb(f(A)) ∈ f(A) and lub(f(A)) ∈ f(A). Let x̲ and x̄ in A be such that f(x̲) = glb(f(A)) and f(x̄) = lub(f(A)). Then, f(x̲) ≤ f(x) ≤ f(x̄), for all x ∈ A.

Remark: This theorem says that a continuous function defined on a compact set achieves its minimum and maximum.

Problem (one often met in economics): Let X be a subset of R^N and B a subset of R^M. The endogenous variables vary over X. The exogenous variables, or parameters, vary over B.

Let f : X × B → R be the objective function and, for k = 1,…,K, let g_k : X × B → R be the constraint functions. Consider the problem

max_{x∈X} f(x, b)    (*)
s.t. g_k(x, b) ≤ 0, for k = 1,…,K,

where b ∈ B is given. For b ∈ B, let h(b) be the set of solutions of this problem.

Question: Under what conditions is h(b) ≠ ∅, for all b, and is h a continuous function?

If X is compact and the functions g_k are continuous, then {x ∈ X | g_k(x, b) ≤ 0, for k = 1,…,K} is compact. To see that this is so, notice that since (−∞, 0] is closed and X is closed, {x ∈ X | g_k(x, b) ≤ 0} = g_k(·, b)⁻¹((−∞, 0]) is a closed subset of R^N. Therefore,

{x ∈ X | g_k(x, b) ≤ 0, for k = 1,…,K} = ∩_{k=1}^K g_k(·, b)⁻¹((−∞, 0])

is closed. Since X is bounded, {x ∈ X | g_k(x, b) ≤ 0, for k = 1,…,K} is bounded and hence compact. If {x ∈ X | g_k(x, b) ≤ 0, for k = 1,…,K} is non-empty, then problem (*) has a solution, provided f is continuous. Hence, h(b) ≠ ∅ under these conditions. h(b) may contain more than one point. Even if h is a function, it may not be continuous, as the following example shows.


Example: Let X = {(x_1, x_2) ∈ R² | 0 ≤ x_1 ≤ 2, 0 ≤ x_2 ≤ 2} and let B = [0, ∞). Let p vary over B. Let f(x_1, x_2, p) = x_1 and g(x_1, x_2, p) = px_1 + x_2 − p. If p > 0, h(p) = (1, 0). If p = 0, h(p) = (2, 0).

This example corresponds to maximizing the utility function f(x_1, x_2) = x_1 over the budget set {(x_1, x_2) ∈ R²_+ | (p, 1)·(x_1, x_2) ≤ (p, 1)·(1, 0)}, where the price of commodity 1 is p, the price of commodity 2 is 1, and the consumer owns 1 unit of commodity 1 and none of commodity 2, so that her or his wealth is (p, 1)·(1, 0) = p. When p = 0, the consumer's budget set explodes in width to include the point (2, 0).
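The jump in h at p = 0 can be seen by brute force. The sketch below approximates h(p) by searching a finite grid over X (grid search is my own illustrative device, not part of the notes):

```python
def h(p, points=9):
    """Approximate argmax of f(x1, x2) = x1 over
    {(x1, x2) in [0, 2]^2 : p*x1 + x2 - p <= 0} on a grid."""
    step = 2.0 / (points - 1)
    best = None
    for i in range(points):
        for j in range(points):
            x1, x2 = i * step, j * step
            if p * x1 + x2 - p <= 1e-12:        # feasibility g(x, p) <= 0
                if best is None or x1 > best[0]:
                    best = (x1, x2)
    return best

print(h(1.0))   # (1.0, 0.0)
print(h(0.0))   # (2.0, 0.0): the maximizer jumps as p falls to 0
```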

Maximum Theorem: Let X be a compact subset of R^N and let B be a subset of R^M. Let f : X × B → R and g_k : X × B → R, for k = 1,…,K, be continuous. Assume in addition that


1. for all b ∈ B, {x ∈ X | g_k(x, b) ≤ 0, for k = 1,…,K} is non-empty;

2. for all (x, b) ∈ X × B such that g_k(x, b) ≤ 0, for all k, and for all ε > 0, there exists a δ > 0 such that if ||b_1 − b|| < δ, there exists an x_1 ∈ X such that g_k(x_1, b_1) ≤ 0, for k = 1,…,K, and ||x_1 − x|| < ε;

3. if the problem

max_{x∈X} f(x, b)    (**)
s.t. g_k(x, b) ≤ 0, for k = 1,…,K,

has a solution, then it is unique.

Then, the function h : B → X, where h(b) is the unique solution of problem (**), exists and is continuous.

Proof: Since X is compact and the g_k are continuous, {x ∈ X | g_k(x, b) ≤ 0, for k = 1,…,K} is compact, for all b ∈ B. By condition 1, this set is non-empty. Since f is continuous, problem (**) has a solution. By assumption 3, this solution is unique. Hence, h(b) is a well-defined function.

To show that h is continuous, let b_1, b_2,… be a sequence in B such that lim_{n→∞} b_n = b, where b ∈ B. I must show that lim_{n→∞} h(b_n) = h(b). If h(b_n) does not converge to h(b), then there exist an ε > 0 and a subsequence n_j, j = 1, 2,…, such that ||h(b_{n_j}) − h(b)|| > ε, for all j. Since X is compact, I may assume that h(b_{n_j}) converges, say to x̄. (That is, a subsequence of h(b_{n_j}) converges to x̄, and I call this subsequence h(b_{n_j}) again.) Since ||x̄ − h(b)|| ≥ ε > 0, it follows that x̄ ≠ h(b).

I now derive a contradiction. Since g_k(h(b_{n_j}), b_{n_j}) ≤ 0, for all k and j, and the functions g_k are continuous and lim_{j→∞} (h(b_{n_j}), b_{n_j}) = (x̄, b), it follows that g_k(x̄, b) ≤ 0, for all k. Therefore, f(x̄, b) ≤ f(h(b), b), by the definition of h(b). I prove that f(x̄, b) = f(h(b), b). Suppose that f(x̄, b) < f(h(b), b). Then, f(x̄, b) < f(h(b), b) − 2γ, for some γ > 0. Since f is continuous, there exists a positive number η such that |f(x′, b′) − f(h(b), b)| < γ, if ||x′ − h(b)|| < η and ||b′ − b|| < η. By condition 2 of the theorem, there is a δ > 0 such that if ||b′ − b|| < δ, then there exists an x′ ∈ X such that g_k(x′, b′) ≤ 0, for all k, and ||x′ − h(b)|| < η. I may assume that δ ≤ η, so that, for such x′ and b′, |f(x′, b′) − f(h(b), b)| < γ. Since lim_{j→∞} b_{n_j} = b and lim_{j→∞} h(b_{n_j}) = x̄, there is a positive integer J such that ||b_{n_j} − b|| < δ and |f(h(b_{n_j}), b_{n_j}) − f(x̄, b)| < γ, for j ≥ J. By what has been argued, if j ≥ J, there exists x_{n_j} ∈ X such that ||x_{n_j} − h(b)|| < η and g_k(x_{n_j}, b_{n_j}) ≤ 0, for all k. If j ≥ J, f(x_{n_j}, b_{n_j}) > f(h(b), b) − γ > f(x̄, b) + 2γ − γ = f(x̄, b) + γ > f(h(b_{n_j}), b_{n_j}), which is impossible by the definition of h(b_{n_j}), since x_{n_j} is feasible for b_{n_j}. This contradiction proves that f(x̄, b) = f(h(b), b). Condition 3 of the theorem now implies that x̄ = h(b), which contradicts the inequality ||x̄ − h(b)|| ≥ ε. This second contradiction implies that lim_{n→∞} h(b_n) = h(b).


    MATH CAMP: Lecture 7

1 Calculus of one variable

The origins of calculus lie in differential equations and in integration. We will focus on differentiation, which has to do with the local approximation of functions by affine ones. Let f : (a, b) → R, where a < b.

Definition: f is differentiable at c, where a < c < b, if there is a number df(c)/dx such that

lim_{x→c, x≠c} [ (f(x) − f(c))/(x − c) − df(c)/dx ] = 0.

That is, for every ε > 0, there is a δ > 0 such that

| (f(x) − f(c))/(x − c) − df(c)/dx | < ε,

if |x − c| < δ and x ≠ c. That is,

| f(x) − f(c) − (df(c)/dx)(x − c) | ≤ ε|x − c|, if |x − c| < δ.

Definition: A function f : R^N → R^M is said to be affine if it is a linear function plus a constant. That is, f(x) = T(x) + b, where b ∈ R^M and T : R^N → R^M is a linear transformation.

Example: The function g(x) = 6 + 7x is an affine function from R to R.

What does the definition of differentiability mean? The function f : (a, b) → R is differentiable at c if its graph at c looks like that of an affine function when examined under a powerful microscope. The slope of the affine function is df(c)/dx.
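The "microscope" view corresponds to the difference quotient approaching df(c)/dx. A quick numerical sketch with f(x) = x², whose derivative at c is 2c:

```python
def diff_quotient(f, c, h):
    """Slope of the chord through (c, f(c)) and (c + h, f(c + h))."""
    return (f(c + h) - f(c)) / h

f = lambda x: x * x
for h in (0.1, 0.01, 0.001):
    print(diff_quotient(f, 2.0, h))   # approaches df(2)/dx = 4
```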


Let

g(x) = f(c) + (df(c)/dx)(x − c) = [f(c) − (df(c)/dx)c] + (df(c)/dx)x.

g is an affine function with constant term f(c) − (df(c)/dx)c and linear part (df(c)/dx)x. Then,

lim_{x→c, x≠c} (f(x) − g(x))/(x − c) = lim_{x→c, x≠c} [f(x) − f(c) − (df(c)/dx)(x − c)]/(x − c) = lim_{x→c, x≠c} [ (f(x) − f(c))/(x − c) − df(c)/dx ] = 0.

Let δ be a small positive number and magnify the graphs of f and g by multiplying both coordinates by δ⁻¹, which is a large number. Adjust the lens so that the part of the horizontal coordinate in the field of vision varies between −1 and 1. Let x − c = δΔx, so that g(x) = g(c + δΔx) and f(x) = f(c + δΔx), where −1 ≤ Δx ≤ 1. What we see after magnification is the graphs of the functions of Δx, δ⁻¹f(c + δΔx) and δ⁻¹g(c + δΔx), as Δx varies between −1 and +1. Let ε > 0 and choose δ > 0 such that |(f(x) − f(c))/(x − c) − df(c)/dx| < ε, if 0 < |x − c| ≤ δ. Then, if |Δx| ≤ 1,

|δ⁻¹f(c + δΔx) − δ⁻¹g(c + δΔx)| = δ⁻¹ |f(c + δΔx) − f(c) − (df(c)/dx)δΔx|
= |Δx| · | (f(c + δΔx) − f(c))/(δΔx) − df(c)/dx | < ε|Δx|.

That is, the graphs of δ⁻¹f(c + δΔx) and of δ⁻¹g(c + δΔx), as functions of Δx, are within ε|Δx| of each other in the vertical direction as Δx varies between −1 and 1. In this sense, the affine function g(x) = f(c) + (df(c)/dx)(x − c) approximates f near c.


Notice that if $x$ and $y$ are numbers, then $|x| = |x - y + y| \leq |x - y| + |y|$, so that $|x| - |y| \leq |x - y|$.

Lemma: If $f$ is differentiable at $c$, it is continuous at $c$.

Proof: There exists $\delta > 0$ such that $\left| \frac{f(x) - f(c)}{x - c} - \frac{df(c)}{dx} \right| < 1$, if $|x - c| < \delta$. Therefore,
$$\left| \frac{f(x) - f(c)}{x - c} \right| - \left| \frac{df(c)}{dx} \right| \leq \left| \frac{f(x) - f(c)}{x - c} - \frac{df(c)}{dx} \right| < 1,$$
if $|x - c| < \delta$, so that $|f(x) - f(c)| < |x - c| \left( \left| \frac{df(c)}{dx} \right| + 1 \right)$, which converges to $0$ as $|x - c|$ converges to $0$.

Lemma:

a) If $f$ is differentiable at $c$ and $df(c)/dx > 0$, then there exists a $\delta > 0$ such that $f(c) < f(x)$, if $c < x < c + \delta$.

b) If $df(c)/dx < 0$, there exists a $\delta > 0$ such that $f(c) < f(x)$, if $c - \delta < x < c$.

Proof: a) Let $\delta$ correspond to $\varepsilon = \frac{1}{2} \frac{df(c)}{dx}$ in the definition of differentiability at $c$. Then,
$$\frac{f(x) - f(c)}{x - c} - \frac{df(c)}{dx} > -\frac{1}{2} \frac{df(c)}{dx},$$
if $c < x < c + \delta$. Hence,
$$f(x) - f(c) > \frac{1}{2} \frac{df(c)}{dx}(x - c) > 0,$$
if $c < x < c + \delta$. The proof of (b) is similar.


Definition: If $f : (a, b) \to \mathbb{R}$, where $a < b$, and $c$ is such that $a < c < b$, then $f$ achieves a relative maximum at $c$ if there exists a $\delta > 0$ such that $f(c) \geq f(x)$, if $|x - c| < \delta$. A relative maximum is also called a local maximum.

Interior Maximum Theorem: If $f : (a, b) \to \mathbb{R}$ is differentiable, where $a < b$, and if $f$ achieves a relative maximum at $c$, where $a < c < b$, then $df(c)/dx = 0$.

Proof: Immediate consequence of the lemma.

A relative or local minimum for $f$ is defined in the same way. $f$ achieves a local minimum at $c$ if and only if $-f$ achieves a local maximum at $c$. If $f$ achieves a local minimum at $c$, $0 = d(-f(c))/dx = -[df(c)/dx]$, so that $df(c)/dx = 0$.

Rolle's Theorem: Let $f : [a, b] \to \mathbb{R}$, where $a < b$, be continuous on $[a, b]$ and differentiable on $(a, b)$. If $f(a) = f(b) = 0$, then there exists a $c$ such that $a < c < b$ and $df(c)/dx = 0$.

Proof: If $f(x) = 0$ for all $x$, then $df(c)/dx = 0$ for all $c$. So suppose $f(x) \neq 0$ for some $x$. Suppose $f(x) > 0$ for some $x$. Since $[a, b]$ is compact and $f$ is continuous, there is $c$ such that $a \leq c \leq b$ and $f(c) \geq f(x)$, for all $x$. Since $f(x) > 0$ for some $x$, $f(c) > 0$. Since $f(a) = f(b) = 0$, $a < c < b$. By the previous theorem, $df(c)/dx = 0$. Use a similar argument if $f(x) < 0$ for some $x$.

Mean Value Theorem: If $a < b$ and $f : [a, b] \to \mathbb{R}$ is continuous on $[a, b]$ and differentiable on $(a, b)$, then there is $c$ such that $a < c < b$ and $df(c)/dx = [f(b) - f(a)]/(b - a)$.
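Before the proof, a quick numerical illustration under assumed choices ($f(x) = x^3$ on $[0, 1]$, not from the notes): the theorem guarantees an interior point $c$ where the derivative equals the average slope.

```python
# Mean value theorem illustration for f(x) = x**3 on [a, b] = [0, 1]
# (an illustrative example): some c in (0, 1) satisfies f'(c) = 3c**2
# = (f(1) - f(0)) / (1 - 0).
a, b = 0.0, 1.0
f = lambda x: x ** 3
slope = (f(b) - f(a)) / (b - a)   # average slope, here 1
c = (slope / 3) ** 0.5            # solves df(c)/dx = 3c**2 = slope
assert a < c < b
assert abs(3 * c ** 2 - slope) < 1e-12
print(c)   # ~0.577
```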


Proof: Let $\varphi : [a, b] \to \mathbb{R}$ be defined by
$$\varphi(x) = f(x) - f(a) - \frac{f(b) - f(a)}{b - a}(x - a).$$
Then $\varphi(a) = \varphi(b) = 0$, and $\varphi$ is continuous on $[a, b]$ and differentiable on $(a, b)$. By Rolle's theorem, there is $c$ such that $a < c < b$ and
$$0 = \frac{d\varphi(c)}{dx} = \frac{df(c)}{dx} - \frac{f(b) - f(a)}{b - a}.$$

Leibniz's Rule: If $f : (a, b) \to \mathbb{R}$ and $g : (a, b) \to \mathbb{R}$ are differentiable, then
$$\frac{d}{dx}\left[ f(x) g(x) \right] = f(x) \frac{dg(x)}{dx} + \frac{df(x)}{dx} g(x).$$

$df(x)/dx$ is a function of $x$. $f$ is twice differentiable if $df(x)/dx$ is differentiable, and the derivative of $df(x)/dx$, denoted $d^2 f(x)/dx^2$, is called the second derivative of $f$. By induction on $n$ it is possible to define the $n$th derivative for $n = 3, 4, \ldots$. The $n$th derivative is denoted by $d^n f(x)/dx^n$.

Taylor's Theorem: Suppose that $f : (a, b) \to \mathbb{R}$ and that $d^k f(x)/dx^k$ exists on $(a, b)$, for $k = 1, \ldots, n$. If $\alpha$ and $\beta$ belong to $(a, b)$, there exists a number $\gamma$ between $\alpha$ and $\beta$ such that
$$f(\beta) = f(\alpha) + \frac{df(\alpha)}{dx}(\beta - \alpha) + \frac{1}{2} \frac{d^2 f(\alpha)}{dx^2}(\beta - \alpha)^2 + \cdots + \frac{1}{(n-1)!} \frac{d^{n-1} f(\alpha)}{dx^{n-1}}(\beta - \alpha)^{n-1} + \frac{1}{n!} \frac{d^n f(\gamma)}{dx^n}(\beta - \alpha)^n.$$
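The statement can be checked numerically for a function whose derivatives are all known. A sketch under assumed choices ($f = \exp$, $\alpha = 0$, $\beta = 0.5$, $n = 3$; these are illustrative, not from the notes):

```python
import math

# Taylor's theorem for f = exp, alpha = 0, beta = 0.5, n = 3 (illustrative):
# f(beta) equals the degree-(n-1) polynomial at alpha plus an exact remainder
# (1/n!) f'''(gamma) (beta - alpha)**n for some gamma between alpha and beta.
alpha, beta, n = 0.0, 0.5, 3
poly = sum(beta ** k / math.factorial(k) for k in range(n))   # k = 0, ..., n-1
remainder = math.exp(beta) - poly
# Every derivative of exp is exp, so the remainder must lie between
# (1/n!) e**alpha (beta)**n and (1/n!) e**beta (beta)**n.
lo = math.exp(alpha) * beta ** n / math.factorial(n)
hi = math.exp(beta) * beta ** n / math.factorial(n)
assert lo <= remainder <= hi
```

The remainder here is about $0.024$, squarely inside the bounds $[0.021, 0.034]$ that the theorem implies.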


Proof: Let the number $r$ be defined by
$$\frac{(\beta - \alpha)^n}{n!}\, r = f(\beta) - \left[ f(\alpha) + \frac{df(\alpha)}{dx}(\beta - \alpha) + \cdots + \frac{1}{(n-1)!} \frac{d^{n-1} f(\alpha)}{dx^{n-1}}(\beta - \alpha)^{n-1} \right].$$
Let the function $\varphi : [a, b] \to \mathbb{R}$ be defined by
$$\varphi(x) = f(\beta) - \left[ f(x) + \frac{df(x)}{dx}(\beta - x) + \frac{1}{2} \frac{d^2 f(x)}{dx^2}(\beta - x)^2 + \cdots + \frac{1}{(n-1)!} \frac{d^{n-1} f(x)}{dx^{n-1}}(\beta - x)^{n-1} + \frac{r}{n!}(\beta - x)^n \right].$$
$\varphi$ is continuous on $[a, b]$ because $f$ and all its derivatives are continuous on $[a, b]$. Similarly, $\varphi$ is differentiable on $(a, b)$. $\varphi(\alpha) = 0$, by the definition of $r$. Certainly, $\varphi(\beta) = 0$. By Rolle's theorem, there is a $\gamma$ between $\alpha$ and $\beta$ such that $\frac{d\varphi(\gamma)}{dx} = 0$. Now,
$$\frac{d\varphi(x)}{dx} = -\left[ \frac{df(x)}{dx} - \frac{df(x)}{dx} + \frac{d^2 f(x)}{dx^2}(\beta - x) - \frac{d^2 f(x)}{dx^2}(\beta - x) + \cdots - \frac{1}{(n-2)!} \frac{d^{n-1} f(x)}{dx^{n-1}}(\beta - x)^{n-2} + \frac{1}{(n-2)!} \frac{d^{n-1} f(x)}{dx^{n-1}}(\beta - x)^{n-2} + \frac{1}{(n-1)!} \frac{d^n f(x)}{dx^n}(\beta - x)^{n-1} - \frac{r}{(n-1)!}(\beta - x)^{n-1} \right]$$
$$= \frac{1}{(n-1)!} \left[ r - \frac{d^n f(x)}{dx^n} \right](\beta - x)^{n-1},$$
since the sum telescopes. Since $d\varphi(\gamma)/dx = 0$, $r = d^n f(\gamma)/dx^n$. The theorem follows from the definition of $r$.

Theorem: Suppose that $f : (a, b) \to \mathbb{R}$ is differentiable, where $a < b$, and that the first two derivatives of $f$ exist and are continuous. If $c$ is such that $df(c)/dx = 0$ and $d^2 f(c)/dx^2 < 0$ ($> 0$), then $f$ achieves a local maximum (minimum) at $c$.

Proof: Because $\frac{d^2 f(x)}{dx^2}$ is continuous, there exists a $\delta > 0$ such that if $|x - c| < \delta$, then $d^2 f(x)/dx^2 < 0$. By Taylor's theorem, if $0 < |x - c| < \delta$, there exists a $\gamma$ between $c$ and $x$ such that
$$f(x) = f(c) + \frac{df(c)}{dx}(x - c) + \frac{1}{2} \frac{d^2 f(\gamma)}{dx^2}(x - c)^2 = f(c) + \frac{1}{2} \frac{d^2 f(\gamma)}{dx^2}(x - c)^2 < f(c),$$
since $d^2 f(\gamma)/dx^2 < 0$. Similarly, if $d^2 f(c)/dx^2 > 0$, then $f$ achieves a local minimum at $c$.
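A small numerical illustration of the second-order condition, with the assumed example $f(x) = -(x - 2)^2 + 3$ (not from the notes): $df(2)/dx = 0$ and $d^2 f(2)/dx^2 = -2 < 0$, so the theorem predicts a local maximum at $c = 2$.

```python
# Second-order condition sketch: f(x) = -(x - 2)**2 + 3 has f'(2) = 0 and
# f''(2) = -2 < 0, so f should have a local maximum at c = 2.
f = lambda x: -(x - 2) ** 2 + 3
c = 2.0
for h in [0.5, 0.1, 0.01]:
    # f(c) exceeds f at nearby points on both sides
    assert f(c) > f(c + h) and f(c) > f(c - h)
```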


A local maximum. The bucket does not hold water, and so $d^2 f(c)/dx^2 < 0$.

A local minimum. The bucket holds water, and so $d^2 f(c)/dx^2 > 0$.

Theorem (Chain Rule of Differentiation): If $f : (a, b) \to (A, B)$ and $g : (C, D) \to \mathbb{R}$ are differentiable, where $(A, B) \subset (C, D)$, then
$$\frac{d}{dx}(g \circ f)(c) = \frac{dg}{dy}(f(c)) \frac{df(c)}{dx}.$$

Proof: I must show that
$$\lim_{\substack{x \to c \\ x \neq c}} \left[ \frac{g(f(x)) - g(f(c))}{x - c} - \frac{dg(f(c))}{dy} \frac{df(c)}{dx} \right] = 0.$$


If $f(x) \neq f(c)$, then
$$\frac{g(f(x)) - g(f(c))}{x - c} - \frac{dg(f(c))}{dy} \frac{df(c)}{dx}$$
$$= \frac{g(f(x)) - g(f(c))}{f(x) - f(c)} \cdot \frac{f(x) - f(c)}{x - c} - \frac{dg(f(c))}{dy} \cdot \frac{f(x) - f(c)}{x - c} + \frac{dg(f(c))}{dy} \cdot \frac{f(x) - f(c)}{x - c} - \frac{dg(f(c))}{dy} \frac{df(c)}{dx},$$
so that
$$\left| \frac{g(f(x)) - g(f(c))}{x - c} - \frac{dg(f(c))}{dy} \frac{df(c)}{dx} \right| \leq \left| \frac{g(f(x)) - g(f(c))}{f(x) - f(c)} - \frac{dg(f(c))}{dy} \right| \left| \frac{f(x) - f(c)}{x - c} \right| + \left| \frac{f(x) - f(c)}{x - c} - \frac{df(c)}{dx} \right| \left| \frac{dg(f(c))}{dy} \right|.$$

Suppose there is an $\bar{\varepsilon} > 0$ such that $f(x) \neq f(c)$, if $0 < |x - c| < \bar{\varepsilon}$. The second term on the right-hand side converges to zero as $x$ converges to $c$. Since $f(x) \neq f(c)$ if $|x - c|$ is small, the first term converges to zero as $x$ goes to $c$, provided $f(x) \to f(c)$. Since $f$ is differentiable, it is continuous, and so $f(x) \to f(c)$ as $x \to c$.

Suppose there is no positive number $\bar{\varepsilon}$ such that $f(x) \neq f(c)$, if $0 < |x - c| < \bar{\varepsilon}$. Then, $df(c)/dx = 0$. If $f(x) \neq f(c)$, the argument of the previous paragraph applies. If $f(x) = f(c)$, then
$$\left| \frac{g(f(x)) - g(f(c))}{x - c} - \frac{dg(f(c))}{dy} \frac{df(c)}{dx} \right| = |0 - 0| = 0.$$
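A finite-difference check of the one-variable chain rule, with illustrative choices $g(y) = \sin y$, $f(x) = x^2$, and $c = 1$ (assumptions, not from the notes):

```python
import math

# Chain rule check: d/dx g(f(x)) at c should equal g'(f(c)) * f'(c),
# here cos(1) * 2 for g = sin, f(x) = x**2, c = 1 (illustrative choices).
f = lambda x: x ** 2
g = lambda y: math.sin(y)
c = 1.0
h = 1e-6
numeric = (g(f(c + h)) - g(f(c - h))) / (2 * h)   # central difference of g o f
exact = math.cos(f(c)) * 2 * c                    # dg/dy(f(c)) * df(c)/dx
assert abs(numeric - exact) < 1e-6
```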

Multivariate Calculus

Definition: Let $U$ be an open subset of $\mathbb{R}^N$ and let $f : U \to \mathbb{R}^M$. $f$ is differentiable at $c \in U$ if there exists a linear transformation $Df(c) : \mathbb{R}^N \to \mathbb{R}^M$, called the derivative of $f$ at $c$, such that for every $\varepsilon > 0$, there exists a $\delta > 0$ such that $\| f(x) - f(c) - Df(c)(x - c) \| < \varepsilon \| x - c \|$, if $0 < \| x - c \| < \delta$. That is,
$$\lim_{\substack{x \to c \\ x \neq c}} \frac{\| f(x) - f(c) - Df(c)(x - c) \|}{\| x - c \|} = 0.$$
The affine function $f(c) + Df(c)(x - c)$ approximates $f(x)$ locally near $c$, that is, for $x$ near $c$.

Remark: If $f(x) = a + T(x)$, where $a \in \mathbb{R}^M$ and $T : \mathbb{R}^N \to \mathbb{R}^M$ is linear, then $Df(x) = T$, for all $x$.


Lemma: A function $f : U \to \mathbb{R}^M$ has at most one derivative at a point.

Proof: Suppose that $S : \mathbb{R}^N \to \mathbb{R}^M$ and $T : \mathbb{R}^N \to \mathbb{R}^M$ are linear and satisfy the definition of a derivative of $f$ at $c \in U$. If $S \neq T$, then there is a $v \in \mathbb{R}^N$ such that $\| v \| = 1$ and $\| S(v) - T(v) \| > 0$. Let $\varepsilon > 0$ and let $\delta > 0$ be such that $\| f(x) - f(c) - S(x - c) \| < \varepsilon \| x - c \|$ and $\| f(x) - f(c) - T(x - c) \| < \varepsilon \| x - c \|$, if $0 < \| x - c \| < \delta$. Let $t$ be a non-zero number such that $|t| < \delta$ and let $x = c + tv$. Then $0 < \| x - c \| < \delta$ and
$$0 < |t| \| S(v) - T(v) \| = \| S(tv) - T(tv) \| = \| -[f(x) - f(c)] + S(x - c) + [f(x) - f(c)] - T(x - c) \|$$
$$\leq \| f(x) - f(c) - S(x - c) \| + \| f(x) - f(c) - T(x - c) \| \leq 2\varepsilon \| x - c \| = 2\varepsilon \| tv \| = 2\varepsilon |t|.$$
Dividing by $|t|$, we see that $0 < \| S(v) - T(v) \| < 2\varepsilon$, for all $\varepsilon > 0$, which is impossible.

Lemma: If $T : \mathbb{R}^N \to \mathbb{R}^M$ is a linear transformation, then there exists a positive number $b$ such that $\| T(v) - T(w) \| \leq b \| v - w \|$, for all $v$ and $w$ in $\mathbb{R}^N$. Therefore a linear transformation is everywhere continuous.

Proof: Let $A$ be the $M \times N$ matrix representing $T$ and let $a = \max_{m,n} |a_{mn}|$. If $y = T(x)$, then, for all $m = 1, \ldots, M$,
$$|y_m| = \left| \sum_{n=1}^N a_{mn} x_n \right| \leq \| a_m \| \| x \|,$$
by the Cauchy–Schwarz inequality, where $a_m$ is the $m$th row of $A$. Now
$$\| a_m \| = \sqrt{\sum_{n=1}^N a_{mn}^2} \leq \sqrt{N a^2} = a \sqrt{N}.$$
Therefore, $|y_m| \leq a \sqrt{N} \| x \|$, and so
$$\| y \| = \sqrt{\sum_{m=1}^M y_m^2} \leq \sqrt{M a^2 N \| x \|^2} = a \sqrt{MN} \| x \|.$$
Let $b = a \sqrt{MN}$.


    MATH CAMP: Lecture 8

Lemma: Let $f : U \to \mathbb{R}^M$, where $U$ is an open subset of $\mathbb{R}^N$. If $f$ is differentiable at $c \in U$, then there exist positive numbers $\delta$ and $B$ such that $\| f(x) - f(c) \| \leq B \| x - c \|$, if $\| x - c \| < \delta$. In particular, $f$ is continuous at $c$.

Proof: Since $f$ is differentiable at $c$, there exists a $\delta > 0$ such that
$$\| f(x) - f(c) - Df(c)(x - c) \| \leq \| x - c \|,$$
if $\| x - c \| < \delta$. Therefore,
$$\| f(x) - f(c) \| = \| f(x) - f(c) - Df(c)(x - c) + Df(c)(x - c) \| \leq \| f(x) - f(c) - Df(c)(x - c) \| + \| Df(c)(x - c) \| \leq \| x - c \| + \| Df(c)(x - c) \|,$$
if $\| x - c \| < \delta$. By the last lemma of the previous lecture, there is a $b > 0$ such that
$$\| Df(c)(x - c) \| \leq b \| x - c \|.$$
Therefore,
$$\| f(x) - f(c) \| \leq (1 + b) \| x - c \|,$$
if $\| x - c \| < \delta$.

Definition: Let $f : U \to \mathbb{R}$, where $U \subset \mathbb{R}^N$ is open, and let $v \in \mathbb{R}^N$. The number $\nabla_v f(c)$ is said to be the directional derivative of $f$ in the direction $v$ if for every $\varepsilon > 0$, there is a $\delta > 0$ such that if $0 < |t| < \delta$, then
$$\left| \frac{1}{t}\left[ f(c + tv) - f(c) \right] - \nabla_v f(c) \right| < \varepsilon.$$


Definition: If $f : U \to \mathbb{R}$, where $U \subset \mathbb{R}^N$ and $U$ is open, then $\nabla_{e_n} f(c)$ is called the $n$th partial derivative of $f$ at $c$ and is written as $\partial f(c)/\partial x_n$, where $e_n$ is the $n$th standard basis vector of $\mathbb{R}^N$.

Remark:
$$\frac{\partial f}{\partial x_n}(c) = \frac{d}{dx_n} f(c_1, \ldots, c_{n-1}, x_n, c_{n+1}, \ldots, c_N) \Big|_{x_n = c_n}.$$
That is, all variables of $f$ but the $n$th are held constant at their values in the vector $c$. The result is a function of the single variable $x_n$. The derivative of this function at $x_n = c_n$ equals $\frac{\partial f}{\partial x_n}(c)$.

Example:
$$f(x_1, x_2, x_3) = x_1 x_2^3 x_3^2, \qquad \frac{\partial f(2, 4, 5)}{\partial x_2} = 2(3)(4^2)(5^2) = 6(16)(25) = 2400.$$

If $f : U \to \mathbb{R}^M$, where $U \subset \mathbb{R}^N$ and $U$ is open, let $f_m : U \to \mathbb{R}$ be the $m$th component of $f$, for $m = 1, \ldots, M$.
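The worked example can be verified with a central difference in the $x_2$ coordinate alone (a quick sketch; the step size is an arbitrary choice):

```python
# Finite-difference check of the example: for f(x1, x2, x3) = x1 * x2**3 * x3**2,
# the partial derivative with respect to x2 at (2, 4, 5) is 3*2*4**2*5**2 = 2400.
f = lambda x1, x2, x3: x1 * x2 ** 3 * x3 ** 2
h = 1e-6
approx = (f(2, 4 + h, 5) - f(2, 4 - h, 5)) / (2 * h)   # vary only x2
assert abs(approx - 2400) < 1e-3
```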

Theorem: Let $f : U \to \mathbb{R}^M$, where $U$ is an open subset of $\mathbb{R}^N$. If $f$ is differentiable at $c$, then $f_m$ is differentiable at $c$, for all $m$, and
$$Df(c) = \begin{pmatrix} Df_1(c) \\ \vdots \\ Df_M(c) \end{pmatrix}.$$

Proof: Let $\varepsilon > 0$ and let $\delta > 0$ be such that $\| f(x) - f(c) - Df(c)(x - c) \| \leq \varepsilon \| x - c \|$, if $\| x - c \| < \delta$. $Df(c) : \mathbb{R}^N \to \mathbb{R}^M$ is a linear transformation, so that
$$Df(c) = \begin{pmatrix} (Df(c))_1 \\ \vdots \\ (Df(c))_M \end{pmatrix},$$
where $(Df(c))_m : \mathbb{R}^N \to \mathbb{R}$ is linear, for all $m$, and is the $m$th component function of $Df(c)$. If $\| x - c \| < \delta$, then
$$|f_m(x) - f_m(c) - (Df(c))_m(x - c)| \leq \| f(x) - f(c) - Df(c)(x - c) \| \leq \varepsilon \| x - c \|,$$
since $f_m(x) - f_m(c) - (Df(c))_m(x - c)$ is the $m$th component of $f(x) - f(c) - Df(c)(x - c)$. Therefore, by the definition of a derivative, $(Df(c))_m$ is the derivative of $f_m$ at $c$. That is, $(Df(c))_m = Df_m(c)$, and therefore
$$Df(c) = \begin{pmatrix} Df_1(c) \\ \vdots \\ Df_M(c) \end{pmatrix}.$$


Theorem: The matrix
$$\begin{pmatrix} \frac{\partial f_1(c)}{\partial x_1} & \cdots & \frac{\partial f_1(c)}{\partial x_N} \\ \vdots & & \vdots \\ \frac{\partial f_M(c)}{\partial x_1} & \cdots & \frac{\partial f_M(c)}{\partial x_N} \end{pmatrix}$$
represents $Df(c)$, if $f$ is differentiable at $c$ and $f : U \to \mathbb{R}^M$, where $U \subset \mathbb{R}^N$ and $U$ is open.

Proof: Let $v = (v_1, \ldots, v_N) \in \mathbb{R}^N$. Then, $v = \sum_{n=1}^N v_n e_n$, so that
$$Df(c)(v) = Df(c)\left( \sum_{n=1}^N v_n e_n \right) = \sum_{n=1}^N v_n Df(c)(e_n) = \sum_{n=1}^N v_n \begin{pmatrix} Df_1(c)(e_n) \\ \vdots \\ Df_M(c)(e_n) \end{pmatrix} = \sum_{n=1}^N v_n \begin{pmatrix} \nabla_{e_n} f_1(c) \\ \vdots \\ \nabla_{e_n} f_M(c) \end{pmatrix}$$
$$= \sum_{n=1}^N v_n \begin{pmatrix} \frac{\partial f_1(c)}{\partial x_n} \\ \vdots \\ \frac{\partial f_M(c)}{\partial x_n} \end{pmatrix} = \begin{pmatrix} \frac{\partial f_1(c)}{\partial x_1} & \cdots & \frac{\partial f_1(c)}{\partial x_N} \\ \vdots & & \vdots \\ \frac{\partial f_M(c)}{\partial x_1} & \cdots & \frac{\partial f_M(c)}{\partial x_N} \end{pmatrix} \begin{pmatrix} v_1 \\ \vdots \\ v_N \end{pmatrix}.$$
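The theorem says the Jacobian can be recovered column by column from difference quotients along the basis vectors. A finite-difference sketch with an assumed map $f(x_1, x_2) = (x_1 x_2,\, x_1 + x_2^2)$ at $c = (1, 2)$, whose matrix of partials is $\begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix}$ (the map and point are illustrative, not from the notes):

```python
# Recover each column of the Jacobian of f(x1, x2) = (x1*x2, x1 + x2**2)
# at c = (1, 2) by differencing along the standard basis vectors.
def f(x1, x2):
    return (x1 * x2, x1 + x2 ** 2)

c = (1.0, 2.0)
expected = [[2.0, 1.0], [1.0, 4.0]]   # entry (m, n) is df_m/dx_n at c
h = 1e-6
for n in range(2):                    # columns: basis direction e_n
    step = [0.0, 0.0]
    step[n] = h
    plus = f(c[0] + step[0], c[1] + step[1])
    minus = f(c[0] - step[0], c[1] - step[1])
    for m in range(2):                # rows: component functions f_m
        approx = (plus[m] - minus[m]) / (2 * h)
        assert abs(approx - expected[m][n]) < 1e-6
```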

Theorem:

1. Let $f : U \to \mathbb{R}^M$ and $g : U \to \mathbb{R}^M$, where $U \subset \mathbb{R}^N$ and $U$ is open. If $f$ and $g$ are differentiable at $c \in U$ and $a$ and $b$ are numbers, then $af + bg$ is differentiable at $c$ and $D(af + bg)(c) = aDf(c) + bDg(c)$.

2. If $f$ and $g$ are as in part (1), then $f \cdot g$ is differentiable at $c$ and $D(f \cdot g)(c)(v) = Df(c)(v) \cdot g(c) + f(c) \cdot Dg(c)(v)$, for $v \in \mathbb{R}^N$.

3. If $\varphi : U \to \mathbb{R}$ and $\varphi$ is differentiable at $c$, and if $f$ is as in part (1), then $\varphi f$ is differentiable at $c$ and $D(\varphi f)(c)(v) = D\varphi(c)(v) f(c) + \varphi(c) Df(c)(v)$.

Parts 2 and 3 of this theorem generalize Leibniz's rule for differentiating products.

Some background facts about matrix transposition:

1. If $A$ is an $M \times N$ matrix and $B$ is an $N \times K$ matrix, then $(AB)^T = B^T A^T$.

2. If $A$ and $B$ are $M \times N$ matrices and $a$ and $b$ are numbers, then $(aA + bB)^T = aA^T + bB^T$.


3. If $A$ is a matrix, $(A^T)^T = A$.

4. If $x$ and $y$ are $N$-vectors, $x \cdot y = x^T y$.

I leave it to you to verify these assertions. The equation in part 2 of the previous theorem may be written as
$$D(f^T g)(c)v = (Df(c)v)^T g(c) + f(c)^T Dg(c)v = v^T (Df(c))^T g(c) + f(c)^T Dg(c)v = g(c)^T Df(c)v + f(c)^T Dg(c)v,$$
where I treat $Df(c)$ and $Dg(c)$ as $M \times N$ matrices. The last equation holds because $v^T (Df(c))^T g(c)$ is a number and so equals its own transpose.

Now, I describe some useful special cases.

If $f(x) = a^T x$, where $a \in \mathbb{R}^N$ is a constant vector, then $Df(x) = D(a^T x) = a^T$, since $a^T x$ is a linear function of $x$. Similarly, if $f(x) = Ax$, where $A$ is an $M \times N$ matrix, then $Df(x) = A$, since $Ax$ is linear.

Now, let $M = N$ in the previous theorem. Let $f(x) = x$ and let $g(x) = Ax$, where $A$ is an $N \times N$ constant matrix. Then by part 2 of the theorem,
$$D(x^T A x)(c)(v) = D(f \cdot g)(c)(v) = g(c)^T Df(c)v + f(c)^T Dg(c)v = c^T A^T I v + c^T A v = c^T A^T v + c^T A v.$$
If in addition $A$ is symmetric, so that $A^T = A$, then $D(x^T A x)(c)(v) = c^T A^T v + c^T A v = 2c^T A v$. That is, the matrix representation of $D(x^T A x)(c)$ is $2c^T A$, if $A$ is symmetric.
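The formula $D(x^T A x)(c) = 2c^T A$ for symmetric $A$ can be checked against finite differences (the matrix $A$ and point $c$ below are illustrative choices, not from the notes):

```python
# Check D(x^T A x)(c) = 2 c^T A for a symmetric A by finite differences.
A = [[2.0, 1.0], [1.0, 3.0]]   # symmetric 2x2, illustrative
c = [1.0, -1.0]                # illustrative point

def q(x):
    # the quadratic form x^T A x
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

grad = [2 * sum(c[i] * A[i][j] for i in range(2)) for j in range(2)]  # 2 c^T A

h = 1e-6
for n in range(2):
    x_plus, x_minus = c[:], c[:]
    x_plus[n] += h
    x_minus[n] -= h
    approx = (q(x_plus) - q(x_minus)) / (2 * h)
    assert abs(approx - grad[n]) < 1e-6
```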

Let $U$ be an open subset of $\mathbb{R}^N$ and $f : U \to \mathbb{R}$.

Definition: $f$ has a local maximum at $c \in U$ if for some $\varepsilon > 0$, $f(c) \geq f(x)$, for all $x \in B_\varepsilon(c)$.

Theorem: If $f$ has a local maximum at $c \in U$, then $Df(c) = 0$.

Proof: The restriction of $f$ to any line through $c$ has a local maximum at $c$. Therefore, $\nabla_v f(c) = 0$, for all $v \in \mathbb{R}^N$. In particular, $\partial f(c)/\partial x_n = 0$, for all $n$. Therefore, $Df(c) = 0$.

A local minimum for $f$ may be defined in a similar way, and $Df(c) = 0$ if $f$ has a local minimum at $c$.


Application (Least Squares Estimator):

Model:
$$y = \sum_{k=1}^K \beta_k x_k + e, \quad e = \text{error}.$$
We don't know the $\beta_k$. Suppose we have $N$ observations,
$$\begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \qquad \begin{pmatrix} x_{11} & \cdots & x_{1K} \\ \vdots & & \vdots \\ x_{N1} & \cdots & x_{NK} \end{pmatrix}.$$
The least squares estimator is the $(b_1, \ldots, b_K)$ that minimizes $\sum_{n=1}^N \left( y_n - \sum_{k=1}^K b_k x_{nk} \right)^2$, which is the sum of squared errors. Let
$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \qquad X = \begin{pmatrix} x_{11} & \cdots & x_{1K} \\ \vdots & & \vdots \\ x_{N1} & \cdots & x_{NK} \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ \vdots \\ b_K \end{pmatrix}.$$
We wish to choose $b$ so as to minimize
$$(y - Xb) \cdot (y - Xb) = (y - Xb)^T (y - Xb) = (y^T - b^T X^T)(y - Xb) = y^T y - y^T Xb - b^T X^T y + b^T X^T Xb = y^T y - 2y^T Xb + b^T X^T Xb,$$
where I have used the rules for matrix transposition and the fact that since $b^T X^T y$ is a number, $b^T X^T y = (b^T X^T y)^T = y^T X b$. The $b$ that minimizes $(y - Xb)^T (y - Xb)$ is called the least squares estimator.

If $K = 1$, we have the following: the least squares estimate, $b$, minimizes the sum of the squares of the vertical distances from the data points $(x_n, y_n)$ to the line $y = bx$.


In order to calculate the least squares estimator, we set the derivative of $(y - Xb) \cdot (y - Xb) = y^T y - 2y^T Xb + b^T X^T Xb$ with respect to $b$ equal to zero. Let $D_b$ denote the derivative with respect to the vector $b$. Then,
$$D_b (y - Xb) \cdot (y - Xb) = D_b \left[ y^T y - 2y^T Xb + b^T X^T Xb \right] = D_b y^T y - 2 D_b y^T Xb + D_b b^T X^T Xb = 0 - 2y^T X + 2b^T X^T X,$$
where I have used the fact that the matrix $X^T X$ is symmetric. $X^T X$ is symmetric because $(X^T X)^T = X^T (X^T)^T = X^T X$. Since $X^T X$ is symmetric, $D_b b^T X^T Xb = 2b^T X^T X$, by a formula proved earlier. Setting $D_b (y - Xb) \cdot (y - Xb)$ equal to zero, we obtain the equation $0 = -2y^T X + 2b^T X^T X$, which implies that $b^T X^T X = y^T X$. Taking the transpose of both sides of this equation, we obtain $X^T Xb = X^T y$. If the matrix $X^T X$ is invertible, then $b = (X^T X)^{-1} X^T y$. This is the formula for the least squares estimator.

The $N$-vector $Xb$ is the projection of $y$ onto the span of the columns of $X$. In order to see that this is so, we must show that $y - Xb$ is orthogonal to the columns of $X$. Since the columns of $X$ are the rows of $X^T$, it is sufficient to show that $X^T(y - Xb) = 0$. However, $X^T(y - Xb) = X^T y - X^T X (X^T X)^{-1} X^T y = X^T y - X^T y = 0$.
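A numerical sketch of the normal equations, using NumPy and a small made-up data set (the data are illustrative, not from the notes); it also verifies the orthogonality condition $X^T(y - Xb) = 0$:

```python
import numpy as np

# Small illustrative regression: N = 3 observations, K = 2 regressors
# (a column of ones for the intercept and one explanatory variable).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0, 2.5])

b = np.linalg.inv(X.T @ X) @ X.T @ y     # b = (X^T X)^{-1} X^T y
residual = y - X @ b
assert np.allclose(X.T @ residual, 0.0)  # y - Xb is orthogonal to columns of X
print(b)
```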

Theorem (The Chain Rule): Suppose that $f : U \to V$, where $U$ and $V$ are open subsets of $\mathbb{R}^N$ and $\mathbb{R}^M$, respectively, and that $g : V \to \mathbb{R}^K$. Suppose that $f$ is differentiable at $c \in U$ and that $g$ is differentiable at $b = f(c)$. Let $h : U \to \mathbb{R}^K$ be defined by $h(x) = g(f(x)) = (g \circ f)(x)$. Then, $h$ is differentiable at $c$ and $Dh(c) = Dg(f(c)) Df(c)$.

Mean Value Theorem: Let $f : U \to \mathbb{R}$, where $U$ is an open subset of $\mathbb{R}^N$. Suppose that $f$ is differentiable on $U$. Let $a \in U$ and $b \in U$ and suppose that the line segment from $a$ to $b$ ($= \{ (1 - t)a + tb \mid 0 \leq t \leq 1 \}$) is contained in $U$. Then, there exists a point $c$ on this line segment such that $f(b) - f(a) = Df(c)(b - a)$.

Proof: Let $\varphi : [0, 1] \to \mathbb{R}$ be defined by $\varphi(t) = f((1 - t)a + tb)$. Then $\varphi(0) = f(a)$ and $\varphi(1) = f(b)$. By the chain rule, $d\varphi(t)/dt = Df((1 - t)a + tb)(b - a)$. By the mean value theorem for one variable, there exists a number $t_0$ such that $0 < t_0 < 1$ and $d\varphi(t_0)/dt = \varphi(1) - \varphi(0) = f(b) - f(a)$. Let $c = (1 - t_0)a + t_0 b$. Then $Df(c)(b - a) = f(b) - f(a)$.

Theorem (Existence of a Derivative): Let $f : U \to \mathbb{R}^M$, where $U \subset \mathbb{R}^N$ is open. Suppose that $\partial f_m(x)/\partial x_n$ exists and is continuous on $U$, for all $n$ and $m$. Then, $f$ is differentiable on $U$.
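A finite-difference check of $Dh(c) = Dg(f(c))\,Df(c)$, with assumed maps $f(x) = (x_1 x_2,\, x_1 + x_2)$ and $g(y) = y_1^2 + 3y_2$ (illustrative choices, not from the notes):

```python
import numpy as np

# Multivariate chain rule check for h = g o f with illustrative maps.
f = lambda x: np.array([x[0] * x[1], x[0] + x[1]])
g = lambda y: y[0] ** 2 + 3 * y[1]
c = np.array([1.0, 2.0])

Df = np.array([[c[1], c[0]], [1.0, 1.0]])   # Jacobian of f at c
Dg = np.array([2 * f(c)[0], 3.0])           # gradient (row vector) of g at f(c)
Dh = Dg @ Df                                # chain rule: Dh(c) = Dg(f(c)) Df(c)

h = 1e-6
for n in range(2):                          # compare with finite differences
    e = np.zeros(2)
    e[n] = h
    approx = (g(f(c + e)) - g(f(c - e))) / (2 * h)
    assert abs(approx - Dh[n]) < 1e-4
```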


    MATH CAMP: Lecture 9

Terminology: Let $f : U \to \mathbb{R}^K$ be a differentiable function, where $U$ is an open subset of $\mathbb{R}^N$. The matrix representation of $Df(x)$ is called the Jacobian matrix of $f$ at $x$.

If $f : U \to \mathbb{R}$, the derivative $Df(x)$ is called the gradient of $f$ at $x$, though the word gradient usually suggests the vector $\left( \frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_N}(x) \right)$ that represents the linear functional $Df(x)$. This vector is sometimes denoted by $\nabla f(x)$.

Second Derivative: Let $f : U \to \mathbb{R}$, where $U$ is an open subset of $\mathbb{R}^N$. If $f$ is differentiable, then for each $n$, $\frac{\partial f}{\partial x_n}(x)$ is a function of $x$. Suppose that $\frac{\partial f}{\partial x_n}(x)$ is differentiable. For $m = 1, \ldots, N$,
$$\frac{\partial}{\partial x_m}\left( \frac{\partial f}{\partial x_n} \right)(x)$$
is called a second partial derivative of $f$. This second partial derivative is written as
$$\frac{\partial}{\partial x_m}\left( \frac{\partial f}{\partial x_n} \right)(x) = \frac{\partial^2 f(x)}{\partial x_m \partial x_n}.$$

Theorem (Interchange of Order of Partial Differentiation): Let $f : U \to \mathbb{R}$ be as above. If for any $m$ and $n$,
$$\frac{\partial^2 f(x)}{\partial x_m \partial x_n}$$
exists and is a continuous function of $x$, then $\frac{\partial^2 f(x)}{\partial x_n \partial x_m}$ exists and equals $\frac{\partial^2 f(x)}{\partial x_m \partial x_n}$.

Remark: It follows that the matrix
$$\begin{pmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1}(x) & \cdots & \frac{\partial^2 f}{\partial x_N \partial x_1}(x) \\ \vdots & & \vdots \\ \frac{\partial^2 f}{\partial x_1 \partial x_N}(x) & \cdots & \frac{\partial^2 f}{\partial x_N \partial x_N}(x) \end{pmatrix}$$
is symmetric, if all the partial derivatives $\frac{\partial^2 f(x)}{\partial x_m \partial x_n}$ exist and are continuous functions of $x$.


Interpretation: Let $f : U \to \mathbb{R}$, where $U$ is an open subset of $\mathbb{R}^N$. If $v \in \mathbb{R}^N$, $Df(x)(v) = \nabla_v f(x) = \sum_{n=1}^N v_n \frac{\partial f(x)}{\partial x_n}$ is the rate of change of $f$ in the direction of $v$.

We need an expression for the rate of change of $Df(x)(v)$ at $x = c$ in a direction $w \in \mathbb{R}^N$. This rate of change is
$$\nabla_w \left( \sum_{n=1}^N v_n \frac{\partial f(c)}{\partial x_n} \right) = \sum_{m=1}^N w_m \frac{\partial}{\partial x_m}\left( \sum_{n=1}^N v_n \frac{\partial f(c)}{\partial x_n} \right) = \sum_{m=1}^N \sum_{n=1}^N v_n w_m \frac{\partial^2 f(c)}{\partial x_m \partial x_n} = v^T D^2 f(c) w, \quad \text{where}$$
$$D^2 f(c) = \begin{pmatrix} \frac{\partial^2 f(c)}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 f(c)}{\partial x_N \partial x_1} \\ \vdots & & \vdots \\ \frac{\partial^2 f(c)}{\partial x_1 \partial x_N} & \cdots & \frac{\partial^2 f(c)}{\partial x_N \partial x_N} \end{pmatrix}.$$
This matrix is called the Hessian matrix. The function $v^T D^2 f(c) w$ of $v$ and $w$ is a bilinear form in $v$ and $w$ and may be written as $D^2 f(c)(v, w)$. The Hessian is the Jacobian of the function $Df : U \to \mathbb{R}^N$.

The rate of change of $D^2 f(x)(v, w)$ at $c \in U$ in the direction $u \in \mathbb{R}^N$ is
$$D^3 f(c)(v, w, u) = \sum_{n=1}^N \sum_{m=1}^N \sum_{k=1}^N v_n w_m u_k \frac{\partial^3 f(c)}{\partial x_k \partial x_m \partial x_n}.$$
This is a trilinear functional $D^3 f(c)(v, w, u)$ that is represented by a three-dimensional matrix with typical entry $\frac{\partial^3 f(c)}{\partial x_k \partial x_m \partial x_n}$. $D^3 f(c)$ is the third derivative of $f$ at $c$.

Continuing in this way, for any positive integer $r$, we can define the $r$th derivative of $f$ at $c$ to be an $r$-linear functional
$$D^r f(c)(v_1, \ldots, v_r) = \sum_{n_1=1}^N \cdots \sum_{n_r=1}^N v_{1 n_1} \cdots v_{r n_r} \frac{\partial^r f(c)}{\partial x_{n_r} \cdots \partial x_{n_1}},$$

where for each $s = 1, \ldots, r$, $v_s = (v_{s1}, \ldots, v_{sN}) \in \mathbb{R}^N$.

Remarks:

1. The matrix representation of $Df(x)$ is the row vector
$$\left( \frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_N}(x) \right).$$


When we take the derivative of $Df(x)$, we think of $Df$ as a function from $U$ to $\mathbb{R}^N$, and so write $Df(x)$ as a column vector
$$\begin{pmatrix} \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_N} \end{pmatrix}.$$
The matrix representation of the derivative of this function is the Hessian matrix.

2. If the functions $\frac{\partial^2 f(x)}{\partial x_m \partial x_n}$ are continuous with respect to $x$, then the Hessian matrix is symmetric, so that $D^2 f(c)$ is a symmetric bilinear form.
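The symmetry in Remark 2 can be illustrated numerically: the cross difference below approximates both mixed partials at once and matches the exact value $\partial^2 f / \partial x_1 \partial x_2 = 2x_1$ for the assumed function $f(x_1, x_2) = x_1^2 x_2 + x_2^3$ (an illustrative smooth function, not from the notes).

```python
# Mixed-partial check for f(x1, x2) = x1**2 * x2 + x2**3: both mixed
# partials equal 2*x1, here 2 at the point (1, 2).
f = lambda x1, x2: x1 ** 2 * x2 + x2 ** 3
c1, c2, h = 1.0, 2.0, 1e-4
cross = (f(c1 + h, c2 + h) - f(c1 + h, c2 - h)
         - f(c1 - h, c2 + h) + f(c1 - h, c2 - h)) / (4 * h * h)
assert abs(cross - 2 * c1) < 1e-5
```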

Taylor's Theorem: Let $f : U \to \mathbb{R}$, where $U$ is an open subset of $\mathbb{R}^N$ containing a straight line segment from $a$ to $b$. Suppose that $f$ has continuous partial derivatives of order $r$ on $U$, where $r$ is a positive integer. Then, there exists a point $q$ on the line segment from $a$ to $b$ such that
$$f(b) = f(a) + Df(a)(b - a) + \frac{1}{2} D^2 f(a)(b - a, b - a) + \frac{1}{3!} D^3 f(a)(b - a, b - a, b - a) + \cdots + \frac{1}{(r-1)!} D^{r-1} f(a)(\underbrace{b - a, \ldots, b - a}_{r-1}) + \frac{1}{r!} D^r f(q)(\underbrace{b - a, \ldots, b - a}_{r}).$$

Proof: Let $F : [0, 1] \to \mathbb{R}$ be defined by $F(t) = f(a + t(b - a))$. Then,
$$\frac{dF(t)}{dt} = Df(a + t(b - a))(b - a),$$
$$\frac{d^2 F(t)}{dt^2} = D^2 f(a + t(b - a))(b - a, b - a),$$
$$\vdots$$
$$\frac{d^r F(t)}{dt^r} = D^r f(a + t(b - a))(\underbrace{b - a, \ldots, b - a}_{r}).$$
By the one-dimensional Taylor's theorem applied to $F$, there is a $\tau$ such that $0 < \tau < 1$ and
$$F(1) = F(0) + \frac{dF(0)}{dt} + \frac{1}{2} \frac{d^2 F(0)}{dt^2} + \frac{1}{3!} \frac{d^3 F(0)}{dt^3} + \cdots + \frac{1}{(r-1)!} \frac{d^{r-1} F(0)}{dt^{r-1}} + \frac{1}{r!} \frac{d^r F(\tau)}{dt^r}$$