M2AA3 Orthogonality

Lectured by John Barrett. LyXed by jm407.

December 13, 2008

www.ma.ic.ac.uk/~jwb/teaching

• 2hr exam in the summer term (4 questions)
• 2 small assessed projects (involving computation - MATLAB or whatever you want)
• exam 6 : project 1
• deadlines for the 2 assessed projects:
  1st project - mid/late November
  2nd project - first week of spring term

Contents

1 Applied Linear Algebra
  1.1 Orthogonality
    1.1.1 Inner Product
    1.1.2 Outer Product
    1.1.3 Dot Product
  1.2 Gram-Schmidt
    1.2.1 Classical Gram-Schmidt Algorithm
  1.3 QR Factorisation
  1.4 Cauchy-Schwarz Inequality
  1.5 Gradients and Hessians
  1.6 Inner Products Revisited and Positive Definite Matrices
  1.7 Least Squares Problems

2 Least Squares Problems

3 Orthogonal Polynomials

4 Polynomial Interpolation

5 Best Approximation in $\|\cdot\|_\infty$

1 Applied Linear Algebra

1.1 Orthogonality

Two vectors are orthogonal when they are perpendicular to each other.

$a \in \mathbb{R}^n \equiv \mathbb{R}^{n\times 1}$:
\[ a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}_{n\times 1} \]
($n$ rows, 1 column, $a_i \in \mathbb{R}$).

Transpose of $a$: $a^T = (a_1 \dots a_n) \in \mathbb{R}^{1\times n}$ (1 row, $n$ columns).

Given $a, b \in \mathbb{R}^n$:

1.1.1 Inner Product

\[ a^T b = (a_1 \dots a_n)\begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = \sum_{i=1}^n a_i b_i \in \mathbb{R} \]
(a $1\times n$ times an $n\times 1$ gives a $1\times 1$, i.e. a scalar).

1.1.2 Outer Product

\[ ab^T = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}(b_1 \dots b_n) = \begin{pmatrix} a_1b_1 & \dots & a_1b_n \\ a_2b_1 & & \vdots \\ \vdots & & \vdots \\ a_nb_1 & \dots & a_nb_n \end{pmatrix}_{n\times n} \]
(an $n\times 1$ times a $1\times n$ gives an $n\times n$).

Therefore $ab^T \in \mathbb{R}^{n\times n}$, with $\left(ab^T\right)_{jk} = a_jb_k$ for $j = 1\to n$, $k = 1\to n$.

Useful for some questions on sheet 1: for $u \in \mathbb{R}^n$,
\[ (ab^T)\,u = a\,(b^Tu) = (b^Tu)\,a, \]
which is always a multiple of $a$, $\forall u, a, b \in \mathbb{R}^n$.

Note: let $A$ and $B$ be matrices of dimensions $p\times q$ and $r\times s$ respectively. The product $A\cdot B = C$ is defined only when $q = r$, with $C$ a matrix of dimensions $p\times s$.

Given $a, b \in \mathbb{R}^n$, let
\[ \langle a, b\rangle = a^Tb = \sum_{i=1}^n a_ib_i, \]
so $\langle\cdot,\cdot\rangle : \mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$ is an inner product.
\[ \langle a,b\rangle = \sum_{i=1}^n a_ib_i = \sum_{i=1}^n b_ia_i = \langle b,a\rangle \quad \forall a,b\in\mathbb{R}^n \quad \text{symmetric} \tag{1} \]
(the order doesn't matter).
\[ \langle a, \mu b + \lambda c\rangle = a^T(\mu b + \lambda c) = \sum_{i=1}^n a_i(\mu b_i + \lambda c_i) = \mu\sum_{i=1}^n a_ib_i + \lambda\sum_{i=1}^n a_ic_i = \mu\langle a,b\rangle + \lambda\langle a,c\rangle \tag{2} \]
linear with respect to the 2nd argument, $\forall a,b,c\in\mathbb{R}^n$ and $\forall\mu,\lambda\in\mathbb{R}$.

(1) + (2) ⇒ (3)

\[ \langle \mu a + \lambda b, c\rangle \overset{(1)}{=} \langle c, \mu a + \lambda b\rangle \overset{(2)}{=} \mu\langle c,a\rangle + \lambda\langle c,b\rangle \overset{(1)}{=} \mu\langle a,c\rangle + \lambda\langle b,c\rangle \tag{3} \]
linear with respect to the 1st argument.
\[ \langle a,a\rangle = a^Ta = \sum_{i=1}^n a_i^2 \ge 0 \]
Let $\|a\| = [\langle a,a\rangle]^{1/2} = \left(\sum_{i=1}^n a_i^2\right)^{1/2}$, the length or norm of $a$:
\[ \|a\| \ge 0 \quad \forall a\in\mathbb{R}^n, \qquad \|a\| = 0 \text{ if and only if } a = 0. \]

Recall - Geometric Vectors in $\mathbb{R}^3$

see diagram 20081009.M2AA3.1

\[ a = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}, \qquad b = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k} \]

1.1.3 Dot (Scalar) Product

\[ a\cdot b = b\cdot a = |a||b|\cos\theta \]
(order doesn't matter). $a\cdot a = |a|^2$, therefore $|a| = (a\cdot a)^{1/2}$ (taking $\theta = 0$, $\cos\theta = 1$).

see diagram 20081009.M2AA3.2

\[ \Rightarrow \mathbf{i}\cdot\mathbf{i} = \mathbf{j}\cdot\mathbf{j} = \mathbf{k}\cdot\mathbf{k} = 1 \quad (\text{as } \mathbf{i},\mathbf{j},\mathbf{k} \text{ are unit vectors}) \]
\[ \Rightarrow \mathbf{i}\cdot\mathbf{j} = \mathbf{j}\cdot\mathbf{k} = \mathbf{i}\cdot\mathbf{k} = 0 \quad \left(\text{as } \theta = \frac{\pi}{2}\right) \]
Easy to show that
\[ a\cdot(\lambda b + \mu c) = \lambda\, a\cdot b + \mu\, a\cdot c \tag{4} \]
Non-trivial vectors $a$ and $b$ ($a\ne 0$, $b\ne 0$) are orthogonal (perpendicular) if and only if their dot product $= 0$:
\[ a\cdot b = 0 \iff \cos\theta = 0 \iff \theta = \frac{\pi}{2}. \]
\[ a\cdot b = a\cdot(b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}) \overset{(4)}{=} b_1\,a\cdot\mathbf{i} + b_2\,a\cdot\mathbf{j} + b_3\,a\cdot\mathbf{k} = b_1a_1 + b_2a_2 + b_3a_3, \]
therefore
\[ a\cdot b = \sum_{i=1}^3 a_ib_i. \]
Given $a = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k} \equiv a = (a_1, a_2, a_3)^T \in \mathbb{R}^3$,
\[ a\cdot b = \sum_{i=1}^3 a_ib_i = a^Tb \equiv \langle a,b\rangle, \]
which is the inner product of $a$ and $b$. Non-trivial vectors $a$ and $b$ are orthogonal if and only if (the inner product) $\langle a,b\rangle = 0$.

Definition: Dot product = Inner product.

see diagram 20081009.M2AA3.3

\[ \langle a,b\rangle = a^Tb = \sum_{i=1}^n a_ib_i \quad \forall a,b\in\mathbb{R}^n \]
The inner product takes two vectors in $\mathbb{R}^n$ and spews out a scalar in $\mathbb{R}$. Three rules:

1. symmetric - the order doesn't matter;

2. linearity in the 2nd argument (linear combination of inner products, see above);

3. again, linearity in the other (1st) argument.

Length/norm as above.

Ex. $a,b\in\mathbb{R}^n$ orthogonal ⇒ $\|a+b\|^2 = \|a\|^2 + \|b\|^2$ (Generalised Pythagoras).

see diagram 20081014.M2AA3.1

Proof:
\[ \|a+b\|^2 \overset{\text{def}}{=} \langle a+b, a+b\rangle \overset{(2)}{=} \langle a+b, a\rangle + \langle a+b, b\rangle \overset{(3)}{=} \langle a,a\rangle + \langle b,a\rangle + \langle a,b\rangle + \langle b,b\rangle = \|a\|^2 + \|b\|^2 + 2\underbrace{\langle a,b\rangle}_{=0}, \]
hence the result.

$\{q_k\}_{k=1}^n$, $q_k\in\mathbb{R}^m$, $q_k\ne 0$, $k = 1\to n$, is ORTHOGONAL if and only if
\[ \langle q_k, q_j\rangle = 0 \quad j,k = 1\to n,\ j\ne k. \]

Kronecker delta notation:
\[ \delta_{jk} = \begin{cases} 1 & \text{if } j = k \\ 0 & \text{if } j \ne k \end{cases} \]
Identity matrix $I\in\mathbb{R}^{n\times n}$:
\[ I = \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{pmatrix}, \qquad I_{jk} = \delta_{jk} \quad j,k = 1\to n \tag{5} \]
$\{q_k\}_{k=1}^n$, $q_k\in\mathbb{R}^m$, $k = 1\to n$, is ORTHONORMAL if and only if
\[ \langle q_k, q_j\rangle = \delta_{jk} \quad j,k = 1\to n. \]

Definition: ORTHONORMAL ≡ ORTHOGONAL + each vector has unit length,
\[ \|q_k\| = [\langle q_k,q_k\rangle]^{1/2} = 1 \quad k = 1\to n. \]

Linearly Independent Vectors

$\{a_k\}_{k=1}^n$, $a_k\in\mathbb{R}^m$, $k = 1\to n$. $\{a_k\}_{k=1}^n$ is said to be LINEARLY INDEPENDENT if
\[ \sum_{k=1}^n c_ka_k = 0 \implies c_k = 0 \quad k = 1\to n \]
(the only choice). $\{a_k\}_{k=1}^n$ is said to be LINEARLY DEPENDENT if $\exists\,\{c_k\}_{k=1}^n$, not all zero, such that $\sum_{k=1}^n c_ka_k = 0$

(e.g. if $c_i\ne 0$ ⇒ $a_i = -\sum_{k=1,\,k\ne i}^n \frac{c_k}{c_i}a_k$).

Let $A\in\mathbb{R}^{m\times n}$ have $\{a_k\}_{k=1}^n$ as its columns:
\[ A = (a_1, a_2, \dots, a_n), \quad a_k\in\mathbb{R}^m, \]
\[ Ac = (a_1, a_2, \dots, a_n)\begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = \sum_{k=1}^n c_ka_k \in \mathbb{R}^m. \]
Therefore if the only solution to $Ac = 0$ is $c = 0$, then $\{a_k\}_{k=1}^n$ is linearly independent; however, if there exists a non-trivial solution $c\ne 0$, then $\{a_k\}_{k=1}^n$ is linearly dependent.

Restrict to the case $m = n$:
\[ A = (a_1, \dots, a_n), \quad a_k\in\mathbb{R}^n \quad k = 1\to n. \]

(a) If $A^{-1}$ exists, then
\[ Ac = 0 \Rightarrow A^{-1}Ac = A^{-1}0 \Rightarrow Ic = 0 \Rightarrow c = 0 \Rightarrow \{a_k\}_{k=1}^n \text{ is lin. ind.} \]

(b) If $\{a_k\}_{k=1}^n$ is lin. ind., they form a basis for $\mathbb{R}^n$, i.e. span $\mathbb{R}^n$:
\[ \forall b\in\mathbb{R}^n \ \exists\,\{c_k\}_{k=1}^n \text{ such that } b = \sum_{k=1}^n c_ka_k \tag{6} \]
Is $\{c_k\}_{k=1}^n$ unique? Assume the contrary:
\[ b = \sum_{k=1}^n d_ka_k \tag{7} \]
\[ (6) - (7) \Rightarrow 0 = \sum_{k=1}^n (c_k - d_k)a_k. \]
$\{a_k\}_{k=1}^n$ lin. ind. ⇒ $c_k - d_k = 0$ ⇒ $c_k = d_k$, $k = 1\to n$; therefore the representation of $b$ by $\{a_k\}_{k=1}^n$ is unique:
\[ b = \sum_{k=1}^n c_ka_k = Ac \]

(a linear combination), where
\[ A = (a_1, \dots, a_n)\in\mathbb{R}^{n\times n}, \qquad c = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}. \]
Therefore $\forall b\in\mathbb{R}^n$, $\exists! c\in\mathbb{R}^n$ ($!$ = unique) such that $Ac = b$.

see diagram 20081014.M2AA3.2

Hence (a) & (b) yield, for $m = n$:
\[ A^{-1} \text{ exists} \iff \{a_k\}_{k=1}^n \text{ lin. indep.}, \qquad A = (a_1,\dots,a_n)\in\mathbb{R}^{n\times n}. \]
Therefore, given the standard basis vector $e_i\in\mathbb{R}^n$ with a 1 in the $i$th position, i.e. $(e_i)_j = \delta_{ij}$, $j = 1\to n$, there exists a unique $s_i\in\mathbb{R}^n$ such that
\[ As_i = e_i \quad i = 1\to n; \]
therefore, with $S = (s_1,\dots,s_n)$, $S = A^{-1}$, i.e. $A^{-1}$ exists.

Lemma: $\{a_k\}_{k=1}^n$, $a_k\in\mathbb{R}^m$, $a_k\ne 0$, $k = 1\to n$, and orthogonal,
\[ \langle a_j, a_k\rangle = 0 \quad j,k = 1\to n,\ j\ne k \]
⇒ $\{a_k\}_{k=1}^n$ linearly independent ⇒ $n \le m$.
(Can't have $n > m$ linearly independent vectors in $\mathbb{R}^m$ - recall the exchange lemma.)

Proof: if
\[ \sum_{k=1}^n c_ka_k = 0 \]
\[ \Rightarrow \left\langle\sum_{k=1}^n c_ka_k,\ a_j\right\rangle = \langle 0, a_j\rangle \overset{(3)}{\Rightarrow} \sum_{k=1}^n c_k\underbrace{\langle a_k,a_j\rangle}_{=0 \text{ if } k\ne j} = 0 \Rightarrow c_j\langle a_j,a_j\rangle = 0. \]
$a_j\ne 0$ ⇒ $\|a_j\|^2 = \langle a_j,a_j\rangle \ne 0$, therefore $c_j = 0$. Repeat for $j = 1\to n$: therefore $c_j = 0$ for $j = 1\to n$, therefore $\{a_k\}_{k=1}^n$ lin. indep.

Orthogonality implies linear independence, so non-trivial orthogonal vectors are lin. ind. However, lin. ind. $\not\Rightarrow$ orthogonal.

Ex. $n = m = 2$:
\[ a_1 = \begin{pmatrix} 2 \\ 0 \end{pmatrix}, \quad a_2 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}, \qquad c_1a_1 + c_2a_2 = 0 \Rightarrow \left.\begin{array}{l} 2c_1 + 3c_2 = 0 \\ c_2 = 0 \end{array}\right\} \Rightarrow c_1 = c_2 = 0, \]
therefore $\{a_i\}_{i=1}^2$ lin. ind., but
\[ \langle a_1, a_2\rangle = a_1^Ta_2 = 6 \ne 0. \]

1.2 Gram-Schmidt

Given $\{a_i\}_{i=1}^n$, $a_i\in\mathbb{R}^m$, $i = 1\to n$, lin. ind. (⇒ $n\le m$), find $\{q_i\}_{i=1}^n$, $q_i\in\mathbb{R}^m$, $i = 1\to n$, ORTHONORMAL, i.e.
\[ \langle q_i, q_j\rangle = \delta_{ij} \quad i,j = 1\to n, \]
with $\operatorname{span}\{q_i\}_{i=1}^n = \operatorname{span}\{a_i\}_{i=1}^n$.

1.2.1 Classical Gram-Schmidt (CGS) Algorithm

\[ v_1 = a_1, \qquad q_1 = \frac{v_1}{\|v_1\|}; \]
for $k = 2\to n$:
\[ v_k = a_k - \sum_{l=1}^{k-1}\langle a_k, q_l\rangle\,q_l, \qquad q_k = \frac{v_k}{\|v_k\|}. \tag{8} \]

Proof:
\[ q_1 = \frac{v_1}{\|v_1\|} = \frac{a_1}{\|a_1\|}; \]
$\{a_i\}_{i=1}^n$ lin. ind. ⇒ $a_i\ne 0$, $i = 1\to n$, therefore $\|a_1\|\ne 0$,
\[ \|q_1\|^2 = \langle q_1,q_1\rangle = \left\langle\frac{a_1}{\|a_1\|}, \frac{a_1}{\|a_1\|}\right\rangle = \frac{1}{\|a_1\|^2}\langle a_1,a_1\rangle = 1, \]
\[ \operatorname{span}\{a_1\} = \operatorname{span}\{q_1\}. \]
Set $v_2 = a_2 - \langle a_2,q_1\rangle q_1$; by (8),
\[ \langle v_2, q_1\rangle = \langle a_2 - \langle a_2,q_1\rangle q_1,\ q_1\rangle \overset{(2)}{=} \langle a_2,q_1\rangle - \langle a_2,q_1\rangle\underbrace{\langle q_1,q_1\rangle}_{1} = 0. \]

see diagram 20081015.M2AA3.1

Check: is $v_2 = 0$? If $v_2 = 0$ then $a_2$ is a multiple of $q_1$ (so $a_2$ is a multiple of $a_1$, which is impossible since they are lin. ind.): contradiction. Therefore $v_2\ne 0$, and $q_2 = \frac{v_2}{\|v_2\|}$ is well defined.
\[ \langle v_2,q_1\rangle = 0 \Rightarrow \langle q_2,q_1\rangle = \left\langle\frac{v_2}{\|v_2\|}, q_1\right\rangle = 0. \]
Also
\[ \langle q_2,q_2\rangle = \left\langle\frac{v_2}{\|v_2\|}, \frac{v_2}{\|v_2\|}\right\rangle = 1, \]
therefore $\{q_i\}_{i=1}^2$ is ORTHONORMAL.

$v_2$ is a lin. combination of $a_2$ and $q_1$, so $v_2$ is a lin. combination of $a_2$ and $a_1$, so $q_2$ is a lin. combination of $a_2$ and $a_1$. Similarly $a_2$ is a lin. comb. of $q_1$ and $q_2$. Therefore
\[ \operatorname{span}\{q_i\}_{i=1}^2 = \operatorname{span}\{a_i\}_{i=1}^2. \]

Continue by induction: assume that when we've done up to $k-1$,
\[ \{q_i\}_{i=1}^{k-1} \text{ are all ORTHONORMAL}, \]
\[ q_j = \text{lin. comb. of } \{a_i\}_{i=1}^j, \quad a_j = \text{lin. comb. of } \{q_i\}_{i=1}^j, \quad j = 1\to k-1, \]
true for $k = 2$ and $3$ from the above. Set
\[ v_k = a_k - \sum_{l=1}^{k-1}\langle a_k, q_l\rangle q_l \]

\[ \Longrightarrow \langle v_k, q_j\rangle \overset{(2)}{=} \langle a_k,q_j\rangle - \sum_{l=1}^{k-1}\langle a_k,q_l\rangle\langle q_l,q_j\rangle = \langle a_k,q_j\rangle - \langle a_k,q_j\rangle = 0 \quad j = 1\to k-1, \]
therefore $\langle v_k, q_j\rangle = 0$, $j = 1\to k-1$.

If $v_k = 0$, this would tell us that $a_k$ is a lin. comb. of $\{q_l\}_{l=1}^{k-1}$; but the inductive hypothesis said that the $q$'s can be written in terms of the $a$'s, so it would tell us that $a_k$ is a lin. comb. of $\{a_l\}_{l=1}^{k-1}$ ⇒ contradiction to $\{a_i\}_{i=1}^n$ lin. ind. Therefore $v_k\ne 0$ and
\[ q_k = \frac{v_k}{\|v_k\|}. \]
$\langle v_k,q_j\rangle = 0$, $j = 1\to k-1$ ⇒ $\langle q_k,q_j\rangle = 0$, $j = 1\to k-1$; also $\langle q_k,q_k\rangle = 1$. Therefore $\{q_i\}_{i=1}^k$ ORTHONORMAL.

$v_k$ is a lin. comb. of $a_k$ and $\{q_l\}_{l=1}^{k-1}$, so $v_k$ is a lin. comb. of $\{a_l\}_{l=1}^{k}$, so $q_k$ is a lin. comb. of $\{a_l\}_{l=1}^{k}$. Therefore $q_j$ = lin. comb. of $\{a_i\}_{i=1}^j$, $j = 1\to k$; similarly $a_j$ = lin. comb. of $\{q_i\}_{i=1}^j$, $j = 1\to k$.

Ex. $n = m = 2$:
\[ a_1 = \begin{pmatrix} 3 \\ -4 \end{pmatrix}, \quad a_2 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \text{clearly lin. ind.} \]
First step: $q_1 = \frac{a_1}{\|a_1\|}$. First we need to work out the length of $a_1$:
\[ \|a_1\|^2 = \langle a_1,a_1\rangle = a_1^Ta_1 = 3^2 + (-4)^2 = 25 \Rightarrow \|a_1\| = 5. \]

So
\[ q_1 = \frac{1}{5}\begin{pmatrix} 3 \\ -4 \end{pmatrix} \Rightarrow \|q_1\| = 1. \]
\[ v_2 = a_2 - \langle a_2,q_1\rangle q_1 \tag{9} \]
First calculate $\langle a_2,q_1\rangle$:
\[ \langle a_2,q_1\rangle = a_2^Tq_1 = \frac{1}{5}(3 - 8) = -1; \]
now put that back into (9):
\[ v_2 = a_2 + q_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \frac{1}{5}\begin{pmatrix} 3 \\ -4 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 8 \\ 6 \end{pmatrix}, \]
so
\[ \|v_2\|^2 = \langle v_2,v_2\rangle = v_2^Tv_2 = \left(\frac{8}{5}\right)^2 + \left(\frac{6}{5}\right)^2 = \frac{100}{25} = 4 \Rightarrow \|v_2\| = 2 \Rightarrow q_2 = \frac{v_2}{\|v_2\|} = \frac{1}{5}\begin{pmatrix} 4 \\ 3 \end{pmatrix}. \]
\[ \left\{\begin{pmatrix} 3 \\ -4 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \end{pmatrix}\right\} \overset{\text{CGS}}{\longrightarrow} \left\{\frac{1}{5}\begin{pmatrix} 3 \\ -4 \end{pmatrix}, \frac{1}{5}\begin{pmatrix} 4 \\ 3 \end{pmatrix}\right\} \]
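A minimal NumPy sketch of the CGS algorithm (8) above; the function name is my own illustration. Running it on the worked example reproduces $q_1 = \frac{1}{5}(3,-4)^T$ and $q_2 = \frac{1}{5}(4,3)^T$.

```python
import numpy as np

def classical_gram_schmidt(A):
    """Classical Gram-Schmidt: lin. ind. columns of A -> orthonormal columns of Q."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for k in range(n):
        # v_k = a_k - sum_{l<k} <a_k, q_l> q_l    (equation (8))
        v = A[:, k] - Q[:, :k] @ (Q[:, :k].T @ A[:, k])
        Q[:, k] = v / np.linalg.norm(v)
    return Q

A = np.array([[3.0, 1.0], [-4.0, 2.0]])   # columns a_1 = (3,-4)^T, a_2 = (1,2)^T
Q = classical_gram_schmidt(A)
print(Q)                      # columns (3,-4)/5 and (4,3)/5
print(Q.T @ Q)                # identity, up to rounding
```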

1.3 QR Factorisation

$\{a_i\}_{i=1}^n$ lin. ind., $a_i\in\mathbb{R}^m$, $i = 1\to n$ (⇒ $n\le m$), $\overset{\text{CGS}}{\longrightarrow} \{q_i\}_{i=1}^n$.

Look at this from a different viewpoint. Let
\[ A = (a_1, a_2, \dots, a_n)\in\mathbb{R}^{m\times n}, \qquad \hat{Q} = (q_1, q_2, \dots, q_n)\in\mathbb{R}^{m\times n}. \]
Let $\hat{R}\in\mathbb{R}^{n\times n}$ be the upper triangular matrix
\[ \hat{R} = \begin{pmatrix} r_{11} & r_{12} & \dots & r_{1n} \\ & r_{22} & \dots & r_{2n} \\ & & \ddots & \vdots \\ 0 & & & r_{nn} \end{pmatrix}, \qquad \hat{R}_{lk} = \begin{cases} r_{lk} & \text{if } l\le k \\ 0 & \text{if } l > k \end{cases} \]
($r_{lk}$ will be determined later).

Let $e_k^{(n)}\in\mathbb{R}^n$ (the $(n)$ is to stress it is in $\mathbb{R}^n$ as opposed to $\mathbb{R}^m$) be the vector with a 1 in the $k$th row and zeros elsewhere:
\[ \left(e_k^{(n)}\right)_j = \delta_{jk} \quad j,k = 1\to n. \]
For $B\in\mathbb{R}^{m\times n}$,
\[ B\,e_k^{(n)} = (b_1, b_2, \dots, b_n)\,e_k^{(n)} = b_k\in\mathbb{R}^m, \quad \text{the } k\text{th column of } B. \]
\[ \hat{Q}\hat{R}\,e_k^{(n)} = \hat{Q}\begin{pmatrix} r_{1k} \\ r_{2k} \\ \vdots \\ r_{kk} \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \sum_{l=1}^k r_{lk}q_l \tag{10} \]
CGS ⇒
\[ a_1 = v_1 = \|v_1\|q_1, \quad \text{where } r_{11} = \|v_1\|, \]
\[ a_k = v_k + \sum_{l=1}^{k-1}\langle a_k,q_l\rangle q_l = \|v_k\|q_k + \sum_{l=1}^{k-1}\langle a_k,q_l\rangle q_l = \sum_{l=1}^k r_{lk}q_l, \]
where $r_{kk} = \|v_k\|$ and $r_{lk} = \langle a_k,q_l\rangle$, $l = 1\to k-1$. Therefore
\[ A\,e_k^{(n)} = a_k = \sum_{l=1}^k r_{lk}q_l \overset{(10)}{=} \hat{Q}\hat{R}\,e_k^{(n)}. \]

$\hat{R}$ is the upper triangular $n\times n$ matrix with coefficients as above. Therefore the columns of $A$ and $\hat{Q}\hat{R}$ are the same, so
\[ A = \hat{Q}\hat{R}, \quad A\in\mathbb{R}^{m\times n},\ \hat{Q}\in\mathbb{R}^{m\times n},\ \hat{R}\in\mathbb{R}^{n\times n}. \]
$\hat{Q}$ has orthonormal columns, whereas $\hat{R}$ is a square, upper triangular matrix whose diagonal entries are the lengths of the $v_k$ (so $r_{kk} = \|v_k\| > 0$, $k = 1\to n$); since the $v$'s are non-trivial, $\hat{R}$ has strictly positive diagonal entries.

Therefore CGS yields a factorisation of $A$:
\[ A = \hat{Q}\hat{R}; \]
if $m > n$: $A$ rectangular, $\hat{Q}$ rectangular, $\hat{R}$ square. With $n\le m$ this is the REDUCED QR FACTORISATION of $A$.

QR Factorisation of $A$:
\[ A = QR, \quad Q\in\mathbb{R}^{m\times m},\ R\in\mathbb{R}^{m\times n}, \]
where
\[ Q = \big[\,\underbrace{\hat{Q}}_{n}\ \underbrace{q_{n+1}\dots q_m}_{m-n}\,\big]\in\mathbb{R}^{m\times m}, \]
with $\{q_j\}_{j=n+1}^m$ chosen so that all columns of $Q$ are orthonormal,
\[ \langle q_i, q_j\rangle = \delta_{ij} \quad i,j = 1\to m, \]
and
\[ R = \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix} \quad (\hat{R}\ n\times n \text{ on top},\ (m-n)\times n \text{ zero block below}). \]
\[ QR = \big[\hat{Q}\ q_{n+1}\dots q_m\big]\begin{bmatrix} \hat{R} \\ 0 \end{bmatrix} = \hat{Q}\hat{R} = A, \qquad A = QR. \]

Note:
\[ Q^TQ = \begin{pmatrix} q_1^T \\ \vdots \\ q_m^T \end{pmatrix}[q_1\dots q_m] \]

\[ \left(Q^TQ\right)_{jk} = q_j^Tq_k = \langle q_j, q_k\rangle = \delta_{jk} \quad j,k = 1\to m, \]
\[ Q^TQ = I^{(m)}\in\mathbb{R}^{m\times m} \quad \text{(identity matrix)}, \]
therefore $Q^T = Q^{-1}$, therefore $Q^TQ = I^{(m)} = QQ^T$. Therefore the columns of $Q$ are orthonormal ⇔ the rows of $Q$ are orthonormal (columns of $Q^T$ orthonormal).

Definition: $Q\in\mathbb{R}^{m\times m}$ is called ORTHOGONAL if $Q^TQ = I^{(m)} = QQ^T$ (orthonormal would be a better name; however, for historical reasons it is named orthogonal).

Definition: if $A\in\mathbb{R}^{m\times n}$ and $A = QR$, where $Q\in\mathbb{R}^{m\times m}$ is orthogonal and $R\in\mathbb{R}^{m\times n}$ is an upper triangular matrix, then we say that we have a QR factorisation of $A$.

Proposition

Orthogonal matrices preserve length and angle: if $Q\in\mathbb{R}^{m\times m}$ and $Q^TQ = I^{(m)}$, then $\forall v,w\in\mathbb{R}^m$
\[ \langle Qv, Qw\rangle = \langle v,w\rangle \quad \text{'angle'} \tag{$\star$} \]
and
\[ \|Qv\| = \|v\| \quad \text{'length'} \tag{$\star\star$} \]

Proof:
\[ \langle Qv, Qw\rangle = (Qv)^TQw = v^TQ^TQw = v^TI^{(m)}w = v^Tw = \langle v,w\rangle, \]
\[ \|Qv\| = [\langle Qv,Qv\rangle]^{1/2} \overset{\text{by the above}}{=} [\langle v,v\rangle]^{1/2} = \|v\|. \]

Geometric vectors: $a\cdot b = |a||b|\cos\theta$

see diagram 20081009.M2AA3.2

One can show, see section 1.4 Cauchy-Schwarz, that
\[ \langle v,w\rangle = \|v\|\|w\|\cos\theta, \]
\[ \langle Qv, Qw\rangle = \|Qv\|\|Qw\|\cos\phi \overset{(\star\star)}{=} \|v\|\|w\|\cos\phi. \]
But
\[ (\star) \Rightarrow \cos\theta = \cos\phi \Rightarrow \theta = \phi \quad \text{as } \theta,\phi\in[0,\pi]. \]

Proposition

$Q_1, Q_2\in\mathbb{R}^{m\times m}$ orthogonal, i.e. $Q_1^TQ_1 = I^{(m)} = Q_2^TQ_2$; then $Q_1Q_2\in\mathbb{R}^{m\times m}$ is orthogonal.

Proof:
\[ (Q_1Q_2)^TQ_1Q_2 = Q_2^TQ_1^TQ_1Q_2 = Q_2^TI^{(m)}Q_2 = Q_2^TQ_2 = I^{(m)}, \]
therefore $Q_1Q_2$ is orthogonal.

Ex: (rotation matrices) $m = 2$:
\[ Q = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} = (q_1, q_2), \]
therefore $\langle q_i,q_j\rangle = \delta_{ij}$, $i,j = 1\to 2$, therefore $Q$ is orthogonal as its columns are orthonormal. $Q$ represents rotation through an angle $\theta$. Write
\[ \begin{pmatrix} x \\ y \end{pmatrix} = l\begin{pmatrix} \cos\phi \\ \sin\phi \end{pmatrix}, \quad l = (x^2+y^2)^{1/2}, \qquad \begin{pmatrix} a \\ b \end{pmatrix} = Q\begin{pmatrix} x \\ y \end{pmatrix}. \]

Transform
\[ \begin{pmatrix} x \\ y \end{pmatrix} \to \begin{pmatrix} l \\ 0 \end{pmatrix}: \]
choose $\theta = -\phi$, therefore
\[ \cos\theta = \cos(-\phi) = \cos\phi = \frac{x}{l}, \qquad \sin\theta = \sin(-\phi) = -\sin\phi = -\frac{y}{l}, \]
therefore
\[ Q = \begin{pmatrix} \frac{x}{l} & \frac{y}{l} \\ -\frac{y}{l} & \frac{x}{l} \end{pmatrix}, \quad \text{where } l = (x^2+y^2)^{1/2}. \]
Therefore the rotation matrix
\[ Q = \frac{1}{(x^2+y^2)^{1/2}}\begin{pmatrix} x & y \\ -y & x \end{pmatrix} \quad \text{(orthogonal)} \]
takes
\[ \begin{pmatrix} x \\ y \end{pmatrix} \to \begin{pmatrix} (x^2+y^2)^{1/2} \\ 0 \end{pmatrix}. \]

For $1\le p < q\le m$ introduce $G_{pq}(\theta)\in\mathbb{R}^{m\times m}$: the identity matrix modified so that the $(p,p)$ and $(q,q)$ entries are $\cos\theta$, the $(p,q)$ entry is $-\sin\theta$, and the $(q,p)$ entry is $\sin\theta$, all other rows and columns being those of $I$:
\[ G_{pq}(\theta) = \begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & \cos\theta & \cdots & -\sin\theta & \\ & & \vdots & \ddots & \vdots & \\ & & \sin\theta & \cdots & \cos\theta & \\ & & & & & \ddots \end{pmatrix} \begin{array}{l} \\ \\ \leftarrow p \\ \\ \leftarrow q \\ \\ \end{array} \]

Equivalently, the $j$th column of $G_{pq}(\theta)$ is $e_j^{(m)}$ if $j\ne p, q$; for $j = p$ it has $\cos\theta$ in the $p$th position and $\sin\theta$ in the $q$th position (zeros elsewhere); for $j = q$ it has $-\sin\theta$ in the $p$th position and $\cos\theta$ in the $q$th position (zeros elsewhere).

Each column of $G_{pq}(\theta)$ has unit length, and the columns are also orthogonal; therefore the columns of $G_{pq}(\theta)$ are orthonormal
\[ \Rightarrow G_{pq}(\theta)\in\mathbb{R}^{m\times m} \text{ is an orthogonal matrix.} \]

For $a\in\mathbb{R}^m$:

\[ G_{pq}(\theta)\,a = b \Rightarrow \begin{cases} b_j = a_j & \text{if } j\ne p, q \\ b_p = \cos\theta\,a_p - \sin\theta\,a_q \\ b_q = \sin\theta\,a_p + \cos\theta\,a_q \end{cases} \]
Similarly $G_{pq}(\theta)A = B$: all rows of $B$ are the same as $A$ except rows $p$ and $q$. The $G_{pq}(\theta)$ are called GIVENS rotation matrices (circa 1950).

Obtain a QR factorisation of $A$ using a sequence of Givens rotations (an alternative procedure: Householder reflections).

Ex: $m = 3$, $n = 2$,
\[ A = \begin{pmatrix} 3 & 65 \\ 4 & 0 \\ 12 & 13 \end{pmatrix}; \]
take a sequence of Givens rotations so that
\[ A \to \begin{pmatrix} \times & \times \\ 0 & \times \\ 0 & 0 \end{pmatrix} = R. \]
Choose $G_{12}(\theta)$ such that
\[ G_{12}(\theta)A = \begin{pmatrix} \times & \times \\ 0 & \times \\ 12 & 13 \end{pmatrix} \quad \text{(last row not affected)}, \qquad G_{12}(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
With $x = 3$, $y = 4$, $l = 5$:
\[ \frac{1}{5}\begin{pmatrix} 3 & 4 \\ -4 & 3 \end{pmatrix}\begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 5 \\ 0 \end{pmatrix}, \]
therefore choose
\[ G_{12}(\theta) = \begin{pmatrix} \frac{3}{5} & \frac{4}{5} & 0 \\ -\frac{4}{5} & \frac{3}{5} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad A^{(1)} = G_{12}(\theta)A = \begin{pmatrix} \frac{3}{5} & \frac{4}{5} & 0 \\ -\frac{4}{5} & \frac{3}{5} & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 3 & 65 \\ 4 & 0 \\ 12 & 13 \end{pmatrix} = \begin{pmatrix} 5 & 39 \\ 0 & -52 \\ 12 & 13 \end{pmatrix}. \]

Use a rotation matrix to obtain a 0 in row 3, column 1: choose either $G_{13}(\phi)$ or $G_{23}(\phi)$? Choose $G_{13}(\phi)$, as $G_{23}(\phi)$ would affect row 2, column 1, which would be counterproductive.
\[ G_{13}(\phi) = \begin{pmatrix} \cos\phi & 0 & -\sin\phi \\ 0 & 1 & 0 \\ \sin\phi & 0 & \cos\phi \end{pmatrix}; \]
choose $\phi$ based on $x = 5$ and $y = 12$ ⇒ $l = 13$:
\[ G_{13}(\phi) = \begin{pmatrix} \frac{5}{13} & 0 & \frac{12}{13} \\ 0 & 1 & 0 \\ -\frac{12}{13} & 0 & \frac{5}{13} \end{pmatrix}, \qquad A^{(2)} = G_{13}(\phi)A^{(1)} = \begin{pmatrix} \frac{5}{13} & 0 & \frac{12}{13} \\ 0 & 1 & 0 \\ -\frac{12}{13} & 0 & \frac{5}{13} \end{pmatrix}\begin{pmatrix} 5 & 39 \\ 0 & -52 \\ 12 & 13 \end{pmatrix} = \begin{pmatrix} 13 & 27 \\ 0 & -52 \\ 0 & -31 \end{pmatrix}. \]
Now use $G_{13}(\psi)$ or $G_{23}(\psi)$? $G_{13}(\psi)$ would mess up the 0 in position (3,1), therefore use $G_{23}(\psi)$: $x = -52$, $y = -31$ ⇒ $l = \sqrt{3665}$,
\[ A^{(3)} = G_{23}(\psi)A^{(2)} = \begin{pmatrix} 13 & 27 \\ 0 & \sqrt{3665} \\ 0 & 0 \end{pmatrix} = R, \quad \text{upper triangular} \]
with strictly positive diagonal entries.

Note: $G_{pq}(\cdot)$ makes the $(q,p)$th element in the current $A$ zero.

Therefore
\[ R = A^{(3)} = G_{23}(\psi)A^{(2)} = G_{23}(\psi)G_{13}(\phi)A^{(1)} = \underbrace{G_{23}(\psi)G_{13}(\phi)G_{12}(\theta)}_{G}\,A. \]
$G$ is a product of Givens rotations; each $G_{pq}(\cdot)$ is orthogonal; therefore $G$ is orthogonal (a product of orthogonal matrices):
\[ G^TG = I = GG^T. \]

Therefore
\[ GA = R \Rightarrow G^TGA = G^TR \Rightarrow A = QR, \quad \text{where } Q = G^T. \]
Note:
\[ Q^TQ = (G^T)^TG^T = GG^T = I, \]
therefore $Q$ is orthogonal, therefore this is a QR Factorisation of $A$.

General $A\in\mathbb{R}^{m\times n}$ with $m\ge n$: apply a sequence of Givens rotations to take $A$ to $R\in\mathbb{R}^{m\times n}$, upper triangular with strictly positive diagonal entries,
\[ GA = R, \quad \text{where } G = \underbrace{G_{nm}\dots G_{n\,n+1}}_{\text{column } n}\ \dots\ \underbrace{G_{2m}\dots G_{23}}_{\text{column } 2}\ \underbrace{G_{1m}\dots G_{12}}_{\text{column } 1}. \]
$G_{pq}$ makes the $(q,p)$th element zero; if $y = 0$ already, then $G_{pq} = I$; $G_{pq}\in\mathbb{R}^{m\times m}$. Let $Q = G^T\in\mathbb{R}^{m\times m}$ ⇒ $Q$ is orthogonal:
\[ GA = R \Rightarrow A = QR \quad \text{(QR factorisation).} \]
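A NumPy sketch of this Givens sweep (function name mine); each rotation uses $c = x/l$, $s = y/l$, $l = (x^2+y^2)^{1/2}$ to zero the $(q,p)$ entry, matching the $3\times 2$ example above.

```python
import numpy as np

def givens_qr(A):
    """QR by Givens rotations: returns Q (m x m), R (m x n) with A = Q R."""
    m, n = A.shape
    R = A.astype(float).copy()
    G = np.eye(m)                      # accumulates the product of rotations
    for p in range(n):                 # column p
        for q in range(p + 1, m):      # zero the (q, p) entry
            x, y = R[p, p], R[q, p]
            l = np.hypot(x, y)
            if l == 0.0:
                continue               # G_pq = I in this case
            c, s = x / l, y / l
            Gpq = np.eye(m)
            Gpq[[p, q], [p, q]] = c    # cos at (p,p) and (q,q)
            Gpq[p, q], Gpq[q, p] = s, -s
            R = Gpq @ R
            G = Gpq @ G
    return G.T, R                      # Q = G^T

A = np.array([[3.0, 65.0], [4.0, 0.0], [12.0, 13.0]])
Q, R = givens_qr(A)
print(np.round(R, 6))                  # [[13, 27], [0, sqrt(3665)], [0, 0]]
print(np.allclose(Q @ R, A))           # True
```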

We might be interested in solving
\[ Ax = b, \quad A\in\mathbb{R}^{m\times n},\ x\in\mathbb{R}^n,\ b\in\mathbb{R}^m \quad (m\ge n). \]
Apply $G\in\mathbb{R}^{m\times m}$ to $Ax = b$:
\[ GAx = Gb \Rightarrow Rx = c\in\mathbb{R}^{m\times 1} \quad \text{(equivalent system to } Ax = b). \]

see diagram 20081028.M2AA3.1

If $m > n$ and if $c_i\ne 0$ for some $i = n+1\to m$, there is no solution to $Rx = c$ (≡ there is no solution to $Ax = b$): an INCONSISTENT SYSTEM (return to this later in the course).

Otherwise, i.e. $c_i = 0$, $i = n+1\to m$:
\[ \exists!\,x\in\mathbb{R}^n \ (!\ = \text{unique}) \text{ such that } Ax = b \ (Rx = c). \]
Solve by backward substitution:
\[ x_n = \frac{c_n}{r_{nn}}, \qquad x_i = \frac{c_i - \sum_{j=i+1}^n r_{ij}x_j}{r_{ii}} \quad i = n-1, n-2, \dots, 2, 1. \]
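A direct transcription of the backward substitution above (a sketch; names are mine):

```python
import numpy as np

def back_substitution(R, c):
    """Solve the upper triangular system R x = c (r_ii != 0), as above."""
    n = R.shape[1]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - R[i, i + 1:n] @ x[i + 1:n]) / R[i, i]
    return x

R = np.array([[2.0, 1.0], [0.0, 3.0]])
print(back_substitution(R, np.array([4.0, 6.0])))   # [1., 2.]
```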

This is all that's needed to do the questions on sheet 1.

1.4 Cauchy-Schwarz Inequality

For geometric vectors in $\mathbb{R}^3$:
\[ a\cdot b = |a||b|\cos\theta \]

see diagram 20081009.M2AA3.2

This generalises to $\mathbb{R}^n$:
\[ \langle a,b\rangle = a^Tb = \|a\|\|b\|\cos\theta \iff |\langle a,b\rangle| = \|a\|\|b\||\cos\theta| \le \|a\|\|b\|. \]

Theorem (Cauchy-Schwarz Inequality)

If you take any vectors $a,b\in\mathbb{R}^n$, then
\[ |\langle a,b\rangle| \le \|a\|\|b\|, \]
with equality if and only if $a$ and $b$ are linearly dependent.

Proof: if $a = 0$, then $\langle a,b\rangle = 0$ and $\|a\| = 0$, so the result is trivial.

If $a\ne 0$, then $q = \frac{a}{\|a\|}$ ⇒ $\langle q,q\rangle = \|q\|^2 = 1$. Let
\[ c = b - \langle b,q\rangle q \Rightarrow \langle c,q\rangle = \langle b - \langle b,q\rangle q,\ q\rangle \overset{(3)}{=} \langle b,q\rangle - \langle b,q\rangle\underbrace{\langle q,q\rangle}_{1} = 0. \]

see diagram 20081028.M2AA3.2

\[ 0 \le \|c\|^2 = \langle c,c\rangle = \langle c, b - \langle b,q\rangle q\rangle \overset{(2)}{=} \langle c,b\rangle - \langle b,q\rangle\langle c,q\rangle = \langle c,b\rangle \overset{(3)}{=} \langle b - \langle b,q\rangle q,\ b\rangle = \langle b,b\rangle - \langle b,q\rangle\langle q,b\rangle = \|b\|^2 - [\langle q,b\rangle]^2 \]
\[ \therefore [\langle q,b\rangle]^2 \le \|b\|^2, \qquad \therefore [\langle a,b\rangle]^2 \le \|a\|^2\|b\|^2; \]
taking the square root ⇒ the desired result.

Equality if and only if $c = 0$, i.e. $b$ a multiple of $q$, i.e. $b$ a multiple of $a$, i.e. $a$ and $b$ lin. dep.

1.5 Gradients and Hessians

$f : \mathbb{R}\to\mathbb{R}$, $f(x)$: one independent variable.

see diagram 20081029.M2AA3.1

Taylor Series:
\[ f(a+h) = f(a) + hf'(a) + \frac{h^2}{2}f''(a) + O(h^3), \]
where the remainder $R$ satisfies $|R|\le Ch^3$; write $O(h^3)$.

We want to generalise this to functions of $n$ independent variables:
\[ f : \mathbb{R}^n\to\mathbb{R}, \quad f(x_1, x_2, \dots, x_n). \]
Write $f(x)$ where $x = (x_1,\dots,x_n)^T\in\mathbb{R}^n$.

Partial derivative of $f$ with respect to $x_i$: write as $\frac{\partial f}{\partial x_i}(x)$ (differentiate $f$ with respect to $x_i$, holding $x_1,\dots,x_{i-1},x_{i+1},\dots,x_n$ as constants).

Ex. $n = 2$, $f(x_1,x_2)$, $x = (x_1,x_2)^T\in\mathbb{R}^2$:
\[ f(x) = \sin x_1\sin x_2, \qquad \frac{\partial f}{\partial x_1}(x) = \cos x_1\sin x_2, \qquad \frac{\partial f}{\partial x_2}(x) = \sin x_1\cos x_2. \]
\[ \frac{\partial^2 f}{\partial x_i\partial x_j} = \frac{\partial}{\partial x_i}\left[\frac{\partial f}{\partial x_j}\right] \overset{(\star)}{=} \frac{\partial}{\partial x_j}\left[\frac{\partial f}{\partial x_i}\right] = \frac{\partial^2 f}{\partial x_j\partial x_i} \quad i,j = 1\to n \]
($\star$: if both derivatives exist and are continuous).

Ex.
\[ \frac{\partial^2 f}{\partial x_2\partial x_1}(x) = \frac{\partial}{\partial x_2}\left[\frac{\partial f}{\partial x_1}(x)\right] = \cos x_1\cos x_2 = \frac{\partial^2 f}{\partial x_1\partial x_2}(x) = \frac{\partial}{\partial x_1}\left[\frac{\partial f}{\partial x_2}(x)\right], \]
\[ \frac{\partial^2 f}{\partial x_1^2} = \frac{\partial}{\partial x_1}\left[\frac{\partial f}{\partial x_1}(x)\right] = -\sin x_1\sin x_2, \qquad \frac{\partial^2 f}{\partial x_2^2} = \frac{\partial}{\partial x_2}\left[\frac{\partial f}{\partial x_2}(x)\right] = -\sin x_1\sin x_2. \]

Chain Rule

($n = 1$) $f : \mathbb{R}\to\mathbb{R}$, $f(x)$. Change variables $t = t(x) \iff x = x(t)$, e.g.
\[ x(t) = t^2 \iff t(x) = x^{1/2}. \]

Let $w(t) = f(x(t))$:
\[ \frac{dw}{dt}(t) = \frac{df}{dx}(x(t))\,\frac{dx}{dt}(t). \]
Extend this to
\[ f : \mathbb{R}^n\to\mathbb{R}, \quad f(x), \quad x = (x_1,\dots,x_n)^T\in\mathbb{R}^n. \]
Example: $x(t) = a + th$ ($a$, $h$ fixed)

see diagram 20081029.M2AA3.2

\[ \Rightarrow x_i(t) = a_i + th_i \quad i = 1\to n. \]
In general, for $f(x)$ and $x(t)$, let $w(t) = f(x(t))$:
\[ \frac{dw}{dt}(t) = \frac{\partial f}{\partial x_1}(x(t))\frac{dx_1}{dt}(t) + \dots + \frac{\partial f}{\partial x_n}(x(t))\frac{dx_n}{dt}(t) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x(t))\frac{dx_i}{dt}(t) \tag{11} \]

Ex. $n = 2$, $f(x_1,x_2)$, $x = (x_1,x_2)^T\in\mathbb{R}^2$:
\[ f(x) = \sin x_1\sin x_2, \qquad x_1(t) = t^2, \quad x_2(t) = \cos t \]
\[ \Rightarrow w(t) = f(x(t)) = \underbrace{\sin t^2}_{u}\,\underbrace{\sin(\cos t)}_{v}, \]
\[ \frac{dw}{dt}(t) = \underbrace{\cos t^2\,\sin(\cos t)\,2t}_{u'v} + \underbrace{\sin t^2\,\cos(\cos t)(-\sin t)}_{uv'} = \frac{\partial f}{\partial x_1}(x(t))\frac{dx_1}{dt}(t) + \frac{\partial f}{\partial x_2}(x(t))\frac{dx_2}{dt}(t).\ \checkmark \]

Going back to the example:
\[ f : \mathbb{R}^n\to\mathbb{R}, \quad f(x), \quad x = (x_1,\dots,x_n)^T\in\mathbb{R}^n, \qquad x(t) = a + th \]

see diagram 20081029.M2AA3.3

\[ \Rightarrow x_i(t) = a_i + th_i \Rightarrow \frac{dx_i}{dt}(t) = h_i \quad i = 1\to n. \]
Let
\[ w(t) = f(x(t)) = f(a + th) \tag{12} \]

see diagram 20081029.M2AA3.4

Taylor series for $w(t)$ ⇒
\[ w(1) = w(0) + 1\cdot w'(0) + \frac{1}{2}\,1^2\,w''(0) + \dots = w(0) + w'(0) + \frac{1}{2}w''(0) + \dots \]
\[ \overset{(11),(12)}{\Longrightarrow} f(a+h) = f(a) + \sum_{i=1}^n \frac{\partial f}{\partial x_i}(a)\,h_i + \dots \tag{13} \]

From (11),
\[ \frac{dw}{dt}(t) = \sum_{i=1}^n h_i\frac{\partial}{\partial x_i}f(x(t)) \qquad \therefore \frac{d}{dt} \equiv \sum_{i=1}^n h_i\frac{\partial}{\partial x_i} \Rightarrow \left(\frac{d}{dt}\right)^m \equiv \left(\sum_{i=1}^n h_i\frac{\partial}{\partial x_i}\right)^m \]
\[ \therefore \frac{d^2w}{dt^2}(t) = \sum_{j=1}^n h_j\frac{\partial}{\partial x_j}\left(\sum_{i=1}^n h_i\frac{\partial}{\partial x_i}\right)f(x(t)) = \sum_{j=1}^n\sum_{i=1}^n h_jh_i\frac{\partial^2 f}{\partial x_j\partial x_i}(x(t)) \]
\[ \Rightarrow w''(0) = \sum_{i=1}^n\sum_{j=1}^n h_ih_j\frac{\partial^2 f}{\partial x_j\partial x_i}(a). \]
Inserting this into (13):
\[ \Longrightarrow f(a+h) = f(a) + \sum_{i=1}^n h_i\frac{\partial f}{\partial x_i}(a) + \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n h_ih_j\frac{\partial^2 f}{\partial x_j\partial x_i}(a) + O\left(\|h\|^3\right) \]
(compare this with the $n = 1$ Taylor series above).

We introduce the GRADIENT of $f$ (grad $f$ - the vector of first order partial derivatives):
\[ \nabla f(x)\in\mathbb{R}^n, \qquad \nabla f(x) = \begin{pmatrix} \frac{\partial f}{\partial x_1}(x) \\ \vdots \\ \frac{\partial f}{\partial x_n}(x) \end{pmatrix}, \quad \text{i.e. } [\nabla f(x)]_i = \frac{\partial f}{\partial x_i}(x) \quad i = 1\to n. \]
Introduce the HESSIAN of $f$ (the matrix of second derivatives):
\[ D^2f(x)\in\mathbb{R}^{n\times n}, \qquad \left[D^2f(x)\right]_{ij} = \frac{\partial^2 f}{\partial x_i\partial x_j}(x) \quad i,j = 1\to n. \]
"Smooth" $f$ ⇒ $D^2f(x)$ is symmetric.

$n = 2$:
\[ D^2f(x) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2}(x) & \frac{\partial^2 f}{\partial x_1\partial x_2}(x) \\ \frac{\partial^2 f}{\partial x_2\partial x_1}(x) & \frac{\partial^2 f}{\partial x_2^2}(x) \end{pmatrix} \]

For $A\in\mathbb{R}^{n\times n}$ and $x\in\mathbb{R}^n$:
\[ [Ax]_i = \sum_{j=1}^n A_{ij}x_j, \qquad x^TAx = x^T(Ax) = \sum_{i=1}^n x_i(Ax)_i = \sum_{i=1}^n\sum_{j=1}^n x_iA_{ij}x_j, \]
\[ \therefore f(a+h) = f(a) + h^T\nabla f(a) + \frac{1}{2}h^TD^2f(a)\,h + O\left(\|h\|^3\right). \]

Ex.
\[ f(x) = x^TAx \quad \forall x\in\mathbb{R}^n, \]
where $A\in\mathbb{R}^{n\times n}$ and is symmetric; $\therefore f : \mathbb{R}^n\to\mathbb{R}$. Find (i) $\nabla f(x)$ (the gradient of $f$), (ii) $D^2f(x)$ (the Hessian of $f$).

(i)
\[ f(x) = x^TAx = \sum_{i=1}^n\sum_{j=1}^n A_{ij}x_ix_j, \]
\[ [\nabla f(x)]_p = \frac{\partial f}{\partial x_p}(x) = \sum_{i=1}^n\sum_{j=1}^n A_{ij}\frac{\partial}{\partial x_p}(x_ix_j), \qquad \frac{\partial}{\partial x_p}(x_ix_j) = \frac{\partial x_i}{\partial x_p}x_j + x_i\frac{\partial x_j}{\partial x_p}; \]
$x_1,\dots,x_n$ are independent variables
\[ \Rightarrow \frac{\partial x_i}{\partial x_p} = \delta_{ip} \quad i,p = 1\to n. \]

\[ \therefore [\nabla f(x)]_p = \sum_{i=1}^n\sum_{j=1}^n A_{ij}(\delta_{ip}x_j + x_i\delta_{jp}) = \sum_{j=1}^n A_{pj}x_j + \underbrace{\sum_{i=1}^n A_{ip}x_i}_{=\sum_i (A^T)_{pi}x_i} = [Ax]_p + \left[A^Tx\right]_p \]
\[ \Rightarrow \nabla f(x) = Ax + A^Tx = 2Ax \quad \text{if } A^T = A. \]

(ii) $\left[D^2f(x)\right]_{qp} = \frac{\partial^2 f}{\partial x_q\partial x_p}(x)$. We know that
\[ \frac{\partial f}{\partial x_p}(x) = \sum_{j=1}^n A_{pj}x_j + \sum_{i=1}^n \left(A^T\right)_{pi}x_i \]
\[ \Rightarrow \frac{\partial^2 f}{\partial x_q\partial x_p}(x) = \sum_{j=1}^n A_{pj}\delta_{jq} + \sum_{i=1}^n \left(A^T\right)_{pi}\delta_{iq} = A_{pq} + \left(A^T\right)_{pq}. \]
(Note: $\delta$ = Kronecker delta, $\frac{\partial f}{\partial x}$ = partial derivative.)
\[ \therefore D^2f(x) = A + A^T = 2A \quad \text{if } A^T = A. \]
\[ \therefore f(x) = x^TAx, \quad f : \mathbb{R}^n\to\mathbb{R}, \qquad \nabla f(x) = 2Ax\in\mathbb{R}^n, \qquad D^2f(x) = 2A\in\mathbb{R}^{n\times n}. \]
This is the analogue of $f(x) = ax^2$, $a\in\mathbb{R}$, $f : \mathbb{R}\to\mathbb{R}$, $f'(x) = 2ax$, $f''(x) = 2a$.
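A quick numerical confirmation of $\nabla f(x) = 2Ax$ and $D^2f(x) = 2A$ by central finite differences (a sketch; the test matrix, point and step are my own choices):

```python
import numpy as np

A = np.array([[2.0, -1.0], [-1.0, 3.0]])            # symmetric test matrix
f = lambda x: x @ A @ x                              # f(x) = x^T A x
x, h = np.array([0.7, -1.2]), 1e-4

grad_fd = np.array([(f(x + h*e) - f(x - h*e)) / (2*h) for e in np.eye(2)])
print(np.allclose(grad_fd, 2*A @ x, atol=1e-6))      # True: grad f = 2Ax

hess_fd = np.array([[(f(x + h*(ei+ej)) - f(x + h*(ei-ej))
                      - f(x - h*(ei-ej)) + f(x - h*(ei+ej))) / (4*h*h)
                     for ej in np.eye(2)] for ei in np.eye(2)])
print(np.allclose(hess_fd, 2*A, atol=1e-6))          # True: D^2 f = 2A
```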

Definition: $f : \mathbb{R}^n\to\mathbb{R}$. $f(x)$ has a LOCAL MINIMUM (MAXIMUM) at $x = a$ if $\forall u\in\mathbb{R}^n$, $\|u\| = 1$, $\exists\varepsilon > 0$ such that
\[ f(a + hu) \ge (\le)\ f(a) \quad \forall h\in[0,\varepsilon]. \]

see diagram 20081030.M2AA3.1

$n = 1$, $f : \mathbb{R}\to\mathbb{R}$:

see diagram 20081030.M2AA3.2

Reminder: Taylor Series
\[ f(a+h) = f(a) + h^T\nabla f(a) + \frac{1}{2}h^TD^2f(a)\,h + O\left(\|h\|^3\right) \tag{14} \]

Proposition

Let $n = 1$. Then $f'(a) = 0$ and $f''(a) > 0$ ($< 0$) are sufficient conditions for $f$ to have a local minimum (maximum) at $x = a$.

Proof: $n = 1$ ⇒ $u = \pm 1$

see diagram 20081104.M2AA3.1

\[ f(a\pm h) = f(a) \pm hf'(a) + \frac{1}{2}(\pm h)^2f''(a) + O(h^3) = f(a) + \underbrace{\frac{1}{2}h^2f''(a)}_{>0\,(<0)} + O(h^3) \quad \text{as } f'(a) = 0 \]
\[ \ge (\le)\ f(a) \text{ for } h \text{ sufficiently small} \Rightarrow x = a \text{ is a local minimum (maximum) of } f. \]

Proposition

If $\nabla f(a)\ne 0$, then $f(x)$ does not have a local minimum or maximum at $x = a$.

Proof: put $h = hu$, $\|u\| = 1$, in (14):
\[ \Rightarrow f(a + hu) = f(a) + hu^T\nabla f(a) + O(h^2) \quad \text{for } h\ge 0. \]
$\nabla f(a)\ne 0$; let
\[ u = \pm\frac{\nabla f(a)}{\|\nabla f(a)\|} \Rightarrow \|u\| = 1, \]
\[ \therefore f\left(a \pm h\frac{\nabla f(a)}{\|\nabla f(a)\|}\right) = f(a) \pm \frac{h}{\|\nabla f(a)\|}\|\nabla f(a)\|^2 + O(h^2) = f(a) \pm h\underbrace{\|\nabla f(a)\|}_{>0} + O(h^2) \]
\[ > (<)\ f(a) \text{ for } h \text{ sufficiently small: no local min or max.} \]
Therefore $\nabla f(a) = 0$ is a necessary condition for $f(x)$ to have a local minimum or maximum at $x = a$. Points $a$ where $\nabla f(a) = 0$ are called stationary points of $f(x)$.

Proposition

If $\nabla f(a) = 0$ and
\[ w^TD^2f(a)\,w > (<)\ 0 \quad \forall w\in\mathbb{R}^n,\ w\ne 0, \]
then $f(x)$ has a local minimum (maximum) at $x = a$.

Proof: $h = hu$, $\|u\| = 1$, in (14):
\[ \Rightarrow f(a+hu) = f(a) + h\underbrace{u^T\nabla f(a)}_{=0 \text{ as } \nabla f(a)=0} + \underbrace{\frac{1}{2}h^2u^TD^2f(a)\,u + O(h^3)}_{\ge(\le)\ 0 \text{ for } h \text{ suff. small if } w^TD^2f(a)w\,>(<)\,0,\ w\ne 0} \]
\[ \ge (\le)\ f(a) \text{ for } h \text{ suff. small} \Rightarrow \text{local min (max).} \]

Ex. $n = 2$, $x = (x_1,x_2)^T\in\mathbb{R}^2$:
\[ f(x) = \left(x_1^2 - 2x_1 + 1\right) + \left(x_2^2 - 2x_2 + 1\right), \quad f : \mathbb{R}^2\to\mathbb{R}. \]

\[ \nabla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(x) \\ \frac{\partial f}{\partial x_2}(x) \end{bmatrix} = \begin{bmatrix} 2x_1 - 2 \\ 2x_2 - 2 \end{bmatrix} = 2\begin{pmatrix} x_1 - 1 \\ x_2 - 1 \end{pmatrix}\in\mathbb{R}^2, \]
\[ \nabla f(a) = 0 \iff a = \begin{pmatrix} 1 \\ 1 \end{pmatrix}: \quad \text{only one stationary point.} \]
Find
\[ D^2f(x) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2}(x) & \frac{\partial^2 f}{\partial x_1\partial x_2}(x) \\ \frac{\partial^2 f}{\partial x_2\partial x_1}(x) & \frac{\partial^2 f}{\partial x_2^2}(x) \end{pmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} = 2I\in\mathbb{R}^{2\times 2}, \]
\[ w^TD^2f(a)\,w = 2w^Tw = 2\|w\|^2 > 0. \]
Therefore $a = (1,1)^T$ is a local minimum (also a global minimum, as it's the only stationary point). [Obvious, as $f(x) = (x_1-1)^2 + (x_2-1)^2$.]

Definition

$A\in\mathbb{R}^{n\times n}$ is called positive definite if
\[ x^TAx > 0 \quad \forall x\in\mathbb{R}^n,\ x\ne 0, \]
or negative definite if
\[ x^TAx < 0 \quad \forall x\in\mathbb{R}^n,\ x\ne 0, \]
or non-negative definite if
\[ x^TAx \ge 0 \quad \forall x\in\mathbb{R}^n, \]
or non-positive definite if
\[ x^TAx \le 0 \quad \forall x\in\mathbb{R}^n. \]

Example: $n = 2$,
\[ A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \quad x = (x_1,x_2)^T\in\mathbb{R}^2, \]
\[ x^TAx = \sum_{i=1}^2\sum_{j=1}^2 A_{ij}x_ix_j = x_1^2 + x_2^2 - 2x_1x_2 = (x_1 - x_2)^2 \ge 0 \quad \forall x\in\mathbb{R}^2. \]
Therefore $A$ is non-negative definite but not positive definite: e.g. $a = (1,1)^T$ gives $a^TAa = 0$.

Using these definitions, we can rewrite the above proposition.

Proposition

If $\nabla f(a) = 0$ and $D^2f(a)$ is positive (negative) definite, then $f(x)$ has a local minimum (maximum) at $x = a$.

1.6 Inner Products Revisited and Positive Definite Matrices

Let $A\in\mathbb{R}^{n\times n}$ be symmetric ($A^T = A$) and positive definite ($x^TAx > 0$ $\forall x\in\mathbb{R}^n$, $x\ne 0$). Generalise the idea of an inner product by defining
\[ \langle u,v\rangle_A = u^TAv \quad \forall u,v\in\mathbb{R}^n \]
(previously $\langle u,v\rangle \equiv \langle u,v\rangle_I = u^TIv = u^Tv$).

Make sure the properties of the inner product still hold with this new definition, $\langle\cdot,\cdot\rangle_A : \mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$:
\[ \langle v,u\rangle_A = v^TAu = \left(v^TAu\right)^T = u^TA^Tv \overset{A^T=A}{=} u^TAv = \langle u,v\rangle_A, \]
\[ \therefore \langle v,u\rangle_A = \langle u,v\rangle_A \quad \forall u,v\in\mathbb{R}^n \quad \text{(symmetric).} \]
Easy to show that
\[ \left.\begin{array}{l} \langle u, \alpha v + \beta w\rangle_A = \alpha\langle u,v\rangle_A + \beta\langle u,w\rangle_A \\ \langle\alpha u + \beta v, w\rangle_A = \alpha\langle u,w\rangle_A + \beta\langle v,w\rangle_A \end{array}\right\} \quad \forall u,v,w\in\mathbb{R}^n,\ \forall\alpha,\beta\in\mathbb{R}. \]
Introduce the idea of a generalised norm (length) by defining
\[ \|u\|_A = [\langle u,u\rangle_A]^{1/2} \quad \forall u\in\mathbb{R}^n. \]
Note:
\[ \langle u,u\rangle_A = u^TAu > 0 \text{ if } u\ne 0, \quad = 0 \text{ if and only if } u = 0, \]
\[ \therefore \|u\|_A \ge 0 \ \forall u\in\mathbb{R}^n, \qquad \|u\|_A = 0 \text{ if and only if } u = 0. \]

A key property of positive definite matrices is that they are invertible, i.e.
\[ Ax = 0 \Rightarrow x^TAx = 0 \Rightarrow x = 0 \]
⇒ the columns of $A$ are linearly independent ⇒ $A^{-1}$ exists.

Theorem (Generalised Cauchy-Schwarz Inequality)

If $A\in\mathbb{R}^{n\times n}$ is symmetric positive definite, then
\[ |\langle u,v\rangle_A| \le \|u\|_A\|v\|_A \quad \forall u,v\in\mathbb{R}^n, \]
with equality if and only if $u$ and $v$ are linearly dependent.

Proof: simply replace $\langle\cdot,\cdot\rangle$ by $\langle\cdot,\cdot\rangle_A$ and $\|\cdot\|$ by $\|\cdot\|_A$ in the original proof.

It is easy to generate symmetric positive definite matrices. Given $P\in\mathbb{R}^{n\times n}$ which is invertible, i.e. $P^{-1}$ exists, then $A = P^TP$ is symmetric positive definite. Check:
\[ A\in\mathbb{R}^{n\times n}\ \checkmark, \qquad A^T = \left(P^TP\right)^T = P^T\left(P^T\right)^T = P^TP = A\ \checkmark, \]
\[ \text{any } x\in\mathbb{R}^n: \quad x^TAx = x^TP^TPx = (Px)^T(Px) = \|Px\|^2 \ge 0. \]
Note $\|Px\| = 0 \iff Px = 0 \iff x = 0$, as $P^{-1}$ exists. Therefore $A$ is positive definite.

We now prove the reverse implication.

Theorem

Let $A\in\mathbb{R}^{n\times n}$ be any symmetric positive definite matrix. Then $\exists$ an invertible $P\in\mathbb{R}^{n\times n}$ such that $A = P^TP$. Furthermore, we can choose $P$ to be upper triangular with $P_{ii} > 0$, $i = 1\to n$ (diagonal entries strictly positive), in which case we say that $A = P^TP$ is a Cholesky Decomposition/Factorisation of $A$.

Proof:

Let $\{v_i\}_{i=1}^n$ be any $n$ linearly independent vectors in $\mathbb{R}^n$. Using the inner product $\langle\cdot,\cdot\rangle_A$ induced by $A$, $\langle a,b\rangle_A = a^TAb$, we apply the Classical Gram-Schmidt process to $\{v_i\}_{i=1}^n$:
\[ u_1 = \frac{v_1}{\|v_1\|_A} \Rightarrow \|u_1\|_A = 1, \]
\[ w_i = v_i - \sum_{j=1}^{i-1}\langle v_i, u_j\rangle_A\,u_j \Rightarrow \langle w_i, u_j\rangle_A = 0 \quad j = 1\to i-1,\ i = 2\to n, \]
\[ u_i = \frac{w_i}{\|w_i\|_A} \Rightarrow \|u_i\|_A = 1 \quad i = 2\to n \]

⇒ $u_i$ is a linear combination of $\{v_j\}_{j=1}^i$, $i = 1\to n$, and
\[ \langle u_i, u_j\rangle_A = \delta_{ij} \quad i,j = 1\to n. \]
Let
\[ U = [u_1, u_2, \dots, u_n]\in\mathbb{R}^{n\times n} \Rightarrow AU = [Au_1, Au_2, \dots, Au_n]\in\mathbb{R}^{n\times n}, \]
\[ \left[U^TAU\right]_{ij} = \underbrace{u_i^T}_{i\text{th row of } U^T}\cdot\underbrace{Au_j}_{j\text{th col. of } AU} = \langle u_i,u_j\rangle_A = \delta_{ij} \quad i,j = 1\to n, \]
\[ \therefore U^TAU = I^{(n)}, \]
\[ \therefore U^{-1} = U^TA \text{ exists} \Rightarrow \left(U^{-1}\right)^T = \left(U^TA\right)^T = A^TU = AU. \]
Let
\[ P = U^{-1}\in\mathbb{R}^{n\times n} \quad (P^{-1} = U \text{ exists}): \qquad P^TP = \left(U^{-1}\right)^TU^{-1} = AUU^{-1} = A. \]

To show that we can choose $P$ upper triangular with $P_{ii} > 0$, $i = 1\to n$, we choose particular $\{v_i\}_{i=1}^n$: let $v_i = e_i^{(n)}$, the standard basis vector with a 1 in the $i$th position,
\[ \left(e_i^{(n)}\right)_j = \delta_{ij} \quad i,j = 1\to n. \]
Then $u_1$ is a multiple of $e_1^{(n)}$ (only its first entry is non-zero), and $u_i$ is a linear combination of $\left\{e_j^{(n)}\right\}_{j=1}^i$, $i = 2\to n$:
\[ \therefore u_i\in\mathbb{R}^n \text{ with } (u_i)_k = 0 \text{ if } k > i,\ i = 1\to n \]
\[ \Rightarrow U = [u_1, u_2, \dots, u_n]\in\mathbb{R}^{n\times n} \text{ upper triangular.} \]

We now show that $(u_i)_i > 0$, $i = 1\to n$:
\[ u_1 = \frac{e_1^{(n)}}{\left\|e_1^{(n)}\right\|_A} \quad \text{(first entry strictly positive, zeros below)}, \]
\[ u_i = \frac{w_i}{\|w_i\|_A} \quad \therefore (u_i)_i > 0 \text{ if and only if } (w_i)_i > 0,\ i = 2\to n, \]
\[ w_i = e_i^{(n)} - \sum_{j=1}^{i-1}\left\langle e_i^{(n)}, u_j\right\rangle_A u_j, \qquad (u_j)_k = 0 \text{ if } k > j,\ j = 1\to n \]
\[ \Rightarrow (w_i)_i = \left(e_i^{(n)}\right)_i = 1 > 0. \]
Therefore $U\in\mathbb{R}^{n\times n}$ is upper triangular with $U_{ii} = (u_i)_i > 0$, $i = 1\to n$. Find
\[ P = U^{-1} = [p_1, p_2, \dots, p_n], \qquad UP = I^{(n)} = \left[e_1^{(n)}, e_2^{(n)}, \dots, e_n^{(n)}\right], \]
i.e. $[Up_1, Up_2, Up_3, \dots] = \left[e_1^{(n)}, e_2^{(n)}, \dots\right]$, i.e.
\[ Up_i = e_i^{(n)} \quad i = 1\to n. \tag{15} \]
Solve by backwards substitution: $(p_i)_n = (p_i)_{n-1} = \dots = (p_i)_{i+1} = 0$, i.e.
\[ (p_i)_k = 0 \text{ for } k > i,\ i = 1\to n. \]
The $i$th row of (15):
\[ U_{ii}(p_i)_i + \underbrace{\sum_{k=i+1}^n U_{ik}(p_i)_k}_{=0 \text{ as } (p_i)_k = 0 \text{ for } k>i} = \left(e_i^{(n)}\right)_i = 1, \]
\[ \therefore (p_i)_i = \frac{1}{U_{ii}} > 0 \quad i = 1\to n, \]
therefore $P$ is upper triangular with $P_{ii} = (p_i)_i > 0$, $i = 1\to n$.

Proposition

$A\in\mathbb{R}^{n\times n}$ symmetric positive definite ⇒ $A_{kk} > 0$, $k = 1\to n$, and
\[ |A_{jk}| < (A_{jj})^{1/2}(A_{kk})^{1/2} \quad j,k = 1\to n,\ j\ne k. \]

Proof: from the above theorem $A = P^TP$, $P\in\mathbb{R}^{n\times n}$, $P^{-1}$ exists. Let
\[ P = [p_1, p_2, \dots, p_n], \quad p_i\in\mathbb{R}^n; \quad P^{-1} \text{ exists} \Rightarrow \{p_i\}_{i=1}^n \text{ lin. indep.} \]
\[ A = P^TP = \begin{pmatrix} p_1^T \\ p_2^T \\ \vdots \\ p_n^T \end{pmatrix}[p_1\ p_2\ \dots\ p_n] \qquad \therefore A_{jk} = p_j^Tp_k \quad j,k = 1\to n, \]
\[ \therefore A_{kk} = p_k^Tp_k = \|p_k\|^2 > 0 \quad k = 1\to n, \text{ as } p_k\ne 0, \]
\[ |A_{jk}| = \left|p_j^Tp_k\right| = |\langle p_j,p_k\rangle| \underset{\text{Cauchy-Schwarz inequality}}{<} \|p_j\|\|p_k\| \quad j,k = 1\to n,\ j\ne k \]
(it is a strict inequality, as $\{p_i\}_{i=1}^n$ lin. ind.). Using the result $\|p_k\| = (A_{kk})^{1/2}$, $k = 1\to n$:
\[ \Rightarrow |A_{jk}| < (A_{jj})^{1/2}(A_{kk})^{1/2} \quad j,k = 1\to n,\ j\ne k. \]

Compute a Cholesky Decomposition of $A$: let $L = P^T$, i.e. find a lower triangular matrix $L\in\mathbb{R}^{n\times n}$ with $L_{ii} > 0$, $i = 1\to n$, such that $A = LL^T$. One could compute $L = P^T$, where $P = U^{-1}$ and $U = [u_1\dots u_n]$ with
\[ \left\{e_i^{(n)}\right\}_{i=1}^n \overset{\text{CGS},\ \langle\cdot,\cdot\rangle_A}{\longrightarrow} \{u_i\}_{i=1}^n; \]
there is, however, an easier way.

Let $L = [l_1\ l_2\ \dots\ l_n]$, $l_i\in\mathbb{R}^n$ (lower triangular and $L_{ii} > 0$, $i = 1\to n$):
\[ L = \begin{pmatrix} \times & & 0 \\ \vdots & \ddots & \\ \times & \cdots & \times \end{pmatrix}, \qquad A = LL^T, \]
\[ A_{ij} = \sum_{k=1}^n L_{ik}\left(L^T\right)_{kj} = \sum_{k=1}^n L_{ik}L_{jk} = \sum_{k=1}^n (l_k)_i(l_k)_j. \]

Note:
\[ l_kl_k^T\in\mathbb{R}^{n\times n} \Rightarrow \left(l_kl_k^T\right)_{ij} = (l_k)_i(l_k)_j, \qquad \therefore A_{ij} = \sum_{k=1}^n \left(l_kl_k^T\right)_{ij}, \qquad A = \sum_{k=1}^n l_kl_k^T. \]

Example

$n = 3$. Find the Cholesky Decomposition of
\[ A = \begin{pmatrix} 2 & -1 & 0 \\ -1 & \frac{5}{2} & -1 \\ 0 & -1 & \frac{5}{2} \end{pmatrix}, \]
i.e. find $L\in\mathbb{R}^{3\times 3}$ lower triangular, $L_{ii} > 0$, $i = 1\to 3$, such that $A = LL^T$.

Is $A$ symmetric positive definite? Clearly $A^T = A$,
\[ A_{kk} > 0 \quad k = 1\to 3, \qquad |A_{12}| = |-1| = 1 < \sqrt{2}\sqrt{\tfrac{5}{2}} = \sqrt{5} = (A_{11})^{1/2}(A_{22})^{1/2}, \text{ etc.} \]
The above conditions are necessary, not sufficient, so check directly that $A$ is positive definite:
\[ x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \ne 0 \Rightarrow x^TAx = \sum_{i=1}^3\sum_{j=1}^3 A_{ij}x_ix_j = 2x_1^2 + \frac{5}{2}x_2^2 + \frac{5}{2}x_3^2 \underbrace{-2x_1x_2 - 2x_2x_3}_{2(\underset{r}{-x_1})(\underset{s}{x_2}) + 2(\underset{r}{-x_2})(\underset{s}{x_3})} \]

Note:
\[ (r+s)^2 = r^2 + 2rs + s^2 \ge 0 \Rightarrow rs \ge -\frac{1}{2}\left(r^2 + s^2\right) \quad \forall r,s\in\mathbb{R}. \tag{16} \]
Applying (16) to each underbraced term:
\[ x^TAx \ge 2x_1^2 + \frac{5}{2}x_2^2 + \frac{5}{2}x_3^2 - \left(x_1^2 + x_2^2\right) - \left(x_2^2 + x_3^2\right) = x_1^2 + \frac{1}{2}x_2^2 + \frac{3}{2}x_3^2 \ge \frac{1}{2}\sum_{i=1}^3 x_i^2 = \frac{1}{2}\|x\|^2 > 0 \quad \forall x\in\mathbb{R}^3,\ x\ne 0. \]

Recap from the exercise: $n = 3$,
\[ A = \begin{pmatrix} 2 & -1 & 0 \\ -1 & \frac{5}{2} & -1 \\ 0 & -1 & \frac{5}{2} \end{pmatrix} \quad \text{symmetric, positive definite;} \]
$L = [l_1\ l_2\ l_3]$ lower triangular,
\[ A = l_1l_1^T + l_2l_2^T + l_3l_3^T; \quad \text{find } L. \]
\[ l_1 = \begin{pmatrix} \times \\ \times \\ \times \end{pmatrix}, \quad l_2 = \begin{pmatrix} 0 \\ \times \\ \times \end{pmatrix}, \quad l_3 = \begin{pmatrix} 0 \\ 0 \\ \times \end{pmatrix}, \]
\[ A = l_1l_1^T + l_2l_2^T + l_3l_3^T \tag{17} \]
\[ = \underbrace{\begin{pmatrix} \times & \times & \times \\ \times & \times & \times \\ \times & \times & \times \end{pmatrix}}_{\text{symmetric}} + \underbrace{\begin{pmatrix} 0 & 0 & 0 \\ 0 & \times & \times \\ 0 & \times & \times \end{pmatrix}}_{\text{symmetric}} + \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \times \end{pmatrix}. \]
Therefore the first column/row of $A$ is generated by $l_1$. Equate first columns of $l_1l_1^T$ and $A$ (both $3\times 3$):
\[ \begin{pmatrix} (l_1)_1(l_1)_1 \\ (l_1)_2(l_1)_1 \\ (l_1)_3(l_1)_1 \end{pmatrix} = \begin{pmatrix} A_{11} \\ A_{21} \\ A_{31} \end{pmatrix} \Rightarrow (l_1)_i = \frac{A_{i1}}{(l_1)_1} \quad i = 1\to 3, \]

but $[(l_1)_1]^2 = A_{11}$,
\[ \therefore (l_1)_i = \frac{A_{i1}}{\sqrt{A_{11}}} \quad i = 1\to 3; \]
so in this example
\[ l_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix}. \]
Let
\[ A^{(1)} = A - l_1l_1^T = \begin{pmatrix} 2 & -1 & 0 \\ -1 & \frac{5}{2} & -1 \\ 0 & -1 & \frac{5}{2} \end{pmatrix} - \frac{1}{2}\begin{pmatrix} 4 & -2 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} \Rightarrow A^{(1)} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & \frac{5}{2} \end{pmatrix} = l_2l_2^T + l_3l_3^T \text{ by (17).} \]
The 2nd column/row of $A^{(1)}$ is generated by $l_2$:
\[ (l_2)_i = \frac{A^{(1)}_{i2}}{\sqrt{A^{(1)}_{22}}} \Rightarrow l_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ 2 \\ -1 \end{pmatrix}, \]
\[ A^{(2)} = A^{(1)} - l_2l_2^T = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & \frac{5}{2} \end{pmatrix} - \frac{1}{2}\begin{pmatrix} 0 & 0 & 0 \\ 0 & 4 & -2 \\ 0 & -2 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix} = l_3l_3^T \text{ by (17),} \]
\[ \therefore l_3 = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix}, \qquad \therefore L = [l_1\ l_2\ l_3] = \frac{1}{\sqrt{2}}\begin{pmatrix} 2 & 0 & 0 \\ -1 & 2 & 0 \\ 0 & -1 & 2 \end{pmatrix}, \]
lower triangular, $L_{ii} > 0$, $i = 1\to 3$. Check $A = LL^T$ ✓
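A NumPy sketch of the column-by-column construction just carried out ($l_k$ = $k$th column of the current $A^{(k)}$ over $\sqrt{\text{pivot}}$, then peel off $l_kl_k^T$); function name mine. It reproduces the $L$ above for the $3\times 3$ example.

```python
import numpy as np

def cholesky_rank1(A):
    """Cholesky A = L L^T by peeling off rank-1 terms l_k l_k^T, as in the notes."""
    A = A.astype(float).copy()
    n = A.shape[0]
    L = np.zeros((n, n))
    for k in range(n):
        L[:, k] = A[:, k] / np.sqrt(A[k, k])   # (l_k)_i = A^{(k)}_{ik} / sqrt(A^{(k)}_{kk})
        A -= np.outer(L[:, k], L[:, k])        # A^{(k+1)} = A^{(k)} - l_k l_k^T
    return L

A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.5, -1.0], [0.0, -1.0, 2.5]])
L = cholesky_rank1(A)
print(L * np.sqrt(2))              # [[2,0,0],[-1,2,0],[0,-1,2]]
print(np.allclose(L @ L.T, A))     # True
```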

Now consider the above constructive algorithm in the general case, i.e. $A\in\mathbb{R}^{n\times n}$ symmetric positive definite. Since $A_{11} > 0$, we can start the algorithm:
\[ l_1 = \frac{1}{\sqrt{A_{11}}}\begin{pmatrix} A_{11} \\ A_{21} \\ \vdots \\ A_{n1} \end{pmatrix}. \]
Let $A^{(1)} = A - l_1l_1^T\in\mathbb{R}^{n\times n}$ (symmetric, as $A$ and $l_1l_1^T$ are both symmetric), which has the structure
\[ A^{(1)} = \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & & \\ 0 & & B \end{pmatrix}, \quad \text{where } B\in\mathbb{R}^{(n-1)\times(n-1)} \text{ is symmetric.} \]
To continue we need
\[ A^{(1)}_{22} = B_{11} > 0. \]
We will now prove that $B$ is positive definite ⇒ $B_{kk} > 0$, $k = 1\to n-1$.

To do this, we note that
\[ e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}\in\mathbb{R}^n \Rightarrow Ae_1 = \begin{pmatrix} A_{11} \\ A_{21} \\ \vdots \\ A_{n1} \end{pmatrix} \ \text{(first column of } A), \qquad e_1^TAe_1 = A_{11}, \]
\[ \therefore l_1 = \frac{1}{\sqrt{e_1^TAe_1}}\,Ae_1. \]

Theorem

$B\in\mathbb{R}^{(n-1)\times(n-1)}$, as defined above, is positive definite.

Proof: we need to show that $u^TBu > 0$ $\forall u\in\mathbb{R}^{n-1}$, $u\ne 0$. Given $u\in\mathbb{R}^{n-1}$, $u\ne 0$, let $v = \binom{0}{u}\in\mathbb{R}^n$, so $v\ne 0$:
\[ e_1^Tv = 0; \quad e_1, v\ne 0 \Rightarrow e_1, v \text{ lin. ind.} \]

\[ u^TBu = v^TA^{(1)}v = v^T\left(A - l_1l_1^T\right)v = v^TAv - \underbrace{v^Tl_1}_{\left(l_1^Tv\right)^T}\,l_1^Tv = v^TAv - \left(l_1^Tv\right)^2, \]
\[ l_1 = \frac{1}{\sqrt{e_1^TAe_1}}\,Ae_1 \Rightarrow l_1^T = \frac{1}{\sqrt{e_1^TAe_1}}\,e_1^TA \quad (A^T = A), \]
\[ \therefore u^TBu = v^TAv - \frac{\left(e_1^TAv\right)^2}{e_1^TAe_1} = \langle v,v\rangle_A - \frac{[\langle e_1,v\rangle_A]^2}{\langle e_1,e_1\rangle_A} = \frac{\|v\|_A^2\|e_1\|_A^2 - [\langle e_1,v\rangle_A]^2}{\|e_1\|_A^2}. \]
Apply the Cauchy-Schwarz inequality:
\[ |\langle e_1,v\rangle_A| \underset{e_1,v \text{ lin. ind.}}{<} \|e_1\|_A\|v\|_A \Rightarrow u^TBu > 0 \quad \forall u\in\mathbb{R}^{n-1},\ u\ne 0, \]
therefore $B$ is positive definite, and
\[ B = B^T \Rightarrow B_{kk} > 0 \quad k = 1\to n-1, \]
\[ \therefore A^{(1)}_{22} = B_{11} > 0 \Rightarrow \text{the Cholesky Decomposition can continue, etc.} \]

Application of the Cholesky Decomposition

Given $A\in\mathbb{R}^{n\times n}$ symmetric positive definite: if we find the Cholesky Decomposition of $A$, i.e. $A = LL^T$, $L\in\mathbb{R}^{n\times n}$ lower triangular with $L_{ii} > 0$, $i = 1\to n$, then it is easy to solve $Ax = b$ for a given $b\in\mathbb{R}^n$:
\[ Ax = b \iff L\underbrace{L^Tx}_{z} = b. \]

\[ \therefore \underbrace{L}_{\text{lower triang.}}z = b \quad \text{and} \quad \underbrace{L^T}_{\text{upper triang.}}x = z. \]
Solve for $z$ by a forward solve:
\[ z_1 = \frac{b_1}{L_{11}}, \qquad z_k = \frac{b_k - \sum_{j=1}^{k-1}L_{kj}z_j}{L_{kk}} \quad k = 2\to n. \]
Solve for $x$ by a back solve:
\[ x_n = \frac{z_n}{\left(L^T\right)_{nn}} = \frac{z_n}{L_{nn}}, \qquad x_k = \frac{z_k - \sum_{j=k+1}^n \overbrace{\left(L^T\right)_{kj}}^{L_{jk}}x_j}{\underbrace{\left(L^T\right)_{kk}}_{L_{kk}}} \quad k = (n-1)\to 1. \]
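The forward/back solve pair above, transcribed into NumPy (a sketch; names mine):

```python
import numpy as np

def cholesky_solve(L, b):
    """Solve A x = b given A = L L^T: forward solve L z = b, then back solve L^T x = z."""
    n = len(b)
    z = np.zeros(n)
    for k in range(n):                               # forward solve
        z[k] = (b[k] - L[k, :k] @ z[:k]) / L[k, k]
    x = np.zeros(n)
    for k in range(n - 1, -1, -1):                   # back solve; (L^T)_{kj} = L_{jk}
        x[k] = (z[k] - L[k + 1:, k] @ x[k + 1:]) / L[k, k]
    return x

A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.5, -1.0], [0.0, -1.0, 2.5]])
L = np.linalg.cholesky(A)                            # or cholesky_rank1 above
b = np.array([1.0, 0.0, 1.0])
print(np.allclose(A @ cholesky_solve(L, b), b))      # True
```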

1.7 Least Squares Problems

Given $A\in\mathbb{R}^{m\times n}$ ($m\ge n$), $b\in\mathbb{R}^m$: find $x\in\mathbb{R}^n$ such that
\[ Ax = b \quad (m \text{ equations},\ n \text{ unknowns}). \]
If $m > n$, then generally there is no solution $x$ to $Ax = b$, so find an approximate solution in some sense: find $x^*\in\mathbb{R}^n$ such that $Ax^* - b$ is "small". Make this precise.

Example

Pendulum

see diagram 20081113.M2AA3.1

Length $l$, period $T$:
\[ \sqrt{\frac{g}{l}}\,T = 2\pi; \]
estimate $g$ (acceleration due to gravity) from the above relationship.

Let $L = \sqrt{l}$ and $c = \frac{2\pi}{\sqrt{g}}$ ⇒ $Lc = T$. Do $m$ experiments:
\[ \underset{m\times 1}{\mathbf{L}}\ \underset{1\times 1}{c} = \underset{m\times 1}{\mathbf{T}}, \qquad \mathbf{L} = \begin{pmatrix} L_1 \\ \vdots \\ L_m \end{pmatrix}, \quad \mathbf{T} = \begin{pmatrix} T_1 \\ \vdots \\ T_m \end{pmatrix}. \]

see diagram 20081113.M2AA3.2

Fit a straight line through the origin to the dots: choose $c\in\mathbb{R}$ to minimise the sum of the squares of the errors,
\[ \min_{c\in\mathbb{R}}\sum_{i=1}^m (T_i - cL_i)^2 = \min_{c\in\mathbb{R}}\|\mathbf{T} - c\mathbf{L}\|^2. \]
Let
\[ S(c) = \|\mathbf{T} - c\mathbf{L}\|^2 = \langle\mathbf{T} - c\mathbf{L},\ \mathbf{T} - c\mathbf{L}\rangle = \langle\mathbf{T},\mathbf{T}\rangle - 2c\langle\mathbf{L},\mathbf{T}\rangle + c^2\langle\mathbf{L},\mathbf{L}\rangle, \]
\[ \therefore S(c) = \|\mathbf{T}\|^2 - 2c\langle\mathbf{L},\mathbf{T}\rangle + c^2\|\mathbf{L}\|^2, \]
\[ \frac{dS}{dc}(c) = -2\langle\mathbf{L},\mathbf{T}\rangle + 2c\|\mathbf{L}\|^2, \qquad \frac{d^2S}{dc^2}(c) = 2\|\mathbf{L}\|^2 > 0, \]
\[ \frac{dS}{dc}(c^*) = 0 \iff c^* = \frac{\langle\mathbf{L},\mathbf{T}\rangle}{\|\mathbf{L}\|^2}, \qquad \therefore S(c^*) \le S(c) \quad \forall c\in\mathbb{R}: \]
$c^*$ is the global minimum of $S(c)$.

Note: $c^*\in\mathbb{R}$ is such that
\[ -\langle\mathbf{L},\mathbf{T}\rangle + c^*\langle\mathbf{L},\mathbf{L}\rangle = 0 \Rightarrow \langle\mathbf{T} - c^*\mathbf{L},\ \mathbf{L}\rangle = 0. \]
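A sketch of the pendulum fit with made-up measurements (the data values below are hypothetical, chosen only to illustrate $c^* = \langle\mathbf{L},\mathbf{T}\rangle/\|\mathbf{L}\|^2$):

```python
import numpy as np

# Hypothetical measurements: lengths l_i (m) and periods T_i (s)
l = np.array([0.25, 0.50, 0.75, 1.00])
T = np.array([1.02, 1.41, 1.75, 2.01])

L = np.sqrt(l)
c_star = (L @ T) / (L @ L)          # c* = <L,T> / ||L||^2
g = (2 * np.pi / c_star) ** 2        # since c = 2*pi / sqrt(g)
print(c_star, g)                     # slope and the resulting estimate of g
```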

see diagram 20081113.M2AA3.3

Generalise to
\[ Ax = b, \quad A\in\mathbb{R}^{m\times n},\ x\in\mathbb{R}^n,\ b\in\mathbb{R}^m; \]
for $m > n$ there is generally no solution $x$, as we have an overdetermined system. Find $x^*\in\mathbb{R}^n$ such that
\[ \|Ax^* - b\| \le \|Ax - b\| \quad \forall x\in\mathbb{R}^n, \]
i.e.
\[ \min_{x\in\mathbb{R}^n}\|Ax - b\| \equiv \min_{x\in\mathbb{R}^n}\|Ax - b\|^2. \]
Let $Q(x) = \|Ax - b\|^2$, $Q : \mathbb{R}^n\to\mathbb{R}$; minimise $Q(x)$ (Q1, Sheet 2).

Recall: given $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^m$ ($m\ge n$); for $m > n$, in general, no solution $x\in\mathbb{R}^n$ to $Ax = b$. Find an approximate solution $x^*\in\mathbb{R}^n$ such that
\[ Q(x^*) \le Q(x) = \|Ax - b\|^2 \quad \forall x\in\mathbb{R}^n, \qquad Q : \mathbb{R}^n\to\mathbb{R}. \]
\[ Q(x) = \|Ax-b\|^2 = \langle Ax-b,\ Ax-b\rangle = (Ax-b)^T(Ax-b) = \left(x^TA^T - b^T\right)(Ax-b) = x^TA^TAx - b^TAx - x^TA^Tb + b^Tb \]
\[ = x^TA^TAx - 2b^TAx + b^Tb \quad \text{since } x^TA^Tb = \left(x^TA^Tb\right)^T = b^TAx\in\mathbb{R} \]
\[ = x^TGx - 2\mu^Tx + \|b\|^2, \]
where $G = A^TA\in\mathbb{R}^{n\times n}$ and $\mu = A^Tb\in\mathbb{R}^n$. Note
\[ G^T = \left(A^TA\right)^T = A^TA = G \Rightarrow G\in\mathbb{R}^{n\times n} \text{ symmetric}, \qquad \mu = A^Tb \Rightarrow \mu^T = b^TA. \]
Recall Q1 on Sheet 2 ⇒
\[ \nabla Q(x) = 2(Gx - \mu), \qquad D^2Q(x) = 2G. \]

Theorem

Let $A\in\mathbb{R}^{m\times n}$ ($m\ge n$) have $n$ linearly independent columns, and $b\in\mathbb{R}^m$. Then $A^TA\in\mathbb{R}^{n\times n}$ is symmetric positive definite. Moreover, $\exists$ a unique $x^*\in\mathbb{R}^n$ such that $A^TAx^* = A^Tb$ [the Normal Equations of $Ax = b$], and $x^*$ is the global minimum of $Q(x) = \|Ax-b\|^2$; $x^*$ is called the least squares solution of $Ax = b$.

Proof: $A^TA$ is symmetric - see above.
\[ A = [a_1\ a_2\ \dots\ a_n], \quad a_i\in\mathbb{R}^m \text{ lin. ind.} \]
\[ c^TA^TAc = (Ac)^TAc = \|Ac\|^2 \ge 0, \quad \text{and} = 0 \text{ if and only if } Ac = 0, \text{ i.e. } \sum_{i=1}^n c_ia_i = 0; \]
$\{a_i\}_{i=1}^n$ lin. ind. ⇒ $c = 0$. Therefore $c^TA^TAc > 0$ $\forall c\in\mathbb{R}^n$, $c\ne 0$, therefore $A^TA\in\mathbb{R}^{n\times n}$ is symmetric positive definite. Therefore $\left(A^TA\right)^{-1}$ exists, therefore $\exists$ a unique $x^*\in\mathbb{R}^n$ solving $A^TAx^* = A^Tb$.

Now show $x^*$ is the global minimum of $Q(x)$:
\[ Q(x) = \|Ax-b\|^2 = x^TA^TAx - 2\left(A^Tb\right)^Tx + \|b\|^2 \Rightarrow \nabla Q(x) = 2\left(A^TAx - A^Tb\right), \quad D^2Q(x) = 2A^TA. \]
For $x^*\in\mathbb{R}^n$ a local minimum of $Q(x)$, we require $\nabla Q(x^*) = 0$ and $D^2Q(x^*)$ to be symmetric positive definite. $\nabla Q(x^*) = 0 \iff A^TAx^* = A^Tb$; $\exists!\,x^*$ such that $\nabla Q(x^*) = 0$, and $D^2Q(x^*) = 2A^TA$ s.p.d. (symmetric positive definite) ⇒ $x^*$ global minimum of $Q(x)$.

Ex. $m = 3$, $n = 2$:
\[ A = \begin{pmatrix} 3 & 65 \\ 4 & 0 \\ 12 & 13 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}; \]
easy to show that $\exists$ no $x\in\mathbb{R}^2$ solving $Ax = b$, so compute the least squares solution $x^*\in\mathbb{R}^2$ such that $A^TAx^* = A^Tb$:
\[ A^TA = \begin{bmatrix} 3 & 4 & 12 \\ 65 & 0 & 13 \end{bmatrix}\begin{pmatrix} 3 & 65 \\ 4 & 0 \\ 12 & 13 \end{pmatrix} = \begin{pmatrix} 169 & 351 \\ 351 & 4394 \end{pmatrix}, \qquad A^Tb = \begin{pmatrix} 19 \\ 78 \end{pmatrix}. \]

\[ \begin{pmatrix} 169 & 351 \\ 351 & 4394 \end{pmatrix}x^* = \begin{pmatrix} 19 \\ 78 \end{pmatrix} \Rightarrow x^* = \begin{pmatrix} 0.090587\dots \\ 0.010515\dots \end{pmatrix}. \]
Note $Ax^* - b \ne 0$.

In practice, it is not a good idea to solve the normal equations ($A^TAx^* = A^Tb$), since $A^TA$ is generally ill-conditioned. A matrix $B\in\mathbb{R}^{n\times n}$ is ill-conditioned if small changes to the RHS of the system $Bx = b$ lead to large changes in the solution - unacceptable errors on a computer. I.e.
\[ Bx = b, \quad B(x + \delta x) = b + \delta b; \]
$B$ ill-conditioned: "small" $\delta b$ ⇒ "large" $\delta x$.
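A sketch reproducing the worked example via the normal equations, and illustrating why they are distrusted: forming $A^TA$ squares the (2-norm) condition number.

```python
import numpy as np

A = np.array([[3.0, 65.0], [4.0, 0.0], [12.0, 13.0]])
b = np.ones(3)

x_star = np.linalg.solve(A.T @ A, A.T @ b)          # normal equations
print(x_star)                                        # [0.090587..., 0.010515...]

# The squaring of the condition number is the source of the trouble:
print(np.linalg.cond(A), np.linalg.cond(A.T @ A))    # cond(A^T A) = cond(A)^2
```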

Alternative procedure to find x*

In practice, find $x^*$ using the QR approach. Take a sequence of Givens rotations
\[ G = G_{nm}\dots G_{13}G_{12} \quad \text{such that} \quad GA = R\in\mathbb{R}^{m\times n}, \]
upper triangular with $r_{ii} > 0$, $i = 1\to n$. Each $G_{pq}$ ($p < q$), $G_{pq}\in\mathbb{R}^{m\times m}$, is orthogonal:
\[ G_{pq}^TG_{pq} = I^{(m)} = G_{pq}G_{pq}^T. \]
Take our original system:
\[ Ax = b \Rightarrow GAx = Gb \Rightarrow Rx = Gb\in\mathbb{R}^m = \underbrace{\begin{pmatrix} (Gb)_1 \\ \vdots \\ (Gb)_n \\ 0 \\ \vdots \\ 0 \end{pmatrix}}_{\alpha} + \underbrace{\begin{pmatrix} 0 \\ \vdots \\ 0 \\ (Gb)_{n+1} \\ \vdots \\ (Gb)_m \end{pmatrix}}_{\beta} = \alpha + \beta, \quad \alpha,\beta\in\mathbb{R}^m, \quad \langle\alpha,\beta\rangle = 0. \]

Recall
\[ R = \begin{pmatrix} \times & \cdots & \times \\ & \ddots & \vdots \\ & & \times \\ & 0 & \end{pmatrix} \begin{array}{l} \big\updownarrow n \\ \big\updownarrow m-n \end{array} \]
$Ax = b$ ⇒ $Rx = \alpha + \beta$:

if $\beta = 0$ ⇒ $\exists$ a unique solution $x\in\mathbb{R}^n$ to $Rx = \alpha = Gb$ ⇒ $\exists$ a unique solution $x\in\mathbb{R}^n$ to the original system $Ax = b$;

if $\beta\ne 0$ ⇒ $Rx = Gb = \alpha + \beta$ is an inconsistent system ⇒ no solution $x$ ⇒ no solution $x$ to $Ax = b$.

If $\beta\ne 0$, solve the consistent system
\[ Rx^* = \alpha; \]
$\exists!\,x^*$ solving this - solve by a back solve.

Claim: $x^*\in\mathbb{R}^n$ is the least squares solution to $Ax = b$, i.e.
\[ \|Ax^* - b\|^2 \le \|Ax - b\|^2 \quad \forall x\in\mathbb{R}^n. \]
Orthogonal matrices preserve length: $\|Gy\| = \|y\|$ $\forall y\in\mathbb{R}^m$,
\[ \therefore \|Ax-b\|^2 = \|G(Ax-b)\|^2 = \|Rx - (\alpha+\beta)\|^2 = \langle(Rx-\alpha) - \beta,\ (Rx-\alpha) - \beta\rangle, \]
\[ \therefore \|Ax-b\|^2 = \|Rx-\alpha\|^2 + \|\beta\|^2 - 2\underbrace{\langle Rx-\alpha,\ \beta\rangle}_{=0,\ \text{since } \langle Rx,\beta\rangle = \langle\alpha,\beta\rangle = 0\ \forall x\in\mathbb{R}^n}, \]
\[ \therefore \min_{x\in\mathbb{R}^n}\|Ax-b\|^2 = \min_{x\in\mathbb{R}^n}\|Rx-\alpha\|^2 + \|\beta\|^2, \]
\[ \therefore Rx^* = \alpha \Rightarrow \|Rx^*-\alpha\| = 0, \]
so $x^*$ is such that
\[ \|\beta\|^2 = \|Ax^*-b\|^2 \le \|Ax-b\|^2 \quad \forall x\in\mathbb{R}^n. \]

Example

Use the QR approach on the example above, $m = 3$, $n = 2$:
\[ A = \begin{pmatrix} 3 & 65 \\ 4 & 0 \\ 12 & 13 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad G = G_{23}(\psi)G_{13}(\phi)G_{12}(\theta) \quad \text{(recall from the QR notes)}, \]
\[ GA = R = \begin{pmatrix} 13 & 27 \\ 0 & (3665)^{1/2} \\ 0 & 0 \end{pmatrix}, \qquad Gb = \begin{pmatrix} \frac{19}{13} \\ \frac{501}{13(3665)^{1/2}} \\ \frac{41}{(3665)^{1/2}} \end{pmatrix} = \begin{pmatrix} 1.46154\dots \\ 0.63659\dots \\ 0.67725\dots \end{pmatrix} \]
\[ \Rightarrow \alpha = \begin{pmatrix} 1.46154\dots \\ 0.63659\dots \\ 0 \end{pmatrix}, \qquad \beta = \begin{pmatrix} 0 \\ 0 \\ 0.67725\dots \end{pmatrix}. \]
Solve $Rx^* = \alpha$:
\[ \Rightarrow x^* = \begin{pmatrix} 0.090587\dots \\ 0.010515\dots \end{pmatrix}, \qquad \therefore \|Ax^* - b\|^2 = \|\beta\|^2 = (0.67725)^2. \]

2 Least Squares Problems (A more abstract approach)

A more abstract definition of an inner product.

Definition:

Let $V$ be a real vector space. An inner product on $V\times V$ is a function $\langle\cdot,\cdot\rangle : V\times V\to\mathbb{R}$ such that

(i) $\langle\lambda u + \mu v,\ w\rangle = \lambda\langle u,w\rangle + \mu\langle v,w\rangle$

(ii) $\langle u,v\rangle = \langle v,u\rangle$

(i)+(ii) ⇒ $\langle w,\ \lambda u + \mu v\rangle = \lambda\langle w,u\rangle + \mu\langle w,v\rangle$

(iii) $\langle u,u\rangle \ge 0$, with equality if and only if $u = 0\in V$.

An inner product induces a norm:
\[ \|u\| = [\langle u,u\rangle]^{1/2} \quad \forall u\in V \Rightarrow \|u\| = 0 \text{ if and only if } u = 0. \]

Example: $V = C[a,b]$, continuous functions over the closed interval $[a,b]$. Let $w\in C[a,b]$ with $w(x) > 0$ $\forall x\in[a,b]$ ($w$ is the weight function). Define (for two continuous functions over the interval $[a,b]$)
\[ \langle f,g\rangle = \int_a^b w(x)f(x)g(x)\,dx \quad \forall f,g\in C[a,b]. \]
Clearly $\langle\cdot,\cdot\rangle : V\times V\to\mathbb{R}$, and (i), (ii) clearly hold. (iii):
\[ \langle f,f\rangle = \int_a^b w(x)[f(x)]^2\,dx \ge 0 \quad \forall f\in C[a,b], \qquad = 0 \text{ if and only if } f\equiv 0. \]

Cauchy-Schwarz Inequality:
\[ |\langle u,v\rangle| \le \|u\|\|v\| \quad \forall u,v\in V, \]
with strict inequality if and only if $u, v$ are linearly independent. Proof: same as before.

Abstract Form of the Least Squares Problem

Let $V$ be a real vector space with inner product $\langle\cdot,\cdot\rangle$. Let $U$ be a finite dimensional subspace of $V$ with basis $\{\phi_i\}_{i=1}^n$; by basis we mean linearly independent and spanning the subspace $U$. Given $v\in V$, find $u^*\in U$ such that
\[ \|v - u^*\| \le \|v - u\| \quad \forall u\in U. \]

Example

\[ V = C[a,b], \quad \langle f,g\rangle = \int_a^b f(x)g(x)\,dx \quad (\text{i.e. } w(x)\equiv 1), \]
$U = P_{n-1}$ (polynomials of degree $\le n-1$), basis $\phi_i = x^{i-1}$, $i = 1\to n$:
\[ \|v-u^*\|^2 \le \|v-u\|^2 = \int_a^b [v(x) - u(x)]^2\,dx. \]

Return to the general case:
\[ u\in U \iff u = \sum_{i=1}^n \lambda_i\phi_i \quad \text{for some } \lambda = (\lambda_1,\dots,\lambda_n)^T\in\mathbb{R}^n. \]

\[ u^*\in U \iff u^* = \sum_{i=1}^n \lambda_i^*\phi_i \iff \lambda^*\in\mathbb{R}^n. \]
Finding $u^*\in U$ such that $\|v-u^*\|^2 \le \|v-u\|^2$ $\forall u\in U$ ⇔ finding $\lambda^*\in\mathbb{R}^n$ such that
\[ \left\|v - \sum_{i=1}^n\lambda_i^*\phi_i\right\|^2 \le \left\|v - \sum_{i=1}^n\lambda_i\phi_i\right\|^2 \quad \forall\lambda\in\mathbb{R}^n. \]
Let
\[ E(\lambda) = \left\|v - \sum_{i=1}^n\lambda_i\phi_i\right\|^2, \quad E : \mathbb{R}^n\to\mathbb{R}_{\ge 0}; \]
find $\lambda^*\in\mathbb{R}^n$ such that $E(\lambda^*) \le E(\lambda)$ $\forall\lambda\in\mathbb{R}^n$.
\[ E(\lambda) = \left\langle v - \sum_{i=1}^n\lambda_i\phi_i,\ v - \sum_{j=1}^n\lambda_j\phi_j\right\rangle \quad (i,j \text{ dummy variables}) \]
\[ = \langle v,v\rangle - \underbrace{\sum_{i=1}^n\lambda_i\langle\phi_i,v\rangle}_{\text{the same}} - \underbrace{\sum_{j=1}^n\lambda_j\langle v,\phi_j\rangle}_{\text{the same}} + \sum_{i=1}^n\sum_{j=1}^n\lambda_i\lambda_j\langle\phi_i,\phi_j\rangle. \]
Let $\mu\in\mathbb{R}^n$ with $\mu_i = \langle v,\phi_i\rangle$, $i = 1\to n$, and $G\in\mathbb{R}^{n\times n}$ with $G_{ij} = \langle\phi_i,\phi_j\rangle$, $i,j = 1\to n$:
\[ \therefore E(\lambda) = \|v\|^2 - 2\mu^T\lambda + \lambda^TG\lambda \Longrightarrow \nabla E(\lambda) = 2(G\lambda - \mu), \quad D^2E(\lambda) = 2G. \]
$E(\lambda^*)$ is a local minimum of $E(\lambda)$ if
\[ \nabla E(\lambda^*) = 0 \]
and $G$ is positive definite:
\[ \nabla E(\lambda^*) = 0 \iff G\lambda^* = \mu \quad \text{normal equations.} \]
$G$ is called the Gram matrix; it depends on the basis $\{\phi_i\}_{i=1}^n$ for $U$. [Sometimes write $G(\phi_1\ \phi_2\ \dots\ \phi_n)$.]

Lemma: $\{\phi_i\}_{i=1}^n$ basis for $U$
⇒ $\{\phi_i\}_{i=1}^n$ lin. ind. ⇒ $G$ is symmetric positive definite.

Proof: $G\in\mathbb{R}^{n\times n}$, $G_{ij} = \langle\phi_i,\phi_j\rangle$, $i,j = 1\to n$,
\[ G_{ji} = \langle\phi_j,\phi_i\rangle \overset{(ii)}{=} \langle\phi_i,\phi_j\rangle = G_{ij} \quad i,j = 1\to n, \]

\[ \lambda^TG\lambda = \sum_{i=1}^n\sum_{j=1}^n\underbrace{G_{ij}}_{\langle\phi_i,\phi_j\rangle}\lambda_i\lambda_j \overset{(i),(ii)}{=} \left\langle\sum_{i=1}^n\lambda_i\phi_i,\ \sum_{j=1}^n\lambda_j\phi_j\right\rangle = \left\|\sum_{i=1}^n\lambda_i\phi_i\right\|^2 \ge 0, \]
and $= 0$ if and only if $\sum_{i=1}^n\lambda_i\phi_i = 0\in V \iff \lambda = 0$, as $\{\phi_i\}_{i=1}^n$ linearly independent.
\[ \therefore \lambda^TG\lambda > 0 \quad \forall\lambda\in\mathbb{R}^n,\ \lambda\ne 0 \Longrightarrow G \text{ is symmetric positive definite.} \]
$G$ positive definite ⇒ $G^{-1}$ exists ⇒ $\exists!\,\lambda^*\in\mathbb{R}^n$ solving $G\lambda^* = \mu$ (normal equations) ⇒ $\nabla E(\lambda^*) = 0$ (and it is unique - no other stationary points). $D^2E(\lambda^*) = 2G$ symmetric positive definite, so $\lambda^*\in\mathbb{R}^n$, which solves the normal equations, is the global minimum of $E(\lambda)$ ⇒ $u^* = \sum_{i=1}^n\lambda_i^*\phi_i$.

Recall: $V$ a real vector space, inner product $\langle\cdot,\cdot\rangle$; $U$ a finite dimensional subspace, basis $\{\phi_i\}_{i=1}^n$. Given $v\in V$, find $u^*\in U$ such that
\[ \|v - u^*\| \le \|v - u\| \quad \forall u\in U. \]
Then $u^* = \sum_{i=1}^n\lambda_i^*\phi_i$, where $\lambda^*\in\mathbb{R}^n$ is the unique solution of
\[ G\lambda^* = \mu \quad \text{Normal Equations.} \]
$G\in\mathbb{R}^{n\times n}$ is the GRAM MATRIX (depends on the basis for $U$),
\[ G_{ij} = \langle\phi_i,\phi_j\rangle \quad i,j = 1\to n, \quad \text{symmetric positive definite}; \qquad \mu\in\mathbb{R}^n, \quad \mu_i = \langle v,\phi_i\rangle \quad i = 1\to n. \]

Theorem (orthogonality property):
\[ \langle\underbrace{v - u^*}_{\text{error}},\ u\rangle = 0 \quad \forall u\in U. \]

see diagram 20081124.M2AA3.1

Proof:
\[ G\lambda^* = \mu \Rightarrow \lambda^TG\lambda^* = \lambda^T\mu \quad \forall\lambda\in\mathbb{R}^n. \]
The implication goes the other way as well: if $\lambda^TG\lambda^* = \lambda^T\mu$ $\forall\lambda\in\mathbb{R}^n$, let $\lambda = e_i$ (a 1 in the $i$th position)
\[ \Longrightarrow (G\lambda^*)_i = \mu_i; \]
repeat for $i = 1\to n$ ⇒ $G\lambda^* = \mu$.
\[ \therefore G\lambda^* = \mu \iff \lambda^TG\lambda^* = \lambda^T\mu \quad \forall\lambda\in\mathbb{R}^n \]
\[ \iff \sum_{i=1}^n\sum_{j=1}^n\underbrace{G_{ij}}_{\langle\phi_i,\phi_j\rangle}\lambda_i\lambda_j^* = \sum_{i=1}^n\lambda_i\underbrace{\mu_i}_{\langle v,\phi_i\rangle} \]
\[ \iff \left\langle\underbrace{\sum_{i=1}^n\lambda_i\phi_i}_{u\in U},\ \underbrace{\sum_{j=1}^n\lambda_j^*\phi_j}_{u^*}\right\rangle = \left\langle v,\ \underbrace{\sum_{i=1}^n\lambda_i\phi_i}_{u\in U}\right\rangle \quad \forall\lambda\in\mathbb{R}^n \]
\[ \iff \langle v - u^*,\ u\rangle = 0 \quad \forall u\in U. \]

Example

1. $V = C[0,1]$, $\langle f,g\rangle = \int_0^1 f(x)g(x)\,dx$, $U = P_{n-1}$, basis $\left\{x^{i-1}\right\}_{i=1}^n$, i.e. $\phi_i = x^{i-1}$:
\[ u\in P_{n-1} \iff u(x) = \sum_{i=1}^n\lambda_ix^{i-1}. \]
Given $v\in C[0,1]$, find $u^*(x) = \sum_{i=1}^n\lambda_i^*x^{i-1}$ such that
\[ \|v-u^*\| \le \|v-u\| \quad \forall u\in P_{n-1} \]

\[ \iff \|v-u^*\|^2 \le \|v-u\|^2 \quad \forall u\in P_{n-1} \iff \int_0^1 (v-u^*)^2\,dx \le \int_0^1 (v-u)^2\,dx. \]
Find $\lambda^*$ from solving the normal equations $G\lambda^* = \mu$:
\[ \mu_i = \langle v,\phi_i\rangle = \int_0^1 v(x)\,x^{i-1}\,dx \quad i = 1\to n, \]
\[ G_{ij} = \langle\phi_i,\phi_j\rangle = \int_0^1 x^{i-1}x^{j-1}\,dx = \int_0^1 x^{i+j-2}\,dx = \frac{1}{i+j-1} \quad i,j = 1\to n \]
\[ \Longrightarrow G = \begin{pmatrix} 1 & \frac{1}{2} & \dots & \frac{1}{n} \\ \frac{1}{2} & \frac{1}{3} & \dots & \frac{1}{n+1} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{n} & \frac{1}{n+1} & \dots & \frac{1}{2n-1} \end{pmatrix}, \quad \text{the } n\times n \text{ Hilbert Matrix:} \]
badly conditioned; the columns approach linear dependence as $n\to\infty$.
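A quick illustration of how badly the Hilbert matrix conditions as $n$ grows (a sketch):

```python
import numpy as np

for n in (4, 8, 12):
    G = np.array([[1.0 / (i + j - 1) for j in range(1, n + 1)]
                  for i in range(1, n + 1)])        # G_ij = 1/(i+j-1)
    print(n, np.linalg.cond(G))                      # condition number blows up
```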

2. $V = \mathbb{R}^m$, $\langle a,b\rangle = a^Tb$ $\forall a,b\in\mathbb{R}^m$; $U = \operatorname{span}\{a_i\}_{i=1}^n$, where $n\le m$ and the $a_i$ are linearly independent, i.e. $\phi_i = a_i\in\mathbb{R}^m$. Given $v\in\mathbb{R}^m$, find $u^* = \sum_{i=1}^n\lambda_i^*a_i$ such that
\[ \|v - u^*\| \le \|v - u\| \quad \forall u\in U. \tag{18} \]
Let $A = [a_1\ a_2\ \dots\ a_n]\in\mathbb{R}^{m\times n}$, so $A\lambda^* = \sum_{i=1}^n\lambda_i^*a_i$:
\[ (18) \iff \|v - A\lambda^*\| \le \|v - A\lambda\| \quad \forall\lambda\in\mathbb{R}^n. \]
Find $\lambda^*$ from solving the Normal Equations $G\lambda^* = \mu$:
\[ \mu\in\mathbb{R}^n, \quad \mu_i = \langle v,\phi_i\rangle = \langle v,a_i\rangle = a_i^Tv \quad i = 1\to n, \]
\[ G\in\mathbb{R}^{n\times n}, \quad G_{ij} = \langle\phi_i,\phi_j\rangle = \langle a_i,a_j\rangle = a_i^Ta_j \quad i,j = 1\to n. \]
With $A^T\in\mathbb{R}^{n\times m}$ the matrix with rows $a_1^T,\dots,a_n^T$:
\[ A^TA\in\mathbb{R}^{n\times n}, \quad \left(A^TA\right)_{ij} = a_i^Ta_j \Rightarrow G = A^TA. \]

\[ A^Tv\in\mathbb{R}^n, \quad \left(A^Tv\right)_i = a_i^Tv \quad i = 1\to n, \qquad \therefore \mu = A^Tv, \]
\[ \therefore G\lambda^* = \mu \Rightarrow A^TA\lambda^* = A^Tv \quad \text{Normal Equations for } A\lambda = v. \]

Change basis

1. $\left\{x^{i-1}\right\}_{i=1}^n \overset{\text{Gram-Schmidt}}{\longrightarrow} \{\psi_i\}_{i=1}^n$ orthonormal:
\[ G_{ij} = \langle\psi_i,\psi_j\rangle = \delta_{ij} \quad i,j = 1\to n \Longrightarrow G\equiv I \Rightarrow \lambda^* = \mu, \quad \text{where } \mu_i = \langle v,\psi_i\rangle,\ i = 1\to n. \]

2. $\left\{x^{i-1}\right\}_{i=1}^n \overset{\text{Gram-Schmidt}}{\longrightarrow} \{\psi_i\}_{i=1}^n$ orthogonal:
\[ G_{ij} = \langle\psi_i,\psi_j\rangle = 0 \quad i,j = 1\to n,\ i\ne j, \quad \text{and} \quad G_{ii} = \|\psi_i\|^2 > 0 \quad i = 1\to n \]
⇒ $G$ is a diagonal matrix:
\[ G\lambda^* = \mu \Rightarrow \lambda_i^* = \frac{\mu_i}{\|\psi_i\|^2} \quad i = 1\to n, \qquad \therefore u^* = \sum_{i=1}^n\frac{\langle v,\psi_i\rangle}{\|\psi_i\|^2}\psi_i. \]
It is very easy to construct this orthogonal basis.

3 Orthogonal Polynomials

\[ V = C[a,b], \qquad \langle f,g\rangle = \int_a^b w(x)f(x)g(x)\,dx. \]
Weight function $w\in C(a,b)$ with $w(x) > 0$ $\forall x\in(a,b)$ (possibly $w(x)\to\infty$ as $x\to a$ or $x\to b$).

see diagram 20081126.M2AA3.1

Require the integral to be well-defined:
\[ |\langle f,g\rangle| = \left|\int_a^b w(x)f(x)g(x)\,dx\right| \le \int_a^b |w(x)f(x)g(x)|\,dx = \int_a^b w(x)|f(x)||g(x)|\,dx \le \int_a^b w(x)\,dx\ \underbrace{\max_{a\le x\le b}[|f(x)||g(x)|]}_{<\infty \text{ as } f,g\in C[a,b]}, \]
\[ \therefore \text{require } \int_a^b w(x)\,dx < \infty. \]

Ex

$[a,b] = [0,1]$, $w(x) = x^{-\alpha}$, $\alpha > 0$:

see diagram 20081126.M2AA3.2

$w\in C(0,1)$,
\[ \int_0^1 x^{-\alpha}\,dx = \left[\frac{x^{1-\alpha}}{1-\alpha}\right]_0^1 < \infty \quad \text{if } \alpha < 1. \]
Note, for $\alpha = 1$:
\[ \int_0^1 x^{-1}\,dx = [\ln x]_0^1 = \infty. \]

$U = P_n$, polynomials of degree $\le n$. The canonical basis $\left\{x^i\right\}_{i=0}^n$ ⇒ ill-conditioned Gram matrix, so construct a new basis $\{\phi_i(x)\}_{i=0}^n$ for $P_n$, where $\phi_j(x)$ is a MONIC polynomial of degree $j$, and the basis is also orthogonal, $\langle\phi_i,\phi_j\rangle = 0$, $i\ne j$:
\[ \phi_j(x) = x^j + \sum_{i=0}^{j-1} a_{ji}x^i \]
(monic - leading coefficient is 1). Therefore $\phi_0(x) = 1$, $\phi_1(x) = x - a_0$, where $a_0\in\mathbb{R}$ is chosen so that $\langle\phi_0,\phi_1\rangle = 0$.

Theorem

Monic orthogonal polynomials $\phi_j\in P_j$ satisfy the three term recurrence relation
\[ \phi_{j+1}(x) = (x - a_j)\phi_j(x) - b_j\phi_{j-1}(x) \quad \text{for } j\ge 1, \]
where $a_j = \frac{\langle x\phi_j,\phi_j\rangle}{\|\phi_j\|^2}$ and $b_j = \frac{\|\phi_j\|^2}{\|\phi_{j-1}\|^2}$, also for $j\ge 1$.

Proof:

$\phi_j(x)\in P_j$, monic ⇒
\[ \phi_{j+1}(x) - x\phi_j(x)\in P_j \Rightarrow \phi_{j+1}(x) - x\phi_j(x) = \sum_{k=0}^j c_k\phi_k(x). \]
Find $c_k$, $k = 0\to j$:
\[ \left\langle\sum_{k=0}^j c_k\phi_k(x),\ \phi_i(x)\right\rangle = \langle\phi_{j+1}(x) - x\phi_j(x),\ \phi_i(x)\rangle \quad i = 0\to j \]
\[ \Rightarrow c_i\|\phi_i(x)\|^2 = -\langle x\phi_j(x),\ \phi_i(x)\rangle \quad i = 0\to j \tag{19} \]
since the $\{\phi_j\}$ are orthogonal.
\[ \langle x\phi_j(x),\ \phi_i(x)\rangle = \int_a^b w(x)\,x\,\phi_j(x)\phi_i(x)\,dx = \langle\phi_j(x),\ \underbrace{x\phi_i(x)}_{\in P_{i+1}}\rangle. \]
$\phi_j(x)$ is orthogonal to $\{\phi_k\}_{k=0}^{j-1}$ ⇒ $\phi_j(x)$ is orthogonal to $P_{j-1}$ (degree $\le j-1$); therefore if $i+1\le j-1$, i.e. $i\le j-2$, then $\langle x\phi_j(x),\ \phi_i(x)\rangle = 0$.
\[ \therefore (19) \Rightarrow c_i = 0 \text{ if } i\le j-2, \qquad \therefore \phi_{j+1}(x) - x\phi_j(x) = c_{j-1}\phi_{j-1}(x) + c_j\phi_j(x) \]
(all other coefficients are 0), where
\[ c_{j-1} = \frac{-\langle x\phi_j(x),\ \phi_{j-1}(x)\rangle}{\|\phi_{j-1}(x)\|^2}, \qquad c_j = \frac{-\langle x\phi_j(x),\ \phi_j(x)\rangle}{\|\phi_j(x)\|^2} = \frac{-\langle x\phi_j,\phi_j\rangle}{\|\phi_j\|^2}, \]
\[ \langle\phi_j,\ x\phi_{j-1}\rangle = \underbrace{\left\langle\phi_j,\ \overbrace{x\phi_{j-1} - \phi_j}^{\in P_{j-1}}\right\rangle}_{=0} + \langle\phi_j,\phi_j\rangle = \|\phi_j\|^2. \]

\[ \therefore c_{j-1} = \frac{-\|\phi_j\|^2}{\|\phi_{j-1}\|^2}. \]
Let $a_j = -c_j = \frac{\langle x\phi_j,\phi_j\rangle}{\|\phi_j\|^2}$ and $b_j = -c_{j-1} = \frac{\|\phi_j\|^2}{\|\phi_{j-1}\|^2}$:
\[ \phi_{j+1}(x) - x\phi_j(x) = -a_j\phi_j(x) - b_j\phi_{j-1}(x) \]
\[ \Rightarrow \phi_{j+1}(x) = (x - a_j)\phi_j(x) - b_j\phi_{j-1}(x) \quad j\ge 1, \quad \text{where } a_j = \frac{\langle x\phi_j,\phi_j\rangle}{\|\phi_j\|^2},\ b_j = \frac{\|\phi_j\|^2}{\|\phi_{j-1}\|^2},\ j\ge 1.\ \checkmark \tag{20} \]
$\phi_0(x) = 1$; $\phi_1(x) = x - a_0$, where $a_0\in\mathbb{R}$ is such that $\langle\phi_1,\phi_0\rangle = 0$, i.e.
\[ \langle x - a_0\cdot 1,\ 1\rangle = 0 \Rightarrow a_0\langle 1,1\rangle = \langle x,1\rangle, \qquad \therefore a_0 = \frac{\langle x,1\rangle}{\|1\|^2} = \frac{\langle x\phi_0,\phi_0\rangle}{\|\phi_0\|^2}. \]
Therefore extend (20) to $j\ge 0$:
\[ \phi_{j+1}(x) = (x - a_j)\phi_j(x) - b_j\phi_{j-1}(x) \quad j\ge 0, \quad \text{with } \phi_0(x) = 1,\ \phi_{-1}(x) = 0, \]
\[ a_j = \frac{\langle x\phi_j,\phi_j\rangle}{\|\phi_j\|^2} \quad j\ge 0, \qquad b_j = \frac{\|\phi_j\|^2}{\|\phi_{j-1}\|^2} \quad j\ge 1. \tag{21} \]

Recall -

$g(x)$ is even if $g(-x) = g(x)$ $\forall x$ ⇒ $\int_{-2}^2 g(x)\,dx = 2\int_0^2 g(x)\,dx$

see diagram 20081127.M2AA3.1

$g(x)$ is odd if $g(-x) = -g(x)$ $\forall x$ ⇒ $\int_{-2}^2 g(x)\,dx = 0$

see diagram 20081127.M2AA3.2

Ex

\[ \langle f,g\rangle = \int_{-1}^1 f(x)g(x)\,dx, \]
i.e. $[a,b] = [-1,1]$, $w(x) = 1$ $\forall x\in[-1,1]$. Find the monic orthogonal polynomials with respect to this inner product. Apply (21):
\[ \phi_0(x) = 1, \qquad \phi_1(x) = x - a_0, \quad a_0 = \frac{\langle x\phi_0,\phi_0\rangle}{\|\phi_0\|^2} = \frac{\int_{-1}^1 x\,dx}{\|\phi_0\|^2} = 0 \ \text{as } x \text{ is odd} \ \Rightarrow \phi_1(x) = x, \]
\[ \phi_2(x) = (x - a_1)\phi_1(x) - b_1\phi_0(x) = (x - a_1)x - b_1, \]
\[ a_1 = \frac{\langle x\phi_1,\phi_1\rangle}{\|\phi_1\|^2} = \frac{\int_{-1}^1 x^3\,dx}{\|\phi_1\|^2} = 0, \qquad b_1 = \frac{\|\phi_1\|^2}{\|\phi_0\|^2} = \frac{\int_{-1}^1 x^2\,dx}{\int_{-1}^1 1^2\,dx} = \frac{2\int_0^1 x^2\,dx}{2} = \frac{1}{3} \]
\[ \Rightarrow \phi_2(x) = x^2 - \frac{1}{3}, \]
etc.;
\[ \phi_3(x) = x^3 - \frac{3}{5}x. \]
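A SymPy sketch of the recurrence (21) for this $w(x)\equiv 1$, $[-1,1]$ example (exact integration; names mine). It reproduces $\phi_2$ and $\phi_3$ above.

```python
import sympy as sp

x = sp.symbols('x')
inner = lambda f, g: sp.integrate(f * g, (x, -1, 1))   # <f,g> with w(x) = 1

phi_prev, phi = sp.Integer(1), x        # phi_0 = 1, phi_1 = x (a_0 = 0 by oddness)
for j in range(1, 3):                   # build phi_2, phi_3 via (21)
    a = inner(x * phi, phi) / inner(phi, phi)
    b = inner(phi, phi) / inner(phi_prev, phi_prev)
    phi_prev, phi = phi, sp.expand((x - a) * phi - b * phi_prev)
    print(phi)                          # x**2 - 1/3, then x**3 - 3*x/5
```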

Summary

$V = C[a,b]$,
\[ \langle f,g\rangle = \int_a^b w(x)f(x)g(x)\,dx \quad \forall f,g\in C[a,b], \]

with the constraints that $w\in C(a,b)$, $w(x) > 0$ $\forall x\in(a,b)$, and it is integrable:
\[ \int_a^b w(x)\,dx < \infty. \]
Given $f\in C[a,b]$, we are looking to approximate it by a polynomial of degree $n$: find $p_n^*(x)\in P_n$ such that the norm associated with this inner product is minimal,
\[ \|f - p_n^*\| \le \|f - p_n\| \quad \forall p_n\in P_n. \]
Orthogonal basis $\{\phi_j(x)\}_{j=0}^n$ for $P_n$ ⇒
\[ p_n^*(x) = \sum_{i=0}^n \frac{\langle f,\phi_i\rangle}{\|\phi_i\|^2}\phi_i(x); \]
$p_n^*\in P_n$ is the best approximation to $f$ from $P_n$, in that norm $\|\cdot\|$.

Ex

Show that the polynomials $T_k(x) = \cos\left(k\cos^{-1}x\right)$ for $x\in[-1,1]$ are orthogonal with respect to the inner product
\[ \langle f,g\rangle = \int_{-1}^1 \underbrace{\left(1-x^2\right)^{-1/2}}_{w(x)}f(x)g(x)\,dx. \]

see diagram 20081127.M2AA3.3

Is $T_k(x)$ a polynomial?
\[ T_0(x) = 1, \qquad T_1(x) = x. \]
Introduce the change of variable
\[ \theta = \cos^{-1}x \Rightarrow x = \cos\theta. \]

see diagram 20081127.M2AA3.4

$x\in[-1,1] \iff \theta\in[0,\pi]$, $\therefore T_k(x) = \cos k\theta$.

Recall the trigonometric identity
\[ \cos(k+1)\theta + \cos(k-1)\theta = 2\cos k\theta\cos\theta \Rightarrow T_{k+1}(x) = 2xT_k(x) - T_{k-1}(x) \quad k\ge 1. \]

\[ \Rightarrow T_2(x) = 2xT_1(x) - T_0(x) = 2x^2 - 1\in P_2, \]
\[ T_3(x) = 2xT_2(x) - T_1(x) = 2x\left(2x^2-1\right) - x = 4x^3 - 3x; \]
by induction
\[ T_k(x) = 2^{k-1}x^k + \dots \in P_k, \]
not monic.

Show $\{T_k(x)\}_{k\ge 0}$ is orthogonal with respect to
\[ \langle f,g\rangle = \int_{-1}^1 \underbrace{\left(1-x^2\right)^{-1/2}}_{w(x)}f(x)g(x)\,dx. \]

see diagram 20081202.M2AA3.1

\[ \int_{-1}^1 \left(1-x^2\right)^{-1/2}T_k(x)T_j(x)\,dx; \qquad x = \cos\theta \Rightarrow \frac{dx}{d\theta} = -\sin\theta, \]
\[ \int_\pi^0 (\sin\theta)^{-1}\cos k\theta\cos j\theta\,(-\sin\theta)\,d\theta = \int_0^\pi \cos k\theta\cos j\theta\,d\theta = \frac{1}{2}\int_0^\pi \big[\cos[(k+j)\theta] + \cos[(k-j)\theta]\big]\,d\theta \]
\[ = \frac{1}{2}\left[\frac{\sin(k+j)\theta}{k+j} + \frac{\sin(k-j)\theta}{k-j}\right]_0^\pi \quad \text{(not valid if } k = j \text{ or } k = j = 0) \]
\[ = \begin{cases} 0 & \text{if } k\ne j \\ \frac{\pi}{2} & \text{if } k = j\ne 0 \\ \pi & \text{if } k = j = 0 \end{cases} \]
Therefore $\{T_k(x)\}_{k\ge 0}$ are orthogonal, not orthonormal. These polynomials are called Chebyshev Polynomials.
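A small sketch of the Chebyshev recurrence $T_{k+1} = 2xT_k - T_{k-1}$, cross-checked against $\cos(k\cos^{-1}x)$ (function name mine):

```python
import numpy as np

def chebyshev_T(k, x):
    """T_k(x) via the recurrence T_{k+1} = 2x T_k - T_{k-1}, T_0 = 1, T_1 = x."""
    Tprev, T = np.ones_like(x), x
    if k == 0:
        return Tprev
    for _ in range(k - 1):
        Tprev, T = T, 2 * x * T - Tprev
    return T

x = np.linspace(-1, 1, 7)
print(np.allclose(chebyshev_T(3, x), 4 * x**3 - 3 * x))          # True
print(np.allclose(chebyshev_T(3, x), np.cos(3 * np.arccos(x))))  # True
```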

4 Polynomial Interpolation

Abandon best approximation, and consider the more practical approach of polynomial interpolation.

Given $\{(z_j, f_j)\}_{j=0}^n$ with $z_j, f_j\in\mathbb{C}$, $j = 0\to n$, find $p_n(z)\in P_n$ such that $p_n(z_j) = f_j$, $j = 0\to n$.

Ex. $z_j, f_j\in\mathbb{R}$, $j = 0\to n$

see diagram 20081202.M2AA3.2

$p_n$ is called the interpolating polynomial for this data.

Natural Questions

1. Does $p_n$ exist?

2. Is $p_n$ unique?

3. What is the construction of $p_n$?

1. Prove the existence by a constructive proof. Clearly $\{z_j\}_{j=0}^n$ should be distinct.

Lemma

Given $\{(z_j, f_j)\}_{j=0}^n$ with $z_j, f_j\in\mathbb{C}$, $j = 0\to n$, $z_j$ distinct, let
\[ l_j(z) = \prod_{\substack{k=0\\k\ne j}}^n \frac{(z - z_k)}{(z_j - z_k)} \quad j = 0\to n. \]
Then $l_j(z)\in P_n$, $j = 0\to n$, and $l_j(z_r) = \delta_{jr}$, $j,r = 0\to n$.

Proof

$l_j(z)$ is a product of $n$ factors of the form $\frac{z-z_k}{z_j-z_k}$, $k\ne j$ ⇒ $l_j(z)\in P_n$.
\[ l_j(z_r) = \prod_{\substack{k=0\\k\ne j}}^n \frac{z_r - z_k}{z_j - z_k}. \]
If $r = j$ ⇒ $l_j(z_j) = 1$. If $r\ne j$ ⇒ one factor $= 0$ (when $k = r$) ⇒ $l_j(z_r) = 0$.

Example

$z_j\in\mathbb{R}$, $j = 0\to n$

see diagram 20081202.M2AA3.3

$\{l_j(z)\}_{j=0}^n$ are the Lagrange basis functions.

Lemma

The interpolating polynomial $p_n(z)\in P_n$ for the data $\{(z_j, f_j)\}_{j=0}^n$, $z_j, f_j\in\mathbb{C}$, $j = 0\to n$, $z_j$ distinct, is such that
\[ p_n(z) = \sum_{j=0}^n f_jl_j(z). \]

Proof

$l_j(z)\in P_n$, $j = 0\to n$
\[ \Rightarrow p_n(z) = \sum_{j=0}^n f_jl_j(z)\in P_n. \]
Evaluating $p_n$ at the data point $z_r$, we want to guarantee it spews out $f_r$:
\[ p_n(z_r) = \sum_{j=0}^n f_jl_j(z_r) = \sum_{j=0}^n f_j\delta_{jr} = f_r \quad r = 0\to n, \]
therefore $p_n(z)$ interpolates the data $\{(z_j, f_j)\}_{j=0}^n$.
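A direct transcription of the Lagrange form (real data; a sketch with hypothetical values, names mine):

```python
import numpy as np

def lagrange_interp(z_nodes, f_vals, z):
    """Evaluate p_n(z) = sum_j f_j l_j(z), l_j(z) = prod_{k != j} (z-z_k)/(z_j-z_k)."""
    p = 0.0
    for j, (zj, fj) in enumerate(zip(z_nodes, f_vals)):
        lj = np.prod([(z - zk) / (zj - zk)
                      for k, zk in enumerate(z_nodes) if k != j])
        p += fj * lj
    return p

z_nodes, f_vals = [0.0, 1.0, 4.0], [2.0, 3.0, 5.0]   # hypothetical data
print([lagrange_interp(z_nodes, f_vals, zj) for zj in z_nodes])  # [2.0, 3.0, 5.0]
```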

2. Is $p_n$ unique?

Theorem (Fundamental Theorem of Algebra)

Let
\[ p_n(z) = a_0 + a_1z + a_2z^2 + \dots + a_nz^n, \quad a_i\in\mathbb{C},\ i = 0\to n. \]
Then $p_n(z)$ has at most $n$ distinct roots (zeros) in $\mathbb{C}$, unless $a_i = 0$, $i = 0\to n$ ⇒ $p_n(z)\equiv 0$.

Recall

Given $\{(z_j, f_j)\}_{j=0}^n$, $z_j, f_j\in\mathbb{C}$, $z_j$ distinct; find the interpolating polynomial $p_n\in P_n$ such that
\[ p_n(z_j) = f_j \quad j = 0\to n. \]

Lagrange Construction

\[ p_n(z) = \sum_{j=0}^n f_jl_j(z), \quad \text{where } l_j(z) = \prod_{\substack{k=0\\k\ne j}}^n \frac{(z-z_k)}{(z_j-z_k)}\in P_n, \quad j = 0\to n, \qquad l_j(z_r) = \delta_{jr} \quad j,r = 0\to n. \]

Is the interpolating polynomial unique? Assume the contrary: $\exists\,p_n, q_n\in P_n$ such that
\[ p_n(z_j) = q_n(z_j) = f_j \quad j = 0\to n. \]
To get a contradiction, we will use the fundamental theorem of algebra:
\[ p_n - q_n\in P_n \quad \text{and} \quad (p_n - q_n)(z_j) = 0 \quad j = 0\to n, \]
so $p_n - q_n\in P_n$ has $n+1$ distinct roots (zeros), as the $z_j$ are distinct.
\[ \text{F.T.A.} \Rightarrow p_n - q_n = 0 \Rightarrow p_n = q_n \Rightarrow \text{uniqueness.} \]

Example

Find $p_2\in P_2$ such that $p_2(0) = a$, $p_2(1) = b$, $p_2(4) = c$ (so $z_0 = 0$, $z_1 = 1$, $z_2 = 4$ and $f_0 = a$, $f_1 = b$, $f_2 = c$); $n = 2$:
\[ p_2(z) = \sum_{j=0}^2 f_jl_j(z), \]
\[ l_0(z) = \frac{(z-z_1)(z-z_2)}{(z_0-z_1)(z_0-z_2)} = \frac{(z-1)(z-4)}{(-1)(-4)} = \frac{1}{4}\left(z^2 - 5z + 4\right), \]
\[ l_1(z) = \dots = -\frac{1}{3}\left(z^2 - 4z\right), \qquad l_2(z) = \dots = \frac{1}{12}\left(z^2 - z\right), \]
\[ \therefore p_2(z) = al_0(z) + bl_1(z) + cl_2(z) \quad \text{(Lagrange form)} \]
\[ = \left(\frac{a}{4} - \frac{b}{3} + \frac{c}{12}\right)z^2 - \left(\frac{5a}{4} - \frac{4b}{3} + \frac{c}{12}\right)z + a \quad \text{(canonical form).} \]
One could find the coefficients in the canonical form directly by using $p_n(z) = \sum_{k=0}^n a_kz^k$. We know that
\[ p_n(z_j) = \sum_{k=0}^n a_kz_j^k = f_j, \quad j = 0\to n, \]

\[ \begin{pmatrix} 1 & z_0 & \dots & z_0^n \\ 1 & z_1 & \dots & z_1^n \\ \vdots & \vdots & & \vdots \\ 1 & z_n & \dots & z_n^n \end{pmatrix}\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} f_0 \\ f_1 \\ \vdots \\ f_n \end{pmatrix} \Rightarrow Va = f, \quad a, f\in\mathbb{C}^{n+1}, \quad V\in\mathbb{C}^{(n+1)\times(n+1)}, \quad V_{jk} = z_j^k, \quad j,k = 0\to n; \]
$V$ is called the Vandermonde matrix (Q4, Sheet 5). In general $V$ is ill-conditioned (as $z_j$ gets close to $z_i$, rows $i$ and $j$ become nearly linearly dependent - this is why it is ill-conditioned).

Canonical Basis:
\[ p_n(z) = \sum_{k=0}^n a_kz^k, \quad \left\{z^k\right\}_{k=0}^n \Rightarrow Va = f. \]
You should certainly not use the canonical basis; it looks as if we should use the Lagrange basis instead, but there is a flaw in this basis too, as we will see later, even though it is far better.

Lagrange Basis:
\[ p_n(z) = \sum_{k=0}^n f_kl_k(z), \quad \{l_k(z)\}_{k=0}^n \Rightarrow If = f. \]
The Lagrange basis is far better than the canonical basis. However, this basis has to be constructed: assume we have found $p_{n-1}\in P_{n-1}$ interpolating $\{(z_j,f_j)\}_{j=0}^{n-1}$, and one is then given a new data point $(z_n, f_n)$. One cannot use $p_{n-1}\in P_{n-1}$ to find $p_n\in P_n$: one has to compute new Lagrange basis functions $\in P_n$.

We now look for an alternative construction. If $p_{n-1}\in P_{n-1}$ is such that $p_{n-1}(z_j) = f_j$, $j = 0\to n-1$, now find $p_n\in P_n$ such that $p_n(z_j) = f_j$, $j = 0\to n$. Let
\[ p_n(z) = p_{n-1}(z) + C\underbrace{\prod_{k=0}^{n-1}(z - z_k)}_{\in P_n,\ \text{vanishes at } z_j,\ j = 0\to n-1} \Rightarrow p_n(z_j) = p_{n-1}(z_j) = f_j, \quad j = 0\to n-1. \]
Then choose $C\in\mathbb{C}$ such that
\[ p_n(z_n) = p_{n-1}(z_n) + C\prod_{k=0}^{n-1}(z_n - z_k) = f_n; \]
\[ \{z_j\}_{j=0}^n \text{ distinct} \Rightarrow C = \frac{f_n - p_{n-1}(z_n)}{\prod_{k=0}^{n-1}(z_n - z_k)}, \]

so $C$ depends on all data points $\{(z_j,f_j)\}_{j=0}^n$.

Classical Notation: $C = f[z_0, z_1, \dots, z_n]$.

This is called a divided difference of order $n$ (it depends on $n+1$ points).
\[ \therefore p_n(z) = p_{n-1}(z) + f[z_0, z_1, \dots, z_n]\prod_{k=0}^{n-1}(z - z_k), \]
so the coefficient of $z^n$ in $p_n(z)$ is $f[z_0, z_1, \dots, z_n]$.

Note that $p_n$ is unique and $p_n(z_j) = f_j$, $j = 0\to n$,
\[ \Rightarrow f[z_{\pi_0}, z_{\pi_1}, \dots, z_{\pi_n}] = f[z_0, z_1, \dots, z_n] \]
for any permutation $\pi$ of the points $\{z_0, z_1, \dots, z_n\}$.

Lemma

If $\{(z_j,f_j)\}_{j=0}^n$, $z_j, f_j\in\mathbb{C}$, $z_j$ distinct, then
\[ f[z_0, z_1, z_2, \dots, z_n] = \sum_{j=0}^n \frac{f_j}{\prod_{k=0,\,k\ne j}^n (z_j - z_k)}. \]
Furthermore, if $f_j = f(z_j)$, $j = 0\to n$, for some function $f(z)$, then $f[z_0, z_1, \dots, z_n] = 0$ if $f\in P_{n-1}$.

Proof

Compare the coefficient of $z^n$ in the Lagrange form of $p_n(z)$ with
\[ p_n(z) = p_{n-1}(z) + f[z_0, z_1, \dots, z_n]\prod_{k=0}^{n-1}(z - z_k). \tag{22} \]
The coefficient of $z^n$ in (22) is $f[z_0, z_1, \dots, z_n]$. Recall the Lagrange form
\[ p_n(z) = \sum_{j=0}^n f_jl_j(z) = \sum_{j=0}^n f_j\prod_{\substack{k=0\\k\ne j}}^n \frac{(z - z_k)}{(z_j - z_k)} \tag{23} \]
⇒ the coefficient of $z^n$ in (23) is
\[ \sum_{j=0}^n f_j\prod_{\substack{k=0\\k\ne j}}^n \frac{1}{(z_j - z_k)}, \]

Page 70: M2AA3 - Notes

hence the result.

If fj = f (zj), j = 0 → n, when f ∈ Pn−1, then the uniqueness of the interpolatingpolynomial,

⇒ pn (z) = f (z) ∈ Pn−1.

see diagram 20081204.M2AA3.1

The coefficient of zn in pn (z) is f [z0, z1, . . . , zn]. But pn ∈ Pn−1 in this case,

⇒ f [z0, z1, . . . , zn] = 0.

Note that,

pn (z)↑

interpolates{(zj ,fj)}nj=0

= pn−1 (z)↑

interpolates{(zj ,fj)}n−1

j=0

+ f [z0, z1, . . . , zn]n−1∏k=0

(z − zk) ,

pn−1 (z) = pn−2 (z)↑

{(zj ,fj)}n−2j=0

+ f [z0, z1, . . . , zn−1]n−2∏k=0

(z − zk) ,

...

p1 (z)↑

{(zj ,fj)}1j=0

= p0 (z)↑

(z0,f0)

f [z0]=f0

+ f [z0, z1] (z − z0) ,

∴ pn (z) = f [z0]qf0

+n∑j=1

f [z0, . . . , zj ]j−1∏k=0

(z − zk) .

This is the Newton Form of the Interpolating Polynomial.

Note that,f [z0, z1, . . . , zj ] is the coefficient of zj in pj (z) ,

where pj ∈ Pj and pj (zk) = fk, k = 0→ j.

Theorem

For any distinct $z_0, z_1, \dots, z_{n+1} \in \mathbb{C}$, the divided difference based on all the points satisfies
$$\underbrace{f[z_0, z_1, \dots, z_{n+1}]}_{n+2 \text{ points}} = \frac{\overbrace{f[z_0, z_1, \dots, z_n]}^{n+1 \text{ points}} - \overbrace{f[z_1, z_2, \dots, z_{n+1}]}^{n+1 \text{ points}}}{z_0 - z_{n+1}}.$$


Proof

Given $\{(z_j, f_j)\}_{j=0}^{n+1}$, we construct $p_n, q_n \in P_n$ such that
$$p_n(z_j) = f_j, \; j = 0 \to n \;\Rightarrow\; \text{coefficient of } z^n \text{ in } p_n(z) \text{ is } f[z_0, z_1, \dots, z_n],$$
$$q_n(z_j) = f_j, \; j = 1 \to n+1 \;\Rightarrow\; \text{coefficient of } z^n \text{ in } q_n(z) \text{ is } f[z_1, z_2, \dots, z_{n+1}].$$

Let
$$r_{n+1}(z) = \frac{(z - z_{n+1})\, p_n(z) - (z - z_0)\, q_n(z)}{z_0 - z_{n+1}} \in P_{n+1}, \tag{24}$$
$$r_{n+1}(z_0) = p_n(z_0) = f_0,$$
$$r_{n+1}(z_j) = \frac{(z_j - z_{n+1}) f_j - (z_j - z_0) f_j}{z_0 - z_{n+1}} = f_j, \qquad j = 1 \to n,$$
$$r_{n+1}(z_{n+1}) = q_n(z_{n+1}) = f_{n+1},$$
$$\therefore r_{n+1}(z) \in P_{n+1} \text{ is such that } r_{n+1}(z_j) = f_j, \quad j = 0 \to n+1.$$

Compare the coefficient of $z^{n+1}$ in (24):
$$\Rightarrow \underbrace{f[z_0, z_1, \dots, z_{n+1}]}_{n+2 \text{ points}} = \frac{\overbrace{f[z_0, z_1, \dots, z_n]}^{n+1 \text{ points}} - \overbrace{f[z_1, \dots, z_{n+1}]}^{n+1 \text{ points}}}{z_0 - z_{n+1}},$$

hence the result. This is the divided difference recurrence relation.

Divided Difference Tableau

$$\begin{array}{llll}
z_0 & f[z_0] = f_0 & & \\
z_1 & f[z_1] = f_1 & f[z_0, z_1] = \dfrac{f[z_0] - f[z_1]}{z_0 - z_1} & \\
z_2 & f[z_2] = f_2 & f[z_1, z_2] = \dfrac{f[z_1] - f[z_2]}{z_1 - z_2} & f[z_0, z_1, z_2] = \dfrac{f[z_0, z_1] - f[z_1, z_2]}{z_0 - z_2} \\
\vdots & \vdots & \vdots & \\
z_n & f[z_n] = f_n & f[z_{n-1}, z_n] = \dfrac{f[z_{n-1}] - f[z_n]}{z_{n-1} - z_n} & f[z_{n-2}, z_{n-1}, z_n] \quad \text{etc.}
\end{array}$$

Diagonal entries in this tableau appear in the Newton form of pn (z).
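The tableau translates directly into a short algorithm. A minimal Python/NumPy sketch (illustrative; the helper name `divided_differences` is mine), building the tableau column by column with the recurrence relation above and returning its diagonal entries $f[z_0], f[z_0, z_1], \dots, f[z_0, \dots, z_n]$, exactly the coefficients needed in the Newton form:

```python
import numpy as np

def divided_differences(z, f):
    """Return the diagonal of the divided-difference tableau:
    f[z_0], f[z_0, z_1], ..., f[z_0, ..., z_n] (distinct nodes assumed)."""
    z = np.asarray(z, dtype=float)
    d = np.array(f, dtype=float)       # zeroth-order column: d[j] = f[z_j]
    coef = [d[0]]
    for m in range(1, len(z)):
        # After this update, d[j] = f[z_{j-m}, ..., z_j] for j >= m, using
        # f[z_{j-m},...,z_j] = (f[z_{j-m},...,z_{j-1}] - f[z_{j-m+1},...,z_j]) / (z_{j-m} - z_j).
        d[m:] = (d[m - 1:-1] - d[m:]) / (z[:-m] - z[m:])
        coef.append(d[m])
    return np.array(coef)
```

Combined with `newton_eval` from the earlier sketch this gives a complete interpolation routine; adding one more data point only appends one new diagonal entry, matching the incremental construction above.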

Example

$n = 2$: $z_0 = 0$, $z_1 = 1$, $z_2 = 4$; $f_0 = a$, $f_1 = b$, $f_2 = c$.


$$\begin{array}{llll}
z_0 = 0 & f[z_0] = a & & \\
z_1 = 1 & f[z_1] = b & f[z_0, z_1] = \dfrac{a - b}{-1} = b - a & \\
z_2 = 4 & f[z_2] = c & f[z_1, z_2] = \dfrac{b - c}{-3} = \dfrac{c - b}{3} & f[z_0, z_1, z_2] = \dfrac{(b - a) - \frac{c - b}{3}}{-4} = \dfrac{a}{4} - \dfrac{b}{3} + \dfrac{c}{12}
\end{array}$$
so
$$p_2(z) = f[z_0] + f[z_0, z_1](z - z_0) + f[z_0, z_1, z_2](z - z_0)(z - z_1) = a + (b - a)z + \left(\frac{a}{4} - \frac{b}{3} + \frac{c}{12}\right) z(z - 1).$$
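As a quick numerical check of this example, one can feed the same data into the `divided_differences` sketch from above, with illustrative values for $a, b, c$ (my choice; any values would do):

```python
# Check the worked example with sample values a, b, c = 1, 2, 3:
a, b, c = 1.0, 2.0, 3.0
coef = divided_differences([0.0, 1.0, 4.0], [a, b, c])
print(coef)                      # diagonal: [a, b - a, a/4 - b/3 + c/12]
print(a / 4 - b / 3 + c / 12)    # agrees with coef[2]
```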

We may be interested in approximating a function $f(z)$ that is complicated to evaluate by a polynomial $p_n(z) \in P_n$: evaluate $f(z)$ at distinct points $\{z_j\}_{j=0}^{n}$, form the interpolating polynomial $p_n(z)$ with $p_n(z_j) = f(z_j)$, $j = 0 \to n$, and then approximate $f(z)$ by $p_n(z)$.

see diagram 20081209.M2AA3.1

Theorem

Let $p_n(z)$ interpolate $f(z)$ at $n+1$ distinct points $\{z_j\}_{j=0}^{n}$, $z_j \in \mathbb{C}$. Then the error $e(z) = f(z) - p_n(z)$ is such that
$$e(z) = f[z_0, z_1, \dots, z_n, z] \prod_{k=0}^{n} (z - z_k), \qquad z \neq z_j, \; j = 0 \to n.$$
(Note that $e(z_j) = 0$, $j = 0 \to n$.)

Proof

$p_n(z)$ interpolates $f(z)$ at $\{z_j\}_{j=0}^{n}$. We now add a new point, different from the points we already have: $z_{n+1} \neq z_j$, $j = 0 \to n$. The resulting polynomial $p_{n+1}(z) \in P_{n+1}$ interpolates $f(z)$ at $\{z_j\}_{j=0}^{n+1}$.

The Newton form of $p_{n+1}(z)$ is
$$p_{n+1}(z) = p_n(z) + f[z_0, z_1, \dots, z_n, z_{n+1}] \prod_{k=0}^{n} (z - z_k)$$
$$\Rightarrow f(z_{n+1}) = p_{n+1}(z_{n+1}) = p_n(z_{n+1}) + f[z_0, z_1, \dots, z_n, z_{n+1}] \prod_{k=0}^{n} (z_{n+1} - z_k)$$
$$\Rightarrow e(z_{n+1}) = f[z_0, z_1, \dots, z_n, z_{n+1}] \prod_{k=0}^{n} (z_{n+1} - z_k),$$


but $z_{n+1}$ is any point with $z_{n+1} \neq z_j$, $j = 0 \to n$; writing $z_{n+1} = z \neq z_j$, $j = 0 \to n$,
$$\Rightarrow e(z) = f[z_0, z_1, \dots, z_n, z] \prod_{k=0}^{n} (z - z_k). \quad \checkmark$$

For the above result to be useful, we need to bound $f[z_0, z_1, \dots, z_n, z]$.

We restrict ourselves from now on to the real case:
$$z_j = x_j \in \mathbb{R}, \; j = 0 \to n, \text{ distinct}; \qquad f(z) = f(x) \text{ a real function};$$
$$f[x_j] = f(x_j), \; j = 0 \to n, \quad \text{the zero order divided difference, based on one point.}$$

The first order divided difference is based on 2 points, e.g.
$$f[x_0, x_1] = \frac{f[x_0] - f[x_1]}{x_0 - x_1} = \frac{f(x_0) - f(x_1)}{x_0 - x_1}.$$

Mean Value Theorem

$$f(x_1) = f(x_0) + \underbrace{(x_1 - x_0)}_{\text{distance moved}} f'(\xi), \quad \text{where } \xi \text{ lies between } x_0 \text{ and } x_1;$$
this assumes that $f \in C^1[x_0, x_1]$ if $x_0 < x_1$ ($C^1[x_1, x_0]$ if $x_1 < x_0$).
$$\therefore f[x_0, x_1] = f'(\xi), \quad \text{the 1st order divided difference.}$$

Recall
$$e(z) = f(z) - p_n(z) = f[z_0, z_1, \dots, z_n, z] \prod_{k=0}^{n} (z - z_k), \qquad z \neq z_j, \; j = 0 \to n,$$
$$(e(z_j) = 0, \; j = 0 \to n).$$


Theorem

Let $f \in C^n[x_0, x_n]$, i.e. $f$ and its first $n$ derivatives are continuous on $[x_0, x_n]$, where for ease of exposition we have assumed that the real interpolation points are ordered, $x_0 < x_1 < \dots < x_n$.

Then $\exists\, \xi \in [x_0, x_n]$ such that
$$f[x_0, x_1, \dots, x_n] = \frac{1}{n!} f^{(n)}(\xi)$$
($n+1$ points, $n$th order divided difference).

Proof

Let $p_n \in P_n$ interpolate $f(x)$ at $x_i$, $i = 0 \to n$, and let
$$e(x) = f(x) - p_n(x) \;\Rightarrow\; e(x_i) = 0, \; i = 0 \to n,$$
$\therefore e(x)$ has at least $n+1$ zeros in $[x_0, x_n]$.

see diagram 20081210.M2AA3.1

Rolle's Theorem
$$\Rightarrow e'(x) \text{ has at least } n \text{ zeros in } [x_0, x_n],$$
$$\Rightarrow e''(x) \text{ has at least } n-1 \text{ zeros in } [x_0, x_n],$$
$$\vdots$$
$$\Rightarrow e^{(n)}(x) \text{ has at least one zero in } [x_0, x_n].$$
Let $\xi \in [x_0, x_n]$ be such that $e^{(n)}(\xi) = 0$.

$$e^{(n)}(x) = f^{(n)}(x) - p_n^{(n)}(x).$$
Recall the Newton form of $p_n(x)$:
$$p_n(x) = f[x_0, x_1, \dots, x_n]\, x^n + \dots \;\Rightarrow\; p_n^{(n)}(x) = n!\, f[x_0, x_1, \dots, x_n] \in \mathbb{R},$$
$$\therefore f^{(n)}(\xi) = p_n^{(n)}(\xi) = n!\, f[x_0, x_1, \dots, x_n],$$
hence the result.
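An illustrative numerical sanity check (mine, reusing the `divided_differences` sketch from earlier): for $f(x) = x^n$ we have $f^{(n)}(\xi)/n! = 1$ for every $\xi$, so the $n$th order divided difference should equal 1 at any distinct nodes, while for $f \in P_{n-1}$ it should vanish, as in the earlier Lemma.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
x = np.sort(rng.uniform(-1.0, 1.0, n + 1))   # n + 1 distinct random nodes

# f(x) = x^n: the nth divided difference is f^(n)(xi)/n! = 1 exactly.
print(divided_differences(x, x**n)[-1])                 # ~ 1.0

# f in P_{n-1}, e.g. f(x) = 3x^{n-1} + 2: it should be 0.
print(divided_differences(x, 3 * x**(n - 1) + 2)[-1])   # ~ 0.0
```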

We now combine the above theorems.


Theorem

Let $f \in C^{n+1}[a, b]$, and let $\{x_i\}_{i=0}^{n}$ be distinct interpolation points in the interval $[a, b]$. If $p_n \in P_n$ interpolates $f$ at $\{x_i\}_{i=0}^{n}$, then the error $e(x) = f(x) - p_n(x)$ satisfies
$$|e(x)| \leq \frac{1}{(n+1)!} \left| \prod_{i=0}^{n} (x - x_i) \right| \max_{a \leq y \leq b} \left| f^{(n+1)}(y) \right| \qquad \forall x \in [a, b].$$

Proof

The result is clearly true at the interpolation points $x = x_i$, $i = 0 \to n$: there $e(x_i) = 0$ and the product of factors $\prod_{i=0}^{n} (x - x_i)$ also vanishes, so the inequality reads $0 \leq 0$. $\checkmark$

1st Theorem $\Rightarrow$
$$e(x) = f[x_0, x_1, \dots, x_n, x] \prod_{k=0}^{n} (x - x_k), \qquad x \neq x_i, \; i = 0 \to n.$$

2nd Theorem $\Rightarrow$
$$e(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!} \prod_{k=0}^{n} (x - x_k) \quad \text{for some } \xi \in [a, b],$$
$$|e(x)| = \frac{1}{(n+1)!} \left| \prod_{k=0}^{n} (x - x_k) \right| \left| f^{(n+1)}(\xi) \right| \leq \frac{1}{(n+1)!} \left| \prod_{k=0}^{n} (x - x_k) \right| \max_{a \leq y \leq b} \left| f^{(n+1)}(y) \right|. \quad \checkmark$$

Let $\|g\|_\infty = \max_{a \leq x \leq b} |g(x)|$ ($\|.\|_\infty$ = infinity norm),
$$\therefore \|e\|_\infty \leq \frac{1}{(n+1)!} \left\| \prod_{i=0}^{n} (x - x_i) \right\|_\infty \left\| f^{(n+1)} \right\|_\infty.$$

Does $\|e\|_\infty \to 0$ as $n \to \infty$, assuming $f \in C^\infty[a, b]$?

Ex. 1

$[a, b] = \left[-\frac{1}{2}, \frac{1}{2}\right]$, $f(x) = e^x$. We know that
$$x, x_i \in \left[-\tfrac{1}{2}, \tfrac{1}{2}\right] \Rightarrow |x - x_i| \leq 1, \qquad \left\| \prod_{i=0}^{n} (x - x_i) \right\|_\infty = \left\| \prod_{i=0}^{n} |x - x_i| \right\|_\infty \leq 1 \quad \forall n,$$


$$\left\| f^{(n+1)} \right\|_\infty = \|e^x\|_\infty = e^{\frac{1}{2}},$$
$$\Rightarrow \|e\|_\infty \leq \frac{1}{(n+1)!}\, e^{\frac{1}{2}} \to 0 \text{ as } n \to \infty.$$
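This decay is easy to observe numerically. An illustrative sketch (mine; equispaced nodes are an arbitrary choice, the bound holds for any distinct nodes), reusing `divided_differences` and `newton_eval` from the earlier sketches:

```python
import numpy as np
from math import exp, factorial

xs = np.linspace(-0.5, 0.5, 2001)             # fine grid to estimate the sup-norm
for n in range(1, 9):
    x = np.linspace(-0.5, 0.5, n + 1)         # equispaced interpolation points
    coef = divided_differences(x, np.exp(x))  # Newton coefficients
    err = np.max(np.abs(np.exp(xs) - newton_eval(coef, x, xs)))
    print(f"n = {n}:  max error = {err:.2e}  <=  bound = {exp(0.5) / factorial(n + 1):.2e}")
```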

Ex 2.

General $[a, b]$, $f(x) = \cos x \;\Rightarrow\; \left\| f^{(n+1)} \right\|_\infty \leq 1$,
$$x, x_i \in [a, b] \Rightarrow |x - x_i| \leq b - a,$$
$$\Rightarrow \|e\|_\infty \leq \frac{1}{(n+1)!} (b - a)^{n+1} \to 0 \text{ as } n \to \infty.$$

Ex 3.

$f(x) = (1 + x)^{-1}$ on $[0, 1]$:
$$f'(x) = -(1 + x)^{-2}, \quad \dots, \quad f^{(n+1)}(x) = (-1)^{n+1} (n+1)!\, (1 + x)^{-(n+2)}.$$
Does $\|e\|_\infty \to 0$ as $n \to \infty$? Here $\left\| f^{(n+1)} \right\|_\infty = (n+1)!$, so the bound above only gives $\|e\|_\infty \leq 1$, and in fact
$$\|f - p_n\|_\infty \nrightarrow 0 \text{ as } n \to \infty,$$
see Sheet5, Q12.

Can we choose the interpolation points $\{x_i\}_{i=0}^{n}$ in a smart way?

Fix $[a, b]$, fix $n$, and suppose we are given $f$. Choose distinct interpolation points $\{x_i\}_{i=0}^{n} \subset [a, b]$ so as to minimise the product of factors:
$$\min_{\{x_i\}_{i=0}^{n}} \left\| \prod_{i=0}^{n} (x - x_i) \right\|_\infty. \tag{25}$$
Since $\prod_{i=0}^{n} (x - x_i)$ is a monic polynomial of degree $n+1$, i.e. of the form $x^{n+1} - q_n(x)$ with $q_n \in P_n$, this leads to
$$\min_{q_n \in P_n} \left\| x^{n+1} - q_n(x) \right\|_\infty. \tag{26}$$
Solve (26), i.e. find $q_n^* \in P_n$ such that
$$\left\| x^{n+1} - q_n^*(x) \right\|_\infty \leq \left\| x^{n+1} - q_n(x) \right\|_\infty \quad \forall q_n \in P_n.$$

If $x^{n+1} - q_n^*(x)$ has $n+1$ distinct zeros $\{x_i\}_{i=0}^{n}$ in $[a, b]$, then we have solved (25):
$$\min_{\{x_i\}_{i=0}^{n}} \left\| \prod_{i=0}^{n} (x - x_i) \right\|_\infty \;\Rightarrow\; \min_{q_n \in P_n} \left\| x^{n+1} - q_n(x) \right\|_\infty.$$
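Anticipating the theorem below, on $[-1, 1]$ the minimising nodes turn out to be the $n+1$ zeros of the Chebyshev polynomial $T_{n+1}$, $x_i = \cos\left(\frac{(2i+1)\pi}{2(n+1)}\right)$, for which $\prod_{i=0}^{n}(x - x_i) = 2^{-n} T_{n+1}(x)$ has sup-norm $2^{-n}$. An illustrative sketch (mine; helper name my own) comparing equispaced and Chebyshev nodes:

```python
import numpy as np

def node_product_norm(nodes, grid):
    """Estimate || prod_i (x - x_i) ||_inf by sampling on a fine grid."""
    return np.max(np.abs(np.prod(grid[:, None] - nodes[None, :], axis=1)))

grid = np.linspace(-1.0, 1.0, 5001)
for n in [4, 8, 16]:
    equi = np.linspace(-1.0, 1.0, n + 1)
    cheb = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))  # zeros of T_{n+1}
    print(f"n = {n:2d}: equispaced {node_product_norm(equi, grid):.2e}, "
          f"Chebyshev {node_product_norm(cheb, grid):.2e}, 2^-n = {2.0**-n:.2e}")
```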


5 Best Approximation in $\|.\|_\infty$

(Best approximation in the uniform sense, or "Minimax" approximation.)

Given $g \in C[a, b]$, find $q_n^* \in P_n$ such that
$$\|g - q_n^*\|_\infty \leq \|g - q_n\|_\infty \quad \forall q_n \in P_n \;\Longleftrightarrow\; \|g - q_n^*\|_\infty = \min_{q_n \in P_n} \left\{ \max_{a \leq x \leq b} |g(x) - q_n(x)| \right\}.$$

Theorem

Let $g \in C[a, b]$ and $n \geq 0$. Suppose $\exists\, q_n^* \in P_n$ and $n+2$ distinct points $\{x_j^*\}_{j=0}^{n+1}$, where $a \leq x_0^* < x_1^* < \dots < x_n^* < x_{n+1}^* \leq b$, such that
$$g(x_j^*) - q_n^*(x_j^*) = (-1)^j \sigma \|g - q_n^*\|_\infty, \qquad j = 0 \to n+1, \tag{27}$$
where $\sigma = +1$ or $-1$. Then $q_n^* \in P_n$ is the Best Approximation to $g$ from $P_n$ in $\|.\|_\infty$, i.e.
$$\|g - q_n^*\|_\infty \leq \|g - q_n\|_\infty \quad \forall q_n \in P_n.$$

Example: $n = 3$, $\sigma = +1$ and $E = \|g - q_n^*\|_\infty$;
see diagram 20081211.M2AA3.1
(5 alternating extremes).

Proof

Let $E = \|g - q_n^*\|_\infty$. If $E = 0$ then $q_n^* = g$ is the best approximation. Assume $E > 0$ and suppose $\exists\, q_n \in P_n$ doing better than $q_n^*$, i.e.
$$\|g - q_n\|_\infty < \|g - q_n^*\|_\infty = E.$$

Consider $q_n^* - q_n \in P_n$ at the $n+2$ points $\{x_j^*\}_{j=0}^{n+1}$:
$$q_n^*(x_j^*) - q_n(x_j^*) = \left[ q_n^*(x_j^*) - g(x_j^*) \right] + \left[ g(x_j^*) - q_n(x_j^*) \right] = (-1)^{j+1} \sigma E + \gamma_j, \quad |\gamma_j| < E,$$
$$\therefore \operatorname{sign}\left( (q_n^* - q_n)(x_j^*) \right) = \operatorname{sign}\left( (-1)^{j+1} \sigma E \right), \qquad j = 0 \to n+1,$$


$\therefore q_n^* - q_n \in P_n$ changes sign at least $n+1$ times $\Rightarrow q_n^* - q_n \in P_n$ has $n+1$ distinct zeros
$$\overset{\text{FTA}}{\Rightarrow} q_n^* - q_n \equiv 0 \;\Rightarrow\; q_n = q_n^*,$$
a contradiction to $\|g - q_n\|_\infty < \|g - q_n^*\|_\infty$.

$\therefore q_n^* \in P_n$ is the best approximation.

A polynomial satisfying condition (27) in the above theorem is said to have the Equioscillation Property (or the error $g - q_n^*$ is said to have the equioscillation property; note that $q_n^*$ may degenerate and have degree $< n$, see Sheet5, Q10). The above theorem is one half of the Chebyshev Equioscillation Theorem:

Let $g \in C[a, b]$ and $n \geq 0$. Then $\exists$ a unique $q_n^* \in P_n$ such that
$$\|g - q_n^*\|_\infty \leq \|g - q_n\|_\infty \quad \forall q_n \in P_n,$$
and hence it satisfies (27).

Proof

Omitted (straightforward apparently...)

Construction of $q_n^*$ is difficult in general, which is why we study best least squares approximation and interpolation. However, for $g(x) = x^{n+1}$ it is easy to construct $q_n^*$.

Theorem

Let $[a, b] \equiv [-1, 1]$ and consider $g(x) = x^{n+1}$. Then the best approximation to $x^{n+1}$ from $P_n$ in $\|.\|_\infty$ on $[-1, 1]$ is
$$q_n^*(x) = x^{n+1} - 2^{-n} T_{n+1}(x),$$
where $T_{n+1}(x)$ is the Chebyshev polynomial of degree $n+1$.

Proof

Recall $T_n(x) = \cos\left(n \cos^{-1} x\right)$ for $n \geq 0$; remember the change of variable
$$\theta = \cos^{-1} x \;\Leftrightarrow\; x = \cos\theta, \qquad [-1, 1] \leftrightarrow [0, \pi], \qquad T_n(x) = \cos n\theta,$$
$$\Rightarrow T_{n+1}(x) = 2x\, T_n(x) - T_{n-1}(x), \; n \geq 1, \qquad T_0(x) = 1, \; T_1(x) = x,$$
$$\Rightarrow T_{n+1}(x) = 2^n x^{n+1} + \dots$$
$$\therefore q_n^*(x) = x^{n+1} - 2^{-n} T_{n+1}(x) \in P_n.$$


$\therefore$ the error
$$x^{n+1} - q_n^*(x) = 2^{-n} T_{n+1}(x) = 2^{-n} \cos\left((n+1)\theta\right).$$
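A final illustrative sketch (mine): generate $T_{n+1}$ via the recurrence above and check that the error $x^{n+1} - q_n^*(x) = 2^{-n} T_{n+1}(x)$ takes the values $\pm 2^{-n}$ with alternating sign at the $n+2$ points $x_j = \cos\left(j\pi/(n+1)\right)$, $j = 0 \to n+1$, where $\cos((n+1)\theta) = \pm 1$; this is exactly the equioscillation property (27) with which the best approximation was characterised.

```python
import numpy as np

def chebyshev_T(n, x):
    """T_n(x) via T_{k+1} = 2x T_k - T_{k-1}, with T_0 = 1, T_1 = x."""
    t_prev, t = np.ones_like(x), x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t

n = 4
xj = np.cos(np.arange(n + 2) * np.pi / (n + 1))   # extremes of T_{n+1} on [-1, 1]
err = 2.0**-n * chebyshev_T(n + 1, xj)            # x^{n+1} - q_n^*(x) at the x_j
print(err)   # alternates between +2^{-n} and -2^{-n}: equioscillation (27)
```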
