Top Banner
Introduction to Functional Analysis Prof. Yoav Benyamini Winter 2009 These are rough notes. Please inform me of any mathematical errors, typos or unclear explanations -there are certainly many of them.
60
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: FA09-5

Introduction to Functional Analysis

Prof. Yoav Benyamini

Winter 2009

These are rough notes. Please inform me of any mathematical errors,typos or unclear explanations -there are certainly many of them.

Page 2: FA09-5

1

Contents

1. Introduction 22. Normed spaces and preliminaries 32.1. Normed spaces 32.2. Finite dimensional spaces 42.3. Separability 62.4. Completeness 82.5. Integration 93. Hilbert spaces 123.1. Inner product spaces 123.2. Convex sets and the orthogonal projection 133.3. Orthonormal systems 153.4. Linear functionals 183.5. Weak convergence 204. Fourier series 254.1. Basic definitions 254.2. Fejer’s theorem 284.3. Norm convergence 294.4. Pointwise convergence 314.5. Additional results 345. Linear operators 355.1. Basic notions 355.2. Invertible operators 375.3. Compact operators 405.4. Self-adjoint operators on Hilbert space 426. Spectral theory of self-adjoint compact operators 456.1. The spectrum of self adjoint operators 456.2. The spectral theorem 476.3. The min-max formula 496.4. Application to differential equations 507. The Fourier transform 537.1. Basic properties 537.2. The Fourier transform on L2 557.3. The Hermit functions 567.4. Plancherel’s theorem 577.5. The Radon transform 58

Page 3: FA09-5

2

1. Introduction

Functional analysis studies metric vector spaces (mainly infinite di-mensional) over R or C and continuous (linear or non-linear) mappingsbetween them. These spaces are the natural framework to study manyproblems, and functional analysis provides general tools that can beused to study these problems.

Here are two motivating examples.In linear algebra we studied systems of linear equations and devel-

oped methods to solve them and to give information about the solv-ability, the number of solutions etc. This was done in the frameworkof finite dimensional spaces and linear operators between them.

Similarly, we can view the study of a linear differential equation,for example, a(x)y′′ + b(x)y = f as the study of the linear operatorDy = a(x)y′′ + b(x)y which acts on the (infinite dimensional) space oftwice differentiable functions.

The appropriate spaces as domain and target of the operator willdepend on the differential operator, but general facts on spaces andoperators will be useful for the study of specific types of differentialoperators.

In Infi we studied maximum and minimum problems, starting withexistence theorems (Weierstrass) and continuing with methods of solu-tion for real valued functions of one or several variables. But one mayalso be interested in optimization problems of real valued functionswhen the variable is a function, for example, consider the “energy”function F (f) =

∫Ω|∇f |2 and the problem of minimizing F (f) over a

given set of functions f . Again, the general framework is a suitableinfinite dimensional space and a (continuous or differentiable) functionF defined on a subset of this space.

The course will cover five main topics1. Basic notions such as normed spaces, and a brief exposition of

elements of measure theory and integration .2. Hilbert spaces.3. Fourier series.4. Linear operators and basic spectral theory.5. Fourier transform.

Page 4: FA09-5

3

2. Normed spaces and preliminaries

2.1. Normed spaces.

Definition 2.1. Let E be a vector space over R or C. A norm on Eis a real valued function on E, denoted by ‖ · ‖, satisfying:

(i) Positivity: ‖x‖ ≥ 0 for every x ∈ E and ‖x‖ = 0 iff x = 0.

(ii) Homogeneity: ‖λx‖ = |λ| ‖x‖ for every scalar λ.

(iii) Triangle inequality: ‖x + y‖ ≤ ‖x‖+ ‖y‖ for every x, y ∈ E.

Note that a norm induces a metric on E: the distance between xand y is ‖x − y‖. Using this metric we can talk on balls, open andclosed sets, convergence of sequences, Cauchy sequences, separability,compactness, etc.

Definition 2.2. A complete normed space is called a Banach space.

Examples 2.3. (i) Denote by l∞ the vector space of bounded scalarsequences. The norm of x = (x1, x2, . . .) is defined by ‖x‖∞ = sup |xn|.

More generally, if Γ is any set l∞(Γ) is the space of bounded functionsf : Γ → R (or C) with the norm ‖f‖∞ = sup

γ∈Γ|f(γ)|. With this notation

l∞ is just l∞(N). Convergence in l∞(Γ) is uniform convergence on Γ.

(ii) By c0 we denote the subspace of l∞ consisting of the sequences thatconverge to 0. One checks easily that c0 is a closed subspace of l∞.

(iii) Denote by l1 the vector space of scalar sequences x = (x1, x2, . . .)satisfying

∑ |xn| < ∞. The norm is defined by ‖x‖1 =∑ |xn|.

(iv) Denote by C(a, b) the vector space of the continuous scalar-valuedfunctions on the closed interval [a, b]. The norm is defined by ‖f‖∞ =max |f(t)|. More generally, if K is a compact set in Rn (or a gen-eral compact metric space), C(K) denotes the space of scalar-valuedcontinuous functions on K with the norm ‖f‖∞ = max

k∈K|f(k)|.

Again, convergence in norm is just uniform convergence.

(v) For 1 < p < ∞ denote by lp the vector space of all scalar sequencesx = (x1, x2, . . .) satisfying

∑ |xn|p < ∞. The norm is defined by

‖x‖p = (∑ |xn|p)1/p.

We need to check that this set is a vector space (i.e., closed underaddition) and that the formula gives a norm, and the only nontriv-ial part is to prove the triangle inequality. For this we shall need touse an important inequality (which generalizes the Cauchy-Schwartzinequality). If 1 ≤ p ≤ ∞, its conjugate index is q = p

p−1. Note that

1p

+ 1q

= 1.

Page 5: FA09-5

4

Holder’s inequality:∑ |xnyn| ≤ ‖x‖p‖y‖q for any two sequences x =

(x1, x2, . . .) ∈ lp and y = (y1, y2, . . .) ∈ lq.

Proof. Wlog xn and yn are non-negative. By homogeneity we mayassume that ‖x‖p = ‖y‖q = 1 and we need to show that

∑xnyn ≤ 1.

One checks easily that ab ≤ ap

p+ bq

qfor all a, b ≥ 0. (For fixed b

differentiate f(a) = ap

p+ bq

q− ab to see that the minimum is 0). Thus

xnyn ≤ xpn

p+ yq

n

q, and now sum over n. ¤

To deduce the triangle inequality assume (as we may) that the vec-tors are non-negative and use Holder’s inequality to obtain

‖x + y‖pp =

∑(xn + yn)p =

∑(xn + yn)p−1xn +

∑(xn + yn)p−1yn

≤(∑

(xn + yn)(p−1)q)1/q (∑

|xn|p)1/p

+(∑

(xn + yn)(p−1)q)1/q (∑

|yn|p)1/p

= ‖x + y‖p/qp ‖x‖p + ‖x + y‖p/q

p ‖y‖p

because (p−1)q = p, hence∑

(xn+yn)(p−1)q =∑

(xn+yn)p = ‖x+y‖pp.

Dividing by ‖x + y‖p/qp and using p− p

q= 1 gives the result.

The space of finite sequences of length n under the lp norm is denotedby lnp (and similarly ln∞).

2.2. Finite dimensional spaces.

We denote the closed ball of radius R with center x by B(x, R). Theclosed ball with center at the origin and radius 1 is called the unit ballof E and is denoted by B(E) or BE. For example, the unit ball of ln2is actually a ball, and the unit ball of ln∞ is an n-dimensional cube.

Definition 2.4. Two norms ‖·‖ and |||·||| on the same space E are calledequivalent if there are constants c, C > 0 s.t. c‖x‖ ≤ |||x||| ≤ C‖x‖ forevery x ∈ E.

Geometrically this means that cB|||·||| ⊂ B‖·‖ ⊂ C B|||·|||. We can thus“see” from the picture that the l22 and l2∞ norms are equivalent andthat ‖x‖∞ ≤ ‖x‖2 ≤

√2‖x‖∞. This is a special case of the following

simple but important theorem.

Theorem 2.5. All norms on Rn are equivalent to each other.

Proof. Let ej be the standard unit vector basis of Rn. Fix a norm onRn and let C = max ‖ej‖. If x =

∑ajej then by the triangle inequality

‖x‖ ≤ ∑ |aj|C = C‖x‖1. In particular it follows that ‖·‖ is continuous

Page 6: FA09-5

5

on the compact set S1 = ‖x‖1 = 1 and attains its minimum on S1 –and we denote it by c > 0. Thus ‖x‖ ≥ c‖x‖1 for every x ∈ S1, and byhomogeneity this holds for every x.

It follows that every norm is equivalent to the ln1 norm, hence allnorms are equivalent to each other. ¤

The unit ball is a closed, bounded, symmetric convex set with non-empty interior.

In finite dimensional spaces the notion of “non-empty interior” doesnot depend (by the theorem) on the norm, and the converse is alsotrue. Indeed, if K is such a set in Rn, it defines a norm by the formula‖x‖ = infλ > 0 : x

λ∈ K. (The triangle inequality follows from the

convexity of K.)In infinite dimensional spaces, one needs to ensure that ‖x‖ = 0 iff

x = 0, hence to also assume that infλ > 0 : xλ∈ K > 0 whenever

x 6= 0.Thus, at least in finite dimensional spaces, the theory of normed

spaces is “equivalent” to theory of closed, bounded, symmetric convexsets with non-empty interior.

The next lemma is true in every normed space.

Lemma 2.6 (Riesz’ lemma). If F is a closed proper subspace of E,then for every ε > 0 there is x ∈ E with ‖x‖ = 1 and d(x, F ) > 1− ε.

Proof. Fix z ∈ E \ F and choose y ∈ F with (1− ε)‖z − y‖ < d(z, F ).Take x = z−y

‖z−y‖ . Then d(x, F ) = d(z, F )/‖z − y‖ > 1− ε. ¤

*********end of 2 hours*********The unit ball of a finite dimensional space is a compact set. It turns

out that the converse is also true.

Theorem 2.7 (Riesz’ Theorem). A normed space E is finite dimen-sional iff B(E) is compact.

Proof. Assume E is infinite dimensional, and we shall find an infinitesequence xn ∈ B(E) s.t. ‖xn − xm‖ > 1

2for all n 6= m. It follows that

xn does not have a convergent subsequence and B(E) is not compact.Choose any x1 with ‖x1‖ = 1. We proceed inductively. Assume

we have found x1, . . . , xn and let F = spanx1, . . . , xn. Being finitedimensional, F is a proper subspace of E, hence by Riesz’ lemma withε = 1

2there is a xn+1 with ‖xn+1‖ = 1 and d(xn+1, F ) > 1

2. In particular

‖xn+1 − xj‖ > 12

for every j ≤ n. ¤

Page 7: FA09-5

6

2.3. Separability.

We shall use the simple fact that a space E is separable when there isa countable set D that spans a dense subset of E. Indeed, a countabledense set in E is then obtained by taking all finite linear combinationswith rational coefficients of elements of D.

Finite dimensional normed spaces are, of course separable. Let uscheck the other examples.

c0 and lp for 1 ≤ p < ∞ are separable. The unit vectors en span a

dense subset.

l∞ is not separable. For each subset A ⊂ N let xA =

1 n ∈ A

0 n 6∈ A.

There are uncountably many subsets of N, and ‖xA − xB‖ = 1 forevery A 6= B, hence l∞ is non-separable.

C(a, b) is separable. This follows from

Theorem 2.8 (Weierstrass’ theorem). Every continuous function on[a, b] is the uniform limit of a sequence of polynomials.

It follows that the monomials xn span a dense subset of C(a, b).

C(K) for a general compact metric space is separable. This followsfrom the Stone-Weierstrass theorem, an important generalization ofWeierstrass’ theorem. This theorem holds for real valued functionsonly, but the separability for complex C(K) follows easily from thereal case.

A subspace A ⊂ C(K) is called a lattice if it is closed under thelattice operations max and min. (This is equivalent to requiring it isclosed under absolute value, because max(f, g) = (f + g)/2 + |f − g|/2with a similar formula for the min.) The subspace A is called an algebraif it is closed under multiplication.

Theorem 2.9 (Stone-Weierstrass). If A ⊂ C(K) is a lattice or aclosed subalgebra of real C(K) containing 1 and separating points, thenA = C(K).

Proof. Once we prove the theorem for lattices the algebra case will also follow becauseevery closed subalgebra of C(K) containing 1 is a lattice. Indeed, let A be a subalgebra,fix f ∈ A and assume, as we may, that f takes its values in [−1, 1]. By Weierstrass’theorem there is a sequence of polynomials pn so that pn(t) → |t| uniformly on [−1, 1],hence pn(f(k)) → |f(k)| uniformly on K. But pn(f) ∈ A because A is a closed algebracontaining 1, hence also |f | ∈ A.

Thus assume A is a lattice and fix F ∈ C(K). Since A is closed it is enough to showthat for every ε > 0 there is a f ∈ A with ‖f − F‖ < ε. We shall use the following

Claim. For every k ∈ K there is a fk ∈ A with fk(k) = F (k) and fk ≤ F + ε.

Page 8: FA09-5

7

Proof of Claim. Given x ∈ K there is a function gx ∈ A with gx(k) = F (k) and gx(x) =F (x). This is trivial for x = k, and when x 6= k find h ∈ A with h(x) 6= h(k) and putgx = a + bh ∈ A for suitable scalars a, b. Then gx ∈ A because A is a linear subspace and1 ∈ A.

Since F, gx are continuous there is an open neighborhood Ux of x s.t. gx < F + ε inUx.

The Ux’s are an open cover of K, so choose a finite subcover U i (with associatedfunctions gi), and put fk = min(gi) ∈ A (because A is a lattice). Given x ∈ K, choose iso that x ∈ U i, and then fk(x) ≤ gi(x) ≤ F (x) + ε. 2

A similar argument now finishes the proof:For each k ∈ K let fk be as in the claim, and choose an open neighborhood Vk of k

such that fk ≥ F − ε on Vk. Then choose a finite subcover V i with associated functionsf i and put f = max(f i). By the definition of the f i’s

f(x) = max f i(x) ≤ F (x) + ε

for every x, and if x ∈ V i, then f(x) ≥ f i(x) ≥ F (x)− ε. ¤

Remark. The theorem is false in the complex case. For example, whenD is the closed unit disk in the complex plane and A is the subalgebraof C(D) consisting of functions analytic in the interior.

The theorem holds in the complex case if we also assume that A isself-adjoint, i.e., f ∈ A for every f ∈ A. This follows directly from thereal case.

Examples 2.10. (i) If K ⊂ Rn is compact, then the polynomials aredense in C(K).

(ii) If K is a compact metric space, then K is separable, so let kmbe a countable dense set in K and for each m,n define

fm,n(k) =

1n− d(k, km) d(k, km) ≤ 1

n

0 d(k, km) > 1n

.

The fn,m’s clearly separate points, and the subspace A generated bythe countable set of all products of the fm,n’s and 1 is an algebra thatcontains 1 and separates points, hence, by Stone-Weierstrass it is densein C(K).

(iii) If K = K1× . . .×Kn, then the linear combinations of functions ofthe form f1(x1) · . . . ·fn(xn) are dense in C(K). Indeed, these functionsspan an algebra that satisfies the conditions of the Stone-Weierstrasstheorem.

*********end of 3 hours*********

Page 9: FA09-5

8

2.4. Completeness.

Every metric space K has a “completion” K, a complete metric spacecontaining K as a dense subset. K is determined uniquely: if L isanother complete metric space containing K as a dense subset thenthere is an isometry f of K onto L which is the identity on K.

The standard construction of K is as the space of all equivalentclasses of Cauchy sequences with the natural metric. When this con-struction is applied to a normed space E, the resulting completion Eis a Banach space.

All the examples we gave of normed spaces are actually Banachspaces. We first show that this is so for lp. Similar proofs work for allthe other sequence spaces. (Exercise: prove this for the other spaces!)

Note that a Cauchy sequence is bounded. In fact the scalar sequence‖xn‖ converges because

∣∣‖xn‖ − ‖xm‖∣∣ ≤ ‖xn − xm‖ → 0.

Theorem 2.11. lp is a Banach space.

Proof. Assume that xn = (x1n, . . . , xj

n, . . . , ) is a Cauchy sequence in lp.The proof that it converges proceeds in three typical steps:

Step 1: The limit yj = limn→∞

xjn exists for each fixed j. Indeed, since

|xjn − xj

m| ≤ ‖xn − xm‖p the sequence xjnn is a Cauchy sequence of

scalars, hence convergent. Put y = (y1, y2, . . .).

Step 2: y is in lp and ‖y‖p ≤ lim ‖xn‖p.Indeed, Fix N and estimate∑

j≤N

|yj|p = limn→∞

∑j≤N

|xjn|p ≤ lim ‖xn‖p

p.

This holds for every N , hence y is in lp and ‖y‖p ≤ lim ‖xn‖p.

Step 3: A similar argument as in Step 2 shows that for each fixed M

‖xM − y‖p ≤ lim ‖xM − xn‖p.

But xn is a Cauchy sequence, hence for every ε > 0 there is a N s.t.‖xM − xn‖p < ε whenever M,n > N . In particular, if M > N then

‖xM − y‖p ≤ lim ‖xM − xn‖p ≤ ε.

¤Theorem 2.12. C(K) is a Banach space.

Proof. By the same argument as in the previous theorem l∞(Γ) is com-plete for every set Γ. Take Γ = K, and note that C(K) is a subspace ofl∞(Γ) - and it is closed: convergence in l∞(Γ) is uniform convergence,and the uniform limit of continuous functions is continuous.

Finally, a closed subset of a complete metric space is complete. ¤

Page 10: FA09-5

9

2.5. Integration.

In this section we describe briefly the basic notions of Lebesgue measureand integration. We do not give any details, and just say enough to beable to apply these notions when necessary. We write the definitionsfor R, but the results for Rn are analogous, replacing intervals by cubes.

Definition 2.13. We say that subset A ⊂ R has measure 0 (and write|A| = 0) if for every ε > 0 there is a cover A ⊂ ∪In by open intervalssatisfying

∑ |In| < ε.

Here are some examples and easy properties.

(i) If |A| = 0 and B ⊂ A then |B| = 0.

(ii) A point has measure zero, and if |En| = 0 for n = 1, 2, . . . then| ∪En| = 0. Thus, for example, |Q| = 0 and the same is true for everycountable set. (In Rn sets of lower dimension, such as lines or planesin R3, have measure zero.)

(iii) The Cantor set is an uncountable null set.

(iv) Intervals are not null. In fact, [a, b] ⊂ ∪In implies b− a ≤ ∑ |In|.(Hint: Show you can use open intervals, then apply Heine-Borel to passto a finite cover. Prove for finite covers.)

Definition 2.14. (i) If A ⊂ B are sets with |B \A| = 0 and a propertyholds for every point of A, we say that it holds almost everywhere inB (abbreviated a.e.).

(ii) We shall say that two functions are equal a.e. if |f 6= g| = 0.This defines an equivalence relation on functions.

For example: f ∼ g when they differ in a countable set, so D(x) = 1a.e.

(iii) We say that fn → f a.e. if |t : fn(t) 6→ f(t)| = 0.(This is, ofcourse, a weaker condition then pointwise convergence, but as far asintegrals will be concerned, it can be thought of as pointwise conver-gence.)

Definition 2.15. A measurable function is the a.e. limit of a sequenceof continuous functions with compact support.

The family of measurable functions is closed under taking sums,products, absolute value, a.e. limits, etc.

The Lebesgue Integral. We shall not define the Lebesgue Integral∫Ω

f and just say that it is an extension of the Riemann integral, de-fined for a larger family of measurable functions then just the Riemannintegrable functions. It has the following properties:

Page 11: FA09-5

10

(i) If f = g a.e. and f is integrable, then so is g and∫

f =∫

g.

(Example:∫ 1

0D(x) = 1!)

(ii) If f is integrable, so is |f |.(iii) If f ≥ 0 and

∫f = 0 then f = 0 a.e.

(iv) If f is integrable, then there are continuous functions fn withbounded support which converge a.e. to f so that

∫ |fn| ≤∫ |f | for all

n and so that∫

fn →∫

f .

(v) Under suitable conditions double integrals are iterated integrals:

Theorem 2.16 (Fubini). If f is Lebesgue integrable in the plane then∫ ∫

R2

f(x, y)dxdy =

R

(∫

Rf(x, y)dx

)dy.

Moreover, if f is also non-negative the integrals are equal also whenthey are infinite.

(vi) The Lebesgue integral behaves very nicely with respect to a.e.limits. The following theorem summarizes the main results in thisdirection.

Theorem 2.17. (i) (Fatou’s lemma): If fn are Lebesgue integrable andfn → f a.e. then f is also Lebesgue integrable and

∫ |f | ≤ lim inf∫ |fn|.

(ii) (Lebesgue’s dominated convergence theorem): If fn are Lebesgueintegrable and fn → f a.e., and if there is an integrable function gwith g ≥ |fn| a.e. for all n, then f is also Lebesgue integrable and∫

fn →∫

f .

(iii) (Lebesgue’s monotone convergence theorem): If fn ≥ 0 is a mono-tone increasing sequence of integrable functions such that fn → f a.e.,then

∫fn →

∫f .

*********end of 5 hours********* We finish this section with the introduction of an important family

of spaces, the Lp spaces for 1 ≤ p ≤ ∞.For 1 ≤ p < ∞ the space Lp(R) is the space of all equivalence classes

of measurable functions which are p-integrable, i.e.,∫ |f |p < ∞ with

the norm ‖f‖p = (∫ |f |p)1/p. (Note the abuse of notation: we use the

same symbol f for the equivalence class and for a function in this class.This will usually cause no difficulties – but sometimes we may need tobe more careful.)

The triangle inequality follows from Holder’s inequality for integrals:∫∞0|fg| ≤ ‖f‖p‖g‖q.

Page 12: FA09-5

11

The spaces Lp(Ω) with Ω = R+, [a, b],Rn or “reasonable” subsetsΩ ⊂ Rn such as cubes are defined similarly.

We shall not give the proof that the Lp spaces are complete. Theproof uses Theorem 2.17 and the following theorem which we shall notprove.

Theorem 2.18. If fn is a Cauchy sequence in Lp then it has a subse-quence that converges almost everywhere.

For p = ∞ the space L∞(Ω) consists of all equivalent classes of essen-tially bounded functions, i.e. functions which are equivalent to boundedfunctions. The norm of f ∈ L∞(Ω) is the essential supremum of |f |,namely

‖f‖∞ = infg=f a.e.

supt∈Ω

|g(t)| = infA⊂Ω ;|A|=0

supt∈Ω\A

|f(t)|

and the space is complete.

Exercise. Show that L∞(Ω) is non-separable.It can be shown that for p < ∞ the set of continuous functions

with bounded support is dense in Lp. This implies the separability ofthese spaces. We show this for Lp(0, 1) (Exercise: Prove that Lp(R) isseparable).

Let D be a countable dense set in C(0, 1) (with respect to the ‖ · ‖∞norm!), and we shall show that it is also dense in Lp(0, 1) with respectto the ‖ · ‖p norm. To this end we use the estimate

‖h‖p ≤ ‖h‖∞which holds for every measurable bounded function h on [0, 1]. Indeed,

‖h‖p =

(∫ 1

0

|h|p)1/p

≤ esssup|h| = ‖h‖∞.

Now, given f ∈ Lp(0, 1), choose a sequence of continuous functionsgn with ‖gn−f‖p < 1

n. For each n choose fn ∈ D with ‖fn−gn‖∞ < 1

n.

Then also ‖fn − gn‖p < 1n, and

‖fn − f‖p ≤ ‖fn − gn‖p + ‖gn − f‖p ≤ 2

n→ 0

Page 13: FA09-5

12

3. Hilbert spaces

3.1. Inner product spaces.

Definition 3.1. Let E be a vector space over R or C. An inner producton E is a scalar valued function 〈·, ·〉 on E × E satisfying

(i) Positivity: 〈x, x〉 ≥ 0 for every x ∈ E and 〈x, x〉 = 0 iff x = 0.

(ii) Symmetry: 〈y, x〉 = 〈x, y〉.(iii) Linearity: 〈λ1x1 + λ2x2, y〉 = λ1〈x1, y〉+ λ2〈x2, y〉.

The inner product induces a norm on E by ‖x‖ = 〈x, x〉 12 . To see

that this is a norm we shall use

Lemma 3.2 (Cauchy-Schwartz inequality). |〈x, y〉| ≤ ‖x‖ ‖y‖.Proof. Multiplying x by eiθ for an appropriate θ we may assume that〈x, y〉 is positive. Consider the non-negative polynomial of the realvariable t

p(t) = ‖x + ty‖2 = 〈x + ty, x + ty〉 = ‖x‖2 + 2t〈x, y〉+ t2‖y‖2.

Its discriminant must be non-positive, and this is the required in-equality. ¤Corollary 3.3. ‖ · ‖ is a norm.

Proof. The only thing to check is the triangle inequality, and indeed

‖x + y‖2 = 〈x + y, x + y〉 = ‖x‖2 + 2<〈x, y〉+ ‖y‖2

≤ ‖x‖2 + 2‖x‖‖y‖+ ‖y‖2 = (‖x‖+ ‖y‖)2

¤Definition 3.4. A Hilbert space is a complete inner product space.We shall usually denote Hilbert spaces by H.

Examples 3.5. Rn,Cn and l2 are Hilbert spaces under the inner prod-uct 〈x, y〉 =

∑xnyn for x = (x1, x2, . . .) and y = (y1, y2, . . .).

L2 is a Hilbert space under the inner product 〈f, g〉 =∫

fg.

*********end of 6 hours*********

Note that the inner product is continuous with respect to the norm.Indeed, if xn → x and yn → y then

〈xn, yn〉 − 〈x, y〉 = 〈xn − x, yn〉+ 〈x, yn − y〉 → 0

because by Cauchy-Schwartz and since yn is bounded

|〈xn − x, yn〉| ≤ ‖xn − x‖ ‖yn‖ → 0 ; |〈x, yn − y〉| ≤ ‖x‖ ‖yn − y‖.

Page 14: FA09-5

13

Proposition 3.6 (Parallelogram law). A norm on E is induced by aninner product iff it satisfies the “Parallelogram law”

‖x + y‖2 + ‖x− y‖2 = 2‖x‖2 + 2‖y‖2.

Proof. In an inner product space expand the left hand side to obtain(‖x‖2 + ‖y‖2 + 2<〈x, y〉) +

(‖x‖2 + ‖y‖2 − 2<〈x, y〉)

which is the right hand side.Conversely, if the Parallelogram law holds put

〈x, y〉 = 14

(‖x + y‖2 − ‖x− y‖2).

in the real case and

〈x, y〉 = 14

(‖x + y‖2 − ‖x− y‖2 + i‖x + iy‖2 − i‖x− iy‖2).

in the complex case. These formulas define an inner product whichinduces the given norm. (We shall not give the proof.) ¤

The theorem gives a simple criterion to check whether a norm isinduced by an inner product. For example, if p 6= 2 then lp is not a

Hilbert space. Indeed, ‖e1 + e2‖2p + ‖e1 − e2‖2

p = 2 · 2 2p 6= 4. (Exercise:

Show that C(0, 1) is not a Hilbert space).

Definition 3.7. Two vectors x and y are called orthogonal if 〈x, y〉 = 0.We sometimes use the notation x ⊥ y.

A direct computation show that when x and y are orthogonal theysatisfy the “Pythagoras theorem” ‖x + y‖2 = ‖x‖2 + ‖y‖2.

3.2. Convex sets and the orthogonal projection.

When A is a compact subset of a metric space K, then for every x ∈K there is a point in y ∈ K nearest to x. Indeed, choose yn ∈ As.t. d(x, yn) → d(x,A) and take for y the limit of some convergentsubsequence. A very important fact is that a similar result holds forcertain subsets (closed and convex subsets) in Hilbert space even whenthey are not compact.

Definition 3.8. A subset A of a vector space is called convex if forevery x1, x2 ∈ A and λ1, λ2 ≥ 0 with λ1 + λ2 = 1 also λ1x + λ2x2 ∈ A.

Inductively, if A is convex then∑

λjxj ∈ A whenever xj is a finitesubset of A and λj ≥ 0 with

∑λj = 1.

Theorem 3.9. Let A be a closed and convex subset of a Hilbert spaceH. Then for every x ∈ H there is a unique nearest point to x in A.

Page 15: FA09-5

14

Proof. : Put d = d(x,A) and choose yn ∈ A with ‖yn − x‖ → d(x,A).We show that yn is a Cauchy sequence. As H is complete the yn’s willthen converges to a nearest point y.

The parallelogram law with the two vectors yn− x and ym− x gives

2‖yn − x‖2 + 2‖ym − x‖2 = ‖yn − ym‖2 + ‖yn + ym − 2x‖2

= ‖yn − ym‖2 + 4‖(yn + ym)/2− x‖2

≥ ‖yn − ym‖2 + 4d2.

but the left hand side converges to 4d2, hence ‖yn − ym‖2 → 0.Note that the proof above shows that every minimizing sequence

converges, hence uniqueness follows from the uniqueness of the limit.¤

Remarks. (i) Existence is false in non-complete inner product spaces.For example, let F ⊂ l2 be the subspace of all sequences with finitelymany coordinates and let E be the non-complete space spanned by Fand x = (1, 1

2, . . . , 1

n, . . .).

Let A = y = (y1, . . .) ∈ F : |yj| ≤ 2−j. The nearest point in l2from x to the closure A in l2 is z = (1, 1

2, . . . , 2−j, . . .), but z 6∈ E.

*********end of 7 hours********* (ii) The theorem also gives uniqueness, which is false even in general

finite dimensional Banach spaces. For example in l2∞ all the upper edgeof the unit square is nearest to the point (0, 2).

(iii) In infinite dimensional Banach spaces even existence is not neces-sarily true. For example, the function h ≡ 0 has no nearest point in

the closed and convex subset A = f :∫ 1

2

0f − ∫ 1

12f = 1 of C(0, 1).

Indeed, d(0, A) = 1, and if f ∈ A would satisfy ‖f‖ = 1, thennecessarily f ≡ 1 on [0, 1

2] and f ≡ −1 on [1

2, 1] – but this is impossible.

Theorem 3.10. Let A be a closed and convex subset of H. Then z ∈ Ais the nearest point to x iff <〈x− z, y − z〉 ≤ 0 for every y ∈ A.

In particular, if M is a subspace of H then z ∈ M is nearest to x iffx− z is perpendicular to all points of M .

Proof. We may assume, by translating A and x by z, that z = 0, andwrite for any y ∈ A

‖x‖2 − ‖x− y‖2 = 2Re〈x, y〉 − ‖y‖2.

If the condition holds, then the right hand side is non-positive hence‖x− 0‖2 ≤ ‖x− y‖2.

Conversely, note that ty = ty +(1− t) ·0 ∈ A for every 0 < t < 1 (byconvexity). But if 0 is the nearest point the left term is non-positive,

Page 16: FA09-5

15

thus replacing y by ty gives that 2t<〈x, y〉 − t2‖y‖2 ≤ 0. Dividing by tand letting t → 0 gives the condition.

Assuming again that z = 0 fix y ∈ M . Using the fact that when Mis a subspace eiθy ∈ A for every θ, we obtain that <〈x, eiθy〉 ≤ 0 forevery θ. Thus 〈x, y〉 = 0. ¤Definition 3.11. Let A be a subset of a Hilbert space H. The orthogo-nal complement of A is the set A⊥ = x ∈ H : x ⊥ y for every y ∈ A.

Note that A⊥ is a (closed) subspace for every set A.

Theorem 3.12. Let M be a closed subspace of a Hilbert space H, thenH decomposes as the direct sum H = M ⊕ M⊥, and the orthogonalprojection on M gives the nearest point map.

Proof. Given x ∈ H let z be the point in M nearest to x. Thenx = z + (x− z), and by the characterization above x− z is orthogonalto M , i.e., it belongs to M⊥. The decomposition is clearly unique. ¤

3.3. Orthonormal systems.

The notion of an orthogonal basis was very useful in linear algebra.In this section we shall find the infinite dimensional analog. Here thegeneral element in the space will not be represented as a finite linearcombination of the elements of the basis, but as a sum of an infiniteseries.

Definition 3.13. An infinite series∑

xn in a Banach space convergesif its sequence of partial sums

∑n≤N xn converges.

The series converges absolutely if∑ ‖xn‖ converges.

The series converges unconditionally if∑

εnxn converges for everychoice of signs εn = ±1 (in the real case) or εn = eiθn (in the complexcase).

Remarks. (i) The completeness of the space ensures that an absolutelyconvergent series is, indeed, convergent.

(ii) In finite dimensional spaces a series converges absolutely iff it con-verges unconditionally. This is not true in infinite dimensions. Forexample, take xn = λnen in l2, then

∑xn converges absolutely iff∑ |λn| < ∞, but it converges unconditionally iff

∑ |λn|2 < ∞.

Definition 3.14. A sequence ϕn ⊂ H is called an orthogonal systemif ϕn ⊥ ϕm for every n 6= m.

An orthonormal system is an orthogonal system with ‖ϕn‖ = 1 forevery n.

Page 17: FA09-5

16

Given any sequence xn of linearly independent vectors in H, theGraham-Schmidt procedure yields an orthonormal system ϕn so thatspanxn : n ≤ N = spanϕn : n ≤ N for every N .

Examples 3.15. (i) The unit vectors ej are an orthonormal sequencein l2.

(ii) Consider complex L2(0, 2π) with the normalized inner product

〈f, g〉 = 12π

∫ 2π

0fg.

The functions eintn∈Z are easily seen to be an orthonormal system.(This important example will be studied in great detail in the chapteron Fourier series.)

(iii) Assume M ⊂ H is a finite dimensional subspace with an orthonor-mal basis ϕjj≤n. Then the orthogonal projection on M is given byP (x) =

∑〈x, ϕj〉ϕj. Indeed, this is a point in M with 〈x−P (x), y〉 = 0for every y ∈ M .

Theorem 3.16. Let ϕn be an orthonormal system in a Hilbert spaceH and let x ∈ H. Then

(i) (Bessel’s inequality)∑ |〈x, ϕn〉|2 ≤ ‖x‖2.

(ii) The series∑

anϕn converges iff∑ |an|2 < ∞. In fact, if the series

converges to y then an = 〈y, ϕn〉.(iii) The series

∑〈x, ϕn〉ϕn converges to the point nearest to x in theclosed linear subspace spanned by the ϕ’s.

Proof. (i) For each N the point zN =∑

n≤N〈x, ϕn〉ϕn is the nearest

point in spanϕn : n ≤ N to x. Thus∑

n≤N |〈x, ϕn〉|2 ≤ ‖x‖2 (because0 is in the space), and the inequality holds because N is arbitrary.

(ii) Since ‖∑M<n<N anϕn‖2 =

∑M<n<N |an|2 the series satisfies the

Cauchy condition iff∑ |an|2 < ∞. The formula for an is obtained by

direct calculation using the orthonormality..

(iii) The convergence follows from (i) and (ii), and the fact that thesum is the nearest point follows from the orthogonality criterion. ¤

Definition 3.17. An orthonormal sequence ϕn in a Hilbert space iscalled an orthonormal basis (or a complete orthonormal system) if everyx ∈ H has an expansion as an infinite convergent series x =

∑anϕn.

The representation is necessarily unique and the coefficients are givenby an = 〈x, ϕn〉.Examples 3.18. (i) The unit basis vectors en ∈ l2.

Page 18: FA09-5

17

(ii) The exponentials eint in L2(0, 2π). We saw the orthogonality andwe need to check the completeness. As the next theorem shows, itsuffices to show that the trigonometric polynomials are dense in L2. Infact, they are dense in Lp for any 1 ≤ p < ∞. Indeed, considered asfunctions on the circle T the subspace spanned by the trigonometricpolynomials is a self adjoint subalgebra of C(T) that separates pointsand contains 1. By the Stone Weierstrass theorem it is dense in C(T).

The denseness in Lp follows because C(T) is dense in Lp.

*********end of 9 hours*********

Theorem 3.19. Let ϕn be an orthonormal system in a Hilbert spaceH. Then the following are equivalent

(i) ϕn is a basis.

(ii) 〈x, ϕn〉 = 0 for all n implies that x = 0.

(iii) spanϕn is dense in H.

(iv) The Parseval identity: ‖x‖2 =∑ |〈x, ϕn〉|2 for every x ∈ H.

(v) The generalized Parseval identity: 〈x, y〉 =∑〈x, ϕn〉〈y, ϕn〉 for

every x, y ∈ H.

Proof. (i) ⇒ (ii) ⇒ (iii), (i) ⇒ (v) and (v) ⇒ (iv) are obvious.

(iii) ⇒ (i) because if there are any linear combinations of the ϕn’s

that converge to x, then necessarily the canonical sums∑N

1 (x, ϕn)ϕn

do. Indeed, if ‖x −∑N1 anϕn‖ < ε then also ‖x −∑N

1 (x, ϕn)ϕn‖ < ε

because∑N

1 (x, ϕn)ϕn is the nearest point to x in spanϕn : n ≤ N.(iv) ⇒ (i) Fix any x ∈ H. By theorem 3.16(iii) the point y =∑

(x, ϕn)ϕn is the nearest point to x in the closed linear space spannedby the ϕn’s, hence (x−y)⊥y and by Pythagoras ‖x‖2 = ‖x−y‖2+‖y‖2.But x and y have the same norm by (iv), so necessarily ‖x − y‖ = 0,i.e., y = x. ¤

Corollary 3.20. Every separable Hilbert space H has an orthonormalbasis and is linearly isometric to l2.

Proof. Choose any dense sequence. Applying to it the Graham-Schmidtprocedure yields an orthonormal basis ϕn.

The operator T : H → l2, obtained by extending linearly the mapϕn → en, is the required isometry. ¤

Page 19: FA09-5

18

Remark. We shall not discuss the non-separable case in detail, buthere is a sketch of what is the structure of such a space. One definesuncountable complete orthonormal system in a natural way, and the“model space” for a non-separable Hilbert space is l2(Γ). This is thespace of functions with a countable support f : Γ → C on some set Γsatisfying

∑ |f(γ)|2 < ∞. The norm is ‖f‖2 =∑ |f(γ)|2.

The density character of l2(Γ) is the cardinality of Γ and the unitvectors eγ are a complete orthonormal system.

Given H, we fix a set Γ whose cardinality is the density characterof H and apply Zorn’s lemma to find an orthonormal system ϕγγ∈Γ

that spans a dense linear subspace of H. (This replaces the Graham-Schmidt argument, which works only for countable systems.)

A linear isometry T : H → l2(Γ) is then obtained by extendinglinearly the map ϕγ → eγ.

*********end of 10 hours*********

3.4. Linear functionals.

Proposition 3.21. Let E be a Banach space. A linear functional

ϕ : E → R (or to C) is continuous iff it is bounded, i.e., sup |ϕ(x)|‖x‖ < ∞.

Proof. Assume sup |ϕ(x)‖x‖ = K < ∞. Then

‖ϕ(x)− ϕ(y)| = |ϕ(x− y)| ≤ K‖x− y‖so ϕ is a Lipschitz function, hence continuous.

Conversely, if ϕ is continuous, choose δ > 0 so that |ϕ(y)| < 1 forevery ‖y‖ ≤ δ. Then for every x

|ϕ(x)|‖x‖ =

|ϕ(δx/‖x‖)|δ

≤ 1

δ

so take K = 1δ. ¤

Definition 3.22. We denote the space of continuous linear functionalson a normed space E by E∗. It is a Banach space under the norm

sup|ϕ(x)|‖x‖ = sup

|ϕ(x)|‖x‖ : ‖x‖ = 1

.

We shall not prove here that the formula defines a norm on E∗, andthat E∗ is complete under this norm. This will be proved in a moregeneral setting in section 5.

Page 20: FA09-5

19

Examples 3.23. (i) Define ϕ on C(0, 1) by ϕ(f) =∫ 1

0f(t)dt. Then

‖ϕ‖ = 1 because∣∣∣∫ 1

0f∣∣∣ ≤ max |f | = ‖f‖∞ and ϕ(1) = 1.

(ii) Let H be a Hilbert space. For each y ∈ H the formula ϕy(x) =〈x, y〉 defines a linear functional with ‖ϕy‖ = ‖y‖. Indeed, it followsfrom Cauchy-Schwartz that |ϕy(x)| ≤ ‖y‖ ‖x‖, and ϕy(y/‖y‖) = ‖y‖.

It turns out that example (ii) gives the general form of a continuouslinear functional on a Hilbert space. Before proving this fact we recalltwo facts from algebra.

(i) Let ϕ be a non-zero linear functional on a vector space V . ThenK = ker(ϕ) has co-dimension 1.

Indeed, fix y ∈ V with ϕ(y) = 1. Given v ∈ V put k = v − ϕ(v)y.Then k ∈ K and v = k + ϕ(v)y, so K and y span V .

(ii) Let ϕ and ψ be two non-zero linear functionals on a vector spaceV . If ker(ϕ) = ker(ψ) then there is a scalar λ so that ϕ(v) = λψ(v) forevery v ∈ V .

Indeed, let K be the common kernel. By (i) choose z ∈ V withψ(z) = 1 an then any v ∈ V is represented as v = k + αz. Thusϕ(v) = αϕ(z) and ψ(v) = αψ(z) = α, so take λ = ϕ(z).

Theorem 3.24 (The Riesz representation theorem). Let H be a Hilbertspace, then for every ϕ ∈ H∗ there is a unique y ∈ H s.t. ϕ(x) = 〈x, y〉for every x ∈ H.

Proof. If ϕ = 0 take y = 0. Otherwise choose 0 6= z ∈ ker(ϕ)⊥. Sinceϕ and ϕz(x) = 〈x, z〉 have the same kernel, they are proportional,ϕ = λϕz, so take y = λz.

The uniqueness is obvious. ¤Example 3.25. The general continuous linear functionals on l2 andL2(0, 1) are given by ϕy(x) =

∑xnyn and ϕ(f) =

∫ 1

0fg respectively.

Corollary 3.26. (i) Let F ⊂ H be a proper closed subspace of H.Then there is a 0 6= ϕ ∈ H∗ s.t. ϕ

∣∣F

= 0.

(ii) For every 0 6= x ∈ H there is a functional ϕ ∈ H∗ with ‖ϕ‖ = 1and ϕ(x) = ‖x‖.(iii) Let F ⊂ H be a closed subspace, and assume ψ ∈ F ∗. Then thereis a ϕ ∈ H∗ with ‖ϕ‖ = ‖ψ‖ and ϕ

∣∣X

= ψ.

Remark. What happens in Banach spaces? We shall not discuss thisin this course, but we mention two very important results.

Page 21: FA09-5

20

The analog of the Corollary holds in every Banach space. This is oneof the fundamental theorems of functional analysis, the Hahn-Banachtheorem. Note that even the fact that every Banach space admits anontrivial continuous functional is not obvious and requires the Hahn-Banach theorem.

In Hilbert we have a concrete representation of the space of con-tinuous linear functionals. There is no chance, of course, for such arepresentation theorem for a general Banach space, but for concretespaces such theorems can sometimes be proved – and they are veryuseful. For example, the Riesz representation theorem for lp (or Lp)for 1 ≤ p < ∞ says that every linear functional on lp is given by

ϕy(x) =∑

xnyn for some y ∈ lq (or ϕg(f) =∫ 1

0fg for some g ∈ Lq).

*********end of 11 hours*********

This section wasdone at the endof the course andclass hours wereadjusted

3.5. Weak convergence.

Definition 3.27. We say that a sequence xn in a Hilbert space Hconverges weakly to x, and write xn

w→ x, if 〈xn, y〉 → 〈x, y〉 for everyy ∈ H.

Norm convergence clearly implies weak convergence, and the con-verse is false. For example en

w→ 0 in l2. To contrast with weakconvergence we sometime use the term strong convergence for normconvergence.

Proposition 3.28. A weakly convergent sequence is bounded.

We shall not give the proof of this proposition, and just say that itfollows easily from the Baire category theorem and is a special case ofone of the three fundamental theorems of functional analysis, the socalled “principle of uniform boundedness”.

Note that in order to check whether xnw→ x it suffices to check that

〈xn, y〉 → 〈x, y〉 for every y in the closed linear space spanned by thexn’s and x. Indeed, if P is the orthogonal projection on this subspace,then 〈xn, y〉 = 〈xn, Py〉, and similarly for x. It follows that when aweakly convergent sequence is given, we may assume that it lies in aseparable space.

Theorem 3.29. Let xn be a a bounded sequence in a separable Hilbertspace H with an orthonormal basis ej. If xn converges coordinatewiseto a vector x, i.e. 〈xn, ej〉 → 〈x, ej〉 for every j, then x ∈ H and

xnw→ x.

Page 22: FA09-5

21

Proof. Put M = sup ‖xn‖. We first check that x ∈ H and ‖x‖ ≤ M .If not, choose J s.t.

∑j≤J |〈x, ej〉|2 > M2 and let y =

∑j≤J〈x, ej〉ej.

Then ‖y‖ > M , but

M‖y‖ ≥ 〈xn, y〉 → 〈x, y〉 = ‖y‖2

a contradiction.To prove weak convergence fix y ∈ H and ε > 0, and choose J s.t.

z =∑

j≤J〈y, ej〉ej satisfies ‖y − z‖ < ε. Then∣∣〈xn − x, y〉

∣∣ ≤∣∣〈xn − x, z〉

∣∣ +∣∣〈xn − x, y − z〉

∣∣ ≤∣∣〈xn − x, z〉

∣∣ + 2Mε

but the condition implies that 〈xn − x, z〉 → 0. ¤Remark. The condition of boundedness is, of course, necessary byproposition 3.28, but here are two concrete examples:

(i) For any vector x = (a1, . . .) the vectors xn = (a1, . . . , an, 0, . . .)converge coordinatewise to x, even when x /∈ l2.

(ii) The vectors xn = nen converge coordinatewise to 0, but notweakly. Indeed, taking y =

∑1nen ∈ l2 we obtain that 〈xn, y〉 = 1 6→ 0.

Corollary 3.30. Every bounded sequence has a weakly convergent sub-sequence.

Proof. A standard diagonal argument gives a subsequence that con-verges coordinatewise. ¤Remark. It is possible to define a topology (the “weak topology”) s.t.a sequence converges in this topology iff it is weakly convergent. Abasis for this topology is given by sets of the form

V (z; y1 . . . , yn; ε1, . . . , εn) =x : |〈x, yj〉 − 〈z, yj〉| < εj ; 1 ≤ j ≤ n

.

This is the weakest topology with respect to which all the linearfunctionals x → 〈x, y〉 are continuous.

The Corollary says that with respect to this topology a boundedset is relatively sequentially compact. (In general topological spaces,compactness, defined by finite subcovers, and sequential compactnessdo not necessarily coincide.) In fact closed and bounded set are actuallyweakly compact as follows from Tychonoff’s theorem. For example, theclosed unit ball in l2 is a closed subset of the product [−1, 1]N which iscompact with respect to the product topology.

We now turn to study the relation between weak and norm conver-gence.

Proposition 3.31. Weak and norm convergence of sequences are equiv-alent iff H is finite dimensional.

Page 23: FA09-5

22

Proof. We know from Infi that when H is finite dimensional, then asequence converges in norm iff it converges coordinatewise.

Conversely, if H is infinite dimensional, let en be an infinite orthonor-mal sequence in H. Then ‖en‖ = 1 but en

w→ 0 because for any y ∈ Hthe series

∑ |〈y, en〉|2 even converges. ¤

Proposition 3.32. (i) If xnw→ x then ‖x‖ ≤ lim inf ‖xn‖.

(ii) If xnw→ x and ‖xn‖ → ‖x‖ then xn converges to ‖x‖ in norm. In

particular, weak and norm convergence coincide on the unit sphere.

Proof. (i) We may assume x 6= 0. Applying inner product with x gives

‖x‖2 = 〈x, x〉 = lim〈xn, x〉 ≤ lim inf ‖xn‖‖x‖and the result follows.

(ii) As ‖xn‖ → ‖x‖ and 〈xn, x〉 → 〈x, x〉 = ‖x‖2 we obtain

‖xn − x‖2 = ‖xn‖2 − 2<〈xn, x〉+ ‖x‖2 → 0.

¤Theorem 3.33 (Banach-Saks). If xn

w→ x then there is a subsequence

xniwhose averages converge in norm to x, i.e.,

xn1+...+xnm

m→ x in

norm as m →∞.

Proof. Let ej be an orthonormal basis for H and we may assume thatx = 0. The proof will follow from the following useful general fact:

Claim: Assume xnw→ 0, then for any sequence εi > 0 there are vectors

zi and ni s.t.(i) ‖zi − xni

‖ < εi.

(ii) There are 0 = N0 < N1 < N2 < . . . s.t. zi =

Ni∑j=Ni−1+1

ajej. In

particular the zi’s have disjoint supports.

To deduce the theorem choose εi → 0. Since the xn’s are bounded,so are the zi’s. Assume ‖zi‖ ≤ M for every i, and then

‖xn1 + . . . + xnm‖m

≤ ‖z1 + . . . + zm‖m

+ε1 + . . . + εm

m

≤√

mM + ε1 + . . . + εm

m→ 0

The Claim is proved by inductive choice of the ni’s and Ni’s, andthen taking zi to be the restriction of xni

to the coordinates in theinterval [Ni−1 + 1, Ni].

Page 24: FA09-5

23

Set n1 = 1 and choose N1 s.t. ‖∑j>N1

〈x1, ej〉ej‖ < ε1. This com-pletes the first step.

As the xn’s converge coordinatewise to 0, there is an n2 > n1 such

that∑

j≤N1|〈xn2 , ej〉|2 <

(ε2

2

)2. Also the series xn2 =

∑〈xn2 , ej〉ej

converges, hence there is an N2 s.t. ‖∑j>N2

〈xn2 , ej〉ej‖ < ε2/2.This completes the second step, and the induction step is similar.

Having chosen ni and Ni, we first use the coordinatewise convergence

to 0 to find an ni+1 > ni s.t.∑

j≤Ni|〈xni+1

, ej〉|2 <( εi+1

2

)2, and then

find Ni+1 s.t. ‖∑j>Ni+1

〈xni+1, ej〉ej‖ < εi+1/2. ¤

Corollary 3.34. A convex set K in a Hilbert space is norm closed iff itis weakly sequentially closed, i.e., if xn ∈ K converge weakly, xn

w→ x,then x ∈ K. (In fact K is norm closed iff it is closed in the weaktopology, but we have not discussed the weak topology.)

Proof. If K is weakly sequentially closed then it is certainly norm closedbecause ‖xn − x‖ → 0 implies xn

w→ x.

Conversely, assume K is norm closed and xnw→ x. By the Banach-

Saks theorem there are convex combinations, zj, of the xn’s that con-verge to x in norm. But by the convexity of K the zj’s are also in K,and as K is norm closed x ∈ K. ¤

We finish the section with a typical application of weak convergence:under suitable conditions corollary 3.30 and theorem 3.33 can be usedas a substitute for norm-compactness.

Definition 3.35. A real-valued function f on a metric space is calledlower semi continuous if xn → x implies f(x) ≤ lim inf f(xn).

Theorem 3.36. Let K be a bounded closed and convex subset of aHilbert space H and let f : K → R be a convex lower semi continuousfunction. If f is bounded from below on K then f attains its minimumon K.

Proof. Put infK f = M and choose xn ∈ K s.t. f(xn) → M . ByCorollary 3.30 and by passing to a subsequence we may assume thatxn converges weakly, xn

w→ x. We shall show that f(x) = M . Indeed,by theorem 3.33 zn = x1+...+xn

n→ x in norm, and f(zn) ≤ 1

n

∑nj=1 f(xj)

by the convexity of f . Thus

M ≤ f(x) ≤ lim inf f(zn) ≤ lim1

n

n∑j=1

f(xj) = M.

¤

Page 25: FA09-5

24

Example 3.37. We give an alternative proof that if K ⊂ H is a closedconvex set and h ∈ H then the distance from h to K is attained.

Indeed, the non-negative function f(x) = ‖x − h‖ is convex andcontinuous. By the theorem it attains a minimum on any closed convexbounded subset of K, so consider its restriction to K ∩B(h, 2d(h,H)).

*********end of 16 hours*********

Page 26: FA09-5

25

4. Fourier series

4.1. Basic definitions.

We shall consider 2π-periodic functions on R and identify them withfunctions on [0, 2π] with f(0) = f(2π) or with functions on the unitcircle T.

In this chapter we denote Lp(0, 2π) with the norm ‖f‖pp = 1

∫ 2π

0|f |p

simply by Lp.Note that Lq is (a dense) subspace of Lp whenever 1 ≤ p < q ≤ ∞.For functions in Lp it is, of course, not important what are their val-

ues at the two points 0 and 2π, but for periodic extensions of continuousfunctions it is. Thus the periodic extension of the function f(t) = ton [0, 2π) is discontinuous, and we shall simply say that f(t) = t isdiscontinuous.

It will also be convenient sometimes to consider the interval [−π, π]rather than [0, 2π]. In this case the middle point is 0, and this maysave some clumsy notation.

The inner product in L2 is, of course, 〈f, g〉 = 12π

∫ 2π

0fg.

Definition 4.1. (i) The nth Fourier coefficient (n ∈ Z) of f ∈ L1 is

f(n) = 12π

∫ 2π

0f(t)e−intdt(= 〈f, eint〉 when f ∈ L2).

(ii) The Fourier series of f is∑

n∈Z f(n)eint.

(iii) A finite linear combination of exponentials is called a trigonometricpolynomial.

Remark. The Fourier series can also be written in terms of sines andcosines, namely

∑∞0 an sin nt + bn cos nt where an = 1

π

∫ 2π

0f(t) sin ntdt

and bn = 1π

∫ 2π

0f(t) cos ntdt for n ≥ 1 and a0 = 0, b0 = 1

The following theorem was already proved in Example 3.18(ii).

Theorem 4.2. The trigonometric polynomials are dense in C(T) andin Lp for 1 ≤ p < ∞.

In particular the exponentials are an orthonormal basis for L2 andthe Fourier series of f ∈ L2 converges to f in the L2 norm. Thusf(n) = 0 for all n iff f = 0 (as an element in L2) and the Parseval

identity ‖f‖22 =

∑ |f(n)|2 holds for every f ∈ L2.

Later in this section we shall also study conditions on f that ensurethat its Fourier series converges to it a.e., pointwise, or even uniformly.

The following summarizes some elementary properties of the Fouriercoefficients.

Page 27: FA09-5

26

Proposition 4.3. Let f ∈ L1. Then

(i) |f(n)| ≤ ‖f‖1.

(ii) fτ (n) = e−inτ f(n) (where fτ (t) = f(t− τ)).

(iii) If f is differentiable and f ′ ∈ L1, then f ′(n) = inf(n).

Proof. (i) |f(n)| =∣∣∣ 12π

∫ 2π

0f(t)e−intdt

∣∣∣ ≤ 12π

∫ 2π

0|f | = ‖f‖1.

(ii) Substitute s = t − τ in fτ (n) = 12π

∫ 2π

0f(t − τ)e−intdt and use the

periodicity of f and of the exponential.

(iii) Use integration by parts in f ′(n) = 12π

∫ 2π

0f ′(t)e−intdt. ¤

We now introduce a new operation on integrable functions. Recallthat we always consider functions to be defined and periodic on all ofR.

Definition 4.4. The convolution of two functions f, g ∈ L1 is

f ∗ g(x) =1

∫ 2π

0

f(x− t)g(t)dt.

The convolution is well defined and ‖f ∗ g‖1 ≤ ‖f‖1‖g‖1. We shallnot show that the function h(x, t) = f(x − t)g(t) is measurable, butassuming it is, we show it is integrable on the square, so the convolutionexists a.e. and is integrable by Fubini’s theorem. Indeed,

∫ ∫|f(x− t)g(t)|dxdt =

∫ (|g(t)|

∫|f(x− t)|dx

)dt

=

∫|g(t)|dt ·

∫|f(x− t)|dx

by the periodicity of f . The formula also gives the norm estimate.

*********end of 17 hours********* The following proposition summarizes some basic properties of the

convolution.

Proposition 4.5. (i) The convolution is commutative.

(ii) f ∗ g(n) = f(n)g(n).

(iii) If f is differentiable with f ′ ∈ L1 (or is a trigonometric polyno-

mial), then also f ∗g ∈ L1 (or f ∗g is also a trigonometric polynomial).

Page 28: FA09-5

27

Proof. (i) follows by change of variables and (ii) by direct computation.To prove the first part of (iii) we differentiate inside the integral

to obtain that (f ∗ g)′ = f ′ ∗ g. (Differentiating inside the integralis permissible. If f ′ is bounded this follows directly from Lebesgue’sdominated convergence theorem). The second part follows by directcomputation. ¤

The formula f ∗ g(n) = f(n)g(n) is extremely important. First, itrepresent the convolution as a “natural” algebraic operation. It is themultiplication term by term of the sequence of Fourier coefficients –which only “looks new” at the functions level.

Secondly, many natural operators on L1 are given by convolution.For example, it follows directly from (ii) that the partial sums

Sn(f)(t) =∑

|j|≤n

f(n)eijt

of the Fourier series of f are given as Sn(f, t) = Dn ∗ f(t), where theDirichlet polynomials Dn are

Dn =∑

|j|≤n

eijt =sin(n + 1

2)t

sin t2

.

To prove this formula write

2i sin t2· Dn = ei t

2 Dn − e−i t2 Dn = ei(n+ 1

2)t − e−i(n+ 1

2)t = 2i sin(n + 1

2)t

because the second term is a telescopic sum. Now divide by 2i sin t2.

*********end of 18 hours*********

We finish this section with the definition of Fourier series in severalvariables.

Definition 4.6. Let f be a 2π-periodic function of d variables inte-grable on the d-cube. Its Fourier coefficients are

f(n1, . . . , nd) =

(1

)n ∫ 2π

0

. . .

∫ 2π

0

f(t1, . . . , td)e−i

∑njtjdt1 . . . dtd.

The Fourier series of f is∑

f(n1, . . . , nd)e−i

∑njtj .

All the one-dimensional results we discussed have multi-dimensionalanalogs, including the basic properties, the denseness of the trigono-metric polynomials, the fact that the exponentials are an orthonormalbasis for L2(Tn), etc.

Page 29: FA09-5

28

4.2. Fejer’s theorem.

Definition 4.7. A sequence kn ∈ L1 is called an approximate identity(or a summability kernel) if

(i) kn ≥ 0.

(ii) ‖kn‖1 = 12π

∫kn = 1.

(iii) limn→∞

|t|≥δ

kn(t) = 0 for every δ > 0.

Theorem 4.8. Let kn be an approximate identity. Then f ∗ kn → funiformly for every f ∈ C(T).

Proof. Fix ε > 0 and let δ > 0 be such that |x − y| ≤ δ implies|f(x)− f(y)| ≤ ε. Use (ii) to write f(x) = 1

∫kn(t)f(x)dt, hence

|kn ∗ f(x)− f(x)| =

∣∣∣∣1

∫kn(t)

(f(x− t)− f(x)

)dt

∣∣∣∣

≤ 1

∫kn(t)

∣∣f(x− t)− f(x)∣∣dt

=1

|t|>δ

+1

|t|≤δ

.

But

1

|t|>δ

kn(t)∣∣f(x− t)− f(x)

∣∣dt ≤ 2 max |f(s)|2π

|t|>δ

kn → 0

by (iii), and1

|t|≤δ

kn(t)∣∣f(x− t)− f(x)

∣∣dt ≤ ε

by the choice of δ and by (ii). ¤Remark. The theorem also holds for f ∈ Lp when 1 ≤ p < ∞, andthe convergence is then in the Lp norm. We shall not prove this.

Corollary 4.9. If f ∈ L1 (or in Lp) satisfies f(n) = 0 for all n, thenf = 0 as an element of L1 (or Lp). If f is continuous then f ≡ 0.

It follows that f(n) = g(n) for all n implies that f = g.

Proof. The condition implies that 0 = Fn ∗ f → f . ¤To apply the theorem we need to make “clever” choices of approx-

imate identities. We illustrate this by giving an alternative proof ofTheorem 4.2. One advantage of this proof is that it will give an ex-plicit formula for the approximating trigonometric polynomials.

Page 30: FA09-5

29

Definition 4.10. The Fejer polynomials are

Fn(t) =∑

|j|≤n

(1− |j|

n + 1

)eijt =

1

n + 1

(sin (n+1)t2

sin t2

)2

.

To prove the identity write sin2 s2

= 1−cos s2

= −e−is+2−eis

4and compare

coefficients to check that

sin2 t2· Fn(t) =

−e−it + 2− eit

4

|j|≤n

(1− |j|

n + 1

)eijt

=−e−i(n+1)t + 2− ei(n+1)t

4(n + 1)=

1

n + 1sin2 (n+1)t

2.

Note that Fn = D0+D1+...+Dn

n+1where Dn is the Dirichlet kernel, hence

σn(f) = Fn ∗ f is the average

σn(f) =S0(f) + S1(f) + . . . + Sn(f)

n + 1

of the partial sums of the Fourier series of f – and it is, indeed, “easier”task to check that the averages converge than to check that the sequenceof partial sums is convergent.

Theorem 4.11 (Fejer). The Fejer polynomials are an approximateidentity.

Hence the trigonometric polynomials σn(f) = Fn ∗ f converge uni-formly to f when f is continuous and in the Lp norm when f ∈ Lp

with 1 ≤ p < ∞.In particular the trigonometric polynomials are dense in C(T) and

in Lp for 1 ≤ p < ∞.

Proof. The sum representation of Fn gives that it is an n-polynomialwith integral 1. The fraction representation shows it is nonnegativeand that (iii) holds (because the denominator | sin(t/2)| is boundedaway from 0 when |t| > δ, so the fraction is bounded). ¤

*********end of 20 hours*********4.3. Norm convergence.

We saw that for f ∈ L2 the Fourier series converges to f in the L2

norm, and that if f ∈ Lp for p < ∞ then σn(f) = Fn ∗ f → f in theLp norm. Does also the Fourier series of f ∈ Lp converge to f? Itturns out that this is false in L1 (and also in C(T)). We shall not give

Page 31: FA09-5

30

the (not very complicated) examples, but will explain the differencebetween the averages σn(f) and the partial sums Sn(f).

The Dirichlet polynomials Dn =∑

|j|≤n eijt =sin(n+ 1

2)t)

sin t2

satisfy two

of the properties of an approximate identity: Clearly∫

Dn = 1, and itis also true that

∫|t|≥δ

Dn(t) → 0 for every δ > 0.

But the Dn’s are not positive and ‖Dn‖1 →∞. It can be shown (asthose of you who will take the Functional Analysis course will learn)that the fact that ‖Dn‖1 →∞ implies that the Fourier series of func-tions in L1 or C(T) do not generally converge.

As averages of the sequence Sn(f) the σn(f)’s still have a chance toconverge even when the Sn(f)’s do not – and Fejer’s theorem says thatthey really do!

The following theorem is much deeper then the special case p = 2and we shall not prove it.

Theorem 4.12 (Riesz). For 1 < p < ∞ the Fourier series of f ∈ Lp

converges in the Lp norm to f .

Remark. Fourier series are very useful in applications. Here is a typ-ical example.

How does one transmit a periodic signal f(t)? In practice one sendsjust a few numbers – for example, some of its Fourier coefficients – andthe receiver processes them to “reconstruct” a good enough approxi-mation of f . To isolate the required coefficient one convolves f witha “filter”, for example, p(t) =

∑j∈A eijt, and the question is how well

p ∗ f approximate f .From what we proved we see that it is sometimes useful to use a gen-

eral trigonometric polynomial p as the filter and not just polynomialsof the form

∑j∈A eijt. Indeed, it seems advantageous to use the Fejer

kernel as a filter rather than the Dirichlet kernel.

Another example of filters is in your ADSL connection. How doesit transmit your phone conversation and your computer data simul-taneously on the same line? The answer is that they use differentfrequency ranges: The conversation is approximately a trigonometric

polynomial of the form p(t) =∑

M1≤j≤M2

ajeijt and the computer data is

approximately q(t) =∑

N1≤j≤N2

bjeijt with N1 much bigger that M2. The

network carries p+ q. At home the signals are separated by two filters.One “kills” the coefficients that are far from [M1,M2] and is connectedto the telephone and the other “kills” the coefficients that are far from[N1, N2] and is connected to the computer.

Page 32: FA09-5

31

4.4. Pointwise convergence.

Theorem 4.13 (The Riemann-Lebesgue lemma). Let f ∈ L1. Then

f(n) → 0.

It follows that also∫ 2π

0f(t) cos nt → 0 and

∫ 2π

0f(t) sin nt → 0.

Proof. For f ∈ L2, hence also for continuous functions, this followsfrom the Bessel inequality.

Given f ∈ L1 and ε > 0, let g be a trigonometric polynomial with‖f − g‖1 < ε. If |n| > degg then |f(n)| = |f(n)− g(n)| < ‖f − g‖1 <ε. ¤

As a rule, the “smoother” f is the faster its Fourier coefficients tend

to 0. We have seen that when f ′ ∈ L1 then f ′(n) = inf(n). Since

f ′(n) is bounded, it follows that f(n) = O( 1n). More generally, if

f is k times differentiable with f (k) ∈ L1 then f (k)(n) = (in)kf(n)

and f(n) = O( 1nk ). It follows that if f is twice differentiable, then∑ |f(n)| ≤ ∑

cn2 < ∞.

Theorem 4.14. Let f be periodic and twice differentiable. Then itsFourier series is absolutely and uniformly convergent to f .

In fact the conclusions hold under the weaker condition that f isdifferentiable and f ′ ∈ L2.

Proof. We saw that ∑ |f̂(n)| ≤ ∑ c/n² < ∞. By Weierstrass' M-test, this proves that the Fourier series of f is absolutely and uniformly convergent. Put g = ∑ f̂(n)e^{int}; then Corollary 4.9 and the continuity of f and g imply that f = g.

For the second part we only need to show that ∑ |f̂(n)| < ∞, and since |f̂(n)| = |f̂′(n)/n| and ∑ |f̂′(n)|² < ∞ (because f′ ∈ L2), the Cauchy-Schwarz inequality gives

∑ |f̂(n)| = ∑ |f̂′(n)/n| ≤ (∑ 1/n²)^{1/2} (∑ |f̂′(n)|²)^{1/2} < ∞. □

*********end of 21 hours*********

The following important theorem will be useful when we study the convergence of Fourier series of discontinuous functions.

Theorem 4.15 (The localization principle). Let f ∈ L1 be such that f ≡ 0 on some subinterval I. Then the Fourier series of f converges to 0 on I. More generally, if two functions f, g ∈ L1 agree on an open subinterval I, then their Fourier series diverge or converge (and to the same limit!) at every point of I.

Proof. Assume 0 ∈ I and we check that Sn(f)(0) → 0. Using the representation Sn f = Dn ∗ f, the identity Dn(t) = sin((n + 1/2)t)/sin(t/2) and the formula sin(α + β) = sin α cos β + cos α sin β give

Sn(f, 0) = (1/2π) ∫_{−π}^{π} f(t) [sin((n + 1/2)t) / sin(t/2)] dt
         = (1/2π) ∫_{−π}^{π} f(t) cos nt dt + (1/2π) ∫_{−π}^{π} [f(t) cos(t/2) / sin(t/2)] sin nt dt.

The first integral converges to 0 by the Riemann-Lebesgue Lemma, and this also holds for the second integral because the function f(t) cos(t/2)/sin(t/2) is also in L1, since f vanishes in a neighborhood of 0. □

The following example will be used in the proof of the next theorem.

Example 4.16. Consider the (discontinuous!) periodic function whose values on (−π, π) are F(t) = t. By the localization principle its Fourier series converges to F(t) for every t ∈ (−π, π).

To see what happens at ±π we first compute the Fourier coefficients. Clearly F̂(0) = 0, and for n ≠ 0 integration by parts gives

F̂(n) = (1/2π) ∫_{−π}^{π} t e^{−int} dt = [t e^{−int} / (−in·2π)]_{−π}^{π} + (1/(in·2π)) ∫_{−π}^{π} e^{−int} dt = (−1)^n/(−in) + 0 = i(−1)^n/n.

It follows that at the discontinuity point t = π the partial sums of the Fourier series are ∑_{|n|≤N} F̂(n)e^{inπ} = ∑_{0<|n|≤N} i/n = 0 (because the nth and −nth terms cancel each other). Thus the Fourier series converges to the average of the one-sided limits ±π of F at t = π:

0 = (1/2) (lim_{t→π+} F(t) + lim_{t→π−} F(t)).

More generally, translating by τ, the discontinuity of Fτ(t) = F(t − τ) is at π + τ, the one-sided limits are again ±π, and F̂τ(n) = e^{−inτ} F̂(n) = i(−1)^n e^{−inτ}/n. The sum of the Fourier series of Fτ at the discontinuity point is again

∑ i(−1)^n e^{−inτ}/n · e^{in(π+τ)} = i ∑ 1/n = 0 (the nth and −nth terms cancel),

which is the average

0 = (1/2) (lim_{t→(π+τ)+} Fτ(t) + lim_{t→(π+τ)−} Fτ(t)).

Theorem 4.17 (Dirichlet). Let f be a bounded and piecewise continuous function, and assume that f′ ∈ L2. Then the Fourier series of f converges at every continuity point t0 of f to f(t0).

Moreover, if f has a jump discontinuity at a point t0 then the Fourier series of f converges at t0 to the average

∑ f̂(n)e^{int0} = (1/2) (lim_{t→t0+} f(t) + lim_{t→t0−} f(t)).

Proof. For a continuity point t0 the result follows from the localization theorem. Indeed, f is piecewise continuous hence there is a neighborhood I of t0 where f is continuous. Let g be a continuous function on [0, 2π] with g|_I = f|_I and so that g′ ∈ L2. Then the Fourier series of g converges at t0 to g(t0), and by the localization principle Sn(f)(t0) and Sn(g)(t0) converge to the same limit g(t0) = f(t0).

Assume now that f has a jump discontinuity at t0. By adding a constant we may assume that lim_{t→t0+} f(t) = −lim_{t→t0−} f(t) = L, and we need to show that Sn(f)(t0) → 0. Put

g(t) = f(t) + (L/π) F_{t0−π}(t) for t ≠ t0, and g(t0) = 0,

where F is as in the example above (the factor L/π matches the jump of size 2L of f at t0 against the jump of size −2π of F_{t0−π} there). Then g is continuous at t0, hence, by the first part, its Fourier series converges to g(t0) = 0. But Sn(F_{t0−π})(t0) → 0, hence also Sn(f)(t0) → 0. □

Remark. The Dirichlet theorem is usually formulated for piecewise differentiable f with f and f′ bounded. This certainly implies that f′ ∈ L2, so the theorem holds.

We finish this section with some highlights from the history of the study of pointwise convergence of Fourier series.

Du Bois-Reymond constructed in 1876 a continuous function whose Fourier series diverges at a single point or, more generally, at any given countable set.

This led Lusin, in his 1915 Ph.D. dissertation, to ask whether the Fourier series of a continuous function must converge a.e.

In 1923 Kolmogorov constructed a function in L1 whose Fourier series diverges a.e. More dramatically, in 1926, he constructed a function in L1 whose Fourier series diverges everywhere. This made everybody believe that the answer to Lusin's problem is negative. But –

Carleson proved in 1966 that the Fourier series of L2 functions (hence also of continuous functions) converge a.e. Soon after, Hunt showed that this is true for Lp functions when p > 1.

*********end of 23 hours*********

4.5. Additional results. Three more topics were discussed in class:

(i) A construction of a continuous function whose Fourier series diverges at a point.

(ii) A demonstration of how Fourier series are applied to solving the vibrating string equation.

(iii) Uniform distribution of sequences in [0, 1]; Weyl's criterion and the uniform distribution of nα (mod 1) for irrational α.

*********end of 25 hours*********


5. Linear operators

5.1. Basic notions.

Proposition 5.1. Let E and F be Banach spaces. A linear operator T : E → F is continuous iff it is bounded, i.e., sup_{x≠0} ‖Tx‖/‖x‖ < ∞.

The norm of a continuous T : E → F is

‖T‖ = sup{‖Tx‖/‖x‖ : 0 ≠ x ∈ E} = sup{‖Tx‖ : ‖x‖ = 1} = sup{‖Tx‖/‖x‖ : 0 < ‖x‖ ≤ 1}

and the space of continuous linear operators from E to F, denoted by L(E, F), is a Banach space under this norm. (When F = E we write L(E) instead of L(E, E).)

If T : E → F and S : F → G then ‖ST‖ ≤ ‖S‖‖T‖.

Proof. The proof of the first part is the same as that of Proposition 3.21. We shall not check that ‖·‖ is a norm and that it satisfies ‖ST‖ ≤ ‖S‖‖T‖, and only check that L(E, F) is complete.

Assume {Tn} is a Cauchy sequence. For each x ∈ E the sequence Tnx is Cauchy in F, and since F is complete it converges. Denote the limit by Tx. Fix ε > 0 and choose N s.t. ‖Tnx − Tmx‖ < ε for every ‖x‖ ≤ 1 and n, m > N. Letting m → ∞ we see that ‖Tnx − Tx‖ ≤ ε for every ‖x‖ ≤ 1 and n > N, i.e., Tn → T uniformly on B_E, so T is continuous and Tn → T. □

Remark. We only used the completeness of the range space F. If E is not complete, denote its completion by Ē. Since every continuous operator T : E → F has a unique extension to a continuous operator T̄ : Ē → F and ‖T̄‖ = ‖T‖, it follows that L(Ē, F) is essentially the same space as L(E, F).

An operator T on a Hilbert space can be represented by an infinite matrix with respect to some orthonormal basis {en}. The entries of the matrix are a_{n,m} = 〈Tem, en〉.

Operators on spaces of functions can sometimes be represented by explicit analytic formulas.

There is no “general method” to check whether an operator is continuous (and to estimate its norm), and one usually needs to represent it in a specific convenient form to do so.

Examples 5.2. (i) A diagonal matrix D = (dn) represents a bounded linear operator on l2 iff (dn) is a bounded sequence. The norm of D is then sup |dn|.


(ii) Multiplication by a measurable function g on L2 is bounded iff g ∈ L∞, and then its norm is ‖g‖∞.

This example is a “continuous” analog of (i). The matrix representation of this operator with respect to the basis of exponentials is a_{n,m} = 〈g(t)e^{int}, e^{imt}〉 = ĝ(m − n). This representation could sometimes be very useful – but it does not help in checking boundedness.

(iii) Convolution by a function g ∈ L1 on L2. One checks directly that its norm is bounded by ‖g‖1 and that its matrix is

a_{n,m} = ĝ(n) if m = n, and a_{n,m} = 0 if m ≠ n,

i.e., it is the mth Fourier coefficient of g ∗ e^{int}. Thus the matrix of the “complicated” operator of convolution is the simplest possible matrix when we use the appropriate basis – it is a diagonal matrix with ĝ(n) on the diagonal.

(iv) An operator on l2 is called a Hilbert-Schmidt operator if its matrix satisfies ∑ |a_{n,m}|² < ∞. These are bounded operators and ‖T‖² ≤ ∑ |a_{n,m}|². Indeed, assume x = ∑ b_n e_n with ‖x‖² = ∑ |b_n|² ≤ 1. Then by Cauchy-Schwarz

‖Tx‖² = ‖∑_m (∑_n a_{n,m} b_n) e_m‖² = ∑_m |∑_n a_{n,m} b_n|² ≤ ∑_m (∑_n |a_{n,m}|²)(∑_n |b_n|²) ≤ ∑_{m,n} |a_{n,m}|².
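
In the finite-dimensional case this is just the statement that the operator norm of a matrix is dominated by its Hilbert-Schmidt (Frobenius) norm; a small numerical illustration (mine, not from the notes):

import numpy as np

# Operator norm (largest singular value) vs. Hilbert-Schmidt norm.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50))

op_norm = np.linalg.norm(A, 2)       # ||A|| as an operator on l2
hs_norm = np.linalg.norm(A, 'fro')   # (sum |a_{n,m}|^2)^(1/2)
print(op_norm, hs_norm, op_norm <= hs_norm)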

(v) The “continuous” analog of example (iv) is an integral operator Tf(x) = ∫_0^{2π} K(x, y)f(y) dy on L2 with kernel K ∈ L2([0, 2π] × [0, 2π]). A similar computation shows that ‖T‖² ≤ ∫∫ |K|².

(vi) The right and left shifts on l2 are given by the following formulas. Put x = ∑_{n=1}^∞ b_n e_n; then

Rx = ∑_{n=1}^∞ b_n e_{n+1} ;  Lx = ∑_{n=1}^∞ b_{n+1} e_n.

They are clearly norm-one operators. The matrix representation of the right shift, for example, is a_{n,m} = 1 for n = m + 1, i.e., on the lower diagonal, and zero elsewhere.


Remark. Differential operators are usually not continuous, and to study them we must consider them as operators between carefully chosen spaces with appropriate norms. For example, Df = f′ is unbounded in L2, even when restricted to the (non-closed) subspace of differentiable functions. It is also unbounded on the subspace of differentiable functions in C(0, 1), but it is bounded as an operator from C1(0, 1) to C(0, 1).

However, the inverses of differential operators (and this is what we are looking for when we try to solve a differential equation) are, in many cases, bounded integral operators. For example, the inverse of Df = f′ is Tf(x) = ∫_0^x f(t) dt – and this is a bounded operator on both C(0, 1) and L2.

Operators with finite dimensional range are called finite rank operators.

Proposition 5.3. Let T ∈ L(H) be a continuous finite rank operator on a Hilbert space H. Then there are points {xj, yj}_{j≤n} ⊂ H so that Tx = ∑_1^n 〈x, yj〉 xj.

More generally, if E and F are Banach spaces and T ∈ L(E, F) is a continuous finite rank operator, then there are {xj}_{j≤n} ⊂ F and {fj}_{j≤n} ⊂ E* s.t. Tx = ∑_1^n fj(x) xj.

Proof. Fix an orthonormal basis {xj}_1^n for TH and expand Tx with respect to this basis, Tx = ∑ 〈Tx, xj〉 xj. But ϕj(x) = 〈Tx, xj〉 is a continuous linear functional on H hence, by Riesz' representation theorem, there are yj ∈ H s.t. 〈Tx, xj〉 = 〈x, yj〉.

The proof of the general case is similar, but requires the Hahn-Banach theorem and we shall not give it. □

Example 5.4. The rank of a convolution operator Tf = f ∗ p on L2, where p is a trigonometric polynomial of degree n, is at most 2n + 1.

*********end of 26 hours*********

5.2. Invertible operators.

Let T ∈ L(E, F). Its kernel is a closed subspace, but the range need not be closed. It could even be a nontrivial dense subspace of F. For example, this happens when D : l2 → l2 is a diagonal operator with 0 ≠ dn → 0, or for Tf(x) = ∫_0^x f on L2. An operator could be left-invertible or right-invertible without being invertible (e.g., the right and left shifts).

We shall say that a continuous linear operator T : E → F is invertible if it is 1-1 and onto and if its inverse is also continuous.


It turns out that one does not need to assume the last condition (that its inverse is continuous). This is one of the fundamental theorems of functional analysis (that we shall not prove in this course):

Theorem 5.5 (The open mapping theorem). Let E and F be Banach spaces. If T ∈ L(E, F) is surjective, then it is an open mapping: the image of an open set is open.

In particular if T is 1-1 and onto, then its inverse is continuous.

Proposition 5.6. If T ∈ L(E) satisfies ‖T‖ < 1, then I − T is invertible and ‖(I − T)^{−1}‖ ≤ 1/(1 − ‖T‖).

It follows that the set of invertible operators is open and that the map T → T^{−1} is continuous on the set of invertible operators.

Proof. One checks directly that (I − T)^{−1} = ∑ T^n, and then

‖(I − T)^{−1}‖ = ‖∑ T^n‖ ≤ ∑ ‖T^n‖ ≤ ∑ ‖T‖^n = 1/(1 − ‖T‖).

Note also that

(∗)  ‖(I − T)^{−1} − I‖ ≤ ∑_{n≥1} ‖T‖^n = ‖T‖/(1 − ‖T‖) → 0 when ‖T‖ → 0.

To prove the second part assume that S is invertible; we show that S + T is invertible whenever ‖T‖ < 1/‖S^{−1}‖. Indeed, the operator S + T = S(I + S^{−1}T) is the product of two invertible operators – the second factor is invertible because ‖S^{−1}T‖ ≤ ‖S^{−1}‖‖T‖ < 1.

Finally,

(S + T)^{−1} = (I + S^{−1}T)^{−1} S^{−1} → S^{−1}

as ‖T‖ → 0 by (∗). □

Remark. The formula (I − T)^{−1} = ∑ T^n is an example of “functional calculus”, i.e., a method for extending a function f : C → C to a map taking operators to operators. This is only possible for “nice” functions. In this example the function was f(t) = (1 − t)^{−1} and we used the fact that it is analytic and has a Taylor series which converges for |t| < 1. This method works for any function represented by a power series, and one can then define f(T) whenever ‖T‖ is smaller than the radius of convergence of the series. For example, e^T = ∑ T^n/n! is well defined for every bounded operator T because the function e^z is entire. (But note, however, that the equality e^{T+S} = e^T e^S holds only for commuting operators.)
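
A small numerical sketch of this power-series calculus (mine, not from the notes), for a matrix T with ‖T‖ < 1: the partial sums of the Neumann series ∑ Tⁿ converge to (I − T)⁻¹, and the bound 1/(1 − ‖T‖) holds.

import numpy as np

rng = np.random.default_rng(2)
T = rng.standard_normal((5, 5))
T *= 0.4 / np.linalg.norm(T, 2)          # rescale so that ||T|| = 0.4 < 1

S = np.zeros_like(T)                     # partial sum of the Neumann series
P = np.eye(5)                            # current power T^n
for _ in range(60):
    S += P
    P = P @ T

inv = np.linalg.inv(np.eye(5) - T)
print(np.max(np.abs(S - inv)))                 # ~ machine precision
print(np.linalg.norm(inv, 2), 1 / (1 - 0.4))   # ||(I-T)^{-1}|| <= 1/(1-||T||)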


Definition 5.7. Let E be a Banach space and T ∈ L(E). The spectrum σ(T) of T is the set

σ(T) = {λ ∈ C : T − λI is not invertible}.

The resolvent set ρ(T) of T is the complement of σ(T), i.e., the set of λ's for which T − λI is invertible.

The resolvent of T is the function R = R_T : ρ(T) → L(E) given by R(λ) = (T − λI)^{−1}.

Note that unlike the finite-dimensional case the spectrum may contain points that are not eigenvalues. For example, the right shift on l2 and Tf(x) = ∫_0^x f on L2 are not onto, so 0 is in their spectrum, but it is not an eigenvalue.

*********end of 27 hours*********

Theorem 5.8. The resolvent set is open and the spectrum is closed and bounded. If E is a complex Banach space, then the spectrum is non-empty.

The resolvent R_T is a continuous function on ρ(T) and it satisfies ‖R(λ)‖ → 0 as |λ| → ∞.

Proof. Almost everything follows directly from Proposition 5.6. Indeed, it implies that the resolvent R is continuous, and that the resolvent set is open as the inverse image of the open set of invertible operators under the continuous map λ → T − λI. As the complement of an open set, σ(T) is closed. Also T − λI is invertible whenever |λ| > ‖T‖, thus σ(T) is bounded. Also |λ| → ∞ implies

‖(T − λI)^{−1}‖ = (1/|λ|) ‖(T/λ − I)^{−1}‖ ≤ (1/|λ|) · 1/(1 − ‖T‖/|λ|) → 0.

The only new ingredient in the proof is to show that σ(T) is non-empty, and we shall prove this only when E is a Hilbert space. (For general Banach spaces the proof is similar – but requires the Hahn-Banach theorem.)

The resolvent R(λ) = (T − λI)^{−1} satisfies the resolvent equation

(R(λ) − R(µ))/(λ − µ) = R(λ)R(µ),

as follows directly by multiplying the equation by (T − λI)(T − µI). It follows from this identity and from the continuity of its right hand side that for each fixed x, y ∈ H the map f(λ) = 〈R(λ)x, y〉 is analytic on the complement of σ(T).


If σ(T) were empty, it would follow that f(λ) is an entire function. But it is bounded (because ‖R(λ)‖ → 0 as |λ| → ∞) hence, by Liouville's theorem, it is constant – and the constant must be 0. Thus 〈R(λ)x, y〉 = 0 for all x, y, hence R(λ) = 0, which is impossible.

(Once again the proof for general Banach spaces is the same, replacing 〈R(λ)x, y〉 by ϕ(R(λ)x) for x ∈ E and ϕ ∈ E*, and applying the Hahn-Banach theorem to say that if ϕ(R(λ)x) = 0 for every ϕ ∈ E* then R(λ)x = 0.) □

5.3. Compact operators.

We start by recalling some basic properties of compact sets in a metric space.

A subset A of a metric space X is compact when every open cover of A has a finite sub-cover; equivalently, iff every sequence {xn} ⊂ A has a subsequence that converges to a point x ∈ A.

If A is compact, then for every ε > 0 there is a finite ε-net N in A, i.e., a finite subset N ⊂ A so that for every x ∈ A there is a point y ∈ N with d(x, y) < ε. Also, there is no infinite “separated” set in A, i.e., there is no δ > 0 and an infinite subset D ⊂ A with d(x, y) ≥ δ for every x ≠ y in D.

If X is complete then the closure of A is compact iff A contains a finite ε-net for every ε > 0, iff A does not contain an infinite separated set, iff every sequence in A has a Cauchy subsequence.

Definition 5.9. Let E and F be Banach spaces. An operator T in L(E, F) is said to be compact if the image TB_E of its unit ball has a compact closure. (Equivalently, the closure of TA is compact for every bounded subset A ⊂ E.)

The space of compact operators from E to F is denoted by K(E, F), and when E = F we write K(E).

Examples 5.10. (i) A finite rank operator is compact.

(ii) The identity on E is compact iff E is finite-dimensional.

(iii) A diagonal operator D on l2 is compact iff dn → 0.

Proposition 5.11. Let E and F be Banach spaces. Then K(E, F) is a closed linear subspace of L(E, F). In other words, the sum of compact operators is compact and the limit of a sequence of compact operators is compact.

Moreover, if T ∈ L(E, F) and S ∈ L(F, G) and if one of them is compact, then so is the composition ST.


Proof. Assume that S, T are compact. Given any sequence xn ∈ B_E, find a subsequence {xn}_{n∈M} so that {Txn}_{n∈M} is Cauchy, and a further subsequence {xn}_{n∈L} with L ⊂ M so that {Sxn}_{n∈L} is Cauchy. Then also {(S + T)xn}_{n∈L} is Cauchy. Thus S + T is compact.

Assume now that the Tn are compact with Tn → T and that T is not compact. Since the closure of TB_E is not compact, there is an infinite set D ⊂ B_E so that TD is δ-separated for some δ > 0. Fix n so large that ‖Tn − T‖ < δ/3. A simple computation then shows that TnD is δ/3-separated, contradicting the compactness of Tn.

The final claim follows from the fact that a bounded operator takes a bounded or Cauchy sequence to a bounded or Cauchy sequence. □

Example 5.12. This is an important and non-trivial example. Let K ∈ L2([0, 2π] × [0, 2π]). Then the operator Tf(x) = ∫_0^{2π} K(x, y)f(y) dy is a compact operator on L2. The proof is a combination of results we have already checked.

First, the trigonometric exponentials (in two variables!) are dense in L2([0, 2π] × [0, 2π]), so choose trigonometric polynomials Pn(x, y) with ‖Pn − K‖2 → 0.

Let Tnf(x) = ∫_0^{2π} Pn(x, y)f(y) dy be the integral operator with kernel Pn. Then Tn is a finite rank operator, hence compact.

Finally, we saw in Example 5.2(v) that ‖Tn − T‖ ≤ ‖Pn − K‖2. As the right hand side converges to zero we obtain that T is a limit of compact operators, hence compact by the proposition.

In the following proposition we shall assume that the range space is a Hilbert space.

Proposition 5.13. Let E be a Banach space and let H be a Hilbert space. Then T ∈ L(E, H) is compact iff there are finite-rank operators Tn ∈ L(E, H) with ‖Tn − T‖ → 0.

Proof. Finite-rank operators are compact, so one direction follows from the previous proposition.

Conversely, assume T is compact. Fix ε > 0 and choose a finite ε-net N in TB_E. Let P be the (finite rank!) orthogonal projection on the subspace spanned by N, and we check that the finite rank operator PT satisfies ‖PT − T‖ ≤ 2ε.

Indeed, fix x ∈ B_E and choose y ∈ B_E such that Ty ∈ N and ‖Ty − Tx‖ < ε. Then PTy = Ty implies

‖PTx − Tx‖ ≤ ‖PTx − PTy‖ + ‖Ty − Tx‖ ≤ (‖P‖ + 1)‖Ty − Tx‖ ≤ 2ε. □

Remarks. (i) The theorem holds (even if the proof could be much more complicated) when the range space is any of the classical Banach spaces. In fact, the question whether there is a Banach space for which it does not hold was one of the main problems left open by Banach. A counterexample was constructed only in 1972 by Per Enflo.

(ii) Note that the requirement is for convergence in the operator norm, and not just pointwise convergence, namely ‖Tnx − Tx‖ → 0 for every x. Indeed, assume for example that H = l2 and Pn are the projections on the subspace spanned by the first n basis vectors. Then for each fixed x the sequence PnTx converges to Tx for every operator T (even when T is not continuous!).

*********end of 29 hours*********

5.4. Self-adjoint operators on Hilbert space.

Proposition 5.14. Let H1, H2 be Hilbert spaces and T ∈ L(H1, H2). Then there is a unique operator T* ∈ L(H2, H1) s.t. 〈Tx, y〉 = 〈x, T*y〉 for every x ∈ H1 and y ∈ H2. The operator T* is called the adjoint of the operator T, and it satisfies ‖T*‖ = ‖T‖.

Proof. For each y ∈ H2 the formula ϕ(x) = 〈Tx, y〉 defines a bounded linear functional on H1. By Riesz' representation theorem there is a unique y* ∈ H1 s.t. 〈Tx, y〉 = 〈x, y*〉. One checks easily that the map T*y = y* is linear, and

‖T*‖ = sup_{‖y‖≤1} ‖T*y‖ = sup_{‖x‖,‖y‖≤1} |〈x, T*y〉| = sup_{‖x‖,‖y‖≤1} |〈Tx, y〉| = ‖T‖. □

Examples 5.15. (i) The matrix of the adjoint of an operator on l2 with matrix (a_{n,m}) is (\overline{a_{m,n}}).

(ii) Similarly, the adjoint of an integral operator with kernel K(x, y) is the integral operator with kernel \overline{K(y, x)}.

(iii) The adjoint of the operator of multiplication by g ∈ L∞ on L2 is multiplication by \overline{g}.

The following proposition summarizes some elementary properties of the adjoint.


Proposition 5.16. (i) T** = T.

(ii) (S + T)* = S* + T*.

(iii) (αT)* = \overline{α} T*.

(iv) (ST)* = T*S*.

(v) ker T = (T*H)^⊥ and \overline{TH} = (ker T*)^⊥.

(vi) T is finite rank iff T* is.

Proof. The proofs are elementary and we only check (vi): If T is finite rank, write Tx = ∑_{j=1}^N 〈x, xj〉 yj, and a direct computation shows that T*y = ∑_{j=1}^N 〈y, yj〉 xj. If T* is finite rank then the first part of the proof shows that so is T = (T*)*. □

Proposition 5.17. T is compact iff T* is.

Proof. Assume that T is compact and we show T* is compact.

By Proposition 5.13 there are finite rank operators Tn → T. By (vi) above the Tn* are finite rank, and since ‖Tn* − T*‖ = ‖Tn − T‖ → 0 it follows that T* is compact.

If T* is compact, write T = (T*)*, and use the first part to deduce that T is compact. □

Definition 5.18. An operator is said to be self-adjoint if T = T*.

It follows from part (v) of the proposition that if T is self-adjoint then \overline{TH} = (ker T)^⊥, hence H = \overline{TH} ⊕ ker T.

Examples 5.19. (i) An operator on l2 is self adjoint iff its matrix satisfies a_{n,m} = \overline{a_{m,n}}.

(ii) A diagonal matrix on l2 is self-adjoint iff it is real. Similarly, multiplication by a function g on L2 is self-adjoint iff g is real.

(iii) An integral operator with kernel K is self-adjoint iff K satisfies K(x, y) = \overline{K(y, x)}.

(iv) T ∗T is self-adjoint for every bounded operator T .

Proposition 5.20. Let H be a complex Hilbert space. Then an operator T ∈ L(H) is self-adjoint iff the quadratic form 〈Tx, x〉 is real.

Proof. If T = T* then 〈Tx, x〉 = 〈x, Tx〉 = \overline{〈Tx, x〉}, hence it is real.

Conversely, assume that the quadratic form is real. Then

〈T(x + λy), x + λy〉 = \overline{〈T(x + λy), x + λy〉} = 〈x + λy, T(x + λy)〉

for each λ ∈ C. Expanding and canceling equal terms gives

λ〈Ty, x〉 + \overline{λ}〈Tx, y〉 = \overline{λ}〈x, Ty〉 + λ〈y, Tx〉.

Taking λ = 1 and then λ = i gives

〈Ty, x〉 + 〈Tx, y〉 = 〈x, Ty〉 + 〈y, Tx〉

and

〈Ty, x〉 − 〈Tx, y〉 = −〈x, Ty〉 + 〈y, Tx〉.

Adding the two equations gives 〈Ty, x〉 = 〈y, Tx〉, i.e., T = T*. □

Definition 5.21. We say that an operator T ∈ L(H) is non-negative if 〈Tx, x〉 ≥ 0 for every x ∈ H.

Note that by the proposition a non-negative operator is necessarily self adjoint. The notion of non-negative operators induces a partial order on L(H): T ≥ S iff T − S ≥ 0.

Proposition 5.22. A bounded projection P (i.e., an operator satisfying P² = P) is self-adjoint iff it is the orthogonal projection onto its range.

Proof. If P is the orthogonal projection on the subspace V ⊂ H, write x1 = v1 + w1 and x2 = v2 + w2 with vi ∈ V and wi ∈ V^⊥ = ker(P). Then

〈Px1, x2〉 = 〈v1, v2 + w2〉 = 〈v1, v2〉 = 〈x1, Px2〉.

Conversely, if P is self adjoint, then (PH)^⊥ = ker(P). Also, since P is a projection, PH = ker(I − P), hence closed. Thus H = PH ⊕ ker(P), and P is the orthogonal projection. □

*********end of 30 hours*********


6. Spectral theory of self-adjoint compact operators

The spaces in this chapter will be complex Hilbert spaces. It will be obvious which of the results also hold in the real case.

An ideal representation of an operator on a Hilbert space is by a diagonal matrix with respect to a suitable orthonormal basis. The elements of this basis will then be eigenvectors of the operator, and the diagonal elements the corresponding eigenvalues.

The finite dimensional case indicates that this cannot be done for every operator. There are matrices that cannot be diagonalized. But one of the important theorems of linear algebra is that if T is self adjoint, then it is diagonalizable. We shall thus assume also in the infinite dimensional case that T is self adjoint. (As in the finite dimensional case, the spectral theorem in infinite dimension will hold also if we assume that T is normal, i.e., that T and T* commute. We shall not go through the (rather easy) changes needed in the proof.)

But there is also a genuinely new difficulty in the infinite dimensional case: operators (even self adjoint ones) do not have to have eigenvalues at all! For example, multiplication by the function g(t) = t on L2(0, 1) is such an operator. We thus need to make some more assumptions on the operator, and we shall assume it is compact.

(There is a spectral theorem for self adjoint non-compact operators – with “diagonal” replaced by “multiplication operator” – but we shall not discuss it in this course.)

6.1. The spectrum of self adjoint operators.

The following lemma gives some basic properties.

Lemma 6.1. (i) If λ is an eigenvalue of an operator T then |λ| ≤ ‖T‖.

(ii) If T is self-adjoint then every eigenvalue of T is real.

(iii) If T is self adjoint and λ ≠ µ are eigenvalues of T with corresponding eigenvectors x, y, then x ⊥ y.

(iv) If T is self adjoint and V ⊂ H is invariant under T, then so is V^⊥.

Proof. The proofs are immediate and we only prove (iv), for example. Fix y ∈ V^⊥; we need to show that 〈x, Ty〉 = 0 for every x ∈ V. But 〈x, Ty〉 = 〈Tx, y〉 = 0 because Tx ∈ V by the assumption. □

Proposition 6.2. Let T be self adjoint. Then sup_{‖x‖=1} |〈Tx, x〉| = ‖T‖.


Proof. Denote the sup by M and note that |〈Tz, z〉| ≤ M‖z‖² for any z ∈ H.

Clearly M ≤ ‖T‖, and we show that ‖Tx‖ ≤ M for every x ∈ H with ‖x‖ = 1.

We assume, as we may, that Tx ≠ 0 and put y = Tx/‖Tx‖. Then by self adjointness

〈Tx, y〉 = (1/‖Tx‖) 〈Tx, Tx〉 = 〈y, Tx〉 = 〈Ty, x〉,

hence

〈T(x + y), x + y〉 = 〈Tx, x〉 + 〈Ty, y〉 + 2〈Tx, y〉

and

〈T(x − y), x − y〉 = 〈Tx, x〉 + 〈Ty, y〉 − 2〈Tx, y〉.

Subtracting the two identities and using 〈Tx, y〉 = ‖Tx‖ give

4‖Tx‖ = 4〈Tx, y〉 = 〈T(x + y), x + y〉 − 〈T(x − y), x − y〉 ≤ M(‖x + y‖² + ‖x − y‖²)

which, by the parallelogram law, is 2M(‖x‖² + ‖y‖²) = 4M. □

Remark. The proposition is also true in the real case, and it implies that if the quadratic form 〈Tx, x〉 of a self adjoint operator is identically zero, then T = 0.

Note that if T is self adjoint and sup_{‖x‖=1} |〈Tx, x〉| = ‖T‖ is attained at a point z, ‖z‖ = 1, then z is necessarily an eigenvector of T with eigenvalue λ s.t. |λ| = ‖T‖. Indeed,

‖T‖ = |〈Tz, z〉| ≤ ‖Tz‖ ‖z‖ ≤ ‖T‖

implies equality in Cauchy-Schwarz, i.e., Tz = λz, and necessarily |λ| = ‖T‖.

The key to proving the existence of eigenvectors (and later the spectral theorem) when T is also compact is that in this case the sup is indeed necessarily attained:

Proposition 6.3. Let T be a compact self adjoint operator. Then ‖T‖ = sup_{‖x‖=1} |〈Tx, x〉| is attained.

Proof. We may assume that ‖T‖ = 1 (hence also the sup is 1). Replacing T by −T if necessary we may also assume that the sup is sup_{‖x‖=1} 〈Tx, x〉.

Pick a sequence xn with ‖xn‖ = 1 so that 〈Txn, xn〉 → 1. By the compactness of T we may assume (by passing to a subsequence) that Txn is a convergent sequence, and we show that the sequence xn itself is actually convergent. Indeed,

0 ≤ ‖Txn − xn‖² = ‖Txn‖² − 2〈Txn, xn〉 + 1 ≤ 2 − 2〈Txn, xn〉 → 0,

hence Txn − xn → 0. But we know that Txn is convergent, hence also xn converges – and to the same limit.

Denote this common limit by z; then ‖z‖ = 1, Tz = z and 〈Tz, z〉 = lim 〈Txn, xn〉 = 1, so the sup is attained at z. □

Remark. It is not true that both sup〈Tx, x〉 and inf〈Tx, x〉 must be attained. For example, take the diagonal operator T with −1/n on the diagonal; then the sup is not attained.

The next corollary summarizes Propositions 6.2-6.3 in the form that will be applied later.

Corollary 6.4. Let T be a compact self adjoint operator. Then the maximum max_{‖x‖=1} |〈Tx, x〉| = ‖T‖ is attained. The point where it is attained is an eigenvector of T with eigenvalue λ s.t. |λ| = ‖T‖.

6.2. The spectral theorem.

Theorem 6.5 (The spectral theorem for compact self adjoint operators). Let T be a compact self adjoint operator on H. Then H = V ⊕ K, where K = ker(T) and T is one-one on V. The subspace V has an orthonormal basis ϕn composed of eigenvectors of T with corresponding eigenvalues 0 ≠ λn → 0. Thus Tx = ∑ λn 〈x, ϕn〉 ϕn for every x ∈ H, and the matrix of T with respect to the decomposition H = V ⊕ K and the basis ϕn is the block matrix (Λ 0; 0 0), where Λ is a diagonal matrix with the λn's on the diagonal.

Proof. Let µj be the distinct non-zero eigenvalues of T, and let Hj be the eigenspace corresponding to µj. By lemma 6.1(iii) the Hj's are mutually orthogonal and also orthogonal to K.

Each µj has finite multiplicity, because if {em} were an infinite orthonormal basis for Hj, then the sequence Tem = µj em would not have a convergent subsequence, contradicting the compactness of T. Similarly µj → 0, because if |µj| ≥ δ > 0 for an infinite set J of j's, then choose ej ∈ Hj with ‖ej‖ = 1. The ej's are an orthonormal system and {Tej}_{j∈J} does not have a convergent subsequence.

The space V = \overline{∑ ⊕ Hj} (the closed linear span of the Hj's) is invariant under T and by lemma 6.1(iv) so is V^⊥. Also T is one-one on V. To check that T|_{V^⊥} ≡ 0, note that by corollary 6.4

‖T|_{V^⊥}‖ = max_{v∈V^⊥, ‖v‖=1} |〈Tv, v〉|

is attained at an eigenvector of T lying in V^⊥; if this norm were non-zero, its value would be |µ| for some nonzero eigenvalue µ with eigenvector outside V. But all the eigenvectors with non-zero eigenvalue are in V, hence the max is necessarily zero and ‖T|_{V^⊥}‖ = 0.

Finally choose orthonormal bases in each of the Hj's and arrange them as an infinite sequence ϕn, and denote their corresponding eigenvalues by λn. □

Remarks. (i) Whenever there are infinitely many λ's, TH is a proper subspace of H (so 0 ∈ σ(T)). Indeed, y = ∑ an ϕn ∈ TH if and only if ∑ |an/λn|² < ∞, and in this case y = Tz where z = k + ∑ (an/λn) ϕn for some k ∈ ker(T).

(ii) The spectrum of T is either {λn} or {0} ∪ {λn}. Moreover, if λ ∉ σ(T) we can give an explicit formula for the inverse of T − λI: given any y ∈ H we can solve Tx − λx = y by expanding everything in terms of the ϕn's and the decomposition H = V ⊕ K. Comparing coefficients we see that

x = (1/λ) ( ∑ (λj/(λj − λ)) 〈y, ϕj〉 ϕj − y )

and a direct computation shows that the series is convergent and that (T − λI)x = y.

(iii) We can now define a “functional calculus” for compact self adjoint operators. Here are two examples:

For any compact self adjoint operator T with Tx = ∑ λn 〈x, ϕn〉 ϕn we define its absolute value |T| by |T|x = ∑ |λn| 〈x, ϕn〉 ϕn.

As another example we define the square root of a compact non-negative operator T. (Recall that T ≥ 0 means that 〈Tx, x〉 ≥ 0 for all x or, equivalently, λn ≥ 0 for all n.) We then define √T(x) = ∑ λn^{1/2} 〈x, ϕn〉 ϕn.

More generally, h(T) is defined for every compact self-adjoint T and every bounded function h : R → R. (h(T) is then bounded, but not necessarily compact. If we want to ensure it is compact we need to take h which satisfies h(t) → 0 when t → 0.)

*********end of 32 hours*********


6.3. The min-max formula.

Recall that T is non-negative if 〈Tx, x〉 ≥ 0 for every x ∈ H, and that non-negative operators are self adjoint.

Let T be a compact non-negative operator. Then λn > 0, and we shall assume from now on that the eigenvector basis is ordered so that the eigenvalues are monotone nonincreasing, λn ≥ λn+1.

If T is compact and non-negative, it follows from the spectral theorem that

λn = max{〈Tx, x〉 : ‖x‖ = 1, x ⊥ ϕj for j ≤ n − 1}.

While this looks like an “explicit” formula, its application requires the computation, in each step, of all the previous eigenvectors. The main result of this section is a less explicit formula that does not depend on such computations.

Lemma 6.6. Let M and N be two subspaces of H with dim N > dim M. Then there is a z ∈ N with ‖z‖ = 1 and z ⊥ M.

Proof. Let P : H → M be the orthogonal projection. Its restriction to N cannot be one-one, so choose ‖z‖ = 1 in its kernel. □

Theorem 6.7 (The min-max formula). Let T be a non-negative compact operator. Then

λn = min_{dim(M)=n−1} max{〈Tx, x〉 : ‖x‖ = 1, x ⊥ M}.

Proof. Taking M0 = span{ϕj : j ≤ n − 1} we see that

λn = max_{‖x‖=1, x⊥M0} 〈Tx, x〉 ≥ min max 〈Tx, x〉,

and we need to prove that λn ≤ max_{‖x‖=1, x⊥M} 〈Tx, x〉 for every M.

For n = 1 the formula is just Corollary 6.4 and there is nothing to prove. For n > 1, fix any (n − 1)-dimensional subspace M and use the lemma to find z ∈ span{ϕ1, . . . , ϕn} with ‖z‖ = 1 and z ⊥ M. Writing z = ∑_{j≤n} aj ϕj, the monotonicity of the λ's gives

max_{‖x‖=1, x⊥M} 〈Tx, x〉 ≥ 〈Tz, z〉 = 〈∑_{j≤n} λj aj ϕj, ∑_{j≤n} aj ϕj〉 = ∑_{j≤n} λj |aj|² ≥ λn ∑_{j≤n} |aj|² = λn ‖z‖² = λn. □

The next proposition gives several applications of the min-max formula. Note that the operators T and S do not need to commute. (When they do, they have a common diagonalization, and the proposition is trivial.)

Proposition 6.8. Let S and T be compact non-negative operators. Denote their (non-increasing) eigenvalues and eigenvectors by λn, ϕn and µn, ψn respectively. Then

(i) If S ≤ T then λn ≤ µn.

(ii) |λn − µn| ≤ ‖S − T‖. In particular, if Tj ≥ 0 are compact and Tj → T, then λn(Tj) → λn(T) for every n.

(iii) λn+m−1(S + T ) ≤ λm(S) + λn(T ).

Proof. (i) Fix an (n − 1)-dimensional subspace M and choose z ∈ M^⊥ with ‖z‖ = 1 s.t. 〈Sz, z〉 = max_{‖x‖=1, x∈M^⊥} 〈Sx, x〉. Thus

max_{‖x‖=1, x∈M^⊥} 〈Sx, x〉 = 〈Sz, z〉 ≤ 〈Tz, z〉 ≤ max_{‖x‖=1, x∈M^⊥} 〈Tx, x〉.

Choosing now the M that minimizes the right hand side, we see that

λn = min max 〈Sx, x〉 ≤ min max 〈Tx, x〉 = µn.

(ii) A similar argument shows that 〈Sx, x〉 ≤ 〈Tx, x〉 + ‖S − T‖ whenever ‖x‖ ≤ 1, hence λn ≤ µn + ‖S − T‖. The other inequality follows similarly.

(iii) Take M = span{ϕ1, . . . , ϕm−1, ψ1, . . . , ψn−1}. Then dim(M) ≤ n + m − 2 (enlarge M if necessary to a subspace of dimension exactly n + m − 2), hence

λ_{n+m−1}(S + T) ≤ max_{‖x‖=1, x⊥M} 〈(S + T)x, x〉
 ≤ max_{‖x‖=1, x⊥M} 〈Sx, x〉 + max_{‖x‖=1, x⊥M} 〈Tx, x〉
 ≤ max_{‖x‖=1, x⊥{ϕ1,...,ϕm−1}} 〈Sx, x〉 + max_{‖x‖=1, x⊥{ψ1,...,ψn−1}} 〈Tx, x〉
 = λm(S) + λn(T). □
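
A numerical check of (ii) and (iii) for random non-negative matrices (my own illustration, not from the notes):

import numpy as np

rng = np.random.default_rng(4)
def random_nonneg(k):
    A = rng.standard_normal((k, k))
    return A @ A.T                                    # non-negative, self-adjoint

k = 12
S, T = random_nonneg(k), random_nonneg(k)
lam = lambda M: np.sort(np.linalg.eigvalsh(M))[::-1]  # non-increasing eigenvalues
lS, lT, lST = lam(S), lam(T), lam(S + T)

# (ii): |lambda_n(S) - lambda_n(T)| <= ||S - T||
print(np.max(np.abs(lS - lT)) <= np.linalg.norm(S - T, 2) + 1e-10)

# (iii): lambda_{n+m-1}(S+T) <= lambda_m(S) + lambda_n(T)  (1-based indices)
ok = all(lST[n + m - 2] <= lS[m - 1] + lT[n - 1] + 1e-10
         for m in range(1, k + 1) for n in range(1, k + 1) if n + m - 1 <= k)
print(ok)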

6.4. Application to differential equations.

We shall study in some detail only one specific equation, but the same principles apply quite generally. In particular, they apply almost word-by-word to general Sturm-Liouville equations.

Fix f ∈ L2(0, 1) and consider the equation

(∗)  −y″ = f ;  y(0) = y(1) = 0.


The boundary conditions are well defined because y is actually continuously differentiable. Indeed, y′(x) = ∫_0^x f(s) ds + c, and the integral of an L2 function is continuous by Cauchy-Schwarz: if z > x then

|y′(z) − y′(x)| ≤ ∫_x^z |f(s)| ds ≤ √(z − x) (∫_x^z |f(s)|² ds)^{1/2}.

To solve (∗) integrate twice to obtain

y(x) = −∫_0^x ∫_0^t f(s) ds dt + cx + c1.

Hence c1 = y(0) = 0, and y(1) = 0 gives c = ∫_0^1 ∫_0^t f(s) ds dt.

Let us now analyze the solution. Interchanging the order of integration gives

y(x) = −∫_0^x ∫_s^x f(s) dt ds + x ∫_0^1 ∫_s^1 f(s) dt ds
     = ∫_0^x (s − x) f(s) ds + ∫_0^1 x(1 − s) f(s) ds
     = ∫_0^x s(1 − x) f(s) ds + ∫_x^1 x(1 − s) f(s) ds
     = ∫_0^1 g(x, s) f(s) ds = Gf(x)

where G is the integral operator with kernel

g(x, s) = s(1 − x) if s ≤ x, and g(x, s) = x(1 − s) if x ≤ s.

The kernel g is called the Green function of the equation. Note that the transformation to an integral operator took into account both the differential operator and the boundary conditions.

One checks directly that for every f ∈ L2 the function y = Gf is twice differentiable with −y″ = f, and that y satisfies the boundary conditions.

We have thus started with a differential operator Ly = −y″, which is 1-1 because if y″ = 0 then y is linear, but then y(0) = y(1) = 0 imply that y = 0. The operator L is not continuous and is only defined on a dense subspace of L2 (the “smooth functions” satisfying the boundary conditions). But we computed its inverse, and the inverse, G, turns out to be a very nice operator: it is a self-adjoint integral operator with a continuous kernel. In particular G is compact, and by the spectral theorem L2 has a basis of eigenfunctions of G with eigenvalues λn → 0.


Let us compute the eigenvalues: Gf = λf iff Ly = µy with µ = 1/λ, so we need to solve the differential equation

y″ + µy = 0 ;  y(0) = y(1) = 0.

For µ = 0 there is no nonzero solution. For µ < 0 there is also no nonzero solution: the general solution of the equation is y = a e^{x√−µ} + b e^{−x√−µ}, and these functions do not satisfy the boundary conditions.

For µ > 0 the solution of the equation is y = a cos(x√µ) + b sin(x√µ). Thus y(0) = 0 implies that a = 0, and then y(1) = 0 gives √µ = nπ. Hence

µn = n²π² ;  ϕn(x) = √2 sin(nπx).

(Note that the general theory implies that this is a complete orthonormal system for L2(0, 1)! This observation can be modified to give yet another proof that the trigonometric system is complete in L2(0, 2π).)
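
One can also see these eigenvalues numerically by discretizing the Green kernel; a rough sketch of mine (not from the notes):

import numpy as np

# Midpoint discretization of the integral operator G on L2(0, 1).
N = 500
x = (np.arange(N) + 0.5) / N
X, S = np.meshgrid(x, x, indexing='ij')
G = np.where(S <= X, S * (1 - X), X * (1 - S)) / N   # matrix approximating G

eig = np.sort(np.linalg.eigvalsh(G))[::-1]
n = np.arange(1, 6)
print(eig[:5])                      # largest eigenvalues of the discretization
print(1 / (n ** 2 * np.pi ** 2))    # 1/(n^2 pi^2): 0.1013, 0.0253, 0.0113, ...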

We can also give a good description of the domain D of L. It is the image of G, hence y(x) = ∑ an sin(nπx) ∈ D iff

∑ |an/λn|² ∼ ∑ |an|² n⁴ < ∞.

Such functions are, indeed, continuous (they have an absolutely convergent Fourier series), and it can be proved that they are continuously differentiable with second derivative in L2.

*********end of 34 hours*********


7. The Fourier transform

7.1. Basic properties.

In this chapter Lp denotes Lp(R) with the norm ‖f‖p = (∫_{−∞}^{∞} |f|^p)^{1/p} for 1 ≤ p < ∞, and with the essential sup of |f| as ‖f‖∞.

Unlike the bounded interval case there are no inclusions between the Lp spaces for different p's. We shall, however, use the fact that the subspace of continuous functions with bounded support and the subspace of Lr functions (for r > p) with bounded support are dense subspaces of Lp for any p < ∞.

Definition 7.1. The Fourier transform of f ∈ L1 is

f̂(ξ) = (1/√(2π)) ∫_{−∞}^{∞} f(x) e^{−ixξ} dx.

(The choice of √(2π) for the normalization will be explained later.)

The next lemma lists some elementary properties of the Fourier transform.

Lemma 7.2. (i) The Fourier transform is a bounded linear operator from L1 to L∞, and ‖f̂‖∞ ≤ ‖f‖1/√(2π).

(ii) f̂ is continuous.

(iii) (The Riemann-Lebesgue Lemma) If f ∈ L1 then f̂(ξ) → 0 as |ξ| → ∞.

(iv) If fα(x) = f(x + α) then f̂α(ξ) = e^{iαξ} f̂(ξ).

(v) If λ ≠ 0 and g(x) = f(λx) then ĝ(ξ) = f̂(ξ/λ)/|λ|.

(vi) If xf(x) ∈ L1, then f̂ is differentiable, and (f̂)′(ξ) = \widehat{(−ixf)}(ξ). Equivalently, \widehat{(xf)}(ξ) = i (f̂)′(ξ).

(vii) If f ∈ L1 is differentiable and f′ ∈ L1, then \widehat{(f′)}(ξ) = iξ f̂(ξ).

Proof. (i) is obvious. To prove (ii) we start with the observation that its claim is immediate when f has bounded support. For a general f, put fn = f·χ_{[−n,n]}; then the f̂n's are continuous, and by (i) they converge uniformly to f̂, so it too is continuous.

(iii) If f = χ_{[a,b]} is the indicator function of an interval then

f̂(ξ) = (1/√(2π)) ∫_a^b e^{−ixξ} dx = (1/(−iξ√(2π))) e^{−ixξ} |_a^b → 0.


It follows that (iii) also holds for any step function (i.e. a finite sum ∑ ci χ_{[ai,bi]}).

A general function f ∈ L1 can be approximated in the L1 norm by continuous functions with bounded support, hence also by step functions. Thus, given ε > 0 choose a step function g with ‖f − g‖1 < ε and M > 0 s.t. |ĝ(ξ)| < ε whenever |ξ| > M. Then (i) implies that for every |ξ| > M

|f̂(ξ)| ≤ |f̂(ξ) − ĝ(ξ)| + |ĝ(ξ)| < 2ε.

The proofs of (iv) and (v) follow by a change of variable.

(vi) Write (f̂)′(ξ) = lim_{h→0} (1/√(2π)) ∫ f(x) (e^{−i(ξ+h)x} − e^{−iξx})/h dx. As e^z is analytic, there is a C > 0 s.t. |e^{−i(ξ+h)x} − e^{−iξx}| ≤ C|hx| whenever |hx| ≤ 1, and this is certainly true with C = 2 when |hx| > 1. Thus the integrand is bounded by the integrable function C|xf(x)| and hence, by Lebesgue's dominated convergence theorem, we can pass to the limit inside the integral – which gives the desired formula.

(vii) follows by integration by parts. □

*********end of 35 hours*********

The following example will play an important role in this chapter.

Example 7.3. The Fourier transform of e^{−x²/2} is e^{−ξ²/2}. We shall give two proofs. Both will only give that the Fourier transform of e^{−x²/2} is a multiple Ce^{−ξ²/2}. To obtain C = 1 we only need to check that the two functions have the same value at 0 – and this follows immediately from the standard computation that ∫_{−∞}^{∞} e^{−x²/2} dx = √(2π).

The first proof uses parts (vi) and (vii) of the lemma. The function e^{−x²/2} is the unique solution (up to a multiplicative factor) of the linear differential equation y′ + xy = 0. By lemma 7.2(vi)+(vii), F(e^{−x²/2}) also solves this equation, hence F(e^{−x²/2})(ξ) = Ce^{−ξ²/2}.

The second proof uses complex integration. Write

∫ e^{−x²/2} e^{−ixξ} dx = e^{−ξ²/2} ∫ e^{−(x+iξ)²/2} dx,

and we show that the integral on the right is a constant independent of ξ. This follows from Cauchy's theorem when we consider the integral of e^{−z²/2} over the rectangle with vertices (±R, 0) and (±R, iξ), and then let R → ∞.


7.2. The Fourier transform on L2.

The Fourier transform is defined by an explicit formula on L1. It turns out that it can also be defined as an operator on L2. The explicit formula does not hold any more, but it does hold on the dense subspace L1 ∩ L2. (We shall, however, use the smaller dense subspace of all L2 functions with bounded support.) The extension to all of L2 is then obtained by the following elementary proposition, which we quote without a proof.

Proposition 7.4. Let E and F be Banach spaces, and assume that E0 ⊂ E is a dense subspace. If T0 : E0 → F is a bounded linear operator, then T0 admits a unique extension to a bounded linear operator T : E → F. Moreover, if T0 is an (into) isometry, then so is T.

In order to apply the proposition we shall check that the Fourier transform is an isometry from the dense subspace of L2 consisting of functions with bounded support into L2. (This explains the normalization √(2π) for the Fourier transform – it makes it an isometry.)

Proposition 7.5. If f ∈ L2 has bounded support, then f̂ ∈ L2 and ‖f̂‖2 = ‖f‖2.

Proof. We may assume, wlog, that f is supported in the interval [0, 2π]. To avoid confusion between the Fourier transform at the point n and the nth Fourier coefficient f̂(n) of f as a function in L2(0, 2π), we shall use the notation F(f) for the Fourier transform.

Recall that the norm ‖h‖2 of a function h ∈ L2(0, 2π) is ((1/2π) ∫ |h|²)^{1/2} and the Fourier coefficients are ĥ(n) = (1/2π) ∫ h(x) e^{−inx} dx.

Thus the Parseval identity can be rewritten as

(1/2π) ∫_0^{2π} |h|² = ‖h‖2² = ∑ |ĥ(n)|² = ∑ |(1/2π) ∫ h(x) e^{−inx} dx|²

or

(∗)  ∫_0^{2π} |h|² = ∑ |(1/√(2π)) ∫ h(x) e^{−inx} dx|².

Fix now 0 ≤ t ≤ 1 and put ft(x) = e^{−itx} f(x). By (∗)

∫_0^{2π} |f(x)|² dx = ∫_0^{2π} |ft(x)|² dx = ∑ |(1/√(2π)) ∫_0^{2π} f(x) e^{−i(n+t)x} dx|² = ∑_n |F(f)(n + t)|².


Integrating this equation with respect to t ∈ [0, 1] gives

∫_0^{2π} |f(x)|² dx = ∑_n ∫_0^1 |F(f)(n + t)|² dt = ∫_{−∞}^{∞} |F(f)(ξ)|² dξ. □

Combining this proposition with the previous one we obtain

Corollary 7.6. There is an (into) isometry F : L2 → L2 which coincides with the Fourier transform on the subspace of L2 of the functions with bounded support. (In fact it coincides with the Fourier transform on all of L1 ∩ L2.) We call F the Fourier transform on L2.

We shall prove in section 7.4 that F is actually an onto isometry, and we shall give a formula (which also holds only on a dense subspace of L2) for its inverse. To do this we shall show that L2 has an orthonormal basis of eigenvectors of F, and that all the eigenvalues have absolute value 1.

7.3. The Hermite functions.

In this and the next section we denote by P the subspace

P = {f = p(x) e^{−x²/2} ∈ L2 : p is a polynomial}.

Theorem 7.7. (i) P is a dense subspace of L2.

(ii) P has a unique orthonormal basis of the form hn(x) = Hn(x) e^{−x²/2}, where Hn is an nth-degree polynomial with a positive leading coefficient.

Proof. (i) We shall show that if f ∈ L2 satisfies

(∗)  〈f, g〉 = 0 for every g ∈ P,

then f = 0.

able z by

ϕ(z) =

∫ ∞

−∞f(t)e−t2/2e−itzdt.

The function ϕ is well defined and is analytic. Indeed, differentiationunder the integral sign gives that ϕ′(z) =

∫∞−∞ f(t)e−t2/2(−it)e−itzdt.

(It is justified by the estimate∣∣∣ e−it(z+h)−e−itz

h

∣∣∣ ≤ CeC|t| and Lebesgue’s

dominated convergence theorem, as in the proof of lemma 7.2(vi).)More generally, its nth derivative is given by

ϕ(n)(z) =

∫ ∞

−∞f(t)e−t2/2(−it)ne−itzdt.


Taking z = 0 and using (∗) we obtain that the analytic function ϕ satisfies

ϕ^{(n)}(0) = (−i)^n ∫ f(t) t^n e^{−t²/2} dt = 0

for every n, hence ϕ is identically 0, i.e., ∫_{−∞}^{∞} f(t) e^{−t²/2} e^{−itz} dt = 0 for every z. Taking z = ξ real gives that the Fourier transform of f(t)e^{−t²/2} vanishes for every ξ. But the Fourier transform is an (into) isometry, hence f(t)e^{−t²/2} = 0 as an element of L2, in particular f(t)e^{−t²/2} = 0 a.e. This implies that also f(t) = 0 a.e., and hence f = 0 as an element of L2.

*********end of 37 hours*********

(ii) To obtain the hn's, just apply the Gram-Schmidt procedure to the functions x^n e^{−x²/2}. The explicit formulas of this procedure determine the orthonormal system uniquely up to constants λn with |λn| = 1, which we choose so that Hn has a positive leading coefficient. □

Definition 7.8. The polynomials Hn are called the Hermite polynomials, and the hn are called the Hermite functions.

Lemma 7.9. The Hermite functions satisfy hn(−x) = (−1)^n hn(x).

Proof. The functions hn(−x) are also an orthonormal basis for P with deg Hn(−x) = n. The uniqueness of such a basis implies that we only need to check that the sign of the leading coefficient of Hn(−x) is (−1)^n, and for this we need only check that this is so when we substitute −x for x in x^n e^{−x²/2}, which is obvious. □

7.4. Plancherel's theorem.

Recall that T : H1 → H2 is an (into) isometry iff it preserves the inner product, i.e. 〈Tx, Ty〉 = 〈x, y〉 for all x, y ∈ H1.

Part (ii) of the following theorem is the Plancherel Theorem. Part (iii) is the inversion formula for the Fourier transform F. We denote by R = R^{−1} the isometric reflection Rf(x) = f(−x) on L2.

Theorem 7.10. (i) The Hermite functions hn are eigenfunctions of F with eigenvalues (−i)^n.

(ii) (Plancherel's Theorem) F is an isometry of L2 onto itself. In particular the Parseval identity 〈f, g〉 = 〈f̂, ĝ〉 holds for all f, g ∈ L2.

(iii) F²(f) = Rf for every f ∈ L2. In particular F^{−1} = RF.

If f, f̂ ∈ L1 ∩ L2 then F^{−1} is given by the “inversion formula”

f(x) = (1/√(2π)) ∫_{−∞}^{∞} f̂(ξ) e^{ixξ} dξ.


Proof. (i) Being a linear (into) isometry, F takes orthonormal systems to orthonormal systems, thus {Fhn} is an orthonormal system.

By example 7.3, F(e^{−x²/2})(ξ) = e^{−ξ²/2}. Thus lemma 7.2(vi) implies that Fhn is a linear combination of derivatives up to order n of e^{−ξ²/2}, and hence has the form pn(ξ) e^{−ξ²/2} with deg(pn) = n.

It follows that F maps P into itself, and by the uniqueness of such a basis Fhn = λn hn for some constants λn with |λn| = 1, i.e., pn = λn Hn.

To compute λn we only need to compute the leading coefficient of pn. To this end, it suffices to compute the leading coefficient in F(x^n e^{−x²/2}) which, by Lemma 7.2(vi) again, is i^n times the leading coefficient in the nth derivative of e^{−ξ²/2}, i.e., (−1)^n. Thus λn = (−1)^n i^n = (−i)^n.

(ii) F is an isometry of the dense subspace P onto itself, hence an isometry of L2 onto itself.

(iii) By part (i)

F(f) = ∑ (−i)^n 〈f, hn〉 hn

for every f ∈ L2. Since hn(−x) = (−1)^n hn(x) (lemma 7.9), we obtain

F²(f)(x) = ∑ ((−i)^n)² 〈f, hn〉 hn(x) = ∑ (−1)^n 〈f, hn〉 hn(x) = ∑ 〈f, hn〉 hn(−x) = Rf(x).

The inversion formula follows directly. □
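
A crude numerical illustration of Plancherel's theorem (mine, not from the notes), for f(x) = e^{−x²}, whose transform with this normalization is (1/√2)e^{−ξ²/4}:

import numpy as np

x = np.linspace(-20, 20, 4001)
dx = x[1] - x[0]
f = np.exp(-x ** 2)

xi = np.linspace(-20, 20, 4001)
dxi = xi[1] - xi[0]
# Fourier transform on a grid of frequencies, by Riemann sums.
F = np.array([np.sum(f * np.exp(-1j * x * s)) * dx for s in xi]) / np.sqrt(2 * np.pi)

print(np.sum(np.abs(f) ** 2) * dx)        # ||f||_2^2 = sqrt(pi/2) ~ 1.2533
print(np.sum(np.abs(F) ** 2) * dxi)       # equals ||f||_2^2, as Plancherel predicts
print(np.max(np.abs(F - np.exp(-xi ** 2 / 4) / np.sqrt(2))))   # transform formula check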

7.5. The Radon transform.

The Fourier transform of functions of n variables is defined by

f̂(ξ) = (1/(2π)^{n/2}) ∫_{R^n} f(x) e^{−i〈x,ξ〉} dx

where x = (x1, . . . , xn), ξ = (ξ1, . . . , ξn) and dx = dx1 · · · dxn. Everything we did in one dimension generalizes.

As a nice application we prove that the Radon transform of functions of two variables is one-one. This is a prototype of a number of theorems that form the basis for tomography.

Definition 7.11. Let f be a measurable function on R² whose restriction to any line L ⊂ R² is integrable. The Radon transform of f is the function on the set of all lines in the plane defined by

R(f)(L) = ∫_L f.

Theorem 7.12 (Radon). The Radon transform is one-one on the L2 functions with compact support.


Proof. We need to show that if f ∈ L2 has bounded support and satisfies ∫_L f = 0 for every line L, i.e., ∫_{−∞}^{∞} f(x, ax + b) dx = 0 for every a and b, then f = 0.

Fix α and compute 2π f̂(−αξ, ξ) = ∫∫ f(x, y) e^{−i(−αxξ+yξ)} dx dy. Substitute x = x and t = −αx + y. Then the Jacobian is 1, and hence the integral is ∫∫ f(x, αx + t) e^{−iξt} dx dt. But e^{−iξt} is independent of x, and thus the inner integral vanishes by our assumption.

It follows that f̂ ≡ 0, and thus f = 0 as an element in L2. □

A similar theorem holds for any L1 function. All we need is the theorem (which we did not prove) that the Fourier transform is one-one on L1.

It is a very important practical problem to obtain estimates on f when we only know that its integral vanishes on some fixed finite set of lines (where we may, in addition, make some assumptions on f, for example that it is continuous or even smooth).

*********end of 39 hours*********