
Dvoretzky’s Theorem and Concentration of Measure

David Miyamoto

November 20, 2016

Version 5

Contents

1 Introduction 1

1.1 Convex Geometry Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Motivation and the Gaussian Reformulation . . . . . . . . . . . . . . . . . . . . 3

1.3 The Full Dvoretzky Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2 Discretization Lemmas 10

2.1 Lemma 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2 Lemma 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3 The Gaussian Reformulation 14

4 The Extended Dvoretzky-Rogers Lemma 16

4.1 Lewis’ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

4.2 Lemma 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

4.3 The Dvoretzky-Rogers Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.4 Talagrand’s Proposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4.5 The Extended Dvoretzky-Rogers Lemma . . . . . . . . . . . . . . . . . . . . . . 25


5 Dvoretzky’s Theorem 27

5.1 Lemma 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

5.2 Dvoretzky’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

6 Alternate Proof With Min-Max Theorem 30

6.1 Gaussian Min-Max Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

6.2 Gordon’s Theorem 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

6.3 Dvoretzky’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

A Appendix 33

A.1 Estimating the Expectation of a Gaussian Vector . . . . . . . . . . . . . . . . . 33

A.2 Lemma 3: The Infinite Dimensional Case . . . . . . . . . . . . . . . . . . . . . . 35

A.3 Estimating δ(ε) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

A.4 Proving the Gauge Gives a Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

1 Introduction

Dvoretzky’s Theorem is a result in convex geometry first proved in 1961 by Aryeh Dvoretzky.

In informal terms, the theorem states that every compact, symmetric, convex (CSC) subset

of RN has a relatively high-dimensional slice that resembles an ellipsoid. For some intuition

on the geometry involved, one could imagine 2-dimensional slices of a 3-dimensional cube.

While the squares and rectangles are rather far from a 2-dimensional ellipsoid (a disk), the

hexagon is somewhat closer. One could imagine that in higher dimensional cubes, we could

do even better than a hexagon. Note that the 1-dimensional slice is a line segment, which

is exactly an ellipsoid in 1 dimension. As this is the case for any 1-dimensional slice of any

CSC body, we see that the “relatively high-dimensional” part of Dvoretzky’s theorem is what

keeps it from being a triviality.

The body of these notes relies heavily on Gilles Pisier's book The Volume of Convex Bodies and Banach Space Geometry, as well as on Yehoram Gordon's unpublished manuscript "Applications of the Gaussian Min-Max Theorem".

1.1 Convex Geometry Basics

We will now formalize the language used above in order to introduce the two versions of

Dvoretzky’s theorem we will prove. Let (E, ∥ ⋅ ∥E) and (F, ∥ ⋅ ∥F ) be two N -dimensional

Banach spaces (i.e. complete normed vector spaces). Since any two vector spaces of the same

dimension are isomorphic (in the sense that there exists an invertible linear map between

them), we can define the Banach-Mazur distance between F and E as:

d(E,F) = inf{∥T∥∥T⁻¹∥ ∣ T ∶ E → F, T is linear and invertible}. (1)

Note that ∥ ⋅ ∥ denotes the appropriate operator norm. For instance,

∥T∥ = sup{∥Tα∥F ∣ α ∈ E, ∥α∥E = 1}. (2)

Now d(E,F ) is at least 1 since 1 = ∥TT −1∥ ≤ ∥T ∥∥T −1∥. So d(E,F ) = 1 when E and F

are isometric. Then we may say E and F are close, or (1 + ε)-isomorphic, if d(E,F ) ≤ 1 + ε

for some small ε. An invertible map T ∶ E → F such that ∥T ∥∥T −1∥ ≤ 1+ ε may be viewed as

almost an isometry.

It will later be useful to develop an equivalent characterization of the Banach-Mazur

distance. Suppose that x1, . . . , xN is a basis of E and y1, . . . , yN is a basis of F . Let T denote

the isomorphism xk ↦ yk. Then d(E,F) ≤ 1 + ε if for all α = ∑αixi ∈ E:

(1/√(1+ε)) ⋅ ∥α∥E ≤ ∥Tα∥F = ∥∑αiyi∥F ≤ √(1+ε) ⋅ ∥α∥E. (3)

To verify this, first take any ∥α∥E = 1. Then ∥Tα∥F ≤ √(1+ε), so ∥T∥ ≤ √(1+ε). Now take any ∥β∥F = 1; applying (3) to α = T⁻¹β gives (√(1+ε))⁻¹ ⋅ ∥T⁻¹β∥E ≤ ∥TT⁻¹β∥F = 1, so ∥T⁻¹β∥E ≤ √(1+ε). Thus ∥T⁻¹∥ ≤ √(1+ε). Hence ∥T∥∥T⁻¹∥ ≤ 1 + ε, and E is indeed (1 + ε)-isomorphic to F. This is an especially useful characterization of the Banach-Mazur distance, since (3) is much easier to check than our first definition.
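Criterion (3) is also convenient computationally. As an illustration (not from the notes), take E = ℓ1², F = ℓ2² and T the coordinate identity map; estimating ∥T∥ and ∥T⁻¹∥ by sampling the unit spheres gives the upper bound ∥T∥∥T⁻¹∥ on d(E, F). For this pair a known computation gives d(ℓ1², ℓ2²) = √2, so the identity map is in fact optimal. All names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
theta = rng.uniform(0, 2 * np.pi, 200000)
pts = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # unit l2 sphere in R^2

# T = identity map from E = (R^2, ||.||_1) to F = (R^2, ||.||_2).
# ||T|| = sup{||x||_2 : ||x||_1 = 1}; rescale samples onto the l1 sphere.
l1_pts = pts / np.abs(pts).sum(axis=1, keepdims=True)
T_norm = np.linalg.norm(l1_pts, axis=1).max()   # should approach 1

# ||T^{-1}|| = sup{||x||_1 : ||x||_2 = 1}
Tinv_norm = np.abs(pts).sum(axis=1).max()       # should approach sqrt(2)

print(T_norm * Tinv_norm)   # upper bound on d(l1^2, l2^2)
```

Sampling only certifies an upper bound on each operator norm up to discretization error; with 200,000 angles the product is within rounding of √2.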

As we now have a notion of distance between Banach spaces, it makes sense to draw

a parallel between Banach spaces and convex bodies. More precisely, we will describe the

bijection - up to isomorphism - between Banach spaces and CSC bodies in RN . Let (E, ∥ ⋅∥E)

be an N-dimensional Banach space. Then its unit ball {v ∈ E ∣ ∥v∥E ≤ 1} can be identified

with a CSC body BE in RN via any linear isomorphism u ∶ E → RN . Conversely, take any

CSC body K in RN . Define the gauge ∥ ⋅ ∥K ∶ RN → [0,∞) of K by

∥x∥K = inf{r > 0 ∣ x ∈ rK}. (4)

It is shown in section (A.4) that the gauge of K is a norm on RN , making (RN , ∥ ⋅ ∥K)

a Banach space (completeness is automatic in finite dimensions). Thus, every Banach space E

admits a CSC body BE in RN , and vice versa. Therefore, any statement about CSC bodies

in RN can be interpreted as a statement about Banach spaces. With this correspondence,

we present and interpret a version of Dvoretzky’s theorem proved by Vitali Milman in 1971.
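The gauge (4) is easy to evaluate numerically for a concrete body. Below is a minimal sketch (mine, not from the notes: the names `gauge` and `in_K` are illustrative), which uses only the fact that for a CSC body K containing 0 in its interior, {r > 0 ∣ x ∈ rK} is an upward-closed interval, so bisection on r converges to the infimum.

```python
import numpy as np

def gauge(x, in_K, hi=1e6, tol=1e-9):
    """Gauge ||x||_K = inf{r > 0 : x in rK}, computed by bisection.

    in_K(y) must test membership y in K for a compact, symmetric, convex
    body K with 0 in its interior; then x in r*K is monotone in r.
    """
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if in_K(x / mid):   # x in mid*K  <=>  x/mid in K
            hi = mid        # mid is feasible, so the infimum is <= mid
        else:
            lo = mid
    return hi

x = np.array([3.0, 4.0])
l1_ball = lambda y: np.abs(y).sum() <= 1.0      # K = unit l1 ball
l2_ball = lambda y: np.linalg.norm(y) <= 1.0    # K = unit l2 ball
print(gauge(x, l1_ball))  # ~7.0, the l1 norm of x
print(gauge(x, l2_ball))  # ~5.0, the Euclidean norm of x
```

With K the unit ball of a norm, the gauge recovers that norm, which is exactly the correspondence between CSC bodies and Banach spaces described above.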

1.2 Motivation and the Gaussian Reformulation

Theorem (Dvoretzky's Theorem). For each ε > 0 there is a number η(ε) > 0 with the following property. Let (E, ∥ ⋅ ∥) be an N-dimensional Banach space. Then E contains a subspace F of dimension n = [η(ε) logN] such that d(F, ℓ₂ⁿ) ≤ 1 + ε.

The spaces E and F correspond to a CSC body BE in RN and a slice of this body through

the origin, respectively. The slice F has dimension of order logN, and d(F, ℓ₂ⁿ) ≤ 1 + ε means

the slice F is close to an ellipsoid. The parameter ε therefore dictates how close F is to an

ellipsoid. What follows is an outline of the proof of this theorem using the concentration of

measure phenomenon. To build motivation, we work backwards.

To prove Dvoretzky’s theorem, we will show F can be found by choosing random sub-

spaces of dimension n. To generate these random subspaces, take a linearly independent

collection of 1 ≤ m ≤ N vectors z1, . . . , zm in E. The magnitude of m determines an up-

per bound for the dimension of the subspace F which we will ultimately create from the

zk. Although the work immediately following is valid for any choice of zk and m, to prove

Dvoretzky’s theorem in full we will later need to choose these parameters in a special way.

For instance, we will require m to be of order √N. But for now, the zk and m remain

arbitrary.

Fix some 1 ≤ n ≤ m. Let {gi,k ∣ 1 ≤ i ≤ n, 1 ≤ k ≤ m} be independent standard Gaussian random variables on some probability space (Ω,F,P). Define Xi = ∑_{k=1}^m gi,kzk, giving a random vector in E. Notice that E∥Xi∥ = E∥Xj∥ for all i and j, so we will denote X1 by X and M = E∥X∥. Since choosing the basis vectors of F in the unit ball of E makes the analysis cleaner, let Yi = Xi/M; this gives E∥Yi∥ = 1, so we expect Yi to have unit length. Take F = span{Y1, . . . , Yn}, which is by construction a random n-dimensional subspace of E whose basis vectors are expected to be in the unit ball of E.

Let ε > 0. (Recall that ℓ₂ⁿ denotes the Banach space (Rⁿ, ∥ ⋅ ∥2), i.e. Rⁿ equipped with the usual Euclidean norm, that square brackets denote the floor function, and that E denotes the expectation.) Our ultimate goal is to show there exists some number η(ε) such that for any n ≤ [η(ε) logN],


∃ω ∈ Ω s.t. d(F (ω), `n2) ≤ 1 + ε (⋆)

This result is a bit beyond our reach at this moment. Instead, we will determine weaker

conditions on n that guarantee (⋆) holds. Recalling (3) from earlier in this introduction,

suppose n is such that for all α ∈ `n2 ,

(1/√(1+ε)) ⋅ ∥α∥2 ≤ ∥∑αiYi(ω)∥ ≤ √(1+ε) ⋅ ∥α∥2. (5)

Then by the discussion surrounding (3), it follows that (⋆) holds. Now, verifying (5)

directly is challenging. However, we have a result, named Lemma 2 and proven in (2.2), that

allows us to discretize the situation. Lemma 2 states that there exists some δ(ε) = δ > 0 with

the following property. If for some δ-net A of the unit sphere in ℓ₂ⁿ (denoted S^{n−1}), we have

∀α ∈ A, ∣∥∑αiYi(ω)∥ − 1∣ ≤ δ, (6)

then (5) holds, as desired. Take this δ. Our goal has now become to show that (6) holds

for some δ-net A of Sn−1 and ω ∈ Ω. But for any δ-net A, an ω satisfying (6) exists if

P (∀α ∈ A, ∣∥∑αiYi∥ − 1∣ ≤ δ) > 0 or equivalently (7)

P (∃α ∈ A s.t. ∣∥∑αiYi∥ − 1∣ > δ) < 1. (8)

Therefore we want to choose n so that a δ-net A of S^{n−1} satisfying (8) exists. (Recall the definition of a δ-net: given a set X ⊆ ℓ₂ⁿ and δ > 0, a subset A ⊆ X is a δ-net of X provided that for all points x ∈ X, there exists some a ∈ A such that ∥x − a∥2 ≤ δ.) Take a moment here to convince oneself that finding such an n will ensure that we have satisfied our ultimate goal, namely shown that (⋆) holds. Now, by using subadditivity and the fact Yi = Xi/E∥X∥, we see that the probability in (8) is bounded above by

∣A∣ ⋅ P (∣∥∑αiXi∥ −E ∥X∥∣ > δE ∥X∥) , for any α ∈ A. (9)

We want to show this quantity can be forced below 1 by the proper choice of n. It is

this desire to bound the probability in (9) that brings us to the concentration of measure

phenomenon. Recall that since X = ∑_{k=1}^m g1,kzk and the zk are linearly independent, the third example of the concentration of measure phenomenon in the online notes gives that

P(∣∥X∥ − E∥X∥∣ ≥ δE∥X∥) ≤ 2 exp(−δ²d(X)/4). (10)

The probabilities in (10) and (9) appear quite similar; in fact, it turns out they are equal.

Therefore we can use the bound in (10) to force the probability in (9) to be small. First,

however, we will prove that these probabilities are equal; this is done by showing X and

∑αiXi have the same distribution. By definition of the Xi, we have

∑αiXi = ∑_{i=1}^n αi (∑_{k=1}^m gi,kzk) = ∑_{k=1}^m (∑_{i=1}^n αigi,k) zk. (11)

Now, ∑_{i=1}^n αigi,k is a random variable distributed normally with mean 0 and variance ∑_{i=1}^n αi². But since α ∈ A and A ⊆ S^{n−1}, we know ∥α∥2 = 1. Therefore ∑_{i=1}^n αigi,k ∼ N(0,1). So we see that X and ∑αiXi are both sums of the form ∑_{k=1}^m gkzk, where the gk ∼ N(0,1) are independent. We may then conclude that X and ∑αiXi have the same distribution.
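The computation in (11) can be checked empirically. The sketch below (an illustration, not part of the notes) draws the n × m Gaussian matrix of the gi,k, forms the coefficients ∑_i αi gi,k for a unit vector α, and verifies that their sample covariance is close to the m × m identity, so the coefficient vector is distributed like a fresh standard Gaussian vector, as claimed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, samples = 3, 4, 20000

alpha = rng.standard_normal(n)
alpha /= np.linalg.norm(alpha)          # unit vector, as for alpha in S^{n-1}

# For each sample, coefficients c_k = sum_i alpha_i g_{i,k}
G = rng.standard_normal((samples, n, m))
c = np.einsum("i,sik->sk", alpha, G)    # shape (samples, m)

cov = np.cov(c, rowvar=False)           # should be close to the identity I_m
print(np.round(cov, 2))
```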

Continuing from earlier (and recalling the notation d(X) = (E∥X∥/σ(X))², where σ(X) = sup{(ζ(z1)² + ⋯ + ζ(zm)²)^{1/2} ∣ ζ ∈ E∗, ∥ζ∥ ≤ 1}), we can now use (10) to say that


P(∣∥∑αiXi∥ − E∥X∥∣ > δE∥X∥) ≤ 2 exp(−δ²d(X)/4). (12)

This means we finally have an upper bound for the quantity in (9), namely

2 exp(log ∣A∣ − δ²d(X)/4). (13)

To recap, we have that our ultimate goal (⋆) holds if there exists n such that we can

find a δ-net A of Sn−1 which makes (13) strictly less than 1. As an aside, notice here that

(13) can also be made small by controlling d(X). This is achieved through manipulating the

choice of the zk. In fact, we will perform such manipulations later, for instance in section

(1.3). For now, we will restrict ourselves to controlling n.

Here we introduce a result, named Lemma 1 and proven in section (2.1), which states

that given n and δ > 0, there is a δ-net A of S^{n−1} of size at most f(δ, n) = (1 + 2/δ)ⁿ. Therefore, if we

can find some n (remember δ is fixed by ε by Lemma 2) such that

2 exp(log f(δ, n) − δ²d(X)/4) < 1. (14)

then the δ-net A from Lemma 1 will make (13) less than 1. In particular, notice that if n ≤ (δ²/(8 log(1 + 2/δ))) d(X), then the left side of (14) is at most 2 exp(−δ²d(X)/8). It can be shown this quantity is less than 1 in the proper circumstances. Therefore, if we set η1(ε) = δ²/(8 log(1 + 2/δ)), then any n ≤ [η1(ε)d(X)] satisfies (14). Following our chain of reasoning, this shows (⋆) holds, and we can conclude there exists a subspace F of E that is (1 + ε)-isomorphic to ℓ₂ⁿ. We call this intermediate result the Gaussian reformulation of Dvoretzky's theorem. It is similar to Dvoretzky's theorem, but gives a different upper bound for n that depends on both ε and d(X). To see the detailed proof of this theorem, see section (3).

1.3 The Full Dvoretzky Theorem

We are now in a position to establish Dvoretzky’s theorem in its entirety. As stated above,

our Gaussian reformulation result gives n as a function of ε and d(X), whereas Dvoretzky’s

theorem gives n as a function of ε and logN. Dvoretzky's theorem will therefore follow from its Gaussian reformulation once we show that for some universal constant C,

C logN ≤ d(X). (15)

Unfortunately, the existence of C depends on the construction of X. As alluded to earlier,

this is the point where we need to control the zk (and therefore m). Our goal is to ensure

d(X) remains large, and then to conclude (15) must hold. It turns out we can extract the

necessary zk and m from an extension of the Dvoretzky-Rogers Lemma, which is stated

below.

Lemma (Extended Dvoretzky-Rogers Lemma). There are N1 = [√([N/2])/16] linearly independent elements zk of E such that for all α ∈ R^{N1},

(1/2) sup ∣αk∣ ≤ ∥∑αkzk∥ ≤ 2 (∑ ∣αk∣²)^{1/2}. (16)

Proving this lemma is challenging, and requires its own motivation and preliminary re-

sults. Its proof is also very geometric; the concentration of measure phenomenon has already

exhausted its use. To see the proof and discussion of this lemma, please refer to section (4).

For now, we will show how this lemma completes the proof of Dvoretzky’s theorem.

Choose m = N1 and the zk from the Extended Dvoretzky-Rogers Lemma. We show d(X) = (E∥X∥/σ(X))² is relatively large. To this end, first we prove σ(X) ≤ c for some constant c.


Then all that will remain is to ensure E ∥X∥ is large. By our choice of zk and m, we get c = 2

in the following way:

σ(X) = sup{(ζ(z1)² + ⋯ + ζ(zm)²)^{1/2} ∣ ζ ∈ E∗, ∥ζ∥ ≤ 1} (17)

= sup{∥(ζ(z1), . . . , ζ(zm))∥2 ∣ ∥ζ∥ ≤ 1} (18)

= sup{sup{∑_{k=1}^m αkζ(zk) ∣ ∥α∥2 ≤ 1} ∣ ∥ζ∥ ≤ 1} (19)

= sup{∑_{k=1}^m αkζ(zk) ∣ ∥α∥2 ≤ 1, ∥ζ∥ ≤ 1}. (20)

Now ∑αkζ(zk) = ζ(∑αkzk), which is bounded above by ∥ζ∥ ⋅ ∥∑αkzk∥. As ∥ζ∥ ≤ 1 by necessity, and ∥∑αkzk∥ ≤ 2(∑∣αk∣²)^{1/2} ≤ 2 by the choice of zk and since ∥α∥2 ≤ 1 by necessity, we can conclude σ(X) ≤ 2. With this bound completed, we turn our attention to ensuring

E ∥X∥ is sufficiently large.

Since σ(X) ≤ 2, the inequality C logN ≤ d(X) follows from proving there exists some constant c′ such that c′√(logN) ≤ E∥X∥. To get this bound, first notice that by the choice of the zk,

(1/2) E sup ∣g1,k∣ ≤ E ∥∑ g1,kzk∥. (21)

Then, by a technical result named Lemma 4, proven in section (5.1), there exists some constant c such that c√(logN) ≤ E sup ∣g1,k∣. Taking c′ = c/2, we get E∥X∥ ≥ c′√(logN), and since σ(X) ≤ 2 this gives d(X) = (E∥X∥/σ(X))² ≥ (c′/2)² logN. Taking C = (c′/2)², we have then shown that C logN ≤ d(X). By earlier remarks, we conclude we have proven Dvoretzky's theorem.
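Lemma 4's bound c√(logN) ≤ E sup ∣g1,k∣ is easy to probe numerically. This sketch (mine, with arbitrary illustrative sizes) estimates E max_k ∣gk∣ by Monte Carlo and compares it with √(log m) and the classical upper bound √(2 log(2m)).

```python
import numpy as np

rng = np.random.default_rng(2)
m, samples = 1000, 2000

g = rng.standard_normal((samples, m))
est = np.abs(g).max(axis=1).mean()   # Monte Carlo estimate of E max|g_k|

print(f"E max|g_k| estimate: {est:.3f}")
print(f"sqrt(log m)        : {np.sqrt(np.log(m)):.3f}")
print(f"sqrt(2 log(2m))    : {np.sqrt(2 * np.log(2 * m)):.3f}")
```

The estimate lands between the two reference values, consistent with E max ∣gk∣ growing like √(log m) up to constants.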

The second version of Dvoretzky's theorem we will prove was established by Yehoram Gordon in 1985. This version is identical to the first, except it gives η ∼ ε² and d(F, ℓ₂ⁿ) ≤ (1+ε)/(1−ε). The proof uses the Gaussian Min-Max theorem and an intermediate result, as well as the


Extended Dvoretzky-Rogers Lemma. See section (6) for the details of this proof.

2 Discretization Lemmas

The introduction has outlined how we will prove Milman's version of Dvoretzky's theorem. Herein

we supply the details. We begin with Lemma 1 and Lemma 2, the two discretization lem-

mas. Note that although we use the Euclidean norm ∥ ⋅ ∥2 throughout this section, this is

unnecessarily restrictive; an arbitrary norm on Rn would also suffice.

2.1 Lemma 1

Lemma 1. Denote the unit ball and sphere of `n2 by Bn and Sn−1, respectively. For any

δ > 0, there is a δ-net A of Sn−1 such that

∣A∣ ≤ (1 + 2/δ)ⁿ. (22)

Proof. Let A = {y1, . . . , ym} be a maximal subset of S^{n−1} such that ∥yi − yj∥2 ≥ δ for all i ≠ j (here maximal means no set strictly containing A has all of its points separated by a distance at least δ). By maximality, any other p ∈ S^{n−1} ∖ A is a distance less than δ from some yi. Therefore A is a δ-net of S^{n−1}. We show A satisfies the theorem.

The balls Bi of radius δ/2 centred at yi are pairwise disjoint and contained in the ball B0 of radius 1 + δ/2 centred at the origin. Further, we have vol(Bi) = (δ/2)ⁿ vol(Bn) and vol(B0) = (1 + δ/2)ⁿ vol(Bn). Therefore

m ⋅ (δ/2)ⁿ vol(Bn) = ∑_{i=1}^m vol(Bi) ≤ vol(B0) = (1 + δ/2)ⁿ vol(Bn). (23)

Dividing by (δ/2)ⁿ vol(Bn) and recalling that ∣A∣ = m completes the proof.
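The maximal δ-separated construction in the proof can be imitated on a finite sample of the sphere. The following sketch (illustrative, with arbitrary sample sizes; `greedy_net` is my name) greedily keeps points that are δ-far from all previously kept points; every sampled point then lies within δ of the resulting set, and the count respects the (1 + 2/δ)ⁿ bound of Lemma 1.

```python
import numpy as np

def greedy_net(points, delta):
    """Greedily select a delta-separated subset of `points`.

    Any point not selected was within delta of an earlier selection,
    so the output is a delta-net of the input point set.
    """
    net = []
    for p in points:
        if all(np.linalg.norm(p - q) >= delta for q in net):
            net.append(p)
    return np.array(net)

rng = np.random.default_rng(1)
n, delta = 2, 0.5
pts = rng.standard_normal((2000, n))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)  # points on S^{n-1}

net = greedy_net(pts, delta)
bound = (1 + 2 / delta) ** n   # Lemma 1 bound: 25 for n = 2, delta = 0.5
print(len(net), "<=", bound)
```

The volume argument of Lemma 1 applies verbatim to any δ-separated subset of the sphere, which is why the greedy selection can never exceed the bound.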


2.2 Lemma 2

Lemma 2. For each ε > 0 there exists 0 < δ(ε) = δ < 1 with the following property. Let n

be any natural number. Let A be a δ-net of Sn−1 and let x1, . . . , xn be elements of a Banach

space (B, ∥ ⋅ ∥). If for all α ∈ A we have

1 − δ ≤ ∥∑_{i=1}^n αixi∥ ≤ 1 + δ, (24)

then for all α ∈ Rn,

(1/√(1+ε)) ⋅ ∣∣α∣∣2 ≤ ∥∑_{i=1}^n αixi∥ ≤ √(1+ε) ⋅ ∣∣α∣∣2. (25)

In particular, assuming the xi are linearly independent, the subspace generated by x1, . . . , xn is (1 + ε)-isomorphic to ℓ₂ⁿ.

To develop some intuition, suppose the xi are a basis of B. Then T ∶ Rn → B defined

by T (α) = ∑αixi is a linear isomorphism. The lemma states that if T almost preserves the

norm of elements of A, then T is almost an isometry. Thus, if we want T to be almost an isometry, all we need to ensure is that T almost preserves the lengths of elements of some δ-net. This will prove especially useful if A is finite, such as the net from Lemma 1, since then we can show two spaces are (1 + ε)-isomorphic by considering a finite number of points.

To prove the lemma, we first show that for any 0 < δ < 1 and α in S^{n−1}, we can bound ∥∑αixi∥ above and below by functions of δ. Taking δ small enough will then give (√(1+ε))⁻¹ ≤ ∥∑αixi∥ ≤ √(1+ε). Homogeneity of the norm ∥ ⋅ ∥ will allow us to capture the result for all points in Rⁿ.

Proof. Fix 0 < δ < 1 and assume (24) holds for some A and x1, . . . , xn ∈ B. Take α ∈ S^{n−1}. Since A is a δ-net of S^{n−1}, there exists y0 ∈ A such that ∣∣α − y0∣∣2 = λ1 ≤ δ. Hence we can write α = y0 + λ1α′, where λ1 ≤ δ and α′ = (α − y0)/λ1 ∈ S^{n−1}. As α′ ∈ S^{n−1}, we can similarly write α′ = y1 + λ2α″ where y1 ∈ A, λ2 ≤ δ, and α″ ∈ S^{n−1}. Continuing, we get


α = y0 + λ1(y1 + λ2(y2 + λ3(. . .))) (26)

where each yj ∈ A and λj ≤ δ. Relabelling the product λ1⋯λj by λj, we get α = ∑_{j=0}^∞ λj y^j with y^j ∈ A and λj ≤ δ^j (we take λ0 = 1). We compute

∥∑αixi∥ = ∥∑_{i=1}^n (∑_{j=0}^∞ λj y^j_i) xi∥ (27)

= ∥∑_{j=0}^∞ (λj ∑_{i=1}^n y^j_i xi)∥ (28)

≤ ∑_{j=0}^∞ λj ∥∑_{i=1}^n y^j_i xi∥ by triangle inequality (29)

≤ ∑_{j=0}^∞ δ^j ∥∑_{i=1}^n y^j_i xi∥ since λj ≤ δ^j. (30)

Now ∥∑_{i=1}^n y^j_i xi∥ ≤ 1 + δ for each j, since y^j ∈ A. Therefore summing the geometric series yields

∥∑αixi∥ ≤ ∑_{j=0}^∞ δ^j ∥∑_{i=1}^n y^j_i xi∥ ≤ ∑_{j=0}^∞ δ^j (1 + δ) = (1 + δ)/(1 − δ). (31)

This gives the upper bound for ∥∑αixi∥. For the lower bound, we pick up from (28),

only this time we split the series


∥∑αixi∥ = ∥∑_{j=0}^∞ (λj ∑_{i=1}^n y^j_i xi)∥ from (28) (32)

= ∥∑_{i=1}^n y^0_i xi − (− ∑_{j=1}^∞ λj ∑_{i=1}^n y^j_i xi)∥ recall λ0 = 1 (33)

≥ ∥∑_{i=1}^n y^0_i xi∥ − ∥∑_{j=1}^∞ λj ∑_{i=1}^n y^j_i xi∥ reverse triangle inequality (34)

≥ ∥∑_{i=1}^n y^0_i xi∥ − ∑_{j=1}^∞ δ^j ∥∑_{i=1}^n y^j_i xi∥ since λj ≤ δ^j and triangle inequality. (35)

Since y^0 ∈ A, having (24) implies 1 − δ ≤ ∥∑ y^0_i xi∥. Further, by the same argument used in (31), we can write

∑_{j=1}^∞ δ^j ∥∑_{i=1}^n y^j_i xi∥ ≤ ∑_{j=1}^∞ δ^j (1 + δ) = δ(1 + δ)/(1 − δ). (36)

Putting everything together, we get the lower bound for ∥∑αixi∥ to be

∥∑αixi∥ ≥ 1 − δ − δ(1 + δ)/(1 − δ) = (1 − 3δ)/(1 − δ). (37)

Combining the bounds in (31) and (37), and recalling α ∈ Sn−1 was arbitrary, we have

for all α ∈ Sn−1,

(1 − 3δ)/(1 − δ) ≤ ∥∑αixi∥ ≤ (1 + δ)/(1 − δ). (38)

Now assume ε > 0. Take δ small so that (√(1+ε))⁻¹ ≤ (1 − 3δ)/(1 − δ) and (1 + δ)/(1 − δ) ≤ √(1+ε). Then (38) becomes

(1/√(1+ε)) ≤ ∥∑αixi∥ ≤ √(1+ε). (39)

Hence we have shown (39) holds for all α ∈ S^{n−1}. This completes the first part of the proof. Now take any α ∈ Rⁿ. Homogeneity of ∥ ⋅ ∥ implies

∥∑αixi∥ = ∣∣α∣∣2 ⋅ ∥∑ (αi/∣∣α∣∣2) xi∥. (40)

Since α/∣∣α∣∣2 ∈ Rⁿ is a unit vector, the result in (39) gives

(1/√(1+ε)) ⋅ ∣∣α∣∣2 ≤ ∥∑αixi∥ ≤ √(1+ε) ⋅ ∣∣α∣∣2. (41)

This completes the proof.
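As a sanity check on Lemma 2, one can take vectors xi that are only approximately orthonormal and confirm the sphere-wide bound (38). The setup below is an arbitrary illustration of mine: B = R² with the Euclidean norm and xi a slightly stretched standard basis, so that (24) holds with δ = 0.05 on the whole sphere (hence on any δ-net).

```python
import numpy as np

rng = np.random.default_rng(3)
delta = 0.05
V = np.diag([1.02, 0.98])   # columns x1, x2: a slightly stretched basis of R^2

# Random unit vectors alpha on S^1
alphas = rng.standard_normal((1000, 2))
alphas /= np.linalg.norm(alphas, axis=1, keepdims=True)

norms = np.linalg.norm(alphas @ V, axis=1)   # ||alpha_1 x_1 + alpha_2 x_2||

# Hypothesis (24): every value lies in [1 - delta, 1 + delta]
print(norms.min(), norms.max())

# Conclusion (38) of the chain of bounds in the proof:
lo = (1 - 3 * delta) / (1 - delta)
hi = (1 + delta) / (1 - delta)
print(lo, "<=", norms.min(), "and", norms.max(), "<=", hi)
```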

3 The Gaussian Reformulation

We now state and prove the Gaussian reformulation of Dvoretzky’s theorem. An extensive

introduction of this proof was given in the section (1.2). This part is therefore devoted to

rigorously stating and proving the Gaussian reformulation, foregoing lengthy motivational

discussion.

Definition. Consider a collection of m independent standard Gaussian variables gk on a probability space (Ω,F,P) and a collection z1, . . . , zm from a Banach space (E, ∥ ⋅ ∥). Let X = ∑_{k=1}^m gkzk. We define

σ(X) = sup{(∑ ζ(zk)²)^{1/2} ∣ ζ ∈ E∗, ∥ζ∥ ≤ 1} and d(X) = (E∥X∥/σ(X))². (42)

Also recall the concentration of measure phenomenon. In the setting of the definition,

we have for all δ > 0,


P(∣∥X∥ − E∥X∥∣ ≥ δE∥X∥) ≤ 2 exp(−δ²d(X)/4). (43)

Theorem (Gaussian reformulation). For every ε > 0, there exists a number η1(ε) with the following property. Let m ≤ N, take m independent standard Gaussian random variables gk, and let z1, . . . , zm be a collection of elements of an N-dimensional Banach space (E, ∥ ⋅ ∥). Let X = ∑ gkzk. Then for each n ≤ [η1(ε)d(X)], the space E contains a subspace F of dimension n which is (1 + ε)-isomorphic to ℓ₂ⁿ.

Proof. Fix ε > 0. Take δ > 0 to be the smaller of 1 and δ(ε) obtained from Lemma 2. We claim η1(ε) = δ²/(8 log(1 + 2/δ)) works. Take any n ≤ [η1(ε)d(X)].

Let A be the δ-net of S^{n−1} of cardinality at most (1 + 2/δ)ⁿ, as given by Lemma 1. Set {gi,k ∣ 1 ≤ i ≤ n, 1 ≤ k ≤ m} to be a collection of independent standard Gaussians, and define Xi = ∑_{k=1}^m gi,kzk. Then, for any α ∈ A, both ∑_{i=1}^n αiXi and X have the same distribution, as proven in section (1.2). Therefore, denoting E∥X∥ = M, our concentration inequality gives

P(∣∥∑αiXi∥ − M∣ ≥ δM) ≤ 2 exp(−δ²d(X)/4), and therefore (44)

P(∃α ∈ A s.t. ∣∥∑αiXi∥M⁻¹ − 1∣ ≥ δ) ≤ 2 exp(n log(1 + 2/δ) − δ²d(X)/4). (45)

The second inequality follows from the bound on ∣A∣ and finite subadditivity of P. Since the exponential is an increasing function, we can replace n in the right side of (45) with the larger η1(ε)d(X). Then we can recall the definition of η1(ε) to conclude the right side of (45)

is bounded above by

2 exp(η1(ε)d(X) log(1 + 2/δ) − δ²d(X)/4) = 2 exp(−δ²d(X)/8). (46)


We now consider two cases.

• If the quantity in (46) is larger than or equal to 1, we must have d(X) ≤ 8 log 2/δ². Rearranging and then using δ ≤ 1, we see

η1(ε)d(X) ≤ log 2 / log(1 + 2/δ) < 1. (47)

This means that n can only be taken to be 0. Since any 0-dimensional subspace F of E is (1 + ε)-isomorphic to ℓ₂⁰, the theorem holds.

• Otherwise, if the quantity in (46) is less than 1, we can use the fact that this quantity bounds the probability in (45) to see

P(∃α ∈ A s.t. ∣∥∑αiXi∥M⁻¹ − 1∣ ≥ δ) < 1. (48)

In other words, there exists some ω ∈ Ω such that for all α ∈ A (denoting Yi = Xi/M),

1 − δ ≤ ∥∑αiYi(ω)∥ ≤ 1 + δ. (49)

But then the vectors Yi(ω), the δ-net A, and the choice of δ satisfy the hypotheses of

Lemma 2. So by the lemma, F = span{Y1(ω), . . . , Yn(ω)} is a subspace of E that is (1 + ε)-isomorphic to ℓ₂ⁿ. Thus the theorem holds.

Therefore, we have shown that taking η1(ε) = δ²/(8 log(1 + 2/δ)), where δ = δ(ε) is taken from Lemma 2, satisfies the conclusion of the theorem. This completes the proof.


In appendix (A.3), we show that for ε ≤ 1/9, we can take δ(ε) = ε/9. Hence for sufficiently small ε, the above result gives η1(ε) = ε²/(648 log(1 + 18/ε)), which is of the order ε²/log(1 + 1/ε).
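Plugging in concrete numbers makes the size of η1(ε) vivid. The sketch below (illustrative values of mine) evaluates η1(ε) = ε²/(648 log(1 + 18/ε)) from the formula above, together with the dimension n ≤ [η1(ε) d(X)] it guarantees for a hypothetical value of d(X).

```python
import math

def eta1(eps):
    """eta_1(eps) = eps^2 / (648 * log(1 + 18/eps)), valid for small eps."""
    return eps**2 / (648 * math.log(1 + 18 / eps))

for eps in (0.1, 0.05, 0.01):
    print(f"eps = {eps}: eta1 = {eta1(eps):.3e}")

# Dimension of the almost-ellipsoidal slice guaranteed for a given d(X).
# Since d(X) is of order log N, d(X) = 10^7 corresponds to an astronomically large N.
eps, dX = 0.1, 10**7
print("n <=", math.floor(eta1(eps) * dX))
```

The tiny values of η1 explain why Dvoretzky's theorem only promises slices of dimension logarithmic in N.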

4 The Extended Dvoretzky-Rogers Lemma

Using the notation from section (3), our goal, as outlined in (1.3), is to find X such that d(X)

is bounded below by C logN for some universal constant C. Once this bound is established,

Dvoretzky’s theorem follows from its Gaussian reformulation. Recall X is constructed from

our choice of z1, . . . , zm from E. The appropriate zk and m will come from an extension of

a classical result known as the Dvoretzky-Rogers Lemma. The proof of the lemma relies

on Lewis’ theorem and Lemma 3, which we prove first. Lewis’ theorem in particular is of

singular interest. The extension of the lemma is presented after, and relies on a proposition

due to M. Talagrand.

4.1 Lewis’ Theorem

Here we state and prove Lewis’ theorem, a result shown by D. Lewis in 1979. It is a

generalization of a theorem of F. John concerning ellipsoids of maximal volume in CSC

bodies. First, note Lewis’ theorem deals with an arbitrary norm α on L(Rn,E), the space

of linear maps Rn → E. For our later purposes we will take α to be the usual operator norm.

Regardless, we will reproduce Lewis’ result in full generality.

Definition. Let U and V be vector spaces, and α be a norm on L(U,V ). Then the dual

norm α∗ is a norm on L(V,U) defined by

α∗(v) = sup{tr(vT) ∣ T ∶ U → V, α(T) ≤ 1}. (50)

Theorem (Lewis’ Theorem). Let E be an n-dimensional vector space and let α be a norm

on L(Rn,E). There exists an isomorphism u ∶ Rn → E such that


α(u) = 1 and α∗(u−1) = n. (51)

We interpret this theorem. An ellipsoid in a Banach space (E, ∥ ⋅ ∥) is the image of

the unit ball Bn of `n2 under an isomorphism u ∶ Rn → E; thus we identify ellipsoids with

isomorphisms. Note this identification is not necessarily unique, but its existence is sufficient

for our purposes. If this isomorphism u has operator norm 1, then the ellipsoid u(Bn) is

contained inside BE, the unit ball of E. If we give E some basis, then we can further say

vol(u(Bn)) = c∣det u∣ for some constant c. Hence, maximizing ∣det∣ over the unit ball in

L(Rn,E) (equipped with the operator norm) gives an ellipsoid in BE of maximum volume.

F. John proved in 1948 that this ellipsoid is unique up to an orthogonal transformation.

Hence, ellipsoids of this type are called John ellipsoids.

Lewis’ theorem is an enumeration of some properties of the John ellipsoid of BE, should

we take α to be the operator norm. In particular, the u in Lewis’ theorem is the John

ellipsoid, and (51) helps prove the uniqueness of u should we desire to do so. The proof of

Lewis’ theorem is an exercise in linear algebra, and begins by maximizing det over the unit

ball of L(Rn,E) equipped with an arbitrary norm.

Proof. Let K = {T ∶ Rⁿ → E ∣ α(T) ≤ 1}, the unit ball in L(Rⁿ,E) equipped with the

norm α. Since L(Rn,E) is finite dimensional, K is compact. Consider the determinant map

T ↦ det(T) for some given basis of E. Since ∣det∣ is continuous and K is compact, ∣det∣ attains its maximum over K. Say this occurs at u ∈ K. We must have det(u) ≠ 0 (the maximum is nonzero, since K contains invertible maps), so u is invertible.

We show u satisfies (51) by bounding tr(u−1T ) for any T ∶ Rn → E.

Take T, and note (u + T)/α(u + T) is in K. Hence


∣det( u + Tα(u + T )

)∣ ≤ ∣det(u)∣ by choice of u (52)

∣det(u + T )∣ ≤ α(u + T )n∣det(u)∣ by homogeniety (53)

∣det(u + T )det(u)

∣ ≤ α(u + T )n since det(u) ≠ 0 (54)

∣det(1 + u−1T )∣ ≤ α(u + T )n by multiplicativity. (55)

The triangle inequality for α implies α(u + T)ⁿ ≤ (α(u) + α(T))ⁿ, which is at most (1 + α(T))ⁿ since u is in K. So inequality (55) gives ∣det(1 + εu⁻¹T)∣ ≤ (1 + εα(T))ⁿ, where ε > 0 may be introduced since T was arbitrary (apply the inequality to εT in place of T). For small ε, we have det(1 + εu⁻¹T) = 1 + ε tr(u⁻¹T) + o(ε), and thus for all T ∶ Rn → E

1 + ε tr(u⁻¹T) + o(ε) ≤ (1 + εα(T))ⁿ (56)
tr(u⁻¹T) + o(ε)/ε ≤ ((1 + εα(T))ⁿ − 1)/ε (57)
tr(u⁻¹T) ≤ nα(T) letting ε → 0. (58)

Inequality (58) holds in particular for T = u, giving n = tr(u⁻¹u) ≤ nα(u) and thus 1 ≤ α(u). As u ∈ K we get α(u) = 1. Now (58) further gives tr(u⁻¹T) ≤ n for all T with α(T) ≤ 1, so α∗(u⁻¹) ≤ n. But n = tr(u⁻¹u) ≤ α∗(u⁻¹)α(u) = α∗(u⁻¹) is clear, and we conclude α∗(u⁻¹) = n. Thus u satisfies (51), and the proof is complete.
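As a sanity check, Lewis’ theorem can be tested numerically in the simplest model E = Rⁿ with α the operator norm: there the dual norm α∗ is the nuclear norm, and any orthogonal u maximizes ∣det∣ over K. The following sketch (an assumed illustrative setup using numpy, not part of the original argument) verifies α(u) = 1, α∗(u⁻¹) = n, and the trace bound (58).

```python
# Numeric check of (51) when E = R^n and alpha is the operator (spectral)
# norm. The dual norm alpha* is then the nuclear norm (sum of singular
# values), and any orthogonal u maximizes |det| over {alpha(T) <= 1}.
import numpy as np

rng = np.random.default_rng(0)
n = 5

# A random orthogonal matrix u: alpha(u) = 1 and |det u| = 1 is maximal.
u, _ = np.linalg.qr(rng.standard_normal((n, n)))

alpha_u = np.linalg.norm(u, 2)                              # operator norm
alpha_star_uinv = np.linalg.norm(np.linalg.inv(u), 'nuc')   # dual (nuclear) norm

assert abs(alpha_u - 1) < 1e-10
assert abs(alpha_star_uinv - n) < 1e-10

# Inequality (58): tr(u^{-1} T) <= n * alpha(T) for arbitrary T.
for _ in range(100):
    T = rng.standard_normal((n, n))
    assert np.trace(np.linalg.inv(u) @ T) <= n * np.linalg.norm(T, 2) + 1e-10
```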

4.2 Lemma 3

This lemma is a technical exercise in linear algebra, and of marginal interest in its own right. Although there is a version of the lemma that holds in infinite-dimensional spaces, given in appendix (A.2), we only use the version for finite-dimensional spaces.

Definition. Let T ∶ U → V be a linear map between Banach spaces. Define Kn = {∥T − S∥ ∣ S ∶ U → V, rank(S) < n}, where ∥⋅∥ is the usual operator norm. Then define an(T) = inf Kn. Note here that K1 = {∥T∥}, and thus a1(T) = ∥T∥, since the only S ∶ U → V of rank less than 1 is the zero map.

Lemma 3. Let T ∶ H → X be a linear operator from a Hilbert space (H, ∣⋅∣) of dimension N to a Banach space (X, ∥⋅∥). There exists an orthonormal sequence f1, …, fN in H such that ∥Tfn∥ ≥ an(T) for all n ≤ N.

Proof. We construct the sequence f1, …, fN recursively. By definition a1(T) = ∥T∥ = sup{∥Tv∥ ∣ ∣v∣ = 1}. Now, ∥⋅∥ is continuous, and since H is finite dimensional, T is continuous and the set {v ∈ H ∣ ∣v∣ = 1} is compact. Therefore {∥Tv∥ ∣ ∣v∣ = 1} is compact, and hence there exists some f1 ∈ H with ∣f1∣ = 1 such that a1(T) = ∥Tf1∥. Take f1 to be the first vector in our sequence.

To construct the next vector f2, let S1 = (span{f1})⊥. Then define S ∶ H → X by

Sv = Tv if v ∈ span{f1}, and Sv = 0 if v ∈ S1, (59)

extended to all of H by linearity (note H = span{f1} ⊕ S1).

Clearly S is linear. We also have rank(S) < 2 since S is potentially nonzero only on a subspace of dimension 1. Finally, (T − S)v = T∣S1v for all v ∈ S1, while (T − S)v = 0 for v ∈ span{f1}. It follows that ∥T∣S1∥ = ∥T − S∥, and hence ∥T∣S1∥ is in K2. By the definition of a2(T) we therefore get

∥T∣S1∥ ≥ a2(T). (60)

But, by the same continuity and compactness arguments used above, there must exist some f2 ∈ S1 such that ∥T∣S1f2∥ = ∥T∣S1∥ and ∣f2∣ = 1. Hence by inequality (60) and the definition of the restriction of a map we conclude ∥Tf2∥ ≥ a2(T), as desired. Take f2 to be the second vector in our sequence. Repeating this procedure generates an orthonormal list of vectors f1, …, fN in H with the desired properties, proving the lemma.
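In the special case X = ℓ₂ᴹ the approximation numbers an(T) are exactly the singular values of T (by the Schmidt/Eckart–Young theorem), and the right singular vectors can serve as the orthonormal sequence fn. The sketch below (an assumed Euclidean setup, using numpy) illustrates the conclusion of Lemma 3 in this case.

```python
# Lemma 3 in the Hilbert-to-Hilbert case: the approximation numbers
# a_n(T) are the singular values of T, and the right singular vectors
# form an orthonormal sequence f_n with ||T f_n|| = s_n >= a_n(T).
import numpy as np

rng = np.random.default_rng(1)
N, M = 6, 8
T = rng.standard_normal((M, N))      # a map from l2^N into l2^M

U, s, Vt = np.linalg.svd(T)          # s[0] >= s[1] >= ... are a_1(T), a_2(T), ...
F = Vt                               # rows are the candidate vectors f_n

# The f_n are orthonormal.
assert np.allclose(F @ F.T, np.eye(N))

# ||T f_n|| equals the n-th singular value, as the lemma requires.
for n in range(N):
    assert abs(np.linalg.norm(T @ F[n]) - s[n]) < 1e-10
```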

4.3 The Dvoretzky-Rogers Lemma

We now have the tools to prove the Dvoretzky-Rogers Lemma. We begin, however, with motivation for the proof, as well as a description of the contributions of Lewis’ theorem and Lemma 3.

Lemma (Dvoretzky-Rogers Lemma). Let (E, ∥⋅∥) be an N-dimensional Banach space. Let N̄ = [N/2]. Then there are N̄ linearly independent elements zk in E such that for all α ∈ RN̄

∥∑k≤N̄ αkzk∥ ≤ (∑k≤N̄ ∣αk∣²)^(1/2), (61)

and ∥zk∥ ≥ 1/2 for all k.

First we discuss the geometry of the lemma. Consider the zk from the lemma. Denote F = span{z1, …, zN̄} and let T be the usual isomorphism RN̄ → F defined by ek ↦ zk. Then the first part of the lemma states that ∥Tα∥ ≤ ∥α∥2. In other words, T(BN̄) ⊆ BF. The second part simply means that T does not shrink the basis vectors ek too much. Therefore the Dvoretzky-Rogers Lemma states that there exists an N̄-dimensional subspace F of E such that BF contains the image of BN̄ in such a way that the images of the basis vectors ek remain relatively large. Of course, if we can find such an F (in lieu of the zk), we would expect some basis of F to satisfy the lemma.

Now we motivate the proof. We need to find an N̄-dimensional subspace F of E where the usual isomorphism T ∶ RN̄ → F shrinks the unit ball BN̄ but still keeps the basis vectors ek as big as possible. To find F, we can instead find a map T′ ∶ RN → E that, when restricted to RN̄ (the span of the first N̄ standard basis vectors), behaves like the desired T. We may then take F = span{T′e1, …, T′eN̄}. A good candidate for T′ is some John ellipsoid v of BE, since v shrinks BN into BE in a way that maximizes its volume, thereby making it likely that v keeps the ek large. So we search for John ellipsoids v of BE such that we can take F = span{ve1, …, veN̄}. In other words, we want v where taking zk = vek satisfies both (61) and ∥zk∥ ≥ 1/2 for 1 ≤ k ≤ N̄.

The desired John ellipsoid v is in fact given by ek ↦ ufk = zk, where u ∶ ℓ₂ᴺ → E is the John ellipsoid in Lewis’ theorem (where α is the operator norm) and the fk are from Lemma 3. Below, we verify that this choice of zk indeed satisfies the lemma.

Proof. Take α to be the operator norm on L(ℓ₂ᴺ, E). By Lewis’ theorem, there exists a John ellipsoid u ∶ ℓ₂ᴺ → E such that α(u) = 1 and α∗(u⁻¹) = N. Then by Lemma 3, noting ℓ₂ᴺ is finite dimensional, we can find an orthonormal collection of vectors f1, …, fN in ℓ₂ᴺ such that ∥ufk∥ ≥ ak(u) for all k ≤ N. We now show that ak(u) ≥ 1 − k/N.

Fix k and let S ∶ ℓ₂ᴺ → E be any map with rank(S) < k. Let P be the orthogonal projection onto (ker S)⊥, so that rank(P) < k and ∥u(I − P)∥ ≤ ∥u − S∥, since for every v the vector (I − P)v lies in ker S and has ∣(I − P)v∣ ≤ ∣v∣. Then

N − k ≤ rank(I − P) = tr(I − P) since P is idempotent (62)
= tr(u⁻¹u(I − P)) (63)
≤ α∗(u⁻¹)∥u(I − P)∥ by definition of α∗ (64)
= N∥u(I − P)∥ since α∗(u⁻¹) = N. (65)

Rearranging gives ∥u − S∥ ≥ ∥u(I − P)∥ ≥ 1 − k/N. As S was an arbitrary map of rank less than k, the definition of ak(u) gives ak(u) ≥ 1 − k/N, as we set out to show. So, by choice of the fk, we have ∥ufk∥ ≥ ak(u) ≥ 1 − k/N, and in particular ∥ufk∥ ≥ 1/2 for k ≤ N̄.

Thus, take zk = ufk for k ≤ N̄. These vectors are linearly independent since u is an isomorphism and the fk are orthonormal, and since α(u) = 1 we also have ∥∑αkzk∥ = ∥u(∑αkfk)∥ ≤ ∣∑αkfk∣ = (∑∣αk∣²)^(1/2), which is (61). The proof is complete.
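The trace-duality step (62)-(65) can be checked numerically in the Euclidean model E = ℓ₂ᴺ with u orthogonal, where α(u) = 1 and α∗(u⁻¹) = N. This is an illustrative sketch of the inequality chain only, not the general Banach-space argument.

```python
# Numeric check of the chain (62)-(65) for E = l2^N and u orthogonal,
# so that alpha(u) = 1 and alpha*(u^{-1}) = N (nuclear norm of u^{-1}).
import numpy as np

rng = np.random.default_rng(2)
N, k = 10, 4

u, _ = np.linalg.qr(rng.standard_normal((N, N)))   # a John ellipsoid of B_2^N

# Random orthogonal projection P with rank k-1 < k.
basis = np.linalg.qr(rng.standard_normal((N, k - 1)))[0]
P = basis @ basis.T

lhs = N - k
tr_term = np.trace(np.linalg.inv(u) @ u @ (np.eye(N) - P))  # = tr(I - P)
rhs = N * np.linalg.norm(u @ (np.eye(N) - P), 2)

assert abs(tr_term - (N - (k - 1))) < 1e-8   # tr(I - P) = rank(I - P)
assert lhs <= tr_term <= rhs + 1e-8          # the chain (62) <= ... <= (65)
assert np.linalg.norm(u @ (np.eye(N) - P), 2) >= 1 - k / N - 1e-8
```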

4.4 Talagrand’s Proposition

Comparing the Dvoretzky-Rogers Lemma we just proved with its extension used in the introduction, we observe some similarities. Namely, for α in Euclidean space, both bound ∥∑αkzk∥ above by a multiple of ∥α∥2. However, whereas the Dvoretzky-Rogers Lemma gives [N/2] vectors zk of norm larger than 1/2, its extension gives [√([N/2])/16] vectors zk such that ∥∑αkzk∥ is at least half the supremum norm of α. With this comparison in mind, we will show that the vectors in the extension are (up to a rescaling) a subset of the [N/2] vectors zk from the lemma.

That such a subset exists is due to a result of M. Talagrand, named Proposition 2 in his 1995 paper “Embedding of ℓ∞ᵏ and a Theorem of Alon and Milman”. We adapt his proof here.

Lemma (Talagrand’s Proposition). Suppose (E, ∥⋅∥) is a Banach space and z1, …, zm is a collection of vectors in E such that ∥zi∥ ≥ 1 for i = 1, …, m. Let

Zm = sup{∑i≤m ∣ζ(zi)∣ ∣ ζ ∈ E∗, ∥ζ∥ ≤ 1}. (66)

There is a subset A of {1, …, m} with ∣A∣ ≥ m/(8Zm) such that for all collections of real numbers {αi}i∈A, we have

∥∑i∈A αizi∥ ≥ (1/2) supi∈A ∣αi∣. (67)

Proof. Let δ = 1/(4Zm) < 1. Let δ1, …, δm be a collection of independent Bernoulli random variables on some probability space (Ω, F, P) such that δi ∼ Bernoulli(δ). For each 1 ≤ i ≤ m, fix ζi ∈ E∗ such that ζi(zi) = 1 and ∥ζi∥ ≤ 1; this is possible by Hahn–Banach since ∥zi∥ ≥ 1. For convenience, denote ζi(zj) by zij. By choice of the ζi, we see

∑j≤m ∣zij∣ = ∑j≤m ∣ζi(zj)∣ ∈ {∑j≤m ∣ζ(zj)∣ ∣ ζ ∈ E∗, ∥ζ∥ ≤ 1}. (68)

Therefore ∑j≤m ∣zij∣ ≤ Zm. Let K = {(i, j) ∣ 1 ≤ i ≠ j ≤ m}. Now we can compute

E(∑(i,j)∈K δiδj∣zij∣) = ∑(i,j)∈K (E δi)(E δj)∣zij∣ as E is linear and δi, δj are independent (69)
= δ² ∑(i,j)∈K ∣zij∣ since E δi = δ (70)
≤ δ²mZm since each ∑j≤m, j≠i ∣zij∣ ≤ ∑j≤m ∣zij∣ ≤ Zm (71)
= mδ/4 by definition of δ. (72)

Using the linearity of the expectation, followed by the above inequality, then allows us to write

E(∑i≤m δi − 2∑(i,j)∈K δiδj∣zij∣) ≥ mδ − 2 ⋅ (mδ/4) = mδ/2. (73)

Therefore there exists ω ∈ Ω such that

∑i≤m δi(ω) − 2∑(i,j)∈K δi(ω)δj(ω)∣zij∣ ≥ mδ/2. (74)

Let I = {1 ≤ i ≤ m ∣ δi(ω) = 1}. Of course, for all i ∉ I, we necessarily have δi(ω) = 0. Then we have

∑i≤m δi(ω) = ∣I∣ and (75)
2∑(i,j)∈K δi(ω)δj(ω)∣zij∣ = 2∑i,j∈I, i≠j ∣zij∣. (76)

This means that (74) becomes

∣I∣ − 2∑i,j∈I, i≠j ∣zij∣ ≥ mδ/2, i.e. ∑i∈I (1 − 2∑j∈I∖{i} ∣zij∣) ≥ mδ/2. (77)

Now for each i ∈ I, we have 1 − 2∑j∈I∖{i} ∣zij∣ ≤ 1. Since the sum of these ∣I∣ terms is at least mδ/2, it follows that at least mδ/2 of these terms are nonnegative. In other words, if we take A to be the set of indices of these nonnegative terms, namely

A = {i ∈ I ∣ 0 ≤ 1 − 2∑j∈I∖{i} ∣zij∣} = {i ∈ I ∣ ∑j∈I∖{i} ∣zij∣ ≤ 1/2}, (78)

then ∣A∣ ≥ mδ/2 = m/(8Zm). We show A is the subset of {1, …, m} that satisfies the theorem. Fix a collection of real numbers {αi}i∈A and take l ∈ A such that ∣αl∣ = supi∈A ∣αi∣. Then

∥∑i∈A αizi∥ ≥ ∣ζl(∑i∈A αizi)∣ since ∥ζl∥ ≤ 1 (79)
= ∣αlζl(zl) + ∑i∈A∖{l} αiζl(zi)∣ (80)
≥ ∣αlζl(zl)∣ − ∑i∈A∖{l} ∣αi∣∣ζl(zi)∣ by the reverse and regular triangle inequalities (81)
≥ ∣αl∣(∣ζl(zl)∣ − ∑i∈A∖{l} ∣ζl(zi)∣) replacing each ∣αi∣ with the larger ∣αl∣ (82)
= ∣αl∣(1 − ∑i∈A∖{l} ∣zli∣) since ζl(zl) = 1 (83)
≥ ∣αl∣ ⋅ (1/2) since ∑i∈A∖{l} ∣zli∣ ≤ ∑i∈I∖{l} ∣zli∣ ≤ 1/2 as l ∈ A (84)
= (1/2) supi∈A ∣αi∣ by choice of αl. (85)

Since the collection {αi}i∈A was arbitrary, A indeed satisfies the theorem.
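The random-selection step can be simulated. With synthetic values ∣zij∣ (assumed data with zii = 1 and row sums bounded by Zm), some realization of the Bernoulli selectors achieves the bound (74), and the resulting set A then has the guaranteed size; a minimal sketch:

```python
# Monte-Carlo illustration of the selection argument: with
# delta = 1/(4 Z_m), some realization of the Bernoulli selectors yields
# a set I whose "good" subset A satisfies |A| >= m*delta/2 = m/(8 Z_m).
# The |z_ij| values are synthetic, not a genuine Banach-space example.
import random

random.seed(0)
m = 200
# z[i][i] = 1 with small off-diagonal entries; row sums bound Z_m.
z = [[1.0 if i == j else random.uniform(0, 0.05) for j in range(m)] for i in range(m)]
Zm = max(sum(abs(v) for v in row) for row in z)
delta = 1 / (4 * Zm)

while True:
    sel = [random.random() < delta for _ in range(m)]
    I = [i for i in range(m) if sel[i]]
    # The quantity inside the expectation in (73)/(74).
    score = len(I) - 2 * sum(z[i][j] for i in I for j in I if i != j)
    if score >= m * delta / 2:     # such an omega exists since E >= m*delta/2
        break

A = [i for i in I if sum(z[i][j] for j in I if j != i) <= 0.5]
assert len(A) >= m * delta / 2     # |A| >= m/(8 Z_m), as in (78)
```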

4.5 The Extended Dvoretzky-Rogers Lemma

With the Dvoretzky-Rogers Lemma and Talagrand’s proposition at our disposal, we may

now prove the Extended Dvoretzky-Rogers Lemma as used in the introduction.

Lemma (The Extended Dvoretzky-Rogers Lemma). Suppose (E, ∥⋅∥) is an N-dimensional Banach space. Then there exists a collection {zk} of N̄ = [√([N/2])/16] vectors in E such that for all α ∈ RN̄,

(1/2) sup∣αk∣ ≤ ∥∑αkzk∥ ≤ 2(∑∣αk∣²)^(1/2). (86)

We will get a preliminary collection of [N/2] vectors from the Dvoretzky-Rogers Lemma, then take a subset of this collection using Talagrand’s Proposition, and then discard a few more vectors so that the size of the remaining collection is as desired.

Proof. First, suppose we had a collection {zk} that satisfied either inequality in (86). Then any subcollection must satisfy the same inequality (with the number of terms appropriately adjusted). To see this, simply apply the inequality in question to {zk} with αk = 0 whenever zk is not in the subcollection in question.

Now take the m = [N/2] vectors z′k given by the Dvoretzky-Rogers Lemma. Let zk = 2z′k. Then ∥∑αkzk∥ ≤ 2∥α∥2 for all α ∈ Rm, and each ∥zk∥ ≥ 1. Talagrand’s proposition applied to {zk} then gives a collection of l ≥ m/(8Zm) vectors zkj from {zk} such that (1/2) sup∣αkj∣ ≤ ∥∑αkjzkj∥ for all α ∈ Rl.

Recall Zm = sup{∑k≤m ∣ζ(zk)∣ ∣ ζ ∈ E∗, ∥ζ∥ ≤ 1}. Then, for each ζ ∈ E∗, the Cauchy–Schwarz inequality on Euclidean space gives

∑k≤m ∣ζ(zk)∣ ≤ √m (∑k≤m ∣ζ(zk)∣²)^(1/2). (87)

Therefore Zm ≤ √m ⋅ sup{(∑k≤m ∣ζ(zk)∣²)^(1/2) ∣ ζ ∈ E∗, ∥ζ∥ ≤ 1}. But since ∥∑αkzk∥ ≤ 2∥α∥2 for all α ∈ Rm (the right-hand inequality above, for the full collection), we can conclude by the calculation carried out in section (1.3) around (17) that Zm ≤ 2√m. Now we have

l ≥ [m/(8Zm)] ≥ [m/(16√m)] = [√m/16] = N̄. (88)

Consider the collection of the first N̄ of these vectors, zk1, …, zkN̄. Since it is a subcollection of {zkj}, which satisfies (86), we conclude that it also satisfies (86). Noting that N̄ = [√([N/2])/16] completes the proof.
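The counting in (88) is elementary arithmetic and can be checked directly; note how small N̄ is compared to N:

```python
# Arithmetic check of (88): with m = [N/2] and Z_m <= 2*sqrt(m),
# the number of surviving vectors [m/(16*sqrt(m))] equals [sqrt(m)/16].
import math

for N in [100, 1000, 10**4, 10**6]:
    m = N // 2
    l_lower = math.floor(m / (16 * math.sqrt(m)))   # [m / (16 sqrt(m))]
    N_bar = math.floor(math.sqrt(m) / 16)           # [sqrt(m) / 16]
    assert l_lower == N_bar
    print(N, N_bar)
```

For example, a space of dimension N = 10⁶ yields only N̄ = 44 vectors, which is one reason the subspace dimension in Dvoretzky’s theorem ends up logarithmic in N.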

5 Dvoretzky’s Theorem

With both the Gaussian reformulation of Dvoretzky’s theorem and the Extended Dvoretzky-

Rogers Lemma proven, we can now tackle the version of Dvoretzky’s theorem stated in the

introduction. However, recalling the end of section (1.3), there is one more technical lemma

we need before we can proceed. This is Lemma 4, which is proven first.

5.1 Lemma 4

Lemma 4. Let N̄ = [√([N/2])/16], and let g1, …, gN̄ be independent standard Gaussian variables on a probability space (Ω, F, P). Then there exists a constant c > 0, independent of N, such that

c√(log N) ≤ E supk≤N̄ ∣gk∣. (89)

Proof. We first establish a preliminary result. Fix n > 1. Then we claim

P(∣g1∣ > √(log n)) ≥ 1/(3n). (90)

By definition of a standard Gaussian variable, for λ > 1 we have P(∣g∣ > λ) = (2/√(2π)) ∫_λ^∞ e^(−t²/2) dt. Then we may calculate

2 ∫_λ^∞ e^(−t²/2) dt ≥ ∫_λ^∞ e^(−t²/2)(1 + t⁻²) dt since t⁻² < 1 for t > λ > 1 (91)
= −e^(−t²/2) t⁻¹ ∣_λ^∞ (92)
= e^(−λ²/2)/λ. (93)

Hence P(∣g∣ > λ) ≥ (1/√(2π)) ⋅ e^(−λ²/2)/λ. Letting λ = √(log n) we can conclude by substitution that

P(∣g∣ > √(log n)) ≥ (1/√(2π)) ⋅ e^(−(log n)/2)/√(log n) = (1/√(2π)) ⋅ (1/(n log n))^(1/2) ≥ (1/3) ⋅ (1/n²)^(1/2) = 1/(3n). (94)

The last inequality used the facts n ≥ log n and √(2π) ≤ 3. Therefore P(∣g∣ > √(log n)) ≥ 1/(3n) for n > 1. Direct computation gives the result for n = 1 (the left side is then 1), thus establishing this preliminary result.

Now then, letting n = √N, we have, by the above result,

P(∣g1∣ > √(log √N)) ≥ 1/(3√N). (95)

Then we can compute

P(supk≤N̄ ∣gk∣ ≤ √(log √N)) = P(∣g1∣ ≤ √(log √N))^N̄ (96)
≤ (1 − 1/(3√N))^N̄ by (95) (97)
≤ e^(−1/(48√2)), (98)

where the last step uses 1 − x ≤ e^(−x) together with the fact that N̄ is, up to integer parts, at least √N/(16√2).

Rearranging gives 1 − e^(−1/(48√2)) ≤ P(supk≤N̄ ∣gk∣ ≥ √(log √N)). Then by Markov’s inequality,

1 − e^(−1/(48√2)) ≤ E supk≤N̄ ∣gk∣ / √(log √N) (99)
(1 − e^(−1/(48√2))) ⋅ (1/√2) ⋅ √(log N) ≤ E supk≤N̄ ∣gk∣, (100)

using √(log √N) = √(log N)/√2. We have therefore proven the lemma, with c = (1 − e^(−1/(48√2))) ⋅ (1/√2).
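A Monte-Carlo check of (89), using the constant c derived above (any smaller positive constant would also do); this is an illustrative sketch only:

```python
# Monte-Carlo sanity check of Lemma 4: the empirical mean of
# sup_{k <= Nbar} |g_k| dominates c * sqrt(log N), where
# Nbar = [sqrt([N/2])/16] and c is the constant from the proof above.
import math
import random

random.seed(3)
c = (1 - math.exp(-1 / (48 * math.sqrt(2)))) / math.sqrt(2)

N = 10**6
Nbar = math.isqrt(N // 2) // 16          # [sqrt([N/2])/16]

trials = 2000
total = 0.0
for _ in range(trials):
    total += max(abs(random.gauss(0, 1)) for _ in range(Nbar))
est = total / trials                     # estimates E sup |g_k|

assert est >= c * math.sqrt(math.log(N))
```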

5.2 Dvoretzky’s Theorem

Theorem. For each ε > 0 there is a number η(ε) > 0 with the following property. Let (E, ∥⋅∥) be an N-dimensional Banach space. Then E contains a subspace F of dimension n = [η(ε) log N] such that d(F, ℓ₂ⁿ) ≤ 1 + ε.

Proof. Apply the Extended Dvoretzky-Rogers Lemma to E to get N̄ = [√([N/2])/16] vectors zk in E such that for all α ∈ RN̄,

(1/2) sup∣αk∣ ≤ ∥∑αkzk∥ ≤ 2∥α∥2. (101)

Let g1, …, gN̄ be independent standard Gaussian variables on a probability space (Ω, F, P). Set X = ∑ gkzk. We claim C log N ≤ d(X) for some constant C.

Note that the argument at the end of section (1.3) and Lemma 4 together imply that, for the constant c from Lemma 4,

(c/2)√(log N) ≤ (1/2) E sup∣gk∣ ≤ E∥X∥. (102)

Then, since σ(X) ≤ 2 (as in section 1.3 around equation (17)), we have

(c²/4) log N ⋅ (1/4) ≤ (E∥X∥/2)² ≤ d(X). (103)

Letting C = c²/16 gives the desired inequality.

Now take η1 from the Gaussian reformulation of Dvoretzky’s theorem. Setting η(ε) = Cη1(ε), we have [η(ε) log N] ≤ [η1(ε)d(X)]. By the Gaussian reformulation of Dvoretzky’s theorem, it follows that there exists a subspace of E of dimension n = [η(ε) log N] that is (1 + ε)-isomorphic to ℓ₂ⁿ. This completes the proof.

Notice that since η = Cη1, the function η is of order ε²/log(1/ε) for ε ≤ 1/9.

6 Alternate Proof With Min-Max Theorem

Now we study Gordon’s version and proof of Dvoretzky’s theorem. As mentioned in the

introduction, Gordon’s result gives η ∼ ε2, which is considered an improvement on the η

given in our first version of Dvoretzky’s theorem. As many proofs are omitted here, we will

state the various prerequisite results and then describe how they prove Dvoretzky’s theorem

without much motivation.

6.1 Gaussian Min-Max Theorem

The following theorem is proved in section 9 of the online notes, and is also stated in Gordon’s

paper. Here we forgo the proof.

Theorem (Min-Max Theorem). Let {Xi,j}, 1 ≤ i ≤ n, 1 ≤ j ≤ m, and {Yi,j}, 1 ≤ i ≤ n, 1 ≤ j ≤ m, be two collections of centered Gaussian random variables. Suppose that both

E(Xi,j − Xi,l)² ≥ E(Yi,j − Yi,l)² for all i, j, l (104)
E(Xi,j − Xk,l)² ≤ E(Yi,j − Yk,l)² for all i, j, k, l with i ≠ k. (105)

Then E mini maxj Xi,j ≥ E mini maxj Yi,j.

6.2 Gordon’s Theorem 2

The following theorem and its proof are found in Gordon’s paper as a consequence of the Gaussian Min-Max theorem. First, we establish some notation; although some of the following objects are also used earlier in this paper, we remain closer to Gordon’s notation in this section. Let m and N be natural numbers. Let T be a subset of Sm−1 and take x1, …, xN to be a collection of vectors in a normed space (X, ∥⋅∥). Let g1, …, gN and h1, …, hm be independent standard Gaussian variables, and set

E(xi) = E∥∑i≤N gixi∥,  E∗(T) = E maxt∈T ∑j≤m hjtj. (106)

Theorem (Gordon’s Theorem 2). Let 0 < ε < 1. Assume ∥∑i≤N τixi∥ ≤ ∥τ∥2 for all τ ∈ RN. If E∗(T) < E(xi), then there is a linear map A ∶ Rm → X such that

maxt∈T ∥A(t)∥ / mint∈T ∥A(t)∥ ≤ (E(xi) + E∗(T)) / (E(xi) − E∗(T)). (107)

If we further have E∗(T) < εE(xi), then

maxt∈T ∥A(t)∥ / mint∈T ∥A(t)∥ ≤ (1 + ε)/(1 − ε). (108)

6.3 Dvoretzky’s Theorem

Theorem. Let 0 < ε < 1. Take n and m natural numbers with 1 ≤ m ≤ cε² log n, where c = (c′)²/4 and c′ is the constant from Lemma 4. Given a normed space X = (Rn, ∥⋅∥), there exists an m-dimensional subspace Y of X such that d(Y, ℓ₂ᵐ) ≤ (1 + ε)/(1 − ε).

Proof. First, apply the Extended Dvoretzky-Rogers Lemma to get a sequence of N = [√([n/2])/16] vectors x1, …, xN in X such that for all t ∈ RN, we have

(1/2) maxi ∣ti∣ ≤ ∥∑i≤N tixi∥ ≤ 2(∑i≤N ti²)^(1/2). (109)

Take T = Sm−1 and g1, …, gN, h1, …, hm to be independent standard Gaussian variables. We will show the collection {xi} satisfies the conditions of Gordon’s Theorem 2. The first condition is satisfied up to the constant 2, which is fine for our purposes. Now, we have

E∗(T) = E maxt∈Sm−1 ∑j≤m hjtj = E supt∈Sm−1 h ⋅ t = E∥h∥ ≤ √m. (110)

h ⋅ t = E ∥h∥ ≤√m. (110)

The last inequality is not obvious; a proof is given in the appendix, section (A.1). Continuing on:


√m ≤ ε√c √(log n) by choice of m (111)
≤ ε ⋅ (1/2) E maxi≤N ∣gi∣ by Lemma 4 and the choice of c (112)
≤ ε E∥∑i≤N gixi∥ by the left inequality of (109) (113)
= ε E(xi). (114)

Therefore, E∗(T) ≤ εE(xi), and we can apply Gordon’s Theorem 2 to get a map A ∶ Rm → X such that, letting Y = A(Rm):

d(Y, ℓ₂ᵐ) ≤ ∥A∥∥A⁻¹∣Y∥ = maxt∈T ∥A(t)∥ / mint∈T ∥A(t)∥ ≤ (1 + ε)/(1 − ε). (115)

This completes the proof.

A Appendix

A.1 Estimating the Expectation of a Gaussian Vector

In section (6.3), we used a well-known by not obvious inequality. This inequality is proven

here.

Theorem. Suppose h = (h1, . . . , hn) is a standard Gaussian random vector in Rn. Then

n√n+1

E ∥h∥ ≤√n.

Proof. We will first directly compute E∥h∥. By definition

E∥h∥ = (1/(2π)^(n/2)) ∫Rn ∥x∥ e^(−∥x∥²/2) dx. (116)


Now, moving to spherical coordinates, we see that

E∥h∥ = (1/(2π)^(n/2)) ∫_0^∞ rⁿ e^(−r²/2) dr ∏k=2,…,n−1 ∫_0^π sin^(k−1)(φn−k) dφn−k ∫_0^(2π) dφn−1. (117)

Now it is a property of the beta function that B(x, y) = 2 ∫_0^(π/2) (sin θ)^(2x−1)(cos θ)^(2y−1) dθ and that B(x, y) = Γ(x)Γ(y)/Γ(x + y). Using these, we can confirm

∫_0^π sin^(k−1)(φn−k) dφn−k = B(k/2, 1/2) = Γ(k/2)Γ(1/2)/Γ((k + 1)/2). (118)

Then, using the fact that Γ(1/2) = √π and ∫_0^(2π) dφn−1 = 2π, we can write

E∥h∥ = (2π ⋅ π^((n−2)/2)/(2π)^(n/2)) ∫_0^∞ rⁿ e^(−r²/2) dr ∏k=2,…,n−1 Γ(k/2)/Γ((k+1)/2) (119)
= (2^((2−n)/2)/Γ(n/2)) ∫_0^∞ rⁿ e^(−r²/2) dr by cancelling (the product telescopes to Γ(1)/Γ(n/2)) (120)
= (2^((2−n)/2)/Γ(n/2)) ∫_0^∞ 2^(n/2) t^(n/2) e^(−t) dt/(2t)^(1/2) letting r²/2 = t (121)
= (√2/Γ(n/2)) ∫_0^∞ t^((n−1)/2) e^(−t) dt (122)
= √2 Γ((n+1)/2)/Γ(n/2) by definition of Γ. (123)

Now we deduce the bounds. For the upper bound, Jensen’s inequality (or Cauchy–Schwarz) gives

E∥h∥ ≤ (E∥h∥²)^(1/2) = √n, (124)

since E∥h∥² = ∑k≤n E hk² = n. For the lower bound, the same computation as above with rⁿ⁺² in place of rⁿ gives

E∥h∥³ = 2^(3/2) Γ((n+3)/2)/Γ(n/2) = (n+1) ⋅ √2 Γ((n+1)/2)/Γ(n/2) = (n+1) E∥h∥, (125)

using Γ((n+3)/2) = ((n+1)/2)Γ((n+1)/2). Writing ∥h∥² = ∥h∥^(1/2) ⋅ ∥h∥^(3/2) and applying the Cauchy–Schwarz inequality,

n = E∥h∥² ≤ (E∥h∥)^(1/2) (E∥h∥³)^(1/2) = (E∥h∥)^(1/2) ((n+1) E∥h∥)^(1/2) = E∥h∥ √(n+1). (126)

Rearranging gives E∥h∥ ≥ n/√(n+1). This completes the proof.
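The closed form (123) and the two bounds can be checked numerically with the log-gamma function:

```python
# Numeric check of E||h|| = sqrt(2) * Gamma((n+1)/2) / Gamma(n/2)
# against the bounds n/sqrt(n+1) <= E||h|| <= sqrt(n).
import math

def expected_norm(n):
    # Computed via lgamma to avoid overflow of Gamma for large n.
    return math.sqrt(2) * math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2))

for n in range(1, 200):
    e = expected_norm(n)
    assert n / math.sqrt(n + 1) <= e <= math.sqrt(n)
```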

A.2 Lemma 3: The Infinite Dimensional Case

In section (4.2), we alluded to the fact that Lemma 3 has a variant that holds in infinite-dimensional spaces. We state and prove this variant here.

Definition. Let T ∶ U → V be a linear map between Banach spaces. Define Kn = {∥T − S∥ ∣ S ∶ U → V, rank(S) < n}, where ∥⋅∥ is the usual operator norm. Then define an(T) = inf Kn. Note here that K1 = {∥T∥}, and thus a1(T) = ∥T∥, since the only S ∶ U → V of rank less than 1 is the zero map.

Lemma (Infinite-Dimensional Version of Lemma 3). Let T ∶ H → X be a linear operator from a Hilbert space (H, ∣⋅∣) to a Banach space (X, ∥⋅∥). For all ε > 0, there exists an orthonormal sequence {fn} in H such that ∥Tfn∥ ≥ an(T) − ε for every n.

The proof proceeds in exactly the same way as in Lemma 3, only we cannot appeal to compactness of closed unit balls or to continuity of T. This being the case, details already covered in the proof of Lemma 3 are omitted here.

Proof. We again construct the sequence {fn} recursively. Since ∥T∥ = sup{∥Tv∥ ∣ ∣v∣ = 1}, there exists f1 ∈ H such that ∥Tf1∥ ≥ ∥T∥ − ε and ∣f1∣ = 1. We can take f1 to be the first vector in our sequence.

Let the subspace S1 and the map S ∶ H → X be as defined in the proof of Lemma 3 (that is, section 4.2). Then as before we have

∥T∣S1∥ ≥ a2(T). (128)

Now find f2 ∈ S1 such that ∥T∣S1f2∥ ≥ ∥T∣S1∥ − ε and ∣f2∣ = 1. Then by inequality (128) we have ∥Tf2∥ = ∥T∣S1f2∥ ≥ a2(T) − ε. So we can take f2 to be the second vector in our sequence. Repeating this procedure generates an orthonormal sequence {fn} in H with the desired properties. This completes the proof.

A.3 Estimating δ(ε)

In section (2.2), we found that Lemma 2 holds by taking 0 < δ(ε) = δ to be such that

1/√(1 + ε) ≤ (1 − 3δ)/(1 − δ) and (1 + δ)/(1 − δ) ≤ √(1 + ε). (129)

That such a δ exists can be easily seen. However, we will calculate here the size of δ compared to ε for small ε. First note that we can rewrite (129) as

√(1 + ε) ≥ max((1 − δ)/(1 − 3δ), (1 + δ)/(1 − δ)). (130)

Then, for 0 < δ < 1/3 we have

(1 − δ)/(1 − 3δ) − (1 + δ)/(1 − δ) = 4δ²/((1 − 3δ)(1 − δ)) > 0. (131)

Since we are interested primarily in small δ, stipulating δ < 1/3 is reasonable. Then we can rewrite (130) as √(1 + ε) ≥ (1 − δ)/(1 − 3δ). Rearranging this inequality leads to

(8 + 9ε)δ² − (4 + 6ε)δ + ε ≥ 0, (132)

and since this quadratic in δ opens upward, the inequality is satisfied when 0 < δ ≤ (2 + 3ε − 2√(1 + ε))/(8 + 9ε), the smaller root (remember we do not want large δ). Now, supposing ε ≤ 1/9, we see

(2 + 3ε − 2√(1 + ε))/(8 + 9ε) ≥ (2 + 3ε − 2(1 + ε))/(8 + 9ε) since √(1 + ε) ≤ 1 + ε (133)
= ε/(8 + 9ε) (134)
≥ ε/9 since ε ≤ 1/9. (135)

Therefore, if ε ≤ 1/9, we can set δ(ε) = min(ε/9, 1/3) = ε/9. This result is what allows us to calculate η1 and η in the first version of Dvoretzky’s theorem.
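The choice δ(ε) = ε/9 can be verified against both inequalities in (129) over a grid of ε ≤ 1/9:

```python
# Verify that delta = eps/9 satisfies both inequalities in (129)
# for a range of eps in (0, 1/9].
import math

for k in range(1, 1001):
    eps = k / 9000.0          # samples eps in (0, 1/9]
    d = eps / 9
    assert 1 / math.sqrt(1 + eps) <= (1 - 3 * d) / (1 - d)
    assert (1 + d) / (1 - d) <= math.sqrt(1 + eps)
```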


A.4 Proving the Gauge Gives a Norm

In section (1.1), we stated that given a compact, symmetric, convex body K ⊆ RN containing

0, the gauge ∥ ⋅ ∥K defined a norm on RN . This is not obvious; we give a proof here. Denote

∥ ⋅ ∥K = φ. We need to show φ is homogeneous, satisfies the triangle inequality, and achieves

zero only on the zero vector.

• First, we verify the triangle inequality. Before we begin, notice that if x ∈ r1K and y ∈ r2K with r1, r2 > 0, then x + y ∈ (r1 + r2)K. To see this, let z = (x + y)/(r1 + r2); then z is a convex combination of x/r1 ∈ K and y/r2 ∈ K, so z ∈ K since K is convex, and hence (r1 + r2)z = x + y is in (r1 + r2)K. Now take any x, y ∈ RN and any ε > 0. Then by definition of the gauge, we have both x ∈ (φ(x) + ε/2)K and y ∈ (φ(y) + ε/2)K. This means x + y ∈ (φ(x) + φ(y) + ε)K, and hence

φ(x + y) ≤ φ(x) + φ(y) + ε. (136)

Since (136) is true for all ε > 0, we conclude the triangle inequality does indeed hold.

• Second, we check that φ(x) = 0 if and only if x = 0. Since 0 ∈ rK for every r > 0, and φ is at least 0, we have φ(0) = 0. Now suppose φ(x) = 0. Then x ∈ (1/n)K for every natural n, so x ∈ ⋂n∈N (1/n)K. Using the facts that K is compact, convex, and contains 0, the sets (1/n)K form a nested sequence of compact sets whose diameters decrease to 0. By Cantor’s intersection theorem, it follows that ⋂n∈N (1/n)K is a singleton. This set obviously contains 0, and since it also contains x, we see x = 0. This proves φ(x) = 0 if and only if x = 0.

• Now we show φ is homogeneous. Suppose x ∈ RN and α ∈ R. If α = 0, then φ(αx) = φ(0) = 0 = 0 ⋅ φ(x), so homogeneity holds. So let α ≠ 0. We have

φ(αx) = inf{r > 0 ∣ αx ∈ rK} (137)
= inf{r > 0 ∣ ∣α∣x ∈ rK} by symmetry of K (138)
= inf{r > 0 ∣ x ∈ (r/∣α∣)K} since α ≠ 0 (139)
= inf{∣α∣ ⋅ (r/∣α∣) ∣ r > 0, x ∈ (r/∣α∣)K} (140)
= ∣α∣ inf{r′ > 0 ∣ x ∈ r′K} letting r′ = r/∣α∣ (141)
= ∣α∣φ(x). (142)

This shows φ is homogeneous. As this was the final property of φ we had to verify, we conclude the gauge of K is indeed a norm.
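As a concrete illustration, for K the closed unit ball of ℓ₁ (a compact, symmetric, convex body containing 0) the gauge is the ℓ₁ norm. The sketch below computes the gauge numerically by bisection on the defining infimum and checks that it matches; the setup is assumed for illustration only.

```python
# The gauge of the closed l1 unit ball in R^N is the l1 norm itself.
# gauge(x) = inf{r > 0 | x in r*K} = inf{r > 0 | x/r in K}, by bisection.
import random

def in_l1_ball(x):
    return sum(abs(v) for v in x) <= 1.0

def gauge(x, in_K, hi=100.0, iters=60):
    # Bisection works because membership of x in r*K is monotone in r
    # (K is convex and contains 0). Assumes x lies in hi*K.
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid > 0 and in_K([v / mid for v in x]):
            hi = mid
        else:
            lo = mid
    return hi

random.seed(4)
for _ in range(100):
    x = [random.uniform(-5, 5) for _ in range(4)]
    assert abs(gauge(x, in_l1_ball) - sum(abs(v) for v in x)) < 1e-9
```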

References

[1] Gilles Pisier, The Volume of Convex Bodies and Banach Space Geometry, Cambridge

University Press, Cambridge, 1st edition, 1989.

[2] Yehoram Gordon, Applications of the Gaussian Min-Max Theorem. Unpublished. Linked

on the course website.

[3] Michel Talagrand, Embedding of ℓ∞ᵏ and a theorem of Alon and Milman, Geometric aspects of functional analysis (Israel), Operator Theory: Advances and Applications, 77, Birkhäuser, Basel, 1995.
