The Nature and Origins of Modern Mathematics: an Elementary Introduction

Andrew McLennan

School of Economics

University of Queensland

Level 6 Colin Clark Building

St Lucia, Qld 4072

Australia

[email protected]

July 27, 2009



Preface

Contemporary mathematics is a very different thing from the mathematics of 150 years ago. To a certain extent this is simply because we know a lot more, but the more radical changes are transformations of the most fundamental concepts of the subject.

At a certain point in the 19th century mathematicians realized that set theory could be used to give exact descriptions of all the objects they worked with. The most obvious and immediate benefit is increased clarity and rigor, but that is far from the end of the story. The methods used to give precise definitions of existing concepts can also be used to define novel structures, and in the 20th century this led to the emergence of many entirely new fields of research. A bit more subtly, the axiomatic method based on set theory can be used to take a concept apart, to break it down into more fundamental elements, to recombine these elements, and ultimately to reformulate the original concept in ways that discard inessential aspects inherited from particular applications while retaining a critical core. This is the process of abstraction.

This book describes some of the resulting concepts. Up to a point its trajectory is quite similar to the mathematical curriculum at the secondary school and early university level: fundamentals of mathematical reasoning, basic facts about real numbers, continuity and convergence, some algebra, and then the calculus. Every idea had some predecessor in the mathematical thought of Sir Isaac Newton. But instead of thinking of these as a collection of problem-solving methods or skills, we will be entirely concerned with viewing them as a system of interrelated definitions that combine to create a mathematics that is more general, unified, and powerful than anything Newton could have imagined. The final chapters use these concepts to develop geometric structures that go far beyond geometry as it was understood in the 18th century, but which are now fundamental in mathematics and physics.

These concepts are, in themselves, quite simple. Whatever difficulties they entail arise in two ways. First, understanding one of them is primarily a matter of seeing that the definition is a correct response to some need, or a more general version of another concept of proven value. We will emphasize relationships between the concepts, their historical origins, and certain fundamental results that validate them, but this is only the beginning of an accumulation of experience that generates an ever-evolving sense and appreciation of each of them. The second source of difficulties is that truly understanding these concepts means understanding their implications and larger consequences for mathematics. At present mathematical knowledge is exploding, and the distant future is unknowable; my own guess is that for at least a few more generations our understanding of these ideas will become increasingly incomplete. But while these thoughts should make professional mathematicians feel humble, the reader certainly isn't expected to grapple with such profound mysteries.

This book is different from other books about mathematics you may have seen. It aims at a broad audience, and assumes very little in the way of prior mathematical background. It is, I hope, particularly well suited for high school and college students who are interested in mathematics but have a hard time finding books that don't assume background they lack. At the same time it is a book of mathematics, with formal definitions, theorems, and proofs, rather than a book about what mathematics is like, or what it aims to accomplish, or the biographies of mathematicians. Even though it treats some subjects that are usually thought to be advanced, it is in various ways easier reading than other math books, focusing on the simplest aspects of each topic, with thorough, detailed, and gentle explanations of the steps in each argument. There is very little in the way of "gotcha" cleverness. It rambles a bit, so that not that much needs to be carried forward from one chapter to the next. I would hope that it is accessible to anyone of normal intelligence who approaches it with patience, going slowly enough to really understand each new idea and argument, and with sincere interest. I have tried in various ways to make it a bit more entertaining, in the everyday sense, than scholarly math books which presume a readership of addicts like myself. But your mileage may vary.

In spite of everything, there will almost certainly be times when what you're reading doesn't seem to make sense. The first thing to do is to retrace your mental steps and try to puzzle things out; usually a confusion has its roots in an earlier misapprehension of some small detail. But sometimes that won't work, at which point it would be best to ask someone. Contrary to what your teachers may have told you about the only stupid questions being those that are unasked, almost all your questions will be dumb. Really dumb. Everything here is simple, once you've seen it in the proper light, so most likely the answers to your questions will make you look, and feel, like a goofball. Try to have a thick skin about this; it happens to everyone.

There is also a lot of terminology. The first time a technical term appears it will be in bold face, either in a definition or (more commonly) with some precise, but less formal, explanation of its meaning. There will probably be many times when you encounter a term you don't recall, or whose meaning you don't recall, or recall vaguely. Even if you are just a little bit unsure, it's a good idea to review the definition. (You should be able to find it quickly by looking in the index.) This is largely a book about definitions, and the perspective on mathematics they embody and express; using the jargon freely once it's been introduced is an important strategy for reinforcing your familiarity and understanding. This may seem harsh, but it's a bit like learning a foreign language: a good instructor conducts the course entirely in that language from a very early stage.

Almost certainly you already know that mathematical notation makes heavy use of the Greek alphabet. But you might see a few Greek letters that are new to you, and it helps to know how they are pronounced, and a bit about how the Greek and Roman alphabets are related. If, instead of trying to memorize the Greek alphabet now, you keep a reference handy (perhaps bookmarked in your browser) and look up unfamiliar letters as you go along, you'll learn everything you need to know effortlessly.

Will reading this book help you get good grades in math courses? I started writing it in response to frustrations I felt in teaching a course that is usually called "Mathematics for Economists." In part because the amount of time in a one semester course is very short, in part because students primarily want to know how to solve the problems that will determine their grades, and in part because the problem solving techniques developed in this course are inputs to other courses in economics, I felt, and largely succumbed to, pressure to focus on the "how to do it" aspects of the subject, shortchanging the conceptual underpinnings. Some of my students were excellent, but many were woeful products of years of precisely this sort of instruction, with an almost transcendental inability to deal with the subject matter that went far beyond any lack of native intelligence. In actual fact many of them were at least as smart as most people, or significantly smarter. And no normal human being is truly so stupid as to be incapable of understanding elementary mathematics, which is much simpler than many commonplace aspects of everyday life.

Imagine a piano student who works for years with scales, triads, and exercises, but who never plays or hears any actual music. It would be a hopeless struggle to remember information that had been stripped of any meaning. Now suppose that one day this person attends a concert. The next day her skills would be no different than they had been before, but she could begin to practice in an entirely different manner, especially if she continued to listen to music, and began to play real music herself. My highest hope for this book is that in some readers it will trigger such a process, but it is a starting point, not a cure.

There is a much simpler answer to the question above: kittens that like to play inevitably turn into cats who know how to catch mice. Professional mathematicians are primarily motivated by intellectual stimulation and aesthetic pleasure; the unreasonable effectiveness of mathematics as a tool for dealing with the world is, for them, an unintended side effect. The last chapter recommends several other books that convey this sense of mathematics to less experienced readers, and there's no reason not to start exploring them now.



Contents

1 What Mathematics Is
    1.1 Why Read This Book?
    1.2 What Is Doing Mathematics?
    1.3 Proofs
    1.4 Foundations: Sets, Relations, and Functions

2 The Real Numbers
    2.1 Fields
    2.2 Rings
    2.3 Ring Homomorphisms
    2.4 Prime Factorization
    2.5 Algebraic Integers and Modules
    2.6 Fermat's Last Theorem
    2.7 Ordered Fields
    2.8 The Least Upper Bound Axiom
    2.9 Constructing the Real Numbers

3 Limits and Continuity
    3.1 Metric Spaces
    3.2 Topological Spaces
    3.3 Closed Sets
    3.4 The Zariski Topology
    3.5 Compact Sets
    3.6 More on Compactness
    3.7 Sequences and Series of Functions
    3.8 The Fundamental Theorem of Algebra
    3.9 The Exponential and Trigonometric Functions
    3.10 Connectedness

4 Linear Algebra
    4.1 Vector Spaces
    4.2 Bases and Dimension
    4.3 Linear Transformations
    4.4 Linear Subspaces
    4.5 Matrices

5 The Determinant
    5.1 Positive and Negative Volume
    5.2 Even and Odd Permutations
    5.3 The Determinant of a Matrix
    5.4 Transposes and Products
    5.5 Back to Linear Transformations
    5.6 The Characteristic Polynomial
    5.7 The Cayley-Hamilton Theorem
    5.8 Canonical Forms

6 The Derivative
    6.1 Assumptions on Scalars
    6.2 A Weird Valuation
    6.3 Normed Spaces
    6.4 Defining the Derivative
    6.5 The Derivative's Significance
    6.6 The Chain Rule
    6.7 Partial Derivatives
    6.8 Computation of Derivatives
    6.9 Practical Computation
    6.10 Rolle, Clairaut, Taylor
    6.11 Derivatives of Sequences of Functions

7 Complex Differentiation
    7.1 The Cauchy-Riemann Equations
    7.2 Conformal Mappings
    7.3 Complex Clairaut and Taylor
    7.4 Functions Defined by Power Series
    7.5 Multivariate Power Series
    7.6 Analytic Continuation
    7.7 Smooth Functions
    7.8 The Inverse Function Theorem

8 Curved Space
    8.1 Manifolds
    8.2 Differentiable Manifolds
    8.3 Orientation
    8.4 Differentiable Functions
    8.5 The Tangent Space
    8.6 A Coordinate-Free Derivative
    8.7 The Regular Value Theorem

9 Going Higher
    9.1 Differential Geometry
    9.2 Hyperbolic Space
    9.3 Curvature
    9.4 Some Riemann Surfaces
    9.5 The Fundamental Group
    9.6 Classification of Compact Manifolds

10 More and More Math
    10.1 Some Other Books

A Problems



Chapter 1

What Mathematics Is

1.1 Why Read This Book?

You enter the first room of the mansion and it's completely dark. You stumble around bumping into the furniture, but gradually you learn where each piece of furniture is. Finally, after six months or so, you find the light switch, you turn it on, and suddenly it's all illuminated. You can see exactly where you were. Then you move into the next room and spend another six months in the dark.

Andrew Wiles

Does this sound like you taking a math course? Maybe you're thinking "Yeah, except for the part about how suddenly it's all illuminated." First of all, you're in excellent company: Andrew Wiles (b. 1953) is describing his experience working out the proof of what was until then the most famous unresolved conjecture in mathematics. For everyone, especially the best, new mathematics is baffling until you understand it, but once you really understand it, it's simple.

Unlike most books, which focus on one nut or bolt at a time, this book proceeds with a lighter touch, aiming to first give you a sense of the overall structure of mathematics, its methods, and its larger agenda. One concrete purpose is to serve as a supplemental reading for students taking courses in advanced calculus and real analysis. A supplemental reading should do something different from the course's main text, and if I had to summarize the difference in just a few words, I would say that whereas textbooks conceive of the learning process as work, this book is meant to be entertaining. Like a popular book about science, it takes a broader (and somewhat superficial) view of the material, emphasizing certain key concepts as they stand in relation to each other and the larger goals of the subject. Following a pedagogical method that is standard in physics, but rather uncommon in mathematics, the historical development of these ideas is used to portray them as creative responses to the problems and opportunities of the eras in which they were created, not just static facts devoid of drama.

For many people a course in real analysis is their first exposure to higher mathematics, with clear axiomatic foundations and rigorous justifications of all assertions. Prior to this point, mathematics may have seemed like a grab bag of algorithms for performing various calculations, but real analysis and subsequent courses are primarily concerned with theorems and proofs. Computations are still important, but if you really understand the logic of the material, you should be able to figure out how to compute when the need arises, and merely knowing how to compute is no substitute for real understanding. There is an undeniable sense in which this is a harder kind of mathematics, since you are asked to perform (by your textbook, but not here!) at a higher level, but knowing exactly why things are the way they are is in many ways, in the end, the simplest and easiest approach.

An initial acquaintance with the concepts described here doesn't require a huge effort. In order to understand this book you have to be able to follow a logically compelling argument, like a juror at a trial, but there are only a few algebraic computations that could be described as complex. While some of the proofs have surprising aspects, they are mostly straightforward, and there is nothing terribly deep or complex here. This book is not meant to be studied, and you are not expected to do lots of problems as you go through it. The problems at the end were added as an afterthought. Possibly you'll enjoy them (many introduce interesting concepts and results), and working a few for each chapter may help reinforce and consolidate your understanding, but it's up to you. The main text was written with the expectation that you'll simply be reading.

And it is all breathtakingly beautiful. The concepts described in this book are among the greatest contributions to science ever, as important as evolution and relativity in making the last couple centuries a watershed in human affairs. Results such as the fundamental theorem of algebra and the existence of non-Euclidean geometries, which stymied the most talented mathematicians for decades or centuries, are made accessible, not just to experts, but to beginners.

The rigorous, proof-oriented approach to mathematics requires exact axiomatic foundations, which means going back to the very beginnings of the subject and redeveloping everything from scratch. For some readers this will mean that the early parts of the book, especially this chapter and the next, are largely review, but the point of view will probably be somewhat unfamiliar, and there are a few advanced ideas and intriguing tangents to spice things up.

Going back to the very beginning also means that this book is in principle accessible to readers who have only advanced as far as high school algebra. Whether this is true in practice depends on the reader's ability and motivation. If you're a secondary school student (or a curious layperson) who enjoys mathematics, and you think you're reasonably good at it, by all means give this book a try! When I was young I was frustrated by the paucity of books that were accessible, and aimed at letting me advance in the subject by reading on my own. If you feel the same way, you're the sort of reader I have had in my mind's eye.

Especially if you have less preparation, you'll probably find that this book is much more tiring than other kinds of reading. With a book of crossword puzzles the usual pace would be one or two a day, or maybe three if you felt exceptionally enthusiastic. Thinking about each section here as a puzzle, to be solved and savored, will put you on a good pace. Learning mathematics reprograms the mind at a deep level, and this is a process that takes time, and sleep. So read slowly, trying to fully understand each step, and take some time out to digest before going on to the next section or chapter. As you reflect on what you've read, you'll often notice some new connection or unexpected perspective.

Reading a book about rigorous mathematics is a bit like walking a tightrope: if you don't correctly understand something, subsequent material quickly becomes confusing or impenetrable, and perhaps most readers won't make it to the end. This happens to all readers of math books at all levels, and you shouldn't feel bad about it. Possibly you'll pick up where you left off a couple weeks, or a couple years, later, with your prior confusion clarified. But even if you don't, by going as far as you could you will have succeeded in pushing your understanding to a new level.

1.2 What Is Doing Mathematics?

I recently read some autobiographical comments by a mathematician who said that when he was a high school student, the idea that new mathematics was being created in the present day would have seemed as bizarre to him as imagining that professors of English sat around making up new words. One can hardly imagine learning about music without quickly realizing that it is something that people compose and perform, but in elementary and secondary school mathematics is often presented as an entirely impersonal collection of true facts and computational methods. Computational questions are used in math courses to test the student's facility, and too often students (and their instructors!) mistakenly believe that the ability to solve such problems is the goal of the course. If you just practice the methods of doing standard problems you might scrape by, but the real point of learning mathematics is to develop the ability to understand logical and quantitative arguments, and to create original, valid arguments yourself. Before anything else, one should have a sense of what people are trying to do when they do mathematics.

It seems that elementary mathematics emerged in the ancient world as a response to practical problems such as keeping accounts or measuring land. As knowledge accumulated and became more voluminous, presumably there arose a desire to make it more systematic. All this is pretty murky and speculative, but what we do know for sure is that this process led eventually to the discovery of the axiomatic method, as embodied in Euclid's (325-265 BC) Elements.

In the axiomatic method a substantial body of knowledge is organized as a combination of a small number of fundamental propositions, called axioms, that are taken as given, and a large number of logical consequences. It is a fundamental method of all of science (not just mathematics) for at least three reasons. First, it simplifies, clarifies, and organizes everything. In your own study of mathematics you should aim at capturing the psychological benefits of the axiomatic approach by organizing your own knowledge around first principles and deductive methods. Second, the distinction between assumptions and logical inferences is critical in science: assumptions are open to doubt, but logic is not, so if a conclusion seems dubious or downright counterfactual, some assumption has to be modified.

Most important, though, is that axiomatic organization of scientific knowledge almost always suggests a host of specific questions and general directions for further research. The explosive growth of scientific knowledge during the last few centuries is, in large part, the natural consequence of a relentless pursuit of what seem, after logical organization of existing knowledge, to be the most fundamental unresolved issues.


[Figure 1.1: an isosceles triangle and an equilateral triangle, with vertices labeled A, B, C.]

For a concrete illustration of how this might happen (the actual historical process was much more complicated) we'll consider the notion of symmetry. We'll begin with the observation that, in some vague sense, an equilateral triangle is more symmetric than an isosceles triangle. Just what is this "symmetry" thing, of which there might be more or less in any particular instance? Well, the general idea of symmetry has a huge number of applications, and one thing they all have in common is interchange of various elements of the object under consideration. Concretely, one can place a copy of an equilateral triangle on top of the original triangle in six different ways that preserve the distances between all vertices, but for an isosceles triangle there are only two ways to do this.
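The count of six versus two can be checked by brute force. The following Python sketch is only an illustration and is not part of the original argument: the coordinates of the two triangles and the helper name `distance_preserving_relabelings` are assumptions made for the example. It tries every rearrangement of the three vertices and keeps those that preserve all pairwise distances.

```python
from itertools import permutations
from math import dist, isclose, sqrt

def distance_preserving_relabelings(points):
    """Return every bijection of the vertex labels that preserves all pairwise distances.

    `points` maps a label (e.g. "A") to its (x, y) coordinates.
    """
    labels = sorted(points)
    found = []
    for image in permutations(labels):
        mapping = dict(zip(labels, image))
        if all(
            isclose(dist(points[p], points[q]),
                    dist(points[mapping[p]], points[mapping[q]]))
            for p in labels for q in labels
        ):
            found.append(mapping)
    return found

# Hypothetical coordinates, chosen only for illustration.
equilateral = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.5, sqrt(3) / 2)}
isosceles = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.5, 2.0)}

print(len(distance_preserving_relabelings(equilateral)))  # 6
print(len(distance_preserving_relabelings(isosceles)))    # 2
```

Distances are compared with `isclose` rather than `==` because the side lengths are computed in floating point.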

So, it seems that a symmetry of an equilateral triangle can be represented by a function [1] $\sigma$ mapping vertices to vertices, say $\sigma(A) = B$, $\sigma(B) = C$, and $\sigma(C) = A$. It seems natural to ask about the properties of such mappings, and it seems evident that a mapping representing a symmetry should be a bijection: exactly one element of its domain is mapped to each element of its range [2].

[1] The discussion below assumes you already know what sets and functions are. If you don't, you should first read the description of these concepts at the beginning of Section 1.4.

[2] As you may already know, this property is usually broken down into two parts. A function $f : X \to Y$ is one-to-one, or injective, or an injection, if, for each element $y$ of the range $Y$, there is at most one $x$ in $X$ such that $f(x) = y$. It is onto, or surjective, or a surjection, if, for each $y \in Y$ there is at least one $x \in X$ such that $f(x) = y$. For a function between two finite sets with the same number of elements, or from a finite set to itself, these two conditions amount to the same thing.

Actually, a mathematician is inclined to ask a somewhat different question: what are the properties of the collection of all mappings representing symmetries of a given object? It turns out that there are three key properties. First, the identity function should always be a symmetry. For any set $X$, $\mathrm{Id}_X$ will denote the mapping that takes each element of $X$ to itself. Thus:

$\mathrm{Id}_{\{A,B,C\}}(A) = A, \quad \mathrm{Id}_{\{A,B,C\}}(B) = B, \quad \mathrm{Id}_{\{A,B,C\}}(C) = C.$

Second, the composition of any two mappings representing symmetries should, in turn, be a mapping representing a symmetry. In general, if $f$ maps the set $X$ into the set $Y$, and $g$ maps the set $Y$ into the set $Z$, then the composition of $f$ and $g$ is the mapping $g \circ f$ of $X$ into $Z$ that takes each $x \in X$ to $g(f(x))$. Suppose $\tau(A) = A$, $\tau(B) = C$, and $\tau(C) = B$. Then (as you should verify for yourself) $\tau \circ \sigma$, with $\sigma$ as above, takes $A$ to $C$, $B$ to itself, and $C$ to $A$. Third, the inverse of any mapping representing a symmetry should represent a symmetry. If $f$ is a bijection mapping $X$ into $Y$, then $f^{-1}$ is the unique mapping of $Y$ into $X$ satisfying $f^{-1} \circ f = \mathrm{Id}_X$. So, $\sigma^{-1}$, which takes $A$ to $C$, $B$ to $A$, and $C$ to $B$, should represent a symmetry.
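If you would like to check the composition and the inverse mechanically rather than by hand, here is a minimal Python sketch. It assumes that a mapping on the vertices is stored as a dictionary; the names `compose`, `inverse`, `sigma`, and `tau` are chosen for this illustration, with `sigma` the mapping taking A to B, B to C, and C to A, and `tau` the mapping that fixes A and exchanges B and C.

```python
def compose(g, f):
    """Return the composition of f followed by g, the mapping x -> g(f(x))."""
    return {x: g[f[x]] for x in f}

def inverse(f):
    """Return the inverse of a bijection stored as a dict."""
    return {y: x for x, y in f.items()}

sigma = {"A": "B", "B": "C", "C": "A"}
tau = {"A": "A", "B": "C", "C": "B"}
identity = {"A": "A", "B": "B", "C": "C"}

# tau after sigma takes A to C, B to itself, and C to A.
print(compose(tau, sigma) == {"A": "C", "B": "B", "C": "A"})  # True

# The inverse of sigma takes A to C, B to A, and C to B.
print(inverse(sigma) == {"A": "C", "B": "A", "C": "B"})       # True

# Composing a bijection with its inverse gives the identity.
print(compose(inverse(sigma), sigma) == identity)             # True
```

The results are compared with `==` rather than printed directly, since two dictionaries describing the same mapping may list their entries in different orders.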

Now let's consider a different situation exhibiting symmetries. Alice, Bob, and Carol play the following game. Each takes a black pawn and a white pawn behind his or her back, then, without letting the other players see, brings forward a hand containing a single pawn. If all three players chose the black pawn, or they all chose the white pawn, then no money changes hands. If two chose black and one chose white, or if two chose white and one chose black, then the two who chose the same color each pay $1 to the third player. Evidently this game is symmetric insofar as the rules are invariant under any bijective mapping of the set {Alice, Bob, Carol} to itself.
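The invariance claimed here can be spelled out and tested exhaustively, since there are only eight ways the pawns can be chosen and six bijections of the players. The sketch below is one possible encoding, not something taken from the text: the function names and the convention of recording payoffs as net dollar amounts are assumptions. It checks that if the players are relabeled by any bijection, the payoffs are relabeled in exactly the same way.

```python
from itertools import permutations, product

PLAYERS = ("Alice", "Bob", "Carol")

def payoffs(choices):
    """Return each player's net payoff in dollars, given a dict player -> 'black' or 'white'."""
    colors = list(choices.values())
    result = {}
    for player, color in choices.items():
        same = colors.count(color)
        if same == 3:
            result[player] = 0      # everyone chose the same color
        elif same == 2:
            result[player] = -1     # one of the matching pair pays $1
        else:
            result[player] = 2      # the odd one out collects $1 from each of the others
    return result

def is_symmetric():
    """Check that relabeling the players relabels the payoffs in the same way."""
    for colors in product(("black", "white"), repeat=3):
        choices = dict(zip(PLAYERS, colors))
        base = payoffs(choices)
        for image in permutations(PLAYERS):
            relabel = dict(zip(PLAYERS, image))
            # Player relabel[p] now makes the choice that p made before.
            moved = {relabel[p]: choices[p] for p in PLAYERS}
            moved_payoffs = payoffs(moved)
            if any(moved_payoffs[relabel[p]] != base[p] for p in PLAYERS):
                return False
    return True

print(is_symmetric())  # True
```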

The point of this example is that the relevant symmetries on the set $\{A, B, C\}$ of vertices of an equilateral triangle are, in some obvious but as yet unexpressed sense, the same as the relevant symmetries on the set {Alice, Bob, Carol} of players of this game. The technique modern mathematics uses to capture such notions is abstraction: we define a new type of object that embodies the common features of these two symmetric situations while discarding the aspects that are particular to one or the other of the two examples.

Sometimes this process goes further than one might expect, arriving at definitions that can seem completely baffling if one has not already traced through the process that led to them. The definition below might seem mystifying if you didn't know that it is based on two further observations about composition of functions. First, if $f : X \to Y$ is a function, then

$f \circ \mathrm{Id}_X = f = \mathrm{Id}_Y \circ f.$

Second, composition is associative: if $f : X \to Y$, $g : Y \to Z$, and $h : Z \to W$ are functions, then for any $x$ the three steps in figuring out what $h(g(f(x)))$ is can be thought of as the result of combining pairwise compositions in two different ways, but the grouping doesn't affect the result:

$h \circ (g \circ f) = (h \circ g) \circ f.$
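Both observations can be tried out on ordinary functions. In the following sketch the three functions `f`, `g`, and `h` are hypothetical examples invented for the purpose (words to lengths, lengths to remainders, remainders to labels). The check is pointwise on a few sample inputs, so it illustrates the two laws rather than proving them.

```python
def compose(g, f):
    """Return the composition of f followed by g, the function taking x to g(f(x))."""
    return lambda x: g(f(x))

def identity(x):
    """Identity function; it plays the role of Id_X for whichever set X is in use."""
    return x

# Hypothetical functions f : X -> Y, g : Y -> Z, h : Z -> W.
f = len                                         # words -> lengths
g = lambda n: n % 3                             # lengths -> remainders
h = lambda r: "zero" if r == 0 else "nonzero"   # remainders -> labels

def laws_hold(samples):
    """Check the identity and associativity laws pointwise on the sample inputs."""
    return all(
        compose(f, identity)(x) == f(x) == compose(identity, f)(x)
        and compose(h, compose(g, f))(x) == compose(compose(h, g), f)(x)
        for x in samples
    )

print(laws_hold(["set", "group", "action", "symmetry"]))  # True
```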

Here is our first main concept, which will come up again and again.

Definition 1.1. A group is a set $G$ with a binary operation (that is, a function taking ordered pairs of elements of $G$ to elements of $G$, that we will write using the notational conventions of multiplication) satisfying the following conditions:

(a) The operation is associative: $g(g'g'') = (gg')g''$ for all $g, g', g'' \in G$.

(b) There is an $e_G \in G$, called the identity element, such that

    $e_G g = g = g e_G$

    for all $g \in G$.

(c) For each $g \in G$ there is an inverse $g^{-1} \in G$ such that

    $g g^{-1} = e_G = g^{-1} g.$

The set of all bijections from $\{A, B, C\}$ to itself is a group, as is the set of all bijections from {Alice, Bob, Carol} to itself. If we want to emphasize that these are really the same group we can proceed as follows. First, for any positive integer $n$ we define $S_n$ to be the set of all bijections from $\{1, \ldots, n\}$ to itself. This is called the symmetric group on $\{1, \ldots, n\}$; since compositions and inverses of bijections are bijections, it clearly satisfies the three conditions above. An action of a group $G$ on a set $X$ is a function taking each pair $(g, x)$ in which $g \in G$ and $x \in X$ to an element of $X$, which we denote by $gx$. This function must have the following properties:

(i) $e_G x = x$ for all $x \in X$;

(ii) $g(g'x) = (gg')x$ for all $g, g' \in G$ and $x \in X$.
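For a small value of $n$ the claim that the bijections of $\{1, \ldots, n\}$ satisfy the three group conditions can be verified by exhaustive search. The sketch below does this under one particular encoding, which is an assumption of the example: a bijection is stored as a tuple whose $i$-th entry is the image of $i$, and the helper names are made up for the illustration.

```python
from itertools import permutations

def symmetric_group(n):
    """All bijections of {1, ..., n}, each stored as a tuple p with p[i-1] the image of i."""
    return list(permutations(range(1, n + 1)))

def multiply(g, h):
    """Group operation: composition, applying h first and then g."""
    return tuple(g[h[i] - 1] for i in range(len(h)))

def inverse(g):
    """Inverse bijection: if g sends i to j, the inverse sends j to i."""
    inv = [0] * len(g)
    for i, j in enumerate(g, start=1):
        inv[j - 1] = i
    return tuple(inv)

def satisfies_group_conditions(n):
    """Check closure, associativity, identity, and inverses by brute force."""
    elements = symmetric_group(n)
    e = tuple(range(1, n + 1))
    closed = all(multiply(g, h) in elements for g in elements for h in elements)
    associative = all(
        multiply(g, multiply(h, k)) == multiply(multiply(g, h), k)
        for g in elements for h in elements for k in elements
    )
    has_identity = all(multiply(e, g) == g == multiply(g, e) for g in elements)
    has_inverses = all(
        multiply(g, inverse(g)) == e == multiply(inverse(g), g) for g in elements
    )
    return closed and associative and has_identity and has_inverses

print(len(symmetric_group(3)))        # 6
print(satisfies_group_conditions(3))  # True
```

The search grows very quickly with $n$ (there are $n!$ bijections and the associativity check looks at all triples), so this is practical only for small cases.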




In the first action we have in mind an element of $S_3$ interchanges the elements of $\{A, B, C\}$ in the same way that it interchanges the elements of $\{1, 2, 3\}$. For example, if $\sigma(1) = 1$, $\sigma(2) = 3$, and $\sigma(3) = 2$, then $\sigma A = A$, $\sigma B = C$, and $\sigma C = B$. The action of $S_3$ on {Alice, Bob, Carol} is defined analogously.

The action is said to be faithful if $e_G$ is the only element of $G$ that induces the identity function on $X$. That is, if $gx = x$ for all $x \in X$, then $g = e_G$. We can now sum it all up quite succinctly by saying that in the two situations described above $S_3$ acts faithfully on $\{A, B, C\}$ and {Alice, Bob, Carol} respectively.
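The two actions and their faithfulness can also be checked directly. In this sketch, which rests on encoding choices of my own, an element of $S_3$ is a tuple listing the images of 1, 2, 3, and `make_action` builds the action on a set of three names by numbering the names in the order given. The helper names are hypothetical.

```python
from itertools import permutations

S3 = list(permutations((1, 2, 3)))   # sigma stored as (sigma(1), sigma(2), sigma(3))
IDENTITY = (1, 2, 3)

def multiply(g, h):
    """Composition in S3: apply h first, then g."""
    return tuple(g[h[i] - 1] for i in range(3))

def make_action(names):
    """Build the action of S3 on `names` by numbering the names 1, 2, 3 in the given order."""
    number = {name: i for i, name in enumerate(names, start=1)}
    def act(g, x):
        return names[g[number[x] - 1] - 1]
    return act

def is_action(act, names):
    """Check properties (i) and (ii) of an action for every group element and point."""
    identity_fixes = all(act(IDENTITY, x) == x for x in names)
    compatible = all(
        act(g, act(h, x)) == act(multiply(g, h), x)
        for g in S3 for h in S3 for x in names
    )
    return identity_fixes and compatible

def is_faithful(act, names):
    """The action is faithful if only the identity fixes every point."""
    trivial = [g for g in S3 if all(act(g, x) == x for x in names)]
    return trivial == [IDENTITY]

for names in (("A", "B", "C"), ("Alice", "Bob", "Carol")):
    act = make_action(names)
    print(names, is_action(act, names), is_faithful(act, names))

sigma = (1, 3, 2)
act = make_action(("A", "B", "C"))
print([act(sigma, x) for x in "ABC"])  # ['A', 'C', 'B']
```

Each line printed by the loop should end in `True True`, and the last line reproduces the example in the text, where the element fixing 1 and exchanging 2 and 3 fixes A and exchanges B and C.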

This all seems pretty simple, and the ancient Greeks clearly knew about symmetry, and were interested in it, so you might guess that these definitions have been around for at least 2500 years, but you would be wrong. The concept of a group is only about 200 years old. It's not so easy to psychoanalyze the failings of our forebears, but one can suggest three factors to account for this.

First, until recently people have generally believed that mathematical objects are, in some sense, already "out there." In ancient Greece the Pythagorean school of philosophy held that all numbers are rational, i.e., ratios of integers. Note the denigration suggested by this terminology: even if other numbers do exist, they're kind of nutty. Possibly you were taught, at a certain age, that $-1$ doesn't have a square root, then later you learned that it does, sort of, except that $\sqrt{-1}$ is "imaginary" and not "real."

Probably more important than such prejudices, which mathematicians would have been happy to overcome, even if the solid citizens were dubious, is the fact that the technology for constructing new mathematical objects using set theory is a recent development. We'll say much more about this in a little bit.

Finally, even though the central definitions of mathematics are, in the end, simple, the historical process that created them wasn't. Nowadays there are thousands of new definitions proposed every year, in the course of mathematicians doing their work. For specific, well defined projects this can be straightforward, but the biggest concepts in mathematics, such as symmetry, or number, or geometry, are studied continually for centuries on end, and the formulations of them that are most popular in any period are obtained by recrafting earlier approaches to fit the applications of greatest current interest. Although symmetries have been around in various forms for a long time, certain aspects of their importance became apparent during the first part of the 19th century. The definitions above, and the overall way we think about groups, emerged gradually during the next fifty or one hundred years. As you probably noticed, by themselves these definitions don't really say anything about symmetry that you didn't know beforehand, and while they might seem simple and natural, we haven't yet presented any particular reason to think they'll be useful or interesting.

Much of the recent growth of mathematics is the result of a more or less automatic consequence of abstraction: a new concept, say the notion of a group, is introduced because it is relevant and illuminating in its application to preexisting mathematical phenomena, but it then becomes an object of study in itself. To give some flavor of this, and to introduce ideas that will recur in different contexts later, I am going to quickly state some of the basic definitions and results of group theory. The material below is written in the terse, "just the facts," style of modern mathematics, but please don't be intimidated. At this point you can't possibly truly understand either the importance or the ramifications of what follows, and you shouldn't expect to comprehend it fully. (Realistically, you'll probably need to review it, possibly more than once, as the concepts are applied later.) Just read it slowly and carefully, trying to see that it makes sense on its own terms.

Let $G$ be a group. One important fact about groups is that if $g, \gamma \in G$ satisfy $g\gamma = g$, then

$$\gamma = e_G\gamma = (g^{-1}g)\gamma = g^{-1}(g\gamma) = g^{-1}g = e_G.$$

A symmetric argument shows that $\gamma = e_G$ whenever $\gamma g = g$. That is, there is only one element of $G$ that acts like $e_G$, in connection with any element of $G$, from either side. Another important fact is that $g' = g^{-1}$ whenever $gg' = e_G$ because

$$g^{-1} = g^{-1}e_G = g^{-1}(gg') = (g^{-1}g)g' = e_Gg' = g'.$$

Again, a symmetric argument implies that $g' = g^{-1}$ whenever $g'g = e_G$. Thus, for each $g \in G$, there is only one element of $G$ that acts, from either side, like $g^{-1}$. In particular, each $g \in G$ is the inverse of its inverse:

$$(g^{-1})^{-1} = g.$$

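Now let $H$ be a second group. A homomorphism from $G$ to $H$ is a function $\varphi : G \to H$ that "respects" or "commutes with" the group operations:

$$\varphi(gg') = \varphi(g)\varphi(g')$$

for all $g, g' \in G$. It is always the case that $\varphi(e_G) = e_H$ because

$$\varphi(e_G) = \varphi(e_G)\varphi(e_G)\varphi(e_G)^{-1} = \varphi(e_Ge_G)\varphi(e_G)^{-1} = \varphi(e_G)\varphi(e_G)^{-1} = e_H.$$

For any $g \in G$ we have $\varphi(g^{-1}) = \varphi(g)^{-1}$ because

$$\varphi(g)\varphi(g^{-1}) = \varphi(gg^{-1}) = \varphi(e_G) = e_H.$$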

A homomorphism is said to be an isomorphism if it is bijective, in which case we say that $G$ and $H$ are isomorphic. The most basic and important fact about isomorphisms is that the inverse of an isomorphism is also an isomorphism. To prove this we first note that the inverse of any bijection is a bijection, so the key point is that the inverse of an isomorphism is a homomorphism. Take two elements $h$ and $h'$ of $H$, set $g := \varphi^{-1}(h)$ and $g' := \varphi^{-1}(h')$, and compute that

$$\varphi^{-1}(hh') = \varphi^{-1}(\varphi(g)\varphi(g')) = \varphi^{-1}(\varphi(gg')) = gg' = \varphi^{-1}(h)\varphi^{-1}(h').$$

By the way, the symbol ":=" is the assignment operator. In most mathematics books it's written as "=", leaving the reader to determine from the context whether the sentence in question is an assertion that two things are equal or a definition of the thing on the left as a symbol whose meaning is the thing on the right.

If $f : X \to Y$ is a function and $f(x) = y$, then we say that $y$ is the image of $x$ under $f$, and that $x$ is a preimage of $y$. Another important point about notation is that whenever $f : X \to Y$ is a function and $B \subset Y$, $f^{-1}(B)$ denotes the set $\{ x \in X : f(x) \in B \}$ of preimages of elements of $B$. This makes sense regardless of whether $f$ is invertible, in the sense of being one-to-one and onto. Usually we'll write $f^{-1}(y)$ in place of the more cumbersome $f^{-1}(\{y\})$ when $y \in Y$, but now you have to be careful: if $f$ is invertible, $f^{-1}(y)$ will typically denote the element of $X$ that is mapped to $y$ by $f$, and otherwise it denotes the set of preimages of $y$.
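To make the preimage notation concrete, here is a minimal Python sketch for a function on a small finite set. The name `preimage`, the squaring function, and the particular domain are illustrative assumptions chosen for this example; they do not come from the text.

```python
def preimage(f, domain, B):
    """Return the set of x in `domain` with f(x) in B, i.e. f^{-1}(B)."""
    return {x for x in domain if f(x) in B}

def square(x):
    return x * x

X = {-2, -1, 0, 1, 2}

# square is not one-to-one on X, yet the preimage of a set is still well defined.
print(sorted(preimage(square, X, {1, 4})))   # [-2, -1, 1, 2]
print(preimage(square, X, {3}))              # set(): nothing in X maps to 3
```

Note that the preimage of the singleton `{4}` has two elements here, which is exactly the situation in which writing $f^{-1}(4)$ has to be read as a set rather than as a single element.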

An isomorphism from a group to itself is called an automorphism. That $\mathrm{Id}_G$ is an automorphism is a simple and obvious, but crucially important, fact. There are automorphisms that are called inner automorphisms because they come from the group itself: for any $\gamma \in G$ let $C_\gamma : G \to G$ be the function that takes $g \in G$ to $C_\gamma(g) = \gamma g \gamma^{-1}$. This is a homomorphism because

$$C_\gamma(gg') = \gamma gg' \gamma^{-1} = \gamma g e_G g' \gamma^{-1} = \gamma g \gamma^{-1} \gamma g' \gamma^{-1} = C_\gamma(g)C_\gamma(g')$$

for all $g, g' \in G$, and $C_{\gamma^{-1}}$ is the inverse of $C_\gamma$ (please convince yourself that this is so) so $C_\gamma$ is an automorphism. Note that $\mathrm{Id}_G = C_{e_G}$ is an inner automorphism. An automorphism that isn't inner is called an outer automorphism.

A subgroup of $G$ is a subset $G' \subset G$ such that:

(i) $e_G \in G'$;

(ii) $g^{-1} \in G'$ whenever $g \in G'$;

(iii) $gg' \in G'$ whenever $g, g' \in G'$.

That is, a subgroup of $G$ is a subset containing $e_G$ that is closed under inversion and the group operation, in the sense that $\{ g^{-1} : g \in G' \}$ and $\{ gg' : g, g' \in G' \}$ are both contained in $G'$. (Since $(g^{-1})^{-1} = g$ for all $g$, the first set is actually equal to $G'$, and the second set is equal to $G'$ because $G'$ contains $e_G$.) Observe that $G$ itself and $\{e_G\}$ are always subgroups. (That is, please check (i)-(iii) in your head.) To a large extent group theory regards the structure of a group as synonymous with its system of subgroups.

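Let $\varphi : G \to H$ be a homomorphism. The following argument shows that if $H'$ is a subgroup of $H$, then

$$\varphi^{-1}(H') := \{ g \in G : \varphi(g) \in H' \}$$

is a subgroup of $G$. Above we showed that $\varphi(e_G) = e_H$, so $e_G \in \varphi^{-1}(H')$. If $g \in \varphi^{-1}(H')$, then $g^{-1} \in \varphi^{-1}(H')$ because

$$\varphi(g^{-1}) = \varphi(g)^{-1} \in \{ h^{-1} : h \in H' \} = H'.$$

If $g, g' \in \varphi^{-1}(H')$, then $gg' \in \varphi^{-1}(H')$ because

$$\varphi(gg') = \varphi(g)\varphi(g') \in \{ hh' : h, h' \in H' \} = H'.$$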

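The kernel of $\varphi$ is

$$\ker(\varphi) := \varphi^{-1}(e_H).$$

Since $\{e_H\}$ is a subgroup of $H$, $\ker(\varphi)$ is a subgroup of $G$, but it turns out that not every subgroup can be the kernel of a homomorphism. A normal subgroup of $G$ is a subgroup $N$ such that $C_\gamma(g) \in N$ whenever $g \in N$ and $\gamma \in G$. If $g \in \ker(\varphi)$ and $\gamma$ is any element of $G$, then $C_\gamma(g) \in \ker(\varphi)$ by virtue of the calculation

$$\varphi(C_\gamma(g)) = \varphi(\gamma g \gamma^{-1}) = \varphi(\gamma)\varphi(g)\varphi(\gamma^{-1}) = \varphi(\gamma)e_H\varphi(\gamma^{-1}) = \varphi(\gamma)\varphi(\gamma^{-1}) = \varphi(\gamma)\varphi(\gamma)^{-1} = e_H.$$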

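Thus the kernel of $\varphi$ is a normal subgroup of $G$.

Here's an example of a subgroup that is not normal. Let $\sigma \in S_3$ be the function that takes 1 to 2, 2 to 1, and 3 to itself. Then $\sigma^{-1} = \sigma$, so $G' := \{ \mathrm{Id}_{\{1,2,3\}}, \sigma \}$ obviously contains all products and inverses of its elements and is consequently a subgroup of $S_3$. If $\tau \in S_3$ takes 1 to itself, 2 to 3, and 3 to 2, then

$$\tau = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}, \quad \sigma = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}, \quad \text{and} \quad \tau^{-1} = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}, \quad \text{so} \quad C_\tau(\sigma) = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}.$$

(It doesn't matter in this particular instance because $\tau^{-1} = \tau$, but our notation for compositions leads to compositions like the one above being computed by reading from right to left, first finding the effect of $\tau^{-1}$, then the subsequent effect of $\sigma$, and finally the effect of $\tau$.) Since it does not contain $C_\tau(\sigma)$, $G'$ is not a normal subgroup of $S_3$.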
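The conjugation in this example can be checked mechanically. The following is a minimal Python sketch in which a permutation of {1, 2, 3} is stored as a dictionary sending each point to its image. The names `sigma` and `tau` are this sketch's labels for the two permutations described above (the first swaps 1 and 2, the second swaps 2 and 3), and the helper functions are illustrative assumptions rather than anything defined in the text.

```python
def compose(f, g):
    """Return the composition 'f after g': apply g first, then f."""
    return {x: f[g[x]] for x in g}

def inverse(f):
    """Return the inverse permutation of f."""
    return {image: point for point, image in f.items()}

def conjugate(gamma, g):
    """Return gamma * g * gamma^{-1}, read from right to left."""
    return compose(gamma, compose(g, inverse(gamma)))

identity = {1: 1, 2: 2, 3: 3}
sigma = {1: 2, 2: 1, 3: 3}   # swaps 1 and 2, fixes 3
tau = {1: 1, 2: 3, 3: 2}     # fixes 1, swaps 2 and 3
subgroup = [identity, sigma]

conj = conjugate(tau, sigma)
print(sorted(conj.items()))  # [(1, 3), (2, 2), (3, 1)]
print(conj in subgroup)      # False: the subgroup is not closed under conjugation
```

The conjugate swaps 1 and 3, so it is neither the identity nor the swap of 1 and 2, which is the failure of normality claimed above.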

Everything so far is quite elementary and basic. If you feel a bit overwhelmed, rest assured that that is natural: for many nonmathematical subjects the learning process can be primarily a matter of remembering the sorts of things that the human mind finds easy to remember, in part because they are easily related to other things you already know. The concepts above are second nature for any mathematician, but only as a result of seeing them applied again and again over the years. There are many groups in the rest of the book, so the concepts will probably seem quite familiar by the time you reach the end.

The next definition is quite a bit less elementary. The group $G$ is said to be simple if its only normal subgroups are $\{e_G\}$ and $G$ itself. (One thing you should know about mathematical terminology is that "simple" objects are usually not so simple. Truly simple things are typically said to be "trivial.") As it happens, one of the most celebrated recent advances of mathematics is the completion of the classification of finite simple groups. That is, there is now a list of exactly described finite simple groups, and any finite simple group is isomorphic to some element of the list. The proof of the theorem stating that this is so is scattered in about 500 journal articles comprising over ten thousand pages, almost all of which are written in the dense style we saw above. Currently a group of mathematicians is working to boil this down to a simplified and unified presentation that is expected to occupy only about 5000 pages.

Pretty much everyone can directly experience the wonderful flowering of music, film, and other arts, echoing around the world these days. All educated people know that we are living in an era of profound and rapid scientific advances, even if each of us is limited in our ability to understand the specifics. Unfortunately, only a small fraction of the population knows that this is a period of equally wonderful progress in mathematics, and only experts can fully appreciate the beauty and magnificence of these contributions to world civilization.

1.3 Proofs

In mathematics a proposition is "known" if it has been proven. In this sense, mathematics is all about proofs, but many students arrive at college level mathematics without having seen any proofs, and among those who have seen some, many have never had to write their own proofs. Here we address some basic questions. What is a proof? Why are they the standard of truth and knowledge in mathematics? How are proofs conceived and constructed? What should you be trying to do when you read one? How can you learn to write proofs yourself?

The fundamental idea is quite simple: a proof is a logically compelling argument showing that certain premises imply a desired conclusion. That is, we wish to show that a proposition $P$ implies another proposition $Q$. We do this by constructing a sequence of intermediate propositions $R_1, \ldots, R_n$, where $R_1 = P$ and $R_n = Q$, such that for each $i = 2, \ldots, n$, $R_i$ is an "obvious" or "elementary" consequence of $R_1, \ldots, R_{i-1}$ and other facts of mathematics that are already known. Everybody understands what a prosecutor is trying to do in the courtroom, and at a first cut a mathematical proof is the same sort of thing: an argument that is airtight.

Things get a little bit complicated, both theoretically and practically, when one delves into the details. What constitutes a valid inference? This has been an important issue in philosophy from ancient Greece to the present, but, practically speaking, it isn't a serious problem at the beginning level, since everyone knows what simple logical inferences look like. The inferences in the vast majority of proofs, including all the proofs in this book, are simple in this sense. We won't worry about it.

Other mathematical sciences use proofs, but by and large their ethos concerning what is "known" is more permissive, including empirical regularities or propositions that seem overwhelmingly likely, but for which no proof has yet been found. Why are mathematicians such purists? Actually, conjectures and open problems play an important role in mathematical research, so it is not quite correct to say that proof is the only accepted form of knowledge. In this sense the bright red line between theorems and conjectures for which we have compelling evidence is a social phenomenon, and to describe it carefully would take many pages. But at the heart of any detailed explanation is the idea that in mathematics knowing which things are



If you glance at a book about higher mathematics, you'll quickly see that it consists largely of definitions, theorems, and proofs, with a bit of less formal explanation thrown in, but not much. If it is a textbook, the problems mostly ask the students to supply proofs. The view of mathematics embodied in this approach is that there is an established, logically structured, body of material that is generally accepted, and that the task of the author, and any student, is to first forge a secure connection with this larger structure, then develop the book's specific topic by extending that structure's logic, paying meticulous attention to getting each detail right.

This approach also embodies certain beliefs about the psychology of learning mathematics, and the role of mathematical writing in that process. The student's primary focus should always be on why things are the way they are. It is generally thought that the most effective way to communicate that in writing is to say as little as the logical structure of the material allows, precisely in order to highlight that structure.

If you are accustomed to focusing on how to do calculations, you won't already know the proper way to learn this sort of mathematics. It will take some getting used to, you'll have to make several adjustments, and there are some pitfalls you'll need to avoid. And to be frank, even if you succeed in all this, learning new mathematics will still be hard. To an important extent mathematicians enjoy the subject precisely because it is something they can really sink their teeth into.

The most important thing to get used to is that you have to go one step at a time, paying attention to each aspect of the definition, theorem, or proof at hand, before going on to the next item. If you skip over even a few fine points, pretty soon what you are trying to read will stop making sense, and you will have to go back. Actually, you will need to review material you have already read fairly often, either because you don't remember things or you get confused, even if you do sincerely try to apprehend each element. Measured in pages per hour, reading mathematics is a very slow process. In part this is because the content per page is quite high (with everything stripped down to the minimum, a 200 page math book is actually much longer than a 700 page book about, say, history), but it is also the case that learning mathematics is just inherently slow.

What you should always be trying to do is to turn all the little details into a larger, simpler picture of the key ideas underlying the main results, and more generally why the topic is interesting, important, and structured the way it is. Ultimately, understanding a mathematical topic is a matter of achieving a mental state in which you could "reverse engineer" the particular details by starting with the big picture and applying standard methods of proof and the general background obtained from prior study, the things that "everyone knows." (The meaning of "everyone" varies according to your level!) Your ability to do this is very much a function of your command of the basics, and an important technique for strengthening them is to continually ask yourself whether you really understand the calculations and earlier results that are being applied in the proof you are reading right now, recreating them mentally, or even looking things up, if you are even a bit unsure. This means going even slower, which will try your patience, but in a sense it is merely a matter of supplying the sort of repetition and reinforcement that occur naturally in other sorts of writing, but which are lost in the process of stripping mathematical exposition down to the minimum. If this attitude toward the material is habitual, in the long run it will speed things up because you will be able to read with greater assurance and confidence.

A very different set of issues arises when a student starts writing proofs. It often happens that the first assignment asking for a proof leaves the student paralyzed, feeling that it must be simple, but not knowing where to begin. It's actually a lot like learning a computer programming language, in that the first step, writing a program that simply prints "Hello, world!", is hard because it applies several aspects of the language, whereas everything that comes later can be assimilated one step at a time. To see what's involved, let's look at a simple, but very famous, proof.

Theorem 1.2 (Euclid). There are infinitely many prime numbers.

Proof. Suppose there are only finitely many prime numbers $p_1, \ldots, p_n$. Let $r = p_1 \cdots p_n + 1$. Then $r$ has a factorization as a product of prime numbers, but no $p_i$ can divide $r$, so there must be primes that are not included in the list $p_1, \ldots, p_n$. This contradiction completes the proof.
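The construction in this proof can be tried out on any finite list of primes. The following Python sketch is only an illustration under stated assumptions: the function names `smallest_prime_factor` and `new_prime` are invented for this example, and trial division is just one simple way of finding a prime factor of $r$.

```python
from math import prod

def smallest_prime_factor(r):
    """Return the smallest prime factor of an integer r >= 2 (trial division)."""
    d = 2
    while d * d <= r:
        if r % d == 0:
            return d
        d += 1
    return r  # no divisor up to sqrt(r), so r itself is prime

def new_prime(primes):
    """Given a nonempty list of primes, return a prime that is not in the list."""
    r = prod(primes) + 1
    return smallest_prime_factor(r)

print(new_prime([2, 3, 5]))              # 31, because 2*3*5 + 1 = 31 is prime
print(new_prime([2, 3, 5, 7, 11, 13]))   # 59, because 30031 = 59 * 509
```

The second call shows that $r$ itself need not be prime; the argument only needs some prime factor of $r$, and that factor cannot be on the list because dividing $r$ by any listed prime leaves remainder 1.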

Before talking about the content of this argument, let's look at the purely mechanical features. We see a heading consisting of "Theorem" and an identifying number, both in bold face. In this case, but not always, there is an attribution consisting of a name in parentheses. Usually this is either the name of the theorem, if it has one, or the name of the person who first proved the theorem, but in this particular case we know that the theorem appeared in Euclid's Elements, but we don't know very much about its prior history. Following this there is a statement in italics called the assertion. Some space is skipped, and the actual proof is bracketed by the word "Proof" (unindented and in italics) and a square box.

The square box is a replacement in modern texts for the more traditional symbol "Q.E.D." This is an abbreviation for the Latin phrase quod erat demonstrandum, which means "which was to be demonstrated." Possibly the symbol "Q.E.D." was truly useful back in the days when Latin had some real status as a universal language, but now it's just a piece of trivia. More important is the fact that the reader often needs to be told that a proof is over. Even though this is the meaning of the square box, it can still be helpful to say this in words, as we have done here.

This format embodies and enforces an important principle called information hiding: all the information required to understand the proof is contained in the assertion, except to the extent that a proof invokes theorems that were proved earlier, and the assertion can be applied in later arguments without knowing how the proof works. A similar idea has been found to be very useful in organizing the code of large computer programming projects. The declaration of a subroutine makes a promise about what will happen when the subroutine is invoked. In order to use the subroutine you do not need to know the details of how this promise is fulfilled. The declaration will typically also specify the resources that are utilized by the implementation of the subroutine. This sets an outer bound on what you need to know in order to understand the implementation. Information hiding seems to be an indispensable principle for organizing large bodies of precise interrelated technical information in a way that can be understood and manipulated by people.

Turning to the actual content of the proof, we see a very common and useful idea called proof by contradiction. Logically, it is expressed by the following formula:

$$[(P \wedge \neg Q) \Rightarrow Q] \Rightarrow [P \Rightarrow Q].$$

Here $P$ and $Q$ are variables that represent elementary propositions, and "$\wedge$", "$\neg$", and "$\Rightarrow$" mean "and", "not", and "implies" respectively. In words this formula says that if we can prove that $Q$ is true whenever both $P$ and $\neg Q$ hold, then $Q$ must be true whenever $P$ holds. If our goal is to prove that $P$ implies $Q$, then, in the proof, we can add the assumption that $Q$ is false to the other assumptions embodied in $P$. A more general version of this idea is expressed by the formula

$$[[(P \wedge \neg Q) \Rightarrow R] \wedge \neg R] \Rightarrow [P \Rightarrow Q].$$

If $P$ and $\neg Q$ together imply some proposition $R$ (e.g., $1 = 0$) that we know to be false, then $Q$ must be true whenever $P$ is true.
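Because these formulas involve only two or three propositional variables, one can confirm that they hold under every assignment of truth values. The sketch below does this by brute force in Python. It assumes the usual truth-table reading of "implies" (false only when the hypothesis is true and the conclusion is false), and the function names are invented for this illustration.

```python
from itertools import product

def implies(a, b):
    """Truth-table implication: false only when a is true and b is false."""
    return (not a) or b

def contradiction_rule(p, q):
    # [(P and not Q) implies Q] implies [P implies Q]
    return implies(implies(p and not q, q), implies(p, q))

def general_rule(p, q, r):
    # [[(P and not Q) implies R] and not R] implies [P implies Q]
    return implies(implies(p and not q, r) and not r, implies(p, q))

truth_values = [False, True]
print(all(contradiction_rule(p, q) for p, q in product(truth_values, repeat=2)))   # True
print(all(general_rule(p, q, r) for p, q, r in product(truth_values, repeat=3)))   # True
```

A formula that comes out true in every row of its truth table is called a tautology, which is what licenses using it as a pattern of proof.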

Here's another famous example of this sort of argument.

Theorem 1.3. There do not exist nonzero integers $a$ and $b$ such that $a^2 = 2b^2$, so $\sqrt{2}$ is an irrational number.

Proof. Suppose that, contrary to the assertion, there do exist such $a$ and $b$. The prime factorization of $a^2$ is unique, so it must be obtained by squaring the prime factorization of $a$, and consequently it has 2 raised to an even power. But the same reasoning shows that the prime factorization of $b^2$ has 2 raised to an even power, so that the prime factorization of $2b^2$ has 2 raised to an odd power. This is a contradiction of the uniqueness of prime factorization.

Another important proof technique is called induction. It has the following logical pattern.

$$[P_0 \wedge (\forall n > 0)[P_{n-1} \Rightarrow P_n]] \Rightarrow (\forall n \ge 0)P_n.$$

The symbol "$\forall$" is read "for all." The idea is that there is a sequence of propositions $P_0, P_1, P_2, \ldots$ that we want to prove. If we can prove that $P_0$ is true, then it is enough to prove $P_n$, for general $n$, with the additional hypothesis that $P_{n-1}$ is true. (To be a bit more precise, it is actually enough to prove $P_n$ with the additional hypothesis that $P_0, P_1, \ldots, P_{n-1}$ are all true.)

We will use induction to prove the binomial theorem, which is a famous and very useful result that is used to expand expressions of the form $(x + y)^n$. First of all you have to know that for any positive integer $n$,

$$n! := 1 \cdot 2 \cdots (n - 1) \cdot n$$

(pronounced "n factorial") is the product of all the integers between 1 and $n$. For example $4! = 1 \cdot 2 \cdot 3 \cdot 4 = 24$. We also set $0! := 1$; there are deep explanations of why this is the right definition of $0!$, but we'll just accept it as a convention. For a positive integer $n$ and an integer $k$ with $0 \le k \le n$ we define

$$\binom{n}{k} := \frac{n!}{k!\,(n - k)!}.$$

Lemma 1.4. For any integers $n$ and $k$ with $n \ge 1$ and $1 \le k \le n - 1$,

$$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}.$$

Proof. We compute that

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{n \cdot (n-1)!}{k!\,(n-k)!} = \frac{(k + (n-k)) \cdot (n-1)!}{k!\,(n-k)!} = \frac{(n-1)!}{(k-1)!\,(n-k)!} + \frac{(n-1)!}{k!\,(n-k-1)!} = \binom{n-1}{k-1} + \binom{n-1}{k}.$$

This time the result was called a lemma because its primary role is to serve as an intermediate step in the proof of another theorem. Propositions are usually results that are less important than theorems, but which nonetheless have some conceptual interest. A corollary is a simple consequence of some result, usually the one that comes right before it.

Here is another proof using induction:

Corollary 1.5. For any integers $n$ and $k$ with $n \ge 1$ and $0 \le k \le n$, $\binom{n}{k}$ is an integer.

Proof. If $k = 0$ or $k = n$, then $\binom{n}{k} = \frac{n!}{n!\,0!} = 1$, obviously. In particular, we have $\binom{1}{0} = 1 = \binom{1}{1}$, so the result is true for $n = 1$. Suppose, for some $n \ge 2$, that we have already shown that $\binom{n-1}{0}, \binom{n-1}{1}, \ldots, \binom{n-1}{n-2}, \binom{n-1}{n-1}$ are integers. Then $\binom{n}{0}, \binom{n}{1}, \ldots, \binom{n}{n-1}, \binom{n}{n}$ are integers: if $k = 0$ or $k = n$, then $\binom{n}{k} = 1$, as we have already noted, and if $0 < k < n$, then $\binom{n}{k}$ is an integer by virtue of the lemma above.
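The inductive step of this proof is effectively an algorithm: each row of binomial coefficients is obtained from the previous one using nothing but integer addition. The Python sketch below builds rows this way and compares them with the factorial formula. The function names are invented for this illustration, and starting from a "row 0" consisting of a single 1 is a convenience of the sketch (the text only defines the coefficients for $n \ge 1$).

```python
from math import factorial

def pascal_row(n):
    """Return the n-th row of binomial coefficients, built by integer addition only."""
    row = [1]  # starting point for the loop; after one pass it becomes [1, 1]
    for _ in range(n):
        # The endpoints are 1; each interior entry is the sum of the two
        # neighbouring entries of the previous row, as in Lemma 1.4.
        row = [1] + [row[k - 1] + row[k] for k in range(1, len(row))] + [1]
    return row

def by_factorials(n, k):
    """The defining formula n! / (k! (n-k)!), using exact integer division."""
    return factorial(n) // (factorial(k) * factorial(n - k))

print(pascal_row(6))                              # [1, 6, 15, 20, 15, 6, 1]
print([by_factorials(6, k) for k in range(7)])    # [1, 6, 15, 20, 15, 6, 1]
```

Since the additive construction never divides, every entry it produces is visibly an integer, which is the content of the corollary.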

The symbol $\binom{n}{k}$ is called a binomial coefficient and pronounced "n choose k" because it is the number of distinct $k$-element subsets of $\{1, \ldots, n\}$, or any set with $n$ elements. To show this we argue by induction on $n$, using the formula in the proof above. Arbitrarily, choose some element of $\{1, \ldots, n\}$ that we'll call the "last element." For $k = 0$ or $k = n$ the claim is obvious: there is one null set and one subset containing all $n$ elements. For $0 < k < n$, any $k$-element subset is either the last element together with some $(k - 1)$-element subset of the remainder or a $k$-element subset of the remainder. Assuming that the claim has already been established with $n$ replaced by $n - 1$, there are $\binom{n-1}{k-1}$ subsets of the first type and $\binom{n-1}{k}$ subsets of the second type, so the claim follows from the lemma above.
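For small cases the counting argument can be checked by simply listing the subsets. The following Python sketch enumerates the $k$-element subsets of $\{1, \ldots, n\}$ and splits them according to whether they contain the last element, here taken to be $n$. The function name `count_by_splitting` is an assumption of this example.

```python
from itertools import combinations
from math import comb

def count_by_splitting(n, k):
    """Count k-element subsets of {1, ..., n}, split by whether they contain n."""
    subsets = list(combinations(range(1, n + 1), k))
    with_last = [s for s in subsets if n in s]
    without_last = [s for s in subsets if n not in s]
    return len(subsets), len(with_last), len(without_last)

total, first_type, second_type = count_by_splitting(5, 2)
print(total, first_type, second_type)        # 10 4 6
print(comb(5, 2), comb(4, 1), comb(4, 2))    # 10 4 6
```

The two types of subsets are counted by the two terms on the right-hand side of the lemma, and together they account for every subset exactly once.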

Here is the binomial theorem, with an inductive proof.


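Theorem 1.6 (Binomial Theorem). For any numbers $x$ and $y$, and any integer $n \ge 1$,

$$(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^{n-i} y^i.$$

Proof. This is obviously true when $n = 1$. Suppose it has already been established with $n - 1$ in place of $n$. We compute that

$$(x + y)^n = (x + y)(x + y)^{n-1} = (x + y) \sum_{i=0}^{n-1} \binom{n-1}{i} x^{n-i-1} y^i = \sum_{i=0}^{n-1} \binom{n-1}{i} x^{n-i} y^i + \sum_{i=0}^{n-1} \binom{n-1}{i} x^{n-i-1} y^{i+1}.$$

Changing the index in the second sum by one gives

$$\begin{aligned}
(x + y)^n &= \sum_{i=0}^{n-1} \binom{n-1}{i} x^{n-i} y^i + \sum_{i=1}^{n} \binom{n-1}{i-1} x^{n-i} y^i \\
&= \binom{n-1}{0} x^n + \sum_{i=1}^{n-1} \left[ \binom{n-1}{i} + \binom{n-1}{i-1} \right] x^{n-i} y^i + \binom{n-1}{n-1} y^n \\
&= \binom{n}{0} x^n + \sum_{i=1}^{n-1} \binom{n}{i} x^{n-i} y^i + \binom{n}{n} y^n
\end{aligned}$$

where the last equality applies Lemma 1.4 and the fact that $\binom{n-1}{0} = 1 = \binom{n}{0}$ and $\binom{n-1}{n-1} = 1 = \binom{n}{n}$.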
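A quick numerical spot check of the theorem is easy to write. The Python sketch below evaluates the right-hand side of the formula for integer values of $x$ and $y$ and compares it with $(x + y)^n$ computed directly. The function name `binomial_sum` is invented here, and a check of a few cases is of course no substitute for the proof.

```python
from math import comb

def binomial_sum(x, y, n):
    """Evaluate the sum over i of C(n, i) * x^(n-i) * y^i."""
    return sum(comb(n, i) * x ** (n - i) * y ** i for i in range(n + 1))

print(binomial_sum(2, 3, 5), (2 + 3) ** 5)   # 3125 3125
print([comb(4, i) for i in range(5)])        # [1, 4, 6, 4, 1], the coefficients of (x + y)^4
```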

In each of the proofs we have seen so far we invoked certain mathematical truths without proving them first. Specifically, the first proof uses the fact that any integer can be written as a product of prime numbers, and that if an integer $r$ is divisible by another integer $p \ge 2$, then $r + 1$ is not divisible by $p$. The second proof uses the fact that the factorization of an integer into a product of primes is unique, up to the ordering of the factors. The inductive proofs use things like the commutative, associative, and distributive properties of addition and multiplication. For those new to writing proofs, these facts might all be very well known, and at the same time it is not clear what it is, precisely, that justifies the assumption that the reader also knows them. There is also an important related question: how much detail do you need to include? These questions have a practical and a theoretical aspect.



From a practical point of view, a proof occurs in some larger context which creates certain assumptions about what is known, it is directed at some actual or imagined audience, and it is intended to create certain effects that go beyond its purely logical mission. The proofs you read will usually be in books that (if they are logically organized) begin with some declaration of expectations concerning the reader's prior study in mathematics, and some statement of the fundamental assumptions of the work. Many books, including this one, begin at a lower level than the central topic, reviewing material that should be familiar to any plausible reader, precisely in order to create a commonly understood framework.

The proofs that you'll write while taking courses are a bit different, since your goal is to demonstrate that you understand the material, and that you are a smart person who can present arguments clearly and precisely. The question of what you can assume will be answered, to some extent, by the material already presented in the course, but really you should think in terms of a proof having some central idea. Your goal should be to present that central idea clearly, with enough detail to convince the instructor that you are aware of and know how to handle any nuances in the argument.

Proofs are written in English: you should use proper grammar (or at least try if you're not a native speaker) and write in complete, correctly punctuated sentences. When you can use words in place of symbols without loss of precision, do so. In particular, the logical symbols $\wedge$, $\vee$ (or), $\neg$, $\Rightarrow$, $\forall$, and $\exists$ (there exists) should generally be avoided except when one wishes to emphasize the logical structure. Creating a well written proof usually involves extensive rewriting. Often this is a matter of aiming for greater brevity without loss of content, but more generally good mathematical writing results from a process in which the author just keeps asking if there is some way to make things even a little bit easier for the reader until she really can't think of anything. This is hard work, but it can give a surprising amount of aesthetic satisfaction. Whereas the central definitions of mathematics are embodiments of profound thought and centuries of experience, proofs can be surprising, clever, and charming.

Both for the proofs you read and those you write, there is a difference between a proof in logic and a proof aimed at a human audience. For all but the simplest theorems, a complete and exact proof in which every step was spelled out explicitly would be long and virtually unreadable, at least by humans. (There is an active research program developing languages that express proofs exactly, so that computers can verify them. Converting one page of mathematics written for people into such a language currently takes roughly one week.) A proof for humans is really a compelling argument to the effect that a proof in logic could be constructed. There is an expectation that the reader will be able to fill in obvious details, and the appropriate style depends very much on the level of the intended audience.

An additional problem, both for reading proofs and writing them, is that although they must (with minor exceptions) be presented in a logically linear fashion, to guard against circular reasoning, they are best thought of as the result of a "top-down" way of thinking. That is, the proposition we want to prove is first understood as a consequence of a few "big" intermediate steps, then we look for proofs of these steps, perhaps breaking some of these into smaller pieces, and so forth. When reading a proof, you should not be content with merely seeing how each step is a consequence of what came before. In addition, you should try to understand the larger architecture of the argument, and you should try to imagine the process by which the author passed from the main ideas to the details.

After you've had a little practice, the problem of what you can legitimately assume in a proof will not seem so hard. But that does not mean that we have resolved the issue from a theoretical point of view. In fact this question, "Where does mathematics begin?", is a very hard one that is still not completely settled. The next section describes an overall approach to it that is at least very effective in a practical sense.

1.4 Foundations: Sets, Relations, and Functions

Whence it is manifest that if we could find characters or signs appropriate for expressing all our thoughts as definitely and as exactly as arithmetic expresses numbers or geometric analysis expresses lines, we could in all subjects in so far as they are amenable to reasoning accomplish what is done in Arithmetic and Geometry.

For all inquiries which depend on reasoning would be performed by the transposition of characters and by a kind of calculus, which would immediately facilitate the discovery of beautiful results. For we should not have to break our heads as much as is necessary today, and yet we should be sure of accomplishing everything the given facts allow.

Moreover, we should be able to convince the world what we should have found or concluded, since it would be easy to verify the calculation either by doing it over or by trying tests similar to that of casting out nines in arithmetic. And if someone would doubt my results, I should say to him: "Let us calculate, Sir," and thus by taking to pen and ink, we should soon settle the question.

Gottfried Wilhelm Leibniz (1646-1716) The Method of Mathematics



Optimism about Leibniz's dream peaked around the year 1900, as a result of the development of set theory. Here we'll first explain how set theory is useful to mathematics, then say a bit about why things didn't work out as well as some had hoped.

You probably already know that a set is a collection of things called elements. For instance $\{\text{you}, \text{me}\}$ denotes a set whose two elements are simply listed. A set is determined by its elements: if two sets have the same elements, they are the same set. A set can contain a single element, in which case it is called a singleton. Don't confuse a singleton with its unique element: $a$ and $\{a\}$ are not the same thing! The set that has no elements is called the null set or empty set and is denoted by $\emptyset$. We say that $A$ is a subset of $B$, and write $A \subset B$, if every element of $A$ is also an element of $B$. It is a proper subset of $B$ if, in addition, there is at least one element of $B$ that is not an element of $A$.

The basic operations for constructing sets include union, intersection, and set difference: if $A$ and $B$ are sets, then their union $A \cup B$ is the set containing all the elements of $A$ and all the elements of $B$, their intersection $A \cap B$ is the set containing all the elements of $A$ that are also elements of $B$, and the set difference $A \setminus B$ is the set containing all the elements of $A$ that are not elements of $B$. In addition, one may define a subset of a given set by selecting out those elements that have a certain property. If $P(b)$ means "$b$ is red," and $B$ is the set of balloons, then $\{ b \in B : P(b) \}$ is the set of red balloons. More generally, almost any method of constructing mathematical objects can be used to define sets; we'll see many examples as we go along.

Set theory provides an all-purpose toolkit for precisely describing mathematical objects and concepts we already know about, and for defining new ones. For example, everyone knows that ordered pairs like $(x, y)$ are important in mathematics, but just what is an ordered pair? Using set theory, we can define $(x, y)$ to be $\{\{x\}, \{x, y\}\}$. There are other ways to define an ordered pair, and nobody really wants to work with this definition, so it probably all seems pretty boring. But that's the whole point! By agreeing on such a definition, all the ambiguity and potential for controversy is eliminated, just as Leibniz had hoped.
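For readers who like to experiment, these constructions can be tried out on small finite sets. What follows is only a sketch in Python: the particular sets, the property P ("b is odd"), and the helper name pair are assumptions made for illustration, and pair uses the standard Kuratowski encoding {{x}, {x, y}} of an ordered pair.

```python
# Finite sets as Python frozensets; all names here are illustrative only.
A = frozenset({1, 2, 3, 4})
B = frozenset({3, 4, 5})

union = A | B            # all elements of A and all elements of B
intersection = A & B     # elements of A that are also elements of B
difference = A - B       # elements of A that are not elements of B

# Selecting a subset by a property P, as in {b in B : P(b)}.
def P(b):
    return b % 2 == 1    # hypothetical property: "b is odd"

selected = frozenset(b for b in B if P(b))

def pair(x, y):
    """Kuratowski encoding of the ordered pair (x, y) as {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})

print(sorted(union), sorted(intersection), sorted(difference), sorted(selected))
print(pair(1, 2) == pair(1, 2), pair(1, 2) == pair(2, 1))
```

The last line compares pair(1, 2) with itself and with pair(2, 1); the encoding is doing its job precisely when the order of the entries matters.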

Continuing, the cartesian product of two sets $A$ and $B$ is defined to be

$$A \times B := \{ (a, b) : a \in A \text{ and } b \in B \}.$$

Ordered triples, quadruples, etc., and cartesian products of three sets, four sets, etc., can be defined in many analogous ways, at least some of which should be obvious, and which are much too tedious to describe here.
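As a small illustration of the cartesian product, here is a sketch for finite sets. The sets A and B are made-up examples, and Python's built-in tuples stand in for ordered pairs and triples rather than any particular set-theoretic encoding.

```python
from itertools import product

A = {"a1", "a2"}
B = {0, 1, 2}

# A x B as the set of all pairs (a, b) with a in A and b in B.
AxB = {(a, b) for a in A for b in B}

# itertools.product enumerates the same pairs and extends to several factors.
same_pairs = AxB == set(product(A, B))

# One of the "analogous" constructions: triples from a threefold product.
AxBxA = set(product(A, B, A))

print(same_pairs, len(AxB), len(AxBxA))
```

The number of pairs is the product of the sizes of the two sets, which is presumably where the name comes from.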




A binary relation is defined to be an ordered triple $r = (A, B, R)$ in which $A$ and $B$ are sets and $R \subset A \times B$. For example, the relation "is taller than" could be the triple $(H, H, S)$ in which $H$ is the set of people and $S$ is the set of pairs $(p, q)$ in which $p$ and $q$ are people and $p$ is taller than $q$. As with our formal definition of ordered pairs, this definition of a relation is useful because it is precise and fully general, but in almost all cases we will think of a relation, say "less than," as a symbol like




restrictions are often so simple and straightforward that they are regarded as too obvious to mention. But in order to develop a secure grasp of the foundations you have to pay careful attention to the nuts and bolts, so we will lean in the direction of discussing restrictions explicitly.

The importance of functions had been recognized long before set theory was developed, but there was no single definition, and there was a tendency to think of a function as synonymous with the formula that defined it. This has all the usual disadvantages of lack of standardization, and in addition it created a tendency to overlook the fact that a single formula can define more than one function. For example, the formula $f(x) = x^2$ defines a function from the integers to the integers, another function from the rational numbers to the rational numbers, and a third function from the real numbers to the real numbers. Set theory gave mathematicians a language that allowed them to settle on one general definition of the term "function," with obvious and enormous benefits for mathematical communication.
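The point that one formula can define several functions can be made concrete. The sketch below assumes that a function is recorded as a triple (domain, codomain, graph), by analogy with the triple used above for relations; the helper make_function and the small finite stand-ins for the integers and the rationals are hypothetical choices made only so that the example can be run.

```python
from fractions import Fraction

def make_function(domain, codomain, rule):
    """Return the triple (domain, codomain, graph) determined by a rule."""
    graph = {(x, rule(x)) for x in domain}
    # Every value must land in the stated codomain.
    assert all(y in codomain for _, y in graph)
    return (frozenset(domain), frozenset(codomain), frozenset(graph))

def square(x):
    return x * x

# Small finite stand-ins for "the integers" and "the rationals".
ints = range(-3, 4)
rats = [Fraction(n, 2) for n in range(-3, 4)]

f_int = make_function(ints, range(0, 10), square)
f_rat = make_function(rats, {square(q) for q in rats}, square)

# Same formula, but the two triples are different mathematical objects.
print(f_int == f_rat)
```

Both triples are built from the same rule, and they differ because their domains differ.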

By the way, the function concept is another one of those profound ideas that seems to embody some very deep wisdom about how mathematical information should be organized, even though it has a very simple definition. It emerged gradually out of the experience of mathematicians, and it's not so easy to say why it works so well. (It's easy to see that in some sense sets are like nouns and functions are like verbs, but do we really understand why human languages have this organization?) In somewhat the same way, academics have found that if they save copies of journal articles in file folders, as most do, the only method that allows you to find what you are looking for is to label the folders with the names of the authors. (Filing by topic works very poorly.) There is no obvious reason why no other method is effective, but it is perhaps not at all coincidental that there is a well defined (and easily computed!) function from the set of journal articles to the set of authors that maps each article to its first author.

It is a digression, and a more advanced concept than one would typically expect at this level, but I would like to explain another organizational principle that mathematicians have noticed, and found very useful, during the last sixty or so years. A category $\mathcal{C}$ consists of:

(a) a class $\mathrm{Ob}(\mathcal{C})$ of things called objects (the concept of a class is a variant of the set concept that will be explained below);

(b) for each pair of objects $A, B \in \mathrm{Ob}(\mathcal{C})$, a set $\mathcal{C}(A, B)$ of morphisms from $A$ to $B$;

(c) for each triple of objects $A, B, C \in \mathrm{Ob}(\mathcal{C})$, a function from $\mathcal{C}(A, B) \times \mathcal{C}(B, C)$ to $\mathcal{C}(A, C)$ called composition. The image of $(f, g)$ under this mapping is denoted by $g \circ f$.

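Usually, but not always, the objects are sets, the morphisms are functions, and composition is composition of functions. This structure is required to have the following properties:

(i) Composition is associative: if $A, B, C, D \in \mathrm{Ob}(\mathcal{C})$, $f \in \mathcal{C}(A, B)$, $g \in \mathcal{C}(B, C)$, and $h \in \mathcal{C}(C, D)$, then

$$h \circ (g \circ f) = (h \circ g) \circ f.$$

(ii) For each object $A \in \mathrm{Ob}(\mathcal{C})$ there is an identity $\mathrm{Id}_A \in \mathcal{C}(A, A)$ such that $\mathrm{Id}_B \circ f = f$ and $f \circ \mathrm{Id}_A = f$ whenever $A, B \in \mathrm{Ob}(\mathcal{C})$ and $f \in \mathcal{C}(A, B)$.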
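To see properties (i) and (ii) at work, one can check them by brute force on a tiny fragment of the category of sets and functions. This is only a sketch: the three objects, the representation of a function as a tuple of (input, output) pairs, and the helper names all_functions, compose, and identity are assumptions made for the example.

```python
from itertools import product

def all_functions(A, B):
    """All functions from A to B, each stored as a tuple of (input, output) pairs."""
    A = sorted(A)
    return [tuple(zip(A, values)) for values in product(sorted(B), repeat=len(A))]

def compose(g, f):
    """g o f: apply f first, then g."""
    g_map = dict(g)
    return tuple((a, g_map[b]) for a, b in f)

def identity(A):
    return tuple((a, a) for a in sorted(A))

objects = [frozenset({0}), frozenset({0, 1}), frozenset({0, 1, 2})]
hom = {(A, B): all_functions(A, B) for A in objects for B in objects}

# (i) associativity: h o (g o f) == (h o g) o f
assoc = all(
    compose(h, compose(g, f)) == compose(compose(h, g), f)
    for A, B, C, D in product(objects, repeat=4)
    for f in hom[A, B] for g in hom[B, C] for h in hom[C, D]
)

# (ii) identities: Id_B o f == f and f o Id_A == f
ident = all(
    compose(identity(B), f) == f and compose(f, identity(A)) == f
    for A, B in product(objects, repeat=2)
    for f in hom[A, B]
)
print(assoc, ident)
```

Of course a finite check proves nothing about the category of all sets, but it does show how little the two properties actually demand.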

For an initial example, we can point out that sets and functions constitute a category. Groups and homomorphisms give a slightly more interesting example. Suppose that $G$, $G'$, and $G''$ are groups and $\varphi : G \to G'$ and $\psi : G' \to G''$ are homomorphisms. Then

$$\psi(\varphi(gh)) = \psi(\varphi(g)\varphi(h)) = \psi(\varphi(g))\,\psi(\varphi(h))$$

for all $g, h \in G$, so $\psi \circ \varphi : G \to G''$ is a homomorphism. Clearly $\mathrm{Id}_G$ is a homomorphism, and properties (i) and (ii) are satisfied by homomorphisms because they are satisfied by functions.
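The computation showing that a composition of homomorphisms is a homomorphism can be checked numerically in a simple case. In the sketch below the groups are the integers modulo 12, 6, and 3 under addition (so the group operation is written additively rather than multiplicatively), and the maps phi and psi, as well as the helper is_homomorphism, are hypothetical examples chosen for illustration.

```python
def is_homomorphism(phi, n, m):
    """Check phi(g + h) == phi(g) + phi(h) for a map from Z_n to Z_m."""
    return all(
        phi((g + h) % n) == (phi(g) + phi(h)) % m
        for g in range(n) for h in range(n)
    )

def phi(g):      # Z_12 -> Z_6, reduction mod 6
    return g % 6

def psi(g):      # Z_6 -> Z_3, reduction mod 3
    return g % 3

def psi_after_phi(g):
    """The composition psi o phi : Z_12 -> Z_3."""
    return psi(phi(g))

print(is_homomorphism(phi, 12, 6),
      is_homomorphism(psi, 6, 3),
      is_homomorphism(psi_after_phi, 12, 3))
```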

You have probably noticed that the definition of a category is extremely long-winded, and at the same time everything it says is (in the applications we have seen, and many others) quite trivial. In fact one very useful aspect of this concept is that saying, for example, that "groups and homomorphisms constitute a category" compresses a lot of easily understood, easily verified information into a neat little package. Of course this alone wouldn't make the concept useful if the situation didn't arise frequently, but it does. Actually, categories are so ubiquitous in mathematics that there is not much interesting mathematics associated with the concept itself, at least until one studies very advanced topics, in more or less the same way that there is not much of interest to say about functions in general, even though there are many interesting types of functions. For most of our applications of the concept there is no useful theory of categories, and in this sense we don't really need the concept. But the experience of mathematicians has been that it is good to organize mathematics as the study of this or that category, and much of the material in this book conforms to this principle, so I think it will be illuminating to keep the concept in mind.

The language of categories can already be used to give a very general explanation of one reason groups are so important. Some of the terminology we introduced earlier in connection with groups is actually applicable to any category $\mathcal{C}$. A morphism $f \in \mathcal{C}(X, Y)$ is an isomorphism if it has an inverse, which is a morphism $g \in \mathcal{C}(Y, X)$ such that $g \circ f = \mathrm{Id}_X$ and $f \circ g = \mathrm{Id}_Y$. An endomorphism of $X \in \mathrm{Ob}(\mathcal{C})$ is a morphism $f \in \mathcal{C}(X, X)$ whose domain and range are both $X$. An automorphism of $X$ is an endomorphism of $X$ that is also an isomorphism.

Theorem 1.7. For any category $\mathcal{C}$ and any $X \in \mathrm{Ob}(\mathcal{C})$, the set of automorphisms of $X$ is a group.

Proof. We must first of all show that a composition of automorphisms is an automorphism, so suppose that $f$ and $f'$ are automorphisms with inverses $g$ and $g'$. Since composition is associative we have

$$(f \circ f') \circ (g' \circ g) = f \circ (f' \circ g') \circ g = f \circ \mathrm{Id}_X \circ g = f \circ g = \mathrm{Id}_X$$

and

$$(g' \circ g) \circ (f \circ f') = g' \circ (g \circ f) \circ f' = g' \circ \mathrm{Id}_X \circ f' = g' \circ f' = \mathrm{Id}_X,$$

so $f \circ f'$ is indeed an automorphism because $g' \circ g$ is its inverse. It is now easy to see that (a)-(c) of Definition 1.1 are satisfied: composition of automorphisms is associative because composition of morphisms is associative, $\mathrm{Id}_X$ is an identity element, and the category theoretic inverse of an automorphism is an automorphism, and an inverse in the group theoretic sense.
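The theorem can be watched in action in the simplest possible setting. In the category of sets, the automorphisms of a finite set X are just its permutations, and the sketch below checks the group properties directly for a three-element set. The representation of a bijection as a tuple p (meaning that x is sent to p[x]) and the helper names are assumptions of this example, not part of the theorem.

```python
from itertools import permutations

X = (0, 1, 2)

# In the category of sets, the automorphisms of X are the bijections X -> X.
# Each one is stored as a tuple p, meaning x maps to p[x].
automorphisms = set(permutations(X))

def compose(g, f):
    """g o f: apply f first, then g."""
    return tuple(g[f[x]] for x in X)

identity = X

def inverse(f):
    inv = [None] * len(X)
    for x in X:
        inv[f[x]] = x
    return tuple(inv)

closed = all(compose(g, f) in automorphisms
             for f in automorphisms for g in automorphisms)
has_identity = all(compose(identity, f) == f == compose(f, identity)
                   for f in automorphisms)
has_inverses = all(compose(inverse(f), f) == identity == compose(f, inverse(f))
                   for f in automorphisms)
# The step used in the proof: the inverse of f o f' is g' o g.
inverse_rule = all(inverse(compose(f, f2)) == compose(inverse(f2), inverse(f))
                   for f in automorphisms for f2 in automorphisms)
print(len(automorphisms), closed, has_identity, has_inverses, inverse_rule)
```

Notice that the inverse of a composition reverses the order of the factors, exactly as in the two displayed calculations of the proof.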

In addition to providing a sort of universal language and toolbox for mathematics, set theory made some very important substantive contributions to mathematical understanding. The theory of cardinality for infinite sets is particularly important and useful. For finite sets it clearly makes sense to say that two sets $A$ and $B$ have the same cardinality if there is a bijection $b : A \to B$. Georg Cantor (1845-1918) took the step of applying this notion to infinite sets. In particular, we say that a set $A$ is countable if there is a bijection $b : \mathbb{N} \to A$ where $\mathbb{N} := \{1, 2, 3, \ldots\}$ is the set of natural numbers. If $A'$ is an infinite subset of $A$, then $b^{-1}(A') = \{n_1, n_2, \ldots\}$, with $n_1 < n_2 < \cdots$, is an infinite subset of $\mathbb{N}$, and we can define a bijection $b' : \mathbb{N} \to A'$ by setting $b'(i) := b(n_i)$. Thus any infinite subset of a countable set is countable, so countability is the smallest infinite cardinality.
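The construction of the new bijection from the old one is easy to imitate on a computer, at least for as many terms as one has patience for. In this sketch the countable set A is taken to be the positive even numbers with b(n) = 2n, and the infinite subset A' is taken to be the multiples of 6; both choices, and the helper names, are assumptions made purely for illustration.

```python
from itertools import count, islice

def b(n):
    """A bijection from N = {1, 2, 3, ...} onto A, the positive even numbers."""
    return 2 * n

def in_A_prime(a):
    """Membership test for A', the multiples of 6 (an infinite subset of A)."""
    return a % 6 == 0

def indices():
    """Yield n_1 < n_2 < ..., the elements of the preimage of A' under b."""
    return (n for n in count(1) if in_A_prime(b(n)))

def b_prime(i):
    """b'(i) := b(n_i), for i = 1, 2, 3, ..."""
    n_i = next(islice(indices(), i - 1, None))
    return b(n_i)

print([b_prime(i) for i in range(1, 6)])
```

Each element of A' is hit exactly once because the indices are listed in increasing order without repetition.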




The Axiom of Choice: If $r = (A, B, R)$ is a binary relation such that for each $a \in A$ there is at least one $b \in B$ such that $(a, b) \in R$, then there is a function $f : A \to B$ with $(a, f(a)) \in R$ for all $a \in A$.
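It may help to see what the axiom asserts in a case where no axiom is needed. When the sets are finite, or more generally whenever there is a definite rule for picking an element, the function can simply be written down. The sketch below uses made-up sets A and B, a made-up relation R, and the rule "take the smallest admissible b"; the name choice_function is hypothetical.

```python
A = {"x", "y", "z"}
B = {1, 2, 3, 4}
# A relation R, a subset of A x B, with at least one partner for every a in A.
R = {("x", 2), ("x", 3), ("y", 1), ("z", 4), ("z", 1)}

def choice_function(A, R):
    """For each a in A, pick the smallest b with (a, b) in R."""
    return {a: min(b for (a2, b) in R if a2 == a) for a in A}

f = choice_function(A, R)
print(all((a, f[a]) in R for a in A))
```

The axiom matters precisely when no rule like "take the smallest" is available.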

In the discussion above it was assumed that for each $i$ there is a nonempty set of functions like $c_i$. But the argument actually assumes that there is a function $i \mapsto c_i$ that simultaneously specifies such a bijection for every $i$. In the preceding proof we could explicitly define a function $D : \mathbb{N} \to \{4, 5, 6\}$ such that $d_n = D(n)$ is different from the $n$th digit of $s_n$ for all $n \in \mathbb{N}$ by, for instance, letting $D(n)$ be the smallest element of $\{4, 5, 6\}$ different from the $n$th digit of $s_n$. The set of all bijections between $\mathbb{N}$ and $S_i$ isn't endowed with a structure that allows us to construct the desired function $i \mapsto c_i$ by specifying a canonical choice of $c_i$, and (although it is far from obvious at this point) there is simply no way to get the desired function without invoking something like the axiom of choice.
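The explicit rule for D is simple enough to run. In the sketch below the listing of sequences is a made-up one in which the k-th digit of the n-th sequence is (n + k) mod 10; the function name digit and this particular listing are assumptions for illustration only, and any other listing could be substituted.

```python
def D(n, digit):
    """Smallest element of {4, 5, 6} different from the n-th digit of s_n.

    digit(n, k) is assumed to return the k-th decimal digit of s_n.
    """
    return min(d for d in (4, 5, 6) if d != digit(n, n))

# Hypothetical listing s_1, s_2, ...: the k-th digit of s_n is (n + k) mod 10.
def digit(n, k):
    return (n + k) % 10

diagonal = [D(n, digit) for n in range(1, 11)]
print(diagonal)
```

No choice is being made here in the sense of the axiom: the word "smallest" singles out one digit for each n, so D is defined by a rule.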

More generally, the axiom of choice is not a consequence of other standard assumptions of set theory, and it was a source of considerable controversy for many years. Nonconstructive reasoning of the sort employed by Cantor was sharply criticized by Leopold Kronecker (1823-1891), which resulted in Cantor being embattled for much of his career. Ernst Zermelo (1871-1953) gave a precise formulation of the axiom of choice in 1904, and over the next few decades it became clear that attempting to live without it would result in severe constraints on the sorts of mathematics that could be done. Nowadays the type of mathematics advocated by Kronecker, which is called constructivism, is a minor specialization that is of some interest more broadly because constructivist mathematics is, to some extent, a useful model of what computers can do. Although some logicians study axioms that might be thought of as possible replacements for the axiom of choice, all other mathematicians utilize it freely.

Possibly you're wondering whether there is any cardinality between countability and the cardinality of the real numbers, which is sometimes called the cardinality of the continuum. That's a damn good question. In 1900 David Hilbert (1862-1943) gave a lecture in which he laid out a list of unsolved problems that he thought were very important, and which he hoped might prove useful as targets to guide the development of mathematics during the coming century. Hilbert was then already the leading mathematician in the world, and he would go on to make many other important contributions, but nothing else he did is as famous as the "Hilbert Problems." The continuum hypothesis (the conjecture that there is no cardinality between countability and the cardinality of the continuum) was the first problem on his list! As we'll explain in a little bit, the question was resolved a few decades later. Would you care to guess what the answer is?

So, as of 1900 mathematicians could see that sets could be used to represent just about any mathematical object. The next step in fulfilling Leibniz' vision was to develop a symbolic calculus that gave a precise formal language for defining sets, creating new sets from given sets, and more generally representing any valid mathematical argument as a sequence of allowed inferences within an exact system of symbolic logic. This project was attempted by Bertrand Russell (1872-1970) and Alfred North Whitehead (1861-1947), but Russell found an unexpected and extremely painful problem. Let $S$ be the set of sets that are not elements of themselves. Is $S$ an element of itself? Working through the two cases, we find that if it is, then it isn't, and if it isn't, then it is. Ouch!

Russell and Whitehead managed to salvage their project by developing

something called the theory of types, which gave a very finely described hierarchy of sets, carefully designed to prohibit the sorts of constructions that led to the paradox. Around the same time, Zermelo and Abraham Fraenkel (1891-1965) gave a different system of axioms describing allowed constructions of new sets from given sets that they hoped would provide a satisfactory foundation. These works, and the huge amount of research descended from them, are very complicated, and even professional mathematicians don't need to know that much about them, nor do many of them have the time to study very deeply in this area. Mostly they take what is generally called a "naive" approach to the subject, using the simplest constructions freely and knowing a few additional things like the theory of infinite cardinals that appear frequently in other areas of research. One idea that is useful is the notion of a class. I must confess that I really have no precise knowledge concerning how classes are described formally. The general intuition is that Russell's paradox arises because we falsely assume that the operations that are allowed for sets are also allowed for more diffuse collections such as the collection of all sets. By describing such collections as classes, while sharply circumscribing the operations that classes allow, we are able to talk meaningfully about "the class of all sets" or "the class of all groups," as we did in our discussion of categories, even though there is no "set of all sets" or "set of all groups."

Leibniz was hoping not only for a language that could represent all mathematical objects (and all concepts of science, apparently) but also for computational procedures, analogous to the algorithms for addition and multiplication, that would allow any well posed problem to be answered in a mechanical and uncontroversial fashion. During the last one hundred years, along with the fantastic growth in computational technology, researchers have developed a detailed and precise understanding of what turn out to be rather severe limits to what computation can possibly accomplish. Barriers occur at several levels. For certain types of problems computational solution is possible in principle, but the fastest possible algorithms would consume a vast amount of computer time if applied to any problem instance outside of a few "toy" examples. In certain areas of mathematics there are types of problems for which there can never be a general algorithm, even though any instance of the problem has a definite answer.

Finally, there are questions that simply have no answers. In 1931 Kurt Gödel (1906-1978) showed that any sufficiently rich consistent system of symbolic logic must include propositions that are undecidable, which means that neither the proposition nor its negation can be proved using the logic's formal rules of deduction. (Since we can always expand our axiom system by appending an undecidable proposition, or its negation, his argument actually shows that there are infinitely many undecidable propositions.) In 1940 he showed that the negation of the continuum hypothesis cannot be derived from the Zermelo-Fraenkel axiom system. In 1963 Paul Cohen (1934-2007) showed that the continuum hypothesis cannot be derived from the Zermelo-Fraenkel axioms, so it is undecidable.

As was the case with the axiom of choice, after this the continuum hypothesis can only be judged in terms of whether its consequences are more in accord with our intuitions, or more useful in applications, than the consequences of its negation. Many mathematicians talk about this as a matter of determining whether the continuum hypothesis is "true" or not, but to me it seems that such ways of speaking further compound the problem that the word "true" is already overburdened with multiple meanings. If, on some basis, we decided that the continuum hypothesis was true, and then some mathematician showed that its negation implied wondrously beautiful theorems, would we really want to say that that person was doing "false" mathematics?

In any event, although these ideas are quite important in the history of mathematics, for the more mundane work of the rest of the book they are a distant background where the horizon meets the sky. The most important point for us is that even if we lack a completely precise formal apparatus of logical deduction, the language of set theory will allow us to proceed in a manner that is, in every practical sense, exact and rigorous.



Chapter 2

The Real Numbers

One of the things that makes math hard is that the practical side of the subject, which is what most people spend most of their time studying, is bound up with the real numbers. Because the set of real numbers is so familiar, it's easy to lose sight of the fact that it is actually an extremely complex mathematical structure. When one starts to approach the subject from the point of view of proofs it is important to develop clear understandings, or conventions, concerning the properties of the reals that are taken as given. This chapter lays out an axiom system for the real numbers. The axioms are numerous, but, with one possible exception, each of them expresses a property of the real numbers that has been familiar since you first learned about fractions and decimals and negative numbers and such, back in elementary school.

We'll also look at a number of structures that share some of the properties of the real numbers. Strictly speaking, this material is really not part of the standard curriculum in courses on calculus and linear algebra, and it is included mainly in the hope that you'll find it interesting. But the ideas we'll talk about are starting points of a great deal of mathematics that is central to the discipline, not to mention rich and deeply beautiful. And these seemingly unnecessary concepts and terminology will actually come up frequently throughout the rest of the book.

2.1 Fields

To start off with, here's a big, Big, BIG definition.


Definition 2.1. A field is a triple $(F, +, \cdot)$ in which $F$ is a set and

$$+ : F \times F \to F \quad \text{and} \quad \cdot : F \times F \to F$$

are binary operations (written using the usual conventions of addition and multiplication, i.e., the $\cdot$ is usually omitted) with the following properties:

(F1) $x + (y + z) = (x + y) + z$ for all $x, y, z \in F$.

(F2) There is $0 \in F$ such that $x + 0 = x$ for all $x \in F$.

(F3) For each $x \in F$ there is $-x \in F$ such that $x + (-x) = 0$.

(F4) $x + y = y + x$ for all $x, y \in F$.

(F5) $x(yz) = (xy)z$ for all $x, y, z \in F$.

(F6) There is $1 \in F \setminus \{0\}$ such that $x$
