Abstract Set Theory [Skolem]

NOTRE DAME MATHEMATICAL LECTURES

Number 8

ABSTRACT SET THEORY

by

THORALF A. SKOLEM

Professor of Mathematics

University of Oslo, Norway

NOTRE DAME, INDIANA

1962

Copyright 1962 UNIVERSITY OF NOTRE DAME

COMPOSITION BY BELJAN, ANN ARBOR, MICHIGAN

PHOTOLITHOPRINTED BY CUSHING - MALLOY, INC. ANN ARBOR, MICHIGAN, UNITED STATES OF AMERICA

1962

PREFACE

The following pages contain a series of lectures on abstract set theory given at the University of Notre Dame during the Fall Semester 1957-58. After some historical remarks the chief ideas of Cantor's theory, now usually called the naive set theory, are explained. Then the axiomatic theory of Zermelo-Fraenkel is developed and some critical remarks added. In particular the set-theoretic relativism is emphasized as a natural consequence of the application of Löwenheim's Theorem to the axioms of set theory. Other versions of axiomatic set theory which logically are of very similar character are not dealt with. However, the simple theory of types, Quine's theory and the ramified theory of types are treated to a certain extent. Also Lorenzen's operative mathematics and the intuitionist mathematics are outlined. Further, there is a short remark on the possibility of finitist mathematics in a strict sense and finally some hints are given about the possibility of a set theory based on a logic with an infinite number of truth values.

The book "Transfinite Zahlen" by H. Bachmann has been very useful in particular for the writing of parts 6 and 8.

Some references to the literature on these subjects occur scattered in the text, but no attempt has been made to set up a complete list. Such a task seems indeed scarcely worth while, because very extensive and complete lists can be found both in the mentioned book of Bachmann and in the book "Abstract Set Theory" by A. Fraenkel.

Th. Skolem.

CONTENTS

1. Historical remarks. Outlines of Cantor's theory

2. Ordered sets. A theorem of Hausdorff

3. Axiomatic set theory. Axioms of Zermelo and Fraenkel

4. The well-ordering theorem

5. Ordinals and alephs

6. Some remarks on functions of ordinal numbers

7. On the exponentiation of alephs

8. Sets representing ordinals

9. The notions "finite" and "infinite"

10. The simple infinite sequence. Development of arithmetic

11. Some remarks on the nature of the set-theoretic axioms. The set-theoretic relativism

12. The simple theory of types

13. The theory of Quine

14. The ramified theory of types. Predicative set theory

15. Lorenzen's operative mathematics

16. Some remarks on intuitionist mathematics

17. Mathematics without quantifiers

18. The possibility of set theory based on many-valued logic

ABSTRACT SET THEORY

by Thoralf A. Skolem

1. Historical remarks. Outlines of Cantor's theory

Almost 100 years ago the German mathematician Georg Cantor was studying the representation of functions of a real variable by trigonometric series. This problem interested many mathematicians at that time. Trying to extend the uniqueness of representation to functions with infinitely many singular points he was led to the notion of a derived set. This was not only the beginning of his study of point sets but led him later to the creation of transfinite ordinal numbers. This again led him to develop his general set theory. The further development of this, the different variations or modifications of it that have been proposed in more recent years, the discussions and criticisms with regard to this subject, will constitute the contents of my lectures on set theory.

One ought to notice that there have been some anticipations of Cantor's theory. For example B. Bolzano wrote a paper with the title: Paradoxien des Unendlichen (1851) (Paradoxes of the Infinite), where he mentioned some of the astonishing properties of infinite sets. Already Galilei had noticed the remarkable fact that a part of an infinite set in a certain sense contained as many elements as the whole set. On the other hand it ought to be remarked that about the same time that Cantor exposed his ideas some other people were busy in developing what we today call mathematical logic. These investigations concerned among other things the fundamental notions and theorems of mathematics, so that they should naturally contain set theory as well as other more elementary or ordinary parts of mathematics. A part of the work of another German mathematician, R. Dedekind, was also devoted to studies of a similar kind. In particular, his book "Was sind und was sollen die Zahlen" belongs hereto.

In my following first talks I will however confine my subject to just an exposition of the most characteristic ideas in Cantor's work, mostly done in the years 1874-97.

The real reason for a mathematician to develop a general set theory was of course the fact that in mathematics we often have to do not only with single mathematical objects but also with collections of them. Therefore the study of properties of such collections, even infinite ones, must be of very great importance.

There is one fact to which I would like to call attention. Most of mathematics and perhaps above all the classical set theory has been developed in accordance with the philosophical attitude called Platonism. This standpoint means that we consider the mathematical objects as existing before and independent of our actual thinking. Perhaps an illustrating way of expressing it is to say that when we are thinking about mathematical objects we are looking at eternal preexisting objects. It seems clear that the word "existence" according to Platonism must have an absolute meaning so that everything we talk about shall either exist or not in a definite way. This is the philosophical background for classical mathematics generally and perhaps in particular for classical set theory. Being aware of this, Cantor explicitly cites Plato. Everybody is used to saying that a mathematical fact has been discovered, not that it has been invented. That shows our natural tendency towards Platonism. Whether this philosophical attitude is justified or not, however, I will not discuss now. It will be better to postpone that to a later moment.

When Cantor developed his theory of sets he liked of course to conceive the notion "set" as generally as possible. He therefore desired to give a kind of definition of this notion in accordance with this most general conception. A definition in the proper sense this could not be, because a definition in the proper sense means an explanation of a notion by means of more primitive or previously defined notions. However, it is evident that the notion "set" is too fundamental for such an explanation. Cantor says that a set is a collection of arbitrary well-defined and well-distinguished objects. What is achieved, perhaps, by this explanation is the emphasizing that there shall be no restriction whatever with regard to the nature of the considered objects or to the way these objects are collected into a whole. Taking the Platonist standpoint, it is clear that this whole, the collection, must itself again be considered as one of the objects the set theory talks about and therefore can be taken as an object in other collections. This is indeed clear, because there are no restrictions as to the nature of the objects.

Now we are very well acquainted with sets in daily life. These sets are finite, but I shall not now enter into the distinction between finite and infinite sets. The most important mathematical property of the finite sets is the number of their elements. By the way I write

m ∈ M,

expressing that m is an element of or belongs to M. Indeed this notation is used everywhere in the literature. If we shall compare two finite sets M and N with regard to number, we may do that in the way of pairing off the elements by distributing as far as possible the elements of M and N into disjoint pairs. Let us for simplicity assume M and N disjoint, that is, without common elements. If it is possible to distribute the elements of M and N into disjoint pairs (m,n), m ∈ M, n ∈ N, such that all m ∈ M and all n ∈ N occur in these pairs, then it is evident that there are just as many elements m in M as elements n in N. If, however, we may build a set of pairs (m,n) such that all m occur, but not all n, then in the case of finite sets M possesses fewer elements than N. It is clear that it must be possible to compare sets by considering such sets of disjoint pairs in the case of infinite sets as well. This leads to one of the most important notions not only in the classical set theory but also in ordinary mathematics, namely, the notion of one-to-one correspondence or mapping. We say that f is a one-to-one correspondence between the sets M and N, if f is a set of mutually disjoint pairs (m,n) such that each m ∈ M and each n ∈ N occur in one of the pairs. In order to be able to take into account the case that M and N have some common elements, it is necessary to replace the simple notion pair {a,b}, which means the set containing a and b as elements, with the notion ordered pair (a,b), which can be conceived as {{a,b}, {a}}. However I will here, to begin with, use the notion ordered pair, triple etc. as known ideas without worrying about an analysis of them.
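The two notions just introduced can be tried out on small finite sets. The following is only a sketch under stated assumptions: the names `ordered_pair` and `is_one_to_one_correspondence` are hypothetical, Python's `frozenset` stands in for a set, and a correspondence f is represented as a Python set of tuples.

```python
def ordered_pair(a, b):
    """Encode the ordered pair (a, b) as the set {{a, b}, {a}}."""
    return frozenset({frozenset({a, b}), frozenset({a})})


def is_one_to_one_correspondence(f, M, N):
    """Check that each m in M and each n in N occurs in exactly one pair of f.

    f is a set of tuples (m, n); M and N are finite sets.
    """
    firsts = {m for m, _ in f}
    seconds = {n for _, n in f}
    # Equal sizes plus equal element sets force "exactly once" on both sides.
    return len(f) == len(M) == len(N) and firsts == set(M) and seconds == set(N)


# The encoding distinguishes (1, 2) from (2, 1), unlike the simple pair {1, 2}.
print(ordered_pair(1, 2) != ordered_pair(2, 1))
print(is_one_to_one_correspondence({(1, "a"), (2, "b")}, {1, 2}, {"a", "b"}))
```

Note that the encoding of (a, a) collapses to {{a}}, which is still distinct from every pair (a, b) with b different from a.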


Possessing the notion one-to-one correspondence or mapping, we may obtain this generalisation of the number concept:

M and N have the same cardinal number, if a mapping f exists of M on N. This circumstance is written M ~ N.

Cantor says that the cardinal number $\overline{\overline{M}}$ of M is what remains, if we make an abstraction with regard to the individual characters of its elements. This definition is made much clearer by Russell, who says that $\overline{\overline{M}}$ is the set of all sets N being ~ M. Further, this definition of the relation ≤ between cardinals was natural: $\overline{\overline{M}} \le \overline{\overline{N}}$ if M is ~ a subset of N. Further $\overline{\overline{M}} < \overline{\overline{N}}$ if M is ~ a subset of N, but N is not ~ M.

Let us again introduce some notations. I shall write A ⊆ B when the set A is contained in B, and A ⊂ B, if A is contained in B, but not inversely B in A. Then we know that for the finite sets as we encounter them in everyday life, there is never a mapping of the set on a proper part of itself. Thus, if M is finite,

N ⊂ M → $\overline{\overline{N}} < \overline{\overline{M}}$.

Dedekind uses this as a definition of finite sets: A set M is finite, if it is not ~ any proper part N of itself.

On the other hand, if we look at the simplest infinite set we know, namely the number series 0,1,2,..., then it is easily seen that this set admits a mapping on a proper part of itself, for example, the set of positive integers 1,2,.... It is said that already Galilei wondered about this, and found it an astonishing property of an infinite set, that a proper part of it could in a certain sense be said to possess just as many elements as the whole set.

Some further notations may suitably be mentioned now. We write

M ∪ N resp. M ∩ N

as the notation for the union of M and N, resp. the intersection of M and N. Thus M ∪ N contains as elements all the elements of M and N and only these, while M ∩ N contains as elements just the common elements of M and N. If M ∩ N is empty, i.e., M and N are disjoint, I shall often write M + N instead of M ∪ N. Both operations can be generalized very far. Let T be a set whose elements are again sets A,B,C,.... Then I will write ST and DT as notations for the union of all sets A,B,C,..., resp. the intersection of all A,B,C,...

In natural analogy to the arithmetic of finite sets, addition of cardinals is defined thus:

$\overline{\overline{M}} + \overline{\overline{N}} = \overline{\overline{M + N}}$, if M and N are disjoint, and generally, if A,B,C,..., constituting all elements of T, are disjoint in pairs, then $\overline{\overline{ST}}$ is said to be the sum of the cardinals of all the elements A,B,C,... of T.

These definitions are justified by the simple theorem:

If A ~ A', B ~ B', C ~ C', ..., any two of A,B,C,... as well as any two of A',B',C',... being disjoint, then ST ~ ST', T' denoting the set of all A',B',C',....

The proof of this theorem is of course quite trivial, but as we shall see later, the so-called axiom of choice must be applied.


Multiplication of cardinals is defined in the simplest way by taking again disjoint sets. If M and N are two such sets, we may build the set of all pairs {m,n}, where m and n run independently through all elements of M and N. This set will be written M • N. It is again easy to see that if M ~ M', N ~ N', where M' and N' are again disjoint, then the set M' • N' of all pairs {m', n'} will be ~ M • N. Therefore we may define an operation on the cardinal numbers called multiplication by putting

$\overline{\overline{M}} \cdot \overline{\overline{N}} = \overline{\overline{M \cdot N}}$.

This can then in an obvious way be generalized to the general case, where T is a set of mutually disjoint sets A,B,C,.... Letting PT denote the set of all sets which consist of just one element from each of A,B,C,..., we say that $\overline{\overline{PT}}$ is the product of the cardinal numbers $\overline{\overline{A}}, \overline{\overline{B}}, \overline{\overline{C}}, \dots$ Using ordered pairs we may define the Cartesian product M × N of M and N. This is the set of all ordered pairs (m,n) such that m ∈ M, n ∈ N. Of course $\overline{\overline{M \times N}} = \overline{\overline{M \cdot N}}$.

A natural assumption after the discovery that the natural number series is ~ proper parts of itself was that many sets of mathematical objects ought to possess the same cardinal number as the number series, even if they contained the latter as a proper subset. This assumption Cantor proved to be correct. Quite trivial is the remark that the series of integers > a certain negative integer is of the same cardinality as the series of non-negative integers. A little more remarkable is the fact that this is true of the set of all rational integers, negative, positive or zero. The last fact is verified by writing the integers for example in this order:

0, -1, 1, -2, 2, -3, 3, ...

Or in other words, if we put for x ≥ 0

y = 2x

and for x < 0

y = -2x - 1,

then this function y of x furnishes a 1-1 correspondence between all integers on the one hand and the non-negative ones on the other hand.
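The correspondence between all integers and the non-negative ones may be checked numerically. This is a minimal sketch; the function names `to_natural` and `to_integer` are assumptions introduced here, the second being the inverse map, which the text does not write out.

```python
def to_natural(x: int) -> int:
    """Map an integer x to y = 2x (x >= 0) or y = -2x - 1 (x < 0)."""
    return 2 * x if x >= 0 else -2 * x - 1


def to_integer(y: int) -> int:
    """Inverse map: even y come from x = y/2, odd y from x = -(y + 1)/2."""
    return y // 2 if y % 2 == 0 else -(y + 1) // 2


# Listing the integers in the order of their images 0, 1, 2, ... reproduces
# the arrangement 0, -1, 1, -2, 2, -3, 3 given in the text.
print([to_integer(y) for y in range(7)])
```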

Let P denote the set of all pairs of non-negative integers, while N is the set of the non-negative integers themselves. Then one finds that

$$z = \binom{x+y+1}{2} + x \qquad (1)$$

yields a one-to-one correspondence between P and N. Indeed to every pair (x, y) corresponds a unique value of z and to each value of z there is only one pair of non-negative integers x, y such that the above equation is fulfilled.
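That equation (1) is one-to-one can be made plausible by writing down the inverse. The sketch below assumes the reading z = C(x+y+1, 2) + x of equation (1); the names `pair` and `unpair` are hypothetical. The inverse first recovers s = x + y as the largest s with s(s+1)/2 ≤ z.

```python
from math import comb, isqrt


def pair(x: int, y: int) -> int:
    """z = C(x + y + 1, 2) + x for non-negative integers x, y."""
    return comb(x + y + 1, 2) + x


def unpair(z: int) -> tuple[int, int]:
    """Recover the unique (x, y) with pair(x, y) == z."""
    # s = x + y is the largest s with s(s+1)/2 <= z, i.e. (2s+1)^2 <= 8z + 1.
    s = (isqrt(8 * z + 1) - 1) // 2
    x = z - s * (s + 1) // 2
    return x, s - x


print([unpair(z) for z in range(6)])
```

The pairs are thus numbered diagonal by diagonal: first those with x + y = 0, then those with x + y = 1, and so on.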

Similarly the set of all ordered n-tuples $(x_1, \dots, x_n)$, all $x_i \in N$, has the same cardinal number as N. All sets possessing this cardinal number are called denumerable.

Turning to the more often considered sets of numbers, Cantor proved that the set of all rational numbers is denumerable. We can take the rationals in the form $\frac{a}{b}$, b > 0, a and b coprime integers. Then we arrange the rationals so that |a| + b successively takes the values 1,2,3,... and those for which |a| + b has the same value we arrange according to their magnitude. Thus we obtain the sequence

$$\frac{0}{1},\ -\frac{1}{1},\ \frac{1}{1},\ -\frac{2}{1},\ -\frac{1}{2},\ \frac{1}{2},\ \frac{2}{1},\ -\frac{3}{1},\ -\frac{1}{3},\ \frac{1}{3},\ \frac{3}{1},\ -\frac{4}{1},\ -\frac{3}{2},\ -\frac{2}{3},\ -\frac{1}{4},\ \frac{1}{4},\ \frac{2}{3},\ \frac{3}{2},\ \frac{4}{1},\ \dots$$

containing all the rational numbers.

Cantor proved also that even the set of all algebraic numbers is denumerable. This can be done in the simplest way as follows. Every algebraic number is a root of an irreducible equation $a_n x^n + \dots + a_0 = 0$ for some n, the $a_0, \dots, a_n$ being integers with 1 as greatest common divisor. Now we can arrange the tuples $a_n, \dots, a_0$ in a sequence by taking the successively increasing values of

$$m = |a_n| + \dots + |a_0| + n.$$

Those with the same m we can take according to increasing values of n, and for those with the same value of m and n, which are only finite in number, we arrange the corresponding roots first according to their absolute value and finally those which have the same absolute value we arrange according to increasing amplitude.
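The arrangement of the rationals by increasing |a| + b described above can be generated mechanically. The following is a sketch only; the generator name `rationals` is an assumption, and Python's `fractions.Fraction` is used to order each finite group by magnitude.

```python
from fractions import Fraction
from itertools import islice
from math import gcd


def rationals():
    """Yield every rational a/b (b > 0, a and b coprime) exactly once,
    by increasing |a| + b and, within one value of |a| + b, by magnitude."""
    s = 1
    while True:
        group = [
            Fraction(a, s - abs(a))
            for a in range(-(s - 1), s)          # so that b = s - |a| >= 1
            if gcd(abs(a), s - abs(a)) == 1      # keep only reduced fractions
        ]
        yield from sorted(group)
        s += 1


# First terms: 0, -1, 1, -2, -1/2, 1/2, 2, ...
print([str(q) for q in islice(rationals(), 7)])
```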

One might get the impression that all infinite sets were denumerable. However, Cantor proved that the set of all real numbers, even all reals between 0 and 1, is not denumerable. His proof is performed by the diagonal method, called after him in the literature: Cantor's diagonal method.

We know that every real number ≥ 0 and < 1 can be written as a decimal fraction

$0.a_1 a_2 \dots$

and this decimal fraction is unique, if we require that there shall not occur only 9's from a certain place on. Then let us assume that

$\alpha_1 = 0.a_{11} a_{21} \dots$

$\alpha_2 = 0.a_{12} a_{22} \dots$

...

were all reals ≥ 0 and < 1. Let the real number β be $0.b_1 b_2 \dots$, where $b_r$ for each r is the next digit after $a_{rr}$ (0 when $a_{rr}$ is 9) except when all $a_{ii}$ from a certain i on are 8; then we take the $b_i$ as 7 for example. Then obviously 0 ≤ β < 1, while β is ≠ every $\alpha_i$. Thus the set of reals ≥ 0 and < 1 is not denumerable.
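The diagonal construction can be illustrated on a finite list of digit sequences. This is a sketch under simplifying assumptions: only finitely many digits are handled, so the exception concerning an endless run of 8's on the diagonal does not arise and is omitted; the name `diagonal` is hypothetical.

```python
def diagonal(expansions):
    """Given digit lists with expansions[r][r] the r-th diagonal digit,
    return digits b_r, each the next digit after the diagonal one (0 after 9)."""
    digits = []
    for r, alpha in enumerate(expansions):
        d = alpha[r]
        digits.append(0 if d == 9 else d + 1)
    return digits


listed = [
    [1, 4, 1, 5],
    [9, 9, 9, 0],
    [3, 3, 8, 3],
    [0, 0, 0, 9],
]
beta = diagonal(listed)
# beta differs from the r-th listed sequence at place r, hence from all of them.
print(beta, all(beta != row for row in listed))
```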

This means that in Cantor's theory we have to do with different infinite cardinals. It is now natural to ask, if spaces of higher dimensions would yield greater cardinals. Cantor showed that this is not the case. His result that e.g., a plane could be mapped onto a line or say onto a segment of a straight line astonished the mathematical world at that time. I shall now expose a proof of the fact that the first quadrant of a plane, say in Cartesian coordinates the set of all pairs of positive real numbers x,y, can be mapped on the real numbers z > 0. The definition of such a mapping is particularly easy when we make use of the development of reals in continued fractions. Any positive real number a can be developed thus:

$$a = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}}$$

where $a_0 \ge 0$, while $a_1, a_2, \dots$ are all ≥ 1. Now I define the correspondence so that the points (x, y), where x and y are both irrational, are mapped on the irrational z > 2, the points (x,y), where x is irrational, y rational, are mapped on the irrational z such that 1 < z < 2, the points (x, y), where x is rational, y irrational, are mapped on the irrationals z < 1, and finally the rational points (x,y) are mapped on the rationals z. This mapping is defined as follows. As often as x and y are both irrational, their continued fractions being

$$x = x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots}}, \qquad y = y_0 + \cfrac{1}{y_1 + \cfrac{1}{y_2 + \cdots}},$$

the corresponding z shall be

$$z = x_0 + 2 + \cfrac{1}{y_0 + 1 + \cfrac{1}{x_1 + \cfrac{1}{y_1 + \cfrac{1}{x_2 + \cfrac{1}{y_2 + \cdots}}}}}.$$

If x is irrational, but y rational, the corresponding z shall be

$$z = 1 + \cfrac{1}{n + \cfrac{1}{x_0 + 1 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots}}}}$$

where n is the number given to y in an enumeration of all rationals. If x is rational, y irrational, the corresponding z is, when n is the number of x,

$$z = \cfrac{1}{n + \cfrac{1}{y_0 + 1 + \cfrac{1}{y_1 + \cdots}}}.$$

Finally the (x,y) where x and y are both rational and > 0, are mapped in an arbitrary way on the rational numbers z > 0.
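For the case where x and y are both irrational, the mapping amounts to interlacing the partial quotients of x and y. The sketch below works on finite initial segments of the partial-quotient sequences (an assumption made for illustration, since the true sequences are infinite) and takes the two segments to have equal length; the names `interleave` and `split` are hypothetical.

```python
def interleave(xq, yq):
    """From partial quotients [x0, x1, ...] and [y0, y1, ...] build those of z:
    [x0 + 2, y0 + 1, x1, y1, x2, y2, ...]."""
    zq = [xq[0] + 2, yq[0] + 1]
    for xi, yi in zip(xq[1:], yq[1:]):
        zq += [xi, yi]
    return zq


def split(zq):
    """Inverse of interleave: recover the partial quotients of x and y."""
    xq = [zq[0] - 2] + zq[2::2]
    yq = [zq[1] - 1] + zq[3::2]
    return xq, yq


zq = interleave([0, 1, 2], [3, 4, 5])
print(zq, split(zq))
```

Adding 2 to the leading quotient of x ensures z > 2, and adding 1 to the leading quotient of y keeps every quotient after the first ≥ 1, as a continued fraction requires.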

Cantor also proved generally that the set UM of subsets of a set M was of higher cardinality than M; however, I will talk about this theorem later.


2. Ordered sets. A theorem of Hausdorff.

One obtains a more complete idea of Cantor's work by studying his theory of ordered sets. As to the notion "ordered set" this is nowadays mostly defined in the following way:

A set M is ordered by a set P ⊆ M², if and only if the following statements are valid:

1) No pair (m,m), m ∈ M, is ∈ P.

2) For any two different elements m and n of M either (m,n) ∈ P or (n,m) ∈ P but not both at the same time.

3) For all m,n,p ∈ M we have (m,n) ∈ P & (n,p) ∈ P → (m,p) ∈ P (transitivity).

As often as (m,n) ∈ P, we also say m is less than n or m precedes n, written m < n.
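For a finite set M the three conditions can be tested directly. This is a minimal sketch; the function name `orders` is an assumption, and P is represented as a Python set of tuples.

```python
from itertools import product


def orders(M, P):
    """Return True if P, a subset of M x M, orders M in the sense of 1)-3)."""
    inside = P <= set(product(M, M))
    # 1) no pair (m, m) belongs to P
    irreflexive = all((m, m) not in P for m in M)
    # 2) for different m, n exactly one of (m, n), (n, m) belongs to P
    connected = all(((m, n) in P) != ((n, m) in P)
                    for m in M for n in M if m != n)
    # 3) transitivity
    transitive = all((m, p) in P
                     for (m, n) in P for (n2, p) in P if n == n2)
    return inside and irreflexive and connected and transitive


M = {1, 2, 3}
print(orders(M, {(1, 2), (2, 3), (1, 3)}))   # the usual order 1 < 2 < 3
print(orders(M, {(1, 2), (2, 3), (3, 1)}))   # a cycle violates transitivity
```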

If M and N are ordered sets and there exists a one-to-one order-preserving correspondence between them, Cantor said that they were of the same order type and wrote M ≃ N. They are also called similar. Evidently two ordered sets of the same order type possess the same cardinal number; but the inverse need not be the case. Only for finite sets is it so that two ordered sets of the same cardinality are also of the same type. Cantor denoted by $\overline{M}$ the order type of M.

That two infinite ordered sets possessing the same cardinal number may have different order types is seen by the simple example of the set of positive integers on the one hand and that of the negative integers on the other. Both sets are denumerable, but obviously not ordered with the same type, because the former has a first member, which the other has not, whereas the latter has a last member, which the former does not possess. Cantor studied to a certain extent the denumerable types, also types of the same cardinality as the continuum, but above all he studied the so-called well-ordered sets. In this short survey of Cantor's theory I shall only mention some of the most remarkable of his results and add a theorem of Hausdorff.

It will be necessary to define addition and multiplication of ordered sets. If A and B are ordered by $P_A$ and $P_B$ while A and B are disjoint, the sum set A + B will be ordered by $P_A + P_B + A \cdot B$. We have of course to distinguish between A + B and B + A. This addition may be extended to the case of an ordered set T of ordered sets A,B,C,... which are mutually disjoint. Indeed the union (or sum) ST will then be ordered by the sum of the sets $P_A, P_B, P_C, \dots$ and the products X • Y when (X,Y) run through all pairs which are the elements of the ordering set $P_T$ for T.

By the product of two ordered sets A and B we understand A • B ordered lexicographically: that means that $(a_1, b_1)$ precedes $(a_2, b_2)$ if either $a_1$ precedes $a_2$, or $a_1 = a_2$, but $b_1$ precedes $b_2$. This definition also admits generalization, but that will not be necessary just now.
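Both constructions can be written out for finite ordered sets. In this sketch the names `sum_order` and `product_order` are hypothetical, each ordering is a Python set of tuples, and the product A • B is taken to consist of ordered pairs (a, b), which is an assumption made so that the ordering relation can be stated.

```python
from itertools import product


def sum_order(A, PA, B, PB):
    """Ordering of A + B: P_A, P_B, and every a in A before every b in B."""
    return PA | PB | set(product(A, B))


def product_order(A, PA, B, PB):
    """Lexicographic ordering of the pairs (a, b): (a1, b1) precedes (a2, b2)
    if a1 precedes a2, or a1 = a2 and b1 precedes b2."""
    pairs = list(product(A, B))
    return {
        ((a1, b1), (a2, b2))
        for (a1, b1) in pairs
        for (a2, b2) in pairs
        if (a1, a2) in PA or (a1 == a2 and (b1, b2) in PB)
    }


A, PA = {0, 1}, {(0, 1)}
B, PB = {"x", "y"}, {("x", "y")}
print(sorted(sum_order(A, PA, B, PB), key=str))
print(((0, "y"), (1, "x")) in product_order(A, PA, B, PB))
```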

If a 1-1 correspondence exists between the ordered sets M and N such that the order is reversed by the correspondence, then N is said to be of the inverse order type of M. For example the order type of the set of negative integers is the inverse of the type of the positive integers. Cantor denotes the inverse of the order type α by α*. Thus ω and ω* denote the types of the sets of positive and of negative integers.


An interesting class of ordered sets are the dense ones. An ordered set is called dense, if there is always an element between two arbitrary ones. The simplest example is the set of rational numbers in their natural order. This set is also open, that means that there is no first and no last member. Now we have the remarkable theorem:

There is one and only one open and dense denumerable order type.

Proof. Let $A = \{a_1, a_2, \dots\}$ and $B = \{b_1, b_2, \dots\}$ be two denumerable sets, both open and dense. First we let $a_1$ correspond to $b_1$. Then $a_1$ divides the remaining elements of A into those < $a_1$ and those > $a_1$. Let $a_{m_1}$ be the $a_i$ with least index such that $a_i < a_1$ and $a_{m_2}$ the $a_i$ with least index such that $a_i > a_1$. Either $m_1$ or $m_2$ is 2. Letting $b_{n_1}$ be the $b_j$ with least j such that $b_j < b_1$, while $b_{n_2}$ is the $b_j$ with least j such that $b_j > b_1$, then either $n_1$ or $n_2$ is 2. We let $b_{n_1}$ correspond to $a_{m_1}$ and $b_{n_2}$ to $a_{m_2}$. Now every remaining $a_i$ from A is either < $a_{m_1}$, or > $a_{m_1}$ but < $a_1$, or > $a_1$ but < $a_{m_2}$, or > $a_{m_2}$, which gives 4 cases. There are 4 corresponding cases for the remaining $b_j$. Then if $a_{m_3}$ is the $a_i$ with least i such that $a_i < a_{m_1}$ and $b_{n_3}$ the $b_j$ with least j such that $b_j < b_{n_1}$, we let $a_{m_3}$ correspond to $b_{n_3}$ and so on. It is easily seen how we obtain in this way an order-preserving correspondence between the $a_i$ and the $b_j$. One has only to remark that if $a_m$ is the $a_i$ with the least i which has not already got any corresponding $b_j$, then it gets one when in the different intervals between the already chosen $a_{m_r}$ the further $a_{m_r}$ are chosen.
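The idea of the proof, always taking the element of least index that lies in the right interval, can be tried on two concrete open and dense denumerable sets. What follows is a simplified variant and not the exact bookkeeping of the proof: the elements of A are handled one at a time in the order of their enumeration, and only finitely many steps are carried out. The two enumerations (all rationals, and the dyadic rationals k/2^n) and the names `rationals`, `dyadics` and `match` are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import gcd


def rationals():
    """All rationals a/b in lowest terms, by increasing |a| + b."""
    s = 1
    while True:
        for a in range(-(s - 1), s):
            if gcd(abs(a), s - abs(a)) == 1:
                yield Fraction(a, s - abs(a))
        s += 1


def dyadics():
    """All dyadic rationals k / 2**n in lowest terms, by increasing |k| + n."""
    s = 0
    while True:
        for k in range(-s, s + 1):
            n = s - abs(k)
            if n == 0 or k % 2 != 0:      # skip fractions that are not reduced
                yield Fraction(k, 2 ** n)
        s += 1


def match(gen_a, gen_b, steps):
    """Pair the first `steps` elements of A with elements of B so that the
    order is preserved, choosing each time the unused b of least index."""
    pairs, b_seen, used = [], [], set()
    for _, a in zip(range(steps), gen_a):
        j = 0
        while True:
            if j == len(b_seen):
                b_seen.append(next(gen_b))   # extend the enumeration of B
            b = b_seen[j]
            # b must stand to the chosen b's as a stands to the chosen a's
            if b not in used and all((a < a0) == (b < b0) for a0, b0 in pairs):
                break
            j += 1
        used.add(b)
        pairs.append((a, b))
    return pairs


pairs = match(rationals(), dyadics(), 30)
print(pairs[:5])
```

The search for b always ends, because B is dense and open, so every interval determined by the already chosen elements contains further elements of B.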

We have further:

In an open and dense denumerable set a subset can be found similar to any given denumerable ordered set. This is seen in a similar way as in the proof of the preceding theorem. Indeed if $b_1, b_2, \dots$ are elements of an arbitrary denumerable ordered set while $a_1, a_2, \dots$ is an open and dense denumerable set, then we may map $b_1$ on $a_1$. Then according as $b_2 < b_1$ or $> b_1$ we map $b_2$ on an element $a'' < a_1$ or $> a_1$. Then $b_3$ is either less than both $b_1$ and $b_2$ or lies between $b_1$ and $b_2$ or is greater than both. Respectively we map $b_3$ on an element $a'''$ having the same order relation to $a_1$ and $a''$ and so on.

Let us use the term scattered set for a set having no dense subset. Then an interesting theorem of Hausdorff says that every ordered set is either scattered or the sum of a set T of such sets, where T is dense.

Proof: It is easy to understand that if an interval a to b in an ordered set is scattered and the interval b to c as well, then the whole interval a to c has the same property. Indeed, if d < e both belong to a dense set S then the set of all x ∈ S such that d ≤ x ≤ e constitutes a dense subset of S, and a possible dense subset of the interval a to c must either contain at least 2 elements in the interval a to b or at least 2 in the interval b to c. Therefore the statement that the interval between a and b in an ordered set M is scattered is transitive so that we can divide M into classes A,B,C,... such that in each class any two different elements furnish a scattered interval. These classes are therefore successive parts of M, each of them scattered. On the other hand, if there are two different ones A and B, there must always be a C between, else A and B would amalgamate into one class. Thus the set T of the successive scattered parts of M must be dense.

As to the denumerable ordered sets I should like to mention two facts which will be useful when I talk about Cantor's second number class. If a denumerable ordered set M has no first element, then it is coinitial with ω*, and if it has no last element, it is cofinal with ω. These statements mean that we can in the first instance find an infinite sequence of type ω* in the set such that there is no earlier element than all these in M, and in the second instance we may find an infinite sequence of type ω such that there is no element in M after all these.

Proof: Let in the first case $a_1 \in M$, $a_{n_1}$ be the $a_i$ with least i such that $a_i < a_1$, further $a_{n_2}$ be the $a_i$ with least i such that $a_i < a_{n_1}$, etc. Clearly $1 < n_1 < n_2 < \dots$ If $a_m$ were < every $a_{n_r}$, then we should have $m > 1, n_1, n_2, \dots$, which is absurd. Similarly in the second case.

Among the ordered sets, the well-ordered ones, namely those possessing a least element in every non-empty subset, are especially important. That well-ordering is equivalent to the principle of transfinite induction is well known. This principle says that if a statement S is always valid for an element of a well-ordered set M when it is valid for all predecessors, then S is valid for all elements of M. Further I ought to mention that the sum of a well-ordered set T of well-ordered sets A,B,C,... is again a well-ordered set. If T is denumerable and a denumeration is simultaneously given for each element A,B,C,... of T, then the sum is a well-ordered denumerable set. Also the product of two well-ordered sets is again well-ordered.

The order types of the well-ordered sets are called ordinal numbers. These ordinals Cantor has introduced by a creative process which is very characteristic of his way of thinking. I will now give an exposition of this creative process.

He begins with the null set 0 containing no element. Then since this 0 is an object of thought he has obtained one thing which he denotes by 1. (We may think of 1 as the set {0}, see the later axiomatic theory). Now, conceiving 0 and 1 as ordinals he has the right to write 0 < 1. Then he has this set of two ordinals which represents the ordinal 2. Having obtained 0 < 1 < 2 he has an ordered set representing the ordinal 3. Now he has 0 < 1 < 2 < 3 which furnishes a well-ordered set with 4 elements, etc. Now he thinks this process continued infinitely so that he obtains the set of all non-negative integers 0 < 1 < 2 < .... This well-ordered set, however, represents an infinitely great ordinal ω. Then he has

0 < 1 < 2 < ... < ω,

a set containing all finite integers together with ω. This is a well-ordered set representing a greater ordinal than ω, denoted by ω + 1. Proceeding in this way he obtains after a while

0 < 1 < 2 < ... < ω < ω + 1 < ω + 2 < ...,

a well-ordered set consisting of two infinite series of increasing ordinals. This set represents a still greater ordinal written as ω + ω.
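The finite part of this creative process can be imitated with sets. The sketch below assumes the representation hinted at in the parenthetical remark, where each finite ordinal is the set of all smaller ones, so that 1 = {0}, 2 = {0, 1}, and so on; the name `finite_ordinal` is hypothetical, and Python's `frozenset` stands in for a set. The infinite steps ω, ω + 1, ... are of course beyond such a program.

```python
def finite_ordinal(n: int) -> frozenset:
    """Build the ordinal n as the set of all smaller ordinals,
    starting from the null set and adding the last ordinal each time."""
    s = frozenset()                 # the null set 0
    for _ in range(n):
        s = s | frozenset({s})      # next ordinal = previous one plus itself as element
    return s


two, three = finite_ordinal(2), finite_ordinal(3)
# In this representation "less than" is simply membership.
print(two in three, len(three))
```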


It is evident that all the infinite sets hitherto introduced are denumerable. But now Cantor collects all ordinals of denumerable well-ordered sets. This set represents an ordinal that is not denumerable. Strictly speaking the axiom of choice is being taken into account here, but Cantor uses that as an evident principle without even being aware of it. According to this principle we have that a denumerable set of denumerable or finite sets has a denumerable union. Now let us assume that the ordinals of finite and denumerable sets constitute a denumerable set. Then this set is cofinal with ω, because there is evidently no greatest ordinal of this kind. Thus we may assume that $\alpha_1 < \alpha_2 < \alpha_3 < \dots$ is a sequence of type ω, such that every denumerable ordinal is ≤ some $\alpha_r$. However we could then find finite or denumerable ordinals $\beta_1, \beta_2, \dots$ such that

$\alpha_2 = \alpha_1 + \beta_1,\ \alpha_3 = \alpha_2 + \beta_2,\ \dots$

but now the ordinal

$\gamma = \alpha_1 + \beta_1 + \beta_2 + \dots$

must be denumerable. Nevertheless γ is clearly > every $\alpha_r$, so that we get a contradiction. Therefore the sequence of all finite and denumerable ordinals represents a non-denumerable ordinal. This was by Cantor denoted by Ω.

Cantor used the first letter aleph, written ℵ, of the Hebrew alphabet with indices to denote the cardinal numbers of well-ordered sets. The cardinal of ω, that is the cardinal number of the denumerable sets, he called $\aleph_0$, the cardinal of Ω he called $\aleph_1$. He proved that every subset of Ω is either finite or has the cardinal $\aleph_0$ or the cardinal $\aleph_1$. Indeed if we have a subset of Ω we may enumerate successively the elements of the subset by the elements 0,1,2,..., ω, ω + 1, ... of Ω and then either this enumeration will stop with a finite number n or it will stop with some α < Ω or it does not stop, so that the subsequence also has the ordinal Ω.

The finite ordinals are also called those of the first number class, the denumerable ones those of the second class. Now Cantor again collects the ordinals of cardinal number $\aleph_1$ and proves similarly that they constitute a sequence of still greater cardinality $\aleph_2$. There is no cardinal between $\aleph_1$ and $\aleph_2$. The ordinals belonging to well-ordered sets whose cardinal number is $\aleph_1$ are said to be the numbers of the third number class. In this way he continues and obtains an increasing infinite sequence of alephs

$\aleph_0 < \aleph_1 < \aleph_2 < \dots$

each $\aleph_{n+1}$ being the cardinal number of the set of all ordinals represented by well-ordered sets with cardinal number $\aleph_n$. These latter ordinals are those of class n + 2.

But now he collects all ordinals belonging to all the classes with finite number. Then he obtains a set with a cardinal number which is suitably denoted $\aleph_\omega$, being > every $\aleph_n$, n finite, while there is no cardinal between the $\aleph_n$ and this $\aleph_\omega$. From $\aleph_\omega$ he then derives $\aleph_{\omega+1}$, $\aleph_{\omega+2}$ etc. Quite generally there is an $\aleph_\alpha$ for every ordinal α.

It must be conceded that Cantor's set theory, and in particular his creation of ordinals, is a grandiose mathematical idea. But what was at that time the reaction of the mathematical world to all this? In the first instance the reaction was rather unfavourable. No wonder, these ideas were too new and too strange. However, very soon the reaction got favourable for two reasons: 1) Cantor's way of thinking was of the same nature as, for example, Cauchy's and Weierstrass's treatment of analysis and the theory of functions, 2) Many of the notions introduced by Cantor were useful in ordinary mathematics. There were, however, also some opponents, above all Kronecker and Poincaré. Kronecker did not only attack Cantor's theory of sets but also most of ordinary analysis. He required decidable notions. Poincaré's main objection was that in set theory so-called non-predicative definitions are used which according to him (and also Russell) are logically objectionable. The situation for Cantor's theory became indeed very much changed after 1897. In this year the Italian mathematician Burali-Forti discovered that the theory of transfinite ordinals leads to a contradiction. According to the Platonist point of view the existing ordinals are well-defined and well-distinguished objects such that they, according to Cantor's definition, should constitute a set. This set is well-ordered, therefore it represents an ordinal. However the ordinal represented by a well-ordered set of ordinals is always greater than all ordinals in the set. Thus we obtain an ordinal which is greater than all ordinals, which is absurd.

Another still better known antinomy was discovered a few years later (1903), namely Russell's. Ordinary sets are not elements of themselves. According to platonism the existing sets which are not elements of themselves ought to constitute a set U. We have then the logical equivalence

$x \notin x \leftrightarrow x \in U.$

If, however, we put here U instead of x, which should be allowed because the equivalence should be generally valid, we get

$U \notin U \leftrightarrow U \in U$

which of course is absurd. Also Cantor's theorem that the set UM of all subsets of M is of greater cardinality than M leads to an absurdity when we ask if there is a greatest cardinal or not. Indeed according to this theorem there is no greatest cardinal. But the union of all sets ought on the other hand to have the greatest possible cardinal number.


3. Axiomatic set theory. Axioms of Zermelo and Fraenkel

The discovery of the antinomies made it clear that a revision of the principles of set theory was necessary. The attempt to improve set theory which is best known among mathematicians is the axiomatic theory first set forth by Zermelo. I shall present his theory in a somewhat more precise form, replacing his vague notion "definite Aussage" (= definite statement) by the notion proposition or propositional function in the first order predicate calculus. We assume that we are dealing with a domain D of objects together with the membership relation $\in$, so that all propositions are built up from atomic propositions of the form $x \in y$ by use of the logical connectives $\&$, $\vee$, $\neg$, $\rightarrow$ (and, or, not, if-then) and the quantifiers $(x)$, $(Ex)$ (for all x, for some x). Then the following axioms are assumed valid. I write them both in logical symbols and in ordinary language.

1. Axiom of extensionality. If x and y have just the same elements, then x = y. In symbols

$(z)(z \in x \leftrightarrow z \in y) \rightarrow (x = y)$

Here x = y has the usual meaning, so that

where U is an arbitrary predicate. Hence we also have

2. Axiom of the small sets.

a) There exists a set without elements denoted by the symbol 0. Because of 1. there can be only one such set.

$(Ex)(y)(y \notin x).$

b) For every object m in D there exists a set $\{m\}$ containing m, but only m, as element,

$(x)(Ey)(x \in y \;\&\; (z)(z \in y \rightarrow (z = x)))$

c) For all m and n in D there exists a set $\{m, n\}$ containing m and n, but only these, as elements.

$(x)(y)(Ez)(x \in z \;\&\; y \in z \;\&\; (u)(u \in z \rightarrow (u = x) \vee (u = y))).$

Of course b) might be omitted because it follows from c) by putting n = m.

3. Axiom of separation. Let C(x) be a propositional function with x as the only free variable, and m an arbitrary set. Then there exists a set consisting of all elements x of m having the property C(x).

$(x)(Ey)(z)(z \in y \leftrightarrow C(z) \;\&\; z \in x)$

4. Axiom of the power set. For every set m there exists a set Um whose elements are just all subsets of m.

$(x)(Ey)(z)(z \in y \leftrightarrow (u)(u \in z \rightarrow u \in x))$

5. Axiom of the union. For every set m there exists a set Sm whose elements are just all elements of the elements of m.

$(x)(Ey)(z)(z \in y \leftrightarrow (Eu)(z \in u \;\&\; u \in x))$

6. The axiom of choice. Let T be a set whose elements are mutually disjoint sets A, B, C, ... $\neq 0$. Then there exists a set M having just one element in common with each of the sets A, B, C, ...

$(x)\big((y)(z)(y \in x \;\&\; z \in x \;\&\; y \neq z \rightarrow (u)(u \notin y \vee u \notin z)) \rightarrow (Ev)(w)(w \in x \rightarrow (Et)(t \in v \;\&\; t \in w \;\&\; (s)(s \in v \;\&\; s \in w \rightarrow s = t)))\big).$

These are the most general axioms set up by Zermelo (1908). Most of the general theorems of set theory are proved by the aid of these axioms. However, in order to ensure the existence of infinite sets Zermelo added:

7. The axiom of infinity. There exists a set U such that $0 \in U$ and whenever $x \in U$, $\{x\}$ is $\in U$ as well.

$(Ex)(0 \in x \;\&\; (y)(y \in x \rightarrow \{y\} \in x)).$

Later Fraenkel introduced a further axiom which is more powerful with regard to the proof of the existence of large transfinite cardinals, namely the following.

8. Let the binary relation $F(x, y)$ (= propositional function of two free variables x, y and any number of bound variables derived from the membership relation by the means of the predicate calculus) be such that $(x)(y)(z)(F(x, z) \;\&\; F(y, z) \rightarrow (y = x))$. Then to every set m there exists a set n such that $x \in n \leftrightarrow (Ey)(y \in m \;\&\; F(x, y))$. Or written more completely:

$(u)(v)(w)(F(u, w) \;\&\; F(v, w) \rightarrow (u = v)) \rightarrow (x)(Ey)(z)(z \in y \leftrightarrow (Eu)(u \in x \;\&\; F(z, u))).$

The following development of the Zermelo-Fraenkel set theory is carried out in such a way that it could be formalized in the predicate calculus. Such a procedure would however be very cumbersome if it were performed in all details. Therefore I have chosen an exposition that is somewhat more informal and more like the ordinary mathematical procedures.
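The axioms of the small sets, of separation, of the power set and of the union can be tried out on hereditarily finite sets. The following Python sketch is only an illustration under that restriction and not part of Zermelo's theory: sets are modelled as `frozenset` values, and the names `EMPTY`, `pair`, `separation`, `power_set` and `union` are chosen here for the example.

```python
from itertools import combinations

EMPTY = frozenset()  # the set 0 of axiom 2a

def pair(m, n):
    """Axiom 2c: the set {m, n}; pair(m, m) is the singleton {m} of axiom 2b."""
    return frozenset({m, n})

def separation(m, c):
    """Axiom 3: the set of all elements x of m having the property c(x)."""
    return frozenset(x for x in m if c(x))

def power_set(m):
    """Axiom 4: the set Um of all subsets of m."""
    elems = list(m)
    return frozenset(
        frozenset(combo)
        for r in range(len(elems) + 1)
        for combo in combinations(elems, r)
    )

def union(m):
    """Axiom 5: the set Sm of all elements of the elements of m."""
    return frozenset(x for y in m for x in y)

one = pair(EMPTY, EMPTY)   # {0}
two = pair(EMPTY, one)     # {0, {0}}
assert union(pair(one, two)) == two
assert len(power_set(two)) == 4
# The one-element subsets of `two` are {0} and {{0}}.
assert separation(power_set(two), lambda x: len(x) == 1) == pair(one, pair(one, one))
```

Axioms 6 to 8 are not modelled here; on finite sets choice and replacement are unproblematic, and the axiom of infinity has no finite model.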

Theorem 1. $(x)(Ey)(y \notin x)$.

That means that to each set M we may find an object a such that $a \notin M$. Therefore the total domain D is not a set.

Proof: According to the axiom of separation, the $x \in M$ for which $x \notin x$ is true, constitute the diverse elements of a set N. Then $N \notin M$. Otherwise $N \in N$ would imply $N \notin N$ and inversely.

Theorem 2. To each M and N there is an $M'$ such that $M' \sim M$ and $M' \cap (M \cup N) = 0$.

Proof: Let a be $\notin S(M \cup N)$.

The pairs $\{a, m\}$, where m runs through M, constitute a set $M'$ obviously $\sim M$ because the pairs $(m, \{a, m\})$ furnish a one-to-one correspondence between M and $M'$. Indeed if $\{a, m_1\} \neq \{a, m_2\}$, then $m_1 \neq m_2$, and if $m_1 \neq m_2$, then $\{a, m_1\} \neq \{a, m_2\}$, because else we must have $m_1 = m_2$ or $m_1 = a \;\&\; m_2 = a$, whence again $m_1 = m_2$. Now $M'$ is disjoint to $M \cup N$, because otherwise we would have an element m of M such that $\{a, m\} \in M \cup N$, whence $a \in S(M \cup N)$, contrary to supposition.
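The construction in this proof can be followed step by step on hereditarily finite sets. The sketch below is an illustration only, with sets modelled as Python `frozenset` values; the names `union` and `disjoint_copy` are assumptions of the example. The object a is obtained as in Theorem 1, applied to the set $S(M \cup N)$.

```python
def union(m):
    """S(m): the set of all elements of the elements of m."""
    return frozenset(x for y in m for x in y)

def disjoint_copy(M, N):
    """Return the correspondence m -> {a, m} used in the proof of Theorem 2."""
    s = union(M | N)
    # Theorem 1 applied to s: the x in s with x not in x form a set a with a not in s.
    a = frozenset(x for x in s if x not in x)
    assert a not in s
    return {m: frozenset({a, m}) for m in M}

zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})
M = frozenset({zero, one, two})
N = frozenset({two, frozenset({two})})

copy = disjoint_copy(M, N)
M_prime = frozenset(copy.values())
assert len(M_prime) == len(M)        # the correspondence is one-to-one
assert M_prime.isdisjoint(M | N)     # M' is disjoint to M ∪ N
```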

Theorem 3. Let T be a set of sets A, B, C, ... $\neq 0$. Then there exists a set $T'$ of sets $A', B', C', \ldots$ together with a one-to-one correspondence between T and $T'$ such that the unions ST and $ST'$ are disjoint while $A', B', C', \ldots$ are mutually disjoint and resp. $\sim$ A, B, C, ...

Proof: According to the previous theorem a set P exists which is disjoint to $T \cup ST$, while $P \sim T$, which means that we have a one-to-one mapping $f(X) = X''$ such that $X''$ runs through P when X runs through T. For every $X \in T$ the pairs

$\{f(X), x\},$

where x runs through X, constitute a set F(X). The function F has an inverse. Indeed, as often as $X_1 \neq X_2$, $F(X_1)$ and $F(X_2)$ will be disjoint, because $f(X_1) \neq f(X_2)$, and if we compare two elements from $F(X_1)$ and $F(X_2)$, namely

$\{f(X_1), x_1\}$ and $\{f(X_2), x_2\},$

we cannot have $f(X_1) = x_2$, because $X_2$ and P are disjoint. Therefore F and its inverse $F^{-1}$ give a one-to-one correspondence between T and $T'$ when $T'$ is the set of all $F(X) = X'$, X running through T. For every $X \in T$, the pair

$\{f(X), x\} \in X'$

will correspond uniquely to $x \in X$. If this pair is called $g_X(x)$, then $g_X$ and its inverse yield a mapping between X and $X'$. In this way we have obtained a simultaneous mapping of the elements of X and those of $X'$ for all X.

Thus the theorem is proved. However we may add the following remark: The function g is such that if $x \in X$ then $x' = g_X(x)$ is $\in X'$ and $x \in x'$.

We have: To every $x \in X$ the $x' = g_X(x)$ is the element of $X'$ such that $x \in x'$, and inversely if $x' \in X'$ is given, the $x \in X$ such that $g_X(x) = x'$ is the element of X which is $\in x'$. The simultaneous mapping of the elements x in the diverse X onto the elements $x'$ of the diverse $X'$ is therefore here constructed so that $x \in x'$ when x and $x'$ correspond.

Now according to the axiom of choice there exists a set W having just one element in common with every set $X'$. If this element is denoted by $w(X')$, being a function of $X'$ (this function is the set of pairs $(X', x')$ where $\{x'\} = W \cap X'$), then we have

$W \cap X' = \{w(X')\}$

and $g_X^{-1}(w(X')) \in X$, i.e. $g_X^{-1}(w(F(X))) \in X$. Thus we have found a function, namely $g^{-1}wF$, of the elements of T which has as its value for each X an element of X. This is the general principle of choice.

Even without the axiom of choice we can introduce addition and multiplication of cardinals although only in the case of a finite number of operands. Indeed if $\varphi$ is a set of ordered pairs $(a, a')$ yielding a mapping of a set A onto a set $A'$, $\psi$ a similar set furnishing a mapping of B onto $B'$, $A \cap B = 0$, $A' \cap B' = 0$, then $\varphi + \psi$ is a mapping of $A + B$ onto $A' + B'$. Therefore we can just as in the case of the naive set theory define the sum of the cardinals of two disjoint sets as the cardinal of the sum. Similar remarks are valid for multiplication.
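The remark on the sum of two mappings can be checked on small finite examples. In the following Python sketch a mapping is represented as a `dict` of pairs; the function name `sum_of_mappings` and the sample data are assumptions made for the illustration.

```python
def sum_of_mappings(phi, psi):
    """Union of two one-to-one mappings with disjoint domains and disjoint ranges."""
    if set(phi) & set(psi):
        raise ValueError("A and B must be disjoint")
    if set(phi.values()) & set(psi.values()):
        raise ValueError("A' and B' must be disjoint")
    return {**phi, **psi}

phi = {"a1": 1, "a2": 2}            # maps A = {a1, a2} onto A' = {1, 2}
psi = {"b1": 3, "b2": 4, "b3": 5}   # maps B = {b1, b2, b3} onto B' = {3, 4, 5}
chi = sum_of_mappings(phi, psi)

# chi is again one-to-one, and it maps A + B onto A' + B'.
assert len(set(chi.values())) == len(chi) == len(phi) + len(psi)
```

Both disjointness conditions are needed: without the first the union is not a function, without the second it is not one-to-one.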

If we take the more general case, however, of addition, where the number of cardinal numbers to be added together is infinite, then the definition of addition is only possible when the axiom of choice is presupposed. If T is a set of mutually disjoint sets A, B, C, ..., $T'$ a set of disjoint sets $A', B', C', \ldots$, while F is a mapping of T onto $T'$ consisting of the pairs $(A, A'), (B, B'), \ldots$, then if $A \sim A'$, $B \sim B'$, ... we can prove by the axiom of choice that the union ST is $\sim ST'$. Indeed according to supposition there is a set $\Phi_A$ of mappings of A onto $A'$, a set $\Phi_B$ of mappings of B onto $B'$, ... Then according to the axiom of choice there exists a set consisting of one element $\varphi_A$ from $\Phi_A$, $\varphi_B$ from $\Phi_B$, ... and the union of these is then a mapping $\varphi$ of ST on $ST'$. Without the axiom of choice we can only formulate the following theorem: Let T and $T'$ be as mentioned above, and let us assume that a set of mappings is given consisting of just one mapping of A onto $A'$, one of B onto $B'$, etc., for all elements X resp. $X'$ of T resp. $T'$; then $ST \sim ST'$.

There is on the other hand one important theorem concerning the comparison of cardinals which can be proved without the axiom of choice, namely the Bernstein Theorem.

Theorem 4. Let M be $\sim M'$, $M' \subseteq M_1 \subseteq M$. Then $M \sim M_1$.

Remark: I use for every subset A of M the notation $A'$ for the image of A by the same mapping as of M onto $M'$.

Proof: We put

$M_1 = Q + M'$, or in other words $Q = M_1 - M'$.

Let T be the set of subsets A of M which have the properties

1) $Q \subseteq A$   2) $A' \subseteq A$.

T is not empty because at least $M \in T$. Then let $A_0$ be the intersection of all elements of T. I denote this also by DT. Obviously $A_0$ has still the properties 1) and 2), i.e., $A_0 \in T$ or

3) $Q \subseteq A_0$   4) $A_0' \subseteq A_0$.

3) and 4) furnish

5) $Q \cup A_0' \subseteq A_0$

whence

$(Q \cup A_0')' \subseteq A_0'$

whence a fortiori

6) $(Q \cup A_0')' \subseteq Q \cup A_0'$.

From 6) it follows that

$Q \cup A_0' \in T,$

whence

7) $A_0 \subseteq Q \cup A_0'$.

5) and 7) yield

$A_0 = Q + A_0',$

noticing that $Q \cap A_0' = Q \cap M' = 0$. Now we have

$M_1 = Q + M' = Q + A_0' + (M' - A_0') = A_0 + (M' - A_0'),$

whence, $A_0$ being $\sim A_0'$,

$M_1 \sim A_0' + (M' - A_0') = M'$

which is the theorem.

An immediate consequence is that if

$M \sim N_1 \subseteq N$ and $N \sim M_1 \subseteq M$,

then

$M \sim N.$

Indeed it follows from $N_1 \subseteq N$ and $N \sim M_1$ that $N_1 \sim M_2$, where $M_2$ is a certain subset of $M_1$, so that since $M \sim N_1$

$M \sim M_2 \subseteq M_1 \subseteq M,$

whence after the previous theorem

$M \sim M_1 \sim N.$
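The mapping of $M_1$ onto $M'$ obtained in the proof of Theorem 4 is explicit: it follows the given mapping on $A_0$ and leaves every element of $M' - A_0'$ fixed. The Python sketch below illustrates this for one assumed example, namely M the natural numbers, the given mapping $x \mapsto x + 2$, so that $M' = \{2, 3, \ldots\}$, and $M_1 = \{1, 2, \ldots\}$, whence $Q = \{1\}$. It uses the fact that $A_0$, being the least set with properties 1) and 2), consists of the elements reached from Q by repeated application of the mapping. All function names are chosen for the illustration.

```python
def f(x):
    return x + 2           # the given mapping of M onto M'

def f_inv(x):
    return x - 2           # its inverse, defined on M'

def in_image(x):
    return x >= 2          # membership in M' = f(M)

def in_M1(x):
    return x >= 1

def in_Q(x):
    return in_M1(x) and not in_image(x)   # Q = M1 - M'

def in_A0(x):
    """x is in A0 iff x is in Q or x = f(y) with y in A0.

    Assumption of this sketch: walking back along f always stops after
    finitely many steps, which holds for the example above.
    """
    while True:
        if in_Q(x):
            return True
        if not in_image(x):
            return False
        x = f_inv(x)

def h(x):
    """The mapping of M1 onto M': f on A0, the identity on M' - A0'."""
    return f(x) if in_A0(x) else x

values = [h(x) for x in range(1, 12)]
assert len(set(values)) == len(values)     # one-to-one on this initial stretch
assert all(in_image(v) for v in values)    # all values lie in M'
```

Here $A_0$ is the set of odd numbers, so h sends every odd number two steps up and keeps the even numbers $\geq 2$ fixed.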

Corollary: If $M \sim M' \subset M$, $\mathfrak{m} = \overline{\overline{M}}$, then

$\mathfrak{m} + 1 = \mathfrak{m}.$

It may be remarked that we have not used the axiom of choice in the proof of this theorem. As an example of another simple theorem of a certain interest, provable as well without the axiom of choice, I will mention Cantor's theorem and the very simple one below concerning the case $\mathfrak{m}$ and $\mathfrak{n} \geq 2$.

Theorem 5. (Cantor's theorem). For every set M we have $\overline{\overline{M}} < \overline{\overline{UM}}$.

Proof: In the first place the pairs $(m, \{m\})$ yield a mapping of M on a subset of UM, namely the subset consisting of all sets $\{m\}$ where $m \in M$. In the second, no one-to-one mapping f of UM into M can exist. Indeed, let us assume the existence of such a mapping f and let N be the set of all f(X) for subsets X of M for which $f(X) \notin X$. Then we should have

$(X)(X \subseteq M \rightarrow (f(X) \in N \leftrightarrow f(X) \notin X)).$

Putting in particular N into this formula instead of X, we obtain, since $N \subseteq M$,

$f(N) \in N \leftrightarrow f(N) \notin N$

which is absurd.

Using the cardinal number notation this theorem may be written

$2^{\mathfrak{m}} > \mathfrak{m},$

because it is seen that the cardinal number of UM must be $2^{\mathfrak{m}}$ when $\mathfrak{m}$ denotes the cardinal number of M. This is perhaps seen most convincingly in the following way. Let $M'$ be $\sim M$ and $M \cap M' = 0$, f being a mapping of M onto $M'$. We know that such $M'$ and f exist. For every $m \in M$ I write $f(m) = m'$. Then we can get a one-to-one correspondence between UM and the product of all the pairs $\{m, m'\}$. Let N be $\subseteq M$. Then as often as $m \in M$ is also $\in N$ we let the corresponding element of the product contain m as element, otherwise it contains just $m'$. Since the set of all pairs $\{m, m'\}$ evidently has the same cardinal number $\mathfrak{m}$ as M the product must be of cardinality $2^{\mathfrak{m}}$.
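The diagonal set N of the proof can be computed for any function from the subsets of a finite set M into M, and it then exhibits two different subsets with the same value, so that the function is not one-to-one. The following Python sketch is an illustration for finite M only; the names `subsets` and `diagonal_collision` and the sample function are assumptions of the example.

```python
from itertools import combinations

def subsets(M):
    """All subsets of the finite set M, as frozensets."""
    elems = sorted(M)
    return [frozenset(c) for r in range(len(elems) + 1) for c in combinations(elems, r)]

def diagonal_collision(M, f):
    """For any f from the subsets of M into M, return (N, X) with X != N and f(X) == f(N).

    N is the set of all f(X) with f(X) not in X. If f(N) were not in N, the
    definition of N would put it into N; so f(N) is in N, hence f(N) = f(X)
    for some X with f(X) not in X, and that X differs from N.
    """
    N = frozenset(f(X) for X in subsets(M) if f(X) not in X)
    for X in subsets(M):
        if X != N and f(X) == f(N):
            return N, X
    raise AssertionError("not reachable when f maps the subsets of M into M")

M = {0, 1, 2}
f = lambda X: sum(X) % 3          # an arbitrary sample function into M
N, X = diagonal_collision(M, f)
assert X != N and f(X) == f(N)
```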

A consequence of this theorem is that a set of sets representing all cardinals does not exist. Indeed, if T is such a set, then $\overline{\overline{ST}} \geq \overline{\overline{X}}$, X an arbitrary element of T, and Cantor's theorem says that $\overline{\overline{UST}} > \overline{\overline{ST}}$. Hence $\overline{\overline{UST}} > \overline{\overline{X}}$ for all $X \in T$.

It may be suitably mentioned here, that the sum of the cardinals belonging to a set of sets with no greatest cardinal has already a cardinal $>$ all cardinals in the set.

It is often asserted that the following theorem, also due to Bernstein, can be proved without the axiom of choice. However, the usual proof, at least, does not fulfill this requirement, so that I think it is a mistake. The theorem is, when $\mathfrak{m}$ and $\mathfrak{n}$ denote cardinals:

Theorem 6. If $\mathfrak{m} + \mathfrak{n} = \mathfrak{m}\mathfrak{n}$, then $\mathfrak{m}$ and $\mathfrak{n}$ are comparable.

What is meant is that either $\mathfrak{m} \leq \mathfrak{n}$ or $\mathfrak{n} \leq \mathfrak{m}$.

Proof: The supposition $\mathfrak{m} + \mathfrak{n} = \mathfrak{m}\mathfrak{n}$ means that we are given two disjoint sets M and N together with a mapping of $M + N$ onto $M \times N$. This means again that the set of all pairs $(m, n)$, $m \in M$, $n \in N$, is divided into two disjoint parts A and B where A is mapped onto M, B onto N. Now, if there is a particular $m'$ such that all $(m', n)$, n running through N, are $\in A$, then N is $\sim$ a subset of A, whence $N \sim$ a subset of M. If no such $m'$ exists, then for each $m \in M$ there is at least one n such that $(m, n) \in B$. Then one says, it is evident that B contains a subset which is $\sim M$, whence $M \sim$ a subset of N.

Theorem 7. Let the cardinals $\mathfrak{m}$ and $\mathfrak{n}$ be $\geq 2$. Then $\mathfrak{m} + \mathfrak{n} \leq \mathfrak{m}\mathfrak{n}$.

Proof: We have two sets M and N with at least two elements and we can assume M and N disjoint. Let $m_1 \neq m_2$ be $\in M$, $n_1 \neq n_2 \in N$. Let P be the set of all $\{m_1, n\}$, n running through N. Then P is $\sim N$. Further let Q be the set of all $\{m, n_1\}$, m running through $M - \{m_1\}$, besides the pair $\{m_2, n_2\}$. It is evident that Q is $\sim M$. Further P and Q are disjoint. Thus $P + Q$ is a subset of $M \cdot N$ which is $\sim M + N$, which proves the theorem.

It is seen that the hypothesis of the theorem cannot be weakened. Indeed if it is only supposed that one of the two cardinals is $\geq 2$, the other, say $\mathfrak{n}$, being $= 1$, the theorem is not valid for finite $\mathfrak{m}$.
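The two sets P and Q of the proof are easily written down for finite M and N. The Python sketch below is an illustration only: it uses ordered tuples `(m, n)` in place of the unordered pairs $\{m, n\}$ of the text, takes the two smallest elements of each set as $m_1, m_2$ and $n_1, n_2$, and the name `embed_sum_in_product` is chosen here.

```python
def embed_sum_in_product(M, N):
    """Return (P, Q): disjoint sets of pairs with P ~ N and Q ~ M, as in the proof."""
    m1, m2 = sorted(M)[:2]
    n1, n2 = sorted(N)[:2]
    P = {(m1, n) for n in N}
    Q = {(m, n1) for m in M if m != m1} | {(m2, n2)}
    return P, Q

M = {"a", "b", "c"}
N = {1, 2}
P, Q = embed_sum_in_product(M, N)
assert len(P) == len(N) and len(Q) == len(M)
assert P.isdisjoint(Q)
assert P | Q <= {(m, n) for m in M for n in N}   # P + Q is a subset of the product
```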

The theorem can be generalized. Let T be a set of at least 2 elements, each element of T containing at least two elements, the elements of T being mutually disjoint. Using the axiom of choice we may assume that we have chosen two elements of each $X \in T$. Let A, B, C, ... be the elements of T and let $a_1, a_2, b_1, b_2, \ldots$ be the chosen elements from A, B, C, ... Then the product PT contains subsets

$A_1, \quad B_1, \quad C_1, \quad \ldots$

consisting of the elements

$a_r, b_1, c_1 \ (r \neq 1); \quad a_1, b_s, c_1 \ (s \neq 1); \quad a_1, b_1, c_t \ (t \neq 1); \quad \ldots$

and

$a_1, b_2, c_2; \quad a_2, b_1, c_2; \quad a_2, b_2, c_1; \quad \ldots$

and it is evident that $A_1 \sim A$, $B_1 \sim B$, ... This means that PT contains a subset $\sim ST$ so that $\overline{\overline{ST}} \leq \overline{\overline{PT}}$.

Theorem 8. If $0 < \mathfrak{m} \leq \mathfrak{n}$, then every set of cardinality $\mathfrak{n}$ can be divided into a set T of cardinal $\mathfrak{m}$ of non-void mutually disjoint subsets.

Proof: Let $M \subseteq N$, $\overline{\overline{M}} = \mathfrak{m}$, $\overline{\overline{N}} = \mathfrak{n}$ and $m \in M$. For each $x \in M$ a subset $N_x$ of N is defined thus: If $x \neq m$, then $N_x = \{x\}$, while in the case $x = m$, $N_m = (N - M) + \{m\}$. It is evident that the $N_x$, x running through M, are all mutually disjoint and their union (sum) is N.

The inverse of this would be, that if a set N is the union of a set T of non-void mutually disjoint sets, then $\overline{\overline{T}} \leq \overline{\overline{N}}$. However, without axiom of choice this can only be proved if T is finite. Indeed, in order to prove this assertion one has to find a subset of N which is $\sim T$. This is possible if we can choose one element a from each element A of T; then the pairs $(a, A)$ yield a mapping of T on a subset of N. Otherwise we have no means of proof. On the other hand we may prove the following theorem. Let N be the sum of mutually disjoint and non-void sets $N_x$, $x \in M$, so that to each $x \in M$ corresponds just this single $N_x$. Then M is $\sim$ a subset of the power set of N, so that $\mathfrak{m} = \overline{\overline{M}} \leq 2^{\mathfrak{n}}$, $\mathfrak{n} = \overline{\overline{N}}$. To every subset $X \subseteq M$ we let correspond the subset $N_X$, namely the sum of all $N_x$, x running through X, which is $\subseteq N$. For different X these corresponding $N_X$ are different; therefore $2^{\mathfrak{m}} \leq 2^{\mathfrak{n}}$. If we had $\mathfrak{m} = 2^{\mathfrak{n}}$, then $2^{\mathfrak{m}} \leq \mathfrak{m}$ which is not the case, by Cantor's theorem. Thus $\mathfrak{m} < 2^{\mathfrak{n}}$.

Theorem 9. (Zermelo). Let T be mapped on $T'$ in such a manner that as often as $M \in T$ corresponds to $M' \in T'$, $\overline{\overline{M}} < \overline{\overline{M'}}$. Then $\overline{\overline{ST}} < \overline{\overline{PT'}}$.

Proof: We may assume that the elements A, B, C, ... of T are $\neq 0$. Then $\overline{\overline{A'}}, \overline{\overline{B'}}, \ldots$ are all $\geq 2$. By theorem 7 we then know that $\overline{\overline{ST'}} \leq \overline{\overline{PT'}}$. Further it is clear that $\overline{\overline{ST}} \leq \overline{\overline{ST'}}$. Thus $\overline{\overline{ST}} \leq \overline{\overline{PT'}}$ and it suffices to prove that $PT'$ cannot be mapped on a subset S of ST. Let us assume that such a mapping were possible. The subset S of ST can be written as $A_0 + B_0 + C_0 + \ldots$, where $A_0$ is the intersection of S and A, $B_0$ that of S and B, ... The elements of $PT'$ are of the form $\{a', b', c', \ldots\}$, where $a' \in A'$, $b' \in B'$, ... Let us take into account those which correspond to the elements of $A_0$. If $a'$ varies, the corresponding $a \in A_0$ varies. Therefore the $a'$ occurring in the elements $\{a', b', c', \ldots\}$ which are mapped on the elements of $A_0$ can only constitute a proper subset $A_1$ of $A'$, because else $A'$ would have to be $\sim A_0$ which contradicts the assumption $\overline{\overline{A}} < \overline{\overline{A'}}$. Similarly the $b'$ occurring in the elements $\{a', b', c', \ldots\}$ which are mapped on the elements of $B_0$ must constitute a proper subset $B_1$ of $B'$, and so on. Now $PT'$ also contains, according to the axiom of choice, an element,

$\{a_0, b_0, c_0, \ldots\},$

where $a_0 \in A' - A_1$, $b_0 \in B' - B_1$, ... However this element cannot correspond to any element of ST. Indeed it cannot be mapped on an element of $A_0$, for example, because if it could, $a_0$ would have to be one of the elements of $A_1$.

4. The well-ordering theorem

After all this I shall now prove, by use of the choice principle, that every set can be well-ordered. First I shall give another version of the notion "well-ordered", different from the usual one.

We may say that a set M is well-ordered, if there is a function R, having M as domain of the argument values and UM as domain of the function values, such that if $N \supset 0$ is arbitrary and $\in UM$, there is a unique $n \in N$ such that $N \subseteq R(n)$. I have to show that this definition is equivalent to the ordinary one. If M is well-ordered in the ordinary sense, then every non-void subset N has a unique first element. Then it is clear that if $R(n)$, $n \in M$, means the set of all $x \in M$ such that $n \leq x$, the other definition is fulfilled by this R. Let us, on the other hand, assume that we have a function R of the said kind. Letting N be $\{a\}$, one sees that always $a \in R(a)$. Let N be $\{a, b\}$, $a \neq b$. Then either a or b is such that $N \subseteq R(a)$ resp. $R(b)$. If $N \subseteq R(a)$, then we put $a < b$. Since then N is not $\subseteq R(b)$, we have $a \notin R(b)$. Now let $b < c$ in the same sense, that is, $c \in R(b)$, $b \notin R(c)$. Then it is easy to see that $a < c$. Indeed we shall have $\{a, b, c\} \subseteq$ either $R(a)$ or $R(b)$ or $R(c)$, but $b \notin R(c)$, $a \notin R(b)$. Hence $\{a, b, c\} \subseteq R(a)$ so that $\{a, c\} \subseteq R(a)$, i.e. $a < c$. Thus the defined relation $<$ is a linear ordering. Now let N be an arbitrary subset of M and n be the element of N such that $N \subseteq R(n)$. Then if $m \in N$, $m \neq n$, we have $m \in R(n)$, which means that $n < m$. Therefore the linear ordering is a well-ordering.
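The passage between the two definitions of "well-ordered" can be illustrated on a finite set. In the Python sketch below a well-ordering in the ordinary sense is given as a list; the names `R_from_order`, `first_element` and `order_from_R` are assumptions of the example and do not occur in the text.

```python
from functools import cmp_to_key

def R_from_order(order):
    """R(n) = the set of all x with n <= x, for a list `order` giving a well-ordering."""
    return {n: frozenset(order[i:]) for i, n in enumerate(order)}

def first_element(N, R):
    """The unique n in N such that N is a subset of R(n)."""
    candidates = [n for n in N if set(N) <= R[n]]
    assert len(candidates) == 1
    return candidates[0]

def order_from_R(R):
    """Recover the ordering from R: a < b iff {a, b} is a subset of R(a)."""
    def compare(a, b):
        if a == b:
            return 0
        return -1 if {a, b} <= R[a] else 1
    return sorted(R, key=cmp_to_key(compare))

order = ["c", "a", "b"]             # c < a < b
R = R_from_order(order)
assert first_element({"a", "b"}, R) == "a"
assert order_from_R(R) == order
```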

Theorem 10. Let a function $\varphi$ be given such that $\varphi(A)$, for every A such that $0 \subset A \subseteq M$, denotes an element of A. Then UM possesses a subset $\mathfrak{M}$ such that to every $N \subseteq M$ and $\supset 0$ there is one and only one element $N_0$ of $\mathfrak{M}$ such that $N \subseteq N_0$ and $\varphi(N_0) \in N$.

Proof: I write generally $A' = A - \{\varphi(A)\}$. I shall consider the sets $P \subseteq UM$ which, like UM, possess the following properties

1) $M \in P$

2) $A \in P \rightarrow A' \in P$ for all $A \subseteq M$

3) $T \subseteq P \rightarrow DT \in P$.

These sets P constitute a subset C of UUM. They are called $\Theta$-chains by Zermelo. I shall show that the intersection DC of all elements of C is again a $\Theta$-chain, that is, $DC \in C$. It is seen at once that DC possesses the properties 1) and 2). Now let $T \subseteq DC$. Then, if $P \in C$, we have $T \subseteq P$, and since 3) is valid for P, also $DT \in P$. Since this is true for all P, we have $DT \in DC$ as asserted. Thus I have proved that $DC \in C$.


In the sequel I put $DC = \mathfrak{M}$ and I assert that $\mathfrak{M}$ has the property mentioned in the theorem. Obviously $\mathfrak{M}$ is the least $\Theta$-chain. Let $0 \subset N \subseteq M$, and let $N_0$ be the intersection of all $Q \in \mathfrak{M}$ for which $N \subseteq Q$, then $N \subseteq N_0$. Further $\varphi(N_0) \in N$, because otherwise $N_0' = N_0 - \{\varphi(N_0)\}$ would still contain N and be $\in \mathfrak{M}$, which is a contradiction, since this would mean that $N_0$ is contained in $N_0 - \{\varphi(N_0)\}$.

Thus we have proved the first half of the theorem. The proof of the latter half is considerably more laborious. It will be suitable first to prove the following:

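Lemma. Let $A \in \mathfrak{M}$ have the property that for every $X \in \mathfrak{M}$ either $X \subset A$ or $X = A$ or $A \subset X$.

Then $A'$ possesses the same property.

By the way, we may notice that such an A exists, M having this property.

Proof: If $X \in \mathfrak{M}$ is such that $A = X$ or $A \subset X$, then $A' \subset X$. Therefore, we only need to consider the case $X \subset A$. The question is whether some $\mathfrak{B} \in \mathfrak{M}$ could exist such that $\mathfrak{B} \subset A$ but $\mathfrak{B}$ not $\subseteq A'$, or in other words, $\varphi(A)$ still $\in \mathfrak{B}$. I will denote by $\mathfrak{M}^*$ the subset of $\mathfrak{M}$ which remains after having removed all these $\mathfrak{B}$ from $\mathfrak{M}$. I shall show that $\mathfrak{M}^*$ is a $\Theta$-chain.

1) $M \in \mathfrak{M}^*$ because $M \in \mathfrak{M}$ and M is not possibly a $\mathfrak{B}$. Indeed each $\mathfrak{B}$ is $\subset A$.

2) Let $B \in \mathfrak{M}^*$. If $A \subset B$, then $B'$ is not $\subset A$ so that $B'$ is not a $\mathfrak{B}$. On the other hand $B' \in \mathfrak{M}$, since $B \in \mathfrak{M}$. Then $B' \in \mathfrak{M}^*$ in this case.

If $A = B$, then $B' = A'$ so that $\varphi(A) \notin B'$, whence again $B'$ is not a $\mathfrak{B}$ so that $B' \in \mathfrak{M}^*$. Finally, let $B \subset A$. Then $\varphi(A)$ must be $\notin B$; otherwise B would be a $\mathfrak{B}$ against the supposition $B \in \mathfrak{M}^*$. But then a fortiori $\varphi(A) \notin B'$, so that $B'$ is not a $\mathfrak{B}$. Therefore $B' \in \mathfrak{M}^*$.

3) Let $T \subseteq \mathfrak{M}^*$. Should DT be a $\mathfrak{B}$, we would have

$(DT \subset A) \;\&\; (\varphi(A) \in DT).$

Then $\varphi(A)$ is $\in$ every element C of T. Since every C is not a $\mathfrak{B}$, we must have $C \not\subset A$ for every $C \in T$ and thus, because of the supposed property of A, $A \subseteq C$ for all $C \in T$, whence $A \subseteq DT$, so that DT is no $\mathfrak{B}$. Hence $DT \in \mathfrak{M}^*$.

However, since $\mathfrak{M}$ is the minimal $\Theta$-chain and $\mathfrak{M}^*$ is a $\Theta$-chain $\subseteq \mathfrak{M}$, we have $\mathfrak{M}^* = \mathfrak{M}$, which means that the elements $\mathfrak{B}$ do not exist. This proves our lemma.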

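Now let $\mathfrak{M}_1$ be the subset of $\mathfrak{M}$ consisting of all $A \in \mathfrak{M}$ such that for every $X \in \mathfrak{M}$ we have either $X \subset A$ or $X = A$ or $A \subset X$. I shall show that $\mathfrak{M}_1$ is a $\Theta$-chain, so that it coincides with $\mathfrak{M}$.

1) M is $\in \mathfrak{M}_1$. This is evident since every $X \in \mathfrak{M}$ is $\subseteq M$.

2) If $A \in \mathfrak{M}_1$, then $A' \in \mathfrak{M}_1$. That is just the lemma proved above.

3) Let T be $\subseteq \mathfrak{M}_1$. Then for every $N \in T$ and every $X \in \mathfrak{M}$ we have either $N \subseteq X$ or $X \subset N$. Let X be an arbitrary element of $\mathfrak{M}$. Then either there is an element N of T such that $N \subseteq X$, and then $DT \subseteq X$, or we have for all $N \in T$ that $X \subseteq N$, whence $X \subseteq DT$. Thus $DT \in \mathfrak{M}_1$.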

Hence it follows that $\mathfrak{M}_1$ is a $\Theta$-chain and therefore $= \mathfrak{M}$. This means that if A and B are $\in \mathfrak{M}$, we always have one of the three cases $A \subset B$, $A = B$, $B \subset A$. Further it ought to be noticed that if $B \subset A$, then $B \subseteq A'$, else we should have $A' \subset B$, which obviously is impossible when $B \subset A$.

All this makes it now possible to prove the latter half of our well-ordering theorem; namely that if $N \neq 0$ is $\subseteq M$ there is only one $N_0 \in \mathfrak{M}$ such that $\varphi(N_0) \in N$ and $N \subseteq N_0$. We have seen that there is such an $N_0$. Every element P of $\mathfrak{M}$ such that $P \subset N_0$ is $\subseteq N_0'$, so that $\varphi(N_0) \notin P$, whence N is not $\subseteq P$. Every other element P of $\mathfrak{M}$ is such that $N_0 \subset P$, whence $N_0 \subseteq P'$, whence again $\varphi(P) \notin N_0$ so that also $\varphi(P) \notin N$. Thus $N_0$ is the only element of $\mathfrak{M}$ with the two properties $N \subseteq N_0$ and $\varphi(N_0) \in N$.

We can now define a function R from M to $\mathfrak{M}$ thus: As often as $N \in \mathfrak{M} \;\&\; \varphi(N) = m$, we write $N = R(m)$. It follows in particular from the theorem just proved that for every $m \in M$ a unique $N \in \mathfrak{M}$ exists such that $\{m\} \subseteq N$ while $m = \varphi(N)$ so that $N = R(m)$. Thus R and $\varphi$ are inverse functions.

It is easy to see that $\varphi$ maps $\mathfrak{M}$ onto M. Indeed, if $N_1 \subset N_2$, then $N_1 \subseteq N_2'$ so that $\varphi(N_2) \notin N_1$ whereas $\varphi(N_1) \in N_1$. Hence $\varphi(N_1) \neq \varphi(N_2)$ so that $\varphi$ furnishes a one-to-one correspondence between $\mathfrak{M}$ and M. Therefore there exists an inverse function mapping M onto $\mathfrak{M}$, that is the function R.
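For a finite set M the least chain can be written down directly: it consists of M, M', M'', ... down to the empty set, since this sequence contains M, is closed under the passage from A to A', and the intersection of any non-void part of it is its smallest member. The following Python sketch is an illustration for finite M only; the choice function `phi` is passed in as an ordinary Python function, and the names `zermelo_chain`, `well_ordering` and `theorem10_witnesses` are assumptions of the example.

```python
def zermelo_chain(M, phi):
    """The least chain for a finite set M: M, M', M'', ... down to the empty set."""
    chain, A = [], frozenset(M)
    while A:
        chain.append(A)
        A = A - {phi(A)}          # A' = A - {phi(A)}
    chain.append(A)               # the empty set, reached as the last A'
    return chain

def well_ordering(M, phi):
    """The elements of M in the order phi(M), phi(M'), phi(M''), ..."""
    return [phi(A) for A in zermelo_chain(M, phi) if A]

def theorem10_witnesses(N, M, phi):
    """All chain elements N0 with N a subset of N0 and phi(N0) in N."""
    return [A for A in zermelo_chain(M, phi) if A and set(N) <= A and phi(A) in N]

M = {1, 2, 3}
assert well_ordering(M, max) == [3, 2, 1]
# Theorem 10: exactly one such N0 for every non-void subset N of M.
assert theorem10_witnesses({1, 2}, M, max) == [frozenset({1, 2})]
```

The function R of the text is here the assignment of the chain element N to the element `phi(N)`; a different choice function gives a different well-ordering of the same set.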

Before entering into a more thorough treatment of the well-ordered sets and the ordinals I would like to remind you of some notations I shall use. An initial part A of an ordered set $\Phi$ shall mean a subset A of $\Phi$ such that if $x \in A$ and $y < x$, then always also $y \in A$, or in logical symbols $(x)(y)((x \in A) \;\&\; (y < x) \rightarrow y \in A)$. Similarly a terminal part C of $\Phi$ is to be understood. An interval B shall be used in the meaning $B \subseteq \Phi$ and $(x)(y)(z)(x \in B \;\&\; y \in B \;\&\; (x < z) \;\&\; (z < y) \rightarrow z \in B)$. These parts A, B, C may be closed or open, for example an initial part A may have a last element, then it is said to be closed, or not, then it is open. An interval B may be open or closed or open to the left, closed to the right or inversely. It ought to be noticed that the union of a set of initial parts is again an initial part.

If $a \in \Phi$, the set of all $x < a$ constitutes an initial part. This I shall call the initial section corresponding to a. It ought to be noticed that if $\Phi$ is well-ordered, every initial part which is not $\Phi$ itself is an initial section.

Theorem 11. Let a well-ordered set M be mapped into itself by a function f which preserves the order, that is $a < b \rightarrow f(a) < f(b)$ for all a and $b \in M$. Then for all $m \in M$ we have $m \leq f(m)$.

Proof: Let us assume that the theorem is not true. That would mean that the subset N of M of all those x for which $x > f(x)$ was not void. Let m denote the least element of N. Then we should have

$m > f(m) = m',$

and because $m' \notin N$,

$m' \leq f(m').$

However, since f is order-preserving and $m > m'$, we should have $f(m) > f(m')$, that is $m' > f(m')$.

It follows that if M is mapped by a function f onto M with preservation of order, then $f(x) = x$ for all x. Indeed, according to the theorem, we have $f(x) \geq x$ and $f^{-1}(x) \geq x$, that is, $x = f(x)$.


From this it again follows that if a well-ordered set M is mapped with preservation of order onto another well-ordered set $M'$, then this mapping is unique. Indeed if f and g both map M onto $M'$, then $fg^{-1}$ maps M onto M so that $fg^{-1}(x)$ is x and therefore $f(x) = g(x)$ for all x.

Theorem 12. If M is mapped by f with preservation of order onto an initial part A of itself, then $A = M$ and the mapping is the identical one. We may also say: M cannot be mapped onto an initial section of itself.

Proof: Let f map M onto A, A initial part of M. Then no element m of M can be $>$ every element x of A, because f(m) should belong to A so that $m > f(m)$, which contradicts the previous theorem. Thus every $m \in M$ is $\leq$ an $x \in A$, whence $m \in A$, that is, $A = M$.

Noticing that an initial part of a well-ordered set M is either M itself or a section of M, we have that if $M \simeq N$ (meaning M and N are similar), then M is neither $\simeq N_1$ nor $N \simeq M_1$, $M_1$ and $N_1$ denoting sections of M resp. N.

Theorem 13. Let M and N be well-ordered sets. Then either $M \simeq N_1$, $N_1$ a section of N or $M \simeq N$ or $M_1 \simeq N$, $M_1$ a section of M.

Proof: Let I be the set of all initial parts of M that are similar to initial parts of N constituting a set J. Then the union SI is in an obvious way similar to SJ. Now either SI must be $= M$ or $SJ = N$. Else SI will be the section belonging to an element i of M and SJ the section delivered by $j \in N$. But then $SI + \{i\}$ would be similar to $SJ + \{j\}$ which contradicts the definition of I. Now, if $SI = M$, either $M \simeq N$ or $M \simeq$ a section $N_1$ of N according as SJ is N or $N_1$, else SI is a section $M_1$ of M while $SJ = N$ so that $M_1 \simeq N$.

5. Ordinals and alephs

It is now natural to say that an ordinal $\alpha$ is $<$ an ordinal $\beta$ if $\alpha$ is the order-type of a well-ordered set A, $\beta$ the type of B, such that A is similar to an initial section of B. It is clear that $\alpha < \beta \;\&\; \beta < \gamma \rightarrow \alpha < \gamma$ and that $\alpha < \beta$ excludes $\beta < \alpha$. Thus all ordinals are ordered. However, this ordering is also a well-ordering. Let us namely consider an arbitrary set or even class C of well-ordered sets. Let M be one of the sets in C. Its ordinal number $\mu$ may be the least of all represented by the considered sets. If not there are other sets in C which are similar to sections of M. These sections are furnished by elements of M and among these there is a least one. The corresponding initial section represents then the least ordinal of all furnished by the sets in C.

Theorem 14. A terminal part or an interval of a well-ordered set is similar to some initial part of it.

It is obviously sufficient to prove this for a terminal part. According to the comparability theorem, otherwise the whole set M would have to be similar to an interval of itself, but that contradicts the fact that we should have $x \leq f(x)$ for all $x \in M$.

A consequence of this is that we always have $\alpha \leq \alpha + \beta$ and $\beta \leq \alpha + \beta$.

I have earlier defined addition and multiplication of ordered sets. We may define multiplication and exponentiation for well-ordered sets in such a way that well-ordered sets result. First I will repeat the definition of addition: Let T be a well-ordered set of well-ordered sets A, B, C, ... which we assume mutually disjoint. Then the sum ST is well-ordered thus: Any two elements of the same element X of T retain their order in X. If X precedes Y in T, then every element of X precedes every element of Y in ST. It is indeed easy to see that ST is well-ordered in that way. Let namely M be $\subseteq ST$ and $\neq 0$. Then the diverse $X \in T$ which furnish elements of M constitute a non-void subset of T. Since T is well-ordered there is a least element of this subset, N say. Since N is well-ordered there is a least element m in the subset $M \cap N$ of N. Obviously m is the least element of M.

Multiplication I will define as follows. Let us again consider a well-ordered set T of mutually disjoint well-ordered sets A, B, C, ... $\neq 0$. Let $a_0, b_0, c_0, \ldots$ be the least elements of A, B, C, ... Then I take a subset P of $A \cdot B \cdot C \cdots$ in the previous sense, namely the set P consisting of all elements of $A \cdot B \cdot C \cdots$ which contain only a finite number of elements different from $a_0, b_0, c_0, \ldots$ This set P is then ordered by the principle of last differences, which means that if $a, b, c, \ldots$ and $a', b', c', \ldots$ are two elements of the product, then $a, b, c, \ldots < a', b', c', \ldots$ if $m < m'$ but no later elements $m_1$, $m_1'$ differ.

Exponentiation is defined by letting all factors in a product be similar well-ordered sets.

Lemma. Let T be a well-ordered set of well-ordered sets A, B, C, ... such that if X and Y are elements of T and $X < Y$ in T, then $X \subseteq Y$ and the order of the elements of X remains unaltered in Y. Then the union ST is well-ordered and two elements of ST are ordered as in some element X of T.

Proof: If T contains a last (greatest) element M, then the truth of the lemma is immediately clear, because in this case $ST = M$. Therefore we may assume that T does not contain any last element. Let us then consider a subset N of ST, $0 \subset N$. There will be elements X of T containing elements belonging to N. Let $X_0$ be the first of these X. Then $X_0 \cap N$ is a subset $\neq 0$ of the well-ordered set $X_0$ so that there is a first element in $X_0 \cap N$ which obviously is the first element in N. Thus it is proved that ST is well-ordered. It is evident that two elements of ST will both occur in some element of T and have there the same relation of order.

Now let us consider the product $P$ of the well-ordered set $T$ of well-ordered factors $A, B, C, \ldots$. The product belonging to an initial section of $T$ may be called a partial product and be denoted by $P_X$, if the section of $T$ is given by $X$. It is understood that the elements of $P_X$ shall, for each $Y \geq X$ in $T$, contain $y_0$ only. I shall first prove that if all these partial products are well-ordered, so is $P$. Indeed as often as $X < Y$, $P_X \subseteq P_Y$, so that the partial products constitute a well-ordered set of well-ordered sets of the kind considered in the lemma. Now if there is no last element in $T$ (no last factor in $P$), then $P$ is the union of all $P_X$ and is therefore well-ordered according to the lemma. If there is a last factor $F$, then $P = P_F \cdot F$, where $P_F$ is well-ordered according to supposition, and since the product of two well-ordered sets is well-ordered, $P$ is well-ordered.

Now let us look at the case that some partial products were not well-ordered. There must then be a least $X_0$ among all the $X \in T$ for which $P_X$ is not well-ordered. If $X_0$ has no predecessor, then $P_{X_0}$ is the union of all $P_Y$, where $Y$ precedes $X_0$ in $T$; else, if $F$ is the predecessor, we have $P_{X_0} = P_F \cdot F$, where $P_F$ and $F$ are well-ordered. Further all these $P_Y$ are well-ordered. But then again, according to the lemma (or, in the second case, because the product of two well-ordered sets is well-ordered), $P_{X_0}$ is well-ordered, which is a contradiction. Therefore all partial products are well-ordered, which as we just saw implies that $P$ itself is well-ordered. Thus we have proved:

Theorem 15. The product $P$ of a well-ordered set of well-ordered sets is well-ordered.

I would like to prove that the product $\alpha\beta$ can be conceived as the result of adding $\beta$ sets each of ordinal number $\alpha$. Let $A$ have the ordinal $\alpha$, $B$ the ordinal $\beta$. Then $\alpha\beta$ is the ordinal number of the set $P$ of pairs $(a, b)$ ordered according to last differences as explained. Let $M_b$ be the set of all pairs with the last element $b$ and $T$ the set of all these $M_b$. Then $ST$, well-ordered as explained above, is just the sum $P$ of all $M_b$. Each of these has the ordinal $\alpha$.
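For finite sets the construction can be checked directly. The following Python lines are only an illustrative sketch; the names `A`, `B`, `P` and `blocks` are chosen here and do not occur in the lectures. The pairs are ordered by last differences, and the ordered list is then seen to consist of one block of the type of `A` for each element of `B`.

```python
from itertools import product

A = [0, 1, 2]   # a well-ordered set of ordinal 3
B = [0, 1]      # a well-ordered set of ordinal 2

# Order the pairs (a, b) by last differences: compare b first, then a.
P = sorted(product(A, B), key=lambda pair: (pair[1], pair[0]))

# M_b is the set of pairs with last element b; P is the ordered sum of the M_b.
blocks = [[pair for pair in P if pair[1] == b] for b in B]

print(P)
print(blocks)
```

Each block has as many elements as `A`, and the blocks follow one another in the order of `B`.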

It is easy to verify that the associative laws hold for addition and multiplication. Also the distributive law $\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma$ is seen to be valid. On the other hand, the commutative laws do not hold, nor does the distributive formula $(\alpha + \beta)\gamma = \alpha\gamma + \beta\gamma$. I shall give some examples.

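$$1 + \omega = \omega < \omega + 1,$$

$$2 \cdot \omega = \omega < \omega \cdot 2, \quad \text{and therefore} \quad (1 + 1)\omega = \omega < 1 \cdot \omega + 1 \cdot \omega.$$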

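One can also notice that not always

$$(\alpha\beta)^\gamma = \alpha^\gamma \beta^\gamma.$$

For example

$$(2 \cdot 2)^\omega < 2^\omega \cdot 2^\omega, \qquad (2 \cdot (\omega + 1))^2 > 2^2 (\omega + 1)^2.$$

On the other hand, if $\lambda = \sum_{\xi < \eta} \beta_\xi$, then $\alpha^\lambda = \prod_{\xi < \eta} \alpha^{\beta_\xi}$, and in particular $\alpha^{\beta + \gamma} = \alpha^\beta \cdot \alpha^\gamma$.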
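The examples for addition and multiplication can be verified mechanically for ordinals below $\omega^\omega$. The following Python sketch is an illustration under stated assumptions, not part of the lectures: an ordinal is stored as a tuple of (exponent, coefficient) pairs with decreasing natural-number exponents, and the helper names `add`, `mul`, `nat` and `OMEGA` are chosen here. Exponentiation is not implemented.

```python
# An ordinal below omega^omega is stored as a tuple of (exponent, coefficient)
# pairs with strictly decreasing natural exponents and positive coefficients;
# the empty tuple is 0. Python's tuple ordering then agrees with the ordinal order.

def add(a, b):
    """Ordinal sum a + b: terms of a below the leading term of b are absorbed."""
    if not b:
        return a
    lead_exp, lead_coeff = b[0]
    head = tuple(t for t in a if t[0] > lead_exp)
    same = sum(c for e, c in a if e == lead_exp)
    return head + ((lead_exp, same + lead_coeff),) + b[1:]

def mul(a, b):
    """Ordinal product a * b, computed term by term via a(x + y) = ax + ay."""
    if not a or not b:
        return ()
    exp_a, coeff_a = a[0]
    out = ()
    for e, c in b:
        if e > 0:
            # a * omega^e * c = omega^(exp_a + e) * c for a > 0
            out += ((exp_a + e, c),)
        else:
            # a * c for a finite c > 0 multiplies only the leading coefficient
            out += ((exp_a, coeff_a * c),) + a[1:]
    return out

def nat(n):
    """The finite ordinal n."""
    return ((0, n),) if n else ()

OMEGA = ((1, 1),)

print(add(nat(1), OMEGA) == OMEGA)   # 1 + omega = omega
print(mul(OMEGA, nat(2)))            # omega * 2
```

With this representation `mul(nat(2), add(OMEGA, nat(1)))` gives $\omega + 2$, so the second example above can be checked with two further calls of `mul`.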

We have seen that the ordinal numbers are well-ordered by the relation $<$. It is then natural to ask how the cardinal numbers behave. Because of the comparability of the ordinals it is immediately clear that the cardinal numbers are comparable; indeed, if $M$ and $N$ are any two sets and they are in some way well-ordered, then either $M$ is similar to, and thus equivalent to, some initial part of $N$ or inversely. Thus we have either $\overline{\overline{M}} \leq \overline{\overline{N}}$ or $\overline{\overline{N}} \leq \overline{\overline{M}}$. Now let $T$ be a set of sets. I assert that the cardinal numbers represented by the elements $A, B, C, \ldots$ of $T$ are well-ordered by the relation $<$ as earlier defined. Evidently it suffices to prove that there is a least cardinal represented by the elements of $T$, because then the same will be true for every subset of $T$. Now let $M$ be $\in T$. If $\overline{\overline{M}}$ is the smallest cardinal represented by any element of $T$, then our assertion is correct. Otherwise there will be some elements $X$ of $T$ representing smaller cardinals. All these $X$ we may assume well-ordered. Then each of them is similar to an initial section of $M$ given by an element $m$ of $M$. Among these $m$ there will be a least one $m_0$. The section given by $m_0$ then furnishes the least cardinal number among the mentioned $X$.

Thus the cardinal numbers are also well-ordered by the relation $<$. More exactly expressed: All cardinals $\leq$ a given cardinal constitute a well-ordered sequence according to their magnitude. The least of the transfinite ones, the cardinal of the denumerable sets, we denote, as Cantor did, by $\aleph_0$, the following by $\aleph_1$, and so on.

If $\alpha$ is a transfinite ordinal, i.e. $\omega \leq \alpha$, then we have $1 + \alpha = \alpha$, because we may write $\alpha = \omega + \beta$, whence $1 + \alpha = 1 + (\omega + \beta) = (1 + \omega) + \beta = \omega + \beta = \alpha$. More generally we have of course $n + \alpha = \alpha$, $n$ finite. Further it may be noticed that if $\alpha$ is the ordinal of a set $M$ without last element, or in other words $\alpha$ is without immediate predecessor, then for every finite ordinal $n$ we have $n\alpha = \alpha$. We can first prove that $\alpha = \omega\beta$, whence $n\alpha = n(\omega\beta) = (n\omega)\beta = \omega\beta = \alpha$, since $n\omega$ is evidently $= \omega$. That $\alpha$ indeed is a multiple of $\omega$ is seen by distributing the elements of $M$ into classes by putting any two elements into the same class which are either neighbors or have only a finite number of elements between them. It is clear that every class is of type $\omega$, and the whole set is the sum of a well-ordered set of these classes, which means that $\alpha = \omega\beta$, $\beta$ denoting the ordinal of the set of the classes.

Among all ordinals whose cardinal number is $\aleph_\alpha$ there will be a least, usually written $\omega_\alpha$. This $\omega_\alpha$ belongs to a very remarkable class of ordinals called principal ordinals. The definition is:

An ordinal $\alpha$ is a principal one, if the equation $\alpha = \beta + \gamma$ only has the solutions $\beta < \alpha$, $\gamma = \alpha$ and $\alpha = \beta$, $\gamma = 0$. One may also say that the ordinal represented by a well-ordered set $M$ is principal, if $M$ is similar to every terminal part of itself.

Proof that $\omega_\alpha$ is principal: Let $\omega_\alpha = \beta + \gamma$, $\gamma > 0$. We know that $\gamma$ is the ordinal of some initial part of $M$, if $M$ has the ordinal $\omega_\alpha$. If this initial part of $M$ is not $M$ itself, it is an initial section, so that $\gamma < \omega_\alpha$, and according to the definition of $\omega_\alpha$ we have that the cardinal number $\overline{\gamma}$ of $\gamma$ must be $< \aleph_\alpha$. Further $\overline{\beta}$ is also $< \aleph_\alpha$, because $\beta$ is the ordinal of some initial section of $M$. But the sum of two alephs $< \aleph_\alpha$ is again $< \aleph_\alpha$. Thus $\gamma$ must be $= \omega_\alpha$.

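Since it is clear that every transfinite cardinal $\aleph$ may be given by a well-ordered set without last element (indeed the least ordinal with cardinal number $\aleph$ cannot have a predecessor, because $1 + \aleph = \aleph$), we obtain from the relation $n\alpha = \alpha$ just mentioned that always

$$n\aleph = \aleph$$

for finite $n$. Hence for every aleph $\aleph_\alpha$ in particular $\aleph_\alpha + \aleph_\alpha = \aleph_\alpha$. Further, if $\aleph_\beta < \aleph_\alpha$, we obtain

$$\aleph_\alpha \leq \aleph_\beta + \aleph_\alpha \leq \aleph_\alpha + \aleph_\alpha = \aleph_\alpha,$$

which means that

$$\aleph_\beta + \aleph_\alpha = \aleph_\alpha.$$

Thus the sum of two alephs is the greater one of them. Further, if $\aleph_\beta$ and $\aleph_\gamma$ are both $< \aleph_\alpha$, also $\aleph_\beta + \aleph_\gamma < \aleph_\alpha$.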

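The division of ordinals may be performed thus. Let $\alpha$ be given and $\beta > 0$. We consider the ordinals $\gamma$ which are such that for some $\delta$

$$\alpha = \beta\gamma + \delta.$$

I assert that there is a greatest value of $\gamma$ here. Indeed the assumption that the $\beta\gamma_\lambda$, where $\gamma_1 < \gamma_2 < \ldots$, are all $\leq \alpha$ yields $\beta \lim \gamma_\lambda \leq \alpha$, where $\lim \gamma_\lambda$ is the least ordinal $>$ every $\gamma_\lambda$. This is perhaps most easily seen by writing $\gamma_2 = \gamma_1 + \gamma_2'$, $\gamma_3 = \gamma_2 + \gamma_3'$, ..., and generally $\gamma_{\lambda+1} = \gamma_\lambda + \gamma_{\lambda+1}'$. Then $\lim \gamma_\lambda = \sum_\lambda \gamma_\lambda'$, putting $\gamma_1 = \gamma_1'$, and we have by the distributive law for multiplication

$$\beta \lim \gamma_\lambda = \beta \sum_\lambda \gamma_\lambda' = \sum_\lambda \beta\gamma_\lambda'.$$

But the several $\beta\gamma_\lambda'$ will represent the ordinals of different disjoint intervals of a well-ordered set of ordinal $\alpha$. Thus $\sum_\lambda \beta\gamma_\lambda' \leq \alpha$.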

If $\kappa$ is the greatest value of $\gamma$, we have

$$\alpha = \beta\kappa + \rho, \quad \rho < \beta.$$

Indeed, if $\rho$ were $= \beta + \rho'$, we should obtain $\alpha = \beta(\kappa + 1) + \rho'$, so that $\kappa$ would not be the maximal $\gamma$.

In the particular case $\beta = \omega$ we get

$$\alpha = \omega\kappa + n, \quad n \text{ finite}.$$

Thus we again get the above result, that if $\alpha$ is the ordinal of a well-ordered set without last element, it is of the form $\omega\kappa$.

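It is easily seen that $\beta^{\lim \gamma_\lambda} = \lim \beta^{\gamma_\lambda}$. As a consequence of this there is a maximal power $\beta^{\gamma_1} \leq \alpha$. Then the division of $\alpha$ by $\beta^{\gamma_1}$ yields

$$\alpha = \beta^{\gamma_1}\nu_1 + \alpha', \quad \alpha' < \beta^{\gamma_1}, \quad \nu_1 < \beta.$$

Now again there is a maximal power of $\beta$, $\beta^{\gamma_2}$ say, $\leq \alpha'$. Then we obtain

$$\alpha' = \beta^{\gamma_2}\nu_2 + \alpha'', \quad \alpha'' < \beta^{\gamma_2}, \quad \nu_2 < \beta.$$

Since the sequence $\alpha, \alpha', \alpha'', \ldots$ is decreasing, there is a least one, which must be $0$. Then we have

$$\alpha = \sum_{r=1}^{m} \beta^{\gamma_r}\nu_r, \quad m \text{ finite, all } \nu_r < \beta.$$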

Of particular interest is the case $\beta = \omega$. We obtain the result that every ordinal can be written in the form

$$\alpha = \sum_{r=1}^{m} \omega^{\gamma_r} n_r, \quad \gamma_1 > \gamma_2 > \ldots,$$

$m$ positive and finite, all $n_r$ positive and finite. It is clear by the method of construction that this form is unique.
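When $\alpha$ and $\beta$ are both finite, the procedure of repeated division by the maximal power reduces to the ordinary expansion of a number in base $\beta$. The following Python sketch covers only this finite case, as an illustration; the function name `expand` is chosen here, and infinite ordinals are not handled.

```python
def expand(alpha, beta):
    """Write a finite alpha as a sum of beta**g * nu with decreasing g, 0 < nu < beta."""
    if beta < 2 or alpha < 0:
        raise ValueError("this sketch needs beta >= 2 and alpha >= 0")
    terms = []
    while alpha > 0:
        # Find the maximal power beta**g that is <= alpha.
        g = 0
        while beta ** (g + 1) <= alpha:
            g += 1
        # Divide: alpha = beta**g * nu + remainder, with remainder < beta**g.
        nu, alpha = divmod(alpha, beta ** g)
        terms.append((g, nu))
    return terms

print(expand(37, 3))
```

The remainders decrease at every step, which is the finite counterpart of the decreasing sequence $\alpha, \alpha', \alpha'', \ldots$ used above.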


It is seen that $\alpha$ cannot be principal without being simply a power of $\omega$. On the other hand every power of $\omega$ is easily seen to be principal.

If $\gamma_1$ is kept fixed in the above expression while $\gamma_2, \gamma_3, \ldots$, $m$, and the $n_r$ vary, we get all numbers $< \omega^{\gamma_1 + 1}$. If also $\gamma_1$ varies but is kept $< \alpha$, $\alpha$ a limit number, we get all ordinals $< \omega^\alpha$. I will show how we can set up a very simple one-to-one correspondence between the elements of a well-ordered set $M$ of ordinal equal to a power of $\omega$ on the one hand and the ordered pairs $(a, b)$ which are the elements of $M^2$ on the other. To every pair

$$\alpha = \sum_{k \leq q} \omega^{\alpha_k} m_k, \qquad \beta = \sum_{k \leq q} \omega^{\alpha_k} n_k$$

we let correspond the number

$$\gamma = \sum_{k \leq q} \omega^{\alpha_k} f(m_k, n_k),$$

where $f(m_k, n_k)$ is a one-to-one correspondence between the non-negative integers and their pairs. We set $\gamma = 0$ for $\alpha = \beta = 0$.
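The correspondence can be tried out on ordinals with natural-number exponents. In the following Python sketch the function $f$ is taken to be the Cantor pairing function, which is only one possible choice and is an assumption made here; the names `pair`, `unpair`, `merge` and `split` are likewise illustrative. An ordinal is stored as a dictionary from exponents to positive coefficients.

```python
from math import isqrt

def pair(m, n):
    """Cantor pairing: a bijection from pairs of naturals to naturals, pair(0, 0) = 0."""
    s = m + n
    return s * (s + 1) // 2 + n

def unpair(z):
    """Inverse of pair."""
    s = (isqrt(8 * z + 1) - 1) // 2
    n = z - s * (s + 1) // 2
    return s - n, n

def merge(a, b):
    """Map the pair (a, b) to one ordinal by pairing coefficients of equal exponents."""
    return {e: pair(a.get(e, 0), b.get(e, 0)) for e in set(a) | set(b)}

def split(g):
    """Recover the pair (a, b) from merge(a, b)."""
    a, b = {}, {}
    for e, z in g.items():
        m, n = unpair(z)
        if m:
            a[e] = m
        if n:
            b[e] = n
    return a, b

alpha = {2: 1, 0: 3}   # omega^2 + 3
beta = {1: 4, 0: 5}    # omega*4 + 5
gamma = merge(alpha, beta)
print(gamma, split(gamma) == (alpha, beta))
```

Since `pair(m, n)` is zero only for `m = n = 0`, every exponent occurring in `gamma` carries a positive coefficient, and the highest exponent of `gamma` is the highest exponent occurring in `alpha` or `beta`.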

If this is applied to $\omega_\alpha$, considering the cardinal number $\aleph_\alpha$, we obtain

$$\aleph_\alpha^2 = \aleph_\alpha.$$

Of course we then also get $\aleph_\alpha^n = \aleph_\alpha$ by an easy induction. Because of the well-ordering theorem we then have that $\mathfrak{m}^2 = \mathfrak{m}$ for every transfinite cardinal $\mathfrak{m}$. It is now very remarkable that, if inversely it is presupposed that this formula is valid for every transfinite cardinal number $\mathfrak{m}$, then every set can be well-ordered. Thus we have

Theorem 16. The general validity of $\mathfrak{m}^2 = \mathfrak{m}$ implies the general principle of choice and inversely.

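If we look at the proof of the earlier theorem stating that $\mathfrak{m}$ and $\mathfrak{n}$ are comparable when $\mathfrak{m} + \mathfrak{n} = \mathfrak{m}\mathfrak{n}$, we notice that if $\mathfrak{n}$, say, is an aleph, then we need not use the axiom of choice in the proof. Further, if simultaneously it is known that $\mathfrak{n}$ is not $\leq \mathfrak{m}$, we get $\mathfrak{m} \leq \mathfrak{n}$, and then $\mathfrak{m}$ is an aleph.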

Now $\mathfrak{m}$ being an arbitrary cardinal number, it is always possible to define an aleph which is not $\leq \mathfrak{m}$. This was first done by F. Hartogs (Math. Ann. 76, 438, 1915). Let $M$ be a set such that $\overline{\overline{M}} = \mathfrak{m}$. There are some subsets of $M$ which can be well-ordered. We take into account all well-orderings of all these subsets and distribute these well-ordered subsets into classes of similarity. Every such class is then a set corresponding to an ordinal, and these sets constitute again a certain set. To the ordinals represented by the members of this set there exist always greater ordinals, e.g. the sum of all the ordinals. Among these greater ordinals there is a least one, $\lambda$ say. Then $\overline{\lambda}$ is not $\leq \mathfrak{m}$, because this would mean that there exists a subset of $M$ which can be well-ordered with ordinal number $\lambda$, whereas $\lambda$ is greater than every ordinal $\alpha$ for which this was the case. Thus $\overline{\lambda}$ is an aleph which cannot be $\leq \mathfrak{m}$.

Hence the correctness of our assertion, that if always $\mathfrak{m} + \mathfrak{n} = \mathfrak{m}\mathfrak{n}$ then every set is well-ordered. However, to be perfectly correct we must assume $\mathfrak{m}^2 = \mathfrak{m}$ for any inductive infinite cardinal number.

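Now if always $\mathfrak{m}^2 = \mathfrak{m}$, we have $(\mathfrak{m} + \mathfrak{n})^2 = \mathfrak{m} + \mathfrak{n}$, whence at any rate

$$\mathfrak{m}\mathfrak{n} \leq \mathfrak{m} + \mathfrak{n}.$$

However we have proved earlier that if $\mathfrak{m}$ and $\mathfrak{n}$ are $\geq 2$, then $\mathfrak{m} + \mathfrak{n} \leq \mathfrak{m} \cdot \mathfrak{n}$. Thus we obtain $\mathfrak{m}\mathfrak{n} = \mathfrak{m} + \mathfrak{n}$.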

6. Some remarks on functions of ordinal numbers

A function $f(x)$ is called monotonic, if $(x < y) \to (f(x) \leq f(y))$. It is called strictly increasing, if

$$(x < y) \to (f(x) < f(y)).$$

The function is called seminormal, if it is monotonic and continuous, that is if $f(\lim \alpha_\lambda) = \lim f(\alpha_\lambda)$, $\lambda$ here indicating a sequence with ordinal number of the second kind, i.e., without immediate predecessor, while $(\lambda_1 < \lambda_2) \to (\alpha_{\lambda_1} < \alpha_{\lambda_2})$.

The function is called normal, if it is strictly increasing and continuous; $\xi$ is called a critical number for $f$, if $f(\xi) = \xi$.

Theorem 17. Every normal function possesses critical numbers and indeed such numbers $>$ any $\alpha$.

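Proof: Let $\alpha$ be chosen arbitrarily and let us consider the sequence $\alpha, f(\alpha), f^2(\alpha), \ldots$. Then if $\alpha_\omega = \lim f^n(\alpha)$, we have $f(\alpha_\omega) = f(\lim f^n(\alpha)) = \lim f^{n+1}(\alpha) = \alpha_\omega$, that is, $\alpha_\omega$ is a critical number for $f$.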

Examples.

1) The function $1 + x$ is normal. Critical numbers are all $x = \omega + \alpha$, $\alpha$ arbitrary.

2) The function $2x$ is normal. Critical numbers are all of the form $\omega\alpha$, $\alpha$ arbitrary.

3) The function $\omega^x$ is normal. Critical numbers of this function are called $\varepsilon$-numbers. The least of them is the limit of the sequence $\omega, \omega^\omega, \omega^{\omega^\omega}, \ldots$

I will mention the quite trivial fact that every increasing function $f$ is such that $f(x) \geq x$ for every $x$.

Theorem 18. Let $g(x) \geq x$ for all $x$ and $\alpha$ be an arbitrary ordinal; then there is a unique semi-normal function $f$ such that

$$f(0) = \alpha, \quad f(x + 1) = g(f(x)).$$

Proof clear by transfinite induction.

Theorem 19. If $f$ is a semi-normal function and $\beta$ is an ordinal which is not a value of $f$, while $f$ possesses values $< \beta$ and values $> \beta$, then there is among the $x$ such that $f(x) < \beta$ a maximal one $x_0$ such that

$$f(x_0) < \beta < f(x_0 + 1).$$

Proof trivial, because if $f(x_\lambda) < \beta$ for all $\lambda$ in a sequence without last element, then

$$f(\lim x_\lambda) = \lim f(x_\lambda) \leq \beta,$$

but the equality sign is excluded.

Let $A$ be a set of ordinal numbers without maximal element. A subset $B$ is said to be closed in $A$, if every limit of a sequence in $B$ is $\in B$, if it is $\in A$. If $B$ is closed in $A$ and cofinal with $A$ it is called a band of $A$.

Remark. Every band consists of the values of a normal function, and the inverse is true, if the set of the arguments is cofinal with $A$.

Theorem 20. If M and N are bands of A, so is M U N.

Proof. Of course $M \cup N$ is cofinal with $A$. An arbitrary sequence $S$ in $M \cup N$ without last element is either such that from a certain point on all elements belong to $M$, say, and then the limit is in $M$; or there are always greater elements both in $M$ and in $N$, and then there is a common limit in $M$ and $N$.

Theorem 21. If $M$ and $N$ are bands of $A$ and $A$ is, as already indicated, without last element, but not cofinal with $\omega$, then $M \cap N$ is a band of $A$.

Proof. We assume that after a certain $\alpha_0 \in M$ there are no common elements in $M$ and $N$. Then we have an increasing sequence thus:

$\alpha_{2n+1}$ is the first element of $N$ which is $> \alpha_{2n}$,

$\alpha_{2n+2}$ is the first element of $M$ which is $> \alpha_{2n+1}$.

Then $\lim_{n < \omega} \alpha_n$ is $\in A$ and therefore $\in M$ and $\in N$, which is contrary to the assumption.

Theorem 22. Let $f(\alpha, \beta)$ be normal with respect to $\beta$. Then it is not an always increasing function with respect to $\alpha$.

Proof. If $\alpha_1 < \alpha_2$, then the normal functions $f(\alpha_1, \beta)$ and $f(\alpha_2, \beta)$ of $\beta$ have a common critical value $\xi$ according to the last theorem, so that $f(\alpha_1, \xi) = f(\alpha_2, \xi) = \xi$.

Let us however, following E. Jacobsthal, consider the functions having the following two properties:

1) $f(\alpha, \beta)$ is for constant $\alpha$ a normal function of $\beta$.

2) $f(\alpha, \beta)$ is for constant $\beta$ a monotonic function of $\alpha$ with $f(\alpha, \beta) > \alpha$.

Further let us call $f_1$ a generating function for $f$ when

$$f(\alpha, \beta + 1) = f_1(f(\alpha, \beta), \alpha).$$

This equation together with $f(\alpha, 0)$ defines $f$ when $f$ is continuous.

Theorem 23. If $f_1$ has for $\alpha > 1$, $\beta > 1$ the property 2) and is monotonic in $\beta$, while $f$ is continuous and $f(\alpha, 1)$ increasing in $\alpha$, then $f$ satisfies 1) and 2).

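Proof. When $\alpha > 1$, one has $f(\alpha, 1) > 1$, namely $f(\alpha, 1) \geq \alpha > 1$. If, for $\alpha > 1$ and a given $\beta \geq 1$, $f(\alpha, \beta)$ is monotonic in $\alpha$ and $f(\alpha, \beta) > 1$, then because of the definition of $f$ above $f(\alpha, \beta + 1)$ is monotonic in $\alpha$ and $f(\alpha, \beta + 1) = f_1(f(\alpha, \beta), \alpha) > f(\alpha, \beta)$ (see 2)). If $\lambda$ is a limit number, and if, for $\alpha > 1$ and $1 < \beta < \lambda$, $f(\alpha, \beta)$ is monotonic in $\alpha$, then $f(\alpha, \lambda)$ is monotonic in $\alpha$. Thus for $\alpha > 1$ and $\beta > 1$ we have that $f(\alpha, \beta)$ is monotonic in $\alpha$ and a normal function in $\beta$. Further, for $\alpha > 1$ we have, because of $f(\alpha, 1) \geq \alpha$, also $f(\alpha, \beta) > \alpha$ for $\beta > 1$.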

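Now, if one starts with $\varphi_0(\alpha, \beta) = \alpha + 1$ and defines $\varphi_{r+1}(\alpha, \beta)$ by using $\varphi_r$ as generating function for $r = 0, 1, 2$, putting $\varphi_1(\alpha, 0) = \alpha$, $\varphi_2(\alpha, 0) = 0$, $\varphi_3(\alpha, 0) = 1$, then we obtain

$$\varphi_1(\alpha, \beta) = \alpha + \beta, \quad \varphi_2(\alpha, \beta) = \alpha \cdot \beta, \quad \varphi_3(\alpha, \beta) = \alpha^\beta.$$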

An immediate result is that these functions have the properties 1) and 2).
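For finite arguments the generation of sum, product and power from the successor step can be run directly. The Python sketch below is restricted to natural numbers, so the continuity condition at limit numbers plays no part; the helper name `generate` and its parameter `base` (the prescribed value $f(\alpha, 0)$) are chosen here for illustration.

```python
def generate(f1, base):
    """Return f with f(a, 0) = base(a) and f(a, b + 1) = f1(f(a, b), a)."""
    def f(a, b):
        value = base(a)
        for _ in range(b):
            value = f1(value, a)
        return value
    return f

def phi0(a, b):
    # The starting function: it adds 1 to its first argument.
    return a + 1

phi1 = generate(phi0, lambda a: a)   # phi1(a, 0) = a
phi2 = generate(phi1, lambda a: 0)   # phi2(a, 0) = 0
phi3 = generate(phi2, lambda a: 1)   # phi3(a, 0) = 1

print(phi1(3, 4), phi2(3, 4), phi3(3, 4))
```

For the arguments 3 and 4 the three values are the sum, the product and the power of 3 and 4.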

Definitions: 1) Let us say that $f$ with generating function $f_1$ satisfies a generalized distributive law when a function $f_2$ exists such that

$$(1) \quad f_1(f(\alpha, \beta), f(\alpha, \gamma)) = f(\alpha, f_2(\beta, \gamma)).$$

If $f_2 = f_1$, we say that $f$ satisfies the special distributive law.

2) We may say that $f$ fulfills a generalized associative law, if a function $f_3$ exists such that

$$(2) \quad f(f(\alpha, \beta), \gamma) = f(\alpha, f_3(\beta, \gamma)).$$

If $f_3 = f$, $f$ satisfies the special associative law.

Theorem 24. If $f$ satisfies the general associative law, then $f_3$ satisfies the special associative law.

Proof. If in the formula (2) we put $\alpha = f(\xi, \alpha')$, $\beta = \beta'$, $\gamma = \gamma'$, the formula (2) yields

$$f(f(f(\xi, \alpha'), \beta'), \gamma') = f(f(\xi, \alpha'), f_3(\beta', \gamma')),$$

and by application of (2) twice on the left and once on the right side we get

$$f(f(\xi, f_3(\alpha', \beta')), \gamma') = f(\xi, f_3(f_3(\alpha', \beta'), \gamma')) = f(\xi, f_3(\alpha', f_3(\beta', \gamma'))),$$

whence, because $f(\xi, \beta)$ is increasing in $\beta$,

$$f_3(f_3(\alpha', \beta'), \gamma') = f_3(\alpha', f_3(\beta', \gamma')),$$

and that is the special associative law for $f_3$.

Theorem 25. If $f$, being generated by $f_1$, satisfies both laws (1) and (2), then $f_2$ is generating function of $f_3$ and $f_3$ satisfies the special distributive law.

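Proof. We have

$$f(f(\alpha, \beta), \gamma + 1) = f_1(f(f(\alpha, \beta), \gamma), f(\alpha, \beta)) = f_1(f(\alpha, f_3(\beta, \gamma)), f(\alpha, \beta)) = f(\alpha, f_2(f_3(\beta, \gamma), \beta))$$

and

$$f(f(\alpha, \beta), \gamma + 1) = f(\alpha, f_3(\beta, \gamma + 1)),$$

whence

$$f_3(\beta, \gamma + 1) = f_2(f_3(\beta, \gamma), \beta),$$

that is, $f_2$ is generating function for $f_3$. Further, by (1)

$$f(\xi, f_2(f_3(\alpha, \beta), f_3(\alpha, \gamma))) = f_1(f(\xi, f_3(\alpha, \beta)), f(\xi, f_3(\alpha, \gamma))),$$

which by (2), (1), (2) successively yields

$$f_1(f(f(\xi, \alpha), \beta), f(f(\xi, \alpha), \gamma)) = f(f(\xi, \alpha), f_2(\beta, \gamma)) = f(\xi, f_3(\alpha, f_2(\beta, \gamma))).$$

By comparison of the first and last expressions containing $\xi$ one obtains

$$f_2(f_3(\alpha, \beta), f_3(\alpha, \gamma)) = f_3(\alpha, f_2(\beta, \gamma)),$$

that is, $f_3$ satisfies the special distributive law.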

Theorem 26. If $f$ is defined by $f_1$, $f(\alpha, 0) = 0$ or $1$, $f$ satisfying the generalized distributive law, and if $f_3$ is defined as a continuous function with $f_2$ as generating function, by

$$f_3(\alpha, 0) = 0,$$

$$f_3(\alpha, \beta + 1) = f_2(f_3(\alpha, \beta), \alpha),$$

then $f$ satisfies the associative law (2).

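Proof. This law (2) is valid for $\gamma = 0$, because $f(f(\alpha, \beta), 0) = 0$ or $1$ and $f(\alpha, f_3(\beta, 0)) = f(\alpha, 0) = 0$ or $1$. If the law is valid for $\gamma$, then it is valid for $\gamma + 1$, because

$$f(f(\alpha, \beta), \gamma + 1) = f_1(f(f(\alpha, \beta), \gamma), f(\alpha, \beta)),$$

which because of the supposition of induction is $= f_1(f(\alpha, f_3(\beta, \gamma)), f(\alpha, \beta)) = f(\alpha, f_2(f_3(\beta, \gamma), \beta)) = f(\alpha, f_3(\beta, \gamma + 1))$. If the law is valid for all $\gamma < \gamma_0$, $\gamma_0$ a limit number, then it is true for $\gamma_0$, because

$$f(f(\alpha, \beta), \gamma_0) = \lim_{\gamma < \gamma_0} f(f(\alpha, \beta), \gamma) = \lim_{\gamma < \gamma_0} f(\alpha, f_3(\beta, \gamma)) = f(\alpha, f_3(\beta, \gamma_0)).$$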

Theorem 27. Let $f$ be defined by $f_1$, $f(\alpha, 0) = 0$, $f_1(\alpha, 0) = \alpha$ or $f(\alpha, 0) = 1$, $f_1(\alpha, 1) = \alpha$, while the special associative law is valid for $f_1$, and $f_1$ is continuous in $\beta$; then $f$ satisfies the distributive law (1) with $f_2(\alpha, \beta) = \alpha + \beta$.

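Proof. The formula (1) is valid for $\gamma = 0$, because $f_1(f(\alpha, \beta), f(\alpha, 0)) = f(\alpha, \beta)$. Let us assume its truth for $\gamma$. Then we have

$$f_1(f(\alpha, \beta), f(\alpha, \gamma + 1)) = f_1(f(\alpha, \beta), f_1(f(\alpha, \gamma), \alpha)),$$

and since the special associative law is valid for $f_1$ this becomes

$$f_1(f_1(f(\alpha, \beta), f(\alpha, \gamma)), \alpha) = f_1(f(\alpha, \beta + \gamma), \alpha) = f(\alpha, \beta + \gamma + 1).$$

If formula (1) with $f_2(\alpha, \beta) = \alpha + \beta$ is valid for all $\gamma < \gamma_0$, $\gamma_0$ a limit number, then it is valid for $\gamma_0$, because

$$f_1(f(\alpha, \beta), f(\alpha, \gamma_0)) = \lim_{\gamma < \gamma_0} f_1(f(\alpha, \beta), f(\alpha, \gamma)) = \lim_{\gamma < \gamma_0} f(\alpha, \beta + \gamma) = f(\alpha, \beta + \gamma_0).$$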

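Applying the last two theorems to the three elementary arithmetical operations, $\varphi_1(\alpha, \beta) = \alpha + \beta$, $\varphi_2(\alpha, \beta) = \alpha\beta$, $\varphi_3(\alpha, \beta) = \alpha^\beta$, it is seen that the associative and distributive laws of these are all derivable from the special associative law of addition

$$(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma).$$

Indeed, if we put $f_1 = \varphi_1$, $f = \varphi_2$ in Theorem 27 we get

$$\alpha\beta + \alpha\gamma = \alpha(\beta + \gamma),$$

and putting $f_1 = \varphi_1$, $f_2 = \varphi_1$, $f = \varphi_2$, $f_3 = \varphi_2$, Theorem 26 yields

$$(\alpha\beta)\gamma = \alpha(\beta\gamma).$$

Further, if we put $f_1 = \varphi_2$, $f = \varphi_3$, Theorem 27 yields

$$\alpha^\beta \cdot \alpha^\gamma = \alpha^{\beta + \gamma},$$

while putting $f_1 = \varphi_2$, $f_2 = \varphi_1$, $f = \varphi_3$, $f_3 = \varphi_2$ one obtains, according to Theorem 26,

$$(\alpha^\beta)^\gamma = \alpha^{\beta\gamma}.$$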
7. On the exponentiation of alephs

We have seen that an aleph is unchanged by elevation to a power with finite exponent. I shall add some remarks concerning the case of a transfinite exponent.

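Since $2^{\aleph_0} > \aleph_0$, we have $(2^{\aleph_0})^{\aleph_0} \geq \aleph_0^{\aleph_0}$, but $(2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0 \aleph_0} = 2^{\aleph_0}$.

On the other hand $2^{\aleph_0} \leq \aleph_0^{\aleph_0}$. Hence

$$2^{\aleph_0} = \aleph_0^{\aleph_0}.$$

Of course we then have for arbitrary finite $n \geq 2$

$$n^{\aleph_0} = 2^{\aleph_0},$$

and not only that. Let namely $\aleph_0 < \mathfrak{m} \leq 2^{\aleph_0}$. Then

$$2^{\aleph_0} \leq \mathfrak{m}^{\aleph_0} \leq (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0},$$

whence

$$\mathfrak{m}^{\aleph_0} = 2^{\aleph_0}.$$

In a similar way we obtain for an arbitrary $\aleph_\alpha$

$$\mathfrak{m}^{\aleph_\alpha} = 2^{\aleph_\alpha}$$

for all $\mathfrak{m} > 1$ and $\leq 2^{\aleph_\alpha}$.

From our axioms, in particular the axiom of choice, we have derived that every cardinal is an aleph. Therefore $2^{\aleph_\alpha}$ is an aleph. We can also prove by the axiom of choice that $2^{\aleph_\alpha} > \aleph_{\alpha+1}$ or perhaps $= \aleph_{\alpha+1}$. One has never succeeded in proving one of these two alternatives, and according to a result of Gödel such a decision is impossible. However, in many applications of set theory it has been convenient to introduce the so-called generalized continuum hypothesis or aleph hypothesis, namely

$$2^{\aleph_\alpha} = \aleph_{\alpha+1}.$$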


In particular the equation $2^{\aleph_0} = \aleph_1$ is called the continuum hypothesis. Of course this assumption means that we introduce a new axiom, namely the following: Let $M$ be a well-ordered set, $UM$ as usual the set of its subsets, and $N$ such a well-ordered set that every initial section of $N$ is equivalent to a subset of $M$, while $N$ itself is not. Then there exists in our domain $D$ a set $\Phi$ of ordered pairs which yields a one-to-one correspondence between $UM$ and $N$.

If we have the axiom of choice, we may say more simply that if $M$ is infinite, then every subset of $UM$ is either $\sim$ a subset of $M$ or it is $\sim UM$.

On the other hand there are a few aleph formulas which can be proved without the (generalized) continuum hypothesis. I shall give some of these.

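A theorem of König says:

Theorem 28. If $\gamma$ runs through all ordinals $< \lambda$, where $\lambda$ is a limit number, then

$$\sum_{\gamma < \lambda} \aleph_\gamma < \prod_{\gamma < \lambda} \aleph_\gamma.$$

This follows from the general inequality theorem of Zermelo proved earlier.

By the way, we have $\sum_{\gamma < \lambda} \aleph_\gamma = \aleph_\lambda$ of course. As a particular case we have $\aleph_\omega < \aleph_0 \aleph_1 \aleph_2 \cdots$. Since $\aleph_0 \aleph_1 \aleph_2 \cdots$ is $\leq \aleph_\omega^{\aleph_0}$, we obtain the inequality

$$\aleph_\omega < \aleph_\omega^{\aleph_0}.$$

Similarly $\aleph_{\omega_1}^{\aleph_1}$ is $> \aleph_{\omega_1}$, etc.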

An equation of Hausdorff is

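Theorem 29. $\aleph_{\alpha+1}^{\aleph_\beta} = \aleph_\alpha^{\aleph_\beta} \cdot \aleph_{\alpha+1}$,

where $\alpha$ and $\beta$ are arbitrary ordinals.

Proof. 1) Let $\alpha < \beta$ so that $\alpha + 1 \leq \beta$. Then, since

$$\aleph_{\alpha+1} \leq \aleph_\beta < 2^{\aleph_\beta} = \aleph_\alpha^{\aleph_\beta} = \aleph_{\alpha+1}^{\aleph_\beta},$$

the asserted equation holds, both sides being $= 2^{\aleph_\beta}$.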

2) Let $\alpha \geq \beta$. Then we can write

= ...,

whence the asserted equation.

A theorem of Tarski is:

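Theorem 30. If $\overline{\gamma} \leq \aleph_\beta$, then $\aleph_{\alpha+\gamma}^{\aleph_\beta} = \aleph_\alpha^{\aleph_\beta} \cdot \aleph_{\alpha+\gamma}^{\overline{\gamma}}$.

The proof can be given by transfinite induction with respect to $\gamma$. The theorem is true for $\gamma = 0$. Let us assume its truth for $\gamma$. Then by Theorem 29

$$\aleph_{\alpha+\gamma+1}^{\aleph_\beta} = \aleph_{\alpha+\gamma}^{\aleph_\beta} \, \aleph_{\alpha+\gamma+1} = \aleph_\alpha^{\aleph_\beta} \, \aleph_{\alpha+\gamma}^{\overline{\gamma}} \, \aleph_{\alpha+\gamma+1} = \aleph_\alpha^{\aleph_\beta} \, \aleph_{\alpha+\gamma+1}^{\overline{\gamma+1}}.$$

Now let $\lambda$ be a limit number such that $\overline{\lambda} \leq \aleph_\beta$, while the theorem is assumed valid for all $\gamma < \lambda$. Then

$$\aleph_{\alpha+\lambda} = \sum_{\gamma < \lambda} \aleph_{\alpha+\gamma} < \prod_{\gamma < \lambda} \aleph_{\alpha+\gamma}$$

according to the theorem of König. Hence

$$\aleph_{\alpha+\lambda}^{\aleph_\beta} \leq \prod_{\gamma < \lambda} \aleph_{\alpha+\gamma}^{\aleph_\beta} = \prod_{\gamma < \lambda} \aleph_\alpha^{\aleph_\beta} \, \aleph_{\alpha+\gamma}^{\overline{\gamma}} = (\aleph_\alpha^{\aleph_\beta})^{\overline{\lambda}} \prod_{\gamma < \lambda} \aleph_{\alpha+\gamma}^{\overline{\gamma}} \leq \aleph_\alpha^{\aleph_\beta} \, \aleph_{\alpha+\lambda}^{\overline{\lambda}},$$

while on the other hand

$$\aleph_\alpha^{\aleph_\beta} \, \aleph_{\alpha+\lambda}^{\overline{\lambda}} \leq \aleph_{\alpha+\lambda}^{\aleph_\beta} \, \aleph_{\alpha+\lambda}^{\aleph_\beta} = \aleph_{\alpha+\lambda}^{\aleph_\beta}.$$

Therefore the theorem is valid for $\lambda$ and is proved.

I shall further mention without proof the following two theorems: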

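1) In order that $2^{\aleph_\alpha} = \aleph_\beta$ it is necessary and sufficient that $\beta$ is the least ordinal number $\xi$ such that $\aleph_\xi^{\aleph_\alpha} < \aleph_{\xi+1}^{\aleph_\alpha}$.

2) We have $2^{\aleph_\alpha} = \aleph_\beta$ if and only if $\beta$ is the least ordinal number $\xi$ such that $\aleph_\xi^{\aleph_\alpha} = \aleph_\xi$.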

A further question concerning the cardinal numbers is whether the so-called inaccessible cardinals exist. An aleph $\aleph_\Omega$ would be called inaccessible if $\omega_\Omega = \Omega$, or if one prefers, $\Omega = \aleph_\Omega$. This question may again be undecidable, so that the introduction of further axioms might be desirable. However, I will not pursue this subject further here.


8. Sets representing ordinals

There exists a class of sets of such a particular structure that they may suitably be said to represent ordinal numbers. I shall first mention the definition by R. M. Robinson (1937).

A set M is an ordinal, if

1) $M$ is transitive. That a set $M$ is transitive means that it contains its union. In symbols: $(x)(y)((x \in y) \;\&\; (y \in M) \to (x \in M))$.

2) Every non-empty subset $N$ of $M$ is basic, which means that it is disjoint to one of its elements. In logical symbols: $(Ex)(x \in N \;\&\; (x \cap N = 0))$.

3) If $A \neq B$, $A \in M$ and $B \in M$, then either $A \in B$ or $B \in A$.

I shall call every set $M$ with the properties 1), 2), 3) an R-ordinal.
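On hereditarily finite sets the three properties can be tested by brute force. The following Python sketch models sets as nested `frozenset` objects; this modelling and the function names are assumptions made here for illustration. Since such objects are always well-founded, property 2) can never fail in this model, but it is checked all the same.

```python
from itertools import combinations

def nonempty_subsets(s):
    """All non-empty subsets of the finite set s."""
    items = list(s)
    return (frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r))

def is_transitive(m):
    """Property 1): every element of an element of m is an element of m."""
    return all(x in m for y in m for x in y)

def is_basic(n):
    """n is disjoint to one of its elements."""
    return any(not (x & n) for x in n)

def is_r_ordinal(m):
    connected = all(a == b or a in b or b in a for a in m for b in m)
    return (is_transitive(m)
            and all(is_basic(n) for n in nonempty_subsets(m))
            and connected)

def von_neumann(k):
    """The set obtained from the empty set by k steps m -> m united with {m}."""
    m = frozenset()
    for _ in range(k):
        m = m | {m}
    return m

zero, one = von_neumann(0), von_neumann(1)
odd = frozenset({zero, one, frozenset({one})})   # transitive, but violates 3)
print(is_r_ordinal(von_neumann(4)), is_r_ordinal(odd))
```

The set `odd` shows that transitivity alone does not suffice: its elements `zero` and `frozenset({one})` are different, yet neither is an element of the other.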

Remark 1. If $\mathfrak{M}$ is a class of R-ordinals, then the intersection of all elements of $\mathfrak{M}$ is again an R-ordinal. Indeed, if $M_0$ is this intersection, we have that if $A \in B$, $B \in M_0$, then $A \in B$, $B \in M$ for every $M$ in $\mathfrak{M}$, whence $A \in M$ because $M$ is transitive, whence $A \in M_0$, because this is valid for every $M$ in $\mathfrak{M}$. Thus $M_0$ is transitive. Let $0 \subset N \subseteq M_0$. Then for any $M$ in $\mathfrak{M}$ we have $0 \subset N \subseteq M$, whence by 2) $N$ is basic, so that $M_0$ has the property 2). Finally let $A$ and $B$ be different and $\in M_0$. Then for any $M$ in $\mathfrak{M}$ we have $A$ and $B \in M$, whence by 3) either $A \in B$ or $B \in A$. Thus $M_0$ has the property 3).

Remark 2. Further it may be remarked that if $M$ is an R-ordinal we have $M \notin M$, because $M \in M$ would mean that the subset $\{M\}$ of $M$ was not basic.

Theorem 31. Every R-ordinal $M$ is the set of all its transitive proper subsets.

Proof. Let $C$ be $\in M$. Since $M$ is transitive, $C$ must be $\subseteq M$. Indeed $C$ is $\subset M$: $C = M$ is impossible, because that would mean $M \in M$, which is impossible by Remark 2. Further $C$ must be transitive. Indeed let $A \in B$, $B \in C$. Then $B \in M$, whence $B \subseteq M$, whence $A \in M$, whence $A \subseteq M$. By 3) we have either $A \in C$ or $C \in A$ or $A = C$. I assert that $C \in A$ and $C = A$ are impossible. Indeed, $C \in A$ would imply that $\{A, B, C\}$ is not basic, and $C = A$ would mean that $\{A, B\}$ is not basic. Hence $A \in C$, that is, $C$ is transitive. So far I have proved that every element $C$ of $M$ is a transitive proper subset of $M$.

Let, on the other hand, $C$ be a transitive proper subset of $M$. Then $0 \subset M - C$, so that by 2) an element $A$ of $M - C$ exists such that $A \cap (M - C) = 0$. Then, if $B \in C$, neither $A = B$ nor $A \in B$, because of the transitivity of $C$. Therefore $B \in A$, and thus $C \subseteq A$, because $B \in C$ yields $B \in A$ for all $B$. Since $A \subseteq M$ and $A \cap (M - C) = 0$, it follows that $A \subseteq C$, whence $A = C$, whence $C \in M$. Thus I have proved that every transitive proper subset of $M$ is element of $M$.
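For the first few finite ordinals the statement of Theorem 31 can be confirmed by enumeration. The Python sketch below is only an illustration on nested `frozenset` objects; the function names are chosen here, and `von_neumann(k)` builds the set reached from the empty set by $k$ steps $m \mapsto m \cup \{m\}$.

```python
from itertools import combinations

def von_neumann(k):
    m = frozenset()
    for _ in range(k):
        m = m | {m}
    return m

def is_transitive(s):
    return all(x in s for y in s for x in y)

def transitive_proper_subsets(m):
    """The set of all transitive proper subsets of the finite set m."""
    items = list(m)
    # Subsets with fewer than len(items) elements are exactly the proper subsets.
    subsets = (frozenset(c) for r in range(len(items))
               for c in combinations(items, r))
    return frozenset(s for s in subsets if is_transitive(s))

for k in range(6):
    m = von_neumann(k)
    print(k, transitive_proper_subsets(m) == m)
```

For each of these sets the collection of transitive proper subsets coincides with the set itself, as the theorem asserts.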

Remark 3. It is clear according to this that every element of an R-ordinal is an R-ordinal.


Theorem 32. If $A$ and $B$ are R-ordinals, $A \in B \leftrightarrow A \subset B$.

Proof. $A \in B$ yields, because of the transitivity of $B$, $A \subseteq B$, but $A = B$ is excluded. If $A \subset B$, then it follows from the previous theorem that $A \in B$.

Theorem 33. Any class $K$ of R-ordinals is well-ordered by the relation $\in$.

Proof. Let $A \neq B$ both belong to $K$. The intersection $A \cap B$ is, according to Remark 1 above, an R-ordinal. If we had $A \cap B \subset A$ and $\subset B$, then by the preceding theorem $A \cap B$ would be $\in A$ and $\in B$, whence $A \cap B \in A \cap B$, which is impossible. Thus either $A \subset B$ or $B \subset A$, whence $A \in B$ or $B \in A$, so that $K$ is linearly ordered by $\in$. Now let $K'$ be a non-empty subclass of $K$ and $D$ be the intersection of all elements of $K'$. According to Remark 1 above, $D$ is an R-ordinal, and if $A$ belongs to $K'$, $D \subseteq A$ and therefore $D \in A$ whenever $A \neq D$. On the other hand $D$ must itself belong to $K'$, for if it did not, $D$ would be element of each $A$ in $K'$ and thus $\in D$, but $D \in D$ is impossible. This shows that there is in $K'$ a first element with regard to the relation $\in$. It is also evident according to this that every R-ordinal is a well-ordered set with regard to the membership relation.

Theorem 34. Every transitive set M of R-ordinals is an R-ordinal.

Proof. If $A$ and $B$ are two different elements of $M$, either $A \in B$ or $B \in A$ according to the preceding theorem. Further, if $N \subseteq M$ and $0 \subset N$, there is a first element $E$ of $N$. Then as often as $C \in E$, $C$ is $\notin N$. Thus $N$ is basic.

It is clear that every transitive set $M$ of R-ordinals is the least R-ordinal following all $A \in M$. In particular, if $M$ has an immediate predecessor $N$, then $M = N + \{N\}$, otherwise $M = SM$.

Gödel has (1939) defined an ordinal number as a set $M$ with the three properties

1) $M$ is transitive.

2) If $0 \subset N \subseteq M$, $N$ is basic.

3) Every element of $M$ is transitive.

Let us call these sets $M$ G-ordinals. I shall show that they are just the same sets as the R-ordinals. Let us assume that $M$ is a G-ordinal and that there are elements of $M$ which are not R-ordinals. These constitute a set $S \supset 0$, and by 2) an element $B$ of $S$ exists such that $B \cap S = 0$. Now let $C \in B$. Then since $B \subseteq M$, so that $C \in M$, we must have $C \in M - S$, because otherwise $C \in S$, which is impossible, $B \cap S$ being $= 0$; it follows that $C$ is an R-ordinal. Since $B$ is transitive by 3), $B$ is, according to the last theorem, also an R-ordinal, which is a contradiction. Therefore all elements of $M$ are R-ordinals, so that $M$ itself is an R-ordinal. Let, inversely, $M$ be an R-ordinal. Then every element of $M$ is transitive, as we have shown above. Thus $M$ is a G-ordinal.
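The agreement of the two definitions can be observed on all sets of small rank. The Python sketch below, an illustration with names chosen here, enumerates the sixteen sets whose elements have rank below 3 (modelled as nested `frozenset` objects) and compares Robinson's and Gödel's conditions on each of them.

```python
from itertools import combinations

def powerset(s):
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_transitive(m):
    return all(x in m for y in m for x in y)

def all_subsets_basic(m):
    """Every non-empty subset of m is disjoint to one of its elements."""
    return all(any(not (x & n) for x in n) for n in powerset(m) if n)

def is_r_ordinal(m):
    return (is_transitive(m) and all_subsets_basic(m)
            and all(a == b or a in b or b in a for a in m for b in m))

def is_g_ordinal(m):
    return (is_transitive(m) and all_subsets_basic(m)
            and all(is_transitive(a) for a in m))

# Start from the empty set and take the power set four times.
level = frozenset()
for _ in range(4):
    level = frozenset(powerset(level))

agree = all(is_r_ordinal(m) == is_g_ordinal(m) for m in level)
count = sum(is_r_ordinal(m) for m in level)
print(len(level), agree, count)
```

Among the sixteen sets exactly the four sets representing 0, 1, 2, 3 satisfy either definition.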

Further, Bernays has defined (1941) an ordinal number as a set $M$ with the two properties

1) $M$ is transitive.

2) Every transitive proper subset of $M$ is $\in M$.

We will say that every $M$ satisfying this definition is a B-ordinal. I shall show that the B-ordinals are again the same sets as the R- or G-ordinals. Let $M$ be an R-ordinal. According to Theorem 31 every transitive proper subset of $M$ is an element of $M$, that is, $M$ is a B-ordinal. Let, on the other hand, $M$ be a B-ordinal, $S$ be the set of elements of $M$ which are R-ordinals. If $A \in B$, $B \in S$, then, according to Remark 3 above, $A$ is an R-ordinal, that is, $A \in S$. Thus $S$ is transitive. By Theorem 34, $S$ is an R-ordinal. Now, if $S$ were $\neq M$, $S$ would be a transitive proper subset of $M$, therefore $S \in M$, whence $S \in S$, which is absurd. Hence $S = M$, so that $M$ is an R-ordinal.

Zermelo has (1915) set up the definition of ordinals, which we will call Z-ordinals, having the three properties

1) $M = 0$ or $0 \in M$.

2) For every element $A \in M$ we have either $A \cup \{A\} = M$ or $A \cup \{A\} \in M$.

3) For every $N \subseteq M$ we have either $SN = M$ or $SN \in M$.

I shall show that the Z-ordinals are the same as the B-ordinals. Let $M \neq 0$ be a Z-ordinal and let $A$ be the set of all B-ordinals $B$ such that $B \subseteq M$ and $B \in M$. Whenever $B' \in B \in A$, $B'$ is a B-ordinal $\subset B$, whence $B' \subset M$ and $B' \in M$, so that $B' \in A$. Thus $A$ is transitive. Therefore $A$ is a B-ordinal. We have $A \subseteq M$, but $A \notin M$. Indeed $A \in M$ would mean that $A \in A$. Now $A$ may be $= B \cup \{B\}$ with $B \in M$, whence by 2) $A = M$, or $A$ is $= SA$, $A$ the set of the preceding B-ordinals, and since $SA \in M$ is excluded, we get by 3) that $A = M$. Thus $M$ is a B-ordinal.

Let $M$ be a B-ordinal. If $M \neq 0$, then $0 \in M$, because $0$ is a proper transitive subset. If $A \in M$, then $A \cup \{A\}$ may be $= M$. If not, $A \cup \{A\}$ is a transitive proper subset of $M$ and therefore $\in M$. Let $N$ be $\subseteq M$. Then $SN$ may be $= M$. If not, $SN$ is a transitive proper subset of $M$ and therefore $\in M$. Thus $M$ is a Z-ordinal.

Finally v. Neumann has defined (1923) a set $M$ as an ordinal number, we may say N-ordinal, as follows:

A set $M$ is an ordinal, if it can be well-ordered in such a way that every element is identical with its corresponding initial section.

Let $M$ be a N-ordinal. If $B \in M$ and $A \in B$, then $B$ is an initial section of $M$ and therefore $A \in M$. Thus $M$ is transitive. Let $S$ be a transitive proper subset of $M$ and $B \in S$, while $A$ precedes $B$ in the well-ordering of $M$. Then $A \in B$, because $B$ is identical with the initial section of $M$ consisting of all elements of $M$ preceding $B$. Since $S$ is transitive we have $A \in S$. Thus $S$ is an initial part of $M$, and because $S \subset M$ an initial section of $M$. $S$ is identical with this section and is therefore $\in M$. Hence $M$ is a B-ordinal. If, inversely, $M$ is a B-ordinal, one sees by the theorems above that it is well-ordered by $\in$ such that every element $m$ of $M$ is the set of all elements $n$ preceding $m$.


9. The notions "finite" and "infinite"

We will now leave for a while the theory of transfinite numbers and deal with the notion "finite set". There are different possible definitions of this notion, and with the aid of the well-ordering theorem they can be proved to be equivalent. Without the axiom of choice the proof of this equivalence seems impossible. I shall prove that the well-ordered finite sets are just the well-ordered sets that are also inversely well-ordered, that is, there is in every non-empty subset also a last element.

Definition of the notion inductive finite set:

A set u is inductive finite, if the following statement is true:

$$(x)\big(x \in UUu \;\&\; (0 \in x) \;\&\; (y)(z)(y \in x \;\&\; z \in u \to y \cup \{z\} \in x) \to u \in x\big).$$

In ordinary language this means that every set $x$ of subsets of $u$, such that $0 \in x$ and as often as $y \in x$ and $z \in u$, always $y \cup \{z\} \in x$, contains $u$ as element.

Remark. Such sets $x$ of subsets always exist. Indeed $Uu$ is such a set $x$.

According to this definition we of course have the following principle of induction: If a statement $S$ is valid for $0$, and $S$ is always valid for $y \cup \{z\}$ if it is true for $y$, $y \subset u$, $z \in u$, $u$ inductive finite, then $S$ is valid for $u$. I shall now prove a few theorems on the inductive finite sets.
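For a set given explicitly as a finite Python collection the definition can be evaluated directly. Every admissible set $x$ of subsets contains the least one, so it suffices to build that least one and to test whether $u$ occurs in it. The following lines are a sketch under this observation; the function names are chosen here.

```python
def least_inductive_family(u):
    """Smallest set x of subsets of u with 0 in x and y united with {z} in x for y in x, z in u."""
    u = frozenset(u)
    family = {frozenset()}
    frontier = [frozenset()]
    while frontier:
        y = frontier.pop()
        for z in u:
            bigger = y | {z}
            if bigger not in family:
                family.add(bigger)
                frontier.append(bigger)
    return family

def is_inductive_finite(u):
    # u belongs to every admissible x exactly when it belongs to the least one.
    return frozenset(u) in least_inductive_family(u)

print(is_inductive_finite({1, 2, 3}), len(least_inductive_family({1, 2, 3})))
```

The loop terminates only because a Python set is finite to begin with, so the sketch illustrates the definition rather than deciding finiteness of arbitrary sets.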

Theorem 35. If $u$ is inductive finite, so is $u \cup \{m\}$.

Proof. It suffices to assume $m \notin u$. Let $x$ be a set of subsets of $u \cup \{m\}$ such that $0 \in x$ and if $y \in x$ and $z \in u \cup \{m\}$ then $y \cup \{z\} \in x$. Further, let $x'$ be the subset of $x$ consisting of all elements of $x$ which are $\subseteq u$. Then $0 \in x'$, and as often as $y \in x'$, $z \in u$, we have $y \cup \{z\} \in x$ and therefore also $y \cup \{z\} \in x'$. Thus, $u$ being inductive finite, $u \in x'$. But $u \in x$ and $m \in u \cup \{m\}$ yields $u \cup \{m\} \in x$. Hence the theorem is correct.

Theorem 36. Every subset of an inductive finite set u is inductive finite.

Proof. Let $v$ be $\subseteq u$. I consider the set $x$ of subsets $w$ of $u$ such that $w \cap v$ is inductive finite. It is obvious that $0 \in x$, because the set $0$ is inductive finite. Let $y$ be $\in x$ and $z \in u$. Then $y \cap v$ is inductive finite, and $(y \cup \{z\}) \cap v$ is either $y \cap v$, namely when $z \notin v$, or $(y \cap v) + \{z\}$, namely if $z \in v$. But by the preceding theorem also $(y \cap v) + \{z\}$ is inductive finite. Thus as often as $y \in x$, $z \in u$, we have $y \cup \{z\} \in x$. Since $u$ is inductive finite, it follows that $u \in x$. Hence $u \cap v$ is inductive finite, that is, $v$ is inductive finite.

It follows easily from this that each subset $v$ of $u$, $u$ inductive finite, must be an element of every set of subsets of the kind mentioned in the definition of inductive finiteness.

Theorem 37. Ifu and v are inductive finite, so is u\j v.

Proof. We consider the subset x of all subsets w of u such that w U vis inductive finite. Obviously Oex. Let yex and zeu. By the previoustheorem, y is inductive finite. Further y U v is inductive finite so thaty U {z} U v is also inductive finite which means that y U{z} ex. Since u isinductive finite, uex. This again means that u U v is inductive finite.


Theorem 38. If T is an inductive finite set of inductive finite sets A, B, C, ..., then ST is inductive finite.

Proof. We consider the subsets V of T such that SV is inductive finite. Obviously 0 is one of them. If V is one of them and K ∈ T then V ∪ {K} is one of these subsets of T according to the previous theorem, because S(V ∪ {K}) = SV ∪ K. Therefore, since T is inductive finite, T itself is one of these subsets, that is, ST is inductive finite.

It is evident that if A is inductive finite, and there is a one-to-one correspondence between A and A', then A' is inductive finite. Using this it is easily proved that the product of two inductive finite sets is again of this kind, and further, that if T is an inductive finite set of inductive finite sets, the product PT is inductive finite.

Theorem 39. If u is inductive finite, every set y of subsets of u contains a maximal element x. This is in symbols

(u inductive finite) → (y)(y ∈ UUu → (Ex)(x ∈ y & (z)((z ∈ y) → (Et)(t ∈ x & t ∉ z) ∨ (x = z)))).

Proof. Let us consider the subsets of u for which this theorem is valid. Certainly 0 is one of these. Let y be one of them. Then, if z ∈ u, also y ∪ {z} will be such a subset of u. Let, namely, M be a set of subsets of y ∪ {z}. If all these subsets of y ∪ {z} are actually subsets of y, then according to supposition there is a maximal element in M. Otherwise there are elements of M of the form y' ∪ {z}, where y' ⊆ u. These y' constitute a set M' of subsets of y so that there is a maximal one, say y0, among them. But then y0 ∪ {z} is a maximal element in M. Hence, since u is inductive finite, the theorem is true for u.

The inverse is also true, namely:

Theorem 40. If every set of subsets of u contains a maximal element, then u is inductive finite.

Proof. In particular there is a maximal element in every set x of subsets such that 0 ∈ x and (y ∈ x) & (z ∈ u) → (y ∪ {z} ∈ x). But in this case it is obvious that there is no other maximal element than u itself, which proves the theorem.

We might therefore just as well define a finite set as a set with the property that there is a maximal subset in every set of subsets. We have seen that this notion coincides with the notion inductive finite, and we may notice that we have proved this without any use of the axiom of choice.
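For a very small set the maximal-element property can be verified exhaustively. The sketch below is an illustration only; the function names are hypothetical, and it is assumed that only non-empty sets of subsets are meant. An element x of a family counts as maximal when it is a proper subset of no member of the family, which is the condition written out in symbols in Theorem 39.

```python
from itertools import combinations

def subsets(items):
    """All subsets of the given collection, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def has_maximal_element(family):
    """True if some x in the family is a proper subset of no member z."""
    return any(all(not (x < z) for z in family) for x in family)

def every_family_has_maximal(u):
    """Check all non-empty sets of subsets of u (feasible only for tiny u)."""
    power = subsets(u)
    for r in range(1, len(power) + 1):
        for family in combinations(power, r):
            if not has_maximal_element(family):
                return False
    return True

print(every_family_has_maximal({0, 1, 2}))   # 255 families are inspected
```

The search grows doubly exponentially with the size of u, so this is a check of the statement on toy cases and in no way a substitute for the proof.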

A further definition of finiteness is the following: A set M is called Dedekind finite, if there is no one-to-one correspondence between M and any proper subset M' of M.

Theorem 41. If M is Dedekind finite, so is M U {m}.

Proof. If m ∈ M, nothing is to be proved. Let m be ∉ M, and let us assume that f(x), where x runs through M ∪ {m}, furnishes a one-to-one correspondence between M ∪ {m} and a proper part N of that set. If N were ⊆ M, then f(x) would map M on a proper part of M, contrary to supposition. We may therefore assume N = N1 + {m}, where N1 ⊂ M. If f(m) were = m, f would map M onto N1, which is again contrary to supposition. Then we would have to assume that f(m) ∈ N1. In this case f⁻¹(m) ∈ M so that one may define a mapping g such that g(x) = f(x) for all x ≠ m and ≠ n, where n = f⁻¹(m), with g(m) = m and g(n) = f(m). Then g would map M onto N1, which is likewise impossible.

Theorem 42. Every inductive finite set is Dedekind finite.

Proof. Let M be inductive finite. Let 𝔐 be the set of all Dedekind finite subsets of M. Then 0 ∈ 𝔐 and by the previous theorem N + {m} ∈ 𝔐 whenever N ∈ 𝔐. Thus we have M ∈ 𝔐.

In this treatment of the notions of finiteness we have hitherto not used the axiom of choice. This is needed, however, to prove the inverse of the last theorem. As a matter of fact, as far as I know, nobody has been able to prove that without the axiom of choice. I shall give two versions of the proof.

Theorem 43. Every inductive infinite set is Dedekind infinite.

Proof. That the set u is inductive infinite means that there exists a set x of subsets of u such that u ∉ x in spite of the circumstance that 0 ∈ x and whenever y ∈ x & z ∈ u, we have y ∪ {z} ∈ x. It is clear that there is no subset of u occurring as a greatest element of x. Now let us assume the principle of choice, that we have a function f of the subsets y of u such that always f(y) ∈ y. Then we can define a g(y) for all y ∈ x thus: g(y) = f(u − y). Then we may remark that the set x has the two properties: 1) 0 ∈ x, 2) whenever y ∈ x also y + {g(y)} ∈ x. All these x together constitute a subset X of Uu. Let x0 be the intersection DX of all these x. Then x0 still possesses the properties 1) and 2). Furthermore, for every y ∈ x0, where 0 ≠ y, there is a y₋₁ ∈ x0 such that y = y₋₁ + {g(y₋₁)}. Otherwise x0 − {y} would still possess the properties 1) and 2) which is contrary to the definition of x0. Then we may define a mapping of u on a proper part of u as follows. We let u − Sx0 be mapped identically onto itself while every g(y), where y ∈ x0, shall be the image of g(y₋₁) for the corresponding y₋₁. This provides a mapping of Sx0 onto the proper part Sx0 − {g(0)}. Indeed every z ∈ Sx0 must be a g(y) for some y ∈ x0, because otherwise we could remove all elements y containing the element z from x0 and still have a subset x with the properties 1) and 2).

Theorem 44. If an inductive finite set is well-ordered, it is also inversely well-ordered by the same ordering.

Proof. Let M be inductive finite. We consider the set T of all subsets N for which the theorem is valid. We have 0 ∈ T. Let N be ∈ T and m ∈ M but not ∈ N. By every well-ordering of N + {m}, either m will precede all elements of N or come after all these, or m will divide N into an initial part N1 and a terminal part N2 so that all elements of N1 precede m while all of N2 succeed m. But since every non-empty subset of N has both a first and a last element, one sees that every subset of N + {m} which is not empty has this property as well. Therefore M ∈ T, which means that the theorem is true for M.

Theorem 45. If a set M is well-ordered and also inversely well-ordered, it is inductive finite.

Proof. Let us assume the existence of elements y of M such that the set of all x ≤ y was not inductive finite. Among these y there is then a least one, say m. There is a predecessor m1 of m. Then the set of all x ≤ m1 is inductive finite. But according to a previous theorem then also the set of the x ≤ m must be inductive finite. Therefore the set of all x ≤ y is inductive finite for arbitrary y. Taking y then as the last element, one sees the truth of the theorem.

Using the last theorems we obtain another version of the proof of the statement that every inductive infinite set M is Dedekind infinite. However we must also use the well-ordering theorem, so that this proof depends on the axiom of choice as well. Let M be well-ordered. Then after our preceding results this well-ordering of M cannot simultaneously be an inverse well-ordering. Thus there is a subset M1 ≠ 0 without a last element. The set of all elements x ≤ an element y of M1 is then an initial part N of M without last element. Every element n of N has a successor n' ∈ N. We may then define a mapping f of M into a proper part of M by putting f(n) = n' for every n ∈ N and f(n) = n for every n not ∈ N.

10. The simple infinite sequence. Development of arithmetic

Let M be a Dedekind infinite set, f a one-to-one correspondence between M and a proper part M' of M. Let 0 denote an element of M not in M'. I denote generally by a' the image f(a) of a, also by P', when P ⊆ M, the set of all p' = f(p) when p runs through P. Let N be the intersection of all subsets X of M possessing the two properties

1) 0 ∈ X, 2) (x)(x ∈ X → x' ∈ X).

Then N is called a simple infinite sequence or the f-chain from 0. We may say that it is the natural number series. It is evident that N has the properties 1) and 2). Further we have the principle of induction: A set containing 0 and for every x in it also containing x' contains N.

Theorem 46. (y)(y ∈ N → (Ex)((y = x') & (x ∈ N)) ∨ (y = 0)).

This means that any element of N is either 0 or the f-image of another element of N. The proof is easy: Let us assume that n ∈ N and ≠ 0 and ≠ every x' when x ∈ N. Then N − {n} would still possess the properties 1) and 2), which is absurd.

In order to develop arithmetic it is above all necessary to define the two fundamental operations addition and multiplication. Usually these as well as any other arithmetical functions are introduced by the so-called recursive definitions. I shall show how we are able to use here the ordinary explicit definitions which can be formulated with the aid of the predicate calculus. I shall introduce addition and multiplication by defining the sets of ordered triples (x,y,z) such that x + y = z resp. xy = z.

We may consider the sets X of triples (a,b,c), where a, b, c are ∈ N, which have the two properties:

1) All triples of the form (a,0,a) are ∈ X.

2) Whenever (a,b,c) is ∈ X, (a,b',c') is ∈ X.

It is clear that there exist such sets X. Indeed the set X0 of all triples (a,b,c), where a, b, c are ∈ N, is one of them.

Now let S be the intersection of all these X. I shall show that S is just the set of triples (a,b,c) such that a + b = c according to the usual meaning of addition. First of all it is clear that S itself is one of the sets X with the properties 1) and 2). Further, the following inversion of 2) is true:

Theorem 47. Whenever (a,b',c') ∈ S, we have (a,b,c) ∈ S.

Proof. Let us assume that we had a triple (a,b',c') ∈ S while (a,b,c) ∉ S. Then it is seen that S − {(a,b',c')} would still have the two properties. Indeed if (α,β,γ) ∈ S − {(a,b',c')} then (α,β,γ) ∈ S, whence (α,β',γ') ∈ S, whence again (α,β',γ') ∈ S − {(a,b',c')} unless α = a, β = b, γ = c which however cannot be the case, since (a,b,c) ∉ S, whereas (α,β,γ) ∈ S.

Using Theorem 46 we may also formulate Theorem 47 thus:

(x)(y)(z)[(x,y,z) ∈ S & (y ≠ 0) & (z ≠ 0) → (Eu)(Ev)((x,u,v) ∈ S & (y = u') & (z = v'))].

Theorem 48. (a,b',0) ∉ S.

Proof. If, for some a, b, we had (a,b',0) ∈ S, it is seen that S − {(a,b',0)} would still satisfy the requirements 1) and 2).

Theorem 49. (x)(y)((x,0,y) ∈ S → (x = y)).

Proof. Indeed, if (a,0,b) with b ≠ a were ∈ S, then S − {(a,0,b)} would still possess the properties 1) and 2).

Theorem 50. (x)(y)((x,y,0) ∈ S → (x = 0) & (y = 0)).

Proof. Let (a,b,0) be ∈ S. According to Theorem 48 we have b = 0 because of Theorem 46. Then Theorem 49 yields a = 0.

Theorem 51. (x)(y)(z)(u)(((x,y,z) ∈ S) & ((x,y,u) ∈ S) → (z = u)).

Proof. Let P(b) be the proposition (x)(z)(u)(((x,b,z) ∈ S) & ((x,b,u) ∈ S) → (z = u)). Then P(0) is true. Indeed, if (a,0,c) ∈ S and (a,0,d) ∈ S, it follows from Theorem 49 that c = a and d = a, whence c = d. Let us assume that P(b) is true for some b. Then, if (a,b',c) and (a,b',d) are ∈ S, we have by Theorem 47 that c = c1', d = d1' for some c1 and d1 while (a,b,c1) ∈ S and (a,b,d1) ∈ S, whence because of the assumed validity of P(b) it follows that c1 = d1, whence c = d. Hence by complete induction the general validity of P(b) is proved.

Theorem 52. (x)(y)(Ez)((x,y,z) ∈ S).

Proof. Let P(b) here denote (x)(Ez)((x,b,z) ∈ S). Then P(0) is true. Let us assume that P(b) is true for some b. Then for arbitrary a there is a c such that (a,b,c) ∈ S, whence (a,b',c') ∈ S so that P(b') is true. Thus the theorem is proved by complete induction.

The two last theorems show that for every x and y there is just one z such that (x,y,z) ∈ S. We may therefore, instead of (a,b,c) ∈ S, write c = a + b. If, further, 0' is called 1, we have a + 1 = a' and the equations

a' ≠ 0, (a' = b') → (a = b), a + 0 = a, a + b' = (a + b)'

are generally valid. As is well known we may derive the commutative and associative laws of addition by complete induction. This will be carried out later even in the more difficult case of predicative set theory based on the ramified theory of types.
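The definition of S as an intersection can be imitated on a finite initial segment of the number series. The following Python sketch rests on assumptions that are not in the text: the ordinary integers 0, ..., bound stand in for the elements of N, x + 1 stands in for x', and the intersection of all sets X is computed as the least set generated from the triples of 1) by repeated use of 2). The name addition_triples is hypothetical.

```python
def addition_triples(bound):
    """Least set of triples (a, b, c), all entries <= bound, that contains
    every (a, 0, a) and is closed under (a, b, c) -> (a, b + 1, c + 1)."""
    s = {(a, 0, a) for a in range(bound + 1)}           # property 1)
    frontier = list(s)
    while frontier:
        a, b, c = frontier.pop()
        nxt = (a, b + 1, c + 1)                         # property 2)
        if b + 1 <= bound and c + 1 <= bound and nxt not in s:
            s.add(nxt)
            frontier.append(nxt)
    return s

S = addition_triples(12)
print((3, 4, 7) in S)                       # 3 + 4 = 7 is recorded in S
print(all(a + b == c for (a, b, c) in S))   # S contains only true sums
```

Within the truncation the computed set is exactly the graph of addition, so each pair a, b with a + b ≤ bound has just one c, as Theorems 51 and 52 assert for the full set S.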

Now let us consider the sets Y of triples with the two properties:

1) all triples (a,0,0) are ∈ Y

2) whenever (a,b,c) ∈ Y and (c,a,d) ∈ S, we have (a,b',d) ∈ Y.

It is evident that such sets of triples exist. Indeed the set of all triples is such a Y. Now let P be the intersection of all these Y. Then it is clear that P is again such a Y, but we can also prove the following inversions of the properties 1) and 2):

Theorem 53. If (a,0,b) ∈ P, then b = 0.

Proof. Indeed, if (a,0,b) were ∈ P, b ≠ 0, then P − {(a,0,b)} would not only have the property 1), which is immediately seen, but also 2). Let (α,β,γ) be ∈ P − {(a,0,b)} and (γ,α,δ) ∈ S. Then (α,β,γ) ∈ P together with (γ,α,δ) ∈ S yields (α,β',δ) ∈ P, whence (α,β',δ) ∈ P − {(a,0,b)} because (α,β',δ) cannot coincide with (a,0,b).

Theorem 54. If (a,b',c) ∈ P, then (Ez)((a,b,z) ∈ P & (z,a,c) ∈ S).

Proof. Let us assume that we had (a,b',c) ∈ P, while for all z either (a,b,z) ∉ P or (z,a,c) ∉ S. Let us consider the set P' = P − {(a,b',c)}. This set has obviously the property 1). Now let (α,β,γ) be ∈ P' and therefore ∈ P. As proved above, there exists a unique δ such that (γ,α,δ) ∈ S. Then (α,β',δ) ∈ P and therefore also (α,β',δ) ∈ P' unless α = a, β = b, δ = c. This is impossible, however, because in such a case we should have (a,b,γ) ∈ P and (γ,a,c) ∈ S. Thus P' would also possess the property 2), and that is absurd.

Theorem 55. (x)(y)(z)(u)((x,y,z) ∈ P & (x,y,u) ∈ P → (z = u)).

Proof. Let S(b) denote the statement (x)(z)(u)((x,b,z) ∈ P & (x,b,u) ∈ P → (z = u)). Then S(0) is true because (x,0,z) ∈ P → (z = 0) and (x,0,u) ∈ P → (u = 0) (see Theorem 53). Let us assume that S(b) is true, and let us look at the conjunction (a,b',c1) ∈ P & (a,b',c2) ∈ P. If this condition is fulfilled, we have according to Theorem 54, that x and y exist such that (a,b,x) ∈ P & (a,b,y) ∈ P together with (x,a,c1) ∈ S & (y,a,c2) ∈ S. Because of the validity of S(b) this yields first x = y, whence c1 = c2 by Theorem 51.

Theorem 56. (x)(y)(Ez)((x,y,z) ∈ P).

Proof. Let S(b) here be the statement (x)(Ez)((x,b,z) ∈ P). Then S(0) is obviously true. Let S(b) be true and let us assume (a,b,c) ∈ P. Then by Theorem 52 there exists a d such that (c,a,d) ∈ S, whence (a,b',d) ∈ P.

The two last theorems show that to every a, b there exists a unique c such that (a,b,c) ∈ P. Therefore we may instead of (a,b,c) ∈ P write c = ab, c being a function of a and b. Further, we have besides the earlier formulas a' ≠ 0, (a' = b') → (a = b), a + 0 = a, a + b' = (a + b)' also

a · 0 = 0, ab' = ab + a.

These, together with (a = b) → (a = c → b = c), beside the principle of induction and the predicate calculus, constitute, however, the axiom system for formal number theory, see, for example, R. L. Goodstein, Mathematical Logic, p. 44. Thus we see that the development of ordinary arithmetic is possible in the Zermelo-Fraenkel set theory.
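The set P of product triples can be sketched in the same finite fashion. In the Python fragment below, which is an illustration and not Skolem's construction itself, the integers 0, ..., bound again stand in for N, the set of addition triples is simply written down as a stand-in for S, and P is generated from the triples (a,0,0) by the closure rule 2). The name product_triples is hypothetical.

```python
def product_triples(bound):
    """Least set of triples containing every (a, 0, 0) and closed under:
    (a, b, c) in P and (c, a, d) in S  ->  (a, b + 1, d) in P.
    All entries are kept <= bound."""
    # Stand-in for the set S of addition triples, truncated at `bound`.
    s = {(a, b, a + b) for a in range(bound + 1) for b in range(bound + 1 - a)}
    sum_of = {(x, y): z for (x, y, z) in s}
    p = {(a, 0, 0) for a in range(bound + 1)}           # property 1)
    frontier = list(p)
    while frontier:
        a, b, c = frontier.pop()
        d = sum_of.get((c, a))                          # (c, a, d) in S, if any
        if d is not None and b + 1 <= bound:
            nxt = (a, b + 1, d)                         # property 2)
            if nxt not in p:
                p.add(nxt)
                frontier.append(nxt)
    return p

P = product_triples(12)
print((3, 4, 12) in P)                      # 3 * 4 = 12 is recorded in P
print(all(a * b == c for (a, b, c) in P))   # P contains only true products
```

The rule 2) is visibly the equation ab' = ab + a read as a generating step, which is why the computed set agrees with the multiplication table up to the bound.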

The method I used here to replace the recursive definition of addition and multiplication by explicit definitions can be used quite generally for other recursive definitions. The primitive recursive schema, for example, is:

f(0, a2, ..., an) = g(a2, ..., an)

f(a1 + 1, a2, ..., an) = h(f(a1, ..., an), a1, ..., an)

Here g and h are previously defined functions with n−1 respectively n+1 arguments, while f is the function to be defined. From the set-theoretic standpoint we may replace this recursive definition by the following explicit one. That g and h are already known may be expressed by saying that we have a set G of n-tuples and a set H of (n+2)-tuples of elements of N such that for arbitrary a1, ..., a(n−1) there is just one b such that (a1, ..., a(n−1), b) ∈ G and for arbitrary a1, ..., a(n+1) there is just one b such that (a1, ..., a(n+1), b) ∈ H. Then we consider all sets X of (n+1)-tuples of elements of N which possess the two properties:

1) Whenever (a2, ..., an, b) ∈ G, we have (0, a2, ..., an, b) ∈ X.

2) Whenever (a1, a2, ..., an, b) ∈ X and (b, a1, ..., an, c) ∈ H, we have (a1 + 1, a2, ..., an, c) ∈ X.

Then the intersection F of all sets X of this kind yields the function f, namely, as often as (a1, ..., an, b) is ∈ F, we have b = f(a1, ..., an), and inversely.
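The passage from the recursive schema to the set F may be sketched as follows. This is a hedged illustration in Python: g and h are passed as ordinary functions rather than as the sets G and H, the recursion argument is cut off at a chosen depth, and the names graph_by_closure, params and depth are all assumptions of the example.

```python
def graph_by_closure(g, h, params, depth):
    """Least set F of tuples (a1, a2, ..., an, b) generated by rules 1) and 2),
    with a1 <= depth and (a2, ..., an) taken from `params`."""
    f = set()
    frontier = []
    for rest in params:
        t = (0, *rest, g(*rest))                        # rule 1)
        f.add(t)
        frontier.append(t)
    while frontier:
        *args, b = frontier.pop()                       # args = [a1, a2, ..., an]
        a1, rest = args[0], args[1:]
        if a1 < depth:
            t = (a1 + 1, *rest, h(b, *args))            # rule 2)
            if t not in f:
                f.add(t)
                frontier.append(t)
    return f

# Hypothetical instance with n = 2: g(a2) = 1 and h(v, a1, a2) = v * a2,
# so that the defined function is f(a1, a2) = a2 ** a1.
F = graph_by_closure(lambda a2: 1, lambda v, a1, a2: v * a2,
                     params=[(2,), (3,)], depth=5)
print((4, 3, 81) in F)                      # f(4, 3) = 3 ** 4 = 81
```

Reading off b from the unique tuple (a1, ..., an, b) in F is then the same as evaluating f by the recursion.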

But also other kinds of recursions may be treated in the same way. As a further example we may take the definition of the Ackermann-Péter function, namely:

φ(0, n) = n + 1

φ(m + 1, 0) = φ(m, 1)

φ(m + 1, n + 1) = φ(m, φ(m + 1, n)).

We consider here the sets Z of triples with the three properties:

1) All triples (0, n, n + 1) are ∈ Z

2) Whenever (m, 1, n) is ∈ Z, so is (m + 1, 0, n)

3) For arbitrary m, n, h, k we have

(m + 1, n, h) ∈ Z & (m, h, k) ∈ Z → (m + 1, n + 1, k) ∈ Z.

If Φ is the intersection of all these sets Z, one proves easily that to every pair a, b there is just one c such that (a,b,c) ∈ Φ. Thus c is a function φ of a, b, and this φ is just the function defined by the recursive schema.
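The three closure properties can be run on a finite fragment to see the function emerge. The Python sketch below is an illustration under stated assumptions: the first entry of a triple is limited to max_m, all other entries to bound, and the least set with properties 1) to 3) is obtained by repeating the rules until nothing new appears. The function name and both parameters are hypothetical.

```python
def ackermann_triples(max_m, bound):
    """Least set Z of triples closed under rules 1) to 3), truncated to
    first entry <= max_m and the remaining entries <= bound."""
    z = {(0, n, n + 1) for n in range(bound)}           # rule 1)
    changed = True
    while changed:
        changed = False
        new = set()
        for (m, n, h) in z:
            if m < max_m and n == 1:
                new.add((m + 1, 0, h))                  # rule 2)
            if m >= 1 and n < bound:
                for (m2, h2, k) in z:
                    if m2 == m - 1 and h2 == h:
                        new.add((m, n + 1, k))          # rule 3)
        if not new <= z:
            z |= new
            changed = True
    return z

Z = ackermann_triples(2, 30)
print((1, 5, 7) in Z)      # phi(1, n) = n + 2
print((2, 3, 9) in Z)      # phi(2, n) = 2n + 3
```

In the computed fragment each pair (a, b) occurs with one third entry only, in agreement with the uniqueness statement; the rapid growth of the function for larger first argument is the reason for the truncation.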


11. Some remarks on the nature of the set-theoretic axioms. The set-theoretic relativism.

Most of the axioms of the Zermelo-Fraenkel theory have the form: The class of all elements for which a certain statement is valid is a set, or, in other words, the domain D contains an element M such that all the objects in the class, and only these, are ∈ M. We might call these axioms "defining axioms," because the set which is declared to exist is also defined. There are two axioms at least, however, which are not of this kind, namely, the axiom of infinity and the axiom of choice. The axiom I mentioned expressing the general aleph hypothesis is of course not a defining axiom. As I have shown (see Mathematica Scandinavica, vol. 5, p. 40) the axiom of infinity can be put into defining form. The easiest way of doing that is to use the notion of ordinal set introduced in § 8. We may define a finite ordinal as an ordinal set M such that (Ex)((x ∈ M) & (M = x*)) & (y)(y ∈ M → (Ez)(z ∈ y & y = z*)). Here x* means x ∪ {x}. Then the axiom of infinity can be expressed by saying that the finite ordinals constitute a set.

The axiom of choice has given rise to many discussions. The reason for this is of course its non-constructive character. But people who desire to retain as much as possible of the old Cantor theory feel obliged to maintain that axiom. It is also quite clear that from an axiomatic point of view one must be allowed to study the consequences of any axioms whatever. On the other hand it cannot be denied that this axiom also leads to consequences which one scarcely had expected. I shall mention a couple of examples of this without entering into the proofs.

In Hausdorff's book "Grundzüge der Mengenlehre" one finds the proof of the following statement: It is possible to divide the surface of a sphere into 4 disjoint parts A, B, C, D such that A is a denumerable set of points, while B, C, D are mutually congruent and at the same time B is congruent to C + D. That two sets of points are congruent means of course that they arise from one another by a rotation of the sphere.

Still more astonishing is a result obtained by Banach and Tarski which has later been improved by some other authors. In an article "Decompositions of a sphere" by T. J. Dekker and J. de Groot in Fund. Math. XLIII it is proved that it is possible to divide a 3-dimensional unit sphere in 5 disjoint pieces, each piece being a connected set, such that by suitable translations and rotations these pieces can be put together again so that two unit spheres are formed.

In the last instance it is a matter of personal taste whether one wants to have a set theory without or with an axiom of choice. A similar remark must be made with regard to the aleph hypothesis or the hypothesis of the existence of inaccessible cardinals etc.

From a purely logical point of view it would already be interesting to study a set theory with only defining axioms. I have proved (see my address "Some remarks on set theory" in the report of the International Congress of Mathematicians, Cambridge, Mass., 1950) that in such a theory the introduction of any set M can be brought into the form

(1) x ∈ M ↔ Φ(x),

where Φ(x) is a propositional function containing only x as a free variable while there may be an arbitrary number of bound variables, Φ being built from atomic expressions x ∈ y, y ∈ x, y ∈ z, etc. by the logical connectives and the quantifiers. One might think in the first instance that there is a more general way of defining new sets, namely, by writing

(2) x ∈ M ↔ Φ(x, N, P, R, ...),

where N, P, R, ... are previously defined sets entering into the expression Φ. However, it is possible to prove that every set defined by an equivalence of the form (2) is already definable by (1). Indeed the reduction of a definition (2) to the form (1) can be performed by introducing the definitions of N, P, R, ... into (2) and repeating, if necessary, this procedure. If N, P, R, ... are defined by (1) we get at once the form (1) by introducing their definitions. If N, P, R, ... are themselves defined again by (2), the process must be repeated.

The simplest example of reduction from (2) to (1) is the case that M is defined by the axiom of separation applied to a set N which is defined in the form (1). If indeed

(3) x ∈ N ↔ A(x)

and

x ∈ M ↔ (x ∈ N) & B(x),

then

x ∈ M ↔ A(x) & B(x)

which is of the form (1). Let us take as a little more complicated example the definition of the set M of all non-empty subsets of N, where N is defined by (3). First we have

(x ∈ M) ↔ (x ∈ UN) & (Ey)(y ∈ x),

but

(x ∈ UN) ↔ (x ⊆ N) ↔ (z)(z ∉ x ∨ z ∈ N) ↔ (z)(z ∉ x ∨ A(z)),

so that we obtain

(x ∈ M) ↔ (Ey)(y ∈ x) & (z)(z ∉ x ∨ A(z)).

It is now easy to understand the correctness of the theorem:

Theorem 57. In a set theory where the axioms are all of the form: The class so and so is a set, the definable sets constitute a denumerable class.

Proof. We obtain all sets M by taking in (1) all propositional functions which, by the operations of the predicate calculus, can be built from atomic statements y ∈ z and only contain x as a free variable. We may replace x by x0, letting the bound variables be denoted by x1, x2, .... Further, Φ(x) may be written in prenex normal form, while its matrix is written in conjunctive normal form. Then we will get an enumeration of all Φ(x) by enumerating all finite sequences consisting first of some pairs of integers corresponding to the quantifiers of the prefix, the first number being the index of the x which occurs as quantifier, the last number being 0 or 1 according as the quantifier is universal or existential. This sequence of pairs is then followed by a finite sequence of finite sequences of triples, each triple corresponding to an atomic statement xm ∈ xn, the last number in the triple being 0 or 1 according as xm ∈ xn occurs unnegated or negated and the first numbers being m and n.
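That the finite sequences of integers can be enumerated at all may be made concrete by a coding into single natural numbers. The Python sketch below uses the Cantor pairing function, which is one standard device and not necessarily the enumeration intended in the proof; the names pair, unpair, encode and decode, and the sample sequence, are assumptions of the example.

```python
from math import isqrt

def pair(a, b):
    """Cantor pairing: a one-to-one map of pairs of naturals onto the naturals."""
    return (a + b) * (a + b + 1) // 2 + b

def unpair(n):
    """Inverse of pair."""
    w = (isqrt(8 * n + 1) - 1) // 2
    b = n - w * (w + 1) // 2
    return w - b, b

def encode(seq):
    """Code a finite sequence of naturals as one natural number."""
    code = 0                                 # 0 codes the empty sequence
    for item in reversed(seq):
        code = pair(item, code) + 1
    return code

def decode(code):
    """Recover the sequence from its code."""
    seq = []
    while code != 0:
        item, code = unpair(code - 1)
        seq.append(item)
    return seq

# Hypothetical flattened description: two prefix pairs followed by two triples.
description = [1, 0, 2, 1, 1, 2, 0, 2, 0, 1]
print(decode(encode(description)) == description)
```

Distinct sequences receive distinct codes, so arranging the descriptions by the size of their codes gives an enumeration of the kind the proof requires.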

Of course this class of definable sets is not itself a set, or, in other words, it is no object in the domain D; neither is the enumeration of these sets a correspondence which occurs as a set in D.

These considerations are put in a clearer light by the application to axiomatic set theory of the Löwenheim theorem, or more exactly a generalization of this. The theorem of Löwenheim says that if F is a well-formed formula of the first order predicate calculus with certain predicate variables A, B, C, ..., either ¬F is provable or F can be satisfied in the natural number series by suitable determination of A, B, C, ... in that domain of individuals. The generalization which I proved in 1919 says that the same is true for an enumerated set of such formulas, say F1, F2, ..., that is, either some ¬Fj is provable or the whole set of formulas can be satisfied by suitable determination in N of the predicates occurring in them. Since the axioms of our set theory are either such formulas or are schemas each case of which is such a formula, it is clear that the generalized Löwenheim theorem can be applied. Therefore we have that if the axioms are consistent, it must be possible to determine the relation ∈ between the natural numbers in such a way that all our axioms become valid. This result appears paradoxical, but it is not difficult to understand how it can be explained. Indeed, the existence of sets in our domain D is given by the axioms, and we have no guarantee that it should not be possible in other ways to introduce further sets. Therefore we have no reason to expect that, for example, the subsets of an infinite set which we can prove to exist in D are all of the subsets in an absolute sense. We must be content with a relativistic conception of set theory. Everything must be conceived in relation to D as it is supposed to be by the axioms, and we must abandon the idea that the axioms shall yield an absolute notion of "set" as in Cantor's theory. That M is not ~ N means in the axiomatic theory that there is in D no set F of pairs (m,n), m ∈ M, n ∈ N, yielding a one-to-one correspondence between M and N. But that does not mean that we cannot find such a set at all. There might be such a set, but outside D. In this way there might be a one-to-one correspondence between the Zermelo number series consisting of the elements 0, {0}, {{0}}, ... and the whole domain D, but this correspondence is not one of the sets of pairs which occur in D. Because of the general character of the theorem of Löwenheim and its generalization, it is clear that this set-theoretic relativism is unavoidable if we desire to have an exact formulation of set theory at all. Of course it shows the illusory character of the absolutist conceptions of Cantor's theory.


12. The simple theory of types

In order to avoid the logical paradoxes, Russell invented the theory of types. The idea is to distribute all objects of thought into different types or, in other words, to assume that they can be put into different layers or at different levels. We have some original objects called objects of type 0 (or 1 if one prefers). Sets of these objects or relations between them are objects of type 1. Sets of these again are objects of type 2, and so on. Further, the membership relation x ∈ y shall only have a meaning, if y is of type n + 1 as often as x is of type n. Composite propositional functions Φ(x) built up from atomic propositions x ∈ y have then only meaning if it is possible to attach numbers to the occurring variables such that always the symbol y in every occurring atomic proposition x ∈ y gets the number n + 1 when x gets the number n. Such expressions Φ(x) are called stratified.
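The condition of stratification is a simple consistency check on the atomic propositions, which may be sketched in Python. The representation is an assumption of the example: a formula is reduced to the list of its atomic propositions, each given as a pair (x, y) meaning x ∈ y, with bound variables already renamed apart. The function name stratify is hypothetical.

```python
def stratify(atoms):
    """atoms: list of (x, y), each meaning the atomic proposition x ∈ y.
    Returns a dict of type numbers with number(y) = number(x) + 1 for every
    atom, or None if no such attachment of numbers exists."""
    neighbours = {}
    for x, y in atoms:
        neighbours.setdefault(x, []).append((y, 1))     # y is one level above x
        neighbours.setdefault(y, []).append((x, -1))
    level = {}
    for start in neighbours:
        if start in level:
            continue
        level[start] = 0
        stack = [start]
        while stack:
            v = stack.pop()
            for w, step in neighbours[v]:
                if w not in level:
                    level[w] = level[v] + step
                    stack.append(w)
                elif level[w] != level[v] + step:
                    return None                         # conflicting requirements
    low = min(level.values(), default=0)
    return {v: n - low for v, n in level.items()}       # lowest number becomes 0

print(stratify([("x", "y"), ("y", "z")]))   # a stratified expression
print(stratify([("x", "x")]))               # x ∈ x admits no numbering
```

The second call shows in miniature why the propositional function x ∈ x is excluded: it would require the number of x to exceed itself by one.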

We may now set up the following axiom of comprehension: For any stratified Φ(x) there exists a y such that the equivalence

x ∈ y ↔ Φ(x)

is generally valid, that is, it is valid for all x of type n if y is of type n + 1. Since we do not introduce negative types, there will be a lowest possible type for x in Φ(x), say n0. Then the axiom asserts

(I) (Ey)(x)(x ∈ y ↔ Φ(x)),

where the range of the universal quantifier is the domain of all objects of type n, n ≥ n0, and the range of (Ey) is the domain of objects of type n + 1. The identity relation x = y might be introduced as an undefined notion beside the membership relation ∈. Then we would have to set up the axiom

(x = y) → (ψ(x) ↔ ψ(y))

for every stratified ψ(x). It is simpler, however, to use only ∈ as an undefined notion and define = by letting x = y stand for the validity of the equivalence

ψ(x) ↔ ψ(y)

for any stratified ψ. We then also need, however, the axiom of extensionality

(II) (z)(z ∈ x ↔ z ∈ y) → (x = y).

It is seen at once that the axioms of the power set and the union in the Zermelo-Fraenkel theory are valid statements here, and also the axiom of separation for stratified Φ(x). As to the axioms of the small sets, these are also valid with the restriction that {a,b} can be built only when a and b are of the same type. It must be noticed, however, that we get not only universal sets of different types but also null sets of different types. Indeed

(Ey)(x ∈ y ∨ x ∉ y) and (y)(x ∈ y & x ∉ y)

used as Φ(x) in (I) define, if y runs through all individuals of type n + 1, the universal set of type n respectively the null set of type n.

Because of the restriction in building the set {a,b}, we ought to look at the union and intersection of two sets. If A(x) and B(x) are two stratified propositional functions with only x as free variable, then also A(x) & B(x) and A(x) ∨ B(x) will be stratified. This is seen as follows: If we can attach numbers to x and the other (the bound) variables in A(x) and do the same for B(x), then it is possible to do this for A(x) and B(x) in such a way that x is assigned the same number in A and B. As a consequence of this, we can, for every type ≥ a certain one, always build the union x̂(A(x) ∨ B(x)) and the intersection x̂(A(x) & B(x)) of the sets x̂A(x) and x̂B(x). It must be remarked that ¬A(x) is also stratified when A(x) is, so that we get a complementary set to every given set.

There is a certain difficulty with regard to relations and functions. One would have liked to be able to conceive a binary relation as a set of pairs, and it would have been nice if this set could have been of the same type as a set of single elements. However, this would require the introduction of ordered pairs, triples, and so on, as new objects of the same type as the different terms in these sequences. Thus an ordered pair (a,b), where a and b are of a certain type, should again be an object of this type. This would mean a certain complication. Instead of that one could let the sign ∈ stand for a binary relation in the case x ∈ y, and a ternary if an ordered pair (x,y) is ∈ z, and so on. Probably this is not advisable. The best thing to do is, I should think, to introduce the ordered pairs, triples, and so on, as sets. Also by this procedure one has to tolerate a certain complication, because the set of all x such that A(x) will not be of the same type as the set of all (x,y) such that B(x,y). For example if we have to do with a set N representing the number series, then the set of all primes p will be of same type as N, but the set P of all ordered pairs (x,y), where x ∈ N, y ∈ N, will be of a type 2 units higher. Indeed (a,b) = {{a,b}, {a}} is of a type 2 units higher than the type of a and b. The set {{p}} will however be of the same type as the set P. So far as I can see, it will be best to consider the ordered pairs, triples, etc., as sets.
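The shift of type caused by the pair construction can be seen by counting nesting levels. In the Python sketch below, which is illustrative only, individuals are ordinary integers counted as level 0, sets are frozensets counted one level above their elements, and the names pair and level as well as the small stand-in for N are assumptions of the example.

```python
def pair(a, b):
    """Ordered pair (a, b) represented as the set {{a, b}, {a}}."""
    return frozenset({frozenset({a, b}), frozenset({a})})

def level(obj):
    """Nesting depth: individuals count as 0, a set is one above its elements."""
    if isinstance(obj, frozenset):
        return 1 + max((level(e) for e in obj), default=0)
    return 0

N = frozenset({1, 2, 3})                              # stand-in for the number series
P = frozenset(pair(x, y) for x in N for y in N)       # set of all ordered pairs
primes = frozenset({2, 3})                            # same level as N
lifted = frozenset(frozenset({frozenset({p})}) for p in primes)   # set of the {{p}}

print(level(pair(1, 2)))                  # two levels above the individuals
print(level(N), level(primes), level(P), level(lifted))
```

The last line exhibits the situation described in the text: the set of primes sits at the level of N, while the set of pairs, like the set formed from the {{p}}, lies two levels higher.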

If we should try to develop mathematics, basing it on the simple theory of types, it would be desirable to have an axiom of infinity for the things of type 0. Indeed, if there is only a finite number of individuals of type 0, there can be only a finite number of each of the higher types. The development of arithmetic will then already be difficult and analysis would scarcely be possible. Now the axiom of infinity might be set up in different ways. We might assume a one-to-one correspondence f given between the set V of all things of type 0 and a proper subset V' of V. This mapping f would then be a fundamental notion in the theory beside the relation ∈. We may manage so that we don't introduce such an extra notion. We may assume the axiom

(III) (x)(x is inductive finite → (Ey)(y ∉ x)),

where y runs through all objects of type 0, x all objects of type 1. Then there will exist sets x of type 1 with 0, 1, 2, ... elements. Introducing the notion cardinal number for the sets of type 1, every one of these cardinals is a set of type 2, and the finite cardinals constitute a set of type 3 which can be taken as the natural number series. Starting with this, the introduction of negative integers, fractions, real numbers, etc., can be performed in just the usual way. One has to take care of the type distinctions, but it is quite easy to develop ordinary mathematics in this way.

Some small changes will often be necessary to carry over the theorems and their proofs from the Zermelo-Fraenkel theory to the simple theory of types. Bernstein's equivalence theorem with its proof remains unchanged. Cantor's theorem that UM is always of higher cardinality than M must be expressed thus: Let EM be the set of all unit sets {m} contained in M. Then EM < UM. The previous definition of well-ordering (see § 4) must be slightly changed to this wording: A set M is well-ordered, if there is a function R from EM to UM such that, for 0 ⊂ N ⊆ M, there is a unique n ∈ N such that N ⊆ R({n}). The wording of Theorem 10 must now be: Let a function φ be given such that φ(A), for every A such that 0 ⊂ A ⊆ M, denotes a unit subset of A. Then there is a subset 𝔐 of UM such that to every N ⊆ M there is one and only one element N0 of 𝔐 such that N ⊆ N0 and φ(N0) ⊆ N. Such slight changes will be necessary in many of the previous theorems and proofs. If we look at Theorem 6 for example, there can be no meaning in an equivalence between M + N and M · N or even M × N, because the elements of M · N are of type t + 1 and those of M × N are of type t + 2 when those of M and N are of type t. If, however, we replace M by its set of unit subsets EM and N by EN, then EM + EN and M · N will be of same type, and an equivalence between these two sets will be meaningful. Similarly we can compare EEM + EEN and M × M. I don't think it is necessary to carry out in detail these small changes in the considerations. By the way, it may be remarked that functions may well be introduced such that arguments and values are not of same type, but if functions should be conceived as special cases of relations, and relations as sets of sequences conceived as sets, such a procedure must be avoided.

13. The theory of Quine

There have been many attempts to avoid the introduction of types, which are inconvenient. One of these is the theory of Quine. An exposition of this can be found in the book "Logic for Mathematicians" recently published by B. Rosser. Quine's theory is something intermediate between the axiomatic theory of Zermelo-Fraenkel and Russell's type theory. It has in common with the former the feature that there are no type distinctions. On the other hand it has in common with the latter the feature that only stratified propositional functions are admitted for the definition of new sets. Indeed we have in Quine's theory the following axiom of comprehension:

$(Ey)(x)(x \in y \leftrightarrow \phi(x))$

with the whole domain of objects as range of variation of x and y. Of course y must not occur in $\phi(x)$.
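The stratification condition can be made concrete with a minimal Python sketch. It assumes the usual reading that a propositional function is stratified when integers can be attached to its variables so that every atomic part a ∈ b gets level(b) = level(a) + 1. The function name `is_stratified` and the encoding of a formula as a list of membership pairs are illustrative assumptions, not notation from the text, and identities and quantifier structure are ignored.

```python
from collections import deque

def is_stratified(memberships):
    """memberships: list of pairs (a, b), each standing for the atom 'a in b'.
    Returns True if levels can be chosen with level(b) == level(a) + 1."""
    # Each atom gives an edge with a required level offset in both directions.
    graph = {}
    for a, b in memberships:
        graph.setdefault(a, []).append((b, +1))
        graph.setdefault(b, []).append((a, -1))
    level = {}
    for start in graph:
        if start in level:
            continue
        level[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w, offset in graph[v]:
                expected = level[v] + offset
                if w not in level:
                    level[w] = expected
                    queue.append(w)
                elif level[w] != expected:
                    return False  # two atoms demand incompatible levels
    return True

russell = is_stratified([("x", "x")])            # the atom x in x
chain = is_stratified([("x", "y"), ("y", "z")])  # x in y and y in z
```

Under these assumptions `russell` is False and `chain` is True, which matches the remark that x ∈ x cannot be used to define a set while the usual constructions can.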

It is easy to see that here we again get only one null set $\Lambda$ and only one universal set V. We may for example use these definitions:

$x \in \Lambda \leftrightarrow (y)(x \in y \;\&\; x \notin y), \qquad x \in V \leftrightarrow (Ey)(x \in y \lor x \notin y).$

Obviously the set V is $\in V$. Nevertheless Russell's antinomy cannot be deduced, because the propositional function $x \in x$ is not stratified, so that no set M can be introduced such that $x \in M$ should be $\leftrightarrow x \notin x$. The ordinary constructions of new sets are, however, valid. If A(x) and B(x) are stratified, say without free variables other than x, also A(x) & B(x) and A(x) ∨ B(x) are stratified, making the definition of the intersection and the union of two sets possible. Further, if A(x) is stratified, and x does not occur in A(y), then $(Ey)(x \in y \;\&\; A(y))$ is stratified as well. This shows the existence of the union of all elements of a given set. Further $(x)(x \notin y \lor A(x))$ is stratified so that we can always build the set of all subsets of a given set. Since $\overline{A}(x)$ is also stratified, there always exists a complementary set to any given set. There is therefore a greater possibility for the introduction of new sets in this theory than in Zermelo's. In spite of this, however, it turns out that the existence of infinite sets is not any more provable in Quine's theory than in Zermelo's, so that an axiom of infinity is just as much needed here. This is due to the fact that the propositional functions needed for the definition of an infinite set are not stratified. In Rosser's book the axiom of infinity is set up thus:

$(m)(n)(m \in Nn \;\&\; n \in Nn \;\&\; m + 1 = n + 1 \rightarrow m = n).$

Here Nn means the set of natural numbers, where the natural numbers are defined as the cardinals of finite sets. The axiom has the effect that none of these cardinals coincides with the set $\Lambda$, or in other words, there exist finite cardinal numbers as large as we please. The sequence of natural numbers is then infinite.

It is interesting to look at Cantor's theorem. In type theory we could not compare UM with M. Here we can do that, but Cantor's theorem is not generally valid. That it cannot be generally valid is clear, because at any rate it cannot be true for V. However, if we modify the theorem a little, saying that UM is of higher cardinality than EM (this was also the formulation we could use in type theory) then we get a correct statement. This circumstance shows again that M and EM cannot always be equivalent. This appears very peculiar, but if we try to prove the equivalence between M and EM in general, this turns out to be impossible, because we would have to use propositional functions which are not stratified. Nevertheless, in many particular cases the use of non-stratified formulas can be avoided. We therefore have to distinguish between sets M for which we can prove the equivalence between M and EM and those for which this is not provable. The former kind of sets are said to be Cantorian and Can M is written for the statement M ~ EM. Rosser mentions in his book that the statement Can M is provable not only for the natural number series, M = Nn, but for all the sets which occur in ordinary mathematics.

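Since $UV \subseteq V$, we have

$\overline{\overline{UV}} \le \overline{\overline{V}}.$

On the other hand

$\overline{\overline{UV}} > \overline{\overline{EV}},$

so that

(1) $\overline{\overline{EV}} < \overline{\overline{V}}.$

From this relation it follows (see the proof below) that

(2) $\overline{\overline{EEV}} < \overline{\overline{EV}},$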

so that the sets V, EV, EEV, ... will possess decreasing cardinal numbers. The existence of such a decreasing sequence of cardinals shows that these cardinals cannot be alephs, whence it follows that not all sets can be well-ordered. Therefore, the axiom of choice cannot be added to the other axioms of Quine's theory without contradiction. We may express this fact by saying that the principle of choice can be proved false in Quine's theory. This was pointed out by Specker.

Proof that (2) follows from (1): Because of (1) there exists a mapping of the set of all unit sets {m} on a subset of V. Indeed the identical mapping is of that kind. However, the identical mapping maps the set of all {{m}} on just this subset of all sets {m}. Let us on the other hand assume that EV could be mapped onto EEV. The mapping would then consist of mutually disjoint pairs ({m}, {{n}}). However, the certainly existing set of pairs (m, {n}) would then furnish a mapping of V on EV contrary to (1). Hence (2) follows from (1).

Quine's theory does not seem to have many adherents among mathematicians. The reason for this is presumably the existence in it of such sets as V which are elements of themselves, pathological sets as they are called. I don't think, however, that this circumstance ought to worry mathematicians, because it is not necessary to take these abnormal sets into account in the development of the ordinary mathematical theories.

14. The ramified theory of types. Predicative set theory

I have already mentioned Poincaré's objection to Cantor's set theory, that one makes use of the so-called non-predicative definitions. These definitions collect objects in such a way that the totality of these objects, or objects logically dependent upon that totality, are considered as belonging to the same totality, so that the definition has a circular character. It might perhaps be better to say that a non-predicative definition is the definition of an entity by a logical expression containing a bound variable such that the defined entity is one of the possible values of this variable. However, instead of trying to explain this generally, I think it is better to take a characteristic example.

Let us consider mankind, the domain of all human beings. We have the binary relation "x is a child of y" which I write Ch(x,y). Let us try to define descendant of P, P any given person. If we make use of the notion of finite number we may proceed thus: We define the relation $Ch^n(x,y)$ recursively by letting

$Ch^1(x,y)$ stand for $Ch(x,y)$,
$Ch^{n+1}(x,y)$ stand for $(Ez)(Ch^n(x,z) \;\&\; Ch(z,y))$.

Then the proposition "x is a descendant of P" may be written

$(En)(Ch^n(x,P)).$

All this is quite clear and simple, but notice that we have to use quantifiers that are logically very different in nature, namely, on the one hand, quantifiers with mankind as range of variation, and, on the other hand, a quantifier extended over natural numbers. What appears most unsatisfactory, however, is the circumstance that the notion natural number itself is of the same kind as the notion descendant of P. Indeed we can say that the numbers are the descendants of 0 by the successor relation + 1; therefore the above definition only refers one descendant relation to another. We may therefore ask if we can give a definition of a purely logical character that is independent of the notion of natural number. Following Frege and Dedekind we may do that by letting "z is a descendant of P" stand for

$(X)\big(X(P) \;\&\; (x)(y)(X(y) \;\&\; Ch(x,y) \rightarrow X(x)) \rightarrow X(z)\big),$

where X runs through all classes of human beings. In ordinary language the wording of this is: That z is a descendant of P means that z belongs to every class X with the two properties, 1) P belongs to X, 2) whenever y belongs to X and x is a child of y, then x belongs to X. This is a typical example of a non-predicative definition because the defined class "descendant of P" is itself one of the values which the variable X is assumed to run through. Of course this definition is quite in order in the axiomatic set theory of Zermelo, also in Quine's theory, and in the simple theory of types as well. But in the case of such theories we have the question of consistency. The older and more natural point of view was that we should be able to set up a kind of reasoning which could be considered reliable so that we were assured a priori that contradiction would never arise. If we should try to set up such a logic, then the ramified theory of types, a theory where non-predicative definitions are excluded, might be assumed to be the correct one. It could be reasonable to assume that this theory is really a perfectly reliable one. Then, if we could believe this, a proof of consistency of this theory would be something out of the way, namely unnecessary and without point, because the reasoning yielding this proof could not be considered more reliable than the theory itself.
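The two definitions of "descendant" can be compared on a small finite domain. The following Python sketch is only an illustration under stated assumptions: the five-person family, the names `people`, `child_of` and both function names are hypothetical. The first function iterates the child relation (the definition using a number n); the second intersects all classes X that contain P and are closed under "child of" (the Frege-Dedekind form), which is why it quantifies over every subset of the domain, including the class it ends up defining.

```python
from itertools import combinations

people = ["P", "a", "b", "c", "q"]
# Each pair (x, y) is read as "x is a child of y". Hypothetical sample data.
child_of = {("a", "P"), ("b", "a"), ("c", "q")}

def descendants_by_iteration(p):
    """All x with Ch^n(x, p) for some n >= 1, found by repeated steps."""
    found, frontier = set(), {p}
    while frontier:
        frontier = {x for (x, y) in child_of if y in frontier} - found
        found |= frontier
    return found

def descendants_by_intersection(p):
    """Intersection of all classes X with p in X and X closed under child_of."""
    result = set(people)
    for r in range(len(people) + 1):
        for combo in combinations(people, r):
            X = set(combo)
            closed = all(x in X for (x, y) in child_of if y in X)
            if p in X and closed:
                result &= X
    return result

by_steps = descendants_by_iteration("P")
by_classes = descendants_by_intersection("P")
```

Note that the class-based definition, as worded in the text, counts P itself (P belongs to every admissible X), so on this data `by_classes` equals `by_steps` together with P.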

In the ramified theory of types we have, just as in the simple theory, a distinction of type such that $a \in b$ only has a meaning when the type of b is a unit more than that of a. However we have also a distinction of order between objects of the same type. Thus if a class of objects of type zero is defined in such a way that only quantifiers extended over objects of type zero are used, then this class is of first order. If a class, still of objects of type 0, is defined so that beside eventual quantifiers extended over objects of type 0, there are also quantifiers extended over the just mentioned classes of order 1, then this class is said to be of order 2, and so on. A similar distinction of order must take place for the objects of type 1, 2, .... But there are even further distinctions, because a class of objects of type 0, say, can also be defined by a logical expression containing quantifiers extended over objects of type 2 or even higher types. I shall, however, not try to go into further detail in this rather complicated affair, but rather give some examples of the kind of reasoning that is possible when we proceed in a predicative manner.


As a first example we may look at the proof of the Bernstein theorem of equivalence. We had sets $M$, $M'$, $M_1$ such that

$M \sim M', \qquad M' \subseteq M_1 \subseteq M$

and we proved the existence of a 1-1-correspondence between $M$ and $M_1$. In the proof of this which I gave earlier I used, however, at one point a non-predicative definition, namely, reckoning DT as a subset of M in the same meaning as the diverse elements of T. If we assume that the correspondence between $M$ and $M'$ is of 1st order, $M$, $M'$ and $M_1$ sets of 1st order, and we let T be the set (which of course is of type one unit higher than the type of $M$, $M'$, $M_1$) of all subsets A of 1st order such that for $Q = M_1 - M'$

$Q \subseteq A, \qquad A' \subseteq A,$

then DT is a subset of 2nd order and the earlier conclusion that $A_0 = DT$ is $\in T$ is no longer valid. Nevertheless we may prove the identity

$A_0 = Q + A_0'$

which we obtained in the earlier proof, but it must now be shown in a different way. Let us here write D instead of $A_0$. Then I shall first show that we have

$D = Q + D'.$

Let us assume that a d existed such that

$d \in D$, but $d \notin Q \;\&\; d \notin D'.$

The assumption $d \notin D'$ means that an $X \in T$ exists such that $d \notin X'$, because $D'$ is just the intersection of all $X'$, where $X \in T$. On the other hand we have $d \in X$ and $d \notin Q$. Now the set

$Y = X - \{d\}$

is of order 1 just as X and still Q is $\subseteq Y$. Let y be $\in Y$. Then $y \in X$, whence $y' \in X$ because $X' \subseteq X$. Hence $y' \in Y$, because $y'$ cannot be = d, since $d \notin Y'$ and $y' \in Y'$. Thus we have proved that

$Q \subseteq Y$ and $Y' \subseteq Y$

so that $Y \in T$. Now we had $d \notin Y$, whence $d \notin D$, which is a contradiction. Therefore I have shown that if $d \in D$, then $d \in Q \lor d \in D'$, that is

(1) $D \subseteq Q \cup D'.$

Since $Q \subseteq A$ for every $A \in T$ we have $Q \subseteq D$, and since $A' \subseteq A$ for every $A \in T$ we get $D' \subseteq D$. Thus

(2) $Q \cup D' \subseteq D.$

(1) and (2) then yield as before

$D = Q + D'$

and the remaining part of the proof can be carried out just as before.

There are however also theorems in the usual set theory which are no longer provable in predicative set theory. As an example I shall mention Cantor's theorem that UM always possesses higher cardinality than M. We must replace M by EM of course, so that we would have to try to prove the nonexistence of a 1-1-correspondence between UM and EM. Our earlier proof was essentially due to the possibility of deriving a contradiction by considering the set N of all $m \in M$ such that, if F was the assumed correspondence, $m \notin X$ where X was the subset of M corresponding to {m} by F, that is, $(X, \{m\}) \in F$. Translating the last phrase into logical symbols we have

$m \in N \leftrightarrow (X)\big(X \in UM \rightarrow ((X, \{m\}) \in F \rightarrow m \notin X)\big).$

Since this expression contains the quantifier (X) extended over all sets X of order 1, say, the defined set N of elements m is of order 2. But then we cannot substitute N for X and the derived contradiction disappears. Then Cantor's theorem is no longer provable as before. One might perhaps think that it could be proved in a quite different way, but that does not seem to be the case. In my opinion one has little reason to be worried because of the necessity to drop this theorem. Indeed the distinction of order compensates for the fact that we don't have the usual distinction of cardinality.
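For comparison, the classical diagonal step can be checked exhaustively on a small finite set. This Python sketch is an illustration only; the names `powerset`, `diagonal` and the three-element M are assumptions for the example. It shows the substitution that the predicative theory blocks: the set N, defined with a quantifier over all subsets, is itself treated as one of the subsets that F might assign.

```python
from itertools import combinations, product

def powerset(M):
    """All subsets of M as frozensets."""
    items = sorted(M)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def diagonal(M, F):
    """F maps each m in M to a subset of M; returns N = {m : m not in F[m]}."""
    return frozenset(m for m in M if m not in F[m])

M = {0, 1, 2}
subsets = powerset(M)

# Try every possible assignment of a subset to each element of M and
# confirm that the diagonal set is never among the assigned subsets.
missed_every_time = all(
    diagonal(M, dict(zip(sorted(M), choice))) not in choice
    for choice in product(subsets, repeat=len(M))
)
```

In the classical setting N always differs from every F[m] at m itself. In the ramified setting the same N is of order 2 while the assigned sets are of order 1, so the comparison inside `not in choice` has no counterpart.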

As a further example of predicative reasoning I shall develop elementary arithmetic, basing it as before on a definition of the simple infinite sequence, now, however, taking into account the order distinction. I prefer now to talk about classes, relations, etc., instead of sets. Also I think the considerations will be easier if I use suffixes to denote the different orders. To begin with I assume that we have a class M and a binary relation $f_1(x,y)$ both of order 1. The relation $f_1$ is supposed to be a 1-1-correspondence. The identity relation x = y is assumed to be a relation of order 1; but for simplicity I assume the axiom

$(x = y) \rightarrow (\phi(x) \rightarrow \phi(y))$

valid for $\phi$ of arbitrary orders. Then we assume

$f_1(x,y) \;\&\; f_1(z,y) \rightarrow (x = z)$

$f_1(x,y) \;\&\; f_1(x,u) \rightarrow (y = u)$

For simplicity I denote y, whenever $f_1(x,y)$ takes place, by $x'$. The class of 1st order consisting of all $x'$, x running through $X_1$, I denote by $X_1'$. Then I assume that $M' \subseteq M$ and 0 may denote an element of M not in $M'$. I denote by $N_2$ the class defined thus:

$n \in N_2 \leftrightarrow (X_1)\big(0 \in X_1 \;\&\; (x)(x \in X_1 \rightarrow x' \in X_1) \rightarrow (n \in X_1)\big)$

or, as I now prefer to write it,

$N_2(n) \leftrightarrow (X_1)\big(X_1(0) \;\&\; (x)(X_1(x) \rightarrow X_1(x')) \rightarrow X_1(n)\big).$

The class of type 2 whose elements are all $X_1$ for which $X_1(0) \;\&\; (x)(X_1(x) \rightarrow X_1(x'))$ may be denoted by T. Similarly $N_3$ is defined thus:

$N_3(n) \leftrightarrow (X_2)\big(X_2(0) \;\&\; (x)(X_2(x) \rightarrow X_2(x')) \rightarrow X_2(n)\big),$

etc. Corresponding to these definitions we have the following principles of induction. If a class $X_r$ of order r contains 0 and besides x always contains $x'$, then $X_r$ contains the whole class $N_{r+1}$. We may regard $N_2, N_3, \ldots$ as successively sharpened determinations of the natural number series.

Now I shall show how we can define a ternary relation of second order, $S_2(x,y,z)$, such that, conceiving $S_2(x,y,z)$ as x + y = z, we obtain the ordinary theorems of addition.


Let us consider the ternary relations of first order $X_1(x,y,z)$ with the two properties

1) $(x)X_1(x,0,x)$, 2) $(x)(y)(z)(X_1(x,y,z) \rightarrow X_1(x,y',z'))$.

They constitute a class Tr of type 2. These have an intersection $S_2(x,y,z)$ and trivially we have

$(x)S_2(x,0,x)$ and $(x)(y)(z)(S_2(x,y,z) \rightarrow S_2(x,y',z'))$.
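The intersection of all relations with properties 1) and 2) is the least relation closed under them, and on a bounded range of ordinary integers it can be generated directly. The Python sketch below is an illustration under that assumption: it works with integers 0..bound in place of the abstract sequence, x' is read as x + 1, and the name `least_addition_relation` is hypothetical. It says nothing about orders; it only shows that the two closure rules produce exactly the triples with x + y = z.

```python
def least_addition_relation(bound):
    """Least set of triples (x, y, z), entries in 0..bound, that contains
    every (x, 0, x) and with (x, y, z) also contains (x, y + 1, z + 1)."""
    S = {(x, 0, x) for x in range(bound + 1)}        # property 1)
    frontier = set(S)
    while frontier:
        # property 2), applied only while the result stays within the bound
        new = {(x, y + 1, z + 1)
               for (x, y, z) in frontier if z + 1 <= bound} - S
        S |= new
        frontier = new
    return S

S = least_addition_relation(6)
expected = {(x, y, x + y)
            for x in range(7) for y in range(7) if x + y <= 6}
```

Here `S == expected` holds, and no triple of the form (x, y + 1, 0) is ever generated, in line with the statements proved next in the text.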

I shall prove such statements as

$(x)(N_2(x) \rightarrow (x = 0 \lor N_2'(x)))$ or in other words

$(x)(N_2(x) \rightarrow (x = 0 \lor (Ey)(N_2(y) \;\&\; (x = y')))).$

Further

$(x)(y)\overline{S_2}(x,y',0)$, $(x)(z)(S_2(x,0,z) \rightarrow (x = z))$ and $(x)(y)(S_2(x,y,0) \rightarrow x = 0 \;\&\; y = 0)$.

Proof of

$(x)(N_2(x) \rightarrow x = 0 \lor (Ey)(N_2(y) \;\&\; (x = y'))).$

Let us assume the existence of an individual a such that $N_2(a) \;\&\; (a \neq 0) \;\&\; \overline{N_2'}(a)$. Because of $N_2(a)$ we have for every $X_1 \in T$ that $X_1(a)$. Now let $X_1^*$ be $X_1 - \{a\}$. Then I shall show that for at least one $X_1$, $X_1^*$ would still have the properties 1) and 2) so that $X_1^* \in T$, whence $\overline{N_2}(a)$, a contradiction. Indeed we have $X_1^*(0)$ since $X_1(0)$ and $a \neq 0$. Further, if $X_1^*(\alpha)$, then $X_1(\alpha)$, whence $X_1(\alpha')$, $X_1$ being $\in T$, whence again $X_1^*(\alpha')$ unless $a = \alpha'$. Now there must be at least one $X_1 \in T$ for which this is not the case, because otherwise we should have $N_2'(a)$ contrary to the assumption concerning a. Since there is an $X_1^* \in T$ such that $\overline{X_1^*}(a)$, we should have $\overline{N_2}(a)$, which is a contradiction.

Proof of $\overline{S_2}(a,b',0)$ for arbitrary a and b. Let us assume $S_2(a,b',0)$. Then we have $X_1(a,b',0)$ for every $X_1 \in Tr$. Let $X_1^*$ be $X_1 - \{(a,b',0)\}$. Then $X_1^*$ still has the property 1), because (x,0,x) can never be = (a,b',0), 0 being $\neq$ every $y'$. However, $X_1^*$ also possesses the property 2). Indeed if $X_1^*(\alpha,\beta,\gamma)$, then $X_1(\alpha,\beta,\gamma)$, whence $X_1(\alpha,\beta',\gamma')$, whence $X_1^*(\alpha,\beta',\gamma')$, unless $(\alpha,\beta',\gamma')$ were = (a,b',0), which is impossible because $\gamma' \neq 0$. But $X_1^* \in Tr$ and $\overline{X_1^*}(a,b',0)$ yields $\overline{S_2}(a,b',0)$.

Proof of $S_2(a,0,c) \rightarrow (a = c)$. Let us assume $S_2(a,0,c) \;\&\; (a \neq c)$. Then for every $X_1 \in Tr$ we have $X_1(a,0,c)$. Let $X_1^*$ be $X_1 - \{(a,0,c)\}$. Then it is seen again that $X_1^*$ will still possess the two properties, so that $X_1^* \in Tr$. Since $\overline{X_1^*}(a,0,c)$, it follows that $\overline{S_2}(a,0,c)$, which is contrary to supposition.

Then the truth of $S_2(a,b,0) \rightarrow a = 0 \;\&\; b = 0$ follows from the last three statements.

Proof of $S_2(a,b',c') \rightarrow S_2(a,b,c)$. Let us assume for some a, b, c that $S_2(a,b',c') \;\&\; \overline{S_2}(a,b,c)$. Then for an arbitrary element $X_1$ of Tr we have $X_1(a,b',c')$, whereas for a certain $X_1$ we have $\overline{X_1}(a,b,c)$. Let $X_1^*$ be $X_1 - \{(a,b',c')\}$ for such an $X_1$. Then it is seen immediately that $X_1^*$ has the property 1). It has the property 2) as well. Indeed, let $X_1^*(\alpha,\beta,\gamma)$ be true. Then $X_1(\alpha,\beta,\gamma)$ is true, whence $X_1(\alpha,\beta',\gamma')$, whence $X_1^*(\alpha,\beta',\gamma')$, unless $(\alpha,\beta',\gamma') = (a,b',c')$, which however would mean $(\alpha,\beta,\gamma) = (a,b,c)$, but that is impossible because we have $\overline{X_1}(a,b,c)$ but $X_1(\alpha,\beta,\gamma)$. Hence $X_1^* \in Tr$ so that $\overline{X_1^*}(a,b',c')$ leads to $\overline{S_2}(a,b',c')$ contrary to supposition.


Proof of $S_2(0,b,c) \rightarrow (b = c)$. Let $X_1$ be $\in Tr$ and $X_1^*$ be what remains of $X_1$ when all triples (0,y,z) with $y \neq z$ are removed from $X_1$. Obviously $X_1^*$ is of order 1 just as $X_1$ is. I assert that also $X_1^* \in Tr$. Indeed for every triple $(\alpha,0,\alpha)$ we have $X_1(\alpha,0,\alpha)$ whence also $X_1^*(\alpha,0,\alpha)$. Otherwise $(\alpha,0,\alpha)$ would be of the form (0,y,z) with $y \neq z$, but that is not the case. Thus $X_1^*$ has the property 1). Let us assume $X_1^*(\alpha,\beta,\gamma)$. Then $X_1(\alpha,\beta,\gamma)$, whence $X_1(\alpha,\beta',\gamma')$, whence also $X_1^*(\alpha,\beta',\gamma')$ unless $(\alpha,\beta',\gamma')$ is of the form (0,y,z) with $y \neq z$, that is, $\alpha = 0$, $\beta' \neq \gamma'$. But then we should have $\overline{X_1^*}(\alpha,\beta,\gamma)$. Thus $X_1^* \in Tr$ and since $S_2(0,b,c) \rightarrow X_1^*(0,b,c)$ we have b = c.

Theorem 58. $(x)(y)(z)(S_2(x',y,z') \rightarrow S_2(x,y,z))$.

Proof. For each $X_1 \in Tr$ we let $X_1^*$ be what remains of $X_1$ when all triples $(x',y,z')$ are removed for which we have $X_1(x',y,z')$ but not $X_1(x,y,z)$, that is, $X_1^*(x',y,z') \leftrightarrow X_1(x',y,z') \;\&\; X_1(x,y,z)$. Further all triples (x,y,0) are removed for which x or y is $\neq 0$. Then $X_1^*$ has the property 1). Indeed for all $(\alpha,0,\alpha)$ we have $X_1(\alpha,0,\alpha)$, whence $X_1^*(\alpha,0,\alpha)$, because if $\alpha = \alpha_1'$, we have also $X_1(\alpha_1,0,\alpha_1)$. Now let us assume $X_1^*(\alpha,\beta,\gamma)$. Then $X_1(\alpha,\beta,\gamma)$ whence $X_1(\alpha,\beta',\gamma')$ whence $X_1^*(\alpha,\beta',\gamma')$, unless $(\alpha,\beta',\gamma')$ = a certain $(x',y,z')$ for which $X_1(x',y,z') \;\&\; \overline{X_1}(x,y,z)$. That would mean $X_1(\alpha,\beta',\gamma') \;\&\; \overline{X_1}(\alpha_1,\beta',\gamma)$ with $\alpha = \alpha_1'$. Let us first consider the case $\gamma \neq 0$, that is, $\gamma = \gamma_1'$ for a certain $\gamma_1$. Then because of $X_1^*(\alpha,\beta,\gamma)$ we have $X_1(\alpha,\beta,\gamma) \;\&\; X_1(\alpha_1,\beta,\gamma_1)$. But $X_1(\alpha_1,\beta,\gamma_1)$ yields $X_1(\alpha_1,\beta',\gamma)$ so that we get a contradiction. It remains for us to look at $X_1^*(\alpha,\beta,0)$. This requires $\alpha = \beta = 0$. But $X_1(0,0',0')$ is true and therefore also $X_1^*(0,0',0')$ because $(0,0',0')$ is not removed from $X_1$ by the construction of $X_1^*$. Thus $X_1^*$ has the property 2) as well, so that $X_1^* \in Tr$. Now let a, b, c be arbitrary. I assert that

$S_2(a',b,c') \rightarrow S_2(a,b,c).$

Let us assume $S_2(a',b,c') \;\&\; \overline{S_2}(a,b,c)$. Then there exists an $X_1 \in Tr$ such that $X_1(a',b,c') \;\&\; \overline{X_1}(a,b,c)$. We build the corresponding $X_1^*$ as above. Then we have

$X_1^* \in Tr$ and $\overline{X_1^*}(a',b,c')$,

whence

$\overline{S_2}(a',b,c')$

which is a contradiction.

Corollary. $(x)(y)(z)(S_2(x',y,z') \rightarrow S_2(x,y',z'))$.

Proof. $S_2(a',b,c') \rightarrow S_2(a,b,c) \rightarrow S_2(a,b',c')$.

I will only mention that such a statement as $(y)(N_2(y) \rightarrow (x)(Ez)X_1(x,y,z))$ is easily proved. I shall not make any use of that, but instead prove the following theorems.

Theorem 59. $(y)(N_3(y) \rightarrow (x)(z)(u)(S_2(x,y,z) \;\&\; S_2(x,y,u) \rightarrow (z = u)))$.

Proof. Let $C_2$ be the class of all y such that $(x)(z)(u)(S_2(x,y,z) \;\&\; S_2(x,y,u) \rightarrow (z = u))$. Clearly $C_2(0)$ is true, because $S_2(a,0,c)$ is only true for a = c. Now let $C_2(b)$ be true. If, then, for certain a, c, d we have $S_2(a,b',c) \;\&\; S_2(a,b',d)$, then according to a remark above, c must be = $c_1'$ for a certain $c_1$ and d = $d_1'$ likewise, whence $S_2(a,b,c_1) \;\&\; S_2(a,b,d_1)$, whence, because of $C_2(b)$, $c_1 = d_1$, whence c = d. Thus $C_2(0) \;\&\; (y)(C_2(y) \rightarrow C_2(y'))$ is true, whence the theorem, because of the definition of $N_3$.

Theorem 60. $(y)(N_3(y) \rightarrow (x)(Ez)S_2(x,y,z))$.

Proof. Let $C_2$ here be the class of all y such that $(x)(Ez)S_2(x,y,z)$. Obviously $C_2(0)$ is true. Let us assume $C_2(b)$ and let a be arbitrary. Then we have $S_2(a,b,c)$ for a certain c, whence $S_2(a,b',c')$, whence $C_2(b')$. Hence the theorem.

The last two theorems may be combined in the single statement

$(y)(N_3(y) \rightarrow (x)(E!z)S_2(x,y,z)),$

where E! means "there exists one and only one". Of course this yields in particular

$(x)(y)(N_3(x) \;\&\; N_3(y) \rightarrow (E!z)S_2(x,y,z)),$

but the question arises, whether the z here again is an element of $N_3$. I shall now show that this is really the case.

Let $C_2$ denote an arbitrary class of 2. order with the two properties 1) $C_2(0)$ and 2) $(x)(C_2(x) \rightarrow C_2(x'))$.

Then for every such class $C_2$ I construct another class $C_2^*$ thus:

$C_2^*(y) \leftrightarrow (x)(C_2(x) \rightarrow (E!z)(S_2(x,y,z) \;\&\; C_2(z))).$

Now I assert that $C_2^*$ has again the properties 1) and 2). The truth of $C_2^*(0)$ is immediately seen, because we have $S_2(x,0,x)$ and $C_2(x) \rightarrow C_2(x)$. Let us assume $C_2^*(b)$. Then for an arbitrary a with $C_2(a)$ we have a unique c such that $S_2(a,b,c)$ and $C_2(c)$. Hence $S_2(a,b',c') \;\&\; C_2(c')$, and according to a theorem above we cannot have $S_2(a,b',d)$ unless d = $c'$. Thus $C_2^*(b')$ follows from $C_2^*(b)$.

Theorem 61. $(x)(y)(N_3(x) \;\&\; N_3(y) \rightarrow (E!z)(S_2(x,y,z) \;\&\; N_3(z)))$.

Proof. According to the definition of $C_2^*$ we have for arbitrary $C_2$ of the supposed kind

$(x)(y)(C_2(x) \;\&\; C_2^*(y) \rightarrow (E!z)(S_2(x,y,z) \;\&\; C_2(z))).$

Now $N_3$ is $\subseteq C_2$ and $C_2^*$. Therefore

$(x)(y)(N_3(x) \;\&\; N_3(y) \rightarrow (E!z)(S_2(x,y,z) \;\&\; C_2(z))).$

Here $C_2$ is an arbitrary chain of 2. order, that is, a class of 2. order with the properties 1) and 2). Therefore we may just as well write

$(x)(y)\big(N_3(x) \;\&\; N_3(y) \rightarrow (E!z)(S_2(x,y,z) \;\&\; (X_2)(X_2(0) \;\&\; (u)(X_2(u) \rightarrow X_2(u')) \rightarrow X_2(z)))\big)$

which, by taking into account the definition of $N_3$, is just our theorem. In this way we have succeeded in obtaining a ternary relation $S_2(x,y,z)$ which in $N_3$ will play the role of addition, as I shall show.

Theorem 62. $(z)(N_3(z) \rightarrow (x)(y)(u)(v)(w)(S_2(x,y,v) \;\&\; S_2(v,z,u) \;\&\; S_2(y,z,w) \rightarrow S_2(x,w,u)))$.

Proof. Let $C_2(b)$ denote

$(x)(y)(u)(v)(w)(S_2(x,y,v) \;\&\; S_2(v,b,u) \;\&\; S_2(y,b,w) \rightarrow S_2(x,w,u)).$

Clearly $C_2$ is a class of second order. We have that $C_2(0)$ is true, because $S_2(v,0,u) \;\&\; S_2(y,0,w) \rightarrow (u = v) \;\&\; (y = w)$. Let $C_2(b)$ be true and let us assume $S_2(x,y,v) \;\&\; S_2(v,b',u) \;\&\; S_2(y,b',w)$. Then we have $u = u_1'$, $w = w_1'$ for some $u_1$, $w_1$ and $S_2(v,b,u_1) \;\&\; S_2(y,b,w_1)$ which, together with $S_2(x,y,v)$, because of $C_2(b)$, yields $S_2(x,w_1,u_1)$, whence $S_2(x,w,u)$. Thus the implication $C_2(b) \rightarrow C_2(b')$ is generally valid. Then the theorem follows from the definition of $N_3$. A fortiori we have

$(x)(y)(z)(u)(v)(w)\big(N_3(x) \;\&\; N_3(y) \;\&\; N_3(z) \;\&\; N_3(u) \;\&\; N_3(v) \;\&\; N_3(w) \rightarrow (S_2(x,y,v) \;\&\; S_2(v,z,u) \;\&\; S_2(y,z,w) \rightarrow S_2(x,w,u))\big).$

This is the associative law of addition.

Theorem 63. $(x)(N_3(x) \rightarrow (y)(z)(S_2(x,y,z) \rightarrow S_2(y,x,z)))$.

Proof. Let $C_2(a)$ be an abbreviation for

$(y)(z)(S_2(a,y,z) \rightarrow S_2(y,a,z)).$

Then $C_2(0)$ is true because, according to a result above, $S_2(0,y,z) \rightarrow (y = z)$ and $S_2(y,0,z) \leftrightarrow (y = z)$. Let us assume the truth of $C_2(a)$ and let $S_2(a',b,c)$ be true. Then by some results above we have $c = c_1'$ for a certain $c_1$ and $S_2(a',b,c_1') \rightarrow S_2(a,b',c_1')$ so that because of $C_2(a)$, we also get $S_2(b',a,c)$, whence $S_2(b,a',c)$. Therefore we have

$(y)(z)(S_2(a',y,z) \rightarrow S_2(y,a',z)),$

so that

$C_2(a) \rightarrow C_2(a').$

According to the definition of $N_3$, the theorem must be valid. A fortiori we have

$(N_3(x) \;\&\; N_3(y) \;\&\; N_3(z) \rightarrow (S_2(x,y,z) \rightarrow S_2(y,x,z))).$

This is the commutative law of addition.

Thus the ternary relation $S_2(x,y,z) \;\&\; N_3(x) \;\&\; N_3(y) \;\&\; N_3(z)$, which we can write $\Sigma_3(x,y,z)$ or z = x + y, is a relation of 3. order which has the ordinary properties of addition, in particular,

x + (y + z) = (x + y) + z, x + y = y + x.

Now let us define a relation "less than or equal to" of second order, namely,

$M_2(x,y) \leftrightarrow (Ez)S_2(x,z,y).$

Then inside $N_3$

Theorem 64. $M_2(a,b) \;\&\; M_2(b,c) \rightarrow M_2(a,c)$.

Proof. The hypothesis of the implication amounts to

$S_2(a,d,b) \;\&\; S_2(b,e,c)$

for some d and e. According to Theorem 60 there is an f such that $S_2(d,e,f)$. Then Theorem 62 furnishes $S_2(a,f,c)$, whence $M_2(a,c)$.

Theorem 65. $(y)(N_3(y) \rightarrow (x)(M_2(x,y) \lor M_2(y,x)))$.

Proof. Let $C_2(b)$ be $(x)(M_2(x,b) \lor M_2(b,x))$. Then $C_2(0)$ is true, because $M_2(0,x)$ is obviously true. Let us assume $C_2(b)$. If $M_2(x,b')$ is true, we have at once $C_2(b')$, and $M_2(x,b')$ is true if $M_2(x,b)$ is. Otherwise we have $M_2(b,x)$, that is, $(Ez)S_2(b,z,x)$. If $z \neq 0$, we have $z = z_1'$ and $S_2(b,z,x) \rightarrow S_2(b',z_1,x)$, that is, $M_2(b',x)$. If z = 0, we have x = b, whence $M_2(x,b')$. Thus $C_2$ is a chain of 2. order, and hence $(y)(N_3(y) \rightarrow C_2(y))$, which is the theorem.

It follows that $M_2$ will have the ordinary properties of the relation $\le$ in $N_3$.

Now in order to develop elementary arithmetic we must introduce multiplication. This can again be done by considering some ternary relations. It must be remarked, however, that these relations ought to be chosen as 1. order relations $Y_1(x,y,z)$. Otherwise we might have to make a transition to unnecessarily high orders of the number series. It would not be advantageous to take, for example, the relations $Z_2(x,y,z)$ which have the properties 1) $(x)Z_2(x,0,0)$ and 2) $(x)(y)(z)(Z_2(x,y,z) \;\&\; S_2(z,x,u) \rightarrow Z_2(x,y',u))$. It is better to introduce addition and multiplication simultaneously as follows. Let us consider all quaternary relations $U_1(x,y,z,u)$ such that $U_1$ is true only for u = 0 or 1 and has the properties

1) $(x)U_1(x,0,x,0)$, 2) $(x)U_1(x,0,0,1)$, 3) $(x)(y)(z)(U_1(x,y,z,0) \rightarrow U_1(x,y',z',0))$,

4) $(x)(y)(z)(U_1(x,y,z,1) \;\&\; U_1(z,x,u,0) \rightarrow U_1(x,y',u,1))$.

Then if $S_2(x,y,z)$ denotes the intersection of all $U_1(x,y,z,0)$ and $P_2(x,y,z)$ the intersection of all $U_1(x,y,z,1)$, one is able to show that in a suitable $N_n$ all of the ordinary principles of addition and multiplication are provable, x + y = z meaning $S_2(x,y,z)$ and xy = z meaning $P_2(x,y,z)$. However, I will not carry out all that here in detail, in particular for the reason that different procedures are possible.
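The simultaneous definition can be tried out on a bounded range of ordinary integers. The Python sketch below is an illustration under stated assumptions: integers 0..bound stand in for the number series, x' is read as x + 1, the fourth component is the tag 0 (sum) or 1 (product), the four rules are taken in the form 1) (x,0,x,0), 2) (x,0,0,1), 3) (x,y,z,0) gives (x,y+1,z+1,0), 4) (x,y,z,1) together with (z,x,u,0) gives (x,y+1,u,1), and the name `least_u_relation` is hypothetical. It generates the least relation closed under the rules and reads off the sum and product triples.

```python
def least_u_relation(bound):
    """Least set of quadruples (x, y, z, tag), entries in 0..bound,
    closed under the four rules for sums (tag 0) and products (tag 1)."""
    U = {(x, 0, x, 0) for x in range(bound + 1)}      # rule 1
    U |= {(x, 0, 0, 1) for x in range(bound + 1)}     # rule 2
    changed = True
    while changed:
        new = set()
        for (x, y, z, tag) in U:
            if tag == 0 and z + 1 <= bound:           # rule 3
                new.add((x, y + 1, z + 1, 0))
            if tag == 1 and y + 1 <= bound:           # rule 4
                for (a, b, u, t) in U:
                    if t == 0 and a == z and b == x:  # u = z + x
                        new.add((x, y + 1, u, 1))
        changed = not new <= U
        U |= new
    return U

U = least_u_relation(6)
S = {(x, y, z) for (x, y, z, tag) in U if tag == 0}
P = {(x, y, z) for (x, y, z, tag) in U if tag == 1}
```

On this range S consists of the triples with x + y = z and P of those with x · y = z, as far as the values stay within the bound. Rule 4 uses the sum part of the same relation, which is why the two operations are introduced together.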

One fact ought to be noticed: The relation $S_2(x,y,z)$, which in $N_3$ defined addition, does that also in $N_n$ for any n > 3, that is, every $N_n$ is closed with regard to this addition. Let us, for example, consider $N_4$. If $N_4(a)$ and $N_4(b)$, then $N_3(a)$ and $N_3(b)$ so that a unique c exists such that $S_2(a,b,c) \;\&\; N_3(c)$. But how can we conclude $N_4(c)$? This can be seen thus: Let $S_3(x,y,z)$ be the intersection of all $X_2(x,y,z)$ with the properties 1) and 2). Then we can prove in the same way as above that

$(x)(y)(N_4(x) \;\&\; N_4(y) \rightarrow (E!z)(S_3(x,y,z) \;\&\; N_4(z))).$

Furthermore let us write the z for which $S_3(x,y,z) \;\&\; N_4(z)$ as $x +' y$. Now it is obvious that $S_3(x,y,z) \rightarrow S_2(x,y,z)$. Hence, for arbitrary a and b such that $N_4(a)$ and $N_4(b)$, we get that

$c = a +' b \rightarrow c = a + b,$

so that the result of the operation $+'$ is the same as the result of +. In the same way the other operations we may introduce, such as multiplication, exponentiation, etc., all will retain their meaning for the natural number sequences of higher orders.

I must confine my remarks to these hints, which I nevertheless hope are sufficient to show that a purely logical development of arithmetic similar to that given by Dedekind in his work "Was sind und was sollen die Zahlen" is possible even in the ramified type theory.


If we turn to analysis it must be remarked that the classical form of it cannot be obtained. Indeed it will be necessary to distinguish between real numbers of different orders. A class of real numbers of 1. order which is bounded above possesses an upper bound, but this bound may then be a real number of order 2. Nevertheless a great part of analysis can be developed as usual, namely, the most useful part of it dealing with continuous functions, closed point-sets, etc. The reason for this is that it is often possible to prove theorems of reducibility, namely, theorems saying that a class (or relation) of a certain order coincides with one of lower order. I will not enter into this but only refer the reader to the book "Das Kontinuum" by H. Weyl, where he has developed such a kind of predicative analysis.

15. Lorenzen's operative mathematics

In more recent years the German mathematician P. Lorenzen has set forth a system of mathematics which in some respects resembles the ramified theory of types, but it has also one important feature in common with the simple theory of types, namely, that the simple infinite sequence and similar notions are characterized by an induction principle which is assumed valid within all layers of objects. Lorenzen namely talks about layers of objects, not of types or orders. To begin with he takes into account some original objects, say numerals, figures built up in a so-called calculus as follows. We have the rules of production

1
k → k1

which means that the object or symbol 1 is originally given and whenever we have a symbol or a string of symbols k we may build the string k1 obtained by placing 1 after k. He introduces the notion "system". A system is a finite set of symbols. The systems are obtained by the rules

x
X → X, x

The length or cardinal number of a system X is denoted by |X|. He gives the rules

|x| = 1
|X, x| = |X| 1

for these lengths. Now the explanation of the successive layers of language is as follows.

From certain originally given symbols called atoms, say $u_1, \ldots, u_n$, he constructs strings of symbols by the schema

$X \rightarrow X u_1$
...
$X \rightarrow X u_n$


Further, he introduces logical symbols, first $\wedge$, $\vee$, $\rightarrow$, $\neg$ denoting conjunction, disjunction, implication and negation respectively, then $\wedge_x, \wedge_y, \ldots$ which are universal quantifiers, $\vee_x, \vee_y, \ldots$ which are existential quantifiers, $\epsilon_\rho, \epsilon_\sigma$ which express membership, namely, that something belongs to a class, relation, etc., an operational symbol having the same meaning as Russell's $\hat{x}$, and finally $\mathfrak{I}_\rho, \mathfrak{I}_\sigma, \ldots$ which are called operators of induction. These last ones have the following significance:

Let $A_1, A_2, \ldots$ be propositional expressions built up from propositions $Y_1, Y_2, \ldots, Y_n \,\epsilon\, \rho$ while we have the schema of production

$A_1 \rightarrow x_{1,1}, x_{1,2}, \ldots,$

then $\mathfrak{I}_\rho$ written before this schema denotes the relation that is the set of all m-tuples which can be constructed by the schema.

The symbolic figures obtained in this way constitute what Lorenzen calls the first layer of language and denotes by $S_1$. Whenever $S_n$, the nth layer of language, has been constructed, he defines the (n+1)th layer $S_{n+1}$ as consisting of all figures belonging to $S_n$ together with all further ones which can be derived from them by the same means we used in deriving the first layer from the atoms $u_1, \ldots, u_n$.

By this procedure it is necessary to distinguish between variables in different layers, for example, by writing the number of the layer as a subscript just as I used an order subscript above in the ramified type theory. The construction of layers can, however, be continued transfinitely. Indeed, after having performed the construction of the layers $S_n$ with finite n, Lorenzen defines $S_\omega$, $\omega$ the least transfinite ordinal, as the union of all $S_n$, $n < \omega$. Now it becomes possible to introduce $S_{\omega+1}, S_{\omega+2}, \ldots$ and their union $S_{\omega \cdot 2}$, and so on. He can introduce all $S_\alpha$, where $\alpha$ is any constructed transfinite ordinal.

One sees the resemblance between this theory and the ramified theory of types. In both theories an expression containing a bound variable extended over a previously obtained range is considered as belonging to a new range of symbols. The presence of the symbols $\mathfrak{I}_\rho, \mathfrak{I}_\sigma, \ldots$ means that there is not, as in the previously treated systems, any attempt to reduce the inductive or recursive definitions to the explicit ones, an attempt which caused so much trouble above, in particular in the case of the predicative set theory. In accordance with this attitude in Lorenzen's system, the principle of complete induction remains unchanged by transition to higher layers.

It is obvious that the objects of a certain layer $S_\theta$ can be enumerated. In his book "Operative Logik und Mathematik" Lorenzen shows this in detail for $S_1$, and it is easy to see how that can be carried out for an arbitrary layer $S_\theta$. The formula giving this enumeration does, however, not belong to $S_\theta$ but to $S_{\theta+1}$. Sometimes, of course, a set belonging to $S_\theta$ may have an enumeration belonging to $S_\theta$. We may then say that the set is denumerable in $S_\theta$. Otherwise the set is nondenumerable in $S_\theta$. All this shows that the notion "denumerable" must be conceived in a relative sense. This result we also obtained by application of the Löwenheim theorem to the axiomatic set theory in so far as denumerable models must exist for any consistent set theory. But in Lorenzen's theory this relativism is obtained immediately.


In connection with this we may notice that the problem concerning the principle of choice disappears. Indeed, the enumeration in $S_{\theta+1}$ of the objects constituting the layer $S_\theta$ makes possible at once the simultaneous choice of one element from every set in $S_\theta$. On the other hand it is not certain that we can find a formula in $S_\theta$ furnishing such a choice for a set of sets in $S_\theta$. Thus we have again a relativity with regard to the existence of choice functions.

Now let us consider real numbers (defined, for example, as initial parts of the ordered set $R$ of rational numbers) and sets of reals, all belonging to the layer $S_\Theta$, where $\Theta$ is a limit number. Then it is possible to prove for each set $M$ of real numbers, $M$ as well as the elements of $M$ belonging to $S_\Theta$, that if $M$ is bounded below, it possesses a lower bound $y$ also in $S_\Theta$. Indeed $y$ is the intersection of all elements of $M$ considered as initial parts of $R$. Since $M \in S_\Theta$, we have $M \in S_\theta$, $\theta$ some ordinal $< \Theta$. In the definition of $y$ all occurring variables belong to $S_\theta$ but there is a universal quantifier extended over $S_\theta$. Thus $y$ is a real number occurring in $S_{\theta+1}$. However, since $\Theta$ is a limit number we have $\theta + 1 < \Theta$. Therefore the lower bound $y$ always again belongs to $S_\Theta$. More special theorems, such as the existence of a convergent subsequence of a bounded sequence of reals, and that every convergent sequence (in the sense of Cauchy) has a real number as limit, are easily proved.

The theory of neighborhoods and coverings is more difficult. In order to be able to develop the usual covering theorems, Lorenzen finds it necessary to take into account sets of real numbers belonging to essentially higher layers than the real numbers themselves. He chooses two limit numbers, $\Theta_1 < \Theta_2$. The considered real numbers shall all belong to $S_{\Theta_1}$, whereas sets of, and relations between, these reals are allowed to belong to $S_{\Theta_2}$. The classes and relations which already belong to $S_{\Theta_1}$ are called primary, those which belong to $S_{\Theta_2}$ but not $S_{\Theta_1}$ are called secondary. It may be noticed that by taking into account also the secondary sets we are enabled to say that all the reals in an interval constitute a set, namely, a secondary one. Indeed it is clear that all these numbers belonging to $S_{\Theta_1}$ constitute a set that occurs in $S_{\Theta_1+1}$. Similar remarks can be made for neighborhoods.

Lorenzen now succeeds in proving the Heine-Borel theorem, which here has the wording: To every primary covering, that is, a primary set of neighborhoods, one can find a finite covering, that is, a finite set of such neighborhoods.

A further important notion is that of a quasi-primary function: That $y = f(x_1, \ldots, x_n)$ is quasi-primary means that, whenever $x_1, \ldots, x_n$ are primary real numbers, that is, they belong to $S_{\Theta_1}$, $y$ is a primary real. Of course every primary function is quasi-primary, but the converse is not always true. Thus, for example, $x + y$ is quasi-primary but not primary. Indeed the set of all triples $(x, y, z)$ such that $x + y = z$, where $x, y, z$ run through $S_{\Theta_1}$, does not belong to $S_{\Theta_1}$, but to $S_{\Theta_1+1}$.

For the quasi-primary functions Lorenzen proves theorems analogous to the theorems in ordinary analysis concerning functions of real numbers. Thus he proves that a continuous quasi-primary function on a closed interval is uniformly continuous. Further he proves that the values of such a function on a closed interval are bounded and that the upper and lower bounds are attained. He also proves that such a function takes every value between two of its values. If a quasi-primary function has a derivative for every (primary) real number, then this derivative is again a quasi-primary function.

He also develops a theory of integration, defining first the Riemann integral, later also Lebesgue's. It might seem that a measure theory must be impossible in this system, because by ordinary concepts the measure should be 0 for denumerable sets, and here all sets are denumerable in a sufficiently high layer. However, the distinction between primary and secondary sets makes a definition of measure possible in such a way that the primary sets all get the measure 0, but not the secondary sets.

This system has one great advantage in distinction to the previous ones, namely, that the objects we are dealing with are all definitely and explicitly given. It is true of course that the unsolvability or even undecidability of many problems remains as before, but we know what we are talking about. In the previous theories it was at any rate not required that our considerations should be restricted to the definable or constructible objects.

16. Some remarks on intuitionist mathematics

Of great interest is the so-called intuitionism which above all is due to the Dutch mathematician L. E. J. Brouwer. This theory is essentially characterized by the requirement that an assertion of the existence of a mathematical object must contain a means of finding or constructing such an object. Further, the use of such a formal logical principle as "tertium non datur" is only justified if we have a decision procedure. The intuitionist critique of classical mathematics is similar to the critique of Kronecker who also declared that a great part of ordinary mathematics was only words. It would lead too far, however, if I should give in these lectures a detailed exposition of the intuitionist foundation of mathematics. I must confine my exposition here to a few remarks which I hope will give an idea of the intuitionist way of reasoning.

The conjunction $p \,\&\, q$ retains its usual meaning also in intuitionist logic. The disjunction $p \vee q$ can be asserted if and only if either $p$ can be asserted or $q$ can. The negation $\neg p$ shall mean that the assumption $p$ leads to a contradiction. The implication $p \rightarrow q$ means that we are in possession of a certain construction which will furnish a proof of $q$ as soon as a proof of $p$ is available. The assertion $(x)p(x)$ is justified if we possess a schema showing the property $p(x)$ for an arbitrary $x$, and $(Ex)p(x)$ can be asserted if we know an $x$ with the property $p$ or at least have a method for constructing such an $x$.

Since we have no general method to prove either $p$ or $\neg p$, the tertium non datur, $p \vee \neg p$, is not generally valid. It can be proved that $p \rightarrow \neg\neg p$ is generally true, but not the converse implication. Such differences in the propositional logic cause differences in predicate logic of course. As an interesting example of the difference in the classical and the intuitionist way of stating a theorem, I will take an example mentioned in the book "Intuitionism" of Heyting.

Let us define a real number $\rho$ by writing an infinite decimal fraction as follows. As long as no sequence of digits 0,1,2,3,4,5,6,7,8,9 has occurred in the development of $\pi = 3.14\ldots$ as a decimal fraction, there shall only be digits 3 in the development of $\rho$; however, if it should happen that the digits in the places $n - 9, \ldots, n$ should be just 0,1,...,9, then all digits after the $n$th shall be 0 in the development of $\rho$. Then it is easy to prove that

This can, in classical mathematics, be expressed thus:

P= Q 3.10*

However, this is not correct intuitionistically, because the last statement would mean that we are able to prove either that $\rho = \frac{1}{3}$ or that $\rho = \frac{10^n - 1}{3 \cdot 10^n}$ for a certain $n$. But in order to do that we would have to decide whether a sequence 0,1,...,9 occurs in the development of $\pi$ or not. This we are unable to do at present. This is an example of the circumstance that the two statements

$$(Ex)\,p(x), \qquad \neg(x)\,\neg p(x),$$

which are equivalent in classical logic, are not generally equivalent in intuitionist logic.
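The definition of this number can be followed digit by digit. The sketch below is only an illustration and not part of Heyting's or the present text: the function names `pi_digits` and `rho_digits` are hypothetical, and the digits of the decimal development are produced by Gibbons' spigot algorithm, a technique brought in here for convenience. It shows that any finite initial part of the number can be computed, although no finite computation of this kind decides which of the two rational values the number has.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def pi_digits() -> Iterator[int]:
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) one at a time.

    Gibbons' unbounded spigot algorithm, exact integer arithmetic only.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            # The right-hand sides use the old values of q, r and n.
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


def rho_digits(pi_decimals: Iterable[int], count: int) -> List[int]:
    """Return the first `count` decimal digits of the number defined above.

    `pi_decimals` supplies the digits of pi after the decimal point.
    The digit in place m is 3 unless the block 0,1,...,9 has already been
    completed at some earlier place n < m; in that case the digit is 0.
    """
    target = list(range(10))
    window: List[int] = []
    digits: List[int] = []
    cut = False  # True once the block 0,1,...,9 has ended at an earlier place
    for d in islice(pi_decimals, count):
        digits.append(0 if cut else 3)
        window = (window + [d])[-10:]  # keep the last ten digits seen so far
        if window == target:
            cut = True
    return digits


if __name__ == "__main__":
    decimals = pi_digits()
    next(decimals)  # drop the integer part 3
    print(rho_digits(decimals, 30))
```

The second function accepts any sequence of digits, so the cutting off can be tried on an artificial development in which the block 0,1,...,9 occurs early.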

Let a real number generator (abbreviated an rng) be any sequence of rational numbers $a_n$ such that for every positive integer $k$ we can find another positive integer $n$ such that

$$|a_{n+p} - a_n| < \frac{1}{k}$$

for all $p$. We put

$$a = b$$

when for every $k$ we can find $n$ such that

$$|a_{n+p} - b_{n+p}| < \frac{1}{k}$$

for all $p$. Further, $a \neq b$ may mean

$$\neg(a = b),$$

that is, the assumption $a = b$ leads to a contradiction. On the other hand $a \mathrel{\#} b$ shall mean that we know a $k$ and an $n$ such that, for all $p$,

$$|a_{n+p} - b_{n+p}| > \frac{1}{k},$$

while $a < b$ shall mean that we know a $k$ and an $n$ such that

$$b_{n+p} - a_{n+p} > \frac{1}{k}$$

for all $p$.
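The role of the known numbers k and n in the last two definitions may be illustrated by a small sketch. It rests on assumptions which are not in the text: an rng is represented by a Python function from the index n = 1, 2, ... to an exact rational, the names `apartness_witness_holds` and `less_witness_holds` are hypothetical, and p runs over 1, ..., p_max only. Such a finite test can refute a proposed pair k, n, but it can never replace the proof for all p which the definitions demand.

```python
from fractions import Fraction
from typing import Callable

# Assumption of this sketch: an rng is a function n -> a_n for n = 1, 2, 3, ...
Rng = Callable[[int], Fraction]


def apartness_witness_holds(a: Rng, b: Rng, k: int, n: int, p_max: int) -> bool:
    """Test |a_{n+p} - b_{n+p}| > 1/k for p = 1, ..., p_max (finite test only)."""
    bound = Fraction(1, k)
    return all(abs(a(n + p) - b(n + p)) > bound for p in range(1, p_max + 1))


def less_witness_holds(a: Rng, b: Rng, k: int, n: int, p_max: int) -> bool:
    """Test b_{n+p} - a_{n+p} > 1/k for p = 1, ..., p_max (finite test only)."""
    bound = Fraction(1, k)
    return all(b(n + p) - a(n + p) > bound for p in range(1, p_max + 1))


def toward_zero(n: int) -> Fraction:
    """Illustrative rng with general term 1/n."""
    return Fraction(1, n)


def toward_one(n: int) -> Fraction:
    """Illustrative rng with general term 1 - 1/n."""
    return Fraction(n - 1, n)


if __name__ == "__main__":
    # k = 2, n = 4 passes the test; k = 2, n = 1 fails already at p = 1.
    print(less_witness_holds(toward_zero, toward_one, 2, 4, 100))
    print(less_witness_holds(toward_zero, toward_one, 2, 1, 100))
```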


It is then possible to prove a lot of theorems about these rng. A real number is the set of all rng which are equal to a certain rng. The intuitionist notion of set will soon be explained below. I shall mention a few of the most important theorems about the rng. One proves that $a = b$ is equivalent to $\neg\neg(a = b)$, or, in other words, if the assumption $a \neq b$ leads to a contradiction, then $a = b$. Further, if $a \mathrel{\#} b$, then for every $c$ we have $a \mathrel{\#} c \,\vee\, b \mathrel{\#} c$. It is clear that $a \mathrel{\#} b \rightarrow a \neq b$. Further, $a \mathrel{\#} b$ is equivalent to $a < b \,\vee\, b < a$. Instead of $\neg(a < b)$ one writes $a \not< b$. Then we have that $a \not< b \;\&\; b > c \rightarrow a > c$.

Addition, subtraction and multiplication of the rng's $a$ and $b$ are defined by taking the rng with the general term

$$a_n + b_n, \quad a_n - b_n, \quad a_n b_n,$$

whereas the quotient $\frac{a}{b}$ is defined as $a \cdot \frac{1}{b}$ under the assumption $b \mathrel{\#} 0$, where $\frac{1}{b}$ is the rng $c$ whose general term is $c_n = \frac{1}{b_n}$ whenever $b_n \neq 0$ and $c_n = 0$ if $b_n = 0$. It is then trivial to prove the associative, commutative and distributive laws. It may be noticed that $a + b \mathrel{\#} 0 \rightarrow a \mathrel{\#} 0 \,\vee\, b \mathrel{\#} 0$. For a more thorough study of this subject I recommend Heyting's book.
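The termwise definitions above translate almost literally into a short sketch. As an assumption of the illustration, an rng is again represented by a Python function from the index n = 1, 2, ... to an exact rational; the function names are hypothetical. Note that the sketch does not check the assumption that b is apart from 0, which is a matter of proof and not of computation.

```python
from fractions import Fraction
from typing import Callable

# Assumption of this sketch: an rng is a function n -> a_n for n = 1, 2, 3, ...
Rng = Callable[[int], Fraction]


def add(a: Rng, b: Rng) -> Rng:
    """The rng with general term a_n + b_n."""
    return lambda n: a(n) + b(n)


def sub(a: Rng, b: Rng) -> Rng:
    """The rng with general term a_n - b_n."""
    return lambda n: a(n) - b(n)


def mul(a: Rng, b: Rng) -> Rng:
    """The rng with general term a_n * b_n."""
    return lambda n: a(n) * b(n)


def reciprocal(b: Rng) -> Rng:
    """The rng c with c_n = 1/b_n whenever b_n != 0, and c_n = 0 if b_n = 0."""
    def c(n: int) -> Fraction:
        bn = b(n)
        return 1 / bn if bn != 0 else Fraction(0)
    return c


def quotient(a: Rng, b: Rng) -> Rng:
    """a/b defined as a * (1/b); meaningful only when b is apart from 0."""
    return mul(a, reciprocal(b))


def half(n: int) -> Fraction:
    """Illustrative constant rng 1/2, 1/2, ..."""
    return Fraction(1, 2)


def toward_one(n: int) -> Fraction:
    """Illustrative rng with general term 1 - 1/n; its first term is 0."""
    return Fraction(n - 1, n)


if __name__ == "__main__":
    q = quotient(half, toward_one)
    print([q(n) for n in range(1, 6)])  # the first term uses the convention c_1 = 0
```

The convention $c_n = 0$ for $b_n = 0$ only serves to make every term of the sequence defined; it affects finitely many terms when $b$ is apart from 0.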

As an introduction to the intuitionist set theory it is convenient to define the notion ips, that is, infinitely proceeding sequence of natural numbers. We are dealing with an ips if we first choose a natural number $a_1$ and, for every $n$, as soon as $a_1, \ldots, a_n$ have been chosen, we choose $a_{n+1}$. What determines these choices, whether they obey a certain law or are made at random or more or less arbitrarily, is irrelevant. We are justified in saying that an ips is something that becomes, not that is. If we let a mathematical entity correspond to every finite initial sequence $a_1, \ldots, a_n$ of an ips, we obtain an infinitely proceeding sequence of such entities.

A set can be built in two ways: 1) There may be a common way of generating its elements, 2) one considers all elements having a common property. The sets which are obtained in the first manner are called spreads. The sets obtained according to the second point of view are called species.

The definition of a spread is as follows: One has two rules, a spread rule and a complementary rule. The spread rule $\Lambda$ determines a process for the generation of ips in the following way. 1) $\Lambda$ determines for every natural number $k$ whether it is allowed to be the first element of an ips or not. 2) Every allowed sequence $a_1, \ldots, a_{n+1}$ shall be generated from an earlier allowed sequence $a_1, \ldots, a_n$. 3) Whenever an allowed sequence $a_1, \ldots, a_n$ is given, the rule $\Lambda$ determines, for any natural number $k$, whether $a_1, \ldots, a_n, k$ is an allowed sequence or not. 4) To every admitted sequence $a_1, \ldots, a_n$ at least one natural number $k$ can be found such that $a_1, \ldots, a_n, k$ is an admitted sequence.

The complementary rule $\Gamma$ determines for every allowed sequence $a_1, \ldots, a_n$ a corresponding mathematical object $b_n$.
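The two rules can be pictured by a small sketch. Everything in it is an assumption made for illustration: the spread rule is represented by a decidable Python predicate on finite sequences, the particular rule `binary_law` (only the numbers 1 and 2 are allowed) and the complementary rule `binary_fraction` are hypothetical examples not taken from the text, and the natural numbers are searched only within a finite pool of candidates.

```python
from fractions import Fraction
from typing import Callable, Iterable, List, Tuple

Seq = Tuple[int, ...]
SpreadLaw = Callable[[Seq], bool]


def binary_law(seq: Seq) -> bool:
    """Hypothetical spread rule: a finite sequence is allowed iff every term is 1 or 2."""
    return all(term in (1, 2) for term in seq)


def binary_fraction(seq: Seq) -> Fraction:
    """Hypothetical complementary rule: a_1, ..., a_n goes to the sum of (a_i - 1) / 2^i."""
    return sum((Fraction(term - 1, 2 ** i) for i, term in enumerate(seq, start=1)),
               Fraction(0))


def allowed_extensions(law: SpreadLaw, seq: Seq, candidates: Iterable[int]) -> List[int]:
    """Conditions 1) and 3): decide for each candidate k whether seq followed by k is allowed."""
    return [k for k in candidates if law(seq + (k,))]


def allowed_sequences(law: SpreadLaw, length: int, candidates: Iterable[int]) -> List[Seq]:
    """Condition 2): every allowed sequence is generated from an earlier allowed one."""
    pool = list(candidates)
    level: List[Seq] = [()]
    for _ in range(length):
        level = [seq + (k,) for seq in level for k in allowed_extensions(law, seq, pool)]
    return level


if __name__ == "__main__":
    pool = range(1, 6)  # finite pool of candidates, an assumption of the sketch
    for seq in allowed_sequences(binary_law, 3, pool):
        # Condition 4), checked here only for these finitely many sequences.
        assert allowed_extensions(binary_law, seq, pool)
        print(seq, binary_fraction(seq))
```

Since in this example only finitely many continuations are allowed at each step, the spread sketched here is finitary.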

Some elucidating examples, taken from Heyting's book, may be suitably mentioned here.

1) Let $r_1, r_2, \ldots$ be an enumeration of all rational numbers. We build a spread $M$ by letting the rule $\Lambda_M$ be this: Every natural number is admitted as $a_1$. Whenever $a_1, \ldots, a_n$ is an admitted sequence, $a_1, \ldots, a_n, a_{n+1}$ shall be admitted if and only if


The rule $\Gamma_M$ shall be: To every admitted sequence $a_1, \ldots, a_n$ we let correspond the rational number $r_{a_n}$.

It is easy to see that the elements of $M$ are rng, and indeed, if $c$ is an arbitrary rng, we can find an element $m$ of $M$ such that $m = c$. Thus $M$ is simply the continuum consisting of all rng.

2) If the rule $\Lambda_M$ in example 1 is restricted by adding the requirement $0 < r_{a_n} \leq 1$ for every $n$, then $M$ is the spread consisting of all rng $x$ such that

3) If the rule $\Lambda_M$ in example 2 is further restricted by the requirement that for each $n > 1$ we shall have

1 r2" r«

then M will consist of all rng y such that 0 < y < 1.

It is evident that by changing the rules $\Lambda_M$ and $\Gamma_M$ one can obtain the most varied spreads of rng.

A simple example of a species is the notion real number. A real number is the species whose elements are all rng equal to a given one. A general remark is that the definition of an element of a species must always precede the definition of the species in order to avoid circular definitions.

Also in the intuitionist theory we have the operations of union and intersection of two species. If $\in$ as usual means the membership relation we have the definitions

$S \subseteq T$ stands for $(x)(x \in S \rightarrow x \in T)$

$S = T$ means $(S \subseteq T) \;\&\; (T \subseteq S)$.

Further we have for arbitrary $x$ the equivalences

$$(x \in S \cap T) \leftrightarrow (x \in S) \;\&\; (x \in T), \qquad (x \in S \cup T) \leftrightarrow (x \in S) \vee (x \in T).$$

Letting $\notin$ mean the negation of $\in$ in the intuitionist sense, we have the following definition of the difference species $S - T$:

$$(x \in S - T) \leftrightarrow (x \in S) \;\&\; (x \notin T).$$

It must then be noticed that we don't always have $S = T \cup (S - T)$. That is only the case if $T \subseteq S$ and we are able, for every $x \in S$, to prove either $x \in T$ or $x \notin T$. A subspecies $T$ of $S$ is called detachable when we possess such a decision method to decide for any $x \in S$ whether it is $\in T$ or not.

A characteristic notion is "S is congruent to T". That means

$$\neg(Ex)\big((x \in S \;\&\; x \notin T) \vee (x \notin S \;\&\; x \in T)\big),$$

which can also be written

$$\neg(Ex)(x \in S \;\&\; x \notin T) \;\&\; \neg(Ex)(x \notin S \;\&\; x \in T).$$

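As an example of the use of this notion I shall mention the theorem:

Let $T \subseteq S$ and $S' = T \cup (S - T)$. Then $S$ and $S'$ are congruent.

Proof. First we have $S' \subseteq S$ because $T \subseteq S$ and $S - T \subseteq S$. Hence $\neg(Ex)(x \notin S \;\&\; x \in S')$. In detail, this statement is equivalent to $\neg(Ex)(x \notin S \;\&\; (x \in T \vee (x \in S \;\&\; x \notin T)))$, which again is equivalent to $\neg(Ex)((x \notin S \;\&\; x \in T) \vee (x \notin S \;\&\; x \in S \;\&\; x \notin T))$, which is equivalent to $\neg(Ex)(x \notin S \;\&\; x \in T)$, which follows from $(x)(x \in T \rightarrow x \in S)$. Therefore it remains only to prove that $\neg(Ex)(x \in S \;\&\; x \notin S')$. Assume $x \in S$ and $x \notin S'$. If $x \in T$, then $x \in S'$, which is a contradiction; hence $x \notin T$. But then $x \in S \;\&\; x \notin T$, that is, $x \in S - T$, so that $x \in S'$, which again is a contradiction.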

Simple examples of detachable subspecies of the natural number sequence are given by the even or the odd numbers. The linear continuum can be shown to have no other detachable subspecies than itself and the null species.

A species is said to be finite if there is a 1-to-1 correspondence between it and an initial part $1, \ldots, n$ of the natural number series. It is called denumerable if there is such a correspondence between the species and the whole number series. A species is called numerable if it can be mapped onto a detachable subspecies of the sequence of natural numbers.

An important notion is "finitary spread" or, more briefly, "fan". A fan is a spread with such a spread law that there are only finitely many allowed first terms, and for every $n$ every admitted sequence with $n$ terms has only a finite number of sequences with $n + 1$ terms as admitted continuations. Above all the so-called fan theorem is important here. It says that if $\varphi(\alpha)$ is an integral-valued function of $\alpha$, $\alpha$ varying through the different elements of the fan, then the value of $\varphi$ is already determined by a finite initial sequence of $\alpha$. Therefore, if $\varphi(\alpha_1) = m$, there exists an $n$ such that $\varphi(\alpha_2) = m$ as often as $\alpha_2$ has the same first $n$ terms as $\alpha_1$. An important application of the fan theorem is the proof of the statement that every function which is continuous on a bounded and closed point species is uniformly continuous on the point species. Further, such covering theorems as that of Heine-Borel can be proved. However, not all of the theorems of classical analysis can be proved in intuitionist mathematics.
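What it means that the value of such a function is determined by a finite initial sequence can be made concrete for a very simple fan. The sketch below is only an illustration and no proof of the fan theorem. It assumes that the function is given as a Python function which reads the terms of its argument one at a time through a call `alpha(i)` with the index counted from 0, that the fan consists of all sequences of the numbers 0 and 1, and that the names `uniform_modulus`, `prefix_oracle` and `first_one_among_three` are hypothetical. It searches for the least n such that the first n terms always suffice.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

Element = Callable[[int], int]  # i -> term number i of the sequence, i = 0, 1, 2, ...


class NeedMoreTerms(Exception):
    """Raised when a function asks for a term beyond the given initial sequence."""


def prefix_oracle(prefix: Tuple[int, ...]) -> Element:
    """An element of the fan of which only the initial sequence `prefix` is known."""
    def alpha(i: int) -> int:
        if i >= len(prefix):
            raise NeedMoreTerms(i)
        return prefix[i]
    return alpha


def uniform_modulus(phi: Callable[[Element], int],
                    choices: Sequence[int],
                    max_n: int) -> Optional[int]:
    """Least n <= max_n such that phi never reads beyond the first n terms, else None."""
    for n in range(max_n + 1):
        try:
            for prefix in product(choices, repeat=n):
                phi(prefix_oracle(prefix))
        except NeedMoreTerms:
            continue  # some initial sequence of length n was too short
        return n
    return None


def first_one_among_three(alpha: Element) -> int:
    """Illustrative function: index of the first term 1 among the first three terms, else 3."""
    for i in range(3):
        if alpha(i) == 1:
            return i
    return 3


if __name__ == "__main__":
    print(uniform_modulus(first_one_among_three, (0, 1), 10))
```

The search is possible only because a fan has finitely many admitted sequences of each length, which is the feature of a finitary spread on which the theorem rests.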

I must confine my exposition of intuitionism to these scattered remarks. A more thorough exposition would require a more complete treatment of intuitionist logic, and that would take more space than I have at my disposal here.

17. Mathematics without quantifiers

In all the theories we have treated above we have made use of the logical quantifiers, the universal one and the existential one. We have used them without scruples even in the case of an infinite number of objects. There is now a way of developing mathematics, in particular arithmetic, without the use of these operations which, in the case of an infinite number of objects, may be considered as an extension or extrapolation of conjunction and disjunction in the finite case. If we shall really consider the infinite as something becoming, something not finished or finishable, one might argue that we ought to avoid the quantifiers extended over an infinite range. Such a theory is possible. I myself published in 1923 a first beginning of such a strict finitist mathematics. I treated arithmetic, showing that by the use of free variables for general statements, basing the theory on the principles of definition by recursion and proof by complete induction, ordinary arithmetic could be developed in a very natural way. Later this theory, called Recursive Arithmetic, has been more perfectly formalized, first in Hilbert-Bernays, "Grundlagen der Mathematik", Vol. 1, 1934, § 7, later also by H. B. Curry (Amer. J. Math. Vol. 63, 1941, pp. 263-282). But the most complete exposition of this kind of mathematics has been given by R. L. Goodstein. He has extended the use of these purely finitist methods also to analysis. However, since this kind of mathematics rather avoids set theory in its proper sense than replaces it by a new form of it, I find no reason to pursue this subject further in these lectures on set theory.

18. The possibility of set theory based on many-valued logic

It is well known that it is possible to set forth logical calculi, both propositional calculi and predicate calculi as well, where the statements can have more than the two truth values in classical logic. It is then natural to ask if it should not perhaps be easier to obtain a consistent set theory by taking into account many-valued logics. One might think that it could then perhaps be possible to avoid the distinction of type (and order), even if we maintained a general axiom of comprehension allowing the greater number of truth values. I myself have investigated the possibility of using truth functions of the kind proposed by Łukasiewicz. My results are published in a paper "Bemerkungen zum Komprehensionsaxiom" (Zeitschr. f. math. Logik und Grundlagen d. Math., Bd. 3, S. 1-17 (1957)). The basic logic is as follows: The truth values are numbers between 0 and 1. The values of $p \,\&\, q$, $p \vee q$, $\neg p$ are respectively min (value of p, value of q), max (value of p, value of q), 1 - value of p. Further the value of $(x)p(x)$ is the minimum of the values of $p(x)$ for the diverse $x$. The value of $(Ex)p(x)$ is the maximum of the values of $p(x)$. In the case of finitely many truth values they are the diverse multiples of the least one $\neq 0$. Some of my results are: If we shall have an unrestricted axiom of comprehension, a consistent theory is impossible if the number of truth values is finite. On the other hand, it seems to be possible to obtain a consistent set theory with an unrestricted axiom of comprehension if all rational numbers $\geq 0$ and $\leq 1$ are allowed as truth values. I was able to prove that a rudimentary set theory, where the axiom of comprehension

$$(Ey)(x)(x \in y \leftrightarrow \varphi(x))$$

is only used in the case that $\varphi(x)$ is built up from the atomic membership propositions by use of the logical connectives $\&$, $\vee$, $\neg$ alone, is consistent. It ought to be noticed, however, that in any set theory where we use quantifiers extended over the whole domain, the sets introduced by the axiom of comprehension are defined relative to the total domain, so that the whole theory in that respect is circular. If we want to avoid circularity, we must accept a distinction of the objects we are dealing with into types, orders or layers, or whatever we prefer to call these subdivisions of our domain. In any case, research concerning set theories based on many-valued logic must be continued before we can say whether it is really promising or not.
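The truth functions of the basic logic described above are easily written down. The following sketch is an illustration under assumptions which are not in the text: truth values are represented by exact rationals, the quantifiers are evaluated over a finite, non-empty domain given as a Python list (over an infinite domain a minimum or maximum need not be attained), and all function names are hypothetical. It does not touch the axiom of comprehension itself.

```python
from fractions import Fraction
from typing import Callable, Iterable, List, TypeVar

X = TypeVar("X")


def truth_values(m: int) -> List[Fraction]:
    """The m truth values 0, 1/(m-1), ..., 1, the multiples of the least one != 0."""
    if m < 2:
        raise ValueError("at least the two values 0 and 1 are needed")
    return [Fraction(i, m - 1) for i in range(m)]


def conj(p: Fraction, q: Fraction) -> Fraction:
    """Value of p & q: the minimum of the two values."""
    return min(p, q)


def disj(p: Fraction, q: Fraction) -> Fraction:
    """Value of p v q: the maximum of the two values."""
    return max(p, q)


def neg(p: Fraction) -> Fraction:
    """Value of the negation of p: 1 minus the value of p."""
    return 1 - p


def forall(pred: Callable[[X], Fraction], domain: Iterable[X]) -> Fraction:
    """Value of (x)p(x) over a finite, non-empty domain: the minimum of the values."""
    return min(pred(x) for x in domain)


def exists(pred: Callable[[X], Fraction], domain: Iterable[X]) -> Fraction:
    """Value of (Ex)p(x) over a finite, non-empty domain: the maximum of the values."""
    return max(pred(x) for x in domain)


if __name__ == "__main__":
    values = truth_values(4)  # 0, 1/3, 2/3, 1
    # A hypothetical assignment of truth values to p(x) on a three-element domain.
    table = {"a": values[1], "b": values[3], "c": values[2]}
    print(forall(table.get, table), exists(table.get, table))
    print(neg(values[1]), conj(values[1], values[2]), disj(values[1], values[2]))
```

With these definitions the value of the negation of $(x)p(x)$ agrees with the value of $(Ex)\neg p(x)$ on every finite domain, since 1 - min equals the max of the values 1 - p(x).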
