Discrete Mathematics I (CS127)

Lecture Notes

Alexander Tiskin

University of Warwick
Autumn Term 2004/05

This course introduces some of the fundamental mathematical ideas that are used in the design and analysis of computer systems and software. The course makes you familiar with basic concepts and notation, helps you to develop a good understanding of mathematical proofs, and enables you to apply mathematics to solving computer science problems.

Problem sheets and seminars

The course is accompanied by a series of problem sheets relating to topics covered in the lectures. To develop proper understanding it is essential that you try to solve these problems during your own private study. The seminars provide an opportunity to get help with difficulties experienced in tackling the problem sheets, or with understanding the material from lectures. Please sign up for a weekly seminar at a time which suits you, and do attend it. Your performance at seminars will not be assessed, so nothing can prevent you from showing your solutions, whatever your confidence level in them might be. Confidence tends to grow with practice, and so does your exam potential.

Lecture notes and books

The lecture notes are self-contained, but you may find it helpful also to consult some books. The library contains several which cover all or part of the course syllabus, and exploration of the catalogue and shelves is recommended. The three books listed below are all suitable. They are in the library and should be available in the University bookshop. They cover the material in different ways and in different style. It is suggested that you look at them all, to find the one you find most accessible.

• K. A. Ross and C. R. B. Wright, Discrete Mathematics (5th ed.), Prentice Hall, 2003.

• K. H. Rosen, Discrete Mathematics and Its Applications (5th ed.), McGraw-Hill, 2003.

• J. K. Truss, Discrete Mathematics for Computer Scientists (2nd ed.), Addison-Wesley, 1999.

Another book well worth considering is

• E. Bloch, Proofs and Fundamentals: a First Course in Abstract Mathematics, Birkhäuser, 2002.

It is less suitable as general reference for the course material, but instead concentrates on what is arguably its most important aspect: the concept of a proof. It is very clearly written, and in many respects complements the books on the course’s main reading list.

Electronic resources

As the course progresses, the material will be available on the course website: http://www.dcs.warwick.ac.uk/~tiskin/teach/dm1.html . The Rosen book has a website of its own: http://www.mhhe.com/rosen .

A forum (discussion group) on Warwick Forums has been set up to exchange messages relevant to the course. In the past, it proved to be a useful tool for communication within the CS127 student population, and also between students and tutors. The forum is available at http://forums.warwick.ac.uk . The University IT Services should be able to help in case of any problems with accessing this forum. As with all discussion groups, its abuse will not be tolerated.

Assessment

One of the main challenges of the course is the lack of continuous coursework assessment. This means that you have to work hard, without being forced to. The course is assessed by a two-hour examination in week 1 of Summer Term. Results of this and other exams will be announced at the end of the academic year.

A new element of the course introduced last year is the class test, which will be held in week 7 of Autumn Term. The test will consist of a one-hour paper with 20 “true or false” questions, to be answered on specially prepared sheets, which then will be scanned and marked automatically. The resulting mark will not contribute to your official course assessment, and the class test itself is not mandatory. However, it is strongly recommended to take the test, in order to get feedback on your progress and to prepare yourself for the Summer Term exam.

1 A Brief Tour of the Discrete Mathematics Zoo

Mathematics studies concepts that are abstract, idealised images of the real world. An example of such a concept is natural numbers:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, . . .

We all learn it in early childhood, yet nobody has ever seen “three”, as opposed to “three oranges” or “figure 3 in black ink in the top-right corner of this page”.

A philosopher would say here: “well, our concept of ‘three’ captures the ‘threeness’ of all three-element sets that we have seen before or may see in future: three apples, three penguins, or two sheep with a sheepdog in the field”. Number 0 can be accommodated by this view as well: it represents an empty set, a set that contains nothing.

While the philosopher’s answer makes a lot of sense, it is also true that in mathematics, concepts depart from immediate reality, and start to live a life of their own. Consider, for example, the notion of a set, that our friend the philosopher has used to define natural numbers. We can have a set of apples or penguins, so why not think about sets of numbers? Say, the set of this week’s National Lottery winning numbers: {14, 20, 25, 32, 47, 49} (note the use of curly brackets to denote a set). We could then think of some more interesting (in my opinion) examples, such as the set N of all natural numbers:

N = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, . . . },

the set of all integers (natural numbers and their negatives):

Z = {. . . ,−6,−5,−4,−3,−2,−1, 0, 1, 2, 3, 4, 5, 6, . . . },

or the set of all even integers:

{. . . ,−10,−8,−6,−4,−2, 0, 2, 4, 6, 8, 10, . . . }.

For a mathematician, the last three sets are just as legitimate as a set of three apples. However, there is a crucial difference: the new sets are infinite. Infinite sets do not occur in reality: even the number of atoms in the Universe is finite. Yet, we have just imagined a few infinite sets. Even if we cannot write down the elements of these sets without resorting to the “. . . ” notation, we can capture these sets in our mind, and treat them as we would treat any real-world set.

Of course, to make our theory of sets useful, we will have to answer some important questions:

• do infinite sets have a size? (yes they do, but of course these “sizes” are beyond natural numbers);

• can two infinite sets have different size? (yes, their “sizes” can vary so greatly it is hard to imagine even for a mathematician);

• can one form a set of all sets? (no, this is asking too much: one cannot even form a set of all possible “set sizes”).

This sort of question cannot be answered from any empirical evidence: infinite sets simply do not exist in reality. At this point, we are confronted with a major distinction between mathematics and natural sciences: instead of experiments, mathematics relies on proofs. The answers to the above questions given in brackets can be formally and unambiguously proved to be correct. Experimentation also plays a role in mathematics, but rather a supporting one: it helps our intuition to understand the concepts and find the right idea for a proof. For example, to answer the first question above, we could think of various infinite sets that can be imagined, and ask ourselves if they are likely to have a sensible notion of “size”. Then we would formulate this notion precisely, and prove that it satisfies all the properties that we associate with size: for example, by adding new elements to a set we cannot decrease its size. The same approach can be applied to the other two questions. For the final question, this approach has an additional twist: we want to prove that a certain object (a set of all sets) does not exist. In order to tackle this, we imagine that it does exist, and try to consider all consequences of its existence. Somewhere in our reasoning we come to a contradiction (it turns out that the set of all sets cannot be assigned any sensible size). The contradiction proves that the object we imagined (the set of all sets) cannot exist without violating the basic laws of logic.

To be able to write down our proofs, we need a language that is both precise (does not allow any ambiguity) and concise (allows us to express complicated ideas relatively briefly). We should indicate exactly the concepts that we consider basic, i.e. that require no definition. For example, a set and a natural number are basic concepts. All concepts that are not basic must be given a formal definition. For example, we will have to define “finite set” and “even number”. We should also indicate exactly the statements that we consider to be axioms, i.e. that require no proof. For example, “two sets are the same if they consist of the same elements” is an axiom. All statements that we hold to be true, but that are not axioms, are called theorems; they must be given formal proofs. For example, we will have to prove the answers that we gave to the above list of questions on infinite sets.

This approach to mathematics is called the axiomatic method. It requires a special language and a set of proof rules known as logic. Logic is a part of mathematics both as a tool and an object of study; we will see some details of it in the beginning of our Discrete Mathematics course. Concepts and laws of logic allow us to formalise ways of reasoning that we learn together with our mother tongue:

All eagles can fly;
Some pigs cannot fly.

Therefore, some pigs are not eagles.

The conclusion seems obvious, but in mathematics we must know a precise reason why it follows from the two given conditions. Firstly, we must define exactly the class of all things that these statements speak about: suppose this is the class of “all living creatures”. The first condition can be reformulated as follows: “If a creature is an eagle, it can fly”. The laws of logic tell us that this is equivalent to saying: “If a creature cannot fly, it is not an eagle”. The second condition says that there is a creature, which is a pig and cannot fly. Taken together with the previous statement, this leads to the conclusion being proved: there is a creature, which is a pig and cannot fly, and therefore is not an eagle. Note that we can only prove what logically follows from the given conditions. For example, we do not have enough information to conclude that all pigs are not eagles.
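The argument above can also be checked mechanically on a small scale. The sketch below is only an illustration, under assumptions that are not part of these notes: a “world” is modelled as a nonempty set of creature types, each type being a triple of truth values (is an eagle, is a pig, can fly), and all function and variable names are hypothetical. It checks that the conclusion holds in every modelled world satisfying both conditions, and looks for a world in which both conditions hold although some pig is an eagle.

```python
from itertools import combinations, product

# Assumption for illustration: a creature type is a triple of truth values
# (is_eagle, is_pig, can_fly), and a "world" is a nonempty set of such types.
TYPES = list(product([False, True], repeat=3))
WORLDS = [set(c) for n in range(1, len(TYPES) + 1)
          for c in combinations(TYPES, n)]

def all_eagles_fly(world):
    return all(fly for eagle, pig, fly in world if eagle)

def some_pig_cannot_fly(world):
    return any(pig and not fly for eagle, pig, fly in world)

def some_pig_is_not_eagle(world):
    return any(pig and not eagle for eagle, pig, fly in world)

def no_pig_is_eagle(world):
    return all(not eagle for eagle, pig, fly in world if pig)

def premises(world):
    return all_eagles_fly(world) and some_pig_cannot_fly(world)

# The conclusion holds in every world that satisfies both conditions.
conclusion_follows = all(some_pig_is_not_eagle(w) for w in WORLDS if premises(w))

# "All pigs are not eagles" does not follow: search for a world where the
# conditions hold but some pig is an eagle.
counterexample = next(w for w in WORLDS
                      if premises(w) and not no_pig_is_eagle(w))

print(conclusion_follows)
print(sorted(counterexample))
```

Such an exhaustive search is not a proof in the sense of this course, since it covers only the finitely many worlds we chose to model, but it is a useful way of testing our intuition before attempting a proof.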

Armed with logic, we will take a closer look at sets, and will introduce two concepts that are central to all mathematics: relation between elements of two sets, and function from one set to another. We will study different types of relations and functions, and eventually will consider graphs, an especially powerful concept in dealing with complicated sets, in particular the ones occurring in computer science.

In summary, the basic ingredients of our course are sets and natural numbers, glued together by logic. We will use these ingredients to build more complicated structures, and will apply the axiomatic method to study their properties. A lot of emphasis will be put on being able to prove facts, rather than just memorise them. This ability, of course, comes only with practice; hence the weekly problem sheets and seminars to discuss your solutions. Please do attempt to solve the problems, and be active at the seminars: our subject is discrete, rather than discreet, mathematics.



2 Logic

2.1 Statements and operators

We use all sorts of sentences in everyday speech. Our language has special ways in which we can communicate information, ask a question, give a command, express our thoughts, feelings or emotions. In mathematics, however, we restrict ourselves to only one type of sentence: statements, which must be either true or false. Here are some examples of statements:

• Five is less than ten.
• Pigs can fly.
• There is life on Mars.

Note that we know the last statement must be true or false, despite the fact that we cannot decide between true and false from our present knowledge.

Here are some examples of sentences that are not statements:

• Welcome to Tweedy’s farm!
• What’s in the pies?
• It’s not as bad as it seems. . .

The last sentence will become a statement if we substitute the name of a particular object for the pronoun “it”. Of course, we must also give a clear, unambiguous definition of “bad”, “seems”, etc.

Thus, every statement has a value taken from the set B = {F, T}. The two elements of this set are called Boolean values. There are special operations, called Boolean operators, that one can perform on Boolean values (rather like addition and multiplication on natural numbers):

• negation (logical NOT), denoted ¬;
• conjunction (logical AND), denoted ∧;
• disjunction (logical OR), denoted ∨;
• implication (IF . . . THEN . . . ), denoted ⇒;
• equivalence (. . . IF AND ONLY IF . . . ), denoted ⇔.

The negation (NOT) operator ¬ simply flips the value of a statement to the opposite value. We can define the action of operator ¬ applied to a statement A by the following truth table:

A   ¬A
F   T
T   F

The conjunction (AND) operator ∧ applies to two separate statements. The conjunction of A and B is true when both A and B are true; the conjunction is false when either A, or B (or both) are false. Thus, operator ∧ can be defined by the following truth table:

A   B   A ∧ B
F   F   F
F   T   F
T   F   F
T   T   T

The disjunction (OR) operator ∨ also applies to two separate statements, and is complementary to conjunction. The disjunction of A and B is true when either A or B (or both) are true; the disjunction is false when both A and B are false. Here is the truth table for ∨:

A   B   A ∨ B
F   F   F
F   T   T
T   F   T
T   T   T

The two statements connected by conjunction or disjunction do not need to be related in any way. Thus,

(5 < 10) ∧ (Pigs can fly) means T ∧ F means F

(5 < 10) ∨ (Pigs can fly) means T ∨ F means T

The same applies to statements connected by implication (IF . . . THEN . . . ). In ordinary life, we usually think of implication as a cause-effect relationship: if the bird is happy, then it sings loud. This relationship is one-way: if the bird sings, it does not necessarily mean that it is happy: perhaps there are other reasons for a bird to sing. And, if the bird is not happy, we cannot conclude whether it should sing or not, so we must accept both possibilities. The same reasoning applies in logic, but here the statements connected by implication do not have to be related. For any two statements A, B, the value of the implication is determined by the truth table:

A   B   A ⇒ B
F   F   T
F   T   T
T   F   F
T   T   T

Thus, a false statement implies anything, no matter true or false, but a true statement can only imply another true statement.

The equivalence operator (. . . IF AND ONLY IF . . . ) can be thought of as the two-way version of implication: A is equivalent to B, when A implies B, and B implies A. In other words, the values of A and B must agree: either both true, or both false. Here is the truth table:

A   B   A ⇔ B
F   F   T
F   T   F
T   F   F
T   T   T
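The truth tables of this section can be reproduced by a short program. The following is a minimal sketch, assuming that the Boolean values F and T are represented by Python's `False` and `True`; the function names `NOT`, `AND`, `OR`, `IMPLIES`, `IFF` and `truth_table` are our own choices for illustration and are not standard library functions.

```python
from itertools import product

# The five Boolean operators, written as functions on Python bool values.
def NOT(a):
    return not a

def AND(a, b):
    return a and b

def OR(a, b):
    return a or b

def IMPLIES(a, b):
    # Read off the truth table: if A is true the value is that of B,
    # and if A is false the value is T.
    return b if a else True

def IFF(a, b):
    # True exactly when the two values agree.
    return a == b

def truth_table(op):
    """Rows (A, B, op(A, B)) in the order F F, F T, T F, T T."""
    return [(a, b, op(a, b)) for a, b in product([False, True], repeat=2)]

for name, op in [("AND", AND), ("OR", OR), ("IMPLIES", IMPLIES), ("IFF", IFF)]:
    print(name)
    for a, b, r in truth_table(op):
        # Indexing the string "FT" by a bool prints F for False, T for True.
        print(" ", "FT"[a], "FT"[b], "FT"[r])
```

Each printed block lists the rows in the same order as the tables above, so the final column can be compared against them directly.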

Implication and equivalence play a special role in mathematics. Many mathematical theorems have the form

if A then B

or

A implies B

sometimes disguised as

A is sufficient for B

or

B is necessary for A

The meaning of all these sentences is the same: A ⇒ B. A standard way of proving such theorems is by a chain of implications:

A ⇒ P1 ⇒ P2 ⇒ . . . ⇒ Pn ⇒ B

where P1, P2, . . . , Pn are some statements, chosen so that every implication in the chain can be proved in one step.

Another common form of theorems is

A if and only if B

often disguised as

A and B are equivalent

or

A is necessary and sufficient for B

or

B is necessary and sufficient for A

The meaning of all these is A ⇔ B. A standard way of proving such theorems is by a chain of equivalences:

A ⇔ P1 ⇔ P2 ⇔ . . . ⇔ Pn ⇔ B

where P1, P2, . . . , Pn are some statements, chosen so that every equivalence in the chain can be proved in one step.


2.2 Laws of logic

The truth tables completely define Boolean operators, so, in principle, the truth value of any compound statement, however complicated, can be found by a series of truth table lookups. In practice, we often want an easier and more intuitive method of dealing with compound statements. One such method consists in applying certain properties of Boolean operators, known as the laws of logic. From the formal point of view, these laws do not add anything new to the operator definitions: each of the laws follows directly from the truth tables. However, the laws offer an alternative, complementary approach to logic, and are widely applicable. Many of these laws are similar to the properties of arithmetic operators + and ·.

In the following formulas, letters A, B, C stand for arbitrary statements. The statements of the laws are always true, irrespective of the truth values of A, B, C.

The first group of laws involves only one operator and at most two elementary statements each.

¬¬A ⇐⇒ A                                      double negation law
A ∧ A ⇐⇒ A            A ∨ A ⇐⇒ A              idempotence of ∧, ∨
A ∧ B ⇐⇒ B ∧ A        A ∨ B ⇐⇒ B ∨ A          commutativity of ∧, ∨

The double negation law is similar to −(−a) = a, and the commutativity laws correspond to a · b = b · a, and a + b = b + a. The idempotence laws have no direct counterparts in arithmetic.

The second group of laws involves more than one operator, and/or more than two elementary statements each.

(A ∧ B) ∧ C ⇐⇒ A ∧ (B ∧ C)             associativity of ∧
(A ∨ B) ∨ C ⇐⇒ A ∨ (B ∨ C)             associativity of ∨
A ∧ (B ∨ C) ⇐⇒ (A ∧ B) ∨ (A ∧ C)       distributivity of ∧ over ∨
A ∨ (B ∧ C) ⇐⇒ (A ∨ B) ∧ (A ∨ C)       distributivity of ∨ over ∧

The associativity laws correspond to the arithmetic laws (a · b) · c = a · (b · c) and (a + b) + c = a + (b + c). These laws allow us to write A ∧ B ∧ C and A ∨ B ∨ C without any brackets, just as we write a · b · c and a + b + c. The first distributivity law corresponds to a · (b + c) = a · b + a · c. The second distributive law has no counterpart in arithmetic, since, in general, a + b · c ≠ (a + b) · (a + c).
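Since each law follows from the truth tables, it can be checked by trying every combination of truth values. The sketch below does this for the second group of laws; the helper `is_law` and the dictionary `laws` are hypothetical names introduced only for this illustration. It also evaluates one numerical instance showing that the arithmetic analogue of the second distributive law fails.

```python
from itertools import product

def is_law(lhs, rhs, nvars):
    """True if lhs and rhs agree for every assignment of truth values."""
    return all(lhs(*v) == rhs(*v)
               for v in product([False, True], repeat=nvars))

laws = {
    "associativity of AND": (lambda a, b, c: (a and b) and c,
                             lambda a, b, c: a and (b and c)),
    "associativity of OR": (lambda a, b, c: (a or b) or c,
                            lambda a, b, c: a or (b or c)),
    "distributivity of AND over OR": (lambda a, b, c: a and (b or c),
                                      lambda a, b, c: (a and b) or (a and c)),
    "distributivity of OR over AND": (lambda a, b, c: a or (b and c),
                                      lambda a, b, c: (a or b) and (a or c)),
}

results = {name: is_law(lhs, rhs, 3) for name, (lhs, rhs) in laws.items()}
for name, ok in results.items():
    print(name, ok)

# Arithmetic has no analogue of the last law: with a = b = c = 1,
# a + b*c is 2 while (a + b)*(a + c) is 4.
a, b, c = 1, 1, 1
arithmetic_analogue_fails = (a + b * c) != (a + b) * (a + c)
print(arithmetic_analogue_fails)
```

With three statements there are only eight assignments to try, so the exhaustive check is the same proof by truth table, carried out by machine.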

Note that in all laws so far, we can replace all symbols ∧ by ∨, and, simultaneously, all symbols ∨ by ∧. The resulting statement will still be true for any A, B, C. This is a general rule that applies to all laws for ¬, ∧, ∨ that we introduce in this section, provided that any truth values T and F appearing explicitly in a law are interchanged as well.


The following pair of laws, called De Morgan’s laws, describes the close relationship between operators ∨, ∧.

¬(A ∧ B) ⇐⇒ ¬A ∨ ¬B        ¬(A ∨ B) ⇐⇒ ¬A ∧ ¬B

These two laws allow us to express ∧ via ¬ and ∨:

A ∧ B ⇐⇒ ¬¬(A ∧ B) ⇐⇒ ¬(¬A ∨ ¬B)

and ∨ via ¬ and ∧:

A ∨ B ⇐⇒ ¬¬(A ∨ B) ⇐⇒ ¬(¬A ∧ ¬B)

This means that any one of the two operators ∧, ∨ is redundant: we can rewrite any statement without using either one or the other. Of course, it is usually more convenient to use both.

Another group of laws deals with the case of known truth values appearing explicitly in compound statements:

A ∧ T ⇐⇒ A A ∨ F ⇐⇒ A identity laws

A ∧ F ⇐⇒ F A ∨ T ⇐⇒ T annihilation laws

A ∧ ¬A ⇐⇒ F A ∨ ¬A ⇐⇒ T excluded middle

A ∧ (A ∨B) ⇐⇒ A ⇐⇒ A ∨ (A ∧B) absorption laws

Identity laws correspond to a · 1 = a, a + 0 = a. An arithmetic annihilation law does not hold for addition, but holds for multiplication: a · 0 = 0. An arithmetic analogue of the law of excluded middle does not hold for multiplication, but holds for addition: a + (−a) = 0.

Finally, the following two laws completely describe the two remaining Boolean operators, ⇒ and ⇔:

(A ⇒ B) ⇐⇒ (¬A ∨ B) ⇐⇒ ¬(A ∧ ¬B)

(A ⇔ B) ⇐⇒ (A ⇒ B) ∧ (B ⇒ A) ⇐⇒ (A ∧ B) ∨ (¬A ∧ ¬B)

Again, both ⇒ and ⇔ are formally redundant, but, as we mentioned before, very useful in practice.

All the above laws are in fact theorems, and proving them is a good exercise in applying truth tables. Here is a table that proves one of De Morgan’s laws, ¬(A ∧ B) ⇐⇒ (¬A ∨ ¬B):

A   B   A ∧ B   ¬(A ∧ B)   ¬A   ¬B   (¬A ∨ ¬B)
T   T     T        F        F    F        F
T   F     F        T        F    T        T
F   T     F        T        T    F        T
F   F     F        T        T    T        T
                   *                      *

The columns for the two sides of the law (marked *) are identical, hence their truth values agree for any A, B.
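The same table can be built column by column in a program, which is a convenient way of checking a hand-made truth table. This is only a sketch: the names `header`, `rows`, `lhs_column` and `rhs_column` are hypothetical, and the rows are generated in the order T T, T F, F T, F F used in the table above.

```python
from itertools import product

def tf(value):
    """Render a Python bool as T or F."""
    return "T" if value else "F"

header = ["A", "B", "A∧B", "¬(A∧B)", "¬A", "¬B", "¬A∨¬B"]

rows = []
for a, b in product([True, False], repeat=2):   # T T, T F, F T, F F
    rows.append([a, b, a and b, not (a and b), not a, not b,
                 (not a) or (not b)])

print("  ".join(header))
for row in rows:
    # Pad each entry to the width of its column heading.
    print("  ".join(tf(v).ljust(len(h)) for v, h in zip(row, header)))

# The two columns that correspond to the two sides of the law.
lhs_column = [row[3] for row in rows]
rhs_column = [row[6] for row in rows]
print(lhs_column == rhs_column)
```

The intermediate columns are not needed for the final comparison, but keeping them makes it easy to locate a mistake when the two sides disagree.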

We can use our laws of logic to prove new theorems. Here is an example.

Theorem 1 (Principle of proof by contradiction). For any statements A, B, we have (A ⇒ B) ⇐⇒ (¬B ⇒ ¬A).

Proof. We apply the law for ⇒, then the law of double negation, commutativity of ∨, and finally the law for ⇒ once again, this time in the opposite direction.

(¬B ⇒ ¬A) ⇐⇒ (¬¬B ∨ ¬A) ⇐⇒ (B ∨ ¬A) ⇐⇒ (¬A ∨ B) ⇐⇒ (A ⇒ B)

□

The above theorem gives us a useful generic proof method. When we are given a statement A, and we are asked to prove a statement B, we may start by assuming that B is false (i.e. ¬B holds), and then show that a statement contradicting A (i.e. ¬A) follows from our assumption. The principle of proof by contradiction tells us that in this case, B must be a logical consequence of A.

2.3 Predicates and quantified statements

Statements we have been making so far declared facts about specific objects:

• Five is less than ten.
• The pie is not as bad as it looks.

Often we need more than that: we want to declare a fact about a specific set of objects. For example, we could say:

• Some natural numbers are less than ten.
• All pies are not as bad as they look.

In the first case, we could try to come up with a specific example that proves it: say, five is less than ten. In the second case, we could restrict our attention to a finite number of possible pies; let this set be {Chicken pie, Mushroom pie, Cabbage pie}. Then the statement “All pies are not as bad as they look” is a conjunction:

(Chicken pie is not as bad as it looks) ∧
(Mushroom pie is not as bad as it looks) ∧
(Cabbage pie is not as bad as it looks)

There are problems with both these approaches. In the first case, it was easy to find a specific instance (five) that proved our statement; for other statements, it could be much harder. We would like to have a way of saying “some numbers are less than ten” without having to show a specific example. In the second case, the chosen set of pies was too small; in reality, there are millions of individual pies, so our statement has to be a conjunction of a huge number of individual statements. This would be hard to deal with if we were to use it in proofs. Furthermore, this approach would completely fail if the statement were about all possible pies, and then it turned out that this set is infinite. We would like to have a way of making a statement about “all elements” or “some elements” of any set, including infinite ones.

We achieve the stated goal by using the notion of a predicate. A predicate is simply a sentence containing variables ranging over a particular set. The set of values for a variable is called the range of that variable. We will always assume that the range is nonempty. The sentence must become true or false when an element of the range is substituted for every variable. Here are some examples:

• Number x is less than ten.
• Pie p is not as bad as it looks.

Here x is a variable that stands for a member of set N (i.e. ranges over N), and p is a variable that stands for a member of the set of all pies (i.e. ranges over that set). Of course, in the latter case we must specify precisely what we understand by “all pies”.

A predicate may contain more than one variable. For example, these are valid predicates:

• Number x is less than number y.
• Pie p is better than pie q.

Ordinary statements can be regarded as a special case of predicates, containing zero variables, for example:

• Number 5 is less than number 10.
• This chicken pie is better than that apple pie.

In the latter case, we assume that we are talking about two specific, well-defined pies.

A predicate with one or more variables can be made a statement by substituting a specific element of the range for every variable. A different way of making a statement from a predicate is by using quantifiers. Let us denote by P(x) a predicate with the variable x. There are two quantifiers:

• existential (FOR SOME x, P(x)): ∃x : P(x);
• universal (FOR ALL x, P(x)): ∀x : P(x).

Here, the range of x (i.e. the set from which x is taken) is implicit. Often, we want to specify the range of a variable. The above examples can be written as:

∃x ∈ N : x < 10

∀p ∈ Pies : “p is not as bad as it looks”

The sign ∈ stands for “belongs to”, and denotes the membership of an element in a set. The general form is

∃x ∈ S : P (x) ∀x ∈ S : P (x)

With predicates having more than one variable, we can write more com-plicated quantified statements:

∀x ∈ N : ∃y ∈ N : x < y

∃y ∈ N : ∀x ∈ N : x < y

Note that the meaning, and even the truth value, of the above two statements is different: the first one is true (“for every natural number, there is a greater number”), the second is false (“there is a natural number greater than all natural numbers”). In general, the meaning of a quantified statement depends on the order of the quantifiers.
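The effect of quantifier order can be observed directly on a finite range, where ∀ and ∃ can be evaluated with Python's built-in `all` and `any`. The sketch below is an illustration under assumptions of our own: since N is infinite and cannot be enumerated, it uses the finite range {0, 1, 2, 3, 4} and the predicate “x equals y” instead of x < y, and the names `R`, `equals`, `forall_exists` and `exists_forall` are hypothetical.

```python
R = range(5)   # assumed finite range {0, 1, 2, 3, 4} for both x and y

def equals(x, y):
    """Sample predicate with two variables: x equals y."""
    return x == y

# ∀x ∈ R : ∃y ∈ R : x = y   (for every x there is an equal y, namely y = x)
forall_exists = all(any(equals(x, y) for y in R) for x in R)

# ∃y ∈ R : ∀x ∈ R : x = y   (one single y that equals every x)
exists_forall = any(all(equals(x, y) for x in R) for y in R)

print(forall_exists, exists_forall)
```

In the first statement the choice of y may depend on x, because ∃y appears inside ∀x; in the second, one value of y has to work for all x at once.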

The meaning of a quantified statement does not change if we change the quantifier variable consistently throughout the statement. For example, we can write:

∃z ∈ N : z < 10 ∀π ∈ Pies : “π is not as bad as it looks”

The variable in a quantified statement is only defined within the statement; it is not “visible from outside”. In programming, such variables are called “local”. In mathematics, we call them dummies, or bound variables. In the examples above, variables x, z are bound by the quantifier ∃, and variables p, π are bound by the quantifier ∀. In contrast, a variable in a predicate not bound by any quantifier (such as x in P(x), or z in z < 10) is called free. We have the following laws of changing the bound variable:

∀x : P (x) ⇐⇒ ∀y : P (y)        ∃x : P (x) ⇐⇒ ∃y : P (y)

As we have seen before, a universally quantified statement with a finite range S = {a1, . . . , an} can be expressed by a conjunction:

∀x ∈ S : P (x) ⇐⇒ P (a1) ∧ · · · ∧ P (an)

Similarly, an existentially quantified statement with a finite range can be expressed by a disjunction:

∃x ∈ S : P (x) ⇐⇒ P (a1) ∨ · · · ∨ P (an)

These equivalences do not hold for an infinite S, since their right-hand sides would not be well-defined. However, the following laws will hold for any nonempty range, finite or infinite:

∀x : T ⇐⇒ T ∃x : T ⇐⇒ T

∀x : F ⇐⇒ F ∃x : F ⇐⇒ F

∀x : P (x) =⇒ ∃x : P (x)

In predicate logic, we also have the following analogue of De Morgan’s laws:

¬∀x : P (x) ⇐⇒ ∃x : ¬P (x)

¬∃x : P (x) ⇐⇒ ∀x : ¬P (x)

On a finite range, these laws can be proved by the laws of Boolean logic, using properties of conjunction for ∀, and those of disjunction for ∃. On an infinite range, the new laws must be taken as axioms.
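On a finite range, the reduction of ∀ to a conjunction and of ∃ to a disjunction can be written out explicitly, and the analogue of De Morgan's laws can then be checked for every possible predicate. The following is a sketch under our own assumptions: the range `S`, the sample predicate `P` and the helper `de_morgan_holds` are hypothetical, and a predicate on a 4-element range is represented by its list of 4 truth values.

```python
from functools import reduce
from itertools import product

S = [1, 2, 3, 4]        # assumed finite range

def P(x):               # hypothetical sample predicate: x is less than 3
    return x < 3

# ∀ as a conjunction P(a1) ∧ ... ∧ P(an), ∃ as a disjunction P(a1) ∨ ... ∨ P(an).
forall_P = reduce(lambda acc, x: acc and P(x), S, True)
exists_P = reduce(lambda acc, x: acc or P(x), S, False)

# The built-ins all() and any() compute the same two values.
same_as_builtins = (forall_P == all(P(x) for x in S)
                    and exists_P == any(P(x) for x in S))

def de_morgan_holds(values):
    """Check both quantifier laws for a predicate given by its truth values."""
    not_forall = not all(values)
    exists_not = any(not v for v in values)
    not_exists = not any(values)
    forall_not = all(not v for v in values)
    return not_forall == exists_not and not_exists == forall_not

# Every predicate on S corresponds to one of the 2**4 lists of truth values.
holds_for_all_predicates = all(
    de_morgan_holds(list(v)) for v in product([False, True], repeat=len(S)))

print(forall_P, exists_P, same_as_builtins, holds_for_all_predicates)
```

This covers only one finite range, of course; for an infinite range there is nothing to enumerate, which is why the notes take the laws as axioms in that case.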

When several predicates are involved in a quantified statement, all the usual laws of Boolean logic apply to these predicates. However, when we introduce a quantifier, we must be careful not to “capture” inadvertently any existing free variables, or any variables bound by other quantifiers. For example, the statement (∃x : P(x)) ∧ (∃x : Q(x)) is, in general, not equivalent to ∃x : (P(x) ∧ Q(x)). This is because in the former statement, P(x) and Q(x) may be satisfied by different values of x, whereas in the latter statement the value of x must be the same for both P and Q. We can make this argument even more forceful by replacing the first statement by its logical equivalent: (∃x : P(x)) ∧ (∃y : Q(y)). By a similar reasoning, there is no equivalence between the statements (∀x : P(x)) ∨ (∀x : Q(x)) and ∀x : (P(x) ∨ Q(x)), since the former is equivalent to (∀x : P(x)) ∨ (∀y : Q(y)). However, the following equivalences hold:

(∃x : P (x)) ∨ (∃x : Q(x)) ⇐⇒ ∃x : (P (x) ∨ Q(x))

(∀x : P (x)) ∧ (∀x : Q(x)) ⇐⇒ ∀x : (P (x) ∧ Q(x))

As before, they can be proved by laws of Boolean logic for a finite range, but must be taken as axioms when the range is infinite.
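A concrete pair of predicates makes the difference visible. In the sketch below, the range {0, 1, 2, 3} and the predicates “x is even” and “x is odd” are our own choice for illustration, not taken from the notes; the variable names are likewise hypothetical.

```python
S = range(4)            # assumed finite range {0, 1, 2, 3}

def P(x):               # hypothetical predicate: x is even
    return x % 2 == 0

def Q(x):               # hypothetical predicate: x is odd
    return x % 2 == 1

# (∃x : P(x)) ∧ (∃x : Q(x))  versus  ∃x : (P(x) ∧ Q(x))
separate_exists = any(P(x) for x in S) and any(Q(x) for x in S)
joint_exists = any(P(x) and Q(x) for x in S)

# (∀x : P(x)) ∨ (∀x : Q(x))  versus  ∀x : (P(x) ∨ Q(x))
separate_forall = all(P(x) for x in S) or all(Q(x) for x in S)
joint_forall = all(P(x) or Q(x) for x in S)

# The two equivalences that do hold, evaluated for the same P and Q.
exists_or_agree = ((any(P(x) for x in S) or any(Q(x) for x in S))
                   == any(P(x) or Q(x) for x in S))
forall_and_agree = ((all(P(x) for x in S) and all(Q(x) for x in S))
                    == all(P(x) and Q(x) for x in S))

print(separate_exists, joint_exists)
print(separate_forall, joint_forall)
print(exists_or_agree, forall_and_agree)
```

There is an even number and there is an odd number in the range, but no number is both; every number is even or odd, but neither property holds for all of them.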

In general, a quantifier ∀x or ∃x is safe to “capture” a predicate Q, as long as Q does not contain x as a free variable (in other words, as long as all occurrences of x in Q are bound by other quantifiers). Therefore, we have the following laws, where Q is always assumed to be a predicate not containing x as a free variable:

(∀x : P (x)) ∨Q ⇐⇒ ∀x : (P (x) ∨Q)

(∃x : P (x)) ∨Q ⇐⇒ ∃x : (P (x) ∨Q)

(∀x : P (x)) ∧Q ⇐⇒ ∀x : (P (x) ∧Q)

(∃x : P (x)) ∧Q ⇐⇒ ∃x : (P (x) ∧Q)

(∀x : P (x)) ⇒ Q ⇐⇒ ∃x : (P (x) ⇒ Q)

(∃x : P (x)) ⇒ Q ⇐⇒ ∀x : (P (x) ⇒ Q)

Q ⇒ (∀x : P (x)) ⇐⇒ ∀x : (Q ⇒ P (x))

Q ⇒ (∃x : P (x)) ⇐⇒ ∃x : (Q ⇒ P (x))

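No law of this form holds for ⇔: in general, (∀x : P (x)) ⇔ Q is not equivalent to ∀x : (P (x) ⇔ Q), and (∃x : P (x)) ⇔ Q is not equivalent to ∃x : (P (x) ⇔ Q). For example, take the range {1, 2} with P (1) true, P (2) false, and Q false. Then (∀x : P (x)) ⇔ Q is true but ∀x : (P (x) ⇔ Q) is false, while (∃x : P (x)) ⇔ Q is false but ∃x : (P (x) ⇔ Q) is true.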

Just like laws of Boolean logic, which are useful in simplifying statements involving Boolean operators, the above laws, along with other laws introduced in this section, allow us to simplify statements involving quantifiers. The ultimate purpose of all these laws, and of logic as a whole, is to allow us to express and prove facts about objects and sets that we build across all branches of mathematics. In the following sections of the course, we will make extensive use of this section’s language and ideas.
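Laws of this kind can be tested exhaustively on a small range before we rely on them. The sketch below checks the laws for ∨ and ∧, and the two laws of the form Q ⇒ (. . .), for every predicate P on an assumed three-element range and for both truth values of Q. The names `RANGE`, `implies` and `laws_hold` are hypothetical. The last line illustrates why the notes assume a nonempty range: on an empty range some of these laws fail.

```python
from itertools import product

RANGE = [0, 1, 2]       # assumed small nonempty range

def implies(a, b):
    # (A ⇒ B) ⇐⇒ (¬A ∨ B)
    return (not a) or b

def laws_hold(p, q):
    """p lists the truth values of P(x) over the range; q is the truth
    value of Q, which does not depend on x."""
    checks = [
        (all(p) or q) == all(px or q for px in p),
        (any(p) or q) == any(px or q for px in p),
        (all(p) and q) == all(px and q for px in p),
        (any(p) and q) == any(px and q for px in p),
        implies(q, all(p)) == all(implies(q, px) for px in p),
        implies(q, any(p)) == any(implies(q, px) for px in p),
    ]
    return all(checks)

# Try every predicate on RANGE (2**3 of them) with Q false and Q true.
all_hold = all(laws_hold(list(p), q)
               for p in product([False, True], repeat=len(RANGE))
               for q in [False, True])

# On an empty range, (∃x : P(x)) ∨ Q with Q true is true, while
# ∃x : (P(x) ∨ Q) is false, so the check fails.
nonempty_needed = not laws_hold([], True)

print(all_hold, nonempty_needed)
```

A successful run is evidence rather than proof for ranges of other sizes, but a failed run would immediately expose a wrongly stated law.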


3 Sets

3.1 The naïve set theory

The notion of a set is central to mathematics. However, it was not until the late 1800s and early 1900s that mathematicians began to study sets in their own right. Sets and set elements are basic concepts, and, as such, are left without a formal definition. Georg Cantor (1845–1918), one of the creators of modern set theory, gave the following description:

By a set we shall understand any collection into a whole M of definite, distinct objects of our intuition or of our thought. These objects are called the elements of M .

The above is not a mathematical definition: it just describes our intuitive idea of sets (“collections”) and their elements (“objects”). However, we can formulate some characteristic properties that we associate with sets:

• Any object can be an element of a set. For example, we can form the following sets:

Planets = {Mercury, Venus, . . . , Pluto}
Neven = {0, 2, 4, 6, 8, 10, . . .}

Junk = {239, banana, ace of spades}

• The order of elements in a set does not matter. For example,

Junk = {239, banana, ace of spades} = {banana, ace of spades, 239}

• Repetition of elements in a set does not matter. For example,

Junk = {banana, banana, ace of spades, 239, 239, 239}

• A set can be an element of another set. For example,

Junk = {banana, banana, ace of spades, 239, 239, 239}

SuperJunk = {239, Junk , ∅} = {239, {banana, ace of spades, 239}, ∅}

There is a special set, which contains no elements. It is called the empty set, and denoted ∅: ∅ = {}. Any set with exactly one element is called a singleton. For example, we can form the following singletons:

MorningStars = {Venus}
NonpositiveNaturals = {0}
EmptySets = {∅}

Note that the set EmptySets is not empty: it contains an element, which happens to be the set ∅. Likewise, the set MorningStars is distinct from the planet Venus, and the set NonpositiveNaturals is distinct from the number zero.
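Finite sets in many programming languages behave as described above. The following sketch uses Python's built-in `set` type; it is only an illustration, and the variable names are our own. One point is specific to Python rather than to set theory: an element of a Python set must be immutable, so a set that is to be an element of another set is written as a `frozenset`.

```python
junk = {239, "banana", "ace of spades"}

# The order of elements does not matter.
same_in_any_order = junk == {"banana", "ace of spades", 239}

# Repetition of elements does not matter.
same_with_repeats = junk == {"banana", "banana", "ace of spades", 239, 239, 239}

# A set can be an element of another set (as a frozenset in Python).
empty = frozenset()
super_junk = {239, frozenset(junk), empty}

# Singletons: a one-element set is distinct from its only element.
morning_stars = {"Venus"}
nonpositive_naturals = {0}
empty_sets = {empty}

print(same_in_any_order, same_with_repeats)
print(len(super_junk))
print(len(empty), len(empty_sets))
print(morning_stars == "Venus", nonpositive_naturals == 0)
```

The sizes printed for `empty` and `empty_sets` differ, which is exactly the distinction between ∅ and {∅}.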

The fact that x is an element of set S is written as x ∈ S. Thus, Jupiter ∈ Planets, orange ∉ Junk. A set A is called a subset of a set B (A ⊆ B), if all elements of A are also elements of B (but not necessarily the other way round). For example, Neven is a subset of N (Neven ⊆ N), since every even natural number is a natural number. We can write the definition formally as follows:

A ⊆ B ⇐⇒ ∀x : x ∈ A ⇒ x ∈ B

By this definition, the empty set ∅ is a subset of any set (since the rangeof the quantified statement in the definition is empty), and every set is asubset of itself.

It is very important to distinguish between the signs ∈ (element inclu-sion) and ⊆ (subset inclusion). Despite their superficial similarity, theirmeaning is very different: the first indicates an individual member of a set,the second — an arbitrary subset of a set, including the two possible ex-tremes: the empty set and the whole working set. Element inclusion ∈ isa basic concept, and therefore has no formal definition; the definition ofsubset inclusion ⊆ in terms of element inclusion was given in the previousparagraph.

Our intuitive idea of a set is an arbitrary collection of elements, wherethe order and any repetitions of elements are ignored. Can we make thisidea formal by giving to the basic concept of a set the appropriate axioms?The fact that order and repetitions do not matter is easy to express:

Axiom (The Law of Extensionality). If two sets contain the same elements, they are equal.

In other words, for any sets A, B, we have

(A ⊆ B ∧ B ⊆ A) ⇒ A = B

In particular, any two sets without elements are equal, therefore there is only one empty set ∅.

When dealing with sets, we often need to select from a given set a subset that satisfies a certain property. For example, we could start from the set N, and select from it only those numbers that are even. In general, let S be our working set; then we can express any property of its elements by a predicate P(x), where x is a variable ranging over S. The set of all elements x of S for which P(x) is T is denoted {x ∈ S | P(x)}. For example,

Neven = {x ∈ N | x is even}


The variable x in the above expression is a dummy: the set Neven will not change if we replace all occurrences of x in its definition by y, or by any other variable.

For any set S, we have

{x ∈ S | T} = S

{x ∈ S | F} = ∅

Here are some more examples:

{x ∈ N | x > 0} = {1, 2, 3, 4, 5, 6, . . .}

{x ∈ Planets | x is red} = {Mars}

{x ∈ N | x ≥ 0} = N

{x ∈ Planets | x is a banana} = ∅
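For finite working sets, the notation {x ∈ S | P(x)} corresponds closely to a set comprehension. The sketch below is an illustration only; the helper name select and the ten-element working set s are assumptions, not part of the notes.

```python
def select(s, p):
    """Return {x in s | p(x)} for a finite set s and a predicate p."""
    return {x for x in s if p(x)}

s = set(range(10))   # a finite working set

print(select(s, lambda x: x % 2 == 0))        # the even elements of s
print(select(s, lambda x: True) == s)         # True
print(select(s, lambda x: False) == set())    # True

# The bound variable is a dummy: renaming it does not change the set.
print({x for x in s if x > 0} == {y for y in s if y > 0})   # True
```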

Using the predicate notation, we can attempt to formalise completely our intuitive notion of a set. We have described a set as an “arbitrary collection” of elements; that is, we can form a set of elements satisfying any given predicate. We can now make it our second axiom.

Axiom (The Law of Abstraction). For any predicate P(x), there is a set A = {x | P(x)}, such that an element x is in A if and only if P(x) is true.

Our two axioms, the law of extensionality and the law of abstraction, formalise our intuition about sets. We could try to base a whole theory on these two axioms. Indeed, such attempts were made in the early stages of set theory development. Unfortunately, it was soon realised that the extensionality and abstraction laws, taken together, are inconsistent: that is, a theory based on these laws leads to contradictions. The simplest of these contradictions is called Russell’s paradox, after the great logician and philosopher Bertrand Russell (1872–1970).

Consider the following predicate: P(x) ⇐⇒ x ∉ x (note that it involves element inclusion, rather than subset inclusion). In words, we could say that P(x) means “x is not a member of itself”. This is certainly true if x is not a set; it is also true for all sets we have seen so far, and for all sets we can think of (except perhaps an imaginary “set of all sets”). We may or may not believe that P(x) is true for all x: whether this is the case or not is irrelevant, since both possibilities will lead to a contradiction. What is relevant is that P(x) is a well-formed predicate (i.e. is true or false for any given x). Therefore, by the law of abstraction, we can form the set B of all objects x that satisfy the predicate P(x):

B = {x | P(x)} = {x | x ∉ x}


In words, B is the set of all objects that are not their own members.

Now consider the following statement R: B ∈ B. It is a well-formed statement, so it must be either true or false. Suppose statement R is true, so B is a member of B, and, like all members of B, must not be a member of itself. This makes the statement R false, which is impossible, since we assumed it was true. Now suppose statement R is false, so B is not a member of B. By definition of set B, everything that is not a member of itself must be a member of B, so B itself must be a member of B. This makes statement R true, which is impossible, since we assumed it was false! Thus, statement R cannot be either true or false, so there must be something “wrong” in our reasoning. The only thing that can be wrong is the law of abstraction that we used to form the set B.

There is an alternative, somewhat lighter form of Russell’s paradox. Imagine a village that has a single (male) barber with the following code of practice: the barber will shave every man in the village, but only if this man does not shave himself. Must the barber shave himself? The question has no answer, since both choices of the answer lead to a contradiction. Therefore, the barber’s rule is inconsistent.
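The barber's dilemma can be checked mechanically by trying both answers. In this small sketch the variable shaves_self (our name) stands for the truth value of “the barber shaves himself”; applied to the barber, the rule demands that this value equals its own negation.

```python
# Keep only those answers that are consistent with the barber's rule
# when the rule is applied to the barber himself.
consistent_choices = [
    shaves_self
    for shaves_self in (True, False)
    if shaves_self == (not shaves_self)
]

print(consistent_choices)   # []: neither answer is consistent
```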

Because of Russell’s paradox, the theory based on the laws of extensionality and abstraction is often called the naïve set theory. It captures our intuitive notion of a set but, being inconsistent, cannot serve as a formal foundation of mathematics. A lot of time and effort have been spent in order to provide a more sound axiomatic system for sets. Now, several such systems exist; they are all significantly more complicated than the naïve set theory. We shall not go into their details in this course. For the rest of the course, we will use implicitly the laws of extensionality and abstraction, and in particular the convenient notation for set abstraction {x | P(x)}. On the level of our course, no paradoxes similar to Russell’s will arise. Indeed, unless mathematicians create them artificially, they seldom arise at all.

3.2 Operations on sets

We have already studied the concept of set abstraction, that allows us (ideally) to form a set {x | P(x)} from any predicate P(x). We will now use this method to define operations that create new sets from existing ones. Despite the problems with abstraction arising due to Russell’s paradox, these new set operations will be completely non-controversial.

Let A, B be any sets. The intersection of A, B, denoted A ∩ B, is a set that contains all elements which are members of both A and B:

A ∩ B = {x | (x ∈ A) ∧ (x ∈ B)}

The union of A, B, denoted A ∪ B, is a set that contains all elements which are members of either A, or B (or both):

A ∪ B = {x | (x ∈ A) ∨ (x ∈ B)}


The difference of A, B, denoted A \ B, is a set that contains all elements which are members of A, except those which are members of B:

A \ B = {x | (x ∈ A) ∧ ¬(x ∈ B)}

As we see from the definitions, set operations are closely related to Boolean operators. In particular, they have properties very similar to the laws of Boolean logic, where ∩ is analogous to conjunction, and ∪ to disjunction.

A ∩ A = A        A ∪ A = A        idempotence of ∩, ∪

A ∩ B = B ∩ A        A ∪ B = B ∪ A        commutativity of ∩, ∪

Also,

(A ∩ B) ∩ C = A ∩ (B ∩ C)        associativity of ∩

(A ∪ B) ∪ C = A ∪ (B ∪ C)        associativity of ∪

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)        distributivity of ∩ over ∪

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)        distributivity of ∪ over ∩

Set difference does not directly correspond to negation, since it involves two sets rather than one. In order to obtain an analogue of negation, let us fix a particular set S (the universal set). We now restrict ourselves to sets that are subsets of S. For any set A ⊆ S, the complement of A (with respect to S) is the difference $\overline{A} = S \setminus A$. The laws of complement are analogous to the laws of Boolean negation. We have the law of double complement:

$\overline{\overline{A}} = A$

and De Morgan’s laws:

$\overline{A \cap B} = \overline{A} \cup \overline{B}$        $\overline{A \cup B} = \overline{A} \cap \overline{B}$

Here, A, B are arbitrary subsets of S.

The universal set S corresponds to the statement T, and the empty set ∅ to the statement F. Note that ∅ ⊆ S. We have:

A ∩ S = A A ∪ ∅ = A identity laws

A ∩ ∅ = ∅ A ∪ S = S annihilation laws

$A \cap \overline{A} = \emptyset$        $A \cup \overline{A} = S$        excluded middle

A ∩ (A ∪B) = A = A ∪ (A ∩B) absorption laws

All the above laws are theorems, and are easy to prove by the laws of Boolean logic. Here is an example:


Theorem 2 (De Morgan’s Law). For any universal set S, and for any sets A, B ⊆ S, we have $\overline{A \cap B} = \overline{A} \cup \overline{B}$.

Proof. We apply the definition of complement, the definitions of set difference and intersection, the Boolean De Morgan’s law, the Boolean distributivity law, once again the definitions of difference and complement, and finally the definition of set union:

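\[
\begin{aligned}
\overline{A \cap B} &= S \setminus (A \cap B) \\
&= \{x \mid (x \in S) \land \lnot(x \in A \cap B)\} \\
&= \{x \mid (x \in S) \land \lnot((x \in A) \land (x \in B))\} \\
&= \{x \mid (x \in S) \land (\lnot(x \in A) \lor \lnot(x \in B))\} \\
&= \{x \mid ((x \in S) \land \lnot(x \in A)) \lor ((x \in S) \land \lnot(x \in B))\} \\
&= \{x \mid (x \in S \setminus A) \lor (x \in S \setminus B)\} \\
&= \{x \mid (x \in \overline{A}) \lor (x \in \overline{B})\} = \overline{A} \cup \overline{B} \qquad \square
\end{aligned}
\]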
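A proof covers all sets at once, but it can be reassuring to check the laws exhaustively on a small universal set. The sketch below does this for a three-element universe; the names universe, all_subsets and complement are assumptions made for the illustration, and passing the checks is of course not a substitute for the proof.

```python
from itertools import combinations

def all_subsets(s):
    """List every subset of the finite set s as a frozenset."""
    items = list(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

universe = frozenset({1, 2, 3})

def complement(a):
    """Complement of a with respect to the fixed universal set."""
    return universe - a

for a in all_subsets(universe):
    assert complement(complement(a)) == a          # double complement
    assert a & complement(a) == frozenset()        # excluded middle
    assert a | complement(a) == universe
    for b in all_subsets(universe):
        # De Morgan's laws
        assert complement(a & b) == complement(a) | complement(b)
        assert complement(a | b) == complement(a) & complement(b)
        # absorption laws
        assert a & (a | b) == a == a | (a & b)

print("all laws hold on", len(all_subsets(universe)), "subsets")
```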

Let us compare once again the laws of Boolean logic with the laws of sets. In logic, we have the set of Boolean values B = {F, T}, and Boolean operators ¬, ∧, ∨. In set theory, we have a fixed universal set S, and the set operations of complement (written with an overline), ∩, ∪. The laws obeyed by these two structures (set B and the set of all subsets of S) are essentially the same. There are many other similar structures in mathematics, with operations governed by exactly the same laws. Such structures are called Boolean algebras.

The Boolean algebra formed by all subsets of a given set S is called the powerset of S. Formally, the powerset of S is a set P(S) = {A | A ⊆ S}. In other words, a set is a member of P(S), if and only if it is a subset of S: ∀A : A ∈ P(S) ⇔ A ⊆ S.

Let us consider some examples. The simplest case is S = ∅. The empty set has exactly one subset: the empty set itself. Hence, the powerset of ∅ is a singleton: P(∅) = {∅}. Note: the powerset of the empty set is not empty.

Now let S be a singleton, for example S = {Bunty}. Set S has two subsets: S itself, and the empty set. Hence, the powerset of S consists of two elements: P(S) = {∅, {Bunty}}. In general, the powerset of any set S contains, among other elements, the set S itself, and the empty set. For example,

P({a, b, c}) = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}

When we form a subset of a given set S, we have two choices for each element: either to include, or not to include this element in the subset. Thus, for a finite set of n ∈ N elements, we make n independent choices, leading to 2ⁿ different subsets. Therefore, the powerset of a finite set is finite. Furthermore, the powerset of an n-element finite set consists of 2ⁿ elements. Note that this also holds for P(∅), since 2⁰ = 1.
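The counting argument can be tried out on small sets. This is a minimal sketch, assuming a helper named powerset (our name) built on itertools.combinations; subsets are represented as frozensets so that they can be elements of a set.

```python
from itertools import chain, combinations

def powerset(s):
    """Return the set of all subsets of the finite set s."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}

print(len(powerset({"a", "b", "c"})))      # 8, which is 2 ** 3
print(powerset(set()) == {frozenset()})    # True: P(∅) = {∅}

# The size is 2 ** n for every n tried here.
for n in range(6):
    assert len(powerset(range(n))) == 2 ** n
```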


If set S is infinite, then its powerset P(S) must also be infinite. This is because P(S) contains, among other elements, all singletons {a}, such that a ∈ S. Since S is infinite, the number of such singletons is also infinite.

The last set operation that we consider in this section is based on the idea of a sequence. Let x₁, x₂, . . . , xₙ be any objects (n ∈ N). A (finite) sequence (x₁, x₂, . . . , xₙ) is different from a set {x₁, x₂, . . . , xₙ} in that the order and repetition of elements do matter in a sequence. For example, the sequence JunkSeq1 = (239, banana, ace of spades) is different from the sequence JunkSeq2 = (banana, 239, ace of spades, 239). Natural number n is called the length of the sequence. For example, the length of JunkSeq1 is three, and the length of JunkSeq2 is four. We will give a formal definition of sequences further in the course.

A sequence of length two is called an ordered pair. Let A, B be any sets. The Cartesian product of sets A, B, denoted A × B, is the set of all ordered pairs (a, b), where a ∈ A, b ∈ B. In other words, A × B = {(a, b) | (a ∈ A) ∧ (b ∈ B)}.

The Cartesian product is named after the great philosopher and mathematician René Descartes (1596–1650). Descartes lived long before sets emerged as a separate mathematical concept. However, Descartes was the first to realise that in geometry, a point in the plane can be represented by a pair of numbers, called coordinates. Therefore, the whole plane is represented by what we now call a Cartesian product of two lines.

Here are some examples of Cartesian products:

∅ × A = A × ∅ = ∅ for any set A

{Bunty} × {Fowler} = {(Bunty, Fowler)}

{Fowler} × {Bunty} = {(Fowler, Bunty)}

{a, b, c} × {d, e} = {(a, d), (a, e), (b, d), (b, e), (c, d), (c, e)}

N × Planets = {(n, x) | (n ∈ N) ∧ (x ∈ Planets)} = {(5, Saturn), (239, Earth), . . .}

The Cartesian product of a set A with itself is called the Cartesian square of A, and denoted A² = A × A. For example,

{a, b}² = {(a, a), (a, b), (b, a), (b, b)}

N² = N × N = {(m, n) | m, n ∈ N}

Thus, the plane is the Cartesian square of a line.

When forming a pair (a, b) in the Cartesian product A × B, we make two independent choices: we choose a ∈ A, and b ∈ B. For finite sets A, B, with m and n elements respectively, there are m · n possible pairs. Therefore, the Cartesian product of two finite sets is finite. Furthermore, the Cartesian product of an m-element set and an n-element set consists of m · n elements. Note that this also holds for the products involving the empty set: the Cartesian product of the empty set with any other set is empty.
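For finite sets, the Cartesian product and the m · n count can be reproduced with itertools.product. In this sketch ordered pairs are modelled as Python tuples; the variable names are ours.

```python
from itertools import product

a = {"a", "b", "c"}
b = {"d", "e"}

ab = set(product(a, b))              # all ordered pairs (x, y), x in a, y in b
print(len(ab) == len(a) * len(b))    # True: 3 * 2 = 6 pairs
print(("a", "d") in ab)              # True
print(("d", "a") in ab)              # False: order matters in a pair

# Products involving the empty set are empty.
print(set(product(set(), a)) == set())   # True
print(set(product(a, set())) == set())   # True

# The Cartesian square of a two-element set has four elements.
print(len(set(product({"a", "b"}, repeat=2))))   # 4
```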

If one of the sets A, B is infinite, and the other is non-empty, then the Cartesian product A × B must be infinite. This is because if, say, A is infinite, and b ∈ B, then we can form an infinite number of distinct pairs (x, b), where x ∈ A. Each of such pairs belongs to A × B.

In general A × B ≠ B × A (the equality only holds when A = B, or when one of A, B is empty). Hence, the Cartesian product operator is not commutative. Furthermore, a nested pair ((a, b), c) is different from the nested pair (a, (b, c)), hence (A × B) × C ≠ A × (B × C), so the Cartesian product operator is not associative. However, it still has some distributive properties with respect to other set operations:

A × (B ∩ C) = (A × B) ∩ (A × C)        distributivity of × over ∩

(A ∩ B) × C = (A × C) ∩ (B × C)

A × (B ∪ C) = (A × B) ∪ (A × C)        distributivity of × over ∪

(A ∪ B) × C = (A × C) ∪ (B × C)

A × (B \ C) = (A × B) \ (A × C)        distributivity of × over \

(A \ B) × C = (A × C) \ (B × C)
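These properties can be spot-checked on concrete finite sets. The sketch below assumes a helper cart (our name) for the binary Cartesian product and three small example sets; a check on one example illustrates the laws but does not prove them.

```python
from itertools import product

def cart(x, y):
    """Binary Cartesian product of two finite sets, as a set of tuples."""
    return set(product(x, y))

a, b, c = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}

# Distributivity over intersection, union and difference, on both sides.
assert cart(a, b & c) == cart(a, b) & cart(a, c)
assert cart(a & b, c) == cart(a, c) & cart(b, c)
assert cart(a, b | c) == cart(a, b) | cart(a, c)
assert cart(a | b, c) == cart(a, c) | cart(b, c)
assert cart(a, b - c) == cart(a, b) - cart(a, c)
assert cart(a - b, c) == cart(a, c) - cart(b, c)

# Neither commutative nor associative on these sets.
print(cart(a, b) == cart(b, a))                     # False
print(cart(cart(a, b), c) == cart(a, cart(b, c)))   # False
```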

For a finite sequence of sets A₁, A₂, . . . , Aₖ, we can define the Cartesian product A₁ × A₂ × · · · × Aₖ as the set of all finite sequences (a₁, a₂, . . . , aₖ), where aᵢ ∈ Aᵢ for all i ∈ N, 1 ≤ i ≤ k. Alternatively, we can define the multiple Cartesian product A₁ × A₂ × · · · × Aₖ as nested binary Cartesian products (((A₁ × A₂) × A₃) × . . . ) × Aₖ or A₁ × (A₂ × (A₃ × (· · · × Aₖ))). From the formal viewpoint, the above three definitions are not equivalent, since the sequence (a₁, a₂, . . . , aₖ) is different from the nested pairs (a₁, (a₂, (. . . , aₖ))) and (((a₁, a₂), . . . ), aₖ). However, the structure of the resulting sets is similar, and in most applications we can treat the above as three equivalent definitions of the Cartesian product of a sequence of sets. If all sets in the sequence are finite, with set Aᵢ having nᵢ elements for every i, then the Cartesian product A₁ × A₂ × · · · × Aₖ (by any of the three definitions) has n₁ · n₂ · . . . · nₖ elements.

Similarly to the Cartesian square, we can define the k-th Cartesian power of a set A as Aᵏ = A × A × · · · × A (k times). Thus, the three-dimensional space is the Cartesian cube (i.e. the third Cartesian power) of a line.

It is possible to define the Cartesian product of an infinite sequence of sets by considering infinite sequences of elements, each element coming from the corresponding set in the sequence. We will not use Cartesian products of an infinite number of sets in this course.


4 Relations

4.1 Introduction to relations

We usually think of a “relation” between sets as a certain set of ordered pairs, where each element of a pair is taken from its corresponding set. For example, we could have the relation between the set of all cars and the set of all people, which would consist of all pairs (x, y), where car x is driven by person y. A car may be driven by more than one person, so there may be several pairs with the same x; a person may drive more than one car, so there may be several pairs with the same y. Some cars may have no drivers, and some people may not drive any cars, therefore some set elements may not be included in any of the pairs.

Thus, for any sets A, B, a relation between A and B is an arbitrary subset of the Cartesian product A × B. In other words, a relation between A and B is an arbitrary set of ordered pairs (a, b), where a ∈ A, b ∈ B. Although ordinary set notation would be sufficient, there is an alternative, more convenient notation for relations. We denote a relation Rp ⊆ A × B by Rp : A ↔ B. Instead of writing (a, b) ∈ Rp, we write a p b. This is in line with the normal practice of mathematics, where we use e.g. x ≤ y instead of (x, y) ∈ R≤.

Relation R≤ : N ↔ N is an example of a relation between the set N and itself. For any set A, we say that relation Rp : A ↔ A is a relation on the set A. From arithmetic, we already know several other relations on the set N: R=, R<, R≤, R>, R≥. Another example of a relation on the set N is the relation R| : N ↔ N, where m|n is true if m divides n (i.e. n is a multiple of m).

We can define the following relations between any sets A, B:

• the empty relation ∅ ⊆ A×B

• the complete relation A×B ⊆ A×B

On any set A, we can define the equality relation R=_A : A ↔ A as R=_A = {(a, a) | a ∈ A}. The equality relation consists of all pairs where both elements are equal. When the set A is clear from the context, we drop the subscript, so instead of a =_A b we write simply a = b.

Let Rp : A ↔ B, Rq : B ↔ C. The composition of relations Rp and Rq is a relation Rp∘q : A ↔ C, defined as follows:

∀(a, c) ∈ A × C : ( a (p ∘ q) c ⇐⇒ ∃b ∈ B : (a p b) ∧ (b q c) )

In other words, an element a ∈ A is related to an element c ∈ C by the composition Rp∘q, if there is (at least one) intermediate element b ∈ B, such that a is related to b by Rp, and b is related to c by Rq.


Consider, for example, the relation Rq : People ↔ People, where x q y if x is a child of y. The composition Rq∘q relates two elements x, y, if x is a grandchild of y.
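For finite relations stored as sets of pairs, composition is a short comprehension. The sketch below is an illustration; the function name compose and the small family (Carol, Bob, Dave, Alice) are made up for the example.

```python
def compose(p, q):
    """Composition of p and q: pairs (a, c) with a p b and b q c for some b."""
    return {(a, c) for (a, b1) in p for (b2, c) in q if b1 == b2}

# (x, y) is in child_of when x is a child of y.
child_of = {("Carol", "Bob"), ("Bob", "Alice"), ("Dave", "Alice")}

grandchild_of = compose(child_of, child_of)
print(grandchild_of)   # {('Carol', 'Alice')}
```

Here Bob is the intermediate element: Carol is a child of Bob, and Bob is a child of Alice.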

Let Rp : A ↔ B. The inverse of relation Rp is a relation Rp⁻¹ : B ↔ A, defined as follows:

∀(b, a) ∈ B × A : ( b (p⁻¹) a ⇐⇒ a p b )

In other words, an element b ∈ B is related to an element a ∈ A by the inverse relation Rp⁻¹, if a is related to b by the original relation Rp. The superscript −1 is just a symbol for inversion, not the number “minus one”.

For example, the inverse of the child relation Rq : People ↔ People is the relation Rq⁻¹, that relates two elements x, y, if x is a parent of y.
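Inverting a finite relation amounts to swapping the components of every pair. In this sketch the function name inverse and the example names are assumptions; inverting the child relation gives the parent relation.

```python
def inverse(p):
    """Inverse relation: (b, a) is included exactly when (a, b) is in p."""
    return {(b, a) for (a, b) in p}

# (x, y) is in child_of when x is a child of y.
child_of = {("Carol", "Bob"), ("Bob", "Alice")}

parent_of = inverse(child_of)
print(("Bob", "Carol") in parent_of)     # True: Bob is a parent of Carol
print(inverse(parent_of) == child_of)    # True: inverting twice undoes it
```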

We now switch our attention from relations between two arbitrary sets A, B to relations on a given set A (the previous two examples were already of this type). Let Rp : A ↔ A. Relation Rp is

• reflexive, if every element is related to itself: ∀a ∈ A : a p a

• symmetric, if every two elements are related in both possible orders, as long as they are related at all: ∀a, b ∈ A : a p b ⇒ b p a

• antisymmetric, if no two distinct elements are related in both possible orders: ∀a, b ∈ A : (a p b ∧ b p a) ⇒ a = b

• transitive, if every two elements related via an intermediate third element are also related directly: ∀a, b, c ∈ A : (a p b ∧ b p c) ⇒ a p c

In other words, a relation Rp : A ↔ A is reflexive, if R=_A ⊆ Rp; symmetric, if Rp⁻¹ ⊆ Rp; antisymmetric, if Rp ∩ Rp⁻¹ ⊆ R=_A; transitive, if Rp∘p ⊆ Rp. Verifying these claims is left as an exercise.
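The subset characterisations above translate almost literally into code for finite relations. This is a sketch under the assumption that a relation is stored as a set of pairs; all function names are ours, and running it does not replace the exercise of proving the claims.

```python
def identity(a):
    """Equality relation on the set a."""
    return {(x, x) for x in a}

def inverse(p):
    return {(y, x) for (x, y) in p}

def compose(p, q):
    return {(x, z) for (x, y1) in p for (y2, z) in q if y1 == y2}

def is_reflexive(p, a):
    return identity(a).issubset(p)

def is_symmetric(p):
    return inverse(p).issubset(p)

def is_antisymmetric(p, a):
    return (p & inverse(p)).issubset(identity(a))

def is_transitive(p):
    return compose(p, p).issubset(p)

a = {1, 2, 3, 4}
leq = {(x, y) for x in a for y in a if x <= y}

print(is_reflexive(leq, a))        # True
print(is_symmetric(leq))           # False
print(is_antisymmetric(leq, a))    # True
print(is_transitive(leq))          # True
```

On the same set, the equality relation identity(a) passes all four checks.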

Note that a relation that is not symmetric need not be antisymmetric, and vice versa. Any relation which contains both pairs (a, b) and (b, a) for some a, b ∈ A, a ≠ b, and also contains some pair (c, d) without containing (d, c), would be an example of a relation that is neither symmetric nor antisymmetric. The equality relation is an example of a relation which is both symmetric and antisymmetric (and also reflexive and transitive).

The most interesting relations are those that satisfy more than one of the above properties. In particular, a relation is

• an equivalence relation, if it is reflexive, symmetric and transitive;

• a partial order, if it is reflexive, antisymmetric and transitive.

The equality relation is both an equivalence relation and a partial order. In the following sections, we shall see more examples of each type of relations.


4.2 Equivalence relations

An equivalence relation is a relation that is reflexive, symmetric and transitive. Examples of equivalence relations are abundant in mathematics and in everyday life. For example, consider the relation on the set of all people, where person a is related to person b, if a and b are of the same age (in whole number of years). It is easy to check that all necessary properties in the definition of an equivalence relation are satisfied. A relation where a is related to b if a and b were born on the same day (but possibly in different years) is another equivalence relation. In geometry, we can define an equivalence relation on the set of all straight lines in the plane, where line a is related to line b, if a and b are parallel (every line is considered to be parallel to itself).

In arithmetic, given a fixed number n ∈ Z, we can define the relation R≡ₙ : Z ↔ Z, where two numbers are related, if their difference is a multiple of n: a ≡ₙ b ⇐⇒ n|(a − b). The relation R≡ₙ is called congruence modulo n. It is an equivalence relation for every natural n > 0.

Let A be any set, and R∼ : A ↔ A an equivalence relation (∼ is a general mathematical sign for equivalence). For any element a ∈ A, the equivalence class of a, denoted [a]∼, is the set of all elements in A related to a: [a]∼ = {x ∈ A | x ∼ a}. Since R∼ is reflexive, every element belongs to its own equivalence class: for all a ∈ A, a ∈ [a]∼. Sometimes an element a is called a representative of the equivalence class [a]∼.

For example, if a ∼ b means that a and b are two people of the same age, then the equivalence classes are all possible ages, and every person “represents” all people of his or her age. If a ∼ b means that persons a and b share a birthday, then the equivalence classes are all 366 possible birthdays, and every person “represents” all people with the same birthday. If a ∼ b means that lines a and b are parallel, then these lines share the same direction, and we can think of all possible directions as the equivalence classes. For the congruence relation R≡ₙ, the equivalence class of any a ∈ Z consists of all numbers that give the same remainder as a, when divided by n. Thus, [2]≡₅ = {. . . , −18, −13, −8, −3, 2, 7, 12, 17, . . . }.

The importance of equivalence classes is that in a set with an equivalence relation, every element belongs to one, and only one, equivalence class. In other words, we have the following theorem.

Theorem 3. Let R∼ : A ↔ A be an equivalence relation. The equivalence classes of R∼ are pairwise disjoint. The union of all equivalence classes is the whole set A.

Proof. To prove that the classes are pairwise disjoint, we need to show that for all a, b ∈ A : ([a]∼ = [b]∼) ∨ ([a]∼ ∩ [b]∼ = ∅). Consider two cases:

• Case a ∼ b. Consider any x ∈ [a]∼. By transitivity of R∼, we have:

x ∼ a, a ∼ b =⇒ x ∼ b =⇒ x ∈ [b]∼


Hence [a]∼ ⊆ [b]∼. By symmetry of R∼ we also have b ∼ a, so swapping a and b, we get [b]∼ ⊆ [a]∼, therefore [a]∼ = [b]∼.

• Case a ≁ b. Suppose [a]∼ ∩ [b]∼ ≠ ∅, then there is some x ∈ [a]∼ ∩ [b]∼. By symmetry and transitivity of R∼, we have a ∼ x, x ∼ b =⇒ a ∼ b, contradiction. Therefore [a]∼ ∩ [b]∼ = ∅.

By the law of excluded middle, one of the above two cases must be true, hence ∀a, b ∈ A : ([a]∼ = [b]∼) ∨ ([a]∼ ∩ [b]∼ = ∅).

Finally, by reflexivity of R∼, we have a ∼ a, therefore a ∈ [a]∼, so every element of A belongs to some equivalence class. On the other hand, every equivalence class is a subset of A, therefore the union of all equivalence classes is the whole set A. □

Theorem 3 allows us to think of any equivalence relation as a partitioning of the set into disjoint subsets. In many cases, such partitioning has a well-understood intuitive meaning:

• The equivalence relation “person a is of the same age as person b (in whole number of years)” has approximately 110–120 equivalence classes, corresponding to all possible ages. Note that these ages need not be a contiguous set of natural numbers, if e.g. there is a person of age 120, but no person of age 119.

• The equivalence relation “person a was born on the same day as person b (possibly in different years)” has exactly 366 equivalence classes, corresponding to every date in a year. Note that the sizes of all classes will be nearly equal, except the class corresponding to 29 February, which will be approximately four times smaller than others.

• The equivalence relation “line a is parallel (or equal) to line b” has an infinite number of equivalence classes corresponding to all possible directions of a line in the plane. In fact, we can define “direction” as an equivalence class of this relation.

• The “congruence modulo n” relation R≡ₙ has n equivalence classes, represented by numbers 0, 1, . . . , n − 1. For example, for n = 5, we have:

[0]≡₅ = {. . . , −10, −5, 0, 5, 10, . . . }

[1]≡₅ = {. . . , −9, −4, 1, 6, 11, . . . }

[2]≡₅ = {. . . , −8, −3, 2, 7, 12, . . . }

[3]≡₅ = {. . . , −7, −2, 3, 8, 13, . . . }

[4]≡₅ = {. . . , −6, −1, 4, 9, 14, . . . }

Although the number of classes is finite, each class is an infinite set. The classes [a]≡ₙ are called residue classes modulo n.
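The partitioning described by Theorem 3 can be observed on a finite window of the integers. The sketch below groups the numbers −10, . . . , 14 by congruence modulo 5; the function name equivalence_classes and the choice of window are assumptions, and the grouping relies on the supplied relation really being an equivalence.

```python
def equivalence_classes(elements, related):
    """Group elements into classes, assuming related is an equivalence relation."""
    classes = []
    for x in elements:
        for cls in classes:
            representative = next(iter(cls))
            if related(x, representative):
                cls.add(x)
                break
        else:
            classes.append({x})   # x starts a new class
    return classes

def congruent_mod_5(a, b):
    return (a - b) % 5 == 0

window = range(-10, 15)
classes = equivalence_classes(window, congruent_mod_5)

print(len(classes))                             # 5
print(sorted(c for c in classes if 2 in c)[0])  # the class containing 2

# Pairwise disjoint, and together they cover the whole window.
assert sum(len(c) for c in classes) == len(window)
assert set().union(*classes) == set(window)
```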


For a given equivalence relation R∼ : A ↔ A, the set of all its equivalence classes is called the quotient set of A with respect to R∼, and is denoted by A/R∼ = {[a]∼ | a ∈ A}. In the examples above, the quotient sets are respectively the set of all ages, the set of all birthdays, the set of all line directions, and the set of all residue classes modulo n (for a given n ∈ N, n > 0). The latter set is usually denoted by Zₙ = Z/R≡ₙ = {[a]≡ₙ | a ∈ Z}. The set Zₙ possesses very interesting arithmetic properties, which are studied in number theory.

For a finite set A, the quotient set A/R∼ must be finite. In particular, if A has n elements, and if all equivalence classes happen to be of equal size m, then n must be a multiple of m, and the quotient set will have n/m elements (i.e. equivalence classes). For an infinite set A, the quotient set may be finite or infinite.

4.3 Partial orders

A partial order is a relation that is reflexive, antisymmetric and transitive. Whereas an equivalence relation is an abstraction of “equality” or “similarity” between objects, a partial order is an abstraction of one object being in some sense “smaller” (or “greater”) than another, or of one object “preceding” (or “succeeding”) another. Consider, for example, a relation on the set of all people, where person a is related to person b, if a is a descendant of b (i.e. a child, a grandchild, a great-grandchild, etc.). We count every person as his or her own descendant, therefore the relation is reflexive. The relation is also transitive, since a descendant of a descendant of a person is a descendant of that person. Of course, the relation is not symmetric, since “person a is a descendant of b” does not imply that “person b is a descendant of a”. Moreover, these two statements can both be true in one case only: when a and b are the same person (who, by definition, is a descendant of him/herself). Thus, we have the antisymmetry property, and our relation is a partial order on the set of all people.

It is easy to check that the arithmetic relations R≤ and R≥, both on N and on Z, are partial orders.

The divisibility relation R| : N ↔ N, which we mentioned several times before, is formally defined as follows: for m, n ∈ N, we have m|n (m divides n, n is a multiple of m), if there is a number k ∈ N, such that m · k = n. Note that by this definition, number 1 divides every number: to prove 1|n, we take k = n. Also, number 0 is a multiple of every number: to prove m|0, we take k = 0. We have the following theorem.

Theorem 4. The divisibility relation R| : N ↔ N is a partial order.

Proof. Let n ∈ N. We have n · 1 = n, hence n|n by definition of relation R|. Therefore, relation R| is reflexive.


Let m, n ∈ N, m|n, n|m. By definition of relation R|, there are k, l ∈ N, such that n = k · m, m = l · n. Hence, n = k · l · n. If n = 0, then m = l · 0 = 0 = n. Otherwise we can cancel n, so k · l = 1. Since k and l are natural numbers, this can only be true if k = l = 1, hence n = 1 · m = m. Therefore, relation R| is antisymmetric.

Let m, n, p ∈ N, m|n, n|p. By definition of relation R|, there are k, l ∈ N, such that n = k · m, p = l · n. Hence, p = k · l · m, so m|p. Therefore, relation R| is transitive. □
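As a sanity check of Theorem 4, the three properties can be tested by brute force on an initial segment of N. In this sketch divides follows the definition literally; searching k only up to n is our shortcut (for m ≥ 1, m · k = n forces k ≤ n, and for m = 0 only n = 0 works, with k = 0).

```python
def divides(m, n):
    """m | n on the naturals: m * k == n for some natural number k."""
    return any(m * k == n for k in range(n + 1))

sample = range(13)   # the naturals 0, 1, ..., 12

reflexive = all(divides(n, n) for n in sample)
antisymmetric = all(m == n
                    for m in sample for n in sample
                    if divides(m, n) and divides(n, m))
transitive = all(divides(m, p)
                 for m in sample for n in sample for p in sample
                 if divides(m, n) and divides(n, p))

print(reflexive, antisymmetric, transitive)   # True True True
print(divides(1, 7), divides(7, 0))           # True True
print(divides(0, 7))                          # False
```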

Another important example of a partial order is the subset inclusion relation A ⊆ B, where A, B are both subsets of a given set S. Since the objects being related are subsets of S, the subset inclusion relation is defined on the powerset of S: R⊆ : P(S) ↔ P(S). The relation is reflexive, since for any A ⊆ S, A ⊆ A; antisymmetric, since for any A, B ⊆ S, (A ⊆ B) ∧ (B ⊆ A) ⇒ (A = B); transitive, since for any A, B, C ⊆ S, (A ⊆ B) ∧ (B ⊆ C) ⇒ (A ⊆ C).

Note that in a partial order, some pairs of elements may be incomparable. For example, for any two persons one does not have to be an ancestor of the other: they could be siblings, cousins, or not related at all. Likewise, there are pairs of numbers neither of which divides the other (e.g. 4 and 5), and pairs of sets neither of which is a subset of the other (e.g. {1, 2}, {1, 3} ⊆ {1, 2, 3}). On the other hand, relations R≤ and R≥ satisfy an additional property: for any numbers a, b, we have either a ≤ b, or b ≤ a (or both, if a = b). In general, a partial order R⪯ : A ↔ A is called total, if for all a, b ∈ A, we have either a ⪯ b, or b ⪯ a. Thus, partial orders R≤ and R≥ are total; partial orders R| and R⊆ are not total.

Consider a partial (not necessarily total) order R⪯ : A ↔ A. Let a, b ∈ A. We say that c ∈ A is an upper bound of a, if a ⪯ c. In particular, every element is an upper bound of itself. An element c ∈ A is a (common) upper bound of a and b, if a ⪯ c and b ⪯ c. An arbitrary pair of elements a, b may have no common upper bound at all, or several common upper bounds. In the latter case, one of the bounds may play a special role, being “the closest” to a and b among all their common upper bounds. Formally, an element c ∈ A is called the least upper bound of a, b, denoted lub⪯(a, b), if c is an upper bound of a, b, and for any upper bound x of a, b, we have c ⪯ x. In other words,

c = lub⪯(a, b) ⇐⇒ (a ⪯ c) ∧ (b ⪯ c) ∧ ( ∀x ∈ A : (a ⪯ x) ∧ (b ⪯ x) ⇒ (c ⪯ x) )

The least upper bound of a, b does not have to exist, even if elements a, b have some common upper bounds.

All the above definitions can be easily restated for lower, rather than upper bounds. Thus, d ∈ A is a lower bound of a, if d ⪯ a. Every element is a lower bound of itself. An element d ∈ A is a (common) lower bound of a and b, if d ⪯ a and d ⪯ b. Two elements can have any number of common lower bounds, or no common lower bounds at all. An element d ∈ A is called the greatest lower bound of a, b, denoted glb⪯(a, b), if d is a lower bound of a, b, and for any lower bound x of a, b, we have x ⪯ d. In other words,

d = glb⪯(a, b) ⇐⇒ (d ⪯ a) ∧ (d ⪯ b) ∧ ( ∀x ∈ A : (x ⪯ a) ∧ (x ⪯ b) ⇒ (x ⪯ d) )

Two elements may not have the greatest lower bound, even if they have some common lower bounds. However, if two elements have the greatest lower bound, then it is unique (why?). The same applies to the least upper bound.
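For a finite partially ordered set, both definitions can be evaluated by direct search. The sketch below is an illustration under our own naming (least_upper_bound, greatest_lower_bound); it uses divisibility on two small sets of positive numbers as the order and returns None when the bound does not exist.

```python
def least_upper_bound(elements, leq, a, b):
    """Least common upper bound of a and b, or None if there is none."""
    bounds = [c for c in elements if leq(a, c) and leq(b, c)]
    for c in bounds:
        if all(leq(c, x) for x in bounds):
            return c
    return None

def greatest_lower_bound(elements, leq, a, b):
    """Greatest common lower bound of a and b, or None if there is none."""
    bounds = [d for d in elements if leq(d, a) and leq(d, b)]
    for d in bounds:
        if all(leq(x, d) for x in bounds):
            return d
    return None

def divides(m, n):
    return n % m == 0   # assumes m > 0

full = [1, 2, 3, 4, 6, 12]
print(least_upper_bound(full, divides, 4, 6))      # 12
print(greatest_lower_bound(full, divides, 4, 6))   # 2

# Here 2 and 3 have common upper bounds 12 and 18, but no least one.
sparse = [2, 3, 12, 18]
print(least_upper_bound(sparse, divides, 2, 3))    # None
```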

As an example, consider the partial order “a is a descendant of b”. For any two people, their common upper bound is any common ancestor, if one exists. Thus, if two persons are cousins, then either of the two common grandparents is their common upper bound. Neither of these upper bounds is the least, since the two grandparents are not ancestors of each other. There are many other common upper bounds, provided by ancestors of these grandparents, but none of these upper bounds is the least.

In the same partial order, the common lower bound of any two people is their common descendant, if one exists. Thus, if two persons are “in-laws”, i.e. each of them is a parent of the partner of the other’s child, then each of their common grandchildren is their common lower bound. There may be many other common lower bounds, provided by descendants of common grandchildren. If the two “in-laws” have exactly one common grandchild, he/she is their greatest lower bound, since all other common lower bounds would be that grandchild’s descendants.

In arithmetic, the greatest lower bound of two numbers a, b ∈ N with respect to the divisibility relation R| is the two numbers’ greatest common divisor: glb|(a, b) = gcd(a, b). (Sometimes the greatest common divisor is called “highest common factor”.) In the same partial order, the least upper bound of two numbers a, b ∈ N is their least common multiple: lub|(a, b) = lcm(a, b). In contrast with the previous example, every two non-zero natural numbers have the greatest common divisor and the least common multiple, and therefore the greatest lower bound and the least upper bound in R|.
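The claim that gcd and lcm are exactly the greatest lower and least upper bounds for divisibility can be spot-checked numerically. This sketch uses math.gcd from the standard library and computes lcm from it; the tested range 1, . . . , 12 and the helper names are our choices.

```python
from math import gcd

def divides(m, n):
    return n % m == 0   # assumes m > 0

def lcm(a, b):
    return a * b // gcd(a, b)

for a in range(1, 13):
    for b in range(1, 13):
        d, m = gcd(a, b), lcm(a, b)
        # d is a common lower bound and m a common upper bound ...
        assert divides(d, a) and divides(d, b)
        assert divides(a, m) and divides(b, m)
        for x in range(1, 145):
            # ... every common divisor divides d,
            if divides(x, a) and divides(x, b):
                assert divides(x, d)
            # and every common multiple is a multiple of m.
            if divides(a, x) and divides(b, x):
                assert divides(m, x)

print(gcd(4, 6), lcm(4, 6))   # 2 12
```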

Another example of an arithmetic partial order with guaranteed greatest lower and least upper bounds is the total order R≤ : N ↔ N. Here, the greatest lower bound of two numbers a, b is simply their minimum a ⊓ b (a ⊓ b = a if a ≤ b, and a ⊓ b = b otherwise). The least upper bound of a, b is their maximum a ⊔ b (a ⊔ b = b if a ≤ b, and a ⊔ b = a otherwise). In fact, it is easy to see that greatest lower and least upper bounds are guaranteed to exist in every totally ordered set.

Finally, consider the subset inclusion relation R⊆ : P(S) ↔ P(S) onthe subsets of any (note necessarily finite) set S. The greatest lower bound


of two subsets A, B ⊆ S is their intersection glb⊆(A, B) = A ∩ B, andthe least upper bound is their union lub⊆(A, B) = A ∪ B. For any twosets, we can form their intersection and their union, therefore the relationR⊆ is another example of a partial order where greatest lower and leastupper bounds always exist. In general, a partially ordered set where forevery two elements one can find their greatest lower bound and least upperbound is called a lattice. The partial orders R| : N ↔ N, R≤ : N ↔ N andR⊆ : P(S) ↔ P(S) (for any set S) are examples of lattices.

In many partially ordered sets, it is worthwhile to look for elements that are in some sense “extreme”. Since the set may include incomparable elements, we have two possible notions of “extremality”. Consider a partial (not necessarily total) order R⪯ : A ↔ A. We say that a ∈ A is a maximal element, if for all x ∈ A, we have (a ⪯ x) ⇒ (a = x). In other words, the only element higher than or equal to a is a itself. We say that c ∈ A is the greatest element, if for all x ∈ A, we have x ⪯ c. In other words, c is higher than or equal to all elements of A. Note that by this definition, the greatest element must be comparable to (and higher than) all other elements, whereas a maximal element may be comparable to (and higher than) some elements and incomparable to others.

Both above definitions can be restated for the opposite “extremes”. We say that b ∈ A is a minimal element, if for all x ∈ A, we have (x ⪯ b) ⇒ (x = b). In other words, the only element lower than or equal to b is b itself. We say that d ∈ A is the least element, if for all x ∈ A, we have d ⪯ x. In other words, d is lower than or equal to all elements of A. Again, the least element is comparable to all other elements, whereas a minimal element may be comparable to some elements and incomparable to others.

As an example, consider the partial order “a is a descendant of b”. A minimal element in this partial order is any person without children. There is no least element, since no person is everyone’s descendant.

In the total order R≤ : N ↔ N, number 0 is the least element, and the only minimal element. There are no maximal or greatest elements. In the partial order R| : N ↔ N, number 1 is the least element, since it divides all natural numbers. Number 0 is (somewhat contrary to the intuition) the greatest element, since every natural number divides 0. It is also the only maximal element.

An interesting variation of the previous example is the same partial order R|, considered on the set of all natural numbers, except 0, 1. In this partial order, every prime number is a minimal element, since it is not divisible by any other number in this set. There is no least element, since no number (except 1, which is excluded) divides all natural numbers. There are no maximal elements, since for every number other than 0, there is a distinct multiple (e.g. x ≠ 2x and x|2x for any x ∈ N, x ≠ 0). There is no greatest element, since no positive number is a multiple of all other numbers.

In the subset inclusion relation R⊆ : P(S) ↔ P(S), the least (and the only minimal) element is ∅, and the greatest (and the only maximal) element is S. If ∅ and S are excluded, and S is neither empty nor a singleton, then there will be many minimal elements (all singletons {a}, where a ∈ S) and many maximal elements (all complements of such singletons), but no greatest or least element.
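For a finite partially ordered set, the four definitions above can be evaluated directly. The sketch below does this for the subsets of S = {1, 2, 3}; the function name `extremes` and the choice of S are ours, made only for illustration.

```python
from itertools import combinations


def extremes(elements, leq):
    """Return (minimal, maximal, least, greatest) for a finite partial order.

    `leq(x, y)` must implement x ⪯ y; `least`/`greatest` are None if absent.
    """
    elements = list(elements)
    minimal = [b for b in elements if all(x == b for x in elements if leq(x, b))]
    maximal = [a for a in elements if all(x == a for x in elements if leq(a, x))]
    least = next((d for d in elements if all(leq(d, x) for x in elements)), None)
    greatest = next((c for c in elements if all(leq(x, c) for x in elements)), None)
    return minimal, maximal, least, greatest


S = {1, 2, 3}
subsets = [frozenset(c) for r in range(len(S) + 1) for c in combinations(sorted(S), r)]
proper = [A for A in subsets if A and A != S]  # exclude ∅ and S


def subset_of(A, B):
    return A <= B  # frozenset comparison implements A ⊆ B


minimal, maximal, least, greatest = extremes(proper, subset_of)
print(sorted(map(sorted, minimal)))  # [[1], [2], [3]]
print(sorted(map(sorted, maximal)))  # [[1, 2], [1, 3], [2, 3]]
print(least, greatest)               # None None
```

Running `extremes(subsets, subset_of)` on the full powerset instead gives ∅ as the least and S as the greatest element, in agreement with the text.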

It is easy to prove that any greatest element is maximal, and that any least element is minimal (try it!). As the above examples show, the converse is not true: a maximal element need not be the greatest, and a minimal element need not be the least. It is also easy to prove that if the greatest (or the least) element exists, then it must be unique (try it!). However, if a maximal or a minimal element is unique, it still does not have to be the greatest or the least (why?).

The results of this section show us that the concept of a relation, and in particular equivalence relations and partial orders, gives us a useful general tool, applicable in various branches of mathematics and computer science. We will apply our knowledge of relations in the following sections.



5 Functions

5.1 Introduction to functions

The word “function” takes on different meanings in different branches of mathematics and computer science. One often thinks of a function as a transformation rule, or a set of rules, that allow us to “map”, or transform, objects into other objects. There are various ways to make this concept of a function precise. In this course, we take the approach of ignoring the process of transformation (which may not even be computable), and instead we concentrate on the initial object the function was applied to, and the final object that is the result of this application. In other words, we view a function as a relation between the set of all possible “inputs” and all possible “outputs”.

The special property of functions, which distinguishes them from other relations, is that for every “input”, the function produces exactly one “output” (see Figure 1). Formally, a function f from set A to set B is a relation Rf : A ↔ B, where for every a ∈ A, there is exactly one b ∈ B, such that a f b (that is, (a, b) ∈ Rf ). Set A is the domain of f , set B is the co-domain of f . We say that function f maps A into B. We say that a function f : A → A is a function on the set A.

There is special notation and terminology associated with functions. We indicate that f is a function from A into B by writing f : A → B. As an alternative notation to (a, b) ∈ Rf or a f b, we write f(a) = b. This notation is unambiguous since, by definition of a function, for every a there is exactly one b = f(a). We say that function f maps a to b. For a given function f , element b = f(a) is called the image of a, and a is called a pre-image of b.
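For finite sets, the defining property of a function can be tested mechanically on a set of pairs. The following sketch uses names of our own choosing (`is_function`, `apply`) and small example sets that do not appear elsewhere in the notes.

```python
def is_function(relation: set, domain: set, codomain: set) -> bool:
    """Check that `relation` ⊆ domain × codomain relates every a to exactly one b."""
    if not all(a in domain and b in codomain for (a, b) in relation):
        return False
    return all(sum(1 for (x, _) in relation if x == a) == 1 for a in domain)


def apply(relation: set, a):
    """Return the image f(a); meaningful only if `relation` is a function."""
    (b,) = {y for (x, y) in relation if x == a}
    return b


A = {1, 2, 3}
B = {"x", "y"}
f = {(1, "x"), (2, "x"), (3, "y")}   # a function: every input has one output
g = {(1, "x"), (2, "y")}             # not a function: 3 has no image
h = f | {(1, "y")}                   # not a function: 1 has two images

print(is_function(f, A, B), is_function(g, A, B), is_function(h, A, B))
# True False False
print(apply(f, 3))  # y
```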

We have already seen some examples of functions earlier in the course. In particular, the equality relation on a set A, defined as R=A = {(a, a) | a ∈ A}, is a function on A. It is called the identity function on A, and denoted idA : A → A. For all a ∈ A, we have idA(a) = a.

Figure 1: A function

As an example of an arithmetic function, we can take the function sq : Z → N, defined as the set of pairs Rsq = {(m, n) ∈ Z × N | m² = n}. This set satisfies the definition of a function, since every integer has exactly one square. We have

Rsq = {. . . , (−3, 9), (−2, 4), (−1, 1), (0, 0), (1, 1), (2, 4), (3, 9), . . .}

Consider any function f : A → B, and let H ⊆ A. The restriction of f on set H is function f |H , defined as f |H = {(a, f(a)) | a ∈ H}. In other words, the restriction agrees with the original function on all elements of H, and is undefined on all elements not in H. For example, the restriction of sq to the set of all natural numbers is the function sq |N : N → N. We have

Rsq|N = {(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25), . . .}

Two other special cases of a function that we considered before are finite and infinite sequences. Let A be any set, finite or infinite. A finite sequence of elements of A is a function N_k → A, where k ∈ N is the length of the sequence. Notation (a_0, a_1, . . . , a_{k−1}) ∈ A^k is simply an alternative, shorter way of writing

a : N_k → A

a(0) = a_0    a(1) = a_1    . . .    a(k − 1) = a_{k−1}

Similarly, an infinite sequence of elements of A is a function N → A. Notation (a_0, a_1, a_2, a_3, . . .), where ∀i ∈ N : a_i ∈ A, is an alternative to

a : N → A

a(0) = a_0    a(1) = a_1    a(2) = a_2    a(3) = a_3    . . .

Thus, unlike sets, sequences need not be a basic, undefined concept: we define sequences via functions. Since functions are a special case of relations, and relations are a special case of sets, the concept of an ordered sequence is ultimately reduced to the concept of an unordered set.

Since functions are relations, the operations of composition and inversion can be applied to functions just like to any other relations. The result of such application is a relation, but is not a priori guaranteed to be a function. It still turns out that the result of function composition will always be a function.

Theorem 5. Let f : A → B, g : B → C. The composite relation Rf◦g is a function A → C.

Proof. Let a ∈ A. Since f is a function, there is a unique b = f(a) ∈ B. Since g is a function, there is a unique c = g(b) = g(f(a)) ∈ C. By definition of relation composition, we have (a, c) ∈ Rf◦g. Since such element c is unique, relation Rf◦g is a function f ◦ g : A → C. □

Figure 2: The range of a function

Thus, (f ◦ g)(a) = g(f(a)). This explains why in some books, the order of the ◦ notation for function composition is inverted: g ◦ f instead of f ◦ g. We prefer the latter notation, which indicates that in the expression g(f(a)), function f is applied first, followed by function g.

In general, there is no analogue of Theorem 5 for function inversion. For a function f : A → B, the inverse relation Rf⁻¹ : B ↔ A need not be a function. Consider, for example, the function sq : Z → N. Its inverse is the “square root” relation Rsq⁻¹ = {(n, m) ∈ N × Z | m² = n}. We have

Rsq⁻¹ = {. . . , (9, −3), (4, −2), (1, −1), (0, 0), (1, 1), (4, 2), (9, 3), . . .}

This set of pairs does not satisfy the definition of a function: some natural numbers, such as 2, 3, 5, 6, . . ., do not have an integer square root, whereas other natural numbers, such as 1, 4, 9, 16, . . ., have two integer square roots of opposite signs. Thus, neither the existence nor the uniqueness condition from the definition of a function is satisfied.
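The failure of both conditions can be observed on a finite fragment of the square function. In the sketch below, the ranges {−3, . . . , 3} and {0, . . . , 9} are arbitrary finite stand-ins for Z and N, and the helper `inverse` is our own name.

```python
def inverse(relation: set) -> set:
    """Swap the components of every pair of the relation."""
    return {(b, a) for (a, b) in relation}


Z_part = range(-3, 4)   # stand-in for Z
N_part = range(0, 10)   # stand-in for N

sq = {(m, m * m) for m in Z_part}
sq_inv = inverse(sq)

# For every n, collect all m related to n by the inverse relation.
images = {n: sorted(m for (k, m) in sq_inv if k == n) for n in N_part}

print(images[2])  # []       existence fails: 2 has no integer square root
print(images[4])  # [-2, 2]  uniqueness fails: 4 has two integer square roots
print(images[0])  # [0]      0 is the only argument with exactly one image
```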

Let us go back to the definition of a function f : A → B. Note that the domain A and the co-domain B play different, non-symmetric roles: for every element of the domain, there must be a unique image in the co-domain, but not vice versa. The set of all elements of the co-domain that do have a pre-image in the domain (not necessarily a unique one) is called the range of the function (see Figure 2). The range of a function f : A → B is denoted f(A). For example, the range of the square function sq : Z → N is the set of all squares.

Many important functions satisfy stronger conditions than just the existence and uniqueness of the image. Here we concentrate on two such conditions.

A function f : A → B is called surjective, if its range is the whole co-domain B:

f(A) = B

(see Figure 3). Such a function f is said to map the domain A onto B: f : A ↠ B. An example of a surjective function is the function suit : Cards → {♠,♥,♣,♦}, which maps the finite set of cards in a standard pack to the set of four suits. Since there is at least one card of every suit in the pack, function suit is surjective.

Figure 3: A surjective function

Figure 4: An injective function

A function f : A → B is called injective, if it maps different elements of the domain A to different elements of the co-domain B:

∀x, y ∈ A : (f(x) = f(y)) ⇒ (x = y)

(see Figure 4). Such a function f is said to map A to B one-to-one: f : A ↣ B. An example of an injective function is the square function on the set of natural numbers: sq |N : N → N. Since every two different natural numbers have different squares, function sq |N is injective. The square function on the set of all integers is not injective, since e.g. sq(−5) = sq(5) = 25.

The concepts of a surjective and an injective function are in a certain sense complementary: for any pair of non-empty sets A, B, there is a surjective function from A to B, if and only if there is an injective function from B to A. The proof of this statement is left as an exercise.

A function f : A → B is called bijective, if it is both surjective and injective. For every element of the co-domain B, such a function has a unique pre-image in the domain A:

∀b ∈ B : ∃!a ∈ A : f(a) = b

(see Figure 5). A bijective function f from A to B is also called a one-to-one correspondence between A and B: f : A ⤖ B. An example of a bijective function is the function “add five” on the set of all integers:

add5 : Z → Z    ∀a ∈ Z : add5(a) = a + 5

Figure 5: A bijective function

For every integer b ∈ Z, number b − 5 is the pre-image, therefore function add5 is surjective. Adding five to two different integers produces different results, therefore function add5 is injective. Thus, function add5 is a bijective function from the set Z to itself.
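On finite sets, surjectivity and injectivity can be checked by simple counting. The sketch below represents a function as a Python dictionary; the checker names are ours, and since Z is infinite, `add5_mod12` (addition of five modulo 12 on {0, . . . , 11}) is used only as a finite stand-in for add5.

```python
def is_surjective(f: dict, codomain) -> bool:
    """Every element of the co-domain has at least one pre-image."""
    return set(f.values()) == set(codomain)


def is_injective(f: dict) -> bool:
    """Different arguments have different images."""
    return len(set(f.values())) == len(f)


def is_bijective(f: dict, codomain) -> bool:
    return is_surjective(f, codomain) and is_injective(f)


SUITS = "♠♥♣♦"
cards = [(rank, s) for rank in "A23456789TJQK" for s in SUITS]
suit = {card: card[1] for card in cards}            # suit : Cards → suits

sq_nat = {n: n * n for n in range(5)}               # sq|N on {0, ..., 4}
sq_int = {m: m * m for m in range(-4, 5)}           # sq on {-4, ..., 4}
add5_mod12 = {a: (a + 5) % 12 for a in range(12)}   # finite stand-in for add5

print(is_surjective(suit, SUITS), is_injective(suit))           # True False
print(is_injective(sq_nat), is_surjective(sq_nat, range(17)))   # True False
print(is_injective(sq_int))                                     # False
print(is_bijective(add5_mod12, range(12)))                      # True
```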

A bijective function from any set to itself is called a permutation on that set. In the previous example, function add5 is a permutation on Z. A special case of a permutation is an involution, which is any bijection that coincides with its own inverse. Under an involution, every element of the domain is either left unchanged, or “swapped” with another element. An example of an involution is the function that inverts the sign of an integer:

neg : Z → Z ∀a ∈ Z : neg(a) = −a

Proof of the following properties of functions is left as an exercise:

• composition of two surjective (respectively injective, bijective) functions is surjective (injective, bijective);

• the inverse relation of a bijective function is a bijective function.

A special example of a bijection puts the powerset of any given set S in one-to-one correspondence with the set of all possible functions from S to the two-element set B = {F, T}. For any subset A ∈ P(S), the corresponding function is the indicator function of A, χA : S → B, defined as follows:

∀x ∈ S : χA(x) =
    T    if x ∈ A
    F    if x ∉ A

The proof that the mapping χ : A ↦ χA is a bijection between P(S) and the set of functions B(S) = {f | f : S → B} is left as an exercise.
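For a small finite set S, the correspondence between subsets and indicator functions can be verified exhaustively. This is a numerical check only, not a substitute for the proof; the three-element set S and the name `indicator` are our own choices, and Python's `False`/`True` play the role of F and T.

```python
from itertools import combinations, product


def indicator(A, S):
    """The indicator function χA : S → {False, True}, stored as a dictionary."""
    return {x: (x in A) for x in S}


S = ["a", "b", "c"]
subsets = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]
functions = [dict(zip(S, values)) for values in product([False, True], repeat=len(S))]

images = [indicator(A, S) for A in subsets]

# Injective: different subsets give different indicator functions.
injective = len({tuple(sorted(g.items())) for g in images}) == len(subsets)
# Surjective: every function S → {False, True} is the indicator of some subset.
surjective = all(f in images for f in functions)

print(len(subsets), len(functions), injective, surjective)  # 8 8 True True
```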

5.2 Set cardinality

Putting two sets in one-to-one correspondence is one of the most basic activities that can be performed on sets. Intuition tells us that it is possible if and only if both sets have the same “size”. In fact, the idea of one-to-one correspondence, or bijection, allows us to define precisely what “size” means, even for infinite sets.

We say that two sets A, B are equinumerous (A ≅ B), if there is a bijective function f : A ⤖ B. For any given set S, we can think of “equinumerous” as a relation on the subsets of S: R≅ : P(S) ↔ P(S). Since every set can be put in one-to-one correspondence with itself by the identity function, relation R≅ is reflexive. Since both the inverse of a bijective function and a composition of two bijective functions are bijective, relation R≅ is symmetric and transitive. Thus, R≅ : P(S) ↔ P(S) is an equivalence relation. Each of its equivalence classes is composed of sets of the same “size”; in fact, every such class can be thought of as an abstraction of “set size”, either finite or infinite. In mathematics, these set sizes are called cardinalities.

It is easy for us to get hold of finite cardinalities, since we accepted the natural numbers as one of our basic concepts. For any n ∈ N, let N_n be defined as the set of first n natural numbers:

N_n = {x ∈ N | x < n}

Thus, N_0 = ∅, N_1 = {0}, N_2 = {0, 1}, etc. Intuitively, sets N_n are a representative collection of what we would like to call finite sets: we define a set to be finite, if it is equinumerous with the set N_n for some n ∈ N. None of the sets N_n with different values of n are equinumerous; we accept this as one of the axiomatic properties of natural numbers. Given this property, it is easy to prove that every finite set is equinumerous with exactly one of N_n.

Theorem 6. For every finite set A, there is a unique n ∈ N, such that A ≅ N_n.

Proof. Suppose A is equinumerous with N_k and N_l, k, l ∈ N. We have the bijections f : A ⤖ N_k and g : A ⤖ N_l. Function f⁻¹ ◦ g : N_k ⤖ N_l is also a bijection (why?). Therefore, sets N_k and N_l are equinumerous. This can only happen if k = l. □

By the above theorem, every finite set has a uniquely defined natural number as its cardinality. This fact gives some precision to our introductory remark that natural numbers are an abstraction of finite set “sizes”.

We now turn our attention to cardinalities of infinite sets. A priori, it is not obvious whether different infinite sets (e.g. N, Neven, N², N³, Z, P(N)) have different cardinalities. We begin our study of infinite cardinalities from the set N. We call an infinite set countable, if it is equinumerous with the set of all natural numbers N. Intuitively, such a set can be “counted”, i.e. put in one-to-one correspondence with N.

It may appear at first that by removing elements from N, we can obtain infinite sets with a cardinality different from that of N. It turns out that this is not the case. Let us look at some examples.

Theorem 7. Set N+ = N \ {0} is countable.


Proof. Consider function f : N → N+, which adds one to every natural number: ∀n : f(n) = n + 1.

With respect to function f , every element of N+ has a pre-image:

∀n ∈ N+ : n = (n − 1) + 1 = f(n − 1)

Therefore, function f is surjective.

Furthermore, function f maps different elements of N to different elements of N+:

∀m, n ∈ N : (m ≠ n) ⇒ (m + 1 ≠ n + 1)

Therefore, function f is injective.

Since f is surjective and injective, f is bijective. □

The above proof can be represented graphically as follows:

0   1   2   3   4   5   6   7   · · ·
↕   ↕   ↕   ↕   ↕   ↕   ↕   ↕
1   2   3   4   5   6   7   8   · · ·

Theorem 8. Set Neven = {0, 2, 4, 6, . . . } is countable.

Proof. Consider function f : N → Neven, which doubles every natural number: ∀n : f(n) = 2n.

With respect to function f , every element of Neven has a pre-image:

∀n ∈ Neven : n = 2 · (n/2) = f(n/2)

Therefore, function f is surjective.

Furthermore, function f maps different elements of N to different elements of Neven:

∀m, n ∈ N : (m ≠ n) ⇒ (2m ≠ 2n)

Therefore, function f is injective.

Since f is surjective and injective, f is bijective. □

0   1   2   3   4   5    6    7   · · ·
↕   ↕   ↕   ↕   ↕   ↕    ↕    ↕
0   2   4   6   8   10   12   14  · · ·

Theorems 7 and 8 suggest that, contrary to the intuition, a “part” (i.e. a proper subset) of an infinite set can be of the same “size” as the whole. In fact, it can be proved that every subset of a countable set is either finite or countable; in other words, the cardinality of N is the “smallest” among infinite cardinalities. As a consequence, for any equivalence relation on a countable set, the quotient set (i.e. the set of all equivalence classes) is either finite or countable. This can be shown by selecting an arbitrary representative from every equivalence class. The function that maps every equivalence class to its representative is a bijection between the quotient set and the set of chosen representatives (why?), therefore the quotient set is equinumerous with a subset of the initial set. Since the initial set is countable, its quotient set must be finite or countable.

It turns out that not only subsets, but also certain supersets of N may be countable.

Theorem 9. Set Z is countable.

Proof. Consider function f : N → Z, which counts negative integers by even naturals, and positive integers by odd naturals:

∀n ∈ N : f(n) =
    (n + 1)/2    if n is odd
    −n/2         if n is even

Function f is bijective (proof left as an exercise). □

· · ·  −4  −3  −2  −1   0   1   2   3   4  · · ·
        ↕   ↕   ↕   ↕   ↕   ↕   ↕   ↕   ↕
· · ·   8   6   4   2   0   1   3   5   7  · · ·
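The counting function from the proof of Theorem 9 is easy to compute. The sketch below also gives a candidate inverse, `f_inv`, which is our own addition and not part of the notes; the round-trip check covers only a finite range and does not replace the proof that is left as an exercise.

```python
def f(n: int) -> int:
    """The counting function N → Z from the proof of Theorem 9."""
    return (n + 1) // 2 if n % 2 == 1 else -(n // 2)


def f_inv(z: int) -> int:
    """Candidate inverse Z → N: positive z to odd naturals, the rest to even ones."""
    return 2 * z - 1 if z > 0 else -2 * z


print([f(n) for n in range(9)])  # [0, 1, -1, 2, -2, 3, -3, 4, -4]

# Round trips on a finite range: f_inv undoes f, and f undoes f_inv.
print(all(f_inv(f(n)) == n for n in range(1000)))        # True
print(all(f(f_inv(z)) == z for z in range(-500, 500)))   # True
```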

Perhaps taking a Cartesian square or a higher Cartesian power of a countable set will produce a “bigger” set? It turns out that the answer is no.

Theorem 10. Set Z² is countable.

Proof. We only give the main idea of the proof. Since Z ≅ N by Theorem 9, it is enough to show that N² is countable: a bijection between N and N² can then be combined with the bijection between N and Z, applied to each component of a pair, to give a bijection between N and Z². The set N² can be represented as an infinite two-dimensional table, where the entry in row i and column j corresponds to the pair (i, j), i, j ∈ N. The entries in such a table can be counted by diagonals:

      0    1    2    3    4
 0    0    1    3    6   10
 1    2    4    7   11    ·
 2    5    8   12    ·    ·
 3    9   13    ·    ·    ·
 4   14    ·    ·    ·    ·

This method gives us a bijection between N and N²; with a little extra effort, the formula for this bijection can be given explicitly (left as an advanced exercise). □
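Readers who would like to check their answer to the advanced exercise may compare it with the following sketch. It is one possible explicit formula, chosen by us to reproduce the table above (row i, column j); the names `pair` and `unpair` are our own.

```python
from math import isqrt


def pair(i: int, j: int) -> int:
    """Position of the entry in row i, column j when counting by diagonals."""
    d = i + j                        # index of the diagonal containing (i, j)
    return d * (d + 1) // 2 + i      # entries on earlier diagonals, plus offset


def unpair(n: int) -> tuple:
    """Inverse of `pair`: recover (i, j) from the position n."""
    d = (isqrt(8 * n + 1) - 1) // 2  # largest d with d(d+1)/2 <= n
    i = n - d * (d + 1) // 2
    return i, d - i


for i in range(3):
    print([pair(i, j) for j in range(5)])
# [0, 1, 3, 6, 10]
# [2, 4, 7, 11, 16]
# [5, 8, 12, 17, 23]

print(all(pair(*unpair(n)) == n for n in range(10000)))  # True
```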


The above theorem implies that any finite Cartesian power of a countable set is countable. For instance,

N³ = (N × N) × N ≅ N × N ≅ N

In our quest for uncountable infinity, we may be tempted to extend the set of natural numbers so that, roughly speaking, we would have an infinity of numbers “everywhere”. More precisely, we may want to consider the set Q of rational numbers, defined as fractions m/n, where m, n ∈ Z, n ≠ 0. Two fractions a/b and c/d are considered equal, i.e. representing the same rational number, if a · d = b · c. Therefore, we have an equivalence relation on the set of all integer pairs with a non-zero second component, which we denote by F = Z × (Z \ {0}) ⊆ Z²:

R∼ : F ↔ F    (a, b) ∼ (c, d) ⇐⇒ a · d = b · c

Every rational number is defined as an equivalence class of this relation. The whole set of rational numbers is the quotient set Q = F/R∼. (The restriction to non-zero second components is necessary: on the whole of Z², the pair (0, 0) would be related to every other pair, and the relation would not be transitive.)

In contrast with sets N and Z, the set of rational numbers Q is dense: between any two distinct rational numbers, no matter how close, there is another rational number. In fact, in every segment between two rational numbers, no matter how tiny, there is an infinite number of other rational numbers. Intuitively, it feels as if there must be many more rational numbers than integers, in order to “fill up all those segments”. However, we already know that the set Q must be countable, since it is defined as a quotient set of a subset of the countable set Z².

Do uncountable sets exist at all? The answer to this question is given by Cantor’s theorem: no set can be equinumerous with its own powerset.

Theorem 11. For all sets A, A ≇ P(A).

Proof. The proof method is called Cantor’s diagonal argument, and is reminiscent of Russell’s paradox.

To prove the statement by contradiction, suppose that for some set A, there exists a bijective function f : A ⤖ P(A), which puts elements of A in one-to-one correspondence with subsets of A. Consider the set of all elements of A that are not in their corresponding subsets: D = {a ∈ A | a ∉ f(a)}. Since D is a subset of A, it must, like all other subsets, have a corresponding element d, such that f(d) = D.

Consider the statement d ∈ D. Suppose this statement is true. Then d is an element of the set D of all elements that are not in their corresponding subsets. But the corresponding subset of d is set D itself, therefore, by the definition of D, we have d ∉ D. Hence, the statement d ∈ D cannot be true.

Suppose the statement d ∈ D is false. Then d is not an element of the corresponding set D. We have a special set for such elements, which happens to be D itself! Therefore, by the definition of D, we have d ∈ D. Hence, the statement d ∈ D cannot be false.

By the laws of logic, d ∈ D must be true or false. As we have shown above, both cases lead to a contradiction. Therefore, our initial assumption must be false, and the bijective function f cannot exist. □
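The diagonal construction can be tried out on a small finite set, where all functions f : A → P(A) can be enumerated. For a finite set the conclusion is of course obvious by counting; the purpose of this sketch (with our own names `diagonal_set`, `powerset`) is only to show that the particular set D from the proof is always missed by f.

```python
from itertools import combinations, product


def diagonal_set(f: dict) -> frozenset:
    """D = {a ∈ A | a ∉ f(a)} for a function f : A → P(A) given as a dictionary."""
    return frozenset(a for a in f if a not in f[a])


A = [0, 1, 2]
powerset = [frozenset(c) for r in range(len(A) + 1) for c in combinations(A, r)]

# Enumerate all 8 ** 3 = 512 functions A → P(A) as tuples of images.
always_missed = all(
    diagonal_set(dict(zip(A, images))) not in images
    for images in product(powerset, repeat=len(A))
)
print(always_missed)  # True: D is never of the form f(d)

example = {0: frozenset({0, 1}), 1: frozenset(), 2: frozenset({0})}
print(sorted(diagonal_set(example)))  # [1, 2]
```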

The above theorem implies that the set P(N) is uncountable. Since the powerset of any set A is equinumerous with the set of all Boolean functions A → B, the set of functions from N to B = {F, T} is also uncountable. By replacing F with 0 and T with 1, we can obtain a simple bijection from the latter set to the set of all functions N → {0, 1}. This set, in its turn, is a subset of the set of all functions N → N, which can be regarded as the set of all infinite sequences of natural numbers, or as the infinite Cartesian product N × N × . . . . Therefore, unlike finite Cartesian products, an infinite Cartesian product of countable sets need not be countable.

The fact that the set P(N) is uncountable helps us in analysing the cardinality of yet another numerical set. Consider extending the set of rational numbers Q by “filling in the gaps”. What we obtain is the set of real numbers, which, in addition to rational numbers, contains such numbers as √2 = 1.414213 . . . and π = 3.141592 . . . . To formalise properly the idea of a real number as a “gap” between rationals, we notice that every real number can be approximated by rationals both from below and from above. For example, π is approximated from below by rationals

3, 31/10, 314/100, 3141/1000, . . .

and from above by

4, 32/10, 315/100, 3142/1000, . . .

Thus, a real number splits the set of all rationals into two subsets, “below” and “above”. More precisely, a real number is defined as a partitioning Q = Q1 ∪ Q2, such that for all x ∈ Q1, y ∈ Q2, we have x < y. These partitionings are traditionally called Dedekind cuts of Q. Taken together, all possible Dedekind cuts form the set of real numbers R.

Since the real numbers are nothing else than “gaps” between rationals, one might expect that there cannot be more “gaps” than rationals themselves. Here the intuition fails us once again: unlike Q, the set R is uncountable. Consider the set of real numbers between 0 and 1. Every such number can be represented in the decimal (or binary, or any other positional) system, which is just another form of approximation by rationals. For example, the number π − 3 = 0.141592 . . . corresponds to the following infinite sequence of decimal digits:

(1, 4, 1, 5, 9, 2, . . . )

The set of all such sequences includes as a subset the set of all sequences composed of numbers 0 and 1. We already know that this set is equinumerous with the set of all functions N → B, which is uncountable. Therefore the whole set R is also uncountable.

In order to obtain larger infinite cardinalities, we can go beyond P(N) by applying Cantor’s theorem several times. The sets N, P(N), P(P(N)), P(P(P(N))), . . . all have different cardinalities. This sequence of cardinalities is only a beginning of an enormous tower of infinite cardinalities. In fact, there are so many of them, they do not even form a set. However, in this course, and in real life, we rarely need any sets bigger than P(N).



6 Induction

Let us take another look at the set of natural numbers:

N = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, . . . }

We have accepted it as a basic concept, and have therefore used it without definition. However, we have not said much about the axioms that describe the basic properties of the set N. Here are these axioms in a simplified form:

• 0 is a natural number;

• if n is a natural number, then the next number next(n) is also a natural number;

• every natural number can be obtained by applying the above axioms a finite number of times.

These three axioms can be used to derive all known properties of natural numbers. They describe exactly what a natural number is, by giving the “first” natural number, and a method of obtaining new natural numbers from old ones. This way of describing a mathematical object is usually called an inductive definition. It is important to note that it is not a definition in our original sense of the word: it does not reduce a concept (in this case, that of a natural number) to other, more basic concepts. Instead, it reduces instances of a concept (for example, number 5) to other, more basic instances (5 = next(4)).

Notice that our inductive definition of N is self-referential: it defines a natural number by referring to the concept of natural number itself. Such self-reference would not be allowed in a standard definition. Since, technically, an inductive definition is a collection of axioms, self-reference is allowed, but care has to be taken to prevent such self-reference from going in a vicious circle (“a natural number is a natural number”). In fact, the first axiom of natural numbers is not self-referential: it provides the base of induction, number 0. The second axiom provides the inductive step, the rule for constructing new natural numbers from objects already known to be natural numbers. The base and the inductive step capture everything that is a natural number; however, they still do not allow us to decide whether any other object is a natural number. Therefore, we need the final axiom, which provides the completeness statement. The concept of a natural number is now settled: we know that 0 ∈ N by the first axiom;

3 = next(2) = next(next(1)) = next(next(next(0))) ∈ N

by the second axiom; The Moon ∉ N by the final axiom.

In general, every inductive definition will follow the above pattern:

• induction base, giving one or more initial elements of the set being defined;

• inductive step, giving one or more rules for obtaining new elements of the set from old ones;

• completeness statement.

For example, we can define a queue as follows:

• the empty queue (no people queuing) is a queue;

• if we take an existing queue, and put another person behind the last person in the queue, the result is a queue;

• every queue can be obtained by applying the above rules a finite number of times.

Provided that “putting the person behind” is precisely defined, we have an unambiguous inductive definition of a queue.

Perhaps a more useful example is the inductive definition of a Boolean statement:

• F , T are Boolean statements;

• if A, B are Boolean statements, then ¬A, A ∧ B, A ∨ B, A ⇒ B, A ⇔ B are Boolean statements;

• every Boolean statement can be obtained by applying the above rules a finite number of times.

Here, there are two initial Boolean statements in the base of induction, and the inductive step gives us five rules for constructing new statements from old ones.
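An inductive definition translates naturally into a recursive data type, and a function on that type then follows the same base/step structure. The sketch below is one possible encoding of (variable-free) Boolean statements; the class names `Const`, `Not`, `BinOp` and the function `evaluate` are our own and are not used elsewhere in the notes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:            # induction base: the statements F and T
    value: bool


@dataclass(frozen=True)
class Not:              # inductive step: ¬A
    arg: "Statement"


@dataclass(frozen=True)
class BinOp:            # inductive step: A ∧ B, A ∨ B, A ⇒ B, A ⇔ B
    op: str             # one of "and", "or", "implies", "iff"
    left: "Statement"
    right: "Statement"


Statement = Union[Const, Not, BinOp]


def evaluate(s: Statement) -> bool:
    """Truth value of a statement, computed by recursion on its structure."""
    if isinstance(s, Const):
        return s.value
    if isinstance(s, Not):
        return not evaluate(s.arg)
    l, r = evaluate(s.left), evaluate(s.right)
    return {"and": l and r, "or": l or r, "implies": (not l) or r, "iff": l == r}[s.op]


F, T = Const(False), Const(True)
s = BinOp("or", BinOp("implies", T, F), Not(F))   # (T ⇒ F) ∨ ¬F
print(evaluate(s))  # True
```

The completeness statement corresponds to the fact that values of this type can only be built by finitely many applications of the three constructors, so the recursion in `evaluate` always terminates.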

Let us return to the general situation where we are given an inductive definition for a set S. Suppose we need to prove that all elements of S share some common property, given by the predicate P : ∀x ∈ S : P (x). The proof has to follow the structure of the inductive definition:

• induction base: for each of the initial elements in S, predicate P is true;

• inductive step: if element x ∈ S is obtained from some “simpler” elements, and P is true for each of these simpler elements, then P is true for x;

• by the completeness statement, P must be true for all elements of S.

The inductive step is an implication: assuming that P is true for elements from which x is obtained, we must prove that it logically follows that P (x) must be also true. The assumption that we make in the inductive step is called the inductive hypothesis. Superficially, it may look as if “we are assuming what we are supposed to prove”, but in fact, the inductive step reduces the proof of P (x) to proving P for elements “simpler” than x. Ultimately, the whole proof hinges on proving the induction base. Thus, both the induction base and the inductive step are essential parts of the inductive proof; the proof is not valid without one or the other.

The structure of the inductive proof, applied to the specific case of natural numbers, is as follows:

• induction base: prove P (0);

• inductive step: prove that for all n ∈ N, P (n) ⇒ P (next(n));

• therefore, ∀n ∈ N : P (n).

In this case, the inductive hypothesis is simply the statement P (n).

As an example, consider the following proof.

Theorem 12. Any amount of postage beginning from 8p can be paid by postage stamps of value 3p and 5p.

Proof. Let us denote the amount of postage by n ∈ N, n ≥ 8.

Induction base: n = 8 = 3 + 5.

Inductive step. Suppose postage n is paid by 3p and 5p stamps. There may be two cases:

• there is at least one 5p stamp. Replace it by two 3p stamps. The amount has increased by −5 + (3 + 3) = 1p.

• there are only 3p stamps. Since n ≥ 8, there are at least three of them. Replace three 3p stamps by two 5p stamps. The amount has increased by −(3 + 3 + 3) + (5 + 5) = 1p.

In both cases, postage n + 1 has been paid by 3p and 5p stamps as required. By induction, the statement is true for any postage n ≥ 8. □
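The proof of Theorem 12 is constructive: it tells us how to turn a way of paying n into a way of paying n + 1. The sketch below follows the proof literally, starting from the induction base and applying the inductive step n − 8 times. The function name `stamps` and the representation of a payment as a pair (number of 3p stamps, number of 5p stamps) are our own choices.

```python
def stamps(n: int) -> tuple:
    """Return (threes, fives) with 3 * threes + 5 * fives == n, for n >= 8."""
    if n < 8:
        raise ValueError("the theorem only covers postage of 8p or more")
    threes, fives = 1, 1              # induction base: 8 = 3 + 5
    for _ in range(8, n):             # apply the inductive step n - 8 times
        if fives >= 1:
            fives -= 1                # replace one 5p stamp ...
            threes += 2               # ... by two 3p stamps
        else:
            threes -= 3               # replace three 3p stamps ...
            fives += 2                # ... by two 5p stamps
    return threes, fives


for n in range(8, 14):
    print(n, stamps(n))
# 8 (1, 1)
# 9 (3, 0)
# 10 (0, 2)
# 11 (2, 1)
# 12 (4, 0)
# 13 (1, 2)
```

In the second case of the loop, the number of 3p stamps never becomes negative, for the same reason as in the proof: a postage of at least 8p paid by 3p stamps alone uses at least three of them.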

The above example may not strike one as a useful mathematical fact. However, many much more useful properties of natural numbers and other inductively defined sets can be proved by the induction principle. As an example, consider the following theorem, which we give without proof.

Theorem 13. Every Boolean function Bⁿ → B, n ∈ N, can be expressed by a Boolean statement with n free variables, using only operators ¬, ∧, ∨.

Proof. Induction. □

This theorem justifies our choice of Boolean operators: using just three of them, we can express every possible Boolean function of n variables. (Exercise: what statement should be the base of induction in the omitted proof?)

In the following chapter, we shall see more examples of induction.


7 Graphs

7.1 Motivating examples

Graphs were invented by Leonhard Euler (1707–1783). Figure 6 shows themap of his native town of Konigsberg (now Kaliningrad), consisting of fourislands connected by seven bridges. The question is: can one, starting fromany point on the map, make a tour of the town, crossing every bridge exactlyonce and returning to the original point.

Figure 7 shows a graph representing this puzzle. The black nodes correspond to islands, the white nodes to bridges. A black node is connected to a white node by an edge, if the corresponding island and bridge are adjacent. The puzzle now becomes: can one, starting from any node, make a tour of the graph, visiting every edge exactly once and returning to the original node? The puzzle is greatly simplified by the graph representation. In particular, the exact location of nodes and the shape of edges do not matter; the edges may even cross, as long as we do not count the crossing point as a new node.

It is hardly surprising that we can represent the above problem of a geometric nature by a graph. However, graphs are applicable to a much larger class of problems. Consider the following one, which has very little geometry in it.

A farmer, who has in his possession a wolf, a goat and a cabbage, wants to cross a river in a boat. The boat is only big enough for the farmer himself, plus one other item. The wolf cannot be left alone with the goat, or the goat alone with the cabbage. Is it possible to get to the other side of the river satisfying all these restrictions?

Figure 8 shows a graph representing this puzzle. The nodes correspond to different states of the game, the edges to transitions between states. It is clear that the puzzle has two distinct solutions, which correspond to paths from left to right in the picture. As before, the location of nodes and the shape or intersections of edges do not matter.

Figure 6: The Konigsberg bridges

Figure 7: The Konigsberg graph

Figure 8: The wolf/goat/cabbage graph

As a third example, consider the following puzzle. There are three houses and three wells. The owner of each of the houses wants to build a path to each of the wells. The paths must not cross.

Figure 9 shows a graph representing the puzzle. In contrast with the previous examples, the layout of nodes and edges does matter here. The layout shown in Figure 9 is not a solution, since the edge from H3 to W1 intersects with other edges.

Figure 9: The houses and wells graph

Figure 10: The complete graph on five nodes

7.2 Graphs as relations

From our first two motivating examples, it is clear that in our definition of a graph, it should only matter which nodes are connected by edges; the layout and shape of nodes and edges are irrelevant. Therefore, graphs for us are a special type of relations.

Let Rp : A ↔ A. We say that relation Rp is

• irreflexive, if no element is related to itself: ∀a ∈ A : ¬(a p a)

• symmetric, if every two elements are related in both possible orders, as long as they are related at all: ∀a, b ∈ A : a p b ⇒ b p a

Let V be any finite set. We call elements of V nodes. An irreflexive, symmetric relation E = R⇀ : V ↔ V is called a graph on V. The pairs of nodes that are elements of relation E are called edges. Two nodes that are connected by an edge are called adjacent. A graph with set of nodes V and set of edges E is usually denoted G = (V, E).

A special case of a graph on the set of nodes V is the empty graph, which has no edges: (V, ∅). The other extreme is the complete graph, which contains all possible edges: K(V) = (V, E), where E = {(u, v) ∈ V² | u ≠ v}. Figure 10 shows the complete graph K(N5).
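The definitions above translate almost literally into code. The following sketch assumes a graph is stored as a set of nodes together with a set of ordered pairs; the function names is_graph and complete_graph are chosen here for illustration only.

```python
def is_graph(nodes, edges):
    """Check that `edges` is an irreflexive, symmetric relation on `nodes`."""
    on_nodes = all(u in nodes and v in nodes for (u, v) in edges)
    irreflexive = all(u != v for (u, v) in edges)
    symmetric = all((v, u) in edges for (u, v) in edges)
    return on_nodes and irreflexive and symmetric


def complete_graph(nodes):
    """K(V): every ordered pair of distinct nodes is in the relation."""
    return {(u, v) for u in nodes for v in nodes if u != v}


V = set(range(5))          # the node set N5 = {0, 1, 2, 3, 4}
E = complete_graph(V)
print(is_graph(V, E))      # True
print(is_graph(V, set()))  # True: the empty graph (V, ∅)
```

Note that in this representation every edge appears twice in the relation, once in each order, exactly as required by symmetry.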

When studying the structure of a graph, we usually want to identify graphs which are “the same up to a renaming of nodes”. This informal idea is captured by the following definition. Graphs G1 = (V1, E1) and G2 = (V2, E2) are called isomorphic, if there is a bijective function f : V1 → V2 which preserves the edges:

∀u, v ∈ V1 : (u, v) ∈ E1 ⇔ (f(u), f(v)) ∈ E2

Bijective function f is called the isomorphism between G1 and G2. Figure 11 shows three isomorphic graphs with different layouts.
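For very small graphs, the definition of isomorphism can be checked directly by trying every bijection. The sketch below does exactly that; the names symmetric and find_isomorphism are hypothetical, and the approach is a brute-force illustration rather than a practical algorithm, since the number of bijections grows as n!.

```python
from itertools import permutations


def symmetric(pairs):
    """Close a collection of pairs under swapping, so each edge appears in both orders."""
    return set(pairs) | {(v, u) for (u, v) in pairs}


def find_isomorphism(V1, E1, V2, E2):
    """Return a dict f : V1 -> V2 preserving edges in both directions, or None."""
    V1, V2 = list(V1), list(V2)
    if len(V1) != len(V2) or len(E1) != len(E2):
        return None                       # no bijection can preserve the edges
    for image in permutations(V2):        # every bijection V1 -> V2
        f = dict(zip(V1, image))
        if all(((u, v) in E1) == ((f[u], f[v]) in E2)
               for u in V1 for v in V1):
            return f
    return None


E1 = symmetric([(0, 1), (1, 2)])
E2 = symmetric([("a", "c"), ("c", "b")])
print(find_isomorphism({0, 1, 2}, E1, {"a", "b", "c"}, E2))
```

In this example the middle node 1 must be mapped to the middle node "c"; the two end nodes can be mapped either way, so the isomorphism is not unique.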

The notion of isomorphism can be useful when the exact set of nodes in the graph is irrelevant. For example, for every n ∈ N, there is, up to isomorphism, just one complete graph on n nodes. We will denote this graph by K(n). This can be read as “any graph isomorphic to K(Nn)”.

A graph G = (V, E) is called bipartite (or two-coloured), if the set of nodes can be partitioned into two disjoint subsets V = V1 ∪ V2, such that every edge in E connects two nodes from different subsets. The subsets V1, V2 are called colour classes. From Figures 7, 8, 9 it is clear that the three graphs introduced in the previous subsection are bipartite, with the colour classes indicated by black and white colouring of the nodes. The complete graph K(5) in Figure 10 is not bipartite.

Figure 11: Isomorphic graphs

Figure 12: The complete bipartite graph on two sets of three nodes

The bipartite graph that contains all possible edges between its colour classes is called a complete bipartite graph: K(V1, V2) = (V1 ∪ V2, (V1 × V2) ∪ (V2 × V1)). Figure 12 shows a “straightened” picture of the “houses and wells” graph, which is the complete bipartite graph K(H, W) on the sets of houses H = {H1, H2, H3} and wells W = {W1, W2, W3}. When the exact set of nodes is irrelevant, we will denote the complete bipartite graph by K(m, n), where m, n ∈ N are the sizes of the colour classes. This can be read as “any graph isomorphic to K(H, W) with m houses and n wells”.

The definition of bipartite graphs can be generalised for any fixed number of colour classes. A graph with k colour classes is called k-partite. The k-colourability problem consists in determining whether a given graph is k-partite, for a fixed value of k. The 2-colourability problem can be solved efficiently; however, nobody knows an efficient algorithm for 3-colourability. In fact, deciding if such an algorithm exists amounts to a solution of the famous “P versus NP” problem. A correct solution can bring the author, apart from worldwide fame, a $1 000 000 prize from the Clay Mathematics Institute. See www.claymath.org for details.
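One standard way to solve 2-colourability efficiently is to colour the graph greedily, giving every newly reached node the colour opposite to that of the neighbour it was reached from. The sketch below uses a breadth-first traversal for this; the function name two_colouring is hypothetical, and the graph is assumed to be given as a set of nodes and a symmetric set of ordered pairs.

```python
from collections import deque


def two_colouring(nodes, edges):
    """Return a dict node -> 0 or 1 that 2-colours the graph, or None if impossible."""
    neighbours = {v: set() for v in nodes}
    for (u, v) in edges:
        neighbours[u].add(v)
    colour = {}
    for start in nodes:                    # handle each connected piece in turn
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]   # forced choice
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return None                 # an edge inside one colour class
    return colour


H = {"H1", "H2", "H3"}
W = {"W1", "W2", "W3"}
houses_and_wells = {(h, w) for h in H for w in W} | {(w, h) for h in H for w in W}
print(two_colouring(H | W, houses_and_wells) is not None)   # True: K(3, 3) is bipartite
```

Every colour choice after the first one in each piece is forced, which is why no backtracking is needed for k = 2. For k = 3 the choices are no longer forced, and this simple approach breaks down.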


7.3 Graph connectivity

In real-life problems represented by graphs, a graph edge often corresponds to a “move” from one node to another. Edges are playing such a role in our Konigsberg Bridges and wolf/goat/cabbage examples. A logical development of this idea is to consider a sequence of “moves”, which visits several nodes in turn. The sequence may or may not be required to return to the starting node, and repeated visits to the same nodes or edges may or may not be allowed.

Let G = (V, E) be a graph. We begin by defining an unrestricted sequence of “moves”, which we call a walk. Formally, a walk is a sequence of nodes (u, u1, . . . , uk−1, v), such that every two consecutive nodes in the sequence are connected by an edge:

(u ⇀ u1) ∧ (u1 ⇀ u2) ∧ · · · ∧ (uk−1 ⇀ v)

The statement “nodes u and v are connected by a walk” will be denoted by u # v. Sometimes we will also write u # v to denote a particular walk from u to v. The above walk u # v can be written in a compact form:

u = u0 ⇀ u1 ⇀ u2 ⇀ . . . ⇀ uk−1 ⇀ uk = v

A walk that returns to its starting node is called a tour. In Figure 13, the following are examples of a walk and a tour:

0 ⇀ 3 ⇀ 1 ⇀ 4 ⇀ 6 ⇀ 3 ⇀ 0 ⇀ 2 ⇀ 5

0 ⇀ 3 ⇀ 1 ⇀ 4 ⇀ 6 ⇀ 3 ⇀ 5 ⇀ 2 ⇀ 0

Nodes u, v in a graph are connected, if there is a walk u # v. A graph is called connected, if every two of its nodes are connected.

Figure 13: An example graph

We can regard node connectivity as a relation on the set of all nodes in a graph: R# : V ↔ V. It is easy to check that this relation is

• reflexive: ∀u ∈ V : u # u;

• symmetric: ∀u, v ∈ V : (u # v) ⇒ (v # u);

• transitive: ∀u, v, w ∈ V : (u # v) ∧ (v # w) ⇒ (u # w).

Therefore, R# is an equivalence relation on V. The equivalence classes of R# are called connected components of the graph G. A graph is connected, if and only if it has one connected component.
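The connected components can be computed by repeatedly picking a node that has not been reached yet and collecting everything connected to it by a walk. The following sketch does this with a simple stack-based traversal; the function name connected_components is hypothetical, and edges are assumed to be a symmetric set of ordered pairs.

```python
def connected_components(nodes, edges):
    """Return the equivalence classes of the connectivity relation as a list of sets."""
    neighbours = {v: set() for v in nodes}
    for (u, v) in edges:
        neighbours[u].add(v)
    components, seen = [], set()
    for start in nodes:
        if start in seen:
            continue                       # already in an earlier class
        component, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in neighbours[u]:
                if v not in component:
                    component.add(v)
                    stack.append(v)
        seen |= component
        components.append(component)
    return components


pairs = [(0, 1), (1, 2), (3, 4)]
E = set(pairs) | {(v, u) for (u, v) in pairs}
print(connected_components({0, 1, 2, 3, 4}, E))   # two components: {0, 1, 2} and {3, 4}
```

A graph is then connected exactly when the returned list has one element.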

We can restrict the notion of a walk by forbidding repeated visits to the same node. A walk where all nodes (and, therefore, all edges) are distinct is called a path. The statement “nodes u and v are connected by a path” will be denoted by u ⇝ v. Sometimes we will also write u ⇝ v to denote a particular path from u to v. The path u ⇝ v can be written as:

u = u0 ⇀ u1 ⇀ u2 ⇀ . . . ⇀ uk−1 ⇀ uk = v

∀i, j ∈ Nk+1 : i ≠ j ⇒ ui ≠ uj

A tour with at least three nodes, where all nodes except the starting and the final are distinct, is called a cycle. A cycle can be viewed as a path, followed by an edge connecting the end and the beginning of the path: u ⇝ v ⇀ u. A graph without any cycles is called acyclic.

In Figure 13, the following are examples of a path and a cycle:

0 ⇀ 2 ⇀ 7 ⇀ 10 ⇀ 8 ⇀ 3 ⇀ 5

3 ⇀ 8 ⇀ 10 ⇀ 9 ⇀ 4 ⇀ 6 ⇀ 3

We now have another relation on the set of all nodes in a graph: R⇝ : V ↔ V. It is easy to check that this relation is reflexive and symmetric. However, the transitivity is not so obvious: a path u ⇝ v followed by a path v ⇝ w is not necessarily a path from u to w, since some nodes visited before v may be re-visited after v. It turns out that R⇝ is still transitive. In fact, we can prove an even stronger result: R⇝ is exactly the same relation as R#.

Theorem 14. Consider a graph G = (V, E). For all u, v ∈ V, there is a path u ⇝ v, if and only if there is a walk u # v.

Proof. A path u ⇝ v is also a walk u # v. Therefore, this direction of the implication is trivial.

The opposite direction of the implication is proved by induction on the length of the walk.

Induction base. Consider a walk u # u of length 0. This walk is a sequence (u), which is also a path.

Inductive step. Consider a walk u # v, obtained by adding an edge to a shorter walk: u # w ⇀ v. Since nodes u and w are connected by a walk, by the induction hypothesis they are also connected by a path: u ⇝ w ⇀ v. There are two possible cases:


Figure 14: An example graph

• path u ⇝ w does not visit node v. Then u ⇝ w ⇀ v is itself a path.

• path u ⇝ w visits node v: u ⇝ v ⇝ w ⇀ v. We now have a path u ⇝ v as an initial segment of u ⇝ w.

In both cases, the existence of a walk u # v implies the existence of a path u ⇝ v. □
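The induction in this proof can be read as a procedure that converts any walk into a path with the same end nodes. The sketch below processes the walk one node at a time and applies the two cases of the inductive step; the function name walk_to_path is made up for illustration, and a walk is assumed to be given as a list of nodes.

```python
def walk_to_path(walk):
    """Shorten a walk (list of nodes) to a path with the same first and last node."""
    path = []
    for v in walk:
        if v in path:
            # Case 2: the path so far already visits v; keep its initial segment up to v.
            path = path[:path.index(v) + 1]
        else:
            # Case 1: v is new, so appending it keeps all nodes distinct.
            path.append(v)
    return path


# The example walk given earlier for Figure 13:
print(walk_to_path([0, 3, 1, 4, 6, 3, 0, 2, 5]))   # [0, 2, 5]
```

Every consecutive pair in the result was also consecutive somewhere in the original walk, so the result uses only edges of the graph.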

By the above theorem, the notions of connectivity by walks and by paths coincide: a graph is connected if and only if every two of its nodes are connected by a path.

Let us now recall the Konigsberg Bridges problem. It consists in finding a tour that visits every edge in a graph exactly once. Such a tour is called an Euler tour of the graph.

The graph in Figure 14 has the following Euler tour:

a ⇀ b ⇀ c ⇀ f ⇀ e ⇀ d ⇀ c ⇀ e ⇀ b ⇀ f ⇀ a

It turns out that the Euler tour problem has a simple solution for any graph G = (V, E). The solution is based on the following definition. For any node v ∈ V, its degree is the number of nodes adjacent to it: deg(v) = |{u ∈ V | v ⇀ u}|. For example, in Figure 14:

deg(a) = deg(d) = 2

deg(b) = deg(c) = deg(e) = deg(f) = 4

We are now able to describe a simple test for existence of the Euler tour in a graph.

Theorem 15. Consider a graph G = (V, E). Graph G has an Euler tour, if and only if

• G is connected;

• every node in V has even degree.


Proof. If G has an Euler tour, then it is connected, since the Euler tour contains a walk between any pair of nodes. Consider any node v ∈ V. Suppose node v is visited k times by the Euler tour. On every visit, the tour uses two edges: an incoming and an outgoing edge. Since every edge adjacent to v is used exactly once, the total number of edges adjacent to v must be twice the number of visits: deg(v) = 2k. Therefore, deg(v) is even.

The proof of the opposite implication is done in several steps. First, we build a tour that visits every edge at most once, but may miss some of the edges. We then show that such a tour can be extended to cover all the edges.

Let G = (V, E) be a connected graph, where each node has even degree. Let us fix any starting node u ∈ V. Consider any walk u # v, v ≠ u, that visits every edge at most once. The final node v of this walk may have been visited by the walk several times; on each such visit, the walk uses two edges adjacent to v. However, on the final visit, only one incoming edge is used. Therefore, the number of visited edges adjacent to v is odd. Since the total number of adjacent edges deg(v) is even, there is at least one unvisited edge adjacent to v. Let us add this edge to the walk: u # v ⇀ w. If w ≠ u, we can repeat the previous step, extending the walk u # w by more edges. Since the graph is finite and no edge is used twice, this process must stop, and it can only stop when the walk returns to node u.

At this point, we have a tour u # u that visits every edge at most once, but may not visit some of the edges at all. Suppose there are some unvisited edges. We now recall that graph G is connected. Consider all nodes in our tour u # u. If none of these nodes had an adjacent unvisited edge, then no walk starting in the tour could ever reach an unvisited edge, and hence the graph would not be connected. Therefore, some node s in the tour u # s # u has an adjacent unvisited edge s ⇀ t.

Let us now make s the initial node of our tour: s # u # s. The tour still visits every edge at most once. Let us extend the tour by visiting the previously unvisited edge s ⇀ t: s # u # s ⇀ t. As before, the final node t has an odd number of adjacent visited edges, but the total number of adjacent edges deg(t) is even. Therefore, there is an unvisited edge adjacent to t, so we can extend the walk by another edge. As before, we can repeat this process until the walk returns to node s. If there still are any unvisited edges in the graph, we can repeat the whole process once again. Eventually, the walk will return to the starting node, having visited all edges in the graph. We have constructed an Euler tour of the graph G. □

Even though the above proof is longer than our previous proofs, it is less formal: we use such phrases as “repeat the whole process” until “eventually” it yields an Euler tour. This proof can be completely formalised using induction.

To illustrate the tour-building procedure outlined in the proof, consider the graph in Figure 14. Let us take a as the starting node, and begin the walk by moving along the edge a ⇀ b. Node b has now one adjacent visited edge; since deg(b) is even, it is also guaranteed to have at least one adjacent unvisited edge. In fact, it has three such edges; let us take the edge b ⇀ c. We now have the walk a ⇀ b ⇀ c. Node c, in its turn, is guaranteed to have at least one adjacent unvisited edge, so we can keep extending the walk. Eventually we will return to node a. Suppose at this point our walk is the tour a ⇀ b ⇀ c ⇀ f ⇀ a.

Since the graph is connected, and not all edges have been visited, at least one node in the current tour must have an adjacent unvisited edge. For instance, let us take node b with the unvisited edge b ⇀ f. We can now make b the starting node in our existing tour, and extend the tour by a new edge, making it into a walk:

b ⇀ c ⇀ f ⇀ a ⇀ b ⇀ f

We can keep extending the walk by more edges, until eventually we return to node b:

b ⇀ c ⇀ f ⇀ a ⇀ b ⇀ f ⇀ e ⇀ d ⇀ c ⇀ e ⇀ b

At this point, all edges have been visited, so our current tour is an Euler tour.
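The tour-building procedure can be sketched in code as follows. This is only an illustration under stated assumptions: the graph is connected, has at least one edge, every degree is even, and node names can be compared (the smallest unvisited neighbour is chosen at each step, an arbitrary rule added here to make the result deterministic). The function name euler_tour and the edge list for Figure 14, read off from the Euler tour given in the text, are not part of the original notes.

```python
def euler_tour(nodes, edges, start):
    """Build an Euler tour as a list of nodes whose first and last entries coincide."""
    unvisited = {v: set() for v in nodes}
    for (u, v) in edges:
        unvisited[u].add(v)

    def extend(walk):
        # Keep adding unvisited edges until the walk returns to its first node.
        while True:
            u = walk[-1]
            w = min(unvisited[u])          # exists by the parity argument in the proof
            unvisited[u].remove(w)
            unvisited[w].remove(u)
            walk.append(w)
            if w == walk[0]:
                return walk

    tour = extend([start])
    while True:
        # Look for a node s in the tour that still has an unvisited edge.
        i = next((i for i, s in enumerate(tour) if unvisited[s]), None)
        if i is None:
            return tour                    # every edge has been visited
        # Make s the initial node of the tour, then extend it again.
        tour = extend(tour[i:] + tour[1:i + 1])


pairs = [("a", "b"), ("b", "c"), ("c", "f"), ("f", "e"), ("e", "d"),
         ("d", "c"), ("c", "e"), ("e", "b"), ("b", "f"), ("f", "a")]
figure14 = set(pairs) | {(v, u) for (u, v) in pairs}
print(" ⇀ ".join(euler_tour(set("abcdef"), figure14, "a")))
```

Because the tour is rotated whenever it is extended, the result may begin at a node other than start. Different choices of the next edge give different Euler tours, so the printed tour need not coincide with the one found by hand above.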

The power of Theorem 15 is in replacing a complex global condition (existence of an Euler tour) by a much simpler global condition (connectivity), plus a number of very simple local conditions (node degrees). By Theorem 15, the original Konigsberg graph in Figure 7 has no Euler tour, since some nodes (in fact, all nodes representing islands) have odd degree.
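The test of Theorem 15 is easy to program. In the sketch below, the function name has_euler_tour and the labels A, B, C, D (islands) and b1, …, b7 (bridges) are hypothetical; the bridge layout used is the standard one for this puzzle, with island A touching five bridges and each of the others touching three.

```python
def has_euler_tour(nodes, edges):
    """Theorem 15: connected, and every node has even degree."""
    neighbours = {v: set() for v in nodes}
    for (u, v) in edges:
        neighbours[u].add(v)
    # Local conditions: all degrees are even.
    if any(len(neighbours[v]) % 2 == 1 for v in nodes):
        return False
    # Global condition: everything is reachable from one (arbitrary) node.
    start = next(iter(nodes))
    reached, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in neighbours[u]:
            if v not in reached:
                reached.add(v)
                stack.append(v)
    return reached == set(nodes)


# Each bridge node is adjacent to the two islands it joins.
bridges = {"b1": ("A", "B"), "b2": ("A", "B"), "b3": ("A", "C"), "b4": ("A", "C"),
           "b5": ("A", "D"), "b6": ("B", "D"), "b7": ("C", "D")}
konigsberg_nodes = {"A", "B", "C", "D"} | set(bridges)
konigsberg_edges = set()
for b, (x, y) in bridges.items():
    konigsberg_edges |= {(b, x), (x, b), (b, y), (y, b)}

print(has_euler_tour(konigsberg_nodes, konigsberg_edges))   # False
```

The degree check alone already rules out a tour here; the connectivity check matters for graphs such as two separate triangles, where all degrees are even but no Euler tour exists.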

Inspired by our success in finding an efficient test for the existence of an Euler tour, we may want to formulate an analogous (and practically more important) problem for cycles. It consists in finding a cycle that visits every node (but not necessarily every edge) in a graph exactly once. Such a cycle is called a Hamiltonian cycle of the graph.

The graph in Figure 14 has the following Hamiltonian cycle:

a ⇀ b ⇀ e ⇀ d ⇀ c ⇀ f ⇀ a

It turns out that the Hamiltonian cycle problem, despite its similarity with the Euler tour problem, is hard. In fact, nobody has managed so far to find an efficient test for existence of a Hamiltonian cycle, or to prove that no such test exists. The status of this problem is very similar to that of the colourability problem introduced in the previous subsection. And, like colourability, the Hamiltonian cycle problem is also worth $1 000 000!
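In the absence of a simple test, the obvious fallback is to try every possible order of the nodes. The sketch below does this; the function name hamiltonian_cycle is hypothetical, and the method is deliberately naive: for n nodes it may examine (n − 1)! orders, so it is usable only for very small graphs.

```python
from itertools import permutations


def hamiltonian_cycle(nodes, edges):
    """Return a Hamiltonian cycle as a list of nodes (first == last), or None."""
    nodes = list(nodes)
    if len(nodes) < 3:
        return None                        # a cycle needs at least three nodes
    first, rest = nodes[0], nodes[1:]
    for order in permutations(rest):       # fix the first node, permute the others
        cycle = [first, *order, first]
        if all((u, v) in edges for u, v in zip(cycle, cycle[1:])):
            return cycle
    return None


pairs = [("a", "b"), ("b", "c"), ("c", "f"), ("f", "e"), ("e", "d"),
         ("d", "c"), ("c", "e"), ("e", "b"), ("b", "f"), ("f", "a")]
figure14 = set(pairs) | {(v, u) for (u, v) in pairs}
print(hamiltonian_cycle("abcdef", figure14))
```

The edge list is the one read off from the Euler tour of Figure 14 given earlier. The cycle found depends on the order in which permutations are tried, so it may differ from the one shown in the text.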

7.4 Trees

We begin this subsection by giving some definitions. Let G = (V, E) and G′ = (V′, E′) be graphs. Graph G′ is called a subgraph of G (G′ ⊆ G), if V′ ⊆ V, E′ ⊆ E (see Figure 15). Graph G′ is called a spanning subgraph of G (G′ ⊑ G), if V′ = V, E′ ⊆ E (see Figure 16). Every spanning subgraph is also a subgraph, but not necessarily vice versa.

Figure 15: A graph and its subgraph

Figure 16: A graph and its spanning subgraph

Let us denote the set of all graphs on node set V by G(V). We can view the notions of subgraphs and spanning subgraphs as relations on G(V): R⊆, R⊑ : G(V) ↔ G(V). It is easy to see that the relations R⊆, R⊑ are reflexive, antisymmetric, and transitive. Therefore, these two relations are partial orders on G(V).

Recall that a graph is called connected, if every two of its nodes are connected, and acyclic, if it has no cycle as a subgraph. A graph is called a tree, if it is both connected and acyclic. Figure 17 shows an example of a tree.

If a graph is acyclic, but not necessarily connected, then each of its connected components is a tree. Because of this, the term forest is often used as a synonym of “acyclic graph”.

Figure 17: A tree


Note that a connected graph stays connected if we add some edges to it. Therefore, a connected graph cannot have “too few” edges. Also note that an acyclic graph stays acyclic if we remove some edges from it. Therefore, an acyclic graph cannot have “too many” edges. A tree, being both connected and acyclic, must therefore have some “middling” number of edges. It turns out that we can specify this number precisely: every tree with a given number of nodes has the same number of edges.

Theorem 16. Let G = (V, E) be a tree. We have |V| = |E| + 1.

Proof. Induction on the number of edges.

Induction base. Consider graph G with one node and no edges. It is connected and acyclic, and therefore a tree. We have |E| = 0, |V| = 1 = |E| + 1.

Inductive step. Let G = (V, E) be any tree with at least one edge u ⇀ v. Let G′ = (V, E \ {(u, v), (v, u)}) be the spanning subgraph of G obtained by removing the edge u ⇀ v.

Consider the connectivity relation R⇝ in the graph G′. A node w ∈ V cannot be connected both to u and to v in G′, otherwise we would have a cycle u ⇝ w ⇝ v ⇀ u in G. However, every node w must be connected either to u or to v in G′, otherwise graph G would not be connected. Therefore, graph G′ has two connected components with node sets Vu = [u] (all nodes connected to u) and Vv = [v] (all nodes connected to v). Let us denote these components by Gu = (Vu, Eu) and Gv = (Vv, Ev).

Both Gu and Gv are connected and acyclic, therefore they are trees, each with fewer edges than G. By the inductive hypothesis, we have

|Vu| = |Eu| + 1,   |Vv| = |Ev| + 1

We also have |V| = |Vu| + |Vv| (all nodes in G are nodes in Gu plus nodes in Gv), and |E| = |Eu| + |Ev| + 1 (all edges in G are edges in Gu plus edges in Gv plus the edge u ⇀ v). Therefore,

|V| = |Vu| + |Vv| = (|Eu| + 1) + (|Ev| + 1) = (|Eu| + |Ev| + 1) + 1 = |E| + 1

□
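The theorem can be checked on small examples. The sketch below tests the definition of a tree (connected and acyclic) with a single traversal and then compares the node and edge counts. The function name is_tree is hypothetical; the edge set is assumed to be a symmetric set of ordered pairs, so the number of edges is half the number of pairs, and the node set is assumed to be non-empty.

```python
def is_tree(nodes, edges):
    """Connected and acyclic, checked by one traversal that remembers each node's parent."""
    neighbours = {v: set() for v in nodes}
    for (u, v) in edges:
        neighbours[u].add(v)
    start = next(iter(nodes))
    parent = {start: None}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in neighbours[u]:
            if v == parent[u]:
                continue                   # the edge we arrived by
            if v in parent:
                return False               # a second route to v closes a cycle
            parent[v] = u
            stack.append(v)
    return len(parent) == len(nodes)       # every node reached, so connected


pairs = [(0, 1), (1, 2), (1, 3), (3, 4)]
V = {0, 1, 2, 3, 4}
E = set(pairs) | {(v, u) for (u, v) in pairs}
print(is_tree(V, E), len(V) == len(E) // 2 + 1)   # True True
```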

The above theorem does not simply give us an edge count for trees; we can draw from it some important conclusions on the structure of a tree. Let us call a node of degree 1 a leaf.

Theorem 17. Every tree with at least one edge has a leaf.

Proof. Let G = (V, E) be a tree with at least one edge. The sum of all node degrees in any graph is twice the number of edges, since every edge contributes to the degree of both its ends. Suppose every node in G has degree at least 2. Then 2 · |E| ≥ 2 · |V|, therefore |E| ≥ |V|. But since G is a tree, by the previous theorem |E| = |V| − 1. This is a contradiction, so our assumption must be false, and G has some nodes of degree less than 2. Since G is connected and has at least one edge, it cannot have any nodes of degree 0. Therefore, there must be at least one node of degree 1. □
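Both facts used in this proof, the degree sum and the existence of a leaf, are easy to observe on a small tree. In the sketch below the function names degrees and leaves are hypothetical, and the edge set is again a symmetric set of ordered pairs, so its size is exactly twice the number of edges.

```python
def degrees(nodes, edges):
    """deg(v) = number of nodes adjacent to v."""
    return {v: sum(1 for (u, _) in edges if u == v) for v in nodes}


def leaves(nodes, edges):
    """All nodes of degree 1."""
    return {v for v, d in degrees(nodes, edges).items() if d == 1}


pairs = [(0, 1), (1, 2), (1, 3), (3, 4)]
V = {0, 1, 2, 3, 4}
E = set(pairs) | {(v, u) for (u, v) in pairs}

print(sum(degrees(V, E).values()))   # 8, twice the number of edges
print(leaves(V, E))                  # {0, 2, 4}
```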

To complete our study of trees, we characterise them in terms of the “spanning subgraph” relation.

Recall that relation R⊑ is a partial order on the set G(V) of all graphs with node set V. This partial order has the least element (V, ∅) (the empty graph), and the largest element K(V) (the complete graph). Things become more interesting if we restrict the relation R⊑ to the set of all connected, or acyclic, graphs on V.

Theorem 18. Let V be any finite set. Consider the partial order R⊑ on the set of all connected graphs on V. A graph G = (V, E) is minimal in this partial order, if and only if it is a tree.

Proof. Since graph G is connected by the condition of the theorem, we need to prove that G is minimal if and only if it is acyclic. Equivalently, we need to prove that G has a cycle, if and only if it is not minimal in the partial order.

Suppose that graph G has a cycle. Let u ⇀ v be any edge in the cycle. Remove edge u ⇀ v from the graph. The graph stays connected, since every walk that passed through the edge u ⇀ v can be redirected along the remaining path u ⇝ v of the cycle. Since the graph stays connected after removing an edge, it is not minimal.

To prove the opposite implication, suppose that graph G is not minimal connected. This means that for some u, v ∈ V, removing the edge u ⇀ v does not disconnect the graph. It can only happen, if nodes u and v are connected, apart from the edge u ⇀ v, by some path u ⇝ v. Therefore, graph G has a cycle u ⇀ v ⇝ u. □

In short, trees are minimal among connected graphs. This can be viewed as an alternative definition of a tree, not using the word “acyclic”.
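This characterisation suggests a simple, if inefficient, way to obtain a spanning subgraph that is a tree: keep removing edges as long as the graph stays connected. The sketch below follows this idea; the function names is_connected and spanning_tree are hypothetical, and node names are assumed to be comparable so that the edges can be examined in a fixed order.

```python
def is_connected(nodes, edges):
    """Every node is reachable from one (arbitrary) starting node."""
    nodes = set(nodes)
    start = next(iter(nodes))
    reached, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for (x, y) in edges:
            if x == u and y not in reached:
                reached.add(y)
                stack.append(y)
    return reached == nodes


def spanning_tree(nodes, edges):
    """Remove edges one by one while the graph stays connected."""
    kept = set(edges)
    for (u, v) in sorted(edges):
        if (u, v) not in kept:
            continue                       # already removed together with (v, u)
        trial = kept - {(u, v), (v, u)}
        if is_connected(nodes, trial):
            kept = trial
    return kept


V = set(range(5))
K5 = {(u, v) for u in V for v in V if u != v}
T = spanning_tree(V, K5)
print(len(T) // 2)   # 4 edges remain
```

One pass over the edges is enough: if removing an edge disconnected the larger graph at the time it was examined, removing it also disconnects the final, smaller graph. The result is therefore minimal connected, hence a tree by Theorem 18, and has |V| − 1 edges by Theorem 16.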

Theorem 19. Let V be any finite set. Consider the partial order R⊑ on the set of all acyclic graphs on V. A graph G = (V, E) is maximal in this partial order, if and only if it is a tree.

Proof. Since graph G is acyclic by the condition of the theorem, we need to prove that G is maximal if and only if it is connected. Equivalently, we need to prove that G is disconnected, if and only if it is not maximal in the partial order.

Suppose that graph G is disconnected. Let u, v be any two unconnected nodes. Add the edge u ⇀ v to the graph. The graph stays acyclic, since u and v are not connected by any path apart from the new edge u ⇀ v. Since the graph stays acyclic after adding an edge, it is not maximal.

To prove the opposite implication, suppose that graph G is not maximal acyclic. This means that for some distinct, non-adjacent u, v ∈ V, adding the edge u ⇀ v does not create a cycle. It can only happen, if nodes u and v are unconnected. Therefore, graph G is disconnected. □

In short, trees are maximal among acyclic graphs. This can be viewed as an alternative definition of a tree, not using the word “connected”.
