Number Theory Course notes for MA 341, Spring 2018


Jared Weinstein

May 2, 2018

Contents

1 Basic properties of the integers
    1.1 Definitions: Z and Q
    1.2 The well-ordering principle
    1.3 The division algorithm
    1.4 Running times
    1.5 The Euclidean algorithm
    1.6 The extended Euclidean algorithm
    1.7 Exercises due February 2

2 The unique factorization theorem
    2.1 Factorization into primes
    2.2 The proof that prime factorization is unique
    2.3 Valuations
    2.4 The rational root theorem
    2.5 Pythagorean triples
    2.6 Exercises due February 9

3 Congruences
    3.1 Definition and basic properties
    3.2 Solving Linear Congruences
    3.3 The Chinese Remainder Theorem
    3.4 Modular Exponentiation
    3.5 Exercises due February 16

4 Units modulo m: Fermat’s theorem and Euler’s theorem
    4.1 Units
    4.2 Powers modulo m
    4.3 Fermat’s theorem
    4.4 The φ function
    4.5 Euler’s theorem
    4.6 Exercises due February 23

5 Orders and primitive elements
    5.1 Basic properties of the function ord_m
    5.2 Primitive roots
    5.3 The discrete logarithm
    5.4 Existence of primitive roots for a prime modulus
    5.5 Exercises due March 2

6 Some cryptographic applications
    6.1 The basic problem of cryptography
    6.2 Ciphers, keys, and one-time pads
    6.3 Diffie-Hellman key exchange
    6.4 RSA

7 Quadratic Residues
    7.1 Which numbers are squares?
    7.2 Euler’s criterion
    7.3 Exercises due March 16

8 Quadratic Reciprocity
    8.1 The Legendre symbol
    8.2 Some reciprocity laws
    8.3 The main quadratic reciprocity law
    8.4 The Jacobi symbol
    8.5 Exercises due March 23

9 The Gaussian integers
    9.1 Motivation and definitions
    9.2 The division algorithm and the gcd
    9.3 Unique factorization in Z[i]
    9.4 The factorization of rational primes in Z[i]
    9.5 Exercises due March 30

10 Unique factorization and its applications
    10.1 Pythagorean triples, revisited
    10.2 A cubic Diophantine equation
    10.3 The system Z[√−2]
    10.4 Examples of the failure of unique factorization
    10.5 The Eisenstein integers
    10.6 Exercises due April 13

11 Some analytic number theory
    11.1 ∑_p 1/p diverges
    11.2 Classes of primes, and their infinitude
    11.3 ∑_{p ≡ ±1 (mod 4)} 1/p diverges
    11.4 Exercises due April 20

12 Continued fractions and Pell’s equation
    12.1 A closer look at the Euclidean algorithm
    12.2 Continued fractions in the large
    12.3 Real quadratic irrationals and their continued fractions
    12.4 Pell’s equation and Z[√d]
    12.5 The fundamental unit
    12.6 The question of unique factorization for Z[√d]
    12.7 Exercises due April 27

13 Lagrange’s four square theorem
    13.1 Hamiltonian quaternions
    13.2 The Lipschitz quaternions
    13.3 The Hurwitz quaternions
    13.4 Hurwitz primes
    13.5 The end of the proof

1 Basic properties of the integers

1.1 Definitions: Z and Q

Number theory is the study of the integers:

. . . , −3, −2, −1, 0, 1, 2, 3, . . .

We use the symbol Z to stand for the set of integers. (Z stands for German Zahl, meaning number.) Now might be a good time to review some set-theoretic notations: 3 ∈ Z is a true statement, meaning that 3 is a member of the integers, whereas √7 ∉ Z.

We observe that integers can be added, subtracted, and multiplied to produce other integers, but the same cannot be said for division. When we divide integers we create rational numbers, such as 3/7 and −2/3. We write the set of rational numbers as Q, for quotient.

The failure of integers to divide each other evenly is so important that we have special notation to express it: for integers a and b, we write a | b to mean that b/a is an integer. In other words, a | b means that there exists c ∈ Z such that b = ac. In this case we say that a is a divisor of b, and that b is a multiple of a.

Example 1.1.1. The divisors of 12 are 1, 2, 3, 4, 6, 12 and their negatives. A divisor of a positive integer n is proper if it’s positive and not equal to n itself. Thus the proper divisors of 12 are just 1, 2, 3, 4, 6.

Example 1.1.2. 1 is a divisor of every integer, as is −1. Also, every integer divides 0, since 0 = 0 · a for every a. However, the only multiple of 0 is 0 itself.

Proposition 1.1.3. Suppose that a, b, c ∈ Z. If a|b and b|c, then a|c.

Proof. There exist integers m, n such that b = am and c = bn. Then c = amn, so a | c.

The above proposition says that the relation a|b is transitive.

Proposition 1.1.4. Suppose a, b, d, x, y ∈ Z. If d|a and d|b, then d|ax+ by.

We remark that ax+ by is called a linear combination of a and b.

Proof. Write a = dm and b = dn. Then ax + by = d(mx + ny), so d | ax + by.

A positive integer is prime if it has no proper divisors other than 1. By convention, 1 is not counted as prime.

Theorem 1.1.5 (Euclid). There are infinitely many primes.

Proof. If there were finitely many primes, then we could list all of them as p_1, . . . , p_n. The number N = p_1 · · · p_n + 1 is divisible by some prime¹, which must be one of our enumerated primes, say p_i. Then p_i | N but also p_i | p_1 · · · p_n. Thus p_i | (N − p_1 · · · p_n) = 1, which is absurd.

¹ Strictly speaking, we don’t know this fact yet, but for now we’ll take it for granted.

Therefore we are guaranteed to never run out of primes. As of January 2018 the largest known prime is

2^{77,232,917} − 1.

This is a Mersenne prime, meaning a prime which is one less than a power of two. It is not known if there are infinitely many Mersenne primes.

1.2 The well-ordering principle

How do we know that every integer n > 1 is divisible by a prime? An argument might go this way: if n isn’t itself prime, then it has a proper divisor n_1 > 1. If n_1 isn’t prime, then it has a proper divisor n_2 > 1, and so on. The result is that we get a strictly decreasing sequence of positive integers n > n_1 > n_2 > . . . , which cannot go on indefinitely. This fact, obvious though it may be, is quite important. We give it a name: the well-ordering principle.

Axiom 1.2.1 (The well-ordering principle).² A strictly decreasing sequence of positive integers cannot go on indefinitely.

Rather than attempt to prove this statement, we take it as an axiom of the system of integers.

1.3 The division algorithm

We noted before that the integers are not closed under division. But there is a familiar operation among integers: you can divide one by another to obtain a quotient and a remainder. For instance, when 39 is divided by 5, the quotient is 7 and the remainder is 4. We can check this by verifying that 39 = 5 · 7 + 4. When this is done, the remainder must be less than the number you divided by. It would be incorrect to say that 5 goes into 39 with a quotient of 6 and a remainder of 9, even though 39 = 5 · 6 + 9 is also true.

Theorem 1.3.1 (The division algorithm). Let a, b ∈ Z, with b > 0. There exists a unique pair of integers q, r ∈ Z such that a = bq + r and 0 ≤ r < b.

Of course, if the remainder r is 0, then a = bq and therefore b|a.

² There is another formulation: every nonempty subset of the positive integers has a least element. The two formulations are equivalent.

Proof. We’ll assume that a is positive; the other cases are similar. Consider the sequence a, a − b, a − 2b, a − 3b, . . . . By the well-ordering principle, these cannot all be nonnegative integers. So there is a least one which is nonnegative; call it r = a − bq. If r ≥ b, then a − b(q + 1) = r − b ≥ 0 would be a smaller nonnegative member of the sequence, which contradicts our assumption that r was the least such element. Therefore r < b.

That handles the existence part of the theorem. For uniqueness: if there were another pair q′, r′ such that a = bq + r = bq′ + r′, then r − r′ = b(q′ − q) would be a multiple of b. But since 0 ≤ r, r′ < b, we have |r − r′| < b, so this can only happen if r = r′, which implies q = q′ as well.

This proof gives a hint to the “algorithm” part of the division algorithm: to divide 5 into 39, keep subtracting 5 from 39 to get 34, 29, 24, 19, 14, 9, 4, at which point we cannot subtract anymore and 4 is the remainder. One says that just as multiplication is repeated addition, division is repeated subtraction.
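The repeated-subtraction idea can be sketched in a few lines of Python. This is only an illustration under the assumption a ≥ 0 and b > 0; the function name divide_by_subtraction is made up for this sketch and does not come from the notes.

```python
def divide_by_subtraction(a, b):
    """Return (q, r) with a == b*q + r and 0 <= r < b, assuming a >= 0 and b > 0."""
    if a < 0 or b <= 0:
        raise ValueError("this sketch assumes a >= 0 and b > 0")
    q, r = 0, a
    while r >= b:   # stop as soon as one more subtraction would go negative
        r -= b
        q += 1
    return q, r


print(divide_by_subtraction(39, 5))   # (7, 4)
```

The loop body runs exactly q times, which is why this method becomes hopeless when the quotient is large.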

I want to introduce an important piece of notation: if r is the remainder when b is divided into a, we sometimes write a mod b = r, especially if the remainder is all we care about. You already do this with time: 17 hours after 2 o’clock is 19 mod 12 = 7 o’clock. (Or substitute 24 for 12 if you use that system.) We say that r is the residue of a modulo b. It is always between 0 and b − 1 inclusive.

1.4 Running times

Of course in practice when you want to divide larger numbers, like 114 into 395623945, you don’t subtract repeatedly at all. Instead you perform an algorithm known as long division, which looks like this:

            3470385
    114 ) 395623945
          342000000
           53623945
           45600000
            8023945
            7980000
              43945
              34200
               9745
               9120
                625
                570
                 55

Thus the quotient is 3470385 and the remainder is 55. This may look laborious, but you could probably do it by hand in just a few minutes. Contrast this with the repeated subtraction method. You would have had to subtract 114 from 395623945 a total of 3470385 times. Even if you could do one subtraction every second, it would take 40 days!

In our applications to cryptography, it will be important to keep track of how long it takes for a person (or a computer) to run a particular algorithm, in terms of how many basic operations are performed as a function of how long the inputs are. In the case of our long division problem, the inputs had 3 + 9 = 12 digits in total (the number of digits in 114 and 395623945 combined). If a basic operation means adding, subtracting, or multiplying individual digits, then the long division algorithm took dozens of operations, while the repeated subtraction algorithm took millions of operations. One says that long division is a polynomial time algorithm, but repeated subtraction is exponential time.

Behind any abstract theorem in number theory there is often an algorithmic question. For instance, we just saw that every integer n > 1 has a prime divisor. Is there a fast algorithm to find one? One simple method is to try dividing 2, 3, 4, . . . , n − 1, n into n to see if any of these are divisors; the first one that divides n evenly will be prime (why?). Such an algorithm can require as many as n − 1 steps (this happens when n itself is prime). When n has hundreds of digits, this is completely impractical.

We can save some time by noting that if we reach √n without finding any factors, then n must be prime, which limits the number of steps to about √n. That seems like it should help a lot, until you figure that if n has 200 digits, then √n has about 100. Computers these days are fast, but no computer out there can execute 10^{100} steps in any reasonable amount of time.
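As a rough illustration of trial division with the √n cutoff, here is a minimal Python sketch. The name smallest_prime_factor is hypothetical, and the sketch assumes n ≥ 2; it makes no attempt at being fast for large inputs.

```python
from math import isqrt


def smallest_prime_factor(n):
    """Return the smallest prime divisor of n, assuming n >= 2."""
    if n < 2:
        raise ValueError("this sketch assumes n >= 2")
    # Only candidates up to sqrt(n) need to be tried.
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return d   # the first divisor found is automatically prime
    return n           # no divisor up to sqrt(n), so n itself is prime


print(smallest_prime_factor(91))   # 7
print(smallest_prime_factor(97))   # 97
```

The loop runs at most about √n times, which matches the step count discussed above.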

1.5 The Euclidean algorithm

Given positive integers a and b, a common divisor is an integer d such that d | a and d | b. The greatest common divisor (gcd) is of course the greatest of these. This comes up in simplifying fractions: to reduce 18/12 you have to divide both numerator and denominator by their gcd, which is 6, to get 3/2. If gcd(a, b) = 1, we say that a and b are relatively prime or coprime.

If a and b are large numbers, how do we compute gcd(a, b)? One way would be to count down from the smaller of the two numbers, and stop at the first one which divides them both. But if the smaller number has 100 digits, then this process will take about 10^{100} steps, which is far too long.

The Euclidean algorithm is a very efficient way to compute gcd(a, b) without having to factor either number. It rests on repeated application of the division algorithm (which we already noted runs in polynomial time). It’s best illustrated by example. Suppose we want gcd(119, 259). We calculate:

259 = 2 · 119 + 21

119 = 5 · 21 + 14

21 = 1 · 14 + 7

14 = 2 · 7 + 0.

Note that in each iteration, the denominator and remainder become the numerator and denominator in the next step. The last non-zero remainder is 7, which is the gcd we wanted!
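The calculation above can be sketched in Python as follows. The function name euclid and the idea of returning the list of division steps are choices made for this illustration only.

```python
def euclid(a, b):
    """Return (gcd, steps), where each step is (numerator, quotient, denominator, remainder)."""
    steps = []
    while b != 0:
        q, r = divmod(a, b)        # a = q*b + r with 0 <= r < b
        steps.append((a, q, b, r))
        a, b = b, r                # denominator and remainder move up one slot
    return a, steps


g, steps = euclid(259, 119)
for a, q, b, r in steps:
    print(f"{a} = {q} * {b} + {r}")
print("gcd =", g)   # gcd = 7
```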

The algorithm works because of the following lemma:

Lemma 1.5.1. For integers a, b, q, r with a = bq + r, we have gcd(a, b) = gcd(b, r).

Proof. Let d = gcd(a, b) and e = gcd(b, r). We’ll show that d ≤ e and e ≤ d, which will do the trick.

First let’s show that d ≤ e. Since d divides a and b, it divides r = a − bq, which is a linear combination of a and b. Thus d is a common divisor of b and r. Therefore it cannot exceed the greatest common divisor of b and r, which is e.

Now let’s show that e ≤ d. Since e divides b and r, it divides a = bq + r, which is a linear combination of b and r. Thus e is a common divisor of a and b. Therefore it cannot exceed the greatest common divisor of a and b, which is d.

Thus in the example, gcd(259, 119) = gcd(119, 21) = gcd(21, 14) = gcd(14, 7) = gcd(7, 0) = 7.

I should note here that as long as the remainder is nonzero, the algorithm can continue to produce a smaller remainder. By the well-ordering principle, the remainders cannot decrease forever, and so eventually one arrives at a remainder of 0. Finally, note that gcd(r, 0) = r for any positive r.

It turns out that Euclid’s algorithm runs in polynomial time. Computers can easily compute gcd(a, b) even if a and b have hundreds of digits. To get a sense of why Euclid’s algorithm runs quickly, let us examine the following worst case scenario, in which we compute gcd(55, 34):

55 = 1 · 34 + 21

34 = 1 · 21 + 13

21 = 1 · 13 + 8

13 = 1 · 8 + 5

8 = 1 · 5 + 3

5 = 1 · 3 + 2

3 = 1 · 2 + 1

2 = 2 · 1 + 0

We computed gcd(55, 34) = 1 in 8 iterations, whereas gcd(259, 119) = 7 took only 4. Notice that the quotient was 1 each time we divided (except the last one), which means that the remainders go down as slowly as possible. We got this result because we used consecutive numbers in the Fibonacci sequence 1, 1, 2, 3, 5, 8, . . . , in which each number is the sum of the two previous numbers. As a result, computing gcd(a, b) can be done in at most n iterations, where the nth number in the Fibonacci sequence is larger than a and b.
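To see the Fibonacci worst case numerically, one can count iterations directly. The following Python sketch uses a made-up helper euclid_iterations; it is an experiment, not a proof of the bound.

```python
def euclid_iterations(a, b):
    """Count the divisions the Euclidean algorithm performs on (a, b)."""
    count = 0
    while b != 0:
        a, b = b, a % b
        count += 1
    return count


print(euclid_iterations(55, 34))     # 8
print(euclid_iterations(259, 119))   # 4

# Consecutive Fibonacci numbers: each step up the sequence costs one more iteration.
fib = [1, 1]
while len(fib) < 12:
    fib.append(fib[-1] + fib[-2])
for small, large in zip(fib[1:], fib[2:]):
    print(large, small, euclid_iterations(large, small))
```

Among all pairs a > b ≥ 1 with a ≤ 55, no pair needs more than the 8 iterations that (55, 34) takes, which is consistent with the claim above.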


1.6 The extended Euclidean algorithm

The integers 49 and 40 are relatively prime, so it’s no surprise that the Euclidean algorithm produces 1:

49 = 1 · 40 + 9

40 = 4 · 9 + 4

9 = 2 · 4 + 1

4 = 4 · 1 + 0

Now look at the sequence of quotients: 1, 4, 2, 4. It turns out that this sequence “encodes” the numbers we started with. Place them in the top row of a table like so:

                 1     4     2     4
     1     0
     0     1

Proceeding from left to right, we fill in the blanks as follows. The first number of the top row is 1. Use the two numbers in the second row immediately preceding this column to make a number like this: 1 · 0 + 1 = 1. Then 4 · 1 + 0 = 4, so we put that in the next spot. In general, each new entry is the quotient above it times the entry to its left, plus the entry two places to its left; the third row is filled in by the same rule. Filling out everything like this gives us

                 1     4     2     4
     1     0     1     4     9    40
     0     1     1     5    11    49

The final column has 40, 49, which of course are the numbers we started with. The second-to-last column has 9, 11. Observe that

49 · 9 − 40 · 11 = 1.

This method, called the extended Euclidean algorithm, gives a practical means of finding a solution to the equation

ax + by = 1

when gcd(a, b) = 1.

Now let’s try a = 259 and b = 119, like in our previous example. The sequence of quotients is 2, 5, 1, 2 and the gcd is 7. The extended Euclidean algorithm gives us

                 2     5     1     2
     1     0     1     5     6    17
     0     1     2    11    13    37

The numbers in the last column are 17 = 119/7 and 37 = 259/7. That is, we got the numbers we started with, divided out by their gcd. The second-to-last column has 6 and 13, and

37 · 6− 17 · 13 = 1,

and multiplying both sides by 7 gives

259 · 6− 119 · 13 = 7.
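The table method can be sketched in Python as below. The function name extended_euclid_table is hypothetical. One detail is an addition of this sketch rather than something stated in the notes: the sign of the final identity alternates with the number of quotients, so the code attaches a sign of +1 when there are an even number of quotients and −1 otherwise.

```python
def extended_euclid_table(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b), for positive integers a, b."""
    # Step 1: ordinary Euclidean algorithm, recording the quotients (the top row).
    quotients = []
    r0, r1 = a, b
    while r1 != 0:
        q, r = divmod(r0, r1)
        quotients.append(q)
        r0, r1 = r1, r
    g = r0

    # Step 2: fill in the second and third rows from left to right.
    second = [1, 0]
    third = [0, 1]
    for q in quotients:
        second.append(q * second[-1] + second[-2])
        third.append(q * third[-1] + third[-2])

    # The last column is (b/g, a/g); the second-to-last column gives the
    # coefficients up to a sign that depends on how many quotients there were.
    sign = 1 if len(quotients) % 2 == 0 else -1
    x = sign * second[-2]
    y = -sign * third[-2]
    return g, x, y


print(extended_euclid_table(49, 40))     # (1, 9, -11)
print(extended_euclid_table(259, 119))   # (7, 6, -13)
```

Both outputs agree with the identities 49 · 9 − 40 · 11 = 1 and 259 · 6 − 119 · 13 = 7.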

Theorem 1.6.1 (Bezout’s identity). Let a and b be positive integers. There exist integers x, y such that ax + by = gcd(a, b).

Proof. If you believe that the extended Euclidean algorithm works, you may be satisfied already. But here is an independent proof: Among all positive linear combinations ax + by, there is a smallest one, say ax + by = d. Certainly gcd(a, b) | d. Let’s perform the division algorithm with a and d: a = dq + r, with 0 ≤ r < d. Then

r = a− dq = a− (ax+ by)q = a(1− xq)− bqy

is also a linear combination of a and b. Since d was assumed least among all positive linear combinations, and r < d, the only way this is possible is if r = 0. Thus d | a. Similarly d | b, which means d ≤ gcd(a, b). Combining this with gcd(a, b) | d gives d = gcd(a, b).

1.7 Exercises due February 2.

1. The proper divisors of 6 are 1, 2, 3. We have 1 + 2 + 3 = 6, meaning that 6 is a perfect number. Verify that 28 and 496 are also perfect.

2. The ancient Greeks divided integers n into perfect (sum of proper divisors is n), abundant (sum of proper divisors is > n), and deficient (sum of proper divisors is < n). Classify each of the numbers 2, 3, . . . , 20 into one of these three classes.

3. Suppose that p = 2^n − 1 is a Mersenne prime. Prove that 2^{n−1} p is a perfect number.

4. Prove that if a, b, c, d ∈ Z and a|b and c|d, then ac|bd.

5. Let p_1, . . . , p_n be distinct primes. How many positive divisors does p_1 · · · p_n have?

6. True or false: the rational numbers Q obey the well-ordering principle. Explain your reasoning.

7. What is the remainder when 2^{100} is divided by 5? (Find a pattern in the first few powers of 2.)

8. Use the Euclidean algorithm to compute gcd(527, 408) and gcd(1001, 121).

9. Use the extended Euclidean algorithm to find integers x and y such that 527x + 408y = gcd(527, 408).

10. Let a and b be integers. Show that any common divisor of a and b must divide gcd(a, b).

2 The unique factorization theorem

2.1 Factorization into primes

Lemma 2.1.1. Every positive integer can be expressed as a product of primes.

(Even 1 is a product of primes: it is the empty product, so to speak. And 17 is a product of primes too, but just one of them. So one must interpret the lemma to mean “every positive integer can be expressed as a product of zero or more primes.”)

Proof. Let n ∈ Z be positive. If n = 1, we’re done. Otherwise we can find a prime divisor p_1 | n. Write n = p_1 n_1, where n_1 < n. If n_1 = 1, we’re done. Otherwise we can find a prime divisor p_2 | n_1; write n_1 = p_2 n_2, with n_2 < n_1. Continuing, we get a sequence of descending positive integers n > n_1 > n_2 > . . . , which cannot go on forever. Thus there exists t for which n_t = 1, and then n = p_1 p_2 · · · p_t.

The proof even suggests a sort of algorithm for factoring a number into primes: keep dividing out prime factors until you’ve completely factored the number. For instance,

72 = 2 · 36 = 2 · 2 · 18 = 2 · 2 · 2 · 9 = 2 · 2 · 2 · 3 · 3 = 2^3 · 3^2.

The process produces the same result no matter how we factor the number. Here’s another way:

72 = 3 · 24 = 3 · 3 · 8 = 3 · 3 · 2 · 4 = 3 · 3 · 2 · 2 · 2 = 2^3 · 3^2.

Perhaps this isn’t so surprising. But how do we really know that you get the same prime factorization no matter what? Could there be a particular number n, possibly with hundreds of digits, which has two prime factorizations n = p_1 p_2 = q_1 q_2, with all four primes p_1, p_2, q_1, q_2 distinct?
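The “keep dividing out prime factors” procedure from the proof of Lemma 2.1.1 might be sketched in Python like this. The function name prime_factorization is an assumption of the sketch, and it always pulls out the smallest available prime first, which is just one of the many possible orders.

```python
def prime_factorization(n):
    """Return the prime factors of n >= 1 in nondecreasing order (empty list for n = 1)."""
    if n < 1:
        raise ValueError("this sketch assumes n >= 1")
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:   # divide out d as many times as it occurs
            factors.append(d)
            n //= d
        d += 1
    if n > 1:               # whatever is left over is itself prime
        factors.append(n)
    return factors


print(prime_factorization(72))   # [2, 2, 2, 3, 3]
```

That a different order of division would lead to the same multiset of primes is exactly what the next section proves.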

2.2 The proof that prime factorization is unique

All will rest upon the following lemma.

Lemma 2.2.1. Let a, b, c ∈ Z, with a | bc and gcd(a, b) = 1. Then a | c.

Proof. Crucially, we use Bezout’s identity (Theorem 1.6.1). There exist x, y ∈ Z with ax + by = 1. Multiplying by c, we get acx + bcy = c. We have a | bc, so that a | bcy. Obviously a | acx, so a | acx + bcy = c.

Corollary 2.2.2. Let a, b ∈ Z. If p is a prime number and p | ab, then p | a or p | b.

Proof. We will show that if p ∤ a then p | b. If p ∤ a, then gcd(p, a) = 1 (the only positive divisors of p are 1 and p), in which case the preceding lemma shows that p | b.

From this it is easy to see that if p divides an arbitrary product then p must divide one of the factors.

Theorem 2.2.3 (Unique Factorization Theorem). Every positive integer can be written as a product of primes in a unique way, up to ordering.

Proof. If p_1 · · · p_t = q_1 · · · q_s for primes p_1, . . . , p_t, q_1, . . . , q_s, then p_t divides the product q_1 · · · q_s, so that it must divide one of the factors. Without loss of generality, p_t | q_s. But these are primes, so we must have p_t = q_s. Removing this factor gives p_1 · · · p_{t−1} = q_1 · · · q_{s−1}. Continuing, we are able to match up each p with a q until no further factors remain.

2.3 Valuations

The Unique Factorization Theorem shows that every n ≥ 1 can be written

n = ∏_p p^{a_p},

where p runs over primes and a_p is a nonnegative integer. It must be the case that a_p = 0 for all but finitely many primes, so that the product can make sense. Since prime factorization is unique, the a_p are uniquely determined by n, and so it makes sense to define

val_p(n) = a_p,

the valuation of n at p. For instance, 75 = 3 · 5^2, so val_3(75) = 1 and val_5(75) = 2, whereas val_p(75) = 0 for every other prime p.

You can extend this definition to include negative n as well: val_p(−n) = val_p(n). You can even extend it to include 0. We set val_p(0) = ∞. (Why is this the right definition?)

The function val_p obeys the following rules:

val_p(mn) = val_p(m) + val_p(n)

val_p(m^k) = k val_p(m),

which makes it similar to the logarithm to base p.

Here are some basic facts about val_p:

Theorem 2.3.1. Let a, b ∈ Z.

1. a | b if and only if, for all primes p, val_p(a) ≤ val_p(b).

2. val_p(gcd(a, b)) = min {val_p(a), val_p(b)}.

3. val_p(lcm(a, b)) = max {val_p(a), val_p(b)}.

4. If a > 0, then a is a perfect kth power if and only if, for all primes p, k | val_p(a).

I encourage you to think about why these facts are true, and to work with some examples. For instance, the gcd of 2^5 · 3 · 5^4 and 3^2 · 5^3 is 3 · 5^3. A consequence of (2) is that gcd(a, b) = 1 if and only if, for all primes p, either val_p(a) or val_p(b) is 0.
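One way to experiment with these facts is to compute valuations directly. The Python sketch below uses a hypothetical function val(p, n); it assumes p is prime and n is nonzero, and it refuses n = 0 since the valuation of 0 is infinite.

```python
from math import gcd


def val(p, n):
    """Exponent of the prime p in the factorization of the nonzero integer n."""
    if n == 0:
        raise ValueError("the valuation of 0 is infinite")
    n = abs(n)              # the valuation ignores the sign of n
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


print(val(3, 75), val(5, 75), val(2, 75))   # 1 2 0

a = 2**5 * 3 * 5**4    # 60000
b = 3**2 * 5**3        # 1125
print(gcd(a, b))       # 375, which is 3 * 5**3
for p in (2, 3, 5):
    # valuation of the gcd is the minimum of the two valuations
    print(p, val(p, a), val(p, b), val(p, gcd(a, b)))
```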

Theorem 2.3.2. For a, b ∈ Z positive, gcd(a, b) lcm(a, b) = ab.

Proof. The val_p of the left hand side is min {val_p(a), val_p(b)} + max {val_p(a), val_p(b)} = val_p(a) + val_p(b) (why?), which is the same as val_p(ab).

Theorem 2.3.3. Let a and b be coprime positive integers. If ab is a perfect square, then so are a and b.

Proof. Since ab is a perfect square, val_p(ab) = val_p(a) + val_p(b) is even for all p. Then since one of val_p(a) and val_p(b) has to be 0, both must be even. This shows by point (4) above that a and b are perfect squares.

2.4 The rational root theorem

This is a classic example of proof by contradiction.

Theorem 2.4.1. √2 is irrational.

Proof. Assume that √2 is rational. Then √2 = p/q for positive p, q ∈ Z. Then p^2 = 2q^2. Since 2 | p^2, Theorem [?] shows that 2 | p; i.e. p is even. Write p = 2p_0; then 2p_0^2 = q^2. The same reasoning shows that q is even. Write q = 2q_0, and then p_0^2 = 2q_0^2. But this is the original equation! Repeating the process gives a descending sequence of positive integers p > p_0 > p_1 > . . . , which is impossible.

It may have occurred to you to avoid the use of the well-ordering principle in this proof by arguing as follows: express p/q in lowest terms, show that p and q are both even, and then draw a contradiction. To do this, though, we need to know that it is possible to express p/q in lowest terms in the first place! This is the point of the following theorem:

Theorem 2.4.2. If gcd(p, q) = d, then gcd(p/d, q/d) = 1.

Then if p/q is a rational number, we can let d = gcd(p, q) and write p = dp_0 and q = dq_0. Then gcd(p_0, q_0) = 1, and p_0/q_0 is in lowest terms.

Proof. We can write px + qy = d for some integers x and y, and then p_0 x + q_0 y = 1, which shows that gcd(p_0, q_0) = 1.

But let’s return to the subject of irrationality. A variation of the above proof can be used to show that √3 and 7^{1/3} are irrational too. These are examples of algebraic numbers, a class of complex numbers which include combinations like √2 + √3, √3 + √7 − 2. A number is algebraic if it is the root of a polynomial with integer coefficients.

Theorem 2.4.3 (Rational Root Theorem). Suppose the polynomial

f(x) = a_n x^n + a_{n−1} x^{n−1} + · · · + a_0

has coefficients a_i ∈ Z. If p/q is a fraction in lowest terms which is a root of f(x), then q | a_n and p | a_0.

Proof. The fact that p/q is a root of f(x) means that f(p/q) = 0. After clearing away denominators, this becomes

a_n p^n + a_{n−1} p^{n−1} q + · · · + a_1 p q^{n−1} + a_0 q^n = 0.

Since p divides all terms other than the last one, it divides the last one as well: p | a_0 q^n. But gcd(p, q) = 1 implies gcd(p, q^n) = 1, so Lemma 2.2.1 gives p | a_0. The proof that q | a_n is similar.

The Rational Root Theorem gives a method for finding all rational roots p/q of a polynomial with integer coefficients, since the possibilities for p and q are limited. We can also use the Rational Root Theorem to show √2 is irrational in another way. √2 is a root of x^2 − 2. If √2 = p/q in lowest terms, then p | 2 and q | 1, which implies that p/q is one of ±1, ±2. But this is nonsense, since none of these numbers squares to 2! The same proof can be used to show that √n is irrational whenever n is not a perfect square.
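The search suggested by the Rational Root Theorem can be sketched in Python with exact fractions. The names divisors and rational_roots are made up for this illustration, and the sketch assumes that both the leading coefficient and the constant term are nonzero.

```python
from fractions import Fraction


def divisors(n):
    """Positive divisors of the nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_roots(coeffs):
    """Rational roots of the polynomial with integer coefficients [a_n, ..., a_1, a_0]."""
    a_n, a_0 = coeffs[0], coeffs[-1]
    if a_n == 0 or a_0 == 0:
        raise ValueError("this sketch assumes a_n != 0 and a_0 != 0")

    def f(x):
        value = Fraction(0)
        for c in coeffs:            # Horner's rule, in exact arithmetic
            value = value * x + c
        return value

    # Every rational root is +-p/q with p dividing a_0 and q dividing a_n.
    candidates = {Fraction(sign * p, q)
                  for p in divisors(a_0)
                  for q in divisors(a_n)
                  for sign in (1, -1)}
    return sorted(x for x in candidates if f(x) == 0)


print(rational_roots([1, 0, -2]))   # [] since x^2 - 2 has no rational root
print(rational_roots([2, -3, 1]))   # [Fraction(1, 2), Fraction(1, 1)]
```

The second polynomial, 2x^2 − 3x + 1 = (2x − 1)(x − 1), is an example chosen for this sketch and does not appear in the notes.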

2.5 Pythagorean triples

A Pythagorean triple is a list (a, b, c) of integers which satisfy

a^2 + b^2 = c^2,

so that a, b, c could be the lengths of sides of a right triangle. This is an example of a Diophantine equation: a polynomial equation meant to be solved for integer variables. This particular Diophantine equation is truly old, the solution (3, 4, 5) being known to the ancient Egyptians. Other familiar solutions are (5, 12, 13) and (6, 8, 10). The point of this discussion is to find all the Pythagorean triples.

Note that if a prime p divides two of the three numbers, then it divides the third (Corollary 2.2.2 again). Let’s call a triple primitive if gcd(a, b, c) = 1. Then in a primitive triple, all pairs (a, b), (a, c), (b, c) are coprime as well. It suffices to find all the primitive triples, because any other triple is just a multiple of a primitive one.

Suppose (a, b, c) is primitive. Then a and b can’t both be even. But they can’t both be odd either: if a = 2m + 1 and b = 2n + 1 are odd, then c = 2c_0 is even, and substituting gives

4m^2 + 4m + 1 + 4n^2 + 4n + 1 = 4c_0^2,

or

2(m^2 + m + n^2 + n) + 1 = 2c_0^2,

which is impossible, since the left side is odd and the right side is even. So a and b have opposite parities. Without loss of generality, say a is odd and b is even.

We have

a^2 = c^2 − b^2 = (c + b)(c − b).

Since gcd(b, c) = 1, gcd(c + b, c − b) is 1 or 2 (Exercise 3). But we can rule out 2, since (c + b)(c − b) = a^2 is odd, so in fact gcd(c + b, c − b) = 1. Now by Theorem 2.3.3, c + b = p^2 and c − b = q^2 for positive integers p, q. These have to be odd and relatively prime. Solving, we get c = (p^2 + q^2)/2, b = (p^2 − q^2)/2, and a = pq.

Theorem 2.5.1. As p and q run through pairs of odd coprime positive integers with p > q, (pq, (p^2 − q^2)/2, (p^2 + q^2)/2) runs through all primitive Pythagorean triples of positive integers (up to switching the a and b coordinates).
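The parametrization in Theorem 2.5.1 lends itself to a short enumeration. The following Python sketch lists the primitive triples with hypotenuse up to a bound; the function name primitive_triples and the bound parameter limit are assumptions made for the illustration.

```python
from math import gcd


def primitive_triples(limit):
    """Primitive triples (a, b, c) with c <= limit, from odd coprime p > q >= 1."""
    triples = []
    p = 3
    while (p * p + 1) // 2 <= limit:        # smallest c for this p occurs at q = 1
        for q in range(1, p, 2):            # odd q with q < p
            if gcd(p, q) == 1:
                a = p * q
                b = (p * p - q * q) // 2
                c = (p * p + q * q) // 2
                if c <= limit:
                    triples.append((a, b, c))
        p += 2
    return sorted(triples, key=lambda t: t[2])


print(primitive_triples(30))
# [(3, 4, 5), (5, 12, 13), (15, 8, 17), (7, 24, 25), (21, 20, 29)]
```

Here a is always the odd leg and b the even one, matching the convention chosen in the derivation.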

2.6 Exercises due February 9

1. How many (positive) divisors does the number 2^5 · 3^7 · 5 · 11^6 have?

2. Prove that if a, b, c ∈ Z, then gcd(ab, ac) = a gcd(b, c).

3. Prove that if a, b ∈ Z are coprime then gcd(a + b, a − b) is either 1 or 2.

4. Let a, b, c ∈ Z. Prove that if gcd(a, b) = 1, a|c, and b|c, then ab|c.

5. Prove that if ab is a perfect cube and gcd(a, b) = 1, then a and b are both perfect cubes.

6. Find all rational roots of 3x^3 + x^2 + x − 2.

7. Prove that √2 + √3 is irrational.

8. Show that if a and b are integers and a^n | b^n, then a | b. (There are multiple ways to do this. One quick way is to use the rational root theorem!)

9. When the number 30! is written out in base 10, how many zeros areat the end?

10. Is it possible to write 50 as the difference between two perfect squares?

3 Congruences

3.1 Definition and basic properties

Definition 3.1.1. For integers a, b, m, we write a ≡ b (mod m) (pronounced: a is congruent to b modulo m) if m | a − b.

The notation here suggests that somehow a and b are equal in a funny way. Indeed you probably already have a notion of taking a number modulo 12 (or 24) when you think about the clock: The clock looks the same when 100 hours pass as when 4 hours pass, because 100 ≡ 4 (mod 12). Or if you think about numbers as being even or odd: a ≡ b (mod 2) means that a and b have the same parity (they are either both odd or both even).

The notion that a ≡ b (mod m) is a sort of equality can be formalized by checking the following properties:

1. (Reflexivity) a ≡ a (mod m).

2. (Symmetry) If a ≡ b (mod m) then b ≡ a (mod m).

3. (Transitivity) If a ≡ b (mod m) and b ≡ c (mod m) then a ≡ c (mod m).

4. If a ≡ b (mod m) then:

(a) a + c ≡ b + c (mod m)

(b) a − c ≡ b − c (mod m)

(c) ac ≡ bc (mod m)

The first three properties express the fact that ≡ is an equivalence relation. This means that you can treat the ≡ symbol much like the = symbol, at least when it comes to substituting equals for equals. The fourth property means that when it comes to congruences you can add, subtract or multiply by c on both sides and the congruence will remain true.

You should be able to come up with short proofs of the above properties. For instance, here’s a proof of 4(a): If a ≡ b (mod m) it means that m | a − b = (a + c) − (b + c), so a + c ≡ b + c (mod m).
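In the same spirit, here is one possible way to write out two more of these proofs, transitivity and the multiplication rule, as a sketch. Both use only Proposition 1.1.4 on linear combinations.

```latex
% Transitivity: a - c is a sum of two multiples of m.
m \mid a - b \ \text{and}\ m \mid b - c
  \;\Longrightarrow\; m \mid (a - b) + (b - c) = a - c
  \;\Longrightarrow\; a \equiv c \pmod{m}.

% Multiplication by c: ac - bc is a multiple of a - b.
m \mid a - b
  \;\Longrightarrow\; m \mid c\,(a - b) = ac - bc
  \;\Longrightarrow\; ac \equiv bc \pmod{m}.
```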

3.2 Solving Linear Congruences

The rules we outlined above enable us to solve for x in congruences like

x+ 3 ≡ 1 (mod 10).

Namely, you can subtract 3 from both sides to get x ≡ −2 (mod 10), which is the same as x ≡ 8 (mod 10). But if the equation is

3x ≡ 2 (mod 10),

we cannot “divide by 3” on both sides just yet because “1/3” doesn’t have any meaning modulo 10 (at least until we give it meaning). We can try plugging in x = 0, 1, . . . , 9 to see that there is just one solution x ≡ 4 (mod 10).

Here’s another example: 2x ≡ 4 (mod 10). There’s the obvious solution x ≡ 2 (mod 10), but then there’s also x ≡ 7 (mod 10). Those are the only solutions modulo 10. You can also say that the complete solution is x ≡ 2 (mod 5).

Finally, look at 2x ≡ 3 (mod 10). This time there are no solutions at all! Thus a linear congruence can have zero, one, or more than one solution modulo m.

Theorem 3.2.1. The congruence ax ≡ b (mod m) has a solution if andonly if gcd(a,m)|b. If a solution exists, then it is unique modulo m/ gcd(a,m).In particular if gcd(a,m) = 1 then a solution always exists and is uniquemodulo m.

Proof. Let’s begin with the case that gcd(a, m) = 1. Then there exist X, Y ∈ Z with aX + mY = 1. But then m | mY = 1 − aX, so that aX ≡ 1 (mod m). We can multiply this by b to get a(bX) ≡ b (mod m). Therefore x = bX is a solution. If x′ is another solution, then ax ≡ ax′ (mod m), so m | a(x − x′). Since gcd(a, m) = 1, m | x − x′ and so x ≡ x′ (mod m). We have shown that the solution is unique in this case.

In the general case, let d = gcd(a, m). The congruence ax ≡ b (mod m) means that m | ax − b. Since d | m and d | a, we also have d | b. This shows that if there is a solution we must have d | b.

Supposing then that d | b, let a = d a_0, b = d b_0 and m = d m_0. The statement m | ax − b is equivalent to m_0 | a_0 x − b_0, or a_0 x ≡ b_0 (mod m_0). But now gcd(a_0, m_0) = 1, so this new congruence has a unique solution modulo m_0.
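The proof of Theorem 3.2.1 is effectively an algorithm, and it can be sketched in Python. This is only an illustration under stated assumptions: the names extended_gcd and solve_linear_congruence are made up for this sketch, and a and m are assumed to be positive integers.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y == g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def solve_linear_congruence(a, b, m):
    """Solve a*x = b (mod m) for positive a and m.

    Returns (x0, m0), meaning x = x0 (mod m0) with m0 = m / gcd(a, m),
    or None when gcd(a, m) does not divide b.
    """
    d, X, _ = extended_gcd(a, m)      # a*X + m*Y = d
    if b % d != 0:
        return None                   # no solution unless d | b
    m0 = m // d
    # Dividing a*X + m*Y = d by d shows X is inverse to a/d modulo m0.
    x0 = (X * (b // d)) % m0
    return x0, m0


print(solve_linear_congruence(3, 2, 10))   # (4, 10)
print(solve_linear_congruence(2, 4, 10))   # (2, 5)
print(solve_linear_congruence(2, 3, 10))   # None
```

The three calls reproduce the three examples at the start of this section: one solution, a solution modulo 5, and no solution.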

3.3 The Chinese Remainder Theorem

This section is concerned with solving simultaneous congruences such as

x ≡ 2 (mod 7)

x ≡ 5 (mod 6),

where x needs to satisfy both congruences at the same time. We might proceed by listing the solutions to the first congruence: 2, 9, 16, 23, . . . and stopping at the first one that satisfies the second, which is 23. Here’s a different one:

x ≡ 2 (mod 8)

x ≡ 3 (mod 10).

This one does not have any solutions, since those x which satisfy the first congruence are even, and those satisfying the second congruence must be odd.

First we’ll handle the situation that m and n are coprime.

Theorem 3.3.1. Let m and n be coprime integers. Then the system of congruences

x ≡ a (mod m)

x ≡ b (mod n)

has a unique solution modulo mn.

Proof. First we’ll show that a solution exists, and then we’ll show it’s unique mod mn. Since m and n are coprime, there exist integers y and z such that my + nz = 1. Then my ≡ 1 (mod n) and nz ≡ 1 (mod m). So

x = anz + bmy

satisfies x ≡ a (mod m) and x ≡ b (mod n). (Indeed, modulo m the term bmy vanishes and anz ≡ a; the same argument works modulo n.)

For uniqueness: if x′ is another solution, then x − x′ ≡ 0 (mod m) and x − x′ ≡ 0 (mod n). That is, x − x′ is divisible by m and n. Since m and n are relatively prime, x − x′ is divisible by mn, so that x ≡ x′ (mod mn).

The proof suggests a practical solution to the system of congruences: use the Extended Euclidean Algorithm to find y and z such that my + nz = 1, and then use the formula for x above.

If m and n are not necessarily relatively prime, say d = gcd(m, n), then the simultaneous congruence cannot have a solution unless d | a − b.
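The formula x = anz + bmy from the proof can be sketched in Python. This sketch makes two assumptions that are not in the notes: the function name crt_pair is invented, and instead of running the Extended Euclidean Algorithm by hand it uses Python's built-in pow with exponent −1 (available from Python 3.8) to get z and y. Those values satisfy nz ≡ 1 (mod m) and my ≡ 1 (mod n), which is all the proof uses.

```python
def crt_pair(a, m, b, n):
    """Return the x in [0, m*n) with x = a (mod m) and x = b (mod n).

    Assumes m and n are coprime positive integers; pow raises
    ValueError if they are not.
    """
    z = pow(n, -1, m)   # n*z = 1 (mod m)
    y = pow(m, -1, n)   # m*y = 1 (mod n)
    return (a * n * z + b * m * y) % (m * n)


print(crt_pair(2, 7, 5, 6))   # 23
```

The call reproduces the first example of this section, x ≡ 2 (mod 7) and x ≡ 5 (mod 6).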

3.4 Modular Exponentiation

We have already remarked that the division algorithm runs very fast. The operation a (mod m) can be computed in polynomial time, so that it is reasonable to compute even if a and m have hundreds of digits.

The same is true for modular exponentiation, meaning the computation of a^n (mod m). We demonstrate with the example of 3^165 (mod 100). That is, we want the last two digits of 3^165. Certainly we could compute 3^165 and simply write down the last two digits, but this is impractical when the exponent is very large. Instead, we write the exponent in binary:

165 = 2^7 + 2^5 + 2^2 + 1.

Now the idea is to square the base 3 repeatedly, seven times in all, since the largest power of 2 appearing above is 2^7:

3 ≡ 3 (mod 100)

3^2 ≡ 9

3^{2^2} ≡ 81

3^{2^3} ≡ 61

3^{2^4} ≡ 21

3^{2^5} ≡ 41

3^{2^6} ≡ 81

3^{2^7} ≡ 61

Then

3^165 = 3^{2^7} · 3^{2^5} · 3^{2^2} · 3 ≡ 61 · 41 · 81 · 3 ≡ 43 (mod 100).

The number of times you have to square the base is at most the number of binary digits of the exponent, which is proportional to the number of decimal digits. Thus this method can handle exponents which have hundreds of digits. This fact is important for cryptography: it is much easier to exponentiate than it is to do the reverse (extract a root).
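The repeated-squaring method can be written as a short loop. The following Python sketch is illustrative; the name power_mod is an assumption of this sketch, and it assumes a nonnegative exponent and a positive modulus. Python's built-in pow(base, exponent, modulus) computes the same value.

```python
def power_mod(base, exponent, modulus):
    """Compute base**exponent (mod modulus) by repeated squaring."""
    result = 1 % modulus
    square = base % modulus          # holds base^(2^k) at step k
    while exponent > 0:
        if exponent & 1:             # this binary digit of the exponent is 1
            result = (result * square) % modulus
        square = (square * square) % modulus
        exponent >>= 1               # move to the next binary digit
    return result


print(power_mod(3, 165, 100))   # 43
```

The loop reads the binary digits of the exponent from the lowest to the highest, so the successive values of square are exactly the residues 3, 9, 81, 61, 21, 41, 81, 61 listed above.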


3.5 Exercises due February 16

For 1–4, if it’s true, prove it, and if it’s false, give a counterexample.

1. True or False: If a ≡ b (mod m) and c ≡ d (mod n) then ac ≡ bd (mod mn).

2. True or False: If a ≡ b (mod m) and c ≡ d (mod m) then ac ≡ bd (mod m).

3. True or False: the only solutions to x^2 ≡ 1 (mod n) are x ≡ ±1.

4. True or False: if b ≡ c (mod m), then ab ≡ ac (mod m).


5. The multiplicative inverse of a (mod m) is an integer b such that ab ≡ 1 (mod m). Prove that the multiplicative inverse, if it exists, is unique modulo m.

6. Solve 15x ≡ 4 (mod 79).

7. Solve the system of congruences:

z ≡ 1 (mod 50)

z ≡ −1 (mod 71)

8. Compute 3^301 (mod 501).

9. Let n ≥ 0 be an integer, and let m = 2^n + 1. Show that 2^{2n} ≡ 1 (mod m).

10. Let (a, b, c) be a Pythagorean triple. Show that 60|abc.

4 Units modulo m: Fermat’s theorem and Euler’s theorem

4.1 Units

For integers a, b and m, we say that b is a (multiplicative) inverse to a modulo m if ab ≡ 1 (mod m). Of course the relation is mutual: if b is an inverse to a, then a is an inverse to b. You have already seen that an inverse is unique if it exists. An integer which has an inverse modulo m is called a unit modulo m.

Theorem 4.1.1. a has a multiplicative inverse modulo m if and only if gcd(a, m) = 1.

Proof. This is just a special case of Theorem 3.2.1: ax ≡ 1 (mod m) has a solution if and only if gcd(a, m) | 1, which is to say gcd(a, m) = 1.

The most important thing about units is that they can be canceled from both sides of a congruence. That is, if a is a unit modulo m, and ax ≡ ay (mod m), then we can multiply both sides by the inverse of a to get x ≡ y (mod m).

Theorem 4.1.2. The set of units modulo m is closed under multiplication.

Proof. If a and b have inverses c and d, then ab is also a unit, since (ab)(cd) = (ac)(bd) ≡ 1 (mod m).

Let U_m be the set of units modulo m. (This set is also written (Z/mZ)×.) The above theorem means we can create multiplication tables modulo m, like this one for m = 10:

      1   3   7   9
1     1   3   7   9
3     3   9   1   7
7     7   1   9   3
9     9   7   3   1

Observe that every row and every column contains every unit exactly once. (Sometimes I call this the “sudoku property”.) This reflects the fact that if a is a unit mod m, then the linear equation ax ≡ b (mod m) has a unique solution modulo m. Notice also that the table is symmetric about its diagonal: this reflects the fact that ab = ba (multiplication is commutative). In abstract algebra we call this sort of structure an abelian group.

Easy and important exercise: Construct a table like this for m = 5, m = 7 and m = 12. Take note of any patterns you observe.
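If you want to check a hand-made table, a few lines of Python will do it. The helper names units and unit_table below are made up for this sketch; it simply lists the residues coprime to m and multiplies them pairwise.

```python
from math import gcd


def units(m):
    """List the units modulo m among 1, 2, ..., m."""
    return [a for a in range(1, m + 1) if gcd(a, m) == 1]


def unit_table(m):
    """Multiplication table of the units modulo m, one row per unit."""
    us = units(m)
    return [[(a * b) % m for b in us] for a in us]


for unit, row in zip(units(10), unit_table(10)):
    print(unit, row)
```

For m = 10 this prints the four rows of the table above. Sorting any row gives back the list of units, which is the sudoku property.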

4.2 Powers modulo m

Let a be an integer considered modulo m, and consider the sequence of powers

a, a^2, a^3, . . . (mod m).

For instance, here are the powers of 2 modulo m for three values of m:

m     2^1  2^2  2^3  2^4  2^5  2^6  2^7  2^8  2^9  2^10
15    2    4    8    1    2    4    8    1    2    4
16    2    4    8    0    0    0    0    0    0    0
17    2    4    8    16   15   13   9    1    2    4

The first thing we can prove about this is that, since there are only finitely many residues modulo m and infinitely many possible powers, we can find N > n with

a^N ≡ a^n (mod m).

But then, multiplying by a gives a^{N+1} ≡ a^{n+1} as well, and so on; we infer that the sequence a^n, a^{n+1}, . . . , a^{N−1} (mod m) is the same as the sequence a^N, a^{N+1}, . . . , a^{2N−n−1}. In conclusion, the sequence of powers of a modulo m must eventually enter a repeating cycle.

A special case occurs when a is a unit modulo m. Then we can cancel the excess powers in a^N ≡ a^n to get a^{N−n} ≡ 1 (mod m). Thus at some point in the sequence of powers, 1 appears.


Definition 4.2.1. Let a be a unit modulo m. The order of a modulo m, written ord_m(a), is the smallest positive integer n such that a^n ≡ 1 (mod m).

Looking at the table above, ord_15(2) = 4 and ord_17(2) = 8. We’ll resume the study of this ord function a bit later.

4.3 Fermat’s theorem

When p is a prime number, U_p is the set of all nonzero residues 1, 2, . . . , p − 1. Consider the following table listing a^n modulo 7:

n     1^n  2^n  3^n  4^n  5^n  6^n
1     1    2    3    4    5    6
2     1    4    2    2    4    1
3     1    1    6    1    6    6
4     1    2    4    4    2    1
5     1    4    5    2    3    6
6     1    1    1    1    1    1

Strikingly, row 6 has only 1s.

Theorem 4.3.1 (Fermat’s (little) theorem). Let p be a prime number, and let a be a unit modulo p. Then a^{p−1} ≡ 1 (mod p).

Sometimes the theorem is stated a slightly different way: a^p ≡ a (mod p) for all integers a (not just units). The only non-unit modulo p is 0, and of course 0^p ≡ 0, so the two forms are equivalent. We’ll give two proofs of Fermat’s theorem.

#1. This proof is based on the sudoku property of the multiplication table modulo p. For a unit a, the ath row of the table reads

a, 2a, 3a, . . . , (p− 1)a (mod p).

But by the sudoku property, this list of residues is just a reordering of

1, 2, 3, . . . , (p− 1).

This means the product of these two lists is the same:

a · 2a · 3a · · · (p− 1)a ≡ 1 · 2 · 3 · · · (p− 1) (mod p)

The residues 1, 2, 3, . . . , (p − 1) are all units, so we can cancel them; what’s left over is

a^{p−1} ≡ 1 (mod p).


#2. We’re going to prove a^p ≡ a (mod p) for all a = 1, 2, . . . by induction (see footnote 3). The base case 1^p ≡ 1 (mod p) is trivial. Now, assuming n^p ≡ n, we use the binomial theorem:

(n + 1)^p = n^p + \binom{p}{1} n^{p−1} + \binom{p}{2} n^{p−2} + · · · + \binom{p}{1} n + 1.

The binomial coefficients are

\binom{p}{k} = p! / (k! (p − k)!) ∈ Z.

If k = 1, . . . , p − 1, then neither k! nor (p − k)! is divisible by p (by Theorem 2.2.2!), but p does divide p! = \binom{p}{k} k! (p − k)!, so (Theorem 2.2.2 again!) p | \binom{p}{k}. Therefore (n + 1)^p ≡ n^p + 1 (mod p), so that by the inductive hypothesis (n + 1)^p ≡ n + 1. We win by induction.

4.4 The φ function

Definition 4.4.1. For an integer m, φ(m) is the number of units modulo m. In other words, it is the number of integers among 1, 2, . . . , m which are relatively prime with m. This function is sometimes called Euler’s totient function.

The first few values of φ(m) are

Footnote 3: The principle of mathematical induction is a way of proving a proposition P(n) for all n = 1, 2, . . . . It says that if P(1) is true, and if the implication P(n) =⇒ P(n + 1) is true for any n ≥ 1, then P(n) is true for all n. But we don’t need to assume this as an axiom; it follows from the well-ordering principle! Indeed, if there were some n for which P(n) were false, then by hypothesis n ≠ 1. Also P(n − 1) could not be true, since it implies P(n). Again by hypothesis, n − 1 ≠ 1. Continuing, we find a sequence of positive integers which descends indefinitely, contradiction.


m     φ(m)
1     1
2     1
3     2
4     2
5     4
6     2
7     6
8     4
9     6
10    4

The first thing I notice is that φ(m) appears to be even for m ≥ 3. (This follows from the fact that the units come in pairs a and −a.)

But of course we might want a formula for φ(m). One easy special case is that when p is a prime number, φ(p) = p − 1, since the units are exactly 1, 2, . . . , p − 1. Another case is a prime power p^n: among the numbers 1, 2, . . . , p^n, the only non-units modulo p^n are those numbers divisible by p, so that φ(p^n) = p^n − p^{n−1}.

Theorem 4.4.2. For m and n relatively prime, φ(mn) = φ(m)φ(n).

Proof. (This is just a sketch.) We apply the Chinese remainder theorem. Each unit a modulo mn can be reduced modulo m and then modulo n, to create a function U_{mn} → U_m × U_n. The Chinese remainder theorem shows that this function is one-to-one and onto, so that φ(mn) = φ(m)φ(n).

By combining together what we know so far about φ, we get the following formula.

Theorem 4.4.3. If p_1^{a_1} · · · p_r^{a_r} is the prime factorization of n, then

φ(n) = ∏_{i=1}^{r} (p_i^{a_i} − p_i^{a_i − 1}).

Note that this requires knowing the prime factorization of n. As far as we know there is no shortcut to finding φ(n) without knowing the prime factorization. Therefore if n has hundreds of digits, φ(n) is very difficult to compute.
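For small n the formula of Theorem 4.4.3 is easy to turn into code. The Python sketch below factors n by trial division, which is an assumption of this sketch and is only practical for small n; the name euler_phi is also made up. For each prime power p^a exactly dividing n it multiplies in the factor p^a − p^{a−1}.

```python
def euler_phi(n):
    """Euler's phi function of a positive integer n, via trial-division factoring."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            prime_power = 1
            while n % p == 0:        # strip off the full power of p
                n //= p
                prime_power *= p
            result *= prime_power - prime_power // p   # p^a - p^(a-1)
        p += 1
    if n > 1:                        # a single prime factor is left over
        result *= n - 1
    return result


print([euler_phi(m) for m in range(1, 11)])
# [1, 1, 2, 2, 4, 2, 6, 4, 6, 4]
```

The printed list agrees with the table of φ(m) for m = 1, . . . , 10. The trial-division loop is exactly the step that becomes infeasible when n has hundreds of digits.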

4.5 Euler’s theorem

Fermat’s theorem has an extension to general moduli m. In fact we can just adapt proof #1 of Fermat’s theorem to obtain Euler’s theorem:

Theorem 4.5.1. Let a be a unit modulo m. Then a^{φ(m)} ≡ 1 (mod m).


4.6 Exercises due February 23

1. Compute 23506 (mod 101).

2. Compute 23111

(mod 47).

3. Compute φ(75000).

4. Compute 5^1000 (mod 18).

5. Prove that if p is prime, and x^2 ≡ 1 (mod p), then x ≡ ±1 (mod p).

6. Prove that if p is an odd prime, and a is a unit mod p, then a^{(p−1)/2} ≡ ±1 (mod p).

7. How many solutions are there to x^2 ≡ 1 (mod n), where n is a product of r distinct primes?

8. Prove Wilson’s theorem: If p is prime, then (p − 1)! ≡ −1 (mod p). Strategy: each a = 1, . . . , p − 1 has a multiplicative inverse b, and then a and b are distinct unless a = ±1.

9. Fermat’s theorem suggests the following test for primality: if a is a unit mod m, and a^{m−1} ≢ 1 (mod m), then m cannot be prime. Compute 2^118 (mod 119), and use this method to show that 119 is composite.

10. Unfortunately, this method is not foolproof. The number 561 is composite: 561 = 3 · 11 · 17. Nevertheless, show that for all units a modulo 561, a^560 ≡ 1 (mod 561).

5 Orders and primitive elements

5.1 Basic properties of the function ordm

Let a be a unit modulo m. Recall that a^{ord_m(a)} ≡ 1 (mod m), and a^n ≢ 1 (mod m) for any integer 1 ≤ n < ord_m(a). Thus if we do find a positive integer n with a^n ≡ 1 (mod m), we can conclude that ord_m(a) ≤ n. In fact a little more is true:

Theorem 5.1.1. Suppose that a^n ≡ 1 (mod m). Then ord_m(a) | n.

Proof. By the division algorithm, we can write n = q ord_m(a) + r, where 0 ≤ r < ord_m(a). Then

1 ≡ a^n ≡ (a^{ord_m(a)})^q a^r ≡ 1^q a^r ≡ a^r (mod m).

If r ≠ 0, we get a contradiction, since r < ord_m(a). Thus r = 0 and n = q ord_m(a).

Here’s an important corollary. By Euler’s theorem, a^{φ(m)} ≡ 1 (mod m), and therefore

ord_m(a) | φ(m).    (5.1.1)

This is a strong restriction on what ord_m(a) could possibly be. It means that if we are interested in finding ord_m(a), we don’t need to compute all the powers a, a^2, . . . modulo m, stopping when we reach 1. Instead, we can compute a^n for all divisors n of φ(m). The order ord_m(a) is the least divisor n for which a^n ≡ 1 (mod m).
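The divisor search just described can be sketched in Python. The names phi and order_via_divisors are assumptions of this sketch, and φ(m) is computed here by direct counting, which is only reasonable for small m.

```python
from math import gcd


def phi(m):
    """Count the integers in 1..m that are coprime to m (small m only)."""
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)


def order_via_divisors(a, m):
    """Return ord_m(a) by testing only the divisors of phi(m), smallest first."""
    if gcd(a, m) != 1:
        raise ValueError("a must be a unit modulo m")
    t = phi(m)
    for n in range(1, t + 1):
        if t % n == 0 and pow(a, n, m) == 1 % m:
            return n


print(order_via_divisors(2, 15), order_via_divisors(2, 17))   # 4 8
```

The two printed values match ord_15(2) = 4 and ord_17(2) = 8 read off from the table of powers of 2 in Section 4.2.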

Theorem 5.1.2. For an integer n, ord_m(a^n) = ord_m(a) / gcd(n, ord_m(a)).

Proof. We have

(a^n)^{ord_m(a)/gcd(n, ord_m(a))} = (a^{ord_m(a)})^{n/gcd(n, ord_m(a))} ≡ 1^{n/gcd(n, ord_m(a))} ≡ 1 (mod m),

so that ord_m(a^n) ≤ ord_m(a) / gcd(n, ord_m(a)). On the other hand, we have

a^{n ord_m(a^n)} = (a^n)^{ord_m(a^n)} ≡ 1 (mod m).

Therefore by the previous theorem ord_m(a) | n ord_m(a^n), so that

ord_m(a) / gcd(n, ord_m(a))  |  (n / gcd(n, ord_m(a))) · ord_m(a^n).

The two quotients ord_m(a) / gcd(n, ord_m(a)) and n / gcd(n, ord_m(a)) are relatively prime, so by Lemma 2.2.1, ord_m(a) / gcd(n, ord_m(a)) | ord_m(a^n). Combined with the inequality above, this gives equality.

5.2 Primitive roots

We have seen that ord_m(a) | φ(m) for every unit a modulo m. Sometimes it happens that ord_m(a) = φ(m). This happens for instance with 3 modulo 7. The powers of 3 modulo 7 are 1, 3, 2, 6, 4, 5, 1, . . . . Notice that all units modulo 7 appear in this sequence.

Definition 5.2.1. A unit a is a primitive root modulo m if ord_m(a) = φ(m).

To determine whether a is a primitive root, you can calculate a^{φ(m)/p} (mod m) for every prime p which divides φ(m). If none of these residues is 1, then a is a primitive root.

Here is a chart of the first few positive integers m and their primitive roots.

m     prim. roots mod m
1     1
2     1
3     2
4     3
5     2, 3
6     5
7     3, 5
8     none
9     2, 5
10    3, 7
11    2, 6, 7, 8
12    none

Later we’ll tackle the question of which m have primitive roots. It turns out that a primitive root exists whenever m is prime.
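The primitive-root test described above (compute a^{φ(m)/p} for each prime p dividing φ(m)) is short enough to sketch in Python. All function names here are invented for the sketch, φ(m) is found by counting, and the prime divisors of φ(m) are found by trial division, so this is meant for small m only.

```python
from math import gcd


def phi(m):
    """Count the integers in 1..m that are coprime to m (small m only)."""
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)


def prime_divisors(n):
    """List the distinct primes dividing n, by trial division."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def is_primitive_root(a, m):
    """True if a is a unit mod m and a^(phi(m)/p) != 1 for every prime p | phi(m)."""
    if gcd(a, m) != 1:
        return False
    t = phi(m)
    return all(pow(a, t // p, m) != 1 % m for p in prime_divisors(t))


def primitive_roots(m):
    return [a for a in range(1, m + 1) if is_primitive_root(a, m)]


print(primitive_roots(11))   # [2, 6, 7, 8]
print(primitive_roots(8))    # []
```

Both outputs agree with the chart: 11 has the four primitive roots 2, 6, 7, 8, and 8 has none.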

The following theorem explains the term “primitive root”.

Theorem 5.2.2. Let a be a primitive root modulo m. Then for every unit u modulo m, there exists n ∈ Z such that u ≡ a^n (mod m). Furthermore, n is unique modulo φ(m).

Thus, every unit can be generated from a primitive root.

Proof. We claim that the residues

1, a, a^2, . . . , a^{φ(m)−1}

are all distinct modulo m. Indeed if two of them were the same, say a^i ≡ a^j (mod m) for 0 ≤ i < j < φ(m), then a^{j−i} ≡ 1 (mod m), which is a contradiction because 0 < j − i < φ(m). Also, all of these powers are units. But this list contains φ(m) elements, and that is exactly how many units there are. So the list must contain every unit exactly once.

For uniqueness: if a^n ≡ a^{n′} (mod m), then a^{n′−n} ≡ 1 (mod m), so that by Theorem 5.1.1 ord_m(a) = φ(m) | n′ − n, meaning that n′ ≡ n (mod φ(m)).

Theorem 5.2.3. Suppose a is a primitive root modulo m. Then the full set of primitive roots modulo m is

{ a^n | 1 ≤ n ≤ φ(m), gcd(n, φ(m)) = 1 }.

Thus the number of primitive roots modulo m is φ(φ(m)).

Proof. By Theorem 5.2.2, it suffices to say when a^n is a primitive root. By Theorem 5.1.2, ord_m(a^n) = φ(m) / gcd(n, φ(m)). Thus a^n is a primitive root if and only if gcd(n, φ(m)) = 1.

5.3 The discrete logarithm

Let m be an integer, and let b be a primitive root modulo m. By Theorem 5.2.2, every unit a is a power of b:

a ≡ b^k (mod m).

Here the integer k may be considered modulo φ(m). We set

k = log_b(a),

and call this the discrete logarithm of a to the base b. For instance, 2 is a primitive root modulo 11, and 2^4 ≡ 5 (mod 11), so log_2(5) = 4. (You have to deduce from context that we are referring to the discrete logarithm here, and not the usual one.) The discrete logarithm obeys some of the usual rules that logarithms do, only modulo φ(m):

log_b(xy) ≡ log_b(x) + log_b(y) (mod φ(m))

log_b(x^n) ≡ n log_b(x) (mod φ(m))

Unlike the case of usual logarithms, discrete logarithms are not easy to compute. If m has hundreds of digits, one knows that there exists a k that makes b^k ≡ a (mod m) true, but finding this k is not at all straightforward. There are algorithms to do so, but none that we know so far runs in polynomial time. Thus, the discrete logarithm is hard to compute.
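To see why size matters, here is the most naive way to find a discrete logarithm, sketched in Python: try k = 0, 1, 2, . . . until b^k hits the target. The name discrete_log is made up for this sketch. The loop can take about m steps, which is exponential in the number of digits of m, in contrast with the repeated-squaring method for exponentiation.

```python
def discrete_log(base, target, modulus):
    """Return the least k >= 0 with base**k = target (mod modulus), or None."""
    power = 1 % modulus
    for k in range(modulus):
        if power == target % modulus:
            return k
        power = (power * base) % modulus
    return None


print(discrete_log(2, 5, 11))   # 4
```

This recovers log_2(5) = 4 modulo 11 from the example above. For a modulus with hundreds of digits the same loop would never finish in practice.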

5.4 Existence of primitive roots for a prime modulus

Here we will address the question of the existence of primitive roots modulo a prime. The proof is a little involved, so we’ll demonstrate the main idea with an example. Suppose we want to show that there exists a primitive root modulo 59. This means finding a unit of order 58. By (5.1.1), the possible orders of units all divide 58, so they must be 1, 2, 29 or 58. The only element of order 1 is 1, and the only element of order 2 is −1. (This is proved in your exercises from last week; it is here that we use the fact that 59 is prime.) But there are more than 2 units! Therefore there exists an element of order 29 or 58. If there’s an element of order 58, great; that’s a primitive root. Otherwise, suppose x is an element of order 29. What is the order of −x? It must be 29 or 58, since x ≢ ±1 (mod 59). But (−x)^29 = −x^29 ≡ −1 (mod 59), so that −x must be a primitive root.

In order for the above proof to work, it was important to know that x^2 ≡ 1 (mod 59) could have only two solutions, namely ±1. This is a special case of the following theorem:

Theorem 5.4.1. Let f(x) = x^n + a_{n−1} x^{n−1} + · · · + a_0 be a polynomial with integer coefficients, and let p be a prime. Then f(x) ≡ 0 (mod p) can have no more than n distinct solutions modulo p.

Proof. The proof will follow from the following fact which is familiar from algebra: If f(r) ≡ 0 (mod p), then we can write

f(x) ≡ (x − r)g(x) (mod p)

for some polynomial g(x), whose degree is n − 1. (This is a congruence between polynomials: it means that corresponding coefficients on either side are congruent.) This is easy to see when r = 0, because if f(0) ≡ 0 (mod p) it means that a_0 ≡ 0 (mod p), so that f(x) (mod p) is divisible by x. In general, we can substitute: f(x + r) has 0 as a root, so f(x + r) ≡ x h(x), and so (substituting back) f(x) ≡ (x − r) h(x − r).

Now suppose f(x) has n distinct roots r_1, . . . , r_n modulo p. Then f(x) ≡ (x − r_1) f_2(x). Plugging in x = r_2, we get 0 ≡ f(r_2) ≡ (r_2 − r_1) f_2(r_2). But since r_2 ≢ r_1, we can use Corollary 2.2.2 to get f_2(r_2) ≡ 0 (mod p). Thus (x − r_2) can be factored out of f_2(x): f(x) ≡ (x − r_1)(x − r_2) f_3(x). Continuing, we get

f(x) ≡ (x − r_1) · · · (x − r_n) (mod p).

(There can be nothing left over, because both sides are degree n with unit leading coefficients.) Again by Corollary 2.2.2, there cannot be a root of this other than r_1, . . . , r_n.

Lemma 5.4.2. Suppose m and n are relatively prime. If ord_p(x) = m and ord_p(y) = n, then ord_p(xy) = mn.

Proof. Let d = ord_p(xy). On the one hand, (xy)^{mn} = (x^m)^n (y^n)^m ≡ 1 (mod p), so that d | mn. On the other hand, 1 ≡ (xy)^{md} ≡ y^{md}, so that by Theorem 5.1.1, n | md, and so (Lemma 2.2.1) n | d. Similarly m | d, and so (since m and n are coprime) mn | d.

Now we return to the problem of finding a primitive root modulo a prime p. Suppose φ(p) = p − 1 factors as ℓ_1^{n_1} · · · ℓ_t^{n_t}. That is, val_{ℓ_i}(p − 1) = n_i for i = 1, . . . , t. We first claim that for each i there exists a unit u with val_{ℓ_i} ord_p(u) = n_i. Assume otherwise: then ord_p(u) would divide (p − 1)/ℓ_i for every unit u, which would mean that u^{(p−1)/ℓ_i} ≡ 1 (mod p) for every unit u. But this contradicts Theorem 5.4.1, because it would mean that the polynomial x^{(p−1)/ℓ_i} − 1 has p − 1 roots modulo p.

Therefore there exists, for each i, a unit u_i with val_{ℓ_i} ord_p(u_i) = n_i. Let v_i = u_i^{ord_p(u_i)/ℓ_i^{n_i}}; then by Theorem 5.1.2 we have ord_p(v_i) = ℓ_i^{n_i}. Let v = v_1 · · · v_t. By Lemma 5.4.2, ord_p(v) = ℓ_1^{n_1} · · · ℓ_t^{n_t} = p − 1, so that v is a primitive root. We have proved:

Theorem 5.4.3. Let p be a prime. There exists a primitive root modulo p.

Note that the above proof is not constructive! That is, it doesn’t give us an algorithm to find a primitive root modulo p. If p is large, we don’t have a great way of finding a primitive root. I will say however that if we happen to know all the prime factors of p − 1, then we can quickly check if a given unit u is primitive (by testing u^{(p−1)/ℓ} ≢ 1 for all primes ℓ dividing p − 1), so one might simply test units 2, 3, . . . until one finds a primitive root.

5.5 Exercises due March 2

These exercises constitute your midterm. You may refer to the notes, but not to any outside sources, and you must work on your own (see footnote 4).

1. Find integers x, y, z such that

55x+ 35y + 77z = 1.

Please show your method.

2. Let n be an integer. Show that n^13 − n is divisible by 2730.

3. True or false: for units a and b modulo m, ord_m(ab) = ord_m(a) ord_m(b). (If true, prove it, if false, give a counterexample.)

Footnote 4 (added Monday Feb. 26): I shouldn’t have to say this, but there are some very real consequences for handing in work that is not your own on an exam. I won’t hesitate to report plagiarism or copying to the Dean.

4. True or false: If a is a unit modulo m, and a^r ≡ a^s ≡ 1 (mod m), then a^{gcd(r,s)} ≡ 1 (mod m). (If true, prove it, if false, give a counterexample.)

5. True or false: if p is a prime, and a^3 ≡ 1 (mod p), then a ≡ 1 (mod p). (If true, prove it, if false, give a counterexample.)

6. Find all primitive roots modulo 17.

7. The decimal expansion of 1/7 is .142857. It repeats with period 6. Find all other integers n such that 1/n has period 6. (You may assume that n is coprime with 10.)

8. The number p = 2^16 + 1 is prime. Find ord_p(2).

9. Suppose p is a prime, such that p ≡ 1 (mod 4). Let b be a primitive root modulo p, and let x = b^{(p−1)/4}. Show that x^2 ≡ −1 (mod p).

10. Suppose p is a prime, such that p ≡ 3 (mod 4). Show that x^2 ≡ −1 (mod p) has no solutions. (Hint: Raise both sides to the power of (p − 1)/2.)

6 Some cryptographic applications

6.1 The basic problem of cryptography

Cryptography is the art of sending messages securely. Cryptographers speak of fictional characters Alice, Bob and Eve. Alice and Bob are far apart, and Alice wants to send Bob a private message. (For instance, Alice could be a customer sending her credit card information to Bob’s online store.) If she sends the message directly (via snail mail, courier, wire or e-mail: the medium doesn’t matter!), then Eve the eavesdropper could intercept it, which would be a disaster. So Alice should encrypt her message in some way and send the coded message, so that Eve would not be able to understand it. But then how is Bob supposed to understand it?

It almost sounds logically impossible for this to work, but in fact it can be done using some basic number theory.

6.2 Ciphers, keys, and one-time pads

Since we’re going to use mathematics, it makes sense to agree upon a way to turn the message into a number. This can be accomplished with a simple substitution (01 for A, 02 for B, etc.), or something more sophisticated (like ASCII). We are going to assume that this substitution is known to all parties (Alice, Bob, and also Eve). Thus Alice wants to send a large number M (perhaps in the hundreds of digits) to Bob.

A natural way to do this is a simple substitution cipher: 0 can be replaced with 5, 1 with 3, 4 with 7, etc. (Or perhaps the cipher can be a little more complicated, with a rule for pairs or triples of digits.) Perhaps Alice and Bob have met earlier to agree on which cipher to use. But such a cipher is relatively easy for Eve to crack: the regularities of language make it easy to guess which letter corresponds to which sequences of numbers. (Indeed, sometimes there are puzzles in the newspaper which ask to solve such a cipher.)

Another idea is to use a key K. This is a random number with approximately the same size as M, which is known to Alice and Bob and no one else. To send a secure message, Alice can send C = M + K to Bob, who can then compute C − K = M. This has the advantage of being virtually unbreakable: since K is random, Eve has no way of guessing it and breaking the code. But it has some major disadvantages too: Alice and Bob would have had to meet in advance to agree on the key K (this is impractical if Alice is a customer at Bob’s online store!), and they both need to keep K secure as they travel. Not only that, but the key should only be used once: if Alice wants to send another message M′, she sends C′ = M′ + K. Then Eve, who has intercepted both C and C′, can compute C − C′ = M − M′, the difference between the two messages, which is too risky.

This last problem can be overcome if Alice and Bob share a one-time pad: a whole collection of keys K_1, K_2, . . . , all random and unrelated to one another, so that Alice can send Bob as many messages as there are keys. But this still has the problem that Alice and Bob need to agree on these keys in a secure location, which is often impractical.

6.3 Diffie-Hellman key exchange

Remarkably, there is a way for Alice and Bob to agree on a key K without ever meeting, in such a way that Eve cannot determine K even if she intercepts all communications.

As a warm-up, here’s a riddle: Suppose Alice is sending a suitcase to Bob containing sensitive material. Both Alice and Bob own padlocks that can lock the suitcase, but the padlocks have different keys. How can Alice securely send Bob the suitcase?

Here’s the solution: Alice locks the suitcase with her lock and sends it to Bob. Bob receives it and places his own padlock on it, and sends it back to Alice with both locks. Alice then removes her own lock and sends it a third time to Bob, who removes his own padlock and opens the suitcase.

In Diffie-Hellman key exchange, the idea of the riddle is combined with number theory. Alice chooses a large prime p, at least in the hundreds of digits and certainly larger than her message M. By Theorem 5.4.3, there exists a primitive root g modulo p. Alice finds one and makes both g and p public. (There is the good question of how quickly one can find a primitive root; we won’t be so concerned with this. If the factorization of p − 1 is known, it is easy to check that a particular unit is a primitive root; so one can guess and check until a primitive root is found.)

Alice and Bob both choose secret numbers a and b, respectively. These should be very large but still less than p. They should also be relatively prime to p − 1. Alice calculates A = g^a (mod p), and Bob computes B = g^b (mod p) (remember that modular exponentiation runs in polynomial time, so this is reasonable for them to do). The next steps are:

1. Alice sends A to Bob.

2. Bob sends B to Alice.

3. Alice computes B^a (mod p).

4. Bob computes A^b (mod p).

In fact Alice and Bob have computed the same quantity, since

B^a ≡ (g^b)^a ≡ (g^a)^b ≡ A^b (mod p).

Call this common value K. Then K is the key that Alice and Bob can use to encode messages between each other. The whole process is called Diffie-Hellman key exchange.

Why is it secure? Let’s say Eve wants to spy on Alice and Bob. She knows the prime p and its primitive root g, because these are public. She intercepts A and B. Can she use them to compute K in a reasonable amount of time?

It is believed that the answer is no. The Diffie-Hellman problem is: Given g^a and g^b modulo p, compute g^{ab} modulo p. This is what Eve has to solve to get the private key K. Note the relationship with the problem of computing discrete logarithms. If Eve has a magical discrete-log calculator, she can compute a = log_g A and b = log_g B and then easily get g^{ab} (mod p). But as far as we know there is no rapid way to compute discrete logarithms, and also no way to solve the Diffie-Hellman problem without them.
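To make the exchange concrete, here is a toy run in Python with deliberately tiny numbers. Everything specific in it is an assumption chosen for illustration: the prime p = 23, the primitive root g = 5, the secrets a = 7 and b = 13 (both relatively prime to p − 1 = 22), and the function name diffie_hellman_demo. Real parameters would have hundreds of digits.

```python
def diffie_hellman_demo(p, g, a, b):
    """Run the four steps of the exchange and return (A, B, Alice's key, Bob's key)."""
    A = pow(g, a, p)            # Alice computes A and sends it to Bob
    B = pow(g, b, p)            # Bob computes B and sends it to Alice
    key_alice = pow(B, a, p)    # Alice computes B^a (mod p)
    key_bob = pow(A, b, p)      # Bob computes A^b (mod p)
    return A, B, key_alice, key_bob


A, B, key_alice, key_bob = diffie_hellman_demo(23, 5, 7, 13)
print(key_alice == key_bob)     # True
```

Eve sees p, g, A and B but not a or b. With p = 23 she could find a by trying all exponents; the point of choosing a huge p is to make that search hopeless.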


6.4 RSA

The RSA algorithm is another number-theory based encryption method. It allows Alice to directly encrypt her message to Bob. Its security is based on the difficulty of factoring large integers.

Bob is the intended recipient of secure messages. He chooses two large primes p and q, and computes N = pq. Bob publishes N but keeps its factorization secret. Bob has access to φ(N) = (p − 1)(q − 1). We remark that knowledge of φ(N) is equivalent to knowledge of p and q. Indeed, if you know φ(N) = N − (p + q) + 1, then you know p + q and pq = N, from which you can solve for p and q.

Bob also chooses a private decryption key d. The number d can be small, but it should not be 1. It should also be relatively prime to φ(N). Secretly, Bob computes the inverse of d modulo φ(N). That is, he finds an integer e such that de ≡ 1 (mod φ(N)). This is the public encryption key. Bob publishes e.

Alice would like to use RSA to send a secure message to Bob. Her message takes the form of an integer M which is less than N. (If her message is longer than N, she can break it up into smaller chunks. Also, if her message is particularly short, she should use a simple “padding” process to make sure that M is almost as large as N.)

Since the encryption key e is public, Alice can use it to compute C = M^e (mod N). This is the encrypted message. Alice sends it to Bob. To decrypt the message, Bob computes C^d (mod N). This works because

C^d ≡ (M^e)^d ≡ M^{ed} ≡ M (mod N).

Why is the last congruence true? If M is relatively prime to N, it follows from Euler’s theorem: Since ed ≡ 1 (mod φ(N)), we have M^{ed} ≡ M (mod N). (It’s still true even in the unlikely event that M is divisible by p or q; you should figure this out for yourself.)

Now suppose Eve overhears everything. She knows N, e and C = M^e (mod N). To figure out M, she needs to extract an eth root of C modulo N. This is known as the RSA problem. If Eve can factor N, she can compute φ(N) and then use Euclid’s algorithm to compute d (the inverse of e modulo φ(N)), and then compute M the same way that Bob did.

It is believed that solving the RSA problem is very difficult. But there is no proof that it can’t be done efficiently. For all we know, a criminal mastermind has already solved the problem and therefore can break RSA-based cryptosystems. The only evidence to the contrary is that very smart people have tried and failed to solve the RSA problem.
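Here is a toy RSA round trip in Python, following the order of steps in this section (Bob picks d first, then computes e). The specific numbers p = 61, q = 53, d = 2753 and the function names are assumptions made for illustration; they are far too small to be secure. The inverse is computed with Python's built-in pow with exponent −1 (Python 3.8 or later), which stands in for Euclid's algorithm.

```python
def rsa_public_key(p, q, d):
    """Bob's setup: return (N, e) where e is the inverse of d modulo phi(N)."""
    N = p * q
    phi_N = (p - 1) * (q - 1)
    e = pow(d, -1, phi_N)       # raises ValueError if gcd(d, phi(N)) != 1
    return N, e


def rsa_encrypt(M, e, N):
    """Alice's step: C = M^e (mod N)."""
    return pow(M, e, N)


def rsa_decrypt(C, d, N):
    """Bob's step: M = C^d (mod N)."""
    return pow(C, d, N)


p, q, d = 61, 53, 2753          # toy values, kept secret by Bob
N, e = rsa_public_key(p, q, d)  # N = 3233 and e = 17 are published
C = rsa_encrypt(65, e, N)
print(rsa_decrypt(C, d, N))     # 65
```

The round trip also works when M happens to be divisible by p or q, as claimed above; try M = 61.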


7 Quadratic Residues

7.1 Which numbers are squares?

Which numbers are perfect squares? In other words, given n, when does √n make sense? The answer depends very much on what sort of number system we are working with:

• In the real numbers R, the squares are the nonnegative numbers.

• In the complex numbers C, every number is a square.

• In the integers Z, it is easy to decide whether n is a square. If n < 0 it is certainly not. If n ≥ 0, we can use a calculator to compute the real number √n; if anything appears past the decimal point, n is not a square. Thus, deciding whether n is a perfect square can be done in polynomial time.

• In the rational numbers Q, a positive reduced fraction p/q is a square if and only if both p and q are.

Much less obvious is the question of perfect squares in Z/mZ. That is, given an integer a, we would like to know if there is a solution to

x^2 ≡ a (mod m).

(This is the natural progression of things: we have already solved linear congruences modulo m, and now we are moving on to degree 2 equations.) If a solution exists, we call a a quadratic residue modulo m; otherwise it is a quadratic nonresidue. (These terms are due to Gauss.)

For instance, 10 is a square modulo 13 because 7^2 ≡ 10 (mod 13). Is 2 a square modulo 13? We can answer the question using a chart like this:

x     x^2 (mod 13)
0     0
1     1
2     4
3     9
4     3
5     12
6     10
7     10
8     12
9     3
10    9
11    4
12    1

Since 2 doesn’t appear in the second column, it is a quadratic nonresidue modulo 13. Note that the second column is palindromic (ignoring the initial zero), because $(-x)^2 = x^2$. So to answer the question of whether 2 was a quadratic residue, it was only really necessary to compute the squares of 0, 1, . . . , 6.

This method is horribly inefficient for large values of m. It takes about m/2 steps to decide if a is a quadratic residue modulo m this way, which is exponential in the number of digits of m and therefore unacceptable.
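For small moduli the chart method is easy to automate. The following Python sketch (the function name is ours, chosen for illustration) lists the squares in Z/mZ by brute force, using the palindrome observation to stop at m/2.

```python
def quadratic_residues(m):
    """Return the sorted list of squares in Z/mZ, found by brute force."""
    # Since (-x)^2 = x^2, squaring 0, 1, ..., m // 2 already hits every square.
    return sorted({x * x % m for x in range(m // 2 + 1)})

print(quadratic_residues(13))        # [0, 1, 3, 4, 9, 10, 12]
print(2 in quadratic_residues(13))   # False
```

The loop still runs about m/2 times, so this is only practical when m is small.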

7.2 Euler’s criterion

If the modulus is a prime number p, there is a far better way to decide if a is a quadratic residue.

Theorem 7.2.1 (Euler’s criterion). Let p be an odd prime. Suppose that a is a unit modulo p. Then a is a quadratic residue if and only if

\[ a^{(p-1)/2} \equiv 1 \pmod{p}. \]

Proof. If $a \equiv x^2 \pmod{p}$, then

\[ a^{(p-1)/2} \equiv (x^2)^{(p-1)/2} \equiv x^{p-1} \equiv 1 \pmod{p} \]

by Fermat’s theorem.

Conversely, suppose $a^{(p-1)/2} \equiv 1 \pmod{p}$. By Theorem 5.4.3, there exists a primitive root g modulo p; let us write $a \equiv g^k \pmod{p}$. Then

\[ 1 \equiv a^{(p-1)/2} \equiv g^{k(p-1)/2} \pmod{p}. \]

Since $\mathrm{ord}_p(g) = p - 1$, Theorem 5.1.1 implies that $(p-1) \mid k(p-1)/2$. Cancelling the integer (p − 1)/2 from both sides gives us 2 | k, so that k = 2ℓ. Therefore $a \equiv g^k \equiv (g^{\ell})^2 \pmod{p}$ is a quadratic residue.

Theorem 7.2.2. Let p be an odd prime. There are exactly (p + 1)/2 quadratic residues modulo p. (Since 0 is obviously a quadratic residue, this is the same as saying that there are exactly (p − 1)/2 quadratic residues which are units.)

Proof. We have already observed that the complete list of unit quadratic residues is

\[ 1^2, 2^2, \ldots, ((p-1)/2)^2 \pmod{p}. \]

We are done if we can show that the members of this list are distinct. Suppose 1 ≤ x, y ≤ (p − 1)/2 and $x^2 \equiv y^2 \pmod{p}$. Then $p \mid x^2 - y^2 = (x-y)(x+y)$, so that (Lemma 2.2.1) p | (x − y) or p | (x + y), which is to say x ≡ ±y (mod p). Since x, y belong to the range 1, . . . , (p − 1)/2, x ≡ −y is impossible, so that x ≡ y (mod p).

Euler’s criterion gives a polynomial time algorithm for deciding whether a unit a is a quadratic residue modulo an odd prime p. However, Euler’s criterion does not tell us how to find a solution to $x^2 \equiv a \pmod{p}$. This is a harder problem.
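As a minimal sketch of Euler’s criterion in practice (the function name is an assumption of ours), Python’s built-in three-argument `pow` performs the modular exponentiation by repeated squaring:

```python
def is_quadratic_residue(a, p):
    """Euler's criterion: a unit a is a square mod an odd prime p
    if and only if a^((p-1)/2) is congruent to 1 mod p."""
    if a % p == 0:
        raise ValueError("a must be a unit modulo p")
    return pow(a, (p - 1) // 2, p) == 1

print([a for a in range(1, 13) if is_quadratic_residue(a, 13)])
# [1, 3, 4, 9, 10, 12]
```

The output agrees with the chart for p = 13, but the test itself never searches for a square root.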

The following theorem is another interpretation of the problem in terms of discrete logarithms.

Theorem 7.2.3. Let p be an odd prime and let a be a unit modulo p. Let g be a primitive root modulo p. Then a is a quadratic residue modulo p if and only if $\log_g(a)$ is even.

Proof. Let $k = \log_g(a)$, so that $a \equiv g^k \pmod{p}$. If k is even, then a is obviously a quadratic residue. Conversely if $a \equiv x^2$, then $\log_g(a) \equiv \log_g(x^2) \equiv 2\log_g(x) \pmod{p-1}$. Since p − 1 is even, this implies that $\log_g(a)$ is even as well.

We remark that the quadratic residues modulo p are 0 together with

\[ 1, g^2, g^4, \ldots, g^{p-3}. \]

A special case is a = −1. When is −1 a quadratic residue modulo p? Informally, we are asking whether the imaginary number i exists modulo p.

Theorem 7.2.4. Let p be an odd prime. Then −1 is a quadratic residue modulo p if and only if p ≡ 1 (mod 4).

Proof. This follows right away from Euler’s criterion, since $(-1)^{(p-1)/2}$ is 1 if and only if p ≡ 1 (mod 4).


(This is assignment #5.)

1. List the quadratic residues modulo 13.

2. How many quadratic residues are there modulo 9, 25, 27? Formulate a conjecture about the number of squares modulo $p^n$, where p is an odd prime and n ≥ 1.

3. The number $p = 2^8 + 1$ is prime. Decide if 2 is a quadratic residue modulo p. Do the same for $p = 2^{16} + 1$.

4. Let $m = p_1 \cdots p_n$ be a product of distinct odd primes $p_i$. How many units modulo m are squares?

5. (2 pts) Let $m = p_1 \cdots p_n$ be a product of distinct odd primes $p_i$. Under what conditions does $x^2 \equiv -1 \pmod{m}$ have a solution? How many solutions are there?

6. (2 pts) Let p be an odd prime, and let a be an integer. Prove that there exists a solution to $x^2 + y^2 \equiv a \pmod{p}$.

7. (2 pts) Let p be an odd prime, and let x = [(p − 1)/2]!. Prove that

\[ x^2 \equiv (-1)^{(p+1)/2} \pmod{p}. \]

(You will need Wilson’s theorem, (p − 1)! ≡ −1 (mod p).) This gives another proof that if p ≡ 1 (mod 4), then $x^2 \equiv -1 \pmod{p}$ has a solution.

8 Quadratic Reciprocity

8.1 The Legendre symbol

In the real numbers R, the nonzero squares are exactly the positive numbers, and the nonsquares are exactly the negative numbers. From this we deduce that the product of two nonsquares is a square. This is not at all true in Z, since for instance 2 · 3 = 6 is not a square. But this property is recovered in Z/pZ for an odd prime p:


Theorem 8.1.1. Let p be an odd prime. Then in Z/pZ:

1. The product of two nonzero quadratic residues is again a nonzero quadratic residue.

2. The product of a nonzero residue and a nonresidue is a nonresidue.

3. The product of two nonresidues is a residue.

Proof. Suppose x and y are two units modulo p. Let g be a primitive root modulo p. Then $\log_g(xy) \equiv \log_g(x) + \log_g(y) \pmod{p-1}$. By Theorem 7.2.3, a unit is a residue if and only if its $\log_g$ is even. Therefore the theorem is reduced to the observation that even plus even is even, even plus odd is odd, and odd plus odd is even.

Definition 8.1.2. Let p be an odd prime, and let a be an integer. The Legendre symbol is defined as

\[ \left(\frac{a}{p}\right) = \begin{cases} 1, & a \text{ is a unit residue modulo } p \\ -1, & a \text{ is a nonresidue modulo } p \\ 0, & p \mid a. \end{cases} \]

(Often this symbol is pronounced “a on p”.)

Theorem 8.1.1 can now be restated elegantly as follows: for integers a and b,

\[ \left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right). \]

Furthermore, by Euler’s criterion we have

\[ a^{(p-1)/2} \equiv \left(\frac{a}{p}\right) \pmod{p}. \]
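The last congruence gives a direct way to compute the symbol. Here is a short Python sketch (the name `legendre` is our own choice) that also checks multiplicativity exhaustively for p = 13:

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)     # r is 0, 1, or p - 1
    return -1 if r == p - 1 else r

p = 13
# Multiplicativity: (ab/p) = (a/p)(b/p) for all residues a, b modulo p.
assert all(legendre(a * b, p) == legendre(a, p) * legendre(b, p)
           for a in range(p) for b in range(p))
print([legendre(a, p) for a in range(p)])
# [0, 1, -1, 1, 1, -1, -1, -1, -1, 1, 1, -1, 1]
```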

8.2 Some reciprocity laws

Let us look for some patterns in the Legendre symbol. The patterns will take this form: we would like to predict what $\left(\frac{a}{p}\right)$ is, based on what p is modulo some other number. Such a rule is called a reciprocity law.

The simplest case is when a = −1, where we have Theorem 7.2.4. This says that

\[ \left(\frac{-1}{p}\right) = (-1)^{(p-1)/2} = \begin{cases} 1, & p \equiv 1 \pmod{4} \\ -1, & p \equiv -1 \pmod{4}. \end{cases} \]


The next case to examine is a = 2. It turns out that the correct reciprocity law is

\[ \left(\frac{2}{p}\right) = (-1)^{(p^2-1)/8} = \begin{cases} 1, & p \equiv \pm 1 \pmod{8} \\ -1, & p \equiv \pm 3 \pmod{8}. \end{cases} \]

We will not prove this law in its entirety right now; instead we will offer the following partial result.

Theorem 8.2.1. If p ≡ 1 (mod 8), then $\left(\frac{2}{p}\right) = 1$.

Our proof will be based on the following observation about complex numbers (!). Let $z = e^{2\pi i/8}$. This is a primitive 8th root of 1, because $z^8 = e^{2\pi i} = 1$, but $z^k \neq 1$ for 1 ≤ k < 8. Using Euler’s formula $e^{i\theta} = \cos\theta + i\sin\theta$, we find $z = (1+i)/\sqrt{2}$ and $z^{-1} = (1-i)/\sqrt{2}$. Therefore $z + z^{-1} = \sqrt{2}$.

Proof. Let g be a primitive root modulo p. Since p ≡ 1 (mod 8), we may set $z = g^{(p-1)/8}$; by Theorem 5.1.2, $\mathrm{ord}_p(z) = 8$ and $\mathrm{ord}_p(z^4) = 2$; the latter relation tells us that $z^4 \equiv -1$ and therefore $z^2 \equiv -z^{-2} \pmod{p}$. Let $\alpha = z + z^{-1}$. Then

\[ \alpha^2 = (z + z^{-1})^2 = z^2 + z^{-2} + 2 \equiv 2 \pmod{p}. \]

Therefore 2 is a quadratic residue modulo p.
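The proof is constructive, so it can be turned into a small computation. The sketch below is an illustration under our own naming; the brute-force search for a primitive root is only sensible for small p.

```python
def primitive_root(p):
    """Smallest primitive root modulo a prime p, by brute-force order check."""
    for g in range(2, p):
        # g is a primitive root iff g^1, ..., g^(p-1) are all distinct mod p.
        if len({pow(g, k, p) for k in range(1, p)}) == p - 1:
            return g
    raise ValueError("no primitive root found; is p an odd prime?")

def sqrt_of_two(p):
    """A square root of 2 modulo a prime p = 1 (mod 8), as z + z^(-1)."""
    assert p % 8 == 1
    g = primitive_root(p)
    z = pow(g, (p - 1) // 8, p)       # an element of order 8
    return (z + pow(z, -1, p)) % p    # alpha = z + z^(-1)

for p in (17, 41, 73):
    alpha = sqrt_of_two(p)
    print(p, alpha, alpha * alpha % p)   # the last column is 2 each time
```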

The same reasoning can be used to prove the following reciprocity law:

Theorem 8.2.2. If p ≡ 1 (mod 3), then $\left(\frac{-3}{p}\right) = 1$.

For this, one is inspired by the equation $\omega - \omega^{-1} = \sqrt{-3}$, where $\omega = e^{2\pi i/3}$. The reader is invited to check the details.

8.3 The main quadratic reciprocity law

Theorem 8.3.1. Let p and q be distinct odd positive primes. Then

\[ \left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}. \]

The symmetry between p and q is the reason Theorem 8.3.1 is called a reciprocity law. The right side of the equation is −1 if p ≡ q ≡ 3 (mod 4), and 1 in all other cases. Thus a restatement of Theorem 8.3.1 is the following:

\[ \left(\frac{p}{q}\right) = \begin{cases} \left(\frac{q}{p}\right), & p \equiv 1 \pmod{4} \text{ or } q \equiv 1 \pmod{4}, \\ -\left(\frac{q}{p}\right), & p \equiv q \equiv 3 \pmod{4}. \end{cases} \]


As an example, since 5 ≡ 1 (mod 4), Theorem 8.3.1 predicts that $\left(\frac{5}{p}\right) = \left(\frac{p}{5}\right)$ for all positive odd primes p ≠ 5. We confirm this for p = 11: $\left(\frac{5}{11}\right) = 1$ (since $5 \equiv 4^2 \pmod{11}$), and indeed $\left(\frac{11}{5}\right) = \left(\frac{1}{5}\right) = 1$.

Theorem 8.3.1 is a truly deep result. It was first proved by Gauss around 1797. Gauss (and others) would go on to publish many proofs. Later on in this course, we will present one of Gauss’ proofs.

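Theorem 8.3.1 provides a strategy for computing the Legendre symbol. For instance, let’s compute $\left(\frac{91}{101}\right)$. The first step is to factor the “numerator”: 91 = 7 · 13. Therefore

\begin{align*}
\left(\frac{91}{101}\right) &= \left(\frac{7}{101}\right)\left(\frac{13}{101}\right) \\
&= \left(\frac{101}{7}\right)\left(\frac{101}{13}\right) \\
&= \left(\frac{3}{7}\right)\left(\frac{10}{13}\right) \\
&= \left(\frac{3}{7}\right)\left(\frac{2}{13}\right)\left(\frac{5}{13}\right) \\
&= -\left(\frac{3}{7}\right)\left(\frac{5}{13}\right) \\
&= \left(\frac{7}{3}\right)\left(\frac{13}{5}\right) \\
&= \left(\frac{1}{3}\right)\left(\frac{3}{5}\right) \\
&= \left(\frac{5}{3}\right) \\
&= \left(\frac{2}{3}\right) \\
&= -1.
\end{align*}

Notice the steps involved: factor the numerator(s), apply quadratic reciprocity, reduce the numerator(s) modulo the denominator(s), and then repeat. If a and p are very large, then this method is actually impractical, because of the factoring step.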


8.4 The Jacobi symbol

In the example of $\left(\frac{91}{101}\right)$ above, suppose we didn’t know that 91 was composite. We would then proceed to apply quadratic reciprocity directly:

\[ \left(\frac{91}{101}\right) = \left(\frac{101}{91}\right) = \left(\frac{10}{91}\right) = \left(\frac{2}{91}\right)\left(\frac{5}{91}\right) = -\left(\frac{5}{91}\right) = -\left(\frac{91}{5}\right) = -\left(\frac{1}{5}\right) = -1. \]

We arrived at the correct answer regardless!

In fact we can justify the above manipulations using an extension of the Legendre symbol which allows composite (but odd) numbers in the denominator. For a positive odd number P which is the product of primes $p_1 \cdots p_t$, we define the Jacobi symbol

\[ \left(\frac{a}{P}\right) = \prod_{i=1}^{t} \left(\frac{a}{p_i}\right). \]

Then the Jacobi symbol is multiplicative in both its numerator and denominator. Another important observation is that $\left(\frac{a}{P}\right) = \left(\frac{b}{P}\right)$ whenever a ≡ b (mod P).

It turns out that the Jacobi symbol obeys much the same reciprocity laws as the Legendre symbol.

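Theorem 8.4.1. Let P be a positive odd number. The Jacobi symbol has the following properties:

1. $\left(\frac{-1}{P}\right) = (-1)^{(P-1)/2}$.

2. $\left(\frac{2}{P}\right) = (-1)^{(P^2-1)/8}$.

3. For another positive odd number Q which is coprime to P, we have

\[ \left(\frac{P}{Q}\right)\left(\frac{Q}{P}\right) = (-1)^{\frac{P-1}{2} \cdot \frac{Q-1}{2}}. \]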


We warn the reader that the Jacobi symbol does not predict whether a is a quadratic residue modulo P. For instance, $\left(\frac{-1}{21}\right) = \left(\frac{-1}{3}\right)\left(\frac{-1}{7}\right) = (-1)(-1) = 1$, but −1 is not a square modulo 21. The only use of the Jacobi symbol for us is as an intermediate step in calculations for the Legendre symbol. If we use the Jacobi symbol, we no longer have to factor any numbers (with the exception of factoring out powers of 2, which is easy).

Executing this algorithm for computing $\left(\frac{a}{p}\right)$ is on par with running the Euclidean algorithm for a and p, which is to say it is very fast indeed.
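One common way to organize this computation is sketched below in Python; the function name and loop structure are our own, and the code relies only on the three properties in Theorem 8.4.1 together with reduction of the numerator.

```python
def jacobi(a, P):
    """Jacobi symbol (a/P) for a positive odd P, using Theorem 8.4.1."""
    if P <= 0 or P % 2 == 0:
        raise ValueError("P must be a positive odd integer")
    a %= P
    sign = 1
    while a != 0:
        # Factor out powers of 2, using (2/P) = (-1)^((P^2 - 1)/8).
        while a % 2 == 0:
            a //= 2
            if P % 8 in (3, 5):
                sign = -sign
        # Reciprocity: swapping costs a sign iff both are 3 (mod 4).
        a, P = P, a
        if a % 4 == 3 and P % 4 == 3:
            sign = -sign
        a %= P
    return sign if P == 1 else 0   # P > 1 here means the inputs were not coprime

print(jacobi(91, 101))  # -1
print(jacobi(-1, 21))   # 1
```

The second line reproduces the warning above: the symbol is 1 even though −1 is not a square modulo 21.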



1. Evaluate the Legendre symbol(3879

).

2. Evaluate the Legendre symbol(

31103

).

3. Let p be a prime such that q = 2p + 1 is also prime. Let a be a unit

modulo q other than ±1. Show that if(aq

)= −1, then a must be a

primitive root modulo q.

4. Let p = 2n + 1 be a Fermat prime with n ≥ 2. (In fact n itself mustbe a power of 2). Prove that 3 is a primitive root modulo p.

5. Use the law of quadratic reciprocity to show that for an odd primep 6= 5: (

5

p

)=

{1, p ≡ ±1 (mod 5)

−1, p ≡ ±2 (mod 5)

6. Use the law of quadratic reciprocity to show that for an odd primep 6= 3: (

3

p

)=

{1, p ≡ ±1 (mod 12)

−1, p ≡ ±5 (mod 12)

7. Let p be an odd prime, and let a be a unit modulo pn which is aquadratic residue. Show that a is also a quadratic residue modulopn+1. (Therefore by induction, if a is a nonzero square modulo p, thenit is a square modulo all powers of p.)

8. Find a solution to the congruence x2 ≡ 14 (mod 53).

45

9. Let p ≡ 1 (mod 3) be a prime. Show that a unit a modulo p is aperfect cube if and only if a(p−1)/3 ≡ 1 (mod p).

10. Let p ≡ 2 (mod 3) be a prime. Show that a unit a modulo p is alwaysa perfect cube.

9 The Gaussian integers

9.1 Motivation and definitions

Here is a list of properties enjoyed by the integers Z:

• They are closed under addition, subtraction, and multiplication, but not division.

• There is a division algorithm, which leads to a Euclidean algorithm, which computes the gcd.

• If a and b are coprime then ax + by = 1 has a solution.

• If a | bc and gcd(a, b) = 1 then a | c.

• In particular, if p is prime and p | ab then p | a or p | b.

• Every nonzero element can be expressed as a product of primes, which is unique up to rearranging and units.

In this section we will explore an extension of Z to the complex numbers, which turns out to satisfy all of these properties.

Definition 9.1.1. A Gaussian integer is a complex number of the form a + bi, where a, b ∈ Z. The set of Gaussian integers is denoted Z[i].

Thus, elements of Z[i] lie on a square lattice (where all the squares have side length 1) in the complex plane. To avoid confusion, we will call the elements of Z rational integers.

It is easy to check that Z[i] ⊂ C is closed under the operations of addition, subtraction, and multiplication. For instance, the relation

(a + bi)(c + di) = (ac − bd) + (ad + bc)i

shows that Z[i] is closed under multiplication.


All the same, we see that Z[i] is not closed under division. For instance,

\[ \frac{1}{1+2i} = \frac{1-2i}{(1+2i)(1-2i)} = \frac{1}{5} - \frac{2}{5}i. \]

For α, β ∈ Z[i], let us write α | β if there exists γ ∈ Z[i] with β = αγ.

One major difference between Z and Z[i] is that elements of Z can be compared with the relation <, whereas it is nonsense to say that α < β for Gaussian integers α and β. The relation < among integers is quite important, since we need it to apply the well-ordering principle. To remedy this problem, we introduce the norm

\[ N(a+bi) = |a+bi|^2 = a^2 + b^2. \]

Then if α ∈ Z[i], the norm N(α) is a non-negative integer. Crucially, the norm is multiplicative: N(αβ) = N(α)N(β). Thus if α | β, then N(α) | N(β).

An element α ∈ Z[i] is a unit if α | 1, which is to say that the multiplicative inverse of α lies in Z[i].

Theorem 9.1.2. The units of Z[i] are 1, −1, i, −i.

Proof. Suppose α = a + bi is a unit. If αβ = 1, then N(α)N(β) = 1. Since N(α) is a non-negative integer, this is only possible if $N(\alpha) = a^2 + b^2 = 1$, which forces α to be one of 1, −1, i or −i.

Definition 9.1.3. Two Gaussian integers α and β are associates if there exists a unit u such that β = uα.

This is the same as saying that α | β and β | α.

Definition 9.1.4. Let α, β ∈ Z[i]. A common divisor of α and β is a Gaussian integer δ with δ | α and δ | β. We write δ = gcd(α, β) if N(δ) ≥ N(δ′) for any other common divisor δ′.

Somewhat confusingly, gcd(α, β) isn’t quite unique. If δ is a gcd of α and β, then so is any associate of δ.

Definition 9.1.5. A nonzero Gaussian integer π is prime if (a) it is not a unit and (b) it is not equal to a product of non-units. Such π are called Gaussian primes.

Again, to avoid confusion, we will refer to a prime in Z as a rational prime. Let’s observe that a rational prime isn’t necessarily a Gaussian prime: 5 is a rational prime, but 5 = (1 + 2i)(1 − 2i).


How would one verify that a given Gaussian integer is prime? For instance, let π = 2 + 3i. If π is not prime, then it factors as π = βγ for non-units β, γ ∈ Z[i], so that N(β)N(γ) = N(π) = 13. Since 13 is a rational prime, this is only possible if N(β) = 1 or N(γ) = 1, which contradicts the fact that β and γ are non-units. Generalizing: if N(π) is a rational prime, then π is a Gaussian prime.

As another example, let π = 7. If 7 = βγ for non-units β and γ, then N(7) = 49 = N(β)N(γ), which is only possible if N(β) = N(γ) = 7. If β = a + bi, we get $a^2 + b^2 = 7$, which has no solutions in rational integers a, b. Therefore 7 is a Gaussian prime. Generalizing: if p is a rational prime which is not the sum of two perfect squares, then p is also a Gaussian prime.

We would like to give a classification of Gaussian primes, and also answer the question of which primes are expressible as $a^2 + b^2$, but this will have to wait.

9.2 The division algorithm and the gcd

Theorem 9.2.1. Let α, β ∈ Z[i] be Gaussian integers with β ≠ 0. There exist γ, δ ∈ Z[i] such that α = βγ + δ and N(δ) < N(β).

Proof. Consider the complex number α/β. It falls somewhere within a unit square whose vertices are Gaussian integers. The farthest a point in the square can be from its nearest vertex is $1/\sqrt{2}$ (that is, the distance from the center to any vertex). Thus there exists γ ∈ Z[i] such that $|\alpha/\beta - \gamma| \le 1/\sqrt{2}$. Squaring both sides and multiplying through by N(β) gives N(α − βγ) ≤ N(β)/2 < N(β). Now we can let δ = α − βγ.

Note that the quotient and remainder are not necessarily unique!
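The proof translates directly into a procedure: compute α/β exactly and round each coordinate to a nearest integer. The following Python sketch is one way to do this; representing a + bi as the pair (a, b) and the function names are assumptions made for the example.

```python
def norm(z):
    """N(a + bi) = a^2 + b^2 for a Gaussian integer given as a pair (a, b)."""
    a, b = z
    return a * a + b * b

def gauss_divmod(alpha, beta):
    """Return (gamma, delta) with alpha = beta*gamma + delta, N(delta) < N(beta)."""
    (a, b), (c, d) = alpha, beta
    n = norm(beta)
    # alpha / beta = alpha * conj(beta) / N(beta) = (re + im*i) / n
    re, im = a * c + b * d, b * c - a * d
    # Round each coordinate to a nearest integer, in exact integer arithmetic.
    x, y = (2 * re + n) // (2 * n), (2 * im + n) // (2 * n)
    delta = (a - (c * x - d * y), b - (c * y + d * x))
    return (x, y), delta

print(gauss_divmod((27, -23), (8, 1)))  # ((3, -3), (0, -2))
```

Here 27 − 23i = (8 + i)(3 − 3i) + (−2i), and N(−2i) = 4 is less than N(8 + i) = 65. A different rounding convention at ties would give a different, equally valid, quotient and remainder.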

Theorem 9.2.2. Let α, β be nonzero Gaussian integers. Any common divisor of α and β divides gcd(α, β). Furthermore, there exist x, y ∈ Z[i] such that αx + βy = gcd(α, β).

Proof. Choose x and y so that δ = αx + βy has the least nonzero norm. Using Theorem 9.2.1, there exist q, r ∈ Z[i] such that α = δq + r and N(r) < N(δ). But then r = α − δq = α(1 − xq) − βqy is also a linear combination of α and β. This is a contradiction unless r = 0, so that in fact δ | α. Similarly, δ | β, so that δ is a common divisor of α and β.

If δ′ is another common divisor, then δ′ divides the linear combination αx + βy = δ. Thus N(δ′) | N(δ), and in particular N(δ′) ≤ N(δ). We conclude that δ = gcd(α, β).

This theorem implies that gcd(α, β) is unique up to associates.


9.3 Unique factorization in Z[i]

Theorem 9.3.1. Suppose α, β, γ ∈ Z[i] satisfy gcd(α, β) = 1 and α | βγ. Then α | γ.

Proof. The proof is very similar to the proof of Theorem 2.2.1. By Theorem 9.2.2, there exist x, y ∈ Z[i] with αx + βy = 1. Multiplying by γ, we get αγx + βγy = γ. Since α divides both terms on the left, it divides γ as well.

Corollary 9.3.2. If π is a Gaussian prime, and π | αβ, then π | α or π | β.

Theorem 9.3.3. Let α ∈ Z[i] be nonzero. Then we may factor α as

\[ \alpha = u\pi_1 \cdots \pi_n \]

for a unit u and Gaussian primes $\pi_i$. This factorization is unique up to reordering the $\pi_i$ and replacing them by associates.

Proof. This is quite the same proof as in Theorem 2.2.3 (but you should still check the details!)

9.4 The factorization of rational primes in Z[i]

We can now tackle the problem of when a rational prime stays prime in Z[i], and when it factors.

Theorem 9.4.1. Let p be a positive rational prime. Then the factorization of p in Z[i] is as follows:

1. If p = 2, then $2 = -i(1+i)^2$.

2. If p ≡ 1 (mod 4), then $p = \pi\bar{\pi}$ for a Gaussian prime π. In particular p is the sum of two perfect squares.

3. If p ≡ 3 (mod 4) then p is a Gaussian prime.

Proof. The claim about 2 can be checked directly.

Let p ≡ 1 (mod 4). Then since $\left(\frac{-1}{p}\right) = 1$, there exists an integer x such that $x^2 \equiv -1 \pmod{p}$. This means that $p \mid (x^2 + 1) = (x+i)(x-i)$. If p were a Gaussian prime, then Corollary 9.3.2 would apply, so that p | x + i or p | x − i. But neither can be true, because $(x \pm i)/p \notin \mathbf{Z}[i]$. Thus p is not a Gaussian prime, and so p = ππ′ for non-units π, π′. Taking norms, we get $p^2 = N(\pi)N(\pi')$, so that N(π) = p; this implies that π is a Gaussian prime and $\pi' = \bar{\pi}$.

Finally, if p ≡ 3 (mod 4), then p is not the sum of two squares, because 3 is not the sum of two squares modulo 4. Therefore (as we noted before) p is a Gaussian prime.
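The argument for p ≡ 1 (mod 4) suggests a way to actually find the two squares. The sketch below goes slightly beyond the proof and should be read as an illustration under stated assumptions: it finds x with x² ≡ −1 (mod p) from a nonresidue via Euler’s criterion, then takes gcd(p, x + i) with the Euclidean algorithm in Z[i], which must be an associate of π or of its conjugate. Gaussian integers are stored as pairs (a, b), and all function names are ours.

```python
def gauss_mod(alpha, beta):
    """Remainder of alpha on division by beta in Z[i] (nearest-integer rounding)."""
    (a, b), (c, d) = alpha, beta
    n = c * c + d * d
    re, im = a * c + b * d, b * c - a * d          # alpha * conj(beta)
    x, y = (2 * re + n) // (2 * n), (2 * im + n) // (2 * n)
    return (a - (c * x - d * y), b - (c * y + d * x))

def gauss_gcd(alpha, beta):
    """Euclidean algorithm in Z[i]; the result is a gcd up to a unit."""
    while beta != (0, 0):
        alpha, beta = beta, gauss_mod(alpha, beta)
    return alpha

def two_squares(p):
    """Write a prime p = 1 (mod 4) as a^2 + b^2."""
    assert p % 4 == 1
    # If t is a nonresidue, then x = t^((p-1)/4) satisfies x^2 = -1 (mod p).
    t = next(t for t in range(2, p) if pow(t, (p - 1) // 2, p) == p - 1)
    x = pow(t, (p - 1) // 4, p)
    a, b = gauss_gcd((p, 0), (x, 1))               # a Gaussian prime of norm p
    return abs(a), abs(b)

for p in (5, 13, 29, 97, 101):
    print(p, two_squares(p))
```

Each printed pair (a, b) satisfies a² + b² = p; for instance p = 13 gives (3, 2).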



1. Let α = 23 − 9i, β = 3 + 2i. Find Gaussian integers γ, δ such that α = βγ + δ and N(δ) < N(β).

2. Let α = 2 + 3i, β = 4 + i. Find Gaussian integers x, y such that αx + βy = 1.

3. Factor into Gaussian primes: 29, 39, 7 + 9i.

4. True or false: if α, β ∈ Z[i] and N(α) | N(β), then α | β.

5. Let a, b ∈ Z, and let α = a + bi. Show that 1 + i | α if and only if a and b are either both even or both odd.

6. Let a, b ∈ Z be coprime, and let α = a + bi ∈ Z[i]. Show that $\gcd(\alpha, \bar{\alpha})$ is either 1 or 1 + i.

7. Prove Fermat’s little theorem for Gaussian primes: For a Gaussian prime π and a Gaussian integer α, we have

\[ \alpha^{N(\pi)} \equiv \alpha \pmod{\pi}. \]

8. Let π be a Gaussian prime. Show that the only solutions to $x^4 \equiv 1 \pmod{\pi}$ are 1, −1, i, −i.

9. Let $n = (1000^2 + 1)(2000^2 + 1)$. Express n as the sum of two perfect squares. Then do it in a different way, using entirely different squares.

10. Let p be an odd prime. Note that $(1+i)^2 = 2i$. Also remember that p divides the middle binomial coefficients, so that $(1+i)^p \equiv 1 + i^p \pmod{p}$. Combine these facts to show that in Z[i] we have the following congruence:

\[ 1 + i^p \equiv \left(\frac{2}{p}\right) i^{\frac{p-1}{2}} (1+i) \pmod{p}. \]

This can be used to deduce the reciprocity law for $\left(\frac{2}{p}\right)$.


10 Unique factorization and its applications

Theorem 2.3.3 has the following analogue in Z[i].

Theorem 10.0.1. Let α, β ∈ Z[i] be relatively prime. If αβ is an nth power in Z[i], then $\alpha = u\gamma^n$ for some γ ∈ Z[i] and some unit u (and similarly for β).

We present two applications of Theorem 10.0.1 to Diophantine equations.

10.1 Pythagorean triples, revisited

We can use unique factorization in Z[i] to come up with a formula which generates Pythagorean triples in a different way than in 2.5.

Let a, b, c be integers satisfying $a^2 + b^2 = c^2$, with gcd(a, b) = 1. As we already observed, a and b cannot both be even (since then they wouldn’t be coprime) and they cannot both be odd (there would arise a contradiction modulo 4). Without loss of generality we may assume that a is positive and odd, and b is positive and even. Let α = a + bi ∈ Z[i], so that $\alpha\bar{\alpha} = c^2$. By Exercises #5 and #6 from 9.5, $\gcd(\alpha, \bar{\alpha}) = 1$. Therefore by Theorem 10.0.1, $\alpha = u\gamma^2$ for a unit u. Let’s write γ = p + qi, so that $\gamma^2 = p^2 - q^2 + 2pqi$. If the unit u is 1, we can equate real and imaginary parts to get

\[ a = p^2 - q^2, \qquad b = 2pq, \qquad c = p^2 + q^2. \]

(Other choices of units simply result in permuting a and b and changing their signs.) As p and q run through positive relatively prime integers of opposite parity with p > q, the formulas above produce all primitive Pythagorean triples (a, b, c) with a positive and odd and b positive and even.
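The parametrization is easy to run. In this Python sketch (the function name and the cutoff parameter `limit` are our own), each admissible pair (p, q) yields one primitive triple:

```python
from math import gcd

def primitive_triples(limit):
    """Primitive Pythagorean triples (a, b, c) with a odd and b even,
    from coprime p > q >= 1 of opposite parity with p <= limit."""
    triples = []
    for p in range(2, limit + 1):
        for q in range(1, p):
            if (p - q) % 2 == 1 and gcd(p, q) == 1:
                triples.append((p * p - q * q, 2 * p * q, p * p + q * q))
    return triples

print(primitive_triples(4))
# [(3, 4, 5), (5, 12, 13), (15, 8, 17), (7, 24, 25)]
```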

10.2 A cubic Diophantine equation

In this example, we use unique factorization in Z[i] to solve a Diophantine equation in two variables of degree 3.⁵

Theorem 10.2.1. The only solution to the Diophantine equation $y^2 = x^3 - 1$ is (1, 0).

⁵ The graphs of such equations are called elliptic curves. It is a rich and interesting problem to find all integral or rational points on an elliptic curve.


Proof. Let x and y be integers with $y^2 = x^3 - 1$. First we examine the parities of x and y. If x is even, then $x^3 \equiv 0 \pmod{4}$, so that $y^2 \equiv -1 \pmod{4}$; but then y is odd, so that $y^2 \equiv 1 \pmod{4}$. This leads to a contradiction.

Thus x is odd and y is even. We write the equation as $y^2 + 1 = x^3$ and factor the left side in Z[i]:

\[ (y+i)(y-i) = x^3. \]

Since y is even, y ± i is not divisible by 1 + i (9.5 #5), and then we must have gcd(y + i, y − i) = 1 (9.5 #6). Therefore we can apply Theorem 10.0.1 to get $y + i = uz^3$ for some z ∈ Z[i] and some unit u. Since every unit in Z[i] is a perfect cube, we can in fact write $y + i = z^3$. Letting z = a + bi, we get

\begin{align*}
y + i &= (a+bi)^3 \\
&= a^3 - 3ab^2 + (3a^2 b - b^3)i.
\end{align*}

Therefore

\begin{align*}
y &= a^3 - 3ab^2 \\
1 &= 3a^2 b - b^3 = b(3a^2 - b^2).
\end{align*}

The only way for the second equation to be true is if $b = 3a^2 - b^2 = \pm 1$. By inspection we get b = −1 and a = 0, which leads to y = 0 and x = 1.
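A brute-force search cannot replace the proof, but it is a useful sanity check on the statement. The Python sketch below (function name and search bound are our own choices) looks for integer points on y² = x³ − k in a window:

```python
from math import isqrt

def integer_points(k, bound):
    """Integer points (x, y) on y^2 = x^3 - k with |x| <= bound, by brute force."""
    found = []
    for x in range(-bound, bound + 1):
        rhs = x**3 - k
        if rhs < 0:
            continue                      # y^2 cannot be negative
        y = isqrt(rhs)
        if y * y == rhs:
            found.extend([(x, y), (x, -y)] if y else [(x, 0)])
    return found

print(integer_points(1, 10**4))  # [(1, 0)]
```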

10.3 The system Z[√−2]

The reader might wonder if this method can be used to solve other Diophantine equations, such as the elliptic curve

\[ y^2 = x^3 - 2. \tag{10.3.1} \]

This time, x and y must have the same parity. If they are both even, so that x = 2z and y = 2w, then $2w^2 = 4z^3 - 1$, which is impossible. Therefore x and y are both odd.

After rewriting this as $y^2 + 2 = x^3$, we observe that the left side factors not in Z[i], but rather in $\mathbf{Z}[\sqrt{-2}]$, this being the set of complex numbers of the form $a + b\sqrt{-2}$, where a, b ∈ Z. So let us turn our attention to $\mathbf{Z}[\sqrt{-2}]$. It is easy to observe that it is closed under addition, subtraction and multiplication. We can once again define α | β to mean that there exists $\gamma \in \mathbf{Z}[\sqrt{-2}]$ with β = αγ. There is once again the norm function $N(a + b\sqrt{-2}) = a^2 + 2b^2$, and the same logic as in Theorem 9.1.2 shows that the units of $\mathbf{Z}[\sqrt{-2}]$ are just ±1.


Returning to the Diophantine equation above, we can rewrite it as

\[ (y + \sqrt{-2})(y - \sqrt{-2}) = x^3. \]

If d is a common factor of $y + \sqrt{-2}$ and $y - \sqrt{-2}$, then d divides their difference, which is $2\sqrt{-2}$; it follows that N(d) | 8, so that N(d) is a power of 2. But d must also divide $x^3$, which means that $N(d) \mid x^6$; since x is odd, the only possibility is that N(d) = 1, so that d is a unit. Therefore $y + \sqrt{-2}$ and $y - \sqrt{-2}$ are coprime.

Can we apply the same technique we used in (10.2)? There, we relied on Theorem 10.0.1, which in turn relied on the unique factorization property for Z[i] in Theorem 9.3.3, which in turn relied on the division algorithm for Z[i] in Theorem 9.2.1. So if we can prove the division algorithm for $\mathbf{Z}[\sqrt{-2}]$, then the same chain of reasoning could apply, and we could conclude that $y + \sqrt{-2} = z^3$ for some $z \in \mathbf{Z}[\sqrt{-2}]$. Letting $z = a + b\sqrt{-2}$, we get

\begin{align*}
y + \sqrt{-2} &= (a + b\sqrt{-2})^3 \\
&= a^3 - 6ab^2 + (3a^2 b - 2b^3)\sqrt{-2}.
\end{align*}

Therefore

\begin{align*}
y &= a^3 - 6ab^2 \\
1 &= 3a^2 b - 2b^3 = b(3a^2 - 2b^2).
\end{align*}

Once again, $b = 3a^2 - 2b^2 = \pm 1$. The case b = −1 leads to an absurdity. Thus b = 1, $3a^2 - 2 = 1$ and a = ±1, which leads to the solutions (3, ±5) to (10.3.1). We have thus proved that these are the only solutions.
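The last step can be checked mechanically. In this Python sketch (helper name `cube` is ours), an element a + b√−2 is stored as the pair (a, b), and we look for the pairs whose cube has √−2-coefficient equal to 1:

```python
def cube(a, b):
    """(a + b*sqrt(-2))^3 as a pair (rational part, coefficient of sqrt(-2))."""
    return a**3 - 6 * a * b**2, 3 * a**2 * b - 2 * b**3

# b(3a^2 - 2b^2) = 1 forces b = 1 or b = -1, and then 3a^2 <= 3,
# so a small search window for a is enough.
hits = [(a, b) for a in range(-3, 4) for b in (-1, 1) if cube(a, b)[1] == 1]
for a, b in hits:
    y = cube(a, b)[0]
    x = a * a + 2 * b * b     # x = N(z), because x^3 = N(y + sqrt(-2)) = N(z)^3
    print((a, b), (x, y), y * y == x**3 - 2)
# (-1, 1) (3, 5) True
# (1, 1) (3, -5) True
```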

As for the division algorithm, we have seen that this can be proved using geometry. The elements of $\mathbf{Z}[\sqrt{-2}]$ comprise the corners of a tiling of the plane by rectangles, with side lengths 1 and $\sqrt{2}$. The center of such a rectangle is distance $\sqrt{3}/2$ from the corners, and obviously this is the maximum possible such distance. Since $\sqrt{3}/2 < 1$, the division algorithm holds in $\mathbf{Z}[\sqrt{-2}]$, and therefore this system has the property of unique factorization into primes.

10.4 Examples of the failure of unique factorization

It is easy to see how this circle of ideas might break down. Consider $\mathbf{Z}[\sqrt{-3}]$. This time, the elements of $\mathbf{Z}[\sqrt{-3}]$ are the corners of a tiling of the plane by rectangles with side lengths 1 and $\sqrt{3}$. The center of such a rectangle lies at a distance exactly 1 from the corners. This leads to the following cascade of failures:


• The division algorithm fails: If we try to divide 2 into $1 + \sqrt{-3}$, what should the quotient and remainder be? Since $(1 + \sqrt{-3})/2$ lies in the center of one of these rectangles, the remainder cannot have norm less than N(2).

• Bezout’s identity fails: The elements 2 and $1 + \sqrt{-3}$ share no common factors. (Any non-unit common factor would have norm 2, which is impossible because $2 = a^2 + 3b^2$ has no solutions.) Thus if Theorem 1.6.1 held for $\mathbf{Z}[\sqrt{-3}]$, we would expect $2x + (1 + \sqrt{-3})y = 1$ to have a solution with $x, y \in \mathbf{Z}[\sqrt{-3}]$. In fact such a solution does not exist! Let $\beta = 1 + \sqrt{-3}$, so that N(β) = 4. The norm of 2x + βy is

\[ (2x + \beta y)(2\bar{x} + \bar{\beta}\bar{y}) = 4N(x) + 2(x\bar{\beta}\bar{y} + \bar{x}\beta y) + 4N(y), \]

which is always even.

• Lemma 2.2.1 fails: In $\mathbf{Z}[\sqrt{-3}]$, we have $2 \mid 4 = (1 + \sqrt{-3})(1 - \sqrt{-3})$, yet 2 is coprime with each factor.

• Unique factorization fails: The elements 2 and $1 \pm \sqrt{-3}$ are prime (in the sense that they are nonzero elements with no non-unit proper divisors), but $4 = (1 + \sqrt{-3})(1 - \sqrt{-3}) = 2 \cdot 2$ exhibits 4 as a product of primes in two truly distinct ways.

• Theorem 2.3.3 fails: The elements $1 + \sqrt{-3}$ and $1 - \sqrt{-3}$ share no common factors. Their product 4 is a square, but neither factor is a square (even up to units).

It isn’t hard to come up with further examples of such failures: In $\mathbf{Z}[\sqrt{-5}]$, for instance, we have the non-unique factorization

\[ 6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5}). \]

10.5 The Eisenstein integers

As we saw, the failure of $\mathbf{Z}[\sqrt{-3}]$ to have a division algorithm had to do with geometry: there are elements of C which are at distance 1 from the nearest element of $\mathbf{Z}[\sqrt{-3}]$, namely the very centers of the rectangles. This deficit can be resolved simply by adding those centers into the system $\mathbf{Z}[\sqrt{-3}]$ itself, producing a larger one.

Define the complex number $\omega = e^{2\pi i/3}$. Using some basic facts about complex numbers, we gather some data on ω:


• $\omega = (-1 + \sqrt{-3})/2$.

• $\omega^3 = e^{2\pi i} = 1$.

• $\omega^2 = \bar{\omega} = -1 - \omega$.

The final equation can be derived efficiently as follows: since $\omega^3 - 1 = 0$, we can factor to get $(\omega - 1)(\omega^2 + \omega + 1) = 0$. Since ω ≠ 1, the second factor must be 0.

Definition 10.5.1. The Eisenstein integers Z[ω] are the subset of complex numbers of the form a + bω, with a, b ∈ Z.

It is clear that Z[ω] is closed under addition and subtraction. It is closed under multiplication too, because

\[ (a + b\omega)(c + d\omega) = ac + (ad + bc)\omega + bd\omega^2 = (ac - bd) + (ad + bc - bd)\omega. \]

The complex plane can now be tiled by rhombuses whose corners are the elements of Z[ω].
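Expanding the product with ω² = −1 − ω gives (a + bω)(c + dω) = (ac − bd) + (ad + bc − bd)ω. As a quick numerical check of this rule, the Python sketch below stores a + bω as the pair (a, b) (a representation and function names we chose for illustration) and compares against floating-point complex arithmetic.

```python
import cmath

OMEGA = cmath.exp(2j * cmath.pi / 3)   # omega = e^(2*pi*i/3)

def eis_mul(u, v):
    """Product in Z[omega] of a + b*omega and c + d*omega, as a pair."""
    (a, b), (c, d) = u, v
    # Uses omega^2 = -1 - omega to eliminate the omega^2 term.
    return a * c - b * d, a * d + b * c - b * d

def to_complex(u):
    """The complex number represented by the pair u = (a, b)."""
    return u[0] + u[1] * OMEGA

u, v = (2, 3), (-1, 4)
exact = eis_mul(u, v)
print(exact, abs(to_complex(exact) - to_complex(u) * to_complex(v)) < 1e-9)
# (-14, -7) True
```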

The norm of a + bω is

\[ N(a + b\omega) = (a + b\omega)(a + b\omega^2) = a^2 - ab + b^2. \]

The units in Z[ω] are exactly those elements of norm 1, which are $\pm 1, \pm\omega, \pm\omega^2$. Note that they form a regular hexagon in the complex plane.

The division algorithm holds in Z[ω], since the farthest a complex number can be from an element of Z[ω] is < 1. (In fact this maximum distance is the distance of the center of a unit equilateral triangle to one of its corners, which is $1/\sqrt{3}$.) Therefore:

Theorem 10.5.2. Z[ω] has unique factorization.

Note that in our prior example with $\mathbf{Z}[\sqrt{-3}]$, we exhibited two distinct factorizations of 4: $2 \cdot 2 = (1 + \sqrt{-3})(1 - \sqrt{-3})$. In Z[ω], these factorizations are the same up to units, because $1 + \sqrt{-3} = -2\omega^2$ and $1 - \sqrt{-3} = -2\omega$.

We close this section by proving a case of quadratic reciprocity, made possible by the Eisenstein integers. Let p ≠ 2, 3 be prime. Since $\omega - \omega^2 = \sqrt{-3}$, we have on the one hand

\[ (\omega - \omega^2)^p = (\sqrt{-3})^p = (-3)^{\frac{p-1}{2}} \sqrt{-3} \equiv \left(\frac{-3}{p}\right)\sqrt{-3} \pmod{p}. \]

On the other hand, since p divides the interior binomial coefficients,

\[ (\omega - \omega^2)^p \equiv \omega^p - \omega^{2p} \equiv \begin{cases} \sqrt{-3}, & p \equiv 1 \pmod{3} \\ -\sqrt{-3}, & p \equiv -1 \pmod{3} \end{cases} \pmod{p}. \]

Therefore

\[ \left(\frac{-3}{p}\right) = \begin{cases} 1, & p \equiv 1 \pmod{3} \\ -1, & p \equiv -1 \pmod{3} \end{cases} = \left(\frac{p}{3}\right). \]

(Why are we able to cancel $\sqrt{-3}$ from both sides of the congruence? Since p and 3 are relatively prime, we can solve 3x + py = 1 in integers x, y; this shows that $-\sqrt{-3}\,x$ is an inverse of $\sqrt{-3}$ modulo p. Another small point is that we passed from a congruence between Legendre symbols modulo p to an equality; this is because p does not divide 2 in Z[ω].)
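The resulting identity is easy to test numerically. The following Python sketch (helper name and the range of primes are our own choices) compares the two Legendre symbols, each computed by Euler’s criterion, for every prime between 5 and 200:

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

# Primes 5 <= p < 200, by trial division.
primes = [p for p in range(5, 200)
          if all(p % d for d in range(2, int(p**0.5) + 1))]

# (-3/p) should equal (p/3) for every prime p other than 2 and 3.
print(all(legendre(-3, p) == legendre(p, 3) for p in primes))  # True
```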

10.6 Exercises due April 13


1. The system of congruence classes of Gaussian integers modulo 3 is written as Z[i]/3Z[i]. It has nine elements, eight of which are units: ±1, ±i, ±1 ± i. Find the orders of all eight units. How many primitive roots are there?

2. Let π = a + bi be a Gaussian prime such that N(π) = p is a rational prime. Show that every α ∈ Z[i] is congruent modulo π to exactly one of 0, 1, . . . , p − 1. Thus Z[i]/πZ[i] has p elements. (One says that Z[i]/πZ[i] and Z/pZ are isomorphic.)

3. Let $n = p_1 \cdots p_r$, where the $p_i$ are distinct primes with $p_i \equiv 1 \pmod{4}$. How many ways can we write $n = a^2 + b^2$ for integers a and b? Let us only count $n = a^2 + b^2$ and $n = c^2 + d^2$ as different if |a| ≠ |c| and |a| ≠ |d|.

4. For n ≥ 2, the integer $2^n$ is always a sum of two squares. If n = 2k is even, then $2^n = (2^k)^2 + 0^2$, and if n = 2k + 1 is odd, then $2^n = (2^k)^2 + (2^k)^2$. Are there any other ways to write $2^n$ as $a^2 + b^2$?

5. For an odd prime p, show that

\[ \left(\frac{-2}{p}\right) = \begin{cases} 1, & p \equiv 1, 3 \pmod{8} \\ -1, & p \equiv 5, 7 \pmod{8}. \end{cases} \]

6. (2 pts) For an odd prime p, prove that $p = a^2 + 2b^2$ has a solution in integers a, b if and only if p ≡ 1, 3 (mod 8).

7. (2 pts) For a prime p ≠ 2, 3, prove that $p = a^2 + ab + b^2$ has a solution in integers a, b if and only if p ≡ 1 (mod 3).

8. True or false: for a prime p satisfying $\left(\frac{-5}{p}\right) = 1$, the equation $p = a^2 + 5b^2$ has a solution in integers.

11 Some analytic number theory

Analytic number theory is the marriage of number theory to calculus, with the aim of answering quantitative questions about the former. For instance, we have already seen that there are infinitely many primes, but just how infinite are they? Plenty of sequences are infinite, such as the odd numbers, the numbers which are 7 (mod 10), the square numbers, the powers of two, etc. But odd numbers are more common than numbers which are 7 (mod 10), which are more common than square numbers, which are much more common than powers of two. Where do the primes rank among this list, and how do we even make such a question precise?

One way of answering such a question is with a counting function. For a positive real number x, let π(x) be the number of primes p ≤ x. Thus π(10) = 4 because there are four primes less than 10. Here are the values of π(x) for the first few powers of 10:

x       π(x)    π(x)/x
10      4       .4
10^2    25      .25
10^3    168     .168
10^4    1229    .1229
10^5    9592    .09592

We have listed π(x)/x, which is the ratio of primes among positive integers ≤ x. It seems that this ratio decreases, but only very slowly. Contrast this with the similar ratio for odd numbers (roughly 1/2), for numbers which are 7 (mod 10) (1/10), for squares ($1/\sqrt{x}$), and for powers of 2 ($\log_2(x)/x$). The first two ratios don’t decrease at all, but the second two decrease to 0 fairly quickly.

Mathematicians have studied π(x) for centuries. The following theorem was conjectured by Gauss in 1793 (when he was a teenager) and proved by Hadamard and de la Vallée Poussin in 1896 using (of all things) complex analysis.

Theorem 11.0.1 (The prime number theorem). We have π(x) ∼ x/ log x. That is,
\[ \lim_{x \to \infty} \frac{\pi(x)}{x/\log x} = 1. \]

Unfortunately we will not be developing the tools to prove this here, but we can develop some interesting results nonetheless.
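To get a feeling for the prime number theorem, one can tabulate π(x) next to x/ log x. The following is a minimal sketch rather than anything from the course itself: the helper names `primes_up_to` and `prime_pi` are made up for this illustration, and the sieve of Eratosthenes is simply one convenient way to list primes.

```python
import bisect
import math


def primes_up_to(limit):
    """Return the sorted list of primes <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for n in range(2, math.isqrt(limit) + 1):
        if is_prime[n]:
            # Cross out n^2, n^2 + n, ...; smaller multiples are already gone.
            is_prime[n * n :: n] = bytearray(len(range(n * n, limit + 1, n)))
    return [n for n in range(2, limit + 1) if is_prime[n]]


def prime_pi(x, primes):
    """Count the primes p <= x; `primes` must be sorted and reach at least x."""
    return bisect.bisect_right(primes, x)


primes = primes_up_to(10**6)
for k in range(1, 7):
    x = 10**k
    count = prime_pi(x, primes)
    estimate = x / math.log(x)
    print(f"x = 10^{k}: pi(x) = {count}, x/log x = {estimate:.1f}, "
          f"ratio = {count / estimate:.4f}")
```

The printed ratio π(x)/(x/ log x) stays above 1 in this range and drifts toward 1 only slowly, which is consistent with the theorem being a statement about the limit.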

11.1 $\sum_p 1/p$ diverges

Here’s a crude means for testing whether a sequence is “dense” or “sparse”. The harmonic series
\[ 1 + \frac{1}{2} + \frac{1}{3} + \cdots \]
diverges (it can be compared with $\int_1^\infty dx/x$, for instance). So if we are given a subset $S \subset \mathbf{Z}_{\geq 1}$, we can ask whether $\sum_{n \in S} 1/n$ converges or diverges; if it converges, we can conclude that there “aren’t that many” elements of S (see footnote 6). Here are some examples:

• If S is the set of odd numbers, then the sum in question is $\sum_{k=0}^{\infty} 1/(2k+1)$, which diverges.

• If S is the set of numbers which are 7 (mod 10), then the sum is $\sum_{k=0}^{\infty} 1/(10k+7)$, which diverges.

• If S is the set of square numbers, the sum is $\sum_{n=1}^{\infty} 1/n^2$, which converges (in comparison with $\int_1^\infty dx/x^2$).

• If S is the set of powers of 2, the sum is $\sum_{n=0}^{\infty} 1/2^n$, which converges quite rapidly (to 2, in fact).

The set of primes, it turns out, falls in the “dense” column. Here and elsewhere, we use the notation $\sum_p$ to mean a sum over primes p.

Theorem 11.1.1. $\sum_p 1/p$ diverges.

6. Pedantic note: if S is infinite, then its cardinality is the same as that of Z; i.e. it is countable. But this isn’t the sort of comparison we are looking for; instead we want to know how spread out S is among the positive integers.


Proof. For a real number s, let ζ(s) denote the sum
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \]
Then ζ(s) converges for s > 1 but diverges for s ≤ 1. This function is known as the Riemann zeta function, and is very important for analytic number theory. This is because of its Euler factorization, valid for s > 1:

\[ \zeta(s) = \left(1 + \frac{1}{2^s} + \frac{1}{2^{2s}} + \cdots\right) \times \left(1 + \frac{1}{3^s} + \frac{1}{3^{2s}} + \cdots\right) \times \left(1 + \frac{1}{5^s} + \frac{1}{5^{2s}} + \cdots\right) \cdots = \prod_p \left(1 - \frac{1}{p^s}\right)^{-1}. \]
The first equality is no more and no less than the theorem of unique factorization into primes: each term $1/n^s$ is uniquely the product of factors $1/p^{ks}$, each of which appears somewhere in the product. The second equality comes from the geometric series,
\[ 1 + x + x^2 + \cdots = (1 - x)^{-1}, \]
valid for |x| < 1.

In order to turn the product into a sum, we take logarithms:

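\[ \log \zeta(s) = \sum_p \log \left(1 - \frac{1}{p^s}\right)^{-1}. \]
We now apply the Taylor series for $\log(1-x)^{-1}$:
\[ \log(1-x)^{-1} = x + \frac{1}{2}x^2 + \frac{1}{3}x^3 + \cdots, \]
again valid for |x| < 1. Thus
\[ \log \zeta(s) = \sum_p \left(\frac{1}{p^s} + \frac{1}{2p^{2s}} + \frac{1}{3p^{3s}} + \cdots\right). \]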


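Now let’s think about what happens as s → 1 from the right. It turns out that
\[ \sum_p \left(\frac{1}{2p^2} + \frac{1}{3p^3} + \cdots\right) \]
converges (left as exercise), and for s > 1 the corresponding sum with $p^{ns}$ in place of $p^n$ is only smaller. But log ζ(s) → ∞ as s → 1. Thus we can conclude that $\sum_p 1/p^s \to \infty$ as s → 1, which is to say that $\sum_p 1/p$ diverges.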
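The divergence in Theorem 11.1.1 is extremely slow, and it is instructive to see this numerically. The sketch below sums 1/p over the primes up to x. The comparison column log log x is an addition that is not taken from these notes: it reflects the classical fact (Mertens’ theorem) that the partial sums differ from log log x by a bounded amount. The function names are made up for this illustration.

```python
import math


def primes_up_to(limit):
    """Return the sorted list of primes <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for n in range(2, math.isqrt(limit) + 1):
        if is_prime[n]:
            is_prime[n * n :: n] = bytearray(len(range(n * n, limit + 1, n)))
    return [n for n in range(2, limit + 1) if is_prime[n]]


def reciprocal_prime_sum(limit):
    """Partial sum of 1/p over all primes p <= limit."""
    return math.fsum(1.0 / p for p in primes_up_to(limit))


for k in range(2, 7):
    x = 10**k
    partial = reciprocal_prime_sum(x)
    print(f"x = 10^{k}: sum of 1/p = {partial:.4f}, "
          f"log log x = {math.log(math.log(x)):.4f}")
```

Multiplying x by 10 adds only a small amount to the sum, so the sum does grow without bound, but one would never guess this from a table of modest size.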

11.2 Classes of primes, and their infinitude

There are infinitely many primes among the integers. But we can also ask, given an infinite subset S of the positive integers (i.e., a sequence), are there infinitely many primes in S? For instance, are there infinitely many primes p among the following sets?

• The set of integers of the form 4n + 1.

• The set of integers of the form 4n − 1.

• The set of integers of the form $2^n - 1$.

• The set of integers of the form $2^n + 1$.

• The set of integers of the form $n^2 + 1$.

Every prime other than 2 falls into one of the first two categories, so we might guess that there are infinitely many primes of either sort. The third class refers to the Mersenne primes, of which we have discovered quite a few, and the fourth class refers to the Fermat primes, of which we have discovered five. Unfortunately, it is not known whether there exist infinitely many primes of the form $2^n - 1$, $2^n + 1$, $n^2 + 1$ or indeed almost any formula involving one variable, unless it happens to be a linear formula, such as the first two examples given here.

Theorem 11.2.1. There are infinitely many primes of the form 4n + 1, and infinitely many of the form 4n − 1.

Proof. We can vary the original method of Euclid’s proof for both cases.

First we do the case of primes of the form 4n − 1. Suppose there are finitely many of these, say $p_1, \dots, p_t$. Then $N = 4p_1 \cdots p_t - 1$ must be a product of odd primes, which cannot all be of the form 4n + 1 (since N ≡ 3 (mod 4)). Thus there exists a prime dividing N which is 3 (mod 4), which has to be one of the $p_i$. But none of the $p_i$ divide N, since they clearly divide N + 1. This is a contradiction.

Now consider the case of primes of the form 4n + 1. Suppose there are finitely many of these, say $p_1, \dots, p_t$. Then $N = 4p_1^2 \cdots p_t^2 + 1$ is divisible by some odd prime, say q. Then $x^2 \equiv -1 \pmod q$ has a solution, namely $x = 2p_1 \cdots p_t$. Therefore $\left(\frac{-1}{q}\right) = 1$, which implies that q ≡ 1 (mod 4), which means that $q = p_i$ for some i. But then q | N − 1 and q | N, which is a contradiction.
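Both halves of this proof are constructive: from any finite list of primes in one class, the number N yields a new prime in the same class. The following sketch runs the two constructions on small lists. The function names are hypothetical, and plain trial division is used only because the numbers here are small.

```python
from math import prod


def prime_factors(n):
    """Return the sorted list of distinct prime factors of n > 1 (trial division)."""
    factors = []
    d = 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def new_prime_3_mod_4(known):
    """Given primes that are all 3 (mod 4), return another prime that is 3 (mod 4)."""
    N = 4 * prod(known) - 1          # N is 3 (mod 4), so some prime factor is too
    return next(q for q in prime_factors(N) if q % 4 == 3)


def new_primes_1_mod_4(known):
    """Given primes that are all 1 (mod 4), return the prime factors of 4(p1...pt)^2 + 1."""
    N = 4 * prod(known) ** 2 + 1     # every prime factor q of N has (-1/q) = 1
    return prime_factors(N)


print(new_prime_3_mod_4([3, 7, 11]))    # N = 923 = 13 * 71, and 71 is 3 (mod 4)
print(new_primes_1_mod_4([5, 13, 17]))  # all factors are 1 (mod 4) and new
```

Note that in the 4n − 1 case not every prime factor of N is 3 (mod 4) (here 13 is not); the proof only needs one such factor.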

This theorem raises the question of whether there are infinitely many primes which are a (mod m), where a and m are integers. The answer will be no when a and m share a common factor, but otherwise it will be yes:

Theorem 11.2.2 (Dirichlet’s theorem on primes in arithmetic progressions). Let a, m ∈ Z with gcd(a, m) = 1. There are infinitely many primes which are a (mod m).

There isn’t a Euclidean method to prove this theorem. Instead, Dirichlet used analytic means. In the next section we will get a taste of the sort of method he used, although we won’t quite prove the whole theorem.

11.3 $\sum_{p \equiv \pm 1 \pmod 4} 1/p$ diverges

Let
\[ L(s) = \sum_{n \text{ odd}} \frac{(-1)^{(n-1)/2}}{n^s} = 1 - \frac{1}{3^s} + \frac{1}{5^s} - \frac{1}{7^s} + \cdots. \]
As an alternating series, L(s) converges as long as its terms are strictly decreasing, which is true for s > 0. Remarkably, L(s) has an Euler factorization:
\[ L(s) = \prod_{p \text{ odd}} \left(1 - \frac{(-1)^{(p-1)/2}}{p^s}\right)^{-1}. \]
This is essentially because $(-1)^{(n-1)/2} = \left(\frac{-1}{n}\right)$ (Jacobi symbol) is multiplicative in n. We would now like to take logarithms of both sides, but we must be mindful that we are not taking the logarithm of zero or a negative number. At s = 1, for instance, we have
\[ 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \frac{\pi}{4} \]
(this is obtained by plugging x = 1 into the Maclaurin series for $\tan^{-1}(x)$). Therefore L(1) ≠ 0, and so there is no problem defining log L(s) for values of s near 1. For such values we have

\[ \log L(s) = \sum_{p \text{ odd}} \log \left(1 - \frac{(-1)^{(p-1)/2}}{p^s}\right)^{-1} = \sum_{p \text{ odd}} \sum_{n=1}^{\infty} \frac{(-1)^{n(p-1)/2}}{n p^{ns}}. \]

For i = 1, 3, let $P_{i,4}(s) = \sum_{p \equiv i \pmod 4} 1/p^s$. Then we can rewrite the above as
\[ \log L(s) = P_{1,4}(s) - P_{3,4}(s) + \sum_{p \text{ odd}} \sum_{n=2}^{\infty} \frac{(-1)^{n(p-1)/2}}{n p^{ns}}. \]
The infinite sum converges for s = 1, because it converges absolutely (this was in the exercises). Therefore:
\[ \lim_{s \to 1^+} \left(P_{1,4}(s) - P_{3,4}(s)\right) \text{ exists.} \]
On the other hand,
\[ \lim_{s \to 1^+} \left(P_{1,4}(s) + P_{3,4}(s)\right) \text{ does not exist,} \]
because this is just the sum of $1/p^s$ over all odd primes, and we already know $\sum_p 1/p$ diverges. Now if $\lim_{s \to 1^+} P_{1,4}(s)$ existed, then so would $\lim_{s \to 1^+} P_{3,4}(s)$, in which case the limit of the sum would exist, which it doesn’t. A similar argument applies to $P_{3,4}(s)$. As a result:

Theorem 11.3.1. $\sum_{p \equiv i \pmod 4} 1/p$ diverges for i = 1, 3.

Actually these methods tell us a little bit more. Let $P(s) = \sum_p 1/p^s$, and let $Q(s) = P_{1,4}(s) - P_{3,4}(s)$. We have proved that P(s) → ∞ as s → 1+, but $\lim_{s \to 1^+} Q(s)$ exists. On the other hand we can express $P_{1,4}$ and $P_{3,4}$ in terms of these quantities: $P_{1,4}(s) = \frac{1}{2}(P(s) + Q(s) - 2^{-s})$ and $P_{3,4}(s) = \frac{1}{2}(P(s) - Q(s) - 2^{-s})$. We have
\[ \lim_{s \to 1^+} \frac{P_{1,4}(s)}{P(s)} = \lim_{s \to 1^+} \frac{P(s) + Q(s) - 2^{-s}}{2P(s)} = \frac{1}{2} \]
and
\[ \lim_{s \to 1^+} \frac{P_{3,4}(s)}{P(s)} = \lim_{s \to 1^+} \frac{P(s) - Q(s) - 2^{-s}}{2P(s)} = \frac{1}{2}. \]


For a subset S of the set of primes, we may define the function $P_S(s) = \sum_{p \in S} 1/p^s$, which is convergent (at least) for s > 1. Then the Dirichlet density of S is defined as the limit $\lim_{s \to 1^+} P_S(s)/P(s)$, if this exists. Therefore the set of all primes has density 1, while any finite set of primes has density 0. We have shown that the Dirichlet densities of the primes which are 1 (mod 4) and of the primes which are 3 (mod 4) are 1/2 each. Dirichlet’s original theorem is that for coprime integers a and m, the Dirichlet density of primes which are a (mod m) is 1/φ(m). This is what you would expect, if you figured that nature has no “bias” in distributing the primes among the classes of units modulo m.

There are other notions of density as well. We can define a function $\pi_S(x)$ to be the number of primes in S which are ≤ x, and then the natural density of S is the limit $\lim_{x \to \infty} \pi_S(x)/\pi(x)$, if this exists. One can show that if the natural density exists, then so does the Dirichlet density, and these are the same. It is known that the natural density of primes which are a (mod m) exists (and so equals 1/φ(m)).
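The natural density is easy to probe experimentally: count the primes up to x in each residue class of units modulo m and compare the shares with 1/φ(m). The sketch below does this; `class_counts` is a name invented for the illustration, and the shares are taken among the primes not dividing m, which differs from π(x) only by finitely many primes.

```python
from math import gcd, isqrt


def primes_up_to(limit):
    """Return the sorted list of primes <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for n in range(2, isqrt(limit) + 1):
        if is_prime[n]:
            is_prime[n * n :: n] = bytearray(len(range(n * n, limit + 1, n)))
    return [n for n in range(2, limit + 1) if is_prime[n]]


def class_counts(limit, m):
    """Count primes <= limit in each class a (mod m) with gcd(a, m) = 1."""
    counts = {a: 0 for a in range(m) if gcd(a, m) == 1}
    for p in primes_up_to(limit):
        if p % m in counts:
            counts[p % m] += 1
    return counts


for m in (4, 10):
    counts = class_counts(10**5, m)
    total = sum(counts.values())
    shares = {a: round(c / total, 4) for a, c in counts.items()}
    print(f"m = {m}: counts = {counts}, shares = {shares}")
```

For m = 4 the two shares are close to 1/2, and for m = 10 the four shares are close to 1/4, in line with the value 1/φ(m).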



1. The number of primes below $10^{26}$ is
\[ \pi(10^{26}) = 1,699,246,750,872,437,141,327,603. \]
The prime number theorem gives the estimate π(x) ≈ x/ log x. What is the percentage error of this approximation for $x = 10^{26}$? (The percentage error of an approximation is the difference between the true value and the approximation, divided by the true value and expressed as a percentage.)

2. Using a Euclidean argument, show that there are infinitely many primes which are 2 (mod 3).

3. (2 pts.) Using a Euclidean argument, show that there are infinitely many primes which are 1 (mod 3).

4. Let f(n) be a function whose inputs are positive integers and whose outputs are complex numbers. Then we can form the Dirichlet series $\sum_{n \geq 1} f(n)/n^s$. If f(mn) = f(m)f(n), we say that f is multiplicative. Show that if f is multiplicative, then the Dirichlet series has an Euler factorization:
\[ \sum_{n \geq 1} f(n)/n^s = \prod_p \left(1 - f(p)p^{-s}\right)^{-1}. \]
(Ignore questions of convergence in this problem.)

5. We can multiply together Dirichlet series to produce new ones. For instance,
\[ \zeta(s)^2 = \sum_{n=1}^{\infty} \frac{d(n)}{n^s}, \]
for some function d. What is the function d?

6. Similarly,
\[ \zeta(s-1)/\zeta(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \]
for some function f. What is the function f?

7. Show that ζ(n) − 1 ≤ 1/(n − 1) for n = 2, 3, . . . , by comparing the sum to an integral.

8. (2 pts.) Use the result of the previous exercise to show that
\[ \sum_p \sum_{n=2}^{\infty} \frac{1}{n p^n} \]
converges. You can replace the sum over primes with a sum over integers k ≥ 2, and show that the new sum (which is larger) still converges. This is the technical result that allows us to prove Theorem 11.1.1.

12 Continued fractions and Pell’s equation

12.1 A closer look at the Euclidean algorithm

Let’s examine what is actually going on in the extended Euclidean algorithm. Let’s start with the inputs 33 and 26:

33 = 1 · 26 + 7

26 = 3 · 7 + 5

7 = 1 · 5 + 2

5 = 2 · 2 + 1

2 = 2 · 1 + 0.

The quotients 1, 3, 1, 2, 2 then go into the table:

        1   3   1   2   2
0   1   1   4   5   14  33
1   0   1   3   4   11  26

(The rows in this table are reversed from the way we usually set things up; ultimately it doesn’t matter.) In doing so, we can find a solution to Bezout’s identity 33x + 26y = 1, namely x = −11, y = 14. But what do the other entries of the table mean? Let’s interpret them as fractions and write them in decimal to six places:

1/1 = 1.000000

4/3 = 1.333333

5/4 = 1.250000

14/11 = 1.272727

33/26 = 1.269231

It seems like the fractions are converging on the final fraction, 33/26, with each fraction being a better approximation than the last. Not only that, but the odd-numbered fractions are less than 33/26, while the even-numbered fractions are greater than it.

To see what is happening, we return to the results of the Euclidean algorithm and reinterpret it in terms of the fraction 33/26. Since 33 = 1 · 26 + 7, we have
\[ \frac{33}{26} = 1 + \frac{7}{26} = 1 + \cfrac{1}{\frac{26}{7}}. \]

Then we substitute 26 = 3 · 7 + 5 in much the same way, and continue:
\begin{align*}
\frac{33}{26} &= 1 + \cfrac{1}{3 + \cfrac{5}{7}} \\
&= 1 + \cfrac{1}{3 + \cfrac{1}{7/5}} \\
&= 1 + \cfrac{1}{3 + \cfrac{1}{1 + \cfrac{2}{5}}} \\
&= 1 + \cfrac{1}{3 + \cfrac{1}{1 + \cfrac{1}{5/2}}} \\
&= 1 + \cfrac{1}{3 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{2}}}}
\end{align*}
This is a continued fraction. Note that in the final result, all numerators are 1, so the only data that matter are the sequence of denominators 1, 3, 1, 2, 2, which are exactly the quotients appearing in the Euclidean algorithm. To save space we can write this as
\[ \frac{33}{26} = [1, 3, 1, 2, 2]. \]

What’s more, the approximants to 33/26 we found occur when we truncate the continued fraction:

1 = [1]

4/3 = [1, 3]

5/4 = [1, 3, 1]

14/11 = [1, 3, 1, 2]
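The whole computation of this section fits in a few lines of code. The sketch below (function names are made up for the illustration) collects the quotients of the Euclidean algorithm and then fills in the two rows of the table, starting from the columns (0, 1) and (1, 0).

```python
from fractions import Fraction


def continued_fraction(p, q):
    """Quotients of the Euclidean algorithm applied to p and q (both positive)."""
    quotients = []
    while q != 0:
        a, r = divmod(p, q)   # p = a * q + r
        quotients.append(a)
        p, q = q, r
    return quotients


def convergents(quotients):
    """Truncations [a0], [a0, a1], ... computed with the table recurrence."""
    p_prev, p = 0, 1   # first two entries of the top row
    q_prev, q = 1, 0   # first two entries of the bottom row
    result = []
    for a in quotients:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        result.append(Fraction(p, q))
    return result


quotients = continued_fraction(33, 26)
print(quotients)                                  # [1, 3, 1, 2, 2]
print([str(c) for c in convergents(quotients)])   # ['1', '4/3', '5/4', '14/11', '33/26']
```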


12.2 Continued fractions in the large

The theory of continued fractions extends beyond rational numbers like 33/26. Given an arbitrary real number x > 1, we can execute an algorithm which produces rational approximations to x. It goes like this: let $a_0 = \lfloor x \rfloor$. If x is an integer, the algorithm ends there: $x = [a_0]$. Otherwise let $x_1 = 1/(x - a_0)$, so that $x_1 > 1$. Then apply the same steps to $x_1$: let $a_1 = \lfloor x_1 \rfloor$. If $x_1 = a_1$, the algorithm ends there: $x = [a_0, a_1]$. Otherwise, let $x_2 = 1/(x_1 - a_1)$, etc.

After n steps, the algorithm produces the result:
\[ x = [a_0, a_1, \dots, a_{n-1}, a_n, x_{n+1}] \]
where $a_0, \dots, a_n \in \mathbf{Z}$. We can place the $a_n$ in the usual table, to produce two lists $p_n$ and $q_n$, defined recursively by
\begin{align*}
p_n &= a_n p_{n-1} + p_{n-2} \\
q_n &= a_n q_{n-1} + q_{n-2}
\end{align*}
with initial conditions $p_{-1} = q_{-2} = 1$, $p_{-2} = q_{-1} = 0$. Then
\[ \frac{p_n}{q_n} = [a_0, a_1, \dots, a_n]. \]
(We invite the reader to prove this result by induction.)

If x is rational, then this algorithm is identical to the Euclidean algorithm; it halts after finitely many steps, producing a finite continued fraction $x = [a_0, a_1, \dots, a_n]$. But if x is irrational, the algorithm will never halt, since each $[a_0, a_1, \dots, a_n]$ is clearly a rational number.

If 0 < x < 1, then x still has a continued fraction expansion; by convention we let $a_0 = 0$ for such numbers.

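Theorem 12.2.1. Let x > 0 be a real number, and let the numbers $a_n$, $p_n$ and $q_n$ be defined as above.

1. We have
\[ \frac{p_0}{q_0} < \frac{p_2}{q_2} < \cdots < x < \cdots < \frac{p_3}{q_3} < \frac{p_1}{q_1}. \]

2. $p_n q_{n-1} - p_{n-1} q_n = (-1)^{n+1}$. (With the initial conditions above, the case n = 0 reads $a_0 \cdot 0 - 1 \cdot 1 = -1$.)

3. $\left| x - \frac{p_n}{q_n} \right| < \frac{1}{q_n^2}$.

4. If x is irrational, so that the $p_n$ and $q_n$ are well-defined for all n, then $\lim_{n \to \infty} p_n/q_n = x$.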


Thus if x is irrational, it makes sense to say that

x = [a0, a1, · · · ]

is the (infinite) continued fraction expansion for x.

Proof. Part 1 follows from the following observation: for positive numbers $y_1, y_2, \dots, y_n$, we have that $[y_1, \dots, y_n]$ is larger than $[y_1, \dots, y_{n-1}]$ if n is even, and is smaller if n is odd. (Think about why this is true! The case n = 2 is $y_1 + 1/y_2 > y_1$.) Part 2 can be proved by routine induction.

Part 3 follows from parts 1 and 2:
\[ \left| x - \frac{p_n}{q_n} \right| \leq \left| \frac{p_n}{q_n} - \frac{p_{n+1}}{q_{n+1}} \right| = \frac{1}{q_n q_{n+1}} < \frac{1}{q_n^2}, \]
where in the last step we used the fact that $q_{n+1} > q_n$. Finally, part 4 follows from part 3, since $q_n \to \infty$ as n → ∞.

In fact we can estimate how well the $p_n/q_n$ approximate x. If $F_n$ is the nth Fibonacci number, then $q_n \geq F_n$ (induction once again!). On the other hand $F_n$ grows like $\varphi^n$ (up to a constant factor), where φ = 1.618 . . . is the golden ratio. Thus part 3 of the theorem shows that the accuracy of $p_n/q_n$ as an approximation to x is exponential, meaning that the number of correct digits grows at least linearly with n.

For instance, the continued fraction expansion of π is

π = [3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, . . . ]

The approximation 22/7 = [3, 7] is a well-known approximation to π which dates back at least to Archimedes. The better approximation 355/113 = [3, 7, 15, 1] is correct to six places after the decimal point:
\[ \frac{355}{113} = 3.14159292\ldots. \]
This approximation was known to Chinese mathematicians in the 5th century.
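The algorithm of this section can be run on a computer, with one caveat: a machine only stores a rational approximation to π. The sketch below therefore takes the double-precision value `math.pi`, converts it to an exact fraction, and expands that. This is an assumption of the illustration, and it means only the first several terms can be trusted to agree with those of π itself. The function names are made up.

```python
import math
from fractions import Fraction


def cf_terms(x, max_terms):
    """First terms a_0, a_1, ... of the continued fraction of a Fraction x > 0."""
    terms = []
    for _ in range(max_terms):
        a = math.floor(x)
        terms.append(a)
        if x == a:           # x was an integer: the expansion ends here
            break
        x = 1 / (x - a)      # x_{k+1} = 1 / (x_k - a_k), which is > 1
    return terms


def convergents(terms):
    """Pairs (p_n, q_n) from the recurrence p_n = a_n p_{n-1} + p_{n-2}, same for q."""
    p_prev, p = 0, 1         # p_{-2}, p_{-1}
    q_prev, q = 1, 0         # q_{-2}, q_{-1}
    result = []
    for a in terms:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        result.append((p, q))
    return result


x = Fraction(math.pi)        # exact value of the stored double, not of pi
terms = cf_terms(x, 5)
print(terms)
for p, q in convergents(terms):
    error = abs(x - Fraction(p, q))
    print(f"{p}/{q}: error = {float(error):.3e}, bound 1/q^2 = {1 / q**2:.3e}")
```

Each error printed is well inside the bound $1/q_n^2$ from part 3 of Theorem 12.2.1, and the jump in accuracy at 355/113 reflects the large quotient 292 that follows it.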

12.3 Real quadratic irrationals and their continued fractions

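Let’s find the continued fraction expansion for $\sqrt{2}$. We start by observing that $\lfloor \sqrt{2} \rfloor = 1$, and continue as follows:
\begin{align*}
\sqrt{2} &= 1 + (-1 + \sqrt{2}) \\
(-1 + \sqrt{2})^{-1} &= 1 + \sqrt{2} = 2 + (-1 + \sqrt{2}) \\
(-1 + \sqrt{2})^{-1} &= 1 + \sqrt{2} = 2 + (-1 + \sqrt{2})
\end{align*}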

and then we immediately notice we are trapped in a loop! The continued fraction expansion for $\sqrt{2}$ is [1, 2, 2, . . . ], which we abbreviate as $[1, \overline{2}]$.

We can use our usual table to come up with good rational approximations to $\sqrt{2}$:

        1   2   2   2   · · ·
0   1   1   3   7   17  · · ·
1   0   1   2   5   12  · · ·

(Some of these approximations were known to the ancient Babylonians and Indians.) Interestingly, if p/q is the nth approximation obtained this way, then $p^2 - 2q^2 = (-1)^n$.

Let’s do the example of $\sqrt{7}$ to convince ourselves that this isn’t a fluke. We proceed:
\begin{align*}
\sqrt{7} &= 2 + (-2 + \sqrt{7}) \\
(-2 + \sqrt{7})^{-1} &= (2 + \sqrt{7})/3 = 1 + (-1 + \sqrt{7})/3 \\
3/(-1 + \sqrt{7}) &= (1 + \sqrt{7})/2 = 1 + (-1 + \sqrt{7})/2 \\
2/(-1 + \sqrt{7}) &= (1 + \sqrt{7})/3 = 1 + (-2 + \sqrt{7})/3 \\
3/(-2 + \sqrt{7}) &= 2 + \sqrt{7} = 4 + (-2 + \sqrt{7})
\end{align*}
Since this remainder has appeared before, we can deduce that the continued fraction expansion for $\sqrt{7}$ is periodic as well:
\[ \sqrt{7} = [2, \overline{1, 1, 1, 4}]. \]

Let’s make another table, this time including the value of $p^2 - 7q^2$:

                      2    1    1    1    4    1    1    · · ·
p            0   1    2    3    5    8    37   45   82   · · ·
q            1   0    1    1    2    3    14   17   31   · · ·
p^2 − 7q^2   −7  1    −3   2    −3   1    −3   2    −3

We observe here that the value of $p^2 - 7q^2$ seems rather small relative to p and q.

Some of these observations can be summarized in the following theorem, which is due to Lagrange:

Theorem 12.3.1. Let α > 0 be a real number. The following statements are equivalent:

1. The continued fraction expansion of α is (eventually) periodic, that is:
\[ \alpha = [a_0, a_1, \dots, a_m, \overline{b_1, b_2, \dots, b_n}]. \]

2. α is a quadratic irrational: there exist rational numbers r, s, and d, with s ≠ 0 and d > 0 not the square of a rational number, such that $\alpha = r + s\sqrt{d}$.

12.4 Pell’s equation and Z[√d]

Let d > 0 be an integer which is not a perfect square. We define $\mathbf{Z}[\sqrt{d}]$ in the same way we did when d was negative: it is the set of numbers of the form $a + b\sqrt{d}$, where a, b ∈ Z. This is closed under addition, subtraction and multiplication. Of course, $\mathbf{Z}[\sqrt{d}]$ consists of real numbers. This makes it difficult to picture it in a nice way: if we plotted elements of $\mathbf{Z}[\sqrt{d}]$ as dots on the real number line, the dots would “accumulate” and crowd each other out, rather than appearing as an orderly lattice in the complex plane.

Nonetheless, we can copy some of the methods we used in the complex case. To wit, we can define the norm of an element of $\mathbf{Z}[\sqrt{d}]$ by
\[ N(a + b\sqrt{d}) = (a + b\sqrt{d})(a - b\sqrt{d}) = a^2 - db^2. \]
Note that this may well be negative! It is still the case that N(αβ) = N(α)N(β), and so the norm function retains its importance for studying units, factorization, and primes.

Lemma 12.4.1. An element $\alpha \in \mathbf{Z}[\sqrt{d}]$ is a unit if and only if N(α) = ±1.

Proof. If αβ = 1, then N(α)N(β) = 1, so of course N(α) = ±1. Conversely, if $\alpha = a + b\sqrt{d}$ has N(α) = ±1, then $\alpha \cdot (\pm(a - b\sqrt{d})) = 1$ for the matching sign, so α is a unit.

Thus $\alpha = x + y\sqrt{d}$ (with x, y ∈ Z) is a unit if and only if $x^2 - dy^2 = \pm 1$. We remark here that the Diophantine equation
\[ a^2 - db^2 = 1 \]
is called Pell’s equation.

In the case d = 2, it is easy to see that $u = 1 + \sqrt{2}$ is a unit. Therefore so are its powers $u^2, u^3, \dots$; we conclude that $\mathbf{Z}[\sqrt{2}]$ has infinitely many units. The first few powers of u are
\begin{align*}
u &= 1 + \sqrt{2} \\
u^2 &= 3 + 2\sqrt{2} \\
u^3 &= 7 + 5\sqrt{2} \\
u^4 &= 17 + 12\sqrt{2}.
\end{align*}
Interestingly, the coefficients appearing here are exactly the numbers appearing in our table of approximations for $\sqrt{2}$!

Does there always exist a unit in $\mathbf{Z}[\sqrt{d}]$ other than ±1? Examination of some small values of d seems to indicate that it does:

d    unit in $\mathbf{Z}[\sqrt{d}]$
2    $1 + \sqrt{2}$
3    $2 + \sqrt{3}$
5    $2 + \sqrt{5}$
6    $5 + 2\sqrt{6}$
7    $8 + 3\sqrt{7}$

(Here we have listed the unit $a + b\sqrt{d}$ with the smallest positive value of b.)

However, if we continued far enough we might find some erratic behavior. $\mathbf{Z}[\sqrt{60}]$ has $31 + 4\sqrt{60}$, which is simple enough, but the entry in our table for d = 61 would be
\[ 29718 + 3805\sqrt{61}, \]
which is rather too large to obtain by hand.

Our observations so far suggest that units in $\mathbf{Z}[\sqrt{d}]$ are strongly related to the approximations to $\sqrt{d}$ coming from its continued fraction expansion. For instance, the large size of the unit in $\mathbf{Z}[\sqrt{61}]$ is “explained” by the length of the periodic part of the continued fraction expansion
\[ \sqrt{61} = [7, \overline{1, 4, 3, 1, 2, 2, 1, 3, 4, 1, 14}]. \]
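The link between units and continued fractions can be tested by machine. The sketch below rests on two standard facts that are not proved in these notes and should be read as assumptions: the quotients of $\sqrt{d}$ can be generated by an integer-only recurrence on triples (m, k, a), and the period ends at the first quotient equal to $2a_0$. The function names are invented for the illustration. The function `smallest_unit` walks through the convergents p/q and returns the first one with $p^2 - dq^2 = \pm 1$.

```python
from math import isqrt


def sqrt_cf_terms(d):
    """Yield the quotients a_0, a_1, ... of sqrt(d) for a non-square integer d > 0."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, k, a = 0, 1, a0       # current remainder is (sqrt(d) + m) / k
    while True:
        yield a
        m = k * a - m
        k = (d - m * m) // k
        a = (a0 + m) // k


def sqrt_cf_period(d):
    """Return (a_0, period) so that sqrt(d) = [a_0, period, period, ...]."""
    terms = sqrt_cf_terms(d)
    a0 = next(terms)
    period = []
    for a in terms:
        period.append(a)
        if a == 2 * a0:      # assumed standard fact: this marks the end of the period
            return a0, period


def smallest_unit(d):
    """First convergent p/q of sqrt(d) with p^2 - d q^2 = +1 or -1, as (p, q)."""
    p_prev, p = 0, 1
    q_prev, q = 1, 0
    for a in sqrt_cf_terms(d):
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        if abs(p * p - d * q * q) == 1:
            return p, q


print(sqrt_cf_period(7))     # (2, [1, 1, 1, 4])
for d in (2, 3, 5, 6, 7, 60, 61):
    p, q = smallest_unit(d)
    print(f"d = {d}: unit {p} + {q} sqrt({d}), norm {p * p - d * q * q}")
```

The output reproduces the table of units above, including the large entry for d = 61, which only appears after running through the whole period of length 11.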

12.5 The fundamental unit

The following theorem ties everything together.

Theorem 12.5.1. Let d > 0 be an integer which is not a perfect square. Then there exists a unit $u \in \mathbf{Z}[\sqrt{d}]$ which is not ±1. Furthermore, this unit can be chosen in such a way that every unit in $\mathbf{Z}[\sqrt{d}]$ is of the form $\pm u^n$ for some n ∈ Z.

The unit u appearing in the theorem is the fundamental unit in $\mathbf{Z}[\sqrt{d}]$. In the language of abstract algebra, we would say that the unit group of $\mathbf{Z}[\sqrt{d}]$ is isomorphic to Z × Z/2Z. Note that this theorem shows that Pell’s equation $x^2 - dy^2 = 1$ always has infinitely many solutions: even if N(u) = −1, every even power of u has norm 1. (It is not the case that $x^2 - dy^2 = -1$ always has solutions, though; this is a subtle problem.)

Proof. There are two parts to this theorem: the existence of a unit u ≠ ±1, and then the statement that every unit arises from a fundamental unit.

We start with Theorem 12.2.1, which gives us an infinite supply of pairs of positive integers (p, q) with $\left| \sqrt{d} - p/q \right| < 1/q^2$. For such a pair we have $p/q < \sqrt{d} + 1$, and so
\[ \left| p^2 - dq^2 \right| = q^2 \left| \frac{p}{q} - \sqrt{d} \right| \left( \frac{p}{q} + \sqrt{d} \right) \leq q^2 \cdot \frac{1}{q^2} \left( 2\sqrt{d} + 1 \right) = 2\sqrt{d} + 1. \]

Thus $p^2 - dq^2$ takes on only finitely many values. By the pigeonhole principle, at least one value gets repeated infinitely often. That is, we can find an integer m and infinitely many pairs (p, q) with
\[ p^2 - dq^2 = m. \]

In fact we need to apply the pigeonhole principle one more time. Each pair (p, q), when considered modulo m, falls into one of $m^2$ possible cases. Therefore there exist residues $p_0$ and $q_0$ modulo m, and infinitely many pairs (p, q) of positive integers satisfying $p^2 - dq^2 = m$, $p \equiv p_0 \pmod m$, and $q \equiv q_0 \pmod m$. Let (p, q) and (p′, q′) be two distinct such pairs. Let $\alpha = p + q\sqrt{d}$ and $\alpha' = p' + q'\sqrt{d}$. Then let
\begin{align*}
u = \frac{\alpha}{\alpha'} &= \frac{1}{m}(p + q\sqrt{d})(p' - q'\sqrt{d}) \\
&= \frac{1}{m}(pp' - qq'd) + \frac{1}{m}(p'q - pq')\sqrt{d}.
\end{align*}
Since $pp' - qq'd \equiv p^2 - dq^2 \equiv 0 \pmod m$ and $p'q - pq' \equiv pq - pq \equiv 0 \pmod m$, the element u lies in $\mathbf{Z}[\sqrt{d}]$. Furthermore, N(u) = N(α)/N(α′) = m/m = 1, so that u is a unit. Since (p, q) ≠ (p′, q′), u ≠ 1, and since p, p′, q, q′ > 0, u > 0 and so u ≠ −1.

We have therefore shown that there exists a unit $u = p + q\sqrt{d}$ with p, q > 0 (replacing the unit just found by one of $\pm u$, $\pm u^{-1}$ if necessary). Let u be the least such unit. (This exists by the well-ordering principle!) Now if v is another unit, we claim that $v = \pm u^n$ for some integer n. After replacing v with −v, we may assume that v > 0. Then there exists n ∈ Z such that $u^n \leq v < u^{n+1}$: this is essentially because $\lim_{n \to -\infty} u^n = 0$ and $\lim_{n \to \infty} u^n = \infty$. Let $w = u^{-n}v$, so that 1 ≤ w < u.

Let us write $w = a + b\sqrt{d}$. If N(w) = 1, then $w^{-1} = a - b\sqrt{d}$, so that $2a = w + w^{-1} \geq 2$ and therefore a > 0. Similarly $2b\sqrt{d} = w - w^{-1} \geq 0$ (since w ≥ 1), so that b ≥ 0. But u was assumed to be the least unit > 1 with positive coefficients; therefore w = 1, and hence $v = u^n$.

The argument in the case that N(w) = −1 is similar: we have $w^{-1} = -a + b\sqrt{d}$, so that $2a = w - w^{-1}$ and $2b\sqrt{d} = w + w^{-1}$; in any case both of these are nonnegative.


12.6 The question of unique factorization for Z[√d]

We briefly touch upon the subtle topic of unique factorization in $\mathbf{Z}[\sqrt{d}]$, where d > 0 is not a square. We first observe that the existence of many units makes it harder to tell factors apart. For instance, 7 has the following two factorizations in $\mathbf{Z}[\sqrt{2}]$:
\[ 7 = (3 + \sqrt{2})(3 - \sqrt{2}) = (-1 + 2\sqrt{2})(1 + 2\sqrt{2}). \]
But $3 + \sqrt{2}$ is associate with $-1 + 2\sqrt{2}$, since
\[ \frac{3 + \sqrt{2}}{-1 + 2\sqrt{2}} = 1 + \sqrt{2} \]
is a unit.

In fact $\mathbf{Z}[\sqrt{2}]$ has the property of unique factorization into primes, and we can prove it the same way as usual: it has a division algorithm.

Theorem 12.6.1. Let $\alpha, \beta \in \mathbf{Z}[\sqrt{2}]$ with β ≠ 0. There exist $\gamma, \delta \in \mathbf{Z}[\sqrt{2}]$ such that α = βγ + δ and |N(δ)| < |N(β)|.

(Note the presence of the absolute value signs: these are important, because norms in $\mathbf{Z}[\sqrt{2}]$ can be negative.)

Proof. Write $\alpha/\beta = a_1 + a_2\sqrt{2}$, where $a_1, a_2 \in \mathbf{Q}$. Let $q_1$ and $q_2$ be the integers nearest to $a_1$ and $a_2$ respectively. Then if $r_i = a_i - q_i$, we have $|r_i| \leq 1/2$. Let $\gamma = q_1 + q_2\sqrt{2}$ and $\delta = \alpha - \beta\gamma = \beta(r_1 + r_2\sqrt{2})$. We have
\[ |N(\delta)/N(\beta)| = \left| r_1^2 - 2r_2^2 \right| \leq r_1^2 + 2r_2^2 \leq \frac{3}{4} < 1, \]
so that |N(δ)| < |N(β)|.

There is also a division algorithm for $\mathbf{Z}[\sqrt{3}]$, but not for $\mathbf{Z}[\sqrt{5}]$. In fact $\mathbf{Z}[\sqrt{5}]$ lacks unique factorization into primes; a counterexample is
\[ 4 = 2 \cdot 2 = -(1 + \sqrt{5})(1 - \sqrt{5}). \]
(You should confirm that 2 and $1 \pm \sqrt{5}$ cannot be factored into nonunits.) This time, 2 really does not divide $1 \pm \sqrt{5}$, simply because the quotients $(1 \pm \sqrt{5})/2$ do not lie in $\mathbf{Z}[\sqrt{5}]$.

In this particular case the problem can be remedied simply by throwing in the element $\varphi = (1 + \sqrt{5})/2$ to form the new ring $\mathbf{Z}[\varphi]$, the set of real numbers of the form a + bφ, with a, b ∈ Z. This is closed under multiplication, since $\varphi^2 = 1 + \varphi$. Then in fact elements of $\mathbf{Z}[\varphi]$ can be factored into primes, uniquely up to units; the positive units are exactly the powers of φ.




1. Express 103/71 as a continued fraction.

2. Evaluate [2, 1, 8, 1] as a rational number.

3. Express $\sqrt{11}$ as a continued fraction.

4. Find three solutions to $x^2 - 11y^2 = 1$ in positive integers x, y.

5. Write [3, 4, 1] in closed form.

6. $\mathbf{Z}[\sqrt{2}]$ has the property of unique factorization into primes. Using this, show that an odd prime p can be written as $a^2 - 2b^2$ if and only if $\left(\frac{2}{p}\right) = 1$.

7. Factor $23 + 10\sqrt{2}$ into primes in $\mathbf{Z}[\sqrt{2}]$.

8. $\mathbf{Z}[\sqrt{10}]$ does not have the property of unique factorization. Prove this as follows: show that 2 cannot be factored as the product of two non-units, so it is a prime in $\mathbf{Z}[\sqrt{10}]$. From this we see that $10 = 2 \cdot 5 = \sqrt{10}\sqrt{10}$ is a factorization of 10 in two different ways.

9. True or false: a prime p ≠ 2, 5 can be written as $a^2 - 10b^2$ if and only if $\left(\frac{10}{p}\right) = 1$.

10. Does $x^2 - 21y^2 = -1$ have a solution in integers x, y?

13 Lagrange’s four square theorem

We have fully answered the question of which integers are expressible as the sum of two perfect squares. So the next natural question is to replace two squares by three, or four, or more.

A little arithmetic reveals that some integers (like 7 or 15) are not expressible as the sum of three squares, but they all seem to be expressible as the sum of four squares. Here are the first few integers n which are not expressible as the sum of two squares, but which are nonetheless sums of four:

n     a^2 + b^2 + c^2 + d^2
3     1^2 + 1^2 + 1^2 + 0^2
6     2^2 + 1^2 + 1^2 + 0^2
7     2^2 + 1^2 + 1^2 + 1^2
11    3^2 + 1^2 + 1^2 + 0^2
12    2^2 + 2^2 + 2^2 + 0^2
14    3^2 + 2^2 + 1^2 + 0^2
15    3^2 + 2^2 + 1^2 + 1^2
19    4^2 + 1^2 + 1^2 + 1^2

The goal of this final section is to prove the following theorem.

Theorem 13.0.1 (Lagrange, 1770). Every positive integer can be written as the sum of four perfect squares.

There are a few standard approaches to proving this theorem, all of which are quite interesting:

1. Lagrange’s original proof by descent,

2. Jacobi’s proof using modular forms,

3. Minkowski’s geometry of numbers,

4. Hurwitz’s system of integral quaternions.

We will follow the last approach, since it is most in line with themes we have encountered so far.
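Before proving the theorem it is worth checking it by brute force for small n. The sketch below is not one of the four proofs listed above; it simply searches over a ≥ b ≥ c ≥ d ≥ 0, which loses nothing because any representation can be reordered. The function name is made up.

```python
from math import isqrt


def four_squares(n):
    """Return (a, b, c, d) with a >= b >= c >= d >= 0 and a^2 + b^2 + c^2 + d^2 = n."""
    for a in range(isqrt(n), -1, -1):
        rest_a = n - a * a
        for b in range(min(a, isqrt(rest_a)), -1, -1):
            rest_b = rest_a - b * b
            for c in range(min(b, isqrt(rest_b)), -1, -1):
                rest_c = rest_b - c * c
                d = isqrt(rest_c)
                if d <= c and d * d == rest_c:
                    return (a, b, c, d)
    return None   # by Lagrange's theorem this is never reached for n >= 0


for n in (3, 6, 7, 11, 12, 15, 19):
    print(n, four_squares(n))
```

The search finds some representation, not necessarily the one shown in the table, since many integers have several.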

13.1 Hamiltonian quaternions

We are all familiar with the real numbers R and the complex numbers C. These are fields, meaning that they are systems of numbers admitting laws of addition, subtraction, multiplication and division. The addition and multiplication laws are required to be commutative: a + b = b + a and ab = ba. When the commutativity constraint on multiplication is lifted, it turns out one can form a further extension of the complex numbers, known as the quaternions.

Definition 13.1.1. A (Hamiltonian) quaternion is a formal sum of the form a + bi + cj + dk, where a, b, c, d ∈ R. The set of all quaternions is denoted H. Addition in H is componentwise, and multiplication in H is determined by the distributive law, the associative law, and by the rules
\[ ij = k = -ji, \quad jk = i = -kj, \quad ki = j = -ik; \quad i^2 = j^2 = k^2 = -1. \]


The first three equations work the same way as unit vectors i, j, k under the cross product in $\mathbf{R}^3$. But unlike $\mathbf{R}^3$, we are allowed to add together scalars and vectors to produce results like 2 + 3i − 4j. You should be able to follow along with calculations like
\[ (2 + 3i - 4j)(1 - j) = -2 + 3i - 6j - 3k, \]
and if you like you can carefully note where the distributive and associative laws are used.

The first observation we make is that multiplication in H is not commutative; indeed ij = k but ji = −k. This means we must be very careful about order when we multiply quaternions. However, a real number a (interpreted as a + 0i + 0j + 0k) does commute with every quaternion.

Much like the complex numbers, the quaternions are equipped with a conjugation operation, which turns α = a + bi + cj + dk into $\overline{\alpha} = a - bi - cj - dk$, as well as a norm $N(\alpha) = a^2 + b^2 + c^2 + d^2$.

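Lemma 13.1.2. The following statements are true in H.

1. For all α, β ∈ H, $\overline{\alpha\beta} = \overline{\beta}\,\overline{\alpha}$.

2. $\alpha\overline{\alpha} = \overline{\alpha}\alpha = a^2 + b^2 + c^2 + d^2$.

3. If α ≠ 0, then N(α) ≠ 0. If we let $\alpha^{-1} = N(\alpha)^{-1}\overline{\alpha}$, then $\alpha\alpha^{-1} = \alpha^{-1}\alpha = 1$.

4. N(αβ) = N(α)N(β).

Proof. 1 and 2 are routine calculations, and 3 follows from
\[ \alpha\alpha^{-1} = \alpha N(\alpha)^{-1}\overline{\alpha} = N(\alpha)^{-1}\alpha\overline{\alpha} = N(\alpha)N(\alpha)^{-1} = 1 \]
(and similarly for the other order of multiplication); note that N(α) is a scalar and so it commutes with everything. Part 4 comes from
\[ N(\alpha\beta) = \alpha\beta\,\overline{\alpha\beta} = \alpha\beta\overline{\beta}\,\overline{\alpha} = \alpha N(\beta)\overline{\alpha} = N(\alpha)N(\beta). \]

The third part of the lemma tells us that nonzero quaternions have multiplicative inverses, and so we can divide by them. However one must be careful about order. It is not advised to write α/β, since this is ambiguous. Instead, write $\beta^{-1}\alpha$ or $\alpha\beta^{-1}$.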


The fourth part of the lemma is quite interesting. It says that the product of two sums of four squares is another sum of four squares:
\[ (a^2 + b^2 + c^2 + d^2)(e^2 + f^2 + g^2 + h^2) = w^2 + x^2 + y^2 + z^2, \tag{13.1.1} \]
where
\begin{align*}
w &= ae - bf - cg - dh \\
x &= af + be + ch - dg \\
y &= ag - bh + ce + df \\
z &= ah + bg - cf + de.
\end{align*}

(We have also a similar formula for the product of two sums of two squares, coming from multiplication in C. Is there a formula like this expressing the product of two sums of n squares as another sum of n squares? It turns out the answer is yes for n = 1, 2, 4, 8, but no otherwise! This is one reason why an analysis of sums of four squares is tractable for this course, but the same analysis for sums of three squares is much harder.)

The quaternions were introduced in 1843 by Hamilton, who later developed applications to analysis and physics. They are especially useful for describing arbitrary rotations of a sphere. But for us, the main application will be to number theory.
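Quaternion arithmetic is easy to experiment with. In the sketch below a quaternion a + bi + cj + dk is stored as the tuple (a, b, c, d), a representation chosen only for this illustration, and the product is exactly the formula for (w, x, y, z) in (13.1.1). The function names are made up.

```python
def qmul(x, y):
    """Product of quaternions x = a + bi + cj + dk and y = e + fi + gj + hk."""
    a, b, c, d = x
    e, f, g, h = y
    return (a * e - b * f - c * g - d * h,
            a * f + b * e + c * h - d * g,
            a * g - b * h + c * e + d * f,
            a * h + b * g - c * f + d * e)


def qconj(x):
    """Conjugate a - bi - cj - dk."""
    a, b, c, d = x
    return (a, -b, -c, -d)


def qnorm(x):
    """Norm a^2 + b^2 + c^2 + d^2."""
    return sum(t * t for t in x)


i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(qmul(i, j), qmul(j, i))                  # k and -k: multiplication is not commutative
print(qmul((2, 3, -4, 0), (1, 0, -1, 0)))      # (-2, 3, -6, -3), the example in the text

alpha, beta = (1, 2, 3, 4), (5, -6, 7, 8)
print(qnorm(qmul(alpha, beta)) == qnorm(alpha) * qnorm(beta))   # part 4 of the lemma
```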

13.2 The Lipschitz quaternions

Definition 13.2.1. Let L be the set of Lipschitz quaternions; these are the quaternions a + bi + cj + dk with a, b, c, d ∈ Z.

Then L is closed under addition, subtraction and multiplication. It is easy to see why we might be interested in L: an integer n is the sum of four squares exactly when n = N(α) for some α ∈ L. The multiplicativity of the norm tells us that in order to prove Lagrange’s theorem, it is enough to show that every prime number p is a norm from L.

A unit in L would be a nonzero element u ∈ L such that $u^{-1}$ is also in L. Since $N(u)N(u^{-1}) = 1$, we can deduce that units in L are exactly those elements of norm 1, namely the eight elements ±1, ±i, ±j, ±k.

We might want to proceed as we have for Z[i], when we showed that every prime p ≡ 1 (mod 4) is the norm of a Gaussian integer. Thus our first instinct might be to see whether L has a division algorithm. That is, given α, β ∈ L with β ≠ 0, is it possible to write α = βq + r, where q, r ∈ L and N(r) < N(β)? This is the same as asking whether $\beta^{-1}\alpha \in$ H can be approximated by an element q ∈ L, which is close enough so that $N(\beta^{-1}\alpha - q) < 1$.

We can try to think about this geometrically, although admittedly it is difficult to think in four dimensions! Let us imagine H as a four-dimensional space, with the distance between quaternions α and β given by the formula $\sqrt{N(\alpha - \beta)}$. The points of L determine a collection of four-dimensional “hypercubes”. The distance from one corner of such a hypercube to the opposite corner is $\sqrt{1^2 + 1^2 + 1^2 + 1^2} = 2$. Thus the distance from the center of the hypercube to each corner is 1. This is bad news for us. If $\beta^{-1}\alpha$ happens to land exactly in the center of a hypercube, then the nearest q ∈ L (there are sixteen of these!) are a full distance 1 away. This won’t quite work: we need the remainder N(r) to be strictly less than N(β), in order for the Euclidean algorithm to be guaranteed to halt.

13.3 The Hurwitz quaternions

We were in a similar situation before: $\mathbf{Z}[\sqrt{-3}]$ formed a rectangular lattice in C where the distance from the center of a rectangle to the corners was 1. By adding in the centers of the rectangles, we arrived at an enlargement Z[ω], in which the division algorithm was saved.

We can do something similar to L.

Definition 13.3.1. Let H be the set of Hurwitz quaternions: these are thequaternions of the form a + bi + cj + dk, where the a, b, c, d are either allintegers, or all half of an odd integer.

Thus 1 + i and (3 − i + 5j − 9k)/2 lie in H, but 1 + 3i/2 does not.

Lemma 13.3.2. H is closed under addition, subtraction and multiplication. If α ∈ H, then N(α) is a nonnegative integer.

Proof. The claims about closure can be proved by inspection, though the details might be tedious. Here’s a shortcut: every Hurwitz quaternion is either in L or else it equals α + ω, where α ∈ L and

ω = (−1 + i + j + k)/2.

Now ω + ω̄ = −1 and ωω̄ = N(ω) = 1, so that ω² = ω(−1 − ω̄) = −ω − 1; that is, ω² + ω + 1 = 0. This implies that ω³ = 1. In this regard it is like the element ω belonging to the Eisenstein integers Z[ω]. Note the following interactions: iω = ω − i − j, ωi = ω − i − k, jω = ω − j − k, ωj = ω − i − j, kω = ω − i − k, ωk = ω − j − k. These show that when ω is multiplied on the right or left by any element of L, the result still belongs to H.

Now one only has to check that if we have two Hurwitz quaternions of the form α, β + ω or α + ω, β + ω (with α, β ∈ L), then the sum and product of those elements belong to H again. This is now quite easy; for instance we have

(α + ω)(β + ω) = αβ + αω + ωβ + ω² = (αβ − 1) + αω + ωβ − ω,

which by our observations still belongs to H.

For the final claim one just has to notice that if a, b, c, d are odd then a² + b² + c² + d² is divisible by 4, so that N((a + bi + cj + dk)/2) = (a² + b² + c² + d²)/4 is an integer.
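The identities used in this proof are easy to confirm by machine. The sketch below (helper names are ours; quaternions are tuples (a, b, c, d) with exact `Fraction` coordinates) checks ω² + ω + 1 = 0, ω³ = 1, the six interactions of ω with i, j, k, and that one sample product of the form (α + ω)(β + ω) lands in H. It is a spot check, not a substitute for the argument.

```python
from fractions import Fraction


def qmul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qadd(p, q):
    return tuple(a + b for a, b in zip(p, q))


def qsub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def is_hurwitz(q):
    """All coordinates integers, or all coordinates halves of odd integers."""
    denominators = {Fraction(c).denominator for c in q}
    return denominators == {1} or denominators == {2}


half = Fraction(1, 2)
one, i, j, k = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
omega = (-half, half, half, half)

omega_sq = qmul(omega, omega)
relation = qadd(qadd(omega_sq, omega), one)   # omega^2 + omega + 1
omega_cubed = qmul(omega_sq, omega)

# Each pair is (left-hand side, right-hand side) of one stated interaction.
interactions = [
    (qmul(i, omega), qsub(qsub(omega, i), j)),
    (qmul(omega, i), qsub(qsub(omega, i), k)),
    (qmul(j, omega), qsub(qsub(omega, j), k)),
    (qmul(omega, j), qsub(qsub(omega, i), j)),
    (qmul(k, omega), qsub(qsub(omega, i), k)),
    (qmul(omega, k), qsub(qsub(omega, j), k)),
]

# An arbitrary sample with alpha, beta in L.
alpha, beta = (2, -1, 0, 3), (1, 4, -2, 0)
sample = qmul(qadd(alpha, omega), qadd(beta, omega))

print(relation == (0, 0, 0, 0), omega_cubed == (1, 0, 0, 0))
print(all(lhs == rhs for lhs, rhs in interactions), is_hurwitz(sample))
```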

The units in H are exactly those elements of norm 1. There are 24 of these: 8 are the units from L, and the other 16 are of the form (±1 ± i ± j ± k)/2.
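As a quick check of this count, the following sketch (our own helper names, exact `Fraction` arithmetic) lists the 24 candidates, confirms that each has norm 1, and verifies that the product of any two of them is again in the list.

```python
from fractions import Fraction
from itertools import product


def qmul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def norm(q):
    return sum(c * c for c in q)


half = Fraction(1, 2)

# The eight units of L: +-1, +-i, +-j, +-k.
lipschitz_units = []
for position in range(4):
    for sign in (1, -1):
        coords = [Fraction(0)] * 4
        coords[position] = Fraction(sign)
        lipschitz_units.append(tuple(coords))

# The sixteen units (+-1 +- i +- j +- k)/2.
half_units = [
    tuple(sign * half for sign in signs)
    for signs in product((1, -1), repeat=4)
]

units = set(lipschitz_units + half_units)
all_norm_one = all(norm(u) == 1 for u in units)
closed = all(qmul(u, v) in units for u in units for v in units)

print(len(units), all_norm_one, closed)  # 24 True True
```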

Theorem 13.3.3. Given α, β ∈ H with β ≠ 0, there exist q, r ∈ H such that α = βq + r and N(r) < N(β).

Proof. Let x = β⁻¹α. Then the distance from x to the nearest element of H must be less than 1: it was already distance at most 1 from the nearest element of L, and if the distance is exactly 1, then x lies in the center of its hypercube, which means it already lies in H. Thus there exists q ∈ H such that N(β⁻¹α − q) < 1. Multiply through by N(β) to obtain N(α − βq) < N(β), and set r = α − βq.
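The proof translates directly into a procedure. Here is a minimal sketch, assuming quaternions are stored as 4-tuples; the names `nearest_hurwitz` and `divide` are ours. It computes x = β⁻¹α exactly, rounds x both to the nearest point with integer coordinates and to the nearest point with half-odd coordinates, and keeps whichever is closer.

```python
from fractions import Fraction
from math import floor


def qmul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def norm(q):
    return sum(c * c for c in q)


def conj(q):
    a, b, c, d = q
    return (a, -b, -c, -d)


def nearest_hurwitz(x):
    """Closest Hurwitz quaternion to x (ties go to the integer candidate)."""
    ints = tuple(Fraction(floor(c + Fraction(1, 2))) for c in x)
    halves = tuple(floor(c) + Fraction(1, 2) for c in x)

    def dist(q):
        return norm(tuple(a - b for a, b in zip(x, q)))

    return ints if dist(ints) <= dist(halves) else halves


def divide(alpha, beta):
    """Return (q, r) in H with alpha = beta*q + r and N(r) < N(beta)."""
    alpha = tuple(Fraction(c) for c in alpha)
    beta = tuple(Fraction(c) for c in beta)
    n = norm(beta)
    # beta^{-1} = conj(beta) / N(beta), so x = conj(beta) * alpha / N(beta).
    x = tuple(c / n for c in qmul(conj(beta), alpha))
    q = nearest_hurwitz(x)
    r = tuple(a - b for a, b in zip(alpha, qmul(beta, q)))
    return q, r


q, r = divide((7, 3, -2, 5), (2, 1, 0, -1))
print(q, r)

# The bad case for L: beta^{-1} alpha is the center of a hypercube.
q2, r2 = divide((1, 1, 1, 1), (2, 0, 0, 0))
print(q2, r2)
```

In the second example the quotient has half-odd coordinates and the remainder is 0, which is precisely the case that L alone could not handle.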

We can now proceed to investigate the arithmetic of H as we have done before, but we have to be extra careful about the order of multiplication. For instance, it is ambiguous to write α|β: does this mean there exists γ with β = αγ, or β = γα? In the first case we say that α is a left divisor of β, and in the second case we say that α is a right divisor of β. However, if n is a rational (i.e., scalar) integer, then n|β is unambiguous: it means that there exists γ ∈ H with β = γn = nγ.

The sets of left and right divisors of a given quaternion might be different! However, note that units u are both left and right divisors of every element, since α = αu⁻¹u and α = uu⁻¹α.

Theorem 13.3.4. Suppose α and β have no common left divisors other than units. Then there exist x, y ∈ H such that αx + βy = 1.

Proof. Consider the set S of all αx + βy, where x, y ∈ H. This set contains some nonzero elements (it contains α and β, and if these were both zero, the condition on common divisors would not be satisfied). So there is an element d = αx + βy in S of least nonzero norm. Now apply the division algorithm: there exist q, r ∈ H with α = dq + r and N(r) < N(d). But r = α − dq = α(1 − xq) − βyq also lies in S. The minimality of N(d) shows that r must be 0, so that in fact d is a left divisor of α. Similarly, it is a left divisor of β, which implies that d is a unit. Now we can take αx + βy = d and multiply through on the right by d⁻¹ to get the desired equation.

Theorem 13.3.5. Let a be a rational integer, and let β, γ ∈ H. Suppose that a and β have no common left divisors other than units. Also suppose that a|γβ. Then a|γ.

Proof. By Theorem 13.3.4, there exist x, y ∈ H such that ax + βy = 1. Since a|γβ, we can write γβ = az, with z ∈ H. Multiplying Bézout’s identity on the left by γ gives

γax + γβy = γ,

so that

γ = a(γx + zy)

is divisible by a.

13.4 Hurwitz primes

Definition 13.4.1. An element π ∈ H is a Hurwitz prime if it is a non-unit which cannot be factored as αβ for non-units α, β.

Lemma 13.4.2. If π ∈ H and N(π) = p is a prime number, then π is a Hurwitz prime.

Proof. If π = αβ, then p = N(α)N(β). This implies that N(α) or N(β) must be 1, so that either α or β must be a unit.

Theorem 13.4.3. Let p be a rational prime. Then p = N(α) for some α ∈ H.

Proof. The trick is to use a result from Exercise 6 of Assignment #5. This said that the congruence x² + y² ≡ a (mod p) always has a solution, no matter the value of a. Therefore we can find x, y ∈ Z with x² + y² ≡ −1 (mod p), so that p|x² + y² + 1. But this last expression, being a sum of (at most) four squares, is a norm from L: it is N(α), where α = 1 + xi + yj. This means that p|αᾱ.
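For small primes the pair (x, y) can simply be found by search. The sketch below is a brute-force illustration (the function name `minus_one_as_two_squares` is ours, and this is not the argument from the exercise): it returns the first pair with x² + y² + 1 divisible by p, so that α = 1 + xi + yj has norm divisible by p.

```python
def minus_one_as_two_squares(p):
    """Search 0 <= x, y < p for a solution of x^2 + y^2 = -1 (mod p)."""
    for x in range(p):
        for y in range(p):
            if (x * x + y * y + 1) % p == 0:
                return x, y
    return None  # does not happen when p is prime, by the exercise


for p in (2, 3, 5, 7, 11, 13):
    x, y = minus_one_as_two_squares(p)
    norm_alpha = 1 + x * x + y * y   # N(1 + xi + yj)
    print(p, (x, y), norm_alpha, norm_alpha % p == 0)
```

For p = 7, for example, the search returns (2, 3), and indeed 1 + 4 + 9 = 14 is divisible by 7.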

Note that p does not divide α, because α/p does not lie in H. (This argument is valid even if p = 2: the k-component of α/2 is 0, which is an integer, not a half-odd.)

Assume that p is a Hurwitz prime. Then up to right associates its only right divisors would be 1 and p. This means that p cannot share any non-unit right divisors with α. Now Theorem 13.3.5 applies to give p|α, which is (by the same reasoning as before) absurd. Thus our assumption is false, and p cannot be a Hurwitz prime.

Thus we can write p = αβ for non-units α and β; taking norms gives p² = N(α)N(β). Since neither α nor β is a unit, neither norm equals 1, so that N(α) = p.

13.5 The end of the proof

We have only proved that every prime p is the norm of a Hurwitz integer α. This is the same as saying that either p is the sum of four squares, or else 4p is the sum of four odd squares. Not quite good enough!

But the solution is not far off. If α does not belong to L, we can hope that there exists a unit u ∈ H such that αu ∈ L. Then p = N(αu) is the norm of a Lipschitz quaternion, which is what we want.

Lemma 13.5.1. Given α ∈ H, there exists a unit u ∈ H such that αu ∈ L.

Proof. Of course we might as well assume that α ∉ L. Then α = (a + bi + cj + dk)/2, where a, b, c, d are all odd. Now, each of a, b, c, d must be congruent to ±1 modulo 4. This means that for some choice of signs we have α = (±1 ± i ± j ± k)/2 + 2λ, where λ ∈ L. But u = (±1 ± i ± j ± k)/2 is a unit (it has norm 1), so that αu⁻¹ = 1 + 2λu⁻¹. Finally, 2u⁻¹ ∈ 2H ⊂ L, so that αu⁻¹ ∈ L. Since u⁻¹ is again a unit, this proves the lemma.
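The choice of signs in this proof is mechanical, so it can be sketched in a few lines of Python. The function name `move_into_lipschitz` is ours. It takes α ∈ H with half-odd coordinates, picks u by reducing the numerators modulo 4, and multiplies by u⁻¹ = ū on either side; both products land in L. We try it on the example (3 − i + 5j − 9k)/2 from earlier, which has norm 29.

```python
from fractions import Fraction


def qmul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def norm(q):
    return sum(c * c for c in q)


def conj(q):
    a, b, c, d = q
    return (a, -b, -c, -d)


def move_into_lipschitz(alpha):
    """For alpha in H but not in L, return (u, alpha * u^-1, u^-1 * alpha)."""
    alpha = tuple(Fraction(c) for c in alpha)
    numerators = [(2 * c).numerator for c in alpha]   # the odd integers a, b, c, d
    # Each odd numerator is congruent to +1 or -1 modulo 4; copy that sign.
    u = tuple(Fraction(1 if n % 4 == 1 else -1, 2) for n in numerators)
    u_inv = conj(u)                                   # valid because N(u) = 1
    return u, qmul(alpha, u_inv), qmul(u_inv, alpha)


alpha = (Fraction(3, 2), Fraction(-1, 2), Fraction(5, 2), Fraction(-9, 2))
u, right, left = move_into_lipschitz(alpha)
print(u, right, left)
print(norm(alpha), norm(right), norm(left))
```

Whatever the coordinates of the two products turn out to be, their squares sum to N(α) = 29, giving a four-square representation of 29.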

Finally we can complete the proof of Theorem 13.0.1. By the identity in (13.1.1), it suffices to show that every prime p is the sum of four squares, or equivalently that it is a norm from L. By Theorem 13.4.3, p = N(α) for some α ∈ H. By the preceding lemma, αu ∈ L for some unit u, and then p = N(αu) is the sum of four squares.
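As a sanity check on the statement itself, here is a brute-force search for four-square representations. It does not follow the quaternion argument above and is far too slow for large n; the function name `four_squares` is ours.

```python
from math import isqrt


def four_squares(n):
    """Return (a, b, c, d) with a <= b <= c and a^2 + b^2 + c^2 + d^2 = n."""
    for a in range(isqrt(n) + 1):
        for b in range(a, isqrt(n - a * a) + 1):
            rest = n - a * a - b * b
            for c in range(b, isqrt(rest) + 1):
                d_squared = rest - c * c
                d = isqrt(d_squared)
                if d * d == d_squared:
                    return a, b, c, d
    return None  # never reached, by Lagrange's theorem


print(four_squares(7))    # (1, 1, 1, 2)
print(all(four_squares(n) is not None for n in range(1000)))
```

Note that 7 genuinely needs all four squares: no search with one coordinate equal to 0 succeeds.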
