
PRACTICAL MATHEMATICS

REAL FUNCTIONS, COMPLEX ANALYSIS, FOURIER TRANSFORMS, PROBABILITIES

Efthimios Kaxiras

Department of Physics and School of Engineering and Applied Sciences

Harvard University


PREFACE

This book is the result of teaching applied mathematics courses to undergraduate students at Harvard. The emphasis is on using certain concepts of mathematics for applications that often arise in scientific and engineering problems. In typical books of applied mathematics, when all the details of the mathematical derivations and subtleties are included, the text becomes too formal; students usually have a hard time extracting the useful information from the text in a way that is easy to digest. Here I tried to maintain some of the less-than-formal character of a lecture, with more emphasis on examples rather than the exact statement and proof of mathematical theorems.

Mathematics is the language of science. Someone can learn a new language in a formal way and have a deep knowledge of the many aspects of the language (grammar, literature, etc.). This takes a lot of effort and, at least in the beginning, can be rather tiresome. On the other hand, one can pick up the essentials of a language and develop the ability to communicate effectively in this language without a deep knowledge of all its richness. The contents of this book are a study guide for such a quick familiarization with some topics in applied mathematics that can be very handy in science and engineering. It is a bit like learning a language at a street-smart level, without being exposed to its fine points. Of course much is missed this way. But the satisfaction of being able to quickly and effectively use the language for new experiences may compensate for this.

Contents

1 Functions of real variables
  1.1 Functions as mappings
  1.2 Continuity
  1.3 Derivative
  1.4 Integral

2 Power series expansions
  2.1 Taylor series
  2.2 Convergence of number series
  2.3 Convergence of series of functions

3 Functions of complex variables
  3.1 Complex numbers
  3.2 Complex variables
  3.3 Continuity, analyticity, derivatives
  3.4 Cauchy-Riemann relations and harmonic functions
  3.5 Branch points and branch cuts
  3.6 Mappings

4 Contour Integration
  4.1 Complex integration
  4.2 Taylor and Laurent series
  4.3 Types of singularities - residue
  4.4 Integration by residues
  4.5 Problems

5 Fourier analysis
  5.1 Fourier expansions
    5.1.1 Real Fourier expansions
    5.1.2 Fourier expansions of arbitrary range
    5.1.3 Complex Fourier expansions
    5.1.4 Error analysis in series expansions
  5.2 Application to differential equations
    5.2.1 Diffusion equation
    5.2.2 Poisson equation
  5.3 The δ-function and the θ-function
    5.3.1 Definitions
  5.4 The Fourier transform
    5.4.1 Definition of Fourier Transform
    5.4.2 Properties of the Fourier transform
    5.4.3 Singular transforms - limiting functions
  5.5 Fourier analysis of signals

6 Probabilities and random numbers
  6.1 Probability distributions
    6.1.1 Binomial distribution
    6.1.2 Poisson distribution
    6.1.3 Gaussian or normal distribution
  6.2 Multivariable probabilities
  6.3 Conditional probabilities
  6.4 Random numbers
    6.4.1 Arbitrary probability distributions
    6.4.2 Monte Carlo integration

Chapter 1

Functions of real variables

In this chapter we give a very brief review of the basic concepts related to functions of real variables, as these pertain to our subsequent discussion of functions of complex variables. There are two aspects of interest: the fact that a function implies a mapping, and the fact that functions can be approximated by series expansions of simpler functions.

1.1 Functions as mappings

A function f(x) of the real number x is defined as the mathematical object which for a given value of the variable x takes some other real value. The variable x is called the argument of the function. Some familiar examples of functions are:

$$f(x) = c, \qquad c: \text{real constant} \tag{1.1}$$

$$f(x) = ax + b, \qquad a, b: \text{real constants} \tag{1.2}$$

$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n, \qquad a_n: \text{real constants} \tag{1.3}$$

The first is a constant function, the second represents a line, and the third is a general polynomial in powers of the variable x, called a polynomial of nth order after the largest power of x that appears in it.

A useful set of functions of a real variable are the trigonometric functions. Since we will be dealing with these functions frequently and extensively in the following chapters, we recall here some of their properties. The two basic trigonometric functions are defined by the projections of a radius of the unit circle, centered at the origin, which lies at an angle θ with respect to the horizontal axis (θ here is measured in radians), onto the vertical and horizontal axes; these are called the “sine” (sin(θ)) and “cosine” (cos(θ)) functions, respectively. Additional functions can be defined by the line segments where the ray at angle θ with respect to the horizontal axis intersects the two axes that are parallel to the vertical and horizontal axes and tangent to the unit circle; these are called the “tangent” (tan(θ)) and “cotangent” (cot(θ)) functions, respectively. These definitions are shown in Fig. 1.1. From the definitions, we can deduce

$$\tan(\theta) = \frac{\sin(\theta)}{\cos(\theta)}, \qquad \cot(\theta) = \frac{\cos(\theta)}{\sin(\theta)} \tag{1.4}$$

Figure 1.1: The definition of the basic trigonometric functions in the unit circle.

The following relation links the values of the sine and cosine functions:

$$\sin^2(\theta) + \cos^2(\theta) = 1 \tag{1.5}$$

From their definition, it is evident that the trigonometric functions are periodic in the argument θ with a period of 2π, that is, if the argument of any of these functions is changed by an integer multiple of 2π the value of the function does not change:

$$\sin(\theta + 2k\pi) = \sin(\theta), \qquad \cos(\theta + 2k\pi) = \cos(\theta), \qquad k: \text{integer} \tag{1.6}$$

Moreover, by geometric arguments, a number of useful relations between trigonometric functions of different arguments can be easily established, for example:

$$\cos(\theta_1 + \theta_2) = \cos(\theta_1)\cos(\theta_2) - \sin(\theta_1)\sin(\theta_2) \;\Rightarrow\; \cos(2\theta) = \cos^2(\theta) - \sin^2(\theta) \tag{1.7}$$

$$\sin(\theta_1 + \theta_2) = \cos(\theta_1)\sin(\theta_2) + \sin(\theta_1)\cos(\theta_2) \;\Rightarrow\; \sin(2\theta) = 2\cos(\theta)\sin(\theta) \tag{1.8}$$
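The addition formulas of Eqs. (1.7) and (1.8) can be spot-checked numerically. The following is a minimal sketch, not part of the original text; the function name and the sample angles are arbitrary choices made for illustration.

```python
import math

def addition_formula_errors(t1, t2):
    """Absolute errors of Eqs. (1.7) and (1.8) at the angles t1, t2 (in radians)."""
    cos_rhs = math.cos(t1) * math.cos(t2) - math.sin(t1) * math.sin(t2)
    sin_rhs = math.cos(t1) * math.sin(t2) + math.sin(t1) * math.cos(t2)
    return abs(math.cos(t1 + t2) - cos_rhs), abs(math.sin(t1 + t2) - sin_rhs)

# Arbitrary sample angles (an assumption of this sketch).
cos_err, sin_err = addition_formula_errors(0.7, 1.9)
print(cos_err, sin_err)  # both differences are at the level of round-off
```

Setting both angles equal reproduces the double-angle relations quoted next to each formula.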

Another very useful function is the exponential $\exp(x) = e^x$, defined as a limit of a simpler function that involves powers of x:

$$f(x) = e^x = \lim_{N \to \infty} \left(1 + \frac{x}{N}\right)^N \tag{1.9}$$
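To see how the limit in Eq. (1.9) is approached, one can evaluate the expression at finite N. This is an illustrative sketch only; the helper name `exp_limit` and the chosen values of N are assumptions made here.

```python
import math

def exp_limit(x, N):
    """Finite-N value of (1 + x/N)**N, which tends to e**x as N grows, Eq. (1.9)."""
    return (1.0 + x / N) ** N

for N in (10, 1000, 100000):
    approx = exp_limit(1.0, N)
    # the difference from e shrinks roughly in proportion to 1/N
    print(N, approx, abs(approx - math.e))
```

The slow, 1/N-like convergence suggests that this definition is more useful conceptually than as a practical way of computing the exponential.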

Other important functions that are defined through the exponential are the so-called hyperbolic sine (sinh(x)) and hyperbolic cosine (cosh(x)) functions:

$$f(x) = \cosh(x) = \frac{e^x + e^{-x}}{2} \tag{1.10}$$

$$f(x) = \sinh(x) = \frac{e^x - e^{-x}}{2} \tag{1.11}$$

Note that, from their definition, it is easy to show that the hyperbolic sine and cosine functions satisfy

$$\cosh^2(x) - \sinh^2(x) = 1 \tag{1.12}$$

which is reminiscent of the relation between the sine and cosine functions, Eq. (1.5). We can also define the so-called hyperbolic tangent (tanh(x)) and cotangent (coth(x)) functions as:

$$\tanh(x) = \frac{\sinh(x)}{\cosh(x)} = \frac{e^x - e^{-x}}{e^x + e^{-x}} \tag{1.13}$$

$$\coth(x) = \frac{\cosh(x)}{\sinh(x)} = \frac{e^x + e^{-x}}{e^x - e^{-x}} \tag{1.14}$$

by analogy to the definition of the tangent and cotangent from the trigonometric sine and cosine functions, Eq. (1.4). The exponential and the hyperbolic functions are not periodic functions of the argument x.

Figure 1.2: The exponential function and the hyperbolic cosine and sine functions for values of the argument in the range [−2, 2]. (Mathematica plots: Plot[Exp[x], {x, -2, 2}], Plot[Cosh[x], {x, -2, 2}], Plot[Sinh[x], {x, -2, 2}].)

The inverse of the mathematical object we called a function is not necessarily a function: For the first example, every value of the variable x gives the same value for the function, namely c, that is, if we know the value of the argument we know the value of the function, as it should be according to the definition. But, if we know the value of the function f(x), we cannot tell from which value of the argument it was obtained, since all values of the argument give the same value for the function. In some of the other examples, the relation can be inverted, that is, if we know the value of the function f(x) = y, we can uniquely determine the value of the argument that produced it, as in the second example, Eq. (1.2):

$$y = ax + b \;\Rightarrow\; x = \frac{y - b}{a} \tag{1.15}$$

assuming a ≠ 0 (otherwise the function would be a constant, just like Eq. (1.1)). Thus, a function takes us unambiguously from the value of x to the value of y = f(x), for all values of x in the domain of the function (the domain being the set of x values where the function is well defined), but not necessarily in the opposite direction (from y to x). If the relation can be inverted, then we can obtain x from the inverse function, denoted by $f^{-1}$, evaluated at the value of the original function y:

$$f(x) = y \;\Rightarrow\; x = f^{-1}(y) \tag{1.16}$$

This brings us to the idea of mapping. A function represents the mapping of the set of real values (denoted by the argument x) to another set of real values. A given value of the argument x gets mapped to a precise value, which we denote by f(x) or y. If each value of y is obtained from one and only one value of x, then the relation can be inverted (as in the second example, Eq. (1.2)), otherwise it cannot (as in the first example, Eq. (1.1)). The mapping can be visualized by plotting the values of the function y = f(x) on a vertical axis, for each value of the argument x on a horizontal axis. Examples of plots produced by the familiar functions in Eqs. (1.9)-(1.11) are shown in Fig. 1.2.

These plots give a handy way of determining whether the relation can be inverted or not: Draw a horizontal line (parallel to the x axis) that intersects the vertical axis at an arbitrary value; this corresponds to some value of the function. If every such line intersects the plot of the function at most once, the relation can be inverted; otherwise it cannot. For example, it is obvious from the plot of the exponential function, Eq. (1.9), that in this case the relation can be inverted to produce another function. In fact, the inverse of the exponential function is the familiar natural logarithm:

$$f(x) = e^x \;\Rightarrow\; f^{-1}(x) = \ln(x) \tag{1.17}$$

The domain of the exponential is the entire real axis (all real values of x),

$$\text{domain of } \exp(x): \quad x \in (-\infty, +\infty)$$

but exp(x) takes only real positive values. This means that the domain of the logarithm is only the positive real axis

$$\text{domain of } \ln(x): \quad x \in (0, +\infty)$$

with the lower bound corresponding to the value of the exponential at the lowest value of its argument (−∞) and the upper bound corresponding to the value of the exponential at the highest value of its argument (+∞).
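The horizontal-line test described above can be mimicked on a sampled grid by counting how many times the graph of a function crosses a given level y. This is a rough sketch under stated assumptions: the helper `count_crossings`, the grid and the level y = 2 are illustrative choices, and a sign-change count can miss crossings that fall exactly on a grid point or between which the function turns around.

```python
import math

def count_crossings(f, y, xs):
    """Number of sign changes of f(x) - y along the grid xs (sampled horizontal-line test)."""
    values = [f(x) - y for x in xs]
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)

# Grid on [-2, 2], the range used in Fig. 1.2.
xs = [-2.0 + 4.0 * i / 1000 for i in range(1001)]

print(count_crossings(math.exp, 2.0, xs))   # one crossing: the relation can be inverted
print(count_crossings(math.cosh, 2.0, xs))  # two crossings: it cannot be inverted on this range
print(math.log(math.exp(1.3)))              # ln undoes exp, Eq. (1.17)
```

The hyperbolic cosine fails the test on [−2, 2] because it takes the same value at x and −x, whereas the exponential and the hyperbolic sine pass it.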

1.2 Continuity

An important notion in studying the behavior of a function is that of continuity. Crudely speaking, a function is called continuous if its values do not make any jumps as the value of the argument is varied smoothly within its domain; otherwise it is called discontinuous. To make this notion more precise, we require that as the value of the argument x approaches some specific value $x_0$, the value of the function f(x) approaches the value which it takes for $x_0$, that is $f(x_0)$. In mathematical terms, this is expressed as follows: For every value of ε > 0, there exists a value of δ > 0 such that when x is closer than δ to the specific value $x_0$, f(x) is closer than ε to the corresponding value $f(x_0)$:

$$|x - x_0| < \delta \;\Rightarrow\; |f(x) - f(x_0)| < \epsilon \tag{1.18}$$

This concept is illustrated in Fig. 1.3.

Since we require this to hold for any value ε > 0, we can think of ε being very small, ε → 0, which means that for a function which is continuous near $x_0$ we can always find values of f(x) arbitrarily close to $f(x_0)$, as we change its argument by some amount δ. Usually, when ε becomes smaller we take δ to be correspondingly smaller, but this is not mandatory.

Figure 1.3: Illustration of continuity argument: when x is closer than δ to $x_0$, f(x) gets closer than ε to $f(x_0)$.

A counter example is the function

$$f(x) = \frac{1}{x}$$

in the neighborhood of $x_0 = 0$: when $x \to 0^+$ the value of $f(x) \to +\infty$, and when $x \to 0^-$ the value of $f(x) \to -\infty$. Therefore, very close to 0, if we change the argument of the function by some amount δ we can jump from a very large positive value of f(x) to a very large negative value, which means f(x) cannot be kept within an arbitrary value ε > 0 of $f(x_0)$; in fact, we cannot even assign a real value to $f(x_0)$ for $x_0 = 0$. Therefore, this function is discontinuous at $x_0 = 0$. Note that it is continuous at every other real value of x. To show this explicitly, for a given ε we take the value of δ to be:

$$\delta = \frac{\epsilon |x_0|^2}{1 + \epsilon |x_0|} \;\Rightarrow\; \epsilon = \frac{\delta}{(|x_0| - \delta)|x_0|}$$

and consider what happens for $|x - x_0| < \delta$: First, we have:

$$|x_0| = |x_0 - x + x| \le |x_0 - x| + |x| < \delta + |x| \;\Rightarrow\; |x| > |x_0| - \delta \;\Rightarrow\; \frac{1}{|x|} < \frac{1}{|x_0| - \delta}$$

where we have used the triangle inequality (see Problem 1) to obtain the first inequality above. Using the last relationship together with $|x - x_0| < \delta$ we obtain:

$$\epsilon = \frac{\delta}{(|x_0| - \delta)|x_0|} > \frac{|x - x_0|}{|x||x_0|} = \left|\frac{x - x_0}{x x_0}\right| = \left|\frac{1}{x} - \frac{1}{x_0}\right| = |f(x) - f(x_0)|$$

that is, with our choice of δ, which is always positive as long as $x_0 \neq 0$, for any value of ε we have managed to satisfy the continuity requirement, Eq. (1.18). For this proof to be valid, since ε and δ must both be greater than zero, we must have $|x_0| > \delta$ which is always possible to satisfy as long as $x_0 \neq 0$, for ε being very small.
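The choice of δ made above can be tested numerically by sampling points with $|x - x_0| < \delta$ and recording the largest deviation $|1/x - 1/x_0|$. This is a sketch for illustration only; the function names, the sample count and the test values of ε and $x_0$ are assumptions, and sampling can support but not prove the inequality.

```python
def delta_for(eps, x0):
    """The delta chosen in the text for f(x) = 1/x at a point x0 != 0."""
    return eps * x0**2 / (1.0 + eps * abs(x0))

def max_deviation(eps, x0, samples=1000):
    """Largest |1/x - 1/x0| over sample points strictly inside (x0 - delta, x0 + delta)."""
    d = delta_for(eps, x0)
    worst = 0.0
    for i in range(1, samples):
        x = x0 - d + 2.0 * d * i / samples
        worst = max(worst, abs(1.0 / x - 1.0 / x0))
    return worst

for eps in (1.0, 0.1, 0.001):
    for x0 in (0.5, -2.0):
        # the last column stays below eps, as required by Eq. (1.18)
        print(eps, x0, delta_for(eps, x0), max_deviation(eps, x0))
```

The deviation approaches ε only at the edge of the interval that lies closer to zero, which is where 1/x varies fastest.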


1.3 Derivative

The notion of continuity makes it feasible to define the derivative of a function,

$$f'(x_0) = \frac{df}{dx}(x_0) \equiv \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} \tag{1.19}$$

(the notation f′ is used interchangeably with the notation df/dx). When the function is continuous and varies smoothly at $x = x_0$ this limit gives a finite value, that is, it is properly defined (continuity alone is not sufficient: |x| is continuous at x = 0, but there the limit depends on the side from which 0 is approached). If this holds for all values of x in a domain, then the derivative is itself a proper function in this domain. Conversely, if the function is discontinuous, the numerator in the limit can be finite (or even infinite) as the denominator tends to zero, which makes it impossible to assign a precise value to the limit, so that the derivative does not exist. As an example, the derivative of the function 1/x is:

$$\frac{d(1/x)}{dx}(x_0) = \lim_{x \to x_0} \frac{(1/x - 1/x_0)}{x - x_0} = \lim_{x \to x_0} \frac{(x_0 - x)}{(x - x_0) x x_0} = -\frac{1}{x_0^2} \tag{1.20}$$

which holds for every value of $x_0$, except $x_0 = 0$, where the function is discontinuous. Thus, the derivative of f(x) = 1/x is $f'(x) = -1/x^2$ for all x ≠ 0.
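The limit in Eq. (1.19) can be approached numerically by evaluating the difference quotient for progressively smaller steps. The sketch below does this for f(x) = 1/x; the helper name, the point $x_0 = 2$ and the step sizes are illustrative assumptions.

```python
def difference_quotient(f, x0, h):
    """The ratio [f(x) - f(x0)] / (x - x0) of Eq. (1.19), evaluated at x = x0 + h."""
    return (f(x0 + h) - f(x0)) / h

f = lambda x: 1.0 / x
x0 = 2.0
for h in (1e-1, 1e-3, 1e-5):
    # the quotient approaches the exact derivative -1/x0**2 of Eq. (1.20)
    print(h, difference_quotient(f, x0, h), -1.0 / x0**2)
```

For this function the quotient equals $-1/[x_0(x_0 + h)]$ exactly, so the approach to $-1/x_0^2$ as h → 0 can be followed by hand as well.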

The derivatives of some familiar functions are:

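$$f(x) = x^a \;\rightarrow\; f'(x) = a x^{a-1}, \qquad a: \text{real} \tag{1.21}$$

$$f(x) = e^x \;\rightarrow\; f'(x) = e^x \tag{1.22}$$

$$f(x) = \cosh(x) \;\rightarrow\; f'(x) = \sinh(x) \tag{1.23}$$

$$f(x) = \sinh(x) \;\rightarrow\; f'(x) = \cosh(x) \tag{1.24}$$

$$f(x) = \ln(x) \;\rightarrow\; f'(x) = \frac{1}{x}, \qquad x > 0 \tag{1.25}$$

$$f(x) = \cos(x) \;\rightarrow\; f'(x) = -\sin(x) \tag{1.26}$$

$$f(x) = \sin(x) \;\rightarrow\; f'(x) = \cos(x) \tag{1.27}$$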

Two useful relations in taking derivatives are the rule for the derivative of a product and the chain rule for the derivative of a function of a function:

$$\frac{d}{dx}[f(x)g(x)] = f'(x)g(x) + f(x)g'(x) \tag{1.28}$$

$$\frac{d}{dx}[f(g(x))] = f'(g(x))g'(x) \tag{1.29}$$

The notation f′(g(x)) implies the derivative of the function f with respect to its argument, which in this case is g, and the evaluation of the resulting expression at g(x). Using Eqs. (1.28) and (1.29), we can obtain the general expression for the derivative of the ratio of two functions. We first rewrite the ratio as the product, through

$$\frac{h(x)}{g(x)} = h(x) f(g(x)), \qquad \text{where: } f(x) = \frac{1}{x}$$

which with the help of Eqs. (1.28) and (1.29) gives

$$\frac{d}{dx}\left[\frac{h(x)}{g(x)}\right] = h'(x) f(g(x)) + h(x) f'(g(x)) g'(x)$$

and this, by using Eq. (1.21) with a = −1 produces:

$$\frac{d}{dx}\left[\frac{h(x)}{g(x)}\right] = h'(x)\frac{1}{g(x)} - h(x)\frac{1}{[g(x)]^2}g'(x) = \frac{h'(x)g(x) - h(x)g'(x)}{[g(x)]^2} \tag{1.30}$$

The notion of the derivative can be extended to higher derivatives, with the first derivative playing the role of the function in the definition of the second derivative and so on:

$$f''(x_0) = \frac{d^2 f}{dx^2}(x_0) = \lim_{x \to x_0} \frac{f'(x) - f'(x_0)}{x - x_0} \tag{1.31}$$

For higher derivatives it becomes awkward to keep adding primes, so the notation $f^{(n)}$ is often used for the nth derivative, $d^n f/dx^n$.

Another generalization is the “partial derivative” of a function of several variables with respect to one of them. For instance, suppose that we are dealing with a function of two variables, f(x, y), and want to take its derivative with respect to x for $x = x_0$, holding y constant at a value $y = y_0$; this is called the “partial derivative with respect to x”, evaluated at $(x, y) = (x_0, y_0)$:

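$$\frac{\partial f}{\partial x}(x_0, y_0) \equiv \lim_{x \to x_0} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0} \tag{1.32}$$

and similarly for the partial derivative with respect to y evaluated at the same point:

$$\frac{\partial f}{\partial y}(x_0, y_0) \equiv \lim_{y \to y_0} \frac{f(x_0, y) - f(x_0, y_0)}{y - y_0} \tag{1.33}$$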
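The definitions in Eqs. (1.32) and (1.33) translate directly into difference quotients in which only one argument is varied. The following sketch uses a hypothetical test function, $f(x, y) = x^2 \sin(y)$, which does not appear in the text; the step size h is also an assumption.

```python
import math

def partial_x(f, x0, y0, h=1e-6):
    """Difference quotient of Eq. (1.32): vary x, hold y fixed at y0."""
    return (f(x0 + h, y0) - f(x0, y0)) / h

def partial_y(f, x0, y0, h=1e-6):
    """Difference quotient of Eq. (1.33): vary y, hold x fixed at x0."""
    return (f(x0, y0 + h) - f(x0, y0)) / h

# Hypothetical example function, chosen only for illustration.
f = lambda x, y: x**2 * math.sin(y)

x0, y0 = 1.5, 0.4
print(partial_x(f, x0, y0), 2 * x0 * math.sin(y0))   # compare with 2 x sin(y)
print(partial_y(f, x0, y0), x0**2 * math.cos(y0))    # compare with x^2 cos(y)
```

Each quotient treats the other variable as a constant, which is exactly how the analytic partial derivatives on the right were obtained from Eqs. (1.21), (1.27) and the product structure of f.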

The geometric interpretation of the derivative f′(x) of the function f(x) is that it gives the slope of the tangent of the function at x. This is easily justified by the fact that the line described by the equation

$$q(x) = f(x_0) + f'(x_0)(x - x_0)$$

passes through the point $(x = x_0, y = f(x_0))$, that is, $q(x_0) = f(x_0)$, and has a slope of

$$\frac{q(x) - q(x_0)}{x - x_0} = f'(x_0)$$

as illustrated in Fig. 1.4.

Figure 1.4: Geometric interpretation of (a) the derivative as the slope of the tangent of f(x) at $x = x_0$ and (b) the integral as the area below the graph of the function from $x = x_i$ to $x = x_f$ (the end points are labeled a and b in panel (b)).

1.4 Integral

A notion related to the derivative is the integral of the function f(x). We define a function F(x) such that its infinitesimal increment at x, denoted by dF(x), is equal to f(x) multiplied by the infinitesimal increment in the variable x, denoted by dx:

$$dF(x) = f(x)\,dx \;\Rightarrow\; F'(x) \equiv \frac{dF}{dx}(x) = f(x) \tag{1.34}$$

From this last relation, it is evident that finding the function F(x) is equivalent to determining a function whose derivative is equal to f(x) (for this reason F(x) is also referred to as the antiderivative of f(x)). Note that because the derivative of a constant function is zero, F(x) is defined up to an unspecified constant. Now consider the summation of all the values dF(x) takes starting at some initial point $x_i$ and ending at the final point $x_f$. Since these are successive infinitesimal increments in the value of F(x), this summation will give the total difference ΔF of the values of F(x) between the end points:

$$\Delta F = F(x_f) - F(x_i) = \sum_{x = x_i}^{x_f} dF(x)$$

Because this sum involves infinitesimal quantities, it has a special character, so we give it a new name, the definite integral, and use the symbol ∫ instead of the summation:

$$\int_{x_i}^{x_f} f(x)\,dx = F(x_f) - F(x_i) \tag{1.35}$$

If we omit the limits on the integration symbol, we call this quantity the indefinite integral, which implies finding the antiderivative of the function f(x) that appears under the ∫ symbol. In the definite integral, the difference between the values at the end points of integration cancels the unspecified constant involved in F(x).

We can express the integral in terms of a usual sum of finite quantities if we break the interval $x_f - x_i$ into N equal parts and then take the limit of N → ∞:

$$\Delta x = \frac{x_f - x_i}{N}, \qquad x_j = x_i + j\Delta x, \quad j = 0, \dots, N, \qquad x_0 = x_i, \quad x_N = x_f$$

$$\int_{x_i}^{x_f} f(x)\,dx = \lim_{N \to \infty} \sum_{j=0}^{N} f(x_j)\Delta x \tag{1.36}$$

From this definition of the integral, we conclude that the definite integral is the area below the graph of the function from $x_i$ to $x_f$; this is illustrated in Fig. 1.4. From this we can also infer that the definite integral will be a finite quantity if there are no infinities in the values of the function f(x) (called here the “integrand”) in the range of integration, or if these infinities are of a special nature so that they cancel. Moreover, if the upper or lower limit of the range of integration is ±∞, then in order to obtain a finite value for the integral, the integrand in absolute value should vanish faster than 1/|x|, that is, as $1/|x|^a$ with a > 1, as this limit is approached.
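The finite sum of Eq. (1.36) is easy to evaluate before taking the limit, which gives a feel for how the area under the graph is built up. This is a minimal sketch; the function name `riemann_sum` and the two test integrands are choices made here, with exact values 2 and 1/3 known from the antiderivatives.

```python
import math

def riemann_sum(f, xi, xf, N):
    """Finite-N version of Eq. (1.36): sum of f(x_j) * dx for j = 0, ..., N."""
    dx = (xf - xi) / N
    return sum(f(xi + j * dx) for j in range(N + 1)) * dx

for N in (10, 100, 1000):
    # integral of sin(x) over [0, pi] is 2; integral of x**2 over [0, 1] is 1/3
    print(N, riemann_sum(math.sin, 0.0, math.pi, N), riemann_sum(lambda x: x**2, 0.0, 1.0, N))
```

Because the sum runs over N + 1 points, it exceeds or falls short of the exact area by an amount that shrinks as N grows; for $x^2$ on [0, 1] the excess is close to 1/(2N).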

As a simple exercise we apply these concepts to a very useful function, the exponential of $-x^2$, which is known as the gaussian function. The domain of the gaussian function is the entire real axis, but the function takes non-negligible values only for a relatively narrow range of values of |x| near x = 0, because for larger values of |x| the argument of the exponential becomes very large and negative, making the value of the function essentially zero. We can include a constant multiplicative factor in the exponential, chosen here to be $\alpha^2$,

$$g_\alpha(x) = e^{-\alpha^2 x^2},$$

which allows us to adjust the range of the domain in which the gaussian takes non-negligible values. Examples of gaussian functions with different values of α are shown in Fig. 1.5. Using the rule for the derivative of a function of a function, Eq. (1.29), we find the derivative of the gaussian to be:

$$g'_\alpha(x) = -2\alpha^2 x e^{-\alpha^2 x^2}$$

where we have also used Eq. (1.21) with a = 2.

It is often useful to multiply the gaussian with a factor in front, to make the definite integral of the product over the entire domain exactly equal to unity (this is called “normalization” of the function). In order to find the constant factor which will ensure normalization of the gaussian function we must calculate its integral over all values of the variable x. One of the nice features of the gaussian is that we can obtain analytically the value of the definite integral over the entire domain. This integral is most easily performed by using Gauss’ trick of taking the square root of the square of the integral and turning the integration over the cartesian coordinates (x, y) into one over the polar coordinates (r, θ), with the two sets of coordinates related by

$$x = r\cos(\theta), \quad y = r\sin(\theta) \;\rightarrow\; r = \sqrt{x^2 + y^2}, \quad \theta = \tan^{-1}\left(\frac{y}{x}\right)$$

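These relations imply: $dx\,dy = r\,dr\,d\theta$.

$$\begin{aligned}
\int_{-\infty}^{\infty} e^{-\alpha^2 x^2}\,dx &= \left[\int_{-\infty}^{\infty} e^{-\alpha^2 x^2}\,dx \int_{-\infty}^{\infty} e^{-\alpha^2 y^2}\,dy\right]^{1/2} = \left[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\alpha^2 (x^2 + y^2)}\,dx\,dy\right]^{1/2} \\
&= \left[\int_0^{2\pi}\int_0^{\infty} e^{-\alpha^2 r^2}\,r\,dr\,d\theta\right]^{1/2} = \left[2\pi \int_0^{\infty} e^{-\alpha^2 r^2}\,\frac{1}{2}\,dr^2\right]^{1/2} = \frac{\sqrt{\pi}}{\alpha}
\end{aligned} \tag{1.37}$$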

In the limit of α → ∞ the normalized gaussian, $(\alpha/\sqrt{\pi})\,e^{-\alpha^2 x^2}$, tends to an infinitely sharp spike of infinite height which integrates to unity; this is a useful mathematical construct called a δ-function, which we will examine in great detail later.
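The analytic result of Eq. (1.37) can be compared against a direct numerical sum. The sketch below uses a midpoint rule on a finite interval; the cutoff at |x| = 10/α, the number of points and the function name are assumptions of this example, justified by the fact that the integrand is negligibly small beyond the cutoff.

```python
import math

def gaussian_integral(alpha, half_width=10.0, N=20000):
    """Midpoint-rule estimate of the integral of exp(-alpha^2 x^2) over [-L, L], with L = half_width / alpha."""
    L = half_width / alpha
    dx = 2.0 * L / N
    return sum(math.exp(-(alpha * (-L + (j + 0.5) * dx)) ** 2) for j in range(N)) * dx

for alpha in (1.0, 5.0):
    numeric = gaussian_integral(alpha)
    # compare with sqrt(pi)/alpha, and check that the factor alpha/sqrt(pi) normalizes the gaussian
    print(alpha, numeric, math.sqrt(math.pi) / alpha, alpha / math.sqrt(math.pi) * numeric)
```

The last printed column stays at unity for both values of α, which is the property that survives in the δ-function limit of large α.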

Figure 1.5: Examples of gaussian functions with α = 1 and 5. These plots do not include the normalization factors $\alpha/\sqrt{\pi}$. (Mathematica plot: Show[Plot[Exp[-x^2], {x, -1, 1}, PlotRange -> {0, 1.1}], Plot[Exp[-25 x^2], {x, -1, 1}, PlotRange -> {0, 1.1}]].)

A general expression that uses the rules of differentiation and integration we discussed so far is the so-called “integration by parts”. Suppose that the integrand can be expressed as a product of two functions, one of which is a derivative:

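$$\begin{aligned}
\int_{x_i}^{x_f} f'(x)g(x)\,dx &= \int_{x_i}^{x_f} \left(\frac{d}{dx}[f(x)g(x)] - f(x)g'(x)\right) dx \\
&= [f(x)g(x)]_{x_i}^{x_f} - \int_{x_i}^{x_f} f(x)g'(x)\,dx
\end{aligned} \tag{1.38}$$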

where we have used primes to denote first derivatives, we have employed the rule for differentiating a product, Eq. (1.28), and we introduced the notation

$$[h(x)]_{x_i}^{x_f} = h(x_f) - h(x_i)$$

An application of the integration by parts formula, which will prove very useful in chapter 5, involves powers of x and sines or cosines of multiples of x. Consider the following definite integrals:

$$\int_0^\pi x^k \cos(nx)\,dx, \qquad \int_0^\pi x^k \sin(nx)\,dx$$

with k, n positive integers. Using the identities

$$\frac{d\sin(nx)}{dx} = n\cos(nx), \qquad \frac{d\cos(nx)}{dx} = -n\sin(nx)$$

and the integration by parts formula, Eq. (1.38), we find:

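$$\begin{aligned}
\int_0^\pi x^k \cos(nx)\,dx &= -\frac{k}{n}\int_0^\pi x^{k-1}\sin(nx)\,dx \\
&= \frac{k\pi^{k-1}(-1)^n}{n^2} - \frac{k(k-1)}{n^2}\int_0^\pi x^{k-2}\cos(nx)\,dx
\end{aligned} \tag{1.39}$$

$$\begin{aligned}
\int_0^\pi x^k \sin(nx)\,dx &= -\frac{\pi^k(-1)^n}{n} + \frac{k}{n}\int_0^\pi x^{k-1}\cos(nx)\,dx \\
&= -\frac{\pi^k(-1)^n}{n} - \frac{k(k-1)}{n^2}\int_0^\pi x^{k-2}\sin(nx)\,dx
\end{aligned} \tag{1.40}$$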

where, to obtain the second result in each case, we have used the integration by parts formula once again. The usefulness of these results lies in the fact that the final expression contains an integrand which is similar to the original one with the power of x reduced by 2. Applying these results recursively, we can evaluate any integral containing powers of x and sines or cosines.
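The recursion described above can be written out as a pair of short functions. This is a sketch under stated assumptions: the function names are invented here, and the starting values for k = 0 and k = 1 are not given in the text; they follow from integrating cos(nx) and sin(nx) directly and from the first lines of Eqs. (1.39) and (1.40). The second lines of those equations are used only for k ≥ 2, where the boundary terms at x = 0 vanish.

```python
import math

def cos_moment(k, n):
    """Integral of x**k * cos(n*x) over [0, pi], for integers k >= 0 and n >= 1."""
    if k == 0:
        return 0.0                        # sin(n*pi)/n = 0
    if k == 1:
        return ((-1) ** n - 1) / n**2     # first line of Eq. (1.39) with k = 1
    return (k * math.pi ** (k - 1) * (-1) ** n / n**2
            - k * (k - 1) / n**2 * cos_moment(k - 2, n))

def sin_moment(k, n):
    """Integral of x**k * sin(n*x) over [0, pi], for integers k >= 0 and n >= 1."""
    if k == 0:
        return (1 - (-1) ** n) / n        # [-cos(n*x)/n] between 0 and pi
    if k == 1:
        return -math.pi * (-1) ** n / n   # first line of Eq. (1.40) with k = 1
    return (-math.pi**k * (-1) ** n / n
            - k * (k - 1) / n**2 * sin_moment(k - 2, n))

print(cos_moment(2, 1), -2 * math.pi)     # integral of x^2 cos(x) over [0, pi]
print(sin_moment(2, 1), math.pi**2 - 4)   # integral of x^2 sin(x) over [0, pi]
```

Each call lowers the power of x by 2, so the recursion ends after about k/2 steps at one of the two starting values.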


We give two more examples of definite integrals to demonstrate difficulties that may arise in their evaluation, in anticipation of integration methods based on complex analysis (see chapter 3). The first example we consider is the definite integral

$$\int_0^\infty \frac{1}{x^2 + 1}\,dx$$

We note that the integrand is always finite over the range of integration and it vanishes as $\sim 1/x^2$ for x → ∞, which suggests that this integral should give a finite value. Indeed, with the change of variables

$$x = \tan(t) = \frac{\sin(t)}{\cos(t)} \;\Rightarrow\; dx = \frac{\cos^2(t) + \sin^2(t)}{\cos^2(t)}\,dt = \frac{1}{\cos^2(t)}\,dt$$

we find for the original integral:

$$\int_0^\infty \frac{1}{x^2 + 1}\,dx = \int_0^{\pi/2} \frac{1}{\tan^2(t) + 1}\,\frac{1}{\cos^2(t)}\,dt = \int_0^{\pi/2} dt = \frac{\pi}{2} \tag{1.41}$$

Other than the inspired change of variables, this integral presents no particular difficulties.

The next definite integral we consider is the following:

$$\int_0^\infty \frac{1}{x^2 - 1}\,dx$$

which is somewhat more demanding: for x → ∞ the integrand again vanishes as $\sim 1/x^2$, but in this case there are infinities of the integrand as x approaches 1 from below or above. Fortunately, these can be made to cancel. To see this, just below x = 1 we write the variable x as x = 1 − ε with ε > 0 which gives

$$\frac{1}{(1 - \epsilon)^2 - 1} = \frac{1}{\epsilon(\epsilon - 2)} \to -\frac{1}{2\epsilon} \quad \text{for } \epsilon \to 0$$

and similarly, just above x = 1 we write the variable x as x = 1 + ε with ε > 0 which gives

$$\frac{1}{(1 + \epsilon)^2 - 1} = \frac{1}{\epsilon(\epsilon + 2)} \to +\frac{1}{2\epsilon} \quad \text{for } \epsilon \to 0$$

so that when we add the two contributions they cancel each other. Notice that this cancellation is achieved only if the value x = 1 is approached symmetrically from above and below, that is, we use the same values of ε for $x \to 1^-$ and $x \to 1^+$. In the general case where the integrand has an infinite value at a point $x_0$ in the interval of integration x ∈ [a, b], this symmetric approach of the infinite value on either side of $x_0$ is expressed as follows:

$$\int_a^b f(x)\,dx = \lim_{\epsilon \to 0}\left[\int_a^{x_0 - \epsilon} f(x)\,dx + \int_{x_0 + \epsilon}^b f(x)\,dx\right] \tag{1.42}$$

This way of evaluating the integral is called taking its “principal value”.

If we attempted to solve the problem by a change of variables analogous to the one we made for the previous example, we might try first to separate the integral into two parts to facilitate the change of variables:

$$\int_0^\infty \frac{1}{x^2 - 1}\,dx = \int_0^1 \frac{1}{x^2 - 1}\,dx + \int_1^\infty \frac{1}{x^2 - 1}\,dx$$

and for the first part we could choose

$$x = \tanh(t) = \frac{\sinh(t)}{\cosh(t)} \;\Rightarrow\; dx = \frac{\cosh^2(t) - \sinh^2(t)}{\cosh^2(t)}\,dt = \frac{1}{\cosh^2(t)}\,dt$$

with x = 0 ⇒ t = 0, x = 1 ⇒ t → ∞ and we have used Eq. (1.12), so that the first integral gives:

$$\int_0^1 \frac{1}{x^2 - 1}\,dx = \int_0^\infty \frac{1}{\tanh^2(t) - 1}\,\frac{1}{\cosh^2(t)}\,dt = -\int_0^\infty dt$$

while for the second part we could choose

$$x = \coth(t) = \frac{\cosh(t)}{\sinh(t)} \;\Rightarrow\; dx = \frac{\sinh^2(t) - \cosh^2(t)}{\sinh^2(t)}\,dt = -\frac{1}{\sinh^2(t)}\,dt$$

with x = 1 ⇒ t → ∞, x → ∞ ⇒ t = 0 so that the second integral gives:

$$\int_1^\infty \frac{1}{x^2 - 1}\,dx = -\int_\infty^0 \frac{1}{\coth^2(t) - 1}\,\frac{1}{\sinh^2(t)}\,dt = \int_0^\infty dt$$

This, however, has led to an impasse, because both integrals are infinite and we cannot decide what the final answer is, even though from the preceding analysis we expect it to be finite. The problem in this approach is that we broke the integral in two parts and tried to evaluate them independently because that facilitated the change of variables, while the preceding analysis suggested that the two parts must be evaluated simultaneously to make sure that the infinities are cancelled (at least in the “principal value” sense of the integral). This is an illustration of difficulties that may arise in the evaluation of real definite integrals using traditional approaches. The approaches we will develop in chapter 3 provide an elegant way of avoiding these difficulties.
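The role of the symmetric approach in Eq. (1.42) can be made concrete with a short calculation. The sketch below relies on an antiderivative that is not derived in the text, $F(x) = \frac{1}{2}\ln\left|\frac{x-1}{x+1}\right|$, which follows from writing $1/(x^2-1)$ in partial fractions; the function names and the values of ε are illustrative assumptions.

```python
import math

def F(x):
    """Antiderivative of 1/(x**2 - 1) on either side of x = 1 (assumed here, from partial fractions)."""
    return 0.5 * math.log(abs((x - 1.0) / (x + 1.0)))

def cut_integral(eps_below, eps_above):
    """Integral over [0, 1 - eps_below] plus the integral over [1 + eps_above, infinity)."""
    below = F(1.0 - eps_below) - F(0.0)   # F(0) = 0
    above = 0.0 - F(1.0 + eps_above)      # F(x) tends to 0 as x tends to infinity
    return below + above

for eps in (1e-2, 1e-4, 1e-6):
    # symmetric cut tends to 0; an asymmetric cut (eps, 2*eps) tends to -ln(2)/2 instead
    print(eps, cut_integral(eps, eps), cut_integral(eps, 2 * eps))
```

Each piece diverges logarithmically as ε → 0, yet their sum has a finite limit; that the limit depends on how the two cuts are related is the reason the principal value prescribes the symmetric choice.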


Problems

1. For any two real numbers $x_1, x_2$ show that:

$$|x_1 + x_2| \le |x_1| + |x_2|$$

This is known as the “triangle inequality”. Show that this inequality can also be generalized to an arbitrary number of real numbers, that is,

$$|x_1 + \cdots + x_n| \le |x_1| + \cdots + |x_n|$$

2. Is the inverse of cos(x) a properly defined function, if cos(x) is defined in the domain x ∈ (−∞, +∞)? Is it a properly defined function, if cos(x) is defined in the domain x ∈ [0, 2π]? Is it a properly defined function, if cos(x) is defined in the domain x ∈ [0, π]? What about the inverse of sin(x) for the same three domains?

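3. Show that the even moments of the normalized gaussian are given by the expressions below, by simply taking derivatives with respect to $\alpha^2$ of both sides of Eq. (1.37):

$$\int_{-\infty}^{\infty} x^2 e^{-\alpha^2 x^2}\,dx = \frac{1}{2}\frac{\sqrt{\pi}}{\alpha^3}$$

$$\int_{-\infty}^{\infty} x^4 e^{-\alpha^2 x^2}\,dx = \frac{3}{4}\frac{\sqrt{\pi}}{\alpha^5}$$

$$\int_{-\infty}^{\infty} x^{2n} e^{-\alpha^2 x^2}\,dx = \frac{(2n - 1)!!}{2^n}\frac{\sqrt{\pi}}{\alpha^{2n+1}}$$

where the symbol (2n − 1)!! denotes the product of all odd integers up to (2n − 1).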

4. Prove that the derivative of the exponential function is given by the expression of Eq. (1.22), using the definition of the exponential given in Eq. (1.9).

Chapter 2

Power series expansions

We are often interested in representing a function f(x) as a series expansion of other functions, $g_n(x)$, with n = 1, 2, ..., which can be manipulated more easily than the original function:

$$f(x) = \sum_{n=1}^{\infty} c_n g_n(x) \tag{2.1}$$

By this we mean that the functions in the series are easier to evaluate, differentiate or integrate, and it is consequently easier to deal with these functions than with the original function. There are three important issues in dealing with series expansions:

• What are the functions $g_n(x)$ that should be included in the series?

• What are the numerical coefficients $c_n$ accompanying the functions in the expansion?

• Does the series converge and how fast?

Typically, the series expansion is an exact representation of the function only if we include an infinite number of terms. This approach is useful if the exact infinite series expansion can be truncated to a few terms which give an adequate approximation of the function. Thus, the choices in response to the above questions should be such that:

• The functions $g_n(x)$ are easy to work with (evaluate, differentiate, integrate).

• The coefficients $c_n$ can be obtained easily from the definition of $g_n(x)$ and the function f(x).

• The series converges fast to f(x), so that truncation to the first few terms gives a good approximation.

2.1 Taylor series

The simplest terms on which a series expansion can be based are powers of the variable x, or more generally powers of $(x - x_0)$ where $x_0$ is a specific value of the variable near which we want to know the values of the function. This is known as a “power series” expansion. Another very useful expansion, the Fourier series expansion, will be discussed in detail in chapter 5. For the case of power series, there exists an expression, known as the Taylor series expansion, which gives the best possible representation of the function in terms of powers of $(x - x_0)$. The Taylor series expansion of a continuous and infinitely differentiable function of x around $x_0$ in powers of $(x - x_0)$ is:

f(x) = f(x0) +1

1!f ′(x0)(x − x0) +

1

2!f ′′(x0)(x − x0)

2 + · · ·

+1

n!f (n)(x0)(x − x0)

n + · · · (2.2)

with the first, second, ..., $n$th, ... derivatives evaluated at $x = x_0$. The term which involves the $n$th derivative is called the term of order $n$. The numerical coefficient accompanying this term is the inverse of the factorial $n! = 1 \cdot 2 \cdot 3 \cdots n$. For $x$ close to $x_0$, the expansion can be truncated to the first few terms, giving a very good approximation for $f(x)$ in terms of the function and its lowest few derivatives evaluated at $x = x_0$.

Using the general expression for the Taylor series, we obtain for the common exponential, logarithmic, trigonometric and hyperbolic functions:

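\begin{align}
e^x &= 1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3 + \cdots, & x_0 &= 0 \tag{2.3}\\
\log(1 + x) &= x - \frac{1}{2}x^2 + \frac{1}{3}x^3 + \cdots, & x_0 &= 0 \tag{2.4}\\
\cos x &= 1 - \frac{1}{2}x^2 + \frac{1}{24}x^4 + \cdots, & x_0 &= 0 \tag{2.5}\\
\sin x &= x - \frac{1}{6}x^3 + \frac{1}{120}x^5 + \cdots, & x_0 &= 0 \tag{2.6}\\
\tan x \equiv \frac{\sin x}{\cos x} &= x + \frac{1}{3}x^3 + \frac{2}{15}x^5 + \cdots, & x_0 &= 0 \tag{2.7}\\
\cot x \equiv \frac{\cos x}{\sin x} &= \frac{1}{x} - \frac{1}{3}x - \frac{1}{45}x^3 + \cdots, & x_0 &= 0 \tag{2.8}\\
\cosh x &= 1 + \frac{1}{2}x^2 + \frac{1}{24}x^4 + \cdots, & x_0 &= 0 \tag{2.9}\\
\sinh x &= x + \frac{1}{6}x^3 + \frac{1}{120}x^5 + \cdots, & x_0 &= 0 \tag{2.10}\\
\tanh x \equiv \frac{\sinh x}{\cosh x} &= x - \frac{1}{3}x^3 + \frac{2}{15}x^5 + \cdots, & x_0 &= 0 \tag{2.11}\\
\coth x \equiv \frac{\cosh x}{\sinh x} &= \frac{1}{x} + \frac{1}{3}x - \frac{1}{45}x^3 + \cdots, & x_0 &= 0 \tag{2.12}
\end{align}

Note that Eq. (2.4) is an expansion in powers of $x$, that is, around $x_0 = 0$, which corresponds to the argument of the logarithm being close to 1.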

Fig. 2.1 shows the exponential function near $x_0 = 0$, and its approximation by the two and three terms of lowest order in the Taylor expansion. As is evident from this figure, the more terms we keep in the expansion the better the approximation is over a wider range of values around $x_0$.
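The effect of truncation can also be checked numerically. The following is a minimal sketch, not part of the original text; the helper name `taylor_exp` is introduced here purely for illustration. It sums the terms of Eq. (2.3) up to a chosen order and prints the error against the exponential for a few values of $x$.

```python
import math

def taylor_exp(x, order):
    """Taylor sum of exp(x) around x0 = 0, keeping terms up to the given order.

    Hypothetical helper, named here for illustration only.
    """
    return sum(x**k / math.factorial(k) for k in range(order + 1))

# Error of the truncated expansion for orders 1, 2, 3 at several x.
for x in (0.1, 0.5, 1.0):
    errors = [abs(math.exp(x) - taylor_exp(x, order)) for order in (1, 2, 3)]
    print(x, errors)
```

For each $x$ the error should shrink as the order grows, and for a fixed order it should grow as $x$ moves away from $x_0 = 0$.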

Another very useful power series expansion is the binomial expansion:

$$(a + b)^p = a^p + \frac{p}{1!} a^{p-1} b + \frac{p(p-1)}{2!} a^{p-2} b^2 + \frac{p(p-1)(p-2)}{3!} a^{p-3} b^3 + \cdots + \frac{p(p-1)(p-2)\cdots(p-(n-1))}{n!} a^{p-n} b^n + \cdots \tag{2.14}$$

with $a > 0$, $b$, $p$ real. This can be easily obtained as a Taylor series expansion (see Problem 1). It is evident from this expression that for $p = N$, a non-negative integer, the binomial expansion is finite and has a total number of $(N + 1)$ terms; if $p$ is not a non-negative integer it has infinitely many terms. When $p = N$, the binomial expansion can be written as:

$$(a + b)^N = a^N + \frac{N!}{(N-1)!\,1!} a^{N-1} b + \frac{N!}{(N-2)!\,2!} a^{N-2} b^2 + \cdots = \sum_{k=0}^{N} \frac{N!}{(N-k)!\,k!} a^{N-k} b^k \tag{2.15}$$
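As an illustration of Eqs. (2.14) and (2.15), the sketch below accumulates the binomial expansion term by term, building each coefficient $p(p-1)\cdots(p-k+1)/k!$ from the previous one. The function name `binomial_series` is an assumption made for this example. For non-integer $p$ the infinite series only converges when $|b| < a$, so the example keeps $|b/a|$ well below 1.

```python
def binomial_series(a, b, p, n_terms):
    """Partial sum of the first n_terms terms of the expansion of (a + b)**p.

    Hypothetical helper for illustration; assumes a > 0.
    """
    total = 0.0
    coeff = 1.0  # p(p-1)...(p-k+1)/k!, equal to 1 for k = 0
    for k in range(n_terms):
        total += coeff * a**(p - k) * b**k
        coeff *= (p - k) / (k + 1)  # coefficient of the next term
    return total

# Integer power: the series terminates after N + 1 = 4 terms.
print(binomial_series(2.0, 1.0, 3, 4), (2.0 + 1.0)**3)

# Non-integer power: the truncated series approaches (a + b)**p.
print(binomial_series(1.0, 0.2, 0.5, 30), (1.0 + 0.2)**0.5)
```

For integer $p = N$ the coefficient becomes zero after the term $k = N$, which is why asking for more than $N + 1$ terms does not change the result.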

With the help of the Taylor expansion of the exponential and the binomial expansion, we can decipher the deeper meaning of the exponential function: it is the result of continuous compounding, which mathematically is expressed by the relation

$$\lim_{N \to \infty} \left( 1 + \frac{\kappa t}{N} \right)^N = e^{\kappa t} \tag{2.16}$$

where we use the variable $t$ (multiplied by the constant factor $\kappa$) instead of $x$, to make the analogy to compounding with time more direct.

[Figure 2.1, panel title "Taylor series of exponential", generated with the Mathematica input:
Show[Plot[Exp[x], {x, -1, 1}, PlotRange -> {0, 3}], Plot[1 + x, {x, -1, 1}, PlotRange -> {0, 3}], Plot[1 + x + 1/2 x^2, {x, -1, 1}, PlotRange -> {0, 3}]]]

Figure 2.1: The exponential function and its approximation by the Taylor expansion keeping terms of order up to one and two.

The argument goes as follows: suppose that at time $t = 0$ we have a capital $K$, to which we add interest at a rate $\kappa$ every time interval $dt$, that is, our original capital increases by a factor $(1 + \kappa\,dt)$ at every time interval.[1] In order to reach the continuous compounding limit we will let $dt \to 0$. Then we will have the following amounts at successive time intervals:

• at time $0$: $K$,

• at time $dt$: $K(1 + \kappa\,dt)$,

• at time $2dt$: $K(1 + \kappa\,dt)^2$, ...,

• at time $N dt$: $K(1 + \kappa\,dt)^N$.

Now we choose $N$ large enough, so that even though $dt \to 0$, $N dt = t$ with $t$ a finite value. With the help of the binomial expansion we get:

$$(1 + \kappa\,dt)^N = \sum_{k=0}^{N} \frac{N!}{k!\,(N-k)!} (\kappa\,dt)^k$$

The general term in this series can be written as:

$$\frac{N!}{k!\,(N-k)!} (\kappa\,dt)^k = \frac{1}{k!} (\kappa\,dt)^k (N-k+1)(N-k+2) \cdots N$$

but since $N$ is very large, each factor involving $N$ in the parentheses is approximately equal to $N$, giving

$$\frac{N!}{k!\,(N-k)!} (\kappa\,dt)^k \approx \frac{1}{k!} (\kappa\,dt)^k N^k = \frac{1}{k!} (\kappa N dt)^k = \frac{1}{k!} (\kappa t)^k$$

and for $N \to \infty$, $dt \to 0$, with $N dt = t$ finite, we get:

$$\sum_{k=0}^{\infty} \frac{1}{k!} (\kappa t)^k = e^{\kappa t}$$

where the last equality follows from the Taylor expansion of the exponential function.
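The limit in Eq. (2.16) can be observed directly. The sketch below is an illustration only; the function name `compound` and the sample values of $\kappa$ and $t$ are assumptions chosen for the example. It compounds over $N$ intervals of length $dt = t/N$ and compares the result with $e^{\kappa t}$.

```python
import math

def compound(kappa, t, n):
    """Growth factor after n compounding steps of length dt = t / n.

    Hypothetical helper for illustration.
    """
    return (1.0 + kappa * t / n) ** n

kappa, t = 0.05, 10.0  # assumed sample values: rate per unit time, total time
for n in (1, 10, 100, 10**4, 10**6):
    print(n, compound(kappa, t, n), math.exp(kappa * t))
```

The compounded factor should increase towards $e^{\kappa t}$ as the number of intervals grows.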

2.2 Convergence of number series

An important consideration when dealing with series expansions is whether the series converges or not. We are in principle interested in the convergence of series expansions of functions, such as the power series expansions we discussed above. However, it is easier to establish the basic concepts in the context of series of real numbers.

[1] If we want to be very precise, we must assign dimensions of [1/time] to the constant $\kappa$, as in 1% per day, so for $dt = 1$ day, the capital would increase by a factor 1.01.

To discuss the issue of convergence of a number series, we refer first to the more familiar case of the convergence of a sequence. A sequence of numbers

$$a_1, a_2, \ldots, a_n, \ldots$$

is said to converge if for $n \to \infty$ the terms get closer and closer to a well-defined, finite value. For example, the sequence

$$\frac{2}{1}, \frac{3}{2}, \frac{4}{3}, \frac{5}{4}, \ldots$$

is defined by the following expression for the $n$th term

$$a_n = \frac{n+1}{n} \Rightarrow \lim_{n \to \infty} a_n = 1$$

that is, this sequence converges to the value 1 (for large enough value of $n$, $a_n$ comes arbitrarily close to unity). The precise definition of the convergence of a sequence is the following: a sequence $a_n$ converges if for any $\epsilon > 0$, we can find a value of the index $n = N$ such that

$$|a_n - a| < \epsilon, \quad \forall n > N$$

where $a$ is the limit (also called the limiting or asymptotic value) of the sequence. Since this must hold for any $\epsilon > 0$, by taking $\epsilon \to 0$ the above definition ensures that beyond a certain value of the index $n$ all terms of the sequence are within the narrow range $a \pm \epsilon$, that is, arbitrarily close to the limit of the sequence.

Now consider an infinite series, defined by

$$\sum_{n=1}^{\infty} a_n$$

The issue here is whether or not the summation of the infinite number of terms $a_n$ gives a well-defined (finite) value. Notice that the problem comes from the infinite number of terms as well as from the values that these terms take for large values of $n$. For this reason, typically we are not interested in the sum of the first few terms, or for that matter in the sum of any finite number of terms, because this part always gives a finite number, as long as the terms themselves are finite. Accordingly, we usually focus on what happens for the “tail” of the series, that is, starting at some value of the index $n$ and going all the way to infinity. We will therefore often denote the infinite series as a sum over $n$ of terms $a_n$, without specifying the starting value of $n$ and with the understanding that the final value of the index is always $\infty$.

In order to determine the convergence of an infinite series we can turn it into a sequence, by considering the partial sums:

$$s_n = \sum_{j=1}^{n} a_j$$

because then we can ask whether the sequence $s_n$ converges. If it does, so will the series, since the asymptotic term of the sequence $s_n$ for $n \to \infty$ is the same as the infinite series. We use this concept to define the convergence of series: a series is said to converge if the sequence of partial sums converges; otherwise we say that the series diverges.

As an example, consider the sequence of numbers given by:

$$a_j = t^j, \quad 0 < t < 1, \quad j = 0, 1, \ldots$$

and the corresponding series, given by:

$$\sum_{j=0}^{\infty} a_j = \sum_{j=0}^{\infty} t^j = 1 + t + t^2 + t^3 + \cdots + t^n + \cdots$$

which is known as the “geometric series”. The corresponding sequence of partial sums is:

$$s_n = \sum_{j=0}^{n} t^j = 1 + t + t^2 + t^3 + \cdots + t^n = \frac{1 - t^{n+1}}{1 - t}$$

which for large $n$ converges because

$$t < 1 \Rightarrow \lim_{n \to \infty} t^n = 0 \Rightarrow \lim_{n \to \infty} \frac{1 - t^{n+1}}{1 - t} = \frac{1}{1 - t} \tag{2.17}$$

which proves that the limit of the geometric series, for $0 < t < 1$, is $1/(1 - t)$.
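The closed form of the partial sums and the limit of Eq. (2.17) are easy to verify numerically. This is a minimal sketch with a hypothetical helper name, `geometric_partial_sum`, and an assumed sample value $t = 0.5$.

```python
def geometric_partial_sum(t, n):
    """Partial sum s_n = 1 + t + t**2 + ... + t**n, added term by term.

    Hypothetical helper for illustration.
    """
    return sum(t**j for j in range(n + 1))

t = 0.5  # assumed sample value in (0, 1)
for n in (5, 10, 20, 60):
    closed_form = (1 - t**(n + 1)) / (1 - t)
    print(n, geometric_partial_sum(t, n), closed_form, 1 / (1 - t))
```

The first two printed columns should agree for every $n$, and both should approach $1/(1 - t)$ as $n$ grows.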

One useful thing to notice is that if the series of absolute values converges, then so does the original series:

$$\sum_n |a_n| : \text{converges} \Rightarrow \sum_n a_n : \text{converges}$$

The reason for this is simple: the absolute values are all positive, whereas the original terms can be positive or negative. If all the terms have the same sign, then it is the same as summing all the absolute values with the overall sign outside, so if the series of absolute values converges so does the original series. If the terms can have either sign, then their total sum in absolute value will necessarily be smaller than the sum of the absolute values, and therefore if the series of absolute values converges so does the original series. If the series $\sum |a_n|$ converges, then we say that the series $\sum a_n$ converges absolutely (and of course it converges in the usual sense).

A number of tests have been devised to determine whether or not a series converges by looking at the behavior of the $n$th term. We give some of these tests here.

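1. Comparison test: Consider two series, $\sum a_n$, $\sum b_n$:

$$\text{if } |b_n| \leq |a_n| \text{ and } \sum a_n \text{ converges absolutely} \Rightarrow \sum b_n \text{ converges absolutely} \tag{2.18}$$

$$\text{if } |a_n| \leq |b_n| \text{ and } \sum a_n \text{ diverges absolutely} \Rightarrow \sum b_n \text{ diverges absolutely} \tag{2.19}$$

Here "diverges absolutely" means that the series of absolute values diverges. Note, however, that if a series diverges absolutely we cannot conclude that it diverges.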

2. Ratio test: For the series $\sum a_n$,

$$\text{if } \left| \frac{a_{n+1}}{a_n} \right| \leq t < 1, \quad \forall\, n > N \Rightarrow \text{the series converges} \tag{2.20}$$

Proof: from the inequality obeyed by successive terms for $n > N$ we obtain:

\begin{align*}
|a_{N+2}| &\leq t |a_{N+1}| \\
|a_{N+3}| &\leq t |a_{N+2}| \leq t^2 |a_{N+1}| \\
|a_{N+4}| &\leq t |a_{N+3}| \leq t^3 |a_{N+1}|
\end{align*}

By summing the two sides of the above inequalities we obtain:

$$|a_{N+1}| + |a_{N+2}| + |a_{N+3}| + \cdots \leq |a_{N+1}| (1 + t + t^2 + \cdots) = \frac{|a_{N+1}|}{1 - t}$$

where we have used the result for the geometric series summation, Eq. (2.17). The last relation for the sum of $|a_n|$ with $n > N$ shows that the “tail” of the series of absolute values is bounded by a finite number, $|a_{N+1}|/(1 - t)$, therefore the series converges absolutely.


3. Limit tests: For the series $\sum a_n$, we define the following limits:

$$\lim_{n \to \infty} \frac{|a_{n+1}|}{|a_n|} = L_1, \qquad \lim_{n \to \infty} \sqrt[n]{|a_n|} = L_2$$

if $L_1$ or $L_2 < 1$ → the series converges absolutely;
if $L_1$ or $L_2 > 1$ → the series diverges;
if $L_1$ or $L_2 = 1$ → we cannot determine convergence.

4. Integral test: If $0 < a_{n+1} \leq a_n$ and $f(x)$ is a continuous non-increasing function such that $a_n = f(n)$, then the series $\sum_{n=1}^{\infty} a_n$ converges if and only if

$$\int_1^{\infty} f(x)\,dx < \infty$$

5. Dirichlet test: Suppose that we have a series with terms $c_j$ which can be expressed as follows:

$$\sum_{j=1}^{\infty} c_j = \sum_{j=1}^{\infty} a_j b_j$$

Consider $a_j$ and $b_j$ as separate series: then if the partial sums of $a_j$ are bounded and the terms of $b_j$ are positive and tend monotonically to 0, the series $\sum c_j$ converges.

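As an example of a series that satisfies the Dirichlet test consider the series with terms given by:

$$c_j = \cos\left[ \frac{(j-1)\pi}{4} \right] \frac{1}{j}$$

Making the obvious choices $a_j = \cos[(j-1)\pi/4]$ and $b_j = (1/j)$, and defining the partial sums as

$$s_n = \sum_{j=1}^{n} a_j$$

leads to the following table of values for $a_n$, $b_n$ and $s_n$:

$$
\begin{array}{c|cccccccccc}
n & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & \cdots \\ \hline
a_n & 1 & \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} & -1 & -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} & 1 & \cdots \\
s_n & 1 & 1 + \frac{1}{\sqrt{2}} & 1 + \frac{1}{\sqrt{2}} & 1 & 0 & -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 1 & \cdots \\
b_n & 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \frac{1}{8} & \frac{1}{9} & \cdots
\end{array}
$$

As is evident from this table, the partial sums of $a_j$ are bounded by the value $(1 + 1/\sqrt{2})$, and the $b_j$ terms are positive and decrease monotonically to zero. From this we conclude that the original series $\sum c_j$ converges.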
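The two conditions of the Dirichlet test for this example can be examined numerically. The sketch below is illustrative only; the function names `a`, `b`, `partial_sums_a` and `partial_sum_c` are assumptions introduced here. It reproduces the $s_n$ row of the table and prints partial sums of $\sum c_j$ for increasing $n$.

```python
import math

def a(j):
    """Oscillating factor a_j = cos((j - 1) * pi / 4)."""
    return math.cos((j - 1) * math.pi / 4)

def b(j):
    """Positive factor b_j = 1 / j, decreasing monotonically to zero."""
    return 1.0 / j

def partial_sums_a(n):
    """List of the partial sums s_1, ..., s_n of the a_j alone."""
    sums, total = [], 0.0
    for j in range(1, n + 1):
        total += a(j)
        sums.append(total)
    return sums

def partial_sum_c(n):
    """Partial sum of the full series, sum of a_j * b_j for j = 1..n."""
    return sum(a(j) * b(j) for j in range(1, n + 1))

bound = 1 + 1 / math.sqrt(2)
print(partial_sums_a(9))
print(max(abs(s) for s in partial_sums_a(1000)) <= bound + 1e-9)
for n in (10, 100, 1000, 10000):
    print(n, partial_sum_c(n))
```

A numerical check of this kind does not replace the test itself, but the partial sums of $c_j$ should visibly settle as $n$ grows.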

2.3 Convergence of series of functions

We turn next to the case of series of functions as in Eq. (2.1). Of course for a given value of $x$ this is just a series of real numbers, since given the value of $x$, each function $g_n(x)$ takes on a specific real value. However, it is useful to consider the convergence of the series as a function of $x$.

As in the case of series of numbers, we study the convergence of a series of functions by turning them into sequences through the partial sums:

$$s_n(x) = \sum_{j=1}^{n} c_j g_j(x)$$

where $g_j(x)$ is the set of functions we have chosen for the expansion. If the sequence $s_n(x)$ converges to the limiting function $s(x)$, so will the series because

$$\sum_{j=1}^{\infty} c_j g_j(x) = \lim_{n \to \infty} s_n(x) = s(x)$$

In this case, the condition for convergence of the sequence $s_n(x)$ is: for any $\epsilon > 0$ and a given value of $x$, we can find $N$ such that for all $n > N$, $|s_n(x) - s(x)| < \epsilon$. However, the dependence of this condition on $x$ introduces another dimension to the problem. This leads to the notion of uniform convergence, which has to do with convergence of the series for a range of values of $x$. The definition of uniform convergence for $x \in [x_1, x_2]$ is the following: the sequence $s_n(x)$ converges uniformly to $s(x)$ if for any $\epsilon > 0$ we can find $N$ such that $|s_n(x) - s(x)| < \epsilon$ for all $n > N$ and for all $x$ in the interval of interest $x \in [x_1, x_2]$.

The meaning of uniform convergence of the sequence $s_n(x)$ for $x \in [x_1, x_2]$ is that for $\epsilon > 0$, we can find $N$ such that for $n > N$ all terms of $s_n(x)$ lie within $\pm\epsilon$ of $s(x)$ and this applies to all $x$ in the interval $[x_1, x_2]$. $N$ always depends on $\epsilon$ (usually, the smaller the value of $\epsilon$ the larger the $N$ we have to choose), but as long as we can find one $N$ for all $x$ in the interval $[x_1, x_2]$ that satisfies this condition (that is, $N$ is finite and does not depend on $x$), the sequence converges uniformly.

As an example of uniform convergence we consider the sequence

$$s_n(x) = \frac{n}{n+1}(1 - x), \quad \text{for } x \in [0, 1]$$

(we are not concerned here from what series this sequence was produced). These functions for $n = 1, 2, 3$ are illustrated in Fig. 2.2(a). We first need to determine the limit of the sequence:

$$s(x) = \lim_{n \to \infty} s_n(x) = \lim_{n \to \infty} \frac{n}{n+1}(1 - x) = (1 - x)$$

To determine whether the sequence is uniformly convergent, we examine

$$|s_n(x) - s(x)| = \left| \frac{n}{n+1}(1 - x) - (1 - x) \right| = \left| \frac{1}{n+1}(1 - x) \right| = \frac{1}{n+1}(1 - x)$$

Notice that because $x \in [0, 1]$, we will have

$$0 \leq (1 - x) \leq 1 \Rightarrow |s_n(x) - s(x)| = \frac{1 - x}{n+1} \leq \frac{1}{n+1}$$

Given an $\epsilon > 0$, we want to find the value of $N$ such that $(1 - x)/(n + 1) < \epsilon$ for $n > N$. But $(1 - x)/(n + 1) \leq 1/(n + 1)$, so it is sufficient to find $N$ such that $1/(n + 1) < \epsilon$. If we choose

$$\frac{1}{N+1} < \epsilon \Rightarrow N > \frac{1}{\epsilon} - 1$$

then $1/(n + 1) < 1/(N + 1) < \epsilon$ and

$$|s_n(x) - s(x)| \leq \frac{1}{n+1} < \frac{1}{N+1} < \epsilon$$

for $n > N$. Since our choice of $N$ is independent of $x$ we conclude that the sequence $s_n(x)$ converges uniformly.


[Figure 2.2: two panels, (a) and (b), each showing the curves $s_1(x)$, $s_2(x)$, $s_3(x)$ and the limiting function $s(x)$ as functions of $x$.]

Figure 2.2: Examples of (a) uniform and (b) non-uniform convergence of sequences $s_n(x)$. In each case the limiting function is denoted by $s(x)$.

As a counter-example of a sequence that does not converge uniformly, we consider the sequence

$$s_n(x) = \frac{nx}{1 + n^2 x^2}, \quad x \in [0, \infty)$$

These functions for $n = 1, 2, 3$ are illustrated in Fig. 2.2(b). Again we first determine the limit $s(x)$: for a fixed value of $x$

$$s(x) = \lim_{n \to \infty} s_n(x) = \lim_{n \to \infty} \frac{nx}{1 + n^2 x^2} = 0$$

To determine whether the sequence converges uniformly or not, we examine the behavior of

$$|s_n(x) - s(x)| = \frac{nx}{1 + n^2 x^2} = s_n(x)$$

Let us study the behavior of $s_n(x)$ for a given $n$, as a function of $x$. This function has a maximum at $x = 1/n$, which is equal to $1/2$. For fixed $x$, and given $\epsilon > 0$, if the sequence converges uniformly we should be able to find $N$ such that $nx/(1 + n^2 x^2) < \epsilon$, for $n > N$. How large should $N$ be to make this work? We define $n_0$ to be the index of the function with its maximum at our chosen value of $x$ $\Rightarrow n_0 = 1/x$ (we assume that $x$ is small enough so that $1/x$ is arbitrarily close to an integer). For $n$ larger than $n_0$, we will have $s_n(x) < s_{n_0}(x)$ and for sufficiently large $n$ we will achieve $s_n(x) < \epsilon$. Therefore, $n$ must be at least larger than $n_0$. How much larger? Call $N$ the first value of $n$ for which $s_n(x) < \epsilon$, then

$$\frac{Nx}{1 + N^2 x^2} = \frac{N \frac{1}{n_0}}{1 + N^2 \left( \frac{1}{n_0} \right)^2} < \epsilon \Rightarrow 1 + \left( \frac{N}{n_0} \right)^2 > \frac{1}{\epsilon} \frac{N}{n_0}$$

This inequality is satisfied by the choice $(N/n_0) > 1/\epsilon$:

$$\frac{N}{n_0} > \frac{1}{\epsilon} \Rightarrow \left( \frac{N}{n_0} \right)^2 > \frac{N}{n_0} \frac{1}{\epsilon} \Rightarrow 1 + \left( \frac{N}{n_0} \right)^2 > \frac{N}{n_0} \frac{1}{\epsilon}$$

But $N > n_0/\epsilon \Rightarrow N > 1/(\epsilon x)$, which makes the choice of $N$ depend on $x$. This contradicts the requirements for uniform convergence. If we concentrate for a moment on values $x \in [\bar{x}, \infty)$ where $\bar{x}$ is a finite number $> 0$, the choice

$$N > \frac{1}{\bar{x}\epsilon} \geq \frac{1}{x\epsilon}$$

covers all possibilities in the interval $[\bar{x}, \infty)$, so that $s_n(x)$ converges uniformly for $x$ in that interval. But for $x \in [0, \infty)$ it is not possible to find a single $N$ to satisfy $N > 1/(x\epsilon)$ for all $x$, as required for uniform convergence, from which we conclude that the sequence $s_n(x)$ does not converge uniformly for $x \in [0, \infty)$. The problem with this sequence is that there is a spike near $x = 0$, and as $n$ gets larger the spike moves closer to zero and becomes narrower, but it never goes away: its height stays equal to $1/2$. This spike cannot be contained within a band of width $\epsilon$ around $s(x)$, which is zero everywhere. If we truncate the interval to $[\bar{x}, \infty)$ with $\bar{x}$ a finite value, then we can squeeze the spike to values below $\bar{x}$ and the sequence converges uniformly in this range, but if we include $x = 0$ in the interval we can never squeeze the spike away.

As is evident from these two examples, non-uniform convergence implies that the difference between $s_n(x)$ and $s(x)$ can be made arbitrarily small for each $x$ with suitable choice of $n > N$, but cannot be made uniformly small for all $x$ simultaneously for the same value of $N$.
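The contrast between the two examples can be made concrete by estimating the largest deviation $|s_n(x) - s(x)|$ over a grid of $x$ values. The sketch below is an illustration under stated assumptions: all function names are introduced here, and the infinite interval $[0, \infty)$ is replaced by a finite grid on $[0, 10]$, which is enough to capture the spike near $x = 0$.

```python
def s_uniform(n, x):
    """First example: s_n(x) = n / (n + 1) * (1 - x) on [0, 1]."""
    return n / (n + 1) * (1 - x)

def s_uniform_limit(x):
    return 1 - x

def s_spike(n, x):
    """Second example: s_n(x) = n x / (1 + n^2 x^2) on [0, infinity)."""
    return n * x / (1 + n**2 * x**2)

def s_spike_limit(x):
    return 0.0

def max_deviation(seq, limit, n, xs):
    """Largest |s_n(x) - s(x)| over the sample points xs."""
    return max(abs(seq(n, x) - limit(x)) for x in xs)

xs_unit = [i / 1000 for i in range(1001)]   # grid on [0, 1]
xs_wide = [i / 1000 for i in range(10001)]  # grid on [0, 10], stand-in for [0, inf)

for n in (10, 100, 1000):
    print(n,
          max_deviation(s_uniform, s_uniform_limit, n, xs_unit),
          max_deviation(s_spike, s_spike_limit, n, xs_wide))
```

For the first sequence the largest deviation equals $1/(n + 1)$ and shrinks with $n$; for the second it stays at the spike height $1/2$, however large $n$ is taken.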

It is often useful to try to establish uniform convergence by a graphical, rather than an analytic argument. For an example, consider the series

$$1 - x + \frac{1}{2!}x^2 - \frac{1}{3!}x^3 + \frac{1}{4!}x^4 - \frac{1}{5!}x^5 + \cdots \tag{2.21}$$

in the range $x \in [0, 1]$. The partial sums of this series are given by:

$$s_n(x) = 1 + \sum_{k=1}^{n} (-1)^k \frac{1}{k!} x^k \tag{2.22}$$

We easily see that in the limit $n \to \infty$ this sequence becomes the Taylor expansion of the exponential of $-x$, so that:

$$s(x) = \lim_{n \to \infty} s_n(x) = e^{-x} \tag{2.23}$$


[Figure 2.3, panel title "Uniform convergence of series expansion of exponential": three plots of the difference $e^{-x} - s_n(x)$ for $x \in [0, 1]$, with vertical ranges $\pm 0.0015$, $\pm 0.0002$ and $\pm 0.00003$. The first panel was generated with the Mathematica input:
Plot[(Exp[-x] - (1 + Sum[(-x)^k/k!, {k, 1, 5}])), {x, 0, 1}, PlotRange -> {-0.0015, 0.0015}]]

Figure 2.3: Graphical argument for the uniform convergence of the sequence of partial sums defined in Eq. (2.22) to $\exp(-x)$ in the interval $x \in [0, 1]$.


In order to establish uniform convergence of the original series in the given interval, we must take the difference $|s(x) - s_n(x)|$ and ask whether or not it is bounded by some $\epsilon > 0$ for $n > N$ and for all $x \in [0, 1]$. As we can see from the plots of this difference as a function of $x \in [0, 1]$ for different values of $n$, shown in Fig. 2.3, no matter how small $\epsilon$ is, we can always find a value for $N$ which satisfies this condition. For example, if $\epsilon = 0.0015 \to N = 5$, if $\epsilon = 0.0002 \to N = 6$, if $\epsilon = 0.00003 \to N = 7$, etc. These plots also show that the worst case scenario corresponds to $x = 1$, so all we have to do is to make sure that $|s(x) - s_n(x)|$ for $x = 1$ is smaller than the given $\epsilon$, which will then guarantee that the same condition is met for all other values of $x$ in the interval $[0, 1]$.
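The graphical argument can be backed by a short computation. The following is a sketch, with hypothetical helper names `partial_sum` and `max_error`; it evaluates the partial sums of Eq. (2.22) on a grid over $[0, 1]$ and reports the largest deviation from $e^{-x}$ for several $n$.

```python
import math

def partial_sum(n, x):
    """Partial sum s_n(x) of Eq. (2.22): sum of (-x)**k / k! for k = 0..n."""
    return sum((-x)**k / math.factorial(k) for k in range(n + 1))

def max_error(n, samples=1001):
    """Largest |exp(-x) - s_n(x)| over an evenly spaced grid on [0, 1]."""
    xs = [i / (samples - 1) for i in range(samples)]
    return max(abs(math.exp(-x) - partial_sum(n, x)) for x in xs)

for n in (5, 6, 7):
    print(n, max_error(n), abs(math.exp(-1.0) - partial_sum(n, 1.0)))
```

The two printed columns should coincide, reflecting that the worst case occurs at $x = 1$, and the values should fall below the bounds $0.0015$, $0.0002$ and $0.00003$ quoted in the text.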

By analogy to the tests for convergence of series of numbers, we can devise tests for the uniform convergence of series of functions. Some useful tests are:

• Weierstrass test: consider the series of functions $\sum g_j(x)$ and the series of positive numbers $\sum a_j$. If

$$|g_j(x)| < a_j, \quad \text{for } x \in [x_1, x_2]$$

and $\sum a_j$ is a convergent series, then so is the series $\sum g_j(x)$ in the interval $[x_1, x_2]$. Since $a_j$ does not involve $x$, this implies uniform convergence of the sequence of partial sums $s_n(x)$.

• Cauchy test: a sequence for which, given any $\epsilon > 0$, we can find $N$ such that

$$|s_n(x) - s_m(x)| < \epsilon \quad \text{for } n, m > N$$

is called a Cauchy sequence. This is a necessary condition for convergence of the sequence, because if it is convergent we will have:

$$|s_n(x) - s(x)| < \frac{\epsilon}{2}, \quad |s_m(x) - s(x)| < \frac{\epsilon}{2}, \quad \text{for } n, m > N$$

where we have called $\epsilon/2$ the small quantity that restricts $s_n(x)$ and $s_m(x)$ close to $s(x)$ for $n, m > N$. We can rewrite the difference between $s_n(x)$ and $s_m(x)$ as:

$$|s_n(x) - s_m(x)| = |s_n(x) - s(x) + s(x) - s_m(x)|$$

and using the triangle inequality and the previous relation, we obtain

$$|s_n(x) - s_m(x)| \leq |s_n(x) - s(x)| + |s_m(x) - s(x)| < \epsilon$$

It turns out that this is also a sufficient condition: a Cauchy sequence converges, and if the same $N$ can be used for all $x$ in the interval of interest, then it converges uniformly.


Finally, we mention here four useful theorems related to uniform convergence, without delving into their proofs:

• Theorem 1. If there is a convergent series of constants, $\sum c_n$, such that $|g_n(x)| \leq c_n$ for all values of $x \in [a, b]$, then the series $\sum g_n(x)$ is uniformly and absolutely convergent in $[a, b]$.

• Theorem 2. Let $\sum g_n(x)$ be a series such that each $g_n(x)$ is a continuous function of $x$ in the interval $[a, b]$. If the series is uniformly convergent in $[a, b]$, then the sum of the series is also a continuous function of $x$ in $[a, b]$.

• Theorem 3. If a series of continuous functions $\sum g_n(x)$ converges uniformly to $g(x)$ in $[a, b]$, then

$$\int_{\alpha}^{\beta} g(x)\,dx = \int_{\alpha}^{\beta} g_1(x)\,dx + \int_{\alpha}^{\beta} g_2(x)\,dx + \ldots + \int_{\alpha}^{\beta} g_n(x)\,dx + \ldots,$$

where $a \leq \alpha \leq b$ and $a \leq \beta \leq b$. Moreover, the convergence is uniform with respect to $\alpha$ and $\beta$.

• Theorem 4. Let $\sum g_n(x)$ be a series of differentiable functions that converges to $g(x)$ in $[a, b]$. If the series $\sum g_n'(x)$ converges uniformly in $[a, b]$, then it converges to $g'(x)$.

The usefulness of uniform convergence, as is evident from the above theorems, is that it allows us to interchange the operations

$$\int dx \quad \text{or} \quad \frac{d}{dx}, \quad \text{and} \quad \sum_{n}^{\infty}$$

at will and thereby makes possible the evaluation of integrals and derivatives through term-by-term integration or differentiation of a series. Non-uniform convergence implies that in general these operations cannot be interchanged, although in some cases this may be possible.
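As a small illustration of Theorem 3, the series of Eq. (2.21) for $e^{-x}$ can be integrated term by term between two limits inside $[0, 1]$ and compared with the exact integral $e^{-\alpha} - e^{-\beta}$. This is a sketch only; the function name `integrate_termwise` and the sample limits are assumptions made for the example.

```python
import math

def integrate_termwise(alpha, beta, n_terms):
    """Sum of the integrals of the terms (-x)**k / k! from alpha to beta.

    Each term integrates to (-1)**k * (beta**(k+1) - alpha**(k+1)) / (k+1)!.
    Hypothetical helper for illustration.
    """
    return sum(
        (-1)**k * (beta**(k + 1) - alpha**(k + 1)) / math.factorial(k + 1)
        for k in range(n_terms)
    )

alpha, beta = 0.2, 0.9  # assumed sample limits inside [0, 1]
exact = math.exp(-alpha) - math.exp(-beta)
for n_terms in (2, 5, 10, 25):
    print(n_terms, integrate_termwise(alpha, beta, n_terms), exact)
```

The term-by-term result should approach the exact value as more terms are included.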


Problems

1. Derive the binomial expansion, Eq. (2.14), from the Taylor expansion of the function

$$f(x) = a^p (1 + x)^p$$

with $x = (b/a)$, expanded around $x_0 = 0$.

2. Using the Taylor expansion, find power series representations of the function

$$\frac{1}{\sqrt{1 \pm x}}$$

For what range of values of $x$ can these power series expansions be used effectively as approximations of the function, that is, they can be truncated at some term, with the remainder of the terms making an insignificant contribution?

3. Find the Taylor expansion of the following functions:

$$f(x) = \cos(\pi e^x), \quad \text{near } x_0 = 0$$

$$f(x) = \cosh(\sin(x)), \quad \text{near } x_0 = 0$$

4. (a) Show that the harmonic series:

$$\sum_{n=1}^{\infty} \frac{1}{n}$$

diverges (Hint: use the integral test).

(b) Show that the series

$$\sum_{n=1}^{\infty} \frac{1}{n^{\alpha}}$$

converges if and only if $\alpha > 1$ (Hint: use the integral or comparison test).

(c) Show that the alternating harmonic series:

$$\sum_{n=1}^{\infty} \frac{(-1)^n}{n}$$

converges (Hint: combine successive terms in pairs in the partial sums of the series. Show that the partial sums converge to a unique limit, using another series, for which one can easily determine the convergence.)


5. Establish by both an analytic and a graphical argument that the sequence

$$s_n(x) = \left( 1 - \frac{x}{n} \right)^n$$

converges uniformly in the interval $x \in [0, 1]$. Does this sequence converge uniformly in the interval $x \in [0, \infty)$?

6. Consider the sequence $s_n(x) = 2nx\,e^{-nx^2}$ for $x \in [0, 1]$ and $n \geq 1$.

(a) For any chosen finite $x$ in the given interval, what is the limit of the sequence for $n \to \infty$? Call this limit $s(x)$.

(b) Make a sketch showing the character of $s_n(x)$ for the values $n = 1, 2, 3$. Uniform convergence requires that we be able to bound the difference between $s(x)$ and $s_n(x)$, for all $x$ in the interval, such that the bound $\to 0$ as $n \to \infty$. Does the sequence converge uniformly?

(c) If the interval had been specified as $x \in [x_0, 1]$, where $0 < x_0 < 1$, how might the index $N$ be chosen for a given bound $\epsilon$, so that $|s(x) - s_n(x)| < \epsilon$ for $n > N$ and $x \in [x_0, 1]$?

(d) Evaluate the integral

$$\int_0^1 s_n(x)\,dx$$

and take its limit for $n \to \infty$. How does this limit compare to the integral

$$\int_0^1 s(x)\,dx \; ?$$

7. Consider the sequence $s_n(x) = e^{-x^2/n^2}$ for $x \in [0, \infty)$ and $n \geq 1$.

(a) For any chosen finite $x$ in the given interval, what is the limit of the sequence for $n \to \infty$? Call this limit $s(x)$.

(b) Make a sketch showing the character of $s_n(x)$ for the values $n = 1, 2, 3$. Does the sequence converge uniformly in $[0, \infty)$? Explain your answer in detail.

(c) If the interval is specified as $x \in [0, x_0]$, where $0 < x_0$, does the sequence converge uniformly in this interval? Justify your answer.

8. Consider the sequence $s_n(x) = 2n^{\lambda} e^{-nx^2}$ for $x \in [0, \infty)$, $n \geq 1$ and $\lambda$ real. Find the values of $\lambda$ for which the sequence converges uniformly.

Chapter 3

Functions of complex variables

3.1 Complex numbers

We begin with some historical background to complex numbers. Complex numbers were originally introduced to solve the innocent-looking equation

$$x^2 + 1 = 0 \tag{3.1}$$

which in the realm of real numbers has no solution because it requires the square of a number to be a negative quantity. This led to the introduction of the imaginary unit $i$ defined as

$$i = \sqrt{-1} \Rightarrow i^2 = -1 \tag{3.2}$$

which solved the original problem seemingly by brute force. This solution, however, had far-reaching implications. For starters, the above equation is just a special case of the second-order polynomial equation

$$ax^2 + bx + c = 0 \tag{3.3}$$

With the discriminant defined as $\Delta = b^2 - 4ac$, the usual solutions of this equation for $\Delta \geq 0$ are

$$x = \frac{-b \pm \sqrt{\Delta}}{2a} \tag{3.4}$$

but there are no real solutions for $\Delta < 0$. With the use of the imaginary unit, for $\Delta < 0$ the solutions become:

$$z = \frac{-b \pm i\sqrt{|\Delta|}}{2a} \tag{3.5}$$

$z$ is a complex number with real part $-b/2a$ and imaginary part $\pm\sqrt{|\Delta|}/2a$. We denote the real part of a complex number $z$ by $x = \Re[z]$ and the imaginary part by $y = \Im[z]$. Thus, the introduction of the imaginary unit made sure that the second-order polynomial equation always has exactly two solutions (also called “roots”), counting a double root twice. In fact, every polynomial equation of degree $n$

$$a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0 \tag{3.6}$$

with real coefficients $a_j$, $j = 0, 1, 2, \ldots, n$ has exactly $n$ solutions, counted with their multiplicity, if we allow them to be complex numbers.

A significant breakthrough in working with complex numbers was introduced by the representation of the complex number $z = x + iy$ on the so-called “complex plane” with abscissa ($x$) along the real axis (horizontal), and ordinate ($y$) along the imaginary axis (vertical). This is illustrated in Fig. 3.1.

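[Figure 3.1: the point $z$ on the complex plane, with the real axis horizontal and the imaginary axis vertical, at distance $r$ from the origin and angle $\theta$ from the real axis, with $x = r\cos(\theta)$ and $y = r\sin(\theta)$.]

Figure 3.1: Illustration of the representation of a complex number $z$ on the complex plane, in terms of its real $x$ and imaginary $y$ parts and the polar coordinates $r$ and $\theta$.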

An extension of this idea is the so-called Riemann sphere, introduced to include the point at infinity in the representation. This is called the extended complex plane. In this picture, illustrated in Fig. 3.2, the South Pole corresponds to the value $z = 0$, the sphere has diameter equal to unity, and the North Pole corresponds to the point $z \to \infty$. Each point on the $z$-plane is mapped onto a point on the sphere through the following construction: consider a line that begins at the North Pole and ends on the $z$-plane; the point where this line intersects the surface of the sphere is the image of the point where the line meets the $z$-plane. Through this procedure, the equator of the sphere maps to the unit circle on the $z$-plane, the points in the southern hemisphere map to the interior of this circle and the points in the northern hemisphere map to the exterior of this circle.

[Figure 3.2: a sphere of unit diameter resting on the $z$-plane at the origin, with North Pole (NP) and three points $z_1$, $z_2$, $z_3$ on the plane together with their images on the sphere. Legend: line from NP intersects sphere at point in northern hemisphere; line from NP intersects sphere at point on the equator; line from NP intersects sphere at point in southern hemisphere.]

Figure 3.2: Illustration of the representation of a complex number on the Riemann sphere and its mapping to the complex $z$-plane.

A useful concept is the complex conjugate $\bar{z}$ of the complex number $z$:

$$z = x + iy \Rightarrow \bar{z} = x - iy \tag{3.7}$$

A complex number and its conjugate have equal magnitude, given by

$$|z| = \sqrt{z \cdot \bar{z}} = \sqrt{x^2 + y^2} \tag{3.8}$$

We examine next the form that different operations with complex numbers take: The product and the quotient of two complex numbers $z_1, z_2$ in terms of their real and imaginary components, $z_1 = x_1 + iy_1$, $z_2 = x_2 + iy_2$, are:

$$z_1 z_2 = (x_1 x_2 - y_1 y_2) + i(x_1 y_2 + x_2 y_1) \tag{3.9}$$

$$\frac{z_1}{z_2} = \left( \frac{x_1 x_2 + y_1 y_2}{x_2^2 + y_2^2} \right) + i \left( \frac{-x_1 y_2 + x_2 y_1}{x_2^2 + y_2^2} \right) \tag{3.10}$$
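Equations (3.9) and (3.10) can be checked against the complex arithmetic built into Python. The sketch below is illustrative; the function names `product` and `quotient` and the sample numbers are assumptions introduced for this example.

```python
def product(x1, y1, x2, y2):
    """Real and imaginary parts of z1 * z2, following Eq. (3.9)."""
    return (x1 * x2 - y1 * y2, x1 * y2 + x2 * y1)

def quotient(x1, y1, x2, y2):
    """Real and imaginary parts of z1 / z2, following Eq. (3.10)."""
    d = x2**2 + y2**2
    return ((x1 * x2 + y1 * y2) / d, (-x1 * y2 + x2 * y1) / d)

z1, z2 = complex(3, 4), complex(1, -2)  # assumed sample values
print(product(3, 4, 1, -2), z1 * z2)
print(quotient(3, 4, 1, -2), z1 / z2)
```

The tuples returned by the two helpers should match the real and imaginary parts of the results computed with Python's own `complex` type.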

An extremely useful expression is the so-called Euler formula:

$$e^{ix} = \cos(x) + i\sin(x) \tag{3.11}$$

We can prove Euler’s formula by using the series expansions of the trigonometric functions (see chapter 2, Eqs. (2.5) and (2.6)):

$$\cos(x) = 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots = 1 + \frac{1}{2!}(ix)^2 + \frac{1}{4!}(ix)^4 + \cdots$$

$$i\sin(x) = ix - i\frac{1}{3!}x^3 + i\frac{1}{5!}x^5 - \cdots = (ix) + \frac{1}{3!}(ix)^3 + \frac{1}{5!}(ix)^5 + \cdots$$

where we have used $i^2 = -1$, $i^3 = -i$, $i^4 = 1$, $i^5 = i$, etc. Adding these two expressions we find

$$\cos(x) + i\sin(x) = 1 + (ix) + \frac{1}{2!}(ix)^2 + \frac{1}{3!}(ix)^3 + \frac{1}{4!}(ix)^4 + \frac{1}{5!}(ix)^5 + \cdots \tag{3.12}$$

in which we recognize the Taylor series expansion for the exponential of $(ix)$, which is the desired result. It may seem strange that we identify the above infinite series with an exponential, because we have only dealt with exponentials of real numbers so far. What allows us to do this is that the laws of addition and multiplication are the same for real and imaginary numbers; therefore the term-by-term operations involved in the above series make it identical to the Taylor power series expansion of the exponential of $ix$.

Applying Euler’s formula for $x = \pi$ we obtain:

$$e^{i\pi} = -1 \tag{3.13}$$

which relates in a neat way the irrational constants

$$e = 2.7182818284590452\cdots, \qquad \pi = 3.1415926535897932\cdots$$

and the real and imaginary units.

There is a different representation of complex numbers in terms of polar coordinates, which produces a much more convenient way of handling complex number operations. The polar coordinates radius $r$ and angle $\theta$ and the usual cartesian coordinates $x, y$ are related by:

$$x = r\cos(\theta), \quad y = r\sin(\theta) \Rightarrow r = \sqrt{x^2 + y^2}, \quad \theta = \tan^{-1}\left( \frac{y}{x} \right)$$

as illustrated in Fig. 3.1. In terms of the polar coordinates the complex number takes the form:

$$z = x + iy = r[\cos(\theta) + i\sin(\theta)] \tag{3.14}$$

$r$ is called the magnitude of $z$ because $r = |z|$, and $\theta$ is called the argument, denoted by $\mathrm{Arg}[z]$. Note that the use of polar coordinates introduces an ambiguity: if the argument is changed by an integral multiple of $2\pi$ the complex number stays the same:

$$r[\cos(\theta + 2k\pi) + i\sin(\theta + 2k\pi)] = r[\cos(\theta) + i\sin(\theta)] \tag{3.15}$$

where $k$ is any integer. This produces certain complications in handling complex numbers using polar coordinates, which we will discuss in detail below.

The use of Euler’s formula and the polar coordinate representation trivializes operations with complex numbers:

\begin{align*}
\text{product}: \quad & z_1 \cdot z_2 = (r_1 r_2) e^{i(\theta_1 + \theta_2)} = (r_1 r_2)[\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)] \\
\text{inverse}: \quad & z^{-1} = r^{-1} e^{-i\theta} = r^{-1}[\cos(-\theta) + i\sin(-\theta)] \\
\text{quotient}: \quad & \frac{z_1}{z_2} = \frac{r_1}{r_2} e^{i(\theta_1 - \theta_2)} = \frac{r_1}{r_2}[\cos(\theta_1 - \theta_2) + i\sin(\theta_1 - \theta_2)] \\
n\text{th power}: \quad & z^n = r^n e^{in\theta} = r^n[\cos(n\theta) + i\sin(n\theta)] \\
n\text{th root}: \quad & z^{1/n} = r^{1/n} e^{i(\theta + 2k\pi)/n}, \quad k: \text{integer}
\end{align*}

Applying the expression for the $n$th power to a complex number of magnitude 1, we obtain:

$$\left[ e^{i\theta} \right]^n = [\cos(\theta) + i\sin(\theta)]^n = [\cos(n\theta) + i\sin(n\theta)]$$

The last equation is known as De Moivre’s formula. As an application of this powerful formula, we can prove the trigonometric relations discussed in chapter 1, Eqs. (1.7), (1.8):

$$[\cos(\theta) + i\sin(\theta)]^2 = [\cos(2\theta) + i\sin(2\theta)] \Rightarrow$$

$$\cos^2(\theta) - \sin^2(\theta) = \cos(2\theta), \quad 2\cos(\theta)\sin(\theta) = \sin(2\theta)$$

where we simply raised the left-hand side of the first equation to the square power in the usual way and then equated real and imaginary parts on the two sides of the resulting equation.

3.2 Complex variables

The next natural step is to consider functions of the complex variablez. First, notice that the ambiguity in the value of the argument θ of z


in the polar coordinate representation has important consequences in theevalution of the nth root: it produces n different values for the root

z1/n = r1/n

[

cos

(

θ + 2kπ

n

)

+ i sin

(

θ + 2kπ

n

)]

, k = 0, ..., n − 1 (3.16)

This is actually what should be expected if we consider it a solution of analgebraic equation: call the unknown root w, then

w = z1/n ⇒ wn − z = 0

which implies that there should be n values of w satisfying the equation. Avalue of k outside the range [0, n− 1] gives the same result as the value ofk within this range that differs from it by a multiple of ±n. For example,consider the equation

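$$w^4 - 16 = 0 \;\Rightarrow\; w^4 = 2^4 e^{i(0+2k\pi)} \;\Rightarrow\; w = 2e^{i(2k\pi/4)}, \qquad k = 0, 1, 2, 3$$

$$\Rightarrow\; w = 2,\; 2e^{i\pi/2},\; 2e^{i\pi},\; 2e^{i3\pi/2}, \quad \text{or} \quad w = 2,\; 2i,\; -2,\; -2i$$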

as can be easily verified by raising these four solutions to the fourth power.

From Euler’s formula we can derive the following expressions for the trigonometric functions sin(x) and cos(x) in terms of exp(±ix):

$$\cos(x) = \frac{e^{ix} + e^{-ix}}{2} \tag{3.17}$$

$$\sin(x) = \frac{e^{ix} - e^{-ix}}{2i} \tag{3.18}$$

The exponential of (ix) involved in the above expressions is a new type of function, because it contains the imaginary unit i. Taking this idea a step further, we can consider functions of complex numbers, such as the exponential of z = x + iy:

$$e^z = e^{x+iy} = e^x e^{iy}$$

and since we know how to handle both the exponential of a real number exp(x) and the exponential of an imaginary number exp(iy), this expression is well defined. It turns out (see Problem 1) that the exponential of z can be expressed as:

$$e^z = 1 + z + \frac{1}{2!}z^2 + \frac{1}{3!}z^3 + \cdots \tag{3.19}$$

which is consistent with the Taylor expansion of the exponential of a real variable (see chapter 1) or the exponential of a purely imaginary variable


(see above, proof of Euler’s formula). We can also define the hyperbolic and trigonometric functions of the complex variable z as:

$$\cosh(z) = \frac{e^z + e^{-z}}{2}, \qquad \sinh(z) = \frac{e^z - e^{-z}}{2} \tag{3.20}$$

$$\cos(z) = \frac{e^{iz} + e^{-iz}}{2}, \qquad \sin(z) = \frac{e^{iz} - e^{-iz}}{2i} \tag{3.21}$$

by analogy to the definition of the same functions for a real variable. These definitions lead to the following relations:

cosh(iz) = cos(z), sinh(iz) = i sin(z) (3.22)

cos(iz) = cosh(z), sin(iz) = i sinh(z) (3.23)
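As a quick sanity check, relations (3.22) and (3.23) can be tested at arbitrary complex points. This is a small sketch based on the standard `cmath` functions; the helper name `check_hyperbolic_relations` and the tolerance are assumptions made for illustration.

```python
import cmath

def check_hyperbolic_relations(z, tol=1e-12):
    """Return True if Eqs. (3.22)-(3.23) hold numerically at the point z."""
    iz = 1j * z
    errors = [
        abs(cmath.cosh(iz) - cmath.cos(z)),        # cosh(iz) = cos(z)
        abs(cmath.sinh(iz) - 1j * cmath.sin(z)),   # sinh(iz) = i sin(z)
        abs(cmath.cos(iz) - cmath.cosh(z)),        # cos(iz) = cosh(z)
        abs(cmath.sin(iz) - 1j * cmath.sinh(z)),   # sin(iz) = i sinh(z)
    ]
    return all(err < tol for err in errors)

if __name__ == "__main__":
    print(check_hyperbolic_relations(0.3 + 0.8j))
```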

Another use of the Euler formula is to provide a definition for the natural logarithm of a complex number:

$$\ln(z) = \ln\left(r e^{i(\theta+2k\pi)}\right) = \ln(r) + i(\theta + 2k\pi) \tag{3.24}$$

In this case, the ambiguity in the argument of z produces an ambiguity in both the argument and the magnitude of ln(z), which are:

$$|\ln(z)| = \sqrt{[\ln(r)]^2 + (\theta+2k\pi)^2}, \qquad \mathrm{Arg}[\ln(z)] = \tan^{-1}\left(\frac{\theta+2k\pi}{\ln(r)}\right)$$

which both depend on the value of k. This bothersome situation is dealt with by considering what is called the principal value, that is, limiting the values of the argument to one specific interval of 2π, usually chosen as the interval 0 ≤ θ < 2π (which corresponds to k = 0 in the above expressions).

The logarithm can be used to define arbitrary powers, such as raising the complex number t to a complex power z:

$$w = t^z \;\Rightarrow\; \ln(w) = z\ln(t) \;\Rightarrow\; \ln(r_w) + i\theta_w = (x+iy)\,[\ln(r_t) + i\theta_t + i2k\pi]$$

where $r_t$ and $\theta_t$ are the magnitude and argument of t, and again the last expression allows us to determine the magnitude and argument of w by equating real and imaginary parts on the two sides of the equation:

$$\ln(r_w) = x\ln(r_t) - y(\theta_t+2k\pi), \qquad \theta_w = x(\theta_t+2k\pi) + y\ln(r_t)$$

Example: We are interested in finding the magnitude and argument of w defined by the equation

$$w = (1+i)^{(1+i)}$$


We first express (1 + i) in polar coordinates:

$$1+i = \sqrt{2}\left(\frac{1}{\sqrt{2}} + i\frac{1}{\sqrt{2}}\right) = \sqrt{2}\,e^{i(\pi/4+2k\pi)} = e^{\ln(\sqrt{2})}\,e^{i(\pi/4+2k\pi)}$$

which, when substituted in the above equation gives:

$$w = \left[e^{\ln(\sqrt{2})+i(\pi/4+2k\pi)}\right]^{(1+i)} = e^{\ln(\sqrt{2})+i\ln(\sqrt{2})}\,e^{i(\pi/4+2k\pi)-(\pi/4+2k\pi)} \;\Rightarrow$$

$$w = \sqrt{2}\,e^{-(\pi/4+2k\pi)}\,e^{i(\ln(\sqrt{2})+\pi/4)} \;\Rightarrow\; r_w = \sqrt{2}\,e^{-(\pi/4+2k\pi)}, \qquad \theta_w = \ln(\sqrt{2})+\frac{\pi}{4}$$

that is, the magnitude of w takes an infinite number of values for k an arbitrary integer, whereas its argument is unique (up to a multiple of 2π which is irrelevant). This is actually very problematic, because the function w is multivalued, something that is not allowed for a properly defined function. The reason for this multivaluedness is that we relied on the logarithm to obtain the expression for w, and we have seen above that the logarithm suffers from this problem as well unless we restrict the value of the argument (as in the definition of the principal value).
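The dependence of the magnitude on k can be seen numerically. The sketch below compares the closed-form magnitude and argument with a direct evaluation of exp[(1 + i) ln(1 + i)] on the k-th branch of the logarithm; the function names are illustrative. Python's built-in complex power uses the argument π/4 for 1 + i, so it coincides with the k = 0 value here.

```python
import cmath
import math

def power_branch(k):
    """Magnitude and argument of (1+i)^(1+i) on branch k, from the formulas above."""
    r_w = math.sqrt(2) * math.exp(-(math.pi / 4 + 2 * k * math.pi))
    theta_w = math.log(math.sqrt(2)) + math.pi / 4
    return r_w, theta_w

def power_via_log(k):
    """Same quantity computed as exp[(1+i) * (ln|t| + i(Arg t + 2 k pi))] with t = 1+i."""
    t = 1 + 1j
    log_t = math.log(abs(t)) + 1j * (cmath.phase(t) + 2 * k * math.pi)
    return cmath.exp((1 + 1j) * log_t)

if __name__ == "__main__":
    for k in (-1, 0, 1):
        r_w, theta_w = power_branch(k)
        print(k, r_w, theta_w, abs(power_via_log(k)))
```

Each value of k gives a different magnitude but the same direction in the complex plane, in line with the discussion above.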

3.3 Continuity, analyticity, derivatives

As in the case of a complex number z = x + iy, a function of a complex variable f(z) can always be separated into its real and imaginary parts:

f(z) = w = u(x, y) + iv(x, y) (3.25)

where u(x, y) and v(x, y) are real functions of the real variables x, y. The function f(z) implies a mapping of the z-plane to the w-plane, a topic we discuss in great detail in section 3.6.

We will consider several issues related to the behavior of functions of complex variables, as we did for functions of real variables. The first is the issue of continuity. We define continuity of a function f(z) of the complex variable z at a point $z_0$ as follows: the function is continuous at $z_0$ if for any ε > 0, there exists δ > 0 such that

$$|z - z_0| < \delta \;\Rightarrow\; |f(z) - f(z_0)| < \epsilon \tag{3.26}$$

in close analogy to what was done for a function of a real variable in chapter 1. The essence of this condition is illustrated in Fig. 3.3. Usually, we must take δ smaller when we take ε smaller, but this is not always necessary.


Figure 3.3: Illustration of continuity of the function f(z) = w near the point f($z_0$) = $w_0$. [Panels: z-plane (axes x, y; labels δ, z, $z_0$) and w-plane (axes u, v; labels ε, w, $w_0$), connected by w = f(z).]

As an example, consider the function

$$w = f(z) = z^2$$

In order to show that it is continuous, for any given ε > 0 we have to show that we can find a δ > 0 such that |z − $z_0$| < δ guarantees |w − $w_0$| < ε. We must find a way to relate ε and δ. Given that |z − $z_0$| < δ, we will have:

$$|w - w_0| = |z^2 - z_0^2| = |(z - z_0)(z + z_0)| < \delta|z + z_0| = \delta|(z - z_0) + 2z_0|$$

Using the triangle inequality, we obtain from the last relation:

$$|w - w_0| \le \delta\,[|z - z_0| + |2z_0|] \le \delta\,[\delta + 2|z_0|]$$

If we take the relation between δ and ε to be

$$\epsilon = \delta[\delta + 2|z_0|] \;\Rightarrow\; \delta^2 + 2|z_0|\delta - \epsilon = 0 \;\Rightarrow\; \delta = \sqrt{|z_0|^2 + \epsilon} - |z_0| > 0$$

(where we have chosen the positive root of the second-order polynomial equation in δ), then the last expression shows that no matter how small ε is, we can always find a value for δ > 0 that satisfies the continuity condition, Eq. (3.26). From this expression, it is also evident that when ε gets smaller we must take a correspondingly smaller δ, a situation that is common but is not always necessary.
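The relation between ε and δ derived above can be probed numerically: choose δ from the formula, sample points with |z − z0| < δ, and confirm that |z² − z0²| stays below ε. This is only a sampling check under assumed sample sizes, not a proof; the function names are hypothetical.

```python
import cmath
import math
import random

def delta_for(z0, eps):
    """delta = sqrt(|z0|^2 + eps) - |z0|, the positive root found in the text."""
    return math.sqrt(abs(z0) ** 2 + eps) - abs(z0)

def worst_deviation(z0, eps, samples=2000, seed=0):
    """Largest |z^2 - z0^2| over random points z with |z - z0| < delta."""
    rng = random.Random(seed)
    delta = delta_for(z0, eps)
    worst = 0.0
    for _ in range(samples):
        radius = delta * rng.random()            # radius in [0, delta)
        angle = 2 * math.pi * rng.random()
        z = z0 + cmath.rect(radius, angle)
        worst = max(worst, abs(z * z - z0 * z0))
    return worst

if __name__ == "__main__":
    z0 = 1 + 2j
    for eps in (1.0, 1e-2, 1e-4):
        print(eps, delta_for(z0, eps), worst_deviation(z0, eps))
```

The printed δ shrinks together with ε, as noted in the text.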

The next issue is the definition of the derivative. By analogy to the case of functions of real variables, we define the derivative of a function f(z) of a complex variable as:

$$f'(z_0) = \lim_{\Delta z \to 0}\frac{f(z_0 + \Delta z) - f(z_0)}{\Delta z} = \lim_{z \to z_0}\frac{f(z) - f(z_0)}{z - z_0} \tag{3.27}$$


In order for the derivative f ′(z) to make sense, the limit must exist and must be independent of how ∆z → 0, that is, independent of the direction in which the point $z_0$ on the complex plane is being approached. This aspect did not exist in the case of the derivative of a function of a real variable, since the point $x_0$ on the real axis can only be approached along other real values on the axis.

To demonstrate the importance of this aspect, we consider a counter-example, that is, a function for which the derivative is not properly defined. This function is the complex conjugate:

$$w = f(z) = \bar{z}$$

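From the definition of the derivative we have:

$$f'(z_0) = \lim_{\Delta z \to 0}\frac{(\bar{z}_0 + \overline{\Delta z}) - \bar{z}_0}{\Delta z} = \lim_{\Delta x \to 0,\, \Delta y \to 0}\frac{\Delta x - i\Delta y}{\Delta x + i\Delta y}$$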

If we choose the approach along the line determined by

$$y = y_0 + \mu(x - x_0) \;\Rightarrow\; \frac{\Delta y}{\Delta x} = \mu$$

then the derivative will take the form

$$f'(z_0) = \frac{1 - i\mu}{1 + i\mu} = \frac{(1 - i\mu)^2}{1 + \mu^2}$$

This last expression obviously depends on the path of approach ∆z → 0, that is, on the value of the slope µ of the line; therefore the derivative of $\bar{z}$ does not exist.
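The direction dependence can be made concrete with a finite difference quotient. In this sketch (function names and the step size h are assumptions), the quotient for the complex conjugate reproduces (1 − iµ)/(1 + iµ) for each slope µ, while the same quotient for the analytic function z² is close to 2z0 for every slope.

```python
def conj_quotient(z0, mu, h=1e-6):
    """Difference quotient of f(z) = conj(z) at z0, approaching along slope mu."""
    dz = complex(h, mu * h)                      # step along y = y0 + mu (x - x0)
    return ((z0 + dz).conjugate() - z0.conjugate()) / dz

def predicted(mu):
    """Direction-dependent value (1 - i mu)/(1 + i mu) derived above."""
    return (1 - 1j * mu) / (1 + 1j * mu)

def square_quotient(z0, mu, h=1e-6):
    """Same quotient for the analytic function f(z) = z^2, for comparison."""
    dz = complex(h, mu * h)
    return ((z0 + dz) ** 2 - z0 ** 2) / dz

if __name__ == "__main__":
    z0 = 0.5 + 0.25j
    for mu in (0.0, 1.0, -2.0):
        print(mu, conj_quotient(z0, mu), predicted(mu), square_quotient(z0, mu))
```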

The possible dependence of the ratio ∆f(z)/∆z on the approach to $z_0$ introduces another notion, that of analyticity. A function f(z) is called analytic in the domain D of the xy-plane if:

• f(z) is defined in D, that is, it is single-valued; and

• f(z) is differentiable in D, that is, the derivative f ′(z) exists and is finite.

If these conditions are not met, the function is called non-analytic. There are important consequences of the analyticity of a function f(z):

• f ′(z) is continuous (this is known as the Goursat theorem); and

• f(z) has continuous derivatives of all orders.


In the following, when we employ the notion of analyticity we will assume that it holds at least in a finite region around some point $z_0$. By region we mean an area, that is, part of the complex plane that has the same dimensionality as the plane itself. For instance, the set of all points in a straight line does not constitute a region, because the line is one-dimensional whereas the plane is two-dimensional. In fact, had we been considering lines as regions, we would conclude that the function f(z) = $\bar{z}$ is analytic along any straight line, as can easily be established by the preceding analysis of this function (the derivative of f(z) = $\bar{z}$ is well defined along a particular line). The convention we adopt is that the notion of analyticity applies strictly in a finite, two-dimensional portion of the complex plane.

By this convention, it does not make sense to talk about analyticity at a single point of the complex plane. In contrast to this, it makes perfect sense to talk about a function being non-analytic at certain points inside a region where the function is analytic. For example, the function

$$f(z) = \frac{1}{z}$$

is analytic everywhere on the complex plane except at z = 0, where it is non-analytic because it does not have a finite value. For any other point, arbitrarily close to zero, the function is properly defined and so are all its derivatives. Isolated points where an otherwise analytic function is non-analytic are called singularities. There exist different types of singularities, depending on what goes wrong with the analyticity of the function, as we will discuss in more detail in following sections. In order to classify the singularities, we will need to develop the power series expansions for functions of complex variables.

We give a few examples of analytic and non-analytic functions:

The function f(z) = $z^n$, where n is a positive integer, is analytic and so are all simple functions containing it, except if it is in the denominator, because then it can vanish for z = 0 causing problems; in fact f(z) = $1/z^n$ is analytic everywhere except for z = 0, which is a singularity.

The function f(z) = $\bar{z}$ is a non-analytic function and so are all simple functions containing it. For instance, the function f(z) = Re[z] = x is not an analytic function, because:

$$f(z) = \mathrm{Re}[z] = x = \frac{z + \bar{z}}{2}$$

contains $\bar{z}$, which is not analytic, therefore f(z) is not analytic. What does it mean to have a function like Re[z]? This expression certainly leads to a mapping, since

f(z) = w = u(x, y) + iv(x, y) ⇒ u(x, y) = x, v(x, y) = 0

but this mapping is not a function in the usual sense, that is, it cannot be differentiated, etc.

3.4 Cauchy-Riemann relations and harmonic functions

For an analytic function f(z) = w = u(x, y) + iv(x, y), the existence of the derivative leads to the following relations between the partial derivatives of the real and imaginary parts:

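$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x} \tag{3.28}$$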

These are called the Cauchy-Riemann relations. To prove this statement we rely on the definition of the derivative, which for an analytic function must exist and must be independent of the approach ∆z → 0:

$$f'(z) = \lim_{\Delta z \to 0}\frac{\Delta f}{\Delta z} \tag{3.29}$$

Thinking of f(z) in terms of its real and imaginary parts, u(x, y) and v(x, y) respectively, which are both functions of x and y, we can write the differential of f(z) in terms of the partial derivatives with respect to x and y:

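$$\Delta f = \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y = \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)\Delta x + \left(\frac{\partial u}{\partial y} + i\frac{\partial v}{\partial y}\right)\Delta y$$

$$= \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)\Delta x + \left(-i\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y}\right) i\Delta y = f_x\,\Delta x + f_y\, i\Delta y$$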

where the last equation serves as the definition of the partial derivatives $f_x$ and $f_y$ in terms of partial derivatives of u and v. Let us assume that $f_x$ and $f_y$ are not both zero, and we will take $f_x \neq 0$; then the derivative can be put in the form

$$f'(z_0) = \lim_{\Delta x, \Delta y \to 0}\left[\frac{\Delta x + (f_y/f_x)\, i\Delta y}{\Delta x + i\Delta y}\, f_x\right]$$


The derivative must be independent of the approach of ∆x, ∆y → 0, which can be achieved only for $f_x = f_y$, because this is the only way of making the coefficients of ∆x and i∆y in the numerator equal, as they are in the denominator. Consequently,

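$$f_x = f_y \;\Rightarrow\; \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right) = \left(-i\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y}\right)$$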

Equating the real and imaginary parts of $f_x$ and $f_y$ we obtain the Cauchy-Riemann relations, Eq. (3.28). Taking cross-derivatives of these relations and adding them produces another set of relations:

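$$\frac{\partial^2 u}{\partial x \partial y} = \frac{\partial^2 v}{\partial y^2} \quad \text{and} \quad \frac{\partial^2 u}{\partial x \partial y} = -\frac{\partial^2 v}{\partial x^2} \;\Rightarrow\; \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 0 \tag{3.30}$$

$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial^2 v}{\partial x \partial y} \quad \text{and} \quad \frac{\partial^2 u}{\partial y^2} = -\frac{\partial^2 v}{\partial x \partial y} \;\Rightarrow\; \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \tag{3.31}$$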

Functions that satisfy this condition, called Laplace’s equation, are referred to as harmonic.

The harmonic functions appear frequently in equations related to several physical phenomena, such as electrostatic fields produced by electrical charges, the temperature field in an isotropic thermal conductor under steady heat flow, and the pressure field for incompressible liquid flow in a porous medium.
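The Cauchy-Riemann relations, Eq. (3.28), and Laplace's equation can be checked numerically with finite differences. The sketch below is one possible way to do so; the step sizes and function names are assumptions, and the residuals are only expected to vanish up to discretization and rounding error.

```python
import cmath

def partials(f, x, y, h=1e-5):
    """Central-difference estimates of (u_x, u_y, v_x, v_y) for f(x + iy) = u + iv."""
    fx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)
    fy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)
    return fx.real, fy.real, fx.imag, fy.imag

def cauchy_riemann_residuals(f, x, y):
    """Return (u_x - v_y, u_y + v_x); both vanish for an analytic function."""
    u_x, u_y, v_x, v_y = partials(f, x, y)
    return u_x - v_y, u_y + v_x

def laplacian_of_real_part(f, x, y, h=1e-3):
    """Five-point estimate of u_xx + u_yy for u = Re f."""
    def u(a, b):
        return f(complex(a, b)).real
    return (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h) - 4 * u(x, y)) / h ** 2

if __name__ == "__main__":
    print(cauchy_riemann_residuals(cmath.exp, 0.3, -0.7))                  # both near zero
    print(cauchy_riemann_residuals(lambda z: z.conjugate(), 0.3, -0.7))    # first is near 2
    print(laplacian_of_real_part(cmath.exp, 0.3, -0.7))                    # near zero
```

The complex conjugate fails the first relation (u_x = 1 while v_y = −1), consistent with its non-analyticity.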

The Cauchy-Riemann relations have certain important implications:

• the u(x, y), v(x, y) functions (often called fields) are determined by the boundary values;

• if one field is known, the conjugate field (u from v or v from u) is determined up to a constant of integration;

• a complex analytic function is determined by the boundary values;

• any linear combination of the two fields u(x, y), v(x, y) of an analytic function is a harmonic function (because it corresponds to multiplication of the original function f(z) by a complex number).

From these we can also derive the following general statements. If $f_1(z)$, $f_2(z)$ are two analytic functions in a domain D, then so is the function f(z) given by:

$$f(z) = f_1(z) + f_2(z)$$

$$f(z) = f_1(z) f_2(z)$$

$$f(z) = \frac{1}{f_i(z)}, \quad \text{except where } f_i(z) = 0 \;\; (i = 1, 2)$$

$$f(z) = f_1(f_2(z))$$

in the same domain D. By analogy to the case of functions of real variables, the following rules also apply to the differentiation of analytic functions:

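$$f(z) = f_1(z) f_2(z) \;\Rightarrow\; \frac{d}{dz}f(z) = \left[\frac{d}{dz}f_1(z)\right] f_2(z) + f_1(z)\left[\frac{d}{dz}f_2(z)\right]$$

$$f(z) = f_1(f_2(z)) \;\Rightarrow\; \frac{d}{dz}f(z) = \left(\frac{d}{df_2}f_1(f_2(z))\right)\left(\frac{d}{dz}f_2(z)\right)$$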

Finally, an important implication of the Cauchy-Riemann relations is

$$\frac{\partial f}{\partial \bar{z}} = 0 \tag{3.32}$$

In order to prove this, we begin by expressing the variables x and y in terms of z and $\bar{z}$:

$$x = \frac{1}{2}(z + \bar{z}), \qquad y = \frac{1}{2i}(z - \bar{z})$$

from which we obtain for the partial derivatives with respect to x and y:

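$$\frac{\partial}{\partial x} = \frac{\partial z}{\partial x}\frac{\partial}{\partial z} + \frac{\partial \bar{z}}{\partial x}\frac{\partial}{\partial \bar{z}} = \frac{\partial}{\partial z} + \frac{\partial}{\partial \bar{z}}$$

$$\frac{\partial}{\partial y} = \frac{\partial z}{\partial y}\frac{\partial}{\partial z} + \frac{\partial \bar{z}}{\partial y}\frac{\partial}{\partial \bar{z}} = i\frac{\partial}{\partial z} - i\frac{\partial}{\partial \bar{z}}$$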

Using these expressions in the Cauchy-Riemann relations we find:

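$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \;\Rightarrow\; \frac{\partial u}{\partial z} + \frac{\partial u}{\partial \bar{z}} = i\frac{\partial v}{\partial z} - i\frac{\partial v}{\partial \bar{z}}$$

$$\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x} \;\Rightarrow\; i\frac{\partial u}{\partial z} - i\frac{\partial u}{\partial \bar{z}} = -\frac{\partial v}{\partial z} - \frac{\partial v}{\partial \bar{z}} \;\Rightarrow\; i\frac{\partial v}{\partial z} + i\frac{\partial v}{\partial \bar{z}} = \frac{\partial u}{\partial z} - \frac{\partial u}{\partial \bar{z}}$$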

Adding the two equations on the far right of the above lines we get:

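$$\frac{\partial}{\partial z}(u + iv) + \frac{\partial}{\partial \bar{z}}(u + iv) = \frac{\partial}{\partial z}(u + iv) - \frac{\partial}{\partial \bar{z}}(u + iv)$$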

which simplifies to the following result:

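$$\frac{\partial f}{\partial \bar{z}} = -\frac{\partial f}{\partial \bar{z}} \;\Rightarrow\; \frac{\partial f}{\partial \bar{z}} = 0$$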


as desired. A consequence of this result is that a function that contains $\bar{z}$ explicitly cannot be analytic, a fact we have already mentioned.

Example: To illustrate how these concepts work, we consider the following simple example: given the field

$$u(x, y) = \cos(x)\, e^{-y}$$

we would like to know whether or not it is harmonic, and if it is, what the conjugate field v(x, y) is and what the complex analytic function f(z) = u(x, y) + iv(x, y) is. To answer the first question we calculate the second partial derivatives of the given field and check if their sum vanishes or not:

$$\frac{\partial^2 u}{\partial x^2} = -\cos(x)\, e^{-y}, \qquad \frac{\partial^2 u}{\partial y^2} = +\cos(x)\, e^{-y}$$

the sum of which vanishes, which means that u(x, y) is a harmonic function. We next try to determine the conjugate field v(x, y). From the Cauchy-Riemann relations we obtain:

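$$\frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y} = \cos(x)\, e^{-y} \;\Rightarrow\; v(x, y) = \int [\cos(x)\, e^{-y}]\, dx = e^{-y}[\sin(x) + c_1(y)]$$

$$\frac{\partial v}{\partial y} = \frac{\partial u}{\partial x} = -\sin(x)\, e^{-y} \;\Rightarrow\; v(x, y) = \int [-\sin(x)\, e^{-y}]\, dy = \sin(x)[e^{-y} + c_2(x)]$$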

where $c_1(y)$ and $c_2(x)$ are constants of integration, the first produced from the indefinite integral over dx, the second from the indefinite integral over dy. The two different expressions for v(x, y) obtained from the two integrations must be equal, so

$$e^{-y} c_1(y) = \sin(x)\, c_2(x)$$

which can only be satisfied if they are both equal to the same constant c, because the first is exclusively a function of y and the second is exclusively a function of x. This conclusion gives

$$v(x, y) = \sin(x)\, e^{-y} + c$$

and we will take the constant c = 0 for simplicity. Then the complex function f(z) must be given by

$$f(z) = u(x, y) + iv(x, y) = [\cos(x) + i\sin(x)]\, e^{-y} = e^{ix - y} = e^{iz}$$
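The result of this example can be confirmed by evaluating both sides at a few points. In the sketch below, `u_field`, `v_field` and `f_from_fields` are illustrative names for the given field, the conjugate field found above, and their combination.

```python
import cmath
import math

def u_field(x, y):
    """Given field u(x, y) = cos(x) e^{-y}."""
    return math.cos(x) * math.exp(-y)

def v_field(x, y, c=0.0):
    """Conjugate field v(x, y) = sin(x) e^{-y} + c found above."""
    return math.sin(x) * math.exp(-y) + c

def f_from_fields(x, y):
    """u + i v assembled from the two real fields (with c = 0)."""
    return complex(u_field(x, y), v_field(x, y))

if __name__ == "__main__":
    for x, y in [(0.2, 0.5), (-1.3, 2.0), (3.0, -0.4)]:
        print(f_from_fields(x, y), cmath.exp(1j * complex(x, y)))
```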


3.5 Branch points and branch cuts

One type of singularity that is very important when discussing mappings implied by functions of complex variables is that which gives rise to multivaluedness. We call such singularities branch points. Not all singularities are branch points. For example, z = 0 is a singularity of the function f(z) = 1/z because the function is not well defined at this point, but it is not a branch point because this function does not suffer from multivaluedness. To illustrate the idea of a branch point, we use a simple function in which this problem comes into play, the nth root:

$$w = z^{1/n} = \left(r e^{i(\theta + 2k\pi)}\right)^{1/n} = r^{1/n} e^{i(\theta + 2k\pi)/n}, \qquad k = 0, 1, \ldots, n-1$$

For any point other than zero, this function takes n different values depending on the choice of value for k. Only for z = 0 it does not matter what the value of k is. In this sense, z = 0 is a special point as far as the function w is concerned. We can look at this from a different perspective: suppose that we are at some point z = r ≠ 0 and we start increasing the argument from θ = 0 at constant r. All the points we encounter while doing so are different points on the complex plane for 0 ≤ θ < 2π, but when θ hits the value 2π we get back to the same point where we had started. If we continue increasing the value of θ we will hit this point again every time θ is an integral multiple of 2π. When z is raised to the 1/n power, these values of z will in general be different from each other, since their argument will be divided by n. Thus, for the same value of the complex variable z we get multiple values for the function w. This only happens if we go around the point z = 0 more than once. If we go around a path that encircles any other point but not zero, the argument of the variable will not keep increasing, and therefore the problem will not arise.
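The multivaluedness of the nth root is easy to see numerically. The sketch below lists the n roots of Eq. (3.16) and shows that following θ once around the origin changes the value of the square root of z = 4 from 2 to −2; the function names are illustrative.

```python
import cmath
import math

def nth_roots(z, n):
    """All n values of z^(1/n) from Eq. (3.16), k = 0, ..., n-1."""
    r, theta = cmath.polar(z)
    return [cmath.rect(r ** (1.0 / n), (theta + 2 * k * math.pi) / n) for k in range(n)]

def root_after_loops(r, n, loops):
    """Value of z^(1/n) reached from z = r (theta = 0) after theta has grown by 2 pi * loops."""
    theta = 2 * math.pi * loops
    return cmath.rect(r ** (1.0 / n), theta / n)

if __name__ == "__main__":
    print(nth_roots(16, 4))              # close to 2, 2i, -2, -2i
    print(root_after_loops(4.0, 2, 0))   # start: 2
    print(root_after_loops(4.0, 2, 1))   # after one loop around z = 0: close to -2
    print(root_after_loops(4.0, 2, 2))   # after two loops: back to 2
```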

The same problematic behavior was encountered for the natural logarithm: if the argument of the variable z is allowed to increase without limit, then both the argument and the magnitude of the function w = ln(z) change when we go through points on the complex plane whose argument differs by a multiple of 2π, even though these are the same point on the complex plane. It is useful to notice that in the case of the logarithm, because of the relation

$$\ln\left(\frac{1}{z}\right) = -\ln(z)$$

the behavior of the function in the neighborhood of the points z and 1/z is the same, with an overall minus sign. Thus, if z = 0 is a special point in


the sense described above, so is the point z → ∞. We conclude that the logarithm has two branch points, z = 0 and z → ∞. In fact, since all non-integer powers of complex numbers can be expressed with the help of the logarithm, as we discussed above, we conclude that the functions involving non-integer powers of z have the same two branch points. Note that by this we refer to the values of the variable where the quantity whose logarithm we want to evaluate, or the base of the non-integer power, vanishes or becomes infinite. For example, the function

$$f(z) = \ln\left(\frac{\sqrt{z^2 - 1}}{\sqrt{z^2 + 1}}\right)$$

has branch points at z = ±1, where the content of the parenthesis becomes zero, and at z = ±i, where the content of the parenthesis becomes infinite, but not at z → ∞ because for this value the content of the parenthesis is finite; the function

$$f(z) = \sqrt{z - a}, \qquad a: \text{finite}$$

has branch points z = a and z → ∞; the function

$$f(z) = \left(\frac{1}{z} - b\right)^{\pi}, \qquad b \neq 0$$

has branch points z = 1/b and z = 0; and so on.

In order to remove these problems, which become particularly acute when we discuss mappings implied by multivalued functions of z, we introduce the idea of the branch cut. This is a cut of the complex plane which joins two branch points: we imagine that with the help of a pair of “mathematical scissors” we literally cut the complex plane along a line so that it is not allowed to cross from one side of the line to the other. In the case of the logarithm one possible cut would be along the x axis from x = 0 to x → ∞. This choice restricts the argument θ to the range of values [0, 2π), where we imagine that we can approach the cut from one side but not from the other. That is, we can reach the values of z that lie on the x axis approaching from above the axis, but we cannot reach them from below the axis. This is a matter of convention, and the reverse would have been equally suitable, that is, being allowed to approach the values of z that lie on the x axis from below the axis but not from above the axis; this would be equivalent to restricting the argument θ to the range (0, 2π]. The effect of this is that it eliminates any ambiguity in the values that the function ln(z) is allowed to take!


There are usually many possible choices for a branch cut. In fact, for the case of the logarithm or a non-integer power, any line joining the points 0 and ∞ is a legitimate choice, as long as it does not cross itself, because then a portion of the complex plane would be completely cut off (it could not be reached from the rest of the plane), which is not allowed to happen. Each such choice of branch cut restricts the range of the argument of z to one set of values spanning an interval of 2π. For example, if we had chosen the positive y axis as the branch cut, the allowed values of θ would be in the interval [π/2, 5π/2); if we had chosen the negative x axis as the branch cut, the values of θ would be in the interval [π, 3π), etc. Notice that the introduction of a branch cut that restricts the values of the argument to 0 ≤ θ < 2π is equivalent to taking the principal value of the argument. There are also more complicated possible choices, for which the range of θ may be different for each value of the radius r, but it always spans an interval of 2π for a given value of r.

With the introduction of branch points and the associated branch cuts, we have therefore managed to eliminate the problem of multivaluedness for the natural logarithm and non-integer powers. This approach can be applied to any function that suffers from multivaluedness. So far we have met only these two types of functions that suffer from this problem, the logarithm and non-integer powers, which are in fact related. Thus, for the purposes of the present discussion, the only truly multivalued function that must be fixed by the introduction of branch cuts is the natural logarithm, which occasionally lurks in the definition of other functions we will examine.
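A branch cut along a ray from the origin amounts to choosing which interval of length 2π the argument is allowed to occupy. The sketch below implements this for the logarithm; `log_on_branch` and its parameter `theta_min` are hypothetical names. Note that Python's own `cmath.log` places the cut along the negative real axis, with the argument in (−π, π], which is a different convention from the interval [0, 2π) used in the text.

```python
import cmath
import math

def log_on_branch(z, theta_min=0.0):
    """ln(z) with the argument restricted to [theta_min, theta_min + 2*pi)."""
    if z == 0:
        raise ValueError("ln(z) is not defined at the branch point z = 0")
    r, theta = cmath.polar(z)                    # theta is returned in (-pi, pi]
    # shift theta by a multiple of 2*pi into the chosen interval
    theta = theta_min + (theta - theta_min) % (2 * math.pi)
    return complex(math.log(r), theta)

if __name__ == "__main__":
    print(log_on_branch(-1j))                        # argument 3*pi/2 on [0, 2*pi)
    print(log_on_branch(-1j, theta_min=-math.pi))    # argument -pi/2 on [-pi, pi)
    print(log_on_branch(1, theta_min=math.pi / 2))   # argument 2*pi on [pi/2, 5*pi/2)
```

Whatever interval is chosen, exponentiating the result returns the original z; only the imaginary part of the logarithm depends on the cut.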

3.6 Mappings

Next, we will study mappings of the z-plane to the w-plane through complex functions w = f(z). These mappings take us from a value of z to a value of w through f(z). A very simple, almost trivial case is the mapping produced by the function

w = az + b, a, b : constants

which, expressing w in terms of its real and imaginary parts w = u + iv,can be written as:

$$w = u + iv = (a_r + i a_i)(x + iy) + (b_r + i b_i)$$

$$\Rightarrow\; u(x, y) = (a_r x - a_i y) + b_r, \qquad v(x, y) = (a_r y + a_i x) + b_i$$


where we have also separated the constants a and b into real and imaginary parts. This represents a shift of the origin by −b and a rotation and scaling of the original set of values of x and y. Specifically, if we define the angle φ through

$$\phi = \tan^{-1}\left(\frac{a_i}{a_r}\right) \;\Rightarrow\; a_r = |a|\cos(\phi), \quad a_i = |a|\sin(\phi)$$

the original values of x, y are rotated by the angle φ and scaled by the factor |a|. Thus, if the values of x, y describe a curve on the z plane, this curve will be shifted, rotated and scaled through the mapping implied by the function f(z), but its shape will not change.
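That the linear map preserves shapes can be illustrated by checking that all distances between points are scaled by the same factor |a|. The following sketch uses assumed sample values for a, b and a small triangle of points; the function names are illustrative.

```python
import cmath

def linear_map(z, a, b):
    """w = a z + b."""
    return a * z + b

def rotate_scale_shift(z, a, b):
    """Same map written as: rotate by phi = Arg(a), scale by |a|, then add b."""
    scale, phi = cmath.polar(a)
    return scale * cmath.exp(1j * phi) * z + b

if __name__ == "__main__":
    a, b = 1 + 1j, 2 - 1j
    triangle = [0j, 1 + 0j, 1j]
    images = [linear_map(z, a, b) for z in triangle]
    for i in range(3):
        j = (i + 1) % 3
        # ratio of image side length to original side length; equals |a| for every side
        print(abs(images[i] - images[j]) / abs(triangle[i] - triangle[j]), abs(a))
```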

It is often useful, from a practical point of view, to view the mapping from the opposite perspective. The central idea is to consider a set of complicated curves on the z-plane which through f(z) gets mapped to a set of very simple curves on the w-plane. Thus, we will pose the question as follows: Given the function f(z), what kind of curves in the z plane produce some simple curves in the w plane? This is because the usefulness of the mapping concept lies in simplifying the curves and shapes of interest. An even more difficult question is: Given a set of complicated curves on the z plane, what function f(z) can map them onto a set of simple curves on the w plane? We will not attempt to address this question directly, but the familiarity we will develop with the mappings of several common functions goes a long way toward suggesting an answer to it in many situations.

The simplest set of curves on the w-plane correspond to u = constant (these are straight vertical lines, with v assuming any value between −∞ and +∞), and to v = constant (these are straight horizontal lines, with u assuming any value between −∞ and +∞). Other simple sets of curves on the w plane are those that correspond to a fixed argument $\theta_w$ and variable magnitude $r_w$ of w, which are rays emanating from the origin at fixed angle $\theta_w$ with the u axis, or those that correspond to a fixed magnitude $r_w$ and variable argument $\theta_w$ of w, which are circular arcs centered at the origin at fixed radius $r_w$ (the arcs are full circles for 0 ≤ $\theta_w$ < 2π).

We illustrate these ideas with several examples.

Example 1: We consider the mapping of the z-plane to the w-plane through the exponential function:

$$w = e^z = e^x\cos(y) + i e^x\sin(y) \;\Rightarrow\; u(x, y) = e^x\cos(y), \quad v(x, y) = e^x\sin(y)$$

Figure 3.4: Mapping of z-plane to w-plane through w = f(z) = exp(z) using cartesian coordinates in the w plane. [Two panels, each showing the z-plane (axes x, y) and the w-plane (axes u, v) for w = exp[z]; y-axis ticks at −π/2, π/2, π, 3π/2 in the first panel and at π/2, π, 3π/2, 2π in the second.]

In the spirit discussed above, we consider what curves in the z plane get mapped to vertical lines u(x, y) = $u_0$ or horizontal lines v(x, y) = $v_0$ on the w plane. For vertical lines on the w plane the corresponding curves on the z plane are given by:

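$$u(x, y) = e^x\cos(y) = u_0 \;\Rightarrow\; x = \ln(u_0) - \ln(\cos(y)) \quad \text{for} \quad \left[u_0 > 0,\; \cos(y) > 0 \Rightarrow -\frac{\pi}{2} < y < \frac{\pi}{2}\right]$$

$$\Rightarrow\; x = \ln(-u_0) - \ln(-\cos(y)) \quad \text{for} \quad \left[u_0 < 0,\; \cos(y) < 0 \Rightarrow \frac{\pi}{2} < y < \frac{3\pi}{2}\right]$$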

Similarly, for horizontal lines on the w plane the corresponding curves on the z plane are given by:

$$v(x, y) = e^x\sin(y) = v_0 \;\Rightarrow\; x = \ln(v_0) - \ln(\sin(y)) \quad \text{for} \quad [v_0 > 0,\; \sin(y) > 0 \Rightarrow 0 < y < \pi]$$

$$\Rightarrow\; x = \ln(-v_0) - \ln(-\sin(y)) \quad \text{for} \quad [v_0 < 0,\; \sin(y) < 0 \Rightarrow \pi < y < 2\pi]$$

These mappings are illustrated in Fig. 3.4.

We can examine the mapping produced by the same function using polar coordinates in the w plane:

$$w = e^z \;\Rightarrow\; \ln(w) = z \;\Rightarrow\; \ln(r_w) + i\theta_w = x + iy \;\Rightarrow\; \ln(r_w) = x, \quad \theta_w = y$$

In this case, curves described by a constant value of x = $x_0$ with y in a range of values spanning 2π, map to circles of constant radius $r_w = e^{x_0}$:

$$[x = x_0,\; 0 \le y < 2\pi] \to [r_w = e^{x_0},\; 0 \le \theta_w < 2\pi]$$

Similarly, curves described by a constant value of y = $y_0$ and x taking all real values −∞ < x < ∞, map to straight lines (rays) emanating from the origin at constant angle $\theta_w$ to the u axis:

$$[-\infty < x < \infty,\; y = y_0] \to [0 \le r_w < \infty,\; \theta_w = y_0]$$

as illustrated in Fig. 3.5.

This analysis of the mappings produced by the function w = exp(z) indicates that every strip of width 2π in y and −∞ < x < ∞ on the z-plane maps onto the entire w-plane. This is true for the mapping using cartesian or polar coordinates in the w plane.
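The polar description of the exponential mapping can be verified directly: points with fixed x land on a circle of radius exp(x), and points with fixed y land on a ray at angle y. A small sketch, with illustrative function names and sample values:

```python
import cmath
import math

def image_of_vertical_segment(x0, samples=8):
    """Images under w = exp(z) of points z = x0 + i y with 0 <= y < 2*pi."""
    ys = [2 * math.pi * j / samples for j in range(samples)]
    return [cmath.exp(complex(x0, y)) for y in ys]

def image_of_horizontal_line(y0, xs=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Images under w = exp(z) of points z = x + i y0 for a few values of x."""
    return [cmath.exp(complex(x, y0)) for x in xs]

if __name__ == "__main__":
    print([abs(w) for w in image_of_vertical_segment(0.5)])          # all equal exp(0.5)
    print([cmath.phase(w) for w in image_of_horizontal_line(1.0)])   # all equal 1.0
```

Shifting y by 2π reproduces the same image point, which is why a single strip of width 2π already covers the w-plane.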

Figure 3.5: Mapping of z-plane to w-plane through w = f(z) = exp(z) using polar coordinates in the w plane. [Two panels, each showing the z-plane (axes x, y; y-axis ticks at −π/2, π/2, π, 3π/2) and the w-plane (axes u, v) for w = exp[z].]

Example 2: We next consider the mapping of the inverse function of the exponential, namely the natural logarithm, which will prove to be much trickier. We employ polar coordinates for z, which, as we have seen earlier, with the use of Euler’s formula make things simpler.

$$w = \ln(z) = \ln(r e^{i\theta}) = \ln(r) + i\theta$$

$$\Rightarrow\; u(x, y) = \ln(r) = \ln\left(\sqrt{x^2 + y^2}\right), \qquad v(x, y) = \theta = \tan^{-1}\left(\frac{y}{x}\right)$$

Since this is the inverse mapping of the exponential, we can make the correspondence (x ↔ u, y ↔ v; $r_w$ ↔ r, $\theta_w$ ↔ θ), and from this obtain the mapping by reversing what we found for the mapping $w = e^z$. This simple argument gives the following mappings:

$$[r = r_0,\; 0 \le \theta < 2\pi] \mapsto [u(x, y) = u_0 = \ln(r_0),\; 0 \le v(x, y) < 2\pi]$$

$$[\theta = \theta_0,\; 0 \le r < \infty] \mapsto [-\infty < u(x, y) < \infty,\; v(x, y) = v_0 = \theta_0]$$

that is, circles of radius $r_0$ on the z plane map to straight vertical line segments u(x, y) = $u_0$ = ln($r_0$) of height 2π on the w plane, while rays at an angle $\theta_0$ to the x axis on the z plane map to straight horizontal lines v(x, y) = $v_0$ = $\theta_0$ on the w plane. This is all well and nice when θ lies within an interval of 2π, as we assumed so far, but leads to difficulties if the range of values of θ is not specified. This feature is of course related to the problem of multivaluedness of the logarithm, which can be eliminated by identifying the branch points and introducing a branch cut. We have already established that the function ln(z) has two branch points, zero and ∞, and any line between them which does not intersect itself is a legitimate branch cut.

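Figure 3.6: Branch cut for the function f(z) = ln(z + a) − ln(z − a). [z-plane (axes x, y) with the points −a and a marked on the x axis and a point z at distances $r_1$, $r_2$ and angles $\theta_1$, $\theta_2$.]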


As another illustrative example of the need to introduce branch cuts, consider the function:

± ln(z − z0)

This function is often used to simplify the description of radial or circumferential flow from a source (with the + sign) or a sink (with the − sign) situated at z = $z_0$. The combination of the two choices of signs, centered at two different points z = a and z = −a, is called the source-sink pair:

$$f(z) = f_1(z) + f_2(z) = \ln(z + a) - \ln(z - a) = \ln\left(\frac{r_1}{r_2}\right) + i(\theta_1 - \theta_2)$$

where $r_1$ = |z + a|, $\theta_1$ = Arg[z + a] and $r_2$ = |z − a|, $\theta_2$ = Arg[z − a], with the source at z = −a (denoted by the function $f_1(z)$) and the sink at z = +a (denoted by the function $f_2(z)$). Evidently, for this function we need to introduce branch cuts, because there are at least two branch points, z = ±a, where the function is non-analytic. To see this, we notice that for a closed path that encircles either both or neither of z = +a and z = −a the quantity ($\theta_1 - \theta_2$) takes the same value at the end of the path as at the beginning, that is, it does not change by 2π, even though each of the angles $\theta_1$ and $\theta_2$ increases by 2π when both points are encircled. Thus, such a path would not lead to multivaluedness. For a closed path that encircles only one of z = a or z = −a the quantity ($\theta_1 - \theta_2$) changes by ±2π at the end of the path relative to its value at the beginning. These facts are established by inspecting Fig. 3.6. One question we need to answer is whether the function has any other branch points. One possible candidate is z → ∞, but the function is actually well defined in this limit:

$$f(z) = \ln(z + a) - \ln(z - a) = \ln\left(\frac{z + a}{z - a}\right) \to \ln(1) = 0 \quad \text{for } z \to \infty$$

The other possible candidate, z = 0, also does not present any problem, because:

$$f(0) = \ln(a) - \ln(-a) = \ln\left(\frac{a}{-a}\right) = \ln(-1) = \ln(e^{i\pi}) = i\pi$$

Thus the only branch points are z = ±a. Another way to see this is to evaluate the derivative of the function, which is:

$$f'(z) = \frac{1}{z + a} - \frac{1}{z - a}$$

which is well defined everywhere except for z = ±a. Consequently, we need a branch cut joining the points z = ±a. Any such path will do, and it does not have to pass through z → ∞. A possible choice for a branch cut is shown in Fig. 3.6.

Figure 3.7: Mapping of z-plane to w-plane through w = f(z) = $z^2$. [Panels: z-plane (axes x, y) and w-plane (axes u, v).]
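For the source-sink pair of Example 2, the behavior of θ1 − θ2 along closed paths can be checked by accumulating small changes of each angle around a circle. The sketch below assumes a = 1, counterclockwise circular paths and a fixed number of steps; the function name is hypothetical. With this orientation the net change is +2π when only z = −a is encircled and −2π when only z = +a is encircled.

```python
import cmath
import math

def accumulated_angle_difference(center, radius, a=1.0, steps=2000):
    """Total change of (theta1 - theta2) along a counterclockwise circle.

    theta1 = Arg(z + a), theta2 = Arg(z - a); small increments are summed so
    that the angles are followed continuously instead of being wrapped.
    """
    total = 0.0
    prev = complex(center) + radius
    for j in range(1, steps + 1):
        z = center + cmath.rect(radius, 2 * math.pi * j / steps)
        d_theta1 = cmath.phase((z + a) / (prev + a))   # small change in theta1
        d_theta2 = cmath.phase((z - a) / (prev - a))   # small change in theta2
        total += d_theta1 - d_theta2
        prev = z
    return total

if __name__ == "__main__":
    print(accumulated_angle_difference(0, 3.0))     # encircles both points: about 0
    print(accumulated_angle_difference(-1, 0.5))    # encircles only z = -a: about +2*pi
    print(accumulated_angle_difference(1, 0.5))     # encircles only z = +a: about -2*pi
    print(accumulated_angle_difference(5j, 0.5))    # encircles neither: about 0
```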

Example 3: We consider next a simple power of z, the square. Using cartesian coordinates, we have:

$$w = z^2 \;\Rightarrow\; u(x, y) + iv(x, y) = (x^2 - y^2) + i(2xy)$$

The last expression gives, for the curves on the z plane that map to vertical lines (u(x, y) = $u_0$) or to horizontal lines (v(x, y) = $v_0$) on the w plane:

$$x^2 - y^2 = u_0, \qquad y = \frac{v_0}{2x}$$

which are hyperbolas, as illustrated in Fig. 3.7. From this mapping we see that half of the z plane maps onto the entire w plane, because if we restrict the values of z to x ≥ 0 or to y ≥ 0, we still can obtain all the possible values of u and v. Similar arguments apply to sub-regions of the z plane, which through w = f(z) = $z^2$ are “inflated” to cover much larger regions on the w plane. For instance, the region enclosed by the square

z : 0 ≤ x ≤ l, 0 ≤ y ≤ l

identified by the corners at z : (x, y)

a : (l, 0), b : (l, l), c : (0, l), d : (0, 0)


2

2

z−plane

x

y

u

v

0 a

bc

d

l

l

0

l

2l

A

B

C D

w−plane

Figure 3.8: Mapping of a square region on the z-plane to the w-plane throughw = f(z) = z2.

on the z plane, which lies entirely on the first quadrant (upper right quarterplane), gets mapped to the region identified by the points w : (u, v)

A : (l2, 0), B : (0, 2l2), C : (−l2, 0), D : (0, 0)

with the correspondence a 7→ A, b 7→ B, c 7→ C, d 7→ D. This region extendsover the first two quadrants of the w plane (upper half plane). If we think ofthe square on the z plane as covering the entire first quadrant (l → ∞), thenits image on the w plane will cover the entire upper half plane.

Another way to see this is to use polar coordinates:

$$w = z^2 \Rightarrow r_w e^{i\theta_w} = r^2 e^{i2\theta} \Rightarrow r_w = r^2, \quad \theta_w = 2\theta$$

so $0 \le \theta < \pi \mapsto 0 \le \theta_w < 2\pi$, that is, the range of values from zero to π in the argument of z is mapped to a range of values from zero to 2π in the argument of w.
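As a quick numerical check of the corner images and of the angle doubling, we can evaluate the mapping directly. The following Python sketch is not part of the original text; the value chosen for the side l and the helper name `square_map` are illustrative assumptions.

```python
import cmath

def square_map(z: complex) -> complex:
    """Return w = z^2."""
    return z * z

side = 1.5  # illustrative value for the side l of the square
corners = {
    "a": complex(side, 0.0),
    "b": complex(side, side),
    "c": complex(0.0, side),
    "d": complex(0.0, 0.0),
}
images = {name: square_map(z) for name, z in corners.items()}
for name in "abcd":
    print(name, corners[name], "->", images[name])

# Polar form: the modulus is squared and the argument is doubled.
# Here theta = 0.7 < pi/2, so 2*theta stays inside the principal
# range (-pi, pi] returned by cmath.polar.
r, theta = 2.0, 0.7
r_w, theta_w = cmath.polar(square_map(cmath.rect(r, theta)))
print(r_w, theta_w)
```

The printed images should reproduce the points A, B, C, D listed above for the chosen value of l.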

Example 4: As another example we examine the inverse of the previous function, that is, the square root. Using cartesian coordinates, we have:

$$w = z^{1/2} \Rightarrow w^2 = z \Rightarrow (u^2 - v^2) + i2uv = x + iy \Rightarrow x = u^2 - v^2, \quad y = 2uv$$

Figure 3.9: Mapping of z-plane to w-plane through $w = f(z) = \sqrt{z}$.

For the curves on the z plane that map to straight vertical lines ($u(x, y) = u_0$) or straight horizontal lines ($v(x, y) = v_0$) on the w plane, we will have:

$$u = u_0 \to v = \frac{y}{2u_0} \to u_0^2 - \frac{y^2}{4u_0^2} = x$$

$$v = v_0 \to u = \frac{y}{2v_0} \to \frac{y^2}{4v_0^2} - v_0^2 = x$$

which are parabolas, as illustrated in Fig. 3.9. In this case, the entire z plane maps onto half of the w plane, since the values of $u_0$ and $v_0$ appear squared in the final equations, so that two values of opposite sign correspond to the same curve. This is what we would expect, given that the function $z^{1/2}$ is the inverse of the function $z^2$. This is a troubling fact, however, because we are not sure at which of the two halves of the w plane we will end up. This actually has to do with the fact that the function $z^{1/2}$ is multivalued (it involves a non-integer power, and hence the logarithm lurks somewhere in its definition). To see this more clearly, we use polar coordinates:

$$w = z^{1/2} \Rightarrow r_w e^{i\theta_w} = r^{1/2} e^{i(\theta + 2k\pi)/2} \Rightarrow r_w = r^{1/2}, \quad \theta_w = \frac{\theta}{2} + k\pi, \quad k = 0, 1$$

so there are two possible choices for the values of k, giving rise to the two possible mappings. In terms of real numbers, this corresponds to the fact that the square root of $x^2$ is ±x. In order to remove the ambiguity, we need again to introduce branch cuts. Evidently, the branch points in this case are again z = 0 and z → ∞, because of the involvement of the logarithm in the definition of $z^a$ with non-integer a. Any proper branch cut joining these two points removes the ambiguity and renders the function single-valued. For example, the standard choice of the positive x axis as the branch cut with 0 ≤ θ < 2π (equivalent to the choice k = 0) leads to the entire z plane mapping to the upper half ($0 \le \theta_w < \pi$ or v ≥ 0) of the w plane; the same choice for a branch cut with 2π ≤ θ < 4π (equivalent to the choice k = 1) leads to the entire z plane mapping to the lower half ($\pi \le \theta_w < 2\pi$ or v ≤ 0) of the w plane.
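The two branches can be made concrete with a short numerical sketch. The Python code below is an illustration added here, not the author's; the function name `sqrt_branch` is hypothetical, and the branch cut is assumed to lie along the positive x axis with 0 ≤ θ < 2π, as in the choice described above.

```python
import cmath
import math

def sqrt_branch(z: complex, k: int) -> complex:
    """Square root of z on branch k (0 or 1), with 0 <= theta < 2*pi.

    The branch cut is taken along the positive x axis.
    """
    r, theta = cmath.polar(z)      # theta is returned in (-pi, pi]
    if theta < 0:
        theta += 2 * math.pi       # shift the argument into [0, 2*pi)
    return cmath.rect(math.sqrt(r), theta / 2 + k * math.pi)

for z in (1 + 1j, -1 + 2j, -1 - 1j, 3 - 0.5j):
    w0 = sqrt_branch(z, 0)         # lands in the upper half plane, v >= 0
    w1 = sqrt_branch(z, 1)         # lands in the lower half plane, v <= 0
    print(z, w0, w1)
```

Both values square back to z, and they differ only by an overall sign, which is the complex counterpart of the ±x ambiguity mentioned above.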

Example 5: For another example we consider the function

$$w = f(z) = \frac{1}{z}$$

and ask what is the image of a circle on the z plane through the mapping by f(z) on the w plane. The equation for a circle on the z plane with its center at $(x_0, y_0)$ and radius ρ is:

$$(x - x_0)^2 + (y - y_0)^2 = \rho^2 \Rightarrow a(x^2 + y^2) + bx + cy + d = 0 \qquad (3.33)$$

where the parameters a, b, c, d are related to $x_0, y_0, \rho$ by (see footnote 1):

$$a = \frac{1}{\rho^2}, \quad b = -\frac{2x_0}{\rho^2}, \quad c = -\frac{2y_0}{\rho^2}, \quad d = \frac{x_0^2 + y_0^2}{\rho^2} - 1$$

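Using the following relations, where $\bar{z}$ denotes the complex conjugate of z:

$$x = \frac{z + \bar{z}}{2}, \quad y = \frac{z - \bar{z}}{2i}, \quad x^2 + y^2 = z\bar{z}$$

and the fact that $w = u + iv = 1/z \Rightarrow \bar{w} = u - iv = 1/\bar{z}$, therefore

$$u = \frac{w + \bar{w}}{2} = \frac{z + \bar{z}}{2z\bar{z}}, \quad v = \frac{w - \bar{w}}{2i} = \frac{\bar{z} - z}{2iz\bar{z}}, \quad u^2 + v^2 = w\bar{w} = \frac{1}{z\bar{z}}$$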

we find that the above equation for a circle on the z plane is transformed to

$$a + bu - cv + d(u^2 + v^2) = 0 \qquad (3.34)$$

which has exactly the same functional form in the variables u and v as the equation for the circle on the z plane, Eq. (3.33), in terms of the variables x and y. From this we conclude that the original circle on the z plane is mapped to a circle on the w plane.
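The transformed equation can be checked numerically by mapping sample points of a circle through w = 1/z and evaluating the left-hand side of Eq. (3.34). The Python sketch below is an added illustration; the circle parameters and the helper names are assumptions chosen for the example, with a circle that does not pass through the origin.

```python
import cmath
import math

def circle_params(x0, y0, rho):
    """Coefficients a, b, c, d of Eq. (3.33) for center (x0, y0), radius rho."""
    a = 1.0 / rho**2
    b = -2.0 * x0 / rho**2
    c = -2.0 * y0 / rho**2
    d = (x0**2 + y0**2) / rho**2 - 1.0
    return a, b, c, d

def circle_points(x0, y0, rho, n=12):
    """n equally spaced points on the circle in the z plane."""
    center = complex(x0, y0)
    return [center + rho * cmath.exp(2j * math.pi * k / n) for k in range(n)]

def image_residual(z, a, b, c, d):
    """Left-hand side of Eq. (3.34) evaluated at w = 1/z."""
    w = 1.0 / z
    u, v = w.real, w.imag
    return a + b * u - c * v + d * (u * u + v * v)

x0, y0, rho = 2.0, 1.0, 0.5          # illustrative circle
a, b, c, d = circle_params(x0, y0, rho)
residuals = [image_residual(z, a, b, c, d) for z in circle_points(x0, y0, rho)]
print(max(abs(res) for res in residuals))
```

The residuals should vanish to rounding accuracy, which is consistent with the image points lying on the curve described by Eq. (3.34).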

Footnote 1: The fact that we pass from a set of three parameters $x_0, y_0, \rho$ to a set of four parameters a, b, c, d in describing the circle is not a problem, because the latter parameters are not independent; the following relation between them can easily be established:

$$b^2 + c^2 = 4a(d + 1)$$

A related mapping is that given by the function

$$w = f(z) = \frac{z - a}{z - b}$$

which also maps a circle to a circle. To see this, we perform the following changes of variables:

$$z' = z - b: \quad \text{shift origin by } b$$

$$z'' = \frac{1}{z'}: \quad \text{maps circles to circles}$$

$$w = \lambda z'' + 1: \quad \text{linear scaling by } \lambda, \text{ shift origin by } -1$$

which proves that the mapping by w = (z − a)/(z − b) is also a mapping of a circle to a circle, since the only relation that involves a nontrivial mapping is the second change of variables; the other two merely shift or scale the curve without changing its shape. All that remains to do is find the value of the scaling parameter λ. By requiring that the final expression is equal to the original one, we find:

$$w = \lambda z'' + 1 = \lambda\frac{1}{z'} + 1 = \lambda\frac{1}{z - b} + 1 = \frac{z - a}{z - b} \Rightarrow \lambda = b - a$$

which completes the argument. A generalization of this is the mapping produced by the so-called Möbius function (see Problem 6).

Figure 3.10: Mapping of the shaded region on the z-plane to the w-plane through $w = f(z) = 1/z^2$.


Having studied both the mapping of the function 1/z and the function $z^2$, we can consider the mapping produced by the composition of the two functions, that is, the function:

$$w = f(z) = \frac{1}{z^2}$$

As an exercise, consider the region on the z plane shown in Fig. 3.10, consisting of the entire upper half plane except for a semicircle of radius l centered at the origin. The mapping of this by the function 1/z produces a semicircle centered at the origin, with radius 1/l, with the point at infinity mapping to the origin. The function $z^2$ then produces a full circle centered at the origin, with radius $1/l^2$. The net result from applying the two mappings is shown in Fig. 3.10 (see also Problem 7).

Example 6: We next study the mapping implied by a function which involves both powers and inverse powers of z:

$$f(z) = z^2 + \frac{1}{z^2}$$

Expressing the complex variable z in terms of its real and imaginary parts, z = x + iy, leads to:

$$f(z) = (x^2 - y^2)\left(1 + \frac{1}{(x^2 + y^2)^2}\right) + i2xy\left(1 - \frac{1}{(x^2 + y^2)^2}\right)$$

and setting as usual the function f(z) equal to w = u(x, y) + iv(x, y) we find for the real and imaginary parts of w:

$$u(x, y) = (x^2 - y^2)\left(1 + \frac{1}{(x^2 + y^2)^2}\right), \quad v(x, y) = 2xy\left(1 - \frac{1}{(x^2 + y^2)^2}\right)$$

As an illustration of the mapping produced by f(z), we consider a curve on the z plane described by the parametric form:

$$x(t) = re^{t^2/a^2}\cos(t), \quad y(t) = re^{t^2/a^2}\sin(t), \quad -\pi < t < \pi \qquad (3.35)$$

where r, a are real positive constants; this curve is illustrated in Fig. 3.11. This curve is mapped through f(z) to a curve on the w plane given by:

$$u(t) = r^2e^{2t^2/a^2}\cos(2t)\left(1 + \frac{1}{r^4e^{4t^2/a^2}}\right), \quad v(t) = r^2e^{2t^2/a^2}\sin(2t)\left(1 - \frac{1}{r^4e^{4t^2/a^2}}\right) \qquad (3.36)$$

which is also illustrated in Fig. 3.11.

In[2]:= ParametricPlot[{2 Exp[t^2/5] Cos[t], Exp[t^2/5] Sin[t]}, {t, -Pi, Pi}]

In[3]:= ParametricPlot[{((2 Exp[t^2/5] Cos[t])^2 - (2 Exp[t^2/5] Sin[t])^2) (1 + 1/((2 Exp[t^2/5])^4)), 2 (2 Exp[t^2/5] Cos[t]) (2 Exp[t^2/5] Sin[t]) (1 - 1/((2 Exp[t^2/5])^4))}, {t, -Pi, Pi}]

Figure 3.11: Illustration of the curve on the z plane described by Eq. (3.35) and its map on the w plane through $w = z^2 + 1/z^2$, described by Eq. (3.36); in this example we have chosen r = 2 and $a = \sqrt{5}$.
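To confirm that Eq. (3.36) agrees with a direct evaluation of f(z) along the curve of Eq. (3.35), one can compare the two numerically. The Python sketch below is an added illustration under the parameter choice r = 2, $a = \sqrt{5}$ quoted in the caption; the function names are hypothetical.

```python
import cmath
import math

R_PARAM = 2.0               # the constant r of Eq. (3.35)
A_PARAM = math.sqrt(5.0)    # the constant a of Eq. (3.35)

def curve_z(t, r=R_PARAM, a=A_PARAM):
    """Point z(t) = x(t) + i y(t) on the curve of Eq. (3.35)."""
    return r * math.exp(t * t / a**2) * cmath.exp(1j * t)

def f(z):
    """The mapping f(z) = z^2 + 1/z^2."""
    return z**2 + 1.0 / z**2

def curve_w(t, r=R_PARAM, a=A_PARAM):
    """Image point u(t) + i v(t) as given by Eq. (3.36)."""
    s = r**2 * math.exp(2.0 * t * t / a**2)   # r^2 e^{2 t^2 / a^2}
    u = s * math.cos(2.0 * t) * (1.0 + 1.0 / s**2)
    v = s * math.sin(2.0 * t) * (1.0 - 1.0 / s**2)
    return complex(u, v)

samples = [-3.0, -1.2, -0.4, 0.0, 0.7, 1.9, 3.1]
max_diff = max(abs(f(curve_z(t)) - curve_w(t)) for t in samples)
print(max_diff)
```

The difference should be at the level of rounding error for every sampled value of t.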

Example 7: As our next example we will consider the mappings implied by the hyperbolic and trigonometric functions, which are closely related. We begin by examining the mapping implied by the hyperbolic cosine:

$$w = \cosh(z) = \frac{e^z + e^{-z}}{2} = \frac{e^x(\cos(y) + i\sin(y))}{2} + \frac{e^{-x}(\cos(y) - i\sin(y))}{2}$$

$$\Rightarrow u(x, y) = \cosh(x)\cos(y), \quad v(x, y) = \sinh(x)\sin(y)$$

We investigate first the types of curves on the z plane that produce vertical ($u = u_0$: constant) or horizontal ($v = v_0$: constant) lines on the w plane. For vertical lines we have:

$$u_0 = \frac{e^x + e^{-x}}{2}\cos(y) \Rightarrow e^x - \frac{2u_0}{\cos(y)} + e^{-x} = 0$$

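which can be turned into a second-order polynomial in exp(x) by multiplying through by exp(x), from which we obtain the solutions:

$$e^x = \frac{u_0}{\cos(y)} \pm \left(\frac{u_0^2}{\cos^2(y)} - 1\right)^{1/2} \Rightarrow x = \ln\left[\frac{u_0}{\cos(y)} \pm \left(\frac{u_0^2}{\cos^2(y)} - 1\right)^{1/2}\right]$$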

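The last expression is meaningful only for values of y which make the quantity under the square root positive; for these values of y, and provided that $u_0/\cos(y)$ is positive, the quantity under the logarithm is positive for either choice of sign in front of the square root. Similarly, for horizontal lines we have:

$$v_0 = \frac{e^x - e^{-x}}{2}\sin(y) \Rightarrow e^x - \frac{2v_0}{\sin(y)} - e^{-x} = 0 \Rightarrow$$

$$e^x = \frac{v_0}{\sin(y)} \pm \left(\frac{v_0^2}{\sin^2(y)} + 1\right)^{1/2} \Rightarrow x = \ln\left[\frac{v_0}{\sin(y)} \pm \left(\frac{v_0^2}{\sin^2(y)} + 1\right)^{1/2}\right]$$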

where again we have used the second-order polynomial solutions. In this case the quantity under the square root is always positive and the argument of the natural logarithm is positive only for the solution with the plus sign in front of the square root. From these equations, the sets of curves on the z plane that correspond to vertical or horizontal lines on the w plane can be drawn. These sets of curves are quite similar to the curves that get mapped to vertical or horizontal lines through the function exp(z), which is no surprise since the hyperbolic cosine is the sum of exp(z) and exp(−z). From this statement, we conclude that a strip of values on the z plane, extending over the entire x axis and covering a width of 2π on the y axis, is mapped onto the entire w plane, just as we had found for the case of the mapping produced by exp(z). In the present case, the values z and −z get mapped to the same value of w; in other words, the strip of width 2π in y gets mapped twice to the w plane.

From the above analysis, we can easily obtain the mappings due to the other hyperbolic and trigonometric functions. For instance, from the relation:

$$z' = z + \frac{i\pi}{2} \Rightarrow \cosh(z') = \frac{e^{z + i\pi/2} + e^{-z - i\pi/2}}{2} = \frac{ie^z - ie^{-z}}{2} = i\sinh(z)$$

we conclude that the function sinh(z) can be viewed as the combination of three functions: the first consists of a shift of the origin by −iπ/2, the second is the hyperbolic cosine and the third is a multiplication by −i:

$$z \mapsto z' = z + \frac{i\pi}{2}, \quad z' \mapsto z'' = \cosh(z'), \quad z'' \mapsto w = -iz'' = \sinh(z)$$

But we know what each of these functions does in terms of a mapping, so it is straightforward to derive the mapping of the hyperbolic sine. In particular, the shift by iπ/2 only displaces the strip along the y axis, and multiplication by −i is a rotation of the w plane by −π/2, which maps the entire w plane onto itself. We conclude that under w = sinh(z) a strip of the z plane of width 2π on the y axis and extending over the entire x axis gets mapped twice to the entire w plane. Similarly, because of the relations

$$\cos(iz) = \cosh(z), \quad \sin(iz) = i\sinh(z)$$

we can derive the mappings due to the trigonometric sine and cosine functions in terms of the mappings due to the hyperbolic sine and cosine functions.
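The decomposition of sinh(z) into a shift, a hyperbolic cosine and a multiplication by −i can be verified numerically, together with the two relations between the trigonometric and hyperbolic functions. The Python sketch below is an added illustration; the function name `sinh_via_cosh` is hypothetical.

```python
import cmath
import math

def sinh_via_cosh(z: complex) -> complex:
    """Evaluate sinh(z) as the three-step composition described in the text."""
    z1 = z + 1j * math.pi / 2     # shift of the origin
    z2 = cmath.cosh(z1)           # hyperbolic cosine
    return -1j * z2               # multiplication by -i

test_points = [0.3 + 0.4j, -1.2 + 2.5j, 2.0 - 0.7j, 0.0 + 1.0j]
for z in test_points:
    print(z, sinh_via_cosh(z), cmath.sinh(z))
```

For each sample point the composed value should coincide with the library value of sinh(z) up to rounding.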

Example 8: Finally, we consider the mapping due to an inverse trigonometric function,

$$w = \sin^{-1}(z)$$

Taking the sine of both sides of this equation we obtain:

$$z = \sin(w) = \sin(u + iv) = \frac{(\cos(u) + i\sin(u))e^{-v} - (\cos(u) - i\sin(u))e^{v}}{2i} = \sin(u)\cosh(v) + i\cos(u)\sinh(v)$$

$$\Rightarrow x = \sin(u)\cosh(v), \quad y = \cos(u)\sinh(v)$$

Figure 3.12: Illustration of the basic geometric features of the ellipse and the hyperbola.

From the last two relations, using the properties of the trigonometric and hyperbolic sine and cosine functions, we find

$$\frac{x^2}{\cosh^2(v)} + \frac{y^2}{\sinh^2(v)} = \sin^2(u) + \cos^2(u) = 1$$

which is the equation for an ellipse with extent 2a = 2 cosh(v) on the x axis and 2b = 2| sinh(v)| on the y axis, as illustrated in Fig. 3.12. Setting $v = v_0$: constant, we see that confocal ellipses on the z plane with foci at z = ±1 get mapped to straight horizontal lines on the w plane. Similarly,

$$\frac{x^2}{\sin^2(u)} - \frac{y^2}{\cos^2(u)} = \cosh^2(v) - \sinh^2(v) = 1$$

which is the equation of a hyperbola with length scales 2a = 2| sin(u)| on the x axis and 2b = 2| cos(u)| on the y axis, as illustrated in Fig. 3.12. Setting $u = u_0$: constant, we see that confocal hyperbolae on the z plane with foci at z = ±1 get mapped to straight vertical lines on the w plane. The mapping of ellipses and hyperbolae on the z plane through $w = \sin^{-1}(z)$ to horizontal and vertical lines, respectively, on the w plane is illustrated in Fig. 3.13.

This example deserves closer scrutiny, because we are really interested in obtaining the values of w from z, but we derived the relations between x and y by going in the inverse direction, that is, by expressing z as a function of w. This may have hidden problems with multivaluedness, so we examine the expression of w in terms of z in more detail:

$$z = \sin(w) = \frac{e^{iw} - e^{-iw}}{2i} \Rightarrow e^{iw} - 2iz - e^{-iw} = 0 \Rightarrow e^{iw} = iz \pm \sqrt{1 - z^2} = i\left(z \pm \sqrt{z^2 - 1}\right)$$

where we have used the solution of the second-order polynomial equation to obtain the last expression.

Figure 3.13: Mapping of ellipses and hyperbolae with foci at z = ±1 on the z plane to horizontal or vertical lines on the w plane through $w = \sin^{-1}(z)$. The wavy lines indicate the branch cuts, located on the x axis from −∞ to −1 and from +1 to +∞ on the z plane, and their images, the vertical lines at u = ±π/2 on the w plane.

With this, we obtain the explicit expression of w as a function of z:

$$w = \frac{1}{i}\ln\left[i\left(z \pm \sqrt{z^2 - 1}\right)\right] = \frac{\pi}{2} + \frac{1}{i}\ln\left[z \pm \left(\sqrt{z - 1}\right)\left(\sqrt{z + 1}\right)\right]$$

from which we conclude that we do need to worry about branch points and branch cuts for this situation. Specifically, there are two square root functions involved as well as the logarithm function, all of which introduce branch points. The two square roots imply that the points z = ±1, ∞ are branch points. The branch points of the logarithm correspond to the values zero and ∞ of its argument. By inspection, we see that the expression under the logarithm above is never zero, because:

$$z \pm \sqrt{z^2 - 1} = 0 \Rightarrow z^2 = z^2 - 1$$

which cannot be satisfied for any value of z. On the other hand, for z → ∞ we have:

$$z \pm \sqrt{z^2 - 1} = z \pm z\left(1 - \frac{1}{z^2}\right)^{1/2} = z \pm z\left(1 - \frac{1}{2z^2} + \cdots\right)$$

where we have used the binomial expansion to get the last result. With the plus sign, the last expression becomes

$$z + z\left(1 - \frac{1}{2z^2} + \cdots\right) \approx 2z \quad \text{for } z \to \infty$$

while with the minus sign, it becomes

$$z - z\left(1 - \frac{1}{2z^2} + \cdots\right) \approx \frac{1}{2z} \quad \text{for } z \to \infty$$

Thus, the values where the argument of the logarithm becomes zero or infinity are both approached for z → ∞. In other words, the two branch points for the logarithm are both at z → ∞. From this analysis, we conclude that the branch cut we need for this function must go through the points z = ±1 as well as through the point z → ∞. A possible choice for a branch cut is shown in Fig. 3.13: it consists of two parts, −∞ < x ≤ −1 and +1 ≤ x < +∞. The images of these lines on the w plane are the vertical lines u = ±π/2. The entire z plane is mapped onto the strip of values extending over the entire range of v and bounded by these two lines, that is, a strip of width π on the u axis (−π/2 ≤ u ≤ π/2) on the w plane. This is what we expect from the preceding analysis of the mapping of the function sin(z): a strip of width 2π is mapped twice onto the entire plane, so a strip of width π is mapped onto it once.
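The logarithmic expression for the inverse sine can be tested numerically. The Python sketch below is an added illustration, assuming the plus sign in $e^{iw} = iz + \sqrt{1 - z^2}$ together with the principal branches of the square root and the logarithm as implemented in the standard `cmath` module; the function name `arcsin_log` is hypothetical.

```python
import cmath
import math

def arcsin_log(z: complex) -> complex:
    """w = (1/i) ln[ i z + sqrt(1 - z^2) ], using principal branches."""
    return -1j * cmath.log(1j * z + cmath.sqrt(1 - z * z))

# Sample points chosen away from the branch cuts on the real axis.
points = [0.5 + 0.0j, 0.3 + 0.4j, -2.0 + 1.0j, 5.0 - 3.0j]
for z in points:
    w = arcsin_log(z)
    print(z, w, cmath.sin(w))
```

For these points sin(w) returns the original z, and the real part u of w stays between −π/2 and +π/2, consistent with the strip bounded by the images of the branch cuts.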


Problems

1. Using the Taylor series expansions for exp(ix) and exp(y), show that the series which defines the exponential of z = x + iy is that given by Eq. (3.19).

2. Prove the relations between the hyperbolic and trigonometric functions of z and iz given in Eqs. (3.22) and (3.23).

3. By following the same steps as those that led to Eq. (4.3), prove the more general result Eq. (4.4).

4. Find the mapping of the curve described in parametric form by Eq. (3.35), through the function

$$f(z) = z + \frac{1}{z}$$

5. From the proof that the function f(z) = 1/z maps a circle in the z plane, Eq. (3.33), to a circle in the w plane, Eq. (3.34), determine the radius and the position of the center of the circle on the w plane in terms of the radius ρ and the position $(x_0, y_0)$ of the center of the circle on the z plane. Comment on what happens if $x_0^2 + y_0^2 = \rho^2$.

6. The function

$$f(z) = \frac{az + b}{cz + d}$$

and the mapping it produces are named after Möbius. Show that this function maps circles to circles and lines to lines. Also show that the combination of two such functions, that is:

$$w = \frac{a_2z' + b_2}{c_2z' + d_2}, \quad z' = \frac{a_1z + b_1}{c_1z + d_1}$$

is another Möbius function; find the explicit expression for w as a Möbius function in terms of the parameters that enter in the two functions of the combination.

7. Show explicitly that the function

$$w = f(z) = \frac{1}{z^2}$$

maps the region on the z plane shown in Fig. 3.10 to the full circle on the w plane centered at the origin, by finding the images of a few representative points on the boundary of the z-plane region.

8. We are interested in the mapping of the z plane to the w plane through

$$w = \sin^{-1}(z)$$

which was discussed in the text. For this mapping to be meaningful, we must introduce a branch cut which includes the points z = ±1 and ∞. Consider the branch cut consisting of the x axis from −1 to +∞, that is, −1 ≤ x < +∞. Is this an appropriate branch cut for the function of interest? What is the image of the branch cut on the w plane, and what is the region of the w plane to which the entire z plane is mapped? Does this region have the same area as the region corresponding to the branch cut shown in Fig. 3.13?

Chapter 4

Contour Integration

4.1 Complex integration

By analogy to the case of integrals of real functions, Eqs. (1.35) and (1.36), we define the integral of a complex function as:

$$\int_{z_i}^{z_f} f(z)dz = \lim_{\Delta z \to 0}\left[\sum_{z = z_i}^{z_f} f(z)\Delta z\right] \qquad (4.1)$$

There is a qualitative difference between performing definite integrals of functions of real variables and of functions of complex variables. In the case of real variables, a definite integral is the integral between two points on the real axis, $x_i$ and $x_f$, and there is only one way to go from $x_i$ to $x_f$. In the case of complex variables $z_i = x_i + iy_i$ and $z_f = x_f + iy_f$ there are infinitely many ways of going from one number to the other. It is then possible that the definite integral between $z_i$ and $z_f$ depends on the path (called “contour”) followed on the complex plane. Indeed, this is the case in general: the integral

$$\int_{z_i}^{z_f} f(z)dz$$

depends on the contour along which the integration is performed. But it is not always the case. Recall that the derivative evaluated at $z = z_0$

$$\frac{df}{dz}(z_0)$$

in general depends on the direction of approach to $z_0$, but for analytic functions it does not depend on the direction of approach. An analogous statement applies to the integral of f(z). For analytic functions integrated along a closed contour, that is, one for which the starting and ending points are the same, the integral does not depend on the contour and is zero if the contour does not enclose any singularities of f(z). This is known as Cauchy’s Integral Theorem (CIT for short).

The simplest way to perform an integration of a function of a complex variable along a contour is in a parametric fashion: suppose that the contour C can be described by z = z(t), with t a real variable. Then, denoting derivatives with respect to t by dots,

$$dz = \frac{dz}{dt}dt = \dot{z}(t)dt = (\dot{x}(t) + i\dot{y}(t))dt \Rightarrow$$

$$\int_C f(z)dz = \int_{t_{min}}^{t_{max}} [u(t) + iv(t)]\dot{z}(t)dt = \int_{t_{min}}^{t_{max}} [u(t) + iv(t)][\dot{x}(t) + i\dot{y}(t)]dt$$

$$= \int_{t_{min}}^{t_{max}} [u(t)\dot{x}(t) - v(t)\dot{y}(t)]dt + i\int_{t_{min}}^{t_{max}} [u(t)\dot{y}(t) + v(t)\dot{x}(t)]dt$$

and in this last expression we are dealing with integrals of functions of real variables, which presumably we know how to handle. Although this method is straightforward, it is not always applicable because there is no guarantee that the variable z can be put in a parametric form along the path of integration.
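When a parametrization is available, the recipe above translates directly into a numerical procedure. The Python sketch below is an added illustration that uses a simple midpoint rule in t; the function name `contour_integral`, the number of steps and the two sample contours are assumptions made for the example.

```python
import cmath
import math

def contour_integral(f, z, dz_dt, t_min, t_max, n=2000):
    """Midpoint-rule estimate of the integral of f(z(t)) * dz/dt over t."""
    h = (t_max - t_min) / n
    total = 0j
    for k in range(n):
        t = t_min + (k + 0.5) * h
        total += f(z(t)) * dz_dt(t)
    return total * h

# Straight line from 0 to 1 + i: z(t) = (1 + i) t, 0 <= t <= 1, f(z) = z^2.
line_value = contour_integral(lambda z: z * z,
                              lambda t: (1 + 1j) * t,
                              lambda t: 1 + 1j,
                              0.0, 1.0)

# Unit circle traversed counter-clockwise: z(t) = exp(i t), f(z) = 1/z.
circle_value = contour_integral(lambda z: 1 / z,
                                lambda t: cmath.exp(1j * t),
                                lambda t: 1j * cmath.exp(1j * t),
                                0.0, 2 * math.pi)

print(line_value, (1 + 1j) ** 3 / 3)
print(circle_value, 2j * math.pi)
```

The first value should agree with $(1 + i)^3/3$, obtained from the antiderivative $z^3/3$, and the second with 2πi.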

A useful relation that bounds the value of an integral of a function f(z) is the so-called ML inequality:

$$\left|\int_{z_1}^{z_2} f(z)dz\right| \le \max(|f(z)|)\left|\int_{l_1}^{l_2} dl\right| = ML \qquad (4.2)$$

where M is the maximum value of |f(z)| along the contour and L is the length of the contour, with $l_1, l_2$ corresponding to the end points of the path $z_1, z_2$.
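A numerical illustration of the ML inequality can be obtained by estimating the integral, the maximum M and the length L along the same parametrized path. The Python sketch below is an added example; the function name `integral_and_ml_bound` is hypothetical, M is only estimated from sampled points, and the contour is assumed to be the upper semicircle of radius 2 from z = 2 to z = −2 with $f(z) = 1/z^2$.

```python
import cmath
import math

def integral_and_ml_bound(f, z, dz_dt, t_min, t_max, n=2000):
    """Return (|integral of f dz|, M * L) estimated with a midpoint rule.

    M is the largest sampled value of |f(z)| and L the contour length.
    """
    h = (t_max - t_min) / n
    integral = 0j
    length = 0.0
    f_max = 0.0
    for k in range(n):
        t = t_min + (k + 0.5) * h
        fz = f(z(t))
        speed = dz_dt(t)
        integral += fz * speed * h
        length += abs(speed) * h
        f_max = max(f_max, abs(fz))
    return abs(integral), f_max * length

lhs, rhs = integral_and_ml_bound(lambda z: 1 / z**2,
                                 lambda t: 2 * cmath.exp(1j * t),
                                 lambda t: 2j * cmath.exp(1j * t),
                                 0.0, math.pi)
print(lhs, rhs)
```

Here |f(z)| = 1/4 everywhere on the path and L = 2π, so ML = π/2, while the antiderivative −1/z gives an integral of magnitude 1, which is indeed below the bound.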

Related to the integral of f(z) over a contour are the notions of a contour enclosing a simply-connected region, which is called a “simple contour” and has a unique sense of traversing, and of a contour enclosing a multiply-connected region, which we will call a “complicated contour” and which has no unique sense of traversing. The term complicated refers to the fact that such a contour intersects itself, which is a consequence of the fact that it contains a multiply-connected region; the points of intersection create problems with the sense of traversing, since at each intersection there is more than one choice for the direction of the path. Examples of such contours are shown in Fig. 4.1.

Figure 4.1: Illustration of (a) a “complicated” contour enclosing a multiply connected region, which does not have a unique sense of traversing; (b) a “simple” contour enclosing a simply connected region with a unique sense of traversing, in this example counter-clockwise; and (c) contour deformation, as described in the text.

Another important notion is that of contour deformation: it is often convenient to change the original contour along which we need to perform the integration into a simpler one, as long as this deformation does not change the result. For example, consider an irregularly shaped simple contour such as the one shown in Fig. 4.1(b). Let us denote the integral of the function around the contour as $I_0$. Now assume that by deforming the contour as shown in Fig. 4.1(c) the value of the integral over the new contour vanishes:

$$I_0 + I_1 + I_2 + I_3 = 0$$

But in the limit where the inside and outside contours are fully closed, we will have $I_3 = -I_1$ because along the two straight segments f(z) takes the same values while $dz_1 = -dz_3$. This means that we only have to evaluate the integral $I_2$ in order to find the value of $I_0$. Since the path for this new integral is a circle, it may be easy to perform a parametric integration, which reduces the problem to one of doing integrals of real functions as described above.

How useful is the above argument? In other words, can we use contour deformation in a practical sense? The answer is a resounding yes, as numerous examples will demonstrate in the following sections of this chapter. As a first example consider the following integral

$$\oint_C \frac{1}{z}dz$$

where C is an arbitrary simple closed contour around the point z = 0 where the integrand is singular. We perform contour deformation as indicated in Fig. 4.1(c). For the deformed contour, which does not enclose the singularity z = 0, we conclude by Cauchy’s Integral Theorem that the integral is zero, since the function 1/z is analytic everywhere inside and on this deformed contour. This leads to $I_0 = -I_2$, since the integrals along the straight line segments cancel each other, by the argument mentioned above. We can then evaluate the original integral by calculating the integral along the inner circular contour, which can be done by parametric integration because along this circle of radius R we have:

$$z = Re^{i\theta}, \quad 0 \le \theta < 2\pi \Rightarrow dz = iRe^{i\theta}d\theta$$

and the contour is traversed in the clockwise sense, which means θ goes from 2π to 0. Substituting these relations for the integral along the circular path we obtain:

$$\oint_{C_R} \frac{1}{z}dz = \int_{2\pi}^{0} \frac{1}{Re^{i\theta}}iRe^{i\theta}d\theta = -2\pi i$$

which shows that the original integral is

$$I_0 = \oint_C \frac{1}{z}dz = 2\pi i \qquad (4.3)$$

This is a very useful result, because it holds for an arbitrary simple closed contour which encloses the singularity of the integrand at z = 0. By exactly the same procedure, we can prove that for n a positive integer and for an arbitrary simple closed contour C enclosing z = 0:

$$\oint_C \frac{1}{z^n}dz = 2\pi i\delta_{1n} \qquad (4.4)$$

where $\delta_{ij}$ is the Kronecker delta, equal to one for i = j and zero otherwise.

We next prove Cauchy’s Integral Theorem. The basic setup of the proof is illustrated in Fig. 4.2. The function f(z) is assumed to be analytic on and inside the simple closed contour C. In general we have:

$$\int f(z)dz = \int [u(x, y) + iv(x, y)][dx + idy]$$

$$= \int u(x, y)dx - \int v(x, y)dy + i\int u(x, y)dy + i\int v(x, y)dx \qquad (4.5)$$

Figure 4.2: Setup for the proof of Cauchy’s Integral Theorem.

For the closed contour C, the various terms in the last expression give the following. The first term:

$$\oint_C u(x, y)dx = \int_{x_{min}}^{x_{max}} u_l(x, y)dx + \int_{x_{max}}^{x_{min}} u_u(x, y)dx = \int_{x_{min}}^{x_{max}} [u_l(x, y) - u_u(x, y)]dx$$

$$= \int_{x_{min}}^{x_{max}}\int_{y_{min}}^{y_{max}} \left(-\frac{\partial u}{\partial y}\right)dydx = \iint_S \left(-\frac{\partial u}{\partial y}\right)dydx \qquad (4.6)$$

where $u_l(x, y)$ and $u_u(x, y)$ refer to the values that u(x, y) takes in the lower and upper parts of the contour, respectively, as those are identified in Fig. 4.2, and S is the total area enclosed by the contour. The second term:

$$\oint_C v(x, y)dy = \int_{y_{min}}^{y_{max}} v_r(x, y)dy + \int_{y_{max}}^{y_{min}} v_l(x, y)dy = \int_{y_{min}}^{y_{max}} [v_r(x, y) - v_l(x, y)]dy$$

$$= \int_{y_{min}}^{y_{max}}\int_{x_{min}}^{x_{max}} \left(\frac{\partial v}{\partial x}\right)dxdy = \iint_S \left(\frac{\partial v}{\partial x}\right)dxdy \qquad (4.7)$$

where $v_l(x, y)$ and $v_r(x, y)$ refer to the values that v(x, y) takes in the left and right parts of the contour, respectively, as those are identified in Fig. 4.2. Combining these two terms we obtain for the real part of the last line in Eq. (4.5):

$$\oint_C [u(x, y)dx - v(x, y)dy] = \iint_S \left(-\frac{\partial u}{\partial y} - \frac{\partial v}{\partial x}\right)dxdy \qquad (4.8)$$

Similarly we can derive for the imaginary part of the last line in Eq. (4.5):

$$\oint_C [u(x, y)dy + v(x, y)dx] = \iint_S \left(-\frac{\partial v}{\partial y} + \frac{\partial u}{\partial x}\right)dxdy \qquad (4.9)$$


Therefore, if f(z) is analytic on and inside the contour C, the Cauchy-Riemann relations Eq. (3.28), $\partial u/\partial x = \partial v/\partial y$ and $\partial u/\partial y = -\partial v/\partial x$, make the integrands on the right-hand sides of Eqs. (4.8) and (4.9) vanish, and we conclude that its integral over C vanishes identically.

A corollary of the CIT is that the integral of an analytic function between two points A and B on the complex plane is independent of the path we take to connect A and B, as long as A, B and the entire path lie within the domain of analyticity of the function. To prove this, consider two different paths going from A to B (see Fig. 4.2, labeled path 1 and path 2). The combination of these paths forms a closed contour, if one of the paths is traversed in the opposite sense (from B to A, rather than from A to B), and the result of the integration over the entire closed contour is zero. Therefore, the integration over either path gives the same result.
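Path independence is easy to observe numerically for a simple analytic function. The Python sketch below is an added illustration that integrates $f(z) = z^2$ from A = 0 to B = 1 + i along two different polygonal paths; the function name `integrate_polyline` and the choice of paths are assumptions made for the example.

```python
def integrate_polyline(f, vertices, n=2000):
    """Midpoint-rule integral of f(z) dz along straight segments joining vertices."""
    total = 0j
    for start, end in zip(vertices, vertices[1:]):
        step = (end - start) / n
        for k in range(n):
            total += f(start + (k + 0.5) * step) * step
    return total

def f(z):
    return z * z

path_1 = [0j, 1 + 1j]              # straight from A to B
path_2 = [0j, 1 + 0j, 1 + 1j]      # along the x axis, then parallel to the y axis
value_1 = integrate_polyline(f, path_1)
value_2 = integrate_polyline(f, path_2)
print(value_1, value_2, (1 + 1j) ** 3 / 3)
```

Both estimates should agree with each other and with $(1 + i)^3/3$, the difference of the antiderivative $z^3/3$ between the end points.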

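As a counter example, we consider the integral of a non-analytic function over a simple closed contour C. Our example concerns the prototypical non-analytic function that we discussed before, namely $\bar{z}$, the complex conjugate of z. Its integral around C, which encloses a total area S, is:

$$\oint_C \bar{z}dz = \iint_S \left(-\frac{\partial u}{\partial y} - \frac{\partial v}{\partial x}\right)dxdy + i\iint_S \left(-\frac{\partial v}{\partial y} + \frac{\partial u}{\partial x}\right)dxdy = 2iS$$

where we have used Eqs. (4.8) and (4.9), which are valid for any function, and the fact that

$$\bar{z} = x - iy \Rightarrow u(x, y) = x, \quad v(x, y) = -y \Rightarrow \frac{\partial u}{\partial y} = 0, \quad \frac{\partial v}{\partial x} = 0, \quad \frac{\partial u}{\partial x} = 1, \quad \frac{\partial v}{\partial y} = -1$$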
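The result 2iS can be checked on a contour whose area is known. The Python sketch below is an added illustration that integrates the complex conjugate of z counter-clockwise around a rectangle of width 2 and height 3 (area S = 6), and contrasts it with the analytic function f(z) = z on the same contour; the function name `loop_integral` and the rectangle are assumptions made for the example.

```python
def loop_integral(f, vertices, n=500):
    """Midpoint-rule integral of f(z) dz around the closed polygon with given vertices."""
    closed = list(vertices) + [vertices[0]]
    total = 0j
    for start, end in zip(closed, closed[1:]):
        step = (end - start) / n
        for k in range(n):
            total += f(start + (k + 0.5) * step) * step
    return total

# Rectangle traversed counter-clockwise; its area is 2 * 3 = 6.
rectangle = [0j, 2 + 0j, 2 + 3j, 0 + 3j]
conjugate_value = loop_integral(lambda z: z.conjugate(), rectangle)
analytic_value = loop_integral(lambda z: z, rectangle)
print(conjugate_value, analytic_value)
```

The first integral should equal 2iS = 12i, while the second vanishes, as expected for a function that is analytic on and inside the contour.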

A notion related to the CIT is the Cauchy Integral Formula (CIF for short): for a function f(z) which is analytic everywhere inside the region enclosed by a simple contour C, which contains the point $z_0$, the following relation holds:

$$f(z_0) = \frac{1}{2\pi i}\oint_C \frac{f(z)}{z - z_0}dz \qquad (4.10)$$

The proof of this statement proceeds by considering a circular contour of constant radius R around $z_0$, and letting R → 0:

$$C_R: \quad z = z_0 + Re^{i\theta} \Rightarrow dz = iRe^{i\theta}d\theta, \quad 0 \le \theta < 2\pi$$

in which case the integral of $f(z)/(z - z_0)$ around this contour is:

$$\lim_{R \to 0}\oint_{C_R} \frac{f(z)}{z - z_0}dz = \lim_{R \to 0}\int_0^{2\pi} \frac{f(z_0 + Re^{i\theta})}{Re^{i\theta}}iRe^{i\theta}d\theta = 2\pi if(z_0)$$

By using contour deformation we can easily prove that this result holds for any simple contour C that encloses $z_0$, as illustrated in Fig. 4.3.

Figure 4.3: Setup for the proof of the Cauchy Integral Formula.
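The formula can be tested numerically by using only values of the function on a contour. The Python sketch below is an added illustration with a circular contour that is not centered at $z_0$; the function name `cauchy_integral_formula`, the choice f(z) = exp(z) and the contour parameters are assumptions made for the example.

```python
import cmath
import math

def cauchy_integral_formula(f, z0, center, radius, n=2000):
    """Estimate f(z0) from values of f on the circle |z - center| = radius.

    The point z0 must lie inside the circle; a midpoint rule in theta is used.
    """
    d_theta = 2 * math.pi / n
    total = 0j
    for k in range(n):
        theta = (k + 0.5) * d_theta
        z = center + radius * cmath.exp(1j * theta)
        dz = 1j * radius * cmath.exp(1j * theta) * d_theta
        total += f(z) / (z - z0) * dz
    return total / (2j * math.pi)

z0 = 0.3 + 0.2j
estimate = cauchy_integral_formula(cmath.exp, z0, center=0j, radius=1.5)
print(estimate, cmath.exp(z0))
```

The estimate obtained from boundary values alone should reproduce exp($z_0$) to high accuracy.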

The deeper meaning of Cauchy’s Integral Formula is the following: if an analytic function is known at the boundary of a region (the contour of integration C), then it can be calculated everywhere within that region by performing the appropriate contour integral over the boundary. Indeed, if $z_0$ is in the interior of the contour, CIF gives the value of $f(z_0)$ by performing an integral of $f(z)/(z - z_0)$ over the contour, that is, by using only values of the function on the contour C. This is consistent with the statements that we made earlier, namely that harmonic functions are fully determined by the boundary values and that an analytic function has real and imaginary parts which are harmonic functions. CIF basically provides the recipe for obtaining all the values of an analytic function (and hence the harmonic functions which are its real and imaginary parts) when the boundary values (the values on the contour C) are known.

Taking the CIF one step further, we can evaluate the derivatives of an analytic function, by differentiating the expression in Eq. (4.10) with respect to $z_0$:

$$f'(z_0) = \frac{d}{dz_0}\left[\frac{1}{2\pi i}\oint_C \frac{f(z)}{(z - z_0)}dz\right] = \frac{1}{2\pi i}\oint_C \frac{f(z)}{(z - z_0)^2}dz$$

and similarly for higher orders:

$$f^{(n)}(z_0) = \frac{n!}{2\pi i}\oint_C \frac{f(z)}{(z - z_0)^{n+1}}dz \qquad (4.11)$$


This is also consistent with an earlier statement that we had made, namely that all derivatives of an analytic function exist. We should caution the reader that in the case of derivatives, we cannot use a simple contour of constant radius R and let R → 0 to obtain explicit expressions, but we must perform the contour integration over C.
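Equation (4.11) can be evaluated numerically on a contour of fixed, finite size. The Python sketch below is an added illustration in which C is taken to be a circle of radius 1 around $z_0$ (a finite contour, not the R → 0 limit); the function name `cauchy_derivative` and the test functions are assumptions made for the example.

```python
import cmath
import math

def cauchy_derivative(f, z0, order, radius=1.0, n=2000):
    """Estimate the derivative of given order of f at z0 from Eq. (4.11).

    The contour is the circle |z - z0| = radius, sampled with a midpoint rule.
    """
    d_theta = 2 * math.pi / n
    total = 0j
    for k in range(n):
        theta = (k + 0.5) * d_theta
        offset = radius * cmath.exp(1j * theta)      # z - z0 on the contour
        dz = 1j * offset * d_theta
        total += f(z0 + offset) / offset ** (order + 1) * dz
    return math.factorial(order) * total / (2j * math.pi)

z0 = 0.5 + 0.0j
exp_derivatives = [cauchy_derivative(cmath.exp, z0, m) for m in range(4)]
cubic_third = cauchy_derivative(lambda z: z**3, z0, 3)
cubic_fourth = cauchy_derivative(lambda z: z**3, z0, 4)
print(exp_derivatives, cubic_third, cubic_fourth)
```

Every derivative of exp(z) at $z_0$ should equal exp($z_0$), while for $z^3$ the third derivative is 6 and the fourth vanishes.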

4.2 Taylor and Laurent series

We next consider a series expansion of functions of complex variables, and use as the terms in the series $(z - z_0)^n$ with n a non-negative integer, assuming that we are interested in the behavior of the function near the point $z_0$ on the complex plane:

$$f(z) = \sum_{n=0}^{\infty} a_n(z - z_0)^n \qquad (4.12)$$

This is the power series expansion of f(z) at a reference point $z_0$. By analogy to what we discussed for power series expansions of functions of real variables, and using our knowledge of convergence criteria, we conclude that the following statements must hold for the power series on the right-hand side of Eq. (4.12):
(i) if it converges for $z_1$ and $|z_2 - z_0| < |z_1 - z_0|$ then it converges for $z_2$;
(ii) if it diverges for $z_3$ and $|z_4 - z_0| > |z_3 - z_0|$ then it diverges for $z_4$.
To justify these statements, we can think of the convergence of the series in absolute terms and since $|z - z_0|$ is a real quantity:

$$|z - z_0| = \sqrt{(x - x_0)^2 + (y - y_0)^2}$$

we can apply the tests we learned for the convergence of real series. In particular the comparison test, Eq. (2.19), is useful in establishing the two statements mentioned. These lead us to the definition of the radius of convergence: it is the radius $R = |z - z_0|$ of the smallest circle that encloses all points around $z_0$ for which the series converges. In the limit R → ∞ the series converges everywhere on the complex plane; in the limit R = 0 the series diverges everywhere on the complex plane except at $z = z_0$ itself.

Example 4.1: To illustrate these points, we consider the familiar binomial expansion involving a complex number z:

$$(a + z)^p = a^p + pa^{p-1}z + \frac{p(p - 1)}{2!}a^{p-2}z^2 + \frac{p(p - 1)(p - 2)}{3!}a^{p-3}z^3 + \cdots$$

In this expression, we have put the function $f(z) = (a + z)^p$ in an infinite series form of the type shown in Eq. (4.12), with well defined coefficients given by

$$a_n = \frac{p(p - 1)\cdots(p - (n - 1))}{n!}a^{p-n}$$

around the point $z_0 = 0$. Next, we ask what is the radius of convergence of this series. Defining ζ = z/a, we obtain:

$$(a + z)^p = a^p\left[1 + p\zeta + \frac{p(p - 1)}{2!}\zeta^2 + \frac{p(p - 1)(p - 2)}{3!}\zeta^3 + \cdots\right] \qquad (4.13)$$

We then apply the ratio test, Eq. (2.20), to determine the radius of convergence of this series. The ratio of the (n + 1)th to the nth term is

$$\left|\left[\frac{p(p - 1)\cdots(p - n)\zeta^{n+1}}{(n + 1)!}\right]\left[\frac{p(p - 1)\cdots(p - (n - 1))\zeta^n}{n!}\right]^{-1}\right| = \left|\frac{p - n}{n + 1}\zeta\right|$$

and the ratio test requires that this last expression be bounded by a positive number smaller than unity for all n > N, but

$$n \to \infty \Rightarrow \left|\frac{p - n}{n + 1}\right| \to 1$$

leaving as the condition for the convergence of the series

|ζ | < 1 ⇒ |z| < |a|

which gives the radius of convergence R = |a| of the series around z0 = 0.
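The convergence condition |ζ| < 1 can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `binomial_partial_sum` and the test values p = 0.5 and |ζ| = 0.6, 1.2 are illustrative assumptions. It builds the partial sums of Eq. (4.13) with a = 1 by repeatedly multiplying by the term ratio (p − n)ζ/(n + 1) used in the ratio test.

```python
import cmath

def binomial_partial_sum(p, zeta, n_terms):
    """Sum the first n_terms of 1 + p*zeta + p(p-1)/2! * zeta**2 + ... (Eq. 4.13, a = 1)."""
    term = 1.0 + 0.0j          # the n = 0 term
    total = term
    for n in range(n_terms - 1):
        # ratio of the (n+1)th to the nth term: (p - n) / (n + 1) * zeta
        term *= (p - n) / (n + 1) * zeta
        total += term
    return total

p = 0.5                                  # illustrative exponent
inside = 0.6 * cmath.exp(1j * 0.8)       # |zeta| < 1
outside = 1.2 * cmath.exp(1j * 0.8)      # |zeta| > 1

exact_inside = (1 + inside) ** p         # principal branch of (1 + zeta)**p
err_inside = abs(binomial_partial_sum(p, inside, 200) - exact_inside)
size_50 = abs(binomial_partial_sum(p, outside, 50))
size_200 = abs(binomial_partial_sum(p, outside, 200))
print(err_inside, size_50, size_200)
```

Inside the unit disk the partial sums settle on the value of (1 + ζ)^p, while outside it they grow without bound as more terms are added.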

For the power series representation of a function f(z), as given in Eq. (4.12), assuming that it converges absolutely and uniformly inside the disk |z − z0| < R, the following statements are true (some for obvious reasons):

1. Since (z − z0)^n is continuous, f(z) is also continuous.

2. The power series representation about z0 is unique inside the disk.

3. We can differentiate and integrate the series term by term inside the disk, to obtain a new convergent series, since the resulting series has the same limiting ratio of the (n + 1)th to the nth term as the original series.

4. For R > 0, the power series represents an analytic function, which is infinitely differentiable within the disk of convergence.

5. Conversely, every function that is analytic in a disk of radius R around z0 has a unique power series representation about z0, and any method of obtaining this series is acceptable.

We present next two specific power series expansions of functions of complex variables, known as the Taylor and the Laurent series expansions. The first refers to a function that is analytic in a disk around a point z0, the second to a function which is not analytic in the neighborhood of z0, but is analytic in a circular annulus around z0. Both series are extremely useful in representing functions of complex variables and in using those representations for practical applications, such as evaluating integrals.

We discuss first the Taylor series expansion. This can actually be derived for a function f(z) which is analytic everywhere on and inside a simple closed contour C, by applying the CIF, Eq. (4.10) (note that in the following proof we have replaced the variable of integration by the symbol ζ and we have called the point where the function is evaluated simply z):

$$f(z) = \frac{1}{2\pi i} \oint_C \frac{f(\zeta)}{(\zeta - z)}\, d\zeta = \frac{1}{2\pi i} \oint_C \frac{f(\zeta)}{\left(1 - \frac{z - z_0}{\zeta - z_0}\right)(\zeta - z_0)}\, d\zeta$$

$$= \frac{1}{2\pi i} \oint_C \frac{f(\zeta)}{(\zeta - z_0)} \left[ 1 + \left( \frac{z - z_0}{\zeta - z_0} \right) + \left( \frac{z - z_0}{\zeta - z_0} \right)^2 + \cdots \right] d\zeta$$

$$= f(z_0) + (z - z_0) f'(z_0) + \frac{1}{2!} (z - z_0)^2 f''(z_0) + \cdots$$

where in the next-to-last step we have used the geometric series summation

$$\frac{1}{1 - t} = 1 + t + t^2 + t^3 + \cdots$$

(see chapter 1, Eq. (2.17)), with t = (z − z0)/(ζ − z0) which is valid for |t| = |z − z0|/|ζ − z0| < 1 ⇒ |z − z0| < |ζ − z0|, and for the last step we have used the CIF for the derivatives of f(z), Eq. (4.11).
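The last step of the derivation identifies each Taylor coefficient with a contour integral of f(ζ)/(ζ − z0)^(n+1). As a rough numerical illustration (a sketch under assumptions: the helper name `taylor_coefficient`, the choice f(z) = exp(z), and the unit circle as contour are not from the text), the integrals can be approximated by sampling the circle at equally spaced angles and compared with the known coefficients 1/n! of the exponential:

```python
import cmath
import math

def taylor_coefficient(f, z0, n, radius=1.0, samples=256):
    """Approximate (1/(2*pi*i)) * contour integral of f(zeta)/(zeta - z0)**(n+1)
    over a circle of the given radius, using equally spaced sample points."""
    total = 0.0 + 0.0j
    dtheta = 2.0 * math.pi / samples
    for k in range(samples):
        zeta = z0 + radius * cmath.exp(1j * k * dtheta)
        dzeta = 1j * (zeta - z0) * dtheta        # d(zeta) along the circle
        total += f(zeta) / (zeta - z0) ** (n + 1) * dzeta
    return total / (2j * math.pi)

coeffs = [taylor_coefficient(cmath.exp, 0.0, n) for n in range(6)]
expected = [1.0 / math.factorial(n) for n in range(6)]
for c, e in zip(coeffs, expected):
    print(c, e)
```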

Example 4.2: Suppose we want to calculate the power series of

$$f(z) = \sin^{-1}(z)$$

about the point z0 = 0. We first notice that this function is analytic in a disk of non-zero radius around the point of interest. Therefore, it should have a Taylor series expansion around this point. However, it is messy to try to take derivatives of the inverse sine function, so we resort to a trick: since any way of obtaining the power series expansion is acceptable for an analytic function, we call the original function w and examine its inverse, which is simply the sine function:

$$w = \sin^{-1}(z) \;\Rightarrow\; z = \sin(w) \;\Rightarrow\; \frac{dz}{dw} = \cos(w) = \sqrt{1 - \sin^2(w)} = \sqrt{1 - z^2}$$

from which we can derive the following relation:

$$\frac{dw}{dz} = \frac{1}{\sqrt{1 - z^2}} = [1 + (-z^2)]^{-1/2}$$

and in the last expression we recognize the familiar binomial as in Eq. (4.13) with a = 1, ζ = −z^2, p = −1/2, which, when expanded in powers of (−z^2) and integrated term by term leads to:

$$w = z + \frac{1}{6} z^3 + \frac{3}{40} z^5 + \frac{5}{112} z^7 + \cdots$$

Note that we can apply integration term by term to the series we obtain for dw/dz because the binomial series converges uniformly for |ζ| = |−z^2| < 1, a condition that is certainly satisfied for z in the neighborhood of zero. Moreover, the constant of integration which appears when we integrate the series for dw/dz in order to obtain the series for w, can be determined by observing that from the definition of w we have z = 0 ⇒ w = 0, and hence the constant of integration must be zero.
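The term-by-term integration in this example is easy to reproduce with exact rational arithmetic. The sketch below (the function name `arcsin_series_coefficients` is an illustrative choice, not from the text) generates the binomial coefficients for p = −1/2 with the same recursion used in the ratio test, multiplies by (−1)^n to account for ζ = −z^2, and divides by 2n + 1 to integrate z^(2n):

```python
from fractions import Fraction

def arcsin_series_coefficients(n_terms):
    """Return {power: coefficient} for the series of arcsin(z) about z = 0,
    built by integrating the binomial series of (1 - z**2)**(-1/2) term by term."""
    p = Fraction(-1, 2)
    binom = Fraction(1)              # binomial coefficient of zeta**n, starting at n = 0
    coeffs = {}
    for n in range(n_terms):
        # term of dw/dz: binom * (-1)**n * z**(2n); integrating gives z**(2n+1)/(2n+1)
        coeffs[2 * n + 1] = binom * (-1) ** n / (2 * n + 1)
        binom *= (p - n) / (n + 1)   # next binomial coefficient
    return coeffs

series = arcsin_series_coefficients(4)
print(series)
```

The first four entries reproduce the coefficients 1, 1/6, 3/40 and 5/112 quoted above.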

Figure 4.4: Circular annulus around the point z0 on the complex plane, for the derivation of the Laurent series expansion.


Another important power series expansion is the Laurent series. Consider a function f(z) which is analytic on a circular annulus with outer radius ρ1 on the circular contour C1, and inner radius ρ2 on the circular contour C2, both centered at z0. We will employ the following integrals on the contours C1 and C2, taken as usual in the counter-clockwise sense:

$$\oint_{C_i} \frac{f(\zeta)}{(\zeta - z)}\, d\zeta, \quad i = 1, 2$$

We notice that if we join the two circular contours by two line segments traversed in opposite directions, as illustrated in Fig. 4.4, the function f(z) is analytic everywhere in and on the resulting closed contour, and therefore we can apply the CIF for this situation. In the limit where the two line segments lie on top of each other, hence their contributions cancel out, we obtain:

$$f(z) = \frac{1}{2\pi i} \oint_{C_1} \frac{f(\zeta)}{(\zeta - z)}\, d\zeta - \frac{1}{2\pi i} \oint_{C_2} \frac{f(\zeta)}{(\zeta - z)}\, d\zeta \qquad (4.14)$$

In the last expression, each contour is traversed in the counter-clockwise sense and the integration over C2 has an overall minus sign because that part was traversed in the clockwise direction in the joined contour. We deal with the two integrals that appear on the right-hand side of the above equation separately. The first integral can be written as:

$$I_1 = \frac{1}{2\pi i} \oint_{C_1} \frac{f(\zeta)}{(\zeta - z)}\, d\zeta = \frac{1}{2\pi i} \oint_{C_1} \frac{f(\zeta)}{\left(1 - \frac{z - z_0}{\zeta - z_0}\right)(\zeta - z_0)}\, d\zeta \qquad (4.15)$$

and since for z in the annulus and ζ on the C1 contour we have

$$\left| \frac{z - z_0}{\zeta - z_0} \right| < 1$$

we can apply the geometric series expansion to obtain

$$I_1 = \frac{1}{2\pi i} \oint_{C_1} \frac{f(\zeta)}{(\zeta - z_0)} \left[ 1 + \left( \frac{z - z_0}{\zeta - z_0} \right) + \left( \frac{z - z_0}{\zeta - z_0} \right)^2 + \cdots \right] d\zeta \qquad (4.16)$$

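For the second integral we have:

$$I_2 = -\frac{1}{2\pi i} \oint_{C_2} \frac{f(\zeta)}{(\zeta - z)}\, d\zeta = \frac{1}{2\pi i} \oint_{C_2} \frac{f(\zeta)}{\left(1 - \frac{\zeta - z_0}{z - z_0}\right)(z - z_0)}\, d\zeta \qquad (4.17)$$

and since for z in the annulus and ζ on the C2 contour we have

$$\left| \frac{\zeta - z_0}{z - z_0} \right| < 1$$

we can apply the geometric series expansion to obtain

$$I_2 = \frac{1}{2\pi i} \oint_{C_2} \frac{f(\zeta)}{(z - z_0)} \left[ 1 + \left( \frac{\zeta - z_0}{z - z_0} \right) + \left( \frac{\zeta - z_0}{z - z_0} \right)^2 + \cdots \right] d\zeta \qquad (4.18)$$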

Combining these two results, we arrive at the general expression

$$f(z) = \frac{1}{2\pi i} \sum_{n=-\infty}^{\infty} (z - z_0)^n \left[ \oint_C \frac{f(\zeta)}{(\zeta - z_0)^{n+1}}\, d\zeta \right] \qquad (4.19)$$

with C a closed contour that lies entirely within the circular annulus bounded by the circular contours C1 and C2, as can be easily justified by contour deformation. The expression in Eq. (4.19) is known as the Laurent power series expansion and contains both negative and positive powers of (z − z0).

Figure 4.5: Example of Taylor and Laurent series expansions of the function f(z) = 1/[(z − 1)(z − 2)] around z = 0: the complex plane is separated into the three regions labeled I, II, III, where the function is analytic on a disc or on a circular annulus.

Example 4.3: To illustrate these concepts we consider the function

$$f(z) = \frac{1}{(z - 1)(z - 2)} = \frac{1}{1 - z} + \frac{1}{z - 2}$$

and ask what is its power series expansion around z = 0. To answer this question, we need to separate the complex plane in three regions in which we can easily determine the analyticity of the function. These are:
— region I : |z| < 1, where both 1/(1 − z) and 1/(z − 2) are analytic;
— region II : 1 < |z| < 2, which encloses a singularity of 1/(1 − z);
— region III : 2 < |z|, which encloses a singularity of both 1/(1 − z) and of 1/(z − 2).
In each region, we can determine separately the expansion of the two fractions that appear in f(z), that is, 1/(1 − z) and 1/(z − 2). The first fraction, 1/(1 − z), is analytic in region I, so we expect a Taylor expansion for it which is easily obtained from the geometric series, since in this region |z| < 1. Regions II and III can be considered as circular annuli which contain a singularity at z = 1, so we expect a Laurent expansion; in both regions |z| > 1 ⇒ 1/|z| < 1, hence if we rewrite the denominator so as to contain the factor (1 − 1/z) we can use the geometric series again to obtain:

$$\mathrm{I}: \quad \frac{1}{1 - z} = 1 + z + z^2 + \cdots$$

$$\mathrm{II, III}: \quad \frac{1}{1 - z} = \frac{1}{\left(1 - \frac{1}{z}\right)(-z)} = \frac{-1}{z} \left[ 1 + \frac{1}{z} + \frac{1}{z^2} + \cdots \right]$$

The second fraction, 1/(z − 2), is analytic in regions I and II, so we expect a Taylor expansion for it which is easily obtained from the geometric series, since in these regions |z| < 2 ⇒ |z/2| < 1. Region III can be considered as a circular annulus which contains a singularity at z = 2, so we expect a Laurent expansion; in this region |z| > 2 ⇒ 2/|z| < 1, hence if we rewrite the denominator so as to contain the factor (1 − 2/z) we can use the geometric series again to obtain:

$$\mathrm{I, II}: \quad \frac{1}{z - 2} = \frac{-1}{2} \frac{1}{\left(1 - \frac{z}{2}\right)} = \frac{-1}{2} \left[ 1 + \frac{z}{2} + \left( \frac{z}{2} \right)^2 + \cdots \right]$$

$$\mathrm{III}: \quad \frac{1}{z - 2} = \frac{1}{\left(1 - \frac{2}{z}\right) z} = \frac{1}{z} \left[ 1 + \frac{2}{z} + \left( \frac{2}{z} \right)^2 + \cdots \right]$$

Combining the results in the different regions, we obtain:

$$\mathrm{I}: \quad f(z) = \frac{1}{2} + z \left( 1 - \frac{1}{4} \right) + z^2 \left( 1 - \frac{1}{8} \right) + \cdots$$

$$\mathrm{II}: \quad f(z) = \cdots - \frac{1}{z^2} - \frac{1}{z} - \frac{1}{2} - \frac{1}{4} z - \frac{1}{8} z^2 - \cdots$$

$$\mathrm{III}: \quad f(z) = (-1 + 2) \frac{1}{z^2} + (-1 + 2^2) \frac{1}{z^3} + \cdots + (-1 + 2^{n-1}) \frac{1}{z^n} + \cdots$$


We note that the function has a Taylor expansion in region I, a Laurent expansion with all powers of z in region II and a Laurent expansion with only negative powers of z starting at n = −2 in region III.
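The three expansions of Example 4.3 can be cross-checked against the general coefficient formula of Eq. (4.19). The following is a numerical sketch only: the helper name `laurent_coefficient`, the sampling scheme, and the contour radii 0.5, 1.5 and 3.0 (one circle inside each region) are assumptions made for illustration.

```python
import cmath
import math

def laurent_coefficient(f, n, radius, samples=2048):
    """Coefficient of z**n in the Laurent series about z0 = 0, from Eq. (4.19),
    using a circle of the given radius as the contour C."""
    total = 0.0 + 0.0j
    dtheta = 2.0 * math.pi / samples
    for k in range(samples):
        zeta = radius * cmath.exp(1j * k * dtheta)
        dzeta = 1j * zeta * dtheta               # d(zeta) along the circle
        total += f(zeta) / zeta ** (n + 1) * dzeta
    return total / (2j * math.pi)

def f(z):
    return 1.0 / ((z - 1.0) * (z - 2.0))

region_I = [laurent_coefficient(f, n, 0.5) for n in (0, 1, 2)]
region_II = [laurent_coefficient(f, n, 1.5) for n in (-2, -1, 0, 1)]
region_III = [laurent_coefficient(f, n, 3.0) for n in (-1, -2, -3)]
print(region_I)    # compare with 1/2, 3/4, 7/8
print(region_II)   # compare with -1, -1, -1/2, -1/4
print(region_III)  # compare with 0, 1, 3
```

The same function f(z) yields different coefficient sets depending only on which region the contour lies in, which is the point of the example.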

4.3 Types of singularities - residue

With the help of the Laurent series expansion, we classify the isolated singularities of a function into three categories:

1. z0 is a removable singularity if the coefficients of the negative exponents in the Laurent series, b−1, b−2, . . ., are zero. In this case the function has a Taylor series expansion around z0. If the value of the function is assigned to be equal to the constant term of the Taylor series expansion, the function is then continuous and analytic everywhere in the neighborhood of z0.

2. z0 is a pole of order m if the coefficient b−m ≠ 0 and coefficients with lower indices n < −m are all zero. A pole of order one is also called a simple pole.

3. z0 is an essential singularity if an infinite number of the coefficients of the negative exponents in the Laurent series are non-zero.

We give some examples to illustrate these notions.

We have already encountered one type of removable singularities, the branch points. In such cases, the singularity is due to multivaluedness of the function which results from different values of the function when going around the point z0 more than once. This problem is eliminated (removed) by the introduction of branch cuts that connect pairs of branch points.

A different type of removable singularity is one in which the function itself is not properly defined at z = z0 (hence the singularity), but the Taylor series expansion that approximates it near z0 is well defined for all values of z, including z0. If we define the value of the function to be the same as the value of the series expansion for z = z0, that is, the function at z0 is assigned the value of the constant term in the series expansion, then the singularity is eliminated (removed) and the function becomes continuous and analytic. For an example, consider the function

$$f(z) = \frac{\sin(z)}{z}$$

which for z = 0 is not properly defined: both numerator and denominator are zero, so strictly speaking any value is possible for the function. However, if we consider the power series expansion of the numerator and divide by the denominator for finite z, we have:

$$\sin(z) = z - \frac{1}{3!} z^3 + \frac{1}{5!} z^5 - \cdots \;\Rightarrow\; \frac{\sin(z)}{z} = 1 - \frac{1}{3!} z^2 + \frac{1}{5!} z^4 - \cdots, \quad z: \text{finite}$$

and the value of the last expression for z = 0 is unity. If we now assign the same value to the function for z = 0, that is, f(0) ≡ 1, then the apparent singularity at z = 0 is removed and the function becomes analytic in the neighborhood of this point, including the point itself.

As far as poles are concerned, the standard example of a pole of order m is the function

$$f(z) = \frac{1}{(z - z_0)^m}$$

Occasionally, the pole is a little harder to identify, as for example in the case:

$$f(z) = \frac{1}{\sin^m(z)}$$

which has a pole of order m at z = 0. To see this, consider the expansion of sin(z) near z = 0, which gives:

$$\sin(z) = z - \frac{1}{3!} z^3 + \cdots \;\Rightarrow\; \sin^m(z) = z^m \left( 1 - \frac{1}{3!} z^2 + \cdots \right)^m$$

$$f(z) = \frac{1}{z^m} \left( 1 - \frac{1}{3!} z^2 + \cdots \right)^{-m} \to \frac{1}{z^m} \quad \text{for } z \to 0$$

which is the behavior we expect for a pole of order m at z = 0.

An example of an essential singularity is the point z = 0 for the function

$$f(z) = e^{1/z} = 1 + \frac{1}{z} + \frac{1}{2!} \frac{1}{z^2} + \frac{1}{3!} \frac{1}{z^3} + \cdots$$

where we have used the familiar power series expansion of the exponential. In the above expansion, all negative powers of z appear, hence we have an infinite number of coefficients of the negative powers which are not zero (a Laurent expansion). For z → 0, each term z^(−m) gives a larger and larger contribution as m increases.


Another important consequence of the Laurent series expansion is that the integral of a function f(z) around a closed contour C[z0]¹ which encloses a singularity of the function at z0, can be evaluated using the coefficients of the Laurent series expansion. Assuming that the Laurent series converges uniformly for z inside a circular annulus around z0, we have:

$$f(z) = \sum_{n=-\infty}^{\infty} b_n (z - z_0)^n \;\Rightarrow\; \oint_{C[z_0]} f(z)\, dz = \sum_{n=-\infty}^{\infty} b_n \oint_{C[z_0]} (z - z_0)^n\, dz$$

but we have shown earlier, Eq. (4.4), that:

$$\oint_{C[z_0]} \frac{1}{(z - z_0)^n}\, dz = 2\pi i\, \delta_{1n}$$

and of course all the terms (z − z0)^n with n ≥ 0 give vanishing integrals because they are analytic, which leads to:

$$\oint_{C[z_0]} f(z)\, dz = 2\pi i\, b_{-1} = 2\pi i\, (\text{Residue at } z_0)$$

that is, the integral is equal to 2πi times the coefficient of the term (z − z0)^n with exponent n = −1, to which we give a special name, the “residue” at z = z0. This is known as the “residue theorem”. The usefulness of the theorem lies in the fact that the coefficients in the Laurent expansion may be determined by much simpler methods than by having to evaluate the integrals which appear in the general expression of Eq. (4.19); this was the case in the example of a Laurent series expansion that we discussed above.
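As a quick numerical illustration of the residue theorem, one can integrate the function e^(1/z) discussed above around the unit circle; its Laurent series has b−1 = 1, so the integral should come out as 2πi. This is a sketch under stated assumptions: the helper `circle_integral` and the choice of the unit circle are illustrative and not part of the text.

```python
import cmath
import math

def circle_integral(f, z0, radius, samples=4096):
    """Counter-clockwise integral of f over |z - z0| = radius (equally spaced samples)."""
    dtheta = 2.0 * math.pi / samples
    total = 0.0 + 0.0j
    for k in range(samples):
        w = radius * cmath.exp(1j * k * dtheta)   # w = z - z0 on the circle
        total += f(z0 + w) * 1j * w * dtheta      # dz = i * w * dtheta
    return total

# exp(1/z) = 1 + 1/z + 1/(2! z**2) + ..., so the residue at z0 = 0 is b_{-1} = 1.
integral = circle_integral(lambda z: cmath.exp(1.0 / z), 0.0, 1.0)
print(integral, 2j * math.pi)
```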

4.4 Integration by residues

The simplest application of the residue theorem involves simple poles, that is, integrands which have simple roots in the denominator. We assume that the integrand can be written as a ratio of two functions, f(z) and g(z) such that: f(z) is analytic inside the region enclosed by C and g(z) has simple roots at z = zk, k = 1, ..., n in this region. A simple root of g(z) at z = zk means that the function vanishes at this value of z but its first derivative does not (see also Problem 2). Near the roots we can expand the function g(z) in Taylor series as:

$$g(z) = g'(z_k)(z - z_k) + \frac{1}{2!} g''(z_k)(z - z_k)^2 + \cdots$$

since g(zk) = 0, where the dots represent higher order terms in (z − zk) which vanish much faster than the first order term as z → zk. We also know that because we are dealing with simple roots of g(z), g′(zk) ≠ 0. We can then write:

$$g(z) = g'(z_k)(z - z_k) \left[ 1 + \frac{1}{2!} \frac{g''(z_k)}{g'(z_k)} (z - z_k) + \cdots \right] \;\Rightarrow\; g(z) \approx g'(z_k)(z - z_k) \quad \text{for } z \to z_k$$

Figure 4.6: Illustration of contour deformation for the integration of a function with several simple poles at z = z1, . . . , zk in the closed contour C: the small circles centered at each pole have vanishing radii and the contributions from the straight segments cancel out.

¹ In the following we adopt the notation C[z0] to denote that the contour C encloses the point z0 at which the integrand has a singularity. Other features of the contour C are indicated as subscripts.
The integral of the ratio f(z)/g(z) over the contour C is then given by:

$$\oint_{C[z_k (k=1,\ldots,n)]} \frac{f(z)}{g(z)}\, dz = \sum_{k=1}^{n} \oint_{C[z_k]} \frac{f(z)}{g'(z_k)(z - z_k)}\, dz = 2\pi i \sum_{k=1}^{n} \left[ \frac{f(z_k)}{g'(z_k)} \right] \qquad (4.20)$$

where the final result comes from performing contour deformation as indicated in Fig. 4.6 and using the small circular paths with radii tending to zero around each simple root of the denominator, for which the result of Eq. (4.4) applies. Evidently, in this case the residues at z = zk (k = 1, . . . , n) come out to be f(zk)/g′(zk), which can be easily evaluated.

A generalization of this result is the calculation of the contribution from a pole of order m at z = z0, that is, a root of order m in the denominator. In this case, the function of interest can be written in general form as:

$$f(z) = \frac{b_{-m}}{(z - z_0)^m} + \frac{b_{-m+1}}{(z - z_0)^{m-1}} + \cdots + \frac{b_{-1}}{(z - z_0)} + b_0 + b_1 (z - z_0) + \cdots$$

that is, it has a Laurent expansion starting with the term of order −m. We define a new function h(z) by multiplying f(z) with the factor (z − z0)^m:

$$h(z) = (z - z_0)^m f(z) = b_{-m} + b_{-m+1}(z - z_0) + \cdots + b_{-1}(z - z_0)^{m-1} + b_0 (z - z_0)^m + \cdots$$

which is a Taylor series expansion for the function h(z), since it contains no negative powers of (z − z0), therefore the coefficient of the term of order m − 1 is the derivative of the same order evaluated at z = z0, with a factor of 1/(m − 1)! in front:

$$b_{-1} = \left[ \frac{1}{(m-1)!} \frac{d^{m-1}}{dz^{m-1}} h(z) \right]_{z = z_0}$$

or, equivalently, for a pole of order m the residue takes the form:

$$(\text{Residue at } z_0) = b_{-1} = \left[ \frac{1}{(m-1)!} \frac{d^{m-1}}{dz^{m-1}} \left( (z - z_0)^m f(z) \right) \right]_{z = z_0} \qquad (4.21)$$

With this result, we can generalize the residue theorem to any number n of singularities at z = zk (k = 1, . . . , n) within the contour C, each singularity being a pole of arbitrary order (but not an essential singularity, because in that case Eq. (4.21) cannot be used to evaluate the residue):

$$\oint_{C[z_k (k=1,\ldots,n)]} f(z)\, dz = 2\pi i \sum_{k=1}^{n} (\text{Residues in } C) \qquad (4.22)$$

These results are extremely useful in evaluating both complex integrals as well as integrals of real functions. In fact, they can be used to evaluate many real integrals by cleverly manipulating the expressions that appear in the integrand. Since this assertion cannot be described in general terms, we illustrate it by several representative examples.
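Before turning to the examples, the two residue formulas, Eq. (4.20) for simple poles and Eq. (4.21) for a pole of order m, can be checked against direct numerical contour integration. The sketch below rests on assumptions: the helper `contour_integral` and the two test integrands, exp(z)/(z^2 + 1) and exp(z)/z^3, are illustrative choices that do not appear in the text.

```python
import cmath
import math

def contour_integral(f, center, radius, samples=4096):
    """Counter-clockwise integral of f over a circle, by equally spaced sampling."""
    dtheta = 2.0 * math.pi / samples
    total = 0.0 + 0.0j
    for k in range(samples):
        w = radius * cmath.exp(1j * k * dtheta)
        total += f(center + w) * 1j * w * dtheta
    return total

# Simple poles, Eq. (4.20): f(z)/g(z) with f(z) = exp(z), g(z) = z**2 + 1,
# so g'(z) = 2z and the simple roots of g are z = +i and z = -i.
poles = [1j, -1j]
residues_simple = [cmath.exp(zk) / (2.0 * zk) for zk in poles]   # f(zk)/g'(zk)
lhs_simple = contour_integral(lambda z: cmath.exp(z) / (z * z + 1.0), 0.0, 2.0)
rhs_simple = 2j * math.pi * sum(residues_simple)

# Pole of order m = 3, Eq. (4.21): f(z) = exp(z)/z**3, so h(z) = exp(z)
# and b_{-1} = h''(0)/2! = 1/2.
residue_triple = 0.5
lhs_triple = contour_integral(lambda z: cmath.exp(z) / z ** 3, 0.0, 1.0)
rhs_triple = 2j * math.pi * residue_triple

print(lhs_simple, rhs_simple)
print(lhs_triple, rhs_triple)
```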

Example 4.4: We consider the following real integral:

$$I = \int_0^\infty \frac{1}{x^2 + 1}\, dx \qquad (4.23)$$

which we will evaluate as one part of a complex integral over a closed contour C which is shown in Fig. 4.7:

$$I + I' + I_R = \oint_C \frac{1}{z^2 + 1}\, dz$$

Figure 4.7: Contour for evaluating the real integral of Eq. (4.23).

As is evident from Fig. 4.7, the real integral I corresponds to the integral along the part of the contour C extending from x = 0 to x → ∞. The other two parts of the contour, along the negative real axis (x → −∞ to x = 0) and along the semicircle of radius R centered at the origin, give rise to the contributions denoted as I′ and IR respectively. From the residue theorem we have for the total contribution to the contour C from its different parts:

$$I + I' + I_R = 2\pi i \sum (\text{Residues in } C) = 2\pi i \frac{1}{2i} = \pi$$

because there is only one singularity of the integrand contained in the contour C at z = i and the value of the residue for this singularity is 1/(2i), as can be easily derived from Eq. (4.20) with f(z) = 1 and g(z) = z^2 + 1. But the integrand along the real axis is an even function of x, that is, it is the same for x → −x, which leads to:

$$I' = \int_{-\infty}^{0} \frac{1}{x^2 + 1}\, dx = \int_{\infty}^{0} \frac{1}{(-x)^2 + 1}\, d(-x) = \int_0^\infty \frac{1}{x^2 + 1}\, dx = I$$

Moreover, the integral IR over the large semicircle with radius R → ∞ vanishes, because the integrand goes to zero faster than 1/R. To justify this claim, we note that we can write the variable of integration z and the differential dz over the semicircle as:

$$z = R e^{i\theta}, \quad 0 \le \theta \le \pi \;\Rightarrow\; dz = R i e^{i\theta}\, d\theta$$

and with these, the integral IR in the limit R → ∞ takes the form:

$$\lim_{R \to \infty} (I_R) = \lim_{R \to \infty} \left[ \int_0^\pi \frac{1}{R^2 e^{2i\theta} + 1}\, i R e^{i\theta}\, d\theta \right] = \lim_{R \to \infty} \frac{i}{R} \left[ \int_0^\pi \frac{e^{i\theta}}{e^{2i\theta} + \frac{1}{R^2}}\, d\theta \right] = 0$$

since the integral over θ gives a finite value. Having established that I′ = I and IR → 0 for R → ∞, we conclude that the value of the original integral is

$$I = \int_0^\infty \frac{1}{x^2 + 1}\, dx = \frac{\pi}{2}$$

a result we had obtained in chapter 1, Eq. (1.41), using an inspired change of variables; here the same result was derived by simply applying the residue theorem.
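The role of the semicircle contribution IR can be made concrete with a short numerical sketch (the function name `semicircle_integral`, the midpoint rule, and the radii 10, 100, 1000 are illustrative assumptions). For any finite R > 1 the real-axis part from −R to R, which equals 2 arctan(R), plus IR adds up to π, while |IR| shrinks roughly like 2/R:

```python
import cmath
import math

def semicircle_integral(R, samples=20000):
    """Midpoint-rule value of the integral of 1/(z**2 + 1) over
    z = R*exp(i*theta), 0 <= theta <= pi."""
    dtheta = math.pi / samples
    total = 0.0 + 0.0j
    for k in range(samples):
        theta = (k + 0.5) * dtheta
        z = R * cmath.exp(1j * theta)
        total += 1j * z * dtheta / (z * z + 1.0)   # dz = i*z*dtheta
    return total

results = []
for R in (10.0, 100.0, 1000.0):
    I_R = semicircle_integral(R)
    real_axis = 2.0 * math.atan(R)      # integral from -R to R of 1/(x**2 + 1)
    results.append((R, abs(I_R), abs(real_axis + I_R - math.pi)))
    print(results[-1])
```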

Before we proceed with further examples, we will generalize what we found above for the behavior of the integral IR over a semicircle of radius R centered at the origin. Consider the case where the integrand is the ratio of two polynomials of degree p and q, respectively:

$$I = \int_{x_i}^{x_f} \frac{a_0 + a_1 x + a_2 x^2 + \cdots + a_p x^p}{b_0 + b_1 x + b_2 x^2 + \cdots + b_q x^q}\, dx$$

and we will try to evaluate this integral by turning it into a complex integral over a contour C one part of which can be identified as the real integral I:

$$\int_C \frac{a_0 + a_1 z + a_2 z^2 + \cdots + a_p z^p}{b_0 + b_1 z + b_2 z^2 + \cdots + b_q z^q}\, dz$$

Let us assume that the contour C contains an arc of radius R centered at the origin, which contributes IR to the value of the contour integral. Along this arc, the integration variable z and the differential dz take the form:

$$z = R e^{i\theta}, \quad dz = i R e^{i\theta}\, d\theta, \quad \theta_1 \le \theta \le \theta_2$$

which, when substituted in the complex integral, leads to:

$$I_R = \int_{\theta_1}^{\theta_2} \frac{a_0 + a_1 R e^{i\theta} + a_2 R^2 e^{i2\theta} + \cdots + a_p R^p e^{ip\theta}}{b_0 + b_1 R e^{i\theta} + b_2 R^2 e^{i2\theta} + \cdots + b_q R^q e^{iq\theta}}\, i R e^{i\theta}\, d\theta$$

Taking common factors R^p e^(ipθ) and R^q e^(iqθ) from the numerator and the denominator and keeping only the dominant terms (because eventually we will let R → ∞) we arrive at:

$$I_R = \frac{a_p}{b_q} R^{p-q+1} \int_{\theta_1}^{\theta_2} e^{i(p-q+1)\theta}\, i\, d\theta = \frac{a_p}{b_q} R^{p-q+1} \frac{1}{p - q + 1} \left[ e^{i(p-q+1)\theta_2} - e^{i(p-q+1)\theta_1} \right] \qquad (4.24)$$

where in order to obtain the last expression we must assume that p − q ≠ −1. Typically, we want this integral to give a vanishing contribution when R → ∞, which is useful when one of the limits of the original integral is infinite. For this to happen, we need to have

$$p - q + 1 \le -1 \;\Rightarrow\; q \ge p + 2$$

so that the factor R^(p−q+1) makes the value of IR tend to zero. The conclusion from this analysis is that, when the integrand is a ratio of two polynomials, the order of the polynomial in the denominator must be larger by at least two than the order of the polynomial in the numerator, to guarantee that IR → 0 as R → ∞.

Figure 4.8: Two possible contour choices, labeled Path 1 (C1) and Path 2 (C2), for evaluating the real integral of Eq. (4.25).

Example 4.5: As a second example, we consider a similar real integral, for which a simple change of variables would not work. The residue theorem, however, can still be applied with equal ease. The real integral we want to evaluate is:

$$I = \int_0^\infty \frac{1}{x^8 + 1}\, dx \qquad (4.25)$$

which, as in the previous example, we turn into a complex integral over a closed contour C1, shown in Fig. 4.8:

$$I + I' + I_R = \oint_{C_1} \frac{1}{z^8 + 1}\, dz$$

This contour is chosen so that one part of it lies along the positive real axis (from x = 0 to x → ∞), which gives the integral I that we want to evaluate. There are two more parts to the contour, one along the negative real axis leading to the integral I′, and one along the semicircle of radius R centered at the origin, leading to the integral IR. We notice that the integrand is an even function under x → −x, so the integral on the negative real axis (from x → −∞ to x = 0) is the same as I, that is, in the notation of Fig. 4.8, I′ = I. We can also argue, for the same reasons as in the previous example, that IR → 0 for R → ∞, therefore

$$I + I' + \lim_{R \to \infty} (I_R) = 2I = 2\pi i \sum (\text{Residues in } C_1)$$

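For the residues, we use the expression of Eq. (4.20) since the denominator g(z) = z^8 + 1 has only simple roots inside the contour C1 and the numerator is simply unity, which for each residue leads to:

$$\frac{1}{8 z_k^7}, \quad \text{for } z_k^8 = -1 \;\Rightarrow\; z_k = e^{i(2k+1)\pi/8}, \quad k = 0, \ldots, 7$$

Of those residues the ones with k = 0, 1, 2, 3 are within C1. Moreover, we note from the above relations that

$$z_k^8 = -1 \;\Rightarrow\; z_k^7 = -\frac{1}{z_k} \;\Rightarrow\; \frac{1}{8 z_k^7} = -\frac{z_k}{8}$$

which leads to the following result for the original integral:

$$2I = 2\pi i\, \frac{-1}{8} \left[ e^{i\pi/8} + e^{i3\pi/8} + e^{i5\pi/8} + e^{i7\pi/8} \right]$$

and using the trigonometric relations

$$\cos\left(\frac{7\pi}{8}\right) = -\cos\left(\frac{\pi}{8}\right), \quad \sin\left(\frac{7\pi}{8}\right) = \sin\left(\frac{\pi}{8}\right)$$

$$\cos\left(\frac{5\pi}{8}\right) = -\cos\left(\frac{3\pi}{8}\right), \quad \sin\left(\frac{5\pi}{8}\right) = \sin\left(\frac{3\pi}{8}\right)$$

we arrive at the final result

$$I = \frac{\pi}{4} \left( \sin\left(\frac{\pi}{8}\right) + \sin\left(\frac{3\pi}{8}\right) \right) = \frac{\pi}{4} \left( \sin\left(\frac{\pi}{8}\right) + \cos\left(\frac{\pi}{8}\right) \right)$$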
Let us try to evaluate the same integral with the use of contour C2: As for the previous path, the part on the x axis (from x = 0 to x → ∞) gives the desired integral I, while the path along the circular arc gives a vanishing contribution by the same arguments as before, IR → 0 for R → ∞. For the last part of this contour, along the line at an angle π/4 relative to the x axis, we can write the variable z and the differential dz as:

$$z = r e^{i\pi/4}, \quad dz = e^{i\pi/4}\, dr$$

which leads to the following contribution:

$$I' = \int_{r=\infty}^{0} \frac{1}{(r e^{i\pi/4})^8 + 1}\, d(r e^{i\pi/4}) = -e^{i\pi/4} \int_0^\infty \frac{1}{r^8 + 1}\, dr = -e^{i\pi/4} I$$

We therefore have for the entire contour:

$$I + I' + \lim_{R \to \infty} (I_R) = I (1 - e^{i\pi/4}) = 2\pi i \sum (\text{Residues in } C_2)$$

and since the only singularity within C2 is at z0 = exp(iπ/8), using the relations we established above for the value of the residues, we find the final answer,

$$I (1 - e^{i\pi/4}) = 2\pi i\, \frac{-1}{8}\, e^{i\pi/8} \;\Rightarrow\; I = -\frac{\pi i}{4} \frac{e^{i\pi/8}}{1 - e^{i\pi/4}}$$

or, after using Euler’s formula for the complex exponentials,

$$I = -\frac{\pi i}{4} \frac{1}{e^{-i\pi/8} - e^{i\pi/8}} = \frac{\pi}{8 \sin\frac{\pi}{8}} = \frac{\pi}{4} \left( \sin\frac{\pi}{8} + \cos\frac{\pi}{8} \right)$$

which is identical to the previous answer from contour C1, as it should be.
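The agreement between the two contour choices can also be confirmed numerically. The sketch below is illustrative only: the variable names, the helper `integral_0_to_inf`, and the change of variables x = t/(1 − t) used for the direct quadrature are assumptions, not part of the text. It evaluates the residue sums for Path 1 and Path 2 and compares them with a direct numerical estimate of the real integral.

```python
import cmath
import math

# Poles of 1/(z**8 + 1): z_k = exp(i*(2k+1)*pi/8); the residue at z_k is -z_k/8.
poles = [cmath.exp(1j * (2 * k + 1) * math.pi / 8.0) for k in range(8)]
upper = [z for z in poles if z.imag > 0.0]            # the four poles inside C1

# Path 1: 2I = 2*pi*i * (sum of residues in C1)
path1 = (2j * math.pi * sum(-z / 8.0 for z in upper) / 2.0).real

# Path 2: I*(1 - exp(i*pi/4)) = 2*pi*i * (residue at z0 = exp(i*pi/8))
z0 = cmath.exp(1j * math.pi / 8.0)
path2 = (2j * math.pi * (-z0 / 8.0) / (1.0 - cmath.exp(1j * math.pi / 4.0))).real

closed_form = math.pi / (8.0 * math.sin(math.pi / 8.0))

def integral_0_to_inf(g, samples=200000):
    """Midpoint rule after mapping x = t/(1 - t), 0 < t < 1."""
    h = 1.0 / samples
    total = 0.0
    for k in range(samples):
        t = (k + 0.5) * h
        x = t / (1.0 - t)
        total += g(x) / (1.0 - t) ** 2 * h      # dx = dt / (1 - t)**2
    return total

numeric = integral_0_to_inf(lambda x: 1.0 / (x ** 8 + 1.0))
print(path1, path2, closed_form, numeric)
```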

Example 4.6: A slightly more complicated situation arises when the integrand has poles on the real axis, as for example in the case of the integral

$$I = \int_0^\infty \frac{1}{x^8 - 1}\, dx \qquad (4.26)$$

with which we deal in the same fashion as before, that is, by turning it into a complex integral over a closed contour, a part of which can be identified as the desired real integral. Two possible contour choices are shown in Fig. 4.9. In the following we concentrate on the first choice, which we will call contour C1:

$$I + I' + I_R + I_a + I_b = \oint_{C_1} \frac{1}{z^8 - 1}\, dz$$

Figure 4.9: Two possible contour choices, labeled Path 1 (C1) and Path 2 (C2), for the evaluation of the real integral in Eq. (4.26).

Note that in addition to the usual contributions from the positive real axis (which corresponds to the desired integral I), from the negative real axis (which corresponds to I′ and as in the previous two examples I′ = I), and from the semicircle centered at the origin with radius R (which, for the same reasons as in the previous examples vanishes for R → ∞), we also have two more contributions from the semicircles with vanishing radii, centered at x = ±1. These contributions are required in order to avoid the singularities of the integrand at z = ±1. This situation is analogous to the example we had discussed in chapter 1, for the integral

$$\int_0^\infty \frac{1}{x^2 - 1}\, dx$$

where we had argued that approaching the root of the denominator at x = 1 symmetrically from above and below is required in order to produce a well defined value for the integral (we had called this approach taking the principal value of the integral). This is precisely what the two semicircles with vanishing radii lead to: For the semicircle centered at x = +1 we have:

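$$C_a[+1]: \quad z_a = 1 + \epsilon_a e^{i\theta_a}, \quad \theta_a \text{ running from } \pi \text{ to } 0 \;\Rightarrow$$

$$I_a = \int_{C_a[+1]} \frac{1}{z_a^8 - 1}\, dz_a = -\int_0^\pi \frac{1}{(1 + \epsilon_a e^{i\theta_a})^8 - 1}\, \epsilon_a i e^{i\theta_a}\, d\theta_a$$

and similarly, for the semicircle centered at x = −1 we have:

$$C_b[-1]: \quad z_b = -1 + \epsilon_b e^{i\theta_b}, \quad \theta_b \text{ running from } \pi \text{ to } 0 \;\Rightarrow$$

$$I_b = \int_{C_b[-1]} \frac{1}{z_b^8 - 1}\, dz_b = -\int_0^\pi \frac{1}{(-1 + \epsilon_b e^{i\theta_b})^8 - 1}\, \epsilon_b i e^{i\theta_b}\, d\theta_b$$

so in the limit εi → 0, i = a, b the integrals I and I′ are evaluated in the principal value sense:

$$I = \lim_{\epsilon_a \to 0} \left[ \int_0^{1 - \epsilon_a} \frac{1}{x^8 - 1}\, dx + \int_{1 + \epsilon_a}^{\infty} \frac{1}{x^8 - 1}\, dx \right]$$

$$I' = \lim_{\epsilon_b \to 0} \left[ \int_{-\infty}^{-1 - \epsilon_b} \frac{1}{x^8 - 1}\, dx + \int_{-1 + \epsilon_b}^{0} \frac{1}{x^8 - 1}\, dx \right]$$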

We can expand the denominators in the integrands of Ia and Ib with the use of the binomial expansion: for the pole at z = +1 we have

$$z_a^8 = (1 + \epsilon_a e^{i\theta_a})^8 = 1 + 8 \epsilon_a e^{i\theta_a} + \cdots \;\Rightarrow\; z_a^8 - 1 \approx 8 \epsilon_a e^{i\theta_a}$$

while for the pole at z = −1 we have

$$z_b^8 = (-1 + \epsilon_b e^{i\theta_b})^8 = (1 - \epsilon_b e^{i\theta_b})^8 = 1 - 8 \epsilon_b e^{i\theta_b} + \cdots \;\Rightarrow\; z_b^8 - 1 \approx -8 \epsilon_b e^{i\theta_b}$$

where we have neglected higher order terms, since we are interested in the limit εi → 0, i = a, b. When these expansions are substituted in the integrals Ia and Ib, we find:

$$\lim_{\epsilon_a \to 0} (I_a) = -i\pi \frac{1}{8}, \quad \lim_{\epsilon_b \to 0} (I_b) = -i\pi \frac{-1}{8}$$

that is, the two contributions from the infinitesimal semicircles centered at x = ±1 cancel each other. Putting all these results together, we arrive at the final result:

$$I + I' + \lim_{\epsilon_a \to 0} (I_a) + \lim_{\epsilon_b \to 0} (I_b) + \lim_{R \to \infty} (I_R) = 2I = 2\pi i \sum (\text{Residues in } C_1) = 2\pi i \frac{1}{8} \left[ e^{i\pi/4} + e^{i\pi/2} + e^{i3\pi/4} \right]$$

because the only singularities enclosed by the entire contour C1 are the simple poles at

$$z_1 = e^{i\pi/4}, \quad z_2 = e^{i2\pi/4}, \quad z_3 = e^{i3\pi/4}$$

all of which are solutions of the equation

$$z_k^8 = 1 \;\Rightarrow\; \frac{1}{z_k^7} = z_k, \quad k = 1, 2, 3$$

Using the identities:

$$\cos\left(\frac{\pi}{4}\right) = -\cos\left(\frac{3\pi}{4}\right), \quad \sin\left(\frac{\pi}{4}\right) = \sin\left(\frac{3\pi}{4}\right), \quad e^{i\pi/2} = i$$

we get for the value of I:

$$I = -\frac{\pi}{8} (1 + \sqrt{2}) \qquad (4.27)$$
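The principal-value result of Eq. (4.27) can be checked by direct quadrature. The sketch below is a numerical illustration under assumptions: the helper names `g` and `midpoint`, the splitting of the range at x = 2, and the pairing of the points 1 − u and 1 + u (so that the divergent parts on either side of the pole cancel symmetrically, as the principal value requires) are choices made here and are not taken from the text.

```python
import math

def g(x):
    return 1.0 / (x ** 8 - 1.0)

def midpoint(func, a, b, samples=100000):
    """Midpoint-rule integral of func from a to b."""
    h = (b - a) / samples
    return sum(func(a + (k + 0.5) * h) for k in range(samples)) * h

# Principal value on [0, 2]: pair the points 1 - u and 1 + u so the pole at x = 1 cancels.
pv_near_pole = midpoint(lambda u: g(1.0 - u) + g(1.0 + u), 0.0, 1.0)
# Tail from 2 to infinity, mapped to 0 < t < 1 by x = 2 + t/(1 - t).
tail = midpoint(lambda t: g(2.0 + t / (1.0 - t)) / (1.0 - t) ** 2, 0.0, 1.0)

numeric = pv_near_pole + tail
residue_result = -math.pi / 8.0 * (1.0 + math.sqrt(2.0))
print(numeric, residue_result)
```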

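In the above example, it is instructive to write the expressions for Ia and Ib in a different format:

$$\lim_{\epsilon_a \to 0} (I_a) = 2\pi i \left( \frac{\pi}{2\pi} \right) (-1) (\text{Residue at } z = 1)$$

$$\lim_{\epsilon_b \to 0} (I_b) = 2\pi i \left( \frac{\pi}{2\pi} \right) (-1) (\text{Residue at } z = -1)$$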

These expressions can be rationalized as follows: The integral over a circular arc of vanishing radius centered at z = z0 where the integrand f(z) has a simple pole, is equal to 2πi times the fraction of the arc in units of 2π times the residue of the function at the simple pole; an extra factor of −1 accompanies the residue when the arc is traversed in the clockwise sense.

This is actually a specific application of the following more general result:
Consider a function f(z) analytic on an annulus centered at z0, with a simple pole at z = z0; then

$$\lim_{\epsilon \to 0} \int_{C_{\epsilon,\varphi}[z_0]} f(z)\, dz = 2\pi i \left( \frac{\varphi}{2\pi} \right) (\text{Residue at } z = z_0) \qquad (4.28)$$

where Cε,ϕ[z0] is a circular arc of radius ε and angular width ϕ, which lies entirely within the annulus centered at z0:

$$C_{\epsilon,\varphi}[z_0]: \quad z = z_0 + \epsilon e^{i\theta}, \quad \theta_0 \le \theta \le \theta_0 + \varphi$$

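To prove this statement, we note first that since f(z) is analytic on an annulus around z0 we can use its Laurent series expansion in the neighborhood of z0 assuming that it converges uniformly in the annulus:

$$f(z) = \sum_{n=-\infty}^{\infty} b_n (z - z_0)^n$$

Since the function has a simple pole at z0, bm = 0 for m ≤ −2 which leads to:

$$\int_{C_{\epsilon,\varphi}[z_0]} f(z)\, dz = \sum_{n=-\infty}^{\infty} b_n \int_{\theta_0}^{\theta_0 + \varphi} i \epsilon e^{i\theta} \epsilon^n e^{in\theta}\, d\theta = \sum_{n=-1}^{\infty} b_n I_n$$

where the integral In is defined as:

$$I_n = \epsilon^{n+1}\, i \int_{\theta_0}^{\theta_0 + \varphi} e^{i(n+1)\theta}\, d\theta$$

and evaluation of this gives:

$$\text{for } n = -1 \;\to\; I_{-1} = i\varphi$$

$$\text{for } n \ge 0 \;\to\; I_n = \epsilon^{n+1} \frac{1}{n+1} e^{i\theta_0 (n+1)} \left( e^{i\varphi(n+1)} - 1 \right) \qquad (4.29)$$

In the limit ε → 0 the terms with n ≠ −1 vanish, and only the term n = −1 survives, leading to

$$\lim_{\epsilon \to 0} \int_{C_{\epsilon,\varphi}[z_0]} f(z)\, dz = i\varphi\, b_{-1} = i\varphi\, (\text{Residue at } z = z_0)$$

which proves the desired formula, Eq. (4.28).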

Note that if ϕ = 2π this formula simply reduces to the residue theorem. We emphasize, however, that the formula of Eq. (4.28) applies to simple poles only, whereas the residue theorem is valid for a pole of any order. To see where this comes from, we observe that for an arbitrary value of ϕ ≠ 2π the expression on the right-hand side of Eq. (4.29) vanishes only for n ≥ 0, because for these values of n the power of ε is positive and the limit ε → 0 produces a zero value for In. In contrast to this, when ϕ = 2π the expression on the right-hand side of Eq. (4.29) vanishes identically for any value of n ≠ −1, because of the identity

$$e^{i 2\pi (n+1)} = 1$$

which is a direct consequence of the Euler formula. Thus, for ϕ = 2π we can afford to have terms with n < −1 in the Laurent expansion, which implies a pole of order higher than unity, since the integrals In produced by these terms will vanish and we will be left with the value of the residue.

We also note that if the circular arc is traversed in the counter-clockwise sense, the angle θ increases, therefore ϕ > 0, whereas if it is traversed in the clockwise sense, ϕ < 0 which produces the (−1) factor mentioned earlier.

The formula of Eq. (4.28) allows us to skip simple poles that happen to lie on the path of integration, which is equivalent to evaluating integrals by taking their principal value, as already mentioned in the example above.
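The arc formula of Eq. (4.28) can be tested on the small semicircle used for Ia in Example 4.6. This is a minimal numerical sketch, with assumptions labeled: the helper name `arc_integral`, the midpoint rule, and the small but finite radius ε = 10^(−4) standing in for the limit ε → 0 are illustrative choices. The arc starts at θ0 = π and has ϕ = −π (clockwise), so the expected value is iϕ times the residue 1/8.

```python
import cmath
import math

def arc_integral(f, z0, eps, theta0, phi, samples=20000):
    """Midpoint-rule integral of f over z = z0 + eps*exp(i*theta),
    theta0 <= theta <= theta0 + phi (phi < 0 means clockwise)."""
    dtheta = phi / samples
    total = 0.0 + 0.0j
    for k in range(samples):
        theta = theta0 + (k + 0.5) * dtheta
        w = eps * cmath.exp(1j * theta)          # w = z - z0 on the arc
        total += f(z0 + w) * 1j * w * dtheta     # dz = i*w*dtheta
    return total

def f(z):
    return 1.0 / (z ** 8 - 1.0)    # simple pole at z = 1 with residue 1/8

# Clockwise semicircle above the pole at z = 1, as for I_a in Example 4.6.
I_a = arc_integral(f, 1.0, 1e-4, math.pi, -math.pi)
expected = 1j * (-math.pi) * (1.0 / 8.0)
print(I_a, expected)
```

The small remaining difference is of order ε and comes from the n ≥ 0 terms of Eq. (4.29), which vanish only in the limit.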


Another very useful general criterion for when the integrals over circular arcs of infinite radius vanish is the following:

Jordan’s Lemma: Consider the integral

$$
I_R = \int_C e^{i\lambda z} f(z)\, dz \qquad (4.30)
$$

where λ is a real, positive constant, C is a circular arc of radius R centered at the origin and lying on the upper half of the complex plane, that is,

$$
C: \quad z = R e^{i\theta}, \quad 0 \leq \theta_1 \leq \theta \leq \theta_2 \leq \pi
$$

and the function f(z) is bounded on C,

$$
|f(z)| \leq F, \quad F: \text{finite}
$$

then the integral $I_R$ over the circular arc is bounded, and for F → 0 it vanishes.

This statement can be proved as follows: on the circular arc, extending from θ1 to θ2, we have:

$$
z = R e^{i\theta}, \quad dz = R i e^{i\theta}\, d\theta
$$

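and the absolute value of the integral over this arc takes the form:

$$
|I_R| = \left| \int_{\theta_1}^{\theta_2} e^{i\lambda (x + iy)} f(z)\, i R e^{i\theta}\, d\theta \right| \leq \int_{\theta_1}^{\theta_2} \left| e^{i\lambda (x + iy)} f(z)\, i R e^{i\theta} \right| d\theta
$$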

where the last relation is derived from the triangle inequality. But |i| = |exp(iλx)| = |exp(iθ)| = 1 and y = R sin(θ), which lead to:

$$
|I_R| \leq F R \int_{\theta_1}^{\theta_2} e^{-\lambda R \sin(\theta)}\, d\theta
$$

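Assume θ1 = 0, θ2 = π, which covers the entire range of allowed values for the situation we are considering (any shorter arc gives a smaller value, since the integrand is positive). Then, since the integrand in the last expression is a symmetric function of θ with respect to π/2, we can evaluate it only in the range 0 ≤ θ ≤ π/2 and multiply it by a factor of 2:

$$
\sin(\pi - \theta) = \sin(\theta) \Rightarrow \int_0^{\pi} e^{-\lambda R \sin(\theta)}\, d\theta = 2 \int_0^{\pi/2} e^{-\lambda R \sin(\theta)}\, d\theta
$$

but in this range we have:

$$
0 \leq \theta \leq \frac{\pi}{2}: \quad \sin(\theta) \geq \frac{2\theta}{\pi}
$$

which can be easily shown by a graphical argument, and if we replace sin(θ) in the integrand by 2θ/π we will get an upper bound for the original value:

$$
0 \leq \theta \leq \frac{\pi}{2}: \quad -\sin(\theta) \leq -\frac{2\theta}{\pi} \Rightarrow \int_0^{\pi/2} e^{-\lambda R \sin(\theta)}\, d\theta \leq \int_0^{\pi/2} e^{-\lambda R 2\theta/\pi}\, d\theta
$$

With all this we obtain:

$$
|I_R| \leq 2 F R \int_0^{\pi/2} e^{-\lambda R 2\theta/\pi}\, d\theta
$$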

The following change of variables

$$
t = \frac{\lambda R\, 2\theta}{\pi}
$$

when inserted in the integral gives:

$$
|I_R| \leq \frac{F\pi}{\lambda} \int_0^{\lambda R} e^{-t}\, dt = \frac{F\pi}{\lambda} \left( 1 - e^{-\lambda R} \right)
$$

From this result, we obtain in the limit of R → ∞

$$
R \to \infty \Rightarrow |I_R| \leq \frac{F\pi}{\lambda}
$$

which shows that the integral $I_R$ is bounded since F is finite, and in the limit F → 0 it vanishes. Note that Jordan’s Lemma can be applied for λ a real, negative constant as well, only then the path of integration must lie on the lower half of the complex plane (that is, y ≤ 0).
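The bound can be checked numerically. The sketch below is only an illustration, not part of the proof: it takes F = 1, picks the value λ = 2 arbitrarily, and uses a hypothetical helper `arc_factor` to estimate R times the integral of exp(−λR sin θ) over 0 ≤ θ ≤ π with a midpoint rule, comparing it with (π/λ)(1 − exp(−λR)).

```python
import math

def arc_factor(lam, R, n=20000):
    """Midpoint-rule estimate of R * integral_0^pi exp(-lam*R*sin(theta)) dtheta."""
    h = math.pi / n
    total = sum(math.exp(-lam * R * math.sin((k + 0.5) * h)) for k in range(n))
    return R * total * h

def jordan_bound(lam, R):
    """Right-hand side (pi/lam) * (1 - exp(-lam*R)) derived above, with F = 1."""
    return math.pi / lam * (1.0 - math.exp(-lam * R))

lam = 2.0  # illustrative value of lambda (an assumption of this sketch)
results = {R: (arc_factor(lam, R), jordan_bound(lam, R)) for R in (1.0, 10.0, 100.0)}
for R, (value, bound) in results.items():
    print(f"R = {R:6.1f}   R*integral = {value:.6f}   bound = {bound:.6f}")
```

For large R the estimate should settle near 2/λ, which stays below the limiting bound π/λ; the factor R in front of the integral is compensated by the exponential decay of the integrand.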

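Example 4.7: To illustrate the usefulness of Jordan’s Lemma, we consider the following example of a real integral which can be evaluated by contour integration:

$$
I = \int_{-\infty}^{\infty} \frac{\cos(\lambda x)}{x^2 - a^2}\, dx \qquad (4.31)
$$

with a, λ real positive constants. First, note that we can rewrite this integral in terms of two other integrals:

$$
I = \frac{1}{2} \int_{-\infty}^{\infty} \frac{e^{i\lambda x} + e^{-i\lambda x}}{x^2 - a^2}\, dx = \frac{1}{2} I_1 + \frac{1}{2} I_2
$$

$$
I_1 = \int_{-\infty}^{\infty} \frac{e^{i\lambda x}}{x^2 - a^2}\, dx, \quad I_2 = \int_{-\infty}^{\infty} \frac{e^{-i\lambda x}}{x^2 - a^2}\, dx
$$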


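Figure 4.10: Contours for evaluating the real integral of Eq. (4.31) using Jordan’s Lemma. [Two panels: the contour for $I_1$, closed by the semicircle $I_{R+}$ on the upper half plane, with small semicircles $I_a$, $I'_a$ around the poles on the real axis; and the contour for $I_2$, closed by the semicircle $I_{R-}$ on the lower half plane, with small semicircles $\tilde{I}_a$, $\tilde{I}'_a$.]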

and $I_1$, $I_2$ can be considered as parts of contour integrals on the complex plane which satisfy the conditions of Jordan’s Lemma for the contours shown in Fig. 4.10. For $I_1$, we perform the integration on a contour skipping over the poles at x = ±a on the real axis and closing by a semicircle lying on the upper half of the complex plane, because in this case the exponential in the integrand involves the real constant λ > 0 so that Jordan’s Lemma is satisfied for y ≥ 0 for the integral $I_{R+}$. This contour integration leads to:

$$
I_1 + \lim_{\epsilon_a \to 0}(I_a) + \lim_{\epsilon_a \to 0}(I'_a) + \lim_{R \to \infty}(I_{R+}) = 0 \Rightarrow I_1 = -\lim_{\epsilon_a \to 0}(I_a) - \lim_{\epsilon_a \to 0}(I'_a)
$$

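where the integral $I_{R+}$ on the upper half plane is eliminated by Jordan’s Lemma. The other two integrals at the singularities on the x axis give:

$$
\lim_{\epsilon_a \to 0}(I_a) = -\frac{1}{2}(2\pi i)(\text{Residue at } z = -a) = \frac{\pi i}{2a} e^{-i\lambda a}
$$

$$
\lim_{\epsilon_a \to 0}(I'_a) = -\frac{1}{2}(2\pi i)(\text{Residue at } z = +a) = -\frac{\pi i}{2a} e^{i\lambda a}
$$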

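For $I_2$, we perform the integration on a contour skipping over the poles at x = ±a on the real axis and closing by a semicircle on the lower half of the complex plane, in order to satisfy Jordan’s Lemma for y ≤ 0 for the integral $I_{R-}$, because in this case the exponential in the integral involves the real constant −λ < 0. Denoting by $\tilde{I}_a$, $\tilde{I}'_a$ the contributions of the small semicircles of this contour, as labeled in Fig. 4.10, this contour integration leads to:

$$
I_2 + \lim_{\epsilon_a \to 0}(\tilde{I}_a) + \lim_{\epsilon_a \to 0}(\tilde{I}'_a) + \lim_{R \to \infty}(I_{R-}) = 0 \Rightarrow I_2 = -\lim_{\epsilon_a \to 0}(\tilde{I}_a) - \lim_{\epsilon_a \to 0}(\tilde{I}'_a)
$$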

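where the integral $I_{R-}$ on the lower half plane is eliminated by Jordan’s Lemma. The other two integrals at the singularities on the x axis, labeled $\tilde{I}_a$ and $\tilde{I}'_a$ in Fig. 4.10, involve the residues of $e^{-i\lambda z}/(z^2 - a^2)$ and give:

$$
\lim_{\epsilon_a \to 0}(\tilde{I}_a) = \frac{1}{2}(2\pi i)(\text{Residue at } z = -a) = -\frac{\pi i}{2a} e^{i\lambda a}
$$

$$
\lim_{\epsilon_a \to 0}(\tilde{I}'_a) = \frac{1}{2}(2\pi i)(\text{Residue at } z = +a) = \frac{\pi i}{2a} e^{-i\lambda a}
$$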

In both cases, the integrals around the singularities at x = ±a were evaluated in the usual way, using the results we discussed above for skipping simple poles. This is equivalent to evaluating the integrals $I_1$ and $I_2$ in the principal value sense. Note that for $I_a$ and $I'_a$ the semicircle of radius $\epsilon_a \to 0$ is traversed in the clockwise sense, while for $\tilde{I}_a$ and $\tilde{I}'_a$ the semicircle is traversed in the counter-clockwise sense.

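Combining the results we obtain for the value of the original integral:

$$
I = \frac{1}{2}\left[ I_1 + I_2 \right] = -\frac{1}{2} \left[ \lim_{\epsilon_a \to 0}(I_a) + \lim_{\epsilon_a \to 0}(I'_a) + \lim_{\epsilon_a \to 0}(\tilde{I}_a) + \lim_{\epsilon_a \to 0}(\tilde{I}'_a) \right] = -\frac{\pi}{a} \sin(\lambda a)
$$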

Example 4.8: We look at a different integral where Jordan’s Lemma comes again into play: consider the real integral

$$
I = \int_0^{\infty} \left( \frac{\sin(x)}{x} \right)^2 dx \qquad (4.32)
$$

First, note that there are no singularities involved in this integral, because sin(x)/x is well defined everywhere, even for x = 0, because:

$$
\lim_{x \to 0} \left( \frac{\sin(x)}{x} \right) = 1
$$

as can be easily established from the Taylor expansion of sin(x) near x = 0 (see chapter 1). Moreover, we can extend the integration to negative values of x because the integrand is an even function of x, so

$$
I = \frac{1}{2} \int_{-\infty}^{\infty} \left( \frac{\sin(x)}{x} \right)^2 dx
$$

We use the expression for the sine in terms of the complex exponentials and rewrite the original integral as follows:

$$
I = \frac{1}{2} \int_{-\infty}^{\infty} \left( \frac{e^{ix} - e^{-ix}}{2ix} \right)^2 dx = -\frac{1}{8} \left[ \int_{-\infty}^{\infty} \frac{e^{2ix} - 2 + e^{-2ix}}{x^2}\, dx \right]
$$


Figure 4.11: Integration contours for the evaluation of the three integrals, $I_+$, $I_-$, $I_0$, into which the real integral in Eq. (4.32) is decomposed.

From the last expression, we can break the integral into three parts and perform contour integration in terms of the complex variable z, to find their values.

$$
I = -\frac{1}{8} \left( I_+ - 2 I_0 + I_- \right), \quad I_{\pm} = \int_{-\infty}^{\infty} \frac{e^{\pm 2ix}}{x^2}\, dx, \quad I_0 = \int_{-\infty}^{\infty} \frac{1}{x^2}\, dx
$$

However, by doing this we have introduced in the integrands functions with singularities. To avoid infinities that will arise from the singularities, we must treat them on an equal footing, that is, by using the same variables to describe portions of the path that skip over the singularities. We will employ the contours shown in Fig. 4.11, for which the contribution along the real axis represents the desired integrals: for $I_+$ and $I_0$ we use a contour that closes on the upper half plane by a semicircle of radius R → ∞, whereas for $I_-$ we use a contour that closes on the lower half plane by a semicircle of radius R → ∞. We jump over the singularities at z = 0 with the same path, which consists of a semicircle on the upper half plane of radius ε → 0, centered at zero, for all three integrals. Since there are two contributions from this semicircle corresponding to $I_+$ and $I_-$ and one more contribution corresponding to $I_0$ which has an overall factor of −2 in front, and in the limit ε → 0 all these contributions are equal, denoted by $I_\epsilon$ in Fig. 4.11, they cancel out:

$$
I_\epsilon + I_\epsilon - 2 I_\epsilon = 0
$$

For the contour that contains the integral $I_+$, from Jordan’s Lemma, the contribution of the semicircle of radius R → ∞ vanishes, and since the enclosed area contains no singularities we conclude that the contribution of this contour to the sum is zero. As far as $I_0$ is concerned, using the result for the integral of a ratio of two polynomials, Eq. (4.24), we conclude that the contribution from the semicircle of radius R → ∞ vanishes, and since the enclosed area for the corresponding contour contains no singularities its contribution to the sum is zero. Finally, for the integral $I_-$, from Jordan’s Lemma we will have another vanishing contribution from the semicircle of radius R → ∞ on the lower half plane and the contour integral is therefore equal to 2πi times the sum of the enclosed residues, with an overall minus sign because this contour is traversed in the clockwise sense. In this case, the contour encloses the point z = 0. Since the singularity at this point is a second-order pole, we can use the general result of Eq. (4.21) to find:

$$
(\text{Residue at } z = 0) = \left[ \frac{d}{dz} \left( z^2\, \frac{e^{-2iz}}{z^2} \right) \right]_{z=0} = -2i
$$

Putting all these partial results together, we obtain for the integral of Eq. (4.32):

$$
I = -\frac{1}{8} \left[ -2\pi i\, (\text{Residue at } z = 0) \right] = \frac{\pi}{2}
$$
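The value π/2 can be checked by direct numerical integration. The following is a rough sketch under stated assumptions: the integral is cut off at a finite X (chosen here as 200π), the remainder from X to infinity is approximated by its leading term 1/(2X), and the helper names `sinc2` and `simpson` are introduced only for this illustration.

```python
import math

def sinc2(x):
    """(sin(x)/x)^2, using the limiting value 1 at x = 0."""
    return 1.0 if x == 0.0 else (math.sin(x) / x) ** 2

def simpson(f, a, b, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4.0 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2.0 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3.0

X = 200.0 * math.pi                     # finite cutoff (an assumption of this sketch)
partial = simpson(sinc2, 0.0, X, 200_000)
tail = 1.0 / (2.0 * X)                  # leading-order estimate of the integral from X to infinity
estimate = partial + tail
print(estimate, math.pi / 2.0)
```

The tail estimate uses sin²(x) = (1 − cos(2x))/2; the oscillating part contributes only at higher order in 1/X, so the two printed numbers should agree to several digits.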

Example 4.9: Another type of contour integral where similar issues arise concerns functions which require a branch cut. Consider the real integral

$$
I = \int_0^{\infty} \frac{x^{\lambda}}{x + 1}\, dx, \quad -1 < \lambda < 0 \qquad (4.33)
$$

which can be expressed as part of a contour integral of a complex function. Note that raising the variable x to the non-integer power λ implies that we will need to introduce a branch cut when we pass to the complex variable z, from z = 0 to z → ∞. We therefore consider the contour shown in Fig. 4.12, with the branch cut along the real axis, from z = x = 0 to z = x → ∞. The point z = 0, being a singularity of the integrand (a branch point), must be avoided, hence the small circle of radius ε → 0 around it. The contour is closed by a large circle of radius R → ∞ also centered at the origin. The part of the contour which lies just above the real axis approaches the value of the desired integral I. For the part of the contour just below the real axis, where the argument of the complex variable z approaches the value 2π, we have:

$$
z = x e^{2\pi i} \Rightarrow \frac{z^{\lambda}}{z + 1} = e^{i 2\pi\lambda} \frac{x^{\lambda}}{x + 1} \Rightarrow I' = -e^{i 2\pi\lambda} I
$$

The circle of radius ε around the origin contributes:

$$
z = \epsilon e^{i\theta} \Rightarrow I_\epsilon = -\int_0^{2\pi} \frac{(\epsilon e^{i\theta})^{\lambda}}{\epsilon e^{i\theta} + 1}\, \epsilon i e^{i\theta}\, d\theta = -i \epsilon^{1+\lambda} \int_0^{2\pi} \frac{e^{i\theta (1+\lambda)}}{\epsilon e^{i\theta} + 1}\, d\theta
$$


Figure 4.12: Integration contour for the evaluation of the real integral in Eq. (4.33) with a branch cut along the positive real axis.

$$
\Rightarrow \lim_{\epsilon \to 0}(I_\epsilon) = 0
$$

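since the contribution of the integral over θ for any ε is a finite quantity and 1 + λ > 0. By analogous considerations, the circle of radius R → ∞ around the origin contributes:

$$
z = R e^{i\theta} \Rightarrow I_R = \int_0^{2\pi} \frac{(R e^{i\theta})^{\lambda}}{R e^{i\theta} + 1}\, R i e^{i\theta}\, d\theta = i R^{1+\lambda} \int_0^{2\pi} \frac{e^{i\theta (1+\lambda)}}{R e^{i\theta} + 1}\, d\theta \Rightarrow
$$

$$
|I_R| \leq R^{1+\lambda} \int_0^{2\pi} \left| R e^{i\theta} + 1 \right|^{-1} d\theta \leq \frac{R^{1+\lambda}}{R - 1}\, 2\pi \Rightarrow \lim_{R \to \infty} |I_R| \leq \lim_{R \to \infty} \left( 2\pi R^{\lambda} \right) = 0
$$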

where we have used the ML inequality, Eq. (4.2), to find an upper bound for the integral $|I_R|$ before taking the limit R → ∞. Finally, the entire contour encloses only one singularity, the point z = −1, which is evidently a simple pole, the residue at that value being:

$$
(\text{Residue at } z = -1) = \left( e^{i\pi} \right)^{\lambda} = e^{i\lambda\pi}
$$

Putting all this together, we obtain:

$$
I + I' + \lim_{\epsilon \to 0}(I_\epsilon) + \lim_{R \to \infty}(I_R) = 2\pi i\, (\text{Residue at } z = -1) \Rightarrow
$$

$$
I \left( 1 - e^{i 2\pi\lambda} \right) = 2\pi i\, e^{i\lambda\pi} \Rightarrow I = -\frac{\pi}{\sin(\lambda\pi)}
$$
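A quick numerical cross-check of this result is sketched below. It is not part of the derivation: the substitution x = exp(u), the cutoff `u_max`, and the function name are choices made only for this illustration. The substitution turns the integrand into one that decays exponentially in both directions, so a plain trapezoid rule suffices.

```python
import math

def integral_via_log_substitution(lam, u_max=80.0, n=16000):
    """Trapezoid estimate of int_0^inf x**lam/(x+1) dx using x = exp(u)."""
    h = 2.0 * u_max / n
    total = 0.0
    for k in range(n + 1):
        u = -u_max + k * h
        weight = 0.5 if k in (0, n) else 1.0
        # x**lam/(x+1) dx becomes exp((lam+1)*u)/(exp(u)+1) du
        total += weight * math.exp((lam + 1.0) * u) / (math.exp(u) + 1.0)
    return total * h

for lam in (-0.5, -0.25):               # sample exponents in the allowed range -1 < lam < 0
    print(lam, integral_via_log_substitution(lam), -math.pi / math.sin(lam * math.pi))
```

For λ = −1/2 the closed form gives exactly π, which makes a convenient sanity check.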


Example 4.10: As a final example we consider an integration that requires a rectangular rather than a semicircular contour. Integrals of that type appear when the integrand involves trigonometric functions. Our example consists of the following real integral:

$$
I_0 = \int_{-\infty}^{\infty} \frac{\sin(\lambda x)}{\sinh(x)}\, dx = \operatorname{Im} \left[ \int_{-\infty}^{\infty} \frac{e^{i\lambda x}}{\sinh(x)}\, dx \right] = \operatorname{Im}[I] \qquad (4.34)
$$

We notice that from the definition of the hyperbolic functions,

$$
\sinh(x) = \frac{e^{x} - e^{-x}}{2} \Rightarrow \sinh(x + 2\pi i) = \frac{e^{x + 2\pi i} - e^{-x - 2\pi i}}{2} = \sinh(x)
$$

so we can use the contour shown in Fig. 4.13 for the evaluation of the integral I in Eq. (4.34) because along the two horizontal parts of the contour the denominators in the integrands are the same.

Figure 4.13: Rectangular integration contour for the evaluation of the real integral in Eq. (4.34).

For the part of this contour along z = x + i2π we have:

$$
I' = -\int_{-\infty}^{\infty} \frac{e^{i\lambda (x + 2\pi i)}}{\sinh(x + 2\pi i)}\, d(x + 2\pi i) = -e^{-2\lambda\pi} I
$$

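For the contribution $I_R$ along the vertical path z = R + iy, 0 ≤ y ≤ 2π, the integrand becomes:

$$
\frac{2 e^{i\lambda (R + iy)}}{e^{R + iy} - e^{-R - iy}} = \frac{2 e^{-\lambda y} e^{i\lambda R}}{e^{R} \left( e^{iy} - e^{-2R - iy} \right)} \Rightarrow
$$

$$
\lim_{R \to \infty}(I_R) = \lim_{R \to \infty} \left( \frac{1}{e^{R}} \int_0^{2\pi} \frac{2i\, e^{-\lambda y} e^{i\lambda R}}{e^{iy} - e^{-2R - iy}}\, dy \right) = 0
$$

since the integration over y gives a finite value (the integrand is finite in magnitude). Similarly for $I'_R$ along the vertical path z = −R + iy, 2π ≥ y ≥ 0, we find

$$
\frac{2 e^{i\lambda (-R + iy)}}{e^{-R + iy} - e^{R - iy}} = \frac{2 e^{-\lambda y} e^{-i\lambda R}}{e^{R} \left( e^{-2R + iy} - e^{-iy} \right)} \Rightarrow
$$

$$
\lim_{R \to \infty}(I'_R) = \lim_{R \to \infty} \left( -\frac{1}{e^{R}} \int_0^{2\pi} \frac{2i\, e^{-\lambda y} e^{-i\lambda R}}{e^{-2R + iy} - e^{-iy}}\, dy \right) = 0
$$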

because again the integration over y gives a finite value. From the residue theorem we have:

$$
I + I' + \lim_{R \to \infty}(I_R) + \lim_{R \to \infty}(I'_R) + \lim_{\epsilon_a \to 0}(I_a) + \lim_{\epsilon_b \to 0}(I_b) = I - e^{-2\pi\lambda} I + \lim_{\epsilon_a \to 0}(I_a) + \lim_{\epsilon_b \to 0}(I_b) = 2\pi i\, (\text{Residue at } z = \pi i)
$$

the singularity at z = πi being the only one inside the closed contour, and since the poles at z = 0, πi, 2πi are simple poles (see Problem 3), we obtain:

$$
\lim_{\epsilon_a \to 0}(I_a) = (-1)\pi i\, (\text{Residue at } z = 0) = -\pi i
$$

$$
\lim_{\epsilon_b \to 0}(I_b) = (-1)\pi i\, (\text{Residue at } z = 2\pi i) = -\pi i\, e^{-2\lambda\pi}
$$

$$
(\text{Residue at } z = \pi i) = -e^{-\lambda\pi}
$$

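which, when substituted in the previous equation, give $I (1 - e^{-2\lambda\pi}) = \pi i \left( 1 - e^{-\lambda\pi} \right)^2$, and therefore for the value of the original real integral

$$
I_0 = \operatorname{Im}[I] = \pi\, \frac{\cosh(\lambda\pi) - 1}{\sinh(\lambda\pi)} = \pi \tanh\left( \frac{\lambda\pi}{2} \right)
$$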
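As a sanity check on the value π tanh(λπ/2), which follows from the three residues listed above, the real integral can be evaluated numerically. This is only a sketch: the cutoff `x_max`, the grid size, and the function names are assumptions of the illustration, and the integrand is assigned its limiting value λ at x = 0.

```python
import math

def integrand(x, lam):
    """sin(lam*x)/sinh(x), using the limiting value lam at x = 0."""
    return lam if x == 0.0 else math.sin(lam * x) / math.sinh(x)

def integral(lam, x_max=40.0, n=40000):
    """Trapezoid estimate of the integral over [-x_max, x_max]."""
    h = 2.0 * x_max / n
    total = 0.0
    for k in range(n + 1):
        x = -x_max + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * integrand(x, lam)
    return total * h

for lam in (0.5, 1.0, 2.0):             # sample values of lambda (assumptions)
    print(lam, integral(lam), math.pi * math.tanh(lam * math.pi / 2.0))
```

The integrand decays like exp(−|x|), so the truncation at |x| = 40 is far below the tolerance of interest; note also that the result tends to zero as λ → 0, as it must.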


4.5 Problems

1. Evaluate the real integral in Eq. (4.26), using the contour C2 labeled Path 2 in Fig. 4.9, and make sure you get the same result as in Eq. (4.27). Pay attention to the integrations along the two semicircles with vanishing radii centered at z = 1 and at z = exp(iπ/4).

2. Consider a function g(z) which is analytic everywhere on the complex plane and has at least one root at z = z0: g(z0) = 0. Show that if it is a simple root, that is, if the function can be written as

$$
g(z) = (z - z_0)\, g_1(z), \quad g_1(z_0) \neq 0
$$

then the function f(z) = 1/g(z) has a simple pole at z = z0. What is the relation of $g_1(z)$ to g(z) near z0? Show that if the root of g(z) is of order m, that is, if the function can be written as

$$
g(z) = (z - z_0)^m\, g_m(z), \quad g_m(z_0) \neq 0
$$

then the function f(z) = 1/g(z) has a pole of order m at z = z0. What is the relation of $g_m(z)$ to g(z) near z0?

3. Find all the poles of the following functions

$$
f_1(z) = \frac{1}{\sinh(z)}, \quad f_2(z) = \frac{1}{\tanh(z)}, \quad f_3(z) = \frac{1}{\sin(z)}, \quad f_4(z) = \frac{1}{\tan(z)}
$$

and show that they are all simple poles (Hint: use the results of Problem 2, as they apply to the neighborhood of the denominator roots).

4. Prove Jordan’s Lemma for the integral defined in Eq. (4.30) with λ < 0 and the integration path C lying on the lower half of the complex z plane, that is, z = R exp(iθ), π ≤ θ ≤ 2π.

5. Using the same procedure as for the evaluation of the integral in Eq. (4.31), evaluate the real integral

$$
I = \int_{-\infty}^{\infty} \frac{\cos(x)}{x^2 + p^2}\, dx, \quad p > 0
$$

Alternatively, use the result found for the integral in Eq. (4.31) with a = ip; do the two answers agree?

Chapter 5

Fourier analysis

5.1 Fourier expansions

5.1.1 Real Fourier expansions

A very important representation of functions is in terms of the trigonometric functions. We consider a function f(x) of the real variable x and its representation as a series expansion in terms of the functions cos(nx) and sin(nx) with n an integer:

$$
f(x) = a_0 + \sum_{n=1}^{\infty} a_n \cos(nx) + \sum_{n=1}^{\infty} b_n \sin(nx) \qquad (5.1)
$$

where $a_n$ and $b_n$ are real constants. We have explicitly separated out the cosine term with n = 0, which is a constant $a_0$ (there is no such sine term, because sin(0) = 0). The expression of Eq. (5.1) is referred to as the “real Fourier series representation” or “real Fourier expansion”. Note that we only need positive values for the indices n, since cos(−nx) = cos(nx) and sin(−nx) = −sin(nx); the sine and cosine functions with negative indices are the same or involve at most a sign change (in the case of the sine), thus they do not provide any additional flexibility in representing the general function f(x). The expression on the right-hand side of Eq. (5.1) implies that the function f(x) is periodic:

$$
f(x + 2\pi) = f(x)
$$

because each term in the expansion obeys this periodicity.

Our first task is to determine the values of the coefficients $a_n$ and $b_n$ that appear in the real Fourier expansion. In order to do this, we multiply both sides of Eq. (5.1) by cos(mx) or by sin(mx), with m an integer, and integrate over x from −π to π, which produces the following terms on the right-hand side:

$$
\cos(nx) \cos(mx) = \frac{1}{2} \cos(nx - mx) + \frac{1}{2} \cos(nx + mx)
$$

$$
\sin(nx) \sin(mx) = \frac{1}{2} \cos(nx - mx) - \frac{1}{2} \cos(nx + mx)
$$

$$
\sin(nx) \cos(mx) = \frac{1}{2} \sin(nx - mx) + \frac{1}{2} \sin(nx + mx)
$$

where we have used the trigonometric identities

$$
\cos(a \pm b) = \cos(a) \cos(b) \mp \sin(a) \sin(b)
$$

$$
\sin(a \pm b) = \sin(a) \cos(b) \pm \cos(a) \sin(b)
$$

to rewrite the products of two cosines, two sines and a sine and cosine. For n ≠ m, we can easily integrate these terms over −π ≤ x ≤ π, which all give zeros because of the periodicity of the sine and cosine functions:

$$
\int_{-\pi}^{\pi} \cos(nx) \cos(mx)\, dx = \frac{1}{2} \left[ \frac{\sin(nx - mx)}{n - m} + \frac{\sin(nx + mx)}{n + m} \right]_{-\pi}^{\pi} = 0
$$

$$
\int_{-\pi}^{\pi} \sin(nx) \sin(mx)\, dx = \frac{1}{2} \left[ \frac{\sin(nx - mx)}{n - m} - \frac{\sin(nx + mx)}{n + m} \right]_{-\pi}^{\pi} = 0
$$

$$
\int_{-\pi}^{\pi} \sin(nx) \cos(mx)\, dx = -\frac{1}{2} \left[ \frac{\cos(nx - mx)}{n - m} + \frac{\cos(nx + mx)}{n + m} \right]_{-\pi}^{\pi} = 0
$$

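The only non-vanishing terms are those for n = m:

$$
\int_{-\pi}^{\pi} \cos^2(nx)\, dx = \int_{-\pi}^{\pi} \sin^2(nx)\, dx = \frac{1}{2} \int_{-\pi}^{\pi} \left[ \sin^2(nx) + \cos^2(nx) \right] dx = \pi \qquad (5.2)
$$

and for n = 0 the integral is trivial and concerns only the cosine term, producing 2π. Collecting these results, we arrive at

$$
a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, dx \qquad (5.3)
$$

$$
a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos(nx)\, dx \qquad (5.4)
$$

$$
b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin(nx)\, dx \qquad (5.5)
$$

These are known as the “Euler formulae”.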


The relations we derived above between sines and cosines with different indices n, m can be rewritten in more compact form with a change in notation. We define the functions $c_n(x)$ and $s_n(x)$ through the relations

$$
c_n(x) = \frac{1}{\sqrt{\pi}} \cos(nx), \quad s_n(x) = \frac{1}{\sqrt{\pi}} \sin(nx) \qquad (5.6)
$$

in terms of which the relations between sines and cosines of different indices n and m (with n, m ≥ 1) take the form:

$$
\int_{-\pi}^{\pi} c_n(x) c_m(x)\, dx = \delta_{nm}, \quad \int_{-\pi}^{\pi} s_n(x) s_m(x)\, dx = \delta_{nm} \qquad (5.7)
$$

$$
\int_{-\pi}^{\pi} c_n(x) s_m(x)\, dx = 0
$$

The sets of functions that satisfy such relations are called “orthonormal”: they are “orthogonal” upon integration over the argument x from −π to π, that is, the integral vanishes unless we have only one function multiplying itself in the integrand; moreover, the integral is unity when it does not vanish, that is, the functions are properly “normalized”. There are many sets of orthonormal functions, some involving various types of polynomials. These sets are extremely useful as bases for expansion of arbitrary functions. Here we will only deal with the sines and cosines and linear combinations of them, as in the complex exponential. We next give some examples of real Fourier series expansions.
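The orthonormality relations of Eq. (5.7) are easy to confirm numerically. The sketch below is illustrative only; the helper names `inner`, `c` and `s` are introduced here, and the midpoint rule is used simply because it is very accurate for periodic trigonometric integrands.

```python
import math

def inner(f, g, n=4000):
    """Midpoint-rule estimate of the integral of f(x)*g(x) over [-pi, pi]."""
    h = 2.0 * math.pi / n
    xs = (-math.pi + (k + 0.5) * h for k in range(n))
    return h * sum(f(x) * g(x) for x in xs)

def c(n):
    """c_n(x) = cos(n x)/sqrt(pi), as in Eq. (5.6)."""
    return lambda x: math.cos(n * x) / math.sqrt(math.pi)

def s(n):
    """s_n(x) = sin(n x)/sqrt(pi), as in Eq. (5.6)."""
    return lambda x: math.sin(n * x) / math.sqrt(math.pi)

for n in range(1, 4):
    row = [round(inner(c(n), c(m)), 6) + 0.0 for m in range(1, 4)]
    print(n, row, round(inner(c(n), s(n)), 6) + 0.0)
```

Each printed row should contain a single 1 on the diagonal. The index n = 0 is a special case: the constant function 1/√π integrates to 2 rather than 1, which is why the constant term carries the separate normalization 1/(2π) in Eq. (5.3).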

Example 5.1: Consider the function that describes a so-called square wave:

$$
f(x) = \begin{cases} -k, & -\pi \leq x \leq 0 \\ +k, & 0 \leq x \leq \pi \end{cases} \qquad (5.8)
$$

with k > 0 a real constant, which we want to express as a real Fourier expansion. From the Euler formulae we can determine the coefficients of this expansion, which turn out to be:

$$
a_n = 0, \text{ for all } n; \quad b_n = \frac{4k}{\pi n}, \; n = \text{odd}; \quad b_n = 0, \; n = \text{even}
$$

With these coefficients, we can then express the original function as a Fourier expansion:

$$
f(x) = \frac{4k}{\pi} \left[ \sin(x) + \frac{1}{3} \sin(3x) + \frac{1}{5} \sin(5x) + \frac{1}{7} \sin(7x) + \cdots \right]
$$


A useful application of this expression is to evaluate both sides at x = π/2; from f(x = π/2) = k we obtain:

$$
1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{1}{2n + 1} = \frac{\pi}{4}
$$
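The coefficients quoted in this example can be reproduced by evaluating the Euler formulae numerically. The following is a minimal sketch, assuming k = 1 and using a midpoint rule; the function names `euler_coefficients` and `leibniz_partial_sum` are hypothetical helpers, not anything defined in the text.

```python
import math

def euler_coefficients(f, n, samples=20000):
    """Midpoint-rule estimates of a_n and b_n from Eqs. (5.4) and (5.5), for n >= 1."""
    h = 2.0 * math.pi / samples
    a = b = 0.0
    for j in range(samples):
        x = -math.pi + (j + 0.5) * h
        a += f(x) * math.cos(n * x)
        b += f(x) * math.sin(n * x)
    return a * h / math.pi, b * h / math.pi

k = 1.0                                  # amplitude of the square wave (an assumption)
square = lambda x: k if x > 0 else -k    # Eq. (5.8) away from the jump at x = 0

for n in range(1, 6):
    a_n, b_n = euler_coefficients(square, n)
    expected = 4.0 * k / (math.pi * n) if n % 2 == 1 else 0.0
    print(n, round(a_n, 6) + 0.0, round(b_n, 6), round(expected, 6))

def leibniz_partial_sum(terms):
    """Partial sum of 1 - 1/3 + 1/5 - 1/7 + ..."""
    return sum((-1) ** n / (2 * n + 1) for n in range(terms))

print(leibniz_partial_sum(100000), math.pi / 4.0)
```

The alternating series converges slowly (the error after N terms is of order 1/N), which reflects the slow 1/n decay of the coefficients of a discontinuous function.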

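Example 5.2: As a second example, consider the function which describes the so-called triangular wave:

$$
f(x) = \begin{cases} -\dfrac{kx}{\pi}, & -\pi \leq x \leq 0 \\[2mm] +\dfrac{kx}{\pi}, & 0 \leq x \leq \pi \end{cases} \qquad (5.9)
$$

with k a real constant. From the Euler formulae we determine the coefficients for the Fourier expansion of this function:

$$
a_0 = \frac{k}{2}; \quad a_n = -\frac{4k}{\pi^2 n^2}, \; n = \text{odd}; \quad a_n = 0, \; n = \text{even}; \quad b_n = 0, \text{ for all } n
$$

From these we reconstruct f(x) as a Fourier expansion

$$
f(x) = \frac{k}{2} - \frac{4k}{\pi^2} \left[ \cos(x) + \frac{1}{9} \cos(3x) + \frac{1}{25} \cos(5x) + \frac{1}{49} \cos(7x) + \cdots \right]
$$

A useful application of this expression is to evaluate both sides at x = π; from f(x = π) = k we obtain:

$$
1 + \frac{1}{9} + \frac{1}{25} + \frac{1}{49} + \cdots = \sum_{n=0}^{\infty} \frac{1}{(2n + 1)^2} = \frac{\pi^2}{8}
$$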
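To see how quickly the truncated expansion of the triangular wave approaches the function, one can evaluate the partial sums directly. The sketch below assumes k = 1 and introduces the hypothetical helper `triangle_partial_sum`; it evaluates the series at x = π, where the exact value is k.

```python
import math

def triangle_partial_sum(x, terms, k=1.0):
    """k/2 minus the first `terms` odd-cosine terms of the expansion in Example 5.2."""
    total = k / 2.0
    for j in range(terms):
        n = 2 * j + 1
        total -= 4.0 * k / (math.pi ** 2 * n ** 2) * math.cos(n * x)
    return total

errors = [abs(triangle_partial_sum(math.pi, terms) - 1.0) for terms in (1, 2, 4, 8, 1000)]
for terms, err in zip((1, 2, 4, 8, 1000), errors):
    print(terms, err)
```

Because the coefficients fall off like 1/n², the error at x = π decreases roughly like the inverse of the number of terms retained, which is noticeably faster than for the square wave.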

It is instructive to consider the expansions in the above two examples in some more detail. Both expressions are approximations to the original functions which become better the more terms we include in the expansion. This is shown in Fig. 5.1, where we present the original functions and their expansions with the lowest one, two, three and four terms. Moreover, the terms that appear in the expansions have the same symmetry as the functions themselves. Specifically, for the square wave which is an odd function of x, f(−x) = −f(x), only the sine terms appear in the expansion, which are also odd functions of x: sin(−nx) = −sin(nx). Similarly, for the triangular wave which is an even function of x, f(−x) = f(x), only the cosine terms appear in the expansion, which are also even functions of x: cos(−nx) = cos(nx). If the function f(x) has no definite symmetry for x → −x, then the Fourier expansion will contain both types of terms (sines and cosines).

Figure 5.1: Fourier expansion representation of the square wave (left) and the triangular wave (right): the bottom curve is the original wave, the next four curves are representations keeping the lowest one, two, three and four terms in the expansion. The different curves are displaced in the vertical axis for clarity. In both cases, we have chosen the constant k = 1 in Eqs. (5.8) and (5.9).

5.1.2 Fourier expansions of arbitrary range

In the preceding discussion, we assumed that the functions under consideration were defined in the interval [−π, π] and were periodic with a period of 2π (as the Fourier expansion requires). The limitation of the values of x in this interval is not crucial. We can consider a periodic function f(x) defined in an interval of arbitrary length 2L, with x ∈ [−L, L] and f(x + 2L) = f(x). Then, with the following change of variables

$$
x' = \frac{\pi}{L} x \Rightarrow x = \frac{L}{\pi} x'
$$

and by rewriting the function as f(x) = g(x′), we see that g(x′) has period 2π in the variable x′ which takes values in the interval [−π, π]. Therefore, we can apply the Euler formulae to find the Fourier expansion of g(x′) and from this the corresponding expansion of f(x). The result, after we change variables back to x, is given by:

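$$
f(x) = a_0 + \sum_{n=1}^{\infty} a_n \cos\left( \frac{\pi n}{L} x \right) + \sum_{n=1}^{\infty} b_n \sin\left( \frac{\pi n}{L} x \right) \qquad (5.10)
$$

$$
a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} g(x')\, dx' = \frac{1}{2L} \int_{-L}^{L} f(x)\, dx
$$

$$
a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} g(x') \cos(nx')\, dx' = \frac{1}{L} \int_{-L}^{L} f(x) \cos\left( \frac{\pi n}{L} x \right) dx
$$

$$
b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} g(x') \sin(nx')\, dx' = \frac{1}{L} \int_{-L}^{L} f(x) \sin\left( \frac{\pi n}{L} x \right) dx
$$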

For functions that have definite symmetry for x → −x, we can again reduce the computation to finding only the coefficients of terms with the same symmetry. Specifically, for odd functions, $f_-(-x) = -f_-(x)$, only the sine terms survive, whereas for even functions, $f_+(-x) = f_+(x)$, only the cosine terms survive, and the coefficients can be expressed as:

$$
a_n = \frac{2}{L} \int_0^{L} f_+(x) \cos\left( \frac{\pi n}{L} x \right) dx, \quad b_n = 0 \text{ for all } n
$$

$$
b_n = \frac{2}{L} \int_0^{L} f_-(x) \sin\left( \frac{\pi n}{L} x \right) dx, \quad a_n = 0 \text{ for all } n
$$

Often, the function of interest is defined only in half of the usual range. For example, we might be given a function f(x) defined only in the interval [0, π]. In this case, we cannot readily apply the expressions of the Euler formulae to obtain the coefficients of the Fourier expansion. What is usually done is to assume either a cosine series expansion for the function in the interval [−π, π], or a sine series expansion in the same interval. The first choice produces an even function while the second choice produces an odd function in the interval [−π, π]. Since both expansions approximate the original function accurately in the interval [0, π] (assuming enough terms in the expansion have been retained), it does not matter which choice we make. However, the behavior of the function may make it clear which is the preferred expansion choice, which will also converge faster to the desired result.

5.1.3 Complex Fourier expansions

We can generalize the notion of the Fourier expansion to complex numbers by using the complex exponentials exp(inx) as the basis set, instead of the sines and cosines. With this complex notation we obtain the complex Fourier series expansion:

$$
f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx} \qquad (5.11)
$$

where $c_n$ are complex constants. The values of these coefficients can be obtained from the Euler formula for the complex exponential

$$
e^{ix} = \cos(x) + i \sin(x)
$$

and the Euler formulae for the coefficients that appear in the real Fourier series expansion. Alternatively, we can multiply both sides of Eq. (5.11) by $e^{-imx}$ and integrate over x in [−π, π], using:

$$
\int_{-\pi}^{\pi} e^{i(n - m)x}\, dx = 2\pi \delta_{nm}
$$

that is, the fact that the complex exponentials with different indices n and m are orthogonal. The coefficient $c_n$ of exp(inx) is given by:

$$
c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-inx}\, dx \qquad (5.12)
$$
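For reference, the first route mentioned above (substituting the Euler formula into the real expansion) gives an explicit link between the two sets of coefficients. The relations below are a worked step added for illustration, assuming f(x) is real and using the coefficients $a_0$, $a_n$, $b_n$ of Eqs. (5.3)-(5.5):

```latex
\begin{aligned}
a_n \cos(nx) + b_n \sin(nx)
  &= a_n \frac{e^{inx} + e^{-inx}}{2} + b_n \frac{e^{inx} - e^{-inx}}{2i} \\
  &= \frac{a_n - i b_n}{2}\, e^{inx} + \frac{a_n + i b_n}{2}\, e^{-inx}
\end{aligned}
\quad\Longrightarrow\quad
c_0 = a_0, \qquad c_n = \frac{a_n - i b_n}{2}, \qquad c_{-n} = \frac{a_n + i b_n}{2} \quad (n \geq 1)
```

In particular, for a real function the coefficients with opposite indices are complex conjugates of each other, $c_{-n} = c_n^*$.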

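We discuss next a useful relation that applies to the Fourier expansion of the convolution of two periodic functions, f(x) and g(x). Let us assume that the Fourier expansions of these functions are known,

$$
f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx}, \quad g(x) = \sum_{m=-\infty}^{\infty} d_m e^{imx}
$$

that is, the coefficients $c_n$ and $d_m$ have been determined. Then the convolution of the two functions will be given by

$$
\begin{aligned}
f \star g(x) \equiv \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) g(x - t)\, dt
&= \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} \frac{1}{2\pi} c_n d_m \int_{-\pi}^{\pi} e^{int} e^{im(x - t)}\, dt \\
&= \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} \frac{1}{2\pi} c_n d_m e^{imx} \int_{-\pi}^{\pi} e^{i(n - m)t}\, dt \\
&= \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} \frac{1}{2\pi} c_n d_m e^{imx} \left( 2\pi \delta_{nm} \right)
= \sum_{n=-\infty}^{\infty} c_n d_n e^{inx}
\end{aligned}
$$

From the last expression we conclude that the Fourier expansion of the convolution of two periodic functions is given by the term-by-term product of the Fourier coefficients of the two functions.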
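The statement can be tested on two short trigonometric polynomials. In the sketch below the coefficient sets `c` and `d` are arbitrary illustrative choices, and `series` and `convolution` are hypothetical helpers; the convolution integral is evaluated with a midpoint rule, which is essentially exact for such low-order periodic integrands.

```python
import cmath
import math

def series(coeffs, x):
    """Evaluate sum_n c_n exp(i n x) for a dict {n: c_n}."""
    return sum(cn * cmath.exp(1j * n * x) for n, cn in coeffs.items())

def convolution(cf, dg, x, samples=256):
    """(1/2pi) * integral over [-pi, pi] of f(t) g(x - t) dt, by the midpoint rule."""
    h = 2.0 * math.pi / samples
    total = 0.0
    for j in range(samples):
        t = -math.pi + (j + 0.5) * h
        total += series(cf, t) * series(dg, x - t)
    return total * h / (2.0 * math.pi)

c = {-1: 0.5, 0: 1.0, 2: 0.25j}          # illustrative coefficients of f (assumptions)
d = {-1: 2.0, 1: -1.0, 2: 3.0}           # illustrative coefficients of g (assumptions)
product = {n: c[n] * d[n] for n in c if n in d}   # term-by-term product c_n d_n

x = 0.7
lhs = convolution(c, d, x)
rhs = series(product, x)
print(lhs, rhs)
```

Only the indices present in both expansions (here n = −1 and n = 2) survive in the convolution, which is the content of the Kronecker δ in the derivation.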


5.1.4 Error analysis in series expansions

As we have mentioned several times before, using series expansions to represent a function is useful because we have to deal with simpler functions, those that constitute the basis for the expansion. However, the representation of the original function in terms of a series expansion is exact only if we include an infinite number of terms in the expansion, which is not possible from a practical point of view. Therefore, the hope is that we can truncate the series to a finite number of terms and still get a reasonable representation. When we do this truncation, we inevitably introduce an error in the function, as it is represented by the finite number of terms that we have retained in the series. It is useful to have a measure of this error, typically as a function of the number of terms retained, which we will denote by n. Let us assume that we have an infinite series representation of the function f(x)

$$
f(x) = \sum_{j=1}^{\infty} c_j p_j(x)
$$

where $p_j(x)$ is a set of orthogonal functions in the interval $[x_i, x_f]$:

$$
\int_{x_i}^{x_f} p_j(x) p_k(x)\, dx = \delta_{jk} \lambda_j, \quad \lambda_j = \int_{x_i}^{x_f} p_j^2(x)\, dx \qquad (5.13)
$$

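We define

$$
s_n(x) = \sum_{j=1}^{n} c_j p_j(x)
$$

which is the approximation of f(x) by a truncated series, containing n terms. A convenient measure of the error introduced by the truncation is the integral of the square of f(x) − $s_n(x)$:

$$
\begin{aligned}
E(n) = \int_{x_i}^{x_f} \left[ f(x) - s_n(x) \right]^2 dx
&= \int_{x_i}^{x_f} \left[ \sum_{j=n+1}^{\infty} c_j p_j(x) \right]^2 dx \\
&= \sum_{j=n+1}^{\infty} c_j^2 \int_{x_i}^{x_f} p_j^2(x)\, dx + \sum_{j \neq k = n+1}^{\infty} c_j c_k \int_{x_i}^{x_f} p_j(x) p_k(x)\, dx \\
&= \sum_{j=n+1}^{\infty} c_j^2 \lambda_j
\end{aligned}
$$

where the last equation is obtained from the orthogonality of the functions $p_j(x)$, Eq. (5.13). By considering $c_j$ and $\lambda_j$ as functions of j, the error can be bounded (provided $c^2(j)\lambda(j)$ decreases monotonically with j) by

$$
E(n) \leq \int_{n}^{\infty} c^2(j) \lambda(j)\, dj
$$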
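As an illustration of this error measure, the sketch below applies it to the square wave of Example 5.1 with k = 1, for which $p_j(x) = \sin(jx)$ and $\lambda_j = \pi$. It compares a direct numerical evaluation of E(n) with the sum over the discarded coefficients; the latter is rewritten using the fact that the integral of f² over [−π, π] equals 2πk² (an extra step assumed here, not taken from the text). All function names are hypothetical.

```python
import math

def b(j, k=1.0):
    """Square-wave coefficients of Example 5.1: 4k/(pi j) for odd j, zero for even j."""
    return 4.0 * k / (math.pi * j) if j % 2 == 1 else 0.0

def direct_error(n, k=1.0, samples=20000):
    """Midpoint-rule estimate of the integral of [f(x) - s_n(x)]^2 over [-pi, pi]."""
    h = 2.0 * math.pi / samples
    total = 0.0
    for i in range(samples):
        x = -math.pi + (i + 0.5) * h
        f = k if x > 0 else -k
        s_n = sum(b(j, k) * math.sin(j * x) for j in range(1, n + 1))
        total += (f - s_n) ** 2
    return total * h

def series_error(n, k=1.0):
    """Sum over j > n of b_j^2 * pi, written as 2*pi*k^2 minus the retained terms."""
    return 2.0 * math.pi * k ** 2 - math.pi * sum(b(j, k) ** 2 for j in range(1, n + 1))

for n in (1, 3, 5, 7):
    print(n, direct_error(n), series_error(n))
```

The two columns should agree closely, and the slow decrease of E(n) with n mirrors the slow 1/j decay of the coefficients.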


5.2 Application to differential equations

5.2.1 Diffusion equation

The diffusion equation is typically used to describe how a physical quantity spreads with time and space. Consider T(x, t) to be the temperature distribution in a physical system. The behavior of this quantity is governed by the diffusion equation

$$
\kappa \frac{\partial^2 T}{\partial x^2} = \frac{\partial T}{\partial t} \qquad (5.14)
$$

where κ is the thermal diffusivity coefficient. Assume that the temperature distribution is given at t = 0, and that the boundary conditions are: T(0, t) = T(L, t) = 0 for all t.

The usual way of solving differential equations is by separation of variables, that is, by assuming that

$$
T(x, t) = f(x) g(t)
$$

We substitute this expression in the differential equation that T(x, t) obeys to obtain:

$$
\kappa f''(x) g(t) = f(x) g'(t) \Rightarrow \kappa \frac{f''(x)}{f(x)} = \frac{g'(t)}{g(t)}
$$

In the last expression, a function of x is equal to a function of t, which is possible only if they are both equal to a constant, which we call −c, assuming for the moment that c > 0. From this we obtain:

$$
f''(x) = -\frac{c}{\kappa} f(x) \Rightarrow f(x) = f_0 e^{\pm i x \sqrt{c/\kappa}}
$$

$$
g'(t) = -c\, g(t) \Rightarrow g(t) = g_0 e^{-tc}
$$

In order to satisfy the boundary condition at x = 0, we must choose the linear combination of the $e^{\pm i x \sqrt{c/\kappa}}$ functions which vanishes at x = 0: the proper choice is

$$
e^{+i x \sqrt{c/\kappa}} - e^{-i x \sqrt{c/\kappa}} = 2i \sin\left( x \sqrt{c/\kappa} \right)
$$

In order to satisfy the boundary condition at x = L we must have

$$
\sin\left( L \sqrt{c/\kappa} \right) = 0 \Rightarrow L \sqrt{c/\kappa} = n\pi \Rightarrow c = n^2 \pi^2 \kappa / L^2
$$

With this we obtain:

$$
T_n(x, t) = T_n^{(0)} \sin\left( \frac{n\pi}{L} x \right) e^{-\kappa n^2 \pi^2 t / L^2}
$$

and the most general solution is a superposition of these terms:

$$
T(x, t) = \sum_{n=1}^{\infty} T_n^{(0)} e^{-\kappa n^2 \pi^2 t / L^2} \sin\left( \frac{n\pi}{L} x \right)
$$

Alternatively, we can use the Fourier expansion for the temperature distribution:

$$
T(x, t) = \sum_{n=1}^{\infty} b_n(t) \sin\left( \frac{n\pi}{L} x \right)
$$

in order to solve the diffusion equation. Our choice of the expansion automatically satisfies the boundary conditions T(0, t) = T(L, t) = 0. The solution proceeds as follows: we substitute the Fourier expansion in the differential equation:

$$
\kappa \sum_{n=1}^{\infty} b_n(t) \left( -\frac{n^2 \pi^2}{L^2} \right) \sin\left( \frac{n\pi}{L} x \right) = \sum_{n=1}^{\infty} \frac{d b_n(t)}{dt} \sin\left( \frac{n\pi}{L} x \right)
$$

Equating term-by-term the two sides of the equation we obtain:

$$
\frac{d b_n(t)}{dt} = \kappa\, b_n(t) \left( -\frac{n^2 \pi^2}{L^2} \right) \Rightarrow b_n(t) = e^{-\kappa n^2 \pi^2 t / L^2}\, b_n(0)
$$

which gives the solution to the initial equation:

$$
T(x, t) = \sum_{n=1}^{\infty} b_n(0)\, e^{-\kappa n^2 \pi^2 t / L^2} \sin\left( \frac{n\pi}{L} x \right)
$$

and $b_n(0)$ are obtained from T(x, 0) (which is assumed given) by using the Euler formulae. This is identical to the expression obtained from the usual approach through separation of variables, with $T_n^{(0)} = b_n(0)$. The advantage of using the Fourier expansion is that the boundary conditions are automatically satisfied by proper choice of the expansion functions.
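The recipe can be turned into a few lines of code. The sketch below is an illustration under stated assumptions: L = 1 and κ = 0.5 are arbitrary, the initial profile `T0` is a made-up combination of two sine modes (so the exact answer is known), and the coefficients $b_n(0)$ are computed with the half-range sine formula of Section 5.1.2, that is, (2/L) times the integral of T(x, 0) sin(nπx/L) over [0, L].

```python
import math

def sine_coefficients(T0, L, modes, samples=2000):
    """b_n(0) = (2/L) * integral_0^L T0(x) sin(n pi x / L) dx, by the midpoint rule."""
    h = L / samples
    xs = [(j + 0.5) * h for j in range(samples)]
    return [2.0 / L * h * sum(T0(x) * math.sin(n * math.pi * x / L) for x in xs)
            for n in range(1, modes + 1)]

def temperature(x, t, b0, L, kappa):
    """Sum of decaying sine modes with initial coefficients b0[n-1] = b_n(0)."""
    return sum(bn * math.exp(-kappa * (n * math.pi / L) ** 2 * t)
               * math.sin(n * math.pi * x / L)
               for n, bn in enumerate(b0, start=1))

L, kappa = 1.0, 0.5                      # illustrative values (assumptions)
T0 = lambda x: math.sin(math.pi * x / L) + 0.5 * math.sin(3.0 * math.pi * x / L)
b0 = sine_coefficients(T0, L, modes=10)

x, t = 0.3, 0.05
exact = (math.exp(-kappa * math.pi ** 2 * t) * math.sin(math.pi * x)
         + 0.5 * math.exp(-9.0 * kappa * math.pi ** 2 * t) * math.sin(3.0 * math.pi * x))
print(temperature(x, t, b0, L, kappa), exact)
```

The n = 3 mode decays nine times faster than the n = 1 mode, so at later times the profile is dominated by the lowest mode regardless of the details of the initial distribution.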

5.2.2 Poisson equation

The Poisson equation is typically used to determine a multi-dimensional potential function Φ which arises from a source function with the same dimensionality. For example, consider two-dimensional functions in (x, y):

$$
\left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right) \Phi(x, y) = -4\pi \rho(x, y) \qquad (5.15)
$$


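To find the solution for the potential Φ(x, y) for a given source function ρ(x, y), we use two-dimensional Fourier expansions for Φ(x, y) and ρ(x, y), which are taken to be defined in −a ≤ x ≤ a and −b ≤ y ≤ b, consistent with the limits of the integrals below:

$$
\Phi(x, y) = \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} c_{n,m}\, e^{i n \pi x / a} e^{i m \pi y / b}
$$

$$
c_{n,m} = \frac{1}{4ab} \int_{-a}^{a} \int_{-b}^{b} \Phi(x, y)\, e^{-i n \pi x / a} e^{-i m \pi y / b}\, dx\, dy
$$

The Poisson equation can then be solved by equating coefficients term-by-term on the two sides of the equation and obtaining the coefficients of Φ from those of ρ, which is presumed known. Specifically:

$$
\rho(x, y) = \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} d_{n,m}\, e^{i n \pi x / a} e^{i m \pi y / b}
$$

with the coefficients $d_{n,m}$ given by

$$
d_{n,m} = \frac{1}{4ab} \int_{-a}^{a} \int_{-b}^{b} \rho(x, y)\, e^{-i n \pi x / a} e^{-i m \pi y / b}\, dx\, dy
$$

and considered known, since ρ(x, y) is known. Then

$$
\left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right) \Phi(x, y) = \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} c_{n,m} \left( -\frac{n^2 \pi^2}{a^2} - \frac{m^2 \pi^2}{b^2} \right) e^{i n \pi x / a} e^{i m \pi y / b} = -4\pi \rho(x, y)
$$

and comparing the two sides term by term we obtain, for (n, m) ≠ (0, 0),

$$
c_{n,m} = d_{n,m}\, 4\pi \left( \frac{n^2 \pi^2}{a^2} + \frac{m^2 \pi^2}{b^2} \right)^{-1}
$$

which gives the solution for Φ(x, y).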
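The coefficient relation can be verified on a single mode. In the sketch below, which is only an illustration, the source is the real combination cos(nπx/a) cos(mπy/b) of the four exponentials with indices (±n, ±m), all of which share the same denominator; the values of a, b, n, m and the finite-difference step are arbitrary assumptions.

```python
import math

a, b = 1.0, 2.0                          # half-widths of the domain (illustrative)
n, m = 2, 1                              # mode indices (illustrative)
kx, ky = n * math.pi / a, m * math.pi / b

def rho(x, y):
    """Single-mode source: real combination of the exponentials with indices (+-n, +-m)."""
    return math.cos(kx * x) * math.cos(ky * y)

def phi(x, y):
    """Potential from c_{n,m} = 4 pi d_{n,m} / (n^2 pi^2/a^2 + m^2 pi^2/b^2)."""
    return 4.0 * math.pi * rho(x, y) / (kx ** 2 + ky ** 2)

def laplacian(f, x, y, h=1e-3):
    """Five-point finite-difference estimate of the two-dimensional Laplacian."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4.0 * f(x, y)) / h ** 2

x0, y0 = 0.3, 0.4
print(laplacian(phi, x0, y0), -4.0 * math.pi * rho(x0, y0))
```

The two printed numbers should agree to within the accuracy of the finite-difference stencil. The (n, m) = (0, 0) term has no counterpart in this relation, since the Laplacian of a constant vanishes.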

5.3 The δ-function and the θ-function

5.3.1 Definitions

A very useful, if somewhat unconventional, function is the so-called δ-function, also known as the Dirac function. Its definition is given by the relations:

$$
\delta(0) \to \infty, \quad \delta(x \neq 0) = 0, \quad \int_{-\infty}^{\infty} \delta(x - x')\, dx' = 1 \qquad (5.16)
$$


Figure 5.2: Definition of the θ-function and the δ-function (a → 0). [The δ-function is drawn as a peak of height ~ 1/a and width ~ a; the θ-function as a step from 0 to 1.]

that is, it is a function with an infinite peak at the zero of its argument, it is zero everywhere else, and it integrates to unity. The behavior of the δ-function is shown schematically in Fig. 5.2.

A function closely related to the δ-function is the so-called θ-function or step-function, also known as the Heaviside function:

$$
\theta(x - x') = 0, \text{ for } x < x', \quad \theta(x - x') = 1, \text{ for } x > x' \qquad (5.17)
$$

The behavior of the θ-function is shown schematically in Fig. 5.2.

From its definition, it follows that the product of the δ-function with an arbitrary function f(x) integrated over all values of x must satisfy:

$$
\int_{-\infty}^{\infty} f(x') \delta(x - x')\, dx' = f(x) \qquad (5.18)
$$

This expression can also be thought of as the convolution of the function f(x) with the δ-function, which gives back the function f(x). Thus, the δ-function picks out one value of another function f(x) when the product of the two is integrated over the entire range of values of f(x). The value of f(x) that is picked by this operation corresponds to the value of x for which the argument of the δ-function vanishes. The θ-function picks up a set of values of the function f(x) when the product of the two is integrated over the entire range of values of f(x). The set of values of f(x) that is picked by this operation corresponds to the values of x for which the argument of the θ-function is positive.

The δ-function is not a function in the usual sense; it is represented by a limit of usual functions. For example, a simple generalization of the Kronecker δ is:

$$w(a;x)=\begin{cases}\dfrac{1}{2a}, & \text{for } |x|\le a\\[6pt] 0, & \text{for } |x|>a\end{cases} \tag{5.19}$$

Figure 5.3: Examples of simple functions which in the proper limit become δ-functions: the window function w(a; x) defined in Eq. (5.19) and the triangular spike function, defined in Eq. (5.20).

We will refer to this as the “window” function, since its product with any other function picks out the values of that function in the range −a ≤ x ≤ a, that is, in a window of width 2a in x around zero (the window can be moved to any other point x0 on the x-axis by replacing x by x − x0 in the above relations). Taking the limit of w(a; x) for a → 0 produces a function with the desired behavior to represent the δ-function. Another simple representation of the δ-function is given by the triangular spike function:

$$s(b;x)=\begin{cases}\dfrac{1}{b}\left[1-\dfrac{|x|}{b}\right], & \text{for } |x|\le b\\[6pt] 0, & \text{for } |x|>b\end{cases} \tag{5.20}$$

which for b → 0 produces a δ-function. The window function and the triangular spike function are shown in Fig. 5.3. The triangular spike function is a continuous function of x whose first derivative is discontinuous, while the window function is itself discontinuous at x = ±a.
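The limiting behavior of the two representations can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `integrate` and `sift`, the midpoint rule and the test function cos(x) are all choices made here for illustration. It evaluates the integral of f(x′) times each kernel and shows that the result approaches f(x0) as the width shrinks, as Eq. (5.18) requires.

```python
import math

def window(a, x):
    """Window function w(a; x) of Eq. (5.19)."""
    return 1.0 / (2.0 * a) if abs(x) <= a else 0.0

def spike(b, x):
    """Triangular spike function of Eq. (5.20)."""
    return (1.0 - abs(x) / b) / b if abs(x) <= b else 0.0

def integrate(g, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of g over [lo, hi] (illustrative helper)."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))

def sift(kernel, width, f, x0=0.0):
    """Integral of f(x') * kernel(width, x0 - x') over the support of the kernel."""
    return integrate(lambda xp: f(xp) * kernel(width, x0 - xp),
                     x0 - width, x0 + width)

# As the width shrinks, both integrals approach f(0) = cos(0) = 1.
for width in (1.0, 0.1, 0.01):
    print(width, sift(window, width, math.cos), sift(spike, width, math.cos))
```

For a finite width the two kernels give slightly different values, because they average f over the window with different weights; only the limit is common to both.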

5.4 The Fourier transform

5.4.1 Definition of Fourier Transform

The Fourier transform is the limit of the Fourier expansion for non-periodic functions that are defined in the interval −∞ < x < ∞. We need to make the following assumption in order to make sure that the Fourier transform is a mathematically meaningful expression:

$$\int_{-\infty}^{\infty}|f(x)|\,dx\le M$$

where M is a finite number.

To derive the Fourier transform, we start with the Fourier expansion:

$$f(x)=\sum_{n=-\infty}^{\infty}c_n e^{in\pi x/L},\qquad c_n=\frac{1}{2L}\int_{-L}^{L}f(x)e^{-in\pi x/L}\,dx$$

Define a variable ω = nπ/L. Note that, when L → ∞, the spacing of values of ω becomes infinitesimal. From the Fourier expansion coefficients we obtain:

$$\lim_{L\to\infty}(2Lc_n)=\int_{-\infty}^{\infty}f(x)e^{-i\omega x}\,dx\equiv\tilde f(\omega)$$

where $\tilde f(\omega)$ is a finite quantity, because:

$$\left|\int_{-\infty}^{\infty}f(x)e^{-i\omega x}\,dx\right|\le\int_{-\infty}^{\infty}|f(x)|\,dx\le M$$

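From the expression for f(x) in terms of the Fourier expansion, we obtain:

$$f(x)=\sum_{n=-\infty}^{\infty}c_n e^{in\pi x/L}=\sum_{n=-\infty}^{\infty}(2Lc_n)\,e^{in\pi x/L}\,\frac{1}{2L}=\sum_{n=-\infty}^{\infty}\tilde f(\omega)\,e^{i\omega x}\,\frac{1}{2L}$$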

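The spacing of values of the new variable ω that we introduced above is:

$$\Delta\omega=\frac{\pi}{L}\ \Rightarrow\ \frac{1}{2L}=\frac{\Delta\omega}{2\pi}\ \Rightarrow\ \lim_{L\to\infty}\left(\frac{1}{2L}\right)=\frac{d\omega}{2\pi}$$

and therefore, the infinite sum can be turned into an integral in the continuous variable ω:

$$f(x)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\tilde f(\omega)\,e^{i\omega x}\,d\omega$$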
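The limit that turns the Fourier coefficients into the Fourier transform can be watched numerically. The sketch below is only an illustration under assumptions made here: the test function is f(x) = e^{−|x|}, whose transform 2/(1 + ω²) is a standard result not taken from the text, and `scaled_coefficient` is a hypothetical helper using the midpoint rule. It evaluates 2L c_n at fixed ω = nπ/L = 1 for increasing L.

```python
import cmath
import math

def scaled_coefficient(f, n, L, steps=20000):
    """2L * c_n, i.e. the integral over [-L, L] of f(x) exp(-i n pi x / L) dx (midpoint rule)."""
    h = 2.0 * L / steps
    total = 0j
    for k in range(steps):
        x = -L + (k + 0.5) * h
        total += f(x) * cmath.exp(-1j * n * math.pi * x / L)
    return total * h

def f(x):
    """Test function exp(-|x|), an assumption made for this illustration."""
    return math.exp(-abs(x))

def exact_transform(omega):
    """Known Fourier transform of exp(-|x|)."""
    return 2.0 / (1.0 + omega * omega)

# Choosing L = m*pi and n = m keeps omega = n*pi/L = 1 while L grows.
for m in (1, 4, 16):
    L = m * math.pi
    print(L, scaled_coefficient(f, m, L).real, exact_transform(1.0))
```

The printed values approach the exact transform at ω = 1 as L increases, which is the content of the limit L → ∞ in the derivation above.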

5.4.2 Properties of the Fourier transform

We discuss several properties of the FT, which can be easily derived from its definition:

1. Linearity: If $\tilde f(\omega)$ and $\tilde g(\omega)$ are the FT’s of the functions f(x) and g(x), then the FT of the linear combination of f(x) and g(x) with two arbitrary constants a, b, is:

$$af(x)+bg(x)\longrightarrow a\tilde f(\omega)+b\tilde g(\omega) \tag{5.21}$$

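2. Shifting: Given that $\tilde f(\omega)$ is the FT of f(x), we can obtain the FT of f(x − x0):

$$\int_{-\infty}^{\infty}f(x-x_0)e^{-i\omega x}\,dx=e^{-i\omega x_0}\int_{-\infty}^{\infty}f(x-x_0)e^{-i\omega(x-x_0)}\,d(x-x_0)=e^{-i\omega x_0}\tilde f(\omega) \tag{5.22}$$

Similarly, the FT of $e^{i\omega_0 x}f(x)$ is:

$$\int_{-\infty}^{\infty}f(x)e^{i\omega_0 x}e^{-i\omega x}\,dx=\int_{-\infty}^{\infty}f(x)e^{-i(\omega-\omega_0)x}\,dx=\tilde f(\omega-\omega_0) \tag{5.23}$$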

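3. Derivative: Given that $\tilde f(\omega)$ is the FT of f(x), we can obtain the FT of f′(x), assuming that f(x) → 0 for x → ±∞:

$$\int_{-\infty}^{\infty}f'(x)e^{-i\omega x}\,dx=\left[f(x)e^{-i\omega x}\right]_{-\infty}^{\infty}-\int_{-\infty}^{\infty}f(x)(-i\omega)e^{-i\omega x}\,dx=i\omega\tilde f(\omega) \tag{5.24}$$

Similarly, the FT of $f^{(n)}(x)$ is given by $(i\omega)^n\tilde f(\omega)$.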

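4. Moments: The nth moment of the function f(x) is defined as:

$$\int_{-\infty}^{\infty}x^n f(x)\,dx$$

From the definition of the FT we see that the 0th moment is $\tilde f(0)$. We take the derivative of the FT with respect to ω:

$$\frac{d\tilde f}{d\omega}(\omega)=\int_{-\infty}^{\infty}f(x)(-ix)e^{-i\omega x}\,dx\ \Rightarrow\ i\frac{d\tilde f}{d\omega}(0)=\int_{-\infty}^{\infty}xf(x)\,dx \tag{5.25}$$

This is easily generalized to the expression for the nth moment:

$$\int_{-\infty}^{\infty}x^n f(x)\,dx=i^n\frac{d^n\tilde f}{d\omega^n}(0) \tag{5.26}$$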


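5. Convolution: Consider two functions f(x), g(x) and their FT’s $\tilde f(\omega)$, $\tilde g(\omega)$. The convolution f ⋆ g(x) is defined as:

$$f\star g(x)\equiv\int_{-\infty}^{\infty}f(t)g(x-t)\,dt=h(x) \tag{5.27}$$

The Fourier transform of the convolution is:

$$\tilde h(\omega)=\int_{-\infty}^{\infty}h(x)e^{-i\omega x}\,dx=\int_{-\infty}^{\infty}dx\,e^{-i\omega x}\left[\int_{-\infty}^{\infty}f(t)g(x-t)\,dt\right]$$

$$=\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty}dx\,e^{-i\omega(x-t)}g(x-t)\right]e^{-i\omega t}f(t)\,dt=\tilde f(\omega)\tilde g(\omega) \tag{5.28}$$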

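6. Correlation: Consider two functions f(x), g(x) and their FT’s $\tilde f(\omega)$, $\tilde g(\omega)$. The correlation c[f, g](x) is defined as:

$$c[f,g](x)\equiv\int_{-\infty}^{\infty}f(t+x)g(t)\,dt=\int_{-\infty}^{\infty}f(t)g(t-x)\,dt \tag{5.29}$$

The Fourier transform of the correlation is:

$$\tilde c(\omega)=\int_{-\infty}^{\infty}c[f,g](x)e^{-i\omega x}\,dx=\int_{-\infty}^{\infty}e^{-i\omega x}\left[\int_{-\infty}^{\infty}f(t+x)g(t)\,dt\right]dx$$

$$=\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty}e^{-i\omega(t+x)}f(t+x)\,dx\right]e^{-i(-\omega)t}g(t)\,dt=\tilde f(\omega)\tilde g(-\omega) \tag{5.30}$$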

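7. FT of δ-function: To derive the Fourier transform of the δ-function we start with its Fourier expansion representation as in Eq. (5.11):

$$\delta(x)=\sum_{n=-\infty}^{\infty}d_n e^{i\frac{n\pi}{L}x}\ \Rightarrow\ d_n=\frac{1}{2L}\int_{-L}^{L}\delta(x)e^{-i\frac{n\pi}{L}x}\,dx=\frac{1}{2L} \tag{5.31}$$

which in the limit L → ∞ produces:

$$\tilde\delta(\omega)=\lim_{L\to\infty}(2Ld_n)=1 \tag{5.32}$$

Therefore, the inverse Fourier transform of the δ-function gives:

$$\delta(x-x')=\frac{1}{2\pi}\int_{-\infty}^{\infty}\tilde\delta(\omega)e^{i\omega(x-x')}\,d\omega=\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{i\omega(x-x')}\,d\omega \tag{5.33}$$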
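The shifting and derivative properties, Eqs. (5.22) and (5.24), lend themselves to a quick numerical check. This is a sketch under assumptions made here, not the author's procedure: the test function is the unnormalized gaussian e^{−x²/2}, and `ft` is a hypothetical helper that approximates the transform integral by the midpoint rule on a finite interval.

```python
import cmath
import math

def ft(f, omega, lo=-20.0, hi=20.0, steps=4000):
    """Midpoint-rule approximation of the integral of f(x) exp(-i omega x) dx."""
    h = (hi - lo) / steps
    total = 0j
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += f(x) * cmath.exp(-1j * omega * x)
    return total * h

def f(x):
    """Test function exp(-x^2/2) (an assumption for this illustration)."""
    return math.exp(-x * x / 2.0)

def df(x):
    """Derivative of the test function."""
    return -x * math.exp(-x * x / 2.0)

x0, omega = 0.7, 1.3

# Shifting, Eq. (5.22): FT of f(x - x0) versus exp(-i omega x0) times FT of f.
shifted = ft(lambda x: f(x - x0), omega)
shift_rule = cmath.exp(-1j * omega * x0) * ft(f, omega)

# Derivative, Eq. (5.24): FT of f'(x) versus i omega times FT of f.
derivative = ft(df, omega)
derivative_rule = 1j * omega * ft(f, omega)

print(abs(shifted - shift_rule), abs(derivative - derivative_rule))
```

Both differences come out at the level of rounding error, since the midpoint rule is very accurate for smooth, rapidly decaying integrands.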


We give below several examples of Fourier transforms.

Example 5.3: Consider the function f(x) defined as

$$f(x)=\begin{cases}e^{-ax}, & x>0\\ 0, & x<0\end{cases}$$

The FT of this function is

$$\tilde f(\omega)=\int_{0}^{\infty}e^{-i\omega x-ax}\,dx=\frac{1}{i(\omega-ia)}$$

which is easily established by the regular rules of integration. The inverse FT gives:

$$f(x)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{i\omega x}}{i(\omega-ia)}\,d\omega$$

To perform this integral we use contour integration: For x > 0, we close the contour in the upper half plane to satisfy Jordan’s Lemma,

$$f(x)=\frac{1}{2\pi}\,2\pi i\,(\text{Residue at }\omega=ia)=e^{-ax}$$

For x < 0, we close the contour in the lower half plane to satisfy Jordan’s Lemma,

$$f(x)=0\quad(\text{no residues in contour})$$

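Example 5.4: Consider the function f(x) defined as

$$f(x)=\begin{cases}1, & \text{for } -1<x<1\\ 0, & \text{otherwise}\end{cases}$$

The FT of this function is

$$\tilde f(\omega)=\int_{-1}^{1}e^{-i\omega x}\,dx=\frac{1}{-i\omega}\left(e^{-i\omega}-e^{i\omega}\right)=\frac{2\sin\omega}{\omega}$$

The inverse FT gives:

$$f(x)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\left[\frac{e^{i\omega}}{i\omega}-\frac{e^{-i\omega}}{i\omega}\right]e^{i\omega x}\,d\omega=\frac{1}{2\pi i}\left[\int_{-\infty}^{\infty}\frac{e^{i\omega(x+1)}}{\omega}\,d\omega-\int_{-\infty}^{\infty}\frac{e^{i\omega(x-1)}}{\omega}\,d\omega\right]$$

The last two integrals, which we call I1 and I2 respectively, can be done by contour integration:

$$I_1=\begin{cases}\pi i, & \text{for } x+1>0\\ -\pi i, & \text{for } x+1<0\end{cases}\qquad I_2=\begin{cases}\pi i, & \text{for } x-1>0\\ -\pi i, & \text{for } x-1<0\end{cases}$$

Combining the results, with f(x) = (I1 − I2)/(2πi), we find:

−∞ < x < −1 : both x + 1 < 0 and x − 1 < 0 → f(x) = 0

−1 < x < 1 : x + 1 > 0 and x − 1 < 0 → f(x) = 1

1 < x < ∞ : both x + 1 > 0 and x − 1 > 0 → f(x) = 0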

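Example 5.5: Consider the function defined as

$$f(x)=\begin{cases}1, & \text{for } -1<x<1\\ 0, & \text{otherwise}\end{cases} \tag{5.34}$$

and calculate its moments using FT:

$$\tilde f(\omega)=\frac{2\sin\omega}{\omega}=\frac{2}{\omega}\left[\omega-\frac{\omega^3}{6}+\frac{\omega^5}{120}-\cdots\right]=2-\frac{1}{3}\omega^2+\frac{1}{60}\omega^4-\cdots$$

From this we obtain:

$$\tilde f(0)=2,\quad i\frac{d\tilde f}{d\omega}(0)=0,\quad i^2\frac{d^2\tilde f}{d\omega^2}(0)=\frac{2}{3},\quad i^3\frac{d^3\tilde f}{d\omega^3}(0)=0,\quad i^4\frac{d^4\tilde f}{d\omega^4}(0)=\frac{2}{5},\ \cdots$$

which give the 0th, 1st, 2nd, 3rd, 4th, ... moments of f(x). In this case the results can be easily verified by evaluating

$$\int_{-\infty}^{\infty}f(x)x^n\,dx=\int_{-1}^{1}x^n\,dx=\frac{2}{n+1}$$

for n even, and 0 for n odd.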
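The bookkeeping in this example can be reproduced with exact rational arithmetic. The sketch below is illustrative only, with function names chosen here. It reads the derivatives at ω = 0 off the Taylor coefficients of 2 sin ω/ω, applies the factor i^n of Eq. (5.26), and compares with the direct integral.

```python
from fractions import Fraction
from math import factorial

def moment_from_series(n):
    """n-th moment of the unit window from the Taylor series of 2 sin(w)/w, via Eq. (5.26)."""
    if n % 2 == 1:
        return Fraction(0)                       # the series has only even powers of w
    k = n // 2
    coefficient = Fraction(2 * (-1) ** k, factorial(2 * k + 1))  # coefficient of w^(2k)
    derivative_at_zero = coefficient * factorial(n)              # n-th derivative at w = 0
    return (-1) ** k * derivative_at_zero                        # i^n = (-1)^k for n = 2k

def moment_direct(n):
    """Integral of x^n over [-1, 1]."""
    return Fraction(2, n + 1) if n % 2 == 0 else Fraction(0)

for n in range(7):
    print(n, moment_from_series(n), moment_direct(n))
```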

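Example 5.6: Consider the function f(x) which has a discontinuity ∆f at x0:

$$\Delta f=\lim_{\epsilon\to 0}\left[f(x_0+\epsilon)-f(x_0-\epsilon)\right]$$

We want to calculate the FT of this function:

$$\tilde f(\omega)=\int_{-\infty}^{\infty}e^{-i\omega x}f(x)\,dx=\lim_{\epsilon\to 0}\left[\int_{-\infty}^{x_0-\epsilon}e^{-i\omega x}f(x)\,dx+\int_{x_0+\epsilon}^{\infty}e^{-i\omega x}f(x)\,dx\right]$$

$$=\lim_{\epsilon\to 0}\left\{\left[\frac{e^{-i\omega x}}{-i\omega}f(x)\right]_{-\infty}^{x_0-\epsilon}+\left[\frac{e^{-i\omega x}}{-i\omega}f(x)\right]_{x_0+\epsilon}^{\infty}\right\}-\int_{-\infty}^{\infty}\frac{e^{-i\omega x}}{-i\omega}f'(x)\,dx$$

$$=\frac{e^{-i\omega x_0}}{i\omega}\Delta f+\frac{1}{i\omega}\int_{-\infty}^{\infty}e^{-i\omega x}f'(x)\,dx$$

where we have assumed that f(x) → 0 for x → ±∞, which is required for the FT to be meaningful. Similarly, a discontinuity $\Delta f^{(n)}$ in the nth derivative contributes the following term to the FT:

$$\frac{e^{-i\omega x_0}}{(i\omega)^{n+1}}\Delta f^{(n)}$$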

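Example 5.7: Calculate the FT of the gaussian function:

$$f(x)=\frac{1}{a\sqrt{2\pi}}e^{-x^2/2a^2}\ \longrightarrow\ \tilde f(\omega)=\frac{1}{a\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-x^2/2a^2-i\omega x}\,dx$$

Notice that we are dealing with a properly normalized gaussian function. Complete the square in the exponential:

$$\frac{x^2}{2a^2}+i\omega x=\left(\frac{x}{a\sqrt{2}}\right)^2+2\left(\frac{x}{a\sqrt{2}}\right)\left(\frac{i\omega a}{\sqrt{2}}\right)-\frac{\omega^2a^2}{2}+\frac{\omega^2a^2}{2}=\left(\frac{x}{a\sqrt{2}}+\frac{i\omega a}{\sqrt{2}}\right)^2+\frac{\omega^2a^2}{2}$$

which when inserted in the integral, with the change of variables $z=x/a\sqrt{2}+i\omega a/\sqrt{2}\Rightarrow dz=d(x/a\sqrt{2})$, gives:

$$\tilde f(\omega)=\left[\frac{1}{\sqrt{\pi}}\int_{x=-\infty}^{x=\infty}e^{-z^2}\,dz\right]e^{-\omega^2a^2/2}=e^{-\omega^2a^2/2}$$

where we have used the fact that

$$\int_{x=-\infty}^{x=\infty}e^{-z^2}\,dz=\sqrt{\pi},\qquad z=x+i\omega$$

as we showed in chapter 1 (see footnote 1). Therefore the FT of a gaussian is a gaussian. Notice that for a → 0 the gaussian in x becomes a δ-function, and its FT becomes 1 for all ω. The inverse FT also involves a gaussian integral which is again performed by completing the square.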
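The result of this example is easy to confirm numerically. The following sketch, whose helper names, integration range of ±12a and step count are choices made here rather than anything from the text, integrates the normalized gaussian against e^{−iωx} by the midpoint rule and compares with e^{−ω²a²/2}.

```python
import cmath
import math

def gaussian(x, a):
    """Normalized gaussian of width a, as in Example 5.7."""
    return math.exp(-x * x / (2.0 * a * a)) / (a * math.sqrt(2.0 * math.pi))

def ft_gaussian(omega, a, steps=8000):
    """Midpoint-rule estimate of the FT of the gaussian, integrating over [-12a, 12a]."""
    half_width = 12.0 * a
    h = 2.0 * half_width / steps
    total = 0j
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        total += gaussian(x, a) * cmath.exp(-1j * omega * x)
    return total * h

for a in (1.0, 0.1, 0.01):
    for omega in (0.0, 1.0, 3.0):
        numeric = ft_gaussian(omega, a).real
        print(a, omega, numeric, math.exp(-omega ** 2 * a ** 2 / 2.0))
```

For the smallest width the transform is close to 1 over the whole range of ω shown, in line with the remark that the gaussian becomes a δ-function for a → 0.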

Example 5.8: Which function gives

$$\tilde f(\omega)=\frac{1}{i\omega}$$

as its Fourier transform?

$$f(x)=\frac{1}{2\pi}\int_{-\infty}^{\infty}d\omega\,\frac{e^{i\omega x}}{i\omega}$$

This is evaluated by contour integration, closing in the upper half plane for x > 0, which gives f(x) = 1/2, and in the lower half plane for x < 0, which gives f(x) = −1/2. The function f(x) then is: f(x) = 1/2, for x > 0; f(x) = −1/2, for x < 0. This is just like the Heaviside step function, shifted down by 1/2.

Example 5.9: Which function gives $\tilde g(\omega)=\pi\delta(\omega)$ as its Fourier transform? Here we define the δ-function so that:

$$\int_{-\infty}^{\infty}\delta(x)\,dx=1\ \Rightarrow\ \int_{-\infty}^{\infty}\delta(x)f(x)\,dx=f(0)$$

The inverse FT then gives:

$$g(x)=\frac{1}{2\pi}\int_{-\infty}^{\infty}d\omega\,e^{i\omega x}\,\pi\delta(\omega)=\frac{1}{2}$$

Using the linearity of the FT, we find that the function whose FT is

$$\tilde g(\omega)+\tilde f(\omega)=\pi\delta(\omega)+\frac{1}{i\omega}$$

is the sum of the function g(x) = 1/2 for all x, and the function f(x) of the previous example, which gives the following function: 1, for x > 0; 0, for x < 0, that is, the Heaviside step function.

Example 5.10: From the shifting properties of the FT and Example 5.9, we deduce that the function whose FT is

$$\tilde f(\omega)=\pi\delta(\omega-\omega_0)+\pi\delta(\omega+\omega_0)$$

is given by:

$$\frac{1}{2}e^{i\omega_0x}+\frac{1}{2}e^{-i\omega_0x}=\cos(\omega_0x)$$

The function whose FT is

$$\tilde f(\omega)=-i\pi\delta(\omega-\omega_0)+i\pi\delta(\omega+\omega_0)$$

is given by:

$$-i\frac{1}{2}e^{i\omega_0x}+i\frac{1}{2}e^{-i\omega_0x}=\sin(\omega_0x)$$

Footnote 1: Notice that this result was proven for a real variable x. However, it is easy to extend it to a complex variable z = x + iω by the same trick we used to prove the original result, and simply shifting the origin of both axes by −iω.

5.4.3 Singular transforms - limiting functions

We have seen in the previous section that

$$\tilde f(\omega)=\frac{1}{i\omega} \tag{5.35}$$

is the Fourier transform of the function

$$f(x)=\frac{1}{2},\quad\text{for } x>0 \tag{5.36}$$

$$\phantom{f(x)}=-\frac{1}{2},\quad\text{for } x<0 \tag{5.37}$$

This FT is singular, i.e. it blows up for ω → 0, which contradicts our earlier assumption that $\tilde f(\omega)$ is bounded. The reason for the contradiction is that f(x) is discontinuous:

$$f(x)\to\frac{1}{2},\ x\to 0^+\qquad f(x)\to-\frac{1}{2},\ x\to 0^-$$

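If we attempted to calculate the FT from f(x) we would take:

$$\tilde f(\omega)=\int_{-\infty}^{\infty}e^{-i\omega x}f(x)\,dx=-\frac{1}{2}\int_{-\infty}^{0^-}e^{-i\omega x}\,dx+\frac{1}{2}\int_{0^+}^{\infty}e^{-i\omega x}\,dx$$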

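The limits at ±∞ can be evaluated in the principal value sense, R → ∞:

$$\tilde f(\omega)=-\frac{1}{2}\int_{-R}^{0^-}e^{-i\omega x}\,dx+\frac{1}{2}\int_{0^+}^{R}e^{-i\omega x}\,dx=\frac{1}{2i\omega}\left[1-e^{i\omega R}+1-e^{-i\omega R}\right]$$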

In the last expression, $[e^{i\omega R}+e^{-i\omega R}]=2\cos(\omega R)$ vanishes for R → ∞ because the argument of the cosine goes through 2π for infinitesimal changes in ω, and the average of the cosine function is 0. Then, what is left in the above equation is exactly the relation of Eq. (5.35).

Figure 5.4: The function sin(ωR)/ω, of height ~ R and width ~ 1/R, which behaves just like a δ(ω) function for R → ∞.

We have also seen in the previous section that

$$\tilde f(\omega)=\pi\delta(\omega) \tag{5.38}$$

is the Fourier transform of

$$f(x)=\frac{1}{2},\quad\text{for all } x$$

This FT is singular, i.e. it blows up for ω → 0, which contradicts our earlier assumption that $\tilde f(\omega)$ is bounded. The reason for the contradiction is that the integral of f(x) is not bounded:

$$\int_{-\infty}^{\infty}(1/2)\,dx\to\infty$$

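If we attempted to calculate $\tilde f(\omega)$ from f(x) we would have:

$$\frac{1}{2}\int_{-\infty}^{\infty}e^{-i\omega x}\,dx=\frac{1}{-2i\omega}\lim_{R\to\infty}\left[e^{-i\omega x}\right]_{-R}^{R}=\lim_{R\to\infty}\frac{\sin(\omega R)}{\omega}$$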

The function sin(ωR)/(ωR) is 1 at ω = 0, falls off for ω → ∞, and has width π/R, obtained from the first zero of sin(ωR) which occurs at ωR = π. This behavior is illustrated in Fig. 5.4. Therefore, the function sin(ωR)/ω has height ∼ R and width ∼ 1/R. Its width goes to 0 and its height goes to ∞ when R → ∞, while its integral is:

$$\int_{-\infty}^{\infty}\lim_{R\to\infty}\frac{e^{i\omega R}-e^{-i\omega R}}{2i\omega}\,d\omega=2\pi\,\frac{1}{2\pi}\int_{-\infty}^{\infty}\tilde f(\omega)e^{i\omega 0}\,d\omega=2\pi f(0)=\pi$$

Therefore, the function sin(ωR)/ω indeed behaves just like a δ-function in the variable ω for R → ∞, but is normalized so that it gives π rather than 1 when integrated over all values of ω, so we can identify

$$\lim_{R\to\infty}\left(\frac{\sin(\omega R)}{\omega}\right)=\pi\delta(\omega)$$
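This identification can be tested by integrating sin(ωR)/ω against a smooth function g(ω) and checking that the result tends to π g(0). The sketch below uses g(ω) = e^{−ω²/2}, a choice made here for illustration. For that choice the integral has the closed form π erf(R/√2), obtained by differentiating with respect to R and using the gaussian integral; this formula is not in the text and is used only as a cross-check.

```python
import math

def smeared(R, steps=40000, half_width=10.0):
    """Midpoint-rule integral of [sin(w R)/w] * exp(-w^2/2) over [-half_width, half_width].

    With an even number of steps no midpoint falls on w = 0.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        w = -half_width + (k + 0.5) * h
        total += math.sin(w * R) / w * math.exp(-w * w / 2.0)
    return total * h

for R in (1.0, 3.0, 20.0):
    # Columns: R, numerical integral, closed form pi*erf(R/sqrt(2)), limit pi*g(0) = pi
    print(R, smeared(R), math.pi * math.erf(R / math.sqrt(2.0)), math.pi)
```

Already for moderate R the integral is indistinguishable from π g(0) = π, which is the sense in which sin(ωR)/ω acts as πδ(ω).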

5.5 Fourier analysis of signals

In the following we will consider functions of a real variable and their FT’s. We will assume that the real variable represents time and denote it by t, while the variable that appears in the FT represents frequency, and denote it by ω (see footnote 2). Very often, when we are trying to measure a time signal f(t), it is convenient to measure its frequency content, that is, its Fourier transform $\tilde f(\omega)$. This is because we can construct equipment that responds accurately to specific frequencies. Then, we can reconstruct the original signal f(t) by doing an inverse Fourier transform on the measured signal $\tilde f(\omega)$ in the frequency domain:

$$f(t)\ \xrightarrow{\ \mathrm{FT}\ }\ \tilde f(\omega)\ \xrightarrow{\ \mathrm{IFT}\ }\ f(t)$$

Some useful notions in doing this type of analysis are the total power and the spectral density. The total power of the signal is defined as:

$$\text{total power}:\quad P=\int_{-\infty}^{\infty}|f(t)|^2\,dt \tag{5.39}$$

while the spectral density is defined as

$$\text{spectral density}:\quad |\tilde f(\omega)|^2 \tag{5.40}$$

A useful relation linking these two notions is Parseval’s theorem

$$\int_{-\infty}^{\infty}|\tilde f(\omega)|^2\,d\omega=2\pi P \tag{5.41}$$

Footnote 2: The variables t and ω are referred to as “conjugate”, and their physical dimensions are the inverse of each other, so that their product is dimensionless.


We can prove Parseval’s theorem with the use of the FT of the δ-function, Eq. (5.33). From the definition of the FT of f(t) we have:

$$\tilde f(\omega)=\int_{-\infty}^{\infty}e^{-i\omega t}f(t)\,dt\ \Rightarrow\ |\tilde f(\omega)|^2=\int_{-\infty}^{\infty}e^{-i\omega t}f(t)\,dt\int_{-\infty}^{\infty}e^{i\omega t'}\bar f(t')\,dt'$$

where $\bar f$ is the complex conjugate of f. Integrating the above expression for $|\tilde f(\omega)|^2$ over all values of ω leads to:

$$\int_{-\infty}^{\infty}|\tilde f(\omega)|^2\,d\omega=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f(t)\bar f(t')\left[\int_{-\infty}^{\infty}e^{i\omega(t'-t)}\,d\omega\right]dt\,dt'$$

$$=2\pi\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f(t)\bar f(t')\,\delta(t-t')\,dt\,dt'=2\pi\int_{-\infty}^{\infty}|f(t)|^2\,dt$$

where we have used Eq. (5.33) to perform the integration over ω in the square brackets, which produced a δ-function with argument t − t′, and then performed an integration over one of the two time variables. The result is Parseval’s theorem.
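Eq. (5.41) can be checked on a signal whose transform is known in closed form. The sketch below takes the normalized gaussian of Example 5.7 as the time signal, so that its FT is e^{−ω²a²/2}; the width a = 0.8, the integration ranges and the helper names are arbitrary choices made here.

```python
import math

def integrate(g, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of g over [lo, hi] (illustrative helper)."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))

def parseval_check(a):
    """Return (integral of |FT|^2 over omega, 2*pi*P) for a normalized gaussian of width a."""
    signal = lambda t: math.exp(-t * t / (2.0 * a * a)) / (a * math.sqrt(2.0 * math.pi))
    transform = lambda w: math.exp(-w * w * a * a / 2.0)   # FT from Example 5.7
    total_power = integrate(lambda t: signal(t) ** 2, -12.0 * a, 12.0 * a)
    spectral_integral = integrate(lambda w: transform(w) ** 2, -12.0 / a, 12.0 / a)
    return spectral_integral, 2.0 * math.pi * total_power

print(parseval_check(0.8))
```

Both numbers should agree with each other and with the analytic value √π/a.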

An essential feature of time-signal measurements is their sampling at finite time intervals. This leads to important complications in determining the true signal from the measured signal. To illustrate this problem, suppose that we sample a time signal f(t) at the 2N discrete time moments separated by a time interval (see footnote 3), which we will call τ:

$$t_n=n\tau,\qquad n=-N,\ldots,0,\ldots,N-1 \tag{5.42}$$

that is, the 2N values of f(tn) are what we can obtain from an experiment. Usually, we are interested in the limit of N → ∞ and τ → 0 with 2Nτ = ttot, the total duration of the measurements. Let us define the characteristic frequency ωc (also called Nyquist frequency)

$$\omega_c=\frac{\pi}{\tau}$$

Then, assume that we are dealing with a “band-width limited” signal, that is, a signal whose FT vanishes for frequencies in the range |ω| > ωc:

$$\tilde f(\omega)=0\quad\text{for } \omega<-\omega_c \text{ and } \omega>\omega_c$$

For such a signal, the so-called “sampling theorem” asserts that we can reconstruct the entire signal using the expression:

$$f(t)=\sum_{n=-N}^{N-1}f(t_n)\,\frac{\sin\omega_c(t-t_n)}{\omega_c(t-t_n)} \tag{5.43}$$

Footnote 3: This time interval is often called the “sampling rate”, but this is a misnomer because a rate has the dimensions of inverse time.


where tn are the moments in time defined in Eq. (5.42). This is a very useful expression, because it gives the time signal f(t) for all values of t (which is a continuous variable), even though we measured it only at the discrete time moments tn. In other words, the information content of the signal can be determined entirely by sampling it at the time moments tn. The reconstruction of the signal f(t) from the measurements f(tn) essentially amounts to an inverse Fourier transform.
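The reconstruction formula of Eq. (5.43) can be tried on a simple band-width limited signal. The sketch below makes several assumptions of its own: the test signal is (sin t/t)², whose FT is a triangle that vanishes for |ω| > 2 (a standard result, not derived in the text), the sampling interval is τ = 1 so that ωc = π exceeds that band limit, and N = 200 samples on each side stand in for the limit N → ∞.

```python
import math

def sinc(u):
    """sin(u)/u with the removable singularity at u = 0 filled in."""
    return 1.0 if u == 0.0 else math.sin(u) / u

def signal(t):
    """Band-limited test signal (sin t / t)^2; its FT vanishes for |omega| > 2."""
    return sinc(t) ** 2

def reconstruct(t, tau=1.0, N=200):
    """Right-hand side of Eq. (5.43) using samples at t_n = n*tau, n = -N, ..., N-1."""
    omega_c = math.pi / tau
    return sum(signal(n * tau) * sinc(omega_c * (t - n * tau)) for n in range(-N, N))

for t in (0.0, 0.5, 2.3, 7.75):
    print(t, signal(t), reconstruct(t))
```

The small residual differences come from truncating the sum at finite N; they shrink as more samples are included.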

However, typically a time signal is not band-width limited, that is, it has Fourier components for all values of ω. If we are still forced to sample such a signal at the same time moments defined above, then performing an inverse Fourier transform will create difficulties. To show this, consider two functions f1(t) and f2(t) which are described by

$$f_1(t)=e^{i\omega_1t},\qquad f_2(t)=e^{i\omega_2t},\qquad \omega_1-\omega_2=2\omega_ck,\quad k:\ \text{integer}$$

that is, they each have a unique frequency content and the two frequencies differ by an integer multiple of 2ωc. Then, if these two functions are sampled at the time moments tn, we will have

$$f_1(t_n)=e^{i\omega_1t_n}=e^{i(\omega_2+2\omega_ck)t_n}=e^{i\omega_2t_n}=f_2(t_n)$$

because from the definitions of ωc and tn we have:

$$e^{i2\omega_ckt_n}=e^{i2\pi kn}=1$$

The result is that the two functions will appear identical when sampled at the time moments tn.

We can apply this analysis to the sampling of a signal f(t). If the signal contains frequencies that differ by integer multiples of 2ωc, then the contributions of these frequencies will not be differentiated. This effect is called “aliasing”. When the FT of the signal is measured by sampling the signal at intervals τ apart, the components that fall outside the interval −ωc ≤ ω ≤ ωc will appear as corresponding to a frequency within that interval, from which they differ by 2ωc. For instance, the contribution of the frequency ωc + δω (where δω > 0) will appear at the frequency −ωc + δω because these two frequencies differ by 2ωc. Similarly, the contribution of the frequency −ωc − δω will appear at the frequency ωc − δω because these two frequencies differ by 2ωc. The net result is that the frequency spectrum outside the limits −ωc ≤ ω ≤ ωc gets mapped to the spectrum within these limits, which significantly alters the measured spectrum from the true one, at least in the regions near ±ωc. Fortunately, typical signals have spectra that fall off with the magnitude of the frequency, so that the aliasing problem is not severe. Moreover, increasing the value of ωc, that is, reducing the value of the time interval τ at which the signal is measured, tends to eliminate the aliasing problem for most signals by simply pushing ωc to values where the signal has essentially died off. Typically, only the positive portion of the frequency spectrum is measured, in which case the negative portion appears in the upper half of the spectrum by aliasing: this is equivalent to shifting all frequencies in the range [−ωc, 0] by 2ωc, so that their values appear in the range [ωc, 2ωc].

Figure 5.5: Illustration of aliasing: the tails for ω < −ωc and ω > ωc get mapped to values within the interval [−ωc, ωc], giving a measured signal different than the true signal.

Example 5.11: As a practical demonstration, consider the following time signal which at first sight appears quite noisy:

$$f(t)=\sin(25(2\pi)t)\cos(60(2\pi)t)\sin(155(2\pi)t)+1$$

(the signal is shifted by unity for ease of visualization). This signal is sampled at the 2N = 512 time moments

$$t_n=n\tau,\quad n=-256,\ldots,0,\ldots,255,\quad\text{where } \tau=\frac{1}{512}$$

In this case, the Nyquist frequency is evidently given by

$$\omega_c=\frac{\pi}{\tau}=256\,(2\pi)$$

that is, 256 in units of 2π.

The Fourier analysis of the signal should reveal its characteristic frequencies. Expressing the signal in terms of complex exponentials, we obtain:

$$f(t)=1-\frac{1}{8}\left[\left(e^{i25(2\pi)t}-e^{-i25(2\pi)t}\right)\left(e^{i60(2\pi)t}+e^{-i60(2\pi)t}\right)\left(e^{i155(2\pi)t}-e^{-i155(2\pi)t}\right)\right]$$

from which we expect the FT to contain δ-functions at all the possible combinations ±ω1 ± ω2 ± ω3 of the three frequencies (in units of 2π)

$$\omega_1=25,\qquad \omega_2=60,\qquad \omega_3=155$$

Indeed, as is seen in Fig. 5.6, there are four very prominent peaks in the Fourier transform of the signal in the interval [1, 256] (see footnote 4), which are due to the four possible combinations of frequencies that appear in the signal, namely:

+ω1 + ω2 + ω3 = +25 + 60 + 155 = 240

−ω1 + ω2 + ω3 = −25 + 60 + 155 = 190

+ω1 − ω2 + ω3 = +25 − 60 + 155 = 120

−ω1 − ω2 + ω3 = −25 − 60 + 155 = 70

The other combinations of the frequencies which have negative values appear in the [257, 512] range by aliasing:

−ω1 − ω2 − ω3 + 512 = 512 − 240 = 272

+ω1 − ω2 − ω3 + 512 = 512 − 190 = 322

−ω1 + ω2 − ω3 + 512 = 512 − 120 = 392

+ω1 + ω2 − ω3 + 512 = 512 − 70 = 442

Notice that all frequencies have equal Fourier components, as expected from the form of the signal.
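The peak positions listed in this example can be reproduced with a direct discrete Fourier sum. The sketch below is written in plain Python and is not the notebook that produced Fig. 5.6; it samples the signal at n/512 and divides the sum by √M, which is assumed here to correspond to the scale of the plotted transform. The threshold of 1.0 used to pick out peaks is an arbitrary choice.

```python
import cmath
import math

N = 512
# Samples of the signal at t_n = n/512 for n = -256, ..., 255
signal = [math.sin(25 * 2 * math.pi * n / N)
          * math.cos(60 * 2 * math.pi * n / N)
          * math.sin(155 * 2 * math.pi * n / N) + 1.0
          for n in range(-N // 2, N // 2)]

def dft_magnitude(samples, k):
    """|sum_j samples[j] exp(-2 pi i j k / M)| / sqrt(M) for a list of M samples."""
    M = len(samples)
    total = sum(s * cmath.exp(-2j * math.pi * j * k / M) for j, s in enumerate(samples))
    return abs(total) / math.sqrt(M)

magnitudes = [dft_magnitude(signal, k) for k in range(N)]
peaks = [k for k, m in enumerate(magnitudes) if m > 1.0]
print(peaks)
print([round(magnitudes[k], 3) for k in peaks])
```

Bins are counted from 0 here, so bin k corresponds to position k + 1 in a list indexed from 1. The bin at 0 is the constant shift of the signal; the remaining eight bins are the four combinations and their aliased partners, all with the same magnitude.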

Footnote 4: The additional feature at ω = 0 comes from the uniform shift of the signal, which appears as a component of zero frequency.


In[23]:= signal = Table[N[Sin[25 2 Pi n/512] Cos[60 2 Pi n/512] Sin[155 2 Pi n/512] + 1], {n, -256, 255}];
ListPlot[signal, PlotJoined -> True]

[Top panel: the sampled signal versus sample index (horizontal axis 1 to 512, vertical axis ticks from 0.25 to 1.5).]

In[25]:= ftsig = Fourier[signal];

In[26]:= ListPlot[Abs[ftsig], PlotJoined -> True, PlotRange -> {0, 3}]

[Bottom panel: magnitude of the Fourier transform versus index (horizontal axis 1 to 512, vertical axis from 0 to 3).]

Figure 5.6: Illustration of Fourier analysis of a time signal by sampling at a finite time interval. Even though the original signal appears very noisy (top panel), the characteristic frequencies are readily identified by the Fourier transform (bottom panel).

Chapter 6

Probabilities and random numbers

6.1 Probability distributions

We begin the discussion of probability theory by defining a few key concepts:

1. Event space: it consists of all possible outcomes of an experiment. For example, if our experiment involves the tossing of a fair die, the event space consists of the numbers 1, 2, 3, 4, 5, 6, which are all the possible outcomes for the top face of the labeled die. If our experiment involves flipping a fair coin, the event space is H, T (for “heads” and “tails”), which are all the possible outcomes for the face of the coin that is up when the coin lands.

2. Probability of event A: it is the probability that event A will occur. This is equal to

$$P(A)=\frac{m}{N} \tag{6.1}$$

where m is the number of outcomes in the event space that correspond to event A and N is the total number of all possible outcomes in the event space, all of which are taken to be equally likely. For example, if we define as event A that a fair die gives 1, then P(A) = 1/6; that the die gives an even number, then P(A) = 3/6 = 0.5; that a fair coin gives heads, then P(A) = 1/2.

3. Permutations: it is the number of ways to arrange n different things, taken all at a time. There are n! = n · (n − 1) · (n − 2) · · · 1 permutations of n objects. For example, if we label three objects as a, b, c, then the possible permutations are: abc, acb, bac, bca, cab, cba, that is, 3 · 2 · 1 = 6 permutations.

4. Combinations: it is the number of ways of taking k things out of a group of n things, without specifying their order or identity. The number of combinations of k things out of n is given by:

$$\frac{n\cdot(n-1)\cdots(n-k+1)}{k\cdot(k-1)\cdots 1}=\frac{n!}{(n-k)!\,k!} \tag{6.2}$$

This expression is easy to justify: there are n ways of picking the first object, n − 1 ways of picking the second, . . . , (n − k + 1) ways of picking the kth object out of a total of n. But since we do not care about the order in which the objects are picked, we have to divide the total number by the k possible choices for the first one, by the k − 1 possible choices for the second one, etc. Notice that these values enter in the binomial expansion:

$$(p+q)^n=\sum_{k=0}^{n}\frac{n!}{(n-k)!\,k!}\,p^kq^{n-k} \tag{6.3}$$

with p, q real values, that we have encountered several times before (see Eq. (2.14)).
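The counting rules above are easy to confirm by brute force. The sketch below is an illustration with names chosen here: it enumerates the permutations of a, b, c, compares Eq. (6.2) with an explicit enumeration of subsets, and checks the binomial expansion of Eq. (6.3) in exact rational arithmetic for one arbitrary choice of p, q and n.

```python
import itertools
import math
from fractions import Fraction

# Permutations of three labeled objects
perms = ["".join(p) for p in itertools.permutations("abc")]
print(perms, len(perms), math.factorial(3))

def combinations_count(n, k):
    """Eq. (6.2): n! / ((n - k)! k!)."""
    return math.factorial(n) // (math.factorial(n - k) * math.factorial(k))

# Compare the formula with an explicit enumeration of unordered selections
n, k = 5, 2
enumerated = len(list(itertools.combinations(range(n), k)))
print(combinations_count(n, k), enumerated)

def binomial_sum(p, q, n):
    """Right-hand side of Eq. (6.3)."""
    return sum(combinations_count(n, j) * p ** j * q ** (n - j) for j in range(n + 1))

p, q = Fraction(2, 7), Fraction(3, 5)
print(binomial_sum(p, q, 6) == (p + q) ** 6)
```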

We define a general random variable x, which can be discrete or continuous, and the probability distribution function f(x) that the value x of the variable occurs. For f(x) to be a proper probability distribution function, it must be normalized to unity, that is:

$$\sum_{j=0}^{N}f(x_j)=1\quad\text{or}\quad\int_{-\infty}^{+\infty}f(x)\,dx=1 \tag{6.4}$$

where the first expression holds for a discrete variable with possible values xj identified by the discrete index j (taking the values 0, . . . , N), and the second expression holds for a continuous variable. We also define the cumulative probability function F(x) that any value of the variable smaller than or equal to x occurs, which is given by:

$$F(x)=\sum_{j,\,x_j\le x}f(x_j)\quad\text{or}\quad F(x)=\int_{-\infty}^{x}f(y)\,dy \tag{6.5}$$

for a discrete or a continuous variable. Note that from this definition, we conclude that F(xmax) = 1, where xmax is the maximum value that x can take (in either the discrete or continuous version).


We define the median m as the value of the variable x such that the total probability of all values of x below m is equal to 1/2:

$$F(m)=\frac{1}{2} \tag{6.6}$$

From the definition of F(x) we conclude that the total probability of all values of x above m is also 1/2. Thus, the median m splits the range of values of x in two parts, each containing a set of values that have total probability 1/2.

The mean value µ and the variance σ² of the probability distribution (σ itself is the standard deviation) are defined as:

$$\mu=\sum_{j=0}^{N}x_jf(x_j)\quad\text{or}\quad\mu=\int_{-\infty}^{+\infty}xf(x)\,dx \tag{6.7}$$

$$\sigma^2=\sum_{j=0}^{N}(x_j-\mu)^2f(x_j)\quad\text{or}\quad\sigma^2=\int_{-\infty}^{+\infty}(x-\mu)^2f(x)\,dx \tag{6.8}$$

for the discrete and continuous cases. From the definitions of the mean and the variance, we can easily prove the following relation:

$$\sigma^2=\sum_{j=0}^{N}x_j^2f(x_j)-\mu^2\quad\text{or}\quad\sigma^2=\int_{-\infty}^{+\infty}x^2f(x)\,dx-\mu^2 \tag{6.9}$$

that is, the variance is equal to the second moment minus the mean squared.

We define the expectation E of a function g(x) of the random variable as

$$E(g(X))=\sum_{j=0}^{N}g(x_j)f(x_j)\quad\text{or}\quad E(g(X))=\int_{-\infty}^{+\infty}g(x)f(x)\,dx \tag{6.10}$$

for the discrete and continuous cases, respectively. The notation E(g(X)) signifies that the expectation involves a function g of the random variable, but the expectation itself is not a function of x, since the values of x are summed (or integrated) over in the process of determining the expectation. With this definition, the mean and the variance can be expressed as:

$$\mu=E(X) \tag{6.11}$$

$$\sigma^2=E((X-\mu)^2) \tag{6.12}$$
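These definitions can be made concrete with the fair die used earlier. The sketch below uses names chosen here and exact fractions. It computes the mean, the variance from both Eq. (6.8) and Eq. (6.9), and the median as the smallest value at which the cumulative probability reaches 1/2.

```python
from fractions import Fraction

values = [1, 2, 3, 4, 5, 6]
f = {x: Fraction(1, 6) for x in values}      # fair die: equal probabilities

def expectation(g):
    """Discrete form of Eq. (6.10)."""
    return sum(g(x) * f[x] for x in values)

def cumulative(x):
    """Discrete form of Eq. (6.5)."""
    return sum(prob for xj, prob in f.items() if xj <= x)

mu = expectation(lambda x: x)                        # Eq. (6.11)
variance = expectation(lambda x: (x - mu) ** 2)      # Eq. (6.12)
second_moment = expectation(lambda x: x * x)
median = min(x for x in values if cumulative(x) >= Fraction(1, 2))

print(mu, variance, second_moment - mu ** 2, median)
```

The variance computed from the definition and from the second moment minus the mean squared coincide, as Eq. (6.9) states.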


6.1.1 Binomial distribution

The simplest case of a probability distribution involving a single random variable is the binomial distribution. Consider an event A that occurs with probability p. Then the probability that the event does not occur is q = 1 − p. The probability that the event A in n trials occurs x times (x here is by definition an integer, thus we treat it as a discrete variable) is given by:

$$p\cdot q\cdots q\cdot p=p^xq^{n-x}$$

where we assumed that it occurred in the first trial, it did not occur on the second one, . . ., it did not occur on the (n − 1)th trial and it occurred in the nth trial. But this corresponds to a particular sequence of trials. If all we care about is to find the probability of x successful outcomes out of n trials, no matter in what sequence the successes and failures occurred, then we have to consider all the possible combinations of the above sequence of outcomes, which we learned earlier is given by Eq. (6.2). Combining the two results, we conclude that the probability distribution we are seeking is given by:

$$f_b(x)=\frac{n!}{(n-x)!\,x!}\,p^xq^{n-x} \tag{6.13}$$

This is called the binomial distribution for the obvious reason that these expressions are exactly the terms in the binomial expansion. Note that this is a properly normalized probability distribution function, since according to Eq. (6.3) we have:

$$\sum_{x=0}^{n}f_b(x)=\sum_{x=0}^{n}\frac{n!}{(n-x)!\,x!}\,p^xq^{n-x}=(p+q)^n=1$$

because (p + q) = 1.

We are interested (as in all cases of probability distributions) in the mean and variance of the binomial distribution. To this end, we use the expression of Eq. (6.7) for the discrete case:

$$\mu_b=\sum_{x=0}^{n}x\,\frac{n!}{(n-x)!\,x!}\,p^xq^{n-x} \tag{6.14}$$

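To calculate the value of µb we turn to Fourier transforms. The Fourier transform of fb(x) is given by

$$\tilde f_b(\omega)=\sum_{x=0}^{n}\frac{n!}{(n-x)!\,x!}\,p^xq^{n-x}e^{-i\omega x}=\sum_{x=0}^{n}\frac{n!}{(n-x)!\,x!}\left(pe^{-i\omega}\right)^xq^{n-x}=\left(pe^{-i\omega}+q\right)^n \tag{6.15}$$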


where we have used the binomial expansion Eq. (2.14) to obtain the last step. Since the mean is the first moment of the variable x with respect to the function f(x), as is evident from the definition Eq. (6.7), we can evaluate it from the general expression we derived for the first moment using FT’s, Eq. (5.25):

$$\mu_b=i\left[\frac{d\tilde f_b}{d\omega}\right]_{\omega=0}=np(-i)\cdot i\,(p+q)^{n-1}\ \Rightarrow\ \mu_b=np \tag{6.16}$$

where we have also used the fact that p + q = 1 to obtain the last expression. We can also use the expression for the second moment that involves FT’s, Eq. (5.26), to obtain:

$$i^2\left[\frac{d^2\tilde f_b}{d\omega^2}\right]_{\omega=0}=n(n-1)p^2+np=np-np^2+n^2p^2 \tag{6.17}$$

But we have found that the variance is equal to the second moment minus the mean squared, Eq. (6.9), which, combined with the above result produces for the variance of the binomial distribution:

$$\sigma_b^2=i^2\left[\frac{d^2\tilde f_b}{d\omega^2}\right]_{\omega=0}-\mu_b^2\ \Rightarrow\ \sigma_b^2=npq \tag{6.18}$$

where we have used q = 1 − p to obtain the last expression.

Conclusion: the binomial distribution applies when we know the total number of trials n, the probability of success for each trial p (the probability of failure is q = 1 − p) and we are interested in the probability of x successful outcomes. Both n and x are integers, both p and q are real (0 ≤ p, q ≤ 1). The mean is µb = np and the variance is σb² = npq.
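The results of this subsection can be verified numerically for a particular case. In the sketch below the values n = 20 and p = 0.3, the step h used for the finite-difference derivatives and the function names are all choices made here for illustration. The mean and variance are computed directly from Eq. (6.13) and, independently, from numerical derivatives of the transform in Eq. (6.15), following Eqs. (5.25) and (5.26).

```python
import cmath
import math

def binomial_pmf(x, n, p):
    """Eq. (6.13) with q = 1 - p."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def binomial_ft(omega, n, p):
    """Eq. (6.15): (p exp(-i omega) + q)^n."""
    return (p * cmath.exp(-1j * omega) + (1.0 - p)) ** n

n, p = 20, 0.3
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * w for x, w in enumerate(pmf))
variance = sum((x - mean) ** 2 * w for x, w in enumerate(pmf))

# Central finite differences of the transform at omega = 0
h = 1e-4
first_moment = (1j * (binomial_ft(h, n, p) - binomial_ft(-h, n, p)) / (2.0 * h)).real
second_moment = (-(binomial_ft(h, n, p) - 2.0 * binomial_ft(0.0, n, p)
                   + binomial_ft(-h, n, p)) / h ** 2).real

print(sum(pmf), mean, variance)              # compare with 1, n p, n p q
print(first_moment, second_moment - first_moment ** 2)
```

The finite-difference values agree with np and npq only to the accuracy of the numerical derivative, which is why they are printed separately from the direct sums.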

6.1.2 Poisson distribution

Another important probability distribution of a single variable is the so-called Poisson distribution. This is actually the limit of the binomial distribution for p very small, n very large, np = µ finite and x ≪ n. The ensuing distribution for the variable x (x being still an integer (discrete) variable) takes the following form:

$$f_p(x)=\frac{\mu^xe^{-\mu}}{x!} \tag{6.19}$$


To obtain this from the binomial distribution, we note first that

$$\frac{n!}{(n-x)!}=(n-x+1)\cdot(n-x+2)\cdots(n-1)\cdot n\approx n^x$$

since for x ≪ n each term in the above product is approximately equal to n and there are x such terms. Next, we note that

$$q^{n-x}=\frac{(1-p)^n}{(1-p)^x}\ \Rightarrow\ \ln(q^{n-x})=n\ln(1-p)-x\ln(1-p)\approx-np+xp$$

where we have used q = 1 − p and ln(1 − ε) ≈ −ε for ε → 0. From this last result, taking exponentials of both sides, we obtain

$$q^{n-x}\approx e^{-np}$$

where we have neglected xp compared to np, since we assumed x ≪ n. Combining this and the previous relation between n!/(n − x)! and n^x, we arrive at

$$f_p(x)\approx\frac{(np)^xe^{-np}}{x!}$$

which is the desired expression for the Poisson distribution if we set µ = np everywhere.

Conclusion: The Poisson distribution applies when we know the mean value µ, and we are interested in the probability of x successful outcomes, in the limit when the number of trials n is much larger than the number of successes we are interested in (x ≪ n), the probability of success for an individual trial is very small (p ≪ 1), but the mean is finite (np = µ). x is an integer and µ is a real number.
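The quality of the Poisson limit can be gauged by comparing the two distributions directly. In the sketch below the values n = 10000 and p = 0.0003 (so that µ = 3) are arbitrary choices made here to satisfy p ≪ 1 and x ≪ n.

```python
import math

def binomial_pmf(x, n, p):
    """Eq. (6.13) with q = 1 - p."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def poisson_pmf(x, mu):
    """Eq. (6.19)."""
    return mu ** x * math.exp(-mu) / math.factorial(x)

n, p = 10000, 0.0003
mu = n * p

for x in range(8):
    print(x, binomial_pmf(x, n, p), poisson_pmf(x, mu))

largest_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, mu)) for x in range(15))
print(largest_gap)
```

Repeating the comparison with a larger p at the same µ (for instance n = 10, p = 0.3) shows visibly larger discrepancies, since the conditions of the limit are then no longer met.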

6.1.3 Gaussian or normal distribution

Another very important probability distribution is the so-called Gaussian or normal distribution. This is obtained from the binomial distribution in the limit when n is very large and x is close to the mean µ, and is given by the expression:

$$f_g(x)=\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-\mu)^2/2\sigma^2} \tag{6.20}$$

To obtain this expression, we will start again with the binomial distribution and set:

$$\mu=np,\qquad x=\mu+\delta,\qquad n\to\infty \tag{6.21}$$


with p, q finite, which implies that

$$\delta\ll np,\qquad \delta\ll nq \tag{6.22}$$

as well as the relations:

$$n-\mu=n(1-p)=nq,\qquad n-x=n-\mu-\delta=nq-\delta \tag{6.23}$$

We will use the Stirling formula

$$n!\approx(2\pi n)^{1/2}n^ne^{-n} \tag{6.24}$$

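which is valid in the limit of large n, to rewrite the factorials that appear in the binomial expression Eq. (6.13). With this, we find that the binomial expression takes the form:

$$\frac{n!}{(n-x)!\,x!}\,p^xq^{n-x}\approx\frac{n^n}{x^x(n-x)^{n-x}}\,p^xq^{n-x}\left(\frac{n}{x(n-x)}\right)^{1/2}\frac{1}{\sqrt{2\pi}}$$

$$=\frac{1}{\sqrt{2\pi}}\left(\frac{np}{x}\right)^x\left(\frac{nq}{n-x}\right)^{n-x}\left(\frac{x(n-x)}{n}\right)^{-1/2}$$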

We will deal with each term in parentheses in the above expression separately, substituting n, p and q through the variables µ, x and δ, using the relations of Eqs. (6.21), (6.22) and (6.23). The first term produces:

$$\left(\frac{np}{x}\right)^x=\left(\frac{\mu}{\mu+\delta}\right)^{\mu+\delta}=\left(1+\frac{\delta}{\mu}\right)^{-(\mu+\delta)}$$

Taking the logarithm of this expression, we get

$$-(\mu+\delta)\ln\left(1+\frac{\delta}{\mu}\right)=-(\mu+\delta)\left(\frac{\delta}{\mu}-\frac{\delta^2}{2\mu^2}+\cdots\right)=-\delta-\frac{\delta^2}{2\mu}\cdots$$

where we have used the fact that δ ≪ np = µ and we have employed the Taylor expansion of the logarithm ln(1 + ε) for ε ≪ 1 and kept the first two terms in the expansion, with ε = δ/µ. Similarly, the second term in parentheses produces:

$$\left(\frac{nq}{n-x}\right)^{n-x}=\left(\frac{nq}{nq-\delta}\right)^{nq-\delta}=\left(1-\frac{\delta}{nq}\right)^{-(nq-\delta)}$$


Taking the logarithm of this expression, we get

$$-(nq-\delta)\ln\left(1-\frac{\delta}{nq}\right)=-(nq-\delta)\left(-\frac{\delta}{nq}-\frac{\delta^2}{2(nq)^2}+\cdots\right)=\delta-\frac{\delta^2}{2nq}\cdots$$

where we have used the fact that δ ≪ nq and we have employed again the Taylor expansion of the logarithm ln(1 − ε) for ε ≪ 1 and kept the first two terms in the expansion, with ε = δ/nq. The product of these two parentheses, which upon taking logarithms is the sum of their logarithms (in which the terms linear in δ cancel), produces:

$$-\frac{\delta^2}{2np} - \frac{\delta^2}{2nq} = -\frac{\delta^2}{2n}\left(\frac{1}{p} + \frac{1}{q}\right) = -\frac{\delta^2 (p+q)}{2npq} = -\frac{\delta^2}{2\sigma^2}$$

since p + q = 1 and σ² = npq. The third term in parentheses produces:

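$$\left(\frac{x(n-x)}{n}\right)^{-1/2} = \left[\left(\frac{np+\delta}{np}\right)\left(\frac{nq-\delta}{nq}\right) npq\right]^{-1/2} \approx (npq)^{-1/2}\left(1 - \frac{\delta}{2np} + \frac{\delta}{2nq} + \cdots\right) \approx (npq)^{-1/2} = (\sigma^2)^{-1/2}$$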

Putting all the partial results together, we arrive at:

$$f_g(x) \approx e^{-\delta^2/2\sigma^2}\, (2\pi\sigma^2)^{-1/2}$$

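which, with δ = x − µ, gives the desired result of Eq. (6.20).

In the final form we obtained, the variable x ranges from 0 to n. When n is very large, it is difficult to work with the original values of the parameters that enter in the definition of $f_g(x)$; it is much more convenient to scale and shift the origin of x as follows. We define first the scaled variables

$$\tilde{x} = \frac{x}{n}, \quad \tilde{\mu} = \frac{\mu}{n}, \quad \tilde{\sigma} = \frac{\sigma}{n} \tag{6.25}$$

Next, we define the variable y through

$$y = \frac{\tilde{x} - \tilde{\mu}}{\tilde{\sigma}} \Rightarrow dy = \frac{d\tilde{x}}{\tilde{\sigma}} \tag{6.26}$$

Since the limits of x are [0, n], the limits of the variable $\tilde{x}$ are [0, 1] and the limits of the variable y are $[-\tilde{\mu}/\tilde{\sigma},\ (1-\tilde{\mu})/\tilde{\sigma}]$. But we have:

$$\mu = pn \Rightarrow \tilde{\mu} = \frac{\mu}{n} = p \Rightarrow (1 - \tilde{\mu}) = 1 - p = q$$

We also have:

$$\sigma^2 = npq \Rightarrow \tilde{\sigma} = \frac{\sigma}{n} = \frac{\sqrt{pq}}{\sqrt{n}}$$

With these results, the limits of the variable y become:

$$-\frac{\tilde{\mu}}{\tilde{\sigma}} = -\sqrt{n}\sqrt{\frac{p}{q}}, \qquad \frac{(1-\tilde{\mu})}{\tilde{\sigma}} = \sqrt{n}\sqrt{\frac{q}{p}}$$

and since we are taking the limit of very large n, with p and q finite values between 0 and 1, we conclude that the limits of the variable y are essentially ±∞. Note that the mean and variance of the variable y have now become

$$\mu_y = \frac{\tilde{\mu} - \tilde{\mu}}{\tilde{\sigma}} = 0, \qquad \sigma_y = 1$$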

The final expression we produced is a properly normalized probability distribution function, since

$$\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy = 1$$

as was shown in Eq. (1.37). It is also convenient to define the following function of the variable w

$$\Phi(w) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{w} e^{-y^2/2}\, dy = \frac{1}{\sqrt{2\pi}} \int_{-w}^{+\infty} e^{-y^2/2}\, dy \tag{6.27}$$

from which we deduce the following relations

Φ(+∞) = 1, Φ(−∞) = 0, Φ(−w) = 1 − Φ(w) (6.28)

The function Φ(w) cannot be easily computed by simple methods so it has been tabulated, since it is extremely useful in determining probabilities that involve the gaussian distribution.
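In place of a table, Φ(w) can also be evaluated numerically. The following is a minimal sketch, assuming Python and its standard `math.erf` function; the function name `phi` is our own choice and not part of the text. It uses the identity Φ(w) = [1 + erf(w/√2)]/2, which follows from Eq. (6.27) by a change of variables.

```python
import math


def phi(w: float) -> float:
    """Normalized gaussian integral Phi(w) of Eq. (6.27), via the error function."""
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))


# A few values, as one would read them off a table
for w in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"Phi({w:+.1f}) = {phi(w):.4f}")

# Relations of Eq. (6.28): Phi(-w) = 1 - Phi(w), and the two limits
assert abs(phi(-1.3) - (1.0 - phi(1.3))) < 1e-12
assert phi(50.0) == 1.0 and phi(-50.0) == 0.0
```

The arguments ±50 stand in for ±∞ here; at that distance from the origin the result is indistinguishable from the limiting values in double precision.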

Conclusion: The Gaussian or normal distribution is valid when the number of trials n is very large. There are two parameters needed to completely specify this distribution, µ and σ, both real numbers. x is now a real variable and can range from −∞ to +∞.
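The limit discussed above can be checked numerically. The sketch below (Python; the helper names and the choice p = 0.3 are illustrative assumptions, not taken from the text) compares the binomial probabilities with the gaussian of Eq. (6.20), using µ = np and σ² = npq, and reports the largest difference over all x for a few values of n.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability, computed through logarithms to avoid overflow."""
    log_f = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
             + x * math.log(p) + (n - x) * math.log(1.0 - p))
    return math.exp(log_f)


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Gaussian distribution of Eq. (6.20)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def max_difference(n: int, p: float) -> float:
    """Largest |f_b(x) - f_g(x)| over x = 0, ..., n."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    return max(abs(binomial_pmf(x, n, p) - gaussian_pdf(x, mu, sigma))
               for x in range(n + 1))


p = 0.3  # illustrative value, any 0 < p < 1 would do
for n in (10, 100, 1000):
    print(f"n = {n:5d}: max difference = {max_difference(n, p):.2e}")
```

The difference should shrink steadily as n grows, which is the numerical counterpart of dropping the higher-order terms in the Taylor expansions above.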

Example 6.1: Five fair coins are tossed at once. What is the probability that the result is at least one head?
Answer: There are $2^5 = 32$ combinations, all of which contain at least one head except one, that with all tails. Therefore the probability of at least one head is 31/32.

Example 6.2: A fair coin is tossed n times. What is the probability of getting x heads, where x = 0, . . . , n?
Answer: Here the binomial distribution applies because we need to select x successful outcomes out of n tries, with the probability of success in each try p = 0.5 (and probability of failure q = 1 − p = 0.5), and we do not care about the order in which the successes or failures occur.

$$f_b(x) = \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x}, \qquad p = q = \frac{1}{2}$$

For instance, for n = 10, the probabilities are:

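$$\begin{aligned}
f_b(0) &= \frac{10!}{0!\,10!}\left(\frac{1}{2}\right)^{0}\left(\frac{1}{2}\right)^{10} = \frac{1}{1024}, &
f_b(1) &= \frac{10!}{1!\,9!}\left(\frac{1}{2}\right)^{1}\left(\frac{1}{2}\right)^{9} = \frac{10}{1024} \\
f_b(2) &= \frac{10!}{2!\,8!}\left(\frac{1}{2}\right)^{2}\left(\frac{1}{2}\right)^{8} = \frac{45}{1024}, &
f_b(3) &= \frac{10!}{3!\,7!}\left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{7} = \frac{120}{1024} \\
f_b(4) &= \frac{10!}{4!\,6!}\left(\frac{1}{2}\right)^{4}\left(\frac{1}{2}\right)^{6} = \frac{210}{1024}, &
f_b(5) &= \frac{10!}{5!\,5!}\left(\frac{1}{2}\right)^{5}\left(\frac{1}{2}\right)^{5} = \frac{252}{1024}
\end{aligned}$$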

Note that from the general expression for $f_b(x)$ with p = q we have $f_b(n - x) = f_b(x)$ for 0 ≤ x ≤ n/2, so the remaining probabilities are the same as the ones we already calculated. In Fig. 6.1 we show the values of $f_b(x)$ for all values of 0 ≤ x ≤ n for three values of n = 4, 10, 30.

Example 6.3: A fair coin is tossed n = 10 times. What is the probability of getting at least three heads?
Answer: Using the results of the previous example, the probability is 1 minus the probability of getting any number of heads that is smaller than three, that is, zero, one and two heads:

$$1 - f_b(0) - f_b(1) - f_b(2) = 1 - \frac{56}{1024} = 0.9453125$$
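The numbers of Examples 6.2 and 6.3 are easy to reproduce with exact rational arithmetic. The following is a small sketch in Python (the function name `fair_coin_pmf` is ours); it tabulates $f_b(x)$ for n = 10 and then forms the probability of at least three heads.

```python
from fractions import Fraction
from math import comb


def fair_coin_pmf(x: int, n: int) -> Fraction:
    """Binomial probability of x heads in n tosses of a fair coin (p = q = 1/2)."""
    return Fraction(comb(n, x), 2 ** n)


n = 10
pmf = [fair_coin_pmf(x, n) for x in range(n + 1)]

# Print each probability over the common denominator 2^n = 1024
for x in range(n + 1):
    print(f"f_b({x:2d}) = {comb(n, x):3d}/{2 ** n}")

# Example 6.3: at least three heads = 1 - f_b(0) - f_b(1) - f_b(2)
at_least_three = 1 - sum(pmf[:3])
print("P(at least three heads) =", float(at_least_three))
```

Using `Fraction` keeps every value exact, so the symmetry $f_b(n-x) = f_b(x)$ and the normalization can be checked with plain equality rather than a numerical tolerance.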

Figure 6.1: The binomial probability $f_b(x)$ for all x ∈ [0, n], for three values of n = 4 (squares), 10 (rhombuses) and 30 (circles). Note that the shape of $f_b(x)$ evolves closer to that of a gaussian for increasing n and is very close to the gaussian form already for the largest value of n = 30. In order to plot the three distributions on a common scale, the horizontal axis is scaled, x/n.

Example 6.4: We consider a fair coin tossed $n = 10^6$ times. What is the probability that there is a 0.1% imbalance in favor of heads in all of the tosses together?
Answer: Here we are obviously dealing with a gaussian distribution since n is very large. We could, in principle, use again the binomial distribution, as in Examples 6.2 or 6.3, but this would lead to extremely tedious and difficult calculations, which would give essentially the same result. In the context of the gaussian distribution, we have:

$$p = \frac{1}{2}, \quad q = \frac{1}{2}, \quad \mu = np = \frac{1}{2} \times 10^6, \quad \sigma^2 = npq = \frac{1}{4} \times 10^6 \Rightarrow \sigma = \frac{1}{2} \times 10^3$$

The question we are asked requires us to calculate the probability that we get at least 50.1% heads or, equivalently, at most 49.9% tails. This probability is given by the integral of the gaussian probability distribution over all values of x (here the number of tails) from 0 (no tails) to $x_1 = 0.499n$ (49.9% tails) with µ and σ determined above:

$$\int_{0}^{0.499n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/2\sigma^2}\, dx$$

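This, however, is an awkward integral to calculate. Instead, we change variables as in the general discussion of the gaussian distribution, scaling by n:

$$\tilde{x} = \frac{x}{n}, \quad \tilde{\sigma} = \frac{\sigma}{n} = 0.5 \times 10^{-3}, \quad \tilde{\mu} = \frac{\mu}{n} = 0.5$$

so that the range of the new variable $\tilde{x}$ is [0, 1]. We express the limits of the integral in terms of $\tilde{\mu}$ and $\tilde{\sigma}$:

$$\tilde{x}_1 = \frac{x_1}{n} = 0.499 = 0.5 - 2 \times 0.5 \times 10^{-3} = \tilde{\mu} - 2\tilde{\sigma}$$

$$0 = 0.5 - 10^3 \times 0.5 \times 10^{-3} = \tilde{\mu} - 10^3 \tilde{\sigma}$$

and the desired integral then becomes:

$$\int_{\tilde{\mu} - 10^3\tilde{\sigma}}^{\tilde{\mu} - 2\tilde{\sigma}} \frac{1}{\sqrt{2\pi}\,\tilde{\sigma}}\, e^{-(\tilde{x}-\tilde{\mu})^2/2\tilde{\sigma}^2}\, d\tilde{x}$$

We next shift the origin of the variable $\tilde{x}$ by $-\tilde{\mu}$ and scale it by $\tilde{\sigma}$, defining a new variable y:

$$y = \frac{\tilde{x} - \tilde{\mu}}{\tilde{\sigma}} \Rightarrow dy = \frac{d\tilde{x}}{\tilde{\sigma}}$$

with which the above integral becomes:

$$\int_{-\infty}^{-2} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy = \Phi(-2) = 0.0228$$

or a little over 2%. In the last expression we have replaced the lower limit of $y = -10^3$ with −∞, since for such large negative values of y the integrand is zero for all practical purposes, the value that it takes for y → −∞. The numerical value is obtained from the values of the normalized gaussian integral Φ(w) defined in Eq. (6.27).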
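The claim that the binomial distribution would give essentially the same result can be tested on a computer, where the sum is no longer tedious. The sketch below (Python; the function names and the cutoff of 20 standard deviations are our own assumptions) evaluates Φ(−2) through the error function and compares it to a direct sum of binomial probabilities, computed through logarithms because the factorials are far too large to form explicitly.

```python
import math


def phi(w: float) -> float:
    """Normalized gaussian integral of Eq. (6.27), via the error function."""
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))


def log_binomial_pmf(k: int, n: int, p: float) -> float:
    """Logarithm of the binomial probability, using log-gamma for the factorials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


n, p = 10**6, 0.5
mu = n * p
sigma = math.sqrt(n * p * (1.0 - p))
x1 = 499_000  # 49.9% of the tosses

gaussian_estimate = phi((x1 - mu) / sigma)

# Direct binomial sum from (nearly) zero up to x1. Terms more than
# 20 sigma below the mean are negligibly small and are skipped here.
k_min = int(mu - 20 * sigma)
binomial_sum = sum(math.exp(log_binomial_pmf(k, n, p))
                   for k in range(k_min, x1 + 1))

print(f"gaussian estimate: {gaussian_estimate:.5f}")
print(f"binomial sum:      {binomial_sum:.5f}")
```

Both numbers should come out close to 0.0228; the small residual difference reflects the fact that the sum is over integers while the integral is over a continuous variable.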

Example 6.5: Cars pass through a toll booth at the rate of r = 20 per hour. What is the probability that in 5 minutes 3 cars will go through?
Answer: In this case we can calculate the average number of cars µ that passes through the toll booth in the given time interval from the rate r:

$$\mu = 5\ \text{min} \times \frac{20}{60\ \text{min}} = \frac{5}{3}$$

and we also know the number of cars that we want to go through in this given interval, x = 3, so we can use the Poisson distribution:

$$f_p(x = 3) = \frac{\mu^x e^{-\mu}}{x!} = \frac{(5/3)^3 e^{-5/3}}{3!} = 0.145737 \tag{6.29}$$

Alternatively, we can look at this problem from the point of view of the binomial distribution, but then we must subdivide the total time interval of five minutes into more elementary units τ in order to have a well posed problem, and eventually take the limit of these elementary units going to zero. For example, for τ = 60 sec, there are n = 5 such subdivisions in the total time interval of interest (5 min), and the probability of success in one subdivision is

$$p = 1\ \text{min} \times \frac{20}{60\ \text{min}} = \frac{1}{3}$$

while the number of successful outcomes is x = 3 and the probability of failure is q = 1 − p = 2/3, giving:

$$n = 5, \quad p = \frac{1}{3}, \quad q = \frac{2}{3}, \quad x = 3 \Rightarrow f_b(3) = \frac{5!}{3!\,2!}\left(\frac{1}{3}\right)^{3}\left(\frac{2}{3}\right)^{2} = 0.1646$$

Similarly, for τ = 0.5 min, we find:

$$n = 10, \quad p = \frac{1}{6}, \quad q = \frac{5}{6}, \quad x = 3 \Rightarrow f_b(3) = \frac{10!}{3!\,7!}\left(\frac{1}{6}\right)^{3}\left(\frac{5}{6}\right)^{7} = 0.1550$$

and for τ = 0.25 min, we find:

$$n = 20, \quad p = \frac{1}{12}, \quad q = \frac{11}{12}, \quad x = 3 \Rightarrow f_b(3) = \frac{20!}{3!\,17!}\left(\frac{1}{12}\right)^{3}\left(\frac{11}{12}\right)^{17} = 0.1503$$

We see that the result keeps changing and gets closer and closer to the answer we obtained with the Poisson distribution as we decrease the size of the subdivision τ or equivalently increase the number of trials n, since nτ = t, the total time interval (5 min in this example). It is evident that in the limit τ → 0 we find n → ∞, p → 0, but np = 5/3 = µ, as in the case of the Poisson distribution, and we recover the earlier result of Eq. (6.29).
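The approach to the Poisson limit can be followed further than is practical by hand. A short sketch in Python (function names are ours; the values of n beyond 20 are additional illustrative choices) keeps np = µ = 5/3 fixed and lets n grow:

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability of x successes in n trials."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def poisson_pmf(x: int, mu: float) -> float:
    """Poisson probability of x events when the average number is mu."""
    return mu ** x * math.exp(-mu) / math.factorial(x)


mu = 5.0 / 3.0   # average number of cars in 5 minutes
x = 3            # number of cars we ask about

# n = 5, 10, 20 correspond to tau = 1, 0.5, 0.25 min in the text
for n in (5, 10, 20, 100, 1000, 10000):
    p = mu / n   # probability of one car in a subdivision of length t/n
    print(f"n = {n:6d}, p = {p:.6f}: f_b(3) = {binomial_pmf(x, n, p):.6f}")

print(f"Poisson limit:               f_p(3) = {poisson_pmf(x, mu):.6f}")
```

The first three lines of output should reproduce the values 0.1646, 0.1550 and 0.1503 found above, and the later ones should settle toward the Poisson value of Eq. (6.29).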

6.2 Multivariable probabilities

We consider next cases where we have more than one random variable. We will deal explicitly with two random variables, but the discussion is easily generalized to more. We will use the symbol P(X, Y) to denote the value of the probability that depends on two variables, without specifying the values of the variables (this is what the capital symbols X, Y mean). When these variables take specific values the quantity P(X, Y) is equal to the probability of these specific values occurring.

We define the probability distribution f(x, y) for the two random variables x, y as the probability of getting the first variable with value x and the second variable with value y:

f(x, y) = P (X = x, Y = y) (6.30)

With this definition, we can define the probability of getting the value x for the first variable independent of what the value of the second variable is, which is given by:

$$f_1(x) = P(X = x,\ Y: \text{arbitrary}) = \sum_{y} f(x, y) \tag{6.31}$$

where the symbol $\sum_y$ implies a summation over all the possible values of y, whether this is a discrete or a continuous variable. The quantity $f_1(x)$ is known as the “marginal probability” for the random variable x. Similarly, the probability of getting the value y for the second variable independent of what the value of the first variable is, is given by:

$$f_2(y) = P(X: \text{arbitrary},\ Y = y) = \sum_{x} f(x, y) \tag{6.32}$$

which is the marginal probability for the random variable y.

If the two variables are independent of each other, then the following relation is true:

$$f(x, y) = f_1(x) f_2(y) \tag{6.33}$$

A consequence of this is that the cumulative probability distributions obey a similar relation:

$$F(x, y) = F_1(x) F_2(y) \tag{6.34}$$

where $F_1(x)$, $F_2(y)$ are the cumulative probability distributions corresponding to the single-variable probability distributions $f_1(x)$ and $f_2(y)$, the two marginal probabilities.

By analogy to what we had done in the case of single-variable probability distributions, we define the expectation value of a function of the two variables g(x, y) as:

$$E(g(X, Y)) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x, y) f(x, y)\, dx\, dy \tag{6.35}$$

Using this definition, we can derive the following relations

E(X + Y ) = E(X) + E(Y ) (6.36)

which holds for any pair of random variables. Also:

E(XY ) = E(X)E(Y ) (6.37)

which holds for independent variables only. We can express the mean and variance of each of the random variables as:

$$\mu_x = E(X), \qquad \sigma_x^2 = E((X - \mu_x)^2) \tag{6.38}$$

$$\mu_y = E(Y), \qquad \sigma_y^2 = E((Y - \mu_y)^2) \tag{6.39}$$

A consequence of these relations is:

$$\sigma_x^2 = E(X^2) - \mu_x^2 = E(X^2) - [E(X)]^2 \tag{6.40}$$

$$\sigma_y^2 = E(Y^2) - \mu_y^2 = E(Y^2) - [E(Y)]^2 \tag{6.41}$$

We can define a new random variable z which is the sum of the original two random variables:

$$z = x + y$$

For this new random variable, we can calculate the mean value and the variance, using the corresponding values for the original random variables $\mu_x, \sigma_x, \mu_y, \sigma_y$ as follows:

$$\mu_z = E(Z) = E(X + Y) = \mu_x + \mu_y \tag{6.42}$$

$$\sigma_z^2 = E((Z - \mu_z)^2) = E(Z^2 - 2Z\mu_z + \mu_z^2) = E(Z^2) - \mu_z^2$$

But the first term in the last expression can be re-written as:

$$E(Z^2) = E(X^2 + 2XY + Y^2) = E(X^2) + 2E(XY) + E(Y^2)$$

and from the definition of E(Z) we have:

$$[E(Z)]^2 = [E(X) + E(Y)]^2 = [E(X)]^2 + [E(Y)]^2 + 2E(X)E(Y)$$

Subtracting the second expression from the first and using Eqs. (6.40), (6.41), we obtain:

$$\sigma_z^2 = \sigma_x^2 + \sigma_y^2 + 2\sigma_{xy} \tag{6.43}$$

where we have defined the covariance σxy as:

σxy = E(XY ) − E(X)E(Y ) (6.44)

If the two variables are independent, the covariance is equal to zero.
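Relations (6.43) and (6.44) can be verified on a small discrete case. The sketch below is in Python; the joint distribution is a made-up example of two correlated variables that each take the values 0 and 1, chosen only for illustration and not taken from the text. Expectation values are sums over the joint distribution, the discrete analog of Eq. (6.35).

```python
from fractions import Fraction as F

# Hypothetical joint distribution f(x, y): the two variables tend to agree,
# so they are not independent and the covariance is non-zero.
joint = {(0, 0): F(4, 10), (0, 1): F(1, 10),
         (1, 0): F(1, 10), (1, 1): F(4, 10)}


def expect(g):
    """Expectation value E(g(X, Y)) as a sum over the joint distribution."""
    return sum(g(x, y) * f for (x, y), f in joint.items())


mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - mu_x ** 2          # Eq. (6.40)
var_y = expect(lambda x, y: y * y) - mu_y ** 2          # Eq. (6.41)
cov_xy = expect(lambda x, y: x * y) - mu_x * mu_y       # Eq. (6.44)

# Variance of z = x + y computed directly from its definition
mu_z = expect(lambda x, y: x + y)
var_z = expect(lambda x, y: (x + y) ** 2) - mu_z ** 2

print("covariance:", cov_xy)
print("variance of z:", var_z, "=", var_x, "+", var_y, "+ 2 *", cov_xy)
assert mu_z == mu_x + mu_y                    # Eq. (6.42)
assert var_z == var_x + var_y + 2 * cov_xy    # Eq. (6.43)
```

Replacing the table by a product of its marginals, $f(x,y) = f_1(x) f_2(y)$, makes the covariance vanish and reduces Eq. (6.43) to the sum of the two variances.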

Example 6.6: Consider two yards, which we will label A and B, and a big tree which sits right in the middle of the fence separating the two yards. The leaves from the tree fall randomly in one of the two yards, with probabilities p and q. The values of p and q are not necessarily equal (for example, a slight breeze may lead to p > q) but they always add up to unity: p + q = 1. Let x be the random variable associated with the fate of leaf 1, which may land either in yard A, in which case we assign the value $x_0 = 0$ to it, or in yard B, in which case we assign the value $x_1 = 1$. Similarly, the random variable y describes the fate of leaf 2, and it can also have values $y_0 = 0$ or $y_1 = 1$, depending on where leaf 2 lands. The random variables x, y take on the values 0 with probability p each, that is

px(X = 0) = p, py(Y = 0) = p

and they take on the values 1 with probability q each, that is:

px(X = 1) = q, py(Y = 1) = q

We define a new random variable z as the sum of x and y. From the way we have indexed the possible values of the random variables x, y, we have:

$$x_i = i, \quad y_j = j, \quad i, j = 0, 1$$

The values that the random variable z assumes are:

$$z_0 = 0 \rightarrow (X, Y) = (x_0, y_0)$$

$$z_1 = 1 \rightarrow (X, Y) = (x_1, y_0) \text{ or } (x_0, y_1)$$

$$z_2 = 2 \rightarrow (X, Y) = (x_1, y_1)$$

From the way we have indexed the possible values of z we deduce:

$$z_k = x_i + y_{k-i}$$

As far as the probabilities associated with the three possible values of z are concerned, from the above relation we find that:

$$p_z(Z_k) = \sum_{i} p_x(X_i)\, p_y(Y_{k-i})$$

but this expression is exactly what we have defined as the convolution of the two functions $p_x(x)$ and $p_y(y)$, if we think of the subscript of the variables X, Y as the integration variable in the general definition of the convolution. From the definition of the probabilities for $p_x(X_i)$ and $p_y(Y_j)$ we then find:

$$p_z(0) = p_x(0) p_y(0) = p^2$$

$$p_z(1) = p_x(1) p_y(0) + p_x(0) p_y(1) = 2pq$$

$$p_z(2) = p_x(1) p_y(1) = q^2$$

It is easy to check that

$$\sum_{k} p_z(Z_k) = p^2 + 2pq + q^2 = (p + q)^2 = 1$$

that is, $p_z(Z)$ is normalized to unity, whatever the values of p and q are, as long as they add up to unity; thus, $p_z(Z)$ is a proper probability distribution function.

We next want to calculate the average and variance of $p_z(Z)$ in terms of the averages and variances of $p_x(X)$, $p_y(Y)$, defined as $\mu_x, \mu_y$, and $\sigma_x, \sigma_y$, respectively. We can use the fact that $p_z(Z)$ is the convolution of the probability distributions $p_x(X)$ and $p_y(Y)$ and the fact that the Fourier transform of the convolution of two functions is the product of the Fourier transforms of the two functions:

$$\tilde{p}_z(\omega) = \tilde{p}_x(\omega)\, \tilde{p}_y(\omega)$$

From the moments of the probabilities $p_x(X)$, $p_y(Y)$, using the Fourier transforms and their derivatives (which are all evaluated at ω = 0), we have:

$$\text{0th moment}: \quad \tilde{p}_x(0) = 1 = \tilde{p}_y(0)$$

$$\text{1st moment}: \quad \frac{1}{i}\tilde{p}'_x(0) = \mu_x, \quad \frac{1}{i}\tilde{p}'_y(0) = \mu_y$$

$$\text{2nd moment}: \quad \frac{1}{(i)^2}\tilde{p}''_x(0) = \sigma_x^2 + \mu_x^2, \quad \frac{1}{(i)^2}\tilde{p}''_y(0) = \sigma_y^2 + \mu_y^2$$

while for the Fourier transform of the probability $p_z(Z)$ we have:

$$\tilde{p}'_z(\omega) = \tilde{p}'_x(\omega)\tilde{p}_y(\omega) + \tilde{p}_x(\omega)\tilde{p}'_y(\omega) \Rightarrow \frac{1}{i}\tilde{p}'_z(0) = \mu_z = \mu_x + \mu_y$$

as we would expect for any random variable defined to be the sum of two other random variables. Similarly, for the variance in the variable z we obtain:

$$\tilde{p}''_z(\omega) = \tilde{p}''_x(\omega)\tilde{p}_y(\omega) + 2\tilde{p}'_x(\omega)\tilde{p}'_y(\omega) + \tilde{p}_x(\omega)\tilde{p}''_y(\omega) \Rightarrow$$

$$\sigma_z^2 + \mu_z^2 = (\sigma_x^2 + \mu_x^2) + 2\mu_x\mu_y + (\sigma_y^2 + \mu_y^2) \Rightarrow \sigma_z^2 = \sigma_x^2 + \sigma_y^2$$

as we would expect for two independent variables, which is what the variables x and y are in the present case.
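The results of this example can be checked by carrying out the convolution explicitly, without going through the Fourier transform. The sketch below is in Python; the value p = 3/5 and the helper names are illustrative assumptions. A distribution over the values 0, 1, 2, ... is stored as a list indexed by the value.

```python
from fractions import Fraction as F


def convolve(px, py):
    """Discrete convolution: pz[k] = sum over i of px[i] * py[k - i]."""
    pz = [0] * (len(px) + len(py) - 1)
    for i, a in enumerate(px):
        for j, b in enumerate(py):
            pz[i + j] += a * b
    return pz


def mean_and_variance(pmf):
    """Mean and variance of a distribution over the values 0, 1, 2, ..."""
    mean = sum(k * f for k, f in enumerate(pmf))
    second_moment = sum(k * k * f for k, f in enumerate(pmf))
    return mean, second_moment - mean ** 2


p = F(3, 5)          # hypothetical probability of landing in yard A (value 0)
q = 1 - p            # probability of landing in yard B (value 1)
px = [p, q]          # distribution of leaf 1
py = [p, q]          # distribution of leaf 2

pz = convolve(px, py)
print("p_z =", pz)   # should equal [p^2, 2pq, q^2]

mu_x, var_x = mean_and_variance(px)
mu_y, var_y = mean_and_variance(py)
mu_z, var_z = mean_and_variance(pz)
assert sum(pz) == 1
assert mu_z == mu_x + mu_y
assert var_z == var_x + var_y
```

Convolving `pz` with `px` once more would give the distribution for three leaves, and so on; repeated n times this builds up the binomial distribution.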

6.3 Conditional probabilities

Consider two sets of events, one denoted by A, the other by B. The events common to both A and B comprise the intersection A ∩ B. The events that can be either in A or in B comprise the union A ∪ B. If all events belong to a superset S that includes both A and B and possibly other events that do not belong to either A or B, then the following relations will hold for the probabilities P(A), P(B), P(S), associated with the various sets:

P (S) = 1, 0 ≤ P (A) ≤ 1, 0 ≤ P (B) ≤ 1 (6.45)

P (A ∪ B) = P (A) + P (B) − P (A ∩ B) (6.46)

the last relation justified by the fact that the events that belong to both A and B are counted twice in P(A) + P(B). We define as mutually exclusive the sets of events that have no common elements, that is, A ∩ B = ∅; in that case:

$$P(A \cup B) = P(A) + P(B) \tag{6.47}$$

We further define the complement of A, denoted by $A^c$, as the set of all events that do not belong to A, and similarly for the complement of B, denoted by $B^c$; the complements satisfy the relations:

$$A^c = S - A, \quad A \cup A^c = S, \quad P(A^c) = 1 - P(A) \tag{6.48}$$

We define the conditional probability P(B|A) as the probability that an event in B will occur under the condition that an event in A has occurred:

$$P(B|A) = \frac{P(A \cap B)}{P(A)} \tag{6.49}$$

Based on this definition, we find that the probability of an event which belongs to both A and B to occur, given by P(A ∩ B), is obtained from the conditional probabilities as:

$$P(A \cap B) = P(B|A)P(A) = P(A|B)P(B) \tag{6.50}$$

that is, this probability is equal to the probability of an event in B occurring under the condition that an event in A has occurred times the probability of an event in A occurring (the same holds with the roles of A and B interchanged). Notice that, in general,

$$P(A \cap B) = P(B \cap A)$$

$$P(A|B) \neq P(B|A)$$

We call the events in A and B independent if the probability of an event in A occurring has no connection to the probability of an event in B occurring. This is expressed by the relation:

P (A ∩ B) = P (A)P (B) (6.51)


In this case the conditional probabilities become:

$$P(A|B) = \frac{P(A \cap B)}{P(B)} = P(A), \qquad P(B|A) = \frac{P(B \cap A)}{P(A)} = P(B) \tag{6.52}$$

which simply says that the probability of an event in A occurring under the condition that an event in B has occurred is the same as the probability of an event in A occurring, P(A), since this has no connection to an event in B (the same holds with the roles of A and B interchanged).

Example 6.7: Consider two fair dice that are thrown simultaneously. We define two events A and B as follows:
event A: the sum of values of the two dice is equal to seven;
event B: only one of the two dice has the value two.
We can construct the table of all possible outcomes and identify the events A and B in this table, as shown below (the line of values i = 1 − 6 represents the outcome of the first die, the column of values j = 1 − 6 represents the values of the second die):

| j \ i | 1 | 2 | 3 | 4 | 5 | 6 |
|-------|---|---|---|---|---|---|
| 1 |   | B |   |   |   | A |
| 2 | B |   | B | B | A, B | B |
| 3 |   | B |   | A |   |   |
| 4 |   | B | A |   |   |   |
| 5 |   | A, B |   |   |   |   |
| 6 | A | B |   |   |   |   |

The total number of outcomes 6 × 6 = 36 is the event space S. The boxes labeled with A represent the set of values corresponding to event A and the boxes labeled with B the set of values corresponding to event B. There are several boxes carrying both labels, and this is the intersection of the two sets, A ∩ B; all the labeled boxes comprise the union of A and B, A ∪ B.

From this we can calculate the probabilities:

$$P(A) = 6 \times \frac{1}{36} = \frac{1}{6}, \qquad P(B) = 10 \times \frac{1}{36} = \frac{10}{36}$$

that is, the probability of event A is the total number of boxes containing the label A, which is 6, divided by the total number of possible outcomes which is 36; similarly for the probability of event B. We also have:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{6}{36} + \frac{10}{36} - \frac{2}{36} = \frac{14}{36}$$

that is, the probability of either event occurring is the probability of the event A occurring plus the probability of event B occurring minus the probability of the intersection which is counted twice. Indeed, there are 14 occupied entries in the entire table, two of which have two labels (representing the intersection A ∩ B).

Finally, the conditional probabilities are obtained as follows: the probability that event A will occur (sum of values is 7) under the condition that event B has occurred (only one die has value 2) is

$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{2/36}{10/36} = \frac{2}{10}$$

and the probability that event B will occur (only one die has value 2) under the condition that event A has occurred (sum of values is 7) is

$$P(B|A) = \frac{P(B \cap A)}{P(A)} = \frac{2/36}{6/36} = \frac{2}{6}$$

These results make good sense if we check the table of events: out of the 10 boxes containing a label B, that is, all the cases that only one die has value 2, there are only two boxes that also contain the label A, that is, the sum of values of the two dice is 7: thus, the probability that the sum of values is 7 under the condition that only one of the two dice has value 2 is 2/10, which is precisely the conditional probability P(A|B). We can rationalize the value we obtained for the conditional probability P(B|A) by a similar argument.
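The counting in this example can be repeated by listing all 36 outcomes. A minimal sketch in Python follows (the names `outcomes`, `prob` and so on are ours); events are represented as sets of (i, j) pairs, so that intersection and union are the ordinary set operations.

```python
from fractions import Fraction
from itertools import product

# Event space S: all ordered pairs (first die, second die)
outcomes = list(product(range(1, 7), repeat=2))

# Event A: the two values sum to seven
A = {(i, j) for i, j in outcomes if i + j == 7}
# Event B: exactly one of the two dice shows the value two
B = {(i, j) for i, j in outcomes if (i == 2) != (j == 2)}


def prob(event) -> Fraction:
    """Probability of an event as (number of outcomes in it) / (size of S)."""
    return Fraction(len(event), len(outcomes))


p_a_given_b = prob(A & B) / prob(B)   # Eq. (6.49) with the roles of A, B swapped
p_b_given_a = prob(A & B) / prob(A)   # Eq. (6.49)

print("P(A) =", prob(A), " P(B) =", prob(B))
print("P(A and B) =", prob(A & B), " P(A or B) =", prob(A | B))
print("P(A|B) =", p_a_given_b, " P(B|A) =", p_b_given_a)

# Eq. (6.46)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
```

`Fraction` reduces its results to lowest terms, so 10/36 is printed as 5/18, 14/36 as 7/18, 2/10 as 1/5 and 2/6 as 1/3.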

Example 6.8: Consider two boxes, both containing gold and silver coins, but in different proportions: the first box contains 70% gold, 30% silver coins and the second box contains the reverse ratio. The total number of coins in each box is the same. From the outside, the boxes are identical in size, shape and total weight. We want to figure out which contains more gold coins by sampling them: we are allowed to pick coins from each box, see what kind of coin they are, and put them back in the same box. Based on our sampling, we should be able to figure out which box is which with a certain degree of certainty. We will use conditional probabilities to do this.

We define the following events:
Event A: choose a coin from box 1; $A^c$ then must correspond to choosing a coin from box 2. Since the boxes look identical from the outside we must assign $P(A) = P(A^c) = 0.5$.
Event B: choose a gold coin; $B^c$ then must correspond to choosing a silver coin, since these are the only two possibilities as far as the type of coin we can pick are concerned. In this case we cannot easily determine P(B) and $P(B^c)$, which depend on which box we are choosing the coin from, and we cannot tell this from the outside.

We next calculate the conditional probabilities, which in this case are actually easy to determine:

P (B|A) = 0.7, P (B|Ac) = 0.3, P (Bc|A) = 0.3, P (Bc|Ac) = 0.7

These are the result of the definitions we have made: The probability of getting a gold coin under the condition that we have chosen the first box is 70%, or P(B|A) = 0.7, etc.

We can use these results to calculate the probabilities of the intersections $P(A \cap B)$, $P(A^c \cap B)$, $P(A \cap B^c)$, $P(A^c \cap B^c)$, using the general formula of Eq. (6.50):

$$P(A \cap B) = P(B|A)P(A) = 0.35, \qquad P(A \cap B^c) = P(B^c|A)P(A) = 0.15$$

$$P(A^c \cap B) = P(B|A^c)P(A^c) = 0.15, \qquad P(A^c \cap B^c) = P(B^c|A^c)P(A^c) = 0.35$$

We put these intersection probabilities in a table:

|            | B    | Bc   | row sum |
|------------|------|------|---------|
| A          | 0.35 | 0.15 | 0.50    |
| Ac         | 0.15 | 0.35 | 0.50    |
| column sum | 0.50 | 0.50 |         |

The four entries under $B, B^c$ and across $A, A^c$ are the corresponding intersection probabilities $P(A \cap B) = P(B \cap A)$, etc. The extra entries are the column and row sums. These are interesting quantities: The first row sum is the sum of the probabilities $P(A \cap B) + P(A \cap B^c)$, that is the probability of getting either a gold coin (event B) or a silver coin (event $B^c$) from box 1 (event A), but this is the same as the probability of choosing box 1 itself, which must be equal to P(A) = 0.5, as indeed it is. Similarly for the second row. The first column sum is the probability of choosing a gold coin (event B) from either box 1 (event A) or box 2 (event $A^c$), which turns out to be P(B) = 0.5. Similarly for the second column, $P(B^c) = 0.5$. These are the marginal probabilities for events B and $B^c$: that is, the first column represents the probability of getting a gold coin (P(B)) whether we picked it from box 1 (event A) or from box 2 (event $A^c$); similarly for the second column.

Our next step is to consider a situation where we pick not one but several coins at a time. We illustrate what can happen if we pick three coins from one box. If we assume that the box was the one containing 70% gold and 30% silver coins, then we can actually figure out the probabilities of picking three gold coins, two gold and one silver coins, one gold and two silver coins or three silver coins, using the binomial distribution. We know that the probability of getting a gold coin from box 1 is p = 0.7 and the probability of getting a silver coin is q = 1 − p = 0.3; each time we pick a coin these probabilities are the same. Therefore, the probabilities of getting 3, 2, 1, 0 gold coins (and hence 0, 1, 2, 3 silver coins) in n = 3 tries, on the condition that we are picking coins from box 1 (event A), are:

$$P(B_3|A) = \frac{3!}{3!\,0!}(0.7)^3(0.3)^0 = 0.343 \quad : \text{3 gold, 0 silver}$$

$$P(B_2|A) = \frac{3!}{2!\,1!}(0.7)^2(0.3)^1 = 0.441 \quad : \text{2 gold, 1 silver}$$

$$P(B_1|A) = \frac{3!}{1!\,2!}(0.7)^1(0.3)^2 = 0.189 \quad : \text{1 gold, 2 silver}$$

$$P(B_0|A) = \frac{3!}{0!\,3!}(0.7)^0(0.3)^3 = 0.027 \quad : \text{0 gold, 3 silver}$$

where we have now defined the events $B_j$: get j gold coins (j = 0, 1, 2, 3) in the three picks. These probabilities add up properly to 1. We can use these conditional probabilities to obtain the intersection probabilities $P(A \cap B_j)$ by the same procedure as before, through the relation

$$P(A \cap B_j) = P(B_j|A)P(A)$$

with P(A) = 0.5. Similarly for the intersection probabilities $P(A^c \cap B_j)$, with $P(A^c) = 0.5$. We put all these intersection probabilities in a table as before:

|            | B0     | B1     | B2     | B3     | row sum |
|------------|--------|--------|--------|--------|---------|
| A          | 0.0135 | 0.0945 | 0.2205 | 0.1715 | 0.50    |
| Ac         | 0.1715 | 0.2205 | 0.0945 | 0.0135 | 0.50    |
| column sum | 0.185  | 0.315  | 0.315  | 0.185  |         |

The numbers under both entries A or $A^c$ and $B_j$ are the intersection probabilities $P(B_j \cap A)$ or $P(B_j \cap A^c)$, while the last column is the sum of individual rows of intersection probabilities which, as expected, sum up to 0.5, and the numbers in the last row are the column sums of the $P(B_j \cap A)$ or $P(B_j \cap A^c)$ entries. Notice that the column sums represent the marginal probabilities

$$P(B_j) = P(B_j \cap A) + P(B_j \cap A^c)$$

of getting j gold coins, whether we chose box 1 (event A) or box 2 (event $A^c$). Thus, the numbers of the bottom row in the above table are the marginal probabilities $P(B_0), P(B_1), P(B_2), P(B_3)$, in this order.

Having determined the intersection and marginal probabilities, we can use those to calculate another set of conditional probabilities, which are actually more interesting. Specifically, we can calculate the probability that we have chosen box 1, under the condition that we drew three gold coins in our three trials; this is given by the expression:

$$P(A|B_3) = \frac{P(B_3 \cap A)}{P(B_3)} = \frac{0.1715}{0.185} = 0.927$$

In other words, if in our three tries we get three gold coins, then the probability that we have found the box with the most gold coins (box 1 or event A) is 92.7%. This is indeed very useful information, if our goal were to identify the box with the most gold coins by picking coins from one of the two identical-looking boxes! Conversely, if we had obtained three silver coins (and hence zero gold coins) in our three picks, the probability that we had found box 1 would be:

$$P(A|B_0) = \frac{P(B_0 \cap A)}{P(B_0)} = \frac{0.0135}{0.185} = 0.073$$

or 7.3%, a rather small value, and equal to $1 - P(A|B_3)$.

Let us also calculate the average number of gold coins in the n = 3 picks: it is given by

$$\mu = \sum_{j=0}^{n} j P(B_j) = 0 \times 0.185 + 1 \times 0.315 + 2 \times 0.315 + 3 \times 0.185 = 1.5$$

$$\frac{\mu}{n} = \frac{1.5}{3} = 0.5$$

We note that because of the statement of the problem and the fact that the ratio of gold and silver coins in box 1 is the reverse of that in box 2, the average number scaled by the number of picks will not change if we change n, and will remain equal to 0.5.

What would be the right strategy for finding the box with the most gold coins? A logical strategy would be to choose as the box with the most gold coins that from which we get a larger than average number of gold coins in three picks. Since the average is 1.5, getting more than average gold coins means getting two or three gold coins. We should then compare the probability of getting two or three gold coins to the probability of getting zero or one gold coins. We note that the events of getting i and j gold coins are mutually exclusive for i ≠ j, therefore:

$$P(B_i \cup B_j) = P(B_i) + P(B_j), \quad i \neq j$$

and the same relation will hold for the intersection probabilities:

$$P((B_i \cup B_j) \cap A) = P(B_i \cap A) + P(B_j \cap A), \quad i \neq j$$

$$P((B_i \cup B_j) \cap A^c) = P(B_i \cap A^c) + P(B_j \cap A^c), \quad i \neq j$$

From the table of intersection probabilities for the individual $B_j$ events, we then can calculate the intersection probabilities $P((B_2 \cup B_3) \cap A)$, $P((B_0 \cup B_1) \cap A)$, $P((B_2 \cup B_3) \cap A^c)$ and $P((B_0 \cup B_1) \cap A^c)$:

|            | B0 ∪ B1 | B2 ∪ B3 | row sum |
|------------|---------|---------|---------|
| A          | 0.108   | 0.392   | 0.50    |
| Ac         | 0.392   | 0.108   | 0.50    |
| column sum | 0.50    | 0.50    |         |

From these entries, we can next obtain the conditional probabilities of getting more than average gold coins ($B_2 \cup B_3$) or less than average gold coins ($B_0 \cup B_1$), on the condition that we are picking from box 1 (event A) or box 2 (event $A^c$):

$$P((B_2 \cup B_3)|A) = 0.784, \qquad P((B_0 \cup B_1)|A) = 0.216$$

$$P((B_2 \cup B_3)|A^c) = 0.216, \qquad P((B_0 \cup B_1)|A^c) = 0.784$$

These results tell us that if we get more than average gold coins there is a 78.4% probability that we have hit on the right answer, that is, the box that actually contains the larger percentage of gold coins. There is also a non-negligible 21.6% probability that we have the wrong box, that is, the one with smaller percentage of gold coins, even though we got more than average gold coins in our three picks. Similarly, if we had gotten fewer than average gold coins, there is a 78.4% probability that we are picking from box 2, the box with the smaller percentage of gold coins, but there is also a 21.6% probability that we were picking from the box with the larger percentage of gold coins. Assuming that our strategy is to keep as the right choice the box if we got more than the average gold coins and to reject the box if we got fewer than the average gold coins, then we would be making the correct choice with a probability 78.4%, and the wrong choice with probability 21.6%.
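The tables and conditional probabilities of this example follow a fixed recipe: conditional probabilities from the binomial distribution, intersections from Eq. (6.50), marginals as column sums, and finally the conditional probabilities of the box given the outcome. The sketch below, in Python, retraces these steps for n = 3; the dictionary keys and variable names are our own labels for the events.

```python
from math import comb


def binomial_pmf(j: int, n: int, p: float) -> float:
    """Probability of j gold coins in n picks when each pick is gold with probability p."""
    return comb(n, j) * p ** j * (1.0 - p) ** (n - j)


n = 3
prior = {"A": 0.5, "Ac": 0.5}     # P(A), P(A^c): the boxes look identical
p_gold = {"A": 0.7, "Ac": 0.3}    # P(B|A), P(B|A^c) for a single pick

# Intersection probabilities P(B_j and box) = P(B_j | box) P(box)
joint = {box: [prior[box] * binomial_pmf(j, n, p_gold[box]) for j in range(n + 1)]
         for box in prior}

# Marginal probabilities P(B_j): the column sums of the table
marginal = [joint["A"][j] + joint["Ac"][j] for j in range(n + 1)]

# Probability of having box 1, given that j gold coins were drawn
posterior_A = [joint["A"][j] / marginal[j] for j in range(n + 1)]

# Probability of more than average gold coins (2 or 3), given box 1
more_than_average_given_A = sum(joint["A"][2:]) / prior["A"]

for j in range(n + 1):
    print(f"j = {j}: P(Bj and A) = {joint['A'][j]:.4f}, "
          f"P(Bj) = {marginal[j]:.3f}, P(A|Bj) = {posterior_A[j]:.3f}")
print(f"P((B2 or B3)|A) = {more_than_average_given_A:.3f}")
```

Changing `n` (and the cutoff in the last step to "more than n/2") shows how quickly the two boxes become distinguishable as more coins are sampled.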


It is evident that if we increase the number of samplings n, the odds of finding the right box will increase, but there will always be a non-zero chance of making the wrong choice. Essentially, with n increasing, the distribution of probabilities for each case evolves into a gaussian, and for large enough n the gaussians corresponding to A and $A^c$ will have very little overlap. This is shown in Fig. 6.2.

Figure 6.2: The probabilities of intersections, $P(B_j \cap A)$, $P(B_j \cap A^c)$, (shown by open symbols) and the marginal probabilities $P(B_j) = P(B_j \cap A) + P(B_j \cap A^c)$ (shown by filled symbols), for j = 1, . . . , n. Circles correspond to n = 3, squares to n = 10 and rhombuses to n = 30. The probabilities for n = 10 and n = 30 have been shifted up for clarity (by 0.4 and 0.6, respectively). In order to display all values in the same range, j is scaled by n, that is, the horizontal axis is j/n.

We next pose the problem in a more general, and challenging manner: Suppose that we wanted to be assured of a desired degree of certitude that we picked the right box, the one that has a majority of gold coins. How many trials should we make to make sure we have the right box with the right degree of certitude? If the number of trials n is large enough we know that we will have gaussian distributions. If we are picking from box 1, then:

$$A: \quad \mu_1 = np_1, \quad p_1 = 0.7 \Rightarrow \frac{\mu_1}{n} = 0.7$$

If we are picking from box 2, then:

$$A^c: \quad \mu_2 = np_2, \quad p_2 = 0.3 \Rightarrow \frac{\mu_2}{n} = 0.3$$


Since the two boxes look identical from the outside, we may be picking coins from either box, and therefore the normalization of each gaussian representing the probabilities of getting any combination of gold and silver coins must be equal to $P(A) = P(A^c) = 0.5$. For both curves, the variance will be given by

$$\sigma_1^2 = np_1q_1, \quad \sigma_2^2 = np_2q_2 \Rightarrow \frac{\sigma_1}{n} = \frac{\sigma_2}{n} = \frac{\sqrt{0.21}}{\sqrt{n}}$$

The first curve will be peaked at $\mu_1/n = 0.7$, the second at $\mu_2/n = 0.3$, and for large enough n the overlap between them will be small.

The tails of the gaussians have interesting meaning (in both cases, therandom variable represents the number of gold coins divided by n):– the tail of the first curve for values of the random variable less than 0.5corresponds to situations where we would get less than half gold coins bypicking from box 1;– the tail of the second curve for values of the random variable greater than0.5 corresponds to situations where we would get more than half gold coins bypicking from box 2.Assuming that the strategy is to keep the box if we get more than gold coinsand to reject it if we get less than half, then we could be making a mistake ifwe hit the values corresponding to either tail. The first type of mistake wouldbe a false rejection because we hit an event in the tail of the first curve;the second type of mistake would be a false acceptance, if we hit an eventthat corresponds to the tail of the second curve.

Suppose that we want to make sure that either type of mistake is avoided with a certain degree of confidence. The only freedom we have is to make more picks n. How big should n be to guarantee that the probability of making either type of mistake is less than 1%?

To answer this question, we have to sum up all the probabilities that correspond to the tails of the two gaussian curves above and below the average number of gold coins. In the present example, the two gaussians are the same as far as variance and normalization are concerned, and their positions with respect to the average are symmetrical; therefore the contribution of each tail is the same. We then simply calculate the contribution of one of the tails and multiply it by 2; this turns one of the gaussians into a properly normalized one. We will choose the gaussian that corresponds to event A, centered at µ1 = 0.7n. We then want to calculate the gaussian integral:

$$\Phi(w) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{w} e^{-y^2/2}\, dy = 0.01 \;\Rightarrow\; w = -2.326$$


which says that the integral of the tail of the normalized gaussian from −∞ to w gives a 1% contribution. The tabulated values of Φ(w) give the value w = −2.326 for this case; this is of course expressed in units of the standard deviation (defined to be 1 for the integrand of Φ(w)) away from the mean (defined to be 0 for the integrand of Φ(w)). All that remains is to turn this result into an actual number of picks n. To this end, we need to express everything in terms of the scaled variables

$$\tilde{\mu}_1 = \frac{\mu_1}{n} = 0.7, \qquad \tilde{\sigma}_1 = \frac{\sigma_1}{n} = \frac{\sqrt{0.21}}{\sqrt{n}}$$

and express the value of the average (scaled by n), which denotes the cutoff point of the tail, in terms of the mean and standard deviation of the gaussian:

$$0.5 = \tilde{\mu}_1 - 2.326\, \tilde{\sigma}_1$$

Plugging into this expression the values of $\tilde{\mu}_1$ and $\tilde{\sigma}_1$ from above, the latter containing a factor of $1/\sqrt{n}$, and solving for n we find n = 28.4. If we take the value of n to be the nearest integer larger than this non-integer calculated value, that is, n = 29, then within the gaussian approximation the overall error of either type in choosing the right box will not exceed 1%.
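The arithmetic of this example can be checked with a short script. This is only a sketch under the gaussian approximation used above; the function name `min_picks` and its parameters are illustrative choices, and the inverse of Φ is taken from Python's standard library rather than from the table.

```python
from math import ceil, sqrt
from statistics import NormalDist


def min_picks(p, tolerance, cutoff=0.5):
    """Return (n_exact, n_min) such that the gaussian tail beyond `cutoff`
    has weight `tolerance`, for a success probability p per pick."""
    q = 1.0 - p
    # w is the point where Phi(w) = tolerance (negative for tolerance < 0.5)
    w = NormalDist().inv_cdf(tolerance)
    # cutoff = p + w * sqrt(p*q) / sqrt(n), solved for n
    n_exact = (w * sqrt(p * q) / (cutoff - p)) ** 2
    return n_exact, ceil(n_exact)


n_exact, n_min = min_picks(p=0.7, tolerance=0.01)
print(f"n = {n_exact:.1f}, rounded up to {n_min}")
```

With p = 0.7 and a 1% tolerance this reproduces the value n = 28.4 found above, which rounds up to 29 picks.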

We can generalize the results of the example above to the case where we have a choice of two possible events, A and Ac, one of which is the desired one, and we try to figure out which one is the right choice by sampling the possible outcomes n times. The two choices need not be symmetrical, as was the case in the example above. For large n the distributions will be gaussian, but the normalizations N1 and N2 will not necessarily each be equal to 0.5, although their sum must be 1. The two choices will have different means µ1, µ2 and different standard deviations σ1, σ2. The overall mean value of a random variable sampling the two distributions will be given by:

$$\mu = \mu_1 N_1 + \mu_2 N_2 \;\Rightarrow\; \frac{\mu}{n} = \frac{\mu_1}{n} N_1 + \frac{\mu_2}{n} N_2 \;\Rightarrow\; \mu = \mu_1 N_1 + \mu_2 (1 - N_1)$$

We will assume that the values of µ1, µ2 and N1, N2 are such that:

µ1 < µ < µ2

If the value of µ is used as the criterion for choosing one of the two possible options, then the tails representing the two types of errors extend from µ → +∞ for the first distribution and from −∞ → µ for the second; these must be expressed in terms of the mean and standard deviation of each distribution:

µ = µ1 + λ1σ1, µ = µ2 − λ2σ2, λ1, λ2 > 0

Then, the total probability of a wrong call is given by:

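$$\frac{N_2}{\sqrt{2\pi}} \int_{-\infty}^{-\lambda_2} e^{-t^2/2}\, dt + \frac{N_1}{\sqrt{2\pi}} \int_{\lambda_1}^{\infty} e^{-t^2/2}\, dt$$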

This will be related to the tolerable value of error, from which the number of trials n can be determined. In order to proceed, more information is needed about the various quantities that enter in the above expression, µ1, µ2, σ1, σ2 and N1, as was the case for Example 6.8.
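As a rough numerical illustration of the general case, the sketch below evaluates the weight of the two error tails for given means, standard deviations and normalization. It assumes that each tail is weighted by the normalization of its own gaussian and that the gaussians are otherwise properly normalized; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wrong_call_probability(mu1, sigma1, mu2, sigma2, n1):
    """Weight of the two error tails when the overall mean is the criterion.

    Assumes mu1 < mu < mu2; n1 is the normalization N1 (and N2 = 1 - n1).
    """
    n2 = 1.0 - n1
    mu = mu1 * n1 + mu2 * n2          # overall mean used as the cutoff
    lam1 = (mu - mu1) / sigma1        # from mu = mu1 + lam1 * sigma1
    lam2 = (mu2 - mu) / sigma2        # from mu = mu2 - lam2 * sigma2
    phi = NormalDist().cdf            # tabulated gaussian integral Phi(w)
    return n1 * (1.0 - phi(lam1)) + n2 * phi(-lam2)


def symmetric_error(n):
    """Scaled two-box case: means 0.3 and 0.7, equal widths sqrt(0.21/n)."""
    sigma = sqrt(0.21 / n)
    return wrong_call_probability(0.3, sigma, 0.7, sigma, 0.5)


for n in (28, 29):
    print(n, round(symmetric_error(n), 5))
```

For the symmetric two-box case (with the distribution of lower mean labeled as the first one), the error computed this way drops below 1% between n = 28 and n = 29, in agreement with the estimate n = 28.4.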

Example 6.9: A television game played between a host and a player consists of finding a prize which is hidden behind one of three doors. The rules of the game are:

1. The player chooses one of the three doors without opening it.

2. The host opens one of the two remaining doors, which does not have the prize behind it.

3. The player has a chance to change her choice of door or stick with the original choice.

What is the right strategy for the player to maximize her chances of finding the prize: switch choice of doors or stay with the original choice?
Answer: We label the door that the player chooses initially as door 1 and the other two doors 2 and 3. We have to calculate probabilities from the player’s perspective, that is, taking into account only the information that the player has. We define the following events:
Ai: the prize is behind door i = 1, 2, 3.
Bi: the host opens door i = 1, 2, 3.
From the player’s perspective, the prize can be behind any one of the three doors, hence P(A1) = P(A2) = P(A3) = 1/3. Also, the host will not open door 1 since the player picked it first, but can open door 2 or 3 with equal probability as far as the player is concerned, hence P(B1) = 0, P(B2) = P(B3) = 1/2. We can also calculate the following conditional probabilities:

if the prize is behind door 1 : P (B2|A1) = 1/2, P (B3|A1) = 1/2


since the host is equally likely (from the player’s perspective) to open door 2 or 3, and

if the prize is behind door 2 : P (B2|A2) = 0, P (B3|A2) = 1

if the prize is behind door 3 : P (B2|A3) = 1, P (B3|A3) = 0

since the host will not open the door with the prize. With this information and the marginal probabilities for Ai and Bi, we can now calculate the table of intersection probabilities:

         B1     B2     B3     P(Ai)
A1       0      1/6    1/6    1/3
A2       0      0      1/3    1/3
A3       0      1/3    0      1/3
P(Bi)    0      1/2    1/2

The entries that are under B1, B2, B3 and across A1, A2, A3 are the intersection probabilities, the entries in the last column are the marginal probabilities P(Ai), and the entries in the last row are the marginal probabilities P(Bi). From this table we can then construct the following conditional probabilities, which are what the player actually wants to know:

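$$P(A_1|B_2) = \frac{P(A_1 \cap B_2)}{P(B_2)} = \frac{1}{3}, \qquad P(A_1|B_3) = \frac{P(A_1 \cap B_3)}{P(B_3)} = \frac{1}{3}$$

$$P(A_2|B_3) = \frac{P(A_2 \cap B_3)}{P(B_3)} = \frac{2}{3}, \qquad P(A_3|B_2) = \frac{P(A_3 \cap B_2)}{P(B_2)} = \frac{2}{3}$$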

Thus, the probability that the prize is behind door 1 (the original choice) under the condition that the host opens door 2 or 3 is 1/3 in each case, but the probability that the prize is behind door 2 if the host opens door 3, or behind door 3 if the host opens door 2, is 2/3 in each case. Therefore, the player should always switch choice of doors to maximize her chances.
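The table of intersection probabilities and the resulting conditional probabilities can be reproduced with exact fractions. The following is a minimal sketch; the variable names (`prior`, `host`, `joint`, `posterior`) are illustrative and not part of the text.

```python
from fractions import Fraction

doors = (1, 2, 3)

# P(A_i): the prize is equally likely to be behind any door
prior = {a: Fraction(1, 3) for a in doors}

# P(B_j | A_i): the host never opens door 1 (the player's pick)
# and never opens the door that hides the prize
host = {
    1: {1: Fraction(0), 2: Fraction(1, 2), 3: Fraction(1, 2)},
    2: {1: Fraction(0), 2: Fraction(0), 3: Fraction(1)},
    3: {1: Fraction(0), 2: Fraction(1), 3: Fraction(0)},
}

# Intersection probabilities P(A_i and B_j) and marginals P(B_j)
joint = {(a, b): prior[a] * host[a][b] for a in doors for b in doors}
marginal_b = {b: sum(joint[a, b] for a in doors) for b in doors}


def posterior(a, b):
    """P(A_a | B_b); only meaningful when P(B_b) is nonzero."""
    return joint[a, b] / marginal_b[b]


for b in (2, 3):
    print(f"host opens door {b}:",
          {a: str(posterior(a, b)) for a in doors})
```

Staying with door 1 wins with probability 1/3 whichever door the host opens, while the remaining closed door carries probability 2/3.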

6.4 Random numbers

It is often useful to be able to generate a sequence of random numbers on the computer, in order to apply it to various problems related to probabilities. However, since by definition these numbers will be generated by some deterministic algorithm, they cannot be truly random; such numbers are “pseudo-random” numbers. If a sequence of pseudo-random numbers can adequately approximate the properties of a truly random sequence of numbers, it can be very useful.

A standard algorithm for generating random numbers is the modulo algorithm:

$$R_i = \mathrm{mod}[(A R_{i-1} + B), M], \qquad r_i = \frac{R_i}{M}, \qquad i = 1, \ldots, N \ll M \tag{6.53}$$

where ri are the random numbers and A, B, M, R0 are positive integers; the ri numbers are in the range ri ∈ [0, 1). R0 is usually referred to as the “seed”. A different choice of seed with the same values for the rest of the parameters generates a different sequence of random numbers. Note that with this algorithm we can generate a number N of pseudo-random numbers where N ≪ M, because otherwise the numbers are not really random. The reason for this restriction is that the numbers generated by this algorithm repeat after a period of at most M. For example, with

M = 4, A = 5, B = 3, R0 = 1

the first five values generated are

R1 = 0, R2 = 3, R3 = 2, R4 = 1, R5 = 0

and then the numbers keep repeating (R5 = R1, R6 = R2, etc.) because at R4 we have hit the same value that R0 had. Actually this is to be expected, since M = 4 and all the possible outcomes of the modulo are 0, 1, 2, 3. These numbers appear to occur in a “random” sequence in the first period, until the range of possible outcomes is exhausted, and then the sequence starts over. Thus, if we allow N to reach M, then the numbers we generate do not have any semblance of randomness anymore!
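The small example above is easy to reproduce. The sketch below implements the recursion of Eq. (6.53) as a Python generator; the name `modulo_generator` is an illustrative choice.

```python
from itertools import islice


def modulo_generator(a, b, m, seed):
    """Yield R_1, R_2, ... from the recursion R_i = (a * R_{i-1} + b) mod m."""
    r = seed
    while True:
        r = (a * r + b) % m
        yield r


# Parameters of the example in the text: M = 4, A = 5, B = 3, R0 = 1
first_five = list(islice(modulo_generator(a=5, b=3, m=4, seed=1), 5))
print(first_five)                     # [0, 3, 2, 1, 0]
print([R / 4 for R in first_five])    # the corresponding r_i = R_i / M
```

The fifth value equals the first, which shows the period of 4 for this choice of parameters.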

As this simple example shows, a careful choice of the variables A, B, M, R0 is important to avoid a short period. In particular, M should be very large. Actually, the sequence of pseudo-random numbers generated by the modulo algorithm always involves correlations for any finite value of M. Specifically, if we use D such numbers to define a random variable in a D-dimensional space (D = 2 for a plane, D = 3 for a cube, etc.), then these points do not fill the D-dimensional space uniformly but lie on (D − 1)-dimensional hyper-planes, and there are at most $M^{1/D}$ such hyper-planes, if A, B, M, R0 are carefully chosen (there can be fewer such hyper-planes, meaning poorer quality pseudo-random numbers, if the values of the parameters are not well chosen).

One way to improve the randomness of pseudo-random numbers is to create a shuffling algorithm. This works as follows: we first create an array of N random numbers rj, j = 1, . . . , N, with N large. Next we choose at random a value j between 1 and N and use the random number rj, but at the same time we take it out of the array and replace it by a new random number. This continuous shuffling of the random numbers reduces the correlations between them. Elaborate algorithms have been produced to make this very efficient, which are known as Random Number Generators (RNGs).
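One possible arrangement of the shuffling idea is sketched below. It is an illustration only, not a production-quality generator: the class name, the table size of 32 and the modulo parameters in the usage lines (a commonly quoted set with M = 2^32) are assumptions made for the example, and the same modulo sequence is used both to pick the slot j and to refill it.

```python
class ShuffledGenerator:
    """Modulo generator whose outputs pass through a shuffling table."""

    def __init__(self, a, b, m, seed, table_size=32):
        self.a, self.b, self.m = a, b, m
        self.state = seed
        # Fill the array with the first table_size random numbers r_j
        self.table = [self._next() for _ in range(table_size)]

    def _next(self):
        """One step of the modulo algorithm, returning r = R / M in [0, 1)."""
        self.state = (self.a * self.state + self.b) % self.m
        return self.state / self.m

    def random(self):
        """Pick a slot at random, return its value and refill the slot."""
        j = int(self._next() * len(self.table))
        value = self.table[j]
        self.table[j] = self._next()
        return value


gen = ShuffledGenerator(a=1664525, b=1013904223, m=2**32, seed=12345)
values = [gen.random() for _ in range(1000)]
print(min(values), max(values))
```

Since every step is deterministic, two generators created with the same parameters and seed produce the same shuffled sequence.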

6.4.1 Arbitrary probability distributions

An important consideration is to be able to generate random numbers of any desired probability distribution. This can be achieved by requiring that two distributions p(x) and p(y) are related by

$$|p(x)\,dx| = |p(y)\,dy| \;\Rightarrow\; p(y) = p(x) \left| \frac{dx}{dy} \right| \tag{6.54}$$

For example, we may start with a uniform probability distribution of random numbers x in the range [0, 1], that is, p(x) = 1, and derive from it random numbers y that have an exponential probability distribution, that is, the probability of finding the value y is proportional to $e^{-y}$. To achieve this, we would then need:

$$e^{-y} = \left| \frac{dx}{dy} \right| \;\Rightarrow\; x = e^{-y} \;\Rightarrow\; y(x) = -\ln(x)$$

that is, the new random numbers y are given in terms of the original random numbers x (which have uniform probability distribution) through the relation y(x) = − ln(x). The scheme we described earlier, using the modulo algorithm, generates random numbers x with uniform probability distribution p(x) = 1 in the range [0, 1]. Taking the natural logarithm of these numbers (with a minus sign, which makes sure that the new numbers are positive) generates random numbers y in the range [0, ∞) with a probability distribution $p(y) = e^{-y}$.
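The relation y(x) = − ln(x) can be tried out directly. The sketch below uses Python's built-in uniform generator instead of the modulo algorithm, and uses 1 − x in place of x (which has the same uniform distribution) so that the logarithm of zero is never taken; the function name and the seed are arbitrary.

```python
import math
import random


def exponential_samples(count, rng):
    """Turn uniform numbers x into y = -ln(x), distributed as exp(-y)."""
    # 1 - rng.random() lies in (0, 1], so the logarithm is always finite
    return [-math.log(1.0 - rng.random()) for _ in range(count)]


rng = random.Random(2024)
ys = exponential_samples(100_000, rng)

mean = sum(ys) / len(ys)
fraction_below_one = sum(y < 1.0 for y in ys) / len(ys)
print(f"mean = {mean:.3f} (exact value 1)")
print(f"P(y < 1) = {fraction_below_one:.3f} (exact value {1 - math.exp(-1):.3f})")
```

Both the sample mean and the fraction of values below 1 should approach the values expected for the distribution $p(y) = e^{-y}$ as the number of samples grows.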

Another important distribution, which we discussed in detail in this chapter, is the gaussian distribution. This can be generated from random numbers of uniform distribution as follows: starting with two random numbers of uniform distribution x1, x2, we define the new numbers y1, y2 through:

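$$y_1 = \sqrt{-2\ln(x_1)}\, \cos(2\pi x_2), \qquad y_2 = \sqrt{-2\ln(x_1)}\, \sin(2\pi x_2) \;\Rightarrow$$

$$x_1 = e^{-(y_1^2 + y_2^2)/2}, \qquad x_2 = \frac{1}{2\pi} \tan^{-1}\left( \frac{y_2}{y_1} \right)$$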

and we use the general formula of Eq.(6.54) to relate the two distributions:

$$p(y_1, y_2)\, dy_1 dy_2 = p(x_1, x_2) \left| \frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} \right| dy_1 dy_2$$

where the quantity inside the absolute value is the Jacobian:

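$$\left| \frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} \right| = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_2}{\partial y_1} \\[2ex] \dfrac{\partial x_1}{\partial y_2} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix} \tag{6.55}$$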

Using the relations between x1, x2 and y1, y2 from above, we can calculate the Jacobian to obtain:

$$p(y_1, y_2) = \frac{1}{\sqrt{2\pi}}\, e^{-y_1^2/2}\; \frac{1}{\sqrt{2\pi}}\, e^{-y_2^2/2} \tag{6.56}$$

which is actually the product of two independent gaussian probabilities in the variables y1, y2. This is useful, because no computation is wasted: the two random numbers x1, x2 of uniform distribution produce two (rather than one) gaussian random numbers.
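A minimal sketch of this transformation follows. It relies on Python's built-in uniform generator, replaces x1 by 1 − x1 to keep the logarithm finite, and checks only the mean and variance of the resulting numbers; the function name and the seed are arbitrary choices.

```python
import math
import random


def gaussian_pair(rng):
    """Map two uniform numbers x1, x2 to two gaussian numbers y1, y2."""
    x1 = 1.0 - rng.random()      # in (0, 1], so ln(x1) is finite
    x2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(x1))
    angle = 2.0 * math.pi * x2
    return radius * math.cos(angle), radius * math.sin(angle)


rng = random.Random(7)
samples = [y for _ in range(50_000) for y in gaussian_pair(rng)]

mean = sum(samples) / len(samples)
variance = sum((y - mean) ** 2 for y in samples) / len(samples)
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

For a gaussian of the form of Eq. (6.56) the mean should be close to 0 and the variance close to 1.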

6.4.2 Monte Carlo integration

We can use random numbers to do integration and other calculations that may be otherwise very difficult to perform, by sampling efficiently the relevant space of possible outcomes.

For example, we can calculate the value of π by generating random numbers in pairs, representing the (x, y) coordinates in space. By shifting and scaling the random numbers generated by a RNG we can arrange their values to span the interval [−1, 1]:

r′ = 2(r − 0.5) (6.57)

where r is the original set of random values generated by the RNG with uniform distribution in [0, 1] and r′ is a random number with uniform distribution in the interval [−1, 1]. If we keep generating such numbers in pairs corresponding to points on a plane, we would eventually fill a square of side 2 and total area 2 × 2 = 4. If we keep track of how many of these points are inside a circle of radius 1, that is, $x^2 + y^2 < 1$, and compare this number to the total number of generated points, we could calculate the value of π: the number of points inside the circle of radius 1 compared to the number of points inside the square of side 2 must give the same ratio as the respective areas, that is, π/4. Alternatively, we can use the random values in the interval [0, 1] in pairs as the (x, y) points, and check whether $x^2 + y^2 < 1$, which would give us the number of points within one quadrant of the circle of radius 1. The area of this quadrant is π/4. Comparing again the number of points within the quadrant to the total number of points generated, which would fill the square of side 1, we can calculate the value of π. In Fig. 6.3, we show the distribution of random points for $N = 10^3$ for this calculation. A simple code in Mathematica that does this calculation with $N = 10^6$ random points is given in Fig. 6.4.

Figure 6.3: Example of Monte Carlo evaluation of π. Filled dots are random points within the range of the circle and open circles are outside this region.
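The quadrant version of the calculation of π can be sketched in a few lines of Python, shown here as an alternative to the Mathematica code of Fig. 6.4. The function name, the number of points and the seed are arbitrary choices, and Python's built-in generator stands in for the RNG.

```python
import math
import random


def estimate_pi(count, rng):
    """Estimate pi from the fraction of points of the unit square
    that fall inside one quadrant of the circle of radius 1."""
    inside = 0
    for _ in range(count):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1.0:
            inside += 1
    return 4.0 * inside / count


pi_estimate = estimate_pi(200_000, random.Random(1))
print(f"estimate = {pi_estimate:.5f}, difference = {pi_estimate - math.pi:+.5f}")
```

The difference from the exact value fluctuates from one seed to the next, and its typical size shrinks slowly as the number of points increases.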

We can use the same concept to calculate integrals. For example, we calculate the integral

$$I_1 = \int_0^1 x^2\, dx = \left[ \frac{1}{3} x^3 \right]_0^1 = 0.333333 \tag{6.58}$$

by generating random points in the range [0, 1], which will be the values of the variable x, and for each point compare the value of f(x) to another random point, y. If y < f(x), then this point would be under the curve f(x). Since the integral represents the area under the curve, we can compare the number of points that satisfy the relation y < f(x) to the total number of points generated, and this ratio will give the result for the integral of Eq. (6.58). A simple code in Mathematica that does this calculation with $N = 10^6$ random points is given in Fig. 6.4. In Fig. 6.5, we show the distribution of random points for $N = 10^3$ for this calculation.

N       π          ∆0          I1         ∆1          I2         ∆2
10^3    3.164000    0.022407   0.334000    0.000667   1.002143    0.047643
10^4    3.121200   -0.020393   0.331700   -0.001633   0.952834   -0.001666
10^5    3.142560    0.000967   0.333970    0.000637   0.957605    0.003105
10^6    3.143352    0.001759   0.333440    0.000107   0.954688    0.000188
10^7    3.142856    0.001263   0.333544    0.000210   0.954613    0.000113
10^8    3.141668    0.000075   0.333385    0.000052   0.954552    0.000052
10^9    3.141624    0.000031   0.333350    0.000017   0.954511    0.000011

Table 6.1: Monte Carlo calculation for π and the integrals I1 of Eq. (6.58) and I2(2) of Eq. (6.59). N is the number of random points used and ∆i, i = 0, 1, 2 are the differences from the exact result.
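The hit-or-miss procedure described for Eq. (6.58) can be sketched as a small Python function. The name `hit_or_miss` and its arguments are illustrative; the function assumes that f is non-negative and bounded by `f_max` on the range [0, `x_max`].

```python
import random


def hit_or_miss(f, x_max, f_max, count, rng):
    """Area under f on [0, x_max], from the fraction of random points
    of the rectangle [0, x_max] x [0, f_max] that satisfy y < f(x)."""
    hits = 0
    for _ in range(count):
        x = rng.uniform(0.0, x_max)
        y = rng.uniform(0.0, f_max)
        if y < f(x):
            hits += 1
    # The fraction of hits is scaled by the area of the rectangle
    return x_max * f_max * hits / count


i1 = hit_or_miss(lambda x: x * x, x_max=1.0, f_max=1.0,
                 count=200_000, rng=random.Random(3))
print(f"I1 = {i1:.5f} (exact value 0.333333)")
```

For f(x) = x² the rectangle has unit area, so the fraction of hits is itself the estimate of the integral.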

As another example, we can use Monte Carlo integration to calculate the following integral:

$$I_2(Q) = \int_{-Q}^{Q} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx \tag{6.59}$$

for finite values of Q. This can also be obtained from the tabulated values of the gaussian integral Φ(Q), as I2(Q) = 2Φ(Q) − 1. In this case, we must generate values of the random variable in the interval [−Q, Q], or, due to the fact that the integrand is an even function of x, we can do the integration in the interval [0, Q] and multiply the result by a factor of 2. We must also make sure that we scale the result by the total area which would be covered by random points. In order not to waste computation, we can arrange the total area to be a rectangle of width 2Q, the range of the variable x (or Q if we use the half range [0, Q]), and height $1/\sqrt{2\pi} = 0.39894228$, which is the highest value of the integrand. In Fig. 6.6, we show the distribution of random points for $N = 10^3$ for this calculation.
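The scaling by the area of the rectangle is the step that is easiest to get wrong, so a short sketch may help. It samples the full range [−Q, Q] and compares the result for Q = 2 with the value obtained from the gaussian integral Φ, taken here from Python's standard library; the function name and the seed are arbitrary.

```python
import math
import random
from statistics import NormalDist


def gaussian_integral_mc(q, count, rng):
    """Hit-or-miss estimate of the integral of Eq. (6.59) over [-q, q]."""
    peak = 1.0 / math.sqrt(2.0 * math.pi)    # highest value of the integrand
    hits = 0
    for _ in range(count):
        x = rng.uniform(-q, q)
        y = rng.uniform(0.0, peak)
        if y < peak * math.exp(-0.5 * x * x):
            hits += 1
    # Fraction of hits times the area of the rectangle (width 2q, height peak)
    return (2.0 * q * peak) * hits / count


q = 2.0
estimate = gaussian_integral_mc(q, 200_000, random.Random(5))
exact = 2.0 * NormalDist().cdf(q) - 1.0      # I2(Q) = 2 Phi(Q) - 1
print(f"estimate = {estimate:.5f}, exact = {exact:.5f}")
```

Without the factor for the area of the rectangle, the result would be only the fraction of points under the curve, not the value of the integral.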

In Table 6.1 we collect the results of the three Monte Carlo calculations, using different numbers of random points N (from one thousand to one billion). As is seen from this table, the larger the number of points used, the more accurate the result tends to be (closer to the exact value), as we would intuitively expect. This can be quantified by using the following expression, for an integral in the range x ∈ [0, L]:

$$\frac{1}{L} \int_0^L f(x)\, dx = \langle f \rangle \pm \frac{1}{\sqrt{N}} \left[ \langle f^2 \rangle - \langle f \rangle^2 \right]^{1/2} \tag{6.60}$$

where the expressions in brackets are defined as:

$$\langle f \rangle = \frac{1}{N} \sum_{i=1}^{N} f(x_i), \qquad \langle f^2 \rangle = \frac{1}{N} \sum_{i=1}^{N} [f(x_i)]^2 \tag{6.61}$$

This expression states that the fluctuations, which are proportional to $[\langle f^2 \rangle - \langle f \rangle^2]^{1/2}$, die out with a prefactor $1/\sqrt{N}$. This is not an exact result, but a rough estimate of the deviation from the exact value, assuming that it is one standard deviation in a gaussian type distribution.
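Equations (6.60) and (6.61) translate almost line by line into code. The sketch below averages f over uniformly distributed points instead of counting hits, and returns both the estimate of the integral and the rough error bar; the function name and the test integrand f(x) = x² on [0, 1] are choices made for the illustration.

```python
import math
import random


def sample_mean_integral(f, length, count, rng):
    """Estimate the integral of f over [0, length] with its error estimate,
    following Eqs. (6.60) and (6.61)."""
    total = 0.0
    total_sq = 0.0
    for _ in range(count):
        value = f(rng.uniform(0.0, length))
        total += value
        total_sq += value * value
    mean = total / count                 # <f>
    mean_sq = total_sq / count           # <f^2>
    # max(..., 0) guards against a tiny negative value from rounding
    error = math.sqrt(max(mean_sq - mean * mean, 0.0) / count)
    return length * mean, length * error


estimate, error = sample_mean_integral(lambda x: x * x, 1.0,
                                       100_000, random.Random(11))
print(f"I1 = {estimate:.5f} +/- {error:.5f}")
```

Multiplying the number of points by 100 should reduce the error estimate by roughly a factor of 10, which is the $1/\sqrt{N}$ behavior described above.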


MONTE CARLO INTEGRATION OF x^2

f[x_] := x^2

In[39]:= xrange = 1
Out[39]= 1

In[40]:= frange = f[xrange]
Out[40]= 1

In[41]:= mcount = 0
Out[41]= 0

In[42]:= Do[If[(f[Random[Real, {0, xrange}]] - Random[Real, {0, frange}]) > 0, mcount++], {i, 1000000}]

In[43]:= N[mcount/1000000]
Out[43]= 0.333142

MONTE CARLO CALCULATION OF Pi

In[44]:= mcount = 0
Out[44]= 0

In[45]:= Do[If[(Random[]^2 + Random[]^2 - 1) < 0, mcount++], {i, 1000000}]

In[46]:= N[mcount/1000000]*4
Out[46]= 3.14136

Figure 6.4: Examples of Monte Carlo calculations in Mathematica.


Figure 6.5: Example of Monte Carlo integration for the integral of Eq. (6.58). Filled dots are random points under the curve $f(x) = x^2$ and open circles are outside this region.

Figure 6.6: Example of Monte Carlo integration for the integral of Eq. (6.59), for Q = 2. Filled dots are random points under the curve $f(x) = e^{-x^2/2}/\sqrt{2\pi}$ and open circles are outside this region.


APPENDIX - GAUSSIAN TABLE

$$\Phi(w) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{w} e^{-u^2/2}\, du, \qquad \Phi(-w) = 1 - \Phi(w), \qquad \Phi(0) = 0.5$$

w     Φ(w)       w     Φ(w)       w     Φ(w)       w     Φ(w)       w     Φ(w)
0.01  0.503989   0.26  0.602568   0.51  0.694974   0.76  0.776373   1.01  0.843752
0.02  0.507978   0.27  0.606420   0.52  0.698468   0.77  0.779350   1.02  0.846136
0.03  0.511966   0.28  0.610261   0.53  0.701944   0.78  0.782305   1.03  0.848495
0.04  0.515953   0.29  0.614092   0.54  0.705401   0.79  0.785236   1.04  0.850830
0.05  0.519939   0.30  0.617911   0.55  0.708840   0.80  0.788145   1.05  0.853141
0.06  0.523922   0.31  0.621720   0.56  0.712260   0.81  0.791030   1.06  0.855428
0.07  0.527903   0.32  0.625516   0.57  0.715661   0.82  0.793892   1.07  0.857690
0.08  0.531881   0.33  0.629300   0.58  0.719043   0.83  0.796731   1.08  0.859929
0.09  0.535856   0.34  0.633072   0.59  0.722405   0.84  0.799546   1.09  0.862143
0.10  0.539828   0.35  0.636831   0.60  0.725747   0.85  0.802337   1.10  0.864334
0.11  0.543795   0.36  0.640576   0.61  0.729069   0.86  0.805105   1.11  0.866500
0.12  0.547758   0.37  0.644309   0.62  0.732371   0.87  0.807850   1.12  0.868643
0.13  0.551717   0.38  0.648027   0.63  0.735653   0.88  0.810570   1.13  0.870762
0.14  0.555670   0.39  0.651732   0.64  0.738914   0.89  0.813267   1.14  0.872857
0.15  0.559618   0.40  0.655422   0.65  0.742154   0.90  0.815940   1.15  0.874928
0.16  0.563559   0.41  0.659097   0.66  0.745373   0.91  0.818589   1.16  0.876976
0.17  0.567495   0.42  0.662757   0.67  0.748571   0.92  0.821214   1.17  0.879000
0.18  0.571424   0.43  0.666402   0.68  0.751748   0.93  0.823814   1.18  0.881000
0.19  0.575345   0.44  0.670031   0.69  0.754903   0.94  0.826391   1.19  0.882977
0.20  0.579260   0.45  0.673645   0.70  0.758036   0.95  0.828944   1.20  0.884930
0.21  0.583166   0.46  0.677242   0.71  0.761148   0.96  0.831472   1.21  0.886861
0.22  0.587064   0.47  0.680822   0.72  0.764238   0.97  0.833977   1.22  0.888768
0.23  0.590954   0.48  0.684386   0.73  0.767305   0.98  0.836457   1.23  0.890651
0.24  0.594835   0.49  0.687933   0.74  0.770350   0.99  0.838913   1.24  0.892512
0.25  0.598706   0.50  0.691462   0.75  0.773373   1.00  0.841345   1.25  0.894350


w     Φ(w)       w     Φ(w)       w     Φ(w)       w     Φ(w)       w     Φ(w)
1.26  0.896165   1.51  0.934478   1.76  0.960796   2.01  0.977784   2.26  0.988089
1.27  0.897958   1.52  0.935745   1.77  0.961636   2.02  0.978308   2.27  0.988396
1.28  0.899727   1.53  0.936992   1.78  0.962462   2.03  0.978822   2.28  0.988696
1.29  0.901475   1.54  0.938220   1.79  0.963273   2.04  0.979325   2.29  0.988989
1.30  0.903200   1.55  0.939429   1.80  0.964070   2.05  0.979818   2.30  0.989276
1.31  0.904902   1.56  0.940620   1.81  0.964852   2.06  0.980301   2.31  0.989556
1.32  0.906582   1.57  0.941792   1.82  0.965620   2.07  0.980774   2.32  0.989830
1.33  0.908241   1.58  0.942947   1.83  0.966375   2.08  0.981237   2.33  0.990097
1.34  0.909877   1.59  0.944083   1.84  0.967116   2.09  0.981691   2.34  0.990358
1.35  0.911492   1.60  0.945201   1.85  0.967843   2.10  0.982136   2.35  0.990613
1.36  0.913085   1.61  0.946301   1.86  0.968557   2.11  0.982571   2.36  0.990863
1.37  0.914657   1.62  0.947384   1.87  0.969258   2.12  0.982997   2.37  0.991106
1.38  0.916207   1.63  0.948449   1.88  0.969946   2.13  0.983414   2.38  0.991344
1.39  0.917736   1.64  0.949497   1.89  0.970621   2.14  0.983823   2.39  0.991576
1.40  0.919243   1.65  0.950529   1.90  0.971283   2.15  0.984222   2.40  0.991802
1.41  0.920730   1.66  0.951543   1.91  0.971933   2.16  0.984614   2.41  0.992024
1.42  0.922196   1.67  0.952540   1.92  0.972571   2.17  0.984997   2.42  0.992240
1.43  0.923641   1.68  0.953521   1.93  0.973197   2.18  0.985371   2.43  0.992451
1.44  0.925066   1.69  0.954486   1.94  0.973810   2.19  0.985738   2.44  0.992656
1.45  0.926471   1.70  0.955435   1.95  0.974412   2.20  0.986097   2.45  0.992857
1.46  0.927855   1.71  0.956367   1.96  0.975002   2.21  0.986447   2.46  0.993053
1.47  0.929219   1.72  0.957284   1.97  0.975581   2.22  0.986791   2.47  0.993244
1.48  0.930563   1.73  0.958185   1.98  0.976148   2.23  0.987126   2.48  0.993431
1.49  0.931888   1.74  0.959070   1.99  0.976705   2.24  0.987455   2.49  0.993613
1.50  0.933193   1.75  0.959941   2.00  0.977250   2.25  0.987776   2.50  0.993790
