Chapter 2

Simulation

Frank Porter

January 13, 2011

The technology of “Monte Carlo simulation” is an important tool for understanding and working with complicated probability distributions. This technique is widely used in both experiment design and analysis of results.

We introduce the Monte Carlo method as a means of evaluating integrals numerically. Suppose we wish to evaluate the k-dimensional integral:

I = \int_0^1 \cdots \int_0^1 f(x) \, d^k x,    (2.1)

where x is a real vector in k dimensions. A numerical estimate of this integral may be formed according to:

I_N = \frac{1}{N^k} \sum_{ν_1=1}^{N} \cdots \sum_{ν_k=1}^{N} f(x_ν),    (2.2)

where ν labels a vector x_ν = (x_{ν_1}, \ldots, x_{ν_k}) with components given by:

x_{ν_i} = ν_i/N.    (2.3)

That is, we divide our k-dimensional unit hypercube integration region into N^k equal pieces, evaluate f(x) at a point in each of these pieces, and take the average over all the pieces. As long as things are reasonably well-behaved, this will converge to the true value of the integral as we take more pieces:

I = \lim_{N→∞} I_N.    (2.4)

A variation on this is to randomly select N points in the hypercube, and average the values of f(x) over all these points. In this method, we draw N random vectors r^{(1)}, r^{(2)}, \ldots, r^{(N)} from a distribution:

p(r) = \begin{cases} 1 & 0 ≤ r_i ≤ 1, \; i = 1, 2, \ldots, k, \\ 0 & \text{otherwise}. \end{cases}    (2.5)


We estimate our integral with

I'_N = \frac{1}{N} \sum_{ν=1}^{N} f(r^{(ν)}).    (2.6)

Again, if the integral is well-behaved, I'_N will converge on the true value of I in the limit N → ∞.

There is no reason to expect that the evaluation with randomly-selected points will provide a better estimate in general for the same number of function evaluations. However, it does provide a benefit: each of the N samplings is independent of the others, so we don't need to plan very hard what to choose for N, since taking additional samples is straightforward.
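To make this concrete, here is a minimal R sketch of both estimates; the integrand f(x) = exp(−|x|²) and the sample sizes are illustrative choices, not from the text.

f <- function(x) exp(-sum(x^2))

# Grid estimate I_N of equation (2.2): average f over the N^k lattice points.
gridEstimate <- function(f, k, N) {
  pts <- as.matrix(expand.grid(rep(list((1:N)/N), k)))
  mean(apply(pts, 1, f))
}

# Random-point estimate I'_N of equation (2.6): average f over N uniform draws.
mcEstimate <- function(f, k, N) {
  pts <- matrix(runif(N*k), ncol = k)
  mean(apply(pts, 1, f))
}

gridEstimate(f, 2, 100)    # both should approach the same value,
mcEstimate(f, 2, 10000)    # about 0.558 for this integrand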

Let us re-express the integral in the form of an expectation value:

I'' = I/V_R = \int_R f(x) p(x) \, d^k x,    (2.7)
     = \langle f \rangle,    (2.8)

where R is the desired region of integration, V_R = \int_R d^k x, and p(x) is a uniform sampling distribution over x ∈ R. Our approximation is thus obtained by sampling N times from p, obtaining x^{(1)}, \ldots, x^{(N)}, and forming the sample average

I'_N = \frac{1}{N} \sum_{i=1}^{N} f(x^{(i)}).    (2.9)

Our estimate is unbiased:

I'' = \langle I'_N \rangle,    (2.10)

and consistent:

I'' = \lim_{N→∞} I'_N.    (2.11)

The variance of our estimator is

\mathrm{Var}(I'_N) = \frac{1}{N} \mathrm{Var}(f).    (2.12)

It is interesting to notice that the equal-spacing estimate, I_N, is typically biased, although it is consistent.

Often, we are interested in more than the evaluation of a simple integral. There may be several integrals of interest, and we may not even know at the outset which integrals will be of greatest interest. This leads to the “simulation” aspect of the Monte Carlo method. To be more concrete, let us think in the context of an “experiment”. We regard an experiment as a sampling of variables distributed according to some differential equations (or possibly discrete equations). These differential equations describe our sampling probability density function. We may model our experiment with a set of differential equations, and numerically sample from the corresponding PDFs. To get some idea of how things behave according to our model, we may sample, or simulate, an “experiment” from the model many times. The problem we must address is how to generate numbers as if sampled from some specified PDF.


2.1 Getting a “Random Number”

In order to carry out the simulation, we need a source of “random numbers” distributed according to the desired probability distribution. We shall see that if we have random numbers generated according to some distribution, we may perform a transformation to generate the desired distribution. We usually don’t have to worry too much about the generation of some set of random numbers, as excellent methods are already available, and we can concentrate on achieving the desired transformation. Thus, this section will be short, but it is useful to understand the ideas, because there are pitfalls lurking.

The generation of random numbers is a familiar sight in our everyday world. Every referee who tosses a coin is generating a (one-bit) random number. The randomness is ensured by the enormous sensitivity of the end result to the initial conditions: there is no practical means of predicting the outcome, and the outcomes of subsequent tosses are completely independent. Many games rely on this feature. A familiar random number generator is the cage and balls system used in the lottery, again justified by the impracticality of predicting the outcome and the independence of successive twirls (assuming balls are replaced).

There is a vast literature on the generation of uniformly-distributed (pseudo)-random numbers. In computer simulations, we nearly always make use of pseudo-random sequences, which are deterministic, yet have many properties as if they were truly random.

To get the flavor, here is a very simple such generator, written in R (the “%%” operator takes the modulus of the first argument with respect to the second argument). It is intended to be called multiple times with a fixed choice for p and with an initial value for k on the first call, with each succeeding call made with the previous value for k. Depending on the parameters f, p, and “seed” k, it may repeat after only a few numbers or after many.

simpRand <- function(k, p) {
  # Multiplicative congruential generator: three multiplications
  # by f = 5, modulo 2^p; the returned k is also the next seed.
  twotop = 2^p
  f = 5
  k = (f*k) %% twotop
  k = (f*k) %% twotop
  k = (f*k) %% twotop
  return(k)
}

Figure 2.1 shows the result of generating pairs of numbers with this code. The plot on the left is for k = 117, a prime number, while the plot on the right is for k = 100. Evidently k = 100 is a dangerous choice, as it produces some pairs with highly correlated values.
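For example, the pairs in Figure 2.1 can be reproduced along these lines (a sketch; scaling by 2^p to map the integers into (0, 1) is my assumption about how the plot was made):

p <- 16
k <- 117                    # try k = 100 to see the correlated pairs
x <- y <- numeric(1000)
for (i in 1:1000) {
  k <- simpRand(k, p); x[i] <- k/2^p
  k <- simpRand(k, p); y[i] <- k/2^p
}
plot(x, y)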

2.2 An Example

It is helpful to develop the ideas with a specific more-or-less realistic example. Thus, suppose we are interested in computing the efficiency for detecting K^0_S mesons (“kaons”) produced in e+e− collisions.


[Figure 2.1: An illustration of the simple random number generator, showing 1000 successive pairs of numbers generated for parameters p = 16, f = 5. Left: k = 117; Right: k = 100.]

[Figure 2.2: Schematic of the K^0_S production and decay.]

The K^0_S isn’t detected directly, but through its decay to two pions, K^0_S → π+π−. Figure 2.2 illustrates the idea. The trajectories of the two pions are measured in some tracking device, such as a drift chamber, and the kaons are reconstructed by combining these measurements.

To compute the efficiency, we need to know something about the detector, and something about the production of the kaons. We’ll make some very simple assumptions here for illustration: Assume that the tracking is 100% efficient for pions in the central 90% of the solid angle at a transverse (to the e+e− axis) radius of 50 cm, and 0% efficient outside this region. Assume that a selection cut is placed on the kaon decay vertex such that only those decays occurring between 0.1 and 10 cm from the center of the detector (i.e., from the e+e− nominal collision point) are accepted.

For the production model, suppose that the kaons are produced with a 1 + cos²θ angular distribution and a momentum distribution that is uniform from 0 to 0.1 GeV, and then falls as p^{-n}, with, say, n = 2, up to a maximum momentum of 10 GeV [we work in c = 1 units for convenience]. Approximate the kaon lifetime as τ = 2.7 cm in its rest frame. The kaon is a spin-zero particle, so the decay to two pions must be isotropic in the kaon rest frame.

The three-momentum of a charged particle such as a pion is obtained by arranging that the tracking device resides in a uniform magnetic field. If the magnetic field is in the z direction, then the trajectory has a curvature in the x−y plane related to the momentum by

p \cos λ = 0.3 Q B ρ,    (2.13)

where p is the magnitude of the momentum in GeV, λ ≡ θ − π/2 is the “dip” angle giving the signed angle out of the x−y plane, Q is the particle’s charge in proton charge units, B is the magnetic field in Tesla, and ρ is the radius of curvature of the trajectory in meters. For typical arrangements a reasonable approximation is that the measurement of a track is normally and independently distributed in curvature k = 1/ρ, azimuth angle, and tangent of the dip angle.
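For example (an illustrative number, not from the text): a p = 1 GeV track with λ = 0 and Q = 1 in a B = 1.5 T field has radius of curvature ρ = 1/(0.3 × 1.5) ≈ 2.2 m.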

Using the independence of several of the quantities, we can break our computation into several steps:

1. Generate an “event”:

(a) Generate the angle and momentum of the kaon. Usually, we would need to generate the azimuth angle as well as the polar angle, but in this case everything is azimuthally symmetric, so we needn’t do this. Thus, we want to sample from the PDF:

f_K(p, \cos θ) = A f_p(p)(1 + \cos^2 θ), \quad 0 ≤ p ≤ 10, \; |\cos θ| ≤ 1,    (2.14)

where

f_p(p) = \begin{cases} 1 & 0 ≤ p ≤ 0.1 \text{ GeV}, \\ (0.1/p)^2 & 0.1 ≤ p ≤ 10 \text{ GeV}, \\ 0 & p > 10 \text{ GeV}, \end{cases}    (2.15)

and A is determined by

\int_0^{10} dp \int_{-1}^{1} d\cos θ \, f_K(p, \cos θ) = 1.    (2.16)

The kaon three-momentum is then p_K = (p \sin θ, 0, p \cos θ), taking the azimuth angle to be φ = 0.

(b) Generate the decay K → π+π− in the kaon rest frame (isotropic distribution), then Lorentz transform the result to the lab frame. Here, we sample from the uniform angular distribution:

f_+(\cos θ^*_+, φ^*_+) = \frac{1}{4π}, \quad |\cos θ^*_+| ≤ 1, \; 0 ≤ φ^*_+ < 2π,    (2.17)

where the * superscript denotes the kaon rest frame, and the + subscript indicates that these are taken to be the angles for the π+.


Since we are in the π+π− rest frame, the angles for the π− are then constrained to be θ^*_- = π − θ^*_+ and φ^*_- = π + φ^*_+. Knowing the angles, we compute the pion four-vectors in the kaon rest frame according to:

E^*_± = m_K/2,    (2.18)
p^*_{x±} = p^*_± \sin θ^*_± \cos φ^*_±,    (2.19)
p^*_{y±} = p^*_± \sin θ^*_± \sin φ^*_±,    (2.20)
p^*_{z±} = p^*_± \cos θ^*_±,    (2.21)

where p^*_± = \sqrt{E^{*2}_± − m_π^2}, m_π = 139.6 MeV is the π± mass, and m_K = 497.6 MeV is the K^0 mass. The Lorentz transformation to the lab frame is then:

\begin{pmatrix} E_± \\ p_{x±} \\ p_{y±} \\ p_{z±} \end{pmatrix} = \gamma \begin{pmatrix} 1 & -v_x & -v_y & -v_z \\ -v_x & \frac{1}{\gamma} + \gamma_r v_x^2 & \gamma_r v_x v_y & \gamma_r v_x v_z \\ -v_y & \gamma_r v_y v_x & \frac{1}{\gamma} + \gamma_r v_y^2 & \gamma_r v_y v_z \\ -v_z & \gamma_r v_z v_x & \gamma_r v_z v_y & \frac{1}{\gamma} + \gamma_r v_z^2 \end{pmatrix} \begin{pmatrix} E^*_± \\ p^*_{x±} \\ p^*_{y±} \\ p^*_{z±} \end{pmatrix},    (2.22)

where v = −p_K/E_K, \gamma = 1/\sqrt{1 − v^2}, and \gamma_r = \gamma/(1 + \gamma).

(c) Generate the decay length of the kaon, according to its momentum and the exponential decay law. In this step we sample from the exponential distribution:

f_τ(\ell) = \frac{1}{μ} e^{-\ell/μ},    (2.23)

where μ = (p/m_K) τ.

2. Generate the “detector response” to the event:

(a) Reject the event if the vertex is not in the fiducial volume. That is, the decay length should be between 0.1 and 10 cm.

(b) Reject the event if either pion trajectory (“track”) is outside the fiducial volume. Here, we’ll take this to mean that both the π+ and the π− should have |cos θ| < 0.9, where θ is the polar angle with respect to the beam line. [Note that I am defining θ here to be the polar angle of the momentum vector; in practice we would be more careful to think about the actual trajectory of the pions.] In principle, the generated angle rather than the measured angle should be used here, but it doesn’t matter too much in this case, so you may use either.

(c) Generate the measured momenta and directions according to the assumed resolution functions. That is, we generate directions and curvatures according to:

f_±(k_{m±}, λ_{m±}, φ_{m±}) = \frac{1}{(2π)^{3/2} σ_k σ_λ σ_φ} \exp\left\{ -\frac{1}{2}\left[ \left(\frac{k_{m±} − k_±}{σ_k}\right)^2 + \left(\frac{λ_{m±} − λ_±}{σ_λ}\right)^2 + \left(\frac{φ_{m±} − φ_±}{σ_φ}\right)^2 \right] \right\},    (2.24)

where the m subscripts refer to the results of the “measurements”, and the σ quantities are the standard deviations for the measurements indicated by the subscripts.

3. Perform the event “selection”. For example, reject the event if the invariant mass of the two simulated pions is “inconsistent” with the kaon mass. Here, we’ll take our criterion to be: the invariant mass of the two-pion system must be within 0.01 GeV of the kaon mass. Note that the invariant mass, m, is computed according to (quantities are as measured in the detector):

m = \sqrt{E^2 − p_x^2 − p_y^2 − p_z^2},

where the energy and momentum components are the sums of the π± values (all in the lab frame):

E = E_+ + E_-,
p_x = p_{x+} + p_{x-},
p_y = p_{y+} + p_{y-},
p_z = p_{z+} + p_{z-}.

We repeat this procedure many times, generating and analyzing many events. The detection efficiency is the ratio of the number of events which survive the selections to the number generated. A simplified sketch of such a simulation follows.
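Here is a simplified R sketch of steps 1 and 2(a,b) above; the measurement smearing of step 2(c) and the mass cut of step 3 are omitted, since the resolution parameters σ_k, σ_λ, σ_φ are not specified here. The sampling techniques used (inverse transform, composition) are developed in the sections that follow. Units are GeV and cm, with c = 1.

mK  <- 0.4976   # K0 mass (GeV)
mpi <- 0.1396   # charged pion mass (GeV)
tau <- 2.7      # kaon lifetime expressed as a length (cm), as in the text

# Boost (E, q) from the kaon rest frame to the lab; equivalent to eq. (2.22),
# here written with v = +pK/EK, the kaon's lab velocity.
boost <- function(E, q, v) {
  g  <- 1/sqrt(1 - sum(v^2))
  gr <- g/(1 + g)                     # the gamma_r of eq. (2.22)
  vq <- sum(v*q)
  list(E = g*(E + vq), q = q + g*(gr*vq + E)*v)
}

simulateKaon <- function() {
  # (1a) Kaon momentum from eq. (2.15): the flat piece has weight 0.1,
  # the (0.1/p)^2 tail has weight 0.099 (total 0.199).
  p <- if (runif(1) < 0.1/0.199) 0.1*runif(1) else 1/(10 - 9.9*runif(1))
  # (1a) Polar angle from 1 + cos^2(theta), by composition (Section 2.4).
  u1 <- runif(1); u2 <- runif(1)
  cth <- if (u1 < 0.75) 2*u2 - 1 else sign(2*u2 - 1)*abs(2*u2 - 1)^(1/3)
  pK <- p*c(sqrt(1 - cth^2), 0, cth)
  EK <- sqrt(p^2 + mK^2)
  # (1b) Isotropic two-body decay in the kaon rest frame, eqs. (2.17)-(2.21).
  cs <- 2*runif(1) - 1; ph <- 2*pi*runif(1)
  ps <- sqrt((mK/2)^2 - mpi^2)
  pstar <- ps*c(sqrt(1 - cs^2)*cos(ph), sqrt(1 - cs^2)*sin(ph), cs)
  pip <- boost(mK/2,  pstar, pK/EK)   # pi+ in the lab
  pim <- boost(mK/2, -pstar, pK/EK)   # pi- in the lab
  # (1c) Decay length, exponential with mean mu = (p/mK)*tau, eq. (2.23).
  len <- -(p/mK)*tau*log(runif(1))
  # (2a) Vertex cut; (2b) both pions within |cos(theta)| < 0.9.
  len > 0.1 && len < 10 &&
    abs(pip$q[3])/sqrt(sum(pip$q^2)) < 0.9 &&
    abs(pim$q[3])/sqrt(sum(pim$q^2)) < 0.9
}

# Efficiency estimate:
mean(replicate(100000, simulateKaon()))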

In the following sections we’ll investigate several methods that may be used to carry out this calculation as well as more complicated problems.

2.3 Inverse Transform Method

Consider a one-dimensional problem in which we wish to simulate a process with PDF f(x), x ∈ (a, b) (assume a < b). We are given a uniform pseudo-random number generator:

p(u) = \begin{cases} 1 & u ∈ (0, 1), \\ 0 & \text{otherwise}. \end{cases}    (2.25)

We must find a transformation from random variable U to X such that X is drawn from f(x). Suppose X = T(U) is that transformation. We must have:

f(x) \, dx = f[T(u)] \left| \frac{dx}{du} \right| du = p(u) \, du.    (2.26)

Therefore,

f[T(u)] \left| \frac{dx}{du} \right| = p(u),    (2.27)


[Figure 2.3: Illustration for the inverse transform method: u = F(x), x = F^{-1}(u).]

or, with x = T(u),

f[T(u)] \, |dT(u)| = p(u) \, du.    (2.28)

Choosing endpoint T(0) = a, dT/du ≥ 0, and integrating,

\int_{T(0)}^{T(u)} f[T(u')] \, dT(u') = \int_0^u p(u') \, du' = u.    (2.29)

Let

F(x) = \int_a^x f(x') \, dx'    (2.30)

be the desired cumulative sampling distribution. Then F [T (u)] = u, or

x = T(u) = F^{-1}(u).    (2.31)

That is, x will have the desired distribution if we solve for x in

\int_a^x f(x') \, dx' = u.    (2.32)

This is sensible intuitively: we sample most heavily those regions where F(x) is changing rapidly, that is, where f(x) is largest; see Figure 2.3. Because x = F^{-1}(u), we call this the “inverse transform method”.
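As a quick illustration (the exponential PDF is my choice, not the text’s): for f(x) = e^{−x} on (0, ∞), F(x) = 1 − e^{−x}, so solving F(x) = u gives x = −ln(1 − u).

sampleExp <- function(n) -log(1 - runif(n))   # inverse transform for exp(-x)
hist(sampleExp(10000), breaks = 50)           # should follow exp(-x)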


2.3.1 Example

Suppose we wish to generate random numbers according to N(0, 1), using our source of uniform random numbers, u, on (0, 1). Our prescription says to solve for x in:

u = \int_{-\infty}^{x} \frac{1}{\sqrt{2π}} e^{-x'^2/2} \, dx'.    (2.33)

This looks hard, but there is a trick: consider a two-dimensional distribution, and transform to polar coordinates:

p(x, y) \, dx \, dy = f(x) f(y) \, dx \, dy    (2.34)
= r e^{-r^2/2} \, dr \, \frac{dφ}{2π}    (2.35)
= g(r) h(φ) \, dr \, dφ.    (2.36)

Now generate r and φ according to g(r) and h(φ), and transform the result to x and y, obtaining two normally-distributed random numbers.

That is, sample two uniform random numbers u and v on (0, 1) and let

u = \int_r^{\infty} r' e^{-r'^2/2} \, dr' = e^{-r^2/2},

where we have used for convenience

u = 1 − \int_0^x f(x') \, dx'.    (2.37)

Thus, r = \sqrt{-2 \ln u}. Similarly, we have φ = 2πv. Then x = r \cos φ and y = r \sin φ are each independent samplings from the desired N(0, 1) distribution.
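A minimal R sketch of this prescription (the qqnorm check is merely illustrative):

boxMuller <- function(n) {
  u <- runif(n); v <- runif(n)
  r <- sqrt(-2*log(u))        # from u = exp(-r^2/2)
  phi <- 2*pi*v
  c(r*cos(phi), r*sin(phi))   # 2n independent N(0,1) samples
}
qqnorm(boxMuller(5000))       # points should lie on a straight line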

2.3.2 Discrete Distribution

The inverse transform method may also be applied to discrete distributions. Let p(n), n = 0, 1, 2, \ldots be a discrete probability distribution. We may sample from this distribution, given a uniform sampling u on (0, 1), by solving for i in:

\sum_{n=0}^{i-1} p(n) < u ≤ \sum_{n=0}^{i} p(n).    (2.38)

As with continuous distributions, there are tricks; see, for example, Exercise 2.5.
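A generic R sketch of equation (2.38), for a distribution with finitely many terms (the Poisson usage, truncated at n = 30, is merely illustrative):

sampleDiscrete <- function(prob, nsamp) {
  cdf <- cumsum(prob)                  # cumulative sums of eq. (2.38)
  findInterval(runif(nsamp), cdf) + 1  # first index whose cumulative probability exceeds u
}
n <- sampleDiscrete(dpois(0:30, 5), 1000) - 1   # approximately Poisson, mean 5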


[Figure 2.4: Illustration of the decomposition of a PDF into the weighted sum of two PDFs, f(x) = r f_a(x) + (1 − r) f_b(x).]

2.4 Composition Method

If the distribution we wish to generate is in the form of a sum of terms, it may be easier, or save computation time, to break it up into pieces.

For example, suppose we can decompose the desired PDF f(x) as:

f(x) = r f_a(x) + (1 − r) f_b(x),

where f_a and f_b are themselves normalized PDFs (0 ≤ r ≤ 1). Figure 2.4 provides an illustration of such a decomposition.

Then we can generate a sampling from f by:

1. Generate two uniform randoms u1 and u2.

2. If u1 < r, let x = F_a^{-1}(u2). If u1 ≥ r, let x = F_b^{-1}(u2).

Let’s try an example. Suppose we wish to generate the 1 + cos²θ angular distribution in our kaon detection efficiency problem, where 0 ≤ θ ≤ π.

We may break this up into the sum of two distributions:

f(x = \cos θ) = \frac{3}{4} \cdot \frac{1}{2} + \frac{1}{4} \cdot \frac{3}{2} x^2,

where we have inserted the appropriate normalizations. The resulting algorithm is:

1. Generate uniform randoms u1 and u2.

2. If u1 < 3/4, let x = 2u2 − 1. Otherwise, let x = (2u2 − 1)^{1/3}.

Figure 2.5 illustrates this decomposition.
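A sketch of this algorithm in R (vectorized; the sign is handled explicitly because R’s ^ operator returns NaN for negative bases):

sampleCosTheta <- function(n) {
  u1 <- runif(n); u2 <- runif(n)
  x <- 2*u2 - 1                        # uniform component, probability 3/4
  q <- u1 >= 0.75                      # quadratic component, probability 1/4
  x[q] <- sign(x[q])*abs(x[q])^(1/3)   # i.e., x = (2*u2 - 1)^(1/3)
  x
}
hist(sampleCosTheta(10000), breaks = 50)   # compare with Figure 2.5, right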


[Figure 2.5: Example of the composition method. A total of 10,000 samples have been generated. Left: the uniform component; Middle: the quadratic component; Right: the full distribution.]

2.5 Acceptance-Rejection Method

It may be that computing the inverse transform, even on a decomposed distribution, is intractable. A powerful (but potentially inefficient) method is the “acceptance-rejection method”. This method can be used even if we can’t readily compute the normalization of f(x). That is, we may only know the functional form Af(x), where A is unknown. Here, the algorithm is:

1. Find numbers a < b such that f(x) = 0 whenever x ∉ (a, b). (Best to find the largest such a and smallest such b.)

2. Find the maximum of Af(x), or at least some number greater than the maximum, and define a number c such that:

\frac{c}{b − a} ≥ Af(x), \quad ∀x.

3. Sample an x uniformly on (a, b).

4. Sample a y uniformly on (0, c/(b − a)).

Note that the (x, y) pairs so generated populate a two-dimensional box uniformly.


[Figure 2.6: Illustration of the acceptance-rejection method: points (x, y) uniform in the box of height c/(b − a) over (a, b) are accepted if they fall under the curve Af(x).]

5. Evaluate Af(x).

(a) If y ≤ Af(x), accept x.

(b) If y > Af(x), reject x, go back to step 3.

Notice why this works: From the set of points (x, y) uniformly distributed over the box, we accept only those which lie under the Af(x) curve, as illustrated in Fig. 2.6.

The fraction of trial points accepted is:

ε = \frac{\text{Area under } Af}{\text{Area of box}}    (2.39)

  = \frac{A}{c}.    (2.40)

While this method is very easy to apply, it may be very inefficient, as in Fig. 2.7, even if we work hard to make the box as small as possible. It also doesn’t work for distributions with unbounded support.
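A sketch of the algorithm in R, for the unnormalized example Af(x) = 1 + x² on (−1, 1) (my choice of example; c/(b − a) = 2 bounds Af everywhere):

sampleAR <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    x <- runif(n, -1, 1)            # step 3: x uniform on (a, b)
    y <- runif(n, 0, 2)             # step 4: y uniform on (0, c/(b-a))
    out <- c(out, x[y <= 1 + x^2])  # step 5: accept points under Af(x)
  }
  out[1:n]
}

For this example the acceptance fraction is A/c = (8/3)/4 = 2/3.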

2.6 Importance Sampling

At the cost of complexity, we may mitigate the efficiency problem in acceptance-rejection sampling by modifying the simple box distribution to one which more nearly approximates the desired distribution. Of course, whatever shape we approximate the distribution with should be tractable in terms of sampling. The procedure is:

1. Given the problem of sampling from f(x), we find some suitable PDF h(x) such that ch(x) ≥ Af(x) for all x (c > 1 if A = 1).


[Figure 2.7: The acceptance-rejection method may be very inefficient.]

2. Choose an x according to h(x), and a y uniform on (0, ch(x)).

3. If y > Af(x), reject x and try again; otherwise, accept x.

This is the method of “Importance Sampling”. Figure 2.8 illustrates the idea, where a Cauchy distribution is used to approximate a normal distribution. Note that this method may be applied even to distributions with unbounded support, as in this example.
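A sketch of this procedure in R for the normal-bounded-by-Cauchy example (c = 2.1 is an assumed bound for illustration; any c with ch(x) ≥ Af(x) everywhere works):

sampleNormalIS <- function(n, c = 2.1) {
  out <- numeric(0)
  while (length(out) < n) {
    x <- tan(pi*(runif(n) - 0.5))    # x ~ Cauchy, by inverse transform
    y <- runif(n)*c/(pi*(1 + x^2))   # y uniform on (0, c*h(x))
    out <- c(out, x[y <= dnorm(x)])  # accept if under f(x) = N(0,1)
  }
  out[1:n]
}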

The method of importance sampling is embodied in the theorem:

Theorem 2.1 (Von Neumann) Represent PDF f(x) as:

f(x) = Cg(x)h(x),

where C ≥ 1 is a constant, h(x) is a PDF (chosen for convenience of generation of random numbers according to h(x)), and g(x) is the “correction function” relating Ch(x) and f(x), with the constraint Ch(x) ≥ f(x).

Sample a value y uniformly on (0, 1), and an x according to h(x). Then the distribution of x, under the requirement y ≤ g(x), is just f(x). The “efficiency” (fraction of trials accepted) is 1/C.

The proof is left as an exercise, but the essential idea should already be clear from our discussion above.

The distribution of n, the number of failures before a successful trial, is simply (letting ε be the efficiency):

p(n) = (1 − ε)^n ε, \quad n = 0, 1, 2, \ldots.

This is the geometric distribution, shown in Fig. 2.9 for different values of ε.
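Incidentally, R’s built-in rgeom() uses this same convention (number of failures before the first success), so the distribution is easy to check empirically:

barplot(table(rgeom(10000, prob = 0.5))/10000)   # compare with (1 − ε)^n ε at ε = 0.5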

2.7 Monte Carlo Simulation: Summary

We have described four basic techniques of MC simulation:


[Figure 2.8: Example of importance sampling: a normal distribution (A = 1) bounded by a scaled Cauchy distribution (c = 2.1).]

[Figure 2.9: The geometric distribution p(n), for ε = 0.1, 0.5, 0.9.]


1. Inverse transform method.

2. Composition method.

3. Acceptance-rejection.

4. Importance sampling.

All are important tools in the simulator’s bag of tricks.

2.8 Exercises

Exercise 2.1 Complete the example of the inverse transform method for the normal distribution.

Exercise 2.2 Apply the inverse transform method to the “linear” pdf:

f(x) = |x|, (2.41)

for x ∈ (−1, 1).

Exercise 2.3 Apply the inverse transform method to the Cauchy (Breit-Wigner) distribution.

Exercise 2.4 Write a routine to generate Poisson-distributed numbers according to the inverse transform method. Generate 1000 samples for a Poisson with a mean of five, and plot the sample frequency distribution.

Exercise 2.5 Make sense of the following C-code fragment as a Poisson number generator:

int poisson(double mean) {
  /* Assume rand() is a source of uniform */
  /* random numbers on (0,1] */
  double sum, rand();
  int n;
  for(n=0, sum=-log(rand());
      sum<mean;                 /* terminates for loop when false */
      sum-=log(rand()), n++) {}
  return(n);
}

Exercise 2.6 In section 2.4 we showed how to generate a 1 + cos²θ distribution with the composition method. Do this angular distribution by the (brute force) inverse transform method, and contrast your result with the composition method.

Exercise 2.7 Prove Von Neumann’s theorem. Hint: You want to show that P(x | y ≤ g(x)) = f(x). Use Bayes’ theorem.


Exercise 2.8 Recall the geometric series:

\sum_{n=0}^{\infty} ε^n = \frac{1}{1 − ε},

and check that, for the geometric distribution:

\sum_{n=0}^{\infty} p(n) = 1.

Further, show that:

\sum_{n=0}^{\infty} n \, p(n) = \frac{1}{ε} − 1.

Thus, the mean number of failures before an accepted trial is just C − 1, with C defined as in Von Neumann’s theorem.