Introduction to Molecular Simulation, Chapter 4: Probability Theory
CHM 4390c/8309f,g, The University of Ottawa
© Saman Alavi
Probability Theory in Molecular Simulations, Extended Lecture Notes 4
4. Probability Theory
4.0.1 Deterministic and stochastic processes
The laws of physics and chemistry allow the behavior of some systems to be predicted
with high accuracy. For example, if the initial position and velocity of a satellite are
known, the laws of mechanics allow the prediction of positions and velocities at all times
in the future. For such cases, the laws of classical mechanics are said to be deterministic.
In stochastic (Greek: stochastikos “skillful in aiming”) systems, future states can at
best be predicted with limited accuracy. Different factors cause a system to become
stochastic. In principle, the laws governing the behavior of the system may be known, but
uncertainty in the initial conditions may prevent the prediction of its future state with
sufficient accuracy. This is the case in games of chance that use dice or roulette
wheels. If we knew the mass and shape of the dice, the initial velocity with which they are
thrown, the air temperature and flow at the time of the throw, the hardness of the surface on
which the dice bounce, etc., we would in principle be able to predict the faces of the dice that come
up. The difficulty of this analysis and the sensitive dependence of the outcome of a throw
on slight uncontrollable variations in the conditions of the experiment usually prevent us
from predicting the outcome of a throw with certainty.
Quantum mechanical systems, on the other hand, are inherently stochastic in
predicting mechanical variables. Heisenberg’s uncertainty principle states that there is an
intrinsic limit to the accuracy with which initial mechanical variables can be determined
for atomic-scale particles. As a result, future states, as characterized by the mechanical
variables position and momentum, are in principle unknowable (or undefined) with
complete certainty. Quantum mechanics is deterministic in predicting the wave function
of a system at a future time from knowledge of its present wave function;
however, that determinism does not translate into determinism of the mechanical variables.
Concepts of probability theory are used to describe the extent to which predictions
can be made about the behavior of stochastic systems. Assigning probabilities to the possible
outcomes of an event, through knowledge of the probability distribution, is the best that
can be done in these cases.
Probabilities can describe the outcomes of discrete (countable) events like the flip
of a coin, the roll of dice, the spin of a roulette wheel, or the spin state of an electron. The
“event space” in these cases is a limited set of possible outcomes. Probabilities can also
be used to describe the outcome of events associated with continuous variables. For
example, if we are predicting the speed of a particular molecule in a gas sample or the
height of a person on a college campus, the event space varies continuously.
4.0.2 Probability theory in statistical mechanics
Statistical mechanics provides another context in which probability theory is used.
Statistical mechanics gives probability distributions which relate averages of microscopic
variables to macroscopic observables, taking into account the type of interactions of the
system and the surroundings. These probability distributions are derived to give results
that are consistent with the macroscopic laws of thermodynamics.
From a macroscopic point of view, we often do not need the level of detailed
knowledge of a system that a full microscopic deterministic description provides. For
example, a classical molecular dynamics simulation can determine the positions and
velocities of molecules at all times in the future from knowledge of their present state.
However, this knowledge is too detailed and often we summarize this information as
probability distributions for velocities or spatial distributions of molecules as a way of
capturing system-wide properties. In molecular dynamics, the calculated probability
distributions of mechanical variables must be consistent with the properties of the
probability distribution required by statistical mechanics. As we shall see in future
chapters, this provides guidance on how to couple the results of mechanical variables in
molecular dynamics simulations to environmental variables.
After a short introduction to the concepts of probability theory and the notation used,
the connection between averages over probability distributions and macroscopic quantities
will be demonstrated. We will see how well-defined macroscopic variables arise from
microscopic descriptions of systems with a sufficiently large number of molecules.
4.1. Single variable probability distributions
4.1.1. Discrete stochastic variables
Stochastic processes that have a finite “event space” are described by possible
outcomes, ε(1), ε(2), …, ε(ν), and their associated discrete probabilities p(1), p(2), …,
p(ν). The event space and associated probabilities constitute the probability distribution,

    ε(1)  ε(2)  ε(3)  …  ε(ν)
    p(1)  p(2)  p(3)  …  p(ν) .   (4.1)
The numerical values chosen to represent each event ε(i) and the probabilities assigned to
each event depend on intrinsic features of the system. The values of individual
probabilities are zero or positive numbers and are often scaled such that the probability
distribution is normalized, i.e., the sum of all probabilities is one,

    Σ_{i=1}^{ν} p(i) = 1 .   (4.2)
Discrete probability distributions are encountered in games of chance, but many
natural phenomena also have discrete event spaces. Examples of discrete probability
distributions are:
In flipping a coin, the two possible events are scoring heads or tails. The
numerical value assigned for a heads toss is ε(1) = 1 and for a tails toss is ε(2) = –1. If
the coin is fair, we assign p(1) = p(2) = ½.
In the roll of a die, there are six possible events. The roll of a 1 is assigned ε(1) =
1, the roll of a 2 is assigned ε(2) = 2, and so on up to ε(6) = 6. For a fair die, we assign
p(1) = p(2) = … = p(6) = 1/6.
A one-dimensional random walk, representing diffusion of a particle in a medium, is
composed of individual steps of the particle to the left, ε(1) = –1, or to the
right, ε(2) = +1. For an unbiased random walk, p(1) = p(2) = ½. In a biased walk
(for example, the diffusion of an ion in an electric field in electrophoresis), the
probabilities of motion to the left and right are not identical and p(1) ≠ p(2). A
random walk with n steps is described by the binomial distribution.
In bingo, for the first draw, drawing a 1 is represented by ε(1) = 1, drawing a 2
by ε(2) = 2, …, and drawing a 75 by ε(75) = 75. If all the bingo
balls are identical, p(1) = p(2) = … = p(75) = 1/75. The probabilities for drawing
subsequent numbers change: after drawing the first ball, the event space
becomes smaller and has 74 members.
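The random-walk example above is easy to simulate; the sketch below (the step count, bias value, and seeds are arbitrary illustrative choices, not taken from the text) compares the mean displacement of unbiased and biased walks:

```python
import random

def random_walk(n_steps, p_right=0.5, seed=None):
    """Net displacement after n_steps steps of +1 (probability p_right) or -1."""
    rng = random.Random(seed)
    return sum(1 if rng.random() < p_right else -1 for _ in range(n_steps))

# Unbiased walk: mean displacement ~ 0.
unbiased = [random_walk(100, p_right=0.5, seed=i) for i in range(2000)]
mean_unbiased = sum(unbiased) / len(unbiased)

# Biased walk (e.g. an ion drifting in a field): mean ~ n*(2p - 1) = 20.
biased = [random_walk(100, p_right=0.6, seed=i) for i in range(2000)]
mean_biased = sum(biased) / len(biased)
```

Averaged over many walks, the unbiased mean displacement fluctuates around zero, while the bias shifts the mean to n(2p − 1).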
Probability distributions are characterized by a number of mathematical properties.
The probability distribution gives the average (mean, expectation value) associated with
observations of an event,

    ⟨ε⟩ = [ε(1)p(1) + ε(2)p(2) + … + ε(ν)p(ν)] / [p(1) + p(2) + … + p(ν)] .   (4.3)
For example, the average or expectation value for the roll of a die is

    ⟨ε⟩ = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 3.5 .

Note that the average value ⟨ε⟩ of a
probability distribution may not correspond to an actual outcome of any individual event
ε(i). Calculating an expectation value is analogous to determining the center of mass of a
collection of objects located at r_i and with masses m_i,

    r_cm = (m_1 r_1 + m_2 r_2 + …) / (m_1 + m_2 + …) ,   (4.4)

where the probabilities play the role of the statistical weights of the ε(i) in the distribution.
An average can be calculated for any mathematical function of the
event space, f[ε(i)],

    ⟨f(ε)⟩ = [f(ε(1))p(1) + f(ε(2))p(2) + … + f(ε(ν))p(ν)] / [p(1) + p(2) + … + p(ν)] .   (4.5)

For a normalized distribution, Eq. (4.5) is written more compactly as
⟨f(ε)⟩ = Σ_{i=1}^{ν} f[ε(i)] p(i). If f[ε(i)] = ε(i)^m, the resulting average is called the m-th moment
of the distribution,

    ⟨ε^m⟩ = Σ_{i=1}^{ν} ε(i)^m p(i) .   (4.6)
For example, with m = 2 for a die we have ⟨ε²⟩ = 91/6 ≈ 15.167. The m-th central moment of a
distribution is the average of the m-th power of the difference of each value from the
average,

    ⟨(ε − ⟨ε⟩)^m⟩ = Σ_{i=1}^{ν} [ε(i) − ⟨ε⟩]^m p(i) .   (4.7)
By definition of the average, the first central moment of any distribution is zero. The
second central moment is called the variance, the square root of which is the standard
deviation, usually shown as σ. The standard deviation is a measure of how much, on
average, an individual measurement will differ from the mean of the distribution. For example,
for a die, ⟨(ε − ⟨ε⟩)²⟩ = 2.9167 and σ = 1.7078, which means that the outcome of a
roll of a die will differ, on average, by about 1.7 from the expectation value of 3.5. The standard deviation
is a measure of the width or spread of a probability distribution.
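The die averages quoted above can be verified directly from Eqs. (4.3), (4.6), and (4.7); a minimal sketch:

```python
# Moments of the fair-die distribution: events 1..6, each with p = 1/6.
events = [1, 2, 3, 4, 5, 6]
p = [1 / 6] * 6

mean = sum(e * w for e, w in zip(events, p))                  # Eq. (4.3): 3.5
second_moment = sum(e**2 * w for e, w in zip(events, p))      # Eq. (4.6): 91/6 ~ 15.167
variance = sum((e - mean)**2 * w for e, w in zip(events, p))  # Eq. (4.7), m = 2: 35/12 ~ 2.9167
std_dev = variance ** 0.5                                     # ~ 1.7078
```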
4.1.2. Continuous stochastic variables
Some stochastic processes have a continuous range of outcomes (events)
represented by the variable x. In these cases, the events are characterized by a
continuous probability distribution p(x). The continuous variable x can have
values within a range [x_min, x_max] which can be infinite, (–∞, +∞), semi-infinite, such as
[0, +∞), or bounded, such as [0, 1].
The event x is represented as a point on the real number line between the lower and
upper limits. For continuous variables, there are an infinite number of x values in a range
[xmin,xmax] and instead of determining probabilities associated with observing the exact
value of x, the probability of observing the variable over a narrow range between x and
x+dx, shown as p(x)dx is used. The probability of obtaining the exact value of x is
vanishingly small which is why a small range dx around each value of x is considered to
determine a finite probability. Similar to discrete probabilities, the probabilities
associated with continuous variables must also be zero or positive over the range of x.
The probabilities for continuous variables are often scaled to give a normalized
distribution, meaning that the probability distribution integrated over the range of the variable
equals 1,

    ∫_{x_min}^{x_max} p(x) dx = 1 .   (4.8)
The average or mean of a function of the variable x, f(x), is defined over the same
range as,

    ⟨f(x)⟩ = ∫_{x_min}^{x_max} f(x) p(x) dx .   (4.9)
The simplest case is the average value of the variable x itself,

    ⟨x⟩ = ∫_{x_min}^{x_max} x p(x) dx .   (4.10)
If f(x) = x^m, the resulting average is called the m-th moment of the distribution,

    ⟨x^m⟩ = ∫_{x_min}^{x_max} x^m p(x) dx .   (4.11)
The m-th central moment of a distribution is defined as,

    ⟨(x − ⟨x⟩)^m⟩ = ∫_{x_min}^{x_max} (x − ⟨x⟩)^m p(x) dx .   (4.12)
For m = 2, the value ⟨(x − ⟨x⟩)²⟩ is called the variance of the distribution, and
σ = [⟨x²⟩ − ⟨x⟩²]^{1/2} is the standard deviation.
A widely encountered continuous distribution is the Gaussian (“normal” or bell-curve)
distribution function,

    p(x) = (α/π)^{1/2} exp[−α(x − x_0)²] ,   (4.13)
which is shown in Figure 4.1. In the Gaussian distribution, the range of the variable x is
from –∞ to +∞ and the parameter α characterizes the width of the distribution. The
normalization of the Gaussian distribution function in Eq. (4.13) is proven in Appendix
4.1. For the Gaussian distribution, in Appendix 4.1 we also show that ⟨x⟩ = x_0 and
⟨(x − ⟨x⟩)²⟩ = 1/2α = σ². It is convenient to write the Gaussian distribution function in terms of the
standard deviation,
    p(x) = (1/(2πσ²)^{1/2}) exp[−(x − x_0)²/2σ²] .   (4.14)
Figure 4.1. The Gaussian distribution
function with three values of the α-
parameter. The average value of the
distribution remains constant, but the
width (and therefore standard deviation) of
the distribution increases for smaller α
values.
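The quoted properties of the Gaussian distribution (normalization, mean x0, variance 1/2α) can be checked numerically; the sketch below uses a simple rectangle-rule quadrature with arbitrary illustrative parameters x0 = 1 and α = 2:

```python
import math

def gaussian(x, x0=1.0, alpha=2.0):
    """Normalized Gaussian, Eq. (4.13): sqrt(alpha/pi) * exp(-alpha*(x - x0)**2)."""
    return math.sqrt(alpha / math.pi) * math.exp(-alpha * (x - x0) ** 2)

def central_moment(m, x0=1.0, alpha=2.0, half_width=10.0, n=200001):
    """Rectangle-rule estimate of the m-th central moment over [x0-hw, x0+hw]."""
    dx = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        x = x0 - half_width + i * dx
        total += (x - x0) ** m * gaussian(x, x0, alpha) * dx
    return total

norm = central_moment(0)      # ~1: the distribution is normalized
first = central_moment(1)     # ~0: the mean is x0
variance = central_moment(2)  # ~1/(2*alpha) = 0.25 = sigma**2
```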
4.2. Multivariable Distributions: Independent Variables and Convolution
The behavior of many systems depends on more than one stochastic variable. To
characterize the stochastic behavior of the system as a whole, behavior of the individual
stochastic variables must be known. Before giving the general description of how this is
achieved, consider the familiar example of predicting the total value of the roll of two
dice.
When two dice, labelled I and II, are rolled, the total value, E2, is a number ranging
between 2 and 12, which depends on the roll values of the individual dice, ε_I(i)
and ε_II(i). There are 36 possible outcomes for the roll of the two dice, which are
summarized in Table 4.1. In anticipation of future use, E2 is called the macrostate of
the process, and the combination of the states of the individual dice (ε_I, ε_II) the
corresponding microstate of the event. As seen in Table 4.1, different microstates can
lead to the same macrostate. The number of microstates associated with a macrostate is
called the “degeneracy”, Ω(E2), of that macrostate.
Table 4.1. The 36 possible outcomes (microstates) for the roll of two dice and the eleven
corresponding overall outcomes (macrostates).

    Microstates (ε_I, ε_II)                  Macrostate E2   Degeneracy Ω(E2)
    (1,1)                                          2                1
    (1,2) (2,1)                                    3                2
    (1,3) (3,1) (2,2)                              4                3
    (1,4) (4,1) (2,3) (3,2)                        5                4
    (1,5) (5,1) (2,4) (4,2) (3,3)                  6                5
    (1,6) (6,1) (2,5) (5,2) (3,4) (4,3)            7                6
    (2,6) (6,2) (3,5) (5,3) (4,4)                  8                5
    (3,6) (6,3) (4,5) (5,4)                        9                4
    (4,6) (6,4) (5,5)                             10                3
    (5,6) (6,5)                                   11                2
    (6,6)                                         12                1
If the dice are fair, all 36 microstates are equally probable. However, the eleven
macrostates are not equally probable, and those having a greater number of associated
microstates (degeneracies) occur with greater probability. For example, rolling the
macrostate E2 = 6 (with five associated microstates) is five times more probable than
rolling the macrostate E2 = 12 (with one associated microstate).
The probability for the two-dice macrostate is written as P2(E2), where the subscript
2 denotes that two stochastic variables determine the probability. We can similarly study
macrostates EN and the corresponding probability distributions, PN(EN), for rolls of N dice,
which depend on the event probabilities of each of the N individual dice, ε_I(i), ε_II(i), …,
ε_N(i). If the N dice are fair, the probability distributions p_α(i) will be identical. The
probabilities p1(ε) to P5(E) for the roll of 1 to 5 dice are given in Table 4.2 and plotted in
Figure 4.2.
Figure 4.2. The normalized probability distributions for outcomes of the rolls of one die (black),
two (red), three (green), four (blue), and five (orange) dice. In accordance with the central limit
theorem, as the number of dice increases, the probability distribution of the macrostate variable X
approaches the Gaussian form.
Table 4.2. The value of the roll of the dice and corresponding probabilities for 1 to 5 dice.

One die (probability = count/6):
    X:     1  2  3  4  5  6
    count: 1  1  1  1  1  1

Two dice (probability = count/36):
    X:     2  3  4  5  6  7  8  9  10  11  12
    count: 1  2  3  4  5  6  5  4   3   2   1

Three dice (probability = count/216):
    X:     3  4  5   6   7   8   9  10  11  12  13  14  15  16  17  18
    count: 1  3  6  10  15  21  25  27  27  25  21  15  10   6   3   1

Four dice (probability = count/1296):
    X:     4  5   6   7   8   9  10   11   12   13   14   15   16   17  18  19  20  21  22  23  24
    count: 1  4  10  20  35  56  80  104  125  140  146  140  125  104  80  56  35  20  10   4   1

Five dice (probability = count/7776):
    X:     5  6   7   8   9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25  26  27  28  29  30
    count: 1  5  15  35  70  126  205  305  420  540  651  735  780  780  735  651  540  420  305  205  126  70  35  15   5   1
For stochastic systems where a continuous macrostate variable, X, depends on a
combination of N stochastic continuous variables xI, xII, …, xN, the probability
distribution (probability density), PN(X) gives the probability PN(X)dX of observing the
macrostate variable in the range of X to X+dX with the subscript N denoting a probability
that is dependent on N “microscopic” stochastic variables. In the general case, stochastic
variables xI, xII, …, xN can have different normalized probability distributions, f1(xI),
g1(xII), …, h1(xN), where the subscript 1 designates the one-variable probability
distributions. If the N stochastic variables are independent of one another, so that the
probability distribution of each variable is not affected by the values of the other variables, the
probability of the macrostate variable X is,

    P_N(X) = Σ′_{x_I, x_II, …, x_N} f1(x_I) g1(x_II) ⋯ h1(x_N) .   (4.15)
The prime on the summation indicates that the sum only applies to combinations of the x_I,
x_II, …, x_N which give the specific value of the macrostate X. The simplest case of Eq.
(4.15) is when the distribution functions of all N variables are identical and f1(x_I) = f1(x_II)
= … = f1(x_N).
A simple but important case is when the macrostate variable X is the sum of the
N variables, X = x_I + x_II + … + x_N. The total kinetic energy EN of an ideal gas system of N
molecules is a physical example of this case: if each molecule has an energy ε_i, then
EN = ε_I + ε_II + … + ε_N.
If the variables x_I, …, x_N vary between x_I^min and x_I^max, …, x_N^min and x_N^max, the range of
the sum macrostate variable X lies between X^min = x_I^min + x_II^min + … + x_N^min
and X^max = x_I^max + x_II^max + … + x_N^max. Similar to Eq. (4.9), the average of a function of the
macrostate sum variable X is,

    ⟨f(X)⟩ = ∫_{X_min}^{X_max} f(X) P_N(X) dX .   (4.16)
For the average of the macrostate variable itself, f(X) = X,

    ⟨X⟩ = ∫_{X_min}^{X_max} X P_N(X) dX .   (4.17)
This average is related to the distribution functions of the individual variables.
Substituting Eq. (4.15) into Eq. (4.17) gives,

    ⟨X⟩ = ∫_{x_I^min}^{x_I^max} ⋯ ∫_{x_N^min}^{x_N^max} (x_I + ⋯ + x_N) f1(x_I) ⋯ h1(x_N) dx_I ⋯ dx_N .   (4.18)
By expanding the integral and separating variables, the average of the macrostate variable
X simplifies to,

    ⟨X⟩ = ⟨x_I⟩ + ⟨x_II⟩ + ⋯ + ⟨x_N⟩ = Σ_{i=1}^{N} ⟨x_i⟩ .   (4.19)
Similarly, the variance of the macrostate variable X is the sum of the variances of the
individual variables x_i,

    ⟨(X − ⟨X⟩)²⟩ = ⟨[(x_I − ⟨x_I⟩) + (x_II − ⟨x_II⟩) + ⋯ + (x_N − ⟨x_N⟩)]²⟩
                 = Σ_{i=1}^{N} ⟨(x_i − ⟨x_i⟩)²⟩ + Σ_{i≠j} ⟨(x_i − ⟨x_i⟩)(x_j − ⟨x_j⟩)⟩
                 = Σ_{i=1}^{N} ⟨(x_i − ⟨x_i⟩)²⟩ .   (4.20)

The terms in the second row are obtained by rearranging the terms after squaring. For
independent variables, each cross term factorizes as ⟨x_i − ⟨x_i⟩⟩⟨x_j − ⟨x_j⟩⟩ and vanishes,
since the first central moment of any distribution is zero.
For an N-variable system where all variables x_i have identical distributions,
f1 = g1 = ⋯ = h1, the average and variance of the sum variable reduce to simple forms,

    ⟨X⟩ = N⟨x_I⟩ ,
    σ_N² = ⟨(X − ⟨X⟩)²⟩ = N⟨(x_I − ⟨x_I⟩)²⟩ = Nσ_1² .   (4.21)
The second relation in Eq. (4.21) can be used as a test of whether the variables x_I,
x_II, …, x_N are uncorrelated. The relations obtained in Eqs. (4.21), while simple, are
extremely important. For example, we have,

    σ_N / ⟨X⟩ = (√N σ_1) / (N⟨x_I⟩) = (1/√N)(σ_1 / ⟨x_I⟩) .   (4.22)
Equation (4.22) shows that, relative to the mean of the macrostate variable X, the standard
deviation σ_N becomes narrower as N increases. An explicit illustration of this behavior is
seen in the molecular kinetic energy distributions discussed below. Note that Eqs. (4.18)-(4.22)
hold regardless of the specific form of the probability distributions of the
individual variables!
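Eqs. (4.21) and (4.22) can be tested by Monte Carlo sampling; the sketch below uses an exponential one-variable distribution (an arbitrary non-Gaussian choice, with mean τ and variance τ²; N, τ, and the sample count are illustrative) to confirm that ⟨X⟩ = N⟨x_I⟩, σ_N² = Nσ_1², and σ_N/⟨X⟩ ≈ 1/√N:

```python
import random

# Monte Carlo check of Eqs. (4.21)-(4.22) for X = x_1 + ... + x_N with
# i.i.d. x_i drawn from an exponential distribution (mean tau, variance tau**2).
rng = random.Random(0)
N, n_samples, tau = 50, 20000, 2.0

sums = [sum(rng.expovariate(1.0 / tau) for _ in range(N)) for _ in range(n_samples)]

mean_X = sum(sums) / n_samples                            # Eq. (4.21): ~ N*tau = 100
var_X = sum((s - mean_X) ** 2 for s in sums) / n_samples  # Eq. (4.21): ~ N*tau**2 = 200
rel_width = var_X ** 0.5 / mean_X                         # Eq. (4.22): ~ 1/sqrt(50) ~ 0.141
```

The same check works with any one-variable distribution of finite variance, which is the point of Eqs. (4.18)-(4.22).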
An important result, called the central limit theorem, applies to systems with large
numbers of stochastic variables. This theorem states that if the individual one-variable
distributions f1(x_i) have proper boundary conditions (are finite at the limiting values) and
finite variances σ_1², then for large numbers of variables the N-variable probability
distribution P_N(X) becomes Gaussian,

    lim_{N→∞} P_N(X) = (1/(2πσ_N²)^{1/2}) exp[−(X − ⟨X⟩)²/2σ_N²] .   (4.23)
The central limit theorem is independent of the functional form of the one-variable
distributions p1(x_i) and even applies to discrete distributions, such as the probability
distribution for the total value of the rolls of N dice, PN(E), which becomes Gaussian for large
N. The operation of the central limit theorem can be seen in the probability distribution for the
sum roll of five dice, P5(X), in Fig. 4.2, which is already starting to look Gaussian.
The mathematical procedure for calculating the probability distribution function for
the variable X involves mixing, or convolution, of the one-variable distribution functions. The
convolution procedure is illustrated for X = x_I + x_II with a distribution function P2(X).
Different combinations of x_I and x_II values can give rise to the same X, and the probability
of observing the sum to be between X and X + dX is,

    P2(X) dX = ∫∫_{X ≤ x_I + x_II ≤ X + dX} P2(x_I, x_II) dx_I dx_II
             = [∫_{x_I^min}^{x_I^max} ∫_{x_II^min}^{x_II^max} δ(x_I + x_II − X) P2(x_I, x_II) dx_II dx_I] dX
             = [∫_{x_I^min}^{x_I^max} P2(x_I, X − x_I) dx_I] dX .   (4.24)
The subscript on the integrals in the first line shows that the integration limits are
constrained such that x_I + x_II always remains between X and X + dX. This constraint can be
introduced into the integrals by using the Dirac delta function δ(x_I + x_II − X), which is a concise
way of summarizing the two conditions,

    δ(x_I + x_II − X) = 0   if x_I + x_II − X ≠ 0 ,
    ∫ δ(x_I + x_II − X) dx_II = 1   if x_II = X − x_I lies within the integration range.   (4.25)
The delta function eliminates all values of the integrand except those with x_II = X − x_I, and
allows the use of the full range x_i^min to x_i^max for each integration limit. This simplifies the
evaluation of the integral. To obtain the last equality in Eq. (4.24), the properties of
the delta function are used to integrate over the variable x_II.
If x_I and x_II are independent (as in the case of the energies of two non-interacting
molecules or the roll of two dice), P2(x_I, X − x_I) = p1(x_I) p1(X − x_I) and Eq. (4.24)
reduces to,

    P2(X) dX = [∫_{x_I^min}^{x_I^max} p1(x_I) p1(X − x_I) dx_I] dX .   (4.26)

The first one-particle probability distribution represents the contribution of x_I and the
second the contribution of x_II to the convolution integral.
In general, an integral of the form,

    P2(X) = ∫_{x_min}^{x_max} f(x) g(X − x) dx = ∫_{x_min}^{x_max} f(X − x) g(x) dx ,   (4.27)

over the range of the distribution [x_min, x_max], is called the convolution of f and g for a
specific X value.
We implicitly used the concept of convolution to calculate the probability
distributions for the roll of two or more dice from the probability distributions for the
individual dice, as shown in Figure 4.2. For two dice we can write,

    P2(E) = Σ′_{ε_I, ε_II} p1(ε_I) p1(ε_II) = Σ_{ε_I} p1(ε_I) p1(E − ε_I) ,   (4.28)

where the prime on the first summation shows that ε_I and ε_II are restricted to values such
that the condition ε_I + ε_II = E is satisfied. For example, the probability of rolling a 7
with two dice is the convolution of the probabilities of the two individual dice subject to
the constraint ε_I + ε_II = 7,

    P2(7) = Σ_{ε_I} p1(ε_I) p1(7 − ε_I)
          = p1(1)p1(6) + p1(2)p1(5) + p1(3)p1(4) + p1(4)p1(3) + p1(5)p1(2) + p1(6)p1(1)
          = 6/36 = 1/6 .   (4.29)
In Appendix 4.2, we show how the convolution of two Gaussian functions is
performed.
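The discrete convolutions of Eqs. (4.28) and (4.29) can be carried out exactly with rational arithmetic; repeated convolution reproduces the dice columns of Table 4.2 (a minimal sketch):

```python
from fractions import Fraction

# Exact discrete convolution, Eq. (4.28): P2(E) = sum over eps of p1(eps)*p1(E - eps),
# for fair dice with p1(eps) = 1/6, eps = 1..6.
p1 = {eps: Fraction(1, 6) for eps in range(1, 7)}

def convolve(pa, pb):
    """Distribution of the sum of two independent discrete variables."""
    out = {}
    for ea, wa in pa.items():
        for eb, wb in pb.items():
            out[ea + eb] = out.get(ea + eb, Fraction(0)) + wa * wb
    return out

p2 = convolve(p1, p1)  # two-dice distribution; p2[7] = 6/36 = 1/6, Eq. (4.29)
p3 = convolve(p2, p1)  # three-dice distribution, reproducing Table 4.2
```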
4.3 The Maxwell – Boltzmann Velocity Distribution
In previous sections of this chapter, the mathematical form of the probability distribution
was assumed to be known or could be guessed based on the intrinsic nature of the
stochastic variables under study. In this Section, we derive the Maxwell-Boltzmann
velocity distribution function of molecules in an ideal gas where the energies of
molecules are independent of one another. The assumptions we apply about the nature of
probability distributions of physical quantities take us from the realm of mathematical
probability theory into the field of kinetic theory of gases and statistical mechanics. These
assumptions lie at the heart of our understanding of the connection between the
microscopic and macroscopic viewpoints. We shall see that the Maxwell-Boltzmann
velocity distribution has applicability beyond the ideal gas model: in classical
systems, it represents the velocity distribution of molecules regardless of their physical
state.
4.3.1. The concept of temperature from the mechanical analysis of an ideal gas
An ideal gas is a simple model for the behavior of gas systems in which the motion of
the molecules is assumed to be governed by classical mechanics and the molecules interact
with each other or with the vessel walls only during collisions. The Maxwell-Boltzmann
distribution describes the velocities of molecules in an ideal gas. Before deriving the
Maxwell-Boltzmann velocity distribution, we review the concept of temperature of an
ideal gas from a molecular point of view.
A microscopic interpretation of temperature was given by the Swiss mathematician
Daniel Bernoulli in 1738. He analyzed the motion of a collection of non-interacting
spherical molecules which make up an ideal gas. This type of analysis, which assumes
random motion and collisions in the gas phase and uses probability concepts to describe
dynamic processes, is the subject of the kinetic theory of gases.
The ideal gas, represented schematically in Figure 4.3, is considered to be a
collection of non-interacting molecules of mass m which have volumes that are very
small compared to the volume of the vessel the gas is confined to. The collisions between
the molecules and the walls of the container are considered to be elastic, meaning there is
no loss of the total kinetic energy of the center of mass motion of the molecules during a
collision. Microscopically, this implies that no kinetic energy of the center of mass
motion enters the internal degrees of motion of the molecules (i.e., molecular rotation or
intramolecular vibrations) after the collision is completed.
Figure 4.3. The kinetic theory of gases picture of an ideal
gas. Molecules are moving randomly with a spread of
velocities in all three spatial directions. Molecules
occasionally collide with each other and with the walls of the
vessel.
The molecules in an ideal gas have a distribution of velocities. Each molecule i has
a velocity vi with components vix, viy, and viz. The magnitude of the velocity or speed vi of
molecule i is,
    v_i = (v_ix² + v_iy² + v_iz²)^{1/2} .   (4.29)
In an isotropic environment, it is “reasonable” to expect there will be no bias in the
distribution of the velocity components of the gas molecules along the three Cartesian
directions. Therefore, the average square speed of the molecules, ⟨v²⟩, should be equally
distributed among the averages of the squared components of the velocity in the three
Cartesian directions,

    ⟨v_x²⟩ = ⟨v_y²⟩ = ⟨v_z²⟩ = ⟨v²⟩/3 ,   (4.30)
where the brackets represent averages over all molecules in the system. This relation
guarantees that the gas does not spontaneously flow in any of the Cartesian directions.
According to this mechanical picture of the gas, the direction of motion, and
therefore the momentum of a molecule changes when it collides with the container walls.
This change in momentum of a molecule is the result of a force from the wall (Newton’s
second law) and the molecule in turn exerts an equal force on the wall in the opposite
direction (Newton’s third law). The details of this process are shown in Figure 4.4. The
total force on the surface A of the confining wall leads to the gas exerting a pressure P =
F/A on the walls of the container.
Figure 4.4. The collision of a molecule with velocity
component v_xi with the wall in the x-direction. The time
between consecutive collisions with the wall is related to
the time it takes the molecule to traverse the distance
2L_x.
To determine the average pressure exerted by all molecules on the wall Wx1 with
area A, of the container, consider a molecule with the velocity component vxi in the
positive x-direction. As a result of the elastic collision of the molecule with the wall, the
velocity component in the x-direction is reversed to -vxi. If this molecule does not undergo
collisions with other gas molecules, it will move towards the opposite wall W_x2, collide
with it, and then be reflected back towards W_x1. The time between consecutive collisions
with the wall W_x1 is Δt = 2L_x/v_xi, where L_x is the length of the container in the x-direction.
The molecule undergoes a change in momentum of Δp_xi = 2mv_xi during the time Δt,
which, according to Newton's second and third laws, determines the average force the molecule
exerts on the wall during that time. The contribution to the pressure on the wall W_x1 of
area A as a result of the collisions of this one molecule is,

    P_i = f_xi/A = (Δp_xi/Δt)/A = mv_xi²/(A L_x) = mv_xi²/V ,   (4.31)
where V is the volume of the container (occupied by the gas). If there are N molecules in
the system, the total pressure on the wall in the x-direction is,
    P_tot = Σ_{i=1}^{N} mv_xi²/V = Nm⟨v_x²⟩/V = Nm⟨v²⟩/3V = (2N/3V)⟨E_K⟩ ,   (4.32)
where Eq. (4.30) was used and ⟨E_K⟩ = m⟨v²⟩/2 is the average kinetic energy of the gas
molecules in the system. This is Bernoulli's expression, which relates the pressure of an
ideal gas system (a macroscopically measurable quantity) to the average square speed or kinetic
energy of the molecules in the ideal gas (microscopic and not directly measurable quantities).
The macroscopic ideal gas law was discovered from the empirical observations of
Robert Boyle (1662) and Jacques Charles (1780s). Boyle demonstrated that at constant
temperature, the pressure of an ideal gas is inversely proportional to its volume, and
Charles demonstrated the proportionality of the gas volume to its temperature at constant
external pressure. For a fixed volume and temperature, it was also known that the
pressure of a gas depends on the amount of gas in the system. The ideal gas equation of
state is a combination of these observations. In molecular terms, the ideal gas law is
equivalent to,
    P_tot = NkT/V ,   (4.33)
where T is the temperature and k is Boltzmann’s constant. Comparing Eqs. (4.32) and
(4.33), the temperature of an ideal gas is identified as the average kinetic energy of the
gas molecules,
    T = (1/3)(m/k)⟨v²⟩ = (2/3k)⟨E_K⟩ .   (4.34)
This is a simple example of statistical mechanical reasoning where the macroscopic
temperature is related to the microscopic average kinetic energy of the molecules in the
gas. At the heart of this relation is the statistical / probabilistic assumption about the
average velocity components of the gas molecules given in Eq. (4.30).
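Equation (4.34) can be checked numerically by sampling velocity components and recomputing the temperature from the mean square speed; the sketch below draws components from a Gaussian with variance kT/m (anticipating the Maxwell-Boltzmann distribution of Section 4.3.2), and the argon mass and 300 K are arbitrary illustrative choices:

```python
import random

# Numerical check of Eq. (4.34), T = (m/3k)<v^2>.
k_B = 1.380649e-23   # Boltzmann constant, J/K
m = 6.6335e-26       # mass of an argon atom, kg
T_set = 300.0        # K

rng = random.Random(1)
sigma = (k_B * T_set / m) ** 0.5   # width of each Cartesian component distribution
n = 200000

# <v^2> = <vx^2> + <vy^2> + <vz^2>, averaged over n sampled molecules.
mean_v_sq = sum(
    rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2
    for _ in range(n)
) / n

T_kinetic = m * mean_v_sq / (3.0 * k_B)   # recovers approximately 300 K
```

This is essentially how temperature is computed from instantaneous velocities in a molecular dynamics simulation.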
Equation (4.34) relates the average molecular square speed to the temperature, but does not
give any information about the distribution function of the velocities. The molecules in an
ideal gas have different velocities, v_i. The velocity distribution of the ideal gas system
gives the probability of a molecule having a certain velocity at a given temperature and
can be used to derive the speed, v, and energy, ε, distributions for the individual
molecules as well. Convolution of the distributions of the energies of the individual
molecules is used to determine the distribution of the total energy, EN, for the N-molecule
ideal gas system. These distributions give insight into the behavior of collective
properties and can illustrate how descriptions of a system change as we go from an
individual molecule to the collective (macroscopic) system viewpoint.
For completeness and future reference, we present the mechanical view of the
pressure in a system in Appendix 4.3.
4.3.2 The Maxwell – Boltzmann distribution of velocities for an ideal gas
As seen in Figure 4.1, different probability distributions can give the same average
(mean) value for the variable. Knowledge of the temperature of the system does not
provide sufficient information to derive the distribution of the velocity among the
molecules in an ideal gas system. Different velocity distributions can be envisioned. For
example, all molecules could move with the same square speed, equal to the average value,
i.e., v_i^2 = ⟨v^2⟩ for all i. Another example is a distribution where half of the molecules
move with v_i^2 = ⟨v^2⟩/2 and the other half with v_j^2 = 3⟨v^2⟩/2. In both of these cases,
the average square speed of the molecules in the system is ⟨v^2⟩, and both distributions
would give the same temperature for the gas sample. The dynamical behavior of these two
systems, however, would be very different. Among all possible distributions of the velocity, in
1866, James Clerk Maxwell set out to find the most probable distribution of velocities
which is still consistent with the known temperature. Velocity distributions of the gas
molecules can be measured experimentally and Maxwell’s velocity distribution has been
verified.
Before describing Maxwell’s derivation of the velocity distribution, we summarize
the mechanism by which different velocities are generated among the molecules in a gas.
Consider two molecules in a gas moving towards each other with different initial
velocities as shown in Figure 4.5. These molecules “collide” if they move closer than the
range of their mutual interaction potential. For hard sphere molecules, this means the
molecules collide if the impact parameter, b, is less than the sum of their radii. The
collision changes the velocities and redistributes the kinetic energy between the two
molecules. For elastic collisions, linear momentum, angular momentum, and kinetic
energy are conserved, i.e., their total amounts are the same before and after the collision.
Many collisions (between 10^10 and 10^11 collisions per second in a typical case) occur in a
typical gas sample with ~10^23 molecules. These collisions provide a strong mechanism
for velocity redistribution. During any period of observation, we cannot assign fixed
velocities to individual gas molecules, but only say there is a distribution of velocity
among all molecules in the gas at any given time. Maxwell was aiming to derive the form
of this velocity distribution without having to survey the velocities of a collection of
molecules moving at different velocities.
Figure 4.5. The schematic representation of a collision between two molecules moving with an
impact parameter b. A pre-collision configuration is shown in (a), the closest distance of approach
is shown in (b), and the after-impact velocities of the molecules are shown in (c). The
conservation of energy, linear, and angular momentum determine the final velocities from the
values of the initial velocity and impact parameter.
Maxwell made a set of deceptively simple arguments to derive the most probable
distributions for the x, y, and z components of the molecular velocity in an ideal gas. In
the most general case, the distribution of velocities, P(v), will depend on both the
magnitude of the velocity and the direction of motion of the molecules. Maxwell’s first
assumption is that for a system in equilibrium, the velocity distribution of molecules is
isotropic and does not depend on spatial direction in the system. In other words, there is
no preference to moving in any specific direction and the velocity distribution (whatever
form it has) only depends on the magnitude of the speed,
P(v_x, v_y, v_z) = P(v) ,  (4.35)
where P(v) is the distribution function for the molecular speeds. This implies that in the
velocity distribution, the Cartesian components appear only as the combination,
v^2 = v_x^2 + v_y^2 + v_z^2 .  (4.36)
The physical basis of this assumption is that at equilibrium, our choice of Cartesian
coordinate system for describing the velocities of the molecules in a gas is arbitrary. We
can choose the x, y, and z-coordinate system to point in any arbitrary direction and this
Introduction to Molecular Simulation Chapter 4. Probability Theory
© Saman Alavi Page 16
should not affect the mathematical form of the velocity distribution. The magnitude of
velocity (speed) is an invariant, independent of the coordinate system.
Maxwell’s second postulate is that the values of the velocity components vx, vy, and
vz of the molecules are independent of one another and their distribution functions have
identical functional form. What this assumption entails, for example, is that the velocities
in the y- and z-directions are not affected by a molecule moving very fast in the x-
direction. Mathematically this is written as,
P(v_x, v_y, v_z) = P(v_x) P(v_y) P(v_z) ,  (4.37)
where P(vx) is the distribution function of the x-component of the velocity. The reasoning
behind Eq. (4.37) is that the choice of the coordinate system is arbitrary and in the
absence of any external effects (such as pressure gradients, external electrical and
gravitational fields), the distribution functions for the velocity in the three Cartesian
directions should be identical.
The mathematical form of the velocity distribution function in an ideal gas is
dictated by these two simple assumptions. Taking the derivative of each side of Eq.
(4.37) with respect to one velocity component, say vx, and using the chain rule and Eq.
(4.36), we get,
dP(v)/dv_x = (dP(v)/dv)(dv/dv_x) = (v_x/v) dP(v)/dv = (dP(v_x)/dv_x) P(v_y) P(v_z) .  (4.38)
Dividing both sides of the last equality in Eq. (4.38) by P(v) and separating variables
gives,
(1/(v P(v))) dP(v)/dv = (1/(v_x P(v_x))) dP(v_x)/dv_x .  (4.39)
The left hand side of Eq. (4.39) is only a function of the variable v and the right hand side
is only a function of vx. This shows that both sides must equal a constant, C′. Solving
the equation for vx gives the distribution function for each of the components of the
velocity,
(1/(v_x P(v_x))) dP(v_x)/dv_x = C′  ⟹  dP(v_x)/P(v_x) = C′ v_x dv_x  ⟹  ln P(v_x) = (C′/2) v_x^2 + B ,  (4.40)
or in a more familiar Gaussian form,
P(v_x) = A e^{−C v_x^2} ,  (4.41)
with A = e^B and C = −C′/2, where C must be positive for a normalizable distribution.
Similar expressions hold for the vy and vz components of the velocity. Maxwell’s
assumptions lead to a Gaussian distribution for each of the velocity components with two
underdetermined constants A and C. By applying the normalization condition to the
probability distribution in Eq. (4.41), noting that vx varies between -∞ and +∞, the
constant A is determined (see Eq.(A4.46) in Appendix 4.4),
∫_{−∞}^{+∞} P(v_x) dv_x = 1  ⟹  P(v_x) = (C/π)^{1/2} e^{−C v_x^2} .  (4.42)
The constant C can be determined by using the kinetic theory of gases result of Eq. (4.34),
⟨(1/2) m v_x^2⟩ = (1/2) kT, where k is the Boltzmann constant, T is the temperature, and m is the
molecular mass. Using Eq. (4.42) gives,

⟨(1/2) m v_x^2⟩ = (1/2) m ∫_{−∞}^{+∞} v_x^2 P(v_x) dv_x = m/(4C) = (1/2) kT  ⟹  C = m/(2kT) .  (4.43)
Equation (A4.47) in Appendix 4.4 is used in evaluating the integral in Eq. (4.43). The
distribution function for vx is thus,
P(v_x) = (m/2πkT)^{1/2} e^{−m v_x^2/2kT} .  (4.44)
Similar distribution functions hold for vy and vz. By comparing Eq. (4.44) with (4.14),
we see that the variance of the Gaussian probability distribution, σ2 = kT/m, and the
distribution becomes broader at higher temperatures and for lighter particles. The velocity
component distributions at three temperatures are shown in Figure 4.6.
Figure 4.6. The probability distributions for
velocity components of ideal gas molecules
at three temperatures. The distributions of
the velocity components are Gaussian with a
width that depends on temperature and
molecular mass.
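The Gaussian form of Eq. (4.44) is easy to check numerically. The sketch below samples v_x from a Gaussian of variance kT/m (the mass and temperature are illustrative, argon-like values, and the seed is arbitrary) and verifies the sample variance:

```python
import math
import random

# Sample v_x for many ideal-gas molecules from Eq. (4.44), a Gaussian with
# mean 0 and variance kT/m, and check the sample variance against theory.
k_B = 1.380649e-23      # Boltzmann constant, J/K
m = 6.63e-26            # argon-like atomic mass, kg (illustrative value)
T = 300.0               # temperature, K

random.seed(1)
sigma = math.sqrt(k_B * T / m)          # standard deviation of P(v_x)
vx = [random.gauss(0.0, sigma) for _ in range(200_000)]

var_sample = sum(v * v for v in vx) / len(vx)
print(var_sample / (k_B * T / m))       # ratio close to 1
```

The ratio approaches 1 as the number of samples grows, confirming that the width of the component distribution is set by kT/m.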
We can use the distribution functions for the velocity components to calculate the
distribution function for the speed. The probability of a molecule simultaneously having
velocity components between v_x and v_x+dv_x, v_y and v_y+dv_y, and v_z and v_z+dv_z is,

P(v_x) P(v_y) P(v_z) dv_x dv_y dv_z = (m/2πkT)^{3/2} e^{−m(v_x^2+v_y^2+v_z^2)/2kT} dv_x dv_y dv_z .  (4.45)
Converting the volume element in Eq. (4.45) from Cartesian space to the spherical
polar coordinate system gives the familiar form,
dv_x dv_y dv_z = v^2 sinθ dv dθ dφ .  (4.46)
Substituting Eq. (4.46) in Eq. (4.45) and integrating over the angle variables θ (from 0 to
π) and φ (from 0 to 2π) gives the distribution function for the molecular speed,
P(v) dv = 4π (m/2πkT)^{3/2} v^2 e^{−m v^2/2kT} dv .  (4.47)
Note that P(v) contains a factor of v^2 and is not Gaussian. The distribution functions for
the molecular speed at the same three temperatures as in Figure 4.6 are shown in Figure 4.7.
Figure 4.7. The probability distribution
for the speed of ideal gas molecules at
three temperatures. Note that the
distributions are skewed towards higher
speeds and become broader at higher
temperatures.
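Equation (4.47) can be checked by direct numerical integration. The sketch below (with an illustrative, nitrogen-like mass) verifies the normalization and compares the mean speed with the closed form ⟨v⟩ = (8kT/πm)^{1/2}, which follows from Eq. (4.47) with the integral (A4.43):

```python
import math

# Numerically integrate the Maxwell speed distribution, Eq. (4.47), and
# compare the mean speed with the closed form sqrt(8kT/(pi*m)).
k_B = 1.380649e-23
m = 4.65e-26        # nitrogen-like molecular mass, kg (illustrative value)
T = 300.0

def P(v):
    a = m / (2.0 * math.pi * k_B * T)
    return 4.0 * math.pi * a**1.5 * v * v * math.exp(-m * v * v / (2.0 * k_B * T))

dv = 0.1
vs = [i * dv for i in range(1, 40000)]          # speeds up to 4000 m/s
norm = sum(P(v) * dv for v in vs)               # close to 1
mean_v = sum(v * P(v) * dv for v in vs)

print(norm, mean_v, math.sqrt(8 * k_B * T / (math.pi * m)))
```

The two mean-speed values agree to the accuracy of the grid, illustrating that the skewed P(v) still has simple closed-form moments.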
The speed distribution can be used to determine the kinetic energy distribution of
each molecule, ε = mv^2/2. Changing variables from speed to kinetic energy and using dε
= mv dv in Eq. (4.47), we get,

P_1(ε) dε = 2π (1/πkT)^{3/2} ε^{1/2} e^{−ε/kT} dε .  (4.48)
The subscript 1 indicates a one-molecule kinetic energy distribution function. The kinetic
energy probability distribution is plotted in Figure 4.8.
Figure 4.8. The probability distribution for the kinetic energy of an ideal gas molecule,
showing the most probable and average energies. The variance of the distribution is also
shown. Plots are for 200 K, 500 K, and 1000 K.
For one molecule in an ideal gas, the average energy is,

⟨ε⟩_1 = ∫_0^∞ ε P_1(ε) dε = 2π (1/πkT)^{3/2} ∫_0^∞ ε^{3/2} e^{−ε/kT} dε = (3/2) kT .  (4.49)
Equation (A4.43) in Appendix 4.4 is used to derive the last equality. The most
probable energy (ε_max) is obtained from,

dP_1(ε)/dε = 2π (1/πkT)^{3/2} [(1/2) ε^{−1/2} − ε^{1/2}/kT] e^{−ε/kT} = 0  ⟹  ε_max = (1/2) kT .  (4.50)
The standard deviation of the energy distribution, σ_ε = (3/2)^{1/2} kT = 1.22 kT, is large
relative to the most probable energy ε_max = kT/2, which shows that individual molecules
have large spreads of energy in the ideal gas. The fact that the average and most probable
energies are separated by a relatively large energy value shows that the one-particle
energy distribution is skewed.
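Working in the reduced variable x = ε/kT, Eq. (4.48) becomes P_1(x) = (2/π^{1/2}) x^{1/2} e^{−x}, and the moments derived above can be checked numerically (a sketch; the grid spacing and cutoff are arbitrary choices):

```python
import math

# One-molecule energy distribution, Eq. (4.48), in reduced units x = eps/kT:
# P1(x) = (2/sqrt(pi)) * sqrt(x) * exp(-x).
def P1(x):
    return 2.0 / math.sqrt(math.pi) * math.sqrt(x) * math.exp(-x)

dx = 1e-3
xs = [i * dx for i in range(1, 40000)]      # integrate out to eps = 40 kT
mean = sum(x * P1(x) * dx for x in xs)      # close to 3/2, i.e. <eps> = (3/2) kT
mode = max(xs, key=P1)                      # close to 1/2, i.e. eps_max = kT/2
print(mean, mode)
```

The numerical mean and mode reproduce ⟨ε⟩ = (3/2)kT and ε_max = kT/2, and their separation reflects the skew of the distribution.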
4.3.3. Energy distributions for collections of molecules in an ideal gas
Knowing the probability distribution for the energy of individual molecules, the
next stage is to determine the probability distribution, P2(E2), of the total energy of two
molecules in an ideal gas system, where E2 = ε_I + ε_II.
To determine the two-molecule distribution function, the convolution of two one-molecule
probability distributions is performed,

P_2(E_2) dE_2 = ∫∫_{E_2<ε_I+ε_II<E_2+dE_2} P_1(ε_I) P_1(ε_II) dε_I dε_II
 = [∫_0^∞ ∫_0^∞ P_1(ε_I) P_1(ε_II) δ(E_2 − ε_I − ε_II) dε_I dε_II] dE_2
 = [∫_0^{E_2} P_1(ε_I) P_1(E_2 − ε_I) dε_I] dE_2 ,  (4.51)
where the definition of the delta function given in Eq. (4.25) is used. The subscript on the
first set of integrals shows that the integration is constrained to values of εI and εII where
E2 < εI + εII < E2+dE2. Substituting the normalized one molecule energy distribution
given in Eq. (4.48) and evaluating the integral in Eq. (4.51) using Eq.(A4.50) of
Appendix 4.4 gives,
P_2(E_2) = 4π^2 (1/πkT)^3 e^{−E_2/kT} ∫_0^{E_2} ε_I^{1/2} (E_2 − ε_I)^{1/2} dε_I = (1/2) (1/kT)^3 E_2^2 e^{−E_2/kT} .  (4.52)
From this distribution, the average two-particle energy is,

⟨E_2⟩ = ∫_0^∞ E_2 P_2(E_2) dE_2 = (1/2)(1/kT)^3 ∫_0^∞ E_2^3 e^{−E_2/kT} dE_2 = (1/2)(1/kT)^3 · 6(kT)^4 = 3kT ,  (4.53)
and it can be shown that the most probable energy is E_{2,max} = 2kT and the standard
deviation of the energy is σ_2 = 3^{1/2} kT.
The procedure for determining P3(E3) and E3 for a collection of three molecules is
given in Appendix 4.5.
The convolution procedure can be repeated to calculate the probability distribution
for the total energy of N molecules, E_N = ε_I + ε_II + ⋯ + ε_N, to give,

P_N(E_N) = [1/((3N/2 − 1)! (kT)^{3N/2})] E_N^{3N/2 − 1} e^{−E_N/kT} .  (4.54)
This probability distribution gives the average energy as,

⟨E_N⟩ = [1/((3N/2 − 1)! (kT)^{3N/2})] ∫_0^∞ E_N^{3N/2} e^{−E_N/kT} dE_N = (3/2) N kT ,  (4.55)
and the most probable energy and standard deviation are E_{N,max} = (3N/2 − 1) kT and
σ_N = (3N/2)^{1/2} kT.
To compare the relative widths of the energy distributions for different numbers of
molecules, we define probability distributions in terms of the reduced energy E*_i =
E_i/E_{max,i}, which is the energy relative to the most probable energy,
P_1(ε*) dε* = (1/(2π))^{1/2} ε*^{1/2} e^{−ε*/2} dε* ,  (4.56)
and
P_2(E*_2) dE*_2 = 4 E*_2^2 e^{−2E*_2} dE*_2 .  (4.57)
The general expression for the N-molecule reduced energy distribution is,

P_N(E*_N) dE*_N = [(3N/2 − 1)^{3N/2}/(3N/2 − 1)!] E*_N^{3N/2 − 1} e^{−(3N/2 − 1) E*_N} dE*_N .  (4.58)
For small N, the factorial in Eq. (4.58) can be calculated directly. For large N values,
Stirling’s approximation, given in Eq. (A4.39) of Appendix 4.4, can be used to
determine (3N/2 − 1)!. The total reduced energy distributions for 1 to 10 molecules are
shown in Figure 4.9.
Figure 4.9. The kinetic energy
distribution for 1 to 10 molecules
plotted as a function of the energy
reduced by the most probable
energy. The distribution becomes
sharper with greater numbers of
molecules.
The energy distributions become sharper and the skew less pronounced as the
number of molecules increases. Figure 4.10 shows that for collections of larger numbers
of molecules, the distributions of energy relative to the most probable energy become
very narrow. Extrapolating to macroscopic systems with N of the order of 10^20, the
energy distribution is effectively a single-valued (δ-)function, and a single, well-defined
value for the energy is always observed from measurements of the energy E_N for this
system.
Figure 4.10. The kinetic energy distribution for systems with 10 to 1000 molecules
plotted as a function of the energy reduced to the most probable energy.
The analysis and Figures 4.9 and 4.10 show how distributions of many-molecule
macroscopic properties behave differently from distributions of one-molecule
microscopic properties. New behaviors emerge as the number of molecules in a system
increases, and new relations, such as equations of state, may be discovered between
well-defined many-molecule quantities. Such laws do not necessarily hold between
corresponding quantities describing small numbers of molecules. These new emerging
relations are a consequence of the laws of mechanics and the laws of probability for
systems with large numbers of variables.
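The sharpening can be made quantitative directly from Eqs. (4.54) and (4.55): the relative width is σ_N/⟨E_N⟩ = (3N/2)^{1/2} kT/((3N/2) kT) = (2/3N)^{1/2}. A short sketch:

```python
import math

# Relative width of the N-molecule energy distribution, from Eqs. (4.54)-(4.55):
# sigma_N / <E_N> = sqrt(3N/2) kT / ((3N/2) kT) = sqrt(2/(3N)),
# so the distribution sharpens as 1/sqrt(N).
for N in (1, 10, 100, 10_000, 10**20):
    rel_width = math.sqrt(2.0 / (3.0 * N))
    print(f"N = {N:>22d}: sigma_N/<E_N> = {rel_width:.3e}")
```

For N ~ 10^20 the relative width is ~10^-10, which is why a macroscopic energy measurement always returns the same well-defined value.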
Up to this point, we have shown that the energy distribution in the ideal gas system
becomes sharper as the number of molecules increases. In the next chapter we discuss
systems where the total energy is not a simple sum of one-molecule energies. We see that
even for systems with interacting molecules, well-defined, system-wide properties also
emerge as we go to macroscopic-sized systems. By introducing the mathematical concept
of the ensemble, first developed by Boltzmann and Gibbs, we illustrate that the
mathematical analysis used to determine distribution functions for collective ideal gas
properties can be extended to collective properties of systems with interacting particles.
4.3.4 Assigning initial velocities to molecules in molecular dynamics simulations
The discussions of the previous sections imply that if a molecular dynamics simulation is
properly set up and performed, collisions between molecules in a system lead to a
Maxwell-Boltzmann distribution of velocities. At the beginning of a simulation, how do
we set up the molecular velocities such that the system is at a desired temperature? This
is done with random number generators, which are available in all modern programming
environments. Two methods are used to set molecular velocities in a system to the proper
Maxwell-Boltzmann distribution at a temperature T.
In the first method, for the N-molecule system we generate a set of 3N random
numbers, each of which can vary in the range −1 to +1. The random numbers α_1
to α_N are associated with the velocity components v_{x1} to v_{xN}, and similarly the
remaining 2N random numbers are associated with the y- and z-components of the
velocities. For a proper velocity distribution, we must have ⟨v_x⟩ = ⟨v_y⟩ = ⟨v_z⟩ = 0. These
conditions are guaranteed by the range and nature of the random numbers, since for such a set
of random numbers ⟨α⟩ = 0. In addition, we scale the random numbers by a constant
factor such that they satisfy the relations for the velocity components at the required
temperature T,
(3N/2) m⟨v_x^2⟩ = (1/2) N m⟨v^2⟩ = (3/2) NkT .  (4.59)
In this way, the velocity components of the molecules are assigned the proper initial average
kinetic energy at the beginning of the simulation. After the simulation begins and the
molecules collide, a proper Maxwell-Boltzmann distribution of velocity components
automatically develops.
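A minimal sketch of this first method is given below. The mass, temperature, system size, and seed are illustrative; the centre-of-mass drift is subtracted explicitly, a common refinement of the ⟨α⟩ = 0 argument above, which holds only on average for a finite sample:

```python
import math
import random

# First initialization method: draw uniform random velocity components in
# [-1, 1], remove any residual drift, then rescale so the total kinetic
# energy equals (3/2) N k T, as in Eq. (4.59).
k_B = 1.380649e-23
m = 6.63e-26        # illustrative atomic mass, kg
T = 300.0
N = 1000

random.seed(7)
vel = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(N)]

# Subtract the centre-of-mass velocity so <vx> = <vy> = <vz> = 0 exactly.
for axis in range(3):
    drift = sum(v[axis] for v in vel) / N
    for v in vel:
        v[axis] -= drift

# Rescale to the target temperature.
ke = 0.5 * m * sum(vx * vx + vy * vy + vz * vz for vx, vy, vz in vel)
scale = math.sqrt(1.5 * N * k_B * T / ke)
vel = [[scale * c for c in v] for v in vel]

ke = 0.5 * m * sum(vx * vx + vy * vy + vz * vz for vx, vy, vz in vel)
T_inst = 2.0 * ke / (3.0 * N * k_B)
print(T_inst)    # 300.0 to within floating-point error
```

The instantaneous temperature is exactly the target by construction; collisions during the run then relax the uniform components into the Maxwell-Boltzmann form.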
The second method for assigning the initial molecular velocities is based on a
sampling technique developed by statisticians George Box and Mervin Muller (in 1958).
This method uses two random numbers which vary in the range of [0,1] to generate a
Gaussian distribution of variables. In the Box-Muller method, if the two random numbers
are denoted ξ_1 and ξ_2, the variable X calculated as,

X = σ (−2 ln ξ_1)^{1/2} cos(2π ξ_2) ,  (4.60)
has a Gaussian distribution with a variance of σ^2. By generating 2N random numbers, we
can generate N numbers representing the velocity components of molecules in the system
which have the proper Gaussian form. By relating the variance to the temperature of the
simulation, the velocity components can be scaled to give the required temperature.
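A sketch of the Box-Muller transformation of Eq. (4.60); the function name and seed are illustrative choices:

```python
import math
import random

# Box-Muller sampling, Eq. (4.60): two uniform random numbers in (0, 1]
# give one Gaussian variate with standard deviation sigma. For a velocity
# component, sigma**2 = kT/m (cf. Eq. (4.44)).
def box_muller(sigma, rng):
    xi1, xi2 = rng.random(), rng.random()
    while xi1 == 0.0:                 # avoid log(0)
        xi1 = rng.random()
    return sigma * math.sqrt(-2.0 * math.log(xi1)) * math.cos(2.0 * math.pi * xi2)

rng = random.Random(42)
sigma = 1.0
samples = [box_muller(sigma, rng) for _ in range(100_000)]

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)     # near 0 and sigma**2 = 1
```

With sigma set to (kT/m)^{1/2}, each call returns one properly distributed velocity component, so no subsequent rescaling step is strictly required beyond removing the centre-of-mass drift.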
4.3.5 Phase space description of an ideal gas
The phase space description introduced in Chapter 1 was applied to mechanical systems
with a relatively small number of degrees of freedom and corresponding {q, p} pairs.
Even for a system with two degrees of freedom, there are four phase space variables, and
the phase space trajectory cannot be represented in three-dimensional space. Nonetheless
the concept of the phase space trajectory is useful in guiding our thinking on the global
behavior of mechanical systems.
In the case of an ideal gas in a cubic vessel of length L on each side, the phase space
consists of molecular coordinates randomly distributed between -L/2 and L/2 with the
conjugate momenta of the molecules distributed according to the Maxwell – Boltzmann
distribution. Each molecule is represented by a point in a six-dimensional phase space
consisting of {x, y, z, px, py, pz}, the so-called μ-space (molecule space), see the Chapter
5, and the distribution of N molecular positions / momenta in the system form a
probability cloud in the μ-space. As the system evolves, the shape of the phase space
cloud changes as molecules collide with each other and the walls.
In the 6N-dimensional phase space {x_1, y_1, z_1, p_{x1}, p_{y1}, p_{z1}, …, x_N, y_N, z_N, p_{xN}, p_{yN}, p_{zN}},
the so-called γ-space (gas space), the initial state of the entire system
is represented by a single point. This single point moves in the 6N-dimensional phase space
as the molecules collide with each other and the walls. The trajectory in this phase space
is complex, but is always constrained to move on a surface in the phase space with
constant energy,
E_N = p_1^2(t_0)/2m_1 + p_2^2(t_0)/2m_2 + ⋯ + p_N^2(t_0)/2m_N .  (4.61)
In reality, for an N-molecule ideal gas system, the exact location and momentum of
each molecule are not known and so we cannot pinpoint a specific location in 6N-
dimensional phase space. We just know that the molecules are randomly distributed in
the volume available to them and that the distribution of velocities obeys the Maxwell-
Boltzmann probability distribution corresponding to the temperature of the system. The
only constraint on the distribution of velocities is that they must satisfy Eq. (4.61). This
knowledge generates a probability distribution in 6N-dimensional phase space which
can be thought of as a cloud of all possible states, with molecules in different locations
and momenta distributed according to the Maxwell-Boltzmann distribution. If the
system is evolving with a constant total energy, all points in this probability cloud in 6N-
dimensional space at time t0 are constrained to move on a constant energy hypersurface.
The details of the time evolution of the probability distribution of molecules in the
6N-dimensional phase space are the subject of classical statistical mechanics. We will
not explicitly discuss Liouville’s equation (developed by Gibbs based on the work of the
French mathematician Joseph Liouville in 1838) which governs the time evolution of the
classical phase space probability distribution. Instead, in Chapter 5, we first approach
statistical mechanics through a quantum mechanical description of the state of a system.
Afterwards, we adapt the formalism to the classical mechanical description.
References
M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions, with Formulas,
Graphs, and Mathematical Tables, Dover Publications, New York, 1972.
P. A. Bromiley, Products and Convolutions of Gaussian Probability Density Functions,
Tina Memo No. 2003-003, Manchester, 2013.
H. Goldstein, C. Poole, and J. Safko, Classical Mechanics, 3rd Ed., Addison Wesley, San
Francisco, 2001.
J. P. Hansen and I. R. McDonald, Theory of Simple Liquids, 3rd Ed., Academic Press, 2005.
N. G. van Kampen, Stochastic Processes in Physics and Chemistry, 3rd Ed., North-Holland, 2007.
D. A. McQuarrie, Statistical Mechanics, Harper &amp; Row, New York, 1976.
The Wikipedia article on Liouville’s theorem has a good animated illustration of the time
evolution of a phase space probability distribution.
See: http://en.wikipedia.org/wiki/Liouville%27s_theorem_(Hamiltonian)
Appendix 4.1 Normalization, Mean, and Standard Deviation of the Gaussian Function
To verify the normalization of the Gaussian function given in Eq. (4.13), we evaluate the
integral I over the entire range of the Cartesian variable x,

I = (α/π)^{1/2} ∫_{−∞}^{+∞} exp[−α(x − x_0)^2] dx .  (A4.1)
This integral is determined by applying a well-known mathematical trick. Instead of I, we
calculate I^2,

I^2 = (α/π) ∫_{−∞}^{+∞} exp[−α(x − x_0)^2] dx ∫_{−∞}^{+∞} exp[−α(y − y_0)^2] dy = (α/π) ∫∫ e^{−α(ξ^2 + ζ^2)} dξ dζ .  (A4.2)
In the second equality, we changed the integration variables to ξ = (x – x0) and ζ = (y –
y0). A coordinate transformation changes this double Cartesian integral to a double
integral over the polar coordinates r and θ,
I^2 = (α/π) ∫_0^{2π} ∫_0^∞ e^{−αr^2} r dr dθ ,  (A4.3)
where the relations r^2 = ξ^2 + ζ^2 and r dr dθ = dξ dζ are used. In polar form, the double
integral is easy to evaluate; integration over θ yields 2π, and integration over r yields
1/2α. This proves that I^2 = 1 and I = 1. In other words, the Gaussian distribution as
defined in Eq. (4.13) is normalized. From these calculations, we see that,
∫_{−∞}^{+∞} e^{−αξ^2} dξ = (π/α)^{1/2} .  (A4.4)
The mean of the Gaussian distribution is given by,

⟨x⟩ = (α/π)^{1/2} ∫_{−∞}^{+∞} x exp[−α(x − x_0)^2] dx = (α/π)^{1/2} ∫_{−∞}^{+∞} (ξ + x_0) e^{−αξ^2} dξ = x_0 .  (A4.5)
The second moment of the Gaussian distribution can be similarly calculated,

⟨x^2⟩ = (α/π)^{1/2} ∫_{−∞}^{+∞} x^2 exp[−α(x − x_0)^2] dx = (α/π)^{1/2} ∫_{−∞}^{+∞} (ξ^2 + 2x_0 ξ + x_0^2) e^{−αξ^2} dξ .  (A4.6)
The first integral on the right hand side of Eq. (A4.6) is solved using integration by parts
with u = ξ and dv = −2αξ exp(−αξ^2) dξ to give,

∫_{−∞}^{+∞} ξ^2 e^{−αξ^2} dξ = (1/2α) ∫_{−∞}^{+∞} e^{−αξ^2} dξ = (1/2α)(π/α)^{1/2} .  (A4.7)
Therefore,

⟨x^2⟩ = 1/(2α) + x_0^2 .  (A4.8)
From Eqs. (A4.5) and (A4.8), the first two central moments of the Gaussian
distribution can be calculated,

⟨x − x_0⟩ = (α/π)^{1/2} ∫_{−∞}^{+∞} (x − x_0) exp[−α(x − x_0)^2] dx = ⟨x⟩ − x_0 = 0 ,  (A4.9)

⟨(x − ⟨x⟩)^2⟩ = ⟨x^2 − 2x⟨x⟩ + ⟨x⟩^2⟩ = ⟨x^2⟩ − ⟨x⟩^2 = 1/(2α) .  (A4.10)
Appendix 4.2 Convolution of Gaussian functions
The mathematical development presented below follows Bromiley.
An interesting property of Gaussian functions is that the product of two Gaussian
functions fa and fb with different means and variance values is also a Gaussian function.
Consider the Gaussian distribution fa with a mean of xa and a standard deviation of σa,
and fb with a mean of xb and a standard deviation of σb,

f_a(x) = (1/(2πσ_a^2)^{1/2}) e^{−(x − x_a)^2/2σ_a^2}  and  f_b(x) = (1/(2πσ_b^2)^{1/2}) e^{−(x − x_b)^2/2σ_b^2} .  (A4.11)
After straightforward but lengthy algebra, the product of the two Gaussian distribution
functions can be shown to be a scaled Gaussian function of the form,

f_ab′(x) = f_a(x) f_b(x) = (S/(2πσ_ab^2)^{1/2}) e^{−(x − x_ab)^2/2σ_ab^2} .  (A4.12)
The mean of the product Gaussian fab′(x) is,

x_ab = (x_a σ_b^2 + x_b σ_a^2)/(σ_a^2 + σ_b^2) .  (A4.13)
The standard deviation of fab′(x) is,

σ_ab = σ_a σ_b/(σ_a^2 + σ_b^2)^{1/2} .  (A4.14)
The product Gaussian is not normalized and has a scaling factor,

S = (1/(2π(σ_a^2 + σ_b^2))^{1/2}) e^{−(x_a − x_b)^2/2(σ_a^2 + σ_b^2)} .  (A4.15)
Consider fa with {x_a = 5, σ_a = 2} and fb with {x_b = 2, σ_b = 3}. Figure A4.1(a) shows the
original Gaussian functions and their product. Equations (A4.13) and (A4.14) give x_ab =
4.077 and σ_ab = 1.664 for the mean and standard deviation of the product distribution, and
the scaling factor S of the product Gaussian is 0.0783. Note that the mean of the product
Gaussian lies between the means of the two separate Gaussians and the standard
deviation of the product Gaussian is smaller than those of the two individual Gaussians.
A commonly encountered case is where the two stochastic variables have identical
Gaussian distributions. In this case Eqs. (A4.13) to (A4.15) simplify to,
x_ab = x_a ;  σ_ab = σ_a/2^{1/2} ;  and  S = 1/(2σ_a π^{1/2}) .  (A4.16)
The non-normalized form of the product Gaussian given in Eq. (A4.12) for this special case is,

f_ab′(x) = (1/(2πσ_a^2)) e^{−(x − x_a)^2/σ_a^2} .  (A4.17)
The distribution functions for this case are shown in Figure A4.1(b).
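The numbers quoted above for the worked example can be verified numerically (a sketch; the grid limits and spacing are arbitrary choices):

```python
import math

# Check Eqs. (A4.13)-(A4.15) for the worked example fa{xa = 5, sa = 2},
# fb{xb = 2, sb = 3} against numerical moments of the pointwise product.
def gauss(x, mu, s):
    return math.exp(-(x - mu) ** 2 / (2 * s * s)) / math.sqrt(2 * math.pi * s * s)

xa, sa, xb, sb = 5.0, 2.0, 2.0, 3.0

x_ab = (xa * sb**2 + xb * sa**2) / (sa**2 + sb**2)    # Eq. (A4.13)
s_ab = sa * sb / math.sqrt(sa**2 + sb**2)             # Eq. (A4.14)
S = gauss(xa, xb, math.sqrt(sa**2 + sb**2))           # Eq. (A4.15)

# Numerical moments of the product fa(x)*fb(x) on a fine grid.
dx = 1e-3
xs = [-20.0 + i * dx for i in range(40000)]
prod = [gauss(x, xa, sa) * gauss(x, xb, sb) for x in xs]
area = sum(p * dx for p in prod)
mean = sum(x * p * dx for x, p in zip(xs, prod)) / area
var = sum((x - mean) ** 2 * p * dx for x, p in zip(xs, prod)) / area

print(round(mean, 3), round(x_ab, 3))             # both 4.077
print(round(math.sqrt(var), 3), round(s_ab, 3))   # both 1.664
print(round(area, 4), round(S, 4))                # both 0.0783
```

The total area under the product curve equals the scaling factor S, which is why the product Gaussian is not normalized.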
Figure A4.1. (a) The product of two Gaussian functions fa(x) (red curve) and fb(x) (green curve) is
another Gaussian function (blue curve) plotted in normalized form. (b) The product of two
identical Gaussian functions (blue curve) is a narrower Gaussian function (green curve), plotted
in normalized form.
The convolution of two distributions was defined in Eq. (4.27). We consider the
convolution of two Gaussian functions fa(xI) and fb(xII) (see Eq. (A4.11)) representing the
stochastic variables xI and xII. Knowing the distributions of the individual variables xI and
xII, we want to determine the distribution function for the joint variable X = xI + xII.
The convolution of the two functions is,
P_2′(X) = ∫_{−∞}^{+∞} f_a(x_I) f_b(X − x_I) dx_I = (1/2πσ_aσ_b) ∫_{−∞}^{+∞} e^{−(x_I − x_a)^2/2σ_a^2} e^{−(X − x_I − x_b)^2/2σ_b^2} dx_I .  (A4.16)
The prime shows that the probability distribution given for X in Eq. (A4.16) is not
normalized. After some algebra, the integral can be calculated. The normalized
distribution for the X variable is,

P_2(X) = (1/(2π(σ_a^2 + σ_b^2))^{1/2}) e^{−[X − (x_a + x_b)]^2/2(σ_a^2 + σ_b^2)} .  (A4.17)
The proof of Eq. (A4.17) involves the use of Fourier transforms of the Gaussian
distributions fa and fb and is given in Bromiley.
The way the convolution formula in Eq. (A4.16) works is illustrated for the case
where {x_I = x_II = 5, σ_I = σ_II = 2} in Figure A4.2. For this case, Eq. (A4.16) becomes,

P_2′(X) = (1/8π) ∫_{−∞}^{+∞} e^{−(x_I − 5)^2/8} e^{−(X − x_I − 5)^2/8} dx_I ,  (A4.18)
For each value of X = x_I + x_II, the integrand is a product of two Gaussian functions, the
first centered at x_I = 5 and the second offset to be centered at x_I = X − 5, as shown
in Figure A4.2. The product Gaussian function in the integrand of Eq. (A4.18) is shown in
blue in Figure A4.2, where the integrand is shown for several values of X. The
product Gaussian function and integrand are largest when X = 10.
From Eq. (A4.17), the normalized probability distribution for the convolution of the
two Gaussian functions is,

P_2(X) = (1/(4π^{1/2})) e^{−(X − 10)^2/16} .  (A4.19)
This implies that it is more probable for X to have values near xI + xII = 10. In other
words, since the individual p1(xI) and p1(xII) distributions are more likely to have values
near their average, the sum value will also not deviate greatly from the sum of the two
averages. It is highly unlikely that the values of x_I and x_II will simultaneously deviate
greatly from their averages, so very large or very small values of X will be less
likely.
In this example, the relative widths of the one-variable distributions are,

σ_a/⟨x_I⟩ = σ_b/⟨x_II⟩ = 2/5 = 0.4 ,  (A4.20)

while the two-variable distribution gives,

σ_X/⟨X⟩ = 2·2^{1/2}/10 = 0.283 = (1/2^{1/2}) (σ_a/⟨x_I⟩) .  (A4.21)
Figure A4.2. The product of the two Gaussian functions fa(x) (red curve) and fb(x) (green curve) is
another Gaussian function, shown in each case by the blue curve. The convolution of the two
distributions for each value of X, P2(X), is related to the area under the blue curve.
Appendix 4.3 The virial equation and the microscopic mechanical view of pressure
We usually think of the laws of mechanics as general, applying to the motion of
masses under all conditions. For systems with periodic motion or for confined systems, a set
of “statistical” laws can relate the time or phase space averages of different quantities.
For an N-particle system, consider the quantity,

G = Σ_{i}^{N} p_i · r_i ,  (A4.22)

called the virial (after the Latin word for force) by Rudolf Clausius in 1860. The reason
for defining this quantity will become clear when we study its properties. The time
derivative of the virial is,
dG/dt = Σ_{i}^{N} (dp_i/dt) · r_i + Σ_{i}^{N} p_i · (dr_i/dt) .  (A4.23)
Using Newton’s second law of motion, the first term is written in terms of the total force
on each particle i,

Σ_{i}^{N} (dp_i/dt) · r_i = Σ_{i}^{N} F_i · r_i .  (A4.24)
The second term can be rewritten in terms of the kinetic energy of the system,

Σ_{i}^{N} p_i · (dr_i/dt) = Σ_{i}^{N} m_i v_i · v_i = Σ_{i}^{N} m_i v_i^2 = 2K .  (A4.25)
The time average of dG/dt in Eq. (A4.23) over a length of time τ is obtained by
integrating the time derivative,

(1/τ) ∫_0^τ (dG/dt) dt = [G(τ) − G(0)]/τ = ⟨Σ_{i}^{N} F_i · r_i⟩ + 2⟨K⟩ .  (A4.26)
There are two sets of conditions under which the left hand side of Eq. (A4.26) is equal to
zero. For systems with periodic motion, at the time of the period τ = t_p, the value of G(τ)
= G(0) and we have,

0 = ⟨Σ_{i}^{N} F_i · r_i⟩ + 2⟨K⟩ .  (A4.27)
For confined systems, as τ → ∞, the values of p_i and r_i remain finite, so G(τ) − G(0) in Eq.
(A4.26) remains finite while the time of sampling can grow infinitely long.
Therefore, for these cases Eq. (A4.27) is also valid, and,

⟨K⟩ = −(1/2) ⟨Σ_{i}^{N} F_i · r_i⟩ .  (A4.28)
Equation (A4.28) is called the virial theorem of classical mechanics and is a rather
amazing result. We will verify its correctness for a few sample periodic systems before
using it to extract an expression for pressure in a microscopic system.
For a harmonic oscillator system with x0 = 0 and mass m, Eq. (A4.28) is written
as,
⟨(1/2) m v^2⟩ = −(1/2)⟨(−kx) x⟩ = ⟨(1/2) k x^2⟩ = ⟨U(x)⟩ .  (A4.29)
This interesting result shows that the average kinetic energy and potential energy of a
harmonic oscillator over a period of the motion are equal. To verify this relation, we use
Eqs. (1.11) and (1.12) explicitly,
⟨(1/2) m v^2⟩ = (1/τ) ∫_0^τ (1/2) m C^2 ω^2 cos^2(ωt + δ) dt = (m C^2 ω^2/4π) ∫_0^{2π} cos^2 φ dφ = m ω^2 C^2/4 = k C^2/4 ,  (A4.30)
⟨(1/2) k x^2⟩ = (1/τ) ∫_0^τ (1/2) k C^2 sin^2(ωt + δ) dt = (k C^2/4π) ∫_0^{2π} sin^2 φ dφ = k C^2/4 ,  (A4.31)

where k = mω^2 and ⟨cos^2⟩ = ⟨sin^2⟩ = 1/2 over a period.
For a mass moving under a Coulombic (or gravitational) force, with potential energy
U(r) = −Q/r, we can use Eq. (A4.28) to obtain the interesting result,

⟨K⟩ = (1/2)⟨(Q/r^2) r⟩ = (1/2)⟨Q/r⟩ = −(1/2)⟨U(r)⟩ ,  (A4.32)
without having to explicitly use the expressions for the speed and position obtained from
the solution of Newton’s equations of motion for this system. Equation (A4.32) describes
the relations between the average kinetic and potential energies for the earth and other
planets in their motion around the sun.
For a confined gas or liquid system, Eq. (A4.28) can be used to derive the
expression for the pressure. From Eq. (4.34), we know that,
⟨K⟩ = ⟨Σ_{i}^{N} (1/2) m_i v_i^2⟩ = (3/2) NkT .  (A4.33)
The force on a molecule in a confined fluid system can have contributions from external
sources, primarily the wall with which the molecule collides, and internal sources, from
the intermolecular interactions. We write the right hand side of Eq. (A4.28) as,

⟨Σ_{i}^{N} F_i · r_i⟩ = ⟨Σ_{i}^{N} (F_{ext,i} + F_{int,i}) · r_i⟩ .  (A4.34)
The external force only applies when molecules are at the surface of the system, through
the agency of the external pressure,

⟨Σ_{i}^{N} F_{ext,i} · r_i⟩ = −∮_S P n · r dA ,  (A4.35)

where n is the outward unit normal to the surface.
Using Gauss’s theorem, which converts the integral over the surface to an integral over
the volume of the confined system, we get,

−∮_S P n · r dA = −P ∫_V (∇ · r) dV = −3PV .  (A4.36)
Substituting Eqs. (A4.33) and (A4.36) into the virial theorem (A4.28) gives,
\[
\frac{3}{2}NkT
= \frac{3}{2}PV - \frac{1}{2}\left\langle \sum_{i=1}^{N}\mathbf{F}_{\mathrm{int},i}\cdot\mathbf{r}_{i}\right\rangle
\qquad (A4.37)
\]
Rearranging Eq. (A4.37) gives the virial equation for the pressure of a fluid system.
\[
PV = NkT + \frac{1}{3}\left\langle \sum_{i=1}^{N}\mathbf{F}_{\mathrm{int},i}\cdot\mathbf{r}_{i}\right\rangle
\]
- Pressure is a scalar quantity in isotropic liquid and gas systems: P = F/A, and the force is the same on a unit area aligned in any direction. In solids, pressure is not necessarily isotropic and should be represented by a tensorial quantity.
- Pressure is affected by intermolecular attractions. Recall the van der Waals equation, where a density-dependent correction to the pressure is introduced.
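In practice, the internal virial sum that enters the pressure equation above is accumulated over pairs rather than particles: for purely pairwise forces, the per-particle sum Σᵢ F_int,i·rᵢ equals the translation-invariant pair form Σ_{i<j} f_ij·r_ij. The sketch below checks this identity for a single configuration of Lennard-Jones particles; the LJ parameters and the configuration are hypothetical illustrative choices, not from the text.

```python
import itertools
import random

# Evaluate the internal virial two ways for one Lennard-Jones configuration:
#   site form:  sum_i F_i . r_i
#   pair form:  sum_{i<j} f_ij . r_ij
# For pairwise forces the two are mathematically identical.
random.seed(1)
eps, sigma = 1.0, 1.0              # illustrative LJ parameters

def lj_force(rij):
    # force on particle i from particle j, where rij = r_i - r_j
    r2 = sum(c * c for c in rij)
    sr6 = (sigma * sigma / r2) ** 3
    fscal = 24 * eps * (2 * sr6 * sr6 - sr6) / r2
    return [fscal * c for c in rij]

# jittered 3x3x3 lattice keeps pair distances reasonable
pos = [[1.5 * i + random.uniform(-0.2, 0.2) for i in cell]
       for cell in itertools.product(range(3), repeat=3)]
N = len(pos)

forces = [[0.0, 0.0, 0.0] for _ in range(N)]
pair_virial = 0.0
for i, j in itertools.combinations(range(N), 2):
    rij = [pos[i][k] - pos[j][k] for k in range(3)]
    f = lj_force(rij)
    for k in range(3):
        forces[i][k] += f[k]
        forces[j][k] -= f[k]       # Newton's third law
    pair_virial += sum(f[k] * rij[k] for k in range(3))

site_virial = sum(forces[i][k] * pos[i][k]
                  for i in range(N) for k in range(3))
print(site_virial, pair_virial)    # identical for purely pairwise forces
```

The pair form is what simulation codes normally use, since it does not depend on the choice of coordinate origin.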
Appendix 4.4 Some useful mathematical relations and integral formulas
Stirling’s Approximation for N!
The factorial of a positive integer N is defined as,
\[
N! = 1\cdot 2\cdot 3\cdots (N-1)\cdot N
\qquad (A4.38)
\]
along with the definition 0! = 1.
For the factorial of very large integers, Stirling’s approximation for ln(N!) can be used,
\[
\ln N! = \sum_{m=1}^{N}\ln m \approx \int_{1}^{N}\ln x\,dx = N\ln N - N
\qquad (A4.39)
\]
A more exact expression for the factorial of large integers is,
\[
\ln N! \approx N\ln N - N + \tfrac{1}{2}\ln(2\pi N)
\qquad (A4.40)
\]
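A quick numerical comparison shows how the two Stirling forms behave. The sketch below evaluates ln N! exactly (via the log-gamma function) for N = 100 and compares it with both approximations.

```python
import math

# Compare ln N! with the Stirling forms of Eqs. (A4.39) and (A4.40) for N = 100.
N = 100
exact = math.lgamma(N + 1)                           # ln N! without overflow
simple = N * math.log(N) - N                         # Eq. (A4.39)
refined = simple + 0.5 * math.log(2 * math.pi * N)   # Eq. (A4.40)
print(exact, simple, refined)
```

The refined form (A4.40) agrees with ln N! to better than one part in ten thousand already at N = 100, while the simple form is off by a few units.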
Exponential integrals
\[
A_{n} = \int_{0}^{\infty} x^{n}e^{-ax}\,dx = \frac{n!}{a^{n+1}}
\qquad \text{for integer } n
\qquad (A4.41)
\]
\[
A_{1/2} = \int_{0}^{\infty} x^{1/2}e^{-ax}\,dx = \frac{\pi^{1/2}}{2a^{3/2}}
\qquad (A4.42)
\]
\[
A_{3/2} = \int_{0}^{\infty} x^{3/2}e^{-ax}\,dx = \frac{3\pi^{1/2}}{4a^{5/2}}
\qquad (A4.43)
\]
\[
A_{5/2} = \int_{0}^{\infty} x^{5/2}e^{-ax}\,dx = \frac{15\pi^{1/2}}{8a^{7/2}}
\qquad (A4.44)
\]
Generally,
\[
A_{n} = \int_{0}^{\infty} x^{n}e^{-ax}\,dx = \frac{\Gamma(n+1)}{a^{n+1}}
\qquad \text{for half-integer } n
\qquad (A4.45)
\]
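These integral formulas are easy to spot-check numerically. The sketch below compares a midpoint-rule quadrature with Γ(n + 1)/a^(n+1) for one integer and one half-integer exponent; a = 1.5 is an arbitrary illustrative choice.

```python
import math

# Numerical spot-check of Eq. (A4.45): A_n = Gamma(n+1)/a^(n+1).
def A(n, a, upper=60.0, m=200_000):
    # midpoint-rule estimate of the integral from 0 to infinity
    # (the integrand is negligible beyond the upper cutoff)
    h = upper / m
    return sum(((i + 0.5) * h) ** n * math.exp(-a * (i + 0.5) * h)
               for i in range(m)) * h

a = 1.5
results = {n: (A(n, a), math.gamma(n + 1) / a ** (n + 1)) for n in (3, 2.5)}
print(results)
```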
where the Γ function is a generalization of the factorial function to non-integer values.
Gaussian integrals
\[
I_{0} = \int_{0}^{\infty} e^{-x^{2}}\,dx = \frac{\pi^{1/2}}{2}
\qquad (A4.46)
\]
\[
I_{n} = \int_{0}^{\infty} x^{n}e^{-ax^{2}}\,dx
= \frac{1\cdot 3\cdot 5\cdots(n-1)}{2^{(n/2)+1}}\,\frac{\pi^{1/2}}{a^{(n+1)/2}}
\qquad n \text{ even}
\qquad (A4.47)
\]
\[
I_{n} = \int_{0}^{\infty} x^{n}e^{-ax^{2}}\,dx
= \frac{\left[(n-1)/2\right]!}{2\,a^{(n+1)/2}}
\qquad n \text{ odd}
\qquad (A4.48)
\]
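The even- and odd-n Gaussian formulas can be verified the same way. The sketch below checks n = 4 and n = 5 against a midpoint-rule quadrature for the illustrative choice a = 2.

```python
import math

# Spot-check of Eqs. (A4.47) and (A4.48) by numerical quadrature, a = 2.
def I(n, a, upper=8.0, m=200_000):
    # midpoint-rule estimate of the integral from 0 to infinity
    h = upper / m
    return sum(((i + 0.5) * h) ** n * math.exp(-a * ((i + 0.5) * h) ** 2)
               for i in range(m)) * h

a = 2.0
# even n = 4: I_4 = (1*3) / 2^(4/2 + 1) * pi^(1/2) / a^(5/2)
even_formula = 3 / 2 ** 3 * math.sqrt(math.pi) / a ** 2.5
even_num = I(4, a)
# odd n = 5: I_5 = ((5-1)/2)! / (2 a^((5+1)/2)) = 2! / (2 a^3)
odd_formula = math.factorial(2) / (2 * a ** 3)
odd_num = I(5, a)
print(even_num, even_formula, odd_num, odd_formula)
```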
Beta function integral
For integer values,
\[
B(m,n) = \int_{0}^{1} x^{m}(1-x)^{n}\,dx = \frac{m!\,n!}{(m+n+1)!}
\qquad (A4.49)
\]
For half-integer values, this integral becomes,
\[
B(m,n) = \int_{0}^{1} x^{m}(1-x)^{n}\,dx = \frac{\Gamma(m+1)\,\Gamma(n+1)}{\Gamma(m+n+2)}
\qquad (A4.50)
\]
For the special case of m = n = ½, Eq. (A4.50) reduces to
\[
B\!\left(\tfrac{1}{2},\tfrac{1}{2}\right)
= \int_{0}^{1} x^{1/2}(1-x)^{1/2}\,dx
= \frac{\left[\Gamma\!\left(\tfrac{3}{2}\right)\right]^{2}}{\Gamma(3)}
= \frac{\pi}{8}
\qquad (A4.51)
\]
For half-integer arguments, the Γ function is given by,
\[
\Gamma\!\left(n+\tfrac{1}{2}\right) = \frac{1\cdot 3\cdot 5\cdots(2n-1)}{2^{n}}\,\pi^{1/2}
\qquad (A4.52)
\]
and for integer values,
\[
\Gamma(n+1) = n!
\qquad (A4.53)
\]
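Both the Beta-function special case and the half-integer Γ formula can be checked numerically. The sketch below evaluates B(½, ½) by quadrature and compares Eq. (A4.52) for n = 3 against the library Gamma function.

```python
import math

# Check Eq. (A4.51), B(1/2, 1/2) = pi/8, by quadrature, and Eq. (A4.52)
# for n = 3 against math.gamma.
m = 1_000_000
h = 1.0 / m
B_half = sum(math.sqrt(((i + 0.5) * h) * (1 - (i + 0.5) * h))
             for i in range(m)) * h

n = 3
# 1*3*...*(2n-1) / 2^n * sqrt(pi) for n = 3
gamma_formula = (1 * 3 * 5) / 2 ** n * math.sqrt(math.pi)
print(B_half, math.pi / 8, gamma_formula, math.gamma(n + 0.5))
```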
Appendix 4.5 Energy distribution for three molecules
To obtain the three-molecule energy distribution, a convolution of the two-molecule energy probability distribution with the one-molecule energy probability distribution is performed,
\[
P_{3}(E_{3})
= \int_{0}^{\infty}\!\!\int_{0}^{\infty}\!\!\int_{0}^{\infty}
P(\varepsilon_{\mathrm{I}})\,P(\varepsilon_{\mathrm{II}})\,P(\varepsilon_{\mathrm{III}})\,
\delta(E_{3}-\varepsilon_{\mathrm{I}}-\varepsilon_{\mathrm{II}}-\varepsilon_{\mathrm{III}})\,
d\varepsilon_{\mathrm{I}}\,d\varepsilon_{\mathrm{II}}\,d\varepsilon_{\mathrm{III}}
= \int_{0}^{E_{3}} P_{2}(E_{3}-\varepsilon_{\mathrm{III}})\,P_{1}(\varepsilon_{\mathrm{III}})\,d\varepsilon_{\mathrm{III}}
\qquad (A4.54)
\]
Substituting the normalized one-molecule energy distribution given in Eq. (4.48) and the two-molecule energy distribution from Eq. (4.52), and evaluating the integral in Eq. (A4.54) using Eq. (A4.50) of Appendix 4.4, gives,
\[
P_{3}(E_{3})
= \frac{e^{-E_{3}/kT}}{\pi^{1/2}(kT)^{9/2}}
\int_{0}^{E_{3}} \varepsilon^{2}(E_{3}-\varepsilon)^{1/2}\,d\varepsilon
= \frac{16}{105\,\pi^{1/2}}\,\frac{1}{(kT)^{9/2}}\,E_{3}^{7/2}\,e^{-E_{3}/kT}
\qquad (A4.55)
\]
From this distribution, the average three-particle energy is,
\[
\langle E_{3}\rangle
= \int_{0}^{\infty} E_{3}\,P_{3}(E_{3})\,dE_{3}
= \frac{16}{105\,\pi^{1/2}(kT)^{9/2}}\int_{0}^{\infty} E_{3}^{9/2}\,e^{-E_{3}/kT}\,dE_{3}
= \frac{16}{105\,\pi^{1/2}(kT)^{9/2}}\cdot\frac{945\,\pi^{1/2}}{32}\,(kT)^{11/2}
= \frac{9}{2}\,kT
\qquad (A4.56)
\]
and the most probable energy is Emax = 7kT/2.
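The three-molecule distribution and its moments can be checked numerically. The sketch below (in units with kT = 1) verifies that the distribution in Eq. (A4.55) is normalized, has mean energy 9kT/2, and peaks at the most probable energy 7kT/2.

```python
import math

# Numerical check of Eqs. (A4.55)-(A4.56) with kT = 1: the three-molecule
# distribution P3(E) = 16/(105 sqrt(pi)) E^(7/2) e^(-E) should be
# normalized, have mean 9/2, and peak at E = 7/2.
kT = 1.0
c = 16 / (105 * math.sqrt(math.pi) * kT ** 4.5)

def P3(E):
    return c * E ** 3.5 * math.exp(-E / kT)

upper, m = 50.0, 300_000           # upper cutoff where P3 is negligible
h = upper / m
xs = [(i + 0.5) * h for i in range(m)]
vals = [P3(E) for E in xs]
norm = sum(vals) * h               # midpoint-rule normalization integral
mean = sum(E * v for E, v in zip(xs, vals)) * h
mode = xs[max(range(m), key=vals.__getitem__)]
print(norm, mean, mode)            # expect ~1, ~4.5 (= 9kT/2), ~3.5 (= 7kT/2)
```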
This procedure can be repeated to get the n-molecule energy distribution for any
number of molecules n from a collection of N molecules.