Introduction to Molecular Simulation, Chapter 4: Probability Theory
CHM 4390c/8309f,g, The University of Ottawa
© Saman Alavi
Probability Theory in Molecular Simulations, Extended Lecture Notes 4
4. Probability Theory
4.0.1 Deterministic and stochastic processes
The laws of physics and chemistry allow the behavior of some systems to be predicted
with high accuracy. For example, if the initial position and velocity of a satellite are
known, the laws of mechanics allow the prediction of positions and velocities at all times
in the future. For such cases, the laws of classical mechanics are said to be deterministic.
In stochastic (Greek: stochastikos “skillful in aiming”) systems, future states can at
best be predicted with limited accuracy. Different factors cause a system to become
stochastic. In principle, the laws governing the behavior of the system may be known, but
uncertainty in the initial conditions may prevent the prediction of its future state with
sufficient accuracy. This is the case in games of chance that use dice or roulette
wheels. If we knew the mass and shape of the dice, the initial velocity with which they are
thrown, the air temperature and flow at the time of the throw, the hardness of the surface on
which the dice bounce, etc., we would in principle be able to predict the faces of the dice that come
up. The difficulty of this analysis and the sensitive dependence of the outcome of a throw
on slight uncontrollable variations in the conditions of the experiment usually prevent us
from predicting the outcome of a throw with certainty.
Quantum mechanical systems, on the other hand, are inherently stochastic in
predicting mechanical variables. Heisenberg’s uncertainty principle states that there is an
intrinsic limit to the accuracy with which initial mechanical variables can be determined
for atomic-scale particles. As a result, future states, as characterized by the mechanical
variables position and momentum, are in principle unknowable (or undefined) with
complete certainty. Quantum mechanics is deterministic in predicting the wave function
of a system at a future time from knowledge of its present wave function;
however, that determinism does not translate into determinism of the mechanical variables.
Concepts of probability theory are used to describe the extent to which predictions
can be made about the behavior of stochastic systems. Assigning probabilities to the possible
outcomes of an event, through knowledge of the probability distribution, is the best that
can be done in these cases.
Probabilities can describe the outcomes of discrete (countable) events like the flip
of a coin, the roll of dice, the spin of a roulette wheel, or the spin state of an electron. The
“event space” in these cases is a limited set of possible outcomes. Probabilities can also
be used to describe the outcome of events associated with continuous variables. For
example, if we are predicting the speed of a particular molecule in a gas sample or the
height of a person on a college campus, the event space varies continuously.
4.0.2 Probability theory in statistical mechanics
Statistical mechanics provides another context in which probability theory is used.
Statistical mechanics gives probability distributions which relate averages of microscopic
variables to macroscopic observables, taking into account the type of interactions of the
system and the surroundings. These probability distributions are derived to give results
that are consistent with the macroscopic laws of thermodynamics.
From a macroscopic point of view, we often do not need the level of detailed
knowledge of a system that a full microscopic deterministic description provides. For
example, a classical molecular dynamics simulation can determine the positions and
velocities of molecules at all times in the future from knowledge of their present state.
However, this knowledge is too detailed and often we summarize this information as
probability distributions for velocities or spatial distributions of molecules as a way of
capturing system-wide properties. In molecular dynamics, the calculated probability
distributions of mechanical variables must be consistent with the properties of the
probability distribution required by statistical mechanics. As we shall see in future
chapters, this provides guidance on how to couple the results of mechanical variables in
molecular dynamics simulations to environmental variables.
After a short introduction to the concepts of probability theory and the notation used,
the connection between averages over probability distributions and macroscopic quantities
will be demonstrated. We will see how well-defined macroscopic variables arise from
microscopic descriptions of systems with a sufficiently large number of molecules.
4.1. Single variable probability distributions
4.1.1. Discrete stochastic variables
Stochastic processes that have a finite “event space” are described by possible
outcomes, ε(1), ε(2), …, ε(ν), and their associated discrete probabilities p(1), p(2), …,
p(ν). The event space and associated probabilities constitute the probability distribution,

    ε(1)  ε(2)  ε(3)  …  ε(ν)
    p(1)  p(2)  p(3)  …  p(ν) .   (4.1)
The numerical values chosen to represent each event ε(i) and the probabilities assigned to
each event depend on intrinsic features of the system. The values of individual
probabilities are zero or positive numbers and are often scaled such that the probability
distribution is normalized, i.e., the sum of all probabilities is one,

    Σ_{i=1}^{ν} p(i) = 1 .   (4.2)
Discrete probability distributions are encountered in games of chance, but many
natural phenomena also have discrete event spaces. Examples of discrete probability
distributions are:
In flipping a coin, the two possible events are scoring heads or tails. The
numerical value assigned for a heads toss is ε(1) = 1 and for a tails toss is ε(2) = –1. If
the coin is fair, we assign p(1) = p(2) = ½.
In the roll of a die, there are six possible events. The roll of a 1 is assigned ε(1) =
1, the roll of a 2 is assigned ε(2) = 2, and so on up to ε(6) = 6. For a fair die, we assign
p(1) = p(2) = … = p(6) = 1/6.
A one-dimensional random walk, representing diffusion of a particle in a medium, is
composed of individual steps of the particle to the left, ε(1) = –1, or to the
right, ε(2) = +1. For an unbiased random walk, p(1) = p(2) = ½. In a biased walk
(for example, the diffusion of an ion in an electric field in electrophoresis), the
probabilities of motion to the left and right are not identical and p(1) ≠ p(2). A
random walk with n steps is described by the binomial distribution.
In bingo, for the first draw, drawing a 1 is represented by ε(1) = 1, drawing a 2
by ε(2) = 2, …, and drawing a 75 by ε(75) = 75. If all the bingo
balls are identical, p(1) = p(2) = … = p(75) = 1/75. The probabilities for drawing
subsequent numbers change: after drawing the first ball, the event space
becomes smaller and has 74 members.
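The random-walk example above is easy to simulate; the sketch below (the step count, bias value, and seeds are arbitrary illustrative choices, not taken from the text) compares the mean displacement of unbiased and biased walks:

```python
import random

def random_walk(n_steps, p_right=0.5, seed=None):
    """Net displacement after n_steps steps of +1 (probability p_right) or -1."""
    rng = random.Random(seed)
    return sum(1 if rng.random() < p_right else -1 for _ in range(n_steps))

# Unbiased walk: mean displacement ~ 0.
unbiased = [random_walk(100, p_right=0.5, seed=i) for i in range(2000)]
mean_unbiased = sum(unbiased) / len(unbiased)

# Biased walk (e.g. an ion drifting in a field): mean ~ n*(2p - 1) = 20.
biased = [random_walk(100, p_right=0.6, seed=i) for i in range(2000)]
mean_biased = sum(biased) / len(biased)
```

Averaged over many walks, the unbiased mean displacement fluctuates around zero, while the bias shifts the mean to n(2p − 1).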
Probability distributions are characterized by a number of mathematical properties.
The probability distribution gives the average (mean, expectation value) associated with
observations of an event,

    ⟨ε⟩ = [ε(1)p(1) + ε(2)p(2) + … + ε(ν)p(ν)] / [p(1) + p(2) + … + p(ν)] .   (4.3)
For example, the average or expectation value for the roll of a die is

    ⟨ε⟩ = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 3.5 .

Note that the average value ⟨ε⟩ of a
probability distribution may not correspond to an actual outcome of any individual event
ε(i). Calculating an expectation value is analogous to determining the center of mass of a
collection of objects located at r_i and with masses m_i,

    r_cm = (m_1 r_1 + m_2 r_2 + …) / (m_1 + m_2 + …) ,   (4.4)

where the probabilities play the role of the statistical weights of the ε(i) in the distribution.
An average can be calculated for any mathematical function of the
event space, f[ε(i)],

    ⟨f(ε)⟩ = [f(ε(1))p(1) + f(ε(2))p(2) + … + f(ε(ν))p(ν)] / [p(1) + p(2) + … + p(ν)] .   (4.5)

For a normalized distribution, Eq. (4.5) is written more compactly as
⟨f(ε)⟩ = Σ_{i=1}^{ν} f[ε(i)] p(i). If f[ε(i)] = ε(i)^m, the resulting average is called the m-th moment
of the distribution,

    ⟨ε^m⟩ = Σ_{i=1}^{ν} ε(i)^m p(i) .   (4.6)
For example, with m = 2 for a die we have ⟨ε²⟩ = 91/6 ≈ 15.167. The m-th central moment of a
distribution is the average of the m-th power of the difference of each value from the
average,

    ⟨(ε − ⟨ε⟩)^m⟩ = Σ_{i=1}^{ν} [ε(i) − ⟨ε⟩]^m p(i) .   (4.7)
By definition of the average, the first central moment of any distribution is zero. The
second central moment is called the variance, the square root of which is the standard
deviation, usually shown as σ. The standard deviation is a measure of how much, on
average, an individual measurement will differ from the mean of the distribution. For example,
for a die, ⟨(ε − ⟨ε⟩)²⟩ = 2.9167 and σ = 1.7078, which means that the outcome of a
roll of a die will differ, on average, by about 1.7 from the expectation value of 3.5. The standard deviation
is a measure of the width or spread of a probability distribution.
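The die averages quoted above can be verified directly from Eqs. (4.3), (4.6), and (4.7); a minimal sketch:

```python
# Moments of the fair-die distribution: events 1..6, each with p = 1/6.
events = [1, 2, 3, 4, 5, 6]
p = [1 / 6] * 6

mean = sum(e * w for e, w in zip(events, p))                  # Eq. (4.3): 3.5
second_moment = sum(e**2 * w for e, w in zip(events, p))      # Eq. (4.6): 91/6 ~ 15.167
variance = sum((e - mean)**2 * w for e, w in zip(events, p))  # Eq. (4.7), m = 2: 35/12 ~ 2.9167
std_dev = variance ** 0.5                                     # ~ 1.7078
```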
4.1.2. Continuous stochastic variables
Some stochastic processes have a continuous range of outcomes (events)
represented by the variable x. In these cases, the events are characterized by a
continuous probability distribution p(x). The continuous variable x can have
values within a range [x_min, x_max] which can be infinite, (–∞, +∞), semi-infinite, such as
[0, +∞), or bounded, such as [0, 1].
The event x is represented as a point on the real number line between the lower and
upper limits. For continuous variables, there are an infinite number of x values in a range
[xmin,xmax] and instead of determining probabilities associated with observing the exact
value of x, the probability of observing the variable over a narrow range between x and
x+dx, shown as p(x)dx is used. The probability of obtaining the exact value of x is
vanishingly small which is why a small range dx around each value of x is considered to
determine a finite probability. Similar to discrete probabilities, the probabilities
associated with continuous variables must also be zero or positive over the range of x.
The probabilities for continuous variables are often scaled to give a normalized
distribution, meaning that the probability distribution integrated over the range of the variable
equals 1,

    ∫_{x_min}^{x_max} p(x) dx = 1 .   (4.8)
The average or mean of a function of the variable x, f(x), is defined over the same
range as,

    ⟨f(x)⟩ = ∫_{x_min}^{x_max} f(x) p(x) dx .   (4.9)
The simplest case is the average value of the variable x itself,

    ⟨x⟩ = ∫_{x_min}^{x_max} x p(x) dx .   (4.10)
If f(x) = x^m, the resulting average is called the m-th moment of the distribution,

    ⟨x^m⟩ = ∫_{x_min}^{x_max} x^m p(x) dx .   (4.11)
The m-th central moment of a distribution is defined as,

    ⟨(x − ⟨x⟩)^m⟩ = ∫_{x_min}^{x_max} (x − ⟨x⟩)^m p(x) dx .   (4.12)
For m = 2, the value ⟨(x − ⟨x⟩)²⟩ is called the variance of the distribution, and
σ = [⟨x²⟩ − ⟨x⟩²]^{1/2} is the standard deviation.
A widely encountered continuous distribution is the Gaussian (“normal” or bell-curve)
distribution function,

    p(x) = (α/π)^{1/2} exp[−α(x − x_0)²] ,   (4.13)
which is shown in Figure 4.1. In the Gaussian distribution, the range of the variable x is
from –∞ to +∞ and the parameter α characterizes the width of the distribution. The
normalization of the Gaussian distribution function in Eq. (4.13) is proven in Appendix
4.1. For the Gaussian distribution, in Appendix 4.1 we also show that ⟨x⟩ = x_0 and
⟨(x − ⟨x⟩)²⟩ = 1/2α = σ². It is convenient to write the Gaussian distribution function in terms of the
standard deviation,
    p(x) = (1/(2πσ²)^{1/2}) exp[−(x − x_0)²/2σ²] .   (4.14)
Figure 4.1. The Gaussian distribution
function with three values of the α-
parameter. The average value of the
distribution remains constant, but the
width (and therefore standard deviation) of
the distribution increases for smaller α
values.
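The quoted properties of the Gaussian distribution (normalization, mean x0, variance 1/2α) can be checked numerically; the sketch below uses a simple rectangle-rule quadrature with arbitrary illustrative parameters x0 = 1 and α = 2:

```python
import math

def gaussian(x, x0=1.0, alpha=2.0):
    """Normalized Gaussian, Eq. (4.13): sqrt(alpha/pi) * exp(-alpha*(x - x0)**2)."""
    return math.sqrt(alpha / math.pi) * math.exp(-alpha * (x - x0) ** 2)

def central_moment(m, x0=1.0, alpha=2.0, half_width=10.0, n=200001):
    """Rectangle-rule estimate of the m-th central moment over [x0-hw, x0+hw]."""
    dx = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        x = x0 - half_width + i * dx
        total += (x - x0) ** m * gaussian(x, x0, alpha) * dx
    return total

norm = central_moment(0)      # ~1: the distribution is normalized
first = central_moment(1)     # ~0: the mean is x0
variance = central_moment(2)  # ~1/(2*alpha) = 0.25 = sigma**2
```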
4.2. Multivariable Distributions: Independent Variables and Convolution
The behavior of many systems depends on more than one stochastic variable. To
characterize the stochastic behavior of the system as a whole, behavior of the individual
stochastic variables must be known. Before giving the general description of how this is
achieved, consider the familiar example of predicting the total value of the roll of two
dice.
When two dice, labelled I and II, are rolled, the total value, E2, is a number ranging
between 2 and 12, which depends on the roll values of the individual dice, ε_I(i)
and ε_II(i). There are 36 possible outcomes for the roll of the two dice, which are
summarized in Table 4.1. In anticipation of future use, E2 is called the macrostate of
the process, and the combination of the states of the individual dice (ε_I, ε_II) the
corresponding microstate of the event. As seen in Table 4.1, different microstates can
lead to the same macrostate. The number of microstates associated with a macrostate is
called the “degeneracy”, Ω(E2), of that macrostate.
Table 4.1. The 36 possible outcomes (microstates) for the roll of two dice and the eleven
corresponding overall outcomes (macrostates).

    Microstates (ε_I, ε_II)                  Macrostate E2   Degeneracy Ω(E2)
    (1,1)                                          2                1
    (1,2) (2,1)                                    3                2
    (1,3) (3,1) (2,2)                              4                3
    (1,4) (4,1) (2,3) (3,2)                        5                4
    (1,5) (5,1) (2,4) (4,2) (3,3)                  6                5
    (1,6) (6,1) (2,5) (5,2) (3,4) (4,3)            7                6
    (2,6) (6,2) (3,5) (5,3) (4,4)                  8                5
    (3,6) (6,3) (4,5) (5,4)                        9                4
    (4,6) (6,4) (5,5)                             10                3
    (5,6) (6,5)                                   11                2
    (6,6)                                         12                1
If the dice are fair, all 36 microstates are equally probable. However, the eleven
macrostates are not equally probable, and those having a greater number of associated
microstates (degeneracies) occur with greater probability. For example, rolling the
macrostate E2 = 6 (with five associated microstates) is five times more probable than
rolling the macrostate E2 = 12 (with one associated microstate).
The probability for the two-dice macrostate is written as P2(E2), where the subscript
2 denotes that two stochastic variables determine the probability. We can similarly study
macrostates EN and the corresponding probability distributions, PN(EN), for rolls of N dice,
which depend on the event probabilities of each of the N individual dice, ε_I(i), ε_II(i), …,
ε_N(i). If the N dice are fair, the probability distributions p_α(i) will be identical. The
probabilities p1(ε) to P5(E) for the roll of 1 to 5 dice are given in Table 4.2 and plotted in
Figure 4.2.
Figure 4.2. The normalized probability distributions for outcomes of the rolls of one die (black),
two (red), three (green), four (blue), and five (orange) dice. In accordance with the central limit
theorem, as the number of dice increases, the probability distribution of the macrostate variable X
approaches the Gaussian form.
Table 4.2. The value of the roll of the dice and corresponding probabilities for 1 to 5 dice.

One die (probability = count/6):
    X:     1  2  3  4  5  6
    count: 1  1  1  1  1  1

Two dice (probability = count/36):
    X:     2  3  4  5  6  7  8  9  10  11  12
    count: 1  2  3  4  5  6  5  4   3   2   1

Three dice (probability = count/216):
    X:     3  4  5   6   7   8   9  10  11  12  13  14  15  16  17  18
    count: 1  3  6  10  15  21  25  27  27  25  21  15  10   6   3   1

Four dice (probability = count/1296):
    X:     4  5   6   7   8   9  10   11   12   13   14   15   16   17  18  19  20  21  22  23  24
    count: 1  4  10  20  35  56  80  104  125  140  146  140  125  104  80  56  35  20  10   4   1

Five dice (probability = count/7776):
    X:     5  6   7   8   9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25  26  27  28  29  30
    count: 1  5  15  35  70  126  205  305  420  540  651  735  780  780  735  651  540  420  305  205  126  70  35  15   5   1
For stochastic systems where a continuous macrostate variable, X, depends on a
combination of N stochastic continuous variables xI, xII, …, xN, the probability
distribution (probability density), PN(X) gives the probability PN(X)dX of observing the
macrostate variable in the range of X to X+dX with the subscript N denoting a probability
that is dependent on N “microscopic” stochastic variables. In the general case, stochastic
variables xI, xII, …, xN can have different normalized probability distributions, f1(xI),
g1(xII), …, h1(xN), where the subscript 1 designates the one-variable probability
distributions. If the N stochastic variables are independent of one another, so that the
probability distribution of each variable is not affected by the values of the other variables, the
probability of the macrostate variable X is,

    P_N(X) = Σ′_{x_I, x_II, …, x_N} f1(x_I) g1(x_II) ⋯ h1(x_N) .   (4.15)
The prime on the summation indicates that the sum only applies to combinations of the x_I,
x_II, …, x_N which give the specific value of the macrostate X. The simplest case of Eq.
(4.15) is when the distribution functions of all N variables are identical and f1(x_I) = f1(x_II)
= … = f1(x_N).
A simple but important case is when the macrostate variable X is the sum of the
N variables, X = x_I + x_II + … + x_N. The total kinetic energy EN of an ideal gas system of N
molecules is a physical example of this case: if each molecule has an energy ε_i, then
EN = ε_I + ε_II + … + ε_N.
If the variables x_I, …, x_N vary between x_I^min and x_I^max, …, x_N^min and x_N^max, the range of
the sum macrostate variable X lies between X^min = x_I^min + x_II^min + … + x_N^min
and X^max = x_I^max + x_II^max + … + x_N^max. Similar to Eq. (4.9), the average of a function of the
macrostate sum variable X is,

    ⟨f(X)⟩ = ∫_{X_min}^{X_max} f(X) P_N(X) dX .   (4.16)
For the average of the macrostate variable itself, f(X) = X,

    ⟨X⟩ = ∫_{X_min}^{X_max} X P_N(X) dX .   (4.17)
This average is related to the distribution functions of the individual variables.
Substituting Eq. (4.15) into Eq. (4.17) gives,

    ⟨X⟩ = ∫_{x_I^min}^{x_I^max} ⋯ ∫_{x_N^min}^{x_N^max} (x_I + ⋯ + x_N) f1(x_I) ⋯ h1(x_N) dx_I ⋯ dx_N .   (4.18)
By expanding the integral and separating variables, the average of the macrostate variable
X simplifies to,

    ⟨X⟩ = ⟨x_I⟩ + ⟨x_II⟩ + ⋯ + ⟨x_N⟩ = Σ_{i=1}^{N} ⟨x_i⟩ .   (4.19)
Similarly, the variance of the macrostate variable X is the sum of the variances of the
individual variables x_i,

    ⟨(X − ⟨X⟩)²⟩ = ⟨[(x_I − ⟨x_I⟩) + (x_II − ⟨x_II⟩) + ⋯ + (x_N − ⟨x_N⟩)]²⟩
                 = Σ_{i=1}^{N} ⟨(x_i − ⟨x_i⟩)²⟩ + Σ_{i≠j} ⟨(x_i − ⟨x_i⟩)(x_j − ⟨x_j⟩)⟩
                 = Σ_{i=1}^{N} ⟨(x_i − ⟨x_i⟩)²⟩ .   (4.20)

The terms in the second row are obtained by rearranging the terms after squaring. For
independent variables, each cross term factorizes as ⟨x_i − ⟨x_i⟩⟩⟨x_j − ⟨x_j⟩⟩ and vanishes,
since the first central moment of any distribution is zero.
For an N-variable system where all variables x_i have identical distributions,
f1 = g1 = ⋯ = h1, the average and variance of the sum variable reduce to simple forms,

    ⟨X⟩ = N⟨x_I⟩ ,
    σ_N² = ⟨(X − ⟨X⟩)²⟩ = N⟨(x_I − ⟨x_I⟩)²⟩ = Nσ_1² .   (4.21)
The second relation in Eq. (4.21) can be used as a test of whether the variables x_I,
x_II, …, x_N are uncorrelated. The relations obtained in Eqs. (4.21), while simple, are
extremely important. For example, we have,

    σ_N / ⟨X⟩ = (√N σ_1) / (N⟨x_I⟩) = (1/√N)(σ_1 / ⟨x_I⟩) .   (4.22)
Equation (4.22) shows that, relative to the mean of the macrostate variable X, the standard
deviation σ_N becomes narrower as N increases. An explicit illustration of this behavior is
seen in the molecular kinetic energy distributions discussed below. Note that Eqs. (4.18)-(4.22)
hold regardless of the specific form of the probability distributions of the
individual variables!
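Eqs. (4.21) and (4.22) can be tested by Monte Carlo sampling; the sketch below uses an exponential one-variable distribution (an arbitrary non-Gaussian choice, with mean τ and variance τ²; N, τ, and the sample count are illustrative) to confirm that ⟨X⟩ = N⟨x_I⟩, σ_N² = Nσ_1², and σ_N/⟨X⟩ ≈ 1/√N:

```python
import random

# Monte Carlo check of Eqs. (4.21)-(4.22) for X = x_1 + ... + x_N with
# i.i.d. x_i drawn from an exponential distribution (mean tau, variance tau**2).
rng = random.Random(0)
N, n_samples, tau = 50, 20000, 2.0

sums = [sum(rng.expovariate(1.0 / tau) for _ in range(N)) for _ in range(n_samples)]

mean_X = sum(sums) / n_samples                            # Eq. (4.21): ~ N*tau = 100
var_X = sum((s - mean_X) ** 2 for s in sums) / n_samples  # Eq. (4.21): ~ N*tau**2 = 200
rel_width = var_X ** 0.5 / mean_X                         # Eq. (4.22): ~ 1/sqrt(50) ~ 0.141
```

The same check works with any one-variable distribution of finite variance, which is the point of Eqs. (4.18)-(4.22).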
An important result, called the central limit theorem, applies to systems with large
numbers of stochastic variables. This theorem states that if the individual one-variable
distributions f1(x_i) have proper boundary conditions (are finite at the limiting values) and
finite variances σ_1², then for large numbers of variables the N-variable probability
distribution P_N(X) becomes Gaussian,

    lim_{N→∞} P_N(X) = (1/(2πσ_N²)^{1/2}) exp[−(X − ⟨X⟩)²/2σ_N²] .   (4.23)
The central limit theorem is independent of the functional form of the one-variable
distributions p1(x_i) and even applies to discrete distributions, such as the probability
distribution for the total value of the rolls of N dice, PN(E), which becomes Gaussian for large
N. The operation of the central limit theorem can be seen in the probability distribution for the
sum roll of five dice, P5(X), in Fig. 4.2, which is already starting to look Gaussian.
The mathematical procedure for calculating the probability distribution function for
the variable X involves mixing, or convolution, of the one-variable distribution functions. The
convolution procedure is illustrated for X = x_I + x_II with a distribution function P2(X).
Different combinations of x_I and x_II values can give rise to the same X, and the probability
of observing the sum to be between X and X + dX is,

    P2(X) dX = ∫∫_{X ≤ x_I + x_II ≤ X + dX} P2(x_I, x_II) dx_I dx_II
             = [∫_{x_I^min}^{x_I^max} ∫_{x_II^min}^{x_II^max} δ(x_I + x_II − X) P2(x_I, x_II) dx_II dx_I] dX
             = [∫_{x_I^min}^{x_I^max} P2(x_I, X − x_I) dx_I] dX .   (4.24)
The subscript on the integrals in the first line shows that the integration limits are
constrained such that x_I + x_II always remains between X and X + dX. This constraint can be
introduced into the integrals by using the Dirac delta function δ(x_I + x_II − X), which is a concise
way of summarizing the two conditions,

    δ(x_I + x_II − X) = 0   if x_I + x_II − X ≠ 0 ,
    ∫ δ(x_I + x_II − X) dx_II = 1   if x_II = X − x_I lies within the integration range.   (4.25)
The delta function eliminates all values of the integrand except those with x_II = X − x_I, and
allows the use of the full range x_i^min to x_i^max for each integration limit. This simplifies the
evaluation of the integral. To obtain the last equality in Eq. (4.24), the properties of
the delta function are used to integrate over the variable x_II.
If x_I and x_II are independent (as in the case of the energies of two non-interacting
molecules or the roll of two dice), P2(x_I, X − x_I) = p1(x_I) p1(X − x_I) and Eq. (4.24)
reduces to,

    P2(X) dX = [∫_{x_I^min}^{x_I^max} p1(x_I) p1(X − x_I) dx_I] dX .   (4.26)

The first one-particle probability distribution represents the contribution of x_I and the
second the contribution of x_II to the convolution integral.
In general, an integral of the form,

    P2(X) = ∫_{x_min}^{x_max} f(x) g(X − x) dx = ∫_{x_min}^{x_max} f(X − x) g(x) dx ,   (4.27)

over the range of the distribution [x_min, x_max], is called the convolution of f and g for a
specific X value.
We implicitly used the concept of convolution to calculate the probability
distributions for the roll of two or more dice from the probability distributions for the
individual dice, as shown in Figure 4.2. For two dice we can write,

    P2(E) = Σ′_{ε_I, ε_II} p1(ε_I) p1(ε_II) = Σ_{ε_I} p1(ε_I) p1(E − ε_I) ,   (4.28)

where the prime on the first summation shows that ε_I and ε_II are restricted to values such
that the condition ε_I + ε_II = E is satisfied. For example, the probability of rolling a 7
with two dice is the convolution of the probabilities of the two individual dice subject to
the constraint ε_I + ε_II = 7,

    P2(7) = Σ_{ε_I} p1(ε_I) p1(7 − ε_I)
          = p1(1)p1(6) + p1(2)p1(5) + p1(3)p1(4) + p1(4)p1(3) + p1(5)p1(2) + p1(6)p1(1)
          = 6/36 = 1/6 .   (4.29)
In Appendix 4.2, we show how the convolution of two Gaussian functions is
performed.
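The discrete convolutions of Eqs. (4.28) and (4.29) can be carried out exactly with rational arithmetic; repeated convolution reproduces the dice columns of Table 4.2 (a minimal sketch):

```python
from fractions import Fraction

# Exact discrete convolution, Eq. (4.28): P2(E) = sum over eps of p1(eps)*p1(E - eps),
# for fair dice with p1(eps) = 1/6, eps = 1..6.
p1 = {eps: Fraction(1, 6) for eps in range(1, 7)}

def convolve(pa, pb):
    """Distribution of the sum of two independent discrete variables."""
    out = {}
    for ea, wa in pa.items():
        for eb, wb in pb.items():
            out[ea + eb] = out.get(ea + eb, Fraction(0)) + wa * wb
    return out

p2 = convolve(p1, p1)  # two-dice distribution; p2[7] = 6/36 = 1/6, Eq. (4.29)
p3 = convolve(p2, p1)  # three-dice distribution, reproducing Table 4.2
```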
4.3 The Maxwell – Boltzmann Velocity Distribution
In previous sections of this chapter, the mathematical form of the probability distribution
was assumed to be known or could be guessed based on the intrinsic nature of the
stochastic variables under study. In this Section, we derive the Maxwell-Boltzmann
velocity distribution function of molecules in an ideal gas where the energies of
molecules are independent of one another. The assumptions we apply about the nature of
probability distributions of physical quantities take us from the realm of mathematical
probability theory into the field of kinetic theory of gases and statistical mechanics. These
assumptions lie at the heart of our understanding of the connection between the
microscopic and macroscopic viewpoints. We shall see that the Maxwell-Boltzmann
velocity distribution has applicability beyond the ideal gas model: in classical
systems, it represents the velocity distribution of molecules regardless of their physical
state.
4.3.1. The concept of temperature from the mechanical analysis of an ideal gas
An ideal gas is a simple model for the behavior of gas systems in which the motion of
the molecules is assumed to be governed by classical mechanics and the molecules interact
with each other or with the vessel walls only during collisions. The Maxwell-Boltzmann
distribution describes the velocities of molecules in an ideal gas. Before deriving the
Maxwell-Boltzmann velocity distribution, we review the concept of temperature of an
ideal gas from a molecular point of view.
A microscopic interpretation of temperature was given by the Swiss mathematician
Daniel Bernoulli in 1738. He analyzed the motion of a collection of non-interacting
spherical molecules which make up an ideal gas. This type of analysis, which assumes
random motion and collisions in the gas phase and uses probability concepts to describe
dynamic processes, is the subject of the kinetic theory of gases.
The ideal gas, represented schematically in Figure 4.3, is considered to be a
collection of non-interacting molecules of mass m which have volumes that are very
small compared to the volume of the vessel the gas is confined to. The collisions between
the molecules and the walls of the container are considered to be elastic, meaning there is
no loss of the total kinetic energy of the center of mass motion of the molecules during a
collision. Microscopically, this implies that no kinetic energy of the center of mass
motion enters the internal degrees of motion of the molecules (i.e., molecular rotation or
intramolecular vibrations) after the collision is completed.
Figure 4.3. The kinetic theory of gases picture of an ideal
gas. Molecules are moving randomly with a spread of
velocities in all three spatial directions. Molecules
occasionally collide with each other and with the walls of the
vessel.
The molecules in an ideal gas have a distribution of velocities. Each molecule i has
a velocity vi with components vix, viy, and viz. The magnitude of the velocity or speed vi of
molecule i is,
    v_i = (v_ix² + v_iy² + v_iz²)^{1/2} .   (4.29)
In an isotropic environment, it is “reasonable” to expect there will be no bias in the
distribution of the velocity components of the gas molecules along the three Cartesian
directions. Therefore, the average square speed of the molecules, ⟨v²⟩, should be equally
distributed among the averages of the squared components of the velocity in the three
Cartesian directions,

    ⟨v_x²⟩ = ⟨v_y²⟩ = ⟨v_z²⟩ = ⟨v²⟩/3 ,   (4.30)
where the brackets represent averages over all molecules in the system. This relation
guarantees that the gas does not spontaneously flow in any of the Cartesian directions.
According to this mechanical picture of the gas, the direction of motion, and
therefore the momentum of a molecule changes when it collides with the container walls.
This change in momentum of a molecule is the result of a force from the wall (Newton’s
second law) and the molecule in turn exerts an equal force on the wall in the opposite
direction (Newton’s third law). The details of this process are shown in Figure 4.4. The
total force on the surface A of the confining wall leads to the gas exerting a pressure P =
F/A on the walls of the container.
Figure 4.4. The collision of a molecule with velocity
component v_xi with the wall in the x-direction. The time
between consecutive collisions with the wall is related to
the time it takes the molecule to traverse the distance
2L_x.
To determine the average pressure exerted by all molecules on the wall Wx1 with
area A, of the container, consider a molecule with the velocity component vxi in the
positive x-direction. As a result of the elastic collision of the molecule with the wall, the
velocity component in the x-direction is reversed to -vxi. If this molecule does not undergo
collisions with other gas molecules, it will move towards the opposite wall W_x2, collide
with it, and then be reflected back towards W_x1. The time between consecutive collisions
with the wall W_x1 is Δt = 2L_x/v_xi, where L_x is the length of the container in the x-direction.
The molecule undergoes a change in momentum of Δp_xi = 2mv_xi during the time Δt,
which, according to Newton's second and third laws, determines the average force the molecule
exerts on the wall during that time. The contribution to the pressure on the wall W_x1 of
area A as a result of the collisions of this one molecule is,

    P_i = f_xi/A = (Δp_xi/Δt)/A = mv_xi²/(A L_x) = mv_xi²/V ,   (4.31)
where V is the volume of the container (occupied by the gas). If there are N molecules in
the system, the total pressure on the wall in the x-direction is,
    P_tot = Σ_{i=1}^{N} mv_xi²/V = Nm⟨v_x²⟩/V = Nm⟨v²⟩/3V = (2N/3V)⟨E_K⟩ ,   (4.32)
where Eq. (4.30) was used and ⟨E_K⟩ = m⟨v²⟩/2 is the average kinetic energy of the gas
molecules in the system. This is Bernoulli's expression, which relates the pressure of an
ideal gas system (a macroscopically measurable quantity) to the average square speed or kinetic
energy of the molecules in the ideal gas (microscopic and not directly measurable quantities).
The macroscopic ideal gas law was discovered from the empirical observations of
Robert Boyle (1662) and Jacques Charles (1780s). Boyle demonstrated that at constant
temperature, the pressure of an ideal gas is inversely proportional to its volume, and
Charles demonstrated the proportionality of the gas volume to its temperature at constant
external pressure. For a fixed volume and temperature, it was also known that the
pressure of a gas depends on the amount of gas in the system. The ideal gas equation of
state is a combination of these observations. In molecular terms, the ideal gas law is
equivalent to,
    P_tot = NkT/V ,   (4.33)
where T is the temperature and k is Boltzmann’s constant. Comparing Eqs. (4.32) and
(4.33), the temperature of an ideal gas is identified as the average kinetic energy of the
gas molecules,
    T = (1/3)(m/k)⟨v²⟩ = (2/3k)⟨E_K⟩ .   (4.34)
This is a simple example of statistical mechanical reasoning where the macroscopic
temperature is related to the microscopic average kinetic energy of the molecules in the
gas. At the heart of this relation is the statistical / probabilistic assumption about the
average velocity components of the gas molecules given in Eq. (4.30).
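Equation (4.34) can be checked numerically by sampling velocity components and recomputing the temperature from the mean square speed; the sketch below draws components from a Gaussian with variance kT/m (anticipating the Maxwell-Boltzmann distribution of Section 4.3.2), and the argon mass and 300 K are arbitrary illustrative choices:

```python
import random

# Numerical check of Eq. (4.34), T = (m/3k)<v^2>.
k_B = 1.380649e-23   # Boltzmann constant, J/K
m = 6.6335e-26       # mass of an argon atom, kg
T_set = 300.0        # K

rng = random.Random(1)
sigma = (k_B * T_set / m) ** 0.5   # width of each Cartesian component distribution
n = 200000

# <v^2> = <vx^2> + <vy^2> + <vz^2>, averaged over n sampled molecules.
mean_v_sq = sum(
    rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2
    for _ in range(n)
) / n

T_kinetic = m * mean_v_sq / (3.0 * k_B)   # recovers approximately 300 K
```

This is essentially how temperature is computed from instantaneous velocities in a molecular dynamics simulation.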
Equation (4.34) relates the average molecular square speed to the temperature, but does not
give any information about the distribution function of the velocities. The molecules in an
ideal gas have different velocities, v_i. The velocity distribution of the ideal gas system
gives the probability of a molecule having a certain velocity at a given temperature and
can be used to derive the speed, v, and energy, ε, distributions for the individual
molecules as well. Convolution of the distributions of the energies of the individual
molecules is used to determine the distribution of the total energy, EN, for the N-molecule
ideal gas system. These distributions give insight into the behavior of collective
properties and can illustrate how descriptions of a system change as we go from an
individual molecule to the collective (macroscopic) system viewpoint.
For completeness and future reference, we present the mechanical view of the
pressure in a system in Appendix 4.3.
4.3.2 The Maxwell – Boltzmann distribution of velocities for an ideal gas
As seen in Figure 4.1, different probability distributions can give the same average
(mean) value for the variable. Knowledge of the temperature of the system does not
provide sufficient information to derive the distribution of the velocity among the
molecules in an ideal gas system. Different velocity distributions can be envisioned. For
example, all molecules could move with the same square speed, equal to the average value,
i.e., v_i^2 = ⟨v^2⟩ for all i. Another example is a distribution where half of the molecules
move with v_i^2 = ⟨v^2⟩/2 and the other half with v_j^2 = 3⟨v^2⟩/2. In both of these cases,
the average square speed of the molecules in the system is ⟨v^2⟩, and both distributions
would give the same temperature for the gas sample. The dynamical behavior of these two
systems, however, would be very different. Among all possible distributions of the velocity, in
1866, James Clerk Maxwell set out to find the most probable distribution of velocities
which is still consistent with the known temperature. Velocity distributions of the gas
molecules can be measured experimentally and Maxwell’s velocity distribution has been
verified.
Before describing Maxwell’s derivation of the velocity distribution, we summarize
the mechanism by which different velocities are generated among the molecules in a gas.
Consider two molecules in a gas moving towards each other with different initial
velocities as shown in Figure 4.5. These molecules “collide” if they move closer than the
range of their mutual interaction potential. For hard sphere molecules, this means the
molecules collide if the impact parameter, b, is less than the sum of their radii. The
collision changes the velocities and redistributes the kinetic energy between the two
molecules. For elastic collisions, linear momentum, angular momentum, and kinetic
energy are conserved, i.e., their total amounts are the same before and after the collision.
Many collisions (between 10^10 and 10^11 collisions per second in a typical case) occur in a
typical gas sample with ~10^23 molecules. These collisions provide a strong mechanism
for velocity redistribution. During any period of observation, we cannot assign fixed
velocities to individual gas molecules, but only say there is a distribution of velocity
among all molecules in the gas at any given time. Maxwell was aiming to derive the form
of this velocity distribution without having to survey the velocities of a collection of
molecules moving at different velocities.
Figure 4.5. The schematic representation of a collision between two molecules moving with an
impact parameter b. A pre-collision configuration is shown in (a), the closest distance of approach
is shown in (b), and the after-impact velocities of the molecules are shown in (c). The
conservation of energy, linear, and angular momentum determine the final velocities from the
values of the initial velocity and impact parameter.
Maxwell made a set of deceptively simple arguments to derive the most probable
distributions for the x, y, and z components of the molecular velocity in an ideal gas. In
the most general case, the distribution of velocities, P(v), will depend on both the
magnitude of the velocity and the direction of motion of the molecules. Maxwell’s first
assumption is that for a system in equilibrium, the velocity distribution of molecules is
isotropic and does not depend on spatial direction in the system. In other words, there is
no preference to moving in any specific direction and the velocity distribution (whatever
form it has) only depends on the magnitude of the speed,
P(v_x, v_y, v_z) = P(v) ,  (4.35)
where P(v) is the distribution function for the molecular speeds. This implies that in the
velocity distribution, the Cartesian components appear only as the combination,
v^2 = v_x^2 + v_y^2 + v_z^2 .  (4.36)
The physical basis of this assumption is that at equilibrium, our choice of Cartesian
coordinate system for describing the velocities of the molecules in a gas is arbitrary. We
can choose the x, y, and z-coordinate system to point in any arbitrary direction and this
Introduction to Molecular Simulation Chapter 4. Probability Theory
© Saman Alavi Page 16
should not affect the mathematical form of the velocity distribution. The magnitude of
velocity (speed) is an invariant, independent of the coordinate system.
Maxwell’s second postulate is that the values of the velocity components vx, vy, and
vz of the molecules are independent of one another and their distribution functions have
identical functional form. What this assumption entails, for example, is that the velocities
in the y- and z-directions are not affected by a molecule moving very fast in the x-
direction. Mathematically this is written as,
P(v_x, v_y, v_z) = P(v_x) P(v_y) P(v_z) ,  (4.37)
where P(vx) is the distribution function of the x-component of the velocity. The reasoning
behind Eq. (4.37) is that the choice of the coordinate system is arbitrary and in the
absence of any external effects (such as pressure gradients, external electrical and
gravitational fields), the distribution functions for the velocity in the three Cartesian
directions should be identical.
The mathematical form of the velocity distribution function in an ideal gas is
dictated by these two simple assumptions. Taking the derivative of each side of Eq.
(4.37) with respect to one velocity component, say vx, and using the chain rule and Eq.
(4.36), we get,
dP(v)/dv_x = (dP(v)/dv)(dv/dv_x) = (v_x/v) dP(v)/dv = (dP(v_x)/dv_x) P(v_y) P(v_z) .  (4.38)
Dividing both sides of the last equality in Eq. (4.38) by P(v) and separating variables
gives,
(1/(v P(v))) dP(v)/dv = (1/(v_x P(v_x))) dP(v_x)/dv_x .  (4.39)
The left hand side of Eq. (4.39) is only a function of the variable v and the right hand side
is only a function of vx. This shows that both sides must equal a constant, C′. Solving
the equation for vx gives the distribution function for each of the components of the
velocity,
(1/(v_x P(v_x))) dP(v_x)/dv_x = C′  ⟹  dP(v_x)/P(v_x) = C′ v_x dv_x  ⟹  ln P(v_x) = (C′/2) v_x^2 + B ,  (4.40)
or in a more familiar Gaussian form,
P(v_x) = A e^{−C v_x^2} ,  (4.41)
with A = e^B and C = −C′/2, where C must be positive for a normalizable distribution.
Similar expressions hold for the vy and vz components of the velocity. Maxwell’s
assumptions lead to a Gaussian distribution for each of the velocity components with two
underdetermined constants A and C. By applying the normalization condition to the
probability distribution in Eq. (4.41), noting that vx varies between -∞ and +∞, the
constant A is determined (see Eq.(A4.46) in Appendix 4.4),
∫_{−∞}^{+∞} P(v_x) dv_x = 1  ⟹  P(v_x) = (C/π)^{1/2} e^{−C v_x^2} .  (4.42)
The constant C can be determined by using the kinetic theory of gases result of Eq. (4.34),
⟨(1/2) m v_x^2⟩ = (1/2) kT, where k is the Boltzmann constant, T is the temperature, and m is the
molecular mass. Using Eq. (4.42) gives,

⟨(1/2) m v_x^2⟩ = (1/2) m ∫_{−∞}^{+∞} v_x^2 P(v_x) dv_x = m/(4C) = (1/2) kT  ⟹  C = m/(2kT) .  (4.43)
Equation (A4.47) in Appendix 4.4 is used in evaluating the integral in Eq. (4.43). The
distribution function for vx is thus,
P(v_x) = (m/2πkT)^{1/2} e^{−m v_x^2/2kT} .  (4.44)
Similar distribution functions hold for vy and vz. By comparing Eq. (4.44) with (4.14),
we see that the variance of the Gaussian probability distribution, σ2 = kT/m, and the
distribution becomes broader at higher temperatures and for lighter particles. The velocity
component distributions at three temperatures are shown in Figure 4.6.
Figure 4.6. The probability distributions for
velocity components of ideal gas molecules
at three temperatures. The distributions of
the velocity components are Gaussian with a
width that depends on temperature and
molecular mass.
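The Gaussian form of Eq. (4.44) is easy to check numerically. The sketch below samples v_x from a Gaussian of variance kT/m (the mass and temperature are illustrative, argon-like values, and the seed is arbitrary) and verifies the sample variance:

```python
import math
import random

# Sample v_x for many ideal-gas molecules from Eq. (4.44), a Gaussian with
# mean 0 and variance kT/m, and check the sample variance against theory.
k_B = 1.380649e-23      # Boltzmann constant, J/K
m = 6.63e-26            # argon-like atomic mass, kg (illustrative value)
T = 300.0               # temperature, K

random.seed(1)
sigma = math.sqrt(k_B * T / m)          # standard deviation of P(v_x)
vx = [random.gauss(0.0, sigma) for _ in range(200_000)]

var_sample = sum(v * v for v in vx) / len(vx)
print(var_sample / (k_B * T / m))       # ratio close to 1
```

The ratio approaches 1 as the number of samples grows, confirming that the width of the component distribution is set by kT/m.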
We can use the distribution functions for the velocity components to calculate the
distribution function for the speed. The probability of a molecule simultaneously having
velocity components between v_x and v_x+dv_x, v_y and v_y+dv_y, and v_z and v_z+dv_z is,

P(v_x) P(v_y) P(v_z) dv_x dv_y dv_z = (m/2πkT)^{3/2} e^{−m(v_x^2+v_y^2+v_z^2)/2kT} dv_x dv_y dv_z .  (4.45)
Converting the volume element in Eq. (4.45) from Cartesian space to the spherical
polar coordinate system gives the familiar form,
dv_x dv_y dv_z = v^2 sinθ dv dθ dφ .  (4.46)
Substituting Eq. (4.46) in Eq. (4.45) and integrating over the angle variables θ (from 0 to
π) and φ (from 0 to 2π) gives the distribution function for the molecular speed,
P(v) dv = 4π (m/2πkT)^{3/2} v^2 e^{−m v^2/2kT} dv .  (4.47)
Note that P(v) contains a factor of v^2 and is not Gaussian. The distribution functions for
the molecular speed at the same three temperatures as in Figure 4.6 are shown in Figure 4.7.
Figure 4.7. The probability distribution
for the speed of ideal gas molecules at
three temperatures. Note that the
distributions are skewed towards higher
speeds and become broader at higher
temperatures.
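Equation (4.47) can be checked by direct numerical integration. The sketch below (with an illustrative, nitrogen-like mass) verifies the normalization and compares the mean speed with the closed form ⟨v⟩ = (8kT/πm)^{1/2}, which follows from Eq. (4.47) with the integral (A4.43):

```python
import math

# Numerically integrate the Maxwell speed distribution, Eq. (4.47), and
# compare the mean speed with the closed form sqrt(8kT/(pi*m)).
k_B = 1.380649e-23
m = 4.65e-26        # nitrogen-like molecular mass, kg (illustrative value)
T = 300.0

def P(v):
    a = m / (2.0 * math.pi * k_B * T)
    return 4.0 * math.pi * a**1.5 * v * v * math.exp(-m * v * v / (2.0 * k_B * T))

dv = 0.1
vs = [i * dv for i in range(1, 40000)]          # speeds up to 4000 m/s
norm = sum(P(v) * dv for v in vs)               # close to 1
mean_v = sum(v * P(v) * dv for v in vs)

print(norm, mean_v, math.sqrt(8 * k_B * T / (math.pi * m)))
```

The two mean-speed values agree to the accuracy of the grid, illustrating that the skewed P(v) still has simple closed-form moments.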
The speed distribution can be used to determine the kinetic energy distribution of
each molecule, ε = mv^2/2. Changing variables from speed to kinetic energy and using dε
= mv dv in Eq. (4.47), we get,

P_1(ε) dε = 2π (1/πkT)^{3/2} ε^{1/2} e^{−ε/kT} dε .  (4.48)
The subscript 1 indicates a one-molecule kinetic energy distribution function. The kinetic
energy probability distribution is plotted in Figure 4.8.
Figure 4.8. The probability distribution for the kinetic energy of an ideal gas molecule,
showing the most probable and average energies. The variance of the distribution is also
shown. Plots are for 200 K, 500 K, and 1000 K.
For one molecule in an ideal gas, the average energy is,

⟨ε⟩_1 = ∫_0^∞ ε P_1(ε) dε = 2π (1/πkT)^{3/2} ∫_0^∞ ε^{3/2} e^{−ε/kT} dε = (3/2) kT .  (4.49)
Equation (A4.43) in Appendix 4.4 is used to derive the last equality. The most
probable energy (ε_max) is obtained from,

dP_1(ε)/dε = 2π (1/πkT)^{3/2} [(1/2) ε^{−1/2} − ε^{1/2}/kT] e^{−ε/kT} = 0  ⟹  ε_max = (1/2) kT .  (4.50)
The standard deviation of the energy distribution, σ_ε = (3/2)^{1/2} kT = 1.22 kT, is large
relative to the most probable energy ε_max = kT/2, which shows that individual molecules
have large spreads of energy in the ideal gas. The fact that the average and most probable
energies are separated by a relatively large energy value shows that the one-particle
energy distribution is skewed.
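Working in the reduced variable x = ε/kT, Eq. (4.48) becomes P_1(x) = (2/π^{1/2}) x^{1/2} e^{−x}, and the moments derived above can be checked numerically (a sketch; the grid spacing and cutoff are arbitrary choices):

```python
import math

# One-molecule energy distribution, Eq. (4.48), in reduced units x = eps/kT:
# P1(x) = (2/sqrt(pi)) * sqrt(x) * exp(-x).
def P1(x):
    return 2.0 / math.sqrt(math.pi) * math.sqrt(x) * math.exp(-x)

dx = 1e-3
xs = [i * dx for i in range(1, 40000)]      # integrate out to eps = 40 kT
mean = sum(x * P1(x) * dx for x in xs)      # close to 3/2, i.e. <eps> = (3/2) kT
mode = max(xs, key=P1)                      # close to 1/2, i.e. eps_max = kT/2
print(mean, mode)
```

The numerical mean and mode reproduce ⟨ε⟩ = (3/2)kT and ε_max = kT/2, and their separation reflects the skew of the distribution.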
4.3.3. Energy distributions for collections of molecules in an ideal gas
Knowing the probability distribution for the energy of individual molecules, the
next stage is to determine the probability distribution, P2(E2), of the total energy of two
molecules in an ideal gas system, where E2 = ε_I + ε_II.
To determine the two-molecule distribution function, the convolution of two one-molecule
probability distributions is performed,

P_2(E_2) dE_2 = ∫∫_{E_2<ε_I+ε_II<E_2+dE_2} P_1(ε_I) P_1(ε_II) dε_I dε_II
 = [∫_0^∞ ∫_0^∞ P_1(ε_I) P_1(ε_II) δ(E_2 − ε_I − ε_II) dε_I dε_II] dE_2
 = [∫_0^{E_2} P_1(ε_I) P_1(E_2 − ε_I) dε_I] dE_2 ,  (4.51)
where the definition of the delta function given in Eq. (4.25) is used. The subscript on the
first set of integrals shows that the integration is constrained to values of εI and εII where
E2 < εI + εII < E2+dE2. Substituting the normalized one molecule energy distribution
given in Eq. (4.48) and evaluating the integral in Eq. (4.51) using Eq.(A4.50) of
Appendix 4.4 gives,
P_2(E_2) = 4π^2 (1/πkT)^3 e^{−E_2/kT} ∫_0^{E_2} ε_I^{1/2} (E_2 − ε_I)^{1/2} dε_I = (1/2) (1/kT)^3 E_2^2 e^{−E_2/kT} .  (4.52)
From this distribution, the average two-particle energy is,

⟨E_2⟩ = ∫_0^∞ E_2 P_2(E_2) dE_2 = (1/2)(1/kT)^3 ∫_0^∞ E_2^3 e^{−E_2/kT} dE_2 = (1/2)(1/kT)^3 · 6(kT)^4 = 3kT ,  (4.53)
and it can be shown that the most probable energy is E_{2,max} = 2kT and the standard
deviation of the energy is σ_2 = 3^{1/2} kT.
The procedure for determining P3(E3) and E3 for a collection of three molecules is
given in Appendix 4.5.
The convolution procedure can be repeated to calculate the probability distribution
for the total energy of N molecules, E_N = ε_I + ε_II + ⋯ + ε_N, to give,

P_N(E_N) = [1/((3N/2 − 1)! (kT)^{3N/2})] E_N^{3N/2 − 1} e^{−E_N/kT} .  (4.54)
This probability distribution gives the average energy as,

⟨E_N⟩ = [1/((3N/2 − 1)! (kT)^{3N/2})] ∫_0^∞ E_N^{3N/2} e^{−E_N/kT} dE_N = (3/2) N kT ,  (4.55)
and the most probable energy and standard deviation are E_{N,max} = (3N/2 − 1) kT and
σ_N = (3N/2)^{1/2} kT.
To compare the relative widths of the energy distributions for different numbers of
molecules, we define probability distributions in terms of the reduced energy E*_i =
E_i/E_{max,i}, which is the energy relative to the most probable energy,
P_1(ε*) dε* = (1/(2π))^{1/2} ε*^{1/2} e^{−ε*/2} dε* ,  (4.56)
and
P_2(E*_2) dE*_2 = 4 E*_2^2 e^{−2E*_2} dE*_2 .  (4.57)
The general expression for the N-molecule reduced energy distribution is,

P_N(E*_N) dE*_N = [(3N/2 − 1)^{3N/2}/(3N/2 − 1)!] E*_N^{3N/2 − 1} e^{−(3N/2 − 1) E*_N} dE*_N .  (4.58)
For small N, the factorial in Eq. (4.58) can be calculated directly. For large N values,
Stirling’s approximation, given in Eq. (A4.39) of Appendix 4.4, can be used to
determine (3N/2 − 1)!. The total reduced energy distributions for 1 to 10 molecules are
shown in Figure 4.9.
Figure 4.9. The kinetic energy
distribution for 1 to 10 molecules
plotted as a function of the energy
reduced by the most probable
energy. The distribution becomes
sharper with greater numbers of
molecules.
The energy distributions become sharper and the skew less pronounced as the
number of molecules increases. Figure 4.10 shows that for collections of larger numbers
of molecules, the distributions of energy relative to the most probable energy become
very narrow. Extrapolating to macroscopic systems with N of the order of 10^20, the
energy distribution is effectively a single-valued (δ-)function, and a single, well-defined
value for the energy is always observed from measurements of the energy E_N for this
system.
Figure 4.10. The kinetic energy distribution for systems with 10 to 1000 molecules
plotted as a function of the energy reduced to the most probable energy.
The analysis and Figures 4.9 and 4.10 show how distributions of many-molecule
macroscopic properties behave differently from distributions of one-molecule
microscopic properties. New behaviors emerge as the number of molecules in a system
increases, and new relations, such as equations of state, may be discovered between
well-defined many-molecule quantities. Such laws do not necessarily hold between
corresponding quantities describing small numbers of molecules. These new emerging
relations are a consequence of the laws of mechanics and the laws of probability for
systems with large numbers of variables.
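The sharpening can be made quantitative directly from Eqs. (4.54) and (4.55): the relative width is σ_N/⟨E_N⟩ = (3N/2)^{1/2} kT/((3N/2) kT) = (2/3N)^{1/2}. A short sketch:

```python
import math

# Relative width of the N-molecule energy distribution, from Eqs. (4.54)-(4.55):
# sigma_N / <E_N> = sqrt(3N/2) kT / ((3N/2) kT) = sqrt(2/(3N)),
# so the distribution sharpens as 1/sqrt(N).
for N in (1, 10, 100, 10_000, 10**20):
    rel_width = math.sqrt(2.0 / (3.0 * N))
    print(f"N = {N:>22d}: sigma_N/<E_N> = {rel_width:.3e}")
```

For N ~ 10^20 the relative width is ~10^-10, which is why a macroscopic energy measurement always returns the same well-defined value.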
Up to this point, we have shown that the energy distribution in the ideal gas system
becomes sharper as the number of molecules increases. In the next chapter we discuss
systems where the total energy is not a simple sum of one-molecule energies. We see that
even for systems with interacting molecules, well-defined, system-wide properties also
emerge as we go to macroscopic-sized systems. By introducing the mathematical concept
of the ensemble, first developed by Boltzmann and Gibbs, we illustrate that the
mathematical analysis used to determine distribution functions for collective ideal gas
properties can be extended to collective properties of systems with interacting particles.
4.3.4 Assigning initial velocities to molecules in molecular dynamics simulations
The discussions of the previous sections imply that if a molecular dynamics simulation is
properly set up and performed, collisions between molecules in a system lead to a
Maxwell-Boltzmann distribution of velocities. At the beginning of a simulation, how do
we set up the molecular velocities such that the system is at a desired temperature? This
is done with random number generators, which are available in all modern programming
environments. Two methods are used to set molecular velocities in a system to the proper
Maxwell-Boltzmann distribution at a temperature T.
In the first method, for the N-molecule system we generate a set of 3N random
numbers, each of which can vary in the range −1 to +1. The random numbers α_1
to α_N are associated with the velocity components v_{x1} to v_{xN}, and similarly the
remaining 2N random numbers are associated with the y- and z-components of the
velocities. For a proper velocity distribution, we must have ⟨v_x⟩ = ⟨v_y⟩ = ⟨v_z⟩ = 0. These
conditions are guaranteed by the range and nature of the random numbers, since for such a set
of random numbers ⟨α⟩ = 0. In addition, we scale the random numbers by a constant
factor such that they satisfy the relations for the velocity components at the required
temperature T,
(3N/2) m⟨v_x^2⟩ = (1/2) N m⟨v^2⟩ = (3/2) NkT .  (4.59)
In this way, the velocity components of the molecules are assigned the proper initial average
kinetic energy at the beginning of the simulation. After the simulation begins and the
molecules collide, a proper Maxwell-Boltzmann distribution of velocity components
automatically develops.
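A minimal sketch of this first method is given below. The mass, temperature, system size, and seed are illustrative; the centre-of-mass drift is subtracted explicitly, a common refinement of the ⟨α⟩ = 0 argument above, which holds only on average for a finite sample:

```python
import math
import random

# First initialization method: draw uniform random velocity components in
# [-1, 1], remove any residual drift, then rescale so the total kinetic
# energy equals (3/2) N k T, as in Eq. (4.59).
k_B = 1.380649e-23
m = 6.63e-26        # illustrative atomic mass, kg
T = 300.0
N = 1000

random.seed(7)
vel = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(N)]

# Subtract the centre-of-mass velocity so <vx> = <vy> = <vz> = 0 exactly.
for axis in range(3):
    drift = sum(v[axis] for v in vel) / N
    for v in vel:
        v[axis] -= drift

# Rescale to the target temperature.
ke = 0.5 * m * sum(vx * vx + vy * vy + vz * vz for vx, vy, vz in vel)
scale = math.sqrt(1.5 * N * k_B * T / ke)
vel = [[scale * c for c in v] for v in vel]

ke = 0.5 * m * sum(vx * vx + vy * vy + vz * vz for vx, vy, vz in vel)
T_inst = 2.0 * ke / (3.0 * N * k_B)
print(T_inst)    # 300.0 to within floating-point error
```

The instantaneous temperature is exactly the target by construction; collisions during the run then relax the uniform components into the Maxwell-Boltzmann form.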
The second method for assigning the initial molecular velocities is based on a
sampling technique developed by statisticians George Box and Mervin Muller (in 1958).
This method uses two random numbers which vary in the range of [0,1] to generate a
Gaussian distribution of variables. In the Box-Muller method, if the two random numbers
are denoted ξ_1 and ξ_2, the variable X calculated as,

X = σ (−2 ln ξ_1)^{1/2} cos(2π ξ_2) ,  (4.60)
has a Gaussian distribution with a variance of σ^2. By generating 2N random numbers, we
can generate N numbers representing the velocity components of molecules in the system
which have the proper Gaussian form. By relating the variance to the temperature of the
simulation, the velocity components can be scaled to give the required temperature.
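A sketch of the Box-Muller transformation of Eq. (4.60); the function name and seed are illustrative choices:

```python
import math
import random

# Box-Muller sampling, Eq. (4.60): two uniform random numbers in (0, 1]
# give one Gaussian variate with standard deviation sigma. For a velocity
# component, sigma**2 = kT/m (cf. Eq. (4.44)).
def box_muller(sigma, rng):
    xi1, xi2 = rng.random(), rng.random()
    while xi1 == 0.0:                 # avoid log(0)
        xi1 = rng.random()
    return sigma * math.sqrt(-2.0 * math.log(xi1)) * math.cos(2.0 * math.pi * xi2)

rng = random.Random(42)
sigma = 1.0
samples = [box_muller(sigma, rng) for _ in range(100_000)]

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)     # near 0 and sigma**2 = 1
```

With sigma set to (kT/m)^{1/2}, each call returns one properly distributed velocity component, so no subsequent rescaling step is strictly required beyond removing the centre-of-mass drift.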
4.3.5 Phase space description of an ideal gas
The phase space description introduced in Chapter 1 was applied to mechanical systems
with a relatively small number of degrees of freedom and corresponding {q, p} pairs.
Even for a system with two degrees of freedom, there are four phase space variables, and
the phase space trajectory cannot be represented in three-dimensional space. Nonetheless
the concept of the phase space trajectory is useful in guiding our thinking on the global
behavior of mechanical systems.
In the case of an ideal gas in a cubic vessel of length L on each side, the phase space
consists of molecular coordinates randomly distributed between -L/2 and L/2 with the
conjugate momenta of the molecules distributed according to the Maxwell – Boltzmann
distribution. Each molecule is represented by a point in a six-dimensional phase space
consisting of {x, y, z, px, py, pz}, the so-called μ-space (molecule space), see the Chapter
5, and the distribution of N molecular positions / momenta in the system form a
probability cloud in the μ-space. As the system evolves, the shape of the phase space
cloud changes as molecules collide with each other and the walls.
In the 6N-dimensional phase space {x_1, y_1, z_1, p_{x1}, p_{y1}, p_{z1}, …, x_N, y_N, z_N, p_{xN}, p_{yN}, p_{zN}},
the so-called γ-space (gas space), the initial state of the entire system
is represented by a single point. This single point moves in the 6N-dimensional phase space
as the molecules collide with each other and the walls. The trajectory in this phase space
is complex, but is always constrained to move on a surface in the phase space with
constant energy,
E_N = p_1^2(t_0)/2m_1 + p_2^2(t_0)/2m_2 + ⋯ + p_N^2(t_0)/2m_N .  (4.61)
In reality, for an N-molecule ideal gas system, the exact location and momentum of
each molecule are not known and so we cannot pinpoint a specific location in 6N-
dimensional phase space. We just know that the molecules are randomly distributed in
the volume available to them and that the distribution of velocities obeys the Maxwell-
Boltzmann probability distribution corresponding to the temperature of the system. The
only constraint on the distribution of velocities is that they must satisfy Eq. (4.61). This
knowledge generates a probability distribution in 6N-dimensional phase space which
can be thought of as a cloud of all possible states, with molecules in different locations
and momenta distributed according to the Maxwell-Boltzmann distribution. If the
system is evolving with a constant total energy, all points in this probability cloud in 6N-
dimensional space at time t0 are constrained to move on a constant energy hypersurface.
The details of the time evolution of the probability distribution of molecules in the
6N-dimensional phase space are the subject of classical statistical mechanics. We will
not explicitly discuss Liouville’s equation (developed by Gibbs based on the work of the
French mathematician Joseph Liouville in 1838) which governs the time evolution of the
classical phase space probability distribution. Instead, in Chapter 5, we first approach
statistical mechanics through a quantum mechanical description of the state of a system.
Afterwards, we adapt the formalism to the classical mechanical description.
References
M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions, with Formulas,
Graphs, and Mathematical Tables, Dover Publications, New York, 1972.
P. A. Bromiley, Products and Convolutions of Gaussian Probability Density Functions,
Tina Memo No. 2003-003, Manchester, 2013.
H. Goldstein, C. Poole, and J. Safko, Classical Mechanics, 3rd Ed., Addison Wesley, San
Francisco, 2001.
J. P. Hansen and I. R. McDonald, Theory of Simple Liquids, 3rd Ed., Academic Press, 2005.
N. G. van Kampen, Stochastic Processes in Physics and Chemistry, 3rd Ed., North-Holland, 2007.
D. A. McQuarrie, Statistical Mechanics, Harper &amp; Row, New York, 1976.
The Wikipedia article on Liouville’s theorem has a good animated illustration of the time
evolution of a phase space probability distribution.
See: http://en.wikipedia.org/wiki/Liouville%27s_theorem_(Hamiltonian)
Appendix 4.1 Normalization, Mean, and Standard Deviation of the Gaussian Function
To verify the normalization of the Gaussian function given in Eq. (4.13), we evaluate the
integral I over the entire range of the Cartesian variable x,

I = (α/π)^{1/2} ∫_{−∞}^{+∞} exp[−α(x − x_0)^2] dx .  (A4.1)
This integral is determined by applying a well-known mathematical trick. Instead of I, we
calculate I^2,

I^2 = (α/π) ∫_{−∞}^{+∞} exp[−α(x − x_0)^2] dx ∫_{−∞}^{+∞} exp[−α(y − y_0)^2] dy = (α/π) ∫∫ e^{−α(ξ^2 + ζ^2)} dξ dζ .  (A4.2)
In the second equality, we changed the integration variables to ξ = (x – x0) and ζ = (y –
y0). A coordinate transformation changes this double Cartesian integral to a double
integral over the polar coordinates r and θ,
I^2 = (α/π) ∫_0^{2π} ∫_0^∞ e^{−αr^2} r dr dθ ,  (A4.3)
where the relations r^2 = ξ^2 + ζ^2 and r dr dθ = dξ dζ are used. In polar form, the double
integral is easy to evaluate; integration over θ yields 2π, and integration over r yields
1/2α. This proves that I^2 = 1 and I = 1. In other words, the Gaussian distribution as
defined in Eq. (4.13) is normalized. From these calculations, we see that,
∫_{−∞}^{+∞} e^{−αξ^2} dξ = (π/α)^{1/2} .  (A4.4)
The mean of the Gaussian distribution is given by,

⟨x⟩ = (α/π)^{1/2} ∫_{−∞}^{+∞} x exp[−α(x − x_0)^2] dx = (α/π)^{1/2} ∫_{−∞}^{+∞} (ξ + x_0) e^{−αξ^2} dξ = x_0 .  (A4.5)
The second moment of the Gaussian distribution can be similarly calculated,

⟨x^2⟩ = (α/π)^{1/2} ∫_{−∞}^{+∞} x^2 exp[−α(x − x_0)^2] dx = (α/π)^{1/2} ∫_{−∞}^{+∞} (ξ^2 + 2x_0 ξ + x_0^2) e^{−αξ^2} dξ .  (A4.6)
The first integral on the right hand side of Eq. (A4.6) is solved using integration by parts
with u = ξ and dv = −2αξ exp(−αξ^2) dξ to give,

∫_{−∞}^{+∞} ξ^2 e^{−αξ^2} dξ = (1/2α) ∫_{−∞}^{+∞} e^{−αξ^2} dξ = (1/2α)(π/α)^{1/2} .  (A4.7)
Therefore,

⟨x^2⟩ = 1/(2α) + x_0^2 .  (A4.8)
From Eqs. (A4.5) and (A4.8), the first two central moments of the Gaussian
distribution can be calculated,

⟨x − x_0⟩ = (α/π)^{1/2} ∫_{−∞}^{+∞} (x − x_0) exp[−α(x − x_0)^2] dx = ⟨x⟩ − x_0 = 0 ,  (A4.9)

⟨(x − ⟨x⟩)^2⟩ = ⟨x^2 − 2x⟨x⟩ + ⟨x⟩^2⟩ = ⟨x^2⟩ − ⟨x⟩^2 = 1/(2α) .  (A4.10)
Appendix 4.2 Convolution of Gaussian functions
The mathematical development presented below follows Bromiley.
An interesting property of Gaussian functions is that the product of two Gaussian
functions fa and fb with different means and variance values is also a Gaussian function.
Consider the Gaussian distribution fa with a mean of xa and a standard deviation of σa,
and fb with a mean of xb and a standard deviation of σb,

f_a(x) = (1/(2πσ_a^2)^{1/2}) e^{−(x − x_a)^2/2σ_a^2}  and  f_b(x) = (1/(2πσ_b^2)^{1/2}) e^{−(x − x_b)^2/2σ_b^2} .  (A4.11)
After straightforward but lengthy algebra, the product of the two Gaussian distribution
functions can be shown to be a scaled Gaussian function of the form,

f_ab′(x) = f_a(x) f_b(x) = (S/(2πσ_ab^2)^{1/2}) e^{−(x − x_ab)^2/2σ_ab^2} .  (A4.12)
The mean of the product Gaussian fab′(x) is,

x_ab = (x_a σ_b^2 + x_b σ_a^2)/(σ_a^2 + σ_b^2) .  (A4.13)
The standard deviation of fab′(x) is,

σ_ab = σ_a σ_b/(σ_a^2 + σ_b^2)^{1/2} .  (A4.14)
The product Gaussian is not normalized and has a scaling factor,

S = (1/(2π(σ_a^2 + σ_b^2))^{1/2}) e^{−(x_a − x_b)^2/2(σ_a^2 + σ_b^2)} .  (A4.15)
Consider fa with {x_a = 5, σ_a = 2} and fb with {x_b = 2, σ_b = 3}. Figure A4.1(a) shows the
original Gaussian functions and their product. Equations (A4.13) and (A4.14) give x_ab =
4.077 and σ_ab = 1.664 for the mean and standard deviation of the product distribution, and
the scaling factor S of the product Gaussian is 0.0783. Note that the mean of the product
Gaussian lies between the means of the two separate Gaussians and the standard
deviation of the product Gaussian is smaller than those of the two individual Gaussians.
A commonly encountered case is where the two stochastic variables have identical
Gaussian distributions. In this case Eqs. (A4.13) to (A4.15) simplify to,
x_ab = x_a ;  σ_ab = σ_a/2^{1/2} ;  and  S = 1/(2σ_a π^{1/2}) .  (A4.16)
The non-normalized form of the product Gaussian given in Eq. (A4.12) for this special case is,

f_ab′(x) = (1/(2πσ_a^2)) e^{−(x − x_a)^2/σ_a^2} .  (A4.17)
The distribution functions for this case are shown in Figure A4.1(b).
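The numbers quoted above for the worked example can be verified numerically (a sketch; the grid limits and spacing are arbitrary choices):

```python
import math

# Check Eqs. (A4.13)-(A4.15) for the worked example fa{xa = 5, sa = 2},
# fb{xb = 2, sb = 3} against numerical moments of the pointwise product.
def gauss(x, mu, s):
    return math.exp(-(x - mu) ** 2 / (2 * s * s)) / math.sqrt(2 * math.pi * s * s)

xa, sa, xb, sb = 5.0, 2.0, 2.0, 3.0

x_ab = (xa * sb**2 + xb * sa**2) / (sa**2 + sb**2)    # Eq. (A4.13)
s_ab = sa * sb / math.sqrt(sa**2 + sb**2)             # Eq. (A4.14)
S = gauss(xa, xb, math.sqrt(sa**2 + sb**2))           # Eq. (A4.15)

# Numerical moments of the product fa(x)*fb(x) on a fine grid.
dx = 1e-3
xs = [-20.0 + i * dx for i in range(40000)]
prod = [gauss(x, xa, sa) * gauss(x, xb, sb) for x in xs]
area = sum(p * dx for p in prod)
mean = sum(x * p * dx for x, p in zip(xs, prod)) / area
var = sum((x - mean) ** 2 * p * dx for x, p in zip(xs, prod)) / area

print(round(mean, 3), round(x_ab, 3))             # both 4.077
print(round(math.sqrt(var), 3), round(s_ab, 3))   # both 1.664
print(round(area, 4), round(S, 4))                # both 0.0783
```

The total area under the product curve equals the scaling factor S, which is why the product Gaussian is not normalized.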
Figure A4.1. (a) The product of two Gaussian functions fa(x) (red curve) and fb(x) (green curve) is
another Gaussian function (blue curve) plotted in normalized form. (b) The product of two
identical Gaussian functions (blue curve) is a narrower Gaussian function (green curve), plotted
in normalized form.
The convolution of two distributions was defined in Eq. (4.27). We consider the
convolution of two Gaussian functions fa(xI) and fb(xII) (see Eq. (A4.11)) representing the
stochastic variables xI and xII. Knowing the distributions of the individual variables xI and
xII, we want to determine the distribution function for the joint variable X = xI + xII.
The convolution of the two functions is,
P_2′(X) = ∫_{−∞}^{+∞} f_a(x_I) f_b(X − x_I) dx_I = (1/2πσ_aσ_b) ∫_{−∞}^{+∞} e^{−(x_I − x_a)^2/2σ_a^2} e^{−(X − x_I − x_b)^2/2σ_b^2} dx_I .  (A4.16)
The prime shows that the probability distribution given for X in Eq. (A4.16) is not
normalized. After some algebra, the integral can be calculated. The normalized
distribution for the X variable is,

P_2(X) = (1/(2π(σ_a^2 + σ_b^2))^{1/2}) e^{−[X − (x_a + x_b)]^2/2(σ_a^2 + σ_b^2)} .  (A4.17)
The proof of Eq. (A4.17) involves the use of Fourier transforms of the Gaussian
distributions fa and fb and is given in Bromiley.
The way the convolution formula in Eq. (A4.16) works is illustrated for the case
where {x_I = x_II = 5, σ_I = σ_II = 2} in Figure A4.2. For this case, Eq. (A4.16) becomes,

P_2′(X) = (1/8π) ∫_{−∞}^{+∞} e^{−(x_I − 5)^2/8} e^{−(X − x_I − 5)^2/8} dx_I ,  (A4.18)
For each value of X = x_I + x_II, the integrand is a product of two Gaussian functions, the
first centered at x_I = 5 and the second offset to be centered at x_I = X − 5, as shown
in Figure A4.2. The product Gaussian function in the integrand of Eq. (A4.18) is shown in
blue in Figure A4.2, where the integrand is shown for several values of X. The
product Gaussian function and integrand are largest when X = 10.
From Eq. (A4.17), the normalized probability distribution for the convolution of the
two Gaussian functions is,

P_2(X) = (1/(4π^{1/2})) e^{−(X − 10)^2/16} .  (A4.19)
This implies that it is more probable for X to have values near xI + xII = 10. In other
words, since the individual p1(xI) and p1(xII) distributions are more likely to have values
near their average, the sum value will also not deviate greatly from the sum of the two
averages. It is highly unlikely that the values of x_I and x_II will simultaneously deviate
greatly from their averages, so very large or very small values of X will be less
likely.
In this example, the relative widths of the one-variable distributions are,

σ_a/⟨x_I⟩ = σ_b/⟨x_II⟩ = 2/5 = 0.4 ,  (A4.20)

while the two-variable distribution gives,

σ_X/⟨X⟩ = 2·2^{1/2}/10 = 0.283 = (1/2^{1/2}) (σ_a/⟨x_I⟩) .  (A4.21)
Figure A4.2. The product of the two Gaussian functions fa(x) (red curve) and fb(x) (green curve) is
another Gaussian function, shown in each case by the blue curve. The convolution of the two
distributions for each value of X, P2(X), is related to the area under the blue curve.
Appendix 4.3 The virial equation and the microscopic mechanical view of pressure
We usually think of the laws of mechanics as general, applying to the motion of
masses under all conditions. For systems with periodic motion or for confined systems, a set
of “statistical” laws can relate the time or phase space averages of different quantities.
For an N-particle system, consider the quantity,

G = Σ_{i}^{N} p_i · r_i ,  (A4.22)

called the virial (after the Latin word for force) by Rudolf Clausius in 1860. The reason
for defining this quantity will become clear when we study its properties. The time
derivative of the virial is,
dG/dt = Σ_{i}^{N} (dp_i/dt) · r_i + Σ_{i}^{N} p_i · (dr_i/dt) .  (A4.23)
Using Newton’s second law of motion, the first term is written in terms of the total force
on each particle i,

Σ_{i}^{N} (dp_i/dt) · r_i = Σ_{i}^{N} F_i · r_i .  (A4.24)
The second term can be rewritten in terms of the kinetic energy of the system,

Σ_{i}^{N} p_i · (dr_i/dt) = Σ_{i}^{N} m_i v_i · v_i = Σ_{i}^{N} m_i v_i^2 = 2K .  (A4.25)
The time average of dG/dt in Eq. (A4.23) over a length of time τ is obtained by
integrating the time derivative,

(1/τ) ∫_0^τ (dG/dt) dt = [G(τ) − G(0)]/τ = ⟨Σ_{i}^{N} F_i · r_i⟩ + 2⟨K⟩ .  (A4.26)
There are two sets of conditions under which the left hand side of Eq. (A4.26) is equal to
zero. For systems with periodic motion, at the time of the period τ = t_p, the value of G(τ)
= G(0) and we have,

0 = ⟨Σ_{i}^{N} F_i · r_i⟩ + 2⟨K⟩ .  (A4.27)
For confined systems, as τ → ∞, the values of p_i and r_i remain finite, so G(τ) − G(0) in Eq.
(A4.26) remains finite while the time of sampling can grow infinitely long.
Therefore, for these cases Eq. (A4.27) is also valid, and,

⟨K⟩ = −(1/2) ⟨Σ_{i}^{N} F_i · r_i⟩ .  (A4.28)
Equation (A4.28) is called the virial theorem of classical mechanics and is a rather
amazing result. We will verify its correctness for a few sample periodic systems before
using it to extract an expression for pressure in a microscopic system.
For a harmonic oscillator system with x0 = 0 and mass m, Eq. (A4.28) is written
as,
⟨(1/2) m v^2⟩ = −(1/2)⟨(−kx) x⟩ = ⟨(1/2) k x^2⟩ = ⟨U(x)⟩ .  (A4.29)
This interesting result shows that the average kinetic energy and potential energy of a
harmonic oscillator over a period of the motion are equal. To verify this relation, we use
Eqs. (1.11) and (1.12) explicitly,
⟨(1/2) m v^2⟩ = (1/τ) ∫_0^τ (1/2) m C^2 ω^2 cos^2(ωt + δ) dt = (m C^2 ω^2/4π) ∫_0^{2π} cos^2 φ dφ = m ω^2 C^2/4 = k C^2/4 ,  (A4.30)
⟨(1/2) k x^2⟩ = (1/τ) ∫_0^τ (1/2) k C^2 sin^2(ωt + δ) dt = (k C^2/4π) ∫_0^{2π} sin^2 φ dφ = k C^2/4 ,  (A4.31)

where k = mω^2 and ⟨cos^2⟩ = ⟨sin^2⟩ = 1/2 over a period.
For a mass moving under a Coulombic (or gravitational) force, with potential energy
U(r) = −Q/r, we can use Eq. (A4.28) to obtain the interesting result,

⟨K⟩ = (1/2)⟨(Q/r^2) r⟩ = (1/2)⟨Q/r⟩ = −(1/2)⟨U(r)⟩ ,  (A4.32)
without having to explicitly use the expressions for the speed and position obtained from
the solution of Newton’s equations of motion for this system. Equation (A4.32) describes
the relations between the average kinetic and potential energies for the earth and other
planets in their motion around the sun.
For a confined gas or liquid system, Eq. (A4.28) can be used to derive the
expression for the pressure. From Eq. (4.34), we know that,
⟨K⟩ = ⟨Σ_{i}^{N} (1/2) m_i v_i^2⟩ = (3/2) NkT .  (A4.33)
The force on a molecule in a confined fluid system can have contributions from external
sources, primarily the wall with which the molecule collides, and internal sources, from
the intermolecular interactions. We write the right hand side of Eq. (A4.28) as,

⟨Σ_{i}^{N} F_i · r_i⟩ = ⟨Σ_{i}^{N} (F_{ext,i} + F_{int,i}) · r_i⟩ .  (A4.34)
The external force only applies when molecules are at the surface of the system, through
the agency of the external pressure,

⟨Σ_{i}^{N} F_{ext,i} · r_i⟩ = −∮_S P n · r dA ,  (A4.35)

where n is the outward unit normal to the surface.
Using Gauss’s theorem, which converts the integral over the surface to an integral over
the volume of the confined system, we get,

−∮_S P n · r dA = −P ∫_V (∇ · r) dV = −3PV .  (A4.36)
Substituting Eqs. (A4.33) and (A4.36) into the virial theorem (A4.28) gives,
\[
\frac{3}{2}NkT
= \frac{3}{2}PV - \frac{1}{2}\left\langle \sum_{i=1}^{N}\mathbf{F}_{\mathrm{int},i}\cdot\mathbf{r}_{i}\right\rangle
\qquad (A4.37)
\]
Rearranging Eq. (A4.37) gives the virial equation for the pressure of a fluid system.
\[
PV = NkT + \frac{1}{3}\left\langle \sum_{i=1}^{N}\mathbf{F}_{\mathrm{int},i}\cdot\mathbf{r}_{i}\right\rangle
\]
- Pressure is a scalar quantity in isotropic liquid and gas systems: P = F/A, and the force is the same on a unit area aligned in any direction. In solids, pressure is not necessarily isotropic and should be represented by a tensorial quantity.
- Pressure is affected by intermolecular attractions. Recall the van der Waals equation, where a density-dependent correction to the pressure is introduced.
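In practice, the internal virial sum that enters the pressure equation above is accumulated over pairs rather than particles: for purely pairwise forces, the per-particle sum Σᵢ F_int,i·rᵢ equals the translation-invariant pair form Σ_{i<j} f_ij·r_ij. The sketch below checks this identity for a single configuration of Lennard-Jones particles; the LJ parameters and the configuration are hypothetical illustrative choices, not from the text.

```python
import itertools
import random

# Evaluate the internal virial two ways for one Lennard-Jones configuration:
#   site form:  sum_i F_i . r_i
#   pair form:  sum_{i<j} f_ij . r_ij
# For pairwise forces the two are mathematically identical.
random.seed(1)
eps, sigma = 1.0, 1.0              # illustrative LJ parameters

def lj_force(rij):
    # force on particle i from particle j, where rij = r_i - r_j
    r2 = sum(c * c for c in rij)
    sr6 = (sigma * sigma / r2) ** 3
    fscal = 24 * eps * (2 * sr6 * sr6 - sr6) / r2
    return [fscal * c for c in rij]

# jittered 3x3x3 lattice keeps pair distances reasonable
pos = [[1.5 * i + random.uniform(-0.2, 0.2) for i in cell]
       for cell in itertools.product(range(3), repeat=3)]
N = len(pos)

forces = [[0.0, 0.0, 0.0] for _ in range(N)]
pair_virial = 0.0
for i, j in itertools.combinations(range(N), 2):
    rij = [pos[i][k] - pos[j][k] for k in range(3)]
    f = lj_force(rij)
    for k in range(3):
        forces[i][k] += f[k]
        forces[j][k] -= f[k]       # Newton's third law
    pair_virial += sum(f[k] * rij[k] for k in range(3))

site_virial = sum(forces[i][k] * pos[i][k]
                  for i in range(N) for k in range(3))
print(site_virial, pair_virial)    # identical for purely pairwise forces
```

The pair form is what simulation codes normally use, since it does not depend on the choice of coordinate origin.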
Appendix 4.4 Some useful mathematical relations and integral formulas
Stirling’s Approximation for N!
The factorial of a positive integer N is defined as,
\[
N! = 1\cdot 2\cdot 3\cdots (N-1)\cdot N
\qquad (A4.38)
\]
along with the definition 0! = 1.
For the factorial of very large integers, Stirling’s approximation for ln(N!) can be used,
\[
\ln N! = \sum_{m=1}^{N}\ln m \approx \int_{1}^{N}\ln x\,dx = N\ln N - N
\qquad (A4.39)
\]
A more exact expression for the factorial of large integers is,
\[
\ln N! \approx N\ln N - N + \tfrac{1}{2}\ln(2\pi N)
\qquad (A4.40)
\]
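A quick numerical comparison shows how the two Stirling forms behave. The sketch below evaluates ln N! exactly (via the log-gamma function) for N = 100 and compares it with both approximations.

```python
import math

# Compare ln N! with the Stirling forms of Eqs. (A4.39) and (A4.40) for N = 100.
N = 100
exact = math.lgamma(N + 1)                           # ln N! without overflow
simple = N * math.log(N) - N                         # Eq. (A4.39)
refined = simple + 0.5 * math.log(2 * math.pi * N)   # Eq. (A4.40)
print(exact, simple, refined)
```

The refined form (A4.40) agrees with ln N! to better than one part in ten thousand already at N = 100, while the simple form is off by a few units.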
Exponential integrals
\[
A_{n} = \int_{0}^{\infty} x^{n}e^{-ax}\,dx = \frac{n!}{a^{n+1}}
\qquad \text{for integer } n
\qquad (A4.41)
\]
\[
A_{1/2} = \int_{0}^{\infty} x^{1/2}e^{-ax}\,dx = \frac{\pi^{1/2}}{2a^{3/2}}
\qquad (A4.42)
\]
\[
A_{3/2} = \int_{0}^{\infty} x^{3/2}e^{-ax}\,dx = \frac{3\pi^{1/2}}{4a^{5/2}}
\qquad (A4.43)
\]
\[
A_{5/2} = \int_{0}^{\infty} x^{5/2}e^{-ax}\,dx = \frac{15\pi^{1/2}}{8a^{7/2}}
\qquad (A4.44)
\]
Generally,
\[
A_{n} = \int_{0}^{\infty} x^{n}e^{-ax}\,dx = \frac{\Gamma(n+1)}{a^{n+1}}
\qquad \text{for half-integer } n
\qquad (A4.45)
\]
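These integral formulas are easy to spot-check numerically. The sketch below compares a midpoint-rule quadrature with Γ(n + 1)/a^(n+1) for one integer and one half-integer exponent; a = 1.5 is an arbitrary illustrative choice.

```python
import math

# Numerical spot-check of Eq. (A4.45): A_n = Gamma(n+1)/a^(n+1).
def A(n, a, upper=60.0, m=200_000):
    # midpoint-rule estimate of the integral from 0 to infinity
    # (the integrand is negligible beyond the upper cutoff)
    h = upper / m
    return sum(((i + 0.5) * h) ** n * math.exp(-a * (i + 0.5) * h)
               for i in range(m)) * h

a = 1.5
results = {n: (A(n, a), math.gamma(n + 1) / a ** (n + 1)) for n in (3, 2.5)}
print(results)
```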
where the Γ function is a generalization of the factorial function to non-integer values.
Gaussian integrals
\[
I_{0} = \int_{0}^{\infty} e^{-x^{2}}\,dx = \frac{\pi^{1/2}}{2}
\qquad (A4.46)
\]
\[
I_{n} = \int_{0}^{\infty} x^{n}e^{-ax^{2}}\,dx
= \frac{1\cdot 3\cdot 5\cdots(n-1)}{2^{(n/2)+1}}\,\frac{\pi^{1/2}}{a^{(n+1)/2}}
\qquad n \text{ even}
\qquad (A4.47)
\]
\[
I_{n} = \int_{0}^{\infty} x^{n}e^{-ax^{2}}\,dx
= \frac{\left[(n-1)/2\right]!}{2\,a^{(n+1)/2}}
\qquad n \text{ odd}
\qquad (A4.48)
\]
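The even- and odd-n Gaussian formulas can be verified the same way. The sketch below checks n = 4 and n = 5 against a midpoint-rule quadrature for the illustrative choice a = 2.

```python
import math

# Spot-check of Eqs. (A4.47) and (A4.48) by numerical quadrature, a = 2.
def I(n, a, upper=8.0, m=200_000):
    # midpoint-rule estimate of the integral from 0 to infinity
    h = upper / m
    return sum(((i + 0.5) * h) ** n * math.exp(-a * ((i + 0.5) * h) ** 2)
               for i in range(m)) * h

a = 2.0
# even n = 4: I_4 = (1*3) / 2^(4/2 + 1) * pi^(1/2) / a^(5/2)
even_formula = 3 / 2 ** 3 * math.sqrt(math.pi) / a ** 2.5
even_num = I(4, a)
# odd n = 5: I_5 = ((5-1)/2)! / (2 a^((5+1)/2)) = 2! / (2 a^3)
odd_formula = math.factorial(2) / (2 * a ** 3)
odd_num = I(5, a)
print(even_num, even_formula, odd_num, odd_formula)
```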
Beta function integral
For integer values,
\[
B(m,n) = \int_{0}^{1} x^{m}(1-x)^{n}\,dx = \frac{m!\,n!}{(m+n+1)!}
\qquad (A4.49)
\]
For half-integer values, this integral becomes,
\[
B(m,n) = \int_{0}^{1} x^{m}(1-x)^{n}\,dx = \frac{\Gamma(m+1)\,\Gamma(n+1)}{\Gamma(m+n+2)}
\qquad (A4.50)
\]
For the special case of m = n = ½, Eq. (A4.50) reduces to
\[
B\!\left(\tfrac{1}{2},\tfrac{1}{2}\right)
= \int_{0}^{1} x^{1/2}(1-x)^{1/2}\,dx
= \frac{\left[\Gamma\!\left(\tfrac{3}{2}\right)\right]^{2}}{\Gamma(3)}
= \frac{\pi}{8}
\qquad (A4.51)
\]
For half-integer arguments, the Γ function is given by,
\[
\Gamma\!\left(n+\tfrac{1}{2}\right) = \frac{1\cdot 3\cdot 5\cdots(2n-1)}{2^{n}}\,\pi^{1/2}
\qquad (A4.52)
\]
and for integer values,
\[
\Gamma(n+1) = n!
\qquad (A4.53)
\]
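Both the Beta-function special case and the half-integer Γ formula can be checked numerically. The sketch below evaluates B(½, ½) by quadrature and compares Eq. (A4.52) for n = 3 against the library Gamma function.

```python
import math

# Check Eq. (A4.51), B(1/2, 1/2) = pi/8, by quadrature, and Eq. (A4.52)
# for n = 3 against math.gamma.
m = 1_000_000
h = 1.0 / m
B_half = sum(math.sqrt(((i + 0.5) * h) * (1 - (i + 0.5) * h))
             for i in range(m)) * h

n = 3
# 1*3*...*(2n-1) / 2^n * sqrt(pi) for n = 3
gamma_formula = (1 * 3 * 5) / 2 ** n * math.sqrt(math.pi)
print(B_half, math.pi / 8, gamma_formula, math.gamma(n + 0.5))
```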
Appendix 4.5 Energy distribution for three molecules
To obtain the three-molecule energy distribution, a convolution of the two-molecule energy probability distribution with the one-molecule energy probability distribution is performed,
\[
P_{3}(E_{3})
= \int_{0}^{\infty}\!\!\int_{0}^{\infty}\!\!\int_{0}^{\infty}
P(\varepsilon_{\mathrm{I}})\,P(\varepsilon_{\mathrm{II}})\,P(\varepsilon_{\mathrm{III}})\,
\delta(E_{3}-\varepsilon_{\mathrm{I}}-\varepsilon_{\mathrm{II}}-\varepsilon_{\mathrm{III}})\,
d\varepsilon_{\mathrm{I}}\,d\varepsilon_{\mathrm{II}}\,d\varepsilon_{\mathrm{III}}
= \int_{0}^{E_{3}} P_{2}(E_{3}-\varepsilon_{\mathrm{III}})\,P_{1}(\varepsilon_{\mathrm{III}})\,d\varepsilon_{\mathrm{III}}
\qquad (A4.54)
\]
Substituting the normalized one-molecule energy distribution given in Eq. (4.48) and the two-molecule energy distribution from Eq. (4.52), and evaluating the integral in Eq. (A4.54) using Eq. (A4.50) of Appendix 4.4, gives,
\[
P_{3}(E_{3})
= \frac{e^{-E_{3}/kT}}{\pi^{1/2}(kT)^{9/2}}
\int_{0}^{E_{3}} \varepsilon^{2}(E_{3}-\varepsilon)^{1/2}\,d\varepsilon
= \frac{16}{105\,\pi^{1/2}}\,\frac{1}{(kT)^{9/2}}\,E_{3}^{7/2}\,e^{-E_{3}/kT}
\qquad (A4.55)
\]
From this distribution, the average three-particle energy is,
\[
\langle E_{3}\rangle
= \int_{0}^{\infty} E_{3}\,P_{3}(E_{3})\,dE_{3}
= \frac{16}{105\,\pi^{1/2}(kT)^{9/2}}\int_{0}^{\infty} E_{3}^{9/2}\,e^{-E_{3}/kT}\,dE_{3}
= \frac{16}{105\,\pi^{1/2}(kT)^{9/2}}\cdot\frac{945\,\pi^{1/2}}{32}\,(kT)^{11/2}
= \frac{9}{2}\,kT
\qquad (A4.56)
\]
and the most probable energy is Emax = 7kT/2.
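The three-molecule distribution and its moments can be checked numerically. The sketch below (in units with kT = 1) verifies that the distribution in Eq. (A4.55) is normalized, has mean energy 9kT/2, and peaks at the most probable energy 7kT/2.

```python
import math

# Numerical check of Eqs. (A4.55)-(A4.56) with kT = 1: the three-molecule
# distribution P3(E) = 16/(105 sqrt(pi)) E^(7/2) e^(-E) should be
# normalized, have mean 9/2, and peak at E = 7/2.
kT = 1.0
c = 16 / (105 * math.sqrt(math.pi) * kT ** 4.5)

def P3(E):
    return c * E ** 3.5 * math.exp(-E / kT)

upper, m = 50.0, 300_000           # upper cutoff where P3 is negligible
h = upper / m
xs = [(i + 0.5) * h for i in range(m)]
vals = [P3(E) for E in xs]
norm = sum(vals) * h               # midpoint-rule normalization integral
mean = sum(E * v for E, v in zip(xs, vals)) * h
mode = xs[max(range(m), key=vals.__getitem__)]
print(norm, mean, mode)            # expect ~1, ~4.5 (= 9kT/2), ~3.5 (= 7kT/2)
```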
This procedure can be repeated to get the n-molecule energy distribution for any
number of molecules n from a collection of N molecules.