Continuous Probability Distributions
Ka-fu WONG
23 August 2007
Abstract
In the previous chapter, we noted that many continuous random variables can be approximated by discrete random variables. At the same time, many discrete random variables may be approximated by continuous random variables. Thus, we study continuous probability distributions not just for the sake of understanding continuous random variables, but also for the sake of understanding their approximation to discrete random variables. In many important cases, it turns out that the approximation is good enough, and continuous probability distributions are easier to work with, if we know enough about them. Among continuous probability distributions, the normal distribution is the most important. The normal distribution will be used over and over again in later chapters.
The most difficult part about continuous probability distributions is understanding their connection with the discrete ones. Once this is done, many results about discrete probability distributions can be easily extended to the continuous case.
A continuous random variable can assume an infinite, uncountable number of values within a given range or interval(s). Recall that a discrete random variable can also take an infinite number of values. A continuous random variable differs in that the number of values it can take is uncountable, i.e., it is impossible to list all the values it can take. For instance, it is not possible to list all the real numbers within the interval [0, 1]. The probability distributions associated with continuous random variables are logically called continuous probability distributions.
Example 1 (Continuous random variables): A continuous random variable is a variable that can assume any value in an interval. Some variables are continuous in nature. For example:
1. The thickness of our Microeconomics textbook.
2. The time required to complete a homework assignment.
3. The temperature of a patient.
4. The distance travelled from my home to school.
Other variables are continuous because they are averages of discrete random variables.
1. Gini coefficient, a measure of inequality in an economy.
2. Unemployment rate.
3. Inflation rate.
4. A stock market index, such as the Hang Seng Index, the Dow Jones Industrial Average, the Nasdaq, and the S&P 500.
5. A student’s grade point average.
6. Average age of students in this class.
7. Average weekly working hours of employees.
8. Average hourly salary of students working part-time.
These variables can potentially take on any value, depending only on our ability to measure and report them accurately.
The above examples suggest that averages are better characterized as continuous rather than discrete variables, even if the underlying variables used to calculate them are discrete.
Conceptually, a continuous probability distribution should be very similar to a discrete probability distribution. After all, the two types of probability distributions may be viewed as approximations of each other. The only main difference is that the probability that a continuous random variable takes a specific value is zero. However, the probability that a continuous random variable takes a value within an interval [a, b] can be positive. This difference has led to changes in the definitions and calculations.
1 Features of a Continuous Probability Distribution
Probability distribution may be classified according to the
number of random variables it describes.
Number of random variables Joint distribution
1 Univariate probability distribution
2 Bivariate probability distribution
3 Trivariate probability distribution
... ...
n Multivariate probability distribution
These distributions have similar characteristics. We will discuss these characteristics for the univariate and the bivariate distributions. The extension to the multivariate case is straightforward.
Theorem 1 (Characteristics of a Univariate Continuous Distribution): Suppose the random variable X is defined on the interval between a and b, i.e., X ∈ [a, b]. That is, X can take any value between a and b.
1. The probability that X takes a value in an interval [c, d] is
P(X ∈ [c, d]) = ∫_c^d f(x)dx
where f(x) denotes the probability density function of X. Note that the expression has an interpretation parallel to the discrete case: f(x)dx may be interpreted as the probability, or the area under the density curve, defined in the neighborhood of x, say [x, x + dx].
2. The probability density function f(x) is non-negative and may be larger than 1.
3. P(X ∈ [c, d]) = ∫_c^d f(x)dx is between 0 and 1.00, i.e., in [0, 1].
4. The sum of the probabilities of the various outcomes is 1.00. That is,
P(X ∈ (−∞, ∞)) = P(X ∈ [a, b]) = ∫_a^b f(x)dx = 1
5. Let the events defined on the two non-overlapping intervals, [c1, d1] and [c2, d2], be X ∈ [c1, d1] and X ∈ [c2, d2]. These two events are mutually exclusive. That is,
P(X ∈ [c1, d1] and X ∈ [c2, d2]) = 0,
P(X ∈ [c1, d1] or X ∈ [c2, d2]) = P(X ∈ [c1, d1]) + P(X ∈ [c2, d2]).
The use of integration (∫) in the definition could be terrifying, especially for students who have never studied it before. We are terrified only because we do not know there is a simple connection between the discrete probability distribution and the continuous probability distribution. Let us make the connection below and leave the introduction of integration to a mathematical appendix.
2 Making a connection between discrete and continuous distributions
To understand the difference between the two types of distributions, let's start with a series of questions (from simple to complicated) to which we can easily derive good answers.
2.1 Imagine throwing a dart at [0, 1]
Consider a continuous random variable that is defined over a segment of the line, [0, 1]. We can imagine throwing a dart at the segment, where each point in the segment has an equal chance of being hit by the dart.
1. What is the probability that a dart randomly thrown will end up in the segment [0, 1]? The answer is simple. Since the dart has to land somewhere on [0, 1], the probability of the dart landing on the segment [0, 1] is 1.
2. What is the probability that a dart randomly thrown will end up in the segment [0, 1/2]? The answer is slightly more difficult. Since the dart has an equal chance of landing on any point on [0, 1], the probability of the dart landing on half of the line [0, 1], i.e., the segment [0, 1/2], has to be 1/2.
3. What is the probability that a dart randomly thrown will end up in the segment [0, 1/4]? The answer is slightly more difficult. Since the dart has an equal chance of landing on any point on [0, 1], the probability of the dart landing on a quarter of the line [0, 1], i.e., the segment [0, 1/4], has to be 1/4.
4. What is the probability that a dart randomly thrown will end up in the segment [0, 1/8]? The answer is slightly more difficult. Since the dart has an equal chance of landing on any point on [0, 1], the probability of the dart landing on an eighth of the line [0, 1], i.e., the segment [0, 1/8], has to be 1/8.
5. What is the probability that a dart randomly thrown will end up in the segment [2/8, 3/8]? The answer is slightly more difficult. The length of the segment is 1/8, the same as in the last question. Since the dart has an equal chance of landing on any point on [0, 1], the probability of the dart landing on an eighth of the line [0, 1], i.e., the segment [2/8, 3/8], has to be 1/8.
6. What is the probability that a dart randomly thrown will end up in the segment [2/8, 3/8] AND [5/8, 6/8]? The answer is not difficult. We have two non-overlapping segments of equal length. It is impossible for any throw to land on both of these two non-overlapping segments. Thus, the probability should be 0.
7. What is the probability that a dart randomly thrown will end up in the segment [2/8, 3/8] OR [5/8, 6/8]? The answer is slightly more difficult. We have two non-overlapping segments of equal length. The probability should equal the sum of the probability of landing on the segment [2/8, 3/8] and the probability of landing on the segment [5/8, 6/8]. That is, it should be 1/8 + 1/8 = 2/8.
8. What is the probability that a dart randomly thrown will end up in the segment [0, k], where k < 1? The answer is slightly more difficult. As we learned from the previous discussions, there is a 1/2 probability of the dart landing on the segment [0, 1/2], 1/4 on the segment [0, 1/4], and 1/8 on the segment [0, 1/8]. It is not too difficult to induce that the probability of the dart landing on the segment [0, k] is simply k.
9. What is the probability that a dart randomly thrown will end
up in the segment [k1, k2], where k1 < k2?
The answer is slightly more difficult. It is not too difficult
to induce that the probability of having the
dart landing on the segment [k1, k2] is simply k2 − k1.
10. What is the probability that a dart randomly thrown will end
up in the segment [k1, k2] AND [k3, k4],
where k1 < k2 < k3 < k4? The answer is not difficult.
We have two non-overlapping segments. It is
impossible for any throw to land on both of these two
non-overlapping segments. Thus, the probability
should be 0.
11. What is the probability that a dart randomly thrown will end up in the segment [k1, k2] OR [k3, k4], where k1 < k2 < k3 < k4? The answer is not difficult. We have two non-overlapping segments. The probability should equal the sum of the probability of landing on the segment [k1, k2] and the probability of landing on the segment [k3, k4]. That is, it should be (k2 − k1) + (k4 − k3).
12. What is the probability that a dart randomly thrown will end up exactly at the single point 2/3? The answer is slightly more difficult. There are uncountably many points on the entire segment [0, 1]. The probability of the dart ending up at the point 2/3 is like the probability of the dart ending up in an interval with zero length. That is, it should be zero.
13. What is the probability that a dart randomly thrown will end
up exactly at one of the two points, 1/3
or 2/3? The answer is slightly more difficult. Since the two
points are non-overlapping, the probability
should be the sum of the individual ones, i.e., 0 = 0 + 0.
14. What is the probability that a dart randomly thrown will end up exactly at one of the 99 points, 1/100, 2/100, ..., or 99/100? The answer is simple. Since the 99 points are non-overlapping, the probability should be the sum of the individual ones, i.e., 0.
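The answers above can be checked by simulation. The following sketch (in Python; the function name dart_probability is our own, and we assume the dart lands uniformly on [0, 1]) estimates each probability by throwing many random darts:

```python
import random

def dart_probability(k1, k2, trials=100_000, seed=0):
    """Estimate P(dart lands in [k1, k2]) for a dart thrown
    uniformly at the segment [0, 1], by Monte Carlo simulation."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if k1 <= rng.random() <= k2)
    return hits / trials

print(dart_probability(0, 1/2))    # close to 1/2
print(dart_probability(2/8, 3/8))  # close to 1/8
print(dart_probability(2/3, 2/3))  # a single point: essentially 0
```

The estimates approach the segment lengths as the number of throws grows, and the chance of hitting any single point exactly is zero in practice, matching the answers derived above.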
2.2 Imagine throwing a dart at [a, b]
Let's repeat the above questions and answers with a slight change. Consider a continuous random variable that is defined over a segment of the line, [a, b], where a < b. We can imagine throwing a dart at the segment, where each point in the segment has an equal chance of being hit by the dart.
1. What is the probability that a dart randomly thrown will end up in the segment [a, b]? The answer is simple. Since the dart has to land somewhere on [a, b], the probability of the dart landing on the segment [a, b] is 1.
2. What is the probability that a dart randomly thrown will end up in the segment [a, a + (1/2)(b − a)]? The answer is slightly more difficult. Since the dart has an equal chance of landing on any point on [a, b], the probability of the dart landing on half of the line [a, b], i.e., the segment [a, a + (1/2)(b − a)], has to be 1/2.
3. What is the probability that a dart randomly thrown will end up in the segment [a, a + (1/4)(b − a)]? The answer is slightly more difficult. Since the dart has an equal chance of landing on any point on [a, b], the probability of the dart landing on a quarter of the line [a, b], i.e., the segment [a, a + (1/4)(b − a)], has to be 1/4.
4. What is the probability that a dart randomly thrown will end up in the segment [a, a + (1/8)(b − a)]? The answer is slightly more difficult. Since the dart has an equal chance of landing on any point on [a, b], the probability of the dart landing on an eighth of the line [a, b], i.e., the segment [a, a + (1/8)(b − a)], has to be 1/8.
5. What is the probability that a dart randomly thrown will end up in the segment [a + (2/8)(b − a), a + (3/8)(b − a)]? The answer is slightly more difficult. The length of the segment is (1/8)(b − a), the same as in the last question. Since the dart has an equal chance of landing on any point on [a, b], the probability of the dart landing on an eighth of the line [a, b], i.e., the segment [a + (2/8)(b − a), a + (3/8)(b − a)], has to be 1/8.
6. What is the probability that a dart randomly thrown will end up in the segment [a + (2/8)(b − a), a + (3/8)(b − a)] AND [a + (5/8)(b − a), a + (6/8)(b − a)]? The answer is not difficult. We have two non-overlapping segments of equal length. It is impossible for any throw to land on both of these two non-overlapping segments. Thus, the probability should be 0.
7. What is the probability that a dart randomly thrown will end up in the segment [a + (2/8)(b − a), a + (3/8)(b − a)] OR [a + (5/8)(b − a), a + (6/8)(b − a)]? The answer is slightly more difficult. We have two non-overlapping segments of equal length. The probability should equal the sum of the probability of landing on the segment [a + (2/8)(b − a), a + (3/8)(b − a)] and the probability of landing on the segment [a + (5/8)(b − a), a + (6/8)(b − a)]. That is, it should be 1/8 + 1/8 = 2/8.
8. What is the probability that a dart randomly thrown will end up in the segment [a, k], where a < k < b? The answer is slightly more difficult. As we learned from the previous discussions, it is not too difficult to induce that the probability of the dart landing on the segment [a, k] is simply (k − a)/(b − a).
9. What is the probability that a dart randomly thrown will end
up in the segment [k1, k2], where k1 < k2?
The answer is slightly more difficult. It is not too difficult
to induce that the probability of having the
dart landing on the segment [k1, k2] is simply (k2 − k1)/(b−
a).
10. What is the probability that a dart randomly thrown will end
up in the segment [k1, k2] AND [k3, k4],
where k1 < k2 < k3 < k4? The answer is not difficult.
We have two non-overlapping segments. It is
impossible for any throw to land on both of these two
non-overlapping segments. Thus, the probability
should be 0.
11. What is the probability that a dart randomly thrown will end up in the segment [k1, k2] OR [k3, k4], where k1 < k2 < k3 < k4? The answer is not difficult. We have two non-overlapping segments. The probability should equal the sum of the probability of landing on the segment [k1, k2] and the probability of landing on the segment [k3, k4]. That is, it should be [(k2 − k1) + (k4 − k3)]/(b − a).
12. What is the probability that a dart randomly thrown will end up exactly at a single point k1? The answer is slightly more difficult. There are uncountably many points on the entire segment [a, b]. The probability of the dart ending up at the point k1 is like the probability of the dart ending up in an interval with zero length. That is, it should be zero.
13. What is the probability that a dart randomly thrown will end up exactly at one of the two points, k1 or k2? The answer is slightly more difficult. Since the two points are non-overlapping, the probability should be the sum of the individual ones, i.e., 0 = 0 + 0.
14. What is the probability that a dart randomly thrown will end up exactly at one of the 99 points, k1, k2, ..., or k99? The answer is simple. Since the 99 points are non-overlapping, the probability should be the sum of the individual ones, i.e., 0.
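As in Section 2.1, a quick simulation (Python; dart_probability_ab is our own name, assuming a uniformly thrown dart) confirms the general answer (k2 − k1)/(b − a):

```python
import random

def dart_probability_ab(k1, k2, a, b, trials=100_000, seed=0):
    """Estimate P(dart lands in [k1, k2]) for a dart thrown
    uniformly at the segment [a, b]."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if k1 <= rng.uniform(a, b) <= k2)
    return hits / trials

# The derivation above predicts (14 - 12)/(20 - 10) = 0.2.
print(dart_probability_ab(12, 14, 10, 20))  # close to 0.2
```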
2.3 Deriving the probability density function (pdf)
Based on what we know about discrete probability distributions, it might appear that there is an inconsistency between the notions P(x ∈ [k1, k2]) = (k2 − k1)/(b − a) and P(x = k) = 0. Can we write P(x ∈ [k1, k2]) as a sum of probabilities of individual events? Yes, with the introduction of the density concept. That is, we would like to define a density c such that P(x ∈ [a, k]) = (k − a) × c. What would c be? We know that c must also satisfy P(x ∈ [a, b]) = (b − a) × c = 1, implying c = 1/(b − a). It is not too difficult to check that the density so defined will generate results that are consistent with the discussions above. In particular, P(x ∈ [a, k]) = (k − a) × c = (k − a)/(b − a), P(x ∈ [k1, k2]) = (k2 − k1)/(b − a), and P(x = k) = P(x ∈ [k, k]) = (k − k)/(b − a) = 0.
What we have just discussed is the so-called uniform distribution over the interval [a, b].
Definition 1 (Uniform distribution): If a and b are numbers on the real line, the random variable X ∼ U(a, b), i.e., has a uniform distribution, if the density function is
f(x) =
  1/(b − a)  for a ≤ x ≤ b
  0          otherwise
and the cumulative distribution function (cdf) is
F(x) = Prob(X < x) =
  0                for x ≤ a
  (x − a)/(b − a)  for a ≤ x ≤ b
  1                for b ≤ x
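Definition 1 translates directly into code. A minimal sketch (Python; the function names are ours):

```python
def uniform_pdf(x, a, b):
    """Density f(x) of U(a, b): 1/(b - a) on [a, b], 0 otherwise."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """Cumulative distribution F(x) = Prob(X < x) of U(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

# P(X in [k1, k2]) = F(k2) - F(k1) for X ~ U(0, 1):
print(uniform_cdf(0.75, 0, 1) - uniform_cdf(0.25, 0, 1))  # 0.5
```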
Recall that in the discrete case, the expected value of a random variable X with probability mass function P(X) is
E(X) = Σ_X X P(X)
In the continuous case, P(X = k) = 0, so how do we compute the expected value? We can use an approximation. Since P(X ∈ [k, k + dx]) > 0, it is possible to imagine the relevant segment divided into many smaller intervals of length dx ([a, a + dx], [a + dx, a + 2dx], ..., [a + ((b − a)/dx − 1)dx, a + ((b − a)/dx)dx]), and obtain an approximation with the formula above by replacing P(X = k) with P(X ∈ [k, k + dx]), and replacing X with one of three possibilities:
1. the lower bound of [k, k + dx], i.e., k:
E(X) = Σ_{i=1}^{(b−a)/dx} (a + (i − 1)dx) P(X ∈ [a + (i − 1)dx, a + i·dx]) = Σ_{i=1}^{(b−a)/dx} (a + (i − 1)dx) × c × dx
2. the upper bound of [k, k + dx], i.e., k + dx:
E(X) = Σ_{i=1}^{(b−a)/dx} (a + i·dx) P(X ∈ [a + (i − 1)dx, a + i·dx]) = Σ_{i=1}^{(b−a)/dx} (a + i·dx) × c × dx
3. the mid-point of [k, k + dx], i.e., k + dx/2:
E(X) = Σ_{i=1}^{(b−a)/dx} (a + (i − 1/2)dx) P(X ∈ [a + (i − 1)dx, a + i·dx]) = Σ_{i=1}^{(b−a)/dx} (a + (i − 1/2)dx) × c × dx
The accuracy of such an approximation depends on dx. Generally, the smaller dx is, the more accurate the approximation. To obtain a more accurate approximation, we can imagine dx shrinking towards zero. In this case the number of terms being added expands to infinity:
E(X) = lim_{dx→0} Σ_{i=1}^{(b−a)/dx} (a + (i − 1)dx) × c × dx = lim_{dx→0} Σ_{i=1}^{(b−a)/dx} x_i × c × dx = ∫_a^b x·c dx
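The limiting argument can be illustrated numerically. The sketch below (Python; the function name is ours) computes the lower-bound Riemann sum for X ∼ U(a, b) with a small dx; as dx shrinks, the sum approaches ∫_a^b x·c dx = (a + b)/2:

```python
def expected_value_uniform(a, b, n=100_000):
    """Approximate E(X) for X ~ U(a, b) by the left-endpoint sum
    of (a + (i - 1) dx) * c * dx over i = 1, ..., n."""
    dx = (b - a) / n
    c = 1.0 / (b - a)
    # range(n) gives i - 1 = 0, ..., n - 1 directly.
    return sum((a + i * dx) * c * dx for i in range(n))

print(expected_value_uniform(0, 1))    # close to 0.5
print(expected_value_uniform(10, 20))  # close to 15
```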
[Figure: the uniform density y = f(x) = c = 1/(b − a) over [a, b], with the interval divided into subintervals [x_i, x_i + dx].]
In the expression above, we are really using the integral sign to stand for the limit of the sum:
lim_{dx→0} Σ_{i=1}^{(b−a)/dx} = ∫_a^b
In the above discussion, we considered only the case where the dart lands on an interval [a, b] with equal chance, i.e., the uniform distribution. The discussion can be easily extended to other situations by subdividing the interval [a, b] into many sub-intervals and holding the density constant within each sub-interval. Letting each sub-interval be of length dx as before, we have the intervals [a, a + dx], [a + dx, a + 2dx], ..., [a + ((b − a)/dx − 1)dx, a + ((b − a)/dx)dx], and the corresponding densities in the intervals can be labelled f(x_1), ..., f(x_n), where n = (b − a)/dx. With this information, we can answer many questions similar to those discussed earlier. Again, in the limiting case with dx approaching zero, the probability is defined as
P(x ∈ [k1, k2]) = lim_{dx→0} Σ_{[k1,k2]} f_i dx = ∫_{k1}^{k2} f(x)dx
More generally, we can define the cumulative distribution function (cdf) and use the cdf to compute the probability.
Definition 2 (Cumulative distribution function (cdf)): Let f(x) be the pdf of a continuous random variable X. The cumulative distribution function (cdf) is
F(x) = P(X < x) = ∫_{−∞}^{x} f(t)dt
P(x ∈ [k1, k2]) = ∫_{k1}^{k2} f(x)dx = ∫_{−∞}^{k2} f(x)dx − ∫_{−∞}^{k1} f(x)dx = F(k2) − F(k1)
Given the cdf F(x), we can also derive the pdf f(x):
f(x) = (d/dx) F(x)
where d/dx means "differentiate with respect to x".
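Definition 2 suggests two small utilities, sketched below in Python (the function names are ours): probabilities from any cdf via F(k2) − F(k1), and the pdf recovered from the cdf by numerical differentiation.

```python
def prob_from_cdf(F, k1, k2):
    """P(X in [k1, k2]) = F(k2) - F(k1) for any cdf F."""
    return F(k2) - F(k1)

def pdf_from_cdf(F, x, h=1e-6):
    """Recover the pdf f(x) = dF/dx by a central difference."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Check against the uniform distribution on [0, 1]:
F = lambda x: min(max(x, 0.0), 1.0)
print(prob_from_cdf(F, 0.25, 0.75))  # 0.5
print(pdf_from_cdf(F, 0.5))          # close to f(0.5) = 1
```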
We can check that such definitions are consistent at least with the uniform distribution. In fact, they hold for any continuous probability distribution.
The above discussion suggests that the density function defines the probability distribution in the continuous case.
Example 2 (Part-time Work on Campus): A student has been offered
part-time work in a
laboratory. The professor says that the work will vary from week
to week. The number of hours
will be between 10 and 20 with a uniform probability density
function:
1. How tall is the rectangle?
2. What is the probability of getting less than 15 hours in a
week?
3. Given that the student gets at least 15 hours in a week, what
is the probability that more
than 17.5 hours will be available?
[Figure: the uniform density y = f(x) = c = 0.1 over the interval [10, 20].]
Because the probability is uniformly distributed, the pdf can be illustrated as a rectangle, and the height of the rectangle is the uniform density, i.e., 1/(20 − 10) = 0.1. If we employ the same letters we used previously, we have a = 10, b = 20, and c = 0.1. The pdf is
f(x) =
  1/(20 − 10) = 0.1  for 10 ≤ x ≤ 20
  0                  otherwise
and the cdf is
F(x) = Prob(X < x) =
  0                   for x ≤ 10
  (x − 10)/(20 − 10)  for 10 ≤ x ≤ 20
  1                   for 20 ≤ x
Thus Prob(X < 15) = (15 − 10)/(20 − 10) = 0.5, and
Prob(X > 17.5 | X > 15) = Prob(X > 17.5 and X > 15)/Prob(X > 15) = Prob(X > 17.5)/Prob(X > 15) = [(20 − 17.5)/(20 − 10)]/[(20 − 15)/(20 − 10)] = 0.25/0.5 = 0.5.
Remember that the probability for a continuous random variable to equal an exact value is always zero. Therefore we never need to worry about the boundaries of intervals, i.e., whether to use ">" or "≥".
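The three answers of Example 2 can be reproduced in a few lines (Python sketch; hours_cdf is our own name):

```python
def hours_cdf(x, a=10, b=20):
    """Cdf of weekly hours X ~ U(10, 20)."""
    return min(max((x - a) / (b - a), 0.0), 1.0)

print(1 / (20 - 10))  # height of the rectangle: 0.1
print(hours_cdf(15))  # P(X < 15) = 0.5
# P(X > 17.5 | X > 15) = P(X > 17.5) / P(X > 15)
print((1 - hours_cdf(17.5)) / (1 - hours_cdf(15)))  # 0.5
```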
Example 3 (Customer Complaints): You are the manager of the complaint department for a large mail order company. Your data and experience indicate that the time it takes to handle a single call, denoted T, ranges from 0 to 15 minutes and has a right-triangle-shaped probability density function, with a height of 2/15 at T = 0.
1. Show that the area under the triangle is 1.
2. Find the probability that a call will take longer than 10
minutes. That is, find P (T > 10).
3. Given that the call takes at least 5 minutes, what is the
probability that it will take longer
than 10 minutes? That is, find P (T > 10|T > 5).
4. Find P (T < 10).
We know the area under the pdf curve is the probability, and it must be 1 because the outcomes from 0 to 15 cover all possibilities. The pdf should look as follows if we believe the company tries to handle all calls as soon as possible:
[Figure: the triangular density y = f(t), falling linearly from 2/15 at t = 0 to 0 at t = 15.]
P(T > 10) is the area under the pdf curve where 10 < t < 15. Basic geometry gives that area as 1/9, i.e., P(T > 10) = 1/9 ≈ 0.1111.
P(T > 10 | T > 5) = P(T > 10 and T > 5)/P(T > 5) = P(T > 10)/P(T > 5) = (1/9)/(4/9) = 0.25.
P(T < 10) = 1 − P(T > 10) = 1 − 1/9 = 8/9 ≈ 0.8889.
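The geometric answers can be double-checked by numerical integration of the triangular density. A sketch (Python; the function names are ours, and f assumes the density falls linearly from 2/15 at t = 0 to 0 at t = 15):

```python
def f(t):
    """Assumed triangular density: 2/15 at t = 0, falling
    linearly to 0 at t = 15."""
    return (2 / 15) * (1 - t / 15) if 0 <= t <= 15 else 0.0

def integrate(g, lo, hi, n=100_000):
    """Midpoint-rule numerical integration of g over [lo, hi]."""
    dt = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * dt) for i in range(n)) * dt

print(integrate(f, 0, 15))   # total area: close to 1
print(integrate(f, 10, 15))  # P(T > 10): close to 1/9
print(integrate(f, 10, 15) / integrate(f, 5, 15))  # P(T>10 | T>5): 0.25
```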
3 Normal distributions
The most popular distribution we will use after this chapter is the normal distribution. It pays to understand it very well.
Definition 3 (Normal distribution): The random variable X ∼ N(µ, σ²), i.e., has a normal distribution, if the random variable is defined on the whole real line (−∞, ∞) and has the density function
f(x) = (1/(σ√(2π))) e^(−z²/2),  where z = (x − µ)/σ
and where µ and σ are the mean and standard deviation of the random variable (hence σ² is the variance), π = 3.14159..., and e = 2.71828... is the base of natural or Napierian logarithms.
Normal distributions are characterized by their mean and variance. Often, a normal random variable X, distributed as normal with mean µ and variance σ², will be denoted as
X ∼ N(µ, σ²)
where "∼" reads "distributed as".
The normal distribution has several main characteristics:
1. It is bell-shaped and single-peaked (unimodal) at the exact center of the distribution, µ.
2. It is symmetrical about its mean. The arithmetic mean, median, and mode of the distribution are equal and located at the peak. Thus half the area under the curve is above the mean and half is below it.
3. The normal probability distribution is asymptotic. That is, the curve gets closer and closer to the X-axis but never actually touches it.
[Figure: normal density curves N(0, 0.3), N(0, 0.5), N(0, 1), N(0, 2), N(−1, 0.5), and N(3, 0.5), plotted for x between −5 and 5.]
Given the density function described above, and unlike the uniform distribution, it is not easy to integrate to obtain the probability of a normal random variable lying within a segment. It is not too difficult to get Excel to do the calculation, even for different combinations of mean and variance. However, many years ago, computational power was limited. Statisticians worked very hard to come up with a table from which people can easily read off the cumulative distribution function of a normal random variable, i.e., P(X < x). Do we need a table for each combination of µ and σ²? It turns out that all normal random variables can be transformed to standard normal random variables easily. Thus, for those who know how to do the transformation, we only need one table, the standard normal table.
Definition 4 (Standard Normal distribution): A standard normal random variable is a normal random variable with zero mean (µ = 0) and unit standard deviation (σ = 1). Its probability density is defined on the real line (from −∞ to ∞):
f(x) = (1/√(2π)) e^(−x²/2)
[Figure: the pdf and cdf of the standard normal distribution, plotted for x between −2.5 and 2.5.]
The standard normal distribution is sometimes known as
z-distribution.
Theorem 2 (Transform to Standard Normal Distribution): A linear transformation of a normal random variable will remain normal. In particular, any normal random variable with mean µ and standard deviation σ can be transformed to a standard normal random variable:
Z = (X − µ)/σ
Thus, P(X ∈ [a, b]) = P(Z ∈ [(a − µ)/σ, (b − µ)/σ]). With this property, we do not need separate probability tables for different µ and σ. Instead, we only need one table, the standard normal table. Although we can easily calculate the probability of a normal random variable with any combination of mean and variance lying in an interval with the help of a computer, we need to learn how to use the tables in exams.
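When a computer is available, the transformation and the standard normal cdf can be coded with the error function from Python's standard library (the function names phi and normal_prob are ours):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cdf, P(Z < z), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_prob(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2), by transforming to Z."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

# Any normal probability reduces to a standard normal one:
print(normal_prob(1800, 2200, 2000, 200))  # P(-1 < Z < 1), about 0.6827
```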
Example 4 (How to read the standard normal distribution table): Suppose we have the following part of a standard normal distribution table, accompanied by a graph:
z ... 0.03 0.04 0.05 0.06 0.07 ...
... ... ... ... ... ... ... ...
0.7 ... 0.2673 0.2704 0.2734 0.2764 0.2794 ...
0.8 ... 0.2967 0.2995 0.3023 0.3051 0.3078 ...
0.9 ... 0.3238 0.3264 0.3289 0.3315 0.334 ...
1 ... 0.3485 0.3508 0.3531 0.3554 0.3577 ...
1.1 ... 0.3708 0.3729 0.3749 0.377 0.379 ...
... ... ... ... ... ... ... ...
[Figure: the standard normal curve with the area between 0 and z shaded.]
Let's first make clear what this table means. The leftmost column and the top row combine to make the number we want, i.e., the z in the graph. The inner part of the table is the resulting probability, i.e., the shaded area in the graph. For example, suppose we want to know Prob(0 < X < 0.84); we just find the row for 0.8 and the column for 0.04, and then read the number in that cell, i.e., 0.2995. So we have Prob(0 < X < 0.84) = 0.2995.
Try the following:
1. Prob(0 < X < 1.163)
This is the same as our previous example. Because the table gives only approximate values, we have to round 1.163 to the nearest hundredth, 1.16 (i.e., we assume Prob(0 < X < 1.163) ≈ Prob(0 < X < 1.16)). Then we just look the value up from the table at row 1.1 and column 0.06, which is simply 0.377.
2. Prob(X > 0.77)
First remember that one of the very nice properties of the normal distribution is its symmetry, i.e., Prob(X > µ) = Prob(X < µ) = 0.5. This tells us that to get Prob(X > 0.77) we need just one more step. We first find that Prob(0 < X < 0.77) = 0.2794. Then Prob(X > 0.77) = 0.5 − 0.2794 = 0.2206.
3. Prob(0.77 < X < 1.16)
We should have no trouble getting Prob(0 < X < 0.77) = 0.2794 and Prob(0 < X < 1.16) = 0.377, as we have done already. Then Prob(0.77 < X < 1.16) is simply their difference: Prob(0.77 < X < 1.16) = 0.377 − 0.2794 = 0.0976.
4. Prob(−0.85 < X < 0)
Symmetry helps again here, telling us that Prob(−0.85 < X < 0) = Prob(0 < X < 0.85). We have no problem finding Prob(0 < X < 0.85), which is at row 0.8 and column 0.05: 0.3023.
5. Prob(X < −1.16)
We apply symmetry again to get Prob(X < −1.16) = Prob(X > 1.16) = 0.5 − Prob(0 < X < 1.16) = 0.123.
6. Prob(−0.85 < X < 0.77)
Since the events that X falls into (−0.85, 0) and into (0, 0.77) are mutually exclusive, we know Prob(−0.85 < X < 0.77) = Prob(−0.85 < X < 0) + Prob(0 < X < 0.77) = 0.3023 + 0.2794 = 0.5817.
7. Sometimes we want to use the table in reverse, to find a particular z instead of the probability. Suppose Prob(−k < X < k) = 0.75, (k > 0). What is k here?
Considering the symmetry of the normal distribution, 0.75 = Prob(−k < X < k) = 2 × Prob(0 < X < k), i.e., Prob(0 < X < k) = 0.375. Therefore we look for 0.375 (or the number closest to 0.375) in the table. We find at row 1.1 and column 0.05 the probability 0.3749. Therefore a reasonably precise result would be k = 1.15.
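The table entries themselves can be reproduced with the error function, which is a handy way to check lookups (Python sketch; table_value is our own name):

```python
from math import erf, sqrt

def table_value(z):
    """Prob(0 < Z < z), the quantity tabulated in the standard
    normal table, computed from the error function."""
    return 0.5 * erf(z / sqrt(2))

print(table_value(0.84))  # close to the table entry 0.2995
print(table_value(0.77))  # close to the table entry 0.2794
print(table_value(1.16))  # close to the table entry 0.3770
```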
Recall that all normal distributions with different µ and σ, i.e., different X ∼ N(µ, σ²), can be transformed to standard normal through a simple linear transformation:
Z = (X − µ)/σ
Thus the standard normal distribution table helps us much more than the aforementioned. See the following further examples:
1. Given X ∼ N(1, 0.25), what is Prob(0 < X < 1.5)?
Transforming X to the standard normal Z, we have Prob(0 < X < 1.5) = Prob((0 − 1)/0.5 < Z < (1.5 − 1)/0.5) = Prob(−2 < Z < 1). We can calculate this through the above exercises. Note that the table given in this example is not enough, and a full table covering −2 and 1 must be used. The answer should be close to Prob(0 < X < 1.5) = 0.8185, as different tables report at different accuracy.
2. Given X ∼ N(−2, 4), and Prob(−1 < X < k) = 0.10. What is k?
Again we do the standard normal transformation first. The lower boundary −1 transforms to (−1 − (−2))/2 = 0.5; the upper boundary k transforms to (k + 2)/2. Thus
Prob(−1 < X < k) = Prob(0.5 < Z < (k + 2)/2) = Prob(0 < Z < (k + 2)/2) − Prob(0 < Z < 0.5) = 0.1
or
Prob(0 < Z < (k + 2)/2) = 0.1 + Prob(0 < Z < 0.5)
= 0.1 + 0.1915
= 0.2915
Then we look for the number closest to 0.2915 in a full standard normal table. At ordinary accuracy requirements, (k + 2)/2 = 0.81 would be good enough, or k = 2 × 0.81 − 2 = −0.38.
Example 5 (Normal distribution): Suppose you work in Quality
Control for GE. Light bulb life
has a normal distribution with µ = 2000 hours and σ = 200
hours.
1. What is the probability that a bulb will last between 2000 and 2400 hours?
P(2000 < X < 2400) = P[(2000 − µ)/σ < (X − µ)/σ < (2400 − µ)/σ]
= P[0 < (X − µ)/σ < (2400 − µ)/σ]
= P[0 < Z < 2]
= 0.4772
2. What is the probability that a bulb will last less than 1470 hours?
P(X < 1470) = P[(X − µ)/σ < (1470 − µ)/σ]
= P[Z < −2.65]
= P[Z > 2.65]
= 0.5 − P[0 < Z < 2.65]
= 0.5 − 0.4960
= 0.0040
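Both answers can be verified without a table using the standard normal cdf built from the error function (Python sketch; phi is our own name):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cdf, P(Z < z), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 2000, 200  # bulb life parameters from the example

print(phi(2.0) - phi(0.0))       # P(0 < Z < 2), about 0.4772
print(phi((1470 - mu) / sigma))  # P(Z < -2.65), about 0.0040
```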
Example 6 (Normal distribution): The daily water usage per
person in New Providence, New
Jersey is normally distributed with a mean of 20 gallons and a
standard deviation of 5 gallons.
1. About 68 percent of those living in New Providence will use
how many gallons of water?
Note that we know that P (µ − σ < X < µ + σ) = 0.6826.
Thus, about 68% of the daily
water usage will lie between 15 (µ− σ) and 25 (µ + σ)
gallons.
2. What is the probability that a person from New Providence
selected at random will use
between 20 and 24 gallons per day?
P (20 < X < 24) = P [(20− 20)/5 < (X − 20)/5 < (24−
20)/5] = P [0 < Z < 0.8]
The area under a normal curve between a z-value of 0 and a
z-value of 0.80 is 0.2881. We
conclude that 28.81 percent of the residents use between 20 and
24 gallons of water per day.
3. What percent of the population use between 18 and 26 gallons
of water per day?
P (18 < X < 26) = P [(18− 20)/5 < (X − 20)/5 < (26−
20)/5]
= P (−0.4 < Z < 1.2)
= P (−0.4 < Z < 0) + P (0 < Z < 1.2)
= P (0 < Z < 0.4) + P (0 < Z < 1.2)
= 0.1554 + 0.3849
= 0.5403
Example 7 (Normal distribution): Professor Wong has determined that the scores in his statistics course are approximately normally distributed with a mean of 72 and a standard deviation of 5. He announces to the class that the top 15 percent of the scores will earn an A. What is the lowest score a student can earn and still receive an A?
To begin, let k be the score that separates an A from a B. If 15 percent of the students score more than k, then 35 percent must score between the mean of 72 and k.
1. Write down the relation between k and the probability: P (X
> k) = 0.15 and P (X < k) =
1− P (X > k) = 0.85
2. Transform X into z:
P[(X − 72)/5 < (k − 72)/5] = P[Z < (k − 72)/5]
3. We look for s = (k − 72)/5 such that
P [0 < Z < s] = 0.85− 0.5 = 0.35
From the standard normal table, we find s = 1.04:
P[0 < Z < 1.04] = 0.35
4. Compute k: (k − 72)/5 = 1.04 implies k = 77.2
Thus, those with a score of 77.2 or more earn an A.
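Step 3's table lookup can be replaced by a single inverse-CDF call. A sketch with `statistics.NormalDist`, using the mean and standard deviation from the example:

```python
from statistics import NormalDist

scores = NormalDist(mu=72, sigma=5)

# The top 15 percent earn an A, so the cutoff k is the 85th percentile.
k = scores.inv_cdf(0.85)
print(round(k, 1))  # about 77.2
```

`inv_cdf` plays the role of Excel's NORMINV: it returns the value whose CDF equals the given probability.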
Example 8 (Stock returns): Let X be daily stock returns
(percentage change of daily stock
prices). Suppose X is distributed as normal with mean 0 and
variance 4, i.e., X ∼ N(0, 4).
What is the probability that the daily stock returns will lie
between -2 and +2?
First make the linear transformation from X ∼ N(0, 4) to the standard normal Z:

P(−2 < X < 2) = P[(−2 − 0)/√4 < Z < (2 − 0)/√4] = P(−1 < Z < 1)
and because of the symmetry of normal distribution,
P (−2 < X < 2) = 2P (0 < Z < 1)
Then we can easily find the probability from the standard normal table:
P (−2 < X < 2) = 2× 0.3413 = 0.6826
The probability that the daily stock returns will lie between -2
and +2 is 0.6826.
Example 9 (Personal income): Let X be monthly personal income
(in dollars). Suppose
log(X) is distributed as normal with mean 9 and variance 16,
i.e., log(X) ∼ N(9, 16). What is
the probability that the monthly personal income of a randomly
drawn person will be less than
5000? Is there a reason we assume log(X) instead of X to follow
a normal distribution?
For simplicity, let us denote Y = log(X) as a new random variable; we know Y follows a normal distribution N(9, 16). Because Y = log(X) is a monotonically increasing function (i.e., Y increases as X increases), the event X < 5000 is equivalent to the event Y < log(5000) = 8.5172.
Next we can transform Y to the standard normal Z:

P(Y < 8.5172) = P[Z < (8.5172 − 9)/√16 = −0.12] = 0.5 − P(0 < Z < 0.12)

Looking up the standard normal table, the probability is 0.5 − 0.0478 = 0.4522. As for why we assume log(X) rather than X to be normal: personal income is non-negative and typically right-skewed, so its logarithm is usually much closer to a normal distribution than income itself.
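The same log-normal calculation can be verified directly. A sketch with the standard library (note that sigma is the standard deviation of Y, i.e. √16 = 4):

```python
from math import log
from statistics import NormalDist

# Y = log(X) ~ N(9, 16), so the standard deviation of Y is 4.
Y = NormalDist(mu=9, sigma=4)

# P(X < 5000) = P(Y < log(5000)), since log is monotonically increasing.
p = Y.cdf(log(5000))
print(round(p, 4))  # about 0.452
```

The small difference from the table answer 0.4522 comes from rounding z to −0.12 in the hand calculation.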
3.1 Checking for normality
The normal distribution is one of the most important distributions. But how do we know that data are distributed as normal, at least approximately? There are at least two ways to check.

First, we can check the moments. We know that a normally distributed random variable has zero skewness (due to the symmetry of the distribution) and zero excess kurtosis¹. If the sample skewness and excess kurtosis are close to zero, we will be more confident that the data are likely from a normally distributed population.
Simulation 1 (Checking the skewness and kurtosis for
normality):
1. Generate 50 observations from a standard normal distribution N(0, 1), and another 50 observations from a uniform distribution U(0, 1). Compute their sample skewness and excess kurtosis²:

skewness = √n Σ_{i=1}^{n} (x_i − x̄)³ / [Σ_{i=1}^{n} (x_i − x̄)²]^{3/2}

excess kurtosis = n Σ_{i=1}^{n} (x_i − x̄)⁴ / [Σ_{i=1}^{n} (x_i − x̄)²]² − 3
2. Repeat the last step 1000 times. Report the skewness and excess kurtosis calculations, and the average skewness and excess kurtosis over these 1000 repetitions.

¹ For instance, refer to Section 16.7, Tests for Skewness and Excess Kurtosis, p. 567 of Estimation and Inference in Econometrics by Davidson and MacKinnon.
² Note that a slight adjustment is needed to obtain an unbiased estimator of these statistics.
Below is one set of possible results we have generated:

Distribution  Observations  Average Skewness  Average Excess Kurtosis
U(0,1)        50             0.0114           -1.1496
U(-2,2)       50            -0.0077           -1.1375
N(0,1)        50            -0.0008            0.0031
N(-2,5)       50             0.0077           -0.0290
The table shows that the normal distribution has skewness and excess kurtosis much closer to zero than the uniform distribution, and this holds regardless of the means and variances.
[Refer to sim1.xls]
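Simulation 1 can be sketched with the standard library alone. A minimal version, using the skewness and excess kurtosis formulas above (the random seed is an arbitrary choice, and only the N(0,1) and U(0,1) cases from step 1 are shown):

```python
import random

def skewness(xs):
    # sqrt(n) * sum((x - xbar)^3) / [sum((x - xbar)^2)]^(3/2)
    n, m = len(xs), sum(xs) / len(xs)
    s2 = sum((x - m) ** 2 for x in xs)
    s3 = sum((x - m) ** 3 for x in xs)
    return n ** 0.5 * s3 / s2 ** 1.5

def excess_kurtosis(xs):
    # n * sum((x - xbar)^4) / [sum((x - xbar)^2)]^2 - 3
    n, m = len(xs), sum(xs) / len(xs)
    s2 = sum((x - m) ** 2 for x in xs)
    s4 = sum((x - m) ** 4 for x in xs)
    return n * s4 / s2 ** 2 - 3

random.seed(1)
reps, n = 1000, 50

normal_kurt = [excess_kurtosis([random.gauss(0, 1) for _ in range(n)])
               for _ in range(reps)]
uniform_kurt = [excess_kurtosis([random.uniform(0, 1) for _ in range(n)])
                for _ in range(reps)]

avg = lambda v: sum(v) / len(v)
print(round(avg(normal_kurt), 3), round(avg(uniform_kurt), 3))
```

The uniform samples should average an excess kurtosis near −1.1, while the normal samples stay near zero, mirroring the table above.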
Alternatively, we can use normal probability plot. Suppose we
have n observations in the sample.
1. Sort them in ascending order.
2. Compute the empirical z value (i.e., (x− x̄)/σ).
3. Generate a column 0.5, 1.5, ..., [0.5 + (n− 1)]. Call this
column U .
4. Generate another column p(z) = U/n.
5. Generate another column theoretical z = NORMSINV (p(z)).
6. Plot empirical z against the theoretical z.
If the data has normal distribution, the plot should be a
straight line. We illustrate the steps to do so in the
following example.
Example 10 (normal probability plot):
1. Generate 1000 observations from N(0, 1).
2. Sort them in ascending order.
3. Compute the empirical z value (i.e., (x− x̄)/σ).
4. Generate a column 0.5, 1.5, ..., [0.5 + (n− 1)]. Call this
column U .
5. Generate another column p(z) = U/n.
6. Generate another column theoretical z = NORMSINV (p(z)).
7. Plot empirical z against the theoretical z.
8. Repeat with 1000 observations drawn from U(0, 1).
The two plots are shown below.

[Figure: Normal probability plots. Top: underlying population = normal; bottom: underlying population = uniform. Each plot shows the z value from the data against the theoretical z value.]
As the two plots make obvious, the points lie more or less on a straight line when the underlying population is normal, but not when the underlying population is uniformly distributed.
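The two z columns behind such a plot can be computed with the standard library. A sketch (the seed and sample size are arbitrary), using the correlation between the empirical and theoretical z values as a numeric stand-in for "lies on a straight line":

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)
n = 1000
data = sorted(random.gauss(0, 1) for _ in range(n))

# Empirical z: standardize the sorted observations.
m, s = mean(data), stdev(data)
emp_z = [(x - m) / s for x in data]

# Theoretical z: inverse normal CDF of p = (i + 0.5) / n
# (the role played by NORMSINV in the Excel steps above).
theo_z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

# A correlation close to 1 means the plot is close to a straight line.
me, mt = mean(emp_z), mean(theo_z)
num = sum((e - me) * (t - mt) for e, t in zip(emp_z, theo_z))
den = (sum((e - me) ** 2 for e in emp_z)
       * sum((t - mt) ** 2 for t in theo_z)) ** 0.5
corr = num / den
print(round(corr, 4))
```

Repeating with `random.uniform(0, 1)` draws gives a visibly lower correlation, matching the bent plot for the uniform population.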
4 Bivariate distributions
Except for the use of integration, the bivariate distributions of continuous random variables are not very different from those of discrete random variables.
Theorem 3 (Characteristics of a Bivariate Continuous Distribution): Let X and Y be continuous random variables defined on the intervals [a, b] and [c, d] respectively, with joint probability density denoted f(x, y).
1. The probability density function f(x, y) is non-negative and
may be larger than 1.
2. The probability that X takes a value in an interval [m, n] and Y in [k, l] is

P(X ∈ [m, n], Y ∈ [k, l]) = ∫_{x=m}^{x=n} ∫_{y=k}^{y=l} f(x, y) dy dx

3. P(X ∈ [m, n], Y ∈ [k, l]) = ∫_{x=m}^{x=n} ∫_{y=k}^{y=l} f(x, y) dy dx is between 0 and 1.00.
4. The sum of the probabilities of the various outcomes is 1.00. That is,

P(X ∈ [−∞, ∞], Y ∈ [−∞, ∞]) = P(X ∈ [a, b], Y ∈ [c, d]) = ∫_{x=a}^{x=b} ∫_{y=c}^{y=d} f(x, y) dy dx = 1
5. Consider the events defined on two non-overlapping regions, ([m1, n1], [k1, l1]) and ([m2, n2], [k2, l2]): the event "X ∈ [m1, n1] and Y ∈ [k1, l1]" and the event "X ∈ [m2, n2] and Y ∈ [k2, l2]". These two events are mutually exclusive. That is,

P((X ∈ [m1, n1] and Y ∈ [k1, l1]) and (X ∈ [m2, n2] and Y ∈ [k2, l2])) = 0
P((X ∈ [m1, n1] and Y ∈ [k1, l1]) or (X ∈ [m2, n2] and Y ∈ [k2, l2]))
= P(X ∈ [m1, n1] and Y ∈ [k1, l1]) + P(X ∈ [m2, n2] and Y ∈ [k2, l2])
6. The marginal density function of X is

f(x) = ∫_{y=−∞}^{∞} f(x, y) dy = ∫_y f(x, y) dy.

Note that the marginal density function of X is used when we do not care about the values Y takes. Similarly, the marginal density function of Y is

f(y) = ∫_{x=−∞}^{∞} f(x, y) dx = ∫_x f(x, y) dx.
7. The conditional density function of X given Y is

f(x|y) = f(x, y)/f(y) if f(y) > 0, and 0 if f(y) = 0.

Definition 5 (Independent bivariate uniform distribution): If a, b, c and d are numbers on the real line, the random variable (X1, X2) ∼ U(a, b, c, d), i.e., has an independent bivariate uniform distribution, if

f(x1, x2) = 1/[(b − a)(d − c)] for a ≤ x1 ≤ b and c ≤ x2 ≤ d
f(x1, x2) = 0 otherwise
Definition 6 (Bivariate normal distribution): The two random variables (X, Y) ∼ N(µx, µy, σx², σy², ρ), i.e., have a bivariate normal distribution with a correlation coefficient of ρ, if the two random variables are defined jointly on the whole real line (−∞, ∞) and have the following density function:

f(x, y) = [1 / (2πσxσy√(1 − ρ²))] exp{−(εx² + εy² − 2ρεxεy) / [2(1 − ρ²)]}

where

εx = (x − µx)/σx
εy = (y − µy)/σy
5 Expectations
The expectation plays a central role in statistics and economics. The expectation (often known as the mean) reports the central location of the data. It is also known as the long-run average value of the random variable, i.e., the average of the outcomes of many experiments.
Definition 7 (Expectation (mean)): Let X be a continuous random
variable defined over [a, b],
with a probability density function of f(x). The expectation of
X is
E(X) = ∫_{x=−∞}^{∞} x f(x) dx = ∫_x x f(x) dx
The expectation, E(X), is often denoted by a Greek letter µ
(pronounced as mu).
Thus, the expectation of a random variable is a weighted average of all the possible values of the random variable, weighted by its probability density function.
Definition 8 (Conditional Expectation): For a bivariate probability distribution, the conditional expectation or conditional mean E(X|Y = y) is computed by the formula

E(X|Y = y) = ∫_x x f(x|y) dx

The unconditional expectation or mean of X is related to the conditional mean:

E(X) = ∫_y E(X|Y = y) f(y) dy = E[E(X|Y)]
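The relation E(X) = E[E(X|Y)] can be illustrated by simulation. A sketch under an assumed joint distribution chosen for this illustration: Y ∼ U(0, 1) and X | Y = y ∼ N(y, 1), so that E(X|Y) = Y and hence E(X) = E(Y) = 0.5:

```python
import random

random.seed(2)
n = 100_000

# Draw Y first, then draw X conditional on Y: X | Y = y ~ N(y, 1).
ys = [random.uniform(0, 1) for _ in range(n)]
xs = [random.gauss(y, 1) for y in ys]

# E(X) should match E[E(X|Y)] = E(Y) = 0.5.
ex = sum(xs) / n
ey = sum(ys) / n
print(round(ex, 2), round(ey, 2))
```

Both averages settle near 0.5 as n grows, even though no single conditional distribution of X is centered there for most draws of Y.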
Theorem 4 (Expectation of a linear transformed random variable):
If a and b are constants and
X is a random variable, then
1. E(a) = a
2. E(bX) = bE(X)
3. E(a + bX) = a + bE(X)
Proof: In our proof, we will only show the most general case, E(a + bX) = a + bE(X).

E(a + bX) = ∫_x (a + bx) f(x) dx
= ∫_x a f(x) dx + ∫_x b x f(x) dx
= a ∫_x f(x) dx + b ∫_x x f(x) dx
= a + bE(X)
Definition 9 (Variance): Let X be a continuous random variable
defined over [a, b], with a
probability density function of f(x). The variance of X is
V(X) = ∫_x (x − E(X))² f(x) dx

The variance, V(X), is often denoted by the Greek letter σ² (pronounced as sigma squared).
Note that the variance of a random variable is the expectation of the squared deviation of the random variable from its mean. That is, if we define a transformed variable Z = (X − E(X))², then V(X) = E(Z). Thus, we should expect results about the variance of a transformed variable to parallel those about the expectation of a transformed variable.
Example 11 (Variance of a random variable): Suppose X and Y are
jointly distributed random
variables with probability density function f(x, y). The
variance of X is
V(X) = ∫_y ∫_x (x − E(X))² f(x, y) dx dy
Definition 10 (Conditional Variance): For a bivariate probability distribution, the conditional variance V(X|Y = y) is computed by the formula

V(X|Y = y) = ∫_x (x − E(X|Y = y))² f(x|y) dx
Theorem 5 (Variance of a linear transformed random variable): If
a and b are constants and
X is a random variable, then
1. V (a) = 0
2. V (bX) = b2V (X)
3. V (a + bX) = b2V (X)
Proof: In our proof, we will only show the most general case, V(a + bX) = b²V(X).

V(a + bX) = E[((a + bX) − (a + bE(X)))²]
= E[(bX − bE(X))²]
= E[b²(X − E(X))²]
= b²E[(X − E(X))²]
= b²V(X)
Definition 11 (Covariance): The covariance between two random variables X and Y is

C(X, Y) = E[(X − E(X))(Y − E(Y))] = ∫_x ∫_y (x − E(X))(y − E(Y)) f(x, y) dy dx
Note that the covariance can be written as
C(X, Y ) = E[(X − E(X))(Y − E(Y ))]
= E[XY − E(X)Y −XE(Y ) + E(X)E(Y )]
= E[XY ]− E[E(X)Y ]− E[XE(Y )] + E[E(X)E(Y )]
= E[XY ]− E(X)E(Y )− E(X)E(Y ) + E(X)E(Y )
= E[XY ]− E(X)E(Y )
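The shortcut C(X, Y) = E[XY] − E(X)E(Y) is an algebraic identity, so it also holds exactly for sample moments (with the same divisor n throughout). A quick sketch on simulated data; the particular joint distribution is an arbitrary choice:

```python
import random

random.seed(3)
n = 10_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [0.5 * x + random.gauss(0, 1) for x in xs]  # Y depends on X

mean = lambda v: sum(v) / len(v)
mx, my = mean(xs), mean(ys)

# Covariance from the definition E[(X - EX)(Y - EY)] ...
cov_def = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
# ... and from the shortcut E[XY] - E(X)E(Y).
cov_short = mean([x * y for x, y in zip(xs, ys)]) - mx * my

# The two agree up to floating-point rounding.
print(round(cov_def, 4), round(cov_short, 4))
```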
Theorem 6 (Covariance of a linear transformed random variable):
If a and b are constants and
X is a random variable, then
1. C(a, b) = 0
2. C(a, bX) = 0
3. C(a + bX, Y ) = bC(X, Y )
Proof: In our proof, we will only show the most general case, C(a + bX, Y) = bC(X, Y).

C(a + bX, Y) = E{[(a + bX) − (a + bE(X))][Y − E(Y)]}
= E{[bX − bE(X)][Y − E(Y)]}
= E{[b(X − E(X))][Y − E(Y)]}
= bE{[X − E(X)][Y − E(Y)]}
= bC(X, Y)
Theorem 7 (Variance of a sum of random variables): If a and b
are constants, X and Y are
random variables, then
1. V (X + Y ) = V (X) + V (Y ) + 2C(X, Y )
2. V (aX + bY ) = a2V (X) + b2V (Y ) + 2abC(X, Y )
Proof: In our proof, we will only show the most general case, V(aX + bY) = a²V(X) + b²V(Y) + 2abC(X, Y).

V(aX + bY) = E[((aX + bY) − (aE(X) + bE(Y)))²]
= E[(aX − aE(X) + bY − bE(Y))²]
= E[(a(X − E(X)) + b(Y − E(Y)))²]
= E[a²(X − E(X))² + b²(Y − E(Y))² + 2ab(X − E(X))(Y − E(Y))]
= a²E[(X − E(X))²] + b²E[(Y − E(Y))²] + 2abE[(X − E(X))(Y − E(Y))]
= a²V(X) + b²V(Y) + 2abC(X, Y)
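Theorem 7's identity likewise holds exactly for sample moments (same divisor n throughout), so it can be checked numerically. A sketch with arbitrary constants a, b and an arbitrary correlated pair of variables:

```python
import random

random.seed(4)
n = 10_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [x + random.gauss(0, 2) for x in xs]  # correlated with X
a, b = 2.0, -3.0

mean = lambda v: sum(v) / len(v)

def var(v):
    m = mean(v)
    return mean([(x - m) ** 2 for x in v])

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return mean([(x - mu) * (y - mv) for x, y in zip(u, v)])

lhs = var([a * x + b * y for x, y in zip(xs, ys)])
rhs = a ** 2 * var(xs) + b ** 2 * var(ys) + 2 * a * b * cov(xs, ys)

# The identity is exact; any difference is floating-point rounding.
print(abs(lhs - rhs) < 1e-8)
```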
Definition 12 (Independence): Consider two random variables X and Y with joint probability density f(x, y), marginal densities f(x) and f(y), and conditional densities f(x|y) and f(y|x).

1. They are said to be independent of each other if and only if

f(x, y) = f(x) × f(y) for all x and y.

X and Y are independent if the joint density, f(x, y), is the product of the corresponding marginal probability density functions.
2. X is said to be independent of Y if and only if
f(x|y) = f(x) for all x and y.
3. Y is said to be independent of X if and only if
f(y|x) = f(y) for all x and y.
Theorem 8 (Consequence of Independence): If X and Y are
independent random variables, we
will have
E(XY ) = E(X)E(Y )
However, E(XY) = E(X)E(Y) does not always imply that the random variables X and Y are independent.
6 The Normal Approximation to the Binomial
The normal distribution (a continuous distribution) yields a
good approximation of the binomial distribution
(a discrete distribution) for large values of n.
Recall for the binomial experiment:
1. There are only two mutually exclusive outcomes (success or
failure) on each trial.
2. A binomial distribution results from counting the number of
successes.
3. Each trial is independent.
4. The probability of success is fixed from trial to trial, and the number of trials n is also fixed.
The normal probability distribution is generally a good
approximation to the binomial probability distribu-
tion when nπ and n(1−π) are both greater than 5 – because of the
Central Limit Theorem (to be discussed
in the next chapter). However, because the normal distribution can
take all real numbers (is continuous) but
the binomial distribution can only take integer values (is
discrete), we will need to correct for the continuity.
A normal approximation to the binomial should identify the
binomial event “8” with the normal interval
“(7.5, 8.5)” (and similarly for other integer values).
Binomial Event Normal Interval
0 (-0.5,0.5)
1 (0.5,1.5)
2 (1.5,2.5)
3 (2.5,3.5)
... ...
x (x− 0.5, x + 0.5)
... ...
[Figure: The binomial distribution B(p, n) = B(.3, 100) and its normal approximation N(np, np(1 − p)) = N(30, 21).]
Example 12 (Continuity correction in normal approximation of
binomial): If n = 20 and
π = .25, what is the probability that X is greater than or equal
to 8?
The normal approximation without the continuity correction factor yields

Z = (8 − 20 × .25) / (20 × .25 × .75)^0.5 = 1.55

so P(X ≥ 8) = P(Z ≥ 1.55) is approximately .0606 (from the standard normal table).
The continuity correction factor requires us to use 7.5 in order
to include 8 since the inequality
is weak and we want the region to the right.
Z = (7.5 − 20 × .25) / (20 × .25 × .75)^0.5 = 1.29

P(X ≥ 7.5) = P(Z ≥ 1.29) is .0985. The exact solution from the binomial distribution function is .1019. Thus, the normal approximation with the continuity correction yields a good approximation to the binomial.
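The effect of the continuity correction can be seen numerically. A sketch with the standard library, using n = 20 and π = .25 from the example; the exact binomial tail is built from the binomial pmf via `math.comb`:

```python
from math import comb
from statistics import NormalDist

n, p = 20, 0.25
mu = n * p                            # 5
sigma = (n * p * (1 - p)) ** 0.5      # sqrt(3.75)

approx = NormalDist(mu, sigma)
no_correction = 1 - approx.cdf(8)     # P(X >= 8) without correction
with_correction = 1 - approx.cdf(7.5) # P(X >= 7.5) with correction

# Exact binomial tail: sum of P(X = k) for k = 8..20.
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
            for k in range(8, n + 1))

print(round(no_correction, 4), round(with_correction, 4), round(exact, 4))
```

The corrected value lands much closer to the exact binomial answer than the uncorrected one, as the example's table lookups suggest.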
Example 13 (Continuity correction in normal approximation of
binomial): A recent study by
a marketing research firm showed that 15% of American households
owned a video camera. For
a sample of 200 homes, how many of the homes would you expect to have video cameras? What is the likelihood that fewer than 40 of the homes have a video camera?
1. Compute the mean: µ = nπ = 200× .15 = 30.
2. Compute the variance: σ² = nπ(1 − π) = 200 × .15 × (1 − .15) = 25.5. The standard deviation is σ = √25.5 = 5.0498.
3. “Less than 40” means “less than or equal to 39”. We use the continuity correction factor, so X is 39.5. Hence,

P(X < 39.5) = P[(X − 30)/5.0498 < (39.5 − 30)/5.0498] = P[Z < 1.88] = P[Z < 0] + P[0 < Z < 1.88] = .5 + .4699 = .9699.
Thus, the likelihood that less than 40 of the 200 homes have a
video camera is about 97%.
7 The exponential distribution
The exponential distribution is often used to model the length of time between the occurrences of two events or between two occurrences of the same event (the time between arrivals).
Example 14 (Exponential distribution):
1. Time taken for your instructor to respond to your email.
2. Time between the birth of two babies.
3. Time taken to find a new job since layoff – the so-called
unemployment spell.
4. Time to complete a wage bargaining between a company and a
labor union.
5. Time to complete the accession to WTO.
6. Time between two major floods in Sichuan, China.
7. Time taken for the police to solve a crime case.
8. Time taken to obtain a job promotion.
Definition 13 (Exponential distribution):
The exponential random variable T (T > 0) has a probability
density function
f(t) = λe−λt for t > 0
where λ is the mean number of occurrences per unit time; t is
the length of time until the next
occurrence; e = 2.71828.
The cumulative distribution function (the probability that an
arrival time is less than some
specified time t) is
F (t) = Prob(T < t) = 1− e−λt
The mean and variance of an exponential random variable are

E(T) = 1/λ
Var(T) = 1/λ²

Note that the exponential distribution requires only one parameter, the rate λ (lambda); its mean is 1/λ. What are the other distributions that are completely characterized by one parameter?
Example 15 (Exponential distribution): Customers arrive at the
service counter at the rate of
15 per hour. What is the probability that the arrival time
between consecutive customers is less
than three minutes?
Let T be the arrival time between consecutive customers. The
mean number of arrivals per hour
is 15, so λ = 15. Three minutes is .05 hours. Hence, we have
P (T < .05) = 1− e−λt = 1− e−(15)(.05) = 0.5276
So there is a 52.76% probability that the arrival time between
successive customers is less than
three minutes.
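The same arrival-time calculation in code; a sketch using the closed-form CDF with λ = 15 per hour and t = .05 hours from the example:

```python
from math import exp

lam = 15.0   # mean number of arrivals per hour
t = 0.05     # three minutes, expressed in hours

# Exponential CDF: P(T < t) = 1 - e^(-lambda * t)
p = 1 - exp(-lam * t)
print(round(p, 4))  # 0.5276
```

Keeping λ and t in the same time unit (hours here) is the step that is easiest to get wrong in these problems.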
Example 16 (Exponential distribution): On average, it takes
about seven years to accede to
the GATT/WTO. What is the probability that a country takes more
than 15 years to accede to
GATT/WTO?
Let T be the time taken. Since E(T) = 1/λ = 7, we have λ = 1/7. Hence, we have

P(T < 15) = 1 − e^(−λt) = 1 − e^(−(1/7)(15)) = 0.883
P(T ≥ 15) = 1 − P(T < 15) = e^(−(1/7)(15)) = 0.117

Thus, there is an 11.7% probability that it takes more than 15 years to accede to GATT/WTO.³
³ It took China 15 years to accede to GATT/WTO.
References
[1] Davidson, Russell, and James G. MacKinnon (1993): Estimation
and Inference in Econometrics. Oxford
University Press.
Mathematical appendix (A brief introduction to integration)
Integration is a useful tool for computing the area under a curve and above zero. Suppose we have a curve defined by the function f(x), and we want to compute the area under f(x) and above 0 between a and b. Let us consider several cases.

1. Let us start with the simplest case, when f(x) is a constant, i.e., f(x) = k. In this case, the area is a rectangle, and it is easy to compute: (b − a) × k.
2. Suppose f(x) = k1 for x ∈ [a, c1] and f(x) = k2 for x ∈ [c1,
b]. In this case, the area is actually a sum
of two rectangles. It is still easy to compute the area as (b−
c1)× k2 + (c1 − a)× k1.
3. Suppose f(x) = k1 for x ∈ [a, c1], f(x) = k2 for x ∈ [c1, c2] and f(x) = k3 for x ∈ [c2, b]. In this case, the area is a sum of three rectangles. It is still easy to compute: (c1 − a) × k1 + (c2 − c1) × k2 + (b − c2) × k3.
The three examples illustrate how one can compute the area when it is a combination of rectangles.
For a general f(x), we can approximate the area by a sum of
rectangles. Let’s define
g(x) =
k1 for x ∈ [a, a + dx]
k2 for x ∈ [a + dx, a + 2dx]
k3 for x ∈ [a + 2dx, a + 3dx]
.... ....
or
g(x) = km for x ∈ [a + (m− 1)dx, a + mdx]
If g(x) approximates f(x) well, the approximated area is a sum of rectangles:

k1 × dx + k2 × dx + k3 × dx + ...

What are the values of k1, k2, ...? One possibility is to take km = f(a + (m − 1)dx), so that k1 = f(a). That is,

g(x) = f(a + (m − 1)dx) for x ∈ [a + (m − 1)dx, a + m dx]
When the width of each rectangle is small, it does not matter if
we take f(a + (m− 1)dx) or f(a + mdx) or
any point in [a + (m− 1)dx, a + mdx].
Thus, the area may be approximated by
f(a)× dx + f(a + dx)× dx + f(a + 2dx)× dx + ...
The approximation is better the closer dx is to zero; the area becomes a sum of infinitely many very small rectangles. We write the area as

∫_{a}^{b} f(x) dx

Mathematicians have developed neat ways to compute this area. All we need is to borrow those formulas from them.
Theorem 9 (Some integration formulas):

1. Let f(x) = k, a constant (such as the density of a uniform distribution). The area under the curve f(x) over the interval [a, b] is

∫_{a}^{b} f(x) dx = k ∫_{a}^{b} dx = k (x |_{a}^{b}) = k(b − a)

where g(x)|_{a}^{b} = g(b) − g(a).

2. Let f(x) = kx, a linear curve (such as the product of the random variable and the density of a uniform distribution). The area under the curve f(x) over the interval [a, b] is

∫_{a}^{b} f(x) dx = k ∫_{a}^{b} x dx = k (x²/2 |_{a}^{b}) = k(b²/2 − a²/2)
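The rectangle construction described above can be sketched directly and compared with the closed-form answer for f(x) = kx; the choices k = 2, a = 1, b = 3 and the number of rectangles are arbitrary:

```python
def riemann_area(f, a, b, rectangles=10_000):
    # Sum of rectangle areas: f evaluated at the left edge, width dx.
    dx = (b - a) / rectangles
    return sum(f(a + m * dx) * dx for m in range(rectangles))

k, a, b = 2.0, 1.0, 3.0
approx = riemann_area(lambda x: k * x, a, b)
exact = k * (b ** 2 / 2 - a ** 2 / 2)   # closed-form area for f(x) = kx

print(round(approx, 3), round(exact, 3))  # both about 8.0
```

Increasing the number of rectangles shrinks the gap between the two numbers, which is exactly the "dx close to zero" limit in the text.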
Problem sets
We have tried to include some of the most important examples in the text. To get a good understanding of the concepts, it is most useful to re-do the examples and simulations in the text. Work on the following problems only if you need extra practice or if your instructor assigns them. Of course, the more you work on these problems, the more you learn.
Challenge 1 (mean and variance of a uniform distribution): Show that the mean and variance of a uniform random variable X are

E(X) = (c + d)/2

and

V(X) = (c − d)²/12
Challenge 2 (memorylessness of the exponential distribution): Show that the exponential distribution is memoryless, i.e., P(T > s + t | T > s) = P(T > t) for all s, t ≥ 0.
Challenge 3 (mean and variance of a univariate uniform distribution): Let X be uniformly distributed on the interval [a, b]. Find E(X) and Var(X).
Challenge 4 (mean and variance of a bivariate uniform distribution): Let (X, Y) be jointly uniformly distributed on the rectangle [a, b] × [c, d]. Find E(X) and Var(X).
Solutions to problem set
1. It is straightforward to guess that the mean of a uniform distribution is simply (c + d)/2, because the density is the same at every point of the interval (c, d), i.e., f(x) ≡ 1/(d − c) for x ∈ (c, d). However, we can still prove it mathematically:

E(X) = ∫_{c}^{d} x f(x) dx
= ∫_{c}^{d} x/(d − c) dx
= [1/(d − c)] ∫_{c}^{d} x dx
= [1/(d − c)] × (x²/2) |_{c}^{d}
= [1/(d − c)] × (d − c)(d + c)/2
= (c + d)/2
and the variance:

V(X) = ∫_{c}^{d} (x − µ)² f(x) dx
= [1/(d − c)] ∫_{c}^{d} (x² − 2µx + µ²) dx
= [1/(d − c)] × (x³/3 − µx² + µ²x) |_{c}^{d}
= (d² + cd + c²)/3 − µ(c + d) + µ²
= (d² + cd + c²)/3 − (c + d)²/2 + (c + d)²/4
= [(4d² + 4cd + 4c²) − (6c² + 12cd + 6d²) + (3c² + 6cd + 3d²)]/12
= (c − d)²/12
2. Memorylessness is a property of the exponential distribution. For example, suppose that at a specific crossroad, traffic accidents occur at a rate of λ per hour. (a) What is the probability that no accident happens in the next t hours? (b) Given that no accident has happened in the past s hours, what is the probability that no accident happens in the next t hours?

(a) P(T > t) = 1 − (1 − e^(−λt)) = e^(−λt);
(b) P(T > s + t | T > s) = [1 − (1 − e^(−λ(s+t)))]/[1 − (1 − e^(−λs))] = e^(−λ(s+t))/e^(−λs) = e^(−λt).

We find that P(T > t) ≡ P(T > s + t | T > s), which means that how long we have already waited does not matter for the future probability of an accident; in other words, the distribution is memoryless. (A similar property is found in the geometric distribution.)
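Memorylessness can also be checked by simulation; a sketch using `random.expovariate`, where the rate and the values of s and t are arbitrary choices:

```python
import random
from math import exp

random.seed(5)
lam, s, t = 1.0, 2.0, 1.5
draws = [random.expovariate(lam) for _ in range(200_000)]

# Unconditional: P(T > t).
p_uncond = sum(1 for d in draws if d > t) / len(draws)

# Conditional: P(T > s + t | T > s), estimated on the draws exceeding s.
survivors = [d for d in draws if d > s]
p_cond = sum(1 for d in survivors if d > s + t) / len(survivors)

print(round(p_uncond, 3), round(p_cond, 3))  # both near e^(-t) = 0.223
```

Within simulation noise, the two estimates coincide, matching the algebra in part (b).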
3. Refer to 1.
4. We may reasonably guess that the joint density of the bivariate uniform distribution is constant at 1/[(b − a)(d − c)], i.e., f(x, y) ≡ 1/[(b − a)(d − c)] for x ∈ (a, b), y ∈ (c, d). Then we can work through the expectation mathematically:

E(X) = E[E(X|Y)]

where, for each y,

E(X|Y = y) = ∫_{a}^{b} x f(x|y) dx = (a + b)/2

so that

E(X) = ∫_{c}^{d} [(a + b)/2] × [1/(d − c)] dy = (a + b)/2

and the variance is

V(X) = (b − a)²/12

by the same calculation as in Solution 1, since the marginal distribution of X is uniform on [a, b].