New Chapter 3 Random Variables and Probability Distributionsjqfan/fan/classes/245/chap3.pdf · 2017. 9. 21. · Chapter 3 Random Variables and Probability Distributions 3.1 Random

Chapter 3

Random Variables and Probability Distributions

3.1 Random Variables §3.1∗

Different experiments yield different outcomes and we are interested in

some numerical aspects of the random outcome.

•No. of people voting for a candidate;

•No. of times that the ball in a roulette lands in even-numbered pockets.

∗ This is the section number in the textbook.

ORF 245: Random Variables – J.Fan

Random variable: X is a function on Ω. Formally,

it should be written as X(ω), but the outcome ω is often suppressed.

Example 3.1 Random dialing

Consider a random number dialer which picks a telephone number at

random in a certain area. Let Y be 1 if the call is picked up and 0

otherwise. Then the sample space is all allowable phone numbers and

Y is a binary random variable, called a Bernoulli r.v.

> sample(9999999,10) #draw a random sample of size 10 from 1:9999999

2097071 4378927 374022 4330670 9301962 4432222 9323759 5897322 835193 9509962

Example 3.2 Toss a coin 3 times: the sample space is Ω = {H,T} × {H,T} × {H,T}.


Let X be the number of heads. Then,

Outcomes   HHH  HTH  THH  HHT  HTT  THT  TTH  TTT
Notation   ω1   ω2   ω3   ω4   ω5   ω6   ω7   ω8
X(ω) =     3    2    2    2    1    1    1    0

X is called a binomial r.v. with parameters n = 3 (no. of trials) and p = 0.5 (prob. of success). It counts the number of successes (S) in n Bernoulli trials.

> rbinom(30,3,0.5) #draw 30 times from Binomial dist with n=3, p=0.5

[1] 1 2 3 2 3 0 1 2 1 1 1 1 1 2 1 1 1 2 1 1 1 1 0 1 1 3 2 1 1 3

Example 3.3 Rare disease

For a rare disease, a commonly used method is to sample until getting a certain # of cases. Let X be the number of samples required to obtain the first case. Then, the sample space is Ω = {S, FS, FFS, · · · } and X(ω) is simply the number of letters in ω.

Example 3.4 Spatial data

Let X be the current temperature at a random location (defined by

latitude and longitude). Then, the sample space is [0, 180] × [0, 360]

and X(ω) = current temperature at that location.

The range of a random variable is the set of all its possible values. When the range is countable, the random variable is discrete. When the range is an interval on the number line, it is continuous.

Examples: In Ex 3.1 – 3.3, the random variables are discrete; while

in Ex 3.4, the random variable is continuous.


3.2 Probability Distributions §3.2

Prob dist says how the total probability of 1 is distributed among possible values of a r.v. X. For a discrete X, it is given by

\[ p(x) = P(X = x) = P(\{\omega : X(\omega) = x\}), \quad \text{for all } x \text{ in the range}, \]

also called the probability mass function (pmf).

Example 3.2 (continued). The range of X is {0, 1, 2, 3}.

\[ p(0) = P(X = 0) = \frac{1}{8}, \quad p(1) = \frac{3}{8}, \quad p(2) = \frac{3}{8}, \quad p(3) = \frac{1}{8}. \]

It can easily be visualized by the line diagram (graph). This is called a binomial dist with no. of trials n = 3 and prob. of success p = 0.5.
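As a quick check of the pmf of Example 3.2, the sketch below lists all 8 outcomes of three tosses, counts heads in each, and compares the resulting relative frequencies with R's built-in binomial pmf. The names `omega`, `pmf_enum` and `pmf_binom` are illustrative choices, not part of the notes.

```r
# Enumerate the 8 equally likely outcomes of 3 coin tosses (Example 3.2)
omega <- expand.grid(toss1 = c("H", "T"),
                     toss2 = c("H", "T"),
                     toss3 = c("H", "T"),
                     stringsAsFactors = FALSE)
X <- rowSums(omega == "H")                      # X(omega) = number of heads in each outcome
pmf_enum  <- table(X) / nrow(omega)             # P(X = x) obtained by counting outcomes
pmf_binom <- dbinom(0:3, size = 3, prob = 0.5)  # built-in binomial pmf for x = 0, 1, 2, 3
print(pmf_enum)
print(pmf_binom)
```

Both print-outs should agree with the values 1/8, 3/8, 3/8, 1/8 given in the text.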


Figure 3.1: The line diagrams for the pmf in Examples 3.2 and 3.3. It is equivalent to the histogram in this case. A probability histogram represents probability by area.

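Example 3.3 (continued). The range of X is {1, 2, · · · }. Let p be the prevalence probability of the disease and q = 1 − p. Then,

\[ p(1) = P(X = 1) = p, \quad p(2) = qp, \quad p(3) = q^2 p, \quad \cdots, \quad p(x) = P(X = x) = q^{x-1} p, \quad \cdots, \]

since X = x corresponds to the outcome FF · · · FS with x letters in total (x − 1 F's followed by one S).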

It is referred to as a geometric distribution with parameter p.
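The geometric pmf can be compared with R's `dgeom`, as sketched below. One caveat worth stating: `dgeom` counts the number of failures before the first success, so the pmf of X in Example 3.3 corresponds to `dgeom(x - 1, p)`. The value p = 0.3 is only an illustrative choice.

```r
p <- 0.3; q <- 1 - p                    # illustrative prevalence (an assumption)
x <- 1:10                               # number of samples needed to get the first case
pmf_formula <- q^(x - 1) * p            # p(x) = q^(x-1) p from the text
pmf_dgeom   <- dgeom(x - 1, prob = p)   # R counts failures before the first success
print(cbind(x, pmf_formula, pmf_dgeom))
total_prob <- sum(q^(0:999) * p)        # partial sum of the pmf over x = 1, ..., 1000
print(total_prob)                       # should be very close to 1
```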


Example 3.5 Sampling inspection

In 100 products, 3 of them are defective. Suppose that we pick 4

products at random. Let X be the number of defective products.

Find the distribution of X .

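First of all, the range of X is {0, 1, 2, 3}. Now,

\[ p(0) = P(X = 0) = \frac{\binom{97}{4}}{\binom{100}{4}} = \frac{\frac{97\times 96\times 95\times 94}{4\times 3\times 2\times 1}}{\frac{100\times 99\times 98\times 97}{4\times 3\times 2\times 1}} = \frac{96\times 95\times 94}{100\times 99\times 98} = 88.36\%. \]

Similarly,

\[ p(1) = P(X = 1) = \frac{\binom{97}{3}\binom{3}{1}}{\binom{100}{4}} = \frac{442320}{3921225} = 11.28\%, \]

\[ p(2) = P(X = 2) = \frac{\binom{97}{2}\binom{3}{2}}{\binom{100}{4}} = \frac{13968}{3921225} = 0.36\% \]

and

\[ p(3) = P(X = 3) = \frac{\binom{97}{1}\binom{3}{3}}{\binom{100}{4}} = \frac{97}{3921225} \approx 0\%. \]

This is called a hypergeometric distribution.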
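The probabilities of Example 3.5 can be reproduced in R, either from binomial coefficients with `choose` or with the built-in `dhyper`. In the sketch below, the argument names `m`, `n`, `k` are those of `dhyper` (defective items, non-defective items, number drawn); the variable names are illustrative.

```r
x <- 0:3                                          # possible numbers of defective items
# m = 3 defective, n = 97 non-defective, k = 4 items drawn without replacement
pmf_dhyper <- dhyper(x, m = 3, n = 97, k = 4)
pmf_choose <- choose(3, x) * choose(97, 4 - x) / choose(100, 4)  # formula from the text
print(round(100 * pmf_dhyper, 2))                 # percentages for x = 0, 1, 2, 3
print(sum(pmf_dhyper))                            # the four probabilities add up to 1
```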

Cumulative distribution function (cdf) is defined as

\[ F(x) = P(X \le x) = \sum_{y:\, y \le x} p(y), \]

the probability that the observed value of X is at most x.

Example 3.2 (continued). The cdf of X is

\[ F(0) = \frac{1}{8}, \quad F(1) = \frac{1}{8} + \frac{3}{8} = \frac{1}{2}, \quad F(2) = F(1) + p(2) = \frac{7}{8}, \quad F(3) = 1. \]
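The cdf values above are running sums of the pmf, which the following sketch checks with `cumsum` and with the built-in `pbinom`. The variable names are illustrative.

```r
pmf <- dbinom(0:3, size = 3, prob = 0.5)          # p(0), ..., p(3)
cdf_cumsum <- cumsum(pmf)                         # F(x) = sum of p(y) over y <= x
cdf_pbinom <- pbinom(0:3, size = 3, prob = 0.5)   # built-in binomial cdf
print(cdf_cumsum)
F_at_1.5 <- pbinom(1.5, size = 3, prob = 0.5)     # F is flat between integers: F(1.5) = F(1)
print(F_at_1.5)
```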

Example 3.3 (continued). For any given positive integer x,

\[ F(x) = \sum_{y=1}^{x} p q^{y-1} = p\,\frac{1 - q^x}{1 - q} = 1 - q^x. \]


For noninteger x, replace x above by its integer part [x].

x = seq(0,40, 0.02) #create x values for calculating CDF of Ex 3.3
p = 0.3; q = 1-p #setting parameters
cdf = 1 - q^floor(x) #calculate CDF
plot(x, cdf, col="red", type="l", xlab="x", ylab="cdf") #the vertical axis is the CDF, not a density
p = 0.1; q = 1-p #setting parameters
cdf = 1 - q^floor(x) #recalculate CDF for p = 0.1
lines(x, cdf, col="blue")
title("Ex 3.3: CDF of geometric distribution with p = 0.3 and 0.1")

[Figure panels: "Ex 3.1: A binomial Distribution" (x from 0 to 4, vertical axis y from 0.0 to 1.0) and "Ex 3.3: CDF of geometric distribution with p = 0.3 and 0.1" (x from 0 to 40, vertical axis from 0.0 to 1.0).]

Figure 3.2: The cumulative distributions of Examples 3.1 and 3.3 (with parameter p = 0.3 and 0.1), which are step functions.


3.3 Expected Values §3.3

The expected value is used to summarize the distribution of a r.v.

Example 3.6 Playing a roulette

If you bet $1 on an even number, the chance to win is 18/38. What

is the expected payoff?

Imagine that you play 38000 times. You would expect to win 18000 times and lose 20000 times. Thus,

\[ \text{Expected payoff per game} = \frac{18000\times(\$1) + 20000\times(-\$1)}{38000} = \underbrace{\frac{18}{38}\times(\$1) + \frac{20}{38}\times(-\$1)}_{\sum \text{value}\times\text{its probability}} = -\$0.05263. \]


That is, you expect to lose 5 cents per game.

Definition: \( E(X) = \mu_X = \sum_j x_j \times P(X = x_j) \).

Interpretation: It is the long-run average. It is not necessarily a value that one is likely to observe.

> x = rbinom(100000,1,18/38) #play 100K (long-run) games with Bernoulli trials

> x=2*x-1 #convert into $1 and -$1

> x[1:30] #outcomes of first 30 games

[1] -1 1 1 1 -1 -1 1 -1 1 -1 1 1 1 1 1 -1 -1 -1 -1 1 -1 -1 1 -1 1 -1 -1 -1 -1 -1

> mean(x); sd(x) #average and SD of the outcomes of the game

[1] -0.05442

[1] 0.9985231
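For comparison with the simulated average and SD, the exact values can be computed directly from the two possible payoffs and their probabilities, as in this sketch. The SD here uses var(X) = E(X^2) − (EX)^2, which is an assumption at this point of the notes since the formula is only introduced later in the chapter; the variable names are illustrative.

```r
values <- c(1, -1)                          # payoff in dollars: win or lose
probs  <- c(18, 20) / 38                    # P(win), P(lose)
EX  <- sum(values * probs)                  # E(X) = sum of value x its probability
SDX <- sqrt(sum(values^2 * probs) - EX^2)   # SD via var(X) = E(X^2) - (EX)^2
print(c(EX = EX, SD = SDX))
```

The exact mean is −2/38, so the simulated −0.05442 differs from it only by sampling noise.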

Example 3.7 Rolling a die.

Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc.

What is the expected number of spots X on the top?

\[ EX = 1\times\frac{1}{6} + 2\times\frac{1}{6} + \cdots + 6\times\frac{1}{6} = 3.5. \]


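Example 3.3 (continued). The expected value of the geometric distribution is

\[ E(X) = \sum_{x=1}^{\infty} x q^{x-1} p = \frac{d}{dq}\Big(\sum_{n=1}^{\infty} q^n\Big) p = \frac{d}{dq}\Big(\frac{q}{1-q}\Big) p = \frac{p}{(1-q)^2} = 1/p. \]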
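The result E(X) = 1/p can be checked numerically in two ways, sketched below: by truncating the infinite sum and by simulation. The value p = 0.1, the truncation point, the seed and the number of draws are all illustrative assumptions. Note that `rgeom` returns the number of failures before the first success, hence the `+ 1`.

```r
p <- 0.1; q <- 1 - p                          # illustrative value of p
x <- 1:10000                                  # truncate the infinite sum; q^10000 is negligible
EX_sum <- sum(x * q^(x - 1) * p)              # partial sum of x q^(x-1) p
set.seed(245)                                 # arbitrary seed, for reproducibility only
EX_sim <- mean(rgeom(100000, prob = p) + 1)   # rgeom counts failures, so add 1
print(c(sum = EX_sum, simulation = EX_sim, exact = 1 / p))
```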

Example 3.8 Group testing.

For a rare disease with 1% prevalence rate, the following group testing is used. Pool the blood samples of 10 people together. If the result is negative, all of them are negative. If the result is positive, test them individually. If each test costs $1, what is the expected cost?

\( P\{\text{none of 10 have disease}\} = 0.99^{10} = 0.9044 \). Thus,

\[ \text{Expected cost} = \$1\times 0.9044 + \$11\times 0.0956 = \$1.956, \]

compared with $10 for the naive method. Q: How to simulate?
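One possible answer to the simulation question is sketched below. It assumes that the 10 people in a group are infected independently with probability 0.01 and that the pooled test is perfectly accurate; the seed, the number of simulated groups and the variable names are illustrative choices.

```r
set.seed(245)                                        # arbitrary seed, for reproducibility only
n_groups <- 100000                                   # number of simulated groups of 10 people
n_sick <- rbinom(n_groups, size = 10, prob = 0.01)   # number of diseased people in each group
cost <- ifelse(n_sick == 0, 1, 11)                   # $1 if the pooled test is negative, else $1 + $10
sim_cost   <- mean(cost)                             # simulated expected cost per group
exact_cost <- 1 * 0.99^10 + 11 * (1 - 0.99^10)       # value x its probability
print(c(simulated = sim_cost, exact = exact_cost))
```

The simulated average should land close to the $1.956 computed in the text.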

Function of a r.v.: Y = g(X) is a r.v. defined as Y(ω) = g(X(ω)). The pmf is given by

\[ P(Y = y) = \sum_{x_j:\, g(x_j) = y} P(X = x_j). \]

Its expected value is given by \( E(Y) = \sum_i y_i P(Y = y_i) \) and satisfies

♠ \( E g(X) = \sum_i g(x_i) P(X = x_i) \).

♠ \( E(aX + b) = aE(X) + b \).


Example 3.2 (Continued). Let X = No. of heads in 3 tosses. Find the distribution of \( Y = (X - 2)^2 \) and its expected value.

Note that

X = 0 1 2 3
Y = 4 1 0 1

The range of Y is {0, 1, 4} with

\[ P(Y = 0) = P(X = 2) = \frac{3}{8}, \]

\[ P(Y = 1) = P(X = 1) + P(X = 3) = \frac{1}{2}, \]

\[ P(Y = 4) = P(X = 0) = \frac{1}{8}. \]

Method 1: (p(y)) \( EY = 0\times\frac{3}{8} + 1\times\frac{1}{2} + 4\times\frac{1}{8} = 1. \)

Method 2: (p(x)) \( E(X - 2)^2 = 4\times\frac{1}{8} + 1\times\frac{3}{8} + 0\times\frac{3}{8} + 1\times\frac{1}{8} = 1. \)
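The two methods can be mirrored in a few lines of R, as a sketch. `tapply` is used here to add up p(x) over all x that map to the same value of Y; the variable names are illustrative.

```r
x  <- 0:3
px <- dbinom(x, size = 3, prob = 0.5)            # pmf of X
y  <- (x - 2)^2                                  # value of Y = g(X) for each x
py <- tapply(px, y, sum)                         # pmf of Y: add p(x) over x with g(x) = y
EY_method1 <- sum(as.numeric(names(py)) * py)    # Method 1: sum of y * P(Y = y)
EY_method2 <- sum(y * px)                        # Method 2: sum of g(x) * P(X = x)
print(py)
print(c(method1 = EY_method1, method2 = EY_method2))
```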

Variance: \( \mathrm{var}(X) = \sigma^2 = E(X - \mu)^2 \). \( \mathrm{SD}(X) = \sigma = \sqrt{\mathrm{var}(X)} \). It shows the typical size of the deviation of the r.v. X from µ.


♠ E(X) and SD(X) are the long-run average and SD if X is drawn repeatedly.

Shortcut formula: \( \mathrm{var}(X) = E(X^2) - (EX)^2 \).

Properties: ♠ \( \mathrm{var}(aX + b) = a^2\,\mathrm{var}(X) \)

♠ \( \mathrm{SD}(aX + b) = |a|\,\mathrm{SD}(X) \)

Example 3.8: (generalized, binary outcomes).

Let Y be the cost. Assume that P(Y = a) = p and P(Y = b) = q, where q = 1 − p. Then, \( EY = ap + bq \). Similarly, \( EY^2 = a^2 p + b^2 q \). Thus,

\[ \mathrm{var}(Y) = a^2 p + b^2 q - (ap + bq)^2 = (a - b)^2 pq. \]

Hence, \( \mathrm{SD}(Y) = |b - a|\sqrt{pq} \).

• For roulette game in Ex 3.6, SD of gain \( = 2\sqrt{\frac{18}{38}\times\frac{20}{38}} \approx \$1 \).

• For data in Ex 3.8, we have \( \mathrm{SD}(Y) = 10\sqrt{0.9044\times 0.0956} \approx \$2.94 \).
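The closed form SD(Y) = |b − a|√(pq) can be compared with the shortcut formula for both examples. The helper `sd_binary` below is a hypothetical function written for this illustration, not something from the notes.

```r
# Hypothetical helper: SD of a r.v. taking value a w.p. p and value b w.p. q = 1 - p
sd_binary <- function(a, b, p) {
  q   <- 1 - p
  EY  <- a * p + b * q                           # E(Y)
  EY2 <- a^2 * p + b^2 * q                       # E(Y^2)
  c(shortcut    = sqrt(EY2 - EY^2),              # sqrt of E(Y^2) - (EY)^2
    closed_form = abs(b - a) * sqrt(p * q))      # |b - a| sqrt(pq)
}
roulette   <- sd_binary(a = 1, b = -1, p = 18 / 38)   # Ex 3.6: win $1 or lose $1
group_test <- sd_binary(a = 1, b = 11, p = 0.99^10)   # Ex 3.8: cost $1 or $11
print(rbind(roulette, group_test))
```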


3.4 Probability density functions §4.1 & 4.2

Continuous rv’s: • Income of a randomly drawn tax-payer;

• Amount of precipitation per year at a randomly selected location.


[Figure: the density curve y = f(x), with the area between a and b marked as P(a ≤ X ≤ b).]

Figure 3.3: Probabilities of a continuous rv are given

by the area under the density function. Hence P (X =

x) = 0 and P (X ≤ x) = P (X < x).

Definition: The probability density function (pdf) of a continuous rv X is the function f such that

\[ P(a \le X \le b) = \int_a^b f(x)\,dx, \]

denoted by X ∼ f.

The density function should satisfy

♠ f (x) ≥ 0;


♠ \( \int_{-\infty}^{+\infty} f(x)\,dx = 1 \iff \) Total area is one.

[Figure: the density curve y = f(x), with the region under the curve labeled Area = 1.]

Example 3.9 Uniform distribution.

Assume that measurements of gene expressions using a microarray technique are recorded to the 2nd decimal place. Let X be the rounding error, which is assumed to be uniformly distributed on [−0.005, 0.005). The density of X is

\[ f(x) = \begin{cases} 100, & \text{if } -0.005 \le x < 0.005 \\ 0, & \text{otherwise} \end{cases} \]

Denoted by X ∼ unif[−0.005, 0.005).

The definition can easily be generalized to an interval (a, b), with density 1/(b − a) on (a, b) and 0 otherwise.


Figure 3.4: Density functions of X ∼ unif(a, b) and X ∼ exp(λ).

Example 3.10 Modeling of life time: products, firms.

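Suppose that the life time of a product X follows the exponential distribution with parameter λ:

\[ f(x) = \lambda\exp(-\lambda x), \quad \text{for } x \ge 0 \]

(and f(x) = 0 for x < 0). It is easy to check that f is a pdf. Substituting y = λx,

\[ \int_{-\infty}^{+\infty} f(x)\,dx = \lambda\int_0^{\infty} \exp(-\lambda x)\,dx = \int_0^{\infty} \exp(-y)\,dy = 1. \]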

> rexp(5,2) #draw 5 data from exponential dist with lambda=2

[1] 0.2414199 1.4790871 0.1378886 0.8888352 0.3890791


> mean(rexp(1000,2)) #draw 1000 data and compute its average

[1] 0.5202178 #it is approx. 1/lambda
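The two facts used above (total area one, and an average close to 1/λ) can also be checked by numerical integration with R's `integrate`, as sketched here. The function name `f` and the variable names are illustrative; λ = 2 matches the rate used in the `rexp` calls.

```r
lambda <- 2                                     # same rate as in rexp(5, 2) above
f <- function(x) lambda * exp(-lambda * x)      # exponential density for x >= 0
total_area <- integrate(f, lower = 0, upper = Inf)$value                   # should be 1
mean_exact <- integrate(function(x) x * f(x), lower = 0, upper = Inf)$value  # should be 1/lambda
print(c(total_area = total_area, mean = mean_exact, one_over_lambda = 1 / lambda))
```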

Interpretation of density:

♠ \( P(X = x) = 0 \ne f(x) \).

♠ \( P(X \in x \pm \Delta) \approx f(x)\,2\Delta \) for small ∆, so f(x) shows how likely X is to fall near x, per unit length of interval.

♠ How can one observe a birth weight of 8.43 lbs? Think of X = 8.43 as X ∈ [8.425, 8.435) due to the accuracy of the weighing device.
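A small numerical illustration of the approximation P(X ∈ x ± ∆) ≈ f(x)2∆, using an exponential distribution. The values λ = 2, x = 0.5 and ∆ = 0.001 are illustrative assumptions. The last line shows that a density value can exceed 1, which is one way to see that f(x) is not itself a probability.

```r
lambda <- 2; x0 <- 0.5; delta <- 0.001          # illustrative values
# exact probability of the interval (x0 - delta, x0 + delta), from the cdf
exact_prob  <- pexp(x0 + delta, rate = lambda) - pexp(x0 - delta, rate = lambda)
approx_prob <- dexp(x0, rate = lambda) * 2 * delta   # f(x) * 2 * Delta
print(c(exact = exact_prob, approx = approx_prob))
f_unif <- dunif(0, min = -0.005, max = 0.005)   # density of unif[-0.005, 0.005) at 0
print(f_unif)                                   # a density value larger than 1
```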

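Cumulative distribution function:

\( F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy \). Hence, \( F'(x) = f(x) \).

Ex 3.10 (Cont): \( F(x) = \int_0^x \lambda\exp(-\lambda y)\,dy = 1 - \exp(-\lambda x) \) for x ≥ 0.

Percentile: The p-th quantile \( \eta_p \) (the 100p-th percentile) is the point such that

\[ p = F(\eta_p) = \int_{-\infty}^{\eta_p} f(y)\,dy. \]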


[Figure panels: (a) density; (b) distribution, with F(2) and F^{-1}(0.8) marked. Horizontal axis x from 0 to 6.]

Figure 3.5: cdf and percentile of a continuous distribution.

\( \eta_{0.5} \) is called the median (half-life). You can write \( \eta_p = F^{-1}(p) \).

Ex 3.10 (Cont): \( p = F(\eta_p) = 1 - \exp(-\lambda\eta_p) \). Thus, \( \eta_p = -\frac{\log(1-p)}{\lambda} \). Half-life \( = \log(2)/\lambda \), by taking p = .5.
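The quantile formula for the exponential distribution agrees with R's built-in `qexp`, as this sketch shows. The rate λ = 2 and the chosen probabilities are illustrative.

```r
lambda <- 2                                   # illustrative rate
p <- c(0.25, 0.5, 0.8)                        # probabilities at which to compute quantiles
eta_formula <- -log(1 - p) / lambda           # eta_p = -log(1 - p) / lambda from the text
eta_qexp    <- qexp(p, rate = lambda)         # built-in quantile function F^{-1}(p)
half_life   <- log(2) / lambda                # median, i.e. the case p = 0.5
print(cbind(p, eta_formula, eta_qexp))
print(pexp(eta_formula, rate = lambda))       # applying F to eta_p returns p
```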


3.5 Expected value and variance of continuous RV’s

Expected value: \( E g(X) = \int_{-\infty}^{+\infty} g(x) f(x)\,dx \).

Compared with \( E g(X) = \sum_x g(x) P(X = x) \) for the discrete case, the pdf plays the same role as the pmf P(X = x).

Variance: \( \mathrm{var}(X) = E(X - \mu)^2 = EX^2 - (EX)^2 \), where µ = EX.

Example 3.11 Expected value and variance of uniform dist

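Suppose that X ∼ unif(0, 1). Then,

\[ EX = \int_0^1 x\,dx = \Big[\frac{x^2}{2}\Big]_0^1 = 1/2. \]

\[ EX^2 = \int_0^1 x^2\,dx = \Big[\frac{x^3}{3}\Big]_0^1 = 1/3. \]

Hence, \( \mathrm{var}(X) = 1/3 - (1/2)^2 = 1/12 \) and \( \mathrm{SD}(X) = \sqrt{1/12} = 0.29 \).

Now, if Y = a + (b − a)X, then Y ∼ unif(a, b). Thus,

\[ EY = a + (b - a)EX = (b + a)/2. \]

\[ \mathrm{var}(Y) = (b - a)^2\,\mathrm{var}(X) = (b - a)^2/12. \]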
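These moments can be checked numerically, as in the sketch below: by integration for unif(0, 1), and by simulation for Y = a + (b − a)X. The endpoints a = 2 and b = 5, the seed and the sample size are illustrative assumptions.

```r
EX   <- integrate(function(x) x   * dunif(x), lower = 0, upper = 1)$value   # E(X)
EX2  <- integrate(function(x) x^2 * dunif(x), lower = 0, upper = 1)$value   # E(X^2)
varX <- EX2 - EX^2                              # should be 1/12
a <- 2; b <- 5                                  # illustrative endpoints
set.seed(245)                                   # arbitrary seed, for reproducibility only
y <- a + (b - a) * runif(100000)                # draws from unif(a, b)
print(c(EX = EX, EX2 = EX2, varX = varX))
print(c(mean_y = mean(y), exact = (a + b) / 2))
print(c(var_y = var(y), exact = (b - a)^2 / 12))
```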
