Introduction to Statistics : Randomness & Probability
Part II
Instructor : Siana Halim
-S. Halim -
TOPICS
• Understanding Randomness
• From Randomness to Probability
• Probability Rules!
• Random Variables
• Probability Models
• Normal Distribution

References:
• De Veaux, Velleman, Bock, Stats: Data and Models, Pearson Addison Wesley, International Edition, 2005
• John A. Rice, Mathematical Statistics and Data Analysis, Duxbury Press, 1995
4. Random Variables

An insurance company offers a "death and disability" policy that pays $10,000 when you die or $5,000 if you are permanently disabled. It charges a premium of only $50 a year for this benefit. Is the company likely to make a profit selling such a plan?

To answer this question, the company needs to know the probability that its clients will die or be disabled in any year.
Expected Value : Center
The amount the company pays out on an individual policy is called a random variable because its value is based on the outcome of a random event.
Let X be a random variable, and let x be a realization of X.

For the insurance company, x can be $10,000 (if you die that year), $5,000 (if you are disabled), or $0 (if neither occurs).
Because we can list all the outcomes, we might formally call this random variable a discrete random variable. Otherwise, we’d call it a continuous random variable. The collection of all possible values and the probabilities that they occur is called the probability model for the random variable.
Policyholder outcome    Payout x    Probability P(X = x)
Death                   $10,000     1/1000
Disability              $5,000      2/1000
Neither                 $0          997/1000
We can’t predict what will happen during any given year, but we can say what we expect to happen.
The expected value of a policy is a parameter of this model. In fact, it’s the mean. We’ll signify this with the notation μ (for population mean) or E(X) for expected value.
The expected value of a (discrete) random variable is:

μ = E(X) = Σ x p(x)

For this case:

μ = E(X) = $10,000 (1/1000) + $5,000 (2/1000) + $0 (997/1000) = $20
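As a quick check, the expected-value calculation above can be reproduced in a few lines of Python (a sketch; the payouts and probabilities are the ones from the table):

```python
# Expected value of the insurance payout: E(X) = sum of x * P(X = x).
payouts = [10_000, 5_000, 0]              # possible payouts x
probs = [1 / 1000, 2 / 1000, 997 / 1000]  # P(X = x) from the table

mu = sum(x * p for x, p in zip(payouts, probs))
print(mu)  # ≈ 20 dollars per policy
```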
First Center, Now Spread…
For data, we calculated the standard deviation by first computing each deviation from the mean and squaring it. We do the same with (discrete) random variables. First, we find the deviation of each payout from the mean (expected value):
Policyholder outcome    Payout x    Deviation (x - μ)         Probability P(X = x)
Death                   $10,000     (10,000 - 20) = 9,980     1/1000
Disability              $5,000      (5,000 - 20) = 4,980      2/1000
Neither                 $0          (0 - 20) = -20            997/1000
Next we square each deviation. The variance is the expected value of those squared deviations.
Var(X) = 9980² (1/1000) + 4980² (2/1000) + (-20)² (997/1000) = 149,600

SD(X) = √149,600 ≈ $386.78
The variance and the standard deviation of a (discrete) random variable are

Var(X) = σ² = Σ (x - μ)² P(X = x)

SD(X) = σ = √Var(X)
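These formulas can be applied directly to the insurance payout model; a minimal Python sketch:

```python
import math

# Variance and standard deviation of the insurance payout,
# using the same probability model as before.
payouts = [10_000, 5_000, 0]
probs = [1 / 1000, 2 / 1000, 997 / 1000]

mu = sum(x * p for x, p in zip(payouts, probs))               # ≈ 20
var = sum((x - mu) ** 2 * p for x, p in zip(payouts, probs))  # ≈ 149,600
sd = math.sqrt(var)                                           # ≈ 386.78
print(var, sd)
```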
More About Means and Variances
E(aX) = a E(X)          Var(aX) = a² Var(X)
E(X ± c) = E(X) ± c     Var(X ± c) = Var(X)
In general,
• The mean of the sum of two random variables is the sum of the means.
• The mean of the difference of two random variables is the difference of the means.
• If the random variables are independent, the variance of their sum or difference is the sum of the variances.

E(X ± Y) = E(X) ± E(Y)
Var(X ± Y) = Var(X) + Var(Y)
Beware! For random variables, the sum of three independent copies, X1 + X2 + X3, is not the same as 3X: the means agree, but Var(X1 + X2 + X3) = 3 Var(X), while Var(3X) = 9 Var(X).
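The distinction can be checked numerically with the payout model from the earlier slides (a sketch):

```python
# Numeric check that three independent copies X1 + X2 + X3 are not
# the same random variable as 3X: equal means, different variances.
payouts = [10_000, 5_000, 0]
probs = [1 / 1000, 2 / 1000, 997 / 1000]

mu = sum(x * p for x, p in zip(payouts, probs))
var = sum((x - mu) ** 2 * p for x, p in zip(payouts, probs))

mean_sum, mean_3x = 3 * mu, 3 * mu  # E(X1+X2+X3) = E(3X) = 3 E(X)
var_sum, var_3x = 3 * var, 9 * var  # Var(X1+X2+X3) vs Var(3X)
print(var_sum, var_3x)  # ≈ 448,800 vs ≈ 1,346,400
```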
What Can Go Wrong?

• Probability models are still just models. Models can be useful, but they are not reality.
• If the model is wrong, so is everything else. Before you try to find the mean or standard deviation of a random variable, check to make sure the probability model is reasonable.
• Watch out for variables that aren't independent. You can add expected values of any two random variables, but you can only add variances of independent random variables.
• Variances of independent random variables add. Standard deviations don't.
• Variances of independent random variables add, even when you're looking at the difference between them.
• Don't write independent instances of a random variable with notation that looks like they are the same. Write X1 + X2 + X3 rather than X + X + X.
5. Probability Models
Probability Mass Function

Generally, the probability measure on the sample space determines the probabilities of the various values of X; if those values are denoted by x1, x2, ..., then there is a function p such that

p(xi) = P(X = xi)   and   Σi p(xi) = 1

This function is called the probability mass function, or the frequency function, of the random variable X.
Searching for Tiger

You've got to have the Tiger Woods picture, so you start madly opening boxes of cereal, hoping to find one. Assuming that the pictures are randomly distributed, there's a 20% chance you succeed on any box you open. We call the act of opening a box a "trial", and note that:

• There are only two possible outcomes (called success and failure) on each trial. Either you get Tiger's picture or you don't.
• The probability of success, denoted p, is the same on every trial. Here p = 0.2.
• The trials are independent. Finding Tiger in the first box does not change what might happen when you reach for the next box.

Situations like this are called Bernoulli trials.
Jacob Bernoulli (1654-1705), discoverer of Bernoulli trials.

A Bernoulli random variable takes on only two values, 1 and 0, with probabilities p and q = 1 - p, respectively.

Its probability mass function is thus

p(1) = p
p(0) = 1 - p
p(x) = 0 if x ≠ 0 and x ≠ 1

which can be written compactly as

p(x) = p^x (1 - p)^(1-x) if x = 0 or x = 1, and 0 otherwise
• What's the probability that you will find Tiger's picture in the first box of cereal? It's 20%. We could write P(# boxes = 1) = 0.20.
• How about the probability that you don't find Tiger until the second box? That means you fail on the first trial and then succeed on the second. With the probability of success 20%, the probability of failure is q = 1 - 0.2 = 80%. Since the trials are independent, the probability of getting your first success on the second trial is P(# boxes = 2) = (0.8)(0.2) = 0.16.
• What are the chances that you won't find Tiger until the fifth box of cereal? You'd have to fail 4 straight times and then succeed, so P(# boxes = 5) = (0.8)⁴(0.2) = 0.08192.
• How many boxes might you expect to have to open? We could reason that since Tiger's picture is in 20% of the boxes, or 1 in 5, we expect to find his picture, on average, in the fifth box; that is, μ = 1/0.2 = 5 boxes.
The Geometric Model

More often, we want to know how long it will take us to achieve a success. The model that tells us this probability is called the Geometric probability model.

Geometric probability model for Bernoulli trials: Geom(p)
p = probability of success (and q = 1 - p = probability of failure)
X = number of trials until the first success occurs

P(X = x) = q^(x-1) p
Expected value: μ = 1/p
Standard deviation: σ = √(q/p²)

The 10% condition: Bernoulli trials must be independent. If that assumption is violated, it is still okay to proceed as long as the sample is smaller than 10% of the population.

[Figure: probability mass function of a geometric random variable with p = 1/9]
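The bullet-point calculations from the cereal-box example follow directly from the formula P(X = x) = q^(x-1) p; a minimal Python sketch (the helper name geom_pmf is ours, not from the slides):

```python
# Geometric model for the cereal-box search with p = 0.2 per box.
p = 0.2

def geom_pmf(x, p):
    """P(first success occurs on trial x) = q**(x - 1) * p."""
    return (1 - p) ** (x - 1) * p

print(geom_pmf(1, p))  # 0.2: Tiger in the first box
print(geom_pmf(2, p))  # ≈ 0.16: first success on box 2
print(geom_pmf(5, p))  # ≈ 0.08192: first success on box 5
print(1 / p)           # expected number of boxes: 5.0
```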
The Binomial Model

Same situation, different question. You buy 5 boxes of cereal. What's the probability you get exactly 2 pictures of Tiger Woods?

We are still talking about Bernoulli trials, but with a different question. Before, we asked how long it would take until our first success. Now, we're interested in the number of successes in the 5 trials: we want to find P(# successes = 2). This is an example of a Binomial probability.

It takes two parameters to define the Binomial model: the number of trials, n, and the probability of success, p. We denote this Binom(n, p).
Binomial probability model for Bernoulli trials: Binom(n, p)
n = number of trials
p = probability of success (and q = 1 - p = probability of failure)
X = number of successes in n trials

P(X = x) = C(n, x) p^x q^(n-x),  where C(n, x) = n! / (x!(n - x)!)

Expected value: μ = np
Standard deviation: σ = √(npq)

[Figure: binomial probability mass functions for n = 10, p = 0.1 and for n = 10, p = 0.5]
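The 2-pictures-in-5-boxes question can be answered directly from the Binomial formula; a Python sketch (the helper name binom_pmf is ours, not from the slides):

```python
from math import comb, sqrt

# Binomial model for the cereal example: Binom(n = 5, p = 0.2).
n, p = 5, 0.2

def binom_pmf(x, n, p):
    """P(exactly x successes in n Bernoulli(p) trials)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

print(binom_pmf(2, n, p))     # ≈ 0.2048: exactly 2 Tiger pictures
print(n * p)                  # mean number of successes: 1.0
print(sqrt(n * p * (1 - p)))  # standard deviation ≈ 0.894
```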
The Poisson Model
When rare events occur together or in clusters, people often want to know whether that happened just by chance or whether something else is going on. If we assume that the events occur independently, we can use a Binomial model to find the probability that a cluster of events like this occurs. For rare events, p will be quite small, and when n is large it may be difficult to compute the exact probability that a cluster of a certain size occurs.

Siméon Denis Poisson was a French mathematician interested in events with very small probability. He originally derived his model to approximate the Binomial model when the probability of success, p, is very small and the number of trials, n, is very large.
Poisson probability model for successes: Poisson(λ)
λ = mean number of successes
X = number of successes

P(X = x) = e^(-λ) λ^x / x!

Expected value: E(X) = λ
Standard deviation: SD(X) = √λ

One of the consequences of the Poisson model is that, as long as the mean rate of occurrences stays constant, the occurrence of past events doesn't change the probability of future events.

[Figure: Poisson probability mass functions for (a) λ = 0.1, (b) λ = 1, (c) λ = 5, (d) λ = 10]
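The approximation idea can be sketched numerically: with λ = np, the Poisson pmf tracks the Binomial pmf closely for large n and small p. The parameter values n = 1000, p = 0.002 below are illustrative, not from the slides:

```python
from math import comb, exp, factorial

# Sketch: Poisson(lam = n*p) approximating Binom(n, p) for large n, small p.
n, p = 1000, 0.002
lam = n * p  # mean number of successes, lam = 2.0

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

for x in range(5):
    print(x, binom_pmf(x, n, p), poisson_pmf(x, lam))
# the two columns agree to a few decimal places
```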
Continuous Random Variables

For a continuous random variable, the role of the probability mass function is taken by a density function, f(x), which has the properties that:

• f(x) ≥ 0
• f is a piecewise continuous function
• ∫_{-∞}^{∞} f(x) dx = 1

If X is a random variable with a density function f, then for any a < b the probability that X falls in the interval (a, b) is the area under the density function between a and b:

P(a < X < b) = ∫_a^b f(x) dx
Uniform Random Variables

A uniform random variable on the interval [0, 1] is a model for what we mean when we say "choose a number at random between 0 and 1."

The uniform density function on the interval [0, 1] is defined as follows:

f(x) = 1 if 0 ≤ x ≤ 1;  f(x) = 0 if x < 0 or x > 1

The uniform density on a general interval [a, b] is

f(x) = 1/(b - a) if a ≤ x ≤ b;  f(x) = 0 if x < a or x > b
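The density properties above can be sanity-checked numerically with the uniform density; a sketch using a midpoint Riemann sum in place of the integral (the interval [2, 5] is illustrative):

```python
# Numeric sanity check of the density properties, using the uniform
# density on [a, b] and a midpoint Riemann sum as the integral.
def uniform_pdf(x, a, b):
    return 1.0 / (b - a) if a <= x <= b else 0.0

def integrate(f, lo, hi, n=10_000):
    """Midpoint Riemann sum of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

a, b = 2.0, 5.0
print(integrate(lambda x: uniform_pdf(x, a, b), a, b))      # ≈ 1: total area
print(integrate(lambda x: uniform_pdf(x, a, b), 3.0, 4.0))  # ≈ 1/3: P(3 < X < 4)
```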
Properties of Continuous Random Variables

The cdf of a continuous random variable X is defined in the same way as for a discrete random variable:

F(x) = P(X ≤ x) = ∫_{-∞}^x f(u) du

The cdf can be used to evaluate the probability that X falls in an interval:

P(a ≤ X ≤ b) = F(b) - F(a)

Since P(X = a) = ∫_a^a f(x) dx = 0, the endpoints do not matter:

P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b)
The Exponential Density

The exponential density function is

f(x) = λ e^(-λx) for x ≥ 0;  f(x) = 0 for x < 0

The cumulative distribution function is

F(x) = ∫_{-∞}^x f(u) du = 1 - e^(-λx) for x ≥ 0;  F(x) = 0 for x < 0

The exponential distribution is often used to model lifetimes or waiting times, in which context it is conventional to replace x by t.

[Figure: exponential densities with λ = 0.5 (solid), λ = 1 (dotted), and λ = 2 (dashed)]
Suppose that we consider modeling the lifetime of an electronic component as an exponential random variable, that the component has lasted a length of time s, and that we wish to calculate the probability that it will last at least t more time units; that is, we wish to find P(T > t + s | T > s):

P(T > t + s | T > s) = P(T > t + s and T > s) / P(T > s)
                     = P(T > t + s) / P(T > s)
                     = e^(-λ(t+s)) / e^(-λs)
                     = e^(-λt)

We see that the probability that the unit will last t more time units does not depend on s. The exponential distribution is consequently said to be memoryless.

But! It is clearly not a good model for human lifetimes, since the probability that a 16-year-old will live at least 10 more years is not the same as the probability that an 80-year-old will live at least 10 more years.
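The memoryless derivation can be verified numerically; a sketch (the rate λ = 0.5 and the times s, t are illustrative, not from the slides):

```python
from math import exp

# Numeric check of the memoryless property:
# P(T > t + s | T > s) equals P(T > t) for an exponential lifetime T.
lam = 0.5  # illustrative rate

def surv(t, lam):
    """P(T > t) = e**(-lam * t) for an exponential with rate lam."""
    return exp(-lam * t)

s, t = 3.0, 2.0
conditional = surv(t + s, lam) / surv(s, lam)  # P(T > t + s | T > s)
print(conditional, surv(t, lam))  # the two values coincide
```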
6. Normal Probability Models
The density function of the normal distribution depends on two parameters, μ and σ, where -∞ < μ < ∞ and σ > 0:

f(x) = (1 / (σ √(2π))) e^(-(x - μ)² / (2σ²)),  -∞ < x < ∞

The parameters μ and σ are called the mean and standard deviation of the normal density.

The special case for which μ = 0 and σ = 1 is called the standard normal density.
Use of Normal Table

The standard normal table gives the area to the left of a specified z:

P(Z ≤ z) = area under the curve to the left of z

For the probability of an interval [a, b]:

P(a ≤ Z ≤ b) = (area to the left of b) - (area to the left of a)

The following properties can be derived from the symmetry of the density about 0:

(a) P(Z ≤ 0) = 0.5
(b) P(Z ≤ -z) = 1 - P(Z ≤ z)
(c) If z > 0:
    P(Z ≤ z) = 0.5 + P(0 < Z ≤ z)
    P(Z ≤ -z) = 0.5 - P(0 < Z ≤ z)

Property (c) is needed for using other normal tables that give only the probability P(0 < Z ≤ z).
Property 1 of Normal Distribution

If X is N(μ, σ), then Z = (X - μ)/σ is N(0, 1). So

P(X ≤ b) = P((X - μ)/σ ≤ (b - μ)/σ) = P(Z ≤ (b - μ)/σ)

P(a ≤ X ≤ b) = P((a - μ)/σ ≤ (X - μ)/σ ≤ (b - μ)/σ) = P((a - μ)/σ ≤ Z ≤ (b - μ)/σ)

where the probabilities for Z are obtained from the standard normal table.
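The standardization in Property 1 can be carried out in code instead of a table; a sketch using the error function for the standard normal cdf (the values μ = 100, σ = 15, a = 85, b = 130 are illustrative, not from the slides):

```python
from math import erf, sqrt

# Property 1 in code: standardize X ~ N(mu, sigma) and use the
# standard normal cdf, written here via the error function.
def phi(z):
    """Standard normal cdf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 100.0, 15.0  # illustrative parameters
a, b = 85.0, 130.0

prob = phi((b - mu) / sigma) - phi((a - mu) / sigma)  # P(-1 <= Z <= 2)
print(prob)  # ≈ 0.8186
```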
Property 2 of Normal Distribution

If X is N(μ, σ), then Y = a + bX is N(a + bμ, |b|σ).

Multiplying by a constant b and adding a constant a only changes the mean and the standard deviation of the normal distribution.

Property 3 of Normal Distribution

The sum of two independent normals is normal. If X is N(μ1, σ1) and Y is N(μ2, σ2), then for independent X and Y,

X + Y is N(μ, σ), where
μ = μ1 + μ2 and σ² = σ1² + σ2²
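Property 3 can be illustrated with a short Monte Carlo sketch; the parameter values N(1, 2) and N(3, 4) are illustrative, not from the slides:

```python
import random

# Monte Carlo sketch of Property 3: for independent X ~ N(1, 2) and
# Y ~ N(3, 4), the sum X + Y should behave like N(4, sqrt(20)).
random.seed(0)
n = 200_000
sums = [random.gauss(1, 2) + random.gauss(3, 4) for _ in range(n)]

mean = sum(sums) / n
var = sum((s - mean) ** 2 for s in sums) / n
print(mean, var)  # roughly 4 and 20 (= 2**2 + 4**2)
```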
The Normal Approximation to the Binomial Distribution

If X has the binomial distribution b(n, p), where n is large and p is not too near 0 or 1, the distribution of the standardized variable

Z = (X - np) / √(npq)

is approximately N(0, 1).

Without continuity correction:

P(a ≤ X ≤ b) ≈ P( (a - np)/√(np(1-p)) ≤ Z ≤ (b - np)/√(np(1-p)) )

Using continuity correction:

P(a ≤ X ≤ b) ≈ P( (a - 0.5 - np)/√(np(1-p)) ≤ Z ≤ (b + 0.5 - np)/√(np(1-p)) )
The Normal Model to the Rescue!

Suppose the Surabaya Red Cross anticipates the need for at least 1850 units of O-negative blood this year. It estimates that it will collect blood from 32,000 donors. How great is the risk that the Surabaya Red Cross will fall short of meeting its need?

We can use the Binomial model with n = 32,000 and p = 0.06. But calculating the probability of getting at most 1850 units of O-negative blood from 32,000 donors with the Binomial model is tedious (or outright impossible by hand).

Instead, we can use the Normal model.
The Binomial model has mean np = 1920 and standard deviation √(npq) ≈ 42.48. We could try approximating its distribution with a Normal model, using the same mean and standard deviation.

P(X < 1850) = P(Z < (1850 - 1920)/42.48) ≈ P(Z < -1.65) ≈ 0.05

There seems to be about a 5% chance that this Red Cross chapter will run short of O-negative blood.
Can we always use a Normal model to make estimates of Binomial probabilities? NO! We can use a Normal model only for a large enough number of trials. And what we mean by "large enough" depends on the probability of success.

We need a larger sample if the probability of success is very low (or very high).
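The Red Cross calculation above can be reproduced in a few lines, again using the error function for the standard normal cdf (a sketch):

```python
from math import erf, sqrt

# Reproducing the Red Cross calculation with the Normal approximation.
def phi(z):
    """Standard normal cdf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 32_000, 0.06
mu = n * p                     # 1920.0 expected O-negative donors
sigma = sqrt(n * p * (1 - p))  # ≈ 42.48

risk = phi((1850 - mu) / sigma)  # P(X < 1850), z ≈ -1.65
print(risk)  # ≈ 0.05: about a 5% chance of falling short
```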
The success/failure condition

A Binomial model is approximately Normal if we expect at least 10 successes and 10 failures: np ≥ 10 and nq ≥ 10.

Math Box: A Normal model extends infinitely in both directions, but a Binomial model must have between 0 and n successes. So if we use a Normal to approximate a Binomial, we have to cut off its tails. We require:
μ - 3σ > 0
Or, in other words: μ > 3σ
For a Binomial that's: np > 3√(npq)
Squaring yields: n²p² > 9npq
Now simplify: np > 9q
Since q ≤ 1, we can require: np > 9
What Can Go Wrong ?
• Be sure you have Bernoulli trials. Be sure to check the requirements first : two outcomes per trial, a constant probability of success, and independence. Remember the 10% condition provides a reasonable substitute for independence.
• Don’t confuse Geometric and Binomial models. Both involve Bernoulli trials, but the issues are different. If you are repeating trials until your first success, that’s a Geometric probability. If you are counting the number of successes in a specific number of trials, that’s a Binomial probability.
• Don't use the Normal approximation with small n. To use a Normal approximation in place of a Binomial model, there must be at least 10 expected successes and 10 expected failures. For large n when np is small, consider using a Poisson model instead.