III Modeling Random Behavior
A. Probability
1. Overview
Statisticians use probability to model uncertainty.
Consider these statements:
• The probability that the next batch of TiO2 (white pigment) is unacceptable is .01.
• There is a 25% chance our firm will get the IBM order.
In each case, what we mean by a probability is

("size of a set of interest") / ("size of the set of all possible outcomes")
Some notation will aid our discussion.
An event is a set of possible outcomes of interest.
The sample space, S, is the set of all possible outcomes.
If A is an event, the probability that A occurs is

P(A) = (size of A) / (size of S)

Note:
• 0 ≤ P(A) ≤ 1
• P(S) = 1
• Probability may be either objective (based on prior experience) or purely subjective.
• Ultimately, the accuracy of specific probabilities depends on assumptions.
• If the assumptions upon which we base a specific probability are wrong, then we should not expect the specific probability to be any good.
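The ratio definition above can be sketched in a few lines of code. This is a hypothetical illustration (a fair six-sided die is not in the original text); the event and sample space are assumed for the example.

```python
from fractions import Fraction

# Hypothetical illustration: rolling one fair six-sided die.
# The sample space S is all six faces; the event A is "roll an even number".
S = {1, 2, 3, 4, 5, 6}
A = {y for y in S if y % 2 == 0}

# P(A) = (size of A) / (size of S)
P_A = Fraction(len(A), len(S))
print(P_A)  # 1/2
```

Note that 0 ≤ P(A) ≤ 1 holds automatically, since A is always a subset of S.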
Example: Making Nickel Battery Plate
A particular process for making nickel battery plate requires an operator to sift nickel powder into a frame.
The process uses a very tight weight specification, which is difficult to make.
The supervisor monitored the last 1000 attempts made by an operator.
The operator successfully made the specification 379 times.
One way to get the probability of a successful attempt is

P(successful attempt) = 379/1000 = .379
The supervisor noted, however, that the operator seemed to get better over time.
In this case, the supervisor may believe that the actual probability is something larger than .379.
A perfectly reasonable, but subjective, estimate of the probability of a successful attempt is 0.4.
2. Making Inferences Using Probabilities.
Suppose the supervisor really believes that the probability of a successful attempt is 0.4.
Suppose further that out of the next 50 attempts, she is never successful.
Would you now believe that the probability of a success is still 0.4?
OF COURSE NOT!
Consider another scenario.
Suppose the first attempt is unsuccessful.
Do you have good reason to believe that the true probability of a success is not 0.4?
Suppose the first two attempts are unsuccessful.
Suppose the first three are unsuccessful.
At what point do we begin to believe that the probability really is not 0.4?
The answer lies in calculating the probability of seeing y failures in a row, assuming the probability of a success is 0.4.
Once that probability is small enough, we can reasonably conclude that the true probability is not 0.4.
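The reasoning above can be sketched numerically. The cutoff of 0.05 is an assumed threshold for "small enough", not a value from the text.

```python
# If P(success) really is 0.4, the chance of y failures in a row is
# (1 - 0.4)**y.  Once that probability drops below a small threshold
# (0.05 here, an assumed cutoff), we doubt that the true probability
# of success is 0.4.
p_success = 0.4
threshold = 0.05  # assumed significance cutoff

y = 1
while (1 - p_success) ** y >= threshold:
    y += 1
print(y)  # 6
```

So under this assumed cutoff, six consecutive failures would already cast serious doubt on p = 0.4.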
3. Conditional Probability
Often, two events are related.
Knowing the relationship between the two events allows us to model the behavior of one event in terms of the other.
Conditional probability quantifies the chances of one event occurring given that the other occurs.
We denote the probability that an event A occurs given that the event B has occurred by P(A|B).
The key to conditional probability: the intersection of the two events defines the nature of their relationship.
This concept is best illustrated by an example: Personal Computers
A major manufacturer of personal computers has introduced a new p.c.
• As with most new products, there seem to be some problems.
• This manufacturer offers a one-year warranty on this model.
Let
• A be the event that the hard drive on a specific computer fails within one year.
• B be the event that the floppy drive on a specific computer fails within one year.
Consider a specific computer whose floppy drive has failed.
In this case, we know that the event B has occurred.
What now is the probability that this same computer will have its hard drive fail?
What we seek is P(A|B).
Note: once we know that B has occurred, the sample space of interest is restricted to B.
Similarly, once we know that B has occurred, the set of interest is restricted to that portion of A which resides in B, A ∩ B.
Thus,

P(A|B) = (size of the set of interest, A ∩ B) / (size of the set of all possible outcomes, B)
       = [size of (A ∩ B) / size of S] / [size of B / size of S]
       = P(A ∩ B) / P(B)
Definition: Conditional Probability
Let A and B be events in S.
The conditional probability of B given that A has occurred is

P(B|A) = P(A ∩ B) / P(A), if P(A) > 0.

Similarly, the conditional probability of A given that B has occurred is

P(A|B) = P(A ∩ B) / P(B), if P(B) > 0.
Example: Personal Computers - Continued
The reliability engineers have determined that:
P(A) = .02   P(B) = .05   P(A ∩ B) = .01

Note: P(A ∩ B) is the probability that both the hard and floppy drives on a specific computer fail within one year.
The conditional probability that the hard drive fails given that the floppy drive fails is

P(A|B) = P(A ∩ B) / P(B) = .01 / .05 = .20

As a result, if we know that the floppy drive failed on a given machine, then the probability the hard drive will fail also is 20%.
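The computation above is one division; a minimal sketch using the example's numbers:

```python
# The numbers from the personal-computer example:
# P(A) = hard drive fails, P(B) = floppy drive fails,
# P(A ∩ B) = both fail within one year.
P_A = 0.02
P_B = 0.05
P_A_and_B = 0.01

# Conditional probability: P(A|B) = P(A ∩ B) / P(B)
P_A_given_B = P_A_and_B / P_B
print(round(P_A_given_B, 2))  # 0.2
```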
4. Independence
In many engineering situations, two events have no real relationship.
Knowing that one event has occurred offers no new information about the chances the other will occur.
We call two such events independent.
Independence is important for a number of reasons:
• many engineering events either are independent or close enough for a first approximation
• independence provides a powerful basis for modeling the joint behavior of several events
• the formal concept of a random sample assumes that the observations are independent.
Definition: Independence
Let A, B be events in S. A and B are said to be independent if
P(A | B) = P(A)
Similarly, if A and B are independent, then
P(B | A) = P(B)
Example: Personal Computer - Continued
Recall:
P(A) = .02 P(B) = .05 P(A | B) = .20
Note: the hard drive failing and the floppy drive failing are not independent events because
P(A | B) ≠ P(A)
Why? Many personal computer designs use the floppy drive as an expensive air filter.
As the floppy drive gets dirty, it increases the likelihood of it failing.
Also, as the floppy get dirty, the p.c. does not vent heat as well, which increases the likelihood that the hard drive fails.
5. Basic Rules of Probability
1. 0 ≤ P(A) ≤ 1.
2. If Ø is the empty set, then P(Ø) = 0.
3. The Probability of Complements
If A is an event in some sample space S, then the complement of the set A relative to S is the set of outcomes in S which are not in A.
We denote the complement of A by Ā.
P(Ā) = 1 - P(A)
4. The Additive Law of Probability
If A and B are events in S, then the union of A and B, denoted by A U B, is the set of outcomes either in A or in B or in both.
The Additive Law of Probability is
P(A U B) = P(A) + P(B) - P(A ∩ B)
If A and B are mutually exclusive, then A ∩ B = Ø and P(A ∩ B) = 0; thus,
P(A U B) = P(A) + P(B)
5. The Multiplicative Law of Probability
If A and B are events in S, with P(A) > 0 and P(B) > 0, then

P(B|A) = P(A ∩ B) / P(A)

Thus,

P(A ∩ B) = P(A) • P(B | A)

Similarly,

P(A ∩ B) = P(B) • P(A | B)
If A and B are independent, then
P(A | B) = P(A) P(B | A) = P(B)
Thus, if A and B are independent, then
P(A ∩ B) = P(A) • P(B)
This property is a very powerful result, making independence quite important for finding the probabilities associated with the intersections of events.
6. Simplest Form of the Law of Total Probability
Let A and B be events in S. We may partition B into two parts:
• that which overlaps A, A ∩ B, and
• that which overlaps Ā, Ā ∩ B
Thus,
P(B) = P(A ∩ B) + P(Ā ∩ B) = P(A) • P(B | A) + P(Ā) • P(B | Ā)
7. The Simplest Form of Bayes Rule
Let A and B be events in S.
Suppose we are given P(A), P(B | A), and P(B | Ā). Then

P(A|B) = P(A ∩ B) / P(B) = [P(A) • P(B | A)] / [P(A) • P(B | A) + P(Ā) • P(B | Ā)]
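Bayes' Rule can be checked with the personal-computer numbers. The conditional probabilities below are derived, not given in the text: with P(A ∩ B) = .01 (both drives fail), P(B|A) = .01/.02 = .5 and P(B|Ā) = (.05 - .01)/.98.

```python
# Simplest form of Bayes' Rule, with inputs derived from the
# personal-computer example (the derivation is an assumption here):
# P(A) = .02, P(B|A) = .5, P(B|A-bar) = .04/.98.
P_A = 0.02
P_B_given_A = 0.5
P_B_given_not_A = 0.04 / 0.98

# Denominator is the law of total probability:
# P(B) = P(A)P(B|A) + P(A-bar)P(B|A-bar)
P_B = P_A * P_B_given_A + (1 - P_A) * P_B_given_not_A
P_A_given_B = (P_A * P_B_given_A) / P_B
print(round(P_A_given_B, 2))  # 0.2
```

This recovers P(A|B) = .20, agreeing with the direct conditional-probability calculation.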
Example: Toothpaste Containers
A toothpaste company uses four injection molding processes to make its toothpaste containers.
These are older pieces of equipment and subject to problems.
Event   Description                                    Prob
A       Machine 1 has a problem on any specific day    0.1
B       Machine 2 has a problem on any specific day    0.2
C       Machine 3 has a problem on any specific day    0.05
D       Machine 4 has a problem on any specific day    0.05
What is the probability that no problems occur on any specific day?
Note: no problems means
• Machine 1 has no problems, Ā, and
• Machine 2 has no problems, B̄, and
• Machine 3 has no problems, C̄, and
• Machine 4 has no problems, D̄.
Thus, we seek the probability of an intersection.
If we can assume independence, then the probability of the intersection is the product of the individual probabilities.

P(no problems) = P(Ā ∩ B̄ ∩ C̄ ∩ D̄)
= P(Ā) • P(B̄) • P(C̄) • P(D̄)
= (1 - 0.1)(1 - 0.2)(1 - 0.05)(1 - 0.05)
= (.9)(.8)(.95)(.95)
= 0.6498
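The product of complements is easy to verify in code, using the four problem probabilities from the table:

```python
# Toothpaste-container calculation: under independence, the probability
# that no machine has a problem is the product of the individual
# "no problem" probabilities (1 - p) for each machine.
p_problem = {"A": 0.1, "B": 0.2, "C": 0.05, "D": 0.05}

p_no_problems = 1.0
for p in p_problem.values():
    p_no_problems *= (1 - p)
print(round(p_no_problems, 4))  # 0.6498
```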
B. Discrete Random Variables
1. Overview
Let Y be the number of problems that occur on a given day.
What does Y = 0 mean?
No problems, which is Ā ∩ B̄ ∩ C̄ ∩ D̄.
What does Y = 1 mean?
Exactly one problem, which is
(A ∩ B̄ ∩ C̄ ∩ D̄) ∪ (Ā ∩ B ∩ C̄ ∩ D̄) ∪ (Ā ∩ B̄ ∩ C ∩ D̄) ∪ (Ā ∩ B̄ ∩ C̄ ∩ D)
Note: Each one of these events is mutually exclusive of the others; thus, P(Y = 1) is the sum of the probabilities of these four intersections.
The manufacturer of nickel battery plate has imposed a tight initial weight specification which is difficult to meet.
Consider the next three attempts made by an operator who has a 40 % chance of being successful.
Let S represent a successful attempt.
Let F represent a failed attempt.
Let Y represent the number of successful attempts she makes.
Consider the probability that exactly two out of these three attempts are successful, i.e., P(Y = 2).
The possible ways she can get exactly two successful attempts are
(SSF) (SFS) (FSS)
Since these events are mutually exclusive, then the probability of exactly two successful attempts is
P(Y = 2) = P(SSF) + P(SFS) + P(FSS)
In this situation, we can reasonably assume that each attempt is independent of the others.
Let p be the probability that she succeeds in meeting the weight specification on any given attempt.
Thus, p = 0.4.
Let q = 1 - p be the probability that she fails. In this specific case, q = .6.
Since each attempt is independent of the others, then
P(SSF) = P(S) • P(S) • P(F) = p • p • q = p² • q = 0.096
P(SFS) = P(S) • P(F) • P(S) = p • q • p = p² • q = 0.096
P(FSS) = P(F) • P(S) • P(S) = q • p • p = p² • q = 0.096
As a result,

P(Y = 2) = P(SSF) + P(SFS) + P(FSS)
= p² • q + p² • q + p² • q
= 3 • p² • q
= (number of ways to get 2 successes out of 3 attempts) • p² • q
= 3(0.096) = 0.288
In general, if she makes n total attempts, the probability that she succeeds exactly y times is
P(Y = y) = (number of ways to get y successes out of n) • p^y • q^(n-y)
We commonly use the binomial coefficient to denote the number of ways to get y successes from n total attempts.
We define the binomial coefficient, (n choose y), by

(n choose y) = n! / [y!(n - y)!]

By definition, 0! = 1.

We now can write the probability of obtaining exactly y successes out of n total attempts as

P(Y = y) = (number of ways to get y successes out of n tries) • p^y • q^(n-y)
= (n choose y) • p^y • q^(n-y)
= [n! / (y!(n - y)!)] • p^y • q^(n-y)
Consider an experiment which meets the following conditions:
1. the experiment consists of a fixed number of trials, n;
2. each trial can result in one of only two possible outcomes: a “success” or a “failure”;
3. the probability, p, of a “success” is constant for each trial;
4. the trials are independent; and
5. the random variable of interest, Y, is the number of successes over the n trials.
If these conditions hold, then Y is said to follow a binomial distribution with parameters, n and p.
The probability function for a binomial random variable is

p(y) = P(Y = y) = (n choose y) • p^y • q^(n-y) = [n! / (y!(n - y)!)] • p^y • q^(n-y),  y = 0, 1, …, n

The mean, variance, and standard deviation are

E(Y) = μ = np
σ² = npq
σ = √(npq)
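The binomial probability function and its moments can be sketched directly, here with the battery-plate numbers (n = 3, p = 0.4):

```python
from math import comb

# A minimal sketch of the binomial probability function.
def binomial_pmf(y, n, p):
    """P(Y = y) = (n choose y) * p**y * q**(n - y), with q = 1 - p."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

# Battery-plate example: n = 3 attempts, p = 0.4.
n, p = 3, 0.4
print(round(binomial_pmf(2, n, p), 3))   # 0.288
print(round(n * p, 2))                   # mean np = 1.2
print(round(n * p * (1 - p), 2))         # variance npq = 0.72
```

The 0.288 matches the hand calculation 3 • p² • q above.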
Example
NASA downloads massive data files from a specific satellite three times a day.
Historically, the probability that the data file is corrupted during transmission is .10.
Consider a day's set of transmissions.
What is the probability that exactly two data files are corrupted?
Let Y = number of files corrupted.
P(Y = 2) = (3 choose 2)(.1)²(.9)¹ = [3! / (2!1!)](.1)²(.9) = 3(.1)²(.9) = 0.027
Find the mean number of files corrupted.
μ = np = 3(.1) = .3
Find the variance and standard deviation for the number of files corrupted.

σ² = npq = 3(.1)(.9) = .27
σ = √.27 = 0.52

Using the empirical rule, we expect virtually all of the data to fall within the interval

μ ± 3σ = 0.3 ± 3(0.52) = (-1.26, 1.86)

As a result, we should rarely see 2 or more corrupted files.
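The NASA numbers check out in code:

```python
from math import comb, sqrt

# NASA example: n = 3 transmissions per day, p = 0.1 chance a file
# is corrupted during transmission.
n, p = 3, 0.1
p_two = comb(n, 2) * p**2 * (1 - p) ** 1  # P(exactly 2 corrupted)
print(round(p_two, 3))  # 0.027

mu = n * p
sigma = sqrt(n * p * (1 - p))
print(round(mu, 2), round(sigma, 2))  # 0.3 0.52
```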
4. Poisson Distribution
Many engineering problems require us to model the random behavior of small counts.
For example, a manufacturer of nickel- hydrogen batteries ran into a problem with cells shorting out prematurely.
Each cell used 60 nickel plates.
The manufacturer and its customer cut open several cells and discovered that the problem cells all had plates with “blisters” while the good cells did not.
Two possible approaches:
• Classify each plate as either conforming (blister free) or non-conforming (one or more blisters).
-- Model with a binomial distribution.
-- Reduces the data into either acceptable or not acceptable.
-- Often ignores the subtleties in the data.
• Count the number of blisters on each cell.
-- Conforming plates have counts of 0.
-- Non-conforming plates have counts of 1 or more.
-- A plate with many blisters truly is defective and does short out a cell.
-- A plate with only one blister may function perfectly well.
Counting the number of blisters provides more information about the specific problem.
The Poisson distribution often proves useful for modeling small counts.
Let λ be the rate of these counts.
If Y follows a Poisson distribution, then

p(y) = P(Y = y) = (λ^y • e^(-λ)) / y!,  y = 0, 1, 2, …
p(y) = 0 otherwise

with

E(Y) = μ = λ
σ² = λ
y
Example: Consider a maintenance manager of an industrial facility.
Historically, a certain department averages six repairs per week.
What is the probability that during a randomly selected week, this department will require only two repairs?
Let Y = number of repairs.
P(Y = 2) = (λ² e^(-λ)) / 2! = (6² e^(-6)) / 2! = (36/2) e^(-6) = 18e^(-6) = 0.0446
What is the probability of at least one repair?

P(Y ≥ 1) = 1 - P(Y = 0) = 1 - (6⁰ e^(-6)) / 0! = 1 - e^(-6) = 1 - 0.0025 = 0.9975

What is the expected number of repairs?

E(Y) = μ = λ = 6
What are the variance and standard deviation for the number of repairs?

σ² = λ = 6
σ = √6 = 2.45

Using the empirical rule, we expect virtually all of the data to fall within the interval

μ ± 3σ = 6 ± 3(2.45) = (-1.35, 13.35)

As a result, we should rarely see 14 or more repairs in any given week.
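The Poisson calculations for the maintenance example can be sketched as:

```python
from math import exp, factorial, sqrt

# Poisson probability function, applied to the maintenance example
# (lambda = 6 repairs per week).
def poisson_pmf(y, lam):
    """P(Y = y) = lam**y * exp(-lam) / y!"""
    return lam**y * exp(-lam) / factorial(y)

lam = 6
print(round(poisson_pmf(2, lam), 4))      # 0.0446  -> P(Y = 2)
print(round(1 - poisson_pmf(0, lam), 4))  # 0.9975  -> P(Y >= 1)
print(lam, round(sqrt(lam), 2))           # mean 6, std dev 2.45
```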
C. Continuous Random Variables
1. Overview

The continuous random variables studied in this course have probability density functions, f(y).

Some Properties of f(y):

1. f(y) ≥ 0
2. ∫_{-∞}^{∞} f(y) dy = 1
3. F(y₀) = P(Y ≤ y₀) = ∫_{-∞}^{y₀} f(y) dy
4. P(y₁ ≤ Y ≤ y₂) = ∫_{y₁}^{y₂} f(y) dy = F(y₂) - F(y₁)
5. P(Y = y₀) = ∫_{y₀}^{y₀} f(y) dy = 0
A very important example of a continuous random variable is one which follows an exponential distribution.
The exponential distribution often provides an excellent model for describing the behavior of equipment life times.
Example:
The times between repairs for an ethanol-water distillation column are well modeled by an exponential distribution, which has the form

f(y) = λe^(-λy),  y ≥ 0, λ > 0
f(y) = 0 otherwise

where λ is the rate of repairs.
In this case, λ = .001 repairs/hr.
Thus, this column, on the average, requires 1 repair every 1000 hours of operation.
What is the probability that the next time to repair will be less than 100 hours from the previous repair?
For our example,

P(Y ≤ y₀) = ∫₀^{y₀} f(y) dy = ∫₀^{y₀} λe^(-λy) dy = -e^(-λy) |₀^{y₀} = 1 - e^(-λy₀)

In our case, λ = .001 and y₀ = 100; thus,

P(Y ≤ 100) = 1 - e^(-(.001)(100)) = 1 - e^(-.1) = 1 - 0.905 = 0.095
What is the probability that the time between repairs will be between 500 and 1500 hours?
In this case, y₁ = 500 and y₂ = 1500; thus,

P(y₁ ≤ Y ≤ y₂) = ∫_{y₁}^{y₂} λe^(-λy) dy = -e^(-λy) |_{y₁}^{y₂} = e^(-λy₁) - e^(-λy₂)

P(500 ≤ Y ≤ 1500) = e^(-0.5) - e^(-1.5) = 0.383
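Both exponential probabilities follow from the closed-form CDF derived above:

```python
from math import exp

# Exponential-distribution probabilities for the distillation column,
# using the closed form P(Y <= y0) = 1 - exp(-lam * y0).
lam = 0.001  # repairs per hour

def exp_cdf(y, lam):
    return 1 - exp(-lam * y)

print(round(exp_cdf(100, lam), 3))                       # 0.095
print(round(exp_cdf(1500, lam) - exp_cdf(500, lam), 3))  # 0.383
```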
2. Expected Values – Revisited
For a continuous random variable, Y, the expected value is

E(Y) = μ = ∫_{-∞}^{∞} y f(y) dy

The variance of Y is once again

σ² = E(Y²) - μ²

where

E(Y²) = ∫_{-∞}^{∞} y² f(y) dy

Once again, the standard deviation is σ = √σ².
Example: The time between repairs
We said these times were well modeled by an exponential distribution with λ = .001 repairs/hr.

f(y) = λe^(-λy),  y ≥ 0, λ > 0
f(y) = 0 otherwise

E(Y) = ∫_{-∞}^{∞} y f(y) dy = ∫₀^{∞} y λe^(-λy) dy = 1/λ = 1/.001 = 1000 hours

E(Y²) = ∫_{-∞}^{∞} y² f(y) dy = ∫₀^{∞} y² λe^(-λy) dy = 2/λ²

σ² = E(Y²) - μ² = 2/λ² - (1/λ)² = 1/λ²

σ = 1/λ

In our case,

σ² = 1/λ² = 1/(.001)² = 1,000,000
σ = 1/λ = 1/.001 = 1000 hours
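The closed forms E(Y) = 1/λ and σ² = 1/λ² can be checked numerically. The step size and integration range below are assumed approximation choices, not values from the text.

```python
from math import exp

# Numerical check (a simple Riemann sum, an assumed approximation)
# that E(Y) = 1/lam and var(Y) = 1/lam**2 for the exponential pdf.
lam = 0.001
dy = 1.0                              # step size in hours (assumed)
ys = [i * dy for i in range(20000)]   # integrate far into the tail

mean = sum(y * lam * exp(-lam * y) * dy for y in ys)
second = sum(y**2 * lam * exp(-lam * y) * dy for y in ys)
var = second - mean**2

print(round(mean))     # close to 1000 hours
print(round(var, -3))  # close to 1,000,000 hours squared
```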
3. Relationship of Distributions and Data Displays
Distributions can provide a powerful basis for modeling the random behavior of important characteristics of interest.
Formal statistical analyses require certain assumptions about the underlying distribution of the data.
Typically, these assumptions center on the “shape” of the data.
Appropriate data displays provide a quick and easy way to check these assumptions, especially the stem-and-leaf display and the histogram.
The theoretical shape of a stem-and-leaf display for a given set of data is
• the probability function, p(y) for a discrete random variable, and
• the pdf, f(y), for a continuous random variable.
Example: Times Between Industrial Accidents
Lucas (1985) analyzed the times between accidents at an industrial facility.
We can model these times by an exponential distribution with λ = 0.05.
The following plot graphs the pdf for this specific distribution.
Consider overlaying an appropriately scaled plot of the pdf on a histogram of the data.
This plot indicates that the exponential distribution does provide a reasonable basis for modeling these times.
4. The Normal Distribution
The normal distribution is the single most important distribution in classical statistics.
Many naturally occurring phenomena are well modeled by this distribution.
Let Y be a normally distributed random variable; its pdf is given by

f(y) = [1 / (σ√(2π))] • e^(-(1/2)((y - μ)/σ)²)

Note: the pdf depends on the parameters
• μ – the population mean
• σ² – the population variance

Thus, E(Y) = μ and var(Y) = σ².
The plot of the pdf looks like
The plot is single peaked, centered at μ, symmetric, and the tails die out rapidly.
• 68.3% of the area of the curve falls within the interval μ ± σ
• 95.4% falls within μ ± 2σ
• 99.7% falls within μ ± 3σ
We can find any probabilities we need through the standard normal random variable.
The standard normal distribution has
• μ = 0
• σ² = 1
We denote a standard normal random variable by Z.
The values listed in Table I of the Appendix are P(Z ≤ z₀).
We often need to use the Z-value associated with specific “tail” areas of the standard normal distribution.
Let z_α represent the Z-value associated with a right-hand “tail area” of α.
z_α is that value for Z such that

P(Z ≥ z_α) = α

As a result, z_α is that value from the table which satisfies

P(Z ≤ z_α) = 1.0 - α

or

(value from table) = 1.0 - α

For example, z_0.025 is that Z such that

P(Z ≤ z_0.025) = 1.0 - 0.025 = 0.975

Looking into the body of the table, we obtain

z_0.025 = 1.96
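The table lookup can be reproduced with the standard library's normal distribution:

```python
from statistics import NormalDist

# Looking up z_alpha with the standard library instead of Table I.
# inv_cdf(1 - alpha) returns the z with right-hand tail area alpha.
Z = NormalDist()  # standard normal: mu = 0, sigma = 1

alpha = 0.025
z_alpha = Z.inv_cdf(1 - alpha)
print(round(z_alpha, 2))  # 1.96
```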
We can transform any normal random variable, Y, to a standard normal, Z, by

Z = (Y - μ) / σ

By subtracting μ, we recenter the random variable around 0.
By dividing by σ, we rescale the random variable so that the variance is 1.
By subtracting μ, which is the expected value of Y, the expected value of Z is 0.
By dividing by σ, we rescale the random variable so that the resulting Z value represents the number of standard deviations a value of the random variable lies from its mean.
Example: Suppose that you are an engineer assigned to the bottling department of the Busch Beer Company.
A particular 12 oz. bottling machine is known to dispense beer according to a normal distribution with a mean of 12 oz and a variance of .04 oz².
What is the probability that this machine dispenses more than 12.5 oz?
Let Y be the amount dispensed. We seek

P(Y > 12.5) = 1 - P(Y ≤ 12.5)
= 1 - P(Z ≤ (12.5 - 12.0)/.2)
= 1 - P(Z ≤ 2.5)
= 1 - .9938
= .0062
What is the probability that between 11.75 and 12.5 oz. are dispensed?
P(11.75 ≤ Y ≤ 12.5) = P((11.75 - 12.0)/.2 ≤ Z ≤ (12.5 - 12.0)/.2)
= P(-1.25 ≤ Z ≤ 2.5)
= P(Z ≤ 2.5) - P(Z ≤ -1.25)
= .9938 - .1056
= .8882
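Both bottling probabilities can be checked with the standard library (the hand calculation's .8882 reflects four-decimal table rounding; the exact value is closer to .8881):

```python
from statistics import NormalDist

# Bottling example: Y ~ Normal(mu = 12.0, sigma = sqrt(.04) = .2).
Y = NormalDist(mu=12.0, sigma=0.2)

p_over = 1 - Y.cdf(12.5)                  # P(Y > 12.5)
p_between = Y.cdf(12.5) - Y.cdf(11.75)    # P(11.75 <= Y <= 12.5)
print(round(p_over, 4))     # 0.0062
print(round(p_between, 3))  # 0.888
```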
D. Random Behavior of Means
1. The Sample Mean
Definition: Sample mean
Let y1, y2, …, yn be a sample of n observations.
The sample mean, ȳ, is given by

ȳ = (1/n) • Σ_{i=1}^{n} yᵢ

The sample mean is a measure of the typical value for a data set.
It represents the “center of gravity”.

Example: Battery Plate Porosities

Nickel-Hydrogen (Ni-H) batteries use a nickel plate as the anode.
A critical quality characteristic is the plate's porosity, which controls the interface of the anode with the potassium hydroxide electrolyte solution.
A recent random sample of ten porosities yielded:
79.1 79.5 79.3 79.3 78.8 79.0 79.2 79.7 79.0 79.2
The sample mean is

ȳ = (1/n) • Σ_{i=1}^{n} yᵢ = 792.1/10 = 79.21
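The porosity sample mean computed directly:

```python
# The ten porosities from the random sample.
porosities = [79.1, 79.5, 79.3, 79.3, 78.8, 79.0,
              79.2, 79.7, 79.0, 79.2]

# Sample mean: (1/n) * sum of the observations.
y_bar = sum(porosities) / len(porosities)
print(round(y_bar, 2))  # 79.21
```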
2. Random Samples
Define: Random Sample
Let y1, y2, …, yn be a sample of n observations taken from some population.
If these observations are independent of each other and if each observation follows the same distribution, then
y1, y2, …, yn
is said to be a random sample.
All the distribution theory of classical statistics is based upon this concept of a random sample.
3. Central Limit Theorem
Consider taking a series of random samples, all of size n, from some population, and calculating ȳ for each one.
Since the data are random, ȳ is also a random variable!
An important question: What is its distribution?
If the population from which we sample is normal, then ȳ also follows a normal distribution.
But, how often do you know that the population really is normal?
Very Rarely!
The Central Limit Theorem:
Better Known as the Statistician's Full Employment Act.
Consider a population with mean μ and variance σ².
As the sample size, n, approaches infinity, the distribution of

Z = (ȳ - μ) / (σ/√n)

approaches the standard normal distribution.

Bottom line: If n is sufficiently large, then ȳ approximately follows a normal distribution with
• mean μ
• “standard error” σ/√n
• Z represents the number of standard errors ȳ lies from μ.
What is the catch?
What constitutes sufficiently large?
If the parent population is normal, then n = 1 is large enough.
If the population is symmetric and the tails die out rapidly, then n = 3-5 is large enough.
A classic example is the uniform distribution.
Note:
• The distribution is symmetric.
• It does not have a unique peak.
• When its tails die, they die!
In this case, sample sizes of 6-12 are considered adequate for applying the Central Limit Theorem.
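A small simulation illustrates the claim for the uniform parent distribution. The sample size n = 12, the number of replications, and the seed are all assumed choices for the sketch, not values from the text.

```python
import random
from statistics import mean, stdev

# Central Limit Theorem sketch: sample means from a Uniform(0, 1)
# parent look roughly normal even for modest n (here n = 12).
random.seed(1)
n, reps = 12, 5000

means = [mean(random.random() for _ in range(n)) for _ in range(reps)]

# Uniform(0, 1) has mu = 0.5 and sigma = sqrt(1/12), so the standard
# error of the mean should be sigma / sqrt(n) = 1/12, about 0.083.
print(round(mean(means), 2))   # close to 0.5
print(round(stdev(means), 2))  # close to 0.08
```

A histogram or normal probability plot of `means` would show the bell shape the theorem predicts.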
As the parent distribution looks less and less normal, the sample size required to assume the Central Limit Theorem gets larger.
Important point: When determining if the sample size is big enough, we need to look at the distribution for the parent population.
In practice, what must we check to see whether the Central Limit Theorem applies?
• Stem-and-Leaf displays
• Normal Probability Plots
4. Normal Probability Plot
The normal probability plot is a simple graphical tool for assessing if the data come close to following a normal distribution.
Many software packages generate it automatically.
If the data follow a normal distribution, the normal probability plot should look like a straight line.
Significant deviations from the straight line suggest that the data are not “well-behaved”.
A reasonable question: How straight is straight?
Many analysts use the “fat pencil” rule.
For a suitably scaled plot, if we can cover the points with a fat pencil, the line is straight enough.