III Modeling Random Behavior
A. Probability
1. Overview
Statisticians use probability to model uncertainty.
Consider these statements:
• The probability that the next batch of TiO2 (white pigment) is unacceptable is .01.
• There is a 25% chance our firm will get the IBM order.
In each case, what we mean by a probability is

("size of a set of interest") / ("size of the set of all possible outcomes")
Some notation will aid our discussion.
An event is a set of possible outcomes of interest.
The sample space, S, is the set of all possible outcomes.
If A is an event, the probability that A occurs is

P(A) = (size of A) / (size of S)

Note:
• 0 ≤ P(A) ≤ 1
• P(S) = 1
• Probability may be either objective (based on prior experience) or purely subjective.
• Ultimately, the accuracy of specific probabilities depends on assumptions.
• If the assumptions upon which we base a specific probability are wrong, then we should not expect the specific probability to be any good.
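The ratio definition above can be sketched in a few lines of code. This is a hypothetical illustration (a fair six-sided die is not in the original text); the event and sample space are assumed for the example.

```python
from fractions import Fraction

# Hypothetical illustration: rolling one fair six-sided die.
# The sample space S is all six faces; the event A is "roll an even number".
S = {1, 2, 3, 4, 5, 6}
A = {y for y in S if y % 2 == 0}

# P(A) = (size of A) / (size of S)
P_A = Fraction(len(A), len(S))
print(P_A)  # 1/2
```

Note that 0 ≤ P(A) ≤ 1 holds automatically, since A is always a subset of S.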
Example: Making Nickel Battery Plate
A particular process for making nickel battery plate requires an operator to sift nickel powder into a frame.
The process uses a very tight weight specification, which is difficult to make.
The supervisor monitored the last 1000 attempts made by an operator.
The operator successfully made the specification 379 times.
One way to get the probability of a successful attempt is

P(successful attempt) = 379/1000 = .379
The supervisor noted, however, that the operator seemed to get better over time.
In this case, the supervisor may believe that the actual probability is something larger than .379.
A perfectly reasonable, but subjective, estimate of the probability of a successful attempt is 0.4.
2. Making Inferences Using Probabilities.
Suppose the supervisor really believes that the probability of a successful attempt is 0.4.
Suppose further that out of the next 50 attempts, she is never successful.
Would you now believe that the probability of a success is still 0.4?
OF COURSE NOT!
Consider another scenario.
Suppose the first attempt is unsuccessful.
Do you have good reason to believe that the true probability of a success is not 0.4?
Suppose the first two attempts are unsuccessful.
Suppose the first three are unsuccessful.
At what point do we begin to believe that the probability really is not 0.4?
The answer lies in calculating the probability of seeing y failures in a row, assuming the probability of a success is 0.4.
Once that probability is small enough, we can reasonably conclude that the true probability is not 0.4.
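The reasoning above can be sketched numerically. The cutoff of 0.05 is an assumed threshold for "small enough", not a value from the text.

```python
# If P(success) really is 0.4, the chance of y failures in a row is
# (1 - 0.4)**y.  Once that probability drops below a small threshold
# (0.05 here, an assumed cutoff), we doubt that the true probability
# of success is 0.4.
p_success = 0.4
threshold = 0.05  # assumed significance cutoff

y = 1
while (1 - p_success) ** y >= threshold:
    y += 1
print(y)  # 6
```

So under this assumed cutoff, six consecutive failures would already cast serious doubt on p = 0.4.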
3. Conditional Probability
Often, two events are related.
Knowing the relationship between the two events allows us to model the behavior of one event in terms of the other.
Conditional probability quantifies the chances of one event occurring given that the other occurs.
We denote the probability that an event A occurs given that the event B has occurred by P(A|B).
The key to conditional probability: the intersection of the two events defines the nature of their relationship.
This concept is best illustrated by an example: Personal Computers
A major manufacturer of personal computers has introduced a new p.c.
• As with most new products, there seem to be some problems.
• This manufacturer offers a one-year warranty on this model.
Let
• A be the event that the hard drive on a specific computer fails within one year.
• B be the event that the floppy drive on a specific computer fails within one year.
Consider a specific computer whose floppy drive has failed.
In this case, we know that the event B has occurred.
What now is the probability that this same computer will have its hard drive fail?
What we seek is P(A|B).
Note: once we know that B has occurred, the sample space of interest is restricted to B.
Similarly, once we know that B has occurred, the set of interest is restricted to that portion of A which resides in B, A ∩ B.
Thus,

P(A|B) = (size of the set of interest, A ∩ B) / (size of the set of all possible outcomes, B)
       = [size of (A ∩ B) / size of S] / [size of B / size of S]
       = P(A ∩ B) / P(B)
Definition: Conditional Probability
Let A and B be events in S.
The conditional probability of B given that A has occurred is

P(B|A) = P(A ∩ B) / P(A), if P(A) > 0.

Similarly, the conditional probability of A given that B has occurred is

P(A|B) = P(A ∩ B) / P(B), if P(B) > 0.
Example: Personal Computers - Continued
The reliability engineers have determined that:
P(A) = .02   P(B) = .05   P(A ∩ B) = .01

Note: P(A ∩ B) is the probability that both the hard and floppy drives on a specific computer fail within one year.
The conditional probability that the hard drive fails given that the floppy drive fails is

P(A|B) = P(A ∩ B) / P(B) = .01 / .05 = .20

As a result, if we know that the floppy drive failed on a given machine, then the probability the hard drive will fail also is 20%.
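The computation above is one division; a minimal sketch using the example's numbers:

```python
# The numbers from the personal-computer example:
# P(A) = hard drive fails, P(B) = floppy drive fails,
# P(A ∩ B) = both fail within one year.
P_A = 0.02
P_B = 0.05
P_A_and_B = 0.01

# Conditional probability: P(A|B) = P(A ∩ B) / P(B)
P_A_given_B = P_A_and_B / P_B
print(round(P_A_given_B, 2))  # 0.2
```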
4. Independence
In many engineering situations, two events have no real relationship.
Knowing that one event has occurred offers no new information about the chances the other will occur.
We call two such events independent.
Independence is important for a number of reasons:
• many engineering events either are independent or close enough for a first approximation
• independence provides a powerful basis for modeling the joint behavior of several events
• the formal concept of a random sample assumes that the observations are independent.
Definition: Independence
Let A, B be events in S. A and B are said to be independent if
P(A | B) = P(A)
Similarly, if A and B are independent, then
P(B | A) = P(B)
Example: Personal Computer - Continued
Recall:
P(A) = .02 P(B) = .05 P(A | B) = .20
Note: the hard drive failing and the floppy drive failing are not independent events because
P(A | B) ≠ P(A)
Why? Many personal computer designs use the floppy drive as an expensive air filter.
As the floppy drive gets dirty, it increases the likelihood of it failing.
Also, as the floppy get dirty, the p.c. does not vent heat as well, which increases the likelihood that the hard drive fails.
5. Basic Rules of Probability
1. 0 ≤ P(A) ≤ 1.
2. If Ø is the empty set, then P(Ø) = 0.
3. The Probability of Complements
If A is an event in some sample space S, then the complement of the set A relative to S is the set of outcomes in S which are not in A.
We denote the complement of A by Ā.
P(Ā) = 1 - P(A)
4. The Additive Law of Probability
If A and B are events in S, then the union of A and B, denoted by A U B, is the set of outcomes either in A or in B or in both.
The Additive Law of Probability is
P(A U B) = P(A) + P(B) - P(A ∩ B)
If A and B are mutually exclusive, then A ∩ B = Ø and P(A ∩ B) = 0; thus,
P(A U B) = P(A) + P(B)
5. The Multiplicative Law of Probability
If A and B are events in S, with P(A) > 0 and P(B) > 0, then

P(B|A) = P(A ∩ B) / P(A)

Thus,

P(A ∩ B) = P(A) • P(B | A)

Similarly,

P(A ∩ B) = P(B) • P(A | B)
If A and B are independent, then
P(A | B) = P(A) P(B | A) = P(B)
Thus, if A and B are independent, then
P(A ∩ B) = P(A) • P(B)
This property is a very powerful result, making independence quite important for finding the probabilities associated with the intersections of events.
6. Simplest Form of the Law of Total Probability
Let A and B be events in S. We may partition B into two parts:
• that which overlaps A, A ∩ B, and
• that which overlaps Ā, Ā ∩ B
Thus,
P(B) = P(A ∩ B) + P(Ā ∩ B) = P(A) • P(B | A) + P(Ā) • P(B | Ā)
7. The Simplest Form of Bayes Rule
Let A and B be events in S.
Suppose we are given P(A), P(B | A), and P(B | Ā). Then

P(A|B) = P(A ∩ B) / P(B) = [P(A) • P(B | A)] / [P(A) • P(B | A) + P(Ā) • P(B | Ā)]
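Bayes' Rule can be checked with the personal-computer numbers. The conditional probabilities below are derived, not given in the text: with P(A ∩ B) = .01 (both drives fail), P(B|A) = .01/.02 = .5 and P(B|Ā) = (.05 - .01)/.98.

```python
# Simplest form of Bayes' Rule, with inputs derived from the
# personal-computer example (the derivation is an assumption here):
# P(A) = .02, P(B|A) = .5, P(B|A-bar) = .04/.98.
P_A = 0.02
P_B_given_A = 0.5
P_B_given_not_A = 0.04 / 0.98

# Denominator is the law of total probability:
# P(B) = P(A)P(B|A) + P(A-bar)P(B|A-bar)
P_B = P_A * P_B_given_A + (1 - P_A) * P_B_given_not_A
P_A_given_B = (P_A * P_B_given_A) / P_B
print(round(P_A_given_B, 2))  # 0.2
```

This recovers P(A|B) = .20, agreeing with the direct conditional-probability calculation.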
Example: Toothpaste Containers
A toothpaste company uses four injection molding processes to make its toothpaste containers.
These are older pieces of equipment and subject to problems.
Event   Description                                    Prob
A       Machine 1 has a problem on any specific day    0.1
B       Machine 2 has a problem on any specific day    0.2
C       Machine 3 has a problem on any specific day    0.05
D       Machine 4 has a problem on any specific day    0.05
What is the probability that no problems occur on any specific day?
Note: no problems means
• Machine 1 has no problems, Ā, and
• Machine 2 has no problems, B̄, and
• Machine 3 has no problems, C̄, and
• Machine 4 has no problems, D̄.
Thus, we seek the probability of an intersection.
If we can assume independence, then the probability of the intersection is the product of the individual probabilities.

P(no problems) = P(Ā ∩ B̄ ∩ C̄ ∩ D̄)
= P(Ā) • P(B̄) • P(C̄) • P(D̄)
= (1 - 0.1)(1 - 0.2)(1 - 0.05)(1 - 0.05)
= (.9)(.8)(.95)(.95)
= 0.6498
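The product of complements is easy to verify in code, using the four problem probabilities from the table:

```python
# Toothpaste-container calculation: under independence, the probability
# that no machine has a problem is the product of the individual
# "no problem" probabilities (1 - p) for each machine.
p_problem = {"A": 0.1, "B": 0.2, "C": 0.05, "D": 0.05}

p_no_problems = 1.0
for p in p_problem.values():
    p_no_problems *= (1 - p)
print(round(p_no_problems, 4))  # 0.6498
```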
B. Discrete Random Variables
1. Overview
Let Y be the number of problems that occur on a given day.
What does Y = 0 mean?
No problems, which is Ā ∩ B̄ ∩ C̄ ∩ D̄.
What does Y = 1 mean?
Exactly one problem, which is
(A ∩ B̄ ∩ C̄ ∩ D̄) ∪ (Ā ∩ B ∩ C̄ ∩ D̄) ∪ (Ā ∩ B̄ ∩ C ∩ D̄) ∪ (Ā ∩ B̄ ∩ C̄ ∩ D)
Note: Each one of these events is mutually exclusive of the others; thus, P(Y = 1) is the sum of the probabilities of these four intersections.
The manufacturer of nickel battery plate has imposed a tight initial weight specification which is difficult to meet.
Consider the next three attempts made by an operator who has a 40 % chance of being successful.
Let S represent a successful attempt.
Let F represent a failed attempt.
Let Y represent the number of successful attempts she makes.
Consider the probability that exactly two out of these three attempts are successful, i.e., P(Y = 2).
The possible ways she can get exactly two successful attempts are
(SSF) (SFS) (FSS)
Since these events are mutually exclusive, then the probability of exactly two successful attempts is
P(Y = 2) = P(SSF) + P(SFS) + P(FSS)
In this situation, we can reasonably assume that each attempt is independent of the others.
Let p be the probability that she succeeds in meeting the weight specification on any given attempt.
Thus, p = 0.4.
Let q = 1 - p be the probability that she fails. In this specific case, q = .6.
Since each attempt is independent of the others, then
P(SSF) = P(S) • P(S) • P(F) = p • p • q = p² • q = 0.096
P(SFS) = P(S) • P(F) • P(S) = p • q • p = p² • q = 0.096
P(FSS) = P(F) • P(S) • P(S) = q • p • p = p² • q = 0.096
As a result,

P(Y = 2) = P(SSF) + P(SFS) + P(FSS)
= p² • q + p² • q + p² • q
= 3 • p² • q
= (number of ways to get 2 successes out of 3 attempts) • p² • q
= 3(0.096) = 0.288
In general, if she makes n total attempts, the probability that she succeeds exactly y times is
P(Y = y) = (number of ways to get y successes out of n) • p^y • q^(n-y)
We commonly use the binomial coefficient to denote the number of ways to get y successes from n total attempts.
We define the binomial coefficient, (n choose y), by

(n choose y) = n! / [y!(n - y)!]

By definition, 0! = 1.

We now can write the probability of obtaining exactly y successes out of n total attempts as

P(Y = y) = (number of ways to get y successes out of n tries) • p^y • q^(n-y)
= (n choose y) • p^y • q^(n-y)
= [n! / (y!(n - y)!)] • p^y • q^(n-y)
Consider an experiment which meets the following conditions:
1. the experiment consists of a fixed number of trials, n;
2. each trial can result in one of only two possible outcomes: a “success” or a “failure”;
3. the probability, p, of a “success” is constant for each trial;
4. the trials are independent; and
5. the random variable of interest, Y, is the number of successes over the n trials.
If these conditions hold, then Y is said to follow a binomial distribution with parameters, n and p.
The probability function for a binomial random variable is

p(y) = P(Y = y) = (n choose y) • p^y • q^(n-y) = [n! / (y!(n - y)!)] • p^y • q^(n-y),  y = 0, 1, …, n

The mean, variance, and standard deviation are

E(Y) = μ = np
σ² = npq
σ = √(npq)
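The binomial probability function and its moments can be sketched directly, here with the battery-plate numbers (n = 3, p = 0.4):

```python
from math import comb

# A minimal sketch of the binomial probability function.
def binomial_pmf(y, n, p):
    """P(Y = y) = (n choose y) * p**y * q**(n - y), with q = 1 - p."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

# Battery-plate example: n = 3 attempts, p = 0.4.
n, p = 3, 0.4
print(round(binomial_pmf(2, n, p), 3))   # 0.288
print(round(n * p, 2))                   # mean np = 1.2
print(round(n * p * (1 - p), 2))         # variance npq = 0.72
```

The 0.288 matches the hand calculation 3 • p² • q above.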
Example
NASA downloads massive data files from a specific satellite three times a day.
Historically, the probability that the data file is corrupted during transmission is .10.
Consider a day's set of transmissions.
What is the probability that exactly two data files are corrupted?
Let Y = number of files corrupted.
P(Y = 2) = (3 choose 2)(.1)²(.9)¹ = [3! / (2!1!)](.1)²(.9) = 3(.1)²(.9) = 0.027
Find the mean number of files corrupted.
μ = np = 3(.1) = .3
Find the variance and standard deviation for the number of files corrupted.

σ² = npq = 3(.1)(.9) = .27
σ = √.27 = 0.52

Using the empirical rule, we expect virtually all of the data to fall within the interval

μ ± 3σ = 0.3 ± 3(0.52) = (-1.26, 1.86)

As a result, we should rarely see 2 or more corrupted files.
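The NASA numbers check out in code:

```python
from math import comb, sqrt

# NASA example: n = 3 transmissions per day, p = 0.1 chance a file
# is corrupted during transmission.
n, p = 3, 0.1
p_two = comb(n, 2) * p**2 * (1 - p) ** 1  # P(exactly 2 corrupted)
print(round(p_two, 3))  # 0.027

mu = n * p
sigma = sqrt(n * p * (1 - p))
print(round(mu, 2), round(sigma, 2))  # 0.3 0.52
```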
4. Poisson Distribution
Many engineering problems require us to model the random behavior of small counts.
For example, a manufacturer of nickel- hydrogen batteries ran into a problem with cells shorting out prematurely.
Each cell used 60 nickel plates.
The manufacturer and its customer cut open several cells and discovered that the problem cells all had plates with “blisters” while the good cells did not.
Two possible approaches:
• Classify each plate as either conforming (blister free) or non-conforming (one or more blisters).
-- Model with a binomial distribution.
-- Reduces the data into either acceptable or not acceptable.
-- Often ignores the subtleties in the data.
• Count the number of blisters on each cell.
-- Conforming plates have counts of 0.
-- Non-conforming plates have counts of 1 or more.
-- A plate with many blisters truly is defective and does short out a cell.
-- A plate with only one blister may function perfectly well.
Counting the number of blisters provides more information about the specific problem.
The Poisson distribution often proves useful for modeling small counts.
Let λ be the rate of these counts.
If Y follows a Poisson distribution, then

p(y) = P(Y = y) = (λ^y • e^(-λ)) / y!,  y = 0, 1, 2, …
p(y) = 0 otherwise

with

E(Y) = μ = λ
σ² = λ
y
Example: Consider a maintenance manager of an industrial facility.
Historically, a certain department averages six repairs per week.
What is the probability that during a randomly selected week, this department will require only two repairs?
Let Y = number of repairs.
P(Y = 2) = (λ² e^(-λ)) / 2! = (6² e^(-6)) / 2! = (36/2) e^(-6) = 18e^(-6) = 0.0446
What is the probability of at least one repair?

P(Y ≥ 1) = 1 - P(Y = 0) = 1 - (6⁰ e^(-6)) / 0! = 1 - e^(-6) = 1 - 0.0025 = 0.9975

What is the expected number of repairs?

E(Y) = μ = λ = 6
What are the variance and standard deviation for the number of repairs?

σ² = λ = 6
σ = √6 = 2.45

Using the empirical rule, we expect virtually all of the data to fall within the interval

μ ± 3σ = 6 ± 3(2.45) = (-1.35, 13.35)

As a result, we should rarely see 14 or more repairs in any given week.
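The Poisson calculations for the maintenance example can be sketched as:

```python
from math import exp, factorial, sqrt

# Poisson probability function, applied to the maintenance example
# (lambda = 6 repairs per week).
def poisson_pmf(y, lam):
    """P(Y = y) = lam**y * exp(-lam) / y!"""
    return lam**y * exp(-lam) / factorial(y)

lam = 6
print(round(poisson_pmf(2, lam), 4))      # 0.0446  -> P(Y = 2)
print(round(1 - poisson_pmf(0, lam), 4))  # 0.9975  -> P(Y >= 1)
print(lam, round(sqrt(lam), 2))           # mean 6, std dev 2.45
```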
C. Continuous Random Variables
1. Overview

The continuous random variables studied in this course have probability density functions, f(y).

Some Properties of f(y):

1. f(y) ≥ 0
2. ∫_{-∞}^{∞} f(y) dy = 1
3. F(y₀) = P(Y ≤ y₀) = ∫_{-∞}^{y₀} f(y) dy
4. P(y₁ ≤ Y ≤ y₂) = ∫_{y₁}^{y₂} f(y) dy = F(y₂) - F(y₁)
5. P(Y = y₀) = ∫_{y₀}^{y₀} f(y) dy = 0
A very important example of a continuous random variable is one which follows an exponential distribution.
The exponential distribution often provides an excellent model for describing the behavior of equipment life times.
Example:
The times between repairs for an ethanol-water distillation column are well modeled by an exponential distribution, which has the form

f(y) = λe^(-λy),  y ≥ 0, λ > 0
f(y) = 0 otherwise

where λ is the rate of repairs.
In this case, λ = .001 repairs/hr.
Thus, this column, on the average, requires 1 repair every 1000 hours of operation.
What is the probability that the next time to repair will be less than 100 hours from the previous repair?
For our example,

P(Y ≤ y₀) = ∫₀^{y₀} f(y) dy = ∫₀^{y₀} λe^(-λy) dy = -e^(-λy) |₀^{y₀} = 1 - e^(-λy₀)

In our case, λ = .001 and y₀ = 100; thus,

P(Y ≤ 100) = 1 - e^(-(.001)(100)) = 1 - e^(-.1) = 1 - 0.905 = 0.095
What is the probability that the time between repairs will be between 500 and 1500 hours?
In this case, y₁ = 500 and y₂ = 1500; thus,

P(y₁ ≤ Y ≤ y₂) = ∫_{y₁}^{y₂} λe^(-λy) dy = -e^(-λy) |_{y₁}^{y₂} = e^(-λy₁) - e^(-λy₂)

P(500 ≤ Y ≤ 1500) = e^(-0.5) - e^(-1.5) = 0.383
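Both exponential probabilities follow from the closed-form CDF derived above:

```python
from math import exp

# Exponential-distribution probabilities for the distillation column,
# using the closed form P(Y <= y0) = 1 - exp(-lam * y0).
lam = 0.001  # repairs per hour

def exp_cdf(y, lam):
    return 1 - exp(-lam * y)

print(round(exp_cdf(100, lam), 3))                       # 0.095
print(round(exp_cdf(1500, lam) - exp_cdf(500, lam), 3))  # 0.383
```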
2. Expected Values – Revisited
For a continuous random variable, Y, the expected value is

E(Y) = μ = ∫_{-∞}^{∞} y f(y) dy

The variance of Y is once again

σ² = E(Y²) - μ²

where

E(Y²) = ∫_{-∞}^{∞} y² f(y) dy

Once again, the standard deviation is σ = √σ².
Example: The time between repairs
We said these times were well modeled by an exponential distribution with λ = .001 repairs/hr.

f(y) = λe^(-λy),  y ≥ 0, λ > 0
f(y) = 0 otherwise

E(Y) = ∫_{-∞}^{∞} y f(y) dy = ∫₀^{∞} y λe^(-λy) dy = 1/λ = 1/.001 = 1000 hours

E(Y²) = ∫_{-∞}^{∞} y² f(y) dy = ∫₀^{∞} y² λe^(-λy) dy = 2/λ²

σ² = E(Y²) - μ² = 2/λ² - (1/λ)² = 1/λ²

σ = 1/λ

In our case,

σ² = 1/λ² = 1/(.001)² = 1,000,000
σ = 1/λ = 1/.001 = 1000 hours
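The closed forms E(Y) = 1/λ and σ² = 1/λ² can be checked numerically. The step size and integration range below are assumed approximation choices, not values from the text.

```python
from math import exp

# Numerical check (a simple Riemann sum, an assumed approximation)
# that E(Y) = 1/lam and var(Y) = 1/lam**2 for the exponential pdf.
lam = 0.001
dy = 1.0                              # step size in hours (assumed)
ys = [i * dy for i in range(20000)]   # integrate far into the tail

mean = sum(y * lam * exp(-lam * y) * dy for y in ys)
second = sum(y**2 * lam * exp(-lam * y) * dy for y in ys)
var = second - mean**2

print(round(mean))     # close to 1000 hours
print(round(var, -3))  # close to 1,000,000 hours squared
```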
3. Relationship of Distributions and Data Displays
Distributions can provide a powerful basis for modeling the random behavior of important characteristics of interest.
Formal statistical analyses require certain assumptions about the underlying distribution of the data.
Typically, these assumptions center on the “shape” of the data.
Appropriate data displays provide a quick and easy way to check these assumptions, especially the stem-and-leaf display and the histogram.
The theoretical shape of a stem-and-leaf display for a given set of data is
• the probability function, p(y) for a discrete random variable, and
• the pdf, f(y), for a continuous random variable.
Example: Times Between Industrial Accidents
Lucas (1985) analyzed the times between accidents at an industrial facility.
We can model these times by an exponential distribution with λ = 0.05.
The following plot graphs the pdf for this specific distribution.
Consider overlaying an appropriately scaled plot of the pdf on a histogram of the data.
This plot indicates that the exponential distribution does provide a reasonable basis for modeling these times.
4. The Normal Distribution
The normal distribution is the single most important distribution in classical statistics.
Many naturally occurring phenomena are well modeled by this distribution.
Let Y be a normally distributed random variable; its pdf is given by

f(y) = [1 / (σ√(2π))] • e^(-(1/2)((y - μ)/σ)²)

Note: the pdf depends on the parameters
• μ – the population mean
• σ² – the population variance

Thus, E(Y) = μ and var(Y) = σ².
The plot of the pdf looks like
The plot is single peaked, centered at μ, symmetric, and the tails die out rapidly.
• 68.3% of the area of the curve falls within the interval μ ± σ
• 95.4% falls within μ ± 2σ
• 99.7% falls within μ ± 3σ
We can find any probabilities we need through the standard normal random variable.
The standard normal distribution has
• μ = 0
• σ² = 1
We denote a standard normal random variable by Z.
The values listed in Table I of the Appendix are P(Z ≤ z₀).
We often need to use the Z-value associated with specific “tail” areas of the standard normal distribution.
Let z_α represent the Z-value associated with a right-hand “tail area” of α.
z_α is that value for Z such that

P(Z ≥ z_α) = α

As a result, z_α is that value from the table which satisfies

P(Z ≤ z_α) = 1.0 - α

or

(value from table) = 1.0 - α

For example, z_0.025 is that Z such that

P(Z ≤ z_0.025) = 1.0 - 0.025 = 0.975

Looking into the body of the table, we obtain

z_0.025 = 1.96
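The table lookup can be reproduced with the standard library's normal distribution:

```python
from statistics import NormalDist

# Looking up z_alpha with the standard library instead of Table I.
# inv_cdf(1 - alpha) returns the z with right-hand tail area alpha.
Z = NormalDist()  # standard normal: mu = 0, sigma = 1

alpha = 0.025
z_alpha = Z.inv_cdf(1 - alpha)
print(round(z_alpha, 2))  # 1.96
```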
We can transform any normal random variable, Y, to a standard normal, Z, by

Z = (Y - μ) / σ

By subtracting μ, we recenter the random variable around 0.
By dividing by σ, we rescale the random variable so that the variance is 1.
By subtracting μ, which is the expected value of Y, the expected value of Z is 0.
By dividing by σ, we rescale the random variable so that the resulting Z value represents the number of standard deviations a value of the random variable lies from its mean.
Example: Suppose that you are an engineer assigned to the bottling department of the Busch Beer Company.
A particular 12 oz. bottling machine is known to dispense beer according to a normal distribution with a mean of 12 oz and a variance of .04 oz².
What is the probability that this machine dispenses more than 12.5 oz?
Let Y be the amount dispensed. We seek

P(Y > 12.5) = 1 - P(Y ≤ 12.5)
= 1 - P(Z ≤ (12.5 - 12.0)/.2)
= 1 - P(Z ≤ 2.5)
= 1 - .9938
= .0062
What is the probability that between 11.75 and 12.5 oz. are dispensed?
P(11.75 ≤ Y ≤ 12.5) = P((11.75 - 12.0)/.2 ≤ Z ≤ (12.5 - 12.0)/.2)
= P(-1.25 ≤ Z ≤ 2.5)
= P(Z ≤ 2.5) - P(Z ≤ -1.25)
= .9938 - .1056
= .8882
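Both bottling probabilities can be checked with the standard library (the hand calculation's .8882 reflects four-decimal table rounding; the exact value is closer to .8881):

```python
from statistics import NormalDist

# Bottling example: Y ~ Normal(mu = 12.0, sigma = sqrt(.04) = .2).
Y = NormalDist(mu=12.0, sigma=0.2)

p_over = 1 - Y.cdf(12.5)                  # P(Y > 12.5)
p_between = Y.cdf(12.5) - Y.cdf(11.75)    # P(11.75 <= Y <= 12.5)
print(round(p_over, 4))     # 0.0062
print(round(p_between, 3))  # 0.888
```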
D. Random Behavior of Means
1. The Sample Mean
Definition: Sample mean
Let y1, y2, …, yn be a sample of n observations.
The sample mean, ȳ, is given by

ȳ = (1/n) • Σ_{i=1}^{n} yᵢ

The sample mean is a measure of the typical value for a data set.
It represents the “center of gravity”.

Example: Battery Plate Porosities

Nickel-Hydrogen (Ni-H) batteries use a nickel plate as the anode.
A critical quality characteristic is the plate's porosity, which controls the interface of the anode with the potassium hydroxide electrolyte solution.
A recent random sample of ten porosities yielded:
79.1 79.5 79.3 79.3 78.8 79.0 79.2 79.7 79.0 79.2
The sample mean is

ȳ = (1/n) • Σ_{i=1}^{n} yᵢ = 792.1/10 = 79.21
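The porosity sample mean computed directly:

```python
# The ten porosities from the random sample.
porosities = [79.1, 79.5, 79.3, 79.3, 78.8, 79.0,
              79.2, 79.7, 79.0, 79.2]

# Sample mean: (1/n) * sum of the observations.
y_bar = sum(porosities) / len(porosities)
print(round(y_bar, 2))  # 79.21
```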
2. Random Samples
Define: Random Sample
Let y1, y2, …, yn be a sample of n observations taken from some population.
If these observations are independent of each other and if each observation follows the same distribution, then
y1, y2, …, yn
is said to be a random sample.
All the distribution theory of classical statistics is based upon this concept of a random sample.
3. Central Limit Theorem
Consider taking a series of random samples, all of size n, from some population, and calculating ȳ for each one.
Since the data are random, ȳ is also a random variable!
An important question: What is its distribution?
If the population from which we sample is normal, then ȳ also follows a normal distribution.
But, how often do you know that the population really is normal?
Very Rarely!
The Central Limit Theorem:
Better Known as the Statistician's Full Employment Act.
Consider a population with mean μ and variance σ².
As the sample size, n, approaches infinity, the distribution of

Z = (ȳ - μ) / (σ/√n)

approaches the standard normal distribution.

Bottom line: If n is sufficiently large, then ȳ approximately follows a normal distribution with
• mean μ
• “standard error” σ/√n
• Z represents the number of standard errors ȳ lies from μ.
What is the catch?
What constitutes sufficiently large?
If the parent population is normal, then n = 1 is large enough.
If the population is symmetric and the tails die out rapidly, then n = 3-5 is large enough.
A classic example is the uniform distribution.
Note:
• The distribution is symmetric.
• It does not have a unique peak.
• When its tails die, they die!
In this case, sample sizes of 6-12 are considered adequate for applying the Central Limit Theorem.
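A small simulation illustrates the claim for the uniform parent distribution. The sample size n = 12, the number of replications, and the seed are all assumed choices for the sketch, not values from the text.

```python
import random
from statistics import mean, stdev

# Central Limit Theorem sketch: sample means from a Uniform(0, 1)
# parent look roughly normal even for modest n (here n = 12).
random.seed(1)
n, reps = 12, 5000

means = [mean(random.random() for _ in range(n)) for _ in range(reps)]

# Uniform(0, 1) has mu = 0.5 and sigma = sqrt(1/12), so the standard
# error of the mean should be sigma / sqrt(n) = 1/12, about 0.083.
print(round(mean(means), 2))   # close to 0.5
print(round(stdev(means), 2))  # close to 0.08
```

A histogram or normal probability plot of `means` would show the bell shape the theorem predicts.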
As the parent distribution looks less and less normal, the sample size required to assume the Central Limit Theorem gets larger.
Important point: When determining if the sample size is big enough, we need to look at the distribution for the parent population.
In practice, what must we check to see whether the Central Limit Theorem applies?
• Stem-and-Leaf displays
• Normal Probability Plots
4. Normal Probability Plot
The normal probability plot is a simple graphical tool for assessing if the data come close to following a normal distribution.
Many software packages generate it automatically.
If the data follow a normal distribution, the normal probability plot should look like a straight line.
Significant deviations from the straight line suggest that the data are not “well-behaved”.
A reasonable question: How straight is straight?
Many analysts use the “fat pencil” rule.
For a suitably scaled plot, if we can cover the points with a fat pencil, the line is straight enough.