Confidence Interval Estimation For statistical inference in decision making: Chapter 6
Page 1: Chapter 06.ppt

Confidence Interval Estimation

For statistical inference in decision making:

Chapter 6

Page 2: Chapter 06.ppt

Objectives

• Central Limit Theorem

• Confidence Interval Estimation of the Mean (σ known)

• Interpretation of the Confidence Interval

• Confidence Interval Estimation of the Mean (σ unknown)

• Confidence Interval Estimation for the Proportion

• Determining Sample Size

Page 3: Chapter 06.ppt

Central Limit Theorem

Regardless of the shape of the underlying population distribution, the distribution of sample means (and of sample proportions) approaches a normal distribution as the sample size becomes sufficiently large.
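A small simulation can make the theorem concrete. The sketch below is only an illustration: the exponential population (mean 1, standard deviation 1), the sample size of 30, the 2,000 repetitions, and the helper name sample_means are all assumptions chosen for the demonstration, not part of the original slides.

```python
import random
import statistics

def sample_means(draw, n, num_samples, seed=0):
    """Return the means of num_samples random samples, each of size n."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n))
            for _ in range(num_samples)]

# Assumed population: exponential with mean 1 and standard deviation 1,
# which is strongly right-skewed (far from normal).
means = sample_means(lambda rng: rng.expovariate(1.0), n=30, num_samples=2000)

# The sample means should center on the population mean (1.0) and have a
# spread close to sigma / sqrt(n).
print("mean of sample means:   ", round(statistics.fmean(means), 3))
print("std dev of sample means:", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):        ", round(1.0 / 30 ** 0.5, 3))
```

The code checks only the center and spread of the sample means. A histogram of the values in means would show the roughly bell-shaped pattern the theorem describes.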

Page 4: Chapter 06.ppt

Central Limit Theorem in action:

Page 5: Chapter 06.ppt

How large must a sample be for the Central Limit theorem to apply?

The sample size varies according to the shape of the population.

However, for our use, a sample size of 30 or larger will suffice.

Page 6: Chapter 06.ppt

Must sample sizes be 30 or larger for populations that are normally distributed?

No. If the population is normally distributed, the sample means are normally distributed for sample sizes as small as n=1.

Page 7: Chapter 06.ppt

Why not just always pick a sample size of 30?

Page 8: Chapter 06.ppt

How can I tell the shape of the underlying population?

CHECK FOR NORMALITY:

• Use descriptive statistics. Construct stem-and-leaf plots for small or moderate-sized data sets, and frequency distributions and histograms for large data sets.

• Compute measures of central tendency (mean and median) and compare them with the theoretical and practical properties of the normal distribution. Compute the interquartile range. Is it approximately 1.33 times the standard deviation?

• How are the observations in the data set distributed? Do approximately two-thirds of the observations lie within plus or minus 1 standard deviation of the mean? Do approximately four-fifths lie within plus or minus 1.28 standard deviations? Do approximately 19 out of every 20 lie within plus or minus 2 standard deviations?
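The numeric checks above can be sketched in a few lines. This is a minimal illustration, not a formal normality test: the function name normality_summary and the simulated data set (5,000 draws from a normal distribution with mean 50 and standard deviation 10) are assumptions made for the example.

```python
import random
import statistics

def normality_summary(data):
    """Compare a data set with the normal-distribution benchmarks listed above."""
    mean = statistics.fmean(data)
    s = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points

    def share_within(k):
        # Fraction of observations within k standard deviations of the mean.
        return sum(abs(x - mean) <= k * s for x in data) / len(data)

    return {
        "mean": mean,
        "median": statistics.median(data),
        "iqr_over_s": (q3 - q1) / s,            # benchmark: about 1.33
        "within_1_sd": share_within(1.0),       # benchmark: about two-thirds
        "within_1.28_sd": share_within(1.28),   # benchmark: about four-fifths
        "within_2_sd": share_within(2.0),       # benchmark: about 19 out of 20
    }

# Assumed demonstration data: simulated draws from a normal population.
rng = random.Random(1)
data = [rng.gauss(50, 10) for _ in range(5000)]
summary = normality_summary(data)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

For real data, values far from the benchmarks suggest the population is not normal, in which case a larger sample is needed before relying on the Central Limit Theorem.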

Page 9: Chapter 06.ppt

Why do I care if X-bar, the sample mean, is normally distributed?

Page 10: Chapter 06.ppt

Because I want to use Z scores to analyze sample means.

But to use Z scores, the data must be normally distributed.

That’s where the Central Limit Theorem steps in.

Recall that the Central Limit Theorem states that sample means are normally distributed regardless of the shape of the underlying population if the sample size is sufficiently large.

Page 11: Chapter 06.ppt

Recall from Chapter 5:

• Z = (X - µ) ÷ σ

• If sample means are normally distributed, the Z score formula applied to sample means would be:

• Z = (X-bar - µX-bar) ÷ σX-bar

Page 12: Chapter 06.ppt

Background

• To determine µX-bar, we would need to randomly draw out all possible samples of the given size from the population, compute the sample means, and average them. This task is unrealistic. Fortunately, µX-bar equals the population mean µ, which is easier to access.

• Likewise, to compute σX-bar directly, we would have to take all possible samples of a given size from the population, compute the sample means, and determine the standard deviation of those sample means. This task is also unrealistic. Fortunately, σX-bar equals the population standard deviation divided by the square root of the sample size: σX-bar = σ ÷ √n.

Page 13: Chapter 06.ppt

Note:

As the sample size increases,

the standard deviation of the sample means becomes smaller and smaller

because the population standard deviation is being divided by larger and larger values of the square root of n.
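A quick calculation shows how fast the spread shrinks. In this sketch the population standard deviation of 9 and the list of sample sizes are illustrative assumptions, and standard_error is a hypothetical helper name.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample means: sigma divided by sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 9.0  # assumed population standard deviation, for illustration only
for n in (1, 4, 25, 100, 400):
    print(f"n = {n:3d}  standard deviation of sample means = {standard_error(sigma, n):.2f}")
```

Because n sits under a square root, the sample size must be quadrupled to cut the standard deviation of the sample means in half.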

Page 14: Chapter 06.ppt

The ultimate benefit of the central limit theorem is a useful version of the Z formula for sample means.

Page 15: Chapter 06.ppt

Z Formula for Sample Means:

Z = (X-bar - µ) ÷ (σ / √n)

Page 16: Chapter 06.ppt

Example:

The mean expenditure per customer at a tire store is $85.00, with a standard deviation of $9.00.

If a random sample of 40 customers is taken, what is the probability that the sample average expenditure per customer for this sample will be $87.00 or more?

Page 17: Chapter 06.ppt

Because the sample size is greater than 30, the central limit theorem says the sample means are approximately normally distributed.

Z = (X-bar - µ) ÷ (σ / √n)

Z = ($87.00 - $85.00) ÷ ($9.00 / √40)

Z = $2.00 ÷ $1.42 = 1.41

Page 18: Chapter 06.ppt

For Z = 1.41 in the Z distribution table, the probability is .4207.

This represents the probability of getting a mean between $87.00 and the population mean $85.00.

Solving for the tail of the distribution yields

.5000 - .4207 = .0793

• This is the probability of X-bar ≥ $87.00.
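The same calculation can be checked without a Z table. The sketch below uses Python's standard-library NormalDist; the function name prob_sample_mean_at_least is a hypothetical helper, and the inputs are the tire store figures from the example.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_at_least(x_bar, mu, sigma, n):
    """Return (Z, upper-tail probability) for a sample mean of x_bar or more."""
    z = (x_bar - mu) / (sigma / sqrt(n))
    return z, 1.0 - NormalDist().cdf(z)

# Tire store example: mu = $85.00, sigma = $9.00, n = 40, X-bar = $87.00.
z, p = prob_sample_mean_at_least(87.0, 85.0, 9.0, 40)
print(f"Z = {z:.2f}, P(X-bar >= 87.00) = {p:.4f}")
```

The computed probability is very close to, but not exactly, .0793, because the table lookup rounds Z to 1.41 before finding the area.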

Page 19: Chapter 06.ppt

Interpretations

Therefore, 7.93% of the time, a random sample of 40 customers from this population will yield a mean expenditure of $87.00 or more.

OR

From any random sample of 40 customers, 7.93% of them will spend on average $87.00 or more.

Page 20: Chapter 06.ppt

Interpretations

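Correct: 7.93% of the time, a random sample of 40 customers from this population will yield a mean expenditure of $87.00 or more.

Incorrect: From any random sample of 40 customers, 7.93% of them will spend on average $87.00 or more. The probability describes how the sample mean behaves across repeated samples, not the share of individual customers within one sample.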

Page 21: Chapter 06.ppt

Solve:

Suppose that during any hour in a large department store, the average number of shoppers is 448, with a standard deviation of 21 shoppers.

What is the probability that a random sample of 49 different shopping hours will yield a sample mean between 441 and 446 shoppers?

Page 22: Chapter 06.ppt

Statistical Inference

Page 23: Chapter 06.ppt

Statistical Inference facilitates decision making.

Page 24: Chapter 06.ppt

Via sample data,

we can estimate something about our population,

such as its average value µ,

by using the corresponding sample mean, X-bar.

Page 25: Chapter 06.ppt

Recall that µ,

the population mean to be estimated,

is a parameter,

while X-bar, the sample mean, is a statistic.

Page 26: Chapter 06.ppt

Point Estimate

A point estimate is a statistic taken from a sample and is used to estimate a population parameter.

However, a point estimate is only as good as the sample it represents. If other random samples are taken from the population, the point estimates derived from those samples are likely to vary.

Because of variation in sample statistics, estimating a population parameter with a confidence interval is often preferable to using a point estimate.

Page 27: Chapter 06.ppt

Confidence Interval

A confidence interval is a range of values within which, with some stated level of confidence, the population parameter is estimated to lie.

Confidence intervals can be one-tailed or two-tailed.

Page 28: Chapter 06.ppt

Confidence Interval to Estimate µ

• By rearranging the Z formula for sample means, a confidence interval formula is constructed:

• X-bar +/- Zα/2 (σ / √n)

• Where:

• α = the area under the normal curve outside the confidence interval

• α/2 = the area in one tail of the distribution outside the confidence interval

Page 29: Chapter 06.ppt

The confidence interval formula yields a range (interval) within which we feel with some confidence the population mean is located.

It is not certain that the population mean is in the interval unless we have a 100% confidence interval that is infinitely wide, so wide that it is meaningless.

Page 30: Chapter 06.ppt

Confidence interval estimates for five different samples of n=25, taken from a population where

µ=368 and σ=15

Page 31: Chapter 06.ppt

Common levels of confidence intervals used by analysts are

90%, 95%, 98%, and 99%.

Page 32: Chapter 06.ppt

95% Confidence Interval

• For 95% confidence, α = .05 and α / 2 = .025. The value of Z.025 is found by looking in the standard normal table under .5000 - .025 = .4750. This area in the table is associated with a Z value of 1.96.

• An alternate method: multiply the confidence level, 95%, by ½ (since the distribution is symmetric and the interval extends equally on each side of the mean).

• (½) (95%) = .4750 (the area on each side of the mean) has a corresponding Z value of 1.96.
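The table lookup can also be reproduced in code. This is a minimal sketch using the standard-library NormalDist; the function name z_critical is a hypothetical helper, and the confidence levels are the common ones listed earlier.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed critical Z value: the point with area alpha/2 in the upper tail."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: Z = {z_critical(level):.3f}")
```

For 95% confidence this gives the familiar 1.96 when rounded to two decimal places.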

Page 33: Chapter 06.ppt

In other words, of all the possible X-bar values along the horizontal axis of the normal distribution curve, 95% of them should be within a Z score of 1.96 from

the mean.

Page 34: Chapter 06.ppt

Margin of Error

Z [σ / √ n]

Page 35: Chapter 06.ppt

Example:

• A business analyst for a cellular telephone company takes a random sample of 85 bills for a recent month and from these bills computes a sample mean of 153 minutes. If the company uses the sample mean of 153 minutes as an estimate for the population mean, then the sample mean is being used as a POINT ESTIMATE. Past history and similar studies indicate that the population standard deviation is 46 minutes.

• The value of Z is decided by the level of confidence desired. A confidence level of 95% has been selected.

Page 36: Chapter 06.ppt

153 +/- 1.96 (46 / √85) = 153 +/- 9.78, so 143.22 ≤ µ ≤ 162.78

• The confidence interval is constructed from the point estimate, 153 minutes, and the margin of error of this estimate, + / - 9.78 minutes.

• The resulting confidence interval is 143.22 ≤ µ ≤ 162.78.

• The cellular telephone company business analyst is 95% confident that the average length of a call for the population is between 143.22 and 162.78 minutes.
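The interval can be reproduced with a short function. This is a sketch under the slide's assumptions (σ known, sample mean approximately normal); the name mean_ci_known_sigma is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_sigma(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper, margin of error) for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin, margin

# Cellular telephone example: X-bar = 153 minutes, sigma = 46 minutes, n = 85.
lower, upper, margin = mean_ci_known_sigma(153.0, 46.0, 85)
print(f"{lower:.2f} <= mu <= {upper:.2f} (margin of error {margin:.2f})")
```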

Page 37: Chapter 06.ppt

Interpreting a Confidence Interval

• For the previous 95% confidence interval, the following conclusions are valid:

• I am 95% confident that the average length of a call for the population µ, lies between 143.22 and 162.78 minutes.

• If I repeatedly obtained samples of size 85, then 95% of the resulting confidence intervals would contain µ and 5% would not. QUESTION: Does this confidence interval [143.22 to 162.78] contain µ? ANSWER: I don’t know. All I can say is that this procedure leads to an interval containing µ 95% of the time.

• I am 95% confident that my estimate of µ [namely 153 minutes] is within 9.78 minutes of the actual value of µ. RECALL: 9.78 is the margin of error.
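The repeated-sampling interpretation can be illustrated by simulation. The sketch below assumes a normal population with µ = 368 and σ = 15 and samples of n = 25, the values used in the earlier five-sample illustration; the 4,000 repetitions, the seed, and the function name coverage_rate are arbitrary choices for the demonstration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage_rate(mu, sigma, n, confidence=0.95, trials=4000, seed=0):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample, then build the interval around its mean.
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

rate = coverage_rate(mu=368, sigma=15, n=25)
print(f"Share of intervals containing mu: {rate:.3f}")
```

The share should land near .95. Any single interval either contains µ or it does not; the 95% describes the procedure over many samples.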

Page 38: Chapter 06.ppt

Be Careful! The following statement is NOT true:

“The probability that µ lies between 143.22 and 162.78 is .95.”

Once you have inserted your sample results into the confidence interval formula, the word PROBABILITY can no longer be used to describe the resulting confidence interval.

Page 39: Chapter 06.ppt

Confidence Interval Estimation of the Mean (σ Unknown)

In reality, the actual standard deviation of the population, σ, is usually unknown.

Therefore, we use “s” (sample standard deviation) to compute

the confidence interval for the population mean, µ.

However, by using “s” in place of σ, the standard normal Z distribution no longer applies.

Fortunately, the t-distribution will work, provided the population from which we draw the sample is normally distributed.

Page 40: Chapter 06.ppt

Assumptions necessary to use t-distribution

• Assumes random variable x is normally distributed

• However, if sample size is large enough ( > 30), t-distribution can be used when σ is unknown.

• But if sample size is small, evaluate the shape of the sample data using a histogram or stem-and-leaf.

• As the sample size increases, the t-distribution approaches the Z distribution.

Page 41: Chapter 06.ppt

Confidence Interval using a t-distribution

X-bar +/- t α/2, n-1 (s / √n)

α = 1 - confidence level (α/2 is the area in each tail)

n - 1 = degrees of freedom

Page 42: Chapter 06.ppt

Example:

• As a consultant I have been employed to estimate the average amount of comp time accumulated per week for managers in the aerospace industry.

• I randomly sample 18 managers and measure the amount of extra time they work during a specific week and obtain the following results (in hours). Assume a 90% confidence interval.

• AEROSPACE DATA

6 21 17 20 7 0 8 16 29

3 8 12 11 9 21 25 15 16

Page 43: Chapter 06.ppt

Solution:

To construct a 90% confidence interval to estimate the average amount of extra time per week worked by a manager in the aerospace industry, I assume that comp time is normally distributed in the population.

The sample size is 18, so df = 17.

A 90% level of confidence results in an α / 2 = .05 area in each tail.

The table t-value is t .05,17 = 1.740.

Page 44: Chapter 06.ppt

With a sample mean of 13.56 hours, and a sample standard deviation of 7.8 hours, the confidence interval is computed:

X-bar +/- t α/2, n-1 (s / √n)

= 13.56 +/- 1.740 (7.8 / √18) = 13.56 +/- 3.20

so 10.36 ≤ µ ≤ 16.76
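The calculation can be checked directly from the data. In this sketch the aerospace figures are read as 18 separate values (with 29 and 3 as two observations, which reproduces the stated mean of 13.56). The t value of 1.740 is taken from the table lookup above, since Python's standard library has no t-distribution function, and mean_ci_t is a hypothetical helper name.

```python
from math import sqrt
from statistics import fmean, stdev

# Comp time in hours for the 18 sampled managers.
comp_time_hours = [6, 21, 17, 20, 7, 0, 8, 16, 29,
                   3, 8, 12, 11, 9, 21, 25, 15, 16]

def mean_ci_t(data, t_value):
    """Return (mean, s, lower, upper) using a t value supplied from a table."""
    n = len(data)
    x_bar = fmean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    margin = t_value * s / sqrt(n)
    return x_bar, s, x_bar - margin, x_bar + margin

T_05_17 = 1.740  # table value for alpha/2 = .05 and 17 degrees of freedom
x_bar, s, lower, upper = mean_ci_t(comp_time_hours, T_05_17)
print(f"X-bar = {x_bar:.2f}, s = {s:.1f}, {lower:.2f} <= mu <= {upper:.2f}")
```

The endpoints can differ from the hand calculation in the last decimal place because the code carries unrounded values of X-bar and s.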

Page 45: Chapter 06.ppt

Interpretation:

The point estimate for this problem is 13.56 hours, with an error of +/- 3.20 hours.

I am 90% confident that the average amount of comp time accumulated by a manager per week in this industry is between 10.36 and 16.76 hours.

Page 46: Chapter 06.ppt

Recommendations:

From these figures, the aerospace industry could attempt to build a reward system for such extra work or evaluate the regular 40-hour week to determine how to use the normal work hours more effectively and thus reduce comp time.

Page 47: Chapter 06.ppt

Solve:

I own a large equipment rental company and I want to make a quick estimate of the average number of days a piece of ditch digging equipment is rented out per person per time. The company has records of all rentals, but the amount of time required to conduct an audit of all accounts would be prohibitive.

I decide to take a random sample of rental invoices. Fourteen different rentals of ditch diggers are selected randomly from the files.

Use the following data to construct a 99% confidence interval to estimate the average number of days that a ditch digger is rented and assume that the number of days per rental is normally distributed in the population.

Page 48: Chapter 06.ppt

Ditch Digger Data:

3 1 3 2 5 1 2 1 4

2 1 3 1 1

Page 49: Chapter 06.ppt

Stay-tuned

Page 50: Chapter 06.ppt

Estimating the Population Proportion

For most businesses, estimating market share (their proportion of the market) is important because many company decisions evolve from market share information:

• What proportion of my customers pay late?

• What proportion don’t pay at all?

• What proportion of the produced goods are defective?

• What proportion of the population has cats, dogs, horses, or kids, exercises, or reads?

Page 51: Chapter 06.ppt

Confidence Interval Estimate for the Proportion

• ps +/- Z √(ps(1 - ps) / n)

• ps - Z √(ps(1 - ps) / n) ≤ p ≤ ps + Z √(ps(1 - ps) / n)

• ps = sample proportion = X / n = number of successes ÷ sample size. This is the POINT ESTIMATE.

• p = population proportion

• Z = critical value from the standardized normal distribution

• n = sample size

Page 52: Chapter 06.ppt

ps +/- Z √(ps(1 - ps) / n)

NOTE: This formula can be applied only when np and n(1-p) are at least 5.

Page 53: Chapter 06.ppt

Example:

A study of 87 randomly selected companies with a telemarketing operation revealed that 39% of the sampled companies had used telemarketing to assist them in order processing.

Using this information, how could a researcher estimate the population proportion of telemarketing companies that use their telemarketing operation to assist them in order processing?

Page 54: Chapter 06.ppt

Solution:

• The sample proportion = .39.

• This is the point estimate of the population proportion, p.

• The Z value for 95% confidence is 1.96.

• The value of (1 - ps) = 1 - .39 = .61.

Page 55: Chapter 06.ppt

ps +/- Z √(ps(1 - ps) / n)

ps - Z √(ps(1 - ps) / n) ≤ p ≤ ps + Z √(ps(1 - ps) / n)

• The confidence interval estimate is: .39 - 1.96 √((.39)(.61) / 87) ≤ p ≤ .39 + 1.96 √((.39)(.61) / 87)

.39 - .10 ≤ p ≤ .39 + .10

.29 ≤ p ≤ .49
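The same interval can be computed in code. This is a minimal sketch: proportion_ci is a hypothetical helper name, and because the population proportion p is unknown, the check that np and n(1-p) are at least 5 is applied here to the sample proportion ps as a stand-in, which is an assumption of the sketch.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_s, n, confidence=0.95):
    """Return (lower, upper, margin of error) for a population proportion."""
    # Assumption: the sample proportion stands in for p in the size check.
    if n * p_s < 5 or n * (1 - p_s) < 5:
        raise ValueError("n*p and n*(1-p) should both be at least 5")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sqrt(p_s * (1 - p_s) / n)
    return p_s - margin, p_s + margin, margin

# Telemarketing example: ps = .39, n = 87, 95% confidence.
lower, upper, margin = proportion_ci(0.39, 87)
print(f"{lower:.2f} <= p <= {upper:.2f} (margin of error {margin:.2f})")
```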

Page 56: Chapter 06.ppt

Interpretation:

We are 95% confident that the population proportion of telemarketing firms that use their operation to assist order processing is somewhere between .29 and .49.

There is a point estimate of .39 with a margin of error of +/- .10.

Page 57: Chapter 06.ppt

Solve:

A clothing company produces men’s jeans. The jeans are made and sold with either a regular cut or a boot cut. In an effort to estimate the proportion of their men’s jeans market in Oklahoma City that is for boot-cut jeans, the analyst takes a random sample of 212 jeans sales from the company’s two Oklahoma City retail outlets.

Only 34 of the sales were for boot-cut jeans. Construct a 90% confidence interval to estimate the proportion of the population in Oklahoma City who prefer boot-cut jeans.

Page 58: Chapter 06.ppt

Solution:

ps = 34/212 = .16

A point estimate for boot-cut jeans is .16 or 16%.

The Z value for 90% level of confidence is 1.645.

The confidence interval estimate is:

ps - Z √(ps(1 - ps) / n) ≤ p ≤ ps + Z √(ps(1 - ps) / n)

.16 - 1.645 √((.16)(.84) / 212) ≤ p ≤ .16 + 1.645 √((.16)(.84) / 212)

.16 - .04 ≤ p ≤ .16 + .04

.12 ≤ p ≤ .20

We are 90% confident that the proportion of boot-cut jeans is between 12% and 20%.

Page 59: Chapter 06.ppt

Estimating Sample Size

The amount of sampling error you are willing to accept and the level of confidence desired determine the size of your sample.

Page 60: Chapter 06.ppt

Sample size when Estimating µ

n = Z²σ² / e²

e = Z (σ / √n)

Page 61: Chapter 06.ppt

To determine sample size:

• Know the desired confidence level, which determines the value of Z (the critical value from the standardized normal distribution). Determining the confidence level is subjective.

• Know the acceptable sampling error, e: the amount of error that can be tolerated.

• Know the standard deviation, σ. If unknown, estimate it by:

• past data

• an educated guess

• the rule σ ≈ range / 4. This estimate is derived from the empirical rule stating that approximately 95% of the values in a normal distribution are within +/- 2σ of the mean, giving a range within which most of the values are located.

Page 62: Chapter 06.ppt

Example:

Suppose the marketing manager wishes to estimate the population mean annual usage of home heating oil to within +/- 50 gallons of the true value, and he wants to be 95% confident of correctly estimating the true mean.

On the basis of a study taken the previous year, he believes that the standard deviation can be estimated as 325 gallons.

Find the sample size needed.

Page 63: Chapter 06.ppt

Solution:

• With e =50, σ = 325, and 95% confidence (Z = 1.96)

• n = Z²σ² / e² = (1.96)²(325)² / (50)²

• n = 162.31

• Therefore, n = 163. As a general rule for determining sample size, always round up to the next integer value in order to slightly oversatisfy the desired criteria.
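The sample-size formula and the round-up rule can be sketched as follows. The function names sample_size_for_mean and sigma_from_range are hypothetical helpers; the second applies the range ÷ 4 estimate described earlier for cases where σ is unknown.

```python
import math

def sample_size_for_mean(z, sigma, e):
    """n = Z^2 * sigma^2 / e^2, rounded up to the next integer."""
    return math.ceil((z * sigma / e) ** 2)

def sigma_from_range(low, high):
    """Rough estimate of sigma when it is unknown: range divided by 4."""
    return (high - low) / 4.0

# Heating oil example: Z = 1.96, sigma = 325 gallons, e = 50 gallons.
print(sample_size_for_mean(1.96, 325, 50))
```

Using math.ceil rather than ordinary rounding ensures the sample is never smaller than the formula requires.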

Page 64: Chapter 06.ppt

Solve:

Suppose you want to estimate the average age of all Boeing 727 airplanes now in active domestic U.S. service.

You want to be 95% confident, and you want your estimate to be within 2 years of the actual figure.

The 727 was first placed in service about 30 years ago, but you believe that no active 727s in the U.S. domestic fleet are more than 25 years old.

How large a sample should you take?

Page 65: Chapter 06.ppt

Solution:

With e = 2 years,

& Z value for 95% = 1.96,

and σ unknown,

it must be estimated by using σ ≈ range ÷ 4. As the range of ages is 0 to 25 years, σ = 25 ÷ 4 = 6.25.

Page 66: Chapter 06.ppt

n = Z²σ² / e² = (1.96)²(6.25)² / (2)² = 37.52 airplanes.

Because you cannot sample 37.52 units, the required sample size is 38.

If you randomly sample 38 planes, you can estimate the average age of active 727s within 2 years and be 95% confident of the results.

Page 67: Chapter 06.ppt

Solve:

Determine the sample size necessary to estimate µ when values range from 80 to 500, error is to be within 10, and the confidence level is 90 %.

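n = Z²σ² / e²

Answer: With σ ≈ (500 - 80) ÷ 4 = 105 and Z = 1.645, n = (1.645)²(105)² / (10)² = 298.34, so the required sample size is 299.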

Page 68: Chapter 06.ppt

Determining sample size for proportion

n = Z²p(1 - p) / e²

• p = population proportion (if unknown, analysts use .5 as an estimate of p in the formula)

• e = error of estimation, equal to (ps - p), the difference between the sample proportion and the parameter to be estimated, p. It represents the amount of error you are willing to tolerate.
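A short function shows how the formula behaves. The inputs in this sketch (95% confidence with Z = 1.96 and a tolerated error of .03) are hypothetical values chosen for illustration, and sample_size_for_proportion is a hypothetical helper name.

```python
import math

def sample_size_for_proportion(z, e, p=0.5):
    """n = Z^2 * p * (1 - p) / e^2, rounded up; p defaults to .5 when unknown."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

# Hypothetical inputs: 95% confidence (Z = 1.96), error within .03, p unknown.
print(sample_size_for_proportion(1.96, 0.03))
```

Using p = .5 is the conservative choice because p(1 - p) is largest at .5, so the resulting sample is large enough whatever the true proportion turns out to be.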

Page 69: Chapter 06.ppt

Solve:

The Packer, a produce industry trade publication, wants to survey Americans and ask whether they are eating more fresh fruits and vegetables than they did 1 year ago.

The organization wants to be 90% confident in its results and maintain an error within .05. How large a sample should it take?

Page 70: Chapter 06.ppt