Page 1: 17. C.I. Estimation for Single Population

Session-01

Confidence Interval Estimation for a Single Unknown Population Mean

An estimator (a sample statistic such as the sample mean or the sample variance) is a random variable with a certain probability distribution, called its sampling distribution (for example, the sampling distribution of the sample mean or of the sample variance).

Point Estimation: A given point estimate is a single realization of the random variable. The actual estimate may or may not be close to the parameter of interest. Therefore, if we only provide a point estimate of the parameter of interest, we are not giving any information about the accuracy of the estimation procedure. For example, saying that the sample mean is 550 is giving a point estimate of the population mean. This estimate does not tell us how close $\mu$ may be to its estimate, 550.

Interval Estimation: Suppose, on the other hand, that we also said: “We are 99% confident that $\mu$ is in the interval [449, 551].” This conveys much more information about the possible value of $\mu$. Now compare this interval with another one: “We are 90% confident that $\mu$ is in the interval [400, 700].” This interval conveys less information about the possible value of $\mu$, both because it is wider and because the level of confidence is lower. (When based on the same information, however, an interval of lower confidence level is narrower.)

Another possible example is the GPA of the students in this class. If N = 35 and n = 5, then we can prepare the sampling distribution of the sample mean. Then we can find 90%, 95%, and 99% confidence intervals for the population mean GPA of the whole class.

Confidence Interval: A confidence interval is a range of numbers believed to include an unknown population parameter. Associated with the interval is a measure of the confidence $(1 - \alpha)$ we have that the interval does indeed contain the parameter of interest.

An interval estimate of a population parameter $\theta$ is an interval of the form $\hat{\theta}_L < \theta < \hat{\theta}_U$, where $\hat{\theta}_L$ and $\hat{\theta}_U$ depend on the value of the statistic $\hat{\Theta}$ for a particular sample and also on the sampling distribution of $\hat{\Theta}$.

Thus a random sample of SAT verbal scores for students of the entering freshman class might produce an interval from 530 to 550 within which we expect to find the true average of all SAT verbal scores for the freshman class.

The interval $\hat{\theta}_L < \theta < \hat{\theta}_U$, computed from the selected sample, is then called a $100(1 - \alpha)\%$ confidence interval, the fraction $1 - \alpha$ is called the confidence coefficient or the degree of confidence, and the endpoints $\hat{\theta}_L$ and $\hat{\theta}_U$ are called the lower and upper confidence limits.


Thus, when $\alpha = 0.05$, we have a 95% confidence interval, and when $\alpha = 0.01$ we obtain a wider 99% confidence interval. The wider the confidence interval is, the more confident we can be that the given interval contains the unknown parameter. Of course, it is better to be 95% confident that the average life of a certain television transistor is between 6 and 7 years than to be 99% confident that it is between 3 and 10 years. Ideally, we prefer a short interval with a high degree of confidence. Sometimes, restrictions on the size of our sample prevent us from achieving short intervals without sacrificing some of our degree of confidence.
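To make the meaning of the confidence coefficient concrete, the following is a minimal simulation sketch, not part of the original notes. It assumes a normal population with known standard deviation and uses the known-sigma interval for the mean (the one developed in Case#01). The population values (mean 550, standard deviation 100), the sample size, and the function name `coverage` are all hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, mean

def coverage(mu, sigma, n, conf, trials, seed=0):
    """Fraction of simulated known-sigma z-intervals that contain mu."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z-value with area alpha/2 above it
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean.
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials

# Hypothetical population: mu = 550, sigma = 100, samples of size 25.
cov95 = coverage(mu=550, sigma=100, n=25, conf=0.95, trials=2000)
print(f"observed coverage of nominal 95% intervals: {cov95:.3f}")
```

Over many repeated samples, the observed fraction should sit close to the nominal 0.95; any single interval either contains the parameter or does not.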

Case#01: Confidence Interval for the Population Mean When the Population Standard Deviation Is Known

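Let us now consider the interval estimate of $\mu$. If our sample is selected from a normal population or, failing this, if n is sufficiently large, we can establish a confidence interval for $\mu$ by considering the sampling distribution of $\bar{X}$.

According to the central limit theorem, we can expect the sampling distribution of $\bar{X}$ to be approximately normally distributed with mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$. Writing $z_{\alpha/2}$ for the z-value above which we find an area of $\alpha/2$, we can see from the figure that

$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha,$$

where

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$

After simplification, we get

$$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha.$$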


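In short,

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

is the $100(1 - \alpha)\%$ confidence interval for the unknown population mean $\mu$.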

Notes

When sampling is from the same population, using a fixed sample size, the higher the confidence level, the wider the interval.

When sampling is from the same population, using a fixed confidence level, the larger the sample size n, the narrower the confidence interval.
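The two notes can be checked numerically. The sketch below is illustrative only: the function names `z_interval` and `width` and the numbers (sample mean 550, known standard deviation 100) are hypothetical, and the interval is the known-sigma interval of Case#01.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """Known-sigma confidence interval for the mean: xbar +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z-value with area alpha/2 above it
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

def width(interval):
    lower, upper = interval
    return upper - lower

# Fixed sample size (n = 25), rising confidence level: the interval widens.
w90 = width(z_interval(550, 100, 25, conf=0.90))
w95 = width(z_interval(550, 100, 25, conf=0.95))
w99 = width(z_interval(550, 100, 25, conf=0.99))

# Fixed confidence level (95%), larger sample: the interval narrows.
w95_n100 = width(z_interval(550, 100, 100, conf=0.95))

print(w90, w95, w99, w95_n100)
```

Because the margin is proportional to $1/\sqrt{n}$, quadrupling the sample size halves the width.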

Questions

Q.1. Suppose that you computed a 95% confidence interval for a population mean. The user of the statistics claims your interval is too wide to have any meaning in the specific use for which it is intended. Discuss and compare two methods of solving this problem.

Q.2. A real estate agent needs to estimate the average value of a residential property of a given size in a certain area. The real estate agent believes that the standard deviation of the property values is $5,500.00 and that property values are approximately normally distributed. A random sample of 16 units gives a sample mean of $89,673.12. Give a 95% confidence interval for the average value of all properties of this kind.

Q.3. A car manufacturer wants to estimate the average miles-per-gallon highway rating for a new model. From experience with similar models, the manufacturer believes the miles-per-gallon standard deviation is 4.6. A random sample of 100 highway runs of the new model yields a sample mean of 32 miles per gallon. Give a 95% confidence interval for the population average miles-per-gallon highway rating.

Q.4. A mining company needs to estimate the average amount of copper ore per ton mined. A random sample of 50 tons gives a sample mean of 146.75 pounds. The population standard deviation is assumed to be 35.2 pounds. Give a 95% confidence interval for the average amount of copper in the “population” of tons mined. Also give a 90% confidence interval and a 99% confidence interval for the average amount of copper per ton.


Case#02: Confidence Intervals for $\mu$ When $\sigma$ Is Unknown: The t Distribution

In real sampling situations, however, the population standard deviation $\sigma$ is rarely known. The reason for this is that both $\mu$ and $\sigma$ are population parameters. When we sample from a population with the aim of estimating its unknown mean ($\mu$), the other parameter of the same population, the standard deviation, is highly unlikely to be known.

Question

Q.5. A stock market analyst wants to estimate the average return on a certain stock. A random sample of 15 days yields an average (annualized) return of 10.37% and a standard deviation of 3.5%. Assuming a normal population of returns, give a 95% confidence interval for the average return on this stock.

Note: Whenever $\sigma$ is not known (and the population is assumed normal), the correct distribution to use is the t distribution with n - 1 degrees of freedom. Note, however, that for large degrees of freedom, the t distribution is approximated well by the Z distribution.
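As a sketch of how the t-based interval $\bar{x} \pm t_{\alpha/2}\, s/\sqrt{n}$ applies to Q.5, the code below uses only the standard library. The critical value 2.145 is the usual table entry for $t_{0.025}$ with 14 degrees of freedom and is entered by hand as an assumption, since the standard library has no t quantile function. The function name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Interval xbar +/- t_crit * s / sqrt(n); t_crit must come from a t table with n - 1 df."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# Q.5: n = 15 days, mean return 10.37%, sample standard deviation 3.5%.
T_025_DF14 = 2.145  # assumed table value: t_{0.025} with 14 degrees of freedom
lo, hi = t_interval(xbar=10.37, s=3.5, n=15, t_crit=T_025_DF14)
print(f"95% CI for the mean return: {lo:.2f}% to {hi:.2f}%")
```

Using the normal value 1.96 in place of 2.145 here would give a noticeably narrower interval, which is why the t distribution matters for small samples.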

Questions

Q.6. A telephone company wants to estimate the average length of long-distance calls during weekends. A random sample of 50 calls gives a mean of 14.5 minutes and a standard deviation s = 5.6 minutes. Give a 95% confidence interval and a 90% confidence interval for the average length of a long-distance phone call during weekends.

Q.7. An insurance company handling malpractice cases is interested in estimating the average amount of claims against physicians of a certain specialty. The company obtains a random sample of 165 claims and finds mean= $16,530 and s =$5,542. Give a 95% confidence interval and a 99% confidence interval for the average amount of a claim.


Q.8. The manufacturer of batteries used in small electric appliances wants to estimate the average life of a battery. A random sample of 12 batteries yields mean=34.2 hours and s = 5.9 hours. Give a 95% confidence interval for the average life of a battery.

Q.9. The following is a random sample of the wealth, in billions of U.S. dollars, of individuals listed on the Forbes “Billionaires” list for 2007: 2.1, 5.8, 7.3, 33.0, 2.0, 8.4, 11.0, 18.4, 4.3, 4.5, 6.0, 13.3, 12.8, 3.6, 2.4, 1.0. Construct a 90% confidence interval for the average wealth in $ billions for the people on the Forbes list.


Session-02

Two Samples: Estimating the Difference between Two Means

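If we have two populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, a point estimator of the difference between $\mu_1$ and $\mu_2$ is given by the statistic $\bar{X}_1 - \bar{X}_2$. Therefore, to obtain a point estimate of $\mu_1 - \mu_2$, we shall select two independent random samples, one from each population, of sizes $n_1$ and $n_2$, and compute the difference $\bar{x}_1 - \bar{x}_2$ of the sample means. Clearly, we must consider the sampling distribution of $\bar{X}_1 - \bar{X}_2$.

According to the central limit theorem, we can expect the sampling distribution of $\bar{X}_1 - \bar{X}_2$ to be approximately normally distributed with mean $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$ and standard deviation

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$

Therefore, we can assert with a probability of $1 - \alpha$ that the standard normal variable

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$

will fall between $-z_{\alpha/2}$ and $z_{\alpha/2}$. Therefore,

$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha.$$

After some steps, we obtain the $100(1 - \alpha)\%$ confidence interval for the difference of the two unknown population means, $\mu_1 - \mu_2$.

In short,

$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$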

The degree of confidence is exact when samples are selected from normal populations. For non-normal populations the central limit theorem allows for a good approximation for reasonable size samples.


Q.1. An experiment was conducted in which two types of engines, A and B, were compared. Gas mileage, in miles per gallon, was measured. Fifty experiments were conducted using engine type A and 75 experiments were done for engine type B. The gasoline used and other conditions were held constant. The average gas mileage for engine A was 36 miles per gallon and the average for engine B was 42 miles per gallon. Find a 96% confidence interval on $\mu_B - \mu_A$, where $\mu_A$ and $\mu_B$ are the population mean gas mileages for engines A and B, respectively. Assume that the population standard deviations are 6 and 8 for engines A and B, respectively.
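A calculation like Q.1 can be checked with a short sketch of the known-variance interval for a difference of means. The function name `two_mean_z_interval` is hypothetical; population B is passed first so that the interval is for the mean of B minus the mean of A, as the question asks.

```python
from math import sqrt
from statistics import NormalDist

def two_mean_z_interval(xbar1, sigma1, n1, xbar2, sigma2, n2, conf):
    """Interval for mu1 - mu2 with known population standard deviations."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z-value with area alpha/2 above it
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)  # std. deviation of xbar1 - xbar2
    diff = xbar1 - xbar2
    return diff - z * se, diff + z * se

# Q.1: engine B (mean 42, sigma 8, n 75) minus engine A (mean 36, sigma 6, n 50), 96% level.
lo, hi = two_mean_z_interval(42, 8, 75, 36, 6, 50, conf=0.96)
print(f"96% CI for the difference in mean mileage: {lo:.2f} to {hi:.2f}")
```

The result differs slightly in the second decimal from a hand calculation that rounds the z-value to 2.05, because the code uses the unrounded quantile.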

Case#02: When population variances are unknown but sample sizes are large (n1, n2 >= 30)

Q.2. Two kinds of thread are being compared for strength. Fifty pieces of each type of thread are tested under similar conditions. Brand A had an average tensile strength of 78.3 kilograms with a standard deviation of 5.6 kilograms, while brand B had an average tensile strength of 87.2 kilograms with a standard deviation of 6.3 kilograms. Construct 90%, 95%, and 99% confidence intervals for the difference of the population means.

Case#03: When population variances are unknown (but assumed equal) and sample sizes are small (n1, n2 < 30)

Q.3. Students may choose between a 3-semester-hour course in physics without labs and a 4-semester-hour course with labs. The final written examination is the same for each section. If 12 students in the section with labs made an average examination grade of 84 with a standard deviation of 4, and 18 students in the section without labs made an average grade of 77 with a standard deviation of 6, find a 99% confidence interval for the difference between the average grades for the two courses. Assume the populations to be approximately normally distributed with equal variances.
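The notes do not spell out the Case#03 formula, so the following is a sketch under the usual textbook assumption: with equal unknown variances, the two sample variances are pooled as $s_p^2 = [(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2]/(n_1 + n_2 - 2)$ and the interval is $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, s_p \sqrt{1/n_1 + 1/n_2}$ with $n_1 + n_2 - 2$ degrees of freedom. The critical value 2.763 is the table entry for $t_{0.005}$ with 28 degrees of freedom, entered by hand; the function name is hypothetical.

```python
from math import sqrt

def pooled_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """Interval for mu1 - mu2 assuming equal variances; t_crit has n1 + n2 - 2 df."""
    # Pooled variance: degrees-of-freedom weighted average of the sample variances.
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - t_crit * se, diff + t_crit * se

# Q.3: with labs (mean 84, s 4, n 12) minus without labs (mean 77, s 6, n 18), 99% level.
T_005_DF28 = 2.763  # assumed table value: t_{0.005} with 28 degrees of freedom
lo, hi = pooled_t_interval(84, 4, 12, 77, 6, 18, T_005_DF28)
print(f"99% CI for the difference in mean grades: {lo:.2f} to {hi:.2f}")
```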

Q.4. Two different brands of latex paint are being considered for use. Drying time in hours is being measured on specimen samples of the use of the two paints. Fifteen specimens for each were selected and the drying times are as follows:

Assume the drying time is normally distributed with $\sigma_A = \sigma_B$. Find a 95% confidence interval on $\mu_B - \mu_A$, where $\mu_A$ and $\mu_B$ are the mean drying times.


Session-03

Case#04: When population variances are unknown and assumed unequal and sample sizes are small (n1, n2 < 30)

Q.5. A taxi company is trying to decide whether to purchase brand A or brand B tires for its fleet of taxis. To estimate the difference in the two brands, an experiment is conducted using 12 of each brand. The tires are run until they wear out. The results are

Brand A: mean = 36300 kilometers, S.D. = 5000 kilometers
Brand B: mean = 38100 kilometers, S.D. = 6100 kilometers

Compute a 95% confidence interval for $\mu_A - \mu_B$, assuming the populations to be approximately normally distributed. You may not assume that the variances are equal.
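The notes do not state the Case#04 formula. A common approach, assumed here, is the approximate t interval $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_1^2/n_1 + s_2^2/n_2}$ with degrees of freedom estimated by the Satterthwaite formula and rounded down to an integer. The sketch applies it to the tire data of Q.5; the critical value 2.080 is the table entry for $t_{0.025}$ with 21 degrees of freedom, entered by hand, and both function names are hypothetical.

```python
from math import sqrt, floor

def welch_df(s1, n1, s2, n2):
    """Satterthwaite approximation to the degrees of freedom (unequal variances)."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

def welch_interval(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """Interval for mu1 - mu2 without assuming equal variances."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = xbar1 - xbar2
    return diff - t_crit * se, diff + t_crit * se

# Q.5: brand A (36300 km, S.D. 5000) and brand B (38100 km, S.D. 6100), 12 tires each.
v = welch_df(5000, 12, 6100, 12)
df = floor(v)        # round down to the nearest integer before using the t table
T_025_DF21 = 2.080   # assumed table value: t_{0.025} with 21 degrees of freedom
lo, hi = welch_interval(36300, 5000, 12, 38100, 6100, 12, T_025_DF21)
print(f"estimated df = {v:.2f} (use {df}); 95% CI: {lo:.0f} to {hi:.0f} km")
```

Since the interval contains zero, these data do not rule out equal mean tire life for the two brands at the 95% level.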

Case#05: Paired Observations

At this point we shall consider estimation procedures for the difference of two means when the samples are not independent and the variances of the two populations are not necessarily equal. The situation considered here deals with a very special experimental situation, namely that of paired observations. Unlike the situation described earlier, the conditions of the two populations are not assigned randomly to experimental units. Rather, each homogeneous experimental unit receives both population conditions; as a result, each experimental unit has a pair of observations, one for each population. For example, if we run a test on a new diet using 15 individuals, the weights before and after going on the diet form the information for our two samples. These two populations are "before" and "after," and the experimental unit is the individual. Obviously, the observations in a pair have something in common.


To determine if the diet is effective, we consider the differences $d_1, d_2, \ldots, d_n$ in the paired observations. These differences are the values of a random sample $D_1, D_2, \ldots, D_n$ from a population of differences that we shall assume to be normally distributed with mean $\mu_D = \mu_1 - \mu_2$ and variance $\sigma_D^2$. We estimate $\sigma_D^2$ by $s_d^2$, the variance of the differences that constitute our sample. The point estimator of $\mu_D$ is given by $\bar{D}$.


Session-04

Large-Sample Confidence Intervals for the Population Proportion p

Sometimes interest centers on a qualitative, rather than a quantitative, variable. We may be interested in the relative frequency of occurrence of some characteristic in a population. For example, we may be interested in the proportion of people in a population who are users of some product or the proportion of defective items produced by a machine. In such cases, we want to estimate the population proportion p.

A point estimator of the proportion p in a binomial experiment is given by the statistic $\hat{P} = X/n$, where X represents the number of successes in n trials. Therefore, the sample proportion $\hat{p} = x/n$ will be used as the point estimate of the parameter p.

NOTE: If the unknown proportion p is not expected to be too close to 0 or 1, we can establish a confidence interval for p by considering the sampling distribution of $\hat{P}$. Recall our large-sample rule of thumb: for estimating p, a sample is considered large enough when both np and nq are greater than 5. (We guess the value of p when determining whether the sample is large enough. As a check, we may also compute $n\hat{p}$ and $n\hat{q}$ once the sample is obtained.)

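Designating a failure in each binomial trial by the value 0 and a success by the value 1, the number of successes, x, can be interpreted as the sum of n values consisting only of zeros and ones, and $\hat{p}$ is just the sample mean of these n values. Hence, by the central limit theorem, for n sufficiently large, $\hat{P}$ is approximately normally distributed with mean

$$\mu_{\hat{P}} = E(\hat{P}) = E\left(\frac{X}{n}\right) = \frac{np}{n} = p$$

and variance

$$\sigma_{\hat{P}}^2 = \frac{\sigma_X^2}{n^2} = \frac{npq}{n^2} = \frac{pq}{n},$$

and

$$Z = \frac{\hat{P} - p}{\sqrt{pq/n}}.$$

Therefore

$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$$

becomes

$$P\left(-z_{\alpha/2} < \frac{\hat{P} - p}{\sqrt{pq/n}} < z_{\alpha/2}\right) = 1 - \alpha.$$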


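After a few steps, replacing p and q under the square root by their estimates $\hat{p}$ and $\hat{q} = 1 - \hat{p}$, we obtain the approximate $100(1 - \alpha)\%$ confidence interval

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}.$$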


IMPORTANT NOTE: When n is small and the unknown proportion p is believed to be close to 0 or close to 1, the confidence-interval procedure established here is unreliable and, therefore, should not be used. To be on the safe side, one should require both $n\hat{p}$ and $n\hat{q}$ to be greater than or equal to 5.


Question: In a random sample of n = 500 families owning television sets in the city of Hamilton, Canada, it is found that x = 340 subscribed to HBO. Find a 95% confidence interval for the actual proportion of families in this city who subscribe to HBO.

Solution: The point estimate of p is $\hat{p} = 340/500 = 0.68$. Using Table A.3, we find that $z_{0.025} = 1.96$. Therefore, the 95% confidence interval for p is

$$0.68 - 1.96\sqrt{\frac{(0.68)(0.32)}{500}} < p < 0.68 + 1.96\sqrt{\frac{(0.68)(0.32)}{500}},$$

which simplifies to 0.64 < p < 0.72.
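The HBO calculation can be reproduced with a short sketch of the large-sample interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$. The function name `proportion_interval` is hypothetical, and the assertion inside it encodes the rule of thumb that both $n\hat{p}$ and $n\hat{q}$ should be at least 5.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, conf=0.95):
    """Large-sample interval for a proportion: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # Rule-of-thumb check: the normal approximation needs enough successes and failures.
    assert n * p_hat >= 5 and n * q_hat >= 5, "sample too small for the normal approximation"
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# HBO example: x = 340 subscribers among n = 500 families.
lo, hi = proportion_interval(340, 500, conf=0.95)
print(f"95% CI for p: {lo:.2f} < p < {hi:.2f}")
```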

Questions

Q.10. A maker of portable exercise equipment, designed for health-conscious people who travel too frequently to use a regular athletic club, wants to estimate the proportion of traveling business people who may be interested in the product. A random sample of 120 traveling business people indicates that 28 may be interested in purchasing the portable fitness equipment. Give a 95% confidence interval for the proportion of all traveling business people who may be interested in the product.


Q.11. According to The Economist, 55% of all French people of voting age are opposed to the proposed European constitution. Assume that this percentage is based on a random sample of 800 French people. Give a 95% confidence interval for the population proportion in France that is opposed to the European constitution.

Q.12. Before launching its Buyers’ Assurance Program, American Express wanted to estimate the proportion of cardholders who would be interested in this automatic insurance coverage plan. A random sample of 250 American Express cardholders was selected and sent questionnaires. The results were that 121 people in the sample expressed interest in the plan. Give a 99% confidence interval for the proportion of all interested American Express cardholders.

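Example: How large a sample is required if we want to be 95% confident that our estimate of p is within 0.02, where $\hat{p} = 0.68$? (Answer: n = 2090.) With the error bound e = 0.02, the required sample size is

$$n = \frac{z_{\alpha/2}^2\,\hat{p}\hat{q}}{e^2} = \frac{(1.96)^2(0.68)(0.32)}{(0.02)^2} \approx 2090.$$

Therefore, if we base our estimate of p on a random sample of size 2090, we can be 95% confident that our sample proportion will not differ from the true proportion by more than 0.02.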

p_cap    q_cap    p_cap * q_cap
0        1        0
0.1      0.9      0.09
0.2      0.8      0.16
0.3      0.7      0.21
0.4      0.6      0.24
0.5      0.5      0.25 = 1/4
0.6      0.4      0.24
0.7      0.3      0.21
0.8      0.2      0.16
0.9      0.1      0.09
1        0        0

Notice that the product p_cap * q_cap is largest, 1/4, when p_cap = q_cap = 0.5.


Occasionally, it will be impractical to obtain an estimate of p to be used for determining the sample size for a specified degree of confidence. If this happens, an upper bound for n is established by noting that $\hat{p}\hat{q} = \hat{p}(1 - \hat{p})$, which must be at most equal to 1/4, since $\hat{p}$ must lie between 0 and 1. A large sample size will increase the degree of confidence. Therefore, in place of the formula $n = z_{\alpha/2}^2\,\hat{p}\hat{q}/e^2$, we use the following formula for determining the sample size:

$$n = \frac{z_{\alpha/2}^2}{4e^2}.$$

Example: How large a sample is required if we want to be at least 95% confident that our estimate of p is within 0.02?

Solution: We shall now assume that no preliminary sample has been taken to provide an estimate of p. Consequently, we can be at least 95% confident that our sample proportion will not differ from the true proportion by more than 0.02 if we choose a sample of size

$$n = \frac{(1.96)^2}{(4)(0.02)^2} = 2401.$$
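Both sample-size examples (with a preliminary estimate of 0.68 and with no estimate at all) can be reproduced with one small function. This is a sketch; the function name `sample_size` and its parameter names are hypothetical. The result is rounded up because a sample size must be a whole number.

```python
from math import ceil
from statistics import NormalDist

def sample_size(e, conf, p_hat=None):
    """Sample size so that the estimate of p is within e at the given confidence level.

    With a preliminary estimate p_hat, uses z^2 * p_hat * q_hat / e^2.
    Without one, uses the conservative bound p_hat * q_hat <= 1/4.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return ceil(z ** 2 * pq / e ** 2)

n_with_estimate = sample_size(e=0.02, conf=0.95, p_hat=0.68)
n_conservative = sample_size(e=0.02, conf=0.95)
print(n_with_estimate, n_conservative)
```

The conservative size is never smaller than the size based on an estimate, which is the price paid for not having preliminary information about p.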


Session-05

Two Samples: Estimating the Difference between Two Proportions

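A confidence interval for the difference of two population proportions ($p_1 - p_2$) can be established by considering the sampling distribution of $\hat{P}_1 - \hat{P}_2$. As we know, $\hat{P}_1$ and $\hat{P}_2$ are each approximately normally distributed with means $p_1$ and $p_2$ and variances $p_1 q_1/n_1$ and $p_2 q_2/n_2$, respectively.

By choosing independent samples from the two populations, the variables $\hat{P}_1$ and $\hat{P}_2$ will be independent, and then by the reproductive property of the normal distribution, we conclude that $\hat{P}_1 - \hat{P}_2$ is approximately normally distributed with mean

$$\mu_{\hat{P}_1 - \hat{P}_2} = p_1 - p_2$$

and variance

$$\sigma_{\hat{P}_1 - \hat{P}_2}^2 = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}.$$

Therefore, we can assert that

$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha, \quad \text{where} \quad Z = \frac{(\hat{P}_1 - \hat{P}_2) - (p_1 - p_2)}{\sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}}.$$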

Large-Sample Confidence Interval for p1-p2

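If $\hat{p}_1$ and $\hat{p}_2$ are the proportions of successes in random samples of sizes $n_1$ and $n_2$, respectively, with $\hat{q}_1 = 1 - \hat{p}_1$ and $\hat{q}_2 = 1 - \hat{p}_2$, an approximate $100(1 - \alpha)\%$ confidence interval for the difference of two binomial parameters ($p_1 - p_2$) is

$$(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}.$$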

Example: A certain change in a process for manufacture of component parts is being considered. Samples are taken using both the existing and the new procedure so as to determine if the new process results in an improvement. If 75 of 1500 items from the existing procedure were found to be defective and 80 of 2000 items from the new procedure were found to be defective, find a 90% confidence interval for the true difference in the fraction of defectives between the existing and the new process.

Note: Up to this point all confidence intervals presented were of the form

point estimate ± K s.e.(point estimate),

where K is a constant (either a t or a normal percent point). This is the case when the parameter is a mean, difference between means, proportion, or difference between proportions, due to the symmetry of the t and Z distributions.
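Returning to the defective-items example, the large-sample interval for a difference of proportions has exactly this "point estimate ± K s.e." form, with K a normal percent point. The sketch below computes it; the function name `two_proportion_interval` is hypothetical, and the existing process is treated as population 1.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, conf):
    """Large-sample interval for p1 - p2: point estimate +/- z * standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)           # the constant K
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)      # s.e. of the point estimate
    diff = p1 - p2                                          # the point estimate
    return diff - z * se, diff + z * se

# Existing process: 75 defective of 1500; new process: 80 defective of 2000; 90% level.
lo, hi = two_proportion_interval(75, 1500, 80, 2000, conf=0.90)
print(f"90% CI for p1 - p2: {lo:.4f} to {hi:.4f}")
```

Because the interval contains zero, these samples alone do not establish that the new process lowers the fraction of defectives.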

Single Sample: Estimating the Variance

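If a sample of size n is drawn from a normal population with variance $\sigma^2$, and the sample variance $s^2$ is computed, we obtain a value of the statistic $S^2$. This computed sample variance will be used as a point estimate of $\sigma^2$. Hence the statistic $S^2$ is called an estimator of $\sigma^2$.

An interval estimate of $\sigma^2$ can be established by using the statistic

$$X^2 = \frac{(n-1)S^2}{\sigma^2}.$$

The statistic $X^2$ has a chi-squared distribution with (n - 1) degrees of freedom when samples are chosen from a normal population. We may write

$$P\left(\chi^2_{1-\alpha/2} < X^2 < \chi^2_{\alpha/2}\right) = 1 - \alpha,$$

where $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ are values of the chi-squared distribution with (n - 1) degrees of freedom leaving areas of $1 - \alpha/2$ and $\alpha/2$, respectively, to the right.

For our particular random sample of size n, the sample variance $s^2$ is computed, and the following $100(1 - \alpha)\%$ confidence interval for $\sigma^2$ is obtained:

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}.$$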


Example: The following are the weights, in decagrams, of 10 packages of grass seed distributed by a certain company: 46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2, and 46.0. Find a 95% confidence interval for the variance of all such packages of grass seed distributed by this company, assuming a normal population. (Answer: $0.135 < \sigma^2 < 0.953$.)
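The grass-seed answer can be checked with the sketch below. The two chi-squared critical values (19.023 and 2.700, for 9 degrees of freedom) are the usual table entries and are entered by hand as assumptions, since the standard library has no chi-squared quantile function. The function name `variance_interval` is hypothetical.

```python
from statistics import variance

def variance_interval(data, chi2_upper_tail, chi2_lower_tail):
    """Interval for sigma^2: (n-1)s^2 / chi2_{alpha/2} to (n-1)s^2 / chi2_{1-alpha/2}."""
    n = len(data)
    s2 = variance(data)  # sample variance with n - 1 in the denominator
    return (n - 1) * s2 / chi2_upper_tail, (n - 1) * s2 / chi2_lower_tail

weights = [46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2, 46.0]
CHI2_025_DF9 = 19.023  # assumed table value: area 0.025 to the right, 9 df
CHI2_975_DF9 = 2.700   # assumed table value: area 0.975 to the right, 9 df

lo, hi = variance_interval(weights, CHI2_025_DF9, CHI2_975_DF9)
print(f"95% CI for the variance: {lo:.3f} to {hi:.3f}")
```

The upper limit computed this way is about 0.954 rather than the quoted 0.953; the small gap arises because the quoted answer rounds $s^2$ to 0.286 before dividing. Unlike the intervals for means and proportions, this interval is not symmetric about the point estimate $s^2$.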

Two Samples: Estimating the Ratio of Two Variances


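If $s_1^2$ and $s_2^2$ are the variances of independent samples of sizes $n_1$ and $n_2$, respectively, from normal populations, then a $100(1 - \alpha)\%$ confidence interval for $\sigma_1^2/\sigma_2^2$ is

$$\frac{s_1^2}{s_2^2}\,\frac{1}{f_{\alpha/2}(v_1, v_2)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2}\,f_{\alpha/2}(v_2, v_1),$$

where $f_{\alpha/2}(v_1, v_2)$ is an f-value with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom leaving an area of $\alpha/2$ to the right, and $f_{\alpha/2}(v_2, v_1)$ is a similar f-value with $v_2 = n_2 - 1$ and $v_1 = n_1 - 1$ degrees of freedom.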

Example: A confidence interval for the difference in the mean orthophosphorus contents, measured in milligrams per liter, at two stations on the James River was constructed in Example 9.11 on page 293 by assuming the normal population variances to be unequal. Justify this assumption by constructing a 98% confidence interval for $\sigma_1^2/\sigma_2^2$ and for $\sigma_1/\sigma_2$, where $\sigma_1^2$ and $\sigma_2^2$ are the variances of the populations of orthophosphorus contents at station 1 and station 2, respectively.
