
Uncertainty and confidence

Although the sample mean, x̄, is a unique number for any particular sample, if you pick a different sample you will probably get a different sample mean.

In fact, you could get many different values for the sample mean, and virtually none of them would actually equal the true population mean, µ.

But the sampling distribution of x̄ is narrower than the population distribution, by a factor of √n. Thus, the estimates x̄ gained from our samples are always relatively close to the population parameter µ.

[Figure: the population distribution of individual subjects x, with standard deviation σ, compared with the narrower distribution of sample means x̄ of n subjects, with standard deviation σ/√n.]

If the population is normally distributed N(µ,σ), then the sampling distribution of Xbar is N(µ,σ/√n). But recall the Central Limit Theorem: even when the population is not normally distributed, for large n Xbar is approximately normal.
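A small simulation can make the Central Limit Theorem statement concrete. The sketch below is only an illustration: the Exponential(1) population (mean 1, standard deviation 1), the sample size n = 25, and the function name `simulate_sample_means` are assumptions chosen for the example, not part of the course material.

```python
import random
import statistics

def simulate_sample_means(n, reps=5000, seed=0):
    """Draw `reps` samples of size n from an Exponential(1) population
    (mean 1, sd 1, clearly non-normal) and return the sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

means = simulate_sample_means(n=25)
print(statistics.fmean(means))   # should be close to mu = 1
print(statistics.stdev(means))   # should be close to sigma/sqrt(n) = 1/5 = 0.2
```

A histogram of `means` would look roughly bell-shaped even though the population itself is strongly skewed.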

• So we’ll use this information to make inferences; i.e., draw conclusions about populations from data in our samples

• We'll consider two types:
– Confidence interval estimation
– Tests of significance

• In both of these cases, we'll consider our data as either being a random sample from a population or as data from a randomized experiment

• Start with estimation… there are two situations we'll consider:
– estimating the mean µ of a population of measurements
– estimating the proportion p of successes (Ss) in a population of successes and failures (Ss and Fs)

• In either case, we'll construct a confidence interval of the form estimate +/- M.O.E., where M.O.E. = margin of error of the estimator.

• The MOE gives information on how good the estimate is through the variation in the estimator (its standard error) and through the level of confidence in the confidence interval (through a tabulated value).

• The standard error of an estimator is its estimated standard deviation (treating the estimator as a statistic with a sampling distribution…)

• Best estimator of µ is Xbar, and we know from the previous chapter that Xbar is approximately N(µ, σ/√n)

• Best estimator of p is phat, and we know from the last chapter that phat is approx. N(p, √(p(1−p)/n))
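The standard deviations in the two normal approximations above are what the margin of error is built from. As a rough sketch, they can be computed as follows; the function names and the numbers in the example calls are hypothetical, and plugging phat in for the unknown p is the usual way of getting an estimated standard deviation (a standard error).

```python
import math

def se_mean(sigma, n):
    """Standard deviation of Xbar when the population sd sigma is known."""
    return sigma / math.sqrt(n)

def se_proportion(p_hat, n):
    """Estimated standard deviation of phat, using phat in place of p."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

print(se_mean(sigma=10, n=25))          # 10 / 5 = 2.0
print(se_proportion(p_hat=0.5, n=100))  # sqrt(0.25 / 100), about 0.05
```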

[Figure: red dot = mean value of an individual sample]

95% of all sample means will be within roughly 2 standard deviations (2*σ/√n) of the population parameter µ.

This implies that the population parameter µ must be within roughly 2 standard deviations of the sample average x̄ in 95% of all samples.

This reasoning is the essence of statistical estimation.
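One way to check this reasoning is to simulate many samples and count how often the interval x̄ ± 1.96σ/√n actually contains µ. The sketch below assumes a normal population with µ = 50 and σ = 10 and samples of size 25; these values and the function name `coverage` are made up for illustration.

```python
import math
import random
import statistics

def coverage(mu=50.0, sigma=10.0, n=25, reps=10000, z_star=1.96, seed=1):
    """Fraction of simulated samples whose interval xbar +/- MOE contains mu."""
    rng = random.Random(seed)
    moe = z_star * sigma / math.sqrt(n)   # margin of error, sigma known
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - moe <= mu <= xbar + moe:
            hits += 1
    return hits / reps

print(coverage())   # should be roughly 0.95
```

Any single interval either contains µ or it does not; the 95% describes how often the method succeeds over many samples.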


• So using this fact, we construct a 95% confidence interval for µ as Xbar +/- 1.96(σ/sqrt(n)), where 1.96(σ/sqrt(n)) is the M.O.E.

NOTE: MOE = (number from a table)*(Std. Error)

• A most important figure in this context is the one given on the previous slide (Figure 6.3; eBook 6.1, 3/7)… Be sure you can give a full explanation of what this figure is showing you about confidence intervals

• In general, we can construct a confidence interval for µ with any level of confidence C as Xbar +/- z* (σ/sqrt(n)), where z* is the appropriate number from Table A giving confidence C; or you may use the last row of Table D.
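Instead of looking z* up in a table, it can be computed from the standard normal distribution. The following is a minimal sketch using Python's standard library; the function names and the sample values (Xbar = 100, σ = 15, n = 36) are hypothetical and only illustrate the formula Xbar +/- z*(σ/sqrt(n)).

```python
import math
from statistics import NormalDist

def z_star(confidence):
    """Critical value with central area `confidence` under the standard normal."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for mu when sigma is known: xbar +/- z* sigma/sqrt(n)."""
    moe = z_star(confidence) * sigma / math.sqrt(n)
    return xbar - moe, xbar + moe

print(z_star(0.95))   # about 1.96
print(z_star(0.99))   # about 2.576
print(z_interval(xbar=100, sigma=15, n=36, confidence=0.95))  # about (95.1, 104.9)
```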

Xbar +/- 1.96(σ/sqrt(n)) is the 95% CI for µ… if the MOE is too large for our purposes, we may want to increase n… in fact, we can set the MOE equal to any number we'd like and solve for n… do the algebra to get n = (1.96 σ/MOE)².

Substitute z* for 1.96 to get n for other levels of confidence…
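The sample-size formula is easy to turn into a small calculation. This sketch assumes σ is known and rounds up, since n must be a whole number and rounding down would leave the margin of error slightly larger than requested; the function name and the example values (σ = 15, MOE = 3) are made up.

```python
import math
from statistics import NormalDist

def sample_size(sigma, moe, confidence=0.95):
    """Smallest n with z* sigma/sqrt(n) <= moe, i.e. n = (z* sigma/moe)^2 rounded up."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil((z * sigma / moe) ** 2)

print(sample_size(sigma=15, moe=3))                   # 97
print(sample_size(sigma=15, moe=3, confidence=0.99))  # 166
```

Note how asking for more confidence with the same margin of error requires a noticeably larger sample.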

Read the cautions on pages 366-367 (eBook 6.1, 7/7). These are very important and we must pay careful attention to them…

A significance test or hypothesis test is a procedure for comparing our data with a hypothesis whose truth we want to assess. The hypothesis is usually a statement about the parameters of a population from which we’ve taken our data. The results of the test are given in a probability statement that measures how well our data and the hypothesis agree with each other…

Go over Examples 6.8-6.14 (eBook, section 6.2, 1/8 through 5/8) in detail to see the logic in hypothesis testing…

The general format of a hypothesis test is:

• STATE THE HYPOTHESES
• GIVE THE TEST STATISTIC YOU'LL USE IN THE TEST
• CALCULATE ITS VALUE FOR OUR DATA AND ASSESS HOW LIKELY IT IS TO HAVE OCCURRED, ASSUMING THE HYPOTHESIS IS TRUE
• STATE THE CONCLUSION IN THE CONTEXT OF THE PROBLEM YOU’RE WORKING…
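These steps can be sketched in code for the simplest case, a z test for a mean with known σ. Everything specific here is an assumption for illustration: the function name `z_test`, the hypothesized value µ0 = 50, and the data summary (Xbar = 52, σ = 10, n = 100) are invented, not taken from the textbook examples.

```python
import math
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """z test of H0: mu = mu0 with known sigma. Returns (z, p_value)."""
    # Test statistic: how many standard deviations xbar lies from mu0.
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    # P-value: probability, assuming H0 is true, of a value this extreme or more.
    if alternative == "greater":      # Ha: mu > mu0
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":       # Ha: mu < mu0
        p = std_normal.cdf(z)
    else:                             # Ha: mu != mu0
        p = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p

z, p = z_test(xbar=52, mu0=50, sigma=10, n=100)
print(z, p)   # z = 2.0, two-sided p about 0.0455
```

The last step, stating the conclusion in the context of the problem, is left to the person doing the analysis.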

The hypothesis you assume to be true, the one you are comparing against your data, is called the null hypothesis and is usually labeled by the symbol H0. The test is designed to assess the strength of evidence against the null hypothesis. Many times the null hypothesis is a statement of “no effect” or of “no difference”… more later on this

• The null hypothesis in Ex. 6.8 is H0: There is no difference in the true mean debts, and the alternative hypothesis is Ha: The true mean debts are not the same

• NOTE: these hypotheses always refer to a population parameter or a model (“true mean”), not to a particular outcome. The above alternative is two-sided since we don’t really know whether one mean is larger or smaller than the other...

• The test of hypothesis is based on a test statistic that is a good estimator of the parameter in the null hypothesis; e.g., Xbar estimates µ, phat estimates p, the difference in sample means estimates the difference in true means, etc. If the test statistic is “far away” from the value of the parameter specified in the null hypothesis, this gives evidence against H0 – the alternative hypothesis determines which direction we should be looking for the evidence, larger or smaller (or both if two-sided). “Far away” is in terms of the s.d. of the estimator...

• Whether the test statistic is “likely” or “unlikely” to occur assuming the null hypothesis is true, is determined by computing the p-value of the test; i.e., the probability, assuming the null hypothesis is true, that the TS would take on a value as extreme or more extreme than the one actually observed. See Figures 6.7 and 6.8 on p. 374 (eBook section 6.2, 1/8)

[Figure 6.7 and Figure 6.8]

• Now compare this P-value with the significance level, called α, the probability that we regard as decisive, usually picked as .05, but could be different. If our P-value is <= α, then we reject the null hypothesis and say that our data are statistically significant at level α.

• Then we usually summarize our conclusion in a sentence or two that tells what our test has found…

• The box on p. 383 (6.2, 6/8) summarizes the z-test for µ.

• Go over Example 6.15, p.383-384 (eBook, 6.2, 6/8), to see how a p-value is computed when the alternative hypothesis is two-sided. NOTE: Double the area you find in one tail… see Figure 6.11 below.

[Figure 6.11: two-sided P-value for z = 1.78]
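The "double the area in one tail" rule can be checked numerically for the z = 1.78 shown in the figure. This is a small sketch using Python's standard library rather than Table A; the function name is hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Two-sided P-value: twice the standard normal area beyond |z|."""
    one_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * one_tail

print(two_sided_p_value(1.78))   # about 0.075
```

At α = .05 this P-value would not be small enough to reject H0, although the same z would be significant against a one-sided alternative in the matching direction (one-tail area about 0.0375).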

There is an important relationship between confidence intervals and two-sided hypothesis tests: A level α two-sided significance test rejects H0: µ = µ0 exactly when the hypothesized value µ0 falls outside a level (1-α) confidence interval for µ. In other words, if we can’t say that the hypothesized value of µ, µ0, is in our confidence interval, then we would reject a two-sided hypothesis about µ. That is, values not in our confidence interval would seem to be not compatible with our data… i.e., they would be rejected by our data…

Ex: Your sample gives a 99% confidence interval of x̄ ± MOE = 0.84 ± 0.0101. With 99% confidence, could samples be from populations with µ = 0.86? µ = 0.85?

[Figure: Logic of confidence interval test. The 99% C.I. is centered at x̄; µ = 0.85 falls inside it and µ = 0.86 falls outside it.]

Cannot reject H0: µ = 0.85

Reject H0: µ = 0.86
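The confidence-interval form of the two-sided test amounts to a simple range check. The sketch below applies it to the 99% interval 0.84 ± 0.0101 from the example; the function name `rejects` is an assumption made for illustration.

```python
def rejects(mu0, estimate, moe):
    """Two-sided test via a confidence interval:
    reject H0: mu = mu0 exactly when mu0 lies outside estimate +/- moe."""
    lower, upper = estimate - moe, estimate + moe
    return not (lower <= mu0 <= upper)

# 99% CI of 0.84 +/- 0.0101, i.e. roughly (0.8299, 0.8501)
print(rejects(0.85, estimate=0.84, moe=0.0101))   # False: cannot reject H0
print(rejects(0.86, estimate=0.84, moe=0.0101))   # True: reject H0
```

Because the interval is a 99% interval, these decisions correspond to a two-sided test at level α = .01.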

A confidence interval gives a black and white answer: Reject or don't reject H0.

But it also estimates a range of likely values for the true population mean µ.

A P-value quantifies how strong the evidence is against H0. But if you reject H0, it doesn’t provide any information about the true population mean µ.

Use and Abuse of Hypothesis Tests…

The null hypothesis asserts “no effect” or “no difference”, while the alternative is a research hypothesis asserting that the effect is present or there is a difference…

The P-value gives a way of measuring the amount of evidence provided by the data against H0. “There is no sharp border between ‘significant’ and ‘not significant’, only increasingly strong evidence as the P-value decreases.” See R.A. Fisher’s opinion on choosing the level of significance for a test at the top of page 396…

Statistical significance is not the same as practical significance. Don’t forget to explore your data thoroughly before doing hypothesis testing…

• Don’t ignore lack of significance – believing an effect is present and not finding it could be important.

• Badly designed surveys and experiments cannot be improved by hypothesis testing…

• “The reasoning behind statistical significance works well if you decide what effect you’re seeking, design an experiment or sample to search for it, and use a test of significance to weigh the evidence you get”. But be careful about “searching for significance”… see page 399, example 6.28… many tests run at once on the same data will likely turn up some significant results by chance even if all the null hypotheses are true!
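The warning about "searching for significance" can be quantified with a short calculation. The sketch below makes a simplifying assumption that is not in the slides: the k tests are independent and every null hypothesis is true, so each test has probability α of a false "significant" result. The function name and k = 20 are chosen only for illustration.

```python
def chance_of_false_positive(k, alpha=0.05):
    """P(at least one of k independent tests is significant at level alpha)
    when all k null hypotheses are true: 1 - (1 - alpha)^k."""
    return 1 - (1 - alpha) ** k

print(chance_of_false_positive(1))    # about 0.05
print(chance_of_false_positive(20))   # about 0.64
print(20 * 0.05)                      # expected number of false positives: 1.0
```

With real data the tests are often correlated, so the exact probability differs, but the message is the same: run enough tests and some will come out significant by chance.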

• HW: Read section 6.1
• Do # 6.1-6.7, 6.10-6.12, 6.17, 6.18, 6.23-6.26, 6.35

• HW: Read section 6.2 – take some time to understand the logic of hypothesis testing
• pay particular attention to the p-value
• Do #6.37-6.50, 6.53, 6.55-6.65, 6.68-6.71, 6.77-6.79
