Mar 30, 2015
ES9 Chapter 9 ~ Inferences Involving One Population
Figure: Student’s t-distributions for df = 5, 15, and 25
Chapter Goals
• Previously: learned about confidence intervals and hypothesis testing
• Previously: assumed σ was known
• Consider both types of inference about μ when σ is unknown
• Consider both types of inference about p, the binomial probability of success
• Consider hypothesis testing about σ, the population standard deviation
9.1 ~ Inferences About the Mean μ (σ Unknown)
• Inferences about μ are based on the sample mean x̄
• If the sample size is large or the sampled population is normal, $z^* = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ has a standard normal distribution
• If σ is unknown, use s as a point estimate for σ
• Estimated standard error of the mean: $\dfrac{s}{\sqrt{n}}$
• Test statistic: $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
Student’s t-Statistic
1. When s is used as an estimate for σ, the test statistic has two sources of variation: x̄ and s
2. The resulting test statistic, $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$, is known as the Student’s t-statistic
3. Assumption: samples are taken from normal populations
4. The population standard deviation, σ, is almost never known in real-world problems
The standard error will almost always be estimated using $\dfrac{s}{\sqrt{n}}$
Almost all real-world inference about the population mean will be completed using the Student’s t-statistic
Properties of the t-Distribution (df > 2)
1. t is distributed with a mean of 0 (df > 1)
2. t is distributed symmetrically about its mean
3. t is distributed so as to form a family of distributions, a separate distribution for each different number of degrees of freedom
4. The t-distribution approaches the normal distribution as the number of degrees of freedom increases
5. t is distributed with a variance greater than 1 (the variance is df/(df − 2)), but as the degrees of freedom increase, the variance approaches 1
6. t is distributed so as to be less peaked at the mean and thicker in the tails than the normal distribution
Student’s t-Distributions
Figure: Student’s t-distributions for df = 5 and df = 15 compared with the normal distribution
Degrees of Freedom, df: A parameter that identifies each different distribution of Student’s t-distribution. For the methods presented in this chapter, the value of df will be the sample size minus 1, df = n − 1.
Notes
1. The number of degrees of freedom associated with s² is the divisor (n − 1) used to define the sample variance s². Thus: df = n − 1
2. The number of degrees of freedom is the number of unrelated deviations available for use in estimating σ²
3. The table for Student’s t-distribution (Table 6 in Appendix B) is a table of critical values; the left column gives df. When df > 100, the critical values of the t-distribution are approximately the same as the corresponding critical values of the standard normal distribution.
4. Notation: t(df, α), read as “t of df, alpha”: the value of t with area α in the right-hand tail
Figure: t-distribution showing t(df, α), with area α in the right-hand tail
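Example: Find the value of t(12, 0.025)
Portion of Table 6: the entry in the row df = 12, under the column for 0.025 (amount of α in one tail), is 2.18
Figure: t-distribution with area 0.025 in each tail, beyond −t(12, 0.025) = −2.18 and t(12, 0.025) = 2.18
t(12, 0.025) = 2.18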
Notes
1. If the df is not listed in the left-hand column of Table 6, use the next smaller value of df that is listed
2. Most computer software packages will calculate either the area related to a specified t-value or the t-value that bounds a specified area
3. The cumulative distribution function (CDF) is often used to find the cumulative probability, the area from −∞ to t
4. If the area from −∞ to t is known and the value of t is wanted, then the inverse cumulative distribution function (INVCDF) is used
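The CDF and INVCDF commands described in these notes can be mimicked without statistical software. The sketch below assumes Python 3 with only the standard library; the helper names `t_pdf`, `t_cdf`, `t_invcdf`, and `t_critical` are hypothetical, and the numerical methods (Simpson's rule for the area, bisection for the inverse) are choices made here, not the method of any particular package. `t_critical(df, alpha)` follows the t(df, α) notation, with α in the right-hand tail.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """CDF: area from -infinity to t, via Simpson's rule on [0, |t|].

    `steps` must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_a = total * h / 3
    # Symmetry about 0: half of the total area lies to the left of 0.
    return 0.5 + area_0_to_a if t >= 0 else 0.5 - area_0_to_a


def t_invcdf(area: float, df: int) -> float:
    """INVCDF: the t-value with `area` to its left, found by bisection."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_critical(df: int, alpha: float) -> float:
    """t(df, alpha): the t-value with area alpha in the right-hand tail."""
    return t_invcdf(1 - alpha, df)


print(round(t_critical(12, 0.025), 2))  # 2.18, matching Table 6
```

Calling `t_critical(16, 0.025)` in the same way gives the 2.12 used in the tax-return example. Simpson's rule with 2000 panels is far more precise than a two-decimal table, at some cost in speed.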
The Assumption...
The assumption for inferences about the mean μ when σ is unknown: the sampled population is normally distributed
Confidence Interval Procedure:
1. The procedure for constructing confidence intervals is similar to that used when σ is known
2. Use t in place of z, and use s in place of σ
3. The formula for the confidence interval for μ is:
$\bar{x} - t(df, \alpha/2)\dfrac{s}{\sqrt{n}}$ to $\bar{x} + t(df, \alpha/2)\dfrac{s}{\sqrt{n}}$, where df = n − 1
Example: A study is conducted to learn how long it takes the typical taxpayer to complete their federal income tax return. A random sample of 17 income tax filers showed a mean time (in hours) of 7.8 and a standard deviation of 2.3. Find a 95% confidence interval for the true mean time required to complete a federal income tax return. Assume the time to complete the return is normally distributed.
Solution:
1. Parameter of interest: the mean time μ required to complete a federal income tax return
2. Confidence Interval Criteria
a. Assumptions: the sampled population is assumed normal; σ is unknown
b. Test statistic: t will be used
c. Confidence level: 1 − α = 0.95
Solution Continued
3. The Sample Evidence: n = 17, x̄ = 7.8, and s = 2.3
4. The Confidence Interval
a. Confidence coefficients: t(df, α/2) = t(16, 0.025) = 2.12
b. Maximum error: $E = t(16, 0.025)\dfrac{s}{\sqrt{n}} = (2.12)\dfrac{2.3}{\sqrt{17}} = (2.12)(0.5578) = 1.18$
c. Confidence limits: $\bar{x} - E$ to $\bar{x} + E$, that is, 7.8 − 1.18 to 7.8 + 1.18, or 6.62 to 8.98
5. The Results:
6.62 to 8.98 is the 95% confidence interval for μ
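The interval arithmetic can be checked with a few lines of Python. This is a minimal sketch; `t_interval` is a hypothetical helper, and it takes the critical value t(df, α/2) as an input (here 2.12 from Table 6) rather than computing it.

```python
import math


def t_interval(xbar, s, n, t_crit):
    """Return (xbar - E, xbar + E) with E = t(df, alpha/2) * s / sqrt(n).

    t_crit is the critical value t(n - 1, alpha/2), read from Table 6 or software.
    """
    max_error = t_crit * s / math.sqrt(n)  # E, the maximum error of estimate
    return xbar - max_error, xbar + max_error


# Tax-return example: n = 17, xbar = 7.8, s = 2.3, t(16, 0.025) = 2.12
low, high = t_interval(7.8, 2.3, 17, 2.12)
print(f"{low:.2f} to {high:.2f}")  # 6.62 to 8.98
```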
Hypothesis-Testing Procedure
1. The t-statistic is used to complete a hypothesis test about a population mean μ
2. The test statistic: $t^* = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$, with df = n − 1
3. The calculated t* is the number of estimated standard errors x̄ is from the hypothesized mean μ
4. Either the probability-value approach or the classical approach may be used
Example: A random sample of 25 students registering for classes showed the mean waiting time in the registration line was 22.6 minutes and the standard deviation was 8.0 minutes. Is there any evidence to support the student newspaper’s claim that registration time takes longer than 20 minutes? Use α = 0.05 and assume waiting time is approximately normal.
Solution:
1. The Set-up
a. Population parameter of concern: the mean waiting time μ spent in the registration line
b. State the null and alternative hypotheses:
Ho: μ = 20 (≤) (no longer than)
Ha: μ > 20 (longer than)
Solution Continued
2. The Hypothesis Test Criteria
a. Check the assumptions: the sampled population is approximately normal
b. Test statistic: t* with df = n − 1 = 24
c. Level of significance: α = 0.05
3. The Sample Evidence
a. Sample information: n = 25, x̄ = 22.6, and s = 8
b. Calculate the value of the test statistic:
$t^* = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{22.6 - 20}{8/\sqrt{25}} = \dfrac{2.6}{1.6} = 1.625$
Solution Continued
Using the p-Value Procedure:
4. The Probability Distribution
a. The p-value: P = P(t* > 1.625 | df = 24)
Using Table 6: place bounds on the p-value: 0.05 < P < 0.10
Using Table 7: read the p-value directly from the table for many situations: P ≈ 0.061
Notes: If this hypothesis test is done with the aid of a computer, the computer will most likely compute the p-value for you
b. The p-value is not smaller than the level of significance, α
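With software, the p-value in step 4a is computed rather than bounded. The following is a minimal stdlib-only Python sketch, assuming the hypothetical helpers `t_pdf` and `t_right_tail` and using Simpson's rule for the area (a choice made here, not a textbook method). Because it uses the unrounded t* = 1.625, the result should land inside the Table 6 bounds 0.05 < P < 0.10 and slightly below the Table 7 reading of 0.061.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_right_tail(t_star: float, df: int, steps: int = 2000) -> float:
    """P(t > t_star) for t_star >= 0: 0.5 minus the area from 0 to t_star."""
    h = t_star / steps
    total = t_pdf(0.0, df) + t_pdf(t_star, df)
    for i in range(1, steps):  # Simpson weights 4, 2, 4, ..., 4
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Registration example (right-tailed test, Ha: mu > 20)
xbar, mu0, s, n = 22.6, 20.0, 8.0, 25
t_star = (xbar - mu0) / (s / math.sqrt(n))
p_value = t_right_tail(t_star, n - 1)
print(f"t* = {t_star:.3f}, p-value = {p_value:.3f}")
```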
Solution Continued
Using the Classical Procedure:
4. The Probability Distribution
a. The critical value: t(24, 0.05) = 1.71
b. t* is not in the critical region
5. The Results
a. Decision: Fail to reject Ho
b. Conclusion: There is insufficient evidence to show the mean waiting time is greater than 20 minutes at the 0.05 level of significance
Example: A new study indicates that higher than normal (220) cholesterol levels are a good indicator of possible heart attacks. A random sample of 27 heart attack victims showed a mean cholesterol level of 231 and a standard deviation of 20. Is there any evidence to suggest the mean cholesterol level is higher than normal for heart attack victims? Use α = 0.01
Solution:
1. The Set-up
a. Population parameter of concern: The mean cholesterol level μ of heart attack victims
b. State the null and alternative hypotheses:
Ho: μ = 220 (≤) (mean is not greater than 220)
Ha: μ > 220 (mean is greater than 220)
Solution Continued
2. The Hypothesis Test Criteria
a. Assumptions: We will assume cholesterol level is at least approximately normal.
b. Test statistic: t* (σ unknown), df = n − 1 = 26
c. Level of significance: α = 0.01 (given)
3. The Sample Evidence
a. Sample information: n = 27, x̄ = 231, and s = 20
b. Calculate the value of the test statistic:
$t^* = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{231 - 220}{20/\sqrt{27}} = \dfrac{11}{3.849} = 2.858$
Solution Continued
4. The Probability Distribution
a. The critical value: t(26, 0.01) = 2.48
b. t* falls in the critical region
Figure: t-distribution with a critical region of area 0.01 to the right of t(26, 0.01) = 2.48; t* = 2.858 lies in it
5. The Results
a. Decision: Reject Ho
b. Conclusion: At the 0.01 level of significance, there is sufficient evidence to suggest the mean cholesterol level in heart attack victims is higher than normal
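The classical decisions in the two mean examples reduce to comparing t* with a table value. Here is a sketch in Python, assuming the hypothetical helpers `one_sample_t` and `right_tailed_decision`; the critical values 1.71 and 2.48 are the Table 6 entries quoted in the examples, not computed here.

```python
import math


def one_sample_t(xbar, mu0, s, n):
    """Return t* = (xbar - mu0) / (s / sqrt(n)) and its degrees of freedom n - 1."""
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1


def right_tailed_decision(t_star, t_crit):
    """Classical approach for Ha: mu > mu0; the critical region is t >= t(df, alpha)."""
    return "Reject Ho" if t_star >= t_crit else "Fail to reject Ho"


t_reg, df_reg = one_sample_t(22.6, 20.0, 8.0, 25)  # registration, alpha = 0.05
t_chol, df_chol = one_sample_t(231, 220, 20, 27)   # cholesterol, alpha = 0.01
print(df_reg, round(t_reg, 3), right_tailed_decision(t_reg, 1.71))     # 24 1.625 Fail to reject Ho
print(df_chol, round(t_chol, 3), right_tailed_decision(t_chol, 2.48))  # 26 2.858 Reject Ho
```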
9.2 ~ Inferences About the Binomial Probability of Success
• Possibly the most common inference of all
• Many examples of situations in which we are concerned about something either happening or not happening
• Two possible outcomes, and multiple independent trials
Background
1. p: the binomial parameter, the probability of success on a single trial
2. p': the observed or sample binomial probability, $p' = \dfrac{x}{n}$, where x represents the number of successes that occur in a sample consisting of n trials
3. For the binomial random variable x: $\mu = np$ and $\sigma = \sqrt{npq}$, where $q = 1 - p$
4. The distribution of x is approximately normal if n is larger than 20 and if np and nq are both larger than 5
Sampling Distribution of p'
If a sample of size n is randomly selected from a large population with p = P(success), then the sampling distribution of p' has:
1. a mean $\mu_{p'}$ equal to p,
2. a standard error $\sigma_{p'}$ equal to $\sqrt{\dfrac{pq}{n}}$, and
3. an approximately normal distribution if n is sufficiently large
In practice, use of the following guidelines will ensure approximate normality:
1. The sample size is greater than 20
2. The sample consists of less than 10% of the population
3. The products np and nq are both larger than 5
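The three properties of the sampling distribution of p' can be checked by simulation. This is a minimal sketch, assuming an illustrative population with p = 0.70 and samples of n = 300 (values chosen only for the demo, with an arbitrary seed); `simulate_p_prime` is a hypothetical helper, and independent trials stand in for sampling from a large population.

```python
import math
import random
import statistics


def simulate_p_prime(n, p, reps, seed=1):
    """Draw `reps` samples of n independent trials; return each sample's p' = x/n."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


n, p = 300, 0.70  # assumed population values, for illustration only
props = simulate_p_prime(n, p, reps=5000)

print(f"mean of p' = {statistics.mean(props):.4f}   (theory: p = {p})")
print(f"sd of p'   = {statistics.stdev(props):.4f}   "
      f"(theory: sqrt(pq/n) = {math.sqrt(p * (1 - p) / n):.4f})")
```

A histogram of `props` would also show the roughly normal shape described in property 3.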
The Assumptions...
The assumptions for inferences about the binomial parameter p: The n random observations forming the sample are selected independently from a population that is not changing during the sampling
Confidence Interval Procedure:
The unbiased sample statistic p' is used to estimate the population proportion p
The formula for the confidence interval for p is:
$p' - z(\alpha/2)\sqrt{\dfrac{p'q'}{n}}$ to $p' + z(\alpha/2)\sqrt{\dfrac{p'q'}{n}}$, where $p' = x/n$ and $q' = 1 - p'$
Example: A recent survey of 300 randomly selected fourth graders showed 210 participate in at least one organized sport during one calendar year. Find a 95% confidence interval for the proportion of fourth graders who participate in an organized sport during the year.
Solution:
1. Describe the population parameter of concern
The parameter of interest is the proportion p of fourth graders who participate in an organized sport during the year
2. Specify the confidence interval criteria
a. Check the assumptions: The sample was randomly selected. Each subject’s response was independent.
Solution Continued
b. Identify the probability distribution
z is the test statistic
p' is approximately normal: n = 300 > 20, np' = 300(210/300) = 210 > 5, and nq' = 300(90/300) = 90 > 5
c. Determine the level of confidence: 0.95
3. Collect and present sample evidence
Sample information: n = 300, and x = 210
The point estimate: $p' = \dfrac{x}{n} = \dfrac{210}{300} = 0.70$
Solution Continued
4. Determine the confidence interval
a. Determine the confidence coefficients: Using Table 4, Appendix B: z(α/2) = z(0.025) = 1.96
b. The maximum error of estimate:
$E = z(\alpha/2)\sqrt{\dfrac{p'q'}{n}} = 1.96\sqrt{\dfrac{(0.70)(0.30)}{300}} = 1.96\sqrt{0.0007} = (1.96)(0.0265) = 0.0519$
c. Find the lower and upper confidence limits:
$p' - E$ to $p' + E$, that is, 0.70 − 0.0519 to 0.70 + 0.0519, or 0.6481 to 0.7519
Solution Continued
d. The Results
0.6481 to 0.7519 is a 95% confidence interval for the true proportion of fourth graders who participate in an organized sport during the year
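The fourth-grader interval can be reproduced in Python. This sketch assumes Python 3.8+ so that `statistics.NormalDist` is available; it computes z(α/2) with `inv_cdf` (1.95996... instead of the rounded table value 1.96), which does not change the four-decimal limits. `proportion_interval` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def proportion_interval(x, n, confidence=0.95):
    """Return p' - E to p' + E with E = z(alpha/2) * sqrt(p'q'/n)."""
    p_hat = x / n
    q_hat = 1 - p_hat
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z(alpha/2), about 1.96 for 95%
    max_error = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - max_error, p_hat + max_error


low, high = proportion_interval(210, 300)
print(f"{low:.4f} to {high:.4f}")  # 0.6481 to 0.7519
```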
Sample Size Determination:
$n = \dfrac{[z(\alpha/2)]^2\, p^* q^*}{E^2}$ (always round up)
E: maximum error of estimate
1 − α: confidence level
p*: provisional value of p (q* = 1 − p*)
If no provisional values for p and q are given, use p* = q* = 0.5
Example: Determine the sample size necessary to estimate the true proportion of laboratory mice with a certain genetic defect. We would like the estimate to be within 0.015 with 95% confidence.
Solution:
1. Level of confidence: 1 − α = 0.95, z(α/2) = z(0.025) = 1.96
2. Desired maximum error is E = 0.015
3. No estimate of p given, use p* = q* = 0.5
4. Use the formula for n:
$n = \dfrac{[z(\alpha/2)]^2\, p^* q^*}{E^2} = \dfrac{(1.96)^2(0.5)(0.5)}{(0.015)^2} = \dfrac{0.9604}{0.000225} = 4268.44$, so n = 4269 (rounded up)
Note
Suppose we know the genetic defect occurs in approximately 1 of 80 animals. Use p* = 1/80 = 0.0125 (so q* = 0.9875):
$n = \dfrac{[z(\alpha/2)]^2\, p^* q^*}{E^2} = \dfrac{(1.96)^2(0.0125)(0.9875)}{(0.015)^2} = \dfrac{0.0474}{0.000225} = 210.75$, so n = 211 (rounded up)
As illustrated here, it is an advantage to have some indication of the value expected for p, especially as p becomes increasingly further from 0.5
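The sample-size formula translates directly into code. This is a sketch with a hypothetical helper `sample_size_for_proportion`; it uses z = 1.96 as in the example and rounds up with `math.ceil`, reproducing both sample sizes from the mouse example.

```python
import math


def sample_size_for_proportion(max_error, z=1.96, p_star=0.5):
    """n = [z(alpha/2)]^2 * p* q* / E^2, always rounded up to the next integer."""
    q_star = 1 - p_star
    return math.ceil(z ** 2 * p_star * q_star / max_error ** 2)


print(sample_size_for_proportion(0.015))               # 4269 (no estimate of p: p* = 0.5)
print(sample_size_for_proportion(0.015, p_star=1/80))  # 211  (p* = 1/80 = 0.0125)
```

Rounding up, rather than to the nearest integer, guarantees the maximum error is no larger than E.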
Hypothesis-Testing Procedure
For hypothesis tests concerning the binomial parameter p, use the test statistic z*:
$z^* = \dfrac{p' - p}{\sqrt{\dfrac{pq}{n}}}$, where $p' = \dfrac{x}{n}$
Example: (Probability-Value Approach) A hospital administrator believes that at least 75% of all adults have a routine physical once every two years. A random sample of 250 adults showed 172 had physicals within the last two years. Is there any evidence to refute the administrator's claim? Use α = 0.05
Solution
1. The Set-up
a. Population parameter of concern: the proportion of adults who have a physical every two years
b. State the null and alternative hypotheses:
Ho: p = 0.75 (≥)
Ha: p < 0.75
2. The Hypothesis Test Criteria
a. Assumptions: 250 adults independently surveyed
b. Test statistic: z*; n = 250, np = (250)(0.75) = 187.5 > 5, nq = (250)(0.25) = 62.5 > 5
c. Level of significance: α = 0.05
Solution Continued
3. The Sample Evidence
a. Sample information: n = 250, x = 172, and p' = 172/250 = 0.688
b. The test statistic:
$z^* = \dfrac{p' - p}{\sqrt{\dfrac{pq}{n}}} = \dfrac{0.688 - 0.75}{\sqrt{\dfrac{(0.75)(0.25)}{250}}} = \dfrac{-0.062}{\sqrt{0.00075}} = \dfrac{-0.062}{0.02738} = -2.26$
4. The Probability Distribution
a. The p-value: Use Table 3 Appendix B, Table 5 Appendix B, or use a computer
Solution Continued
P = P(z < −2.26) = 0.5000 − 0.4881 = 0.0119
Figure: standard normal curve with the p-value shaded to the left of z* = −2.26
b. The p-value is smaller than the level of significance, α
5. The Results
a. Decision: Reject Ho
b. Conclusion: There is evidence to suggest the proportion of adults who have a routine physical exam every two years is less than 0.75 at the 0.05 level of significance
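The p-value approach for the hospital example can be scripted with `statistics.NormalDist` (Python 3.8+). In this sketch `proportion_z` is a hypothetical helper; because z* is not rounded to −2.26, the computed p-value comes out a little below the table-based 0.0119, and either way it is below α = 0.05.

```python
import math
from statistics import NormalDist


def proportion_z(x, n, p0):
    """z* = (p' - p0) / sqrt(p0 q0 / n), using the hypothesized p0 in the standard error."""
    p_hat = x / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


z_star = proportion_z(172, 250, 0.75)
p_value = NormalDist().cdf(z_star)  # left-tailed test, Ha: p < 0.75
print(f"z* = {z_star:.2f}, p-value = {p_value:.4f}")
```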
Example: (Classical Procedure) A university bookstore employee in charge of ordering texts believes 65% of all students sell their statistics books back to the bookstore at the end of the class. To test this claim, 200 statistics students are selected at random and 141 plan to sell their texts back to the bookstore. Is there any evidence to suggest the proportion is different from 0.65? Use α = 0.01
Solution:
1. The Set-up
a. Population parameter of concern: p = the proportion of students who sell their statistics books back to the bookstore
b. The null and alternative hypotheses:
Ho: p = 0.65
Ha: p ≠ 0.65
Solution Continued
2. The Hypothesis Test Criteria
a. Assumptions: Sample randomly selected. Each subject’s response was independent of other responses.
b. Test statistic: z*; n = 200, np = (200)(0.65) = 130 > 5, nq = (200)(0.35) = 70 > 5
c. Level of significance: α = 0.01
3. The Sample Evidence
a. Sample information: n = 200, x = 141, and p' = 141/200 = 0.705
b. Calculate the value of the test statistic:
$z^* = \dfrac{p' - p}{\sqrt{\dfrac{pq}{n}}} = \dfrac{0.705 - 0.65}{\sqrt{\dfrac{(0.65)(0.35)}{200}}} = \dfrac{0.055}{\sqrt{0.0011375}} = \dfrac{0.055}{0.03373} = 1.63$
Solution Continued
4. The Probability Distribution
a. The critical values: −z(0.005) = −2.58 and z(0.005) = 2.58
b. z* is not in the critical region
Figure: standard normal curve with area 0.005 in each tail, beyond −2.58 and 2.58
5. The Results
a. Decision: Do not reject Ho
b. Conclusion: There is no evidence to suggest the true proportion of students who sell their statistics texts back to the bookstore is different from 0.65 at the 0.01 level of significance
Notes
1. There is a relationship between confidence intervals and two-tailed hypothesis tests when the level of confidence and the level of significance add up to 1
2. The confidence interval and the noncritical region have the same width (for p this holds only approximately, since the interval uses √(p'q'/n) while the test uses √(pq/n))
3. The point estimate is the center of the confidence interval, and the hypothesized value is the center of the noncritical region
4. If the hypothesized value of p is contained in the confidence interval, then the test statistic will be in the noncritical region
5. If the hypothesized value of p does not fall within the confidence interval, then the test statistic will be in the critical region
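Notes 4 and 5 can be seen with the bookstore numbers: the test at α = 0.01 and a 99% confidence interval should agree. This is a sketch assuming Python 3.8+ (`statistics.NormalDist`); it uses the unrounded critical value 2.576 rather than the table's 2.58. The test's standard error uses the hypothesized p while the interval's uses p', which is why the agreement is only approximate for proportions in general.

```python
import math
from statistics import NormalDist

x, n, p0, alpha = 141, 200, 0.65, 0.01
p_hat = x / n
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z(0.005), about 2.576

# Two-tailed test: the standard error uses the hypothesized p0
z_star = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
reject = abs(z_star) >= z_crit

# 99% confidence interval: the standard error uses p' and q'
e = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - e, p_hat + e
contains_p0 = low <= p0 <= high

print(f"z* = {z_star:.2f}, reject Ho: {reject}")
print(f"99% CI: {low:.4f} to {high:.4f}, contains 0.65: {contains_p0}")
```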
9.3 ~ Inferences About Variance & Standard Deviation
• Problems often arise that require us to make inferences about variability
• We usually use the variance to make inferences about variability
• Inferences about the variance of a normally distributed population use the chi-square, χ², distribution
Background
1. The chi-square distributions are a family of probability distributions
2. Each distribution is identified by a parameter called the number of degrees of freedom
Properties of the Chi-Square Distribution:
1. χ² is nonnegative in value; it is zero or positively valued
2. χ² is not symmetrical; it is skewed to the right
3. χ² is distributed so as to form a family of distributions, a separate distribution for each different number of degrees of freedom
Various Chi-Square Distributions
Figure: chi-square distributions for df = 6, 10, and 20
Critical Values for Chi-Square Distribution
1. Table 8 in Appendix B
2. χ²(df, α): the symbol used to identify the critical value of a chi-square distribution with df degrees of freedom. It denotes a point on the measurement axis such that α of the area lies to the right of that point
3. Since the chi-square distribution is not symmetrical, the critical values associated with right and left tails are given separately in Table 8
Figure: χ² distribution showing χ²(df, α), with area α to the right
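Example: Find the value of χ²(18, 0.025)
Portion of Table 8: the entry in the row df = 18, under the column for 0.025 (area in right-hand tail), is 31.5
χ²(18, 0.025) = 31.5
Figure: χ² distribution with area 0.025 to the right of χ²(18, 0.025) = 31.5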
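Example: Find the value of χ²(12, 0.99)
Portion of Table 8: an area of 0.99 to the right corresponds to an area of 0.010 in the left-hand tail; the entry in the row df = 12, under the column for 0.010 (area in left-hand tail), is 3.57
χ²(12, 0.99) = 3.57
Figure: χ² distribution with area 0.01 to the left and 0.99 to the right of χ²(12, 0.99) = 3.57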
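A program can stand in for Table 8. The sketch below assumes Python 3 with only the standard library and uses the power series of the regularized lower incomplete gamma function for the chi-square CDF, plus bisection for the critical value; these are numerical choices made here, and `chi2_cdf` and `chi2_critical` are hypothetical helper names. `chi2_critical(df, alpha)` follows the χ²(df, α) convention of area α to the right, so both table lookups in these examples can be reproduced.

```python
import math


def chi2_cdf(x, df):
    """Area to the left of x under the chi-square curve with df degrees of freedom.

    Evaluates the regularized lower incomplete gamma function P(df/2, x/2)
    with its power series.
    """
    if x <= 0:
        return 0.0
    a, y = df / 2, x / 2
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= y / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(y) - y - math.lgamma(a))


def chi2_critical(df, alpha):
    """chi^2(df, alpha): the value with area alpha to its right, by bisection."""
    target = 1 - alpha  # area to the left of the critical value
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < target:
        hi *= 2  # widen the bracket until it contains the answer
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(chi2_critical(18, 0.025), 1))  # 31.5, right-tail example
print(round(chi2_critical(12, 0.99), 2))   # 3.57, left-tail example
```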
Notes
1. The mean value of the chi-square distribution is df
2. When df > 2, the mean is located to the right of the mode (the value where the curve reaches its high point) and just to the right of the median (the value that splits the distribution, 50% on either side)
Illustration: a right-skewed χ² curve with, from left to right, the mode, the median x̃, and the mean x̄ marked
The Assumptions...
The assumption for inferences about the variance σ² or standard deviation σ: the sampled population is normally distributed
Hypothesis-Testing Procedure:
1. The statistical procedures for the standard deviation are very sensitive to nonnormal distributions (skewness, in particular). This makes it difficult to decide whether a significant result is due to the sample evidence or to a violation of the assumptions.
2. The test statistic: $\chi^{2*} = \dfrac{(n - 1)s^2}{\sigma^2}$
If the random sample is drawn from a normal population with variance σ², then χ²* has a chi-square distribution with n − 1 degrees of freedom
Example: A machine used to fill 5-gallon buckets of driveway sealer has a standard deviation of 2.5 ounces. A random sample of 24 buckets showed a standard deviation of 2.9 ounces. Is there any evidence to suggest an increase in variability at the 0.05 level of significance? Assume the amount of driveway sealer in a bucket is normally distributed.
Solution:
1. The Set-up
a. Population parameter of concern: the variance σ² for the amount of driveway sealer in a 5-gallon bucket
b. State the null and alternative hypotheses:
Ho: σ² = 6.25 (≤) (variance is not larger than 6.25)
Ha: σ² > 6.25 (variance is larger than 6.25)
Solution Continued
2. The Hypothesis Test Criteria
a. Check the assumptions: The sampled population is assumed to be normally distributed
b. Test statistic: χ²* with df = n − 1 = 24 − 1 = 23
c. Level of significance: α = 0.05
3. The Sample Evidence
a. Sample information: n = 24, s² = (2.9)² = 8.41
b. Calculate the value of the test statistic:
$\chi^{2*} = \dfrac{(n - 1)s^2}{\sigma^2} = \dfrac{(24 - 1)(8.41)}{6.25} = 30.95$
Solution Continued
4. The Probability Distribution (p-Value Approach)
a. The p-value: P = P(χ²* > 30.95 | df = 23). Use Table 8 Appendix B, or use a computer: 0.10 < P < 0.25
b. The p-value is larger than the level of significance, α
4. The Probability Distribution (Classical Approach)
a. The critical value: χ²(23, 0.05) = 35.2
b. χ²* is not in the critical region
5. The Results
a. Decision: Fail to reject Ho
b. Conclusion: There is insufficient evidence to show the variability has increased at the 0.05 level of significance
Notes
1. Use Table 8 in Appendix B to place bounds on the p-value
2. A calculator or computer may be used to find the exact p-value. Use the χ² probability or the χ² cumulative probability distribution commands.
3. If the sample data are skewed, a single outlier can greatly influence the standard deviation. Therefore, it is very important, especially with small samples, that the sampled population be normal; otherwise, the statistical procedure is not reliable.
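As note 2 says, the exact p-value for the bucket example can be computed. Here is a minimal stdlib-only Python sketch; `chi2_cdf` is a hypothetical helper that evaluates the chi-square CDF through the power series of the regularized lower incomplete gamma function (a method chosen here, not the one any calculator necessarily uses). The printed p-value should fall within the Table 8 bounds 0.10 < P < 0.25.

```python
import math


def chi2_cdf(x, df):
    """Area to the left of x under the chi-square curve with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, y = df / 2, x / 2
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:  # sum the series until terms are negligible
        term *= y / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(y) - y - math.lgamma(a))


n, s, sigma0 = 24, 2.9, 2.5                  # bucket example
chi2_star = (n - 1) * s ** 2 / sigma0 ** 2   # (n - 1)s^2 / sigma^2
p_value = 1 - chi2_cdf(chi2_star, n - 1)     # right-tailed: Ha: sigma^2 > 6.25
print(f"chi2* = {chi2_star:.2f}, p-value = {p_value:.3f}")
```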