7/28/2019 10 Chi Sq Anova
1/35
Elementary Statistics
Larson Farber
10 Chi-Square Tests and the F-Distribution
Goodness of Fit
Section 10.1
Chi-Square Distributions
Several important statistical tests use a probability distribution known as chi-square, denoted $\chi^2$.
$\chi^2$ is a family of distributions. The graph of the distribution depends on the number of degrees of freedom (number of free choices) in a statistical experiment.
The distributions are skewed right and are not symmetric. The value of $\chi^2$ is greater than or equal to 0.
[Figure: two chi-square density curves starting at 0, one for 1 or 2 d.f. and one for 3 or more d.f.]
Multinomial Experiments
A multinomial experiment is a probability experiment in which there are a fixed number of independent trials and there are more than two possible outcomes for each trial.
The probability for each outcome is fixed.
The sum of the probabilities of all possible outcomes is one.
A chi-square goodness-of-fit test is used to test whether a frequency distribution fits a specific distribution.
Chi-Square Test for Goodness-of-Fit
Example: A social service organization claims 50% of all marriages are the first marriage for both bride and groom, 12% are first for the bride only, 14% for the groom only, and 24% a remarriage for both.

First Marriage      %
Bride and groom    50
Bride only         12
Groom only         14
Neither            24

H0: The distribution of first-time marriages is 50% for both bride and groom, 12% for the bride only, 14% for the groom only, and 24% remarriages for both.
Ha: The distribution of first-time marriages differs from the claimed distribution.
Goodness-of-Fit Test
Observed frequency, O, is the frequency of the category found in the sample.
Expected frequency, E, is the calculated frequency for the category using the specified distribution: $E_i = np_i$.
In a survey of 103 married couples, find the expected number in each category, E = np.

First Marriage      %    E = np
Bride and groom    50    103(0.50) = 51.50
Bride only         12    103(0.12) = 12.36
Groom only         14    103(0.14) = 14.42
Neither            24    103(0.24) = 24.72
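
The expected frequencies above can be reproduced with a few lines of Python. This is only a sketch: the dictionary `claimed` and the helper `expected_frequencies` are names chosen here for illustration, and the proportions and sample size are the ones from the example.

```python
# Claimed distribution from the example, written as proportions (they sum to 1).
claimed = {
    "Bride and groom": 0.50,
    "Bride only": 0.12,
    "Groom only": 0.14,
    "Neither": 0.24,
}
n = 103  # number of married couples in the survey


def expected_frequencies(proportions, sample_size):
    """Return the expected frequency E_i = n * p_i for every category."""
    return {category: sample_size * p for category, p in proportions.items()}


expected = expected_frequencies(claimed, n)
for category, e in expected.items():
    print(f"{category:<16} E = {e:.2f}")
```

Because the proportions sum to one, the expected frequencies add up to the sample size.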
Chi-Square Test
If the observed frequencies are obtained from a random sample and each expected frequency is at least 5, the sampling distribution for the goodness-of-fit test is a chi-square distribution with k - 1 degrees of freedom (where k = the number of categories).
The test statistic is:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
O = observed frequency in each category
E = expected frequency in each category
A social service organization claims 50% of all marriages are the first marriage for both bride and groom, 12% are first for the bride only, 14% for the groom only, and 24% a remarriage for both. The results of a study of 103 randomly selected married couples are listed in the table. Test the distribution claimed by the agency. Use $\alpha = 0.01$.

First Marriage      f
Bride and groom    55
Bride only         12
Groom only         12
Neither            24

1. Write the null and alternative hypothesis.
H0: The distribution of first-time marriages is 50% for both bride and groom, 12% for the bride only, 14% for the groom only, and 24% remarriages for both.
Ha: The distribution of first-time marriages differs from the claimed distribution.
2. State the level of significance.
$\alpha = 0.01$
3. Determine the sampling distribution.
A chi-square distribution with 4 - 1 = 3 d.f.
4. Find the critical value.
$\chi^2_0 = 11.340$
5. Find the rejection region.
$\chi^2 > 11.340$
6. Find the test statistic.

First Marriage       %     O       E      (O - E)^2   (O - E)^2/E
Bride and groom     50    55    51.50      12.2500       0.2379
Bride only          12    12    12.36       0.1296       0.0105
Groom only          14    12    14.42       5.8564       0.4061
Neither             24    24    24.72       0.5184       0.0210
Total              100   103   103.00                    0.6755

$\chi^2 = 0.6755$
7. Make your decision.
The test statistic 0.6755 does not fall in the rejection region ($\chi^2 > 11.340$), so fail to reject H0.
8. Interpret your decision.
There is not enough evidence to conclude that the distribution of first-time marriages differs from the claimed distribution.
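
The goodness-of-fit steps can be sketched in Python as follows. The function name `chi_square_gof` is made up for this illustration, and the critical value is taken from the example as a constant instead of being computed from the chi-square distribution.

```python
observed = [55, 12, 12, 24]             # bride and groom, bride only, groom only, neither
proportions = [0.50, 0.12, 0.14, 0.24]  # claimed distribution
CRITICAL_VALUE = 11.34                  # from the example: 3 d.f.


def chi_square_gof(observed, proportions):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    # The chi-square approximation assumes every expected frequency is at least 5.
    if min(expected) < 5:
        raise ValueError("each expected frequency must be at least 5")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


chi2, df = chi_square_gof(observed, proportions)
reject_h0 = chi2 > CRITICAL_VALUE
print(f"chi-square = {chi2:.3f}, d.f. = {df}, reject H0: {reject_h0}")
```

The statistic comes out at about 0.675, which agrees with the table up to rounding, and the test fails to reject H0.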
Independence
Section 10.2
Test for Independence
A chi-square test may be used to determine whether two variables (e.g., gender and job performance) are independent. Two variables are independent if the occurrence of one of the variables does not affect the occurrence of the other.
The following contingency table reflects the gender and job performance evaluation of 220 accountants.

          Low   Average   Superior   Total
Male       22        81          9     112
Female     14        75         19     108
Total      36       156         28     220
Expected Values
Assuming the variables are independent, the expected value of each cell is:
$$E_{r,c} = \frac{(\text{sum of row } r)(\text{sum of column } c)}{\text{sample size}}$$
$E_{1,1} = (112)(36)/220 = 18.33$
$E_{1,2} = (112)(156)/220 = 79.42$
All other expected values can be found by subtracting from the total of the row or the column.

          Low     Average   Superior   Total
Male     18.33     79.42      14.25     112
Female   17.67     76.58      13.75     108
Total    36       156         28        220
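
A minimal sketch of the expected-value calculation for every cell of the contingency table; the function name `expected_counts` is chosen here for illustration.

```python
# Observed contingency table (rows: Male, Female; columns: Low, Average, Superior).
observed = [
    [22, 81, 9],
    [14, 75, 19],
]


def expected_counts(table):
    """Return E[r][c] = (row total) * (column total) / (sample size) for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    sample_size = sum(row_totals)
    return [[r * c / sample_size for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print("  ".join(f"{e:6.2f}" for e in row))
```

Computing every cell from the row and column totals gives the same values as the subtraction shortcut, since the expected counts keep the observed margins.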
Sampling Distribution
The sampling distribution is a $\chi^2$ distribution with degrees of freedom equal to:
(Number of rows - 1)(Number of columns - 1)
Example: Find the sampling distribution for a test of independence that has a contingency table of 4 rows and 3 columns.
The sampling distribution is a $\chi^2$ distribution with (4 - 1)(3 - 1) = 3 · 2 = 6 d.f.
Application
The following table reflects the gender and job performance evaluation of 220 accountants. Test the claim that gender and job performance are independent. Use $\alpha = 0.05$.

          Low   Average   Superior   Total
Male       22        81          9     112
Female     14        75         19     108
Total      36       156         28     220

1. Write the null and alternative hypothesis.
H0: Gender and job performance are independent.
Ha: Gender and job performance are not independent.
2. State the level of significance.
$\alpha = 0.05$
3. Determine the sampling distribution.
Since there are 2 rows and 3 columns, the sampling distribution is a chi-square distribution with (2 - 1)(3 - 1) = 2 d.f.
4. Find the critical value.
$\chi^2_0 = 5.990$
5. Find the rejection region.
$\chi^2 > 5.990$
6. Find the test statistic.
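Chi-Square Test

Cell                  O        E     (O - E)^2   (O - E)^2/E
Male, low            22    18.33       13.49        0.74
Male, average        81    79.42        2.50        0.03
Male, superior        9    14.25       27.61        1.94
Female, low          14    17.67       13.49        0.76
Female, average      75    76.58        2.50        0.03
Female, superior     19    13.75       27.61        2.01
Total               220   220.00                    5.51

$\chi^2 = 5.51$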
7. Make your decision.
The test statistic, 5.51, does not fall in the rejection region ($\chi^2 > 5.99$), so fail to reject H0.
8. Interpret your decision.
There is not enough evidence to conclude that gender and job performance are dependent. The data give no basis for hiring accountants according to gender.
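
The test of independence can be sketched in the same way. The function name `chi_square_independence` is hypothetical, and the critical value is the one quoted in the example, not one computed by the code.

```python
# Observed contingency table (rows: Male, Female; columns: Low, Average, Superior).
observed = [
    [22, 81, 9],
    [14, 75, 19],
]
CRITICAL_VALUE = 5.99  # from the example: 2 d.f.


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    sample_size = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(table):
        for c, o in enumerate(row):
            e = row_totals[r] * col_totals[c] / sample_size  # expected count
            statistic += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


chi2, df = chi_square_independence(observed)
reject_h0 = chi2 > CRITICAL_VALUE
print(f"chi-square = {chi2:.2f}, d.f. = {df}, reject H0: {reject_h0}")
```

The statistic is about 5.51 with 2 d.f., which is below the critical value, so the sketch reaches the same decision as the worked example.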
Comparing Two Variances
Section 10.3
Two-Sample Test for Variances
To compare population variances, $\sigma_1^2$ and $\sigma_2^2$, use the F-distribution.
Let $s_1^2$ and $s_2^2$ represent the sample variances of two different populations. If both populations are normal and the population variances, $\sigma_1^2$ and $\sigma_2^2$, are equal, then the sampling distribution of $F = s_1^2 / s_2^2$ is called an F-distribution. $s_1^2$ always represents the larger of the two variances.
[Figure: F-distribution density curve with d.f.N = 8 and d.f.D = 20]
F-Test for Variances
To test whether variances of two normally distributed populations are equal, randomly select a sample from each population.
Let $s_1^2$ and $s_2^2$ represent the sample variances, where $s_1^2 \geq s_2^2$.
The test statistic is:
$$F = \frac{s_1^2}{s_2^2}$$
The sampling distribution is an F-distribution with numerator d.f. = $n_1 - 1$ and denominator d.f. = $n_2 - 1$.
In F-tests for equal variances, only use the right-tail critical value. For a right-tailed test, use the critical value in the table for the given $\alpha$. For a two-tailed test, use the right-hand critical value corresponding to $\alpha/2$.
Application
An engineer wants to perform a t-test to see if the mean gas consumption of Car A is lower than that for Car B. A random sample of the gas consumption of 16 Car As has a standard deviation of 4.5. A random sample of the gas consumption of 22 Car Bs has a standard deviation of 4.2. Should the engineer use the t-test with equal variances or the one for unequal variances? Use $\alpha = 0.05$.
Since the sample variance for Car A is larger than that for Car B, use $s_1^2$ to represent the sample variance for Car A.
1. Write the null and alternative hypothesis.
H0: $\sigma_1^2 = \sigma_2^2$
Ha: $\sigma_1^2 \neq \sigma_2^2$
2. State the level of significance.
$\alpha = 0.05$
3. Determine the sampling distribution.
An F-distribution with d.f.N = 15, d.f.D = 21.
4. Find the critical value.
The test is two-tailed, so the right-tail area is 0.025 and $F_0 = 2.53$.
5. Find the rejection region.
$F > 2.53$
6. Find the test statistic.
$F = s_1^2 / s_2^2 = 4.5^2 / 4.2^2 = 20.25 / 17.64 \approx 1.148$
7. Make your decision.
Since F = 1.148 does not fall in the rejection region (right-tail area 0.025), fail to reject the null hypothesis.
8. Interpret your decision.
There is not enough evidence to reject the claim that the variances are equal. In performing a t-test for the means of the two populations, use the test for equal variances.
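
A short sketch of the two-sample F-test for this example. The helper `f_statistic` is a name invented here; it puts the larger sample variance in the numerator, as the procedure requires, and the critical value 2.53 is copied from the example.

```python
CRITICAL_VALUE = 2.53  # from the example: d.f.N = 15, d.f.D = 21, right-tail area 0.025


def f_statistic(s_a, n_a, s_b, n_b):
    """Return (F, numerator d.f., denominator d.f.) with the larger variance on top."""
    var_a, var_b = s_a ** 2, s_b ** 2
    if var_a < var_b:
        # Swap so that the first sample always has the larger variance.
        var_a, var_b, n_a, n_b = var_b, var_a, n_b, n_a
    return var_a / var_b, n_a - 1, n_b - 1


# Car A: s = 4.5 from 16 cars; Car B: s = 4.2 from 22 cars.
f_value, df_n, df_d = f_statistic(4.5, 16, 4.2, 22)
reject_h0 = f_value > CRITICAL_VALUE
print(f"F = {f_value:.3f}, d.f.N = {df_n}, d.f.D = {df_d}, reject H0: {reject_h0}")
```

The ratio 20.25 / 17.64 is about 1.148, well below 2.53, so the equal-variance t-test is the one indicated.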
Analysis of Variance
Section 10.4
ANOVA
One-way analysis of variance (ANOVA) is a hypothesis testing technique that is used to compare means from three or more populations.
H0: $\mu_1 = \mu_2 = \dots = \mu_k$ (All population means are equal.)
Ha: At least one of the means is different from the others.
The variance is calculated in two different ways and the ratio of the two values is formed.
1. MSB, Mean Square Between, the variance between samples, measures the differences related to the treatment given to each sample.
2. MSW, Mean Square Within, the variance within samples, measures the differences related to entries within the same sample. The variance within samples is due to sampling error.
Mean Square Between
Each group is given a different treatment. The variation from the grand mean (mean of all values in all groups) is measured. The treatment (or factor) is the variable that distinguishes members of one sample from another.
First calculate $SS_B$ and then divide by k - 1, the degrees of freedom (k = the number of treatments or factors).
$$SS_B = \sum n_i (\bar{x}_i - \bar{x})^2, \qquad MS_B = \frac{SS_B}{k - 1}$$
Mean Square Within
Calculate $SS_W$ and divide by N - k, the degrees of freedom (N = the total number of entries).
$$SS_W = \sum (n_i - 1) s_i^2, \qquad MS_W = \frac{SS_W}{N - k}$$
If MSB is close in value to MSW, the variation is not attributed to different effects the different treatments have on the variable. The ratio of the two measures (F-ratio) is close to 1.
If MSB is significantly greater than MSW, the variation is probably due to differences in the treatments or factors, and the F-ratio will be significantly greater than 1.
Analysis of Variance
The table shows the annual amount spent on reading (in $) for a random sample of American consumers from four regions. At $\alpha = 0.10$, can you conclude that the mean annual amounts spent are different?

Northeast   Midwest   South   West
   308        246      103     223
    58        169      143     184
   141        246      164     221
   109        158      119     269
   220        167       99     199
   144         76      214     171
   316                 108     204

1. Write the null and alternative hypothesis.
H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$ (All population means are equal.)
Ha: At least one of the means is different from the others.
2. State the level of significance.
$\alpha = 0.10$
3. Determine the sampling distribution.
An F-distribution with d.f.N = 3, d.f.D = 23.
4. Find the critical value.
$F_0 = 2.34$
5. Find the rejection region.
$F > 2.34$ (right-tail area 0.10)
6. Find the test statistic.
Calculate the mean and variance for each sample.

Region       n     mean    variance
Northeast    7   185.14     9838.66
Midwest      6   177.00     4050.05
South        7   135.71     1741.39
West         7   210.14     1020.80

Calculate the mean of all values.
$\bar{x} = 4779 / 27 = 177.0$
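Mean Square Between

Sample    mean    n   $(\bar{x}_i - \bar{x})^2$   $n_i(\bar{x}_i - \bar{x})^2$
1        185.14   7         66.26                      463.8
2        177.00   6          0.00                        0.0
3        135.71   7       1704.86                    11934.0
4        210.14   7       1098.26                     7687.8

$SS_B = 20085.6$, so $MS_B = 20085.6 / 3 \approx 6695.2$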
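Mean Square Within

Sample   n     $s^2$     $(n_i - 1)s_i^2$
1        7   9838.66        59031.9
2        6   4050.05        20250.2
3        7   1741.39        10448.4
4        7   1020.80         6124.8

$SS_W = 95855.3$, so $MS_W = 95855.3 / 23 \approx 4167.6$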
7. Make your decision.
Since $F = MS_B / MS_W \approx 1.61$ does not fall in the rejection region ($F > 2.34$, right-tail area 0.10), fail to reject the null hypothesis.
8. Interpret your decision.
There is not enough evidence to support the claim that the means are not all equal. The data do not show that mean reading expenses differ among the four regions.
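
The whole one-way ANOVA calculation can be sketched from the raw data. The sample values below are read from the table in the example, and their group sizes and means match the summary values shown (n = 7, 6, 7, 7). The function name `one_way_anova` is chosen here for illustration, and the critical value 2.34 is the one given for d.f.N = 3, d.f.D = 23.

```python
from statistics import mean, variance

samples = {
    "Northeast": [308, 58, 141, 109, 220, 144, 316],
    "Midwest": [246, 169, 246, 158, 167, 76],
    "South": [103, 143, 164, 119, 99, 214, 108],
    "West": [223, 184, 221, 269, 199, 171, 204],
}
CRITICAL_VALUE = 2.34  # from the example: alpha = 0.10, d.f.N = 3, d.f.D = 23


def one_way_anova(groups):
    """Return (F, d.f.N, d.f.D) where F = MSB / MSW."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-samples sum of squares: n_i * (group mean - grand mean)^2.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-samples sum of squares: (n_i - 1) * sample variance.
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k


f_value, df_n, df_d = one_way_anova(list(samples.values()))
reject_h0 = f_value > CRITICAL_VALUE
print(f"F = {f_value:.2f}, d.f.N = {df_n}, d.f.D = {df_d}, reject H0: {reject_h0}")
```

Working from unrounded means and variances, the ratio is about 1.61, in line with the F value in the Minitab output, and the test fails to reject H0.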
Minitab Output
One-way Analysis of Variance

Analysis of Variance
Source   DF       SS     MS      F       P
Factor    3    20085   6695   1.61   0.215
Error    23    95857   4168
Total    26   115942

Using the P-value method, fail to reject the null hypothesis, since 0.215 > 0.10. There is not enough evidence to support the claim that the amount spent on reading is different in different regions.