10 Chi Sq Anova

7/28/2019 10 Chi Sq Anova

Elementary Statistics, Larson and Farber

Chapter 10: Chi-Square Tests and the F-Distribution



Goodness of Fit

Section 10.1


Chi-Square Distributions

Several important statistical tests use a probability distribution known as chi-square, denoted $\chi^2$.

$\chi^2$ is a family of distributions. The graph of the distribution depends on the number of degrees of freedom (number of free choices) in a statistical experiment.

The distributions are skewed right and are not symmetric. The value of $\chi^2$ is greater than or equal to 0.

[Figure: two chi-square density curves starting at 0, one for 1 or 2 d.f. and one for 3 or more d.f.]


Multinomial Experiments

A multinomial experiment is a probability experiment in which there are a fixed number of independent trials and there are more than two possible outcomes for each trial.

The probability for each outcome is fixed.

The sum of the probabilities of all possible outcomes is one.

A chi-square goodness-of-fit test is used to test whether a frequency distribution fits a specific distribution.


Chi-Square Test for Goodness-of-Fit

Example: A social service organization claims 50% of all marriages are the first marriage for both bride and groom, 12% are first for the bride only, 14% for the groom only, and 24% a remarriage for both.

First Marriage     %
Bride and groom    50
Bride only         12
Groom only         14
Neither            24

H0: The distribution of first-time marriages is 50% for both bride and groom, 12% for the bride only, 14% for the groom only, and 24% are remarriages for both.

Ha: The distribution of first-time marriages differs from the claimed distribution.


Goodness-of-Fit Test

Observed frequency, O, is the frequency of the category found in the sample.

Expected frequency, E, is the calculated frequency for the category using the specified distribution: $E_i = n p_i$.

In a survey of 103 married couples, find the expected number in each category.

First Marriage     %     E = np
Bride and groom    50    103(0.50) = 51.50
Bride only         12    103(0.12) = 12.36
Groom only         14    103(0.14) = 14.42
Neither            24    103(0.24) = 24.72
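The expected frequencies for the marriage survey can be checked with a few lines of Python. This is a minimal sketch of $E_i = n p_i$; the names `claimed` and `expected_frequencies` are illustrative and do not come from the slides.

```python
# Claimed distribution from the example, written as proportions.
claimed = {
    "Bride and groom": 0.50,
    "Bride only": 0.12,
    "Groom only": 0.14,
    "Neither": 0.24,
}
n = 103  # number of couples in the survey


def expected_frequencies(n, proportions):
    """Return E_i = n * p_i for every category."""
    return {category: n * p for category, p in proportions.items()}


expected = expected_frequencies(n, claimed)
for category, e in expected.items():
    print(f"{category:<16} E = {e:.2f}")
```

Because the claimed proportions sum to one, the expected frequencies sum to the sample size.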


Chi-Square Test

If the observed frequencies are obtained from a random sample and each expected frequency is at least 5, the sampling distribution for the goodness-of-fit test is a chi-square distribution with $k - 1$ degrees of freedom (where $k$ = the number of categories).

The test statistic is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

O = observed frequency in each category
E = expected frequency in each category


A social service organization claims 50% of all marriages are the first marriage for both bride and groom, 12% are first for the bride only, 14% for the groom only, and 24% a remarriage for both. The results of a study of 103 randomly selected married couples are listed in the table. Test the distribution claimed by the agency. Use $\alpha = 0.01$.

First Marriage     f
Bride and groom    55
Bride only         12
Groom only         12
Neither            24

1. Write the null and alternative hypotheses.

H0: The distribution of first-time marriages is 50% for both bride and groom, 12% for the bride only, 14% for the groom only, and 24% are remarriages for both.

Ha: The distribution of first-time marriages differs from the claimed distribution.

2. State the level of significance.

$\alpha = 0.01$


3. Determine the sampling distribution.

A chi-square distribution with $4 - 1 = 3$ d.f.

4. Find the critical value.

$\chi^2_0 = 11.34$

5. Find the rejection region.

$\chi^2 > 11.34$

6. Find the test statistic.

                   %      O      E         $(O - E)^2$   $(O - E)^2 / E$
Bride and groom    50     55     51.50     12.2500       0.2379
Bride only         12     12     12.36     0.1296        0.0105
Groom only         14     12     14.42     5.8564        0.4061
Neither            24     24     24.72     0.5184        0.0210
Total              100    103    103.00                  0.6755

$\chi^2 = \sum \frac{(O - E)^2}{E} = 0.6755$
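The table arithmetic can be reproduced as follows. This is a sketch, not the authors' procedure: the variable names are illustrative, and the critical value is simply the table value quoted on the slide (rounded to 11.34) rather than something the code computes.

```python
# Observed counts and claimed proportions, in the order:
# bride and groom, bride only, groom only, neither.
observed = [55, 12, 12, 24]
claimed = [0.50, 0.12, 0.14, 0.24]

n = sum(observed)                     # 103 couples
expected = [n * p for p in claimed]   # E = n * p for each category

# The chi-square approximation requires every expected frequency to be at least 5.
assert all(e >= 5 for e in expected)

# One term (O - E)^2 / E per category, as in the last column of the table.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(terms)

degrees_of_freedom = len(observed) - 1   # k - 1
critical_value = 11.34                   # quoted from the slide for 3 d.f. (assumption: not computed here)
reject_h0 = chi_square > critical_value

print(f"chi-square = {chi_square:.4f}, d.f. = {degrees_of_freedom}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

The statistic is far below the critical value, so the test fails to reject H0.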


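7. Make your decision.

[Figure: chi-square curve with 3 d.f.; the rejection region lies to the right of the critical value 11.34.]

The test statistic 0.6755 does not fall in the rejection region, so fail to reject H0.

8. Interpret your decision.

There is not enough evidence to conclude that the distribution of first-time marriages differs from the claimed distribution.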



Independence

Section 10.2


Test for Independence

A chi-square test may be used to determine whether two variables (e.g., gender and job performance) are independent. Two variables are independent if the occurrence of one of the variables does not affect the occurrence of the other.

The following contingency table reflects the gender and job performance evaluation of 220 accountants.

          Low    Average    Superior    Total
Male      22     81         9           112
Female    14     75         19          108
Total     36     156        28          220


Expected Values

Assuming the variables are independent, then the expected value of each cell is:

$$E_{r,c} = \frac{(\text{sum of row } r)(\text{sum of column } c)}{\text{sample size}}$$

$E_{1,1} = (112)(36)/220 = 18.33$
$E_{1,2} = (112)(156)/220 = 79.42$

All other expected values can be found by subtracting from the total of the row or the column.

          Low      Average    Superior    Total
Male      18.33    79.42      14.25       112
Female    17.67    76.58      13.75       108
Total     36       156        28          220
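The expected values in the table can be generated from the row and column totals alone. The sketch below uses the observed counts for the 220 accountants; the function name `expected_counts` is an illustrative choice, not part of the slides.

```python
# Observed counts: rows are male and female, columns are low, average, superior.
observed = [
    [22, 81, 9],
    [14, 75, 19],
]


def expected_counts(table):
    """Return E[r][c] = (row r total) * (column c total) / (grand total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for label, row in zip(["Male", "Female"], expected):
    print(label, [round(e, 2) for e in row])
```

Each row and column of the expected table has the same total as the observed table, which is why the remaining cells can be found by subtraction.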


Sampling Distribution

The sampling distribution is a $\chi^2$ distribution with degrees of freedom equal to:

(Number of rows - 1)(Number of columns - 1)

Example: Find the sampling distribution for a test of independence that has a contingency table of 4 rows and 3 columns.

The sampling distribution is a $\chi^2$ distribution with $(4 - 1)(3 - 1) = 3 \cdot 2 = 6$ d.f.
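As a quick check of the degrees-of-freedom rule, a small helper (the name is hypothetical) can be applied to any table shape:

```python
def independence_degrees_of_freedom(rows, columns):
    """Degrees of freedom for a chi-square test of independence."""
    if rows < 2 or columns < 2:
        raise ValueError("a contingency table needs at least 2 rows and 2 columns")
    return (rows - 1) * (columns - 1)


print(independence_degrees_of_freedom(4, 3))  # table with 4 rows and 3 columns
print(independence_degrees_of_freedom(2, 3))  # gender by job performance table
```

The totals row and totals column are not counted as rows or columns of the table.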


Application

The following table reflects the gender and job performance evaluation of 220 accountants. Test the claim that gender and job performance are independent. Use $\alpha = 0.05$.

          Low    Average    Superior    Total
Male      22     81         9           112
Female    14     75         19          108
Total     36     156        28          220

H0: Gender and job performance are independent.

Ha: Gender and job performance are not independent.


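Since there are 2 rows and 3 columns, the sampling distribution is a chi-square distribution with $(2 - 1)(3 - 1) = 2$ d.f.

[Figure: chi-square curve with 2 d.f.; the rejection region lies to the right of the critical value 5.99.]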




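Chi-Square Test

Cell                O      E         $(O - E)^2$   $(O - E)^2 / E$
Male, low           22     18.33     13.49         0.74
Male, average       81     79.42     2.50          0.03
Male, superior      9      14.25     27.61         1.94
Female, low         14     17.67     13.49         0.76
Female, average     75     76.58     2.50          0.03
Female, superior    19     13.75     27.61         2.01
Total               220    220.00                  5.51

$\chi^2 = \sum \frac{(O - E)^2}{E} = 5.51$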
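The whole test of independence fits in a short script. This is a sketch under the assumption that the critical value 5.99 (2 d.f.) is read from a chi-square table, as on the slides; the code does not compute it.

```python
# Observed counts: rows are male and female, columns are low, average, superior.
observed = [
    [22, 81, 9],
    [14, 75, 19],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for r, row in enumerate(observed):
    for c, o in enumerate(row):
        e = row_totals[r] * col_totals[c] / grand_total  # expected count under independence
        chi_square += (o - e) ** 2 / e

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 5.99  # quoted table value for 2 d.f. (assumption: not computed here)
reject_h0 = chi_square > critical_value

print(f"chi-square = {chi_square:.2f}, d.f. = {degrees_of_freedom}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

The statistic lands just below the critical value, so the test fails to reject H0 at this significance level.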


[Figure: chi-square curve with 2 d.f.; the rejection region lies to the right of the critical value 5.99.]

The test statistic, 5.51, does not fall in the rejection region, so fail to reject H0.

There is not enough evidence to conclude that gender and job performance are dependent, so the data give no basis for hiring accountants based on their gender.


Comparing Two Variances

Section 10.3


Two-Sample Test for Variances

To compare population variances, $\sigma_1^2$ and $\sigma_2^2$, use the F-distribution.

Let $s_1^2$ and $s_2^2$ represent the sample variances of two different populations. If both populations are normal and the population variances, $\sigma_1^2$ and $\sigma_2^2$, are equal, then the sampling distribution of $F = s_1^2 / s_2^2$ is called an F-distribution. $s_1^2$ always represents the larger of the two variances.

[Figure: F-distribution curve with d.f.N = 8 and d.f.D = 20.]


F-Test for Variances

To test whether variances of two normally distributed populations are equal, randomly select a sample from each population.

Let $s_1^2$ and $s_2^2$ represent the sample variances, where $s_1^2 \ge s_2^2$.

The test statistic is:

$$F = \frac{s_1^2}{s_2^2}$$

The sampling distribution is an F-distribution with numerator d.f. = $n_1 - 1$ and denominator d.f. = $n_2 - 1$.

In F-tests for equal variances, only use the right-tail critical value. For a right-tailed test, use the critical value listed in the table for the given $\alpha$. For a two-tailed test, use the right-hand critical value corresponding to $\frac{1}{2}\alpha$.


Application

An engineer wants to perform a t-test to see if the mean gas consumption of Car A is lower than that for Car B. A random sample of gas consumption of 16 Car As has a standard deviation of 4.5. A random sample of the gas consumption of 22 Car Bs has a standard deviation of 4.2. Should the engineer use the t-test with equal variances or the one for unequal variances? Use $\alpha = 0.05$.

Since the sample variance for Car A is larger than that for Car B, use $s_1^2$ to represent the sample variance for Car A.




3. Determine the sampling distribution.

An F-distribution with d.f.N = 15, d.f.D = 21.

[Figure: F-distribution curve with a right-tail rejection region of area 0.025 beyond the critical value 2.53.]


[Figure: F-distribution curve with a right-tail rejection region of area 0.025 beyond the critical value 2.53.]

Since F = 1.148 does not fall in the rejection region, fail to reject the null hypothesis.

There is not enough evidence to reject the claim that the variances are equal. In performing a t-test for the means of the two populations, use the test for equal variances.
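The F statistic for the car example can be verified directly from the two sample standard deviations. In this sketch the helper name `f_test_statistic` is illustrative, and the critical value 2.53 is the table value shown on the slides for d.f.N = 15 and d.f.D = 21, not a computed quantity.

```python
def f_test_statistic(var_x, n_x, var_y, n_y):
    """Return (F, d.f.N, d.f.D) with the larger sample variance in the numerator."""
    if var_x >= var_y:
        return var_x / var_y, n_x - 1, n_y - 1
    return var_y / var_x, n_y - 1, n_x - 1


# Sample standard deviations and sample sizes from the example.
s_a, n_a = 4.5, 16   # Car A
s_b, n_b = 4.2, 22   # Car B

f_stat, df_n, df_d = f_test_statistic(s_a ** 2, n_a, s_b ** 2, n_b)

critical_value = 2.53  # quoted table value, right-tail area 0.025 (assumption: not computed here)
reject_h0 = f_stat > critical_value

print(f"F = {f_stat:.3f}, d.f.N = {df_n}, d.f.D = {df_d}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Putting the larger variance in the numerator guarantees F is at least 1, which is why only the right-tail critical value is needed.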





Analysis of Variance

Section 10.4


ANOVA

One-way analysis of variance (ANOVA) is a hypothesis testing technique that is used to compare means from three or more populations.

H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$ (All population means are equal.)

Ha: At least one of the means is different from the others.

The variance is calculated in two different ways and the ratio of the two values is formed.

1. MSB, Mean Square Between, the variance between samples, measures the differences related to the treatment given to each sample.

2. MSW, Mean Square Within, the variance within samples, measures the differences related to entries within the same sample. The variance within samples is due to sampling error.


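Mean Square Between

Each group is given a different treatment. The variation from the grand mean (mean of all values in all groups) is measured. The treatment (or factor) is the variable that distinguishes members of one sample from another.

First calculate SSB and then divide by $k - 1$, the degrees of freedom. ($k$ = the number of treatments or factors.)

$$MS_B = \frac{SS_B}{k - 1} = \frac{\sum n_i (\bar{x}_i - \bar{\bar{x}})^2}{k - 1}$$

Here $n_i$ and $\bar{x}_i$ are the size and mean of sample $i$, and $\bar{\bar{x}}$ is the grand mean.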


Mean Square Within

Calculate SSW and divide by $N - k$, the degrees of freedom. ($N$ = the total number of values in all samples.)

$$MS_W = \frac{SS_W}{N - k} = \frac{\sum (n_i - 1) s_i^2}{N - k}$$

Here $n_i$ and $s_i^2$ are the size and variance of sample $i$.

If MSB is close in value to MSW, the variation is not attributed to different effects the different treatments have on the variable. The ratio of the two measures (F-ratio) is close to 1.

If MSB is significantly greater than MSW, the variation is probably due to differences in the treatments or factors, and the F-ratio will differ significantly from 1.


Analysis of Variance

The table shows the annual amount spent on reading (in $) for a random sample of American consumers from four regions. At $\alpha = 0.10$, can you conclude that the mean annual amounts spent are different?

Northeast    Midwest    South    West
308          246        103      223
58           169        143      184
141          246        164      221
109          158        119      269
220          167        99       199
144          76         214      171
316                     108      204

H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$ (All population means are equal.)

Ha: At least one of the means is different from the others.


An F-distribution with d.f.N = 3, d.f.D = 23.

[Figure: F-distribution curve with a right-tail rejection region of area 0.10 beyond the critical value 2.34.]




Calculate the mean and variance for each sample.

            Northeast    Midwest    South      West
Mean        185.14       177.00     135.71     210.14
Variance    9838.66      4050.05    1741.39    1020.80

Calculate the mean of all values (the grand mean).

$\bar{\bar{x}} = 177.00$



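Mean Square Between

Sample    Mean      n    $(\bar{x}_i - \bar{\bar{x}})^2$    $n_i(\bar{x}_i - \bar{\bar{x}})^2$
1         185.14    7    66.26                              463.8
2         177.00    6    0.00                               0.0
3         135.71    7    1704.86                            11934.0
4         210.14    7    1098.26                            7687.8

Summing the last column gives $SS_B = 20085.6$, so $MS_B = 20085.6 / 3 \approx 6695.2$.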


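Mean Square Within

Sample    n    $s^2$      $(n - 1)s^2$
1         7    9838.66    59031.9
2         6    4050.05    20250.2
3         7    1741.39    10448.4
4         7    1020.80    6124.8

Summing the last column gives $SS_W = 95855.3$, so $MS_W = 95855.3 / 23 \approx 4167.6$.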
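The between-sample and within-sample calculations can be reproduced from the raw data with Python's standard `statistics` module. This is a sketch under two assumptions: the assignment of values to regions is reconstructed from the sample sizes and means in the tables, and the critical value 2.34 is the table value quoted for d.f.N = 3, d.f.D = 23. Because the code works at full precision, its sums differ slightly from the hand-tabulated values and match the Minitab figures instead.

```python
from statistics import mean, variance

# Annual amount spent on reading (in $), by region (assumed column assignment).
samples = {
    "Northeast": [308, 58, 141, 109, 220, 144, 316],
    "Midwest": [246, 169, 246, 158, 167, 76],
    "South": [103, 143, 164, 119, 99, 214, 108],
    "West": [223, 184, 221, 269, 199, 171, 204],
}

k = len(samples)                                  # number of treatments
N = sum(len(values) for values in samples.values())
grand_mean = sum(sum(values) for values in samples.values()) / N

# SSB: sample size times squared distance of each sample mean from the grand mean.
ssb = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in samples.values())
# SSW: (n - 1) times the sample variance, summed over samples.
ssw = sum((len(v) - 1) * variance(v) for v in samples.values())

msb = ssb / (k - 1)   # mean square between
msw = ssw / (N - k)   # mean square within
f_stat = msb / msw

critical_value = 2.34  # quoted table value for alpha = 0.10 (assumption: not computed here)
reject_h0 = f_stat > critical_value

print(f"SSB = {ssb:.1f}, SSW = {ssw:.1f}")
print(f"MSB = {msb:.1f}, MSW = {msw:.1f}, F = {f_stat:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

`statistics.variance` uses the $n - 1$ denominator, which is what the $(n - 1)s^2$ column requires.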


[Figure: F-distribution curve with a right-tail rejection region of area 0.10 beyond the critical value 2.34.]

Since F = 1.61 does not fall in the rejection region, fail to reject the null hypothesis.

There is not enough evidence to support the claim that the means are not all equal. The data do not show that mean reading expenses differ among the four regions.


Minitab Output

One-way Analysis of Variance

Analysis of Variance
Source    DF    SS        MS      F       P
Factor    3     20085     6695    1.61    0.215
Error     23    95857     4168
Total     26    115942

Using the P-value method, fail to reject the null hypothesis, since 0.215 > 0.10. There is not enough evidence to support the claim that the amount spent on reading is different in different regions.
