10 Chi Sq Anova

Apr 03, 2018

La Je
  • 7/28/2019 10 Chi Sq Anova

    Elementary Statistics, Larson & Farber

    Chapter 10: Chi-Square Tests and the F-Distribution

  • Section 10.1: Goodness of Fit

  • Chi-Square Distributions

    Several important statistical tests use a probability distribution known as chi-square, denoted $\chi^2$.

    $\chi^2$ is a family of distributions. The graph of the distribution depends on the number of degrees of freedom (number of free choices) in a statistical experiment.

    The value of $\chi^2$ is greater than or equal to 0. The distributions are skewed right and are not symmetric.

    [Figure: chi-square density curves for 1 or 2 d.f. and for 3 or more d.f.]
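As a numerical illustration of the shapes described above, the sketch below evaluates the chi-square density from its standard formula using only Python's standard library. The helper names `chi2_pdf` and `chi2_mode` are illustrative, not part of any library; the mode rule (0 for 1 or 2 d.f., d.f. − 2 otherwise) is the standard result for this family.

```python
import math


def chi2_pdf(x: float, df: int) -> float:
    """Chi-square density with df degrees of freedom (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    half = df / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))


def chi2_mode(df: int) -> float:
    """Peak of the curve: at 0 for 1 or 2 d.f., at df - 2 for 3 or more d.f."""
    return float(max(df - 2, 0))


if __name__ == "__main__":
    for df in (1, 2, 3, 5, 10):
        # The mean of a chi-square distribution equals its d.f.; it lies to the
        # right of the mode, reflecting the right skew.
        print(f"d.f. = {df:2d}: mode = {chi2_mode(df):4.1f}, mean = {df}")
```

For 1 or 2 d.f. the curve is highest at 0 and decreases; for 3 or more d.f. it rises to a peak at d.f. − 2 and then tails off to the right.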

  • Multinomial Experiments

    A multinomial experiment is a probability experiment in which there are a fixed number of independent trials and there are more than two possible outcomes for each trial.

    The probability for each outcome is fixed.

    The sum of the probabilities of all possible outcomes is one.

    A chi-square goodness-of-fit test is used to test whether a frequency distribution fits a specific distribution.

  • Chi-Square Test for Goodness-of-Fit

    Example: A social service organization claims 50% of all marriages are the first marriage for both bride and groom, 12% are first for the bride only, 14% for the groom only, and 24% are a remarriage for both.

    First Marriage      %
    Bride and groom    50
    Bride only         12
    Groom only         14
    Neither            24

    H0: The distribution of first-time marriages is 50% for both bride and groom, 12% for the bride only, 14% for the groom only, and 24% remarriages for both.

    Ha: The distribution of first-time marriages differs from the claimed distribution.
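Under H0, each surveyed couple is one independent trial of a multinomial experiment with the four claimed probabilities. The following is a minimal simulation sketch using Python's `random` module; the function name `simulate_survey`, the sample size of 103 (taken from the study in this section), and the fixed seed are illustrative choices.

```python
import random
from collections import Counter

# Claimed distribution under H0 (category -> probability).
CLAIMED = {
    "Bride and groom": 0.50,
    "Bride only": 0.12,
    "Groom only": 0.14,
    "Neither": 0.24,
}


def simulate_survey(n: int, probs: dict[str, float], seed: int = 1) -> Counter:
    """Draw n independent multinomial trials and count how often each outcome occurs."""
    # Multinomial requirement: the probabilities of all outcomes sum to one.
    assert abs(sum(probs.values()) - 1.0) < 1e-9
    rng = random.Random(seed)
    draws = rng.choices(list(probs), weights=list(probs.values()), k=n)
    return Counter(draws)


if __name__ == "__main__":
    counts = simulate_survey(103, CLAIMED)
    for category in CLAIMED:
        print(f"{category:16s}{counts[category]:4d}")
```

Different seeds give different counts; the goodness-of-fit test asks whether an observed set of counts is plausible under the claimed probabilities.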

  • Goodness-of-Fit Test

    Observed frequency, O, is the frequency of the category found in the sample.

    Expected frequency, E, is the calculated frequency for the category using the specified distribution: $E_i = n p_i$.

    In a survey of 103 married couples, find the expected number in each category.

    First Marriage      %    E = np
    Bride and groom    50    103(0.50) = 51.50
    Bride only         12    103(0.12) = 12.36
    Groom only         14    103(0.14) = 14.42
    Neither            24    103(0.24) = 24.72
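The rule $E_i = n p_i$ applied to every category, as a short Python sketch (the function name `expected_frequencies` is hypothetical):

```python
def expected_frequencies(n: int, probs: list[float]) -> list[float]:
    """Expected count E_i = n * p_i for each category."""
    return [n * p for p in probs]


if __name__ == "__main__":
    claimed = [0.50, 0.12, 0.14, 0.24]  # bride and groom, bride only, groom only, neither
    print([round(e, 2) for e in expected_frequencies(103, claimed)])  # [51.5, 12.36, 14.42, 24.72]
```

Because the probabilities sum to one, the expected counts always add back up to the sample size n.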

  • Chi-Square Test

    If the observed frequencies are obtained from a random sample and each expected frequency is at least 5, the sampling distribution for the goodness-of-fit test is a chi-square distribution with k − 1 degrees of freedom (where k = the number of categories).

    The test statistic is:

    $$\chi^2 = \sum \frac{(O - E)^2}{E}$$

    where O = observed frequency in each category and E = expected frequency in each category.
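A minimal Python sketch of the goodness-of-fit statistic, which also enforces the condition that each expected frequency is at least 5. The function name `chi_square_statistic` is illustrative; the observed counts (55, 12, 12, 24) are those reported for the study of 103 couples in this section, and the expected counts are $E = np$ under the claimed distribution.

```python
def chi_square_statistic(observed: list[float], expected: list[float]) -> float:
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # The chi-square approximation requires every expected frequency to be at least 5.
    if any(e < 5 for e in expected):
        raise ValueError("each expected frequency must be at least 5")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    observed = [55, 12, 12, 24]              # study of 103 couples
    expected = [51.50, 12.36, 14.42, 24.72]  # E = np under the claimed distribution
    stat = chi_square_statistic(observed, expected)
    df = len(observed) - 1                   # k - 1 degrees of freedom
    print(f"chi-square = {stat:.4f} with {df} d.f.")  # chi-square = 0.6755 with 3 d.f.
```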

  • A social service organization claims 50% of all marriages are the first marriage for both bride and groom, 12% are first for the bride only, 14% for the groom only, and 24% a remarriage for both. The results of a study of 103 randomly selected married couples are listed in the table. Test the distribution claimed by the agency. Use $\alpha = 0.01$.

    First Marriage      f
    Bride and groom    55
    Bride only         12
    Groom only         12
    Neither            24

    1. Write the null and alternative hypothesis.

    H0: The distribution of first-time marriages is 50% for both bride and groom, 12% for the bride only, 14% for the groom only, and 24% remarriages for both.

    Ha: The distribution of first-time marriages differs from the claimed distribution.

    2. State the level of significance.

    $\alpha = 0.01$

  • 3. Determine the sampling distribution.

    A chi-square distribution with 4 − 1 = 3 d.f.

    4. Find the critical value.

    $\chi^2_0 = 11.34$

    5. Find the rejection region.

    $\chi^2 > 11.34$ (the right tail beyond the critical value)

    6. Find the test statistic.

    First Marriage      %     O        E    (O − E)²   (O − E)²/E
    Bride and groom    50    55    51.50     12.25        0.2379
    Bride only         12    12    12.36      0.1296      0.0105
    Groom only         14    12    14.42      5.8564      0.4061
    Neither            24    24    24.72      0.5184      0.0210
    Total             100   103   103.00                  0.6755

    $\chi^2 = 0.6755$

  • 7. Make your decision.

    The test statistic, 0.6755, does not fall in the rejection region ($\chi^2 > 11.34$), so fail to reject H0.

    8. Interpret your decision.

    There is not enough evidence to reject the agency's claimed distribution of first-time marriages; the sample is consistent with the specified distribution.
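The slides use the critical-value method. As a cross-check, the P-value can be computed from the chi-square cumulative distribution, which equals the regularized lower incomplete gamma function $P(k/2, x/2)$. The sketch below evaluates that function with its standard power series, standard library only; `regularized_lower_gamma` and `chi2_sf` are hypothetical helper names, and a statistics library would normally be used instead.

```python
import math


def regularized_lower_gamma(a: float, x: float, max_terms: int = 1000) -> float:
    """P(a, x) = gamma(a, x) / Gamma(a) via its power series (adequate for moderate x)."""
    if x <= 0:
        return 0.0
    term = 1.0 / a  # n = 0 term of sum_n x^n / (a (a + 1) ... (a + n))
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area P(chi-square > x); the chi-square CDF equals P(df / 2, x / 2)."""
    return 1.0 - regularized_lower_gamma(df / 2, x / 2)


if __name__ == "__main__":
    alpha = 0.01
    print(f"area beyond 11.34 with 3 d.f.: {chi2_sf(11.34, 3):.4f}")  # close to alpha
    p_value = chi2_sf(0.6755, 3)
    print(f"P-value = {p_value:.3f}; reject H0: {p_value < alpha}")
```

With 3 d.f. the area beyond 11.34 is about 0.01 (the chosen $\alpha$), and the P-value for 0.6755 is roughly 0.88, far above $\alpha$, which agrees with the decision to fail to reject H0.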

  • Section 10.2: Independence

  • Test for Independence

    A chi-square test may be used to determine whether two variables (e.g., gender and job performance) are independent. Two variables are independent if the occurrence of one of the variables does not affect the occurrence of the other.

    The following contingency table reflects the gender and job performance evaluation of 220 accountants.

                Low   Average   Superior   Total
    Male         22        81          9     112
    Female       14        75         19     108
    Total        36       156         28     220

  • Expected Values

    Assuming the variables are independent, the expected value of each cell is:

    $$E_{r,c} = \frac{(\text{Sum of row } r)(\text{Sum of column } c)}{\text{Sample size}}$$

    $E_{1,1} = \frac{(112)(36)}{220} = 18.33$     $E_{1,2} = \frac{(112)(156)}{220} = 79.42$

    All other expected values can be found by subtracting from the total of the row or the column.

                Low   Average   Superior   Total
    Male      18.33     79.42      14.25     112
    Female    17.67     76.58      13.75     108
    Total        36       156         28     220

  • Sampling Distribution

    The sampling distribution is a $\chi^2$ distribution with degrees of freedom equal to:

    (Number of rows − 1)(Number of columns − 1)

    Example: Find the sampling distribution for a test of independence that has a contingency table of 4 rows and 3 columns.

    The sampling distribution is a $\chi^2$ distribution with (4 − 1)(3 − 1) = 3 · 2 = 6 d.f.

  • Application

    The following table reflects the gender and job performance evaluation of 220 accountants. Test the claim that gender and job performance are independent. Use $\alpha = 0.05$.

                Low   Average   Superior   Total
    Male         22        81          9     112
    Female       14        75         19     108
    Total        36       156         28     220

    1. Write the null and alternative hypothesis.

    H0: Gender and job performance are independent.

    Ha: Gender and job performance are not independent.

    2. State the level of significance.

    $\alpha = 0.05$

  • 3. Determine the sampling distribution.

    Since there are 2 rows and 3 columns, the sampling distribution is a chi-square distribution with (2 − 1)(3 − 1) = 2 d.f.

    4. Find the critical value.

    $\chi^2_0 = 5.99$

    5. Find the rejection region.

    $\chi^2 > 5.99$

    6. Find the test statistic.

  • Chi-Square Test

    Cell                O        E    (O − E)²   (O − E)²/E
    Male, Low          22    18.33      13.49        0.74
    Male, Average      81    79.42       2.50        0.03
    Male, Superior      9    14.25      27.61        1.94
    Female, Low        14    17.67      13.49        0.76
    Female, Average    75    76.58       2.50        0.03
    Female, Superior   19    13.75      27.61        2.01
    Total             220   220.00                   5.51

    $\chi^2 = 5.51$
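The whole independence test (expected counts from the row and column totals, the statistic, and the (rows − 1)(columns − 1) degrees of freedom) fits in one short function. This is a sketch using only the standard library; `independence_test` is a hypothetical name, and it assumes every expected count is large enough for the chi-square approximation.

```python
def independence_test(table: list[list[float]]) -> tuple[float, int, list[list[float]]]:
    """Chi-square test for independence on an r x c contingency table.

    Returns (test statistic, degrees of freedom, expected counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Under independence, E = (row total)(column total) / sample size.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


if __name__ == "__main__":
    # Rows: male, female. Columns: low, average, superior.
    counts = [[22, 81, 9], [14, 75, 19]]
    stat, df, expected = independence_test(counts)
    print(f"chi-square = {stat:.2f} with {df} d.f.")  # chi-square = 5.51 with 2 d.f.
```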

  • 7. Make your decision.

    The test statistic, 5.51, does not fall in the rejection region ($\chi^2 > 5.99$), so fail to reject H0.

    8. Interpret your decision.

    There is not enough evidence to conclude that gender and job performance evaluation are dependent. These data give no basis for hiring accountants based on their gender, since they do not show that gender influences job performance levels.
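For exactly 2 degrees of freedom the chi-square right-tail area has the closed form $P(\chi^2 > x) = e^{-x/2}$, so the P-value for this test can be checked by hand or with a few lines of Python (standard library only; `chi2_sf_2df` is an illustrative name):

```python
import math


def chi2_sf_2df(x: float) -> float:
    """Right-tail area of the chi-square distribution with exactly 2 d.f."""
    return math.exp(-x / 2)


if __name__ == "__main__":
    print(f"area beyond 5.99: {chi2_sf_2df(5.99):.3f}")  # about 0.05, the level alpha
    print(f"P-value for 5.51: {chi2_sf_2df(5.51):.3f}")  # above 0.05, so fail to reject
```

The P-value for 5.51 is about 0.064, just above $\alpha = 0.05$, matching the critical-value decision.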

  • Section 10.3: Comparing Two Variances

  • Two-Sample Test for Variances

    To compare population variances $\sigma_1^2$ and $\sigma_2^2$, use the F-distribution.

    Let $s_1^2$ and $s_2^2$ represent the sample variances of two different populations. If both populations are normal and the population variances $\sigma_1^2$ and $\sigma_2^2$ are equal, then the sampling distribution of $F = s_1^2 / s_2^2$ is called an F-distribution. $s_1^2$ always represents the larger of the two variances.

    [Figure: F-distribution curve with d.f.N = 8 and d.f.D = 20]

  • F-Test for Variances

    To test whether the variances of two normally distributed populations are equal, randomly select a sample from each population. Let $s_1^2$ and $s_2^2$ represent the sample variances, where $s_1^2 \ge s_2^2$.

    The test statistic is:

    $$F = \frac{s_1^2}{s_2^2}$$

    The sampling distribution is an F-distribution with numerator d.f. = $n_1 - 1$ and denominator d.f. = $n_2 - 1$.

    In F-tests for equal variances, use only the right-tail critical value. For a right-tailed test, use the critical value in the table that corresponds to $\alpha$. For a two-tailed test, use the right-hand critical value that corresponds to $\frac{1}{2}\alpha$.
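A small helper that forms the F statistic from two sample standard deviations, putting the larger variance in the numerator as required above. The name `f_test_statistic` and its argument order are illustrative assumptions; the usage lines plug in the Car A and Car B figures from the application that follows.

```python
def f_test_statistic(s_a: float, n_a: int, s_b: float, n_b: int) -> tuple[float, int, int]:
    """F statistic for comparing two variances from sample standard deviations.

    The larger sample variance goes in the numerator, so F >= 1.
    Returns (F, numerator d.f., denominator d.f.).
    """
    var_a, var_b = s_a ** 2, s_b ** 2
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


if __name__ == "__main__":
    # Car A: n = 16, s = 4.5; Car B: n = 22, s = 4.2
    f, df_n, df_d = f_test_statistic(4.5, 16, 4.2, 22)
    print(f"F = {f:.3f}, d.f.N = {df_n}, d.f.D = {df_d}")  # F = 1.148, d.f.N = 15, d.f.D = 21
```

Swapping the two samples in the call gives the same result, because the helper always orders the variances itself.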

  • Application

    An engineer wants to perform a t-test to see if the mean gas consumption of Car A is lower than that for Car B. A random sample of the gas consumption of 16 Car As has a standard deviation of 4.5. A random sample of the gas consumption of 22 Car Bs has a standard deviation of 4.2. Should the engineer use the t-test with equal variances or the one for unequal variances? Use $\alpha = 0.05$.

    Since the sample variance for Car A is larger than that for Car B, use $s_1^2$ to represent the sample variance for Car A.

    1. Write the null and alternative hypothesis.

    H0: $\sigma_1^2 = \sigma_2^2$

    Ha: $\sigma_1^2 \ne \sigma_2^2$

    2. State the level of significance.

    $\alpha = 0.05$

  • 3. Determine the sampling distribution.

    An F-distribution with d.f.N = 15, d.f.D = 21

    4. Find the critical value.

    $F_0 = 2.53$ (right-tail area $\frac{1}{2}\alpha = 0.025$)

    5. Find the rejection region.

    $F > 2.53$

    6. Find the test statistic.

    $F = \frac{s_1^2}{s_2^2} = \frac{4.5^2}{4.2^2} = \frac{20.25}{17.64} \approx 1.148$

  • 7. Make your decision.

    Since F = 1.148 does not fall in the rejection region (F > 2.53), fail to reject the null hypothesis.

    8. Interpret your decision.

    There is not enough evidence to reject the claim that the variances are equal. In performing a t-test for the means of the two populations, use the test for equal variances.

  • Section 10.4: Analysis of Variance

  • ANOVA

    One-way analysis of variance (ANOVA) is a hypothesis-testing technique that is used to compare means from three or more populations.

    H0: $\mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$ (All population means are equal.)

    Ha: At least one of the means is different from the others.

    The variance is calculated in two different ways and the ratio of the two values is formed.

    1. MSB, Mean Square Between, the variance between samples, measures the differences related to the treatment given to each sample.

    2. MSW, Mean Square Within, the variance within samples, measures the differences related to entries within the same sample. The variance within samples is due to sampling error.

  • Mean Square Between

    Each group is given a different treatment. The variation from the grand mean (the mean of all values in all groups) is measured. The treatment (or factor) is the variable that distinguishes members of one sample from another.

    First calculate SSB and then divide by k − 1, the degrees of freedom (k = the number of treatments or factors):

    $$\text{MSB} = \frac{\text{SSB}}{k - 1} = \frac{\sum n_i (\bar{x}_i - \bar{\bar{x}})^2}{k - 1}$$

  • Mean Square Within

    Calculate SSW and divide by N − k, the degrees of freedom (N = the total number of values in all samples):

    $$\text{MSW} = \frac{\text{SSW}}{N - k} = \frac{\sum (n_i - 1) s_i^2}{N - k}$$

    The ratio of the two measures is the F-ratio, $F = \text{MSB} / \text{MSW}$.

    If MSB is close in value to MSW, the variation cannot be attributed to different effects of the treatments on the variable, and the F-ratio is close to 1.

    If MSB is significantly greater than MSW, the variation is probably due to differences in the treatments or factors, and the F-ratio will differ significantly from 1.
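A minimal one-way ANOVA sketch that follows the definitions above: SSB from the sample means and the grand mean, SSW from the sample variances, then MSB, MSW and F. It uses Python's `statistics` module; `one_way_anova` is a hypothetical name. The usage lines use the reading-expenditure data from the example that follows, grouped so that the sample sizes and means agree with the worked solution (n = 7, 6, 7, 7; means 185.14, 177.00, 135.71, 210.14).

```python
from statistics import mean, variance


def one_way_anova(samples: list[list[float]]) -> tuple[float, float, float, int, int]:
    """Return (MSB, MSW, F, d.f.N, d.f.D) for a one-way ANOVA."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = mean(x for s in samples for x in s)
    # SSB: size-weighted squared distance of each sample mean from the grand mean.
    ssb = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # SSW: within-sample variation, (n_i - 1) * s_i^2 summed over the samples.
    ssw = sum((len(s) - 1) * variance(s) for s in samples)
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return msb, msw, msb / msw, k - 1, n_total - k


if __name__ == "__main__":
    northeast = [308, 58, 141, 109, 220, 144, 316]
    midwest = [246, 169, 246, 158, 167, 76]
    south = [103, 143, 164, 119, 99, 214, 108]
    west = [223, 184, 221, 269, 199, 171, 204]
    msb, msw, f, df_n, df_d = one_way_anova([northeast, midwest, south, west])
    print(f"MSB = {msb:.0f}, MSW = {msw:.0f}, F = {f:.2f}, d.f. = ({df_n}, {df_d})")
```

With these values it gives MSB ≈ 6695, MSW ≈ 4168 and F ≈ 1.61 with (3, 23) d.f., in line with the Minitab output at the end of the section.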

  • Analysis of Variance

    The table shows the annual amount spent on reading (in dollars) for a random sample of American consumers from four regions. At $\alpha = 0.10$, can you conclude that the mean annual amounts spent are different?

    Northeast   Midwest   South   West
       308        246      103     223
        58        169      143     184
       141        246      164     221
       109        158      119     269
       220        167       99     199
       144         76      214     171
       316                 108     204

    1. Write the null and alternative hypothesis.

    H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$ (All population means are equal.)

    Ha: At least one of the means is different from the others.

  • 2. State the level of significance.

    $\alpha = 0.10$

    3. Determine the sampling distribution.

    An F-distribution with d.f.N = 3, d.f.D = 23

    4. Find the critical value.

    $F_0 = 2.34$

    5. Find the rejection region.

    $F > 2.34$ (right-tail area 0.10)

  • 6. Find the test statistic.

    Calculate the mean and variance for each sample.

                Northeast   Midwest    South     West
    Mean           185.14    177.00   135.71   210.14
    Variance      9838.66   4050.05  1741.39  1020.80

    Calculate the mean of all values.

    Grand mean = 177.00

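  • Mean Square Between

    Sample     Mean    n   (Mean − Grand mean)²   n(Mean − Grand mean)²
    1        185.14    7          66.26                  463.8
    2        177.00    6           0.00                    0.0
    3        135.71    7        1704.86                11934.0
    4        210.14    7        1098.26                 7687.8

    SSB = 463.8 + 0.0 + 11934.0 + 7687.8 = 20085.6

    MSB = SSB/(k − 1) = 20085.6/3 ≈ 6695.2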

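  • Mean Square Within

    Sample   n   Variance s²   (n − 1)s²
    1        7     9838.66      59031.9
    2        6     4050.05      20250.2
    3        7     1741.39      10448.4
    4        7     1020.80       6124.8

    SSW = 59031.9 + 20250.2 + 10448.4 + 6124.8 = 95855.3

    MSW = SSW/(N − k) = 95855.3/23 ≈ 4167.6

    F = MSB/MSW ≈ 6695/4168 ≈ 1.61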

  • 7. Make your decision.

    Since F ≈ 1.61 does not fall in the rejection region (F > 2.34), fail to reject the null hypothesis.

    8. Interpret your decision.

    There is not enough evidence at $\alpha = 0.10$ to support the claim that the mean annual amounts spent on reading are not all equal across the four regions.

  • Minitab Output

    One-way Analysis of Variance

    Analysis of Variance
    Source   DF       SS     MS      F       P
    Factor    3    20085   6695   1.61   0.215
    Error    23    95857   4168
    Total    26   115942

    Using the P-value method, fail to reject the null hypothesis, since 0.215 > 0.10. There is not enough evidence to support the claim that the amount spent on reading differs among the regions.
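The P-value in the Minitab output is the right-tail area of the F-distribution with 3 and 23 d.f. beyond the observed F. A standard-library sketch of that computation uses the identity $P(F > f) = I_{d_2/(d_2 + d_1 f)}(d_2/2,\ d_1/2)$, where $I$ is the regularized incomplete beta function, evaluated here with the classic modified-Lentz continued fraction. The helper names are illustrative; libraries such as SciPy provide this directly (`scipy.stats.f.sf`).

```python
import math

_TINY = 1e-300  # guard against division by zero in the continued fraction


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_incomplete_beta(a: float, b: float, x: float) -> float:
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + a * math.log(x) + b * math.log1p(-x)
    )
    # Use the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_sf(f: float, df_n: int, df_d: int) -> float:
    """Right-tail area P(F > f) for an F-distribution with (df_n, df_d) d.f."""
    if f <= 0:
        return 1.0
    return reg_incomplete_beta(df_d / 2.0, df_n / 2.0, df_d / (df_d + df_n * f))


if __name__ == "__main__":
    f_observed = 6695 / 4168  # MS factor / MS error from the output above
    p_value = f_sf(f_observed, 3, 23)
    print(f"F = {f_observed:.2f}, P-value = {p_value:.3f}")  # P close to 0.215
    print(f"reject H0 at alpha = 0.10: {p_value < 0.10}")
```

The same function reproduces the table critical values used in this section: the area beyond 2.34 with (3, 23) d.f. is about 0.10, and the area beyond 2.53 with (15, 21) d.f. is about 0.025.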