Chi-Squared Hypothesis Testing Using One-Way and Two-Way Frequency Tables of Categorical Variables

Dec 16, 2015

Melvyn Quinn
Transcript
Page 1: Chi-Squared Hypothesis Testing Using One-Way and Two-Way Frequency Tables of Categorical Variables.

Chi-Squared Hypothesis Testing

Using One-Way and Two-Way Frequency Tables of Categorical Variables

Page 2

χ² Hypothesis Tests

Goodness-of-Fit

Independence

Homogeneity

Page 3

Analyzing an Exam Question

How does a teacher determine whether students were “clueless” on an exam question or merely unprepared for that particular question?

Page 4

Goodness-of-Fit Test

If you need to test whether a population is distributed evenly across categories (or according to “preset” proportions), use the Goodness-of-Fit test.

1. This requires a one-way frequency (count) table.

2. The counts must come from a random sample.

3. All expected cell counts must be greater than 5.

What’s an expected cell count?

Page 5

Expected Cell Count?

Suppose 300 students answered a multiple-choice question with the following distribution. Did the students select answers at random (i.e., are the answers equally distributed)?

The expected cell count for A is 300(1/5) = 60, and the same is true for B through E. If we assume the answers are equally distributed (the null hypothesis), then we “share” the 300 responses equally.

A B C D E

68 53 78 42 59
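
The expected counts can be checked with a short calculation. The following is an illustrative Python sketch, not part of the original slides; the variable names are assumptions chosen for this example.

```python
# Observed answer counts from the slide (choices A through E).
observed = {"A": 68, "B": 53, "C": 78, "D": 42, "E": 59}

total = sum(observed.values())   # number of students who answered
num_choices = len(observed)      # number of answer choices

# Under the null hypothesis every choice is equally likely, so each
# choice is expected to receive total / num_choices responses.
expected = {choice: total / num_choices for choice in observed}

print(total, expected)
```

Every expected count comes out to 60, matching the hand calculation 300(1/5) = 60.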

Page 6

Observed vs. Expected

The observed values are the actual sampled counts (occurrences).

The expected values are the hypothesized outcomes based on the null hypothesis.

In this example, we are assuming that each answer was equally likely to be selected by students.

A B C D E

Observed 68 53 78 42 59

Expected 60 60 60 60 60

Page 7

χ² Statistic

The computer (or calculator) will calculate the chi-squared statistic for you, and determine the degrees of freedom and p-value.

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

What are degrees of freedom?

Page 8

Chi-Squared Statistic and p-value

χ² = 6.5, df = 4, P(χ² > 6.5) = .16479

Page 9

χ² Statistic

Ho: A = B = C = D = E

Ha: at least one is different

χ² = 12.7, df = 4, P(χ² > 12.7) = .0128

A B C D E

Observed 68 53 78 42 59

Expected 60 60 60 60 60
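
For readers without a calculator at hand, the statistic and p-value for the exam question can be reproduced from the observed and expected counts. This is a minimal sketch using only Python's standard library; the variable names are assumptions, and the p-value line uses a closed form that holds only when df = 4, so it is not a general-purpose routine.

```python
import math

observed = [68, 53, 78, 42, 59]                               # counts for A through E
expected = [sum(observed) / len(observed)] * len(observed)    # equal shares

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Goodness-of-fit degrees of freedom: number of categories minus 1.
df = len(observed) - 1

# Upper-tail probability P(chi-squared > chi_sq).
# This closed form is valid only for df = 4.
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)

print(round(chi_sq, 2), df, round(p_value, 4))
```

The result should agree with the values on this slide: a statistic of 12.7 on 4 degrees of freedom and a p-value of about .0128.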

Page 10

Goodness-of-Fit Test

What if the hypothesized proportions were not all the same?

Example: Does the color of your car influence the chance it will be stolen? Suppose it is known that, of all cars in the world, 15% are white, 30% black, 35% red, 15% blue, and 5% other colors.

Page 11

Color of Stolen Car

Ho: W = .15, B = .30, R = .35, U = .15, E = .05

Ha: at least one is different

White Black Red Blue Other

Obsv 140 230 270 100 90

Expect 124.5 249.0 290.5 124.5 41.5

χ² = 66.33, df = 4, P(χ² > 66.33) = 1.3 × 10⁻¹³
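
When the hypothesized proportions are not all equal, each expected count is the sample size times that category's hypothesized proportion. The sketch below illustrates this for the stolen-car data. It is standard-library Python with assumed variable names, the proportions come from the color percentages stated in the example (15% for blue), and the p-value line is a closed form valid only for df = 4.

```python
import math

colors = ["White", "Black", "Red", "Blue", "Other"]
observed = [140, 230, 270, 100, 90]            # stolen cars by color
null_props = [0.15, 0.30, 0.35, 0.15, 0.05]    # hypothesized proportions

n = sum(observed)                               # total stolen cars sampled

# Expected count = sample size times hypothesized proportion.
expected = [n * p for p in null_props]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Closed-form upper-tail probability, valid only for df = 4.
p_value = math.exp(-chi_sq / 2) * (1 + chi_sq / 2)

for color, e in zip(colors, expected):
    print(f"{color}: expected {e:.1f}")
print(f"chi-squared = {chi_sq:.2f}, df = {df}, p-value = {p_value:.1e}")
```

Most of the statistic comes from the "Other" category, where 90 thefts were observed against about 41.5 expected.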

Page 12

Two-Way Tables

Homogeneity: tests for equal category proportions across all populations (used when separate random samples were drawn from each population).

Independence: tests for independence (no association) between two categorical variables.

Don’t worry; same test!

Page 13

College Students’ Drinking Levels

Data on drinking behavior were collected for independently chosen random samples of male and female students.

Does there appear to be a gender difference with respect to drinking behavior?

Page 14

Homogeneity Test

Observed counts, with expected counts in parentheses:

                 Gender
Drinking     Men            Women
None         140 (158.6)    186 (167.4)
Low          478 (554.0)    661 (585.0)
Moderate     300 (230.1)    173 (242.9)
High         63 (38.4)      16 (40.6)

Page 15

College Students’ Drinking Levels

Ho: True proportions for the 4 drinking levels are the same for males and females.

Ha: At least one true proportion is different.

χ² = 96.53, df = (4 – 1)(2 – 1) = 3

P(χ² > 96.53) = 8.68 × 10⁻²¹

Reject Ho; the data indicate that males and females differ with respect to drinking levels.
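
The whole two-way calculation for the drinking data can be traced in a few lines. This is a hedged sketch in standard-library Python, not the calculator procedure the slides refer to; the variable names are assumptions, and the p-value expression is a closed form that applies only when df = 3.

```python
import math

# Observed counts: rows are drinking levels (None, Low, Moderate, High),
# columns are (men, women).
observed = [[140, 186], [478, 661], [300, 173], [63, 16]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Two-way table degrees of freedom: (rows - 1) * (columns - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail probability; this closed form is valid only for df = 3.
p_value = (math.erfc(math.sqrt(chi_sq / 2))
           + math.sqrt(2 * chi_sq / math.pi) * math.exp(-chi_sq / 2))

print(f"chi-squared = {chi_sq:.2f}, df = {df}, p-value = {p_value:.2e}")
```

The statistic rounds to 96.53 with df = 3, and the p-value is on the order of 10⁻²¹, in line with the slide.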

Page 16

Sexual Risk-Taking Factors Among Adolescents

Each person in a random sample of sexually active teens was classified according to gender and contraceptive use.

Is there a relationship between gender and contraceptive use by sexually active teens?

Page 17

Independence (No Association) Test

Observed counts, with expected counts in parentheses:

                            Gender
Contraceptive Use           Female       Male
Rarely/Never                210 (224)    350 (336)
Sometimes/Most Times        190 (204)    320 (306)
Always                      400 (372)    530 (558)

Page 18

Sexual Risk-Taking Factors Among Adolescents

Ho: Gender and contraceptive use have no association (independent).

Ha: Gender and contraceptive use have an association (dependent).

χ² = 6.572, df = (3 – 1)(2 – 1) = 2

P(χ² > 6.572) = .037

Reject Ho and conclude there is an association between gender and contraceptive use.
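
As a check on the reported statistic, the six cell contributions can be summed by hand, taking each observed and expected count from the gender by contraceptive-use table on the previous slide:

```latex
\chi^2 = \frac{(210-224)^2}{224} + \frac{(350-336)^2}{336}
       + \frac{(190-204)^2}{204} + \frac{(320-306)^2}{306}
       + \frac{(400-372)^2}{372} + \frac{(530-558)^2}{558}
       \approx 0.8750 + 0.5833 + 0.9608 + 0.6405 + 2.1075 + 1.4050
       \approx 6.572
```

The "Always" row supplies more than half of the total, so that is where the observed counts depart most from independence.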

Page 19

Expected (Cell) Count for Two-Way Tables

$$\text{Expected Count} = \frac{(\text{Row Total})(\text{Column Total})}{\text{Grand Total}}$$
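
Applied to the gender by contraceptive-use table, the rule (row total times column total, divided by grand total) can be sketched as follows. The function name `expected_counts` is hypothetical, introduced only for this illustration.

```python
def expected_counts(observed):
    """Return expected cell counts for a two-way table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Each cell: (row total * column total) / grand total.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Rarely/Never, Sometimes/Most Times, Always; columns: (female, male).
observed = [[210, 350], [190, 320], [400, 530]]
expected = expected_counts(observed)
print(expected)
```

For example, the Rarely/Never female cell is (560)(800)/2000 = 224, which matches the parenthesized value in the table.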

Page 20

Conditions (Requirements) for χ² Test with Two-Way Tables

1) Random Sample

2) At least 80% of Expected Cell Counts are greater than 5.

3) All Expected Cell Counts and Observed values are greater than or equal to 1.
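
Conditions 2 and 3 are mechanical and can be checked in code; condition 1 (random sampling) depends on how the data were collected and cannot be verified from the table. The sketch below is illustrative only: `conditions_met` is a hypothetical helper name, and the sample values are the observed and expected counts from the drinking-levels table.

```python
def conditions_met(observed, expected):
    """Check conditions 2 and 3 for a two-way table (not the random-sample condition)."""
    exp_cells = [e for row in expected for e in row]
    obs_cells = [o for row in observed for o in row]
    # Condition 2: at least 80% of expected counts are greater than 5.
    share_above_5 = sum(e > 5 for e in exp_cells) / len(exp_cells)
    # Condition 3: every expected and observed count is at least 1.
    all_at_least_1 = all(x >= 1 for x in exp_cells + obs_cells)
    return share_above_5 >= 0.80 and all_at_least_1


observed = [[140, 186], [478, 661], [300, 173], [63, 16]]
expected = [[158.6, 167.4], [554.0, 585.0], [230.1, 242.9], [38.4, 40.6]]
print(conditions_met(observed, expected))
```

The drinking data pass both checks, since the smallest expected count is 38.4 and the smallest observed count is 16.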

Page 21

Titanic

The makers of the movie Titanic imply that lower-class passengers were treated unfairly.

Was that accurate?

Page 22

Likelihood of Survival on Titanic?

Ho: C = 109/1318, W = 402/1318, M = 807/1318

Ha: at least one is different

χ² = 225.16, df = 2, P(χ² > 225.16) = 0.000

Reject Ho and conclude at least one proportion is different.

Children Women Men

Observed 57 296 146

Expected 41.269 152.199 305.533
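
To see which group drives the result, each category's contribution (Observed − Expected)²/Expected can be listed separately. This is an illustrative sketch with assumed variable names; in particular, reading the numerators 109, 402, and 807 in Ho as group counts out of 1318 is an assumption, since the slide gives only the fractions.

```python
groups = ["Children", "Women", "Men"]
null_counts = [109, 402, 807]    # numerators of the proportions in Ho (assumed to be group counts)
observed = [57, 296, 146]        # observed survivors in each group

null_total = sum(null_counts)    # denominator of the proportions in Ho
n = sum(observed)                # total observed survivors

# Expected count = total survivors times the hypothesized proportion.
expected = [n * c / null_total for c in null_counts]

# Per-category contribution to the chi-squared statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)

for g, o, e, c in zip(groups, observed, expected, contributions):
    direction = "more" if o > e else "fewer"
    print(f"{g}: {o} observed, {e:.1f} expected ({direction} than expected), contributes {c:.2f}")
print(f"chi-squared = {chi_sq:.2f}")
```

Women (more survivors than expected) and men (fewer than expected) account for nearly all of the statistic.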