Chi Squared Tests

Mar 21, 2016


kyros

Page 1: Chi Squared Tests

Chi Squared Tests

Page 2: Chi Squared Tests

Introduction

• Two statistical techniques are presented. Both are used to analyze nominal data.
– A goodness-of-fit test for a multinomial experiment.
– A contingency table test of independence.

• The test statistics in both cases follow the χ² distribution.

Page 3: Chi Squared Tests

Chi-Squared Goodness-of-Fit Test

• The hypothesis tested involves the “success” probabilities p1, p2, …, pk of a multinomial distribution.

• The multinomial experiment is an extension of the binomial experiment.
– There are n independent trials.
– The outcome of each trial can be classified into one of k categories, called cells.
– The probability pi for an outcome to fall into cell i remains constant for each trial. By assumption, p1 + p2 + … + pk = 1.
– Trials in the experiment are independent.
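To make the multinomial experiment concrete, the following is a minimal simulation sketch using only the Python standard library. The function name `run_multinomial_experiment`, the seed, and the cell probabilities passed to it are illustrative assumptions, not part of the original slides.

```python
import random
from collections import Counter


def run_multinomial_experiment(n, probabilities, seed=None):
    """Simulate n independent trials and return the observed count per cell."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    rng = random.Random(seed)
    cells = range(len(probabilities))
    # Every trial falls into cell i with the same probability p_i.
    outcomes = rng.choices(cells, weights=probabilities, k=n)
    counts = Counter(outcomes)
    return [counts[i] for i in cells]


# Illustrative call: 200 trials, k = 3 cells (probabilities are assumptions).
observed = run_multinomial_experiment(200, [0.45, 0.40, 0.15], seed=1)
print(observed, sum(observed))
```

The counts always add up to n, because each trial lands in exactly one cell.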

Page 4: Chi Squared Tests

• Our objective is to find out whether there is sufficient evidence to reject a pre-specified set of values for pi .

• The hypotheses:

H0: p1 = a1, p2 = a2, ..., pk = ak
H1: At least one pi ≠ ai

• The test builds on comparing actual frequency and the expected frequency of occurrences in all cells.

Page 5: Chi Squared Tests

An Example

• Example 16.1
– Two competing companies A and B have been dominant players in the market. Both companies conducted recent advertising campaigns on their products.
– Market shares before the campaigns were:
  • Company A = 45%
  • Company B = 40%
  • Other competitors = 15%.

Page 6: Chi Squared Tests

• Example 16.1 – continued
– To study the effect of the campaigns on the market shares, a survey was conducted.
– 200 customers were asked to indicate their preference regarding the products advertised.
– Survey results:
  • 102 customers preferred company A’s product,
  • 82 customers preferred company B’s product,
  • 16 customers preferred the competitors’ products.

Page 7: Chi Squared Tests

• Example 16.1 – continued

Can we conclude at the 5% significance level that the market shares were affected by the advertising campaigns?

Page 8: Chi Squared Tests

• Solution
– The population investigated is the brand preferences.
– The data are nominal (A, B, or other).
– This is a multinomial experiment (three categories).
– The question of interest: Are p1, p2, and p3 different after the campaigns from their values prior to the campaigns?

Page 9: Chi Squared Tests

• The hypotheses are:
H0: p1 = .45, p2 = .40, p3 = .15
H1: At least one pi changed.

The expected frequency for each category (cell) if the null hypothesis is true is shown below, together with the actual frequencies the sample returned:

Cell              Expected frequency   Actual frequency
1 (Company A)     90 = 200(.45)        102
2 (Company B)     80 = 200(.40)        82
3 (Others)        30 = 200(.15)        16
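The expected frequencies are simply n times each hypothesized probability. A short sketch of that arithmetic with the numbers from Example 16.1 follows; the helper name `expected_frequencies` and the variable names are illustrative assumptions.

```python
def expected_frequencies(n, null_probabilities):
    """Expected cell counts e_i = n * p_i under the null hypothesis."""
    return [n * p for p in null_probabilities]


n = 200                            # customers surveyed
null_shares = [0.45, 0.40, 0.15]   # market shares before the campaigns (H0)
observed = [102, 82, 16]           # survey results

expected = expected_frequencies(n, null_shares)
for label, f, e in zip(["Company A", "Company B", "Others"], observed, expected):
    print(f"{label}: observed {f}, expected {e:.0f}")
```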

Page 10: Chi Squared Tests

• The test statistic is:

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}, \qquad \text{where } e_i = n p_i$$

Intuitively, this measures the extent of differences between the observed and the expected frequencies.

• The rejection region is:

$$\chi^2 > \chi^2_{\alpha,\,k-1}$$

Page 11: Chi Squared Tests

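• Example 16.1 – continued

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i} = \frac{(102 - 90)^2}{90} + \frac{(82 - 80)^2}{80} + \frac{(16 - 30)^2}{30} = 8.18$$

$$\chi^2_{\alpha,\,k-1} = \chi^2_{.05,\,3-1} = 5.99147$$

The p-value = P(χ² > 8.18) = .01674 [this comes from Excel: =CHIDIST(8.18,2)]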
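The calculation for Example 16.1 can be reproduced without a spreadsheet. The sketch below relies on a known closed form that holds only for 2 degrees of freedom: P(χ² > x) = exp(−x/2), so the critical value is −2 ln(α). All function names are illustrative assumptions; a general-purpose statistics library would be the usual choice for other degrees of freedom.

```python
import math


def chi_squared_statistic(observed, expected):
    """Sum of (f_i - e_i)^2 / e_i over all cells."""
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))


def chi2_sf_2df(x):
    """P(chi-squared > x) for exactly 2 degrees of freedom."""
    return math.exp(-x / 2)


def chi2_critical_2df(alpha):
    """Critical value with right-tail area alpha, 2 degrees of freedom only."""
    return -2 * math.log(alpha)


observed = [102, 82, 16]
expected = [90, 80, 30]

stat = chi_squared_statistic(observed, expected)
critical = chi2_critical_2df(0.05)
p_value = chi2_sf_2df(stat)
reject = stat > critical

print(f"statistic = {stat:.2f}, critical value = {critical:.2f}, "
      f"p-value = {p_value:.4f}, reject H0: {reject}")
```

The p-value here is computed from the unrounded statistic, so it can differ slightly in the fourth decimal from a value computed at 8.18.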

Page 12: Chi Squared Tests

• Example 16.1 – continued

[Figure: density of the χ² distribution with 2 degrees of freedom, marking the critical value 5.99, the test statistic 8.18, the rejection region (area alpha), and the p-value.]

Conclusion: Since 8.18 > 5.99, there is sufficient evidence at the 5% significance level to reject the null hypothesis. At least one of the probabilities pi is different. Because the probabilities sum to 1, at least two market shares have changed.

Page 13: Chi Squared Tests

Required Conditions – The Rule of Five

• The test statistic used to perform the test is only approximately Chi-squared distributed.

• For the approximation to apply, the expected cell frequency has to be at least 5 for all cells (npi ≥ 5).

• If the expected frequency in a cell is less than 5, combine it with other cells.
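A rough sketch of how the rule of five might be checked and applied is shown below. The expected and observed counts are hypothetical numbers made up purely for illustration, and the helper names `small_cells` and `combine` are assumptions; which cells to merge is a judgment call that depends on the problem.

```python
def small_cells(expected, minimum=5):
    """Indices of cells whose expected frequency is below the minimum."""
    return [i for i, e in enumerate(expected) if e < minimum]


def combine(values, i, j):
    """Return a copy of values with cell j merged into cell i."""
    merged = list(values)
    merged[i] += merged[j]
    del merged[j]
    return merged


# Hypothetical data (not from the slides): four cells, the last one too small.
expected = [25.0, 15.0, 7.0, 3.0]
observed = [28, 12, 8, 2]

print("cells violating the rule:", small_cells(expected))

# Merge the last cell into its neighbour; observed counts are merged the same way.
expected2 = combine(expected, 2, 3)
observed2 = combine(observed, 2, 3)
print(expected2, observed2, small_cells(expected2))
```

After combining, the number of cells k is smaller, so the degrees of freedom k − 1 shrink accordingly.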

Page 14: Chi Squared Tests

Chi-squared Test of a Contingency Table

• This test is used to test whether…
– two nominal variables are related?
– there are differences between two or more populations of a nominal variable?

• To accomplish the test objectives, we need to classify the data according to two different criteria.

• The idea is also based on goodness of fit.

Page 15: Chi Squared Tests

• Example 16.2
– In an effort to better predict the demand for courses offered by a certain MBA program, it was hypothesized that students’ academic backgrounds affect their choice of MBA major and, thus, their course selection.
– A random sample of last year’s MBA students was selected. The following contingency table summarizes relevant data.

Page 16: Chi Squared Tests

The observed values

Undergraduate   MBA Major
Degree          Accounting   Finance   Marketing   Total
BA              31           13        16          60
BENG            8            16        7           31
BBA             12           10        17          39
Other           10           5         7           22
Total           61           44        47          152

There are two ways to view this problem:

If each undergraduate degree is considered a population, do these populations differ?

If each classification is considered a nominal variable, are these two variables dependent?

Page 17: Chi Squared Tests

• Solution
– The hypotheses are:

H0: The two variables are independent

H1: The two variables are dependent

– The test statistic

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

k is the number of cells in the contingency table.

– The rejection region

$$\chi^2 > \chi^2_{\alpha,\,(r-1)(c-1)}$$

where r is the number of rows and c is the number of columns in the table.

Since ei = npi but pi is unknown, we need to estimate the unknown probability from the data, assuming H0 is true.

Page 18: Chi Squared Tests

Estimating the expected frequencies

Under the null hypothesis the two variables are independent:

P(Accounting and BA) = P(Accounting)*P(BA) = [61/152][60/152].

Undergraduate   MBA Major
Degree          Accounting   Finance   Marketing   Total   Probability
BA                                                 60      60/152
BENG                                               31      31/152
BBA                                                39      39/152
Other                                              22      22/152
Total           61           44        47          152
Probability     61/152       44/152    47/152

The number of students expected to fall in the cell “Accounting - BA” is
eAcct-BA = n(pAcct-BA) = 152(61/152)(60/152) = [61*60]/152 = 24.08

The number of students expected to fall in the cell “Finance - BBA” is
eFinance-BBA = n(pFinance-BBA) = 152(44/152)(39/152) = [44*39]/152 = 11.29

Page 19: Chi Squared Tests

• The expected frequency of the cell in row i and column j of the contingency table is calculated by:

$$e_{ij} = \frac{(\text{Column } j \text{ total})(\text{Row } i \text{ total})}{\text{Sample size}}$$
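The row-total times column-total rule can be applied to every cell at once. Below is a minimal sketch using the observed counts of Example 16.2; the function name `expected_table` and the nested-list layout (rows are degrees, columns are majors) are illustrative assumptions.

```python
def expected_table(observed):
    """Expected counts e_ij = (row i total) * (column j total) / sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: BA, BENG, BBA, Other. Columns: Accounting, Finance, Marketing.
observed = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]

expected = expected_table(observed)
for row in expected:
    print([round(e, 2) for e in row])
```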

Page 20: Chi Squared Tests

• Solution – continued

Calculation of the χ² statistic

Observed frequencies, with the expected frequency of each cell in parentheses:

Undergraduate   MBA Major
Degree          Accounting   Finance      Marketing    Total
BA              31 (24.08)   13 (17.37)   16 (18.55)   60
BENG            8 (12.44)    16 (8.97)    7 (9.59)     31
BBA             12 (15.65)   10 (11.29)   17 (12.06)   39
Other           10 (8.83)    5 (6.37)     7 (6.80)     22
Total           61           44           47           152

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i} = \frac{(31 - 24.08)^2}{24.08} + \dots + \frac{(5 - 6.37)^2}{6.37} + \dots + \frac{(7 - 6.80)^2}{6.80} = 14.70$$

Page 21: Chi Squared Tests

• Solution – continued
– The critical value in our example is:

$$\chi^2_{\alpha,\,(r-1)(c-1)} = \chi^2_{.05,\,(4-1)(3-1)} = 12.5916$$

• Conclusion: Since χ² = 14.70 > 12.5916, there is sufficient evidence to infer at the 5% significance level that students’ undergraduate degree and their MBA course selection are dependent.
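The whole test of independence for Example 16.2 can be sketched in a few lines. The p-value below uses a closed form that is valid only when the degrees of freedom are even (here 6): P(χ² > x) = exp(−x/2) · Σ (x/2)^j / j! for j = 0, …, df/2 − 1. The function names are illustrative assumptions, and the critical value 12.5916 is taken from the slides rather than computed.

```python
import math


def chi2_sf_even_df(x, df):
    """P(chi-squared > x); closed form valid only for even, positive df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs an even, positive df")
    half = x / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


def independence_test(observed):
    """Return (statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, f in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (f - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


observed = [[31, 13, 16], [8, 16, 7], [12, 10, 17], [10, 5, 7]]
stat, df = independence_test(observed)
p_value = chi2_sf_even_df(stat, df)
critical = 12.5916                # chi-squared_{.05, 6}, as given on the slide

print(f"statistic = {stat:.4f}, df = {df}, p-value = {p_value:.4f}, "
      f"reject H0: {stat > critical}")
```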

Page 22: Chi Squared Tests

Using the computer

Define a code to specify each nominal value. Input the data in columns, one column for each category.

Code:
Undergraduate degree: 1 = BA, 2 = BENG, 3 = BBA, 4 = OTHERS
MBA Major: 1 = ACCOUNTING, 2 = FINANCE, 3 = MARKETING

Degree   MBA Major
3        1
1        1
1        1
1        1
2        2
1        3
.        .
.        .

Select the Chi squared / raw data Option from Data Analysis Plus under tools. See Xm16-02

Contingency Table
        1    2    3    Total
1       31   13   16   60
2       8    16   7    31
3       12   10   17   39
4       10   5    7    22
Total   61   44   47   152

Test Statistic CHI-Squared = 14.7019
P-Value = 0.0227
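Outside of a spreadsheet add-in, coded raw data can be tallied into a contingency table with a few lines of code. The sketch below uses only the six coded rows visible on the slide, so the resulting table is a partial tally and not the full 152-student table; the function name `tabulate` and the dictionary names are illustrative assumptions.

```python
from collections import Counter

DEGREE = {1: "BA", 2: "BENG", 3: "BBA", 4: "OTHERS"}
MAJOR = {1: "ACCOUNTING", 2: "FINANCE", 3: "MARKETING"}

# Only the first six (degree, major) records shown on the slide.
raw = [(3, 1), (1, 1), (1, 1), (1, 1), (2, 2), (1, 3)]


def tabulate(records, n_rows=4, n_cols=3):
    """Count coded (row, column) pairs into an n_rows x n_cols table."""
    counts = Counter(records)
    return [[counts[(r, c)] for c in range(1, n_cols + 1)]
            for r in range(1, n_rows + 1)]


table = tabulate(raw)
for code, row in zip(DEGREE, table):
    print(f"{DEGREE[code]:<7}", row)
```

With the complete data file, the same tally would produce the contingency table shown in the output above.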