Chapter 11: Comparisons Involving Proportions and a Test of Independence

1 Slide

© 2009 Econ-2030-Applied Statistics-Dr. Tadesse.


Inferences About the Difference Between Two Population Proportions

Test of Independence: Contingency Tables

Hypothesis Test for Proportions of a Multinomial Population

2 Slide


Inferences About the Difference Between Two Population Proportions

Interval Estimation of p1 - p2

Hypothesis Tests About p1 - p2

3 Slide


Sampling Distribution of $\bar{p}_1 - \bar{p}_2$

Expected Value

$$E(\bar{p}_1 - \bar{p}_2) = p_1 - p_2$$

Standard Deviation (Standard Error)

$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

where: n1 = size of sample taken from population 1
       n2 = size of sample taken from population 2

4 Slide


If the sample sizes are large, the sampling distribution of $\bar{p}_1 - \bar{p}_2$ can be approximated by a normal probability distribution.

The sample sizes are sufficiently large if all of these conditions are met:

n1p1 ≥ 5

n1(1 - p1) ≥ 5

n2p2 ≥ 5

n2(1 - p2) ≥ 5
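A minimal Python sketch of this check follows. The function name `large_enough` is illustrative only. Because the population proportions are unknown in practice, plugging in sample proportions (here .48 and .40, taken from the advertising example used later in these slides) is an assumption of the sketch, as is treating a count of exactly 5 as acceptable.

```python
def large_enough(n1, p1, n2, p2, threshold=5):
    """Return True when n*p and n*(1 - p) reach the threshold in both samples."""
    counts = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
    return all(count >= threshold for count in counts)


# Sample sizes and sample proportions from the advertising example.
print(large_enough(250, 0.48, 150, 0.40))
```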


5 Slide



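[Figure: normal curve for the sampling distribution of $\bar{p}_1 - \bar{p}_2$, centered at p1 - p2, with standard error $\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$]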

6 Slide



Interval Estimate

$$\bar{p}_1 - \bar{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}$$

7 Slide


Example: Market Research Associates

Market Research Associates is conducting research to evaluate the effectiveness of a client’s new advertising campaign. Before the new campaign began, a telephone survey of 150 households in the test market area showed 60 households “aware” of the client’s product.

The new campaign has been initiated with TV and newspaper advertisements running for three weeks.

8 Slide


A survey conducted immediately after the new campaign showed 120 of 250 households “aware” of the client’s product.

Do the data support the position that the advertising campaign has provided an increased awareness of the client’s product?

9 Slide


Point Estimator of the Difference Between Two Population Proportions

p1 = proportion of the population of households “aware” of the product after the new campaign

p2 = proportion of the population of households “aware” of the product before the new campaign

$\bar{p}_1$ = sample proportion of households “aware” of the product after the new campaign

$\bar{p}_2$ = sample proportion of households “aware” of the product before the new campaign

$$\bar{p}_1 - \bar{p}_2 = \frac{120}{250} - \frac{60}{150} = .48 - .40 = .08$$

10 Slide


For $\alpha$ = .05, $z_{.025}$ = 1.96:

$$.48 - .40 \pm 1.96\sqrt{\frac{.48(.52)}{250} + \frac{.40(.60)}{150}}$$

$$.08 \pm 1.96(.0510)$$

$$.08 \pm .10$$

Hence, the 95% confidence interval for the difference in before and after awareness of the product is -.02 to +.18.
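The interval can be reproduced with a short Python sketch. The function name `diff_proportion_interval` and its arguments are illustrative, not part of the original slides. It uses the standard library's `statistics.NormalDist` to look up the z value rather than a printed table, so the multiplier is 1.95996 instead of the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Large-sample interval estimate of p1 - p2 from two sample counts."""
    p1_bar, p2_bar = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    se = sqrt(p1_bar * (1 - p1_bar) / n1 + p2_bar * (1 - p2_bar) / n2)
    diff = p1_bar - p2_bar
    return diff - z * se, diff + z * se


# After the campaign: 120 of 250 aware; before: 60 of 150 aware.
low, high = diff_proportion_interval(120, 250, 60, 150)
print(f"{low:.2f} to {high:.2f}")
```

Because the interval includes 0, it does not by itself establish that awareness changed.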

11 Slide


Hypothesis Tests about p1 - p2

Hypotheses

Left-tailed:   H0: p1 - p2 ≥ 0    Ha: p1 - p2 < 0

Right-tailed:  H0: p1 - p2 ≤ 0    Ha: p1 - p2 > 0

Two-tailed:    H0: p1 - p2 = 0    Ha: p1 - p2 ≠ 0

We focus on tests involving no difference between the two population proportions (i.e., p1 = p2).

12 Slide



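Pooled Estimate of Standard Error of $\bar{p}_1 - \bar{p}_2$

$$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where:

$$\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$$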

13 Slide



Test Statistic

$$z = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

14 Slide


Example: Market Research Associates

Can we conclude, using a .05 level of significance, that the proportion of households aware of the client’s product increased after the new advertising campaign?

15 Slide



1. Develop the hypotheses.

H0: p1 - p2 ≤ 0

Ha: p1 - p2 > 0

p1 = proportion of the population of households “aware” of the product after the new campaign

p2 = proportion of the population of households “aware” of the product before the new campaign

16 Slide



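2. Specify the level of significance. $\alpha$ = .05

3. Compute the value of the test statistic.

$$\bar{p} = \frac{250(.48) + 150(.40)}{250 + 150} = \frac{180}{400} = .45$$

$$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{.45(.55)\left(\frac{1}{250} + \frac{1}{150}\right)} = .0514$$

$$z = \frac{(.48 - .40) - 0}{.0514} = \frac{.08}{.0514} = 1.56$$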
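The arithmetic in step 3 can be checked with a minimal Python sketch. The helper name `pooled_z` is hypothetical. It pools the raw counts directly, which is algebraically the same as weighting the two sample proportions by their sample sizes.

```python
from math import sqrt


def pooled_z(x1, n1, x2, n2):
    """Pooled proportion, pooled standard error, and z for H0: p1 - p2 = 0."""
    p1_bar, p2_bar = x1 / n1, x2 / n2
    # (x1 + x2) / (n1 + n2) equals (n1*p1_bar + n2*p2_bar) / (n1 + n2).
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return p_pooled, se, (p1_bar - p2_bar) / se


p_pooled, se, z = pooled_z(120, 250, 60, 150)
print(f"pooled p = {p_pooled:.2f}, standard error = {se:.4f}, z = {z:.2f}")
```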

17 Slide



Using the Critical Value Approach

4. Determine the critical value and rejection rule. For $\alpha$ = .05, $z_{.05}$ = 1.645. Reject H0 if z ≥ 1.645.

5. Compare the test statistic with the critical value. Because 1.56 < 1.645, we cannot reject H0.

We cannot conclude that the proportion of households aware of the client’s product increased after the new campaign.

18 Slide



Using the p-Value Approach

4. Compute the p-value. For z = 1.56, the p-value = .0594.

5. Compare the p-value with the significance level. Because p-value > $\alpha$ = .05, we cannot reject H0.

We cannot conclude that the proportion of households aware of the client’s product increased after the new campaign.
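Both the critical value and the p-value for this right-tailed test can be obtained without a table. The sketch below assumes the rounded statistic z = 1.56 from the worked example and uses `statistics.NormalDist` from the Python standard library. The variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
z = 1.56  # test statistic from the worked example, rounded as on the slide

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
critical_value = std_normal.inv_cdf(1 - alpha)  # upper-tail critical value
p_value = 1 - std_normal.cdf(z)                 # area to the right of z

print(f"critical value = {critical_value:.3f}")
print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "cannot reject H0")
```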

19 Slide


Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population

1. Set up the null and alternative hypotheses.

2. Select a random sample and record the observed frequency, fi , for each of the k categories.

3. Assuming H0 is true, compute the expected frequency, ei , in each category by multiplying the category probability by the sample size.

20 Slide


4. Compute the value of the test statistic.

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

where:

fi = observed frequency for category i
ei = expected frequency for category i
k = number of categories

Note: The test statistic has a chi-square distribution with k - 1 df provided that the expected frequencies are 5 or more for all categories.


21 Slide


5. Rejection rule:

p-value approach: Reject H0 if p-value $\le \alpha$

Critical value approach: Reject H0 if $\chi^2 \ge \chi^2_{\alpha}$

where $\alpha$ is the significance level and there are k - 1 degrees of freedom


22 Slide


Multinomial Distribution Goodness of Fit Test

Example: Finger Lakes Homes manufactures four models of prefabricated homes, a two-story colonial, a log cabin, a split-level, and an A-frame. To help in production planning, management would like to determine if previous customer purchases indicate that there is a preference in the style selected.

23 Slide


The number of homes sold of each model for 100 sales over the past two years is shown below.

Model     Colonial   Log   Split-Level   A-Frame
# Sold       30       20       35          15


24 Slide


The Hypotheses

H0: pC = pL = pS = pA = .25

Ha: The population proportions are not pC = .25, pL = .25, pS = .25, and pA = .25

where:

pC = population proportion that purchase a colonial
pL = population proportion that purchase a log cabin
pS = population proportion that purchase a split-level
pA = population proportion that purchase an A-frame

25 Slide


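Rejection Rule

With $\alpha$ = .05 and k - 1 = 4 - 1 = 3 degrees of freedom, $\chi^2_{.05}$ = 7.815.

Reject H0 if p-value $\le$ .05 or $\chi^2 \ge$ 7.815.

[Figure: chi-square distribution with 3 degrees of freedom; “Do Not Reject H0” to the left of 7.815 and “Reject H0” to the right]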

26 Slide


Expected Frequencies

e1 = .25(100) = 25
e2 = .25(100) = 25
e3 = .25(100) = 25
e4 = .25(100) = 25

Test Statistic

$$\chi^2 = \frac{(30 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(35 - 25)^2}{25} + \frac{(15 - 25)^2}{25} = 1 + 1 + 4 + 4 = 10$$
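The expected frequencies and the test statistic for the Finger Lakes Homes data can be reproduced with a minimal Python sketch. The function name `chi_square_gof` is illustrative. It takes the observed counts and the category probabilities stated in H0.

```python
def chi_square_gof(observed, probabilities):
    """Expected frequencies and chi-square goodness-of-fit statistic."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return expected, statistic


# Colonial, log cabin, split-level, A-frame: 100 sales in total.
observed = [30, 20, 35, 15]
expected, chi2 = chi_square_gof(observed, [0.25] * 4)
print(expected, chi2)
```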

27 Slide


Conclusion Using the Critical Value Approach

$\chi^2$ = 10 > 7.815

We reject, at the .05 level of significance, the assumption that there is no home style preference.

28 Slide



Conclusion Using the p-Value Approach

Area in Upper Tail           .10     .05     .025     .01     .005
$\chi^2$ Value (df = 3)     6.251   7.815   9.348   11.345   12.838

Because $\chi^2$ = 10 is between 9.348 and 11.345, the area in the upper tail of the distribution is between .025 and .01.

The p-value is less than $\alpha$ = .05. We can reject the null hypothesis.
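Rather than bracketing the p-value from the table, the upper-tail area can be computed directly. The sketch below is an assumption-laden shortcut: it relies on the closed-form tail of the chi-square distribution that holds only for exactly 3 degrees of freedom, and the function name `chi2_upper_tail_df3` is hypothetical. For other degrees of freedom a statistics library would be the more general tool.

```python
from math import erfc, exp, pi, sqrt


def chi2_upper_tail_df3(x):
    """Upper-tail area of a chi-square distribution with exactly 3 df."""
    # Closed form for 3 df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


p_value = chi2_upper_tail_df3(10)
print(f"p-value = {p_value:.4f}")
```

The result falls between .01 and .025, in agreement with the table lookup.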

29 Slide



Test of Independence: Contingency Tables

1. Set up the null and alternative hypotheses.

2. Select a random sample and record the observed frequency, fij , for each cell of the contingency table.

3. Compute the expected frequency, eij , for each cell.

$$e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Sample Size}}$$

30 Slide


4. Compute the test statistic.

$$\chi^2 = \sum_{i}\sum_{j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$

5. Determine the rejection rule.

Reject H0 if p-value $\le \alpha$ or $\chi^2 \ge \chi^2_{\alpha}$

where $\alpha$ is the significance level and, with n rows and m columns, there are (n - 1)(m - 1) degrees of freedom.


31 Slide


Contingency Table (Independence) Test

Example: Each home sold by Finger Lakes Homes can be classified according to price and to style. Finger Lakes’ manager would like to determine if the price of the home and the style of the home are independent variables.

32 Slide


The number of homes sold for each model and price for the past two years is shown below. For convenience, the price of the home is listed as either $99,000 or less or more than $99,000.

Price        Colonial   Log   Split-Level   A-Frame
≤ $99,000       18        6       19           12
> $99,000       12       14       16            3


33 Slide


Hypotheses


H0: Price of the home is independent of the style of the home that is purchased

Ha: Price of the home is not independent of the style of the home that is purchased

34 Slide


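Expected Frequencies

The expected frequencies are computed from the row and column totals of the observed frequencies:

Price      Colonial   Log   Split-Level   A-Frame   Total
≤ $99K        18        6       19           12        55
> $99K        12       14       16            3        45
Total         30       20       35           15       100

For example, the expected frequency for colonial homes priced at $99K or less is (55)(30)/100 = 16.5.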
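The full set of expected frequencies follows from the row and column totals. The Python sketch below is illustrative: the dictionary layout and variable names are assumptions, and the row labels follow the earlier slide in which the "$99,000 or less" row holds 18, 6, 19, and 12.

```python
# Observed counts by price row; columns are Colonial, Log, Split-Level, A-Frame.
observed = {
    "<= $99K": [18, 6, 19, 12],
    "> $99K": [12, 14, 16, 3],
}

row_totals = {price: sum(counts) for price, counts in observed.items()}
column_totals = [sum(column) for column in zip(*observed.values())]
sample_size = sum(row_totals.values())

# e_ij = (row i total)(column j total) / sample size
expected = {
    price: [row_totals[price] * total / sample_size for total in column_totals]
    for price in observed
}

for price, values in expected.items():
    print(price, values)
```

Every expected frequency is at least 5, so the chi-square approximation is appropriate for this table.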

35 Slide


Rejection Rule

With $\alpha$ = .05 and (2 - 1)(4 - 1) = 3 d.f., $\chi^2_{.05}$ = 7.815.

Reject H0 if p-value $\le$ .05 or $\chi^2 \ge$ 7.815

Test Statistic

$$\chi^2 = \frac{(18 - 16.5)^2}{16.5} + \frac{(6 - 11)^2}{11} + \cdots + \frac{(3 - 6.75)^2}{6.75}$$

= .1364 + 2.2727 + . . . + 2.0833 = 9.149
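The eight terms of the test statistic can be summed with a short Python sketch. The function name `chi_square_independence` is hypothetical. It takes the observed table as a list of rows and recomputes the expected frequencies internally from the row and column totals.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            e = row_totals[i] * column_totals[j] / n  # expected frequency e_ij
            statistic += (f - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: $99K or less, more than $99K. Columns: Colonial, Log, Split-Level, A-Frame.
chi2, df = chi_square_independence([[18, 6, 19, 12], [12, 14, 16, 3]])
print(f"chi-square = {chi2:.3f}, df = {df}")
```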

36 Slide


Conclusion Using the Critical Value Approach

$\chi^2$ = 9.149 > 7.815

We reject, at the .05 level of significance, the assumption that the price of the home is independent of the style of home that is purchased.

37 Slide


Conclusion Using the p-Value Approach

Area in Upper Tail           .10     .05     .025     .01     .005
$\chi^2$ Value (df = 3)     6.251   7.815   9.348   11.345   12.838

Because $\chi^2$ = 9.149 is between 7.815 and 9.348, the area in the upper tail of the distribution is between .05 and .025.

The p-value is less than $\alpha$ = .05. We can reject the null hypothesis.

