Ka-fu Wong © 2003 Chap 16- 1 Dr. Ka-fu Wong ECON1003 Analysis of Economic Data

Jan 17, 2016


Bryan Wade


Chapter Sixteen: Nonparametric Methods: Analysis of Ranked Data

GOALS
1. Conduct the sign test for dependent samples using the binomial distribution as the distribution of the test statistic.
2. Conduct the sign test for dependent samples using the normal distribution as the distribution of the test statistic.
3. Conduct a test of hypothesis for the population median.
4. Conduct a test of hypothesis for dependent samples using the Wilcoxon signed-rank test.
5. Conduct the Wilcoxon rank-sum test for independent samples.
6. Conduct the Kruskal-Wallis test for several independent samples.
7. Compute and interpret Spearman’s coefficient of rank correlation.
8. Conduct a test of hypothesis to determine whether the correlation among the ranks in the population is different from zero.


Tests based on signs and ranks

Some of the tests we discussed earlier can also be conducted using only the signs and ranks of the data.

Tests based on signs and ranks can deal with a wider range of data types. Most of the tests we have discussed deal mainly with ratio (or interval) level data; rank-based and sign-based tests can also handle ordinal level data.

They also require fewer distributional assumptions. Some of the tests we discussed earlier require normality, and sometimes equal variances across populations.

Tests based on signs and ranks are therefore known as nonparametric tests: they require fewer parametric assumptions.


The Sign Test

The sign test is based on the sign of the difference between two related observations. No assumption is necessary regarding the shape of the population of differences. The binomial distribution is used as the distribution of the test statistic for small samples, and the standard normal (z) distribution for large samples.

The test requires dependent (related) samples.

Recall that in Chapter 11 we tested the difference between paired samples using the mean of the differences across pairs. A t statistic was used for that test.


Applications of the Sign Test

Tests of “before/after” experiments, for example:

Have sales increased after the introduction of a new marketing strategy?

Have general prices fallen after the outbreak of SARS in Hong Kong?

Have stock prices fallen after the outbreak of SARS in Hong Kong?


The Sign Test procedure

Procedure to conduct the test:
1. Determine the sign of the difference between each related pair.
2. Determine the number of usable pairs, that is, pairs whose difference is not zero (ties are dropped).
3. Compare the number of positive (or negative) differences to the critical value.

Idea of the test: each usable pair is classified as positive (success) or negative (failure). Hence the number of positive differences follows a binomial distribution, and similarly for the number of negative differences.

If we test for no change from one group to the other (or from before to after), the probability of a success (positive) in any single draw under the null is π = .5. In a sample of n usable pairs (no ties), the probability of observing X = x positives can be computed using the binomial probability formula

$$P(X = x) = \binom{n}{x}\pi^{x}(1-\pi)^{n-x} = \binom{n}{x}(.5)^{n}$$
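As a minimal sketch (Python, standard library only; the helper name `binom_cdf` is our own), the exact binomial probabilities for the sign test under H0: π = .5 can be computed directly. With n = 8 usable pairs, as in Example 1 below, P(X ≤ 1) = 9/256 ≈ 0.035 while P(X ≤ 2) = 37/256 ≈ 0.145, so an exact one-tailed test at the .05 level rejects H0 when X is 0 or 1.

```python
from math import comb

def binom_cdf(x: int, n: int, p: float = 0.5) -> float:
    """P(X <= x) for X ~ Binomial(n, p); p = .5 under the sign-test null."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

# Lower-tail probabilities for n = 8 usable pairs under H0: pi = .5
n = 8
for x in range(4):
    print(x, round(binom_cdf(x, n), 4))
```

The lower tail is the relevant one when H1 predicts few signs of one kind; for the other direction, use 1 - binom_cdf(x - 1, n).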


Normal Approximation

If both nπ and n(1 − π) are greater than 5, we can use the z distribution as an approximation to the binomial distribution, with a continuity correction (see Chapter 7).

If the number of pluses or minuses, X, is more than n/2, then

$$z = \frac{(X - .5) - .5n}{.5\sqrt{n}}$$

If the number of pluses or minuses, X, is less than n/2, then

$$z = \frac{(X + .5) - .5n}{.5\sqrt{n}}$$
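The two formulas above can be folded into one function. This is a sketch under our own naming (`sign_test_z`); the case X = n/2 is not covered by the formulas, and returning 0 there is our assumption.

```python
import math

def sign_test_z(x: int, n: int) -> float:
    """Normal approximation to the sign test with a continuity correction.

    x is the number of pluses (or minuses) among n usable pairs; pi = .5 under H0.
    """
    if x > n / 2:
        corrected = x - 0.5   # shift x half a unit toward n/2
    elif x < n / 2:
        corrected = x + 0.5
    else:
        return 0.0            # assumption: x == n/2 gives no evidence against H0
    return (corrected - 0.5 * n) / (0.5 * math.sqrt(n))

# Counts from Example 2: 170 of 300 tickets below the proposed median
print(round(sign_test_z(170, 300), 3))
```

Counting the other sign instead (130 of 300) gives the same z with the opposite sign, which is why either count can be used in a two-sided test.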


EXAMPLE 1

The Gagliano Research Institute for Business Studies is comparing research and development (R&D) expense as a percent of income for a sample of glass manufacturing firms for 2000 and 2001. At the .05 significance level, has R&D expense declined? Use the sign test.

R&D expense as a percent of income:

Company         2000   2001
Savoth Glass      20     16
Ruisi Glass       14     13
Rubin Inc.        23     20
Vaught            24     17
Lambert Glass     31     22
Pimental          22     20
Olson Glass       14     20
Flynn Glass       18     11


EXAMPLE 1 continued

Company         2000   2001   Difference   Sign
Savoth Glass      20     16        4         +
Ruisi Glass       14     13        1         +
Rubin Inc.        23     20        3         +
Vaught            24     17        7         +
Lambert Glass     31     22        9         +
Pimental          22     20        2         +
Olson Glass       14     20       -6         -
Flynn Glass       18     11        7         +

First, compute the differences (2000 minus 2001) and determine the signs.
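A minimal sketch of this first step in Python (the dictionary layout and variable names are our own): each difference is 2000 minus 2001, so a plus sign marks a decline. It reports 7 plus signs, 1 minus sign, and 8 usable pairs.

```python
# Example 1 data: R&D expense as a percent of income, (2000, 2001)
data = {
    "Savoth Glass": (20, 16),
    "Ruisi Glass": (14, 13),
    "Rubin Inc.": (23, 20),
    "Vaught": (24, 17),
    "Lambert Glass": (31, 22),
    "Pimental": (22, 20),
    "Olson Glass": (14, 20),
    "Flynn Glass": (18, 11),
}

diffs = {firm: y2000 - y2001 for firm, (y2000, y2001) in data.items()}
plus = sum(1 for d in diffs.values() if d > 0)
minus = sum(1 for d in diffs.values() if d < 0)
usable = plus + minus  # pairs with a zero difference would be dropped here
print(plus, minus, usable)
```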


EXAMPLE 1 continued

Step 1: If the R&D expense remains more or less unchanged, the probability that a randomly drawn firm has a higher R&D expense in 2001 than in 2000 (a minus sign, since each difference is 2000 minus 2001) should be about 0.5. H0: π = .5

If the R&D expense has declined, the probability that a randomly drawn firm has a higher R&D expense in 2001 should be lower than 0.5. H1: π < .5


EXAMPLE 1 continued

Step 1: H0: π = .5

H1: π < .5

Step 2: The critical z value for this one-tailed test at the .05 level is −1.65. Solving $z = \frac{(X + .5) - .5n}{.5\sqrt{n}}$ for X with n = 8 gives the critical value $X = (-1.65)(.5)\sqrt{8} + .5(8) - .5 \approx 1.17$. Hence, based on the normal approximation, H0 is rejected if the number of negative signs is 0 or 1.

Step 3: There is one negative difference (and hence 7 positive differences); that is, the percent increased for only one company.

Step 4: H0 is rejected. We conclude that R&D expense as a percent of income declined from 2000 to 2001.
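The Step 2 arithmetic can be checked with a short sketch (variable names are our own). It inverts the X < n/2 formula to get the critical X and also computes z for the observed single minus sign. Note that here nπ = 4, below the rule of thumb of 5, so the normal approximation is rough; the exact binomial test gives the same rejection region of 0 or 1 minus signs, since P(X ≤ 1) = 9/256 ≈ 0.035.

```python
import math

n = 8            # usable pairs in Example 1
z_crit = -1.65   # one-tailed critical z at the .05 level, as used in the text

# Invert z = ((X + .5) - .5n) / (.5 sqrt(n)) to get the critical X
x_crit = z_crit * 0.5 * math.sqrt(n) + 0.5 * n - 0.5

x_obs = 1        # observed number of minus signs (firms whose percent rose)
z_obs = ((x_obs + 0.5) - 0.5 * n) / (0.5 * math.sqrt(n))

print(round(x_crit, 3), round(z_obs, 3))
print("reject H0" if z_obs < z_crit else "do not reject H0")
```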


Using sign test to test a hypothesis about a Median

When testing the value of the median, we use the normal approximation to the binomial distribution. Any observation that is above the proposed median is classified as positive (success), and any observation below it as negative (failure).

If the proposed median is correct, the positives should make up about 50% of the sample. Hence the number of positives follows a binomial distribution with π = .5, and similarly for the number of negatives.

As before, the probability of a success (positive) in any single draw under the null is π = .5. In a sample of n usable observations (observations equal to the proposed median are dropped as ties), the probability of observing X positives can be computed using the binomial probability formula.

When the sample size is large, the z distribution is used as an approximation, with a continuity correction.
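Classifying raw observations against a proposed median can be sketched as below (the helper `median_signs` and the fare values are hypothetical, made up only to exercise the code; they are not data from this chapter). The resulting count of pluses or minuses is what enters the z formula with the continuity correction.

```python
def median_signs(values, median):
    """Count observations above (+) and below (-) a proposed median.

    Observations equal to the median are ties and are dropped.
    Returns (number of pluses, number of minuses, usable n).
    """
    plus = sum(1 for v in values if v > median)
    minus = sum(1 for v in values if v < median)
    return plus, minus, plus + minus

# Hypothetical airfares ($), for illustration only
fares = [380, 450, 520, 410, 610, 300]
print(median_signs(fares, 450))  # the $450 fare is a tie and is dropped
```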


EXAMPLE 2

The Gordon Travel Agency claims that the median airfare for all its clients to all destinations is $450. This claim is being challenged by a competing agency, which believes the median is different from $450. A random sample of 300 tickets revealed that 170 tickets were below $450. Use the .05 level of significance.


Example 2 Continued

H0: median = $450 versus H1: median ≠ $450

An observation above the proposed median receives a positive sign; one below it receives a negative sign. Because we have a two-sided alternative, at the .05 significance level H0 is rejected if z is less than −1.96 or greater than 1.96.

Here X = 170, the number of negative signs (tickets below $450), is more than n/2 = 150, so

$$z = \frac{(X - .5) - .5n}{.5\sqrt{n}} = \frac{(170 - .5) - .5(300)}{.5\sqrt{300}} = 2.252$$

Because z is larger than 1.96, H0 is rejected. We conclude that the median is not $450.
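The same result can be expressed as a two-sided p-value. This sketch assumes, as the example implies, that no sampled ticket was exactly $450, so all 300 observations are usable; it uses the identity P(|Z| ≥ |z|) = erfc(|z|/√2) from Python's `math` module. The p-value comes out at roughly 0.024, below .05, consistent with rejecting H0.

```python
import math

n = 300   # tickets sampled (assumed: none exactly at $450)
x = 170   # tickets below $450, i.e., minus signs; x > n/2
z = ((x - 0.5) - 0.5 * n) / (0.5 * math.sqrt(n))

# Two-sided p-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))
print(round(z, 3), round(p_value, 4))
```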


Wilcoxon Signed-Rank Test

If the normality assumption of the paired t test (recall Chapter 11) is violated, the paired t test is not appropriate.

The Wilcoxon signed-rank test does not assume normality and hence can be used in this situation.

The test requires the ordinal scale of measurement.

The observations must be related or dependent.

It can serve as an alternative to the sign test for testing “no change” in “before/after” experiments.


Wilcoxon Signed-Rank Test

The steps for the test are:
1. Compute the differences between related observations (pairs with a zero difference are dropped).
2. Rank the absolute differences from low to high (tied absolute differences receive the average of the ranks they occupy).
3. Return the signs to the ranks and sum the positive and negative ranks separately. If there is no change after the experiment, the ranks of the absolute differences reflect only random errors, so the two rank sums should be close.
4. Compare the smaller of the two rank sums with the critical T value obtained from Appendix H.
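The four steps above can be sketched in Python as follows (the helper names are our own; dropping zero differences and averaging tied ranks are the usual conventions, assumed here). Run on the Example 1 data, it gives R+ = 31 and R- = 5.

```python
def average_ranks(values):
    """Rank values from low (1) to high, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def signed_rank_sums(before, after):
    """Return (R+, R-) for the Wilcoxon signed-rank test on paired data."""
    diffs = [b - a for b, a in zip(before, after) if b != a]  # step 1, drop zeros
    ranks = average_ranks([abs(d) for d in diffs])            # step 2
    r_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)    # step 3
    r_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return r_plus, r_minus

# Example 1 data: 2000 ("before") and 2001 ("after") percents
y2000 = [20, 14, 23, 24, 31, 22, 14, 18]
y2001 = [16, 13, 20, 17, 22, 20, 20, 11]
r_plus, r_minus = signed_rank_sums(y2000, y2001)
print(r_plus, r_minus, min(r_plus, r_minus))  # step 4 compares the smaller sum with T
```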


EXAMPLE 3

Use the Wilcoxon matched-pair signed-rank test to determine if the R&D expenses as a percent of income (EXAMPLE 1) have declined. Use the .05 significance level.

Step 1: H0: The percent stayed the same.

H1: The percent declined.

Step 2: H0 is rejected if the smaller of the rank sums is less than or equal to 5. See Appendix H.


Example 3 Continued

Company         2000   2001   Diff   Abs-diff   Rank    R+     R-
Savoth Glass      20     16      4       4        4      4      *
Ruisi Glass       14     13      1       1        1      1      *
Rubin Inc.        23     20      3       3        3      3      *
Vaught            24     17      7       7       6.5    6.5     *
Lambert Glass     31     22      9       9        8      8      *
Pimental          22     20      2       2        2      2      *
Olson Glass       14     20     -6       6        5      *      5
Flynn Glass       18     11      7       7       6.5    6.5     *
Sum                                                     31      5

(* marks an empty cell. The two absolute differences of 7 share the average of ranks 6 and 7.)

The smaller rank sum is 5, which is equal to the critical value of T. H0 is rejected. The percent has declined from one year to the next.
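Why is 5 the critical value for n = 8? Under H0 each rank is equally likely to carry a plus or a minus sign, so all 2^8 = 256 sign patterns are equally likely. The sketch below (our own helper, assuming n untied ranks 1..n; the tie in this example changes the exact distribution slightly) enumerates them: P(T ≤ 5) = 10/256 ≈ 0.039 is at most .05, while P(T ≤ 6) = 14/256 ≈ 0.055 is not.

```python
from itertools import product

def prob_t_at_most(n: int, t: float) -> float:
    """Exact P(T <= t) under H0 for the signed-rank sum T with n untied ranks.

    T is the sum of the ranks that carry a minus sign; every one of the
    2**n sign patterns is equally likely under H0.
    """
    hits = 0
    for signs in product((False, True), repeat=n):
        t_minus = sum(rank for rank, neg in zip(range(1, n + 1), signs) if neg)
        if t_minus <= t:
            hits += 1
    return hits / 2 ** n

print(prob_t_at_most(8, 5))  # tail probability at the tabled critical value
print(prob_t_at_most(8, 6))  # one step beyond it
```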


Wilcoxon Rank-Sum Test

The Wilcoxon rank-sum test is used to determine whether two independent samples came from the same (identical) populations.

No assumption about the shape of the population is required.

The data must be at least ordinal scale.

Each sample must contain at least eight observations.

If the two samples are from the same population, the average ranks of the two samples should be about the same.

Recall that a similar t test in Chapter 11 requires the data to follow a normal distribution with equal population variances.


Wilcoxon Rank-Sum Test

To determine the value of the test statistic W, all data values are ranked from low to high as if they were from a single population.

The sum of ranks for each of the two samples is determined.

The sum of ranks of the first sample, W, is used to compute the test statistic:

$$z = \frac{W - \frac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$

where $n_1$ is the size of the first sample, $n_2$ is the size of the second sample, and $n_1(n_1 + n_2 + 1)/2$ is the implied sum of ranks for the first sample under the null.

For a one-sided test, the first sample (and hence W) has to be chosen to be consistent with the alternative hypothesis.
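A minimal sketch of the rank-sum computation (helper names are our own; tied values receive average ranks). Fed the repair costs of Example 4 with Ford as the first sample and the $157.90 cost in the Chevy sample, which is the split that reproduces the rank sums 81.5 and 71.5 used in that example, it returns W = 81.5 and z ≈ 0.914.

```python
import math

def average_ranks(values):
    """Rank values from low (1) to high, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def rank_sum_z(first, second):
    """Return (W, z) for the Wilcoxon rank-sum test; W is the rank sum of `first`."""
    n1, n2 = len(first), len(second)
    ranks = average_ranks(list(first) + list(second))  # rank the pooled sample
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2                  # implied rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return w, (w - mean_w) / sd_w

# Example 4 repair costs ($)
ford = [25.31, 33.68, 46.89, 51.83, 87.65, 87.90, 90.89, 120.67]
chevy = [14.89, 20.31, 25.97, 33.68, 68.98, 78.23, 80.31, 81.75, 157.90]
w, z = rank_sum_z(ford, chevy)
print(w, round(z, 3))
```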


EXAMPLE 4

Hills Community College purchased two vehicles, a Ford and a Chevy, for the administration’s use when traveling. The repair costs for the two cars over the last three years are shown below. At the .05 significance level, is there a difference in the two distributions?

Ford ($)    Chevy ($)
25.31       14.89
33.68       20.31
46.89       25.97
51.83       33.68
87.65       68.98
87.90       78.23
90.89       80.31
120.67      81.75
            157.90


EXAMPLE 4 continued

Ford ($)    Rank    Chevy ($)    Rank
25.31        3.0    14.89         1.0
33.68        5.5    20.31         2.0
46.89        7.0    25.97         4.0
51.83        8.0    33.68         5.5
87.65       13.0    68.98         9.0
87.90       14.0    78.23        10.0
90.89       15.0    80.31        11.0
120.67      16.0    81.75        12.0
                    157.90       17.0
Sum of ranks 81.5                71.5

First, rank the combined sample and compute the sum of ranks separately for the two samples.


EXAMPLE 4 continued

Step 1: H0: The repair costs are the same.

H1: The repair costs are not the same.

Because it is a two-sided test, the rank sum of either sample (the smaller or the larger) can be used as W.

Step 2: H0 is rejected if z > 1.96 or z < −1.96.

If instead we had H0: The repair costs are the same, and H1: The repair costs are lower for Ford than for Chevy, we should reject the null in favor of the alternative when the average rank of the Ford sample is lower than that of the Chevy sample. Choosing Ford as the first sample makes this correspond to a small (large negative) z, so we choose Ford as the first sample. See the textbook for an additional example.


Example 4 continued

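Step 3: The value of the test statistic is 0.914.

$$z = \frac{W - \frac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} = \frac{81.5 - \frac{8(8 + 9 + 1)}{2}}{\sqrt{\frac{8(9)(8 + 9 + 1)}{12}}} = 0.914$$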

Step 4: We do not reject the null hypothesis. We cannot conclude that there is a difference in the distributions of the repair costs of the two vehicles.


Kruskal-Wallis Test: Analysis of Variance by Ranks

This test is used to compare three or more samples to determine whether they came from identical populations. The ordinal scale of measurement is required. It is an alternative to one-way ANOVA. The test statistic approximately follows the chi-square distribution with degrees of freedom equal to the number of samples minus 1.

Each sample should have at least five observations.

The sample data is ranked from low to high as if it were a single group.


Kruskal-Wallis Test: Analysis of Variance by Ranks continued

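The test statistic is given by:

$$H = \frac{12}{n(n+1)}\left[\frac{(\Sigma R_1)^2}{n_1} + \frac{(\Sigma R_2)^2}{n_2} + \cdots + \frac{(\Sigma R_k)^2}{n_k}\right] - 3(n+1)$$

where $\Sigma R_j$ is the sum of the ranks in sample $j$, $n_j$ is the size of sample $j$, $k$ is the number of samples, and $n$ is the total number of observations.

If the samples come from the same population, the average rank $\Sigma R_j / n_j$ should be about the same in every sample, and H will be small.

If the samples are not from the same population, some rank sums will be far from what the null implies, the bracketed sum of squared rank sums grows, and H becomes large.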
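The formula for H translates directly into code. In this sketch (helper names are our own) the pooled data are ranked with average ranks for ties and, like the formula above, no tie correction is applied. With the salary data of Example 5 it returns H ≈ 5.949.

```python
def average_ranks(values):
    """Rank values from low (1) to high, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H from the rank sums of each group (no tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)          # rank all data as a single group
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r_sum = sum(ranks[start:start + len(g)])
        total += r_sum ** 2 / len(g)       # (sum of ranks)^2 / group size
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Example 5: percent salary increase by plant
millville = [2.2, 3.6, 4.9, 6.8, 7.1]
camden = [1.9, 2.7, 3.1, 6.9, 8.3]
eaton = [3.7, 4.5, 7.1, 9.3, 11.6]
vineland = [5.7, 6.8, 8.9, 11.6, 13.9]
h = kruskal_wallis_h(millville, camden, eaton, vineland)
print(round(h, 3))
```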


EXAMPLE 5

Keely Ambrose, director of Human Resources for Miller Industries, wishes to study the percent increase in salary for middle managers at the four manufacturing plants. She gathers a sample of managers and determines the percent increase in salary from last year to this year. At the 5% significance level, can Keely conclude that there is a difference in the percent increases for the various plants?


EXAMPLE 5 continued

Percent increase in salary, with ranks in the combined sample:

Millville   Rank   Camden   Rank   Eaton   Rank   Vineland   Rank
2.2          2.0   1.9       1.0   3.7      6.0   5.7         9.0
3.6          5.0   2.7       3.0   4.5      7.0   6.8        10.5
4.9          8.0   3.1       4.0   7.1     13.5   8.9        16.0
6.8         10.5   6.9      12.0   9.3     17.0   11.6       18.5
7.1         13.5   8.3      15.0   11.6    18.5   13.9       20.0
Sum of ranks 39.0           35.0           62.0              74.0

First, rank the combined sample.


EXAMPLE 5 continued

Step 1: H0: The populations are the same.

H1: The populations are not the same.

Step 2: H0 is rejected if χ² (that is, H) is greater than 7.815, the critical value of the chi-square distribution with 4 − 1 = 3 degrees of freedom at the .05 significance level.


Example 5 continued

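$$H = \frac{12}{n(n+1)}\left[\frac{(\Sigma R_1)^2}{n_1} + \frac{(\Sigma R_2)^2}{n_2} + \frac{(\Sigma R_3)^2}{n_3} + \frac{(\Sigma R_4)^2}{n_4}\right] - 3(n+1)$$

$$= \frac{12}{20(20+1)}\left[\frac{(39)^2}{5} + \frac{(35)^2}{5} + \frac{(62)^2}{5} + \frac{(74)^2}{5}\right] - 3(20+1) = 5.949$$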

Because H = 5.949 is less than the critical value, the null hypothesis is not rejected. We cannot conclude that there is a difference in the percent increases among the four plants.
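The chi-square tail probabilities behind this decision can be checked without statistical tables. For 3 degrees of freedom the upper tail has the closed form P(χ² > x) = erfc(√(x/2)) + √(2x/π)·e^(−x/2); the sketch below (function name our own) uses it to confirm that 7.815 cuts off about 5% and that H = 5.949 has a p-value of roughly 0.11.

```python
import math

def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(chi-square with 3 df > x), closed form for 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

print(round(chi2_sf_df3(7.815), 4))  # tail area at the .05 critical value
print(round(chi2_sf_df3(5.949), 3))  # p-value for H in Example 5
```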



Rank-Order Correlation

Spearman’s coefficient of rank correlation reports the association between two sets of ranked observations. The features are:

It can range from –1.00 up to 1.00.

It is similar to Pearson’s coefficient of correlation, but is based on ranked data.


Spearman Coefficient of Rank Correlation

The formula to find the coefficient of rank correlation is:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where d is the difference in the ranks and n is the number of observations.
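A minimal sketch of the formula (function name our own), assuming the inputs are already ranks without ties, which is when this shortcut formula is exact. With the coaches' and writers' rankings of Example 6 it gives Σd² = 8 and r_s = 1 − 48/504 ≈ 0.905.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's rank correlation from two lists of (untied) ranks."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))  # sum of squared rank differences
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Example 6: Maryland, NC State, NC, Virginia, Clemson, Wake Forest, Duke, Florida State
coaches = [2, 3, 6, 5, 4, 7, 8, 1]
writers = [3, 4, 6, 5, 2, 8, 7, 1]
print(round(spearman_rs(coaches, writers), 3))
```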


Testing the Significance of rs

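State the null hypothesis: Rank correlation in the population is 0.

State the alternative hypothesis: Rank correlation in the population is not 0.

The value of the test statistic is computed from the formula:

$$t = r_s\sqrt{\frac{n - 2}{1 - r_s^2}}$$

which follows a t distribution with n − 2 degrees of freedom under the null hypothesis.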
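The test statistic is a one-liner; this sketch (function name our own) reproduces the Example 6 figure t ≈ 5.211 from the rounded r_s = 0.905 with n = 8, to be compared with a t critical value on n − 2 = 6 degrees of freedom. Using the unrounded r_s ≈ 0.9048 gives t ≈ 5.20 instead, a difference due only to rounding.

```python
import math

def spearman_t(rs: float, n: int) -> float:
    """t statistic for H0: population rank correlation is 0 (n - 2 degrees of freedom)."""
    return rs * math.sqrt((n - 2) / (1 - rs ** 2))

t = spearman_t(0.905, 8)
print(round(t, 3))
```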


Example 6

Below are the pre-season football rankings for the Atlantic Coast Conference by the coaches and sports writers. Determine the coefficient of rank correlation between the two groups.

School          Coaches   Writers
Maryland           2         3
NC State           3         4
NC                 6         6
Virginia           5         5
Clemson            4         2
Wake Forest        7         8
Duke               8         7
Florida State      1         1


Example 6 Continued

School          Coaches   Writers    d    d²
Maryland           2         3      -1    1
NC State           3         4      -1    1
NC                 6         6       0    0
Virginia           5         5       0    0
Clemson            4         2       2    4
Wake Forest        7         8      -1    1
Duke               8         7       1    1
Florida State      1         1       0    0
Total                                     8

Compute the differences in ranks and their squares.


Example 6 Continued

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(8)}{8(8^2 - 1)} = 0.905$$

There is a strong correlation between the ranks of the coaches and the sports writers.

$$t = r_s\sqrt{\frac{n - 2}{1 - r_s^2}} = 0.905\sqrt{\frac{8 - 2}{1 - 0.905^2}} = 5.211$$

Since the test statistic is larger than the critical value (2.447, from the t distribution with 6 degrees of freedom for a two-tailed test at the .05 level), the null hypothesis of zero correlation in the population is rejected.


- END -
