BINF702 SPRING 2014 - George Mason University


Chapter 9 - Nonparametric Methods


Why Nonparametric Methods?

Our previous estimation and hypothesis-testing methods assumed knowledge of the underlying distribution of the data.

Such methods are called parametric statistical methods.

Nonparametric methods make fewer assumptions about the underlying distribution from which the data were drawn.


Section 9.1 Introduction (Data Types)

Def. 9.1 – Cardinal data are data that are on a scale where it is meaningful to measure the distance between possible data values.

Ex.

Height, weight, cholesterol level

Def. 9.2 – For cardinal data, if the zero point is arbitrary, then the data are on an interval scale; if the zero point is fixed, then the data are on a ratio scale.

Ex.

Temperature in degrees Celsius or Fahrenheit is on an interval scale; temperature in Kelvin is on a ratio scale

Height is on a ratio scale


Ordinal and Nominal Data

Def. 9.3 – Ordinal data are data that can be ordered but do not have specific numeric values. Thus, common arithmetic cannot be performed on ordinal data in a meaningful way.

Ex.

Cancer classification (normal, mild, moderate, severe)

Def. 9.4 – Data are on a nominal scale if different data values can be classified into categories but the categories have no specific ordering.

Ex.

Disease names


Summary of the Types of Data


Section 9.2 The Sign Test

Example 9.7 – Dermatology. Suppose we wish to compare the effectiveness of two ointments (A, B) in reducing excessive redness in people who cannot otherwise be exposed to sunlight. Ointment A is randomly applied to the left arm or right arm, and ointment B is applied to the corresponding area on the other arm. The person is then exposed to 1 hour of sunlight and the two arms are compared for degree of redness. Suppose only the following qualitative assessments can be made:

The A arm is not as red as the B arm.

The B arm is not as red as the A arm.

The arms are equally red.

Of 45 people tested with the condition, 22 are better off on the A arm, 18 are better off on the B arm, and 5 are equally well off on both arms. How can we decide if this evidence is sufficient to conclude that ointment A is better than ointment B?


Section 9.2.1 Normal-Theory Method

Let xi = degree of redness on the A arm and yi = degree of redness on the B arm for the i-th person, and consider di = xi – yi. We test the hypothesis H0: D = 0 versus H1: D ≠ 0, where D = the population median of the di, i.e., the 50th percentile of the underlying distribution of the di. We can’t observe the di themselves, but we can observe C = the number of people for whom di > 0.

Under H0, P(di > 0) = ½. We will form a binomial-based test statistic using the normal approximation to the binomial. Hence we require npq ≥ 5, i.e., n(1/2)(1/2) ≥ 5, or n ≥ 20.


Section 9.2.1 The Sign Test (Normal Theory Methods)

Eq. 9.1 – The Sign Test. To test the hypothesis H0: D = 0 versus H1: D ≠ 0, where the number of nonzero di’s = n ≥ 20 and C = number of di’s where di > 0, if

$$C > c_2 = \frac{n}{2} + \frac{1}{2} + z_{1-\alpha/2}\sqrt{n/4} \quad \text{or} \quad C < c_1 = \frac{n}{2} - \frac{1}{2} - z_{1-\alpha/2}\sqrt{n/4}$$

then H0 is rejected. Otherwise H0 is accepted.
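The rejection region in Eq. 9.1 can be sketched in a few lines of R. This is only an illustration using the counts from Example 9.7 (18 people with di > 0 out of 40 nonzero differences); the variable names and the choice alpha = 0.05 are assumptions made here, not part of the original slides.

```r
# Critical values for the two-sided sign test (Eq. 9.1), normal-theory method.
# Counts are taken from Example 9.7; alpha = 0.05 is an assumed level.
n     <- 40      # number of nonzero differences (22 + 18)
C     <- 18      # number of d_i > 0 (A arm redder than B arm)
alpha <- 0.05

z  <- qnorm(1 - alpha / 2)            # z_{1 - alpha/2}
c1 <- n / 2 - 1 / 2 - z * sqrt(n / 4) # lower critical value
c2 <- n / 2 + 1 / 2 + z * sqrt(n / 4) # upper critical value

reject <- (C < c1) || (C > c2)        # TRUE means H0 is rejected
print(c(c1 = c1, c2 = c2))
print(reject)
```

With these inputs C lies between the two critical values, so H0 would not be rejected at this level.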


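Section 9.2.1 The Sign Test (Normal Theory Methods)

Eq. 9.2 Computation of the p-Value for the Sign Test (Normal-Theory Method)

$$p = \begin{cases} 2 \times \left[1 - \Phi\left(\dfrac{C - n/2 - .5}{\sqrt{n/4}}\right)\right] & \text{if } C > n/2 \\[2ex] 2 \times \Phi\left(\dfrac{C - n/2 + .5}{\sqrt{n/4}}\right) & \text{if } C < n/2 \\[2ex] 1.0 & \text{if } C = n/2 \end{cases}$$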


Section 9.2.1 The Sign Test (Normal Theory Methods)

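Alternate p-value computation

$$p = 2 \times \left[1 - \Phi\left(\frac{|C - D| - 1}{\sqrt{n}}\right)\right] \text{ if } C \neq D \quad \text{and} \quad p = 1.0 \text{ if } C = D$$

where C = number of di > 0 and D = number of di < 0.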
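Because n = C + D, we have C − n/2 = (C − D)/2, so the alternate formula is algebraically the same as Eq. 9.2. A minimal sketch checking this numerically with the Example 9.7 counts (the variable names are illustrative only):

```r
# Compare the alternate p-value formula with Eq. 9.2 for Example 9.7.
C <- 18          # number of d_i > 0
D <- 22          # number of d_i < 0
n <- C + D       # number of nonzero differences

# Alternate formula based on |C - D|
p.alt <- if (C == D) 1 else 2 * (1 - pnorm((abs(C - D) - 1) / sqrt(n)))

# Eq. 9.2, written case by case
if (C > n / 2) {
  p.eq92 <- 2 * (1 - pnorm((C - n / 2 - 0.5) / sqrt(n / 4)))
} else if (C < n / 2) {
  p.eq92 <- 2 * pnorm((C - n / 2 + 0.5) / sqrt(n / 4))
} else {
  p.eq92 <- 1
}

print(c(alternate = p.alt, eq9.2 = p.eq92))
```

Both values agree with the 0.6352563 reported for sign.approx(18, 22, 40).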


Example 9.8 (Eq. 9.2 in R)

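sign.approx <- function(C, D, n){
  # Two-sided p-value for the sign test, normal-theory method (Eq. 9.2).
  # C = number of d_i > 0, D = number of d_i < 0, n = C + D.
  # D is not used by Eq. 9.2; it is kept so that existing calls still work.
  denom <- sqrt(n/4)
  if(C > (n/2)){
    num <- C - (n/2) - .5
    p <- 2 * (1 - pnorm(num/denom))
  } else if(C < (n/2)){
    num <- C - (n/2) + .5
    p <- 2 * pnorm(num/denom)
  } else {
    # C == n/2: no evidence against H0
    p <- 1.0
  }
  return(p)
}

> source("F:/fall2004/binf702/sign.approx.R")

> sign.approx(18,22,40)

[1] 0.6352563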


Section 9.2.2 – Exact Method

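Eq. 9.3 Computation of the p-Value for the Sign Test (Exact Test, n < 20)

$$\text{If } C > n/2, \quad p = 2 \times \sum_{k=C}^{n} \binom{n}{k} \left(\frac{1}{2}\right)^{n}$$

$$\text{If } C < n/2, \quad p = 2 \times \sum_{k=0}^{C} \binom{n}{k} \left(\frac{1}{2}\right)^{n}$$

$$\text{If } C = n/2, \quad p = 1.0$$

This equation is a special case of Equation 7.44.

Where did this n come from?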


Section 9.2.2 – Exact Method

Suppose we want to compare two different types of eye drops (A, B) that are intended to prevent redness in people with hay fever. Drug A is randomly given to one eye and drug B to the other eye. The redness is noted at baseline and after 20 minutes by an observer who is not aware of which drug is administered to which eye. We find that for 15 people with an equal amount of redness in each eye at baseline, after 10 minutes the drug A eye is less red than the drug B eye for 8 people; the drug B eye is less red than the drug A eye for 2 people; and the eyes are equally red for 5 people. Assess the statistical significance of the results. Solve this using binom.test in R.
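One possible way to work the eye-drop exercise is sketched below. The helper sign.exact is a hypothetical name introduced here to mirror Eq. 9.3; it is compared with R's built-in binom.test. The 5 tied people are dropped, so n = 10 nonzero differences.

```r
# Exact sign test (Eq. 9.3): sign.exact is an illustrative helper, not from the slides.
sign.exact <- function(C, n) {
  if (C > n / 2) {
    p <- 2 * sum(dbinom(C:n, n, 0.5))   # upper tail, doubled
  } else if (C < n / 2) {
    p <- 2 * sum(dbinom(0:C, n, 0.5))   # lower tail, doubled
  } else {
    p <- 1
  }
  min(p, 1)
}

n.A.better <- 8                 # drug A eye less red
n.B.better <- 2                 # drug B eye less red
n <- n.A.better + n.B.better    # ties (5 people) are excluded

p.manual <- sign.exact(n.A.better, n)
p.binom  <- binom.test(n.A.better, n, p = 0.5)$p.value

print(c(manual = p.manual, binom.test = p.binom))   # both 0.109375
```

Since the null proportion is 1/2, the binomial distribution is symmetric and the two-sided p-value from binom.test coincides with the doubled tail of Eq. 9.3.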

Another Sign Test Example - I

The data are a subset of data reported by Ijzermans (1970) from an investigation into the susceptibility to corrosion of a stainless steel alloy. The data we will work with are the percentages of chromium in 12 samples of the alloy. We are interested in testing the hypothesis that the median percentage of chromium content is 18% against the alternative that it is not.

data <- read.table("table3_9",header=T)

data

Sample X.CR

1 1 17.4

2 2 17.9

3 3 17.6

4 4 18.1

5 5 17.6

6 6 18.9

7 7 16.9

8 8 17.5

9 9 17.8

10 10 17.4

11 11 24.6

12 12 26.0

cro <- data[,2]

mu <- 18 # hypothesized value

b <- sum(cro > mu) # test statistic

b

[1] 4


Another Sign Test Example - II

Can you solve this in R?
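A minimal sketch of one way to finish the chromium example, assuming the 12 values are typed in directly rather than read from the table3_9 file. The names m0 and n are introduced here for illustration.

```r
# Sign test for the median chromium percentage (H0: median = 18).
cro <- c(17.4, 17.9, 17.6, 18.1, 17.6, 18.9,
         16.9, 17.5, 17.8, 17.4, 24.6, 26.0)
m0  <- 18                    # hypothesized median

b <- sum(cro > m0)           # number of samples above 18 (test statistic)
n <- sum(cro != m0)          # samples equal to 18 would be dropped (none here)

res <- binom.test(b, n, p = 0.5, alternative = "two.sided")
print(res$p.value)           # 0.3876953
```

Four of the 12 samples exceed 18%, and the exact two-sided p-value is well above 0.05, so these data give no evidence against a median of 18%.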


Another Sign Example

Suppose an ophthalmologist reviews fundus photographs of 30 patients with macular degeneration both before and 3 months after receiving a laser treatment. To assess the efficacy of treatment, each patient is rated as improved, remained the same, or declined. If 20 patients improved, 7 declined, and 3 remained the same, then assess whether or not patients undergoing this treatment are showing significant change from baseline to 3 months afterward. Report a p-value.
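One way this exercise might be approached is sketched below. There are 27 nonzero changes, so the normal-theory method (n ≥ 20) applies; the exact binomial p-value is shown alongside for comparison. Variable names are illustrative.

```r
# Sign test for the macular degeneration example.
improved <- 20
declined <- 7
n <- improved + declined        # the 3 unchanged patients are excluded

# Exact two-sided p-value
p.exact <- binom.test(improved, n, p = 0.5)$p.value

# Normal-theory p-value using the |C - D| form of the test statistic
z        <- (abs(improved - declined) - 1) / sqrt(n)
p.normal <- 2 * (1 - pnorm(z))

print(c(exact = p.exact, normal = p.normal))
```

Both p-values come out at roughly 0.02, which would indicate a significant change from baseline at the 0.05 level.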



Section 9.3 The Wilcoxon Signed-Rank Test (A Nonparametric Analog of the Paired t-Test)

Ex. 9.10 Consider the data in Ex. 9.7 from a different perspective. We assumed that the only possible assessment was that the degree of sunburn with ointment A was either better or worse than that with ointment B. Suppose instead that the degree of burn can be quantified on a 10-point scale, with 10 being the worst burn and 1 being no burn at all. We can now compute di = xi – yi, where xi = degree of burn for ointment A and yi = degree of burn for ointment B. If di is positive, then ointment B is doing better than ointment A; if di is negative, then ointment A is doing better than ointment B. For example, if di = +5, then the degree of redness is 5 units greater on the ointment A arm than on the ointment B arm, whereas if di = -3, then the degree of redness is 3 units less on the ointment A arm than on the ointment B arm. How can this additional information be used to test if the ointments are equally effective?



We wish to test H0 : D = 0 vs. H1 : D != 0 where D = median score difference between the ointment A and ointment B arms. If D < 0, then ointment A is better; if D > 0, then ointment B is better. We will assume that the di have an underlying continuous distribution.

We can’t use a paired t-test because the data are ordinal.



Eq. 9.4 Ranking Procedure for the Wilcoxon Signed-Rank Test

1. Arrange the differences di in order of absolute value.

2. Count the number of differences with the same absolute value.

3. Ignore the observations where di = 0 and rank the remaining observations from 1, for the observations with the lowest absolute value, up to n, for the observations with the highest absolute value.

4. If there is a group of several observations with the same absolute value, then find the lowest rank in the range = 1 + R and the highest rank in the range = G + R, where R = the highest rank used prior to considering this group and G = the number of differences in the range of ranks for the group. Assign the average rank = (lowest rank in the range + highest rank in the range)/2 as the rank for each difference in the group.
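The ranking procedure above corresponds to R's rank function with average ranks for ties. A small sketch on a made-up vector of differences (the data here are hypothetical and chosen only to show zeros and tied groups):

```r
# Ranking for the signed-rank test on a hypothetical vector of differences.
d  <- c(2, -1, 0, 3, -1, 1, -2, 0, 1)

nz <- d[d != 0]                              # step 3: ignore d_i = 0
r  <- rank(abs(nz), ties.method = "average") # steps 1, 2, 4: average ranks for ties

# Four differences have |d| = 1: ranks 1..4, average (1 + 4) / 2 = 2.5
# Two differences have |d| = 2: ranks 5..6, average 5.5
print(data.frame(d = nz, rank = r))

R1 <- sum(r[nz > 0])                         # rank sum of the positive differences
print(R1)                                    # 17.5
```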



Ex. 9.11

From Table 9.1 we note that 14 people have a difference with absolute value 1. This group's ranks range from 1 to 14, with an average rank of (1 + 14) / 2 = 7.5

The group of 10 people with absolute value 2 has a rank range from (1 + 14) to (10 + 14) = 15 to 24 and an average rank = (15 + 24) / 2 = 19.5

Section 9.3 The Wilcoxon Signed-Rank Test (A Nonparametric Analog of the Paired t-Test)

Eq. 9.5 Wilcoxon Signed-Rank Test (Normal Approximation Method for Two-Sided Level α Test)

1. Rank the differences as shown in Eq. 9.4.

2. Compute the rank sum R1 of the positive differences.

3. Compute

$$T = \frac{\left| R_1 - \dfrac{n(n+1)}{4} \right| - \dfrac{1}{2}}{\sqrt{n(n+1)(2n+1)/24}}$$

if there are no ties (i.e., no groups of differences with the same absolute value).



Section 9.3 The Wilcoxon Signed-Rank Test (A Nonparametric Analog of the Paired t-Test) Eq. 9.5 (cont.)

3. Compute

$$T = \frac{\left| R_1 - \dfrac{n(n+1)}{4} \right| - \dfrac{1}{2}}{\sqrt{n(n+1)(2n+1)/24 - \sum_{i=1}^{g} (t_i^3 - t_i)/48}}$$

if there are ties, where $t_i$ refers to the number of differences with the same absolute value in the ith tied group and g is the number of tied groups.


Section 9.3 The Wilcoxon Signed-Rank Test (A Nonparametric Analog of the Paired t-Test) Eq. 9.5 (cont.)

4. If $T > z_{1-\alpha/2}$, then reject H0. Otherwise, accept H0.

5. The p-value for the test is given by $p = 2 \times [1 - \Phi(T)]$.

6. This test should be used only if the number of nonzero differences is >= 16 and if the difference scores have an underlying continuous symmetric distribution.


Section 9.3 The Wilcoxon Signed-Rank Test (A Nonparametric Analog of the Paired t-Test) Eq. 9.5 (cont.)

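Ex. 9.12

$$\begin{aligned}
R_1 &= 10(7.5) + 6(19.5) + 2(28.0) = 248 \\
E(R_1) &= 40(41)/4 = 410 \\
Var(R_1) &= 40(41)(81)/24 - [(14^3 - 14) + (10^3 - 10) + (7^3 - 7) + (1^3 - 1) \\
&\qquad + (2^3 - 2) + (2^3 - 2) + (3^3 - 3) + (1^3 - 1)]/48 = 5449.75 \\
Var(R_1) &= [14(7.5)^2 + 10(19.5)^2 + \cdots + (40)^2]/4 = 5449.75 \\
T &= \frac{|248 - 410| - 1/2}{73.82} = 2.19 \\
p &= 2 \times [1 - \Phi(2.19)] = .029
\end{aligned}$$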
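The arithmetic in Ex. 9.12 can be reproduced with a short function following Eq. 9.5. This is a sketch only: signed.rank.approx is a hypothetical helper, and the vector of differences is rebuilt from the counts implied by the worked example (positive differences from the R1 line, group sizes from the tie correction), since Table 9.1 itself is not reproduced here. The guard that sets the statistic to 0 when R1 equals its expected value is an assumption added for safety.

```r
# Normal approximation for the signed-rank test with ties (Eq. 9.5).
signed.rank.approx <- function(d) {
  d    <- d[d != 0]                       # drop zero differences
  n    <- length(d)
  r    <- rank(abs(d))                    # average ranks for tied groups
  R1   <- sum(r[d > 0])                   # rank sum of positive differences
  ER   <- n * (n + 1) / 4
  ties <- as.vector(table(abs(d)))        # sizes t_i of the tied groups
  VR   <- n * (n + 1) * (2 * n + 1) / 24 - sum(ties^3 - ties) / 48
  Tstat <- if (R1 == ER) 0 else (abs(R1 - ER) - 0.5) / sqrt(VR)
  list(R1 = R1, E = ER, Var = VR, Tstat = Tstat, p = 2 * (1 - pnorm(Tstat)))
}

# Differences reconstructed from Ex. 9.12: |d| = 1..8 with group sizes
# 14, 10, 7, 1, 2, 2, 3, 1, of which 10, 6 and 2 are positive for |d| = 1, 2, 3.
pos <- rep(1:3, times = c(10, 6, 2))
neg <- -rep(1:8, times = c(4, 4, 5, 1, 2, 2, 3, 1))

out <- signed.rank.approx(c(pos, neg))
print(round(unlist(out), 3))
```

The output matches the hand calculation: R1 = 248, E(R1) = 410, Var(R1) = 5449.75, T of about 2.19 and p of about .029.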


The Wilcoxon Signed Rank Test in R - I

wilcox.test package:stats R Documentation

Wilcoxon Rank Sum and Signed Rank Tests

Description:

Performs one and two sample Wilcoxon tests on vectors of data; the

latter is also known as 'Mann-Whitney' test.

## Default S3 method:

wilcox.test(x, y = NULL,

alternative = c("two.sided", "less", "greater"),

mu = 0, paired = FALSE, exact = NULL, correct = TRUE,

conf.int = FALSE, conf.level = 0.95, ...)

The Wilcoxon Signed Rank Test in R - II

Arguments:

x: numeric vector of data values.

y: an optional numeric vector of data values.

alternative: a character string specifying the alternative hypothesis,

must be one of '"two.sided"' (default), '"greater"' or

'"less"'. You can specify just the initial letter.

mu: a number specifying an optional location parameter.

paired: a logical indicating whether you want a paired test.

exact: a logical indicating whether an exact p-value should be

computed.

correct: a logical indicating whether to apply continuity correction
in the normal approximation for the p-value.


The Wilcoxon Signed Rank Test in R - III

conf.int: a logical indicating whether a confidence interval should be
computed.

conf.level: confidence level of the interval.

formula: a formula of the form 'lhs ~ rhs' where 'lhs' is a numeric
variable giving the data values and 'rhs' a factor with two
levels giving the corresponding groups.

data: an optional data frame containing the variables in the model
formula.

subset: an optional vector specifying a subset of observations to be
used.

na.action: a function which indicates what should happen when the data
contain 'NA's. Defaults to 'getOption("na.action")'.

...: further arguments to be passed to or from methods.



The Wilcoxon Signed Rank Test in R - IV

Details:

The formula interface is only applicable for the 2-sample tests.

If only 'x' is given, or if both 'x' and 'y' are given and
'paired' is 'TRUE', a Wilcoxon signed rank test of the null that
the distribution of 'x' (in the one sample case) or of 'x-y' (in
the paired two sample case) is symmetric about 'mu' is performed.

Otherwise, if both 'x' and 'y' are given and 'paired' is 'FALSE',
a Wilcoxon rank sum test (equivalent to the Mann-Whitney test) is
carried out. In this case, the null hypothesis is that the
location of the distributions of 'x' and 'y' differ by 'mu'.

By default (if 'exact' is not specified), an exact p-value is
computed if the samples contain less than 50 finite values and
there are no ties. Otherwise, a normal approximation is used.


In the Presence of Ties

The p-values are not really correct in the presence of ties so one

should install exactRankTests and use this in the presence of ties

library(exactRankTests)

One then uses

wilcox.exact

Example 9.12 in R
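A minimal sketch of how Ex. 9.12 might be run through wilcox.test, assuming the differences are rebuilt from the counts in the worked example (Table 9.1 itself is not reproduced here). Setting exact = FALSE requests the normal approximation with tie correction, and correct = TRUE applies the continuity correction, which corresponds to Eq. 9.5.

```r
# Ex. 9.12 with the built-in signed-rank test (normal approximation).
# Differences reconstructed from the group sizes used in the worked example.
pos <- rep(1:3, times = c(10, 6, 2))                 # positive differences
neg <- -rep(1:8, times = c(4, 4, 5, 1, 2, 2, 3, 1))  # negative differences
d   <- c(pos, neg)

res <- wilcox.test(d, mu = 0, exact = FALSE, correct = TRUE)
print(res$statistic)   # V = 248, the rank sum R1 of the positive differences
print(res$p.value)     # about 0.029
```

The statistic V reported by R is the rank sum of the positive differences, so it should agree with R1 from the hand calculation.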



Example 9.12 in R With the Tie Robust Test


Another Example of the Wilcoxon Signed-Ranks Test

An interview panel of ten interviewers was asked to rate the two final candidates on a scale of 1 to 20 in terms of their suitability for a vacant post. Is one candidate rated significantly higher than the other by the interviewers?

Interviewer   Candidate 1   Candidate 2
1             14            10
2             17            7
3             12            14
4             16            6
5             14            14
6             10            4
7             17            10
8             12            4
9             6             11
10            18            6


Another Example of the Wilcoxon Signed-Ranks Test
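One way the interview data might be analyzed with wilcox.test is sketched below. The vector names cand1 and cand2 are illustrative. Because the differences contain ties and one zero, the normal approximation is requested explicitly; with only 9 nonzero differences this falls below the n ≥ 16 guideline of Eq. 9.5, so the result should be read as approximate.

```r
# Paired signed-rank test for the two candidates' ratings.
cand1 <- c(14, 17, 12, 16, 14, 10, 17, 12,  6, 18)
cand2 <- c(10,  7, 14,  6, 14,  4, 10,  4, 11,  6)

res <- wilcox.test(cand1, cand2, paired = TRUE,
                   exact = FALSE, correct = TRUE)
print(res$statistic)   # V = 41: rank sum of the positive differences
print(res$p.value)     # roughly 0.03
```

Interviewer 5 rated both candidates equally, so that pair is dropped before ranking.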


Example Repeated With the Ties Robust Test


Example Repeated With the coin library

> library(coin)

> x = c(14,17,12,16,14,10,17,12,6,18)

> y = c(10, 7, 14, 6, 14, 4, 10, 4, 11, 6)

> wilcoxsign_test(x ~ y, alternative = "two.sided", distribution = exact())

Exact Wilcoxon-Signed-Rank Test

data: y by x (neg, pos)

stratified by block

Z = 2.1936, p-value = 0.02734

alternative hypothesis: true mu is not equal to 0


Section 9.4 The Wilcoxon Rank-Sum Test (A Nonparametric Analog to the t test for Two Independent Samples)

Example 9.15 Ophthalmology Different genetic types of the disease retinitis pigmentosa (RP) are thought to have different rates of progression, with the dominant form of the disease progressing the slowest, the recessive form of the disease the next slowest, and the sex-linked form of the disease the quickest. This hypothesis can be tested by comparing the visual acuity of people ages 10-19 who have different genetic types of RP. Suppose there are 25 people with dominant disease and 30 people with sex-linked disease. The best-corrected visual acuities (i.e., with appropriate glasses) in the better eyes of these people are presented in Table 9.2. How can the data be used to test if the median visual acuity is different in the two groups?



What do we wish to test in example 9.15?

H0: medianD = medianSL versus H1:medianD != medianSL where medianD and medianSL are the median visual acuities in the dominant and sex-linked groups respectively.

The two-sample t test for independent samples would be appropriate except that the visual acuity data cannot be given a specific numerical value that all ophthalmologists would agree on.



Eq. 9.6 Ranking Procedure for the Wilcoxon Rank-Sum Test

1. Combine the data from the two groups and order the values from the lowest to the highest, or in the case of visual acuity, from the best visual acuity (20-20) to the worst visual acuity (20-80).

2. Assign ranks to the individual values, with the best visual acuity (20-20) having the lowest rank and the worst visual acuity (20-80) having the highest rank, or vice versa.

3. If a group of observations has the same value, then compute the range of ranks for the group, as was done for the signed-rank test in eq. 9.4 and assign the average rank for each observation in the group.



Eq. 9.7 Wilcoxon Rank-Sum Test (Normal Approximation Method for Two-Sided Level α Test)

1. Rank the observations as shown in Eq. 9.6.

2. Compute the rank sum R1 in the first sample (the choice of sample is arbitrary).

3. Compute (assuming no ties)

$$T = \frac{\left| R_1 - \dfrac{n_1(n_1 + n_2 + 1)}{2} \right| - \dfrac{1}{2}}{\sqrt{\left(\dfrac{n_1 n_2}{12}\right)(n_1 + n_2 + 1)}}$$



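Eq. 9.7 Wilcoxon Rank-Sum Test (Normal Approximation Method for Two-Sided Level α Test)

3. Compute (if there are ties)

$$T = \frac{\left| R_1 - \dfrac{n_1(n_1 + n_2 + 1)}{2} \right| - \dfrac{1}{2}}{\sqrt{\left(\dfrac{n_1 n_2}{12}\right)\left[ n_1 + n_2 + 1 - \dfrac{\sum_{i=1}^{g} t_i(t_i^2 - 1)}{(n_1 + n_2)(n_1 + n_2 - 1)} \right]}}$$

unless $R_1 = n_1(n_1 + n_2 + 1)/2$, in which case $T = 0$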



Section 9.4 The Wilcoxon Rank-Sum Test (A Nonparametric Analog to the t test for Two Independent Samples)

where $t_i$ refers to the number of observations with the same value in the i-th tied group, and g is the number of tied groups.

4. If $T > z_{1-\alpha/2}$, then reject H0; otherwise accept H0.

5. Compute the exact p-value by $p = 2 \times [1 - \Phi(T)]$.

6. This test should be used only if both n1 and n2 are at least 10, and if there is an underlying continuous distribution.



Consider Ex. 9.17
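Ex. 9.17 can be checked against Eq. 9.7 with a short sketch. The helper rank.sum.approx is a hypothetical name introduced here; the data are the average ranks for the dominant group (x, 25 people) and the sex-linked group (y, 30 people) as coded in the R sessions for this example. Ranking these values again leaves them unchanged, so they can be passed to rank and wilcox.test directly.

```r
# Normal approximation for the rank-sum test with ties (Eq. 9.7).
rank.sum.approx <- function(x, y) {
  n1 <- length(x); n2 <- length(y); N <- n1 + n2
  r    <- rank(c(x, y))                    # average ranks for tied groups
  R1   <- sum(r[seq_len(n1)])              # rank sum in the first sample
  ER   <- n1 * (N + 1) / 2
  ties <- as.vector(table(c(x, y)))        # sizes t_i of the tied groups
  VR   <- (n1 * n2 / 12) * (N + 1 - sum(ties * (ties^2 - 1)) / (N * (N - 1)))
  Tstat <- if (R1 == ER) 0 else (abs(R1 - ER) - 0.5) / sqrt(VR)
  list(R1 = R1, Tstat = Tstat, p = 2 * (1 - pnorm(Tstat)))
}

x <- c(rep(3.5, 5), rep(13.5, 9), rep(25.5, 6), rep(34.0, 3), rep(42.5, 2))
y <- c(rep(3.5, 1), rep(13.5, 5), rep(25.5, 4), rep(34.0, 4), rep(42.5, 8),
       rep(50, 5), rep(53.5, 2), rep(55, 1))

out <- rank.sum.approx(x, y)
print(round(unlist(out), 5))               # R1 = 479, T about 3.79

# Built-in version: W = R1 - n1(n1 + 1)/2, same normal approximation
res <- wilcox.test(x, y, exact = FALSE, correct = TRUE)
print(res$statistic)                       # W = 154
print(res$p.value)
```

The statistic W reported by wilcox.test is the rank sum of the first sample minus n1(n1 + 1)/2, which is why it reads 154 rather than 479.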




Ex 9.17 Repeated With the ties Robust Test

> x = c(rep(3.5,5),rep(13.5,9),rep(25.5,6),rep(34.0,3),rep(42.5,2))

> y = c(rep(3.5,1),rep(13.5,5),rep(25.5,4),rep(34.0,4),rep(42.5,8), rep(50,5),rep(53.5,2), rep(55,1))

> wilcox.exact(x,y,paired=FALSE)

Exact Wilcoxon rank sum test

data: x and y

W = 154, p-value = 8.496e-05



Ex 9.17 Repeated With the coin library

> xxx = c(rep(3.5,5),rep(3.5,1),rep(13.5,9),rep(13.5,5),rep(25.5,6),rep(25.5,4),rep(34.0,3),rep(34.0,4),rep(42.5,2), rep(42.5,8),rep(50,5),rep(53.5,2), rep(55,1))

> gene = factor(c(rep("dominant",5),rep("sex-linked",1),rep("dominant",9),rep("sex-linked",5),rep("dominant",6),rep("sex-linked",4),rep("dominant",3),rep("sex-linked",4),rep("dominant",2),rep("sex-linked",8), rep("sex-linked",5),rep("sex-linked",2), rep("sex-linked",1)))

> wilcox_test(xxx ~ gene, alternative = "two.sided", distribution = exact())

Exact Wilcoxon Mann-Whitney Rank Sum Test

data: xxx by gene (dominant, sex-linked)

Z = -3.7975, p-value = 8.496e-05



Challenge Problems


Problem # 1

Suppose we wish to compare the recovery times of patients after 2 different versions of some operation, say removing the gallbladder. Operation A is performed through a vertical incision; Operation B through an oblique incision. Each operation is performed alternately (A, B, A, B, etc.) on a consecutive series of patients suffering from gallbladder disease, and the recovery times (say, number of days in the hospital after operation, including the day of operation and the day of discharge from hospital) are then collected as follows.

Patient #   Days to recover from operation   Patient #   Days to recover from operation
1           16                               2           18
3           20                               4           19
5           25                               6           15
7           19                               8           16
9           22                               10          21
11          15                               12          17
13          22                               14          17
15          19                               16          14


Problem # 1 Solution
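One possible approach, offered as a sketch rather than the intended solution: since the operations alternate A, B, A, B, the odd-numbered patients are taken to have received operation A and the even-numbered patients operation B, giving two independent samples that can be compared with the rank-sum test. Both samples have fewer than 10 observations, so the normal approximation used here falls outside the guideline in Eq. 9.7 and the p-value is only indicative.

```r
# Rank-sum test for Problem 1 (assumes odd patients = A, even patients = B).
opA <- c(16, 20, 25, 19, 22, 15, 22, 19)   # patients 1, 3, ..., 15
opB <- c(18, 19, 15, 16, 21, 17, 17, 14)   # patients 2, 4, ..., 16

res <- wilcox.test(opA, opB, exact = FALSE, correct = TRUE)
print(res$statistic)   # W = 48
print(res$p.value)     # larger than 0.05
```

Under these assumptions the difference in recovery times would not be declared significant at the 0.05 level.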



Problem # 2

A man who was sick in bed decided to count the number of advertisements delivered per day by 2 competing radio stations, WHO and WHY. Each morning he tossed a penny to determine which station he would listen to that day; WHO won the toss 5 times and WHY won 3 times. Eight days of this made his illness fatal, but he left us the following results to analyze:

WHO (Sample A) = 341, 326, 360, 305, 326

WHY (Sample B) = 352, 382, 347

WHY’s average is obviously higher than WHO’s, but is the difference statistically significant?


Problem 2 Solution
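A sketch of one way to approach this problem, not necessarily the intended solution. With samples of 5 and 3 the normal approximation is not appropriate, so the exact null distribution of the Mann-Whitney count is used via pwilcox. The only tie (326) occurs within the WHO sample, away from the observed tail; treating the untied null distribution as exact here is an assumption of this sketch.

```r
# Exact rank-sum test for Problem 2 using the Mann-Whitney count.
who <- c(341, 326, 360, 305, 326)
why <- c(352, 382, 347)
m <- length(who)
n <- length(why)

# W = number of (who, why) pairs with who > why (half credit for ties)
W <- sum(outer(who, why, ">")) + 0.5 * sum(outer(who, why, "=="))

p.lower <- pwilcox(W, m, n)                          # P(W <= observed)
p.upper <- pwilcox(W - 1, m, n, lower.tail = FALSE)  # P(W >= observed)
p.exact <- min(1, 2 * min(p.lower, p.upper))

print(W)         # 2
print(p.exact)   # 8/56, about 0.143
```

Under these assumptions the difference between the stations is not significant at the 0.05 level, even though WHY's average is higher.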



Problem # 3

Suppose you are investigating a new sleeping pill called Nockout, and want to compare it with a standard sedative called phenobarbitone. The response to sedatives can vary quite a bit from one person to another, so it is best to try both drugs on every person taking part in the experiment. So you collect 10 people suffering from chronic insomnia and on one night give half of them (selected at random) Nockout pills and the other half phenobarbitone, and observe the number of hours that each person sleeps. You are able to assess the sleep length only to the nearest ¼ hour. A few nights later, when you can be sure that the effect of the first sedative has worn off completely, each person is given his second pill, which is whichever pill he didn’t get the first time, and the hours of sleep are noted again. The results of such a trial, expressed in hours of sleep, are given below.


Problem # 3 Data

Patient      J.B.   R.A.   S.T.   S.L.   P.Q.   E.V.   J.T.   L.O.   E.M.   B.O.
With Pheno   7.5    7      7      5.75   4.25   9.25   8      7.25   8.5    7.75
With Nock    8      6      6.75   5      4.5    8      7.5    6.25   8      7.75


Problem # 3 Solution
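One possible approach, given as a sketch only: each person receives both drugs, so the data are paired and the signed-rank test applies. There are ties and one zero difference, so the normal approximation is requested explicitly; with only 9 nonzero differences this is below the n ≥ 16 guideline of Eq. 9.5, and the p-value should be treated as approximate.

```r
# Paired signed-rank test for Problem 3 (hours of sleep).
pheno <- c(7.5, 7, 7,    5.75, 4.25, 9.25, 8,   7.25, 8.5, 7.75)
nock  <- c(8,   6, 6.75, 5,    4.5,  8,    7.5, 6.25, 8,   7.75)

res <- wilcox.test(pheno, nock, paired = TRUE,
                   exact = FALSE, correct = TRUE)
print(res$statistic)   # V = 39.5: rank sum of the positive (pheno - nock) differences
print(res$p.value)     # close to 0.05
```

The approximate p-value sits near the 0.05 boundary, so the conclusion is sensitive to the choice of method; an exact test that handles ties would be a reasonable follow-up.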



Chapter 9 Homework

9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9, 9.10, 9.11, 9.12
