Nonparametric Statistics INSY 8970 M06 Maghsoodloo

1.1 MEASUREMENT SCALES
(a) The lowest measurement level is the Nominal
scale which is used just to separate elements into different
classes or categories. Examples are Success/Failure; Go/No-Go;
Grades A, B, C, D, F; License plates and SSN. If 10 machines are
assigned the numbers 1 through 10 merely for identification
purposes, then we are using a nominal scale for measurements
because we could have just as well assigned the names A, B, C, …, J
to the 10 machines.
(b) The second lowest measurement level is the ordinal scale
where only the <, >, or = comparisons amongst elements can be made;
ordinal scale simply consists of ranks. For example, if 4 types of
injuries can occur in a manufacturing plant but not all four are of
equal importance, we may assign rank 1 to the lowest priority, rank
2 to 2nd least important, rank 3, and then rank 4 to the highest
priority. Note that the scale here is still discrete and generally
a rank of 1 implies the least preferred.
(c) The first scale where measurements are continuous is the
interval, where not only relative order of two measurements are
discernable but also the size of their difference. The interval
scale does not have a unique well-defined zero-point (or origin)
and a well-defined unit distance, but it is possible to arbitrarily
define a zero-point and unit distance. A prime example of a
variable that is measured in an interval scale is temperature; note
that temperature has an arbitrary zero-point and arbitrary unit
distance. Other examples of (random) variables measured in an interval scale are time and humidity (although I am not sure about humidity).
(d) The highest level is the ratio scale, which has all the characteristics of an interval scale and, in addition, a well-defined, unique zero-point. Examples of variables that are
measured in a ratio scale are height, weight, distance, income,
yield, velocity, etc. Just like the interval scale, one unit is
arbitrarily defined.
1.2 Nonparametric Test
A statistical test (or method) is nonparametric if it satisfies at least one of the following two criteria:
(i) The method can be used on an ordinal or lower scale.
(ii) There is no assumption made about the underlying distribution of the random variable that is being observed.
A comparison of parametric and nonparametric (or distribution-free) procedures is outlined in Table 1.
Table 1. (SMD = sampling distribution)

  Parametric Methods                           | Nonparametric Methods
  ---------------------------------------------|---------------------------------------------
  The parent population is known and           | No assumption is made about the
  generally assumed to be Gaussian             | underlying distribution
  The scale has to be at least interval        | Any scale
  The sample size n should exceed 20           | n ≥ 6
  Highest statistical power (or sensitivity)   | Average relative efficiency is roughly 80%
  The SMD of the test statistic depends on     | The SMD of the test statistic does not
  the underlying distribution                  | depend on the underlying distribution
In order to define the relative efficiency of a statistical
test, we must first introduce the four circumstances that may occur
in testing any statistical hypothesis, as shown in Table 2. In
testing a hypothesis, H0 is referred to as the null hypothesis and
H1 (or Ha) is called the alternative hypothesis. Nearly always, the
objective is to ascertain if the sample provides sufficient
evidence to reject the null hypothesis H0, tolerating a certain amount of risk.
Table 2 shows that the power (or sensitivity = 1 − β) of a
statistical test is simply the Pr (probability) of rejecting H0 if
H0 is false, while the specificity of a
statistical test is the Pr of accepting H0 when H0 is true given
by 1 − α. Further,
the overall error rate of a statistical test is given by α + β.
In general, the Pr of
Table 2. (Pr = probability) H0: No disease, H1: presence of disease

                        | H0 is true                     | H0 is false
  ----------------------|--------------------------------|--------------------------------
  Reject H0             | Type I Error, or               | True Positive → Correct
                        | "False Positive";              | Decision; Sensitivity =
                        | Occurrence Pr = α              | Occurrence Pr = 1 − β
  Do not reject H0      | True Negative → Correct        | Type II Error, or
  (or accept H0)        | Decision; Specificity =        | "False Negative";
                        | Occurrence Pr = 1 − α          | Occurrence Pr = β
rejecting H0 when H0 is true, or the level of significance (LOS)
of a test, is a priori specified, and in most applications the LOS
is set at α = 0.05; in SQC the LOS of a control chart is set at
0.0027 (this is due to the fact that too many false alarms are cost
prohibitive; further a LOS of 0.0027 pertains to 3-sigma limits).
The third
most common LOS is α = 0.01. In any application, if no LOS is
specified prior to
experimentation, it is implied that the value of α is the
standard 5%.
A test for which 1− β ≥ α is said to be unbiased, and a
statistical test for which
the value of 1 − β → 1 as n → ∞ is said to be consistent. A
statistical test is conservative iff (if & only if) the stated
LOS exceeds the actual LOS. The OC (operating characteristic) curve
of a statistical test is simply the graph of
β versus the parameter under the null hypothesis, and the power
function gives
the graph of (1 − β ) versus all possible values of the
parameter under H0.
The relative efficiency (REF) of the statistical test T1 to T2,
having the
same LOS α, is given by n2/n1 such that both tests have
identical values of β. As
an example, if T1 requires a sample of size n1 = 20 and has α =
0.05, β = 0.10, but
T2 requires an n2 = 25 to attain the same α = 0.05 and β = 0.10,
then the efficiency
of T1 relative to T2 is given by 25/20 = 125%, or the REF(T2 to
T1) = 20/25 = 80%. On the other hand, if the 5% level tests T1 and
T2 both use the same random
sample of size n = n1 = n2 = 25, but β(T1) = 0.10 while β(T2) =
0.125, then the REF
of T1 to T2 is given by 0.125/0.10 = 125%.
1.3 The Binomial Test
Example 1. A machine produces parts in a manufacturing process, and it is desired to test whether its quality level is worse than 6%, i.e., we wish to test H0: p = 0.06 versus H1: p > 0.06
at a prescribed LOS α = 0.05. A random sample of n = 15 yields
TObs = 3
nonconforming units (NCUs). We wish to determine if the sample
provides sufficient evidence to reject H0 (i.e., H0: the machine is
operating properly) in favor of H1: The machine needs adjustment
(and/or quality improvement =QI).
Throughout these notes, the random variable T denotes the test
statistic, and in this case T = the no. of defective units in 15
parts. Note that we have 15 Bernoulli trials and T has a binomial
Pr mass function (pmf) given by b(x;15, p) =
15Cx px qn−x , x = 0, 1, 2, …, 15 and q = 1 − p, where 15Cx = 15
15!
x!(15 x)!x⎛ ⎞
=⎜ ⎟ −⎝ ⎠.
Because large values of T are congruent with the rejection of H0, the rejection (or critical) region of size α = 0.05 corresponds to the set {xL, xL + 1, …, 14, 15} such that Pr(T ≥ xL | p = 0.06) ≤ 0.05. Because the sample space for T is discrete, it is nearly always impossible to generate an exact 5%-level test. The Excel file (named Example 1) on my website shows that Pr(T ≥ 4 | p = 0.06) = 0.010360, while Pr(T ≥ 3 | p = 0.06) = 0.057133. Denoting AR as the acceptance region and CR as the rejection (critical) region, the size of the rejection region CR1 = {4, 5, 6, …, 14, 15} is equal to 0.010360, while that of CR2 = {3, 4, 5, 6, …, 14, 15} is 0.057133.
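The two rejection-region sizes quoted from the Excel file can be checked with a few lines of code; a minimal sketch (the helper name binom_tail is mine, not from the notes):

```python
from math import comb

def binom_tail(x_l, n, p):
    """Pr(T >= x_l) for T ~ binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(x_l, n + 1))

# Sizes of the two candidate rejection regions from Example 1 (n = 15, p = 0.06)
alpha1 = binom_tail(4, 15, 0.06)   # CR1 = {4, 5, ..., 15}, size ≈ 0.010360
alpha2 = binom_tail(3, 15, 0.06)   # CR2 = {3, 4, ..., 15}, size ≈ 0.057133
print(round(alpha1, 6), round(alpha2, 6))
```

Any exact binomial routine (Excel's BINOM.DIST, for instance) gives the same tail sizes.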
Because the observed value is TObs = 3, we could decide to reject H0, but in so doing, the Pr of committing a type I error is 0.057133. On the other hand, if we
cannot tolerate a type I error of size 0.057133 and would prefer
to have α =
0.010360, then the sample does not provide sufficient evidence,
at the 1.0360% level, to reject H0 and our decision would be not to
reject H0, in which case we may indeed be inflating the Pr of
committing a type II error. Both Prs of
committing a type II error, β, and that of rejecting H0, 1− β,
are functions of p, and
on the same Excel file, I have graphed the OC curves for both
tests (α = 0.010360
and α = 0.057133) with the corresponding power functions. The
upper acceptance limit for α = 0.010360 is AU = 3, while for α = 0.057133 it is AU = 2. Note that a rejection region is also called a critical region, denoted CR.
1.4 Sample Size Considerations for a Test of Hypothesis on a Proportion
In relation to Example 1, clearly n = 15 and AR = {0, 1, 2, 3} provide a 0.01036-level test with a type II error Pr of β = 0.9444444 at p = 0.10 (see my spreadsheet under Example 1). In short, because the sample size n = 15 is small, the power of the test at p = 0.10 is equal to
0.05555563. Suppose now we are very unhappy with the overall error
rate at n = 15 and are willing to spend more
resources to have a more powerful test (1− β larger than
0.05555563). The
question then is what is the needed sample size n such that α ≅ 0.05 but the power of the test is at least 0.80 when p = 0.10, i.e., we wish to
develop a test whose
OC curve approximately goes through the points (p = 0.06, 0.95)
and (p = 0.10, β =
0.20). Obviously, the requisite sample size n will far exceed 15
and hence we can justify using the normal approximation to the
binomial. The sample size n is determined by solving the following
two equations with two unknowns (I will draw the pictures during
class discussions):
p = 0.06, 1 − α = 0.95 → AU = 0.06 + 1.64485×sqrt(0.06×0.94/n)
p = 0.10, β = 0.20 → AU = 0.10 − 0.84162×sqrt(0.10×0.90/n)
Subtracting the 1st equation from the 2nd yields 0 = 0.04 − 0.6431174631/sqrt(n)
→ sqrt(n) = 0.6431174631/0.04 = 16.07793657656 → n ≅ 258.5 → nmin = 259 → AU = 0.0842726351
→ Acceptance Region = AR = {0, 1, 2, 3, …, 21, 22}.
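The two-equation solve above is easy to reproduce numerically; a minimal sketch of the normal-approximation step (the exact trial-and-error refinement to n = 265 proceeds the same way with binomial tails):

```python
from math import sqrt, ceil

# Normal-approximation sample size for H0: p = 0.06 vs H1: p > 0.06
# with alpha ≈ 0.05 and power ≈ 0.80 at p = 0.10, as in Section 1.4.
z_alpha, z_beta = 1.64485, 0.84162
p0, p1 = 0.06, 0.10

sqrt_n = (z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))) / (p1 - p0)
n_min = ceil(sqrt_n ** 2)
print(n_min)  # 259, as in the notes
```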
The reader must be cognizant of the fact that the above answer is an approximation, especially because the binomial parameter p is far away from 0.50
and hence the normal approximation is only moderately accurate
at best. The exact solution can be obtained using the binomial distribution through trial and error. My spreadsheet (Example 1) shows that the exact answer to the above sample-size determination is to use n = 265, with α = 0.049666787 and β = 0.208886841.
1.5 Confidence Interval for the Binomial Parameter p
Again, unless otherwise specified, the most common confidence level, 1 − α, is 0.95, although 90% and 99% CIs are fairly common. The problem
that arises is the fact that p is an STB (smaller-the-better) type
QCH (quality characteristic) and thus it is always desired to
reduce FNC (fraction nonconforming). Consequently, the CI that
relates the most valuable information to a quality engineer is of
the upper one-sided type (0, pU), although two-sided CIs (pL, pU)
are fairly common. It will be illustrated in this course that a
right-tailed test, as in Example 1, corresponds to a lower
one-sided CI and vice versa. However, obtaining a 95% CI such as
(pL, 1) for p is almost useless because it provides little
information on machine quality. Only for the sake of consistency,
we will first develop a 95% lower one-sided CI for p and will
discuss the consequences. By the 95% CI on p we mean a range of
hypothesized values of p such that the test statistic TObs (=3 in
this case) will result in accepting H0. That is, we are seeking an interval such that Pr(pL ≤ p < 1) ≥ 0.95. The Excel file on my website shows that the exact solution to Pr(T ≥ 3 | p ↓ pL) ≤ 0.05 is pL = 0.0569. Thus, we are 95% confident that the machine's FNC lies in the interval 0.0569 ≤ p < 1. This implies that if we hypothesize
H0: p = 0.06 vs H1: p >
0.06, then this null hypothesis cannot be rejected at the 5%
level of significance because the hypothesized value of p = 0.06
lies inside the 95% CI = [0.0569, 1). Now consider testing the null
hypothesis H0: p = 0.04 versus the alternative
H1: p > 0.04, given the sample result TObs = 3 NCUs. Then α = Pr(T ≥ xL | p = 0.04) ≤ 0.05, and the closest 5%-level test that can be constructed has the rejection region CR = {3, 4, 5, 6, …, 14, 15} with the exact size of the critical region α = 0.020292. However, now the
value of our test statistic TObs = 3 lies inside the rejection
region and hence we can reject H0 at the 2.03% level and conclude
that p > 0.04. Our 95% CI = [0.0569, 1) confirms the same
conclusion because it excludes the hypothesized value of p = 0.04.
Suppose now we construct the 95% upper one-sided CI = (0, pU],
assuming TObs = 3. The question now is how large p must become such
that the p-value =
Pr(T ≤ 3 | p ↑ pU) ≥ 0.05; this in turn will assure us with 95% confidence that
the true value of p lies within the interval (0, pU]. The Excel
file on my website shows that the exact solution to Pr(T ≤ 3 ⎢p ↑
pU) ≤ 0.05 is pU = 0.4397. Thus we are 95% confident that the machine's FNC lies in the interval (0,
0.4397]; however, this 95% CI is inconsistent with our test result
because the 95% CI =(0, 0.4397] includes the hypothesized value of
p = 0.04, while our test statistic had rejected H0: p = 0.04. As
stated earlier, a right-tailed test corresponds to a lower one-sided CI and vice versa. In other words, the upper one-sided 95% CI = (0, 0.4397] corresponds to testing H0: p = 0.04 versus the alternative H1: p < 0.04. Before we even obtain the rejection region for the alternative H1: p < 0.04, it is clear that the 95% CI = (0, 0.4397] includes p ≥ 0.04, and hence we cannot reject H0 and cannot conclude that p < 0.04. In order to conduct the test of H0: p ≥ 0.04 versus the alternative H1: p < 0.04, we first observe that small values of T are congruent with rejecting H0, and thus the 5%-level test consists of the rejection region CR = {0, 1, 2, 3, …, xU},
where xU must be obtained in such a manner that the Pr(T ≤ xU⎪p
= 0.04) ≤ 0.05.
The Excel file on my website shows that such an xU does not exist because even if we let xU = 0, Pr(T = 0 | p = 0.04) = 0.54208638 ≫ 0.05. My Excel file shows that
we need a sample size n ≥ 74 in order to construct a 5%-level
test for H0: p ≥ 0.04
versus the alternative H1: p < 0.04. Just for the sake of illustration, let's test H0: p = 0.30 versus H1: p < 0.30, for which we know a priori that the sample of n = 15 with TObs = 3 cannot reject H0 because p = 0.30 also lies inside the 95% CI = (0, 0.4397]. However, a 5% lower-
tail test now can be constructed because Pr(T ≤ 1⎪p = 0.30) =
0.0352676 < 0.05.
Because TObs = 3 is not inside the CR = {0, 1}, H0: p ≥ 0.30 cannot be rejected; therefore, we cannot conclude at the 5% level that p < 0.30.
1.6 The Quantile (or Percentile) Test
The pth quantile of a rv (random variable) X, denoted by xp, is a value in the range space of X that satisfies exactly one of the following two conditions:
(1) Pr(X < xp) < p and Pr(X > xp) < 1 − p, or
(2) Pr(X < xp) = p and Pr(X > xp) = 1 − p.
We may collapse the above two conditions into Pr(X < xp) ≤ p and Pr(X > xp) ≤ 1 − p, bearing in mind that in any specific individual case only one of the above two must be satisfied.
Example 2. The following data represent the diameters of trees,
X, used for wood production (i.e., X is an LTB type QCH) in order
statistics format: 33.1″, 34.8,
35.7, 36.9, 39.8, 40.2, 40.2, 41.1, 41.6, 41.7, 43.0, 44.0,
47.8, 48.0, 50.9, 52.8, 54.8,
55.6, 60.7.
(a) Estimate the 80th percentile (or the 0.80 quantile) of X, denoted by x0.80.
0.80×n = 0.80×19 = 15.2, which rounds up to 16 → the point estimate of x0.80 is given by the value of the 16th order-statistic → x̂0.80 = x(16) = 52.8″.
We now check whether this point estimate satisfies the 1st condition of the above definition:
Pr(X < 52.8) = 15/19 = 0.7895 < 0.80 and Pr(X > 52.8) = 3/19 = 0.1579 < 1 − 0.80.
Note that if the sample size n were equal to 20, say the 20th sample order-statistic were 61.2″, then 0.80×n = 0.80×20 = 16 → the point estimate of x0.80 would be given by the average of the 16th and 17th order statistics → x̂0.80 = [x(16) + x(17)]/2 = (52.8 + 54.8)/2 = 53.80. Now P̂r(X < 53.8) = 16/20 = 0.80 = p, and P̂r(X > 53.8) = 4/20 = 0.20 = 1 − p.
(b) Do the data provide sufficient evidence, at the 5% level, to conclude that x0.80 exceeds 41.7, i.e., test H0: x0.80 ≤ 41.7 versus H1: x0.80 > 41.7″? In order to conduct a test on a quantile (or percentile), our first task is to convert it to a binomial test. To this end, we let p = Pr(X ≤ 41.7); by definition, Pr(X < x0.80) ≤ 0.80 and Pr(X > x0.80) ≤ 1 − 0.80 = 0.20. This last inequality implies that Pr(X > x0.80) = 1 − Pr(X ≤ x0.80) ≤ 0.20 → Pr(X ≤ x0.80) ≥ 0.80 → under H0, Pr(X ≤ 41.7) ≥ 0.80 → H0: p ≥ 0.80 versus H1: p < 0.80. Let T represent the number of trees with X
of trees with X
values ≤ 41.7; then small values of T reject H0, where the range
space of T is RT =
{0, 1, 2, 3, …, 19}. The rejection region of size 0.05 consists
of the sum over x values whose binomial Prs from zero to Ru does
not exceed 0.05, i.e.,
UR x 19 x
x 0
190.80 (0.20)
x−
=
⎛ ⎞⎜ ⎟⎝ ⎠
∑ ≤ 0.05 → RU = upper rejection limit = 11 →
11x 19 x
x 0
190.80 (0.20)
x−
=
⎛ ⎞⎜ ⎟⎝ ⎠
∑ = 0.023278312 = α; again because the range space of T is
discrete, it is impossible to create an exact 5%-level test. If
we let AR = {0, 1, 2,
…, 11, 12}, then α =12
x 19 x
x 0
190.80 (0.20)
x−
=
⎛ ⎞⎜ ⎟⎝ ⎠
∑ = 0.067600066 > 0.05. Since TObs = 10,
then we have sufficient evidence at the 5% level to reject H0.
We now define the p-value, or Pr level, of a statistical test; this is the smallest LOS, after sampling, at which H0 can be rejected using the computed test statistic, assuming H0 is true. For Example 2, the p-value (also referred to as the critical level) is given by p-value = Pr(T ≤ 10 | p = 0.80) = Σ (x = 0 to 10) 19Cx (0.80)^x (0.20)^(19−x) = 0.006657655. Once the p-value is computed, the null
hypothesis is rejected iff it is less than or equal to the LOS,
α, of the test. This
leads to the following general procedure for testing any
statistical hypothesis.
Step 1. Set up the null and alternative hypotheses. Always put
the ? mark under H1. The status quo is stated under H0 and what is
to be proven under H1.
Step 2. If the test is one-sided, determine whether small values
of T reject H0, or large values of T reject H0. If small values of
T reject H0, then the p-value =
Pr(T≤ TObs ⎢H0 is true); If large values of T reject H0, then
the p-value = Pr(T ≥ TObs ⎢H0 is true). If the p-value < α
(generally 0.05), then reject H0. If the p-value <
0.01, then strongly reject H0.
Step 3. If the test is two-sided, determine if the value of TObs
lies below the median of T or above T0.50. If TObs < T0.50, then
the p-value = 2×Pr(T≤ TObs ⎢H0 is
true); If TObs > T0.50, p-value = 2×Pr(T ≥ TObs⎢H0 is true).
Again, if the computed p-
value < α (= 0.05 unless otherwise specified), then reject
H0. If the p-value < 0.01,
then strongly reject H0. For our Example 2, the p-value = 0.006657655 < 0.01, so we strongly reject H0 and conclude that x0.80 > 41.7.
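Steps 1 through 3 applied to Example 2 reduce to a single lower-tail binomial computation; a minimal sketch (the helper name binom_cdf is mine, not from the notes):

```python
from math import comb

def binom_cdf(k, n, p):
    """Pr(T <= k) for T ~ binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

# H0: x_0.80 <= 41.7 (i.e., p = Pr(X <= 41.7) >= 0.80) vs H1: p < 0.80.
# T = number of trees with diameter <= 41.7; small values of T reject H0.
n, p0, t_obs = 19, 0.80, 10
p_value = binom_cdf(t_obs, n, p0)      # the notes report 0.006657655
alpha = binom_cdf(11, n, p0)           # size of CR = {0, ..., 11}; notes: 0.023278312
print(p_value, alpha)
```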
1.7 Obtaining a 95% CI for the pth Quantile (xp = Qp)
As a continuation of Example 2, we illustrate the procedure by obtaining an approximate 95% CI for the 80th percentile, x0.80, of tree diameters. Our objective is to obtain the order statistic of rank r (or rth order-statistic) such that
Pr[x(r) ≤ x0.80 < ∞] ≥ 0.95 → Pr[x(r) > x0.80] ≤ 0.05.
The inequality x(r) > x0.80 holds true iff at most (r − 1) of the X's are less than or equal to x0.80, i.e., Pr[x(r) > x0.80] = Pr[at most (r − 1) of the X's ≤ x0.80]. Define the statistic T as the number of observed X's ≤ x0.80. Thus, T has the b(x; n = 19, p = 0.80) SMD, and as a result, Pr[x(r) > x0.80] = Pr[at most (r − 1) of the X's ≤ x0.80] = Pr(T ≤ r − 1 | p = 0.80) = Σ (x = 0 to r−1) 19Cx (0.80)^x (0.20)^(19−x) ≤ 0.05. The spreadsheet on my
website (under Example 2) shows that the value of r − 1 must equal 11, so that the rank of the requisite order-statistic is r = 12. Hence, the approximate 95% CI is given by x(12) ≤ x0.80 < ∞ → 44 ≤ x0.80 < ∞. The exact value of the confidence coefficient (or confidence level) is equal to 1 − α = 1 − 0.023278312 = 0.9767217.
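The search for the rank r can be automated as a one-line scan over the binomial CDF; a minimal sketch (helper name binom_cdf is mine, not from the notes):

```python
from math import comb

def binom_cdf(k, n, p):
    """Pr(T <= k) for T ~ binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

# Largest rank r with Pr(T <= r - 1 | n = 19, p = 0.80) <= 0.05 gives the
# lower confidence limit x(r) for the 0.80 quantile of Example 2.
n, p = 19, 0.80
r = max(k + 1 for k in range(n) if binom_cdf(k, n, p) <= 0.05)
conf = 1 - binom_cdf(r - 1, n, p)      # exact confidence coefficient
print(r, round(conf, 7))               # 12 and ≈ 0.9767217
```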
Example 3. The data 81, 86, 93, 98, 103, 103, 117, 119, 119,
122, 128, 131, 134, 137, 142, 144, 154, 158, 161, 165 represent
times to failure of certain hard-drives measured in units of 1000
hours. Our objective is to perform statistical inference on the 3rd
quartile (Q3 = x0.75) of time to failure (TTF).
(i) Point Estimation. n = 20 → np = 20×0.75 = 15 → x̂0.75 = [x(15) + x(16)]/2 → x̂0.75 = (142 + 144)/2 = 143 thousand hours.
(ii) Interval Estimation (a 2-sided 95% CI for x0.75). We need to find order statistics x(r) and x(s) such that Pr[x(r) ≤ x0.75 ≤ x(s)] ≥ 0.95.
To obtain the lower limit x(r), we must satisfy the inequality Pr[x(r) > x0.75] ≤ 0.025 because α/2 = 0.025. The inequality x(r) > x0.75 holds true iff at most (r − 1) of the X's are less than or equal to x0.75. Letting T represent the number of X's ≤ x0.75, it follows that Pr[x(r) > x0.75] = Pr(T ≤ r − 1 | p = 0.75) = Σ (x = 0 to r−1) 20Cx (0.75)^x (0.25)^(20−x) ≤ 0.025 → r − 1 = 10 → r = 11 → the lower 95% CL = x(11) = 128 (1000-hours), where CL represents confidence limit, and αL = Σ (x = 0 to 10) 20Cx (0.75)^x (0.25)^(20−x) = 0.013864417.
To obtain the upper 95% CL, we must satisfy the inequality Pr[x(s) < x0.75] ≤ 0.025. The inequality x(s) < x0.75 holds iff at least s of the X's are less than or equal to x0.75. That is, Pr[x(s) < x0.75] = Pr(T ≥ s | p = 0.75) ≤ 0.025 → 1 − Pr(T ≤ s − 1 | p = 0.75) ≤ 0.025 → Pr(T ≤ s − 1 | p = 0.75) ≥ 0.975 → Σ (x = 0 to s−1) 20Cx (0.75)^x (0.25)^(20−x) ≥ 0.975 → s − 1 = 18 → s = 19 → the upper 95% CL = x(19) = 161 (1000-hours), where αU = Σ (x = 19 to 20) 20Cx (0.75)^x (0.25)^(20−x) = 0.0243126 → the approximate 95% CI for x0.75 is given by 128 ≤ x0.75 ≤ 161, having the exact confidence coefficient 1 − αL − αU = 0.961823.
We now use the normal approximation to the binomial distribution to approximate the order-statistic ranks r and s. Again, the binomial rv T represents the number of X's ≤ x0.75; then E(T) = μT = np = 20×0.75 = 15 and the variance of T is given by V(T) = npq = 20×0.75×0.25 = 3.75 → σT = sqrt(3.75) = 1.9365. Figure 1 clearly shows that the approximate value of the rank is r ≅ 15 − 1.96×1.9365 = 11.2045 → r = 11 as before, and s ≅ 15 + 1.96×1.9365 = 18.7955 → s = 19 as before.
(iii) Test of Hypothesis on x0.75. Suppose now we wish to test the null hypothesis that the 75th percentile of TTF is equal to 120 thousand hours, i.e., we wish to test H0: x0.75 = 120 versus the 2-sided alternative H1: x0.75 ≠ 120 at the LOS α = 0.05. Recall that the approximate 95% CI
approximate 95% CI
was 128 ≤ x0.75 ≤ 161, and hence our testing procedure will
reject H0 because the
null value of 120 is outside the corresponding 95% CI. In order
to conduct the
[Figure 1. Normal Approximation to the SMD of T: a normal curve with mean μT = 15 and σT = 1.9365; tail areas of 0.025 fall below rank r and above rank s.]
test, we let T = the number of X's less than or equal to the hypothesized value 120 → TObs = 9 < np = 15 → α̂L = Pr(T ≤ 9 | p = 0.75) = Σ (x = 0 to 9) 20Cx (0.75)^x (0.25)^(20−x) = 0.003942142 → α̂U = Pr(T = 20 | p = 0.75) = 0.00317121 → α̂ = α̂L + α̂U = 0.0071134 ≪ 0.01 → Strongly reject H0, as expected.
1.8 Nonparametric Statistical Tolerance Limits
For illustrative purposes, it is best to first discuss the parametric tolerance limits, i.e., the underlying distribution is N(μ, σ²), where for part (a) both the process mean μ and variance σ² are known; in part (b) both μ and σ² are unknown; and in part (c) no assumption is made about the underlying distribution (i.e., the distribution-free or nonparametric case).
(a) Suppose the injury rate, X, is N(50 injuries per 100000
work-hours, 4); we wish to obtain the 95% tolerance limits such
that Pr(L0.95 ≤ X ≤ U0.95) = 0.95. Using the
0.975 quantile of the standard normal distribution (z0.025 =
1.96) yields the values
of L0.95 = 50 − 1.96×2 = 46.08, and similarly U0.95 = 53.92. The
95% tolerance
interval (46.08, 53.92) means that we are 100% certain (because
both μ & σ are
known) that 95% of all injury rates (IRs) lie within this last
tolerance interval.
(b) Suppose now the normal population mean μ and population standard deviation σ are unknown and have to be estimated from a simple random sample of size n by x̄ and S = sqrt[Σ (i = 1 to n) (xi − x̄)²/(n − 1)], respectively. Then the (γ, p = 1 − α)
tolerance interval is given by x̄ ± kα/2×S, where γ represents the confidence level with most common values of 0.95 and 0.99, p = 1 − α represents the minimum proportion of the population captured (or covered), and kα/2 is called the tolerance factor, whose value for a given γ depends on the proportion p captured and the sample size n. As an example, consider case (a) above but assume that we do not know μ and σ, and a random sample of size n = 15 yields the average IR x̄ =
51.3 injuries/100000 hours and S = 2.91. Our objective is to obtain the (0.99, 0.95) tolerance interval x̄ − kα/2×S ≤ X ≤ x̄ + kα/2×S for IR that contains at least 95% of the population at a confidence level of 99%. Table 5.4 of H-C Chiu shows that k0.025 = 3.50731 (p = 0.95), so that the requisite tolerance interval is (41.0937, 61.5063). Thus, we are 99% confident that at least 95% of IRs lie within the tolerance interval (41.0937, 61.5063). Note that the standard deviation of a process basically determines the length of a tolerance interval, which is given by TIL (tolerance interval length) = 2kα/2×S. In other words, no matter how large n is, a TIL can never be reduced to zero, while a confidence interval length (CIL), which is given by x̄ ± t0.025×S/sqrt(n), approaches zero as n → ∞. For n > 50, the value of kα/2 can be closely approximated from kα/2 ≅ [1 + 1/(2n)]×zα/2×sqrt[(n − 1)/χ²(γ; n−1)], where χ²(γ; n−1) is the (1 − γ) quantile of the chi-square distribution with (n − 1) degrees of freedom (df). We illustrate the application of this last formula by approximating the exact value k0.025 = 3.50731 from H-C Chiu: k0.025 ≅ (1 + 1/30)×1.96×sqrt[14/χ²(0.99; 14)] = (1 + 1/30)×1.96×sqrt(14/4.6604) = 3.5103 > k0.025 from Chiu, which is a bit conservative because this approximation will lead to a wider tolerance band.
(c) The Distribution-Free Tolerance Intervals
The objective is to determine the sample size n and ranks r and (n + 1 − m) such that
Pr[x(r) ≤ at least the fraction p of a population ≤ x(n+1−m)] ≥ γ     (1)
The most common values of r and m are 1, in which case we wish the sample spread to contain at least a proportion p (note that W. J. Conover uses q for the proportion p) of the population with γ×100% confidence probability. We first provide an approximate solution for the requisite sample size n for a specified p, r, m, and γ:
n ≅ 0.25×χ²(1−γ; 2(r+m))×(1 + p)/(1 − p) + (r + m − 1)/2     (2)
As an example, suppose we wish to determine the needed sample size in a human population whose sample spread, [x(1), x(n)], covers at least 95% of the population for their % body-fat content, X, with 99% certainty, i.e., we wish to obtain the tolerance interval such that Pr[x(1) ≤ at least the fraction 0.95 of the human population ≤ x(n)] ≥ 0.99 → r = 1, m = 1, p = 0.95, χ²(0.01; 4) = 13.2767, and Eq. (2) yields n = 0.25×13.2767×(1.95)/0.05 + (1 + 1 − 1)/2 ≅ 129.9478 → nmin = 130. Suppose a random sample of size 130 gives x(1) = 5.7% body-fat and x(130) = 22.9%. Thus we are 99% confident that at least 95% of the members of the target human population have their body-fats within the tolerance interval 5.7% ≤ X ≤ 22.9%.
Eq. (2) can be rearranged and p can be solved for a given n, r, m, and γ; the result is shown below [see W. J. Conover, p. 151, Eq. (2)]:
p = [4n − 2(r + m − 1) − χ²(1−γ; 2(r+m))] / [4n − 2(r + m − 1) + χ²(1−γ; 2(r+m))]     (3)
In the above example, suppose we take a random sample of size n = 100 from the target human population and we wish to know at least what proportion of the population is contained within the sample spread [x(1) % body-fat, x(100)] with 99% certainty. Substitution into Eq. (3) yields p = (400 − 2 − 13.2767)/(400 − 2 + 13.2767) = 0.9354.
If the rv, X, is an STB or LTB (larger-the-better) type such as
IR and job efficiency, respectively, then it will be more
appropriate to obtain a one-sided tolerance interval. In the case
of an STB QCH, we need to determine n such that the
Pr[at least the proportion p of a population ≤ x(n+1 − m)] ≥ γ
(4a)
where the most common value of m = 1. For an LTB type QCH, our
objective is to find n such that
Pr[at least the proportion p of a population ≥ x(r)] ≥ γ
(4b)
where again the most common value of r = 1.
For the most common cases of r = m = 1, because Eq. (4a) is true iff Pr[xp ≤ x(n)] ≥ γ and (4b) holds iff Pr[x1−p ≥ x(1)] ≥ γ, it can be shown that the solution for both cases is
n = ln(1 − γ)/ln(p) = log10(1 − γ)/log10(p),  (only when r = m = 1)     (4c)
As an example, suppose we wish to obtain an upper one-sided tolerance interval for IR such that Pr[at least 95% of the target population ≤ x(n)] ≥ 0.99; thus the requisite n is given by n = ln(0.01)/ln(0.95) = 89.7811 = log10(0.01)/log10(0.95) → the minimum sample size required is nmin = 90. Suppose we sample n = 90 IRs and obtain the 90th order-statistic to be 65 injuries/100,000 hours-worked. Then we can claim we are 99% confident that the upper 95% tolerance limit for IR is 65. It is interesting to note that Eq. (2) can also be used to obtain the solution given in (4c). For this last example, r = 0, m = 1, and n ≅ 0.25×χ²(0.01; 2)×(1 + 0.95)/(1 − 0.95) + (0 + 1 − 1)/2 = 0.25×9.2103×(1.95)/0.05 = 89.8004 → nmin = 90 as before.
1.9 The Sign Test
As W. J. Conover (1999, 3rd edition) states on his pp. 157-8, the bivariate data have to be in at least an ordinal scale. To illustrate the application of the sign test, consider the data of Exercise 8 on page 177 of W. J. Conover (1999), where there are 60 bivariate data points, and we divide them into two subgroups below and above the median income. Thus, 30 individuals had income below the median x̂0.50 = $25941.50 and 30 above the median income. Let the statistic T represent the number of families beyond the median income whose no. of children X(30+i) > X(i), i = 1, 2, 3, …, 30 = n′. However, there are 4 ties in the data, so the number of Bernoulli trials is n = 26. Thus our null hypothesis is H0: p = 0.50 versus H1: p ≠ 0.50, and under H0, T has a b(x; n = 26, p = 0.50) SMD. The observed value of T is TObs = 10. Because TObs = 10 < np = 26×0.50 = 13, the p-value = 2Pr(T ≤ 10 | p = 0.50) = 2B(10; 26, 0.50) = 2×Σ (x = 0 to 10) 26Cx (0.50)^26 = 2×0.1634698 = 0.326939583 > 0.05 → Insufficient evidence to reject H0, i.e., the correlation between X & Y is not significant. Minitab provides an answer of p = 0.3269. The
sign test, when used to test for trend, is generally referred to as the Cox-Stuart test for trend (see pp. 170-176 of W. J. Conover). The most common application of the sign test occurs in the before-after design, such as testing for the effectiveness of a certain diet to reduce weight (see Exercise 1 on p. 164 of Conover, where n = 6). The null hypothesis is H0: μX = μY versus the alternative H1: μY < μX. Letting T represent the number of bivariate pairs where Y < X, then TObs = 5. Thus, the p-value = Pr(T ≥ 5 | p = 0.50) = 1 − B(4; 6, 0.50) = 1 − 0.8906250 = 0.1093750 > 0.05 → Insufficient evidence to reject H0. If TObs had been equal to 6, then the p-value = (0.5)^6 = 0.0156250 < 0.05, and we would have sufficient evidence to reject H0 and conclude that the diet had been effective.
1.10 The McNemar Test for Significant Changes from Before to After
This test is a variation of the sign test where data are taken in the form of a 2×2 contingency table, and the cells (0, 0) and (1, 1) indicate no changes from before to after treatment, while the cells (0, 1) and (1, 0) show the number of changes in preferences from before to after. For the sake of illustration, suppose a social scientist is interested in determining whether a certain treatment (such as a film, video, etc.) about juvenile delinquency would change a community's opinion about how severely delinquents should be punished. Accordingly, a random sample of n = 120 adults is selected from the community and he conducts a before-and-after study, where each subject in the sample acts as his/her own control. Thus, the null hypothesis is that the treatment has no effect versus the alternative that the film does impact the community's opinion toward the level of severity.
Out of the 120 individuals in the sample, 80 favored more
punishment for juvenile delinquency and 40 favored less, while
after watching the film only 14 of the 80 favored more punishment
and 31 out of the 40 who favored less before changed to more
punishment after viewing the film. Table 3 depicts the survey’s
result, which
clearly shows that there were 14 (1→1) and 9 (0 → 0) adults with
no opinion
changes, i.e., there were 14+9 = 23 ties, which will be kept out
of the analysis
altogether. However, out of the remaining 97 adults there were
66 (1→0) and 31
(0→1) changes. I will first compute the exact p-value of the
test, and then will
Table 3. McNemar’s Test for a Before-After Study

                                            After
Before                         Favored more (1)       Favored less (0)      Total
Favored more punishment (1)    n11 = 14 (ties)        n10 = 66 (= TObs)       80
                                                      E12 = 0.5×97 = 48.5
Favored less punishment (0)    n01 = 31               n00 = 9 (ties)          40
                               E21 = 0.5×97 = 48.5
Total After                    45                     75                     120
use the normal approximation to the binomial to compute the
p-value. We
basically have n = 66 + 31 = 97 Bernoulli trials where the event
(1→0) has
occurred 66 times and we must ascertain whether such an
occurrence is strictly due to chance (random) variation, or there
is an underlying shift in opinion.
We may state the null hypothesis H0: Pr(1→0) = Pr(0→1) = p =
0.50 versus the 2-
sided alternative H1: p = Pr(1→0) ≠ 0.50. Using the binomial pmf, the precise p-value of the test is given by p-value = 2×Pr(T ≥ 66 | p = 0.50) = 2×[1 − Pr(T ≤ 65 | p = 0.50)] = 2×[1 − B(65; 97, 0.50)] = 2×[1 − Σ_{x=0}^{65} C(97, x)(0.5)^97] = 2×(1 − 0.99975492102) =
0.00049015795 < 0.01 → Strongly reject H0 and conclude that the treatment had a
the treatment had a
significant impact in changing the community’s opinion toward
punishment severity for juvenile delinquents. We now compute the
p-value of the above test using the normal approximation to the
binomial. Because under H0 the binomial mean is np =
97×0.50 = 48.5 and variance is npq = 97×0.25 = 24.25 → σT =
4.92443, then p-value
= 2×Pr(T ≥ 66 | p = 0.50) = 2×Pr[(T − μT)/σT ≥ (66 − 48.5)/4.92443] ≅ 2×Pr(Z_N(0,1) ≥ 3.553711578) =
2×Φ( −3.553711578) = 2×0.0001899177645 = 0.000379835529, which
is not very close to the exact p-value = 0.00049015795 due to the
fact that the binomial pmf has a discrete range space and the
measurement for the normal distribution
must be at least in the interval scale. Therefore, we need to
correct TObs = 66 for continuity; because on a continuous scale,
the integer 66 is represented by the
interval (65.5, 66.5), then p-value = 2×Pr(T ≥ 66 | p = 0.50) = 2×Pr[(T − μT)/σT ≥ (65.5 − 48.5)/4.92443] ≅ 2×Pr(Z_N(0,1) ≥ 3.452177) = 2×Φ(−3.452177) = 2×0.00027804145 =
0.000556082894. This approximation is now acceptable because the
stated LOS is actually larger than the exact p-value = 0.000490158
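These three p-values (exact binomial, normal approximation without and with the continuity correction) can be reproduced with a stdlib-only Python sketch; the function names are my own:

```python
from math import comb, erf, sqrt

def binom_cdf(k, n, p=0.5):
    """B(k; n, p) = Pr(T <= k) for T ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n10, n01 = 66, 31                 # (1->0) and (0->1) changes from Table 3
n = n10 + n01                     # 97 untied pairs
exact = 2 * (1 - binom_cdf(n10 - 1, n))          # 2*Pr(T >= 66 | p = 0.5)
mu, sigma = n / 2, sqrt(n) / 2                   # 48.5 and 4.92443
approx     = 2 * (1 - phi((n10 - mu) / sigma))        # w/o continuity correction
approx_cfc = 2 * (1 - phi((n10 - 0.5 - mu) / sigma))  # with continuity correction
print(exact, approx, approx_cfc)  # text reports 0.00049015795, 0.000379835529, 0.000556082894
```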
and hence is conservative. From statistical theory, it is well
known that if Z ~ N(0, 1), then Z2 has a chi-
square distribution with 1 df, i.e., Z2 ~ 12χ . W/O correction
for continuity (cfc), p-
value ≅ 2×Pr(Z ≥ 3.553711578) = Pr[ 12χ ≥ (3.553711578)2] = Pr(
1
2χ ≥ 12.62886598)
= 0.00037983553 as before. It turn out that the value of 12χ for
Table 3 W/O cfc is
simply 12χ = +
2 2(66 - 48.5) (31- 48.5)48.5 48.5
= 2×2(66 - 48.5)
48.5 =
210 01
10 01
(n -n )n +n
=
2(66 - 31)97
= 12.62886598. While with the cfc, the above chi-square changes
to
12χ = 2×
2(66 - 48.5 - 0.50)48.5
= 2
10 01
10 01
(n -n -1)n +n
= 342
97= 11.9175257732, which
results in the p-value ≅ Pr( 12χ ≥ 11.9175257732) = 0.000556083,
again as before.
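The two shortcut chi-square forms of McNemar's statistic are a one-liner each; a quick numerical check:

```python
# McNemar chi-square statistics for Table 3 (n10 = 66, n01 = 31).
n10, n01 = 66, 31

chi2     = (n10 - n01) ** 2 / (n10 + n01)            # w/o continuity correction
chi2_cfc = (abs(n10 - n01) - 1) ** 2 / (n10 + n01)   # with continuity correction
print(chi2, chi2_cfc)   # 35^2/97 = 12.62886598...,  34^2/97 = 11.9175257732...
```

Note that chi2 equals Z² = (3.553711578)², confirming the equivalence of the normal and chi-square forms of the test.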
1.11 Before-and-After Designs for Number of Injuries
As an
example, see pages 102-104 of L. S. Robson et al. (Guide to
Evaluating the Effectiveness of Strategies for Preventing Work
Injuries) where there were 28 injuries during 40000 employee hours;
after a safety program intervention (such as the Implementation of
a behavioral based safety program, Lifting
training/education, Establishment of safety committee, Guarding
evaluation and upgrade, etc; these examples were provided by Dr.
Jerry Davis) there were 22 injuries during 60000 hours. Since the
unit of employee-hours-worked is 100,000
hours = 1 unit, we first convert the two observed numbers of injuries to the proper unit. Let IR_B represent the (average) injury rate before intervention; then
IR̂_B = 28/(40000 hours) = (28/40000 hours)×(100000 hours/one unit) = 70 per unit. Similarly, the average IR after intervention is given by
IR̂_A = (22/60000 hours)×(100000 hours/one unit) = 36.66667 per unit. We first examine the difference in IRs followed by their ratio.
(1) The point estimate of the rate difference is given by R̂_D = IR̂_B − IR̂_A = 70 − 36.66667 = 33.3333. Note that the sample sizes for these last two point estimates are nB = 0.40 unit and nA = 0.60 unit. The question now is “has the intervention been effective, i.e., is the sample rate difference of 33.3333
significantly larger than zero?” Although the authors do not
highlight this question, from a statistical standpoint this leads
to a one-sided test of hypothesis H0 : IRB = IRA ↔ H0 : RD = 0
versus the alternative H1: IRA < IRB ↔ H1
: RD > 0, where the alternative states that injury rate has
been reduced due to the intervention. Because occurrence of
injuries during 100000 hours is a rare event, I surmise that the
number of injuries per unit is Poisson distributed and
hence both its mean and variance are equal to the Poisson
parameter μ, where
the pmf is given by
Pr(x) = μ^x e^(−μ)/x!,   x = 0, 1, 2, 3, …
and the rv X represents the number of injuries in one unit time
interval (i.e., 100000 hours). From statistical theory, the
limiting distribution of Poisson is the
Gaussian N(μ, μ); further, a linear combination of any two
normal deviates, such
as X1 − X2, is also normally distributed with E(X1 − X2) = μ1 −
μ2 and variance V(X1
− X2) = σ1² + σ2² − 2σ12, where σ12 = Cov(X1, X2) = E[(X1 −
μ1)×(X2 − μ2)]. Because
the two IRs before and after are independent, then Cov(IR̂_B, IR̂_A) = 0 and
hence the point estimate of V(R̂_D) = V(IR̂_B − IR̂_A) = V(IR̂_B) + V(IR̂_A) = μ_B/n_B + μ_A/n_A
is given by v(IR̂_B − IR̂_A) = v(IR̂_B) + v(IR̂_A) ≅ IR̂_B/n_B + IR̂_A/n_A = 70/0.40 + 36.66667/0.60 = 236.1111 → se(IR̂_B − IR̂_A) = se(R̂_D) = √236.11111 = 15.36591 per 100,000
hours-worked. Now that we have estimated the standard error (se)
of
R̂_D = IR̂_B − IR̂_A, we can assess its significant difference from zero by standardizing it as
Z0 = [(IR̂_B − IR̂_A) − 0]/se(IR̂_B − IR̂_A) = 33.3333/15.36591 = 2.1693046,
where Z0 denotes the test statistic under H0: RD = IRB − IRA = 0 → p-value ≅ Pr(Z_N(0,1) ≥
2.1693046) = 0.01503 < 0.05 → Reject H0 and conclude that the
safety program
intervention has been effective at the 5% level.
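The rate-difference Z test above can be reproduced numerically; a stdlib-only sketch with variable names of my own choosing:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

nB, nA = 0.40, 0.60              # exposures in units of 100,000 employee-hours
irB, irA = 28 / nB, 22 / nA      # 70 and 36.66667 injuries per unit
rd = irB - irA                   # 33.3333
se = sqrt(irB / nB + irA / nA)   # sqrt(70/0.40 + 36.66667/0.60) = sqrt(236.1111)
z0 = rd / se                     # 2.1693046
p_value = 1 - phi(z0)            # one-sided; text reports 0.01503
print(z0, p_value)
```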
We can also conduct the above test of hypothesis using χ² with 1 df by
computing the expected frequencies under H0: RD = IRB − IRA = 0.
Under H0: IRB =
IRA out of the total of 28 + 22 = 50 injuries, the expected
frequency for “Before” is
given by EB = 50×(40000/100000) = 20 and thus EA = 50−20 = 30.
As a
consequence, the chi-square statistic is given by
χ²0 = Σ_{i=1}^{2} (n_i − E_i)²/E_i = (28 − 20)²/20 + (22 − 30)²/30 = 5.3333 → p-value = 0.50×Pr[χ²(1) ≥ 5.3333] = 0.50×0.02092133534 = 0.01046067, which is smaller (less conservative) than the p-value
from the normal approximation to the Poisson. The reason behind the
incongruence of the two p-values is the fact that both are
approximations, one of which (the normal) has not been corrected
for continuity and is more conservative than the other. If both
were exact test statistics, then for certain Z0² = (2.1693046)² = 4.70588244 would
equal χ²0 = Σ_{i=1}^{2} (n_i − E_i)²/E_i = 5.3333. When in doubt, it is statistically prudent
to
state the p-value that is more conservative, i.e., the p-value
in this case should
be stated as 0.01503. Because I conducted a right-tailed test
(H1: IRA < IRB ↔
H1: 0 < IRB −IRA ↔ H1: RD > 0), then we need to develop a
lower one-sided 95 % CI
for the parameter RD. The approximate lower CL is (RD)_L = R̂_D − z0.05×se(R̂_D) =
33.33333 − 1.64485×15.36591 = 8.0586648 → 8.0586648 ≤ RD < ∞, which excludes
which excludes
zero consistent with rejecting the null hypothesis H0: RD = 0.
It is paramount to
note that the 99% lower CL = R̂_D − z0.01×se(R̂_D) = 33.33333 − 2.32635×15.36591 =
−2.413113 gives the CI = (−2.413113, ∞) that includes zero
because we could not
reject H0: RD = 0 at the 1% level (p-value ≅ 0.01503 >
0.01).
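Both lower confidence limits follow from the same two numbers; a quick check (z quantiles 1.64485 and 2.32635 are hard-coded from the text):

```python
# Lower one-sided CLs for the rate difference RD = IR_B - IR_A.
rd, se = 33.33333, 15.36591      # point estimate and standard error from above

cl95 = rd - 1.64485 * se         # ~ 8.059  -> (8.059, inf) excludes 0: reject H0 at 5%
cl99 = rd - 2.32635 * se         # ~ -2.413 -> (-2.413, inf) includes 0: cannot reject at 1%
print(cl95, cl99)
```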
We can also measure the effective strength of the safety
intervention by
computing the rate ratio R̂_R = IR̂_A/IR̂_B = 36.666667/70 = 0.52381, implying that the IR has diminished by as much as 47.619%. However, the statistical properties of a rate ratio R̂_R are not at all as nice as those of a rate difference R̂_D. There are two major reasons: (1) If the rate before, IR̂_B, is zero, albeit rarely, clearly the corresponding R̂_R cannot be computed.
(2) Even if both IR̂_B and IR̂_A were normally distributed, their ratio R̂_R is never normally distributed. Even worse is the fact that the approach to normality of the SMD of R̂_R is agonizingly slow (in general, at least n > 50 is required before the normal approximation is fairly adequate). Nearly always a ratio estimator is biased and its SMD is positively skewed (i.e., α3 = E[((X − μ)/σ)³] > 0). We can alleviate the approach to normality somewhat by taking the natural logarithm of R̂_R, which is a variance-reduction transformation, plus the fact that ln(R̂_R) approaches normality more rapidly than R̂_R itself. For convenience, let Y = ln(R̂_R); then from the statistical literature, for large n, Y is approximately normally distributed with approximate mean ln(RR) and an approximate variance that I will compute below.
V(Y) = V[ln(R̂_R)] = V[ln(R̂_A/R̂_B)] = V[ln(R̂_A) − ln(R̂_B)] = V[ln(R̂_A)] + V[ln(R̂_B)].   (5)
It is impossible to compute an exact variance for ln(R̂_A) even if the exact variance of R̂_A were known, unless R̂_A is lognormally distributed. Thus, what follows is a rough approximation. We first write the Taylor expansion of ln(R̂_A) about the mean of R̂_A, denoted by μA = E(R̂_A):
ln(R̂_A) = ln(μA) + (R̂_A − μA)f′(μA) + (R̂_A − μA)²f″(μA)/2! + …,
where f′(μA) = [d ln(R̂_A)/dR̂_A]|μA, f″(μA) = [d² ln(R̂_A)/dR̂_A²]|μA, etc. Because f′(μA) = 1/μA and f″(μA) = −1/μA², the above Taylor expansion reduces to
ln(R̂_A) = ln(μA) + (R̂_A − μA)/μA − (R̂_A − μA)²/(2μA²) + …   (6)
We now apply the variance operator V to both sides of Eq. (6), maintaining only the first two terms on the RHS of (6):
V[ln(R̂_A)] ≅ V[ln(μA) + (R̂_A − μA)/μA] = 0 + V[(R̂_A − μA)/μA] = (1/μA²)V(R̂_A) = (μA/nA)/μA² →
V[ln(R̂_A)] ≅ 1/(nA μA) = 1/(0.60×36.6666) = 1/22; similarly, V[ln(R̂_B)] ≅ 1/28.
Inserting these into Eq. (5) yields V[ln(R̂_R)] = V[ln(R̂_A)] + V[ln(R̂_B)] ≅ 1/22 + 1/28 = 0.08116883 → se[ln(R̂_R)] = 0.28490144. We could now proceed to obtain a 2-
to obtain a 2-
sided 95% CI for RR = RA/RB as was done on page 104 of L. S.
Robson et al., but we deviate from their procedure because RR =
RA/RB should always be minimized (in the ideal case RA = 0 in which
case the intervention program is 100% effective). Therefore, it is
more informative to obtain an upper one-sided 95% CI
for RR. To this end, ln(R̂_R) + 1.64485×se[ln(R̂_R)] = ln(0.52381) + 1.64485×0.284901 = −0.178006 → (RR)_U = e^(−0.178006) = 0.8369374051 → 0 < RR ≤ 0.8369374051. This
95% CI excludes the value RR = 1 specified by H0: RR = RA/RB ≥ 1; hence we can reject H0: RR = 1 at the 5% level in favor of H1: RA/RB < 1.
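The whole log-rate-ratio calculation, from the delta-method variance of Eq. (5) through the upper one-sided 95% CL, fits in a few lines; a sketch following the section's numbers:

```python
from math import exp, log, sqrt

nB, nA = 0.40, 0.60              # exposures in units of 100,000 employee-hours
irB, irA = 28 / nB, 22 / nA      # 70 and 36.66667 injuries per unit

rr = irA / irB                               # rate ratio, 0.52381
var_log_rr = 1 / (nA * irA) + 1 / (nB * irB) # delta-method: 1/22 + 1/28 = 0.08116883
se_log_rr = sqrt(var_log_rr)                 # 0.28490144
upper = exp(log(rr) + 1.64485 * se_log_rr)   # upper 95% CL, ~0.836937
print(rr, upper)                             # CI (0, 0.8369] excludes RR = 1
```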