Nonparametric Statistics INSY 8970 M06 Maghsoodloo

1.1 MEASUREMENT SCALES
(a) The lowest measurement level is the Nominal
scale which is used just to separate elements into different
classes or categories. Examples are Success/Failure; Go/No-Go;
Grades A, B, C, D, F; License plates and SSN. If 10 machines are
assigned the numbers 1 through 10 merely for identification
purposes, then we are using a nominal scale for measurements
because we could have just as well assigned the names A, B, C, …, J
to the 10 machines.
(b) The second lowest measurement level is the ordinal scale
where only the <, >, or = comparisons amongst elements can be made;
ordinal scale simply consists of ranks. For example, if 4 types of
injuries can occur in a manufacturing plant but not all four are of
equal importance, we may assign rank 1 to the lowest priority, rank
2 to 2nd least important, rank 3, and then rank 4 to the highest
priority. Note that the scale here is still discrete and generally
a rank of 1 implies the least preferred.
(c) The first scale where measurements are continuous is the
interval, where not only relative order of two measurements are
discernable but also the size of their difference. The interval
scale does not have a unique well-defined zero-point (or origin)
and a well-defined unit distance, but it is possible to arbitrarily
define a zero-point and unit distance. A prime example of a
variable that is measured in an interval scale is temperature; note
that temperature has an arbitrary zero-point and arbitrary unit
distance. Other examples of (random) variables measured in an interval scale are time and humidity (although I am not sure about humidity).
(d) The highest level is the ratio scale, which has all the characteristics of an interval scale and, in addition, a well-defined, unique zero-point. Examples of variables that are
measured in a ratio scale are height, weight, distance, income,
yield, velocity, etc. Just like the interval scale, one unit is
arbitrarily defined.
1.2 Nonparametric Test
A statistical test (or method) is nonparametric if it satisfies at least one of the following two criteria:
(i) The method can be used on an ordinal or lower scale.
(ii) There is no assumption made about the underlying distribution of the random variable that is being observed.
A comparison of parametric and nonparametric (or distribution-free) procedures is outlined in Table 1.
Table 1. (SMD = sampling distribution)

  Parametric Methods                           | Nonparametric Methods
  ---------------------------------------------|---------------------------------------------
  The parent population is known and           | No assumption is made about the
  generally assumed to be Gaussian             | underlying distribution
  The scale has to be at least interval        | Any scale
  The sample size n should exceed 20           | n ≥ 6
  Highest statistical power (or sensitivity)   | Average relative efficiency is roughly 80%
  The SMD of the test statistic depends on     | The SMD of the test statistic does not
  the underlying distribution                  | depend on the underlying distribution
In order to define the relative efficiency of a statistical
test, we must first introduce the four circumstances that may occur
in testing any statistical hypothesis, as shown in Table 2. In
testing a hypothesis, H0 is referred to as the null hypothesis and
H1 (or Ha) is called the alternative hypothesis. Nearly always, the
objective is to ascertain if the sample provides sufficient
evidence to reject the null hypothesis H0, tolerating a certain amount of risk.
Table 2 shows that the power (or sensitivity = 1 − β) of a
statistical test is simply the Pr (probability) of rejecting H0 if
H0 is false, while the specificity of a
statistical test is the Pr of accepting H0 when H0 is true given
by 1 − α. Further,
the overall error rate of a statistical test is given by α + β.
In general, the Pr of
Table 2. (Pr = probability) H0: No disease, H1: presence of disease

                        | H0 is true                     | H0 is false
  ----------------------|--------------------------------|--------------------------------
  Reject H0             | Type I Error, or               | True Positive → Correct
                        | "False Positive";              | Decision; Sensitivity =
                        | Occurrence Pr = α              | Occurrence Pr = 1 − β
  Do not reject H0      | True Negative → Correct        | Type II Error, or
  (or accept H0)        | Decision; Specificity =        | "False Negative";
                        | Occurrence Pr = 1 − α          | Occurrence Pr = β
rejecting H0 when H0 is true, or the level of significance (LOS)
of a test, is a priori specified, and in most applications the LOS
is set at α = 0.05; in SQC the LOS of a control chart is set at
0.0027 (this is due to the fact that too many false alarms are cost
prohibitive; further a LOS of 0.0027 pertains to 3-sigma limits).
The third
most common LOS is α = 0.01. In any application, if no LOS is
specified prior to
experimentation, it is implied that the value of α is the
standard 5%.
A test for which 1− β ≥ α is said to be unbiased, and a
statistical test for which
the value of 1 − β → 1 as n → ∞ is said to be consistent. A
statistical test is conservative iff (if & only if) the stated
LOS exceeds the actual LOS. The OC (operating characteristic) curve
of a statistical test is simply the graph of
β versus the parameter under the null hypothesis, and the power
function gives
the graph of (1 − β ) versus all possible values of the
parameter under H0.
The relative efficiency (REF) of the statistical test T1 to T2,
having the
same LOS α, is given by n2/n1 such that both tests have
identical values of β. As
an example, if T1 requires a sample of size n1 = 20 and has α =
0.05, β = 0.10, but
T2 requires an n2 = 25 to attain the same α = 0.05 and β = 0.10,
then the efficiency
of T1 relative to T2 is given by 25/20 = 125%, or the REF(T2 to
T1) = 20/25 = 80%. On the other hand, if the 5% level tests T1 and
T2 both use the same random
sample of size n = n1 = n2 = 25, but β(T1) = 0.10 while β(T2) =
0.125, then the REF
of T1 to T2 is given by 0.125/0.10 = 125%.
1.3 The Binomial Test
Example 1. A machine produces parts in a manufacturing process, and it is desired to test whether its quality level is worse than 6%, i.e., we wish to test H0: p = 0.06 versus H1: p > 0.06
at a prescribed LOS α = 0.05. A random sample of n = 15 yields
TObs = 3
nonconforming units (NCUs). We wish to determine if the sample
provides sufficient evidence to reject H0 (i.e., H0: the machine is
operating properly) in favor of H1: The machine needs adjustment
(and/or quality improvement =QI).
Throughout these notes, the random variable T denotes the test
statistic, and in this case T = the no. of defective units in 15
parts. Note that we have 15 Bernoulli trials and T has a binomial
Pr mass function (pmf) given by b(x;15, p) =
15Cx px qn−x , x = 0, 1, 2, …, 15 and q = 1 − p, where 15Cx = 15
15!
x!(15 x)!x⎛ ⎞
=⎜ ⎟ −⎝ ⎠.
Because large values of T are congruent with the rejection of H0, the rejection (or critical) region of size α = 0.05 corresponds to the set {xL, xL + 1, …, 14, 15} such that Pr(T ≥ xL | p = 0.06) ≤ 0.05. Because the sample space for T is discrete, it is nearly always impossible to generate an exact 5%-level test. The Excel file (named Example 1) on my website shows that Pr(T ≥ 4 | p = 0.06) = 0.010360, while Pr(T ≥ 3 | p = 0.06) = 0.057133. Denoting AR as the acceptance region and CR as the rejection (critical) region, the size of the rejection region CR1 = {4, 5, 6, …, 14, 15} is equal to 0.010360, while that of CR2 = {3, 4, 5, 6, …, 14, 15} is 0.057133.
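The two rejection-region sizes quoted from the Excel file can be checked with a few lines of code; a minimal sketch (the helper name binom_tail is mine, not from the notes):

```python
from math import comb

def binom_tail(x_l, n, p):
    """Pr(T >= x_l) for T ~ binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(x_l, n + 1))

# Sizes of the two candidate rejection regions from Example 1 (n = 15, p = 0.06)
alpha1 = binom_tail(4, 15, 0.06)   # CR1 = {4, 5, ..., 15}, size ≈ 0.010360
alpha2 = binom_tail(3, 15, 0.06)   # CR2 = {3, 4, ..., 15}, size ≈ 0.057133
print(round(alpha1, 6), round(alpha2, 6))
```

Any exact binomial routine (Excel's BINOM.DIST, for instance) gives the same tail sizes.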
Because the observed value is TObs = 3, we could decide to reject H0, but in so doing, the Pr of committing a type I error is 0.057133. On the other hand, if we
cannot tolerate a type I error of size 0.057133 and would prefer
to have α =
0.010360, then the sample does not provide sufficient evidence,
at the 1.0360% level, to reject H0 and our decision would be not to
reject H0, in which case we may indeed be inflating the Pr of
committing a type II error. Both Prs of
committing a type II error, β, and that of rejecting H0, 1− β,
are functions of p, and
on the same Excel file, I have graphed the OC curves for both
tests (α = 0.010360
and α = 0.057133) with the corresponding power functions. The
upper acceptance limit for α = 0.010360 is AU = 3, while for α = 0.057133 it is AU = 2. Note that a rejection region is also called a critical region, denoted CR.
1.4 Sample Size Considerations for a Test of Hypothesis on a Proportion
In relation to Example 1, clearly n = 15 and AR = {0, 1, 2, 3} provide a 0.01036-level test with a type II error Pr of β = 0.9444444 at p = 0.10 (see my spreadsheet under Example 1). In short, because the sample size n = 15 is small, the power of the test at p = 0.10 is equal to
0.05555563. Suppose now we are very unhappy with the overall error
rate at n = 15 and are willing to spend more
resources to have a more powerful test (1− β larger than
0.05555563). The
question then is what is the needed sample size n such that α ≅ 0.05 but the power of the test is at least 0.80 when p = 0.10, i.e., we wish to
develop a test whose
OC curve approximately goes through the points (p = 0.06, 0.95)
and (p = 0.10, β =
0.20). Obviously, the requisite sample size n will far exceed 15
and hence we can justify using the normal approximation to the
binomial. The sample size n is determined by solving the following
two equations with two unknowns (I will draw the pictures during
class discussions):
p = 0.06, 1 − α = 0.95 → AU = 0.06 + 1.64485×sqrt(0.06×0.94/n)
p = 0.10, β = 0.20 → AU = 0.10 − 0.84162×sqrt(0.10×0.90/n)
Subtracting the 1st equation from the 2nd yields 0 = 0.04 − 0.6431174631/sqrt(n)
→ sqrt(n) = 0.6431174631/0.04 = 16.07793657656 → n ≅ 258.5 → nmin = 259 → AU = 0.0842726351
→ Acceptance Region = AR = {0, 1, 2, 3, …, 21, 22}.
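The two-equation solve above is easy to reproduce numerically; a minimal sketch of the normal-approximation step (the exact trial-and-error refinement to n = 265 proceeds the same way with binomial tails):

```python
from math import sqrt, ceil

# Normal-approximation sample size for H0: p = 0.06 vs H1: p > 0.06
# with alpha ≈ 0.05 and power ≈ 0.80 at p = 0.10, as in Section 1.4.
z_alpha, z_beta = 1.64485, 0.84162
p0, p1 = 0.06, 0.10

sqrt_n = (z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))) / (p1 - p0)
n_min = ceil(sqrt_n ** 2)
print(n_min)  # 259, as in the notes
```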
The reader must be cognizant of the fact that the above answer is an approximation, especially because the binomial parameter p is far away from 0.50
and hence the normal approximation is only moderately accurate
at best. The exact solution can be obtained using the binomial distribution through trial and error. My spreadsheet (Example 1) shows that the exact answer to the above sample-size determination is to use n = 265, with α = 0.049666787 and β = 0.208886841.
1.5 Confidence Interval for the Binomial Parameter p
Again, unless otherwise specified, the most common confidence level, 1 − α, is 0.95, although 90% and 99% CIs are fairly common. The problem
that arises is the fact that p is an STB (smaller-the-better) type
QCH (quality characteristic) and thus it is always desired to
reduce FNC (fraction nonconforming). Consequently, the CI that
relates the most valuable information to a quality engineer is of
the upper one-sided type (0, pU), although two-sided CIs (pL, pU)
are fairly common. It will be illustrated in this course that a
right-tailed test, as in Example 1, corresponds to a lower
one-sided CI and vice versa. However, obtaining a 95% CI such as
(pL, 1) for p is almost useless because it provides little
information on machine quality. Only for the sake of consistency,
we will first develop a 95% lower one-sided CI for p and will
discuss the consequences. By the 95% CI on p we mean a range of
hypothesized values of p such that the test statistic TObs (=3 in
this case) will result in accepting H0. That is, we are seeking an interval such that Pr(pL ≤ p < 1) ≥ 0.95. The Excel file on my website shows that the exact solution to Pr(T ≥ 3 | p ↓ pL) ≤ 0.05 is pL = 0.0569. Thus, we are 95% confident that the machine's FNC lies in the interval 0.0569 ≤ p < 1. This implies that if we hypothesize
H0: p = 0.06 vs H1: p >
0.06, then this null hypothesis cannot be rejected at the 5%
level of significance because the hypothesized value of p = 0.06
lies inside the 95% CI = [0.0569, 1). Now consider testing the null
hypothesis H0: p = 0.04 versus the alternative
H1: p > 0.04, given the sample result TObs = 3 NCUs. Then α = Pr(T ≥ xL | p = 0.04) ≤ 0.05, and the closest 5%-level test that can be constructed has the rejection region CR = {3, 4, 5, 6, …, 14, 15} with the exact size of the critical region α = 0.020292. However, now the
value of our test statistic TObs = 3 lies inside the rejection
region and hence we can reject H0 at the 2.03% level and conclude
that p > 0.04. Our 95% CI = [0.0569, 1) confirms the same
conclusion because it excludes the hypothesized value of p = 0.04.
Suppose now we construct the 95% upper one-sided CI = (0, pU],
assuming TObs = 3. The question now is how large p must become such
that the p-value =
Pr(T ≤ 3 | p ↑ pU) ≥ 0.05; this in turn will assure us with 95% confidence that
the true value of p lies within the interval (0, pU]. The Excel
file on my website shows that the exact solution to Pr(T ≤ 3 ⎢p ↑
pU) ≤ 0.05 is pU = 0.4397. Thus we are 95% confident that the machine's FNC lies in the interval (0,
0.4397]; however, this 95% CI is inconsistent with our test result
because the 95% CI =(0, 0.4397] includes the hypothesized value of
p = 0.04, while our test statistic had rejected H0: p = 0.04. As
stated earlier, a right-tailed test corresponds to a lower one-sided CI and vice versa. In other words, the upper one-sided 95% CI = (0, 0.4397] corresponds to testing H0: p = 0.04 versus the alternative H1: p < 0.04. Before we even obtain the rejection region for the alternative H1: p < 0.04, it is clear that the 95% CI = (0, 0.4397] includes p ≥ 0.04, and hence we cannot reject H0 and cannot conclude that p < 0.04. In order to conduct the test of H0: p ≥ 0.04 versus the alternative H1: p < 0.04, we first observe that small values of T are congruent with rejecting H0, and thus the 5%-level test consists of the rejection region CR = {0, 1, 2, 3, …, xU},
where xU must be obtained in such a manner that the Pr(T ≤ xU⎪p
= 0.04) ≤ 0.05.
The Excel file on my website shows that such an xU does not exist because even if we let xU = 0, Pr(T = 0 | p = 0.04) = 0.54208638 ≫ 0.05. My Excel file shows that
we need a sample size n ≥ 74 in order to construct a 5%-level
test for H0: p ≥ 0.04
versus the alternative H1: p < 0.04. Just for the sake of illustration, let's test H0: p = 0.30 versus H1: p < 0.30, for which we know a priori that the sample of n = 15 with TObs = 3 cannot reject H0 because p = 0.30 also lies inside the 95% CI = (0, 0.4397]. However, a 5% lower-
tail test now can be constructed because Pr(T ≤ 1⎪p = 0.30) =
0.0352676 < 0.05.
Because TObs = 3 is not inside the CR = {0, 1}, H0: p ≥ 0.30 cannot be rejected; therefore, we cannot conclude at the 5% level that p < 0.30.
1.6 The Quantile (or Percentile) Test
The pth quantile of a rv (random variable) X, denoted by xp, is a value in the range space of X that satisfies exactly one of the following two conditions:
(1) Pr(X < xp) < p and Pr(X > xp) < 1 − p, or
(2) Pr(X < xp) = p and Pr(X > xp) = 1 − p.
We may collapse the above two conditions into Pr(X < xp) ≤ p and Pr(X > xp) ≤ 1 − p, bearing in mind that in any specific individual case only one of the above two must be satisfied.
Example 2. The following data represent the diameters of trees,
X, used for wood production (i.e., X is an LTB type QCH) in order
statistics format: 33.1″, 34.8,
35.7, 36.9, 39.8, 40.2, 40.2, 41.1, 41.6, 41.7, 43.0, 44.0,
47.8, 48.0, 50.9, 52.8, 54.8,
55.6, 60.7.
(a) Estimate the 80th percentile (or the 0.80 quantile) of X, denoted by x0.80.
0.80×n = 0.80×19 = 15.2, which rounds up to 16 → the point estimate of x0.80 is given by the value of the 16th order-statistic → x̂0.80 = x(16) = 52.8″.
We now check whether this point estimate satisfies the 1st condition of the above definition:
Pr(X < 52.8) = 15/19 = 0.7895 < 0.80 and Pr(X > 52.8) = 3/19 = 0.1579 < 1 − 0.80.
Note that if the sample size n were equal to 20, say the 20th sample order-statistic were 61.2″, then 0.80×n = 0.80×20 = 16 → the point estimate of x0.80 would be given by the average of the 16th and 17th order statistics → x̂0.80 = [x(16) + x(17)]/2 = (52.8 + 54.8)/2 = 53.80. Now P̂r(X < 53.8) = 16/20 = 0.80 = p, and P̂r(X > 53.8) = 4/20 = 0.20 = 1 − p.
(b) Do the data provide sufficient evidence, at the 5% level, to conclude that x0.80 exceeds 41.7, i.e., test H0: x0.80 ≤ 41.7 versus H1: x0.80 > 41.7″? In order to conduct a test on a quantile (or percentile), our first task is to convert it to a binomial test. To this end, we let p = Pr(X ≤ 41.7); by definition, Pr(X < x0.80) ≤ 0.80 and Pr(X > x0.80) ≤ 1 − 0.80 = 0.20. This last inequality implies that Pr(X > x0.80) = 1 − Pr(X ≤ x0.80) ≤ 0.20 → Pr(X ≤ x0.80) ≥ 0.80 → under H0, Pr(X ≤ 41.7) ≥ 0.80 → H0: p ≥ 0.80 versus H1: p < 0.80. Let T represent the number of trees with X
of trees with X
values ≤ 41.7; then small values of T reject H0, where the range
space of T is RT =
{0, 1, 2, 3, …, 19}. The rejection region of size 0.05 consists
of the sum over x values whose binomial Prs from zero to Ru does
not exceed 0.05, i.e.,
UR x 19 x
x 0
190.80 (0.20)
x−
=
⎛ ⎞⎜ ⎟⎝ ⎠
∑ ≤ 0.05 → RU = upper rejection limit = 11 →
11x 19 x
x 0
190.80 (0.20)
x−
=
⎛ ⎞⎜ ⎟⎝ ⎠
∑ = 0.023278312 = α; again because the range space of T is
discrete, it is impossible to create an exact 5%-level test. If
we let AR = {0, 1, 2,
…, 11, 12}, then α =12
x 19 x
x 0
190.80 (0.20)
x−
=
⎛ ⎞⎜ ⎟⎝ ⎠
∑ = 0.067600066 > 0.05. Since TObs = 10,
then we have sufficient evidence at the 5% level to reject H0.
We now define the p-value, or Pr level, of a statistical test; this is the smallest LOS, after sampling, at which H0 can be rejected using the computed test statistic, assuming H0 is true. For Example 2, the p-value (also referred to as the critical level) is given by p-value = Pr(T ≤ 10 | p = 0.80) = Σ (x = 0 to 10) 19Cx (0.80)^x (0.20)^(19−x) = 0.006657655. Once the p-value is computed, the null
hypothesis is rejected iff it is less than or equal to the LOS,
α, of the test. This
leads to the following general procedure for testing any
statistical hypothesis.
Step 1. Set up the null and alternative hypotheses. Always put
the ? mark under H1. The status quo is stated under H0 and what is
to be proven under H1.
Step 2. If the test is one-sided, determine whether small values
of T reject H0, or large values of T reject H0. If small values of
T reject H0, then the p-value =
Pr(T≤ TObs ⎢H0 is true); If large values of T reject H0, then
the p-value = Pr(T ≥ TObs ⎢H0 is true). If the p-value < α
(generally 0.05), then reject H0. If the p-value <
0.01, then strongly reject H0.
Step 3. If the test is two-sided, determine if the value of TObs
lies below the median of T or above T0.50. If TObs < T0.50, then
the p-value = 2×Pr(T≤ TObs ⎢H0 is
true); If TObs > T0.50, p-value = 2×Pr(T ≥ TObs⎢H0 is true).
Again, if the computed p-
value < α (= 0.05 unless otherwise specified), then reject
H0. If the p-value < 0.01,
then strongly reject H0. For our Example 2, the p-value = 0.006657655 < 0.01, so we strongly reject H0 and conclude that x0.80 > 41.7.
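Steps 1 through 3 applied to Example 2 reduce to a single lower-tail binomial computation; a minimal sketch (the helper name binom_cdf is mine, not from the notes):

```python
from math import comb

def binom_cdf(k, n, p):
    """Pr(T <= k) for T ~ binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

# H0: x_0.80 <= 41.7 (i.e., p = Pr(X <= 41.7) >= 0.80) vs H1: p < 0.80.
# T = number of trees with diameter <= 41.7; small values of T reject H0.
n, p0, t_obs = 19, 0.80, 10
p_value = binom_cdf(t_obs, n, p0)      # the notes report 0.006657655
alpha = binom_cdf(11, n, p0)           # size of CR = {0, ..., 11}; notes: 0.023278312
print(p_value, alpha)
```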
1.7 Obtaining a 95% CI for the pth Quantile (xp = Qp)
As a continuation of Example 2, we illustrate the procedure by obtaining an approximate 95% CI for the 80th percentile, x0.80, of tree diameters. Our objective is to obtain the order statistic of rank r (or rth order-statistic) such that
Pr[x(r) ≤ x0.80 < ∞] ≥ 0.95 → Pr[x(r) > x0.80] ≤ 0.05.
The inequality x(r) > x0.80 holds true iff at most (r − 1) of the X's are less than or equal to x0.80, i.e., Pr[x(r) > x0.80] = Pr[at most (r − 1) of the X's ≤ x0.80]. Define the statistic T as the number of observed X's ≤ x0.80. Thus, T has the b(x; n = 19, p = 0.80) SMD, and as a result, Pr[x(r) > x0.80] = Pr[at most (r − 1) of the X's ≤ x0.80] = Pr(T ≤ r − 1 | p = 0.80) = Σ (x = 0 to r−1) 19Cx (0.80)^x (0.20)^(19−x) ≤ 0.05. The spreadsheet on my
website (under Example 2) shows that the value of r − 1 must equal 11, so that the rank of the requisite order-statistic is r = 12. Hence, the approximate 95% CI is given by x(12) ≤ x0.80 < ∞ → 44 ≤ x0.80 < ∞. The exact value of the confidence coefficient (or confidence level) is equal to 1 − α = 1 − 0.023278312 = 0.9767217.
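The search for the rank r can be automated as a one-line scan over the binomial CDF; a minimal sketch (helper name binom_cdf is mine, not from the notes):

```python
from math import comb

def binom_cdf(k, n, p):
    """Pr(T <= k) for T ~ binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

# Largest rank r with Pr(T <= r - 1 | n = 19, p = 0.80) <= 0.05 gives the
# lower confidence limit x(r) for the 0.80 quantile of Example 2.
n, p = 19, 0.80
r = max(k + 1 for k in range(n) if binom_cdf(k, n, p) <= 0.05)
conf = 1 - binom_cdf(r - 1, n, p)      # exact confidence coefficient
print(r, round(conf, 7))               # 12 and ≈ 0.9767217
```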
Example 3. The data 81, 86, 93, 98, 103, 103, 117, 119, 119,
122, 128, 131, 134, 137, 142, 144, 154, 158, 161, 165 represent
times to failure of certain hard-drives measured in units of 1000
hours. Our objective is to perform statistical inference on the 3rd
quartile (Q3 = x0.75) of time to failure (TTF).
(i) Point Estimation. n = 20 → np = 20×0.75 = 15 → x̂0.75 = [x(15) + x(16)]/2 → x̂0.75 = (142 + 144)/2 = 143 thousand hours.
(ii) Interval Estimation (a 2-sided 95% CI for x0.75). We need to find order statistics x(r) and x(s) such that Pr[x(r) ≤ x0.75 ≤ x(s)] ≥ 0.95.
To obtain the lower limit x(r), we must satisfy the inequality Pr[x(r) > x0.75] ≤ 0.025 because α/2 = 0.025. The inequality x(r) > x0.75 holds true iff at most (r − 1) of the X's are less than or equal to x0.75. Letting T represent the number of X's ≤ x0.75, it follows that Pr[x(r) > x0.75] = Pr(T ≤ r − 1 | p = 0.75) = Σ (x = 0 to r−1) 20Cx (0.75)^x (0.25)^(20−x) ≤ 0.025 → r − 1 = 10 → r = 11 → the lower 95% CL = x(11) = 128 (1000-hours), where CL represents confidence limit, and αL = Σ (x = 0 to 10) 20Cx (0.75)^x (0.25)^(20−x) = 0.013864417.
To obtain the upper 95% CL, we must satisfy the inequality Pr[x(s) < x0.75] ≤ 0.025. The inequality x(s) < x0.75 holds iff at least s of the X's are less than or equal to x0.75. That is, Pr[x(s) < x0.75] = Pr(T ≥ s | p = 0.75) ≤ 0.025 → 1 − Pr(T ≤ s − 1 | p = 0.75) ≤ 0.025 → Pr(T ≤ s − 1 | p = 0.75) ≥ 0.975 → Σ (x = 0 to s−1) 20Cx (0.75)^x (0.25)^(20−x) ≥ 0.975 → s − 1 = 18 → s = 19 → the upper 95% CL = x(19) = 161 (1000-hours), where αU = Σ (x = 19 to 20) 20Cx (0.75)^x (0.25)^(20−x) = 0.0243126 → the approximate 95% CI for x0.75 is given by 128 ≤ x0.75 ≤ 161, having the exact confidence coefficient 1 − αL − αU = 0.961823.
We now use the normal approximation to the binomial distribution to approximate the order-statistic ranks r and s. Again, the binomial rv T represents the number of X's ≤ x0.75; then E(T) = μT = np = 20×0.75 = 15 and the variance of T is given by V(T) = npq = 20×0.75×0.25 = 3.75 → σT = sqrt(3.75) = 1.9365. Figure 1 clearly shows that the approximate value of the rank is r ≅ 15 − 1.96×1.9365 = 11.2045 → r = 11 as before, and s ≅ 15 + 1.96×1.9365 = 18.7955 → s = 19 as before.
(iii) Test of Hypothesis on x0.75. Suppose now we wish to test the null hypothesis that the 75th percentile of TTF is equal to 120 thousand hours, i.e., we wish to test H0: x0.75 = 120 versus the 2-sided alternative H1: x0.75 ≠ 120 at the LOS α = 0.05. Recall that the approximate 95% CI
approximate 95% CI
was 128 ≤ x0.75 ≤ 161, and hence our testing procedure will
reject H0 because the
null value of 120 is outside the corresponding 95% CI. In order
to conduct the
[Figure 1. Normal Approximation to the SMD of T: a normal curve with mean μT = 15 and σT = 1.9365; tail areas of 0.025 fall below rank r and above rank s.]
test, we let T = the number of X's less than or equal to the hypothesized value 120 → TObs = 9 < np = 15 → α̂L = Pr(T ≤ 9 | p = 0.75) = Σ (x = 0 to 9) 20Cx (0.75)^x (0.25)^(20−x) = 0.003942142 → α̂U = Pr(T = 20 | p = 0.75) = 0.00317121 → α̂ = α̂L + α̂U = 0.0071134 ≪ 0.01 → Strongly reject H0, as expected.
1.8 Nonparametric Statistical Tolerance Limits
For illustrative purposes, it is best to first discuss the parametric tolerance limits, i.e., the underlying distribution is N(μ, σ²), where for part (a) both the process mean μ and variance σ² are known; in part (b) both μ and σ² are unknown; and in part (c) no assumption is made about the underlying distribution (i.e., the distribution-free or nonparametric case).
(a) Suppose the injury rate, X, is N(50 injuries per 100000
work-hours, 4); we wish to obtain the 95% tolerance limits such
that Pr(L0.95 ≤ X ≤ U0.95) = 0.95. Using the
0.975 quantile of the standard normal distribution (z0.025 =
1.96) yields the values
of L0.95 = 50 − 1.96×2 = 46.08, and similarly U0.95 = 53.92. The
95% tolerance
interval (46.08, 53.92) means that we are 100% certain (because
both μ & σ are
known) that 95% of all injury rates (IRs) lie within this last
tolerance interval.
(b) Suppose now the normal population mean μ and population standard deviation σ are unknown and have to be estimated from a simple random sample of size n by x̄ and S = sqrt[Σ (i = 1 to n) (xi − x̄)²/(n − 1)], respectively. Then the (γ, p = 1 − α)
tolerance interval is given by x̄ ± kα/2×S, where γ represents the confidence level with most common values of 0.95 and 0.99, p = 1 − α represents the minimum proportion of the population captured (or covered), and kα/2 is called the tolerance factor, whose value for a given γ depends on the proportion p captured and the sample size n. As an example, consider case (a) above but assume that we do not know μ and σ, and a random sample of size n = 15 yields the average IR x̄ =
51.3 injuries/100000 hours and S = 2.91. Our objective is to obtain the (0.99, 0.95) tolerance interval x̄ − kα/2×S ≤ X ≤ x̄ + kα/2×S for IR that contains at least 95% of the population at a confidence level of 99%. Table 5.4 of H-C Chiu shows that k0.025 = 3.50731 (p = 0.95), so that the requisite tolerance interval is (41.0937, 61.5063). Thus, we are 99% confident that at least 95% of IRs lie within the tolerance interval (41.0937, 61.5063). Note that the standard deviation of a process basically determines the length of a tolerance interval, which is given by TIL (tolerance interval length) = 2kα/2×S. In other words, no matter how large n is, a TIL can never be reduced to zero, while a confidence interval length (CIL), which is given by x̄ ± t0.025×S/sqrt(n), approaches zero as n → ∞. For n > 50, the value of kα/2 can be closely approximated from kα/2 ≅ [1 + 1/(2n)]×zα/2×sqrt[(n − 1)/χ²(γ; n−1)], where χ²(γ; n−1) is the (1 − γ) quantile of the chi-square distribution with (n − 1) degrees of freedom (df). We illustrate the application of this last formula by approximating the exact value k0.025 = 3.50731 from H-C Chiu: k0.025 ≅ (1 + 1/30)×1.96×sqrt[14/χ²(0.99; 14)] = (1 + 1/30)×1.96×sqrt(14/4.6604) = 3.5103 > k0.025 from Chiu, which is a bit conservative because this approximation will lead to a wider tolerance band.
(c) The Distribution-Free Tolerance Intervals
The objective is to determine the sample size n and ranks r and (n + 1 − m) such that
Pr[x(r) ≤ at least the fraction p of a population ≤ x(n+1−m)] ≥ γ     (1)
The most common values of r and m are 1, in which case we wish the sample spread to contain at least a proportion p (note that W. J. Conover uses q for the proportion p) of the population with γ×100% confidence probability. We first provide an approximate solution for the requisite sample size n for a specified p, r, m, and γ:
n ≅ 0.25×χ²(1−γ; 2(r+m))×(1 + p)/(1 − p) + (r + m − 1)/2     (2)
As an example, suppose we wish to determine the needed sample size in a human population whose sample spread, [x(1), x(n)], covers at least 95% of the population for their % body-fat content, X, with 99% certainty, i.e., we wish to obtain the tolerance interval such that Pr[x(1) ≤ at least the fraction 0.95 of the human population ≤ x(n)] ≥ 0.99 → r = 1, m = 1, p = 0.95, χ²(0.01; 4) = 13.2767, and Eq. (2) yields n = 0.25×13.2767×(1.95)/0.05 + (1 + 1 − 1)/2 ≅ 129.9478 → nmin = 130. Suppose a random sample of size 130 gives x(1) = 5.7% body-fat and x(130) = 22.9%. Thus we are 99% confident that at least 95% of the members of the target human population have their body-fats within the tolerance interval 5.7% ≤ X ≤ 22.9%.
Eq. (2) can be rearranged and p can be solved for a given n, r, m, and γ; the result is shown below [see W. J. Conover, p. 151, Eq. (2)]:
p = [4n − 2(r + m − 1) − χ²(1−γ; 2(r+m))] / [4n − 2(r + m − 1) + χ²(1−γ; 2(r+m))]     (3)
In the above example, suppose we take a random sample of size n = 100 from the target human population and we wish to know at least what proportion of the population is contained within the sample spread [x(1) % body-fat, x(100)] with 99% certainty. Substitution into Eq. (3) yields p = (400 − 2 − 13.2767)/(400 − 2 + 13.2767) = 0.9354.
If the rv, X, is an STB or LTB (larger-the-better) type such as
IR and job efficiency, respectively, then it will be more
appropriate to obtain a one-sided tolerance interval. In the case
of an STB QCH, we need to determine n such that the
Pr[at least the proportion p of a population ≤ x(n+1 − m)] ≥ γ
(4a)
where the most common value of m = 1. For an LTB type QCH, our
objective is to find n such that
Pr[at least the proportion p of a population ≥ x(r)] ≥ γ
(4b)
where again the most common value of r = 1.
For the most common cases of r = m = 1, because Eq. (4a) is true iff Pr[xp ≤ x(n)] ≥ γ and (4b) holds iff Pr[x1−p ≥ x(1)] ≥ γ, it can be shown that the solution for both cases is
n = ln(1 − γ)/ln(p) = log10(1 − γ)/log10(p),  (only when r = m = 1)     (4c)
As an example, suppose we wish to obtain an upper one-sided tolerance interval for IR such that Pr[at least 95% of the target population ≤ x(n)] ≥ 0.99; thus the requisite n is given by n = ln(0.01)/ln(0.95) = 89.7811 = log10(0.01)/log10(0.95) → the minimum sample size required is nmin = 90. Suppose we sample n = 90 IRs and obtain the 90th order-statistic to be 65 injuries/100,000 hours-worked. Then we can claim we are 99% confident that the upper 95% tolerance limit for IR is 65. It is interesting to note that Eq. (2) can also be used to obtain the solution given in (4c). For this last example, r = 0, m = 1, and n ≅ 0.25×χ²(0.01; 2)×(1 + 0.95)/(1 − 0.95) + (0 + 1 − 1)/2 = 0.25×9.2103×(1.95)/0.05 = 89.8004 → nmin = 90 as before.
1.9 The Sign Test
As W. J. Conover (1999, 3rd edition) states on his pp. 157-8, the bivariate data have to be in at least an ordinal scale. To illustrate the application of the sign test, consider the data of Exercise 8 on page 177 of W. J. Conover (1999), where there are 60 bivariate data points, and we divide them into two subgroups below and above the median income. Thus, 30 individuals had income below the median x̂0.50 = $25941.50 and 30 above the median income. Let the statistic T represent the number of families beyond the median income whose no. of children X(30+i) > X(i), i = 1, 2, 3, …, 30 = n′. However, there are 4 ties in the data, so the number of Bernoulli trials is n = 26. Thus our null hypothesis is H0: p = 0.50 versus H1: p ≠ 0.50, and under H0, T has a b(x; n = 26, p = 0.50) SMD. The observed value of T is TObs = 10. Because TObs = 10 < np = 26×0.50 = 13, the p-value = 2Pr(T ≤ 10 | p = 0.50) = 2B(10; 26, 0.50) = 2×Σ (x = 0 to 10) 26Cx (0.50)^26 = 2×0.1634698 = 0.326939583 > 0.05 → Insufficient evidence to reject H0, i.e., the correlation between X & Y is not significant. Minitab provides an answer of p = 0.3269. The
sign test, when used to test for trend, is generally referred to as the Cox-Stuart test for trend (see pp. 170-176 of W. J. Conover). The most common application of the sign test occurs in the before-after design, such as testing for the effectiveness of a certain diet to reduce weight (see Exercise 1 on p. 164 of Conover, where n = 6). The null hypothesis is H0: μX = μY versus the alternative H1: μY < μX. Letting T represent the number of bivariate pairs where Y < X, then TObs = 5. Thus, the p-value = Pr(T ≥ 5 | p = 0.50) = 1 − B(4; 6, 0.50) = 1 − 0.8906250 = 0.1093750 > 0.05 → Insufficient evidence to reject H0. If TObs had been equal to 6, then the p-value = (0.5)^6 = 0.0156250 < 0.05, and we would have sufficient evidence to reject H0 and conclude that the diet had been effective.
1.10 The McNemar Test for Significant Changes from Before to After
This test is a variation of the sign test where data are taken in the form of a 2×2 contingency table, and the cells (0, 0) and (1, 1) indicate no changes from before to after treatment, while the cells (0, 1) and (1, 0) show the number of changes in preferences from before to after. For the sake of illustration, suppose a social scientist is interested in determining whether a certain treatment (such as a film, video, etc.) about juvenile delinquency would change a community's opinion about how severely delinquents should be punished. Accordingly, a random sample of n = 120 adults is selected from the community and he conducts a before-and-after study, where each subject in the sample acts as his/her own control. Thus, the null hypothesis is that the treatment has no effect versus the alternative that the film does impact the community's opinion toward the level of severity.
Out of the 120 individuals in the sample, 80 favored more
punishment for juvenile delinquency and 40 favored less, while
after watching the film only 14 of the 80 favored more punishment
and 31 out of the 40 who favored less before changed to more
punishment after viewing the film. Table 3 depicts the survey’s
result, which
clearly shows that there were 14 (1→1) and 9 (0 → 0) adults with
no opinion
changes, i.e., there were 14+9 = 23 ties, which will be kept out
of the analysis
altogether. However, out of the remaining 97 adults there were
66 (1→0) and 31
(0→1) changes. I will first compute the exact p-value of the
test, and then will
Table 3. McNemar’s Test for a Before-After Study

                                            After
Before                         Favored more (1)       Favored less (0)      Total
Favored more punishment (1)    n11 = 14 (ties)        n10 = 66 (= TObs)       80
                                                      E12 = 0.5×97 = 48.5
Favored less punishment (0)    n01 = 31               n00 = 9 (ties)          40
                               E21 = 0.5×97 = 48.5
Total After                    45                     75                     120
use the normal approximation to the binomial to compute the
p-value. We
basically have n = 66 + 31 = 97 Bernoulli trials where the event
(1→0) has
occurred 66 times and we must ascertain whether such an
occurrence is strictly due to chance (random) variation, or there
is an underlying shift in opinion.
We may state the null hypothesis H0: Pr(1→0) = Pr(0→1) = p =
0.50 versus the 2-
sided alternative H1: p = Pr(1→0) ≠ 0.50. Using the binomial pmf, the precise p-value of the test is given by p-value = 2×Pr(T ≥ 66 | p = 0.50) = 2×[1 − Pr(T ≤ 65 | p = 0.50)] = 2×[1 − B(65; 97, 0.50)] = 2×[1 − Σ_{x=0}^{65} C(97, x)(0.5)^97] = 2×(1 − 0.99975492102) =
0.00049015795 < 0.01 → Strongly reject H0 and conclude that the treatment had a
the treatment had a
significant impact in changing the community’s opinion toward
punishment severity for juvenile delinquents. We now compute the
p-value of the above test using the normal approximation to the
binomial. Because under H0 the binomial mean is np =
97×0.50 = 48.5 and variance is npq = 97×0.25 = 24.25 → σT =
4.92443, then p-value
= 2×Pr(T ≥ 66 | p = 0.50) = 2×Pr[(T − μT)/σT ≥ (66 − 48.5)/4.92443] ≅ 2×Pr(Z_N(0,1) ≥ 3.553711578) =
2×Φ( −3.553711578) = 2×0.0001899177645 = 0.000379835529, which
is not very close to the exact p-value = 0.00049015795 due to the
fact that the binomial pmf has a discrete range space and the
measurement for the normal distribution
must be at least in the interval scale. Therefore, we need to
correct TObs = 66 for continuity; because on a continuous scale,
the integer 66 is represented by the
interval (65.5, 66.5), then p-value = 2×Pr(T ≥ 66 | p = 0.50) = 2×Pr[(T − μT)/σT ≥ (65.5 − 48.5)/4.92443] ≅ 2×Pr(Z_N(0,1) ≥ 3.452177) = 2×Φ(−3.452177) = 2×0.00027804145 =
0.000556082894. This approximation is now acceptable because the
stated LOS is actually larger than the exact p-value = 0.000490158
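These three p-values (exact binomial, normal approximation without and with the continuity correction) can be reproduced with a stdlib-only Python sketch; the function names are my own:

```python
from math import comb, erf, sqrt

def binom_cdf(k, n, p=0.5):
    """B(k; n, p) = Pr(T <= k) for T ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n10, n01 = 66, 31                 # (1->0) and (0->1) changes from Table 3
n = n10 + n01                     # 97 untied pairs
exact = 2 * (1 - binom_cdf(n10 - 1, n))          # 2*Pr(T >= 66 | p = 0.5)
mu, sigma = n / 2, sqrt(n) / 2                   # 48.5 and 4.92443
approx     = 2 * (1 - phi((n10 - mu) / sigma))        # w/o continuity correction
approx_cfc = 2 * (1 - phi((n10 - 0.5 - mu) / sigma))  # with continuity correction
print(exact, approx, approx_cfc)  # text reports 0.00049015795, 0.000379835529, 0.000556082894
```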
and hence is conservative. From statistical theory, it is well
known that if Z ~ N(0, 1), then Z2 has a chi-
square distribution with 1 df, i.e., Z2 ~ 12χ . W/O correction
for continuity (cfc), p-
value ≅ 2×Pr(Z ≥ 3.553711578) = Pr[ 12χ ≥ (3.553711578)2] = Pr(
1
2χ ≥ 12.62886598)
= 0.00037983553 as before. It turn out that the value of 12χ for
Table 3 W/O cfc is
simply 12χ = +
2 2(66 - 48.5) (31- 48.5)48.5 48.5
= 2×2(66 - 48.5)
48.5 =
210 01
10 01
(n -n )n +n
=
2(66 - 31)97
= 12.62886598. While with the cfc, the above chi-square changes
to
12χ = 2×
2(66 - 48.5 - 0.50)48.5
= 2
10 01
10 01
(n -n -1)n +n
= 342
97= 11.9175257732, which
results in the p-value ≅ Pr( 12χ ≥ 11.9175257732) = 0.000556083,
again as before.
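The two shortcut chi-square forms of McNemar's statistic are a one-liner each; a quick numerical check:

```python
# McNemar chi-square statistics for Table 3 (n10 = 66, n01 = 31).
n10, n01 = 66, 31

chi2     = (n10 - n01) ** 2 / (n10 + n01)            # w/o continuity correction
chi2_cfc = (abs(n10 - n01) - 1) ** 2 / (n10 + n01)   # with continuity correction
print(chi2, chi2_cfc)   # 35^2/97 = 12.62886598...,  34^2/97 = 11.9175257732...
```

Note that chi2 equals Z² = (3.553711578)², confirming the equivalence of the normal and chi-square forms of the test.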
1.11 Before-and-After Designs for Number of Injuries
As an
example, see pages 102-104 of L. S. Robson et al. (Guide to
Evaluating the Effectiveness of Strategies for Preventing Work
Injuries) where there were 28 injuries during 40000 employee hours;
after a safety program intervention (such as the Implementation of
a behavioral based safety program, Lifting
training/education, Establishment of safety committee, Guarding
evaluation and upgrade, etc; these examples were provided by Dr.
Jerry Davis) there were 22 injuries during 60000 hours. Since the
unit of employee-hours-worked is 100,000
hours = 1 unit, we first convert the two observed numbers of injuries to the proper unit. Let IR_B represent the (average) injury rate before intervention; then
IR̂_B = 28/(40000 hours) = (28/40000 hours)×(100000 hours/one unit) = 70 per unit. Similarly, the average IR after intervention is given by
IR̂_A = (22/60000 hours)×(100000 hours/one unit) = 36.66667 per unit. We first examine the difference in IRs followed by their ratio.
(1) The point estimate of the rate difference is given by R̂_D = IR̂_B − IR̂_A = 70 − 36.66667 = 33.3333. Note that the sample sizes for these last two point estimates are nB = 0.40 unit and nA = 0.60 unit. The question now is “has the intervention been effective, i.e., is the sample rate difference of 33.3333
significantly larger than zero?” Although the authors do not
highlight this question, from a statistical standpoint this leads
to a one-sided test of hypothesis H0 : IRB = IRA ↔ H0 : RD = 0
versus the alternative H1: IRA < IRB ↔ H1
: RD > 0, where the alternative states that injury rate has
been reduced due to the intervention. Because occurrence of
injuries during 100000 hours is a rare event, I surmise that the
number of injuries per unit is Poisson distributed and
hence both its mean and variance are equal to the Poisson
parameter μ, where
the pmf is given by
Pr(x) = μ^x e^(−μ)/x!,   x = 0, 1, 2, 3, …
and the rv X represents the number of injuries in one unit time
interval (i.e., 100000 hours). From statistical theory, the
limiting distribution of Poisson is the
Gaussian N(μ, μ); further, a linear combination of any two
normal deviates, such
as X1 − X2, is also normally distributed with E(X1 − X2) = μ1 −
μ2 and variance V(X1
− X2) = σ1² + σ2² − 2σ12, where σ12 = Cov(X1, X2) = E[(X1 −
μ1)×(X2 − μ2)]. Because
the two IRs before and after are independent, then Cov(IR̂_B, IR̂_A) = 0 and
hence the point estimate of V(R̂_D) = V(IR̂_B − IR̂_A) = V(IR̂_B) + V(IR̂_A) = μ_B/n_B + μ_A/n_A
is given by v(IR̂_B − IR̂_A) = v(IR̂_B) + v(IR̂_A) ≅ IR̂_B/n_B + IR̂_A/n_A = 70/0.40 + 36.66667/0.60 = 236.1111 → se(IR̂_B − IR̂_A) = se(R̂_D) = √236.11111 = 15.36591 per 100,000
hours-worked. Now that we have estimated the standard error (se)
of
R̂_D = IR̂_B − IR̂_A, we can assess its significant difference from zero by standardizing it as
Z0 = [(IR̂_B − IR̂_A) − 0]/se(IR̂_B − IR̂_A) = 33.3333/15.36591 = 2.1693046,
where Z0 denotes the test statistic under H0: RD = IRB − IRA = 0 → p-value ≅ Pr(Z_N(0,1) ≥
2.1693046) = 0.01503 < 0.05 → Reject H0 and conclude that the
safety program
intervention has been effective at the 5% level.
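The rate-difference Z test above can be reproduced numerically; a stdlib-only sketch with variable names of my own choosing:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

nB, nA = 0.40, 0.60              # exposures in units of 100,000 employee-hours
irB, irA = 28 / nB, 22 / nA      # 70 and 36.66667 injuries per unit
rd = irB - irA                   # 33.3333
se = sqrt(irB / nB + irA / nA)   # sqrt(70/0.40 + 36.66667/0.60) = sqrt(236.1111)
z0 = rd / se                     # 2.1693046
p_value = 1 - phi(z0)            # one-sided; text reports 0.01503
print(z0, p_value)
```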
We can also conduct the above test of hypothesis using χ² with 1 df by
computing the expected frequencies under H0: RD = IRB − IRA = 0.
Under H0: IRB =
IRA out of the total of 28 + 22 = 50 injuries, the expected
frequency for “Before” is
given by EB = 50×(40000/100000) = 20 and thus EA = 50−20 = 30.
As a
consequence, the chi-square statistic is given by
χ²0 = Σ_{i=1}^{2} (n_i − E_i)²/E_i = (28 − 20)²/20 + (22 − 30)²/30 = 5.3333 → p-value = 0.50×Pr[χ²(1) ≥ 5.3333] = 0.50×0.02092133534 = 0.01046067, which is smaller (less conservative) than the p-value
from the normal approximation to the Poisson. The reason behind the
incongruence of the two p-values is the fact that both are
approximations, one of which (the normal) has not been corrected
for continuity and is more conservative than the other. If both
were exact test statistics, then for certain Z0² = (2.1693046)² = 4.70588244 would
equal χ²0 = Σ_{i=1}^{2} (n_i − E_i)²/E_i = 5.3333. When in doubt, it is statistically prudent
to
state the p-value that is more conservative, i.e., the p-value
in this case should
be stated as 0.01503. Because I conducted a right-tailed test
(H1: IRA < IRB ↔
H1: 0 < IRB −IRA ↔ H1: RD > 0), then we need to develop a
lower one-sided 95 % CI
for the parameter RD. The approximate lower CL is (RD)_L = R̂_D − z0.05×se(R̂_D) =
33.33333 − 1.64485×15.36591 = 8.0586648 → 8.0586648 ≤ RD < ∞, which excludes
which excludes
zero consistent with rejecting the null hypothesis H0: RD = 0.
It is paramount to
note that the 99% lower CL = R̂_D − z0.01×se(R̂_D) = 33.33333 − 2.32635×15.36591 =
−2.413113 gives the CI = (−2.413113, ∞) that includes zero
because we could not
reject H0: RD = 0 at the 1% level (p-value ≅ 0.01503 >
0.01).
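Both lower confidence limits follow from the same two numbers; a quick check (z quantiles 1.64485 and 2.32635 are hard-coded from the text):

```python
# Lower one-sided CLs for the rate difference RD = IR_B - IR_A.
rd, se = 33.33333, 15.36591      # point estimate and standard error from above

cl95 = rd - 1.64485 * se         # ~ 8.059  -> (8.059, inf) excludes 0: reject H0 at 5%
cl99 = rd - 2.32635 * se         # ~ -2.413 -> (-2.413, inf) includes 0: cannot reject at 1%
print(cl95, cl99)
```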
We can also measure the effective strength of the safety
intervention by
computing the rate ratio R̂_R = IR̂_A/IR̂_B = 36.666667/70 = 0.52381, implying that the IR has diminished by as much as 47.619%. However, the statistical properties of a rate ratio R̂_R are not at all as nice as those of a rate difference R̂_D. There are two major reasons: (1) If the rate before, IR̂_B, is zero, albeit rarely, clearly the corresponding R̂_R cannot be computed.
(2) Even if both IR̂_B and IR̂_A were normally distributed, their ratio R̂_R is never normally distributed. Even worse is the fact that the approach to normality of the SMD of R̂_R is agonizingly slow (in general, at least n > 50 is required before the normal approximation is fairly adequate). Nearly always a ratio estimator is biased and its SMD is positively skewed (i.e., α3 = E[((X − μ)/σ)³] > 0). We can alleviate the approach to normality somewhat by taking the natural logarithm of R̂_R, which is a variance-reduction transformation, plus the fact that ln(R̂_R) approaches normality more rapidly than R̂_R itself. For convenience, let Y = ln(R̂_R); then from the statistical literature, for large n, Y is approximately normally distributed with approximate mean ln(RR) and an approximate variance that I will compute below.
V(Y) = V[ln(R̂_R)] = V[ln(R̂_A/R̂_B)] = V[ln(R̂_A) − ln(R̂_B)] = V[ln(R̂_A)] + V[ln(R̂_B)].   (5)
It is impossible to compute an exact variance for ln(R̂_A) even if the exact variance of R̂_A were known, unless R̂_A is lognormally distributed. Thus, what follows is a rough approximation. We first write the Taylor expansion of ln(R̂_A) about the mean of R̂_A, denoted by μA = E(R̂_A):
ln(R̂_A) = ln(μA) + (R̂_A − μA)f′(μA) + (R̂_A − μA)²f″(μA)/2! + …,
where f′(μA) = [d ln(R̂_A)/dR̂_A]|μA, f″(μA) = [d² ln(R̂_A)/dR̂_A²]|μA, etc. Because f′(μA) = 1/μA and f″(μA) = −1/μA², the above Taylor expansion reduces to
ln(R̂_A) = ln(μA) + (R̂_A − μA)/μA − (R̂_A − μA)²/(2μA²) + …   (6)
We now apply the variance operator V to both sides of Eq. (6), maintaining only the first two terms on the RHS of (6):
V[ln(R̂_A)] ≅ V[ln(μA) + (R̂_A − μA)/μA] = 0 + V[(R̂_A − μA)/μA] = (1/μA²)V(R̂_A) = (μA/nA)/μA² →
V[ln(R̂_A)] ≅ 1/(nA μA) = 1/(0.60×36.6666) = 1/22; similarly, V[ln(R̂_B)] ≅ 1/28.
Inserting these into Eq. (5) yields V[ln(R̂_R)] = V[ln(R̂_A)] + V[ln(R̂_B)] ≅ 1/22 + 1/28 = 0.08116883 → se[ln(R̂_R)] = 0.28490144. We could now proceed to obtain a 2-
to obtain a 2-
sided 95% CI for RR = RA/RB as was done on page 104 of L. S.
Robson et al., but we deviate from their procedure because RR =
RA/RB should always be minimized (in the ideal case RA = 0 in which
case the intervention program is 100% effective). Therefore, it is
more informative to obtain an upper one-sided 95% CI
for RR. To this end, ln(R̂_R) + 1.64485×se[ln(R̂_R)] = ln(0.52381) + 1.64485×0.284901 = −0.178006 → (RR)_U = e^(−0.178006) = 0.8369374051 → 0 < RR ≤ 0.8369374051. This
95% CI excludes the value RR = 1 specified by H0: RR = RA/RB ≥ 1; hence we can reject H0: RR = 1 at the 5% level in favor of H1: RA/RB < 1.
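The whole log-rate-ratio calculation, from the delta-method variance of Eq. (5) through the upper one-sided 95% CL, fits in a few lines; a sketch following the section's numbers:

```python
from math import exp, log, sqrt

nB, nA = 0.40, 0.60              # exposures in units of 100,000 employee-hours
irB, irA = 28 / nB, 22 / nA      # 70 and 36.66667 injuries per unit

rr = irA / irB                               # rate ratio, 0.52381
var_log_rr = 1 / (nA * irA) + 1 / (nB * irB) # delta-method: 1/22 + 1/28 = 0.08116883
se_log_rr = sqrt(var_log_rr)                 # 0.28490144
upper = exp(log(rr) + 1.64485 * se_log_rr)   # upper 95% CL, ~0.836937
print(rr, upper)                             # CI (0, 0.8369] excludes RR = 1
```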