
Dr. Neal, WKU MATH 482: The F–Distributions (people.wku.edu/david.neal/482/twovariables/Fdist.pdf)

Apr 18, 2020


The F–Distributions

The F–distributions are used in statistics to analyze the ratio $\sigma_1/\sigma_2$ of the standard deviations of two independent normally distributed measurements, as well as for one-way and two-way ANOVA (analysis of variance). An F–distribution is based on two parameters called the degrees of freedom $n_1$ and $n_2$, which are both integers greater than or equal to 1. Such a random variable is denoted by $X \sim F(n_1, n_2)$. The $F(n_1, n_2)$ distribution is formally defined by the ratio

$$F(n_1, n_2) = \frac{\chi^2(n_1)/n_1}{\chi^2(n_2)/n_2},$$

where the $\chi^2(n_1)$ and $\chi^2(n_2)$ distributions are assumed to be independent. Here $n_1$ is the degrees of freedom of the numerator and $n_2$ is the degrees of freedom of the denominator. From this definition, it is clear that if $X \sim F(n_1, n_2)$, then $1/X \sim F(n_2, n_1)$. For $X \sim F(n_1, n_2)$, the probability density function is given by

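$$f(x) = \left(\frac{n_1}{n_2}\right)^{n_1/2} \frac{\Gamma\left(\frac{n_1+n_2}{2}\right)}{\Gamma\left(\frac{n_1}{2}\right)\Gamma\left(\frac{n_2}{2}\right)} \times \frac{x^{(n_1/2)-1}}{\left(1+\frac{n_1}{n_2}x\right)^{(n_1+n_2)/2}},$$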

for $x \ge 0$. As with chi-square distributions, F-distribution pdfs, for $n_1 \ge 3$, are non-symmetric, skewed “bell-shaped” curves. These graphs attain a maximum value at

$$x = \frac{n_1 n_2 - 2n_2}{n_1 n_2 + 2n_1}.$$

For $X \sim F(n_1, n_2)$ and $0 \le a \le b$, we can compute $P(a \le X \le b)$ with the built-in TI command Fcdf(a, b, n1, n2).
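For readers without a TI calculator, the same interval probability can be sketched in Python. This is only an illustration and assumes SciPy is available; the helper names `f_interval_prob` and `f_mode` and the degrees of freedom 5 and 10 are chosen here for the example and are not part of the notes.

```python
from scipy.stats import f

def f_interval_prob(a, b, n1, n2):
    """P(a <= X <= b) for X ~ F(n1, n2), analogous to Fcdf(a, b, n1, n2)."""
    return f.cdf(b, n1, n2) - f.cdf(a, n1, n2)

def f_mode(n1, n2):
    """Location of the pdf maximum; assumes n1 > 2 so the numerator is positive."""
    return (n1 * n2 - 2 * n2) / (n1 * n2 + 2 * n1)

n1, n2 = 5, 10  # illustrative degrees of freedom (an assumption, not from the notes)
print(f_interval_prob(0.5, 2.0, n1, n2))

# The density at the claimed mode should be no smaller than at nearby points.
m = f_mode(n1, n2)
print(m, f.pdf(m, n1, n2) >= max(f.pdf(m - 0.01, n1, n2), f.pdf(m + 0.01, n1, n2)))
```

Here `f.cdf(x, n1, n2)` plays the role of Fcdf(0, x, n1, n2).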

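Mean and Variance

For $X \sim F(n_1, n_2)$, we can re-write $X$ as $X = \frac{n_2}{n_1} \times U \times V^{-1}$, where $U \sim \chi^2(n_1)$ and $V \sim \chi^2(n_2) = \Gamma[n_2/2,\ 1/2]$ with $U$ and $V$ being independent. We recall now that

$$E[\chi^2(n)] = n \quad \text{and} \quad E\left[(\Gamma[\alpha, \beta])^r\right] = \frac{\Gamma(\alpha + r)}{\Gamma(\alpha)\,\beta^r} \quad \text{for } r > -\alpha.$$

Thus, for $n_2/2 > 1$ we have

$$E[V^{-1}] = \frac{\Gamma(n_2/2 - 1)}{\Gamma(n_2/2)\,(1/2)^{-1}} = \frac{\Gamma(n_2/2 - 1)}{(n_2/2 - 1)\,\Gamma(n_2/2 - 1) \times 2} = \frac{1}{n_2 - 2}.$$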

Thus for $n_2 > 2$, we have

$$E[F(n_1, n_2)] = \frac{n_2}{n_1}\, E[U] \times E[V^{-1}] = \frac{n_2}{n_1} \times n_1 \times \frac{1}{n_2 - 2} = \frac{n_2}{n_2 - 2}.$$

We see then that the mean of the $F(n_1, n_2)$ distribution depends only on the degrees of freedom of the denominator.

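Next we have $X^2 = \left(\frac{n_2}{n_1}\right)^2 \times U^2 \times V^{-2}$. Now for $n_2/2 > 2$ we have

$$E[V^{-2}] = \frac{\Gamma(n_2/2 - 2)}{\Gamma(n_2/2)\,(1/2)^{-2}} = \frac{\Gamma(n_2/2 - 2)}{(n_2/2 - 1)\,\Gamma(n_2/2 - 1) \times 4} = \frac{1}{(n_2/2 - 1)(n_2/2 - 2) \times 4} = \frac{1}{(n_2 - 2)(n_2 - 4)}$$

and

$$E[U^2] = \frac{\Gamma(n_1/2 + 2)}{\Gamma(n_1/2)\,(1/2)^2} = \frac{4 \times (n_1/2 + 1) \times \Gamma(n_1/2 + 1)}{\Gamma(n_1/2)} = 4 \times (n_1/2 + 1) \times (n_1/2) = n_1(n_1 + 2).$$

Thus,

$$E[X^2] = \left(\frac{n_2}{n_1}\right)^2 \times E[U^2] \times E[V^{-2}] = \left(\frac{n_2}{n_1}\right)^2 \times \frac{n_1(n_1 + 2)}{(n_2 - 2)(n_2 - 4)}.$$

Hence,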

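$$\begin{aligned}
\mathrm{Var}(X) &= \left(\frac{n_2}{n_1}\right)^2 \frac{n_1(n_1 + 2)}{(n_2 - 2)(n_2 - 4)} - \left(\frac{n_2}{n_2 - 2}\right)^2 \\
&= n_2^2 \left(\frac{n_1 + 2}{n_1(n_2 - 2)(n_2 - 4)} - \frac{1}{(n_2 - 2)^2}\right) \\
&= n_2^2 \left(\frac{(n_1 + 2)(n_2 - 2) - n_1(n_2 - 4)}{n_1(n_2 - 2)^2(n_2 - 4)}\right) \\
&= n_2^2 \left(\frac{2n_1 + 2n_2 - 4}{n_1(n_2 - 2)^2(n_2 - 4)}\right) \\
&= \frac{2n_2^2(n_1 + n_2 - 2)}{n_1(n_2 - 2)^2(n_2 - 4)}, \quad \text{for } n_2 > 4.
\end{aligned}$$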
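As a sanity check on the two closed forms derived in this section, the following sketch compares them with the moments reported by SciPy. It assumes SciPy is installed; the function names `f_mean` and `f_var` and the degrees of freedom 6 and 12 are illustrative choices, not part of the notes.

```python
from scipy.stats import f

def f_mean(n2):
    """Mean of F(n1, n2); requires n2 > 2."""
    return n2 / (n2 - 2)

def f_var(n1, n2):
    """Variance of F(n1, n2); requires n2 > 4."""
    return 2 * n2**2 * (n1 + n2 - 2) / (n1 * (n2 - 2) ** 2 * (n2 - 4))

n1, n2 = 6, 12  # illustrative degrees of freedom (an assumption)
mean_lib, var_lib = f.stats(n1, n2, moments="mv")

print(f_mean(n2), float(mean_lib))
print(f_var(n1, n2), float(var_lib))
```

Both pairs of printed values should agree up to floating-point rounding.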


f-Scores

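[Figure: density curve of $F(n_1, n_2)$ with left and right tail areas $\alpha$, central area $1 - 2\alpha$, and the f-scores $f_{1-\alpha}(n_1, n_2)$ and $f_\alpha(n_1, n_2)$ marked on the horizontal axis.]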

Given an $F(n_1, n_2)$ distribution, we often need to know the two values $a$ and $b$ such that $P(F(n_1, n_2) \le a) = \alpha$ or $P(b \le F(n_1, n_2)) = \alpha$. As with chi-square distributions, these values of $a$ and $b$, called f-scores, depend on the degrees of freedom. As usual, we use the right-tail probability for notation purposes. Thus, the right f-score $b$ is denoted by $b = f_\alpha(n_1, n_2)$, while the left f-score is $a = f_{1-\alpha}(n_1, n_2)$. However, we can re-write the left f-score in terms of a right f-score as follows:

$$\alpha = P(F(n_1, n_2) \le a) = P(1/a \le 1/F(n_1, n_2)) = P(1/a \le F(n_2, n_1)).$$

Thus, $1/a = f_\alpha(n_2, n_1)$ and $a = 1/f_\alpha(n_2, n_1)$. A chart can be used for certain degrees of freedom to find the right f-score values $f_\alpha(n_1, n_2)$ and $f_\alpha(n_2, n_1)$. Then we simply use $1/f_\alpha(n_2, n_1)$ for the left f-score value.
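In place of a chart, the f-scores can also be computed numerically. The sketch below assumes SciPy is available and uses its inverse survival function `f.isf` for right-tail scores; the helper names and the values of alpha and the degrees of freedom are illustrative assumptions.

```python
from scipy.stats import f

def right_f_score(alpha, n1, n2):
    """f_alpha(n1, n2): the value b with P(F(n1, n2) >= b) = alpha."""
    return f.isf(alpha, n1, n2)

def left_f_score(alpha, n1, n2):
    """f_{1-alpha}(n1, n2), computed through the reciprocal 1 / f_alpha(n2, n1)."""
    return 1.0 / right_f_score(alpha, n2, n1)

alpha, n1, n2 = 0.05, 8, 12  # illustrative values (an assumption)
a = left_f_score(alpha, n1, n2)
b = right_f_score(alpha, n1, n2)

print(a, f.cdf(a, n1, n2))  # left-tail area at a should be alpha
print(b, f.sf(b, n1, n2))   # right-tail area at b should be alpha
```

The first printed pair confirms that the reciprocal of a right f-score with the degrees of freedom swapped does give the left f-score.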

Ratio of Variances

Theorem. Let $X_1$ and $X_2$ be independent normally distributed measurements with non-zero standard deviations $\sigma_1$ and $\sigma_2$, respectively. Let $S_1$ and $S_2$ be the sample deviations obtained from independent random samples of $X_1$ and $X_2$ having sizes $n_1$ and $n_2$, respectively. Then

$$\frac{\sigma_2^2}{\sigma_1^2} \times \frac{S_1^2}{S_2^2} \sim F(n_1 - 1, n_2 - 1).$$

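Proof. With the conditions, we have $\frac{(n_1 - 1)S_1^2}{\sigma_1^2} \sim \chi^2(n_1 - 1)$ and $\frac{(n_2 - 1)S_2^2}{\sigma_2^2} \sim \chi^2(n_2 - 1)$. Thus,

$$F(n_1 - 1, n_2 - 1) \sim \frac{\chi^2(n_1 - 1)/(n_1 - 1)}{\chi^2(n_2 - 1)/(n_2 - 1)} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2}{\sigma_1^2} \times \frac{S_1^2}{S_2^2}.$$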
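The theorem can also be checked empirically. The following Monte Carlo sketch assumes NumPy and SciPy are available; the standard deviations, sample sizes, repetition count, and seed are arbitrary illustrative choices. It draws many pairs of normal samples, forms the scaled ratio of sample variances, and compares the results with the F(n1 − 1, n2 − 1) distribution using a Kolmogorov-Smirnov statistic.

```python
import numpy as np
from scipy.stats import f, kstest

def simulate_variance_ratio(sigma1, sigma2, n1, n2, reps, seed=0):
    """Simulate (sigma2^2 / sigma1^2) * (S1^2 / S2^2) for independent normal samples."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(0.0, sigma1, size=(reps, n1))
    x2 = rng.normal(0.0, sigma2, size=(reps, n2))
    s1_sq = x1.var(axis=1, ddof=1)  # sample variances with divisor n1 - 1
    s2_sq = x2.var(axis=1, ddof=1)  # sample variances with divisor n2 - 1
    return (sigma2**2 / sigma1**2) * (s1_sq / s2_sq)

# Illustrative parameters (assumptions, not from the notes).
ratios = simulate_variance_ratio(sigma1=3.0, sigma2=5.0, n1=10, n2=15, reps=20000)

# A small KS statistic means the simulated ratios track the F(9, 14) cdf closely.
result = kstest(ratios, f(9, 14).cdf)
print(result.statistic)
```

A simulation of this kind does not replace the proof, but it is a quick way to catch a mistake such as swapping the degrees of freedom.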

Application I: When random sampling with normally distributed measurements, we can evaluate $P(a \le S_1/S_2 \le b)$ for $0 \le a \le b$, provided $\sigma_1$ and $\sigma_2$ are known, by

$$\begin{aligned}
P\left(a \le \frac{S_1}{S_2} \le b\right) &= P\left(a^2 \le \frac{S_1^2}{S_2^2} \le b^2\right) = P\left(\frac{\sigma_2^2 a^2}{\sigma_1^2} \le \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2} \le \frac{\sigma_2^2 b^2}{\sigma_1^2}\right) \\
&= P\left(\frac{\sigma_2^2 a^2}{\sigma_1^2} \le F(n_1 - 1, n_2 - 1) \le \frac{\sigma_2^2 b^2}{\sigma_1^2}\right).
\end{aligned}$$

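Application II: When random sampling with normally distributed measurements, a $(1 - \alpha) \times 100\%$ confidence interval for the ratio of variances $\sigma_1^2/\sigma_2^2$ is given by

$$\frac{S_1^2}{S_2^2} \times \frac{1}{f_{\alpha/2}(n_1 - 1, n_2 - 1)} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{S_1^2}{S_2^2} \times f_{\alpha/2}(n_2 - 1, n_1 - 1).$$

Taking square roots throughout, a confidence interval for $\sigma_1/\sigma_2$ is then given by

$$\frac{S_1}{S_2} \times \frac{1}{\sqrt{f_{\alpha/2}(n_1 - 1, n_2 - 1)}} \le \frac{\sigma_1}{\sigma_2} \le \frac{S_1}{S_2} \times \sqrt{f_{\alpha/2}(n_2 - 1, n_1 - 1)}.$$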

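To derive this confidence interval, we solve for the endpoints $a$ and $b$ such that $P\left(a \le \sigma_1^2/\sigma_2^2 \le b\right) = 1 - \alpha$. To solve for $a$, we have

$$\frac{\alpha}{2} = P\left(\frac{\sigma_1^2}{\sigma_2^2} \le a\right) = P\left(\frac{S_2^2 \sigma_1^2}{S_1^2 \sigma_2^2} \le \frac{S_2^2\, a}{S_1^2}\right) = P\left(\frac{S_1^2}{S_2^2\, a} \le \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}\right) = P\left(\frac{S_1^2}{S_2^2\, a} \le F(n_1 - 1, n_2 - 1)\right).$$

So the value $\frac{S_1^2}{S_2^2\, a}$ must equal the right-tail f-score $f_{\alpha/2}(n_1 - 1, n_2 - 1)$. We then have

$$a = \frac{S_1^2}{S_2^2} \times \frac{1}{f_{\alpha/2}(n_1 - 1, n_2 - 1)}.$$

Likewise we have

$$\frac{\alpha}{2} = P\left(b \le \frac{\sigma_1^2}{\sigma_2^2}\right) = P\left(\frac{S_2^2\, b}{S_1^2} \le \frac{S_2^2 \sigma_1^2}{S_1^2 \sigma_2^2}\right) = P\left(\frac{S_2^2\, b}{S_1^2} \le F(n_2 - 1, n_1 - 1)\right).$$

So the value $\frac{S_2^2\, b}{S_1^2}$ must equal the right-tail f-score $f_{\alpha/2}(n_2 - 1, n_1 - 1)$. We then have

$$b = \frac{S_1^2}{S_2^2} \times f_{\alpha/2}(n_2 - 1, n_1 - 1).$$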

Application III: To test the null hypothesis $H_0: \sigma_1/\sigma_2 = R$ for normally distributed measurements, we obtain the sample deviations $S_1$ and $S_2$ from random samples of sizes $n_1$ and $n_2$. We use the alternative $H_a: \sigma_1/\sigma_2 < R$ if $S_1/S_2 < R$, and use the alternative $H_a: \sigma_1/\sigma_2 > R$ if $S_1/S_2 > R$. The test statistic is then

$$x = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2} = \frac{1}{R^2} \times \frac{S_1^2}{S_2^2},$$

which is compared with the $F(n_1 - 1, n_2 - 1)$ distribution. We compute the (left-tail) P-value $P\left(F(n_1 - 1, n_2 - 1) \le x\right)$ for the alternative $H_a: \sigma_1/\sigma_2 < R$, and compute the (right-tail) P-value $P\left(F(n_1 - 1, n_2 - 1) \ge x\right)$ for the alternative $H_a: \sigma_1/\sigma_2 > R$.

Example 1. Female Verbal GRE scores have a standard deviation of $\sigma_1 = 80$, while male Math GRE scores have a standard deviation of $\sigma_2 = 75$. With random samples of 30 females and 25 males, compute $P(1 \le S_1/S_2 \le 1.25)$.

Solution. With $n_1 = 30$ and $n_2 = 25$, we convert to an $F(29, 24)$ distribution:

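$$\begin{aligned}
P\left(1 \le \frac{S_1}{S_2} \le 1.25\right) &= P\left(\frac{75^2 \times 1^2}{80^2} \le \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2} \le \frac{75^2 \times 1.25^2}{80^2}\right) \\
&= P\left(\frac{225}{256} \le F(29, 24) \le \frac{5625}{4096}\right) \approx 0.4177.
\end{aligned}$$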
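The arithmetic in Example 1 can be reproduced with a short script. This is a sketch that assumes SciPy is available and uses `f.cdf` in place of the TI Fcdf command; the variable names are chosen here for illustration.

```python
from scipy.stats import f

sigma1, sigma2 = 80.0, 75.0   # known standard deviations from Example 1
n1, n2 = 30, 25               # sample sizes from Example 1
a, b = 1.0, 1.25              # bounds on S1 / S2

# Rescale the bounds so the middle quantity follows F(n1 - 1, n2 - 1).
lower = (sigma2 * a / sigma1) ** 2   # 225/256
upper = (sigma2 * b / sigma1) ** 2   # 5625/4096

prob = f.cdf(upper, n1 - 1, n2 - 1) - f.cdf(lower, n1 - 1, n2 - 1)
print(lower, upper, prob)
```

The printed probability should be close to the value 0.4177 quoted in the solution.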

Example 2. The standard deviation $\sigma_3$ of female Math GRE scores is unknown, as is the standard deviation $\sigma_4$ of male Verbal GRE scores. (a) A random sample of 21 female Math GRE scores gave a sample deviation of $S_3 = 70.6$, while a random sample of 16 male Verbal GRE scores gave a sample deviation of $S_4 = 90.2$. Find a 98% confidence interval for the ratio $\sigma_3/\sigma_4$. (b) Test the hypothesis that $\sigma_3$ is half of $\sigma_4$. (c) Test the hypothesis that $\sigma_4$ is five-fourths of $\sigma_3$.

Solution. (a) We use the formula

$$\frac{S_3}{S_4} \times \frac{1}{\sqrt{f_{\alpha/2}(n_3 - 1, n_4 - 1)}} \le \frac{\sigma_3}{\sigma_4} \le \frac{S_3}{S_4} \times \sqrt{f_{\alpha/2}(n_4 - 1, n_3 - 1)},$$

which gives

$$\frac{70.6}{90.2} \times \frac{1}{\sqrt{f_{0.01}(20, 15)}} \le \frac{\sigma_3}{\sigma_4} \le \frac{70.6}{90.2} \times \sqrt{f_{0.01}(15, 20)}.$$

Using the f-score chart, we find $f_{0.01}(20, 15) \approx 3.37$ and $f_{0.01}(15, 20) \approx 3.09$. Thus we obtain $0.4263 \le \sigma_3/\sigma_4 \le 1.3759$.
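The interval in part (a) can be recomputed without a chart. The sketch below assumes SciPy is available; `sd_ratio_ci` is a hypothetical helper name introduced for this illustration. The square roots are taken because the interval is for the ratio of standard deviations rather than the ratio of variances, which is what reproduces the endpoints 0.4263 and 1.3759 quoted above.

```python
from math import sqrt
from scipy.stats import f

def sd_ratio_ci(s1, s2, n1, n2, conf_level):
    """Confidence interval for sigma1 / sigma2 from sample deviations s1, s2."""
    alpha = 1.0 - conf_level
    ratio = s1 / s2
    # f.isf gives right-tail f-scores: P(F >= f.isf(q, ...)) = q.
    lower = ratio / sqrt(f.isf(alpha / 2, n1 - 1, n2 - 1))
    upper = ratio * sqrt(f.isf(alpha / 2, n2 - 1, n1 - 1))
    return lower, upper

# f-scores used in Example 2(a); the chart values are 3.37 and 3.09.
print(f.isf(0.01, 20, 15), f.isf(0.01, 15, 20))

# 98% interval for sigma3 / sigma4 with S3 = 70.6 (n = 21) and S4 = 90.2 (n = 16).
print(sd_ratio_ci(70.6, 90.2, 21, 16, 0.98))
```

Because the chart values are rounded to two decimals, the computed endpoints may differ from the quoted ones in the third or fourth decimal place.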

(b) To test $\sigma_3 = \frac{1}{2}\sigma_4$, we use the null hypothesis $H_0: \sigma_3/\sigma_4 = 1/2$. Because $S_3/S_4 = 70.6/90.2 \approx 0.7827 > 1/2$, we use the alternative $H_a: \sigma_3/\sigma_4 > 1/2$. The test statistic is then

$$x = \frac{\sigma_4^2 S_3^2}{\sigma_3^2 S_4^2} = 4 \times \frac{70.6^2}{90.2^2} \approx 2.4505,$$

which is compared with the $F(n_3 - 1, n_4 - 1) = F(20, 15)$ distribution. We now compute the right-tail P-value by $P(F(20, 15) \ge 2.4505)$ = Fcdf(2.4505, 1E99, 20, 15) ≈ 0.0408.

If $\sigma_3/\sigma_4 = 1/2$ were true, then there would be only a 4.08% chance of obtaining a sample ratio of $S_3/S_4 = 0.7827$ or higher with samples of sizes 21 and 16. We have enough evidence to reject $\sigma_3 = \frac{1}{2}\sigma_4$ in favor of $\sigma_3 > \frac{1}{2}\sigma_4$.

(c) We now test if $\sigma_4 = \frac{5}{4}\sigma_3$ using $H_0: \sigma_3/\sigma_4 = 4/5$. We now use $H_a: \sigma_3/\sigma_4 < 4/5$ because $S_3/S_4 = 0.7827 < 4/5$. The test statistic is

$$x = \frac{\sigma_4^2 S_3^2}{\sigma_3^2 S_4^2} = \left(\frac{5}{4}\right)^2 \times \frac{70.6^2}{90.2^2} \approx 0.95723.$$

The (left-tail) P-value is $P(F(20, 15) \le 0.95723)$ = Fcdf(0, .95723, 20, 15) ≈ 0.4552.

If $\sigma_3/\sigma_4 = 4/5$ were true, then there would be a 45.52% chance of obtaining a sample ratio of $S_3/S_4 = 0.7827$ or lower with samples of sizes 21 and 16. We do not have any evidence to reject $\sigma_4 = \frac{5}{4}\sigma_3$.
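Parts (b) and (c) follow the same recipe, so they can be checked with one small function. This is a sketch assuming SciPy is available; `sd_ratio_test` is a hypothetical helper name, and the rule for choosing the tail mirrors the one used in the solutions above (left tail when the sample ratio is below R, right tail otherwise).

```python
from scipy.stats import f

def sd_ratio_test(s1, s2, n1, n2, R):
    """Test H0: sigma1 / sigma2 = R; returns (statistic, alternative, P-value)."""
    x = (s1 / s2) ** 2 / R**2           # test statistic (1 / R^2) * S1^2 / S2^2
    if s1 / s2 < R:
        # left-tail P-value, like Fcdf(0, x, n1 - 1, n2 - 1)
        return x, "less", f.cdf(x, n1 - 1, n2 - 1)
    # right-tail P-value, like Fcdf(x, 1E99, n1 - 1, n2 - 1)
    return x, "greater", f.sf(x, n1 - 1, n2 - 1)

# Part (b): H0 is sigma3 / sigma4 = 1/2.
print(sd_ratio_test(70.6, 90.2, 21, 16, 0.5))

# Part (c): H0 is sigma3 / sigma4 = 4/5.
print(sd_ratio_test(70.6, 90.2, 21, 16, 0.8))
```

The printed statistics and P-values should be close to 2.4505 and 0.0408 for part (b), and 0.95723 and 0.4552 for part (c).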


Exercises

Assume all measurements are normally distributed.

1. Adult male heights have a standard deviation of $\sigma_1 = 3.5$ inches, while adult female heights have a standard deviation of $\sigma_2 = 3$ inches. Suppose 13 random male heights and 16 random female heights are obtained and the sample deviations $S_1$ and $S_2$ are computed. Compute $P\left(0.9 \le \frac{S_1}{S_2} \le 1.2\right)$.

2. The IQ scores of a control group of “average” children are compared with those of a group of “precocious” children to see if there is a difference in standard deviation. Let $\sigma_1$ be the true standard deviation among all “average” children and let $\sigma_2$ be the true standard deviation among all “precocious” children. Suppose 31 average children have a sample deviation of $S_1 = 15.4$ while 25 precocious children have a sample deviation of $S_2 = 9.6$.
(a) Find a 95% confidence interval for the ratio $\sigma_1/\sigma_2$.
(b) Is there significant evidence to reject the hypothesis that $\sigma_1 = \sigma_2$? State appropriate null and alternative hypotheses, compute the test statistic and P-value, then use the P-value to completely explain your conclusion.
(c) Is there significant evidence to reject the hypothesis that $\sigma_2$ is 60% of $\sigma_1$? State appropriate null and alternative hypotheses, compute the test statistic and P-value, then use the P-value to completely explain your conclusion.

3. Let $X = \dfrac{Z}{\sqrt{\chi^2(n)/n}} \sim t(n)$, where $Z \sim N(0, 1)$ and $Z$ is independent of $\chi^2(n)$. Prove that $X^2 \sim F(1, n)$.