" a • THE MOMENTS OF THE SAMPLE MEDIAN by J. T. Chu and Harold Special report to the Office of Naval Research of work at Chapel Hill under Contract NR 042 031, Project N7-onr-284(02)1 for re- search in Probability and Statistics. Institute of Statistics Mimeograph Series No. 116 August1 1954
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
"
a
•
THE MOMENTS OF THE SAMPLE MEDIAN
by
J. T. Chu and Harold Hotelling
Special report to the Office of Naval Research of work at Chapel Hill under Contract NR 042 031, Project N7-onr-284(02), for research in Probability and Statistics.
Institute of Statistics Mimeograph Series No. 116, August 1954
THE MOMENTS OF THE SAMPLE MEDIAN 1, 2
by
John T. Chu and Harold Hotelling
Institute of Statistics, University of North Carolina
1. Summary. It is shown that under certain regularity conditions, the moments
about its mean of the sample median tend, as the sample size increases indefinitely,
to the corresponding ones of the asymptotic distribution (which is normal). A
method of approximation, using the inverse function of the cumulative distribution
function, is obtained for the moments of the sample median of a certain type of
parent distribution. An advantage of this method is that the error can be made as
small as is required. Applications to normal, Laplace, and Cauchy distributions are
discussed. Upper and lower bounds are obtained, by a different method, for the
variance of the sample median of normal and Laplace parent distributions. They are
simple in form, and of practical use if the sample size is not too small.
2. Introduction. Let a population be given with cdf (cumulative distribution
function) F(x) and pdf (probability density function) f(x), and median ξ, which we assume to exist uniquely. Let x̃ denote the sample median of a sample of size 2n + 1. Then the pdf g(x) of x̃ and the pdf h(x) of the asymptotic distribution of x̃ are respectively

(1)    g(x) = C_n [F(x)]^n [1 - F(x)]^n f(x) ,

where C_n = (2n + 1)!/(n! n!), and
1. This paper was presented at the summer meeting of the Institute of Mathematical Statistics at Montreal, P.Q., Canada, on September 11, 1954.
2. Sponsored by the Office of Naval Research under Contract NR 042 031 for research in Probability and Statistics.
(2)    h(x) = (2πσ²)^{-1/2} exp[-(x - ξ)²/(2σ²)] ,

where σ² = 1/[8n f²(ξ)].
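As a numerical check on (1) and (2), the exact density of the sample median can be compared with its normal approximation. The sketch below is a modern illustration, not part of the original paper; it takes a standard normal parent (ξ = 0) and the asymptotic variance 1/[8n f²(ξ)].

```python
import math

def g_exact(x, F, f, n):
    """Exact pdf (1) of the median of a sample of size 2n + 1."""
    C_n = math.factorial(2 * n + 1) / math.factorial(n) ** 2
    return C_n * F(x) ** n * (1 - F(x)) ** n * f(x)

def h_asymptotic(x, xi, f, n):
    """Asymptotic normal pdf (2), variance sigma^2 = 1/(8 n f(xi)^2)."""
    var = 1.0 / (8 * n * f(xi) ** 2)
    return math.exp(-((x - xi) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Standard normal parent: median xi = 0.
F = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
f = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

for x in (0.0, 0.25, 0.5):
    print(x, g_exact(x, F, f, n=10), h_asymptotic(x, 0.0, f, n=10))
```

Even at sample size 21 the two curves are already close near the center.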
A question that follows naturally is: Can the moments of the asymptotic distribution of x̃ be used as approximations to the corresponding moments of x̃, and if not, how can better approximations be found? When the parent distribution is normal, this question has been answered by various authors, e.g., T. Hojo [6], K. Pearson [8, 9] and, more recently, J. H. Cadwell [3]. It has been stated, e.g., in [3], that experiments showed that the distribution of x̃ tends rapidly to normality, but the variance of x̃ (as of quantiles in general) tends only slowly to the variance of the asymptotic distribution. For this reason of slow convergence, approximations were derived for the variance of x̃ when the sample size is small. While different methods were used by different authors, their results agree fairly well with each other. In fact, the problem could be considered as completely solved but for the unknown error committed in using such approximations.
One of us [4] recently proved that the distribution of x̃, for a normal parent distribution, does tend to normality rather "rapidly". In § 6 we shall confirm another experimental result, that the variance of x̃ tends "slowly" to the variance of the asymptotic distribution (actually not "very slowly"). Upper and lower bounds are obtained (§ 6, Theorem 4) for the variance of x̃. A slightly better lower bound is obtained, by a different method, in § 5, formula (49), or § 6, formula (74). It seems that even for sample sizes around 10 to 20, the asymptotic variance is not a bad approximation to the true variance of x̃. It becomes a very good approximation if the sample size is large. However, for large samples, an even better approximation is obtained in § 5, formula (56).
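To put a number on this, one can integrate x² against the exact density (1) for a standard normal parent and compare with the asymptotic variance, which for that parent is π/(4n) by (2). The sketch below is our illustration, not the authors' computation, for sample size 21.

```python
import math

def median_variance_exact(n, steps=4000):
    """mu_2 = integral of x^2 C_n F^n (1-F)^n f(x) dx for a standard normal
    parent, by composite Simpson's rule on [-8, 8] (the median has mean 0)."""
    C_n = math.factorial(2 * n + 1) / math.factorial(n) ** 2
    F = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    f = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    a, b = -8.0, 8.0
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        w = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += w * x * x * C_n * F(x) ** n * (1 - F(x)) ** n * f(x)
    return total * h / 3.0

n = 10                                # sample size 2n + 1 = 21
mu2 = median_variance_exact(n)
mu2_bar = math.pi / (4 * n)           # 1/(8 n f(0)^2) for the standard normal
print(mu2, mu2_bar, mu2 / mu2_bar)
```

For the normal parent the true variance sits within roughly ten percent of the asymptotic value already at this sample size.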
Before further discussion, the following notation will be introduced. If f(x) and g(x) are functions of x, then E_f(g) denotes the expectation of g(x) with respect to f(x), i.e., ∫_{-∞}^{∞} g(x) f(x) dx. We use, where f, g, and h are given by (1) and (2),
(3)    μ₁ = E_g(x) ,    μ̄₁ = E_h(x) ,

and for any integer k ≥ 2,

(4)    μ_k = E_g(x - μ₁)^k ,    μ̄_k = E_h(x - μ̄₁)^k .
It should be pointed out that, although the pdf g(x) of x̃ tends to h(x), the moments μ_k of x̃ in general do not necessarily tend to μ̄_k. In fact μ_k may never exist [2]. Nevertheless, if the parent pdf satisfies certain conditions, then it can be shown that μ_k tends to μ̄_k as the sample size tends to infinity (§ 3, Theorem 1). Therefore, under such circumstances, it is justifiable, at least for large samples, to use μ̄_k as an approximation to μ_k.
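A concrete illustration of the existence question (ours, not the paper's): for a Cauchy parent the integrand of μ₂, namely x² C_n F^n (1 - F)^n f(x), decays like a constant times x^{-n} for large x, so μ₂ diverges for n = 1 and exists for n ≥ 2. The tail scaling is easy to check numerically.

```python
import math

def integrand(x, n):
    """Integrand of mu_2 for a Cauchy parent: x^2 C_n F^n (1-F)^n f(x)."""
    C_n = math.factorial(2 * n + 1) / math.factorial(n) ** 2
    F = 0.5 + math.atan(x) / math.pi
    f = 1.0 / (math.pi * (1.0 + x * x))
    return x * x * C_n * F ** n * (1 - F) ** n * f

# Tail decay ~ x^(-n): the integral diverges (logarithmically) for n = 1
# and converges for n >= 2.
for n in (1, 2):
    ratio = integrand(1000.0, n) / integrand(100.0, n)
    print(n, ratio)   # roughly 10^(-n)
```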
If the parent distribution satisfies certain conditions, a general method is obtained in § 4 for computing μ_k, k = 1, 2, .... The method is based on the Taylor expansion of x(F), the inverse function of F(x). For example, if x(F) - ξ = Σ_{m=1}^{∞} a_m (F - ½)^m converges for 0 < F < 1, a_m = O(2^m m^k) where k ≥ 0, and f(x) is symmetric with respect to x = ξ, then when n > 2k + 3,

(5)    μ₂ ≈ ∫₀¹ S_m² C_n F^n (1 - F)^n dF ,

where C_n is given by (1) and S_m = Σ_{r=1}^{m} a_r (F - ½)^r (§ 4, Theorem 3). The error in such
approximation can be computed, and it tends to 0 as m tends to ∞. If the parent pdf is not symmetric, similar approximations can be obtained (§ 4, formula (26)).
Applications are given to the variances of the sample medians of Laplace and Cauchy parent distributions (§ 4, Examples 1 and 2).
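The series method can be sketched for the Cauchy case, in the spirit of Example 2 (this is our reconstruction, not the authors' computation): here x(F) = tan π(F - ½), whose expansion about F = ½ follows from the Maclaurin series of the tangent, and (5) is evaluated by quadrature and compared with direct integration of x(F)².

```python
import math

# Coefficients of tan u = sum_k T_k u^(2k+1), k = 0..6.
T = [1.0, 1/3, 2/15, 17/315, 62/2835, 1382/155925, 21844/6081075]

def S_m(F):
    """Truncated expansion of x(F) = tan(pi (F - 1/2)) about F = 1/2."""
    t = F - 0.5
    return sum(Tk * (math.pi * t) ** (2 * k + 1) for k, Tk in enumerate(T))

def weighted_integral(func, n, steps=20000):
    """Simpson's rule for integral of func(F)^2 C_n F^n (1-F)^n over (0,1)."""
    C_n = math.factorial(2 * n + 1) / math.factorial(n) ** 2
    eps = 1e-9                      # avoid the poles of tan at F = 0, 1
    h = (1 - 2 * eps) / steps
    total = 0.0
    for i in range(steps + 1):
        F = eps + i * h
        w = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += w * func(F) ** 2 * C_n * F ** n * (1 - F) ** n
    return total * h / 3.0

n = 5
x_exact = lambda F: math.tan(math.pi * (F - 0.5))
mu2_series = weighted_integral(S_m, n)       # formula (5), truncated series
mu2_direct = weighted_integral(x_exact, n)   # direct quadrature of mu_2
print(mu2_series, mu2_direct)
```

Because the weight C_n F^n (1 - F)^n concentrates near F = ½, where the series converges fast, even seven terms reproduce the direct value closely.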
Finally, upper and lower bounds are derived in § 7 for the variance of x̃ of a Laplace parent distribution. It then can be seen that for estimating the mean of a Laplace distribution, the sample median is a "better" estimate than the sample mean, not only for large samples, but for small samples as well.
3. Large Sample Moments.
Lemma 1. If 0 ≤ c ≤ ½, then for m, n = 1, 2, ...,

(6)    ∫_{½-c}^{½+c} |u - ½|^m u^n (1 - u)^n du = (½)^{m+2n+1} ∫₀^{4c²} t^{(m-1)/2} (1 - t)^n dt .
In particular, if c = ½, and C_n = (2n + 1)!/(n! n!), we have for fixed m,
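The identity (6) rests on the substitution t = (2u - 1)², and can be checked numerically; the following sketch (ours) compares the two sides by quadrature.

```python
import math

def simpson(fn, a, b, steps=20000):
    """Composite Simpson's rule on [a, b] (steps must be even)."""
    h = (b - a) / steps
    s = fn(a) + fn(b)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * fn(a + i * h)
    return s * h / 3.0

def check_lemma1(c, m, n):
    """Return both sides of (6) for given c, m, n."""
    lhs = simpson(lambda u: abs(u - 0.5) ** m * u ** n * (1 - u) ** n,
                  0.5 - c, 0.5 + c)
    rhs = 0.5 ** (m + 2 * n + 1) * simpson(
        lambda t: t ** ((m - 1) / 2) * (1 - t) ** n, 0.0, 4 * c * c)
    return lhs, rhs

lhs, rhs = check_lemma1(c=0.3, m=3, n=4)
print(lhs, rhs)   # the two sides of (6) agree
```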
If this is true, then the RHS of (77) is an upper bound of μ₂/μ̄₂. Thus the
proof is completed.
To establish (81), we introduce, as in (68),
(82)    g₀(y) = y²(1 - 4v²)/4v² ,
where y and v satisfy (78). For all y ≥ 0, g₀(y) is not smaller than the LHS of (81), and

(83)    g₀(y) → 1 as y → 0 ,    g₀(y) → 0 as y → ∞ .
Let "I" denote differentiation with respect to y, then
(84) , where
(85)
(86) gl'(y) =1 e-y g (y)2 2 , where
(87)
(88)
= -
= -
If f(x) is a function of x, and if, as x increases from 0 to ∞, f(x) varies from, e.g., positive to negative, and then back to positive, we will write, for simplicity, as x: 0 → ∞, f(x): +,-,+. Now
g₂″(y) ≷ 0 according as y ≷ log 2. So as y: 0 → ∞, g₂′(y): +,-,+. Now g₂(0) = 0, while g₂(∞) = ∞. We say that as y: 0 → ∞, g₂(y): +,-,+.
Otherwise g₂(y) ≥ 0 for all y ≥ 0, so g₁′(y) ≥ 0 and g₁(y) ≥ 0 for all y ≥ 0, as g₁(0) = 0. Hence g₀(y) is steadily increasing. This, however, contradicts (83).
It follows that as y: 0 → ∞, g₁′(y): +,-,+. Now g₁(0) = g₁(∞) = 0, hence as y: 0 → ∞, g₁(y): +,-. Therefore we conclude that as y: 0 → ∞, g₀(y) increases steadily from 1 to a maximum, and then decreases steadily to 0. To find the maximum of g₀(y), we first solve g₁(y) = 0, which is equivalent to 2v(1 + 2v) - y = 0. Using table [12], we obtain an approximate solution y = 1.15. The maximum of g₀(y) is then found to be 1.51.
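The maximization just described is easy to reproduce without exponential tables (our sketch; it assumes relation (78) takes the form v = (1 - e^{-y})/2, which is consistent with the stated solution y = 1.15 and maximum 1.51).

```python
import math

def g0(y):
    """g0(y) = y^2 (1 - 4v^2) / (4 v^2), with v = (1 - e^(-y))/2 assumed."""
    v = (1.0 - math.exp(-y)) / 2.0
    return y * y * (1.0 - 4.0 * v * v) / (4.0 * v * v)

# Locate the maximum on a fine grid over (0, 10].
ys = [i * 1e-4 for i in range(1, 100001)]
y_star = max(ys, key=g0)
print(y_star, g0(y_star))   # about 1.15 and 1.51, as in the text
```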
Remark. The variance of the sample mean (of a sample of size 2n+1) drawn from a Laplace distribution with pdf given by (75) is 2/(2n+1). It follows, from Theorem 5, that the sample median has smaller variance than the sample mean for sample size 2n+1 ≥ 7. In a recent paper, A. E. Sarhan [11] found that for sample sizes equal to 2, 3, 4, and 5, the variance of the sample median is also smaller than that of the sample mean.
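The 2n+1 = 7 comparison in the Remark can be reproduced by integrating x² against the exact density (1); the sketch below (ours) assumes the pdf in (75) is the standard Laplace form f(x) = ½ e^{-|x|}, which has variance 2.

```python
import math

def laplace_median_variance(n, steps=20000, upper=50.0):
    """mu_2 for the median of 2n+1 observations from f(x) = (1/2) e^(-|x|),
    using symmetry: mu_2 = 2 * integral over [0, inf) of x^2 C_n F^n (1-F)^n f."""
    C_n = math.factorial(2 * n + 1) / math.factorial(n) ** 2
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        F = 1.0 - 0.5 * math.exp(-x)   # Laplace cdf for x >= 0
        f = 0.5 * math.exp(-x)
        w = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += w * x * x * C_n * F ** n * (1 - F) ** n * f
    return 2.0 * total * h / 3.0

n = 3                                  # sample size 2n + 1 = 7
var_median = laplace_median_variance(n)
var_mean = 2.0 / (2 * n + 1)           # population variance 2, sample size 7
print(var_median, var_mean)            # about 0.236 vs 0.286
```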
References
[1] T. J. I'A. Bromwich, An Introduction to the Theory of Infinite Series, Macmillan and Co., London, 1908.

[2] G. W. Brown and J. W. Tukey, "Some distributions of sample means", Ann. Math. Stat., Vol. 17 (1946), pp. 1-12.

[3] J. H. Cadwell, "The distribution of quantiles of small samples", Biometrika, Vol. 39 (1952), pp. 207-211.

[4] J. T. Chu, "On the distribution of the sample median", to be published.

[5] W. Feller, "On the normal approximation to the binomial distribution", Ann. Math. Stat., Vol. 16 (1945), pp. 319-329.

[6] T. Hojo, "Distribution of the median, etc.", Biometrika, Vol. 23 (1931), pp. 315-360.

[7] K. Knopp, Theory and Application of Infinite Series, Hafner Publishing Company, New York.

[8] K. Pearson, "On the standard error of the median, etc.", Biometrika, Vol. 23 (1931), pp. 361-363.

[9] K. Pearson, "On the mean character and variance of a ranked individual, etc.", Biometrika, Vol. 23 (1931), pp. 364-397.

[10] G. Pólya, "Remarks on computing the probability integral in one and two dimensions", Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 1949, pp. 63-78.

[11] A. E. Sarhan, "Estimation of the mean and standard deviation by order statistics", Ann. Math. Stat., Vol. 25 (1954), pp. 317-328.

[12] Tables of the Exponential Function e^x, Applied Mathematics Series 14, National Bureau of Standards, Washington, D. C., 1951.