Computer algebra and rank statistics Alessandro Di Bucchianico HCM Workshop Coimbra November 5, 1997
Mar 31, 2015
How to run this presentation?
• the presentation runs itself most of the time
• click the mouse if you want to continue
• type S to stop or restart the presentation
• underlined items are hyperlinks to files on the World Wide Web (usually PostScript files of technical reports)
Enjoy my presentation!
Outline
• General remarks on nonparametric methods
• What is computer algebra?
• Case study: the Mann-Whitney statistic
• Critical values of rank test statistics
• Moments of the Mann-Whitney statistic
• Conclusions
General remarks on nonparametric methods
Practical problems
• tables (limited, errors, not exact,…)
• limited availability in statistical software
• procedures in statistical software often only based on asymptotics
General remarks on nonparametric methods
Mathematical problems
• in general no closed expression for distribution function
• direct enumeration only feasible for small sample sizes
• recurrences are time-consuming
What is computer algebra?
Sample session in Mathematica 3.0
Expand[(1 + x)(1 + x^2)]
1 + x + x^2 + x^3
<<"DiscreteMath`RSolve`"
SeriesTerm[1/(1 - x^4), {x, 0, n}, Assumptions -> {n >= 0}]
1/4 + (-1)^n/4 + If[Even[n], (1/2)(-1)^(n/2), 0]
Case study: Mann-Whitney statistic
independent samples X1,…,Xm and Y1,…,Yn
continuous distribution functions F, G resp.
(hence, no ties with probability one)
order the pooled sample from small to large
Mann-Whitney (continued)
Wilcoxon: Wm,n = Σi rank(Xi)
Mann-Whitney: Mm,n = #{(i,j) | Yj < Xi}
Wm,n = Mm,n + ½ m (m+1)
What is the distribution of Mm,n under H0:F=G?
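To make the two definitions concrete, here is a minimal Python sketch. It is not part of the original slides: the function names and the sample values are made up for illustration, and it assumes all observations are distinct. It computes both statistics and checks the identity between them on one small example.

```python
def mann_whitney_count(xs, ys):
    """M = number of pairs (i, j) with Y_j < X_i."""
    return sum(1 for x in xs for y in ys if y < x)


def wilcoxon_rank_sum(xs, ys):
    """W = sum of the ranks of the X-observations in the pooled sample.

    Assumes all observations are distinct (no ties).
    """
    pooled = sorted(list(xs) + list(ys))
    rank = {value: r for r, value in enumerate(pooled, start=1)}
    return sum(rank[x] for x in xs)


# Illustrative data (hypothetical values, m = 3 and n = 2).
xs = [1.3, 4.1, 2.7]
ys = [0.9, 3.5]
m = len(xs)
M = mann_whitney_count(xs, ys)
W = wilcoxon_rank_sum(xs, ys)
print(M, W, M + m * (m + 1) // 2)
```

For these values the X-observations have ranks 2, 3 and 5 in the pooled sample, so W and M + m(m+1)/2 agree.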
Under H0, we have:
\[
\sum_{k=0}^{mn} P(M_{m,n}=k)\,x^k
  = \frac{1}{\binom{m+n}{n}}\,
    \frac{\prod_{i=n+1}^{m+n}\,(1-x^i)}{\prod_{j=1}^{m}\,(1-x^j)}
\]
Mann-Whitney generating function in Mathematica 3.0
Expand[Simplify[1/Binomial[5, 3] Product[1 - x^i, {i, 3, 5}]/Product[1 - x^j, {j, 1, 3}]]]
1/10 + x/10 + x^2/5 + x^3/5 + x^4/5 + x^5/10 + x^6/10
CoefficientList[%, x]
{1/10, 1/10, 1/5, 1/5, 1/5, 1/10, 1/10}
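The same computation can be sketched without a computer algebra system. The Python below is an illustration, not the author's code, and the function name is an assumption. It works with integer coefficient lists truncated at degree mn: it multiplies by each factor (1 - x^i) of the numerator and then divides by each factor (1 - x^j) of the denominator as a power series. Because the quotient is a polynomial of degree mn, the truncation loses nothing.

```python
from fractions import Fraction
from math import comb


def mann_whitney_pmf(m, n):
    """Exact null distribution [P(M = 0), ..., P(M = mn)] from the generating function."""
    deg = m * n
    poly = [1] + [0] * deg  # coefficients, truncated at degree mn
    # Multiply by (1 - x^i) for i = n+1, ..., m+n (descending t keeps old values).
    for i in range(n + 1, m + n + 1):
        for t in range(deg, i - 1, -1):
            poly[t] -= poly[t - i]
    # Divide by (1 - x^j) for j = 1, ..., m (ascending t uses updated values).
    for j in range(1, m + 1):
        for t in range(j, deg + 1):
            poly[t] += poly[t - j]
    total = comb(m + n, n)
    return [Fraction(c, total) for c in poly]


pmf = mann_whitney_pmf(3, 2)
print([str(p) for p in pmf])
```

With m = 3 and n = 2 this reproduces the coefficient list of the Mathematica session above.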
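(* cumulative probabilities {0, P(M <= 0), P(M <= 1), ...} under H0 *)
MannWhitneyCumFreq[m_, n_] := Module[{x, i, j},
  FoldList[Plus, 0, CoefficientList[Expand[Factor[
    1/Binomial[m + n, n]*Product[1 - x^i, {i, n + 1, m + n}]/Product[1 - x^j, {j, 1, m}]]], x]]]
(* left significance value P(M <= k) *)
MannWhitneyLeftSigValue[m_, n_, k_] := MannWhitneyCumFreq[m, n][[k + 2]]
(* largest k with P(M <= k) <= a *)
MannWhitneyLeftCritValue[m_, n_, a_] := Module[
  {value = Length[Select[MannWhitneyCumFreq[m, n], # <= a &]] - 2},
  If[NonNegative[N[value]], value, "no critical value exists"]]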
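For comparison, here is a Python sketch of the same three quantities. It is an illustration under stated assumptions, not a port of the procedures above: the names are hypothetical, and the counts come from a standard recurrence on the largest pooled observation rather than from the generating function. If the largest observation is an X it contributes n pairs, and if it is a Y it contributes none.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import accumulate
from math import comb


@lru_cache(maxsize=None)
def count(m, n, k):
    """Number of arrangements of m X's and n Y's with M = k."""
    if k < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if k == 0 else 0
    # Largest pooled observation is an X (exceeds all n Y's) or a Y.
    return count(m - 1, n, k - n) + count(m, n - 1, k)


def cum_probs(m, n):
    """[P(M <= 0), P(M <= 1), ..., P(M <= mn)] under H0."""
    total = comb(m + n, n)
    counts = [count(m, n, k) for k in range(m * n + 1)]
    return [Fraction(c, total) for c in accumulate(counts)]


def left_sig_value(m, n, k):
    """Left significance value P(M <= k)."""
    return cum_probs(m, n)[k]


def left_crit_value(m, n, alpha):
    """Largest k with P(M <= k) <= alpha, or None if no such k exists."""
    below = [k for k, p in enumerate(cum_probs(m, n)) if p <= alpha]
    return below[-1] if below else None


print(left_sig_value(5, 5, 4), left_crit_value(5, 5, Fraction(1, 20)))
```

Returning None instead of a message string is a design choice of this sketch; it keeps the return type numeric whenever a critical value exists.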
Computational speed (Pentium 133 MHz)
Exact: P(M5,5 ≤ 4) = 1/21 ≈ 0.0476
computing time: 0.05 sec (generating function: degree 25)
Asymptotic: P(M5,5 ≤ 4) ≈ 0.0384
Exact: P(M20,20 ≤ 138) = 0.0482 (rounded)
computing time: 8.5 sec (generating function: degree 400)
Asymptotic: P(M20,20 ≤ 138) ≈ 0.0475
Asymptotics and exact calculations are both useful!
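The approximate probabilities quoted on this slide appear to be normal approximations. As a hedged illustration (the slides do not say how they were obtained), the Python sketch below uses the standard null mean mn/2 and variance mn(m+n+1)/12 of the Mann-Whitney statistic. The function names and the optional continuity correction are assumptions of this sketch.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def mann_whitney_normal_approx(m, n, k, continuity=False):
    """Normal approximation to P(M <= k) under H0."""
    mean = m * n / 2
    sd = sqrt(m * n * (m + n + 1) / 12)
    shift = 0.5 if continuity else 0.0
    return normal_cdf((k + shift - mean) / sd)


exact = 1 / 21  # exact value of P(M_{5,5} <= 4) quoted on the slide
plain = mann_whitney_normal_approx(5, 5, 4)
corrected = mann_whitney_normal_approx(5, 5, 4, continuity=True)
print(f"exact {exact:.4f}, normal {plain:.4f}, with correction {corrected:.4f}")
```

Without the correction the sketch gives roughly 0.038 for m = n = 5, close to the approximate value on the slide; with the correction it moves much nearer to the exact 1/21.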
Other examples of nonparametric test statistics with closed form for generating function include:
• Wilcoxon signed rank statistic
• Kendall rank correlation statistic
• Kolmogorov one-sample statistic
• Smirnov two-sample statistic
• Jonckheere-Terpstra statistic
Consult the combinatorial literature!
What to do if there is no generating function?
Linear rank statistics
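$Z_\ell = 1$ if the $\ell$-th order statistic in the pooled sample is an X-observation, and 0 otherwise
\[
T_{m,n} = \sum_{\ell=1}^{m+n} a(\ell)\, Z_\ell
\]
Streitberg & Röhmel 1986 (cf. Euler 1748):
\[
\sum_{m+n=N} \sum_{k} \binom{N}{m} \Pr(T_{m,n}=k)\, x^k y^m
  = \bigl(1 + x^{a(1)} y\bigr) \cdots \bigl(1 + x^{a(N)} y\bigr)
\]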
Branch-and-bound algorithm (Van de Wiel)
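The product formula of Streitberg and Röhmel can be expanded one factor at a time. The Python sketch below is an illustration only, not the branch-and-bound algorithm mentioned above; the function name is hypothetical and the scores are assumed to be nonnegative integers. It tracks, for each number j of X-observations, how many ways there are to reach each score sum, which corresponds to the coefficients of x^t y^j.

```python
from fractions import Fraction
from math import comb


def linear_rank_distribution(scores, m):
    """Exact null distribution of T = sum of a(l) over the m X-positions.

    scores: the list a(1), ..., a(N) of nonnegative integer scores.
    Returns a dict {t: P(T = t)}.
    """
    # counts[j][t] = coefficient of x^t y^j in the partial product.
    counts = [dict() for _ in range(m + 1)]
    counts[0][0] = 1
    for a in scores:
        # Multiply by (1 + x^a y); descending j uses each score at most once.
        for j in range(m, 0, -1):
            for t, c in counts[j - 1].items():
                counts[j][t + a] = counts[j].get(t + a, 0) + c
    total = comb(len(scores), m)
    return {t: Fraction(c, total) for t, c in sorted(counts[m].items())}


# Scores a(l) = l give the Wilcoxon rank-sum statistic (here m = 3, n = 2).
dist = linear_rank_distribution([1, 2, 3, 4, 5], 3)
print({t: str(p) for t, p in dist.items()})
```

With these scores and m = 3, the support runs from 6 to 12, which is the Mann-Whitney support 0 to 6 shifted by m(m+1)/2 = 6.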
Moments of Mann-Whitney statistic
Mann and Whitney (1947) calculated 4th central moment
Fix and Hodges (1955) calculated 6th central moment
Computations are based on recurrences
Can we improve?
Solution: computer algebra and generating functions
Computing moments of Mm,n
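Recompute $E(M_{m,n})$ (following René Swarttouw):
\[
E(X) = \left.\frac{d}{dx} \sum_{k} P(X=k)\,x^k \right|_{x=1}
\]
\[
G_{m,n}(x) := \frac{(1-x^{n+1})\cdots(1-x^{n+m})}{(1-x)\cdots(1-x^{m})}
\]
\[
\log G_{m,n}(x) = \sum_{k=1}^{m} \log\bigl(1-x^{n+k}\bigr) - \sum_{k=1}^{m} \log\bigl(1-x^{k}\bigr)
\]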
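\[
\frac{d}{dx}\log G_{m,n}(x) = \frac{1}{G_{m,n}(x)}\,\frac{d}{dx}G_{m,n}(x)
\]
\[
\frac{d}{dx}\log G_{m,n}(x)
  = \sum_{k=1}^{m}\left(\frac{k\,x^{k-1}}{1-x^{k}} - \frac{(n+k)\,x^{n+k-1}}{1-x^{n+k}}\right)
  = \sum_{k=1}^{m}\frac{k\,x^{k-1}\,(1-x^{n+k}) - (n+k)\,x^{n+k-1}\,(1-x^{k})}{(1-x^{k})(1-x^{n+k})}
\]
$G_{m,n}$ is a polynomial, hence $G'_{m,n}(1) = \lim_{x\to 1} G'_{m,n}(x)$.
Fact: $\lim_{x\to 1} G_{m,n}(x) = \binom{m+n}{n}$.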
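Hence, it remains to calculate for $1 \le k \le m$:
\[
\lim_{x\to 1}\frac{k\,x^{k-1}\,(1-x^{n+k}) - (n+k)\,x^{n+k-1}\,(1-x^{k})}{(1-x^{k})(1-x^{n+k})}
  = \lim_{x\to 1}\frac{k\,(1-x^{n}) - n\,x^{n}\,(1-x^{k})}{(1-x^{k})(1-x^{n+k})}
\]
After some simplifications:
\[
\lim_{x\to 1}\frac{k\,(1+\cdots+x^{n-1}) - n\,x^{n}\,(1+\cdots+x^{k-1})}{k\,(n+k)\,(1-x)}
\]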
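L’Hôpital’s rule yields that the limit equals:
\[
\frac{1}{k\,(n+k)}\left(n^{2}k + \frac{n\,k\,(k-1)}{2} - \frac{k\,n\,(n-1)}{2}\right) = \frac{n}{2}
\]
Summing over $k = 1, \dots, m$ gives $E(M_{m,n}) = G'_{m,n}(1)/G_{m,n}(1) = mn/2$.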
It is tedious to perform these computations by hand.
Alternative:
compute moments using Mathematica.
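Before turning to symbolic computation, the limit n/2 can be checked numerically. The Python sketch below is an illustration with hypothetical function names: it evaluates the k-th term of d/dx log G_{m,n}(x) in exact rational arithmetic at a point just below 1, where each term should be close to n/2 and the sum over k close to mn/2.

```python
from fractions import Fraction


def summand(k, n, x):
    """k-th term of d/dx log G_{m,n}(x), evaluated exactly at a rational x != 1."""
    numerator = (k * x ** (k - 1) * (1 - x ** (n + k))
                 - (n + k) * x ** (n + k - 1) * (1 - x ** k))
    denominator = (1 - x ** k) * (1 - x ** (n + k))
    return numerator / denominator


def approx_mean(m, n, eps=Fraction(1, 10 ** 6)):
    """Sum of the m terms at x = 1 - eps; should be close to mn/2."""
    x = 1 - eps
    return sum(summand(k, n, x) for k in range(1, m + 1))


print(float(summand(3, 4, 1 - Fraction(1, 10 ** 6))), float(approx_mean(5, 5)))
```

Exact fractions are used instead of floating point because both numerator and denominator vanish to second order at x = 1, so a floating-point evaluation this close to 1 would suffer from cancellation.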
Mathematica procedures for moments of Mm,n:
LogG[k_, n_, x_] := Log[1 - x^(n + k)] - Log[1 - x^k]
DerivativeOfLogG[r_] := Module[{j, der},
  Sum[Simplify[r! Coefficient[Normal[Series[LogG[k, n, x], {x, 1, r + 1}]], x - 1, r]], {k, 1, m}]]
FactorialMoments[r_] := Module[{i, j, equations},
  equations = Table[ReplaceAll[Simplify[Together[D[Log[G[z]], {z, j}]]], G[z] -> 1] == DerivativeOfLogG[j], {j, 1, r}];
  Flatten[ReplaceAll[Table[D[G[z], {z, i}], {i, 1, r}], Solve[equations, Table[D[G[z], {z, i}], {i, 1, r}]]]]]
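8th central moment of Mm,n
\[
\begin{aligned}
\frac{1}{34560}\, m\,n\,(1+m+n)\,\bigl(
 &-96m + 96m^{2} + 240m^{3} - 240m^{4} - 432m^{5} - 144m^{6} \\
 &- 96n + 192mn + 224m^{2}n - 540m^{3}n - 100m^{4}n + 780m^{5}n + 404m^{6}n \\
 &+ 96n^{2} + 224mn^{2} - 600m^{2}n^{2} - 200m^{3}n^{2} + 900m^{4}n^{2} - 48m^{5}n^{2} - 420m^{6}n^{2} \\
 &+ 240n^{3} - 540mn^{3} - 200m^{2}n^{3} + 1095m^{3}n^{3} - 395m^{4}n^{3} - 735m^{5}n^{3} + 175m^{6}n^{3} \\
 &- 240n^{4} - 100mn^{4} + 900m^{2}n^{4} - 395m^{3}n^{4} - 630m^{4}n^{4} + 525m^{5}n^{4} \\
 &- 432n^{5} + 780mn^{5} - 48m^{2}n^{5} - 735m^{3}n^{5} + 525m^{4}n^{5} \\
 &- 144n^{6} + 404mn^{6} - 420m^{2}n^{6} + 175m^{3}n^{6} \bigr)
\end{aligned}
\]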
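A formula of this size is easy to mistype, so a spot check against direct enumeration is worthwhile. The Python sketch below is an illustration and not part of the slides: the coefficient table is a transcription of the expression above, the function names are hypothetical, and the enumeration is only practical for small m and n.

```python
from fractions import Fraction
from itertools import combinations

# COEFFS[(a, b)] is the coefficient of m^a n^b, transcribed from the slide.
COEFFS = {
    (1, 0): -96, (2, 0): 96, (3, 0): 240, (4, 0): -240, (5, 0): -432, (6, 0): -144,
    (0, 1): -96, (1, 1): 192, (2, 1): 224, (3, 1): -540, (4, 1): -100,
    (5, 1): 780, (6, 1): 404,
    (0, 2): 96, (1, 2): 224, (2, 2): -600, (3, 2): -200, (4, 2): 900,
    (5, 2): -48, (6, 2): -420,
    (0, 3): 240, (1, 3): -540, (2, 3): -200, (3, 3): 1095, (4, 3): -395,
    (5, 3): -735, (6, 3): 175,
    (0, 4): -240, (1, 4): -100, (2, 4): 900, (3, 4): -395, (4, 4): -630, (5, 4): 525,
    (0, 5): -432, (1, 5): 780, (2, 5): -48, (3, 5): -735, (4, 5): 525,
    (0, 6): -144, (1, 6): 404, (2, 6): -420, (3, 6): 175,
}


def formula_eighth_central_moment(m, n):
    """Evaluate the closed form for the 8th central moment exactly."""
    poly = sum(c * m ** a * n ** b for (a, b), c in COEFFS.items())
    return Fraction(m * n * (1 + m + n) * poly, 34560)


def exact_central_moment(m, n, r):
    """r-th central moment of M under H0 by direct enumeration (small m, n only)."""
    values = []
    for x_positions in combinations(range(m + n), m):
        # The i-th smallest X (0-based) at pooled position p exceeds p - i Y's.
        values.append(sum(p - i for i, p in enumerate(x_positions)))
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** r for v in values) / len(values)


for m, n in [(1, 1), (2, 1), (1, 2), (2, 2)]:
    print(m, n, formula_eighth_central_moment(m, n), exact_central_moment(m, n, 8))
```

For m = n = 1 both routes give 1/256, and for m = n = 2 both give 257/3. Agreement on a few small cases is a sanity check on the transcription, not a proof of the formula.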
Conclusions
• generating functions are also useful in nonparametric statistics
• computer algebra is a natural tool for mathematicians
• asymptotics and exact calculations complement each other
Topics under investigation
• tests for censored data
• power calculations
• nonparametric ANOVA (Kruskal-Wallis, block designs, multiple comparisons)
• Spearman’s ρ (rank correlation)
• multimedia / World Wide Web implementation
Click on underlined items to obtain the PostScript file of the technical report
References
• A. Di Bucchianico, Combinatorics, computer algebra and the Wilcoxon-Mann-Whitney test, to appear in J. Stat. Plann. Inf.
• B. Streitberg and J. Röhmel, Exact distributions for permutation and rank tests: An introduction to some recently published algorithms, Stat. Software Newsletter 12 (1986), 10-18
• M.A. van de Wiel, Exact distributions of nonparametric statistics using computer algebra, Master’s Thesis, TUE, 1996
• M.A. van de Wiel and A. Di Bucchianico, The exact distribution of Spearman’s rho, technical report
• M.A. van de Wiel, A. Di Bucchianico and P. van der Laan, Exact distributions of nonparametric test statistics using computer algebra, technical report
• M.A. van de Wiel, Edgeworth expansions with exact cumulants for two-sample linear rank statistics, technical report
• M.A. van de Wiel, Exact distributions of two-sample rank statistics and block rank statistics using computer algebra, technical report
The End