Computer algebra and rank statistics Alessandro Di Bucchianico HCM Workshop Coimbra November 5, 1997

Computer algebra and rank statistics

Alessandro Di Bucchianico

HCM Workshop Coimbra

November 5, 1997


How to run this presentation?

• the presentation runs itself most of the time

• click the mouse if you want to continue

• type S to stop or restart the presentation

• underlined items are hyperlinks to files on the World Wide Web (usually PostScript files of technical reports)

Enjoy my presentation!


Outline

• General remarks on nonparametric methods

• What is computer algebra?

• Case study: the Mann-Whitney statistic

• Critical values of rank test statistics

• Moments of the Mann-Whitney statistic

• Conclusions


General remarks on nonparametric methods

Practical problems

• tables (limited, errors, not exact,…)

• limited availability in statistical software

• procedures in statistical software often only based on asymptotics


General remarks on nonparametric methods

Mathematical problems

• in general no closed expression for distribution function

• direct enumeration only feasible for small sample sizes

• recurrences are time-consuming


What is computer algebra?

Sample session in Mathematica 3.0

Expand[(1 + x)(1 + x^2)]

1 + x + x^2 + x^3

<<"DiscreteMath`RSolve`"

SeriesTerm[1/(1 - x^4), {x, 0, n}, Assumptions -> {n >= 0}]

1/4 + (-1)^n/4 + If[Even[n], 1/2 (-1)^(n/2), 0]
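The two results in this session can be checked independently of Mathematica. The following is a minimal Python sketch, not part of the original slides; the helper names `poly_mul` and `series_term` are made up for illustration. It multiplies the two factors as coefficient lists and compares the closed form for the series coefficient with the expansion 1/(1 - x^4) = 1 + x^4 + x^8 + ...

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def series_term(n):
    """Closed form from the session for the coefficient of x^n in 1/(1 - x^4)."""
    value = Fraction(1, 4) + Fraction((-1) ** n, 4)
    if n % 2 == 0:
        value += Fraction((-1) ** (n // 2), 2)
    return value

# (1 + x)(1 + x^2), coefficients listed from the constant term upwards
expanded = poly_mul([1, 1], [1, 0, 1])

# The coefficient of x^n in 1/(1 - x^4) is 1 exactly when 4 divides n
closed_form_ok = all(series_term(n) == (1 if n % 4 == 0 else 0) for n in range(40))
print(expanded, closed_form_ok)
```

The point of the comparison is that the piecewise closed form, although it looks more complicated than "1 if 4 divides n", encodes the same sequence.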


Case study: Mann-Whitney statistic

independent samples $X_1,\dots,X_m$ and $Y_1,\dots,Y_n$

with continuous distribution functions $F$ and $G$, respectively

(hence, no ties with probability one)

order the pooled sample from small to large


Mann-Whitney (continued)

Wilcoxon: $W_{m,n} = \sum_i \operatorname{rank}(X_i)$

Mann-Whitney: $M_{m,n} = \#\{(i,j) \mid Y_j < X_i\}$

$W_{m,n} = M_{m,n} + \tfrac{1}{2}\, m(m+1)$

What is the distribution of $M_{m,n}$ under $H_0: F = G$?
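The relation between the Wilcoxon and Mann-Whitney forms can be illustrated on simulated data. This is a small Python sketch under stated assumptions: the function names are hypothetical, the samples are uniform random numbers (so ties have probability zero), and the seed is arbitrary.

```python
import random

def wilcoxon_rank_sum(xs, ys):
    """W: sum of the ranks of the X-observations in the pooled sample (no ties assumed)."""
    pooled = sorted(xs + ys)
    rank = {value: position + 1 for position, value in enumerate(pooled)}
    return sum(rank[x] for x in xs)

def mann_whitney_count(xs, ys):
    """M: number of pairs (i, j) with Y_j < X_i."""
    return sum(1 for x in xs for y in ys if y < x)

random.seed(1997)  # arbitrary seed, only for reproducibility
m, n = 4, 6
xs = [random.random() for _ in range(m)]
ys = [random.random() for _ in range(n)]

w = wilcoxon_rank_sum(xs, ys)
mw = mann_whitney_count(xs, ys)
print(w, mw, w == mw + m * (m + 1) // 2)
```

The identity holds because the rank of an X-observation counts the Y-observations below it plus the X-observations at or below it, and the latter contribute 1 + 2 + ... + m in total.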


Under $H_0$, we have:

$$\sum_{k=0}^{mn} P(M_{m,n} = k)\, x^k = \frac{1}{\binom{m+n}{n}} \, \frac{\prod_{i=n+1}^{m+n} (1 - x^i)}{\prod_{j=1}^{m} (1 - x^j)}$$

Mann-Whitney generating function in Mathematica 3.0

Expand[Simplify[1/Binomial[5, 3] Product[1 - x^i, {i, 3, 5}]/Product[1 - x^j, {j, 1, 3}]]]

1/10 + x/10 + x^2/5 + x^3/5 + x^4/5 + x^5/10 + x^6/10

CoefficientList[%, x]

{1/10, 1/10, 1/5, 1/5, 1/5, 1/10, 1/10}
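The same distribution can be read off the generating function without a computer algebra system. The following Python sketch is an illustration, not the author's code: it represents polynomials as coefficient lists, multiplies by the factors (1 - x^i) for i = n+1, ..., m+n, and divides exactly by (1 - x^j) for j = 1, ..., m. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def mul_one_minus_power(poly, i):
    """Multiply a coefficient list (lowest degree first) by (1 - x^i)."""
    out = poly + [0] * i
    for d, c in enumerate(poly):
        out[d + i] -= c
    return out

def div_one_minus_power(poly, j):
    """Divide by (1 - x^j), assuming the division is exact."""
    quotient = list(poly)
    for d in range(j, len(quotient)):
        quotient[d] += quotient[d - j]
    # the top j entries of the running sum form the remainder, which must vanish
    assert all(c == 0 for c in quotient[len(quotient) - j:])
    return quotient[:len(quotient) - j]

def mann_whitney_pmf(m, n):
    """P(M_{m,n} = k) for k = 0..mn under H0, from the generating function."""
    poly = [1]
    for i in range(n + 1, m + n + 1):
        poly = mul_one_minus_power(poly, i)
    for j in range(1, m + 1):
        poly = div_one_minus_power(poly, j)
    total = comb(m + n, n)
    return [Fraction(c, total) for c in poly]

# m = 3, n = 2, as in the session above
print(mann_whitney_pmf(3, 2))
```

Working with integer coefficients and dividing by the binomial coefficient only at the end keeps every intermediate step exact.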


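(* cumulative null distribution {0, P(M <= 0), P(M <= 1), ...} from the generating function *)
MannWhitneyCumFreq[m_, n_] := Module[{x, i, j},
  FoldList[Plus, 0, CoefficientList[Expand[Factor[
    1/Binomial[m + n, n] * Product[1 - x^i, {i, n + 1, m + n}]/Product[1 - x^j, {j, 1, m}]]], x]]]

(* P(M <= k); the leading 0 in the list shifts the index by 2 *)
MannWhitneyLeftSigValue[m_, n_, k_] := MannWhitneyCumFreq[m, n][[k + 2]]

(* largest k with P(M <= k) <= a *)
MannWhitneyLeftCritValue[m_, n_, a_] := Module[
  {value = Length[Select[MannWhitneyCumFreq[m, n], # <= a &]] - 2},
  If[NonNegative[N[value]], value, "no critical value exists"]]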


Computational speed (Pentium 133 MHz)

Exact: $P(M_{5,5} \le 4) = 1/21 \approx 0.0476$

computing time: 0.05 sec (generating function: degree 25)

Normal approximation: $P(M_{5,5} \le 4) \approx 0.0384$

Exact: $P(M_{20,20} \le 138) = 0.0482$ (rounded)

computing time: 8.5 sec (generating function: degree 400)

Normal approximation: $P(M_{20,20} \le 138) \approx 0.0475$

Asymptotics and exact calculations are both useful!
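The small-sample comparison can be reproduced by direct enumeration. The Python sketch below is illustrative only: it enumerates all placements of the X-ranks (feasible here because there are only 252 of them) and compares the exact left-tail probability with a normal approximation that uses the standard null mean mn/2 and variance mn(m+n+1)/12 without continuity correction. Those moments and the absence of a continuity correction are assumptions that do not appear on the slide, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import erf, sqrt

def exact_left_tail(m, n, k):
    """P(M_{m,n} <= k) under H0 by enumerating all placements of the X-ranks."""
    shift = m * (m + 1) // 2  # W = M + m(m+1)/2
    hits = total = 0
    for x_ranks in combinations(range(1, m + n + 1), m):
        total += 1
        if sum(x_ranks) - shift <= k:
            hits += 1
    return Fraction(hits, total)

def normal_left_tail(m, n, k):
    """Normal approximation of P(M_{m,n} <= k), no continuity correction."""
    mean = m * n / 2
    sd = sqrt(m * n * (m + n + 1) / 12)
    z = (k - mean) / sd
    return 0.5 * (1 + erf(z / sqrt(2)))

print(exact_left_tail(5, 5, 4), round(normal_left_tail(5, 5, 4), 4))
```

The exact value is 1/21. The approximation comes out near 0.038, close to the slide's 0.0384; the small difference is consistent with rounding the standardized value before looking it up in a table, though that is a guess about how the slide's figure was obtained.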


Other examples of nonparametric test statistics with a closed form for the generating function include:

• Wilcoxon signed rank statistic

• Kendall rank correlation statistic

• Kolmogorov one-sample statistic

• Smirnov two-sample statistic

• Jonckheere-Terpstra statistic

Consult the combinatorial literature!

What to do if there is no generating function?


Linear rank statistics

$Z_\ell = 1$ if the $\ell$th order statistic in the pooled sample is an X-observation, and 0 otherwise

$$T_{m,n} = \sum_{\ell=1}^{m+n} a(\ell)\, Z_\ell$$

Streitberg & Röhmel 1986 (cf. Euler 1748):

$$\sum_{m+n=N} \sum_{k} \binom{N}{m} \Pr(T_{m,n} = k)\, x^k y^m = (1 + x^{a(1)} y) \cdots (1 + x^{a(N)} y)$$

Branch-and-bound algorithm (Van de Wiel)
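The product formula can be turned into a short dynamic program. The following Python sketch is only an illustration of that idea under two assumptions: the scores a(l) are integers, and the function name `linear_rank_pmf` is invented here. It expands the product of the factors (1 + x^a(l) y) one factor at a time and keeps only the coefficients of y^j with j <= m.

```python
from fractions import Fraction
from math import comb

def linear_rank_pmf(scores, m):
    """Null distribution of T = sum of a(l) Z_l with m X-observations,
    from the coefficients of prod_l (1 + x^a(l) y). Integer scores assumed."""
    # table[(j, t)] = number of ways to pick j positions with score sum t
    table = {(0, 0): 1}
    for a in scores:
        updated = dict(table)  # contribution of the "1" in (1 + x^a y)
        for (j, t), count in table.items():
            if j < m:  # coefficients of y^j with j > m are never needed
                key = (j + 1, t + a)
                updated[key] = updated.get(key, 0) + count
        table = updated
    total = comb(len(scores), m)
    return {t: Fraction(c, total) for (j, t), c in sorted(table.items()) if j == m}

# Wilcoxon scores a(l) = l with m = 3 and n = 2
print(linear_rank_pmf([1, 2, 3, 4, 5], 3))
```

With Wilcoxon scores the result is the Mann-Whitney distribution shifted by m(m+1)/2 = 6, so the support runs from 6 to 12.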


Moments of Mann-Whitney statistic

Mann and Whitney (1947) calculated 4th central moment

Fix and Hodges (1955) calculated 6th central moment

Computations are based on recurrences

Can we improve?

solution: computer algebra and generating functions


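Computing moments of $M_{m,n}$

recompute $E(M_{m,n})$ (following René Swarttouw)

$$E(X) = \frac{d}{dx} \sum_{k} P(X = k)\, x^k \,\Big|_{x=1}$$

$$G_{m,n}(x) := \frac{(1 - x^{n+1}) \cdots (1 - x^{n+m})}{(1 - x) \cdots (1 - x^{m})}$$

$$\log G_{m,n}(x) = \sum_{k=1}^{m} \log(1 - x^{n+k}) - \sum_{k=1}^{m} \log(1 - x^{k})$$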


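$$\frac{d}{dx} \log G_{m,n}(x) = \sum_{k=1}^{m} \left( \frac{k x^{k-1}}{1 - x^{k}} - \frac{(n+k)\, x^{n+k-1}}{1 - x^{n+k}} \right)$$

$$\frac{d}{dx} \log G_{m,n}(x) = \frac{1}{G_{m,n}(x)} \frac{d}{dx} G_{m,n}(x)$$

$$G'_{m,n}(x) = G_{m,n}(x) \sum_{k=1}^{m} \frac{k x^{k-1}(1 - x^{n+k}) - (n+k)\, x^{n+k-1}(1 - x^{k})}{(1 - x^{k})(1 - x^{n+k})}$$

$G_{m,n}$ is a polynomial, hence $G'_{m,n}(1) = \lim_{x \to 1} G'_{m,n}(x)$.

Fact: $\lim_{x \to 1} G_{m,n}(x) = \binom{m+n}{n}$.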


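Hence, it remains to calculate for $1 \le k \le m$:

$$\lim_{x \to 1} \frac{k x^{k-1}(1 - x^{n+k}) - (n+k)\, x^{n+k-1}(1 - x^{k})}{(1 - x^{k})(1 - x^{n+k})} = \lim_{x \to 1} \frac{k(1 - x^{n}) - n x^{n}(1 - x^{k})}{(1 - x^{k})(1 - x^{n+k})}$$

After some simplifications:

$$\lim_{x \to 1} \frac{k(1 + \dots + x^{n-1}) - n x^{n}(1 + \dots + x^{k-1})}{k\,(n+k)\,(1 - x)}$$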


L’Hôpital’s rule yields that the limit equals:

$$\frac{1}{k(n+k)} \left( n \Bigl( \frac{k(k-1)}{2} + kn \Bigr) - \frac{n(n-1)}{2}\, k \right) = \frac{n}{2}$$
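For completeness, the last step of this computation can be written out. The line below is a reconstruction in the notation of the preceding slides rather than a formula shown on them: each of the m limits evaluates to n/2 (the dependence on k cancels), and dividing by $G_{m,n}(1) = \binom{m+n}{n}$ normalizes the generating function, which gives the familiar null mean.

```latex
E(M_{m,n}) = \frac{G'_{m,n}(1)}{G_{m,n}(1)}
           = \sum_{k=1}^{m} \lim_{x \to 1}
             \frac{k x^{k-1}(1 - x^{n+k}) - (n+k)\, x^{n+k-1}(1 - x^{k})}
                  {(1 - x^{k})(1 - x^{n+k})}
           = \sum_{k=1}^{m} \frac{n}{2}
           = \frac{mn}{2}
```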

It is tedious to perform these computations by hand.

Alternative:

compute moments using Mathematica.


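Mathematica procedures for moments of $M_{m,n}$:

(* k-th term of log G_{m,n}(x) *)
LogG[k_, n_, x_] := Log[1 - x^(n + k)] - Log[1 - x^k]

(* r-th derivative of log G_{m,n} at x = 1: r! times the Taylor coefficient of (x - 1)^r, summed over k *)
DerivativeOfLogG[r_] := Module[{j, der},
  Sum[Simplify[r! Coefficient[
    Normal[Series[LogG[k, n, x], {x, 1, r + 1}]], x - 1, r]], {k, 1, m}]]

(* solve for the first r derivatives of the normalized G (G[z] -> 1), i.e. the factorial moments *)
FactorialMoments[r_] := Module[{i, j, equations},
  equations = Table[
    ReplaceAll[Simplify[Together[D[Log[G[z]], {z, j}]]], G[z] -> 1] == DerivativeOfLogG[j],
    {j, 1, r}];
  Flatten[ReplaceAll[Table[D[G[z], {z, i}], {i, 1, r}],
    Solve[equations, Table[D[G[z], {z, i}], {i, 1, r}]]]]]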


8th central moment of $M_{m,n}$

$$
\begin{aligned}
\frac{1}{34560}\, m n (1 + m + n) \bigl(
& -96 m + 96 m^2 + 240 m^3 - 240 m^4 - 432 m^5 - 144 m^6 \\
& - 96 n + 192 m n + 224 m^2 n - 540 m^3 n - 100 m^4 n + 780 m^5 n + 404 m^6 n \\
& + 96 n^2 + 224 m n^2 - 600 m^2 n^2 - 200 m^3 n^2 + 900 m^4 n^2 - 48 m^5 n^2 - 420 m^6 n^2 \\
& + 240 n^3 - 540 m n^3 - 200 m^2 n^3 + 1095 m^3 n^3 - 395 m^4 n^3 - 735 m^5 n^3 + 175 m^6 n^3 \\
& - 240 n^4 - 100 m n^4 + 900 m^2 n^4 - 395 m^3 n^4 - 630 m^4 n^4 + 525 m^5 n^4 \\
& - 432 n^5 + 780 m n^5 - 48 m^2 n^5 - 735 m^3 n^5 + 525 m^4 n^5 \\
& - 144 n^6 + 404 m n^6 - 420 m^2 n^6 + 175 m^3 n^6 \bigr)
\end{aligned}
$$
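A formula of this size is easy to mistype, so a numerical cross-check is worthwhile. The Python sketch below is not from the slides: it stores the coefficients of the polynomial above (using its symmetry in m and n) and compares the resulting value with the 8th central moment obtained by brute-force enumeration of all rank placements for small samples. The names `COEFFS`, `slide_eighth_moment` and `exact_central_moment` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Coefficients of m^i n^j for i >= j, read off the formula above;
# the term m^j n^i carries the same coefficient (symmetry in m and n).
COEFFS = {
    (1, 0): -96, (2, 0): 96, (3, 0): 240, (4, 0): -240, (5, 0): -432, (6, 0): -144,
    (1, 1): 192, (2, 1): 224, (3, 1): -540, (4, 1): -100, (5, 1): 780, (6, 1): 404,
    (2, 2): -600, (3, 2): -200, (4, 2): 900, (5, 2): -48, (6, 2): -420,
    (3, 3): 1095, (4, 3): -395, (5, 3): -735, (6, 3): 175,
    (4, 4): -630, (5, 4): 525,
}

def slide_eighth_moment(m, n):
    """Evaluate the closed-form 8th central moment exactly."""
    total = 0
    for (i, j), c in COEFFS.items():
        total += c * m**i * n**j
        if i != j:
            total += c * m**j * n**i
    return Fraction(m * n * (1 + m + n) * total, 34560)

def exact_central_moment(m, n, r):
    """r-th central moment of M_{m,n} under H0 by direct enumeration."""
    shift = m * (m + 1) // 2  # W = M + m(m+1)/2
    values = [sum(s) - shift for s in combinations(range(1, m + n + 1), m)]
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** r for v in values) / len(values)

print(slide_eighth_moment(2, 2), exact_central_moment(2, 2, 8))
```

For m = n = 2 both routes give 257/3. Enumeration grows quickly with the sample sizes, so it serves only as a spot check of the closed form.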


Conclusions

• generating functions are also useful in nonparametric statistics

• computer algebra is a natural tool for mathematicians

• asymptotics and exact calculations complement each other


Topics under investigation

• tests for censored data

• power calculations

• nonparametric ANOVA (Kruskal-Wallis, block designs, multiple comparisons)

• Spearman’s ρ (rank correlation)

• multimedia / World Wide Web implementation

Click on underlined items to obtain the PostScript file of the technical report


References

• A. Di Bucchianico, Combinatorics, computer algebra and the Wilcoxon-Mann-Whitney test, to appear in J. Stat. Plann. Inf.

• B. Streitberg and J. Röhmel, Exact distributions for permutation and rank tests: An introduction to some recently published algorithms, Stat. Software Newsletter 12 (1986), 10-18


References (continued)

• M.A. van de Wiel, Exact distributions of nonparametric statistics using computer algebra, Master’s Thesis, TUE, 1996

• M.A. van de Wiel and A. Di Bucchianico, The exact distribution of Spearman’s rho, technical report

• M.A. van de Wiel, A. Di Bucchianico and P. van der Laan, Exact distributions of nonparametric test statistics using computer algebra, technical report


References (continued)

• M.A. van de Wiel, Edgeworth expansions with exact cumulants for two-sample linear rank statistics, technical report

• M.A. van de Wiel, Exact distributions of two-sample rank statistics and block rank statistics using computer algebra, technical report


The End