Robust and Efficient One-way MANOVA Tests -
Supplemental Material
Stefan Van Aelst and Gert Willems
Department of Applied Mathematics and Computer Science
Ghent University, Krijgslaan 281 S9, B-9000 Ghent, Belgium.
Abstract
In this supplemental report we present additional results related to the robust MANOVA
tests introduced in the accompanying manuscript. We show Fisher-consistency of the k-sample
S- and MM-estimators at unimodal elliptical distributions. We provide simulation results to
verify the performance of the bootstrap-based tests in finite samples. Finally, we illustrate the
use of these robust test statistics on a second real data example. The Appendix also contains
the proofs of the theorems in the manuscript.
Keywords: MM-estimators, Fast and Robust Bootstrap, Wilks’ Lambda, Outliers.
1 Introduction
In the accompanying manuscript we introduced the robust MANOVA test statistics SΛ1_F,
SΛ2a_F, and SΛ2b_F based on one-sample and k-sample S-estimators; see respectively expressions
(23), (7) and (8) of the manuscript. Based on MM-estimators we introduced the related test
statistics MMΛa_F and MMΛb_F, given by expressions (11) and (12) in the manuscript. As in
the manuscript, we replace the subscript F by n in the finite-sample case. To estimate the
p-values of the robust tests based on these test statistics, we introduced a fast and robust
bootstrap (FRB) procedure.
In the next section, we provide a proposition that shows Fisher-consistency of the k-sample
S- and MM-estimators at unimodal elliptical distributions under fairly weak conditions on the
loss functions. Section 3 contains further results on the finite-sample accuracy of the FRB-based
robust tests. In Section 4 we provide further simulation results regarding the robustness of the
null distribution of the test statistics, irrespective of how the p-value is estimated. Section 5
first investigates the power of the tests by examining to what extent the distribution of the
test statistics under the alternative differs from the null distribution. We then investigate the
effect of contamination on the power of the robust tests when the p-value is estimated by FRB.
Section 6 contains a second real data example where we illustrate the use of the MM-based
tests for both multivariate and univariate ANOVA. The Appendix contains the proof of the
Fisher-consistency result and the proofs of the theorems in the manuscript.
2 Fisher-consistency
In this section we investigate Fisher-consistency of the k-sample S and MM-estimators when the
groups follow a unimodal elliptical distribution with possibly different centers, i.e. Fj = F_{µj,Σ}.
The proof of this proposition, which is given in the Appendix, is similar to the Fisher-consistency
proof for general multivariate location M-estimators in Alqallaf, Van Aelst, Yohai and Zamar
(2009).
Proposition 1. Suppose that ρ0 and ρ1 are non-decreasing, continuous and strictly increasing
at zero. Moreover, suppose that the groups have elliptical distributions Fj = F_{µj,Σ} (j = 1, . . . , k)
with density

f_{µj,Σ}(x) = f( (x − µj)^t Σ^{-1} (x − µj) ).

The scatter matrix Σ can be decomposed into a scale part σ and a shape part Γ, i.e. Σ = σ²Γ.
The function f is assumed to be non-increasing, continuous and strictly decreasing at zero,
such that the distributions Fj are unimodal elliptical distributions. Let b0 = E_{F_{0,I}}[ρ0(∥x∥)].
Then µ̂^(k)_{j,F} = µj (j = 1, . . . , k), Σ̂^(k)_F = Σ, and µ̃^(k)_{j,F} = µj (j = 1, . . . , k), Γ̃^(k)_F = Γ. That is, the
k-sample S and MM-estimators are Fisher-consistent.
Note that the conditions on the loss functions ρ0 and ρ1 in Proposition 1 are fairly weak and
are clearly satisfied for loss functions that satisfy conditions (R1) and (R2) of the manuscript.
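Conditions of this kind are satisfied, for instance, by Tukey's biweight loss, a common choice for S- and MM-estimators. The sketch below is an illustration only (the tuning constant 4.685 and the normalization to [0, 1] are our assumptions; conditions (R1) and (R2) of the manuscript are not reproduced here):

```python
import numpy as np

def rho_biweight(u, c=4.685):
    """Tukey biweight loss, normalized to rise from 0 to 1.

    Non-decreasing, continuous, and strictly increasing at zero, as
    required of the loss functions in Proposition 1."""
    u = np.minimum(np.abs(u), c)          # constant (= 1) beyond the cutoff c
    return 1.0 - (1.0 - (u / c) ** 2) ** 3

u = np.linspace(0.0, 8.0, 401)
r = rho_biweight(u)                       # non-decreasing, flat for u >= c
```

The boundedness beyond the cutoff c is what gives the resulting estimators their resistance to outliers.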
3 Finite-sample accuracy of the FRB based tests
In this section we investigate further how closely the actual Type I error rates of our tests
match their nominal values in finite samples. As in Section 5 of the manuscript, we generated
data under the null hypothesis of equal means. In particular, the k samples were all drawn
from the multivariate standard normal distribution N(0, Ip), so µ1 = . . . = µk = 0. For
k = 2, 3 and 5 groups, we considered the combinations of dimension p and sample sizes nj
given in Table 1. As in the manuscript, for each number of groups k, we show the simulation
results of the cases ordered by increasing ratio n/p, which expresses the relative sample size
taking the dimension into account. The (id) column in Table 1 gives the position of each case
in the corresponding plots. For each case, 1000 samples were again generated and the tests
used FRB with B = 1000 bootstrap samples, yielding a p-value estimate for each test as
defined in (24).
k = 2            | k = 3                | k = 5
p n1 n2 (id)     | p n1 n2 n3 (id)      | p n1 n2 n3 n4 n5 (id)
2 10 10 (2)      | 2 10 10 10 (2)       | 2 10 10 10 10 10 (2)
2 20 20 (7)      | 2 20 20 20 (8)       | 2 20 20 20 20 20 (8)
2 30 30 (9)      | 2 30 30 30 (10)      | 2 30 30 30 30 30 (10)
2 50 50 (12)     | 2 50 50 50 (13)      | 2 50 50 50 50 50 (13)
2 100 100 (14)   | 2 100 100 100 (15)   | 2 100 100 100 100 100 (14)
2 200 200 (15)   | 2 20 20 10 (5)       | 2 20 20 10 10 10 (5)
2 20 10 (5)      | 2 30 30 10 (9)       | 2 50 50 10 10 10 (9)
2 30 10 (8)      | 2 50 50 20 (12)      | 2 100 50 20 20 20 (12)
2 50 20 (11)     | 2 100 50 20 (14)     | 6 20 20 20 20 20 (1)
6 20 20 (1)      | 6 20 20 20 (1)       | 6 30 30 30 30 30 (3)
6 30 30 (3)      | 6 30 30 30 (3)       | 6 50 50 50 50 50 (7)
6 50 50 (6)      | 6 50 50 50 (6)       | 6 100 100 100 100 100 (11)
6 100 100 (10)   | 6 100 100 100 (11)   | 6 50 50 20 20 20 (4)
6 200 200 (13)   | 6 50 50 20 (4)       | 6 100 50 20 20 20 (6)
6 50 20 (4)      | 6 100 50 20 (7)      |
Table 1: Simulation settings under the null hypothesis to investigate the accuracy of the FRB
tests.
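The Monte Carlo logic of this experiment can be mimicked with classical ingredients only. The sketch below is a hedged stand-in: it uses the classical Wilks' Lambda with Bartlett's chi-square approximation instead of the S/MM-statistics with FRB (which are not reproduced here), but the scheme — generate null samples, test at the 5% level, record the rejection rate — is the same. All constants (k = 3, p = 2, nj = 50, 500 replications) are illustrative choices:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
k, p, nj, reps, level = 3, 2, 50, 500, 0.05
crit = chi2.ppf(1 - level, p * (k - 1))
rej = 0
for _ in range(reps):
    groups = [rng.standard_normal((nj, p)) for _ in range(k)]  # H0: equal means
    X = np.vstack(groups)
    gm = X.mean(axis=0)
    # within- and between-group SSCP matrices
    W = sum((g - g.mean(0)).T @ (g - g.mean(0)) for g in groups)
    B = sum(nj * np.outer(g.mean(0) - gm, g.mean(0) - gm) for g in groups)
    lam = np.linalg.det(W) / np.linalg.det(W + B)              # Wilks' Lambda
    stat = -(k * nj - 1 - (p + k) / 2) * np.log(lam)           # Bartlett approximation
    rej += stat > crit
rate = rej / reps   # observed Type I error; should be near the nominal 0.05
```

Replacing the classical estimates and the chi-square reference by robust estimates and FRB-resampled p-values gives exactly the experiment reported in Table 1 and Figure 1.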
The curves in Figure 1 show the Type I error rates that would be observed if the tests were
performed at the 5% significance level. The top row in Figure 1 contains the results for k = 2,
the middle row for k = 3 and the bottom row for k = 5. The left plots consider tests based
on S-estimates, while the right plots consider MM-estimates. The results for other significance
levels were found to be similar.

[Figure 1 about here: six panels of observed Type I error rate (y-axis, 0 to 0.2) against case
number (x-axis, 0 to 15); legends: S1/S2a/S2b (left column), Ma/Mb (right column)]

Figure 1: Observed FRB-based Type I-error rates for nominal level 0.05, various sample sizes
and dimensions; the top panel contains the results for k = 2 groups, the middle panel for k = 3
groups, and the bottom panel for k = 5 groups; in each plot the cases are ordered as indicated
in Table 1.
The plots in Figure 1 confirm the conclusions in the manuscript. That is, the FRB-based
tests become more accurate as the relative sample size increases. Moreover, the FRB-based
tests are more accurate for MM-estimates than for S-estimates. Especially the SΛ2b_n test turns
out to be overly liberal. For the MM-based tests, the results for MMΛb_n are slightly better
than those for MMΛa_n. We conclude that the FRB is a sufficiently reliable method to obtain
p-values for the MM-based tests, even in small samples, while larger sample sizes are required
to obtain accurate p-values for the S-based tests.
4 Robustness of the null distribution
Here, we investigate the effect of contamination on the (true) null distribution of the test
statistics, which does not depend on the FRB or on how the p-value is estimated. We consider
the same test statistics as in Section 6 of the manuscript: the five test statistics based on
S/MM-estimators, the classical Wilks' Lambda test statistic, the rank-transformed MANOVA
test statistic, and the MCD test statistic.
The simulation setting is the same as in Section 6 of the manuscript. In particular, groups
X1 to X_{k−1} are generated according to N(0, Ip), and group Xk follows the contamination model

(1 − ϵ) N(0, Ip) + ϵ N(µc, Ip),

where µc = d (χ²_{p,.999}/p)^{1/2} and d = 2, 5 or 10. We consider the setting of k = 3, with sample
sizes nj = 20 or nj = 100 (j = 1, . . . , 3) and dimension p = 2 or p = 6. The outlier proportion
was fixed at ϵ = 0.10 and 1000 samples were generated for each case.
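For concreteness, the contaminated group can be generated as in the sketch below. Since N(0, Ip) is spherically symmetric, only the length d(χ²_{p,.999}/p)^{1/2} of the contamination center matters; placing it along the first coordinate axis is our assumption for this sketch:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
p, d, eps, nj = 2, 5, 0.10, 100
length = d * np.sqrt(chi2.ppf(0.999, p) / p)   # ||mu_c||
mu_c = np.zeros(p)
mu_c[0] = length                               # direction: assumption (spherical symmetry)
mask = rng.random(nj) < eps                    # which observations are outliers
Xk = rng.standard_normal((nj, p)) + np.where(mask[:, None], mu_c, 0.0)
```

With d = 5 and p = 2 the outliers sit about 13 standard deviations from the bulk of the data, i.e. clearly outside the 99.9% quantile contour of the clean distribution.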
Figures 2 and 3 show probability-probability plots of the empirical distribution of the test
statistics in the contaminated data versus the distribution in the non-contaminated data. In
particular, with N = 1000, the i-th data point in the plots (i = 1, . . . , N) equals

( i/(N+1), F̂_cont( F̂_norm^{-1}( i/(N+1) ) ) ),

where F̂_cont and F̂_norm denote the empirical distributions of the statistic in respectively the
contaminated and the non-contaminated samples. Large deviations from the diagonal indicate
an adverse effect of the outliers.
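These PP-plot points can be computed directly from the two sets of simulated statistics; a minimal numpy sketch (the function name and the interpolation choice for the empirical quantiles are ours):

```python
import numpy as np

def pp_points(stats_cont, stats_norm):
    """PP-plot points (i/(N+1), F_cont(F_norm^{-1}(i/(N+1)))), i = 1..N."""
    N = len(stats_norm)
    u = np.arange(1, N + 1) / (N + 1)
    q = np.quantile(stats_norm, u)                 # empirical quantiles of F_norm
    # empirical cdf of the contaminated statistics, evaluated at those quantiles
    ecdf = np.searchsorted(np.sort(stats_cont), q, side="right") / len(stats_cont)
    return u, ecdf
```

Feeding the same sample in twice yields points on the diagonal, i.e. the no-contamination reference line of Figures 2 and 3.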
Figure 2 shows the results for n1 = n2 = n3 = 20 and p = 2, where the outlier distance is
d = 2 in the upper row and d = 10 in the lower row. The plots on the left correspond to the
S-based test statistics, the middle plots to the MM-based test statistics, and the plots on the
right correspond to Wilks' Lambda, its rank-transformed version and the MCD test statistic.
Figure 3 shows the results for the case n1 = n2 = n3 = 100 and p = 6. It can be seen that
the S-based and MCD-based statistics are most resistant to the outliers in all cases, while the
MM-based statistics are only slightly more affected. Neither the classical Wilks' Lambda nor
its rank-transformed version is very robust, as is especially clear in Figure 3.

[Figure 2 about here: six PP-plot panels, top row labeled "Outlier Distance = 2", bottom row
"Outlier Distance = 10"; legends: S1/S2a/S2b, Ma/Mb, Clas/Rank/MCD]

Figure 2: PP-plots comparing the distribution of the test statistics under the null hypothesis
in contaminated versus non-contaminated samples, with n1 = n2 = n3 = 20, p = 2.

[Figure 3 about here: analogous PP-plot panels for the larger sample size and dimension]

Figure 3: PP-plots comparing the distribution of the test statistics under the null hypothesis
in contaminated versus non-contaminated samples, with n1 = n2 = n3 = 100, p = 6.
5 Power of the tests
Now we compare the finite-sample power of the tests. We generate samples under the
alternative hypothesis Ha as in Section 7 of the manuscript. In particular, groups X1 to X_{k−1}
are generated according to N(0, Ip) as before, but group Xk is generated from N(µd, Ip) where
µd = (d, 0, . . . , 0) with d = 0.2, 0.5, 0.7, 1, 1.5 or 2. For k = 3 groups, we generated 1000 samples
for the same combinations of dimension and sample size as in the previous section.
We first examine to what extent the test statistics can differentiate between H0 and Ha,
independently of how the p-values are estimated. This corresponds to the power we would
obtain if the exact null distribution were at our disposal. Figures 4 and 5 show probability-
probability plots of the empirical distribution under Ha versus that under H0. Figure 4 shows
the case n1 = n2 = n3 = 20 and p = 2, while Figure 5 shows the case n1 = n2 = n3 = 20 and
p = 6. For nj = 100 the power was consistently very high, and these results are omitted here.
In both figures, the top row corresponds to d = 0.5 and the bottom row to d = 1. The
eight test statistics are distributed over the three plots as before, but the curve for the classical
Wilks’ Lambda is shown in each of the three plots for ease of comparison. The classical Wilks’
Lambda, being the likelihood ratio statistic under normality, obviously has the highest power.
However, the robust MM-based statistics are almost equally powerful. They are slightly better
than the rank-based statistic and also more powerful than the MCD-based test, especially
in higher dimensions as seen in Figure 5. The test statistics based on S-estimates clearly
have lower power than their MM-based counterparts, although their performance is better in
higher dimensions (it is well-known that S-estimates become rapidly more efficient in higher
dimensions).
[Figure 4 about here: six size-power panels, top row d = 0.5, bottom row d = 1; legends:
Clas/S1/S2a/S2b, Clas/Ma/Mb, Clas/Rank/MCD]

Figure 4: Size-power curves, comparing the distribution of the test statistics under Ha versus
under H0 for the case n1 = n2 = n3 = 20, p = 2.

[Figure 5 about here: analogous size-power panels for p = 6]

Figure 5: Size-power curves, comparing the distribution of the test statistics under Ha versus
under H0 for the case n1 = n2 = n3 = 20, p = 6.

We now examine the possible effect of outliers on the power of the tests. For each case, we
generate 1000 random samples under Ha as above, but the observations in Xk are contaminated
so that the classical test is tempted to accept the null hypothesis that Xk has the same mean
as the other groups. This is accomplished by drawing the observations in Xk from

(1 − ϵ) N(µd, Ip) + ϵ N(µ_{−d/ϵ}, Ip)

with µd = (d, 0, . . . , 0) as before and similarly µ_{−d/ϵ} = (−d/ϵ, 0, . . . , 0). This is indeed the
worst-case contamination scenario for the power of the classical Wilks' Lambda, while it is
likely approximately worst-case for the robust tests as well. We again take ϵ = 0.10 and
consider d = 0.5, 0.7, 1, 1.5 and 2.
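As a numeric sanity check on this contamination scheme (a sketch; the sample size is chosen only to tame Monte Carlo noise): the mixture mean of the first coordinate is (1 − ϵ)d + ϵ(−d/ϵ) = −ϵd, so the shift d that the test should detect is almost entirely masked by the outliers:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, d, eps = 200_000, 2, 1.0, 0.10
mu_d = np.zeros(p)
mu_d[0] = d
mu_out = np.zeros(p)
mu_out[0] = -d / eps
mask = rng.random(n) < eps                     # outlier indicators
Xk = rng.standard_normal((n, p)) + np.where(mask[:, None], mu_out, mu_d)
mean1 = Xk[:, 0].mean()   # close to -eps*d = -0.1, far from the true shift d = 1
```

A test based on the sample mean therefore sees almost no group difference, which is exactly the power breakdown of the classical test visible in Figure 6.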
Figure 6 shows the observed power under contamination for sample sizes n1 = n2 = n3 = 20
in p = 2 and p = 6 dimensions. Hence, these results can be compared to the power found
without contamination in Figure 5 of the manuscript. First note that, not surprisingly, the
classical test shows complete power breakdown. It can further be seen that all the robust
alternatives lose only a small amount of power. Moreover, the outliers have not affected the
performance order of the robust tests, with MMΛa_n and MMΛb_n yielding the most powerful
tests.
From the influence functions in Theorem 2 of the manuscript, we know that the MM-based
statistics have a larger bias than the S-based robust test statistics for intermediate outliers.
For larger samples, this larger bias may weaken the MM-based tests in comparison with their
competitors. This can be seen in Figure 7, which shows the results for sample sizes
n1 = n2 = n3 = 100 in p = 2 and p = 6 dimensions respectively. While the power now
converges more rapidly to 1 as the distance d increases, the S and MCD tests enjoy better
power for small d. Indeed, these estimators have a higher variance but a lower bias than the
MM-estimators for nearby outliers. Especially for large n this bias effect prevails, yielding a
disadvantage for the MM-based tests.
[Figure 6 about here: power versus d, three panels per row; legends: Clas/S1/S2a/S2b,
Clas/Ma/Mb, Clas/Rank/MCD]

Figure 6: Power for the case n1 = n2 = n3 = 20 with 10% outliers in the last group. The top
panel considers the case p = 2 for Ha: µ3 = (d, 0). The bottom panel considers the case p = 6
for Ha: µ3 = (d, 0, . . . , 0).

6 Example: Biting flies

We consider the biting flies data from Johnson and Wichern (1998), consisting of two groups
of 35 flies (Leptoconops torrens and Leptoconops carteri), on which 7 characteristics were
measured. The second group contains an obvious outlier in the second variable (wing width),
as can be seen from the boxplot in Figure 8. It appears that for this variable the location of
the second group (L. carteri) is higher than that of the first group, but the second group has
one extremely low outlier with value 19 which affects the sample mean of this group.
Let us start with the MANOVA tests using all 7 variables. Because the sample size is small,
we use the 95% efficient MM-based test statistics, with FRB based on B = 5000 bootstrap
samples to estimate the p-values of the tests. We compare the results to the classical test and
the MCD-based test. Table 2 shows the p-values for each of the tests. From this table we
immediately see that all tests clearly reject the null hypothesis of equal centers, although the
robust tests downweight the effect of the outlier and the classical Wilks' Lambda does not.
Hence, the outlier in the data had no negative effect on the overall classical Wilks' Lambda
test in this case. This is a consequence of the large differences in sample means that are
present for some of the other variables in the dataset.

          Clas    MMΛa_n   MMΛb_n   MCD-test
p-value   .0000   .0034    .0002    .0000

Table 2: p-values for the classical and robust MANOVA tests applied on the biting flies data
to investigate a mean difference between the two species.

[Figure 7 about here: power versus d for n1 = n2 = n3 = 100; legends: Clas/S1/S2a/S2b,
Clas/Ma/Mb, Clas/Rank/MCD]

Figure 7: Power for the case n1 = n2 = n3 = 100 with 10% outliers in the last group. The top
panel considers the case p = 2 for Ha: µ3 = (d, 0). The bottom panel considers the case p = 6
for Ha: µ3 = (d, 0, . . . , 0).
[Figure 8 about here: horizontal boxplots of wing width (scale 20 to 50) for groups 1 and 2]

Figure 8: Biting flies data: boxplots of wing width measurements in both groups.
Let us now consider the simple setting of univariate one-way ANOVA for the wing width
measurements, which may be part of a post hoc analysis to find out in which variables the two
groups differ. Note that also in the univariate case, outlier-robust testing for equality of means
has received rather little attention in the literature and in statistical software. The tests
proposed in this paper are also useful in this setting. To illustrate the robustness advantage of
our tests, we again compare them to the classical test (which here simplifies to a two-sample
t-test). The t-test yields a p-value of 0.398. To illustrate the effect of the outlier, we repeat
the analysis without it. After removing the outlier (observation x2,1), the t-test finds a
significant difference in means, with p = 0.021. The corresponding empirical means can be
found in the left panel of Table 3.
               sample means and t-test      MM-centers and MMΛ_n test
               µ1       µ2       p          µ1       µ2       p
full data      42.91    43.74    .398       42.98    44.56    .021
x2,1 removed   42.91    44.47    .021       42.99    44.57    .023

Table 3: Location estimates for the two groups of biting flies and the corresponding p-values for
the classical t-test and the robust MM-based test, both before and after removal of the outlier.
Clearly, the outlier now critically influenced the classical test. Therefore, let us now consider
the robust MM-based tests. Note that for univariate data the test statistics (11) and (12)
coincide; hence we denote them by MMΛ_n in this case. Table 3 also contains the MM-estimates
of location. It can be seen that the outlier has very little effect on these estimates, as expected.
Figure 9 shows the histogram of the bootstrapped test statistics MMΛ*_n. The value of the test
statistic in the original sample was MMΛ_n = 0.939 and is indicated in Figure 9 by the vertical
line. In 106 out of the 5000 bootstrap samples we obtained MMΛ*_n < 0.939, which yields
p̂ = 0.021 according to (24). We thus find that the robust test behaves similarly to the classical
test after removal of the outlier. For the data without the outlier, the estimated p-value of the
MM-based test becomes p̂ = 0.023, which shows that the outlier also has very limited influence
on the robust test.
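The p-value computation in this last step is simply the fraction of bootstrap statistics below the observed one (small values of the Lambda-type statistic are evidence against H0); definition (24) of the manuscript is not reproduced here, so this one-sided fraction is a hedged reading of it:

```python
import numpy as np

def bootstrap_pvalue(stat_obs, boot_stats):
    """Estimated p-value: fraction of bootstrap statistics below stat_obs."""
    return float(np.mean(np.asarray(boot_stats) < stat_obs))
```

With 106 of the B = 5000 bootstrap values below MMΛ_n = 0.939, this gives 106/5000 ≈ 0.021, matching the reported p̂.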
[Figure 9 about here: histogram of MMΛ*_n values between 0.85 and 1, counts up to 2000, with
a vertical line at the observed statistic]

Figure 9: Biting flies data: histogram of bootstrapped MMΛ_n values; the vertical line indicates
the MM-based test statistic in the original sample.
Appendix
Proof of Proposition 1
Let us first consider one-sample S-estimators (µ̂^(1)_F, Σ̂^(1)_F), i.e. k = 1. Due to affine equiv-
ariance of the estimators, we only have to consider the case µ = 0 and Σ = I, i.e. F = F_{0,I}.
Clearly, taking b0 = E_{F_{0,I}}[ρ0(∥x∥)] implies that the choice (0, I) satisfies constraint (2) of
the manuscript and therefore is a candidate solution. Now suppose that there exists another
solution (m, S) ≠ (0, I) that satisfies constraint (2) in the manuscript with |S| ≤ |I| = 1. Let
us decompose S into a scale part s and a shape part G as before, i.e. S = s²G with |G| = 1 and
s ≤ 1. Then, from constraint (2) in the manuscript, we obtain

b0 = E_{F_{0,I}}( ρ0( [(x − m)^t S^{-1} (x − m)]^{1/2} ) )
   = E_{F_{0,I}}( ρ0( [(x − m)^t G^{-1} (x − m)]^{1/2} / s ) )
   ≥ E_{F_{0,I}}( ρ0( [(x − m)^t G^{-1} (x − m)]^{1/2} ) ),

where the last inequality holds because ρ0 is non-decreasing. Hence, to show that µ̂^(1)_F = 0 and
Σ̂^(1)_F = I, it suffices to show that for all (m, G) ≠ (0, I) with |G| = 1 we have that

E_{F_{0,I}}( ρ0( [(x − m)^t G^{-1} (x − m)]^{1/2} ) ) > b0.   (1)
We start by showing that for all z > 0

P( [(x − m)^t G^{-1} (x − m)]^{1/2} ≤ z ) ≤ P( ∥x∥ ≤ z ).   (2)

Let A1 = { [(x − m)^t G^{-1} (x − m)]^{1/2} ≤ z }, A2 = { ∥x∥ ≤ z }, and B = A1 ∩ A2. Then, the
left-hand side of (2) can be rewritten as

P( [(x − m)^t G^{-1} (x − m)]^{1/2} ≤ z )
   = ∫_{A1} dF_{0,I}(x) = ∫_{A1∩A2} f(x^t x) dx + ∫_{A1\A2} f(x^t x) dx
   ≤ ∫_{A1∩A2} f(x^t x) dx + f(z²) ∫_{A1\A2} dx
   = ∫_{A1∩A2} f(x^t x) dx + f(z²) ∫_{A2\A1} dx
   ≤ ∫_{A1∩A2} f(x^t x) dx + ∫_{A2\A1} f(x^t x) dx
   = ∫_{A2} dF_{0,I}(x) = P( ∥x∥ ≤ z ).

The inequality in the second step holds because f is non-increasing and ∥x∥ > z for all
x ∈ A1 \ A2. The equality in the third step holds because |G| = |I| = 1 implies that
vol(A1) = vol(A2) and thus also vol(A1 \ A2) = vol(A2 \ A1), where vol stands for volume.
Similarly as in the second step, the inequality in the fourth step follows by noting that ∥x∥ ≤ z
for all x ∈ A2 \ A1 and f is non-increasing.

Because f is assumed to be strictly decreasing at zero, a similar derivation shows that for
(m, G) ≠ (0, I) there exists ε > 0 such that for z ≤ ε we have

P( [(x − m)^t G^{-1} (x − m)]^{1/2} ≤ z ) < P( ∥x∥ ≤ z ).   (3)
Results (2)-(3), together with the conditions on ρ0, imply that ρ0( [(x − m)^t G^{-1} (x − m)]^{1/2} )
is stochastically larger than ρ0(∥x∥), which immediately yields (1) and hence Fisher-consistency
of the one-sample S-estimators.
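Inequality (2) can be checked by Monte Carlo for a concrete choice of (m, G) (the values below are arbitrary assumptions; the constraint |G| = 1 is respected):

```python
import numpy as np

rng = np.random.default_rng(4)
n, z = 200_000, 1.2
x = rng.standard_normal((n, 2))                 # x ~ F_{0,I}
m = np.array([1.0, 0.0])
G_inv = np.diag([0.5, 2.0])                     # G = diag(2, 0.5), det(G) = 1
# distances [(x - m)^t G^{-1} (x - m)]^{1/2}
dist = np.sqrt(np.einsum('ij,jk,ik->i', x - m, G_inv, x - m))
p_shifted = np.mean(dist <= z)                  # left-hand side of (2)
p_origin = np.mean(np.linalg.norm(x, axis=1) <= z)   # right-hand side of (2)
```

The shifted and reshaped region has the same volume as the ball of radius z but captures less mass of the unimodal density, which is exactly the volume argument in the proof.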
Now consider k-sample S-estimators. Since the S-estimators only use within-group dis-
tances, they are affine equivariant, and thus we only have to consider the case µj = 0 (j =
1, . . . , k) and Σ = I, i.e. Fj = F_{0,I} for j = 1, . . . , k. As before, taking b0 = E_{F_{0,I}}[ρ0(∥x∥)]
implies that the choice (0, . . . , 0, I) satisfies constraint (3) of the manuscript and therefore
is a candidate solution. As in the one-sample case, to show that there does not exist another
solution (m1, . . . , mk, S) ≠ (0, . . . , 0, I) with |S| ≤ |I| = 1 that satisfies constraint (3) of the
manuscript, we need to show that

Σ_{j=1}^k πj E_{F_{0,I}}( ρ0( [(x − mj)^t G^{-1} (x − mj)]^{1/2} ) ) > b0,   (4)

where S = s²G with |G| = 1 as before. The derivation for the one-sample case above shows
that for each of the expectations in (4) we have

E_{F_{0,I}}( ρ0( [(x − mj)^t G^{-1} (x − mj)]^{1/2} ) ) > b0   (5)

if (mj, S) ≠ (0, I), which immediately implies (4) and hence Fisher-consistency of the k-sample
S-estimators.
The k-sample MM-estimators are also based on within-group distances, so they are affine
equivariant, and thus we again only have to consider the case µj = 0 (j = 1, . . . , k) and Σ = I.
The k-sample MM-estimators (µ̃^(k)_{1,F}, . . . , µ̃^(k)_{k,F}, Γ̃^(k)_F) then minimize

Σ_{j=1}^k πj E_{F_{0,I}}( ρ1( [(x − mj)^t G^{-1} (x − mj)]^{1/2} / σ̂^(k)_F ) )

with |G| = 1. Here σ̂^(k)_F is the k-sample S-scale, which equals 1 due to Fisher-consistency of the
k-sample S-estimators. Hence, to show Fisher-consistency of the k-sample MM-estimators, we
need that

Σ_{j=1}^k πj E_{F_{0,I}}( ρ1( [(x − mj)^t G^{-1} (x − mj)]^{1/2} ) ) > E_{F_{0,I}}[ρ1(∥x∥)] = b1   (6)

for all (m1, . . . , mk, G) ≠ (0, . . . , 0, I) with |G| = 1. Similarly as for ρ0, results (2)-(3), together
with the conditions on ρ1, imply that for j = 1, . . . , k

E_{F_{0,I}}( ρ1( [(x − mj)^t G^{-1} (x − mj)]^{1/2} ) ) > b1   (7)

if (mj, G) ≠ (0, I), which immediately implies (6) and hence Fisher-consistency of the k-sample
MM-estimators.
Proof of Theorem 1 of the manuscript:
Because of equivariance we assume, without loss of generality, that Σ = Ip and µ = 0. First
note that the delta method implies that

n(1 − SΛ1_n) = (1/(2p)) n ( 1 − |Σ̂^(k)| / |Σ̂^(1)| ) + op(1).   (8)

In the following we drop the subscript n in the notation of the estimates, for convenience. To
simplify the derivation, we first rewrite the right-hand side of (8) as follows:

n( 1 − |Σ̂^(k)| / |Σ̂^(1)| ) = n( 1 − |(Σ̂^(1))^{-1}| / |(Σ̂^(k))^{-1}| )
   = ( 1 / |(Σ̂^(k))^{-1}| ) n ( |(Σ̂^(k))^{-1}| − |(Σ̂^(1))^{-1}| ).   (9)

Furthermore, by considering the Taylor expansion

|(Σ̂^(1))^{-1}| = 1 + tr( (Σ̂^(1))^{-1} − Ip ) + op(n^{-1}),

and similarly for |(Σ̂^(k))^{-1}|, we can rewrite the difference of determinants on the right-hand
side of (9) as

|(Σ̂^(k))^{-1}| − |(Σ̂^(1))^{-1}| = tr( (Σ̂^(k))^{-1} − (Σ̂^(1))^{-1} ) + op(n^{-1}).   (10)

This allows us to work with traces rather than determinants in the remainder.
We now expand the following two representations:

(1/n) Σ_{i=1}^n ρ0( [(xi − µ̂^(1))^t (Σ̂^(1))^{-1} (xi − µ̂^(1))]^{1/2} ) = b0   (11)

(1/n) Σ_{j=1}^k Σ_{i∈Πj} ρ0( [(xi − µ̂^(k)_j)^t (Σ̂^(k))^{-1} (xi − µ̂^(k)_j)]^{1/2} ) = b0   (12)

where µ̂^(1) denotes the one-sample S-estimate and µ̂^(k)_1, . . . , µ̂^(k)_k the k-sample S-estimates.
The second-order expansion of (11) is given by

b0 = (1/n) Σ_{i=1}^n ρ0(∥xi∥) − ( (1/n) Σ_{i=1}^n (ψ0(∥xi∥)/∥xi∥) xi^t ) (µ̂^(1) − µ)
   + tr[ ( (1/n) Σ_{i=1}^n (ψ0(∥xi∥)/(2∥xi∥)) xi xi^t ) ( (Σ̂^(1))^{-1} − Ip ) ]
   + (1/2) (µ̂^(1) − µ)^t ( (1/n) Σ_{i=1}^n ( ψ0'(∥xi∥)/∥xi∥ − ψ0(∥xi∥)/∥xi∥² ) xi xi^t / ∥xi∥
      + (1/n) Σ_{i=1}^n (ψ0(∥xi∥)/∥xi∥) Ip ) (µ̂^(1) − µ)
   + An[µ, (Σ̂^(1))^{-1}] + Bn[(Σ̂^(1))^{-1}] + op(n^{-1}).

Here, An[µ, (Σ̂^(1))^{-1}] represents the mixed second-order term and Bn[(Σ̂^(1))^{-1}] represents the
term involving the second derivative with respect to (Σ̂^(1))^{-1}. The former can be shown to be
of order op(n^{-1}) and may therefore be dropped from the further derivation.
We have

(1/n) Σ_{i=1}^n ( ψ0'(∥xi∥)/∥xi∥ − ψ0(∥xi∥)/∥xi∥² ) xi xi^t / ∥xi∥ + (1/n) Σ_{i=1}^n (ψ0(∥xi∥)/∥xi∥) Ip →a.s. β0 Ip

and

(1/n) Σ_{i=1}^n (ψ0(∥xi∥)/(2∥xi∥)) xi xi^t →a.s. (γ0/(2p)) Ip.

Hence, we can write

(γ0/(2p)) n tr[ (Σ̂^(1))^{-1} − Ip ] = n( b0 − (1/n) Σ_{i=1}^n ρ0(∥xi∥) )
   + n ( (1/n) Σ_{i=1}^n (ψ0(∥xi∥)/∥xi∥) xi^t ) (µ̂^(1) − µ)
   − (1/2) β0 n (µ̂^(1) − µ)^t (µ̂^(1) − µ) + n Bn[(Σ̂^(1))^{-1}] + op(1).
Similarly we expand (12) and then take the difference, which yields

(γ0/(2p)) n tr[ (Σ̂^(k))^{-1} − (Σ̂^(1))^{-1} ]
   = n Σ_{j=1}^k πj ( (1/nj) Σ_{i∈Πj} (ψ0(∥xi∥)/∥xi∥) xi^t ) (µ̂^(k)_j − µ̂^(1))
   − (1/2) β0 n Σ_{j=1}^k πj (µ̂^(k)_j − µ)^t (µ̂^(k)_j − µ) + (1/2) β0 n (µ̂^(1) − µ)^t (µ̂^(1) − µ)
   + n Bn[(Σ̂^(k))^{-1}] − n Bn[(Σ̂^(1))^{-1}] + op(1).

It can be shown that the difference Bn[(Σ̂^(k))^{-1}] − Bn[(Σ̂^(1))^{-1}] is of order op(n^{-1}), so this
term can henceforth be omitted.
We now use the first-order approximations for the location estimates:

√n (µ̂^(1) − µ) = (1/β0) √n ( (1/n) Σ_{i=1}^n (ψ0(∥xi∥)/∥xi∥) xi ) + op(1),

√n (µ̂^(k)_j − µ) = (1/β0) √n ( (1/nj) Σ_{i∈Πj} (ψ0(∥xi∥)/∥xi∥) xi ) + op(1),   j = 1, . . . , k.

Direct use of these approximations yields

(γ0/(2p)) n tr[ (Σ̂^(k))^{-1} − (Σ̂^(1))^{-1} ]
   = (1/(2β0)) [ n Σ_{j=1}^k πj ( (1/nj) Σ_{i∈Πj} (ψ0(∥xi∥)/∥xi∥) xi )^t ( (1/nj) Σ_{i∈Πj} (ψ0(∥xi∥)/∥xi∥) xi )
   − n ( (1/n) Σ_{i=1}^n (ψ0(∥xi∥)/∥xi∥) xi )^t ( (1/n) Σ_{i=1}^n (ψ0(∥xi∥)/∥xi∥) xi ) ] + op(1).
Now, denote

Zj = (1/nj) Σ_{i∈Πj} (ψ0(∥xi∥)/∥xi∥) xi   (j = 1, . . . , k),

such that we can write

(β0 γ0 / p) n tr[ (Σ̂^(k))^{-1} − (Σ̂^(1))^{-1} ]
   = n Σ_{j=1}^k πj Zj^t Zj − n ( Σ_{j=1}^k πj Zj )^t ( Σ_{j=1}^k πj Zj ) + op(1).   (13)

Consider the following (Helmert-type) transformation of these variables:

Y1 = Σ_{j=1}^k πj Zj
Y2 = (π1 Z1 − π1 Z2) / ( π1 + π1²/π2 )^{1/2}
Y3 = (π1 Z1 + π2 Z2 − (π1 + π2) Z3) / ( π1 + π2 + (π1 + π2)²/π3 )^{1/2}
. . .
Yk = ( Σ_{j=1}^{k−1} πj Zj − ( Σ_{j=1}^{k−1} πj ) Zk ) / ( Σ_{j=1}^{k−1} πj + ( Σ_{j=1}^{k−1} πj )²/πk )^{1/2}

It can be shown that this is an orthogonal transformation, such that the Yj are independent
and

Σ_{j=1}^k πj Zj^t Zj = Σ_{j=1}^k Yj^t Yj.

Noting that the second term on the right-hand side of (13) equals n Y1^t Y1, we may rewrite
the equation as

(β0 γ0 / p) n tr[ (Σ̂^(k))^{-1} − (Σ̂^(1))^{-1} ] = n Σ_{j=2}^k Yj^t Yj + op(1)   (14)

(note that the sum now starts from j = 2 instead of j = 1). For n → ∞ we have that
Zj →D N( 0, (1/(πj n)) α0 Ip ),

independently for each j = 1, . . . , k, where α0 = (1/p) E_{F_{0,I}}[ψ0²(∥x∥)]. Hence,

Yj →D N( 0, (1/n) α0 Ip ),

independently for each j = 1, . . . , k. We therefore have in equation (14) a sum of squares of
p(k − 1) independent (asymptotically) normally distributed variables, leading to the result

n tr[ (Σ̂^(k))^{-1} − (Σ̂^(1))^{-1} ] = ( E_{F_{0,I}}[ψ0²(∥x∥)] / (γ0 β0) ) χ²_{p(k−1)} + op(1).

Then by (8), (10), and since the remaining factor in (9) converges almost surely to 1 under
the assumption Σ = Ip, we finally obtain

n(1 − SΛ1_n) = ( E_{F_{0,I}}[ψ0²(∥x∥)] / (2p γ0 β0) ) χ²_{p(k−1)} + op(1).
The proofs for the other test statistics proceed along the same lines.
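The Helmert-type identity Σj πj Zj^t Zj = Σj Yj^t Yj used in the proof of Theorem 1 can be verified numerically (the random Zj and group weights below are arbitrary; the identity relies on Σj πj = 1):

```python
import numpy as np

rng = np.random.default_rng(5)
k, p = 3, 2
pi = np.array([20.0, 30.0, 50.0])
pi /= pi.sum()                              # pi_j = n_j / n, sums to 1
Z = rng.standard_normal((k, p))             # arbitrary Z_j vectors

Y = [pi @ Z]                                # Y_1 = sum_j pi_j Z_j
for m in range(1, k):
    s = pi[:m].sum()
    # Y_{m+1} = (sum_{j<=m} pi_j Z_j - s Z_{m+1}) / sqrt(s + s^2 / pi_{m+1})
    Y.append((pi[:m] @ Z[:m] - s * Z[m]) / np.sqrt(s + s**2 / pi[m]))

lhs = sum(pi[j] * Z[j] @ Z[j] for j in range(k))
rhs = sum(y @ y for y in Y)                 # equals lhs up to rounding
```

Since the second term in (13) is n Y1^t Y1, dropping Y1 from the sum leaves exactly the quadratic form in (14).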
Proof of Theorem 2 of the manuscript:
The result requires taking second derivatives of the statistics, which is straightforward but
tedious. To keep the notation simple, we sketch the derivation here for the univariate case
only (p = 1). We write the test statistics explicitly as functionals, i.e. as functions of the
distribution F. In the univariate case, the functionals of the test statistics become

SΛ1(F) = σ^(k)(F) / σ^(1)(F)

SΛ2.(F) = Σ_{j=1}^k πj E_{Fj}[ ρ0( (x − µ^(k)_j(F)) / σ^(k)(F) ) ] / E_{F1}[ ρ0( (x − µ^(1)(F)) / σ^(k)(F) ) ]

MMΛ.(F) = Σ_{j=1}^k πj E_{Fj}[ ρ1( (x − µ̃^(k)_j(F)) / σ^(k)(F) ) ] / E_{F1}[ ρ1( (x − µ̃^(1)(F)) / σ^(k)(F) ) ]

We assume that k = 2 (the derivation is completely analogous in case k > 2) and recall that
the contamination is placed in group 1. For the first test statistic, note that

∂²/(∂ϵ)² SΛ1(F_{ϵ,y})|_{ϵ=0} = ∂²/(∂ϵ)² σ^(k)(F_{ϵ,y})|_{ϵ=0} − ∂²/(∂ϵ)² σ^(1)(F_{ϵ,y})|_{ϵ=0}   (15)

because σ^(k) = σ^(1) = 1 and the first-order influence functions (IF) of σ^(k) and σ^(1) coincide, i.e.

∂/∂ϵ σ^(k)(F_{ϵ,y})|_{ϵ=0} = ∂/∂ϵ σ^(1)(F_{ϵ,y})|_{ϵ=0}.
The scale functionals σ^(1) and σ^(k) are defined by (the univariate equivalent of) (2) and (3) in
the manuscript. Their second-order derivatives can be obtained by taking the second derivative
of these equations. For instance, for σ^(k) we find that

π1 ∂²/(∂ϵ)² ( (1 − ϵ) E[ ρ0( (x − µ^(k)_1(F_{ϵ,y})) / σ^(k)(F_{ϵ,y}) ) ] + ϵ ρ0( (y − µ^(k)_1(F_{ϵ,y})) / σ^(k)(F_{ϵ,y}) ) )
   + π2 ∂²/(∂ϵ)² E[ ρ0( (x − µ^(k)_2(F_{ϵ,y})) / σ^(k)(F_{ϵ,y}) ) ] = 0,

where the expectation is always over F_{0,1}. From this expression we obtain that

∂²/(∂ϵ)² σ^(k)(F_{ϵ,y})|_{ϵ=0} = ( −1 / E[ψ0(x)x] ) ( −π1 ( E[2ψ0(x)x + ρ0(x)] − (2ψ0(y)y − ρ0(y)) ) IF(y, σ^(k), F)
   + 2π1 ψ0(y) IF(y, µ^(k)_1, F) − π1 E[ψ0(x)] IF(y, µ^(k)_1, F)²
   − E[ψ0'(x)x² + ψ0(x)x] IF(y, σ^(k), F)² ).   (16)
A similar expression holds for σ^(1). From the first-order derivatives of equations (2) and (3) in
the manuscript we find that

IF(y, σ^(k), F) = IF(y, σ^(1), F) = π1 (ρ0(y) − b0) / E[ψ0(x)x]

IF(y, µ^(k)_1, F) = ψ0(y) / E[ψ0'(x)]

IF(y, µ^(1), F) = π1 ψ0(y) / E[ψ0'(x)]

The second-order IF for SΛ1 now follows through (15).

For SΛ2.(F), note that the numerator actually equals b0 according to (3) in the manuscript,
and write SΛ2.(F_{ϵ,y}) = b0/V(ϵ), where V(0) = b0, such that we have

∂²/(∂ϵ)² SΛ2.(F_{ϵ,y})|_{ϵ=0} = −(1/b0) ∂²/(∂ϵ)² V(ϵ)|_{ϵ=0}.   (17)

The second derivative of V(ϵ) at 0 is

∂²/(∂ϵ)² V(ϵ)|_{ϵ=0} = 2π1² ( E[ψ0(x)x] − ψ0(y)y ) (ρ0(y) − b0) / E[ψ0(x)x] − π1² ψ0²(y) / E[ψ0'(x)]
   + π1² E[ψ0'(x)x² + ψ0(x)x] (ρ0(y) − b0)² / E[ψ0(x)x]² − E[ψ0(x)x] IF2(y, σ^(k), F),

where IF2(y, σ^(k), F) is given by (16). The second-order IF for SΛ2. then follows by (17).

In MMΛ. the numerator is no longer constant, but the derivation of the second-order IF is
analogous to that of SΛ2..
Proof of Theorem 3 of the manuscript:
The test statistic Λ is asymptotically χ2q according to (13) in the manuscript and up to O(1/n)
we have that α(Fϵ,n) = 1−Hq(η1−α0 , δ(ϵ)) where δ(ϵ) = nΛ(Fϵ,n). Let b(ϵ) = −Hq(η1−α0 , δ(ϵ)),
then we have up to O(1/n) that
α(Fϵ,n)− α0 = b(ϵ)− b(0) = ϵ b′(0) +ϵ2
2b′′(0) + o(ϵ2).
A second order von Mises expansion of Λ(Fϵ,n) yields
Λ(Fϵ,n) = Λ(F ) +ϵ√n
∫ξ1(x) dG(x) +
1
2
ϵ2
n
∫∫ξ2(x,y) dG(x) dG(y) + o(ϵ2/n) (18)
20
Page 21
with $\xi_1(x) = IF(x,\Lambda,F) = 0$ (see e.g. Fernholz 2001, Gatto and Ronchetti 1996). From (18)
we immediately obtain $b'(0) = \kappa\,\frac{\partial\delta}{\partial\epsilon}\big|_{\epsilon=0} = n\kappa\,\frac{\partial\Lambda(F_{\epsilon,n})}{\partial\epsilon}\big|_{\epsilon=0} = 0$, and

$$b''(0) = \kappa\,\frac{\partial^2\delta}{\partial\epsilon^2}\Big|_{\epsilon=0} = n\kappa\,\frac{\partial^2\Lambda(F_{\epsilon,n})}{\partial\epsilon^2}\Big|_{\epsilon=0} = \kappa \iint \xi_2(x,y)\, dG(x)\, dG(y).$$

For $G = \Delta_y$ this expression reduces to $b''(0) = \kappa\,\xi_2(y,y) = \kappa\, IF_2(y,\Lambda,F)$.
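A small numerical illustration (not part of the proof) of what Theorem 3 predicts: because $b'(0)=0$, the level distortion is quadratic in $\epsilon$, so doubling the contamination fraction should roughly quadruple $\alpha(F_{\epsilon,n})-\alpha_0$. The degrees of freedom $q=3$, level $\alpha_0=0.05$, and the noncentrality rate $\delta(\epsilon)=c\,\epsilon^2$ with $c=5$ are purely hypothetical choices.

```python
from scipy.stats import chi2, ncx2

q, alpha0 = 3, 0.05                   # hypothetical degrees of freedom and nominal level
eta = chi2.ppf(1 - alpha0, q)         # critical value eta_{1-alpha_0}

def level(eps, c=5.0):
    # rejection probability 1 - H_q(eta, delta(eps)) with delta(eps) = c * eps^2,
    # the quadratic rate implied by a vanishing first-order IF (b'(0) = 0)
    return 1.0 - ncx2.cdf(eta, q, c * eps**2)

d1 = level(0.01) - alpha0
d2 = level(0.02) - alpha0
print(d2 / d1)   # close to 4: doubling eps about quadruples the level distortion
```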
Proof of Theorem 4 of the manuscript:
Analogously to $\hat\Theta^{R*}_n$, define

$$\hat\Theta^R_n := \Theta + [I - \nabla g_n(\Theta)]^{-1}\big(g_n(\Theta) - \Theta\big),$$

where $\Theta$ denotes the limiting value of $\hat\Theta_n$ as before. It can be shown, as in Salibian-Barrera
et al. (2006), that

$$\hat\Theta^R_n - \hat\Theta_n = o_p(n^{-1/2}). \qquad (19)$$
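To make the linear correction behind $\hat\Theta^R_n$ concrete, here is a toy one-dimensional sketch of the fast and robust bootstrap idea (our own illustration, not the MANOVA estimators of the manuscript): a location M-estimator written as a fixed point $\mu = g_n(\mu)$, where each bootstrap replicate is obtained by evaluating the resampled map once at $\hat\mu$ and applying the correction factor $[1-g_n'(\hat\mu)]^{-1}$, instead of fully re-solving. The Huber constant, sample size, and fixed unit scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def w(r, c=1.345):
    # Huber weight psi(r)/r (scale fixed at 1 for simplicity)
    a = np.abs(r)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def g(mu, x):
    # fixed-point map: weighted mean with weights evaluated at mu
    wi = w(x - mu)
    return np.sum(wi * x) / np.sum(wi)

def solve(x, mu0, tol=1e-10):
    # fully iterated estimator (the expensive route that FRB avoids)
    mu = mu0
    for _ in range(500):
        new = g(mu, x)
        if abs(new - mu) < tol:
            return new
        mu = new
    return mu

x = rng.normal(size=200)
mu_hat = solve(x, np.median(x))

# scalar analogue of [I - grad g_n]^{-1}, derivative by central difference at mu_hat
h = 1e-6
corr = 1.0 / (1.0 - (g(mu_hat + h, x) - g(mu_hat - h, x)) / (2 * h))

B = 500
frb = np.empty(B)
full = np.empty(B)
for b in range(B):
    xb = rng.choice(x, size=x.size, replace=True)
    frb[b] = mu_hat + corr * (g(mu_hat, xb) - mu_hat)  # one cheap linear correction
    full[b] = solve(xb, mu_hat)                        # full re-iteration, for comparison

print(np.corrcoef(frb, full)[0, 1])
```

On clean data the corrected replicates track the fully re-iterated ones almost perfectly, at a fraction of the cost; this is the computational point of the FRB.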
Let $h$ denote the limiting version of the statistic $h_n$. Condition (21) in the manuscript means
that $\nabla h(\Theta) = 0$ under the null hypothesis. Since $\hat\Theta_n$ is a root-$n$ consistent estimate of $\Theta$,
this implies that

$$\nabla h_n(\hat\Theta_n) = O_p(n^{-1/2}). \qquad (20)$$

Therefore, considering the expansion

$$n\big(h_n(\hat\Theta^R_n) - h_n(\Theta)\big) = n\big(h_n(\hat\Theta_n) - h_n(\Theta)\big) + \nabla h_n(\hat\Theta_n)\, n\big(\hat\Theta^R_n - \hat\Theta_n\big) + R_n,$$

we may conclude that

$$n\big(h_n(\hat\Theta^R_n) - h_n(\Theta)\big) = n\big(h_n(\hat\Theta_n) - h_n(\Theta)\big) + o_p(1). \qquad (21)$$

For the left-hand side of (21), we can write

$$n\big(h_n(\hat\Theta^R_n) - h_n(\Theta)\big) = n\,\nabla h_n(\Theta)\big(\hat\Theta^R_n - \Theta\big) + n\big(\hat\Theta^R_n - \Theta\big)^t H_n(\Theta)\big(\hat\Theta^R_n - \Theta\big) + o_p(1), \qquad (22)$$
where $H_n(\cdot)$ is the Hessian matrix corresponding to $h_n(\cdot)$. It follows from Salibian-Barrera et
al. (2006, Theorem 2) that

$$\hat\Theta^{R*}_n - \hat\Theta_n = \hat\Theta^R_n - \Theta + o_p(n^{-1/2}), \qquad (23)$$
where the left-hand side is considered conditionally on the sample. By (22) and (23), and since
$\nabla h_n(\Theta) = O_p(n^{-1/2})$, we can write

$$n\big(h_n(\hat\Theta^R_n) - h_n(\Theta)\big) = n\,\nabla h_n(\Theta)\big(\hat\Theta^{R*}_n - \hat\Theta_n\big) + n\big(\hat\Theta^{R*}_n - \hat\Theta_n\big)^t H_n(\Theta)\big(\hat\Theta^{R*}_n - \hat\Theta_n\big) + o_p(1). \qquad (24)$$

The asymptotic validity of the bootstrap (see e.g. Bickel and Freedman 1981) implies that we
similarly have

$$n\big(h^*_n(\hat\Theta^R_n) - h^*_n(\hat\Theta_n)\big) = n\,\nabla h^*_n(\Theta)\big(\hat\Theta^{R*}_n - \hat\Theta_n\big) + n\big(\hat\Theta^{R*}_n - \hat\Theta_n\big)^t H^*_n(\Theta)\big(\hat\Theta^{R*}_n - \hat\Theta_n\big) + o_p(1), \qquad (25)$$

where $\nabla h^*_n(\Theta)$ and $H^*_n(\Theta)$ converge to the same limits as $\nabla h_n(\Theta)$ and $H_n(\Theta)$. Hence,

$$n\big(h_n(\hat\Theta^R_n) - h_n(\Theta)\big) = n\big(h^*_n(\hat\Theta^{R*}_n) - h^*_n(\hat\Theta_n)\big) + o_p(1), \qquad (26)$$
and thus, by (21),

$$n\big(h_n(\hat\Theta_n) - h_n(\Theta)\big) = n\big(h^*_n(\hat\Theta^{R*}_n) - h^*_n(\hat\Theta_n)\big) + o_p(1). \qquad (27)$$

Since both

$$h_n(\Theta) = 1 + o_p(n^{-1}) \quad\text{and}\quad h^*_n(\hat\Theta_n) = 1 + o_p(n^{-1}), \qquad (28)$$

we obtain

$$n\big(h_n(\hat\Theta_n) - 1\big) = n\big(h^*_n(\hat\Theta^{R*}_n) - 1\big) + o_p(1), \qquad (29)$$

which was to be shown. Note that the equality on the left of (28) follows from condition (21)
in the manuscript and from the fact that, after transformation to null data, $h_n(\hat\Theta_n) = 1$.
The equality on the right of (28) is immediately true in the case of S-estimates, because of
the equations satisfied by the estimates after translation, as explained in Section 4 of the
manuscript. In the case of MM-estimates, however, we have for the translated data that
$\tilde\mu^{(k,0)}_{j,n} = \tilde\mu^{(1)}_n$, $j = 1,\ldots,k$, but in general $\tilde\mu^{(1,0)}_n \neq \tilde\mu^{(1)}_n$, and similarly for the shape estimates. In
this case the result in (28) follows since

$$\tilde\mu^{(1,0)}_n - \tilde\mu^{(k,0)}_{1,n} = o_p(n^{-1/2}) \qquad (30)$$

and also $\tilde\Gamma^{(1,0)}_n - \tilde\Gamma^{(k,0)}_n = o_p(n^{-1/2})$. For example, for the statistic $T_n(\cdot) = MM\Lambda^b_n$ we can then
write, by considering $h^*_n(\hat\Theta_n)$ as a function of $\tilde\mu^{(1,0)}_n$ only and expanding it around $\tilde\mu^{(k,0)}_{1,n}$,

$$h^*_n(\hat\Theta_n) = 1 + \frac{\partial h^*_n}{\partial \tilde\mu^{(1,0)}_n}\,\big(\tilde\mu^{(1,0)}_n - \tilde\mu^{(k,0)}_{1,n}\big) + o_p(n^{-1}),$$
and (28) follows since the derivative of $h^*_n$ is of order $O_p(n^{-1/2})$.
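Spelling out the order count in this last step, with the rates established above:

```latex
\frac{\partial h^*_n}{\partial \tilde\mu^{(1,0)}_n} = O_p(n^{-1/2})
\quad\text{and}\quad
\tilde\mu^{(1,0)}_n - \tilde\mu^{(k,0)}_{1,n} = o_p(n^{-1/2})
\;\Longrightarrow\;
h^*_n(\hat\Theta_n) - 1 = O_p(n^{-1/2})\, o_p(n^{-1/2}) = o_p(n^{-1}),
```

which is exactly the right-hand equality in (28).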
To show (30), one may proceed by noting that the estimates must satisfy the respective
first-order equations

$$\sum_{j=1}^{k}\,\sum_{i\in\Pi_j} \frac{\rho_1'\big(d_i(\tilde\mu^{(k,0)}_{j,n},\hat\Sigma^{(k,0)}_n)\big)}{d_i(\tilde\mu^{(k,0)}_{j,n},\hat\Sigma^{(k,0)}_n)}\,\big(x_i - \tilde\mu^{(k,0)}_{j,n}\big) = 0$$

and

$$\sum_{i=1}^{n} \frac{\rho_1'\big(d_i(\tilde\mu^{(1,0)}_n,\hat\Sigma^{(1,0)}_n)\big)}{d_i(\tilde\mu^{(1,0)}_n,\hat\Sigma^{(1,0)}_n)}\,\big(x_i - \tilde\mu^{(1,0)}_n\big) = 0,$$
where $d_i(\mu,\Sigma) = [(x_i-\mu)^t\,\Sigma^{-1}(x_i-\mu)]^{1/2}$. Expanding the first equation around $\hat\Sigma^{(1,0)}_n$ and
the second one around $\tilde\mu^{(k,0)}_{1,n}$ leads to

$$\tilde\mu^{(1,0)}_n - \tilde\mu^{(k,0)}_{1,n} = A_n\,\mathrm{vec}\big(\hat\Sigma^{(k,0)}_n - \hat\Sigma^{(1,0)}_n\big) + o_p(n^{-1/2}),$$

with $A_n = O_p(n^{-1/2})$, which suffices for (30) to hold since $\hat\Sigma^{(k,0)}_n = \hat\Sigma^{(1,0)}_n$.
In a similar but more elaborate way, we can show that $\tilde\Gamma^{(1,0)}_n - \tilde\Gamma^{(k,0)}_n = o_p(n^{-1/2})$, which is required
to establish the equality in (28) for the other MM-based test statistic.
References

Alqallaf, F., Van Aelst, S., Yohai, V.J. and Zamar, R.H. (2009), “Propagation of outliers in
multivariate data,” The Annals of Statistics, 37, 311–331.

Bickel, P.J. and Freedman, D.A. (1981), “Some asymptotic theory for the bootstrap,” The
Annals of Statistics, 9, 1196–1217.

Fernholz, L.T. (2001), “On multivariate higher order von Mises expansions,” Metrika, 53,
123–140.

Gatto, R. and Ronchetti, E. (1996), “General saddlepoint approximations of marginal densities
and tail probabilities,” Journal of the American Statistical Association, 91, 666–673.