REFERENCES

Agostinelli, C. and Markatou, M. (1998), "A one-step robust estimator for regression based on a weighted likelihood reweighting scheme," Statistics and Probability Letters, 37, 341-350.
Agullo, J. (2001), "New algorithms for computing the least trimmed squares regression estimator," Computational Statistics and Data Analysis, 36, 425-439.
Allen, D. M. (1984), Discussion of "K-clustering as a detection tool for influential subsets in regression," Technometrics, 26, 319-320.
Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H. and Tukey, J. W. (1972), Robust Estimates of Location: Surveys and Advances, Princeton, NJ: Princeton University Press.
Andrews, D. F. and Pregibon, D. (1978), "Finding the outliers that matter," Journal of the Royal Statistical Society, Ser. B, 40, 85-93.
Atkinson, A. C. (1994), "Fast very robust methods for the detection of multiple outliers," Journal of the American Statistical Association, 89, 1329-1339.
Atkinson, A. C. and Mulira, H. M. (1993), "The stalactite plot for the detection of multivariate outliers," Statistics and Computing, 3, 27-35.
Becker, C. and Gather, U. (2001), "The largest nonidentifiable outlier: a comparison of multivariate simultaneous outlier detection rules," Computational Statistics and Data Analysis, 36, 119-127.
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980), Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, New York: John Wiley & Sons.
Birch, J. B. and Agard, D. B. (1993), "Robust inference in regression: a comparative study," Communications in Statistics, 22(1), 217-244.
Birch, J. B. (1992), "Estimation and inference in multiple regression using robust weights: a unified approach," Technical Report 92-2, Department of Statistics, Virginia Polytechnic Institute and State University.
Birch, J. B. (1993), Exploratory and Robust Data Analysis, pre-publication course packet, Virginia Polytechnic Institute and State University.
Brownlee, K. A. (1965), Statistical Theory and Methodology in Science and Engineering, 2nd edition, New York: John Wiley & Sons.
Butler, R. W., Davies, P. L. and Jhun, M. (1993), "Asymptotics for the minimum covariance determinant estimator," Annals of Statistics, 21, 1385-1400.
Cleveland, W. S. (1979), "Robust locally weighted regression and smoothing scatterplots," Journal of the American Statistical Association, 74, 829-836.
Coakley, C. W. and Hettmansperger, T. P. (1993), "A bounded influence, high breakdown, efficient regression estimator," Journal of the American Statistical Association, 88, 872-880.
Cook, R. D., Hawkins, D. M. and Weisberg, S. (1993), "Exact iterative computation of the robust multivariate minimum volume ellipsoid estimator," Statistics and Probability Letters, 16, 213-218.
Cook, R. D. and Hawkins, D. M. (1990), Comment on "Unmasking multivariate outliers and leverage points," Journal of the American Statistical Association, 85, 640-644.
Cook, R. D. and Weisberg, S. (1980), "Characterizations of an empirical influence function for detecting influential cases in regression," Technometrics, 22, 495-507.
Davies, P. L. (1992), "The asymptotics of Rousseeuw's minimum volume ellipsoid estimator," The Annals of Statistics, 20, 1828-1843.
Davies, P. L. (1987), "Asymptotic behavior of S-estimates of multivariate location parameters and dispersion matrices," Technical Report, University of Essen, West Germany.
Devlin, S. J., Gnanadesikan, R. and Kettenring, J. R. (1975), "Robust estimation of dispersion matrices and principal components," Biometrika, 62, 531-545.
Donoho, D. L. (1982), "Breakdown properties of multivariate location estimators," qualifying paper, Harvard University, Boston, MA.
Donoho, D. L. and Huber, P. J. (1983), "The notion of breakdown point," in A Festschrift for Erich Lehmann, eds. P. Bickel, K. Doksum and J. L. Hodges, Jr., Belmont, CA: Wadsworth.
Gnanadesikan, R. and Kettenring, J. R. (1972), "Robust estimates, residuals, and outlier detection with multiresponse data," Biometrics, 28, 81-124.
Gray, J. B. (1988), "A classification of influence measures," Journal of Statistical Computation and Simulation, 30, 159-171.
Gray, J. B. (1986), "A simple graphic for assessing influence in regression," Journal of Statistical Computation and Simulation, 24, 121-134.
Gray, J. B. and Ling, R. F. (1985), Response to Hadi's Letter to the Editor on "K-clustering as a detection tool for influential subsets in regression," Technometrics, 27, 324-325.
Gray, J. B. and Ling, R. F. (1984), "K-clustering as a detection tool for influential subsets in regression," Technometrics, 26, 305-318.
Gray, J. B. and Ling, R. F. (1984), Response to discussions of "K-clustering as a detection tool for influential subsets in regression," Technometrics, 26, 326-330.
Hadi, A. S. (1992), "Identifying multiple outliers in multivariate data," Journal of the Royal Statistical Society, Ser. B, 54, 761-771.
Hadi, A. S. (1985), Letter to the Editor on "K-clustering as a detection tool for influential subsets in regression," Technometrics, 27, 323.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986), Robust Statistics: The Approach Based on Influence Functions, New York: John Wiley & Sons.
Hartigan, J. A. (1975), Clustering Algorithms, New York: John Wiley & Sons.
Hawkins, D. M. (1993), "A feasible solution algorithm for the minimum volume ellipsoid estimator in multivariate data," Computational Statistics and Data Analysis, 8, 95-107.
Hawkins, D. M. (1994), "A feasible solution algorithm for the minimum covariance determinant estimator," Computational Statistics and Data Analysis, 17, 197-210.
Hawkins, D. M. (1995), "Convergence of the feasible solution algorithm for least median of squares regression," Computational Statistics and Data Analysis, 19, 519-538.
Hawkins, D. M., Bradu, D. and Kass, G. V. (1984), "Location of several outliers in multiple-regression data using elemental subsets," Technometrics, 26, 197-208.
Hawkins, D. M. and Olive, D. (1999), "Applications and algorithms for least trimmed sum of absolute deviations regression," Computational Statistics and Data Analysis, 32, 119-134.
Hawkins, D. M. and Olive, D. (1999), "Improved feasible solution algorithms for high breakdown estimation," Computational Statistics and Data Analysis, 30, 1-11.
He, X. and Portnoy, S. (1992), "Reweighted LS estimators converge at the same rate as the initial estimator," The Annals of Statistics, 20, 2161-2167.
Hettmansperger, T. P. and Sheather, S. J. (1992), "A cautionary note on the method of least median squares," The American Statistician, 46, 79-83.
Hocking, R. R. (1984), Discussion of "K-clustering as a detection tool for influential subsets in regression," Technometrics, 26, 321-323.
Huber, P. J. (1981), Robust Statistics, New York: John Wiley & Sons.
Johnson, R. A. and Wichern, D. W. (1992), Applied Multivariate Statistical Analysis, Englewood Cliffs, NJ: Prentice-Hall.
Jolliffe, I. T. and Penny, K. I. (2001), "A comparison of multivariate outlier detection methods for clinical laboratory safety data," The Statistician, 50, 295-308.
Kempthorne, P. J. and Mendel, M. B. (1990), Comment on "Unmasking multivariate outliers and leverage points," Journal of the American Statistical Association, 85, 647-651.
Lawrence, D. E. (1996), Cluster-Based Bounded Influence Regression, abstract from the Joint Statistical Meetings of the American Statistical Association, Chicago, IL, Section 240.
Lopuhaa, H. P. (1992), "Highly efficient estimators of multivariate location with high breakdown point," Annals of Statistics, 20, 398-413.
Lopuhaa, H. P. and Rousseeuw, P. J. (1991), "Breakdown points of affine equivariant estimators of multivariate location and covariance matrices," Annals of Statistics, 19, 229-248.
Markatou, M. and He, X. (1994), "Bounded influence and high breakdown point testing procedures in linear models," Journal of the American Statistical Association, 89, 543-549.
Maronna, R. A. and Yohai, V. J. (1981), "Asymptotic behavior of general M-estimates for regression and scale with random carriers," Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 58, 7-20.
Maronna, R., Bustos, O. and Yohai, V. (1979), "Bias- and efficiency-robustness of general M-estimators for regression with random carriers," in Smoothing Techniques for Curve Estimation, eds. T. Gasser and M. Rosenblatt, New York: Springer-Verlag, 91-116.
Morrison, D. F. (1990), Multivariate Statistical Methods, New York: McGraw-Hill.
Myers, R. H. (1990), Classical and Modern Regression with Applications, Boston: PWS-Kent.
Pendleton, O. J. and Hocking, R. R. (1981), "Diagnostic techniques in multiple linear regression using PROC MATRIX," SUGI, 195-201.
Rocke, D. M. and Woodruff, D. L. (1996), "Identification of outliers in multivariate data," Journal of the American Statistical Association, 91, 1047-1061.
Rousseeuw, P. J. (1993), "A resampling design for computing high-breakdown regression," Statistics and Probability Letters, 18, 125-128.
Rousseeuw, P. J. and Bassett, G. W. (1991), "Robustness of the p-subset algorithm for regression with high breakdown point," in Directions in Robust Statistics and Diagnostics, Part II, ed. W. Stahel, 185-194.
Rousseeuw, P. J. and van Zomeren, B. C. (1990), "Unmasking multivariate outliers and leverage points," Journal of the American Statistical Association, 85, 633-639.
Rousseeuw, P. J. and Leroy, A. M. (1987), Robust Regression and Outlier Detection, New York: John Wiley & Sons.
Rousseeuw, P. J. (1984), "Least median of squares regression," Journal of the American Statistical Association, 79, 871-880.
Ruiz-Gazen, A. (1996), "A very simple robust estimator of a dispersion matrix," Computational Statistics and Data Analysis, 21, 149-162.
Ruppert, D. and Simpson, D. G. (1990), Comment on "Unmasking multivariate outliers and leverage points," Journal of the American Statistical Association, 85, 644-646.
Sebert, D. M., Montgomery, D. C. and Rollier, D. A. (1998), "A clustering algorithm for identifying multiple outliers in linear regression," Computational Statistics and Data Analysis, 27, 461-484.
Simpson, D. G., Ruppert, D. and Carroll, R. J. (1992), "On one-step GM estimates and stability of inferences in linear regression," Journal of the American Statistical Association, 87, 439-450.
Stahel, W. A. (1981), "Robuste Schätzungen: Infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen" ("Robust estimation: infinitesimal optimality and estimation of covariance matrices"), Ph.D. thesis, ETH Zürich, Switzerland.
Staudte, R. G. and Sheather, S. J. (1990), Robust Estimation and Testing, New York: John Wiley & Sons.
Stromberg, A. J. (1993), "Computation of high breakdown nonlinear regression parameters," Journal of the American Statistical Association, 88, 237-244.
Walker, E. (1984), "Influence, collinearity and robust estimation in regression," Ph.D. dissertation, Virginia Polytechnic Institute and State University.
Weisberg, S. (1984), Discussion of "K-clustering as a detection tool for influential subsets in regression," Technometrics, 26, 324-325.
Welsch, R. E. (1982), "Influence functions and regression diagnostics," in Modern Data Analysis, eds. R. Launer and A. Siegel, New York: Academic Press, 149-169.
Wisnowski, J. W., Montgomery, D. C. and Simpson, J. R. (2001), "A comparative analysis of multiple outlier detection procedures in the linear regression model," Computational Statistics and Data Analysis, 36, 351-382.
Woodruff, D. L. and Rocke, D. M. (1994), "Computable robust estimation of multivariate location and shape using compound estimators," Journal of the American Statistical Association, 89, 888-896.
You, J. (1999), "A Monte Carlo comparison of several high breakdown and efficient estimators," Computational Statistics and Data Analysis, 30, 205-219.
APPENDIX A
Datasets

A.1 Stackloss Data
Reference: Brownlee, K. A. (1965), Statistical Theory and Methodology in Science and Engineering, 2nd edition, New York: John Wiley & Sons.

Table A.1: Stackloss data.
Step 1: First, an initial robust location estimator, $\hat{\boldsymbol{\mu}}$, is determined. This is used to compute an intermediate dispersion matrix defined as
$$\mathbf{V} = \frac{1}{n}\sum_{i=1}^{n}\left(\mathbf{z}_{y,i}-\hat{\boldsymbol{\mu}}\right)\left(\mathbf{z}_{y,i}-\hat{\boldsymbol{\mu}}\right)'.$$
Step 2: A weight function whose argument is proportional to the robust distance based on $(\hat{\boldsymbol{\mu}},\mathbf{V})$ produces individual observation weights, defined as
$$w_i = K\!\left(\beta\left(\mathbf{z}_{y,i}-\hat{\boldsymbol{\mu}}\right)'\mathbf{V}^{-1}\left(\mathbf{z}_{y,i}-\hat{\boldsymbol{\mu}}\right)\right),$$
with $K$ being a positive and decreasing function and $\beta$ being any non-negative scalar. These weights are used in computing a one-step dispersion estimator, found by
$$\mathbf{C} = \frac{\sum_{i=1}^{n} w_i\left(\mathbf{z}_{y,i}-\hat{\boldsymbol{\mu}}\right)\left(\mathbf{z}_{y,i}-\hat{\boldsymbol{\mu}}\right)'}{\sum_{i=1}^{n} w_i - 1}.$$
Step 3: Finally, the robust dispersion matrix is found via the transformation
$$\mathbf{U} = \left(\mathbf{C}^{-1} - \beta\mathbf{V}^{-1}\right)^{-1},$$
by which consistency is attained.

In order to obtain $\mathbf{U}$, it is necessary to define the initial location estimator, $\hat{\boldsymbol{\mu}}$, the weight function, $K$, and the scalar, $\beta$. Recommendations are that $\hat{\boldsymbol{\mu}}$ be the coordinatewise median vector, $K(x) = \exp(-x/2)$, and $\beta = \varepsilon^{2}/(6p)$, where $\varepsilon$ represents the percentage of contamination in the data (in the article, $\varepsilon \le 0.20$ was investigated). The breakdown point of this procedure is suggested to be roughly 20%, and the choice of $\beta$ appears to have a rather large impact on whether extreme observations are detected.
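Read as linear algebra, the three steps amount to only a few lines. The following is a minimal NumPy sketch under the recommended choices above (coordinatewise median and $K(x)=\exp(-x/2)$); the function name and the simulated data are illustrative only, and the formulas follow the reconstruction given here rather than the article's own notation.

```python
import numpy as np

def one_step_dispersion(Z, beta):
    """One-step robust dispersion estimator (sketch of Steps 1-3 above)."""
    n, p = Z.shape
    mu = np.median(Z, axis=0)                   # Step 1: coordinatewise median
    R = Z - mu                                  # centered data
    V = R.T @ R / n                             # intermediate dispersion matrix V
    d = np.einsum('ij,jk,ik->i', R, np.linalg.inv(V), R)  # robust distances
    w = np.exp(-beta * d / 2.0)                 # Step 2: weights K(beta * d)
    C = (R.T * w) @ R / (w.sum() - 1.0)         # one-step weighted dispersion
    U = np.linalg.inv(np.linalg.inv(C) - beta * np.linalg.inv(V))  # Step 3
    return mu, U

# Illustrative use: bivariate data with 5% gross contamination.
rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 2))
Z[:5] += 10.0
mu, U = one_step_dispersion(Z, beta=0.20**2 / (6 * 2))
```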
B.3 Exact Minimum Volume Ellipsoid (MVE) Estimator
Source: Hawkins, D. M. (1993).

Step 1: Given a subset containing $h$ observations, initially (at iteration stage $k = 0$) give each observation equal weight, $w_i^{(0)} = h^{-1}$.
Step 2: Calculate the weighted mean vector and associated weighted covariance matrix. Compute all $n$ individual robust Mahalanobis distances, denoted by $D_i(w^{(k)})$, and determine the maximum such distance, $D_{\max}(w^{(k)})$.
Step 3: If $D_{\max}(w^{(k)}) < p + \varepsilon$, where $\varepsilon$ is a convergence bound, the iteration process stops. Otherwise, update the weights for iteration $k+1$ by
$$w_i^{(k+1)} = \frac{w_i^{(k)}\,D_i(w^{(k)})}{p},$$
and go back to Step 2.
Step 4: Repeat Steps 1, 2, and 3 for every subset of size $h$. Since the volume of an ellipsoid is proportional to the square root of the determinant of the weighted covariance matrix, the exact MVE estimators are the weighted mean vector and associated weighted covariance matrix corresponding to the subset yielding the smallest such determinant. The weights across these $h$ observations will not, in general, be equal.
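The inner loop (Steps 1 through 3) for a single subset can be sketched as follows in NumPy. The sketch assumes the distances are squared Mahalanobis distances (so the $p+\varepsilon$ cutoff matches their expected scale) and that the reweighting acts on the $h$ active observations; the function and variable names are illustrative, not Hawkins's.

```python
import numpy as np

def mve_reweight(Z, subset, eps=1e-6, max_iter=500):
    """Iterative reweighting of one h-subset (Steps 1-3 above)."""
    S = Z[subset]                        # the h active observations
    h, p = S.shape
    w = np.full(h, 1.0 / h)              # Step 1: equal initial weights
    for _ in range(max_iter):
        m = w @ S / w.sum()              # weighted mean vector
        R = S - m
        C = (R.T * w) @ R / w.sum()      # weighted covariance matrix
        D = np.einsum('ij,jk,ik->i', R, np.linalg.inv(C), R)  # squared distances
        if D.max() < p + eps:            # Step 3: convergence test
            break
        w = w * D / p                    # multiplicative weight update
    return m, C, np.linalg.det(C)
```

Running this for every subset of size $h$ and keeping the pair with the smallest determinant then carries out Step 4.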
B.4 Approximate Minimum Volume Ellipsoid (MVE) Estimator via the Random Subsampling Algorithm
Source: Rousseeuw and Leroy (1987).

Step 1: The number of subsets, $N$, needed is found by equating the probability that at least one of the $N$ elemental sets contains only good points to an acceptable value near 1; 0.999 is cited often (Rousseeuw and Leroy (1987)). Let $\varepsilon$ denote the percentage of contamination in the data. This produces the equality
$$0.999 = 1 - \left(1 - (1-\varepsilon)^{p+1}\right)^{N},$$
which, by letting $\varepsilon = 0.5$, means that the necessary number of random subsets is
$$N = \frac{\ln(0.001)}{\ln\!\left(1 - (0.5)^{p+1}\right)}.$$
Many times this step is not employed; instead, the researcher arbitrarily selects $N$ to be 500 or 1000 for convenience.
Step 2: Randomly select $p+1$ points; $J$ will refer to this subset of points. These points need to form a full rank design space, so discard the sample otherwise.
Step 3: Determine the $p \times 1$ mean vector and $p \times p$ covariance matrix:
$$\bar{\mathbf{z}}_J = \frac{1}{p+1}\sum_{i\in J}\mathbf{z}_{y,i}$$
and
$$\mathbf{C}_J = \frac{1}{p}\sum_{i\in J}\left(\mathbf{z}_{y,i}-\bar{\mathbf{z}}_J\right)\left(\mathbf{z}_{y,i}-\bar{\mathbf{z}}_J\right)'.$$
Step 4: Calculate the set of $n$ robust squared Mahalanobis distances, $RD_i^2$, and find their median, $m_J^2$:
$$m_J^2 = \operatorname{med}_{\forall i} RD_i^2 = \operatorname{med}_{\forall i}\left(\mathbf{z}_{y,i}-\bar{\mathbf{z}}_J\right)'\mathbf{C}_J^{-1}\left(\mathbf{z}_{y,i}-\bar{\mathbf{z}}_J\right).$$
Step 5: The objective is to minimize the volume of the ellipsoid, where
$$\text{Volume} \propto \left(\det(\mathbf{C}_J)\right)^{1/2}\left(m_J^2\right)^{p/2}.$$
Thus, for each elemental set, calculate $\left(\det(\mathbf{C}_J)\right)^{1/2}\left(m_J^2\right)^{p/2}$. If this represents the smallest objective function calculated thus far, store this elemental set as the current best subset.
Step 6: After all $N$ elemental sets have been drawn and analyzed, the MVE estimators are based on the best subset from Step 5, and calculated by
$$\mathbf{m} = \bar{\mathbf{z}}_J$$
and
$$\mathbf{C} = m_J^2\left(\chi^2_{p,0.50}\right)^{-1}\mathbf{C}_J.$$
The location vector, $\mathbf{m}$, is just the average coordinate vector of the best elemental set. The covariance matrix of the best elemental set is multiplied by $m_J^2\left(\chi^2_{p,0.50}\right)^{-1}$ to expand (or shrink) the ellipsoid and ensure that half the data is contained, forming $\mathbf{C}$.
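Steps 1 through 6 fit naturally into a single loop over elemental sets. Below is a minimal NumPy/SciPy sketch (SciPy supplies the $\chi^2_{p,0.50}$ quantile); the function name, seed, and the divisor $p$ in the elemental covariance follow the reconstruction above and are illustrative, not the book's code.

```python
import numpy as np
from math import ceil, log
from scipy.stats import chi2

def approx_mve(Z, N=None, seed=0):
    """Approximate MVE by random (p+1)-subsampling (Steps 1-6 above)."""
    n, p = Z.shape
    if N is None:                               # Step 1: subsets for 0.999 assurance
        N = ceil(log(0.001) / log(1.0 - 0.5 ** (p + 1)))
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(N):
        J = rng.choice(n, size=p + 1, replace=False)       # Step 2: elemental set
        zJ = Z[J].mean(axis=0)                             # Step 3: mean and covariance
        RJ = Z[J] - zJ
        CJ = RJ.T @ RJ / p
        if np.linalg.matrix_rank(CJ) < p:                  # discard rank-deficient sets
            continue
        R = Z - zJ
        RD2 = np.einsum('ij,jk,ik->i', R, np.linalg.inv(CJ), R)
        mJ2 = np.median(RD2)                               # Step 4: median squared distance
        obj = np.sqrt(np.linalg.det(CJ)) * mJ2 ** (p / 2)  # Step 5: ellipsoid volume
        if best is None or obj < best[0]:
            best = (obj, zJ, CJ, mJ2)
    _, m, CJ, mJ2 = best
    C = mJ2 / chi2.ppf(0.50, df=p) * CJ                    # Step 6: rescale to cover half
    return m, C
```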
Rousseeuw and Leroy (1987) suggest incorporating a one-step improvement to the MVE estimators, which adds the following three steps to the algorithm.
Step 7: The robust squared Mahalanobis distance for each observation is calculated in the usual manner,
$$RD_i^2 = \left(\mathbf{z}_{y,i}-\mathbf{m}\right)'\mathbf{C}^{-1}\left(\mathbf{z}_{y,i}-\mathbf{m}\right), \quad i = 1, 2, \ldots, n.$$
Step 8: A binary weighting scheme is now used. Observations with large $RD_i^2$ values are given zero weight due to their extremity in the regressor space. Small $RD_i^2$ values are an indication of good points, and thus full weight is given to these observations. Viewing the robust squared Mahalanobis distances as having an (asymptotic) chi-square distribution with $p$ degrees of freedom, the 0.975 quantile of this distribution becomes the critical cutoff value:
$$w_i = \begin{cases} 1, & \text{if } RD_i^2 \le \chi^2_{p,0.975} \\ 0, & \text{otherwise.} \end{cases}$$
Step 9: The one-step improved MVE estimators become
$$\mathbf{m} = \frac{\sum_{i=1}^{n} w_i\,\mathbf{z}_{y,i}}{\sum_{i=1}^{n} w_i}$$
and
$$\mathbf{C} = \frac{\sum_{i=1}^{n} w_i\left(\mathbf{z}_{y,i}-\mathbf{m}\right)\left(\mathbf{z}_{y,i}-\mathbf{m}\right)'}{\sum_{i=1}^{n} w_i - 1}.$$
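The one-step improvement is a straightforward reweighting pass. A sketch continuing from the illustrative `approx_mve` function above:

```python
import numpy as np
from scipy.stats import chi2

def mve_one_step(Z, m, C):
    """One-step reweighted MVE improvement (Steps 7-9 above)."""
    n, p = Z.shape
    R = Z - m
    RD2 = np.einsum('ij,jk,ik->i', R, np.linalg.inv(C), R)  # Step 7: distances
    w = (RD2 <= chi2.ppf(0.975, df=p)).astype(float)        # Step 8: binary weights
    m1 = w @ Z / w.sum()                                    # Step 9: weighted mean
    R1 = Z - m1
    C1 = (R1.T * w) @ R1 / (w.sum() - 1.0)                  # weighted covariance
    return m1, C1
```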
B.5 Exact Minimum Covariance Determinant (MCD) Estimator via the Feasible Solution Algorithm
Source: Hawkins, D. M. (1994).

Step 1: Randomly select $h$ observations. Calculate the determinant of the usual covariance matrix for this subset. The $n$ observations fall into two categories: the $h$ "active" observations and the remaining $n - h$ "trimmed" observations.
Step 2: Consider all possible pairwise swaps of one active observation and one trimmed observation. Calculate the determinant of the usual covariance matrix for each of these potentially new active subsets. If a reduction in the determinant (versus that obtained from the current active subset) is realized, then perform the swap that corresponds to the smallest observed determinant and repeat Step 2. When no such reduction in the determinant occurs, the search concludes and this active subset is considered a feasible solution.
Step 3: Repeat Steps 1 and 2 for a total of $N$ searches, from which a number of feasible solutions may be found. The MCD estimators are the mean vector and covariance matrix based on the feasible solution with the smallest determinant.
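A sketch of one FSA search plus the repeat step follows, with $h$ set to the usual half-sample size. All names are illustrative, and the exhaustive swap scan shown here is the simplest (not the fastest) way to realize Step 2.

```python
import numpy as np

def mcd_search(Z, h, rng):
    """One feasible-solution search for the MCD (Steps 1 and 2 above)."""
    n, _ = Z.shape
    active = set(rng.choice(n, size=h, replace=False))          # Step 1
    det = np.linalg.det(np.cov(Z[sorted(active)].T))
    while True:                                                 # Step 2: pairwise swaps
        best_swap, best_det = None, det
        for a in active:
            for t in set(range(n)) - active:
                trial = sorted((active - {a}) | {t})
                d = np.linalg.det(np.cov(Z[trial].T))
                if d < best_det:
                    best_swap, best_det = (a, t), d
        if best_swap is None:                                   # no reduction: feasible
            return active, det
        a, t = best_swap                                        # perform the best swap
        active = (active - {a}) | {t}
        det = best_det

# Step 3: repeat N searches; keep the solution with the smallest determinant.
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 3))
h = (50 + 3 + 1) // 2
active, det = min((mcd_search(Z, h, rng) for _ in range(10)), key=lambda s: s[1])
idx = sorted(active)
mcd_mean, mcd_cov = Z[idx].mean(axis=0), np.cov(Z[idx].T)
```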
B.6 Stalactite Plot Analysis
Source: Atkinson, A. C. and Mulira, H. M. (1993).

Step 1: Randomly select a subset, denoted by $I$, of $m = p+1$ observations.
Step 2: Calculate the location estimator, $\mathbf{m}_m$, as the coordinatewise mean vector of $I$. Then, calculate the usual covariance estimator, $\mathbf{C}_m$, as
$$\mathbf{C}_m = \frac{1}{m-1}\sum_{i\in I}\left(\mathbf{z}_{y,i}-\mathbf{m}_m\right)\left(\mathbf{z}_{y,i}-\mathbf{m}_m\right)'.$$
These initial estimators are used to create a set of $n$ robust Mahalanobis distances, found by
$$d_i(m) = \left(\mathbf{z}_{y,i}-\mathbf{m}_m\right)'\mathbf{C}_m^{-1}\left(\mathbf{z}_{y,i}-\mathbf{m}_m\right).$$
Step 3: The size, m, of subset I is now increased by 1. Those observations with the
smallest m robust distances now define subset I. Therefore, observations included
in the previous subset are not necessarily among those in the new I subset.
Step 4: Repeat Steps 2 and 3 until subset I contains all n observations.
Step 5: Repeat the entire algorithm (Steps 1 through 4) many (N) times.
Step 6: The stalactite plot is based upon the sequence that produced the largest robust distance when the subset size was some specified percentage of the sample size. Atkinson suggests using 80% or 90% to define the critical stage. However, actual estimators for location and dispersion could be acquired by scanning across all sequences and determining which subset of "half" the data produced the largest robust distance.
One final note is that, to obtain pronounced results, the robust distances are normalized before being plotted. The sum of all $n$ robust distances at stage $m$ is found for each simulation, and the average of these $N$ sums is denoted by $T(m)$. The normalized distance becomes
$$\tilde{d}_i(m) = \frac{2p(n-1)\,d_i(m)}{T(m)}.$$
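One forward sequence (Steps 1 through 4) can be sketched as below; repeating it $N$ times (Step 5), accumulating $T(m)$, and normalizing as above gives the quantities plotted. The function and variable names are illustrative.

```python
import numpy as np

def forward_sequence(Z, rng):
    """One forward sequence of robust distances (Steps 1-4 above)."""
    n, p = Z.shape
    I = rng.choice(n, size=p + 1, replace=False)   # Step 1: random starting subset
    dists = {}
    for m in range(p + 1, n + 1):                  # subset I has size m here
        mm = Z[I].mean(axis=0)                     # Step 2: coordinatewise mean of I
        RI = Z[I] - mm
        Cm = RI.T @ RI / (m - 1)                   # covariance estimator C_m
        R = Z - mm
        d = np.einsum('ij,jk,ik->i', R, np.linalg.inv(Cm), R)  # all n distances
        dists[m] = d
        if m < n:
            I = np.argsort(d)[:m + 1]              # Step 3: m+1 smallest distances
    return dists                                   # Step 4: sequence complete
```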
B.7 Hadi's Forward Search
Source: Hadi, A. S. (1992).

Step 1: Define the initial location estimator, $\mathbf{m}_0$, as the coordinatewise median vector. Then, calculate the initial covariance estimator, $\mathbf{C}_0$, as
$$\mathbf{C}_0 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\mathbf{z}_{y,i}-\mathbf{m}_0\right)\left(\mathbf{z}_{y,i}-\mathbf{m}_0\right)'.$$
These initial estimators are used to create a set of $n$ robust Mahalanobis distances, found by
$$RD_i = \left(\mathbf{z}_{y,i}-\mathbf{m}_0\right)'\mathbf{C}_0^{-1}\left(\mathbf{z}_{y,i}-\mathbf{m}_0\right).$$
Step 2: Trimmed location and covariance estimators are now determined by using only those observations that correspond to the smallest $h = [(n+p+1)/2]$ robust distances. These estimators, in turn, are used to create a new set of $n$ robust distances.
Step 3: Each observation is now placed into exactly one of two subsets, which Hadi refers to as "basic" and "non-basic". The basic subset consists of the $r = p+1$ observations having the smallest robust distances. All remaining observations are placed in the non-basic subset.
Step 4: The usual mean vector and covariance matrix based solely on the basic subset are used to calculate a new set of robust distances. Here, assuming that the basic subset is of full rank, we have
$$RD_i = \left(\mathbf{z}_{y,i}-\mathbf{m}_b\right)'\mathbf{C}_b^{-1}\left(\mathbf{z}_{y,i}-\mathbf{m}_b\right).$$
If a less than full rank situation occurs, this calculation is replaced by
$$RD_i = \left(\mathbf{z}_{y,i}-\mathbf{m}_b\right)'\mathbf{V}_b\mathbf{W}_b\mathbf{V}_b'\left(\mathbf{z}_{y,i}-\mathbf{m}_b\right),$$
where $\mathbf{V}_b$ is the matrix of normalized eigenvectors of $\mathbf{C}_b$ and $\mathbf{W}_b$ is a diagonal matrix whose $j$th diagonal element is defined as
$$w_j = \frac{1}{\max\{\lambda_j, \lambda_s\}}, \quad j = 1, 2, \ldots, p,$$
with $\lambda_s$ being the smallest non-zero eigenvalue of $\mathbf{C}_b$.
Step 5: Again, each observation is placed into either the basic or non-basic subset. Here the size, $r$, of the basic subset is increased by 1 over the previous basic subset, and the basic subset consists of the $r$ observations having the smallest robust distances. All remaining observations are placed in the non-basic subset.
Step 6: Continue the cycle of Steps 4 and 5 until the basic subset contains $h = [(n+p+1)/2]$ observations. As a side note, Hadi also mentions using a critical value for the robust distances to obtain a stopping rule, but opts to avoid this approach.
Step 7: The final basic subset is used to produce the final robust estimators of location and scale.
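A compact NumPy sketch of the full search, including the eigenvalue fallback of Step 4, follows; the function name and tolerance are illustrative.

```python
import numpy as np

def hadi_forward_search(Z):
    """Sketch of Hadi's forward search (Steps 1-7 above)."""
    n, p = Z.shape
    h = (n + p + 1) // 2
    m0 = np.median(Z, axis=0)                      # Step 1: coordinatewise median
    R = Z - m0
    C0 = R.T @ R / (n - 1)
    RD = np.einsum('ij,jk,ik->i', R, np.linalg.inv(C0), R)
    trim = np.argsort(RD)[:h]                      # Step 2: trimmed estimators
    mt = Z[trim].mean(axis=0)
    Ct = (Z[trim] - mt).T @ (Z[trim] - mt) / (h - 1)
    Rt = Z - mt
    RD = np.einsum('ij,jk,ik->i', Rt, np.linalg.inv(Ct), Rt)
    for r in range(p + 1, h + 1):                  # Steps 3, 5, 6: grow basic subset
        basic = np.argsort(RD)[:r]
        mb = Z[basic].mean(axis=0)                 # Step 4: basic-subset estimators
        Rb = Z[basic] - mb
        Cb = Rb.T @ Rb / (r - 1)
        Rz = Z - mb
        if np.linalg.matrix_rank(Cb) == p:         # full rank: usual distances
            RD = np.einsum('ij,jk,ik->i', Rz, np.linalg.inv(Cb), Rz)
        else:                                      # fallback: eigenvalue-based distances
            lam, V = np.linalg.eigh(Cb)
            lam_s = lam[lam > 1e-12].min()         # smallest non-zero eigenvalue
            W = np.diag(1.0 / np.maximum(lam, lam_s))
            RD = np.einsum('ij,jk,ik->i', Rz, V @ W @ V.T, Rz)
    return mb, Cb                                  # Step 7: final robust estimators
```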
According to Rocke and Woodruff (1996), "the algorithm ... breaks down if the contamination is extremely far away from the good data in the correct metric." They also point out that the procedure is not affine equivariant because of the use of the coordinatewise median.
VITA
David E. Lawrence, son of Henry and Shelby Lawrence, Jr., was born and raised in northeast
Ohio, graduating from Cuyahoga Falls High School in 1984. He received his Bachelor of Science
in 1990 from the University of Akron, majoring in statistics while also studying mathematics and
engineering. After two years as a graduate teaching assistant in the Mathematical Sciences
Department at the University of Akron, he received his Master of Science (1992), also in
statistics. From 1992 until 1996, he was a graduate teaching assistant in the Statistics Department
at Virginia Polytechnic Institute and State University. Since then, he has been employed as a statistician with Becton Dickinson and Company, a medical device company in New Jersey. He and his fiancée, Ms. Jane McGonigle, are to be married in September 2003.