REFERENCES

Agostinelli, C. and Markatou, M. (1998), "A one-step robust estimator for regression based on a weighted likelihood reweighting scheme," Statistics and Probability Letters, 37, 341-350.
Agullo, J. (2001), "New algorithms for computing the least trimmed squares regression estimator," Computational Statistics and Data Analysis, 36, 425-439.
Allen, D. M. (1984), Discussion of "K-clustering as a detection tool for influential subsets in regression," Technometrics, 26, 319-320.
Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H. and Tukey, J. W. (1972), Robust Estimates of Location: Surveys and Advances, Princeton, NJ: Princeton University Press.
Andrews, D. F. and Pregibon, D. (1978), "Finding the outliers that matter," Journal of the Royal Statistical Society, Ser. B, 40, 85-93.
Atkinson, A. C. (1994), "Fast very robust methods for the detection of multiple outliers," Journal of the American Statistical Association, 89, 1329-1339.
Atkinson, A. C. and Mulira, H. M. (1993), "The stalactite plot for the detection of multivariate outliers," Statistics and Computing, 3, 27-35.
Becker, C. and Gather, U. (2001), "The largest nonidentifiable outlier: a comparison of multivariate simultaneous outlier detection rules," Computational Statistics and Data Analysis, 36, 119-127.
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980), Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, New York: John Wiley & Sons.
Birch, J. B. and Agard, D. B. (1993), "Robust inference in regression: a comparative study," Communications in Statistics, 22(1), 217-244.
Birch, J. B. (1992), "Estimation and inference in multiple regression using robust weights: a unified approach," Technical Report 92-2, Department of Statistics, Virginia Polytechnic Institute and State University.
Birch, J. B. (1993), Exploratory and Robust Data Analysis, pre-publication course packet, Virginia Polytechnic Institute and State University.
Brownlee, K. A. (1965), Statistical Theory and Methodology in Science and Engineering, 2nd edition, New York: John Wiley & Sons.
Butler, R. W., Davies, P. L. and Jhun, M. (1993), "Asymptotics for the minimum covariance determinant estimator," Annals of Statistics, 21, 1385-1400.
Cleveland, W. S. (1979), "Robust locally weighted regression and smoothing scatterplots," Journal of the American Statistical Association, 74, 829-836.
Coakley, C. W. and Hettmansperger, T. P. (1993), "A bounded influence, high breakdown, efficient regression estimator," Journal of the American Statistical Association, 88, 872-880.
Cook, R. D., Hawkins, D. M. and Weisberg, S. (1993), "Exact iterative computation of the robust multivariate minimum volume ellipsoid estimator," Statistics and Probability Letters, 16, 213-218.
Cook, R. D. and Hawkins, D. M. (1990), Comment on "Unmasking multivariate outliers and leverage points," Journal of the American Statistical Association, 85, 640-644.
Cook, R. D. and Weisberg, S. (1980), "Characterizations of an empirical influence function for detecting influential cases in regression," Technometrics, 22, 495-507.
Davies, P. L. (1992), "The asymptotics of Rousseeuw's minimum volume ellipsoid estimator," The Annals of Statistics, 20, 1828-1843.
Davies, P. L. (1987), "Asymptotic behavior of S-estimates of multivariate location parameters and dispersion matrices," Technical Report, University of Essen, West Germany.
Devlin, S. J., Gnanadesikan, R. and Kettenring, J. R. (1975), "Robust estimation of dispersion matrices and principal components," Biometrika, 62, 531-545.
Donoho, D. L. (1982), "Breakdown properties of multivariate location estimators," qualifying paper, Harvard University, Boston, MA.
Donoho, D. L. and Huber, P. J. (1983), "The notion of breakdown point," in A Festschrift for Erich Lehmann, eds. P. Bickel, K. Doksum and J. L. Hodges, Jr., Belmont, CA: Wadsworth.
Gnanadesikan, R. and Kettenring, J. R. (1972), "Robust estimates, residuals, and outlier detection with multiresponse data," Biometrics, 28, 81-124.
Gray, J. B. (1988), "A classification of influence measures," Journal of Statistical Computation and Simulation, 30, 159-171.
Gray, J. B. (1986), "A simple graphic for assessing influence in regression," Journal of Statistical Computation and Simulation, 24, 121-134.
Gray, J. B. and Ling, R. F. (1985), Response to Hadi's Letter to the Editor on "K-clustering as a detection tool for influential subsets in regression," Technometrics, 27, 324-325.
Gray, J. B. and Ling, R. F. (1984), "K-clustering as a detection tool for influential subsets in regression," Technometrics, 26, 305-318.
Gray, J. B. and Ling, R. F. (1984), Response to discussions of "K-clustering as a detection tool for influential subsets in regression," Technometrics, 26, 326-330.
Hadi, A. S. (1992), "Identifying multiple outliers in multivariate data," Journal of the Royal Statistical Society, Ser. B, 54, 761-771.
Hadi, A. S. (1985), Letter to the Editor on "K-clustering as a detection tool for influential subsets in regression," Technometrics, 27, 323.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986), Robust Statistics: The Approach Based on Influence Functions, New York: John Wiley & Sons.
Hartigan, J. A. (1975), Clustering Algorithms, New York: John Wiley & Sons.
Hawkins, D. M. (1993), "A feasible solution algorithm for the minimum volume ellipsoid estimator in multivariate data," Computational Statistics and Data Analysis, 8, 95-107.
Hawkins, D. M. (1994), "A feasible solution algorithm for the minimum covariance determinant estimator," Computational Statistics and Data Analysis, 17, 197-210.
Hawkins, D. M. (1995), "Convergence of the feasible solution algorithm for least median of squares regression," Computational Statistics and Data Analysis, 19, 519-538.
Hawkins, D. M., Bradu, D. and Kass, G. V. (1984), "Location of several outliers in multiple-regression data using elemental subsets," Technometrics, 26, 197-208.
Hawkins, D. M. and Olive, D. (1999), "Applications and algorithms for least trimmed sum of absolute deviations regression," Computational Statistics and Data Analysis, 32, 119-134.
Hawkins, D. M. and Olive, D. (1999), "Improved feasible solution algorithms for high breakdown estimation," Computational Statistics and Data Analysis, 30, 1-11.
He, X. and Portnoy, S. (1992), "Reweighted LS estimators converge at the same rate as the initial estimator," The Annals of Statistics, 20, 2161-2167.
Hettmansperger, T. P. and Sheather, S. J. (1992), "A cautionary note on the method of least median squares," The American Statistician, 46, 79-83.
Hocking, R. R. (1984), Discussion of "K-clustering as a detection tool for influential subsets in regression," Technometrics, 26, 321-323.
Huber, P. J. (1981), Robust Statistics, New York: John Wiley & Sons.
Johnson, R. A. and Wichern, D. W. (1992), Applied Multivariate Statistical Analysis, Englewood Cliffs, NJ: Prentice-Hall.
Jolliffe, I. T. and Penny, K. I. (2001), "A comparison of multivariate outlier detection methods for clinical laboratory safety data," The Statistician, 50, 295-308.
Kempthorne, P. J. and Mendel, M. B. (1990), Comment on "Unmasking multivariate outliers and leverage points," Journal of the American Statistical Association, 85, 647-651.
Lawrence, D. E. (1996), Cluster-Based Bounded Influence Regression, abstract from the Joint Statistical Meetings of the American Statistical Association, Chicago, IL, Section 240.
Lopuhaa, H. P. (1992), "Highly efficient estimators of multivariate location with high breakdown point," Annals of Statistics, 20, 398-413.
Lopuhaa, H. P. and Rousseeuw, P. J. (1991), "Breakdown points of affine equivariant estimators of multivariate location and covariance matrices," Annals of Statistics, 19, 229-248.
Markatou, M. and He, X. (1994), "Bounded influence and high breakdown point testing procedures in linear models," Journal of the American Statistical Association, 89, 543-549.
Maronna, R. A. and Yohai, V. J. (1981), "Asymptotic behavior of general M-estimates for regression and scale with random carriers," Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 58, 7-20.
Maronna, R., Bustos, O. and Yohai, V. (1979), "Bias- and efficiency-robustness of general M-estimators for regression with random carriers," in Smoothing Techniques for Curve Estimation, eds. T. Gasser and M. Rosenblatt, New York: Springer-Verlag, 91-116.
Morrison, D. F. (1990), Multivariate Statistical Methods, New York: McGraw-Hill.
Myers, R. H. (1990), Classical and Modern Regression with Applications, Boston: PWS-Kent.
Pendleton, O. J. and Hocking, R. R. (1981), "Diagnostic techniques in multiple linear regression using PROC MATRIX," SUGI, 195-201.
Rocke, D. M. and Woodruff, D. L. (1996), "Identification of outliers in multivariate data," Journal of the American Statistical Association, 91, 1047-1061.
Rousseeuw, P. J. (1993), "A resampling design for computing high-breakdown regression," Statistics and Probability Letters, 18, 125-128.
Rousseeuw, P. J. and Bassett, G. W. (1991), "Robustness of the p-subset algorithm for regression with high breakdown point," in Directions in Robust Statistics and Diagnostics, Part II, ed. W. Stahel, 185-194.
Rousseeuw, P. J. and van Zomeren, B. C. (1990), "Unmasking multivariate outliers and leverage points," Journal of the American Statistical Association, 85, 633-639.
Rousseeuw, P. J. and Leroy, A. M. (1987), Robust Regression and Outlier Detection, New York: John Wiley & Sons.
Rousseeuw, P. J. (1984), "Least median of squares regression," Journal of the American Statistical Association, 79, 871-880.
Ruiz-Gazen, A. (1996), "A very simple robust estimator of a dispersion matrix," Computational Statistics and Data Analysis, 21, 149-162.
Ruppert, D. and Simpson, D. G. (1990), Comment on "Unmasking multivariate outliers and leverage points," Journal of the American Statistical Association, 85, 644-646.
Sebert, D. M., Montgomery, D. C. and Rollier, D. A. (1998), "A clustering algorithm for identifying multiple outliers in linear regression," Computational Statistics and Data Analysis, 27, 461-484.
Simpson, D. G., Ruppert, D. and Carroll, R. J. (1992), "On one-step GM estimates and stability of inferences in linear regression," Journal of the American Statistical Association, 87, 439-450.
Stahel, W. A. (1981), "Robuste Schätzungen: Infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen" ("Robust estimation: infinitesimal optimality and estimation of covariance matrices"), Ph.D. thesis, ETH Zürich, Switzerland.
Staudte, R. G. and Sheather, S. J. (1990), Robust Estimation and Testing, New York: John Wiley & Sons.
Stromberg, A. J. (1993), "Computation of high breakdown nonlinear regression parameters," Journal of the American Statistical Association, 88, 237-244.
Walker, E. (1984), "Influence, collinearity and robust estimation in regression," Ph.D. dissertation, Virginia Polytechnic Institute and State University.
Weisberg, S. (1984), Discussion of "K-clustering as a detection tool for influential subsets in regression," Technometrics, 26, 324-325.
Welsch, R. E. (1982), "Influence functions and regression diagnostics," in Modern Data Analysis, eds. R. Launer and A. Siegel, New York: Academic Press, 149-169.
Wisnowski, J. W., Montgomery, D. C. and Simpson, J. R. (2001), "A comparative analysis of multiple outlier detection procedures in the linear regression model," Computational Statistics and Data Analysis, 36, 351-382.
Woodruff, D. L. and Rocke, D. M. (1994), "Computable robust estimation of multivariate location and shape using compound estimators," Journal of the American Statistical Association, 89, 888-896.
You, J. (1999), "A Monte Carlo comparison of several high breakdown and efficient estimators," Computational Statistics and Data Analysis, 30, 205-219.
APPENDIX A
Datasets

A.1 Stackloss Data
Reference: Brownlee, K. A. (1965), Statistical Theory and Methodology in Science and Engineering, 2nd edition, New York: John Wiley & Sons.

Table A.1: Stackloss data.
Step 1: First, an initial robust location estimator, $\hat{\boldsymbol{\mu}}$, is determined. This is used to compute an intermediate dispersion matrix defined as
$$\mathbf{V} = \frac{1}{n}\sum_{i=1}^{n}\left(\mathbf{z}_{y,i}-\hat{\boldsymbol{\mu}}\right)\left(\mathbf{z}_{y,i}-\hat{\boldsymbol{\mu}}\right)'.$$
Step 2: A weight function whose argument is proportional to the robust distance based on $(\hat{\boldsymbol{\mu}},\mathbf{V})$ produces individual observation weights, defined as
$$w_i = K\!\left(\beta\left(\mathbf{z}_{y,i}-\hat{\boldsymbol{\mu}}\right)'\mathbf{V}^{-1}\left(\mathbf{z}_{y,i}-\hat{\boldsymbol{\mu}}\right)\right),$$
with $K$ being a positive and decreasing function and $\beta$ being any non-negative scalar. These weights are used in computing a one-step dispersion estimator, found by
$$\mathbf{C} = \frac{\sum_{i=1}^{n} w_i\left(\mathbf{z}_{y,i}-\hat{\boldsymbol{\mu}}\right)\left(\mathbf{z}_{y,i}-\hat{\boldsymbol{\mu}}\right)'}{\sum_{i=1}^{n} w_i - 1}.$$
Step 3: Finally, the robust dispersion matrix is found via the transformation
$$\mathbf{U} = \left(\mathbf{C}^{-1} - \beta\mathbf{V}^{-1}\right)^{-1},$$
by which consistency is attained.

In order to obtain $\mathbf{U}$, it is necessary to define the initial location estimator, $\hat{\boldsymbol{\mu}}$, the weight function, $K$, and the scalar, $\beta$. Recommendations are that $\hat{\boldsymbol{\mu}}$ be the coordinatewise median vector, $K(x) = \exp(-x/2)$, and $\beta = \varepsilon^{2}/(6p)$, where $\varepsilon$ represents the percentage of contamination in the data (in the article, $\varepsilon \le 0.20$ was investigated). The breakdown point of this procedure is suggested to be roughly 20%, and the choice of $\beta$ appears to have a rather large impact on whether extreme observations are detected.
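Read as linear algebra, the three steps amount to only a few lines. The following is a minimal NumPy sketch under the recommended choices above (coordinatewise median and $K(x)=\exp(-x/2)$); the function name and the simulated data are illustrative only, and the formulas follow the reconstruction given here rather than the article's own notation.

```python
import numpy as np

def one_step_dispersion(Z, beta):
    """One-step robust dispersion estimator (sketch of Steps 1-3 above)."""
    n, p = Z.shape
    mu = np.median(Z, axis=0)                   # Step 1: coordinatewise median
    R = Z - mu                                  # centered data
    V = R.T @ R / n                             # intermediate dispersion matrix V
    d = np.einsum('ij,jk,ik->i', R, np.linalg.inv(V), R)  # robust distances
    w = np.exp(-beta * d / 2.0)                 # Step 2: weights K(beta * d)
    C = (R.T * w) @ R / (w.sum() - 1.0)         # one-step weighted dispersion
    U = np.linalg.inv(np.linalg.inv(C) - beta * np.linalg.inv(V))  # Step 3
    return mu, U

# Illustrative use: bivariate data with 5% gross contamination.
rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 2))
Z[:5] += 10.0
mu, U = one_step_dispersion(Z, beta=0.20**2 / (6 * 2))
```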
B.3 Exact Minimum Volume Ellipsoid (MVE) Estimator
Source: Hawkins, D. M. (1993).

Step 1: Given a subset containing $h$ observations, initially (at iteration stage $k = 0$) give each observation equal weight, $w_i^{(0)} = h^{-1}$.
Step 2: Calculate the weighted mean vector and associated weighted covariance matrix. Compute all $n$ individual robust Mahalanobis distances, denoted by $D_i(w^{(k)})$, and determine the maximum such distance, $D_{\max}(w^{(k)})$.
Step 3: If $D_{\max}(w^{(k)}) < p + \varepsilon$, where $\varepsilon$ is a convergence bound, the iteration process stops. Otherwise, update the weights for iteration $k+1$ by
$$w_i^{(k+1)} = \frac{w_i^{(k)}\,D_i(w^{(k)})}{p},$$
and go back to Step 2.
Step 4: Repeat Steps 1, 2, and 3 for every subset of size $h$. Since the volume of an ellipsoid is proportional to the square root of the determinant of the weighted covariance matrix, the exact MVE estimators are the weighted mean vector and associated weighted covariance matrix corresponding to the subset yielding the smallest such determinant. The weights across these $h$ observations will not, in general, be equal.
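The inner loop (Steps 1 through 3) for a single subset can be sketched as follows in NumPy. The sketch assumes the distances are squared Mahalanobis distances (so the $p+\varepsilon$ cutoff matches their expected scale) and that the reweighting acts on the $h$ active observations; the function and variable names are illustrative, not Hawkins's.

```python
import numpy as np

def mve_reweight(Z, subset, eps=1e-6, max_iter=500):
    """Iterative reweighting of one h-subset (Steps 1-3 above)."""
    S = Z[subset]                        # the h active observations
    h, p = S.shape
    w = np.full(h, 1.0 / h)              # Step 1: equal initial weights
    for _ in range(max_iter):
        m = w @ S / w.sum()              # weighted mean vector
        R = S - m
        C = (R.T * w) @ R / w.sum()      # weighted covariance matrix
        D = np.einsum('ij,jk,ik->i', R, np.linalg.inv(C), R)  # squared distances
        if D.max() < p + eps:            # Step 3: convergence test
            break
        w = w * D / p                    # multiplicative weight update
    return m, C, np.linalg.det(C)
```

Running this for every subset of size $h$ and keeping the pair with the smallest determinant then carries out Step 4.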
B.4 Approximate Minimum Volume Ellipsoid (MVE) Estimator via the Random Subsampling Algorithm
Source: Rousseeuw and Leroy (1987).

Step 1: The number of subsets, $N$, needed is found by equating the probability that at least one of the $N$ elemental sets contains only good points to an acceptable value near 1; 0.999 is cited often (Rousseeuw and Leroy (1987)). Let $\varepsilon$ denote the percentage of contamination in the data. This produces the equality
$$0.999 = 1 - \left(1 - (1-\varepsilon)^{p+1}\right)^{N},$$
which, by letting $\varepsilon = 0.5$, means that the necessary number of random subsets is
$$N = \frac{\ln(0.001)}{\ln\!\left(1 - (0.5)^{p+1}\right)}.$$
Many times this step is not employed; instead, the researcher arbitrarily selects $N$ to be 500 or 1000 for convenience.
Step 2: Randomly select $p+1$ points; $J$ will refer to this subset of points. These points need to form a full rank design space, so discard the sample otherwise.
Step 3: Determine the $p \times 1$ mean vector and $p \times p$ covariance matrix:
$$\bar{\mathbf{z}}_J = \frac{1}{p+1}\sum_{i\in J}\mathbf{z}_{y,i}$$
and
$$\mathbf{C}_J = \frac{1}{p}\sum_{i\in J}\left(\mathbf{z}_{y,i}-\bar{\mathbf{z}}_J\right)\left(\mathbf{z}_{y,i}-\bar{\mathbf{z}}_J\right)'.$$
Step 4: Calculate the set of $n$ robust squared Mahalanobis distances, $RD_i^2$, and find their median, $m_J^2$:
$$m_J^2 = \operatorname{med}_{\forall i} RD_i^2 = \operatorname{med}_{\forall i}\left(\mathbf{z}_{y,i}-\bar{\mathbf{z}}_J\right)'\mathbf{C}_J^{-1}\left(\mathbf{z}_{y,i}-\bar{\mathbf{z}}_J\right).$$
Step 5: The objective is to minimize the volume of the ellipsoid, where
$$\text{Volume} \propto \left(\det(\mathbf{C}_J)\right)^{1/2}\left(m_J^2\right)^{p/2}.$$
Thus, for each elemental set, calculate $\left(\det(\mathbf{C}_J)\right)^{1/2}\left(m_J^2\right)^{p/2}$. If this represents the smallest objective function calculated thus far, store this elemental set as the current best subset.
Step 6: After all $N$ elemental sets have been drawn and analyzed, the MVE estimators are based on the best subset from Step 5, and calculated by
$$\mathbf{m} = \bar{\mathbf{z}}_J$$
and
$$\mathbf{C} = m_J^2\left(\chi^2_{p,0.50}\right)^{-1}\mathbf{C}_J.$$
The location vector, $\mathbf{m}$, is just the average coordinate vector of the best elemental set. The covariance matrix of the best elemental set is multiplied by $m_J^2\left(\chi^2_{p,0.50}\right)^{-1}$ to expand (or shrink) the ellipsoid and ensure that half the data is contained, forming $\mathbf{C}$.
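Steps 1 through 6 fit naturally into a single loop over elemental sets. Below is a minimal NumPy/SciPy sketch (SciPy supplies the $\chi^2_{p,0.50}$ quantile); the function name, seed, and the divisor $p$ in the elemental covariance follow the reconstruction above and are illustrative, not the book's code.

```python
import numpy as np
from math import ceil, log
from scipy.stats import chi2

def approx_mve(Z, N=None, seed=0):
    """Approximate MVE by random (p+1)-subsampling (Steps 1-6 above)."""
    n, p = Z.shape
    if N is None:                               # Step 1: subsets for 0.999 assurance
        N = ceil(log(0.001) / log(1.0 - 0.5 ** (p + 1)))
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(N):
        J = rng.choice(n, size=p + 1, replace=False)       # Step 2: elemental set
        zJ = Z[J].mean(axis=0)                             # Step 3: mean and covariance
        RJ = Z[J] - zJ
        CJ = RJ.T @ RJ / p
        if np.linalg.matrix_rank(CJ) < p:                  # discard rank-deficient sets
            continue
        R = Z - zJ
        RD2 = np.einsum('ij,jk,ik->i', R, np.linalg.inv(CJ), R)
        mJ2 = np.median(RD2)                               # Step 4: median squared distance
        obj = np.sqrt(np.linalg.det(CJ)) * mJ2 ** (p / 2)  # Step 5: ellipsoid volume
        if best is None or obj < best[0]:
            best = (obj, zJ, CJ, mJ2)
    _, m, CJ, mJ2 = best
    C = mJ2 / chi2.ppf(0.50, df=p) * CJ                    # Step 6: rescale to cover half
    return m, C
```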
Rousseeuw and Leroy (1987) suggest incorporating a one-step improvement to the MVE estimators, which adds the following three steps to the algorithm.
Step 7: The robust squared Mahalanobis distance for each observation is calculated in the usual manner,
$$RD_i^2 = \left(\mathbf{z}_{y,i}-\mathbf{m}\right)'\mathbf{C}^{-1}\left(\mathbf{z}_{y,i}-\mathbf{m}\right), \quad i = 1, 2, \ldots, n.$$
Step 8: A binary weighting scheme is now used. Observations with large $RD_i^2$ values are given zero weight due to their extremity in the regressor space. Small $RD_i^2$ values are an indication of good points, and thus full weight is given to these observations. Viewing the robust squared Mahalanobis distances as having an (asymptotic) chi-square distribution with $p$ degrees of freedom, the 0.975 quantile of this distribution becomes the critical cutoff value:
$$w_i = \begin{cases} 1, & \text{if } RD_i^2 \le \chi^2_{p,0.975} \\ 0, & \text{otherwise.} \end{cases}$$
Step 9: The one-step improved MVE estimators become
$$\mathbf{m} = \frac{\sum_{i=1}^{n} w_i\,\mathbf{z}_{y,i}}{\sum_{i=1}^{n} w_i}$$
and
$$\mathbf{C} = \frac{\sum_{i=1}^{n} w_i\left(\mathbf{z}_{y,i}-\mathbf{m}\right)\left(\mathbf{z}_{y,i}-\mathbf{m}\right)'}{\sum_{i=1}^{n} w_i - 1}.$$
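The one-step improvement is a straightforward reweighting pass. A sketch continuing from the illustrative `approx_mve` function above:

```python
import numpy as np
from scipy.stats import chi2

def mve_one_step(Z, m, C):
    """One-step reweighted MVE improvement (Steps 7-9 above)."""
    n, p = Z.shape
    R = Z - m
    RD2 = np.einsum('ij,jk,ik->i', R, np.linalg.inv(C), R)  # Step 7: distances
    w = (RD2 <= chi2.ppf(0.975, df=p)).astype(float)        # Step 8: binary weights
    m1 = w @ Z / w.sum()                                    # Step 9: weighted mean
    R1 = Z - m1
    C1 = (R1.T * w) @ R1 / (w.sum() - 1.0)                  # weighted covariance
    return m1, C1
```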
B.5 Exact Minimum Covariance Determinant (MCD) Estimator via the Feasible Solution Algorithm
Source: Hawkins, D. M. (1994).

Step 1: Randomly select $h$ observations. Calculate the determinant of the usual covariance matrix for this subset. The $n$ observations fall into two categories: the $h$ "active" observations and the remaining $n - h$ "trimmed" observations.
Step 2: Consider all possible pairwise swaps of one active observation and one trimmed observation. Calculate the determinant of the usual covariance matrix for each of these potentially new active subsets. If a reduction in the determinant (versus that obtained from the current active subset) is realized, then perform the swap that corresponds to the smallest observed determinant and repeat Step 2. When no such reduction in the determinant occurs, the search concludes and this active subset is considered a feasible solution.
Step 3: Repeat Steps 1 and 2 for a total of $N$ searches, from which a number of feasible solutions may be found. The MCD estimators are the mean vector and covariance matrix based on the feasible solution with the smallest determinant.
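A sketch of one FSA search plus the repeat step follows, with $h$ set to the usual half-sample size. All names are illustrative, and the exhaustive swap scan shown here is the simplest (not the fastest) way to realize Step 2.

```python
import numpy as np

def mcd_search(Z, h, rng):
    """One feasible-solution search for the MCD (Steps 1 and 2 above)."""
    n, _ = Z.shape
    active = set(rng.choice(n, size=h, replace=False))          # Step 1
    det = np.linalg.det(np.cov(Z[sorted(active)].T))
    while True:                                                 # Step 2: pairwise swaps
        best_swap, best_det = None, det
        for a in active:
            for t in set(range(n)) - active:
                trial = sorted((active - {a}) | {t})
                d = np.linalg.det(np.cov(Z[trial].T))
                if d < best_det:
                    best_swap, best_det = (a, t), d
        if best_swap is None:                                   # no reduction: feasible
            return active, det
        a, t = best_swap                                        # perform the best swap
        active = (active - {a}) | {t}
        det = best_det

# Step 3: repeat N searches; keep the solution with the smallest determinant.
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 3))
h = (50 + 3 + 1) // 2
active, det = min((mcd_search(Z, h, rng) for _ in range(10)), key=lambda s: s[1])
idx = sorted(active)
mcd_mean, mcd_cov = Z[idx].mean(axis=0), np.cov(Z[idx].T)
```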
B.6 Stalactite Plot Analysis
Source: Atkinson, A. C. and Mulira, H. M. (1993).

Step 1: Randomly select a subset, denoted by $I$, of $m = p+1$ observations.
Step 2: Calculate the location estimator, $\mathbf{m}_m$, as the coordinatewise mean vector of $I$. Then, calculate the usual covariance estimator, $\mathbf{C}_m$, as
$$\mathbf{C}_m = \frac{1}{m-1}\sum_{i\in I}\left(\mathbf{z}_{y,i}-\mathbf{m}_m\right)\left(\mathbf{z}_{y,i}-\mathbf{m}_m\right)'.$$
These initial estimators are used to create a set of $n$ robust Mahalanobis distances, found by
$$d_i(m) = \left(\mathbf{z}_{y,i}-\mathbf{m}_m\right)'\mathbf{C}_m^{-1}\left(\mathbf{z}_{y,i}-\mathbf{m}_m\right).$$
Step 3: The size, m, of subset I is now increased by 1. Those observations with the
smallest m robust distances now define subset I. Therefore, observations included
in the previous subset are not necessarily among those in the new I subset.
Step 4: Repeat Steps 2 and 3 until subset I contains all n observations.
Step 5: Repeat the entire algorithm (Steps 1 through 4) many (N) times.
Step 6: The stalactite plot is based upon the sequence that produced the largest robust distance when the subset size was some specified percentage of the sample size. Atkinson suggests using 80% or 90% to define the critical stage. However, actual estimators for location and dispersion could be acquired by scanning across all sequences and determining which subset of "half" the data produced the largest robust distance.
One final note is that, to obtain pronounced results, the robust distances are normalized before being plotted. The sum of all $n$ robust distances at stage $m$ is found for each simulation, and the average of these $N$ sums is denoted by $T(m)$. The normalized distance becomes
$$\tilde{d}_i(m) = \frac{2p(n-1)\,d_i(m)}{T(m)}.$$
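One forward sequence (Steps 1 through 4) can be sketched as below; repeating it $N$ times (Step 5), accumulating $T(m)$, and normalizing as above gives the quantities plotted. The function and variable names are illustrative.

```python
import numpy as np

def forward_sequence(Z, rng):
    """One forward sequence of robust distances (Steps 1-4 above)."""
    n, p = Z.shape
    I = rng.choice(n, size=p + 1, replace=False)   # Step 1: random starting subset
    dists = {}
    for m in range(p + 1, n + 1):                  # subset I has size m here
        mm = Z[I].mean(axis=0)                     # Step 2: coordinatewise mean of I
        RI = Z[I] - mm
        Cm = RI.T @ RI / (m - 1)                   # covariance estimator C_m
        R = Z - mm
        d = np.einsum('ij,jk,ik->i', R, np.linalg.inv(Cm), R)  # all n distances
        dists[m] = d
        if m < n:
            I = np.argsort(d)[:m + 1]              # Step 3: m+1 smallest distances
    return dists                                   # Step 4: sequence complete
```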
B.7 Hadi's Forward Search
Source: Hadi, A. S. (1992).

Step 1: Define the initial location estimator, $\mathbf{m}_0$, as the coordinatewise median vector. Then, calculate the initial covariance estimator, $\mathbf{C}_0$, as
$$\mathbf{C}_0 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\mathbf{z}_{y,i}-\mathbf{m}_0\right)\left(\mathbf{z}_{y,i}-\mathbf{m}_0\right)'.$$
These initial estimators are used to create a set of $n$ robust Mahalanobis distances, found by
$$RD_i = \left(\mathbf{z}_{y,i}-\mathbf{m}_0\right)'\mathbf{C}_0^{-1}\left(\mathbf{z}_{y,i}-\mathbf{m}_0\right).$$
Step 2: Trimmed location and covariance estimators are now determined by using only those observations that correspond to the smallest $h = [(n+p+1)/2]$ robust distances. These estimators, in turn, are used to create a new set of $n$ robust distances.
Step 3: Each observation is now placed into exactly one of two subsets, which Hadi refers to as "basic" and "non-basic". The basic subset consists of the $r = p+1$ observations having the smallest robust distances. All remaining observations are placed in the non-basic subset.
Step 4: The usual mean vector and covariance matrix based solely on the basic subset are used to calculate a new set of robust distances. Here, assuming that the basic subset is of full rank, we have
$$RD_i = \left(\mathbf{z}_{y,i}-\mathbf{m}_b\right)'\mathbf{C}_b^{-1}\left(\mathbf{z}_{y,i}-\mathbf{m}_b\right).$$
If a less than full rank situation occurs, this calculation is replaced by
$$RD_i = \left(\mathbf{z}_{y,i}-\mathbf{m}_b\right)'\mathbf{V}_b\mathbf{W}_b\mathbf{V}_b'\left(\mathbf{z}_{y,i}-\mathbf{m}_b\right),$$
where $\mathbf{V}_b$ is the matrix of normalized eigenvectors of $\mathbf{C}_b$ and $\mathbf{W}_b$ is a diagonal matrix whose $j$th diagonal element is defined as
$$w_j = \frac{1}{\max\{\lambda_j, \lambda_s\}}, \quad j = 1, 2, \ldots, p,$$
with $\lambda_s$ being the smallest non-zero eigenvalue of $\mathbf{C}_b$.
Step 5: Again, each observation is placed into either the basic or non-basic subset. Here the size, $r$, of the basic subset is increased by 1 over the previous basic subset, and the basic subset consists of the $r$ observations having the smallest robust distances. All remaining observations are placed in the non-basic subset.
Step 6: Continue the cycle of Steps 4 and 5 until the basic subset contains $h = [(n+p+1)/2]$ observations. As a side note, Hadi also mentions using a critical value for the robust distances to obtain a stopping rule, but opts to avoid this approach.
Step 7: The final basic subset is used to produce the final robust estimators of location and scale.
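A compact NumPy sketch of the full search, including the eigenvalue fallback of Step 4, follows; the function name and tolerance are illustrative.

```python
import numpy as np

def hadi_forward_search(Z):
    """Sketch of Hadi's forward search (Steps 1-7 above)."""
    n, p = Z.shape
    h = (n + p + 1) // 2
    m0 = np.median(Z, axis=0)                      # Step 1: coordinatewise median
    R = Z - m0
    C0 = R.T @ R / (n - 1)
    RD = np.einsum('ij,jk,ik->i', R, np.linalg.inv(C0), R)
    trim = np.argsort(RD)[:h]                      # Step 2: trimmed estimators
    mt = Z[trim].mean(axis=0)
    Ct = (Z[trim] - mt).T @ (Z[trim] - mt) / (h - 1)
    Rt = Z - mt
    RD = np.einsum('ij,jk,ik->i', Rt, np.linalg.inv(Ct), Rt)
    for r in range(p + 1, h + 1):                  # Steps 3, 5, 6: grow basic subset
        basic = np.argsort(RD)[:r]
        mb = Z[basic].mean(axis=0)                 # Step 4: basic-subset estimators
        Rb = Z[basic] - mb
        Cb = Rb.T @ Rb / (r - 1)
        Rz = Z - mb
        if np.linalg.matrix_rank(Cb) == p:         # full rank: usual distances
            RD = np.einsum('ij,jk,ik->i', Rz, np.linalg.inv(Cb), Rz)
        else:                                      # fallback: eigenvalue-based distances
            lam, V = np.linalg.eigh(Cb)
            lam_s = lam[lam > 1e-12].min()         # smallest non-zero eigenvalue
            W = np.diag(1.0 / np.maximum(lam, lam_s))
            RD = np.einsum('ij,jk,ik->i', Rz, V @ W @ V.T, Rz)
    return mb, Cb                                  # Step 7: final robust estimators
```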
According to Rocke and Woodruff (1996), "the algorithm ... breaks down if the contamination is extremely far away from the good data in the correct metric." They also point out that the procedure is not affine equivariant because of the use of the coordinatewise median.
VITA
David E. Lawrence, son of Henry and Shelby Lawrence, Jr., was born and raised in northeast
Ohio, graduating from Cuyahoga Falls High School in 1984. He received his Bachelor of Science
in 1990 from the University of Akron, majoring in statistics while also studying mathematics and
engineering. After two years as a graduate teaching assistant in the Mathematical Sciences
Department at the University of Akron, he received his Master of Science (1992), also in
statistics. From 1992 until 1996, he was a graduate teaching assistant in the Statistics Department
at Virginia Polytechnic Institute and State University. Since then, he has been employed as a statistician with Becton Dickinson and Company, a medical device company in New Jersey. He and his fiancée, Ms. Jane McGonigle, are to be married in September 2003.