Robust regression: Robust estimation of regression coefficients in the linear regression model
Jiří Franc, Czech Technical University, Faculty of Nuclear Sciences and Physical Engineering, Department of Mathematics
16. 11. 2009
In the linear regression model Y_i = X_i^T β^0 + e_i, i = 1, . . . , n:
X_i = (X_{i,1}, X_{i,2}, . . . , X_{i,p})^T is called the vector of explanatory variables or regressors; it is a sequence of deterministic p-dimensional vectors or a sequence of random variables,
Y_i is called the response variable (dependent variable) and is the i-th element of the random sequence of observations,
β^0 = (β^0_1, β^0_2, . . . , β^0_p)^T is the p-dimensional vector of true regression coefficients,
e_i is called the sequence of disturbances (error terms); it represents unexplained variation in the dependent variable and is a sequence of random variables.
Under certain conditions OLS is the best linear unbiased estimator of β^0.
Under certain conditions OLS is the best estimator among all unbiased estimators (ordinary least squares is optimal for multiple regression when the iid errors are normally distributed).
OLS is not robust and consequently often gives false results for real data (even a single regression outlier can completely spoil the OLS estimator).
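The effect of a single outlier on OLS can be illustrated numerically. A minimal sketch (the data, seed and coefficients are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])          # design matrix with intercept
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5, n)   # true beta0 = (2, 3)

beta_clean, *_ = np.linalg.lstsq(X, y, rcond=None)

y_bad = y.copy()
y_bad[-1] = -100.0                            # a single gross outlier in y
beta_bad, *_ = np.linalg.lstsq(X, y_bad, rcond=None)

print("clean fit:", beta_clean)               # close to (2, 3)
print("with one outlier:", beta_bad)          # slope pulled far from 3
```

One corrupted observation out of fifty is enough to move the OLS slope by more than a full unit.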
Goals of robust regression:
description of the structure best fitting the bulk of the data,
identification of deviating data points (outliers) or deviating substructures for further treatment,
identification of highly influential data points (leverage points), or at least a warning about them,
dealing with unsuspected serial correlations.
Two ways to deal with regression outliers:
Regression diagnostics: certain quantities are computed from the data with the purpose of pinpointing influential points, after which these outliers can be removed or corrected.
Robust regression: tries to devise estimators that are not so strongly affected by outliers.
where (Y_(1), . . . , Y_(n)) is the order statistic and α is a fixed coefficient, α ∈ [0, 1].
α-regression quantile (Koenker, Bassett (1978))
β̂(L_rq, α) = arg min_{β∈R^p} Σ_{i=1}^n ρ_α(r_i(β)),
where r_i(β) = Y_i − X_i^T β, i = 1, . . . , n, and
ρ_α(r) = α|r| if r ≥ 0, (1 − α)|r| if r < 0.
L-estimators are scale equivariant and regression equivariant. However, the breakdown point is still 0%, and for α = 0.5 the regression quantile estimator coincides with the least absolute values (L1) estimator.
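The 0.5-regression quantile (least absolute values fit) can be computed exactly as a linear program, splitting each residual into its positive and negative parts. A sketch using scipy (the toy data and the outlier are invented for illustration):

```python
import numpy as np
from scipy.optimize import linprog

def quantile_fit(X, y, alpha=0.5):
    # LP formulation: write r_i = u_i - v_i with u_i, v_i >= 0 and minimize
    # sum(alpha * u_i + (1 - alpha) * v_i) subject to X b + u - v = y
    n, p = X.shape
    c = np.concatenate([np.zeros(p), alpha * np.ones(n), (1 - alpha) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

x = np.arange(10.0)
X = np.column_stack([np.ones(10), x])
y = 1.0 + 2.0 * x
y[9] = 100.0                 # one gross outlier in y
beta = quantile_fit(X, y)    # median regression recovers (1, 2) despite it
```

Setting alpha to other values in (0, 1) gives the corresponding regression quantile.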
OLS and L1 are also M-estimators, with ψ(t) = t for OLS and ψ(t) = sgn(t) for the L1 estimate.
M-estimators are unfortunately not scale equivariant, even though they are regression equivariant. Hence one necessarily has to studentize the M-estimators by an estimate σ̂ of the scale of the disturbances:
β̂(M) = arg min_{β∈R^p} Σ_{i=1}^n ρ(r_i(β)/σ̂).
One possibility is to use the median absolute deviation (MAD):
σ̂ = C · med_i |r_i − med_j r_j|,
where C is a constant (correction factor) which depends on the distribution. For normally distributed data C = 1.4826.
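A small numpy sketch of this scale estimate (the test data are invented; the constant 1.4826 makes MAD consistent for the standard deviation under normality):

```python
import numpy as np

def mad_scale(r, c=1.4826):
    # sigma_hat = C * med_i | r_i - med_j r_j |
    return c * np.median(np.abs(r - np.median(r)))

rng = np.random.default_rng(1)
r = rng.normal(0.0, 2.0, 100_000)
s_clean = mad_scale(r)                      # approximately 2.0

r_bad = np.concatenate([r, np.full(1000, 1e6)])
s_bad = mad_scale(r_bad)                    # barely moves: MAD ignores the outliers
```

Contaminating 1% of the sample with enormous values changes the estimate only marginally, in contrast to the sample standard deviation.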
Let the vectors (Y_i, X_i^T)^T be iid with distribution function F(X, Y). If the function ρ has an absolutely continuous derivative ψ, σ = 1 for simplicity, and T(F) is the functional corresponding to the M-estimator, then T(F) is the solution of
∫ ψ(Y − X^T T(F)) X dF(X, Y) = 0.
Define
M(ψ, F) := ∫ ψ′(Y − X^T T(F)) X X^T dF(X, Y).
Then the influence function of T at a distribution F (on R^p × R) is given by
IF(X_0, Y_0; T, F) = M^{-1}(ψ, F) X_0 ψ(Y_0 − X_0^T T(F)).
The influence function with respect to Y_0 can be bounded by the choice of ψ, but the influence function of M-estimators is unbounded with respect to X_0.
The breakdown point of M-estimators is 0% due to this vulnerability to leverage points.
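A studentized M-estimator can be computed by iteratively reweighted least squares. The sketch below uses Huber's ψ with a MAD scale estimate; the tuning constant k = 1.345 and the test data are illustrative assumptions, not prescribed by the text:

```python
import numpy as np

def huber_m_fit(X, y, k=1.345, n_iter=50):
    # IRLS for the Huber M-estimator, studentized by the MAD scale
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)          # OLS starting point
    for _ in range(n_iter):
        r = y - X @ beta
        s = 1.4826 * np.median(np.abs(r - np.median(r)))  # MAD scale
        u = r / s
        w = np.where(np.abs(u) <= k, 1.0, k / np.abs(u))  # w(u) = psi(u)/u
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)        # weighted LS step
    return beta

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 60)
X = np.column_stack([np.ones(60), x])
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5, 60)
y[:5] += 50.0                                  # vertical outliers (in y only)
beta_m = huber_m_fit(X, y)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Note that the outliers here are vertical; consistent with the influence-function analysis above, this estimator still offers no protection against leverage points in X.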
Generalized M-estimators are introduced in order to bound the influence function at outlying X_i's by means of some weight function w.
GM-estimators
β̂(GM) = arg min_{β∈R^p} Σ_{i=1}^n w(X_i) ρ(r_i(β)/σ̂).
The definition can be rewritten as
Σ_{i=1}^n w(X_i) ψ(r_i/σ̂) X_i = 0.
Unfortunately, Maronna, Bustos and Yohai (1979) showed that the breakdown point of GM-estimators can be no better than a bound that decreases as 1/p, where p is the number of regression coefficients.
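One concrete choice (a Mallows-type sketch, with ad hoc constants, not the only possibility) is to derive w(X_i) from a robust distance in the regressor space and combine it with Huber residual weights in an IRLS loop:

```python
import numpy as np

def mad(v):
    # MAD scale with the normal-consistency factor
    return 1.4826 * np.median(np.abs(v - np.median(v)))

def gm_fit(X, x, y, k=1.345, b=2.0, n_iter=50):
    # Mallows-type GM sketch: position weights w(X_i) from a robust
    # distance in x, combined with Huber residual weights (constants ad hoc)
    d = np.abs(x - np.median(x)) / mad(x)           # robust distance of each x_i
    wx = np.minimum(1.0, b / np.maximum(d, 1e-12))  # downweight leverage points
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        r = y - X @ beta
        s = mad(r)
        u = r / s
        wr = np.where(np.abs(u) <= k, 1.0, k / np.abs(u))
        w = wx * wr
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return beta

rng = np.random.default_rng(5)
x = np.linspace(0.0, 10.0, 40)
xx = np.append(x, 100.0)               # one extreme leverage point
yy = 1.0 + 2.0 * xx + rng.normal(0.0, 0.1, 41)
yy[-1] = 0.0                           # bad leverage point: wrong y at x = 100
X = np.column_stack([np.ones(41), xx])
beta_gm = gm_fit(X, xx, yy)            # slope stays near 2
```

The position weight wx practically removes the bad leverage point, which a plain M-estimator cannot do.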
R-estimation is a procedure based on the ranks of the residuals. The idea of using rank statistics was extended to the domain of multiple regression by Jurečková (1971) and Jaeckel (1972).
Let R_i(β) be the rank of Y_i − X_i^T β, and further let a_n(i), i = 1, . . . , n, be a nondecreasing sequence of scores.
S-estimators were introduced by Rousseeuw and Yohai (1984); they are derived from a scale statistic in an implicit way, and they are regression, scale and affine equivariant.
S-estimators are defined by minimization of the dispersion of the residuals:
S-estimator
β̂(S, c, K) = arg min_{β∈R^p} s(r_1(β), . . . , r_n(β)) = arg min_{β∈R^p} s(β),
where s(β) is an estimator of scale defined as the solution of
(1/n) Σ_{i=1}^n ρ(r_i(β)/s) = K.
MM-estimators: high-breakdown and high-efficiency estimators, where the initial estimate is obtained with an S-estimator and is then improved with an M-estimator.
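The two-stage idea can be sketched as follows. For simplicity the high-breakdown start is approximated here by random elemental fits (standing in for a proper S-estimator), the fixed scale by MAD, and the refinement uses Tukey's bisquare with the conventional c = 4.685; all of these are illustrative assumptions:

```python
import numpy as np

def elemental_start(X, y, n_subsets=200, seed=0):
    # crude high-breakdown start (stands in for the S-step of an MM-estimator)
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best, best_crit = None, np.inf
    for _ in range(n_subsets):
        idx = rng.choice(n, size=p, replace=False)
        try:
            b = np.linalg.solve(X[idx], y[idx])
        except np.linalg.LinAlgError:
            continue
        crit = np.median(np.abs(y - X @ b))
        if crit < best_crit:
            best_crit, best = crit, b
    return best

def mm_fit(X, y, c=4.685, n_iter=50, seed=0):
    # MM sketch: robust start + fixed robust scale + bisquare IRLS refinement
    beta = elemental_start(X, y, seed=seed)
    r = y - X @ beta
    s = 1.4826 * np.median(np.abs(r - np.median(r)))   # scale held fixed
    for _ in range(n_iter):
        u = (y - X @ beta) / s
        w = np.where(np.abs(u) <= c, (1 - (u / c) ** 2) ** 2, 0.0)  # bisquare
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return beta

rng = np.random.default_rng(4)
x = np.linspace(0.0, 10.0, 50)
X = np.column_stack([np.ones(50), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, 50)
y[:15] = 40.0                          # 30% contamination
beta_mm = mm_fit(X, y)
```

Holding the scale fixed at the value from the robust start is what preserves the high breakdown point while the M-step buys back efficiency.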
LMS is probably the first really applicable 50% breakdown point estimator, introduced by Rousseeuw (1984). The idea of the LMS is to replace the sum operator by a median, which is very robust.
Least median of squares estimator
β̂(LMS) = arg min_{β∈R^p} med_i r_i²(β).
There always exists a solution for the LMS estimator. The LMS estimator is regression equivariant, scale equivariant and affine equivariant. If p > 1, then the breakdown point of the LMS method is (⌊n/2⌋ − p + 2)/n.
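Because the LMS objective is non-smooth and non-convex, it is in practice approximated by exact fits to random p-point subsets (the resampling idea of Rousseeuw). A toy sketch, with the number of subsets and the contaminated test data chosen ad hoc:

```python
import numpy as np

def lms_fit(X, y, n_subsets=500, seed=0):
    # approximate arg min_beta med_i r_i^2 by exact fits to random p-subsets
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_crit = None, np.inf
    for _ in range(n_subsets):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])   # exact fit to p points
        except np.linalg.LinAlgError:
            continue
        crit = np.median((y - X @ beta) ** 2)
        if crit < best_crit:
            best_crit, best_beta = crit, beta
    return best_beta

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 50)
X = np.column_stack([np.ones(50), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, 50)
y[:15] = 40.0                        # 30% contamination
beta_lms = lms_fit(X, y)             # still tracks the clean majority
```

With 30% of the responses corrupted, any fit through the outliers leaves a large median squared residual, so the search settles on the clean majority.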
The drawback of the LTS is that the objective function requires sorting of the squared residuals, which takes O(n log n) operations.
Robust high breakdown point estimators like the LTS can be very sensitive to a very small change of the data or to the deletion of even one point from the data set (i.e. a small change of the data can cause a large change of the estimate).
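A crude sketch of how the LTS objective is approximated in practice, in the spirit of FAST-LTS: random elemental starts improved by concentration steps (refit OLS on the h observations with the smallest squared residuals). The subset size h, the number of starts and the test data are ad hoc assumptions:

```python
import numpy as np

def lts_fit(X, y, h=None, n_starts=50, n_csteps=10, seed=0):
    # approximate LTS: minimize the sum of the h smallest squared residuals
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if h is None:
        h = (n + p + 1) // 2             # roughly half the data -> high breakdown
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])   # elemental start
        except np.linalg.LinAlgError:
            continue
        for _ in range(n_csteps):        # C-steps never increase the objective
            r2 = (y - X @ beta) ** 2
            keep = np.argsort(r2)[:h]
            beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta

rng = np.random.default_rng(4)
x = np.linspace(0.0, 10.0, 50)
X = np.column_stack([np.ones(50), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, 50)
y[:15] = 40.0                            # 30% contamination
beta_lts = lts_fit(X, y)
```

Each concentration step only needs a partial sort and one least-squares fit, which is what makes the approximation fast; the sensitivity to small data changes noted above still applies.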
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986), Robust Statistics: The Approach Based on Influence Functions, J. Wiley, New York.
Rousseeuw, P. J. and Leroy, A. M. (1987), Robust Regression and Outlier Detection, J. Wiley, New York.
Jurečková, J. (2001), Robustní statistické metody, Karolinum, Prague.
Víšek, J. Á. (2000), Regression with high breakdown point, Robust 2000, 2001, 324-356.