Diagnostics and Remedial Measures: An Overview
Residuals
Model diagnostics
I Graphical techniques
I Hypothesis testing
Remedial measures
I Transformation
Later: more about all this for multiple regression
W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 50
Model Assumptions
Recall simple linear regression model.
Yi = β0 + β1Xi + εi, εi ∼ iid N(0, σ²),
for i = 1, . . . , n.
A straight-line relationship between E(Y ) and X: E(Yi) = β0 + β1Xi
Homogeneous variance: Var(εi) = σ² for all i
Independence: ε1, . . . , εn are mutually independent
Normal distribution: each εi is normally distributed
Ramifications If Assumptions Violated
Recall simple linear regression model.
Nonlinearity
I Linear model will fit poorly
I Parameter estimates may be meaningless
Non-independence
I Parameter estimates are still unbiased
I Standard errors are a problem and thus so is inference
Nonconstant variance
I Parameter estimates are still unbiased
I Standard errors are a problem
Non-normality
I Least important, why?
I Inference is fairly robust to non-normality
I Important effects on prediction intervals
Model Diagnostics
Reliable inference hinges on reasonable adherence to model assumptions
Hence it is important to evaluate the FOUR model assumptions, that is, to
perform model “diagnostics”.
The main approach to model diagnostics is to examine the residuals (thanks
to the additive model assumption)
Consider two approaches.
I Graphical techniques: More subjective but quick and very informative for an
expert.
I Hypothesis tests: More objective and comfortable for amateurs, but outcomes
depend on the tests' own assumptions and sensitivity. There is a tendency to use them as a crutch.
Graphical Techniques
At this point in the analysis, you have already done EDA.
I 1D exploration of X and Y .
I 2D exploration of X and Y .
I Not very effective for model diagnostics except in drastic cases
Recall the definition of residual
ei = Yi − Ŷi, where i = 1, . . . , n
ei can be treated as an estimate of the true error
εi = Yi − E(Yi) ∼ iid N(0, σ²)
ei can be used to check normality, homoscedasticity, linearity, and
independence.
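To make the definition concrete, here is a minimal Python sketch that fits the least-squares line by hand and computes the residuals ei = Yi − Ŷi; the data are invented for illustration and are not from the course.

```python
# Minimal by-hand least-squares fit and its residuals e_i = Y_i - Yhat_i.
# Toy data, invented for illustration.
X = [1.0, 2.0, 3.0]
Y = [1.0, 2.0, 4.0]
n = len(X)
xbar = sum(X) / n
ybar = sum(Y) / n

# Closed-form simple-linear-regression estimates
Sxx = sum((x - xbar) ** 2 for x in X)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

yhat = [b0 + b1 * x for x in X]
e = [y - yh for y, yh in zip(Y, yhat)]

print(e)        # the residuals
print(sum(e))   # ~0: with an intercept the residuals sum to zero
```

The zero-sum constraint shown here is one of the linear restrictions that make the ei not strictly independent.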
Properties of Residuals
Mean: ē = (1/n) ∑i ei = 0
Variance:
MSE = SSE/(n − 2) = ∑i ei² / (n − 2) = ∑i (ei − ē)² / (n − 2) = s².
Nonindependence: the ei are not mutually independent; they satisfy linear
constraints such as ∑i ei = 0.
When the sample size n is large, however, residuals can be treated as
approximately independent.
Standardized Residuals
For diagnostics there are superior choices to the ‘ordinary residuals’
I Standardized (KNNL: ‘semi-studentized’) residuals:
Var(εi) = σ²,
therefore it is natural to apply the standardization
e∗i = ei/√MSE
But each ei has a different variance.
I Use this fact to derive a superior type of residuals below
Hat Values
Ŷi = ∑j hij Yj, where hij = 1/n + (Xi − X̄)(Xj − X̄)/∑k (Xk − X̄)²
The hij are called hat values.
Deriving the Variance of Residuals
Using Ŷi = ∑j hij Yj we obtain
ei = Yi − Ŷi = (1 − hii)Yi − ∑j≠i hij Yj
Therefore (since the Yj are independent)
Var{ei} = σ²[(1 − hii)² + ∑j≠i hij²]
Continuing to Derive Variance of Residuals
Using Var{ei} = σ²[(1 − hii)² + ∑j≠i hij²], we have
∑j hij² = hii
(show it in HW.) Finally,
Var{ei} = σ²[(1 − hii)² + ∑j≠i hij²]
= σ²[1 − 2hii + hii² + ∑j≠i hij²]
= σ²[1 − 2hii + ∑j hij²]
= σ²(1 − 2hii + hii)
= σ²(1 − hii)
Studentized Residuals
Now we may scale each residual separately by its own standard deviation
The (internally) studentized residual is
ri = ei/√(MSE(1 − hii))
There is still a problem: Imagine that Yi is a severe outlier
I Yi will strongly ‘pull’ the regression line toward it
I ei will understate the distance between Yi and the ‘true’ regression line
The solution is to use ‘externally studentized residuals’. . .
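A small numerical sketch of ri in Python (data invented for illustration): for this toy set the internally studentized residuals come out to ±1, even though the raw ei differ in scale.

```python
import math

# Internally studentized residuals r_i = e_i / sqrt(MSE * (1 - h_ii)),
# on toy data (invented for illustration).
X = [1.0, 2.0, 3.0]
Y = [1.0, 2.0, 4.0]
n = len(X)
xbar = sum(X) / n
ybar = sum(Y) / n
Sxx = sum((x - xbar) ** 2 for x in X)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / Sxx
b0 = ybar - b1 * xbar

e = [y - (b0 + b1 * x) for x, y in zip(X, Y)]
MSE = sum(ei ** 2 for ei in e) / (n - 2)
h = [1 / n + (x - xbar) ** 2 / Sxx for x in X]   # leverages

r = [ei / math.sqrt(MSE * (1 - hi)) for ei, hi in zip(e, h)]
print(r)   # here each r_i is +-1 even though the raw e_i differ in size
```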
Studentized Deleted Residuals
To eliminate the influence of Yi on the misfit at the ith point, fit the
regression line based on all points except the ith.
Define the prediction at Xi using this deleted regression as Ŷi(i)
The ‘deleted residual’ is di = Yi − Ŷi(i)
The studentized deleted residual is
ti = di/s{di} = (Yi − Ŷi(i)) / √(MSE(i)/(1 − hii))
No need to fit n deleted regressions; we can show that
di = ei/(1 − hii)
(n − 2)MSE = (n − 3)MSE(i) + ei²/(1 − hii)
Also, ti has a t-distribution: ti ∼ tn−3
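The two shortcut identities can be verified by brute force: delete a point, refit, and compare. A Python sketch on invented toy data:

```python
# Check of the deleted-residual shortcuts on toy data (invented for
# illustration): refit without point i and compare with the closed forms.

def ols(X, Y):
    """Return (b0, b1, SSE) for a simple linear regression fit by hand."""
    n = len(X)
    xbar = sum(X) / n
    ybar = sum(Y) / n
    Sxx = sum((x - xbar) ** 2 for x in X)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / Sxx
    b0 = ybar - b1 * xbar
    SSE = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
    return b0, b1, SSE

X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 1.0, 4.0, 3.0]
n = len(X)
b0, b1, SSE = ols(X, Y)
MSE = SSE / (n - 2)
xbar = sum(X) / n
Sxx = sum((x - xbar) ** 2 for x in X)

i = 1                                # delete the 2nd point
e_i = Y[i] - (b0 + b1 * X[i])
h_ii = 1 / n + (X[i] - xbar) ** 2 / Sxx

# Fit the deleted regression directly
Xd = X[:i] + X[i + 1:]
Yd = Y[:i] + Y[i + 1:]
b0d, b1d, SSEd = ols(Xd, Yd)
d_i = Y[i] - (b0d + b1d * X[i])      # deleted residual, by brute force
MSE_del = SSEd / (n - 3)

# The shortcuts from the slide
assert abs(d_i - e_i / (1 - h_ii)) < 1e-9
assert abs((n - 2) * MSE - ((n - 3) * MSE_del + e_i ** 2 / (1 - h_ii))) < 1e-9
print(d_i, MSE_del)
```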
Residual Plots
Residual plot is a primary graphical diagnostic method.
I Departures from model assumptions can be difficult to detect directly from X and Y.
I Use the externally studentized residuals
Some key residual plots:
I Plot ti against predicted values Ŷi (not Yi)
F detect nonconstant variance
F detect nonlinearity
F detect outliers
I Plot ti against Xi.
F In simple linear regression this is the same as above (Why?)
F In multiple regression it will be useful to detect partial correlation
I Plot ti versus other possible predictors (e.g., time)
F Detect important lurking variables
I Plot ti versus lagged residuals
F Detect correlated errors
I QQ-plot or normal probability (PP-) plot of ti.
F Detect non-normality
Nonlinearity of Regression Function
Plot ti against Ŷi (and Xi for multiple linear regressions).
I Random scatter indicates no serious departure from linearity.
I Banana indicates departure from linearity.
I Could fit nonparametric smoother to residual plot to aid detection
Plotting Y vs. X is not nearly as effective for detecting nonlinearity because
trend has not been removed
I Logically, you are investigating model assumptions not “marginal effect”.
Nonconstant Error Variance
Plot ti against Ŷi (and Xi for multiple linear regressions).
I Random scatter indicates no serious departure from constant variance.
I Could fit nonparametric smoother to this plot to aid detection
Funnel indicates non-constant variance.
Example: KNNL Figure 3.4(c).
Often both nonconstant variance and nonlinearity exist.
Nonindependence of Error Terms
Possible causes of nonindependence.
I Observations collected over time and/or across space.
I Study done on sets of siblings.
Departure from independence. For example,
I Trend effect (KNNL Figure 3.4(d), 3.8(a)).
I Cyclical nonindependence (KNNL Figure 3.8(b)).
Plot ti against other covariates, such as time.
Autocorrelation function plot ( acf() )
Nonnormality of Error Terms
Box plot, histogram, stem-and-leaf plot of ti.
QQ (quantile-quantile) plot.
1 Order the residuals: t(1) ≤ t(2) ≤ · · · ≤ t(n).
2 Find the corresponding “rankits”: z(1) ≤ z(2) ≤ · · · ≤ z(n), where for
k = 1, . . . , n,
z(k) = √MSE × z((k − 0.375)/(n + 0.25))
is an approximation of the expected value of the kth smallest observation in a
normal random sample.
3 Plot t(k) against z(k).
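The rankit step can be sketched with the standard-normal quantile function from Python's standard library; the (k − 0.375)/(n + 0.25) plotting positions are as on the slide, and the √MSE factor would scale the result when plotting raw residuals.

```python
from statistics import NormalDist

# Rankit computation: z((k - 0.375)/(n + 0.25)) approximates the expected
# k-th smallest value in a standard normal sample (Blom's positions).
# Multiply by sqrt(MSE) when plotting raw residuals, as on the slide.
def rankits(n):
    nd = NormalDist()   # standard normal
    return [nd.inv_cdf((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]

z = rankits(5)
print(z)   # increasing and symmetric about 0; the middle rankit is z(0.5) = 0
```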
QQ plot should be approximately linear if normality holds
I ‘S’ shape means distribution of residuals has light (’short’) tails
I Backwards ‘S’ means heavy tails
I ‘C’ or backwards ‘C’ means skew
It is a good idea to examine other possible problems first.
Presence of Outliers
An outlier refers to an extreme observation.
Some diagnostic methods
I Box plot of ti.
I Plot ti against Ŷi (and Xi).
I ti which are very unlikely compared to the reference t-distribution could be
called outliers
I Modern cluster analysis methods
Outliers may convey important information.
I An error.
I A different mechanism is at work.
I A significant discovery.
There is a temptation to throw away outliers because they may strongly influence
parameter estimates.
I Influence alone doesn't mean that the model is right and the data point is wrong
I Perhaps the data point is right and the model is wrong
Graphical Techniques: Remarks
We generally do not plot residuals (ti) against response (Yi). Why?
Residual plots may provide evidence against model assumptions, but do not
generally validate assumptions.
For data analysis in practice:
I Fit model and check model assumptions (an iterative process).
I Generally do not include residual plots in a report, but include a sentence or
two such as “Standard diagnostics did not indicate any violations of the
assumptions for this model.”
For this class, always include residual plots for homework assignments so you
can learn the methods
No magic formulas.
Decision may be difficult for small sample size.
As much art as science.
Diagnostic Methods Based on Hypothesis Testing
Tests for linearity: F test for lack of fit (Section 3.7).
Tests for constancy of variance (Section 3.6):
I Brown-Forsythe test.
I Breusch-Pagan test.
I Levene’s test.
I Bartlett’s test.
Tests for independence (Chapter 12):
I Runs test.
I Durbin-Watson test.
Tests for normality (Section 3.5).
I χ2 test.
I Kolmogorov-Smirnov test.
Tests for outliers (Chapter 10).
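Among these, the Durbin-Watson statistic is simple enough to compute directly: d = ∑t (et − et−1)² / ∑t et², with values near 2 indicating no first-order autocorrelation, small values positive correlation, and values near 4 negative correlation. A Python sketch on invented residual sequences:

```python
# Durbin-Watson statistic d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2.
# Near 2: no first-order autocorrelation; near 0: positive; near 4: negative.
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(et ** 2 for et in e)

# Invented residual sequences for illustration:
print(durbin_watson([1.0, 1.0, -1.0, -1.0]))   # slowly varying -> d = 1.0
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))   # alternating -> d = 3.0
```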
F Test for Lack of Fit
Residual plots can be used to assess the adequacy of a simple linear regression
model. A more formal procedure is a test for lack of fit using “pure error”.
Need ‘repeat groups’
For a given data set, suppose we have fitted a simple linear regression model
and computed regression error sum of squares
SSE = ∑i (Yi − Ŷi)².
These deviations Yi − Ŷi could be due either to random fluctuation around
the regression line or to an inadequate model
Pure Error and Lack of Fit
The main idea is to take several observations on Y for the same X,
independently, to distinguish the error due to random fluctuation around the
regression line from the error due to lack of fit of the simple linear regression
model.
The variation among the repeated measurements is called “pure error”.
The remaining error variation is called “lack of fit”
Thus we can partition the regression SSE into two parts:
SSE = SSPE + SSLF
where SSPE = SS Pure Error and SSLF = SS Lack of Fit.
Actually, we are comparing a “linear function” with a “step function” (one mean per X level).
Pure Error and Lack of Fit
One possibility is that pure error is comparatively large and the linear model
seems adequate. That is, pure error is a large part of the SSE.
The other possibility is that pure error is comparatively small and linear
model seems inadequate. That is, pure error is a small part of the regression
error and error due to lack of fit is then a large part of the SSE.
If the latter case holds, there may be significant evidence of lack of fit.
Notation
Models (R notation):
I Null (N): Y ∼ 1, common mean model
I Linear regression is Reduced (R): Y ∼ X, regression model
I ANOVA is Full (F): Y ∼ factor(X), separate mean model
I Notation:
F Yij are the data, where j indexes groups and i indexes individuals. (Sums will be taken over all available indices.)
F Ȳ is the grand mean
F Ȳj is the jth group mean
F Ŷij are the fitted values using the regression line.
F Note that Ȳj are the fitted values under the ANOVA model that fits group
means, Y ∼ factor(X)
Sums of Squares
Recall: All sums are over both i and j except as noted.
SSTO = ∑ (Yij − Ȳ)²
SSR_R = ∑ (Ŷij − Ȳ)²
SSE_R = ∑ (Yij − Ŷij)²
SSTO = SSR_R + SSE_R
SSPE = SSE_F = ∑ (Yij − Ȳj)²
SSLF = ∑ (Ȳj − Ŷij)² = ∑j nj (Ȳj − Ŷij)²
SSE_R = SSPE + SSLF
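The decomposition SSE_R = SSPE + SSLF can be checked numerically. A minimal Python sketch on invented toy data with repeat groups; the final ratio MSLF/MSPE is the lack-of-fit F statistic with (r − 2, n − r) degrees of freedom.

```python
# Check SSE = SSPE + SSLF on toy data with repeat groups (data invented).
X = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
Y = [1.0, 3.0, 2.0, 4.0, 6.0, 8.0]
n = len(X)

# Simple linear regression fit by hand
xbar = sum(X) / n
ybar = sum(Y) / n
Sxx = sum((x - xbar) ** 2 for x in X)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / Sxx
b0 = ybar - b1 * xbar
SSE = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))

# Group means at each distinct X level
levels = sorted(set(X))
r = len(levels)
gmean = {lv: sum(y for x, y in zip(X, Y) if x == lv) /
             sum(1 for x in X if x == lv)
         for lv in levels}

SSPE = sum((y - gmean[x]) ** 2 for x, y in zip(X, Y))      # pure error
SSLF = sum((gmean[x] - (b0 + b1 * x)) ** 2 for x in X)     # lack of fit

assert abs(SSE - (SSPE + SSLF)) < 1e-9

F = (SSLF / (r - 2)) / (SSPE / (n - r))   # MSLF / MSPE, df = (r - 2, n - r)
print(SSPE, SSLF, F)
```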
LOF ANOVA Table
One way to summarize the LOF test is by ANOVA:
Source        df      SS      MS
Regression    1       SSR     SSR/1
Lack of Fit   r − 2   SSLF    MSLF = SSLF/(r − 2)
Pure Error    n − r   SSPE    MSPE = SSPE/(n − r)
Total         n − 1   SSTO
I E(MSPE) = σ² and E(MSLF) = σ² + ∑i=1,...,r ni (µi − (β0 + β1Xi))² / (r − 2)
I F-test for lack of fit is therefore: F* = MSLF/MSPE, compared to F(r − 2, n − r); large values indicate lack of fit
LOF as model comparison
In fact, the above lack of fit test is doing model comparison:
I comparing our desired model Y ∼ X to the potentially better model Y ∼ factor(X),
which would be required if the linear model fit poorly.
Apply the GLT to compare these two models (are they nested?)
FLOF = [(SSE_R − SSE_F)/(dfR − dfF)] / [SSE_F/dfF]
Notice SSE_R − SSE_F = SSE_R − SSPE = SSLF and SSE_F = SSPE, so
FLOF = MSLF/MSPE and the LOF ANOVA F-test is the same as model
comparison by FLOF.
Lack of Fit in R
anova(reduced.lm)
Df Sum Sq Mean Sq F value Pr(>F)
x 1 60.95 60.950 193.07 1.395e-09 #<--------------------SSR_R