Multiple Regression
→ One outcome, many explaining variables
Example: Ultrasound scanning, shortly before birth (1-3 days before)
OBS WEIGHT BPD AD
1 2350 88 92
2 2450 91 98
3 3300 94 110
. . . .
. . . .
. . . .
105 3550 92 116
106 1173 72 73
107 2900 92 104
(BPD: Head diameter; AD: Stomach circumference)
Objectives could be:
• Prediction, construction of normal regions for diagnostic use (as here)
• Calculation of causal relationships for intervention use
• Scientific insight
1
First we look at a single covariate, bpd:
The statistical model for a simple linear regression was
$Y_i = \alpha + \beta X_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$ indep.
Here there is a marked deviation from linearity!
How does that look in model checking?
2
Model checking
Statistical model:
$Y_i = \alpha + \beta X_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$ indep.
What do we have to check here?
• linearity
• variance homogeneity
• deviations from normality (distance to the line)
Note:
• No assumption of normality for the $x_i$!
• Independence between the $Y_i$ is checked by inspecting:
– Are there several observations from the same individual?
– Are there persons from the same family? Twins?
3
Model checking consists of
• graphical checks, typically with the residuals
• perhaps formal tests
Residual: a quantity which expresses the discrepancy between the observed and the expected (predicted, fitted) value.
There are 4 types of residuals to choose from:
1. ordinary: vertical distance of the observation to the line, observed minus fitted value:
$\hat{\varepsilon}_i = y_i - \hat{y}_i$
2. standardized (student): ordinary, normalized with the standard deviation
3. press: observed minus predicted, but in a model where the current observation has been excluded in the estimation process
4. rstudent: the press residual in normalized form (see 'Diagnostic residuals' below)
The error terms $\varepsilon_i$ were assumed independent, each with the same variance $\sigma^2$, so we would assume that the same holds for the residuals, $\hat{\varepsilon}_i = y_i - \hat{y}_i$.
→ This is not so!
• They are not independent (they sum up to 0) – not so important, if there are sufficiently many
• They don't all have the same variance:
$\mathrm{Var}(\hat{\varepsilon}_i) = \sigma^2 (1 - h_{ii})$
where
$h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}$
denotes the leverage of the ith observation
5
Standardized residuals
$r_i = \frac{\hat{\varepsilon}_i}{s\sqrt{1 - h_{ii}}}, \qquad \mathrm{Var}(r_i) \approx 1$
Diagnostic residuals
Here the observations $(x_i, y_i)$ are excluded one after another. For calculating the ith residual, the resulting fitted value (from the model without $(x_i, y_i)$) is used – either in raw form (press) or in normalized form (rstudent).
Advantages and disadvantages:
• Nice to have residuals which preserve the units/scale (type 1 and 3)
• Easiest to find outliers, if observations are excluded one after another (type 3 and 4)
• Best to normalize, if the observations are included and one cannot draw...
Thus, in multiple regression type 2 should be preferred to type 1
6
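All four residual types can be saved with the OUTPUT statement of PROC REG. A minimal sketch, assuming the data set is called secher (all variable names on the left of the equals signs are free choices):

proc reg data=secher;
  model weight = bpd;
  output out=diag p=pred              /* fitted values                     */
                  r=ordinary          /* type 1: ordinary residual         */
                  student=standard    /* type 2: standardized residual     */
                  press=press         /* type 3: press residual            */
                  rstudent=rstudent;  /* type 4: normalized press residual */
run;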
Residual plots
Residuals (of suitably chosen type) are plotted vs.
• the explaining variables $x_i$
– to check linearity
• the fitted values $\hat{y}_i$
– to check variance homogeneity and normality of the errors
• 'normal scores', i.e. probability plot or histogram
– to check normality
→ The first two plots should look disordered, i.e. unsystematic.
→ The probability plot should lie on a straight line.
7
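A sketch of the three plots, using the diag data set created above (PROC SGPLOT and PROC UNIVARIATE; the variable names pred and ordinary were chosen in the OUTPUT statement):

proc sgplot data=diag;               /* residuals vs. the covariate: linearity */
  scatter x=bpd y=ordinary;
  refline 0 / axis=y;
run;

proc sgplot data=diag;               /* residuals vs. fitted: variance homogeneity */
  scatter x=pred y=ordinary;
  refline 0 / axis=y;
run;

proc univariate data=diag noprint;   /* probability plot: normality */
  qqplot ordinary / normal(mu=est sigma=est);
run;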
Residual plots in ANALYST
Many of the plots can be produced via Statistics/Regression/Linear, by clicking Plots/Residual, where, for example, Ordinary Residual vs. Predicted is chosen.
[Figure: ordinary residuals vs. predicted value of weight]
8
Several graphs for model checking
[Figure: panel of model-checking plots, including residuals vs. predicted value of weight]
9
Linearity
If linearity does not hold, the model will be misleading and uninterpretable
Ways out:
• add more covariates, e.g.
– a quadratic term, bpd²:
$\text{weight} = \alpha + \beta_1 \text{bpd} + \beta_2 \text{bpd}^2$
Test of linearity: $H_0: \beta_2 = 0$
– a further covariate, such as ad (multiple regression)
• transform variables by
– logarithms
– square root
– inverse
• non-linear regression
10
A clear deviation from linearity can be seen with the test of the quadratic term:
New variable: cbpd2=(bpd-90)**2
Statistics/Regression/Linear, choose weight as Dependent, bpd and cbpd2 as Explanatory
(or use Statistics/Regression/Simple
and choose Quadratic)
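Outside the Analyst GUI, the same fit can be sketched in SAS code (the data set names secher and quad are assumptions):

data quad;
  set secher;
  cbpd2 = (bpd - 90)**2;      * centered quadratic term, as above;
run;

proc reg data=quad;
  model weight = bpd cbpd2;   * test of linearity: t test for cbpd2;
run;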
Dependent Variable: weight
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Prob>F
Model 2 34103611.113 17051805.556 108.081 0.0001
Error 104 16407889.953 157768.17262
C Total 106 50511501.065
Root MSE 397.20042 R-square 0.6752
Dep Mean 2739.09346 Adj R-sq 0.6689
C.V. 14.50116
Parameter Estimates
Parameter Standard T for H0:
Variable DF Estimate Error Parameter=0 Prob > |T|
INTERCEP 1 2720.981236 42.94411387 63.361 0.0001
CBPD 1 117.631510 9.72368306 12.097 0.0001
CBPD2 1 2.232942 0.63718640 3.504 0.0007
11
Quadratic regression
$\text{weight} = 2720 + 117.63(\text{bpd}-90) + 2.23(\text{bpd}-90)^2$
(’-90’: to avoid numerical instability; 90 is ’in the middle’...)
Prediction limits (chosen under Plots in Statistics/Regression/Simple):
12
Variance homogeneity
(constant variance / constant standard deviation)
$\mathrm{Var}(\varepsilon_i) = \sigma^2, \quad i = 1, \ldots, n$
If there is no (rough) variance homogeneity, the estimation will be inefficient (we obtain an unnecessarily large variance).
Which alternatives do we have?
• constant relative standard deviation = constant coefficient of variation
$CV(X) = \frac{SD(X)}{E(X)}$
– often constant, if small positive quantities, e.g. concentrations, are measured
– will lead to a trumpet shaped residual plot
– way out: transform the outcome ($Y_i$) by logarithm
• Compound experiment
– e.g., several instruments or laboratories
13
Normality assumption
Remember: Only the error terms are assumed to be normally distributed, neither the outcome nor the covariates!
Normality assumption
• is not crucial for the fit itself: the least squares method yields the 'best' estimates at any rate
• is a formal pre-requisite for the t distribution of the test statistics, but really only a normality assumption for the estimate $\hat{\beta}$ is needed, and this is often (approximately) given, if there are sufficiently many observations, due to:
The central limit theorem,
which states that sums or other functions of many observations get 'more and more' normally distributed.
14
Transformations
• logarithms, square root, inverse
Why take logarithms?
• of the explaining variable
– for obtaining linearity: if there are successive doublings, which have a constant effect: use logarithms to the base 2!
• of the response / outcome
– for obtaining linearity
– for obtaining variance homogeneity:
$\mathrm{Var}(\log(Y)) \approx \frac{\mathrm{Var}(Y)}{Y^2}$
i.e., a constant coefficient of variation of $Y$ means a constant variance of $\log(Y)$, the natural logarithm
The multiple regression model:
$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, \ldots, n$
Usual assumptions:
$\varepsilon_i \sim N(0, \sigma^2)$, independent
Least squares method:
$S(\beta_0, \beta_1, \ldots, \beta_p) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip})^2$
→ minimize with respect to $\beta_0, \ldots, \beta_p$
20
Example: Secher's data with birth weight as a linear function of both bpd and ad
Analyst: Statistics/Regression/Linear, with
weight as Dependent, bpd and ad as Explanatory
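In SAS code this is simply (a sketch; the data set name secher is an assumption):

proc reg data=secher;
  model weight = bpd ad;   * multiple regression with two covariates;
run;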
The REG Procedure
Dependent Variable: weight
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 2 40736854 20368427 216.72 <.0001
Error 104 9774647 93987
Corrected Total 106 50511501
Root MSE 306.57298 R-Square 0.8065
Dependent Mean 2739.09346 Adj R-Sq 0.8028
Coeff Var 11.19250
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 -4628.11813 455.98980 -10.15 <.0001
bpd 1 37.13292 7.61510 4.88 <.0001
ad 1 39.76305 4.16394 9.55 <.0001
→ Strongly significant effect of both covariates.
→ But: Are the model assumptions fulfilled?
21
Model checks for the untransformed model
[Figure: model-checking plots; residuals vs. predicted value of weight, among others]
22
Assessment of the model:
• Normality holds roughly, apart from some single quite large positive deviations, which could argue for a logarithmic transformation of weight.
• Perhaps a light trumpet shape in the plot of residuals vs. predicted values, but note that the observations are not equally distributed over the x axis.
• Linearity does not hold well – mainly due to the earliest born babies.
• Theoretical arguments from clinical experts suggest a logarithmic transformation of both covariates.
23
Logarithmic transformation of the data:
lweight=log2(weight)
lbpd=log2(bpd)
lad=log2(ad)
Statistics/Regression/Linear, choose lweight as Dependent, lbpd and lad as Explanatory
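As a SAS data step (a sketch; the data set names are assumptions, the variable names as above):

data secher2;
  set secher;
  lweight = log2(weight);   * logarithms to the base 2;
  lbpd    = log2(bpd);
  lad     = log2(ad);
run;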
Dependent Variable: Lweight
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Prob>F
Model 2 14.95054 7.47527 314.925 0.0001
Error 104 2.46861 0.02374
C Total 106 17.41915
Root MSE 0.15407 R-square 0.8583
Dep Mean 11.36775 Adj R-sq 0.8556
C.V. 1.35530
Parameter Estimates
Parameter Standard T for H0:
Variable DF Estimate Error Parameter=0 Prob > |T|
INTERCEP 1 -8.456359 0.95456918 -8.859 0.0001
LBPD 1 1.551943 0.22944935 6.764 0.0001
LAD 1 1.466662 0.14669097 9.998 0.0001
24
Test of hypotheses
Is AD of importance, if BPD is already in the model?
$H_0: \beta_2 = 0$
Here we have $\hat{\beta}_2 = 1.467$ ($SE(\hat{\beta}_2) = 0.147$), and thus the t test yields
$t = \frac{\hat{\beta}_2}{SE(\hat{\beta}_2)} = 9.998 \sim t_{104}, \quad p < 0.0001$
95% confidence interval for $\beta_2$:
$\text{c.i.} = \hat{\beta}_2 \pm t_{97.5\%,\, n-p-1} \cdot SE(\hat{\beta}_2) = 1.467 \pm 1.984 \times 0.147 = (1.175,\ 1.759)$
But:
The $\hat{\beta}_j$ are correlated – unless the explaining variables are independent
25
Fitted values
$\log_2(\text{weight}) = -8.46 + 1.47 \log_2(\text{ad}) + 1.55 \log_2(\text{bpd})$
i.e., back-transforming with $2^{-8.46} \approx 0.0028$:
$\text{weight} = 0.0028 \times \text{ad}^{1.47} \times \text{bpd}^{1.55}$
→ If bpd is increased by 10%, this corresponds to multiplying the weight by
$1.1^{1.55} = 1.16$
i.e. an increase by 16%, if ad is kept fixed
26
Example for calculations
For ad=113 and bpd=88, we would expect
$\log_2(\text{weight}) = -8.46 + 1.47 \times \log_2(113) + 1.55 \times \log_2(88)$
$= -8.46 + 1.47 \times 6.82 + 1.55 \times 6.46 = 11.58$
→ Expected birth weight: $2^{11.58} = 3061\,\text{g}$
• Actually observed birth weight: 3400 g
• Residual: 3400 g − 3061 g = 339 g
27
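The same calculation as a SAS sketch, using the unrounded coefficients from the output above:

data _null_;
  lpred = -8.456 + 1.467*log2(113) + 1.552*log2(88);
  pred  = 2**lpred;   * back-transform from the log2 scale;
  put lpred= pred=;   * about 11.57 and 3049 g; the slide's 11.58 and 3061 g
                        come from the rounded coefficients;
run;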
Uncertainty in prediction
Note: The log scale results in a constant relative uncertainty:
$2^{\pm 1.96 \times 0.154} = (0.81,\ 1.23)$
This means that with 95% probability the birth weight lies somewhere between 19% under and 23% over the predicted value.
(Here we have cheated a bit: we have neglected the estimation uncertainty in the $\hat{\beta}$'s themselves.)
28
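A quick check of the back-transformed limits (s = 0.154 is the Root MSE of the log2 fit above):

data _null_;
  s  = 0.154;
  lo = 2**(-1.96*s);   * lower factor, about 0.81;
  hi = 2**( 1.96*s);   * upper factor, about 1.23;
  put lo= hi=;
run;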
Marginal vs. multiple models
Marginal models:
The response is considered with each single explaining variable on its own.
Multiple regression model:
The response is considered with both explaining variables together.
Estimates for these models (with corresponding standard errors in parentheses):
Model         β0, int.   β1, lbpd       β2, lad        s       R2
lbpd alone    -10.223    3.332 (0.202)  -              0.215   0.72
lad alone     -3.527     -              2.237 (0.111)  0.184   0.80
lbpd + lad    -8.456     1.552 (0.229)  1.467 (0.147)  0.154   0.86
29
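The three rows of the table correspond to three separate fits; a sketch in SAS (data set name as above):

proc reg data=secher2;
  model lweight = lbpd;         * marginal model for lbpd;
  model lweight = lad;          * marginal model for lad;
  model lweight = lbpd lad;     * multiple regression model;
run;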
Interpretation of the coefficient β1 for lbpd
• Marginal model: change in lweight, if the covariate lbpd is changed by 1 unit, i.e. if bpd is doubled
• Multiple regression model: change in lweight, if the covariate lbpd is changed by 1 unit, but where all other covariates (here only ad) are kept fixed
We say that we have corrected (or adjusted) for the effects of the other covariates in the model.
The difference between the two models can be quite drastic, since the covariates are typically related:
– If one of them is changed, the others are also changed
30
Goodness-of-fit Measure
$R^2 = \frac{SS_{\text{Model}}}{SS_{\text{Total}}}$
“How large is the proportion of variation explained by the model?”
(here: 0.8583, i.e. 85.83%)
Problem of interpretation if the covariates are controlled (as for the correlation coefficient).
$R^2$ increases with the number of covariates, even if these are not important!
Adjusted $R^2$:
$R^2_{\text{adj}} = 1 - \frac{MS_{\text{Residual}}}{MS_{\text{Total}}}$
(here: 0.8556)
31
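Both quantities can be checked against the ANOVA table of the log2 fit above (a sketch):

data _null_;
  r2    = 14.95054 / 17.41915;            * SS(Model)/SS(Total) = 0.8583;
  r2adj = 1 - 0.02374 / (17.41915/106);   * 1 - MS(Resid)/MS(Total) = 0.8556;
  put r2= r2adj=;
run;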
Model checking
• Plots:
– residuals vs. each covariate separately (linearity)
– residuals vs. predicted values (variance homogeneity)
– probability plot (normality)
• Tests:
– generalized vs. simple models
– curvature: square term, cubic term, ...
– interaction: product term?
• Influential observations
– modified residuals
– Cook's distance
32
Model checks for the log2-transformed model
[Figure: model-checking plots; residuals vs. predicted value of lweight, among others]
33
Regression diagnostics
Are the conclusions supported by the whole data set?
Or are there observations with rather large influence on the results?
Leverage = potential influence (hat matrix, in SAS called Hat Diag or H). If there is only one covariate, this is simply:
$h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}$
Observations with extreme x values can have a large influence on the results,
34
... but not necessarily!
→ no problem if they lie 'nicely' in relation to the regression line, i.e. if they have a small residual
→ For example:
[Figure: scatter plot of y vs. x with a regression line, illustrating a high-leverage observation with a small residual]
35
Influential observations
→ are those which have a combination of
• high leverage
• large residual
36
Regression diagnostics
• Leave out the ith person and find new estimates, $\hat{\beta}_0^{(i)}$, $\hat{\beta}_1^{(i)}$ and $\hat{\beta}_2^{(i)}$
• Calculate Cook's distance, a compound measure for the change in the parameter estimates
• Split Cook's distance into its coordinates and ask: by how many SE's does $\hat{\beta}_1$ (for example) change, if the ith person is left out?
What to do with influential observations?
• leave them out?
• quote a measure of their influence?
37
Diagnostics: Cook’s distance
38
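An index plot of Cook's distance can be sketched in SAS as follows (data set and variable names are assumptions):

proc reg data=secher2;
  model lweight = lbpd lad;
  output out=diag2 cookd=cookd h=leverage rstudent=rstud;
run;

data diag2;
  set diag2;
  obs = _n_;              * observation number for the x axis;
run;

proc sgplot data=diag2;
  needle x=obs y=cookd;   * index plot of Cook's distance;
run;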
Outliers
Observations which don't fit into the relationship
• not necessarily influential
• not necessarily with a large residual
What to do with outliers?
• look closer at them, they are often quite interesting
When can we leave them out?
• if they lie far outside, i.e. have a high leverage
– keep in mind to distinguish the corresponding conclusions!
• if one can find a reason
– and then all these would be left out!
39
Model checking and Diagnostics in Analyst
Many graphics can be produced directly in the regression setting in Analyst, under Plots/Residual or Plots/Diagnostics.
If further plots are wanted (e.g. a plot of Cook's distance), one should create a new data set:
In the regression setting, go into
• Save Data
• tick Create and save diagnostics data
• choose (click Add) the quantities to be saved (typically Predicted, Residual, Student, Rstudent, Cookd, Press)
• Double-click on Diagnostics Table in the project tree
• Save this by clicking File/Save as By SAS Name
• Open it for further use, by File/Open By SAS Name
40
Example (DGA p.338)
41
Which explaining variables have a marginal effect on the response PEmax?
Are these (Age, Height, Weight, FEV1, FRC) the variables which should be included in the multiple regression model?
42
Correlations
Correlation Analysis
Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 25
AGE SEX HEIGHT WEIGHT BMP
AGE 1.00000 -0.16712 0.92605 0.90587 0.37776
0.0 0.4246 0.0001 0.0001 0.0626
SEX -0.16712 1.00000 -0.16755 -0.19044 -0.13756
0.4246 0.0 0.4234 0.3619 0.5120
HEIGHT 0.92605 -0.16755 1.00000 0.92070 0.44076
0.0001 0.4234 0.0 0.0001 0.0274
WEIGHT 0.90587 -0.19044 0.92070 1.00000 0.67255
0.0001 0.3619 0.0001 0.0 0.0002
BMP 0.37776 -0.13756 0.44076 0.67255 1.00000
0.0626 0.5120 0.0274 0.0002 0.0
FEV1 0.29449 -0.52826 0.31666 0.44884 0.54552
0.1530 0.0066 0.1230 0.0244 0.0048
RV -0.55194 0.27135 -0.56952 -0.62151 -0.58237
0.0042 0.1895 0.0030 0.0009 0.0023
FRC -0.63936 0.18361 -0.62428 -0.61726 -0.43439
0.0006 0.3797 0.0009 0.0010 0.0300
TLC -0.46937 0.02423 -0.45708 -0.41847 -0.36490
0.0179 0.9085 0.0216 0.0374 0.0729
PEMAX 0.61347 -0.28857 0.59922 0.63522 0.22951
0.0011 0.1618 0.0015 0.0006 0.2698
43
Correlation Analysis
Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 25
FEV1 RV FRC TLC PEMAX
AGE 0.29449 -0.55194 -0.63936 -0.46937 0.61347
0.1530 0.0042 0.0006 0.0179 0.0011
SEX -0.52826 0.27135 0.18361 0.02423 -0.28857
0.0066 0.1895 0.3797 0.9085 0.1618
HEIGHT 0.31666 -0.56952 -0.62428 -0.45708 0.59922
0.1230 0.0030 0.0009 0.0216 0.0015
WEIGHT 0.44884 -0.62151 -0.61726 -0.41847 0.63522
0.0244 0.0009 0.0010 0.0374 0.0006
BMP 0.54552 -0.58237 -0.43439 -0.36490 0.22951
0.0048 0.0023 0.0300 0.0729 0.2698
FEV1 1.00000 -0.66586 -0.66511 -0.44299 0.45338
0.0 0.0003 0.0003 0.0266 0.0228
RV -0.66586 1.00000 0.91060 0.58914 -0.31555
0.0003 0.0 0.0001 0.0019 0.1244
FRC -0.66511 0.91060 1.00000 0.70440 -0.41721
0.0003 0.0001 0.0 0.0001 0.0380
TLC -0.44299 0.58914 0.70440 1.00000 -0.18162
0.0266 0.0019 0.0001 0.0 0.3849
PEMAX 0.45338 -0.31555 -0.41721 -0.18162 1.00000
0.0228 0.1244 0.0380 0.3849 0.0
Note in particular the correlations between age, height and weight!
44
Model selection
(chosen in Model under Regression/Linear):
• Forward selection: include each time the most significant
→ Final model: WEIGHT BMP FEV1
• Backward elimination: start with all covariates, then drop each time the least significant
→ Final model: WEIGHT BMP FEV1
This looks quite stable!?
But:
What if WEIGHT had been logarithmically transformed from the start?
→ Then we would have obtained the final model: AGE FEV1
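Both procedures can be sketched in SAS code (the data set name cystfib is an assumption):

proc reg data=cystfib;
  model pemax = age sex height weight bmp fev1 rv frc tlc
        / selection=forward  slentry=0.05;   * forward selection;
  model pemax = age sex height weight bmp fev1 rv frc tlc
        / selection=backward slstay=0.05;    * backward elimination;
run;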
Rule of thumb:
There should be at least 10 times as many observations as parameters in the model.