Multiple Regression
SPH 247 Statistical Analysis of Laboratory Data
April 23, 2010
Dec 20, 2015
Cystic Fibrosis Data
Cystic fibrosis lung function data: lung function data for cystic fibrosis patients (7-23 years old).
age: a numeric vector. Age in years.
sex: a numeric vector code. 0: male, 1: female.
height: a numeric vector. Height (cm).
weight: a numeric vector. Weight (kg).
bmp: a numeric vector. Body mass (% of normal).
fev1: a numeric vector. Forced expiratory volume.
rv: a numeric vector. Residual volume.
frc: a numeric vector. Functional residual capacity.
tlc: a numeric vector. Total lung capacity.
pemax: a numeric vector. Maximum expiratory pressure.
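cf <- read.csv("cystfibr.csv")    # read the cystic fibrosis data
pairs(cf)                         # scatterplot matrix of all variables
attach(cf)                        # make the columns visible by name
cf.lm <- lm(pemax ~ age+sex+height+weight+bmp+fev1+rv+frc+tlc)
print(summary(cf.lm))             # coefficient t tests and overall F test
print(anova(cf.lm))               # sequential ANOVA table
print(drop1(cf.lm,test="F"))      # single-term deletions with F tests
plot(cf.lm)                       # residual diagnostic plots
step(cf.lm)                       # stepwise selection by AIC
detach(cf)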
> source("cystfibr.r")
> cf.lm <- lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc)
> print(summary(cf.lm))
…
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 176.0582   225.8912   0.779    0.448
age          -2.5420     4.8017  -0.529    0.604
sex          -3.7368    15.4598  -0.242    0.812
height       -0.4463     0.9034  -0.494    0.628
weight        2.9928     2.0080   1.490    0.157
bmp          -1.7449     1.1552  -1.510    0.152
fev1          1.0807     1.0809   1.000    0.333
rv            0.1970     0.1962   1.004    0.331
frc          -0.3084     0.4924  -0.626    0.540
tlc           0.1886     0.4997   0.377    0.711

Residual standard error: 25.47 on 15 degrees of freedom
Multiple R-Squared: 0.6373, Adjusted R-squared: 0.4197
F-statistic: 2.929 on 9 and 15 DF, p-value: 0.03195
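The adjusted R-squared and the overall F-statistic in this summary can both be recovered from the multiple R-squared alone, using n = 25 observations and p = 9 predictors (25 = 15 residual df + 9 slopes + 1 intercept). A minimal R sketch, assuming only the printed values; the variable names are illustrative:

```r
# Printed values from summary(cf.lm); n and p are read off the degrees of freedom
r2 <- 0.6373   # multiple R-squared
n  <- 25       # observations (15 residual df + 9 slopes + 1 intercept)
p  <- 9        # predictors

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F compares explained and residual variance per degree of freedom
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))
p_val  <- pf(f_stat, p, n - p - 1, lower.tail = FALSE)

round(c(adj_r2 = adj_r2, F = f_stat, p = p_val), 4)
```

The results agree with the printed 0.4197, 2.929 and 0.03195 up to the rounding of R-squared: a significant overall F even though no single t test in the table is significant.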
> print(anova(cf.lm))
Analysis of Variance Table

Response: pemax
          Df  Sum Sq Mean Sq F value   Pr(>F)
age        1 10098.5 10098.5 15.5661 0.001296 **
sex        1   955.4   955.4  1.4727 0.243680
height     1   155.0   155.0  0.2389 0.632089
weight     1   632.3   632.3  0.9747 0.339170
bmp        1  2862.2  2862.2  4.4119 0.053010 .
fev1       1  1549.1  1549.1  2.3878 0.143120
rv         1   561.9   561.9  0.8662 0.366757
frc        1   194.6   194.6  0.2999 0.592007
tlc        1    92.4    92.4  0.1424 0.711160
Residuals 15  9731.2   648.7
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
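Performs sequential (Type I) ANOVA: each term is tested after adjusting only for the terms listed before it in the formula, so the results depend on the order of the terms. Only the last term (tlc) receives the same test as in drop1().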
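Two properties of the sequential table can be checked directly. The sequential sums of squares add up to the model sum of squares, so they reproduce the multiple R-squared of 0.6373; and the credit each term receives depends on where it enters the formula. A minimal R sketch: the first part uses only the printed table, the second uses hypothetical simulated variables x1, x2 and y (not the cystic fibrosis data) with x2 strongly correlated with x1:

```r
# Sequential sums of squares from anova(cf.lm), in formula order
seq_ss <- c(age = 10098.5, sex = 955.4, height = 155.0, weight = 632.3,
            bmp = 2862.2, fev1 = 1549.1, rv = 561.9, frc = 194.6, tlc = 92.4)
rss <- 9731.2                      # residual sum of squares (15 df)

model_ss <- sum(seq_ss)            # Type I SS add up to the model SS
r2 <- model_ss / (model_ss + rss)  # proportion of total SS explained
r2

# Order dependence, shown on hypothetical simulated data
set.seed(1)
x1 <- rnorm(50)
x2 <- x1 + rnorm(50, sd = 0.5)     # x2 is strongly correlated with x1
y  <- x1 + x2 + rnorm(50)
a12 <- anova(lm(y ~ x1 + x2))
a21 <- anova(lm(y ~ x2 + x1))
a12["x1", "Sum Sq"]   # x1 entered first: also gets the variation it shares with x2
a21["x1", "Sum Sq"]   # x1 entered last: only its unique contribution
```

The same mechanism plausibly explains why age looks highly significant in the sequential table (it is entered first) but not in the coefficient table.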
> print(drop1(cf.lm, test = "F"))
Single term deletions

Model:
pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc
       Df Sum of Sq     RSS   AIC F value  Pr(F)
<none>                9731.2 169.1
age     1     181.8  9913.1 167.6  0.2803 0.6043
sex     1      37.9  9769.2 167.2  0.0584 0.8123
height  1     158.3  9889.6 167.5  0.2440 0.6285
weight  1    1441.2 11172.5 170.6  2.2215 0.1568
bmp     1    1480.1 11211.4 170.6  2.2815 0.1517
fev1    1     648.4 10379.7 168.7  0.9995 0.3333
rv      1     653.8 10385.0 168.7  1.0077 0.3314
frc     1     254.6  9985.8 167.8  0.3924 0.5405
tlc     1      92.4  9823.7 167.3  0.1424 0.7112
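Performs Type III ANOVA: each term is tested after adjusting for all the others. Each F value is the square of the corresponding t value in summary(cf.lm) (e.g. weight: 1.490² ≈ 2.22), and the AIC column is the criterion used by step().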
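The F and AIC columns of the drop1() table follow from the residual sums of squares. A minimal R sketch, assuming only the printed numbers; the AIC uses R's extractAIC convention for linear models, n·log(RSS/n) + 2·(number of coefficients), which omits an additive constant and so differs from AIC() by a fixed amount:

```r
# Values printed by drop1(cf.lm, test = "F")
n <- 25; k_full <- 10              # 25 observations, 10 coefficients in the full model
rss_full <- 9731.2                 # RSS of the full model (15 residual df)
ss_weight <- 1441.2                # increase in RSS when weight is dropped

# F for dropping one term: extra SS (1 df) over the full model's residual mean square
f_weight <- (ss_weight / 1) / (rss_full / (n - k_full))
t_weight <- 1.490                  # t value for weight in summary(cf.lm)

# AIC in the extractAIC convention used by drop1() and step() for lm fits
aic_full   <- n * log(rss_full / n) + 2 * k_full
aic_no_sex <- n * log(9769.2 / n) + 2 * (k_full - 1)

c(F = f_weight, t_squared = t_weight^2, AIC_full = aic_full, AIC_no_sex = aic_no_sex)
```

Dropping sex lowers the AIC from about 169.1 to 167.2, which is why step() removes sex first.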
[Figure: Residuals vs Fitted for lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc). x axis: Fitted values (about 80 to 160); y axis: Residuals (about -40 to 40). Labeled observations: 21, 24, 16.]
[Figure: Normal Q-Q plot for lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc). x axis: Theoretical Quantiles (-2 to 2); y axis: Standardized residuals. Labeled observations: 24, 14, 16.]
[Figure: Scale-Location plot for lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc). x axis: Fitted values (about 80 to 160); y axis: √|Standardized residuals| (0.0 to 1.2). Labeled observations: 24, 14, 16.]
[Figure: Residuals vs Leverage for lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc). x axis: Leverage (0.0 to 0.6); y axis: Standardized residuals; Cook's distance contours at 0.5. Labeled observations: 14, 24, 16.]
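The diagnostic panels are built from a few per-observation quantities: raw residuals, leverages (the diagonal of the hat matrix), standardized residuals (used in the Q-Q, Scale-Location and Residuals vs Leverage panels) and Cook's distance (the contours in the leverage panel). A minimal R sketch on hypothetical simulated data (the data frame d and its columns are illustrative, not the cystic fibrosis data), computing them with base R and checking them against the textbook formulas:

```r
# Hypothetical simulated data standing in for the cystic fibrosis variables
set.seed(247)
n <- 25
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + d$x1 - d$x2 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3, data = d)

h   <- hatvalues(fit)          # leverage: diagonal of the hat matrix
r   <- residuals(fit)          # raw residuals (Residuals vs Fitted panel)
s   <- summary(fit)$sigma      # residual standard error
std <- rstandard(fit)          # standardized residuals
cd  <- cooks.distance(fit)     # Cook's distance
p   <- length(coef(fit))       # number of coefficients

# Standardized residual: raw residual divided by its own standard error
std_manual <- r / (s * sqrt(1 - h))
# Cook's distance combines residual size and leverage
cd_manual <- std_manual^2 * h / (p * (1 - h))

# Leverages always sum to the number of coefficients
sum(h)
```

High-leverage points with large standardized residuals are the ones that approach the Cook's distance contours.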
> step(cf.lm)
Start:  AIC=169.11
pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc

         Df Sum of Sq     RSS   AIC
- sex     1      37.9  9769.2 167.2
- tlc     1      92.4  9823.7 167.3
- height  1     158.3  9889.6 167.5
- age     1     181.8  9913.1 167.6
- frc     1     254.6  9985.8 167.8
- fev1    1     648.4 10379.7 168.7
- rv      1     653.8 10385.0 168.7
<none>                 9731.2 169.1
- weight  1    1441.2 11172.5 170.6
- bmp     1    1480.1 11211.4 170.6

Step:  AIC=167.2
pemax ~ age + height + weight + bmp + fev1 + rv + frc + tlc
……………
Step:  AIC=160.66
pemax ~ weight + bmp + fev1 + rv

         Df Sum of Sq     RSS   AIC
<none>                10354.6 160.7
- rv      1    1183.6 11538.2 161.4
- bmp     1    3072.6 13427.2 165.2
- fev1    1    3717.1 14071.7 166.3
- weight  1   10930.2 21284.8 176.7

Call:
lm(formula = pemax ~ weight + bmp + fev1 + rv)

Coefficients:
(Intercept)       weight          bmp         fev1           rv
    63.9467       1.7489      -1.3772       1.5477       0.1257
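Called as step(cf.lm), with no scope argument, step() only removes terms, which is why every row in the trace is a "-" deletion. Each stage is a drop1() table: the term whose removal gives the lowest AIC is dropped, and the search stops when <none> has the lowest AIC (160.7 in the last table). A minimal R sketch of that loop; the function backward_aic() and the simulated data are hypothetical, and the result is compared with step(direction = "backward"):

```r
# Hypothetical simulated data: y depends on x1 and x2 only
set.seed(2010)
n <- 25
d <- as.data.frame(matrix(rnorm(n * 6), n, 6,
                          dimnames = list(NULL, paste0("x", 1:6))))
d$y <- 2 * d$x1 - d$x2 + rnorm(n)
full <- lm(y ~ ., data = d)

# Hypothetical helper: backward elimination by AIC, one term at a time
backward_aic <- function(fit) {
  repeat {
    tab  <- drop1(fit)                          # AIC for <none> and each deletion
    best <- rownames(tab)[which.min(tab$AIC)]
    if (best == "<none>") return(fit)           # no deletion lowers AIC: stop
    fit <- update(fit, as.formula(paste(". ~ . -", best)))
  }
}

mine <- backward_aic(full)
ref  <- step(full, direction = "backward", trace = 0)
formula(mine)
formula(ref)
```

Because the decision at each stage uses only AIC differences, the search can keep terms whose individual t tests are not significant (as rv is in the final model).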
> cf.lm2 <- lm(pemax ~ rv + bmp + fev1 + weight)
> summary(cf.lm2)

Call:
lm(formula = pemax ~ rv + bmp + fev1 + weight)

Residuals:
    Min      1Q  Median      3Q     Max
 -39.77  -11.74    4.33   15.66   35.07

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 63.94669   53.27673   1.200 0.244057
rv           0.12572    0.08315   1.512 0.146178
bmp         -1.37724    0.56534  -2.436 0.024322 *
fev1         1.54770    0.57761   2.679 0.014410 *
weight       1.74891    0.38063   4.595 0.000175 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22.75 on 20 degrees of freedom
Multiple R-Squared: 0.6141, Adjusted R-squared: 0.5369
F-statistic: 7.957 on 4 and 20 DF, p-value: 0.000523
Cautionary Notes
The significance levels are not necessarily believable after variable selection.
The original full-model F-statistic is significant, indicating that there is some relationship between pemax and the predictors: F(9,15) = 2.93, p = 0.0320.
After variable selection, F(3,21) = 9.28, p = 0.0004, but this p-value is biased downward: the same data were used both to choose the predictors and to test them, so it overstates the evidence.
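The F(3,21) = 9.28 quoted in these notes can be reproduced from numbers already printed by step() and summary(cf.lm2). It is consistent with the three-predictor model weight + bmp + fev1, i.e. the four-predictor model with rv also removed (RSS 11538.2 in the last step() table). A small R check, assuming only those printed values:

```r
# Values printed by step() and summary(cf.lm2)
rss_4 <- 10354.6                   # pemax ~ weight + bmp + fev1 + rv (20 residual df)
rss_3 <- 11538.2                   # same model with rv removed (21 residual df)
r2_4  <- 0.6141                    # multiple R-squared of cf.lm2

tss <- rss_4 / (1 - r2_4)          # total sum of squares of pemax
f_3 <- ((tss - rss_3) / 3) / (rss_3 / 21)
p_3 <- pf(f_3, 3, 21, lower.tail = FALSE)

# Overall F of the full nine-predictor model, for comparison
p_full <- pf(2.929, 9, 15, lower.tail = FALSE)
c(F_3 = f_3, p_3 = p_3, p_full = p_full)
```

Both p-values are computed as if the predictors had been fixed in advance, which is exactly the assumption that selection violates.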
* Simulate 25 observations of pure noise: y is unrelated to x1-x9
clear
set obs 25
generate x1 = invnormal(uniform())
generate x2 = invnormal(uniform())
generate x3 = invnormal(uniform())
generate x4 = invnormal(uniform())
generate x5 = invnormal(uniform())
generate x6 = invnormal(uniform())
generate x7 = invnormal(uniform())
generate x8 = invnormal(uniform())
generate x9 = invnormal(uniform())
generate y = invnormal(uniform())
* Fit the full model, then backward stepwise selection (remove terms with p >= 0.1)
regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
. regress y x1 x2 x3 x4 x5 x6 x7 x8 x9

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  9,    15) =    0.91
       Model |  12.3235639     9  1.36928488           Prob > F      =  0.5397
    Residual |  22.5105993    15  1.50070662           R-squared     =  0.3538
-------------+------------------------------           Adj R-squared = -0.0340
       Total |  34.8341632    24  1.45142347           Root MSE      =   1.225

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.0441858   .2998066    -0.15   0.885    -.6832085     .594837
          x2 |  -.9078136   .4347798    -2.09   0.054    -1.834525    .0188976
          x3 |   .2076754   .3789522     0.55   0.592    -.6000421    1.015393
          x4 |  -.0056383   .3319125    -0.02   0.987    -.7130931    .7018166
          x5 |   -.330546   .3854497    -0.86   0.405    -1.152113    .4910207
          x6 |   .0202964   .3470704     0.06   0.954    -.7194666    .7600594
          x7 |   -.073401   .3135234    -0.23   0.818    -.7416603    .5948583
          x8 |  -.0552909   .3026913    -0.18   0.858    -.7004621    .5898803
          x9 |  -.3190092   .3137931    -1.02   0.325    -.9878434     .349825
       _cons |  -.2490392   .3078424    -0.81   0.431    -.9051898    .4071113
------------------------------------------------------------------------------
. stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
                      begin with full model
p = 0.9867 >= 0.1000  removing x4
p = 0.9545 >= 0.1000  removing x6
p = 0.8456 >= 0.1000  removing x1
p = 0.8165 >= 0.1000  removing x7
p = 0.7506 >= 0.1000  removing x8
p = 0.5023 >= 0.1000  removing x3
p = 0.2866 >= 0.1000  removing x5
p = 0.2081 >= 0.1000  removing x9

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  1,    23) =    7.23
       Model |  8.33379862     1  8.33379862           Prob > F      =  0.0131
    Residual |  26.5003646    23  1.15218977           R-squared     =  0.2392
-------------+------------------------------           Adj R-squared =  0.2062
       Total |  34.8341632    24  1.45142347           Root MSE      =  1.0734

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.6644002   .2470417    -2.69   0.013    -1.175445   -.1533555
       _cons |  -.1523124    .214703    -0.71   0.485    -.5964594    .2918346
------------------------------------------------------------------------------
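The Stata run is a single simulated dataset: the full model is correctly non-significant (p = 0.5397), yet after selection x2 looks significant (p = 0.0131). Repeating the experiment many times shows how often selection turns pure noise into a "significant" model. A minimal R sketch, assuming backward elimination by AIC via step() (a different rule from Stata's pr(.1) p-value threshold), with a hypothetical seed, 200 replicates and a hypothetical helper overall_p():

```r
# Repeat the noise experiment: y is unrelated to x1..x9, n = 25
set.seed(123)
n_rep <- 200

# Hypothetical helper: p-value of the overall F test (1 if no predictors remain)
overall_p <- function(fit) {
  f <- summary(fit)$fstatistic     # NULL for an intercept-only model
  if (is.null(f)) return(1)
  unname(pf(f[1], f[2], f[3], lower.tail = FALSE))
}

p_full <- p_sel <- numeric(n_rep)
for (i in seq_len(n_rep)) {
  d <- as.data.frame(matrix(rnorm(25 * 10), 25, 10,
                            dimnames = list(NULL, c(paste0("x", 1:9), "y"))))
  full <- lm(y ~ ., data = d)
  sel  <- step(full, trace = 0)    # backward elimination by AIC
  p_full[i] <- overall_p(full)
  p_sel[i]  <- overall_p(sel)
}

# Fraction of "significant" overall F tests at the 0.05 level
c(full_model = mean(p_full < 0.05), after_selection = mean(p_sel < 0.05))
```

The full-model rate should stay near the nominal 5%, while the rate after selection is noticeably higher even though no predictor has any real effect.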