Source: fisher.utstat.toronto.edu/~mahinda/stab27/wk5b27wb.pdf (2008-02-24)
Model Building Chap 5 p251
Models with one qualitative variable, 5.7 p277

Example: 4 colours: Blue, Green, Lemon Yellow and White

Row  Blue  Green  Lemon  Insects trapped
  1     0      0      1               45
  2     0      0      1               59
  3     0      0      1               48
  4     0      0      1               46
  5     0      0      1               38
  6     0      0      1               47
  7     0      0      0               21
  8     0      0      0               12
  9     0      0      0               14
 10     0      0      0               17
 11     0      0      0               13
 12     0      0      0               17
 13     0      1      0               37
 14     0      1      0               32
 15     0      1      0               15
 16     0      1      0               25
 17     0      1      0               39
 18     0      1      0               41
 19     1      0      0               16
 20     1      0      0               11
 21     1      0      0               20
 22     1      0      0               21
 23     1      0      0               14
 24     1      0      0                7

Descriptive Statistics: Insects

Variable  Colour  N  N*   Mean
Insects   B       6   0  14.83
          G       6   0  31.50
          L       6   0  47.17
          W       6   0  15.67

The regression equation is
Insects trapped = 15.7 - 0.83 Blue + 15.8 Green + 31.5 Lemon

Predictor    Coef   StDev      T      P
Constant   15.667   2.770   5.66  0.000
Blue       -0.833   3.917  -0.21  0.834
Green      15.833   3.917   4.04  0.001
Lemon      31.500   3.917   8.04  0.000

S = 6.784   R-Sq = 82.1%   R-Sq(adj) = 79.4%
Analysis of Variance

Source          DF      SS      MS      F      P
Regression       3  4218.5  1406.2  30.55  0.000
Residual Error  20   920.5    46.0

Estimate the betas using the means (Descriptive Statistics). State whether the following statements are true or false.
a) The value of the F-statistic for testing any differences among the colours is 30.55.
b) We have evidence at p < 0.01 that the means for green and white are different.
c) We have evidence at p < 0.01 that the means for blue and white are different.
d) A 95% confidence interval for the difference between the means for lemon yellow and white is (23.3, 39.7).
e) We may say that 82.1% of the variation in the number of insects trapped has
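The hand calculation asked for above can be checked with a short Python sketch. It relies on the standard result for indicator coding: the constant estimates the mean of the baseline colour (White, the rows with Blue = Green = Lemon = 0) and each indicator coefficient estimates that colour's mean minus the White mean. The critical value t = 2.086 (0.025 tail, 20 error df) is typed in from a t table rather than computed.

```python
from math import sqrt

# Insect counts by trap colour, copied from the data table (6 traps each).
counts = {
    "Lemon": [45, 59, 48, 46, 38, 47],
    "White": [21, 12, 14, 17, 13, 17],
    "Green": [37, 32, 15, 25, 39, 41],
    "Blue": [16, 11, 20, 21, 14, 7],
}
means = {c: sum(y) / len(y) for c, y in counts.items()}

# Indicator coding with White as the baseline (Blue = Green = Lemon = 0):
# beta0 is the White mean, each beta_c is (mean of colour c) - (White mean).
beta0 = means["White"]
betas = {c: means[c] - beta0 for c in ("Blue", "Green", "Lemon")}

# One-way ANOVA pieces: SSE within colours, SSR between colours.
all_y = [v for y in counts.values() for v in y]
n, groups = len(all_y), len(counts)
grand_mean = sum(all_y) / n
sse = sum((v - means[c]) ** 2 for c, y in counts.items() for v in y)
ssr = sum(len(y) * (means[c] - grand_mean) ** 2 for c, y in counts.items())
df_reg, df_err = groups - 1, n - groups
mse = sse / df_err
f_stat = (ssr / df_reg) / mse
r_sq = ssr / (ssr + sse)

# SE of beta_Lemon = SE of (Lemon mean - White mean); t_crit is an assumed
# table value (t_{0.025} with 20 df), not computed here.
se_lemon = sqrt(mse * (1 / len(counts["Lemon"]) + 1 / len(counts["White"])))
t_crit = 2.086
ci_lemon = (betas["Lemon"] - t_crit * se_lemon, betas["Lemon"] + t_crit * se_lemon)

print(f"beta0 = {beta0:.3f}", {c: round(b, 3) for c, b in betas.items()})
print(f"SSR = {ssr:.1f}, SSE = {sse:.1f}, F = {f_stat:.2f}, R-sq = {100 * r_sq:.1f}%")
print(f"SE = {se_lemon:.3f}, 95% CI for Lemon - White = "
      f"({ci_lemon[0]:.1f}, {ci_lemon[1]:.1f})")
```

The printed values should reproduce the Coef, StDev, SS, F and R-Sq entries of the Minitab output to rounding.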
The regression equation is
perform = 68.7 + 11.3 F2 - 21.7 F3 - 32.7 B2 - 0.83 F2B2 + 47.2 F3B2

Predictor    Coef   StDev      T      P
Constant   68.667   1.939  35.42  0.000
F2         11.333   3.066   3.70  0.010
F3        -21.667   3.066  -7.07  0.000
B2        -32.667   3.878  -8.42  0.000
F2B2       -0.833   5.130  -0.16  0.876
F3B2       47.167   5.130   9.19  0.000

S = 3.358   R-Sq = 97.1%   R-Sq(adj) = 94.8%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       5  2303.00  460.60  40.84  0.000
Residual Error   6    67.67   11.28
Total           11  2370.67

Source  DF  Seq SS
F2       1   92.04
F3       1   78.13
B2       1  688.09
F2B2     1  491.30
F3B2     1  953.44
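Because this model has one parameter for each of the six fuel-brand combinations, its fitted values are the cell means, and each coefficient is a contrast among them (for example, the F2 coefficient is mean(F2, B1) - mean(F1, B1)). The Python sketch below runs that relationship backwards, rebuilding the six fitted cell means from the printed coefficients, with F1 and B1 taken as the baselines implied by the indicator names.

```python
# Printed Minitab coefficients; F1 and B1 are the baselines (all indicators 0),
# and F2B2, F3B2 are the interaction indicators (products F2*B2, F3*B2).
coef = {
    "Constant": 68.667,
    "F2": 11.333,
    "F3": -21.667,
    "B2": -32.667,
    "F2B2": -0.833,
    "F3B2": 47.167,
}


def cell_mean(fuel: str, brand: str) -> float:
    """Fitted mean of perform for one fuel (F1-F3) and brand (B1-B2) cell."""
    mean = coef["Constant"]
    if fuel != "F1":
        mean += coef[fuel]  # fuel main effect relative to F1
    if brand == "B2":
        mean += coef["B2"]  # brand effect at the baseline fuel F1
        if fuel != "F1":
            mean += coef[fuel + "B2"]  # interaction adjustment
    return mean


for brand in ("B1", "B2"):
    row = [cell_mean(f, brand) for f in ("F1", "F2", "F3")]
    print(brand, [round(m, 3) for m in row])
```

Reading the coefficients this way is the reverse of estimating the regression equation from the cell means.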
[Figure: Interaction Plot - Data Means for perform. Horizontal axis: fuel type F (F1, F2, F3); vertical axis: mean of perform (40 to 80); legend: brand B (B1, B2).]
Estimate the regression equation using the means (descriptive statistics). Test whether there is an interaction between brand and fuel type.
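One way to carry out the interaction test is the nested-model (partial) F test of H0: beta_F2B2 = beta_F3B2 = 0. Since Minitab adds the terms in the order listed, the last two sequential sums of squares together equal SSE(reduced) - SSE(complete). The Python sketch below uses the printed numbers; the 5% critical value F(2, 6) = 5.14 is typed in from an F table and is an assumed input, not computed.

```python
# Sequential SS from the Minitab output, in the order the terms were entered.
seq_ss = {"F2": 92.04, "F3": 78.13, "B2": 688.09, "F2B2": 491.30, "F3B2": 953.44}
sse_complete = 67.67  # residual SS of the model with interaction
df_error = 6

# Extra SS from adding both interaction terms after the main effects.
ss_interaction = seq_ss["F2B2"] + seq_ss["F3B2"]
df_interaction = 2
mse_complete = sse_complete / df_error
f_stat = (ss_interaction / df_interaction) / mse_complete

f_crit = 5.14  # F_{0.05}(2, 6), from an F table (assumed value)
print(f"F = {f_stat:.2f} on ({df_interaction}, {df_error}) df; "
      f"reject H0 at 5%: {f_stat > f_crit}")
```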
Variable Screening Methods, Chap 6 p321
Stepwise regression p323

A hospital surgical unit was interested in predicting the survival times of patients undergoing a particular type of liver operation. A random sample of patients was available for analysis. From each patient record, the following information was extracted from the preoperation evaluation:

X1 = blood clotting score
X2 = prognostic index
X3 = enzyme function test score
X4 = liver function test score
X5 = age in years
X6 = indicator variable for gender (0 = M, 1 = F)
X7 and X8 = indicator variables for history of alcohol use (categorical: none, moderate, severe)
   X7 = indicator of moderate
   X8 = indicator of severe

Data Display

Row   X1  X2   X3    X4  X5  X6  X7  X8     Y    lnY
  1  6.7  62   81  2.59  50   0   1   0   695  6.544
  2  5.1  59   66  1.70  39   0   0   0   403  5.999
  3  7.4  57   83  2.16  55   0   0   0   710  6.565
  4  6.5  73   41  2.01  48   0   0   0   349  5.854
  5  7.8  65  115  4.30  45   0   0   1  2343  7.759
  6  5.8  38   72  1.42  65   1   1   0   348  5.852
  7  5.7  46   63  1.91  49   1   0   1   518   6.25
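The coding of alcohol history follows the usual rule of c - 1 indicators for a c-level categorical variable, with "none" as the baseline (X7 = X8 = 0). A minimal Python sketch, assuming plain string labels for the three levels (the labels and function name are hypothetical; the slides only name the categories):

```python
# Hypothetical level labels; the slide lists the categories none, moderate, severe.
ALCOHOL_LEVELS = ("none", "moderate", "severe")


def alcohol_indicators(level: str) -> tuple[int, int]:
    """Return (X7, X8) for one patient; 'none' is the baseline (0, 0)."""
    if level not in ALCOHOL_LEVELS:
        raise ValueError(f"unknown alcohol-use level: {level!r}")
    x7 = int(level == "moderate")
    x8 = int(level == "severe")
    return x7, x8


# Reverse mapping from (X7, X8) back to the category label.
LEVEL_FROM_INDICATORS = {alcohol_indicators(lv): lv for lv in ALCOHOL_LEVELS}

# (X7, X8) for rows 1-7 of the Data Display.
rows = [(1, 0), (0, 0), (0, 0), (0, 0), (0, 1), (1, 0), (0, 1)]
print([LEVEL_FROM_INDICATORS[r] for r in rows])
```

Using only two indicators for three levels avoids an exact linear dependence with the constant term.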
All Possible Regressions Selection Procedure (6.3) p327

R-sq Criterion:
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

Response is lnY

Vars  R-Sq  R-Sq(adj)    C-p        s  Predictors (X)
   1  42.8       41.7  117.4  0.37549  X
   1  42.2       41.0  119.2  0.37746  X
   1  22.1       20.6  177.9  0.43807  X
   1  13.9       12.2  201.8  0.46052  X
   1   6.1        4.3  224.7  0.48101  X
   2  66.3       65.0   50.5  0.29079  X X
   2  59.9       58.4   69.1  0.31715  X X
   2  54.9       53.1   84.0  0.33668  X X
   2  51.6       49.7   93.4  0.34850  X X
   2  50.8       48.9   95.9  0.35157  X X
   3  77.8       76.5   18.9  0.23845  X X X
   3  75.7       74.3   25.0  0.24934  X X X
   3  71.8       70.1   36.5  0.26885  X X X
   3  68.1       66.2   47.3  0.28587  X X X
   3  67.6       65.7   48.7  0.28802  X X X
   4  83.0       81.6    5.8  0.21087  X X X X
   4  81.4       79.9   10.3  0.22023  X X X X
   4  78.9       77.2   17.8  0.23498  X X X X
   4  78.4       76.6   19.3  0.23785  X X X X
   4  78.0       76.2   20.4  0.23982  X X X X
   5  83.7       82.1    5.5  0.20827  X X X X X
   5  83.6       81.9    6.0  0.20931  X X X X X
   5  83.3       81.6    6.8  0.21100  X X X X X
   5  83.2       81.4    7.2  0.21193  X X X X X
   5  81.8       79.9   11.3  0.22044  X X X X X
   6  84.3       82.3    5.8  0.20655  X X X X X X
   6  83.9       81.9    7.0  0.20934  X X X X X X
   6  83.9       81.8    7.2  0.20964  X X X X X X
   6  83.8       81.8    7.2  0.20982  X X X X X X
   6  83.7       81.6    7.6  0.21066  X X X X X X
   7  84.6       82.3    7.0  0.20705  X X X X X X X
   7  84.4       82.0    7.7  0.20867  X X X X X X X
   7  84.0       81.6    8.7  0.21081  X X X X X X X
   7  84.0       81.5    8.9  0.21136  X X X X X X X
   7  82.1       79.4   14.3  0.22306  X X X X X X X
   8  84.6       81.9    9.0  0.20927  X X X X X X X X
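All possible regressions means fitting every non-empty subset of the candidate predictors. This short Python sketch, using only `itertools` and `math`, just enumerates those subsets for X1-X8 to show how many models the procedure compares (model fitting itself is not shown); the Minitab output lists up to five of them per subset size, sorted by R-sq.

```python
from itertools import combinations
from math import comb

predictors = [f"X{i}" for i in range(1, 9)]

# Every subset of each size p = 1, ..., 8.
subsets_by_size = {
    p: list(combinations(predictors, p)) for p in range(1, len(predictors) + 1)
}
for p, subsets in subsets_by_size.items():
    print(f"{p} predictors: {len(subsets)} models")

total = sum(len(s) for s in subsets_by_size.values())
print("total models:", total)  # 2**8 - 1
```

The count grows as 2^k - 1, which is why screening methods such as stepwise regression are used when k is large.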
Best Subsets Regression
Response is lnY

Vars  R-Sq  R-Sq(adj)    C-p        s  Predictors (X)
   1  42.8       41.7  117.4  0.37549  X
   2  66.3       65.0   50.5  0.29079  X X
   3  77.8       76.5   18.9  0.23845  X X X
   4  83.0       81.6    5.8  0.21087  X X X X
   5  83.7       82.1    5.5  0.20827  X X X X X
   6  84.3       82.3    5.8  0.20655  X X X X X X
   7  84.6       82.3    7.0  0.20705  X X X X X X X
   8  84.6       81.9    9.0  0.20927  X X X X X X X X
[Plot: R-sq against vars (number of predictors, 1 to 8); R-sq axis from 40 to 85.]
Ex: Response is crimes
Candidate predictors: totpop, pop18-34, pop65, hsgrads, bdegrees, poverty, unemp

Vars  R-Sq  R-Sq(adj)   C-p      s  Predictors (X)
   1  75.4       75.3  23.6  39995  X
   2  78.3       78.1  -0.2  37660  X X
   3  78.4       78.1   1.0  37671  X X X
   4  78.5       78.0   2.6  37732  X X X X
   5  78.5       78.0   4.1  37784  X X X X X
   6  78.5       77.9   6.1  37875  X X X X X X
   7  78.5       77.8   8.0  37968  X X X X X X X
[Plot: R-sq against Vars (number of predictors, 1 to 7); R-sq axis from 75.5 to 78.5.]
Other Criteria

2) R-sq (Adj)
$$R^2_{adj} = 1 - \frac{MSE}{SST/(n-1)}$$
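The adjusted R-sq can be rewritten in terms of R-sq: substituting MSE = SSE/(n - p - 1) and SSE = (1 - R-sq) SST gives R-sq(adj) = 1 - (n - 1)/(n - p - 1) * (1 - R-sq). The Python sketch below applies that identity to the best lnY model of each size. The sample size is not printed on these slides; n = 54 is an assumption, chosen because it reproduces the printed R-Sq(adj) column to within rounding.

```python
def adjusted_r_sq(r_sq: float, n: int, p: int) -> float:
    """R-sq(adj) = 1 - (n - 1)/(n - p - 1) * (1 - R-sq), for p predictors."""
    return 1 - (n - 1) / (n - p - 1) * (1 - r_sq)


n = 54  # ASSUMED sample size for the lnY example (not stated on the slides)

# (p, R-Sq %, printed R-Sq(adj) %) for the best model of each size.
best = [(1, 42.8, 41.7), (2, 66.3, 65.0), (3, 77.8, 76.5), (4, 83.0, 81.6),
        (5, 83.7, 82.1), (6, 84.3, 82.3), (7, 84.6, 82.3), (8, 84.6, 81.9)]

for p, r_sq, printed_adj in best:
    adj = 100 * adjusted_r_sq(r_sq / 100, n, p)
    print(f"p={p}: computed {adj:.1f}%, printed {printed_adj}%")
```

Because n and SST are the same for every subset, maximizing R-sq(adj) is the same as minimizing MSE = s^2; in the lnY output the smallest s, 0.20655, belongs to a model with the largest printed R-sq(adj), 82.3.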
3) Cp criterion p328
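$$C_p = \frac{SSE_p}{MSE_k} + 2(p+1) - n$$
where SSE_p is the SSE of the subset model with p predictors, MSE_k is the MSE of the model containing all k candidate predictors, and n is the sample size.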
The Cp criterion selects as the best model the subset model with
1) a small value of Cp
2) a value of Cp near p + 1 (p is the number of predictors)
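The Cp formula can be evaluated from the printed s values alone, because SSE_p = s_p^2 (n - p - 1) and MSE_k = s_k^2 for the full model. This Python sketch does that for the best model of each size in the lnY output, again assuming n = 54 (not stated on the slides) and k = 8 candidate predictors.

```python
n, k = 54, 8  # n is ASSUMED (not printed on the slides); k = 8 candidates X1..X8

# s for the best model of each size p, from the Best Subsets output.
s_by_p = {1: 0.37549, 2: 0.29079, 3: 0.23845, 4: 0.21087,
          5: 0.20827, 6: 0.20655, 7: 0.20705, 8: 0.20927}
mse_k = s_by_p[k] ** 2  # MSE of the full model


def mallows_cp(p: int, s_p: float) -> float:
    """Cp = SSE_p / MSE_k + 2(p + 1) - n, with SSE_p rebuilt from s_p."""
    sse_p = s_p ** 2 * (n - p - 1)
    return sse_p / mse_k + 2 * (p + 1) - n


cp = {p: mallows_cp(p, s) for p, s in s_by_p.items()}
for p, value in cp.items():
    print(f"p={p}: Cp={value:.1f}  (p + 1 = {p + 1})")
```

The computed values agree with the printed C-p column to within rounding. The full model always has Cp = k + 1 exactly (9.0 here), since SSE_k/MSE_k = n - k - 1.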