
15 - 1 © 2017 Cengage Learning. All Rights Reserved.

May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

Chapter 15
Multiple Regression

Learning Objectives

1. Understand how multiple regression analysis can be used to develop relationships involving one dependent variable and several independent variables.
2. Be able to interpret the coefficients in a multiple regression analysis.
3. Know the assumptions necessary to conduct statistical tests involving the hypothesized regression model.
4. Understand the role of computer packages in performing multiple regression analysis.
5. Be able to interpret and use computer output to develop the estimated regression equation.
6. Be able to determine how good a fit is provided by the estimated regression equation.
7. Be able to test for the significance of the regression equation.
8. Understand how multicollinearity affects multiple regression analysis.
9. Know how residual analysis can be used to make a judgement as to the appropriateness of the model, identify outliers, and determine which observations are influential.
10. Understand how logistic regression is used for regression analyses involving a binary dependent variable.

Solutions:

1. a. b1 = .5906 is an estimate of the change in y corresponding to a 1 unit change in x1 when x2 is held constant.

b2 = .4980 is an estimate of the change in y corresponding to a 1 unit change in x2 when x1 is held constant.

b. y = 29.1270 + .5906(180) + .4980(310) = 289.82
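The arithmetic in Exercise 1 can be checked with a few lines of Python. This is only a sketch: the helper name `predict` is illustrative and not part of the text, and the coefficients are the ones quoted in the exercise.

```python
# Coefficients of the estimated regression equation quoted in Exercise 1.
B0, B1, B2 = 29.1270, 0.5906, 0.4980


def predict(x1, x2):
    """Point estimate of y for given x1 and x2 (hypothetical helper)."""
    return B0 + B1 * x1 + B2 * x2


# Part (b): estimate of y at x1 = 180 and x2 = 310.
y_hat = predict(180, 310)
print(y_hat)

# Part (a): raising x1 by one unit with x2 held constant
# changes the estimate by exactly b1.
effect_x1 = predict(181, 310) - predict(180, 310)
print(effect_x1)
```

The second calculation is the numerical counterpart of the phrase "when x2 is held constant": only x1 changes between the two calls.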

2. a. Partial Minitab output follows:

Analysis of Variance
Source         DF  Adj SS   Adj MS   F-Value  P-Value
Regression     1   10021.2  10021.2  15.53    0.004
  X1           1   10021.2  10021.2  15.53    0.004
Error          8   5161.7   645.2
  Lack-of-Fit  7   5157.2   736.7    163.72   0.060
  Pure Error   1   4.5      4.5
Total          9   15182.9

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
25.4009  66.00%  61.75%     49.59%

Coefficients
Term      Coef   SE Coef  T-Value  P-Value  VIF
Constant  45.1   25.4     1.77     0.114
X1        1.944  0.493    3.94     0.004    1.00

Regression Equation
Y = 45.1 + 1.944 X1

The estimated regression equation is y = 45.1 + 1.944x1

An estimate of y when x1 = 45 is y = 45.1 + 1.944(45) = 132.58

b. Partial Minitab output follows:

Analysis of Variance
Source         DF  Adj SS   Adj MS  F-Value  P-Value
Regression     1   3363.4   3363.4  2.28     0.170
  X2           1   3363.4   3363.4  2.28     0.170
Error          8   11819.5  1477.4
  Lack-of-Fit  6   11010.5  1835.1  4.54     0.192
  Pure Error   2   809.0    404.5
Total          9   15182.9

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
38.4374  22.15%  12.42%     0.00%

Coefficients
Term      Coef  SE Coef  T-Value  P-Value  VIF
Constant  85.2  38.4     2.22     0.057
X2        4.32  2.86     1.51     0.170    1.00

Regression Equation
Y = 85.2 + 4.32 X2

The estimated regression equation is y = 85.2 + 4.32x2

An estimate of y when x2 = 15 is y = 85.2 + 4.32(15) = 150

c. Partial Minitab output is shown below:

Analysis of Variance
Source      DF  Adj SS  Adj MS   F-Value  P-Value
Regression  2   14052   7026.1   43.50    0.000
  X1        1   10689   10688.7  66.17    0.000
  X2        1   4031    4030.9   24.95    0.002
Error       7   1131    161.5
Total       9   15183

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
12.7096  92.55%  90.42%     87.95%

Coefficients
Term      Coef   SE Coef  T-Value  P-Value  VIF
Constant  -18.4  18.0     -1.02    0.341
X1        2.010  0.247    8.13     0.000    1.00
X2        4.738  0.948    5.00     0.002    1.00

Regression Equation
Y = -18.4 + 2.010 X1 + 4.738 X2

The estimated regression equation is y = -18.4 + 2.01x1 + 4.738x2


An estimate of y when x1 = 45 and x2 = 15 is y = -18.4 + 2.01(45) + 4.738(15) = 143.12

3. a. b1 = 3.8 is an estimate of the change in y corresponding to a 1 unit change in x1 when x2, x3, and x4 are held constant.

b2 = -2.3 is an estimate of the change in y corresponding to a 1 unit change in x2 when x1, x3, and x4 are held constant.

b3 = 7.6 is an estimate of the change in y corresponding to a 1 unit change in x3 when x1, x2, and x4 are held constant.

b4 = 2.7 is an estimate of the change in y corresponding to a 1 unit change in x4 when x1, x2, and x3 are held constant.

b. y = 17.6 + 3.8(10) - 2.3(5) + 7.6(1) + 2.7(2) = 57.1

4. a. y = 25 + 10(15) + 8(10) = 255; sales estimate: $255,000

b. Sales can be expected to increase by $10 for every dollar increase in inventory investment when advertising expenditure is held constant. Sales can be expected to increase by $8 for every dollar increase in advertising expenditure when inventory investment is held constant.


5. a. Partial Minitab output follows:

Analysis of Variance
Source                            DF  Adj SS  Adj MS  F-Value  P-Value
Regression                        1   16.640  16.640  11.27    0.015
  Televison_Advertising_($1000s)  1   16.640  16.640  11.27    0.015
Error                             6   8.860   1.477
  Lack-of-Fit                     4   6.360   1.590   1.27     0.485
  Pure Error                      2   2.500   1.250
Total                             7   25.500

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
1.21518  65.26%  59.46%     28.39%

Coefficients
Term                            Coef   SE Coef  T-Value  P-Value  VIF
Constant                        88.64  1.58     56.02    0.000
Televison_Advertising_($1000s)  1.604  0.478    3.36     0.015    1.00

Regression Equation
Weekly Gross_Revenue_($1000s) = 88.64 + 1.604 Televison_Advertising_($1000s)

b. Partial Minitab output follows:

Analysis of Variance
Source                            DF  Adj SS  Adj MS   F-Value  P-Value
Regression                        2   23.435  11.7177  28.38    0.002
  Televison_Advertising_($1000s)  1   23.425  23.4247  56.73    0.001
  Newspaper_Advertising_($1000s)  1   6.795   6.7953   16.46    0.010
Error                             5   2.065   0.4129
Total                             7   25.500

Model Summary
S         R-sq    R-sq(adj)  R-sq(pred)
0.642587  91.90%  88.66%     68.19%

Coefficients
Term                            Coef   SE Coef  T-Value  P-Value  VIF
Constant                        83.23  1.57     52.88    0.000
Televison_Advertising_($1000s)  2.290  0.304    7.53     0.001    1.45
Newspaper_Advertising_($1000s)  1.301  0.321    4.06     0.010    1.45

Regression Equation
Weekly Gross_Revenue_($1000s) = 83.23 + 2.290 Televison_Advertising_($1000s) + 1.301 Newspaper_Advertising_($1000s)

c. No, it is 1.60 in part (a) and 2.29 above. In part (b) it represents the marginal change in revenue due to an increase in television advertising with newspaper advertising held constant.

d. Revenue = 83.2 + 2.290(3.5) + 1.301(1.8) = 93.5568 (in $1000s), or $93,556.80

6. a. Partial Minitab output follows:

Analysis of Variance
Source         DF  Adj SS  Adj MS  F-Value  P-Value
Regression     1   4814.3  4814.3  19.11    0.001
  Yds/Att      1   4814.3  4814.3  19.11    0.001
Error          14  3527.4  252.0
  Lack-of-Fit  13  3037.6  233.7   0.48     0.829
  Pure Error   1   489.8   489.8
Total          15  8341.7

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
15.8732  57.71%  54.69%     44.88%

Coefficients
Term      Coef   SE Coef  T-Value  P-Value  VIF
Constant  -58.8  26.2     -2.25    0.041
Yds/Att   16.39  3.75     4.37     0.001    1.00

Regression Equation
Win% = -58.8 + 16.39 Yds/Att

Fits and Diagnostics for Unusual Observations
Obs  Win%   Fit    Resid  Std Resid
14   81.30  47.77  33.53  2.19       R

R  Large residual


b. Partial Minitab output follows:

Analysis of Variance
Source         DF  Adj SS  Adj MS  F-Value  P-Value
Regression     1   3653    3652.8  10.91    0.005
  Int/Att      1   3653    3652.8  10.91    0.005
Error          14  4689    334.9
  Lack-of-Fit  11  3536    321.4   0.84     0.644
  Pure Error   3   1153    384.4
Total          15  8342

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
18.3008  43.79%  39.77%     26.48%

Coefficients
Term      Coef   SE Coef  T-Value  P-Value  VIF
Constant  97.5   13.9     7.04     0.000
Int/Att   -1600  485      -3.30    0.005    1.00

Regression Equation
Win% = 97.5 - 1600 Int/Att

Fits and Diagnostics for Unusual Observations
Obs  Win%   Fit    Resid   Std Resid
8    12.50  55.93  -43.43  -2.45      R

R  Large residual

c. Partial Minitab output follows:

Analysis of Variance
Source      DF  Adj SS  Adj MS  F-Value  P-Value
Regression  2   6277    3138.5  19.76    0.000
  Yds/Att   1   2624    2624.2  16.52    0.001
  Int/Att   1   1463    1462.8  9.21     0.010
Error       13  2065    158.8
Total       15  8342

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
12.6024  75.25%  71.44%     60.51%

Coefficients
Term      Coef   SE Coef  T-Value  P-Value  VIF
Constant  -5.8   27.1     -0.21    0.835
Yds/Att   12.95  3.19     4.06     0.001    1.15
Int/Att   -1084  357      -3.03    0.010    1.15

Regression Equation
Win% = -5.8 + 12.95 Yds/Att - 1084 Int/Att

Fits and Diagnostics for Unusual Observations
Obs  Win%   Fit    Resid   Std Resid
8    12.50  38.57  -26.07  -2.28      R

R  Large residual

d. The predicted value of Win% for the Kansas City Chiefs is

Win% = -5.8 + 12.95(6.2) - 1084(.036) = 35.47%

With 7 wins and 9 losses, the Kansas City Chiefs won 43.75% of the games they played. The predicted value is somewhat lower than the actual value.

7. a. Partial Minitab output follows:

Analysis of Variance
Source            DF  Adj SS  Adj MS  F-Value  P-Value
Regression        1   66.34   66.343  9.87     0.014
  Contrast Ratio  1   66.34   66.343  9.87     0.014
Error             8   53.76   6.720
  Lack-of-Fit     7   41.26   5.894   0.47     0.811
  Pure Error      1   12.50   12.500
Total             9   120.10

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
2.59221  55.24%  49.65%     37.16%

Coefficients
Term            Coef    SE Coef  T-Value  P-Value  VIF
Constant        69.89   3.85     18.17    0.000
Contrast Ratio  0.1699  0.0541   3.14     0.014    1.00

Regression Equation
Overall Rating = 69.89 + 0.1699 Contrast Ratio

b. Partial Minitab output follows:

Analysis of Variance
Source            DF  Adj SS  Adj MS  F-Value  P-Value
Regression        2   100.51  50.255  17.96    0.002
  Contrast Ratio  1   24.32   24.323  8.69     0.021
  Resolution      1   34.17   34.168  12.21    0.010
Error             7   19.59   2.798
Total             9   120.10

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
1.67285  83.69%  79.03%     64.56%

Coefficients
Term            Coef    SE Coef  T-Value  P-Value  VIF
Constant        43.10   8.06     5.35     0.001
Contrast Ratio  0.1134  0.0385   2.95     0.021    1.21
Resolution      0.382   0.109    3.49     0.010    1.21

Regression Equation
Overall Rating = 43.10 + 0.1134 Contrast Ratio + 0.382 Resolution

c. y = 43.10 + .1134(85) + .382(74) = 81.007 or 81

8. a. Partial Minitab output follows.

Analysis of Variance
Source              DF  Adj SS   Adj MS   F-Value  P-Value
Regression          1   60.202   60.2022  17.21    0.001
  Shore Excursions  1   60.202   60.2022  17.21    0.001
Error               18  62.963   3.4980
  Lack-of-Fit       17  62.463   3.6743   7.35     0.283
  Pure Error        1   0.500    0.5000
Total               19  123.166

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
1.87028  48.88%  46.04%     39.19%

Coefficients
Term              Coef    SE Coef  T-Value  P-Value  VIF
Constant          69.30   4.80     14.44    0.000
Shore Excursions  0.2348  0.0566   4.15     0.001    1.00

Regression Equation
Overall = 69.30 + 0.2348 Shore Excursions


b. Partial Minitab output follows.

Analysis of Variance
Source              DF  Adj SS  Adj MS  F-Value  P-Value
Regression          2   90.95   45.477  24.00    0.000
  Shore Excursions  1   69.05   69.053  36.44    0.000
  Food/Dining       1   30.75   30.752  16.23    0.001
Error               17  32.21   1.895
Total               19  123.17

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
1.37650  73.85%  70.77%     62.81%

Coefficients
Term              Coef    SE Coef  T-Value  P-Value  VIF
Constant          45.18   6.95     6.50     0.000
Shore Excursions  0.2529  0.0419   6.04     0.000    1.01
Food/Dining       0.2482  0.0616   4.03     0.001    1.01

Regression Equation
Overall = 45.18 + 0.2529 Shore Excursions + 0.2482 Food/Dining

c. ŷ = 45.18 + .2529(Shore Excursions) + .2482(Food/Dining) = 45.18 + .2529(80) + .2482(90) = 87.75

Thus, an estimate of the overall score is approximately 88.

9. a. Partial Minitab output follows:

Analysis of Variance
Source             DF   Adj SS   Adj MS   F-Value  P-Value
Regression         1    6241.07  6241.07  455.79   0.000
  Club Head Speed  1    6241.07  6241.07  455.79   0.000
Error              188  2574.25  13.69
  Lack-of-Fit      177  2521.50  14.25    2.97     0.024
  Pure Error       11   52.75    4.80
Total              189  8815.32

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
3.70038  70.80%  70.64%     70.04%

Coefficients
Term             Coef    SE Coef  T-Value  P-Value  VIF
Constant         124.72  7.41     16.83    0.000
Club Head Speed  1.3943  0.0653   21.35    0.000    1.00

Regression Equation
Total Distance = 124.72 + 1.3943 Club Head Speed


b. Partial Minitab output follows:

Analysis of Variance
Source         DF   Adj SS  Adj MS   F-Value  P-Value
Regression     1    6591.4  6591.42  557.21   0.000
  Ball Speed   1    6591.4  6591.42  557.21   0.000
Error          188  2223.9  11.83
  Lack-of-Fit  180  2106.8  11.70    0.80     0.728
  Pure Error   8    117.1   14.64
Total          189  8815.3

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
3.43937  74.77%  74.64%     74.13%

Coefficients
Term        Coef    SE Coef  T-Value  P-Value  VIF
Constant    117.14  7.02     16.68    0.000
Ball Speed  0.9876  0.0418   23.61    0.000    1.00

Regression Equation
Total Distance = 117.14 + 0.9876 Ball Speed

c. The following scatter diagram illustrates the relationship between the two variables.

[Scatter diagram: Ball Speed (vertical axis, 150 to 190) plotted against Club Head Speed (horizontal axis, 100 to 130)]

The scatter diagram shows a very strong linear relationship between the two variables. In fact, for these data the coefficient of determination is approximately .99. As a result, using both variables in the same model is not recommended because once the linear effect of one variable is accounted for, the other variable will be of little additional value. This situation, referred to as multicollinearity, is discussed later in the chapter in the section on testing for significance.
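One way to quantify the cost of putting two nearly collinear predictors in the same model is the variance inflation factor that Minitab reports in its Coefficients table, VIF = 1/(1 - R²), where R² here comes from regressing one predictor on the other predictors. The sketch below is an illustration rather than part of the original solution; the function name `vif` is hypothetical, and .99 is the approximate coefficient of determination between Club Head Speed and Ball Speed quoted in part (c).

```python
def vif(r_squared_j):
    """Variance inflation factor for a predictor whose regression on the
    other predictors has coefficient of determination r_squared_j."""
    return 1.0 / (1.0 - r_squared_j)


# Approximate r^2 between Club Head Speed and Ball Speed from part (c).
vif_both_speeds = vif(0.99)
print(vif_both_speeds)
```

A value near 100 means the variance of each estimated coefficient would be inflated roughly a hundredfold relative to uncorrelated predictors. For comparison, the Minitab output for Ball Speed and Launch Angle in part (d) reports a VIF of only 1.11.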


d. Partial Minitab output follows:

Analysis of Variance
Source          DF   Adj SS  Adj MS   F-Value  P-Value
Regression      2    7297.8  3648.89  449.64   0.000
  Ball Speed    1    7297.7  7297.74  899.27   0.000
  Launch Angle  1    706.4   706.36   87.04    0.000
Error           187  1517.5  8.12
Total           189  8815.3

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
2.84872  82.79%  82.60%     82.16%

Coefficients
Term          Coef    SE Coef  T-Value  P-Value  VIF
Constant      81.60   6.95     11.74    0.000
Ball Speed    1.0927  0.0364   29.99    0.000    1.11
Launch Angle  1.646   0.176    9.33     0.000    1.11

Regression Equation
Total Distance = 81.60 + 1.0927 Ball Speed + 1.646 Launch Angle

e. y = predicted Total Distance = 81.6 + 1.0927(170) + 1.646(11) = 285.465 yards

10. a. Partial Minitab output follows.

Analysis of Variance
Source         DF  Adj SS    Adj MS    F-Value  P-Value
Regression     1   0.047263  0.047263  13.01    0.002
  SO/IP        1   0.047263  0.047263  13.01    0.002
Error          18  0.065392  0.003633
  Lack-of-Fit  16  0.063942  0.003996  5.51     0.164
  Pure Error   2   0.001450  0.000725
Total          19  0.112655

Model Summary
S          R-sq    R-sq(adj)  R-sq(pred)
0.0602733  41.95%  38.73%     29.38%

Coefficients
Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  0.6758   0.0631   10.71    0.000
SO/IP     -0.2838  0.0787   -3.61    0.002    1.00

Regression Equation
R/IP = 0.6758 - 0.2838 SO/IP


b. Partial Minitab output follows.

Analysis of Variance
Source         DF  Adj SS   Adj MS    F-Value  P-Value
Regression     1   0.02887  0.028874  6.20     0.023
  HR/IP        1   0.02887  0.028874  6.20     0.023
Error          18  0.08378  0.004655
  Lack-of-Fit  6   0.01258  0.002097  0.35     0.894
  Pure Error   12  0.07120  0.005933
Total          19  0.11265

Model Summary
S          R-sq    R-sq(adj)  R-sq(pred)
0.0682239  25.63%  21.50%     14.87%

Coefficients
Term      Coef    SE Coef  T-Value  P-Value  VIF
Constant  0.3081  0.0604   5.10     0.000
HR/IP     1.347   0.541    2.49     0.023    1.00

Regression Equation
R/IP = 0.3081 + 1.347 HR/IP

c. Partial Minitab output follows.

Analysis of Variance
Source      DF  Adj SS   Adj MS    F-Value  P-Value
Regression  2   0.06348  0.031738  10.97    0.001
  SO/IP     1   0.03460  0.034603  11.96    0.003
  HR/IP     1   0.01621  0.016214  5.60     0.030
Error       17  0.04918  0.002893
Total       19  0.11266

Model Summary
S          R-sq    R-sq(adj)  R-sq(pred)
0.0537850  56.35%  51.21%     41.25%

Coefficients
Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  0.5365   0.0814   6.59     0.000
SO/IP     -0.2483  0.0718   -3.46    0.003    1.05
HR/IP     1.032    0.436    2.37     0.030    1.05

Regression Equation
R/IP = 0.5365 - 0.2483 SO/IP + 1.032 HR/IP


d. Using the estimated regression equation in part (c) we obtain

R/IP = 0.5365 - 0.2483 SO/IP + 1.032 HR/IP = 0.5365 - 0.2483(.91) + 1.032(.16) = .4757

The predicted value for R/IP was less than the actual value.

e. This suggestion does not make sense. If a pitcher gives up more runs per inning pitched, this pitcher’s earned run average also has to increase. For these data the sample correlation coefficient between ERA and R/IP is .964. The following partial Minitab output shows the results for part (c) using ERA as the dependent variable.

Analysis of Variance
Source      DF  Adj SS  Adj MS  F-Value  P-Value
Regression  2   5.174   2.5870  14.17    0.000
  SO/IP     1   1.905   1.9052  10.44    0.005
  HR/IP     1   2.190   2.1901  12.00    0.003
Error       17  3.103   0.1825
Total       19  8.276

Model Summary
S         R-sq    R-sq(adj)  R-sq(pred)
0.427204  62.51%  58.10%     50.29%

Coefficients
Term      Coef    SE Coef  T-Value  P-Value  VIF
Constant  3.878   0.647    6.00     0.000
SO/IP     -1.843  0.570    -3.23    0.005    1.05
HR/IP     11.99   3.46     3.46     0.003    1.05

Regression Equation
ERA = 3.878 - 1.843 SO/IP + 11.99 HR/IP

11. a. SSE = SST - SSR = 6,724.125 - 6,216.375 = 507.75

b. $R^2 = \frac{SSR}{SST} = \frac{6{,}216.375}{6{,}724.125} = .924$

c. $R_a^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1} = 1 - (1 - .924)\frac{10 - 1}{10 - 2 - 1} = .902$

d. The estimated regression equation provided an excellent fit.
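The three quantities in Exercise 11 (SSE, the coefficient of determination, and its adjusted version) follow mechanically from SST, SSR, n, and p. The sketch below recomputes them; the function name `fit_measures` is illustrative only, and the inputs are the sums of squares quoted in the exercise with n = 10 observations and p = 2 independent variables.

```python
def fit_measures(sst, ssr, n, p):
    """Return (SSE, R^2, adjusted R^2) for n observations and p predictors."""
    sse = sst - ssr
    r2 = ssr / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return sse, r2, r2_adj


# Exercise 11: SST = 6,724.125, SSR = 6,216.375, n = 10, p = 2.
sse, r2, r2_adj = fit_measures(6724.125, 6216.375, n=10, p=2)
print(sse, r2, r2_adj)
```

Carrying R² at full precision gives an adjusted value of about .903; the .902 reported in part (c) results from rounding R² to .924 before substituting it into the formula.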

12. a. $R^2 = \frac{SSR}{SST} = \frac{14{,}052.2}{15{,}182.9} = .926$

b. $R_a^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1} = 1 - (1 - .926)\frac{10 - 1}{10 - 2 - 1} = .905$


c. Yes; after adjusting for the number of independent variables in the model, we see that 90.5% of the variability in y has been accounted for.

13. a. $R^2 = \frac{SSR}{SST} = \frac{1760}{1805} = .975$

b. $R_a^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1} = 1 - (1 - .975)\frac{30 - 1}{30 - 4 - 1} = .971$

c. The estimated regression equation provided an excellent fit.

14. a. $R^2 = \frac{SSR}{SST} = \frac{12{,}000}{16{,}000} = .75$

b. $R_a^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1} = 1 - .25\left(\frac{9}{7}\right) = .68$

c. The adjusted coefficient of determination shows that 68% of the variability has been explained by the two independent variables; thus, we conclude that the model does not explain a large amount of variability.

15. a. $R^2 = \frac{SSR}{SST} = \frac{23.435}{25.5} = .919$

$R_a^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1} = 1 - (1 - .919)\frac{8 - 1}{8 - 2 - 1} = .887$

b. Multiple regression analysis is preferred since both $R^2$ and $R_a^2$ show an increased percentage of the variability of y explained when both independent variables are used.

16. a. $r^2$ = .577. Thus, the average number of passing yards per attempt is able to explain 57.7% of the variability in the percentage of games won. Considering the nature of the data and all the other factors that might be related to the number of games won, this is not too bad a fit.

b. The value of the coefficient of determination increased to $R^2$ = .752, and the adjusted coefficient of determination is $R_a^2$ = .714. Thus, using both independent variables provides a much better fit.

17. a. A portion of the Minitab output from part (d) of exercise 9 follows:

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
2.84872  82.79%  82.60%     82.16%

The value of R-sq = 82.79% and the value of R-sq(adj) = 82.60% indicate that the estimated regression equation provided a very good fit.

b. A portion of the Minitab output from part (b) of exercise 9 follows:


Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
3.43937  74.77%  74.64%     74.13%

The value of R-sq = 74.77% indicates that using just ball speed can account for 74.77% of the variability in total distance. The addition of launch angle increases the percentage to almost 83%. Therefore, the estimated regression equation using both ball speed and launch angle will provide better predictions.

18. a. A portion of the Minitab output follows:

Model Summary
S          R-sq    R-sq(adj)  R-sq(pred)
0.0537850  56.35%  51.21%     41.25%

The Minitab output in part (c) of exercise 10 shows that R-sq = .5635 and R-sq(adj) = .5121.

b. The fit is not great, but considering the nature of the data, being able to explain slightly more than 50% of the variability in the number of runs given up per inning pitched using just two independent variables is not too bad.

c. Partial Minitab output using ERA as the dependent variable follows.

Analysis of Variance
Source      DF  Adj SS  Adj MS  F-Value  P-Value
Regression  2   5.174   2.5870  14.17    0.000
  SO/IP     1   1.905   1.9052  10.44    0.005
  HR/IP     1   2.190   2.1901  12.00    0.003
Error       17  3.103   0.1825
Total       19  8.276

Model Summary
S         R-sq    R-sq(adj)  R-sq(pred)
0.427204  62.51%  58.10%     50.29%

Coefficients
Term      Coef    SE Coef  T-Value  P-Value  VIF
Constant  3.878   0.647    6.00     0.000
SO/IP     -1.843  0.570    -3.23    0.005    1.05
HR/IP     11.99   3.46     3.46     0.003    1.05

Regression Equation
ERA = 3.878 - 1.843 SO/IP + 11.99 HR/IP

The Minitab output shows that R-sq = .6251 and R-sq(adj) = .5810.

Approximately 60% of the variability in the ERA can be explained by the linear effect of HR/IP and SO/IP. This is not too bad considering the complexity of predicting pitching performance.

19. a. MSR = SSR/p = 6,216.375/2 = 3,108.188

$MSE = \frac{SSE}{n - p - 1} = \frac{507.75}{10 - 2 - 1} = 72.536$

b. F = MSR/MSE = 3,108.188/72.536 = 42.85

Using F table (2 degrees of freedom numerator and 7 denominator), p-value is less than .01

Actual p-value = .0001

Because p-value ≤ α = .05, the overall model is significant.

c. t = .5906/.0813 = 7.26

Using t table (7 degrees of freedom), area in tail is less than .005; p-value is less than .01

Actual p-value = .0002

Because p-value ≤ α, β1 is significant.

d. t = .4980/.0567 = 8.78

Using t table (7 degrees of freedom), area in tail is less than .005; p-value is less than .01

Actual p-value = .0001

Because p-value ≤ α, β2 is significant.

20. A portion of the Minitab output follows.

Analysis of Variance
Source      DF  Adj SS  Adj MS   F-Value  P-Value
Regression  2   14052   7026.1   43.50    0.000
  X1        1   10689   10688.7  66.17    0.000
  X2        1   4031    4030.9   24.95    0.002
Error       7   1131    161.5
Total       9   15183

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
12.7096  92.55%  90.42%     87.95%

Coefficients
Term      Coef   SE Coef  T-Value  P-Value  VIF
Constant  -18.4  18.0     -1.02    0.341
X1        2.010  0.247    8.13     0.000    1.00
X2        4.738  0.948    5.00     0.002    1.00

Regression Equation
Y = -18.4 + 2.010 X1 + 4.738 X2

a. Since the p-value corresponding to F = 43.50 is .000 < α = .05, we reject H0: β1 = β2 = 0; there is a significant relationship.

b. Since the p-value corresponding to t = 8.13 is .000 < α = .05, we reject H0: β1 = 0; β1 is significant.

c. Since the p-value corresponding to t = 5.00 is .002 < α = .05, we reject H0: β2 = 0; β2 is significant.

21. a. In the two independent variable case the coefficient of x1 represents the expected change in y corresponding to a one unit increase in x1 when x2 is held constant. In the single independent variable case the coefficient of x1 represents the expected change in y corresponding to a one unit increase in x1.

b. Yes. If x1 and x2 are correlated, one would expect a change in x1 to be accompanied by a change in x2.

22. a. SSE = SST - SSR = 16000 - 12000 = 4000

$s^2 = \frac{SSE}{n - p - 1} = \frac{4000}{7} = 571.43$

$MSR = \frac{SSR}{p} = \frac{12000}{2} = 6000$

b. F = MSR/MSE = 6000/571.43 = 10.50

Using F table (2 degrees of freedom numerator and 7 denominator), p-value is less than .01

Actual p-value = .008

Because p-value ≤ α, we reject H0. There is a significant relationship among the variables.

23. a. F = 28.38

Using F table (2 degrees of freedom numerator and 5 denominator), p-value is less than .01

Actual p-value = .002

Because p-value ≤ α, there is a significant relationship.

b. t = 7.53

Using t table (5 degrees of freedom), area in tail is less than .005; p-value is less than .01

Actual p-value = .001

Because p-value ≤ α, β1 is significant and x1 should not be dropped from the model.

c. t = 4.06

Actual p-value = .010

Because p-value ≤ α, β2 is significant and x2 should not be dropped from the model.
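The overall F tests in Exercises 19, 22, and 23 all have 2 numerator degrees of freedom, and in that special case the upper-tail area of the F distribution has a simple closed form, (1 + 2F/df2)^(-df2/2), so the "actual p-value" can be reproduced without statistical software. The sketch below applies this to Exercise 22. The function name is hypothetical, and the closed form is a standard property of the F distribution rather than something stated in the text; it does not apply when the number of independent variables differs from 2.

```python
def f_test_two_predictors(sst, ssr, n):
    """Overall F test for a model with p = 2 independent variables.

    Returns (MSR, MSE, F, p_value). The p-value uses the closed form of
    the F upper tail that holds only when the numerator df equals 2.
    """
    p = 2
    df_error = n - p - 1
    msr = ssr / p
    mse = (sst - ssr) / df_error
    f = msr / mse
    p_value = (1 + 2 * f / df_error) ** (-df_error / 2)
    return msr, mse, f, p_value


# Exercise 22: SST = 16,000, SSR = 12,000, n = 10.
msr, mse, f, p_value = f_test_two_predictors(16000, 12000, n=10)
print(msr, mse, f, p_value)
```

For Exercise 22 the p-value works out to 4^(-3.5) = 1/128, or about .008, in agreement with the value quoted in part (b).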

24. a. Partial Minitab output follows:

Analysis of Variance
Source          DF  Adj SS  Adj MS  F-Value  P-Value
Regression      2   6179    3089.6  13.18    0.000
  OffPassYds/G  1   6079    6079.5  25.94    0.000
  DefYds/G      1   1713    1712.6  7.31     0.011
Error           29  6797    234.4
Total           31  12976

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
15.3096  47.62%  44.01%     33.45%

Coefficients
Term          Coef     SE Coef  T-Value  P-Value  VIF
Constant      60.5     28.4     2.14     0.041
OffPassYds/G  0.3186   0.0626   5.09     0.000    1.21
DefYds/G      -0.2413  0.0893   -2.70    0.011    1.21

Regression Equation
Win% = 60.5 + 0.3186 OffPassYds/G - 0.2413 DefYds/G

y = 60.5 + 0.3186 OffPassYds/G - 0.2413 DefYds/G

b. Because the p-value for the F test = .000 < α = .05, there is a significant relationship.

c. For OffPassYds/G: Because the p-value = .000 < α = .05, OffPassYds/G is significant.

For DefYds/G: Because the p-value = .011 < α = .05, DefYds/G is significant.

25. a. Partial Minitab output follows.

Analysis of Variance
Source                  DF  Adj SS   Adj MS  F-Value  P-Value
Regression              3   92.352   30.784  15.98    0.000
  Itineraries/Schedule  1   1.398    1.398   0.73     0.407
  Shore Excursions      1   61.261   61.261  31.81    0.000
  Food/Dining           1   30.539   30.539  15.86    0.001
Error                   16  30.813   1.926
Total                   19  123.166

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
1.38775  74.98%  70.29%     58.09%

Coefficients
Term                  Coef    SE Coef  T-Value  P-Value  VIF
Constant              35.6    13.2     2.69     0.016
Itineraries/Schedule  0.110   0.130    0.85     0.407    1.05
Shore Excursions      0.2445  0.0434   5.64     0.000    1.07
Food/Dining           0.2474  0.0621   3.98     0.001    1.01

Regression Equation
Overall = 35.6 + 0.110 Itineraries/Schedule + 0.2445 Shore Excursions + 0.2474 Food/Dining

b. Because the p-value corresponding to F = 15.98, 0.000, is less than .05, the level of significance, overall there is a significant relationship.

c. Because the p-value for Itineraries/Schedule (.407) is greater than the level of significance (.05), Itineraries/Schedule is not significant. Shore Excursions (p-value = .000) and Food/Dining (p-value = .001) are both significant because the p-value for each of these independent variables is less than the level of significance (.05).

d. After removing Itineraries/Schedule from the model, we obtained the following Minitab output.

Analysis of Variance
Source              DF  Adj SS  Adj MS  F-Value  P-Value
Regression          2   90.95   45.477  24.00    0.000
  Shore Excursions  1   69.05   69.053  36.44    0.000
  Food/Dining       1   30.75   30.752  16.23    0.001
Error               17  32.21   1.895
Total               19  123.17

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
1.37650  73.85%  70.77%     62.81%

Coefficients
Term              Coef    SE Coef  T-Value  P-Value  VIF
Constant          45.18   6.95     6.50     0.000
Shore Excursions  0.2529  0.0419   6.04     0.000    1.01
Food/Dining       0.2482  0.0616   4.03     0.001    1.01

Regression Equation
Overall = 45.18 + 0.2529 Shore Excursions + 0.2482 Food/Dining

With Itineraries/Schedule in the model, the R-sq was .7498, while the R-sq after Itineraries/Schedule was removed from the model was .7385. Removing Itineraries/Schedule from the model resulted in almost no loss in the model’s ability to explain variability in the Overall Score.
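The same comparison can be made with the adjusted coefficient of determination, which penalizes the extra independent variable. The sketch below recomputes R-sq(adj) for both models from the R-sq values quoted above; the helper name `adjusted_r2` is illustrative, and n = 20 is inferred from the Total DF of 19 in the Minitab output.

```python
def adjusted_r2(r2, n, p):
    """Adjusted coefficient of determination for n observations, p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


n = 20  # Total DF = 19 in the Minitab output, so n = 20 (inferred)

with_itineraries = adjusted_r2(0.7498, n, p=3)
without_itineraries = adjusted_r2(0.7385, n, p=2)
print(with_itineraries, without_itineraries)
```

Although R-sq falls slightly when Itineraries/Schedule is dropped, the adjusted value rises from about 70.3% to about 70.8%, matching the R-sq(adj) figures in the two Minitab outputs.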


26. The partial Minitab output from part (c) of exercise 10 follows.

Analysis of Variance
Source      DF  Adj SS   Adj MS    F-Value  P-Value
Regression  2   0.06348  0.031738  10.97    0.001
  SO/IP     1   0.03460  0.034603  11.96    0.003
  HR/IP     1   0.01621  0.016214  5.60     0.030
Error       17  0.04918  0.002893
Total       19  0.11266

Model Summary
S          R-sq    R-sq(adj)  R-sq(pred)
0.0537850  56.35%  51.21%     41.25%

Coefficients
Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  0.5365   0.0814   6.59     0.000
SO/IP     -0.2483  0.0718   -3.46    0.003    1.05
HR/IP     1.032    0.436    2.37     0.030    1.05

Regression Equation
R/IP = 0.5365 - 0.2483 SO/IP + 1.032 HR/IP

a. The p-value associated with F = 10.97 is .001. Because the p-value < .05, there is a significant overall relationship.

b. For SO/IP, the p-value associated with t = -3.46 is .003. Because the p-value < .05, SO/IP is significant. For HR/IP, the p-value associated with t = 2.37 is .030. Because the p-value < .05, HR/IP is also significant.

27. a. y = 29.1270 + .5906(180) + .4980(310) = 289.8150

b. The point estimate for an individual value is y = 289.8150, the same as the point estimate of the mean value.

28. Partial Minitab output follows:

Regression Equation
Y = -18.4 + 2.010 X1 + 4.738 X2

Variable  Setting
X1        45
X2        15

Fit      SE Fit   95% CI              95% PI
143.157  4.64909  (132.164, 154.151)  (111.156, 175.158)
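Both intervals in this output can be reconstructed from Fit and SE Fit together with the standard error of the estimate. The sketch below assumes the usual formulas (confidence interval: Fit ± t·SE Fit; prediction interval: Fit ± t·sqrt(S² + SE Fit²)), takes S = 12.7096 from the Model Summary for the same model in Exercise 2(c), and uses the t table value 2.365 for 7 degrees of freedom. The variable names are illustrative only.

```python
import math

T_025_7DF = 2.365  # t table value, .025 upper-tail area, 7 degrees of freedom

fit, se_fit = 143.157, 4.64909  # Fit and SE Fit from the output above
s = 12.7096                     # S from the Model Summary in Exercise 2(c)

# Margin for the mean value of y (confidence interval).
ci_margin = T_025_7DF * se_fit
# Margin for an individual value of y (prediction interval).
pi_margin = T_025_7DF * math.sqrt(s**2 + se_fit**2)

ci_95 = (fit - ci_margin, fit + ci_margin)
pi_95 = (fit - pi_margin, fit + pi_margin)
print(ci_95, pi_95)
```

The results agree with Minitab to about two decimal places; the small discrepancy comes from using the three-decimal table value of t. The prediction interval is wider because it adds the variance of an individual observation to the variance of the fitted mean.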

a. The 95% confidence interval is 132.164 to 154.151.

b. The 95% prediction interval is 111.156 to 175.158.

29. a. y = 83.2 + 2.29(3.5) + 1.30(1.8) = 93.555 or $93,555

Note: In Exercise 5b, the Minitab output also shows that b0 = 83.23, b1 = 2.290, and b2 = 1.301; hence, y = 83.23 + 2.290x1 + 1.301x2. Using this estimated regression equation, we obtain

y = 83.23 + 2.290(3.5) + 1.301(1.8) = 93.5868 or $93,586.80

The difference ($93,586.80 - $93,555 = $31.80) is simply due to the fact that additional significant digits are used in the computations. From a practical point of view, however, the difference is not enough to be concerned about. In practice, a computer software package is always used to perform the computations and this will not be an issue.
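The size of the rounding effect described in the note is easy to reproduce. The following sketch evaluates the estimated regression equation twice, once with the coefficients as rounded in part (a) and once with the coefficients as printed by Minitab; the variable names are illustrative.

```python
x1, x2 = 3.5, 1.8  # television and newspaper advertising, in $1000s

# Coefficients as rounded in part (a).
rounded = 83.2 + 2.29 * x1 + 1.30 * x2
# Coefficients as printed in the Minitab output for Exercise 5b.
minitab = 83.23 + 2.290 * x1 + 1.301 * x2

# Revenue is measured in $1000s, so scale the gap to dollars.
difference_dollars = (minitab - rounded) * 1000
print(rounded, minitab, difference_dollars)
```

Almost all of the $31.80 gap comes from the intercept (83.2 versus 83.23, worth $30); the remaining $1.80 comes from rounding b2 from 1.301 to 1.30.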


b. Partial Minitab output follows:

Regression Equation
Weekly Gross_Revenue_($1000s) = 83.23 + 2.290 Televison_Advertising_($1000s) + 1.301 Newspaper_Advertising_($1000s)

Variable                        Setting
Televison_Advertising_($1000s)  3.5
Newspaper_Advertising_($1000s)  1.8

Fit      SE Fit    95% CI              95% PI
93.5875  0.290886  (92.8398, 94.3353)  (91.7743, 95.4007)

Confidence interval estimate: 92.8398 to 94.3353 or $92,839.80 to $94,335.30

c. From the partial Minitab output provided in part (b)

Prediction interval estimate: 91.7743 to 95.4007 or $91,774.30 to $95,400.70

30. a. Partial Minitab output follows:

Analysis of Variance
Source          DF  Seq SS  Contribution  Adj SS  Adj MS  F-Value  P-Value
Regression      2   6179    47.62%        6179    3089.6  13.18    0.000
  OffPassYds/G  1   4466    34.42%        6079    6079.5  25.94    0.000
  DefYds/G      1   1713    13.20%        1713    1712.6  7.31     0.011
Error           29  6797    52.38%        6797    234.4
Total           31  12976   100.00%

Model Summary
S        R-sq    R-sq(adj)  PRESS    R-sq(pred)
15.3096  47.62%  44.01%     8636.13  33.45%

Coefficients
Term          Coef     SE Coef  95% CI              T-Value  P-Value  VIF
Constant      60.5     28.4     (2.5, 118.5)        2.14     0.041
OffPassYds/G  0.3186   0.0626   (0.1907, 0.4466)    5.09     0.000    1.21
DefYds/G      -0.2413  0.0893   (-0.4239, -0.0587)  -2.70    0.011    1.21

Regression Equation
Win% = 60.5 + 0.3186 OffPassYds/G - 0.2413 DefYds/G

The estimated regression equation is y = 60.5 + 0.3186 OffPassYds/G - 0.2413 DefYds/G

For OffPassYds/G = 225 and DefYds/G = 300, the predicted value of the percentage of games won is y = 60.5 + 0.3186(225) - 0.2413(300) = 59.795

b. Partial Minitab output follows

Regression Equation
Win% = 60.5 + 0.3186 OffPassYds/G - 0.2413 DefYds/G

Variable      Setting
OffPassYds/G  225
DefYds/G      300

Fit      SE Fit   95% CI              95% PI
59.8270  4.88676  (49.8325, 69.8216)  (26.9589, 92.6952)

The 95% prediction interval is 26.9589 to 92.6952.

31. a. Partial Minitab output follows.

Analysis of Variance
Source                DF  Seq SS  Contribution  Adj SS  Adj MS  F-Value  P-Value
Regression            2   3.9954  68.27%        3.9954  1.9977  11.84    0.002
  Trade Price         1   0.2442  4.17%         0.9665  0.9665  5.73     0.036
  Speed of Execution  1   3.7512  64.10%        3.7512  3.7512  22.22    0.001
Error                 11  1.8567  31.73%        1.8567  0.1688
Total                 13  5.8521  100.00%

Model Summary
S         R-sq    R-sq(adj)  PRESS    R-sq(pred)
0.410845  68.27%  62.50%     3.45220  41.01%

Coefficients
Term                Coef    SE Coef  95% CI           T-Value  P-Value  VIF
Constant            -0.783  0.942    (-2.857, 1.290)  -0.83    0.423
Trade Price         0.558   0.233    (0.045, 1.071)   2.39     0.036    1.07
Speed of Execution  0.734   0.156    (0.391, 1.077)   4.71     0.001    1.07

Regression Equation
Satisfaction Electronic Trades = -0.783 + 0.558 Trade Price + 0.734 Speed of Execution

y = -0.783 + 0.558 Trade Price + 0.734 Speed of Execution

b. Satisfaction Electronic Trades = -0.783 + 0.558(3) + 0.734(3) = 3.093

c./d. A portion of the Minitab output follows.

Regression Equation
Satisfaction Electronic Trades = -0.783 + 0.558 Trade Price + 0.734 Speed of Execution

Variable            Setting
Trade Price         3
Speed of Execution  3

Fit      SE Fit    95% CI              95% PI
3.09292  0.111486  (2.84754, 3.33830)  (2.15596, 4.02989)

For part (c) the 95% confidence interval is 2.84754 to 3.33830.

For part (d) the 95% prediction interval is 2.15596 to 4.02989; but, because the highest possible rating is 4, the upper end of the prediction interval is treated as 4.

32. a. E(y) = β0 + β1x1 + β2x2 where x2 = 0 if level 1 and 1 if level 2

b. E(y) = β0 + β1x1 + β2(0) = β0 + β1x1

c. E(y) = β0 + β1x1 + β2(1) = β0 + β1x1 + β2

d. β2 = E(y | level 2) - E(y | level 1)

β1 is the change in E(y) for a 1 unit change in x1 holding x2 constant.
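The role of the dummy variable in Exercise 32 can be seen numerically: switching from level 1 to level 2 shifts E(y) by the dummy's coefficient at every value of x1, while the slope on x1 is unchanged. The parameter values in this sketch are hypothetical, chosen only to illustrate the algebra; they do not come from the exercise.

```python
# Hypothetical parameter values, used only to illustrate the algebra.
BETA0, BETA1, BETA2 = 10.0, 2.0, 5.0


def expected_y(x1, level):
    """E(y) = beta0 + beta1*x1 + beta2*x2, with x2 = 0 for level 1
    and x2 = 1 for level 2."""
    x2 = 0 if level == 1 else 1
    return BETA0 + BETA1 * x1 + BETA2 * x2


x1 = 3.0
# Difference between the two levels at the same x1: equals beta2.
gap = expected_y(x1, level=2) - expected_y(x1, level=1)
# Effect of a 1 unit change in x1 with the level held fixed: equals beta1.
slope = expected_y(x1 + 1, level=1) - expected_y(x1, level=1)
print(gap, slope)
```

With a qualitative variable that has k levels, the same idea extends to k - 1 dummy variables, each measuring the shift of its level relative to the baseline level.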

33. a. two

b. E(y) = β0 + β1x1 + β2x2 + β3x3 where

x2  x3  Level
0   0   1
1   0   2
0   1   3

c. E(y | level 1) = β0 + β1x1 + β2(0) + β3(0) = β0 + β1x1

E(y | level 2) = β0 + β1x1 + β2(1) + β3(0) = β0 + β1x1 + β2

E(y | level 3) = β0 + β1x1 + β2(0) + β3(1) = β0 + β1x1 + β3

β2 = E(y | level 2) - E(y | level 1)

β3 = E(y | level 3) - E(y | level 1)

β1 is the change in E(y) for a 1 unit change in x1 holding x2 and x3 constant.

34. a. $15,300

b. Estimate of sales = 10.1 - 4.2(2) + 6.8(8) + 15.3(0) = 56.1 or $56,100

c. Estimate of sales = 10.1 - 4.2(1) + 6.8(3) + 15.3(1) = 41.6 or $41,600

35. a. Let Type = 0 if a mechanical repair
Type = 1 if an electrical repair

Partial Minitab output follows:

Analysis of Variance
Source      DF  Seq SS   Contribution  Adj SS  Adj MS  F-Value  P-Value
Regression  1   0.9127   8.71%         0.9127  0.9127  0.76     0.408
  Type      1   0.9127   8.71%         0.9127  0.9127  0.76     0.408
Error       8   9.5633   91.29%        9.5633  1.1954
Total       9   10.4760  100.00%

Model Summary
S        R-sq   R-sq(adj)  PRESS    R-sq(pred)
1.09335  8.71%  0.00%      15.5648  0.00%

Coefficients
Term      Coef   SE Coef  95% CI           T-Value  P-Value  VIF
Constant  3.450  0.547    (2.189, 4.711)   6.31     0.000
Type      0.617  0.706    (-1.011, 2.244)  0.87     0.408    1.00

Regression Equation
Repair Time_(hours) = 3.450 + 0.617 Type

b. The estimated regression equation did not provide a good fit. In fact, the p-value of .408 shows that the relationship is not significant for any reasonable value of α.

c. Let Person = 0 if Bob Jones performed the service
Person = 1 if Dave Newton performed the service

Partial Minitab output follows:

Analysis of Variance Source DF Seq SS Contribution Adj SS Adj MS F-Value P-Value Regression 1 6.400 61.09% 6.400 6.4000 12.56 0.008 Person 1 6.400 61.09% 6.400 6.4000 12.56 0.008 Error 8 4.076 38.91% 4.076 0.5095 Total 9 10.476 100.00% Model Summary S R-sq R-sq(adj) PRESS R-sq(pred) 0.713793 61.09% 56.23% 6.36875 39.21% Coefficients Term Coef SE Coef 95% CI T-Value P-Value VIF Constant 4.620 0.319 ( 3.884, 5.356) 14.47 0.000 Person -1.600 0.451 (-2.641, -0.559) -3.54 0.008 1.00 Regression Equation Repair Time_(hours) = 4.620 - 1.600 Person 9 10.4760

d. We see that 61.09% of the variability in repair time has been explained by the repair person who performed the service; an acceptable, but not good, fit.

36. a. The Minitab output follows:

Analysis of Variance Source DF Seq SS Contribution Adj SS Adj MS F-Value P-Value Regression 3 9.4305 90.02% 9.43049 3.14350 18.04 0.002 Months 1 5.5960 53.42% 2.11783 2.11783 12.15 0.013 Type 1 3.4049 32.50% 2.30138 2.30138 13.21 0.011 Person 1 0.4296 4.10% 0.42957 0.42957 2.47 0.167 Error 6 1.0455 9.98% 1.04551 0.17425 Lack-of-Fit 5 1.0455 9.98% 1.04551 0.20910 * * Pure Error 1 0.0000 0.00% 0.00000 0.00000 Total 9 10.4760 100.00% Model Summary S R-sq R-sq(adj) PRESS R-sq(pred) 0.417434 90.02% 85.03% 3.38309 67.71%

Coefficients Term Coef SE Coef 95% CI T-Value P-Value VIF Constant 1.860 0.729 ( 0.077, 3.643) 2.55 0.043 Months 0.2914 0.0836 (0.0869, 0.4960) 3.49 0.013 2.43 Type 1.102 0.303 ( 0.360, 1.845) 3.63 0.011 1.27 Person -0.609 0.388 (-1.558, 0.340) -1.57 0.167 2.16 Regression Equation Repair Time_(hours) = 1.860 + 0.2914 Months + 1.102 Type - 0.609 Person

b. Since the p-value corresponding to F = 18.04 is .002 < α = .05, the overall model is statistically significant.
c. The p-value corresponding to t = -1.57 is .167 > α = .05; thus, the addition of Person is not statistically significant. Person is highly correlated with Months (the sample correlation coefficient is -.691); thus, once the effect of Months has been accounted for, Person will not add much to the model.

37. a. A portion of the Minitab output follows:

Analysis of Variance Source DF Seq SS Contribution Adj SS Adj MS F-Value P-Value Regression 1 91.29 34.42% 91.29 91.290 9.97 0.005 Price ($) 1 91.29 34.42% 91.29 91.290 9.97 0.005 Error 19 173.95 65.58% 173.95 9.155 Lack-of-Fit 10 96.45 36.36% 96.45 9.645 1.12 0.437 Pure Error 9 77.50 29.22% 77.50 8.611 Total 20 265.24 100.00% Model Summary S R-sq R-sq(adj) PRESS R-sq(pred) 3.02575 34.42% 30.97% 214.718 19.05% Coefficients Term Coef SE Coef 95% CI T-Value P-Value VIF Constant 69.28 3.40 (62.16, 76.39) 20.37 0.000 Price ($) 0.559 0.177 (0.188, 0.929) 3.16 0.005 1.00 Regression Equation Score = 69.28 + 0.559 Price ($)

b. Because the p-value = .005 < α = .05, there is a significant relationship. c. Let Type_Italian = 1 if the restaurant is an Italian restaurant; 0 otherwise

d. A portion of the Minitab output follows:

Analysis of Variance Source DF Seq SS Contribution Adj SS Adj MS F-Value P-Value Regression 2 139.577 52.62% 139.577 69.7886 10.00 0.001 Price ($) 1 91.290 34.42% 96.075 96.0754 13.76 0.002 Type_Italian 1 48.287 18.21% 48.287 48.2869 6.92 0.017 Error 18 125.661 47.38% 125.661 6.9812 Lack-of-Fit 13 122.161 46.06% 122.161 9.3970 13.42 0.005 Pure Error 5 3.500 1.32% 3.500 0.7000 Total 20 265.238 100.00% Model Summary S R-sq R-sq(adj) PRESS R-sq(pred) 2.64219 52.62% 47.36% 187.424 29.34% Coefficients Term Coef SE Coef 95% CI T-Value P-Value VIF Constant 67.40 3.05 (60.99, 73.82) 22.07 0.000 Price ($) 0.573 0.155 (0.249, 0.898) 3.71 0.002 1.00 Type_Italian 3.04 1.16 ( 0.61, 5.47) 2.63 0.017 1.00 Regression Equation Score = 67.40 + 0.573 Price ($) + 3.04 Type_Italian

e. For the Type_Italian dummy variable, the p-value = .017 < α = .05; thus, type of restaurant is a significant factor in overall customer satisfaction.
f. The estimated regression equation computed in part (d) is ŷ = 67.4 + .573(Price) + 3.04(Type_Italian).
For a seafood/steakhouse Type_Italian = 0 and the estimated score is ŷ = 67.4 + .573(20) + 3.04(0) = 78.86
For an Italian restaurant Type_Italian = 1 and the estimated score is ŷ = 67.4 + .573(20) + 3.04(1) = 81.90
Thus, the satisfaction score increases by 3.04 points.

38. a. Partial Minitab output follows:

Analysis of Variance Source DF Seq SS Contribution Adj SS Adj MS F-Value P-Value Regression 3 3660.7 87.35% 3660.7 1220.25 36.82 0.000 Age 1 1772.0 42.28% 1394.8 1394.84 42.09 0.000 Pressure 1 1607.7 38.36% 1027.4 1027.35 31.00 0.000 Smokers 1 281.1 6.71% 281.1 281.10 8.48 0.010 Error 16 530.2 12.65% 530.2 33.14 Total 19 4190.9 100.00%

Model Summary S R-sq R-sq(adj) PRESS R-sq(pred) 5.75657 87.35% 84.98% 799.476 80.92% Coefficients Term Coef SE Coef 95% CI T-Value P-Value VIF Constant -91.8 15.2 (-124.0, -59.5) -6.03 0.000 Age 1.077 0.166 ( 0.725, 1.429) 6.49 0.000 1.46 Pressure 0.2518 0.0452 (0.1559, 0.3477) 5.57 0.000 1.25 Smokers 8.74 3.00 ( 2.38, 15.10) 2.91 0.010 1.36 Regression Equation Risk = -91.8 + 1.077 Age + 0.2518 Pressure + 8.74 Smokers

b. Since the p-value corresponding to t = 2.91 is .010 < α = .05, smoking is a significant factor.
c. Partial Minitab output follows:

Regression Equation Risk = -91.8 + 1.077 Age + 0.2518 Pressure + 8.74 Smokers Variable Setting Age 68 Pressure 175 Smokers 1 Fit SE Fit 95% CI 95% PI 34.2661 1.99785 (30.0309, 38.5014) (21.3487, 47.1836)

The point estimate is 34.2661; the 95% prediction interval is 21.3487 to 47.1836. Thus, the

probability of a stroke (.213487 to .471836 at the 95% confidence level) appears to be quite high. The physician would probably recommend that Art quit smoking and begin some type of treatment designed to reduce his blood pressure.
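The point estimate in part (c) can be approximated by substituting Art's values into the printed regression equation. The sketch below is only an illustration: it uses the rounded coefficients shown in the output, so it lands slightly below Minitab's 34.2661, and the function and dictionary names are hypothetical.

```python
# Rounded coefficients copied from the regression equation in part (a).
COEF = {"const": -91.8, "age": 1.077, "pressure": 0.2518, "smoker": 8.74}


def predicted_risk(age, pressure, smoker):
    """Point estimate of Risk (in percent) for one patient.

    smoker is the dummy variable: 1 for a smoker, 0 for a nonsmoker.
    """
    return (COEF["const"]
            + COEF["age"] * age
            + COEF["pressure"] * pressure
            + COEF["smoker"] * smoker)


fit = predicted_risk(age=68, pressure=175, smoker=1)
fit_nonsmoker = predicted_risk(age=68, pressure=175, smoker=0)

print(f"estimated risk for a smoker:    {fit:.2f}")
print(f"estimated risk for a nonsmoker: {fit_nonsmoker:.2f}")
print(f"difference due to smoking:      {fit - fit_nonsmoker:.2f}")
```

The difference between the two estimates equals the Smokers coefficient, which is how a dummy variable shifts the fitted value while the other predictors are held fixed.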


39. a. Partial Minitab output follows:

Analysis of Variance Source DF Seq SS Contribution Adj SS Adj MS F-Value P-Value Regression 1 67.60 84.50% 67.60 67.600 16.35 0.027 x 1 67.60 84.50% 67.60 67.600 16.35 0.027 Error 3 12.40 15.50% 12.40 4.133 Total 4 80.00 100.00% Model Summary S R-sq R-sq(adj) PRESS R-sq(pred) 2.03306 84.50% 79.33% 23.8635 70.17%

Coefficients Term Coef SE Coef 95% CI T-Value P-Value VIF Constant 0.20 2.13 (-6.59, 6.99) 0.09 0.931 x 2.600 0.643 (0.554, 4.646) 4.04 0.027 1.00 Regression Equation y = 0.20 + 2.600 x

b. Using Minitab we obtained the following values:

xi    yi    ŷi      Standardized Residual
1     3     2.8      .16
2     7     5.4      .94
3     5     8.0    -1.65
4     11    10.6     .24
5     14    13.2     .62

The point (3, 5) does not appear to follow the trend of the remaining data; however, the value of the standardized residual for this point, -1.65, is not large enough for us to conclude that (3, 5) is an outlier.

A plot of the standardized residuals versus ŷ also shows that the largest standardized residual corresponds with the point for which ŷ = 8, which is the point (3, 5). Again, the value of the standardized residual for this point, -1.65, is not large enough for us to conclude that (3, 5) is an outlier.
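For readers without Minitab, the fitted values and standardized residuals in part (b) can be checked directly from the five data points. The following is a minimal sketch using NumPy; it assumes the usual definition of a standardized residual, the residual divided by s·sqrt(1 - h), where h is the leverage of the observation, and the variable names are illustrative.

```python
import numpy as np

# The five observations from the table in part (b).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 7.0, 5.0, 11.0, 14.0])

X = np.column_stack([np.ones_like(x), x])       # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # least squares b0 and b1
fitted = X @ beta
resid = y - fitted

n, k = X.shape                                  # k = p + 1 estimated parameters
s = np.sqrt(resid @ resid / (n - k))            # standard error of the estimate
leverage = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
std_resid = resid / (s * np.sqrt(1.0 - leverage))

for xi, yi, fi, ri in zip(x, y, fitted, std_resid):
    print(f"{xi:4.0f} {yi:4.0f} {fi:6.1f} {ri:7.2f}")
```

Under these assumptions the third observation has the largest standardized residual in absolute value, which agrees with the table.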

c. Using Minitab, we obtained the following values:

xi    yi    Studentized Deleted Residual
1     3       .13
2     7       .91
3     5     -4.42
4     11      .19
5     14      .54

t.025 = 4.303 (n - p - 2 = 5 - 1 - 2 = 2 degrees of freedom)
Since the studentized deleted residual for (3, 5) is -4.42 < -4.303, we conclude that the third observation is an outlier.

40. a. Partial Minitab output follows:

Analysis of Variance Source DF Seq SS Contribution Adj SS Adj MS F-Value P-Value Regression 1 1934.42 98.76% 1934.42 1934.42 238.03 0.001 x 1 1934.42 98.76% 1934.42 1934.42 238.03 0.001 Error 3 24.38 1.24% 24.38 8.13 Total 4 1958.80 100.00% Model Summary S R-sq R-sq(adj) PRESS R-sq(pred) 2.85073 98.76% 98.34% 243.374 87.58% Coefficients Term Coef SE Coef 95% CI T-Value P-Value VIF Constant -53.28 5.79 (-71.69, -34.87) -9.21 0.003 x 3.110 0.202 ( 2.468, 3.752) 15.43 0.001 1.00 Regression Equation y = -53.28 + 3.110 x

b. Using Minitab we obtained the following values:

xi    yi    Studentized Deleted Residual
22    12    -1.94
24    21     -.12
26    31     1.79
28    35      .40
40    70    -1.90

t.025 = 4.303 (n - p - 2 = 5 - 1 - 2 = 2 degrees of freedom)

Since none of the studentized deleted residuals are less than -4.303 or greater than 4.303, none of the observations can be classified as an outlier.
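The diagnostics in this exercise can be reproduced from the five data points without refitting the model once per deleted observation. The sketch below assumes the standard closed-form expressions for the studentized deleted residual and for Cook's distance; the same leverage values and distances are the ones discussed in parts (c) and (d). Variable names are illustrative.

```python
import numpy as np

x = np.array([22.0, 24.0, 26.0, 28.0, 40.0])
y = np.array([12.0, 21.0, 31.0, 35.0, 70.0])

X = np.column_stack([np.ones_like(x), x])       # intercept plus one predictor
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

n, k = X.shape                                  # k = p + 1 = 2 parameters
sse = resid @ resid
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverage of each observation

# Studentized deleted residual with n - p - 2 = n - k - 1 degrees of freedom.
t_deleted = resid * np.sqrt((n - k - 1) / (sse * (1.0 - h) - resid**2))

# Cook's distance, using s^2 = SSE / (n - k).
cooks_d = resid**2 * h / (k * (sse / (n - k)) * (1.0 - h) ** 2)

for row in zip(x, y, t_deleted, h, cooks_d):
    print("{:4.0f} {:4.0f} {:7.2f} {:6.2f} {:7.2f}".format(*row))
```

The last observation combines a leverage close to 1 with a moderate residual, which is why its Cook's distance is so much larger than the others even though its deleted residual is unremarkable.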

c. Using Minitab we obtained the following values:

xi    yi    hi
22    12    .38
24    21    .28
26    31    .22
28    35    .20
40    70    .92

The critical value is 3(p + 1)/n = 3(1 + 1)/5 = 1.2

Since none of the values exceed 1.2, we conclude that there are no influential observations in the data.

d. Using Minitab we obtained the following values:

xi    yi    Di
22    12      .60
24    21      .00
26    31      .26
28    35      .03
40    70    11.09

Since D5 = 11.09 > 1 (rule of thumb critical value), we conclude that the fifth observation is

influential. 41. a. The Minitab output appears in the solution to part (b) of Exercise 5; the estimated regression

equation is:

Revenue = 83.2 + 2.29 TVAdv + 1.301 NewsAdv b. Using Minitab we obtained the following values:

ŷi       Standardized Residual
96.63    -1.62
90.41    -1.08
94.34     1.22
92.21     -.37
94.39     1.10
94.24     -.40
94.42    -1.12
93.35     1.08

With the relatively few observations, it is difficult to determine if any of the assumptions regarding

the error term have been violated. For instance, an argument could be made that there does not appear to be any pattern in the plot; alternatively an argument could be made that there is a curvilinear pattern in the plot.

c. The values of the standardized residuals are greater than -2 and less than +2; thus, using this test, there are no outliers. As a further check for outliers, we used Minitab to compute the following studentized deleted residuals:

Observation    Studentized Deleted Residual
1              -2.11
2              -1.10
3               1.31
4               -.33
5               1.13
6               -.36
7              -1.16
8               1.10

t.025 = 2.776 (n - p - 2 = 8 - 2 - 2 = 4 degrees of freedom) Since none of the studentized deleted residuals is less than -2.776 or greater than 2.776, we conclude

that there are no outliers in the data. d. Using Minitab we obtained the following values:

Observation    hi     Di
1              .63    1.52
2              .65     .70
3              .30     .22
4              .23     .01
5              .26     .14
6              .14     .01
7              .66     .81
8              .13     .06

The critical value for leverage is 3(p + 1)/n = 3(2 + 1)/8 = 1.125

Since none of the values exceed 1.125, we conclude that there are no influential observations. However, using Cook’s distance measure, we see that D1 > 1 (rule of thumb critical value); thus, we conclude the first observation is influential. Final Conclusion: observation 1 is an influential observation.


42. a. The Minitab output follows:

Analysis of Variance Source DF Seq SS Contribution Adj SS Adj MS F-Value P-Value Regression 2 915.66 91.94% 915.66 457.828 74.12 0.000 Price ($1000s) 1 406.39 40.80% 46.22 46.222 7.48 0.017 Horsepower 1 509.27 51.13% 509.27 509.266 82.45 0.000 Error 13 80.30 8.06% 80.30 6.177 Total 15 995.95 100.00% Model Summary S R-sq R-sq(adj) PRESS R-sq(pred) 2.48532 91.94% 90.70% 142.697 85.67%

Coefficients Term Coef SE Coef 95% CI T-Value P-Value VIF Constant 71.33 2.25 ( 66.47, 76.18) 31.73 0.000 Price ($1000s) 0.1072 0.0392 ( 0.0225, 0.1918) 2.74 0.017 1.30 Horsepower 0.08450 0.00931 (0.06439, 0.10460) 9.08 0.000 1.30 Regression Equation Speed at 1/4 mile (mph) = 71.33 + 0.1072 Price ($1000s) + 0.08450 Horsepower r Fits and Diagnostics for All Observations

Speed at 1/4 mile Obs (mph) Fit SE Fit 95% CI Resid Std Resid Del Resid HI 1 90.70 90.49 0.89 ( 88.56, 92.41) 0.21 0.09 0.09 0.128399 2 108.00 105.88 2.01 (101.55, 110.22) 2.12 1.45 1.52 0.652221 3 93.20 91.68 0.88 ( 89.78, 93.59) 1.52 0.65 0.64 0.126054 4 103.20 99.76 1.14 ( 97.30, 102.23) 3.44 1.56 1.66 0.210277 5 102.10 105.85 0.94 (103.81, 107.89) -3.75 -1.63 -1.76 0.144325 6 116.20 116.83 1.70 (113.16, 120.50) -0.63 -0.35 -0.33 0.467435 7 91.70 92.83 0.89 ( 90.90, 94.76) -1.13 -0.49 -0.47 0.129294 8 89.70 90.63 0.87 ( 88.75, 92.51) -0.93 -0.40 -0.39 0.122422 9 93.00 94.32 0.78 ( 92.63, 96.00) -1.32 -0.56 -0.54 0.098752 10 92.30 91.54 0.94 ( 89.52, 93.56) 0.76 0.33 0.32 0.141539 11 99.00 103.46 0.80 (101.73, 105.19) -4.46 -1.90 -2.14 0.104138 12 84.60 87.11 1.06 ( 84.81, 89.41) -2.51 -1.12 -1.13 0.183349 13 103.20 100.08 1.05 ( 97.80, 102.35) 3.12 1.39 1.44 0.180096 14 93.20 93.20 0.87 ( 91.31, 95.08) 0.00 0.00 0.00 0.123771 15 105.00 102.76 0.86 (100.91, 104.62) 2.24 0.96 0.96 0.119325 16 97.00 95.68 0.65 ( 94.27, 97.08) 1.32 0.55 0.54 0.068602

Obs Cook’s D DFITS 1 0.00 0.03363 2 1.31 2.07558 X 3 0.02 0.24241 4 0.21 0.85470 5 0.15 -0.72273 6 0.03 -0.31262 7 0.01 -0.18152 8 0.01 -0.14467 9 0.01 -0.17970 10 0.01 0.12868 11 0.14 -0.73025 12 0.09 -0.53557 13 0.14 0.67723 14 0.00 0.00071 15 0.04 0.35229 16 0.01 0.14555 X Unusual X

b. The standardized residual plot follows. There appears to be a very unusual trend in the standardized

residuals.

[Plot: Standardized Residual (vertical axis, -2 to 2) versus Fitted Value (horizontal axis, 90 to 120)]

c. The Minitab output shown in part (a) did not identify any observations with a large standardized residual; thus, there do not appear to be any outliers in the data.
d. The Minitab output shown in part (a) identifies observation 2 as an influential observation.

43. a. The Minitab output follows:

Analysis of Variance Source DF Adj SS Adj MS F-Value P-Value Regression 2 229.676 114.838 969.26 0.000 Greens in Reg. 1 151.431 151.431 1278.11 0.000 Putting Avg. 1 123.274 123.274 1040.46 0.000 Error 131 15.521 0.118 Lack-of-Fit 130 15.483 0.119 3.10 0.429 Pure Error 1 0.038 0.038 Total 133 245.197 Model Summary S R-sq R-sq(adj) R-sq(pred) 0.344209 93.67% 93.57% 93.26% Coefficients Term Coef SE Coef T-Value P-Value VIF Constant 57.148 0.989 57.76 0.000 Greens in Reg. -23.106 0.646 -35.75 0.000 1.04 Putting Avg. 1.0320 0.0320 32.26 0.000 1.04 Regression Equation Scoring Avg. = 57.148 - 23.106 Greens in Reg. + 1.0320 Putting Avg.

Fits and Diagnostics for Unusual Observations Scoring Obs Avg. Fit Resid Std Resid 11 73.8000 72.8612 0.9388 2.74 R 25 72.1250 72.9445 -0.8195 -2.41 R 34 74.7500 73.7888 0.9612 2.81 R 36 75.5000 75.2114 0.2886 0.88 X 56 73.2500 73.8035 -0.5535 -1.67 X 59 74.8330 74.4215 0.4115 1.24 X 62 74.1670 73.4551 0.7119 2.10 R 64 75.7500 76.1522 -0.4022 -1.22 X 102 74.0000 75.1150 -1.1150 -3.33 R 122 73.6250 72.2792 1.3458 4.01 R 129 73.2780 72.5712 0.7068 2.09 R R Large residual X Unusual X

b. The standardized residual plot follows:

The standardized residual plot does not support the assumption about ε. There are several unusual observations and the variance of the residuals appears to be increasing for larger values of ŷ.

c. The Minitab output in part (a) identified seven outliers: observations 11, 25, 34, 62, 102, 122, and 129. Observations 25 and 102 correspond to Charley Hull and P.K. Kongkraphan, respectively; their scoring averages were each much lower than other golfers with a similar percentage of time hitting the green in regulation and average number of putts taken on greens hit in regulation.

Observations 11, 34, 62, 122, and 129 correspond to Ashleigh Simon, Dori Carter, Karin Sjodin, Sophia Popov, and Tiffany Joh, respectively; their scoring averages were each much higher than other golfers with similar percentage of time hitting the green in regulation and average number of putts taken on greens hit in regulation.

d. The Minitab output in part (a) identified four influential observations: observations 36, 56, 59, and 64. Observation 36 corresponds to Garrett Phillips, observation 56 corresponds to Jing Yan, Observation 59 corresponds to Ju Young Park, and observation 64 corresponds to Karlin Beck.

44. a. E(y) = e^(β0 + β1x) / (1 + e^(β0 + β1x))

b. It is an estimate of the probability that a customer that does not have a Simmons credit card will

make a purchase. c. A portion of the Minitab binary logistic regression output follows:

Coefficients Term Coef SE Coef VIF Constant -0.944 0.315 Card 1.025 0.423 1.00 Odds Ratios for Continuous Predictors Odds Ratio 95% CI Card 2.7857 (1.2147, 6.3886) Regression Equation P(1) = exp(Y')/(1 + exp(Y')) Y' = -0.944 + 1.025 Card

Thus, the estimated logit is ĝ(x) = -0.944 + 1.025x

d. For customers that do not have a Simmons credit card (x = 0)

ĝ(0) = -0.945 + 1.025(0) = -0.945

and

ŷ = e^ĝ(0) / (1 + e^ĝ(0)) = e^(-0.945) / (1 + e^(-0.945)) = 0.38868 / (1 + 0.38868) = 0.279

For customers that have a Simmons credit card (x = 1)

ĝ(1) = -0.945 + 1.025(1) = 0.0800

and

ŷ = e^ĝ(1) / (1 + e^ĝ(1)) = e^(0.08) / (1 + e^(0.08)) = 1.0833 / (1 + 1.0833) = 0.52

e. Using the Minitab output shown in part (c), the estimated odds ratio is 2.7857. We can conclude that

the estimated odds of making a purchase for customers who have a Simmons credit card are 2.7857 times greater than the estimated odds of making a purchase for customers that do not have a Simmons credit card.
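The probabilities in part (d) and the odds ratio in part (e) are linked: the odds ratio for a one unit change in a predictor equals e raised to that predictor's coefficient. The sketch below illustrates this with the rounded coefficients printed in the Minitab output (-0.944 and 1.025), so its results differ slightly from Minitab's unrounded values; the function names are hypothetical.

```python
import math

# Rounded coefficients as printed in the Minitab output for part (c).
B0, B1 = -0.944, 1.025


def probability(card):
    """Estimated probability of a purchase; card is 1 with a Simmons card, else 0."""
    logit = B0 + B1 * card
    return math.exp(logit) / (1.0 + math.exp(logit))


def odds(p):
    """Odds in favor of an event that has probability p."""
    return p / (1.0 - p)


p_no_card = probability(0)
p_card = probability(1)
odds_ratio = odds(p_card) / odds(p_no_card)

print(f"P(purchase | no card) = {p_no_card:.3f}")
print(f"P(purchase | card)    = {p_card:.3f}")
print(f"odds ratio            = {odds_ratio:.4f}")
print(f"exp(b1)               = {math.exp(B1):.4f}")
```

The last two printed values agree, which is why Minitab can report the odds ratio directly from the coefficient table.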

45. a. odds = .3148 / (1 - .3148) = .4594

b. odds1 = .5796 / (1 - .5796) = 1.3787

odds0 = .4594 (from part (a))

odds ratio = odds1 / odds0 = 1.3787 / .4594 = 3.00

c. The odds ratio for x2 computed holding annual spending constant at $4000 is also 3.00. This shows

that the odds ratio for x2 is independent of the value of x1.

46. a. E(y) = e^(β0 + β1x) / (1 + e^(β0 + β1x))

b. A portion of the Minitab binary logistic regression output follows:

Coefficients Term Coef SE Coef VIF Constant -2.633 0.799 Balance 0.2202 0.0900 1.00 Odds Ratios for Continuous Predictors Odds Ratio 95% CI Balance 1.2463 (1.0447, 1.4868) Regression Equation P(1) = exp(Y')/(1 + exp(Y')) Y' = -2.633 + 0.2202 Balance

Thus, the estimated logistic regression equation is

E(y) = e^(-2.633 + 0.2202x) / (1 + e^(-2.633 + 0.2202x))

c. A portion of the Minitab binary logistic regression output follows:

Deviance Table Source DF Adj Dev Adj Mean Chi-Square P-Value Regression 1 9.460 9.460 9.46 0.002 Balance 1 9.460 9.460 9.46 0.002 Error 48 51.626 1.076 Total 49 61.086

Significant result: the p-value corresponding to the χ² test statistic is 0.002.

d. For an average monthly balance of $1000, x = 10

E(y) = e^(-2.633 + 0.2202x) / (1 + e^(-2.633 + 0.2202x)) = e^(-2.633 + 0.2202(10)) / (1 + e^(-2.633 + 0.2202(10))) = e^(-0.431) / (1 + e^(-0.431)) = 0.6499 / 1.6499 = 0.3939

Thus, an estimate of the probability that customers with an average monthly balance of $1000 will

sign up for direct payroll deposit is 0.3939. e. Repeating the calculations in part (d) using various values for x, a value of x = 12 or an average

monthly balance of approximately $1200 is required to achieve this level of probability. f. Using the Minitab output shown in part (b), the estimated odds ratio is 1.2463. Because values of x

are measured in hundreds of dollars, the estimated odds of signing up for payroll direct deposit for customers that have an average monthly balance of $600 is 1.2463 times greater than the estimated odds of signing up for payroll direct deposit for customers that have an average monthly balance of $500. Moreover, this interpretation is true for any one hundred dollar increment in the average monthly balance.
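The repeated calculation described in part (e) is easy to tabulate. The sketch below evaluates the estimated logistic regression equation from part (b) over a range of balances; the range of x values scanned is an arbitrary choice made for illustration, and the function name is hypothetical.

```python
import math

# Coefficients from part (b); x is the average monthly balance in $100s.
B0, B1 = -2.633, 0.2202


def p_signup(x):
    """Estimated probability of signing up for direct payroll deposit."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * x)))


for x in range(8, 15):
    print(f"x = {x:2d}  balance = ${100 * x:5d}  probability = {p_signup(x):.4f}")

print(f"odds ratio per $100 = {math.exp(B1):.4f}")
```

With these coefficients the estimated probability is 0.3939 at x = 10 and first exceeds one half at x = 12, and the odds ratio per $100 matches the 1.2463 reported by Minitab.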

47. a. E(y) = e^(β0 + β1x1 + β2x2) / (1 + e^(β0 + β1x1 + β2x2))

b. For a given GPA, it is an estimate of the probability that a student who did not attend the orientation

program will return to Lakeland for the sophomore year. c. A portion of the Minitab binary logistic regression output follows:

Coefficients Term Coef SE Coef VIF Constant -6.89 1.75 GPA 2.539 0.673 1.01 Program 1.561 0.563 1.01 Odds Ratios for Continuous Predictors Odds Ratio 95% CI GPA 12.6644 (3.3872, 47.3515) Program 4.7624 (1.5794, 14.3607) Regression Equation P(1) = exp(Y')/(1 + exp(Y'))

Y' = -6.89 + 2.539 GPA + 1.561 Program

Thus, the estimated logit is ĝ(x1, x2) = -6.89 + 2.539x1 + 1.561x2

d. A portion of the Minitab binary logistic regression output follows:

Deviance Table Source DF Adj Dev Adj Mean Chi-Square P-Value Regression 2 47.869 23.9347 47.87 0.000 GPA 1 20.966 20.9663 20.97 0.000 Program 1 7.862 7.8616 7.86 0.005 Error 97 80.338 0.8282 Total 99 128.207

Significant result: the p-value corresponding to the χ² test statistic is 0.000.

e. From the portion of the Minitab binary logistic regression output shown in the solution to part (d), both variables are significant at α = .01: the p-value for x1 is 0.000 and the p-value for x2 is 0.005.

f. For x1 = 2.5 and x2 = 0

ĝ(2.5, 0) = -6.89 + 2.539(2.5) + 1.561(0) = -0.5425

and

ŷ = e^ĝ(2.5, 0) / (1 + e^ĝ(2.5, 0)) = e^(-0.5425) / (1 + e^(-0.5425)) = 0.5813 / (1 + 0.5813) = 0.3676

For x1 = 2.5 and x2 = 1

ĝ(2.5, 1) = -6.89 + 2.539(2.5) + 1.561(1) = 1.0185

and

ŷ = e^ĝ(2.5, 1) / (1 + e^ĝ(2.5, 1)) = e^(1.0185) / (1 + e^(1.0185)) = 2.769 / (1 + 2.769) = 0.7347

g. From the Minitab output in part (c) we see that the estimated odds ratio is 4.7624 for the orientation

program. This means that the odds of students who attended the orientation program continuing are 4.7624 times greater than for students who did not attend the program.

h. We recommend making the orientation program required. From part (g), we see that the odds of continuing are much higher for students who have attended the orientation program.

48. a. E(y) = e^(β0 + β1x1 + β2x2) / (1 + e^(β0 + β1x1 + β2x2))

b. A portion of the Minitab binary logistic regression output follows:

Coefficients Term Coef SE Coef VIF Constant -39.5 12.5 Wet 3.37 1.26 1.03 Noise 1.816 0.831 1.03 Odds Ratios for Continuous Predictors Odds Ratio 95% CI Wet 29.2095 (2.4521, 347.9413) Noise 6.1489 (1.2059, 31.3540) Regression Equation P(1) = exp(Y')/(1 + exp(Y')) Y' = -39.5 + 3.37 Wet + 1.816 Noise

Thus, the estimated logit is ĝ(x) = -39.5 + 3.37Wet + 1.816Noise

c. For tires that have a Wet performance rating of 8 and a Noise performance rating of 8

ĝ(x) = -39.5 + 3.37Wet + 1.816Noise

ĝ(x) = -39.5 + 3.37(8) + 1.816(8) = 1.988

ŷ = e^(1.988) / (1 + e^(1.988)) = 7.30092 / (1 + 7.30092) = 0.8795

The probability that a customer will probably or definitely purchase a particular tire again with these

performance characteristics is .8795.

d. For tires that have a Wet performance rating of 7 and a Noise performance rating of 7

ĝ(x) = -39.5 + 3.37Wet + 1.816Noise

ĝ(x) = -39.5 + 3.37(7) + 1.816(7) = -3.198

ŷ = e^(-3.198) / (1 + e^(-3.198)) = .04084 / (1 + .04084) = 0.0392

The probability that a customer will probably or definitely purchase a particular tire again with these

performance characteristics is .0392.

e. Wet and Noise performance ratings of 7 are both considered Excellent performance ratings using the Tire Rack performance scale. Nonetheless, the probability that the customer will repurchase a tire with these characteristics is very low. But a one point increase in both ratings increases the probability to .8795. So, achieving the highest possible levels of performance is essential if the manufacturer wants to have the greatest chance of having an existing customer buy their tire again.

49. a. The expected increase in final college grade point average corresponding to a one point increase in high school grade point average is .0235 when SAT mathematics score does not change. Similarly, the expected increase in final college grade point average corresponding to a one point increase in the SAT mathematics score is .00486 when the high school grade point average does not change.

b. ŷ = -1.41 + .0235(84) + .00486(540) = 3.19

50. a. Job satisfaction can be expected to decrease by 8.69 units with a one unit increase in length of

service if the wage rate does not change. A dollar increase in the wage rate is associated with a 13.5 point increase in the job satisfaction score when the length of service does not change.

b. ŷ = 14.4 - 8.69(4) + 13.5(13) = 155.14

51. a. The computer output with the missing values filled in is as follows:

Analysis of Variance Source DF Adj SS Adj MS F-Value P-Value Regression _2 1612 806 71.820 0.000 x1 1 146.366 146.366 13.042 0.004 x2 1 289.047 289.047 25.756 0.000 Error 12 134.67 11.223 Total 14 1746.67 Model Summary S R-sq R-sq(adj) R-sq(pred) 3.35 92.30% 91.02% 85.12% Coefficients Term Coef SE Coef T-Value P-Value VIF Constant 8.103 2.667 3.04 0.010 x1 7.602 2.105 3.61 0.004 1.62 x2 3.111 0.613 5.08 0.000 1.62 Regression Equation y = 8.103 + 7.602 X1 + 3.111 X2

b. F.05 = 3.89
F = 71.82 > F.05; significant relationship
Actual p-value = .000
Because p-value ≤ α = .05, the overall relationship is significant
c. Using t table (12 degrees of freedom), area in tail corresponding to t = 3.61 is less than .005; p-value is less than .01
Actual p-value = .004
Because p-value ≤ α, reject H0: β1 = 0

Using t table (12 degrees of freedom), area in tail corresponding to t = 5.08 is less than .005; p-value is less than .01
Actual p-value = .000
Because p-value ≤ α, reject H0: β2 = 0
d. See computer output.

e. Ra² = 1 - (1 - .9230)(14/12) = .9102
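The adjusted coefficient of determination in part (e) follows the usual formula Ra² = 1 - (1 - R²)(n - 1)/(n - p - 1), with n = 15 observations and p = 2 independent variables here. A small helper, whose name is hypothetical, makes the arithmetic explicit:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Values for this exercise: R-sq = 92.30%, total df = 14 so n = 15, and p = 2.
ra2 = adjusted_r2(0.9230, n=15, p=2)
print(round(ra2, 4))
```

The adjustment penalizes each added independent variable, so Ra² is always at most R² and the gap widens when n is small relative to p.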

52. a. The computer output with the missing values filled in is as follows

Analysis of Variance Source DF Adj SS Adj MS F-Value P-Value Regression 2 1.76209 0.88105 52.29 0.000 X1 1 0.12389 0.12389 7.35 0.030 X2 1 0.34308 0.34308 20.36 0.003 Error 7 0.11794 0.01685 Total 9 1.88003 Model Summary S R-sq R-sq(adj) R-sq(pred) 0.1298 93.73% 91.93% 74.53% Coefficients Term Coef SE Coef T-Value P-Value VIF Constant -1.41 0.4848 -2.91 0.023 x1 0.0235 0.0087 2.71 0.030 1.54 x2 0.00486 0.0011 4.51 0.003 1.54 Regression Equation y = -1.41 + 0.0235 X1 + 0.00486 X2

b. F.05 = 4.74
F = 52.29 > F.05; significant relationship
Actual p-value = .000
Because p-value ≤ α = .05, the overall relationship is significant
c. for β1: p-value = .030; reject H0: β1 = 0
for β2: p-value = .003; reject H0: β2 = 0
d. R² = SSR/SST = .9373
Ra² = 1 - (1 - .9373)(9/7) = .9193
good fit

53. a. The computer output with the missing values filled in is as follows

Analysis of Variance Source DF Adj SS Adj MS F-Value P-Value Regression 2 648.82 324.41 22.79 0.003 X1 1 444.58 444.58 31.23 0.003 X2 1 598.57 598.57 42.05 0.001 Error 5 71.18 14.24 Total 7 720.00 Model Summary S R-sq R-sq(adj) R-sq(pred) 3.773 90.11% 86.16% 69.93% Coefficients Term Coef SE Coef T-Value P-Value VIF Constant 14.41 8.191 1.76 0.139 x1 -8.69 1.555 -5.59 0.003 1.95 x2 13.52 2.085 6.48 0.001 1.95 Regression Equation y = 14.41 – 8.69 X1 + 13.52 X2

b. F.05 = 5.79 F = 22.79 > F.05; significant relationship. Actual p-value = .003 Because p-value ≤ α = .05, the overall relationship is significant.

c. R² = SSR/SST = .9011
Ra² = 1 - (1 - .9011)(7/5) = .8616
good fit
d. for β1: t = -5.59, p-value = .0035; reject H0: β1 = 0
for β2: p-value = .001; reject H0: β2 = 0

54. a. A portion of the Minitab output follows:

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value Regression 1 60.787 60.7866 85.93 0.000 Steering 1 60.787 60.7866 85.93 0.000 Error 16 11.318 0.7074 Lack-of-Fit 11 10.092 0.9174 3.74 0.078 Pure Error 5 1.227 0.2453 Total 17 72.105 Model Summary S R-sq R-sq(adj) R-sq(pred) 0.841071 84.30% 83.32% 81.08% Coefficients Term Coef SE Coef T-Value P-Value VIF Constant -7.52 1.47 -5.13 0.000 Steering 1.815 0.196 9.27 0.000 1.00 Regression Equation Buy Again = -7.52 + 1.815 Steering

Because the p-value = .000 < α = .05, there is a significant relationship. b. The estimated regression equation provided a good fit; 84.3 % of the variability in the Buy Again

rating was explained by the linear effect of the Steering rating. c. A portion of the Minitab output follows:

Analysis of Variance Source DF Adj SS Adj MS F-Value P-Value Regression 2 67.185 33.5924 102.41 0.000 Steering 1 1.888 1.8880 5.76 0.030 Tread Wear 1 6.398 6.3982 19.51 0.001 Error 15 4.920 0.3280 Total 17 72.105 Model Summary S R-sq R-sq(adj) R-sq(pred) 0.572723 93.18% 92.27% 90.18% Coefficients Term Coef SE Coef T-Value P-Value VIF Constant -5.39 1.11 -4.86 0.000 Steering 0.690 0.288 2.40 0.030 4.65 Tread Wear 0.911 0.206 4.42 0.001 4.65 Regression Equation Buy Again = -5.39 + 0.690 Steering + 0.911 Tread Wear

d. For the Tread Wear independent variable, the p-value = .001 < α = .05; thus, the addition of Tread Wear is significant.

55. a. A portion of the Regression tool output follows.

Regression Statistics

Multiple R 0.8013

R Square 0.6421

Adjusted R Square 0.6409

Standard Error 3.4123

Observations 309

ANOVA

df SS MS F Significance F

Regression 1 6413.2883 6413.2883 550.8029 1.79552E-70

Residual 307 3574.5628 11.6435

Total 308 9987.8511

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept 41.0534 0.5166 79.4748 8.1E-207 40.0370 42.0699

Displacement -3.7232 0.1586 -23.4692 1.8E-70 -4.0354 -3.4110

Because the p-value corresponding to F = 550.8029 is .0000 < α = .05, there is a significant relationship.

b. A portion of the Excel Regression tool output follows.


Multiple R 0.8276

R Square 0.6849



Observations 309

ANOVA


Regression 2 6841.0876 3420.5438 332.6232 1.79466E-77

Residual 306 3146.7635 10.2835

Total 308 9987.8511


Intercept 40.5946 0.4906 82.7379 1.8E-211 39.6291 41.5600

Displacement -3.1944 0.1701 -18.7745 7.43E-53 -3.5292 -2.8596

FuelPremium -2.7230 0.4222 -6.4498 4.37E-10 -3.5537 -1.8922

c. For FuelPremium, the p-value corresponding to t = -6.4498 is .000 < α = .05; significant. The addition of the dummy variable is significant.

d. A portion of the Excel Regression tool output follows.


Multiple R 0.8554

R Square 0.7317



Observations 309

ANOVA


Regression 4 7308.5436 1827.1359 207.3108 1.54798E-85

Residual 304 2679.3075 8.8135

Total 308 9987.8511


Intercept 37.9626 0.7892 48.1055 3.5E-144 36.4097 39.5155

Displacement -3.2418 0.1941 -16.7007 6.97E-45 -3.6238 -2.8599

FuelPremium -2.1352 0.4519 -4.7253 3.52E-06 -3.0243 -1.2460

FrontWheel 3.0747 0.5394 5.7005 2.83E-08 2.0133 4.1360

RearWheel 3.3114 0.5413 6.1174 2.92E-09 2.2462 4.3765

e. Since the p-value corresponding to F = 207.3108 is .0000 < α = .05, there is a significant overall relationship. Because the p-values for each independent variable are also < α = .05, each of the independent variables is significant.

56. a. Type of Fund is a categorical variable with three levels. Let FundDE = 1 for a domestic equity fund and FundIE = 1 for an international fund. The Excel output follows:


Multiple R 0.7838

R Square 0.6144



Observations 45

ANOVA

Df SS MS F Significance F

Regression 2 2096.8489 1048.4245 33.4584 2.03818E-09

Residual 42 1316.0771 31.3352

Total 44 3412.9260


Intercept 4.9090 1.7702 2.7732 0.0082 1.3366 8.4814

FundDE 10.4658 2.0722 5.0505 9.033E-06 6.2839 14.6477

FundIE 21.6823 2.6553 8.1658 3.288E-10 16.3237 27.0408

ŷ = 4.9090 + 10.4658 FundDE + 21.6823 FundIE

Since the p-value corresponding to F = 33.4584 is .0000 < α = .05, there is a significant relationship.
b. R Square = .6144. A reasonably good fit using only Type of Fund.
c. The Excel output follows:

Regression Statistics Multiple R 0.8135 R Square 0.6617 Adjusted R Square 0.6279 Standard Error 5.3726

Observations 45

ANOVA

Df SS MS F Significance F Regression 4 2258.3432 564.5858 19.5598 5.48647E-09 Residual 40 1154.5827 28.8646

Total 44 3412.9260

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 1.1899 2.3781 0.5004 0.6196 -3.6164 5.9961
FundDE 6.8969 2.7651 2.4942 0.0169 1.3083 12.4854
FundIE 17.6800 3.3161 5.3315 4.096E-06 10.9778 24.3821
Net Asset Value ($) 0.0265 0.0670 0.3950 0.6950 -0.1089 0.1619
Expense Ratio (%) 6.4564 2.7593 2.3399 0.0244 0.8798 12.0331

Since the p-value corresponding to F = 19.5598 is .0000 < α = .05, there is a significant relationship.
For Net Asset Value ($), the p-value corresponding to t = .3950 is .6950 > α = .05; Net Asset Value ($) is not significant and can be deleted from the model.
d. Morningstar Rank is a categorical variable. The data set only contains funds with four ranks (2-Star through 5-Star), so three dummy variables are needed. Let 3StarRank = 1 for a 3-Star rank, 4StarRank = 1 for a 4-Star rank, and 5StarRank = 1 for a 5-Star rank. The Excel output follows:


Multiple R 0.8501

R Square 0.7227



Observations 45

ANOVA

Df SS MS F Significance F

Regression 6 2466.5721 411.0954 16.5072 2.96759E-09

Residual 38 946.3539 24.9040

Total 44 3412.9260


                    Coefficients  Standard Error  t Stat    P-value     Lower 95%  Upper 95%
Intercept           -4.6074       3.2909          -1.4000   0.1696      -11.2694    2.0547
FundDE               8.1713       2.2754           3.5912   0.0009        3.5650   12.7776
FundIE              19.5194       2.7795           7.0227   2.292E-08    13.8926   25.1461
Expense Ratio (%)    5.5197       2.5862           2.1343   0.0393        0.2843   10.7552
3StarRank            5.9237       2.8250           2.0969   0.0427        0.2048   11.6426
4StarRank            8.2367       2.8474           2.8927   0.0063        2.4725   14.0009
5StarRank            6.6241       3.1425           2.1079   0.0417        0.2624   12.9858

y = -4.6074 + 8.1713 FundDE + 19.5194 FundIE + 5.5197 Expense Ratio (%) + 5.9237 3StarRank + 8.2367 4StarRank + 6.6241 5StarRank

At the .05 level of significance, all the independent variables are significant.
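To use this equation for a particular fund, each categorical value is first coded as 0 or 1 and then substituted along with the expense ratio. The sketch below is one way to do that. The coefficients are copied from the Excel output; the function name `predict`, its argument names, and the "DE"/"IE" labels are hypothetical and chosen only for illustration. The example call uses the values substituted in part e (FundDE = 1, expense ratio 1.05, 3StarRank = 1).

```python
# Coefficients copied from the Excel output for part d.
COEFFICIENTS = {
    "Intercept": -4.6074,
    "FundDE": 8.1713,
    "FundIE": 19.5194,
    "Expense Ratio (%)": 5.5197,
    "3StarRank": 5.9237,
    "4StarRank": 8.2367,
    "5StarRank": 6.6241,
}


def predict(fund_type, expense_ratio, star_rank):
    """Evaluate the estimated regression equation for one fund.

    fund_type: "DE", "IE", or any other label for the baseline fund type.
    star_rank: 2 (the baseline rank), 3, 4, or 5.
    """
    # Dummy coding: the baseline fund type and the 2-Star rank get all zeros.
    x = {
        "FundDE": 1 if fund_type == "DE" else 0,
        "FundIE": 1 if fund_type == "IE" else 0,
        "Expense Ratio (%)": expense_ratio,
        "3StarRank": 1 if star_rank == 3 else 0,
        "4StarRank": 1 if star_rank == 4 else 0,
        "5StarRank": 1 if star_rank == 5 else 0,
    }
    return COEFFICIENTS["Intercept"] + sum(
        COEFFICIENTS[name] * value for name, value in x.items()
    )


# Domestic equity fund, expense ratio 1.05%, 3-Star rank.
print(round(predict("DE", 1.05, 3), 2))  # 15.28
```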


e. y = -4.6074 + 8.1713(1) + 19.5194(0) + 5.5197(1.05) + 5.9237(1) + 8.2367(0) + 6.6241(0) = 15.28%

57. a. A portion of the Minitab output follows:

Analysis of Variance

Source             DF    Adj SS    Adj MS   F-Value  P-Value
Regression          1   14048.6   14048.6     15.34    0.001
  Hourly ($1000s)   1   14048.6   14048.6     15.34    0.001
Error              28   25645.2     915.9
  Lack-of-Fit      25   25630.7    1025.2    212.12    0.000
  Pure Error        3      14.5       4.8
Total              29   39693.9

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
30.2639  35.39%     33.09%      24.60%

Coefficients

Term               Coef  SE Coef  T-Value  P-Value   VIF
Constant           40.3     15.7     2.58    0.016
Hourly ($1000s)   1.195    0.305     3.92    0.001  1.00

Regression Equation

Salaried ($1000s) = 40.3 + 1.195 Hourly ($1000s)

b. Because the p-value = .001 < α = .05, there is a significant relationship.

c. A portion of the Minitab output follows:

Analysis of Variance

Source             DF    Adj SS    Adj MS   F-Value  P-Value
Regression          3   22820.3    7606.8     11.72    0.000
  Hourly ($1000s)   1   14595.9   14595.9     22.49    0.000
  Size-Midsize      1      41.8      41.8      0.06    0.802
  Size-Small        1    7050.1    7050.1     10.86    0.003
Error              26   16873.6     649.0
Total              29   39693.9

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
25.4752  57.49%     52.59%      44.68%

Coefficients

Term               Coef  SE Coef  T-Value  P-Value   VIF
Constant           27.0     14.0     1.93    0.065
Hourly ($1000s)   1.224    0.258     4.74    0.000  1.01
Size-Midsize       -3.2     12.6    -0.25    0.802  1.18
Size-Small         34.4     10.4     3.30    0.003  1.17

Regression Equation

Salaried ($1000s) = 27.0 + 1.224 Hourly ($1000s) - 3.2 Size-Midsize + 34.4 Size-Small

e. Hourly ($1000s): Significant because the p-value = .000 < α = .05
   Size-Midsize: Not significant because the p-value = .802 > α = .05
   Size-Small: Significant because the p-value = .003 < α = .05

f. A portion of the Minitab output using Hourly ($1000s) and Size-Small as the independent variables follows.

Analysis of Variance

Source             DF   Adj SS    Adj MS   F-Value  P-Value
Regression          2    22778   11389.2     18.18    0.000
  Hourly ($1000s)   1    14584   14583.8     23.28    0.000
  Size-Small        1     8730    8729.8     13.93    0.001
Error              27    16915     626.5
Total              29    39694

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
25.0299  57.39%     54.23%      46.53%

Coefficients

Term               Coef  SE Coef  T-Value  P-Value   VIF
Constant           26.3     13.5     1.95    0.062
Hourly ($1000s)   1.218    0.252     4.82    0.000  1.00
Size-Small        35.41     9.49     3.73    0.001  1.00

Regression Equation

Salaried ($1000s) = 26.3 + 1.218 Hourly ($1000s) + 35.41 Size-Small
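The three models fitted in this exercise can be compared through R-sq and R-sq(adj), both of which follow from the error and total sums of squares in the Minitab output. The sketch below recomputes them. The helper name `adjusted_r_square` and the dictionary layout are illustrative assumptions, and the total sum of squares is taken from the parts a and c output (39693.9), so small rounding differences from Minitab are possible.

```python
def adjusted_r_square(ss_error, df_error, ss_total, df_total):
    """Adjusted R-sq = 1 - (SSE / error df) / (SST / total df)."""
    return 1 - (ss_error / df_error) / (ss_total / df_total)


# Total sum of squares and its degrees of freedom (n - 1 = 29).
SS_TOTAL, DF_TOTAL = 39693.9, 29

# (error sum of squares, error degrees of freedom) from each Minitab output.
models = {
    "part a: Hourly": (25645.2, 28),
    "part c: Hourly, Size-Midsize, Size-Small": (16873.6, 26),
    "part f: Hourly, Size-Small": (16915.0, 27),
}

results = {}
for name, (ss_error, df_error) in models.items():
    r_sq = 1 - ss_error / SS_TOTAL
    r_sq_adj = adjusted_r_square(ss_error, df_error, SS_TOTAL, DF_TOTAL)
    results[name] = (r_sq, r_sq_adj)
    print(f"{name}: R-sq = {r_sq:.2%}, R-sq(adj) = {r_sq_adj:.2%}")
```

Dropping Size-Midsize lowers R-sq slightly (57.49% to 57.39%) but raises R-sq(adj) (52.59% to 54.23%), because the adjusted measure penalizes the extra variable that contributed almost nothing.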
