13 ch ken black solution

Chapter 13: Simple Regression and Correlation Analysis

LEARNING OBJECTIVES

The overall objective of this chapter is to give you an understanding of bivariate regression and correlation analysis, thereby enabling you to:

1. Compute the equation of a simple regression line from a sample of data and interpret the slope and intercept of the equation.
2. Understand the usefulness of residual analysis in testing the assumptions underlying regression analysis and in examining the fit of the regression line to the data.
3. Compute a standard error of the estimate and interpret its meaning.
4. Compute a coefficient of determination and interpret it.
5. Test hypotheses about the slope of the regression model and interpret the results.
6. Estimate values of y using the regression model.

CHAPTER TEACHING STRATEGY

This chapter is about all aspects of simple (bivariate, linear) regression. Early in the chapter, through scatter plots, the student begins to understand that the object of simple regression is to fit a line through the points. Fairly soon in the process, the student learns how to solve for the slope and y intercept and develop the equation of the regression line. Most of the remaining material on simple regression is devoted to determining how well the line fits and whether the assumptions underlying the process are met.

The student begins to understand that by entering values of the independent variable into the regression model, predicted values can be determined. The question then becomes: Are the predicted values good estimates of the actual dependent values? One rule to emphasize is that the regression model should not be used to predict for independent variable values that are outside the range of values used to construct the model. MINITAB issues a warning when this is attempted. In many instances the relationship between x and y is linear over a given interval, but outside that interval the relationship becomes curvilinear or unpredictable. Even with this caution, many forecasters use such regression models to extrapolate to values of x outside the domain of those used to construct the model. Whether the forecasts obtained under such conditions are any better than "seat of the pants" or "crystal ball" estimates remains to be seen.

The concept of residual analysis is a good one to show, graphically and numerically, how the model relates to the data and how it fits some points more closely than others. A graphical or numerical analysis of residuals demonstrates that the regression line fits the data in a manner analogous to the way a mean fits a set of numbers: the line passes through the points such that the vertical distances from the points to the line (the residuals) sum to zero. The fact that the residuals sum to zero points out the need to square the errors (residuals) to get a handle on total error. This leads to the sum of squares of error and then on to the standard error of the estimate. In addition, students can learn why the process is called least squares analysis: the slope and intercept formulas are derived by calculus so that the sum of squares of error is minimized, hence "least squares." By examining se, the residuals, r², and the t ratio for the slope, students can begin to judge the fit of the model to the data. Many of the chapter problems ask the student to comment on these items (se, r², etc.).

It is my view that for many of these students, an important facet of this chapter lies in understanding the "buzz" words of regression, such as standard error of the estimate, coefficient of determination, etc. They may well encounter regression again only as some type of computer printout to be deciphered. The concepts then become as important as, or perhaps more important than, the calculations.
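The two claims in the paragraph above (least squares residuals sum to zero, and moving the line away from the least squares slope and intercept can only increase SSE) can be checked numerically. The following is a minimal sketch, assuming plain Python 3 with no third-party packages; it uses the Problem 13.1 data, and the helper name `sse` and the perturbation sizes are illustrative choices, not part of the text.

```python
# Sketch: least squares residuals sum to zero, and SSE is minimized.
# Data from Problem 13.1; plain Python, no third-party libraries assumed.
x = [12, 21, 28, 8, 20]
y = [17, 15, 22, 19, 24]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept in deviation form.
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar


def sse(intercept, slope):
    """Sum of squared residuals for a candidate line y = intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
best = sse(b0, b1)

# Shifting the intercept or the slope away from (b0, b1) raises SSE.
perturbations = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.05), (0.0, -0.05)]
all_worse = all(sse(b0 + d0, b1 + d1) > best for d0, d1 in perturbations)

print(f"sum of residuals = {sum(residuals):.2e}")  # zero apart from float rounding
print(f"least squares SSE = {best:.4f}")
print(f"every perturbed line has larger SSE: {all_worse}")
```

The minimum SSE printed here (about 46.6399) is the value that reappears in Problem 13.26, while the plain sum of residuals is zero up to floating-point rounding, which is why the errors must be squared before they can measure total error.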

CHAPTER OUTLINE

13.1 Introduction to Simple Regression Analysis
13.2 Determining the Equation of the Regression Line
13.3 Residual Analysis
    Using Residuals to Test the Assumptions of the Regression Model
    Using the Computer for Residual Analysis
13.4 Standard Error of the Estimate
13.5 Coefficient of Determination
    Relationship Between r and r²
13.6 Hypothesis Tests for the Slope of the Regression Model and Testing the Overall Model
    Testing the Slope
    Testing the Overall Model
13.7 Estimation
    Confidence Intervals to Estimate the Conditional Mean of y: µy/x
    Prediction Intervals to Estimate a Single Value of y
13.8 Interpreting Computer Output

KEY TERMS

Coefficient of Determination (r²)
Confidence Interval
Dependent Variable
Deterministic Model
Heteroscedasticity
Homoscedasticity
Independent Variable
Least Squares Analysis
Outliers
Prediction Interval
Probabilistic Model
Regression Analysis
Residual
Residual Plot
Scatter Plot
Simple Regression
Standard Error of the Estimate (se)
Sum of Squares of Error (SSE)


SOLUTIONS TO CHAPTER 13

13.1
x    y
12   17
21   15
28   22
8    19
20   24

Σx = 89   Σy = 97   Σxy = 1,767   Σx² = 1,833   Σy² = 1,935   n = 5

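$$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{1{,}767 - \frac{(89)(97)}{5}}{1{,}833 - \frac{(89)^2}{5}} = 0.162$$

$$b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n} = \frac{97}{5} - 0.162\left(\frac{89}{5}\right) = 16.5$$

ŷ = 16.5 + 0.162x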
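The slope and intercept above come entirely from five column sums, so the computation is easy to script. This is a minimal sketch, assuming plain Python 3; the function name `fit_from_sums` is hypothetical, and the inputs are the summary statistics printed for Problems 13.1 and 13.2.

```python
def fit_from_sums(sum_x, sum_y, sum_xy, sum_x2, n):
    """Least squares (b0, b1) from column sums, using the computational formulas."""
    ss_xy = sum_xy - sum_x * sum_y / n   # SSxy
    ss_xx = sum_x2 - sum_x ** 2 / n      # SSxx
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Summary statistics copied from Problems 13.1 and 13.2.
problems = {
    "13.1": dict(sum_x=89, sum_y=97, sum_xy=1767, sum_x2=1833, n=5),
    "13.2": dict(sum_x=571, sum_y=498, sum_xy=30099, sum_x2=58293, n=7),
}

fits = {name: fit_from_sums(**sums) for name, sums in problems.items()}
for name, (b0, b1) in fits.items():
    print(f"{name}: y-hat = {b0:.3f} + ({b1:.4f}) x")
```

Carrying b1 unrounded gives b0 ≈ 16.51 for Problem 13.1 (the value used later in 13.19) rather than the rounded 16.5, and it reproduces 144.414 and -0.898 for Problem 13.2.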


13.2
x     y
140   25
119   29
103   46
91    70
65    88
29    112
24    128

Σx = 571   Σy = 498   Σxy = 30,099   Σx² = 58,293   Σy² = 45,154   n = 7

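$$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{30{,}099 - \frac{(571)(498)}{7}}{58{,}293 - \frac{(571)^2}{7}} = -0.898$$

$$b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n} = \frac{498}{7} - (-0.898)\left(\frac{571}{7}\right) = 144.414$$

ŷ = 144.414 - 0.898x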


13.3
(Advertising) x   (Sales) y
12.5   148
3.7    55
21.6   338
60.0   994
37.6   541
6.1    89
16.8   126
41.2   379

Σx = 199.5   Σy = 2,670   Σxy = 107,610.4   Σx² = 7,667.15   Σy² = 1,587,328   n = 8

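$$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{107{,}610.4 - \frac{(199.5)(2{,}670)}{8}}{7{,}667.15 - \frac{(199.5)^2}{8}} = 15.24$$

$$b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n} = \frac{2{,}670}{8} - 15.24\left(\frac{199.5}{8}\right) = -46.29$$

ŷ = -46.29 + 15.24x

13.4
(Prime) x   (Bond) y
16   5
6    12
8    9
4    15
7    7

Σx = 41   Σy = 48   Σxy = 333   Σx² = 421   Σy² = 524   n = 5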

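$$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{333 - \frac{(41)(48)}{5}}{421 - \frac{(41)^2}{5}} = -0.715$$

$$b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n} = \frac{48}{5} - (-0.715)\left(\frac{41}{5}\right) = 15.46$$

ŷ = 15.46 - 0.715x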

13.5
Starts (x)   Failures (y)
233,710   57,097
199,091   50,361
181,645   60,747
158,930   88,140
155,672   97,069
164,086   86,133
166,154   71,558
188,387   71,128
168,158   71,931
170,475   83,384
166,740   71,857

Σx = 1,953,048   Σy = 809,405   Σx² = 351,907,107,960   Σy² = 61,566,568,203   Σxy = 141,238,520,688   n = 11

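$$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{141{,}238{,}520{,}688 - \frac{(1{,}953{,}048)(809{,}405)}{11}}{351{,}907{,}107{,}960 - \frac{(1{,}953{,}048)^2}{11}} = -0.48042194$$

$$b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n} = \frac{809{,}405}{11} - (-0.48042194)\left(\frac{1{,}953{,}048}{11}\right) = 158{,}881.1$$

ŷ = 158,881.1 - 0.48042194x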


13.6
No. of Farms (x)   Avg. Size (y)
5.65   213
4.65   258
3.96   297
3.36   340
2.95   374
2.52   420
2.44   426
2.29   441
2.15   460
2.07   469
2.17   434

Σx = 34.21   Σy = 4,132   Σx² = 120.3831   Σy² = 1,627,892   Σxy = 11,834.31   n = 11

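$$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{11{,}834.31 - \frac{(34.21)(4{,}132)}{11}}{120.3831 - \frac{(34.21)^2}{11}} = -72.6383$$

$$b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n} = \frac{4{,}132}{11} - (-72.6383)\left(\frac{34.21}{11}\right) = 601.542$$

ŷ = 601.542 - 72.6383x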


13.7
Steel (x)   New Orders (y)
99.9    2.74
97.9    2.87
98.9    2.93
87.9    2.87
92.9    2.98
97.9    3.09
100.6   3.36
104.9   3.61
105.3   3.75
108.6   3.95

Σx = 994.8   Σy = 32.15   Σx² = 99,293.28   Σy² = 104.9815   Σxy = 3,216.652   n = 10

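$$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{3{,}216.652 - \frac{(994.8)(32.15)}{10}}{99{,}293.28 - \frac{(994.8)^2}{10}} = 0.05557$$

$$b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n} = \frac{32.15}{10} - 0.05557\left(\frac{994.8}{10}\right) = -2.31307$$

ŷ = -2.31307 + 0.05557x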


13.8
x    y
15   47
8    36
19   56
12   44
5    21

ŷ = 13.625 + 2.303x

Residuals:
x    y    ŷ         Residual (y - ŷ)
15   47   48.1694   -1.1694
8    36   32.0489    3.9511
19   56   57.3811   -1.3811
12   44   41.2606    2.7394
5    21   25.1401   -4.1401

13.9
x    y    Predicted (ŷ)   Residual (y - ŷ)
12   17   18.4582         -1.4582
21   15   19.9196         -4.9196
28   22   21.0563          0.9437
8    19   17.8087          1.1913
20   24   19.7572          4.2428

ŷ = 16.5 + 0.162x

13.10
x     y     Predicted (ŷ)   Residual (y - ŷ)
140   25    18.6597          6.3403
119   29    37.5229         -8.5229
103   46    51.8948         -5.8948
91    70    62.6737          7.3263
65    88    86.0280          1.9720
29    112   118.3648        -6.3648
24    128   122.8561         5.1439

ŷ = 144.414 - 0.898x


13.11
x      y     Predicted (ŷ)   Residual (y - ŷ)
12.5   148   144.2053           3.7947
3.7    55    10.0953           44.9047
21.6   338   282.8873          55.1127
60.0   994   868.0945         125.9055
37.6   541   526.7236          14.2764
6.1    89    46.6708           42.3292
16.8   126   209.7364         -83.7364
41.2   379   581.5868        -202.5868

ŷ = -46.29 + 15.24x

13.12
x    y    Predicted (ŷ)   Residual (y - ŷ)
16   5    4.0259           0.9741
6    12   11.1722          0.8278
8    9    9.7429          -0.7429
4    15   12.6014          2.3986
7    7    10.4575         -3.4575

ŷ = 15.46 - 0.715x

13.13
x    y    Predicted (ŷ)   Residual (y - ŷ)
5    47   42.2756          4.7244
7    38   38.9836         -0.9836
11   32   32.3996         -0.3996
12   24   30.7537         -6.7537
19   22   19.2317          2.7683
25   10   9.3558           0.6442

ŷ = 50.5056 - 1.6460x

No apparent violation of assumptions


13.14
Miles (x)   Cost (y)   ŷ        Residual (y - ŷ)
1,245       2.64       2.5376    .1024
425         2.31       2.3322   -.0222
1,346       2.45       2.5628   -.1128
973         2.52       2.4694    .0506
255         2.19       2.2896   -.0996
865         2.55       2.4424    .1076
1,080       2.40       2.4962   -.0962
296         2.37       2.2998    .0702

No apparent violation of assumptions.

13.15 Error terms appear to be nonindependent.

13.16 There appears to be a nonconstant error variance.

13.17 There appears to be a nonlinear relationship between x and y.

13.18 The MINITAB Residuals vs. Fits graphic is strongly indicative of a violation of the homoscedasticity assumption of regression. Because the residuals are very close together for small values of x, there is little variability in the residuals at the left end of the graph. On the other hand, for larger values of x, the graph flares out, indicating much greater variability at the upper end. Thus, the error variance is not homogeneous across the values of the independent variable.


13.19 SSE = Σy² - b0Σy - b1Σxy = 1,935 - (16.51)(97) - 0.1624(1,767) = 46.5692

$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{46.5692}{3}} = 3.94$$

Approximately 68% of the residuals should fall within ±1se. Three out of five (60%) of the actual residuals in 13.9 fell within ±1se.

13.20 SSE = Σy² - b0Σy - b1Σxy = 45,154 - 144.414(498) - (-.89824)(30,099) = 272.0

$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{272.0}{5}} = 7.376$$

Six out of seven (85.7%) fall within ±1se.
Seven out of seven (100%) fall within ±2se.

13.21 SSE = Σy² - b0Σy - b1Σxy = 1,587,328 - (-46.29)(2,670) - 15.24(107,610.4) = 70,940

$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{70{,}940}{6}} = 108.7$$

Six out of eight (75%) of the sales estimates are within $108.7 million.

13.22 SSE = Σy² - b0Σy - b1Σxy = 524 - 15.46(48) - (-0.71462)(333) = 19.8885

$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{19.8885}{3}} = 2.575$$

Four out of five (80%) of the estimates are within 2.575 of the actual rate for bonds. This amount of error is probably not acceptable to financial analysts.
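The ±1se and ±2se counts in Problems 13.19 through 13.22 amount to fitting the line, computing se from the residuals, and counting how many residuals fall inside each band. A minimal sketch, assuming plain Python 3 and using the Problem 13.4 prime/bond-rate data behind 13.22; the 68% figure is only the rough benchmark the text quotes, not something the code tests.

```python
import math

# Problem 13.4 data: x = prime rate, y = bond rate.
x = [16, 6, 8, 4, 7]
y = [5, 12, 9, 15, 7]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
s_e = math.sqrt(sse / (n - 2))  # standard error of the estimate, df = n - 2

# Count residuals inside the +/-1 s_e and +/-2 s_e bands.
within_1 = sum(abs(e) <= s_e for e in residuals)
within_2 = sum(abs(e) <= 2 * s_e for e in residuals)
print(f"s_e = {s_e:.3f}")
print(f"within 1 s_e: {within_1} of {n}; within 2 s_e: {within_2} of {n}")
```

With these data the sketch reproduces se ≈ 2.575 and the "four out of five" count; only the residual of -3.4575 falls outside ±1se.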


13.23
(y - ŷ)    (y - ŷ)²
4.7244     22.3200
-0.9836    .9675
-0.3996    .1597
-6.7537    45.6125
2.7683     7.6635
0.6442     .4150

Σ(y - ŷ)² = 77.1382
SSE = Σ(y - ŷ)² = 77.1382

$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{77.1382}{4}} = 4.391$$

13.24
(y - ŷ)   (y - ŷ)²
.1023     .0105
-.0222    .0005
-.1128    .0127
.0506     .0026
-.0996    .0099
.1076     .0116
-.0962    .0093
.0702     .0049

Σ(y - ŷ)² = .0620
SSE = Σ(y - ŷ)² = .0620

$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{.0620}{6}} = .1017$$

The model produces estimates that are within ±.1017, or about 10 cents, of the actual cost roughly 68% of the time. However, the range of milk costs in these data is only 45 cents.


13.25
Volume (x)   Sales (y)
728.6   10.5
497.9   48.1
439.1   64.8
377.9   20.1
375.5   11.4
363.8   123.8
276.3   89.0

n = 7   Σx = 3,059.1   Σy = 367.7   Σx² = 1,464,071.97   Σy² = 30,404.31   Σxy = 141,558.6
b1 = -.1504   b0 = 118.257
ŷ = 118.257 - .1504x

SSE = Σy² - b0Σy - b1Σxy = 30,404.31 - (118.257)(367.7) - (-0.1504)(141,558.6) = 8,211.6245

$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{8{,}211.6245}{5}} = 40.5256$$

This is a relatively large standard error of the estimate given the sales values (ranging from 10.5 to 123.8).
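The computational formula SSE = Σy² - b0Σy - b1Σxy subtracts large, nearly equal numbers, so it is sensitive to how far b0 and b1 were rounded. A minimal sketch, assuming plain Python 3, computes SSE for the Problem 13.25 data three ways: directly from the residuals, from the formula with full-precision coefficients, and from the formula with the rounded coefficients printed above.

```python
import math

# Problem 13.25 data: x = volume, y = sales.
x = [728.6, 497.9, 439.1, 377.9, 375.5, 363.8, 276.3]
y = [10.5, 48.1, 64.8, 20.1, 11.4, 123.8, 89.0]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

# 1) SSE straight from the residuals.
sse_residuals = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
# 2) Computational formula with full-precision coefficients: same value.
sse_formula = sum_y2 - b0 * sum_y - b1 * sum_xy
# 3) Computational formula with the rounded coefficients printed in 13.25.
sse_rounded = sum_y2 - 118.257 * sum_y - (-0.1504) * sum_xy

s_e = math.sqrt(sse_residuals / (n - 2))
print(f"SSE: residuals {sse_residuals:.2f}, formula {sse_formula:.2f}, rounded {sse_rounded:.4f}")
print(f"s_e = {s_e:.3f}")
```

The rounded version reproduces 8,211.6245 exactly, while full precision gives about 8,212.1; the resulting se (about 40.53) is essentially unchanged, so the small discrepancy is a rounding artifact rather than an error in the solution.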

13.26
$$r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = 1 - \frac{46.6399}{1{,}935 - \frac{(97)^2}{5}} = .123$$

This is a low value of r².
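The denominator of r² is the total variation of y, SSyy = Σy² - (Σy)²/n, and SSE can be obtained as SSyy - b1·SSxy without listing residuals. A minimal sketch, assuming plain Python 3 and the Problem 13.1 sums, computes r² that way and cross-checks it against the equivalent form SSxy²/(SSxx·SSyy).

```python
# Problem 13.1 summary statistics.
sum_x, sum_y, sum_xy, sum_x2, sum_y2, n = 89, 97, 1767, 1833, 1935, 5

ss_xx = sum_x2 - sum_x ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n
ss_yy = sum_y2 - sum_y ** 2 / n   # total variation of y, the denominator of r^2

b1 = ss_xy / ss_xx
sse = ss_yy - b1 * ss_xy          # unexplained variation (equals SSE)

r2 = 1 - sse / ss_yy
r2_check = ss_xy ** 2 / (ss_xx * ss_yy)   # algebraically identical route
print(f"SSyy = {ss_yy:.1f}, SSE = {sse:.4f}, r^2 = {r2:.3f}")
```

Both routes give r² ≈ .123, matching the value above, with SSyy = 53.2 and SSE ≈ 46.6399.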

13.27
$$r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = 1 - \frac{272.121}{45{,}154 - \frac{(498)^2}{7}} = .972$$

This is a high value of r².


13.28
$$r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = 1 - \frac{70{,}940}{1{,}587{,}328 - \frac{(2{,}670)^2}{8}} = .898$$

This value of r² is relatively high.

13.29
$$r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = 1 - \frac{19.8885}{524 - \frac{(48)^2}{5}} = .685$$

This value of r² is a modest value. 68.5% of the variation of y is accounted for by x, but 31.5% is unaccounted for.

13.30
$$r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = 1 - \frac{77.1384}{5{,}837 - \frac{(173)^2}{6}} = .909$$

This value is a relatively high value of r². Almost 91% of the variability of y is accounted for by the x values.

13.31
CCI (y)   Median Income (x)
116.8     37.415
91.5      36.770
68.5      35.501
61.6      35.047
65.9      34.700
90.6      34.942
100.0     35.887
104.6     36.306
125.4     37.005

Σx = 323.573   Σy = 824.9   Σx² = 11,640.93413   Σy² = 79,718.79   Σxy = 29,804.4505   n = 9


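$$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{29{,}804.4505 - \frac{(323.573)(824.9)}{9}}{11{,}640.93413 - \frac{(323.573)^2}{9}} = 19.2204$$

$$b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n} = \frac{824.9}{9} - 19.2204\left(\frac{323.573}{9}\right) = -599.3674$$

ŷ = -599.3674 + 19.2204x

SSE = Σy² - b0Σy - b1Σxy = 79,718.79 - (-599.3674)(824.9) - 19.2204(29,804.4505) = 1,283.13435

$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{1{,}283.13435}{7}} = 13.539$$

$$r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = 1 - \frac{1{,}283.13435}{79{,}718.79 - \frac{(824.9)^2}{9}} = .688$$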

13.32
$$s_b = \frac{s_e}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}} = \frac{3.94}{\sqrt{1{,}833 - \frac{(89)^2}{5}}} = .2498$$

b1 = 0.162
Ho: β = 0   α = .05
Ha: β ≠ 0
This is a two-tail test, α/2 = .025   df = n - 2 = 5 - 2 = 3
t.025,3 = ±3.182

$$t = \frac{b_1 - \beta_1}{s_b} = \frac{0.162 - 0}{.2498} = 0.65$$

Since the observed t = 0.65 < t.025,3 = 3.182, the decision is to fail to reject the null hypothesis.
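The slope test above is a short chain: se, then sb = se/√SSxx, then t = (b1 - 0)/sb, compared with a table value. A minimal sketch, assuming plain Python 3 and the Problem 13.1 sums; the critical value 3.182 is hard-coded from the t table (no statistics library is assumed).

```python
import math

sum_x, sum_y, sum_xy, sum_x2, sum_y2, n = 89, 97, 1767, 1833, 1935, 5
t_critical = 3.182   # t_.025,3 from a t table (two-tailed, alpha = .05)

ss_xx = sum_x2 - sum_x ** 2 / n
ss_yy = sum_y2 - sum_y ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n
b1 = ss_xy / ss_xx
sse = ss_yy - b1 * ss_xy
s_e = math.sqrt(sse / (n - 2))

s_b = s_e / math.sqrt(ss_xx)      # standard error of the slope
t_obs = (b1 - 0) / s_b            # H0: beta = 0

reject = abs(t_obs) > t_critical
print(f"s_b = {s_b:.4f}, t = {t_obs:.2f}, reject H0: {reject}")
```

Full precision gives sb ≈ .2500 (the text's .2498 uses se rounded to 3.94); either way t ≈ 0.65 and the null hypothesis is not rejected.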


13.33
$$s_b = \frac{s_e}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}} = \frac{7.376}{\sqrt{58{,}293 - \frac{(571)^2}{7}}} = .068145$$

b1 = -0.898
Ho: β = 0   α = .01
Ha: β ≠ 0
Two-tail test, α/2 = .005   df = n - 2 = 7 - 2 = 5
t.005,5 = ±4.032

$$t = \frac{b_1 - \beta_1}{s_b} = \frac{-0.898 - 0}{.068145} = -13.18$$

Since the observed t = -13.18 < t.005,5 = -4.032, the decision is to reject the null hypothesis.

13.34
$$s_b = \frac{s_e}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}} = \frac{108.7}{\sqrt{7{,}667.15 - \frac{(199.5)^2}{8}}} = 2.095$$

b1 = 15.240
Ho: β = 0   α = .10
Ha: β ≠ 0
For a two-tail test, α/2 = .05   df = n - 2 = 8 - 2 = 6
t.05,6 = ±1.943

$$t = \frac{b_1 - \beta_1}{s_b} = \frac{15.240 - 0}{2.095} = 7.27$$

Since the observed t = 7.27 > t.05,6 = 1.943, the decision is to reject the null hypothesis.


13.35
$$s_b = \frac{s_e}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}} = \frac{2.575}{\sqrt{421 - \frac{(41)^2}{5}}} = .27963$$

b1 = -0.715
Ho: β = 0   α = .05
Ha: β ≠ 0
For a two-tail test, α/2 = .025   df = n - 2 = 5 - 2 = 3
t.025,3 = ±3.182

$$t = \frac{b_1 - \beta_1}{s_b} = \frac{-0.715 - 0}{.27963} = -2.56$$

Since the observed t = -2.56 > t.025,3 = -3.182, the decision is to fail to reject the null hypothesis.

13.36
Analysis of Variance
SOURCE       df   SS       MS         F
Regression   1    5,165    5,165.00   1.95
Error        7    18,554   2,650.57
Total        8    23,718

Let α = .05. F.05,1,7 = 5.59

Since the observed F = 1.95 < F.05,1,7 = 5.59, the decision is to fail to reject the null hypothesis. The model shows no significant overall predictability.

$$t = \sqrt{F} = \sqrt{1.95} = 1.40$$

t.025,7 = 2.365

Since t = 1.40 < t.025,7 = 2.365, the decision is to fail to reject the null hypothesis.

The slope is not significantly different from zero.
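In simple regression the F test of the overall model and the two-tailed t test of the slope are the same test, because F = t². A minimal sketch, assuming plain Python 3, recomputes F from the Problem 13.36 ANOVA quantities, takes its square root, and compares both with the table values quoted in the solution (hard-coded, since no statistics library is assumed).

```python
import math

ss_regression, df_regression = 5165, 1
ss_error, df_error = 18554, 7

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_obs = ms_regression / ms_error

# In simple regression F = t^2 for the slope test, so |t| = sqrt(F).
t_abs = math.sqrt(f_obs)

f_critical = 5.59    # F_.05,1,7 from an F table
t_critical = 2.365   # t_.025,7 from a t table
print(f"F = {f_obs:.2f}, |t| = {t_abs:.2f}")
print(f"F decision: {'reject' if f_obs > f_critical else 'fail to reject'} H0")
print(f"t decision: {'reject' if t_abs > t_critical else 'fail to reject'} H0")
```

The critical values are linked in the same way: 2.365² ≈ 5.59, so the two decisions must always agree.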


13.37 F = 8.26 with a p-value of .021. The overall model is significant at α = .05 but not at α = .01. For simple regression, t = √F = √8.26 ≈ 2.87, and the t test of the slope has the same p-value as the F test (.021). The slope is therefore significant at α = .05 but not at α = .01.

13.38 x0 = 25   95% confidence   α/2 = .025   df = n - 2 = 5 - 2 = 3   t.025,3 = ±3.182

$$\bar{x} = \frac{\sum x}{n} = \frac{89}{5} = 17.8$$

Σx = 89   Σx² = 1,833   se = 3.94   ŷ = 16.5 + 0.162(25) = 20.55

$$\hat{y} \pm t_{\alpha/2,\,n-2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}}$$

$$20.55 \pm 3.182(3.94)\sqrt{\frac{1}{5} + \frac{(25 - 17.8)^2}{1{,}833 - \frac{(89)^2}{5}}} = 20.55 \pm 3.182(3.94)(.63903) = 20.55 \pm 8.01$$

12.54 < E(y25) < 28.56
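The confidence interval for the conditional mean combines ŷ at x0, se, a table t value, and the distance of x0 from x̄. A minimal sketch, assuming plain Python 3; the function name `mean_response_interval` is hypothetical, the data are from Problem 13.1, and t.025,3 = 3.182 is passed in from the table.

```python
import math


def mean_response_interval(x, y, x0, t_critical):
    """Confidence interval for E(y | x0), the conditional mean of y at x0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))

    y_hat = b0 + b1 * x0
    half_width = t_critical * s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width


# Problem 13.1 data, x0 = 25, t_.025,3 = 3.182 taken from a t table.
low, high = mean_response_interval([12, 21, 28, 8, 20], [17, 15, 22, 19, 24], x0=25, t_critical=3.182)
print(f"{low:.2f} < E(y | x0 = 25) < {high:.2f}")
```

The bounds (about 12.55 and 28.59) differ slightly from 12.54 and 28.56 only because the text rounds ŷ to 20.55; the half-width of about 8.02 agrees with the hand computation.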


13.39 x0 = 100   For 90% confidence, α/2 = .05   df = n - 2 = 7 - 2 = 5   t.05,5 = ±2.015

$$\bar{x} = \frac{\sum x}{n} = \frac{571}{7} = 81.57143$$

Σx = 571   Σx² = 58,293   se = 7.377   ŷ = 144.414 - .898(100) = 54.614

$$\hat{y} \pm t_{\alpha/2,\,n-2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}}$$

$$54.614 \pm 2.015(7.377)\sqrt{1 + \frac{1}{7} + \frac{(100 - 81.57143)^2}{58{,}293 - \frac{(571)^2}{7}}} = 54.614 \pm 2.015(7.377)(1.08252) = 54.614 \pm 16.091$$

38.523 < y < 70.705

For x0 = 130, ŷ = 144.414 - .898(130) = 27.674

$$27.674 \pm 2.015(7.377)\sqrt{1 + \frac{1}{7} + \frac{(130 - 81.57143)^2}{58{,}293 - \frac{(571)^2}{7}}} = 27.674 \pm 2.015(7.377)(1.1589) = 27.674 \pm 17.227$$

10.447 < y < 44.901

The prediction interval of y for x0 = 130 is wider than the prediction interval of y for x0 = 100 because x0 = 100 is nearer to the mean x̄ = 81.57 than is x0 = 130.


13.40 x0 = 20   For 98% confidence, α/2 = .01   df = n - 2 = 8 - 2 = 6   t.01,6 = 3.143

$$\bar{x} = \frac{\sum x}{n} = \frac{199.5}{8} = 24.9375$$

Σx = 199.5   Σx² = 7,667.15   se = 108.8   ŷ = -46.29 + 15.24(20) = 258.51

$$\hat{y} \pm t_{\alpha/2,\,n-2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}}$$

$$258.51 \pm (3.143)(108.8)\sqrt{\frac{1}{8} + \frac{(20 - 24.9375)^2}{7{,}667.15 - \frac{(199.5)^2}{8}}} = 258.51 \pm (3.143)(108.8)(0.36614) = 258.51 \pm 125.20$$

133.31 < E(y20) < 383.71

For a single y value:

$$\hat{y} \pm t_{\alpha/2,\,n-2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}}$$

$$258.51 \pm (3.143)(108.8)\sqrt{1 + \frac{1}{8} + \frac{(20 - 24.9375)^2}{7{,}667.15 - \frac{(199.5)^2}{8}}} = 258.51 \pm (3.143)(108.8)(1.06492) = 258.51 \pm 364.16$$

-105.65 < y < 622.67

The prediction interval for the single value of y is wider than the confidence interval for the average value of y because individual values of y vary more than the average of y does.
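The only difference between the two intervals above is the extra "1 +" under the radical for a single value of y. A minimal sketch, assuming plain Python 3; the function name `mean_and_single_intervals` is hypothetical, the data are the Problem 13.3 advertising/sales figures, and t.01,6 = 3.143 is passed in from the table.

```python
import math


def mean_and_single_intervals(x, y, x0, t_critical):
    """Return (y_hat, CI half-width for E(y | x0), PI half-width for a single y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))

    spread = 1 / n + (x0 - x_bar) ** 2 / ss_xx          # shared term under the radical
    ci_half = t_critical * s_e * math.sqrt(spread)       # conditional mean
    pi_half = t_critical * s_e * math.sqrt(1 + spread)   # single value: extra "1 +"
    return b0 + b1 * x0, ci_half, pi_half


advertising = [12.5, 3.7, 21.6, 60.0, 37.6, 6.1, 16.8, 41.2]   # x, Problem 13.3
sales = [148, 55, 338, 994, 541, 89, 126, 379]                  # y
y_hat, ci_half, pi_half = mean_and_single_intervals(advertising, sales, x0=20, t_critical=3.143)

print(f"y-hat = {y_hat:.2f}")
print(f"mean:   {y_hat - ci_half:.2f} to {y_hat + ci_half:.2f}")
print(f"single: {y_hat - pi_half:.2f} to {y_hat + pi_half:.2f}")
```

The half-widths come out near 125.2 and 364.0, matching the hand computation up to the rounding of se.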


13.41 x0 = 10   For 99% confidence, α/2 = .005   df = n - 2 = 5 - 2 = 3   t.005,3 = 5.841

$$\bar{x} = \frac{\sum x}{n} = \frac{41}{5} = 8.20$$

Σx = 41   Σx² = 421   se = 2.575   ŷ = 15.46 - 0.715(10) = 8.31

$$\hat{y} \pm t_{\alpha/2,\,n-2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}}$$

$$8.31 \pm 5.841(2.575)\sqrt{\frac{1}{5} + \frac{(10 - 8.2)^2}{421 - \frac{(41)^2}{5}}} = 8.31 \pm 5.841(2.575)(.488065) = 8.31 \pm 7.34$$

0.97 < E(y10) < 15.65

If the prime interest rate is 10%, we are 99% confident that the average bond rate is between 0.97% and 15.65%.


13.42
x    y
5    8
7    9
3    11
16   27
12   15
9    13

Σx = 52   Σx² = 564   Σy = 83   Σy² = 1,389   Σxy = 865   n = 6
b1 = 1.2853   b0 = 2.6941

a) ŷ = 2.6941 + 1.2853x

b)
ŷ (Predicted Values)   (y - ŷ) Residuals
9.1206                 -1.1206
11.6912                -2.6912
6.5500                  4.4500
23.2588                 3.7412
18.1176                -3.1176
14.2618                -1.2618

c)
(y - ŷ)²
1.2557
7.2426
19.8025
13.9966
9.7194
1.5921

SSE = 53.6089

$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{53.6089}{4}} = 3.661$$

d)
$$r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = 1 - \frac{53.6089}{1{,}389 - \frac{(83)^2}{6}} = .777$$


e) Ho: β = 0   α = .01
Ha: β ≠ 0
Two-tailed test, α/2 = .005   df = n - 2 = 6 - 2 = 4   t.005,4 = ±4.604

$$s_b = \frac{s_e}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}} = \frac{3.661}{\sqrt{564 - \frac{(52)^2}{6}}} = .34389$$

$$t = \frac{b_1 - \beta_1}{s_b} = \frac{1.2853 - 0}{.34389} = 3.74$$

Since the observed t = 3.74 < t.005,4 = 4.604, the decision is to fail to reject the null hypothesis.

f) The r² = 77.74% is modest. There appears to be some prediction with this model. The slope of the regression line is not significantly different from zero using α = .01. However, for α = .05 (t.025,4 = 2.776), the null hypothesis of a zero slope is rejected. The standard error of the estimate, se = 3.661, is not particularly small given the range of values for y (27 - 8 = 19).
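Parts (a) through (e) of Problem 13.42 follow one fixed pipeline: fit, predict, take residuals, then SSE, se, r², and the slope t ratio. A minimal sketch of that pipeline, assuming plain Python 3; the function name `regression_summary` and its dictionary keys are hypothetical, and the two critical values are the table values for df = 4 used in parts (e) and (f).

```python
import math


def regression_summary(x, y):
    """Return the quantities asked for in Problem 13.42 (a) through (e)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    predicted = [b0 + b1 * xi for xi in x]
    residuals = [yi - yh for yi, yh in zip(y, predicted)]
    sse = sum(e ** 2 for e in residuals)
    s_e = math.sqrt(sse / (n - 2))
    s_b = s_e / math.sqrt(ss_xx)
    return {
        "b0": b0, "b1": b1,
        "predicted": predicted, "residuals": residuals,
        "sse": sse, "s_e": s_e,
        "r2": 1 - sse / ss_yy,
        "t": b1 / s_b,   # test of H0: beta = 0
    }


summary = regression_summary([5, 7, 3, 16, 12, 9], [8, 9, 11, 27, 15, 13])
for key in ("b0", "b1", "sse", "s_e", "r2", "t"):
    print(f"{key:>3} = {summary[key]:.4f}")

# Decisions at the two significance levels discussed in part (f),
# using two-tailed critical values from a t table with df = 4.
for alpha, t_crit in ((0.01, 4.604), (0.05, 2.776)):
    decision = "reject" if abs(summary["t"]) > t_crit else "fail to reject"
    print(f"alpha = {alpha}: {decision} H0")
```

Running it reproduces b0 ≈ 2.6941, b1 ≈ 1.2853, SSE ≈ 53.609, r² ≈ .777, and t ≈ 3.74, with the "fail to reject at .01, reject at .05" pattern described in part (f).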

13.43
x    y
53   5
47   5
41   7
50   4
58   10
62   12
45   3
60   11

Σx = 416   Σx² = 22,032   Σy = 57   Σy² = 489   Σxy = 3,106   n = 8
b1 = 0.355   b0 = -11.335

a) ŷ = -11.335 + 0.355x

b)
ŷ (Predicted Values)   (y - ŷ) Residuals
7.48                   -2.48
5.35                   -0.35
3.22                    3.78
6.415                  -2.415
9.255                   0.745
10.675                  1.325
4.64                   -1.64
9.965                   1.035

c)
(y - ŷ)²
6.1504
.1225
14.2884
5.8322
.5550
1.7556
2.6896
1.0712

SSE = 32.4649

d)
$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{32.4649}{6}} = 2.3261$$

e)
$$r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = 1 - \frac{32.4649}{489 - \frac{(57)^2}{8}} = .608$$

f) Ho: β = 0   α = .05
Ha: β ≠ 0
Two-tailed test, α/2 = .025   df = n - 2 = 8 - 2 = 6   t.025,6 = ±2.447

$$s_b = \frac{s_e}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}} = \frac{2.3261}{\sqrt{22{,}032 - \frac{(416)^2}{8}}} = 0.116305$$

$$t = \frac{b_1 - \beta_1}{s_b} = \frac{0.3555 - 0}{0.116305} = 3.05$$

Since the observed t = 3.05 > t.025,6 = 2.447, the decision is to reject the null hypothesis. The population slope is different from zero.

g) This model produces only a modest r² = .608. Almost 40% of the variance of y is unaccounted for by x. The range of y values is 12 - 3 = 9, and the standard error of the estimate is 2.33. Given this small range, the se is not small.

13.44
Σx = 1,263   Σx² = 268,295   Σy = 417   Σy² = 29,135   Σxy = 88,288   n = 6
b0 = 25.42778   b1 = 0.209369

SSE = Σy² - b0Σy - b1Σxy = 29,135 - (25.42778)(417) - (0.209369)(88,288) = 46.845468

$$r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = 1 - \frac{46.845468}{153.5} = .695$$

Coefficient of determination = r² = .695

13.45
a) x0 = 60
Σx = 524   Σx² = 36,224   Σy = 215   Σy² = 6,411   Σxy = 15,125   n = 8
b1 = .5481   b0 = -9.026   se = 3.201
95% Confidence Interval   α/2 = .025   df = n - 2 = 8 - 2 = 6   t.025,6 = ±2.447
ŷ = -9.026 + 0.5481(60) = 23.86

$$\bar{x} = \frac{\sum x}{n} = \frac{524}{8} = 65.5$$

$$\hat{y} \pm t_{\alpha/2,\,n-2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}}$$

$$23.86 \pm 2.447(3.201)\sqrt{\frac{1}{8} + \frac{(60 - 65.5)^2}{36{,}224 - \frac{(524)^2}{8}}} = 23.86 \pm 2.447(3.201)(.375372) = 23.86 \pm 2.94$$

20.92 < E(y60) < 26.8

b) x0 = 70   ŷ70 = -9.026 + 0.5481(70) = 29.341

$$\hat{y} \pm t_{\alpha/2,\,n-2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}}$$

$$29.341 \pm 2.447(3.201)\sqrt{1 + \frac{1}{8} + \frac{(70 - 65.5)^2}{36{,}224 - \frac{(524)^2}{8}}} = 29.341 \pm 2.447(3.201)(1.06567) = 29.341 \pm 8.347$$

20.994 < y < 37.688

c) The interval in part (b) is much wider because part (b) is a prediction interval for a single value of y, which has much greater possible variation. In fact, x0 = 70 in part (b) is slightly closer to the mean x̄ than x0 = 60 is. Even so, the interval for a single value is much wider than the interval for the average or expected y value in part (a).


13.46
Σy = 267   Σy² = 15,971   Σx = 21   Σx² = 101   Σxy = 1,256   n = 5
b0 = 9.234375   b1 = 10.515625

SSE = Σy² - b0Σy - b1Σxy = 15,971 - (9.234375)(267) - (10.515625)(1,256) = 297.7969

$$r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = 1 - \frac{297.7969}{1{,}713.2} = .826$$

If a regression model had been developed to predict the number of cars sold from the number of salespeople, the model would have had an r² of 82.6%. The same would hold true for a model to predict the number of salespeople from the number of cars sold.

13.47
n = 12   Σx = 548   Σx² = 26,592   Σy = 5,940   Σy² = 3,211,546   Σxy = 287,908
b1 = 10.626383   b0 = 9.728511
ŷ = 9.728511 + 10.626383x

SSE = Σy² - b0Σy - b1Σxy = 3,211,546 - (9.728511)(5,940) - (10.626383)(287,908) = 94,337.9762

$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{94{,}337.9762}{10}} = 97.1277$$

$$r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = 1 - \frac{94{,}337.9762}{271{,}246} = .652$$

$$t = \frac{b_1 - 0}{\dfrac{s_e}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}}} = \frac{10.626383 - 0}{\dfrac{97.1277}{\sqrt{26{,}592 - \frac{(548)^2}{12}}}} = 4.33$$

If α = .01, then t.005,10 = 3.169. Since the observed t = 4.33 > t.005,10 = 3.169, the decision is to reject the null hypothesis.

13.48
Sales (y)   Number of Units (x)
17.1        12.4
7.9         7.5
4.8         6.8
4.7         8.7
4.6         4.6
4.0         5.1
2.9         11.2
2.7         5.1
2.7         2.9

Σy = 51.4   Σy² = 460.1   Σx = 64.3   Σx² = 538.97   Σxy = 440.46   n = 9
b1 = 0.92025   b0 = -0.863565

SSE = Σy² - b0Σy - b1Σxy = 460.1 - (-0.863565)(51.4) - (0.92025)(440.46) = 99.153926

$$r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = 1 - \frac{99.153926}{166.55} = .405$$


13.49
1977 (x)   2000 (y)
581        571
213        220
668        492
345        221
1,476      1,760
1,776      5,750

Σx = 5,059   Σy = 9,014   Σx² = 6,280,931   Σy² = 36,825,446   Σxy = 13,593,272   n = 6

$$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{13{,}593{,}272 - \frac{(5{,}059)(9{,}014)}{6}}{6{,}280{,}931 - \frac{(5{,}059)^2}{6}} = 2.97366$$

$$b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n} = \frac{9{,}014}{6} - 2.97366\left(\frac{5{,}059}{6}\right) = -1{,}004.9575$$

ŷ = -1,004.9575 + 2.97366x
For x = 700: ŷ = 1,076.6044

$$\hat{y} \pm t_{\alpha/2,\,n-2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}}$$

α = .05   t.025,4 = 2.776   x0 = 700   n = 6
x̄ = 843.167

SSE = Σy² - b0Σy - b1Σxy = 36,825,446 - (-1,004.9575)(9,014) - (2.97366)(13,593,272) = 5,462,363.69

$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{5{,}462{,}363.69}{4}} = 1{,}168.585$$

Confidence interval:

$$1{,}076.6044 \pm (2.776)(1{,}168.585)\sqrt{\frac{1}{6} + \frac{(700 - 843.167)^2}{6{,}280{,}931 - \frac{(5{,}059)^2}{6}}} = 1{,}076.6044 \pm 1{,}364.1632$$

-287.5588 to 2,440.7676

H0: β1 = 0   Ha: β1 ≠ 0   α = .05   df = 4   Table t.025,4 = 2.776

$$t = \frac{b_1 - 0}{s_b} = \frac{2.97366 - 0}{\dfrac{1{,}168.585}{\sqrt{2{,}015{,}350.833}}} = \frac{2.97366}{.8231614} = 3.6124$$

Since the observed t = 3.6124 > t.025,4 = 2.776, the decision is to reject the null hypothesis.

13.50
Σx = 11.902   Σx² = 25.1215   Σy = 516.8   Σy² = 61,899.06   Σxy = 1,202.867   n = 7
b1 = 66.36277   b0 = -39.0071
ŷ = -39.0071 + 66.36277x

SSE = Σy² - b0Σy - b1Σxy = 61,899.06 - (-39.0071)(516.8) - (66.36277)(1,202.867) = 2,232.343

$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{2{,}232.343}{5}} = 21.13$$

$$r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = 1 - \frac{2{,}232.343}{61{,}899.06 - \frac{(516.8)^2}{7}} = 1 - .094 = .906$$

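13.51
Σx = 44,754   Σy = 17,314   Σx² = 167,540,610   Σy² = 24,646,062   Σxy = 59,852,571   n = 13

$$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{59{,}852{,}571 - \frac{(44{,}754)(17{,}314)}{13}}{167{,}540{,}610 - \frac{(44{,}754)^2}{13}} = .01835$$

$$b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n} = \frac{17{,}314}{13} - .01835\left(\frac{44{,}754}{13}\right) = 1{,}268.685$$

ŷ = 1,268.685 + .01835x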

r² for this model is .002858. There is no predictability in this model. Test for slope: t = 0.18 with a p-value of 0.8623. Not significant.

13.52
Σx = 323.3   Σy = 6,765.8   Σx² = 29,629.13   Σy² = 7,583,144.64   Σxy = 339,342.76   n = 7

$$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{339{,}342.76 - \frac{(323.3)(6{,}765.8)}{7}}{29{,}629.13 - \frac{(323.3)^2}{7}} = 1.82751$$

$$b_0 = \frac{\sum y}{n} - b_1\frac{\sum x}{n} = \frac{6{,}765.8}{7} - 1.82751\left(\frac{323.3}{7}\right) = 882.138$$

ŷ = 882.138 + 1.82751x

SSE = Σy² - b0Σy - b1Σxy = 7,583,144.64 - (882.138)(6,765.8) - (1.82751)(339,342.76) = 994,623.07

$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{994{,}623.07}{5}} = 446.01$$

$$r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = 1 - \frac{994{,}623.07}{7{,}583{,}144.64 - \frac{(6{,}765.8)^2}{7}} = 1 - .953 = .047$$

H0: β = 0   Ha: β ≠ 0   α = .05   t.025,5 = 2.571

$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 29{,}629.13 - \frac{(323.3)^2}{7} = 14{,}697.29$$

$$t = \frac{b_1 - 0}{\dfrac{s_e}{\sqrt{SS_{xx}}}} = \frac{1.82751 - 0}{\dfrac{446.01}{\sqrt{14{,}697.29}}} = 0.50$$

Since the observed t = 0.50 < t.025,5 = 2.571, the decision is to fail to reject the null hypothesis.

13.53 Let water use = y and temperature = x.
Σx = 608   Σx² = 49,584   Σy = 1,025   Σy² = 152,711   Σxy = 86,006   n = 8
b1 = 2.40107   b0 = -54.35604
ŷ = -54.35604 + 2.40107x

ŷ100 = -54.35604 + 2.40107(100) = 185.751

SSE = Σy² - b0Σy - b1Σxy = 152,711 - (-54.35604)(1,025) - (2.40107)(86,006) = 1,919.5146

$$s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{1{,}919.5146}{6}} = 17.886$$

$$r^2 = 1 - \frac{SSE}{\sum y^2 - \frac{(\sum y)^2}{n}} = 1 - \frac{1{,}919.5146}{152{,}711 - \frac{(1{,}025)^2}{8}} = 1 - .09 = .91$$

Testing the slope:
Ho: β = 0
Ha: β ≠ 0   α = .01
Since this is a two-tailed test, α/2 = .005   df = n - 2 = 8 - 2 = 6
t.005,6 = ±3.707

$$s_b = \frac{s_e}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}} = \frac{17.886}{\sqrt{49{,}584 - \frac{(608)^2}{8}}} = .30783$$

$$t = \frac{b_1 - \beta_1}{s_b} = \frac{2.40107 - 0}{.30783} = 7.80$$

Since the observed t = 7.80 > t.005,6 = 3.707, the decision is to reject the null hypothesis.

13.54 a) The regression equation is: ŷ = 67.2 - 0.0565x

b) For every unit of increase in the value of x, the predicted value of y will decrease by .0565.

c) The t ratio for the slope is -5.50 with an associated p-value of .000. This is significant at α = .10. The t ratio is negative because the slope is negative and the numerator of the t ratio formula equals the slope minus zero.

d) r² is .627, so 62.7% of the variability of y is accounted for by x. This is only a modest proportion of predictability. The standard error of the estimate is 10.32. This is best interpreted in light of the data and the magnitude of the data.

e) The F value, which tests the overall predictability of the model, is 30.25. For simple regression analysis, this equals the value of t², which is (-5.50)².

f) The negative correlation is not a surprise because the slope of the regression line is also negative, indicating an inverse relationship between x and y. In addition, taking the square root of r² = .627 yields approximately .79, which is the magnitude of the value of r (.7906) allowing for rounding error.
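Part (f) relies on two facts: |r| = √r², and r takes the sign of the slope because both share the numerator SSxy. A minimal sketch, assuming plain Python 3; since the Problem 13.54 printout data are not reproduced here, it uses the Problem 13.2 data (also a negative slope) purely as an illustration.

```python
import math

# Problem 13.2 data (negative slope), used here only as an illustration.
x = [140, 119, 103, 91, 65, 29, 24]
y = [25, 29, 46, 70, 88, 112, 128]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = ss_xy / ss_xx
r = ss_xy / math.sqrt(ss_xx * ss_yy)          # Pearson correlation, computed directly

sse = ss_yy - b1 * ss_xy
r2 = 1 - sse / ss_yy                          # coefficient of determination
r_from_r2 = math.copysign(math.sqrt(r2), b1)  # |r| = sqrt(r^2), sign taken from the slope

print(f"b1 = {b1:.4f}, r^2 = {r2:.3f}, r = {r:.4f}, signed sqrt(r^2) = {r_from_r2:.4f}")
```

For these data r² ≈ .972 (as in 13.27) and r ≈ -.986; the signed square root of r² recovers r exactly, so any mismatch between √r² and a printed r can only come from rounding.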

13.55 The F value for overall predictability is 7.12 with an associated p-value of .0205, which is significant at α = .05. It is not significant at α = .01. The coefficient of determination is .372 with an adjusted r² of .32. This represents very modest predictability. The standard error of the estimate is 982.219, which, in units of 1,000 laborers, means that about 68% of the predictions are within 982,219 of the actual figures. The regression model is: Number of Union Members = 22,348.97 - 0.0524 Labor Force. For a labor force of 100,000 (thousand, actually 100 million), substitute x = 100,000 and get a predicted value of 17,108.97 (thousand), which is actually 17,108,970 union members.

13.56 The Residual Model Diagnostics from MINITAB indicate a relatively healthy set of residuals. The Histogram indicates that the error terms are generally normally distributed. This is confirmed by the nearly straight-line Normal Plot of Residuals. The I Chart indicates a relatively homogeneous set of error terms throughout the domain of x values. This is confirmed by the Residuals vs. Fits graph. This residual diagnosis indicates no assumption violations.
