
Chapter 11: Inferential methods in Regression and Correlation

http://jonfwilkins.blogspot.com/2011_08_01_archive.html

Page 2: Chapter 11: Inferential methods in Regression and Correlation .

Example: distribution of y

The relationship between age and change in systolic blood pressure (BP, mm Hg) after 24 hours in response to a particular treatment has a linear regression equation of y = 20.11 – 0.526x + e with σ = 6.52.

a) What is the mean value of y when x = 30? x = 50? x = 70?

b) What is the standard deviation of y when x = 30? x = 50? x = 70?
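A worked sketch of parts a) and b), assuming the usual simple linear regression model in which the error e has mean 0 and the same standard deviation σ at every x (the notation \mu_{Y \cdot x} and \sigma_{Y \cdot x} is introduced here for illustration):

```latex
\begin{align*}
\mu_{Y \cdot 30} &= 20.11 - 0.526(30) = 4.33 \\
\mu_{Y \cdot 50} &= 20.11 - 0.526(50) = -6.19 \\
\mu_{Y \cdot 70} &= 20.11 - 0.526(70) = -16.71 \\
\sigma_{Y \cdot x} &= \sigma = 6.52 \quad \text{for } x = 30,\ 50,\ 70
\end{align*}
```

Under that assumption the mean of y changes with x, while the standard deviation does not.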

Page 3: Chapter 11: Inferential methods in Regression and Correlation .

Example: Estimating β0 and β1

The cetane number is a critical property in specifying the ignition quality of a fuel used in a diesel engine. Determining this number for a biodiesel fuel is expensive and time-consuming, so a way of predicting it is wanted. The data on the next slide are x = iodine value (g) and y = cetane number for a sample of 14 biofuels. The iodine value is the amount of iodine necessary to saturate a 100 g sample of oil.

a) What are the point estimates of β0 and β1?

b) What is a point estimate of the true average cetane number for a fuel whose iodine value is 100?

Page 4: Chapter 11: Inferential methods in Regression and Correlation .

Example: Estimating β0 and β1 (cont)

x:  132.0  129.0  120.0  113.2  105.0   92.0   84.0
y:   46.0   48.0   51.0   52.1   54.0   52.0   59.0

x:   83.2   88.4   59.0   80.0   81.5   71.0   69.2
y:   58.7   61.6   64.0   61.4   54.6   58.8   58.0

a) What are the point estimates of β0 and β1?

Page 5: Chapter 11: Inferential methods in Regression and Correlation .

Example: Estimating β0 and β1 (cont)

[Scatter plot: cetane number (vertical axis, 45 to 65) versus Iodine (g) (horizontal axis, 50 to 140).]

Page 6: Chapter 11: Inferential methods in Regression and Correlation .

Example: Estimating β0 and β1 (cont)

x:  132.0  129.0  120.0  113.2  105.0   92.0   84.0
y:   46.0   48.0   51.0   52.1   54.0   52.0   59.0

x:   83.2   88.4   59.0   80.0   81.5   71.0   69.2
y:   58.7   61.6   64.0   61.4   54.6   58.8   58.0

$\sum x_i = 1307.5$, $\sum y_i = 779.2$, $\sum x_i^2 = 128913.93$, $\sum y_i^2 = 43745.22$, $\sum x_i y_i = 71347.30$
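The point estimates in part a) can be obtained from the summary statistics alone. The following is a minimal sketch, not the course's own code; the variable names are chosen here for illustration, and it uses the standard least-squares formulas b1 = Sxy / Sxx and b0 = ȳ − b1 x̄.

```python
# Summary statistics for the cetane data (n = 14 biofuels).
n = 14
sum_x, sum_y = 1307.5, 779.2
sum_x2, sum_xy = 128913.93, 71347.30

# Corrected sum of squares and cross products.
s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least-squares point estimates of the slope and the intercept.
b1 = s_xy / s_xx
b0 = sum_y / n - b1 * sum_x / n

print(f"Sxx = {s_xx:.4f}, Sxy = {s_xy:.4f}")
print(f"b1 = {b1:.5f}, b0 = {b0:.5f}")
```

The results should agree with the SAS parameter estimates for the intercept and for iodine up to rounding.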

Page 7: Chapter 11: Inferential methods in Regression and Correlation .

Example: Estimating β0 and β1 (cont)

b) What is a point estimate of the true average cetane number for a fuel whose iodine value is 100?

[Scatter plot with fitted line: cetane number (vertical axis, 45 to 65) versus Iodine (g) (horizontal axis, 50 to 140). Fitted line: f(x) = −0.209387416490154 x + 75.2124319329198]
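A worked sketch of part b), substituting x = 100 into the fitted line reported on the plot (coefficients rounded here to four and six decimal places):

```latex
\hat{y} = 75.2124 - 0.209387(100) = 75.2124 - 20.9387 \approx 54.27
```

So the point estimate of the true average cetane number at an iodine value of 100 is about 54.27.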

Page 8: Chapter 11: Inferential methods in Regression and Correlation .

Example: Estimating β0 and β1

For the cetane number data (x = iodine value in g, y = cetane number, n = 14 biofuels):

c) Find the point estimate of the error standard deviation, σ.

d) What proportion of the observed variation in y can be attributed to the simple linear regression relationship between x and y?

Page 9: Chapter 11: Inferential methods in Regression and Correlation .

Example: Estimating β0 and β1 (cont)

$\sum x_i = 1307.5$, $\sum y_i = 779.2$, $\sum x_i^2 = 128913.93$, $\sum y_i^2 = 43745.22$, $\sum x_i y_i = 71347.30$

c) Find the point estimate of the error standard deviation, σ.

d) What proportion of the observed variation in y can be attributed to the simple linear regression relationship between x and y?
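Parts c) and d) follow from the same summary statistics. A minimal sketch, assuming the shortcut SSE = Syy − b1·Sxy and n − 2 error degrees of freedom; the variable names are illustrative only.

```python
import math

# Summary statistics for the cetane data (n = 14 biofuels).
n = 14
s_xx = 128913.93 - 1307.5 ** 2 / n
s_xy = 71347.30 - 1307.5 * 779.2 / n
s_yy = 43745.22 - 779.2 ** 2 / n      # total sum of squares (SST)

b1 = s_xy / s_xx                      # slope estimate
sse = s_yy - b1 * s_xy                # error sum of squares
s = math.sqrt(sse / (n - 2))          # point estimate of sigma
r2 = 1 - sse / s_yy                   # proportion of variation explained

print(f"SSE = {sse:.5f}, s = {s:.5f}, r^2 = {r2:.4f}")
```

These should match Root MSE and R-Square in the SAS output up to rounding.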

Page 10: Chapter 11: Inferential methods in Regression and Correlation .

Example: Estimating β0 and β1 (SAS)

The REG Procedure
Model: MODEL1
Dependent Variable: cetane

Number of Observations Read                  15
Number of Observations Used                  14
Number of Observations with Missing Values    1

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1        298.25443     298.25443     45.35   <.0001
Error             12         78.91986       6.57665
Corrected Total   13        377.17429

Root MSE          2.56450    R-Square   0.7908
Dependent Mean   55.65714    Adj R-Sq   0.7733
Coeff Var         4.60767

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             75.21243          2.98363     25.21     <.0001
iodine       1             -0.20939          0.03109     -6.73     <.0001

Page 11: Chapter 11: Inferential methods in Regression and Correlation .

Example: CI

For the cetane number data (x = iodine value in g, y = cetane number, n = 14 biofuels):

e) What is the 95% CI for the true slope?

Page 12: Chapter 11: Inferential methods in Regression and Correlation .

Example: Output (SAS)

The SAS System        09:20 Thursday, November 10, 2011   3

The REG Procedure
Model: MODEL1
Dependent Variable: cetane

Number of Observations Read                  15
Number of Observations Used                  14
Number of Observations with Missing Values    1

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1        298.25443     298.25443     45.35   <.0001
Error             12         78.91986       6.57665
Corrected Total   13        377.17429

Root MSE          2.56450    R-Square   0.7908
Dependent Mean   55.65714    Adj R-Sq   0.7733
Coeff Var         4.60767

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   95% Confidence Limits
Intercept    1             75.21243          2.98363     25.21     <.0001   68.71165   81.71321
iodine       1             -0.20939          0.03109     -6.73     <.0001   -0.27713   -0.14164

Sxx = 6802.7693
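The 95% confidence limits for the slope can be reproduced by hand from Root MSE and Sxx. This is a minimal sketch: the critical value 2.179 is taken from a standard t table for 12 degrees of freedom and is an assumption of this example, not a value printed in the SAS output.

```python
import math

b1 = -0.20939        # slope estimate (SAS parameter estimate for iodine)
root_mse = 2.56450   # s, the Root MSE from the SAS output
s_xx = 6802.7693     # Sxx as given on the slide
t_crit = 2.179       # t table value, upper 0.025 point with n - 2 = 12 df (assumed)

# Standard error of the slope, then b1 +/- t * se.
se_b1 = root_mse / math.sqrt(s_xx)
lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1

print(f"se(b1) = {se_b1:.5f}")
print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")
```

Because the interval lies entirely below zero, it is consistent with a negative linear relationship between iodine value and cetane number.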

Page 13: Chapter 11: Inferential methods in Regression and Correlation .

Example: Hypothesis test

For the cetane number data (x = iodine value in g, y = cetane number, n = 14 biofuels):

f) Is the model useful (that is, is there a useful linear relationship between x and y)?
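One way to set up part f) is the model utility t test on the slope. The sketch below uses the slope estimate and standard error from the SAS parameter estimates; the critical value 2.179 comes from a standard t table (12 df) and is an assumption of this example.

```latex
\begin{align*}
& H_0\colon \beta_1 = 0 \quad \text{versus} \quad H_a\colon \beta_1 \neq 0 \\
& t = \frac{b_1}{s_{b_1}} = \frac{-0.20939}{0.03109} \approx -6.73, \qquad df = n - 2 = 12 \\
& |t| = 6.73 > t_{0.025,\,12} = 2.179, \qquad P < 0.0001
\end{align*}
```

H0 is rejected, so the data support a useful linear relationship. In simple linear regression the ANOVA F statistic is the square of this t statistic, which is why F = 45.35 carries the same P-value.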

Page 14: Chapter 11: Inferential methods in Regression and Correlation .

Example: Hypothesis test (SAS)

[SAS REG Procedure output for the cetane data, identical to the output on Page 10.]

Page 15: Chapter 11: Inferential methods in Regression and Correlation .

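Summary Slide

Source               df      SS                                      MS
Model (Regression)   1       SSR = $\sum (\hat{y}_i - \bar{y})^2$
Error                n - 2   SSE = SST – b Sxy                       $\frac{SSE}{df_e} = \frac{SSE}{n - 2}$
Total                n - 1   SST = Syy

$$S_{yy} = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$$

$$S_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$

$$b = \frac{S_{xy}}{S_{xx}}$$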
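The summary formulas can be collected into a small helper that builds the ANOVA quantities for simple linear regression. This is a sketch under stated assumptions: the function name slr_anova is hypothetical, and the inputs are the cetane values of Sxx, Sxy and Syy computed from the summary statistics given earlier.

```python
def slr_anova(n, s_xx, s_xy, s_yy):
    """ANOVA quantities for simple linear regression from Sxx, Sxy, Syy."""
    b = s_xy / s_xx              # slope estimate
    sst = s_yy                   # total sum of squares
    sse = sst - b * s_xy         # error sum of squares
    ssr = sst - sse              # regression (model) sum of squares
    mse = sse / (n - 2)          # mean square error, df = n - 2
    return {"b": b, "SSR": ssr, "SSE": sse, "SST": sst,
            "MSE": mse, "F": ssr / mse}


# Cetane data: n = 14, with Sxy = -1424.4143 and Syy = 377.1743
# derived from the summary statistics.
table = slr_anova(14, 6802.7693, -1424.4143, 377.1743)
for name, value in table.items():
    print(f"{name}: {value:.5f}")
```

The F value should agree with the SAS ANOVA table for the cetane data up to rounding.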

Page 16: Chapter 11: Inferential methods in Regression and Correlation .

Example: ANOVA (SAS)

[SAS REG Procedure output for the cetane data, identical to the output on Page 10.]

Page 17: Chapter 11: Inferential methods in Regression and Correlation .

Example: Hypothesis test for ρ

For the cetane number data (x = iodine value in g, y = cetane number, n = 14 biofuels):

g) Is the model useful (that is, is there a useful linear relationship between x and y) using the population correlation coefficient?
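A worked sketch of part g), testing H0: ρ = 0 with the sample correlation coefficient. Sxy = −1424.4143 and Syy = 377.1743 are computed from the summary statistics given earlier, and the test statistic has n − 2 = 12 degrees of freedom.

```latex
\begin{align*}
r &= \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}
   = \frac{-1424.4143}{\sqrt{(6802.7693)(377.1743)}} \approx -0.8893 \\
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
   = \frac{-0.8893\sqrt{12}}{\sqrt{1-0.7908}} \approx -6.73
\end{align*}
```

This is the same t value as the slope test in the SAS output, so the two tests lead to the same conclusion.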

Page 18: Chapter 11: Inferential methods in Regression and Correlation .

Example: ANOVA (SAS)

[SAS REG Procedure output for the cetane data, identical to the output on Page 10.]

Page 19: Chapter 11: Inferential methods in Regression and Correlation .

Example: Hypothesis test for ρ (2)

In some locations, there is a strong association between concentrations of two different pollutants. The following data consist of the concentrations of x = ozone (ppm) and y = secondary carbon (μg/m³).

x:  0.066  0.088  0.120  0.050  0.162  0.186  0.057  0.100
y:    4.6   11.6    9.5    6.3   13.8   15.4    2.5   11.8

x:  0.112  0.055  0.154  0.074  0.111  0.140  0.071  0.110
y:    8.0    7.0   20.6   16.6    9.2   17.9    2.8   13.0

Page 20: Chapter 11: Inferential methods in Regression and Correlation .

Example: Hypothesis test for ρ (2)

[Plot: y versus x.]

Page 21: Chapter 11: Inferential methods in Regression and Correlation .

Example: Hypothesis test for ρ (2)

Page 22: Chapter 11: Inferential methods in Regression and Correlation .

Example: Hypothesis test for ρ (2)

Using the population correlation coefficient, is this model useful?

The summary statistics are:

$\sum x_i = 1.656$, $\sum y_i = 170.6$, $\sum x_i^2 = 0.196912$, $\sum y_i^2 = 2253.56$, $\sum x_i y_i = 20.0397$
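The test of H0: ρ = 0 for the ozone and secondary carbon data can be sketched from the summary statistics (n = 16 pairs). The variable names are illustrative, and the statistic t = r√(n − 2)/√(1 − r²) is the standard one with n − 2 degrees of freedom.

```python
import math

# Summary statistics for the pollutant data (n = 16 pairs).
n = 16
sum_x, sum_y = 1.656, 170.6
sum_x2, sum_y2, sum_xy = 0.196912, 2253.56, 20.0397

s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Sample correlation coefficient and the t statistic for H0: rho = 0.
r = s_xy / math.sqrt(s_xx * s_yy)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, t = {t_stat:.2f} on {n - 2} df")
```

The resulting r² and t should match R-Square and the t value for x in the SAS output for these data.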

Page 23: Chapter 11: Inferential methods in Regression and Correlation .

Example: Hypothesis test for ρ (2)

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1        222.47934     222.47934     14.69   0.0018
Error             14        212.05816      15.14701
Corrected Total   15        434.53750

Root MSE          3.89192    R-Square   0.5120
Dependent Mean   10.66250    Adj R-Sq   0.4771
Coeff Var        36.50097

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              0.99801          2.70292      0.37     0.7175
x            1             93.37670         24.36448      3.83     0.0018

Page 24: Chapter 11: Inferential methods in Regression and Correlation .

Example: Hypothesis test for ρ (2)

[SAS REG Procedure output for the cetane data, identical to the output on Page 10.]

Page 25: Chapter 11: Inferential methods in Regression and Correlation .

Example: Hypothesis test for ρ (2)

Page 26: Chapter 11: Inferential methods in Regression and Correlation .

Example: Hypothesis test for ρ (2)

Page 27: Chapter 11: Inferential methods in Regression and Correlation .

Example: Multiple Linear Regression

It is important to know how long a tool will last (min) in an industrial setting. The cutting tool in this study is used to cut a particular type and size of cold-rolled steel. The predictors of interest are x1 = cutting speed (feet/min), x2 = feed rate (in/revolution) and x3 = depth of cut (in). The predicted model is

y = 101.765 – 0.0958 x1 – 667.972 x2 – 472.304 x3 + e

a) What is the mean life of a tool that is being used to cut to a depth of 0.03 inch at a speed of 450 feet/min with a feed rate of 0.01 in/revolution?

b) What is the interpretation of β1 = -0.0958? Of β2 = -667.972? Of β3 = -472.304?
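Part a) is a direct substitution into the fitted equation, and part b) can be illustrated by changing one predictor while holding the others fixed. A minimal sketch; the function name predicted_tool_life is hypothetical.

```python
def predicted_tool_life(speed, feed, depth):
    """Estimated mean tool life (min) from the fitted model on the slide."""
    return 101.765 - 0.0958 * speed - 667.972 * feed - 472.304 * depth


# a) speed = 450 feet/min, feed = 0.01 in/revolution, depth = 0.03 in
life = predicted_tool_life(speed=450, feed=0.01, depth=0.03)
print(f"Estimated mean life: {life:.2f} min")

# b) Raising speed by 1 foot/min, with feed and depth held fixed,
#    changes the estimated mean life by the speed coefficient.
change = predicted_tool_life(451, 0.01, 0.03) - life
print(f"Change per extra foot/min of speed: {change:.4f} min")
```

The same reading applies to the other two coefficients: each is the estimated change in mean life per one-unit increase in that predictor with the other two held constant.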

Page 28: Chapter 11: Inferential methods in Regression and Correlation .

Example: Polynomial Regression

Suppose the mean daily peak load (MW) for a power plant and the maximum outdoor temperature (°F) for a sample of 10 days are given below.

a) What is the estimated regression line using a quadratic regression model? Besides the equation of the line, include the values of adj. r² and se.

b) Using the line, predict the required peak power if the temperature is 98 °F.

xi (°F):   95   82   90   81   99  100   93   95   93   97
yi (MW):  214  152  156  129  254  266  210  204  213  150

Page 29: Chapter 11: Inferential methods in Regression and Correlation .

Example: Polynomial Regression (SAS)

/* Add the squared temperature term needed for the quadratic model */
data newpower;
    set power;
    temp2 = temp*temp;
run;

/* Regress load on temp and temp2; save the residuals as res in data set fit */
proc reg data=newpower;
    model load = temp temp2;
    output out=fit r=res;
run;

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2            18089    9044.26725     53.88   <.0001
Error              7       1175.06549     167.86650
Corrected Total    9            19264

Root MSE          12.95633    R-Square   0.9390
Dependent Mean   194.80000    Adj R-Sq   0.9216
Coeff Var          6.65109

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1           1784.18833        944.12303      1.89     0.1007
temp         1            -42.38624         21.00079     -2.02     0.0833
temp2        1              0.27216          0.11634      2.34     0.0519

Page 30: Chapter 11: Inferential methods in Regression and Correlation .

Example: Polynomial Regression (cont)

b) Using the line, predict the required peak power if the temperature is 98 °F.

            line                                  adj r²   se
linear      y = -419.8 + 6.7 temp                 0.88     16.18
quadratic   y = 1784 - 42.4 temp + 0.27 temp²     0.92     12.96
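A sketch of the prediction in part b). The quadratic coefficients are the full-precision parameter estimates from the SAS output, while the linear coefficients are only available in rounded form from the comparison table; the function names are illustrative.

```python
def linear_fit(temp):
    """Linear model, using the rounded coefficients from the comparison table."""
    return -419.8 + 6.7 * temp


def quadratic_fit(temp):
    """Quadratic model, using the SAS parameter estimates."""
    return 1784.18833 - 42.38624 * temp + 0.27216 * temp ** 2


pred_linear = linear_fit(98)
pred_quadratic = quadratic_fit(98)

print(f"Linear prediction at 98 F:    {pred_linear:.1f} MW")
print(f"Quadratic prediction at 98 F: {pred_quadratic:.1f} MW")
```

The quadratic prediction is sensitive to rounding. With the coefficients rounded as in the table (1784, -42.4, 0.27) the same calculation gives about 221.9 MW, so the unrounded estimates are the safer choice for prediction.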

Page 31: Chapter 11: Inferential methods in Regression and Correlation .

Residual Plots

[Residual plot: residual (vertical axis, -30 to 30) versus temperature in °F (horizontal axis, 80 to 100).]

Page 32: Chapter 11: Inferential methods in Regression and Correlation .

Interaction Effect

Page 33: Chapter 11: Inferential methods in Regression and Correlation .

I love statistics! Thank you for not eating me!

Page 34: Chapter 11: Inferential methods in Regression and Correlation .

Example: Multiple Regression (Qualitative Predictors)

A study is conducted to determine the effects of x1 = company size and x2 = the presence (1) or absence (0) of a safety program on y = the number of work hours lost due to work-related accidents (thousands). Twenty companies with no active safety program and 20 companies with an active safety program were randomly chosen. The SAS file (qualpred.txt) is on the class notes web site. The estimated regression line is

y = 31.6244 + 0.01428 x1 – 58.0779 x2 + e

What are the interpretations of β1 = 0.01428 and β2 = -58.0779?
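One way to read the two coefficients is to write the fitted equation separately for each value of the indicator x2. This is a sketch that assumes no interaction between company size and the safety program, as in the model shown.

```latex
\begin{align*}
x_2 = 0 \text{ (no program):} \quad & \hat{y} = 31.6244 + 0.01428\,x_1 \\
x_2 = 1 \text{ (program):} \quad & \hat{y} = (31.6244 - 58.0779) + 0.01428\,x_1 = -26.4535 + 0.01428\,x_1
\end{align*}
```

The two lines are parallel. The slope 0.01428 is the estimated change in lost work hours (thousands) per one-unit increase in company size for either group, and -58.0779 is the estimated difference in lost work hours (thousands) between companies with and without a safety program at the same company size.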

Page 35: Chapter 11: Inferential methods in Regression and Correlation .

Conceptual Understanding

[Diagram: Total Variation of Y, with regions labeled X1, X2 and X3.]

Page 36: Chapter 11: Inferential methods in Regression and Correlation .

ANOVA table - MLR

Source               df          SS                 MS
Model (Regression)   k           SSM (from data)    $\frac{SSM}{df_m} = \frac{SSM}{k}$
Error                n – k - 1   SSE (from data)    $\frac{SSE}{df_e} = \frac{SSE}{n - k - 1}$
Total                n - 1       SST (from data)
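The mean squares in this table lead directly to the overall F statistic, R² and adjusted R². A minimal sketch; the function name mlr_anova is hypothetical, and the numbers plugged in are the sums of squares from the tool-life SAS output shown later in the deck (n = 24 observations, k = 3 predictors).

```python
def mlr_anova(ssm, sse, n, k):
    """Mean squares, F, R^2 and adjusted R^2 for a model with k predictors."""
    msm = ssm / k                  # model mean square, df = k
    mse = sse / (n - k - 1)        # error mean square, df = n - k - 1
    sst = ssm + sse                # total sum of squares, df = n - 1
    r2 = ssm / sst
    adj_r2 = 1 - mse / (sst / (n - 1))
    return {"MSM": msm, "MSE": mse, "F": msm / mse,
            "R2": r2, "adj_R2": adj_r2}


res = mlr_anova(ssm=2743.82814, sse=874.13019, n=24, k=3)
for name, value in res.items():
    print(f"{name}: {value:.4f}")
```

The printed values should agree with the F Value, R-Square and Adj R-Sq lines of that output up to rounding.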

Page 37: Chapter 11: Inferential methods in Regression and Correlation .

Example: Multiple Linear Regression

It is important to know how long a tool will last (min) in an industrial setting. The cutting tool in this study is used to cut a particular type and size of cold-rolled steel. The predictors of interest are x1 = cutting speed (feet/min), x2 = feed rate (in/revolution) and x3 = depth of cut (in).

a) Is there a useful linear relationship between the cutting tool lifetime and the predictors?

Page 38: Chapter 11: Inferential methods in Regression and Correlation .

Example: MLR (cont)

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3       2743.82814     914.60938     20.93   <.0001
Error             20        874.13019      43.70651
Corrected Total   23       3617.95833

Root MSE          6.61109    R-Square   0.7584
Dependent Mean   38.54167    Adj R-Sq   0.7222
Coeff Var        17.15310

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            101.76536          8.33310     12.21     <.0001
speed        1             -0.09578          0.01426     -6.72     <.0001
feed         1           -667.97241        386.23081     -1.73     0.0991
depth        1           -472.30426        161.81434     -2.92     0.0085

Page 39: Chapter 11: Inferential methods in Regression and Correlation .

Example: MLR (cont)

[SAS output for the full tool-life model, identical to the output on Page 38.]

Page 40: Chapter 11: Inferential methods in Regression and Correlation .

Conceptual Understanding

[Diagram: Total Variation of Y, with regions labeled X1, X2 and X3.]

Page 41: Chapter 11: Inferential methods in Regression and Correlation .

Example: MLR (backwards elimination)

[SAS output for the full tool-life model (speed, feed, depth), identical to the output on Page 38.]

Page 42: Chapter 11: Inferential methods in Regression and Correlation .

Example: MLR (backwards elimination) (cont)

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2       2613.09992    1306.54996     27.30   <.0001
Error             21       1004.85841      47.85040
Corrected Total   23       3617.95833

Root MSE          6.91740    R-Square   0.7223
Dependent Mean   38.54167    Adj R-Sq   0.6958
Coeff Var        17.94784

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             95.88869          7.96137     12.04     <.0001
speed        1             -0.09543          0.01492     -6.40     <.0001
depth        1           -500.32482        168.46077     -2.97     0.0073

Page 43: Chapter 11: Inferential methods in Regression and Correlation .

Example: MLR (backwards elimination) (cont)

              full                          w/o feed
line          y = 101.77 – 0.096 speed      y = 95.89 – 0.095 speed
                  – 667.97 feed                 – 500.32 depth
                  – 472.30 depth
R²            0.7584                        0.7223
adj R²        0.7222                        0.6958

ANOVA table (df, SS, MS)

              full                          w/o feed
Model          3   2743.83   914.61          2   2613.10   1306.55
Error         20    874.13    43.71         21   1004.86     47.85
Total         23   3617.96                  23   3617.96

Page 44: Chapter 11: Inferential methods in Regression and Correlation .

Example: MLR (backwards elimination) (cont)

                   full     full - P    w/o      w/o - P
F-test             20.93    <0.0001     27.30    <0.0001
t-tests: speed     -6.72    <0.0001     -6.40    <0.0001
t-tests: feed      -1.73    0.0991
t-tests: depth     -2.92    0.0085      -2.97    0.0073
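The decision to drop feed can also be checked with a partial F test that compares the error sums of squares of the two models. This is not shown on the slides; it is offered as a sketch, using the SSE and error df values from the two SAS outputs. With a single dropped predictor, the partial F equals the square of that predictor's t statistic in the full model, up to rounding.

```python
# Error sums of squares and error df from the two SAS outputs.
sse_full, df_full = 874.13019, 20          # model with speed, feed, depth
sse_reduced, df_reduced = 1004.85841, 21   # model without feed

mse_full = sse_full / df_full

# Partial F: extra SSE per dropped predictor, relative to the full-model MSE.
partial_f = ((sse_reduced - sse_full) / (df_reduced - df_full)) / mse_full

t_feed = -1.73                             # t value for feed in the full model
print(f"partial F = {partial_f:.3f}, t_feed^2 = {t_feed ** 2:.3f}")
```

Both routes give the same P-value (0.0991 for feed), so the t test in the full model and the comparison of the two models tell the same story.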