M24- Std Error & r-square 1 Department of ISM, University of Alabama, 1992-2003
Lesson Objectives
Understand how to calculate and interpret the “r-square” value.
Understand how to calculate and interpret the “standard error of regression”.
Learn more about doing regression in Minitab.
Two measures of “How Well Does the Line Fit the Data?”
1. Standard Error of Estimation = SQRT of (Mean Square Error)
2. r-Square
Variation in the Y values can be split into identifiable parts:
SST = SSR + SSE
total variation in Y = variation accounted for by the regression + variation unaccounted for by the regression
Without X variable information:
SST is the sum of squared deviations from the mean of Y.
[Figure: plot of Y against the X-axis]
Note: This is the concept. You will NOT calculate this way.
Using X variable information:
SSE is the sum of squared deviations from the regression line. Each deviation is a “residual”.
[Figure: plot of Y against the X-axis with the fitted values Y^]
Note: This is the concept. You will NOT calculate this way.
Calculations
Total variation: SST = (n – 1) s_y^2
Unaccounted for by regression: SSE = Σ e_i^2
Accounted for by regression: SSR = SST – SSE
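As a rough illustration of the three calculations above, the following Python sketch fits a least-squares line to a small made-up data set and computes SST, SSE, and SSR. The data values and variable names are assumptions chosen for illustration; they are not the weight-height data used in the lecture.

```python
# Hypothetical (x, y) data, invented only to illustrate the bookkeeping.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept for the line y_hat = b0 + b1 * x.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residuals: observed y minus fitted y.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Total variation: SST = (n - 1) * s_y^2.
s_y_squared = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
sst = (n - 1) * s_y_squared

# Unaccounted for by regression: SSE = sum of squared residuals.
sse = sum(e ** 2 for e in residuals)

# Accounted for by regression: SSR = SST - SSE.
ssr = sst - sse

print(f"SST = {sst:.2f}, SSE = {sse:.2f}, SSR = {ssr:.2f}")
```

In practice the regression output reports these sums of squares directly, so this kind of hand calculation is mainly useful for checking that the pieces add up.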
Weight vs. Height example (Example 1, continued):
SSE = 868.06
SST = 4858.00
SSR = SST – SSE = 3989.94
See file M22 & file M23; or use computer output!
Mean Square Error (MSE)
MSE = Σ e_i^2 / (n – 2)
Example 1, continued
Mean Square Error (MSE)
MSE = SSE / (n – 2)
Standard Error of Estimation:
s = SQRT(MSE) = SQRT(289.35) = 17.0 lb.
Estimate of “Std. Dev. around the fitted line.”
Example 1, continued
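The arithmetic for the weight-height example can be sketched in Python as follows. SSE is taken from the slides; n = 5 is an assumption inferred from the 4 total degrees of freedom (n – 1) shown in the Analysis of Variance table for this example.

```python
import math

sse = 868.06  # SSE for the weight-height example, from the slides
n = 5         # assumption: inferred from Total DF = n - 1 = 4

# Mean Square Error: SSE divided by its degrees of freedom, n - 2.
mse = sse / (n - 2)

# Standard error of estimation: square root of MSE.
s = math.sqrt(mse)

print(f"MSE = {mse:.2f}")
print(f"s = {s:.1f} lb")
```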
r^2 = the “r-square” value
“is the fraction of the total variation of Y accounted for by using regression.”
r^2 = (variation of Y “accounted for”) / (total variation of Y)
or
r^2 = SSR / SST
0 ≤ r^2 ≤ 1.0
r^2 = 0.0: no regression effect; X is NOT useful.
r^2 = 1.0: perfect fit to the data; X is USEFUL!
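The two extremes can be seen on tiny made-up data sets. The sketch below is only an illustration: the helper name r_squared and both data sets are assumptions, chosen so that one set lies exactly on a line and the other has a fitted slope of zero.

```python
def r_squared(x, y):
    """Fit a least-squares line and return SSR / SST."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return (sst - sse) / sst

x = [1.0, 2.0, 3.0, 4.0]

# Hypothetical data lying exactly on y = 1 + 2x: a perfect fit.
y_perfect = [3.0, 5.0, 7.0, 9.0]

# Hypothetical data whose fitted slope is zero: X tells us nothing about Y.
y_no_effect = [2.0, 4.0, 4.0, 2.0]

r2_perfect = r_squared(x, y_perfect)
r2_no_effect = r_squared(x, y_no_effect)

print(f"perfect fit:   r^2 = {r2_perfect:.2f}")
print(f"no regression: r^2 = {r2_no_effect:.2f}")
```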
Calculating r^2 for Wt vs. Ht
r^2 = SSR / SST = 3989.94 / 4858 = .8213
or, have the computer do it for you!
Example 1, continued
Equivalently,
r^2 = 1.0 – SSE / SST
where SSE / SST is the fraction of the total variation “UNaccounted for”.
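The two forms of r^2 can be checked against each other using the sums of squares quoted for the weight-height example. This is a small sketch of that check; the variable names are arbitrary.

```python
# Sums of squares quoted in the slides for the weight-height example.
sst = 4858.00
sse = 868.06
ssr = sst - sse

# Form 1: fraction of total variation accounted for.
r2_from_ssr = ssr / sst

# Form 2: one minus the fraction of total variation unaccounted for.
r2_from_sse = 1.0 - sse / sst

print(f"SSR / SST     = {r2_from_ssr:.4f}")
print(f"1 - SSE / SST = {r2_from_sse:.4f}")
```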
Equivalently,
r^2 = (correlation)^2
For the weight-height data: .8213 = (.9063)^2
r^2 is also called the “coefficient of determination”.
Example 1, continued
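For a regression with a single X-variable, squaring the correlation gives the same number as SSR / SST. The sketch below checks this on made-up data (the data and function names are assumptions, not lecture material), then recovers the correlation quoted above from the r^2 value in the slides.

```python
import math

# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def correlation(x, y):
    """Pearson correlation coefficient of x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared(x, y):
    """SSR / SST from a least-squares line fitted to (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return (sst - sse) / sst

r = correlation(x, y)
r2 = r_squared(x, y)
print(f"correlation^2 = {r ** 2:.4f}, SSR/SST = {r2:.4f}")

# Weight-height example: the correlation implied by r^2 = .8213.
r_weight_height = math.sqrt(0.8213)
print(f"sqrt(.8213) = {r_weight_height:.4f}")
```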
For the weight-height data:
r^2 = .8213
Interpretation:
“82.1% of the total variation of the body weights is accounted for by using height as a predictor variable.”
Example 1, continued
L.O.P.
r^2 interpretation in general:
“____% of the total variation of the Y-variable is accounted for by using the X-variable as a predictor variable.”
L.O.P.
Std. Error of Estimation:
s = SQRT(MSE) = SQRT(289.35) = 17.0 lb.
Interpretation:
“The estimated std. dev. of body weights around the regression line is 17.0 pounds.”
Example 1, continued
L.O.P.
Std. Error of Estimation, interpretation in general:
“The estimated std. dev. of the Y-variable around the regression line is ____ units.”
L.O.P.
Analysis of Variance Table

Source of Variation   degrees of freedom   Sum of Squares   Mean Squares (MS = SS/df)   F-Ratio
Regression            1*                   SSR              MSR                         F = MSR/MSE
Error                 n – 2**              SSE              MSE
Total                 n – 1                SST              S_Y^2

* Number of X-variables used, “k”
** n – 1 – k
Analysis of Variance Table

Source of Variation   degrees of freedom   Sum of Squares   Mean Squares (MS = SS/df)   F-Ratio (F = MSR/MSE)
Regression            1                    3989.94          3989.94                     13.79
Error                 3                    868.06           289.35
Total                 4                    4858.00          1214.50

Variance of Y without X: 1214.50 (Total mean square)
Variance of Y with X: 289.35 (Mean Square Error)
Example 1, continued
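The table entries follow mechanically from SSR, SSE, n, and k. A minimal Python sketch of that bookkeeping is shown below; the function name anova_table and its dictionary layout are assumptions made for illustration, and n = 5 is inferred from the 4 total degrees of freedom in the table.

```python
def anova_table(ssr, sse, n, k=1):
    """Build ANOVA rows for a regression with k X-variables and n observations."""
    df_regression = k
    df_error = n - 1 - k
    df_total = n - 1
    sst = ssr + sse
    msr = ssr / df_regression  # MS = SS / df
    mse = sse / df_error
    return {
        "Regression": {"df": df_regression, "SS": ssr, "MS": msr, "F": msr / mse},
        "Error": {"df": df_error, "SS": sse, "MS": mse},
        "Total": {"df": df_total, "SS": sst, "MS": sst / df_total},
    }

# Weight-height example: SSR and SSE from the slides.
table = anova_table(ssr=3989.94, sse=868.06, n=5)

for source, row in table.items():
    cells = "  ".join(f"{name}={value:.2f}" for name, value in row.items())
    print(f"{source:<10}  {cells}")
```

The Total mean square is the ordinary sample variance of Y, which is why it can be read as the variance of Y without X, while MSE plays the same role once X is used.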
If we have data for the response variable, but no knowledge of an X-variable, what is the best estimate of the mean of Y?
We now have data for both Y and X. What is the best estimate of the mean of Y?
[Figure: scatterplot of Y against X] “High” r^2, Low Std. Err.
[Figure: scatterplot of Y against X] Lower r^2, Higher Std. Err.
[Figure: scatterplot of Y against X]
r^2 = ____, Std. Err. = ____
Why?
SSE = Σ e_i^2 = ____
SST = ____
More Regression Analysis in Minitab
Example 4: Can the “depth” of lakes be estimated using “surface area”?
Lakes in Vilas and Oneida counties in northern Wisconsin from the years 1959-1963.
Example 4: Estimate depth of lakes using surface area?
Max. depth in feet; surface area in acres. Data in Mtbwin/data/lake.

Regression Analysis

The regression equation is
Depth = 28.2 + 0.00726 Area

Analysis of Variance
Source       DF        SS      MS     F      P
Regression    1     914.9   914.9  2.88  0.094
Error        69   21891.0   317.3
Total        70   22805.9

Could “area” have a true coefficient that is actually “zero”?
The P-value for “surface area” IS SMALL (<.10).
Conclusion: The “area” coefficient is NOT zero! “Surface area” IS a useful predictor of the mean of “depth”.
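Both measures of fit can be read off the Analysis of Variance output for the lake data. The following sketch does the arithmetic with the numbers from the Minitab output above; only the variable names are invented.

```python
import math

# Values from the Minitab Analysis of Variance output (n = 71 lakes).
ss_regression = 914.9
ss_total = 22805.9
ms_error = 317.3

# r-square: fraction of total variation in depth accounted for by area.
r_squared = ss_regression / ss_total

# Standard error of estimation: square root of the Mean Square Error.
s = math.sqrt(ms_error)

print(f"R-Sq = {100 * r_squared:.1f}%")
print(f"s = {s:.2f} feet")
```

A small p-value together with an r-square this low is a reminder that "the coefficient is not zero" and "the line fits the points well" are separate questions.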
Example 4
[Figure: scatterplot of Depth of Lakes (feet) vs. Surface Area (acres), with “2s” marked]
Where would the line be if the outlier is removed? ______________.
Example 4: Lakes in northern Wisconsin
Analysis Diary
Step  Y      X     s      r-sqr  Comments
1     Depth  Area  17.81  4.00%  n = 71 lakes. Most lakes have area less than 900 acres. Large lakes dominate the line. Although p-value is small, the line does not fit the points well. Eliminate large lakes; re-run.
Example 4: Estimate depth of lakes using surface area? (n = 66 lakes)
Max. depth in feet; surface area in acres. Data in Mtbwin/data/lake.

The regression equation is
Depth = 25.3 + 0.0226 Area

Predictor      Coef  SE Coef     T      P
Constant     25.325    3.380  7.49  0.000
Area        0.02265  0.01454  1.56  0.124

S = 18.00   R-Sq = 3.7%

Analysis of Variance
Source          DF        SS     MS     F      P
Regression       1     785.8  785.8  2.43  0.124
Residual Error  64   20726.1  323.8
Total           65   21511.9
Example 4: Estimate depth of lakes using surface area? (n = 66 lakes)
[Figure: Regression Plot of Depth vs. Area, with “2s” marked]
Depth = 25.3253 + 0.0226494 Area
S = 17.9957   R-Sq = 3.7 %   R-Sq(adj) = 2.1 %
Example 4: Lakes < 900 acres in northern Wisconsin
Analysis Diary
Step  Y      X     s      r-sqr  Comments
1     Depth  Area  17.81  4.00%  n = 71 lakes. Most lakes have area less than 900 acres. Large lakes dominate. Although p-value is small, the line does not fit the points well. Eliminate large lakes; re-run.
2     Depth  Area  18.00  3.70%  n = 66 lakes. Lakes larger than 900 acres in surface area are removed and the population is redefined. The p-value for “area” is 0.124. “Surface area” is NOT a good predictor of lake “depth.”
Example 5: How helpful is “engine size” for estimating “mpg”?
Example 5: How helpful is engine size for estimating mpg?
displacement in cubic in.; mpg_city in ???. Data in Car89 Data.

Regression Analysis

The regression equation is
mpg_city = 29.3 - 0.0480 displace

113 cases used 4 cases contain missing values

Predictor Coef StDev T P

Analysis of Variance
Source       DF      SS      MS       F      P
Regression    1  1106.1  1106.1  133.33  0.000
Error       111   920.8     8.3
Total       112  2026.9

The P-value for “displacement” IS SMALL (<.10).
Conclusion: The “displacement” coefficient is NOT zero! “Displacement” IS a useful predictor of the mean of “mpg_city”. (But, …
“t” measures how many standard errors the estimated coefficient is from “zero.”
P-value: the probability of getting an estimate at least this many standard errors from “zero” if the true coefficient were “zero.”
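The “t” calculation can be sketched with the “Area” row of the n = 66 lake output from Example 4, where Minitab reports Coef = 0.02265 and SE Coef = 0.01454. With a single X-variable the square of this t-ratio equals the F-ratio in the Analysis of Variance table, which gives a quick consistency check. Variable names are arbitrary.

```python
# "Area" row of the Minitab output for the 66 lakes under 900 acres.
coef = 0.02265
se_coef = 0.01454

# t: how many standard errors the estimated coefficient is from zero.
t_ratio = coef / se_coef

# With one X-variable, t squared matches the F-ratio from the ANOVA table.
f_from_t = t_ratio ** 2

print(f"t = {t_ratio:.2f}")
print(f"t^2 = {f_from_t:.2f}")
```

These agree with the T = 1.56 and F = 2.43 printed in that output.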
Example 5: mpg_city vs. displacement
[Figure: scatterplot of mpg_city vs. displace]
S = 2.88. Is this a good fit? The data pattern appears curved; we can do better!
Example 5: mpg_city vs. displacement
Plot of residuals vs. Y-hats
[Figure: plot of residuals (RESI1) vs. fitted values (FITS1)]
S = 2.88
Apply a transformation in the next section.
Example 5: “mpg_city” versus engine “displacement”
Analysis Diary
Step  Y    X        s      r-sqr  Comments
1     mpg  displac  2.880  54.6%  Slope of “displacement” is not zero; but plot indicates a curved pattern. Transform a variable and re-run.
2     to be done in next section.
Example 6
Which variable is a better predictor of the rating of professional football quarterbacks, percent of touchdown passes or percent of interceptions?
Page 626, Problem 15.23