Regression
Mar 19, 2016
[Figure: plot of Weight vs. Height, with Height marked from 60 to 68]
How much would an adult female weigh if she were 5 feet
tall?
She could weigh varying amounts –
in other words, there is a
distribution of weights for adult
females who are 5 feet tall.
This distribution is normally distributed (we hope).
What would you expect for other heights?
Where would you expect the TRUE LSRL
to be?
What about the standard deviations of
all these normal
distributions?
[Figure: normal distributions of weight at heights 60 to 68, centered on the true regression line $\mu_y = \alpha + \beta x$]
We want the standard deviations of all these normal distributions to be
the same.
Regression Model
• The mean response $\mu_y$ has a straight-line relationship with x: $\mu_y = \alpha + \beta x$
  – Where: slope $\beta$ and intercept $\alpha$ are unknown parameters
• For any fixed value of x, the response y varies according to a normal distribution. Repeated responses of y are independent of each other.
• The standard deviation of y ($\sigma_y$) is the same for all values of x. ($\sigma_y$ is also an unknown parameter)
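The model can be made concrete with a small simulation. The sketch below is only an illustration: the values of ALPHA, BETA, and SIGMA are hypothetical (they do not come from these notes), and the function names are made up for this example. It draws independent, normally distributed weights at each height, all with the same standard deviation, and compares the sample mean at each height with the mean on the true line.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
ALPHA = -100.0   # assumed true intercept (pounds)
BETA = 3.75      # assumed true slope (pounds per inch)
SIGMA = 15.0     # assumed common standard deviation of y at every x


def true_mean_weight(height):
    """Mean response mu_y = alpha + beta * x on the true regression line."""
    return ALPHA + BETA * height


def simulate_weights(height, n, rng):
    """Draw n independent weights for one fixed height.

    Each draw is normal with mean mu_y and the same sigma for every height.
    """
    mu = true_mean_weight(height)
    return [rng.gauss(mu, SIGMA) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = {h: simulate_weights(h, 2000, rng) for h in (60, 62, 64, 66, 68)}

for h, weights in samples.items():
    sample_mean = sum(weights) / len(weights)
    # Columns: height, mean on the true line, mean of the simulated weights
    print(h, round(true_mean_weight(h), 1), round(sample_mean, 1))
```

With many draws per height, each sample mean should land close to the true line, while individual weights still vary around it.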
Person #   Ht   Wt
1          64   130
10         64   175
15         64   150
19         64   125
21         64   145
40         64   186
47         64   121
60         64   137
63         64   143
68         64   120
70         64   112
78         64   108
83         64   160
Suppose we look at part of a population of adult women. These women are all 64 inches tall. What distribution does their weight have?
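As a rough check on this question, the 13 weights listed for the 64-inch women can be summarized directly. This is a minimal sketch using Python's standard library; the variable names are chosen for this example only. The sample mean estimates the mean response at x = 64, and the sample standard deviation estimates the spread of weights at that height.

```python
import statistics

# Weights (lb) of the 13 women in the table, all 64 inches tall.
weights_at_64 = [130, 175, 150, 125, 145, 186, 121, 137, 143, 120, 112, 108, 160]

mean_weight = statistics.mean(weights_at_64)   # estimates mu_y at x = 64
sd_weight = statistics.stdev(weights_at_64)    # sample SD, estimates sigma_y

print(f"n = {len(weights_at_64)}")
print(f"mean = {mean_weight:.1f} lb, SD = {sd_weight:.1f} lb")
```

Thirteen values are too few to judge normality with confidence, so a plot of these weights is still worth making.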
• The slope b of the LSRL is an unbiased estimator of the true slope $\beta$.
• The intercept a of the LSRL is an unbiased estimator of the true intercept $\alpha$.
• The standard error s estimates the true standard deviation of y ($\sigma_y$); $s^2$ is an unbiased estimator of $\sigma_y^2$.
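We use $\hat{y} = a + bx$ to estimate $\mu_y = \alpha + \beta x$.
$s = \sqrt{\frac{\sum \text{residuals}^2}{n-2}} = \sqrt{\frac{\sum (y - \hat{y})^2}{n-2}}$
Note: df = n - 2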
Notes!
For a study on student drinking and blood alcohol level, sixteen student volunteers at Ohio State University drank a randomly assigned number of cans of beer. Thirty minutes later, a police officer measured their blood alcohol content (BAC). The results are shown below:
Student   # of beers   BAC
1         5            .10
2         2            .03
3         9            .19
4         8            .12
5         3            .04
6         7            .095
7         3            .07
8         5            .06
9         3            .02
10        5            .05
11        4            .07
12        6            .10
13        5            .085
14        7            .09
15        1            .01
16        4            .05
Use your calculator to find a regression equation (ax + b) for this data. State your equation using descriptive notation.
What does the value a represent in the context of this problem?
We would like to create a confidence interval for the slope of the regression line.
In other words, we want to know ___________.
Conditions for regression inference
• For any fixed value of x, the response variable y varies normally about the true regression line.
  – Check a histogram or boxplot of residuals
• The mean response, $\mu_y$, has a straight-line relationship with x: $\mu_y = \alpha + \beta x$
  – Check the scatter plot & residual plot
• The standard deviation of y is the same for all values of x.
  – Check the scatter plot & residual plot
For problems involving inference for regression, we use a ___________.
[Figure: plot of Weight vs. Height, with Height marked from 60 to 68]
Suppose the LSRL is a horizontal line. Would height be useful in predicting weight?
What is the slope of a horizontal
line?
A slope of zero means that there is NO linear relationship between x & y!
Formulas:
• Confidence Interval:
  CI = statistic ± (critical value)(SD of statistic)
  $b \pm t^* SE_b$, with df = n - 2
  Because there are two unknowns, $\alpha$ & $\beta$
  $SE_b$ = the standard error of the least squares slope, b
  $SE_b = \frac{s}{s_x \sqrt{n-1}}$
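The standard error formula can also be written in terms of the squared deviations of x, which is the form many textbooks and software packages use. The short derivation below assumes only the usual definition of the sample standard deviation $s_x$; the symbols $x_i$ and $\bar{x}$ are notation introduced here for the individual x-values and their mean.

```latex
\[
s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}
\quad\Longrightarrow\quad
s_x\sqrt{n-1} = \sqrt{\sum (x_i - \bar{x})^2},
\]
\[
SE_b = \frac{s}{s_x\sqrt{n-1}} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} .
\]
```

Either form shows that the slope is estimated more precisely when the residual spread s is small, when the x-values are spread out, or when n is large.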
Interpretation:
We are 95% confident that the mean change in BAC per beer is between ___________ and _____________
Back to our Example: For a study on student drinking and blood alcohol level, sixteen student volunteers at Ohio State University drank a randomly assigned number of cans of beer. Thirty minutes later, a police officer measured their blood alcohol content (BAC). The results are shown in the table above.
a) Find the LSRL, correlation coefficient, and coefficient of determination.
BAC = -.0127 + 0.018 (Beers)
r = 0.8943
r² = 0.7998
b) Explain the meaning of slope in the context of the problem.
There is approximately a 0.018 increase in BAC for every additional beer.
c) Explain the meaning of the coefficient of determination in context.
Approximately 80% of the variation in BAC can be explained by the regression of BAC on number of beers drunk.
d) Estimate $\alpha$, $\beta$, and $\sigma$.
a = -.0127
b = .0180
s = .0204
$s = \sqrt{\frac{\sum \text{residuals}^2}{n-2}}$
e) Create a scatter plot, residual plot and box plot of the residuals for the data.
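The estimates in parts (a) and (d) can be reproduced without a calculator. The following is a minimal sketch in plain Python using the sixteen (beers, BAC) pairs from the table; the variable names are chosen for this example and are not part of the original notes.

```python
import math

# Data from the table in this example: number of beers and measured BAC.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06, 0.02,
       0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

n = len(beers)
x_bar = sum(beers) / n
y_bar = sum(bac) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - x_bar) ** 2 for x in beers)
syy = sum((y - y_bar) ** 2 for y in bac)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beers, bac))

b = sxy / sxx                      # least-squares slope
a = y_bar - b * x_bar              # least-squares intercept
r = sxy / math.sqrt(sxx * syy)     # correlation coefficient

# Residual standard error s, using n - 2 degrees of freedom.
residuals = [y - (a + b * x) for x, y in zip(beers, bac)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(f"BAC-hat = {a:.4f} + {b:.4f} (Beers)")
print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, s = {s:.4f}")
```

The same `residuals` list is what the residual plot and boxplot in part (e) would display.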
[Figures: scatterplot of BAC vs. Beers; residual plot of Residuals vs. Beers; boxplot of Residuals]
f) Give a 95% confidence interval for the true slope of the LSRL.
Assumptions:
• Have an SRS of students
• Since the residual plot is randomly scattered, BAC and # of beers are linear
• Since the points are evenly spread about the LSRL on the scatterplot, $\sigma_y$ is approximately equal for all values of # of beers
• Since the boxplot of residuals is approximately symmetrical, the responses are approximately normally distributed.
We are 95% confident that the true slope of the LSRL of BAC & number of beers is between 0.0128 and 0.0231.
Be sure to show all graphs!
$b \pm t^* SE_b = 0.018 \pm 2.145(0.0024) = (0.0128, 0.0231)$, df = 14
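The interval arithmetic can be checked from the raw data. The sketch below is one possible way to do it: `slope_confidence_interval` is a helper name invented for this example, and the critical value t* = 2.145 (95% confidence, df = 14) is taken from the worked solution rather than computed, since Python's standard library has no t-distribution quantile function.

```python
import math
import statistics


def slope_confidence_interval(xs, ys, t_star):
    """Return (b, se_b, lower, upper) for the LSRL slope.

    t_star is the t critical value for df = n - 2, looked up by the caller.
    """
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    # SE_b = s / (s_x * sqrt(n - 1)), where s_x is the sample SD of x.
    se_b = s / (statistics.stdev(xs) * math.sqrt(n - 1))
    margin = t_star * se_b
    return b, se_b, b - margin, b + margin


beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06, 0.02,
       0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

# t* = 2.145 is the value used in the worked solution (95% confidence, df = 14).
b, se_b, lower, upper = slope_confidence_interval(beers, bac, t_star=2.145)
print(f"b = {b:.4f}, SE_b = {se_b:.4f}, CI = ({lower:.4f}, {upper:.4f})")
```

Because zero is not inside the resulting interval, the data are consistent with a positive linear relationship between number of beers and BAC.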