
Regression

Mar 19, 2016


sheba

Page 1: Regression

Regression

Page 2: Regression

[Figure: scatterplot of Weight vs. Height, heights from 60 to 68 inches]

How much would an adult female weigh if she were 5 feet (60 inches) tall?

She could weigh varying amounts; in other words, there is a distribution of weights for adult females who are 5 feet tall.

We hope this distribution is normal.

What would you expect for other heights?

Where would you expect the TRUE LSRL to be?

What about the standard deviations of all these normal distributions?

[Figure: distributions of y (weight) at several values of x (height, 60 to 68 inches)]

We want the standard deviations of all these normal distributions to be the same.

Page 3: Regression

Regression Model
• The mean response $\mu_y$ has a straight-line relationship with x: $\mu_y = \alpha + \beta x$, where the slope $\beta$ and the intercept $\alpha$ are unknown parameters.
• For any fixed value of x, the response y varies according to a normal distribution. Repeated responses of y are independent of each other.
• The standard deviation of y ($\sigma_y$) is the same for all values of x. ($\sigma_y$ is also an unknown parameter.)
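To make the model concrete, here is a minimal Python sketch that generates data the way the model describes. The parameter values ALPHA, BETA, and SIGMA_Y and the helper `simulate_responses` are hypothetical, chosen only for illustration, since in the model α, β, and σ_y are unknown. Each response is drawn independently from a normal distribution centered on the true line, with the same standard deviation at every x.

```python
import random

# Hypothetical parameter values, for illustration only; in the model,
# alpha, beta, and sigma_y are unknown.
ALPHA = -100.0   # assumed true intercept
BETA = 3.8       # assumed true slope (lb per inch)
SIGMA_Y = 20.0   # assumed common standard deviation of y

def simulate_responses(xs, alpha=ALPHA, beta=BETA, sigma_y=SIGMA_Y, seed=None):
    """Draw one independent y for each x from a normal distribution
    with mean alpha + beta * x and standard deviation sigma_y."""
    rng = random.Random(seed)
    ys = []
    for x in xs:
        mu_y = alpha + beta * x                # mean response lies on the true line
        ys.append(rng.gauss(mu_y, sigma_y))    # same sigma_y at every x
    return ys

heights = [60, 62, 64, 66, 68] * 3             # three women at each height
weights = simulate_responses(heights, seed=1)
for h, w in zip(heights, weights):
    print(h, round(w, 1))
```

Setting `sigma_y=0.0` puts every simulated point exactly on the line $\mu_y = \alpha + \beta x$. This is a quick way to see that $\sigma_y$ controls only the scatter about the line.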

Page 4: Regression

Person #   Ht (in)   Wt (lb)
1          64        130
10         64        175
15         64        150
19         64        125
21         64        145
40         64        186
47         64        121
60         64        137
63         64        143
68         64        120
70         64        112
78         64        108
83         64        160

Suppose we look at part of a population of adult women. These women are all 64 inches tall. What distribution does their weight have?
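As a small illustration, the sample mean and standard deviation of the 13 weights in the table estimate the center and spread of that distribution of weights at 64 inches. This sketch uses Python's standard `statistics` module and assumes the weights are in pounds.

```python
import statistics

# Weights of the 13 women in the table, all 64 inches tall
# (units assumed to be pounds).
weights_64 = [130, 175, 150, 125, 145, 186, 121, 137, 143, 120, 112, 108, 160]

mean_wt = statistics.mean(weights_64)   # sample estimate of the mean weight at x = 64
sd_wt = statistics.stdev(weights_64)    # sample SD, n - 1 in the denominator
print(f"n = {len(weights_64)}, mean = {mean_wt:.1f}, SD = {sd_wt:.1f}")
```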

Page 5: Regression

• The slope b of the LSRL is an unbiased estimator of the true slope $\beta$.

• The intercept a of the LSRL is an unbiased estimator of the true intercept $\alpha$.

• The standard error s estimates the true standard deviation of y ($\sigma_y$).

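We use $\hat{y} = a + bx$ to estimate $\mu_y = \alpha + \beta x$.

$s = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\dfrac{\sum \text{residuals}^2}{n - 2}}$

Note: df = n - 2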
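The formula for s translates directly into code. This is a minimal sketch; `regression_standard_error` is a hypothetical helper name. The toy data points are made up, and their LSRL, $\hat{y} = 1.5 + 0.8x$, was worked out by hand.

```python
import math

def regression_standard_error(ys, y_hats):
    """s = sqrt(sum of squared residuals / (n - 2)), with df = n - 2."""
    n = len(ys)
    if n != len(y_hats) or n < 3:
        raise ValueError("need matching lists of at least 3 values")
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))  # sum of residuals^2
    return math.sqrt(sse / (n - 2))

# Toy example: for x = 1, 2, 3, 4 and y = 2, 3, 5, 4 the LSRL is y-hat = 1.5 + 0.8x.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]
y_hats = [1.5 + 0.8 * x for x in xs]
s = regression_standard_error(ys, y_hats)
print(round(s, 4))
```

The divisor is n - 2 rather than n - 1 because two parameters (the intercept and the slope) were estimated before the residuals could be computed.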

Page 6: Regression

Notes!

Student   # of beers   BAC
1         5            .10
2         2            .03
3         9            .19
4         8            .12
5         3            .04
6         7            .095
7         3            .07
8         5            .06
9         3            .02
10        5            .05
11        4            .07
12        6            .10
13        5            .085
14        7            .09
15        1            .01
16        4            .05

For a study on student drinking and blood alcohol level, sixteen student volunteers at Ohio State University drank a randomly assigned number of cans of beer. Thirty minutes later, a police officer measured their blood alcohol content (BAC). The results are shown in the table above.

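Use your calculator to find a regression equation of the form y = ax + b for these data. State your equation using descriptive notation. (Note: in the calculator's ax + b form, a is the slope, whereas in $\hat{y} = a + bx$ elsewhere in these notes, a is the intercept.)

What does the value a represent in the context of this problem?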
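Here is a minimal sketch of the least-squares computation the calculator performs, in plain Python with no statistics library. `least_squares_line` is a hypothetical helper name. It returns the intercept a and the slope b in the $\hat{y} = a + bx$ convention used elsewhere in these notes, not in the calculator's ax + b order.

```python
# Data from the table above: number of beers (x) and BAC (y) for 16 students.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar   # the LSRL passes through (x_bar, y_bar)
    return intercept, slope

a, b = least_squares_line(beers, bac)
print(f"predicted BAC = {a:.4f} + {b:.4f}(Beers)")
```

On these data it prints `predicted BAC = -0.0127 + 0.0180(Beers)`, which agrees with part a) on Page 13.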

Page 7: Regression

We would like to create a confidence interval for the slope of the regression line.

In other words, we want to know ________________.

Page 8: Regression

Conditions for regression inference
• For any fixed value of x, the response variable y varies normally about the true regression line.
– Check a histogram or boxplot of the residuals.
• The mean response, $\mu_y$, has a straight-line relationship with x.
– Check the scatterplot and residual plot.
• The standard deviation of y is the same for all values of x.
– Check the scatterplot and residual plot.

Page 9: Regression

For problems involving inference for regression, we use a ________________.

Page 10: Regression

[Figure: scatterplot of Weight vs. Height, heights from 60 to 68 inches]

Suppose the LSRL is a horizontal line. Would height be useful in predicting weight?

What is the slope of a horizontal line?

A slope of zero means that there is NO linear relationship between x and y!

Page 11: Regression

Formulas:

• Confidence interval: statistic ± (critical value)(SD of statistic)

$b \pm t^* SE_b$, with df = n - 2 (because there are two unknowns, $\alpha$ and $\beta$)

where $SE_b$ is the standard error of the least-squares slope b:

$SE_b = \dfrac{s}{s_x\sqrt{n - 1}}$
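The $SE_b$ formula can be checked numerically on the beer and BAC data. This is a sketch using Python's `statistics` module; `slope_standard_error` is a hypothetical helper name. It refits the line, computes s with n - 2 degrees of freedom, and divides by $s_x\sqrt{n-1}$.

```python
import math
import statistics

# Beer and BAC data from the table on Page 6.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

def slope_standard_error(xs, ys):
    """Return SE_b = s / (s_x * sqrt(n - 1)) for the least-squares slope."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))           # df = n - 2: two unknowns, alpha and beta
    s_x = statistics.stdev(xs)             # sample SD of the x values
    return s / (s_x * math.sqrt(n - 1))    # algebraically equal to s / sqrt(s_xx)

se_b = slope_standard_error(beers, bac)
print(f"SE_b = {se_b:.4f}, df = {len(beers) - 2}")
```

Because $s_x\sqrt{n-1} = \sqrt{\sum (x - \bar{x})^2}$, the same value can also be written as $s / \sqrt{\sum (x - \bar{x})^2}$.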

Page 12: Regression

Interpretation:

We are 95% confident that the mean change in BAC per beer is between ___________ and _____________

Page 13: Regression

Back to our example: For a study on student drinking and blood alcohol level, sixteen student volunteers at Ohio State University drank a randomly assigned number of cans of beer. Thirty minutes later, a police officer measured their blood alcohol content (BAC). The results are shown in the table on Page 6.

a) Find the LSRL, correlation coefficient, and coefficient of determination.

$\widehat{BAC} = -.0127 + .0180(\text{Beers})$

$r = 0.8943$

$r^2 = 0.7998$
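The values of r and $r^2$ in part a) can be reproduced from the sums of squares. This is a plain-Python sketch; `correlation` is a hypothetical helper name. On Python 3.10 and later, `statistics.correlation` gives the same Pearson r.

```python
import math

# Beer and BAC data from the table on Page 6.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

def correlation(xs, ys):
    """Pearson correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

r = correlation(beers, bac)
r_squared = r ** 2   # coefficient of determination
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```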

Page 14: Regression

b) Explain the meaning of slope in the context of the problem.
For each additional beer, the predicted BAC increases by approximately .018.

c) Explain the meaning of the coefficient of determination in context.
Approximately 80% of the variation in BAC can be explained by the linear regression of BAC on number of beers.

Page 15: Regression

d) Estimate $\alpha$, $\beta$, and $\sigma_y$.

a = -.0127, b = .0180, s = .0204

e) Create a scatter plot, residual plot and box plot of the residuals for the data.

$s = \sqrt{\dfrac{\sum \text{residuals}^2}{n - 2}}$

[Figures: scatterplot of BAC vs. Beers; residual plot of Residuals vs. Beers; boxplot of Residuals]
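The graphs in e) are built from the residuals. This sketch computes them from the rounded coefficients in part a), along with the points for the residual plot, a five-number summary for the boxplot, and s. It uses Python's `statistics.quantiles` with its default "exclusive" method, whose quartiles may differ slightly from a calculator's.

```python
import math
import statistics

# Beer and BAC data from the table on Page 6.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

# Coefficients from part a), rounded as in the notes.
a, b = -0.0127, 0.0180

residuals = [y - (a + b * x) for x, y in zip(beers, bac)]   # observed minus predicted
residual_points = list(zip(beers, residuals))               # (x, residual) for the residual plot
s = math.sqrt(sum(e ** 2 for e in residuals) / (len(residuals) - 2))

# Five-number summary of the residuals for the boxplot.
q1, median, q3 = statistics.quantiles(residuals, n=4)
print(f"min={min(residuals):.4f} Q1={q1:.4f} median={median:.4f} "
      f"Q3={q3:.4f} max={max(residuals):.4f}")
print(f"s = {s:.4f}")
```

Using the rounded coefficients shifts the residuals very slightly, but s still agrees with part d) to about four decimal places.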

Page 16: Regression

f) Give a 95% confidence interval for the true slope of the LSRL.

Assumptions:
• The students were volunteers rather than an SRS, but the number of beers was randomly assigned, so we treat the responses as independent.
• Since the residual plot shows random scatter with no pattern, the relationship between BAC and # of beers is linear.
• Since the points are evenly spread about the LSRL on the scatterplot, $\sigma_y$ is approximately equal for all values of # of beers.
• Since the boxplot of the residuals is approximately symmetric, the responses are approximately normally distributed.

$b \pm t^* SE_b = .0180 \pm 2.145(.0024) \approx (.0128, .0231)$, df = 14

We are 95% confident that the true slope of the regression line of BAC on number of beers is between .0128 and .0231.

Be sure to show all graphs!
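This is a quick numeric check of the interval. It assumes the rounded values b = .0180 and $SE_b$ = .0024 from the notes, and t* = 2.145 (the 95% critical value for df = 14 from a t table). `slope_confidence_interval` is a hypothetical helper name.

```python
def slope_confidence_interval(b, se_b, t_star):
    """Return (lower, upper) for the interval b +/- t* SE_b."""
    margin = t_star * se_b          # margin of error
    return b - margin, b + margin

# Assumed inputs: rounded b and SE_b from the notes; t* = 2.145 for 95%, df = 14.
low, high = slope_confidence_interval(0.0180, 0.0024, 2.145)
print(f"({low:.4f}, {high:.4f})")

# A slope of 0 would mean no linear relationship between beers and BAC.
contains_zero = low <= 0 <= high
print("0 in interval:", contains_zero)
```

With these rounded inputs the interval is about (.0129, .0231). Carrying unrounded b and $SE_b$, as the calculator does, gives (.0128, .0231). In both cases 0 is outside the interval, so a slope of zero is not a plausible value for these data.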