Multiple Regression I 4/9/12 • Transformations • The model • Individual coefficients • R 2 • ANOVA for regression • Residual standard error Section 9.4, 9.5 Professor Kari Lock Morgan Duke University


Jan 03, 2016


Owen Doyle
Transcript
Page 1:

Multiple Regression I
4/9/12

• Transformations
• The model
• Individual coefficients
• $R^2$
• ANOVA for regression
• Residual standard error

Section 9.4, 9.5
Professor Kari Lock Morgan
Duke University

Page 2:

To Do

• Project 2 Proposal (due Wednesday, 4/11)

• Homework 9 (due Monday, 4/16)

• Project 2 Presentation (Thursday, 4/19)

• Project 2 Paper (Wednesday, 4/25)

Page 3:

Non-Constant Variability

Page 4:

Non-Normal Residuals

Page 5:

Transformations

• If the conditions are not satisfied, there are some common transformations you can apply to the response variable

• You can take any function of y and use it as the response, but the most common are
  • $\log(y)$ (natural logarithm, ln)
  • $\sqrt{y}$ (square root)
  • $y^2$ (squared)
  • $e^y$ (exponential)
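As a rough illustration of the list above, the sketch below applies each of the four transformations to a response. The data values are hypothetical and chosen only to be right-skewed; in practice you would refit the model with the transformed response and recheck the residual plots.

```python
import math

# Hypothetical, right-skewed response values (illustrative only).
y = [1.2, 1.5, 2.0, 3.1, 4.8, 9.5, 22.0]

# The four common transformations of the response.
transformed = {
    "log(y)": [math.log(v) for v in y],    # natural log; requires y > 0
    "sqrt(y)": [math.sqrt(v) for v in y],  # requires y >= 0
    "y^2": [v ** 2 for v in y],
    "e^y": [math.exp(v) for v in y],
}

for name, values in transformed.items():
    print(name, [round(v, 2) for v in values])
```

Note that log(y) and the square root pull in large values, while squaring and exponentiating spread them out further, so which one helps depends on the shape of the residuals.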

Page 6:

$\log(y)$

Original Response, y:

Logged Response, $\log(y)$:

Page 7:

$\sqrt{y}$

Original Response, y:

Square root of Response, $\sqrt{y}$:

Page 8:

$y^2$

Original Response, y:

Squared Response, $y^2$:

Page 9:

$e^y$

Original Response, y:

Exponentiated Response, $e^y$:

Page 10:

Multiple Regression

• Multiple regression extends simple linear regression to include multiple explanatory variables:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$
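To make the model concrete, here is a minimal sketch of estimating the coefficients by least squares using the normal equations. The function names `solve_linear_system` and `fit_least_squares` and the data are hypothetical; the course itself uses R output, so this is only an illustration of what the fitting step does.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b] so row operations update both sides.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Pivot on the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_least_squares(X, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    # Prepend a column of ones for the intercept.
    design = [[1.0] + list(row) for row in X]
    p = len(design[0])
    # Normal equations: (D^T D) b = D^T y.
    DtD = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    Dty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(DtD, Dty)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term),
# so the fit should recover those coefficients up to rounding.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]

coefficients = fit_least_squares(X, y)
print([round(b, 6) for b in coefficients])
```

With real data the error term is not zero, so the estimates will not match the true coefficients exactly.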

Page 11:

Grade on Final

• We’ll use your current grades to predict final exam scores, based on a model from last semester’s students

• Response: final exam score

• Explanatory: hw average, clicker average, exam 1, exam 2

$y = \beta_0 + \beta_1 \cdot \text{hw} + \beta_2 \cdot \text{clicker} + \beta_3 \cdot \text{exam1} + \beta_4 \cdot \text{exam2} + \varepsilon$

Page 12:

Grade on Final

What variable is the most significant predictor of final exam score?

a) Homework average
b) Clicker average
c) Exam 1
d) Exam 2

Page 13:

Inference for Coefficients

The p-value for explanatory variable $x_i$ is associated with the hypotheses

$H_0: \beta_i = 0$
$H_a: \beta_i \neq 0$

For intervals and p-values of coefficients in multiple regression, use a t-distribution with degrees of freedom $n - k - 1$, where k is the number of explanatory variables included in the model
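A small sketch of the pieces that go into this test: the t-statistic is the coefficient estimate divided by its standard error, and the degrees of freedom are n - k - 1. The sample size, estimate, and standard error below are hypothetical, since the regression output shown in class is not part of this transcript.

```python
def t_statistic(estimate, std_error):
    """Test statistic for H0: beta_i = 0 against Ha: beta_i != 0."""
    return estimate / std_error


def coefficient_df(n, k):
    """Degrees of freedom for coefficient inference with k explanatory variables."""
    return n - k - 1


# Hypothetical values (not taken from the course output).
n, k = 69, 4
estimate, std_error = 0.35, 0.10

t = t_statistic(estimate, std_error)
df = coefficient_df(n, k)
print(f"t = {t:.2f} on {df} degrees of freedom")

# Rough rule of thumb: |t| well above 2 suggests a small two-sided p-value.
print("|t| > 2:", abs(t) > 2)
```

The exact p-value comes from the t-distribution with `df` degrees of freedom, which statistical software reports directly.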

Page 14:

Grade on Final

Estimate your score on the final exam.

What type of interval do you want for this estimate?

a) Confidence interval
b) Prediction interval

Page 15:

Grade on Final

Estimate your score on the final exam. (hw average is out of 10, clicker average is out of 2)
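The estimate is just the fitted equation evaluated at your own grades. The sketch below shows the arithmetic; every coefficient and grade in it is hypothetical, because the fitted values from the class output are not included in this transcript.

```python
# Hypothetical fitted coefficients (NOT the values from the class output).
coefficients = {
    "intercept": 30.0,
    "hw": 1.5,        # hw average is out of 10
    "clicker": -2.0,  # clicker average is out of 2
    "exam1": 0.3,
    "exam2": 0.2,
}

# Hypothetical grades for one student.
student = {"hw": 9.0, "clicker": 1.8, "exam1": 85.0, "exam2": 80.0}


def predict(coefficients, values):
    """Predicted response: intercept plus coefficient times value for each variable."""
    return coefficients["intercept"] + sum(
        coefficients[name] * x for name, x in values.items()
    )


predicted_final = predict(coefficients, student)
print(round(predicted_final, 1))
```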

Page 16:

Grade on Final

Is the clicker coefficient really negative?!?

Give a 95% confidence interval for the clicker coefficient (okay to use t* = 2).
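The interval is estimate ± t* × SE. A minimal sketch with t* = 2 follows; the estimate and standard error are hypothetical stand-ins for the values in the class output, which this transcript does not include.

```python
def confidence_interval(estimate, std_error, t_star=2.0):
    """Approximate confidence interval: estimate +/- t_star * std_error."""
    margin = t_star * std_error
    return estimate - margin, estimate + margin


# Hypothetical clicker coefficient and standard error.
lower, upper = confidence_interval(estimate=-2.0, std_error=3.0)
print(lower, upper)

# If the interval contains 0, the data do not rule out a zero or positive coefficient.
print("contains 0:", lower <= 0 <= upper)
```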

Page 17:

Grade on Final

Is your score on exam 2 really not a significant predictor of your final exam score?!?

Page 18:

Coefficients

• The coefficient (and significance) for each explanatory variable depend on the other variables in the model!

• In predicting final exam scores, if you know someone’s score on Exam 1, it doesn’t provide much additional information to know their score on Exam 2 (these two explanatory variables are highly correlated)

Page 19:

Grade on Final

If you take Exam 1 out of the model…

Model with Exam 1:

Now Exam 2 is significant!

Page 20:

Grade on Final

If you include Project 1 in the model…

Model without Project 1:

Page 21:

Grades

Page 22:

Multiple Regression

• The coefficient for each explanatory variable is the predicted change in y for a one-unit change in x, given the other explanatory variables in the model!

• The p-value for each coefficient indicates whether it is a significant predictor of y, given the other explanatory variables in the model!

• If explanatory variables are associated with each other, coefficients and p-values will change depending on what else is included in the model

Page 23:

Residuals

Are the conditions satisfied?

(a) Yes
(b) No

Page 24:

Evaluating a Model

• How do we evaluate the success of a model?

• How do we determine the overall significance of a model?

• How do we choose between two competing models?

Page 25:

Variability

• One way to evaluate a model is to partition variability

• A good model “explains” a lot of the variability in Y

Total Variability = Variability Explained by the Model + Error Variability

Page 26:

Exam Scores

• Without knowing the explanatory variables, we can say that a person’s final exam score will probably be between 60 and 98 (the range of Y)

• Knowing hw average, clicker average, exam 1 and 2 grades, and project 1 grades, we can give a narrower prediction interval for final exam score

• We say that some of the variability in y is explained by the explanatory variables

• How do we quantify this?

Page 27:

Variability

How do we quantify variability in Y?

a) Standard deviation of Y
b) Sum of squared deviations from the mean of Y
c) (a) or (b)
d) None of the above

Page 28:

Sums of Squares

Total Variability = Variability Explained by the model + Error variability

$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$

SST = SSM + SSE

Page 29:

Variability

Total Sum of Squares: $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$

Model Sum of Squares: $SSM = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$

Error Sum of Squares: $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

• If SSM is much higher than SSE, then the model explains a lot of the variability in Y
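The three sums of squares can be computed directly from the observed values, the fitted values, and the mean. The sketch below does this for a simple least-squares line on hypothetical data and checks that SST = SSM + SSE; the helper `fit_line` is an illustrative name, not something from the course materials.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single explanatory variable."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

intercept, slope = fit_line(x, y)
y_hat = [intercept + slope * xi for xi in x]
y_bar = sum(y) / len(y)

sst = sum((yi - y_bar) ** 2 for yi in y)                # total variability
ssm = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained by the model
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # error variability

print(round(sst, 4), round(ssm, 4), round(sse, 4))
print("SST == SSM + SSE:", abs(sst - (ssm + sse)) < 1e-9)
```

The decomposition holds exactly (up to rounding) only when the fitted values come from a least-squares fit that includes an intercept.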

Page 30:

$R^2$

• $R^2$ is the proportion of the variability in Y that is explained by the model

$R^2 = \dfrac{\text{Variability in Y explained by the model}}{\text{Total variability in Y}} = \dfrac{SSM}{SST}$

Diagram labels: Total Variability; Variability Explained by the Model
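As a quick worked example, the sums of squares given later in these slides for the final exam model (SSM = 3125.8, SSE = 1901.4) can be plugged into this formula. The resulting value is computed here, not copied from the class output, which is not part of this transcript.

```python
# Sums of squares for the final exam model, as given later in these slides.
ssm = 3125.8
sse = 1901.4

sst = ssm + sse          # total variability
r_squared = ssm / sst    # proportion of variability explained by the model

print(round(sst, 1), round(r_squared, 3))
```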

Page 31:

$R^2$

• For simple linear regression, $R^2$ is just the squared correlation between X and Y

• For multiple regression, $R^2$ is the squared correlation between the actual values and the predicted values
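A small numeric check of these two statements, using a simple linear regression on hypothetical data (with one explanatory variable all three quantities should agree). The helper names are illustrative only.

```python
import math


def mean(values):
    return sum(values) / len(values)


def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    a_bar, b_bar = mean(a), mean(b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def fit_line(x, y):
    """Least-squares intercept and slope."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 5, 4, 8, 7, 10, 9, 13]

intercept, slope = fit_line(x, y)
y_hat = [intercept + slope * xi for xi in x]
y_bar = mean(y)

sst = sum((yi - y_bar) ** 2 for yi in y)
ssm = sum((yh - y_bar) ** 2 for yh in y_hat)

r_squared = ssm / sst                           # SSM / SST
r_xy_squared = correlation(x, y) ** 2           # squared correlation of X and Y
r_fit_squared = correlation(y, y_hat) ** 2      # squared correlation of actual and predicted

print(round(r_squared, 4), round(r_xy_squared, 4), round(r_fit_squared, 4))
```

With several explanatory variables there is no single X to correlate with Y, which is why the actual-versus-predicted version is the one that generalizes.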

Page 32:

$R^2$

$R^2 = 0.67$        $R^2 = 0.09$

Page 33:

Final Exam Grade

Page 34:

Is the model significant?

• If we want to test whether the model is significant (whether the model helps to predict y), we can test the hypotheses:

$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_a: \text{At least one } \beta_i \neq 0$

• We do this with ANOVA!

Page 35:

ANOVA for Regression

k: number of explanatory variables
n: sample size

Source   df       Sum of Squares   Mean Square         F         p-value
Model    k        SSM              MSM = SSM/k         MSM/MSE   Use $F_{k,\,n-k-1}$
Error    n-k-1    SSE              MSE = SSE/(n-k-1)
Total    n-1      SST

Page 36:

Final Exam Grade

For this model, do the explanatory variables significantly help to predict final exam score? (Calculate a p-value.)

(a) Yes

(b) No

n = 69
SSM = 3125.8
SSE = 1901.4

Page 37:

ANOVA for Regression

Source   df    Sum of Squares   Mean Square   F       p-value
Model    5     3125.8           625.16        20.71   ≈ 0
Error    63    1901.4           30.18
Total    68    5027.2
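The entries in this table follow mechanically from n, k, SSM, and SSE. The sketch below reproduces them from the values given in the slides (n = 69, SSM = 3125.8, SSE = 1901.4, with k = 5 taken from the model degrees of freedom); the function name `anova_for_regression` is just an illustrative choice.

```python
def anova_for_regression(n, k, ssm, sse):
    """Build the ANOVA table entries for a regression with k explanatory variables."""
    df_model = k
    df_error = n - k - 1
    msm = ssm / df_model   # mean square for the model
    mse = sse / df_error   # mean square error
    return {
        "df_model": df_model,
        "df_error": df_error,
        "df_total": n - 1,
        "SST": ssm + sse,
        "MSM": msm,
        "MSE": mse,
        "F": msm / mse,
    }


table = anova_for_regression(n=69, k=5, ssm=3125.8, sse=1901.4)
for name, value in table.items():
    print(name, round(value, 2))
```

The p-value is the area above F in an F-distribution with 5 and 63 degrees of freedom. The Python standard library has no F-distribution, so that last step is left to statistical software.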

Page 38:

Final Exam Grade

Page 39:

Simple Linear Regression

• For simple linear regression, the following tests will all give equivalent p-values:

• t-test for non-zero correlation

• t-test for non-zero slope

• ANOVA for regression
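One way to see why the p-values agree: with a single explanatory variable, the t-statistic for the slope equals the t-statistic for the correlation, and the ANOVA F-statistic is that t-statistic squared. The sketch below checks this numerically on hypothetical data, using the standard formulas for simple linear regression.

```python
import math

# Hypothetical data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 5, 4, 8, 7, 10, 9, 13]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares fit and sums of squares.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]
ssm = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
mse = sse / (n - 2)  # k = 1, so n - k - 1 = n - 2

# t-test for slope: estimate divided by its standard error.
t_slope = slope / math.sqrt(mse / sxx)

# t-test for correlation.
r = sxy / math.sqrt(sxx * syy)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# ANOVA for regression: F = MSM / MSE with 1 model degree of freedom.
f_stat = (ssm / 1) / mse

print(round(t_slope, 4), round(t_corr, 4), round(f_stat, 4))
```

Because a squared t with n - 2 degrees of freedom has the same distribution as F with 1 and n - 2 degrees of freedom, the two-sided t p-value and the F p-value coincide.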

Page 40:

Mean Square Error (MSE)

• Mean square error (MSE) measures the average variability in the errors (residuals)

• The square root of MSE gives the standard deviation of the residuals (giving a typical distance of points from the line)

• The square root of MSE is also given in the R output as the residual standard error, and is known as $s_\varepsilon$ in the textbook
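For the final exam model in these slides (n = 69, k = 5, SSE = 1901.4), the residual standard error can be computed in two lines. This is a sketch of the arithmetic only; R reports the same quantity in its regression summary.

```python
import math

# Values given in the slides for the final exam model.
n, k = 69, 5
sse = 1901.4

mse = sse / (n - k - 1)                   # mean square error
residual_standard_error = math.sqrt(mse)  # typical size of a residual

print(round(mse, 2), round(residual_standard_error, 3))
```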

Page 41:

Final Exam Grade

Page 42:

Simple Linear Model

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

$\varepsilon_i \sim N(0, \sigma_\varepsilon)$

Residual standard error = $\sqrt{MSE}$ = $s_\varepsilon$ estimates the standard deviation of the residuals (the spread of the normal distributions around the predicted values)

Page 43:

Residual Standard Error

• Use the fact that the residual standard error is 5.494 and your predicted final exam score to compute an approximate 95% prediction interval for your final exam score

• NOTE: This calculation only takes into account errors around the line, not uncertainty in the line itself, so your true prediction interval will be slightly wider

$\hat{y} \pm 2 \times 5.494$
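A minimal sketch of this calculation. The residual standard error 5.494 is from the slides; the predicted score of 85 is a hypothetical placeholder for your own prediction.

```python
def approximate_prediction_interval(y_hat, residual_se, multiplier=2.0):
    """Rough 95% prediction interval: y_hat +/- 2 * residual standard error.

    Ignores uncertainty in the fitted line itself, so the true interval
    is slightly wider.
    """
    margin = multiplier * residual_se
    return y_hat - margin, y_hat + margin


# Hypothetical predicted final exam score; 5.494 is the value from the slides.
lower, upper = approximate_prediction_interval(y_hat=85.0, residual_se=5.494)
print(round(lower, 3), round(upper, 3))
```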

Page 44:

To Come…

• How do we decide which explanatory variables to include in the model?

• How do we use categorical explanatory variables?

• What if the coefficient of one explanatory variable depends on the value of another explanatory variable?