TESTING THE STRENGTH OF THE MULTIPLE REGRESSION MODEL

Feb 06, 2016

Shawn_a

Page 1: TESTING THE STRENGTH OF THE MULTIPLE REGRESSION MODEL

Page 2

Test 1: Are Any of the x’s Useful in Predicting y?

We are asking: Can we conclude that at least one of the β’s (other than β0) ≠ 0?

H0: β1 = β2 = β3 = 0

HA: At least one of these β’s ≠ 0

α = .05

Page 3

Idea of the Test

• Measure the overall “average variability” due to changes in the x’s

• Measure the overall “average variability” that is due to randomness (error)

• If the overall “average variability” due to changes in the x’s IS A LOT LARGER than the “average variability” due to error, we conclude that at least one β is non-zero, i.e., at least one factor (x) is useful in predicting y

Page 4

“Total Variability”

• Just as with simple linear regression, we have the sum of squares due to regression, SSR, and the sum of squares due to error, SSE. Both are printed on the Excel output.

– The formulas are more complicated (they involve matrix operations).
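Excel hides the matrix operations, but they are short enough to sketch. The example below is a minimal illustration that assumes a small, made-up data set. These are hypothetical numbers, not the deck's house data. It fits the model by solving the normal equations X'X b = X'y with Gaussian elimination, then computes SST, SSR and SSE. The function names `solve`, `fit_ols` and `sums_of_squares` are illustrative only. Because the fit is least squares with an intercept, SST = SSR + SSE up to rounding.

```python
# Hypothetical data: price ($1000s) vs. square feet (1000s), land (acres), age (years).
X_RAW = [
    [1.2, 0.5, 30], [1.5, 0.3, 12], [1.8, 0.8, 25], [2.0, 0.4, 8],
    [2.4, 1.0, 40], [2.7, 0.6, 15], [3.0, 1.2, 22], [3.5, 0.9, 5],
]
Y = [150, 185, 205, 240, 230, 280, 300, 360]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(x_raw, y):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations X'X b = X'y."""
    X = [[1.0] + list(row) for row in x_raw]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def sums_of_squares(x_raw, y):
    """Return (SST, SSR, SSE) for the least-squares fit of y on the x's."""
    b = fit_ols(x_raw, y)
    y_hat = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in x_raw]
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variability
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # due to regression
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # due to error
    return sst, ssr, sse


sst, ssr, sse = sums_of_squares(X_RAW, Y)
print(f"SST={sst:.2f}  SSR={ssr:.2f}  SSE={sse:.2f}  SSR+SSE={ssr + sse:.2f}")
```

In practice, libraries use QR or similar decompositions rather than forming X'X directly. The normal equations are used here only because they are the most transparent version of the "matrix operations".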

Page 5

“Average Variability”

• “Average variability” (mean variability) for a group is defined as the total variability divided by the degrees of freedom associated with that group:

• Mean Squares Due to Regression: MSR = SSR/DFR

• Mean Squares Due to Error: MSE = SSE/DFE

Page 6

Degrees of Freedom

• The total number of degrees of freedom is always DF(Total) = n - 1

• Degrees of freedom for regression: DFR = the number of factors in the regression (i.e., the number of x’s in the linear regression)

• Degrees of freedom for error: DFE = the difference between the two = DF(Total) - DFR
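The degrees-of-freedom rules and the mean-square definitions can be combined into a few lines of arithmetic. This is a minimal sketch. The SSR and SSE values are hypothetical. n = 38 and k = 3 are assumed because they give the (3, 34) degrees of freedom that appear in the house-price example. The function name `anova_mean_squares` is illustrative.

```python
def anova_mean_squares(ssr, sse, n, k):
    """Degrees of freedom and mean squares for a regression with n observations and k x's."""
    df_total = n - 1          # DF(Total) = n - 1
    dfr = k                   # DFR = number of x's in the model
    dfe = df_total - dfr      # DFE = DF(Total) - DFR
    return {
        "DF_total": df_total,
        "DFR": dfr,
        "DFE": dfe,
        "MSR": ssr / dfr,     # MSR = SSR/DFR
        "MSE": sse / dfe,     # MSE = SSE/DFE
    }


# Hypothetical sums of squares for n = 38 observations and k = 3 x's.
table = anova_mean_squares(ssr=600.0, sse=340.0, n=38, k=3)
print(table)  # {'DF_total': 37, 'DFR': 3, 'DFE': 34, 'MSR': 200.0, 'MSE': 10.0}
```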

Page 7

The F-Statistic

• The F-statistic is defined as the ratio of two measures of variability. Here, F = MSR/MSE.

• Recall we are saying that if MSR is “large” compared to MSE, at least one β ≠ 0.

• Thus if F is “large”, we conclude that HA is true, i.e., at least one β ≠ 0.

Page 8

The F-test

• “Large” compared to what?

• F-tables give critical values Fα,DFR,DFE for given values of α

• TEST: REJECT H0 (Accept HA) if:

F = MSR/MSE > Fα,DFR,DFE
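The decision rule translates directly into code. This sketch assumes the critical value has already been looked up. It uses 2.882601, the value of F.05,3,34 that the results slide obtains from Excel's FINV. The mean squares are hypothetical, and `overall_f_test` is an illustrative name.

```python
def overall_f_test(msr, mse, f_critical):
    """Overall F-test: reject H0 (all slope betas are 0) when F = MSR/MSE > F_alpha,DFR,DFE."""
    f_stat = msr / mse
    return f_stat, f_stat > f_critical


# Hypothetical mean squares; 2.882601 is F.05,3,34 as computed by FINV(.05,3,34).
f_stat, reject = overall_f_test(msr=200.0, mse=10.0, f_critical=2.882601)
print(f"F = {f_stat:.2f}, reject H0: {reject}")  # F = 20.00, reject H0: True
```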

Page 9

RESULTS

• If we do not get a large F statistic
– We cannot conclude that any of the variables in this model are significant in predicting y.

• If we do get a large F statistic
– We can conclude that at least one of the variables is significant for predicting y.
– NATURAL QUESTION:

• WHICH ONES?

Page 10

• DFR = # of x’s
• DFE = Total DF - DFR
• Total DF = n - 1
• SSR, SSE
• Total SS = Σ(yi - ȳ)²

Page 11

• MSR = SSR/DFR
• MSE = SSE/DFE
• F = MSR/MSE
• P-value for the F test

Page 12

Results

• We see that the F statistic is 20.89762.
• This would be compared to F.05,3,34.
– The value of F.05,3,34 is not given in the F.05 table.
– But F.05,3,30 = 2.92 and F.05,3,40 = 2.84.
– And 20.89762 is larger than either of these numbers.
– The actual value of F.05,3,34 can be calculated in Excel by FINV(.05,3,34) = 2.882601.

• USE SIGNIFICANCE F
– This is the p-value for the F-test.
– Significance F = 7.46 × 10⁻⁸ = .0000000746 < .05
– We can conclude that at least one x is useful in predicting y.
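Both numbers Excel reports here, FINV(.05,3,34) and Significance F, can be reproduced without Excel. The following is a minimal pure-Python sketch that assumes no statistics library is available. It evaluates the upper tail of the F distribution through the regularized incomplete beta function, using the standard continued-fraction method rather than anything from the slides. It then finds the critical value by bisection. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))            # even step
        d = 1.0 / ((1.0 + aa * d) if abs(1.0 + aa * d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))   # odd step
        d = 1.0 / ((1.0 + aa * d) if abs(1.0 + aa * d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f, dfr, dfe):
    """P(F > f) for an F(DFR, DFE) variable: the quantity Excel labels Significance F."""
    if f <= 0:
        return 1.0
    return betainc_reg(dfe / 2.0, dfr / 2.0, dfe / (dfe + dfr * f))


def f_critical(alpha, dfr, dfe):
    """Upper-tail critical value F_alpha,DFR,DFE (what Excel's FINV returns)."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, dfr, dfe) > alpha:   # widen until the tail area drops below alpha
        hi *= 2.0
    for _ in range(200):                # bisection: f_sf decreases as f grows
        mid = (lo + hi) / 2.0
        if f_sf(mid, dfr, dfe) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(f_critical(0.05, 3, 34), 6))   # compare with FINV(.05,3,34)
print(f_sf(20.89762, 3, 34))               # compare with Significance F
```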

Page 13

Test 2: Which Variables Are Significant IN THIS MODEL?

• The question we are asking is: “Taking all the other factors (x’s) into consideration, does a change in a particular x (x3, say) significantly affect y?”

• This is another hypothesis test (a t-test).

• To test whether the age of the house (x3) is significant:

H0: β3 = 0 (x3 is not significant in this model)

HA: β3 ≠ 0 (x3 is significant in this model)

Page 14

The t-test for a particular factor IN THIS MODEL

• Reject H0 (Accept HA) if:

t = (β̂3 - 0)/s(β̂3) > t.025,DFE or t < -t.025,DFE

where s(β̂3) is the standard error of the estimated coefficient β̂3.
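The t-test for one coefficient needs only the estimated coefficient, its standard error (both printed by Excel) and DFE. The sketch below computes t = (β̂ - 0)/s(β̂) and the two-sided p-value through the identity P(|T| > |t|) = I_{DFE/(DFE+t²)}(DFE/2, 1/2). It also finds the critical value t.025,DFE by bisection. It is pure Python and reuses the standard continued-fraction evaluation of the incomplete beta function. The coefficient and standard error in the example are hypothetical, and all function names are illustrative.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),               # even step
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):     # odd step
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, dfe):
    """Two-sided p-value P(|T| > |t|) for a t variable with DFE degrees of freedom."""
    return betainc_reg(dfe / 2.0, 0.5, dfe / (dfe + t * t))


def t_critical(alpha_half, dfe):
    """Critical value t_{alpha_half, DFE}, e.g. t.025,DFE for a two-sided test at alpha = .05."""
    lo, hi = 0.0, 1.0
    while t_two_sided_p(hi, dfe) > 2 * alpha_half:
        hi *= 2.0
    for _ in range(200):                # bisection: the p-value decreases as |t| grows
        mid = (lo + hi) / 2.0
        if t_two_sided_p(mid, dfe) > 2 * alpha_half:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def coefficient_t_test(beta_hat, std_error, dfe, alpha=0.05):
    """t-test of H0: beta = 0 IN THIS MODEL; returns (t, p-value, reject H0?)."""
    t = (beta_hat - 0.0) / std_error
    return t, t_two_sided_p(t, dfe), abs(t) > t_critical(alpha / 2, dfe)


# Hypothetical coefficient and standard error for x3 (age), with DFE = 34.
t, p, reject = coefficient_t_test(beta_hat=-1.2, std_error=0.5, dfe=34)
print(f"t = {t:.2f}, p-value = {p:.4f}, reject H0: {reject}")
```

Rejecting when |t| > t.025,DFE is the same decision as rejecting when the two-sided p-value is below .05. This is why the printout's p-value column is enough on its own.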

Page 15

• t-value for test of β3 = 0

• p-value for test of β3 = 0

Page 16

Reading Printout for the t-test

• Simply look at the p-value
– The p-value for β3 = 0 is .02194 < .05
– Thus the age of the house is significant in this model

• The other variables
– The p-value for β1 = 0 is .0000839 < .05
– Thus square feet is significant in this model
– The p-value for β2 = 0 is .15503 > .05
– Thus the land (acres) is not significant in this model
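Reading the printout comes down to comparing each p-value with α = .05. The sketch below applies that rule to the three p-values quoted on this slide. The dictionary layout and labels are illustrative, not Excel's format.

```python
# p-values for the individual t-tests, as quoted from the Excel printout.
p_values = {
    "square feet (x1)": 0.0000839,
    "land (x2)": 0.15503,
    "age (x3)": 0.02194,
}
alpha = 0.05

for name, p in p_values.items():
    verdict = "significant" if p < alpha else "not significant"
    print(f"{name}: p = {p} -> {verdict} in this model")

significant = [name for name, p in p_values.items() if p < alpha]
```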

Page 17

Does A Poor t-value Imply the Variable is not Useful in Predicting y?

• NO

• It says the variable is not significant IN THIS MODEL when we consider all the other factors.

• In this model, land is not significant when included with square footage and age.

• But if we had run this model without square footage, we would have gotten the output on the next slide.

Page 18

The p-value for land is .00000717. In this model, land is significant.

Page 19

Can it even happen that F says at least one variable is significant, but none of the t’s indicate a useful variable?

• YES

EXAMPLES IN WHICH THIS MIGHT HAPPEN:
– Miles per gallon vs. horsepower and engine size
– Salary vs. GPA and GPA in major
– Income vs. age and experience
– HOUSE PRICE vs. SQUARE FOOTAGE OF HOUSE AND LAND

• There is a relation between the x’s: multicollinearity
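One simple way to see a "relation between the x's" is to compute the correlation between a pair of x's. This check does not come from the slides. It is a common first look, sketched here on hypothetical data in which bigger houses tend to sit on more land. With only two x's, 1/(1 - r²) equals the variance inflation factor (VIF). The larger it is, the more the two x's overlap. `pearson_r` is an illustrative name.

```python
import math


def pearson_r(u, v):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


# Hypothetical data: square footage (1000s of sq ft) and land (acres) move together.
square_feet = [1.2, 1.5, 1.8, 2.0, 2.4, 2.7, 3.0, 3.5]
land_acres = [0.30, 0.38, 0.44, 0.52, 0.60, 0.66, 0.78, 0.86]

r = pearson_r(square_feet, land_acres)
vif = 1.0 / (1.0 - r * r)   # VIF when the model has exactly two x's
print(f"r = {r:.3f}, VIF = {vif:.1f}")
```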

Page 20

Approaches That Could Be Used When Multicollinearity Is Detected

• Eliminate some variables and run again

• Stepwise regression (this is discussed in a future module)

Page 21

Test 3: What Proportion of the Overall Variability in y Is Due to Changes in the x’s?

• R² = .442197

• Overall, 44% of the total variation in sales price is explained by changes in square footage, land, and age of the house.
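R² is the share of the total sum of squares that the regression accounts for: R² = SSR/SST = 1 - SSE/SST. In this sketch, the SSR and SSE values are hypothetical, chosen only so that R² reproduces the .442197 on the slide. `r_squared` is an illustrative name.

```python
def r_squared(ssr, sse):
    """R² = SSR/SST, where SST = SSR + SSE is the total sum of squares."""
    sst = ssr + sse
    return ssr / sst


# Hypothetical sums of squares chosen so that R² equals .442197.
r2 = r_squared(ssr=442.197, sse=557.803)
print(f"R² = {r2:.6f} -> about {r2:.0%} of the variation in y is explained by the x's")
```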

Page 22

What is Adjusted R²?

• Adjusted R² adjusts R² to take into account degrees of freedom.

• By assuming a higher-order equation for y, we can force the curve to fit this one set of data points, eliminating much of the variability (see next slide).

• But this is not what is really going on! R² might be higher, but adjusted R² might be much lower.

• Adjusted R² takes this into account.

• Adjusted R² = 1 - MSE/(SST/(n - 1)) = 1 - (1 - R²)(n - 1)/DFE
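The following sketch uses the standard definition Adjusted R² = 1 - MSE/(SST/(n - 1)), with MSE = SSE/DFE and DFE = n - k - 1. It shows how adding x's can raise R² while lowering adjusted R². Both fits are hypothetical: a model with k = 3 and R² = .442197, and a model with k = 10 whose R² rises only to .46, each with n = 38. `adjusted_r_squared` is an illustrative name.

```python
def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R² = 1 - MSE/(SST/(n - 1)), with MSE = SSE/DFE and DFE = n - k - 1."""
    dfe = n - k - 1
    mse = sse / dfe
    return 1.0 - mse / (sst / (n - 1))


n = 38
# Hypothetical fits: the model with more x's has a higher R² but a lower adjusted R².
small = adjusted_r_squared(sse=557.803, sst=1000.0, n=n, k=3)    # R² = .442197
large = adjusted_r_squared(sse=540.0, sst=1000.0, n=n, k=10)     # R² = .46
print(f"k = 3:  R² = {1 - 557.803 / 1000:.4f}, adjusted R² = {small:.4f}")
print(f"k = 10: R² = {1 - 540.0 / 1000:.4f}, adjusted R² = {large:.4f}")
```

The extra x's shrink DFE from 34 to 27. That raises MSE enough to outweigh the small drop in SSE.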

Page 23

[Scatterplot: Sales vs. Ad Dollars, with Ad Dollars on the horizontal axis and Sales on the vertical axis]

This is not what is really going on.

Page 24

Review

• Are any of the x’s useful in predicting y IN THIS MODEL?
– Look at the p-value for the F-test: Significance F
– F = MSR/MSE would be compared to Fα,DFR,DFE

• Which variables are significant in this model?
– Look at the p-values for the individual t-tests

• What proportion of the total variance in y can be explained by changes in the x’s?
– R²
– Adjusted R² takes into account the reduced degrees of freedom for the error term that results from including more terms in the model

Page 25

4 Places to Look on Excel Printout

1- Regression equation
2- Significance F: Are any variables useful?
3- p-values for t-tests: Which variables are significant in this model?
4- R²: What proportion of the variability in y can be explained by changes in the x’s?