
Multiple Regression (SW Ch. 6)

Dec 31, 2015

Page 1: Multiple Regression  (SW Ch. 6)

Multiple Regression (SW Ch. 6)

1. Omitted variable bias

2. Causality and regression analysis

3. Multiple regression and OLS

4. Measures of fit

5. Sampling distribution of the OLS estimator

6. Multicollinearity

Page 2: Multiple Regression  (SW Ch. 6)

The Least Squares Assumptions

Page 3: Multiple Regression  (SW Ch. 6)

LSA #1: $E(u \mid X = x) = 0$

Page 4: Multiple Regression  (SW Ch. 6)

LSA #2: $(X_i, Y_i)$, $i = 1, \dots, n$, are i.i.d.

LSA #3: $E(X^4) < \infty$ and $E(Y^4) < \infty$

Page 5: Multiple Regression  (SW Ch. 6)

Sampling Distribution of $\hat{\beta}_1$

Page 6: Multiple Regression  (SW Ch. 6)

Measures of Fit

Page 7: Multiple Regression  (SW Ch. 6)

Measures of Fit

Page 8: Multiple Regression  (SW Ch. 6)

Measures of Fit: example

Page 9: Multiple Regression  (SW Ch. 6)

Measures of Fit

• Akaike’s Information Criterion (AIC) is an alternative method for adjusting the residual sum of squares for the sample size (n) and number of covariates (k)

• Is the improved fit “worth” it?
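
As a rough guide to what the criterion computes, the block below gives the standard definitions, which reproduce the estat ic figures in the caschool example. The symbols ln L (the maximized log likelihood), p, and SSR are notation introduced here, not taken from the slides; p is the number of estimated coefficients, so p = k + 1 when the regression includes an intercept.

```latex
\begin{align*}
\mathrm{AIC} &= -2\ln L + 2p, \\
\mathrm{BIC} &= -2\ln L + p\,\ln n, \\
-2\ln L &= n\left[1 + \ln(2\pi) + \ln\!\left(\frac{SSR}{n}\right)\right]
  \quad \text{(linear regression with normal errors)}.
\end{align*}
```

Adding a regressor lowers SSR, and hence −2 ln L, but raises the penalty 2p. By this criterion the extra variable is “worth” it only if AIC falls.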

Page 10: Multiple Regression  (SW Ch. 6)

Example: caschool.dta

. reg testscr str, rob

Linear regression                                      Number of obs =     420
                                                       F(  1,   418) =   19.26
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0512
                                                       Root MSE      =  18.581

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -2.279808   .5194892    -4.39   0.000    -3.300945   -1.258671
       _cons |    698.933   10.36436    67.44   0.000     678.5602    719.3057
------------------------------------------------------------------------------

. estat ic

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           . |    420   -1833.296    -1822.25      2     3648.499     3656.58
-----------------------------------------------------------------------------

Page 11: Multiple Regression  (SW Ch. 6)

Example: caschool.dta

. reg testscr str el_pct, rob

Linear regression                                      Number of obs =     420
                                                       F(  2,   417) =  223.82
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4264
                                                       Root MSE      =  14.464

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -1.101296   .4328472    -2.54   0.011     -1.95213   -.2504616
      el_pct |  -.6497768   .0310318   -20.94   0.000     -.710775   -.5887786
       _cons |   686.0322   8.728224    78.60   0.000     668.8754     703.189
------------------------------------------------------------------------------

. estat ic

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           . |    420   -1833.296   -1716.561      3     3439.123    3451.243
-----------------------------------------------------------------------------
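
The AIC and BIC columns can be checked by hand from ll(model) and df. The lines below are a minimal sketch, assuming the standard definitions AIC = −2·ll + 2·df and BIC = −2·ll + df·ln(Obs); the numbers are copied from the two estat ic tables for the caschool regressions.

```stata
* Assumed definitions: AIC = -2*ll(model) + 2*df ; BIC = -2*ll(model) + df*ln(Obs)
* All inputs are typed in from the estat ic output, not recomputed from the data.

* Model 1: testscr on str (df = 2)
display "AIC (str only)    = " (-2*(-1822.25) + 2*2)
display "BIC (str only)    = " (-2*(-1822.25) + 2*ln(420))

* Model 2: testscr on str and el_pct (df = 3)
display "AIC (str, el_pct) = " (-2*(-1716.561) + 2*3)
display "BIC (str, el_pct) = " (-2*(-1716.561) + 3*ln(420))
```

The displayed values should match the AIC and BIC columns up to rounding. Both criteria fall when el_pct is added (AIC from 3648.499 to 3439.123), so by this measure the improved fit outweighs the penalty for the extra coefficient.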

Page 12: Multiple Regression  (SW Ch. 6)

The Least Squares Assumptions for Multiple Regression

Page 13: Multiple Regression  (SW Ch. 6)

Page 14: Multiple Regression  (SW Ch. 6)

Page 15: Multiple Regression  (SW Ch. 6)

. gen incq1 = 1 if avginc <10.639
(314 missing values generated)

. replace incq1 = 0 if avginc>=10.639 & avginc < .
(314 real changes made)

. gen incq2 = 1 if avginc < 13.727 & avginc >=10.639
(316 missing values generated)

. replace incq2 = 0 if avginc < 10.639 & avginc >= 13.727 & avginc < .
(0 real changes made)

. replace incq2 = 0 if avginc < 10.639 | (avginc >= 13.727 & avginc < .)
(316 real changes made)

. gen incq3 = 1 if avginc < 17.638 & avginc >=13.727
(315 missing values generated)

. replace incq3 = 0 if avginc < 13.727 | (avginc >= 17.638 & avginc < .)
(315 real changes made)

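Note: Stata treats a missing value (“.”) as +∞, i.e., larger than any number, which is why the conditions include “& avginc < .”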

. gen incq4 = 1 if avginc >= 17.638 & avginc < .
(315 missing values generated)

. replace incq4 = 0 if avginc < 17.638
(315 real changes made)

. gen testdum = incq1 + incq2 + incq3 + incq4

. sum avginc inc* testdum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      avginc |       420    15.31659     7.22589      5.335     55.328
       incq1 |       420     .252381    .4348967          0          1
       incq2 |       420     .247619    .4321441          0          1
       incq3 |       420         .25    .4335291          0          1
       incq4 |       420         .25    .4335291          0          1
-------------+--------------------------------------------------------
     testdum |       420           1           0          1          1
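
A more compact way to build the same four indicators, shown as a sketch: in Stata a logical expression evaluates to 1 or 0, so each dummy can be created in one line. It assumes the same cut points as the log above and is meant to be run instead of the gen/replace pairs, since the variable names are reused. It also avoids the kind of slip seen in the first replace for incq2, where & in place of | produced 0 real changes.

```stata
* Run these INSTEAD of the gen/replace pairs (same variable names are reused).
* Each comparison evaluates to 1 (true) or 0 (false); the if-qualifier leaves
* the dummy missing whenever avginc is missing, rather than coding it 0 or 1.
generate byte incq1 = (avginc < 10.639)                    if !missing(avginc)
generate byte incq2 = (avginc >= 10.639 & avginc < 13.727) if !missing(avginc)
generate byte incq3 = (avginc >= 13.727 & avginc < 17.638) if !missing(avginc)
generate byte incq4 = (avginc >= 17.638)                   if !missing(avginc)

* Every district should fall in exactly one quartile, so the dummies sum to 1
generate testdum = incq1 + incq2 + incq3 + incq4
assert testdum == 1 if !missing(avginc)
```

Because the four dummies always sum to 1, they are perfectly collinear with the constant term.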

Page 16: Multiple Regression  (SW Ch. 6)

Dummy Variable Trap

. reg testscr str incq1 incq2 incq3 incq4, robust
note: incq3 omitted because of collinearity

Linear regression                                      Number of obs =     420
                                                       F(  4,   415) =   72.03
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4468
                                                       Root MSE      =   14.24

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -1.417963    .400663    -3.54   0.000    -2.205545   -.6303814
       incq1 |  -16.97711   1.953708    -8.69   0.000    -20.81751   -13.13672
       incq2 |  -6.795768    1.83231    -3.71   0.000    -10.39753   -3.194003
       incq3 |  (omitted)
       incq4 |   16.17749   1.880508     8.60   0.000     12.48098    19.87399
       _cons |    683.929   8.136528    84.06   0.000     667.9351     699.923
------------------------------------------------------------------------------

• Solution #1 is to …
• Interpretation is then …

Page 17: Multiple Regression  (SW Ch. 6)

Dummy Variable Trap

. reg testscr str incq1 incq2 incq3 incq4, robust noconstant

Linear regression                                      Number of obs =     420
                                                       F(  5,   415) =       .
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9995
                                                       Root MSE      =   14.24

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -1.417963    .400663    -3.54   0.000    -2.205545   -.6303814
       incq1 |   666.9519   7.862759    84.82   0.000     651.4961    682.4077
       incq2 |   677.1333   7.931178    85.38   0.000      661.543    692.7236
       incq3 |    683.929   8.136528    84.06   0.000     667.9351     699.923
       incq4 |   700.1065   8.014253    87.36   0.000     684.3529    715.8601
------------------------------------------------------------------------------

• Solution #2 is to …
• Interpretation is then …
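
Setting the two Dummy Variable Trap outputs side by side suggests how the parameterizations relate. The sketch below uses notation introduced here, not taken from the slides: β0 for _cons, γ_j for the coefficient on incq_j in the regression with a constant, and δ_j for the coefficient on incq_j in the noconstant regression. It takes incq3, the variable Stata dropped in the regression with a constant, as the base category.

```latex
\begin{align*}
\delta_3 &= \beta_0 = 683.929, \\
\delta_j &= \beta_0 + \gamma_j \qquad (j = 1, 2, 4), \\
\delta_1 &\approx 683.929 - 16.977 = 666.952, \\
\delta_2 &\approx 683.929 - 6.796 = 677.133, \\
\delta_4 &\approx 683.929 + 16.177 = 700.106.
\end{align*}
```

The str coefficient and the Root MSE are identical in both runs. The R-squared of 0.9995 is not comparable with 0.4468, because with noconstant Stata reports an uncentered R-squared.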

Page 18: Multiple Regression  (SW Ch. 6)

The Sampling Distribution of the OLS Estimator in Multiple Regression

Page 19: Multiple Regression  (SW Ch. 6)

Imperfect Multicollinearity

Page 20: Multiple Regression  (SW Ch. 6)

Detection and Remedies for Imperfect Multicollinearity

• Detection
  · Calculate all the pairwise correlation coefficients
    · a correlation above .7 or .8 in absolute value is some cause for concern
  · Variance Inflation Factors (VIFs) can be calculated
  · Hallmark is a high R² but insignificant t-statistics

• Remedy
  · Do nothing
  · Drop a variable
  · Transform the multicollinear variables
    · need to have the same sign and magnitudes
  · Get more data (i.e., increase the sample size)
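
In Stata, the detection steps might look like the sketch below. The choice of regressors (str, el_pct, avginc) is an assumption made for illustration, using variables that appear in the caschool examples; the slides do not show this specification.

```stata
* Hypothetical specification, for illustration only

* Pairwise correlations among the regressors
correlate str el_pct avginc

* VIFs are reported after an OLS fit; they depend only on the regressors,
* so the choice of standard errors does not affect them
regress testscr str el_pct avginc
estat vif
```

For regressor j, VIF_j = 1/(1 − R_j²), where R_j² is the R-squared from regressing X_j on the other regressors, so a VIF far above 1 means X_j is largely explained by the remaining variables.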