Multiple Regression (SW Ch. 6)

Post on 31-Dec-2015


1

Multiple Regression (SW Ch. 6)

1. Omitted variable bias

2. Causality and regression analysis

3. Multiple regression and OLS

4. Measures of fit

5. Sampling distribution of the OLS estimator

6. Multicollinearity

2

The Least Squares Assumptions

3

LSA #1: E(u|X = x) = 0

4

LSA #2: (X_i, Y_i), i = 1,…,n, are i.i.d.

LSA #3: E(X^4) < ∞ and E(Y^4) < ∞

5

Sampling Distribution of β̂_1

6

Measures of Fit

7

Measures of Fit

8

Measures of Fit: example

9

Measures of Fit

• Akaike’s Information Criterion (AIC) is an alternative method for adjusting the residual sum of squares for the sample size (n) and number of covariates (k)

• Is the improved fit “worth” it?

10

Example: caschool.dta

. reg testscr str, rob

Linear regression                               Number of obs     =        420
                                                F( 1, 418)        =      19.26
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0512
                                                Root MSE          =     18.581

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -2.279808   .5194892    -4.39   0.000    -3.300945   -1.258671
       _cons |    698.933   10.36436    67.44   0.000     678.5602    719.3057
------------------------------------------------------------------------------

. estat ic

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           . |    420   -1833.296    -1822.25      2     3648.499     3656.58
-----------------------------------------------------------------------------
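The AIC and BIC in the estat ic table above can be reproduced by hand. The following is a minimal sketch in Python (not part of the original Stata session). It assumes the usual definitions AIC = -2·ll + 2·df and BIC = -2·ll + df·ln(n), and the Gaussian log likelihood of a linear regression, ll = -(n/2)·(ln(2π) + ln(SSR/n) + 1). The function names are illustrative only.

```python
import math

def gaussian_loglik(ssr, n):
    """Maximized Gaussian log likelihood of a linear regression
    with residual sum of squares ssr and n observations."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(ssr / n) + 1)

def aic(ll, df):
    """Akaike's Information Criterion: -2*ll plus 2 per estimated coefficient."""
    return -2 * ll + 2 * df

def bic(ll, df, n):
    """Bayesian Information Criterion: the penalty per coefficient is ln(n)."""
    return -2 * ll + df * math.log(n)

# Values read off the output above: n = 420, df = 2 (slope and constant),
# ll(model) = -1822.25, Root MSE = 18.581 on n - 2 = 418 residual d.o.f.
n, df = 420, 2
ll_reported = -1822.25
ssr = 18.581 ** 2 * (n - df)   # SSR recovered from the (rounded) Root MSE

print(round(aic(ll_reported, df), 2))      # compare with the reported AIC
print(round(bic(ll_reported, df, n), 2))   # compare with the reported BIC
print(round(gaussian_loglik(ssr, n), 2))   # compare with the reported ll(model)
```

Because Root MSE is printed to three decimals, the log likelihood recovered from it agrees with the reported value only up to rounding.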

11

Example: caschool.dta

. reg testscr str el_pct, rob

Linear regression                               Number of obs     =        420
                                                F( 2, 417)        =     223.82
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4264
                                                Root MSE          =     14.464

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -1.101296   .4328472    -2.54   0.011     -1.95213   -.2504616
      el_pct |  -.6497768   .0310318   -20.94   0.000     -.710775   -.5887786
       _cons |   686.0322   8.728224    78.60   0.000     668.8754     703.189
------------------------------------------------------------------------------

. estat ic

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           . |    420   -1833.296   -1716.561      3     3439.123    3451.243
-----------------------------------------------------------------------------
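One way to approach the question of whether the improved fit is "worth" it is to compare penalized measures of fit across the two regressions. The sketch below (Python, not part of the original session) uses only the R-squared and AIC values reported in the two outputs, and assumes the standard adjusted R-squared formula 1 - (n - 1)/(n - k - 1)·(1 - R²) with k regressors plus a constant. The model labels are illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a regression with k regressors plus a constant."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

n = 420
# (R-squared, number of regressors k, AIC) as reported in the two outputs
models = {
    "testscr ~ str":          (0.0512, 1, 3648.499),
    "testscr ~ str + el_pct": (0.4264, 2, 3439.123),
}

for name, (r2, k, aic_value) in models.items():
    print(f"{name}: adj. R2 = {adjusted_r2(r2, n, k):.4f}, AIC = {aic_value}")

# A lower AIC indicates a better trade-off between fit and model size
aic_drop = models["testscr ~ str"][2] - models["testscr ~ str + el_pct"][2]
print(f"AIC falls by {aic_drop:.3f} when el_pct is added")
```

With these inputs the adjusted R-squared rises from about 0.049 to about 0.424 and the AIC falls by about 209, so both criteria favor including el_pct.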

12

The Least Squares Assumptions for Multiple Regression

13

14

15

. gen incq1 = 1 if avginc <10.639
(314 missing values generated)

. replace incq1 = 0 if avginc>=10.639 & avginc < .
(314 real changes made)

. gen incq2 = 1 if avginc < 13.727 & avginc >=10.639
(316 missing values generated)

. replace incq2 = 0 if avginc < 10.639 & avginc >= 13.727 & avginc < .
(0 real changes made)

. replace incq2 = 0 if avginc < 10.639 | (avginc >= 13.727 & avginc < .)
(316 real changes made)

. gen incq3 = 1 if avginc < 17.638 & avginc >=13.727
(315 missing values generated)

. replace incq3 = 0 if avginc < 13.727 | (avginc >= 17.638 & avginc < .)
(315 real changes made)

“.” (a missing value) is treated as +∞ in Stata, hence the “avginc < .” conditions

. gen incq4 = 1 if avginc >= 17.638 & avginc < .
(315 missing values generated)

. replace incq4 = 0 if avginc < 17.638
(315 real changes made)

. gen testdum = incq1 + incq2 + incq3 + incq4

. sum avginc inc* testdum

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      avginc |        420    15.31659     7.22589      5.335     55.328
       incq1 |        420     .252381    .4348967          0          1
       incq2 |        420     .247619    .4321441          0          1
       incq3 |        420         .25    .4335291          0          1
       incq4 |        420         .25    .4335291          0          1
-------------+---------------------------------------------------------
     testdum |        420           1           0          1          1
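The summary of testdum shows that the four quartile dummies sum to 1 for every observation, which is exactly the value of the constant regressor. A minimal Python sketch of the same construction follows. It runs on synthetic income data, not the caschool data, so the cut points differ from 10.639, 13.727 and 17.638; all variable names mirror the Stata log for readability only.

```python
import random
import statistics

random.seed(0)
n = 420
# Synthetic stand-in for avginc (NOT the caschool data)
avginc = [random.lognormvariate(2.6, 0.4) for _ in range(n)]

# Quartile cut points, playing the role of 10.639, 13.727 and 17.638
c1, c2, c3 = statistics.quantiles(avginc, n=4)

# One 0/1 indicator per income quartile (analogue of incq1..incq4)
incq1 = [1 if x < c1 else 0 for x in avginc]
incq2 = [1 if c1 <= x < c2 else 0 for x in avginc]
incq3 = [1 if c2 <= x < c3 else 0 for x in avginc]
incq4 = [1 if x >= c3 else 0 for x in avginc]

# Analogue of testdum: exactly one dummy equals 1 for every observation
testdum = [a + b + c + d for a, b, c, d in zip(incq1, incq2, incq3, incq4)]
print(min(testdum), max(testdum))

# The constant regressor is 1 for every observation, so
# constant - incq1 - incq2 - incq3 - incq4 is identically zero:
# an exact linear dependence among the regressors (perfect multicollinearity).
constant = [1] * n
dependence = [k - t for k, t in zip(constant, testdum)]
print(all(v == 0 for v in dependence))
```

Because one regressor is an exact linear combination of the others, OLS cannot separate their coefficients when the constant and all four dummies are included together.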

16

Dummy Variable Trap

. reg testscr str incq1 incq2 incq3 incq4, robust
note: incq3 omitted because of collinearity

Linear regression                               Number of obs     =        420
                                                F( 4, 415)        =      72.03
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4468
                                                Root MSE          =      14.24

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -1.417963    .400663    -3.54   0.000    -2.205545   -.6303814
       incq1 |  -16.97711   1.953708    -8.69   0.000    -20.81751   -13.13672
       incq2 |  -6.795768    1.83231    -3.71   0.000    -10.39753   -3.194003
       incq3 |  (omitted)
       incq4 |   16.17749   1.880508     8.60   0.000     12.48098    19.87399
       _cons |    683.929   8.136528    84.06   0.000     667.9351     699.923
------------------------------------------------------------------------------

• Solution #1 is to …
• Interpretation is then …

17

Dummy Variable Trap

. reg testscr str incq1 incq2 incq3 incq4, robust noconstant

Linear regression                               Number of obs     =        420
                                                F( 5, 415)        =          .
                                                Prob > F          =     0.0000
                                                R-squared         =     0.9995
                                                Root MSE          =      14.24

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -1.417963    .400663    -3.54   0.000    -2.205545   -.6303814
       incq1 |   666.9519   7.862759    84.82   0.000     651.4961    682.4077
       incq2 |   677.1333   7.931178    85.38   0.000      661.543    692.7236
       incq3 |    683.929   8.136528    84.06   0.000     667.9351     699.923
       incq4 |   700.1065   8.014253    87.36   0.000     684.3529    715.8601
------------------------------------------------------------------------------

• Solution #2 is to …
• Interpretation is then …
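The two regressions report the same str coefficient and the same Root MSE, which suggests they are two parameterizations of one fitted model. As a quick arithmetic check (a Python sketch using only coefficients copied from the two outputs; the dictionary names are illustrative), each dummy coefficient from the no-constant regression can be compared with _cons plus the corresponding dummy coefficient from the regression that includes a constant:

```python
# Coefficients copied from the regression WITH a constant (incq3 omitted)
with_constant = {
    "_cons": 683.929,
    "incq1": -16.97711,
    "incq2": -6.795768,
    "incq3": 0.0,        # omitted category: its level is absorbed by _cons
    "incq4": 16.17749,
}

# Coefficients copied from the regression with the noconstant option
no_constant = {
    "incq1": 666.9519,
    "incq2": 677.1333,
    "incq3": 683.929,
    "incq4": 700.1065,
}

for name, coef in no_constant.items():
    implied = with_constant["_cons"] + with_constant[name]
    print(f"{name}: reported {coef:.4f}, _cons + dummy coefficient = {implied:.4f}")
```

The two columns agree to the printed precision, so the choice between the two specifications changes how the coefficients are read, not the fitted values.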

18

The Sampling Distribution of the OLS Estimator in Multiple Regression

19

Imperfect Multicollinearity

20

Detection and Remedies for Imperfect Multicollinearity

• Detection
  · Calculate all the pairwise correlation coefficients
    · A correlation above .7 or .8 (in absolute value) is some cause for concern
  · Variance Inflation Factors (VIFs) can be calculated
  · Hallmark is a high R² but insignificant t-statistics

• Remedy
  · Do nothing
  · Drop a variable
  · Transform multicollinear variables
    · Need to have same sign and magnitudes
  · Get more data (i.e., increase the sample size)
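The two detection tools listed above can be illustrated with a short Python sketch. It runs on synthetic regressors, not the caschool data, and covers only the special case of a model with exactly two regressors, where the VIF reduces to 1 / (1 - r²) with r the pairwise correlation. In general the VIF for regressor j is 1 / (1 - R_j²), with R_j² taken from a regression of regressor j on all the other regressors. The function and variable names are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Sample correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_regressors(x1, x2):
    """VIF when the model has exactly two regressors: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)

random.seed(1)
n = 420
# Synthetic regressors (NOT the caschool data): x2 is x1 plus a little noise
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.3) for a in x1]
x3 = [random.gauss(0, 1) for _ in range(n)]   # unrelated regressor for contrast

print(f"corr(x1, x2) = {pearson(x1, x2):.3f}, VIF = {vif_two_regressors(x1, x2):.1f}")
print(f"corr(x1, x3) = {pearson(x1, x3):.3f}, VIF = {vif_two_regressors(x1, x3):.1f}")
```

The highly correlated pair produces a large VIF, while the unrelated pair produces a VIF close to 1, its minimum value.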
