10.1
Specification Error
Feb 09, 2016
10.2
Aims and Learning Objectives
By the end of this session students should be able to:
• Understand the causes and consequences of multicollinearity
• Analyse regression results for possible multicollinearity
• Understand the nature of endogeneity
• Analyse regression results for possible endogeneity
10.3
Introduction
In this lecture we consider what happens when we violate Assumption 7:
No exact collinearity or perfect multicollinearity among the explanatory variables
and Assumption 3:
$\text{Cov}(U_i, X_{2i}) = \text{Cov}(U_i, X_{3i}) = \dots = \text{Cov}(U_i, X_{ki}) = 0$
10.4
What is Multicollinearity?
Definitions
Perfect Multicollinearity: an exact linear relationship between two or more explanatory variables
Imperfect Multicollinearity: two or more explanatory variables are approximately linearly related
The term “independent variable” means that an explanatory variable is independent of the error term, but not necessarily independent of the other explanatory variables.
10.5
Example: Perfect Multicollinearity
Suppose we want to estimate the following model:
$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + U_i$

If there is an exact linear relationship between $X_2$ and $X_3$, for example

$X_{2i} = 10 + 5X_{3i}$

then we cannot estimate the individual partial regression coefficients.
10.6
This is because substituting the last expression into the first we get:

$Y_i = \beta_1 + \beta_2(10 + 5X_{3i}) + \beta_3 X_{3i} + U_i = (\beta_1 + 10\beta_2) + (5\beta_2 + \beta_3)X_{3i} + U_i$

If we let

$A_1 = \beta_1 + 10\beta_2; \quad A_2 = 5\beta_2 + \beta_3$

then

$Y_i = A_1 + A_2 X_{3i} + U_i$

so OLS can recover only the composite coefficients $A_1$ and $A_2$, not the individual $\beta$'s.
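To make this concrete, here is a minimal numerical sketch (hypothetical values, Python with numpy assumed): because $X_2$ is an exact linear function of $X_3$, the columns of the regressor matrix are linearly dependent, $X'X$ is singular, and OLS has no unique solution.

```python
import numpy as np

# Hypothetical values satisfying the exact relationship X2 = 10 + 5*X3
X3 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = 10 + 5 * X3
X = np.column_stack([np.ones(5), X2, X3])   # [constant, X2, X3]

print(np.linalg.matrix_rank(X))   # 2, not 3: the columns are dependent
print(np.linalg.det(X.T @ X))     # effectively zero: X'X is singular
```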
10.7
Example: Imperfect Multicollinearity
Although perfect multicollinearity is theoretically possible, in practice imperfect multicollinearity is what we commonly observe.
Typical examples of perfect multicollinearity arise when the researcher makes a mistake, such as including the same variable twice or forgetting to omit the default category for a set of dummy variables.
10.8
Consequences of Multicollinearity
OLS remains BLUE, but there are some adverse practical consequences:
1. No OLS output when multicollinearity is exact.
2. Large standard errors and wide confidence intervals.
3. Estimators are sensitive to the deletion or addition of a few observations or “insignificant” variables (i.e. non-robust).
4. Estimators may have the “wrong” sign.
10.9
Detecting Multicollinearity
There are no formal “tests” for multicollinearity, but several diagnostics are useful:
1. Few significant t-ratios but a high $R^2$ and collective significance of the variables
2. High pairwise correlations between the explanatory variables
3. Examination of partial correlations
4. Estimation of auxiliary regressions
5. Estimation of the variance inflation factor (VIF)
10.10
Auxiliary Regressions
Auxiliary regressions: regress each explanatory variable on the remaining explanatory variables. For a model with explanatory variables $X_2$, $X_3$ and $X_4$:

$X_{2i} = a_1 + a_2 X_{3i} + a_3 X_{4i}$
$X_{3i} = b_1 + b_2 X_{2i} + b_3 X_{4i}$
$X_{4i} = c_1 + c_2 X_{2i} + c_3 X_{3i}$

The $R^2$ from each auxiliary regression shows how strongly $X_{ji}$ is collinear with the other explanatory variables, as the sketch below illustrates.
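A minimal sketch of the auxiliary-regression diagnostic in Python with statsmodels, using simulated (hypothetical) data in which $X_2$ is nearly a linear combination of $X_3$ and $X_4$:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data: X2 is almost an exact linear combination of X3 and X4
rng = np.random.default_rng(1)
df = pd.DataFrame({"X3": rng.normal(size=50), "X4": rng.normal(size=50)})
df["X2"] = 2 * df["X3"] + df["X4"] + rng.normal(scale=0.05, size=50)

for var in ["X2", "X3", "X4"]:
    others = [v for v in ["X2", "X3", "X4"] if v != var]
    # Regress each explanatory variable on the remaining ones
    aux = sm.OLS(df[var], sm.add_constant(df[others])).fit()
    print(var, round(aux.rsquared, 3))   # a high R-squared flags collinearity
```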
10.11
Variance Inflation Factor
In the two-variable model (bivariate regression) the variance of the OLS estimator was:

$\text{var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_i^2}$

where $x_i = X_i - \bar{X}$, so that $\sum x_i^2 = \sum (X_i - \bar{X})^2$.

Extending this to the case of more than two variables leads to the formulae laid out in Lecture 5, or alternatively:

$\text{var}(\hat{\beta}_j) = \dfrac{\sigma^2}{\sum x_{ji}^2 \,(1 - R_j^2)}$

where $R_j^2$ is the $R^2$ from the auxiliary regression of $X_j$ on the other explanatory variables; $1/(1 - R_j^2)$ is the variance inflation factor (VIF).
10.12
Example: Imperfect Multicollinearity
Hypothetical data on weekly family consumption expenditure (CON), weekly family income (INC) and wealth (WLTH):

Obs   CON   INC   WLTH
  1    70    80    810
  2    65   100   1009
  3    90   120   1273
  4    95   140   1425
  5   110   160   1633
  6   115   180   1876
  7   120   200   2052
  8   140   220   2201
  9   155   240   2435
 10   150   260   2686
10.13
Regression Results:

CON = 24.775 + 0.942 INC − 0.0424 WLTH
        (3.669)   (1.1442)    (−0.526)

(t-ratios in parentheses)  $R^2$ = 0.964,  ESS = 8,565.554,  RSS = 324.446,  F = 92.349

$R^2$ is high (96%) and wealth has the wrong sign, yet neither slope coefficient is individually statistically significant. The joint hypothesis (F-test), however, is significant.
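As a check, a sketch reproducing this regression in Python with statsmodels, using the ten observations from the table above:

```python
import pandas as pd
import statsmodels.api as sm

# The hypothetical consumption/income/wealth data from the table above
df = pd.DataFrame({
    "CON":  [70, 65, 90, 95, 110, 115, 120, 140, 155, 150],
    "INC":  [80, 100, 120, 140, 160, 180, 200, 220, 240, 260],
    "WLTH": [810, 1009, 1273, 1425, 1633, 1876, 2052, 2201, 2435, 2686],
})
model = sm.OLS(df["CON"], sm.add_constant(df[["INC", "WLTH"]])).fit()
print(model.summary())   # high R-squared, but small t-ratios on both slopes
```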
10.14
Auxiliary Regression Results:
INC = −0.386 + 0.098 WLTH
        (−0.133)   (62.04)

(t-ratios in parentheses)  $R^2$ = 0.998,  F = 3849
Variance Inflation Factor:
$\text{VIF} = \dfrac{1}{1 - R_j^2} = \dfrac{1}{1 - 0.998} = 500$
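The same figure can be obtained directly with statsmodels' variance_inflation_factor, reusing the DataFrame df from the regression sketch above:

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df[["INC", "WLTH"]])   # df from the regression sketch above
for j in (1, 2):                           # skip the constant at index 0
    # VIF_j = 1 / (1 - R_j^2); values near 500 match the calculation above
    print(X.columns[j], variance_inflation_factor(X.values, j))
```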
10.15
Remedying Multicollinearity
High multicollinearity occurs because of a lack of adequate information in the sample
1. Collect more data with better information.
2. Perform robustness checks.
3. If all else fails, at least point out that the poor model performance might be due to the multicollinearity problem (or it might not).
10.16
The Nature of Endogenous Explanatory Variables
In real world applications we distinguish between:
• Exogenous (pre-determined) Variables
• Endogenous (jointly determined) Variables
When one or more explanatory variable is endogenous, there is implicitly a system of simultaneous equations.
10.17
Example: Endogeneity
$\ln W_i = \beta_1 + \beta_2 S_i + \beta_3 E_i + \beta_4 E_i^2 + U_i$

where $W_i$ is the wage, $S_i$ schooling (education) and $E_i$ experience.
But schooling depends on unobserved ability $A_i$:

$S_i = \alpha_1 + \alpha_2 A_i + V_i$

Since ability also affects wages and is omitted from the model, it forms part of $U_i$; therefore $\text{Cov}(S_i, U_i) \neq 0$.
OLS estimation of the relationship between W and S gives “credit” to education for changes in the disturbances. The resulting OLS estimator is biased upwards (since $\text{Cov}(S_i, U_i) > 0$) and, because the problem persists even in large samples, the estimator is also inconsistent. The simulation sketch below illustrates the bias.
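A small simulation sketch of this upward bias (hypothetical parameter values): unobserved ability raises both schooling and wages, so the OLS coefficient on schooling overshoots its true value.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100_000
A = rng.normal(size=n)            # unobserved ability, part of the error term
S = 12 + A + rng.normal(size=n)   # schooling depends on ability: Cov(S, U) > 0
lnW = 1.0 + 0.08 * S + 0.10 * A + rng.normal(scale=0.5, size=n)

ols = sm.OLS(lnW, sm.add_constant(S)).fit()
print(ols.params[1])   # noticeably above the true value of 0.08
```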
10.18
Remedies for Endogeneity
Two options:
• Try to find a suitable proxy for the unobserved variable
• Leave the unobserved variable in the error term but use an instrument for the endogenous explanatory variable (involves a different estimation technique)
10.19
Example
$\ln W_i = \beta_1 + \beta_2 S_i + \beta_3 E_i + \beta_4 E_i^2 + U_i$  and  $S_i = \alpha_1 + \alpha_2 A_i + V_i$

Option 1: Include a proxy for ability:

$\ln W_i = \beta_1 + \beta_2 S_i + \beta_3 E_i + \beta_4 E_i^2 + \beta_5 A_i + U_i$

Option 2: Find an instrument $Z$ for education. It needs to have the following properties: $\text{Cov}(Z, U) = 0$ and $\text{Cov}(Z, S) \neq 0$.
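A sketch of the instrument-based remedy on simulated (hypothetical) data, with $M$ playing the role of $Z$: manual two-stage least squares using statsmodels. Note that the second-stage standard errors printed this way are not the correct 2SLS ones; a dedicated IV routine would adjust them.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data: ability A drives both schooling and wages; the instrument M
# shifts schooling but is unrelated to the wage-equation error
rng = np.random.default_rng(2)
n = 5_000
A = rng.normal(size=n)                       # unobserved ability
M = rng.normal(size=n)                       # instrument: Cov(M, U) = 0, Cov(M, S) != 0
E = rng.uniform(0, 30, size=n)               # experience
S = 12 + A + 0.5 * M + rng.normal(size=n)    # schooling (endogenous)
lnW = 1 + 0.08 * S + 0.04 * E - 0.001 * E**2 + 0.10 * A + rng.normal(scale=0.5, size=n)
df = pd.DataFrame({"lnW": lnW, "S": S, "E": E, "Esq": E**2, "M": M})

# Stage 1: reduced form for S; Stage 2: replace S with its fitted values
stage1 = sm.OLS(df["S"], sm.add_constant(df[["E", "Esq", "M"]])).fit()
df["Shat"] = stage1.fittedvalues
stage2 = sm.OLS(df["lnW"], sm.add_constant(df[["Shat", "E", "Esq"]])).fit()
print(stage2.params["Shat"])   # close to the true 0.08, unlike plain OLS
```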
10.20
Hausman Test for Endogeneity
Suppose we wish to test whether S is uncorrelated with U in the structural equation:

$\ln W_i = \beta_1 + \beta_2 S_i + \beta_3 E_i + \beta_4 E_i^2 + U_i$

Stage 1: Estimate the reduced form:

$S_i = \alpha_1 + \alpha_2 E_i + \alpha_3 E_i^2 + \alpha_4 M_i + V_i$

where $M_i$ is the instrument.

Stage 2: Add the Stage 1 residuals $\hat{V}_i$ to the structural equation and test the significance of $\hat{V}_i$.

Decision rule: if $\hat{V}_i$ is significant, reject the null hypothesis of exogeneity.
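Reusing the simulated DataFrame df from the 2SLS sketch above, the control-function form of the test looks like this:

```python
# Stage 1: reduced form residuals for the suspect regressor S
stage1 = sm.OLS(df["S"], sm.add_constant(df[["E", "Esq", "M"]])).fit()
df["Vhat"] = stage1.resid

# Stage 2: add the residuals to the structural equation; a significant
# t-ratio on Vhat rejects the null hypothesis that S is exogenous
stage2 = sm.OLS(df["lnW"], sm.add_constant(df[["S", "E", "Esq", "Vhat"]])).fit()
print(stage2.tvalues["Vhat"])
```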
10.21
Summary
In this lecture we have:
1. Outlined the theoretical and practical consequences of multicollinearity
2. Described a number of procedures for detecting the presence of multicollinearity
3. Outlined the basic consequences of endogeneity
4. Outlined a procedure for detecting the presence of endogeneity