10.1
Specification Error
Feb 09, 2016
10.2
Aims and Learning Objectives
By the end of this session students should be able to:
• Understand the causes and consequences of multicollinearity
• Analyse regression results for possible multicollinearity
• Understand the nature of endogeneity
• Analyse regression results for possible endogeneity
10.3
Introduction
In this lecture we consider what happens when we violate Assumption 7:
No exact collinearity or perfect multicollinearity among the explanatory variables
and Assumption 3:
$\text{Cov}(U_i, X_{2i}) = \text{Cov}(U_i, X_{3i}) = \dots = \text{Cov}(U_i, X_{ki}) = 0$
10.4
What is Multicollinearity?
Definitions
Perfect Multicollinearity: an exact linear relationship between two or more explanatory variables
Imperfect Multicollinearity: two or more explanatory variables are approximately linearly related
The term “independent variable” means that an explanatory variable is independent of the error term, but not necessarily independent of the other explanatory variables.
10.5
Example: Perfect Multicollinearity
Suppose we want to estimate the following model:
$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + U_i$

If there is an exact linear relationship between $X_2$ and $X_3$, for example

$X_{2i} = 10 + 5X_{3i}$

then we cannot estimate the individual partial regression coefficients.
10.6
This is because substituting the last expression into the first we get:

$Y_i = \beta_1 + \beta_2(10 + 5X_{3i}) + \beta_3 X_{3i} + U_i = (\beta_1 + 10\beta_2) + (5\beta_2 + \beta_3)X_{3i} + U_i$

If we let

$A_1 = \beta_1 + 10\beta_2; \quad A_2 = 5\beta_2 + \beta_3$

then

$Y_i = A_1 + A_2 X_{3i} + U_i$

so OLS can recover only the composite coefficients $A_1$ and $A_2$, not the individual $\beta$'s.
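To make this concrete, here is a minimal numerical sketch (hypothetical values, Python with numpy assumed): because $X_2$ is an exact linear function of $X_3$, the columns of the regressor matrix are linearly dependent, $X'X$ is singular, and OLS has no unique solution.

```python
import numpy as np

# Hypothetical values satisfying the exact relationship X2 = 10 + 5*X3
X3 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = 10 + 5 * X3
X = np.column_stack([np.ones(5), X2, X3])   # [constant, X2, X3]

print(np.linalg.matrix_rank(X))   # 2, not 3: the columns are dependent
print(np.linalg.det(X.T @ X))     # effectively zero: X'X is singular
```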
10.7
Example: Imperfect Multicollinearity
Although perfect multicollinearity is theoretically possible, in practice imperfect multicollinearity is what we commonly observe.
Typical examples of perfect multicollinearity arise when the researcher makes a mistake, such as including the same variable twice or forgetting to omit the default category for a set of dummy variables.
10.8
Consequences of Multicollinearity
OLS remains BLUE, but there are some adverse practical consequences:
1. No OLS output when multicollinearity is exact.
2. Large standard errors and wide confidence intervals.
3. Estimators are sensitive to the deletion or addition of a few observations or “insignificant” variables (i.e. non-robust).
4. Estimators may have the “wrong” sign.
10.9
Detecting Multicollinearity
There are no formal “tests” for multicollinearity, but several diagnostics are useful:
1. Few significant t-ratios but a high $R^2$ and collective significance of the variables
2. High pairwise correlations between the explanatory variables
3. Examination of partial correlations
4. Estimation of auxiliary regressions
5. Estimation of the variance inflation factor (VIF)
10.10
Auxiliary Regressions
Auxiliary regressions: regress each explanatory variable on the remaining explanatory variables. For a model with explanatory variables $X_2$, $X_3$ and $X_4$:

$X_{2i} = a_1 + a_2 X_{3i} + a_3 X_{4i}$
$X_{3i} = b_1 + b_2 X_{2i} + b_3 X_{4i}$
$X_{4i} = c_1 + c_2 X_{2i} + c_3 X_{3i}$

The $R^2$ from each auxiliary regression shows how strongly $X_{ji}$ is collinear with the other explanatory variables, as the sketch below illustrates.
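A minimal sketch of the auxiliary-regression diagnostic in Python with statsmodels, using simulated (hypothetical) data in which $X_2$ is nearly a linear combination of $X_3$ and $X_4$:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data: X2 is almost an exact linear combination of X3 and X4
rng = np.random.default_rng(1)
df = pd.DataFrame({"X3": rng.normal(size=50), "X4": rng.normal(size=50)})
df["X2"] = 2 * df["X3"] + df["X4"] + rng.normal(scale=0.05, size=50)

for var in ["X2", "X3", "X4"]:
    others = [v for v in ["X2", "X3", "X4"] if v != var]
    # Regress each explanatory variable on the remaining ones
    aux = sm.OLS(df[var], sm.add_constant(df[others])).fit()
    print(var, round(aux.rsquared, 3))   # a high R-squared flags collinearity
```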
10.11
Variance Inflation Factor
In the two-variable model (bivariate regression) the variance of the OLS estimator was:

$\text{var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_i^2}$

where $x_i = X_i - \bar{X}$, so that $\sum x_i^2 = \sum (X_i - \bar{X})^2$.

Extending this to the case of more than two variables leads to the formulae laid out in Lecture 5, or alternatively:

$\text{var}(\hat{\beta}_j) = \dfrac{\sigma^2}{\sum x_{ji}^2 \,(1 - R_j^2)}$

where $R_j^2$ is the $R^2$ from the auxiliary regression of $X_j$ on the other explanatory variables; $1/(1 - R_j^2)$ is the variance inflation factor (VIF).
10.12
Example: Imperfect Multicollinearity
Hypothetical data on weekly family consumption expenditure (CON), weekly family income (INC) and wealth (WLTH):

Obs   CON   INC   WLTH
  1    70    80    810
  2    65   100   1009
  3    90   120   1273
  4    95   140   1425
  5   110   160   1633
  6   115   180   1876
  7   120   200   2052
  8   140   220   2201
  9   155   240   2435
 10   150   260   2686
10.13
Regression Results:

CON = 24.775 + 0.942 INC − 0.0424 WLTH
        (3.669)   (1.1442)    (−0.526)

(t-ratios in parentheses)  $R^2$ = 0.964,  ESS = 8,565.554,  RSS = 324.446,  F = 92.349

$R^2$ is high (96%) and wealth has the wrong sign, yet neither slope coefficient is individually statistically significant. The joint hypothesis (F-test), however, is significant.
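As a check, a sketch reproducing this regression in Python with statsmodels, using the ten observations from the table above:

```python
import pandas as pd
import statsmodels.api as sm

# The hypothetical consumption/income/wealth data from the table above
df = pd.DataFrame({
    "CON":  [70, 65, 90, 95, 110, 115, 120, 140, 155, 150],
    "INC":  [80, 100, 120, 140, 160, 180, 200, 220, 240, 260],
    "WLTH": [810, 1009, 1273, 1425, 1633, 1876, 2052, 2201, 2435, 2686],
})
model = sm.OLS(df["CON"], sm.add_constant(df[["INC", "WLTH"]])).fit()
print(model.summary())   # high R-squared, but small t-ratios on both slopes
```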
10.14
Auxiliary Regression Results:
INC = −0.386 + 0.098 WLTH
        (−0.133)   (62.04)

(t-ratios in parentheses)  $R^2$ = 0.998,  F = 3849
Variance Inflation Factor:
$\text{VIF} = \dfrac{1}{1 - R_j^2} = \dfrac{1}{1 - 0.998} = 500$
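The same figure can be obtained directly with statsmodels' variance_inflation_factor, reusing the DataFrame df from the regression sketch above:

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df[["INC", "WLTH"]])   # df from the regression sketch above
for j in (1, 2):                           # skip the constant at index 0
    # VIF_j = 1 / (1 - R_j^2); values near 500 match the calculation above
    print(X.columns[j], variance_inflation_factor(X.values, j))
```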
10.15
Remedying Multicollinearity
High multicollinearity occurs because of a lack of adequate information in the sample
1. Collect more data with better information.
2. Perform robustness checks.
3. If all else fails, at least point out that the poor model performance might be due to the multicollinearity problem (or it might not).
10.16
The Nature of Endogenous Explanatory Variables
In real world applications we distinguish between:
• Exogenous (pre-determined) Variables
• Endogenous (jointly determined) Variables
When one or more explanatory variable is endogenous, there is implicitly a system of simultaneous equations.
10.17
Example: Endogeneity
$\ln W_i = \beta_1 + \beta_2 S_i + \beta_3 E_i + \beta_4 E_i^2 + U_i$

where $W_i$ is the wage, $S_i$ schooling (education) and $E_i$ experience.
But schooling depends on unobserved ability $A_i$:

$S_i = \alpha_1 + \alpha_2 A_i + V_i$

Since ability also affects wages and is omitted from the model, it forms part of $U_i$; therefore $\text{Cov}(S_i, U_i) \neq 0$.
OLS estimation of the relationship between W and S gives “credit” to education for changes in the disturbances. The resulting OLS estimator is biased upwards (since $\text{Cov}(S_i, U_i) > 0$) and, because the problem persists even in large samples, the estimator is also inconsistent. The simulation sketch below illustrates the bias.
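A small simulation sketch of this upward bias (hypothetical parameter values): unobserved ability raises both schooling and wages, so the OLS coefficient on schooling overshoots its true value.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100_000
A = rng.normal(size=n)            # unobserved ability, part of the error term
S = 12 + A + rng.normal(size=n)   # schooling depends on ability: Cov(S, U) > 0
lnW = 1.0 + 0.08 * S + 0.10 * A + rng.normal(scale=0.5, size=n)

ols = sm.OLS(lnW, sm.add_constant(S)).fit()
print(ols.params[1])   # noticeably above the true value of 0.08
```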
10.18
Remedies for Endogeneity
Two options:
• Try to find a suitable proxy for the unobserved variable
• Leave the unobserved variable in the error term but use an instrument for the endogenous explanatory variable (involves a different estimation technique)
10.19
Example
$\ln W_i = \beta_1 + \beta_2 S_i + \beta_3 E_i + \beta_4 E_i^2 + U_i$  and  $S_i = \alpha_1 + \alpha_2 A_i + V_i$

Option 1: Include a proxy for ability:

$\ln W_i = \beta_1 + \beta_2 S_i + \beta_3 E_i + \beta_4 E_i^2 + \beta_5 A_i + U_i$

Option 2: Find an instrument $Z$ for education. It needs to have the following properties: $\text{Cov}(Z, U) = 0$ and $\text{Cov}(Z, S) \neq 0$.
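A sketch of the instrument-based remedy on simulated (hypothetical) data, with $M$ playing the role of $Z$: manual two-stage least squares using statsmodels. Note that the second-stage standard errors printed this way are not the correct 2SLS ones; a dedicated IV routine would adjust them.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data: ability A drives both schooling and wages; the instrument M
# shifts schooling but is unrelated to the wage-equation error
rng = np.random.default_rng(2)
n = 5_000
A = rng.normal(size=n)                       # unobserved ability
M = rng.normal(size=n)                       # instrument: Cov(M, U) = 0, Cov(M, S) != 0
E = rng.uniform(0, 30, size=n)               # experience
S = 12 + A + 0.5 * M + rng.normal(size=n)    # schooling (endogenous)
lnW = 1 + 0.08 * S + 0.04 * E - 0.001 * E**2 + 0.10 * A + rng.normal(scale=0.5, size=n)
df = pd.DataFrame({"lnW": lnW, "S": S, "E": E, "Esq": E**2, "M": M})

# Stage 1: reduced form for S; Stage 2: replace S with its fitted values
stage1 = sm.OLS(df["S"], sm.add_constant(df[["E", "Esq", "M"]])).fit()
df["Shat"] = stage1.fittedvalues
stage2 = sm.OLS(df["lnW"], sm.add_constant(df[["Shat", "E", "Esq"]])).fit()
print(stage2.params["Shat"])   # close to the true 0.08, unlike plain OLS
```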
10.20
Hausman Test for Endogeneity
Suppose we wish to test whether S is uncorrelated with U in the structural equation:

$\ln W_i = \beta_1 + \beta_2 S_i + \beta_3 E_i + \beta_4 E_i^2 + U_i$

Stage 1: Estimate the reduced form:

$S_i = \alpha_1 + \alpha_2 E_i + \alpha_3 E_i^2 + \alpha_4 M_i + V_i$

where $M_i$ is the instrument.

Stage 2: Add the Stage 1 residuals $\hat{V}_i$ to the structural equation and test the significance of $\hat{V}_i$.

Decision rule: if $\hat{V}_i$ is significant, reject the null hypothesis of exogeneity.
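Reusing the simulated DataFrame df from the 2SLS sketch above, the control-function form of the test looks like this:

```python
# Stage 1: reduced form residuals for the suspect regressor S
stage1 = sm.OLS(df["S"], sm.add_constant(df[["E", "Esq", "M"]])).fit()
df["Vhat"] = stage1.resid

# Stage 2: add the residuals to the structural equation; a significant
# t-ratio on Vhat rejects the null hypothesis that S is exogenous
stage2 = sm.OLS(df["lnW"], sm.add_constant(df[["S", "E", "Esq", "Vhat"]])).fit()
print(stage2.tvalues["Vhat"])
```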
10.21
Summary
In this lecture we have:
1. Outlined the theoretical and practical consequences of multicollinearity
2. Described a number of procedures for detecting the presence of multicollinearity
3. Outlined the basic consequences of endogeneity
4. Outlined a procedure for detecting the presence of endogeneity