Christopher Dougherty
EC220 - Introduction to econometrics (chapter 6)
Slideshow: variable misspecification ii: inclusion of an irrelevant variable
Original citation:
Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 6). [Teaching Resource]
This version available at: http://learningresources.lse.ac.uk/132/
Available in LSE Learning Resources Online: May 2012
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user credits the author and licenses their new creations under the identical terms. http://creativecommons.org/licenses/by-sa/3.0/
http://learningresources.lse.ac.uk/
VARIABLE MISSPECIFICATION II: INCLUSION OF AN IRRELEVANT VARIABLE
In this sequence we will investigate the consequences of including an irrelevant variable in a regression model.
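Consequences of variable misspecification

TRUE MODEL (columns) / FITTED MODEL (rows)

                                     | $Y = \beta_1 + \beta_2 X_2 + u$     | $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
-------------------------------------+-------------------------------------+-------------------------------------------------------------------
$\hat{Y} = b_1 + b_2 X_2$            | Correct specification, no problems  | Coefficients are biased (in general). Standard errors are invalid.
$\hat{Y} = b_1 + b_2 X_2 + b_3 X_3$  | (examined in this sequence)         | Correct specification, no problems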
The effects are different from those of omitted variable misspecification. In this case the coefficients in general remain unbiased, but they are inefficient. The standard errors remain valid, but are needlessly large.
Irrelevant-variable case (true model $Y = \beta_1 + \beta_2 X_2 + u$, fitted model $\hat{Y} = b_1 + b_2 X_2 + b_3 X_3$): coefficients are unbiased (in general), but inefficient; standard errors are valid (in general).
These results can be demonstrated quickly.
Rewrite the true model, adding $X_3$ as an explanatory variable with a coefficient of 0:

$$Y = \beta_1 + \beta_2 X_2 + 0 \cdot X_3 + u$$

Now the true model and the fitted model coincide. Hence $b_2$ will be an unbiased estimator of $\beta_2$, and $b_3$ will be an unbiased estimator of 0.
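The unbiasedness claim can be checked with a small Monte Carlo sketch. The parameter values ($\beta_1 = 10$, $\beta_2 = 2$, correlation 0.7 between $X_2$ and $X_3$, n = 50, standard normal disturbances) are illustrative assumptions, not values from the slides. In each replication Y is generated without $X_3$, and then both the simple and the multiple regression are fitted.

```python
import random
import statistics


def slope_simple(y, x2):
    """OLS slope b2 from regressing y on a constant and x2."""
    m2, my = statistics.fmean(x2), statistics.fmean(y)
    s2y = sum((a - m2) * (b - my) for a, b in zip(x2, y))
    s22 = sum((a - m2) ** 2 for a in x2)
    return s2y / s22


def slopes_multiple(y, x2, x3):
    """OLS slopes (b2, b3) from regressing y on a constant, x2 and x3."""
    m2, m3, my = statistics.fmean(x2), statistics.fmean(x3), statistics.fmean(y)
    d2 = [a - m2 for a in x2]
    d3 = [a - m3 for a in x3]
    dy = [a - my for a in y]
    s22 = sum(a * a for a in d2)
    s33 = sum(a * a for a in d3)
    s23 = sum(a * b for a, b in zip(d2, d3))
    s2y = sum(a * b for a, b in zip(d2, dy))
    s3y = sum(a * b for a, b in zip(d3, dy))
    det = s22 * s33 - s23 ** 2
    # Solve the 2x2 normal equations in deviation form
    return (s33 * s2y - s23 * s3y) / det, (s22 * s3y - s23 * s2y) / det


def simulate(reps=2000, n=50, beta1=10.0, beta2=2.0, rho=0.7, seed=1):
    """Average b2 (simple), b2 (multiple) and b3 over many samples."""
    rng = random.Random(seed)
    simple_b2, multi_b2, multi_b3 = [], [], []
    for _ in range(reps):
        x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # X3 is correlated with X2 but does not enter the true model for Y
        x3 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0) for a in x2]
        y = [beta1 + beta2 * a + rng.gauss(0.0, 1.0) for a in x2]
        simple_b2.append(slope_simple(y, x2))
        b2, b3 = slopes_multiple(y, x2, x3)
        multi_b2.append(b2)
        multi_b3.append(b3)
    return (statistics.fmean(simple_b2), statistics.fmean(multi_b2),
            statistics.fmean(multi_b3))


mean_simple_b2, mean_multi_b2, mean_multi_b3 = simulate()
print(mean_simple_b2, mean_multi_b2, mean_multi_b3)
```

Both averages of $b_2$ should lie close to the assumed $\beta_2 = 2$, and the average of $b_3$ close to 0, consistent with the argument above.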
However, the variance of $b_2$ will be larger than it would have been if the correct simple regression had been run, because it includes the factor $1/(1 - r^2_{X_2X_3})$, where $r_{X_2X_3}$ is the correlation between $X_2$ and $X_3$:

$$\sigma^2_{b_2} = \frac{\sigma_u^2}{\sum_i (X_{2i} - \bar{X}_2)^2} \times \frac{1}{1 - r^2_{X_2X_3}}$$
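For comparison, the variance of $b_2$ in the correctly specified simple regression lacks the final factor, so the ratio of the two variances is exactly that factor. The value $r_{X_2X_3} = 0.7$ in the last step is purely illustrative.

```latex
\sigma^2_{b_2,\,\text{simple}} = \frac{\sigma_u^2}{\sum_i (X_{2i}-\bar{X}_2)^2},
\qquad
\frac{\sigma^2_{b_2,\,\text{multiple}}}{\sigma^2_{b_2,\,\text{simple}}}
  = \frac{1}{1-r^2_{X_2X_3}} \;\ge\; 1,
\qquad
r_{X_2X_3} = 0.7 \;\Rightarrow\; \frac{1}{1-0.49} = \frac{1}{0.51} \approx 1.96 .
```

With a correlation of 0.7 the variance of $b_2$ roughly doubles, so its standard deviation rises by a factor of about 1.4.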
The estimator b2 using the multiple regression model will therefore be less efficient than the alternative using the simple regression model.
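One way to see that the factor is exact rather than an approximation is to compute $\sigma_u^2[(X'X)^{-1}]_{22}$ for a design matrix containing a constant, $X_2$ and $X_3$, and compare it with the simple-regression variance. The eight observations below are made-up numbers chosen only so that $X_2$ and $X_3$ are strongly correlated, and $\sigma_u^2$ is assumed to be 1; the helper names are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def b2_variances(x2, x3, sigma2_u=1.0):
    """Return (simple-regression variance, multiple-regression variance, r^2) for b2."""
    n = len(x2)
    design = [[1.0, a, b] for a, b in zip(x2, x3)]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    var_multiple = sigma2_u * invert(xtx)[1][1]  # diagonal element for b2
    m2, m3 = sum(x2) / n, sum(x3) / n
    s22 = sum((a - m2) ** 2 for a in x2)
    s33 = sum((b - m3) ** 2 for b in x3)
    s23 = sum((a - m2) * (b - m3) for a, b in zip(x2, x3))
    r2 = s23 ** 2 / (s22 * s33)
    var_simple = sigma2_u / s22
    return var_simple, var_multiple, r2


x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0]
var_simple, var_multiple, r2 = b2_variances(x2, x3)
print(var_simple, var_multiple, var_multiple / var_simple, 1 / (1 - r2))
```

The last two printed numbers agree, confirming that the multiple-regression variance is the simple-regression variance scaled up by $1/(1 - r^2_{X_2X_3})$.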
The intuitive reason for this is that the simple regression model exploits the information that X3 should not be in the regression, while with the multiple regression model you find this out from the regression results.
The standard errors remain valid, because the model is formally correctly specified, but they will tend to be larger than those obtained in a simple regression, reflecting the loss of efficiency.
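These are the results in general. Note that if $X_2$ and $X_3$ happen to be uncorrelated, there will be no loss of efficiency after all: with $r_{X_2X_3} = 0$ the factor $1/(1 - r^2_{X_2X_3})$ equals 1, and the variance of $b_2$ is the same as in the simple regression.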
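The special case can be illustrated with four made-up observations in which $X_2$ and $X_3$ have exactly zero sample correlation. The factor $1/(1 - r^2)$ is then 1, and the multiple-regression estimate of $\beta_2$ coincides with the simple-regression one. The data values and the helper name are hypothetical.

```python
def deviations(values):
    """Deviations of each value from the sample mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


x2 = [-3.0, -1.0, 1.0, 3.0]
x3 = [1.0, -1.0, -1.0, 1.0]   # sample covariance with x2 is exactly zero
y = [2.0, 2.9, 5.2, 6.8]       # hypothetical observations

d2, d3, dy = deviations(x2), deviations(x3), deviations(y)
s22 = sum(a * a for a in d2)
s33 = sum(b * b for b in d3)
s23 = sum(a * b for a, b in zip(d2, d3))
s2y = sum(a * c for a, c in zip(d2, dy))
s3y = sum(b * c for b, c in zip(d3, dy))

r2 = s23 ** 2 / (s22 * s33)
inflation = 1 / (1 - r2)       # variance factor for b2
b2_simple = s2y / s22
b2_multiple = (s33 * s2y - s23 * s3y) / (s22 * s33 - s23 ** 2)
print(r2, inflation, b2_simple, b2_multiple)
```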
. reg LGFDHO LGEXP LGSIZE

  Source |       SS       df       MS                  Number of obs =     868
---------+------------------------------               F(  2,   865) =  460.92
   Model |  138.776549     2  69.3882747               Prob > F      =  0.0000
Residual |  130.219231   865  .150542464               R-squared     =  0.5159
---------+------------------------------               Adj R-squared =  0.5148
   Total |  268.995781   867  .310260416               Root MSE      =    .388
The analysis will be illustrated using a regression of LGFDHO, the logarithm of annual household expenditure on food eaten at home, on LGEXP, the logarithm of total annual household expenditure, and LGSIZE, the logarithm of the number of persons in the household.
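The summary statistics in the header of the Stata output follow directly from the SS and df columns. The sketch below recomputes them for the regression of LGFDHO on LGEXP and LGSIZE; the function name fit_stats is hypothetical, and k counts the slope coefficients (excluding the intercept).

```python
def fit_stats(ess, rss, k, n):
    """R-squared, adjusted R-squared, F statistic and root MSE from an ANOVA table."""
    tss = ess + rss
    df_resid = n - k - 1
    r2 = ess / tss
    adj_r2 = 1 - (rss / df_resid) / (tss / (n - 1))
    f_stat = (ess / k) / (rss / df_resid)
    root_mse = (rss / df_resid) ** 0.5
    return r2, adj_r2, f_stat, root_mse


# Model and residual sums of squares from the regression output above
r2, adj_r2, f_stat, root_mse = fit_stats(138.776549, 130.219231, k=2, n=868)
print(round(r2, 4), round(adj_r2, 4), round(f_stat, 2), round(root_mse, 3))
```

The rounded values match the R-squared, Adj R-squared, F and Root MSE entries reported by Stata.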
Now add LGHOUS, the logarithm of annual expenditure on housing services. It is safe to assume that LGHOUS is an irrelevant variable and, not surprisingly, its coefficient is not significantly different from zero.
. reg LGFDHO LGEXP LGSIZE LGHOUS

  Source |       SS       df       MS                  Number of obs =     868
---------+------------------------------               F(  3,   864) =  307.22
   Model |  138.841976     3  46.2806586               Prob > F      =  0.0000
Residual |  130.153805   864  .150640978               R-squared     =  0.5161
---------+------------------------------               Adj R-squared =  0.5145
   Total |  268.995781   867  .310260416               Root MSE      =  .38812
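The coefficient table for this regression is not reproduced here, but the claim about LGHOUS can be cross-checked from the two ANOVA tables. For a single added regressor, the F statistic built from the fall in the residual sum of squares equals the square of that regressor's t statistic. The sketch below computes it and also recomputes adjusted R-squared for both regressions; the variable names are illustrative.

```python
rss_without, df_without = 130.219231, 865   # LGFDHO on LGEXP, LGSIZE
rss_with, df_with = 130.153805, 864         # same regression plus LGHOUS
tss, df_total = 268.995781, 867

# F test of the single restriction that the LGHOUS coefficient is zero
f_stat = ((rss_without - rss_with) / (df_without - df_with)) / (rss_with / df_with)
abs_t = f_stat ** 0.5   # |t| for LGHOUS implied by F = t^2

adj_r2_without = 1 - (rss_without / df_without) / (tss / df_total)
adj_r2_with = 1 - (rss_with / df_with) / (tss / df_total)
print(round(f_stat, 3), round(abs_t, 3), round(adj_r2_without, 4), round(adj_r2_with, 4))
```

The F statistic is about 0.43 (|t| about 0.66), far below the 5 per cent critical value of F(1, 864) of roughly 3.85. Adjusted R-squared falls from 0.5148 to 0.5145 even though R-squared rises slightly from 0.5159 to 0.5161, reflecting the degree of freedom used up by the irrelevant variable.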