Christopher Dougherty EC220 - Introduction to econometrics (chapter 5) Slideshow: the dummy variable trap Original citation: Dougherty, C. (2012) EC220.
Christopher Dougherty
EC220 - Introduction to econometrics (chapter 5)
Slideshow: the dummy variable trap
Original citation:
Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 5). [Teaching Resource]
This version available at: http://learningresources.lse.ac.uk/131/
Available in LSE Learning Resources Online: May 2012
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user credits the author and licenses their new creations under the identical terms. http://creativecommons.org/licenses/by-sa/3.0/
http://learningresources.lse.ac.uk/
THE DUMMY VARIABLE TRAP
Suppose that you have a regression model with Y depending on a set of ordinary variables X2, ..., Xk and a qualitative variable.
$Y = \beta_1 + \beta_2 X_2 + \dots + \beta_k X_k + \delta_2 D_2 + \dots + \delta_s D_s + u$
Suppose that the qualitative variable has s categories. We choose one of them as the omitted category (without loss of generality, category 1) and define dummy variables D2, ..., Ds for the rest.
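As a minimal sketch of this construction, the following Python snippet builds D2, ..., Ds from a vector of category labels, with category 1 as the omitted category. The labels and the choice of s = 4 are hypothetical and serve only as an illustration; they are not taken from the original slides.

```python
import numpy as np

# Hypothetical category labels for six observations (assumption: s = 4).
category = np.array([1, 2, 3, 4, 2, 1])
s = 4

# One column per non-reference category: D2, D3, D4.
# Category 1 is the omitted (reference) category, so it gets no column.
D = np.column_stack([(category == j).astype(float) for j in range(2, s + 1)])

print(D)
```

An observation in the omitted category has zeros in every dummy column, which is what makes it the basis for comparison.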
What would happen if we did not drop the reference category? Suppose we defined a dummy variable D1 for it and included it in the specification. What would happen then?
$Y = \beta_1 + \beta_2 X_2 + \dots + \beta_k X_k + \delta_1 D_1 + \delta_2 D_2 + \dots + \delta_s D_s + u$
We would fall into the dummy variable trap. It would be impossible to fit the model as specified.
We will start with an intuitive explanation. The coefficient of each dummy variable represents the increase in the intercept relative to that for the basic category. But there is no basic category for such a comparison.
$\beta_1$ represents the fixed component of Y for the basic category. But again, there is no basic category. Thus the model does not have any logical interpretation.
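Mathematically, we have a special case of exact multicollinearity. If there is no omitted category, there is an exact linear relationship between X1 and the dummy variables: every observation belongs to exactly one category, so the dummy variables sum to X1 in every observation (X1 = D1 + ... + Ds). The table gives an example where there are 4 categories.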
X1 is the variable whose coefficient is $\beta_1$. It is equal to 1 in all observations. Usually we do not write it explicitly because there is no need to do so.
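The exact linear relationship can be checked numerically. The sketch below uses hypothetical data with 4 categories and leaves out the ordinary variables X2, ..., Xk for brevity. It confirms that the four dummy variables sum to X1 in every observation, and that adding D1 to the specification adds a column without adding to the rank of the regressor matrix.

```python
import numpy as np

# Hypothetical category labels for eight observations, 4 categories.
category = np.array([1, 2, 3, 4, 1, 2, 3, 4])
n = category.size

X1 = np.ones(n)  # the variable attached to the intercept
D_all = np.column_stack([(category == j).astype(float) for j in range(1, 5)])

# Each observation falls in exactly one category, so D1 + D2 + D3 + D4 = X1.
relation_holds = np.allclose(D_all.sum(axis=1), X1)

# Specification with no omitted category: X1, D1, D2, D3, D4 (5 columns).
trap = np.column_stack([X1, D_all])
# Specification with category 1 omitted: X1, D2, D3, D4 (4 columns).
safe = np.column_stack([X1, D_all[:, 1:]])

rank_trap = np.linalg.matrix_rank(trap)
rank_safe = np.linalg.matrix_rank(safe)

print(relation_holds, rank_trap, rank_safe)
```

Both matrices have rank 4, so the fifth column in the first specification carries no independent information.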
If there is an exact linear relationship among a set of the variables, it is impossible in principle to estimate the separate coefficients of those variables. To understand this properly, one needs to use linear algebra.
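A short worked step shows why the separate coefficients cannot be estimated when D1 is included. It uses the fact that each observation belongs to exactly one category, so that $D_1 + \dots + D_s = 1$ in every observation. The constant $c$ is introduced here purely for illustration.

```latex
% For any constant c, since D_1 + ... + D_s = 1 in every observation:
\begin{align*}
\beta_1 + \delta_1 D_1 + \dots + \delta_s D_s
  &= \beta_1 + c\,(D_1 + \dots + D_s) - c\,(D_1 + \dots + D_s)
     + \delta_1 D_1 + \dots + \delta_s D_s \\
  &= (\beta_1 + c) + (\delta_1 - c) D_1 + \dots + (\delta_s - c) D_s .
\end{align*}
```

The parameter sets $(\beta_1, \delta_1, \dots, \delta_s)$ and $(\beta_1 + c, \delta_1 - c, \dots, \delta_s - c)$ therefore give identical values of Y in every observation, whatever the value of $c$, and no amount of data can distinguish between them.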
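If you tried to run the regression anyway, the regression application should detect the problem and do one of two things. It may simply refuse to run the regression. Alternatively, it may drop one of the variables in the exact linear relationship and run the regression, in effect choosing the omitted category itself.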
There is another way of avoiding the dummy variable trap. That is to drop the intercept (and X1). There is no longer a problem because there is no longer an exact linear relationship linking the variables.
The parameters are now the intercepts in the relationship for the individual categories. For example, if the observation relates to category 2, all the dummy variables except D2 will be equal to 0. D2 = 1, and hence the relationship for that observation has intercept $\delta_2$.
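The two ways of avoiding the trap can be compared in a small sketch. The data below are simulated from hypothetical parameter values, with one ordinary variable X2, 4 categories, and no disturbance term so that the fits are exact. Specification (a) keeps the intercept and omits D1; specification (b) drops the intercept and keeps all four dummies. `np.linalg.lstsq` is used here only as a convenient least squares routine.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
category = np.tile(np.arange(1, 5), n // 4)  # hypothetical labels 1..4
X2 = rng.normal(size=n)                      # one ordinary variable

# Hypothetical true parameters (assumptions for illustration only).
beta1, beta2 = 10.0, 2.0
delta = np.array([0.0, 1.5, -3.0, 4.0])      # shift for categories 1..4; 0 for the reference
Y = beta1 + beta2 * X2 + delta[category - 1] # no disturbance term

D_all = np.column_stack([(category == j).astype(float) for j in range(1, 5)])

# (a) Intercept retained, category 1 omitted: X1, X2, D2, D3, D4.
Xa = np.column_stack([np.ones(n), X2, D_all[:, 1:]])
coef_a, *_ = np.linalg.lstsq(Xa, Y, rcond=None)

# (b) Intercept dropped, all dummies retained: X2, D1, D2, D3, D4.
Xb = np.column_stack([X2, D_all])
coef_b, *_ = np.linalg.lstsq(Xb, Y, rcond=None)

# In (b) the dummy coefficients are the intercepts of the individual categories.
category_intercepts = coef_b[1:]

print(np.round(coef_a, 6))
print(np.round(category_intercepts, 6))
```

Under these assumptions, each dummy coefficient in (b) should equal the intercept of (a) plus the corresponding dummy coefficient in (a), with the reference category's intercept equal to the intercept of (a) itself. The two specifications describe the same relationships and differ only in how the parameters are read.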