Endogenous explanatory variables - Nuffield College
Violation of the assumption that Cov(xi, ui) = 0 has serious consequences
for the OLS estimator
This is one of the key assumptions needed to establish consistency
When one or more of the explanatory variables is correlated with the error
term ui, we have both E(ui|xi) ≠ 0 and E(xiui) ≠ 0, so the OLS estimator
will be both biased and inconsistent
We will consider two situations where this occurs:
- a linear model with Cov(xi, ui) = 0 is the correct specification, but one
or more of the explanatory variables is measured with error
- a linear model with Cov(xi, ui) = 0 is the correct specification, but one or
more of the explanatory variables is not measured at all, and hence omitted
from the model we can estimate
These are simply two examples of cases where we have simultaneity or
endogeneity, i.e. one or more of the explanatory variables is correlated
with the error term
Measurement error/errors-in-variables
A common concern in applied econometrics is that relevant explanatory
variables may be poorly measured
Examples - survey data on households:
- recall bias: how much time did you spend unemployed last year?
- rounding bias: how much money did you spend on food last week?
Illustrate ‘attenuation bias’ for the case of a single explanatory variable,
measured with error
- the OLS estimator is biased towards zero if the explanatory variable is
measured with error
- this bias does not disappear in large samples (OLS is inconsistent)
Note that measurement error in the dependent variable does not lead to
the same bias and inconsistency problems, provided the measurement error
in yi is uncorrelated with (correctly measured) xi
Consider the model with a single explanatory variable and no intercept
y∗i = x∗iβ + ui for i = 1, ..., N
where y∗i and x∗i denote the true values of these variables, that we may not
observe
To simplify, suppose E(ui) = E(x∗i ) = E(y∗i ) = 0 for i = 1, ..., N (original
variables may be expressed as deviations from their sample means)
We focus on large sample properties, and assume that E(x∗iui) = 0 for
i = 1, ..., N , and we have independent observations, so that β̂OLS would be
a consistent estimator of β if we observed the true values of y∗i and x∗i
First consider additive, mean zero measurement error in the dependent
variable only
yi = y∗i + vi ↔ y∗i = yi − vi
yi is the observed value
y∗i is the true value
vi is the measurement error, with E(vi) = 0 for i = 1, ..., N
The true values x∗i are observed
Substituting this expression for y∗i in the true model
(yi − vi) = x∗iβ + ui
or yi = x∗iβ + (ui + vi)
Consistency requires x∗i to be uncorrelated with the error term (ui + vi)
Given E(x∗iui) = 0, the additional requirement is that E(x∗ivi) = 0 for
i = 1, ..., N
That is, the measurement error in the dependent variable is uncorrelated
with the explanatory variable
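A small simulation can make this point concrete. The sketch below is illustrative only: the slope value, the normal distributions, the sample size and all variable names are assumptions, not part of the original slides. It adds measurement error vi to the dependent variable, independent of x∗i, and checks that the no-intercept OLS slope stays close to β.

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed, for reproducibility only
N = 200_000                       # large sample, to mimic the probability limit
beta = 2.0                        # hypothetical true slope

x_star = rng.normal(0.0, 1.0, N)  # true regressor, observed without error
u = rng.normal(0.0, 1.0, N)       # model error, independent of x_star
v = rng.normal(0.0, 1.0, N)       # measurement error in y, independent of x_star

y_star = x_star * beta + u        # true dependent variable
y = y_star + v                    # observed dependent variable

# OLS without an intercept: sum(x * y) / sum(x^2)
beta_hat_y_error = np.sum(x_star * y) / np.sum(x_star ** 2)
print(beta_hat_y_error)           # should be close to beta in a large sample
```

The extra noise vi makes the estimate less precise, but in this design it does not move the slope away from β.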
Now consider additive, mean zero measurement error in the explanatory
variable (only)
xi = x∗i + ei ↔ x∗i = xi − ei
Substituting for x∗i in the true model
y∗i = (xi − ei)β + ui
or y∗i = xiβ + (ui − eiβ)
The OLS estimator of β here is biased and inconsistent
- for a given value of x∗i, observed xi and the measurement error ei are
positively correlated, which implies non-zero correlation between xi and the
error term in this model (ui − eiβ)
y∗i = xiβ + (ui − eiβ)
For β > 0, this implies a negative correlation between xi and (ui − eiβ)
For β < 0, this implies a positive correlation between xi and (ui − eiβ)
For β > 0, the OLS estimator of β will be biased downwards
For β < 0, the OLS estimator of β will be biased upwards
In either case, the OLS estimator of β will be biased towards zero
- this is known as ‘attenuation bias’
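The sign of this correlation can be worked out in one line. The step below is a sketch that assumes, in addition to E(x∗iui) = 0, that the measurement error ei is uncorrelated with both x∗i and ui (the classical errors-in-variables assumptions):

\[
E[x_i(u_i - e_i\beta)] = E[(x_i^* + e_i)(u_i - e_i\beta)]
= E(x_i^* u_i) - \beta E(x_i^* e_i) + E(e_i u_i) - \beta E(e_i^2)
= -\beta E(e_i^2)
\]

So whenever E(e_i^2) > 0, the covariance between xi and the composite error has the opposite sign to β, which is what pulls the OLS estimate towards zero.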
To analyse this further, we invoke the classical errors-in-variables assumptions
(for i = 1, ..., N)
E(x∗iei) = 0: measurement error is uncorrelated with the true value of x∗i
E(uiei) = 0: measurement error is uncorrelated with the true model error ui
V(ei) = σ²e: measurement error is homoskedastic
V(x∗i) = σ²x∗: population variance of the true x∗i exists and is finite
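Now
\[
\hat{\beta}_{OLS} = (X'X)^{-1}X'y^*
= \frac{\sum_{i=1}^{N} x_i y_i^*}{\sum_{i=1}^{N} x_i^2}
= \frac{\frac{1}{N}\sum_{i=1}^{N} x_i y_i^*}{\frac{1}{N}\sum_{i=1}^{N} x_i^2}
\]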
Using xi = x∗i + ei and y∗i = x∗iβ + ui together with the above assumptions,
we obtain
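\[
\begin{aligned}
\operatorname*{plim}_{N\to\infty} \hat{\beta}_{OLS}
&= \frac{\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} (x_i^* + e_i)(x_i^*\beta + u_i)}
        {\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} (x_i^* + e_i)^2} \\
&= \frac{\left(\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} x_i^{*2}\right)\beta
        + \operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} x_i^* u_i
        + \left(\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} x_i^* e_i\right)\beta
        + \operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} u_i e_i}
       {\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} x_i^{*2}
        + 2\operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} x_i^* e_i
        + \operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} e_i^2} \\
&= \frac{E(x_i^{*2})\beta + E(x_i^* u_i) + E(x_i^* e_i)\beta + E(u_i e_i)}
        {E(x_i^{*2}) + 2E(x_i^* e_i) + E(e_i^2)} \\
&= \frac{E(x_i^{*2})\beta + 0 + 0 + 0}{E(x_i^{*2}) + 0 + E(e_i^2)} \\
&= \left(\frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_e^2}\right)\beta
 = \frac{\beta}{1 + (\sigma_e^2/\sigma_{x^*}^2)} \neq \beta
 \quad \text{if } \sigma_e^2 > 0
\end{aligned}
\]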
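\[
\operatorname*{plim}_{N\to\infty} \hat{\beta}_{OLS}
= \frac{\beta}{1 + (\sigma_e^2/\sigma_{x^*}^2)} < \beta
\quad \text{for } \beta > 0 \text{ and } \sigma_e^2 > 0
\]
\[
\operatorname*{plim}_{N\to\infty} \hat{\beta}_{OLS}
= \frac{\beta}{1 + (\sigma_e^2/\sigma_{x^*}^2)} > \beta
\quad \text{for } \beta < 0 \text{ and } \sigma_e^2 > 0
\]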
The OLS estimator of β is inconsistent, with a bias towards zero that does
not diminish as the sample becomes large
For given σ²x∗, the severity of this ‘attenuation bias’ increases with the
variance of the measurement error (σ²e)
The magnitude of the inconsistency depends inversely on the ‘signal-to-noise’
ratio (σ²x∗/σ²e)
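The probability limit β/(1 + σ²e/σ²x∗) can be checked numerically. The following is a minimal sketch under the classical errors-in-variables assumptions; the parameter values, normal distributions, sample size and variable names are chosen purely for illustration and do not come from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)    # fixed seed, for reproducibility only
N = 200_000                       # large sample, to approximate the plim
beta = 2.0                        # hypothetical true slope
sigma_x, sigma_e = 1.0, 0.5       # std. dev. of true x and of measurement error

x_star = rng.normal(0.0, sigma_x, N)  # true regressor (mean zero)
u = rng.normal(0.0, 1.0, N)           # model error, independent of x_star
e = rng.normal(0.0, sigma_e, N)       # classical measurement error

y_star = x_star * beta + u        # dependent variable, measured without error
x = x_star + e                    # observed, mismeasured regressor

# OLS without an intercept, using the mismeasured regressor
beta_hat = np.sum(x * y_star) / np.sum(x ** 2)

# Probability limit implied by the attenuation formula
plim_beta_hat = beta / (1.0 + sigma_e ** 2 / sigma_x ** 2)

print(beta_hat, plim_beta_hat)
```

With these values the signal-to-noise ratio is 4, so the formula gives 2/(1 + 0.25) = 1.6. Raising `sigma_e` while holding `sigma_x` fixed should pull the estimate further towards zero.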
Under the classical errors-in-variables assumptions with homoskedastic
measurement error, the presence of measurement error affects the estimated slope
parameter, but not the linearity of the relationship between y∗i and observed
xi
With heteroskedastic measurement error, the presence of measurement error
may also introduce an incorrect indication of non-linearity in the relationship
For example, if β > 0 and V(ei) tends to be larger for individuals with
higher values of x∗i, then estimation of a non-linear relationship between y∗i
and observed xi could give an incorrect indication of a concave relationship
(illustrate)
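One possible way to illustrate this is by simulation. The design below is an assumption made for the sketch, not the example used in the lecture: x∗i is drawn from a uniform distribution on (0, 4), an intercept is included in the fitted regression, and the heteroskedastic measurement error has a standard deviation proportional to x∗i. The true relationship is exactly linear, and a quadratic in observed xi is fitted under each type of measurement error.

```python
import numpy as np

rng = np.random.default_rng(2)    # fixed seed, for reproducibility only
N = 200_000
beta = 1.0                        # hypothetical true slope (beta > 0)

x_star = rng.uniform(0.0, 4.0, N)     # true regressor (assumed design)
u = rng.normal(0.0, 0.5, N)           # model error
y_star = beta * x_star + u            # true relationship is linear in x_star


def quad_coef(x, y):
    """Coefficient on x^2 from an OLS regression of y on (1, x, x^2)."""
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs[2]


# Homoskedastic measurement error: same variance for every observation
e_homo = rng.normal(0.0, 0.8, N)
# Heteroskedastic measurement error: std. dev. rises with x_star
e_hetero = 0.4 * x_star * rng.normal(0.0, 1.0, N)

g2_homo = quad_coef(x_star + e_homo, y_star)
g2_hetero = quad_coef(x_star + e_hetero, y_star)

print(g2_homo, g2_hetero)
```

In this design the quadratic coefficient should be close to zero with homoskedastic error and clearly negative with heteroskedastic error, so the fitted curve looks concave even though the true relationship is linear.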
Multiple regression with errors in variables
y∗i = x∗′i β + ui
x′i = x∗′i + e′i
where x′i, x∗′i and e′i are 1×K vectors
As before
y∗i = x′iβ + (ui − e′iβ)
In general, the OLS estimator of the K × 1 vector of parameters β will be
biased and inconsistent, since E[xi(ui − e′iβ)] ≠ 0
If only one of the explanatory variables in xi is measured with error, we
can show that
- the OLS estimator of the coefficient on that variable is biased towards
zero
- the OLS estimators of the coefficients on the other explanatory variables
are also biased, in unknown directions
If several explanatory variables are measured with error, it is very difficult
to sign the biases for any of the coefficients
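The case with one mismeasured regressor can be sketched with two explanatory variables. Everything in this example is an illustrative assumption (the coefficient values, the correlation of 0.6 between the regressors, the error variances and the variable names). The upward bias it produces on the second coefficient is specific to this design, since in general that direction depends on how the regressors are correlated.

```python
import numpy as np

rng = np.random.default_rng(3)    # fixed seed, for reproducibility only
N = 200_000
beta1, beta2 = 1.0, 1.0           # hypothetical true coefficients
rho = 0.6                         # assumed correlation between the true regressors

x2 = rng.normal(0.0, 1.0, N)      # correctly measured regressor
x1_star = rho * x2 + np.sqrt(1 - rho ** 2) * rng.normal(0.0, 1.0, N)
u = rng.normal(0.0, 1.0, N)       # model error
e1 = rng.normal(0.0, 1.0, N)      # classical measurement error in x1 only

y_star = beta1 * x1_star + beta2 * x2 + u
x1 = x1_star + e1                 # observed, mismeasured version of x1

# OLS of y on the observed regressors (all variables have mean zero)
X = np.column_stack([x1, x2])
b, *_ = np.linalg.lstsq(X, y_star, rcond=None)
b1_hat, b2_hat = b

print(b1_hat, b2_hat)
```

Here the coefficient on the mismeasured variable is pulled towards zero, while the coefficient on the correctly measured variable absorbs part of the effect of x1 and moves away from its true value.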
Omitted variables
Another common concern in applied econometrics is that relevant explanatory
variables may be omitted from the model
Relevant explanatory variables are often unobserved or unobservable
Example
- survey data on individuals do not contain data on characteristics like
ability or motivation
This may make it difficult to attach causal significance to estimated
parameters in linear regression-type models
Illustrate omitted variable bias for the case of a single included variable
and a single omitted variable
- the OLS estimator is biased if the omitted variable is relevant and correlated
with the included regressor
- this bias does not disappear in large samples (OLS is inconsistent)
- the direction of the bias depends on the sign of the correlation between
the included variable and the omitted variable, together with the sign of the
coefficient on the omitted variable
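These points can be illustrated with a short simulation. The sketch below rests on assumptions that are not in the slides: the coefficient values, the way the two regressors are generated, and the variable names. It compares the regression that omits x2 with the regression that includes it, and uses the standard large-sample result that the short-regression slope converges to β1 + β2·Cov(x1, x2)/V(x1).

```python
import numpy as np

rng = np.random.default_rng(4)    # fixed seed, for reproducibility only
N = 200_000
beta1, beta2 = 1.0, 2.0           # hypothetical true coefficients

x2 = rng.normal(0.0, 1.0, N)              # variable that will be omitted
x1 = 0.5 * x2 + rng.normal(0.0, 1.0, N)   # included regressor, correlated with x2
u = rng.normal(0.0, 1.0, N)               # model error
y = beta1 * x1 + beta2 * x2 + u

# Short regression: y on x1 only (no intercept, all variables have mean zero)
b1_short = np.sum(x1 * y) / np.sum(x1 ** 2)

# Long regression: y on x1 and x2
X = np.column_stack([x1, x2])
b_long, *_ = np.linalg.lstsq(X, y, rcond=None)

# Large-sample value of the short-regression slope in this design:
# Cov(x1, x2) = 0.5 and V(x1) = 0.25 + 1 = 1.25
plim_short = beta1 + beta2 * 0.5 / 1.25   # = 1.8

print(b1_short, plim_short, b_long[0])
```

Because the omitted variable has a positive coefficient and is positively correlated with x1 in this design, the short regression overstates β1. Flipping the sign of either one would reverse the direction of the bias.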
Consequently, omitted variables - or ‘unobserved heterogeneity’ - present a
formidable challenge to drawing causal inferences from cross-section
regressions
There is a serious danger that observed, included explanatory variables
may just be proxying for unobserved, omitted factors - rather than exerting
a direct, causal influence on the outcome of interest
Note that this problem is not confined to empirical research in economics
Beware of medical studies claiming that some activity will help you live
longer
These claims are often based on cross-section correlations
It is difficult to draw causal conclusions unless we are confident that the
study has controlled for all potentially relevant confounding factors
We first consider the model with one included variable (x1i) and one omit-