ApEc 8212 Econometric Analysis II Lecture #3
Review of Linear Models and OLS Estimation (Wooldridge, Chapter 4)
This lecture will review much of the material you had in ApEc 8211, but in a way that will prepare you for topics that will be covered this semester that you have not seen before. I hope you appreciate that it is much less theoretical than the material in Lectures 1 and 2, although I will often refer to the material in those two lectures (which covered Wooldridge's Chapters 2 and 3).

I. Overview of the Single Equation Linear Model

Start with a model that shows a linear relationship between the dependent variable, y, and the explanatory variables in the vector x:

y = β0 + β1x1 + β2x2 + … + βKxK + u = xβ + u

Strictly speaking, this is a model of the population from which the data are drawn, not a model of the data itself. By definition, this model assumes that there is a causal (structural) relationship in that changes in x cause changes in y, and that this causal relationship is linear.
The error term u in the model can represent many things, such as other (omitted or unobserved) variables or measurement error in the variables that we do observe. We will get to this later in this lecture. The key assumption regarding u that is required for consistent estimation of β using OLS is:

E[u] = 0
Cov(xj, u) = 0, j = 1, 2, …, K

The assumption that E[u] = 0 is trivial if x includes a constant term (intercept), since the β for that constant term can always be adjusted to ensure that E[u] = 0 without changing any of the other elements in β. It is the zero covariance assumption that is the really important assumption. Note that E[u| x] = 0 implies Cov(u, x) = 0, but not vice versa. Thus E[u| x] = 0 is a stronger assumption than Cov(u, x) = 0. Combining the linear model assumption and the assumption that E[u| x] = 0 implies that:

E[y| x] = β0 + β1x1 + β2x2 + … + βKxK = xβ
Of course, since x can include interaction or higher-order terms (e.g. x3 could be x1² or x1x2), the linear model has a fair amount of flexibility. Wooldridge states most of his results using the assumption Cov(u, x) = 0, since it is a weaker assumption than E[u| x] = 0. Loosely speaking (we'll talk more about this in later lectures) we can say that a variable xj is endogenous if Cov(u, xj) ≠ 0, and a variable is exogenous if Cov(u, xj) = 0. Thus in this linear model the assumption that Cov(u, x) = 0 implies that all of the variables in x are exogenous (for this model only!). Endogeneity of one or more of the variables in x implies that OLS estimates of β are inconsistent. Endogeneity can arise for three reasons:

1. Omitted variables. Suppose that we are interested in the causal impact on y of both the variables in x and another variable q. That is, we want to know E[y| x, q] for a wide range of x and q. But suppose that we do not have data on q, so we can estimate only E[y| x]. This relationship between x and y will not necessarily be the same as the (causal) relationship between x (and q) and y represented by E[y| x, q] if q is correlated with one or more of the variables in x. An example of this is estimating the impact of years of schooling on wages, where q is unobserved innate ability.
2. Measurement error. Our data may be bad in that the x in our data is not the true x, which can be denoted by x*. This is a very serious (and often ignored) problem in econometrics.

3. Simultaneity. Perhaps some of the variables in x not only cause y but are also caused by y. Since y is caused by u, and these x variables are caused by y, these x variables will be correlated with u. An interesting example is crime rates and the size of the police force. We may be interested in estimating the impact of the size of the police force on the crime rate, but high crime rates may cause the government to increase the size of the police force.

In today's lecture we will discuss in detail the first two problems, but the third problem will have to wait until we take up instrumental variables (starting this Wednesday) and estimation of systems of equations (starting next week).
II. Asymptotic Properties of the OLS Estimate of β

Economists and other researchers often use OLS to estimate β in the model y = xβ + u. For simplicity, let the first element in x, x1, be a constant term. We have a sample of size N: {(yi, xi): i = 1, 2, …, N}. Assume that each observation is randomly drawn from the same population, so any two observations, (yi, xi) and (yj, xj), are independent (if i ≠ j) and identically (jointly) distributed (i.i.d.) random variables. For each observation we have:

yi = xiβ + ui

Consistency

In addition to the assumption that the model is linear, we need two more assumptions for consistency:

Assumption OLS.1: E[x′u] = 0
Assumption OLS.2: rank(E[x′x]) = K

Since the first element of x is assumed to be a constant, Assumption OLS.1 implies that E[u] = 0.
You should be able to show that E[x′u] = 0 implies Cov(xj, u) = 0 for all xj in x. Assumption OLS.2 is needed to ensure that the OLS estimate of β is unique. If it does not hold, then one or more elements of x is a linear combination of the others, which means that there are many βs that give the same (conditional) expected values of y. Note that this assumption is equivalent to assuming that E[x′x] is a positive definite K×K matrix, and that the variance-covariance matrix of the variables in x (i.e. after removing the constant term) is nonsingular. These two assumptions imply that β is identified; that is, it can be expressed (solved for) in terms of the population moments (e.g. variances and covariances) of x and y. The solution comes from pre-multiplying y = xβ + u by x′, taking expectations, and rearranging:

β = (E[x′x])⁻¹E[x′y]

To show consistency, write out the OLS estimate of β:
β̂_OLS = [(1/N)Σᵢ xᵢ′xᵢ]⁻¹ [(1/N)Σᵢ xᵢ′yᵢ] = β + [(1/N)Σᵢ xᵢ′xᵢ]⁻¹ [(1/N)Σᵢ xᵢ′uᵢ]

(all sums run over i = 1, …, N)
Of course, β̂_OLS can also be written as (X′X)⁻¹(X′y). By Corollary 3.1, Assumption OLS.2 implies that X′X is nonsingular w.p.a.1 and plim[((1/N)Σᵢ xᵢ′xᵢ)⁻¹] = A⁻¹, where A ≡ E[x′x]. By Assumption OLS.1 we have plim[(1/N)Σᵢ xᵢ′uᵢ] = E[x′u] = 0. Then by Slutsky's theorem we have plim[β̂_OLS] = β + A⁻¹·0 = β:

Theorem 4.1 (consistency of OLS): Under Assumptions OLS.1 and OLS.2, β̂_OLS from a random sample of the population model y = xβ + u is a consistent estimator of β.

Note that OLS is also a consistent estimate of the parameters of a linear projection of y on x, as long as Assumption OLS.2 holds, since the linear projection is defined as (E[x′x])⁻¹E[x′y], plim[((1/N)Σᵢ xᵢ′xᵢ)⁻¹] = (E[x′x])⁻¹ and plim[(1/N)Σᵢ xᵢ′yᵢ] = E[x′y].

A few other important points:

1. In practice, you usually don't have to worry about Assumption OLS.2; it is obvious when it fails (and it rarely fails). It is Assumption OLS.1 that often fails and, more perniciously, it is very difficult to determine whether it holds or fails.
2. Assumptions OLS.1 and OLS.2 do not by themselves imply that β̂_OLS is unbiased. To show unbiasedness we need the slightly more restrictive assumption that E[u| x] = 0.

3. The consistency result requires only that u and x are uncorrelated, not that they are independent. Independence would imply that Var[u| x] is constant, but we don't need this for consistency.
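To make Theorem 4.1 concrete, here is a minimal numpy sketch of the consistency result (the data-generating process, seed, and all names are my own invention, not from Wooldridge): it draws i.i.d. samples satisfying OLS.1 and OLS.2 and shows β̂_OLS = (X′X)⁻¹X′y settling down toward β as N grows.

    import numpy as np

    # Illustrative simulation: OLS is consistent under OLS.1 and OLS.2.
    # The true beta and the DGP below are arbitrary choices.
    rng = np.random.default_rng(0)
    beta = np.array([1.0, 0.5, -2.0])   # first element is the intercept

    for N in [100, 10_000, 1_000_000]:
        X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
        u = rng.normal(size=N)          # E[x'u] = 0 holds by construction
        y = X @ beta + u
        b_ols = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y
        print(N, b_ols)                 # approaches beta as N grows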
Asymptotic Normality

The above expression for β̂_OLS implies that:

√N(β̂_OLS − β) = [(1/N)Σᵢ xᵢ′xᵢ]⁻¹ [N^(-1/2)Σᵢ xᵢ′uᵢ]

Theorem 4.1 implies that [(1/N)Σᵢ xᵢ′xᵢ]⁻¹ − A⁻¹ = op(1) [this expression converges in probability to zero]. Note that {(xᵢ′uᵢ): i = 1, 2, …, N} is an i.i.d. sequence with zero mean. Assume that each element of xᵢ′uᵢ has a finite variance; then by the Central Limit Theorem we have N^(-1/2)Σᵢ xᵢ′uᵢ →d N(0, B), where B ≡ E[u²x′x]. By Lemma 3.5, N^(-1/2)Σᵢ xᵢ′uᵢ = Op(1). Thus:

√N(β̂_OLS − β) = A⁻¹[N^(-1/2)Σᵢ xᵢ′uᵢ] + op(1)

since op(1)·Op(1) = op(1). To go farther, we need a homoskedasticity assumption:

Assumption OLS.3: E[u²x′x] = σ²E[x′x], where σ² = E[u²]

A stronger assumption is that E[u²| x] = σ², but it is not needed (although it is easier to interpret than Assumption OLS.3). Putting all of this together gives:

Theorem 4.2 (asymptotic normality of OLS): Under Assumptions OLS.1, OLS.2 and OLS.3:

√N(β̂_OLS − β) ~a N(0, σ²A⁻¹), where A ≡ E[x′x]

Proof: Lemma 3.7 and Corollary 3.2 imply that √N(β̂_OLS − β) ~a N(0, A⁻¹BA⁻¹). Then Assumption OLS.3 implies that B = σ²A.
In practice we can estimate Avar(β̂_OLS) as σ̂²(X′X)⁻¹, where σ̂² = (1/(N−K))Σᵢ (yᵢ − xᵢβ̂_OLS)².
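In code this takes only a few lines; a sketch under the same simulated setup as above (the function name and interface are mine):

    import numpy as np

    def ols_classical(X, y):
        """OLS with the usual sigma2_hat * (X'X)^{-1} variance estimate."""
        N, K = X.shape
        b = np.linalg.solve(X.T @ X, X.T @ y)
        resid = y - X @ b
        sigma2_hat = resid @ resid / (N - K)   # (1/(N-K)) * sum of squared residuals
        vcov = sigma2_hat * np.linalg.inv(X.T @ X)
        return b, np.sqrt(np.diag(vcov))       # coefficients and standard errors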
If the assumptions of Theorem 4.2 hold, then the usual (OLS-based) formulas for standard errors of β̂_OLS, for t-statistics and for F-statistics are all asymptotically valid.

Heteroscedasticity-Robust Inference

If Assumption OLS.3 fails, OLS is still a consistent estimator of β, but we can't use σ̂²(X′X)⁻¹ to estimate Avar(β̂_OLS). This is a serious problem because it is quite possible that Assumption OLS.3 does fail. Fortunately, it is not hard to get an estimate of Avar(β̂_OLS) under less restrictive assumptions. (An alternative approach is weighted least squares (WLS), but this requires a (parametric) model for Var[y| x], which may be just as restrictive as Assumption OLS.3.) To estimate Avar(β̂_OLS) under a set of less restrictive assumptions, go back to the asymptotic normality discussion. The asymptotic variance of β̂_OLS without Assumption OLS.3 is Avar(β̂_OLS) = A⁻¹BA⁻¹/N. Our consistent estimate of A⁻¹ is N(X′X)⁻¹, so we just need a consistent estimate of B. By the (weak) law of large numbers (Theorem 3.1) we have:

(1/N)Σᵢ uᵢ²xᵢ′xᵢ →p E[u²x′x] = B

We can replace uᵢ with ûᵢ = yᵢ − xᵢβ̂_OLS. Thus (1/N)Σᵢ ûᵢ²xᵢ′xᵢ is a consistent estimator of B and:

Avâr(β̂_OLS) = (X′X)⁻¹(Σᵢ ûᵢ²xᵢ′xᵢ)(X′X)⁻¹
The standard errors of β̂_OLS computed from this matrix are heteroscedasticity-robust standard errors. They are also called White standard errors or Huber standard errors. These can be used to obtain t-statistics in the usual way. However, the usual F-test is invalid. You should use the heteroscedasticity-robust Wald statistic instead to test hypotheses of the form H0: Rβ = r (R is Q×K, r is Q×1). That test statistic is:

W = (Rβ̂_OLS − r)′(RV̂R′)⁻¹(Rβ̂_OLS − r)

where V̂ is the above formula for Avâr(β̂_OLS).
Under H0, W ~a χ²(Q). Dividing W by Q yields an (approximate) F-statistic that is distributed as F with Q and N−K degrees of freedom.
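As a sketch of how the robust variance matrix and the Wald statistic fit together (the function and variable names are mine; canned routines such as statsmodels' cov_type='HC0' option produce the same robust matrix):

    import numpy as np
    from scipy import stats

    def robust_wald(X, y, R, r):
        """White/Huber robust vcov and the Wald test of H0: R beta = r."""
        b = np.linalg.solve(X.T @ X, X.T @ y)
        uhat = y - X @ b
        XtX_inv = np.linalg.inv(X.T @ X)
        meat = (X * uhat[:, None]**2).T @ X     # sum_i uhat_i^2 * x_i'x_i
        V = XtX_inv @ meat @ XtX_inv            # robust estimate of Avar(b_ols)
        diff = R @ b - r
        W = diff @ np.linalg.solve(R @ V @ R.T, diff)
        Q = R.shape[0]
        return W, stats.chi2.sf(W, df=Q)        # statistic and p-value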
Lagrange Multiplier (Score) Tests

In econometrics there are 3 classic statistical tests: the Wald test, the Lagrange multiplier test and the likelihood ratio (LR) test. We saw a very general exposition of the Wald test at the end of Lecture 2. Now we will look at the Lagrange multiplier test in the context of a linear model. Suppose that we divide the x variables into 2 groups:

y = x1β1 + x2β2 + u

where x1 has K1 elements and x2 has K2 elements. If Assumptions OLS.1, OLS.2 and OLS.3 hold, it is easy to test the hypothesis H0: β2 = 0 using a standard F-test. But what if Assumption OLS.3 doesn't hold? Let β̃1 be the estimate of β1 when the constraint that β2 = 0 is imposed. [Question: How do you estimate β̃1?] Define ũᵢ = yᵢ − xᵢ1β̃1. Under H0, ũᵢ should be uncorrelated with xᵢ2 because, conditional on x1β1, x2 should have no explanatory power for y and thus no explanatory power for ũ.
This lack of explanatory power can be tested by regressing ũ on x1 and x2. Let R²_u be the R-squared from this regression (assume that x1 contains the constant term). The Lagrange multiplier (LM) statistic is:

LM = N·R²_u

Under H0, LM ~a χ²(K2). Note that the regression must include x1 as well as x2, even though by construction x1 will not be correlated with ũ. Finally, if x1 does not include a constant, then R²_u should be the uncentered R-squared. The LM test just described requires Assumption OLS.3. If that assumption is incorrect, the procedure can be modified as follows. After some algebra (this would be a good homework problem!) you can show that the LM statistic is:
LM = [N^(-1/2)Σᵢ rᵢ′ũᵢ]′ [σ̃²(1/N)Σᵢ rᵢ′rᵢ]⁻¹ [N^(-1/2)Σᵢ rᵢ′ũᵢ]

where σ̃² = (1/N)Σᵢ ũᵢ² and each rᵢ is a 1×K2 vector of OLS residuals from regressing x2 on x1. This test statistic is not robust to heteroscedasticity because the middle term is not a consistent estimate of the asymptotic variance of N^(-1/2)Σᵢ rᵢ′ũᵢ when u is heteroscedastic.
Using the Huber-White approach just discussed, the heteroscedasticity-robust LM test statistic is:

LM = [N^(-1/2)Σᵢ rᵢ′ũᵢ]′ [(1/N)Σᵢ ũᵢ²rᵢ′rᵢ]⁻¹ [N^(-1/2)Σᵢ rᵢ′ũᵢ]
   = [Σᵢ rᵢ′ũᵢ]′ [Σᵢ ũᵢ²rᵢ′rᵢ]⁻¹ [Σᵢ rᵢ′ũᵢ] ~a χ²(K2)

This can be obtained by regressing (without an intercept) a constant on rᵢũᵢ. Let SSR0 be the sum of the squared residuals from this regression. Then:

LM = N − SSR0 ~a χ²(K2)
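A sketch of this regression trick in numpy (the names and interface are mine; x1 is assumed to contain the constant term):

    import numpy as np
    from scipy import stats

    def robust_lm(y, X1, X2):
        """Heteroscedasticity-robust LM test of H0: beta2 = 0."""
        n = len(y)
        b1 = np.linalg.lstsq(X1, y, rcond=None)[0]
        u_tilde = y - X1 @ b1                     # restricted residuals
        G = np.linalg.lstsq(X1, X2, rcond=None)[0]
        Rres = X2 - X1 @ G                        # residuals from regressing x2 on x1
        Z = Rres * u_tilde[:, None]               # r_i * u~_i, one row per observation
        c = np.linalg.lstsq(Z, np.ones(n), rcond=None)[0]   # regress 1 on r*u~, no intercept
        ssr0 = np.sum((np.ones(n) - Z @ c) ** 2)
        LM = n - ssr0
        return LM, stats.chi2.sf(LM, df=X2.shape[1])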
III. OLS Methods for Omitted Variable Bias

Suppose there is a causal variable that determines y but is unobserved, denoted by q. If we observed q we could derive the causal impacts of x and q on y by observing E[y| x, q] for a large number of observations. Suppose that this relationship is linear:

E[y| x, q] = xβ + γq
Note that β measures the impacts of each variable in x on y, holding the other x variables, and q, constant. Question: Suppose there are 2 unobserved variables; is this very different from having 1 such variable? We can rewrite the above relationship as:

y = xβ + γq + v, E[v| x, q] = 0

Wooldridge calls v the structural error, while the effective error (due to the unobservability of q) is u ≡ γq + v. So the above equation can be rewritten as:

y = xβ + u

By changing the constant term we can set E[q] = 0, and thus E[u] = 0. Unfortunately, if q is correlated with any of the regressors, then so is u. Using OLS to estimate β by regressing y on x will, in general, give inconsistent estimates for all the elements of β (not just the ones of the x variables that are correlated with q). This is called omitted variables inconsistency or (more casually) omitted variables bias. To see the bias, consider a linear projection of q on x (note that we are not assuming that E[q| x] = L[q| x]):
q = xδ + r, where E[r] = 0 and Cov(x, r) = 0

Plugging this into the first equation for y gives:

y = x(β + γδ) + v + γr

By assumption, E[v + γr] = 0 and x is uncorrelated with v + γr. Thus a regression of y on x satisfies OLS.1 and we will find that, for any element xj of x:

plim β̂j,OLS = βj + γδj

Sometimes people assert that only one variable in x, call it xK, is correlated with q, or (more precisely) that all the elements of δ other than δK equal zero. If this more precise assumption is true, then:

plim β̂j,OLS = βj for j ≠ K
plim β̂K,OLS = βK + γδK = βK + γCov(xK, q)/Var(xK)

But in the general case δj is a partial correlation of q and xj, not a raw correlation, and in general these will not be the same. Yet if a good argument can be made that all δjs other than δK equal zero, then the raw correlation of xK and q gives the direction of the bias.
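To see plim β̂K,OLS = βK + γδK numerically, here is a hedged simulation sketch with a single regressor, so δ = Cov(x, q)/Var(x) (the values β = 2, γ = 1, δ = 0.5 are arbitrary choices of mine):

    import numpy as np

    # Omitted variable bias: y = beta*x + gamma*q + v with q = delta*x + r,
    # so regressing y on x alone should give roughly beta + gamma*delta.
    rng = np.random.default_rng(1)
    N, beta, gamma, delta = 1_000_000, 2.0, 1.0, 0.5
    x = rng.normal(size=N)
    q = delta * x + rng.normal(size=N)    # q is correlated with x
    y = beta * x + gamma * q + rng.normal(size=N)
    b_short = (x @ y) / (x @ x)           # OLS of y on x (means are zero, so no intercept)
    print(b_short)                        # approx. beta + gamma*delta = 2.5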
Using Proxy Variables to Remove Omitted Var. Bias

Sometimes we have data on a proxy variable that is similar to the omitted variable q. Intuitively, it seems reasonable that putting that variable in will fix, or at least reduce, the omitted variable problem. In fact, researchers do this very often, so it is useful to work out the precise conditions needed for this procedure to succeed in reducing or eliminating omitted variable bias. Let z be the candidate proxy variable. Adding it will remove omitted variable bias if 2 conditions hold:

E[y| x, q, z] = E[y| x, q]
L[q| 1, x, z] = L[q| 1, z]

The first condition means that z does not have any explanatory power for y that is not already contained in x and q. In many cases this is a reasonable assumption. For example, in a wage regression, if you have data on true innate ability then you would not expect some kind of IQ test to have any explanatory power beyond the impact of innate ability. Actually, the first condition is a little stronger than needed; all we really need is that z is uncorrelated with v.
The second requirement is that, once we condition on z, there is no correlation between q and x. Intuitively, adding z to the x's as regressors in a linear model removes the (partial) correlation between x and q. Another way to think of this is that q can be divided into two parts, q = θ1z + r, where Cov(z, r) = 0, and that all of the correlation between q and x comes from the θ1z part, so that Cov(x, r) = 0. If these two conditions hold, we can insert q = θ1z + r into the equation y = xβ + γq + v:

y = xβ + γθ1z + (γr + v)

[If you add a constant to q = θ1z + r then the constant in β will change, but nothing else in β will change.] Since the (composite) error term (γr + v) is uncorrelated with both x and z, by Theorem 4.1 we can consistently estimate β (and γθ1). What if z is an imperfect proxy in the sense that the r in q = θ1z + r is correlated with x? A linear projection of q on x and z will then give:

q = xρ + θ1z + r*
where r*, which comes from r = xρ + r*, is not correlated with x. The OLS estimate from a regression of y on x and z will then have a plim of β + γρ. You could argue (hope?) that γρ is smaller than γδ to justify the use of such an imperfect proxy. A final point is that adding proxies, and even imperfect proxies, may reduce your estimates of the variance of β̂_OLS. The intuition here is that this variance is determined in part by the variance of the error term, and if you convert part of the error term into an observable variable then you reduce the variance of the remaining error term.
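A minimal simulation sketch of the proxy logic (the DGP is invented; z plays the IQ-for-ability role, and both conditions above hold by construction):

    import numpy as np

    # Proxy variable: q = theta1*z + r with Cov(x, r) = 0, so adding z to the
    # regression removes the omitted variable bias in the coefficient on x.
    rng = np.random.default_rng(2)
    N, beta, gamma, theta1 = 1_000_000, 2.0, 1.0, 0.8
    z = rng.normal(size=N)
    x = 0.5 * z + rng.normal(size=N)      # x is correlated with q only through z
    q = theta1 * z + rng.normal(size=N)   # the r part is uncorrelated with x
    y = beta * x + gamma * q + rng.normal(size=N)
    X = np.column_stack([x, z])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    print(b)                              # approx. [beta, gamma*theta1] = [2.0, 0.8]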
Models with Interactions in Unobservables

Things become more complicated if the impact of an observed variable depends on the value of an unobserved variable, i.e. the underlying model has an interaction term between q and one or more variables in x. In particular, assume that the structural model is:

y = xβ + γ1q + γ2xKq + v, E[v| x, q] = 0

If xK is a continuous variable then the partial effect of xK on E[y| x, q] is:
∂E[y| x, q]/∂xK = βK + γ2q
We can never estimate this for a particular q, since we never observe q. However, we can estimate the average partial effect (APE) if we have a good estimate for βK. The APE is simply the population average of the partial effect, so:

APE = E[βK + γ2q] = βK + γ2E[q] (= βK if E[q] = 0)

How do we estimate β when there is an unobservable q and it is interacted with one of the x variables? Assume that we have a proxy variable z that meets the above criteria E[y| x, q, z] = E[y| x, q] and:

E[q| x, z] = E[q| z] = θ1z, where E[z] = 0

These assumptions + the law of iterated expectations give:

E[y| x, z] = xβ + γ1θ1z + γ2θ1xKz

All of these parameters can be estimated consistently using OLS. Wooldridge explains (p. 69) that even if the original structural model is homoscedastic, this model will be heteroscedastic, so you should always calculate heteroscedasticity-robust standard errors.
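A sketch of this estimating equation, using statsmodels for the robust standard errors the text recommends (the DGP and all parameter values are invented):

    import numpy as np
    import statsmodels.api as sm

    # Estimate E[y|x,z] = b0 + bK*xK + a1*z + a2*(xK*z) by OLS, where in the
    # text's notation a1 = gamma1*theta1 and a2 = gamma2*theta1.
    rng = np.random.default_rng(3)
    N = 100_000
    z = rng.normal(size=N)
    xK = rng.normal(size=N)
    q = 0.8 * z + rng.normal(size=N)                 # E[q| x, z] = theta1*z
    y = 1.0 + 2.0 * xK + 1.0 * q + 0.5 * xK * q + rng.normal(size=N)
    X = sm.add_constant(np.column_stack([xK, z, xK * z]))
    fit = sm.OLS(y, X).fit(cov_type="HC0")           # heteroscedasticity-robust SEs
    print(fit.params, fit.bse)                       # params approx. [1.0, 2.0, 0.8, 0.4]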
IV. Impact of Measurement Error on OLS Estimation

Measurement error is a big and (until recently) often ignored problem in applied econometrics. The big problem is measurement error in x. Measurement error in y is only a problem if the errors in measurement are correlated with some of the x variables. To see this, let y* be the true y and assume that y measures it with error: y = y* + e0. The structural model is thus:

y* = xβ + v

but the data we have show the relationship:

y = xβ + v + e0

If e0 is uncorrelated with x (and v is uncorrelated with x) then Assumption OLS.1 is satisfied and OLS consistently estimates β. The only disadvantage is that the (uncorrelated) composite error term is larger, which implies a larger covariance matrix for β̂_OLS. Yet if e0 is correlated with some or all of the elements of x then the composite error term will be correlated with x and then plim β̂_OLS ≠ β.
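A quick simulation sketch of the benign case (numbers invented; e0 uncorrelated with x):

    import numpy as np

    # Measurement error in y that is uncorrelated with x: OLS stays consistent,
    # the composite error term just gets noisier.
    rng = np.random.default_rng(5)
    N, beta = 1_000_000, 2.0
    x = rng.normal(size=N)
    y_star = beta * x + rng.normal(size=N)
    y = y_star + rng.normal(scale=2.0, size=N)   # e0, independent of x
    print((x @ y) / (x @ x))                     # approx. beta = 2.0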
Measurement Error in the x Variables

This is a much bigger problem because even if measurement error in the x variables is uncorrelated with everything else, OLS will still be an inconsistent estimate of β. In the simplest case, suppose one of the x variables, call it xK, is measured with error: xK = xK* + eK, where xK* is the true xK. The structural model is:

y = β0 + β1x1 + … + βKxK* + v

Assume that E[v] = 0 and Cov(x, v) = 0 (x is x1, …, xK*). Assume that Cov(eK, v) = 0 and Cov(x, eK) = 0. (Both properties of eK follow from the redundancy assumption E[y| x, xK] = E[y| x].) Finally, by letting the constant term β0 adjust we can impose E[eK] = 0. [On pp. 73-74 Wooldridge discusses the case where Cov(xK, eK) = 0; he points out that this is too good to be true since it leads to no problem with OLS, so I won't discuss this case.] The assumption that Cov(x, eK) = 0 implies that Cov(xK*, eK) = 0; this is the classical errors-in-variables (CEV) assumption. It implies that Cov(xK, eK) = Var(eK) ≡ σ²_eK. If xK* is uncorrelated with all of the other variables in x then plim[β̂j] = βj for all j ≠ K. In the more general case where xK* could be correlated with some or all of the other x variables, you can show (maybe a homework problem) that OLS estimation of this model will give:

plim[β̂K,OLS] = βK · σ²_rK* / (σ²_rK* + σ²_eK)

where rK* is the linear projection error from:

xK* = δ0 + δ1x1 + … + δK−1xK−1 + rK*

The important point to note is that |plim β̂K,OLS| will always be smaller than |βK| IF xK is the only x variable with measurement error. This is often referred to as attenuation bias. Question: Suppose xK* is uncorrelated with all the other x variables. What will be the values of the δs in the above regression? Is β̂K,OLS still inconsistent?
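To see the attenuation factor numerically, a short simulation sketch (all parameter values are arbitrary; here xK* is the only regressor, so rK* = xK*):

    import numpy as np

    # Classical errors-in-variables: regressing y on xK = xK* + eK shrinks the
    # coefficient by var(xK*) / (var(xK*) + var(eK)).
    rng = np.random.default_rng(4)
    N, betaK = 1_000_000, 2.0
    xK_star = rng.normal(size=N)                  # true regressor, variance 1
    eK = rng.normal(scale=np.sqrt(0.5), size=N)   # measurement error, variance 0.5
    xK = xK_star + eK
    y = betaK * xK_star + rng.normal(size=N)
    b = (xK @ y) / (xK @ xK)
    print(b)                                      # approx. 2.0 * 1/(1 + 0.5) = 1.33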
If there is measurement error in some of the other x variables, then things become more complicated, and it is not guaranteed that the bias will be toward zero (though in practice this is often the case). Even worse, if eK is correlated with xK*, then it is also possible that the bias will not be toward zero. One way to fix this: instrumental variables!