ApEc 8212 Econometric Analysis II Lecture #3
Review of Linear Models and OLS Estimation (Wooldridge, Chapter 4)
This lecture will review much of the material you had in ApEc 8211, but in a way that will prepare you for topics that will be covered this semester that you have not seen before. I hope you appreciate that it is much less theoretical than the material in Lectures 1 and 2, although I will often refer to the material in those two lectures (which covered Wooldridge's Chapters 2 and 3).

I. Overview of the Single Equation Linear Model

Start with a model that shows a linear relationship between the dependent variable, y, and the explanatory variables in the vector x:

y = β0 + β1x1 + β2x2 + … + βKxK + u = xβ + u

Strictly speaking, this is a model of the population from which the data are drawn, not a model of the data itself. By definition, this model assumes that there is a causal (structural) relationship in that changes in x cause changes in y, and that this causal relationship is linear.
The error term u in the model can represent many things, such as other (omitted or unobserved) variables or measurement error in the variables that we do observe. We will get to this later in this lecture. The key assumption regarding u that is required for consistent estimation of β using OLS is:

E[u] = 0
Cov(xj, u) = 0, j = 1, 2, …, K

The assumption that E[u] = 0 is trivial if x includes a constant term (intercept), since the β for that constant term can always be adjusted to ensure that E[u] = 0 without changing any of the other elements in β. It is the zero covariance assumption that is the really important assumption. Note that E[u| x] = 0 implies Cov(u, x) = 0, but not vice versa. Thus E[u| x] = 0 is a stronger assumption than Cov(u, x) = 0. Combining the linear model assumption and the assumption that E[u| x] = 0 implies that:

E[y| x] = β0 + β1x1 + β2x2 + … + βKxK = xβ
Of course, since x can include interaction or higher-order terms (e.g. x3 could be x1² or x1x2), the linear model has a fair amount of flexibility. Wooldridge states most of his results using the assumption Cov(u, x) = 0, since it is a weaker assumption than E[u| x] = 0. Loosely speaking (we'll talk more about this in later lectures) we can say that a variable xj is endogenous if Cov(u, xj) ≠ 0, and a variable is exogenous if Cov(u, xj) = 0. Thus in this linear model the assumption that Cov(u, x) = 0 implies that all of the variables in x are exogenous (for this model only!). Endogeneity of one or more of the variables in x implies that OLS estimates of β are inconsistent. Endogeneity can arise for three reasons:

1. Omitted variables. Suppose that we are interested in the causal impact on y of both the variables in x and another variable q. That is, we want to know E[y| x, q] for a wide range of x and q. But suppose that we do not have data on q, so we can estimate only E[y| x]. This relationship between x and y will not necessarily be the same as the (causal) relationship between x (and q) and y represented by E[y| x, q] if q is correlated with one or more of the variables in x. An example of this is estimating the impact of years of schooling on wages, where q is unobserved innate ability.
2. Measurement error. Our data may be bad in that the x in our data is not the true x, which can be denoted by x*. This is a very serious (and often ignored) problem in econometrics.

3. Simultaneity. Perhaps some of the variables in x not only cause y but are also caused by y. Since y is caused by u, and these x variables are caused by y, these x variables will be correlated with u. An interesting example is crime rates and the size of the police force. We may be interested in estimating the impact of the size of the police force on the crime rate, but high crime rates may cause the government to increase the size of the police force.

In today's lecture we will discuss in detail the first two problems, but the third problem will have to wait until we take up instrumental variables (starting this Wednesday) and estimation of systems of equations (starting next week).
II. Asymptotic Properties of the OLS Estimate of β

Economists and other researchers often use OLS to estimate β in the model y = xβ + u. For simplicity, let the first element in x, x1, be a constant term. We have a sample of size N: {(yi, xi): i = 1, 2, …, N}. Assume that each observation is randomly drawn from the same population, so any two observations, (yi, xi) and (yj, xj), are independent (if i ≠ j) and identically (jointly) distributed (i.i.d.) random variables. For each observation we have:

yi = xiβ + ui

Consistency

In addition to the assumption that the model is linear, we need two more assumptions for consistency:

Assumption OLS.1: E[x′u] = 0
Assumption OLS.2: rank(E[x′x]) = K

Since the first element of x is assumed to be a constant, Assumption OLS.1 implies that E[u] = 0.
You should be able to show that E[x′u] = 0 implies Cov(xj, u) = 0 for all xj in x. Assumption OLS.2 is needed to ensure that the OLS estimate of β is unique. If it does not hold, then one or more elements of x is a linear combination of the others, which means that there are many βs that give the same (conditional) expected values of y. Note that this assumption is equivalent to assuming that E[x′x] is a positive definite K×K matrix, and that the variance-covariance matrix of the variables in x (i.e. after removing the constant term) is nonsingular. These two assumptions imply that β is identified; that is, it can be expressed (solved for) in terms of the population moments (e.g. variances and covariances) of x and y. The solution comes from pre-multiplying y = xβ + u by x′, taking expectations, and rearranging:

β = (E[x′x])⁻¹E[x′y]

To show consistency, write out the OLS estimate of β:
β̂_OLS = [(1/N)Σᵢ xᵢ′xᵢ]⁻¹ [(1/N)Σᵢ xᵢ′yᵢ] = β + [(1/N)Σᵢ xᵢ′xᵢ]⁻¹ [(1/N)Σᵢ xᵢ′uᵢ]

(all sums run over i = 1, …, N)
Of course, β̂_OLS can also be written as (X′X)⁻¹(X′y). By Corollary 3.1, Assumption OLS.2 implies that X′X is nonsingular w.p.a.1 and plim[((1/N)Σᵢ xᵢ′xᵢ)⁻¹] = A⁻¹, where A ≡ E[x′x]. By Assumption OLS.1 we have plim[(1/N)Σᵢ xᵢ′uᵢ] = E[x′u] = 0. Then by Slutsky's theorem we have plim[β̂_OLS] = β + A⁻¹·0 = β:

Theorem 4.1 (consistency of OLS): Under Assumptions OLS.1 and OLS.2, β̂_OLS from a random sample of the population model y = xβ + u is a consistent estimator of β.

Note that OLS is also a consistent estimate of the parameters of a linear projection of y on x, as long as Assumption OLS.2 holds, since the linear projection is defined as (E[x′x])⁻¹E[x′y], plim[((1/N)Σᵢ xᵢ′xᵢ)⁻¹] = (E[x′x])⁻¹ and plim[(1/N)Σᵢ xᵢ′yᵢ] = E[x′y].

A few other important points:

1. In practice, you usually don't have to worry about Assumption OLS.2; it is obvious when it fails (and it rarely fails). It is Assumption OLS.1 that often fails and, more perniciously, it is very difficult to determine whether it holds or fails.
2. Assumptions OLS.1 and OLS.2 do not by themselves imply that β̂_OLS is unbiased. To show unbiasedness we need the slightly more restrictive assumption that E[u| x] = 0.

3. The consistency result requires only that u and x are uncorrelated, not that they are independent. Independence would imply that Var[u| x] is constant, but we don't need this for consistency.
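To make Theorem 4.1 concrete, here is a minimal numpy sketch of the consistency result (the data-generating process, seed, and all names are my own invention, not from Wooldridge): it draws i.i.d. samples satisfying OLS.1 and OLS.2 and shows β̂_OLS = (X′X)⁻¹X′y settling down toward β as N grows.

    import numpy as np

    # Illustrative simulation: OLS is consistent under OLS.1 and OLS.2.
    # The true beta and the DGP below are arbitrary choices.
    rng = np.random.default_rng(0)
    beta = np.array([1.0, 0.5, -2.0])   # first element is the intercept

    for N in [100, 10_000, 1_000_000]:
        X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
        u = rng.normal(size=N)          # E[x'u] = 0 holds by construction
        y = X @ beta + u
        b_ols = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y
        print(N, b_ols)                 # approaches beta as N grows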
Asymptotic Normality

The above expression for β̂_OLS implies that:

√N(β̂_OLS − β) = [(1/N)Σᵢ xᵢ′xᵢ]⁻¹ [N^(-1/2)Σᵢ xᵢ′uᵢ]

Theorem 4.1 implies that [(1/N)Σᵢ xᵢ′xᵢ]⁻¹ − A⁻¹ = op(1) [this expression converges in probability to zero]. Note that {(xᵢ′uᵢ): i = 1, 2, …, N} is an i.i.d. sequence with zero mean. Assume that each element of xᵢ′uᵢ has a finite variance; then by the Central Limit Theorem we have N^(-1/2)Σᵢ xᵢ′uᵢ →d N(0, B), where B ≡ E[u²x′x]. By Lemma 3.5, N^(-1/2)Σᵢ xᵢ′uᵢ = Op(1). Thus:

√N(β̂_OLS − β) = A⁻¹[N^(-1/2)Σᵢ xᵢ′uᵢ] + op(1)

since op(1)·Op(1) = op(1). To go farther, we need a homoskedasticity assumption:

Assumption OLS.3: E[u²x′x] = σ²E[x′x], where σ² = E[u²]

A stronger assumption is that E[u²| x] = σ², but it is not needed (although it is easier to interpret than Assumption OLS.3). Putting all of this together gives:

Theorem 4.2 (asymptotic normality of OLS): Under Assumptions OLS.1, OLS.2 and OLS.3:

√N(β̂_OLS − β) ~a N(0, σ²A⁻¹), where A ≡ E[x′x]

Proof: Lemma 3.7 and Corollary 3.2 imply that √N(β̂_OLS − β) ~a N(0, A⁻¹BA⁻¹). Then Assumption OLS.3 implies that B = σ²A.
In practice we can estimate Avar(β̂_OLS) as σ̂²(X′X)⁻¹, where σ̂² = (1/(N−K))Σᵢ (yᵢ − xᵢβ̂_OLS)².
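In code this takes only a few lines; a sketch under the same simulated setup as above (the function name and interface are mine):

    import numpy as np

    def ols_classical(X, y):
        """OLS with the usual sigma2_hat * (X'X)^{-1} variance estimate."""
        N, K = X.shape
        b = np.linalg.solve(X.T @ X, X.T @ y)
        resid = y - X @ b
        sigma2_hat = resid @ resid / (N - K)   # (1/(N-K)) * sum of squared residuals
        vcov = sigma2_hat * np.linalg.inv(X.T @ X)
        return b, np.sqrt(np.diag(vcov))       # coefficients and standard errors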
If the assumptions of Theorem 4.2 hold, then the usual (OLS-based) formulas for standard errors of β̂_OLS, for t-statistics and for F-statistics are all asymptotically valid.

Heteroscedasticity-Robust Inference

If Assumption OLS.3 fails, OLS is still a consistent estimator of β, but we can't use σ̂²(X′X)⁻¹ to estimate Avar(β̂_OLS). This is a serious problem because it is quite possible that Assumption OLS.3 does fail. Fortunately, it is not hard to get an estimate of Avar(β̂_OLS) under less restrictive assumptions. (An alternative approach is weighted least squares (WLS), but this requires a (parametric) model for Var[y| x], which may be just as restrictive as Assumption OLS.3.) To estimate Avar(β̂_OLS) under a set of less restrictive assumptions, go back to the asymptotic normality discussion. The asymptotic variance of β̂_OLS without Assumption OLS.3 is Avar(β̂_OLS) = A⁻¹BA⁻¹/N. Our consistent estimate of A⁻¹ is N(X′X)⁻¹, so we just need a consistent estimate of B. By the (weak) law of large numbers (Theorem 3.1) we have:

(1/N)Σᵢ uᵢ²xᵢ′xᵢ →p E[u²x′x] = B

We can replace uᵢ with ûᵢ = yᵢ − xᵢβ̂_OLS. Thus (1/N)Σᵢ ûᵢ²xᵢ′xᵢ is a consistent estimator of B and:

Avâr(β̂_OLS) = (X′X)⁻¹(Σᵢ ûᵢ²xᵢ′xᵢ)(X′X)⁻¹
The standard errors of β̂_OLS computed from this matrix are heteroscedasticity-robust standard errors. They are also called White standard errors or Huber standard errors. These can be used to obtain t-statistics in the usual way. However, the usual F-test is invalid. You should use the heteroscedasticity-robust Wald statistic instead to test hypotheses of the form H0: Rβ = r (R is Q×K, r is Q×1). That test statistic is:

W = (Rβ̂_OLS − r)′(RV̂R′)⁻¹(Rβ̂_OLS − r)

where V̂ is the above formula for Avâr(β̂_OLS).
Under H0, W ~a χ²(Q). Dividing W by Q yields an (approximate) F-statistic that is distributed as F with Q and N−K degrees of freedom.
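As a sketch of how the robust variance matrix and the Wald statistic fit together (the function and variable names are mine; canned routines such as statsmodels' cov_type='HC0' option produce the same robust matrix):

    import numpy as np
    from scipy import stats

    def robust_wald(X, y, R, r):
        """White/Huber robust vcov and the Wald test of H0: R beta = r."""
        b = np.linalg.solve(X.T @ X, X.T @ y)
        uhat = y - X @ b
        XtX_inv = np.linalg.inv(X.T @ X)
        meat = (X * uhat[:, None]**2).T @ X     # sum_i uhat_i^2 * x_i'x_i
        V = XtX_inv @ meat @ XtX_inv            # robust estimate of Avar(b_ols)
        diff = R @ b - r
        W = diff @ np.linalg.solve(R @ V @ R.T, diff)
        Q = R.shape[0]
        return W, stats.chi2.sf(W, df=Q)        # statistic and p-value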
Lagrange Multiplier (Score) Tests

In econometrics there are 3 classic statistical tests: the Wald test, the Lagrange multiplier test and the likelihood ratio (LR) test. We saw a very general exposition of the Wald test at the end of Lecture 2. Now we will look at the Lagrange multiplier test in the context of a linear model. Suppose that we divide the x variables into 2 groups:

y = x1β1 + x2β2 + u

where x1 has K1 elements and x2 has K2 elements. If Assumptions OLS.1, OLS.2 and OLS.3 hold, it is easy to test the hypothesis H0: β2 = 0 using a standard F-test. But what if Assumption OLS.3 doesn't hold? Let β̃1 be the estimate of β1 when the constraint that β2 = 0 is imposed. [Question: How do you estimate β̃1?] Define ũᵢ = yᵢ − xᵢ1β̃1. Under H0, ũᵢ should be uncorrelated with xᵢ2 because, conditional on x1β1, x2 should have no explanatory power for y and thus no explanatory power for ũ.
This lack of explanatory power can be tested by regressing ũ on x1 and x2. Let R²_u be the R-squared from this regression (assume that x1 contains the constant term). The Lagrange multiplier (LM) statistic is:

LM = N·R²_u

Under H0, LM ~a χ²(K2). Note that the regression must include x1 as well as x2, even though by construction x1 will not be correlated with ũ. Finally, if x1 does not include a constant, then R²_u should be the uncentered R-squared. The LM test just described requires Assumption OLS.3. If that assumption is incorrect, the procedure can be modified as follows. After some algebra (this would be a good homework problem!) you can show that the LM statistic is:
LM = [N^(-1/2)Σᵢ rᵢ′ũᵢ]′ [σ̃²(1/N)Σᵢ rᵢ′rᵢ]⁻¹ [N^(-1/2)Σᵢ rᵢ′ũᵢ]

where σ̃² = (1/N)Σᵢ ũᵢ² and each rᵢ is a 1×K2 vector of OLS residuals from regressing x2 on x1. This test statistic is not robust to heteroscedasticity because the middle term is not a consistent estimate of the asymptotic variance of N^(-1/2)Σᵢ rᵢ′ũᵢ when u is heteroscedastic.
Using the Huber-White approach just discussed, the heteroscedasticity-robust LM test statistic is:

LM = [N^(-1/2)Σᵢ rᵢ′ũᵢ]′ [(1/N)Σᵢ ũᵢ²rᵢ′rᵢ]⁻¹ [N^(-1/2)Σᵢ rᵢ′ũᵢ]
   = [Σᵢ rᵢ′ũᵢ]′ [Σᵢ ũᵢ²rᵢ′rᵢ]⁻¹ [Σᵢ rᵢ′ũᵢ] ~a χ²(K2)

This can be obtained by regressing (without an intercept) a constant on rᵢũᵢ. Let SSR0 be the sum of the squared residuals from this regression. Then:

LM = N − SSR0 ~a χ²(K2)
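A sketch of this regression trick in numpy (the names and interface are mine; x1 is assumed to contain the constant term):

    import numpy as np
    from scipy import stats

    def robust_lm(y, X1, X2):
        """Heteroscedasticity-robust LM test of H0: beta2 = 0."""
        n = len(y)
        b1 = np.linalg.lstsq(X1, y, rcond=None)[0]
        u_tilde = y - X1 @ b1                     # restricted residuals
        G = np.linalg.lstsq(X1, X2, rcond=None)[0]
        Rres = X2 - X1 @ G                        # residuals from regressing x2 on x1
        Z = Rres * u_tilde[:, None]               # r_i * u~_i, one row per observation
        c = np.linalg.lstsq(Z, np.ones(n), rcond=None)[0]   # regress 1 on r*u~, no intercept
        ssr0 = np.sum((np.ones(n) - Z @ c) ** 2)
        LM = n - ssr0
        return LM, stats.chi2.sf(LM, df=X2.shape[1])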
III. OLS Methods for Omitted Variable Bias

Suppose there is a causal variable that determines y but is unobserved, denoted by q. If we observed q we could derive the causal impacts of x and q on y by observing E[y| x, q] for a large number of observations. Suppose that this relationship is linear:

E[y| x, q] = xβ + γq
Note that β measures the impacts of each variable in x on y, holding the other x variables, and q, constant. Question: Suppose there are 2 unobserved variables; is this very different from having 1 such variable? We can rewrite the above relationship as:

y = xβ + γq + v, E[v| x, q] = 0

Wooldridge calls v the structural error, while the effective error (due to the unobservability of q) is u ≡ γq + v. So the above equation can be rewritten as:

y = xβ + u

By changing the constant term we can set E[q] = 0, and thus E[u] = 0. Unfortunately, if q is correlated with any of the regressors, then so is u. Using OLS to estimate β by regressing y on x will, in general, give inconsistent estimates for all the elements of β (not just the ones of the x variables that are correlated with q). This is called omitted variables inconsistency or (more casually) omitted variables bias. To see the bias, consider a linear projection of q on x (note that we are not assuming that E[q| x] = L[q| x]):
q = xδ + r, where E[r] = 0 and Cov(x, r) = 0

Plugging this into the first equation for y gives:

y = x(β + γδ) + v + γr

By assumption, E[v + γr] = 0 and x is uncorrelated with v + γr. Thus a regression of y on x satisfies OLS.1 and we will find that, for any element xj of x:

plim β̂j,OLS = βj + γδj

Sometimes people assert that only one variable in x, call it xK, is correlated with q, or (more precisely) that all the elements of δ other than δK equal zero. If this more precise assumption is true, then:

plim β̂j,OLS = βj for j ≠ K
plim β̂K,OLS = βK + γδK = βK + γCov(xK, q)/Var(xK)

But in the general case δj is a partial correlation of q and xj, not a raw correlation, and in general these will not be the same. Yet if a good argument can be made that all δjs other than δK equal zero, then the raw correlation of xK and q gives the direction of the bias.
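To see plim β̂K,OLS = βK + γδK numerically, here is a hedged simulation sketch with a single regressor, so δ = Cov(x, q)/Var(x) (the values β = 2, γ = 1, δ = 0.5 are arbitrary choices of mine):

    import numpy as np

    # Omitted variable bias: y = beta*x + gamma*q + v with q = delta*x + r,
    # so regressing y on x alone should give roughly beta + gamma*delta.
    rng = np.random.default_rng(1)
    N, beta, gamma, delta = 1_000_000, 2.0, 1.0, 0.5
    x = rng.normal(size=N)
    q = delta * x + rng.normal(size=N)    # q is correlated with x
    y = beta * x + gamma * q + rng.normal(size=N)
    b_short = (x @ y) / (x @ x)           # OLS of y on x (means are zero, so no intercept)
    print(b_short)                        # approx. beta + gamma*delta = 2.5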
Using Proxy Variables to Remove Omitted Var. Bias

Sometimes we have data on a proxy variable that is similar to the omitted variable q. Intuitively, it seems reasonable that putting that variable in will fix, or at least reduce, the omitted variable problem. In fact, researchers do this very often, so it is useful to work out the precise conditions needed for this procedure to succeed in reducing or eliminating omitted variable bias. Let z be the candidate proxy variable. Adding it will remove omitted variable bias if 2 conditions hold:

E[y| x, q, z] = E[y| x, q]
L[q| 1, x, z] = L[q| 1, z]

The first condition means that z does not have any explanatory power for y that is not already contained in x and q. In many cases this is a reasonable assumption. For example, in a wage regression, if you have data on true innate ability then you would not expect some kind of IQ test to have any explanatory power beyond the impact of innate ability. Actually, the first condition is a little stronger than needed; all we really need is that z is uncorrelated with v.
The second requirement is that, once we condition on z, there is no correlation between q and x. Intuitively, adding z to the x's as regressors in a linear model removes the (partial) correlation between x and q. Another way to think of this is that q can be divided into two parts, q = θ1z + r, where Cov(z, r) = 0, and that all of the correlation between q and x comes from the θ1z part, so that Cov(x, r) = 0. If these two conditions hold, we can insert q = θ1z + r into the equation y = xβ + γq + v:

y = xβ + γθ1z + (γr + v)

[If you add a constant to q = θ1z + r then the constant in β will change, but nothing else in β will change.] Since the (composite) error term (γr + v) is uncorrelated with both x and z, by Theorem 4.1 we can consistently estimate β (and γθ1). What if z is an imperfect proxy in the sense that the r in q = θ1z + r is correlated with x? A linear projection of q on x and z will then give:

q = xρ + θ1z + r*
where r*, which comes from r = xρ + r*, is not correlated with x. The OLS estimate from a regression of y on x and z will then have a plim of β + γρ. You could argue (hope?) that γρ is smaller than γδ to justify the use of such an imperfect proxy. A final point is that adding proxies, and even imperfect proxies, may reduce your estimates of the variance of β̂_OLS. The intuition here is that this variance is determined in part by the variance of the error term, and if you convert part of the error term into an observable variable then you reduce the variance of the remaining error term.
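A minimal simulation sketch of the proxy logic (the DGP is invented; z plays the IQ-for-ability role, and both conditions above hold by construction):

    import numpy as np

    # Proxy variable: q = theta1*z + r with Cov(x, r) = 0, so adding z to the
    # regression removes the omitted variable bias in the coefficient on x.
    rng = np.random.default_rng(2)
    N, beta, gamma, theta1 = 1_000_000, 2.0, 1.0, 0.8
    z = rng.normal(size=N)
    x = 0.5 * z + rng.normal(size=N)      # x is correlated with q only through z
    q = theta1 * z + rng.normal(size=N)   # the r part is uncorrelated with x
    y = beta * x + gamma * q + rng.normal(size=N)
    X = np.column_stack([x, z])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    print(b)                              # approx. [beta, gamma*theta1] = [2.0, 0.8]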
Models with Interactions in Unobservables

Things become more complicated if the impact of an observed variable depends on the value of an unobserved variable, i.e. the underlying model has an interaction term between q and one or more variables in x. In particular, assume that the structural model is:

y = xβ + γ1q + γ2xKq + v, E[v| x, q] = 0

If xK is a continuous variable then the partial effect of xK on E[y| x, q] is:
∂E[y| x, q]/∂xK = βK + γ2q
We can never estimate this for a particular q, since we never observe q. However, we can estimate the average partial effect (APE) if we have a good estimate for βK. The APE is simply the population average of the partial effect, so:

APE = E[βK + γ2q] = βK + γ2E[q] (= βK if E[q] = 0)

How do we estimate β when there is an unobservable q and it is interacted with one of the x variables? Assume that we have a proxy variable z that meets the above criteria E[y| x, q, z] = E[y| x, q] and:

E[q| x, z] = E[q| z] = θ1z, where E[z] = 0

These assumptions + the law of iterated expectations give:

E[y| x, z] = xβ + γ1θ1z + γ2θ1xKz

All of these parameters can be estimated consistently using OLS. Wooldridge explains (p. 69) that even if the original structural model is homoscedastic, this model will be heteroscedastic, so you should always calculate heteroscedasticity-robust standard errors.
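A sketch of this estimating equation, using statsmodels for the robust standard errors the text recommends (the DGP and all parameter values are invented):

    import numpy as np
    import statsmodels.api as sm

    # Estimate E[y|x,z] = b0 + bK*xK + a1*z + a2*(xK*z) by OLS, where in the
    # text's notation a1 = gamma1*theta1 and a2 = gamma2*theta1.
    rng = np.random.default_rng(3)
    N = 100_000
    z = rng.normal(size=N)
    xK = rng.normal(size=N)
    q = 0.8 * z + rng.normal(size=N)                 # E[q| x, z] = theta1*z
    y = 1.0 + 2.0 * xK + 1.0 * q + 0.5 * xK * q + rng.normal(size=N)
    X = sm.add_constant(np.column_stack([xK, z, xK * z]))
    fit = sm.OLS(y, X).fit(cov_type="HC0")           # heteroscedasticity-robust SEs
    print(fit.params, fit.bse)                       # params approx. [1.0, 2.0, 0.8, 0.4]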
IV. Impact of Measurement Error on OLS Estimation

Measurement error is a big and (until recently) often ignored problem in applied econometrics. The big problem is measurement error in x. Measurement error in y is only a problem if the errors in measurement are correlated with some of the x variables. To see this, let y* be the true y and assume that y measures it with error: y = y* + e0. The structural model is thus:

y* = xβ + v

but the data we have show the relationship:

y = xβ + v + e0

If e0 is uncorrelated with x (and v is uncorrelated with x) then Assumption OLS.1 is satisfied and OLS consistently estimates β. The only disadvantage is that the (uncorrelated) composite error term is larger, which implies a larger covariance matrix for β̂_OLS. Yet if e0 is correlated with some or all of the elements of x then the composite error term will be correlated with x and then plim β̂_OLS ≠ β.
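A quick simulation sketch of the benign case (numbers invented; e0 uncorrelated with x):

    import numpy as np

    # Measurement error in y that is uncorrelated with x: OLS stays consistent,
    # the composite error term just gets noisier.
    rng = np.random.default_rng(5)
    N, beta = 1_000_000, 2.0
    x = rng.normal(size=N)
    y_star = beta * x + rng.normal(size=N)
    y = y_star + rng.normal(scale=2.0, size=N)   # e0, independent of x
    print((x @ y) / (x @ x))                     # approx. beta = 2.0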
Measurement Error in the x Variables

This is a much bigger problem because even if measurement error in the x variables is uncorrelated with everything else, OLS will still be an inconsistent estimate of β. In the simplest case, suppose one of the x variables, call it xK, is measured with error: xK = xK* + eK, where xK* is the true xK. The structural model is:

y = β0 + β1x1 + … + βKxK* + v

Assume that E[v] = 0 and Cov(x, v) = 0 (x is x1, …, xK*). Assume that Cov(eK, v) = 0 and Cov(x, eK) = 0. (Both properties of eK follow from the redundancy assumption E[y| x, xK] = E[y| x].) Finally, by letting the constant term β0 adjust we can impose E[eK] = 0. [On pp. 73-74 Wooldridge discusses the case where Cov(xK, eK) = 0; he points out that this is too good to be true since it leads to no problem with OLS, so I won't discuss this case.] The assumption that Cov(x, eK) = 0 implies that Cov(xK*, eK) = 0; this is the classical errors-in-variables (CEV) assumption. It implies that Cov(xK, eK) = Var(eK) ≡ σ²_eK. If xK* is uncorrelated with all of the other variables in x then plim[β̂j] = βj for all j ≠ K. In the more general case where xK* could be correlated with some or all of the other x variables, you can show (maybe a homework problem) that OLS estimation of this model will give:

plim[β̂K,OLS] = βK · σ²_rK* / (σ²_rK* + σ²_eK)

where rK* is the linear projection error from:

xK* = δ0 + δ1x1 + … + δK−1xK−1 + rK*

The important point to note is that |plim β̂K,OLS| will always be smaller than |βK| IF xK is the only x variable with measurement error. This is often referred to as attenuation bias. Question: Suppose xK* is uncorrelated with all the other x variables. What will be the values of the δs in the above regression? Is β̂K,OLS still inconsistent?
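To see the attenuation factor numerically, a short simulation sketch (all parameter values are arbitrary; here xK* is the only regressor, so rK* = xK*):

    import numpy as np

    # Classical errors-in-variables: regressing y on xK = xK* + eK shrinks the
    # coefficient by var(xK*) / (var(xK*) + var(eK)).
    rng = np.random.default_rng(4)
    N, betaK = 1_000_000, 2.0
    xK_star = rng.normal(size=N)                  # true regressor, variance 1
    eK = rng.normal(scale=np.sqrt(0.5), size=N)   # measurement error, variance 0.5
    xK = xK_star + eK
    y = betaK * xK_star + rng.normal(size=N)
    b = (xK @ y) / (xK @ xK)
    print(b)                                      # approx. 2.0 * 1/(1 + 0.5) = 1.33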
If there is measurement error in some of the other x variables, then things become more complicated, and it is not guaranteed that the bias will be toward zero (though in practice this is often the case). Even worse, if eK is correlated with xK*, then it is also possible that the bias will not be toward zero. One way to fix this: instrumental variables!