Omitted Variable Bias Evaluation Research (8521) Prof. Jesse Lecy 1

Jan 14, 2016


Francis Webster
Page 1

Omitted Variable Bias

Evaluation Research (8521)

Prof. Jesse Lecy

Page 2

OMITTED VARIABLES

Page 3

By the end of Stats II you should be able to interpret regression coefficients in a linear model. You are looking ONLY at direct effects. As a result, you think about the world like this:

The true causal model is much more complicated. There are important implications for how we build a program evaluation and interpret the data. We aim to identify an unbiased estimate of the effects of a single policy variable.

Understanding bias in regression

[Figure: two path diagrams of MCAT with GPA, SAT, IQ, and income.]

Page 4

[Figure: path diagrams for Case 1 and Case 2, each relating class size (CS) and test scores (TS) to teacher quality (TQ) and to SES.]

Page 5

A note on how I use terms in this section:

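$$\text{TestScore} = \beta_0 + \beta_1 \text{ClassSize} + \beta_2 \text{SES} + \beta_3 \text{TeachQuality} + \varepsilon$$

$$\text{TestScore} = b_0 + b_1 \text{ClassSize} + b_2 \text{TeachQuality} + e$$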

“Full Model”, i.e. the “truth”. The slopes will be correct because we have all of the variables included, therefore we use Greek letters.

“Naive Model” - We are missing variables and therefore we do NOT know if the slopes are correct. They represent our best guess. They may contain bias. We use Latin characters to denote this.

You are used to thinking in terms of population statistics and sample statistics. In a regression you can have the entire population in your sample, but if variables are missing from the regression, your slopes will still be wrong. To map concepts: when I say “full model” think population statistic (the truth), and when I say “naïve model” think sample statistic (the best guess).
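A small simulation can make this point concrete. The sketch below is an illustration under assumed numbers, not data from the course: the coefficients (-3, 5, and 0.8) and the names `x1`, `x2`, and `naive_slope` are hypothetical. It fits the naïve model on larger and larger samples to see where the slope settles.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def naive_slope(n, rng):
    """Slope from regressing y on x1 alone when x2 is left out."""
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical omitted variable, correlated with x1 (slope 0.8).
    x2 = [0.8 * a + rng.gauss(0, 0.6) for a in x1]
    # Hypothetical full model: true slope on x1 is -3, on x2 is 5.
    y = [-3.0 * a + 5.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    return cov(x1, y) / cov(x1, x1)

rng = random.Random(8521)
for n in (100, 10_000, 100_000):
    print(n, round(naive_slope(n, rng), 3))
```

With these assumed values the naïve slope approaches -3 + 5 × 0.8 = 1 as n grows. More data makes the estimate more precise, but it converges on the wrong number, here even with the wrong sign.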


Page 6

Full model (the “truth”): test scores regressed on class size, socio-economic status, and teacher quality. Class size is the policy variable, meaning it is the input into a policy process and the one we care about getting right.

It is statistically significant. Is it practically significant? Recall that multi-billion dollar policy decisions are being based upon this estimate.

Class size and academic performance

$$\text{TestScore} = \beta_0 + \beta_1 \text{ClassSize} + \beta_2 \text{SES} + \beta_3 \text{TeachQuality} + \varepsilon$$

Page 7

Assume that Model 5 is the “full model”: it contains all of the relevant information in the world. Now we can see what happens if we omit important variables from the model. How does the slope of class size change? Why does the significance level change?

How do omitted variables affect regression results?

[Regression table: columns labeled “SES omitted”, “TQ omitted”, “SES & TQ omitted”, and “Full Model”, with a “Policy variable” callout.]

Page 8

How do omitted variables affect regression results?

[Regression table: columns labeled “SES omitted”, “TQ omitted”, “SES & TQ omitted”, and “Full Model”.]

Bias is the difference between the “truth” (Model 5 in this case) and what we would get if we ran a naïve regression (Model 1 here).

Note that the bias can be quite large.

$$\text{bias} = b_1 - \beta_1 = 4.376 - 2.893 = 1.483$$

$$\text{size of bias} = \frac{\text{bias}}{\beta_1} = \frac{1.483}{2.893} = 51\%$$

We overestimate the impact of our program by 51%!
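The arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch: the function name `relative_bias` is an illustrative choice, and the two class size slopes are entered as the magnitudes that appear on the slide (4.376 for the naïve Model 1, 2.893 for the full Model 5).

```python
def relative_bias(naive_slope, full_slope):
    """Return the bias and the bias as a share of the full-model slope."""
    bias = naive_slope - full_slope
    return bias, bias / full_slope

# Class size slopes: Model 1 (naive) and Model 5 (full model).
bias, share = relative_bias(4.376, 2.893)
print(f"bias = {bias:.3f}")           # bias = 1.483
print(f"size of bias = {share:.0%}")  # size of bias = 51%
```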

Page 9

Some examples
• Class size versus SES
• Institutions and Geography (Sachs vs. Rodrik)
• Is it drugs or environment that hurts developing babies?

Page 10

WHY DOES THIS HAPPEN?

Page 11

Omitted Variable Bias

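(1)
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$

(2)
$$Y = b_0 + b_1 X_1 + e_1$$

Decomposed:
$$X_2 = \alpha_0 + \alpha_1 X_1 + e_2$$

Substitute for X2:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 (\alpha_0 + \alpha_1 X_1 + e_2) + \varepsilon$$
$$Y = \beta_0 + \beta_2 \alpha_0 + \beta_1 X_1 + \beta_2 \alpha_1 X_1 + \beta_2 e_2 + \varepsilon$$
$$Y = (\beta_0 + \beta_2 \alpha_0) + (\beta_1 + \beta_2 \alpha_1) X_1 + (\beta_2 e_2 + \varepsilon)$$

Because of the equivalence of (1) and (2):
$$b_1 X_1 = (\beta_1 + \beta_2 \alpha_1) X_1$$
$$b_1 = \beta_1 + \beta_2 \alpha_1$$
$$b_1 = \beta_1 + \text{bias} \quad \text{OR} \quad \text{bias} = b_1 - \beta_1$$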
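The result of this derivation, that the naïve slope equals the full-model slope plus the slope on the omitted variable times the slope from regressing X2 on X1, can be checked numerically. The sketch below uses made-up data (the coefficients 2.0, 1.5, 4.0, and 0.5 are assumptions chosen for illustration) and writes the auxiliary slope as `alpha1`. The slopes come from the standard closed-form solution for one and two regressors.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

rng = random.Random(0)
n = 5_000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]   # X2 depends on X1
y = [2.0 + 1.5 * a + 4.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
s1y, s2y = cov(x1, y), cov(x2, y)

# (1) Full-model slopes from the two-regressor normal equations.
det = s11 * s22 - s12 ** 2
beta1 = (s22 * s1y - s12 * s2y) / det
beta2 = (s11 * s2y - s12 * s1y) / det

# Decomposition: slope from regressing X2 on X1.
alpha1 = s12 / s11

# (2) Naive slope from regressing Y on X1 alone.
b1 = s1y / s11

print(f"b1                   = {b1:.4f}")
print(f"beta1 + beta2*alpha1 = {beta1 + beta2 * alpha1:.4f}")
print(f"bias = b1 - beta1    = {b1 - beta1:.4f}")
```

The first two printed values agree up to floating-point error. For least-squares estimates the relationship is an algebraic identity in any sample, not only an approximation.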

Page 12

The slope does not change significantly as a result of adding a non-correlated control variable. As a result, omitting this variable would NOT bias the results. Adding the variable, however, increases precision of the estimates (the standard error decreases by a factor of seven).

How do omitted variables affect regression results?

[Path diagram: Test with teacher quality (TQ) and class size (CS).]
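The same pattern shows up in a simulation where the control variable is generated independently of the policy variable. This is a sketch with hypothetical numbers (the coefficients, sample size, and variable names `cs`, `tq`, and `score` are all assumptions), so the size of the change in the standard error will not match the table on the slide. The standard errors use the usual OLS formulas for one and two regressors.

```python
import math
import random

def center(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

rng = random.Random(42)
n = 2_000
cs = [rng.gauss(25, 5) for _ in range(n)]   # class size
tq = [rng.gauss(0, 1) for _ in range(n)]    # teacher quality, independent of cs
score = [700 - 3.0 * c + 40.0 * t + rng.gauss(0, 10) for c, t in zip(cs, tq)]

x1, x2, y = center(cs), center(tq), center(score)
s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
s1y, s2y = dot(x1, y), dot(x2, y)

# Naive model: score on class size only.
b1 = s1y / s11
rss_naive = sum((yi - b1 * a) ** 2 for yi, a in zip(y, x1))
se_naive = math.sqrt(rss_naive / (n - 2) / s11)

# Model with teacher quality added as a control.
det = s11 * s22 - s12 ** 2
beta1 = (s22 * s1y - s12 * s2y) / det
beta2 = (s11 * s2y - s12 * s1y) / det
rss_full = sum((yi - beta1 * a - beta2 * b) ** 2 for yi, a, b in zip(y, x1, x2))
se_full = math.sqrt(rss_full / (n - 3) * s22 / det)

print(f"class size slope without TQ: {b1:.3f} (SE {se_naive:.3f})")
print(f"class size slope with TQ:    {beta1:.3f} (SE {se_full:.3f})")
```

The two slopes should be close to each other, while the standard error should drop once teacher quality absorbs part of the residual variance.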

Page 13

Note from the correlation matrix on the right that teacher quality has very low correlation with classroom size and SES. As a result, there is almost no omitted variable bias when this variable is left out of the model. SES and class size are highly correlated, though, so omitting one of these variables has a large impact on the slope estimate for the other. Why is this?

The correlation of the independent variables affects omitted variable bias

Page 14

Calculation of bias: Case 1

Page 15

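$$b_1 = \beta_1 + \beta_2 \alpha_1$$

where $\alpha_1$ is the slope from regressing the omitted variable on the policy variable, and

$$\text{bias} = \beta_2 \alpha_1 = (5.65)(0.0099) = 0.056$$

or

$$\text{bias} = b_1 - \beta_1 = 0.433 - (0.377) = 0.056$$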

Calculation of bias: Case 1

Page 16

Calculation of bias: Case 2

[Figure: scatterplot matrix of test, ses, csize, and tqual. The panels show the pairwise correlations 0.13***, 0.14***, 0.98***, 0.99***, 0.035, and 0.025.]

Page 17

Case 1: Omitted variable correlated with regressors

In this case, the omitted variable X2 is correlated with the policy variable X1. There is shared covariance, represented by the region B. In the full model, this is the region that is discarded as part of the regression procedure.

The naïve slope, b1, and the full-model slope, β1, will now be different because of the exclusion of the region B. The naïve model will be biased as a result of omitting X2.

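[Venn diagram: circles Y, X1, and X2 with regions A, B, and C; B is the region that X1 shares with X2.]

$$\beta_1 = A$$
$$b_1 = A + B$$
$$\text{bias} = b_1 - \beta_1 = (A + B) - A = B$$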

Page 18

Case 2: Omitted variable uncorrelated with regressors

In this case, the omitted variable X2 is uncorrelated with the policy variable X1. There is no overlap in the Venn Diagram.

Since the naïve slope, b1, and the full-model slope, β1, are the same, there is no bias that results from omitting X2.

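[Venn diagram: circles Y, X1, and X2 with regions A and C; X1 and X2 do not overlap.]

$$\beta_1 = A$$
$$b_1 = A$$
$$\text{bias} = b_1 - \beta_1 = 0$$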

Page 19

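Path diagram: [X1 and X2 each point to Y, with slopes β1 and β2; X1 and X2 are linked by α1, the slope from regressing X2 on X1.]

$$\text{bias} = \beta_2 \alpha_1$$

[Venn diagram: circles Y, X1, and X2 with regions A, B, and C; B is the region that X1 shares with X2.]

$$\beta_1 = A$$
$$b_1 = A + B$$
$$\text{bias} = b_1 - \beta_1 = (A + B) - A = B$$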

Case 1: Omitted Variable Correlated with Regressors

Page 20

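Path diagram: [X1 and X2 each point to Y, with slopes β1 and β2; the link between X1 and X2 is α1 = 0, where α1 is the slope from regressing X2 on X1.]

$$\text{bias} = \beta_2 \alpha_1 = \beta_2 \cdot 0 = 0$$

[Venn diagram: circles Y, X1, and X2 with regions A and C; X1 and X2 do not overlap.]

$$\beta_1 = A$$
$$b_1 = A$$
$$\text{bias} = b_1 - \beta_1 = 0$$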

Case 2: Omitted Variable Uncorrelated with Regressors

Page 21

EXAMPLE: OVB

Page 22

Table 1.

Graduate Grade Point Averages Related to Criteria Used When Making Admission Decisions (N = 30).

          Zero-Order r                            β       sr2     b
Variable  AR      MAT     GREV    GREQ    GPA
GREQ                                      .611*   .32*    .07     .0040
GREV                              .468*   .581*   .21     .03     .0015
MAT                       .426*   .267    .604*   .32*    .07     .0209
AR                .525*   .405*   .508*   .621*   .20     .02     .1442
Mean      3.57    67.00   575.3   565.3   3.31
SD        0.84    9.25    83.0    48.6    0.60

Intercept = -1.738

R2 = .64*

*p < .05

Class Example

Page 23

True Model:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$

What happens when we omit X2?

$$Y = b_0 + b_1 X_1 + e_1$$

Page 24

Calculations

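Omit MAT

Here $\alpha_1$ is the slope from regressing the omitted variable X2 on X1.

$$b_1 = \beta_1 + \beta_2 \alpha_1$$
$$0.008 = \beta_1 + (0.031)(0.051)$$
$$0.008 = \beta_1 + 0.002$$
$$\beta_1 = 0.006$$

$$\text{bias} = \beta_2 \alpha_1 = (0.031)(0.051) = 0.002$$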
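These numbers can be recovered from the correlations and standard deviations in Table 1. The sketch below assumes that Y is GPA, X1 is GREQ, and the omitted X2 is MAT; that reading is an inference from the fact that it reproduces the values on this slide, not something the slide states. It uses the standard formulas that turn correlations and standard deviations into slopes for one and two predictors, and writes the auxiliary slope as `alpha1`.

```python
# Zero-order correlations and standard deviations read from Table 1.
r_gpa_greq = 0.611
r_gpa_mat = 0.604
r_greq_mat = 0.267
sd_gpa, sd_greq, sd_mat = 0.60, 48.6, 9.25

# Naive model: GPA on GREQ only.
b1 = r_gpa_greq * sd_gpa / sd_greq

# Two-predictor model: GPA on GREQ and MAT.
denom = 1 - r_greq_mat ** 2
beta1 = (r_gpa_greq - r_gpa_mat * r_greq_mat) / denom * sd_gpa / sd_greq
beta2 = (r_gpa_mat - r_gpa_greq * r_greq_mat) / denom * sd_gpa / sd_mat

# Auxiliary regression: MAT on GREQ.
alpha1 = r_greq_mat * sd_mat / sd_greq

bias = beta2 * alpha1

print(f"b1     = {b1:.3f}")      # 0.008
print(f"beta1  = {beta1:.3f}")   # 0.006
print(f"beta2  = {beta2:.3f}")   # 0.031
print(f"alpha1 = {alpha1:.3f}")  # 0.051
print(f"bias   = {bias:.3f}")    # 0.002
```

Before rounding, `beta1 + bias` equals `b1`, which is the identity b1 = β1 + β2α1 applied to this example.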

Page 25

BACK TO THE SIMULATIONS

Page 26

Omitted Variable Bias
