
# Chapter 9: Assessing Studies Based on Multiple Regression

Aug 06, 2020


• Chapter 9: Assessing Studies Based on Multiple Regression

• Outline

1. Internal and External Validity

2. Threats to Internal Validity

a) Omitted variable bias

b) Functional form misspecification


c) Errors-in-variables bias

d) Missing data and sample selection bias

e) Simultaneous causality bias

3. Application to Test Scores

• Internal and External Validity

• Is there a systematic way to assess (critique) regression studies? We know the strengths of multiple regression – but what are the pitfalls?

– We will list the most common reasons that multiple regression estimates, based on observational data, can result in biased estimates of the causal effect of interest.


– In the test score application, we will try to address these threats and assess which threats remain. In the end, what have we learned about the effect of class size reduction on test scores?

• A Framework for Assessing Statistical Studies: Internal and External Validity

• Internal validity: the statistical inferences about causal effects are valid for the population being studied.

• External validity: the statistical inferences can be generalized to other populations and “settings” (legal, political, institutional, social, physical, demographic variations)

• Threats to External Validity

Assessing threats to external validity requires detailed knowledge and judgment on a case-by-case basis.

How do results about test scores in California generalize?

– Differences in populations

• California in 2011?


• Massachusetts in 2011?

• Mexico in 2011?

– Differences in settings

• different legal requirements (e.g. special education)

• different treatment of bilingual education

– Differences in teacher characteristics

• Threats to Internal Validity of Multiple Regression Analysis

Internal validity: the statistical inferences about causal effects are valid for the population being studied.

Five threats to the internal validity of regression studies:

– Omitted variable bias

– Wrong functional form


– Errors-in-variables bias

– Sample selection bias

– Simultaneous causality bias

All imply that E(ui|X1i,…,Xki) ≠ 0 (or that conditional mean independence fails) – making OLS biased and inconsistent

• 1. Omitted variable bias

Omitted variable bias arises if an omitted variable is both:

I. a determinant of Y

II. correlated with at least one regressor

If the multiple regression includes control variables, we still need to ask whether there are OVs that are not adequately controlled for.

The concern remains that the error term is correlated with the variable of interest even after including control variables.

• Solutions to omitted variable bias

1. If the omitted causal variable can be measured, include it as an additional regressor in multiple regression;

2. If you have data on one or more controls and they are adequate (in the sense of conditional mean independence plausibly holding) then include the control variables;

3. Possibly, use panel data in which each entity (individual) is observed more than once (to be studied later);


4. If the omitted variable(s) cannot be measured, use instrumental variables regression (to be studied later);

5. Run a randomized controlled experiment.

– Remember, if X is randomly assigned, then X necessarily will be distributed independently of u; thus E(u|X = x) = 0.
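The logic of solutions 1 and 2 is easy to see in a small simulation. The sketch below uses hypothetical numbers (variable names and coefficients are invented for illustration): an unobserved “ability” variable raises both the regressor X and the outcome Y, so the short regression that omits it is biased upward, while including ability as a control recovers the true coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: "ability" (omitted variable)
# is a determinant of Y and is correlated with the regressor x.
ability = rng.normal(size=n)
x = 0.8 * ability + rng.normal(size=n)
y = 1.0 * x + 2.0 * ability + rng.normal(size=n)  # true causal effect of x is 1.0

# Short regression (omits ability): slope = cov(x, y) / var(x)
b_short = np.cov(x, y)[0, 1] / np.var(x)

# Long regression including ability as a control (OLS via least squares)
X_long = np.column_stack([np.ones(n), x, ability])
b_long = np.linalg.lstsq(X_long, y, rcond=None)[0][1]

print(b_short, b_long)  # b_short is biased well above 1; b_long is close to 1
```

Both conditions for OVB hold here by construction; dropping either one (set the 2.0 to 0, or the 0.8 to 0) makes `b_short` unbiased again.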

• 2. Misspecified/Wrong functional form

Arises if the functional form is incorrect – for example, an interaction or polynomial term is omitted. The omitted term then becomes part of the error term, causing correlation between the error and a regressor and biasing the OLS estimates.

Solutions to functional form misspecification


1. If the dependent variable is continuous: Use the “appropriate” nonlinear specifications in X (logarithms, interactions, etc.) … scatter plots are suggestive

2. If the dependent variable is discrete (e.g., binary): Need an extension of multiple regression methods (“probit” or “logit” analysis for binary dependent variables) (to be studied later)
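A quick simulated example (hypothetical coefficients) shows the mechanism: when the true relationship is quadratic but a linear model is fit, the omitted x² term is absorbed into the error, which is then correlated with x, and the linear slope is biased. Adding the quadratic term fixes it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# True model is quadratic: Y = 1.0*x + 0.5*x^2 + u
x = rng.uniform(0, 4, size=n)
y = 1.0 * x + 0.5 * x**2 + rng.normal(size=n)

# Misspecified linear fit: the omitted 0.5*x^2 lands in the error term,
# which is correlated with x, so the slope is biased away from 1.0.
b_lin = np.polyfit(x, y, 1)[0]

# Correctly specified quadratic fit recovers both coefficients.
b2, b1, b0 = np.polyfit(x, y, 2)

print(b_lin, b1, b2)  # linear slope is far from 1.0; b1 ≈ 1.0, b2 ≈ 0.5
```

This is why scatter plots are suggestive: curvature in the plot of Y against X is a visible symptom of the omitted nonlinear term.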

• 3. Errors-in-variables bias

So far we have assumed that X is measured without error.

In reality, economic data often have measurement error

– Data entry errors

– Recollection errors in surveys (When did you start your current job?)

– Ambiguous questions (What was your income last year?)

– Dishonest responses to surveys (What is the value of your financial assets? How often do you drink and drive?)

• Errors-in-variables bias, ctd.

In general, measurement error in a regressor results in “errors-in-variables” bias.

A bit of math shows that errors-in-variables typically leads to correlation between the measured variable and the regression error. Consider the single-regressor model:

Yi = β0 + β1Xi + ui

and suppose E(ui|Xi) = 0. Let

Xi = true value of X (unobserved)

X̃i = mis-measured version of X (observed)

• Then

Yi = β0 + β1Xi + ui

= β0 + β1X̃i + [β1(Xi – X̃i) + ui]

So the regression you run is,

Yi = β0 + β1X̃i + ũi, where ũi = β1(Xi – X̃i) + ui

Typically X̃i is correlated with ũi so β̂1 is biased:

cov(X̃i, ũi) = cov(X̃i, β1(Xi – X̃i) + ui)

= β1cov(X̃i, Xi – X̃i) + cov(X̃i, ui)

It is often plausible that cov(X̃i, ui) = 0 (if E(ui|Xi) = 0 then cov(X̃i, ui) = 0 if the measurement error in X̃i is uncorrelated with ui). But typically cov(X̃i, Xi – X̃i) ≠ 0…

• Errors-in-variables bias, ctd.

Yi = β0 + β1X̃i + ũi, where ũi = β1(Xi – X̃i) + ui

cov(X̃i, ũi) = β1cov(X̃i, Xi – X̃i) if cov(X̃i, ui) = 0

To get some intuition for the problem, consider two special cases:

A. Classical measurement error

B. “Best guess” measurement error

• A. Classical measurement error

The classical measurement error model assumes that

X̃i = Xi + vi,

where vi is mean-zero random noise with corr(Xi, vi) = 0 and corr(ui, vi) = 0.

Under the classical measurement error model, β̂1 is biased towards zero. Intuition: Suppose you add to the true variable X a huge amount of random noise to create X̃. Then X̃ will be virtually uncorrelated with Yi (and with everything else), and the OLS estimate will have expectation zero (recall the estimate is a ratio, with numerator = cov(Y, X̃) in the case of a single regressor). If you add just a bit of noise, you still dilute the correlation with Y and lower the OLS estimate toward 0.

• Classical measurement error: the math

X̃i = Xi + vi, where corr(Xi, vi) = 0 and corr(ui, vi) = 0.

Then

var(X̃i) = σ²X + σ²v

cov(X̃i, Xi – X̃i) = cov(Xi + vi, –vi) = –σ²v

so

cov(X̃i, ũi) = –β1σ²v
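Dividing this covariance through by var(X̃i) gives the standard attenuation result: in large samples the OLS slope converges to β1·σ²X/(σ²X + σ²v), which shrinks toward zero as the noise variance σ²v grows. The simulation below (hypothetical parameter values) checks this: with σ²X = σ²v, the estimate should be about half the true β1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
sigma_x, sigma_v, beta1 = 1.0, 1.0, 2.0

# True regressor and outcome
x_true = rng.normal(scale=sigma_x, size=n)
y = beta1 * x_true + rng.normal(size=n)

# Observed regressor with classical measurement error: X~ = X + v
x_obs = x_true + rng.normal(scale=sigma_v, size=n)

# OLS slope from the mismeasured regressor
b_hat = np.cov(x_obs, y)[0, 1] / np.var(x_obs)

# Attenuation factor sigma_x^2 / (sigma_x^2 + sigma_v^2) = 0.5 here
attenuation = sigma_x**2 / (sigma_x**2 + sigma_v**2)

print(b_hat, beta1 * attenuation)  # b_hat is close to 2.0 * 0.5 = 1.0
```

Note that the bias does not vanish as n grows: measurement error makes OLS inconsistent, not merely noisy.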
