Chapter 9: Assessing Studies Based on Multiple Regression – Threats to Internal Validity of Multiple Regression Analysis

Aug 06, 2020


  • Chapter 9: Assessing Studies Based on Multiple Regression

    Copyright © 2011 Pearson Addison-Wesley. All rights reserved. 9-1

  • Outline

    1. Internal and External Validity

    2. Threats to Internal Validity

    a) Omitted variable bias

    b) Functional form misspecification


    c) Errors-in-variables bias

    d) Missing data and sample selection bias

    e) Simultaneous causality bias

    3. Application to Test Scores

  • Internal and External Validity

    • Is there a systematic way to assess (critique) regression studies? We know the strengths of multiple regression – but what are the pitfalls?

    – We will list the most common reasons that multiple regression estimates, based on observational data, can result in biased estimates of the causal effect of interest.


    – In the test score application, we will try to address these threats – and assess what threats remain. After all, what have we learned about the effect of class size reduction on test scores?

  • A Framework for Assessing Statistical Studies: Internal and External Validity

    • Internal validity: the statistical inferences about causal effects are valid for the population being studied.

    • External validity: the statistical inferences can be generalized to other populations and “settings” (legal, political, institutional, social, physical, demographic variations).

  • Threats to External Validity

    Assessing threats to external validity requires detailed knowledge and judgment on a case-by-case basis.

    How do results about test scores in California generalize?

    – Differences in populations

    • California in 2011?


    • Massachusetts in 2011?

    • Mexico in 2011?

    – Differences in settings

    • different legal requirements (e.g. special education)

    • different treatment of bilingual education

    – Differences in teacher characteristics

  • Threats to Internal Validity of Multiple Regression Analysis

    Internal validity: the statistical inferences about causal effects are valid for the population being studied.

    Five threats to the internal validity of regression studies:

    – Omitted variable bias

    – Wrong functional form


    – Errors-in-variables bias

    – Sample selection bias

    – Simultaneous causality bias

    All imply that E(ui|X1i,…,Xki) ≠ 0 (or that conditional mean independence fails) – making OLS biased and inconsistent.

  • 1. Omitted variable bias

    Omitted variable bias arises if an omitted variable is both:

    I. a determinant of Y

    II. correlated with at least one regressor

    If the multiple regression includes control variables, we still need to ask whether there are omitted variables that are not adequately controlled for.

    The concern remains that the error term is correlated with the variable of interest even after including control variables.

  • Solutions to omitted variable bias

    1. If the omitted causal variable can be measured, include it as an additional regressor in multiple regression;

    2. If you have data on one or more controls and they are adequate (in the sense of conditional mean independence plausibly holding) then include the control variables;

    3. Possibly, use panel data in which each entity (individual) is observed more than once (to be studied later);


    4. If the omitted variable(s) cannot be measured, use instrumental variables regression (to be studied later);

    5. Run a randomized controlled experiment.

    – Remember, if X is randomly assigned, then X necessarily will be distributed independently of u; thus E(u|X = x) = 0.
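    A small simulation can make the omitted-variable problem concrete. The sketch below is illustrative (the variable names and parameter values are invented, not from the text): an omitted variable W both determines Y and is correlated with X, so the short regression omitting W is biased, while controlling for W recovers the causal coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: W satisfies both OVB conditions –
# it determines Y and is correlated with the regressor X.
W = rng.normal(size=n)
X = 0.8 * W + rng.normal(size=n)                   # corr(X, W) != 0
Y = 1.0 + 2.0 * X + 3.0 * W + rng.normal(size=n)   # true beta_1 = 2

def ols_slope(y, x):
    """OLS slope estimate: cov(y, x) / var(x)."""
    return np.cov(y, x)[0, 1] / np.var(x, ddof=1)

# Short regression (W omitted): biased estimate of beta_1.
b_short = ols_slope(Y, X)

# Long regression (W included as a control), via partialling out:
# residualize X with respect to W, then regress Y on the residual.
X_resid = X - ols_slope(X, W) * W
b_long = ols_slope(Y, X_resid)

print(f"omitting W:  {b_short:.2f}")   # well above 2 (upward bias)
print(f"including W: {b_long:.2f}")    # close to the true value 2
```

    The bias in the short regression equals β_W·cov(W, X)/var(X), which is positive here, so the short-regression slope overstates the effect of X.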

  • 2. Misspecified/Wrong functional form

    Arises if the functional form is incorrect – for example, an interaction or polynomial term is omitted. The omitted term then becomes part of the error term, causing correlation between the error and the regressor and biasing the OLS estimates.

    Solutions to functional form misspecification


    1. If dependent variable is continuous: Use the “appropriate” nonlinear specifications in X (logarithms, interactions, etc.) … scatter plots are suggestive

    2. If the dependent variable is discrete (e.g. binary): need an extension of multiple regression methods (“probit” or “logit” analysis for binary dependent variables) (to be studied later)
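    To see how misleading a wrong functional form can be, consider a short simulation (the data-generating process below is hypothetical): Y depends on X through a quadratic, and the straight-line fit reports a slope near zero even though the true marginal effect ranges from +2 to –2 over the sample.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical DGP with a quadratic term in X.
X = rng.uniform(0, 4, size=n)
Y = 1.0 + 2.0 * X - 0.5 * X**2 + rng.normal(size=n)

# Wrong functional form: straight line only. The omitted X^2 term
# becomes part of the error, which is then correlated with X.
b_linear = np.polyfit(X, Y, deg=1)[0]

# Correct specification: include the quadratic term.
b2, b1, b0 = np.polyfit(X, Y, deg=2)   # highest degree first

print(f"linear slope:  {b_linear:.2f}")            # near 0 here
print(f"quadratic fit: b1={b1:.2f}, b2={b2:.2f}")  # near 2 and -0.5
```

    A scatter plot of (X, Y) would show the curvature immediately – which is why the slide suggests scatter plots as a diagnostic.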

  • 3. Errors-in-variables bias

    So far we have assumed that X is measured without error.

    In reality, economic data often have measurement error:

    – Data entry errors


    – Recollection errors in surveys (When did you start your current job?)

    – Ambiguous questions (What was your income last year?)

    – Dishonest responses to surveys (What is the value of your financial assets? How often do you drink and drive?)

  • Errors-in-variables bias, ctd.

    In general, measurement error in a regressor results in “errors-in-variables” bias.

    A bit of math shows that errors-in-variables typically leads to correlation between the measured variable and the regression error. Consider the single-regressor model:

    Yi = β0 + β1Xi + ui

    and suppose E(ui|Xi) = 0. Let

    Xi = unmeasured true value of X (unobserved)

    X̃i = mis-measured version of X (observed)

  • Then

    Yi = β0 + β1Xi + ui

    = β0 + β1X̃i + [β1(Xi – X̃i) + ui]

    So the regression you run is,

    Yi = β0 + β1X̃i + ũi, where ũi = β1(Xi – X̃i) + ui

    Typically X̃i is correlated with ũi, so β̂1 is biased:

    cov(X̃i, ũi) = cov(X̃i, β1(Xi – X̃i) + ui)

    = β1cov(X̃i, Xi – X̃i) + cov(X̃i, ui)

    It is often plausible that cov(X̃i, ui) = 0 (if E(ui|Xi) = 0 then cov(X̃i, ui) = 0 if the measurement error in X̃i is uncorrelated with ui). But typically cov(X̃i, Xi – X̃i) ≠ 0…
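    The covariance algebra above can be checked numerically. In the sketch below (all names and values are illustrative), we draw a true X, add measurement noise to get the observed X̃, construct ũ = β1(X – X̃) + u, and confirm that cov(X̃, ũ) matches β1·cov(X̃, X – X̃).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
beta1 = 2.0

X = rng.normal(size=n)                 # true regressor
v = rng.normal(size=n)                 # measurement error, independent of X and u
u = rng.normal(size=n)                 # regression error, E(u|X) = 0
X_tilde = X + v                        # observed, mis-measured X
u_tilde = beta1 * (X - X_tilde) + u    # error term of the feasible regression

def cov(a, b):
    return np.cov(a, b)[0, 1]

lhs = cov(X_tilde, u_tilde)
rhs = beta1 * cov(X_tilde, X - X_tilde)   # cov(X_tilde, u) ~ 0 drops out

print(f"cov(X_tilde, u_tilde)             = {lhs:.3f}")
print(f"beta1 * cov(X_tilde, X - X_tilde) = {rhs:.3f}")  # both near -2
```

    The nonzero covariance between the regressor actually used (X̃) and the error of the regression actually run (ũ) is exactly what makes OLS biased here.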

  • Errors-in-variables bias, ctd.

    Yi = β0 + β1X̃i + ũi, where ũi = β1(Xi – X̃i) + ui

    cov(X̃i, ũi) = β1cov(X̃i, Xi – X̃i) if cov(X̃i, ui) = 0

    To get some intuition for the problem, consider two special cases:

    A. Classical measurement error

    B. “Best guess” measurement error

  • A. Classical measurement error

    The classical measurement error model assumes that

    X̃i = Xi + vi,

    where vi is mean-zero random noise with corr(Xi, vi) = 0 and corr(ui, vi) = 0.

    Under the classical measurement error model, β̂1 is biased towards zero. Intuition: suppose you add a huge amount of random noise to the true variable X to create X̃. Then X̃ will be virtually uncorrelated with Yi (and with everything else), and the OLS estimate will have expectation zero (recall the estimate is a ratio, with numerator cov(Y, X̃) in the single-regressor case). If you add just a bit of noise, you still dilute the correlation with Y and pull the OLS estimate toward 0.

  • Classical measurement error: the math

    X̃i = Xi + vi, where corr(Xi, vi) = 0 and corr(ui, vi) = 0.

    Then var(X̃i) = σX² + σv²

    cov(X̃i, Xi – X̃i) = cov(Xi + vi, –vi) = –σv²

    so

    cov(X̃i, ũi) = –β1σv²
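    Combining the last two results gives the standard attenuation formula: under classical measurement error, the OLS slope converges to β1·σX²/(σX² + σv²), always smaller in magnitude than β1. A quick simulation (parameter values chosen for illustration, with σX² = σv² so the attenuation factor is 1/2) confirms it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta1 = 2.0
sigma_X2, sigma_v2 = 1.0, 1.0   # equal variances: attenuation factor = 1/2

X = rng.normal(scale=np.sqrt(sigma_X2), size=n)   # true regressor
v = rng.normal(scale=np.sqrt(sigma_v2), size=n)   # classical measurement noise
X_tilde = X + v                                   # what we actually observe
Y = beta1 * X + rng.normal(size=n)

# OLS of Y on the mismeasured regressor.
slope = np.cov(Y, X_tilde)[0, 1] / np.var(X_tilde, ddof=1)
attenuation = sigma_X2 / (sigma_X2 + sigma_v2)

print(f"OLS on mismeasured X: {slope:.2f}")              # ~1.0, not 2.0
print(f"beta1 * attenuation:  {beta1 * attenuation:.2f}")
```

    The estimate is biased toward zero by exactly the predicted factor; adding more noise (larger σv²) would shrink it further.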