Evidence-based Methods for improving Evidence-based Policy Thomas D. Cook, Northwestern University and Mathematica Policy Research, Inc. Funded by NSF Grant DRL-1228866

Dec 28, 2015



Evidence-based Methods for improving Evidence-based Policy

Thomas D. Cook, Northwestern University and Mathematica Policy Research, Inc.

Funded by NSF Grant DRL-1228866


My Confessions

• 1. I am a randomista. Nothing I say challenges that RCTs are best for testing causal hypotheses because they control for all potential biases -- observed and unobserved.

• 2. However, I am a conditional randomista. I believe there is strong empirical evidence that certain kinds of non-experiments provide acceptable causal answers. I am therefore not afraid of non-experiments.


Why are they Acceptable Causal Estimates?

• Because the methods generating them have often produced estimates similar to those of RCTs in rigorous tests.

• Why is this important?


Hierarchy of Accepted Causal Methods in Clearinghouses

• Coalition for Evidence-Based Policy

• What Works Clearinghouse – Education

• Blueprints – historically about crime and violence prevention

• Cochrane Collaboration in Medicine

• Many other inventories of effective practices from NGOs or government agencies

• All agree on the “best” single method, but they disagree about the acceptability of other methods.


One Gain?

• We assume most social effects are co-conditioned by many extraneous factors that cannot be held constant in the worlds of application as one might do in a lab

• E.g., designer effects, social context effects, temporal comparison effects

• We usually need many studies to probe causal robustness/conditionality.

• Nice to have more than just RCTs if we trust ‘em


The Rigorous Method: Within Study Comparisons aka Design Experiments


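WSC Design: Three-Arm Study

[Diagram: units from the overall population are sampled/selected into a randomized experiment, where they are randomly assigned to a treatment group or a control group, or into a non-randomized comparison group. The ITT estimate of the observational study (OS) is compared with the ITT estimate of the RCT.]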


WSC Design: Four-Arm Study

[Diagram: the population is randomly assigned to a randomized experiment or to an observational study. Each arm contains a treatment group and a control group. Question: do the two arms yield the same ATE?]


Conditions for a good WSC

• A well-implemented RCT, with minimal sampling error

• No third-variable confounds – such as from measurement

• Comparable estimands – RD and RCT

• Blinding to the RCT or adjusted QE results

• Defensible criterion for correspondence of RCT and adjusted QE results
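
One way to make the correspondence criterion concrete is to express the gap between the adjusted QE estimate and the RCT benchmark in standard deviation units. The sketch below is illustrative only: the function names, the use of the RCT control group's SD as the scaling unit, and the toy numbers are all assumptions, not taken from any study discussed here.

```python
from statistics import mean, stdev


def difference_in_means(treated, control):
    """Simple effect estimate: treated mean minus control mean."""
    return mean(treated) - mean(control)


def wsc_bias(rct_treated, rct_control, qe_estimate):
    """Return (raw_bias, standardized_bias) of a QE estimate.

    The RCT difference in means serves as the causal benchmark.
    Scaling by the RCT control group's SD is an assumed convention.
    """
    benchmark = difference_in_means(rct_treated, rct_control)
    raw_bias = qe_estimate - benchmark
    return raw_bias, raw_bias / stdev(rct_control)


# Toy numbers (hypothetical): the benchmark effect is 14 - 10 = 4.
raw, standardized = wsc_bias([12, 14, 16], [8, 10, 12], qe_estimate=5.0)
print(raw, standardized)
```

A correspondence rule could then be stated in advance, for example as a maximum tolerated standardized bias, before anyone sees the RCT or QE results.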


Limitations of WSCs

• Can only be done on topics where an RCT is possible

• No reason to believe that a given design will always replicate experimental findings; our more modest goal is to identify designs that often replicate findings.

• This is inductive and requires a large sample of WSCs. This talk is not the final word. A more definitive word requires more WSCs.


PURPOSES

• We will now compare RCT to non-RCT results using this method, differentiating among non-RCT designs:

• Regression-Discontinuity (RD), especially comparative RD (CRD)

• Interrupted Time Series (ITS), especially comparative ITS (CITS)

• Non-Equivalent Control Group Designs without RD or ITS feature


Comparing RCT to Regression Discontinuity (RD) and CRD

RESULTS


RDD Visual Depiction

[Figure, built up over three slides: comparison and treatment groups on either side of the cutoff, the counterfactual regression line, and the discontinuity, or treatment effect.]


Three Major Limitations of RDD

• Functional form assumptions – see green line

• Causal generalization, LATE at cutoff

• Statistical power relative to RCTs – about 3 times lower
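
To see where the functional form assumption enters, here is a minimal sketch of a sharp RD estimate: a straight line is fitted on each side of the cutoff within a bandwidth, and the effect is the gap between the two fitted values at the cutoff. Everything in it is an assumption made for illustration: the helper names, the rule that units at or above the cutoff are treated, the linear form, the bandwidth, and the noise-free toy data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rd_estimate(assignment, outcome, cutoff, bandwidth):
    """Jump in the fitted outcome at the cutoff (treated minus untreated side)."""
    # Center the assignment variable so each intercept is the fit at the cutoff.
    below = [(a - cutoff, y) for a, y in zip(assignment, outcome)
             if cutoff - bandwidth <= a < cutoff]
    above = [(a - cutoff, y) for a, y in zip(assignment, outcome)
             if cutoff <= a <= cutoff + bandwidth]
    intercept_below, _ = fit_line(*zip(*below))
    intercept_above, _ = fit_line(*zip(*above))
    return intercept_above - intercept_below


# Hypothetical noise-free data with a true jump of 2 at the cutoff.
assignment = [i / 10 for i in range(-50, 51)]
outcome = [1.0 + 0.5 * a + (2.0 if a >= 0 else 0.0) for a in assignment]
rd_effect = sharp_rd_estimate(assignment, outcome, cutoff=0.0, bandwidth=2.0)
print(round(rd_effect, 6))
```

The estimate applies only at the cutoff, and only the observations inside the bandwidth contribute to it, which is one way to see the generalization and power limitations listed above.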


WSC Results for RD

• There have now been 10 WSCs and all report a similar finding – generally comparable effect estimates at the cutoff – but there is no meta-analysis

• Could be considered: (i) theoretically trivial; (ii) a traditional empirical test of a statistical hypothesis; or (iii) as we prefer, a test of the robustness of RD in practice – it is generally good enough in real research practice despite sampling variability in both the RD and RCT


Addressing 3 Major Limitations of RD by Adding a Comparison Function

• The no-treatment comparison regression function can be one or several pretest time points or a non-equivalent comparison group

• Functional form estimation in RD?

• Causal generalization?

• Statistical power?


[Figure: posttest regression function and pretest regression function.]


Example 1: Effects of Head Start – Tang & Cook (2014)

• Random selection of HS centers (89% agree) followed by random assignment of 3-year-olds within centers

• Outcomes = math, literacy; social behavior

• CRD-Pre has pretest as no-treatment regression function

• CRD-CG has non-equivalent group of 4-year-olds from same locations
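
The logic of CRD-Pre can be sketched in a few lines. The pretest gives a no-treatment regression function over the whole range of the assignment variable. If the untreated posttest function is parallel to it, the shift estimated on the untreated side can be carried across the cutoff to build a counterfactual for every treated unit. This is a simplified illustration, not the estimator used in the studies cited: the function names, the linear and parallel forms, the rule that units at or above the cutoff are treated, and the toy data are all assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def crd_pre_effect(assignment, pretest, posttest, cutoff):
    """Average effect over all treated units, not only those at the cutoff.

    Assumes the untreated posttest function equals the pretest function
    plus a constant shift (parallel functional forms).
    """
    intercept, slope = fit_line(assignment, pretest)
    untreated = [(a, y) for a, y in zip(assignment, posttest) if a < cutoff]
    treated = [(a, y) for a, y in zip(assignment, posttest) if a >= cutoff]
    # Vertical shift between posttest and pretest functions, untreated side only.
    shift = sum(y - (intercept + slope * a) for a, y in untreated) / len(untreated)
    # Gap between observed posttest and the shifted pretest function.
    gaps = [y - (intercept + slope * a + shift) for a, y in treated]
    return sum(gaps) / len(gaps)


# Hypothetical noise-free data: shift of 0.5, true effect of 2 above the cutoff.
assignment = [i / 10 for i in range(-50, 51)]
pretest = [1.0 + 0.5 * a for a in assignment]
posttest = [1.5 + 0.5 * a + (2.0 if a >= 0 else 0.0) for a in assignment]
crd_effect = crd_pre_effect(assignment, pretest, posttest, cutoff=0.0)
print(round(crd_effect, 6))
```

Because the pretest function uses the full sample, the comparison function adds information away from the cutoff; whether the parallel-forms assumption holds is an empirical matter in each application.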


Summary: CRD-Pre above the cutoff


Summary: CRD-CG above the cutoff


Example 2: Wing & Cook (2013)

RCT = professionals or family making decisions about services for the disabled.

• Here we examine how much of the allotment was spent

• Assignment variable = age (35, 50 and 70)

• Comparison = payments made before the RCT began (pretest measure of the outcome)

• No CRD-CG


Comparative RD Results in Standardized Difference from RCT away from Cutoff: Non-Parametric Analysis

State   Cutoff age = 35   50     70

AK      .07               .05    .04

NJ      .19               .13    .12

FL      -.09              -.08   -.04


CRD

• 3 studies to date using WSC methods

• All 3 include pretest as comparison regression function

• 2 include non-equivalent comparison group function – e.g., 4-year-olds for 3-year-olds in Head Start

• All show plausibly parallel functional forms

• All show much smaller SEs than simple RD and close to RCT

• All show unbiased causal inference at cutoff and also away from it (highest = .13 SDs)


Summary for RD vs RCT

• Little doubt from 10 studies that unbiased causal inference results at the cutoff in actual research

• Empirical reason from 3 studies to believe that generic limitations of RD can be mitigated by adding a reliable RDD function from pretest or from non-equivalent control group

• We should call for more CRD designs in the future; their results are particularly close to those of RCTs in terms of bias and precision. Yet CRD is not a category in any compendium.


Comparison of RCT and Interrupted Time Series (ITS)

Results


Interrupted Time Series Can Provide Strong Evidence for Causal Effects

• Clear Intervention Time Point

• Huge and Immediate Effect

• Clear Pretest Functional Form + many Observations

• No Alternative at Intervention Can Explain Change


Limitations of Simple One-Group ITS

• History alternative explanations around the intervention point

• Functional form extrapolation needed

• Analysis has to account for correlated errors (we will not deal with this issue here)

• First two points suggest the advisability of a comparative ITS
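
A minimal sketch of why a comparison series helps: each group's pre-intervention trend is projected forward, and the treated group's departure from its projection is compared with the comparison group's departure. Anything that hits both groups at the intervention point cancels. The function names, the linear pre-trend, the single post-intervention time point, and the toy series are assumptions for illustration, and the sketch ignores correlated errors.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def deviation_from_pre_trend(times, values, intervention_time):
    """Mean post-intervention gap between observed values and the projected pre-trend."""
    pre = [(t, v) for t, v in zip(times, values) if t < intervention_time]
    post = [(t, v) for t, v in zip(times, values) if t >= intervention_time]
    intercept, slope = fit_line(*zip(*pre))
    return sum(v - (intercept + slope * t) for t, v in post) / len(post)


def cits_effect(times, treated, comparison, intervention_time):
    """Treated-group deviation minus comparison-group deviation."""
    return (deviation_from_pre_trend(times, treated, intervention_time)
            - deviation_from_pre_trend(times, comparison, intervention_time))


# Hypothetical series: 6 pre-test time points, 1 post-test time point.
times = [1, 2, 3, 4, 5, 6, 7]
shock = 0.3  # a history event that hits both groups at the intervention
treated = [0.10 * t + (shock + 0.2 if t >= 7 else 0.0) for t in times]
comparison = [0.05 * t + (shock if t >= 7 else 0.0) for t in times]

one_group = deviation_from_pre_trend(times, treated, intervention_time=7)
effect = cits_effect(times, treated, comparison, intervention_time=7)
print(round(one_group, 6), round(effect, 6))
```

In this toy case the one-group ITS absorbs the shared shock into its estimate, while the comparative version removes it, even though the two groups have different pre-treatment slopes.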


[Figure: Hypothetical NCLB effects on public (red) versus private schools (blue). Y-axis: NAEP test score (ticks at 200 and 208); x-axis: time; the intervention point is labeled NCLB.]


What if Everyone in Canada Flushed at the Same Time *


WSC and CITS

• Four studies in medicine, three in education

• All claim similar causal inferences

• No meta-analysis to date

• No analysis of file drawer problem


St. Clair, Cook, & Hallberg (In Press)

• RCT: Study of Indiana’s system for feedback on student performance (schools as unit of assignment)

• Comparative ITS comparison groups

– Basically all schools in the state

– Matched schools in the state


Math (All schools)

[Figure: math score (SD units, axis from -.6 to .4) by year (1 to 7) for treatment schools and for all other schools in the state.]


Math: WSC Results

[Figure: bias (axis from -.6 to .2) for each specification, top to bottom: 6 pre-test time points with slope terms; 6 pre-test time points; 5 pre-test time points; 4 pre-test time points; 3 pre-test time points; 2 pre-test time points; 1 pre-test time point; naive comparison of post-test means.]


ELA (All Schools)

[Figure: ELA score (SD units, axis from -.6 to .4) by year (1 to 7) for treatment schools and for all other schools in the state.]


ELA: WSC Results

[Figure: bias (axis from -.6 to .2) for each specification, top to bottom: 6 pre-test time points with slope terms; 6 pre-test time points; 5 pre-test time points; 4 pre-test time points; 3 pre-test time points; 2 pre-test time points; 1 pre-test time point; naive comparison of post-test means.]


With matching C to T Units, instead of Modeling Baseline Means/Slopes

• Same results

• Somers et al. got the same results

• In environmental science, the RCT was replicated only with matching


CITS Summary

• To date, CITS does well relative to RCT

• Get similar effects despite possible group differences in (a) pre-treatment trend; (b) historical events at treatment; (c) changes in instrument; (d) statistical regression – all these could be confounds, but they have not been to date

• CITS is not in any compendium except Cochrane


Non-Equivalent Control Group Designs: (a) Modeling a likely fully known selection process


Statistical Theory

• Knowing the selection process and measuring it perfectly always gives unbiased causal inference

• Rarely do we know it fully, but we often know major elements of selection process – why children are retained in grade; why couples self-select into divorce;

• Here’s one example – why students self-select into learning more about English or mathematics


Possibly fully known selection process: Shadish, Clark & Steiner (2008)

N = 445 Undergraduate Psychology Students, randomly assigned to:

– Randomized Experiment (N = 235), randomly assigned to Mathematics Training (N = 119) or Vocabulary Training (N = 116)

– Observational Study (N = 210), self-selected into Mathematics Training (N = 79) or Vocabulary Training (N = 131)

ATE=?


23 Constructs and 5 Construct Domains assessed prior to Intervention

• Proxy-pretests (2 multi-item constructs): 36-item Vocabulary Test II, 15-item Arithmetic Aptitude Test

• Prior academic achievement (3 multi-item constructs): High school GPA, current college GPA, ACT college admission score

• Topic preference (6 multi-item constructs): Liking literature, liking mathematics, preferring mathematics over literature, number of prior mathematics courses, major field of study (math-intensive or not), 25-item mathematics anxiety scale


Construct Domains

• Psychological predisposition (6 multi-item constructs): Big five personality factors (50 items on extroversion, emotional stability, agreeableness, openness to experience, conscientiousness), Short Beck Depression Inventory (13 items)

• Demographics (5 single-item constructs): Student's age, sex, race (Caucasian, Afro-American, Hispanic), marital status, credit hours


Was there Bias in the QE with Self-Selection into Tracks?

• RCT showed effects for each outcome.

• But both math and vocab effects were larger when students self-selected into T versus C

• So our question is: How much of this self-selection bias is reduced by use of covariates measuring several different possible selection processes?
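
The question can be put as a percentage. A plausible way to compute bias reduction, assumed here rather than quoted from the study, is to treat the gap between the unadjusted QE estimate and the RCT estimate as the initial bias and ask how much of it remains after covariate adjustment. The function name and the numbers below are hypothetical.

```python
def percent_bias_reduction(rct_estimate, unadjusted_qe, adjusted_qe):
    """Share of the initial self-selection bias removed by adjustment, in percent.

    The signed form is an assumption: it lets over-correction show up as a
    value above 100 and a worsening of bias as a value below 0.
    """
    initial_bias = unadjusted_qe - rct_estimate
    remaining_bias = adjusted_qe - rct_estimate
    return 100.0 * (1.0 - remaining_bias / initial_bias)


# Hypothetical numbers: RCT effect 4, unadjusted QE effect 9.
partial = percent_bias_reduction(4.0, 9.0, 5.0)         # most of the bias removed
overcorrected = percent_bias_reduction(4.0, 9.0, 3.0)   # adjusted estimate overshoots
worse = percent_bias_reduction(4.0, 9.0, 10.0)          # adjustment adds bias
print(partial, overcorrected, worse)
```

A value of 100 means the adjusted QE estimate coincides with the RCT estimate.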


Bias Reduction: Construct Domains, Mathematics

[Figure: bias reduction (%, axis from -20 to 140) by covariate set for four analysis methods: 1 = PS-stratification, 2 = PS-ANCOVA, 3 = PS-weighting, 4 = ANCOVA. Covariate sets on the x-axis, in order: psy, dem, aca, pre, top, dem+psy, pre+psy, dem+aca, dem+pre, pre+aca, dem+top, pre+top, dem+pre+aca, dem+pre+top, dem+pre+aca+top, dem+pre+aca+top+psy.]


Bias Reduction: Single Constructs, Mathematics

[Figure: bias reduction (%, axis from -40 to 140) for four analysis methods: 1 = PS-stratification, 2 = PS-ANCOVA, 3 = PS-weighting, 4 = ANCOVA. X-axis groups: proxy-pretest, topic preference, all covariates except. X-axis labels, in order: vocab.pre, math.pre, mars, like.lit, numbmath, major, pref.math, like.math, -like.math, -pref.math, -like.math -pref.math, all.]


Bias Reduction: Construct Domains, Vocabulary

[Figure: bias reduction (%, axis from -20 to 140) by covariate set for four analysis methods: 1 = PS-stratification, 2 = PS-ANCOVA, 3 = PS-weighting, 4 = ANCOVA. Covariate sets on the x-axis, in order: psy, aca, dem, pre, top, dem+psy, dem+aca, dem+pre, pre+psy, dem+top, pre+aca, pre+top, dem+pre+aca, dem+pre+top, dem+pre+aca+top, dem+pre+aca+top+psy.]


Bias Reduction: Single Constructs, Vocabulary

[Figure: bias reduction (%, axis from -40 to 140) for four analysis methods: 1 = PS-stratification, 2 = PS-ANCOVA, 3 = PS-weighting, 4 = ANCOVA. X-axis groups: proxy-pretest, topic preference, all covariates except. X-axis labels, in order: math.pre, vocab.pre, numbmath, mars, major, like.math, like.lit, pref.math, -vocab.pre, -pref.math, -vocab.pre -pref.math, all.]


Given Initial Group Differences

• 1. The choice of covariates for selection adjustment is crucial

• 2. How you analyze the outcome using covariates (OLS and PS matching) makes little difference, though PS preferred in theory

• 3. Replicated in Pohl et al. (2011)
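
Point 1 can be illustrated with a toy example. Below, selection into treatment depends on one covariate (`x`) that also drives the outcome, while a second covariate (`z`) is unrelated to both. Adjusting for `x` recovers the true effect; adjusting for `z` leaves the naive bias untouched. Exact stratification on a single discrete covariate is used as a coarse stand-in for propensity-score stratification. The data, variable names, and functions are all hypothetical and are not from Shadish et al. or Pohl et al.

```python
from statistics import mean


def naive_effect(units):
    """Unadjusted treated-minus-control difference in mean outcomes."""
    treated = [u["y"] for u in units if u["t"] == 1]
    control = [u["y"] for u in units if u["t"] == 0]
    return mean(treated) - mean(control)


def stratified_effect(units, covariate):
    """Within-stratum differences, weighted by each stratum's share of units."""
    total = len(units)
    estimate = 0.0
    for level in sorted({u[covariate] for u in units}):
        stratum = [u for u in units if u[covariate] == level]
        estimate += len(stratum) / total * naive_effect(stratum)
    return estimate


def build_units():
    """Hypothetical data: selection depends on x only; the true effect is 2."""
    units = []
    # (x, t, count): units with x = 1 are much more likely to choose treatment.
    for x, t, count in [(0, 1, 10), (0, 0, 30), (1, 1, 30), (1, 0, 10)]:
        for i in range(count):
            z = i % 2  # z is unrelated to both selection and outcome
            units.append({"x": x, "z": z, "t": t, "y": 2 * t + 3 * x})
    return units


units = build_units()
naive = naive_effect(units)
adjusted_x = stratified_effect(units, "x")
adjusted_z = stratified_effect(units, "z")
print(naive, adjusted_x, adjusted_z)
```

The adjustment method is the same in both cases; only the covariate differs, which is the sense in which covariate choice matters more than the mode of analysis.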


Non-Equivalent Control Group Design: (b) Forms of Intact Group Matching and Case Matching when Selection is Partially Known


Case 1: Intact Group Matching

• Have T group and select an intact control group without case matching – each unit in T is matched to some Cs

• Diaz & Handa (2006) – Oportunidades

• Aiken, West et al. (1999)

• Same result as RCT without any case matching

• But no guarantee – job training

• But it can be Step 1 to maximize overlap followed by case matching


Case 2: Local, Focal and Hybrid Case Matching

• Local matching: school districts or labor markets – the hope is to achieve a match on some unobservables, especially all those local policies that apply to both T and C. Reduces bias in labor economics

• Focal matching. Product of analyses of variables responsible for selection and correlated with outcome but depends on getting the right covariates – e.g., just seen in Shadish et al.


Hybrid sampling model of Stuart and Rubin (2008)

• Define caliper for adequate case matches

• Match all LOCAL Cs to T that fall within caliper

• For those Ts with no adequate matches, perform a match on the basis of the best propensity score after analysis of possible selection processes

• Now have a mix of acceptably matched local Cs, being preferred for control over some unobservables, and of acceptably matched non-local Cs, matched only on observables
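
The steps above can be sketched as a small matching routine. This is a simplified reading of the procedure, not Stuart and Rubin's implementation: the record fields (`id`, `district`, `ps`), the function name, matching with replacement, taking a single nearest non-local control as the fallback, and requiring that fallback to fall within the same caliper are all assumptions made for illustration.

```python
def hybrid_match(treated, controls, caliper):
    """Match each treated unit to local controls first, then to a non-local control.

    Returns {treated_id: (match_type, [control_ids])}.
    """
    matches = {}
    for t in treated:
        # Step 1: all local controls whose propensity score is within the caliper.
        local = [c["id"] for c in controls
                 if c["district"] == t["district"]
                 and abs(c["ps"] - t["ps"]) <= caliper]
        if local:
            matches[t["id"]] = ("local", local)
            continue
        # Step 2: otherwise, the closest non-local control on the propensity score.
        non_local = [c for c in controls if c["district"] != t["district"]]
        best = min(non_local, key=lambda c: abs(c["ps"] - t["ps"]), default=None)
        if best is not None and abs(best["ps"] - t["ps"]) <= caliper:
            matches[t["id"]] = ("non-local", [best["id"]])
        else:
            matches[t["id"]] = ("unmatched", [])
    return matches


# Hypothetical units with pre-computed propensity scores.
treated = [
    {"id": "T1", "district": "A", "ps": 0.50},
    {"id": "T2", "district": "B", "ps": 0.80},
    {"id": "T3", "district": "C", "ps": 0.10},
]
controls = [
    {"id": "C1", "district": "A", "ps": 0.55},
    {"id": "C2", "district": "A", "ps": 0.90},
    {"id": "C3", "district": "C", "ps": 0.78},
    {"id": "C4", "district": "B", "ps": 0.30},
]
matches = hybrid_match(treated, controls, caliper=0.1)
print(matches)
```

Local matches are preferred because they may also balance unobserved local conditions; the non-local fallback balances only what the propensity score captures.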


Hallberg, Wong, & Cook (in press)

• This paper draws on a WSC to examine correspondence with the RCT benchmark (Indiana student feedback study) after matching

– Within district as long as the schools do not differ by more than 0.75 standard deviations of the propensity score (Local)

– For others match on observed school-level covariates known to be highly correlated with the outcome of interest (Focal)

– Combine both T and C matched cases (Hybrid)


Performance of local, focal and hybrid matching across two dependent variables

[Figure: treatment effect (in SD units) relative to benchmark, axis from -.3 to .3, for hybrid match, focal match, local match, and naive effect; shown for Math and ELA.]


Percentage of times observational approach performed best across 1000 replications

[Figure: percentage (axis from 0 to 80) for hybrid approach, within district, covariate match, and naive effect; shown for Math and ELA.]


Summary

• Intact group matching increases overlap. Useful first stage in a QE design strategy

• Local matching is always useful and often brings about the RCT result.

• Neither is a guarantee

• Hybrid matching is perhaps best, but only one study, and at the school rather than the individual level

• Need for more studies of hybrid matching


Non-Equivalent Control Group Design: (c) among covariates, how special is a pretest measure of study outcome?


Claims about Pretest

• The pretest is claimed to be privileged for precision, but the issue here is bias reduction

• In studies limited to modeling the outcome, pretest often most highly correlated, but issue is correlation of pretest with selection into T

• Though selection on the pretest may be frequent, no one knows how often and when

• The next WSC studies vary whether the pretest does or does not correlate with selection


Existing Empirical Evidence

• WSCs provide some support for privileging the true pretest because it is better than other covariates at reducing bias, but it does not always reduce all bias

• Workforce development (Glazerman, Levy, & Myers, 2003; Bloom, Michalopoulos, and Hill, 2005; Smith & Todd, 2005)

• Yet Magnet school study (Bifulco, 2010) and earlier CITS studies here

• This study examines the bias reduction due to conditioning on pretest measures when we vary the correlation with selection both between and within studies


Between-Studies: Kindergarten Retention

• Hong and Raudenbush (2005; 2006) used the rich covariates in the ECLS-K to estimate the effect of kindergarten retention on academic outcomes in math and reading


Retention Selection Process

• Past academic performance plays a critical role in identifying which students will be retained

– Students are retained “to remedy inadequate academic progress and to aid in the development of students who are judged to be emotionally immature” (Jackson, 1975, p. 614)

– “It is a ‘high risk’ profile generally – for academic setbacks in the near-term, for a lifetime of struggle over the longer term.” (Alexander, Entwisle, and Dauber, 2003, p. 68)


Dataset 1: Correlation with Selection

Correlation with Retention in Kindergarten

                  Correlation   Lower Bound   Percent of lower bound

Reading Pretest   -0.185*       -0.38         48.7%

Math Pretest      -0.179*       -0.37         48.4%
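
The third column appears to be the observed correlation expressed as a percentage of its lower bound. That reading is an inference from the numbers in the table, not something stated on the slide; the short check below reproduces both percentages under that assumption (the function name is hypothetical).

```python
def percent_of_bound(correlation, bound):
    """Observed correlation as a percentage of its bound, to one decimal place."""
    return round(100.0 * abs(correlation) / abs(bound), 1)


reading_pct = percent_of_bound(-0.185, -0.38)
math_pct = percent_of_bound(-0.179, -0.37)
print(reading_pct, math_pct)
```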


Dataset 1: Math Results

[Figure (Math): treatment effect (in SD units) relative to benchmark, axis from -.7 to .3, for, top to bottom: all covariates; two or more pretest covariates; all covariates minus pretest; one pretest covariate; no covariates.]


Dataset 1: ELA Results

[Figure (ELA): treatment effect (in SD units) relative to benchmark, axis from -.7 to .3, for, top to bottom: all covariates; two or more pretest covariates; one pretest covariate; all covariates minus pretest; no covariates.]


Between-Study Contrast Dataset 2: Indiana Benchmark Assessment Study (Grade 5)

• 56 K-8 schools serving 5th graders randomly assigned to:

– Treatment: implementation of the state’s benchmark assessment system (n=34)

– Control schools: business as usual (n=22)

– Outcomes: Math and ELA ISAT scores

• Quasi-experimental comparison group drawn from all other schools in the state that served 5th grade students (n = 681)

• Rich set of student and school covariates with multiple waves of pretest data


Dataset 2: No Meaningful Correlation with Selection

Correlation with Selection into Benchmark Assessment System

Reading Pretest 0.041

Math Pretest -0.012


Dataset 2: Math Results

[Figure (Math): treatment effect (in SD units) relative to benchmark, axis from -.7 to .3, for, top to bottom: all covariates minus pretest; all covariates; two or more pretest covariates; one pretest covariate; no covariates.]


Dataset 2: ELA Results

[Figure (ELA): treatment effect (in SD units) relative to benchmark, axis from -.7 to .3, for, top to bottom: all covariates minus pretest; all covariates; two or more pretest covariates; one pretest covariate; no covariates.]


Correlation with Selection

Correlation with Selection into Vocabulary Training

Reading Pretest 0.169*

Math Pretest -0.090


Dataset 3: ELA Results where Pretest and Selection correlate

[Figure: ELA. Treatment effect (in sd units) relative to benchmark, x-axis from -.7 to .3. Rows: all covariates minus pretest; all covariates; two or more pretest covariates; one pretest covariate; no covariates.]

Page 75:

Math Results where Pretest and Selection do not correlate

[Figure: Math. Treatment effect (in sd units) relative to benchmark, x-axis from -.7 to .3. Rows: all covariates minus pretest; all covariates; two or more pretest covariates; one pretest covariate; no covariates.]

Page 76:

Summary of Pretest Results

• Cannot assume the pretest is always related to selection even if it often is

• You should probably always include it

• But consistent with the principle of fully knowing the selection process, you should include it in a PS analysis predicated on measures guided by theoretical explication of all other plausible selection processes

Page 77:

Data-Bound Summary

• No meta-analysis to date, but looks good for RD, CRD, and CITS

• It also looks good for designing prospective studies and including measures to account for multiple possible selection processes

• Intact, local and focal matching with heterogeneous covariates each sometimes reduces all bias and almost always reduces some bias, but they are likely best together in the form of hybrid matching

• Pretests do not always reduce bias, though they sometimes achieve this. They afford no guarantee, but should be a significant part of a bias reduction strategy with other sampling and covariate choice efforts.

• This presentation would be very different five years from now, not so much with respect to RD and ITS, but with respect to the non-equivalent control group design options worth disseminating and suppressing.

Page 78:

Broader Summary

• Let us all acknowledge that RCT is best in theory and not get into meaningless fights.

• Let’s ask: is the assumption warranted that the RCT is “far” superior for warranting causal inference?

• Is an evidence-based empirical rationale already emerging for including some QE studies as acceptable contributions to evidence-based policy suggestions?

Page 79:

Broader Summary

• The second assumption is that evidence-based policy will be better if we have more info about external validity so as to learn about robustness or conditions under which effect sizes vary for the same treatment and effect

• Will having more acceptable studies in our knowledge compendia promote external validity, an Achilles Heel of much evidence-based practice research in the social and educational domains at least?

Page 80:

END and THANKS

Page 81:

Reliability of Construct Measurement

Steiner, Cook & Shadish (2011)

• How important is the reliable measurement of constructs (given selection on latent constructs)?
– Does including many covariates in the PS model compensate for any one covariate’s unreliability?
– We add measurement error to the observed covariates in a simulation study
– Assume that original set of covariates is measured without error and removes 100% of selection bias
– Systematically added measurement error such that the reliability of each covariate was .6, .7, .8, .9, 1.0
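The logic of adding measurement error to reach a target reliability can be sketched with a toy simulation. This is not the authors' simulation: it assumes a single latent covariate that fully drives selection, a true treatment effect of zero, and a one-covariate ANCOVA adjustment, and all function names are hypothetical. Reliability is taken as var(true) / (var(true) + var(error)), so the error variance needed is var(true) * (1 - reliability) / reliability.

```python
import random


def mean(values):
    return sum(values) / len(values)


def add_measurement_error(true_scores, reliability, rng):
    """Return observed scores X = T + error with the requested reliability."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    n = len(true_scores)
    mean_t = mean(true_scores)
    var_t = sum((t - mean_t) ** 2 for t in true_scores) / (n - 1)
    error_sd = (var_t * (1 - reliability) / reliability) ** 0.5
    return [t + rng.gauss(0.0, error_sd) for t in true_scores]


def ancova_effect(y, z, x):
    """Treatment coefficient from regressing y on z (0/1) and one covariate x."""
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    x1 = [xi for xi, zi in zip(x, z) if zi == 1]
    x0 = [xi for xi, zi in zip(x, z) if zi == 0]
    sxy = sxx = 0.0
    for xs, ys in ((x1, y1), (x0, y0)):
        mx, my = mean(xs), mean(ys)
        sxy += sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx += sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx  # pooled within-group slope of y on x
    return (mean(y1) - mean(y0)) - slope * (mean(x1) - mean(x0))


def bias_reduction_pct(reliability, n=20000, seed=1):
    """Percent of the unadjusted bias removed by adjusting for the observed covariate."""
    rng = random.Random(seed)
    t = [rng.gauss(0.0, 1.0) for _ in range(n)]                 # latent covariate
    z = [1 if ti + rng.gauss(0.0, 1.0) > 0 else 0 for ti in t]  # selection on t
    y = [ti + rng.gauss(0.0, 1.0) for ti in t]                  # true effect is zero
    x = add_measurement_error(t, reliability, rng)
    naive = mean([yi for yi, zi in zip(y, z) if zi == 1]) - mean(
        [yi for yi, zi in zip(y, z) if zi == 0]
    )
    adjusted = ancova_effect(y, z, x)
    return 100.0 * (1 - adjusted / naive)


results = {r: bias_reduction_pct(r) for r in (0.6, 0.7, 0.8, 0.9, 1.0)}
```

In this toy setup the perfectly reliable covariate removes essentially all of the bias, and the share removed shrinks as reliability falls. Whether many covariates can compensate for one unreliable covariate is the question the slide poses, and this single-covariate sketch does not address it.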

Page 82:

Mathematics: Reliability 1.0

[Figure: Bias Reduction (%), y-axis from -40 to 120. X-axis labels: all, top, pre, dem, aca, psy, likemath, prefmath, mathpre, likelit, vocabpre. Legend: 1 = PS-stratification, 2 = PS-ANCOVA, 3 = PS-weighting, 4 = ANCOVA.]

Page 83:

Mathematics: Reliability .6

[Figure: Bias Reduction (%), y-axis from -40 to 120. X-axis labels: all, top, pre, dem, aca, psy, likemath, prefmath, mathpre, likelit, vocabpre. Legend: 1 = PS-stratification, 2 = PS-ANCOVA, 3 = PS-weighting, 4 = ANCOVA.]

Page 84:

The three untreated parallel segments of comparative regression discontinuity (CRD) design

Page 85:

Bias of CRD-Pre above cutoff only

Page 86:

Results: precision of CRD-Pre

Page 87:

Results: bias of CRD-CG above the cutoff

Page 88:

Results: precision of CRD-CG