Assessing Threats to Validity and Implications for Use of Impact Evaluation Findings

Feb 23, 2016

Assessing Threats to Validity and Implications for Use of Impact Evaluation Findings
Michael Woolcock
Development Research Group, World Bank
Kennedy School of Government, Harvard University
mwoolcock@worldbank.org

InterAction, May 13, 2013

Overview
* Background
* The art, science and politics of evaluation
* Forms and sources of validity: construct, internal, external
* Applications to complex interventions, with a focus on external validity ("If it works there, will it work here?")
* Expanding the range of ideas, methods and strategies

"[T]he bulk of the literature presently recommended for policy decisions cannot be used to identify 'what works here'. And this is not because it may fail to deliver in some particular cases[; it] is not because its advice fails to deliver what it can be expected to deliver... The failing is rather that it is not designed to deliver the bulk of the key facts required to conclude that it will work here."

Nancy Cartwright and Jeremy Hardie (2012) Evidence-Based Policy: A Practical Guide to Doing It Better (New York: Oxford University Press), p. 137

Contesting Development: Participatory Projects and Local Conflict Dynamics in Indonesia
Patrick Barron, Rachael Diprose and Michael Woolcock (Yale University Press, 2011)

The art, science and politics of evaluation
* The Art: sensibility, experience; optimizing under (numerous) constraints; taking implementation, monitoring and context seriously
* The Science: skills, theory; modes of causal reasoning (statistical, logical, legal), time
* ...and the Politics: competence and confidence under pressure; picking battles

Making, assessing impact claims
The quality of empirical knowledge claims turns on:
1. Construct validity
Do key concepts ("property rights", "informal") mean the same thing to different people? What gets lost in translation?
2. Internal validity
In connecting cause (better schools) and effect (smarter children), have we considered other factors that might actually be driving the result (home environment, community safety, cultural norms)? Programs are rarely placed randomly.
3. Claims assessed against a theory of change
Specification of how a project's components (and their interaction) and processes generate outcomes. Reasoned expectations: "where" by "when"?
4. External validity (how generalizable are the claims?)
If it works here, will it work there? If it works with this group, will it work with that group? Will bigger be better?

1. Construct validity
Asking, answering and interpreting questions:
* To what extent do all parties share similar understandings of key concepts (e.g., poverty, ethnicity, violence, justice)?
* Can be addressed using mixed methods: iterative field testing of questionnaire items and their sequencing, NOT cut-and-paste from elsewhere
* Anchoring vignettes (Gary King et al.), e.g., assessing quality of government in China and Mexico

2. Internal validity
In Evaluation 101, we assume

Impact = f (Design) | Selection, Confounding Variables

Adequate for simple interventions with a good-enough counterfactual.
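The "Evaluation 101" identity above can be illustrated with a minimal simulation: under random assignment, selection effects and confounders are balanced across arms, so the difference in group means estimates impact. The data, effect size (2 units), and sample size below are invented for illustration.

```python
import random

random.seed(0)

def difference_in_means(treated, control):
    """Estimated impact = mean(treated outcomes) - mean(control outcomes)."""
    return sum(treated) / len(treated) - sum(control) / len(control)

# Simulated outcomes: treatment shifts the outcome by ~2 units on average.
control = [random.gauss(10, 1) for _ in range(500)]
treated = [random.gauss(12, 1) for _ in range(500)]

# With a good-enough counterfactual, the estimate lands close to the
# true effect of 2; with non-random placement it would not.
print(round(difference_in_means(treated, control), 1))
```

This is the simple-intervention case: one "design", one outcome, a clean counterfactual. The slides that follow show why this template strains under complexity.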

But this is inadequate for assessing complex interventions:
* design is multi-faceted (i.e., many moving parts)
* interaction with context is pervasive, desirable
* implementation quality is vital
* trajectories of change are probably non-linear (perhaps unknowable ex ante)

Pervasive problem
Such projects are inherently very complex, thus:
* Very hard to isolate "true" impact
* Very hard to make claims about likely impact elsewhere
Understanding how (not just whether) impact is achieved is also very important:
* Process evaluations, or realist evaluations, can be most helpful (see work of Ray Pawson, Patricia Rogers et al.)
* Mixed methods, theory, and experience are all crucial for investigating these aspects

Evaluating complex projects
Impact = f ([DQ, CD], SF) | SE, CV, RE

DQ = Design quality (weak, strong)
CD = Causal density (low, high)
SF = Support factors: implementation, context
SE = Selection effects (non-random placement, participation)
CV = Confounding variables
RE = Reasoned expectations ("where" by "when"?)

In Social Development projects (cf. roads, immunizations):
* CD is high, loose, often unobserved (unobservable?)
* Implementation and context are highly variable
* RE is often unknown (unknowable?)

3. Theory of Change, Reasoned Expectations: understanding impact trajectories
[Series of figures: net impact plotted against time, t = 0 to t = 1, with evaluation points A, B, C and D (t = 2) marked on the trajectory]
* Same impact claim, but entirely a function of when the assessment was done
* If an evaluation was done at point A or point B, what claims about impact would be made? And at C, or at D (t = 2)?

4. External Validity: Some Background
* Rising obsession with causality, RCTs as "gold standard"
* Pushed by donors, foundations (e.g., Gates), researchers: Campbell Collaboration, Cochrane Collaboration, NIJ, J-PAL, et al.
* For the busy policymaker, warehouses of "interventions that work"
* Yet also serious critiques:
  - In medicine: Rothwell (2005), Groopman (2008)
  - In philosophy: Cartwright (2011)
  - In economics: Deaton (2010), Heckman (1992), Ravallion (2009)
  - Reddy (2013) on Poor Economics: "from rigor to rigor mortis"; "a radical approach to defining development down", delimiting the innovation space
* ...especially as it pertains to external validity:
  - NYT (2013), Engber (2011) on "Black 6" mice (biomedical research)
  - Henrich et al. (2011) on WEIRD people (social psychology)
  - Across time, space, groups, scale, units of analysis
* ...and understanding of mechanisms: a true "science of delivery" requires knowledge of how, not just whether, something works (Cartwright and Hardie 2012)
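The impact-trajectory timing problem above can be sketched numerically: one J-curve trajectory, read at three evaluation dates, yields three contradictory impact claims. The trajectory function and the dates (months 6, 12, 24) are invented for illustration.

```python
import math

def net_impact(month):
    """Hypothetical J-curve: early dip, then recovery and late gains.

    The functional form is illustrative only: a transient negative shock
    centred on month 6, plus a logistic ramp-up of benefits after month 14.
    """
    dip = -5 * math.exp(-((month - 6) ** 2) / 18)
    gain = 10 / (1 + math.exp(-(month - 14)))
    return dip + gain

# The same project, evaluated at A, B or C, supports opposite conclusions.
for label, month in [("A (early)", 6), ("B (mid)", 12), ("C (late)", 24)]:
    print(f"{label}: net impact = {net_impact(month):+.1f}")
# A (early): net impact = -5.0   -> "the project is harmful"
# B (mid):   net impact = +0.5   -> "the project does nothing"
# C (late):  net impact = +10.0  -> "the project works"
```

The point is not this particular curve but that, without a theory of change specifying "where by when", the evaluation date silently determines the verdict.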

From IV to EV, simple to complex
* Causal density
* Support factors: implementation, context
* Reasoned expectations

Central claim: the higher the intervention's complexity, the lower its external validity.

1. Causal density: which way up? RCTs vs QICs

Eppstein et al. (2012), "Searching the clinical fitness landscape", PLoS ONE 7(11): e49901

How simple or complex is your policy/project? Specific questions to ask: to what extent does producing successful outcomes from your policy/project require...
* ...that the implementing agents make finely based distinctions about the state of the world? Are these distinctions difficult for a third party to assess/verify? [Local discretion]
* ...many agents to act (or few), over extended time periods? [Transaction intensity]
* ...that the agents resist large temptations/pressures to do something besides implement the policy? [High stakes]
* ...that agents innovate to achieve desired outcomes? [Known technology]

Classification of activities in health

Activity                        | Local discretion? | Transaction intensive? | Contentious, temptations to do otherwise? | Known technology?
Iodization of salt              | No                | No                     | No                                        | Yes
Vaccinations                    | No                | Yes                    | No                                        | Yes
Ambulatory curative care        | Yes               | Yes                    | No(ish)                                   | Yes
Regulation of private providers | Yes               | Yes                    | Yes                                       | Yes
Encouraging preventive health   | Yes               | Yes                    | No                                        | No

Resulting classification:
* Technocratic (implementation light; policy decree)
* Logistical (implementation intensive, but easy)
* Implementation Intensive - Downstream (of services)
* Implementation Intensive - Upstream (of obligations)
* Complex (implementation intensive, motivation hard); needs (continuous?) innovation

2. Implementation: using RCTs to test the EV of RCTs
Bold, Sandefur et al. (2013):
* Take a project (contract teachers) with a positive impact in India, as determined by an RCT...
* ...to Kenya: 192 schools randomly split into three groups: a control group; a contract teacher provided through an NGO (World Vision); a contract teacher provided through the MoE
* Result? Implementation matters (a lot)
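The classification in the table above can be sketched as a small decision rule. The mapping below is one reading of how the four questions sort activities into the five categories; it is an illustrative interpretation of the slide, not an official taxonomy.

```python
def classify(local_discretion, transaction_intensive, contentious, known_technology):
    """Map the four yes/no questions to one of the five activity types."""
    if not local_discretion and not transaction_intensive:
        return "Technocratic"                           # policy decree, implementation light
    if not local_discretion:
        return "Logistical"                             # implementation intensive, but easy
    if contentious:
        return "Implementation Intensive (Upstream)"    # of obligations
    if known_technology:
        return "Implementation Intensive (Downstream)"  # of services
    return "Complex"                                    # motivation hard, needs innovation

# The rows of the health-activities table, as (discretion, transactions,
# contentious, known-technology) flags.
activities = {
    "Iodization of salt":              (False, False, False, True),
    "Vaccinations":                    (False, True,  False, True),
    "Ambulatory curative care":        (True,  True,  False, True),
    "Regulation of private providers": (True,  True,  True,  True),
    "Encouraging preventive health":   (True,  True,  False, False),
}
for name, flags in activities.items():
    print(f"{name}: {classify(*flags)}")
```

Run against the table's rows, the rule reproduces the slide's five labels, from Technocratic (salt iodization) through Complex (encouraging preventive health).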

[Figure: results, Bold et al. (2013)]

"The fact is that RCTs come at the end, when you have already decided that it will probably work, here and maybe anywhere... To know that this is a good bet, you have to have thought about causal roles and support factors... [A]nswering the 'how' question is made easier in science by background knowledge of how things work."

Nancy Cartwright and Jeremy Hardie (2012) Evidence-Based Policy: A Practical Guide to Doing It Better (New York: Oxford University Press), p. 125

Learning from intra-project variation
[Figures: impact over time (t = 0 to t = 1) for project sites A and B; the same comparison for complex projects; iterative, adaptive learning]

Putting it all together

Project design features:  Technocratic | Logistical | Implementation Intensive (Downstream) | Implementation Intensive (Upstream) | Complex
Implementation quality:   Strong / Weak, under each design feature
Context compatibility:    + / -, under each implementation-quality branch
External validity:        High (Technocratic, strong, +) --> Low (Complex, weak, -)

Even with low-EV interventions, the ideas and processes behind them may still travel well.
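The "putting it all together" grid above can be caricatured as a toy ordinal score: external validity falls as causal density rises, and falls further with weak implementation or poor context fit. The numeric weights below are entirely invented; only the ordering they produce reflects the slide.

```python
# Invented complexity ranks for the five design types (0 = simplest).
DESIGN_COMPLEXITY = {
    "Technocratic": 0,
    "Logistical": 1,
    "Implementation Intensive (Downstream)": 2,
    "Implementation Intensive (Upstream)": 3,
    "Complex": 4,
}

def external_validity_score(design, strong_implementation, context_compatible):
    """Toy ordinal score: higher means claims travel better elsewhere."""
    score = 10 - 2 * DESIGN_COMPLEXITY[design]   # complexity dominates
    if not strong_implementation:
        score -= 1                               # weak implementation penalty
    if not context_compatible:
        score -= 1                               # poor context-fit penalty
    return score

print(external_validity_score("Technocratic", True, True))   # -> 10 (high EV)
print(external_validity_score("Complex", False, False))      # -> 0  (low EV)
```

The design choice worth noting: complexity is weighted more heavily than implementation or context, mirroring the deck's central claim that causal density, not execution alone, is the main drag on external validity.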