Transcript
Page 1

Assessing Threats to Validity and Implications for Use of Impact Evaluation Findings

Michael Woolcock
Development Research Group, World Bank
Kennedy School of Government, Harvard
[email protected]

InterAction
May 13, 2013

Page 2

Overview

• Background
• The art, science and politics of evaluation
• Forms and sources of validity
  – Construct, Internal, External
• Applications to ‘complex’ interventions
  – With a focus on External Validity
    • If it works there, will it work here?
• Expanding range of ideas, methods and strategies

Page 3

[T]he bulk of the literature presently recommended for policy decisions… cannot be used to identify ‘what works here’. And this is not because it may fail to deliver in some particular cases [; it] is not because its advice fails to deliver what it can be expected to deliver… The failing is rather that it is not designed to deliver the bulk of the key facts required to conclude that it will work here.

Nancy Cartwright and Jeremy Hardie (2012) Evidence-Based Policy: A Practical Guide to Doing it Better (New York: Oxford University Press, p. 137)

Page 4

Contesting Development: Participatory Projects and Local Conflict Dynamics in Indonesia

Patrick Barron, Rachael Diprose and Michael Woolcock

Yale University Press, 2011

Page 5

The art, science and politics of evaluation

• The Art…
  – Sensibility, experience
  – Optimizing under (numerous) constraints
  – Taking implementation, monitoring and context seriously
• The Science…
  – Skills, theory
  – Modes of causal reasoning (statistical, logical, legal), time
• …and the Politics
  – Competence and confidence under pressure
  – Picking battles…

Page 6

Making, assessing impact claims
Quality of empirical knowledge claims turns on…

1. Construct validity
   • Do key concepts (‘property rights’, ‘informal’) mean the same thing to different people? What gets “lost in translation”?
2. Internal validity…
   • In connecting ‘cause’ (better schools) and ‘effect’ (smarter children), have we considered other factors that might actually be driving the result (home environment, community safety, cultural norms)? Programs are rarely placed randomly…
3. …assessed against a ‘theory of change’
   • Specification of how the project’s components (and their interactions) and processes generate outcomes
   • Reasoned expectations: where, by when?
4. External validity (how generalizable are the claims?)
   • If it works here, will it work there? If it works with this group, will it work with that group? Will bigger be better?

Page 7

1. Construct validity

• Asking, answering and interpreting questions
  – To what extent do all parties share similar understandings of key concepts?
    • E.g., ‘poverty’, ‘ethnicity’, ‘violence’, ‘justice’…
• Can be addressed using mixed methods:
  – Iterative field testing of questionnaire items and their sequencing
    • NOT cut-and-paste from elsewhere
  – ‘Anchoring vignettes’ (Gary King et al); see the sketch after this list
    • Assessing “quality of government” in China and Mexico
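As a concrete illustration of the anchoring-vignettes idea, here is a minimal sketch of the nonparametric recoding after King et al: each respondent rates themselves and a set of shared hypothetical vignettes on the same scale, and the self-rating is then re-expressed relative to that respondent’s own vignette ratings, which corrects for groups using response scales differently. All data and values below are hypothetical, not from the talk.

```python
def recode_self_assessment(self_rating, vignette_ratings):
    """Nonparametric anchoring-vignette recode (after King et al.).

    vignette_ratings must be ordered from the objectively worst vignette
    to the best. Returns a position on a 1..(2k+1) scale for k vignettes:
    odd values fall strictly between vignettes, even values tie with one.
    """
    c = 1
    for v in vignette_ratings:
        if self_rating < v:
            return c          # strictly below this vignette
        if self_rating == v:
            return c + 1      # ties with this vignette
        c += 2                # strictly above; move past it
    return c                  # above all vignettes

# Two respondents give the same raw self-rating (3 on a 1-5 scale) but
# anchor it very differently against the same three vignettes:
print(recode_self_assessment(3, [1, 2, 4]))  # -> 5 (above two vignettes)
print(recode_self_assessment(3, [3, 4, 5]))  # -> 2 (ties with the worst)
```

The recoded positions, not the raw scores, are then compared across groups, so that ‘quality of government’ rated in China and in Mexico is measured against common anchors.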

Page 8

2. Internal validity

In Evaluation 101, we assume…

Impact = f(Design) | Selection, Confounding Variables

Adequate for ‘simple’ interventions with a ‘good-enough’ counterfactual.

But this is inadequate for assessing ‘complex’ interventions:
* design is multi-faceted (i.e., many ‘moving parts’)
* interaction with context is pervasive and desirable
* implementation quality is vital
* trajectories of change are probably non-linear (perhaps unknowable ex ante)
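To make the ‘programs are rarely placed randomly’ point concrete, here is a minimal simulation sketch, with all numbers invented for illustration: when placement depends on an unobserved advantage that also drives outcomes, the naive treated-vs-untreated comparison overstates the true effect, while randomized placement recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Latent advantage (e.g., home environment) drives outcomes AND the
# chance a community gets the program: non-random placement.
advantage = rng.normal(0.0, 1.0, n)
true_effect = 0.30

# Better-off units are more likely to be treated (logistic in advantage).
placed = rng.random(n) < 1.0 / (1.0 + np.exp(-advantage))
y = advantage + true_effect * placed + rng.normal(0.0, 1.0, n)
naive = y[placed].mean() - y[~placed].mean()

# Randomized placement cuts the confounding path entirely.
randomized = rng.random(n) < 0.5
y_r = advantage + true_effect * randomized + rng.normal(0.0, 1.0, n)
rand = y_r[randomized].mean() - y_r[~randomized].mean()

print(f"true effect:       {true_effect:.2f}")
print(f"naive comparison:  {naive:.2f}")   # inflated well above 0.30
print(f"randomized design: {rand:.2f}")    # close to 0.30
```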

Page 9

Pervasive problem

• Such projects are inherently very complex, thus:
  – Very hard to isolate ‘true’ impact
  – Very hard to make claims about likely impact elsewhere
  – Understanding how (not just whether) impact is achieved is also very important
• Process evaluations, or ‘realist evaluations’, can be most helpful (see the work of Ray Pawson, Patricia Rogers et al)
• Mixed methods, theory and experience are all crucial for investigating these aspects

Page 10

Evaluating ‘complex’ projects

Impact = f([DQ, CD], SF) | SE, CV, RE
  DQ = Design quality (weak, strong)
  CD = Causal density (low, high)
  SF = Support factors: implementation, context
  SE = Selection effects (non-random placement, participation)
  CV = Confounding variables
  RE = Reasoned expectations (where, by when?)

In Social Development projects (cf. roads, immunizations):
* CD is high, loose, often unobserved (unobservable?)
* implementation and context are highly variable
* RE is often unknown (unknowable?)
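The bracketed interaction in Impact = f([DQ, CD], SF) is doing real work: with high causal density, impact emerges from design interacting with its support factors, so an estimate from one setting need not transport to another. A toy sketch, with all functional forms and numbers invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def impact(design_quality, implementation, context_fit):
    """Toy model: impact runs through the *interaction* of design with its
    support factors (implementation, context), not through design alone."""
    return design_quality * implementation * context_fit + rng.normal(0, 0.05)

# The same design transported to a site with weaker implementation and a
# poorer contextual fit delivers a very different impact:
print(f"origin site: {impact(0.8, 0.9, 0.9):.2f}")  # roughly 0.65
print(f"new site:    {impact(0.8, 0.4, 0.5):.2f}")  # roughly 0.16
```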


Page 12

3. Theory of Change, Reasoned Expectations: Understanding impact trajectories

[Figure: net impact plotted over time, from t = 0 to t = 1]

Page 13

Understanding impact trajectories

[Figure: impact trajectories over time, t = 0 to t = 1]

“Same” impact claim, but entirely a function of when the assessment was done…

Page 14

Understanding impact trajectories

[Figure: impact trajectory over time, t = 0 to t = 1, with points A, B and C marked]

If an evaluation were done at ‘A’ or ‘B’, what claims about impact would be made?

Page 15

Understanding impact trajectories

[Figure: the trajectory extended from t = 1 to t = 2, with points A, B, C and an unknown later point D marked ‘?’]
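A small numeric sketch of why the timing of measurement drives the impact claim: assume, purely for illustration, a J-shaped trajectory that dips before it climbs (this curve is invented, not from the talk’s data). Evaluations at points like A, B and C then support different, even opposite, conclusions, while D remains a projection.

```python
def net_impact(t):
    """Illustrative J-curve: early disruption produces a dip before
    gains accumulate. Invented for illustration only."""
    return 1.5 * t**2 - 0.5 * t

for label, t in [("A", 0.15), ("B", 0.5), ("C (t=1)", 1.0), ("D (t=2)", 2.0)]:
    print(f"{label:8s} net impact = {net_impact(t):+.2f}")
# A is negative, B mildly positive, C strongly positive: the "same"
# project yields three different impact claims depending on when the
# evaluation happens; D is only a reasoned expectation.
```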

Page 16

4. External Validity: Some Background

• Rising obsession with causality, RCTs as the ‘gold standard’
  – Pushed by donors, foundations (e.g., Gates), researchers
    • Campbell Collaboration, Cochrane Collaboration, NIJ, JPAL, et al
  – For the “busy policymaker”, “warehouses” of interventions that “work”
• Yet also serious critiques…
  – In medicine: Rothwell (2005), Groopman (2008)
  – In philosophy: Cartwright (2011)
  – In economics: Deaton (2010), Heckman (1992), Ravallion (2009)
    • Reddy (2013) on Poor Economics: “from rigor to rigor mortis”; a radical approach to defining development down, delimiting innovation space
• …especially as it pertains to external validity…
  – NYT (2013), Engber (2011) on ‘Black 6’ mice (biomedical research)
  – Henrich et al (2010) on ‘WEIRD’ people (social psychology)
  – Across time, space, groups, scale, units of analysis
• …and understanding of mechanisms
  – A true “science of delivery” requires knowledge of how, not just whether, something ‘works’ (Cartwright and Hardie 2012)

Page 17

Evaluating ‘complex’ projects

Impact = f([DQ, CD], SF) | SE, CV, RE
  DQ = Design quality (weak, strong)
  CD = Causal density (low, high)
  SF = Support factors: implementation, context
  SE = Selection effects (non-random placement, participation)
  CV = Confounding variables
  RE = Reasoned expectations (where, by when?)

In Social Development projects (cf. roads, immunizations):
* CD is high, loose, often unobserved (unobservable?)
* implementation and context compatibility are highly variable
* RE is often unknown (unknowable?)

Page 18

From IV to EV, ‘simple’ to ‘complex’

1. Causal density
2. Support factors
   – Implementation
   – Context
3. Reasoned expectations

Central claim: the higher the intervention’s complexity, the lower its external validity

Page 19

1. ‘Causal density’

Which way up? RCTs vs QICs (quality improvement collaboratives)

[Figure: Eppstein et al (2012) ‘Searching the Clinical Fitness Landscape’, PLoS ONE 7(11): e49901]

Page 20

How ‘simple’ or ‘complex’ is your policy/project? Specific questions to ask:

• To what extent does producing successful outcomes from your policy/project require…
  – that the implementing agents make finely based distinctions about the “state of the world”? Are these distinctions difficult for a third party to assess/verify?
    • Local discretion
  – that many agents (or few) act, over extended time periods?
    • Transaction intensity
  – that the agents resist large temptations/pressures to do something besides implement the policy?
    • High stakes
  – that agents innovate to achieve desired outcomes?
    • Known technology

Page 21

Classification of “activities” in health

Activity | Local discretion? | Transaction intensive? | Contentious, ‘temptations’ to do otherwise? | Known technology? | Type
Iodization of salt | No | No | No | Yes | Technocratic (implementation light; policy decree)
Vaccinations | No | Yes | No | Yes | Logistical (implementation intensive, but easy)
Ambulatory curative care | Yes | Yes | No(ish) | Yes | Implementation intensive, ‘downstream’ (of services)
Regulation of private providers | Yes | Yes | Yes | Yes | Implementation intensive, ‘upstream’ (of obligations)
Encouraging preventive health | Yes | Yes | No | No | Complex (implementation intensive, motivation hard); needs (continuous?) innovation

Page 22

2. Implementation: Using RCTs to test the EV of RCTs

• Bold, Sandefur et al (2013)
  – Take a project (contract teachers) with a positive impact in India, as determined by an RCT…
  – …to Kenya; 192 schools randomly split into three groups:
    • a control group
    • a contract teacher through an NGO (World Vision)
    • a contract teacher through the MoE
  – Result?

Page 23

Implementation matters (a lot)

[Figure: results from Bold et al (2013)]
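To see why the implementer is effectively part of the treatment, here is a hedged simulation sketch; the arm labels follow the study design above, but the effect sizes are invented placeholders, not Bold et al’s estimates. The same design, delivered through different arms, yields different measured impacts:

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented per-arm true effects (in s.d. of test scores);
# NOT Bold et al's numbers, purely for illustration.
arms = {"NGO delivery": 0.25, "Government delivery": 0.00}

for arm, beta in arms.items():
    control = rng.normal(0.0, 1.0, 2000)
    treated = rng.normal(beta, 1.0, 2000)
    est = treated.mean() - control.mean()
    print(f"{arm:20s} estimated effect = {est:+.2f} s.d.")
# A claim that "contract teachers work" silently bundles the implementing
# organization into the treatment being transported across settings.
```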

Page 24

The fact is that RCTs come at the end, when you have already decided that it will probably work, here and maybe anywhere… To know that this is a good bet, you have to have thought about causal roles and support factors… [A]nswering the how question is made easier in science by background knowledge of how things work.

Nancy Cartwright and Jeremy Hardie (2012) Evidence-Based Policy: A Practical Guide to Doing it Better (New York: Oxford University Press, p. 125)

Page 25

Learning from intra-project variation

[Figure: impact over time, t = 0 to t = 1, with trajectories A and B]

Page 26

Learning from intra-project variation: ‘complex’ projects

[Figure: impact over time, t = 0 to t = 1]

Page 27

Learning from intra-project variation: ‘complex’ projects

Iterative, adaptive learning

[Figure: impact over time, t = 0 to t = 1]

Page 28

Putting it all together

Project design features: Technocratic | Logistical | Implementation intensive (‘downstream’) | Implementation intensive (‘upstream’) | Complex
Implementation quality (within each design type): Strong | Weak
Context compatibility (within each implementation quality): + | −
External validity: High → Low across these cells

Even with low-EV interventions, the ideas and processes behind them may still travel well


Page 30

Putting it all together

Project design features: Technocratic | Logistical | Implementation intensive (‘downstream’) | Implementation intensive (‘upstream’) | Complex
Implementation quality (within each design type): Strong | Weak
Context compatibility (within each implementation quality): + | −
External validity: High → Low across these cells
Utility of case studies, of process evaluations, of mixed methods (MM): varies from High to Low across the same cells

Even with low-EV interventions, the ideas and processes behind them may still travel well
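One way to read the grid programmatically: a toy scoring function (all weights invented for illustration, not from the talk) in which external validity falls as the design moves from technocratic to complex, and falls further within each design type when implementation is weak or the context incompatible.

```python
# Toy ranking of the slide's grid; every weight below is invented.
DESIGN_ORDER = [
    "technocratic",
    "logistical",
    "implementation intensive (downstream)",
    "implementation intensive (upstream)",
    "complex",
]

def ev_score(design, strong_implementation, context_compatible):
    """Toy external-validity score: higher means claims travel better."""
    base = float(len(DESIGN_ORDER) - DESIGN_ORDER.index(design))  # 5.0 .. 1.0
    if strong_implementation:
        base += 0.5
    if context_compatible:
        base += 0.25
    return base

print(ev_score("technocratic", True, True))  # 5.75, the highest-EV cell
print(ev_score("complex", False, False))     # 1.00, the lowest-EV cell
```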

Page 31

Implications

• Take the analytics of knowledge claims surrounding EV as seriously as we do IV
• Engage with the vast array of social science tools available for rigorously assessing complex interventions
  – Within and beyond economics
• RCTs as one tool among many
• New literature on case studies (Mahoney), QCA (Ragin), complexity
  – See especially ‘realist evaluation’ (Pawson, Tilley)
• Make implementation cool; it really matters…
  – Learning from intra-project variation; projects themselves as laboratories, as “policy experiments” (Rondinelli 1993)
• A ‘science of delivery’ must know how, not just whether, interventions work (mechanisms, theory of change)
  – Especially important for engaging with ‘complex’ interventions
• Need the ‘counter-temporal’ (not just the counterfactual)
  – Reasoned expectations about what and where, by when?

Page 32

Primary source material

• Bamberger, Michael, Vijayendra Rao and Michael Woolcock (2010) ‘Using Mixed Methods in Monitoring and Evaluation: Experiences from International Development’, in Abbas Tashakkori and Charles Teddlie (eds.) Handbook of Mixed Methods (2nd revised edition), Thousand Oaks, CA: Sage Publications, pp. 613-641
• Barron, Patrick, Rachael Diprose and Michael Woolcock (2011) Contesting Development: Participatory Projects and Local Conflict Dynamics in Indonesia, New Haven: Yale University Press
• Pritchett, Lant, Salimah Samji and Jeffrey Hammer (2012) ‘It’s All About MeE: Using Experiential Learning to Navigate the Design Space’, Center for Global Development Working Paper No.
• Woolcock, Michael (2009) ‘Toward a Plurality of Methods in Project Evaluation: A Contextualized Approach to Understanding Impact Trajectories and Efficacy’, Journal of Development Effectiveness 1(1): 1-14
• Woolcock, Michael (forthcoming) ‘Using Case Studies to Explore the External Validity of Complex Development Interventions’, Evaluation

