Impact Evaluation
Measuring Impact
Sebastian Martinez
The World Bank
Impact Evaluation Methods for Policymakers
Note: slides by Sebastian Martinez. The content of this presentation reflects the views of the author, and not necessarily those of the World Bank. December 2007.
7. Measuring Impact (Martinez) Manila - World Bankpubdocs.worldbank.org/en/165451526071510608/14-Measuring... · Sebastian Martinez The World Bank Impact Evaluation Methods for Policymakers
Discontinuity Design (RDD)
Difference in Difference (Diff-in-diff)
Matching (P-score matching)
Discontinuities in Eligibility
Social programs often target benefits according to an eligibility index:
- Anti-poverty programs: targeted to households below a given poverty index
- Pension programs: targeted to the population above a certain age
- Scholarships: targeted to students with high scores on a standardized test

Hint: for a discontinuity design, you need:
- A continuous eligibility index
- A clearly defined eligibility cut-off
Example:
Eligibility index (score) from 1 to 100, based on pre-intervention characteristics
- Score <= 50: eligible
- Score > 50: not eligible
- Offer treatment to eligibles
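The assignment rule in this example can be written down directly. The following is a minimal sketch; the function name `is_eligible` and the sample scores are hypothetical and only illustrate the cut-off at 50.

```python
def is_eligible(score: float, cutoff: float = 50.0) -> bool:
    """Return True when the score is at or below the cut-off (eligible)."""
    return score <= cutoff


# Hypothetical scores on the 1-100 eligibility index
scores = [12, 49.9, 50, 50.1, 87]

# Treatment is offered to eligibles only: 1 = offered, 0 = not offered
treatment = [int(is_eligible(s)) for s in scores]
print(treatment)  # [1, 1, 1, 0, 0]
```

Units just either side of the cut-off (49.9 and 50.1 here) are the ones a discontinuity design compares.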
[Figure: Regression Discontinuity Design - Baseline. Outcome plotted against Score (20 to 80).]
[Figure: Regression Discontinuity Design - Baseline. Same plot, with the Not Eligible and Eligible groups labelled.]
[Figure: Regression Discontinuity Design - Post Intervention. Outcome plotted against Score (20 to 80).]
[Figure: Regression Discontinuity Design - Post Intervention. Same plot, with the IMPACT labelled.]
Case 5: Discontinuity Design
Oportunidades assigned benefits based on a poverty index, where:
- Treatment = 1 if score <= 750
- Treatment = 0 if score > 750
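Case 5: Discontinuity Design
Baseline - No treatment
[Figure: fitted values (y-axis from 153.578 to 379.224) plotted against the estimated targeting score ("puntaje estimado en focalizacion", 276 to 1294).]
$y_i = \beta_0 + \beta_1 \mathrm{Treatment}_i + \delta(\mathrm{score}_i) + \varepsilon_i$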
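The regression above can be sketched on simulated data. This is an illustration under stated assumptions, not the estimation used for Case 5: the data are simulated, the true jump (`TRUE_IMPACT`), the slope, and the noise level are hypothetical, δ(score) is taken to be linear, and the score is centred at the cut-off of 750. The small `ols` helper is written out only to keep the sketch free of external libraries.

```python
import random


def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination. X is a list of rows."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


random.seed(0)
CUTOFF = 750.0       # eligibility cut-off from Case 5
TRUE_IMPACT = 30.0   # hypothetical jump at the cut-off (simulation only)

X, y = [], []
for _ in range(2000):
    score = random.uniform(276, 1294)            # range of the plotted score axis
    treatment = 1.0 if score <= CUTOFF else 0.0  # sharp assignment rule
    centered = score - CUTOFF                    # centring is a convenience, not in the slide
    outcome = 250.0 + 0.2 * centered + TRUE_IMPACT * treatment + random.gauss(0, 10)
    X.append([1.0, treatment, centered])
    y.append(outcome)

# beta1 is the estimated jump in the outcome at the cut-off
beta0, beta1, delta = ols(X, y)
print(f"estimated impact at the cut-off: {beta1:.2f}")
```

With a linear δ, centring the score changes only the intercept; the coefficient on Treatment is unaffected.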
Case 5: Discontinuity Design
Treatment Period
[Figure: fitted values (y-axis from 183.647 to 399.51) plotted against the estimated targeting score ("puntaje estimado en focalizacion", 276 to 1294).]
Case 5 - Regression Discontinuity
                          Multivariate Linear Regression
Estimated Impact on CPC   30.58**
                          (5.93)
** Significant at 1% level
Potential Disadvantages of RD
- Local average treatment effects: not always generalizable
- Power: the effect is estimated at the discontinuity, so we generally have fewer observations than in a randomized experiment with the same sample size
- Specification can be sensitive to functional form: make sure the relationship between the assignment variable and the outcome variable is correctly modeled, including:
  - Nonlinear relationships
  - Interactions
Advantages of RD for Evaluation
- RD yields an unbiased estimate of the treatment effect at the discontinuity
- Can often take advantage of a known rule for assigning the benefit; such rules are common in the design of social policy
- No need to "exclude" a group of eligible households/individuals from treatment
Difference in Difference (Diff-in-diff)
Matching (P-score matching)
Diff-in-Diff
Compare the change in outcomes between the treatment group and the non-treatment (control) group
Impact is the difference between the two groups in the change in outcomes
Impact = (Yt1-Yt0) - (Yc1-Yc0)
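The formula is a double subtraction of group means. A minimal sketch, with hypothetical numbers chosen only to show the arithmetic:

```python
def diff_in_diff(yt1: float, yt0: float, yc1: float, yc0: float) -> float:
    """Impact = (Yt1 - Yt0) - (Yc1 - Yc0).

    yt0, yt1: treatment-group mean outcome before and after the intervention
    yc0, yc1: control-group mean outcome before and after the intervention
    """
    return (yt1 - yt0) - (yc1 - yc0)


# Hypothetical group means
impact = diff_in_diff(yt1=150.0, yt0=100.0, yc1=120.0, yc0=105.0)
print(impact)  # 35.0: the treatment group gained 50, the control group gained 15
```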
[Figure: Outcome plotted against Time for the Treatment Group and the Control Group, with the Treatment date marked, points labelled A, B, C and D, and the Average Treatment Effect indicated.]
[Figure: the same plot, showing the Estimated Average Treatment Effect alongside the Average Treatment Effect.]
Diff in diff
- Fundamental assumption: trends (slopes) are the same in treatments and controls
- Need a minimum of three points in time to verify this and estimate the treatment effect (two pre-intervention)
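With two pre-intervention points, the same double difference can be computed on the pre-period as a check on the trends. This is a sketch with hypothetical group means; a gap near zero is consistent with equal trends before the intervention, but it cannot prove the trends would have stayed equal afterwards.

```python
# Hypothetical group means at three points in time:
# t = -1 and t = 0 are pre-intervention, t = 1 is post-intervention.
treat = {-1: 90.0, 0: 100.0, 1: 140.0}
control = {-1: 95.0, 0: 105.0, 1: 115.0}

# Difference in pre-intervention slopes (should be close to zero)
pre_trend_gap = (treat[0] - treat[-1]) - (control[0] - control[-1])

# Diff-in-diff estimate using the last pre-period and the post-period
did = (treat[1] - treat[0]) - (control[1] - control[0])

print(pre_trend_gap)  # 0.0
print(did)            # 30.0
```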
Case 6: Diff-in-Diff
Case 6 - Diff in Diff
             Not Enrolled   Enrolled   t-stat
Mean ∆CPC    8.26           35.92      10.31

Case 6 - Diff in Diff
                          Linear Regression   Multivariate Linear Regression
Estimated Impact on CPC   27.66**             25.53**
                          (2.68)              (2.77)
** Significant at 1% level
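The two Case 6 tables can be checked against each other using the diff-in-diff formula. Reading Mean ∆CPC as the change in the outcome for each group, the difference between the Enrolled and Not Enrolled columns gives:

```latex
\text{Impact}
  = \overline{\Delta \mathrm{CPC}}_{\text{Enrolled}}
  - \overline{\Delta \mathrm{CPC}}_{\text{Not Enrolled}}
  = 35.92 - 8.26
  = 27.66
```

This matches the Linear Regression estimate of 27.66 reported for Case 6.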
Case Study
Case                                Method                           Estimated Impact on CPC
Case 1 - Before and After           Multivariate Linear Regression   34.28** (2.11)
Case 2 - Enrolled/Not Enrolled      Multivariate Linear Regression   -4.15 (4.05)
Case 3 - Randomization              Multivariate Linear Regression   29.79** (3.00)
Case 4 - IV (TOT)                   2SLS                             30.44** (3.07)
Case 5 - Regression Discontinuity   Multivariate Linear Regression   30.58** (5.93)
Case 6 - Diff in Diff               Multivariate Linear Regression   25.53** (2.77)
** Significant at 1% level
- Pick the ideal comparison group that matches the treatment group from a larger survey.
- The matches are selected on the basis of similarities in observed characteristics.
- This assumes no selection bias based on unobservable characteristics.
Source: Martin Ravallion
Propensity-Score Matching (PSM)
- Controls: non-participants with the same characteristics as participants
- In practice, this is very hard: the entire vector X of observed characteristics could be huge.
- Rosenbaum and Rubin: match on the basis of the propensity score, $P(X_i) = \Pr(D_i = 1 \mid X_i)$
- Instead of aiming to ensure that the matched control for each participant has exactly the same value of X, the same result can be achieved by matching on the probability of participation.
- This assumes that participation is independent of outcomes given X.
Steps in Score Matching
1. Obtain representative and highly comparable surveys of non-participants and participants.
2. Pool the two samples and estimate a logit (or probit) model of program participation.
3. Restrict the samples to ensure common support (lack of common support is an important source of bias in observational studies).
4. For each participant, find a sample of non-participants that have similar propensity scores.
5. Compare the outcome indicators. The difference is the estimate of the gain due to the program for that observation.
6. Calculate the mean of these individual gains to obtain the average overall gain.
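The steps above can be sketched end to end on simulated data. Everything numeric here is hypothetical: a single observed characteristic `x`, a true gain of 5, and a simulated participation rule. Two simplifications are assumptions of this sketch rather than part of the slides: the logit is fitted with a hand-written Newton iteration, and step 4 uses a single nearest neighbour (with replacement) instead of a sample of similar non-participants.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def fit_logit(xs, ds, steps=25):
    """Logit of participation on one covariate, fitted by Newton's method."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, d in zip(xs, ds):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += d - p          # gradient wrt intercept
            g1 += (d - p) * x    # gradient wrt slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


random.seed(1)
TRUE_GAIN = 5.0  # hypothetical program effect (simulation only)

# Step 1: pooled survey of participants (d = 1) and non-participants (d = 0)
data = []
for _ in range(1000):
    x = random.gauss(0, 1)
    d = 1 if random.random() < sigmoid(x) else 0  # participation depends on x
    y = 10.0 + 3.0 * x + TRUE_GAIN * d + random.gauss(0, 1)
    data.append((x, d, y))

# Step 2: estimate the participation model and score every observation
a, b = fit_logit([r[0] for r in data], [r[1] for r in data])
scored = [(sigmoid(a + b * x), d, y) for x, d, y in data]

# Step 3: keep only the region of common support
p_scores = [s for s, d, _ in scored if d == 1]
c_scores = [s for s, d, _ in scored if d == 0]
lo = max(min(p_scores), min(c_scores))
hi = min(max(p_scores), max(c_scores))
participants = [(s, y) for s, d, y in scored if d == 1 and lo <= s <= hi]
controls = [(s, y) for s, d, y in scored if d == 0 and lo <= s <= hi]

# Steps 4-5: nearest-neighbour match on the score, then individual gains
gains = []
for s, y in participants:
    _, y_match = min(controls, key=lambda c: abs(c[0] - s))
    gains.append(y - y_match)

# Step 6: average gain among matched participants
att = sum(gains) / len(gains)

# Unmatched comparison of means, for contrast
y1 = [y for _, d, y in scored if d == 1]
y0 = [y for _, d, y in scored if d == 0]
naive = sum(y1) / len(y1) - sum(y0) / len(y0)

print(f"matched estimate: {att:.2f}, naive difference: {naive:.2f}")
```

In this simulation participation depends only on the observed `x`, so matching can remove the bias in the naive comparison. If participation also depended on unobserved characteristics, the matched estimate would remain biased.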
[Figure: density of propensity scores (0 to 1) for participants, with the region of common support marked.]
PSM vs an experiment
- A pure experiment does not require the untestable assumption of independence conditional on observables
- PSM requires large samples and good data
Lessons on Matching Methods
- Typically used when neither randomization, RD, nor other quasi-experimental options are possible (e.g. no baseline)
- Be cautious of ex-post matching:
  - Matching on endogenous variables
- Matching helps control for OBSERVABLE heterogeneity
- Matching at baseline can be very useful:
  - Estimation: combine with other techniques (e.g. diff in diff)
  - Know the assignment rule (match on this rule)
Case 7: P-Score Matching
Case 7 - PROPENSITY SCORE: Pr(treatment=1)
Variable      Coef.        Std. Err.
Age Head      -0.0282433   0.0024553
Educ Head     -0.054722    0.0086369
Age Spouse    -0.0171695   0.0028683
Educ Spouse   -0.0643569   0.0093801
Ethnicity     0.4166998    0.0397539
Female Head   -0.2260407   0.0714199
_cons         1.6048       0.1013011
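To show how the Case 7 coefficients translate into a propensity score, the sketch below evaluates Pr(treatment=1) for one household. Two things are assumptions and not stated in the table: the link function is taken to be a logit (a probit would use the normal CDF instead), and the household values and their coding (ages and education in years, Ethnicity and Female Head as 0/1 indicators) are hypothetical.

```python
import math

# Coefficients as reported in the Case 7 propensity-score table
COEF = {
    "age_head": -0.0282433,
    "educ_head": -0.054722,
    "age_spouse": -0.0171695,
    "educ_spouse": -0.0643569,
    "ethnicity": 0.4166998,
    "female_head": -0.2260407,
}
CONS = 1.6048


def propensity(household: dict) -> float:
    """Pr(treatment=1) for one household. ASSUMPTION: logit link."""
    index = CONS + sum(COEF[k] * household[k] for k in COEF)
    return 1.0 / (1.0 + math.exp(-index))


# Hypothetical household; units and coding are assumptions
household = {
    "age_head": 40,
    "educ_head": 3,
    "age_spouse": 36,
    "educ_spouse": 3,
    "ethnicity": 1,
    "female_head": 0,
}

p = propensity(household)
print(f"propensity score: {p:.3f}")
```

Under these assumptions the negative coefficients on age and education lower the score, while the positive Ethnicity coefficient raises it.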
                          Linear Regression   Multivariate Linear Regression
Estimated Impact on CPC   1.16                7.06+
                          (3.59)              (3.65)
** Significant at 1% level, + Significant at 10% level
- The objective of impact evaluation is to estimate the CAUSAL effect of a program on outcomes of interest
- In designing the program we must understand the data generation process:
  - the behavioral process that generates the data
  - how benefits are assigned
- Fit the best evaluation design to the operational context
Appendix 1: Two Stage Least Squares (2SLS)
Model with endogenous Treatment (T):
$y = \alpha + \beta_1 T + \beta_2 x + \varepsilon$
Stage 1: Regress the endogenous variable on the IV (Z) and other exogenous regressors:
$T = \delta_0 + \delta_1 x + \theta_1 Z + \tau$
Calculate the predicted value for each observation: $\hat{T}$
Appendix 1: Two Stage Least Squares (2SLS)
Stage 2: Regress outcome y on the predicted variable (and other exogenous variables)
Need to correct the standard errors (they are based on T hat rather than T)
In practice just use STATA - ivreg
Intuition: T has been "cleaned" of its correlation with ε.
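The two stages can be sketched by hand on simulated data. This illustrates the mechanics only: the data-generating process, the unobserved factor `u`, and all coefficients are hypothetical, and the sketch reports point estimates without the standard-error correction noted above. The `ols` helper is written out to avoid external libraries.

```python
import random


def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


random.seed(2)
TRUE_BETA1 = 2.0  # hypothetical effect of T on y (simulation only)

x, Z, T, y = [], [], [], []
for _ in range(5000):
    xi = random.gauss(0, 1)                # exogenous regressor
    zi = float(random.random() < 0.5)      # instrument, unrelated to u
    u = random.gauss(0, 1)                 # unobserved factor driving both T and y
    ti = float(0.5 * xi + 1.5 * zi + u + random.gauss(0, 1) > 0.75)
    yi = 1.0 + TRUE_BETA1 * ti + 0.7 * xi + 2.0 * u + random.gauss(0, 1)
    x.append(xi); Z.append(zi); T.append(ti); y.append(yi)

# Naive OLS of y on T and x: biased because T is correlated with the error
_, beta1_ols, _ = ols([[1.0, t, xi] for t, xi in zip(T, x)], y)

# Stage 1: regress T on x and the instrument Z, then form T hat
d0, d1, th1 = ols([[1.0, xi, zi] for xi, zi in zip(x, Z)], T)
T_hat = [d0 + d1 * xi + th1 * zi for xi, zi in zip(x, Z)]

# Stage 2: regress y on T hat and x
_, beta1_2sls, _ = ols([[1.0, th, xi] for th, xi in zip(T_hat, x)], y)

print(f"OLS: {beta1_ols:.2f}   2SLS: {beta1_2sls:.2f}")
```

In this simulation the OLS coefficient is pushed away from the true value by `u`, while the second-stage coefficient on T hat lands near it, which is the "cleaning" described in the intuition above.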